Getting started with the code
Let's get our hands dirty with some code. If you have used NumPy before, you will feel at home here. Don't worry if you haven't; PyTorch is designed to be easy for beginners to pick up.
Although it is a deep learning framework, PyTorch can be used for general numerical computing as well. Here we discuss the basic operations in PyTorch. They will make your life easier in the next chapter, where we will build an actual neural network for a simple use case. We'll be using Python 3.7 and PyTorch 1.0 for all the programs in the book. The GitHub repository is built with the same configuration and installs PyTorch from PyPI rather than Conda, even though Conda is the package manager recommended by the PyTorch team.
Learning the basic operations
Let's start coding by importing torch into the namespace:
import torch
The fundamental data abstraction in PyTorch is the Tensor object, which is the counterpart of ndarray in NumPy. You can create tensors in several ways in PyTorch. We'll discuss some of the basic approaches here, and you will see all of them in the upcoming chapters while building the applications:
uninitialized = torch.Tensor(3,2)
rand_initialized = torch.rand(3,2)
matrix_with_ones = torch.ones(3,2)
matrix_with_zeros = torch.zeros(3,2)
The rand method gives you a random matrix of a given size, while the Tensor function returns an uninitialized tensor. To create a tensor object from a Python list, you call torch.FloatTensor(python_list), which is analogous to np.array(python_list). FloatTensor is one of several types that PyTorch supports. A list of the available types is given in the following table:
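Data type | CPU tensor | GPU tensor |
---|---|---|
32-bit floating point | torch.FloatTensor | torch.cuda.FloatTensor |
64-bit floating point | torch.DoubleTensor | torch.cuda.DoubleTensor |
16-bit floating point | torch.HalfTensor | torch.cuda.HalfTensor |
8-bit integer (unsigned) | torch.ByteTensor | torch.cuda.ByteTensor |
8-bit integer (signed) | torch.CharTensor | torch.cuda.CharTensor |
16-bit integer (signed) | torch.ShortTensor | torch.cuda.ShortTensor |
32-bit integer (signed) | torch.IntTensor | torch.cuda.IntTensor |
64-bit integer (signed) | torch.LongTensor | torch.cuda.LongTensor |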
Table 1.1: DataTypes supported by PyTorch. Source: http://pytorch.org/docs/master/tensors.html
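As a minimal sketch of how these types show up in practice, the following snippet creates tensors from a Python list and inspects their types. The variable names are illustrative only, and torch.tensor is used here alongside the FloatTensor constructor mentioned above as an alternative way to pick a data type.

```python
import torch

python_list = [[1, 2], [3, 4], [5, 6]]

# Typed constructor, as described in the text.
float_tensor = torch.FloatTensor(python_list)

# torch.tensor accepts an explicit dtype, or infers one from the data.
double_tensor = torch.tensor(python_list, dtype=torch.float64)
long_tensor = torch.tensor(python_list)  # Python ints are inferred as 64-bit integers

print(float_tensor.dtype)
print(double_tensor.dtype)
print(long_tensor.dtype)

# type() reports the type name used in Table 1.1.
print(float_tensor.type())
```

The dtype attribute and the type() method describe the same information in two notations: torch.float32 corresponds to torch.FloatTensor, torch.int64 to torch.LongTensor, and so on.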
With each release, PyTorch moves its API closer to NumPy's wherever possible. The shape attribute, introduced in the 0.2 release, is one such change. It gives you the shape (size, in PyTorch terminology) of the tensor, which is also available through the size() method:
>>> size = rand_initialized.size()
>>> shape = rand_initialized.shape
>>> print(size == shape)
True
The shape object (a torch.Size) is a subclass of Python's tuple, and hence all the operations that are possible on a tuple are possible on a shape object as well. As a nice side effect, the shape object is immutable.
>>> print(shape[0])
3
>>> print(shape[1])
2
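The tuple behavior of the shape object can be checked directly. This is a small sketch with illustrative variable names; it relies only on the fact that torch.Size subclasses tuple.

```python
import torch

x = torch.rand(3, 2)
shape = x.shape

# torch.Size is a tuple subclass, so tuple operations work on it.
print(isinstance(shape, tuple))
rows, cols = shape                 # tuple unpacking
print(rows * cols == x.numel())    # 3 * 2 elements in total

# Tuples do not support item assignment, so the shape cannot be edited.
assignment_failed = False
try:
    shape[0] = 5
except TypeError:
    assignment_failed = True
print(assignment_failed)
```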
Now that you know what a tensor is and how one can be created, we'll start with the most basic math operations. Once you get acquainted with operations such as multiplication, addition, and matrix operations, everything else is just Lego blocks on top of that.
PyTorch tensor objects override Python's numerical operators, so the normal operators work as you would expect. Tensor-scalar operations are probably the simplest:
>>> x = torch.ones(3,2)
>>> x
tensor([[1., 1.],
[1., 1.],
[1., 1.]])
>>>
>>> y = torch.ones(3,2) + 2
>>> y
tensor([[3., 3.],
[3., 3.],
[3., 3.]])
>>>
>>> z = torch.ones(2,1)
>>> z
tensor([[1.],
[1.]])
>>>
>>> x * y @ z
tensor([[6.],
[6.],
[6.]])
Because x and y are both 3 x 2 tensors, Python's multiplication operator performs element-wise multiplication and gives a tensor of the same shape. Since * and @ have the same precedence and are evaluated left to right, that 3 x 2 result is then passed, together with the 2 x 1 tensor z, through Python's matrix multiplication operator, which produces a 3 x 1 matrix.
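To make the evaluation order explicit, the expression can be split into its two steps. This sketch reuses the x, y, and z tensors from the preceding example; the intermediate names elementwise and result are introduced here only for illustration.

```python
import torch

x = torch.ones(3, 2)
y = torch.ones(3, 2) + 2
z = torch.ones(2, 1)

# * and @ share the same precedence and associate left to right,
# so x * y @ z is evaluated as (x * y) @ z.
elementwise = x * y        # shape 3 x 2, every entry is 1 * 3 = 3
result = elementwise @ z   # shape 3 x 1, every entry is 3 + 3 = 6

print(result.shape)
print(torch.equal(result, x * y @ z))
```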
You have several options for tensor-tensor operations: normal Python operators, as you have seen in the preceding example, in-place PyTorch functions, and out-of-place PyTorch functions.
>>> x = torch.rand(2,3)
>>> y = torch.rand(2,3)
>>> z = x.add(y)
>>> print(z)
tensor([[1.4059, 1.0023, 1.0358],
[0.9809, 0.3433, 1.7492]])
>>> z = x.add_(y) #in place addition.
>>> print(z)
tensor([[1.4059, 1.0023, 1.0358],
[0.9809, 0.3433, 1.7492]])
>>> print(x)
tensor([[1.4059, 1.0023, 1.0358],
[0.9809, 0.3433, 1.7492]])
>>> print(x == z)
tensor([[1, 1, 1],
[1, 1, 1]], dtype=torch.uint8)
>>>
>>> x = torch.rand(2,3)
>>> y = torch.rand(3,4)
>>> x.matmul(y)
tensor([[0.5594, 0.8875, 0.9234, 1.1294],
[0.7671, 1.7276, 1.5178, 1.7478]])
Two tensors of the same size can be added together by using the + operator or the add function to get an output tensor of the same shape. PyTorch follows the convention that a trailing underscore marks the in-place version of an operation. For example, a.add(b) gives you a new tensor holding the sum of a and b, and leaves the existing a and b tensors unchanged. But a.add_(b) updates tensor a with the summed value and returns the updated a. The same convention applies to the other PyTorch operations that have an in-place variant.
Note
In-place operators follow the convention of the trailing underscore, like add_ and sub_.
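The difference between the two variants can be verified with a short sketch. The tensors a and b and the helper names below are illustrative; data_ptr() is used only to show that the in-place call did not allocate a new buffer.

```python
import torch

a = torch.ones(2, 3)
b = torch.full((2, 3), 2.0)

# Out-of-place: a new tensor is returned and a is left untouched.
c = a.add(b)
a_unchanged = torch.equal(a, torch.ones(2, 3))

# In-place: a itself is updated and returned.
ptr_before = a.data_ptr()
d = a.add_(b)

print(a_unchanged)
print(d is a)                      # the very same tensor object comes back
print(a.data_ptr() == ptr_before)  # the underlying memory was reused
print(torch.equal(a, c))           # a now holds the sum
```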
Matrix multiplication can be done using the function matmul. The mm function, which works only on two-dimensional matrices, and Python's @ operator serve the same purpose. Slicing, indexing, and joining are the next most important tasks you'll end up doing while coding up your network. PyTorch enables you to do all of them with basic Pythonic or NumPy syntax.
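As a quick check of the matrix multiplication options just mentioned, the following sketch compares matmul, mm, and @ on the same inputs. The batch tensor at the end is an added illustration of a case that matmul handles by broadcasting.

```python
import torch

x = torch.rand(2, 3)
y = torch.rand(3, 4)

via_matmul = x.matmul(y)
via_mm = x.mm(y)       # mm accepts 2-D matrices only
via_operator = x @ y   # the @ operator

print(torch.allclose(via_matmul, via_mm))
print(torch.allclose(via_matmul, via_operator))

# matmul also accepts batched inputs and broadcasts y across the batch.
batch = torch.rand(5, 2, 3)
batched = batch.matmul(y)
print(batched.shape)
```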
Indexing a tensor is like indexing a normal Python list. Multiple dimensions can be indexed by indexing each dimension in turn, and a single index always applies to the first available dimension. Alternatively, you can separate the index for each dimension with a comma. The same syntax works for slicing, where the start and end indices are separated by a colon. The transpose of a matrix is available through the t() method, which every PyTorch tensor object provides for tensors of up to two dimensions.
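A small sketch of these indexing rules follows. It uses torch.arange so that every element has a predictable value; the tensor and variable names are illustrative.

```python
import torch

# Values 0..23 arranged as 2 x 3 x 4, so element [i, j, k] is i*12 + j*4 + k.
x = torch.arange(24).view(2, 3, 4)

# Chained indexing and comma-separated indexing pick the same element.
chained = x[1][2][3].item()
comma = x[1, 2, 3].item()
print(chained == comma)

# A single index applies to the first dimension.
first_dim = x[1]
print(first_dim.shape)

# start:end slices, one per dimension.
sliced = x[:, 1:, :2]
print(sliced.shape)

# t() transposes a 2-D tensor.
transposed = x[0].t()
print(transposed.shape)
```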
Concatenation is another important operation that you need in your toolbox, and PyTorch provides the cat function for it. Tensors whose sizes match on every dimension except the one being joined can be concatenated using cat. For example, a tensor of size 3 x 2 x 4 can be concatenated with another tensor of size 3 x 5 x 4 on the first dimension (dim=1, counting from zero) to get a tensor of size 3 x 7 x 4. The stack operation looks very similar to concatenation but it is an entirely different operation. If you want to join tensors along a new dimension, stack is the way to go. Similar to cat, you can pass the axis where you want to add the new dimension. However, because the new dimension does not exist in the inputs, all the tensors passed to stack must have exactly the same shape.
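The sizes quoted above can be reproduced with a short sketch. The tensors a, b, and c are random placeholders chosen only for their shapes.

```python
import torch

a = torch.rand(3, 2, 4)
b = torch.rand(3, 5, 4)

# cat joins along an existing dimension; the other dimensions must match.
joined = torch.cat((a, b), dim=1)
print(joined.shape)   # 2 + 5 = 7 along dim 1

# stack creates a new dimension, so the inputs must have identical shapes.
c = torch.rand(3, 2, 4)
stacked = torch.stack((a, c), dim=0)
print(stacked.shape)  # a new leading dimension of size 2
```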
split and chunk are similar operations for splitting your tensor. split accepts the size you want each output tensor to be, while chunk accepts the number of pieces you want. For example, if you are splitting a tensor of size 3 x 2 with size 1 in the zeroth dimension, you'll get three tensors each of size 1 x 2. However, if you give 2 as the size on the zeroth dimension, you'll get a tensor of size 2 x 2 and another of size 1 x 2.
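The following sketch reproduces the 3 x 2 example, with chunk added for comparison. Variable names are illustrative.

```python
import torch

x = torch.rand(3, 2)

# Split size 1 along dim 0: three pieces of 1 x 2.
singles = x.split(1, dim=0)
single_shapes = [tuple(t.shape) for t in singles]
print(single_shapes)

# Split size 2 along dim 0: one 2 x 2 piece and a 1 x 2 remainder.
uneven = x.split(2, dim=0)
uneven_shapes = [tuple(t.shape) for t in uneven]
print(uneven_shapes)

# chunk takes the number of pieces rather than the size of each piece.
chunks = x.chunk(3, dim=0)
chunk_shapes = [tuple(t.shape) for t in chunks]
print(chunk_shapes)

# Concatenating the pieces restores the original tensor.
print(torch.equal(torch.cat(uneven, dim=0), x))
```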
The squeeze function can save you hours of time. There are situations where you'll have tensors in which one or more dimensions have size 1, and sometimes you don't need those extra dimensions. That is where squeeze is going to help you: it removes dimensions of size 1. For example, if you are dealing with sentences and you have a batch of 10 sentences with five words each, when you map that to a tensor object, you'll get a tensor of size 10 x 5. Then you realize that you have to convert that to one-hot vectors for your neural network to process.
You add another dimension to your tensor with a one-hot encoded vector of size 100 (because you have 100 words in your vocabulary). Now you have a tensor object of size 10 x 5 x 100, and you are passing one word at a time from each sentence in the batch.
Now you have to split and slice your sentences and, most probably, you will end up with tensors of size 10 x 1 x 100 (one word from each of the 10 sentences, as a 100-dimensional vector). Processing that as a tensor of size 10 x 100 makes your life much easier. Go ahead and use squeeze to get a 10 x 100 tensor from a 10 x 1 x 100 tensor.
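The sentence example can be sketched as follows. The batch here is a tensor of zeros standing in for real one-hot data, and the word position 2 is an arbitrary choice made for illustration.

```python
import torch

# Placeholder for 10 sentences x 5 words x 100-word vocabulary.
batch = torch.zeros(10, 5, 100)

# Slicing one word position keeps that dimension with size 1.
word = batch[:, 2:3, :]
print(word.shape)

# squeeze(1) removes only dimension 1, and only if its size is 1.
squeezed = word.squeeze(1)
print(squeezed.shape)

# squeeze() with no argument removes every dimension of size 1.
all_squeezed = torch.zeros(1, 10, 1, 100).squeeze()
print(all_squeezed.shape)
```

Passing the dimension explicitly is the safer habit, since squeeze() without arguments would also drop the batch dimension if the batch happened to contain a single sentence.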
PyTorch has the anti-squeeze operation, called unsqueeze, which adds another fake dimension to your tensor object. Don't confuse unsqueeze with stack, which also adds another dimension. unsqueeze adds a fake dimension of size 1 and doesn't require another tensor to do so, whereas stack joins your reference tensor with other tensors of the same shape along a new dimension.
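The contrast between unsqueeze and stack shows up clearly in the resulting shapes. This is a minimal sketch with an illustrative one-dimensional tensor.

```python
import torch

x = torch.rand(3)

# unsqueeze inserts a dimension of size 1 at the given position.
row = x.unsqueeze(0)
col = x.unsqueeze(1)
print(row.shape, col.shape)

# stack needs a sequence of tensors; the new dimension has one entry per tensor.
pair = torch.stack((x, x), dim=0)
print(pair.shape)

# squeeze undoes unsqueeze.
print(torch.equal(row.squeeze(0), x))
```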
If you are comfortable with all these basic operations, you can proceed to the second chapter and start the coding session right now. PyTorch comes with many other important operations, which you will find useful as you start building the network. We will see most of them in the upcoming chapters, but if you want to learn them first, head to the PyTorch website and check out its tensor tutorial page, which describes all the operations that a tensor object supports.
The internals of PyTorch
One of the core philosophies of PyTorch, which came about with the evolution of PyTorch itself, is interoperability. The development team invested a lot of time into enabling interoperability with other frameworks through standards such as ONNX, DLPack, and so on. Examples of these will be shown in later chapters, but here we will discuss how the internals of PyTorch are designed to accommodate this requirement without compromising on speed.
A normal Python data structure is a single-layered memory object that can save data and metadata. But PyTorch data structures are designed in layers, which makes the framework not only interoperable but also memory-efficient. The computationally intensive portion of the PyTorch core has been migrated to the C/C++ backend through the ATen and Caffe2 libraries, instead of being kept in Python itself, for the sake of speed.
Even though PyTorch was created as a research framework, it has evolved into a research-oriented but production-ready framework. The trade-offs that came along with these multiple use cases have been handled by introducing two execution types. We'll see more about this in Chapter 8, PyTorch to Production, where we discuss how to move PyTorch to production.
The custom data structure designed in the C/C++ backend has been divided into different layers. For simplicity, we'll omit CUDA data structures and focus on simple CPU data structures. The main user-facing data structure in PyTorch is a THTensor object, which holds the information about dimension, offset, stride, and so on. The other main piece of information that THTensor stores is the pointer to the THStorage object, which is an internal layer of the tensor object kept for storage.
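The metadata described above has a visible counterpart in the Python API. The following is a sketch, not a view into THTensor itself: it assumes that the size(), stride(), and storage_offset() methods of a Python tensor reflect the dimension, stride, and offset fields mentioned in the text.

```python
import torch

# 12 floats laid out contiguously and viewed as 3 x 4.
x = torch.arange(12.0).view(3, 4)
print(x.size(), x.stride(), x.storage_offset())

# A slice reuses the same buffer; only the metadata changes.
sub = x[1:, 2:]
print(sub.size(), sub.stride(), sub.storage_offset())

# A transpose swaps the strides instead of moving any data.
t = x.t()
print(t.size(), t.stride())

# data_ptr() is the address of a tensor's first element, so the slice
# starts storage_offset() elements after the start of x.
byte_gap = sub.data_ptr() - x.data_ptr()
print(byte_gap == sub.storage_offset() * x.element_size())
```

Here the slice starts at row 1, column 2, which is element 1 * 4 + 2 = 6 of the underlying buffer, and the transpose reports strides (1, 4) in place of (4, 1).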
The indexing, slicing, joining, and reshaping operations described earlier look like this in code:
x = torch.rand(2,3,4)
x_with_2n3_dimension = x[1, :, :]
scalar_x = x[1,1,1]  # element at index 1 along each dimension

# numpy like slicing
x = torch.rand(2,3)
print(x[:, 1:])  # skipping first column
print(x[:-1, :])  # skipping last row

# transpose
x = torch.rand(2,3)
print(x.t())  # size 3x2

# concatenation and stacking
x = torch.rand(2,3)
concat = torch.cat((x,x))
print(concat)  # Concatenates 2 tensors on zeroth dimension

x = torch.rand(2,3)
concat = torch.cat((x,x), dim=1)
print(concat)  # Concatenates 2 tensors on first dimension

x = torch.rand(2,3)
stacked = torch.stack((x,x), dim=0)
print(stacked)  # returns 2x2x3 tensor

# split: you can use chunk as well
x = torch.rand(3,2)
splitted = x.split(2, dim=0)
print(splitted)  # 2 tensors of 2x2 and 1x2 size

# squeeze and unsqueeze
x = torch.rand(3,2,1)  # a tensor of size 3x2x1
squeezed = x.squeeze()
print(squeezed)  # remove the 1 sized dimension

x = torch.rand(3)
with_fake_dimension = x.unsqueeze(0)
print(with_fake_dimension)  # added a fake zeroth dimension
As you may have assumed, the THStorage layer is not a smart data structure and it doesn't really know the metadata of our tensor. The THStorage layer is responsible for keeping the pointer towards the raw data and the allocator. The allocator is another topic entirely, and there are different allocators for CPU, GPU, shared memory, and so on. The pointer from THStorage that points to the raw data is the key to interoperability. The raw data is where the actual data is stored but without any structure. This three-layered representation of each tensor object makes the implementation of PyTorch memory-efficient. Following are some examples.
Variable x is created as a tensor of size 2 x 2 filled with 1s. Then we create another variable, xv, which is another view of the same tensor, x, flattened from 2 x 2 into a one-dimensional tensor of size 4. We also make a NumPy array by calling the .numpy() method and storing the result in the variable xn:
>>> import torch
>>> import numpy as np
>>> x = torch.ones(2,2)
>>> xv = x.view(-1)
>>> xn = x.numpy()
>>> x
tensor([[1., 1.],
        [1., 1.]])
>>> xv
tensor([1., 1., 1., 1.])
>>> xn
array([[1., 1.],
       [1., 1.]], dtype=float32)
PyTorch provides several APIs to check internal information, and storage() is one of them. The storage() method returns the storage object (THStorage), which is the second layer in the PyTorch data structure described previously. The storage objects of both x and xv are shown as follows. Even though the two tensors have different views (dimensions), their storage has the same size, which shows that THTensor stores the information about dimensions while the storage layer is a dumb layer that just points the user to the raw data object. To confirm this, we use another API available on the storage object, data_ptr, which points us to the raw data object. Comparing the data_ptr of x and xv shows that both are the same:
>>> x.storage()
 1.0
 1.0
 1.0
 1.0
[torch.FloatStorage of size 4]
>>> xv.storage()
 1.0
 1.0
 1.0
 1.0
[torch.FloatStorage of size 4]
>>> x.storage().data_ptr() == xv.storage().data_ptr()
True
Next, we change the first value in the tensor, at index (0, 0), to 20. Variables x and xv have different THTensor layers, since the dimensions differ, but the actual raw data is the same for both of them. This makes it easy and memory-efficient to create any number of views of the same tensor for different purposes.
Even the NumPy array, xn, shares the same raw data object with the other variables, and hence a change of value in one tensor is reflected in all the other tensors that point to the same raw data object. DLPack is an extension of this idea, which makes communication between different frameworks easy within the same program.
>>> x[0,0]=20
>>> x
tensor([[20.,  1.],
        [ 1.,  1.]])
>>> xv
tensor([20.,  1.,  1.,  1.])
>>> xn
array([[20.,  1.],
       [ 1.,  1.]], dtype=float32)
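The DLPack idea mentioned above can be tried without a second framework. The sketch below is only an illustration: it round-trips a tensor through the to_dlpack and from_dlpack helpers in torch.utils.dlpack, with PyTorch standing in for both the producing and the consuming framework.

```python
import torch
from torch.utils.dlpack import from_dlpack, to_dlpack

x = torch.ones(2, 2)

# Export x as a DLPack capsule, then import the capsule as a tensor.
capsule = to_dlpack(x)
shared = from_dlpack(capsule)

# Both tensors point at the same raw data, so a write through one
# is visible through the other.
x[0, 0] = 20
print(shared[0, 0].item())
print(shared.data_ptr() == x.data_ptr())
```

In a real program the capsule would be handed to another library that understands DLPack, which could then read and write the same memory without a copy.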
Summary
In this chapter, we learned about the history of PyTorch, and the pros and cons of a dynamic graph library over a static one. We also glanced over the different architectures and models that people have come up with to solve complicated problems in all kinds of areas. We covered the internals of the most important thing in PyTorch: the Torch tensor. The concept of a tensor is fundamental to deep learning and will be common to all deep learning frameworks you use.
In the next chapter, we'll take a more hands-on approach and will be implementing a simple neural network in PyTorch.