Codementor Events

How to perform basic operations in PyTorch code

Published Oct 31, 2019

PyTorch is a tensor computation library that can be powered by GPUs. PyTorch is built with certain goals, which makes it different from all the other deep learning frameworks. Being a Python-first framework, PyTorch took a big leap over other frameworks that implemented a Python wrapper on a monolithic C++ or C engine.

Being a deep learning framework, PyTorch can be used for numerical computing as well. In this post we discuss the basic operations in PyTorch. We'll be using Python 3.7 and PyTorch 1.0 for all the programs.

This post is taken from PyTorch Deep Learning Hands-On by Packt Publishing from author Sherin Thomas with Sudhanshu Passi. This book features hands-on projects covering all the key deep learning methods built step-by-step in PyTorch.

Learning the basic operations
Let's start coding by importing torch into the namespace:

import torch

The fundamental data abstraction in PyTorch is a Tensor object, which is the alternative of ndarray in NumPy. You can create tensors in several ways in PyTorch.

uninitialized = torch.Tensor(3,2)
rand_initialized = torch.rand(3,2)
matrix_with_ones = torch.ones(3,2)
matrix_with_zeros = torch.zeros(3,2)

The rand method gives you a random matrix of a given size, while the Tensor function returns an uninitialized tensor. To create a tensor object from a Python list, you call torch.FloatTensor(python_list), which is analogous to np.array(python_list). FloatTensor is one among the several types that PyTorch supports. A list of the available types is given in the following table:
t1.JPG

With each release, PyTorch makes several changes to the API, such that all the possible APIs are similar to NumPy APIs. Shape was one of those changes introduced in the 0.2 release. Calling the shape attribute gives you the shape (size in PyTorch terminology) of the tensor, which can be accessible through the size function as well:

>>> size = rand_initialized.size()
>>> shape = rand_initialized.shape
>>> print(size == shape)
True

The shape object is inherited from Python tuples and hence all the possible operations on a tuple are possible on a shape object as well. As a nice side effect, the shape object is immutable.

>>> print(shape[0])
3
>>> print(shape[1])
2

Now, since you know what a tensor is and how one can be created, we'll start with the most basic math operations. Once you get acquainted with operations such as multiplication addition and matrix operations, everything else is just Lego blocks on top of that.
PyTorch tensor objects have overridden the numerical operations of Python and you are fine with the normal operators. Tensor-scalar operations are probably the simplest:

>>> x = torch.ones(3,2)
>>> x
tensor([[1., 1.],
     [1., 1.],
     [1., 1.]])
>>>
>>> y = torch.ones(3,2) + 2
>>> y
tensor([[3., 3.],
     [3., 3.],
     [3., 3.]])
>>>
>>> z = torch.ones(2,1)
>>> z
tensor([[1.],
      [1.]])
>>>
>>> x * y @ z
tensor([[6.],
     [6.],
     [6.]])

Variables x and y being 3 x 2 tensors, the Python multiplication operator does element-wise multiplication and gives a tensor of the same shape. This tensor and the z tensor of shape 2 x 1 is going through Python's matrix multiplication operator and spits out a 3 x 1 matrix.

You have several options for tensor-tensor operations, such as normal Python operators, as you have seen in the preceding example, in-place PyTorch functions, and out-place PyTorch functions.

>>> z = x.add(y) 
>>> print(z) 
tensor([[1.4059, 1.0023, 1.0358], 
             [0.9809, 0.3433, 1.7492]]) 
>>> z = x.add_(y) #in place addition. 
>>> print(z) 
tensor([[1.4059, 1.0023, 1.0358], 
            [0.9809, 0.3433, 1.7492]]) 
>>> print(x) 
tensor([[1.4059, 1.0023, 1.0358],
            [0.9809, 0.3433, 1.7492]]) 
>>> print(x == z) 
tensor([[1, 1, 1], 
            [1, 1, 1]], dtype=torch.uint8) 
>>> 
>>> 
>>> 
>>> x = torch.rand(2,3) 
>>> y = torch.rand(3,4) 
>>> x.matmul(y) 
tensor([[0.5594, 0.8875, 0.9234, 1.1294], 
            [0.7671, 1.7276, 1.5178, 1.7478]])

Two tensors of the same size can be added together by using the + operator or the add function to get an output tensor of the same shape. PyTorch follows the convention of having a trailing underscore for the same operation, but this happens in place. For example, a.add(b) gives you a new tensor with summation ran over a and b. This operation would not make any changes to the existing a and b tensors. But a.add_(b) updates tensor a with the summed value and returns the updated a. The same is applicable to all the operators in PyTorch.

Note
In-place operators follow the convention of the trailing underscore, like add_ and sub_.

Matrix multiplication can be done using the function matmul, while there are other functions like mm and Python's @ for the same purpose. Slicing, indexing, and joining are the next most important tasks you'll end up doing while coding up your network. PyTorch enables you to do all of them with basic Pythonic or NumPy syntax.

Indexing a tensor is like indexing a normal Python list. Indexing multiple dimensions can be done by recursively indexing each dimension. Indexing chooses the index from the first available dimension. Each dimension can be separated while indexing by using a comma. You can use this method when doing slicing. Start and end indices can be separated using a full colon. The transpose of a matrix can be accessed using the attribute t; every PyTorch tensor object has the attribute t.

Concatenation is another important operation that you need in your toolbox. PyTorch made the function cat for the same purpose. Two tensors of the same size on all the dimensions except one, if required, can be concatenated using cat. For example, a tensor of size 3 x 2 x 4 can be concatenated with another tensor of size 3 x 5 x 4 on the first dimension to get a tensor of size 3 x 7 x 4. The stack operation looks very similar to concatenation but it is an entirely different operation. If you want to add a new dimension to your tensor, stack is the way to go. Similar to cat, you can pass the axis where you want to add the new dimension. However, make sure all the dimensions of the two tensors are the same other than the attaching dimension.

split and chunk are similar operations for splitting your tensor. split accepts the size you want each output tensor to be. For example, if you are splitting a tensor of size 3 x 2 with size 1 in the 0th dimension, you'll get three tensors each of size 1 x 2. However, if you give 2 as the size on the zeroth dimension, you'll get a tensor of size 2 x 2 and another of size 1 x 2.

The squeeze function sometimes saves you hours of time. There are situations where you'll have tensors with one or more dimension size as 1. Sometimes, you don't need those extra dimensions in your tensor. That is where squeeze is going to help you. squeeze removes the dimension with value 1. For example, if you are dealing with sentences and you have a batch of 10 sentences with five words each, when you map that to a tensor object, you'll get a tensor of 10 x 5. Then you realize that you have to convert that to one-hot vectors for your neural network to process.

You add another dimension to your tensor with a one-hot encoded vector of size 100 (because you have 100 words in your vocabulary). Now you have a tensor object of size 10 x 5 x 100 and you are passing one word at a time from each batch and each sentence.

Now you must split and slice your sentence and most probably, you will end up having tensors of size 10 x 1 x 100 (one word from each batch of 10 with a 100-dimension vector). You can process it with a 10 x 100-dimension tensor, which makes your life much easier. Go ahead with squeeze to get a 10 x 100 tensor from a 10 x 1 x 100 tensor.

PyTorch has the anti-squeeze operation, called unsqueeze, which adds another fake dimension to your tensor object. Don't confuse unsqueeze with stack, which also adds another dimension. unsqueeze adds a fake dimension and it doesn't require another tensor to do so, but stack is adding another tensor of the same shape to another dimension of your reference tensor.
T2.JPG
Figure 1.18: Pictorial representation of concatenation, stack, squeeze, and unsqueeze

PyTorch comes with tons of other important operations, which you will definitely find useful as you start building the network.

The internals of PyTorch
One of the core philosophies of PyTorch, which came about with the evolution of PyTorch itself, is interoperability. The development team invested a lot of time into enabling interoperability between different frameworks, such as ONNX, DLPack, and so on. Examples of these will be shown in later chapters, but here we will discuss how the internals of PyTorch are designed to accommodate this requirement without compromising on speed.

A normal Python data structure is a single-layered memory object that can save data and metadata. But PyTorch data structures are designed in layers, which makes the framework not only interoperable but also memory efficient. The computationally intensive portion of the PyTorch core has been migrated to the C/C++ backend through the ATen and Caffe2 libraries, instead of keeping this in Python itself, in favor of speed improvement.

Even though PyTorch has been created as a research framework, it has been converted to a research-oriented but production-ready framework. The trade-offs that came along with multi-use case requirements have been handled by introducing two execution types.

The custom data structure designed in the C/C++ backend has been divided into different layers. For simplicity, we'll be omitting CUDA data structures and focusing on simple CPU data structures. The main user-facing data structure in PyTorch is a THTensor object, which holds the information about dimension, offset, stride, and so on. However, another main piece of information THTensor stores is the pointer towards the THStorage object, which is an internal layer of the tensor object kept for storage.

x = torch.rand(2,3,4)
x_with_2n3_dimension = x[1, :, :]
scalar_x = x[1,1,1]     # first value from each dimension

# numpy like slicing
x = torch.rand(2,3)
print(x[:, 1:])        # skipping first column
print(x[:-1, :])       # skipping last row

# transpose
x = torch.rand(2,3)
print(x.t())           # size 3x2

# concatenation and stacking
x = torch.rand(2,3)
concat = torch.cat((x,x))
print(concat)         # Concatenates 2 tensors on zeroth dimension

x = torch.rand(2,3)
concat = torch.cat((x,x), dim=1)
print(concat)         # Concatenates 2 tensors on first dimension

x = torch.rand(2,3)
stacked = torch.stack((x,x), dim=0)
print(stacked)        # returns 2x2x3 tensor

# split: you can use chunk as well
x = torch.rand(2,3)
splitted = x.split(split_size=2, dim=0)
print(splitted)       # 2 tensors of 2x2 and 1x2 size

#sqeeze and unsqueeze
x = torch.rand(3,2,1) # a tensor of size 3x2x1
squeezed = x.squeeze()
print(squeezed)       # remove the 1 sized dimension

x = torch.rand(3)
with_fake_dimension = x.unsqueeze(0)
print(with_fake_dimension)        # added a fake zeroth dimension

Fig1.19.JPG
Figure 1.19: THTensor to THStorage to raw data

As you may have assumed, the THStorage layer is not a smart data structure and it doesn't really know the metadata of our tensor. The THStorage layer is responsible for keeping the pointer towards the raw data and the allocator. The allocator is another topic entirely, and there are different allocators for CPU, GPU, shared memory, and so on. The pointer from THStorage that points to the raw data is the key to interoperability. The raw data is where the actual data is stored but without any structure. This three-layered representation of each tensor object makes the implementation of PyTorch memory-efficient. Following are some examples.

Variable x is created as a tensor of size 2 x 2 filled with 1s. Then we create another variable, xv, which is another view of the same tensor, x. We flatten the 2 x 2 tensor to a single dimension tensor of size 4. We also make a NumPy array by calling the .NumPy() method and storing that in the variable xn:

>>> import torch
>>> import numpy as np >>> x = torch.ones(2,2)
>>> xv = x.view(-1)
>>> xn = x.numpy()
>>> x
tensor([[1., 1.],[1., 1.]])
>>> xv
tensor([1., 1., 1., 1.])
>>> xn
array([[1. 1.],[1. 1.]], dtype=float32)

PyTorch provides several APIs to check internal information and storage() is one among them. The storage() method returns the storage object (THStorage), which is the second layer in the PyTorch data structure depicted previously. The storage object of both x and xv is shown as follows. Even though the view (dimension) of both tensors is different, the storage shows the same dimension, which proves that THTensor stores the information about dimensions, but the storage layer is a dump layer that just points the user to the raw data object. To confirm this, we use another API available in the THStorage object, which is data_ptr. This points us to the raw data object. Equating data_ptr of both x and xv proves that both are the same:

>>> x.storage()
1.0
1.0
1.0
1.0
[torch.FloatStorage of size 4]
>>> xv.storage()
1.0
1.0
1.0
1.0
[torch.FloatStorage of size 4]
>>> x.storage().data_ptr() == xv.storage().data_ptr()
True

Next, we change the first value in the tensor, which is at the indices 0, 0 to 20. Variables x and xv have a different THTensor layer, since the dimension has been changed but the actual raw data is the same for both of them, which makes it really easy and memory-efficient to create n number of views of the same tensor for different purposes.

Even the NumPy array, xn, shares the same raw data object with other variables, and hence the change of value in one tensor reflects a change of the same value in all other tensors that point to the same raw data object. DLPack is an extension of this idea, which makes communication between different frameworks easy in the same program.

>>> x[0,0]=20
>>> x
tensor([[20.,  1.],[ 1.,  1.]])
>>> xv
tensor([20.,  1.,  1.,  1.])
>>> xn
array([[20.,  1.],[ 1.,  1.]], dtype=float32)

In this post, we covered the internals of the most important thing in PyTorch: The Torch tensor. We also discussed the basic operations in PyTorch. If you found this post useful, check out PyTorch Deep Learning Hands-On by Packt Publishing for more information on implementing key deep learning methods in PyTorch.
Start writing here...

Discover and read more posts from PACKT
get started