Namaste. In this course so far we have reviewed basic concepts in machine learning and deep learning. This module focuses on the mathematical foundations of deep learning and demonstrates these concepts through Python code. Let us begin.

The first neural network that we built, for MNIST handwritten digit recognition, started from data stored in multidimensional arrays called tensors. A tensor is a container for data, almost always numerical data. Tensors are a generalization of matrices to an arbitrary number of dimensions, which are also called axes; the number of axes is also known as the rank. A tensor is defined by three attributes: the number of axes (rank), the shape, and the data type. In NumPy, we can obtain the number of axes through the ndim attribute of a tensor, its shape through the shape attribute, and its data type through the dtype attribute.

Let us look at different types of tensors. For each tensor we will print the number of dimensions, the shape and the data type. We will import the NumPy library for the tensor operations.

Let us first look at the simplest tensor, the zero-dimensional tensor. This tensor contains only one number; it is also called a scalar or a scalar tensor. A scalar tensor can be defined by simply writing np.array and passing a single number. If we execute this code and look at the shape of this tensor, we can see that, since this is a scalar tensor, the shape is empty. As expected, the scalar tensor has zero dimensions, and the data type of this tensor is a 64-bit integer.

Let us look at the next type of tensor, the 1D tensor, also called a vector. We define a 1D tensor with the np.array function, passing a list of 4 elements as the argument. If we execute this and look at the shape of this tensor, we can see that the shape is (4,).
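The scalar and vector examples above can be sketched as follows; the particular numbers are illustrative choices, not values from the lecture notebook.

```python
import numpy as np

# A 0D (scalar) tensor containing a single number
x = np.array(12)
print(x.ndim)   # number of axes: 0
print(x.shape)  # empty tuple: ()
print(x.dtype)  # a 64-bit integer on most platforms

# A 1D tensor (vector) with 4 elements
v = np.array([12, 3, 6, 14])
print(v.ndim)   # 1
print(v.shape)  # (4,)
```

Note that ndim, shape and dtype are attributes, not function calls, so they are accessed without parentheses.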
So, this tensor has 4 elements in it; it is also called a 4-dimensional vector (not to be confused with a 4D tensor). Since it is a vector, it has one dimension.

Let us look at 2D tensors, which are also called matrices. We can define a matrix as a vector of vectors. Here we have a matrix with 3 rows and 3 columns. The entries from the first axis are called rows: 1, 2 and 3 constitute the first row of the matrix X. The entries from the second axis are called columns: 1, 4 and 7 form the first column of the matrix. The shape of the matrix is (3, 3), the number of dimensions is 2, and it holds 64-bit integers as elements of the tensor.

Let us look at 3D tensors. We can pack matrices in an array to get a 3D tensor. Similarly, we can pack 3D tensors in a new array to get a 4D tensor, and so on: in general, we can pack tensors of n − 1 dimensions in a new array to get a tensor of n dimensions. For the 3D tensor here, we have taken the matrix that we defined in the 2D example and copied it 3 times, packing the matrices into an array of matrices. Let us look at its attributes, first the shape: it is (3, 3, 3), because there are 3 matrices and each matrix has shape (3, 3). The number of dimensions is obviously 3, because it is a 3D tensor, and each element is a 64-bit integer.

Now that we have seen some examples of tensors, let us look at the tensors from the MNIST dataset for which we built the model. In order to load the MNIST dataset, let us first install TensorFlow 2.0 and then load the dataset.
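A sketch of the matrix and the packed 3D tensor described above; the 1–9 entries match the rows and columns named in the text, while the packing itself follows the "copy the matrix 3 times" construction.

```python
import numpy as np

# A 2D tensor (matrix) with 3 rows and 3 columns
X = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
print(X.shape)   # (3, 3)
print(X.ndim)    # 2
print(X[0])      # first row: [1 2 3]
print(X[:, 0])   # first column: [1 4 7]

# A 3D tensor: the same matrix packed 3 times into an array of matrices
T = np.array([X, X, X])
print(T.shape)   # (3, 3, 3)
print(T.ndim)    # 3
```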
As you may recall, the load_data function loads the MNIST dataset as two tuples, one corresponding to the training set and the other to the test set. X_train and y_train are the features and labels of the training set, whereas X_test and y_test are the features and labels of the test examples.

Let us look at the attributes of the training data tensor, X_train: the number of axes or dimensions, the shape, and the data type of the elements stored in it. The training tensor has three dimensions. Its shape is (60000, 28, 28). As you might recall, this is an array of 60000 matrices, each of size 28×28, and each element in the tensor holds an 8-bit integer between 0 and 255.

Let us look at the attributes of the training label tensor. The training label tensor is a vector, a 1D array of labels containing 60000 entries, and each element here is an 8-bit integer representing the class of the image. Here is a question for you: can you find the number of dimensions, shape and data type of X_test and y_test, the tensors corresponding to the test data?

Now that we have understood the basics of tensors, let us look at how to select a specific element in a tensor. The process of selecting specific elements in a tensor is called tensor slicing. Let us select a single data point, to be specific the first data point, from the X_train tensor. It can be selected simply by writing the name of the tensor with 0 as the index. This statement returns the first image stored in the X_train tensor. Let us see what it returns.
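In the course notebook the data comes from tf.keras.datasets.mnist.load_data(); the sketch below fabricates arrays with the same shapes and dtype so the attribute inspection can be reproduced without downloading the dataset.

```python
import numpy as np

# Stand-ins for the (X_train, y_train), (X_test, y_test) tuples returned by
# tf.keras.datasets.mnist.load_data(); same shapes and dtype, zero-filled.
X_train = np.zeros((60000, 28, 28), dtype=np.uint8)
y_train = np.zeros((60000,), dtype=np.uint8)

print(X_train.ndim)   # 3 — an array of 60000 matrices
print(X_train.shape)  # (60000, 28, 28)
print(X_train.dtype)  # uint8 — each pixel is an 8-bit integer in [0, 255]

print(y_train.ndim)   # 1 — a vector of labels
print(y_train.shape)  # (60000,)
```

With the real dataset, the same three attributes on X_test and y_test answer the question posed above.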
We know that this particular image is represented as a 28×28 matrix, each entry containing a value between 0 and 255. You can see that many elements are 0 and some elements have non-zero values up to 255. The plt.imshow function is used to display an image, so we will pass the tensor to the display_image helper that we have defined here, which converts the tensor into an image. Looking at the resulting image, you can see that the first example in the training tensor is an image of the handwritten digit 5.

Now that you have understood how to access an individual element of a tensor, here is a question for you: can you write code to look at the 11th data point in the training set and display the number with the help of the imshow command? We can access the 11th element by specifying index 10: because indexing starts at 0, i = 10 represents the 11th image in the training set, and it gives us both the tensor representation and the image representation.

Now that we have understood how to select a single data point, note that we are often required to select multiple data points from a tensor. Let us see how to do that. We use the colon operator to select an entire axis, or alternatively we can specify the start and end indices. If we write i:j, we slice the tensor from the ith position to the (j − 1)th position; note that the jth position is not included in the slice.

Let us see a concrete example of selecting multiple data points: let us select data points from 10 to 100 (note that 100 will not be included). We simply write 10:100 as the selection criterion and slice the tensor. Looking at the attributes of this tensor slice, you can see that we get a tensor with the same number of dimensions as the original tensor.
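The single-element selection can be sketched as follows; a small random array stands in for X_train, and the matplotlib call is shown as a comment since it is only meaningful with the real images.

```python
import numpy as np

# Stand-in for X_train: 20 random 28×28 "images" with uint8 pixel values
rng = np.random.default_rng(0)
X_train = rng.integers(0, 256, size=(20, 28, 28), dtype=np.uint8)

digit = X_train[0]      # index 0 selects the first image
print(digit.shape)      # (28, 28)

eleventh = X_train[10]  # index 10 selects the 11th image (0-based indexing)
print(eleventh.shape)   # (28, 28)

# With matplotlib available, the image could be displayed with:
# import matplotlib.pyplot as plt
# plt.imshow(digit, cmap=plt.cm.binary); plt.show()
```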
X_train was a three-dimensional tensor, and the slice is also a three-dimensional tensor. The shape of the slice is different from the original tensor: since we are selecting only 90 elements, we have 90 samples, each a 28×28 matrix, and each element in the tensor is an 8-bit integer.

There are a few more equivalent ways of doing the same thing. Since we have a 3D tensor, we can also specify the remaining two axes explicitly using the colon operator; it results in the same selection. You can verify that the tensor slice obtained by the earlier method and by this method are exactly the same.

Let us look at one more alternative. Instead of selecting these axes with the colon operator alone, we specify the start and end points. We know that both these axes have 28 elements each, so we ask to slice each axis completely by giving a start and an end matching the length of the axis. Once we run the code, you will see that it also results in the same thing: a 3D tensor with 90 examples, each a 28×28 matrix, with 8-bit integer elements. So this gives you multiple ways of achieving the same result.

Now here is a question for you: can you write the code to select the bottom-right 14×14 patch from each training image? Here is the answer. We can get the bottom-right part by specifying the slice as follows: we select the first axis as it is, but since we want only the 14×14 patch from the bottom right, we specify 14: for both the remaining axes. Everything from the 15th element (index 14) up to the last will be selected along both axes. Let us look at the attributes of the resulting tensor slice.
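The equivalent slicing forms and the bottom-right patch can be sketched as follows, again with a zero-filled stand-in for the MNIST training tensor.

```python
import numpy as np

X_train = np.zeros((60000, 28, 28), dtype=np.uint8)  # stand-in for MNIST

# Three equivalent ways to select samples 10..99 (100 is excluded)
a = X_train[10:100]
b = X_train[10:100, :, :]      # remaining axes selected with the colon
c = X_train[10:100, 0:28, 0:28]  # explicit start and end for each axis
print(a.shape)   # (90, 28, 28)

# Bottom-right 14×14 patch of every image: index 14 to the end on both axes
br = X_train[:, 14:, 14:]
print(br.shape)  # (60000, 14, 14)
```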
So, you can see that we have a 3D tensor of 60,000 images, each a 14×14 patch from the bottom right, and each element is an 8-bit integer.

Can you write the code to crop the images to 14×14 patches centered in the middle? Since we want the patch in the middle, we go from the 7th element up to all but the last 7 elements along both axes. This gives us a 14×14 patch, centered in the middle, for every image.

Since we consume data in batches during training, let us understand what batch tensors look like. We usually break the data into small batches and process those batches. When we have the complete data, the first axis of the data tensor is the sample axis or sample dimension. For a batch tensor, on the other hand, the first axis is called the batch axis or batch dimension. Let us look at the first batch of 128 examples. Here we have a batch of size 128, so we select the first 128 examples with one slicing criterion, and then the next 128 examples with another slicing criterion where we specify the start and end positions. Looking at the attributes of these data slices, you can see that both batches are 3D tensors with exactly the same shape: each batch contains 128 examples, each example is represented as a 28×28 matrix, and the data type of the tensor is an 8-bit integer. Here is a question for you: can you write the code to get the nth batch? You can expand the cell and compare with the given solution once you come up with your own.

Let us look at some real-life examples of data tensors. In real life, while building machine learning models, we often come across tensors of dimensions between 2 and 5. 2D tensors are the most commonly appearing tensors in machine learning; they have shape (samples, features).
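The center crop, the batch slicing, and one possible answer to the nth-batch question can be sketched as follows; the nth_batch helper and its 0-based numbering are an assumption, not the notebook's hidden solution.

```python
import numpy as np

X_train = np.zeros((60000, 28, 28), dtype=np.uint8)  # stand-in for MNIST

# 14×14 patch centered in the middle: 7th element to all but the last 7
center = X_train[:, 7:-7, 7:-7]
print(center.shape)  # (60000, 14, 14)

# First two batches of 128 examples
batch_size = 128
first_batch = X_train[:batch_size]
second_batch = X_train[batch_size:2 * batch_size]
print(first_batch.shape, second_batch.shape)  # both (128, 28, 28)

def nth_batch(data, n, batch_size=128):
    """Return the nth batch, counting from n = 0 for the first batch."""
    return data[n * batch_size:(n + 1) * batch_size]

print(nth_batch(X_train, 3).shape)  # (128, 28, 28)
```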
So each sample is essentially represented by a set of features. Let us take a couple of examples. One is representing text documents: say we have a set of k documents, each represented with m features. This set of text documents can be represented by a (k, m) tensor, because there are k samples in the dataset and each sample is represented with m features. Another example could be a fuel efficiency dataset that contains a set of automobiles and their features. The defining property of a 2D tensor is that there are samples and each sample has a list of features.

Then we have 3D tensors, which we often encounter in time-series or sequence datasets. Let us take the example of a time-series dataset where we want to store stock prices. A 3D tensor has 3 axes: the first is the sample axis, the second is the time-step axis, and the third is the feature axis. In time-series data we have each sample across multiple time steps, and for each time step we have a list of features. Say in the stock price dataset we store the features associated with a stock every minute; the features could be the current price and the highest and lowest price in the past minute. Every minute is encoded as an m-dimensional vector, where m is the number of features. An entire day of trading is encoded as a 2D tensor of shape (390, m), since there are 390 minutes in a trading day. Data for 250 days can then be stored in a 3D tensor of shape (250, 390, m): 250 days, with 390 data points per day, each containing m features.

Let us see how we can use a 3D tensor to store a dataset of tweets. Here we encode each tweet as a sequence of 280 characters. At each position, one of 128 unique characters is possible, so each character can be one-hot encoded as a vector of length 128.
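The shapes discussed above can be sketched with empty arrays; k = 500 documents, m = 20 document features and m = 3 stock features are illustrative sizes chosen here, not values from the lecture.

```python
import numpy as np

# 2D tensor: k = 500 documents, each with m = 20 features
docs = np.zeros((500, 20))
print(docs.shape)    # (500, 20) — (samples, features)

# 3D time-series tensor: 250 trading days × 390 minutes × m = 3 features
stocks = np.zeros((250, 390, 3))
print(stocks.shape)  # (250, 390, 3) — (samples, time steps, features)

# One tweet: 280 characters, each one-hot encoded over 128 possible characters
tweet = np.zeros((280, 128))
print(tweet.shape)   # (280, 128)
```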
Each tweet can then be encoded as a 2D tensor of shape (280, 128), and a dataset of 1 million tweets can be stored in a 3D tensor of shape (1000000, 280, 128). Each of the 1 million tweets has 280 characters, and each character is represented as a 128-dimensional feature vector; since the 128 entries cover all the possible characters, we use one-hot encoding to represent every position in the tweet.

Let us look at how we store image and video data. Images are stored as 4D tensors: the first axis is the sample axis, dedicated to each image, and for every sample of a color image we have channels, height and width. There is a channel each for red, green and blue in a color image, while a grayscale image has a single channel. For every channel, the height and width define the resolution of the image, and every cell stores the illumination value of a pixel.

Video data is an extension of image data: in addition to the image information, which is channels, height and width, we store the information at the frame level. Each video has multiple frames, and in each frame we store the image information; that is, each frame in the video is encoded as a 3D tensor. Now, there is a question for you: can you deduce the shape of a tensor to store 4 videos, where each video is a 60-second clip of size 128 by 256, sampled at 4 frames a second?
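One way to work out the shapes is sketched below. The channels-last layout and the assumption of 3 color channels are choices made here for illustration; with a grayscale clip the last axis would be 1, and a channels-first layout would reorder the axes.

```python
import numpy as np

# Image batch, channels-last: (samples, height, width, channels)
images = np.zeros((64, 128, 256, 3), dtype=np.uint8)
print(images.ndim)   # 4 axes for a batch of color images

# Video batch: one extra axis for frames.
# 4 videos, 60 s at 4 frames/s = 240 frames, each frame 128×256 pixels,
# 3 color channels (assuming RGB).
videos = np.zeros((4, 240, 128, 256, 3), dtype=np.uint8)
print(videos.ndim)   # 5
print(videos.shape)  # (4, 240, 128, 256, 3)
```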