In this short practical, we will go over the internals of the most common layers of a convolutional neural network. We will start by looking at the spatial convolution layer, where we find the padding, stride, channel and kernel-size parameters. Moreover, we will get acquainted with the 4D kernels, which are called weight, and the 3D feature maps, which are called output. We will do the same for nn.ReLU, the rectified linear unit non-linearity, and see what its 3D feature maps look like. Finally, we will look at nn.SpatialMaxPooling, where we again find the padding, stride, channel and kernel-size parameters, and then we will visualise its 3D feature maps.

Let's now see, in greater detail, the last neural network architecture we presented in the previous lecture. We can start Qlua and require the nn package. Then I can set torch.manualSeed(0), so that the values will be random but identical across every trial. Then I can require the three-convolution-and-pooling architecture we defined in the previous lecture. And finally, I require the image package, which will allow us to see some internals of these networks. Our input x will be the classical Lena, which we will rescale to 256 × 256. So we can print x, and we have three channels, RGB, 256 pixels in height and 256 pixels in width. We can visualise this image with image.display, with image = x and legend = 'my input x to the network'. We can have a look, and here she is. Now we can ask to print the size of, for example, net:forward(x); we fixed 1,000 classes. I will display the network in half of the screen, so that we can keep an eye on the layers. So I can require nn, and assign to net the conv/pool script we wrote. If I print just net, it's going to give us the whole network definition. And if I'd like to get the first module, I can do net:get(1), and here we have the spatial convolution.
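If you don't have Lua Torch at hand, here is a minimal PyTorch sketch of what this three-convolution, two-pooling architecture would look like, reconstructed from the sizes quoted in this practical; the padding of 2 on the convolutions is inferred from the fact that a 5 × 5 kernel preserves (or, with stride 2, exactly halves) the spatial size, and the variable names are mine, not from the lecture.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # reproducible random weights, as in the lecture

# Reconstruction of the conv/pool network described in this practical
# (padding = 2 is my assumption, inferred from the quoted output sizes).
net = nn.Sequential(
    nn.Conv2d(3, 6, kernel_size=5, stride=2, padding=2),  # (1) 3 -> 6 planes, 256 -> 128
    nn.ReLU(),                                            # (2) y1+
    nn.Conv2d(6, 6, kernel_size=5, stride=1, padding=2),  # (3) 128 -> 128
    nn.ReLU(),                                            # (4) y2+
    nn.MaxPool2d(2),                                      # (5) 128 -> 64
    nn.Conv2d(6, 6, kernel_size=5, stride=1, padding=2),  # (6) 64 -> 64
    nn.ReLU(),                                            # (7) y3+
    nn.MaxPool2d(2),                                      # (8) 64 -> 32
    nn.Flatten(),                                         # (9) 6 * 32 * 32 = 6144
    nn.Linear(6 * 32 * 32, 1000),                         # (10) 1,000 classes
)

x = torch.randn(1, 3, 256, 256)  # stand-in for the 3 x 256 x 256 Lena image
y = net(x)
print(y.shape)  # torch.Size([1, 1000])
```

Forwarding a random 3 × 256 × 256 input through this stand-in reproduces the shapes we will inspect layer by layer below.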
And if you'd like to see the internals, we can use the curly-brackets operator, and we see what's inside. But I will use this lower screen just for displaying the network architecture. So we can go here. For example, we can display the first module. What we see is that we have 3 input planes, because we get the RGB input, and 6 output planes, because that's the number of feature maps we decided the first convolution should produce. Then we have a stride of 2 in height and a stride of 2 in width, as we set. And if we check the kernel sizes, kH and kW, they are 5 and 5, also as we set. We see that the kernel is four-dimensional: 6 is m1, 3 is n1, the number of input planes, and 5 and 5 are p1 and p2 respectively. Moreover, gradWeight will have the same dimensionality, of course. We can also check that the output is a 3D tensor: y is a 3D tensor of 6 maps (m1), of height 128 (m2) and width 128 (m3). Moreover, we can see the bias there, which is a 1D tensor of 6 elements, one for each of the 6 maps of the output y. gradBias will have the same dimensionality.

Let's display these kernels now. This network has not been trained, so the kernels will be completely random; we will see later what the kernels of a trained network look like. To visualise the kernels, we can simply do image.display, with image = net:get(1).weight. Then we can set legend = 'k1', the kernels of the first convolution. We also apply a zoom of 18, because otherwise they are 5 pixels by 5 pixels and we don't see anything. We also use some padding to distinguish one kernel from the other. Here we can see the 6 kernels, 5 pixels by 5 pixels, of the first convolutional layer. In order to display the other layers' outputs, I will write a helper function, which is going to shorten the typing. So I can write function show(l, t), for a layer l and a text t. We'd like to print the layer l, and then we are going to call image.display.
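The same dimensionality bookkeeping can be verified with the PyTorch stand-in for this first convolution (again my reconstruction, not the lecture's Lua code): the weight is a 4D tensor of shape m1 × n1 × p1 × p2, the bias a 1D tensor with one entry per output map, and the gradients mirror the parameter shapes.

```python
import torch
import torch.nn as nn

# Stand-in for the first spatial convolution: 3 -> 6 planes, 5 x 5 kernel,
# stride 2, assumed padding 2.
conv = nn.Conv2d(3, 6, kernel_size=5, stride=2, padding=2)

# 4D kernel: (output planes m1, input planes n1, kernel height p1, kernel width p2)
print(tuple(conv.weight.shape))  # (6, 3, 5, 5)
# 1D bias: one value per output feature map
print(tuple(conv.bias.shape))    # (6,)

# Forwarding a 256 x 256 RGB image (with a batch dimension added) yields
# 6 maps of 128 x 128, matching the output y inspected in the lecture.
y = conv(torch.randn(1, 3, 256, 256))
print(tuple(y.shape))            # (1, 6, 128, 128)

# After a backward pass, the gradients have the same shapes as the parameters,
# like gradWeight and gradBias in Torch's nn.
y.sum().backward()
print(conv.weight.grad.shape == conv.weight.shape)  # True
print(conv.bias.grad.shape == conv.bias.shape)      # True
```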
The image is going to be l.output, then we have legend = t, and then scaleEach = true, which basically sets a separate dynamic range for visualising each of the maps. And end. So now we can show net:get(1), and I will call the output of my first convolution y1. Let's see what it looks like. And here it is: this is the output of the first convolutional layer. Each of these maps has been produced by convolving one of those 5 × 5 kernels with the RGB input image. Let's get back here.

Let's now show number 2. Let's check what number 2 is: net:get(2) is the ReLU, so the text is going to be 'y1+', the rectified version of y1. We can press enter, and we can see that the dimensionality of the output is the same, 6 × 128 × 128. And if we check the result, we see that the maps are simply 0 wherever the values were negative. So we see a great deal of black, because all negative values are now 0. Before, black was the central value, since the values in the output of the first convolution range from negative to positive; now whatever was below 0 has been clipped to 0.

Let's keep going. Let's now show net:get(3). What is 3? We can check here: net:get(3) is the next convolution, so this one is going to be y2. We see that the input planes are 6, not 3 anymore, and the output planes are also 6. Therefore the weight, the kernels, has 6 as m1, 6 as n1, and 5 and 5 as the height and width. We can check the result now, and this is the output of the second convolutional layer. Let's go back here. After number 3 we are going to have number 4, which is a ReLU, so we simply add a plus here. And if we check, we get this: again, all the values below 0 have been zeroed. And the final point is going to be number 5, which, as you can see, is a pooling layer. So this is going to be 'pool of y2+'.
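The clipping that these ReLU layers perform is easy to verify on a toy tensor; this is just an illustrative snippet of mine, not from the lecture.

```python
import torch
import torch.nn as nn

relu = nn.ReLU()

# A tiny stand-in for a feature map, with values ranging from negative to positive.
y1 = torch.tensor([[-2.0, -0.5, 0.0],
                   [ 0.5,  1.5, 3.0]])
y1_plus = relu(y1)
print(y1_plus)
# tensor([[0.0000, 0.0000, 0.0000],
#         [0.5000, 1.5000, 3.0000]])
# Negative entries are clipped to 0, positive entries pass through unchanged,
# which is why the rectified maps show large black (zero) regions while the
# shape of the tensor stays exactly the same.
```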
We can see now that the strides dH and dW are 2, and the kernel of the pooling running window is also 2 × 2. Therefore the output still has 6 channels, the same number, but it is 64 × 64, whereas the output of the previous layer was 6 × 128 × 128. We can check the result, and here we have the pool of y2+.

Let's finish with the last three layers. We have the last convolution, layer 6, which is going to be our third convolution. It's basically identical to the second convolution: we have 6 input planes, 6 output planes, a stride of 1, 1, a kernel size of 5 and 5, and padding of 2 and 2, so we preserve dimensionality. The input was 6 × 64 × 64, and the output is also going to be 6 × 64 × 64. And if we check the result, here we have y3. We keep going. After the convolutional layer, number 6, we have number 7, which is our non-linearity, and we can check the result here: y3+. Again, the maps' values below 0 have been zeroed, and only the positive part has been sent forward. And finally, we have layer number 8, which is going to be the pool of y3+. Here we see that the output is going to be 6 × 32 × 32, whereas the input was 6 × 64 × 64. And if we check how it looks, here we have the tiny last maps. The next-to-last step simply reshapes those little pixels into one big, long vector. So we can print net:get(9), and we see that we have just one big vector of 6,144 elements. And the last one is the final linear layer, which has, of course, a matrix of 1,000 in height and 6,144 in width. Well, it would be 6,145 in width if we also folded the bias into the matrix. And that's pretty much it. In the next tutorial we will see more advanced architectures and the underlying principles on which they are based.
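The size bookkeeping of these last layers can be double-checked with the same PyTorch stand-ins (my reconstruction, under the same assumptions as before): a 2 × 2 pooling window with stride 2 halves each spatial dimension, the reshape produces the 6,144-element vector, and the linear layer holds a 1000 × 6144 matrix plus a bias of 1,000, which is where the "6,145 if we folded in the bias" remark comes from.

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(2)                 # 2 x 2 running window, stride 2
y3_plus = torch.randn(1, 6, 64, 64)    # stand-in for the output of the last ReLU
pooled = pool(y3_plus)
print(tuple(pooled.shape))             # (1, 6, 32, 32): spatial dims halved, channels kept

flat = nn.Flatten()(pooled)
print(tuple(flat.shape))               # (1, 6144): 6 * 32 * 32 = 6144, one long vector

linear = nn.Linear(6 * 32 * 32, 1000)  # final classifier over 1,000 classes
print(tuple(linear.weight.shape))      # (1000, 6144): the matrix described above
n_params = linear.weight.numel() + linear.bias.numel()
print(n_params)                        # 6145000 == 1000 * 6145: matrix plus bias column
```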