 So, welcome back to this series on deep neural networks for domain experts. And we finished the first section where we just looked at the multi-late perceptron or densely connected neural networks, and we're going to move on now to the very exciting world of convolutional neural networks. And when we talk convolutional neural networks, we're really thinking artificial intelligence, machine learning on images. It really is a type of network architecture that lends itself very well for learning to recognize images. So let's just have a quick introduction as to how we're going to go. The problem that we obviously have with densely connected neural networks, if we think of an image and we think of an image being represented as a matrix, let's go down here and just have a look at this representation. We're all familiar with pixels on a screen, those little dots of color, and if we just think of a black and white image, that just is a brightness value. Every pixel will just have a brightness value from zero being black to say 255 being white and anything like gray level in between. So just a five by five representation of a pitch black little image here will just be a five by five matrix of all zeros. But if you think about a five by five pixel little image, that's tiny little corner of your screen. If you were to ramp that up just to 100 by 100 pixels, that's still a very, very, very small little image. But 100 times 100 pixels, there's four zeros there, that's already 10,000 data points. So you're talking about an input vector with already 10,000 nodes just for a tiny little image. And if you do a densely connected neural network and every deep layer has 10,000 plus nodes in it, that's going to be computationally very, very expensive. So we're going to solve that problem with convolutional neural networks. And what we have to talk about then is just this convolution operation. So what you see is two, rank two tensors, two matrices, and they're both three by three, and I'm going to multiply them with each other in a very special way. It's called, I'm going to convolve. Well, it's not the full convolution operation. We'll get to that. But what we do is we're going to multiply each corresponding element in the two neighbors. So this one here, this one here, and this three here will be multiplied with each other. Then we move over to this two, and then this three. So it's one times three, and then the two times three. So we do all these nine products, and we sum up those nine products. So we one times three, plus two times three, plus three times two, plus four times one, and eventually we add all of those and we get to 61. Now when we look at an image, and here we have a larger image. We count one, two, three, four, five, six, six by six little image. And now I have my three by three matrix, and I start with it at the top left. And I'm going to do that type of multiplication that we just described. So I'm going to have a value for each of these little gray blocks that I do in here. They'll each have a value in them. And the picture at the back, the gray picture with a six by six pixels, that will have values and I'll multiply all of them and add them. And then I'll have a number. And this number is going to form for this first one, the top left pixel value of my new image. So I'm going to have what I'm going to call a resultant image that I'm trying to create. And the first top left pixel value is going to be this sum product over here. Then I'm going to move across one, still with the same nine values in my little matrix here. New values below it. I'm going to do that same process. It's going to give me a second value, a second scalar value. And that becomes the second pixel on the top of my resultant image. And so I move on and on and on across to the right-hand side, drop down one, move across to the right, drop down one, move across to the right till I get to the bottom. And that is the convolution operation. And it's going to build up for me a new resultant image. I'm calling it a resultant image. It's easy to understand what we mean by that. You can sort of form a visual idea of what is happening. Now, when we have this little 3 by 3 block, we call that those nine values. We call that a filter or a kernel. And that is nine values if we have a 3 by 3, which are actually weight values. And that is what our neural network is going to try and learn. It is going to try and optimize those values that have to go in there. And if it runs across an image like this once, it keeps those same values. Now, we just have to think about this resultant image. What is its size going to be? Well, if we start with an N by N, that's N for November, and we have a kernel of M by M, the resultant, if we just step across one, and we call that a stride, if we step across one at a time, our resultant image is going to be N minus N plus 1. So we'd had 6 minus 3 is 3 plus 1 is 4. So we're going to end up with a smaller resultant image, which is 4 by 4. Now, if we think about a color image, though, a color image has three channels, a red channel, a green channel, and a blue channel. And all we're going to do is we're going to have this time of just two by two. It was just easier to draw instead of a 3 by 3. Usually we'll use 3 by 3. It's the same one that runs across all three of my channels. And now I'm just going to have a times p as well. So if I have a 6 by 6 image in grayscale in color, that's going to be a 6 by 3. So there's going to be these pixel values for each of my three channels. And the same process is just going to happen. It's going to run across all of them. And here we have an example, and you can check it out, from my original pixel monochrome and its values. These have nothing to do with the black and the gray and the white that you see there. I just wanted to color them different just so that you can see. And try and do this for yourself and run across this whole thing. And you can see we went here from a 6 by 6 to a 4 by 4 resultant image. And at the moment it's not clear what the idea is, what is going to happen here. And I've actually got a video which I'll show the link in the description below. And there's also a link in this document, a video on YouTube where I use Microsoft Excel. And I show you what the result is. And what happens here? What does this little kernel or filter that moves across, what does it achieve? And what it achieves at least initially in a convolutional neural network, because we're going to have many of these convolutional layers, is it starts to learn edges. So it's going to see that there is, in your picture, there's an edge, a clear distinction between one brightness value and another. If you have an image of a dark cat against a white background, there's going to be an edge, and it's going to start detecting these edges. And as it goes deeper, these kernels are going to learn how these edges fit together, how they form shapes and more complex shapes. And eventually, we're going to end up with this product from which your neural network can learn, it can recognize this image. Next up is just this concept of padding. So what you can do if you have an image and you want the resultant image to be of the same size, you can pad your original image with the zeros all the way around. Now if you run it across, and it was originally six by six, you have drawn a five by five one. If you run it across, it's going to end up being exactly the same. And that's just going to be a hyperparameter that we set in our layer. What we want to do the padding, do we want the resultant image to be of the same size, or are we just going to allow it to get smaller as we move through? That is something we just choose. I've mentioned stride before. You can also set what the stride length is going to be here. And in this depiction here, we're jumping with our two by two kernel. We're jumping two pixels across, not just one. So we can set that stride as well. Now there's another concept called pooling, which we can also add to our layers. And we usually would do something like two by two pooling. So once we have our resultant image after convolution operation, we might decide that we're going to take a little two by two grid like this and take the maximum value in there. And that maximum value is 78, and that becomes the first pixel there. Then we're going to move one across, again the stride. And if we look at that, the maximum for the next four is also going to be that 78. So that's 78. And then we move it across one, and then the maximum is going to be 70. And that's 70. And that's called the max pooling. We're trying to almost get the maximum information from this image. Before, there was also, some people also used the average pooling. So it would be the average of those four values. But max pooling actually works best. Well, there's no evidence really to say that there's anything better than max pooling. Max pooling is going to give you good results. The last concept just is flattening. When you've run through a few of these layers, a few convolutional layers, a few max pooling layers, eventually you're going to do a flattening layer. And what the flattening layer is going to do is say, for instance, our resultant image is 9 by 9. It's just going to form a single vector of these. So it's just going to put them all in one. So we can have nine nodes, for instance. 78, 78, 70, 108, 108, 108, 108, 88 is one long normal vector, which we can then put through a normal densely connected neural network. And that usually happens at the end of a convolutional neural network that we put a few densely connected layers. And doing which we also have this learning process as we had before in the normal multi-layer perceptron or deep neural network. It has an added benefit that we can talk about afterwards. There are massive networks available that have already learned from images which you can just download and then add, if you want to put your own images through that, make this convolutional part such that those weights never update, but that eventually that feeds to a normal deep neural network from which learning will take place for the images that you feed in. And we'll talk all about that in later videos. So in short, that's a summary of a convolutional neural network. The best way to go about this is just to construct one. And in the next video, that's exactly what we're going to do, implementing some of these concepts that you now know of. You know they exist. You have an idea of how they work. We're going to implement them in the next video.