So, in practice, we of course don't just have a single filter that we apply at all locations in an image, but multiple of them. Here we have a diagonal filter that we can apply at all places, and we find that there are a lot of diagonal-like responses along the diagonal. We can also see something like a checkerboard pattern right in the middle, and again a lot of those diagonal elements along the diagonal. A convolutional layer basically says: we apply all of these filters at the same time. The filters will of course be put together into a tensor, and the output will also be a tensor. And incidentally, if the image is a color image, it is already a tensor itself, with RGB color channels. I should also mention that there's the option to add a ReLU right after the convolution: we do the convolutions at all places and then put the result through a ReLU. There are other places where we could put a ReLU in such a ConvNet.

Now there's something missing so far. This tells us: look, there are places that are a really good fit for that feature, and other places that are a really bad fit for the feature. But we're still in this high-dimensional space, and we don't have any invariance yet. What we have is called equivariance: if we take the same stimulus and move it a little bit, we get exactly the same output, just slightly shifted. If we move the input by one pixel, it will be moved by one pixel in this representation. But we want some invariance, because, as in the example at the beginning, inputs that are just shifted or rearranged a little bit should produce very similar activities.

So how could we get there? For that, there's the idea of max pooling. How does max pooling work? We take the first two-by-two block of convolution results and take the maximum of it. The best value here is one, and we write that into the output of the max pooling. Then we do the same for the next square, where there's definitely nothing like a diagonal of that kind, and so on and so forth. Again there are potential worries about padding, but we assume here that we have zero padding on the outside. And then we can do that kind of max pooling for all possible locations.

Now, how many extra parameters did we just use? Zero. Because all we did was take the max within each area, and that's not a free parameter. So this costs zero free parameters, which is very nice, because it gives us a property we're looking for, namely something like rough translation invariance, and it doesn't cost us any more parameters. It does cost us something, though, because we now have a four-by-four instead of a nine-by-nine representation, so it has gotten smaller. But in the end that's what we want: we start with big images and we want to come out with "is this an X or is this an O? Is this a cat or is this a dog?" At the same time, if I take a network and force it to have very few channels, I force it to lose some important information. In any case, we will have a bunch of filters, like the three filters we introduced. We can do the max pooling for all of them, since this is all one big tensor: it's an operation that just operates on the tensor.
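To make the "apply all filters at once" step concrete, here is a minimal NumPy sketch. The 9x9 X-shaped image, the three 3x3 filters, and the mean-normalized match score are assumptions for illustration, in the spirit of the example above rather than copied from it: each filter is slid over every valid position, the per-filter maps are stacked into one output tensor, and an optional ReLU is applied right after.

```python
import numpy as np

def conv2d_valid(image, filt):
    """Slide one filter over all valid positions ('valid' cross-correlation, no padding)."""
    H, W = image.shape
    k = filt.shape[0]
    out = np.zeros((H - k + 1, W - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Mean of the elementwise products, so a perfect match scores 1.0.
            out[i, j] = np.mean(image[i:i + k, j:j + k] * filt)
    return out

# A toy 9x9 "X" image with values in {-1, +1} (illustration data, not the lecture's exact image).
img = -np.ones((9, 9))
for i in range(9):
    img[i, i] = 1.0       # main diagonal stroke
    img[i, 8 - i] = 1.0   # anti-diagonal stroke

# Three 3x3 filters: a diagonal, an anti-diagonal, and a small crossing pattern.
filters = np.stack([
    np.array([[ 1, -1, -1], [-1,  1, -1], [-1, -1,  1]], dtype=float),
    np.array([[-1, -1,  1], [-1,  1, -1], [ 1, -1, -1]], dtype=float),
    np.array([[ 1, -1,  1], [-1,  1, -1], [ 1, -1,  1]], dtype=float),
])

# The convolutional layer applies all filters at once; stacking the per-filter maps
# gives one output tensor of shape (num_filters, 7, 7).
feature_maps = np.stack([conv2d_valid(img, f) for f in filters])

# Optional ReLU right after the convolution: negative (bad-fit) responses are cut to zero.
relu_maps = np.maximum(feature_maps, 0.0)
print(feature_maps.shape, relu_maps.shape)  # (3, 7, 7) (3, 7, 7)
```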
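And to make the pooling step itself concrete, here is a minimal sketch, assuming two-by-two windows with stride two and zero padding where the map doesn't divide evenly (the 7x7 maps of filter responses below are just random illustration data). Note that nothing in it is learned, so it really does add zero free parameters.

```python
import numpy as np

def max_pool2x2(feature_map):
    """2x2 max pooling with stride 2: take the max in each 2x2 block (no learned parameters)."""
    H, W = feature_map.shape
    # Zero-pad the ragged edge if the size is odd (one simple padding choice).
    if H % 2 or W % 2:
        padded = np.zeros((H + H % 2, W + W % 2))
        padded[:H, :W] = feature_map
        feature_map, (H, W) = padded, padded.shape
    out = np.zeros((H // 2, W // 2))
    for i in range(0, H, 2):
        for j in range(0, W, 2):
            out[i // 2, j // 2] = feature_map[i:i + 2, j:j + 2].max()
    return out

# A hypothetical 7x7 map of filter responses: padding it to 8x8 and pooling gives a 4x4 map.
responses = np.random.rand(7, 7)
print(max_pool2x2(responses).shape)  # (4, 4)

# For a whole stack of feature maps (one per filter), pooling is simply applied per channel.
stack = np.random.rand(3, 7, 7)
pooled_stack = np.stack([max_pool2x2(m) for m in stack])
print(pooled_stack.shape)  # (3, 4, 4)
```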
And here we have all that information. Keep in mind that this now has a certain amount of invariance: we can locally move a feature without changing anything. And of course we'll have multiple channels and multiple ReLUs. Now I want you to implement max pooling, and really make sure that you understand it, because you'll be using a lot of max pooling in your deep learning life.
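If you do implement it, one quick way to check the rough invariance mentioned above is a small test like the following (the single-pixel "feature" on an 8x8 map is made-up illustration data): a one-pixel shift that stays inside a pooling window leaves the pooled output unchanged, while a shift across a window boundary does change it, which is exactly why the invariance is only rough.

```python
import numpy as np

def max_pool2x2(x):
    """2x2 max pooling with stride 2 via a reshape trick (assumes even height and width)."""
    H, W = x.shape
    return x.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

# A single "feature" (a bright pixel) on an 8x8 map, and the same feature shifted by one pixel.
# The shift stays inside the same 2x2 pooling window.
a = np.zeros((8, 8)); a[2, 2] = 1.0
b = np.zeros((8, 8)); b[3, 2] = 1.0   # moved down by one pixel

print(np.array_equal(max_pool2x2(a), max_pool2x2(b)))  # True: pooled output unchanged

# The invariance is only rough: a shift that crosses a pooling boundary does change the output.
c = np.zeros((8, 8)); c[4, 2] = 1.0   # moved down by one more pixel
print(np.array_equal(max_pool2x2(a), max_pool2x2(c)))  # False
```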