So let's quickly calculate that. A one by one image with padding of two becomes a five by five image. If we apply a five by five filter to it, that gives us exactly one output. So what do we see here? Padding adds two times the padding to each dimension. And the convolution itself costs us the filter size minus one in each dimension, simply because we can't place the filter at every position. So what that means is: instead of a one by one image, take an image that is 299 bigger along one axis and 399 bigger along the other, so 300 by 400. With padding of two, we effectively have a padded image of 304 by 404. Once we apply the five by five filter, we are back at 300 by 400. What you see here is that if we have an n by n convolution kernel, with n odd, and we use padding of n minus one over two, the image size stays the same.

Now what if we apply a max pool? We have 300 by 400, both divisible by two, we have stride two, and we have no padding here. Both dimensions simply go down by a factor of two, and we now have a 150 by 200 image.

Now before training, let us really make sure that we understand the data set. You will be building a network yourself very soon, and it is therefore essential that you really understand the properties of the data set. So now visualize the data set, understand its properties, and ask yourself how hard it will be to classify these images. So go get a feeling for the data set.
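
To check the shape arithmetic from the convolution and max pool walkthrough, here is a minimal sketch in plain Python. The function names `conv_output_size` and `pool_output_size` are hypothetical helpers, not part of any library. The sketch assumes stride one for the convolution and a pooling window of two, which the walkthrough implies but does not state.

```python
def conv_output_size(size, kernel, padding=0, stride=1):
    """Output length along one axis: pad both sides, then slide the kernel."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_output_size(size, window, stride):
    """Output length along one axis for pooling without padding."""
    return (size - window) // stride + 1


# A 1x1 image with padding 2 and a 5x5 filter gives exactly one output.
conv_1x1 = conv_output_size(1, kernel=5, padding=2)

# A 300x400 image with padding 2 and a 5x5 filter keeps its size,
# because the padding equals (kernel - 1) / 2.
conv_shape = tuple(conv_output_size(s, kernel=5, padding=2) for s in (300, 400))

# Max pool with stride 2 (window of 2 is an assumption) halves both axes.
pool_shape = tuple(pool_output_size(s, window=2, stride=2) for s in conv_shape)

print(conv_1x1)    # 1
print(conv_shape)  # (300, 400)
print(pool_shape)  # (150, 200)
```

With a pooling window larger than two, the same stride would give a slightly smaller result, so the factor of two holds only under the assumption above.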