There's nothing that says that you have to pad. There's nothing that says that your output has to be the same size as your input. Another option is to start calculating convolutions only once your kernel fits entirely within your signal. Those convolutions are all valid. They never rely on any data that's outside the range of your signal. That's a nice, very safe way to make sure you get valid convolutions all throughout. You don't have to make any guesses about what data to invent. The only drawback is that your output is a little shorter than your input, so in effect it reduces the volume of data that you have. If you're looking at a time series signal, it'll chop off the beginning and end portions. If you're looking at an image, it'll chop off a border around the image. But you're guaranteed that what you're left with is all signal and no edge artifacts.

Another element that you can play with when doing convolution is the stride. By default, convolution has a stride of one, which means each time you shift your kernel over by just one position before calculating the next convolution value. This is safe, it's complete, it's dense. Sometimes this computation is a bottleneck in whatever it is you're trying to do, and you'd like to speed it up. Also, sometimes you've sampled your signal much more densely than you need to, and it's okay to just revisit it periodically rather than check it every single millisecond. In that case, you can skip. Instead of sliding your kernel over one position, you can slide it over two or three or four positions each time, or even more, and just calculate the convolution periodically. This skip is called a stride. If you're visiting every single position, that's a stride of one. If you're skipping a position each time, that's a stride of two, and so on.
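To make the ideas above concrete, here's a minimal NumPy sketch of a valid one-dimensional convolution with an optional stride. The function name and variables are just for illustration; the stride-1 result matches what `np.convolve(..., mode='valid')` would give you.

```python
import numpy as np

def conv1d_valid(signal, kernel, stride=1):
    """Valid 1-D convolution: the kernel only visits positions where it
    fully overlaps the signal, so no padding or invented data is needed.
    A stride greater than one skips positions, computing the result
    only periodically."""
    k = len(kernel)
    flipped = kernel[::-1]  # convolution flips the kernel
    positions = range(0, len(signal) - k + 1, stride)
    return np.array([np.dot(signal[i:i + k], flipped) for i in positions])

signal = np.arange(10.0)             # 10 samples
kernel = np.array([1.0, 0.0, -1.0])  # a simple difference kernel

dense = conv1d_valid(signal, kernel)             # stride 1: 8 outputs
sparse = conv1d_valid(signal, kernel, stride=2)  # stride 2: 4 outputs
print(len(signal), len(dense), len(sparse))      # 10 8 4
```

Notice both effects described above: the valid output is a little shorter than the input (8 values instead of 10, because the beginning and end are chopped off), and a stride of two halves it again.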
And the stride that you choose is based on what you know about your data, how densely you expect the information that you care about to be represented, and also how you plan to use the result. In the case of convolutional neural networks, often the input signal is much more dense than can be usefully handled by the network downstream, so some type of resolution reduction happens. Usually this happens using pooling, which we'll talk about a little bit later. But another way you can do this is by introducing a stride. In the two-dimensional case, a stride of two in both directions means you're only calculating a fourth of the total number of convolutions, so you get a reduction in resolution and a speedup in your computation time.
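Here's a small NumPy sketch, again with illustrative names, extending the same idea to two dimensions. With a stride of two in both directions, the output grid has half the rows and half the columns, so only a fourth as many convolution values are computed.

```python
import numpy as np

def conv2d_valid(image, kernel, stride=1):
    """Valid 2-D convolution with the same stride in both directions.
    Stride 2 visits every other row and column position, producing a
    fourth as many output values as stride 1."""
    kh, kw = kernel.shape
    flipped = kernel[::-1, ::-1]  # convolution flips the kernel in both axes
    rows = range(0, image.shape[0] - kh + 1, stride)
    cols = range(0, image.shape[1] - kw + 1, stride)
    return np.array([[np.sum(image[r:r + kh, c:c + kw] * flipped)
                      for c in cols] for r in rows])

image = np.random.rand(34, 34)
kernel = np.ones((3, 3)) / 9.0  # simple averaging (blur) kernel

dense = conv2d_valid(image, kernel)              # 32 x 32 values
strided = conv2d_valid(image, kernel, stride=2)  # 16 x 16: a fourth as many
print(dense.shape, strided.shape)
```

The strided result here is exactly the dense result sampled at every other row and column, which is why striding trades resolution for speed: you compute fewer values rather than computing them all and throwing some away.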