The power of convolutional neural networks comes from the convolution operator, which is a way to pull features out of a signal. If it's a two-dimensional signal, an image, those features can be little pieces of the thing you want to find. If you're looking for a cat, a feature might be an eye or an ear or a tail. What makes convolutional neural networks so robust is that they don't care exactly where these features are placed in relation to each other. The features could be rearranged a little, like a Picasso, and the network would still be able to identify them.

There are two parts to this. One is the convolution operator, which matches the feature, and the other is a pooling operator, which gives a little bit of wiggle room in the exact position of that feature. Here we're going to focus on the convolution piece, the pulling of these features out of the signal.

We can also do convolution on one-dimensional signals: audio, stock prices, anything that can be ordered along a single line. Any data that's organized by time tends to fit on a nice one-dimensional line. Convolution can also be applied to three-dimensional signals, say video, where each pixel has an x and y position and a position in time. Because the page is two-dimensional, starting with a low-dimensional signal to illustrate the process will be helpful.

Convolution is the process of taking a kernel and doing a sliding dot product with a signal. The signal is the thing you're trying to classify or pull the feature out of. It might be an image, a snippet of audio, or a portion of an electrocardiogram. We can visualize it as a lollipop plot, where each value is represented sequentially along a line and the height of the lollipop stem is the value of that point. Points falling right on the axis have a value of zero. Points above the axis have a positive value, and points below the axis have a negative value.
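The pooling operator's wiggle room can be sketched in a few lines of Python. This is a minimal illustration, not anything from the original; the function name max_pool_1d and the example values are invented for the demo. The point is that two signals with the same features in slightly different positions pool down to the same summary.

```python
def max_pool_1d(values, window):
    """Downsample a 1D signal by taking the max over non-overlapping windows."""
    return [max(values[i:i + window]) for i in range(0, len(values), window)]

# The same two features (a 9 and a 5), shifted by one position.
a = [0, 9, 0, 0, 5, 0, 0, 0]
b = [9, 0, 0, 0, 0, 5, 0, 0]

print(max_pool_1d(a, 2))  # [9, 0, 5, 0]
print(max_pool_1d(b, 2))  # [9, 0, 5, 0] -- identical despite the shift
```

After pooling, the two signals look the same, which is exactly the positional tolerance described above.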
And the length of the lollipop stem shows the magnitude of the value.

To do convolution, we have a signal and a kernel. By convention, the kernel is much smaller than the signal. It's the feature that you'd like to pull out of the signal wherever it occurs, the fingerprint of the feature we're trying to find.

The first step of convolution is to take the kernel and flip it around left to right. Then you take your flipped kernel and slide it along the signal. At each location, you multiply together the values of every pair of points that line up, and then you add all of those products together to get the final result. So for instance, at one location the kernel might line up with three values of the signal that are all equal to zero, sitting exactly on the line. Even though the kernel has positive values at all those points, they all get multiplied by zero and added together, giving a result of zero. So the convolution at that point will be zero. Then you slide the kernel one step to the right and do it again.

This process of multiplying each aligned pair of points together and then adding all of those products together is called taking the dot product. Moving our kernel one position at a time and repeating this is a sliding dot product. So a convolution is a sliding dot product of a flipped kernel with the signal. We can see how this plays out as we go point by point: only where both the signal and the kernel are nonzero is the result of the convolution nonzero.
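The flip-then-slide recipe above can be written directly in Python. This is a minimal sketch under my own assumptions (the function name convolve_1d and the zero-padding at both ends are choices I made so the kernel can overhang the edges); library routines such as numpy.convolve compute the same "full" convolution.

```python
def convolve_1d(signal, kernel):
    """Full 1D convolution: flip the kernel, then take a sliding dot product."""
    flipped = kernel[::-1]                       # step 1: flip left to right
    n, k = len(signal), len(flipped)
    # Zero-pad so the kernel can slide past both ends of the signal.
    padded = [0] * (k - 1) + signal + [0] * (k - 1)
    result = []
    for start in range(n + k - 1):               # step 2: slide one position at a time
        window = padded[start:start + k]
        # Step 3: dot product -- multiply aligned pairs, sum the products.
        result.append(sum(w * f for w, f in zip(window, flipped)))
    return result

signal = [0, 0, 1, 2, 1, 0, 0]   # a little triangular bump
kernel = [1, 2, 1]               # the same shape as the bump
print(convolve_1d(signal, kernel))  # [0, 0, 1, 4, 6, 4, 1, 0, 0]
```

Notice the output peaks (at 6) exactly where the kernel lines up with the matching bump in the signal, and it is zero wherever the kernel overlaps only zeros, just as described above.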