You can see here that our signal is sparse: most of its elements are zero, with only a few non-zero values. Each of those non-zero elements contributes a copy of the original kernel to the result. Depending on the sign and magnitude of the non-zero element, the copy is scaled and possibly flipped, but each point simply takes a copy of the kernel, scales it accordingly, and adds it into the convolution. This becomes harder to see when the signal is dense, with non-zero values throughout, but you can still imagine each element of the signal taking a copy of the kernel and adding it into the result.

We can see here that even with a different kernel, we get the same behavior. By taking the kernel, flipping it, and doing the sliding dot product with the signal, a sparse signal produces scaled copies of that kernel. The first copy looks just like the original kernel, but smaller in magnitude. The second copy looks just like the original kernel, and is similar in magnitude. The third copy looks just like the original kernel, but flipped in sign.

We can express this same thing in math. Let's call our signal x, our kernel w, and our convolution result y. In this case, we will make sure that our result y is the same length as our signal x by trimming off the ends; we'll come back to this and relax this assumption later. Our kernel w has n elements, and our input x and output y each have m elements. And this is what it would look like in a neural network: your input is your signal x, your result is your output y, and your kernel w holds the internal weights of the layer, which you learn during the training phase.

We'll change the notation up a little bit. Instead of indexing the values of our kernel from 0 to n minus 1, it is a little easier to index them from minus p to p. This assumes that our kernel has an odd number of elements, so n = 2p + 1, which is convenient when we slide it along the signal: each value of the result lines up with a value of the signal, rather than getting sandwiched halfway in between. So an odd-length kernel is helpful, and indexing it from minus p to p makes some of the notation more convenient later.
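Written out with that indexing, the trimmed, same-length convolution described above takes a standard form. This is a sketch of the equation under the assumptions stated in the transcript (odd kernel, result trimmed to the length of the signal, with signal values outside the range 0 to m minus 1 treated as zero):

```latex
y_i = \sum_{j=-p}^{p} w_j \, x_{i-j}, \qquad i = 0, 1, \ldots, m-1, \qquad n = 2p + 1
```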
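Here is a minimal NumPy sketch of that formula, checked against np.convolve in "same" mode. The function name conv_same and the particular signal and kernel values are illustrative assumptions, not taken from the figures; they are chosen to reproduce the three copies described above (a smaller one, a similar one, and a sign-flipped one).

```python
import numpy as np

def conv_same(x, w):
    """Convolution y_i = sum_{j=-p..p} w_j * x_{i-j}, trimmed to len(x).

    Assumes w has an odd number of elements (n = 2p + 1); signal values
    outside the range 0..m-1 are treated as zero.
    """
    m, n = len(x), len(w)
    p = n // 2
    y = np.zeros(m)
    for i in range(m):
        for j in range(-p, p + 1):
            if 0 <= i - j < m:
                # w[j + p] is w_j: the kernel re-indexed from -p..p to 0..n-1.
                y[i] += w[j + p] * x[i - j]
    return y

# Illustrative sparse signal: three non-zero elements among zeros.
x = np.zeros(20)
x[3], x[9], x[15] = 0.5, 1.0, -1.0

w = np.array([1.0, 2.0, -1.0])  # odd-length kernel, n = 3, p = 1

y = conv_same(x, w)

# Each non-zero element of x drops a scaled copy of w into y:
# a half-size copy around index 3, a full-size copy around index 9,
# and a sign-flipped copy around index 15.
print(y)
print(np.allclose(y, np.convolve(x, w, mode="same")))  # True
```

Printing y shows the three scaled copies of the kernel directly, which is exactly the superposition picture: convolution with a sparse signal is just a sum of shifted, scaled copies of the kernel.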