Having equations for convolution is a great start. They are specific enough that we can turn around and implement them in code. The other thing we have to be able to do, in a neural network that uses backpropagation to learn, is differentiate these equations, specifically with respect to the loss gradient that gets passed back down through the layers. If these concepts are unfamiliar, there are some links in the text up above that you can use to get familiar with them. For now, I'm going to assume they're not totally new and jump in.

Backpropagation starts with the sensitivity of the loss function to changes in each layer's output values. This is the partial derivative of the loss with respect to the outputs y. We want to be able to propagate that back and calculate the partial derivative of the loss with respect to the inputs x. This is one link in our chain, and we backpropagate this sensitivity of the loss through each set of outputs and inputs all the way back through the network. By the chain rule of calculus, the way we get the partial derivative of the loss with respect to x is to take the partial derivative of the loss with respect to y and multiply it by the partial derivative of y with respect to x.

Now in our case, x and y are arrays, not single values. They're a whole long line of values. Really, what we want to calculate is the partial derivative of the loss with respect to each one of the inputs individually. By the chain rule, that is the partial derivative of the loss with respect to y times the partial derivative of y with respect to each input element. Each of our outputs is an individual value too, since y is an array just like x. Breaking it all the way down to individual elements: the partial of the loss with respect to each input element is the sum, over all of the output elements, of the partial of the loss with respect to that output element times the partial of that output element with respect to the input element.

Because that's so verbose, we'll use a shorthand. The partial derivative of the loss with respect to the output we'll just call the output gradient, and the partial derivative of the loss with respect to the input we'll call the input gradient. To go from the output gradient to the input gradient, we have to know the remaining quantity: the partial derivative of each output element with respect to each input element. That's what we mean when we say that a layer is differentiable. We have to be able to calculate this bit to be able to do backpropagation.

To do that, we can go back to our definition of convolution. We can explode it back out and say that each y_j is equal to x_{j-p} times w_p, plus x_{j-p+1} times w_{p-1}, and so on. We expand that summation out, and we expand it out for each of the output elements j. This is just a longhand way to represent the summation sign and the various elements j. Then, in each of these cases, we can take the derivative of that y_j with respect to each of the x's that occurs in it. For the very first term, the derivative of y_j with respect to x_{j-p} is w_p. For the very next term, the derivative of y_j with respect to x_{j-p+1} is w_{p-1}. And so on. What we get is a bunch of small partial derivatives of individual output values with respect to individual input values.
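Written out in symbols, this is a compact restatement of the argument above. The exact index range of the kernel is an assumption about how the convolution sum was set up, but the shape of the result doesn't depend on it:

```latex
% Chain rule, element by element:
%   input gradient = sum over outputs of (output gradient) * (local derivative)
\frac{\partial L}{\partial x_i}
    = \sum_j \frac{\partial L}{\partial y_j} \, \frac{\partial y_j}{\partial x_i}

% From the convolution definition, y_j = \sum_p w_p \, x_{j-p},
% each local derivative is just a kernel weight:
\frac{\partial y_j}{\partial x_{j-p}} = w_p
\quad\Longleftrightarrow\quad
\frac{\partial y_j}{\partial x_i} = w_{j-i}
```

Substituting i = j - p, the sum becomes the partial of the loss with respect to x_i equals the sum over j of the output gradient at j times w_{j-i}. In other words, the input gradient works out to the output gradient correlated with the kernel, which is the same as convolving it with the flipped kernel.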
So this is starting to look really good. These small per-element derivatives are exactly the type of thing that we're looking for.
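To make this concrete, here is a minimal NumPy sketch, not the course's actual code. It shows a 1-D convolution forward pass and the input-gradient step we just derived. The function names and the "full" boundary handling are my assumptions; the same reasoning applies to "valid" or "same" modes.

```python
import numpy as np

def conv1d(x, w):
    """Forward pass: y[j] = sum_p w[p] * x[j - p].

    Plain 1-D convolution in "full" mode, so the output has
    len(x) + len(w) - 1 elements, with x treated as zero
    outside its bounds.
    """
    return np.convolve(x, w, mode="full")

def conv1d_input_gradient(grad_y, w, n_inputs):
    """Backward pass for the inputs.

    By the chain rule, dL/dx[i] = sum_j dL/dy[j] * dy[j]/dx[i],
    and since y[j] = sum_p w[p] * x[j - p], the local derivative
    is dy[j]/dx[i] = w[j - i].  The double loop below applies
    that sum literally, one element at a time.
    """
    grad_x = np.zeros(n_inputs)
    for i in range(n_inputs):
        for j in range(len(grad_y)):
            p = j - i  # kernel index that connects x[i] to y[j]
            if 0 <= p < len(w):
                grad_x[i] += grad_y[j] * w[p]
    return grad_x
```

The loops are deliberately naive so that each line matches one piece of the derivation; a production version would use the correlation form instead.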
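Continuing from the sketch above, a quick way to convince yourself those little derivatives are right is to compare the analytic input gradient against a finite-difference estimate. Because the layer is linear in x, the two should agree to floating-point precision:

```python
rng = np.random.default_rng(0)
x, w = rng.standard_normal(8), rng.standard_normal(3)
grad_y = rng.standard_normal(len(x) + len(w) - 1)  # stand-in loss gradient

analytic = conv1d_input_gradient(grad_y, w, len(x))

# Finite-difference check on the scalar loss L = grad_y . y:
# nudge each x[i] up and down and watch how L changes.
eps = 1e-6
numeric = np.zeros_like(x)
for i in range(len(x)):
    x_plus, x_minus = x.copy(), x.copy()
    x_plus[i] += eps
    x_minus[i] -= eps
    numeric[i] = (grad_y @ conv1d(x_plus, w)
                  - grad_y @ conv1d(x_minus, w)) / (2 * eps)

print(np.allclose(analytic, numeric))  # True
```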