Because this is where the calculations get intense, where a lot of operations happen again and again and are easy to implement with for loops, we're going to use Numba's njit decorator on each of the functions from here on out. Also, notice that these functions are no longer part of the class: they're no longer members of the block class we're working with, but stand on their own inside the module, the conv1d.py file itself. This is one of several quirks of working with Numba: you pull out your small chunks of computation and decorate them with an njit decorator, but keep them outside of the class. So we'll start with our calculate outputs function. It expects our inputs, our signal, and our set of kernels. Remember, we're not working with just one kernel anymore; we have a whole collection of them. The goal of this function is to compute the whole set of convolutions for a collection of kernels with a multi-channel input. Our input is a 2D array of floats: the number of channels in the rows and the number of inputs in the columns. Then we have our kernels, a three-dimensional array of floats, with the number of channels in the 0th dimension, the size of the kernel in the 1st dimension, and the number of kernels in the 2nd dimension. The result that comes out is a two-dimensional array of floats: the number of kernels in the rows, and the number of inputs minus the kernel size plus 1, that is, the number of outputs, in the columns. You can see how this will then be the input to possibly the next convolutional layer: what was the number of kernels becomes the number of channels in the next layer, and what was the number of outputs becomes the number of inputs in the next layer. So this function is just going to be a loop where we break out and handle one kernel at a time. We calculate what the size of the result should be, the number of kernels by the number of outputs, and we initialize an array of zeros.
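A sketch of what this top-level function might look like. The names `calc_outputs` and `calc_single_kernel_output` are my guesses at the transcript's function names, the helper here is a minimal stand-in for the per-kernel function described later, and I use an explicit accumulation loop for the inner product so the sketch compiles under Numba even without SciPy's BLAS bindings; if Numba isn't installed, a no-op decorator lets it run as plain NumPy:

```python
import numpy as np

try:
    from numba import njit            # Numba's no-Python-mode JIT decorator
except ImportError:                   # hypothetical fallback: run as plain Python
    def njit(func):
        return func

@njit
def calc_single_kernel_output(signal, kernel):
    # Stand-in for the per-kernel helper: sums valid-mode
    # channel convolutions into one output row.
    n_outputs = signal.shape[1] - kernel.shape[1] + 1
    result = np.zeros(n_outputs)
    for i_ch in range(signal.shape[0]):
        rev = kernel[i_ch, ::-1].copy()       # convolution = reversed cross-correlation
        for i_out in range(n_outputs):
            total = 0.0
            for j in range(rev.size):
                total += signal[i_ch, i_out + j] * rev[j]
            result[i_out] += total
    return result

@njit
def calc_outputs(signal, kernels):
    """signal:  (n_channels, n_inputs) floats
    kernels: (n_channels, kernel_size, n_kernels) floats
    returns: (n_kernels, n_outputs), n_outputs = n_inputs - kernel_size + 1
    """
    n_kernels = kernels.shape[2]
    n_outputs = signal.shape[1] - kernels.shape[1] + 1
    result = np.zeros((n_kernels, n_outputs))     # preallocate once, fill row by row
    for i_kernel in range(n_kernels):
        result[i_kernel, :] = calc_single_kernel_output(
            signal, kernels[:, :, i_kernel].copy())   # copy makes the slice contiguous
    return result
```

Note how the output shape `(n_kernels, n_outputs)` has exactly the same layout as the input's `(n_channels, n_inputs)`, which is what lets one layer feed the next.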
Initializing an array and then filling it in with the result is a good way to do iterative calculations quickly. It takes a long time to repeatedly allocate new memory, so if you know how much you're going to need, you preallocate it all at once, initialize it to zeros, and then just change the elements as needed to generate the result. So here we preallocate the full result for all of our outputs, and then go through our kernels one by one, handling our inputs against just one kernel at a time. We make sure to pass in our full set of inputs, and from our whole set of kernels we pull out just the one kernel, the i-th kernel, for that iteration. Then we assign whatever comes back from that calculate single kernel output function to the relevant row of the result; we're filling in this preallocated result one row at a time. You can see we've used the strategy again where we kick the can further down the road: we call a new function that we haven't written yet, calculate single kernel output. So as our next step, we write that function. We have our signal and our kernel coming in. The signal is still our two-dimensional array of floats, number of channels by number of inputs, and the kernel is now also a two-dimensional array of floats, number of channels by length of the kernel. We can take this multi-channel signal and multi-channel kernel and break the problem down further. Here we clarify again that we use valid mode, only doing the convolutions where the kernel completely overlaps the signal. We preallocate the result for this function, which is the number of outputs long. Then, for each channel in our signal and in our kernel, we calculate the convolution of that particular channel of the signal with that particular channel of the kernel, and whatever comes out of that, we add it to the result.
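The channel-summing step might be sketched like this. The names `calc_single_kernel_output` and `convolve1d` are my guesses at the transcript's names, and the `convolve1d` here is a minimal stand-in using an explicit inner loop rather than np.dot (so it compiles under Numba without SciPy's BLAS bindings), with a no-op decorator fallback if Numba is absent:

```python
import numpy as np

try:
    from numba import njit            # Numba's no-Python-mode JIT decorator
except ImportError:                   # hypothetical fallback: run as plain Python
    def njit(func):
        return func

@njit
def convolve1d(signal_1d, kernel_1d):
    # Stand-in single-channel valid-mode convolution.
    n_outputs = signal_1d.size - kernel_1d.size + 1
    rev = kernel_1d[::-1].copy()      # reverse the kernel once up front
    result = np.zeros(n_outputs)
    for i_out in range(n_outputs):
        total = 0.0
        for j in range(rev.size):
            total += signal_1d[i_out + j] * rev[j]
        result[i_out] = total
    return result

@njit
def calc_single_kernel_output(signal, kernel):
    """signal: (n_channels, n_inputs); kernel: (n_channels, kernel_size).
    Sums the per-channel valid-mode convolutions into one output row."""
    n_channels = signal.shape[0]
    n_outputs = signal.shape[1] - kernel.shape[1] + 1
    result = np.zeros(n_outputs)                  # preallocate once
    for i_ch in range(n_channels):
        # add this channel's contribution into the running result
        result += convolve1d(signal[i_ch, :], kernel[i_ch, :])
    return result
```

The key move is the `result +=` inside the channel loop: each channel's convolution lands in the same preallocated output row, which is what collapses a multi-channel input down to a single row per kernel.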
You can see here that we reference our convolve1d function. This is almost exactly the function we created when we were benchmarking convolution before; we can jump down to the bottom and see what it looks like. Here we've taken our convolve1d function and actually broken it apart a little bit more. We use the fact that convolution, once you reverse your kernel, is exactly the same as cross-correlation. So we break it out one step more: we reverse our kernel, then pass the signal, the reversed kernel, and the number of outputs we expect to a new function, which we call xcor1d, a cross-correlation in one dimension. What this function does is perform the sliding dot product, just as we did before when we were profiling our convolution. We initialize the result, since we know how long it's going to be (the number of outputs), then iterate through each output in a for loop, perform the dot product of the kernel with the particular chunk of signal it overlaps, and write the result into that preallocated array. So we've finally boiled it down: what we're left with is the NumPy dot function, which we determined was simple enough to do what we needed and was fast. A good point in the trade-off. We've now gone down four levels to get to our individual calculation: starting up at the multi-kernel level, then reducing it to the multi-channel level, then doing single-channel convolutions, and then flipping those around so that it's actually a cross-correlation we're calculating.
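The two bottom-level functions might look like this sketch. The spellings `convolve1d` and `xcor1d` are my guesses at the transcript's names, and I write the sliding dot product as an explicit accumulation loop rather than calling np.dot, so the sketch compiles under Numba even without SciPy's BLAS bindings (the result is the same inner product either way); a no-op decorator stands in if Numba isn't installed:

```python
import numpy as np

try:
    from numba import njit            # Numba's no-Python-mode JIT decorator
except ImportError:                   # hypothetical fallback: run as plain Python
    def njit(func):
        return func

@njit
def xcor1d(signal, kernel, n_outputs):
    """Sliding dot product of a 1-D kernel over a 1-D signal (valid mode)."""
    result = np.zeros(n_outputs)              # preallocate, then fill
    kernel_size = kernel.size
    for i_out in range(n_outputs):
        total = 0.0
        for j in range(kernel_size):          # inner product with one signal chunk
            total += signal[i_out + j] * kernel[j]
        result[i_out] = total
    return result

@njit
def convolve1d(signal, kernel):
    """Valid-mode convolution: cross-correlation with a reversed kernel."""
    n_outputs = signal.size - kernel.size + 1
    return xcor1d(signal, kernel[::-1].copy(), n_outputs)
```

Reversing the kernel once in `convolve1d` and then handing off to `xcor1d` is what turns the convolution into a plain cross-correlation, so the innermost loop never has to think about flipping indices.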