Because we want to make sure that this is fast, we'll take it and compare it to NumPy's convolution operator. In the testing loop down here, we go through a given number of iterations with a signal of a given length and a kernel of a given length, and we keep track of the total time it takes NumPy to execute the convolution and the total time it takes our Numba version to execute it. In each iteration we create a fresh random signal and a fresh random kernel, a new one each time, to make sure the code doesn't cheat by just remembering the result of the last computation, but actually performs the convolution again each time. And we start our clock just before the convolution operation and stop it just after, so that we're only timing the convolution itself, not how long it takes to create those random signals and kernels.

NumPy's convolve function also takes a signal and a kernel, and we specify mode="valid" so that it does the same thing as ours and only calculates the dot product where the kernel completely overlaps the signal. We ignore the very first iteration to account for things getting started up; in the case of Numba, that's where compilation happens, and we want that out of this analysis. Then, from the second iteration on, we take the time each convolution took and add it to the total we're tracking. We go through this loop n_iter times for NumPy's convolve, and then through the same loop, the same number of times, for the convolve1d function we wrote above. Finally we multiply each total by a million, so that we get our answer in microseconds instead of seconds, and divide by the number of iterations. That gives us how many microseconds it takes to do one convolution in NumPy versus Numba, and when we run this, we can compare head to head which is faster.

Doing this yields some pretty interesting results and gets us a long way toward being able to speed up our operation. The first thing we can do is try it with or without the @njit decorator present, and we can see right away that having the Numba acceleration in place cuts computation time by a factor of about 30. So this is good to remember: any time you have some kind of heavy computation, Numba's pre-compilation is a powerful way to speed it up.

Another thing we discover is that when we're calculating the dot product, we have a choice between doing the individual multiplications and summations separately or using NumPy's dot function, which is shorthand for doing all those multiplications and summations at once. It turns out that the dot function is quite a bit faster; in fact, it speeds things up by another factor of five. So we definitely want to use dot instead of point-wise multiplication followed by a sum.

And finally, when we're using dot, it turns out it runs much faster if, instead of passing it the reversed view kernel[::-1] directly, we make a copy of that reversed kernel and pass the copy. Because of the optimizations that NumPy's dot function uses, having a copy whose values are already in order in physical memory speeds things up by another factor of three.
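Pulling those three speed-ups together, here's a minimal sketch of what the optimized convolve1d might look like. This isn't the exact code written earlier; it's a reconstruction from the description above, so treat the details as illustrative.

```python
import numpy as np
from numba import njit


@njit  # speed-up 1: Numba pre-compilation, roughly 30x by itself
def convolve1d(signal, kernel):
    # Valid-mode convolution: only compute dot products where the
    # kernel completely overlaps the signal.
    n_out = len(signal) - len(kernel) + 1
    out = np.zeros(n_out)

    # Speed-up 3: copy the reversed kernel so its values sit in order
    # in physical memory; dot runs about 3x faster on the copy than
    # on the reversed view kernel[::-1].
    kernel_rev = kernel[::-1].copy()

    for i in range(n_out):
        # Speed-up 2: np.dot fuses the point-wise multiplications and
        # the summation, about 5x faster than multiplying and summing
        # separately.
        out[i] = np.dot(signal[i:i + len(kernel)], kernel_rev)
    return out
```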
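And here's a sketch of the timing loop described above, assuming the convolve1d sketched just before it. The parameter values and the helper name are placeholders, not the ones from the original.

```python
import time
import numpy as np

n_iter = 1000      # iterations per timing run (illustrative)
sig_len = 10_000   # signal length (illustrative)
ker_len = 100      # kernel length (illustrative)


def avg_microseconds(conv_fn):
    """Average microseconds per call of conv_fn(signal, kernel)."""
    total = 0.0
    for i in range(n_iter):
        # Fresh random inputs every pass, so nothing can cheat by
        # remembering the last result.
        signal = np.random.normal(size=sig_len)
        kernel = np.random.normal(size=ker_len)

        # Start the clock just before the convolution and stop it just
        # after, so input creation isn't part of the measurement.
        start = time.perf_counter()
        conv_fn(signal, kernel)
        elapsed = time.perf_counter() - start

        # Skip the very first iteration to exclude startup costs; for
        # Numba, this is where compilation happens.
        if i > 0:
            total += elapsed

    # Seconds -> microseconds, averaged over the timed iterations.
    return total * 1e6 / (n_iter - 1)


numpy_us = avg_microseconds(lambda s, k: np.convolve(s, k, mode="valid"))
numba_us = avg_microseconds(convolve1d)
print(f"NumPy: {numpy_us:.1f} us per convolution, Numba: {numba_us:.1f} us")
```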
So by having all of these speed-ups in place, the Numba compilation, using np.dot instead of summing the point-wise multiplication, and making a copy of the reversed kernel, we can get this down to where it's almost as fast as NumPy's native convolve: about 25% slower. This is actually a really good trade-off, because at this point we have total control over how that computation takes place. If we want to, we can skip iterations in our for loop to implement arbitrary strides for our convolution (see the sketch below), and we can try other experiments too. We do pay a little overhead for that: 25% slower is not nothing, but it's also not 25 times slower. Before doing all of this, our version was many, many times slower. Now it's pretty close, and since we explicitly value flexibility and the ability to experiment, this buys us that at a reasonable cost.
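As one illustration of that flexibility, a strided variant might look like the sketch below. The stride parameter is an assumption added here for illustration; the original convolve1d doesn't take one.

```python
import numpy as np
from numba import njit


@njit
def convolve1d_strided(signal, kernel, stride):
    # Same valid-mode convolution as above, but the window advances by
    # `stride` samples each step, skipping output positions, which is
    # something np.convolve doesn't offer directly.
    # (The stride parameter is an illustrative addition, not from the
    # original.)
    n_out = (len(signal) - len(kernel)) // stride + 1
    out = np.zeros(n_out)
    kernel_rev = kernel[::-1].copy()  # contiguous copy, as before
    for i in range(n_out):
        start = i * stride
        out[i] = np.dot(signal[start:start + len(kernel)], kernel_rev)
    return out
```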