whose nine entries are polynomials of degree r. Then we're going to apply multipoint polynomial evaluation to each of those nine entries to compute M'(0), M'(r), M'(2r), and so on. Notice we're not instantiating at 0, 1, 2; we're instantiating at multiples of r, because M'(0) is equal to the product M(0) times M(1) times M(2), and so on, up to M(r - 1).

When we're done with that, we have s matrices, where s is roughly the square root of p, and they're all now honest 3-by-3 matrices over F_p. We want to multiply our vector v by all of those matrices, plus any we missed because r times s was smaller than p - 1. For that last step, we'll fall back to what we did in the first algorithm and just do vector-matrix multiplications. But now we're only doing on the order of square root of p vector-matrix multiplications, rather than on the order of p of them. And then the last step is identical to the previous algorithm: once we've computed the vector v times all of these matrices, we just pull out the third entry and multiply it by minus f_0, the constant coefficient of f, raised to the power (p - 1)/2.

So this improves our complexity: it takes our linear-time algorithm and turns it into an algorithm that runs in time roughly on the order of the square root of p. Now the log factors are a bit bigger, because we had to compute this big product tree. The product tree has on the order of log p levels, and at each level of the tree we have something like square root of p times log p bits of information. Multiplying objects with B bits in them costs on the order of B log B, which gives us a log-squared factor per level, and the log p levels in the tree then give us a log cubed overall. If that went by too quickly, don't worry, because we're going to do the exact same analysis in the next algorithm in more detail.

Now I should mention that you can save a log factor here by using additional information that we haven't taken advantage of. Our multipoint evaluation algorithm works for evaluating at arbitrary points, but in fact we're not evaluating at arbitrary points: we're evaluating at the integers from 0 to p - 1, which form an arithmetic progression. When you're evaluating a polynomial along an arithmetic progression, rather than multiplying all these polynomials up the tree and then doing multipoint evaluation at the top, you can instead take an evaluation-interpolation approach, where you keep track of the values of your polynomials as you go and double the amount of information you're carrying with you at each step. But that algorithm is somewhat involved, and I don't want to try to go into the details of it; I'll refer you to the literature. Bostan, Gaudry, and Schost show how to do this in a 2003 ANTS paper, and they save a log factor. In practice, this is the algorithm everyone uses: pretty much all of the square-root-of-p-time algorithms for computing zeta functions, whether they're based on an average polynomial-time approach using these linear recurrences or on a Kedlaya-style approach using Monsky-Washnitzer cohomology, take advantage of this BGS03 result to save a log factor in their complexity bounds, and they use multipoint evaluation to do it. OK. All right, so let's take a quick look at the algorithm.
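To pin down the bookkeeping in symbols, here is a restatement of the giant-step identity and the cost accounting just described, using the notation above (M is the recurrence matrix, v the initial vector, and rs <= p - 1); this is just a transcription of what was said, not anything beyond it:

```latex
% Each evaluation of M' at a multiple of r collapses r consecutive
% steps of the recurrence:
M'(x) = M(x)\,M(x+1)\cdots M(x+r-1)
\quad\Longrightarrow\quad
v\,M'(0)\,M'(r)\cdots M'\bigl((s-1)r\bigr) = v\,M(0)\,M(1)\cdots M(rs-1).

% Cost: O(\log p) tree levels, each holding about \sqrt{p}\,\log p bits,
% with B-bit multiplication costing O(B \log B), ignoring log log factors:
\underbrace{\log p}_{\text{levels}} \times
\underbrace{\sqrt{p}\,\log p}_{\text{bits per level}} \times
\underbrace{\log p}_{\text{per multiplication}}
= O\bigl(\sqrt{p}\,\log^{3} p\bigr)\ \text{bit operations.}
```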
This top function is implementing the multipoint evaluation approach. This first loop here is computing the product tree. It looks a little more complicated than it needs to, because I'm not assuming that the number of points I want to evaluate at is a convenient power of two; this function needs to work when the sequence of evaluation points has arbitrary length. Whenever somebody writes this on the blackboard, they will very conveniently choose the number of leaves to be a power of two, but when it's not, you need to make some minor adjustments. That's the only reason this code looks more complicated than it should. Then once you get to the top, it reduces back down the tree at each step: you can see this mod here is where it's reducing g modulo each of the children as it works its way down. And at the bottom, it knows it's going to have polynomials of degree zero, so it just returns the constant coefficients.

Then there's the algorithm for running the recurrence using multipoint evaluation. The first part, this section up to here, is identical to the linear-time algorithm we already saw. Where things change is at this point: we switch to viewing our matrix as a matrix over a polynomial ring. We pick our value of r and compute our product tree. To simplify matters, here I actually just make r a power of 2, so this is not an optimal algorithm. In this situation we don't actually need to retain the entire tree; we just pair entries up and compute the product all the way up until we get a single matrix at the top, whose entries are polynomials of degree roughly square root of p. Then in this next step, we're using the multipoint evaluation, applying it to all nine entries; that's why there's a 3-by-3 loop here, applying multipoint evaluation to each matrix entry M_ij. This line is then computing the products of those matrices, of the multipoint evaluations. And then I'm running our linear recurrence, taking the vector and multiplying it by on the order of square root of p matrices. The last two steps are identical to the linear-time algorithm.

I don't expect you to have absorbed all of that in real time; I just want to at least walk you through it, so that hopefully in your head you can relate what you saw on the slide to the code. Then if you ever want to come back to it and really dig in and understand exactly what it's doing, or implement it in another system or another setting, where maybe you're not working with 3-by-3 matrices and you're working with different linear recurrences, you have an example anchored in your head. OK. Any questions on this algorithm before we move on to the average polynomial-time algorithm?
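Before moving on, here is a minimal sketch of the multipoint evaluation idea in Python; this is not the lecture's actual code. Polynomials over F_p are plain coefficient lists, the arithmetic is naive and quadratic (a real implementation would use FFT-based multiplication), and all function names are mine. The recursive product tree handles an arbitrary number of points, which is the non-power-of-two issue mentioned above.

```python
# Multipoint evaluation over GF(p) via a product tree (illustrative sketch).
# Polynomials are coefficient lists: index i holds the coefficient of x^i.

def poly_mul(f, g, p):
    """Product of two polynomials mod p (schoolbook)."""
    h = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            h[i + j] = (h[i + j] + a * b) % p
    return h

def poly_rem(f, g, p):
    """Remainder of f modulo g over GF(p) (schoolbook division)."""
    f = f[:]
    inv = pow(g[-1], p - 2, p)          # inverse of the leading coefficient
    while len(f) >= len(g):
        c = f[-1] * inv % p
        shift = len(f) - len(g)
        for i, b in enumerate(g):
            f[shift + i] = (f[shift + i] - c * b) % p
        f.pop()                          # the top coefficient is now zero
    return f if f else [0]

def product_tree(pts, p):
    """Binary tree with leaves (x - a) and root prod_a (x - a).
    Works for any number of points, not just powers of two."""
    if len(pts) == 1:
        return {"poly": [(-pts[0]) % p, 1]}
    mid = len(pts) // 2
    L, R = product_tree(pts[:mid], p), product_tree(pts[mid:], p)
    return {"poly": poly_mul(L["poly"], R["poly"], p), "L": L, "R": R}

def multipoint_eval(f, pts, p):
    """Evaluate f at every point by reducing f down the product tree;
    at the leaves the remainders are constants, namely the values f(a)."""
    def descend(f, node):
        r = poly_rem(f, node["poly"], p)
        if "L" not in node:              # leaf: degree-zero remainder
            return [r[0]]
        return descend(r, node["L"]) + descend(r, node["R"])
    return descend(f, product_tree(pts, p))

# Example: evaluate x^2 + 1 at 0,...,4 over GF(7) -> [1, 2, 5, 3, 3]
print(multipoint_eval([1, 0, 1], [0, 1, 2, 3, 4], 7))
```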
And we're not going to worry about the prime 2: we can use an algorithm we've already developed to point count over F_2, and that's not a hard problem. So we're just going to focus on odd primes p that don't divide the discriminant of f. I'm going to use the same notation: I'll write h_p(E) to mean the Hasse invariant of the reduction of E modulo p, a slight abuse of notation.

Now we're going to take advantage of the fact that we set up our linear recurrence matrix M_k so that its entries don't depend on p. This matrix makes perfect sense over Z: if the coefficients of f are in Z, this matrix lives over Z. And if we could compute all of the products V_n = v_0 times M_1 up to M_{n-1} modulo n, for n running from 1 to N, we could compute all of the Hasse invariants we want. We would just pick out the n's that are prime, and for those n's, take the last entry of this vector, which I'll denote v_p (I don't want a double subscript, so little v_p means the third entry of capital V_p), and multiply it by minus the constant coefficient raised to the power (p - 1)/2. But notice this is now a different p at each step: we're running through all the odd primes up to N that don't divide the discriminant.

So that's great, but it's not at all obvious how you would do this efficiently, right? These matrices have integer entries; we're reducing modulo different moduli, but we just have a single sequence of matrices, and if we multiply them all together, the integers are going to get huge, absolutely enormous. So how do we do this efficiently? We don't want to compute these products separately for every n, because computing N products of roughly N things each would give us a running time that's quadratic in our bound N.

What we're going to do instead is use what's known as an accumulating remainder tree. This is due to David Harvey, who I think is at least the first person to write it down in this form and formally call it this. It solves the following problem: we're given a vector v, we have N matrices, and we have N moduli, and our mission is to compute, for each n running from 1 to N, the vector V_n, which is our given vector v times the product of M_0 up to M_{n-1}, reduced mod m_n. (The v_0 on the slide is the same as v.) And optionally, we'll see why we want this later, we also want the un-reduced product, not modulo anything: the integer product of v times the product of all these matrices.

So how do we do this efficiently? Well, we're going to use product trees again. We're going to compute a product tree of the N matrices, and we're going to compute another product tree of the N moduli. Then we're going to take our vector v and reduce it down the tree, and at each stage there's a single invariant we want to preserve: we want our vector to be v times the product of all the matrices to our left. Because notice we're ultimately going to be reducing mod m_n, and the n-th result we want is v times the product of all the matrices that come before index n. So if our mission is to make sure that we always have the vector times the product of all the matrices to the left, well, at the beginning, at the top of the tree, there's nothing to the left. We're at the very top.
There's nothing off to the left, because we're at the root of the tree. And in fact, as we go down the left side of the tree, if we just look down all the left children, this part of the algorithm is going to do absolutely nothing: it's just going to send our initial vector, which might be (0, 0, 1), all the way down to the leftmost child. Nothing interesting happens there. But whenever we need to descend to a right-hand child, now we have something to our left, and so we're going to multiply our vector by the matrix at our left-hand sibling. And at each stage, we're going to be reducing modulo the modulus at our current point in the tree. Rather than trying to write out a very complicated expression with indices, I've just tried to express it locally: imagine that at any point in the tree you have a parent and two children, a left child and a right child. What do you do? To go down to the left child, you take the parent's vector and just send it down, reducing modulo the modulus at the left child. To go down to the right child, you take the parent's vector, multiply it by the matrix at the left child, and reduce modulo the modulus at the right child. You just keep doing that until you get to the leaves, and when you do, at every leaf you will have precisely the vector-matrix product that we wanted to compute.
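Here is a minimal sketch of that descent in Python, with the same caveats as before: the names are mine, the entries are plain Python integers, and for clarity this version recomputes the partial matrix products on the way down instead of precomputing the two product trees (and it omits the optional un-reduced product). Indices are zero-based, so res[n] is v times the product of the first n matrices, reduced mod ms[n].

```python
# Accumulating remainder tree (illustrative sketch):
# res[n] = v * Ms[0] * ... * Ms[n-1] mod ms[n], with res[0] = v mod ms[0].

from math import prod

def mat_mul(A, B, mod):
    """Matrix product, entries reduced mod an integer."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) % mod
             for j in range(len(B[0]))] for i in range(len(A))]

def vec_mat(v, M, mod):
    """Row vector times matrix, entries reduced mod an integer."""
    return [sum(v[k] * M[k][j] for k in range(len(v))) % mod
            for j in range(len(M[0]))]

def mat_prod(Ms, mod):
    """Product of a list of matrices mod an integer, via a product tree."""
    if len(Ms) == 1:
        return [[x % mod for x in row] for row in Ms[0]]
    mid = len(Ms) // 2
    return mat_mul(mat_prod(Ms[:mid], mod), mat_prod(Ms[mid:], mod), mod)

def acc_rem_tree(v, Ms, ms):
    """res[n] = v * Ms[0] * ... * Ms[n-1] mod ms[n] for each n."""
    if len(ms) == 1:
        return [[x % ms[0] for x in v]]   # empty matrix product at a leaf
    mid = len(ms) // 2
    # Left child: nothing new to our left; just reduce the vector modulo
    # the product of the moduli below the left child.
    left = acc_rem_tree([x % prod(ms[:mid]) for x in v], Ms[:mid], ms[:mid])
    # Right child: multiply by the product of all the matrices to our left,
    # reduced modulo the product of the moduli below the right child.
    m_right = prod(ms[mid:])
    w = vec_mat(v, mat_prod(Ms[:mid], m_right), m_right)
    right = acc_rem_tree(w, Ms[mid:], ms[mid:])
    return left + right

# Example with 1x1 matrices: v=[1], Ms=[[[2]],[[3]],[[5]]], ms=[7,11,13]
# gives [[1], [2], [6]]  (empty product, then 2 mod 11, then 2*3 mod 13).
print(acc_rem_tree([1], [[[2]], [[3]], [[5]]], [7, 11, 13]))
```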