I'm going to start. In this lecture I'll be talking about polynomial approximation, so I'm moving off the track of the quantum algorithm material from before. The reason is something we established earlier (or hopefully I established to you): using your quantum algorithms, you can get from a block encoding of some matrix $A$ to a block encoding of $p(A)$, a polynomial applied to $A$. Roughly speaking, if you start with a $Q$ block encoding, meaning the gate complexity is $Q$, this turns into an $nQ$ block encoding of $p(A)$, where $n$ is the degree of $p$. So provided the polynomial I want to implement is low degree, and provided $p$ is bounded, we can do this efficiently with our QSVT protocol.

The issue is that for a lot of the applications we want, we actually need to apply non-polynomial functions. In the Hamiltonian simulation example, we wanted to apply the map $H \mapsto e^{-iHt}$; this is not a polynomial. Similarly, there's the linear regression algorithm that's central to quantum machine learning, which performs the map $A \mapsto A^{-1}$ when $A$ is some sparse matrix. If you think about what that applies, it's just the function $1/x$. In these instances we don't have polynomials, so we'll have to work a little harder to find polynomials that approximate them. So in this lecture I'll discuss how you can take a function and get a polynomial out of it. Our tool for doing this will be Chebyshev polynomials, and you can use them to get essentially any approximation you might want, up to the details of actually working out the computations.

[Answering an audience question about where the block encodings come from:] Yes, there are simple ways to see this and more complicated ways that are more efficient. The simple way is to write your Hamiltonian as a linear combination of unitaries, $H = \sum_i \lambda_i P_i$; here the Paulis $P_i$ are unitaries, and this gives a block encoding of $H / \sum_i |\lambda_i|$. There's another approach that says: if $H$ is $s$-sparse with all entries bounded by 1, you can get a block encoding of $H/s$. I think the argument is basically that you decompose it into Paulis in a sufficiently nice way. Are there any other questions about why I'm doing polynomial approximation, or anything else?

OK, so I'm going to give a bunch of theory of Chebyshev polynomials. How I'm going to define them: the degree-$n$ Chebyshev polynomial, called $T_n$, is the function that satisfies, for complex $z$,
$$T_n\big(\tfrac{1}{2}(z + z^{-1})\big) = \tfrac{1}{2}\big(z^n + z^{-n}\big).$$
From this definition you might ask why this is a polynomial. You can see it by observing that the $T_n$ satisfy the recurrence
$$T_{n+1}(x) = 2x\,T_n(x) - T_{n-1}(x).$$
To check this, plug in $x = \tfrac{1}{2}(z + z^{-1})$; the recurrence becomes
$$\tfrac{1}{2}\big(z^{n+1} + z^{-(n+1)}\big) = 2 \cdot \tfrac{1}{2}(z + z^{-1}) \cdot \tfrac{1}{2}\big(z^n + z^{-n}\big) - \tfrac{1}{2}\big(z^{n-1} + z^{-(n-1)}\big),$$
and you can just check that this is true by expanding.
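To make this concrete, here is a minimal numerical sketch (mine, not from the lecture; Python with numpy) that evaluates $T_n$ purely through the recurrence, with base cases $T_0 = 1$ and $T_1(x) = x$, and checks the defining identity for real $z > 1$:

```python
# A minimal check (assuming numpy) that the recurrence
# T_{n+1}(x) = 2x T_n(x) - T_{n-1}(x), with T_0 = 1, T_1(x) = x,
# reproduces the defining identity T_n((z + 1/z)/2) = (z^n + z^{-n})/2,
# tested here for real z > 1.
import numpy as np

def chebyshev_T(n, x):
    """Evaluate the degree-n Chebyshev polynomial at x via the recurrence."""
    t_prev, t_curr = np.ones_like(x), np.asarray(x, dtype=float)
    if n == 0:
        return t_prev
    for _ in range(n - 1):
        t_prev, t_curr = t_curr, 2 * x * t_curr - t_prev
    return t_curr

z = np.linspace(1.1, 3.0, 50)
x = 0.5 * (z + 1 / z)
for n in range(6):
    err = np.max(np.abs(chebyshev_T(n, x) - 0.5 * (z**n + z**(-n))))
    print(f"n={n}: max deviation from the defining identity = {err:.2e}")
```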
And furthermore, $T_0 = 1$ and $T_1(x) = x$. From this you can conclude, by induction on the recurrence, that all of the Chebyshev polynomials are indeed polynomials. OK, now if we take $z$ to be on the complex unit circle, $z = e^{i\theta}$, then $\tfrac{1}{2}(z + z^{-1}) = \cos\theta$ (it's the real part of $z$), and you get a definition that might be more familiar to you:
$$T_n(\cos\theta) = \cos(n\theta).$$
From this you can conclude that $|T_n(x)| \le 1$ for all $x \in [-1,1]$. I'm going to start abbreviating this as $\|T_n\|_{[-1,1]} \le 1$; I'll use this notation for the supremum norm on $[-1,1]$. The reason it holds: for any $x \in [-1,1]$ there's a corresponding real $\theta$ such that $\cos\theta = x$, and therefore $T_n(x) = \cos(n\theta)$ is between $-1$ and $1$. Another thing you can observe is that $T_n$ has parity $n$: you can see from the recursion that $T_1$ is odd, $T_2$ is even, $T_3$ is odd, and so on.

The property of Chebyshev polynomials that will be important for us is that any sufficiently nice function $f$ from $[-1,1]$ to the reals can be written as a series of Chebyshev polynomials,
$$f(x) = \sum_{k=0}^{\infty} a_k\,T_k(x),$$
where the coefficients $a_k$ are absolutely summable. "Sufficiently nice" is a very mild restriction; I think Lipschitz continuity is enough, i.e. the derivative of $f$ has an upper bound on $[-1,1]$. And the reason this works is the same reason you can have Fourier series of functions; in fact, these are essentially parallel theories. If you have this function $f : [-1,1] \to \mathbb{R}$, you can define another function $g$ from the unit circle to $\mathbb{R}$ by $g(z) = f(\operatorname{Re} z)$. If $f$ has a decomposition into Chebyshev polynomials, then plugging $x = \tfrac{1}{2}(z + z^{-1})$ into the series shows that $g$ has the decomposition
$$g(z) = \sum_{k=0}^{\infty} a_k\,T_k\big(\tfrac{1}{2}(z + z^{-1})\big) = \sum_{k=0}^{\infty} a_k \cdot \tfrac{1}{2}\big(z^k + z^{-k}\big),$$
which is a Laurent series. Furthermore, you can define some $h : [-\pi, \pi] \to \mathbb{R}$ by applying $g$ to the point on the unit circle at the corresponding angle, $h(\theta) = g(e^{i\theta})$, and you'll get
$$h(\theta) = \sum_{k=0}^{\infty} a_k \cdot \tfrac{1}{2}\big(e^{ik\theta} + e^{-ik\theta}\big),$$
which is a Fourier series for $h$, a $2\pi$-periodic function. What I'm going to do later is show that these $a_k$ are decaying, so you can truncate the series to get polynomials. But just keep in mind that there are these three parallel theories (Chebyshev series on the interval, Laurent series on the circle, Fourier series in the angle), so if you have intuition from Fourier analysis or any other of these, there are parallel statements for all three. That's what I wanted to say on that; from these parallels, you might already suspect the next property.
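Here is a sketch of that correspondence in code (assuming Python with numpy and scipy): sampling $h(\theta) = f(\cos\theta)$ and taking an FFT recovers the Chebyshev coefficients of $f$, precisely because the Chebyshev series of $f$ is the Fourier cosine series of $h$. As a check, I use the standard fact that the expansion of $e^x$ on $[-1,1]$ has $a_0 = I_0(1)$ and $a_k = 2I_k(1)$, with $I_k$ the modified Bessel function:

```python
# Chebyshev coefficients via the Fourier connection (a sketch, assuming
# numpy/scipy): the FFT of h(theta) = f(cos theta) gives Fourier
# coefficients c_k with a_0 = c_0 and a_k = 2*Re(c_k) for k >= 1.
import numpy as np
from scipy.special import iv  # modified Bessel function of the first kind

f = np.exp
N = 256
theta = 2 * np.pi * np.arange(N) / N
c = np.fft.fft(f(np.cos(theta))) / N              # Fourier coefficients of h
a = np.concatenate(([c[0].real], 2 * c[1:N // 2].real))

for k in range(6):
    exact = iv(0, 1.0) if k == 0 else 2 * iv(k, 1.0)
    print(f"a_{k}: fft = {a[k]: .10f}   exact = {exact: .10f}")
```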
Indeed, there is an orthogonality property that these Chebyshev polynomials satisfy: they are orthogonal under a certain inner product. I'll just write down the equation I want. For two of them, $T_k$ and $T_l$, take the inner product to be the weighted integral
$$\langle T_k, T_l \rangle := \frac{2}{\pi}\int_{-1}^{1} \frac{T_k(x)\,T_l(x)}{\sqrt{1 - x^2}}\,dx,$$
and the claim is that they are orthogonal, in fact essentially orthonormal, under this inner product: the integral is 1 if $k = l$ and 0 otherwise. Actually, it evaluates to 2 when $k = l = 0$; there's always some slight constant issue at 0. But basically, they're all orthogonal.

How you can see this: do a u-substitution $x = \cos\theta$, so $dx = -\sin\theta\,d\theta$. The minus sign is absorbed by flipping the limits of integration, $\sqrt{1 - \cos^2\theta} = \sin\theta$ (no absolute value needed, since $\sin\theta \ge 0$ on $[0,\pi]$), and the weight cancels, leaving
$$\frac{2}{\pi}\int_{0}^{\pi} T_k(\cos\theta)\,T_l(\cos\theta)\,d\theta.$$
By what I said before, $T_k(\cos\theta) = \cos(k\theta)$, so this is
$$\frac{2}{\pi}\int_{0}^{\pi} \cos(k\theta)\cos(l\theta)\,d\theta.$$
I'm going to leave it as an exercise to show that this is equal to 1 when $k = l \ge 1$ and 0 otherwise; you can get it with the cosine product-to-sum formula. Any questions on this?

About the form of the inner product: the way you should think about these polynomials is that they belong on the complex unit circle, and that's where the $\sqrt{1 - x^2}$ terms come from. If you were working on the unit circle, those terms would disappear and these would become Laurent polynomials in $z$. Another way you can think about it is that this is like the normal inner product, except near $-1$ and $1$ things are more heavily weighted. That's a behavior of Chebyshev polynomials that's common, and you'll see it in other places as well. I hope that's helpful. Any other questions?

OK, so the nice thing about this orthogonality property is that you can now take a function and actually figure out what its Chebyshev coefficients are. If I have some $f$ written as a Chebyshev series, $f(x) = \sum_l a_l T_l(x)$, I can take the inner product with $T_k$, interchanging the sum and the integral (which is fine by absolute convergence):
$$\langle f, T_k \rangle = \sum_{l} a_l\,\langle T_l, T_k \rangle = a_k,$$
since the inner product we evaluated before is 1 exactly when $l = k$ (for $k \ge 1$; at $k = 0$ you pick up that factor of 2). So this inner product hands you the coefficient $a_k$.
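A quick numerical check of the orthogonality relation (a sketch, assuming scipy; the expected values are exactly the ones derived above):

```python
# After the substitution x = cos(theta), the weighted inner product becomes
# (2/pi) * integral_0^pi cos(k theta) cos(l theta) dtheta, which should be
# 0 for k != l, 1 for k = l >= 1, and 2 for k = l = 0.
import numpy as np
from scipy.integrate import quad

def inner(k, l):
    val, _ = quad(lambda t: np.cos(k * t) * np.cos(l * t), 0, np.pi)
    return 2 / np.pi * val

for k in range(4):
    print([round(inner(k, l), 6) for l in range(4)])
# Expected: diagonal (2, 1, 1, 1), zeros off the diagonal.
```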
OK, so if you compute this integral, you get the value of $a_k$:
$$a_k = \frac{2}{\pi}\int_{-1}^{1} \frac{f(x)\,T_k(x)}{\sqrt{1 - x^2}}\,dx.$$
Now, I don't recommend computing coefficients this way, because it's not the best method, but in any case, from this we can conclude a lot of useful properties of these Chebyshev coefficients. For example: if my function $f$ is bounded, then it turns out my Chebyshev coefficients are bounded. Take absolute values, pull $\|f\|_{[-1,1]}$ out of the integral along with $\|T_k\|_{[-1,1]} \le 1$, and what's left is still the weight:
$$|a_k| \le \frac{2}{\pi}\,\|f\|_{[-1,1]}\int_{-1}^{1}\frac{dx}{\sqrt{1 - x^2}} = \frac{2}{\pi}\,\|f\|_{[-1,1]}\cdot\pi = 2\,\|f\|_{[-1,1]},$$
where, if you believe me, that last integral is just $\pi$. So I've just shown that if my function is bounded, then all of its Chebyshev coefficients are bounded, by twice the function's bound.

This is one reason why you see these polynomials all the time in applied math: it's very different from what happens in the monomial basis. For example, the Chebyshev polynomial $T_n$ itself is bounded by 1, but if you write it in the monomial basis, the leading term is $2^{n-1}x^n$; the coefficients are exponentially large in the degree. So you get numerical instability, which is why it's better to work in the Chebyshev basis, where things are more stable. OK, so this suggests the reasons why we're looking at these.

The main polynomial approximation we're going to be looking at is Chebyshev truncation. If we have some $f : [-1,1] \to \mathbb{R}$ with a Chebyshev series, define the truncation by keeping all the coefficients up to $n$:
$$f_n(x) = \sum_{k=0}^{n} a_k\,T_k(x).$$
This will be our degree-$n$ approximation to $f$. Well, you could ask: how well does this do? We measure in the uniform norm, the pointwise supremum over $[-1,1]$. Since the error is just the tail of the series, and as we said before all the $T_k$ are bounded by 1, we get
$$\|f - f_n\|_{[-1,1]} \le \sum_{k=n+1}^{\infty} |a_k|.$$
This coefficient tail is what our goal is to bound. We know that the whole series of $a_k$'s converges absolutely, so we can choose $n$ large enough that the tail is at most $\epsilon$, and this gives us a Chebyshev polynomial approximation that's good up to $\epsilon$. Our broad vision here: if we have some $A$, and we want to apply $f(A)$, but we can actually only apply the polynomial $f_n(A)$, then using this argument we can show that how much we're off by in operator norm, $\|f(A) - f_n(A)\|$, is bounded by $\epsilon$ (for $A$ with spectrum in $[-1,1]$). So that's the high-level viewpoint from the quantum algorithms perspective.
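Both points, the monomial-basis blowup and the tail bound on the truncation error, are easy to see numerically (a sketch, assuming numpy; `chebinterpolate` is a good stand-in for the series coefficients here, though strictly it gives interpolation coefficients):

```python
# (1) T_n is bounded by 1 on [-1,1], yet its leading monomial coefficient
#     is 2^(n-1): the same bounded polynomial has exponentially large
#     coefficients in the monomial basis.
# (2) The Chebyshev truncation error is controlled by the coefficient tail.
import numpy as np
from numpy.polynomial import chebyshev as C

for n in [5, 15, 30]:
    e_n = np.zeros(n + 1); e_n[n] = 1.0          # coefficients of T_n
    print(f"T_{n}: leading monomial coeff = {C.cheb2poly(e_n)[-1]:.3e}"
          f"  (2^(n-1) = {2.0**(n-1):.3e})")

a = C.chebinterpolate(np.exp, 30)                # ~ coefficients of e^x
x = np.linspace(-1, 1, 2001)
for n in [2, 4, 8]:
    err = np.max(np.abs(np.exp(x) - C.chebval(x, a[:n + 1])))
    print(f"n={n}: sup error = {err:.2e}, "
          f"tail bound = {np.abs(a[n + 1:]).sum():.2e}")
```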
OK. So we want to show that these $a_k$'s are decaying. There are a bunch of theorems that basically state something of the following form: the smoother your function $f$ is, the faster the Chebyshev coefficients decay. For example, there are the Jackson-type theorems, which say that if $f$ is once differentiable, with bounded first derivative, the $a_k$ decay like $1/k^2$; if it's twice differentiable, they decay like $1/k^3$; and so on. And the nice thing is that you don't really need to get a handle on explicitly computing these coefficients, which can be difficult.

So I'm going to prove a statement of this form. This theorem comes from a book of Trefethen (Approximation Theory and Approximation Practice). It says: if $f$ is analytic on $[-1,1]$ and analytically continuable to the interior of the Bernstein ellipse $E_\rho$ (the definition of this ellipse is going to be a little confusing at first; I'll draw it), and it satisfies $|f(z)| \le M$ on this ellipse, then we can conclude that the $a_k$ are exponentially decaying:
$$|a_k| \le 2M\rho^{-k} \quad \text{for all } k.$$

OK, so what am I saying here? We look at some analytic function. For example, $e^{tx}$ for Hamiltonian simulation; or, for other applications, some smoothed version of the inverse, or some rescaled version of arcsine. These are things that appear in applications of quantum algorithms. And we want it to be continuable to this ellipse, which I'll draw. Here is my complex plane; here's $-1$, and here's $1$; these are the foci of the ellipse $E_\rho$. If $\rho = 1 + \delta$, then the ellipse extends about $\delta^2$ beyond $\pm 1$ along the real axis, and about $\delta$ in the imaginary direction (so it's not drawn to size). Basically, the idea is that I want my function to be bounded a little bit outside of $[-1,1]$, and how far outside it stays bounded dictates the rate of convergence. If $\rho = 1 + \delta$, that means the degree of my approximation will need to be like $1/\delta$.

Right, so what's going on here: by this theorem, my truncation error is bounded by the sum of the tail of these $a_k$'s, which is an exponential tail:
$$\|f - f_n\|_{[-1,1]} \le \sum_{k=n+1}^{\infty} 2M\rho^{-k} = \frac{2M\rho^{-n}}{\rho - 1}$$
(I believe this is right; maybe it should be an $n+1$). All this to say that you can choose
$$n \approx \frac{1}{\log\rho}\,\log\!\Big(\frac{2M}{(\rho - 1)\,\epsilon}\Big)$$
if you want this to be bounded by $\epsilon$. So if I can extend to $\rho = 1 + \delta$, then $\log\rho \approx \delta$ and I get a degree roughly $(1/\delta)\log(1/\epsilon)$ approximation. And this is what I mean when I say exponential convergence or exponential decay: it gives you a $\log(1/\epsilon)$ term here.
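Here is a sketch of the theorem in action (assuming numpy; the example $f(x) = 1/(2 - x)$ and the choice $M \approx 1$ in the degree formula are mine, not the lecture's). The nearest singularity $x_0 = 2$ corresponds to the parameter $\rho = x_0 + \sqrt{x_0^2 - 1} = 2 + \sqrt{3}$, since $\tfrac12(\rho + \rho^{-1}) = x_0$, so consecutive coefficient ratios should approach $1/\rho$:

```python
# Observing the Bernstein-ellipse decay rate |a_k| = O(rho^{-k}) and the
# predicted degree for a target accuracy eps.
import numpy as np
from numpy.polynomial import chebyshev as C

f = lambda x: 1.0 / (2.0 - x)
rho = 2 + np.sqrt(3)
a = np.abs(C.chebinterpolate(f, 40))
print(f"1/rho = {1/rho:.6f}")
print("observed |a_(k+1)/a_k|:", np.round(a[6:11] / a[5:10], 6))

# Degree predicted by n ~ log(2M / ((rho-1) eps)) / log(rho), taking M ~ 1:
eps = 1e-12
n = int(np.ceil(np.log(2 / ((rho - 1) * eps)) / np.log(rho)))
x = np.linspace(-1, 1, 2001)
err = np.max(np.abs(f(x) - C.chebval(x, C.chebinterpolate(f, n))))
print(f"predicted degree n = {n}, actual sup error = {err:.2e}")
```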
OK, so to see where this is coming from, I'm going to prove it. As I wrote before, we have this integral for the value of the coefficient, and what I want to do is lift it to something on the unit circle. The way that I do that is to set $x = \tfrac{1}{2}(z + z^{-1})$ with $z$ on the unit circle, and if I do this change, the coefficient becomes
$$a_k = \frac{1}{\pi i}\oint_{|z|=1} z^{-k-1}\,f\big(\tfrac{1}{2}(z + z^{-1})\big)\,dz,$$
where I'm integrating around the complex unit circle once in the counterclockwise direction. This is something that you might recognize as looking like the Cauchy integral formula. Now, as we said before, $f$ is analytic in the interior of this Bernstein ellipse, and the map $z \mapsto \tfrac{1}{2}(z + z^{-1})$ sends the annulus between $|z| = 1$ and $|z| = \rho$ into the interior of the ellipse; that's exactly the space that you pass through. So the integrand is analytic there, and we can expand our contour to the circle of radius $\rho$ without changing the value. Then we use that $f$ is bounded by $M$ on the ellipse, so the composed function is bounded by $M$ on the expanded contour as well. The factor $z^{-k-1}$ is bounded in magnitude by $\rho^{-(k+1)}$ there, and the length of the contour is $2\pi\rho$, since it's a circle of radius $\rho$. So in magnitude,
$$|a_k| \le \frac{1}{\pi}\cdot 2\pi\rho\cdot M\cdot\rho^{-(k+1)} = 2M\rho^{-k},$$
which is precisely what we wanted. OK. Any questions about this?

Right, so you might be wondering at this point what the difference is between this Chebyshev approximation and a Taylor series approximation. Taylor series are probably more familiar, and you see them a lot as the tool for doing polynomial approximation, and there is a corresponding statement that you can make here about Taylor series. Roughly speaking, you need your Taylor series coefficients to also be decaying: if the Taylor series, centered at 0, has coefficients $b_k$, you need these $b_k$ to be bounded in a way controlled by some $M$, and you can sort of think about this as saying the function is bounded on a disk slightly larger than $[-1,1]$. I think you can get similar results this way; they might not be optimal, but you can get $\log(1/\epsilon)$ convergence. The main annoyance here is that sometimes you don't want to go figure out what all these coefficients are, whereas with the Chebyshev statement you can just figure out a bound on your function in some region, and this might be easier. So this is just trying to explain how the two compare.

Sometimes this isn't enough for us, though. As I mentioned before, sometimes we want to approximate functions like $1/x$, and $1/x$ is a function that's not bounded in the Bernstein ellipse at all. You might wonder what you can do here. Well, the thing is that our criterion for what we want to approximate has changed. Here, if we are trying to apply this to a linear system, what we are allowed to say is that the matrix we are performing this on is well conditioned, meaning none of the singular values drop below some value, say $\delta$. So what we really want to do is approximate $1/x$ on the region $\delta \le |x| \le 1$, and we don't really care what happens on $(-\delta, \delta)$; we maybe need the polynomial to stay bounded between $-1$ and $1$ there, because then we can apply our QSVT, but apart from that, we don't really care. And so in these settings, when you have a different criterion, there are different tools that you need to massage it into some other form that you can use.

In approximating this particular function, people have been pretty resourceful. There's a sort of ad hoc way to approximate it, which is to take
$$p(x) = \frac{1 - (1 - x^2)^b}{x}.$$
The idea is that $(1 - x^2)^b$ is some function that's peaked at 0 and then decays very fast. So the numerator vanishes at $x = 0$, making $p$ a genuine (odd, degree $2b - 1$) polynomial, while once $|x|$ is away from 0 the numerator is essentially 1, so $p(x) \approx 1/x$. If you combine these two pieces in this way, you get some function that bridges the gap in the middle between $-\delta$ and $\delta$. So there are ad hoc ways that you can do this.
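Here is a numerical sketch of that construction (assuming numpy; the choice $b \approx \log(1/\epsilon)/\delta^2$ is the standard one, since $(1 - x^2)^b \le e^{-b\delta^2}$ for $|x| \ge \delta$):

```python
# The ad hoc inverse: p(x) = (1 - (1 - x^2)^b) / x is a polynomial (the
# numerator is divisible by x), and p(x) ~ 1/x once |x| >= delta.
import numpy as np

def p(x, b):
    return (1 - (1 - x**2)**b) / x

delta, eps = 0.1, 1e-6
b = int(np.ceil(np.log(1 / eps) / delta**2))
x = np.linspace(delta, 1, 1000)
err = np.max(np.abs(x * p(x, b) - 1))   # equals max of (1 - x^2)^b here
print(f"b = {b}: max |x p(x) - 1| on [{delta}, 1] = {err:.2e}")
```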
There are also theorems that basically say you can take some $f(x)$, like $1/x$, approximate it on the smooth regions, and sort of gloss over the areas of discontinuity. So imagine I can do the following. I want to approximate my function outside the region $(-\delta, \delta)$. Imagine I have two different functions: one performs the approximation on the positive piece, then immediately goes to 0, and stays 0 for the rest of the interval; the other approximates the negative piece and then likewise goes to 0 and stays 0. Then what I can do is sum these two, and I get my approximation of $1/x$ that's good on both regions. Basically, now all you need is some way to get a good polynomial approximation, but one that goes to 0 outside its area of approximation.

And I'm going to state a theorem, and not prove it, that says you can do this. Suppose $f$ satisfies the same properties as before: analytically continuable to the Bernstein ellipse $E_\rho$, and bounded by $M$ on this ellipse; let's say $M = 1$, so it's just a function bounded by 1. Then you can find a polynomial approximation, of degree comparable to before, such that: on the core region, it's a good approximation to $f$; on the region slightly outside, it stays bounded; and on the rest of the interval, extending even further out, it's bounded by $\epsilon$. Here I'm imagining my inversion polynomial. To use this, I rescale and shift everything so that the region I want to approximate sits inside $[-1,1]$. Then I can indeed get an approximation that's good on the target region, bounded on a transition region of size like $\delta$, and then near 0 very quickly after. And so using this tool a bunch of times, I can get approximations of sort of piecewise smooth functions.

This is interesting because if you took a naive approximation, like a Chebyshev truncation, it would approximate the function well on $[-1,1]$, but then it would shoot off very quickly: Chebyshev truncations are generally very poorly controlled outside of $[-1,1]$, with terms scaling like $x^n$, so they can be very large.

OK, let me see. I think I'm not going to discuss the proof, for time. But basically, the idea is that you take this badly-scaling truncation and multiply it by a rectangle function that's 1 in the region $[-1,1]$ and then quickly drops down to 0 outside. You can choose particular functions (error-function-type smoothings) such that this rectangle actually decays like $e^{-x^2}$, and so it beats out the $x^n$ growth. So you multiply by this error-function rectangle, and when you do that, you then do polynomial approximation again. And using this, as I mentioned before, you can get these approximations to $1/x$.
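As a toy illustration of that proof idea (a sketch assuming scipy; the precise construction in the literature is more careful): half the difference of two shifted error functions is a smooth rectangle, approximately 1 on $(-1,1)$ and decaying like a Gaussian outside, fast enough to suppress $x^n$ growth:

```python
# A smooth rectangle from error functions: ~1 inside (-1, 1), Gaussian-fast
# decay outside, with a transition of width ~1/k around the endpoints.
import numpy as np
from scipy.special import erf

def smooth_rect(x, k):
    return 0.5 * (erf(k * (x + 1)) - erf(k * (x - 1)))

x = np.array([0.0, 0.9, 1.1, 1.5, 2.0])
print(smooth_rect(x, k=10))   # ~1, ~0.92, ~0.08, ~0, ~0
```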
Notice, compared to what I was able to do before: the standard theorem, with $\rho = 1 + \delta$ and $M$ a constant, was giving degree like $\frac{1}{\delta}\log\frac{1}{\delta\epsilon}$. If I additionally require this boundedness outside the region, I'm only losing a factor involving $b$, which is essentially how long I want the polynomial to stay near 0. And so using this, I can conclude as a corollary (a computationally heavy corollary) that you can get an approximation of $1/x$. Here I rescale by $\delta$, approximating $f(x) = \delta/x$, so that my function is rescaled to have norm of size 1 on the region of interest. The statement: there exists an odd polynomial $p$ such that both $\|p\|_{[-1,1]} \le 1$ and $|p(x) - \delta/x| \le \epsilon$ on the region $[\delta, 1]$. And because $p$ is odd, it'll be just as good on $[-1, -\delta]$. So this is how you might get these polynomial approximations.

The final thing I wanted to discuss is lower bounds: the question of how you know your polynomial's degree is as good as you can get. There are a couple of inequalities that I use as the main intuition for these polynomial approximation problems. One is called the Markov brothers' inequality, which says that if $p$ is a polynomial of degree $n$ with $\|p\|_{[-1,1]} \le 1$, then its derivative satisfies $\|p'\|_{[-1,1]} \le n^2$. And there's another theorem, which I think is the Bernstein inequality, which says that in the same setting,
$$|p'(x)| \le \frac{n}{\sqrt{1 - x^2}}.$$
So what this means is that for a bounded polynomial approximation on $[-1,1]$, in a constant middle region, say $[-0.9, 0.9]$, the derivative has to be $O(n)$, while near the endpoints it can be larger, up to $n^2$.

So here's something you can note. In the setting before, where I wanted to approximate $\delta/x$ on the bounded region, what I wanted was a function that's $-1$ at $-\delta$ and $+1$ at $\delta$; I can be $\epsilon$ off of this, and this is basically what I want for $\epsilon$ sufficiently small. By the mean value theorem, there must then be some point in the interval $(-\delta, \delta)$ with derivative $\Omega(1/\delta)$. OK? So from this, you can conclude that any polynomial that approximates $\delta/x$ sufficiently well needs to have a derivative of order $1/\delta$ near 0, and therefore, using the Bernstein inequality (near 0, $\sqrt{1 - x^2} \approx 1$), it must have degree $\Omega(1/\delta)$.

Now there's one more thing I wanted to mention, which is that this suggests that if the ill-conditioned part of my function were at the edge of the interval instead of the middle, it would be better, since there the Markov bound allows the derivative to reach $n^2$. And this turns out to be true: if the bad behavior happens near $-1$, for example at $-1 + \delta$, then the degree of approximation you can actually get is quadratically better, like $1/\sqrt{\delta}$.
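You can see numerically that $T_n$ itself is the extremal case for both inequalities (a sketch assuming numpy; my illustration, not from the lecture):

```python
# The Markov bound ||p'|| <= n^2 is attained by |T_n'(+-1)| = n^2, while at
# x = 0 the Bernstein bound n / sqrt(1 - x^2) = n is attained (for odd n).
import numpy as np
from numpy.polynomial import chebyshev as C

n = 7
e_n = np.zeros(n + 1); e_n[n] = 1.0   # coefficients of T_n
dT = C.chebder(e_n)                    # Chebyshev coefficients of T_n'
x = np.linspace(-1, 1, 200001)
print(f"max |T_{n}'| on [-1,1] = {np.max(np.abs(C.chebval(x, dT))):.3f}"
      f"  (n^2 = {n**2})")
print(f"|T_{n}'(0)| = {np.abs(C.chebval(0.0, dT)):.3f}"
      f"  (Bernstein bound at 0: n = {n})")
```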
And this is the principle behind why you can sometimes get a quadratically better dependence on your condition number for positive definite matrices. In the lecture notes, I've linked to a reference that says that if you're given a particular kind of block encoding of a positive definite matrix $A$, then you can get an algorithm that scales as $\sqrt{\kappa}$ instead of $\kappa$. OK, that's it. Thanks.