So let's take our seats, please. OK, great. So let's review where we were. I was talking about the proof that the permanent of a matrix with entries in a sufficiently large finite field is #P-hard on average. In other words, it's hard not only to compute the permanent of a worst-case, arbitrary matrix, but even of a random matrix, with high probability. The starting point, of course, is that the worst-case problem is hard; that's a famous landmark theorem in complexity theory. Then, to boost from worst-case hardness, the hardness of an arbitrary instance, to the stronger notion of average-case hardness, we used a simple algebraic property: the permanent is a degree-n polynomial in the n-squared variables, namely the entries of the n-by-n matrix. So here's the setting. The goal is to compute the permanent of an arbitrary worst-case matrix over the finite field that you give me; call it X. I don't know how to do that a priori. But by assumption I have at my disposal a faulty, average-case algorithm O that correctly computes the permanents of most matrices with entries in the finite field F_p. We described it as having a good day and a bad day. On a good day, which happens on a 1 minus 1-over-poly fraction of matrices, the algorithm outputs the correct permanent exactly. On the remaining 1-over-poly fraction of matrices, the bad day, it can err significantly and output something completely arbitrary. I don't know whether X is a good-day matrix or a bad-day matrix, and yet I want to be assured that I compute its permanent correctly. How? It's a polynomial extrapolation argument. We choose n plus 1 fixed nonzero points in our field, call them t_1 through t_{n+1}, and a uniformly random matrix R, which we then fix. I consider the line A(t) = X + t R. Two simple observations spell out the proof. One is the scrambling property: for each i individually, A(t_i) is a uniformly random matrix over my field, because I took a worst-case matrix X and shifted it by something uniformly random, so I get something uniformly random. Two is the univariate-polynomial property: perm(A(t)) is a degree-n polynomial in t, inherited from the algebraic property of the permanent itself, which is a degree-n polynomial in the n-squared entries. Now it's clear what to do. We feed our faulty box O the n plus 1 points to compute perm(A(t_1)), perm(A(t_2)), and so on. A univariate degree-n polynomial is uniquely determined by its values at n plus 1 points, so we can use polynomial extrapolation to reconstruct perm(A(t)) as a polynomial in the single variable t. Once we have it, we simply evaluate the polynomial at 0 to get back perm(X), because A(0) = X by construction. Now, there were several subtleties here that we started discussing, and we'll return to them in a second; first, here is the whole loop written out as a small code sketch.
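This is a minimal sketch, not the actual average-case algorithm, which of course we don't have: the oracle here is simulated by an exact brute-force permanent, the helper names and the choice of points are mine, and the only purpose is to show the mechanics of scramble, query, interpolate, and read off t = 0.

```python
# Sketch of the worst-to-average reduction for the permanent over F_p.
import itertools, random

def perm_mod(M, p):
    """Permanent of an n x n matrix over F_p, by brute force (fine for tiny n)."""
    n = len(M)
    total = 0
    for sigma in itertools.permutations(range(n)):
        prod = 1
        for i in range(n):
            prod = (prod * M[i][sigma[i]]) % p
        total = (total + prod) % p
    return total

def lagrange_at_zero(ts, ys, p):
    """Value at t = 0 of the unique degree-(len(ts)-1) polynomial through (t_i, y_i) mod p."""
    acc = 0
    for i, (ti, yi) in enumerate(zip(ts, ys)):
        num, den = 1, 1
        for j, tj in enumerate(ts):
            if j != i:
                num = (num * (-tj)) % p        # factor (0 - t_j)
                den = (den * (ti - tj)) % p    # factor (t_i - t_j)
        acc = (acc + yi * num * pow(den, p - 2, p)) % p  # Fermat inverse of den
    return acc

def reduce_worst_to_average(X, p, oracle):
    n = len(X)
    R = [[random.randrange(p) for _ in range(n)] for _ in range(n)]  # uniform scramble
    ts = list(range(1, n + 2))                 # n+1 fixed nonzero points t_1 .. t_{n+1}
    ys = []
    for t in ts:
        A_t = [[(X[i][j] + t * R[i][j]) % p for j in range(n)] for i in range(n)]
        ys.append(oracle(A_t, p))              # each A(t_i) individually is uniformly random
    return lagrange_at_zero(ts, ys, p)          # perm(A(0)) = perm(X)

p = 101
X = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
assert reduce_worst_to_average(X, p, perm_mod) == perm_mod(X, p)
```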
Subtlety one is that this only works as described because the success probability of my faulty algorithm O is sufficiently high. If we can make the failure probability 1 over a large enough polynomial, even just a linear quantity like 1 over 100n, then a union bound guarantees that with very high probability all of the n plus 1 points are correct. Once they're all correct, the procedure gives us back our worst-case permanent with high probability. We'll talk more about that today. Another subtlety, very relevant to what's coming next, is that at the end of the day, when we consider quantum mechanics, we really want to work over the complex or real numbers, not finite fields. That's much more natural for our setting once we get to random circuit probabilities, because these are fundamentally random circuits whose gate entries are unitary matrix entries, complex numbers. So let's start with that discussion. What goes wrong when I try to adapt this argument if, instead of finite fields, I have a matrix X over the real numbers and I consider random matrices drawn from an i.i.d. Gaussian ensemble, each entry an independent Gaussian? We talked about this a little, but can someone tell me the obvious problem? Yeah, precisely: the scrambling property breaks. You take a fixed matrix X, maybe a 0-1 matrix, some real matrix, you shift it by something Gaussian, and you get a shifted Gaussian; you don't have the Gaussian ensemble anymore. That's a problem, because the average-case algorithm we're assuming works with high probability only with respect to the i.i.d. Gaussian ensemble; it doesn't necessarily work with respect to this weird shifted Gaussian ensemble. There's an easy way to combat that, though we'll see there's already a lot of subtlety lurking in this easy fix, so maybe it's not so easy after all. You take a different path through the matrices. Instead of A(t) = X + t R, take A(t) = t X + (1 - t) R: t times the worst-case matrix plus 1 minus t times a Gaussian matrix. Now, if my box O only works with high probability over i.i.d. Gaussian matrices, there's a clear strategy for how to choose my points. Over a finite field we didn't even care; any nonzero field points were fine. That's not going to be true anymore. But there is a clear strategy. So again, A(t) is now t X + (1 - t) R. Yeah? Sort of; "check" is maybe too strong a word, but analytically you want that to be true, exactly, and that's the whole hope: you want to make these points A(t_1), A(t_2), and so on as close as possible to Gaussian. Here's a hint: we won't be able to make them exactly i.i.d. Gaussian with the same mean and variance, but there is a strategy that gets very close to that. What would the strategy be? Okay, let me say it one more time.
I would really like this to come from you; it's not that difficult. The curve is A(t) = t X + (1 - t) R. Ah, that's the right idea, but I think it's the opposite of what I was thinking: is t close to one or close to zero? If A(t) = t X + (1 - t) R, we want to suppress the dependence on X, and X is very much not random, so we want t close to zero, absolutely. So what's the strategy? You take a bunch of distinct points that are all close to zero and use them as your t_1 through t_{n+1}; but at the end of the day, when you extrapolate and reconstruct the polynomial perm(A(t)), you evaluate it at t = 1, because that's where you get back the worst-case matrix. Okay, it's very similar to the strategy we're going to apply for random circuits, and you can already see at a very high level, even though I'm waving my hands rapidly, what's going to be tricky here: somehow it seems like we're cheating. We're taking our faulty algorithm and evaluating it at a whole bunch of points that are clustered together, and not only clustered together but also very far from the point we actually care about. We'll see exactly what goes wrong there, but let me tell you at a high level how we're going to be punished for it. If this were exact computation, so we could trust that O computes perm(A(t_1)), perm(A(t_2)), and so on exactly, this would actually be fine; it would not be a problem. But once we consider approximations, where these values are not exact but only additively close to the truth, we'll see that the large distance between our worst-case point at t = 1 and the average-case points that look close to Gaussian makes a huge difference. That's going to be a big part of today's lecture. Questions about that high-level picture before I go on? Ah, good: no, the points are absolutely not independent, and that's the whole point; that's why we're using a union bound. I'm glad you brought that up, because it motivates why we're stuck, at least in this argument, with the 1 minus 1-over-poly success probability: these are fundamentally extremely correlated points. But individually, each one looks Gaussian. That's what we're doing. No, that's not actually true; you might think so, but it's not. Okay, sorry, other question? Approximating the permanent in what sense, worst case or average case? Average case, right; well, that's what we're trying to prove. It's very much the same statement as what we had about circuits, and you can ask the same question about the permanent. I even had a boson sampling line on my slide, if you remember, and that was about this permanent problem. Okay, hold on: I agree and disagree with you. I agree in that average-case complexity is not well defined unless you specify what distribution you're talking about.
Right, so let me say it in my language; it's a very good point. You could imagine giving yourself a much stronger box O that works for a whole family of distributions, which I think is what you meant, and then yes, this would work. Unfortunately, it would not connect back to the sampling picture and Stockmeyer's algorithm and so on, because of how the reduction works. So you can absolutely do that, and it would get you out of this trouble; it just wouldn't connect all the way back to the goal. Other questions? It's a really good point, though. Okay, let's continue. So now, how do we adapt Lipton's argument from the permanent of a random matrix to the output probability of a random circuit, which is what we really care about, because that's what the experiment is doing? The first thing we notice is that the permanent and the output probability of a random circuit actually have a lot of mathematical similarities, and in particular the one we really care about: polynomial structure. I claim the output probability of a random quantum circuit has polynomial structure. To see it, break the circuit into its two-qubit gates C_M, C_{M-1}, and so on. The polynomial structure comes from the path integral, which says we can write an output amplitude of this state, say the all-zeros amplitude, as an enormous sum over an exponential number of paths; those are the intermediate basis states y_1, y_2, and so on up to the last layer. But the value of each path's term is very manageable: it's just a product of M gate entries. So it's not hard at all to see that the output amplitude for any fixed outcome, like all zeros, is a degree-M polynomial in the gate entries of the circuit; the gate entries are exactly the factors in each of those products. In particular, by the Born rule, the output probability p_0 of C is a polynomial of degree 2M. Here I'm going to assume all the entries are real; you can always deal with complex numbers in the obvious way by separating real and imaginary parts. So assume everything is real: you have a single polynomial of degree 2M that represents the output probability in terms of the gate entries of the circuit. Okay, cool, so this is similar. Now let's try to use this polynomial structure to adapt Lipton's proof. What does Lipton's proof do at a high level? We're assuming the ability to compute the permanent of a random matrix, but we want the ability to compute the permanent of a particular matrix that you give me, a worst-case matrix I don't get to choose. So the strategy is clear: take the worst-case matrix and cleverly make it look more random, because the random case is what you have the ability to compute, but scramble it in such a way that there's still some structure you can pull back. We're going to do exactly the same thing for random circuits. Here's a first attempt; it's not going to quite work, but it will fail instructively. Before I show it, here's a tiny numerical check of the path-integral polynomial structure we just used.
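This is a small numerical check of that structure, a sketch with made-up helper names and a brute-force path sum, so it only makes sense for a few qubits and gates; gates are assumed to act on adjacent qubit pairs. The point is that each term in the sum is a product of M gate entries, so the amplitude, and via the Born rule the probability, is a low-degree polynomial in those entries.

```python
import itertools
import numpy as np

def haar_unitary(d, rng):
    """Haar-random d x d unitary via QR of a complex Ginibre matrix."""
    z = (rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))   # fix phases to get the Haar measure

def embed_two_qubit(gate, pair, n):
    """Embed a 4x4 gate on adjacent qubits (pair, pair+1) into the full 2^n dimensions."""
    return np.kron(np.kron(np.eye(2 ** pair), gate), np.eye(2 ** (n - pair - 2)))

rng = np.random.default_rng(0)
n, m = 3, 4
layers = [embed_two_qubit(haar_unitary(4, rng), k % (n - 1), n) for k in range(m)]

# (a) the all-zeros amplitude by plain matrix multiplication
state = np.zeros(2 ** n); state[0] = 1.0
for U in layers:
    state = U @ state
amp_direct = state[0]

# (b) the same amplitude as an explicit sum over paths y_1, ..., y_{m-1}
amp_paths = 0.0 + 0.0j
for path in itertools.product(range(2 ** n), repeat=m - 1):
    y = (0,) + path + (0,)                 # endpoints fixed to the all-zeros string
    term = 1.0 + 0.0j
    for i, U in enumerate(layers):
        term *= U[y[i + 1], y[i]]          # each term: a product of m gate entries
    amp_paths += term

assert np.isclose(amp_direct, amp_paths)
p0 = abs(amp_direct) ** 2                  # Born rule: degree 2m in the gate entries
```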
So the idea is: say you want to compute the output probability of a worst-case circuit with M gates, which I'll call C. That's a circuit you give me; I don't get to choose it. What I'm going to do is fix M Haar-random two-qubit gates, H_1, H_2 through H_M; they're random, and now they're fixed. Here's the idea: we're going to scramble our worst-case circuit, but in such a way that the scrambled circuit implements a tiny fraction of the inverse of these Haar-random gates, H_i inverse. In other words, and this should look very familiar, very much like what Lipton does in his setting, we let the i-th gate of the scrambled circuit, which I'll call C_i prime, be C_i, the i-th gate of the worst-case circuit you gave me, times H_i. Now, if you know anything about the Haar measure, you know that an arbitrary fixed gate times a Haar-random gate is again Haar-random. So I've totally scrambled things, but that alone isn't enough, because I also want some structure I can pull back. So I additionally multiply by a tiny fraction of the inverse of the Haar-random unitary H_i; that is, I multiply by e to the minus i h_i theta, where little h_i is defined through the logarithm of big H_i. Okay, so hold on, help me out: why does this have a Lipton-like structure? Here's a hint: theta is the parameter we should be caring about. What happens when theta is really small? It scrambles, approximately scrambles, absolutely. Why? Because if theta is essentially zero, the gate is very close to C_i times H_i, which is completely scrambled. And what happens if theta is one? We get back our worst-case circuit: at theta equal to one we've implemented C_i times H_i times H_i inverse, those cancel to the identity, and we're left with C_i. So what's the strategy? Very similar to Lipton: take a whole bunch of thetas, all close to zero, do some sort of polynomial extrapolation, and then evaluate the polynomial at one. Exactly; that's the strategy. We take several nonzero but small thetas and compute the output probabilities of these random-but-correlated circuits, which I'll call C prime of theta_1, C prime of theta_2, and so on. We're thinking of around 2M of them, because that's the degree; maybe 2M plus 1. I'm going to be sloppy about constants, it doesn't matter much. I call them random but correlated for the obvious reason: individually each one looks really random, but as we've already determined, they're quite correlated with each other, because they all use the same H_i's. Yes, and we're going to be punished for that, as we'll see in a moment. I agree, it feels like cheating, and we'll see exactly why in two or three slides.
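Here's that scrambled gate written out concretely for a single gate; a sketch assuming scipy, with the same QR-based Haar sampler as before, just to check that the two endpoints theta = 0 and theta = 1 behave as claimed.

```python
import numpy as np
from scipy.linalg import expm, logm

def haar_unitary(d, rng):
    """Haar-random d x d unitary via QR of a complex Ginibre matrix."""
    z = (rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))

rng = np.random.default_rng(1)
C = haar_unitary(4, rng)        # stand-in for the i-th worst-case gate (any fixed unitary)
H = haar_unitary(4, rng)        # the fixed Haar-random gate H_i

h = -1j * logm(H)               # Hermitian generator with exp(i h) = H
scrambled = lambda theta: C @ H @ expm(-1j * theta * h)

assert np.allclose(scrambled(0.0), C @ H)   # theta = 0: the worst-case gate is fully scrambled
assert np.allclose(scrambled(1.0), C)       # theta = 1: H_i and its inverse cancel, back to C_i
# For small theta, the gate is a tiny perturbation of the Haar-random gate C @ H:
print(np.linalg.norm(scrambled(1e-3) - C @ H))
```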
Okay, great, so the strategy is clear. But it's still not quite right, and the problem is really simple: e to the minus i h_i theta is not a fixed-degree polynomial in theta. So what do we do? Well, at least as the state of the art in our original paper on this, going back to 2018, you do what any physicist would do when you hand them this problem: you take a fixed-order Taylor-series truncation of the exponential e to the minus i h_i theta. So here's the new way to scramble. We take the i-th gate of the worst-case circuit, C_i, and multiply it by H_i, but now instead of multiplying by e to the minus i h_i theta, we multiply by the K-th order Taylor-series truncation of e to the minus i h_i theta. This makes us partly happy and partly sad. It makes us happy because now each gate entry is a polynomial in theta, since the truncated series is a polynomial in theta, and by the path integral, so is the output probability of the circuit. So we have a polynomial, which is what we were looking for and didn't have on the last slide. You can now do exactly the same thing: reconstruct the polynomial, which I'll call p, in the single variable theta, and read off the output probability of our worst-case circuit. Why am I not happy, though? Something is a little weird about this. Yeah: it's not unitary. We lost unitarity. In fact, the interesting thing here is how you justify taking these Taylor truncations, which, exactly as was just said, make the distribution we're now supported on consist of circuits that aren't really even circuits; they're ever-so-slightly non-unitary. They're very close to unitary, because you can push the truncation order up to an arbitrary polynomial, so in real terms they're exponentially close in some sense, but they're not quite unitary. What I just described took the first three pages of our fifty-page manuscript; the rest was spent proving the following result. Remember, the goal was never to prove that exactly computing the output probability of the circuit is hard; that was necessary but not sufficient for sampling hardness. What we really wanted was the estimation problem, the additive estimation to within one over 2 to the n. What we show, and I think it's the main technical result of the paper, is that if that's what we care about, the hardness of giving additive estimates of the output probability of the random circuit, then it doesn't matter, formally, whether you take the truncated circuit or the unitary circuit. It's a difficult result, but it relies on a pretty obvious intuition: the truncation error you make when you cut off the Taylor series at a fixed order is so much smaller than the additive error that we want to be hard in the first place. Here's the picture: the red dash is what we're actually showing is hard; that's the output probability of C prime, which we're defining to be the truncated circuit. Before finishing the picture, let me show concretely what that truncation looks like.
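A sketch, again assuming scipy and a plain QR-based random unitary; the specific K and theta are mine. The entries of the truncated gate are degree-K polynomials in theta, and the printout shows both the truncation error and how far from unitary the truncated gate is; in the actual proof K is a large polynomial, so this non-unitarity sits far below the additive-error scale we care about.

```python
import numpy as np
from scipy.linalg import expm, logm

def taylor_trunc(h, theta, K):
    """Sum_{j<=K} (-i theta h)^j / j!  -- entries are degree-K polynomials in theta."""
    A = -1j * theta * h
    term = np.eye(h.shape[0], dtype=complex)
    out = term.copy()
    for j in range(1, K + 1):
        term = term @ A / j
        out += term
    return out

rng = np.random.default_rng(2)
z = (rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))) / np.sqrt(2)
H, _ = np.linalg.qr(z)                   # a random unitary (phase fix omitted here)
h = -1j * logm(H)

theta, K = 1e-2, 6
G_exact = expm(-1j * theta * h)
G_trunc = taylor_trunc(h, theta, K)

print(np.linalg.norm(G_trunc - G_exact))                       # roughly (theta*||h||)^(K+1)/(K+1)!
print(np.linalg.norm(G_trunc.conj().T @ G_trunc - np.eye(4)))  # non-unitarity, same tiny scale
```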
What we want to be hard is the black dash right next to it; it's exponentially close, and it's the output probability of the unitary circuit. But then we're making the conjecture that outputting anything in the much wider interval around the black point, the true output probability of the random circuit, is hard. So intuitively, if we're going to make that strong conjecture anyway, it shouldn't really matter whether the center of the interval is the truncated circuit or the unitary circuit, since the two are extremely close relative to the error we're assuming is hard. We prove that's true. So at the end of the day the truncations should be thought of as happening only in the analysis; they don't change the experiment, they don't change the circuit. But it took a lot of work. And luckily for everyone involved, more recent follow-up work by Ramis Movassagh gives a closely related argument that eliminates the need for these truncations altogether. I'm not going to describe it here, but you can read about it; it's called the Cayley path. It's essentially the same argument I just described, the same Lipton-style polynomial extrapolation; the only difference is how he draws the path, if you will, from the worst-case circuit to the random circuit, and his path has the property that it stays unitary at every point. So we don't need to worry about this anymore, even though it was the primary worry in our original paper. Yes; no, the truncation error should be like one over K factorial, and the point is you can take K to be an arbitrary polynomial. Right, and the point is that the power of the oracle, whether it solves the problem for unitary circuits or for slightly non-unitary circuits, is the same; that's the formal result. And I don't really think about it as an oracle, though you can; I just think about an algorithm that works on average. So we're comparing two algorithms, one that works with respect to a distribution supported on slightly non-unitary circuits and one on unitary circuits, and they have the same power. Yes; we'll talk about exactly the dependence on theta. The truth is theta is super small, so these things come out in the wash, but we'll get more formal about this. The very important question of exactly how close the distribution over circuits at a given theta is to the true distribution we care about, we'll answer in a moment, I promise. Any other questions before we go on? Yes; that's right, and I think this will become clear: you get exponentially good accuracy. Okay, good. What I want to discuss next is what we really care about: the robustness of this method to additive error. This takes us to the research frontier, essentially the end of the literature, which is what we've been working on since 2018, among other things. Okay, so far we've assumed the ability to compute the output probabilities of these random-but-correlated circuits exactly: we assumed we could exactly compute the output probability, which I'm calling p_0, of C prime of theta_1, C prime of theta_2, and so on.
These are the correlated but individually random-looking circuits. But of course the actual setting is approximate; it's additive approximation. So we're given 2M evaluation points, (theta_1, y_1), (theta_2, y_2), and so on, such that for, say, two thirds of these points the value y_i is delta-close to the output probability of the corresponding random-but-correlated circuit. And remember, the delta we want is some constant over 2 to the n; that's what we need for hardness of sampling via Stockmeyer's algorithm, which we established earlier. We're not going to be able to show that, but we'll show it for increasingly larger deltas. That's the goal. The picture I have in my head looks like this: we're really considering two polynomials. There's the polynomial I'll call p of theta, the univariate polynomial that encodes the true output probabilities of these correlated but individually random circuits. And then there's the extrapolated polynomial, which I'll call q; that's the polynomial we actually get access to, say by using Lagrange extrapolation on our points. In this picture, the red points are the approximations to the output probabilities of the random circuits. I should say, now that I look at it, that this picture is a bit optimistic: it looks like all the red points hug the ideal black curve, whereas in fact a third of them could be way out in space. But you get the point: the red points encode approximations to the output probabilities of these random circuits, and what you actually do is extrapolate through them to get the blue polynomial q; that's what comes out of your Lagrange extrapolation. Now, you know that within this tiny subinterval of the unit interval [0, 1], call it zero through theta_max, where theta_max is the largest theta you chose, these two polynomials are pretty close to each other; whatever that means, we'll formalize it in a moment. But what we really want, as we've said many times now, is not the difference of the polynomials on [0, theta_max]; we want the difference all the way over at 1, because that's the worst-case point. In this language, we really want p of 1, which by construction encodes the output probability of our worst-case circuit. So the question to ask is: how close are these two polynomials all the way at 1, in terms of the two parameters we have, namely delta, the error, and theta_max, the largest extrapolation point? The answer was actually given to us, in some sense, by a beautiful result of Paturi from the early 90s. Paturi was not thinking about quantum computing, as far as I can tell; he was thinking about approximation theory. His result says the following, and I think it's really nice: suppose we have a real polynomial in a single variable, z of theta, of degree d, bounded in a certain subinterval of the unit interval of width theta_max.
Then all the way at 1, that polynomial is upper bounded by delta, the bound inside the box, times 2 to the O of d times theta_max inverse. In our case, the polynomial we're going to consider is the degree-2M polynomial z of theta obtained by subtracting the extrapolated polynomial from the ideal one. The picture now looks like this: thinking in terms of z, we've bounded the polynomial inside a box of width theta_max and height delta, your error. But we don't care about it there; unfortunately, we care about it all the way at 1, and the problem is that it can jump, and in general it does. How much? In the worst case, by delta times 2 to the O of d times theta_max inverse. So what is Paturi telling us? If the goal is to reduce the error at the worst-case point, that is, to make z of 1 in the language of the previous slide as small as possible, we have to attack the quantities with an exponential dependence, which are d and theta_max inverse. We could make d smaller. That's really hard; I don't know how to do it, in fact. And careful, there are two d's floating around, which is a little confusing: the d here is the degree of the polynomial, which happens to be about 2m, twice the size of the circuit, and the size of the circuit is the depth times the number of qubits. Reducing that degree means changing the polynomial, which I don't know how to do; it's not even clear it's possible. But wait a second, there's another parameter: theta_max. The other thing Paturi says you can do to decrease z of 1 is increase theta_max, because the dependence in the exponent is on theta_max inverse. That's very intuitive: increasing theta_max means bringing your evaluation points closer to the worst-case point, so of course it should reduce the error there. So the important question is: what, in this argument, determines how large we can set theta_max? Let me tell you what it is. It's the Lagrange extrapolation, the fact that, being perfectionists, we wanted to get every single point exactly right. Remember, we wanted a union bound, and the point of the union bound was to take the success probability of the faulty, average-case algorithm so high that with constant probability, five sixths or whatever, all the points are correct. That means we need the average-case algorithm to succeed with probability at least 1 minus something like 1 over m, maybe 1 over 2m, so that by a union bound we can ensure it's correct on all points. Okay, good. But now let's think about what happens as theta gets larger; this doesn't come for free. Before we see where the penalty comes from, let me put the bound we keep invoking in one place.
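Restating Paturi's bound in the notation of this lecture, with constants suppressed, exactly as it's being used here (this is just the statement from the slide written once, applied to the difference of the two polynomials):

```latex
\[
  z(\theta) = p(\theta) - q(\theta), \qquad \deg z \le d, \qquad
  |z(\theta)| \le \delta \ \text{ for all } \theta \in [0,\theta_{\max}]
  \quad\Longrightarrow\quad
  |z(1)| \;\le\; \delta \cdot 2^{\,O\!\left(d\,\theta_{\max}^{-1}\right)}.
\]
```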
So, as theta gets larger, the scrambled-but-correlated circuit is getting closer and closer to the worst-case circuit and looking less and less random. Go back to what we discussed in the permanent case; it's exactly the same thing. There clearly has to be a penalty for making all of your points essentially the worst-case point. Where does it come in? It turns out, and I'm not going to prove this, though it's not hard if you know basic facts about the Haar measure, that the distribution of the circuit C prime of theta is O of m times theta close in total variation distance to the truly random circuit. What that means is that the average-case algorithm works less well as theta gets larger: it succeeds with probability 1 minus O of m theta on these points; that's just a property of TVD. So if we want 1 minus m theta to be 1 minus order 1 over m, that requires setting theta to be about 1 over m squared. Now take theta_max of 1 over m squared, or smaller, and plug it into Paturi's bound: z of 1, the error at our worst-case point, which is what we care about, is upper bounded by delta times 2 to the O of d times theta_max inverse; with d about 2m and theta_max inverse about m squared, that's delta times 2 to the O of m cubed. So we need delta to be about 1 over 2 to the m cubed to compensate for the blow-up. That's how accurate the estimates need to be if we're going to use this Lagrange method. Questions about that? Good. Excellent, so now we want to improve this, and we were actually stuck here for two or three years. What we knew is that to do better, we would need a new, error-robust means of polynomial extrapolation. We can't use Lagrange extrapolation, because that's what's getting us stuck: demanding, like perfectionists, that the algorithm gets all the points right is precisely why we're stuck in this argument, as I just showed on the last slide. So to improve the imprecision, we need a new, error-robust means of polynomial extrapolation, and we're going to get it by oversampling. Remember, one parameter we had at our disposal is the number of points: so far we always took it to be about the degree of the polynomial, 2M plus 1 or so, but nothing stops us from taking many more points, some larger polynomial number of them. It just makes the reduction polynomially slower. So here's the main theorem we prove. I'm not going to prove it here; it's quite technical, but I want to at least inspire you to read about it. We call it the robust Berlekamp-Welch theorem, because it's similar in spirit to a theorem over finite fields that's used in classical error correction. Here's what it says. Suppose I give you order of D squared faulty evaluation points, (theta_1, y_1), (theta_2, y_2), and so on up to order D squared of them, for a univariate real polynomial p of theta of degree D, with two properties.
First, all of the thetas lie in a narrow subinterval of the unit interval of width 1 over D; that's actually really important, it's 1 over D. And second, we know that at least two thirds of the points are good: y_i is delta-close to the evaluation of the polynomial at theta_i. That's the promise we get. Then what we show is the following. Suppose someone gives you another polynomial, call it q of theta, and the only thing you know about q is that it's delta-close to the y_i on a potentially different two thirds of the points, maybe a different two thirds than the ones the original polynomial is close on; we don't know. No matter what polynomial this adversary chooses, as long as it's delta-close on two thirds of the points, it must be delta times 2 to the O of D close to p of theta for every theta in this interval. Here's the picture, which is actually a more accurate picture of what was going on on the last slide, by the way. The black curve encodes the output probabilities of these random-but-correlated circuits, parameterized by the single real variable theta. We're not lucky enough to be given that polynomial exactly; instead we're given the red points, and the red points are faulty. What do I mean by faulty? Two thirds of them are within delta of the black curve, as you can see, but one third of them are wildly off, and we don't know which is which. What the theorem tells us is that if you hand me any blue curve that hugs a potentially different two thirds of the points, then within the subinterval from 0 to 1 over D, the most the blue curve can differ from the black curve is delta times 2 to the O of D. All right, I'm not proving this, but that's our theorem. Now, this robust Berlekamp-Welch theorem helps us: it's going to allow us to take theta_max larger. Let me say why. Yes, go ahead, sorry. No, I will tell you how we come up with q shortly; good question. For now this is an information-theoretic statement; it says nothing about the complexity of coming up with q, and we do have to worry about that, of course. All right, cool. So here's the idea. We're given these faulty points for the polynomial p of theta, say m squared points, with the degree about 2m: (theta_1, y_1), (theta_2, y_2), and so on. We know two thirds of these points are delta-close to the original polynomial, and one third of them are crazy and off. So here's what we're going to do, and this actually answers Alex's question: we ask an NP oracle to give us any polynomial q that is delta-close to a potentially different two thirds of these points. Okay, wait a second, there are two questions here. One: why is that fair game? It seems like I pulled an NP oracle out of my hat. Why do I have one? It happens that, without loss of generality, in this reduction I have an NP oracle left over. Where did I already use an NP oracle in this reduction? Going from sampling to computing, the Stockmeyer algorithm. So we already have an NP oracle, and we're allowed to use one.
Second question: why is this an NP problem? That's not entirely obvious. An NP problem has to be something you can check. Your adversary sends you a polynomial, and you might worry the adversary cheats you somehow, but he or she can't. Why not? There you go: you just check the points one by one. We are not demanding any relationship between the polynomial the NP oracle sends you and the original polynomial p; we couldn't check that. We're only demanding that the adversary send some polynomial that's delta-close to two thirds of these points, any two thirds, and that you can clearly check by seeing how close the adversary's polynomial is to each of the evaluation points: accept if at least two thirds of them are within delta, reject otherwise. So this is an NP problem. Okay, good. Now, the way I've presented it, it's really a relation problem, where the oracle outputs a whole polynomial; but there's a search-to-decision, binary-search-like reduction, so it doesn't need to be stated that way. I'm not going to get into that, but read the paper; it's not a limitation of the approach. Okay, so the check is easy. Now what does the robust Berlekamp-Welch theorem say? It says that within the small interval, zero through 1 over m, no matter what polynomial q the oracle sent you, you're bounded by a delta prime which, by the Berlekamp-Welch theorem, is delta times 2 to the O of m, with m being about the degree. Okay, now plug in Paturi. The blow-up is delta prime, which I'm just defining as delta times 2 to the O of m, times 2 to the O of d times theta_max inverse, with d again about m. But theta_max is now 1 over m, so theta_max inverse is m. To compensate for the total blow-up, which is 2 to the O of m squared, we need to take delta to be about 1 over 2 to the m squared. That saved us a factor of m in the exponent: we went from 1 over 2 to the m cubed to 1 over 2 to the m squared. Questions? What the difference is? It's all about the fact that we're now willing to tolerate errors: by oversampling we can compensate for them using error-correction techniques, and the quantitative difference, which is very clear, is that it lets us take theta_max to be about 1 over d instead of 1 over d squared. Yes, exactly: we took m squared points, many more points, but spread over a much larger subinterval. Okay, great. But I advertised 2 to the minus m log m, so there's one more trick, the one that takes you from 2 to the minus m squared to 2 to the minus m log m. Remember the input: we're given the same faulty points, (theta_1, y_1), (theta_2, y_2), and so on, m squared of them, each theta in the interval from zero to 1 over m. Before, we asked the NP oracle for a degree-2m-or-so polynomial that delta-agreed with two thirds of these points. Now we're going to do something just a little bit different. Before I give you the trick, let me write that NP check out explicitly.
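Here it is as a few lines of code; a minimal sketch, with the candidate polynomial represented by its coefficient list, which is a choice of mine rather than anything from the paper. The point is only that the verification is a simple pass over the points.

```python
import numpy as np

def accept(points, q_coeffs, delta):
    """points: list of (theta_i, y_i); accept iff |q(theta_i) - y_i| <= delta on >= 2/3 of them."""
    good = sum(
        1 for theta, y in points
        if abs(np.polynomial.polynomial.polyval(theta, q_coeffs) - y) <= delta
    )
    return 3 * good >= 2 * len(points)

# e.g. q(theta) = 0.5 + 1.0*theta, one wildly-off point out of three -> still accepted
print(accept([(0.001, 0.5), (0.002, 0.51), (0.003, 10.0)], [0.5, 1.0], delta=0.01))
```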
Rather than asking the NP oracle for this approximating polynomial q of degree about m, we're going to just substitute variables; it's a purely formal replacement. We replace the variable theta with theta to the k, for some large k that I'll fix in a moment, and ask for the new polynomial q prime in that rescaled variable. Equivalently, and maybe easier to follow: rather than giving the NP oracle the faulty points (theta_1, y_1), (theta_2, y_2), and so on, we give it the points (theta_1 to the power 1 over k, y_1), (theta_2 to the power 1 over k, y_2), and so on. And of course we now also compensate by asking for a larger degree: rather than asking for a degree-m-ish polynomial, we ask for degree m times k. Now hold on, this sounds really counterintuitive. It sounds like all we did is blow up the degree of our polynomial, and that has to be bad news. Why, actually? Because Paturi's blow-up is exponential in the degree. So we just took one for the team there; we made that part of the blow-up worse. But what happens to theta_max? That's where we're saving. Before, theta_max was whatever the largest of the thetas was, and we chose them so it was at most 1 over m. Now the largest evaluation point is that same theta_max raised to the power 1 over k, that is, roughly (1 over m) to the 1 over k. Plug that into Paturi's bound and you see we make progress: z of 1 is upper bounded by delta times 2 to the O of the degree, which is now k m, times theta_max inverse, which is now m to the 1 over k. We're going to take k to be log m. Then the exponent is about k m, which is m log m, times m to the 1 over k, which is sub-leading; it's some constant. You can do that in your head, hopefully, or trust me. So the blow-up is delta times 2 to the O of m log m, and to compensate for it we need delta on the order of 1 over 2 to the m log m. That is the state of the art. Can I take questions on that? Yes, you do need to take more points, but remember we can take as many points as we want, and here we're already taking about m squared of them, so we don't have to take that many more. No, not m to the 2k; you just need about the degree squared. Right. To do any better, if you're only going to use this argument, you have just one knob left; but I actually think that's not the real hope. The hope for doing better is to use something more about the nature of these particular polynomials.
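Since that went by quickly, here is the bookkeeping of the substitution trick in one place, with constants suppressed as on the slides:

```latex
\[
  d' = k\,m, \qquad \theta'_{\max} = (1/m)^{1/k}
  \;\Longrightarrow\;
  |z(1)| \;\le\; \delta \cdot 2^{\,O\!\left(d'/\theta'_{\max}\right)}
          \;=\; \delta \cdot 2^{\,O\!\left(k\,m \cdot m^{1/k}\right)}.
\]
\[
  \text{With } k = \log m:\quad m^{1/\log m} = O(1), \quad
  \text{so } |z(1)| \le \delta \cdot 2^{\,O(m\log m)}
  \ \text{ and we need } \delta = 2^{-O(m\log m)} \text{ to compensate.}
\]
```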
So in fact, and I'll say this somewhat in passing because it's a little complicated, this is essentially tight: you can't really do better if you're not going to use some special property of the polynomial that comes out of this circuit. You can see that because Chebyshev polynomials essentially saturate these bounds. That doesn't mean the polynomial that comes out of this circuit extrapolation is a Chebyshev polynomial; it probably isn't, but we'd have to understand why not, because otherwise you can't beat these bounds in this parameter regime. Yes; okay, I'm not going to do math on the spot, but that factor should all be a constant at the end of the day. You should work it out: it's m to the 1 over log m. Do you agree with me? Yeah, some small constant, like two. It looks scary, but it's not. Any other questions? Yes; it does, absolutely, let's go back, and I hope it does, otherwise I messed it up, which is possible. Here it is: each of the theta_i's has to be between zero and 1 over the degree. Ah, you mean the distance between them; good, I thought you meant the distance to the worst-case point. Yes, there are some technical requirements on the spacing between the points that I'm dropping, but they're not very demanding. You're already paying for the fact that these points are very clustered through this exponential blow-up; that's the intuition. Okay, great. How much time do I have, Alex? Ten minutes, okay. I'd like to keep going, but I'm happy to take questions; this is essentially the end of this particular section. More questions? Yes; you mean this non-unitary thing. Well, we essentially know the answer, because we know exactly how much error we incur in total variation distance between the two distributions, the one over truncated circuits and the one over unitary circuits; you can bound these things, and yes, for theta you do have that dependence. It's a good question, though. Keep them coming; anything else? What if you did, or do you? You do not, in the way I do it. What if you did is a good question; I'm not sure you would gain anything there. My guess is that the error would depend essentially on the largest theta you choose. I'm pretty sure that's what's going on, but you could work it out; we didn't. There are certainly other ways you could imagine extrapolating, and at the end of the day you can also use this Cayley path, which is more elegant in that you never leave the unitaries. Again, we think of that as an analysis trick, but it's a very nice one, in that it saves 25 pages of work or so. Okay. Let me end with some comments and open directions. We still haven't resolved what I consider to be the main question here, which is to push this imprecision to 2 to the minus n. Let me also say that there was another paper on the arXiv relatively recently which claimed robustness 2 to the minus m; that paper has some new results and uses a lot of the techniques you just saw.
We haven't actually been able to verify that paper yet, though, so I'm not stating its claim as a theorem until I can work it out myself. It's not that it's wrong; I just personally can't verify it. Nonetheless, the state of the art could be 2 to the minus O of m; it could be that the ideas are already there. I'm not going to make any more comments about that. For boson sampling, things get a bit better, essentially because the size of the Hilbert space changes. What we want is 1 over e to the n log n, because that's the dimension of the relevant Hilbert space in a boson sampling problem with m equal to n squared modes, where n is the photon number. If you work the same argument out, you get 1 over e to the 6 n log n. So we're off by a factor of six in the exponent, but notice you still have the n log n, and the degree here is n, because the permanent has degree n. So it's really the same argument; it just gets you a little closer to what you want, since the target is e to the n log n. The other thing is that we've also discovered barriers, which I'm not really going to describe, saying that we'll need new proof techniques if we want to get further. I already mentioned one: the Chebyshev polynomial that in principle saturates these upper bounds. But there are others as well, suggesting that getting from six to one in the exponent for boson sampling, for example, might be harder than we think. Okay, great. I want to move on, though; this starts a very new topic. Oh, yes. Let me restate your question a little more precisely, because it shows how important it is. All of these arguments really use the degree; they pay an exponential blow-up in the degree, and I don't think there's going to be a way around that. The problem is that for random circuits the degree is m, which is n times the depth, always bigger than n, and we want n. That seems really problematic, and you can think of it as a barrier. On the other hand, there are a few hopeful directions I'll put out there. We don't have an answer; if I had an answer, I would have solved the problem, and it's a hard problem. One is that this new paper, which we're still looking at, gets to 2 to the minus m by doing exactly that: it finds a new polynomial which actually does have a reduced degree in some sense. Again, I don't want to say more about the paper, but I think there's an interesting idea there. The second thing I'll say is that to do better, we've probably reached the point where plain polynomial extrapolation won't suffice, precisely for this reason: you're always going to pay an exponential in the degree if you just use polynomial extrapolation. The hope would be that the polynomials actually arising here can be approximated by polynomials of considerably lower degree. We've done a lot of thinking about this, and I think it's actually somewhat promising. Yes, please; yes. I talked a little bit about that already, but it is confusing, so maybe I'll spend a little time on it.
In the beginning, when we talked about these sum problems, the quantum sum problems and the classical sum problems, which is a much simpler discussion, we were always talking about a multiplicative error. But when we moved to the output probabilities of random circuits and this estimation problem, we went back to delta being an additive error. I think I said at the time, and I'll say again because it's important: the quantity we want is 1 over 2 to the n. That happens to be 1 over the Hilbert space dimension, but more importantly it's also the size of a typical outcome probability of a random circuit. And that means that if you have a 1-over-2-to-the-n additive-error estimate of the outcome probability of most random circuits, that suffices to give you a 1 plus-or-minus 1-over-poly multiplicative-error estimate. Precisely because, if you multiply out a multiplicative error, the additive error you make is epsilon times the quantity itself; and if you can show that the size of that quantity is typically around 1 over 2 to the n, then the additive error that gives you a multiplicative error is about 1 over 2 to the n. So I made a sleight of hand: I started with multiplicative error and then switched to additive error. Why would I do that? What was I trying to show next? It really was a multiplicative error; I then wrote it additively just by multiplying out what the multiplicative error was. But why? Because of what I could actually show with these polynomial extrapolation results: additive errors. I can't show hardness at the multiplicative level, which would be the 2 to the minus n scale, so I wrote everything out additively. That was a pedagogical choice, not a necessity. I could equally have said: what I want is a multiplicative error, and I can't get it; instead I can get hardness for additive errors that are getting larger and larger, closer and closer to 2 to the minus n. Great. How much time? Okay, I don't want to start a totally new section, but I'm happy to take one or two more questions. We're all out of questions; okay, let's take them offline. When we come back, we'll talk about benchmarks. These are much closer to the experiment, much closer to how we verify the experiment, and they ask whether there's hardness in different signals: not bounded TVD, but other things you can ask about the output distribution of the experiment. Thanks a lot; I'll see you next time.