talking about various benchmarks that have been used to try to verify that we're sampling from the right output distribution in random circuit experiments. The one Google has been most partial to recently is the linear cross-entropy benchmark, XEB. I think of it as an alternative measure of heaviness. We've seen HOG and QUATH, which measured that maybe a little more directly, but it's very similar. So XEB takes two parameters: the output distribution of the experiment, which I'm calling P_exp (we want to assume as little as possible about that), and the output distribution of the ideal, noiseless circuit, P_ideal. In principle we know about that distribution, because we're holding the ideal circuit in our hands. Okay. Great. The XEB is really easy to describe: it's just 2^n, a normalization factor, times the dot product of these two distributions, if you think of them as vectors of exponential length. [Question.] Yes, absolutely, there's a really important reason for that normalization, and it has to do with sample efficiency. We talked about this last time, and I'll get there in a moment; give me a second to get into this. Okay. So it's 2^n times the dot product of the two probability distributions, thought of as vectors. And of course, you can think of this as an expectation value of P_ideal under the experimental distribution. All right. Great. Now, we said one thing we can calculate, if we assume Porter-Thomas statistics (essentially, that each output probability is individually distributed as an i.i.d. exponential random variable), is the expected XEB in the ideal case where P_exp equals P_ideal: the XEB of P_ideal against P_ideal is 2, with very high probability. But if the experimental distribution is uncorrelated with the ideal distribution, for example the uniform distribution, you get 1. Okay. And both of these facts are very easy to compute; in fact, the second one doesn't even use Porter-Thomas, it's just true. Okay, great. Now, why do we care about this? I think this answers the question I just got. Because it can be well approximated from few device samples, via what I'm calling concentration-of-measure arguments. That sounds more mysterious than it really is; it's just the law of large numbers. But it's still not an efficient score, and this is really important: it takes exponential time to compute the ideal output probabilities of the observed samples. So how did we say we calculate this in principle? You have an experiment, and I have in my hand the random circuit the experiment is supposed to implement. What do I do? Oh, come on. Okay, I'll start. You run the experiment, you measure, many times, and you get a relatively small number of outcomes. Yeah? Now, what do I do with those outcomes? Perfect: you compute their probabilities under P_ideal. Is that a completely innocuous step that costs us nothing? No, not at all: that's exponential time. In fact, there's a deep irony in the current theory of quantum advantage, which is that precisely what we were trying to prove was hard in the first few days of this lecture is this task, computing the output probability of a random circuit.
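To make the definition concrete, here's a toy numerical check of those two facts. This is a sketch with made-up helper names, using i.i.d. exponentials as a stand-in for Porter-Thomas statistics; it's only feasible at toy sizes, since the probability vectors have length 2^n.

```python
# Toy check: linear XEB = 2^n * <p_exp, p_ideal>.
import numpy as np

def linear_xeb(p_exp, p_ideal):
    """2^n times the dot product of two output distributions."""
    n = int(np.log2(len(p_ideal)))
    return 2**n * np.dot(p_exp, p_ideal)

rng = np.random.default_rng(0)
n = 12
# Porter-Thomas stand-in: i.i.d. exponential probabilities, renormalized.
p_ideal = rng.exponential(size=2**n)
p_ideal /= p_ideal.sum()
p_uniform = np.full(2**n, 2.0**-n)

print(linear_xeb(p_ideal, p_ideal))    # ~2: the ideal experiment
print(linear_xeb(p_uniform, p_ideal))  # exactly 1: an uncorrelated (uniform) sampler
```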
But then when it comes to verifying that this problem has been solved, that's exactly what we do. Now, it's true that there might be a small advantage compared with just sampling from the circuit, because you only have to compute a small number of output probabilities, whereas sampling from the output distribution with a classical algorithm would presumably take a lot more. But it's a fine line, and we'd be much more comfortable if we didn't have to do this, okay? Great. So this is what we do. We observe some experimental outcomes, z_1 through z_k. Then we take 2^n, that's our normalization, times the ideal output probabilities, each of which we compute on a classical supercomputer, and we take the arithmetic mean: we divide by k, the number of samples. Okay, great. But why is scoring well on XEB classically hard? This is exactly what Aaronson and Gunn studied in 2019. They came up with a problem called XHOG: HOG is Heavy Output Generation, and the X, I guess, is for linear cross-entropy. All right? And it formalizes exactly the intuition for why we might think that scoring well on linear XEB is a hard problem. By the way, what was that intuition? We talked about it quite a bit. Pairs of distributions that score well on XEB tend to do what? They tend to share the heavy outcomes, right? And that's the whole point, all right? So XHOG gets at that property directly. It conjectures that, given a random quantum circuit, it's hard to output k samples z_1 through z_k such that the expectation (an expectation over i chosen uniformly, so just the arithmetic mean) of the ideal output probabilities is sufficiently high: at least b/2^n, where we want b to be like 1 plus epsilon, preferably with epsilon as large as possible, okay? So in an ideal experiment, a noiseless circuit, which of course never happens if we're not error correcting, we're hoping to achieve b = 2, that is, epsilon = 1, right? That just follows from Porter-Thomas, like we discussed a moment ago. Okay, but noise can cause the experiment to have considerably different values of b. Now help me out. If I'm scaling the experiment up and I'm not correcting my noise, would I expect 0.02? 0.005? What would I actually expect? Yeah, something exponential. Depending on the parameters, the deviation above 1 could be, let's say, 2^{-d} for now, but it could be even worse than that, okay? So it's really important that when we talk about this, at least so far, we're really talking about the noiseless case, okay? If someone tells you that you should expect a constant value of b, that's a noiseless statement, okay? Now, Google scored about 1.002 on its 53-qubit RCS experiment, right? Now, why might this be hard? We talked about this. It sounds like a very small number. Every time I say it, the first thing people do is balk: they can't believe a hardness claim was made when you're so close to 1, which is what the uniform distribution would get.
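Continuing the same toy setup, here's what that finite-sample estimator looks like; the sampling line stands in for the quantum device, and at real sizes the p_ideal lookup is the exponentially expensive classical step.

```python
# Finite-sample XEB: average the ideal probabilities of k observed samples.
import numpy as np

rng = np.random.default_rng(0)
n = 12
p_ideal = rng.exponential(size=2**n)  # Porter-Thomas stand-in, as before
p_ideal /= p_ideal.sum()

def xeb_estimate(samples, p_ideal):
    n = int(np.log2(len(p_ideal)))
    return 2**n * np.mean(p_ideal[samples])

good = rng.choice(2**n, size=50_000, p=p_ideal)  # a "perfect" experiment
bad = rng.choice(2**n, size=50_000)              # a fully depolarized device
print(xeb_estimate(good, p_ideal))  # concentrates near 2 with few samples
print(xeb_estimate(bad, p_ideal))   # concentrates near 1
```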
That's literally someone sitting there flipping random coins, right? That gets 1. And we're really excited about 0.002 above 1. But I said that's not entirely crazy. I'm not saying it's not crazy, but it's not entirely crazy. Why? Right, so first, 0.002 is actually reasonably large if you're thinking about an exponential decay, which is the noisy case, the realistic case. What else, though? I think there's actually a more important thing to realize, which is really one of the main reasons Google was happy. Not complexity theorists like me, but Google. Why was that? Yeah. Exactly. They simply plugged this into their supercomputer, they have a big supercomputer, they tried to achieve the same sort of XEB score, and they found it was really, really hard. Now, the initial estimates were something like 10,000 years on a classical supercomputer. I think almost everyone realized that might be a bit exaggerated, and it's since come down. Okay. But I think it's still pretty clear that it's not trivial to get a score like this. And of course the latest experiments have been larger and so on, and, by the way, achieve a somewhat similar score, which is incredibly impressive, right? Because if you have a larger experiment, you're still not correcting your errors, and you achieve the same sort of score, it means your individual gate errors have gone way down. Okay. Great. Now, this seems like a sampling task, though, right? Hold on: scoring well on XEB is really not that far from sampling from the output distribution, which is what we were talking about in the first few lectures. So we're interested in showing that a different problem is hard, which we're going to call XQUATH. And yes, what is the question? [Question about the similar scores.] Yeah, I don't know. That's politics, right? I don't know. I mean, it's not exactly the same score, just relatively close. And you would not expect that: if you blow up the dimension of the Hilbert space, the number of qubits, and you're not reducing your noise, you expect that score to go way, way down. So the fact that it's not meaningfully lower indicates that they've improved their noisy gates a lot. Yeah. Suspicious or not? I don't know. I didn't find it suspicious, because I trust the experimentalists, but yes, that's an experimental claim. Yeah, maybe. Maybe. I don't think so, though. We'll talk about noise, and there's a theory here that I think we understand. I don't really think there's a good reason to be suspicious. [Another question: how should the score scale?] That's a good question. I've been deliberately vague about this the entire lecture, because it largely depends on what regime of noise you're talking about, and what you want to believe about your noise. There are regimes in which XEB is a good proxy for fidelity. There are regimes in which it scales inverse exponentially in merely the depth. There are also regimes in which it scales inverse exponentially not only in the depth but also in the system size. It's very important to think about these things. We'll actually talk about that when we get to noise.
For right now, I simply want you to think that in general you're going to be shrinking exponentially in some parameter, depth or maybe system size, and so these sorts of scores are pretty impressive. Yeah. Okay, great. So there's this XQUATH assumption (the linear cross-entropy version of the quantum threshold assumption), and there's a reduction that says XHOG is hard assuming XQUATH is true. And XQUATH says that there's no efficient classical algorithm that, given a random quantum circuit, produces an estimate of the output probability p(0^n) with the following property. Right? What did I say about this? It's a weird claim. It's not something that comes from your undergraduate complexity book. Right? It's a very customized conjecture. But what's the intuition? What is this sort of saying? What did I say about it last time? Uh-oh. Yeah, exactly. The way you think about this is as follows. You're comparing two algorithms, both trying to estimate the output probability p(0^n) of the random circuit. The first of them just outputs 1/2^n, no matter what the circuit is. That's the trivial thing. Even I can do that: you give me a circuit, I don't even think about it, I say 1/2^n. The second algorithm outputs some estimate p. And we want to say there's no algorithm whose p is too much better than the trivial algorithm in this mean-squared metric, where "too much better" means, after the normalization we'll use below, by more than 2^{-n}. Okay? Cool. So: no classical algorithm can achieve a mean squared error at estimating an output probability of a random circuit that's even slightly better than the trivial algorithm that always outputs 2^{-n}. Okay, so XQUATH implies XHOG, and we sketched how this reduction works. You're going to tell me: how does the reduction work? You may even have proved it in the problem set, so I'm hoping you understood it. So hold on, I'll start. It's by contrapositive. We assume there's a classical algorithm for XHOG. That means there's a classical algorithm that, given a quantum circuit, outputs a whole bunch of outcomes, two-thirds of which are heavy in the ideal distribution; you get that from a Chernoff bound. And then we claim that the all-zeros outcome is heavy if... what? If it's on that list. Otherwise we flip a coin and output heavy or light. Great. Okay, let's keep going. Now here was a surprise. Like I said, there have been many classical algorithms introduced for random circuits, and I have a sort of individual notion of how surprising I find each of them. I actually found this one maybe the most surprising: XQUATH, this conjecture we just stated, ends up being false at sublinear depth. And we were going through the intuition. So, this is XQUATH again. I'm going to define the bias we're talking about in XQUATH, this difference of squared errors, to be the score, just for simplicity. So we want: no classical algorithm can achieve a score greater than 2^{-n}. Yeah? Okay, the intuition. Why did we believe this in the first place, right?
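Before the intuition, let me put the statement itself on the board. This is my reconstruction of the Aaronson-Gunn form of the conjecture, with the 2^{2n} normalization I'm using for the score; take the exact constants with a grain of salt.

```latex
% XQUATH, as used here (a reconstruction; constants hedged).
% An efficient classical algorithm A, given a random circuit C, outputs an
% estimate q(C) of p_C(0^n). Its score is the normalized advantage in mean
% squared error over the trivial estimate 2^{-n}:
\mathrm{score}(A) \;=\; 2^{2n}\Big(
    \mathbb{E}_C\big[(p_C(0^n) - 2^{-n})^2\big]
  - \mathbb{E}_C\big[(p_C(0^n) - q(C))^2\big]\Big)
% XQUATH asserts that no polynomial-time classical A achieves
% score(A) >= 2^{-n}.
```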
I think the intuition, which was said many, many times, was: what's the best thing you can do to solve this problem, to achieve a good score, given a random circuit? The best thing you can do is write out the output probability as a path integral in the computational basis, and sub-sample paths, right? Take a small number of paths, evaluate the value of each path, which is efficient, then maybe take an average and accept if the average is sufficiently large. The problem is that there are too many paths to get any sort of bias that scales like 2^{-n}, okay? But here's the observation: it turns out that's actually not true. So I think this was quite surprising. It's true in the computational basis, but if we instead expand the output probability in the Pauli basis, which we'll discuss in a moment, the values of the paths are highly non-uniform. So the idea is that you can pick a single path very carefully, okay? We'll tell you exactly what that path is, and we can calculate what the value of that algorithm would be: the classical algorithm that simply evaluates the value of that single path. That's a very easy thing to do in the Pauli basis, and it turns out that that very simple algorithm will already contradict the XQUATH conjecture, but only when the depth d is sublinear, okay? Said another way, the score it achieves is something like 2^{-O(d)}. So if d is less than n, that's greater than 2^{-n}, yeah? I'm not holding my breath for deeper circuits, though. Why not? Sorry? No, it's not about the number of paths. Now we're thinking about noise: if you have a super-linear-depth circuit, noise is going to affect you really badly, okay? We'll talk about exactly how in a few slides. Okay, great. So, Pauli path integrals. We denote the normalized Pauli operators by P_n. It's exactly what you think it is: tensor products of I, X, Y, and Z, suitably normalized. The point is that we can write an arbitrary n-qubit density matrix as a sum of these n-qubit Pauli operators, where I'm calling alpha_t the coefficient of the Pauli operator t: it's just the trace of t times rho. Now, we had the computational path integral, and I'm claiming the analog is this: in the Pauli basis, you can write the Pauli coefficient of a particular n-qubit Pauli operator s, after applying a unitary u, as a sum over all possible n-qubit Pauli operators t of the quantity trace(s u t u-dagger), which I'm going to call a Pauli transition amplitude, times the corresponding coefficient of the Pauli operator t, okay? We went through this; the point is I'm calling this trace, with an s and a t, a transition amplitude, okay? Okay, now I claim that we can express any output probability as a Pauli path integral, in analogy to what you were accustomed to in the computational basis, okay? It's not so hard to see: we can write an output probability like p(x), or p(0^n), the probability of seeing all zeros when we measure the n qubits, as a sum over all Pauli paths s, okay?
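Since this expansion does all the work in what follows, here's a tiny numerical check of rho = sum over t of tr(t rho) times t, with normalized Paulis. A sketch at n = 2; in general the sum has 4^n terms.

```python
# Check the Pauli expansion of a random density matrix at n = 2.
import numpy as np
from itertools import product

I = np.eye(2); X = np.array([[0, 1], [1, 0]])
Y = np.array([[0, -1j], [1j, 0]]); Z = np.diag([1.0, -1.0])

def normalized_paulis(n):
    """All 4^n n-qubit Paulis, normalized so tr(t @ t) = 1."""
    for combo in product([I, X, Y, Z], repeat=n):
        op = combo[0]
        for m in combo[1:]:
            op = np.kron(op, m)
        yield op / np.sqrt(2**n)

n = 2
rng = np.random.default_rng(1)
A = rng.normal(size=(2**n, 2**n)) + 1j * rng.normal(size=(2**n, 2**n))
rho = A @ A.conj().T
rho /= np.trace(rho)  # a random density matrix

recon = sum(np.trace(t @ rho) * t for t in normalized_paulis(n))
print(np.allclose(recon, rho))  # True: the alpha_t = tr(t rho) recover rho
```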
Each Pauli path is a (d+1)-tuple of n-qubit Pauli operators, so there's a huge number of them, just like in the computational basis. But the point is that the value of each path, that is, each term in the sum, is relatively manageable: it's just a product of these traces. The way I think about it, there are two sorts of traces. You have the bookends, the first and last terms, which are just Pauli coefficients of the input state and the output projector, and then you have the transition amplitudes in between, okay? Cool. And we define the value of each path to be this function f, which is a function of the circuit, the outcome x, and the particular path s that we're choosing, okay? Now, remember how the algorithm is going to work. We're going to pick an extremely simple path s, calculate the value of that single path, and claim that the score it achieves is something like 2^{-O(d)}. That's the goal. The proof is really simple, mostly algebra, but it relies on two important facts about Haar-random gates, about the Haar measure. I'm not going to prove them, but they're elementary; you can find them proved in many early papers on this subject, like the paper by Harrow and Low in 2009. So here are fact one and fact two. Fact one: let u be a Haar-random two-qubit gate, and let p and q be two-qubit Pauli operators. Then the expectation over u of the squared transition amplitude takes extremely restricted values. It's 1 if both Paulis are the two-qubit identity; it's 0 if exactly one of them is the identity and the other is not; and it's 1/15 in all other cases. That's 2^4 minus 1, and that's the important thing, okay? Great. The second fact, fact two, is orthogonality of Pauli paths, and we'll be using this a lot. Let c be a random quantum circuit with Haar-random gates, and let s and s-prime be two distinct Pauli paths, again (d+1)-tuples of n-qubit Pauli operators. Then the expectation of the product of the values of these paths is zero, okay? I'm not proving these things; it's a little out of scope for these lectures, but each has less than a page of proof, very simple. Okay, so fact one: for two-qubit Haar-random gates, the squared Pauli transition amplitudes have very restricted values. Fact two: for n-qubit paths in P_n^{d+1}, you have this orthogonality property, meaning two different paths s and s-prime have expected product of values equal to zero. Cool. Okay, now we're ready. Here's the algorithm. We're given a random circuit c, and we simply output p, which by definition is 1/2^n, a constant, plus the value of a very specific path s-star, okay? Remember, that's a (d+1)-tuple of n-qubit Pauli operators, and the tuple we choose is simply Z on the first qubit, tensor identity on everything else, repeated d+1 times. That's it. So if you think of it like a circuit of depth d+1, each layer is very, very simple: Z on the first qubit, I everywhere else, Z on the first qubit, I everywhere else, and so on. Yeah? Any question about what the algorithm is? Super easy, right? It's kind of hard to imagine a simpler algorithm. Okay, great. So recall what we want to do: bound this score quantity, okay? I'm just copying and pasting what I had before; this is the score we're getting on XQUATH, yeah? Okay, now, a whole bunch of algebra. None of it is super tricky. We're just going to simplify this expression: all I'm doing is expanding the squares, subtracting, and getting some cancellations, okay? One thing we can do is look at the p(0^n) term, the second term in the expectation, minus 2/2^n times p(0^n). It turns out, and you're going to prove this in your problem set, it's very simple, that the expectation over circuits of p(0^n) is 1/2^n, okay? So that term becomes minus 2/2^{2n} in expectation, and combined with the plus 1/2^{2n} we're left with minus 1/2^{2n}, okay? Pretty simple. All right, great. Now we use another fact you'll also prove, which is very simple: the expectation over c of the value of this particular path itself, not the square, just the value at s-star, is 0, okay? Because of that (the proof is very easy to see), if you write out the p-squared term, do you see what I'm doing, you get a whole bunch of cancellations of any terms that involve the value of the path only linearly, and you're stuck with minus the square of the value of the path at s-star, okay? And we just keep the third term, the 2p times p(0^n), alone; we haven't simplified it yet, okay? Good. Now we simplify again using the definition of p. All I'm doing is looking at that third term and plugging in the definition of p. It's algebraic substitution, that's all it is, okay? And what we notice here: first, because the expectation of p(0^n) is 1/2^n, part of that term cancels with the first term; we have a minus 2/2^{2n} and a plus 2/2^{2n}. Simple algebra. Then the last thing. What we're looking at is this last term, 2 times the value of the path at s-star times the output probability itself, and I'm claiming that by one of the properties we talked about, namely orthogonality, fact two, that simplifies to 2 times the square of the value of the path. How do I see that? It follows from orthogonality, but can someone help me out a little bit? What am I doing here?
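For the notes, here's the whole computation in one place, my compact reconstruction of the steps we just walked through, writing f for f(C, s*, 0^n) and p_0 for p_C(0^n).

```latex
% Uses E_C[p_0] = 2^{-n}, E_C[f] = 0, and orthogonality, which gives
% E_C[f p_0] = E_C[f^2] since p_0 = sum_s f(C, s, 0^n) and cross terms vanish.
\mathrm{score}
  = 2^{2n}\,\mathbb{E}_C\!\big[(p_0 - 2^{-n})^2 - (p_0 - p)^2\big],
  \qquad p = 2^{-n} + f
% Expanding, with p - 2^{-n} = f:
  = 2^{2n}\,\mathbb{E}_C\!\big[\,2f\,(p_0 - 2^{-n}) - f^2\,\big]
  = 2^{2n}\,\big(2\,\mathbb{E}_C[f\,p_0] - \mathbb{E}_C[f^2]\big)
  = 2^{2n}\,\mathbb{E}_C\!\big[f^2\big]
```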
Yeah, okay, I didn't quite hear you, but it sounded like you said the right thing. All I was doing here is writing out p(0^n) as a sum over Pauli paths, and then multiplying that sum by the value of a single Pauli path, namely s-star: that's f(c, s-star, 0^n). Now what happens? Well, we're talking about expectations, right? That's crucial. You have a huge number of terms, but essentially all of them are 0, and they're 0 precisely by orthogonality, okay? So we're only stuck with the quadratic term. Now we have a super simple algebraic expression, and it ends up being 2^{2n}, again a normalization, times the expectation over c of the square of the path value. This is just algebra; if you don't follow it, please go back, it's not hard. [Question.] No, it was 0. The claim was that for any two paths that are not equal, the expectation of the product of the values of those paths is 0, right? That's fact two, the one I'm calling orthogonality. Thanks for asking, though. Any other questions? Okay, this is simple; if you have a problem with it, just go through it and I think you'll find it's okay. All right, great. Still pretty simple, actually. So now let's recall the algorithm. You have this circuit, okay, and then you have the special path s-star. Now remember, we were thinking about each of the c_i as layers of gates. They weren't a single n-qubit gate, and they weren't a single two-qubit gate; they were like n/2 two-qubit gates, right? So say each layer c_i consists of two-qubit gates c_{i,1}, c_{i,2}, and so on. So far we've shown that the score is equal to 2^{2n} times the expectation of this square; that's what we just did on the last slide. So now let's write out what this is. Well, I'm literally plugging in the definition of the path: it's the square of a whole product of traces, okay, and each one involves s-star. Remember, there are d+1 of them, s-star sub d is the d-th one and so on, but they're all the same: each s-star is this Z tensor identity, okay? Now the first and last terms, the bookend terms, are super easy to calculate. It's not hard to see that each of them is 1 over the square root of 2^n; then we square, because the trace is squared, and we get 1/2^{2n} from the two of them together, and that cancels with the 2^{2n} normalization, okay? So that's great. Now everything but the bookends, these terms in the middle, the transition amplitudes, each looks like this: all I've done is take any term that's not a bookend and write it out in terms of what s-star is, which, remember, is Z tensor identity, okay? Great. Now, here's something you can verify by yourself. It's not hard to see at all, by elementary facts about tensor products and traces and how they interact, that rather than thinking of this as an expectation over an n-qubit layer, we can break it apart into a product of expectations over the two-qubit gates in that layer, okay? So this expectation over c_{i,1} is an expectation over a two-qubit gate, the first gate in c_i, and so on, okay? And of course they're all independent, all of these gates in the layer, so we can break them up, and then we group the two-qubit gates that act on each pair of qubits together: which gates act on qubits one and two, and so on. We're just grouping them by elementary properties of the tensor product, okay? Aha, now we're really happy. Why are we really happy, or at least why am I really happy? There's a fact that covers exactly this situation. Which fact was it? Fact one. Good, excellent. What does fact one say, and this might be a hard question, about this squared transition amplitude? Well, here neither of the two two-qubit Pauli operators we're looking at is the identity. Remember what we said about that? One over fifteen. The exact constant doesn't matter so much; what matters is that it's a particular constant and it's not 1. And the rest of these, where both operators are identities, all of them give 1. So the final answer to this whole mess is 1/15 per layer, and remember, we have a product of d of these, so we get (1/15)^d. Thank you so much: (1/15)^d. That's it. That's all it is. Questions? That's literally the entire algorithm. It's very beautiful, it's very simple, but it also tells us something that I think we were surprised about as a community, which is that XQUATH is probably not true for any reasonable setting of parameters. That's not to say that XHOG is false, at least not in the noiseless case, because remember how it worked: we had this reduction, we said if XQUATH is true, then XHOG is hard, and that reduction lost an exponential factor in the bias. And that's precisely what we're getting here, in the sense that if d is sufficiently large, we're getting something that's not so great, but it falsifies XQUATH precisely because XQUATH was naturally asking about an inverse exponential bias, yeah? Okay, let me keep going. I'll tell you some consequences. Okay, so the algorithm is really simple. In fact, it only takes time O(n·d), because we're just calculating a single path, layer by layer. And it scores 2^{-O(d)}; of course, it's exactly (1/15)^d. So if the circuit depth is sublinear, this is a higher score than 2^{-n}, yeah? Okay, great. Now, a similar algorithm achieves a score of 2^{-O(d)} on XEB, not just XQUATH; in fact, essentially the same idea can be used to score 2^{-O(d)} on XEB. But I'm not as concerned about this one. Why am I not as concerned about a score of 2^{-O(d)} on linear cross-entropy, as compared with XQUATH? Any ideas? Here's the answer. Of course, I'll ask you another question first: what score does the noiseless circuit get on XEB? Remember this? It's 2, a constant. So if we're not considering noise (and if we are considering noise, it's a totally different story), this isn't really threatened very much: a 2^{-O(d)} score can easily be beaten by the noiseless experiment, okay?
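By the way, since fact one did all the real work in that computation, here's a quick Monte Carlo sanity check you can run, a sketch with my own helper names, drawing Haar-random unitaries via the standard QR trick.

```python
# Monte Carlo check of "fact one": E_u[tr(s u t u^dag)^2] is 1, 0, or 1/15
# for normalized two-qubit Paulis s, t (both identity, exactly one identity,
# neither identity).
import numpy as np

def haar_unitary(d, rng):
    # QR of a complex Gaussian (Ginibre) matrix, with the phase fix.
    z = (rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))

I2 = np.eye(2); X = np.array([[0, 1], [1, 0]]); Z = np.diag([1.0, -1.0])
II = np.kron(I2, I2) / 2.0  # normalized two-qubit Paulis: tr(P @ P) = 1
ZI = np.kron(Z, I2) / 2.0
XI = np.kron(X, I2) / 2.0

rng = np.random.default_rng(2)
def avg_sq_amp(s, t, trials=20_000):
    total = 0.0
    for _ in range(trials):
        u = haar_unitary(4, rng)
        total += np.trace(s @ u @ t @ u.conj().T).real ** 2
    return total / trials

print(avg_sq_amp(II, II))  # 1: both identity
print(avg_sq_amp(II, ZI))  # 0: exactly one identity
print(avg_sq_amp(ZI, XI))  # ~1/15 = 0.0667: neither identity
```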
It gets more interesting when we consider noise, where, like I said, you do have a decay, and then you end up comparing the decay of the XEB score of the noisy experiment in the asymptotic limit against this algorithm; that can be a different story. But at least in the noiseless case, this isn't that competitive, okay? On the other hand, even in the noiseless case, it is competitive for XQUATH, precisely because we have this exponential loss in the bias. Remember I said the reduction was sort of lossy; that's the point here, that's what I want you to get out of this, yeah? So it doesn't mean that XHOG, at least in the noiseless case, is false. It just means that the reason we thought it might be true, namely that this XQUATH conjecture was true, is false for sublinear depths. Is that clear? Great. Okay. There are alternative methods, there are tensor network approaches, there's a whole bunch of stuff; maybe we'll talk about those at the end, okay? Great. How much time do I have? Half an hour. Perfect, that's great. So now, so far we haven't really talked seriously about noise, okay? And I now want to talk about classical algorithms for random circuits that take advantage of noise, okay? Which is a really interesting topic. It's also super important, because we know that uncorrected noise is like the defining feature of the NISQ era; in fact, if I'm not mistaken, it's what the N stands for, right? So the fact that we've only been talking about what I'd call imprecision, or additive error, which might make sense for a classical algorithm simulating the ideal circuit but does not make sense from the perspective of a noisy experiment, that's super important. So for example, and we talked about this already, Google estimates that their signal, say measured in fidelity, was like 0.2 percent; their noise was 99.8 percent. And every indication is that if they had kept going (that's their 53-qubit experiment) to larger and larger system sizes, but had not corrected their errors, that would have decayed, okay? Exponentially decayed. So what you might want to ask then, of course, is: can uncorrected noise help us to classically simulate near-term quantum experiments, right? So far it seems a little bit unfair, in the sense that the quantum experiment has been super noisy, but when we think about how hard it is classically to simulate that quantum experiment, we haven't been talking about noise. So it's kind of unfair to the classical algorithm in some sense, because it's assuming the classical algorithm has to simulate the exact circuit, not the noisy circuit. Okay, now, for a lot of what's left, until the end, we're going to make an assumption: that we know what the noise is, and that it's only of this form, okay? We're going to fix a noise model, and for RCS, a first reasonable choice is depolarizing noise, okay? This is an assumption that we're going to make, and it's not something we can make lightly, okay? In fact, I think there's a lot of reason to believe that this is really not such a great model for near-term experiments, but it is a first mathematical model that makes sense. I think that's what I want to say about it, okay? And for many reasonable quantum experiments, it's believed to be the sort of dominant noise model, though not all of them. Okay, so what do I mean by that?
Each layer of random gates (two-qubit gates, in our random circuits) will be followed by a layer of single-qubit depolarizing noise with constant noise strength gamma. So gamma is a constant greater than zero, and this is how the noise channel looks. If you've seen depolarizing noise, it's not super surprising. But the property that the algorithm we're about to see uses is super, super simple, okay? It's the following: this single-qubit depolarizing channel has an asymmetry, okay? An asymmetry in the following sense, and you can verify it. It preserves the identity, sends it to the identity; it's a unital noise channel, yeah? But all the other Paulis, X, Y, Z, are traceless, okay? So that second term, the gamma times I/2 times the trace of the input (X, Y, or Z in this case), is going to be zero. So for all the other Paulis, X, Y, and Z, you're going to get (1 minus gamma) times the Pauli. You see that from the expression? Okay, that's the critical property being used in this algorithm. If I had one thing to say about what it uses about the noise, that is it. Very simple, but that's it: there's an asymmetry between how this noise channel treats the identity and how it treats the other single-qubit Paulis. Any question about that? No? Okay, cool. Now, I want to say this. The fact that we're talking about noisy random circuits this way is, I always think, a very presumptuous discussion, not something that we even really want to be doing. I think it's a helpful first step in understanding the complexity of these near-term experiments, but why might we not want to make these sorts of assumptions? It's in some ways counter to the spirit of what we were trying to achieve in the beginning. And why was that? Do you remember, way back in the first lecture, what I said the goal of quantum advantage was in the first place? You want to test quantum physics. You don't want to make a super strong assumption about the noise. So, for example, we would love to be living in an alternative universe, which unfortunately at the moment we don't live in, though maybe we will someday, in which you could simply talk about the hardness of scoring sufficiently well on XEB. XEB, in principle, is noise independent: you simply look at your experimental outcomes, assign them a score based on their ideal output probabilities, and accept if that score is sufficiently high. Now, if we had a proof that that was hard, that would be enough, and it wouldn't take a noise assumption, right? That would be the goal. That would be completely consistent with what we've been doing. But we don't know that sort of thing, and so it's really helpful, and I think very interesting and very informative, to say: let's just assume a very simplified model of noise and see what happens, okay? But I want to be very clear about the very presumptuous situation we're in when I start talking about the hardness of noisy RCS: we're really assuming that we know exactly how the noise works, okay? But bear with me, because this is what we've all been doing, myself included. It's a nice first step to consider just depolarizing noise. That's it, nothing else.
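Here's that asymmetry as a few lines of numpy, a toy sketch of the channel as written on the slide.

```python
# Single-qubit depolarizing channel: N(rho) = (1 - gamma) rho + gamma tr(rho) I/2.
# It fixes I and shrinks X, Y, Z by (1 - gamma); that asymmetry is the whole point.
import numpy as np

gamma = 0.1
I = np.eye(2); X = np.array([[0, 1], [1, 0]])
Y = np.array([[0, -1j], [1j, 0]]); Z = np.diag([1.0, -1.0])

def depolarize(rho, gamma):
    return (1 - gamma) * rho + gamma * np.trace(rho) * I / 2

print(np.allclose(depolarize(I, gamma), I))  # True: identity is preserved
for P in (X, Y, Z):
    # True: the traceless Paulis are just damped by (1 - gamma)
    print(np.allclose(depolarize(P, gamma), (1 - gamma) * P))
```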
So you have a layer of two-qubit gates, n/2 two-qubit gates on n qubits, then single-qubit depolarizing noise on each of the qubits, with constant noise strength. That's also super important for the algorithm I'm about to describe: it's a constant, it doesn't scale with n. And then you keep going. Okay, great. Now, let's quantify this. It's actually really interesting. There's something that's actually very easy to see. Oh, yes? [Question: is the noise rate really constant?] Okay, good, I'm glad you brought this up, and many people bring this up. Is it really constant? Someone help me out, please; we've talked about this so many times. Is it really constant in a 53-qubit experiment? It's what? Yeah, perfect, exactly. There's nothing asymptotic here. I don't even know what "constant" means for a single experiment, okay? This is really important, and it gets messed up all the time. You have a 53-qubit experiment; everything is a constant, right? Now, you can be clever about things. You can say: what if, every two years, Google produces a larger and better experiment? Well, then in some sense you can extrapolate a scaling. But even that's a little bit crazy, because we're not going to infinity; it's like 53, 70, 80, whatever. It's hard to extrapolate to infinity that way, and probably inaccurate. In fact, what you would see, for example, is that the two-qubit errors are actually going down between Google's 53-qubit experiment and their 70-qubit experiment. Does that mean they can keep dropping the two-qubit error rate as n goes to infinity? Certainly not, right? Okay, anyway, this is fine. A very simple point, but a lot of people get confused about it. Okay, cool. So I would say: it's a great question, and we don't know; at finite size, everything's a constant. But your question was also good in that, yes, there are other sources of noise. We know it's not just depolarizing noise. We see that, okay? So this is quite a big simplification, actually. Okay, now let's talk about depolarizing noise. Let me tell you something else that's not so hard to see, but that's really important when we think about this particular noise model. Okay, so hold on. What does single-qubit depolarizing noise do? It sends us toward the maximally mixed state; that's basically the definition, right? Or the uniform distribution, if you think of it as a classical distribution over bit strings. Now, hold on. If we run the experiment we just described: a layer of Haar-random two-qubit gates, then you move a little bit toward the maximally mixed state; another layer of Haar-random two-qubit gates, a little more toward the maximally mixed state; and you keep going and going. What are we going to reach? The maximally mixed state. That's not particularly subtle, right? Now, the interesting question is: what is the rate of convergence to the maximally mixed state, or the uniform distribution? That's not entirely obvious, right? It should be exponential, but exponential in what? Right?
In d? In d times n? It depends, it turns out. But let me tell you the history of this, which is actually really interesting. This is not something that was first observed in this context, not at all, right? People like Dorit Aharonov and others who were thinking about fault tolerance were thinking about this when I was in elementary school, okay? And the first result they have is the following, a result that's worst case over circuits, which is very strong. What I mean by worst case is that it holds for any circuit. So you have any circuit C, then you have noise, and so on, okay? A layer of gates, then noise, then a layer of gates, then noise, and so on. No matter what circuit we're talking about, if you have depolarizing noise, the output distribution of that circuit, measured in the standard basis, is 2^{-Omega(gamma d)} close to uniform, right? Gamma is a constant that, in particular, involves the two-qubit error rate. And so, in other words, what this is saying is that past roughly logarithmic depth, say log-squared depth, we're kind of in trouble, and we're in trouble in a really bad way, in the sense that this is not about random circuits, or about some fixed circuit that we'll never see. This is any circuit. You have any circuit, depolarizing noise at constant rate, and you reach the maximally mixed state, okay? But again, not super surprising, but totally foundational, a really important result, okay? And this is an upper bound. Yeah, question about it? Okay, I think we're good. A question, yes? [Barbara asks about more realistic noise models.] Barbara, it's such an important conversation, and we'll get exactly there. This is an oversimplification. I personally think it's a good first step, but we have a recent paper where we actually consider amplitude damping, T1 decay, all of these things, and I don't want to spoil the punchline, although this kind of will: things look very different when you consider that, even if you consider depolarizing noise interleaved with T1 decay or something, which is not a ridiculous model of what's happening. So I'll talk about this, I hope. But thank you; that's very important, extremely important in this discussion, right? But okay, that's not what we're thinking of at the moment. We're just thinking about the simplified model, which is depolarizing noise with a constant noise rate. So this already sort of rules out scalable noisy advantage at super-logarithmic depth. Note that we're now thinking about scaling, which is sort of contrary to the spirit of what we've been talking about so far. If we're interested in scaling, we can't have super-logarithmic depth, because then we're inverse-super-polynomially close (whatever you want to call it) to something easy, the uniform distribution. Okay, but then there's this really interesting question. Wait a second: that could be seen as an optimistic result, optimistic in the sense that it works for any circuit. Could it be that for random circuits, it's actually much worse than that?
That the decay is much, much worse? In fact, a lot of the Google experiment was based on a conjecture that this happens faster for random circuits, with high probability over the circuit. All right, and they gave numerical evidence. It confused us for a long time, and now I think I understand it. The numerical evidence was that, in fact, the TVD was bounded by 2^{-gamma d n}. Now we're really hosed. If this is true, it's really bad news. Why is it really bad news? I mean, okay, we're talking about random circuits, and not everyone cares completely about random circuits, but still: random circuits are a good model of sort of typical behavior for quantum systems, and if we're saying that the noise really causes decay at this rate, we're kind of in trouble, right? Because even at depth one, what's happening? Oh, come on. It's too small: it's inverse exponential in n. So you can't even play the game of keeping the depth sufficiently shallow to manage the situation, which is the game you could play if you think it's 2^{-O(d)}, because then you can say: okay, let's make it log depth, or constant depth, or something like that, and then maybe we're fine. But actually, they thought it was much faster, okay? Okay, so good. And so, if this is true, it would rule out scalable noisy quantum advantage at any depth. And interestingly, this is what Google was thinking about when they were implementing their experiment. So when people ask: is Google surprised that they think they're reaching the maximally mixed state? No, not at all. In fact, they thought they were reaching it super, super fast, okay? Okay, good. So that then begs the question: how much depth... oops, yes? [Question.] Yeah, okay, this is a fantastic... how much time do I have? Okay, all right. This is a great question. There are so many great questions here; it's a very exciting topic. It doesn't really help them much. In fact, you might think it hurts them, because they're trying to stay away from the uniform distribution, right? Because again, that's something like a man flipping random coins, not a hundred-million-dollar device, right? So they want to stay away from that. But they also really want to understand what their fidelity should be, at least in principle, right? And so they're particularly interested in how fast, at least in an idealized noise model, this should actually decay. Because a big claim that they've made again and again (Google and other people, to be clear) is that the linear cross-entropy score should correspond to the fidelity. And so we're interested in how quickly these decay, so that we can understand what the fidelity should be, okay? Now, there's a whole debate and a whole bunch of literature on rigorously why they're close, and on what assumptions about the noise you need to show that they're close, that XEB matches the fidelity. We can maybe get into that later. But the point is that it's just important to understand. Now, they would of course prefer, at least from this perspective, that it not decay so fast. Okay. Yeah. [Question.] Yeah, yeah: with depolarizing noise alone, you converge to the uniform distribution exponentially.
It's really important that it's depolarizing noise, or at least unital noise, because if you don't have unital noise, it's not even clear that you converge to the uniform distribution. Okay. Good. So now, okay. Oh, yes? Sorry, I didn't see you. [Question: why should random circuits converge faster?] Yeah. Okay, I can give the intuition, and then we can talk about what I can prove, which is what I'm going to talk about in a few bullet points. But I think the intuition would be that random circuits, in many different formal senses, scramble really quickly. They entangle very quickly, and so on. In some sense, that's good news, right? Google likes that random circuits scramble quickly, or entangle very quickly, let's say, whatever that means, though you can make it rigorous. Why would Google be happy that the random circuit entangles very quickly? Why? Yeah: because tensor network algorithms then don't work very well. Tensor network algorithms at fixed size, that is, because tensor network algorithms in general are not efficient algorithms. So they're happy about that on one hand. But by the same token, they're very sad, because they're approaching a trivial distribution. Okay? At least, they think they are. Now, so the question is: how much depth is actually required for quantum advantage? And here's the sort of embarrassing thing that I'm going to tell you: very little of the current theory of quantum advantage that we've been discussing is depth dependent. Now, you could think of that as a good thing: it means very shallow depth works. I actually think of it as a bad thing. What I think is that we're going to need depth dependence, right? Some of our techniques should have depth dependence somewhere if we're going to prove hardness, and probably the reason we haven't proved these conjectures is that they don't. Right? And in fact, there are formal barriers that say we need depth dependence somewhere. The only real ingredient in the current theory of quantum advantage that uses depth in a non-trivial way is called anti-concentration. Now, there's a lot to say about anti-concentration; let me say it very quickly, okay? Anti-concentration is the one current ingredient of hardness-of-sampling arguments that seems to require relatively deep circuits. Okay? So what does it mean to anti-concentrate? Well, there are sort of two definitions, a weak one and a strong one. The original one says something like this: there exist constants alpha and c such that the probability, over the choice of circuit, that an output probability is sufficiently large, at least alpha over 2^n, is at least c. Right? In other words, the distribution is kind of well spread out; that's really what it's saying. Now, there's a stronger statement (it's not entirely clear why it's stronger; it's like a one-page or few-line proof that we won't get into) in terms of what's called the collision probability. You see this all over the literature. It says there exists a constant c such that 2^n times the expectation, over the circuit, of the sum over all outcomes of p(x)^2 is upper bounded by that constant. Why is it called the collision probability? Because it's the probability that you see... what? The same outcome twice, in two samples. Yeah? Okay. So it turns out this second statement is sometimes more useful in the literature, and it implies the first statement. Okay, it's like a stronger form of anti-concentration. Great.
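Written out, this is my transcription of the two conditions from the slide, with the constants as placeholders.

```latex
% Weak anti-concentration: there exist constants \alpha, c > 0 such that
\Pr_{C}\big[\, p_C(x) \ge \alpha \cdot 2^{-n} \,\big] \;\ge\; c
% Strong form (collision-probability bound): there exists a constant c' with
2^{n}\, \mathbb{E}_{C}\Big[ \sum_{x \in \{0,1\}^n} p_C(x)^2 \Big] \;\le\; c'
```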
Now, I want to say, though, and people get this wrong all the time in the literature: this is a statistical property. It does not imply hardness. Right? In fact, in some senses of anti-concentration, the uniform distribution anti-concentrates really well; it's very spread out, and there's nothing hard about the uniform distribution from the perspective of a randomized algorithm. Okay, cool. It's just used as a sanity check, to make sure that the additive errors we're talking about are non-trivial. Remember, we're trying to show the hardness of estimating a random circuit output probability to within 2^{-n}. How do you know the signal isn't overwhelmed by that error? Well, if the distribution anti-concentrates, it's not, because such an estimate would give a good multiplicative-error estimate. And we talked about that a little bit already. Now, until recently, we only knew anti-concentration for 2D circuits at depth square root of n. That's it. Below that, constant depth, log depth, we didn't know; it was conjectured that it didn't happen. Okay? So that means we thought, until very recently, that we needed about square-root-of-n depth for anything interesting to happen. Okay? That's essentially because the theory of anti-concentration was very tied to the theory of approximate 2-designs, and root-n depth is where we start seeing approximate 2-designs. (You know what that means? It's fine if you don't.) We thought we needed root-n depth. Okay. Now, hold on. Why might Google, coming in with this knowledge, not think they would have scalable quantum advantage? Well, because if you have depolarizing noise, at root-n depth you're 2^{-Omega(root n)} close to the uniform distribution, and you need root-n depth to anti-concentrate. So they were not at all optimistic that their experiment would scale with noise. Okay? Again, would they be surprised by Dorit Aharonov's early result from the 90s? No: they were thinking about exactly that. They thought they needed root-n depth for anti-concentration, and at root-n depth her early result said you were already too close to uniform. All right? Okay, great. So then, oh yes? Five minutes. All right, cool. Yeah. It's okay, no excuses about lunch; I'm hungry too. I'd prefer to summarize what I have to say, and then you can ask me questions. Okay, cool. So now: is there any hope for a fully scalable noisy quantum advantage from RCS? Well, until last year, we thought not. That's why we looked at this Goldilocks regime, where we always said we're looking for the middle bear, the middle size. All right? Not too big, not too small. Okay? But then there were two new results, around 2022, that rekindled hope that at exactly log-n depth, we could have a scalable advantage even with depolarizing noise. Okay? Let me tell you the two results. First, anti-concentration at log depth. This was actually really surprising, at least at the time; the idea is that you can divorce anti-concentration from the approximate 2-design property, which kicks in much later. And then a group that I helped lead showed that the TVD for a random circuit is, in fact, lower bounded by 2^{-O(d)} with high probability. Okay? So this faster decay rate is some sort of finite-size effect. It's not something we see if we go very, very large, okay? It doesn't happen. Okay? And of course this means that, even for random circuits, Aharonov's 90s upper bound is tight. Okay.
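So, to keep the three statements straight, here they are in my shorthand, constants suppressed.

```latex
% Worst case, any circuit, depolarizing noise at constant rate (90s result):
\mathrm{TVD}\big(p_{\mathrm{noisy}},\,\mathrm{uniform}\big) \;\le\; 2^{-\Omega(\gamma d)}
% Conjectured from Google's finite-size numerics, for random circuits:
\mathrm{TVD} \;\le\; 2^{-\Omega(\gamma d n)} \quad\text{(turns out to be a finite-size effect)}
% Lower bound for random circuits, with high probability (matching the worst case):
\mathrm{TVD} \;\ge\; 2^{-O(d)}
```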
Now, so that's the question we're asking: at exactly log depth (not any shallower, because you don't anti-concentrate; not any deeper, because you're too close to uniform), can we achieve scalable quantum advantage? Okay. For d equal to log n, what do we know? Well, by the 90s result, the uniform distribution is one over some fixed polynomial close in TVD, because 2^{-Omega(log n)} is like 1/n^c. Okay? But you could still hope that, even though that's a polynomial, for smaller errors you would have an advantage, because with uniform sampling you can't run the algorithm longer and do better. It just is what it is: it's 2^{-O(d)} close, one over that fixed polynomial. So if you demand a smaller total variation distance, one over a larger polynomial, there could still be quantum advantage. Okay? That's what the recent paper by, again, Dorit Aharonov and others (you can see she plays a really central role here), now in 2022, is addressing. It's a classical algorithm, and it's saying that as soon as you have anti-concentration, which holds at log depth in particular, you can have a classical sampling algorithm that gets an arbitrary inverse-polynomial total variation distance, whereas the trivial algorithm that just outputs uniform would only score one over a fixed polynomial. Is it clear what this says? We're not going to prove it. I want to give one slide about the intuition, and then we'll sort of start wrapping up. Okay? Okay. Cool. So how does it work? It's actually a very simple, very beautiful algorithm. Here's the idea. Oh, and, sorry, another thing I want to say: their algorithm runs in time polynomial in n and in 1/epsilon, the TVD, but it scales exponentially in one over the noise rate. So as we make the device less and less noisy, the algorithm gets worse and worse; it's a constant if gamma is constant, of course. Yeah. Okay. It's also not a practical algorithm: even if you believe the noise is depolarizing, it doesn't outperform the finite-size experiments. Okay. Also, the algorithm requires anti-concentration, so it's really only meaningful at exactly log depth, not deeper, not shallower. Okay. And finally, it requires Haar-random gates; there's actually some serious dependence on that. Let me tell you the main ideas. Well, the key idea is: let's express the output probability in the Pauli basis. We just did this, exactly the same notation: f(C, s, x) is the value of each term in this Pauli basis, the product of a whole bunch of traces, right? Now, here's the idea. If you look at what the noise does to the output probability, and this is really easy to see for depolarizing noise (that's very important), you can pull out, in front of each f(C, s, x), a factor of (1 minus gamma) raised to the Hamming weight of the path. Okay. The Hamming weight is the number of non-identity Pauli operators in the path. This should be very familiar: it comes from the asymmetry I told you about in the depolarizing noise channel, that it does something different to the identity than to everyone else. It's saying that for everyone else, X, Y, and Z, we get this (1 minus gamma) factor; for the identity, we don't. So what's the strategy? To compute the output probability, simply throw away the high-weight terms. That's it. That's all it does. It's a very beautiful, very simple algorithm.
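Schematically, this is my paraphrase of the Aharonov et al. algorithm, with a cutoff I'm calling ell; see the paper for the precise error and runtime statements.

```latex
% Noisy output probability in the Pauli path expansion, where |s| is the
% Hamming weight of the path (number of non-identity Paulis in it):
p_{\mathrm{noisy}}(x) \;=\; \sum_{s} (1-\gamma)^{|s|}\, f(C, s, x)
% The algorithm keeps only the low-weight paths:
\widehat{p}(x) \;=\; \sum_{s\,:\,|s| \le \ell} (1-\gamma)^{|s|}\, f(C, s, x)
% There are only n^{O(\ell)} such paths, and the dropped terms are damped
% by (1-\gamma)^{\ell}; anti-concentration is what converts that damping
% into a TVD bound, which is where the poly(n, 1/\varepsilon) runtime with
% exponential dependence on 1/\gamma comes from.
```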
And then, for an appropriate cutoff, you literally go through the low-weight terms one by one and compute, okay? Now, I had a whole bunch of proofs here. You should go through them, because the proofs are not so hard. The one thing I want to say, and I'm not going to go through it because I have like minus one minute, is this. The main point is the analysis: the way we show, or the way they show, that when you throw away a whole bunch of paths, you don't lose too much. That uses anti-concentration, and that's really important. That's why they need log depth. So, strictly speaking, their algorithm requires log depth. It isn't interesting above log depth, simply because there it's trivial to output the uniform distribution, and that works; it's just not particularly interesting if you can do better with uniform sampling. Below log depth, we don't know if it works, precisely because anti-concentration fails, and we need anti-concentration to bound the error this algorithm makes when we throw away terms in the Pauli path integral. Okay, great. So I'm going to skip this and give a few comments, and then I'll stop, okay? A few comments. First of all, this algorithm applies to constant noise rates. Now, there has been literature that still considers unital, depolarizing noise, but with noise strength something like 1/n. And there, maybe not surprisingly, there are certain senses in which you can get back hardness, okay? I'm going to be a little bit vague about what that means; you can look up the so-called white noise model: at that noise scale, the net effect looks like global depolarizing noise. But it's an interesting result. On the other hand, it's not super realistic, if you really think about n going to infinity, for the gate error rate to keep dropping, in my opinion, okay? Now, Google looks at this and says: no, that's actually not so unreasonable, because when we go from 53 to 70 to eventually 100 and beyond, that's exactly what we do. We have to keep dropping the error rate, because otherwise our fidelity gets crappy; otherwise we'd have to error correct. But again, that's not really asymptotic, so I think this depends on how you think of it. It's an interesting setting, but it's not really realistic as n goes to infinity, which is usually what we're thinking about. Okay. Now, this algorithm doesn't spoof current RCS experiments. Can we do better? Can we generalize this new algorithm of Aharonov et al. to other noise models? I don't know the answer to this. We're very interested in it. Let me tell you what I did show, though, sort of answering Barbara's question from earlier. What we can show is that if you have depolarizing noise of any constant strength, followed by amplitude damping of any constant strength (in fact, amplitude damping is not what's essential; non-unital noise is the key), and the two noise strengths are both constant, it doesn't matter which constants: it could be that you have very large depolarizing noise and very small amplitude damping, the important thing is that they're constants as n goes to infinity. Then, in fact, what we can prove is that you do not anti-concentrate at any depth. At any depth. Okay?
So in fact, what this is saying, which is in some sense a disappointing result, I have to say, an important but disappointing result, is that we're going to need new techniques, whether we want to show easiness or hardness. The hardness arguments require anti-concentration; that's not happening in this setting. The easiness algorithms, like the Aharonov et al. algorithm, are really beautiful, but they require anti-concentration too, at least as far as we know; the analysis certainly does, all over the place. It doesn't work here. Great. Gate sets that are far from Haar-random: we don't know. But I think the most important question on this slide, most generally, is: is fully scalable quantum advantage possible, without error mitigation, for any experiment? I'm not an optimist about this, but strictly speaking, I think we do not know the answer to this question. Okay? I listed a whole bunch of work that my group and other people have done on this. I would love it if you check it out; I don't have time to get into all of it, but it's all on the arXiv. I hope you do. Thanks a lot. All right, that's it.