All right, great, I'm going to start. So if you could sit down, that would be great. OK, so today we're going to switch gears a little bit, and I'll tell you about a second general category of hardness arguments for these quantum experiments. We're going to talk about the hardness of scoring sufficiently well on various benchmarks that people have considered. So here's my pitch for why this might make sense. So far we've modeled the error in our experiment, our random circuit, purely in terms of bounded total variation distance (TVD). That's the only sense in which we were talking about error. But that's extremely restrictive, for several reasons. One, because TVD is difficult to measure. And I mean that in a really formal sense: imagine you have some distribution, like the output distribution of some near-term experiment, and it's spitting out samples. Your goal is to test whether those samples are close in TVD to the ideal, noiseless outcome distribution. Well, there are famous results that give sample complexity lower bounds for that task, showing that you need an exponential number of samples from the box to test whether the TVD is small. And in fact, this is true even for comparatively modest tasks: even if you just want to tell whether the distribution is uniform or far in TVD from uniform, that already takes an exponential number of samples. So TVD is not something we can measure, at least not without further assumptions about the device. And the second thing, which I also mentioned before: closeness in total variation distance is not a reasonable model of uncorrected noise. So far we've been talking about asymptotics, where the system is getting really large. And it only really makes sense to talk about sufficiently small constant TVDs, because the uniform distribution is already only a constant away in TVD from the ideal distribution. But if the system size is growing while the TVD stays a fixed small constant, that means that in some sense the error isn't changing as the system gets bigger, which is not a particularly reasonable model for near-term experiments. Those tend to see the fidelity, the quantum signal, decay exponentially if the noise isn't reduced as the system size grows. So I think it's really natural to ask, and I think this is the starting point for these benchmarks, precisely this question: is there a quantum signal in random circuits that's somehow easier to verify and easier to implement, and might make more sense for a noisy experiment? So what are the candidates? Well, almost all of them rely on a property of ideal random circuits which we're going to call the Porter-Thomas property. What does it mean? It means that each output probability is exponentially distributed. A little more formally: for any fixed outcome $x$, the probability over the choice of circuit $C$ that the outcome probability $p_C(x)$ equals $q/2^n$ decays exponentially in $q$, that is, $\Pr_C[\,p_C(x) = q/2^n\,] \propto e^{-q}$.
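By the way, you can see Porter-Thomas for yourself in a few lines of code. This is just my own illustration, not something from the slides, and it uses honest Haar-random unitaries rather than circuits:

```python
import numpy as np

rng = np.random.default_rng(0)

def haar_unitary(dim, rng):
    # QR decomposition of a complex Ginibre matrix gives a Haar-random
    # unitary, after fixing the phases of R's diagonal.
    z = (rng.standard_normal((dim, dim)) + 1j * rng.standard_normal((dim, dim))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))

n = 6                       # qubits; dim = 2^n = 64
dim = 2 ** n
probs = []
for _ in range(200):        # 200 Haar-random "circuits"
    U = haar_unitary(dim, rng)
    probs.extend(np.abs(U[:, 0]) ** 2)   # output distribution on |0...0>
q = dim * np.array(probs)   # rescaled probabilities 2^n * p(x)

# Porter-Thomas says q should look like an Exp(1) random variable:
print("mean of 2^n p(x):       ", q.mean())                 # ~1.0
print("Pr[2^n p(x) > ln 2]:    ", (q > np.log(2)).mean())   # ~0.5, median check
print("mass above the median:  ", q[q > np.log(2)].sum() / q.sum())  # ~0.85
```

That last number, about 0.85, will matter in a moment.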
So what this property means is that the output distribution is not flat, not uniform, but it's kind of flat-ish, in the sense that most of the outcome probabilities are around $1/2^n$. Some are a little heavier than that, by a constant factor; some are a little lighter. But you don't have a tremendous variation of probabilities in this distribution. Now, where does this come from? This is actually a little subtle, so let me tell you what I know about it. It's true for Haar-random unitaries; let me tell you what I mean. Suppose you take a truly Haar-random n-qubit unitary. This is not a circuit; it's a matrix, a $2^n \times 2^n$ matrix. And I ask you: what is the distribution of each entry of that matrix? This is well known in the random matrix theory community, and the answer is that each entry is approximately a complex Gaussian. Now, an entry is an amplitude, and the right analogy for a probability is its squared magnitude. So an output probability is like the squared magnitude of a complex Gaussian, and that's exactly an exponentially distributed random variable. Now, this Porter-Thomas property is conjectured to hold even for approximations to the global Haar measure. For example, we think we see it even in relatively shallow random quantum circuits. But generally speaking, this is not proven. It's numerically observed, and a lot of people believe it even at relatively shallow depths, say depth $\sqrt{n}$ on a 2D grid, but it's not something we have a proof for. And it's not something we know how to derive from a property like being a 2-design or a k-design. So this is really based on numerics. But based on this property, which again implies that most of the outcome probabilities are pretty similar, some a little heavier than others, there's something natural we can ask: in terms of quantum signal, could it be that there's a hard quantum signal in outputting heavy outcomes? What do I mean by that? Well, let's say we choose a random circuit, and now we fix it. We run it on the all-zeros state many times. When we measure, naturally we're going to get outcomes that occur with relatively heavy probability; we'll formalize what we mean by heavy in a moment. The heavier outcomes are more likely to come out. So the question is: how difficult is this to do classically? If I give you a random quantum circuit, how difficult is it to figure out what the heavy outcomes are, something a quantum computer does naturally? And this is, I think, exactly the intuition behind one of the first benchmarks for random circuits, called Heavy Output Generation, which we'll abbreviate HOG, due to Aaronson and Chen from 2017. Here's the definition they use for what heavy means: with respect to the output distribution of the circuit, we call an outcome $x$, which is just an n-bit string, heavy if its probability is above the median of the outcome probabilities of the circuit. There's nothing particularly special about the median; it's just something that nicely formalizes what we mean by heavy. And if you're like me and you haven't thought about exactly what the median means for a while, it's a very simple concept: you simply sort the vector of outcome probabilities.
I'm thinking of the outcome distribution as a vector of $2^n$ probabilities, and you choose the middle one. That's the median, OK? OK, great. So now here's the HOG problem. We're given a random circuit, and we're asked to output strings $x_1, x_2, \dots, x_k$. Think of $k$ as a constant if you like; it doesn't matter too much. And we want at least two thirds of those strings to be heavy, heavy now meaning heavier than the median of the outcome distribution. So this is something we can do with a quantum computer, like I mentioned earlier, simply by taking the random circuit, running it on all zeros, and measuring many times. This uses the Porter-Thomas property: notice the strategy would clearly fail for plenty of non-random circuits, since in general there's no reason so much mass has to sit above the median. But what you can calculate using the Porter-Thomas distribution is that with very high probability over the choice of circuit, the mass of the distribution above the median is bounded away from a half by a constant: it's $(1+\ln 2)/2 \approx 0.85$. And so we can use a Chernoff bound to show that at least two thirds of the outputs are heavy with high probability. This is not so hard; it's the same idea as flipping a biased coin and estimating the bias. So, questions about this? It should be pretty clear what HOG is and why we can solve it with a quantum computer. [Question from the audience.] Right, you can do other things, but you'd have to be careful about the parameter settings; they're not completely arbitrary, and there's a complicated dependency between a number of them. Other questions? This is not too hard though, right? That should be the message. OK. Now, why is this attractive to us? Well, first things first: HOG doesn't seem too different from the tasks we were talking about before. It kind of seems like a sampling task. Strictly speaking it's not a sampling task; it's a relation task, if you want to call it that. In other words, the input is a random circuit and the output is a bunch of outcomes, not necessarily samples. But it's clearly not too far away from the task of just sampling from the output distribution, because of course, if you sample from the output distribution, you're going to get heavy outcomes. That's the whole point. So it seems like just a weaker sampling task. So the question is, why should this be hard for a classical computer? We spent the last few days talking about the hardness of sampling, and now we're making the task weaker: to solve HOG you don't necessarily have to sample from the distribution, you just need some proxy for which outcomes are heavy and which are light. It's a formally easier task, so proving it hard should be even harder. So Aaronson and Chen think about this, and the best thing they come up with is the following: the HOG problem is classically hard assuming another conjecture, which they call QUATH; it stands for the QUAntum THreshold assumption. So let me tell you what QUATH says. It says: no efficient classical algorithm takes as input a random circuit with a much larger number of gates than qubits, so the number of gates m is much greater than n,
and decides whether a single fixed outcome probability, say the all-zeros outcome, is heavy, meaning above the median, with success probability $\tfrac{1}{2} + \Omega(2^{-n})$; that is, even a bias over random guessing that is merely exponentially small in n should be out of reach. So this is obviously a custom conjecture; I'm not aware of anyone making it before these random circuits were considered. But what's the motivation? Have we made progress? I think it's fair to say it's not clear. In other words, why not just conjecture that HOG is hard and be done with it? Why is going through a reduction, showing that this QUATH conjecture suffices for the hardness of HOG, interesting? I think the motivation is something like this. Unlike the sampling task, it seems, at least at a high level, that the QUATH task looks like a computation of an output probability. It's a non-standard computation of an output probability, because it's not asking you to output the probability to really high precision like the problems we were talking about before; it's just asking you to take a fixed outcome and gauge, roughly speaking, how heavy it is. But it seems like we might have better insight into what the best classical algorithm for this sort of task could do. Whereas HOG is a sampling-like problem, and I think the thought was that there the landscape is much less clear. Here, it seems really hard to imagine how you would solve the QUATH problem without somehow estimating the output probability. That's the intuition; there's not much more to it than intuition. But I think the intuition is also supported by the history. Think about what we were doing in the last few days: we started with a sampling problem, not too different from HOG, in fact a stronger version of HOG, and then we immediately reduced, using the Stockmeyer reduction, to a computation problem. And the reason we did that was that we could then get a handle on the hardness; we could make very non-trivial complexity-theoretic statements. Admittedly, we still couldn't prove the hardness without conjectures and so on, but we could at least say something more interesting than "OK, we'll conjecture it's hard to sample because I don't know how to do it." This is a very similar intuition, and I think that's important to appreciate. The point is that HOG seems extremely close to what the quantum computer is really doing; it's not that different from sampling. QUATH looks like computing an output probability. We actually do have some good evidence that computing output probabilities of random circuits is hard (you just saw that), and we also understand better the type of classical simulation algorithms that exist for computing these sorts of quantities. So it seems like making this conjecture was a bit less of a shot in the dark. That was the idea, OK? Another thing I want to say about this conjecture: the bias scaling exponentially in n, the number of qubits, rather than in m, the number of gates, is really important, because remember we've made the assumption that m is much greater than n, so there's a big difference between $2^{-n}$ and $2^{-m}$. And it turns out that it's not hard to show, which means I won't quite show it, but I'll hand-wave it, that there are classical algorithms that do solve this heaviness problem with a bias that scales exponentially in the circuit size m, rather than in n, OK?
And the way you do this, essentially, is you write this probability out as a Feynman path integral, much like we did a day or two ago in the worst-to-average-case reduction. And you just randomly guess a small number of paths, say polynomially many, evaluate the values of those paths, and accept if the average is larger than some predefined threshold, which I won't calculate, but it's pretty simple. And what's not too hard to see is that this works with a bias over random guessing (over the half) that scales like the number of paths you evaluated divided by the total number of paths in the Feynman path integral, and the total number of paths is exponential in the circuit size m. The rough picture is that each path contributes about the same in expectation (that's actually true in expectation), so the best you can seemingly do is evaluate a feasible number of them, polynomially many. That gives you some signal, but a very small one, OK? OK, that's the motivation. Now, how does the reduction work? Maybe you can help me with this, because it's actually not that hard. So I want to show that if QUATH is true, that is, if it's hard, given a single outcome like all zeros, to decide whether it's heavy or not, even with an inverse-exponential-in-n bias, then HOG is hard. In other words, it's hard to output a list of outcomes, most of which, say two thirds or more, are heavy in the output distribution. So how would you do this? Any ideas? There's sort of an obvious thing you can do. Maybe I'll start and then you jump in, OK? We're going to prove this by contrapositive. We want to show that QUATH true implies HOG hard; contrapositive, HOG easy implies QUATH false. So let's say HOG is easy. What does that mean? We have an algorithm, a classical algorithm, though that's not super important for this reduction. It spits out outcomes, and at least two thirds of these outcomes are heavy in the output distribution of the circuit we feed the algorithm, OK? Now, to falsify QUATH, we want to take a particular outcome, like all zeros (of course there's nothing important about all zeros, it could be any fixed outcome) and decide: is it heavy or not? OK, can you help me out? I'm going to start: I use the HOG algorithm. I run it. It outputs a list. What do I do? [From the audience: check whether all zeros is on the list.] That's it. That's all you do. Now hold on, you can already see why the bias takes a huge hit, right? Because this distribution is essentially flat. Not exactly flat; if it were exactly flat this would all be meaningless. But it's essentially flat, not too far from flat. So even if $0^n$ is heavy, it maybe occurs with probability around $2/2^n$ or so. There's only an inverse-exponential chance that you'll ever see it on the list: it's about $2/2^n$, not 0.5 or anything like that, it's super small. But the point is just that there's a small bias, a little better than guessing, exponentially small in n: the all-zeros string is slightly more likely to appear on your list if it's heavy. OK, so let's prove it, although I think this is one of those things where the proof is more confusing than the intuition, which is what I just said. It's essentially the same.
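Actually, before the formal version, let me write the naive reduction down as a little sketch. This is my own pseudocode: `hog_solver` is the assumed HOG algorithm, treated as a black box, and `circuit.n` is just however you read off the number of qubits; both names are mine, not from any paper.

```python
import random

def quath_guess(circuit, hog_solver, k):
    """Guess whether p_circuit(0...0) is above the median, given a HOG solver.

    hog_solver(circuit, k) is the hypothetical classical algorithm assumed
    by the contrapositive: it returns k strings, at least 2/3 of which are
    heavy for `circuit`.
    """
    outputs = hog_solver(circuit, k)
    if "0" * circuit.n in outputs:       # a "hit": all-zeros showed up
        return "heavy"
    # a "miss": fall back on an unbiased coin flip
    return random.choice(["heavy", "light"])
```

The coin flip on a miss is important: it pins the success probability at one half plus whatever tiny advantage the hit contributes.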
But let's go through it anyway. I think this is also a problem on the problem set, so we'll get a good review here. OK, so let's do this more formally. It turns out it's easier, rather than asking whether the all-zeros outcome is heavy, to analyze the problem of asking: given a random circuit C and a uniformly random outcome z, which is an n-bit string, is z heavy for C? But these two problems are equally hard. In other words, it doesn't matter much whether we're thinking about a fixed outcome, or whether I choose a random outcome and then ask if that's heavy. Does anyone see why, or have an intuition for why that's true? So, two problems. Problem one: given a random circuit, is the all-zeros outcome, a fixed outcome, heavy or light? Problem two: given a random circuit and then a uniformly random outcome z, is z heavy or light? But these are the same. [An answer from the audience.] Yes, precisely. So that's exactly what we do. We take the random circuit, we draw a uniformly random z, an n-bit string, and then we add a layer to the circuit that just permutes the outcomes. Formally, you add an X gate on the i-th qubit if the i-th bit of z is one, and otherwise not. So this new circuit, call it C′, has the property that the string z has exactly the same probability under C′ as the all-zeros string has under the original circuit C. But that alone is not enough. We also want to show that the distribution of C′, the random circuit C followed by this extra layer of X gates, is still the right random ensemble. [Audience.] Yes, exactly. This is an invariance property of the Haar measure: you take a random circuit built from Haar-random two-qubit gates, and you append a layer of X gates on some subset of the qubits; composing a Haar-random gate with a fixed gate leaves it Haar-random, so C′ is still a random circuit from the same distribution, OK? So in fact these problems really are the same; it just helps the analysis to think not of the fixed outcome $0^n$ but of a uniformly random z, OK? Great, so that's the idea. Now the strategy: I said it's the same as before, I would say it's extremely similar, there's one difference, so let me spell it out. Basically what we said: you use the assumed HOG algorithm (which we have because we're taking the contrapositive) on C′ to output a list of outcomes $z_1$ through $z_k$, such that at least two thirds of them are heavy. Then, rather than just checking whether z appears anywhere on the list, it's easier to analyze the variant where you pick a uniformly random element of the list, and you say "heavy" if you get a collision, that is, if the uniformly random element you chose happens to equal z, which itself was uniformly random, OK? [Audience question.] Yes, right: you pick, not deterministically, a uniformly random element of the list and check whether it equals z. If it does, you say heavy; otherwise you don't just say light. There's a smarter thing you can do. If you get a hit, you say heavy. If you get a miss, you flip a coin, yeah?
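Since the hiding step is the one piece of real structure here, here's a tiny numpy check of it. This is my own illustration: the circuit is stood in for by a Haar-random $2^n \times 2^n$ unitary, and the X layer is implemented as the XOR relabeling it induces on outcomes.

```python
import numpy as np
from scipy.stats import unitary_group

rng = np.random.default_rng(1)
n, dim = 3, 8
U = unitary_group.rvs(dim, random_state=1)   # stand-in for a random circuit C

z = rng.integers(dim)                        # a uniformly random outcome z
amps = U[:, 0]                               # amplitudes <y|C|0...0>

# Appending X gates on the qubits where z has a 1 just relabels outcomes:
# <y| X_z C |0...0> = <y XOR z| C |0...0>.
amps_prime = amps[np.arange(dim) ^ z]

p_C = np.abs(amps) ** 2
p_Cprime = np.abs(amps_prime) ** 2
print(np.isclose(p_Cprime[z], p_C[0]))       # True: z under C' = 0...0 under C
```

With the hiding step in hand, back to the collision test.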
OK, let's analyze this. It's really simple, and you're going to do it on your problem set, so I'll go through it quickly and you should review it afterwards. How do we analyze it? Well, it's easy to see there are two ways to get the answer right. One is if you get a hit. With what probability does that occur? You're comparing a uniformly random element of the list against a uniformly random target, so about $1/2^n$, times two thirds. Why two thirds? Because at least two thirds of the entries on the list are heavy; that's the HOG guarantee. The other way to get it right is to get a miss, which occurs with probability $1 - 1/2^n$, and then flip the coin correctly. Because heavy means above the median, a uniformly random z is heavy with probability about one half, so the coin is right with probability one half. Then you do the algebra: the success probability is at least $\frac{2}{3}\cdot 2^{-n} + (1 - 2^{-n})\cdot\frac{1}{2} = \frac{1}{2} + \frac{1}{6}\cdot 2^{-n}$, a bias of order $2^{-n}$ over random guessing, which is what we wanted. I'll let you work out the details, but that's just doing the math. [Assorted questions.] Right, you can do all of these variants depending on the parameters you choose in the problem. And not quite yet: we're still in theory land, we only care about hardness; we're not yet asking how you'd check any of this in a real experiment. Yes, in the back? That's right, that's exactly right: that's because there are classical algorithms that achieve bias of about $2^{-m}$. So m can't be arbitrarily large relative to n; think of m as something like $n^2$, say. That's a safe parameter setting. You could do better by tracking the exact coefficients, but we won't; let's just set it to $n^2$. More questions before I go on? OK, great, let's keep going. All right, so that's how the reduction works. Now, Google, in 2016 and then later in 2019 with their experiment, actually ended up using a different but very closely related benchmark to test their device. They call it the linear cross-entropy benchmark, which we'll abbreviate XEB. Let me describe what this is, and if you get lost in the details, the thing to remember is just that it's an alternative measure of heaviness. I don't know a formal relationship between the HOG problem and scoring well on linear XEB, but the intuition is exactly the same; that's really what it is, and that's the important thing. So here's what it is. The XEB is a function of two distributions. One I'll call $p_{\text{exp}}$, the experimental distribution: that's whatever your box is outputting, and we want to make as few assumptions about it as possible; we don't want to be presumptuous about what that distribution is. And then $p_{\text{ideal}}$: you have in your hand the randomly chosen circuit, now fixed, and the output distribution of that circuit when you measure all n qubits in the standard basis is $p_{\text{ideal}}$, OK? And what is the XEB? It's super simple, really easy to state: it's $2^n$, a normalization factor you can mostly forget about, times the dot product of these two distributions thought of as $2^n$-dimensional vectors: $\mathrm{XEB} = 2^n \sum_x p_{\text{exp}}(x)\, p_{\text{ideal}}(x)$. And sometimes a helpful way to think about it: it's $2^n$ times the expectation of the ideal probability $p_{\text{ideal}}(x)$ for $x$ drawn from the experimental distribution.
That's nothing more than rewriting the expression, OK? OK, cool. So now a few things to note, and these follow from Porter-Thomas; also, these are really expectations over the choice of circuit, which I omitted on the slide. If we have a perfect experiment, so $p_{\text{exp}} = p_{\text{ideal}}$, the expectation over the choice of circuit of the XEB turns out to be two. It's not hard to work out if you know the Porter-Thomas expression; it's a simple integral. But if you're sampling from the uniform distribution, which is generically uncorrelated with the ideal distribution, the answer is one. That's not a complete coincidence: we normalized it so that it comes out that way. There's nothing really special about two and one; without the $2^n$ factor they'd be $2/2^n$ and $1/2^n$, and that's exactly why we normalize. And here's why it's nice, and I think this is the main reason it's better than TVD, though there are many. Unlike TVD, the XEB can be well approximated from a small number of device samples, by concentration-of-measure arguments. I realize that sounds fancier than it is; I simply mean the law of large numbers. But it requires exponential classical time, to compute the ideal output probabilities of the observed samples. What do I mean by this? What does the experimentalist actually do to calculate this? They take a small number of observed samples from $p_{\text{exp}}$; call them $z_1$ through $z_k$, to be consistent with our earlier notation. Then they compute $2^n$, the normalization, times the arithmetic mean of the ideal output probabilities of those observed outcomes: $\widehat{\mathrm{XEB}} = \frac{2^n}{k} \sum_{i=1}^{k} p_{\text{ideal}}(z_i)$. And it's not hard to show, again using the Porter-Thomas property, in particular the fact that the second moment is well understood, that this well approximates the actual XEB with high probability, OK? OK, so that's all simple. But there's an interesting question here: how on earth do you actually compute that sum? True, it's not a huge sum; it has relatively few terms. But they're all ideal output probabilities, and ideal output probabilities are exactly what we believe to be hard to compute. In fact, that was the moral of last time's lesson, if you didn't catch it: the reason we believe sampling is hard, at the end of the day, is that computing output probabilities is hard. So what would we do? This is where you have to stop thinking like a complexity theorist and start thinking like a large company with a giant supercomputer on its premises, which will not be named. That's exactly what you do: you literally, exhaustively compute the ideal output probability of each observed outcome. But the point is that because the benchmark is sample efficient, you don't need many of them, OK? You take the arithmetic mean, you normalize, and you use that as your XEB score. OK, so now, oh sorry, any questions about that? [Question about precision.] Yeah, you can work that out; you could ask for multiplicative precision, or $1/2^n$ additive precision, but let's just say the ideal probabilities are computed exactly. That's essentially what they do. And you can also see that this is already causing some problems, right? We're back to the first lecture, where I told you we're really looking for a Goldilocks solution.
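Before I explain the problem, here's the whole estimation pipeline in miniature. This is a toy sketch of mine: the "device" is faked in software, the circuit is a Haar-random unitary, and `k` samples stand in for the experiment's shots.

```python
import numpy as np
from scipy.stats import unitary_group

rng = np.random.default_rng(2)
n, dim, k = 6, 64, 2000

U = unitary_group.rvs(dim, random_state=2)   # the fixed random "circuit"
p_ideal = np.abs(U[:, 0]) ** 2               # ideal output distribution
p_ideal = p_ideal / p_ideal.sum()            # clean up float rounding

def xeb_estimate(samples):
    # 2^n times the arithmetic mean of the ideal probabilities of the samples
    return dim * p_ideal[samples].mean()

perfect = rng.choice(dim, size=k, p=p_ideal) # a noiseless device's samples
uniform = rng.integers(dim, size=k)          # a worthless device's samples
print(xeb_estimate(perfect))                 # ~2 for Porter-Thomas circuits
print(xeb_estimate(uniform))                 # ~1
```

With a perfect "device" you get about two, with a uniform sampler about one: exactly the two reference points from the slide. The expensive step hiding in `p_ideal` is the exhaustive simulation.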
So, the Goldilocks point: if this is the score we're intent on using to verify our random circuits, we cannot scale the system to larger and larger sizes. It's not going to work. So we can talk about asymptotics all we want, and complexity theorists like me insist on talking about asymptotics because I don't know how else to prove rigorous hardness arguments, but fundamentally, we do not know how to verify these systems once the system size is sufficiently large. This may not be the best we can do; there might be an efficient method hiding somewhere that gives you a better way of doing this. I don't know where it is, and I'm not really that optimistic about it, OK? But this is how we currently verify these experiments. [Question.] Precisely, precisely. That's a good first proxy: the cost is the k, and it's small. You don't have to go through all of the output probabilities. If you wanted to simulate sampling naively, you'd have to do much, much more computation. But there are still exponentials; you're just fighting over the size of the exponential. Of course, at n equal to 53, that can matter a lot. Good. Other questions? OK, how much time do I have? Half an hour, good, perfect. OK, so then: why is scoring well on the XEB classically hard? Well, here Aaronson and Gunn come into the picture, with a very similar argument to the one before. The first thing they do is define a corresponding version of HOG, which they call XHOG, which more aptly captures the hardness of the XEB. It's very similar to HOG, because after all the XEB itself is really similar to HOG. Ah, but I did want to say one more thing first, and I think you have everything at your disposal to answer this. I said that HOG is all about heaviness; that's clear. Then I claimed the XEB is also fundamentally about heaviness. Let's go back for a second: how can you see, even just from the expression, that the XEB is about heaviness? What do I even mean by that? [Answer from the audience.] Right: the way I think about it, very simply, is that pairs of distributions that score well on the XEB are distributions that share heavy outcomes. That's the intuition, and if you read much more into it, you're going to be missing the point. OK, cool. So, Aaronson and Gunn: why is the XEB classically hard? Because of XHOG. So here's XHOG; the X stands for XEB, of course. Given the circuit C, output k distinct samples $z_1, z_2, \dots, z_k$ such that the arithmetic mean of their ideal output probabilities is large. And note this mean is an expectation over the index i, a uniformly random choice among your k samples, not an expectation over C; the circuit C is fixed. It should be hard to output k samples so that this arithmetic mean, $\frac{1}{k}\sum_i p_{\text{ideal}}(z_i)$, is at least $b/2^n$. Why does this match the XEB? Precisely because we're now saying it's hard to output a bunch of strings that occur with slightly heavy probability, where heavy now means heavier than $1/2^n$, OK?
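Let me also put XHOG on one line, since I only said it in words. This is just the definition restated; check Aaronson and Gunn for the exact parameter regime they need:

$$\textbf{XHOG}(k, b):\quad \text{given a random circuit } C,\ \text{output distinct } z_1, \dots, z_k \in \{0,1\}^n \ \text{such that}\ \ \frac{1}{k}\sum_{i=1}^{k} p_{\text{ideal}}(z_i) \;\ge\; \frac{b}{2^n}, \qquad b > 1.$$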
Great. So we think of b as one plus epsilon, and we have to be a little careful here. If we had a noiseless circuit, it's not hard to calculate what we should be seeing: with very high probability b would be two, so most of the outcomes we observe would have ideal probability around $2/2^n$. The problem is that noise causes the experiment to have a considerably smaller b. In fact, under very reasonable, if somewhat simplified, noise models for these random circuits, the noise keeps increasing entropy, and if you're not careful you actually converge to the uniform distribution. So in a real-world experiment, b is not going to be two. If you scale the experiment larger and larger, b is going to look more like $1 + 2^{-\Theta(d)}$, something getting smaller and smaller as, in particular, the depth d grows. That's not surprising at all: the signal is fundamentally decaying if you're not correcting the noise. OK, but we're going to be complexity theorists, as we were with all the other arguments, and assume a constant XEB score, some constant b strictly above one. That's not what an asymptotically large noisy experiment would give you; thank goodness we're not talking about asymptotically large noisy experiments, OK? In fact, though, Google scores 1.002 on its 53-qubit random circuit experiment. That's actually quite impressive. The first thing anyone says is: oh my god, that's clearly much closer to uniform, which scores one, than to ideal, which scores two. But they were happy about this. Can you help me out: why would anyone be happy about this? There are actually some really good reasons. No one? Wait, hold on, I just told you what the score should look like. Well, first: anything more than one matters. If it were exactly one, and they couldn't resolve a difference, that would be upsetting; it would mean there's no more signal there than flipping random coins. [Question: but is 0.002 a constant?] Is it really a constant? Definitely not, or rather, the question doesn't quite make sense. This goes back to the same subtlety I've now mentioned several times, and a lot of people make this mistake: nothing is scaling in a single finite experiment, so everything is "a constant" in some sense at n equal 53. But I think there are really two reasons to be excited. One: if you really think the score decays like $2^{-\Theta(d)}$, or even $2^{-\Theta(dn)}$ depending on what noise model and parameters you're talking about, then 0.002 is far better than you'd expect from the asymptotic picture: much, much bigger than something like $2^{-53}$, or $2^{-53 \cdot 20}$ with d times n and d around 20. By those standards, 0.002 is actually a pretty big number. The other reason: what we care about, in some sense, is the complexity theory, definitely, to give us a feeling for how hard the task is in some mathematically rigorous sense. But at the end of the day, we also care about the finite-size experiment. And Google's mentality here was that when you actually get a finite-size score like this, you should compare it against the best known classical algorithms running on a giant classical supercomputer.
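Here's one way to see why even a small score above one is meaningful. Suppose, and this is a model assumption, not something we know about the device, that the noisy output is global white noise mixed with the ideal state, $\rho = F\,\rho_{\text{ideal}} + (1-F)\, I/2^n$. Then:

$$p_{\text{exp}}(x) = F\, p_{\text{ideal}}(x) + (1-F)\, 2^{-n} \;\;\Longrightarrow\;\; \mathbb{E}_C[\mathrm{XEB}] \;=\; F \cdot \underbrace{\mathbb{E}_C\Big[2^n \textstyle\sum_x p_{\text{ideal}}(x)^2\Big]}_{=\,2\ \text{(Porter-Thomas)}} \;+\; (1-F)\cdot 1 \;=\; 1 + F.$$

So in this simple model the score is one plus the fidelity, and 1.002 corresponds to $F \approx 0.002$. Keep that in mind for when I tell you, in a bit, that the XEB tracks fidelity under mild noise assumptions.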
And in fact, on that comparison: it turns out that doing the same task classically, apples to apples, is not trivial at all, OK? When they first came out with their result, they estimated it would take the best supercomputer something like 10,000 years. I have to say, I wasn't thrilled that they came out with a prediction like that; it looked a little optimistic. It turned out the answer is probably more like minutes, something like 20 minutes if you really optimize the algorithm and use the fastest supercomputer. Is that quantum advantage or not? I don't know; that's kind of up to you. But the point is that even this 0.002 is non-trivial to reproduce classically, OK? Questions about this? [Question about error correction.] So, I don't know exactly what you mean by error correction here. There were error mitigation strategies employed, certainly; there was post-selection and a whole bunch of other things. It's definitely not full-blown error correction. Modeling it theoretically, I guess we can try; I haven't done that. [Follow-up.] Oh, I see: can you prove rigorously that you can do better? What's the hope? Look, they had loads of benchmarks, and I'm sure many of them could be reflective of how well they're managing the error. One of them is the cross-entropy itself, because they're doing far better than the $2^{-53}$-ish decay you'd naively fear. I don't know exactly what else you would want, but we can talk about it offline. Next question? [Question.] Yes, right: the experimental distribution is very close to the uniform distribution, exactly. But I didn't say they can't tell the difference: they made all sorts of statistical claims about how confident they were in this score. We won't get into the statistics at that level, but yes, they had very high confidence, assuming a whole bunch of stuff. [Follow-up.] Let's talk about that offline; there are loads of assumptions, and I can list them. Some I find very believable, some a little less so. It's a good question, though; I'm going to purposely refuse to answer it for the moment. Yes? [A clarification question about the definition of the XEB.] No, no: the twos and ones are expectations over the choice of random circuit. It's with respect to the circuit ensemble, the Porter-Thomas behavior, that you compute them, not with respect to any other distribution. Super important. Other questions? [Question.] It's the expectation over C of the XEB of $p_{\text{exp}}$ and, let's say, $p_C$ rather than $p_{\text{ideal}}$, whatever comes out of the device against the ideal distribution of the drawn circuit. That circuit is fixed once drawn. What I'm saying is that in the ideal case, $p_{\text{exp}}$ would equal $p_{\text{ideal}}$: you draw a circuit randomly, and that defines your $p_{\text{ideal}}$. Other questions? [Question about the uniform distribution as a failure mode.] Yes, fantastic question. Remember I was trying not to talk about noise, but it's such a great question, one of my favorites, that I'll answer it anyway. He was pointing out that measuring this score against one suggests we think a failure mode of the device is something like the uniform distribution.
And so he was asking: how reasonable is it, asymptotically, that the device converges to the uniform distribution? Fantastic question, and something we've been thinking a lot about in my group, in fact. Here's my answer. It's very clear that if we're talking about depolarizing noise, then you converge to the uniform distribution. That's not deep: if you keep interleaving layers of random two-qubit gates with depolarizing noise, you keep adding entropy. That's what depolarizing does, it pushes you toward the maximally mixed state; then another layer of the random circuit and more depolarizing pushes you further. Of course you end up at the uniform distribution. There are interesting questions about how quickly you get there. But there's another question: is that the right noise model? It turns out the answer is no. It's not the right noise model; we know it doesn't capture everything in these systems. Once you add other kinds of noise, whether you converge to the uniform distribution depends on a whole bunch of things. It depends, obviously, on the noise channel itself. It also depends, I think, on the rate of the noise: is the noise weak, which is shorthand for something like a constant over the system size, or is it a constant rate, not c over n but an actual constant? I think the answer depends on all of these. What does Google have? They think they're close to the uniform distribution. Is that true? Not entirely clear. I think it's true that they have a high-entropy distribution; it's not clear how close it is to uniform. [Follow-up.] Yes, we don't know. It's absolutely possible; I think it's very possible. We'll talk about noise properly, hopefully in the last lecture, or if not you can read the slides; I have a whole section on noise, and these questions will come up. But it's the right thing to ask. OK, let me keep going. So: why is the XEB hard? Or rather, why is XHOG hard? Well, the analogy is: HOG is to QUATH as XHOG is to XQUATH, of course. So what is XQUATH? It says there's no efficient classical algorithm that, given as input a random circuit C, produces an estimate p of the output probability of all zeros with the following property. And now I'm going to walk you through what that property is, because if you thought QUATH was a little reverse-engineered, this one, well, I think this one is a bit artificial too, but it's a very interesting problem. So let me walk you through it. Here's the intuition. You have some algorithm, you want to show it doesn't exist, and the algorithm is, again, trying to estimate the output probability of all zeros. How are we going to judge this algorithm? We're going to compare it to another algorithm, which we think of as the trivial thing to do: always output $1/2^n$, the uniform probability, regardless of the circuit. And you want to say that no classical algorithm that outputs this estimate p
can be much better than the trivial algorithm that outputs $1/2^n$ regardless of the circuit, OK? That's the intuition. How are we going to measure how well these two algorithms do? With the mean squared error: the expectation, over the choice of circuit, of the squared difference between the estimate and the true probability $p_0 = |\langle 0^n|C|0^n\rangle|^2$. So XQUATH says that no efficient classical algorithm beats the trivial estimator by a noticeable margin; that is, none achieves

$$\mathbb{E}_C\big[(p - p_0)^2\big] \;\le\; \mathbb{E}_C\big[(2^{-n} - p_0)^2\big] \;-\; \delta$$

for a non-negligible advantage $\delta$ (in Aaronson and Gunn, if I recall the normalization right, $\delta$ is of order $2^{-3n}$). Why mean squared error? Because that's what the reduction gives us. What is the reduction? You're going to work it out; I'm not going to do it on the spot, because it's messy, but you already see the intuition. It's exactly the same intuition as before. In other words, the fact that if XHOG is easy then this XQUATH assumption is false, which is the contrapositive of what we want, is quite simple to see at the level of the structure of the argument. Getting the precise error bound involves some tedious calculations that we won't do here; you will, later, maybe today. But how does the reduction go? I have an XHOG algorithm; it outputs the list. What do you do? [From the audience.] Exactly, no surprise: you do exactly the same thing. You use the hiding argument to target a random outcome rather than all zeros. You then accept if a randomly chosen element of the list collides with it; "accept" here means output $b/2^n$, whatever the heavy value is, with b being, say, two. Otherwise you flip a coin and output either $b/2^n$ or $1/2^n$; that's heavy versus not heavy. And I'm going to let you work through this yourself to see that this mean-squared-error statement is exactly the hardness assumption you end up needing. It's not super surprising that the square shows up, though. Why? Any ideas, any intuition why the mean squared error comes up so naturally for the XEB? Not surprising at all. [Answer.] Yeah, exactly: what is the XEB? It's a dot product. If the experimental distribution is the ideal one, if $p_{\text{exp}} = p_{\text{ideal}}$, then clearly you're summing a whole bunch of squares of ideal probabilities. So at a high level the XEB is all about the squares of the output distribution, and it's no surprise that this is the error measure we end up with. The rest is exactly the same as the reduction we were talking about before, just a little more symbol pushing. [Question: is XQUATH known to be false?] Good question: it should be, in the sense that we're about to talk about algorithms that do better than we expected, because of a depth issue. But not as written, no. [Question.] Of course, you can always add a depth restriction to the conjecture, and then you're probably in somewhat better shape; we'll talk about that in a moment, actually. Other questions? OK, great. How much time? 10 minutes, OK. I'm going to try to go a little fast through what's next; if it isn't clear, we'll come back to it tomorrow. OK, so let me first give you a few open questions. I've always thought of this, intuitively, as a very lossy reduction; you're giving something away, and this $1/2^n$ bias is maybe unnecessary, more a function of the reduction than of anything about the hardness of HOG. But I don't know, I don't know a better reduction.
It just seems like you're losing this $1/2^n$ bias because you're naively picking a random outcome, and maybe there's something smarter you can do. I really don't know of anything, but it feels lossy to me, OK. Another thing that's really important about the XEB, which my group has worked on for a long, long time now, several years, is that we think there are real reasons to care about the XEB score beyond hardness; it's not just a possibly-hard, contrived quantity. We think that under relatively mild assumptions about the noise, it well approximates the many-body fidelity, the overlap between the state you actually implemented and the ideal state, which is something experimentalists really, really care about. Fidelity is very hard to measure directly; it takes a lot of samples. And under reasonably mild assumptions about the noise in the experiment, fidelity is essentially the same as the XEB, which is great, because the XEB is sample efficient. We didn't know how to do that before. I think that's probably the legacy of the XEB: well past quantum advantage, it gives us a sample-efficient, though not computationally efficient, proxy for measuring fidelity in relatively large systems. Not arbitrarily large, because you still have to compute the ideal output probabilities. We have several papers on this, so you can check them out. [Question: are there efficient alternatives?] That depends who you ask. If you ask physicists, they'll say definitely, because what you can do is make loads of assumptions, or, OK, let's not pick on them too much, make some assumptions about the noise in the experiment. If you make noise assumptions, then yes, there are actually much better ways to do this. That shouldn't be surprising: the reason it's so hard to test TVD or anything like it is that we wanted to assume nothing about the distribution whatsoever. If you make a strong assumption about the distribution, maybe it's not so surprising that you can do a lot better. Personally, as a computer scientist who doesn't think that much about noise at the experimental level, I'm worried about that. Go back to what we said was the motivation for this whole enterprise: to me, quantum advantage is a test of quantum physics. It's not a test of quantum physics if your verification assumes, in a very strong way, that quantum physics works. Now, I'll admit that even I often cheat a little and say, OK, maybe we'll make a small assumption to make things easier. But you have to think hard about what those assumptions are, and you have to be really, really careful with that sort of thing. So I would say no, I haven't seen a really good computationally efficient argument, but people will disagree. These are really, really good questions, though. Can I take one more? [Question.] Yeah, that's true: we know the expectation of the all-zeros output probability is about $1/2^n$, so as n grows it clearly goes down. But that's not really the issue. I think what's really going on here is what we talked about last time: the output probability is a fundamentally computationally difficult thing to produce, even on average over random circuits. That was exactly the point. We shouldn't expect a polynomial-time algorithm for it. So it's a hard task.
It's a hard task asymptotically, as n grows larger and larger; the quantity also happens to get smaller and smaller, but those are different statements. [Question about testing at smaller sizes.] Yes, of course people do that. In fact, the way even Google gauges their largest XEB is by playing a game where they cut the system in half, look at the XEB of each half, cut again, look at those, and then make a plot. So it's not too different from what you're saying: you extrapolate somehow, and you try to become confident that the large system behaves the way you'd expect from smaller systems. You can do process tomography on small pieces if you like. The problem with that, and this is now more the computer scientist speaking than the physicist, is the following. Physicists like that argument a lot: what could possibly go wrong if it works at 20 qubits, and it works on two separate 20-qubit subsystems, when you go larger? Well, I'm not really sure what would happen. But what I do know is that the whole point of quantum advantage is to test the high-complexity regime. If you certify the high-complexity regime by assuming something about the low-complexity regime, that, to me, sounds like an assumption I might not be comfortable with. Fundamentally, and this is what I've learned from thinking about this for years and years, even the experimentalists don't understand their noise very well at large scale. We hope, for example, that there's no large-scale correlated noise; we hope the noise is very localized. Do we know that? Depends who you ask. I would say we probably don't, and so I'm a little uncomfortable making a lot of assumptions about the noise. Even the sort of test you suggested: of course, if it fails at 20 qubits, that's not a good sign for 53; and they did test at 20, and it seems to work pretty well. It looks like there are some correlations, but they're relatively minor; it's basically local noise, yeah, exactly. OK, sorry, I'd like to go ahead with the, what, five minutes I have? Five or ten, let's take ten. So here's what I'd like to say; here's the surprise, I would say. There are a few simulation results that came out, and this is the one that caught me most off guard, OK. It's from 2021, by Xun Gao and a few collaborators; I believe Boaz Barak was also on that paper. It was then explained further in a more recent paper by Dorit Aharonov, and also Xun Gao, Umesh Vazirani, Yunchao Liu and others. But in my opinion the most surprising result is the following: XQUATH is false, OK. Now, it's only known to be false at sublinear depths, and remember, depth restrictions came up earlier. But there's an algorithm, a very simple one, it turns out, that achieves an XQUATH-violating score: it does better than this $2^{-n}$-bias bar when the depth d is less than n, OK? I think it was a very surprising result, especially for computer scientists. Let me also say that it doesn't falsify XHOG, at least not in the noiseless case. Remember what the reduction was:
the reduction said that if you want to prove XHOG is hard, it suffices to show XQUATH is hard. So XQUATH being easy doesn't necessarily say anything about XHOG; it just says: let's not rest on XQUATH in this regime, OK. So let me set up notation: I'll define the X score of an algorithm to be its bias in the XQUATH game, exactly the quantity we wrote before. XQUATH says it should be hard to score any better than $2^{-n}$ on this, in the normalization of the slide; I'm just copying and pasting what I was calling the X score, so hopefully we're not confused. Why did we think this might be true? We thought so because of the path-integral argument: it seemed the best you could really do, to estimate whether the all-zeros probability is sufficiently heavy or not, was to estimate a few terms of the path integral, just polynomially many, because you can't afford more, and take the average of the values of those paths, OK. Now, the idea was: there are clearly more than $2^n$ paths, all with roughly uniform value in expectation, so it's unclear how to achieve an advantage that scales like $2^{-n}$. It seems like it should be more like $2^{-m}$ or so, with m the size of the circuit, because the number of paths scales exponentially with that, OK. Here's the observation, and it's very simple: that reasoning is not true, OK. It's true in the computational basis, but by changing basis, and we're going to consider the path integral in what I'll call the Pauli basis, I'll explain what I mean in the next few slides, the values of the paths become highly non-uniform. And you can use that to your advantage by carefully choosing one particular Pauli path in your path integral, one out of the exponentially many: you compute the value of just that one path and output it. And remember, all of the path integrals we're considering have the property that calculating the value of any single path is efficient; it's just that there are far too many of them to use that directly as an algorithm. It turns out this actually achieves a bias scaling like $2^{-O(d)}$. So if d is less than n, that's a problem for XQUATH; if d is greater than n, not a problem. In fact, remember, m much greater than n was an assumption in the original QUATH conjecture; it just was no longer an assumption, or at least was omitted, when people formulated XQUATH. On the other hand, there's a reason you'd want the depth not to be so large in practice. In other words, OK, here's one solution, but only a computer scientist would say this, unfortunately. The solution is: oh, the attack only works at sublinear depth? Take the depth to be n, or n to the hundred. The physicist is not so happy with this. Why is the physicist not so happy? Yeah: because the noise overwhelms you, and the signal decays exponentially in d, at the very least; depending on what you're talking about, sometimes in d times n. So you want to keep d shallow, OK. So that's the big problem here, OK. Well, let's see, what, two minutes? Three, five minutes? OK, then I'm going to keep pushing. Let me tell you what a Pauli path integral is, OK. Rather than thinking about a quantum circuit as applying unitary gates to a state vector, we're going to think in density-matrix terms, just for the analysis.
So we're going to think of the circuit as applying unitary channels to density matrices, starting from the pure state $|0^n\rangle\langle 0^n|$, OK. Now, we'll denote the normalized Pauli operators with this $\mathcal{P}_n$ notation: the n-fold tensor products of I, X, Y, Z, normalized by $\sqrt{2^n}$ so that they're orthonormal under the trace inner product. No big deal. These form a basis, so we can write any n-qubit density matrix as a combination of n-qubit Pauli operators, where the coefficient of any particular Pauli s is $\mathrm{tr}(s\rho)$: that is, $\rho = \sum_{s \in \mathcal{P}_n} \mathrm{tr}(s\rho)\, s$. That's elementary quantum information. OK, now, in the computational basis, we write an output amplitude as a sum over paths of what I'll call transition amplitudes: $\langle x|C_d \cdots C_1|0^n\rangle = \sum_{y_1, \dots, y_{d-1}} \langle x|C_d|y_{d-1}\rangle \cdots \langle y_1|C_1|0^n\rangle$, where each factor, this x-y thing, is in fact just an entry of a unitary. In a very similar way, in the Pauli basis, you can write the Pauli coefficient of an n-qubit Pauli s, after you apply a unitary U to whatever density operator $\rho$ you're talking about, as a sum over the exponentially many n-qubit Paulis t of a transition amplitude, it's this trace of $sUtU^\dagger$, times a coefficient: $\mathrm{tr}(s\, U\rho\, U^\dagger) = \sum_t \mathrm{tr}(s\, U t\, U^\dagger)\, \mathrm{tr}(t\rho)$. Again, this is elementary: you obtain it by nothing more than plugging the Pauli expansion of $\rho$ into this expression and using linearity of the trace. Now, we can express the output probability, $p_x$ or $p_{0^n}$, let's say, of the circuit as a Pauli path integral, and it shouldn't surprise you. Let's break C up into layers, $C = C_d C_{d-1} \cdots C_1$. Just for convenience, because I can see what's coming next and you probably can't, these are not two-qubit gates anymore; each of these layers acts on all n qubits, and you can think of each as n/2 two-qubit gates, OK. Now we write the output probability; let's think about $p_x$, but it could be $p_{0^n}$, any outcome probability can be written like this. What is this? Well, this is the Pauli path integral. It's a sum over all Pauli paths, where each Pauli path is a (d+1)-tuple $(s_0, s_1, \dots, s_d)$ of n-qubit Pauli operators, and the value of each path is just a product of a whole bunch of traces:

$$p_x(C) \;=\; \sum_{s_0, \dots, s_d} \underbrace{\langle x|s_d|x\rangle}_{\text{output bookend}} \left(\prod_{i=1}^{d} \mathrm{tr}\big(s_i\, C_i\, s_{i-1}\, C_i^\dagger\big)\right) \underbrace{\mathrm{tr}\big(s_0\, |0^n\rangle\langle 0^n|\big)}_{\text{input bookend}}.$$

There are the bookends: the bookends are just the corresponding coefficients of the input and output states, the first and last factors. And in between, all the other traces are transition amplitudes. How do you derive this expression? That's a problem on your sheet, but you've seen everything you need. You repeatedly apply what? [Pause.] Yes, in the computational basis it's very simple; there it's just inserting resolutions of the identity. Here it's maybe a bit harder to see. But what I wanted to say is that you repeatedly apply the equation I had before, the one expressing a Pauli coefficient after one unitary as a sum over Paulis. That was for one unitary applied to $\rho$; do it again and again and again.
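And here's a tiny numerical sanity check of that formula, my own sketch for one qubit and two Haar-random layers; all the variable names are mine.

```python
import numpy as np
from scipy.stats import unitary_group
from itertools import product

# Normalized 1-qubit Paulis: orthonormal under the trace inner product.
I = np.eye(2); X = np.array([[0, 1], [1, 0]])
Y = np.array([[0, -1j], [1j, 0]]); Z = np.diag([1.0, -1.0])
paulis = [P / np.sqrt(2) for P in (I, X, Y, Z)]

d = 2
layers = [unitary_group.rvs(2, random_state=s) for s in (3, 4)]  # C_1, C_2
rho0 = np.diag([1.0, 0.0])                                       # |0><0|
x = np.array([1.0, 0.0])                                         # outcome |0>

# Direct calculation: p_x = <x| C_2 C_1 rho0 C_1^† C_2^† |x>
rho = rho0
for C in layers:
    rho = C @ rho @ C.conj().T
p_direct = np.real(x @ rho @ x)

# Pauli path integral: sum over (d+1)-tuples of bookends times transitions.
p_paths = 0.0
for s in product(paulis, repeat=d + 1):
    val = np.trace(s[0] @ rho0)                            # input bookend
    for i, C in enumerate(layers):
        val *= np.trace(s[i + 1] @ C @ s[i] @ C.conj().T)  # transition
    val *= x @ s[d] @ x                                    # output bookend
    p_paths += np.real(val)

print(p_direct, p_paths)   # the two numbers agree
```

Each individual path value is cheap to compute; the trouble is only that there are $4^{n(d+1)}$ of them.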
So, iterating, what you get is a (d+1)-tuple of n-qubit Pauli operators, because you pick up a fresh Pauli every single time you apply a layer, of which there are d, right? And you get exactly the expression above. And we're just going to define this as a sum, over all paths, of a function f, where $f(C, s, x)$ is defined to be the value of the path $s = (s_0, \dots, s_d)$. There's nothing more to it than a definition: I'm just defining $f(C, s, x)$ to be the value of the path, and it depends on C, s, and x because that's what the value of a path depends on. OK, good. Now, oh, we're done, we're out of time. OK, good. Then here's what we're going to do. Here's how the algorithm works; I'll describe why it works next time. The algorithm simply picks one very special path, computes the value of that one path, and outputs it. That's all it does. OK, we'll talk about that next time, and then we'll talk about noise. Thanks.