So, when we were here last, we ended in the middle of a proof, so let me give you a refresher about what we were doing. We were talking about these quantum sampling problems, and the goal of a quantum sampling problem is exactly what it sounds like: sample from the output distribution of a quantum circuit. The task is mathematically well-defined: you're given as input a quantum circuit C, and the experiment is supposed to run the circuit on the all-zeros input state, measure all n qubits in the computational basis, and get a sample, some binary string y of length n. So we define this output probability, which is simply the probability that this experiment sees a particular outcome y when it measures; call that p_y(C). And the goal, which we'll finish now, is to prove the impossibility of an efficient classical algorithm that does the same thing: given a classical description of a quantum circuit, output a sample from exactly the same distribution as the quantum circuit. Okay, great. Now, to analyze this sampling task, we introduced two simple-to-describe problems; I call them the classical and the quantum sum problems. In the classical sum problem, we're given a classical circuit that computes a Boolean function f mapping to {0, 1}, and we're supposed to compute the sum over all inputs x of f(x). The quantum sum problem is exactly the same thing, but now the efficiently computable function g maps to {+1, -1} rather than {0, 1}. And how hard did we determine these were? #P-hard. Right, and it's very simple to see: if I can compute either one of these sums, I can compute the number of satisfying assignments to a Boolean formula, a Boolean formula being a very special case of a classical circuit. Great. Okay, but this discussion became a lot more interesting when we considered relaxing the problem to allow approximations. So I defined the classical approximate sum problem: exactly the same input, but now the output is a multiplicative approximation to this classical sum. And what we determined, using an algorithm called Stockmeyer's algorithm, is that this problem is strictly easier than the exact, #P-hard case unless an unlikely complexity-theoretic consequence occurs, namely the polynomial hierarchy (PH) collapses. And the reason we believe the PH doesn't collapse is the same reason we believe that P is not equal to NP: it's something we can't prove, but 99 out of 100 complexity theorists believe it. That's about the best we can do in complexity theory. Okay, and the way we're going to use Stockmeyer's algorithm is the following conditional statement: if a classical sampling algorithm exists that efficiently samples from the output distribution of any given quantum circuit, then outputting a multiplicative estimate of the probability of any outcome of that classical sampler is strictly easier than #P. We proved that. Then we looked at the quantum approximate sum problem, and we saw that the situation was really different. The quantum approximate sum problem is exactly the same problem, we still want the same sort of multiplicative estimate, but we made this seemingly innocuous switch: the function g is now ±1-valued instead of {0, 1}-valued.
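To make these objects concrete, here is a minimal brute-force sketch of the two sum problems and of how a deterministic classical sampler's all-zeros probability becomes a classical (0/1) sum; the functions f, g, and S below are hypothetical toy stand-ins, not anything defined in the lecture, and the toy sizes are only for illustration.

```python
from itertools import product

n = 3  # toy size; the hardness statements are about asymptotically large n

def f(x):            # hypothetical {0,1}-valued function (classical sum problem)
    return (x[0] ^ x[1]) & x[2]

def g(x):            # hypothetical {+1,-1}-valued function (quantum sum problem)
    return 1 if sum(x) % 3 == 0 else -1

classical_sum = sum(f(x) for x in product([0, 1], repeat=n))   # #P-hard in general
quantum_sum   = sum(g(x) for x in product([0, 1], repeat=n))   # #P-hard even to approximate multiplicatively

# Consequence one in miniature: if a classical sampler S(r) is a deterministic
# function of its random coins r, then Pr[S outputs 0^n] is itself a 0/1 sum over r,
# which Stockmeyer's algorithm can multiplicatively approximate in BPP^NP.
def S(r):            # hypothetical deterministic sampler acting on coin flips r
    return tuple(r[i] & r[(i + 1) % n] for i in range(n))

m = n                # number of coin flips (illustrative)
count = sum(1 for r in product([0, 1], repeat=m) if S(r) == (0,) * n)
p_all_zeros = count / 2 ** m
print(classical_sum, quantum_sum, p_all_zeros)
```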
And what we concluded is that, unlike the classical approximate sum problem, in the quantum case, where the function is ±1-valued, the multiplicative approximation to this sum is exactly as hard as computing the sum itself, and hence #P-hard. And we proved this; we used a binary search and padding argument, and I think that's roughly where we ended. Is that right? Fantastic. Okay, great. And then, oh, sorry, I think where we really ended is the following: I told you to take on faith that essentially the same argument can be used to show that giving a multiplicative estimate of the squared sum, (the sum over x of g(x)) squared, is also hard. It's still a binary search and padding argument; it's very, very similar. Okay, great. So now here's what I'm going to call consequence two. It's a consequence of the hardness of multiplicatively estimating this quantum sum: estimating the output probability of a quantum circuit is #P-hard. Here's how we see this. Here's the formal claim: I claim that given a quantum circuit C, estimating the probability that the circuit outputs all zeros when we measure all n qubits in the standard basis, that's this p_0(C) quantity, is just as hard as the squared approximate sum problem. Good. The proof is by a very simple quantum circuit; it's ubiquitous in quantum algorithms, and I call it the quantum Fourier sampling circuit. Here's how this works. Here's the setting: you're going to give me your favorite quantum function g, again a function that maps to ±1. And I'm going to assume I have the ability to compute, let's say multiplicatively approximate, the output probability of any quantum circuit, and I'm going to use that ability to estimate this sum. All right, here's what we're going to do. We simply run the following circuit. First we apply a Hadamard to all n qubits and get the uniform superposition. Next we apply g as a phase, and now we get a superposition that's uniform in the sense that all the amplitudes have uniform magnitude, but the sign of each amplitude is determined by the function g. That's why it's important that g is a ±1-valued function; otherwise that would not be a normalized quantum state. Then we simply take the Hadamard transform on all n qubits again, and we measure all n qubits in the standard basis. Can anyone read my mind? What's so special now about one of the outcome probabilities? I'm thinking about the probability that we get all zeros. It's proportional to what? The square of the sum over all inputs x of g(x), right. There are many ways to see this, the most basic of which is to look at the last unitary we applied on the n qubits, this Hadamard transform, and look at its first row. What's true about the signs in that first row? They're all positive, and that's exactly the property we're using. Okay. So if you look at the output probability of this quantum circuit, the probability of the particular outcome of getting all zeros, it's going to be proportional to the square of this hard quantum sum.
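As a sanity check on that claim, here is a minimal NumPy simulation of the Fourier sampling circuit at a toy size, verifying that the all-zeros probability equals (Σ_x g(x))² / 4^n; the particular g used is an arbitrary hypothetical placeholder for a hard function.

```python
import numpy as np
from itertools import product

n = 3  # toy size; the argument is about asymptotically large n

def g(x):
    # Hypothetical +/-1 valued function standing in for an arbitrary hard g.
    return 1 if sum(x) % 3 == 0 else -1

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
Hn = H
for _ in range(n - 1):
    Hn = np.kron(Hn, H)                         # Hadamard on all n qubits

xs = list(product([0, 1], repeat=n))
state = np.zeros(2 ** n); state[0] = 1.0        # |0...0>
state = Hn @ state                              # uniform superposition
state = np.array([g(x) for x in xs]) * state    # phase oracle for g
state = Hn @ state                              # second Hadamard layer

p_all_zeros = abs(state[0]) ** 2
hard_sum = sum(g(x) for x in xs)
assert np.isclose(p_all_zeros, hard_sum ** 2 / 4 ** n)
print(p_all_zeros, hard_sum ** 2 / 4 ** n)
```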
So now notice that giving a multiplicative estimate of the output probability of an outcome of a quantum circuit is at least as hard as giving a multiplicative estimate of this hard quantum sum, squared. We needed the square because we're talking about the Born rule. Great. Okay, so now we're almost done; we're going to put consequence one and consequence two together, in the following way. It's going to be a proof by contradiction. We assume that there exists an efficient classical algorithm that samples from the same distribution as any quantum circuit. Just by definition, that's this algorithm S here that takes two inputs, the circuit and a sequence of random coin flips, and outputs any outcome y with the same probability as the quantum circuit would if we measured all n qubits. Now by consequence one, the Stockmeyer consequence, we know that estimating the probability that S outputs all zeros, or for that matter any outcome, is strictly easier than #P unless the PH collapses. But wait a second: consequence two says that giving a multiplicative estimate of that same outcome probability, p_0(C), is #P-hard, because it's at least as hard as the squared quantum sum problem. Well, that's a contradiction, so there can't be such a sampler algorithm unless the PH collapses. Now, I don't want to take credit for this; the first time we see a similar type of argument is by Terhal and DiVincenzo, way back in 2004. It took a long time for people to realize the importance of the result they had come up with, until around 2011, when there was a resurgence due to a similar but somewhat stronger result by Bremner, Jozsa, and Shepherd; Aaronson and Arkhipov had a similar result, and many people have claimed similar results since. If you look at these papers, their proofs are not going to look much like what I just described, but morally they're essentially the same, whatever that means; this is how I like to summarize their proofs. Can I take any questions before I go on to more advanced material? Yeah, yeah, that's right, precisely. Yes, that's a fair question: let's talk about how this proof works. In fact, I'm going to flip the order, because maybe that's pedagogically a little better. Consequence two is not conditional; it simply says that estimating the output probability of a quantum circuit is #P-hard, no conditions whatsoever. Consequence one is conditional. It says: if we had an efficient classical algorithm that sampled from exactly the same distribution, then estimating the probability of all zeros for that classical algorithm is easier than #P. Now you put these two things together and you say, okay, we're assuming that we have a classical sampler algorithm. Therefore, the probability of seeing all zeros from the classical algorithm, which by definition is the same as for the quantum circuit, because that's what it means to have a classical sampling algorithm, should be easier than #P. But we've proved it's as hard as #P. Well, you can't be easier than #P and as hard as #P at the same time; I hope you can agree with me on that. Yeah, that's how it works. There was another question; can I take it now?
Yeah, okay, sorry, I didn't quite catch it, ah, the depth of the circuit, excellent question. Does the depth matter? Very, very little. Okay, sorry, I should agree: the answer is really yes, but extremely little. If you count the depth of the circuit, it's just that quantum Fourier sampling circuit. There's a layer of Hadamards on all qubits, that's depth one. The second layer, actually a little bit more than one layer, is the one that queries the function in superposition. We needed that function to be a Boolean formula, and it turns out it needs to be something like a CNF for the hardness to work. So let's give ourselves another depth two or three for that. So now we have depth four or so, and then there's just one more layer of Hadamards, which adds another one. So depth four, maybe five, something like that; definitely a constant-depth circuit. Yes? No, no, that's not how this works. Let's again sketch how the proof works. What we're doing is taking the classical sampler, assuming it exists, and saying that the probability that the classical sampler outputs all zeros, an n-bit string, can be rephrased as a classical sum problem. Now, do you remember how the proof did that? You're absolutely right, we really have a function computed by this sampler, and we're interested in the probability that that function, I guess it's not Boolean, it's an n-bit-valued function, outputs zero to the n. Now we had a clever trick, which we talked about last time, to map that to a {0, 1}-valued function. How did we do it? Precisely. We defined a {0, 1}-valued function, and all it did was take the input of the sampler, which happens to be a random bit string, and output one if, on that random bit string, the sampler outputs all zeros, and zero otherwise. Therefore, we've taken the probability that the sampler outputs all zeros and mapped it onto a zero-one sum problem. So if that was the question, that's maybe the better answer, but we talked about it last time. That's a good question though. Any others? Well then I have a question for you. Wait a second: we just proved that you can't sample from the output distribution of any quantum circuit unless the PH collapses. But the quantum algorithm does sample from the output distribution of a quantum circuit. In fact, it's completely trivial: take the all-zeros state, run the circuit, measure all n qubits. By construction, that's exactly what it does. So wait, why doesn't the PH collapse? What step breaks when you run the same hardness analysis with a quantum circuit rather than a classical sampler? This was in the problem set yesterday, but it's maybe the most important question here, so let's talk about it a little bit. I'm going to wait for a few hands. Are we really stumped about this? Yeah, why not? Ah, that's right, but I think it's a step before that. You absolutely cannot run Stockmeyer's approximate counting algorithm; that's the whole point. What are we counting in Stockmeyer's approximate counting algorithm?
The number of inputs that map to a certain outcome, all zeros, and those inputs in this case happen to be random coin flips, right? So it was absolutely critical in this argument that we assumed the classical sampling algorithm could be written as a deterministic algorithm S, whether I said that or not, that's what I meant by S: a deterministic algorithm that takes as input some random coin flips. And then we use Stockmeyer's algorithm to count the fraction of coin flips that produce some outcome, like all zeros. Is it obvious that a quantum algorithm can be written as a deterministic algorithm with some random coin flips? Absolutely not. Not only is it not obvious; you can read this result in another way, as proving that if you could write a quantum algorithm as a deterministic algorithm with classical coin flips, then the PH collapses. So it's an interesting point, I think, that's often missed when we talk about the theory of quantum advantage. What this is saying is pretty profound: there's something very different about quantum randomness and classical randomness. Yeah? Question, yes? I actually thought you were asking something different, but let me say more about it. [Inaudible question about estimating probabilities by repeatedly sampling the circuit.] Right, yes, you do see samples, but that's not how this argument works. Okay, but it's a great question, so I'm happy to slow down and talk about it. Here's another way this analysis could have worked; in fact, it's the way the media thinks the analysis works. Whenever I talk to the media about quantum advantage, they're absolutely convinced that this is how it works, and it's not right. Let me tell you what they think, and you can't get it out of their heads, trust me. They think that what you show is that there's this quantum algorithm, and the probability that the quantum algorithm outputs some particular outcome, like all zeros, is something really hard, #P-hard, one of these hard quantum sums. Well, okay, fine. So what we're going to do is just repeat the sampling many, many times, aggregate the outcomes, look at the fraction of times we saw all zeros, and use that as an approximation to the probability of seeing all zeros. Yeah, that's totally something you could do. You could also do that classically in the classical analysis, if you had a classical sampler, which I think is what you were asking: repeat the sampling many times. Okay, but hold on: the media is wrong. That does not work very well. Why does that not work very well? I have to say, I think this is a problem in the problem set today, but it's a really good question, and since it's come up, let's talk about why it doesn't work. Why do we have to go through all of this Stockmeyer analysis, really fancy 1980s complexity theory? Why couldn't we just say: you have a hard probability of some particular outcome, repeat the sampling many times, and get an estimate of it? What's wrong with that? Yeah. It's not just that you might never see that outcome; you will essentially never see that outcome. The problem is that output probability, of all zeros, is not just the hard sum squared, right?
It's the hard sum squared over something like two to the two n; it's an exponential normalization. So it's an exponentially small quantity. That means that if I try this repeat-sampling strategy, I'm going to need a huge number of samples before I get any reasonable sense of what that probability is. Now, you can formalize that; you formalize it using a Chernoff bound, and I'll let you do that, and you can see exactly what the error is. It turns out to be roughly a one-over-square-root-of-T additive error after T repetitions, so a one-over-polynomial additive error, not multiplicative, if you repeat a polynomial number of times. Well, now the problem is that one-over-polynomial additive error; so what is the problem? Let me put this together. I have this exponentially small probability, but I really want to know what it is, because I've encoded my hard problem in it; I could break into people's bank accounts if I could compute that hard sum. It's a #P-hard problem, tremendously hard, much harder than factoring, or SAT for that matter. But the naive repeat-sampling procedure incurs a one-over-polynomial additive error. What's the problem with that? I'm here to tell you I can achieve that too, right now. How do I do it? Yeah, I just output zero, or some constant. The point is that a one-over-polynomial additive error completely overwhelms your signal. So the whole point of this analysis, and the reason we had to go through all of this, the hard quantum sum, the classical sum, the approximate versions, is precisely because Stockmeyer's algorithm does better than a one-over-polynomial additive error: it gives us a one-over-polynomial multiplicative error. Multiplicative errors are very special, because the error scales with the thing you're trying to estimate. And that's the whole point of all of this. Fantastic question though. Okay, but are you at least happy? Good, that's all I care about. Anyone else want to ask a good question? These are really great. Okay, great, let's keep going. All right, so this was discovered, depending on how you count, somewhere between 2004 and 2011. But it's really not everything you could want. In particular, this result is very much not robust, and I mean that in two senses; the result has at least two weaknesses. The first is the exactness assumption: it requires that the classical sampling algorithm samples exactly from the output distribution of the quantum circuit, or at least inverse-exponentially close. How do we see that? Because the hardness result is entirely about the hardness of an inverse-exponentially small probability. So if I have the ability to mess up a very small amount of the probability distribution, in principle I could remove that probability, or do something with it, and then I'd be stuck. So that's problem one. Problem two, which comes up when we start thinking about these random circuit experiments, is that there's implicitly a worst-case assumption here: the sampler algorithm we're talking about needs to work for all quantum circuits, not just most quantum circuits with high probability over random circuits. How do we see that? What was worst-case in what I talked about? Yeah, the hardness of the quantum approximate sum problem: that was only shown when you consider computing this sum, or the sum squared, let's say, for any quantum function g, right?
We don't know how hard that is if I instead say: for most quantum functions g. We can talk about that, but a priori we don't know how hard that is. So the argument also needs this worst-case assumption, and for an experiment, that's not such a natural thing. It means that to convince me you're solving a hard problem, you'd potentially need to implement the experiment on every single circuit. That's not really what my experimentalist friends want to hear. So a major goal in the theory of quantum advantage, arguably the most important problem that remains open, is to prove the impossibility of a more reasonable algorithm, one that addresses both of these weaknesses. In other words, prove the impossibility of a weaker algorithm: one that's approximate and average-case. It's a weaker algorithm, so proving its hardness is a harder job. Okay, and the way we usually model the approximation is in terms of bounded total variation distance. We'll talk about total variation distance a lot in this lecture, so you probably know what it is, but if you don't, it's the L1 distance for probability distributions: you take the sum over all outcomes of the absolute value of the difference between the two probabilities of that outcome (with a factor of one half in the usual convention). Okay, great. So what we'd really want to prove is that there's no classical approximate average-case sampler that does the following: it takes as input, again, two things, a circuit and random coin tosses, and samples from some distribution that's close in TVD to the circuit's output distribution, but only with high probability over the circuit. In other words, these average-case statements always have the form of a good day and a bad day; that's how I think about them. On a good day, that is, for two thirds of the circuits, this algorithm samples from a distribution that's close in TVD to the output distribution of that circuit. On a bad day, with probability one third over the circuits, the algorithm outputs complete nonsense, something that could be ridiculously far from, and not even correlated with, the output distribution of the circuit. Nonetheless, we're trying to show that even this weaker algorithm cannot exist. Let me ask a question though, because this is another thing that gets confused in the literature more often than not. This new algorithm is weaker, right? Do you think it's modeling experimental noise in a reasonable way? Is that why we're talking about this TVD error? Is this something I can take to my experimentalist friends so they say: wow, I'm so glad you've proven this result, because that notion of error, bounded TVD from the ideal output distribution, perfectly captures what I'm seeing in my experiment? Is that what they tell me? You're giving the right answer, but can someone say a little more about why? In fact, they laugh in your face when you tell them this. But why is that? There's a very simple explanation; anyone want to guess? I can't tell if you're raising your hand or not. Okay, let me give the most naive reason why the answer is no. It's a little bit naive, but it captures the spirit. Let's think about what this is really saying. All complexity results are in the asymptotic regime, right? The system size is getting larger and larger, and this epsilon we want to think of as some reasonably small constant, right?
It can't be larger than some constant, because the uniform distribution is already constant-close to the output distribution of a random quantum circuit, and it's not hard to sample from the uniform distribution. So this only makes sense when epsilon is sufficiently small. So now let me tell you what I tell my experimentalist friends: here's how you invoke this hardness result, which I can't even quite prove at the moment, but which I can almost prove and am making progress on. You just make your experiment larger and larger and you don't change the error. That's not going to be very popular, right? In fact, all the experiments I've seen that don't correct their noise have the property that as you make the system size larger, if you don't compensate by pushing your noise rate down, which of course is not reasonable asymptotically, you get an exponentially decaying signal. The fidelity decreases exponentially; your error is getting worse and worse, not staying the same. Experimentalists would be absolutely thrilled if their experiment kept the same error as it got bigger and bigger without driving the noise rate down. I've never seen an experiment like that. But why do we care about this model? Well, I think we care for two reasons, and that's why it's become one of the major goals of this field; again, it's still unsolved. First, because the proof techniques we used for the exact result break down very badly precisely with this bounded TVD, because it allows you to mess up the all-zeros probability, which is only exponentially small. The second reason I think it's really interesting is that it does translate well into what I call classical imprecision. That is, forget for the moment that we're trying to implement this circuit in a quantum experiment; say we're just trying to sample from the output distribution with a classical computer. Then this bounded total variation distance is actually not such a bad notion. In fact, many classical algorithms that simulate quantum circuits have the property that if you run them for a longer time, they get a better approximation, and that kind of makes sense for a classical algorithm: a classical algorithm does not have noise in the sense that a quantum device does, so if you run the classical sampler longer and longer, you might do better and better. You might even have an algorithm that runs in time polynomial in n and in one over epsilon, so you can get an inverse-polynomial total variation distance rather than a constant one. Now, of course, we're hoping that such a thing doesn't exist, but I do think this TVD notion is a nice model for classical imprecision. It's not a good model for experimental error, even though that's how nine tenths of the papers that talk about this motivate it. Okay, any questions about that? Yeah, that's right. In other words, there are two distributions I'm considering: the distribution that's actually being sampled, say by the classical algorithm that's told to sample from the quantum circuit, and the ideal distribution. And then you ask how close they are. That's what it is.
Yeah, right, because even for a classical algorithm, exact sampling is probably too restrictive. Saying that there's no error whatsoever that the algorithm can make is really, really tough; I don't even know exactly what it means once you start thinking about things like exact arithmetic. You have to worry about those issues precisely because we're not allowing any errors. So this is the major goal, and it's very well motivated; it just happens not to be motivated directly by experimental noise, and that's actually a very important point. Great, so let's talk about this. How do we analyze the hardness of this new approximate, average-case algorithm? And by the way, we can't quite prove this yet, but we've gotten better and better results, so let me tell you about them. The central object of study is what we call the delta random circuit estimation problem, and here's what it is. We're given as input a quantum circuit C, just like before, and now we want to output a number q that gives a delta additive approximation to the output probability of that circuit, with high probability over the circuit, say probability two thirds. So notice that this is an average-case and approximate problem. Again, there's a good day and a bad day. On a good day, with probability two thirds over the circuit, the algorithm gives us a delta estimate of the output probability. With probability one third over the circuit, it can output complete garbage and still solve the problem correctly. So it's an average-case, approximate problem. Now, using Stockmeyer's algorithm, here is what we get: to prove the hardness of approximate, average-case sampling, it suffices to prove that the delta-equals-two-to-the-minus-n random circuit estimation problem is #P-hard. There's only one slight sleight of hand I'm pulling here; otherwise you'd see exactly where this comes from. The sleight of hand is that now, for convenience, for pedagogical reasons, we're writing the error as additive rather than multiplicative, which is what we talked about before. But it turns out that for random quantum circuits, which is what we're talking about because this is an average-case problem that works with high probability over the choice of circuit, the size of most output probabilities is around two to the minus n. So a two-to-the-minus-n additive approximation is in fact a good multiplicative approximation as well, and that's why we can still invoke Stockmeyer's algorithm. If you understand that, great; if you don't, you can just take for granted that the goal, in order to prove hardness of sampling, is to prove that this delta-equals-two-to-the-minus-n average-case, additive approximate problem is #P-hard. Okay, so that's the goal. That's what we've been working on as a field for over ten years now. We still have not been able to solve this problem, but let me tell you how we've been doing, as a timeline in terms of delta. Remember, as delta gets larger, the problem gets easier, and so it's harder to prove that this easier problem is hard.
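For the record, here is the one-line calculation behind that sleight of hand, written under the lecture's assumption that a typical output probability of a random circuit is of order two to the minus n (the constant c below is whatever that typicality statement provides).

```latex
% Assumption (from the lecture): a typical output probability of a random circuit is of order 2^{-n}.
% Then an additive error at the scale 2^{-n} is automatically a constant-factor multiplicative error:
\[
  |q - p_0(C)| \le \varepsilon \cdot 2^{-n}
  \quad\text{and}\quad
  p_0(C) \ge c \cdot 2^{-n}
  \;\Longrightarrow\;
  |q - p_0(C)| \le \frac{\varepsilon}{c}\, p_0(C),
\]
% which is the kind of multiplicative estimate the Stockmeyer argument needs.
```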
Right, so in 2018, in our original result on this, we didn't even bother quantifying what the delta was; we showed that exactly computing the output probability of most random circuits is hard. It turns out that you can go back and retrace our steps, which is what Movassagh did, and the delta we end up showing is something like two to the minus O(m cubed). Now m here is the size, the number of gates, of the circuit; always think of it as scaling like n times the depth: n is the number of qubits, d is the depth of the circuit, and m is n times d. So in our original work we didn't even bother saying what it was; we just said it's two to the minus some polynomial in m. Movassagh, in 2019, had the really nice idea that we really should be accounting for these errors, and he worked it out; it takes a little bit of work, it's not obvious, but it ends up being that what we had shown was that the delta equals two to the minus O(m cubed) problem is hard. We were stuck for about two years, and then there was a breakthrough and we could prove two to the minus O(m log m). But we're still not at our goal, which is two to the minus n. One thing we can say, though, is that essentially all of these results, meaning all but a few that may not matter, work even at constant depth. So you can take m equal to n times a constant, that's your d, think four or five, some sufficiently large constant. In that case we're off from what we want by a constant in the exponent, well, times a log factor. But we're not there yet. Now for Boson Sampling, that's this linear-optical experiment, you can do a very similar analysis, and things look a little better; in fact it's kind of infuriating how close we are to what we want. It turns out the goal, the one over two to the n, is really one over the Hilbert space dimension of the relevant experiment. For an n-qubit experiment, you have a two-to-the-n-dimensional Hilbert space. For Boson Sampling with n photons and m equal to n squared modes, the relevant Hilbert space has dimension roughly e to the n log n, so the delta we want to hit becomes one over e to the n log n. The same results give us one over e to the six n log n. We can actually quantify the exponent: we are off by a factor of six in the exponent. That seems really close, especially if you've been thinking about these problems for as long as we have; we started out not even bothering to quantify the exponent, and now we can actually compute it, and there's a factor of six. But we don't know how to get much past that, and that's the frontier. Yes? Yeah, excellent question. Yes, you're absolutely right: if there were a technique that could just prove this is not in BPP to the NP, that would be all I would need. So strictly speaking, the #P-hardness conjecture is way more than I need. But the truth is I don't have many proof techniques that put something outside of the hierarchy, outside of, say, the third level, or outside BPP to the NP, without proving #P-hardness and then relying on Toda's theorem. Anyway, strictly speaking, absolutely yes. That's a great question. Any others? Yes? Ah, good: what is the random circuit distribution?
It's exactly the distribution I showed last time, the one I said Google was using, where you fix an architecture and draw your two-qubit gates Haar-randomly, although there's really no particular need to consider exactly that distribution. It turns out these arguments work for almost any distribution in which your gates are drawn from some continuous measure rather than from a discrete set. There are a few technical requirements on the gate set, but they're extremely minimal. Good, yes. [Inaudible question.] Yes, you're really aiming for something like two to the minus n over a polynomial, exactly, and that's why we're so worried about all of these constants and so on, even in the Boson Sampling case. Yes, right, it just doesn't work otherwise: two to the minus six n is not one over the Hilbert space dimension, and one over the Hilbert space dimension is what the technique needs. Said another way, it would not actually give you the multiplicative estimate that you want. Anyway, there are many deep answers to your question, but the short answer is simply that the techniques require two to the minus n, not two to the minus six n. Okay, other questions? These are good. Oh, wait, sorry, is there any hope of improving these techniques? I see, yes, good. Formally speaking, that would be great; I just don't see how to do it, and the reason always comes back down to Stockmeyer's algorithm. That's the only way I know how to translate between sampling, which is what the experiment is really doing, and computation, which is what we know how to prove things about. Stockmeyer's algorithm is the culprit here; it's an amazing algorithm, but it's also the culprit, because it gives you a multiplicative error, and the multiplicative error in this setting corresponds to a two-to-the-minus-n additive error, or really, like I said, more generally something like one over the Hilbert space dimension. So to answer your question: if it's possible, you'd have to find a way of reducing from sampling to computing that's tighter than the multiplicative Stockmeyer bound. I don't know how to do that, and people have not known how to do that since the eighties. Ah, yes, absolutely: give me such a circuit and we can talk about that. The problem is it wouldn't be enough just to give me a random distribution; you'd also have to show that the distribution has a worst-case hard circuit in its support, like the Fourier circuit I just gave you. So it's not totally obvious how to do that. But formally, yes, I agree, that would be a possibility. Anything else before I go on? Yes. Yeah, I think there's a good question in there, but as stated it's a little too vague. What I would say is that this whole lecture is complexity; we're going to be talking about improving complexity results left and right. Probably the answer to your question is yes, but it depends on how you put it. Okay, now, how much time do I have? Fifty minutes, good, perfect. So I'm going to tell you how we got all of these results, and the inspiration comes from one of my favorite results in complexity theory.
It's a really beautiful result from the classical literature, going back to the early nineties, due to Dick Lipton, who showed the average-case hardness of computing the permanent. What I mean by this is that the permanent is hard even on a random matrix, with high probability. Now, the starting point for his result is that in the late seventies there was a very famous result in complexity theory, due to Valiant, which showed that computing the permanent of an n-by-n matrix is #P-hard. Extremely surprising, especially in the seventies, because the permanent looks a lot like the determinant: the permanent of A is the sum over permutations sigma of the product over i of A_{i, sigma(i)}, and the determinant is the same thing except for one seemingly minor issue, namely that the determinant has an alternating sign, sgn(sigma), in front of each product. And somehow the determinant is easy to compute; we compute it all the time, you use Gaussian elimination, it's polynomial time. The permanent, and this is Valiant's result, is not only not known to be in polynomial time, it's not even in NP; it's a #P-hard problem. Okay, but Lipton is not happy with this. He wants to show something even stronger, which is average-case hardness: the hardness of computing the permanent of a random matrix, with high probability over the matrix. And for the purposes of only this slide, we're going to talk about matrices with entries in sufficiently large finite fields. I promise we'll get back to infinite fields in a moment, when we're talking about quantum, but finite fields for now. So to boost this hardness, from worst-case hardness to the stronger notion of average-case hardness, he uses an algebraic property of the permanent, which is really simple, and which you can verify immediately from the expression defining the permanent: the permanent of a matrix is a degree-n polynomial in n squared variables. What are these n squared variables? The entries of the matrix; it's an n-by-n matrix, okay, great. Now here's the setting of his proof. The goal is to compute the permanent of a worst-case matrix, an arbitrary matrix that you give me. You tell me: Bill, I need you to compute the permanent of this particular matrix. That's the hard problem, that's #P-hard, that's what Valiant showed. But all I have in my pocket is a faulty algorithm; it's an average-case algorithm, and I'm going to call it O. It's an algorithm that correctly computes the permanent of most matrices with entries in F_p, for a sufficiently large prime p. In other words, here's what O does: we give it as input a matrix Y, and it outputs something that agrees with the permanent of Y with high probability, say probability one minus one over some polynomial, to be determined in a moment. Notice this is an average-case algorithm; there's a good day and a bad day. On a good day, with probability one minus one over a polynomial in n, it gives me the permanent of that matrix. On a bad day, with probability one over a polynomial in n, it gives me absolute nonsense. From the point of view of the end user of the box, that's me, I have no idea when it gets it right and when it gets it wrong; I just know it gets it right most of the time. Nonetheless, I need to use this box to compute the permanent of the matrix that you gave me. Yeah? So how do we do this? It's a polynomial extrapolation argument, and I think a very nice one. So here's what we do.
We choose n plus one fixed nonzero points in our field F_p; call them t_1, t_2, through t_{n+1}. Then we choose a uniformly random matrix R over our field, and we fix R. Now we consider the line A(t) = X + t·R, where X is our worst-case matrix. So we take the worst-case matrix, the one we want the permanent of, the one you gave me, I don't get to choose it, and we shift it by a random matrix, a random shift t times R. That defines a line. Now there are two observations; they're both really simple, and once we're done with them, they'll spell out how the proof works. First, what I call the scrambling property: if I look at each i individually, A(t_i) is a uniformly random matrix. Why is that? It's not hard to see at all: we took a fixed matrix and shifted it by something uniformly random, so what we get out is uniformly random. Now, these points are clearly correlated with each other globally, because they all involve X, but each one individually is uniformly random. So they're random but correlated points. Second, what I call the univariate polynomial property: the permanent of A(t) is a degree-n polynomial, but crucially it's a degree-n polynomial in the single variable t. That is of course inherited from the algebraic property of the permanent itself, that the permanent is a degree-n polynomial in n squared variables; now that we've restricted it to this univariate line, we have a degree-n polynomial in a single variable. What happens next? Yeah, absolutely. Here's what I do: I use my box O to evaluate each of these points, the permanent of A(t_1), the permanent of A(t_2), and so on. Now I've evaluated a degree-n polynomial in a single variable at n plus one points. Well, that uniquely determines the polynomial. So we can use polynomial interpolation, grade-school polynomial interpolation, I think it's called Lagrange interpolation if you want to look it up on Wikipedia, to recover the coefficients of this degree-n univariate polynomial. Once we do that, it's totally trivial to get back the permanent of the worst-case matrix you gave me: we evaluate the polynomial at zero, and by construction A(0) = X. Now, the reason this works, even though the algorithm is faulty, is that we've been assuming, and I didn't really spell this out, that the probability that this box O, this faulty algorithm, works is something like one minus one over, let's say, a hundred times n; it doesn't have to be a hundred, but some large constant. And what that means, by a union bound, is that the probability that all of these n plus one points are answered correctly is sufficiently large. That's how we get around the fact that the box is faulty: with very high probability, the box outputs the correct permanent on all of the points, and once the box is correct on all of the points, we can recover our polynomial and evaluate it at zero. So what this is saying is actually really profound; it's not something we see very often in complexity theory, there's only a handful of examples like this.
This is a provable worst-to-average-case reduction, saying that the permanent is hard even on a random day. The hardness of the permanent, the reason it's #P-hard, is not just because of some very contrived instance; the hardness actually shows up for a good fraction of matrices. Another thing I'm not going to show you, but which you can look up if you're so inclined, is that this one-minus-one-over-poly success probability is not the best we can do. You can make this worst-to-average-case reduction work even if the box is correct only, say, a one-over-poly fraction of the time. In other words, the box is wrong almost all the time, it's extremely faulty, and yet you can use much more sophisticated techniques, it's called list decoding if you've heard of that, classical error-correction techniques, to extract from this super faulty box the permanent of our worst-case matrix. That's not going to be required for us, and I'm not going to explain it, but it's really cool. Any questions about the Lipton proof, though? Really? I don't believe it. Yeah, yes: I'm assuming the interpolation can be done efficiently; in fact, not only efficiently, you can look it up on Wikipedia, there's an explicit formula for the coefficients of the polynomial in terms of the points. So it's a super easy thing to do. There are of course much more sophisticated polynomial interpolation techniques you could use, but at this point we're just using what I call the grade-school method; Lagrange interpolation, I think, is the formal name. Yeah, you have a question? Sort of a historical question, yes. [Partly inaudible.] Okay, yeah, great. I mean, I was barely born at that point, so I don't remember it firsthand. I think it's a great question, and I think what you're really asking is how surprising Toda's theorem was. Everyone I've heard from says it was pretty surprising; I'm not sure it was clear that #P had a relationship to the PH at all, or really that P to the #P had a relationship to the PH, before Toda, and my sense is that it was pretty surprising, but you should ask someone who was around. That's right. And people did have the sense that #P was clearly the class of counting problems, generalizing NP; that's obvious by definition. What's not obvious, of course, is the relationship between the PH and #P, and my sense is that that was a pretty big deal, so probably pretty surprising, but ask someone who's a little older. Yeah, yes, exactly. Because the errors that we're making all come from this expression: the probability that the box works is at least one minus one over some polynomial, and the point is that we can push that polynomial as far as we want; it's a parameter of the problem. It makes the result weaker, but we can push it. Now I'm saying let's push the failure probability down to something like one over a hundred times n; I think even one over six times n plus one is enough, you can do the algebra. If that's the case, then we have n plus one points we're feeding it, and so, by a union bound, with very high probability it works on all the points, and when it works on all the points, this works exactly.
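Here is a minimal, self-contained sketch of the reduction as just described, at a toy field and matrix size. The faulty_oracle below is a hypothetical stand-in: in this toy it secretly computes the permanent itself, which of course is exactly what you would not have in the real setting; the point is only to show the scrambling, interpolation, and evaluation-at-zero steps.

```python
import random
from itertools import permutations

p = 10**9 + 7          # a prime standing in for the "sufficiently large F_p"
n = 4                  # toy size

def prod_mod(vals):
    out = 1
    for v in vals:
        out = (out * v) % p
    return out

def permanent_mod_p(A):
    # Naive O(n!) permanent, only for illustration; in the real argument we are
    # NOT allowed to run this on the worst-case matrix, only the oracle below.
    return sum(prod_mod([A[i][sigma[i]] for i in range(len(A))])
               for sigma in permutations(range(len(A)))) % p

def faulty_oracle(Y):
    # Hypothetical average-case box O: correct except with probability 1/(100 n).
    if random.random() < 1.0 / (100 * n):
        return random.randrange(p)           # bad day: nonsense
    return permanent_mod_p(Y)                # good day: the true permanent

def lipton_reduction(X):
    """Recover perm(X) for a worst-case X using only the faulty average-case oracle."""
    R = [[random.randrange(p) for _ in range(n)] for _ in range(n)]   # random shift
    ts = list(range(1, n + 2))                                        # n+1 nonzero points
    values = []
    for t in ts:
        # Each A(t) = X + t*R is individually uniformly random (scrambling property).
        A_t = [[(X[i][j] + t * R[i][j]) % p for j in range(n)] for i in range(n)]
        values.append(faulty_oracle(A_t))
    # perm(A(t)) is a degree-n univariate polynomial in t; Lagrange-interpolate it at t = 0.
    result = 0
    for i, ti in enumerate(ts):
        term = values[i]
        for j, tj in enumerate(ts):
            if j != i:
                term = term * ((0 - tj) % p) % p
                term = term * pow((ti - tj) % p, p - 2, p) % p   # modular inverse
        result = (result + term) % p
    return result        # equals perm(X) whenever the oracle was right on all n+1 points

X = [[random.randrange(p) for _ in range(n)] for _ in range(n)]
print(lipton_reduction(X), permanent_mod_p(X))   # agree with high probability
```

Evaluating the interpolated polynomial at t = 0 returns perm(X) exactly whenever the oracle happened to be correct on all n+1 scrambled points, which is the event the union bound makes likely.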
So at the end of the day we only get that this works with high probability, but of course we can amplify that probability. How do we amplify it? Fix the matrix X, but use many, many different R's, exactly, and keep repeating, and see how many times you get the same answer; this is an amplifiable problem. Good question, anything else? Yeah, yes, we'll get there, we'll get there. Yes, this is going to be very important. It's going to matter when we consider distributions over real or complex matrices, or really circuits, because over a finite field there's an obvious uniform distribution, whereas notions like closeness of distributions have to be defined before any of this makes sense. That is absolutely a big issue when we move to quantum circuits; such a big issue that we'll spend a lot of time talking about it. How much time do I have? Five to ten minutes. You know what, rather than racing through a genuinely new topic, this is a great place to stop; I'll just take more questions and then we'll stop whenever we want. Yes, well, okay, we're going to give an argument which extends this to, say, the complex numbers; it's just not what I did now. If you wait until tomorrow, I promise you will see the complex version of this thing. Things get a lot harder, a lot more difficult, but that doesn't mean you can't adapt it; it just means you have to work harder, and that's what we do, of course. Well, okay, so first of all, I told you it kind of doesn't carry over directly, in the sense that we end up fixing things, but the obvious issue is this: when you take a worst-case matrix, now over, say, a Gaussian distribution rather than a finite field, and you add t times another matrix, you don't necessarily preserve your distribution anymore. Think about it: say the random matrix is Gaussian, by which I mean each entry is i.i.d. Gaussian. What are you really doing? You're taking some fixed matrix and adding t times a Gaussian matrix. Do you still have a Gaussian matrix? No, you have a shifted Gaussian matrix; you've shifted each of the entries. Now, what ends up saving you, and we'll talk about this at least in passing, is that if you take t to be sufficiently small, then you can say: it's not Gaussian, but it's close to Gaussian. That ends up saving you, but it also makes this argument very sensitive to errors you might make in the extrapolation. We'll talk about that next time; in fact, it's one of the highlights of that discussion. So there is a way to make it work: the way we make it work is essentially by taking this t parameter to be really small and showing the result is close to a Gaussian matrix, but that only gets you so far. In fact, that's a big part of why we can't prove what we want to prove, this two-to-the-minus-n accuracy. Yes, oops. Yes, precisely. No, no, hold on: what it means is that the size of the field has to scale with n, that's all. Formally, yes, absolutely, that's what I mean by F_p, but you're right: for every n, the field size scales with n; otherwise it clearly doesn't work. Yes, it's a great question.
When you make the field size smaller, let's say we just fix it at three, then it's not clear how hard the average-case problem is; it's actually an open question. No, it has nothing to do with worst versus average case; it's worst-case hard, but the question is whether you can make it average-case hard. Yeah, oh, I see, that's what you meant. Yes: if you take zero-one matrices, it's worst-case hard, but we can't extend Lipton's argument to the average case. Sorry, I didn't understand what you were saying; yes, it does have to do with average-case complexity, forget what I said a moment ago. Thank you, good question. Yeah, exactly, right. I think they were moving very quickly and reconstructing how this proof works. We were conjecturing that you need this delta random circuit estimation problem, at delta equal to one over two to the n, to be #P-hard in order to get hardness of sampling, but they were pointing out: wait a second, this is using Stockmeyer's algorithm, and Stockmeyer's algorithm runs in BPP to the NP. Now, I'm not sure I told you this, but that sits in something like Sigma three, the third level of the polynomial hierarchy; the exact level doesn't really matter, it's in some constant level of the hierarchy. They were saying: to derive a contradiction, which is what we're trying to do, one way would be to show that that problem is not in Sigma three because it's #P-hard, and we happen to know, by what people call Toda's theorem, a famous result I didn't actually state, that a #P-hard problem cannot sit in any finite level of the hierarchy unless the PH collapses. But they were saying: wait a second, what if you could prove something weaker? Say the problem is hard for Sigma one hundred. That would be enough, because we don't think Sigma one hundred should be solvable in the third level. And that's completely correct on a formal level; it's just that I don't have any proof techniques at the moment that show Sigma-one-hundred hardness without showing #P-hardness, or something like that. If you can come up with one, because you're much more clever than I am, go ahead, and that would suffice. Yeah, yes, no. In fact, I'm not exactly sure; okay, hold on, first of all, we're talking about Haar-random two-qubit gates, right? And we're asking what distribution over two-qubit gates we really need. Well, it turns out that Haar-randomness is not super important. What is important, I think, is a few things. First of all, you want the support of the distribution over circuits to contain a worst-case hard circuit; in other words, you have to have some, maybe really small, probability of sampling the Fourier sampling circuit, for example, because that's how we get our worst case back. And then you need a certain scrambling property, which is true for Haar-random gates but is probably true more generally. We'll talk about what that scrambling property is, but fundamentally, one property of the Haar measure we're going to use, which can be relaxed a little bit but not too much, is actually the defining property of the Haar measure: if you take a fixed matrix and multiply it by a Haar-random matrix, you get back a Haar-random matrix, or something close to it.
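As an aside, here is a minimal numerical illustration of that invariance property, using the standard QR-of-Ginibre recipe for sampling Haar-random unitaries; the statistic being compared is just one arbitrary, hypothetical choice.

```python
import numpy as np

rng = np.random.default_rng(0)

def haar_unitary(d):
    # Sample a Haar-random d x d unitary: QR of a complex Ginibre matrix,
    # with the standard phase correction from R's diagonal.
    Z = (rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))) / np.sqrt(2)
    Q, R = np.linalg.qr(Z)
    return Q * (np.diag(R) / np.abs(np.diag(R)))

d = 4                                     # a two-qubit gate
V = haar_unitary(d)                       # stands in for a fixed "worst-case" gate
samples_plain   = [haar_unitary(d) for _ in range(2000)]
samples_shifted = [V @ haar_unitary(d) for _ in range(2000)]

# Invariance check on one simple statistic: the distribution of |<0|U|0>|^2
# should be the same whether or not we left-multiply by the fixed V.
stat = lambda U: abs(U[0, 0]) ** 2
print(np.mean([stat(U) for U in samples_plain]),
      np.mean([stat(U) for U in samples_shifted]))   # both approach 1/d
```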
Now you can generalize that a little bit, but that is the property we're going to use to adapt this Lipton-style proof. And by the way, now that we've gone through it, you can see right away why we'd do something like that: the whole goal of the proof is to take a worst-case matrix, or in our case a worst-case circuit, and make it look random. How do you do that? You want to scramble it. And how would you scramble a worst-case quantum circuit, naively? You multiply, left-multiply, right-multiply, however you want to set things up, each of your gates by a Haar-random gate, right? Yeah, but no, hold on, there's a misconception here; it's a great question, but it's a misconception. When people talk about 2-designs in the context of random quantum circuits, they're talking about the ensemble over the full circuit, not over the individual two-qubit gates. Oh, I see, you're saying those results work even if the gates are not Haar-random two-qubit gates but something else. Yes, but not my techniques. Interestingly, my techniques are not going to need 2-design properties, but we also don't know how to make them work with only a 2-design; they're orthogonal assumptions, and we'll see what I actually need, but the proof I'm going to use to adapt this Lipton argument to random quantum circuits is not going to use a moment property of the distribution, at least not directly. Maybe one more, do we have time for one more question? Or we're done.