So welcome everyone. This is the first talk of the fall semester of TCS+, so thank you for joining us. Today's speaker is Avishay Tal. Before we start the talk, I'll go around the table and introduce the groups. We have the group from Texas A&M joining us; Benjamin, we can't see you. Then we have the group from Stanford. Hi. Then we have the group from ETH Zurich joining us. Hello. Then we have K. Gopalakrishnan from North Carolina. Then we have Lebo here from RCQI; we again can't see you. Then we have Fadish joining us from MIT. Hello. Then we have the group from NYU, the group from Caltech, the group from Toronto, and finally the group from Michigan. Hello. Hi.

Okay. Before we start the talk: we currently do not have a talk scheduled two weeks from now; we are still working on it. But two weeks after that, on October 17th, we have Seshadhri speaking. Two weeks after that, on October 31st, we have Mihal Kuski, and then on November 14th we have Urmila Mahadev. And as usual, feel free to ask questions during the talk, but otherwise you'll be muted.

Today's speaker is Avishay Tal. Avishay is currently a postdoc at Stanford, and was previously a postdoc at the IAS. Before that, he did his PhD with Ran Raz at the Weizmann Institute. Avishay has done a fantastic amount of work spanning circuit complexity, concrete complexity, and pseudorandomness, and his latest work is on quantum computing. He's going to tell us about separating BQP from PH. So, Avishay.

Thank you so much for this great introduction. I want to thank the organizers for inviting me to talk at TCS+, and all of you for joining me in this talk.
So this is joint work with, as Anindya mentioned, Ran Raz, who was my PhD advisor and is currently at Princeton University. This is a talk about an oracle separation of BQP and the polynomial hierarchy.

Okay, so I will start by recalling some of our favorite complexity classes. We have P; we have NP, everything that can be verified efficiently; and co-NP, everything that can be refuted efficiently. If you want to capture both NP and co-NP, you need the polynomial hierarchy. This is basically the set of things you can express using alternating quantifiers, and we will see the exact definition in two slides. This entire polynomial hierarchy sits inside PSPACE, the things we can compute with polynomial space. We can also talk about randomized algorithms: for example, bounded-error probabilistic polynomial time, or BPP, is known to contain P and to sit inside the polynomial hierarchy, and under standard assumptions we believe that P equals BPP.

But the main motivating question here, at least for our work, is: where does BQP, bounded-error quantum polynomial time, sit in this picture? So what do we know? We know that BPP is inside BQP, because anything you can do classically you can do in a quantum algorithm. We also know that you can simulate BQP in PSPACE. So it's somewhere in the middle between BPP and PSPACE, and we would like a better understanding of where it is: what is the relation between BQP and NP, between BQP and PH, and so on. We believe that BQP and NP are actually incomparable: there are some problems that can be solved in NP and not in BQP, and vice versa. But in order to show such a thing, we would really need to separate complexity classes, and we don't know how to separate complexity classes in general; we don't even know how to separate P from PSPACE, and anything in between is even harder than that. So we need to relax this notion of separation, and we relax it by considering oracle separations. And this was considered in the past.
So in particular, this work of Bennett, Bernstein, Brassard, and Vazirani showed an oracle separation between NP and BQP. They showed that there exists some oracle A and a language that can be decided in NP with access to the oracle A, but cannot be decided in BQP with access to this oracle A. You can see this as some evidence that quantum algorithms are unlikely to solve NP-complete problems. It's weak evidence, because it's with respect to an oracle, but you can view it as evidence.

I also want to mention the seminal result of Bernstein and Vazirani. They showed that quantum algorithms are superior to randomized algorithms relative to an oracle: BQP is not in BPP with respect to some oracle. You can see it as a kind of quantum supremacy theorem. And this was strengthened by the work of Watrous, who showed that BQP is not even in MA, Merlin-Arthur, with respect to some oracle; MA is a generalization of both NP and BPP.

Okay, but it still remained open whether or not there is an oracle separation between BQP and PH, or even between BQP and AM, Arthur-Merlin, which is below the second level of the hierarchy. In this work, we resolve this problem with respect to an oracle: our main result is an oracle separation of BQP and PH.

Let me just remind you what PH is, and define it formally. A language L is in the polynomial hierarchy if there exist a constant k and some efficiently computable relation R such that x is in L if and only if the expression on the right-hand side holds: there exists a string y1 such that for all strings y2 there exists a string y3, and so on for k quantifiers, and then you apply the relation R to x and all these y's. What do we require? That k, the number of quantifiers, is a constant, and that the lengths of these strings y are polynomial in the length of x. If you think about it, NP is captured by just one quantifier.
There exists a y1 such that the relation R holds on x and y1; and co-NP is captured by a single for-all quantifier: for all y2, the relation R holds on x and y2. So both of them are captured within this complexity class. And our main result is that there exist an oracle and a language that can be decided by BQP algorithms with access to this oracle but cannot be decided by PH algorithms with access to the same oracle.

I want to tell you one way to interpret this result, which is a quote by Lance Fortnow. He says: even in a world where P equals NP, and even equals the whole polynomial hierarchy, even making such a strong assumption is not going to be enough to capture quantum computing. There would still be problems that a quantum computer could solve that classical computers cannot. And this is, I think, one way to view this result.

Okay, so I mentioned oracles. I want to show you this beautiful picture drawn by Kevin for Quanta Magazine in the article they published about our work. You can think of it as an oracle: you press a button in the middle, and then it tells you a plus one or a minus one. So what are these oracles? They are sort of giving you hints that may or may not help you solve a particular problem. And when we're doing an oracle separation, we want to show that the oracle helps, let's say, the BQP algorithm, but doesn't help the PH algorithm.

But I will quickly move on to a closely related model, called the black-box model; there are very strong connections between these models. So let me specify what I mean by the black-box, or query, model. Think of the input x as a huge string of length capital N, and the only access that you have to this huge string is via black-box queries. You can ask: what is the value of the i-th coordinate of x? And then you'll get x_i from the black box, which will be a plus or minus one.
This plus-minus-one notation for bits will be useful later in the talk. In this black-box model, we are trying to figure out some property of this long string x, and we are trying to do it making as few queries as possible to the black box. Deterministic algorithms in this model correspond exactly to decision trees: the number of queries you make to the black box is exactly the decision-tree depth of the problem you're trying to solve. You can also think of randomized, nondeterministic, and quantum analogs of this model, and it turns out that the quantum analog is very useful.

So what do you do in a quantum algorithm? Instead of asking about a specific position i, you ask a query in superposition. You have a superposition over all possible coordinates: for each i you have some amplitude alpha_i, and you want it to be a quantum state, so you require that the sum of squares equals one. Then you get as an answer a superposition over all possible answers: each state i is multiplied by the sign of x_i. So the amplitudes don't change in magnitude, but their signs may change. And how do you use this black box? One query looks like that; then you do some post-processing, some manipulation of the quantum state, then another query, another manipulation, another query, and so on. The complexity measure we consider is the number of queries.

I want to mention what the analog of the polynomial hierarchy is in this model, and it is actually bounded-depth circuits, or AC0 circuits. In fact, AC0 circuits were originally defined as an analog of PH in the oracle model, or the black-box model, in order to come up with oracle separations between PH and other classes. We will see this later on.
And now I want to forget about oracles for a second and think just of black-box separations. We will show that some problem can be solved in the black-box model using a quantum algorithm but cannot be solved using AC0 circuits. Using well-known reductions that date back to the early '80s, this implies an oracle separation between BQP and the polynomial hierarchy.

Let me just very briefly mention why PH corresponds to AC0 circuits. Recall that we had this expression: there exists, for all, there exists, for all, and then some relation; this describes a PH language. We can replace any existential quantifier by an OR gate and any universal quantifier by an AND gate, and then replace the relation at the bottom by some DNF. Starting from an expression with k quantifiers, this gives us a circuit of depth k+1 or k+2 that captures the language in the black-box model. So this is the correspondence between the two models, but you can now forget about the previous model and concentrate on this new one.

In fact, I'm going to talk about a question about pseudorandomness, and it will turn out that understanding this question about pseudorandomness actually proves the oracle separation that we want. So what is pseudorandomness? Let's look at this picture. On the right-hand side we have a truly uniform string: we have N coins, each of them heads with probability one half and tails with probability one half, and all of them independent of one another. On the left-hand side, we have a pseudorandom string: a string that looks random. But then the question is, looks random to whom? Who is looking? There is some function that tries to do some computation on these bits, and the bits should look random to this function. Let me be a bit more precise. We say that a distribution is pseudorandom against a class of functions C...
...if any function in this class has similar expectation under the distribution D as it has under the uniform distribution. In other words, the function cannot tell the pseudorandom distribution and the uniform distribution apart; it acts the same on both of them. So in some sense, the distribution looks random to this class of functions.

Okay, so in these two beautiful works of Aaronson and of Fefferman et al., they posed this challenge: can you find a distribution which looks random to AC0 circuits, that is, bounded-depth circuits, but doesn't look random to polylog-time quantum algorithms? If you could meet their challenge, if you could find a distribution with these two properties, then they showed a reduction building an oracle separation between BQP and the polynomial hierarchy. I will not get into much detail about how this reduction works, but these are arguments that go back to the early '80s, to the work on oracle separations of P and NP; basically these are very classical arguments.

Okay, so we will meet this challenge. To state our main result, I need the definition of an advantage. We say that an algorithm A distinguishes between the distribution D and the uniform distribution with advantage alpha if alpha is the difference between the acceptance probability of A on the distribution D and the acceptance probability of A on the uniform distribution. Okay. And our main result is that we present a distribution D such that there exists a log-N-time quantum algorithm distinguishing between D and U with advantage at least one over log N. However, any quasipolynomial-size constant-depth circuit distinguishes between these two distributions with advantage at most one over the square root of N. So as you can see, there is an exponential gap between the advantage of the quantum algorithm and that of the constant-depth circuit. And in fact, you can make the advantage of the quantum algorithm even bigger.
By standard techniques, you can amplify the advantage of the quantum algorithm: by repeating the algorithm polylog(N) times, you can get the advantage to be one minus one over poly(N). I will not go into these amplification techniques. Let me just give you the highlights of the proof of this main result, showing that you have a pretty good advantage, one over log N, for the quantum algorithm, and a really tiny advantage, one over the square root of N, for bounded-depth circuits.

I have a question. What is the role of log N versus quasipolynomial in N? If instead of log N it was log squared, or polynomial in N, would it still be interesting?

Yeah, so basically for the first bullet, I want the quantum algorithm to run in polylog time, so that would work just as well. I'm just saying that the quantum algorithm we will demonstrate actually runs in logarithmic time, and in fact it makes only one query to the black box, so it's a very simple quantum algorithm. But any quantum algorithm that runs in polylog(N) time would be good enough for our result.

Okay. So what will be the plan for the rest of the talk? (Sorry, it's going a bit slowly; my computer is not super powerful, so every time I click I need to wait a couple of seconds.) I want to give you the proof highlights of this main result. First, I want to define this distribution D. This distribution D, remember, should look random to constant-depth circuits, but should not look random to a very simple quantum algorithm. The second step is that I'm going to briefly present a quantum algorithm, actually given by Aaronson and Ambainis, that distinguishes between D and U. Basically both steps one and two were done maybe eight or nine years ago by Scott, but point number three was missing.
So it was conjectured that the distribution Scott suggested is pseudorandom against AC0, but a proof was missing, and this is our main contribution: we show that this distribution D is pseudorandom against constant-depth circuits. In fact, we change the original suggestion of Scott a bit, and let's see what this distribution looks like.

Okay, so let's start with point number one, the definition of the distribution D. This is based on Aaronson's suggested distribution, which he called Forrelation. We want to construct a distribution over strings of length 2N; it will be convenient to think of strings of length 2N, as you'll see from the definition of the distribution, and it will be good to think of N as a power of two. I will have some parameter epsilon, which is roughly one over log N, and you'll see what role it plays in this slide. I'm first going to define a multivariate Gaussian distribution; this will be a distribution over real vectors, and then from this distribution I'm going to derive a distribution over Boolean strings.

Okay, so let's start with this Gaussian distribution G. I'm taking a multivariate Gaussian distribution over 2N dimensions, so it's over R to the 2N. It has mean zero, and the covariance matrix is a 2N-by-2N matrix with a block structure: I can write it as four N-by-N blocks. The first block is an identity matrix, the second is a Hadamard matrix, then a Hadamard matrix and an identity matrix, and I multiply all the entries by epsilon. What is this Hadamard matrix? This is a well-known matrix that is used all over the place: in communication complexity, in quantum computing, in extractors, and so on and so forth. It's a well-analyzed matrix. What does it look like?
All the entries are plus or minus one divided by the square root of N, and the sign depends on the inner product of i and j, or rather of the binary representations of i and j. Actually, the only things that will be important for us are the fact that quantum algorithms can apply this Hadamard matrix efficiently, and the fact that the entries are somewhat small, plus or minus one over the square root of N.

Okay, so this is the multivariate Gaussian distribution; how do we derive from it a distribution over Boolean strings? We first sample Z according to this multivariate Gaussian G, so we get 2N real numbers, which can be arbitrary reals. The first step is to truncate all 2N coordinates to be within [-1, 1]: if a number is smaller than -1, we fix it to be -1; if it's between -1 and 1, we keep it as is; and if it's bigger than 1, we make it 1, okay? Now you can see that the choice of epsilon was crucial for this first step. I picked epsilon to be roughly one over log N so that with high probability all the coordinates are already within [-1, 1], so this truncation actually does nothing to the vector Z; that's because each coordinate has mean zero and variance epsilon.

Okay, so now I have a bunch of numbers between -1 and 1, and I'm going to draw a Boolean string from them. For each coordinate i between 1 and 2N, I'm going to think of this number Z_i as the bias of a coin, and I'm going to flip a coin with this bias: I pick Z'_i to be a plus-minus-one random variable whose expectation equals the value of Z_i. Let's do a thought experiment: what happens when Z_1 equals zero? Then I'm really just flipping a fair coin, right? Plus one with probability one half, minus one with probability one half.
If Z_1 were one, then Z'_1 would surely be one. And if Z_1 were one half, then I'd be flipping a biased coin: plus one with probability three quarters and minus one with probability one quarter. Okay, so this is the way to go from the Gaussian distribution to the discrete distribution.

I want to mention that we slightly modified Aaronson's suggestion. His suggestion was to take Z' to be simply the sign of Z: to go from Gaussian to discrete, just take the signs. Our analysis crucially relies on the fact that we are instead flipping biased coins with these biases, and we will see where this comes in towards the end of the talk.

I also want to mention that it's not a priori clear that such a Gaussian distribution exists; I just gave you the covariance matrix. So let me be more explicit about it. You can think of this Gaussian distribution G as first sampling N coordinates X_1 up to X_N i.i.d. from the Gaussian distribution with mean zero and variance epsilon, and then taking Y_1 up to Y_N to be H times (X_1, ..., X_N), where, recall, H is the Hadamard matrix. Now we have X_1 up to X_N and Y_1 up to Y_N, and we simply take Z to be the concatenation of the two. Notice that we don't have any correlation among the X's, and you can show that we don't have any correlation among the Y's either; the only correlations we have are between the X_i's and the Y_j's, and these correlations go via the Hadamard matrix. Maybe it's easier to see in the previous view: within the left half there is no correlation, and within the right half there is no correlation, but between the left half and the right half we have small correlations, something like one over the square root of N. And I want to give you some intuition for the difference between the quantum algorithm and the classical algorithm.
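To make the sampling procedure concrete, here is a small NumPy sketch (my own code, not from the talk; the names `hadamard` and `sample_D` are mine): sample x from N(0, eps*I), set y = Hx, truncate to [-1, 1], and then flip, in each coordinate, a plus-minus-one coin whose expectation is that coordinate.

```python
import numpy as np

def hadamard(n):
    """Normalized Hadamard matrix with entries (-1)^<i,j> / sqrt(n), for n a power of two."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])   # Sylvester construction
    return H / np.sqrt(n)

def sample_D(n, eps, rng):
    """One sample from the distribution D over {-1, 1}^{2n}."""
    x = rng.normal(0.0, np.sqrt(eps), size=n)        # i.i.d. Gaussians, mean 0, variance eps
    y = hadamard(n) @ x                              # correlated second half: y = H x
    z = np.clip(np.concatenate([x, y]), -1.0, 1.0)   # truncation to [-1, 1]
    # z'_i is a +/-1 coin with expectation z_i, i.e. P(+1) = (1 + z_i) / 2
    return np.where(rng.random(2 * n) < (1 + z) / 2, 1, -1)
```

Since the normalized Hadamard matrix is orthogonal, both halves of z have identity covariance (times eps), matching the block covariance matrix from the slide.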
So we have a lot of small pairwise correlations in this distribution. The quantum algorithm somehow manages to accumulate all of them together: it takes all these correlations of magnitude one over the square root of N and gets from them something with a big advantage, something like epsilon, or one over log N. We will show that the classical algorithm cannot accumulate them; it can accumulate something like polylog many of them, but not more than that. This is the main difference between the two models. But this was very hand-wavy, and we will see the details next.

Avishay, I've got a small question. Here you do the truncation to bring the Gaussian back into the range [-1, 1]. If you did something else, not just a truncation, but a mapping like arctan to get back into [-1, 1], would the rest of the proof go through, or do you also rely on the fact that you actually have a truncated Gaussian?

I haven't checked this; it's something that I want to consider going forward. But I think you are probably correct: probably there are different transformations taking the reals into [-1, 1] that would work. We analyzed this particular way of mapping the reals into [-1, 1], but it's definitely worth checking whether something like arctangent would work. Maybe you could actually improve the parameters that way, because I'm really insisting on picking the parameters so that with high probability the numbers are within [-1, 1], and with some other transformation maybe you would not need this epsilon. But I think this would be a slight improvement over the result, not a significant one. Excellent question, thank you.

Okay, so we are done with the definition of the distribution D, and it would be a good time...
...actually, to ask questions about this distribution, although we are not going to use all of its properties, just to make sure we're on the same page. Basically, if you look at the first half of the coordinates of the distribution D, it's actually uniform: the uniform distribution over Boolean strings on N coordinates. The second half is also uniform, and you have some correlation between the two parts.

Okay, so now I'm moving to the second step, which is to show that there exists a quantum algorithm distinguishing D from U. This shouldn't be too surprising, because Scott Aaronson came up with this distribution, which we modified a bit, together with an algorithm at the same time; he had an algorithm in mind when he cooked up this distribution. I will briefly show what the algorithm looks like, but before that, let me state its properties. In this work of Aaronson, and then in the follow-up by Aaronson and Ambainis, they gave a one-query, log-N-time quantum algorithm Q such that on input (X, Y), the acceptance probability of Q, where the probability is only over the randomness of Q, that is, over the quantum measurement, with X and Y thought of as fixed, equals (1 + phi(X, Y)) / 2, where phi is this very nice expression: one over N to the three halves, times the sum over i and j of minus one to the inner product of i and j, times X_i times Y_j. This plus-minus-one term should look familiar from the definition of D; and this is the acceptance probability of the quantum algorithm.
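The expectations of phi under the uniform distribution and under D, computed on the next slide, can be sanity-checked numerically. Here is a Monte-Carlo sketch (my own code, not the speaker's; the sampler is repeated so the snippet stands alone), estimating E[phi] under U, which should be near 0, and under D, which should be near eps:

```python
import numpy as np

def hadamard(n):
    # Normalized Hadamard: entries (-1)^<i,j> / sqrt(n), for n a power of two
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def phi(x, y):
    # n^{-3/2} * sum_{i,j} (-1)^<i,j> x_i y_j  ==  (x^T H_norm y) / n
    n = len(x)
    return (x @ hadamard(n) @ y) / n

def sample_D(n, eps, rng):
    x = rng.normal(0.0, np.sqrt(eps), size=n)
    y = hadamard(n) @ x
    z = np.clip(np.concatenate([x, y]), -1.0, 1.0)
    return np.where(rng.random(2 * n) < (1 + z) / 2, 1, -1)

def estimate(n=64, eps=0.1, trials=4000, seed=1):
    rng = np.random.default_rng(seed)
    u = np.mean([phi(*np.split(rng.choice([-1, 1], size=2 * n), 2))
                 for _ in range(trials)])
    d = np.mean([phi(*np.split(sample_D(n, eps, rng), 2))
                 for _ in range(trials)])
    return u, d
```

With n = 64 and eps = 0.1 (illustrative parameters I chose, not the talk's one-over-log-N setting), the two estimates should land near 0 and near 0.1 respectively, up to sampling noise.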
So now, using this statement, let's show that the algorithm distinguishes between the uniform distribution and the distribution D. What is the expectation of phi under the uniform distribution? Remember that we are thinking of bits as plus-minus-one values. We can use linearity of expectation and show that for each monomial, the expectation of X_i Y_j under the uniform distribution is exactly zero, right? Because X_i and Y_j are independent under the uniform distribution and the expectation of each of them is zero: each is plus one with probability one half and minus one with probability one half. Okay, so this is fairly simple, and the other bullet, showing that the expectation under D is big, is also pretty simple. Why is that? Again, we use linearity of expectation, and we notice that the expectation of each monomial X_i Y_j is exactly minus one to the inner product of i and j, times epsilon over the square root of N. So the plus-minus-one signs cancel each other out, and every term comes with a positive sign. In this way, the quantum algorithm takes all these small, one-over-square-root-of-N correlations, gives all of them a plus sign, and sums them up. In the end, the expectation of phi is epsilon, which is roughly one over log N, and indeed the difference between the acceptance probabilities of Q on the uniform distribution and on the distribution D is about one over log N, as promised.

Okay, I didn't really describe the quantum algorithm, but it's fairly simple; I won't go into much detail, but let me briefly show how it looks. You have only one query, so you cannot do too much. You first prepare a state in superposition over all possible indices.
Think of |0, i> as a coordinate in the X part and |1, i> as a coordinate in the Y part. So I prepare a superposition over all possible queries, and then I query the black box. Getting to step two, I multiply the state |0, i> by the sign of X_i and the state |1, i> by the sign of Y_i. Then I do some post-processing: I apply the Hadamard transformation only on the second half of the state. (Sorry if this notation is not clear; this will be the only slide that uses the bra-ket notation, so if you don't know it you can tune out for one slide and come back on the next.) When you do this Hadamard transformation, and it's important that you can do it fast, in logarithmic time, you get the state in bullet three. Then you measure the first qubit in the plus-minus basis, and it's a small calculation to show that the acceptance probability is exactly the expression we saw on the earlier slides. This was maybe a bit quick, but I just wanted to demonstrate that the quantum algorithm is not super complicated, and that shouldn't be surprising, because the algorithm was designed with the distribution in mind, or vice versa; both came at the same time from Scott.

Okay, so now that we have finished the quantum part, I want to focus on the main result, which is the proof that this distribution D is pseudorandom against AC0 circuits. I want to make sure we are on the same page and recall what AC0 circuits are. We are talking about circuits with AND and OR gates, where each gate can have unboundedly many inputs: the fan-in, or in-degree, can be unbounded. The main restriction on the circuit is that the depth is bounded: the number of gates on any path between an input and the output is at most a constant.
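The small calculation at the end can be verified with a state-vector simulation (again my own sketch, not the speaker's code): tracking the two branches of the superposition through the query, the partial Hadamard transform, and the plus-minus measurement reproduces the acceptance probability (1 + phi(x, y)) / 2 exactly.

```python
import numpy as np

def hadamard(n):
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def phi(x, y):
    n = len(x)
    return (x @ hadamard(n) @ y) / n

def accept_probability(x, y):
    """Simulate the one-query algorithm on fixed +/-1 strings x, y."""
    n = len(x)
    # Step 1: uniform superposition over |0,i> and |1,i>; a and b hold the two branches.
    a = np.ones(n) / np.sqrt(2 * n)
    b = np.ones(n) / np.sqrt(2 * n)
    # Step 2: the single query multiplies the |0,i> amplitudes by x_i and |1,i> by y_i.
    a = a * x
    b = b * y
    # Step 3: Hadamard transform on the index register of the |1,.> branch only.
    b = hadamard(n) @ b
    # Step 4: measure the first qubit in the +/- basis; the |+,i> amplitude is (a_i + b_i)/sqrt(2).
    return np.sum((a + b) ** 2) / 2
```

Expanding the final sum gives (||a||^2 + ||b||^2 + 2<a, b>) / 2 = (1 + phi) / 2, which is exactly the expression from the earlier slide.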
We'll denote the depth by d and the size, the number of gates, by s, and we will focus on the parameter regime where the size is N to the polylog and the depth is a constant. The reason we focus on these parameters is that this is the analog of PH in the black-box model.

Okay, so what do we know about bounded-depth circuits? We really love AC0 circuits in the circuit-complexity community, because we know how to prove pretty good lower bounds against them. In particular, the seminal results of first Furst, Saxe, and Sipser, and then Yao and Håstad, show that parity cannot be computed in this model: it requires more than polynomial size, in fact size exponential in N to the one over (d - 1). These results that parity is not in AC0 were really motivated by the application of showing that there exists an oracle relative to which PSPACE is not in PH; in fact, even the definition of AC0 circuits was made in order to show this oracle separation. So this is a really classical result.

I want to mention that there are many proofs of it, but let's look at one particular proof that came later, by Linial, Mansour, and Nisan, based on Håstad's switching lemma. Linial, Mansour, and Nisan showed that AC0 circuits can be well approximated by low-degree polynomials. On the other hand, if you want to approximate parity by low-degree polynomials, you simply cannot: even to approximate it slightly, you need degree at least on the order of N. This shows there is a difference between these circuits and parity, and in particular parity cannot be in this class. You can even derive average-case hardness of parity from this, but that's not our goal here. You could try to adopt this proof technique for our task, but you end up facing a problem with this approach. What is the problem?
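As a toy illustration of this contrast (my own example on 4 variables, far smaller than the actual theorems): a brute-force Fourier computation shows that all of parity's weight sits on the top level, so no low-degree polynomial can correlate with it, while a simple AND gate, the most basic AC0 circuit, has almost all of its weight on low degrees.

```python
import itertools
import math

def fourier(f, n):
    """Brute-force Fourier coefficients of f: {-1,1}^n -> {-1,1}."""
    points = list(itertools.product([-1, 1], repeat=n))
    subsets = itertools.chain.from_iterable(
        itertools.combinations(range(n), k) for k in range(n + 1))
    return {S: sum(f(x) * math.prod(x[i] for i in S) for x in points) / 2 ** n
            for S in subsets}

def weight_by_degree(coeffs, n):
    """Total squared Fourier weight at each degree 0..n (sums to 1 for Boolean f)."""
    w = [0.0] * (n + 1)
    for S, c in coeffs.items():
        w[len(S)] += c * c
    return w

n = 4
parity = lambda x: math.prod(x)                       # +/-1 parity of the bits
AND = lambda x: 1 if all(b == 1 for b in x) else -1   # a depth-1 "circuit"
w_parity = weight_by_degree(fourier(parity, n), n)
w_and = weight_by_degree(fourier(AND, n), n)
```

Here `w_parity` puts all its mass at degree n, while `w_and` concentrates almost all of its mass at degrees 0 through 2; this is only a cartoon of the low-degree-concentration phenomenon, not the Linial-Mansour-Nisan statement itself.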
Recall that we want to separate very efficient quantum algorithms, running in logarithmic time and making a constant number of queries, from AC0 circuits. However, any quantum algorithm that runs in logarithmic time can also be well approximated by low-degree polynomials, so for instance we cannot compute parity with such a quantum algorithm. It seems that low-degree polynomials are not the right language to distinguish between the two, and this made a lot of people not go in this direction. We insisted on going in this direction, and we were hopeful because we found a difference between these two kinds of polynomials. It's true that both BQLOGTIME, things that can be computed by a quantum algorithm in logarithmic time, and constant-depth circuits can be approximated by low-degree polynomials, but these polynomials are really different; they are not of the same flavor. In particular, BQLOGTIME can have very dense low-degree polynomials describing its acceptance probability. We saw that for the quantum algorithm we had before, the acceptance probability is (1 + phi) / 2, where phi is a real low-degree polynomial: as a polynomial in X and Y it has degree two, but it's really dense. It has N squared monomials, and you cannot drop some of them and still get a good approximation. On the other hand, in a previous result of mine from four years ago, I showed that AC0 circuits have sparse low-degree approximations: if you look at how many coefficients you have on monomials of degree k, it's something like polylog(N) raised to the power k. To be a bit more precise, you can look at the Fourier representation of the AC0 circuit, and the sum of absolute values of the coefficients on sets of size k is at most polylog(N) raised to the k.
If you are not familiar with this notion of Fourier coefficients, I will show it on the next slide. Okay, so this gave us some difference between BQLOGTIME and AC0 circuits. The question is: is it enough? We will make a first attempt at applying this to derive a Fourier-analytic proof that the distribution is pseudorandom against AC0 circuits. From the name "first attempt" you can guess how it will end — it will fail — but we will be able to fix it. Okay, so what is this Fourier expansion? For any Boolean function, we use the fact that you can write it uniquely in the Fourier representation as a sum over all sets S of a coefficient F-hat of S — just a real number between minus one and one — times the monomial, the product of X_i for i in S. So this is a way to write a Boolean function as a multilinear polynomial. Once you do this, it allows you to do interesting things with the representation: you can now plug in real values instead of just Boolean inputs. Because this is a polynomial over the reals, you can ask for the value of this polynomial on, say, the all-zeros vector. Over {−1, 1}^{2N}, the value will be exactly the same as the value of the function — this is what we require — but it's not immediately clear how to interpret its value over the reals. So let's take some vector Z where all the coordinates are between minus one and one, and think about how we should interpret the value of this multilinear polynomial on Z — how should we interpret F of Z? A simple claim is that F of Z is really the expectation of F of Z', where Z' is drawn according to biased coins with the biases defined by Z. That is, Z'_i is a biased coin whose bias is exactly Z_i. How can we prove this? It's a fairly simple claim, but just to illustrate it.
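The claim on this slide — that the multilinear extension evaluated at a bias vector z equals the expectation of f under independent z_i-biased coins — is easy to verify by brute force for a small function. A toy Python check (3-bit majority is my own choice of example):

```python
import itertools
import math

def majority3(x):
    return 1 if sum(x) > 0 else -1

def fourier(f, n):
    # fhat(S) = E_x[f(x) * prod_{i in S} x_i] over the uniform distribution
    pts = list(itertools.product([-1, 1], repeat=n))
    return {S: sum(f(x) * math.prod(x[i] for i in S) for x in pts) / len(pts)
            for k in range(n + 1) for S in itertools.combinations(range(n), k)}

def multilinear_extension(coeffs, z):
    # F(z) = sum_S fhat(S) * prod_{i in S} z_i, now with real-valued z
    return sum(c * math.prod(z[i] for i in S) for S, c in coeffs.items())

def biased_expectation(f, z):
    # E[f(z')] where z'_i = +1 with probability (1 + z_i)/2, independently
    n = len(z)
    return sum(f(x) * math.prod((1 + z[i] * x[i]) / 2 for i in range(n))
               for x in itertools.product([-1, 1], repeat=n))

coeffs = fourier(majority3, 3)
z = (0.3, -0.5, 0.1)
assert abs(multilinear_extension(coeffs, z) - biased_expectation(majority3, z)) < 1e-9
# At the all-zeros vector the extension is exactly the mean under the uniform
# distribution (the empty-set Fourier coefficient); for majority this is 0.
assert abs(multilinear_extension(coeffs, (0.0, 0.0, 0.0))) < 1e-9
```

This is exactly the interpretation used in the next step: F evaluated at the origin is the uniform-distribution expectation, and F at a bias vector is the expectation under the corresponding product distribution.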
Start by considering the case where F is just a single monomial, like the product of X_i for i in S. Then you use the fact that the expectation of a product is the product of expectations when the random variables are independent — and we picked the Z'_i independently of one another. So the claim is obviously true for a single monomial. And when we have a linear combination of monomials, a general Fourier representation, we can just use linearity of expectation. So this claim really gives you a nice interpretation of the value of the Fourier expansion, this multilinear polynomial, on non-Boolean inputs. In particular, what is the value on the all-zeros input? It's the expectation of F on a uniform Boolean string — the expectation of F under the uniform distribution. And this is really something we want to capture. Okay, so I hope that was clear. Now recall our goal: we want to show that the expectation of F of Z', where Z' is drawn according to D, is very similar to the expectation of F under the uniform distribution — similar up to something like polylog(N) over the square root of N. Recall how we sample Z': we sample Z according to the Gaussian distribution G, and then we truncate all the coordinates. You can more or less ignore this step, because, as we said, with high probability the truncation does not take effect. Then we treat these numbers Z_1 up to Z_{2N} as biases of coins, and we draw plus-minus-one variables independently with these biases. This should be very familiar from the previous slide. So you can show that the expectation of F on Z' is the same as the expectation of F on the truncated version of Z.
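The sampling procedure for D can be sketched in a few lines of Python. Note this is only illustrative: the covariance structure of the actual Gaussian G is not specified on this slide, so the sketch uses i.i.d. N(0, ε) coordinates as a placeholder (the real construction uses correlated coordinates); the truncate-then-flip-biased-coins pipeline is the part being illustrated:

```python
import random

random.seed(0)

def sample_from_D(n, eps):
    """Illustrative sampler in the spirit of D (NOT the paper's actual G:
    here the coordinates are i.i.d. N(0, eps); the real construction uses a
    correlated Gaussian). Steps: Gaussian sample -> truncate to [-1, 1] ->
    use coordinates as biases of independent +/-1 coins."""
    z = [random.gauss(0.0, eps ** 0.5) for _ in range(n)]
    z = [max(-1.0, min(1.0, zi)) for zi in z]                  # truncation step
    return [1 if random.random() < (1 + zi) / 2 else -1 for zi in z]

n, eps, trials = 1000, 0.01, 200
samples = [sample_from_D(n, eps) for _ in range(trials)]
# Each coordinate is an unbiased coin on average (the Gaussian has mean zero),
# so the empirical mean should be close to 0, as under the uniform distribution.
mean0 = sum(s[0] for s in samples) / trials
assert abs(mean0) < 0.25
# With eps << 1, a coordinate exceeds 1 in absolute value only with tiny
# probability, so the truncation almost never takes effect.
raw = [random.gauss(0.0, eps ** 0.5) for _ in range(10000)]
assert sum(1 for v in raw if abs(v) > 1) == 0
```

The last assertion is the "you can ignore the truncation" remark in miniature: with variance ε ≪ 1, exceeding 1 in absolute value is a many-sigma event.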
And since the truncated version of Z equals Z with high probability, we get that this is very similar to the expectation of F of Z with no truncation at all. Okay, so what does this mean? It means we can replace the first expression — the expectation of F under D — by the expectation of F on Z, where Z is distributed according to the Gaussian distribution, and we can replace the uniform distribution with F on the all-zeros input. Our new goal will be to show that the difference between these two quantities is small. So now we really think of the value of this multilinear polynomial on real vectors: the all-zeros vector, and a vector drawn according to a Gaussian distribution. Okay, so let's try to do it — and, as the name of the attempt suggests, it will fail, and we'll see how to fix it. Let's try to bound the difference between these two expectations. By definition, I can just write down the Fourier expansion of F and go term by term. The constant terms cancel each other out, because F on the all-zeros input is just the empty-set coefficient — all the other monomials vanish at zero. So we are left with all the sets S that are non-empty, and we have the Fourier coefficient F-hat of S times the expectation of the monomial, the product of Z_i for i in S, under G. Now we can use Isserlis' theorem, together with the fact that we have a Gaussian distribution with zero mean. This theorem, which is like a hundred years old, tells us that the odd moments of a zero-mean multivariate Gaussian distribution are zero. So we can drop all the odd-degree monomials, and we are left only with monomials of degree 2L.
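The vanishing of odd moments is easy to sanity-check by Monte Carlo, even for correlated coordinates. A small Python sketch (the correlation value and sample size are my own illustrative choices):

```python
import math
import random

random.seed(42)
rho = 0.6          # target correlation between z1 and z2 (illustrative value)
N = 200_000

m2 = m3 = 0.0
for _ in range(N):
    g1, g2 = random.gauss(0, 1), random.gauss(0, 1)
    z1 = g1
    z2 = rho * g1 + math.sqrt(1 - rho * rho) * g2   # Cov(z1, z2) = rho
    m2 += z1 * z2          # a degree-2 moment: should approach rho
    m3 += z1 * z1 * z2     # an odd (degree-3) moment: should approach 0
m2 /= N
m3 /= N

# Isserlis' theorem for zero-mean Gaussians: all odd moments vanish, and even
# moments are sums of products of covariances (here E[z1 z2] = rho).
assert abs(m2 - rho) < 0.02
assert abs(m3) < 0.02
```

This is exactly the step used in the proof: only the even-degree (2L) monomials survive, with moments controlled by products of covariance entries.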
Okay, so now I'm going over the sizes of the sets, over all sets of size 2L, using the fact that sets of odd size contribute zero. And to bound the even-size sets, I can again use Isserlis' theorem, which expresses this moment in terms of the covariance matrix. It turns out that you can bound this moment by epsilon to the L, times L factorial, divided by square root N to the L. This is some calculation that needs to be done, but you can trust me on that. Then you can use the fact that the sum of absolute values of these Fourier coefficients is not too big — it's like polylog N raised to the 2L — and see what this expression gives you. For L equals one, you get polylog N times epsilon times one factorial divided by square root N — so you get something small, actually what you wanted: epsilon times polylog divided by square root N. For L equals two, you get something even smaller. But you run into problems when you go to really high degree: when L equals square root N, this L factorial turns out to be pretty big, and we cannot control the contribution of the high-degree terms. So we can control the contribution of the first square root N terms, but not what's going on in this sum at the high-degree terms. This is where we were stuck, I think, four years ago, when we were trying a lot of different fixes — trying to change the distribution, trying to apply noise — and none of them worked, until we stumbled upon a new work with a very cool and simple idea that helped us push it through. Okay, so the second attempt is to do a thought experiment, and in this thought experiment, we do a random walk. Okay, this was a bit vague; let me be more precise.
Instead of viewing the sampling of Z from G as a one-shot sample, we view it as the result of a random walk, making a lot of small steps. So what are we going to do? We're going to sample T vectors, Z_1 up to Z_T, according to this distribution G, independently. Then we take the sum of these vectors and divide it by square root T. What do you know about the sum of independent Gaussians? It's a Gaussian as well, and you can calculate the means and the covariances, and you get that they are exactly the same as the means and covariances of G. Okay, so this is really just a thought experiment — we are not actually changing the distribution G; we can just view it this way. This thought experiment was inspired by the work of Chattopadhyay, Hatami, Hosseini, and Lovett, who built pseudorandom generators based on random walks. I want to mention one difference between our work and theirs: they actually construct the generator using a random walk, while we only use it as a thought experiment in the analysis. Okay, so what does this thought experiment give you? It gives you a way to do a hybrid argument. You can think of your random walk as making a lot of small steps and try to understand the difference in the acceptance probability between consecutive steps. So this is a hybrid argument: we define T plus one hybrids. The zeroth is the origin, the all-zeros vector. Then, for I between one and T, we take the I-th hybrid to be the sum of the first I vectors, Z_1 up to Z_I, divided by square root T. So the first hybrid is really a tiny vector — a vector sampled according to G but divided by square root T. The second is the sum of two small vectors. The last one, H_T, is the sum of many small vectors, so it will not actually be small.
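The key distributional fact — that the /√T-normalized sum of T i.i.d. copies of a Gaussian has exactly the same means and covariances as a single copy — can be checked numerically. A small Python sketch with a correlated pair of coordinates (the specific ρ, T, and sample counts are illustrative choices):

```python
import math
import random

random.seed(7)

def correlated_pair(rho):
    g1, g2 = random.gauss(0, 1), random.gauss(0, 1)
    return g1, rho * g1 + math.sqrt(1 - rho * rho) * g2

T, trials, rho = 25, 20_000, 0.5
sa = sb = saa = sbb = sab = 0.0
for _ in range(trials):
    # walk: sum T independent small steps, then renormalize by sqrt(T)
    a = b = 0.0
    for _ in range(T):
        x, y = correlated_pair(rho)
        a += x
        b += y
    a /= math.sqrt(T)
    b /= math.sqrt(T)
    sa += a; sb += b; saa += a * a; sbb += b * b; sab += a * b

# Same mean (zero), same variances (one), same covariance (rho) as a single
# draw of the pair: the walk is just a different view of the same Gaussian.
assert abs(sa / trials) < 0.05 and abs(sb / trials) < 0.05
assert abs(saa / trials - 1.0) < 0.05
assert abs(sbb / trials - 1.0) < 0.05
assert abs(sab / trials - rho) < 0.05
```

Since a multivariate Gaussian is determined by its means and covariances, this is all that is needed for the thought experiment: nothing about the distribution G changes, only how we choose to generate it.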
So H_T will be distributed exactly according to the distribution G. How big should you think of T as being? You can take T to infinity, and this would give you a stochastic process known as Brownian motion, or a Wiener process. We are not using this language — I believe we could have — but we chose T to be a finite number. You can think of it as just a big polynomial in N; probably picking T to be N is enough, but something like N to the 100 may be more convenient. What we are claiming is that the acceptance probability of F on the (I+1)-st hybrid is very similar to the expectation of F on the I-th hybrid — similar up to something like polylog divided by T times square root N. Then, using the triangle inequality, you can show that the difference between the expectation of F on the last hybrid and on the first one is at most T times this small quantity, which is polylog divided by square root N. So this claim is sufficient to get that G is pseudorandom for F. Okay, so this is the idea, and now let's do a proof by picture. In the first step, we use the fact that this vector is really tiny, and we repeat the argument that failed before — and we will see that now the argument works. Somewhat surprisingly, the only coefficients of the function that matter for the analysis are the ones of degree two. Previously, we could handle everything up to degree square root N; here, since we are taking a much smaller vector, we only need to handle the degree-two terms, and everything beyond that is negligible. Then, for the I-th step, we reduce it to the first step using a very powerful and simple lemma by Chattopadhyay, Hatami, Hosseini, and Lovett.
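The telescoping shape of the hybrid argument can be illustrated exactly in a one-dimensional toy, using f = cos, for which E[cos(N(0, v))] = e^{−v/2} in closed form (this toy f and walk are my own illustration, standing in for the AC0 circuit and the 2N-dimensional Gaussian):

```python
import math

T = 1000
# In this one-dimensional toy walk, the i-th hybrid is H_i ~ N(0, i/T), and
# for f = cos we know E[cos(N(0, v))] = exp(-v/2) exactly, so the hybrids
# can be compared in closed form instead of by sampling.
expectations = [math.exp(-(i / T) / 2) for i in range(T + 1)]

steps = [abs(expectations[i + 1] - expectations[i]) for i in range(T)]
# Each consecutive pair of hybrids differs by at most 1/(2T) ...
assert all(s <= 0.5 / T + 1e-12 for s in steps)
# ... and by the triangle inequality the endpoints (H_0 at the origin versus
# the full Gaussian H_T) differ by at most the sum of per-step differences.
total = abs(expectations[-1] - expectations[0])
assert total <= sum(steps) + 1e-12
assert abs(total - (1 - math.exp(-0.5))) < 1e-12
```

The analogy to the proof: each step contributes O(1/T) (there, polylog/(T√N)), and T steps telescope to the final O(1) gap (there, polylog/√N) — small enough for pseudorandomness against AC0 while the quantum algorithm still detects the endpoint difference.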
And this relies crucially on the fact that bounded-depth circuits of a certain size and depth are closed under restriction of some of the variables: if I fix some of the inputs to constants, I do not make the circuit bigger — I can only make it smaller. This is an important ingredient that did not appear in the previous analysis. Okay, so let's do the first step. As I said, it's very similar to what we did before; the only difference is that now we are taking the difference between hybrid number one and hybrid number zero. This is the same as taking the value of F on Z divided by square root T — I'm dividing all the coordinates by a big quantity, and think of T as like N to the 100. So I can do the same calculation as before, and essentially every monomial of degree 2L will be divided by T to the L, because we are simply dividing each variable by square root T. Again, think of T as huge: this T to the L kills the L factorial, so that is no longer a problem. In fact, we can show — and it's not too difficult to see — that the main contribution is from L equals one. L equals one gives you polylog N divided by T times square root N; L equals two gives you something divided by T squared. But recall that T squared is much bigger than T, so this is negligible compared to the first term, and basically you can show the same for all terms. This shouldn't be too surprising: you can think of dividing by square root T as applying a lot of noise to the function — so much noise that it can be really well approximated by a degree-two polynomial, and everything of degree higher than two has no effect. This is actually an idea that we had, but we couldn't see how to make a random walk out of it. Using this idea, you can get that H_1 is pseudorandom against AC0 circuits.
But we couldn't show — indeed, you cannot show — that H_1 can be identified by a quantum algorithm. So you really need to analyze not just one step, but the sum of these T steps. Okay, so I mentioned that there is a reduction from the general case to the base case, due to this beautiful lemma of Chattopadhyay, Hatami, Hosseini, and Lovett. It says that if you take any fixed vector V and look at the difference F of V plus Z minus F of V — think of it as a new function that depends only on Z, say G of Z — you can write this difference as the expected difference of F-rho on 2Z minus F-rho on the all-zeros input. And what is this F-rho? It's a random restriction of the original function F. I'm not getting into the details of how you pick the random restriction, but basically every coordinate is fixed with probability half and kept alive with probability half, and when you fix a coordinate, you do it according to a marginal distribution that depends on V. So it gives you a way to bound the difference of F on two close-by vectors: for a really tiny vector Z, the difference F of V plus Z minus F of V is like the average difference of F-rho between the all-zeros input and a really tiny vector. It shifts two close-by points to zero and a small vector. This allows us to do induction. We condition on where we ended up after the I-th step — that is, on the value of H_I. With high probability, all the coordinates will be between minus half and half; this is by the same choice that told us the truncation does nothing with high probability. So let's condition on this event. Then the difference between H_{I+1} and H_I is really the difference between F on H_I plus a small vector and the value of F on H_I.
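The closure-under-restriction property that makes this reduction work can be illustrated on a toy representation of a depth-2 circuit. The sketch below (my own toy model, not from the talk) represents a DNF as a list of clauses and shows that restricting variables never increases its size:

```python
def restrict_dnf(clauses, assignment):
    """Apply a restriction to a DNF given as a list of clauses, where each
    clause is a dict {var: required_value} over {-1, +1}. `assignment` maps
    some variables to fixed values; the rest stay alive. A toy model of the
    closure-under-restriction property used in the proof."""
    out = []
    for clause in clauses:
        new = {}
        dead = False
        for v, val in clause.items():
            if v in assignment:
                if assignment[v] != val:
                    dead = True          # literal falsified: clause disappears
                    break
                # literal satisfied: drop it from the clause
            else:
                new[v] = val
        if not dead:
            if not new:
                return [{}]              # a clause became constant-true
            out.append(new)
    return out

dnf = [{0: 1, 1: -1}, {1: 1, 2: 1}, {0: -1, 2: -1}]
restricted = restrict_dnf(dnf, {1: 1})
# The restricted circuit is in the same class and is never larger:
assert len(restricted) <= len(dnf)
assert sum(map(len, restricted)) <= sum(map(len, dnf))
```

The same reasoning lifts to general AC0 circuits gate by gate: fixing inputs only removes or simplifies gates, which is exactly why F-rho stays within the class the base case handles.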
Now we can apply the lemma and say that this is at most the expectation of a restricted version of F on a small vector minus the restricted version of F on the all-zeros input. And we again use the fact that this class of functions — AC0 circuits with at most S gates and depth at most D — is closed under restriction, so F-rho is also a small-depth circuit. We can apply the base case to F-rho and get that this is at most polylog N divided by T times square root N. So this is actually the end of the proof: we managed to prove the theorem, up to some things I was hiding under the rug. Okay, so let's recap what we saw so far. We defined the distribution D, a discrete distribution based on the Gaussian distribution G. We briefly mentioned the quantum algorithm, due to Aaronson and Ambainis, that can distinguish between D and the uniform distribution. And the main technical part, the main contribution of this work, is to show that this distribution D is pseudorandom against AC0 circuits. To do that, we did a thought experiment, where we thought of the Gaussian distribution G as the result of a random walk making a lot of small steps. We used the result from four years ago to show that the first step gives only a small advantage, one over T times square root N, and then we used this beautiful lemma of Chattopadhyay et al. to show that the I-th step can be reduced to the first step. So I want to finish by mentioning some new directions and open problems. Okay, I see that we are running out of time, so maybe I will do this very briefly. I want to mention that, following up on our work, Aaronson has now given an oracle such that P equals NP and yet BQP is not in P — this is exactly the quote that you saw earlier from Fortnow: you can have a world where P equals NP but BQP can solve problems outside of P. And this uses some new ingredients.
So you need to modify the oracle a bit. Also, Lance showed that under our oracle, the polynomial hierarchy is infinite. This was known for other oracles — it was known, actually, under a random oracle — but I think it's interesting to know it for our oracle. And they pose some interesting open problems that rely on this result: you need this BQP versus PH separation if you want to tackle those oracle separations. I won't get into the details, but it seems they require more ideas — it doesn't seem that taking our oracle as-is would be enough. Okay, so I want to mention a second open problem, or a second style of open problems, with respect to pseudorandomness. You can ask whether you can extend this result to separate BQLOGTIME from AC0 circuits with parity gates. For this, it is sufficient to show a Fourier bound on the Fourier coefficients of AC0 circuits with parity gates — specifically, that at the second level, the sum of absolute values of coefficients on sets of size two is at most square root N divided by polylog. We are very close to showing this: we can show square root N times polylog, so we are really only a polylog factor away from proving it. And we actually conjecture that the right answer should be just polylog. This is joint work with Eshan, Pooya, and Shachar. If you can prove our conjecture, then you would get a PRG for AC0 with parity gates with polylog seed length. This is a result motivated purely by classical computation, and it would be a great improvement over the current best results for AC0 with parity gates: the best PRGs that we currently know have linear seed length. So if you could prove this conjecture, you would go from linear to polylog. Okay, so I'm running out of time. Thank you very much, and I'd love to hear your questions. Thank you, Avishay. Questions? Hi, I've got a small question.
When you do the decomposition, one big Gaussian becomes T small Gaussians and you renormalize — you use the same weight for all of them, basically one over square root T times G_1 plus ... plus G_T. Would it help, maybe improving the parameters a little, if instead you weighted them — say, the sum from I equals one to T of I times the Gaussian, divided by the right normalization factor, or something like that? Yeah, I haven't thought about it. I mean, this seems like the most natural thing, and it works, so I haven't tried to play with it. But it makes sense — this seems like another thought experiment that could potentially work just as well. Hi, maybe along similar lines. Could you clarify where the quantum algorithm fails the lower bound? Does it distinguish at a single step, or is it in the application of the CHHL lemma? Okay, so let's see — I haven't analyzed the quantum algorithm step by step; that's related to one of my open questions. But basically, at a single step, the quantum algorithm would get a small advantage, let's say one over T, so you cannot now take T to be big, but you accumulate the distinguishability along the walk. Somehow you still — because if the quantum algorithm makes the distinction already at the first step, and the classical AC0 circuit doesn't, you still wouldn't be able to use just the first two distributions, H_0 and H_1, to make the whole argument. I'm still just confused by that. Okay, yeah, maybe this was a bit confusing, but these are thought experiments — it's just a thought experiment, and the quantum algorithm actually distinguishes between the two endpoint distributions.
If you think about what happens in the random walk, you would get, I guess, that between consecutive steps the distinguishability of the quantum algorithm is something like one over T — or maybe one over T log N — and then overall the distinguishability is one over log N. So you actually are accumulating these small advantages. Right, so there's an amplification argument going on. If you just take one step, the quantum algorithm distinguishes with probability one over T, and the AC0 circuit distinguishes with probability at most one over T log N, or whatever the bound was. And that by itself is not enough to give you the separation you care about by naive amplification, so you have to do the random walk argument, which gives you a stronger, better kind of amplification. Right. So it's really important that, in order to carry out this argument — I said that I picked T to be poly N. Actually, you can pick T to be anything slightly bigger than polylog — say polylog to a small omega of one — but then the advantage of the quantum algorithm in a single step would be smaller than inverse polylog, and if you want to amplify it to a constant, you would need to run for more than polylog steps. This would be a sort of separation, and it's something we thought about before realizing that the random walk would work, but it would give you a separation where the quantum algorithm has more resources than the PH algorithm. It would be an unfair game, and such separations are not super interesting, because we have time hierarchy theorems and things like that — it's not like we are actually exploiting the fact that it's a quantum algorithm. Using this analysis, we can even let the PH algorithm run in almost exponential time, and still it will not solve the problem. Any more questions?
So on the very last slide that you showed: if the L1 norm of the second level is t, does it give a PRG with poly(t) seed length? Yeah, it gives a PRG with seed length something like t squared times polylog N. So in fact, even improving the square root N bound — I said that we have a square root N bound — even improving it slightly would already improve the PRG parameters. I didn't mention it, but we have some evidence that makes me believe this conjecture is true — two pieces of evidence. One is that we can prove it for the first level: we can show that the first level is bounded by polylog. We can also prove it, on all levels I guess, if the circuit is read-once — if it reads each input bit only once. And is this just for AC0 with parity gates, or for any circuit class that is closed under restrictions? For any circuit class that is closed under restrictions. Well, one class for which this cannot work is anything that can compute majority: if you look at the sum of Fourier coefficients on sets of size one, or of size three, for majority, it's big — it's polynomial in N. So if your class can compute majority, this theorem, or this PRG, would not help you. But if it cannot — and AC0 circuits with parity gates cannot — then we are hopeful that it's going to work. Yeah. Any other questions? Okay, if there are no other questions, let's take this offline. Again, we are not yet sure of the schedule two weeks from now, but stay tuned. So thank you once again. Thank you very much.