So today, I'll be talking a little bit more about the strengths and weaknesses of quantum examples for learning. Let me give a quick one-slide recap of what we saw yesterday, and then I'll start from there.

Good. So yesterday, we saw the PAC learning model. There is a concept class: a collection of Boolean functions. There is a distribution d on n-bit strings. The concept class is known to everybody, but the target concept c and the distribution are unknown. In classical PAC learning, you get labeled examples (x, c(x)) — (n+1)-bit strings where x is sampled from the distribution d. In quantum PAC learning, you get coherent superposition examples sum_x sqrt(d(x)) |x, c(x)>, where the amplitudes are the square roots of d(x). The goal in both settings is the same: for every concept in the concept class and for every distribution d on n-bit strings, given labeled examples in the classical case or quantum examples in the quantum case, output a hypothesis h that is close to c, where closeness is measured according to the distribution d.

And the main theorem we proved yesterday, using the pretty good measurement, was: in the PAC model of learning, where you need to work well for every concept and for every distribution, the quantum sample complexity equals the classical sample complexity up to constant factors. So quantum examples, even given to a quantum computer, are no more useful than classical examples given to a classical computer — at least in this admittedly strict model of learning.

Good. But like I said, I got many questions yesterday, so let me try to address the main ones. A few of them were: the distribution against which we proved the quantum PAC lower bound was unnatural. Sure, you want to prove hardness, so that makes sense, but maybe that's not how nature works. What if d is nicer? Let's just say, for simplicity, that d is the uniform distribution over n-bit strings. Then the amplitudes sqrt(d(x)) are all 1/sqrt(2^n), so a quantum example becomes (1/sqrt(2^n)) sum_x |x, c(x)>, and classical examples stay the same except that x is now drawn from the uniform distribution. So the first question is: are quantum examples better than classical examples when I fix the distribution? In learning theory this is called the distribution-specific model of learning.

One more question I got was: so far I never got to choose the labeled example (x, c(x)) — c is unknown but comes from script C, and somebody picks x and hands me (x, c(x)). What if I could query the unknown function c on points y that I like? I query y1 and get c(y1), I query y2 and get c(y2), and so on. Maybe then quantum queries are more useful than classical queries.

Good. So the main idea today will be to talk about the strengths and weaknesses of these quantum examples compared to classical examples.
I'll give you some positive results, then some negative results, and then I'll talk about the relation between quantum and classical query complexity of learning.

Before I get to this: I don't think many people got to the third problem in the problem session yesterday, but I think it's a very interesting, cute problem, so let me talk about it first. It's called the coupon collector problem. We have n coupons labeled 1 to n. Say I have all n coupons in my hand; I pick a uniformly random coupon and show it to you — you see, for example, 7 — and you give it back. Again I pick a uniformly random coupon, you see 4, and so on. The question is: how many times do we need to repeat this sampling process before you have seen all possible coupons from 1 to n? This is a pretty fundamental problem in introductory computer science, and the right answer is Θ(n log n): you need to repeat the sampling process about n log n times before you see every coupon.

Now let me change the problem a little. There are still n coupons, but there is one fixed coupon that I choose, call it i*, which I take out and keep on the table. So in my hand I have only n − 1 coupons, and one coupon sits on the table. I keep picking uniformly random coupons from the n − 1 in my hand and showing them to you, and your goal is to figure out which coupon is on the table. You know exactly one coupon is missing; how many samples from these n − 1 coupons do you need before you can identify the missing coupon i*? Almost the same analysis as for coupon collector shows that Θ(n log n) is again the right answer.

So what do you do, being quantum? You look at quantum coupons. What's a quantum coupon? Consider the uniform superposition over the n − 1 coupons in my hand — the uniform superposition over all items except the one item i* on the table. I'll call this state |psi_{i*}>, a quantum coupon. (I index it by i* because i* is exactly what's missing from the superposition.) I keep handing you copies of this fixed quantum state, and the question is: how many quantum coupons do you need before you can identify i*? The surprising answer is Θ(n). It's only a log n factor better, which may not surprise you, but given that coupon collector is such a fundamental problem, the fact that you can remove the log n factor is kind of interesting.
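To make the classical Θ(n log n) baseline concrete, here is a minimal Monte Carlo sketch of the missing-coupon version; the sizes and function names are my own, just for illustration:

```python
import random

def draws_to_find_missing(n, i_star, rng):
    """Sample uniformly from the n-1 coupons in hand (everything except
    i_star) until all of them have been seen; the one unseen label must
    then be the missing coupon i_star."""
    hand = [i for i in range(1, n + 1) if i != i_star]
    seen = set()
    draws = 0
    while len(seen) < n - 1:
        seen.add(rng.choice(hand))
        draws += 1
    return draws

rng = random.Random(0)
n = 500
trials = [draws_to_find_missing(n, 7, rng) for _ in range(50)]
print(sum(trials) / len(trials))  # concentrates around n*ln(n), i.e. Theta(n log n)
```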
And the reason I like this application is the following. The way you show the upper bound is pretty simple. Say I give you T copies of the state, |psi_{i*}>^(tensor T). Then you have to do state identification, because you need to learn the missing i*. Like I said yesterday, if you want to do state identification, hit it with the pretty good measurement. The observation here is that if you write down the Gram matrix for this pretty good measurement, the inner products are very nice, so you can diagonalize it easily. Yesterday I slightly misled you into saying that the pretty good measurement is always useful for state identification — there is a caveat. You always need to take the square root of the Gram matrix, which need not be easy: you need to find its eigenvalues. But it so happens that for this example, diagonalizing the Gram matrix is easy. And if you diagonalize it, write down the success probability, and plug in T = O(n), you find that the pretty good measurement succeeds with probability at least two thirds — and the PGM is close to optimal, so this essentially settles the complexity.

So, okay, I think this is a cute problem and a cute application of the pretty good measurement. As far as I know, it is also one of the only quantum learning speedups that is not based on Fourier sampling, which is what I'm coming to in the next part of the talk: almost all known quantum speedups for learning are based on this technique called Fourier sampling. This is one of the few examples I know that doesn't go via Fourier sampling — you just hit it with the pretty good measurement and it works. Any questions? Good.

So let's talk about Fourier sampling. Let me first define the Fourier decomposition and then Fourier sampling. Take c to be a Boolean function mapping n bits to {−1, 1} (I use {−1, 1} for simplicity). Then for every s in {0,1}^n, the Fourier coefficient is

c_hat(s) = (1/2^n) sum_{x in {0,1}^n} c(x) (−1)^(s·x).

This formula may look intimidating, but you can just think of c_hat(s) as the normalized inner product between c and the parity function on s. And there's something beautiful called Parseval's identity: for a Boolean function mapping n-bit strings to {−1, 1}, the sum of squares of the Fourier coefficients equals E_x[c(x)^2]; but c(x) is in {−1, 1}, so its square is 1, and the expectation is 1. So sum_s c_hat(s)^2 = 1, which means the squared Fourier coefficients {c_hat(s)^2}_s form a probability distribution.

And not only that — quantum examples allow you to sample from this distribution. Say you have a uniform quantum example; recall I started this lecture by fixing d to be the uniform distribution, so the state is (1/sqrt(2^n)) sum_x |x, c(x)>. There is a technique, which you'll work out in the exercises: if you apply Hadamards and measure appropriately, you can produce the state sum_s c_hat(s) |s>. And if you measure this state, you obtain an s drawn from the distribution c_hat(s)^2.
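Here is a minimal numpy sketch of exactly this pipeline, under my own toy choices (n = 4 and a particular concept c): it computes the Fourier coefficients from the definition, checks Parseval, and verifies that Hadamarding the phase state reproduces the distribution c_hat(s)^2.

```python
import itertools
import numpy as np

n = 4
def c(x):                      # toy concept: the parity of bits 0 and 2
    return (-1) ** (x[0] ^ x[2])

xs = list(itertools.product([0, 1], repeat=n))

# Fourier coefficient c_hat(s) = 2^{-n} sum_x c(x) (-1)^{s.x}
def c_hat(s):
    return sum(c(x) * (-1) ** sum(a * b for a, b in zip(s, x)) for x in xs) / 2**n

print(sum(c_hat(s) ** 2 for s in xs))        # Parseval: prints 1.0

# The phase state 2^{-n/2} sum_x c(x)|x> (obtainable from one uniform quantum
# example); applying H^{tensor n} leaves amplitude c_hat(s) on |s>.
phase = np.array([c(x) for x in xs], dtype=float) / 2 ** (n / 2)
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
Hn = H
for _ in range(n - 1):
    Hn = np.kron(Hn, H)
probs = (Hn @ phase) ** 2
print(np.allclose(probs, [c_hat(s) ** 2 for s in xs]))  # True: measurement samples s ~ c_hat(s)^2
```

For this parity concept the distribution is a point mass on s = (1,0,1,0) — exactly the Bernstein–Vazirani phenomenon coming up below.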
So essentially, using constantly many quantum examples, you can produce this quantum state and sample from the distribution c_hat(s)^2. And as far as we know, this is a hard task classically: given uniform labeled examples (x, c(x)), producing samples from a distribution even close to c_hat(s)^2 is hard — in certain regimes you can even prove it's exponentially hard. So in the example model, sampling from this distribution is hard classically, whereas given uniform quantum examples you can sample using constantly many of them.

Good, so here is one subroutine which could potentially be useful: given a Boolean function, I can sample from the Fourier distribution quantum-efficiently using O(1) quantum examples, but classically I don't know how to sample from the Fourier distribution. And as I said on the previous slide, this will be the backbone of almost all known quantum speedups in uniform-distribution PAC learning.

Good, so let me discuss some applications of Fourier sampling and how it helps us learn some function classes faster than the state-of-the-art classical algorithms. The first class I'll look at is parities. C_1 is the class of parity functions: for every s, c_s(x) is the inner product s·x = sum_i s_i x_i mod 2, so s·x is a number in {0, 1}. This concept class has 2^n concepts, each labeled by an unknown s. One thing that's not too hard to show: given labeled examples (x, c_s(x)), Ω(n) examples are necessary to learn the unknown s; in the exercises you'll also see why O(n) examples suffice. And one more thing, well known — it's a very simple subroutine, but Bernstein and Vazirani observed it back in '93 — is that constantly many uniform quantum examples, i.e., superpositions over (x, c_s(x)) under the uniform distribution, suffice to learn C_1. So here is a concrete separation: O(1) quantum examples are sufficient while Ω(n) classical examples are necessary. You'll see both upper bounds, classical and quantum, in the exercise session.
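As a sanity check on the classical side, here is a small sketch (toy sizes of my choosing) of the O(n) classical upper bound: each uniform example (x, s·x mod 2) is one linear equation in s over GF(2), and Gaussian elimination recovers s once the examples have full rank. Quantumly, the previous block's distribution for c_s is a point mass on s, so a single Fourier sample reveals s.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
s = rng.integers(0, 2, n)                  # the unknown parity string

X = rng.integers(0, 2, (3 * n, n))         # ~O(n) uniform examples; extra rows so X has full rank whp
y = (X @ s) % 2                            # labels c_s(x) = s.x mod 2

# Gaussian elimination over GF(2) on the augmented matrix [X | y].
A = np.concatenate([X, y[:, None]], axis=1)
row = 0
for col in range(n):
    pivots = [r for r in range(row, A.shape[0]) if A[r, col]]
    if not pivots:
        continue                           # rank-deficient column (rare here)
    A[[row, pivots[0]]] = A[[pivots[0], row]]
    for r in range(A.shape[0]):
        if r != row and A[r, col]:
            A[r] = (A[r] + A[row]) % 2
    row += 1

print(A[:n, -1], s)                        # recovered string equals s when X has rank n
```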
Let's look at one more example — this one is slightly non-trivial. It's the class of juntas. Take c to be a Boolean function on n bits; we call it an ℓ-junta if c(x) depends on only ℓ of the n bits. Let's look at an example first: c might be defined on x1, ..., xn but actually depend only on x1, x4, and xn — only three bits — so it's a 3-junta; something like c = x1 XOR (x2 AND xn) would also be a 3-junta. So look at the class of all Boolean functions that depend on only ℓ bits: a priori they could have depended on all n bits, but C_2 is the class of all n-bit functions promised to depend on only ℓ unknown bits. This is a very interesting problem for classical learning theory.

And in fact, even in the regime ℓ = log n with d uniform, learning this efficiently has been a notoriously hard open problem. The best known algorithm runs in time roughly n^(log n), and coming up with a polynomial-time algorithm for juntas has been a long-standing open question classically. Again, by "classically" I mean: given uniformly random examples (x, c(x)) with the promise that c is an ℓ-junta, the best known algorithm scales quasi-polynomially. But surprisingly, you can learn an ℓ-junta with about 2^ℓ quantum examples, in time polynomial in 2^ℓ. This seems exponential, but plug in ℓ = log n and both the sample and time complexity are polynomial in n. So quantumly you can learn (log n)-juntas in polynomial time using polynomially many examples, whereas the best known classical algorithm is super-polynomial in n. We don't have a matching classical lower bound, but the best known classical upper bound is super-polynomial even for ℓ = log n.

Good. I'll talk about a generalization of both these concept classes, and give an algorithm that learns both; but first let me put them in the framework of Fourier-sparse functions. We say a function c is k-Fourier-sparse if, writing down all the Fourier coefficients — there are 2^n of them — the number of non-zero coefficients is at most k. You'll see in the exercises that a parity c_s has exactly one non-zero Fourier coefficient, so C_1 is 1-Fourier-sparse; and if you write down the Fourier decomposition of an ℓ-junta, it is 2^ℓ-Fourier-sparse. So you can think of both bullets as learning k-Fourier-sparse functions for different values of k: there k = 1, here k = 2^ℓ.

So consider learning the general concept class: script C is the collection of all Boolean functions that are k-Fourier-sparse. As I said, C_1 is a subclass of this, since its elements have one non-zero Fourier coefficient; and C_2 is a subclass when we pick ℓ = log k, since then the number of non-zero Fourier coefficients is at most k. And I'm going to give you an algorithm that learns script C in quantum polynomial time using quantum examples.

Good. So again, we restrict to this concept class — the collection of all Boolean functions that are k-Fourier-sparse — and we want to exactly learn: I give you quantum examples, and I want you to identify the unknown k-Fourier-sparse function, learning under the uniform distribution. Classically, people have looked at this: a result of Haviv and Regev from 2015 shows that the classical sample complexity of this problem is n·k. Recall that c is a Boolean function on n bits, and it's k-Fourier-sparse.
So the classical sample complexity is n·k — that many samples are necessary and sufficient — and they proved a time upper bound of roughly n^k. Good. And the thing we observed is that you can learn this concept class using about k^1.5 quantum examples. So where classically you have labeled examples, quantumly you have quantum examples, and k^1.5 of them suffice to learn k-Fourier-sparse functions. Observe that this upper bound is independent of n: the truth table could be exponentially long, but if I promise you the function is k-Fourier-sparse — think of k as much smaller than 2^n — then classically the sample complexity depends on n, while quantumly it is completely independent of n and depends only on the Fourier sparsity. So if you promise the function is Fourier-sparse, quantum examples give a substantial advantage.

Let me first give you a trivial upper bound of k^2, and then I'll sketch how to get k^1.5 — I won't give the complete proof here. First, let me drive home the picture: in most of the quantum results I'm presenting, all you'll observe is — you can Fourier sample; then you go to the classical learning theory people, who provide interesting structural theorems about Boolean functions; and you combine the two for your quantum result. In this case, the classical fact is: if c is k-Fourier-sparse, its Fourier coefficients are nice — every non-zero squared Fourier coefficient c_hat(s)^2 is at least 1/k^2. From this you get a naive upper bound of about k^2.

So what do you do? Fourier sampling. As I told you on the previous slide, given these quantum examples I can Fourier sample using O(1) copies — that is, sample an s from c_hat(s)^2. Now I know the function is k-Fourier-sparse, so there are only k non-zero Fourier coefficients, and each is pretty large: any element of the support appears with probability at least 1/k^2. So I sample s1, I sample s2, and so on. By a coupon-collector argument, order k^2 samples (up to a log factor) suffice to collect the entire Fourier support of this k-sparse function. So: take about k^2 quantum examples, Fourier sample from them, and use the structural property from classical analysis — each non-zero c_hat(s)^2 is at least 1/k^2 — to argue that repeating the sampling about k^2 times collects all the non-zero Fourier coefficients. And then, something you'll do in the exercises: you can estimate all these Fourier coefficients using just classical examples, and all of this runs in polynomial time. So the overall sample and time complexity is polynomial in k.
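Here's a toy simulation of that collection step. The support weights are synthetic — I just pick k weights satisfying the c_hat(s)^2 >= 1/k^2 guarantee — and the point is only how the coupon-collector phase finishes.

```python
import numpy as np

rng = np.random.default_rng(2)
k = 32

# Stand-in for the Fourier distribution of a k-sparse function: k support
# elements with synthetic weights; each normalized weight is >= 1/(2k) >= 1/k^2,
# consistent with the structural guarantee used in the lecture.
w = rng.uniform(1.0, 2.0, size=k)
probs = w / w.sum()

seen, draws = set(), 0
while len(seen) < k:                   # collect the whole Fourier support
    seen.add(int(rng.choice(k, p=probs)))
    draws += 1
print(draws)   # O(k^2 log k) in the worst case (min weight 1/k^2); far fewer here
```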
It's order k^2, maybe a little more, but polynomial in k — whereas the classical algorithm's time complexity scales like n^k. So that's the trivial upper bound. We can prove something slightly more sophisticated, the k^1.5 upper bound, but I'm not going to get into the full proof. The idea is that you want to do more than use the trivial 1/k^2 bound — you want to use even more structural properties of k-Fourier-sparse Boolean functions. And one thing we observe is the following: you repeat the sampling process, getting s1, then s2, and so on, and we prove that after a certain number of samples you have learned the span of the entire Fourier support. That is, look at all the s for which c_hat(s) is non-zero and take their span (over GF(2)); call it script V, and call its dimension r, the Fourier dimension. The interesting thing we prove is that about r·k quantum examples suffice to learn the entire Fourier span. And once you learn the Fourier span, you go back to the classical results to finish, giving an r·k upper bound overall. Now you need to upper bound r, and for that we again go back to classical Boolean function analysis, where it's been proven that for every k-Fourier-sparse Boolean function, r is at most about sqrt(k). So the quantum sample complexity is r·k, and since r is at most sqrt(k), it's k^1.5.

If that was slightly technical, here is what I'd like you to take away: these structural facts are purely classical results that people in Boolean function analysis proved for us. All we do quantumly is Fourier sample — that's the main quantum observation; bullets three and four on the slide are classical results, and bullets one and two are the quantum part, which is just Fourier sampling. We conjecture that the right upper bound is better than k^1.5, but we don't know how to prove it; it relates to an open question in Boolean function analysis.

Good. So once you learn the Fourier span, you can perform an affine transformation so that the transformed function c' depends only on the first r bits. And then you just keep doing classical sampling: you get (x, c'(x)), where c' is effectively an r-bit function, and you don't care about the final n − r bits.

Good, let me go to the second example where quantum examples are much more interesting — actually one of the first examples where quantum examples were shown to be much more powerful than classical examples — and that's learning DNF formulas. What's a DNF formula? It's just an OR of ANDs of variables. Let's look at an example: an OR of two clauses, clause one and clause two.
And in each clause, you take an AND of literals: x1 AND x4 AND NOT x3 in the first, and NOT x4 AND x6 AND x7 AND NOT x8 in the second. So this is an OR of ANDs, and any Boolean function representable as an OR of ANDs is a DNF formula. We say a DNF formula on n variables is s-term if the number of clauses — here just two, but you could have many more — is at most s.

Good, so the question again: you have a concept class where everything is promised to be an s-term DNF — every function in the class is representable as an OR of ANDs with at most s ANDs, here two — and you want to learn this concept class under the uniform distribution. The best known classical algorithm runs in time n^(log n). That was proven in the '90s and it is still the state of the art; we don't know any better algorithm for DNF learning without further assumptions, and it's been a long-standing open question. I think people conjecture you can learn DNFs in polynomial time using just classical examples, but we don't have such an algorithm yet. And Bshouty and Jackson, back in '95, gave a quantum polynomial-time algorithm for this problem. So for learning DNFs, the state-of-the-art classical algorithm scales like n^(log n), but quantumly, with just quantum examples, you can solve it in polynomial time.

Let me give you a sketch of the upper bound. Again, as I said, most of the hard work was done by Boolean function analysis, and quantumly we just observe that you can Fourier sample. The structural property we need here: people have proven that if c, a Boolean function on n bits, is computed by an s-term DNF formula (in our example, two terms), then there exists one large Fourier coefficient — a U for which |c_hat(U)| is at least roughly 1/s. This is given to us as a black box.

Good, so what do you do quantumly? You know only one trick: sample from the Fourier distribution. I give you quantum examples; you sample from the squared Fourier distribution c_hat(·)^2. You repeat this poly(s) many times, and eventually you hit a U for which c_hat(U) is large. And now this is almost good enough: from it you can construct a weak learner. Because |c_hat(U)| is at least about 1/s, it's not too hard to show that the hypothesis h = chi_U — the parity function on U, possibly negated — satisfies Pr_x[h(x) = c(x)] >= 1/2 + Ω(1/s). So what did you do so far? You had quantum examples, you Fourier sampled poly(s) times, you found a U — it showed up once, maybe twice or thrice — and chi_U, which is your h, has a decent (not great) overlap with the unknown DNF c: agreement one half plus about 1/s.
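Here is a small brute-force check of this structural fact and the weak learner it yields, on the lecture's own two-term DNF (the ±1 sign convention is my choice): it finds the heaviest Fourier coefficient and verifies that the corresponding signed parity agrees with c on a 1/2 + |c_hat(U)|/2 fraction of inputs.

```python
import itertools

n = 8
def c(x):                       # the 2-term DNF from the slide, as a +-1 function
    t1 = x[0] and x[3] and not x[2]                    # x1 AND x4 AND NOT x3
    t2 = (not x[3]) and x[5] and x[6] and not x[7]     # NOT x4 AND x6 AND x7 AND NOT x8
    return -1 if (t1 or t2) else 1

xs = list(itertools.product([0, 1], repeat=n))
def c_hat(u):
    return sum(c(x) * (-1) ** sum(a * b for a, b in zip(u, x)) for x in xs) / 2**n

u = max(xs, key=lambda u: abs(c_hat(u)))   # the heavy coefficient Fourier sampling would find
sign = 1 if c_hat(u) > 0 else -1
h = lambda x: sign * (-1) ** sum(a * b for a, b in zip(u, x))  # weak hypothesis: signed parity chi_U

agree = sum(h(x) == c(x) for x in xs) / 2**n
print(abs(c_hat(u)), agree)    # agree == 1/2 + |c_hat(u)|/2, comfortably above 1/2 for small s
```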
This is slightly better than half — one half plus about 1/s — but if you recall the definition of PAC learning, we want agreement at least two thirds; flipping a coin would already give you a half. In PAC learning, and in learning theory generally, you want to output a hypothesis h that is at least two-thirds close to the unknown concept. So this gets you somewhere, but not all the way. And this is where they use another technique that classical machine learning people came up with: boosting. Boosting was introduced by Freund and Schapire, who showed that if you have a learning algorithm that succeeds with probability slightly above half, you can always boost it to an algorithm that succeeds with probability two thirds. You can think of it as a black-box subroutine — I'm sweeping things under the rug here — that takes a weak learner (here: Fourier sample and output h = chi_U, which has bias one half plus about 1/s) and produces a strong learner with agreement two thirds. So that's boosting. Again, bullet five is totally classical, bullet one is totally classical; all we did quantumly was Fourier sampling, plus the observation that there is a parity with good overlap, and at that point we're almost done. Essentially, the main quantum ingredient on this slide and the previous one is that you can Fourier sample, and you get a polynomial-time quantum algorithm where the state-of-the-art classical algorithm is super-polynomial. Any questions?

[On the exponent.] Yes, I think these authors had something like s^5 or s^6, and there was a follow-up paper by Bshouty with around s^2 or s^3. I think the sample complexity is polynomial in s and the time complexity depends on n and s.

[On boosting versus amplification.] In complexity theory, amplification can mean different things. Boosting specifically takes an algorithm that does slightly better than random guessing and converts it into an algorithm that achieves two thirds. Maybe there's a relation between boosting and amplification, but here it's just: weak algorithm in, strong algorithm out.

[On newer algorithms.] Oh yes, definitely — I think there was an algorithm from the past couple of years giving n^(log s) time for learning s-term DNFs. If s is about n, that's again n^(log n).

So let me talk about the membership query model. Again, C is a concept class of n-bit Boolean functions and c* is an unknown Boolean function in script C. In the classical membership query model, you can query c*: so far I only gave you uniform (quantum) examples, but now you, the learner, give me an x and I tell you c*(x); you give me a y and I give you c*(y); and so on. The goal is always to learn c* — or the weaker goal, to output an h close to c*. The complexity measure is the total number of queries you make to c*; that is the query complexity of the concept class.

So in the classical query model you query x and get c*(x); in the quantum query model, you can make quantum superposition queries.
Quantum superposition queries work in the following format: you have access to a unitary O_{c*} that on input |x, 0> produces |x, c*(x)>. In particular, if you feed O_{c*} a superposition sum_x |x, 0>, you can prepare a uniform quantum example. So quantum queries are pretty powerful: not only can you make superposition queries, you can also prepare quantum examples this way. And the goal is always the same: given quantum queries, learn c* exactly, or produce an h such that c* and h are close. The complexity measures are analogous: classically, D(C) is the number of classical queries of this form; quantumly, Q(C) is the number of quantum queries.

Good. And the main question here is: can quantum queries be exponentially more efficient than classical queries? Given classical query access versus quantum query access, is there a problem or concept class for which quantum queries are exponentially faster — in time or in query complexity? And the answer is actually no. There is a very nice result that says: Q(C) <= D(C), since a quantum query can always simulate a classical query; and conversely, D(C) = O(n · Q(C)^3). So the best known relation between quantum and classical query complexity of learning is polynomial in n and Q. It's good to know that in the membership query model, quantum queries buy you at most a polynomial speedup over classical queries. You can still get Grover-type quadratic speedups — and for learning parities Q is 1 while D is n, consistent with this inequality — but you cannot get an exponential speedup when Q and D are non-trivial.

Good, so D = O(n·Q^3) is the best upper bound we can show, and we conjecture the right bound is D = O(n·Q^2). The reason we conjecture this is Grover: look at the concept class of point functions, where the quantum query complexity is a square root of the classical one. So we suspect the cube can be made a square, but we don't know how to do it — that's an interesting open question. And to be clear, there's no contradiction: Grover is one particular instantiation of a concept class where you get a quadratic speedup, and what we believe is that this is the best speedup you can get for an arbitrary concept class.

Good. I didn't plan to go over the proof, but I can tell you quickly what's going on. There is a combinatorial parameter that classical learning theory people introduced — call it the greedy gamma parameter. What would the best classical greedy algorithm do? Look at the concept class and query the x that rules out the most concepts.
Then, among the concepts still alive after that query, find the y whose answer kills the most concepts, and so on. From this you can define a gamma parameter and show that 1/sqrt(gamma) is a lower bound on Q while 1/gamma is an upper bound on D, via this greedy algorithm, and that essentially does it. The proof is not hard; I'm happy to explain it offline.

Good, let me go to the second part of the talk. So the point is, we saw so far that for DNF formulas — depth-two circuits — you can get an exponential quantum speedup over the classical state of the art. Can we extend the depth a little more? For that I need three gates: the AND gate, which outputs 1 iff all its inputs are 1; the OR gate, which outputs 0 iff all its inputs are 0; and the majority gate, which outputs 1 iff more than half of its inputs are 1. And we say an n-bit Boolean function is computed by a shallow circuit if it is computed by a constant-depth circuit of the following kinds. NC0 is the class of all functions computable by constant-depth circuits with AND, OR, and NOT gates, where every AND and OR has fan-in two — you can have NOT, AND, and OR gates, but each reads only two wires. (Just a second, I'll get to that question.) AC0 is the same but with arbitrary fan-in: an OR gate can take in as many inputs as you like — the top OR can read everything. So NC0 always has fan-in two, AC0 has unbounded fan-in. And the final class we'll look at is TC0, where in addition to AND, OR, and NOT gates you have majority gates — again, the majority gate is 1 iff at least half the inputs are 1 — with unbounded fan-in. So we'll look at NC0, AC0, and TC0. These are very interesting classes even for classical learning theory, and we'll try to understand them quantumly; there could be something there, or there could be nothing.

So first, why consider NC0 and AC0? I'm not going to talk about these results in detail, but since 2018 there has been a sequence of works, starting with the breakthrough result of Bravyi, Gosset, and König, showing that shallow quantum circuits can do things that constant-depth — even log-depth — classical circuits cannot. And there have been many, many follow-up results: average-case versions, certified randomness, randomness expansion, and so on, and even results in models with local communication, where people have shown you can do something with a shallow quantum circuit that you could not with a classical circuit. So it begs the question: shallow quantum circuits seem pretty well understood and seem more powerful than classical circuits, and quantum resources gave an advantage for all these other tasks — maybe they give an advantage for learning as well.
And this also would generalize the result of Bshouty and Jackson, who showed you can learn depth-two circuits; beyond that we don't know anything yet. So that's why we looked at NC0 and AC0. We looked at TC0 for another reason: TC0 is a theoretical way to model neural networks. A simple feed-forward neural network takes as input x1, ..., xn, feeds them through threshold-type units with weights w0, ..., wn — the weights can be exponentially large in n — and finally outputs something. And there was a seminal result back in the '90s showing that feed-forward neural networks are morally equivalent to constant-depth, polynomial-size threshold circuits. So if you're not comfortable with neural networks, just think of TC0 circuits: unbounded fan-in AND, OR, NOT, and majority gates. And the question we ask is: do quantum resources — say quantum examples, or quantum membership queries — help learn this class of neural networks faster? That was the motivation for looking at TC0.

So let's look at a simple learning algorithm for NC0 first. Recall NC0 is the class of all functions on n bits computable by constant-depth circuits of AND, OR, and NOT gates with fan-in two. The simple observation we have: suppose the depth is d and all gates have fan-in two. Then the function cannot depend on too many bits — the output gate reads 2 wires, the gates below it read at most 4, then 8, and so on — so although c is a function on n bits, at depth d it can depend on at most 2^d of them. Good, so we just observed that depth-d NC0 circuits are 2^d-juntas. And we know that learning juntas classically is notoriously hard already for ℓ = log n, but as I said a while back, juntas are efficiently learnable in quantum polynomial time. Putting these two observations together: a depth-d NC0 circuit is a 2^d-junta, and plugging in the result of Atıcı and Servedio you get a runtime of about n times 2^(2^d); but d is a constant, because we're talking about constant-depth NC0 circuits, so this whole thing is linear in n — at most polynomial in n. So in quantum polynomial time you can learn NC0, whereas classically we don't know how to: the best known classical algorithm would scale super-polynomially, I think roughly n^(log n). So that's promising: we want to learn NC0, AC0, and TC0, and at least for NC0 we have an extremely fast quantum algorithm — in fact linear in n. The fan-in counting argument is easy to check by brute force; see the sketch below.
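Here's a tiny sketch (the circuit is my own example) that extracts the set of variables a function actually depends on — which, for a depth-d fan-in-2 circuit, can never exceed 2^d:

```python
import itertools

def relevant_vars(f, n):
    """Brute force: variable i is relevant iff flipping it changes f somewhere."""
    rel = set()
    for x in itertools.product([0, 1], repeat=n):
        for i in range(n):
            y = list(x)
            y[i] ^= 1
            if f(tuple(y)) != f(x):
                rel.add(i)
    return rel

# A depth-2, fan-in-2 circuit over n = 8 inputs: (x1 AND x4) OR (NOT x3 AND x6).
# It reads at most 2^d = 4 wires, so it is (at most) a 4-junta.
n, d = 8, 2
f = lambda x: int((x[0] and x[3]) or ((not x[2]) and x[5]))
print(relevant_vars(f, n), "<= 2^d =", 2**d)   # {0, 2, 3, 5}
```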
So the motivating question for the next ten minutes: we looked at constant-depth circuits with bounded fan-in — that's NC0 — and we can learn it in essentially linear time. Once you allow unbounded fan-in, you get AC0, and then TC0 with majority gates — recall, those are the neural networks. We can in fact go one step further: for depth-two circuits, which are just DNF formulas, we know how to learn those in quantum polynomial time as well. So for bounded fan-in we get a quantum advantage for learning, and for depth two we get a quantum advantage in the example model. But once we go to depth three, we don't know what's going on; for depth-two TC0 we don't know what's going on; and for higher depths, we don't know what's going on.

Let's see what's known classically. For learning AC0 under the uniform distribution, the best known algorithm is the seminal one of Linial, Mansour, and Nisan from '89, which learns AC0 circuits in quasi-polynomial time. Essentially, the main idea is estimating the Fourier spectrum: they observe a structural property of AC0 circuits — the Fourier spectrum is well concentrated on low-degree terms — and then estimate those low-degree Fourier coefficients. That's the first thing. The second thing: a couple of years after their result, Kharitonov proved that this quasi-polynomial algorithm is essentially optimal — and not only that, he showed it's optimal assuming factoring is hard. Now this already hints that maybe there is something quantum to do here: the crucial idea behind the classical upper bound is estimating the Fourier spectrum, and we know Fourier sampling is easy quantumly; and the lower bound is conditional on factoring being hard, but factoring is easy quantumly. So maybe, putting these two things together, AC0 is quantumly learnable in polynomial time?

[In response to a question.] No — the lower bound isn't about estimating the Fourier spectrum specifically; I'll come to the proof, but it's unconditional on the approach: it says that learning AC0 well enough is hard, whatever you do.

And how about TC0? Not much is known — even for depth-two TC0 circuits, only until a few years back did we know some hardness results for learning under the uniform distribution. In the same paper, Kharitonov also ruled out polynomial-time algorithms for learning TC0, assuming factoring is hard. And there was a result by Klivans and Sherstov in 2009 showing that for PAC learning — where you must learn under all distributions, not just the uniform one — even depth-two TC0 circuits are hard.

So that motivates the question: can we learn AC0 and TC0 in quantum polynomial time? We proved two results in this direction. For the first one, fix the uniform distribution and give the learner quantum query access. We show that if you can learn AC0 or TC0 noticeably better than classically — if you can improve n^(log n) to anything like n^(sqrt(log n)) or better — then you can break Learning With Errors. I'm not going to define the LWE problem, but it is the backbone of lattice-based, post-quantum cryptosystems.
So we say: if you had a better learning algorithm for an interesting subclass of AC0 or TC0, you could solve Learning With Errors in polynomial time, and that would be a breakthrough in itself. The second result, from a couple of years back: the first result is for arbitrary depth — we don't pin down the exact depth at which hardness kicks in — but consider just depth-two TC0 circuits. Recall we know how to learn depth-two AC0 circuits; for depth-two TC0 circuits, we show that a non-trivial learning algorithm would imply a breakthrough in complexity theory. I'll define non-trivial later; by non-trivial I mean anything smarter than Fourier sampling or querying the entire truth table. I would view these results in two ways. One: they might be evidence as to why we haven't been able to come up with better quantum learning algorithms for interesting classes. On the other hand, you could be an optimist: if you do come up with something interesting, it has consequences elsewhere.

I'm very much running out of time, but let me try a proof sketch and then stop. I'm not going to prove the circuit lower bound statement; I'll give a high-level overview of why learning implies something — why learning implies either circuit lower bounds, or that you can break some cryptosystem.

To begin with, you need the concept of pseudorandom functions. A pseudorandom function family script F is a collection of functions mapping n bits to ℓ bits, one for every secret key: for every κ-bit string s — the secret key — there is a function f_s in the family. For this slide, some notation: A^F means script A is an algorithm that can make oracle queries to F — given F as a black box, it can make quantum queries, classical queries, whatever it wants — and we don't charge for them; but A must run in polynomial time.

We say a pseudorandom function family script F is secure if it satisfies the following property: either I give A oracle access to f_s from the function class, or to u, a uniformly random function, and A should not be able to distinguish which one it got with any good bias. In particular, F is secure if no polynomial-time algorithm can distinguish oracle access to a function from the class from oracle access to a uniformly random function — A cannot distinguish a truly random oracle u from a "fake" random function from script F. Here I assumed A makes classical queries to f_s; we say script F is quantum-secure if A still cannot distinguish even when it makes quantum queries to f_s or u. And as I said, the main cryptosystem we're going to use is the Learning With Errors problem.
LWE is one of the leading candidates for post-quantum cryptosystems. We don't know whether there exists a quantum polynomial-time algorithm for it, but we suspect it's hard: the best known algorithms run in exponential time, and even a sub-exponential-time quantum algorithm would be a big breakthrough. For the hardness result on the next slide, I'm only going to assume polynomial-time hardness — forget exponential or sub-exponential time; I just assume there is no polynomial-time quantum algorithm.

Good. So this is the main subroutine I'm going to use, and it's what you should take away about how "learning implies breaking something". I've just defined pseudorandom function families and what it means to be secure. First things first: f_s is not a Boolean function, and more often than not, when we think of learning, we think of learning Boolean functions. So define a new concept class, script C_F, by keeping just the first bit of the output: f_s outputs an ℓ-bit string, and I take the first bit. This script C_F is a concept class in the standard sense.

Now assume B is an efficient learner for script C_F. We're going to devise a new algorithm A that does the following. A is given oracle access to O, which is either a function from this concept class or a uniformly random function — A is going to be our eventual distinguisher. All we know is that there exists a B that can learn this concept class. So what does A do? A prepares a superposition and queries O on it; note that A's actual goal is to distinguish whether O came from the function class or is uniformly random. With one quantum query, A can efficiently prepare a uniform quantum example and pass it to the learning algorithm B — so exactly when B makes a query or requests a quantum example, A can simulate it using its own oracle. And we know the learner B, being an efficient quantum learner, after obtaining some number of quantum examples and quantum queries, will eventually output a hypothesis h. And what does A do? A declares "O is in C_F" if the output hypothesis matches the oracle: A makes one more query on a uniformly random x and checks whether h(x) = O(x). That is the algorithm, and the small technical lemma one needs to prove is: if B is a learning algorithm with bias beta, then A distinguishes with bias beta/2. In particular, if beta is 1/poly(n), A serves as a distinguisher: it can tell whether O came from C_F or is uniformly random with bias 1/poly(n).
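Here is a classical sketch of that reduction, just to pin down its structure (all names are mine; the lecture's A tests a single fresh point, while I average over a few for the toy version, and in the actual argument A additionally uses its oracle to prepare the quantum examples B asks for):

```python
import random

def distinguisher(oracle, learner, n, rng, tests=100):
    """A: answer the learner's example requests via oracle queries, then
    test the returned hypothesis on fresh random points. Guess 'from the
    concept class' iff the hypothesis beats chance noticeably."""
    def example():
        x = tuple(rng.randrange(2) for _ in range(n))
        return x, oracle(x)

    h = learner(example)                   # B outputs a hypothesis
    hits = 0
    for _ in range(tests):
        x = tuple(rng.randrange(2) for _ in range(n))
        hits += h(x) == oracle(x)
    return hits / tests > 0.6              # threshold is an arbitrary toy choice

# If B has bias beta on the concept class, A inherits bias ~beta/2: against a
# truly random oracle, no hypothesis can beat chance on fresh points.
```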
But that cannot be the case: we just came up with an efficient quantum polynomial-time algorithm A that distinguishes a uniformly random function from a function from the concept class with bias 1/poly(n) — that means we broke the cryptosystem. But we assumed the PRF was secure, so you should not have been able to break it; we started from the assumption that an efficient learner exists, and we broke it. So no efficient learner exists, assuming the PRF is secure.

So that's the subroutine — now how do we use it for AC0 and TC0? The starting point is the PRF family given to us by Banerjee, Peikert, and Rosen. They give a pseudorandom function family that is secure assuming the Learning With Errors problem is hard, and not only that: for every s, f_s can be computed by a TC0 circuit. So every concept in our concept class C_F can be computed by a TC0 circuit, and the main result is: if there exists an efficient quantum learner for C_F — a subclass of TC0, in particular the C_F given by this paper — then there is a polynomial-time algorithm for the LWE problem. So we look at this particular class of pseudorandom functions, observe that it is implementable in TC0, and apply the technique from the previous slide: an efficient learning algorithm for TC0 would break this PRF, but the PRF is believed secure assuming LWE is hard. The same idea doesn't directly work for AC0; there is some technical work to do, but we construct suitable PRFs there as well and get a similar statement: you should not be able to get much faster quantum learning algorithms for AC0 either.

Let me conclude with the final slide. So this proves hardness of learning AC0 and TC0, but there are a couple of drawbacks. The first: none of this is specifically quantum — the same hardness story is true classically. The second: with this PRF approach — and the same issue appears in Kharitonov's lower bound — you cannot say much about the depth of the circuits you prove hardness against. For example, none of the PRFs that Kharitonov or subsequent papers use can be implemented in depth less than about six. So we know depth two is easy to learn, and we know hardness for depth greater than six — but what about depths two, three, four, five?

That brings me to another result of ours, which argues even about depth-two circuits. If C is a class of polynomial-size circuits that can be learned under the uniform distribution with bias gamma, in time non-trivially better than the trivial algorithms, then we show that BQE — bounded-error quantum exponential time — is not contained in the circuit class C. Let's go over this a little slowly. Given a concept class, what are the two trivial algorithms? One: query the entire truth table. I make 2^n queries, and then my error is zero, because I've identified the concept exactly. So epsilon is zero, meaning the bias gamma is one half, and the relevant time-versus-bias quantity is essentially 2^n, matching my query complexity of 2^n.
So those are the two trivial algorithms: either I query everything, which takes queries and time 2^n with zero error, and that matches the bound; or I do Fourier sampling, because I can do that quantumly. If I Fourier sample, it takes only poly(n) time — in fact just a bunch of Hadamards, say n gates — and I measure and get an s. But if all I do is Fourier sampling, my agreement is only about 1/2 + 2^(-n/2), that is, the bias gamma is 2^(-n/2); and if I plug that in, this again just matches the trivial tradeoff, up to poly(n) factors. So essentially the main result says: if you can do anything smarter than these two trivial algorithms — that's what I mean by a non-trivial algorithm — then you can prove a circuit lower bound. And as one particular application, we can show: if there exists a non-trivial learning algorithm for depth-two TC0 circuits, then you get new circuit lower bounds. And as I said before, there are two ways to think about this: it explains why coming up with new learning algorithms may be hard, or it gives a new motivation for providing new quantum speedups. Yeah, I think I'll stop here. Thank you.