Hi, everyone. So I'll be talking today about batch verification and proofs of proximity with polylogarithmic overhead. I'm Ron Rothblum, and this is joint work with Guy Rothblum. And just in case you're wondering, no relation. Okay, so what is this task of batch verification? The setting I want you to think about is the following. We want to verify the correctness of k NP statements. So we have k NP statements, and we want to check that each and every one of them is correct. In order to do so, we're allowed to interact with a prover we don't trust, and the prover knows the NP witnesses. So one trivial solution to this problem is for the prover to simply send over the k witnesses, and what we're asking is whether it's possible to do better. So let's be very concrete. Here's an example. Suppose that we're given k integers N1 up to Nk, and we want to verify that each and every one of them is an RSA modulus. An RSA modulus is just a product of two n-bit primes. So how can we do this? Well, the trivial solution, again, is to just send over the factorization of each of the integers. You can easily check that the factorization is correct and that it has the right form, that it's just a product of two primes. Can we do better? So we have a verifier who is given k integers, we have a prover who is given the factorizations, and the question is whether the prover can indeed convince the verifier that each and every one of the integers is an RSA modulus, and do so with better than trivial communication, communication that's much better than k times n. To put things on a somewhat more formal footing, what we're thinking about is an interactive proof. We have a verifier who is given k inputs for some language L, and the prover is trying to convince the verifier that all of the inputs belong to L. The prover, in addition, is given the k NP witnesses.
We want completeness, of course: if all of the statements are correct, the verifier should accept. If even one of the input statements is incorrect, then no matter what the cheating prover does, and here I'm talking even about a computationally unbounded cheating prover, with high probability we want the verifier to reject. So soundness is statistical. In terms of efficiency, we're looking for polynomial-time verification, and we would also want the honest prover, in contrast to a potential cheating prover, to run in polynomial time given these k NP witnesses. So that's the setting we'll be considering for the first part of the talk. And our main result there, well, the TL;DR is that we managed to construct a batch verification protocol for a subclass of NP, a class called UP, with polylogarithmic overhead. So what is UP? Well, it's just the subclass of NP consisting of all NP relations in which yes instances have a unique witness. This turns out to be a very rich class. It contains a bunch of problems, in particular a lot of natural problems coming from cryptography, things related to factorization, discrete log and so forth. Okay, so that's the TL;DR. The long version, not terribly agonizing, says the following. For every language in UP, we have an interactive proof for verifying that these k inputs x1 to xk are all in the language L. The communication complexity is polynomial in m, where m is just the length of a single one of the witnesses. I want to emphasize that this polynomial is some fixed polynomial that depends only on the language L and doesn't depend on k. So: some fixed polynomial in m, and only a polylogarithmic dependence on k. Okay, so that's the communication complexity, which is really what we're emphasizing. In terms of verifier runtime, well, you know, the verifier has to read the inputs.
So that's the k times n part that you're seeing here; in some settings where the input is given in an encoded format, you can save even that. And also the verifier needs to interact, so this poly(m) times polylog(k) term comes from the interaction. And the prover runs in polynomial time. Okay, so that's the first main result. A couple of things I wanted to say about it. First of all, for certain UP relations, namely those where you can verify the UP witness with a bounded-depth circuit or a bounded-space Turing machine, we can actually improve the poly(m) factor to just linear in m. So for a very natural and rich subclass of UP, we actually get the dependence on m down to linear. And that is essentially optimal: by results of Goldreich and Håstad, and of Goldreich, Vadhan and Wigderson, one can show, under very reasonable complexity assumptions, that a linear dependence on m is inherent. Okay, so that's the first main result. I want to contrast it with previous work. Maybe the first thing I want to compare to is what you can get via the IP = PSPACE theorem. Checking that k NP statements are correct is something you can do in very small space: by reusing space, you can essentially check this using m + log k space. If you then apply the IP = PSPACE theorem, what you get is an interactive proof with very good communication, polynomial in m and log k. The catch, however, is that using these generic transformations, the resulting protocol has an incredibly inefficient prover. The prover would be running in exponential time, even given the NP witnesses, the UP witnesses. So that won't do. And really this question of doing batch verification for UP with an efficient prover is something that we first studied in work together with Omer Reingold and Guy Rothblum four years ago.
And again, the result there was also for UP, and we managed to get non-trivial communication complexity. So like the current result, the dependence on m is polynomial, there's this poly(m) · log k dependence, but we also had an additive dependence on k, so that protocol wasn't sublinear in k. That was four years ago. Two years ago, we managed to get rid of this additive dependence on k, but that came at a larger multiplicative cost: it was poly(m) times k^ε, where ε is an arbitrarily small constant. And in this work, we finally managed to get the best of both worlds: UP batching with communication complexity polynomial in m and in log k. A couple of things that I didn't include in this slide are works that consider computational soundness, where we only want soundness against a computationally bounded cheating prover. Those works also typically rely on some form of cryptographic assumption, whereas our results are unconditional. And there's also a separate line of work, from this year actually, that looks at batch verification for SZK, focusing on zero knowledge, which is not our focus. Okay, so that was the first main result that I wanted to talk about. And interestingly, that result is proved via a connection, established in this prior work with Omer and Guy, to something called sublinear time verification. So what's that? Here's the motivation. Suppose that you have a researcher who's trying to develop some form of a COVID vaccine. She's working on her computer, and there's some huge database that she'd like to access and compute on. Unfortunately, this researcher can't really download the entire database, it's too big. All she can do is make random accesses to this database. So what she'd really like is to use the services of her favorite cloud provider.
The cloud provider can download the entire database, do whatever computation is required and provide the result to the researcher. But our researcher would like to verify the correctness of the result, and that can be done via an interactive proof. So is sublinear time verification possible? Well, it turns out that it is: you can verify things without even reading the entire input, if you're willing to live with a natural notion of approximation. The notion that we're going to be considering is inspired by property testing. It basically says the following. We have this green blob; those are inputs that are in the language, the inputs that we'd like to accept. Then we have the nearby surrounding area, inputs that look close, in Hamming distance, and we'll focus on relative Hamming distance. For this gray area, there are no guarantees. And then we have the red area: those are inputs that are far from having the property that we're interested in, and those are the inputs that we want to reject. Okay, so that's the property testing view of what we'll do. And our second main result is a general interactive proof of proximity (IPP), proving to you that an input is close to this green blob, with the following properties. What we show is that every language in NC, so every problem computable by a bounded-depth circuit, has an IPP in which the number of queries is Q and the communication is CC, where Q times CC can be any quasi-linear function of n. So for example, if you set both Q and CC to be roughly the square root of n, what you get is that everything is roughly square root of n: communication, number of queries, runtime and so forth. So that's one way to set the parameters.
A different setting of parameters that turns out to be useful is when you set the communication to be polylogarithmic, and then this result gives you slightly sub-linear query complexity. So you can get non-trivial query complexity even with just polylogarithmic communication, and that result in particular is something new. Prior to our work, there was a work of Guy Rothblum, Salil Vadhan and Avi Wigderson, which we really build on. That work got a very similar result: they managed to get the query complexity times the communication complexity to be anything that was like n^(1+o(1)), where the o(1) is sub-constant but a fairly large-ish function. Here we really get it down to polylog. And the significance really comes into play when you look at the second setting of parameters, where you insist on polylogarithmic communication complexity: their result wouldn't give you sublinear time verification, whereas we achieve that. And that turns out to be crucial in order to get our batch verification result. The second thing that I want to point out is that this result is very close to optimal. In work together with Kalai, we show that you need the query complexity times the communication complexity to be at least n over polylog(n), and that's under reasonable cryptographic assumptions. Okay, so let me tell you about how we do this. The roadmap really follows an approach outlined in prior work. Let's think of the eventual goal, getting to this UP batch verification. In order to do so, we're going to leverage a result from this paper with Guy and Omer from two years ago, where we showed that an efficient interactive proof of proximity for NC, with exactly the parameters that we got in Theorem 2, suffices in order to get Theorem 1. So all we need to do is get to the second rectangle: efficient interactive proofs of proximity for NC.
In order to do that, we're going to follow the approach outlined by Guy, Salil and Avi. They showed that there's a particular problem called PVAL, which we're going to define in a second, which is in a sense complete for constructing interactive proofs of proximity for NC. So if we manage to just give an efficient IPP, an interactive proof of proximity, for PVAL, then everything follows from that. So that is indeed our main focus, and that's what I'm going to discuss today. So what is PVAL? PVAL stands for polynomial evaluation. You should think of this problem as being parameterized by a bunch of points: (x1, y1) up to (xt, yt). We'll see in a second what these points mean. The input is big, right, in this context of interactive proofs of proximity. So we have this big input, and we think of this input as the truth table of a function f. Okay, and remember, we can only read relatively few points from the truth table. So which inputs are in the language? What we're going to do is look at f̂, where f̂ denotes the multilinear extension of f. The multilinear extension, just to remind you, is the unique multilinear polynomial that agrees with f on {0,1}^n, where we're working over some sufficiently large finite field; I'm not going into the exact details. So yes inputs are those for which, if I take my function, extend it, and look at the points x1 up to xt, I see the values y1 up to yt. No inputs are all functions that are far from this set: functions g that are far from every function f satisfying these t equations. Okay, that's PVAL. And it's going to be useful for us to view the multilinear extension as a tensor code. This is a well-known connection.
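To make the PVAL condition concrete, here is a minimal sketch in Python. It's a toy instance of my own, not from the talk: the names `mle_eval` and the modulus `P` are illustrative, and the field is just some fixed prime rather than the "sufficiently large" field the construction actually needs.

```python
# A toy sketch of the PVAL yes-condition over a small prime field.
# Names (mle_eval, P) and the instance are illustrative, not from the talk.
from itertools import product

P = 2**31 - 1  # a prime; the talk only requires a "sufficiently large" field

def mle_eval(f, x):
    """Evaluate the multilinear extension f-hat at a point x in F_P^n.

    f maps {0,1}^n (bit tuples) to field elements; f-hat is the unique
    multilinear polynomial agreeing with f on the hypercube:
        f-hat(x) = sum_{b in {0,1}^n} f(b) * prod_i (x_i*b_i + (1-x_i)*(1-b_i))
    """
    n = len(x)
    total = 0
    for b in product([0, 1], repeat=n):
        # Multilinear Lagrange basis polynomial for vertex b, evaluated at x.
        coeff = 1
        for xi, bi in zip(x, b):
            coeff = coeff * (xi * bi + (1 - xi) * (1 - bi)) % P
        total = (total + f[b] * coeff) % P
    return total

# Toy instance: n = 2, f is the AND function, so f-hat(x1, x2) = x1 * x2.
f = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}

# On Boolean points the extension agrees with f ...
assert mle_eval(f, (1, 1)) == 1
# ... and a PVAL parameter pair (x, y) is the claim f-hat(x) = y.
x = (3, 5)
y = mle_eval(f, x)
assert y == 15  # 3 * 5, since the extension of AND is x1 * x2
```

A no instance would be a truth table g whose Hamming distance to every f satisfying all t such claims is large.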
If you're not familiar with tensor codes, it doesn't matter, I'm going to define everything here. So remember, the input is the truth table of this function f from {0,1}^n to {0,1}, a Boolean function. Here I split the truth table into two rows, where the first row is the first half of the truth table and the second row is the second half. In other words, the first row is what you get if you fix the first variable of the function to be zero, I'll call that the function f0, and the second row is what you get if you fix the first variable to be one, and we'll call that f1. All right, so that's our input f. One way to view the multilinear extension of f is as follows. First you interpolate these two rows, that is, you interpolate each column individually. Each column has only two points, so you get a degree-one polynomial. Then you go to each one of the rows and interpolate that as a multivariate polynomial. So really what you're doing is recursing: you're taking the (n-1)-dimensional multilinear extension of each one of the rows. Okay, that's one way to view the multilinear extension, and it's going to be useful for us. Okay, so we're trying to design an IPP for PVAL. Our input is the blue part. In our minds, we extend it to the entire pinkish part. And the claim is that if you look at these red points, just a bunch of points that are part of the parameterization of our problem, you're going to see a particular set of values. All right, so blue is the input, what we actually have access to; pink is the multilinear extension; and red is the points. It's good to keep this color coding in mind because there are a couple more colors coming up. Okay, so how do we do this? Well, we're going to follow a divide and conquer approach.
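The row-splitting view described above can be written as a one-line recurrence: interpolating each column gives a degree-one polynomial in the first variable, with the two row extensions as its endpoints. Here is a small sketch under the same toy-field assumption as before (the names are mine, and the table ordering is an assumption: lexicographic over the hypercube, so the first half is the first variable fixed to 0).

```python
# Sketch of the recursive/tensor view of the multilinear extension:
#   f-hat(x1, ..., xn) = (1 - x1) * f0-hat(x2..xn) + x1 * f1-hat(x2..xn),
# i.e. a degree-1 column interpolation in x1, then recurse on each row.
from itertools import product

P = 2**31 - 1  # illustrative prime field

def mle_recursive(table, x):
    """Multilinear extension of a 2^n truth table (lexicographic order), at x."""
    if not x:
        return table[0] % P
    half = len(table) // 2
    f0, f1 = table[:half], table[half:]   # rows: first variable = 0 / 1
    return ((1 - x[0]) * mle_recursive(f0, x[1:])
            + x[0] * mle_recursive(f1, x[1:])) % P

def mle_direct(table, x):
    """Sanity check: the direct sum over the hypercube."""
    acc = 0
    for idx, b in enumerate(product([0, 1], repeat=len(x))):
        c = 1
        for xi, bi in zip(x, b):
            c = c * (xi * bi + (1 - xi) * (1 - bi)) % P
        acc = (acc + table[idx] * c) % P
    return acc

table = [0, 1, 1, 0, 1, 0, 0, 1]  # truth table of parity on 3 bits
point = (7, 11, 13)
assert mle_recursive(table, point) == mle_direct(table, point)
```

The recursive form is exactly the tensor-code picture: each level of recursion is one round of column interpolation.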
We'll basically follow the approach of Guy, Salil and Avi, up to a point at which we diverge. So step number one: we're going to ask the prover to decompose, or break down, each one of these t claims into the individual contributions of the two rows. Essentially, we go to each column in which there is a red box and ask the prover: what are the contributions of the two rows to this column? Those would be the two corresponding green boxes that we see here. And of course the prover answers, and we can check that the interpolation is consistent with the answers. Okay, and notice that for each box that contained a false claim before, the prover has to lie somewhere in the green boxes above it in order to be consistent with the lie that it's trying to maintain. Okay, so that's how we divide; how do we conquer? Well, following this approach of RVW, what the verifier is going to do is take a random linear combination of the rows. It chooses random alpha and beta and thinks, in its head, about the row that you get by taking alpha times the first row plus beta times the second row. All right, so we get a new kind of virtual row that we don't have direct access to, but we can define it. And notice that the claims that we have on the green boxes can be translated into new claims about this random linear combination. So what do we have? We have this new input, and it's not hard to show that in terms of distance we haven't lost too much; it's certainly not going to be much worse than what we started off with. And the size is half the original size. So that sounds great: just keep on recursing until the input is tiny, and then you can read everything. Pretty simple. Unfortunately, it turns out that it doesn't quite work.
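The conquer step above rests on one small fact: the multilinear extension is linear in the truth table, so folding the rows folds the claims the same way. Here is a hedged sketch of that, again over an illustrative small prime field with names of my own choosing.

```python
# Sketch of the fold step: claims about rows f0, f1 become one claim about
# the virtual row g = alpha*f0 + beta*f1, by linearity of the extension.
# Field, names and the toy rows are illustrative, not from the talk.
import random

P = 2**31 - 1

def mle(table, x):
    """Multilinear extension of a 2^n truth table, recursive form."""
    if not x:
        return table[0] % P
    h = len(table) // 2
    return ((1 - x[0]) * mle(table[:h], x[1:])
            + x[0] * mle(table[h:], x[1:])) % P

# Two rows (f with first variable fixed to 0 / 1) and one claim point z.
f0 = [3, 1, 4, 1]
f1 = [5, 9, 2, 6]
z = (17, 23)

# Step 1 (divide): the prover reports each row's contribution at z.
v0, v1 = mle(f0, z), mle(f1, z)

# Step 2 (conquer): the verifier folds the rows with random alpha, beta ...
alpha, beta = random.randrange(P), random.randrange(P)
g = [(alpha * a + beta * b) % P for a, b in zip(f0, f1)]

# ... and by linearity, the two claims fold into one claim about g.
assert mle(g, z) == (alpha * v0 + beta * v1) % P
```

The verifier never materializes `g`; any query to it is emulated by two queries to the real input, which is exactly the problem discussed next.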
And the problem is that we don't really have access to this new virtual row. Every query that we want to make to the new row we can only emulate by making two queries to the input that we do have access to. So the input has shrunk by half, but the query complexity has doubled, and the distance has stayed roughly the same. So we haven't made any progress. Okay, so that's a problem, and the way we're going to resolve it in this work is by looking much more carefully at the distance. For example, as a mental experiment, it's good to think of a situation in which you have a yes input of PVAL, a correct function f that interpolates correctly, and then you throw in some noise, a bunch of errors. These errors are denoted here by the boxes with the red diagonal lines. Okay, so we threw in a couple of errors. And if you look at relative distance: when you take a random linear combination and the errors appear in different columns, then the relative error essentially doubles, because the absolute number of errors stays roughly the same but the size has shrunk by half. So in terms of relative distance, the distance has doubled, which is much better than we thought before. The problem is that this is not always the case, because it could be that our errors are aligned with one another, and then they don't add up. In the example that we have here, the errors are on top of one another, aligned, and when you take the random linear combination, the error remains in the same place; you don't gain the extra errors. So the problem is that errors could be aligned. A big problem. And the way we resolve it is really inspired by a beautiful recent work of Ben-Sasson, Kopparty and Saraf, in the context of testing Reed-Solomon codes.
And what we're going to do is randomly, or rather pseudo-randomly, permute the truth tables of the functions f0 and f1. Okay, so that's the high-level idea. Actually implementing it turns out to be tricky, and there are a couple of things you need to worry about. For one thing, we start off with claims about the multilinear extensions of f0 and f1. After we permute them, we need to translate the claims that we had into claims about the multilinear extensions of the permuted functions, which doesn't seem obvious. The way we resolve this is by carefully choosing the permutation that we use: we essentially choose the permutation to be a random affine map. The fact that it's degree one lets us reason about the multilinear extension. So that's how we resolve the first problem. The second problem is that once we do these permutations, the claims that we have about f0 and f1 are no longer aligned, and then when we look at the claims about the new row, the random linear combination, we have double the number of claims. The way we deal with that is by introducing another sub-protocol that reduces the number of claims back to what it was in the beginning. So let's do a quick accounting of where we are. With the random permutation, we can actually argue that the relative distance doubles, the size of the input halves, and the number of queries doubles, but the number of queries and the size cancel each other out. So really what happens as you recurse is that the distance increases, and that's enough so that at the end you can finish with very small query complexity. So overall, we're good. Okay, so to summarize, what we show in this work is a batch verification protocol for UP with only a polylogarithmic overhead, polylogarithmic in k, the number of instances.
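To make the permutation step a bit more concrete, here is a sketch of sampling a random invertible affine map T(x) = Ax + b over F_2^n. This is only an illustration of the object being used: such a map is a bijection on the hypercube, so applying it to the index set permutes a truth table. The sampling procedure and all names are my own choices, not the paper's exact construction.

```python
# Sketch: a random invertible affine map over F_2^n permutes {0,1}^n,
# so it can be used to (pseudo-randomly) permute a truth table.
# Illustrative only; not the paper's exact construction.
import random
from itertools import product

def rank_f2(M):
    """Rank of a 0/1 matrix over F_2, by Gaussian elimination."""
    M = [row[:] for row in M]
    n, rank = len(M), 0
    for col in range(n):
        pivot = next((r for r in range(rank, n) if M[r][col]), None)
        if pivot is None:
            continue
        M[rank], M[pivot] = M[pivot], M[rank]
        for r in range(n):
            if r != rank and M[r][col]:
                M[r] = [a ^ b for a, b in zip(M[r], M[rank])]
        rank += 1
    return rank

def random_affine_permutation(n, rng=random):
    """Sample invertible A and shift b; return T(x) = Ax + b over F_2."""
    while True:
        A = [[rng.randrange(2) for _ in range(n)] for _ in range(n)]
        if rank_f2(A) == n:  # invertible => T is a bijection on {0,1}^n
            break
    b = [rng.randrange(2) for _ in range(n)]
    def T(x):
        return tuple((sum(A[i][j] * x[j] for j in range(n)) + b[i]) % 2
                     for i in range(n))
    return T

# T permutes the hypercube: all 2^n images are distinct.
n = 4
T = random_affine_permutation(n)
images = {T(x) for x in product([0, 1], repeat=n)}
assert len(images) == 2 ** n
```

The point made in the talk is that because each coordinate of T is an affine function of the input, one can still reason about the multilinear extension of the permuted function, which an arbitrary permutation would not allow.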
We do this by making progress on a different question, interactive proofs of proximity, and in particular we get a result with polylogarithmic communication and a sublinear number of queries. In terms of open problems, one question that has really been frustrating, that we're trying to solve, is this: we have this sequence of works, and all of them stop at UP. There's no reason we can see why one shouldn't be able to get some form of result for general NP relations, and that would be, I think, an amazing result. Or maybe there's an inherent reason why we're stuck at UP, and a lower bound would also be very nice. That's one question. Second question, just in terms of pushing the parameters even further: rather than having a polylogarithmic overhead, can we possibly get a constant overhead? Say, the cost of doing this verification would be 10 times the witness length, rather than the witness length times polylog, and I'm fine with some additional additive overhead. So that's the second question. Third question: we have this result of IPPs for languages in NC, so bounded depth, and also for SC, which is bounded space. Can you get any kind of result beyond that, for P or higher complexity classes, with, say, square root n complexity? So that's it. If you have any questions, please feel free to contact me, either in the online part of TCC or via email. I'll be delighted to answer questions. So thanks for your attention.