OK, the next talk is "Verifiable Delegation of Computation over Large Datasets" by Siavosh Benabbas, Rosario Gennaro, and Yevgeniy Vahlis, and Yevgeniy will be giving the talk.

This is joint work with Siavosh Benabbas from the University of Toronto and Rosario Gennaro from IBM Research, and the topic is verifiable delegation of computation over large datasets.

All right, so since we're talking about delegation, I have to mention cloud computing. In a typical cloud computing scenario, we have a user and a cloud server. The user starts with some initial data and potentially a program, or code, that he wants to run on this data. The user uploads the code and data to the server, and later on the server sends back the evaluation of the program on the user's data. Of course, from a security point of view, one of the concerns is that the server could be malicious, or arbitrarily buggy, which is basically the same as malicious. In this work we're mostly concerned with the problem of the server returning an incorrect answer: instead of sending us the correct value of the program on the data, he sends some other value, which is not what we're looking for.

So our goal in this work is to efficiently verify that the result we got back from the server is indeed the correct value we were looking for. And "efficiently" is the key here, because of course I could always verify the result by running the program myself on the data that I have. The whole point is to make sure that the work done by the user is significantly less than what is needed to run the code on the data.

All right, so what does efficient verification mean more formally? There are several ways we could model this; I'll mention two of them, which have appeared in previous work.

Probably the most common way of modeling efficient verification is the following scenario. The user has an algorithm F and data D, and the descriptions of both the algorithm and the data are relatively short; the difficulty is in evaluating the algorithm on the data. So the computation itself is complex, but the description of what needs to be computed is quite short. For example, the problem could be factoring a product of two primes — not necessarily RSA numbers, we can't factor those, but any two primes — and the goal is to outsource this to a server. We send the product to the server, the server factors it and sends us back the primes. Verification here is completely trivial, because we can just multiply the primes and check whether we get back our composite number or not (see the sketch below). So here the requirement is that the verification time is independent of, or asymptotically smaller than, the time required to compute F on the data D; in particular, it can be linear in the descriptions of both F and D.

The second scenario, which is the one we're concerned with, is where the data itself is too big for the user to store. The function being evaluated is potentially very simple — it could be linear or almost linear — but the complexity comes from the fact that running it on a very long sequence of data takes too much time and potentially too much storage. This has many applications in practice: various regression models, private information retrieval, and so on. All these problems have a very simple insecure description, but the secure way of doing them is hard.
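Here is a minimal sketch of the factoring example from the first model, with illustrative numbers; the point is that the server does the expensive factoring, and the client's check is a single multiplication:

```python
# Minimal sketch of the factoring example: the server does the hard work
# of factoring N; the client verifies with one multiplication.
# All numbers here are illustrative.

def verify_factoring(N: int, p: int, q: int) -> bool:
    """Accept iff p and q are nontrivial factors with p * q == N."""
    return 1 < p < N and 1 < q < N and p * q == N

# The server factors N = 3233 and returns (61, 53); the client just checks.
assert verify_factoring(3233, 61, 53)
assert not verify_factoring(3233, 60, 54)   # a wrong answer is rejected
```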
Returning to this second scenario: verification that is linear in the data is not enough here, because it would take as much time as evaluating the function itself. So the time to verify the computation needs to be significantly less than linear in the description of the data.

All right. So last year there were several breakthrough results in this area — by Gennaro, Gentry and Parno, by Chung, Kalai and Vadhan, and by Applebaum, Ishai and Kushilevitz — and all of them show, pretty much, that any function can be outsourced in the manner I described in option two on the previous slide. This is done via an initial setup phase where I process the data in some way and upload the processed data to the server; later on I can compute on the data as much as I want without doing any work that is linear in the description of the data. All three of these solutions are based on fully homomorphic encryption, and as we all know, at the moment that is not quite practical, especially not on large datasets. So in the meantime, is there anything we can do without it? That's the question we try to answer. The second issue, which is more technical, is that the security definitions of these three works require that a malicious server not learn whether he was successful in convincing the verifier of an incorrect result. This one bit of information, if it leaked, could be used to break all three schemes, and we wanted to avoid that.

So those were the goals of the work, and the results are the following. The main result in the paper is a new delegation scheme for polynomials, which means that what we want to delegate is a function of the form p(x) — a single-variable polynomial, which can have a very high degree — and the coefficients of the polynomial are the data we're trying to outsource. That's the primitive we're building. The degree d can be arbitrarily large; that's the parameter we care about. Extending this to general multivariate polynomials seems hard, and we were not too successful with that, though we do have some partial solutions. And one interesting fact, I guess, is that our construction does have this adaptive security, which allows the server to learn whether he was successful in cheating or not. This has several non-crypto applications, as I mentioned, and it also gives us some new results in keyword search and proofs of retrievability, which I may mention later on if I have time.

Using techniques similar to the first result, we also built an authenticated data structure — here I refer to Babis' excellent exposition — that basically provides a trade-off between the assumption used and how much power the verifier needs: in our solution the verifier needs a secret key, whereas previous solutions were public-key, but we use slightly better assumptions. I will not go into this second primitive — it's in the paper — and I will focus on the verifiable delegation scheme for polynomials.

All right, so there is a long line of work on verifiable delegation of computation between a prover and a verifier, starting of course with the works on interactive proofs and continuing with PCPs, CS proofs, and so on. Most of these works operate in the first model that I described; as far as I know, only the recent results from last year deal with the second model, where the function itself is very simple but the description of the data is too long.
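To fix notation before the technical part — this just restates the result statement in symbols — the delegated object throughout is:

```latex
% The delegated function: a single-variable polynomial of (huge) degree d.
\[
  p(x) \;=\; \sum_{i=0}^{d} a_i\, x^{i},
\]
% where the coefficient vector (a_0, \dots, a_d) is the outsourced data,
% and the user must verify a claimed value y = p(x) in time sublinear in d.
```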
All right, so how do we do this? What does it mean to delegate a polynomial, specifically? Well, the user starts with the description, which is just a sequence of coefficients, and processes it into some very long, potentially public key, which he sends to the server. At the same time, he keeps a short secret key, significantly shorter than the description of the polynomial. Later on, the user may get an input on which he wants to evaluate the polynomial, so the user compiles the input in some way and sends the compiled query to the server. The server has to evaluate the function and send back the result, along with a certificate proving that the result is correct. The goal is again to either be convinced that the result y is really the value of the polynomial, or to output reject. I would add that in this work we only try to achieve verification; we didn't care about privacy of the input or of the polynomial. We can achieve some privacy of the polynomial with generic techniques, which I'll mention later, but the main thing here is verification.

All right, so let's go into the technical details. The main tool we use is a new primitive which we call an algebraic PRF with trapdoor efficiency. What does this mean? This is a pseudorandom function such that, given certain secret information about its key, it is possible to compute algebraic operations related to the function very efficiently. Well, everyone here knows what a pseudorandom function is, but basically it is a function that is indistinguishable from a random function when the adversary is given an oracle to either the random function or the pseudorandom function with a randomly sampled key. And "algebraic PRF" is just something we use for notation — I'm not saying anything new here. All I'm going to use is that the range of the pseudorandom function forms an abelian group. Just to convince you that this is not a special property, and that essentially any function satisfies it: take a function that outputs a sequence of bits; the output lies in the abelian group of all strings of that length, under bitwise XOR. The main thing I want to keep from this, in terms of notation, is that this group has some kind of generator.

All right, so what is this trapdoor efficiency property? The property is the following. Given some range on which we're going to evaluate the pseudorandom function, and a value x which, not coincidentally, looks related to a polynomial, we are interested in the product of the pseudorandom values F_k(0), ..., F_k(d), where the i-th value is raised to the i-th power of x, and where the product is just the group operation in this abelian group. This can of course be computed directly from the pseudorandom values themselves. But what we want is an additional property: given the key of the pseudorandom function, it should be easy to compute this value — in significantly less time than linear in d. This is the main property that we use. More generally, it may be interesting to examine other algebraic properties that can be computed efficiently.
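Stated in symbols (notation mine, matching the talk), the trapdoor-efficiency property is:

```latex
% F_k is a PRF whose range is an abelian group, written multiplicatively.
% For a degree bound d and an input x, define
\[
  r(x) \;=\; \prod_{i=0}^{d} F_k(i)^{\,x^{i}} .
\]
% Trapdoor efficiency: whoever holds the key k can compute r(x) in time
% sublinear in d (the constructions later achieve roughly polylog(d)),
% whereas computing it term by term from the values F_k(i) takes
% Theta(d) group operations.
```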
Right, so how do we use this to build a verifiable computation scheme? We start with a vector of coefficients a_0 through a_d, which define the polynomial. We choose a random multiplicative masking coefficient c, and we generate pseudorandom values F_k(0) up to F_k(d). Now, for each coefficient of the polynomial, we upload an encoding to the server: we raise the generator of the abelian group to the power c times a_i, and then mask it with the pseudorandom value F_k(i). Hopefully this will make sense in a moment. Then, to answer a query x, the server takes the product of all these tokens we gave him, with the i-th token raised to the power x^i, and the result is the certificate, together with the actual plaintext value of the polynomial.

Right, so why are we doing all of this? Oh yes — as an aside, we can achieve secrecy of the coefficients of the polynomial by just encrypting them with some additively homomorphic encryption scheme; additively homomorphic encryption lets you evaluate the polynomial on these coefficients, because it's really just an inner product of the coefficients with the powers of the input.

OK, so how do we verify this certificate? The verifier's key is going to be the key of the pseudorandom function, together with the masking coefficient c. Notice that what the server gets is essentially a one-time MAC of each coefficient of the polynomial, in the exponent of this generator g. What we're giving the server in the exponent is the description of the polynomial c·p + r, where p is the original polynomial and r is a polynomial that is not random but pseudorandom — indistinguishable from random. The server evaluates this polynomial c·p + r and sends us back the value in the exponent of the generator. To check that the result is correct, the verifier takes the value y that the server claims to be the output and recomputes the certificate on his own. The difficult part here is computing this r(x), which the server computed directly by multiplying the tokens.

Before I go into the details of how that is done, let's see why the server can't cheat, supposing r were truly random: this follows from the fact that c·y + r(x) is really a one-time MAC of the value y. If the server could produce a MAC of any value other than y — well, he can't, because this is an information-theoretically secure MAC when r is a random polynomial. In our case we get only computational security, because r is only pseudorandom.

So the remaining question is: how does the verifier evaluate this r(x) efficiently? This is done exactly using the trapdoor efficiency property: the verifier, given the key, can compute r(x) in time sublinear in the degree of the polynomial.
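Here is a toy end-to-end sketch of the scheme just described, assuming a small prime-order subgroup of Z_P^* and an HMAC-based placeholder PRF. The placeholder PRF is not algebraic and has no trapdoor, so the verifier below recomputes r(x) naively; the real scheme substitutes the trapdoor-efficient PRF from the next slide to make that step sublinear. All parameters and names are illustrative.

```python
# Toy sketch of the delegation scheme for p(x) = sum_i a_i x^i over Z_q.
# Group: the order-q subgroup of Z_P^* with P = 2q + 1 (illustrative sizes).
# PRF: an HMAC-based placeholder mapped into the group; it is NOT algebraic,
# so verify() recomputes r(x) naively instead of via a trapdoor.
import hashlib, hmac, secrets

q = 1019          # prime order of the subgroup
P = 2 * q + 1     # 2039, also prime
g = 4             # a quadratic residue mod P, hence a generator of the subgroup

def prf(k: bytes, i: int) -> int:
    """Placeholder PRF into the group: hash i and exponentiate g."""
    e = int.from_bytes(hmac.new(k, str(i).encode(), hashlib.sha256).digest(), "big")
    return pow(g, e % q, P)

def setup(coeffs):
    """User: publish masked tokens g^(c*a_i) * F_k(i); keep (k, c) secret."""
    k, c = secrets.token_bytes(16), secrets.randbelow(q - 1) + 1
    tokens = [pow(g, c * a % q, P) * prf(k, i) % P for i, a in enumerate(coeffs)]
    return (k, c), tokens

def server_answer(coeffs, tokens, x):
    """Server: return y = p(x) and the certificate t = prod tokens[i]^(x^i)."""
    y = sum(a * pow(x, i, q) for i, a in enumerate(coeffs)) % q
    t = 1
    for i, T in enumerate(tokens):
        t = t * pow(T, pow(x, i, q), P) % P
    return y, t

def verify(sk, x, y, t, d):
    """User: accept iff t == g^(c*y) * r(x), with r(x) = prod F_k(i)^(x^i)."""
    k, c = sk
    r = 1
    for i in range(d + 1):          # naive here; sublinear with a trapdoor PRF
        r = r * pow(prf(k, i), pow(x, i, q), P) % P
    return t == pow(g, c * y % q, P) * r % P

coeffs = [3, 1, 4, 1, 5]            # p(x) = 3 + x + 4x^2 + x^3 + 5x^4
sk, tokens = setup(coeffs)
y, t = server_answer(coeffs, tokens, x=7)
assert verify(sk, 7, y, t, d=len(coeffs) - 1)                # honest: accepted
assert not verify(sk, 7, (y + 1) % q, t, d=len(coeffs) - 1)  # cheat: rejected
```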
So this is almost everything, except I have to say how we achieve this trapdoor efficiency: how do we build a pseudorandom function with this property? I'll give two constructions; they're very short. The first one uses a very strong assumption, but the construction itself is very simple. The assumption is that the sequence g^x, g^(x^2), and so on is indistinguishable from a random sequence of group elements. Weaker variants of this type of assumption have appeared before, and we can use it to build a PRF with trapdoor efficiency. So how do we do that?

Well, the PRF itself is just going to be F_k(x) = g^(k^x), where k is the key and x is the input. The efficiency is achieved using a very simple closed formula. If we evaluate the product from before, what we get in the exponent of g is the sum of the powers of k·x — all the powers of k·x from zero to d — and this geometric sum has a closed-form expression, ((kx)^(d+1) - 1)/(kx - 1), which can be evaluated in time logarithmic in d (see the sketch below). That's the idea, and it extends, to some extent, to multivariate polynomials; I won't go into this.

The other construction achieves basically the same thing, except it uses the standard Decisional Diffie-Hellman assumption, and here we use the Naor-Reingold pseudorandom function. Again, we just write down the expression for this product — which is this ugly thing on the slide — and by applying the binomial theorem we get that it requires the evaluation of only about log d products; the binomial theorem shows that this value is equal to the product we're looking to compute. In the paper we have a slight generalization to a logarithmic number of variables.

And that's it, so let me summarize. We get these constructions based on Decisional Diffie-Hellman, or on the stronger assumption I described. There are several applications we achieve: one is verifiable keyword search, the other is proofs of retrievability. And that's it — thank you for your attention. Unfortunately, we don't have time for questions.
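And, as promised above, a quick sanity check of that geometric-series closed form, reusing g, q, P from the earlier sketch (it assumes k·x is not congruent to 1 mod q, so the denominator is invertible):

```python
# Sanity check of the closed form behind F_k(x) = g^(k^x):
#   prod_{i=0..d} F_k(i)^(x^i) = g^(sum_i (kx)^i)
#                              = g^(((kx)^(d+1) - 1) / (kx - 1)),
# with the exponent arithmetic done mod q. Reuses g, q, P from above.

def r_naive(k: int, x: int, d: int) -> int:
    """Direct product of the d+1 PRF values: linear in d."""
    out = 1
    for i in range(d + 1):
        out = out * pow(pow(g, pow(k, i, q), P), pow(x, i, q), P) % P
    return out

def r_trapdoor(k: int, x: int, d: int) -> int:
    """Geometric-series closed form in the exponent: O(log d) multiplications."""
    kx = k * x % q
    e = (pow(kx, d + 1, q) - 1) * pow(kx - 1, -1, q) % q
    return pow(g, e, P)

assert r_naive(5, 7, 1000) == r_trapdoor(5, 7, 1000)
```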