My name is Pavel Raykov and I'm going to present joint work with Benny Applebaum on statistical randomized encodings versus statistical zero-knowledge proofs. As you can see, in this work we compare two objects: one is SRE and the second is SZK, and we want to understand the relationship between these two complexity classes. So let's start by introducing these notions and their settings. We begin with the notion of SRE. In the SRE setting, we have a computationally limited client — here, polynomial-time bounded, so he can do only polynomial-time computation. He holds some input x and wants to compute some function f. Since he is computationally limited, he cannot compute the function directly by himself, so he wants to outsource the computation to a computationally unbounded server. Naively, he could just send his input to the server and say: look, compute the function f. But since we are in the cryptographic world, we would like some privacy. Privacy in this setting means that we outsource the computation to the server, but the server should only compute the output of the function, without learning anything about the input except the output of the function on this particular input. Okay, so far I have described a setting with some fixed function f for which you have a statistical randomized encoding scheme. With every function you can associate a corresponding language — I assume throughout this talk that functions are Boolean. How do you get a language from a function? Every input that the function maps to zero is not in the language; every input that the function maps to one is in the language. Now I can consider the set of all languages that have such a statistical randomized encoding scheme, and this complexity class I call SRE.
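To make the setting concrete, here is a tiny textbook-style toy (not from the talk): a perfect randomized encoding of the XOR function, sketched in Python. The names `encode`, `decode`, and `simulate` are mine.

```python
import random

# Toy (perfect) randomized encoding of f(x1, x2) = x1 XOR x2.
# Enc(x; r) = (x1 ^ r, x2 ^ r): the server can decode f(x),
# yet the pair reveals nothing beyond f(x).

def encode(x1, x2, r):
    """Client-side encoding with a single random bit r."""
    return (x1 ^ r, x2 ^ r)

def decode(y):
    """Server recovers f(x) from the encoding alone."""
    return y[0] ^ y[1]

def simulate(b):
    """Privacy: the encoding can be sampled given only the output bit b."""
    s = random.randint(0, 1)
    return (s, s ^ b)

# Correctness over all inputs and all randomness:
for x1 in (0, 1):
    for x2 in (0, 1):
        for r in (0, 1):
            assert decode(encode(x1, x2, r)) == x1 ^ x2
```

Here the simulator's output distribution for b = f(x) is identical to the real encoding distribution, so this toy is in fact a perfect (not just statistical) encoding.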
Okay, so the second complexity class is SZK, and probably people are more familiar with it, but I will still remind you. We again have a setting with a computationally limited client and an unbounded server, and in this setting both of them hold some input x and have some language in mind. The goal of the server is to convince the client that the input is indeed in the language, without the client learning anything beyond this one fact — whether the input is in the language or not. Again, you can consider all the languages that have such a statistical zero-knowledge proof system, and this class is denoted SZK. Okay, so the goal of this talk is to compare these two complexity classes, SRE and SZK. Briefly, as you can see, the two settings are quite similar, even syntactically. The main differences are the following. First, on the left-hand side, in the SRE setting, there is basically no interaction: a single short message is sent from the client to the server, while in the SZK setting there is interaction. Second, in the SRE setting you want security against a malicious server, in case he wants to learn too much about the input, whereas in the SZK setting both parties can be malicious: the server, if he wants to convince the client of a wrong statement, and the client, if he wants to learn more about the input than whether it is in the language or not. Okay, so the natural question to ask is: what is the relationship between these two complexity classes, SRE and SZK? And what are the results? This is what we knew before. Almost immediately you can see that SRE is a subset of SZK, and this was observed by Benny Applebaum already in 2004.
And recently there was work that defined the class SRE properly, by Agrawal, Ishai, Kushilevitz, and Paskin-Cherniavsky in 2015, and they showed an oracle separation. So now we know that SRE is a subset of SZK, but we don't know whether it is actually a proper subset or not. We have some indication that it might be a proper subset, due to this oracle separation, but we don't know for sure. So when comparing SRE to SZK, the natural question is whether the inclusion is proper. Unfortunately, we couldn't solve this question directly, but we got two new questions. What we did is sandwich two new complexity classes between SRE and SZK, and thereby reduce the question of SRE versus SZK to two new questions about two different complexity classes. We introduced a new notion, which we call 1RE, a relaxation of statistical randomized encoding where you require privacy only for a subset of the inputs, and we showed that this class is essentially equal to the class of languages that have a non-interactive statistical zero-knowledge proof system, NISZK. Basically, in this setting the interaction is replaced by a dealer who provides common randomness. Okay, so the big advantage of this result is that, given that 1RE equals NISZK, if you want to compare SRE and SZK, your task is reduced to two new questions: you need to compare SRE with 1RE, and NISZK with SZK. So if you want to understand whether SRE is a proper subset of SZK, you need to understand whether SRE is a proper subset of 1RE and whether NISZK is a proper subset of SZK. The latter question, NISZK versus SZK, has been studied extensively and is believed to be quite hard. So, okay, we of course have hope that one can show whether it is a proper subset or not.
The former is a relatively new question, and it would be interesting to understand whether losing privacy for a subset of the inputs changes the power of the complexity class. Okay, so this is our main result, and we also worked out some complexity-theoretic implications of whether the class SRE is trivial or not and whether SZK is trivial or not. What we knew before is that if SZK is not trivial, i.e., not equal to BPP, then auxiliary-input one-way functions exist; this is a result of Ostrovsky from 1991. Also, in a recent paper it was shown that based on the LWE assumption or the DDH assumption, you can separate SRE from BPP. So it seems that both classes, SZK and SRE, should be different from BPP. What we did in this work is take Ostrovsky's result and prove an analogue for the class SRE: we showed that if SRE is not equal to BPP, then infinitely-often one-way functions exist. This object, an infinitely-often one-way function, is almost a one-way function, and it seems to be a stronger object than an auxiliary-input one-way function. So this still doesn't completely answer the question of whether SRE and SZK are the same or not, but it does suggest that SRE is probably different from SZK. Finally, we continued the work of Ong and Vadhan. They showed that if there exists a hard-on-average language in SZK, then there exist constant-round statistically hiding commitments. We obtained an analogous result for SRE: basically, if you substitute SRE for SZK here, you get collision-resistant hash functions. It's not actually SRE but a variant of SRE — we call it perfect SRE — but essentially you can think of it as SRE; the distinction is not so important here. So these are the main results. As you can see, we have made some progress towards understanding the question of SRE versus SZK, but we still don't know whether SRE is a proper subset of SZK or not.
Okay, so now I will briefly go over the setting and then prove some of the results. This is the SRE setting: as I said before, you have a computationally limited client who wants to outsource his computation to the unbounded server. The goal is to compute f(x) without disclosing x. There is a standard property-based definition: a correctness property, which says that the output produced by the server equals the right output f(x) with high probability, and a privacy requirement, which says that the server doesn't learn anything about the input except the output of the function on this input. Okay, so now I will introduce the notion of bubbles, which will help me talk about distributions. A blue bubble represents the distribution produced by the client when he encodes his input: whenever I have some input x, I run the encoding procedure on it, and the resulting message is later sent to the server; I denote this distribution by a bubble. My first claim concerns the encodings of the inputs that map to one — these I denote with the color blue, so they are blue bubbles. The claim is that all the blue bubbles are close to each other. Why? Because of the privacy requirement, all these distributions must be similar; otherwise the server could tell apart inputs that map to one. Similarly, if you consider the inputs that map to zero — these are the red bubbles — the red bubbles must be close to each other, because otherwise the server could distinguish inputs that map to zero. Finally, due to the correctness requirement, the blue bubbles must be far from the red ones, because the server must at least be able to learn the output of the function. So conceptually, the picture looks like this.
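The bubble picture can be checked numerically on a toy example — a sketch of mine, not from the talk — using the simple encoding Enc(x; r) = (x1 xor r, x2 xor r) of the XOR function: the two 1-input distributions coincide, the two 0-input distributions coincide, and blue and red are at maximal statistical distance.

```python
# Exact output distributions of the toy encoding Enc(x; r) = (x1^r, x2^r)
# for f(x) = x1 XOR x2, and their pairwise statistical (TV) distances.

def enc_dist(x1, x2):
    """Output distribution of the encoding over uniform r in {0, 1}."""
    dist = {}
    for r in (0, 1):
        y = (x1 ^ r, x2 ^ r)
        dist[y] = dist.get(y, 0.0) + 0.5
    return dist

def tv(p, q):
    """Total variation (statistical) distance between two finite dists."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

blue = [enc_dist(0, 1), enc_dist(1, 0)]   # 1-inputs
red  = [enc_dist(0, 0), enc_dist(1, 1)]   # 0-inputs

print(tv(blue[0], blue[1]))  # 0.0 : blue bubbles coincide
print(tv(red[0], red[1]))    # 0.0 : red bubbles coincide
print(tv(blue[0], red[0]))   # 1.0 : blue and red far apart
```

In this toy case the bubbles within each color coincide exactly and the two colors have disjoint supports; in the general statistical setting "close" and "far" hold only up to small errors.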
There are these two groups of distributions, blue ones and red ones; the blue ones are close to each other, the red ones are close to each other, and the blue and red ones are far apart. Now let's look at the one-sided version. In the one-sided version you require privacy only for the 1-inputs, and correctness as before. So the picture looks like this: the blue bubbles are still close to each other, but the red bubbles may now be distributed any way they want, except that they must stay far from the blue bubbles. This is the conceptual understanding of what this object gives you: it gives you a way to map inputs to two families of distributions, and one-sided privacy means that the 1-inputs all map to essentially one distribution, while the 0-inputs map somewhere far from it. Okay, now let's turn to the NISZK setting. Before, when I introduced SZK, there was interaction between the client and the server. Now this interaction is replaced by a dealer who gives them common randomness. In the beginning the dealer creates common randomness and sends it to both the client and the server. The server can then compute a proof, which is sent to the client, who accepts or rejects. The goal is again for the server to prove that the input is in the language without revealing any additional information about the input. Again there is a property-based definition with correctness, soundness, and zero-knowledge. Correctness means that the server can always prove a correct statement. Soundness means that the server cannot prove an incorrect statement. And zero-knowledge means that the client doesn't learn anything about the input except whether it's in the language or not. The notion where there is no dealer but there is interaction is just the traditional SZK notion. Okay, this is our main theorem: we prove that the one-sided randomized encoding class 1RE equals the non-interactive statistical zero-knowledge class NISZK. So let's go into the proof.
We start by proving that NISZK is a subset of 1RE. Take any NISZK system for some language L; we want to construct a one-sided randomized encoding for this language. The construction works as follows. It is based on old ideas used for proving properties of NISZK, so it is nothing particularly novel, but it works, so we present it. The first thing you do is take the zero-knowledge simulator, run it on the input, and take its output. Since it simulates the NISZK setting, it outputs a string from the dealer and a proof; this pair is supposed to be indistinguishable from the real-world distribution, okay? Given this pair, you verify the common string and the proof against the input x. If verification succeeds, you output the common random string; otherwise you output some fixed string that lies outside the support of the dealer. And this is how the encoding works. So, to recap: we have a non-interactive statistical zero-knowledge proof system, and we want to construct a one-sided randomized encoding; the encoding takes an input x and randomness r, and works as just described. Okay, so now I will prove the security of this construction. We start by considering the output of this encoding procedure on a 1-input. So take some input that is in the language and see how the encoding procedure behaves. Since it is a 1-input, it is in the language, which means there exists an accepting proof, and the distribution output by the simulator is indeed close to the real distribution. I would like to claim that the distribution of the encoding's output is very close to the distribution of the dealer, and I will do it in two steps. The first step is that the distribution of the simulator is close to the distribution of the dealer.
The first component σ output by the simulator should be close to the dealer's distribution of σ, because of the zero-knowledge property. The second step: because the input is in the language, we know that the simulated proof π together with the common string σ is, with high probability, a valid proof for the input x. So the verification procedure succeeds, up to the correctness error, on this proof π and common randomness σ with the input x. Therefore you output σ with high probability, up to the correctness and zero-knowledge errors. So the encoding distribution of any 1-input is close to the distribution of the dealer. The picture looks like this: you have the distribution of the dealer, which is the green distribution, and you have the distributions of the 1-inputs, which are the blue bubbles, and they are close to the distribution of the dealer. Now let's consider the distribution of any 0-input. I claim that the distribution of a 0-input must be far from the dealer's distribution, or at most touch it a little bit. Why? Because every string produced by the encoding of a 0-input that lands inside the dealer's distribution contributes to the soundness error: whenever I output a string in the dealer's support, it means the string was verified correctly after being produced by the simulator, i.e., there is an accepting proof of a false statement. So any intersection between a red bubble and the green bubble contributes to the soundness error, which means the red bubbles must also be far from the green bubble. So the picture looks like this: you have blue bubbles that are close to the distribution of the dealer, and red bubbles that are far from it. This is exactly the picture of a one-sided randomized encoding, so this direction is actually quite easy. Okay, now let's go in the other direction: we will prove that 1RE is a subset of NISZK.
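To summarize the first direction in code: the encoding just described is a small combinator around the simulator and verifier of the given NISZK system. This is a sketch of mine; the callables `simulate` and `verify`, and the toy stand-ins below, are hypothetical.

```python
BOTTOM = "<bot>"  # fixed string chosen outside the dealer's support

def encode_from_niszk(x, simulate, verify):
    """1RE encoding from a NISZK system (sketch): run the zero-knowledge
    simulator on x, check the simulated proof against x, and output the
    simulated common random string on success, else the fixed junk string."""
    sigma, proof = simulate(x)          # simulated (dealer string, proof)
    return sigma if verify(sigma, x, proof) else BOTTOM

# Purely illustrative stand-ins: "membership" = first bit of x is 1.
toy_sim = lambda x: ("sigma-123", "pi-123")
toy_verify = lambda sigma, x, proof: x.startswith("1")

print(encode_from_niszk("10", toy_sim, toy_verify))  # sigma-123
print(encode_from_niszk("01", toy_sim, toy_verify))  # <bot>
```

As in the analysis above: on a 1-input the output is (close to) a dealer string, while on a 0-input an accepting simulated proof would contribute to the soundness error, so the output falls outside the dealer's support.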
Our construction works as follows. We assume that we have a one-sided randomized encoding: an encoding function, and a simulator for this encoding function. Remember that the blue bubbles are close to each other, so I can choose a universal bubble — a universal blue bubble that is close to all the other blue bubbles. I construct the NISZK system as follows: I take this universal blue bubble and sample a point from it. This is how it looks: you have the universal blue bubble — okay, in the picture its color is green — and all the blue bubbles are close to it. Given this sampled point, I ask the server to find a random preimage of this point under the encoding function; since the universal bubble is close to all the blue bubbles, this should always be possible for a 1-input. This preimage serves as the proof, which is sent to the client. Here I wrote r′ because the server may be malicious and could send some arbitrary r′; the client verifies r′ by applying the encoding function to x and r′ and checking that this random string is indeed a preimage of the sampled point. You can think of the sampled point as a challenge: basically, you ask the server to invert the distribution, okay? So the picture looks like this: you have the green universal distribution and all the blue bubbles close to each other; you sample from the green distribution and ask the server to invert. Whenever the blue bubble covers the sampled point, it is possible to invert; when the blue bubble doesn't cover this point, you cannot invert. And we know that the red bubbles are far from it, so basically we get all the required properties; you can verify them.
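The protocol above can be run end to end on the toy XOR encoding Enc(x; r) = (x1 xor r, x2 xor r) — again a sketch of mine, not from the talk — where the "universal blue bubble" is the encoding distribution of a fixed 1-input, say x* = (0, 1).

```python
import random

def enc(x, r):
    """Toy encoding of f(x) = x[0] XOR x[1]."""
    return (x[0] ^ r, x[1] ^ r)

def dealer():
    """Sample the challenge z from the universal 1-input distribution."""
    return enc((0, 1), random.randint(0, 1))

def prover(x, z):
    """Unbounded prover: find a random preimage r with Enc(x; r) = z."""
    preimages = [r for r in (0, 1) if enc(x, r) == z]
    return random.choice(preimages) if preimages else None

def verifier(x, z, r):
    """Client accepts iff the claimed preimage re-encodes to the challenge."""
    return r is not None and enc(x, r) == z

z = dealer()
# A 1-input can always answer the challenge (its blue bubble covers z) ...
assert verifier((1, 0), z, prover((1, 0), z))
# ... while a 0-input never can: its red bubble misses the challenge.
assert not verifier((0, 0), z, prover((0, 0), z))
```

Because the toy encoding is perfect, inversion succeeds for every 1-input and fails for every 0-input with certainty; in the statistical setting these events hold only with high probability, which is exactly where the soundness issue discussed next comes from.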
The correctness property of the zero-knowledge system follows from the fact that the blue bubble is close to the green bubble: whenever you need to invert, you can, because the two distributions are close. The zero-knowledge property essentially reduces to sampling from the green distribution, which is also easy. And the soundness property, actually the crucial one, says that under the red distribution you cannot find a preimage of a sample from the green distribution. You know that the red distribution should be far from the green distribution, but here there is a small problem: it is not always possible to prove soundness this way directly, because the red distribution can touch the green distribution a little bit, so that a preimage exists. So soundness is valid only for this idealized picture, and we now have to fix it for the case when the red bubble touches the green bubble. Here I refine the notion of bubbles. Remember, a bubble is just a distribution — these are its histograms: this is the red bubble, this is the green bubble. Before, I drew them as non-intersecting, so the overlap between the two distributions is zero. The problem arises if you take the red distribution and let it touch the green distribution a little bit: then for every point of the green distribution you have a preimage under the red distribution, even if the weight of this point is small. In order to fix this, we need to find a transformation that makes such distributions truly far apart: given these two distributions, we want to apply some transformation that fixes this picture and pushes the red mass back to the red part.
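To preview the fix in miniature: knowing (roughly) the entropy of the green distribution, one can keep only the samples whose probability mass is above a threshold and collapse the light ones to a junk symbol. This is only a cartoon of the weight idea, written by me — not the actual transformation based on Goldreich and Vadhan used in the paper.

```python
import math

BOT = "<bot>"

def separate(dist, entropy, slack=1.0):
    """Cartoon of the weight idea: keep a sample only if its probability
    mass is at least 2^-(entropy + slack); light points (where a far-apart
    distribution could still sneak in) are collapsed to a junk symbol."""
    threshold = 2.0 ** (-(entropy + slack))
    return {z: (z if p >= threshold else BOT) for z, p in dist.items()}

# Green distribution: nearly uniform on {a, b}, with a tiny tail on c.
green = {"a": 0.49, "b": 0.49, "c": 0.02}
h = -sum(p * math.log2(p) for p in green.values())  # Shannon entropy

mapped = separate(green, h)
print(mapped)  # heavy points 'a', 'b' survive; light point 'c' -> BOT
```

The red distribution's intersection with the green support sits precisely on such light points, so after the transformation the red mass lands on the junk symbol, disjoint from the heavy green region, while a distribution close to green stays close.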
So we want a general transformation: given two distributions that are very far apart but whose supports can still intersect a lot, we want to make them disjoint. And simultaneously we need another property: the blue distribution, which is close to the green distribution, should stay close to the green distribution. So essentially you can formulate a transformation task on three distributions with these properties: if two distributions were far apart, the transformation makes them actually disjoint; if two distributions were close, you want them to stay close, okay? This is in the spirit of work on SZK, where various transformations with different properties have been developed; we need this particular one. And the way we achieve it is by using non-uniform advice. We use the work of Goldreich and Vadhan, and based on their transformation we built ours. In order to carry out this transformation, we need to be able to guess the entropy of the universal green distribution; this is the non-uniform advice that we get. So this is possible, and it basically concludes the proof of the theorem, okay? Now just briefly — this is not so important — the main idea is that, given that you know the entropy of the distribution, you can essentially output, for every sample, the weight of this sample. And as you can see here, the red points have small weight while the green and blue points have bigger weight. This allows you to make the far distributions disjoint while still keeping the close ones close to each other. Okay, so now I will conclude. We had the task of comparing the complexity class SRE and the complexity class SZK. Our main result is that we put two new complexity classes between them, the class 1RE and the class NISZK, and we showed an equality between them. And there are still questions that remain to be answered.
The first question is that this equality is almost the equality we want; the only thing missing is to make it uniform. Unfortunately, our reduction uses this non-uniform advice, which is annoying, but it is almost what we want. The second question is a new one, SRE versus 1RE: we want to understand whether this inclusion is proper or not. And the third question, which has been studied before, is NISZK versus SZK: I would also like to understand whether NISZK is a proper subset of SZK or not. Basically, as I said, the main goal is to understand the relation between SRE and SZK, but there are now these auxiliary questions, and it would be nice to have them all answered. And I think this concludes the talk.