Thanks for coming. So this is joint work with Dan, Elette, Niv, and Yuval. The topic of this talk is zero-knowledge proofs. As you know, a zero-knowledge proof is a two-party protocol between a prover and a verifier. Think of the verifier as taking as its input a graph, and the prover is going to try to convince the verifier that this graph is three-colorable. So they'll exchange some messages, and at the end of this protocol, the verifier should be convinced that this graph is indeed three-colorable. For this proof system to be useful, it should satisfy a number of properties. It should be complete, in the sense that an honest prover should convince an honest verifier. It should be sound, in the sense that a cheating prover should rarely fool an honest verifier. And it should be zero knowledge, in the sense that a cheating verifier shouldn't learn anything about the graph, apart from the fact that it's three-colorable.

In this talk, we're going to be looking at a new type of zero-knowledge proof that we call zero-knowledge proofs on distributed data. The setting is different in that there are multiple verifiers, and each verifier takes as its input only a piece of the statement that's being proved. So if we're looking at graphs, each verifier could take as input a piece of the graph, and the prover is going to try to convince these verifiers, who are holding this graph in distributed fashion, that they're actually looking at a three-colorable graph. The prover and the verifiers interact over private channels, and the verifiers can even interact with each other. At the end of this interaction, each verifier should be convinced that "yes, we're holding a graph that's three-colorable," even though it doesn't know what the other part of the graph is. And we can define natural notions of completeness, soundness, and zero knowledge in this setting.
For us, we'll say the protocol is complete if an honest prover convinces an honest pair of verifiers. It's sound if a cheating prover rarely convinces an honest pair of verifiers. And it satisfies what we call strong zero knowledge if a cheating verifier doesn't learn anything about the other verifier's input, apart from the fact that the joint graph is three-colorable. So in essence, verifier V1 only learns its input G1 and the fact that the entire graph is three-colorable; it doesn't learn anything else about the other verifier's input. We can define natural notions of round complexity and public-coin-ness for these protocols, as we do for any multi-party protocol, and we can also look at the setting in which there are more than two verifiers, so any number of verifiers you'd like.

In the cryptographic context, there's one type of zero-knowledge proof on distributed data that comes up all the time, and these are what we call zero-knowledge proofs on secret-shared data. The setting here is one in which the prover has a vector, and the verifiers each have secret shares of the vector according to some, say, linear secret-sharing scheme. So neither verifier actually sees the data in the clear; they only have shares of the data, and yet the prover should be able to convince the verifiers that this vector is in some language. This may seem a little contrived or theoretical, but it turns out that these zero-knowledge proofs on distributed data actually have applications to a bunch of different privacy-preserving systems that have already been built. Systems for PIR writing and private messaging, private computation of aggregate statistics, and private ad targeting all use these ideas implicitly, and many of these systems actually implicitly construct zero-knowledge proofs on distributed data.
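To make the secret-shared setting concrete, here's a minimal sketch of two-party additive secret sharing over a prime field. This is my own toy illustration, not code from the talk; the field modulus and helper names are illustrative choices.

```python
import secrets

P = 2**61 - 1  # an illustrative prime modulus for the field

def share(x):
    """Split x into two additive shares with x = s1 + s2 (mod P)."""
    s1 = secrets.randbelow(P)
    s2 = (x - s1) % P
    return s1, s2

def share_vector(xs):
    """Share each coordinate; each verifier receives one share vector."""
    pairs = [share(x % P) for x in xs]
    return [a for a, _ in pairs], [b for _, b in pairs]

v = [3, 1, 4, 1, 5]
v1, v2 = share_vector(v)
# Each share vector alone is uniformly random, yet the two recombine:
assert [(a + b) % P for a, b in zip(v1, v2)] == v
```

The point is that neither verifier's share, on its own, reveals anything about the vector, yet together the shares determine it, which is exactly the situation the prover must work with.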
The goal of this work was to give unified definitions for this idea, and also new, more efficient constructions of zero-knowledge proofs on distributed data. In particular, for the applications on the slide, we get an exponential improvement in the communication complexity of the proof systems, and I'll explain how that works. The other thing we do is give a new application of zero-knowledge proofs on distributed data to maliciously secure multi-party computation. And just to point out that these things are actually used in practice: one of these systems is implemented in the Firefox browser, and we're hoping to port some of the efficiency improvements that come out of this work into that setting as well.

So the first type of result I want to talk about is the new zero-knowledge proofs on distributed data that we construct in this paper. The first theorem here is a general feasibility result. It says that if you have a language (our languages here are going to be sets of vectors in a vector space) that's recognized by, say, an arithmetic circuit of size C, then there's a public-coin zero-knowledge proof on distributed data for this language that has a constant number of rounds and communication cost that grows linearly with the size of the circuit. This generalizes some of the special-purpose schemes that people have constructed implicitly before, and we also give non-trivial extensions to the setting in which the prover and some of the verifiers collude. So this essentially says that for any language you want, we can construct a zero-knowledge proof on distributed data. And I should say this first result is actually information-theoretic, so no assumptions are required here. The second type of result we have, and here's one example, concerns languages with structure.
For example, if your language is recognized by a low-degree arithmetic circuit, then there's a public-coin zero-knowledge proof on distributed data for this language that requires only logarithmically many rounds of communication and has communication cost that grows logarithmically with the size of the input. This improves on the linear communication cost of the prior schemes that were implicitly constructed. I should mention there are a bunch of ways we generalize this result in the paper. For example, you can look at a smaller number of rounds and get a different trade-off in terms of communication complexity, and you can also look at a wider class of structured languages, like circuits of constant degree or circuits with repeated structure, and so on. I'll point you to the paper for the details of the theorem statements.

Having defined what a zero-knowledge proof on distributed data is, I now want to give you a little intuition about how we construct these objects. We do so using a new type of proof system called fully linear PCPs. Constructing a zero-knowledge proof system on distributed data, as we do, works in two steps. First, we define this notion of a fully linear PCP. Then we show an efficient transformation that takes a fully linear PCP for some language and constructs an efficient zero-knowledge proof on distributed data for that language. Once we have this transformation, all we need to do is construct new, more efficient fully linear PCPs for languages of interest, and that gives us new proofs on distributed data. I think the easiest way to describe what a fully linear PCP is is first to recall what a linear PCP is. This is a notion of proof that's been around for more than ten years and has been used in a bunch of cryptographic constructions. A linear PCP proof is just a vector; that's what the proof looks like.
This vector, the proof, is asserting that some input x is in some language L. And the way the verifier checks the proof is kind of unusual: rather than reading the entire proof and the entire input and then accepting or rejecting, the verifier makes a constant number of what we call linear queries to the proof. So the verifier outputs a query, and this query is a vector that's as long as the entire proof. The verifier gives this query to the proof oracle, and the oracle responds with the inner product of the query and the proof. So the answer is constant size: a single field element. After making a constant number of these queries, the linear PCP verifier should accept or reject. So even though the verifier can't explicitly read the proof, it has implicit access through these oracle queries. And this type of proof system should satisfy the natural notions of completeness, soundness, and zero knowledge.

Okay, so that's what a linear PCP is. A fully linear PCP, which is the new idea in this work, is a new abstraction that's very, very similar, except now the verifier doesn't have explicit access to the input either. A fully linear PCP verifier still gets to make linear queries, but its linear queries are to the concatenation of the input and the proof. After making a constant number of these linear queries, the verifier is supposed to decide whether this x, to which it has only implicit access through the oracle, is in the language or not. And again, we can define completeness, soundness, and zero knowledge in this setting. So if you believe that fully linear PCPs exist for your language, then I'm going to explain how you use them to construct an efficient zero-knowledge proof on distributed data for that language.
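To make the query model concrete, here's a toy sketch (my own illustration; the particular vectors are made up): a linear PCP query is an inner product with the proof alone, while a fully linear query is an inner product with the concatenation of input and proof.

```python
P = 2**61 - 1  # an illustrative prime field modulus

def inner(q, v):
    """The oracle's answer to one linear query: <q, v> over the field."""
    return sum(a * b for a, b in zip(q, v)) % P

x  = [5, 7]    # the input, seen only through the oracle
pi = [2, 0, 9] # the proof vector

# Linear PCP: the query vector is as long as the proof alone.
ans_linear = inner([1, 0, 4], pi)           # 1*2 + 0*0 + 4*9 = 38

# Fully linear PCP: the query ranges over the concatenation x || pi,
# so the verifier never reads x in the clear either.
ans_fully = inner([1, 3, 0, 4, 1], x + pi)  # 5 + 21 + 0 + 0 + 9 = 35
```

Each answer is a single field element, which is why a constant number of queries costs only a constant number of elements of communication.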
The way this works: again, the setting is that we have multiple verifiers, and each verifier has a piece of the input x. The prover, who has all of x, is going to try to convince the verifiers that this x is in some language. What the prover does is produce a fully linear PCP proof that attests to the fact that x is in the language. It then splits this proof, using a linear secret-sharing scheme, into two pieces, and sends one piece to each of the two verifiers. At this point, the verifiers have a piece of the input and a piece of the proof. What they want to do to check the proof is essentially simulate the process of asking oracle queries to the fully linear PCP proof oracle. So the verifiers use common randomness between themselves to sample a series of these fully linear PCP queries. Then, using the fact that they can compute an inner product on secret-shared data, the verifiers can publish shares of the answers that the fully linear PCP proof oracle would have given. Each verifier essentially takes the inner product of the query with its piece of the input and its share of the proof; both verifiers do this, and by publishing these values they can recover the answer to the query that the oracle would have given. By simulating the process of asking the oracle queries in this way, after a constant number of elements of communication, the verifiers can recover the answers the oracle would have given, and then they can just run the fully linear PCP verifier on these answers to check whether the proof is valid. And the communication required between all the parties here was essentially twice the size of the proof, plus a constant number of elements for computing these oracle-query answers.
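The fact that makes this simulation work is linearity: each verifier's locally computed inner product is a share of the true oracle answer, so publishing the two local answers reconstructs it. A toy sketch of that identity (my own illustration, with made-up vectors):

```python
import secrets

P = 2**61 - 1  # illustrative prime field modulus

def inner(q, v):
    return sum(a * b for a, b in zip(q, v)) % P

def share_vec(v):
    """Additively secret-share a vector: v = s1 + s2 (coordinate-wise mod P)."""
    s1 = [secrets.randbelow(P) for _ in v]
    s2 = [(a - b) % P for a, b in zip(v, s1)]
    return s1, s2

x = [5, 7]
x1, x2 = [5, 0], [0, 7]   # verifiers' pieces of the input (zero-padded halves)
pi = [2, 0, 9]            # the fully linear PCP proof
pi1, pi2 = share_vec(pi)  # prover secret-shares the proof

q = [1, 3, 0, 4, 1]       # query sampled from the verifiers' common randomness
a1 = inner(q, x1 + pi1)   # verifier 1 publishes one field element
a2 = inner(q, x2 + pi2)   # verifier 2 publishes one field element

# By linearity, the two published shares recombine to the oracle's answer.
assert (a1 + a2) % P == inner(q, x + pi)
```

So each simulated oracle query costs each verifier exactly one published field element, matching the constant-communication claim above.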
So I hope I've convinced you that if you can construct such a thing, a fully linear PCP for your language, then you can use the transformation I just showed to get an efficient zero-knowledge proof on distributed data for that language. Now the question is: how do you construct these fully linear PCPs? Well, it turns out that many existing constructions of linear PCPs already satisfy this notion of full linearity, where the verifier makes only linear access to the input. So the linear PCPs that people use for cryptographic constructions already satisfy this property, but they have a limitation, which is that the size of the proof grows linearly with the size of the circuit. And in many applications we'd like sublinear-size proofs. So our idea here is to get shorter proofs using interaction between the prover and the verifiers, and this set of techniques works when your language is simple in a way I'll describe. This notion of using interaction to shrink the proof size is used in a bunch of different places in related ways: communication-complexity protocols, interactive oracle proofs, and sumcheck-like proof systems all use very similar ideas to the ones I'll describe.

I don't have time to go into the full details of how this works, but I want to give you a flavor of how we get sublinear-size proofs using interaction, taking just the example of a degree-two circuit. So your language is recognized by a degree-two circuit, and the prover is trying to convince the verifiers, who again hold pieces of the input in distributed fashion, that this degree-two circuit accepts their input. We show that the prover can send each verifier a short proof, a constant-size proof, and this proof has the property that checking it only requires the verifiers to apply a random linear mapping to their piece of the input and the proof, and then evaluate a degree-two circuit on the result.
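To give a feel for how interaction shrinks the statement, here's a caricature of the folding pattern (my own toy sketch, not the paper's actual degree-two protocol): each round, a public random challenge collapses the working vector to half its length, so a constant-size claim is reached after logarithmically many rounds.

```python
import secrets

P = 2**61 - 1  # illustrative prime field modulus

def fold(v, r):
    """Collapse v to half length: v'[i] = left[i] + r * right[i] (mod P)."""
    half = len(v) // 2
    return [(v[i] + r * v[half + i]) % P for i in range(half)]

v = list(range(16))  # stand-in for the statement being checked
rounds = 0
while len(v) > 1:
    r = secrets.randbelow(P)  # the verifiers' shared public-coin challenge
    v = fold(v, r)
    rounds += 1

assert rounds == 4  # log2(16) rounds reach a length-1 claim
```

In the real protocol, each fold is accompanied by a short proof from the prover that the collapsed claim is consistent with the original one, which is where the recursion described next comes in.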
So you'll have to believe me that this is true; actually proving it requires a bit of technical work. But essentially, the verifiers can apply this randomized linear map to their piece of the input and their piece of the proof, and then they just need to check that this new, squished input satisfies a new degree-two circuit. Rather than evaluating the circuit themselves, they can outsource the work of evaluating the circuit to the prover and recursively invoke our proof system. So the verifiers send the coins they used to sample this random mapping to the prover, and then the prover convinces the verifiers that they would have accepted had they actually computed this circuit on their inputs. In this way, when your language has structure like this, you can use this interactive trick to reduce the size of the proofs. After logarithmically many rounds, the verifiers accept or reject.

Okay, the last thing I want to do is talk about an application of zero-knowledge proofs on distributed data to multi-party computation, in particular three-party computation. What we show is that for any arithmetic circuit over a field F, there's a secure three-party protocol for computing this circuit that tolerates one malicious party, is computationally secure with abort, makes only black-box use of a PRG, and has amortized communication cost of one field element per party per gate of the circuit. This is interesting for two reasons. One is that it gives constant-factor improvements over the communication complexity of prior MPC protocols that satisfy these properties. But more interesting is that with this work we essentially match the cost of the best three-party computation protocols that have only semi-honest security. So you get malicious security for the cost of semi-honest security, essentially, using this transformation.
I want to give you some sense of how we actually use these zero-knowledge proofs on distributed data to get maliciously secure MPC. Our starting point is a semi-honest MPC protocol, which I'll call Phi, that has two extra properties. First, the protocol should reveal nothing, in a kind of perfect sense, until the players send their last messages. This perfect security property should hold even if parties deviate from the protocol while executing it, and furthermore, if parties misbehave at the very last message, the worst that can happen should be an abort. Second, it should be a degree-two protocol, in the sense that every one of each player's messages is a degree-two function of its inputs, its randomness, and the messages it has received so far. These may seem like onerous restrictions, but it turns out that a number of the MPC protocols already in the literature satisfy these properties. I've mentioned two here, but I'm sure there are more.

So given a semi-honest MPC protocol that satisfies these properties, I'm going to explain how we use zero-knowledge proofs on distributed data to lift it into a maliciously secure one. The way this works is that the players first run this semi-honest MPC amongst the three players, but before they publish their last messages, they halt. So they run the protocol all the way to the very end, but stop before publishing their last messages. At this point, the players each prove to each other that the messages they sent so far complied with the protocol. In particular, player one proves to players two and three that "the messages I sent you followed the protocol Phi; I did the right thing." Now notice, this is a zero-knowledge proof on distributed data.
This is exactly the tool that's called for, because the messages player one sent during this execution were received by players two and three, and player two doesn't know what player three received, and player three doesn't know what player two received. So these messages are held in distributed fashion. And because this protocol has the degree-two property I mentioned, this language is actually recognized by a degree-two circuit. So player one completes this proof, players two and three check the proof, then player two proves the same thing to players one and three, and player three proves the same thing to players one and two. We do three of these proofs, and the communication complexity of this step, measured in field elements, is logarithmic in the size of the circuit. The reason this costs only a logarithmic amount is precisely these new zero-knowledge proofs on distributed data for degree-two relations that I mentioned. Finally, if all the proofs check out, the players reveal their last messages from the underlying semi-honest MPC, and they get the output.

To summarize the cost: there are C + o(C) field elements of communication from the underlying MPC protocol, and the step of proving things to each other requires only on the order of log C field elements. So on average, per party per gate, we're looking at 1 + o(1) field elements. In the paper we go through a number of generalizations, to, say, a constant number of parties with any honest majority, or to arbitrary rings; so if you want to work over 32-bit or 64-bit integers, similar ideas apply. For those of you who know a lot about MPC, you know about the GMW compiler, which is another way to go from a semi-honest protocol to a maliciously secure one. I wanted to mention a few of the differences between these two approaches. The first is that GMW uses message-by-message zero-knowledge proofs.
With each step of the protocol, every party proves to the others that the next message it's sending is actually well formed. What we do instead is defer the proofs to the very end of the protocol and use one big sublinear-size proof for one big statement at the end. The second difference is that GMW requires commitments, and therefore assumptions, whereas the compiler I just described is information-theoretically secure, so you don't need any assumptions there. And finally, GMW requires that all players see all the messages, because to verify a proof, I kind of need to know what you sent to everybody else. So that compiler makes the most sense in a setting where players communicate over a broadcast channel, whereas with these distributed zero-knowledge proofs, our compiler makes sense even when players have only point-to-point channels, and that's because we're able to prove things about data that's held in distributed fashion.

So this talk has been about these new zero-knowledge proofs on distributed data. The setting, again, is one prover and many verifiers, with each verifier holding a different piece of the input, and the goal of the protocol is really to hide each verifier's input from the others. The proof systems we get out of this approach are information-theoretic and lightweight, and we construct them using this new tool called fully linear proof systems. The applications I mentioned are to MPC and also to other types of privacy-preserving systems, and we're hoping to apply these ideas to many other systems in the future. I think there's still a bunch of work to do in comparing these proof systems, building them from other types of ideas in the cryptographic literature, and trying to understand the connections to other models of distributed proofs. So with that, I'm happy to take your questions. Thanks.

Questions? Yeah: so can you extend this to the case where the prover is also trying to hide things from the verifiers?
Like, can you extend this to the case where there's also a witness? Yes, yes, absolutely. Okay, but then you need some cryptographic assumptions or something, presumably? If you want sublinear proof size, then you do. But otherwise, essentially what the prover can do is send the verifiers shares of the witness, and then you're looking at essentially the same situation we're looking at here. I see, I see. Oh, I see. Thanks, yep.

Thanks for the presentation. I was wondering if you can comment on whether there's some kind of natural barrier if you're trying to apply this result to achieve dishonest-majority MPC? I think you would require a different set of techniques, yeah, because these information-theoretic proof systems aren't going to apply in that setting. But it's an interesting question what would work and what you could do along these lines. Okay, let's thank Henry again. Great. Thank you.