In the literature, the scenario that has been considered is one of security. To set up the model a little more formally: in this scenario there is a trusted owner who generates the data along with some authentication information, denoted by the little purple box, outsources both to the server, and then goes offline. She also publishes a short, succinct digest of the data. Later, third-party clients interact with the server to ask queries on the data and get responses. In this setting the security concern is the integrity of the data, so the server also has to return a proof that the answer it gives is correct with respect to the data the owner stored on it. The client can then verify the proof against the short digest published by the owner and be sure about the correctness of the result. Clearly, in this setting the adversary is the server, which is not trusted to maintain the integrity of the data, and that is why the proof has to be verified. In the dynamic scenario, the owner comes online again only when she wants to update the data, in which case she sends the update and the updated authentication information to the server, which applies it at its own end; she also publishes the updated digest, and the rest stays the same. As we can see, this setting does not deal with the privacy of the proofs: it says nothing about whether the proofs leak any information. You might say that, even if it is not formally considered, perhaps traditional constructions already handle this scenario, but unfortunately that is not the case. For example, think of the Merkle hash tree, which is the most widely used authenticated data structure.
In this case the proof of membership for an element x in the set is the sibling path, the orange nodes, along with the root, and this proof reveals the rank of the element in the database. It also reveals the number of elements, just from the height of the tree, and so on. Should we care about this kind of leakage? The answer is yes, because serious, practical attacks have been demonstrated in this scenario. The attack is the zone enumeration attack on DNS queries. In this setting, the administrative (primary) name server, which holds a table of DNS name to IP address mappings, outsources this table to a secondary resolver, which can then respond to clients' DNS queries such as "what is the IP address for this domain name?". Since the primary is the authoritative name server and the secondary is not, for integrity purposes the primary signs every domain name present in the database; but along with that, the secondary also has to prove non-membership on the fly. To facilitate this, the primary resolver hashes every domain name, lexicographically sorts the hashes, and then signs every adjacent pair in the sorted list. When a client query comes in, if the element is in the database the secondary resolver simply returns its signature and is done. If it is not in the database (for example, q.com is not in this toy database), the secondary resolver says so and, as a proof of non-membership, returns the adjacent pair of signed hashes whose range the hash of q.com falls into. And this leaks a lot more information.
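To make the Merkle-tree leakage concrete, here is a minimal Python sketch (function names like `build_tree` and `prove` are my own, not from the talk; it assumes a power-of-two number of leaves): the membership proof is the sibling path, and its direction bits reconstruct the element's rank while its length reveals the set size.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_tree(leaves):
    # levels[0] = hashed leaves, levels[-1] = [root]; assumes len(leaves) is a power of two
    levels = [[h(x) for x in leaves]]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        levels.append([h(cur[i] + cur[i + 1]) for i in range(0, len(cur), 2)])
    return levels

def prove(levels, idx):
    # Proof = list of (sibling_hash, sibling_is_right) pairs, leaf to root.
    proof = []
    for level in levels[:-1]:
        sib = idx ^ 1
        proof.append((level[sib], sib > idx))
        idx //= 2
    return proof

def verify(root, leaf, proof):
    acc = h(leaf)
    for sib, sib_is_right in proof:
        acc = h(acc + sib) if sib_is_right else h(sib + acc)
    return acc == root

leaves = [b"a", b"b", b"c", b"d", b"e", b"f", b"g", b"x"]
levels = build_tree(leaves)
root = levels[-1][0]
proof = prove(levels, 7)                 # membership proof for b"x"
assert verify(root, b"x", proof)
# Leakage: the direction bits reconstruct the leaf's rank, and the
# proof length reveals log2(n), i.e. the size of the set.
rank = sum((0 if sib_is_right else 1) << i
           for i, (_, sib_is_right) in enumerate(proof))
assert rank == 7 and len(proof) == 3
```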
So a client can simply ask for domain names that are not in the database and, from the proofs of non-membership, collect these hashes, on which it can mount an offline brute-force dictionary attack. This was demonstrated by the NSEC3 walker. We see that the problem comes from the fact that the proofs leak more information than they should, which in this case is just non-membership. If you think about it, the problem we are considering is really the outsourced dictionary setting: a database has been outsourced, and authenticated proofs of membership and non-membership have to be given with respect to it. There is a very well studied cryptographic primitive that deals with exactly this problem: the cryptographic accumulator. Its study was initiated in 1993, and there have been many flavors and variants since then. To summarize, in the most general setting a cryptographic accumulator summarizes a set X with a very short, succinct digest such that, later, proofs of membership and non-membership in the accumulated set can be generated efficiently and publicly, and the proofs are also publicly verifiable. If this setting already handled the problem of proof leakage, in other words if the proofs leaked no information beyond membership and non-membership, we could easily deploy it to solve the zone enumeration problem. Unfortunately, traditionally in the literature accumulators are not concerned with this privacy property. The security property that has been considered is soundness, which means that forging a proof for an element is infeasible: if an element is present, the server will not be able to prove that it is absent, and vice versa.
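The NSEC-style mechanism and the resulting leak can be sketched as follows (hypothetical names and a toy three-name zone; signatures are elided since only the leaked hashes matter for the attack): the non-membership proof hands the client two real zone hashes, which an offline dictionary attack can then invert.

```python
import hashlib

def hname(name: str) -> str:
    return hashlib.sha256(name.encode()).hexdigest()

# Primary resolver: hash every zone name and sort the hashes; in the real
# protocol every adjacent pair in this sorted list is signed.
zone = ["a.com", "m.com", "z.com"]
sorted_hashes = sorted(hname(n) for n in zone)

def prove_absent(query: str):
    # Non-membership proof: the adjacent pair of zone hashes bracketing h(query).
    hq = hname(query)
    assert hq not in sorted_hashes
    for lo, hi in zip(sorted_hashes, sorted_hashes[1:]):
        if lo < hq < hi:
            return lo, hi
    return sorted_hashes[-1], sorted_hashes[0]   # wrap-around interval

# Adversarial client: one non-membership query leaks two real zone hashes,
# which an offline dictionary attack can invert.
lo, hi = prove_absent("q.com")
dictionary = ["a.com", "b.com", "m.com", "z.com"]
recovered = [w for w in dictionary if hname(w) in (lo, hi)]
assert len(recovered) == 2                       # two real zone names recovered
```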
This is where we come in. The main contributions of our work are, first, formally modeling what a zero-knowledge universal dynamic accumulator is, meaning adding a privacy property to accumulators so that proofs of membership and non-membership leak no information beyond the answer itself. Second, we give an efficient construction of a zero-knowledge accumulator, which we will see in this talk. Finally, we extend this to give efficient constructions for the entire set algebra, though I will only be focusing on union for the purposes of this talk. A quick recap: the model we consider is the same outsourced data structure model, but traditionally only the server was considered adversarial. The answers were not trusted because they were generated by the server, so a proof had to be returned with each answer. In our setting we additionally consider leakage from the proofs, which means we also allow the client to be adversarial and to try to learn more information than the answer itself. So we want the proofs to be simulatable given only the answer and nothing else. Now I am going to model the security properties directly. The first property is soundness, and it is as you would expect; it concerns the adversarial server. In the initial setup phase the adversarial server receives the public key from the challenger; then it comes up with a set of its choice after looking at the public key, and in response it gets back the digests, that is, the public digest that the owner publishes, the purple box and the orange box. Later it can adaptively ask update queries, and it gets the corresponding updated digests in response.
Finally it comes up with an index j, a query, an answer, and a proof, and we say it has forged if the answer is incorrect with respect to the query on the set X_j but the proof is accepted; this should happen with negligible probability. For example, if x is not an element of X_j but the adversary was able to produce an accepted proof that it is, that is a forgery. So this is the soundness definition, as you would expect. We then formally define zero knowledge using a simulation-based definition. This models the adversarial behavior of a client who might try to learn more than the answer itself from the proof. In the setup, the adversary receives the public key from the owner as before. Then it comes up with a set X_0 of its choice, which is sent to the real challenger; the simulator gets no information about this set whatsoever, only a notification that the adversary has committed to a set. The adversary then gets back the digest, which the simulator has to produce without knowing the set. Next, the adversary can ask two kinds of queries: ordinary membership and non-membership queries, and update queries. For a membership or non-membership query, in the real world the query goes to the challenger, who produces the answer and the proof; in the ideal world the query goes to the simulator, which is given oracle access to the set, meaning it learns only a single bit, the answer of the query with respect to the set, and must simulate the proof without any other information about the set whatsoever. For an update query, the simulator receives only a notification that the adversary has updated, without learning what the update is at all.
This captures the fact that an update cannot leak anything beyond the fact that an update has occurred; the adversary should not be able to learn anything else about it. The simulator then produces the simulated updated digest. Finally, the adversary has to guess whether it is talking to the real challenger or to the simulator, and it wins the game if it guesses correctly; we require this to happen with only negligible advantage. That brings us to the end of the definitions. Now we are going to go into the construction of zero-knowledge accumulators, so if you have any questions on the model itself, this is a good time to ask. Okay. Just to recap: the data is a set of elements x_1 to x_n, and the query is "is this element in the set or not?"; the answer is yes or no, along with a proof. We instantiate this using bilinear accumulators. Before going into the construction, let us fix some notation. We represent a set by its characteristic polynomial Ch(z), the product of the terms (z + x_i), and when it is evaluated at a point s we simply write Ch(s). We work in the bilinear map setting, where G and G_1 are multiplicative cyclic groups of prime order. For the purposes of this talk we write it as a symmetric setting, although for efficiency it can of course be implemented in the asymmetric setting. Here e is an efficiently computable, non-degenerate bilinear map, which, as you all know, satisfies e(g^a, g^b) = e(g, g)^{ab}. We instantiate our construction in this setting. KeyGen and Setup are run by the owner. Note that in the little red box I denote the asymptotic complexity of each step, just for ease of exposition.
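As a sanity check on the notation, here is a small Python sketch (illustrative values, toy prime) showing that the coefficient representation of Ch(z), the product of the terms (z + x_i), evaluates at s to the same value as the direct product; working with the coefficients is what later lets the server use the public powers g^{s^i}.

```python
# Toy check of the characteristic-polynomial notation, modulo a prime p.
p = 2**61 - 1   # a Mersenne prime; any prime modulus works here

def poly_mul(a, b):
    # Multiply two polynomials, coefficients low-degree-first, mod p.
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % p
    return out

def char_poly(X):
    # Coefficients of Ch(z) = prod_{x in X} (z + x).
    coeffs = [1]
    for x in X:
        coeffs = poly_mul(coeffs, [x % p, 1])
    return coeffs

def eval_poly(coeffs, z):
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * z + c) % p
    return acc

s = 123456789            # stand-in for the secret point s
X = [5, 11, 42]
direct = 1
for x in X:
    direct = direct * (s + x) % p
# Coefficient form and direct product agree at s.
assert eval_poly(char_poly(X), s) == direct
```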
I will give a final summary slide, so you don't have to pay close attention to these. In KeyGen the owner takes a security parameter as input, generates the public parameters pub for the bilinear map, and chooses a secret s from Z_p^*, which is its secret key; the public key is g^s together with pub. To set up the accumulator it takes the set X_0 as input and chooses a blinding factor for the set, which we denote r. The accumulation value sigma_0 of the set is just g raised to r times the characteristic polynomial of the set evaluated at s. The authentication information, which it outsources along with the database to the server, is the string g^s, g^{s^2}, up to g^{s^n}, together with the randomness r; and for the purposes of updates it maintains internal state, where it simply remembers the entire set. A query operation is run by the server. Recall that the proof has to be produced by the server, which does not know the secret key but knows the string g^s up to g^{s^n}. If the queried element x is in the set, the answer is 1 and the proof is just sigma_j raised to the power 1/(s + x). Note that even though the server does not know s, because it knows the string g^s up to g^{s^n} it can evaluate this; this part is rather straightforward. Non-membership is the trickier part. Recall that if an element x is not in the set, then by definition the polynomial (z + x) and the characteristic polynomial of the set must be coprime, and this is what we exploit to prove non-membership. The server first runs the extended Euclidean algorithm to compute polynomials q_1(z) and q_2(z) such that q_1(z) Ch(z) + q_2(z) (z + x) = 1.
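The membership proof can be checked "in the exponent" with a toy simulation: we write a for the group element g^a and model the pairing e(g^a, g^b) = e(g, g)^{ab} as the product ab mod p. Parameter values are illustrative; this checks the algebra of the scheme only and has no cryptographic security.

```python
# Toy exponent simulation of the bilinear membership proof, mod prime p.
p = 2**61 - 1
s, r = 123456789, 987654321    # trapdoor s and set blinding factor r
X = [5, 11, 42]

def ch(Y):
    # Ch(s) = prod_{x in Y} (s + x) mod p
    acc = 1
    for x in Y:
        acc = acc * (s + x) % p
    return acc

sigma = r * ch(X) % p          # exponent of the accumulation value g^{r*Ch(s)}

# Membership witness for x in X: sigma^{1/(s+x)}. The real server computes
# this from the public powers g^{s^i} without knowing s, since Ch(z)/(z+x)
# is a polynomial exactly when x is in the set.
x = 11
witness = sigma * pow(s + x, -1, p) % p

# Verification: e(witness, g^s * g^x) == e(sigma, g)
assert witness * (s + x) % p == sigma
```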
The server then picks a blinding factor gamma and blinds each component, setting q_1'(z) = q_1(z) + gamma (z + x) and q_2'(z) = q_2(z) - gamma Ch(z); this is chosen precisely so that if you replace q_1 with q_1' and q_2 with q_2' in the extended Euclidean equation, it still evaluates to one. The server then computes the two components of the proof: W_1 = g^{q_1'(s) r^{-1}}, where r is the blinding factor used for the set itself, which the server knows because it received it from the owner, and W_2 = g^{q_2'(s)}; it sends these as the proof. Verification is very efficient: the client just checks that the corresponding equations hold using sigma_j and the public key it received from the owner. For membership it is just an evaluation of two bilinear maps, and for non-membership it is essentially verifying the extended Euclidean identity in the exponent, so it is just three bilinear maps. Update is even more efficient. Recall that the definition of zero knowledge requires proofs to be ephemeral: if an element was in the database before an update and a client received a proof of that, then after the update the client should not be able to tell from the old proof whether any update has occurred to that element, for instance whether it has been deleted. So all proofs are ephemeral, and to achieve this, fresh blinding has to be applied to the digest on every update.
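The non-membership proof can be sanity-checked in the same toy exponent simulation (illustrative parameters, no cryptographic security; the exact blinding on the slide is not in the transcript, so the sketch uses the natural choice that preserves the Bézout identity). It derives the Bézout pair from the division Ch(z) = q(z)(z + x) + c, with c = Ch(-x) nonzero since x is not in the set, blinds it with gamma, and checks the verification equation e(W_1, sigma) e(W_2, g^{s+x}) = e(g, g).

```python
# Toy exponent check of the non-membership proof, mod prime p.
p = 2**61 - 1
s, r, gamma = 123456789, 987654321, 13   # trapdoor, set blinding, proof blinding
X = [5, 11, 42]
x = 7                                     # queried element, not in X

def char_coeffs(Y):
    # Coefficients of Ch(z) = prod (z + xi), low degree first, mod p.
    coeffs = [1]
    for xi in Y:
        out = [0] * (len(coeffs) + 1)
        for i, c in enumerate(coeffs):
            out[i] = (out[i] + c * xi) % p
            out[i + 1] = (out[i + 1] + c) % p
        coeffs = out
    return coeffs

def peval(coeffs, z):
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * z + c) % p
    return acc

chX = char_coeffs(X)
sigma = r * peval(chX, s) % p

# Synthetic division: Ch(z) = q(z)*(z + x) + rem, rem = Ch(-x) != 0,
# giving the Bezout pair Q1 = 1/rem (a constant), Q2(z) = -q(z)/rem.
q = [0] * (len(chX) - 1)
rem = chX[-1]
for i in range(len(chX) - 2, -1, -1):
    q[i] = rem
    rem = (chX[i] - rem * x) % p
c_inv = pow(rem, -1, p)
Q1 = c_inv
Q2 = [(-qi * c_inv) % p for qi in q]

# Blinding: Q1' = Q1 + gamma*(z + x), Q2' = Q2 - gamma*Ch(z);
# the identity Q1'*Ch + Q2'*(z + x) = 1 still holds.
Q1p = [(Q1 + gamma * x) % p, gamma]
Q2p = [(a - gamma * b) % p for a, b in zip(Q2 + [0] * len(chX), chX)]

W1 = peval(Q1p, s) * pow(r, -1, p) % p   # exponent of W1 = g^{Q1'(s)/r}
W2 = peval(Q2p, s) % p                   # exponent of W2 = g^{Q2'(s)}

# Verification: e(W1, sigma) * e(W2, g^{s+x}) == e(g, g)
assert (W1 * sigma + W2 * (s + x)) % p == 1
```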
To insert an element (recall that only the owner can update), the owner chooses fresh randomness r' and, if an element x is to be inserted, computes the new digest as the old digest raised to r'(s + x); if x is to be deleted, as the old digest raised to r'/(s + x). It keeps the updated set in its state, and for the server to apply the update it sends the updated record, i.e., the element that has been inserted or deleted, together with the fresh randomness; everything here is constant time. That is the update, and that completes the zero-knowledge accumulator construction. If you compare it with Nguyen's bilinear accumulator construction, which has no privacy, we see that thanks to this randomization technique privacy essentially comes for free, except for one operation, witness update, which is considerably more expensive than in Nguyen's construction; this is the price we pay for privacy, but all the other operations are comparable or exactly the same. That brings me to the end of the first part of the talk, on zero-knowledge accumulators; now I am going to talk about union, one of the set-algebra operations, and show how we use the zero-knowledge accumulator technique to implement set union. For set union, first recall what the query is: you have a collection of sets X_1 to X_m, and the client's query is to return the union of the sets at certain indices. It is easier to understand with a running example, so suppose the client asks for the union of sets two, five, and nine. The server responds with the answer, which is the union of these three sets, along with a proof that it is correct.
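The constant-time update can be checked with the same toy exponent simulation (illustrative values): insertion multiplies the digest exponent by r'(s + x), deletion by r''/(s + x), and each update introduces a fresh blinding factor that re-randomizes the digest.

```python
# Toy exponent check of the owner's constant-time update, mod prime p.
p = 2**61 - 1
s, r = 123456789, 111
X = [5, 11]

def ch(Y):
    acc = 1
    for y in Y:
        acc = acc * (s + y) % p
    return acc

sigma = r * ch(X) % p                        # current digest g^{r*Ch(s)}

# Insert x = 42: new digest = old digest ^ { r1 * (s + x) }
x, r1 = 42, 222
sigma_ins = sigma * r1 % p * (s + x) % p
assert sigma_ins == r * r1 % p * ch([5, 11, 42]) % p

# Delete x again: new digest = old digest ^ { r2 / (s + x) }
r2 = 333
sigma_del = sigma_ins * r2 % p * pow(s + x, -1, p) % p
assert sigma_del == r * r1 * r2 % p * ch([5, 11]) % p
```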
For example, say X_2 = {a, b, d}, X_5 = {d, f}, and X_9 = {a, c}; then the answer, the union, should be {a, b, c, d, f}, and the server has to return it along with a proof. So how do we prove that the returned union is correct? There are two conditions. The first is the superset condition: every queried set must be a subset of the returned answer. I am not going to go into the details, but given the zero-knowledge accumulation of each set and the answer itself, proving that an entire set is a subset of the answer is done by the server using a generalization of the membership technique we just saw. The more interesting case for us is the second condition, the membership condition: the server has to prove that every element of the answer actually came from one of the sets X_2, X_5, or X_9. Proving this is trickier, because even the zero-knowledge membership proof technique is problematic here: it would reveal which set each element came from, and we have to hide that too for perfect simulatability. So here we devised a new technique: if we can instead prove that the answer is a subset of the multiset union of the sets, without giving out the multiset union itself, then we have also proved membership. What is the multiset union? It is the union in which every element preserves its multiplicity. In our running example, the plain union was just {a, b, c, d, f}, but the multiset union has two a's and two d's, since a appears in X_2 and X_9 and d appears in X_2 and X_5; it preserves the multiplicities.
Now the server first has to prove that it has computed sigma of the multiset union, the accumulation of the multiset union, correctly, and then prove that the answer is a subset of it. The accumulation of the multiset union, sigma_ũ, is g raised to the product of the randomness used for blinding each individual set times the product of their characteristic polynomials; expanded, it looks like this. To compute and prove steps one and two, the server does the following. Recall that the client has sigma_2, sigma_5, and sigma_9 from the owner itself; these are the blinded zero-knowledge accumulations. The server progressively builds a tree in which each internal node is the accumulation of the multiset union of its two children. The server can compute each internal node, sigma_{2,5} and then sigma_ũ, using the big string g^s up to g^{s^n} that it holds; the root is sigma_ũ, the accumulation of the multiset union. How does the client verify it? The client receives all the internal nodes from the server, sigma_{2,5} and sigma_ũ in this case, and verifies them bottom-up using the bilinear map, building a root of trust from the ground up out of the sigma_2, sigma_5, and sigma_9 it has from the owner. First it checks that sigma_{2,5} is consistent with sigma_2, sigma_5, and g, then it verifies one level up, and finally it verifies that sigma_ũ is correct. Once this verification is done, the only thing left to show is that the answer is a subset of this multiset union, which is again a generalization of the subset and membership techniques for accumulators.
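Both union verification steps can be sanity-checked in the toy exponent simulation used earlier (set elements and blinding factors are illustrative stand-ins for the running example): the client checks each internal tree node with one pairing equation, and then, for the final step described next, checks the subset witness against the multiset-union accumulation.

```python
# Toy exponent check of the multiset-union tree and subset witness, mod prime p.
p = 2**61 - 1
s = 123456789
sets = {2: [1, 2, 4], 5: [4, 6], 9: [1, 3]}   # stand-ins for {a,b,d}, {d,f}, {a,c}
rnd = {2: 11, 5: 13, 9: 17}                    # per-set blinding factors r_i

def acc(Y, r):
    # Exponent of g^{r * Ch_Y(s)}
    e = r % p
    for y in Y:
        e = e * (s + y) % p
    return e

sig = {i: acc(sets[i], rnd[i]) for i in sets}

# Server: internal tree nodes. (In the real scheme these are computed from
# the public powers g^{s^i}, not by multiplying exponents as done here.)
sig_25 = sig[2] * sig[5] % p
sig_u = sig_25 * sig[9] % p                    # root: multiset-union accumulation

# Client: one pairing equation per internal node, bottom-up.
assert sig[2] * sig[5] % p == sig_25 * 1 % p   # e(sig_2, sig_5) == e(sig_25, g)
assert sig_25 * sig[9] % p == sig_u * 1 % p    # e(sig_25, sig_9) == e(sig_u, g)

# Subset witness: the answer {1,2,3,4,6} is the plain union; the multiset
# union repeats 1 and 4, so the quotient polynomial is (z + 1)(z + 4).
answer = [1, 2, 3, 4, 6]
ch_ans = acc(answer, 1)
witness = rnd[2] * rnd[5] % p * rnd[9] % p * (s + 1) % p * (s + 4) % p
assert witness * ch_ans % p == sig_u           # e(W, g^{Ch_ans(s)}) == e(sig_u, g)
```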
The server computes the witness, which is g raised to r_2 r_5 r_9 times the characteristic polynomial of the multiset union divided by the characteristic polynomial of the answer, evaluated at s; expanded in our running example it looks like this. The server can compute this, and verification is just checking whether the bilinear map of the witness and g^{Ch_answer(s)} equals the bilinear map of sigma_ũ and g; by this point the client is already convinced that sigma_ũ was computed correctly. That is the crux of how set union is done, and that brings me to the end of the talk. There is more in the paper: specifically, we rigorously discuss how the zero-knowledge accumulator definition and construction relate to existing primitives such as zero-knowledge sets, primary-secondary-resolver systems, and trapdoorless accumulators. We also give a formal proof that this notion is stronger than the indistinguishability-based privacy notion recently considered in two other papers. And finally, we give the first efficient constructions for verifiable set algebra (subset, intersection, union, and difference) with almost no additional cost over the existing non-private solution proposed at CRYPTO 2011. Thank you for your attention. [Session chair] Thank you. We have time for a very quick question; let me ask one. What assumption do you prove this under? [Speaker] The n-bilinear Diffie-Hellman inversion assumption. [Session chair] Okay, so it's related to strong Diffie-Hellman. [Speaker] Yes. [Session chair] Let's thank the speaker again.