Hi, everyone. Welcome to the third session of CRYPTO. The session is about outsourcing and delegating computation. The first talk is "Optimal Verification of Operations on Dynamic Sets" by Babis Papamanthou, Roberto Tamassia, and Nikos Triandopoulos, and Babis will give the talk.

Is it on? Okay. Hello, everybody, and welcome to my talk. Today I'm going to talk about work I did as part of my PhD thesis, optimal verification of operations on dynamic sets. This is joint work with my advisor at Brown University, Roberto Tamassia, and my collaborator at RSA Labs, Nikos Triandopoulos.

We are all familiar with the proposition of the cloud, and since we started using the cloud extensively in our lives, the security community has been working on several security problems. The two most important ones are, first, data privacy: when we put our data online and encrypt it, we would still like the server that stores our data to be able to use it in a meaningful way. To deal with this problem, several solutions have been proposed for computation on encrypted data. So this is the first security problem that comes with cloud computing. The second aspect of cloud security is data and computation integrity. When we put our data online, the server may be malicious and may want to tamper with our data, and we would like some guarantee that the answers we get back from the server are as if they had been computed locally, as if the data had not been tampered with or accessed by anyone else. So we want correctness guarantees on computation that is executed remotely by some untrusted party. To deal with these problems, we can use authenticated data structures or verifiable delegation of computation. The main difference between these two paradigms is that authenticated data structures allow anyone to publicly verify an answer to a computation, whereas the verifiable-delegation-of-computation framework involves some kind of secret-key setting that inherently supports one specific user. This talk is not going to be about privacy; it is going to be about checking the integrity of computations that occur remotely. Specifically, we are going to look into verifying outsourced computation in the authenticated data structures model.

One motivation for the work we present here is, for example, searching your Gmail inbox. When you use Gmail, you want to search your emails, and you can issue conjunctive queries, such as "return the emails that contain the terms Brown and Berkeley", or disjunctive queries, such as "return the emails that contain the term thesis or the term publication". These conjunctive and disjunctive queries are implemented through an inverted index: each keyword is mapped to a set of emails, and when you want to retrieve the emails that contain a certain combination of keywords, you compute an intersection of those sets. So all these operations boil down to set operations, and this is what we are going to study here: how you can efficiently verify set operations that are being computed remotely.
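As a quick illustration of the inverted-index view just described, here is a minimal toy sketch (my own made-up example, not from the talk): each keyword maps to a set of email ids, and conjunctive and disjunctive queries become set intersections and unions.

```python
# Toy illustration (not from the talk): an inverted index where each keyword
# maps to the set of email ids containing it; keyword search reduces to
# set intersection (conjunctive) and set union (disjunctive).
from functools import reduce

inverted_index = {
    "brown":       {1, 4, 7},
    "berkeley":    {4, 9},
    "thesis":      {2, 4},
    "publication": {7, 9},
}

def conjunctive(keywords):
    """Emails containing ALL of the keywords: intersect the posting sets."""
    return reduce(set.intersection, (inverted_index[k] for k in keywords))

def disjunctive(keywords):
    """Emails containing ANY of the keywords: union of the posting sets."""
    return reduce(set.union, (inverted_index[k] for k in keywords))

print(conjunctive(["brown", "berkeley"]))      # {4}
print(disjunctive(["thesis", "publication"]))  # {2, 4, 7, 9}
```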
To present our results, we are going to use the formal model of authenticated data structures. This model involves the following parties. We have a source, the owner of the data, which we trust; it computes a digest of the data and outsources the data, along with an authenticated data structure, to an untrusted server. You can view the digest as a signature on the data. So the server holds the trusted source's data, and we have a client that would like to send queries over the data, receive answers back, and, based on the trust it has in the source, verify that the answer coming back from the server is correct. The server comes back with an answer and a proof, and the client performs a verification. So this is the model of authenticated data structures.

What we are interested in minimizing is the complexity in this model: how long it takes, for example, to do an update of the data set that lives both at the source and at the server; how long it takes for the server to compute a proof, which is the query complexity; how long it takes to verify; and so on and so forth. And of course we want a well-defined notion of security: a polynomially bounded adversary should not be able to provide us with a wrong answer along with a proof that verifies. So we are using computational assumptions here.

This is the general framework, so let's see how we instantiate our solution in this model. We are looking at a sets collection, a bunch of sets. If we go back to the three-party model, we have a source that owns sets S1, S2, S3, S4; these are just general sets. These sets are outsourced to an untrusted server, and there is a client that says: I want the server to compute the intersection of set S1 and set S4. The server computes this intersection; as you can see there, the intersection contains only one element, d. The answer is returned to the client along with a proof, and the client runs a verification. We can have another client that asks for the intersection of sets S2 and S3, and the server comes back with the empty set, because the intersection is indeed empty, and a proof.

Now, the main contribution of this work, and the main goal we have here, is that we would like this proof, the proof returned by the server, to be asymptotically equal to the size of the answer. So if the intersection is very small, or empty, we want the proof to be just constant. The main novelty is that we are able to achieve that, and to have a proof whose size does not depend on the sizes of all the sets that are being intersected, for example. And we do that for multiple set operations, not only intersection: union, difference, and so on and so forth.

We are going to use the following notation: m is the number of sets that are stored; capital M is the sum of the sizes of all the sets; t is the number of queried sets; delta is the number of elements contained in the answer; and N is the sum of the sizes of the queried sets. This notation will be used in what follows.
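Purely as an illustration of the three-party workflow described above (the class names, method names, and placeholder logic below are mine, not the paper's), the roles can be sketched like this:

```python
# Illustrative sketch only: the three-party authenticated-data-structures
# workflow (source / server / client). Names and placeholder logic are mine,
# not the paper's; the real digest and proofs are the cryptographic objects
# described later in the talk.

class Source:
    """Trusted owner: holds the sets collection and publishes a short digest."""
    def __init__(self, sets):
        self.sets = sets
        # Placeholder digest; in the real scheme this is the accumulation-tree digest.
        self.digest = hash(frozenset((name, frozenset(s)) for name, s in sets.items()))

class Server:
    """Untrusted party: stores the outsourced sets and answers queries with proofs."""
    def __init__(self, sets):
        self.sets = sets

    def intersect(self, a, b):
        answer = self.sets[a] & self.sets[b]
        proof = None  # real scheme: accumulation-tree proofs + subset/completeness witnesses
        return answer, proof

class Client:
    """Holds only the trusted digest and verifies (answer, proof) pairs."""
    def __init__(self, digest):
        self.digest = digest

    def verify(self, query, answer, proof):
        # Placeholder; the real verification uses bilinear-map checks against the digest.
        return proof is not None

# Wiring the parties together on a made-up example:
source = Source({"S1": {"a", "d"}, "S4": {"d", "x"}})
server = Server(source.sets)
client = Client(source.digest)
answer, proof = server.intersect("S1", "S4")  # answer == {"d"}
```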
So let's see some related work. This table shows a related-work comparison for the case where we want to compute a proof for an intersection of a constant number of sets. If you use just collision-resistant hash functions, Devanbu et al. and Yang et al. came up with solutions that do not increase the space at all, but the proof, as you can see here, is proportional to the sizes of the intersected sets; the proof is linear. Then Morselli et al., using a Bloom-filter representation, gave another solution where, again, the proof is linear. When you try to get the proof size down to delta, there is a solution by Pang and Tan that appeared in the database community where the space goes up to something like M to the c: when you try to make the proof for the intersection as small as the answer, you get a space complexity for the data structure that is very inefficient. What we are doing in this paper is achieving the best of both worlds: keeping the space linear while the proof for intersection and the other set operations has size delta, asymptotically equal to the size of the answer.

So let's see how our solution works. We have a set X with n elements, and it is represented by a polynomial over Z_p, its characteristic polynomial X(s), which is the product of the terms (x + s) over all elements x of X. Now, if Z is the intersection of X and Y, that means the polynomial Z(s) is the greatest common divisor of X(s) and Y(s). And if the intersection of two sets is empty, that means the polynomials X(s) and Y(s) have greatest common divisor equal to one, which further, by the extended Euclidean algorithm, implies that there exist polynomials P(s) and Q(s) such that P(s) times X(s) plus Q(s) times Y(s) equals one. This is the test used in our proofs, and it is what gives us the good complexity.

The cryptography we are using is bilinear maps, where you have two multiplicative groups G and G_T of prime order p, with the usual bilinear setting. The assumption we use is the bilinear q-Strong Diffie-Hellman assumption, which is the q-Strong Diffie-Hellman assumption in the bilinear setting, where the challenge is to output the tuple shown in the last bullet. We also use a bilinear-map accumulator. A bilinear-map accumulator is a way of representing a set of elements in a very succinct way, with some security. If you have a set of elements in Z_p, a generator g of the bilinear group, and a secret s in Z_p, you can represent the set with the digest g raised to the product of (x + s) over all elements x of the set; this is a constant-size description of the whole set. What is good about this is that you can have a witness for any element in the set, and this witness is computed by omitting x from the exponent and including everything else. So this is a witness that the element belongs to the set. The verification uses the bilinear map, of course: it takes the witness and the element you are trying to prove membership for, and if you know you have the correct digest, you can do the verification using the bilinear map. This is a well-known construction that has been proposed in the past, and its security is based on the q-Strong Diffie-Hellman assumption.
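For reference, the accumulator digest, membership witness, and pairing check just described can be written out as follows; this is my reconstruction from the description above, and the paper's exact notation may differ.

```latex
% Bilinear-map accumulator, reconstructed from the description above.
\mathrm{acc}(X) = g^{\prod_{x \in X}(x + s)}
\qquad\text{(constant-size digest of the set } X\text{)}

W_{x,X} = g^{\prod_{y \in X \setminus \{x\}}(y + s)}
\qquad\text{(membership witness for } x \in X\text{)}

e\!\left(W_{x,X},\, g^{x} g^{s}\right) = e\!\left(\mathrm{acc}(X),\, g\right)
\qquad\text{(verification using the bilinear map } e\text{)}
```

And here is a tiny sketch of the characteristic-polynomial idea from a couple of slides back: sets become polynomials, the intersection corresponds to the gcd, and an empty intersection comes with an extended-Euclidean certificate. The sketch uses sympy and works over the rationals for simplicity, whereas the construction in the talk works over Z_p; the sets are made-up examples.

```python
# Sketch of the characteristic-polynomial view of sets (toy example, over the
# rationals; the actual construction works over Z_p).
from functools import reduce
from sympy import symbols, Poly, gcd, gcdex

s = symbols('s')

def char_poly(X):
    """Characteristic polynomial of a set X: the product of (x + s) over x in X."""
    return Poly(reduce(lambda a, b: a * b, (x + s for x in X)), s)

X = {1, 3, 5}
Y = {2, 4, 5}   # shares the element 5 with X
Z = {2, 4, 6}   # disjoint from X

# Non-empty intersection: the gcd of the characteristic polynomials is (s + 5),
# i.e. the characteristic polynomial of the intersection {5}.
print(gcd(char_poly(X), char_poly(Y)).as_expr())   # s + 5

# Empty intersection: the gcd is 1, and the extended Euclidean algorithm yields
# p(s), q(s) with p(s)*X(s) + q(s)*Z(s) = 1 -- the completeness certificate.
p, q, g = gcdex(char_poly(X).as_expr(), char_poly(Z).as_expr(), s)
print(g)                                            # 1
```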
So our construction works as follows. Given these four sets, you compute the bilinear accumulator representation of every set, and on top of that you build something like a Merkle tree. This Merkle tree is special in the sense that it has constant height and sublinear degree; we use it instead of a Merkle tree of logarithmic height because we want to achieve our complexity goals. We call it an accumulation tree, and a similar construction was proposed earlier at CCS 2008. The intuition here is that the accumulation digests of the sets protect the integrity of the sets, and the accumulation tree on top, which is a kind of different Merkle tree, protects the integrity of the accumulation digests.

So let's see now how we compute the proof for the intersection S1 intersect S2. First, the server has to compute the intersection. This is set S1, this is set S2, and the intersection, as you can see, is {c, e}. Now, since we are intersecting sets S1 and S2, the server will return proofs about the accumulation digests that correspond to S1 and S2, and to produce these proofs it is going to use the accumulation tree. This is external to what is going on at the level of the sets: this part of the proof consists of some values along a path of the tree, its construction takes a sublinear amount of time, and its size is constant. So this is the first part of the proof, where you prove that each accumulation digest has a specific value.

Then you need to go inside the accumulated sets and prove something more specific. When {c, e} is the intersection of sets S1 and S2, you need to prove a subset condition, namely that {c, e} is a subset of the first set and a subset of the second set. To do that, you use the accumulators and provide a subset witness. Since the accumulation value is g raised to everything in the set, the subset witness is g raised to everything in the set except the elements you are querying for. So this is the first subset witness, and this is the second subset witness; as you can see here, each one only includes the elements that are left out, the elements of that set that do not belong to the intersection.

And of course, we now have to prove that between the subset witnesses there is nothing else in common; the subset witnesses alone are not enough. After you compute the subset witnesses, you need to make sure that what remains after removing the intersection does not contain any more common elements, because the hard part of proving an intersection is proving completeness. If the subset witnesses are g to the P(s) and g to the Q(s), then the completeness witnesses are g to the a(s) and g to the b(s), where the polynomials used here satisfy the extended-Euclidean condition a(s) P(s) + b(s) Q(s) = 1, with P(s) and Q(s) being the subset-witness polynomials. So the proof is computed using polynomials that satisfy the extended-Euclidean condition. And this is important: this is why we need bilinear maps, because this condition involves products of polynomials in the exponent, so it can only be verified using bilinear maps, and they come in very handy here. To compute these completeness witnesses, we need roughly N log-squared N time.
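To make the verification conditions concrete, here is one way the checks for an intersection Z = S1 ∩ S2 can be written, reconstructed from the description above (the paper's exact notation may differ); acc(·) is the accumulator from the earlier slide.

```latex
% Reconstructed from the description above; notation may differ from the paper.
% Z = S_1 \cap S_2, with acc(S_i) = g^{\prod_{x \in S_i}(x + s)}.

% Subset condition (Z \subseteq S_i), one pairing check per queried set:
W_i = g^{\prod_{x \in S_i \setminus Z}(x + s)}, \qquad
e\!\left(W_i,\; g^{\prod_{z \in Z}(z + s)}\right) \;=\; e\!\left(\mathrm{acc}(S_i),\; g\right)

% Completeness condition: with P(s) = \prod_{x \in S_1 \setminus Z}(x + s) and
% Q(s) = \prod_{x \in S_2 \setminus Z}(x + s), the extended Euclidean algorithm
% gives a(s), b(s) with a(s)\,P(s) + b(s)\,Q(s) = 1; the server sends
% F_1 = g^{a(s)}, F_2 = g^{b(s)}, and the client checks the pairing product:
e(F_1, W_1)\,\cdot\, e(F_2, W_2) \;=\; e(g, g)
```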
So let's see now the complexity of our method. To compute the intersection itself, you need N time, and its size is delta. To compute the accumulation-value proofs, you need m to the epsilon times log m time, and their size is t. For the subset witnesses you need N log N; this comes from using the fast Fourier transform for the polynomial operations, that's why it's N log N, and their size is t. For the completeness witnesses you run the extended Euclidean algorithm, and this is the complexity you get by using FFT, and their size is t. So you end up with a proof of size t plus delta, where t is the number of sets you are querying and delta is the size of the intersection. And this is almost optimal; "almost" because, in order to compute the proof, you have this extra logarithmic cost, whereas just computing the intersection is linear. But in terms of the size of the proof, it is optimal.

To see how this compares in practice with other solutions, we simulated the size of the proof. Here we compare with the solution that appeared at INFOCOM 2004, and we are actually using the numbers from their paper: the first column is the number of elements in the first set, the second column is the number of elements in the second set, and the third column is the size of the intersection. And here, as we can see, most of the time our proof is... Excuse me? Yes, this is in kilobytes; it is the work by Morselli et al. that appeared in 2004, and this column is the size in kilobytes for this work. Here it just happens that the constants we are using are a lot bigger, that's why. But generally, as you can see, when you go to a big number of sets, big sets and small intersection sizes, the size of our proof is basically proportional to the size of the answer, whereas theirs takes into account the sizes of all the sets. And with that slide, I would like to conclude my talk. Thank you.

Okay, we have time for a question or two? Yes. Excuse me? We are not in the middle of building a system, but it is in the works; we are working on that. So at this point, I was only able to run simulations of the proof size, because you know exactly what is contained in the proof and how big it is, so you can easily simulate it. Yes? Yes, I mentioned... yes, I was only able to show the intersection here, but in the paper we describe how you can do all the set operations: union, difference, which is actually quite tricky to get, and subset. One more question? Yes. Yes, so the complexity is going to be proportional to the sizes of all the sets. No, no, it does not depend on the query space; it only depends on how many sets you have and how big the individual sets are. Okay, so you don't do any pre-computation over the query space. Well, it is a polynomially bounded server. I mean, no, it has to be polynomial; it can't be 2 to the n. I don't know if... Okay, let's thank the speaker again. Okay.