Hi, everyone. Thank you so much for taking the time to view the full version of the talk about our research paper, Encapsulated Search Index: Public-Key, Sublinear, Distributed, and Delegatable. This is joint work with folks at Atakama, David Cash at the University of Chicago, and my advisor, Yevgeniy Dodis. My name is Harish, and I will be giving this talk. Before we look at the encapsulated search index, let us first take a pause and look at searchable encryption. Why is it so popular? Why has it grown increasingly relevant? The simple reason is that remote storage has become ubiquitous, cheap, and convenient. Everyone wants to store things on the cloud; for example, this very presentation is stored on at least two cloud platforms. But the flip side is: do we ever trust the server that we store this data on? If you ask a bunch of cryptographers, the answer is always no, because we have severe trust issues, and we typically solve them by throwing encryption at the problem. This is an elegant solution, but our tools are almost too perfect: encryption does a very good job of hiding all information about the data. Consequently, to perform any operation on the data, we must download it locally and perform the operation ourselves. Searchable encryption was devised as a paradigm, or a primitive, where we can do something cleverer: we ask the server to help us perform the operation while remaining completely mindful of the fact that the server is untrusted. Here, the operation we talk about is search, and the concept is therefore called searchable encryption, or private searching. So what is the setting? We store an additional encrypted structure, which we call an index and denote by the letter E. Because there is encryption, you can think of it as involving some kind of secret.
Now there is the role of an index creator, who is the person who creates the index. The search approver is someone who controls the search process: they hold the secret and can therefore provide information to selectively decrypt the index. Finally, the storage owner, who is typically modeled as untrustworthy, controls the storage location where the data is kept. If you look at the data flow, you have a document that the index creator takes as input to produce an encrypted index E. Now, if I want to search for a keyword w, the search approver uses some secret information to produce a token z_w. The token z_w and the index E together are used by the storage location to produce the matches. The security that we require is known as index privacy, or token privacy: even if I have the index in my hand and tokens for a bunch of keywords, I have no information about what a token for a keyword w' looks like if I have never seen it before. The goal is privacy for indices, and keywords, that have not been queried yet. In this setting, we ask: can we achieve sublinear search time and public-key indexing? The former is attractive for a simple reason: many data structures out there support search in sublinear time. A Bloom filter can do it in expected constant time, whereas balanced search trees take worst-case logarithmic time in the number of entries stored in the data structure. Public-key indexing gives the feature of supporting multiple index creators, so there is no bottleneck where I am reliant on one single person to do the job.
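The three roles and the data flow just described can be sketched in a few lines. This is a minimal, hypothetical single-key sketch, not the paper's scheme: HMAC stands in for the token-generating primitive, and a Python set stands in for the server's sublinear dictionary.

```python
import hashlib
import hmac
import secrets

def make_index(key: bytes, document: set) -> set:
    # Index creator: derive one pseudorandom tag per keyword of the
    # document D to form the encrypted index E.
    return {hmac.new(key, w.encode(), hashlib.sha256).digest() for w in document}

def make_token(key: bytes, keyword: str) -> bytes:
    # Search approver: token z_w for keyword w, derived from the secret.
    return hmac.new(key, keyword.encode(), hashlib.sha256).digest()

def server_search(index: set, token: bytes) -> bool:
    # Storage owner: combine z_w with E to find matches; tokens for
    # unqueried keywords reveal nothing about the rest of the index.
    return token in index

key = secrets.token_bytes(32)
E = make_index(key, {"invoice", "contract"})
assert server_search(E, make_token(key, "invoice"))
assert not server_search(E, make_token(key, "payroll"))
```

The set membership test here is the sublinear search step: the storage owner never sees the key, only the index E and the tokens it is handed.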
So specifically, we look at these two features of our searchable encryption primitive in a setting where most documents do not change, that is, they are immutable. What this means is that once a document is created, it is there, and this gives us the powerful property that our index need not be built incrementally: I can create the index once and be done with it. Consequently, I can have document-specific tokens, meaning tokens for keywords that are specific to this document. We do not need the feature of what we call universal indexing, where a single token can be used for the same keyword across any number of documents. Our setting is a user with two devices: a phone and a desktop. The desktop is powerful but insecure. So I use the desktop to handle sensitive documents: I index each document and then delete the data, because I cannot leave it sitting there. As long as I have control over the desktop during indexing, I can trust that the index is not corrupted. Then, whenever I want to search on the index, I need the phone to authorize the search, and the security goal, as discussed before, is index privacy. We also want the feature of delegation, that is, we want to support multiple search approvers. We allow an encrypted index E built for a particular document D under a secret key/public key pair (sk, pk) to be searched under another pair (sk', pk'), without knowing D and without modifying E. Essentially, I do not want to reinvest the effort to create a new index E'; rather, I keep the same E and delegate the search process alone. This is the setting; this is the story we are telling.
This is the motivating application for which the encapsulated search index makes sense: we have immutable documents, and we want sublinear search with support for public-key indexing and delegation. More specifically, the process flow is as follows. The phone generates a public key and secret key for itself and sends the public key to the desktop. The desktop, with access to the public key, takes the document D as input, produces an index E and a handle C, and deletes the original document D. So it no longer has D in its possession. Next, to search for a keyword w, all the desktop needs to do is send the handle C and the keyword w to the phone. The security I require is that this is privacy-preserving: the phone learns nothing beyond w from (C, w). It gains no other information about the index E, and this should hold information-theoretically. Having received this input, the phone uses its secret key sk, the compact representation (the handle) C, and the word w to produce a token z_w for the keyword w. To be specific, this token is also specific to the pair (E, C) for this document: the token is unique to this (E, C) value. The desktop then receives z_w as input, and we require the desktop to be able to verify that z_w is correctly formed, so the phone cannot cause the desktop to produce a wrong output; the most the phone can do is mount a denial of service. We also want token privacy, or keyword-index privacy: if I give you a document D and a token z_w for a keyword w in D, you gain no information about any other keyword w' that differs from w but is present in D, and also no information about the same keyword w as it occurs in any other document.
On both these counts, we critically require privacy. Finally, we want the same public key/secret key pair to be usable across different documents: (E, C) is unique to a document D, so I can use the same key pair to compute (E', C') for a document D', and so on. Now, the most important feature concerns the communication between the desktop and the phone: we require the communication for the search procedure to be constant-size. Critically, note that I said only C and w are sent, which are constant in size, and not the entire index E, because E can be as large as the number of keywords in the document. These are the required properties and the required process flow, and the formal syntax is as seen here. I have a Gen algorithm that generates the public key and secret key, and an Index algorithm that produces the encrypted index E and a compact representation, or handle, C. It is useful for us to split this Index procedure into two sub-algorithms. One is called Prep, the prepare algorithm, which takes as input pk and produces S and C; think of S as a trapdoor and C as the handle. BuildIndex takes this trapdoor and the original document D to produce E. Here E is the index for D, and C is the compact representation, which is the only thing that needs to be sent to the search approver to obtain a token. The Search procedure takes as input the secret key, the pair (E', C'), and the keyword w, and produces a bit, zero or one, indicating whether w is present in the document D' for which E' and C' are the index and the compact representation. Again, for simplicity, we have taken a modular approach and split the Search procedure into three sub-procedures.
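To keep the syntax straight, here is the interface written out as a Python abstract base class. The algorithm names Gen, Prep, BuildIndex, and Search follow the talk; the names I give the three search sub-procedures (token, verify, find) are my own labels for illustration, not necessarily the paper's.

```python
from abc import ABC, abstractmethod

class EncapsulatedSearchIndex(ABC):
    """Syntax sketch of an ESI, following the talk's algorithm names."""

    @abstractmethod
    def gen(self):
        """Generate a (pk, sk) pair; reusable across documents."""

    @abstractmethod
    def prep(self, pk):
        """Return (S, C): trapdoor S and handle C, independent of D."""

    @abstractmethod
    def build_index(self, S, D):
        """Use trapdoor S and document D to produce the index E."""

    # Search(sk, E', C', w) -> {0, 1} is split into three sub-procedures;
    # the names below are illustrative labels:
    @abstractmethod
    def token(self, sk, C, w):
        """Phone side: produce token z_w from sk, handle C, keyword w."""

    @abstractmethod
    def verify(self, pk, C, w, z_w):
        """Desktop side: check that z_w is correctly formed."""

    @abstractmethod
    def find(self, E, z_w):
        """Desktop side: return 1 iff w is present in the indexed D."""
```

Note that only `token` touches the secret key, and its inputs are just (C, w), which is exactly the constant-size communication requirement from above.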
Now, the details here are not important, but the key part is that we achieve the privacy-preserving property unconditionally, and that is a consequence of separating the Index algorithm into the two sub-algorithms Prep and BuildIndex: the compact handle C that is produced is independent of the document D. Therefore, even if I give you the secret key and two documents D_0 and D_1, you cannot determine which of the two documents was indexed. That is the key point: the privacy-preserving property holds information-theoretically. Okay, this is all great; how do we go about instantiating or implementing this primitive? There is a naive solution that achieves both public-key indexing and sublinear search. It does not offer all of the features, but it is a very useful launchpad for the rest of the talk, where we come up with better primitives that achieve all the features we want. Specifically, we have the phone and the desktop as before. The phone runs the key generation algorithm of a CCA-secure encryption scheme, producing a public key and secret key; it keeps the secret key for itself and sends the public key to the desktop. The desktop runs the key generation algorithm of a pseudorandom function, producing a PRF key K. Now, for each keyword w in the document D, it produces a token by evaluating the PRF under key K on input w, and collects a set Y of all of these tokens. It then builds the encrypted index E by inserting the values of Y into some sublinear dictionary, say a Bloom filter or a balanced search tree. Finally, it runs the CCA encryption algorithm to encrypt the PRF key K, producing the compact handle, or representation, C, and it deletes the key.
At the end, having also deleted the document D, the desktop is left with the encrypted index E and the handle C, which in this case is a ciphertext. Whenever it wants to search, it just sends C and w as before. The phone, with knowledge of the secret key, decrypts the ciphertext C to recover the PRF key K, then simply runs the PRF evaluation to get z_w and sends z_w to the desktop. Recall that we wanted verifiability: we wanted to make sure that z_w could be verified to be correctly formed. So all we need is to replace the PRF with a VRF, and we are still golden. However, this scheme does not support distribution and delegation, which were two key properties that we wanted, and these properties fail for completely different reasons; we refer you to the full version of the paper and to the conference proceedings to read about why this is true. Still, we have a good framework on which to build. The procedure to follow when creating an ESI construction seems to be this: we index a keyword w by computing a pseudorandom function on w and storing the value in a sublinear search dictionary. Because we are in the public-key setting, we need two different ways to compute this PRF value, and we need a consistency check, which means we need some kind of VRF, not just a PRF. So we introduce a new primitive called an encapsulated verifiable random function, or EVRF, and this primitive supports delegation and distribution; that is the key point, whereas the naive scheme supports neither. So what is the syntax? Think Alice, think Bob. Alice generates a public key and secret key and communicates the public key to Bob.
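Putting the naive scheme together, here is a runnable sketch of the message flow. HMAC plays the PRF, a Python set plays the sublinear dictionary, and since the standard library has no CCA-secure public-key encryption, ToyPKE below is an insecure placeholder (its "public key" equals the secret key) purely to show who holds what at each step.

```python
import hashlib
import hmac
import secrets

class ToyPKE:
    """INSECURE placeholder for a CCA-secure PKE; flow illustration only."""
    @staticmethod
    def gen():
        sk = secrets.token_bytes(32)
        return sk, sk  # placeholder: pk == sk, which a real PKE forbids

    @staticmethod
    def enc(pk: bytes, m: bytes) -> bytes:
        return bytes(a ^ b for a, b in zip(m, pk))

    @staticmethod
    def dec(sk: bytes, c: bytes) -> bytes:
        return bytes(a ^ b for a, b in zip(c, sk))

def prf(k: bytes, w: str) -> bytes:
    return hmac.new(k, w.encode(), hashlib.sha256).digest()

# Phone: generate keys, send pk to the desktop.
pk, sk = ToyPKE.gen()

# Desktop: index the document, encrypt the PRF key, delete K and D.
D = {"alpha", "beta"}
K = secrets.token_bytes(32)
E = {prf(K, w) for w in D}          # encrypted index
C = ToyPKE.enc(pk, K)               # compact handle (ciphertext)
del K, D

# Search for keyword w: desktop sends only (C, w) to the phone.
w = "alpha"
K_recovered = ToyPKE.dec(sk, C)     # phone recovers the PRF key
z_w = prf(K_recovered, w)           # phone returns the token z_w
assert z_w in E                     # desktop: sublinear lookup
assert prf(K_recovered, "gamma") not in E
```

Swapping the HMAC for a VRF would give the verifiability property; what no such swap gives is distribution or delegation, which is what motivates the EVRF next.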
Now Bob runs the encapsulation algorithm, a randomized algorithm that takes as input the public key pk and produces a trapdoor t and a ciphertext c. He communicates the ciphertext c to Alice and keeps the trapdoor for himself. Now suppose Alice wants to compute the EVRF value on an input x. What does she do? She takes the ciphertext c, the input x, and her secret key to produce a value y; y is the EVRF of x. Concurrently, Bob has an algorithm called Comp, which takes the input x and the trapdoor t to produce the exact same value. So essentially, we have given Alice and Bob two completely different ways to compute the same value y, the EVRF of x, with the communication of just c. Okay. So what is the security requirement? Think challenger, think adversary. The challenger runs the usual Gen algorithm, but in addition runs the Encap algorithm, the encapsulation procedure, to produce a ciphertext c and trapdoor t, and samples a random bit b. It communicates pk and c to the adversary and keeps b, sk, and t for itself. The adversary has access to an Eval oracle: it can make queries on inputs (c', x) and receives as response the evaluation on (c', x). Finally, it communicates a challenge input x*. The challenger now plays the standard indistinguishability game: it computes y_0 honestly using the secret key sk (the trapdoor is not needed; sk alone is sufficient), samples y_1 uniformly at random, and communicates y_b to the adversary. The adversary responds with b' and wins if b' = b. This is a simple setting, and to prevent trivial attacks, we require that (c, x*) was never queried by the adversary.
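Written out, the pseudorandomness game just described looks as follows (my transcription of the game as stated in the talk):

```latex
\begin{align*}
&\text{Setup: } (pk, sk) \leftarrow \mathsf{Gen},\quad (c, t) \leftarrow \mathsf{Encap}(pk),\quad b \leftarrow \{0,1\};\ \text{send } (pk, c) \text{ to } \mathcal{A}. \\
&\text{Oracle: on query } (c', x),\ \text{return } \mathsf{Eval}(sk, c', x). \\
&\text{Challenge: } \mathcal{A} \text{ sends } x^*;\quad y_0 = \mathsf{Eval}(sk, c, x^*),\quad y_1 \leftarrow \mathcal{Y};\ \text{send } y_b. \\
&\text{Win: } \mathcal{A} \text{ outputs } b' = b,\ \text{subject to } (c, x^*) \text{ never having been queried.}
\end{align*}
```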
And now let us look at the construction, where we have Alice and Bob. Alice picks a random exponent a and computes g^a as the public key. Bob, on the other hand, runs the Encap procedure, communicating the ciphertext c = g^r and keeping the trapdoor for himself. Alice can compute y as e(H(c, x), c)^a, because she knows the value a. Correspondingly, Bob can compute the same value using the trapdoor, which contains the powerful element s = pk^r = g^{ar}: essentially, the exponent ar comes outside the pairing, and the hashed values are the same because c = g^r. This is a simple construction, and we can prove that it is secure under the BDDH assumption in the random oracle model. Okay. So far we have made the case that the EVRF is a useful primitive, and we have shown that a construction is possible. Now let us see how to build the ESI from it in a generic fashion. The key generation algorithm merely runs the EVRF key generation. The Index procedure runs Encap to compute the trapdoor S and handle C, and the remaining indexing is very similar to the naive solution, except that instead of running PRF.Eval it runs EVRF.Comp; it then runs the data structure's build algorithm to produce the index E, and returns E and C. The Search procedure does the same thing as before, except that instead of first decrypting and then running Eval, it just runs Eval on sk and C, and then runs the data structure's find algorithm. The beautiful part is that this composition result inherits the properties of distribution and delegation from the EVRF, and we formally prove this in the paper. So what do we get out of all this? We get a threshold ESI, which achieves distributed token generation in a non-interactive manner.
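In symbols, the construction described above is the following (as I transcribe it; H hashes into the source group and e is the bilinear pairing):

```latex
\begin{align*}
\mathsf{Gen}:&\quad sk = a \leftarrow \mathbb{Z}_p,\qquad pk = g^a \\
\mathsf{Encap}(pk):&\quad r \leftarrow \mathbb{Z}_p,\qquad c = g^r,\qquad t = pk^r = g^{ar} \\
\mathsf{Eval}(sk, c, x):&\quad y = e\!\left(H(c, x),\, c\right)^{a} \\
\mathsf{Comp}(t, x):&\quad y = e\!\left(H(c, x),\, t\right) \\
\text{Consistency:}&\quad e(H(c,x), c)^a = e(H(c,x), g)^{ar} = e(H(c,x), g^{ar}) = e(H(c,x), t).
\end{align*}
```

The consistency line is exactly the claim that Alice (via a) and Bob (via t) reach the same value y by two different routes.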
We take our standard EVRF and add verifiable secret sharing and Shamir-style secret sharing, and the resulting scheme is secure under the BDDH assumption. Now, the more important feature to consider is delegation: search approver one can delegate the search process to search approver two without having to recreate the index. We have a Delegate algorithm that takes as input C_1 and sk_1, together with the secret key of the intended recipient, sk_2, and produces a new handle, or representation, C_2. This does not recreate the index; it modifies only the ciphertext, the handle. A verification procedure then checks whether this delegation was performed correctly. Now, as we stated before, a delegatable ESI can be built from a delegatable EVRF, which the naive scheme lacked, so let us look at building a delegatable EVRF. Again, search approver one somehow delegates the process, taking ciphertext c_1 to a ciphertext c_2, and correctness of delegation requires that for every single input, the Eval function produces the same value under both. We also define a useful algorithm, Same, which, given (pk_1, c_1, pk_2, c_2), returns one if and only if the correctness of delegation holds. So now I want to discuss how we actually build this delegatable EVRF, at least the basic version. Recall the standard EVRF construction; I have taken the diagrammatic representation and put it in a textual format. The key idea is to separate out r. I am using r twice here: once to hash the value, and once as a parameter for the bilinear map. I want to separate the two uses. So how do I do it?
I introduce a new term d = g^r, and I also include the public key and this value d in C and T. This is the new definition, and the Delegate algorithm is this: it basically computes d_1 raised to a_1/a_2, where a_1/a_2 means a_1 · a_2^{-1} mod p, p being the order of the prime-order group we are working with. But now an adversary can generate a new, related ciphertext c' from c by simply manipulating its components, say raising each of them to a power; it is all public information. And note that the minute this happens, I can query Eval on this value (c', x*), every check will pass, and it will produce the correct result; unfortunately, we need to prevent this. The way out is to also throw the public key inside the hash function. Okay, that is the key change here: the public key also becomes part of the hash value that is computed. Then we have the Delegate algorithm, and what is the next thing we need? We throw in an additional check: we want to make sure that the component we get from pk and the component we get from c' are consistent with each other. This is trivially true when I have not modified anything and have not delegated. Otherwise, d' is uniquely determined by A', R', and A, precisely because the check passes. The reason for this long exposition is to argue that it is okay not to include d' in the hash: we have included A' and R' in the hash, and although d' is not included, the check passes, so A', R', and A uniquely determine d'. It is therefore fine not to include it, and fine not to verify it separately.
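The exponent arithmetic that makes this delegation work is just the following (a sketch of only the d-component; the full Delegate algorithm is in the paper):

```latex
\begin{align*}
d_1 &= g^{r}, \qquad d_2 = d_1^{\,a_1 a_2^{-1} \bmod p} = g^{\,r a_1 a_2^{-1}} \\
\Rightarrow\quad d_2^{\,a_2} &= g^{\,r a_1} = d_1^{\,a_1}.
\end{align*}
```

So evaluating under the new secret key a_2 against d_2 reproduces exactly the value that a_1 produced against d_1, which is the delegation correctness we wanted.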
And the final point is about the Same function, because we need that procedure. These are the terms: pk_1 = (g, A_1) and pk_2 = (g, A_2), while c_1 and c_2 contain (A, R, d_1) and (A', R', d_2) respectively, for some A', R', d_2 if delegation happened. If I delegated honestly, the first two components will be the same: if (A, R) and (A', R') are equal, then it was honestly delegated, so I know the two ciphertexts produce the same results; delegation correctness gives us that. Otherwise, there is a pairing check involving d_1 and d_2 together with the two public keys A_1 and A_2 that we are dealing with: if this check holds, then again we know the Eval function will produce the same result. This can be mathematically verified, and we verify it in the paper, but the key point is that the Same procedure works like this: if either of these two checks passes, it returns one. This gives a basic delegatable EVRF, where we only allow honest delegation, meaning the adversary can only delegate to a party whose secret key it does not know, and this is secure under BDDH. Now I can take this and also allow a little bit of corruption. What do I mean by a little bit of corruption? I can now specify sk_2, and we prove that the same construction, with some further extensions, also supports this and is secure under BDDH. Next, consider both in- and out-delegation: I can specify sk_1 and ask for a delegation to some sk_2, or I can produce sk_2 and ask for a delegation from an honest sk_1. Both are possible, but unfortunately we can only prove this secure under a very strong assumption, IBDDH, the interactive BDDH assumption, which is not well studied.
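One natural way to write the second check of Same, consistent with the delegation equation d_2 = d_1^{a_1/a_2} above, is the pairing equation below; I am reconstructing the form here, and the paper's exact Same algorithm may differ in its details:

```latex
e(d_1, A_1) \stackrel{?}{=} e(d_2, A_2), \qquad \text{where } A_i = g^{a_i},
```

which holds after honest delegation since e(d_2, A_2) = e(d_1, g)^{(a_1/a_2)\,a_2} = e(d_1, A_1), and crucially can be tested from public values alone.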
So we try to find a middle ground, a one-time notion where only one in-delegation is allowed but any number of out-delegations, and we prove that the construction is secure under the EBDDH assumption. Now, there are a lot of acronyms and a lot of settings here, but the key argument is this: an attacker can simply take the ciphertext c that it receives as the challenge and ask for it to be honestly delegated to a second party. Once c has become c', it can query Eval on (c', x*), and because we want delegation correctness, this will pass and return the same value, so pseudorandomness is violated and we break the scheme trivially. The way around this, fortunately, is to use the Same procedure, and that is why we critically require it: it determines whether the values in play will produce the same result. Now, delegation can happen once, or it can keep happening multiple times; at some point, one needs to retrace the whole path, and that is why bidirectional support for this kind of delegation needs stronger assumptions to prove secure. Please check the paper for a more expansive discussion. Let us now summarize the talk. What have we done? We introduced the primitive of an encapsulated search index, which has all of the features we wanted, including distribution and delegation. We also introduced a primitive of independent interest, the encapsulated verifiable random function, which allows a PRF value to be computed in two different ways by two independent parties, and which extends to the distributed and delegatable settings. The basic version is secure under the BDDH assumption in the random oracle model. Having said all of this, I should also stress that this is commercially available; it is a product that is out there.
Atakama has deployed it at several companies, and it has been quite successful. So this is not a purely theoretical result; we also have practical efficiency guarantees, because it has been deployed and used extensively by clients. And that is all I have time for. Thank you so much for taking the time to view the full version of this talk; I will take questions during the live proceedings. Thank you.