Hi everyone. Today I'm going to talk about our work, "Boolean Searchable Symmetric Encryption with Worst-Case Sub-Linear Complexity." It is joint work with Seny Kamara.

Okay, so let's start. Bob, like many of us, uses many cloud services to store his data, and Bob also cares about confidentiality, so he encrypts his data before outsourcing it to the cloud. But naively encrypting the data makes Bob lose his search functionality, and this motivates the area of encrypted search.

The state of the art has many approaches to the encrypted search problem: structured encryption, searchable encryption, oblivious RAM, functional encryption, MPC, property-preserving encryption, and fully homomorphic encryption. There are many criteria for evaluating these approaches; the most important ones are the efficiency of the approach, its expressiveness, and its security. Based on these criteria, we think that searchable symmetric encryption (SSE) is better suited to the encrypted search problem. So in this talk we will focus on SSE, and more particularly on expressive SSE: we will talk about Boolean searchable symmetric encryption.

There is a lot of work that addresses this problem, namely the OXT and BlindSeer constructions, and they offer different trade-offs between expressiveness and efficiency. The goal of this paper is to improve on these lines and propose something that offers better expressiveness and efficiency, with less leakage as well. So let's get into more detail about these two constructions, OXT and BlindSeer.
OXT, by Cash et al., achieves sublinear search overhead for conjunctive queries, but it has linear search overhead for disjunctive queries and for arbitrary Boolean queries. On the other hand, it is non-interactive. BlindSeer has sublinear search overhead for arbitrary Boolean queries, but it is interactive and has a logarithmic multiplicative overhead over the result set. So both constructions have advantages and disadvantages, and the goal of this work is to take the best of both worlds: something that is sublinear in the worst case, non-interactive, with less leakage, which we also achieve, and with optimal communication cost as well.

The contributions of our work can be divided into two classes: one that we call the black-box constructions, and another that we call the concrete constructions. For the black-box constructions, we propose three different ones. The first is IEX, which supports purely disjunctive keyword search, and the interesting part is that it can be built from any single-keyword SSE. We also propose BIEX, for Boolean searchable symmetric encryption, which can be built from IEX. And then we propose a dynamic disjunctive SSE, which we denote DIEX, that can be built from any dynamic single-keyword SSE and can also be made forward secure.

The nice thing I want to point out about this black-box way of thinking about searchable symmetric encryption is that we depend only on a simple building block, in this case a single-keyword SSE, and the security of our construction always depends on this building block. So any future advances that make this building block better will imply that our construction enjoys these advances as well.
As I said, we also have concrete constructions. For our IEX scheme, we instantiate the single-keyword SSE with the 2Lev construction by Cash et al. We also instantiate BIEX with the same construction, because it happens that this construction is also a dynamic single-keyword SSE. Due to some storage overhead issues, we have also introduced a new single-keyword construction, which is very compact, that we call ZMF. It is based on a new plaintext data structure that we call the Matryoshka filter. You can think of it as a new Bloom filter data structure, which I'll briefly talk about later in this talk. It has linear search complexity, though, but you will see that this has no impact given the way we use it in our IEX construction. And of course, we can instantiate IEX with ZMF as well, because it is a single-keyword SSE, and it will provide better compactness.

Okay, so before getting into the details of IEX, I want to give you some preliminaries. We will heavily use two types of data structures in this talk: dictionaries and multi-maps. A dictionary is a data structure that maps labels to values. You can see an illustration of this data structure here. For example, with the get operation, if you want to retrieve the value associated with keyword w3, it returns identifier 2, as you can see. Multi-maps are also data structures that map labels, but to tuples rather than single values. You can see an illustration of this data structure here as well. With the get operation we can, for example, retrieve the tuple associated with keyword 3, and in this example it returns identifier 2 and identifier 4.

As another building block in our construction, we use the primitive of structured encryption.
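Since structured encryption will be applied to exactly these two data structures, here is a minimal plaintext sketch of the dictionary and multi-map get operations described above. The function names and the concrete labels and identifiers are illustrative assumptions, not taken from the slides.

```python
# Dictionary: each label maps to exactly one value.
# The labels and identifiers below are illustrative assumptions.
dx = {"w1": 1, "w2": 3, "w3": 2}

# Multi-map: each label maps to a tuple of values.
mm = {"w1": (1, 3, 4), "w2": (3, 5), "w3": (2, 4)}


def dx_get(dictionary, label):
    """Return the single value stored under `label`, or None if absent."""
    return dictionary.get(label)


def mm_get(multimap, label):
    """Return the tuple stored under `label`, or an empty tuple if absent."""
    return multimap.get(label, ())


print(dx_get(dx, "w3"))
print(mm_get(mm, "w3"))
```

Read this way, a multi-map over keywords is simply a plaintext inverted index: each keyword points to the identifiers of the documents that contain it.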
To give you a high-level intuition, structured encryption is a cryptographic primitive that encrypts any data structure in such a way that you can query it privately later on. For this presentation I am taking a multi-map as an example, but it can be any data structure: graphs, dictionaries, arrays. It is general.

Structured encryption is composed of three algorithms: Setup, Token, and Get. Setup takes as input a security parameter and the data structure, and outputs an encrypted data structure and a key. The Token algorithm takes a key and a keyword and outputs a token, which you can think of as a kind of encryption of the keyword given as input. The Get algorithm takes as input the token and the encrypted data structure, and outputs the answer, which can be encrypted or not. In this example it is encrypted, and we call the scheme response-hiding; otherwise, it is called response-revealing.

One analogy that I want you to keep in mind throughout this talk is that an encrypted multi-map is, in one view, an encrypted inverted index. And an encrypted inverted index is basically a single-keyword SSE, which we have seen over the last 16 or 17 years, starting from the work by Song et al. in 2000. Many constructions have been published so far that achieve different trade-offs between security and efficiency. In this talk, when we get to the concrete instantiations, we will use in particular the construction by Cash et al. from 2014.

It is also important to understand a little bit what security we get from structured encryption.
For security, we have a real experiment in which the adversary can send the challenger any data structure. Here, for example, I have taken a multi-map. The challenger encrypts it and sends it back to the adversary. The adversary can then send polynomially many queries to the challenger, who generates the corresponding tokens and sends them back. This can also be done adaptively, which just means that the adversary waits to see the token before sending the next query.

In the ideal experiment, the adversary again sends the multi-map, but here there is a simulator that gets only the setup leakage of this multi-map, and it simulates the encrypted multi-map based on this setup leakage alone. Similarly, in the query phase, the simulator gets only the query leakage of the keyword that the adversary wants to search for, and simulates the token that is then sent to the adversary. We say that our structured encryption scheme is secure with respect to this setup and query leakage if the real experiment and the ideal experiment are indistinguishable to any PPT adversary.

Now we are ready to go to the technical details. But before that, I want to give you a high-level intuition for the thought process behind our construction, so I will give you an overview. First of all, I want you to think of a multi-map as a collection of sets. If you view the multi-map like that, a disjunctive keyword query can be viewed as the union of these sets.

One may think of a simple solution: use a single-keyword SSE and take a naive union of all the results. The problem is that we get multiplicity: redundant elements are sent back to the client, which implies that
we have sub-optimal communication, and the leakage is quite heavy in this case as well.

So we were thinking about how to avoid that, and this led us to use a union based on the inclusion-exclusion principle, which removes this redundancy and helps us get optimal communication and less leakage. We introduce a plaintext set structure that makes using the inclusion-exclusion principle easier, and then we show how to generate the encrypted structure from it.

Now that I have given you an overview, I will give you an illustrated one, so you can keep in mind all the steps of our thought process. This is the multi-map, and you can view it as sets. This set is composed of the tuple associated with keyword 1, and here it is identifiers 1, 3, and 4. So this is our first set, for keyword 1. We can do the same for keyword 2 and for keyword 3. So we have our set representation here.

The second step: as I said, we have a disjunctive query composed here of three keywords, 1, 2, and 3, and you can view it as the union of these sets. One point to note is that we have some redundancy here: this identifier here and identifier 4 are redundant. If we use our naive construction, these two identifiers are sent back to the client more than once, which means that we do not have optimal communication for our disjunctive searchable encryption scheme.

So how can we avoid this redundancy? I am going to present it here. This is our union. We take our first set and remove its intersection with the second set and its intersection with the third set, so we are left with only identifier 1 here. Then we move on to the second set.
For the second set, we just get rid of the intersection with the third set, and for the third set we don't do anything. You can note here that we have removed the redundancy, which gives us what we want: optimal communication. This is just an example for three sets, and you can generalize it to many sets with the inclusion-exclusion principle.

Now we want to build our set structure based on the inclusion-exclusion principle, and for that we perform some preprocessing. The preprocessing takes each of the sets that we have, here 1, 2, and 3, and precomputes its intersections with the other sets. So for the first set, set 1, this is the precomputation of its intersections with set 2 and set 3. We do the same for the other sets.

After this preprocessing, we go back to our first representation, where these sets form a multi-map. In this talk we call it the global multi-map, because it contains the original view of our multi-map. Then we add the local multi-maps, which are the preprocessed sets. They are also multi-maps, whose labels correspond to the intersections I just explained, for example between keyword 1 and keyword 2, that is, between set 1 and set 2. We call them local multi-maps because they are local to a keyword and preprocessed. So this is one way to think about the plaintext set structure.

So how can we use this to build our searchable symmetric encryption scheme?
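The preprocessing and the redundancy-free union described above can be sketched on plaintext data as follows. This is a minimal illustration with no encryption. The set for keyword 1 follows the example in the talk, while the contents of the other two sets and all function names are assumptions made for the sketch.

```python
def preprocess(mm):
    """Build the global multi-map and one local multi-map per keyword.

    The local multi-map of keyword w maps every other keyword v to the
    precomputed intersection of the sets of w and v (empty ones are skipped).
    """
    global_mm = {w: set(ids) for w, ids in mm.items()}
    local_mms = {}
    for w, ids_w in global_mm.items():
        local_mms[w] = {
            v: ids_w & ids_v
            for v, ids_v in global_mm.items()
            if v != w and ids_w & ids_v
        }
    return global_mm, local_mms


def disjunctive_search(global_mm, local_mms, keywords):
    """Return one piece per keyword; the pieces are pairwise disjoint.

    From the set of the i-th keyword we remove its precomputed
    intersections with every later keyword in the query, so each matching
    identifier is returned exactly once.
    """
    pieces = []
    for i, w in enumerate(keywords):
        piece = set(global_mm.get(w, set()))
        for v in keywords[i + 1:]:
            piece -= local_mms.get(w, {}).get(v, set())
        pieces.append(piece)
    return pieces


# Keyword 1 as in the talk; keywords 2 and 3 are assumed example values.
mm = {"w1": (1, 3, 4), "w2": (3, 5), "w3": (2, 4)}
global_mm, local_mms = preprocess(mm)
pieces = disjunctive_search(global_mm, local_mms, ["w1", "w2", "w3"])
print(pieces)
```

Note that the search only ever touches the sets of the queried keywords and their pairwise local entries, never the whole multi-map, which is the intuition behind sublinear search for disjunctions.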
We call it IEX for short. Like any structured encryption scheme, it is composed of the Setup, Token, and Get algorithms. The Setup algorithm takes as input a security parameter and the multi-map, and outputs the encrypted multi-maps, for the global and the local multi-maps. So you can understand that there is a subroutine of Setup that runs the preprocessing I discussed on the previous slide and then encrypts the results using a single-keyword SSE. Remember the analogy I gave you at the very beginning: an encrypted multi-map is equivalent to a single-keyword SSE.

But here there is a problem. We don't want to give the adversary additional leakage that we can easily get rid of, and here the adversary can learn the size of each local multi-map, which is basically the size of the intersections. We can easily handle that by storing the encrypted local multi-maps in an encrypted dictionary. As I will detail later on, this leaks only the total size of all local multi-maps as setup leakage. And of course, the key is also part of the output of our Setup algorithm.

The Token algorithm of our IEX construction takes a key and a query. Here, for the sake of clarity, I have taken a much simpler example, a disjunction of just two keywords, keyword 1 and keyword 3. It outputs a token composed of four sub-tokens: two global sub-tokens, one for keyword 1 and one for keyword 3, a dictionary sub-token, and a local sub-token. As we will see, this token will make more sense when we go through the Get algorithm. All of them are encrypted, so the server cannot learn anything from them so far.

Now the Get algorithm, where the server wants to send the output to the client. It takes the token and the
encrypted structure that we output from our Setup algorithm, and it works as follows. First, it calls the Get algorithm of the single-keyword SSE with the global sub-token and the encrypted global multi-map, and it outputs the result, which is basically the tags of the identifiers. They are encrypted, and as you can see, this encryption depends on the keyword. So here, for example, when we run the Get algorithm again on a different keyword, the tags are different because they take a different keyword as input.

Next, the Get algorithm takes the dictionary sub-token and the encrypted dictionary, and outputs the encrypted local multi-map that matches the sub-token. Basically, this step extracts the local multi-map that we need. Then the local sub-token is used with the encrypted local multi-map to output the tags of the intersection, which are the ones we want to remove.

So remember, these are the results, and one thing I want to emphasize is that they are encrypted. In terms of leakage, even though ID 4 here is the same as ID 4 here, they are encrypted differently, so the server does not know so far that they are the same. But this tag is equal to one of these, namely the one in the first result, because that is the first one we processed. The server notices this, removes it, and sends just the result to the client. You can note that we don't have any redundancy, so this is optimal, and this is what we wanted to achieve.

It is very important to understand the security and efficiency of our construction. Security in structured encryption means analyzing and describing the leakage, and in our construction the leakage is analyzed in a black-box way.
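The leakage discussion hinges on how these tags behave, so here is a small sketch of the tag-removal step of the Get algorithm described above, for the two-keyword query on keyword 1 and keyword 3. It is a simplification under stated assumptions: tags are modeled as HMAC-SHA256 values under hypothetical per-keyword keys, the encrypted dictionary lookup is skipped, and how the client maps tags back to identifiers is left out. None of the names come from the paper.

```python
import hashlib
import hmac


def tag(keyword_key: bytes, ident: int) -> bytes:
    """Keyword-dependent tag of an identifier (HMAC is an assumed choice)."""
    return hmac.new(keyword_key, str(ident).encode(), hashlib.sha256).digest()


# Hypothetical per-keyword keys and illustrative plaintext sets.
k1 = b"hypothetical key for keyword 1"
k3 = b"hypothetical key for keyword 3"
db = {"w1": [1, 3, 4], "w3": [2, 4]}

# What the server obtains from the global structure for each keyword.
global_w1 = [tag(k1, i) for i in db["w1"]]
global_w3 = [tag(k3, i) for i in db["w3"]]

# Local multi-map of keyword 1, entry for keyword 3: the identifiers in
# both sets, tagged under keyword 1's key so they match global_w1 entries.
local_w1_w3 = {tag(k1, i) for i in db["w1"] if i in db["w3"]}


def server_get(first_result, local_tags, second_result):
    """Drop from the first result every tag found in the local entry."""
    kept = [t for t in first_result if t not in local_tags]
    return kept + second_result


result = server_get(global_w1, local_w1_w3, global_w3)
print(len(result))
```

The same identifier has unrelated tags under the two keywords, so the server cannot match results across keywords on its own; it learns the overlap only through the local tags that the query token unlocks.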
We analyze the leakage in a black-box fashion. One thing I want to say about this is that it gives us some flexibility in the future: we will use specific instantiations later on, and this black-box view helps us analyze the leakage of IEX whenever a new construction comes along. The black-box setup leakage of IEX is composed of the setup leakage of both the EMM and the EDX. The query leakage is the query leakage of the global EMM and the EDX, because those are the only ones that we access.

So what does this black box mean in reality? Concretely, if you instantiate the EMM with a standard construction like that of Cash et al., the setup leakage is the size of the global multi-map and the total size of the local multi-maps. The concrete query leakage is the search and access pattern of the global multi-map and of the local multi-map, the tags that I showed on the previous slide, and finally the search and access pattern of the dictionary, because we access it as well. One thing I want to stress here is that, compared to OXT for example, we have less leakage.

In terms of asymptotics, our communication complexity is optimal in the worst case. In terms of search complexity, we have worst-case sublinear search complexity. For example, for a disjunction composed of q keywords, it is q² · M, where M is the maximum response size of the global sub-tokens, that is, of the keywords that compose the query. In terms of storage, we have this formula, which is a function of the number of pairs in the multi-map and the total number of pairs in all local multi-maps. You can see that I have put that term in red, and you may ask why: because it is quite heavy. When we
wanted to evaluate our construction, this term turned out to be quite large for some data sets. So the question is: how can we reduce this storage and make our IEX construction more compact?

There is an easy solution, which is to use the Z-IDX construction by Goh from 2003 for our local multi-maps. So we instantiate the local multi-maps with a special-purpose single-keyword SSE that is more compact. It has linear search cost, but as long as we search only within a local multi-map, this does not affect sublinearity, even though it is linear. That is why I say it is okay. So it is very compact, but the problem is that Z-IDX is not adaptively secure. Z-IDX can be made adaptively secure, but then the token size can become too large; you can look at the paper for the details of why this holds.

This motivates the next step, where we wanted to come up with a new construction that is compact but also has optimal token size. We came up with the Matryoshka filter, which is a new nested Bloom filter with variable size and a fixed hash function, and the resulting construction has the following properties. It is based on online ciphers.
It is adaptively secure and compact, with optimal token size and linear search complexity. And we move from this formula for the storage overhead to something like that, where we depend only on the number of keywords.

We have evaluated our constructions. We used the Enron data set, which is around 40,000 files if I remember correctly. This here is the search overhead, and this is the storage overhead. For the search overhead, for example with a selectivity of 1,000, which is basically the number of pairs returned, for a conjunction and 10,000 here, it is around 12 milliseconds. Compared to OXT, which takes 200 milliseconds, we are one order of magnitude faster, even though OXT was implemented in C while ours was in Java. For the setup, this is just to show the difference between IEX-ZMF and IEX. ZMF is the construction I discussed that uses the Matryoshka filter, and we also gain one order of magnitude in storage on the same data set.

Finally, we introduce Clusion, which as far as we know is the first open-source encrypted search library. It is released under the GPL and written in Java. It currently implements different SSE constructions: 2Lev, ZMF, dynamic SSE schemes that are forward secure, IEX-2Lev, IEX-ZMF, and Boolean SSE too, based on 2Lev and ZMF. There is some work in progress: we are implementing dynamic SSE schemes that have better trade-offs, and these will come soon, force one and force two, as well as graph encryption that we call LGX. Those are some new constructions that we will add to the library.

With that, I want to thank you. You can find the link to the library here.