So I'm going to talk about how to handle leakage in encrypted search. This is joint work with Seny Kamara and Olya Ohrimenko. Structured encryption was introduced by Chase and Kamara in 2010. It is a cryptographic primitive that encrypts a data structure in such a way that the user can privately query it later. At a high level, it works as follows. There is a setup algorithm that takes as input a security parameter and the data structure, and outputs a key and an encrypted data structure that is sent to the server. At query time, the user runs a token algorithm that takes as input the key and a query, and outputs a token that is sent to the server. The server then runs a query algorithm that takes as input the token and the encrypted data structure, and outputs an answer that is sent to the user. We call the information that the server learns about the data structure at setup time the setup leakage, and the information that the server learns about the data structure and the queries at query time the query leakage. At a high level, we say that a structured encryption scheme is (L_S, L_Q)-secure if, first, it reveals no information about the data structure beyond the setup leakage, and second, it reveals no information about the data structure and the queries beyond the query leakage. For the precise, formal security definition, please check the CK10 paper. Examples of structured encryption include encrypted multi-maps, encrypted dictionaries, encrypted graphs, encrypted arrays, and so on. These are building blocks that one can use to design higher-level applications, such as encrypted relational databases, encrypted NoSQL databases, searchable symmetric encryption, garbled circuits, privacy-preserving network provenance, and so on.
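To make the setup/token/query syntax concrete, here is a minimal toy sketch of the interface for an encrypted dictionary. The class name, the HMAC-based tokens, and the fact that values are stored unencrypted are illustrative simplifications of mine, not the CK10 construction.

```python
# Toy sketch of the structured-encryption interface (Setup, Token, Query).
# Illustrative only: a real scheme also encrypts the stored values; here
# tokens are simply HMAC (PRF) evaluations of the queried label.
import hashlib
import hmac
import os

class ToyEncryptedDictionary:
    def setup(self, data):
        """Client: outputs a key and an encrypted structure for the server."""
        key = os.urandom(32)
        eds = {self._prf(key, label): value for label, value in data.items()}
        return key, eds

    def token(self, key, query):
        """Client: deterministic token -- repeating a query repeats the
        token, which is exactly the query-equality leakage discussed later."""
        return self._prf(key, query)

    def query(self, eds, tok):
        """Server: resolves the token against the encrypted structure."""
        return eds.get(tok)

    @staticmethod
    def _prf(key, msg):
        return hmac.new(key, msg.encode(), hashlib.sha256).hexdigest()
```

Because the token algorithm is deterministic, the server observes identical tokens for identical queries; suppressing this kind of leakage is what the rest of the talk is about.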
When we design a structured encryption scheme, there are three main criteria one wants to optimize: efficiency, expressiveness, and security. When it comes to efficiency, and in SSE in particular, achieving optimal search was the first milestone. Then several works followed to achieve dynamism, that is, dynamic SSE constructions with optimal search. And recently, this year, several works targeted the design of I/O-efficient constructions, in particular achieving better search locality and storage trade-offs. By the way, the two talks later in this session will be about how to achieve better such trade-offs. With respect to expressiveness, research started with single-keyword SSE, single keyword being the most fundamental search query. Then several works investigated other settings, such as multi-user SSE, and more expressive queries, such as Boolean SSE and range queries. With respect to security, a couple of works investigated different adversarial models, such as active adversaries and snapshot adversaries. There were also several works that investigated attacks on SSE. And recently, there has been a lot of focus on how to design forward-secure and backward-secure SSE constructions. So as we can see, efficiency, expressiveness, and several dimensions of security have been greatly investigated. However, leakage is the main area that has not received a lot of attention, and for which we do not have a good understanding. We have identified three main directions that we believe will help us better understand leakage in structured encryption. The first is cryptanalysis, by which we mean the area that designs attacks for a specific leakage profile, under some specific assumptions. The two query-recovery attacks that we are aware of, the IKK attack and the count attack, require, for example, the co-occurrence pattern.
However, they rely on very strong assumptions, such as knowledge of more than 80% of the user's data, and also around 5% of the user's queries. The second dimension is leakage quantification, by which we mean, at a very high level, studying ways to measure leakage. And the third dimension, which we believe is the most important of the three, is leakage reduction, by which we mean, at a high level, studying ways to reduce leakage. This was the focus of our work, and it will be the focus of this talk. Before detailing our contributions, a valid question to ask ourselves is: are there already ways to reduce leakage? The answer is yes, there are, and oblivious RAM (ORAM) simulation is one of the main ones. By ORAM simulation, we mean a technique that, at a high level, takes read and write operations and replaces each of them with a polylogarithmic number of read and write operations in a specific way. It has two main advantages. The first is that it is generic, in that it applies to any RAM program, and therefore achieves any level of expressiveness. The second is that it has a very small leakage profile. However, the downsides are that it is interactive and, unfortunately, does not scale to very large datasets. There are also other approaches, such as garbled RAM and custom schemes; for more details, please check our paper. So, reformulating, the question we want to answer is: are there more efficient ways to suppress leakage? Before detailing our contributions, let me introduce some background. When we first started working on this paper, we noticed that the terminology and the formalism of leakage in the literature were sometimes inconsistent and even contradictory. So the first thing we did was to address this problem: we introduced a more intuitive nomenclature and more precise descriptions of leakage.
I will present just a subset of these leakage patterns, the ones I am going to use in my talk. First, the query equality, known as the search pattern in the literature, is the leakage that reveals if and when a query is repeated. The response identity, known as the access pattern in the literature, is the leakage that consists of the responses to a specific query. The response length is simply the length of a response, while the sequence response length is the sum of the response lengths over a sequence of queries. The second thing we had to introduce is a new leakage concept, which was necessary for our security analysis: the non-repeating sub-pattern. At a very high level, we divide any pattern into two parts, a non-repeating part and a repeating part, where the non-repeating sub-pattern is the leakage that occurs on non-repeating queries, and the repeating sub-pattern is the leakage that occurs on the remaining, repeated queries. For example, the query equality has a non-repeating sub-pattern equal to nothing. Now we are ready to dive into our techniques. At a very high level, we propose a generic compiler that suppresses the leakage of any structured encryption scheme. It works as follows. Consider a structured encryption construction with a specific leakage profile: a setup leakage, and a query leakage that, in this example, is equal to pattern one and pattern two. Our compiler suppresses a specific pattern, say pattern two here, and outputs a structured encryption scheme with a new query leakage, which is only pattern one. In this talk and in our work, we focus in particular on how to suppress the query equality. By that, we mean the following: consider a structured encryption scheme whose query leakage is equal to the query equality together with some additional pattern.
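As an illustration, the patterns just named can be written down as functions of a plaintext multi-map MM and a query sequence qs. The function names are mine, and this is only a way to make the definitions concrete, not anything from the paper.

```python
# Illustrative definitions of the leakage patterns on a plaintext
# multi-map MM (label -> tuple) and a query sequence qs (list of labels).
def query_equality(qs):
    # For each query, the index of its first occurrence: reveals if and
    # when a query repeats, but nothing about the query itself.
    return [qs.index(q) for q in qs]

def response_identity(MM, qs):
    return [MM[q] for q in qs]           # the responses of each query

def response_length(MM, qs):
    return [len(MM[q]) for q in qs]      # the length of each response

def seq_response_length(MM, qs):
    return sum(len(MM[q]) for q in qs)   # total volume over the sequence

def non_repeating(MM, qs, pattern):
    # Sub-pattern restricted to the queries that appear exactly once.
    once = [q for q in qs if qs.count(q) == 1]
    return pattern(MM, once)

MM = {"a": [1, 2, 3], "b": [4], "c": [5, 6]}
qs = ["a", "b", "a", "c"]
# query_equality(qs) == [0, 1, 0, 3]; seq_response_length(MM, qs) == 9
```

Note that the query equality, restricted to non-repeating queries, carries no information, which matches the remark above that its non-repeating sub-pattern is nothing.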
We view the additional pattern as split into the non-repeating and repeating sub-patterns I just described. Then we introduce a new compiler that we call the cache-based compiler, or CBC for short; we will see the reason behind its name later on. When we run the CBC, it outputs a new STE construction with a leakage profile in which the query leakage is only the non-repeating sub-pattern of the additional pattern. We will see why this is important. Our techniques and contributions are actually better represented in the form of a pipeline, where the CBC suppresses the query equality and the repeating sub-pattern. In terms of efficiency, which is one of the most important aspects, our CBC compiler only induces an additive polylogarithmic overhead, contrary to the multiplicative polylogarithmic overhead of ORAM simulation. However, in order for this to work, the CBC requires the base STE construction to be rebuildable, and, as we know, most STE constructions are not rebuildable. So we had to introduce a new technique that we call the rebuild compiler, or RBC for short, that makes any STE construction rebuildable. It has two main advantages. The first is that it preserves the query leakage: as you can note, the query leakage is the same before and after compilation. The second is that it preserves the scheme's query efficiency. However, it adds a super-linear rebuild cost. Another point I want to emphasize: notice that the leakage profile of the final construction in the pipeline is only the non-repeating sub-pattern, and this non-repeating sub-pattern is equal to the non-repeating sub-pattern of the first construction we put into the pipeline. So, what does this mean?
It means that if we want a construction with a very, very small leakage profile, it is sufficient to reduce the non-repeating sub-pattern of our base construction. Based on this observation, we designed a new construction that we call the piggyback scheme, or PBS for short, which, as far as we know, is the first construction that hides the response length. Note that, for example, if we use ORAM simulation, it is not trivial to hide the response length without using some kind of padding. This construction can be of independent interest, especially in the context of recent attacks that exploit the volume pattern. However, it introduces a new trade-off, which is query latency. Now, when we use PBS at the beginning of our pipeline, we obtain two possible instantiations at the end of the pipeline. The first is AZL, whose query leakage is equal to the sequence response length; one thing I want to emphasize is that this is a very, very small amount of leakage. The second construction is FZL, which leaks nothing; however, it does not achieve perfect correctness, okay? So let us now focus on the CBC compiler. Our approach to suppression was inspired by, and is based on, the square-root ORAM solution of Goldreich and Ostrovsky, which I recall here very briefly. We have a main memory and a cache that can hold up to lambda elements. Whenever the user wants to access an element, it always starts by reading the entire cache. If the element is not there, which is the case in this example, it goes to the main memory, accesses the real element, and then inserts it into the cache. Now assume the user wants to access the same element again. As I said, we always start by reading the cache entirely, and in this case the element already exists in the cache.
So what the user does is simply access a dummy element in the main memory and then reinsert the element into the cache, okay? And after lambda accesses, the main memory is reshuffled and the cache is emptied. We noticed that the square-root ORAM solution can be reinterpreted through the lens of structured encryption. What do we mean by that? First, notice that the main memory can be seen as an encrypted array, and whenever we access an element, we access it through a PRP, which is deterministic. What does that mean? It means that whenever we access the same element again, we leak the query equality. On the other hand, the cache is actually an encrypted dictionary, and if you recall from the previous slide, whenever we access this cache, we access it entirely. This is basically equivalent to building a zero-leakage trivial dictionary, where by zero-leakage we mean a dictionary whose query leakage is nothing, okay? Now assume the user accesses the element with index 50 in the encrypted array. If the user accesses the same element again, it will of course leak the query equality, because we are evaluating the same PRP. So what the square-root ORAM solution does is leverage the zero-leakage queries to the zero-leakage dictionary in order to suppress the query equality that would otherwise occur on the main memory, which is only ever accessed at real and dummy positions. What we did is generalize the square-root ORAM solution to apply to more complex data structures, such as encrypted multi-maps, encrypted dictionaries, and so forth. However, in order to achieve that, we identified several requirements. First, the encrypted data structure has to be rebuildable. Second, the data structure has to be extendable and safe; I will explain in the next slide what we mean by extendability and safety.
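The cache-then-dummy access pattern just described can be sketched as follows. The class name, the dummy naming scheme, and the in-memory `trace` of server-visible accesses are all illustrative, and the reshuffle at the end of an epoch is omitted; a real construction would also encrypt everything.

```python
# Toy sketch of square-root-ORAM-style access: scan the whole cache, then
# touch either the real element or a fresh dummy in main memory, and
# reinsert into the cache. Within one epoch, no main-memory location is
# ever touched twice, which suppresses the query equality.
class SqrtAccess:
    def __init__(self, memory, lam):
        self.memory = dict(memory)        # "main memory": index -> value
        self.dummies = [f"dummy{i}" for i in range(lam)]
        self.cache = {}                   # holds up to lam elements
        self.lam = lam
        self.count = 0
        self.trace = []                   # locations the server observes

    def access(self, idx):
        hit = idx in self.cache           # reading the ENTIRE cache leaks nothing
        if hit:
            target = self.dummies[self.count]   # touch a fresh dummy location
            value = self.cache[idx]
        else:
            target = idx                        # touch the real location
            value = self.memory[idx]
        self.trace.append(target)
        self.cache[idx] = value                 # reinsert into the cache
        self.count += 1
        if self.count == self.lam:              # epoch over: rebuild
            self.cache, self.count = {}, 0      # (reshuffle omitted in this toy)
        return value
```

Accessing the same index twice produces two distinct locations in `trace`, so an observer of the main memory cannot tell that the query was repeated.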
And third, we want the base scheme to have a very, very small non-repeating leakage, for our approach to pay off. So what do we mean by extension? By extension, we mean adding some dummy elements to the query space of the original data structure, and we call it a lambda-extension if we add lambda dummies, in such a way that when we query the extended data structure with one of the dummies we added, the answer is always nothing, i.e., a dummy. While this seems straightforward, it actually is not, because one needs to handle dummies with care, especially since the leakage profile of the original construction was not designed with dummies in mind. If we are not very careful, the leakage profile might reveal information that helps the adversary distinguish between real and dummy elements. So we introduced the notion of a safe extension, by which we mean, at a very high level, that the setup leakage of the extended data structure leaks at most the setup leakage of the original data structure, and the query leakage of the extended data structure leaks at most the query leakage of the original data structure. And with that, I am done with the CBC, at a very high level. Unfortunately, we do not have time to go over the RBC, so I will jump to PBS. The PBS construction is based on a data transformation that works as follows; for simplicity, I will take the case of a multi-map. We have our multi-map, where each label is associated with a tuple, and the tuples have different sizes. We pad the tuples of this multi-map with dummies in such a way that the size of each tuple becomes a multiple of a batch size that we have fixed, where the batch size in this example is three. So here we have padded the first tuple with two dummies, so its size is now six, and we have padded the second label's tuple with one dummy, so its size is three.
And here we do not have to pad, because the size is already three. Next, we create a dictionary data structure with an entry for every batch: two entries for the first label, because its tuple has six elements that we divide into two batches, then one entry for label two and one entry for label three. This is the dictionary structure we have generated. The setup algorithm of PBS is very simple. It takes as input a security parameter and our multi-map, performs the data transformation I just presented, and outputs a key, an encrypted dictionary, and a state. The state is a simple dictionary, kept locally, that stores the association between every label and the number of batches we have generated for it. For example, label one maps to two batches, label two to one, and label three to one. Now consider a query sequence composed of two labels, label one and label two. How does the token algorithm work? The token algorithm takes as input the key, the state, and a label, label one in this case because it is the first one. The first thing we do is go to the state and fetch the number of batches associated with label one, which here is two. Then we instantiate a queue, in which we put two labels, one for each of the two batches. We then send the token for the first batch of label one to the server, and update the queue by removing whatever we have sent. The get algorithm of PBS simply runs the get algorithm of the underlying encrypted dictionary, which takes as input the encrypted dictionary and the token we sent, and outputs the answer. Then there is a second token, for the second label, which is now label two. As before, we fetch the number of batches from the state, which is one now, because label two is associated with just one batch, and then we update the queue.
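The padding-and-batching transformation described above can be sketched like this, assuming a batch size B. The (label, batch-index) key format and the DUMMY symbol are illustrative choices of mine, not the paper's exact encoding.

```python
# Sketch of the PBS data transformation: pad each tuple of the multi-map
# with dummies up to a multiple of the batch size B, split it into
# batches, and store one dictionary entry per batch. The client keeps a
# local state mapping each label to its number of batches.
import math

DUMMY = "*"

def pbs_transform(MM, B):
    DX, state = {}, {}
    for label, tup in MM.items():
        n = max(1, math.ceil(len(tup) / B))         # number of batches
        padded = list(tup) + [DUMMY] * (n * B - len(tup))
        for i in range(n):
            DX[(label, i)] = padded[i * B:(i + 1) * B]
        state[label] = n                             # kept locally by the client
    return DX, state

# The talk's example: tuples of sizes 4, 2, and 3 with batch size 3.
MM = {"l1": [1, 2, 3, 4], "l2": [5, 6], "l3": [7, 8, 9]}
DX, st = pbs_transform(MM, B=3)
# st == {"l1": 2, "l2": 1, "l3": 1}; every DX value has length exactly 3
```

Since every dictionary entry has the same size B, each individual response reveals nothing about the true response length of the queried label.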
As you can see, label two is now in the queue, annotated with one, meaning it has only one batch; we just add it to the queue. Then we run the token algorithm, but on the batch that is already pending: we do not apply the token algorithm to the label we have just taken as input, because we first need to finish querying the previous label. We output the token, which is sent to the server, and update the queue, which now contains only the last label. Then, as before, we run the get algorithm of the encrypted dictionary, which outputs the response. Finally, because we still have an element in the queue, we run the token algorithm on nothing, just to flush the queue: it takes the last label, outputs the token, and updates the queue, which is now empty. The get algorithm, as before, takes the token and outputs the response. As you can notice, all responses have size exactly three. But as you may also have noticed, we had a query sequence composed of two queries, yet we had to run the token algorithm three times. So there is a notion of query latency: we need to wait to get our answers. It is therefore important to study the latency of PBS. In the worst case, a query sequence of size t has latency t times (the maximum response length divided by the batch size, minus one), okay? This happens when the query sequence is composed entirely of labels associated with the maximum response length: imagine that all of them have the same, maximal response length. Fortunately for us, this does not happen in practice. In real-world settings, queries and response lengths are Zipf-distributed, and in such settings we can show that the query latency is much, much smaller than that.
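The queue-based scheduling just walked through can be sketched as follows. `run_sequence` is a hypothetical helper of mine that replays a query sequence against the client's state and counts the extra flush rounds, which is the latency discussed above.

```python
# Toy sketch of PBS's queue-based token scheduling: each query enqueues
# all of its batch labels, but each token round emits only the oldest
# pending batch, so later responses "piggyback" behind earlier queries.
from collections import deque

def run_sequence(state, queries):
    queue, emitted = deque(), []
    for q in queries:
        for i in range(state[q]):        # enqueue every batch of this label
            queue.append((q, i))
        emitted.append(queue.popleft())  # one token per query round
    while queue:                         # flush rounds: this is the latency
        emitted.append(queue.popleft())
    latency = len(emitted) - len(queries)
    return emitted, latency

# The talk's example: label one has two batches, label two has one.
state = {"l1": 2, "l2": 1}
tokens, lat = run_sequence(state, ["l1", "l2"])
# tokens == [("l1", 0), ("l1", 1), ("l2", 0)]; lat == 1 extra round
```

In the worst case, where every one of the t queried labels has the maximum number of batches n, this schedule emits t * n tokens for t queries, matching the t * (n - 1) latency bound stated above.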
And we can show this with high probability, when we fix the parameters in a specific way, okay? Before I finish, I just want to give you a very high-level idea about our AZL construction in terms of efficiency. What I want to highlight on this slide is that we showed, in this work and in the paper, that our techniques, and AZL in particular, can achieve query efficiency that is little-o of the query efficiency of ORAM simulation, under some assumptions, of course. You do not need to go over the details of these assumptions now; you can look at them later when you read the paper. At a very high level, they are natural assumptions from information retrieval, basically assumptions about how queries and response lengths are distributed, okay? So, to conclude. First takeaway: we have introduced a new direction in encrypted search, namely leakage suppression. We have introduced a general framework that suppresses the search pattern, i.e., the query equality. We have introduced the first solution that hides the response length, known as the volume pattern in the literature. We have introduced a general compiler that makes any STE scheme rebuildable, which is also of independent interest and can be reused in future work. And we have introduced the first scheme that leaks at most the sequence response length; as I said, this is a very small leakage, and we believe it is very hard to exploit. We have also introduced the first scheme that leaks "nothing", where nothing is in quotes because the scheme does not achieve perfect correctness. And this is an interesting new trade-off that we have introduced in this paper: can we do better in terms of query latency versus security? There is room for more research in this respect.
As for future work we think is important to tackle, an immediate task is to come up with dynamic versions of the CBC and RBC, which our current constructions are not. We also want to reduce the CBC overhead to achieve better efficiency. A next task that will require more work is to design compilers that suppress other leakage patterns. Thank you.