Thank you for the introduction. My talk today is about how to suppress volume leakage in structured encryption using computational assumptions. This is joint work with Seny Kamara. Okay, let's start by first recalling what structured encryption, or STE for short, is. STE is a cryptographic primitive introduced by Chase and Kamara in 2010 that allows a user to encrypt a data structure in such a way that it can be privately queried later on. In particular, there is a setup algorithm that takes as input a security parameter and a data structure and outputs a key and an encrypted data structure. The encrypted data structure is sent to the server. At query time, the user runs a token algorithm that takes as input the key and a query and outputs a token. The token is sent to the server. Once the server receives the token, it runs a query algorithm that takes as input the token and the encrypted data structure and outputs an answer, which is sent back to the user. Okay? We call the information that the server learns at setup time the setup leakage, and it may include, for example, the size of the data structure. We call the information that the server learns at query time the query leakage, and it may include, for example, the search pattern or the access pattern. At a very high level, we say that an STE scheme is (LS, LQ)-secure if, one, it doesn't reveal any information about the data structure beyond the setup leakage LS; and two, it doesn't reveal any information about the data structure and the queries beyond the query leakage LQ. For a more detailed security definition, please refer to the CK10 paper. Okay. So when designing or analyzing structured encryption schemes, there are three main dimensions one should pay attention to: efficiency, security, and expressiveness. And in fact, structured encryption has greatly evolved during the last two decades.
A lot of works have investigated, for example, how to design more efficient schemes, how to design more expressive schemes, or how to attack or design more secure schemes. However, one aspect that hasn't received a lot of attention and is poorly understood is leakage. There are three main directions along which one can reason about leakage and that can help us better understand it: cryptanalysis, measure, and suppression. I'm going to detail each of these directions in the following slides. So what do we mean by cryptanalysis in encrypted search? It means the following: given a leakage profile, we want to design attacks that recover either the queries or the data under some assumptions. The goal of cryptanalysis is to empirically learn the impact of a specific leakage profile in the real world. The main limitation of cryptanalysis, however, is that the gap between the assumptions and reality can get wide. The second direction is measure. What we mean by this is that, given a leakage profile, we want to somehow quantify, for example in bits, a specific leakage pattern. The goal here is to theoretically compare leakage patterns. The main limitation of this approach is that there may be no meaningful total order, and this is something we are currently working on. The third direction, suppression, which we believe is one of the most important, means the following: given a leakage profile, we want to design a compiler or transform that suppresses a specific leakage pattern. The goal is to build tools that can suppress various leakage patterns. The main limitation of this approach is that it incurs some overhead. Okay? As this talk is about leakage suppression, I will recall two main approaches that we introduced last year at Crypto: compilation and data structure transformations.
Let's start with compilation. Compilation is a mechanism that takes as input a structured encryption scheme with a leakage profile Lambda such that the query leakage is equal to two patterns, pattern one and pattern two. The compiler outputs a new structured encryption scheme with a new leakage profile Lambda prime such that the query leakage is equal only to pattern one. That is, we have suppressed pattern two. The second approach is the data structure transformation, which works as follows. Given a data structure, we apply our transform, which outputs a new data structure that we give as input to a structured encryption scheme, which itself outputs an encrypted data structure. The structured encryption scheme here has a leakage profile Lambda, but the resulting scheme, which includes the transform, has a leakage profile Lambda prime such that the query leakage is only equal to pattern one. So we have suppressed pattern two. A valid question to ask is: are there any other approaches to suppress leakage? The answer is yes, there are. As I have just discussed, the KMO paper introduced two approaches, black-box compilation and data structure transformation, where the data structure transformation suppresses a leakage pattern against an unbounded adversary. In this work, we consider a new data structure transformation that suppresses leakage against a bounded adversary. This is actually important: one of the major findings in our work stems from the question of whether it is possible to design a structured encryption scheme that does leak something, such that an unbounded adversary can still learn the leakage pattern but a bounded adversary cannot. And if we go back to our data structure transformation slide, this new transform is very similar at heart.
So there is a data structure, there is a transform that outputs a new data structure, which is given as input to a structured encryption scheme, which itself outputs an encrypted data structure. But now the resulting structured encryption scheme still leaks a query leakage composed of two patterns, pattern one and pattern star, where pattern star, in the eyes of a bounded adversary, is equivalent to nothing. Okay. In the remaining parts of this talk, I will show how to leverage this new capability to design STE schemes that hide the response length. I will explain what response length means in the next two slides, which I'm dedicating to some brief background. Let's start with a simple yet fundamental data structure recap. A dictionary data structure maps a label to a value; here we map a keyword to a file identifier. There is a get operation that, given a label, outputs the corresponding value. A multi-map data structure maps a label to a tuple of values; for example, here we map a keyword to a tuple of file identifiers. Similarly, there is a get operation that, given a label, outputs the corresponding tuple. The response length pattern, known in searchable symmetric encryption as the volume pattern, is very simple: it is a pattern that occurs at query time and is simply equal to the length of the response, or the answer. Hiding the response length is actually very challenging, especially if we want to preserve efficiency. What I'm going to do in the upcoming slides is show two ways to hide the volume pattern, but as you will see, they are very inefficient. The first approach we call the naive padding approach. We take a multi-map data structure and we just pad it, adding dummies to all tuples in such a way that all responses have the same length.
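The naive padding approach can be sketched in a few lines; the helper name is mine, and `dummy` stands in for dummy identifiers that a real scheme would make indistinguishable from real ones:

```python
# Naive padding: pad every tuple with dummies up to the maximum
# response length, so every query returns the same number of values.
# Storage blows up to (#labels x maximum response length).
def naive_pad(mm, dummy="dummy"):
    max_len = max(len(t) for t in mm.values())
    return {w: list(t) + [dummy] * (max_len - len(t)) for w, t in mm.items()}
```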
Then we take this transformed data structure and feed it to a multi-map encryption scheme, any standard multi-map encryption scheme that does leak the response length. The query equality, qeq, here refers to the search pattern in the SSE (searchable symmetric encryption) literature. The resulting scheme, STE prime, is a multi-map encryption scheme that hides the response length, and it's easy to verify this. In terms of asymptotics, the query complexity of such an approach is not that good: it is linear in the maximum response length. And the storage complexity is terrible: it is linear in the number of labels times the maximum response length. The good thing about this approach is that it's non-interactive, and as I'm going to detail with the second approach, which is interactive, this is a plus. The second approach, which we call the leakage-free dictionary approach, starts again with a multi-map data structure, which we transform into a dictionary. Then we take this dictionary structure and add dummies to it; the number of dummies we add is equal to the maximum response length minus one. Then we feed this dictionary data structure to a dictionary encryption scheme that has the property of being leakage-free, and we can instantiate such a primitive using oblivious RAM, in such a way that the resulting structured encryption scheme hides the response length, as long as, whenever we fetch a tuple, we fetch along with it a number of dummies such that the total number of values we fetch is equal to the maximum response length. In terms of asymptotics, this naive approach is actually worse in terms of query complexity compared to naive padding, but it has great storage complexity. It's also interactive, though, which can be bad for some scenarios. Okay.
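The leakage-free dictionary approach can be sketched as follows. The flattening into (keyword, counter) labels and the dummy labels are my own illustrative encoding, and the leakage-free dictionary encryption scheme itself (e.g., ORAM-based) is abstracted away:

```python
# Flatten the multi-map into a dictionary plus max_len - 1 dummy
# entries; a get for w then touches exactly max_len dictionary
# entries (its real ones plus dummies), hiding the response length.
def mm_to_dx(mm):
    max_len = max(len(t) for t in mm.values())
    dx = {(w, i): v for w, t in mm.items() for i, v in enumerate(t)}
    for j in range(max_len - 1):
        dx[("dummy", j)] = None          # shared dummy entries
    return dx, max_len

def get(dx, max_len, w, n):
    # n is the true response length for w, known to the client
    real = [dx[(w, i)] for i in range(n)]
    _ = [dx[("dummy", j)] for j in range(max_len - n)]  # dummy fetches
    return real
```

Storage is now linear in the number of pairs (plus max_len - 1 dummies), which is why this approach has good storage complexity even though every query costs max_len fetches.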
A valid question to ask is: can we achieve the best of both worlds? The answer is yes. In the remaining parts of this talk, I will detail the pseudo-random transform, or PRT for short, which is a data structure transformation that has better query complexity and storage complexity compared to the naive padding approach, but has the disadvantage of being lossy. I won't have time to talk about VLH, the structured encryption scheme that builds on top of the PRT and hides the response length, but it follows in a straightforward way from the PRT given the slide I presented on data structure transformations. I will, however, detail the dense subgraph transform, or DST for short, which is a data structure transformation that is non-lossy and has better storage overhead, but has quite worse query complexity compared to the PRT. I won't have time, unfortunately, to talk about AVLH, the structured encryption scheme that we build on top of the DST. I also won't have time to talk about how to make VLH dynamic, but please refer to our paper to learn more about these three parts. So, let's start with the PRT. Please don't pay attention to the formalism for now, as I'm going to describe this transform through the following illustration. We have two parameters, lambda and nu, where lambda is the minimum response length and nu is the output size of a PRF. We take our multi-map, and the PRT outputs a new multi-map data structure such that the new response length is equal to lambda plus a PRF evaluated on the keyword concatenated with the old response length. Given this sum, we decide whether to pad or truncate a specific tuple. Let's look at an example. For keyword one, w1, the response length is equal to 3: id1, id3, id4. We evaluate the PRF on keyword one concatenated with 3, which here, for example, is equal to 0.
This is just for the sake of the example. We take the 0 and add it to 1, where 1 is the value we set for lambda, and the result is 1. We compare 1 to 3, the old response length, and see that it's smaller, so we have to truncate the tuple by two values. This is why we have removed two values in the new multi-map. For keyword two, we do the same: we apply the PRF, the result is now 2, we add it to 1, giving 3, and 3 is larger than the old response length, which is 1, so we have to pad: we add dummies. For keyword three, we do the same. In the PRT, we also have a ranking function, which basically ranks the file identifiers within a tuple. Its purpose is that it allows us to lose only the least significant file identifiers and preserve the most significant ones, which is very important from a practical point of view. A valid question to ask: what about the number of truncations? What about the storage overhead? For this analysis, we had to assume a special type of multi-map that we call Zipf-distributed multi-maps, which I'm going to detail now. What do we mean by this? A Zipf-distributed multi-map has response lengths that are Zipf-distributed, where the i-th response length is equal to this formula. It's not important to parse it; the illustration gives you an idea of how the response lengths are distributed. The choice of the Zipf distribution was not arbitrary: it's very common to find Zipf-distributed data sets in practice. We actually ran Enron just to check the response length distribution. Enron is a data set composed of roughly half a million emails, and as you can see in the graph here, the response length distribution is a power law, or a Zipf distribution, sorry, with some parameter. Okay. So, this slide basically summarizes our main finding.
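To recap the transform itself before the analysis, the PRT's padding/truncation rule can be sketched like this. The HMAC-based PRF, the reduction mod 2^nu, and the assumption that tuples are already ranked (most significant identifiers first) are illustrative choices, not the paper's exact formalism:

```python
import hmac, hashlib

# PRT sketch: the new response length for keyword w with old length n
# is lam + F_K(w || n) mod 2**nu; the (ranked) tuple is padded with
# dummies or truncated to that length.
def prt(mm, key, lam, nu, dummy="dummy"):
    out = {}
    for w, t in mm.items():
        n = len(t)
        r = int.from_bytes(
            hmac.new(key, f"{w}|{n}".encode(), hashlib.sha256).digest(),
            "big") % (2 ** nu)
        new_len = lam + r
        if new_len < n:
            out[w] = list(t[:new_len])                   # truncate: lose the
        else:                                            # least significant ids
            out[w] = list(t) + [dummy] * (new_len - n)   # pad with dummies
    return out
```

Because the new length depends only on lam, nu, and a pseudorandom value, it no longer reveals the true response length; the price is that some tuples are truncated, which is exactly the lossiness discussed above.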
I don't want you to get distracted by the details, but the main takeaway from this slide is that we were able to show that the storage overhead of the PRT is alpha times the storage overhead of the naive approach, where alpha is strictly smaller than one. You can think of alpha as a multiplicative storage-reduction factor. With respect to truncations, we were able to show that the number of truncations is sublinear in the number of labels, equal to this formula here. I just want to note that this analysis is not tied specifically to Zipf: we can do the same analysis for other power-law distributions, and most probably we would get similar results, but this is just to give you an idea. So, as you can see, we have some truncations. A valid question to ask: how can we get rid of these truncations? Is it possible to have a data structure transformation that does not incur truncations? This is what we are going to cover with the DST, or dense subgraph transform. The name will make sense, I promise, by the end of this talk. We take a multi-map data structure, and we view the multi-map as a bipartite graph, where the top vertices are the keywords, or labels, of the multi-map and the bottom vertices are bins. We have here, for the sake of the example, four bins. We also instantiate a state data structure; think of it as a dictionary data structure that we keep on the client side, and you will see how the transform works. We take our first keyword and pick three bins uniformly at random. Why three? Three is the maximum response length, and something that I forgot to mention is that, through this bipartite graph, what we are trying to do is simulate a random Erdős–Rényi graph. Once we select these bins, we take the file identifiers in the tuple and insert them into the selected bins, one in each bin.
In the state, we map the keyword to all the bin identifiers we selected in the previous step. We do the same for keyword two. The difference here is that keyword two has a response length equal to only one, but we still have a number of edges equal to three because, as I said, we want a random Erdős–Rényi graph. We just put a single file identifier in one of the three selected bins. We also update the state similarly, mapping the keyword to the bin identifiers. We do the same for the third keyword, and we finish by padding all the bins to have the same load. Something you may have already noticed is that the size of this state is terrible: it is linear, actually equal to the size of the naive padding approach. So how can we remove this downside? We use the following trick. At a very high level, we replace the random generation of edges with a pseudorandom generation. How does it work? We first sample a random value, and we have a PRF that we evaluate on this random value concatenated with counters one, two, three, where three is the maximum response length, and the PRF outputs the bin identifier. As you can notice, here we have a collision. So what we do is repeat this process until no collision is found. Here we no longer have a collision, so we insert the file identifiers into the bins. Now, in the state, instead of storing a mapping between the keyword and all the bin identifiers, we just store the keyword mapped to the random value that we used for the edge generation. We do the same for keyword two: we put the identifier and update the state, and similarly for keyword three and so on and so forth.
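The pseudorandom edge-generation trick can be sketched as follows. The HMAC PRF, the seed length, and the resampling loop are illustrative choices, and I assume the number of bins m is at least the maximum response length so that distinct bins exist:

```python
import hmac, hashlib, os

def derive_bins(key, seed, max_len, m):
    # edge i for this keyword goes to bin F_K(seed || i) mod m
    return [int.from_bytes(hmac.new(key, seed + bytes([i]),
                                    hashlib.sha256).digest(), "big") % m
            for i in range(max_len)]

def dst(mm, key, m, dummy="dummy"):
    max_len = max(len(t) for t in mm.values())
    bins, state = [[] for _ in range(m)], {}
    for w, t in mm.items():
        while True:                        # resample the seed on collision
            seed = os.urandom(16)
            bs = derive_bins(key, seed, max_len, m)
            if len(set(bs)) == max_len:
                break
        for v, b in zip(t, bs):            # real ids go into the first bins
            bins[b].append(v)
        state[w] = seed                    # state stores only w -> seed
    load = max(len(b) for b in bins)
    for b in bins:                         # pad all bins to the same load
        b.extend([dummy] * (load - len(b)))
    return state, bins
```

The point of the trick is visible in the last lines: the client state maps each keyword to a single short seed rather than to all of its bin identifiers.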
We pad, and now you can notice that the size of the state is way better: it is just linear in the number of keywords, instead of being equal to the size of the naive padding approach. Great. So the DST outputs three things: a key and a state, which we keep on the client side, and a dictionary data structure that maps each bin identifier to its content. When we want to perform a get for a keyword w1, the client first retrieves the corresponding randomness from the state, then recomputes the bin identifiers by evaluating the PRF three times, and then just fetches the corresponding bin contents from the dictionary and finds the result. Okay. One natural question is: what about the load of the bins? This is the most important question to ask about this transform, to understand how it does in terms of storage. We were able to show that the bin load is equal to this formula, but the takeaway is that, with high probability, the transformed multi-map has size big O of n, where n is the number of pairs, which is the same size as the multi-map we were given as input. So there is asymptotically no additional overhead. The size of the state, as I mentioned, is linear in the number of labels, which is dominated by the number of pairs, which is great. All right. While we were working on this, we noticed something very nice: we can further reduce the storage overhead if we leverage a computational assumption. What we leverage here is an assumption called planted dense subgraph, an assumption that has already been used by Applebaum, Barak, and Wigderson to design public-key cryptography, and by Arora et al. to study the computational complexity of financial products.
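Going back to the get procedure just described, it can be sketched like this. To keep the sketch self-contained, I make the illustrative simplification that values in the bins are tagged with their keyword so the client can recognize its own identifiers; the actual scheme recovers the positions differently:

```python
import hmac, hashlib

def derive_bins(key, seed, max_len, m):
    # recompute the keyword's bin identifiers: F_K(seed || i) mod m
    return [int.from_bytes(hmac.new(key, seed + bytes([i]),
                                    hashlib.sha256).digest(), "big") % m
            for i in range(max_len)]

def get(key, state, bins, max_len, w):
    seed = state[w]                          # client-side state lookup
    ids = derive_bins(key, seed, max_len, len(bins))
    fetched = [bins[b] for b in ids]         # fetch max_len bins
    # keep only this keyword's identifiers (illustrative tagging)
    return [v for b in fetched for (kw, v) in b if kw == w]
```

Note that the client always fetches exactly max_len bins regardless of the true response length, which is what hides the volume.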
At a very high level, the assumption is as follows. On one hand, you have a random Erdős–Rényi graph; on the other hand, you have a random Erdős–Rényi graph into which we plant a dense subgraph. A bounded adversary cannot distinguish between the two. We leverage this assumption in this work, which is quite nice because it gives another application of it in cryptography. So how does it work? As I said, or maybe forgot to say, this storage gain only works for a specific type of multi-map that I'm detailing here. I have changed the multi-map a little bit: this works if some labels have tuples with a non-empty intersection. Here you can notice that keyword one and keyword three have a non-empty intersection, where id2 and id4 appear in both. This is what we call a concentrated multi-map. So how does the DST work now? We have our bipartite graph, where the top vertices are keywords and the bottom vertices are bins. We start by first putting the concentrated part into the bins; you can think of it as working in reverse. We then generate the edges that map the keywords w1 and w3 to this concentrated part. Then we continue by generating the other edges, as I explained before for the DST, for example for keyword two. The idea, as you can notice, is that we have added the concentrated part only once. I'm running out of time. And this is basically the result: we are reducing the load of the bins. This is a contrived example, but as you can see, all of the bins now contain only one value, while in the previous example they contained two. This is just a contrived example to show that we are gaining something.
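As a tiny illustration of what "concentrated" means, one can compute the shared part of a set of tuples. This helper is hypothetical, not the paper's algorithm, but it is exactly the quantity that the DST stores only once and shares via edges from each of the intersecting keywords:

```python
# The concentrated part of a set of keywords: identifiers that appear
# in all of their tuples, and can therefore be placed in the bins once
# and pointed to by edges from each of those keywords.
def concentrated_part(mm, keywords):
    common = set(mm[keywords[0]])
    for w in keywords[1:]:
        common &= set(mm[w])
    return common
```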
And we can actually show asymptotically that the load of the bins is reduced by the size of the concentrated part that we identified, so it is better than the previous bin load. All right. So, to conclude, what are the takeaways? I want to start with something I want to emphasize. The volume pattern is an important pattern that has recently been leveraged, by Kellaris et al. and by Grubbs et al., to attack structured encryption based on the volume alone. So this is an important work that shows how to design structured encryption schemes that are not prone to these attacks. Hopefully this talk has shown that it is not trivial to suppress the volume pattern: we have seen trivial approaches that are terrible in terms of efficiency. I also want to stress that this line of work, hiding the volume or hiding other patterns, is a very important research direction in structured encryption and encrypted search in general. We started this in 2018 with the leakage suppression work, and this work is another step in this direction. Hopefully you didn't fall asleep, and you remember that we have seen the PRT and the DST transforms. Something I also want to emphasize: if you go back a little bit and think about suppressing the volume pattern, intuitively you might think that hiding volume information is, information-theoretically, not possible without padding. But we got around this using computational assumptions, and this is, in a sense, the first work that does this for any pattern, and for volume in particular. Hopefully this line of work will help us suppress other patterns in structured encryption and encrypted search in general. Another point that we introduce in this work is a new trade-off between correctness and security.
And I will finish by mentioning that this line of work will hopefully help us defend against many, if not all, of the existing attacks that we know of in the literature: IKK, CGPR, and so on and so forth. Thank you. So, we don't have time for questions; let's take the questions offline.