Hi, my name is Marilyn, and I'm here today to present our work, titled Structured Encryption and Dynamic Leakage Suppression. This is joint work with Seny Kamara and Tarik Moataz.

Why are data security and privacy important? It seems like almost every day we hear in the news of yet another data breach that compromises the personal data of millions of users. On this slide are just a few of the biggest data breaches of 2020 and 2021, each compromising millions of records and causing billions of dollars' worth of damage, and there are many more. One possible solution to this ever-growing problem is to encrypt all client data before uploading it to any external, untrusted server. However, once the data is uploaded in encrypted form, it should still be usable by the client. For instance, the client might still want to run queries over their data and receive the correct responses from the server without having to download and decrypt all of their data. Is it possible to do this efficiently? This question is posed and studied in the field of encrypted search.

There are three main aspects to any encrypted search scheme: the functionality of the scheme, or how many different types of queries it supports (for instance, a scheme might support Boolean combinations of queries, or range queries); the security of the scheme, or how much meaningful information is revealed to the adversarial server; and the efficiency of the scheme, which can be measured using metrics such as server-side storage or communication cost. Several cryptographic primitives can be used to construct an encrypted search scheme, with varying trade-offs between these three aspects. For example, fully homomorphic encryption would allow the client to compute any function over the data with strong security guarantees, but it is not very efficient. On the other hand, property-preserving encryption also allows the client to compute many functions over the data, and it is very efficient.
However, it also reveals a large amount of information about the data to the server. In between the two, we have structured encryption, which offers reasonable functionality and efficiency while offering security in between that of fully homomorphic encryption and property-preserving encryption. Structured encryption is a general primitive that allows the client to upload any encrypted data structure and run queries on it. Structured encryption also allows for operations beyond queries, for example adds or edits, which change the underlying data structure.

The security of a structured encryption scheme is often expressed with respect to a persistent adversary: an adversary who controls the server, observes the entire execution of the protocol, and during this execution observes meaningful information about the data structure and the queries. This information is referred to as leakage, and leakage has been studied in many different ways over the years. One line of work asks how much information is leaked: is it possible to quantify the leakage, in terms of the number of bits, for various cryptographic primitives? Another line of work, started in 2012, asks whether the leaked information can be used to attack structured encryption schemes. Yet another line of work asks whether leakage can be eliminated completely. This work is known as leakage suppression, and it is the subject of our talk today.

Before I continue with leakage suppression, I'd like to set up some preliminaries for terms we will use in this talk. Common data structures that we will be talking about include an array, or RAM, which is a set of values in memory that can be read and written, usually indexed by position in the array. Another data type is the dictionary, which maps labels to values; each label can be used to retrieve its corresponding value.
Yet another data type is the multi-map, which maps labels to tuples of values, where the tuples can be of varying lengths. As you can see, label one is mapped to a tuple of three values, but label four is mapped to a tuple of only one value. That is the multi-map structure.

I would also like to introduce some leakage patterns that we'll be talking about throughout this talk. Given a client and, say, an encrypted multi-map (an EMM) on the server, what is the query equality leakage? The query equality reveals to the server whether two queries to the multi-map are on the same label. Say the client issues a sequence of three queries: l1, then l4, then l1 again. Even if the server doesn't know what the queries are, if it can identify that the first and third queries were for the same label while the second was not, then the scheme leaks the query equality. Another leakage pattern we'll be talking about is the volume, which is how many values correspond to a query. For instance, if the client queries label one and the server can see that three encrypted values were returned, and then the client queries label four and the server can see that only one encrypted value was returned, then the scheme leaks the volume of the queries.

When dynamic operations are introduced, that is, operations that can change the underlying structure, such as adds, edits, and deletes, different types of leakage are also introduced which could reveal information to the server. One example is the operation identity, which is simply which operation the client is running. A dynamic scheme supports many different types of operations, such as querying, adding a new label, or editing an existing label. If the server can tell which operation is currently running, that is referred to as the operation identity leakage.
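The multi-map type and the volume pattern just described can be sketched in a few lines of Python. This is a hypothetical plaintext model, purely to illustrate the data type and what tuple-length (volume) leakage looks like; it is not an encrypted construction.

```python
# Illustrative plaintext multi-map: labels map to variable-length tuples.
class MultiMap:
    def __init__(self):
        self._data = {}

    def put(self, label, values):
        self._data[label] = tuple(values)

    def get(self, label):
        return self._data.get(label, ())

mm = MultiMap()
mm.put("l1", ["v1", "v2", "v3"])  # label one: a tuple of three values
mm.put("l4", ["v7"])              # label four: a tuple of one value

# Volume leakage: even if each value were encrypted, a server that returns
# one ciphertext per value reveals the tuple length of every queried label.
print(len(mm.get("l1")))  # 3
print(len(mm.get("l4")))  # 1
```

The point of the sketch is that the number of returned items is visible regardless of encryption, which is exactly the volume pattern.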
Similar to the query equality, there is a notion of operation equality, which reveals to the server whether two operations are performed on the same label. If the client queries label one, then queries label four, and then edits label one, and the server is able to tell that the first and third operations happened on the same label, then that leakage is referred to as the operation equality. Those are some of the data structures and leakage patterns we'll be talking about in this talk.

So we come back to the question: can leakage be eliminated completely? Over the years, there have been many works that aim to answer this question for different components of leakage. For the query equality pattern, a suppression framework was introduced in 2018. Similarly, work on volume hiding began in 2019, and there have been many volume-hiding schemes since. Why are these two patterns interesting? They are very common in structured encryption schemes; they show up in more complex schemes built out of simpler building blocks; and they are difficult to hide with simple approaches without introducing large inefficiencies into the scheme.

The query equality is one of the most common components of leakage. So can it be suppressed? A first idea is to use ORAM as a black box to suppress the query equality. What does this mean? Any data structure that is stored and queried by a client has to be converted to an array, along with the corresponding accesses to the array. During setup, the data structure has to be flattened out to form an array, and then any subsequent query to the structure has to be converted into a series of ORAM reads and writes before the response can be returned to the client. However, this solution incurs overheads in storage, communication, and the number of round trips required for the interaction.
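To make the query equality pattern concrete, here is an illustrative sketch, not the paper's construction: if a scheme were to send the server a deterministic keyed token for each queried label, the server could match repeated tokens without ever learning the labels themselves. The key, labels, and keyed hash standing in for a PRF are all made up for the example.

```python
import hashlib

def token(key: bytes, label: str) -> str:
    # Deterministic keyed hash standing in for a PRF evaluation.
    return hashlib.sha256(key + label.encode()).hexdigest()

key = b"client-secret-key"
queries = ["l1", "l4", "l1"]  # the client's (hidden) query sequence
tokens = [token(key, q) for q in queries]

# The server sees only the tokens, yet learns exactly which queries repeat:
equality_pattern = [[t_i == t_j for t_j in tokens] for t_i in tokens]
print(tokens[0] == tokens[2])  # True: queries 1 and 3 are on the same label
print(tokens[0] == tokens[1])  # False
```

This deterministic-token behavior is precisely what the suppression techniques discussed next are designed to eliminate.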
Further, there could still be some leakage to the server, including the volume of a query, depending on the underlying data structure. To address this, a line of work on oblivious data structures began creating custom ORAM schemes tailored to specific data structures, such as oblivious trees. So on one end there is a general-purpose solution, which is inefficient and sometimes leaky, and on the other end there are special-purpose constructions made for each data structure. It is worth asking whether there is a solution that provides something in between: can we suppress the query equality for more general data structures, more efficiently than black-box ORAM simulation?

This question was answered by the query equality suppression framework introduced in 2018. This framework was inspired by the seminal square-root ORAM construction of Goldreich and Ostrovsky. The authors noted that the square-root ORAM could be seen as a zero-leakage dictionary, the cache, being used to suppress the query equality leakage of an array, the main memory. This could be viewed as a leakage suppression framework, and in fact it could be generalized to more complex data structures than arrays, and therefore to higher-order structured encryption schemes. It turned out that this framework was more efficient than black-box ORAM simulation and comparable in efficiency to custom-made oblivious data structures. However, the framework only produced static schemes: even if the input scheme was dynamic, the output scheme would be static, without any query equality leakage.

Once again, it is natural to ask: is it possible to suppress the query equality leakage in a dynamic setting? Or even, is it possible to create a dynamic query equality suppression framework? In our work, we show that this is indeed possible. However, there were several challenges to dynamic leakage suppression.
The first of these was that, due to our techniques and the use of cache-based suppression from the static framework, the operation equality had to be suppressed. For example, if an add and an edit were on the same label of a multi-map, that would have to be hidden from the server. The second, more complex challenge was that leakages are correlated in the dynamic setting. For instance, the operation identity (which operation is occurring at the moment), in conjunction with the volume leakage, could reveal additional information about the query equality. Say the largest volume observed for any query was 100 results; then the server observes an add operation, and later observes a query with 101 results. For some input and query distributions, it is possible for the server to infer the query equality using these additional leakages. So our approach was to start with a volume-hiding scheme as input to the framework, so that the volume leakage would already be suppressed, and then we were able to suppress the operation identity and the operation equality. However, volume-hiding constructions at the time had limited dynamicity and could not support all dynamic operations, and therefore our framework had to upgrade the dynamicity of the underlying schemes.

So what does the dynamic suppression framework do, and how does it work? Once again, given a client and a server, say the client has a multi-map with four labels and the four tuples corresponding to them. Our construction has an epoch length lambda, here equal to three, and lambda controls two things during setup. First, three dummy (empty) entries are added to the multi-map, which we will refer to as the main structure. Second, a cache of size three, which can be thought of as a zero-leakage dictionary, is initialized, empty to start. Both of these structures together form the encrypted data structure and are sent over to the server.
Once the server holds the encrypted structure, the epoch begins. What is an epoch? An epoch consists of lambda operations on the structure, and in this case lambda is three, so up to three operations can occur in one epoch.

Say the first operation is a query on the label l1. The first thing the client does is read the cache and check whether l1 is in the cache. It is not, so the client reads the location where l1 is in the main structure: the fifth location. The client receives the tuple corresponding to l1 and then writes the cache back to the server. During the write, the client inserts l1 and its corresponding tuple into the cache.

Say next the client wants to add label five, a completely new label, to the multi-map. Once again, the client reads the cache and checks for the label. It does not exist, so the client queries a random location in the main multi-map, one of the dummy locations; in this case, it reads the first dummy, at the first location. The client then writes the cache back to the server, this time adding label five into the cache. Notice that the cache is collecting the updates to the main structure.

Now the client wants to edit the same label that it already queried. How does that work? Once again, the client reads the cache. l1 is in the cache, so the client reads a dummy from the main structure and edits l1 while writing back the cache.

At the end of these three operations, a query, an add, and an edit, where the query and the edit were on the same label, the server has seen that three random locations in the main structure were read and three entries were written to the cache, but the server cannot see inside the cache. And so one epoch is complete.
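The epoch just walked through can be sketched in plaintext as follows, assuming the access pattern described above: every operation looks the same to the server, a cache read, one main-structure read, and a cache write-back. The function name and the dummy bookkeeping are simplifications for illustration, not the paper's protocol.

```python
def run_operation(main, cache, dummies, op, label, values=None):
    _ = dict(cache)                   # 1) client reads the whole cache
    if label in cache:
        result = cache[label]
        _ = main[dummies.pop()]       # 2a) label cached: read a dummy instead
    elif op == "query" and label in main:
        result = main[label]          # 2b) read the label's real location
    else:
        result = None
        _ = main[dummies.pop()]       # 2c) adds of new labels read a dummy
    # 3) write the cache back, folding this operation's update into it
    if op == "query":
        cache[label] = result
    else:                             # add or edit
        cache[label] = tuple(values)
    return result

main = {"l1": ("v1", "v2", "v3"), "l2": ("v4",), "l3": ("v5", "v6"),
        "l4": ("v7",), "d1": (), "d2": (), "d3": ()}
cache, dummies = {}, ["d1", "d2", "d3"]

r = run_operation(main, cache, dummies, "query", "l1")
run_operation(main, cache, dummies, "add", "l5", ["v8"])
run_operation(main, cache, dummies, "edit", "l1", ["v1", "v2"])
print(r)              # ('v1', 'v2', 'v3')
print(sorted(cache))  # ['l1', 'l5']: the cache has collected the updates
```

Note how the query, the add, and the edit each trigger exactly one main-structure read and one cache write, which is why the server cannot distinguish them.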
At the end of an epoch, we have a rebuild operation. The high-level purpose of the rebuild is to regenerate the original state of the structure and the cache: the rebuild combines the updates into the main structure and empties out the cache so that, once again, the epoch can start at zero, the cache is empty, and the execution of operations can start all over again.

How does the rebuild work? The rebuild is divided into three stages: first extract-and-tag, then sort-and-shuffle, and finally the update phase. What happens in the extract-and-tag phase? The main challenge is that the client must be able to collect all the correct, updated entries for a new version of the multi-map, while simultaneously not revealing to the server which labels were updated, how many labels were found in the cache, or which ones they were. The client initializes a RAM on the server side; this RAM should just be thought of as an array. The client extracts each entry, first from the cache and then from the main multi-map, and adds them to the RAM one by one. Now each of these entries is inside the RAM, and as they were added, each was tagged with what we call a freshness value. The freshness value indicates whether this particular label-tuple pair should be part of the new data structure: is it a fresh value that should be carried into the next data structure? Just for illustration, the freshness values are shown here separately, but you should think of them as being inside each of the corresponding entries. We can see that only two of the entries have zero freshness, that is, they are stale values: the dummy that was found in the cache, and the old copy of label one, which was updated by the client in the last epoch.
Now we come to: how do we separate out these stale, zero-freshness values from the other values without revealing to the server which is which? For this we have the sort-and-shuffle phase. The client and the server run an oblivious sort protocol based on the freshness values. This enables the zero-freshness entries to be sorted out from the remaining entries, and the other entries are shuffled according to their freshness values. So the sort serves two purposes: one is to separate out the zero-freshness entries, and the other is to shuffle the non-zero-freshness entries. Once this step is complete, the zero-freshness entries can be truncated and discarded.

Now the client uses the entries in the RAM to create a new structure containing the updated, fresh entries. For each entry in the RAM, the client downloads it and then uses an update protocol to add it to a new encrypted multi-map on the server side. So each entry in the RAM is downloaded and then written into this new structure. At the very end, a new empty cache, once again of size three, is initialized so that the next epoch of size three can run; the RAM is discarded, and the rebuild is complete.

To summarize the working of the dynamic query equality suppression framework: the client runs lambda operations in one epoch, and for each of these operations the server only sees a read of the cache, a read of a random location in the main structure, and a write of the cache. At the end of the epoch, the client rebuilds the structure: it creates a RAM, uses the RAM to create a new, updated structure, and then discards the RAM. Throughout the epoch, the operation equality, the query equality, and the operation identity are suppressed and therefore not revealed to the server.
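The three rebuild stages can be sketched as follows. This is a heavily simplified plaintext model: the ordinary in-memory sort stands in for the oblivious sort protocol, the freshness-tagging rule (dummy, or an older copy of a label already seen in the cache) is our simplification, and the `d`-prefix dummy naming is illustrative only.

```python
import random

def rebuild(main, cache, lam=3):
    # Extract and tag: cache entries first, then main-structure entries.
    ram, seen = [], set()
    for label, tup in list(cache.items()) + list(main.items()):
        is_dummy = label.startswith("d")
        fresh = 0 if (is_dummy or label in seen) else 1
        seen.add(label)
        ram.append((fresh, label, tup))
    # Sort and shuffle: stale entries sink to the end; fresh entries are
    # permuted (random tiebreak keys stand in for the oblivious shuffle).
    ram.sort(key=lambda e: (-e[0], random.random()))
    ram = [e for e in ram if e[0] == 1]  # truncate the zero-freshness tail
    # Update: build the new main structure, with lam fresh dummies.
    new_main = {label: tup for _, label, tup in ram}
    for i in range(lam):
        new_main[f"d{i + 1}"] = ()
    return new_main, {}  # the new structure and a new empty cache

main = {"l1": ("v1",), "l2": ("v4",), "d1": (), "d2": (), "d3": ()}
cache = {"l1": ("v1", "v2"), "l5": ("v8",)}  # updates from the last epoch
new_main, new_cache = rebuild(main, cache)
print(new_main["l1"])  # ('v1', 'v2'): the cached edit supersedes the old copy
print(len(new_cache))  # 0
```

Because cache entries are tagged before main-structure entries, an updated label automatically marks its old copy stale, which mirrors how the rebuild folds the epoch's updates into the new structure.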
Finally, on this slide we have some concrete efficiency numbers for a sequence of 64 add operations on a multi-map containing 2^16 values. We compare our framework applied to AVLH, a volume-hiding scheme, against black-box simulation using Path ORAM and against a standard dynamic encrypted multi-map scheme. We compare on the measures of client state (how much information the client has to store), server-side storage, communication for these lambda operations, and leakage; for our framework, these numbers also include the rebuild. If we compare our framework to black-box ORAM simulation, we see that we are more efficient, but we do have more leakage than ORAM: at a high level, we leak the total size of the structure, the total number of labels and values, and the maximum tuple length, and we leak the same parameters of the new structure after rebuilding, but we leak nothing during the operations or during the rebuild. If we compare our construction to standard EMMs, we see that we are far from optimal efficiency, but our leakage profile is much better, because standard EMMs always leak the volume and the query equality patterns.

So once again, here is a summary of our contributions. Thank you for your time, and please see our paper for more details. Thank you.