Hello everyone, my name is Michael Reichle, and I am presenting the article "SSE and SSD: Page-Efficient Searchable Symmetric Encryption". This is joint work with Brice Minaud, Pierre-Alain Fouque, Raphael Bost and Angèle Bossuat. First, I want to quickly motivate how and why SSE is useful. Say you have information that you don't want anyone else to access: confidential messages, information about aliens that the public should of course not know about, or information about the One Ring, which of course must stay hidden to avoid the destruction of the world. More concretely, we have a collection of documents, and each document contains certain keywords, shown here in blue, for example "messages", "aliens" or "ring". Each keyword appears in certain documents, and these documents are identified by document identifiers. Now we want to outsource this private information, and we can do so using Searchable Symmetric Encryption. In the setup phase, the client encrypts the keyword-identifier pairs as well as the documents, and sends the encrypted versions, with potentially some additional information, to the server. The server stores all of this, and later the client can query keywords and the server answers these queries. It is important to note that the client cannot just send the keyword "ring" directly: everything is encrypted, so the server cannot search without additional information. This additional information, which enables the search for a given keyword, is called a token. Here, for example, the token for "ring" enables the server to find all matching identifiers. The server sends the encrypted identifiers that match the keyword "ring" to the client, who decrypts them and can later retrieve the matching documents from the server.
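As a minimal toy sketch of this flow (HMAC-SHA256 stands in for the PRF, and all function names are my own; a real scheme would also encrypt the identifiers and documents, which this sketch omits):

```python
import hmac
import hashlib

def derive_token(master_key: bytes, keyword: str) -> bytes:
    """Search token for a keyword: a PRF evaluation that the client can
    recompute from its key, but that the server cannot forge.
    HMAC-SHA256 stands in for a generic PRF here."""
    return hmac.new(master_key, keyword.encode(), hashlib.sha256).digest()

def build_index(master_key: bytes, database: dict) -> dict:
    """Encrypted index: the server only ever sees PRF tags, never the
    keywords themselves. `database` maps keyword -> list of document
    identifiers. (A real scheme would also encrypt the identifiers;
    this sketch omits that step.)"""
    return {derive_token(master_key, w): ids for w, ids in database.items()}

def server_search(index: dict, token: bytes) -> list:
    """The server matches the token against its stored tags: it can
    answer the query without learning the underlying keyword."""
    return index.get(token, [])
```

For example, searching with `derive_token(key, "ring")` returns exactly the identifiers stored under "ring", while the server only ever handles opaque tags.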
For security, we consider an honest-but-curious adversary model: the server tries to learn as much information as possible about our data, but we assume it adheres to the protocol honestly. In general, SSE is very efficient. Searchable Symmetric Encryption only uses symmetric primitives, usually hash functions, PRFs or symmetric encryption, and these are very fast, so the cryptographic overhead is actually fairly low. But in order to still be competitive with databases that offer no privacy guarantees, we need to look at memory accesses, and for this it is important to know how the underlying memory is built. There are two big families of storage media: HDDs (hard disk drives), where locality matters, and SSDs, where page efficiency matters. Locality counts the number of non-adjacent memory locations read per query: HDDs read adjacent memory locations very fast, so the higher this number, the slower the read.
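The two cost models (locality for HDDs, and the page-based model for SSDs discussed next) can be made concrete with a small sketch; the 4096-byte page size and the representation of reads as (offset, length) byte ranges are my own illustrative assumptions:

```python
PAGE = 4096  # typical SSD page size in bytes (illustrative)

def locality(byte_ranges):
    """HDD cost model: count non-contiguous reads. Adjacent or
    overlapping ranges merge into one sequential read."""
    merged = 0
    last_end = None
    for start, length in sorted(byte_ranges):
        if last_end is None or start > last_end:
            merged += 1  # a new, non-adjacent location: one more seek
        last_end = max(last_end or 0, start + length)
    return merged

def pages_read(byte_ranges):
    """SSD cost model: count distinct pages touched. Reading a few
    bytes from a page costs about as much as reading the whole page."""
    pages = set()
    for start, length in byte_ranges:
        for page in range(start // PAGE, (start + length - 1) // PAGE + 1):
            pages.add(page)
    return len(pages)
```

For instance, reads at (0, 100), (100, 50) and (5000, 10) form two non-adjacent locations and touch two pages, while a short read straddling a page boundary already costs two pages.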
SSDs, on the other hand, only care about the number of pages read per query. A page is a few kilobytes large, four kilobytes for example, and reading a single page costs essentially the same as reading just a few bytes from it; in practice, you always read the entire page. This is why, for page efficiency, only the number of pages read matters. If we look at the research progress on SSE, we find that page efficiency and SSDs had not been studied at all. In 2006, the first SSE scheme with good security guarantees and good efficiency was constructed. Then a breakthrough work of Cash and Tessaro from 2014 showed that it is impossible to have constant locality, read efficiency and storage efficiency at the same time. As a consequence, if we care about locality, so if we outsource our database to HDDs, we can never match the efficiency of a database without security guarantees. Despite this result, several follow-up works construct SSE schemes that are as close to optimal as possible; for example, one 2018 scheme achieves O(log log N) locality with O(1) read and storage efficiency, which is very close to optimal, but because of the Cash-Tessaro lower bound we can never hope for something better. In this work, we instead inspect page efficiency, and we show that constructing a page-efficient scheme with perfect storage efficiency, so constant storage efficiency, is actually possible: there is no such lower bound, at least in the static case, for page-efficient SSE schemes. We named our scheme Tethys, because it uses a max-flow computation and Tethys, in Greek mythology, is the mother of the river gods. So, as a recap: we care about page efficiency, meaning we want to minimize page accesses per query; we care about storage efficiency, meaning we want to minimize the server storage;
and we care about security, so we want to guarantee that as little information as possible is leaked. To be more concrete about the security guarantees, I will quickly sketch the security model for static SSE schemes. We have an indistinguishability-based security definition in which the client interacts with the server, and the server is the adversary; again, it is honest but curious. In the beginning, the adversary specifies a database, the client sets it up using the SSE setup functionality and sends the encrypted database to the server. Then the client has to query adversary-specified keywords: it sends the query token to the server, and the server answers with the response. In the end, the server should not be able to distinguish this interaction from a simulated one, where the simulator only gets access to a certain leakage of the information. For setup, the simulator only has access to the setup leakage, and for the search functionality, so for queries, only to the query leakage. Typically, the setup leakage is the size of the database, and the query leakage is the size of the response set, so the number of identifiers that match the queried keyword, together with the search pattern. The search pattern essentially notifies the simulator when a keyword has been queried before, so that it can answer repeated queries consistently. This query leakage is usually necessary in order to still construct efficient schemes, but on the other hand, thanks to this static SSE security definition, after an interaction with the client the adversary can only learn this leaked information, so the output of the setup leakage and of the query leakage. Before I dive into the technical
details, I will quickly present our contributions. First, we define page efficiency, and we show that it is a very good predictor of the throughput of an SSE scheme when it runs on SSDs. Second, we define data-independent packing (DIP) schemes; these are simple, purely combinatorial allocation (or packing) schemes, and from any such scheme we can construct an SSE scheme that retains the same efficiency measures. Third, we build an efficient DIP scheme based on cuckoo hashing for weighted items. This had not been done before: it is a new algorithm that lets us use cuckoo hashing, so essentially do allocation, for weighted items, and it is based on a max-flow computation; this is the main technical part of the paper. And lastly, we use the framework defined before to construct an SSE scheme with optimal page and storage efficiency based on this DIP scheme; the resulting SSE scheme is also called Tethys. Now we dive into the technical part, and for this I will first introduce data-independent packing. For data-independent packing, we have a multi-map, so keys that map to items; in total we have n items, and we have m buckets into which we want to allocate the items, each bucket with capacity p. For the worst case, where a bucket is already filled up to capacity p but we still want to place further items, we have a stash to which we can allocate items as well. The scheme is purely combinatorial: each key determines a fixed set of buckets, in this case two, and all of that key's items must be allocated to these two buckets; we cannot allocate any green items to buckets other than those. We do the same for the blue key, and again blue items can only go to the highlighted blue buckets, and likewise for the red key and the yellow key. And because the
page capacity in this case might be five, we can allocate the last item to the stash, because the first two buckets are already full. More formally, we have a size function that, given the number n of items, returns the number m of buckets needed to store them. We define this function so that the number of buckets is independent of the list-length distribution: only the total number of items determines how many buckets we use. Then we have a build function that takes a multi-map and returns m buckets filled with the items of the multi-map, along with a stash S that potentially holds some leftover items. Lastly, we have a lookup function: for example, we can look up key two, the blue key, supplying also the number of items that the blue key maps to, and this returns the bucket indices, in this case the first two buckets. This is essentially a DIP scheme, and it has three efficiency measures. First, the lookup efficiency, the number of buckets per keyword; in our construction, this is always two. Second, the storage efficiency, meaning the overhead needed to store n items: we need m times p space for the buckets but only have n items, so the storage efficiency is mp/n. And lastly, the stash size: the number of items we did not manage to allocate to the buckets. By definition, DIP schemes are data-independent, which informally means that looking up one key reveals no information about the number of items matching other keys, and this is crucial for constructing SSE schemes from DIP schemes. For this construction we have a setup function. The setup function takes as input the database, which maps keywords to identifiers. First, we choose two keys: a key for an encryption scheme and a key for a PRF. Then we use
the PRF to map each keyword w_i to a mask m_i and a key k_i. We use the mask m_i to encrypt l_i, the length of the list of identifiers matching keyword w_i, which we will later need for the lookup of the DIP scheme, and we store this in a table T, mapping k_i, the second output of the PRF, to the encrypted l_i. Second, we use k_i to replace w_i in the database, so that the database becomes a multi-map matching k_i to the identifiers that match w_i. We then use the build function of the DIP scheme to construct filled buckets of capacity p, where p is the page size, and potentially a stash. The table T together with the filled pages forms the encrypted database. The stash and the keys are stored on the client, and the encrypted database, so the table T and the filled pages, is sent to and stored by the server. If the client later wants to search a keyword w_i, it re-evaluates the PRF on w_i; the result is the token t_wi, which again consists of the mask m_i and the key k_i. The mask m_i can be used directly to decrypt l_i, the number of identifiers matching w_i, and with the key k_i we can look up the matching buckets using the DIP scheme. The server then simply returns the matching buckets to the client, and the client decrypts them and retrieves the items it cares about. It is important to note that all the efficiency guarantees of the DIP scheme are inherited by the resulting SSE scheme. So now I have shown you how to construct efficient SSE schemes from efficient DIP schemes, and all that is left is to construct an efficient DIP scheme. This is easier said than done, because it is
actually the main technical content of the paper, so I will go through an example of how it works. We have a bunch of items, for example five green items, two blue items, and so on, and each color corresponds to one key. We want to store these items in four buckets, each with capacity five. The procedure works as follows: we draw one node per bucket in a graph, so bucket one corresponds to node one, and so on. Then for each key we draw two random buckets; in this case we chose buckets four and one, and we draw five edges from bucket four to bucket one. The orientation of the edges is not important yet, it will be optimized later, but what it means is that each item corresponds to one edge, and the edge is oriented outgoing from the bucket where the item will be stored. Here all edges are outgoing from bucket four, so all green items are allocated to bucket four. We do this again for the blue items: we choose two random buckets, in this case one and two, and draw two edges, so the blue items are stored in bucket one. And we do the same for all the remaining items. Now that we have drawn all the random buckets and allocated all items to some bucket, the allocation is of course not optimal yet. To optimize it, we first compute the out-degree of each bucket. For example, bucket one has out-degree six, which means six items are stored in bucket one, and of course this is more than five. We want to optimize the out-degrees to be as close as possible to the page size. Here the out-degree is three, so there is still space for two more items; here the out-degree is eight, again over five; and here the out-degree is already perfect, meaning five, so this bucket stores exactly the right number of items. To optimize this, we can simply compute a path from an overflowing bucket, in
this case bucket four, to a bucket that is underflowing. If we reorient the edges along this path, the out-degree of the underflowing bucket increases by one, the out-degree of the overflowing bucket decreases by one, and the buckets in the middle of the path do not change their out-degree, which means we have improved the allocation: there is one less overflowing item. We repeat this: for example, this bucket still has seven items, which is too many, so we find a path to bucket two, the only underflowing bucket, and turn around the edges. Now there is no more path from an overflowing bucket to an underflowing bucket, which means this solution is already optimal. Based on this optimal graph we can now allocate the items into buckets. Again we have four buckets, and each edge corresponds to one item. For example, we can choose this edge, which corresponds to this item, and since the edge is outgoing from bucket one, we move the item to bucket one. This edge is outgoing from bucket four, so we move an item to bucket four, and the remaining green edges are also outgoing from bucket four, so we do the same for the remaining green items. For the blue items, we again pick one edge at a time: this edge is outgoing from bucket two, so we move the item to bucket two; the other edge is outgoing from bucket one, so we move the item to bucket one. Similarly for the red items: one edge is outgoing from bucket three, so we move one item to bucket three; another edge is outgoing from bucket four, so we move the item to bucket four. And for the last edge, we observe that by the graph we need to move the item to bucket four, but bucket four is already filled completely up to capacity, so we cannot actually move the item there. But for this we have the stash, so we can simply move the item to the stash.
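The invariant behind this reorientation step can be checked with a tiny sketch: reversing every edge along a directed path lowers the out-degree (number of stored items) of the start bucket by one, raises the end bucket's by one, and leaves buckets in the middle untouched. The edge-list representation below is my own illustration:

```python
def out_degrees(edges, m):
    """Out-degree per bucket: one edge = one item, stored at the
    bucket at the edge's tail."""
    d = [0] * m
    for u, _ in edges:
        d[u] += 1
    return d

def reverse_path(edges, path_edge_ids):
    """Reverse every edge whose index lies on the path. Interior
    buckets lose one outgoing edge but gain another, so only the
    endpoints of the path change out-degree."""
    return [(v, u) if i in path_edge_ids else (u, v)
            for i, (u, v) in enumerate(edges)]
```

With edges [(3, 0), (3, 0), (0, 1)] and the path 3 → 0 → 1 (edge indices {0, 2}), bucket 3 drops from two stored items to one, bucket 1 gains one, and bucket 0 keeps exactly one.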
And then we do the same for all the remaining items, and we get this allocation based on the graph. In the end, we only have two items in the stash, and all the other buckets are filled. It is important to note that this solution is the best among all possible solutions: no matter the initial orientation and no matter what choices you make during the algorithm, it always outputs a graph that leads to an allocation with the lowest possible stash size. So this is essentially how Tethys works. We then show the main theorem of our paper, which states that with overwhelming probability there exists a valid assignment using only m buckets, where m = (2 + ε) · n/p for a small constant ε. Since n/p is the minimal number of buckets that any packing algorithm would need, even a data-dependent one, the overhead is essentially 2 + ε. Second, we show that the stash only needs ω(log λ / log n) pages, and more importantly, this shows that the stash size is independent of the size of the database, which matters because we will store the stash on the client, so we would like it not to grow as we outsource more and more data. On a more theoretical note, this is a direct generalization of cuckoo hashing in the static case. We allow weighted items, so lists of variable size, but if all lists have the maximal size p, the algorithm is equivalent to cuckoo hashing. And we show that, even though we allow variable-size lists, we get the same asymptotic behavior: the same stash size and the same number of buckets. So in conclusion, we show that cuckoo hashing with a stash works for items of variable size.
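Putting the walkthrough together, here is a simplified, illustrative sketch of the allocation. The paper computes the assignment via a max-flow algorithm; this toy version instead repeatedly finds a path from an overfull bucket to an underfull one by BFS and reverses it, mirroring the example above. All function names and parameters are my own:

```python
import random
from collections import deque

def allocate(weights, m, p, seed=0):
    """Toy weighted-cuckoo allocation: `weights` maps each key to its
    number of items. Each key draws two random buckets; every item is
    one edge between them, initially oriented out of the second bucket
    (a bucket's out-degree = number of items it stores)."""
    rng = random.Random(seed)
    edges = []    # [tail, head]: the item currently sits in bucket `tail`
    choices = {}  # key -> its two candidate buckets (what lookup returns)
    for key, w in weights.items():
        b1, b2 = rng.randrange(m), rng.randrange(m)
        choices[key] = (b1, b2)
        for _ in range(w):
            edges.append([b2, b1])

    def out_degrees():
        d = [0] * m
        for u, _ in edges:
            d[u] += 1
        return d

    while True:
        d = out_degrees()
        overfull = [b for b in range(m) if d[b] > p]
        if not overfull:
            break
        # BFS along edge orientation from an overfull bucket,
        # looking for an underfull one.
        src = overfull[0]
        parent = {src: None}  # bucket -> edge index used to reach it
        queue = deque([src])
        target = None
        while queue and target is None:
            u = queue.popleft()
            for i, (a, b) in enumerate(edges):
                if a == u and b not in parent:
                    parent[b] = i
                    if d[b] < p:
                        target = b
                        break
                    queue.append(b)
        if target is None:
            break  # no path from overfull to underfull left: optimal
        # Reverse the path: src stores one item less, target one more,
        # intermediate buckets are unchanged.
        node = target
        while parent[node] is not None:
            i = parent[node]
            prev = edges[i][0]
            edges[i][0], edges[i][1] = edges[i][1], edges[i][0]
            node = prev

    # Materialize buckets; anything beyond capacity p goes to the stash.
    buckets = [[] for _ in range(m)]
    stash = []
    for idx, (u, _) in enumerate(edges):
        (buckets[u] if len(buckets[u]) < p else stash).append(idx)
    return buckets, stash, choices
```

With n items, page size p and m ≈ (2 + ε) · n/p buckets, the theorem says the leftover stash stays small with overwhelming probability; this toy version only illustrates the mechanics, not the bound.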
Second, we define data-independent packing and construct an efficient DIP scheme. And based on that, we obtain an SSE scheme with O(1) storage and page efficiency, which is called Tethys. Before finishing my talk, I quickly want to show you the results of the experiments we ran. In dark red you can see the throughput, and in bright red, bright blue and dark blue you can see the inverses of the page efficiency, read efficiency and storage efficiency, respectively. Looking at the graph, we can see that page efficiency is a very good predictor of throughput: whenever a scheme has very good page efficiency, it generally has very high throughput as well. Second, we can see that Tethys has good storage efficiency, read efficiency and page efficiency, and it is the only scheme that comes close to a plaintext database in all of these characteristics. And with that, I thank you for listening to this talk. If you want more information, I invite you to read the ePrint article or to ask me questions in private.