Hello, everyone. I'm Alexander Bienstock, and I'll be presenting our work, Forward Secret Encrypted RAM: Lower Bounds and Applications. This is joint work with Yevgeniy Dodis and Kevin Yeo. So what is Forward Secret Encrypted RAM, or FS eRAM as we call it? Well, it has been extensively studied for a long time, and the goal is to privately and forward-secretly outsource the storage of a large data array using small client storage. The adversarial model is that the adversary always sees the server storage, and we also want forward secrecy, meaning the adversary can leak the client storage. In this case, of course, everything currently stored in the RAM is no longer secure, but we want everything that has since been deleted or overwritten to remain secure. We also want efficient reads and writes, and for this work we'll assume a static number of entries n. And of course, we want to use practical, namely symmetric, crypto. So what's the trivial solution? Well, let's say we have these eight data items making up our data array. We'll just encrypt every entry with a single key: we have this key K in our local client storage, and on the server we have the encryption of each data item under K. But for security, we then need to refresh the key after each write, which isn't efficient, because we'd need to re-encrypt everything. So there's also a folklore solution, a tree-based scheme, and this is how it works. On the client side, we again have one key, K epsilon, which is the root of this tree, and it encrypts its two children K0 and K1; we store these ciphertexts on the server. Then each of K0 and K1 encrypts its own children, and at the end, the leaves are encryptions of the eight data items under their parents. So how do we read?
Well, let's say we want to read the fourth data item. We take K epsilon, decrypt K0, then using K0, decrypt K01, and finally decrypt D4. So this has a log n overhead, as you may have realized. How do we write to a cell? Let's say we want to write new data D4 prime to cell four. Then, in addition to decrypting the direct path to the fourth cell, like we did for reads, we'll also decrypt the copath. So in addition to decrypting K0, we'll decrypt K1; in addition to decrypting K01, we'll decrypt K00; and at the bottom, we actually won't decrypt D4, we'll just get rid of it, and decrypt D3 instead. We're left with D3, K1, and K00, and we forget everything else. Then we generate a new key, K01 prime, for the parent of D3, and encrypt D3 and the new data D4 prime under K01 prime. Then we generate a new key at the parent of K01 prime, encrypt both of its children, and finally generate a new K epsilon prime and encrypt both of its children. You'll notice we now have forward secrecy, because all information about D4 is gone: even if the adversary corrupts K epsilon prime, K epsilon prime and everything that can be decrypted with it is totally independent of D4 or anything that could have previously decrypted D4. So what is the efficiency of this scheme? We have big O of one client storage, big O of n server storage, and big O of log n overhead for reads and writes. We can also extend this to big O of S client storage, for some S that's not too big, still sublinear, with big O of n server storage again and big O of log n overhead. This is done by just creating S different trees and storing n over S of the data items in each tree. Now, the main question in this work is: is this scheme optimal? Maybe surprisingly, this has been unknown for two-plus decades.
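The folklore tree scheme just described can be sketched in code. This is a toy version of my own, not taken from the paper: the "encryption" is a hash-based XOR stream standing in for real symmetric encryption, entries are fixed-length byte strings, and n is a power of two.

```python
import hashlib, os

KLEN = 32

def keystream(key, n):
    """Toy stream-cipher keystream: SHA-256 in counter mode. Illustration
    only; a real scheme would use proper symmetric encryption."""
    out, ctr = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + ctr.to_bytes(4, "big")).digest()
        ctr += 1
    return out[:n]

def enc(key, pt):
    return bytes(a ^ b for a, b in zip(pt, keystream(key, len(pt))))

dec = enc  # XOR stream cipher: decryption is the same operation

class TreeFSeRAM:
    """Folklore tree scheme. The client keeps only the root key; the server
    holds one ciphertext per tree node (heap indexing: node v has children
    2v and 2v+1; internal nodes 1..n-1, leaves n..2n-1)."""

    def __init__(self, data):
        self.n = len(data)
        self.root_key = os.urandom(KLEN)
        self.server = {}
        keys = {1: self.root_key}          # setup-only; forgotten afterwards
        for v in range(2, self.n):         # internal nodes hold keys
            keys[v] = os.urandom(KLEN)
            self.server[v] = enc(keys[v // 2], keys[v])
        for i, d in enumerate(data):       # leaves hold the data items
            v = self.n + i
            self.server[v] = enc(keys[v // 2], d)

    def _path_keys(self, leaf):
        """Decrypt the keys on the root-to-leaf path."""
        path, v = [], leaf
        while v > 1:
            path.append(v)
            v //= 2
        keys = {1: self.root_key}
        for v in reversed(path[1:]):       # top-down, internal nodes only
            keys[v] = dec(keys[v // 2], self.server[v])
        return keys, path

    def read(self, i):
        leaf = self.n + i
        keys, _ = self._path_keys(leaf)
        return dec(keys[leaf // 2], self.server[leaf])

    def write(self, i, d):
        leaf = self.n + i
        keys, path = self._path_keys(leaf)
        # decrypt the copath so it can be re-encrypted under fresh keys
        plain = {v ^ 1: dec(keys[v // 2], self.server[v ^ 1]) for v in path}
        plain[leaf] = d
        fresh = {v // 2: os.urandom(KLEN) for v in path}  # new path keys
        for v in path[1:]:
            plain[v] = fresh[v]
        for v in path:                     # re-encrypt path and copath
            self.server[v] = enc(fresh[v // 2], plain[v])
            self.server[v ^ 1] = enc(fresh[v // 2], plain[v ^ 1])
        self.root_key = fresh[1]  # forget every old key: forward secrecy

ram = TreeFSeRAM([bytes([i]) * 16 for i in range(8)])
assert ram.read(3) == b"\x03" * 16
ram.write(3, b"\xaa" * 16)
assert ram.read(3) == b"\xaa" * 16
assert ram.read(0) == b"\x00" * 16
```

After a write, every key on the overwritten cell's path is resampled and the old ones are discarded, so nothing the client still holds can decrypt the old value.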
And in fact, in our work, we answer yes: this is the optimal scheme. So for the rest of the talk, first I'll give you some intuition for the lower bound that we prove. I'll talk about the model in which we prove it, called the symbolic model, and give you the proof intuition in this model. Then I'll give some applications of FS eRAM where we sort of circumvent the lower bound. These applications are forward secret memory checkers, oblivious forward secret encrypted RAM, and forward secret multicast encryption. So first, let me present the model. Before I do that, I just want to mention that the cell probe model is considered the holy grail for private data structure lower bounds. But in this case, it turns out to be too powerful. The cell probe model only counts cell probes towards the cost of the protocol itself: not any other computation, just how many server cells you touch. On the other hand, it still requires any adversary attacking the protocol to be PPT. So in fact, there is a trivial big O of one cell probe solution, using authenticated encryption. The scheme, unfortunately, will not itself be PPT. Basically, when we go to do a new write, we'll just choose a random authenticated encryption key, encrypt the data under this key, store the encryption on the server, and forget the key. Any PPT adversary clearly can't break this scheme. But when the protocol goes to read a data item, it'll just try all authenticated encryption keys, which it can do because it has unbounded computation, and it only needs to download, or probe, the single cell holding the data item; it tries keys until one succeeds. So here we've shown that the basic cell probe model is too strong. What about other models?
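As a quick aside, the brute-force construction just described can be made concrete. Here's a toy sketch with a deliberately tiny key space so the brute force terminates quickly; the "authenticated encryption" is a hash-based stand-in of my own, purely for illustration.

```python
import hashlib, random

KEYSPACE = 2 ** 16  # tiny key space so the exhaustive search runs fast

def ae_enc(key_int, msg):
    # Toy "authenticated encryption": hash-derived keystream plus a
    # MAC-like tag. A stand-in for a real AE scheme.
    kb = key_int.to_bytes(4, "big")
    pad = hashlib.sha256(b"pad" + kb).digest()[:len(msg)]
    ct = bytes(a ^ b for a, b in zip(msg, pad))
    tag = hashlib.sha256(b"mac" + kb + ct).digest()[:8]
    return ct + tag

def ae_dec(key_int, blob):
    kb = key_int.to_bytes(4, "big")
    ct, tag = blob[:-8], blob[-8:]
    if hashlib.sha256(b"mac" + kb + ct).digest()[:8] != tag:
        return None  # authentication failure: wrong key
    pad = hashlib.sha256(b"pad" + kb).digest()[:len(ct)]
    return bytes(a ^ b for a, b in zip(ct, pad))

# Write: pick a random key, encrypt, store the single cell, forget the key.
cell = ae_enc(random.randrange(KEYSPACE), b"secret entry")

# Read: one cell probe, then (unbounded) computation trying every key
# until authentication succeeds.
plain = next(p for k in range(KEYSPACE)
             if (p := ae_dec(k, cell)) is not None)
assert plain == b"secret entry"
```

The read touches exactly one server cell, so its cell-probe cost is one, even though the computation is exponential in the key length.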
Well, there's also the balls and bins model, which has been used to prove some private data structure lower bounds. But in this case, it's actually too weak, because in this model, server cells can only hold encrypted array contents and nothing else. Namely, even the folklore construction for FS eRAM is not captured in this model, because we have encryptions of keys on the server as well. So the model that we use in the end is called the symbolic model, and it's been used recently in other works as well. It's actually the perfect in-between of the two models I just mentioned. So what is the symbolic model? Well, in the symbolic model, server cells hold strings that are arbitrarily derived from a structured grammar. In our work, we allow for encryption and (possibly dual) PRF keys, which are basically just random coins; ciphertexts from this encryption scheme; secret shares; and data entries themselves. Note that strings can be arbitrarily nested combinations of the above. So here's our grammar; it encompasses all the things I said before: keys, ciphertexts, data items, secret shares, all of this in a nested fashion. Some examples: we could have random coins drawn from our set of random strings, represented by capital R, say random coins little r or little r prime. We could have an encryption, under some PRF computation, of a secret share of another encryption of some data item D1, and maybe the other secret share of this encryption as well. Now I'll talk about the allowed derivations in this model. What I mean is that the strings derived from this grammar don't actually have any meaning on their own, and also no representation beyond being some abstract symbol. For example, there are no bits in this model.
Each symbol is its own distinct abstract symbol with no exact meaning on its own. The so-called entailment relation is what actually captures their meaning, and it's quite intuitive. For example, you can decrypt a ciphertext if and only if you have its corresponding key, and with a PRF key, you can compute any output of that PRF. Going back to the example symbols from the last slide, assuming two-out-of-two secret sharing, we can derive D1 if and only if we have the following strings: r, r prime, this first encryption, and this secret share. What we do is compute the PRF of r and c, decrypt the first ciphertext to get both shares of the encryption of D1, reconstruct this encryption, and then use r prime to decrypt it. But if, for example, we were missing the second share of this ciphertext, or maybe r prime, then we would not be able to derive D1. So as part of our lower bound, we define a key-data graph, which abstracts encodings of the client and server cells and the data array itself. For this talk, I'll only consider PRFs and encryption, for simplicity. So what is the key-data graph? It's a directed graph at time t whose vertex set V_t consists of the keys that have been sampled and are still accessible by the protocol, as well as the data entries at time t. As for the edges, an edge from u to v exists if v corresponds to some PRF computation on u, or if u is part of some nested encryption of v. So if we have the encryption under k1 of the encryption under k2 of the encryption under k3 of k4, then we have edges from each of k1, k2, k3 to k4. This edge will also exist if such a ciphertext was stored in a server cell at any time in the past.
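The entailment relation from a moment ago can be sketched as a small closure computation over symbols. This is my own toy encoding, not the paper's formalism: symbols are nested tuples, and we repeatedly apply the decryption, PRF, and secret-sharing rules until nothing new is derivable.

```python
# Symbols are nested tuples: ("enc", key, msg), ("prf", key, input),
# ("share", i, secret) for two-out-of-two sharing; atoms are strings.

def subterms(t, acc):
    """Collect every subterm of symbol t into the set acc."""
    acc.add(t)
    if isinstance(t, tuple):
        for x in t[1:]:
            if not isinstance(x, int):  # skip share indices
                subterms(x, acc)
    return acc

def closure(have):
    """All symbols derivable from `have` under the entailment rules."""
    terms = set()
    for t in have:
        subterms(t, terms)
    known = set(have)
    changed = True
    while changed:
        changed = False
        for t in list(terms):
            # with a PRF key, any output of that PRF can be computed
            if t not in known and isinstance(t, tuple) and t[0] == "prf" \
                    and t[1] in known:
                known.add(t)
                changed = True
        for t in list(known):
            if not isinstance(t, tuple):
                continue
            # a ciphertext can be opened iff its key is derivable
            if t[0] == "enc" and t[1] in known and t[2] not in known:
                known.add(t[2])
                subterms(t[2], terms)
                changed = True
            # both shares of a two-out-of-two sharing reveal the secret
            if t[0] == "share" and ("share", 3 - t[1], t[2]) in known \
                    and t[2] not in known:
                known.add(t[2])
                subterms(t[2], terms)
                changed = True
    return known

# The slide's example: Enc_{PRF(r,c)}(Share_1(Enc_{r'}(D1))) together with
# Share_2(Enc_{r'}(D1)), r, and r' entails D1; dropping either r' or the
# second share breaks the derivation.
inner = ("enc", "r'", "D1")
ct = ("enc", ("prf", "r", "c"), ("share", 1, inner))
assert "D1" in closure({"r", "r'", ct, ("share", 2, inner)})
assert "D1" not in closure({"r", ct, ("share", 2, inner)})   # missing r'
assert "D1" not in closure({"r", "r'", ct})                  # missing share 2
```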
So of course, the folklore construction is easily captured, but this key-data graph is really much more general. Here's an example: k1 is stored in a secret cell, and we also have the encryption of k2 under k1, of k3 under k1, of d1 under k2, d2 under k2, d4 under k3, and also a nested encryption: the ciphertext which is the encryption of d3 under k3, which is then encrypted under k2. So this is what our key-data graph would look like for this example, and maybe we also have some other stuff as well. What happens if the protocol needs to overwrite d3 in this case? Well, it must forget k1 for forward secrecy, and then it must also forget either k2 or k3, because if one of them is gone, then the ciphertext encrypting d3 cannot be decrypted. Let's say we forget k1 and k3. That means the encryption of d4 under k3, the encryptions of k2 and k3 under k1, and finally this nested encryption of d3 all become useless: the protocol itself will never again be able to decrypt these ciphertexts, since the keys are gone. So the ciphertexts corresponding to these edges are all gone now. Now, what's the intuition for the main theorem? We have a first lemma saying that each data entry must have a path in G_t from a vertex representing an encryption key in a secret cell. This is because of correctness (we must somehow be able to derive all data entries) and privacy: data entries, or keys that encrypt them, cannot just be stored in the clear on the server, so there must be something rooted in the client storage. And here's lemma 2, which is a general graph-theoretic lemma: if we take any graph G_t satisfying some very basic requirements and randomly choose a data entry, then the corresponding path from the secret-cell key has expected total out-degree log(n/s).
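The useless-ciphertext accounting in this example can be sketched directly. In this toy model of mine, each server ciphertext is recorded as the set of keys needed to open it plus its payload; a ciphertext becomes useless as soon as any needed key is forgotten.

```python
# Each server ciphertext: (keys needed to open it, payload).
# The nested Enc_k2(Enc_k3(d3)) needs both k2 and k3.
ciphertexts = [
    ({"k1"}, "k2"), ({"k1"}, "k3"),   # key ciphertexts under k1
    ({"k2"}, "d1"), ({"k2"}, "d2"),   # data under k2
    ({"k3"}, "d4"),                   # data under k3
    ({"k2", "k3"}, "d3"),             # nested encryption of d3
]

def useless_after(cts, forgotten):
    """Ciphertexts the protocol can never decrypt once `forgotten` is gone."""
    return [(needed, payload) for needed, payload in cts if needed & forgotten]

# Overwriting d3 forces forgetting k1 and one of k2/k3; say k1 and k3.
gone = useless_after(ciphertexts, {"k1", "k3"})
assert len(gone) == 4  # Enc_k1(k2), Enc_k1(k3), Enc_k3(d4), nested Enc of d3
```

This matches the slide: four server cells' contents are permanently wasted by this single overwrite.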
This expectation is over the random choice of cell to overwrite. Finally, we have our theorem: just as in lemma 2, if we choose a data entry uniformly at random to overwrite in each operation, then in expectation, the contents of log(n/s) server cells become useless, as in the previous example. Moreover, the contents that become useless in each operation are unique across operations, so we never double-count anything. I want to emphasize that this is agnostic to any random choices the protocol makes, because lemma 2 holds for any graph. To take care of nested encryptions, we just take the graph with minimum out-degree paths to each data entry and apply lemma 2 to it. So now, the main intuition for some applications. We'll focus on primitives that already have efficient, meaning big O of log n, tree-based constructions. If we naively composed the folklore FS eRAM solution with them, we'd get log squared n overhead. Instead, what we do is overlay the FS eRAM tree with the trees in these constructions, to retain big O of log n overhead. It looks something like this: maybe this is the other primitive's scheme, and this is FS eRAM, and we just combine them. The first application is forward secret memory checkers. Recall that for forward secret encrypted RAM, we assumed the server always returns the correct stored cells to the client for read and write operations. Memory checkers, on the other hand, which have also been very heavily studied in the past, guarantee integrity of the outsourced data array. For forward secret memory checkers, we require both: forward secrecy of the data still holds, and the protocol should also output an error if some tampering has occurred, that is, if the server returns incorrect server cells, and this should hold even with client-state leakages. So what's the intuition?
Well, we simply overlay a classical Merkle tree with our folklore FS eRAM construction. We retain the big O of log n overhead, which is optimal with respect to both our FS eRAM lower bound from this work and the best known memory checker construction. The best known memory checker lower bound is actually big omega of log n over log log n, so we're not quite tight with respect to that, but we are tight with respect to the best construction. Here's what a Merkle tree looks like: the data cells are the leaves, and each node stores a hash of its two children, carried recursively up to the root. So how do we combine this with FS eRAM? Well, if we look at these two children here, we store two ciphertexts encrypting these data items, and at their parent, we start by storing the hash of these two ciphertexts. We do the same thing for D3 and D4, and then at nodes 00 and 01 we also have a key, the key that decrypts the leaves, encrypted under its parent just as in FS eRAM. Then at the parent of nodes 00 and 01, we store a hash of both the ciphertexts of the children and the hashes of the children, and we carry this through the rest of the tree. For this hash, you can just use collision-resistant hash functions, or if you want to avoid the random oracle model, you can use UOWHFs (universal one-way hash functions), but then you also need to include the description of the hash function at each node, and the word size must be polynomial. Also note that instead of using hashes, we could just use forward secret AEAD, but then we'd give up on integrity after any leakage occurs, because the MAC key would then be obtained by the adversary, and she could tamper with anything she wants. So how do reads and writes work?
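Before that, here's a minimal sketch of the plain Merkle-path check that both operations rely on, with opaque byte strings standing in for the ciphertexts. In the actual construction the node hashes also cover the children's hashes and key ciphertexts; this toy version shows only the basic leaf-to-root verification.

```python
import hashlib

def H(*parts):
    """Hash a tuple of byte strings with length framing."""
    h = hashlib.sha256()
    for p in parts:
        h.update(len(p).to_bytes(4, "big") + p)
    return h.digest()

# A tiny Merkle tree over four "ciphertexts".
leaves = [b"ct0", b"ct1", b"ct2", b"ct3"]
level1 = [H(leaves[0], leaves[1]), H(leaves[2], leaves[3])]
root = H(level1[0], level1[1])

def verify(index, leaf, sibling, uncle, root):
    """Check one leaf against the root hash using its copath hashes."""
    l, r = (leaf, sibling) if index % 2 == 0 else (sibling, leaf)
    h = H(l, r)
    l, r = (h, uncle) if (index // 2) % 2 == 0 else (uncle, h)
    return H(l, r) == root

assert verify(2, b"ct2", b"ct3", level1[0], root)
assert not verify(2, b"tampered", b"ct3", level1[0], root)
```

On a read, the client runs exactly this check on the downloaded path and outputs an error if it fails; on a write, it additionally recomputes the hashes along the path it re-encrypts.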
For reads, it's just the same as in the FS eRAM construction, but we also make sure the hashes of everything we download check out, and if not, we output an error. For writes, it's also the same: when we download, we again check the hashes, and when we write the new keys and ciphertexts to the server, we also regenerate the hashes. The efficiency is nice, asymptotically the same as before, log n. For the security intuition: if we look at the root key and the root hash, the hash ensures we have the right ciphertexts, so we correctly decrypt the keys at the children, and we still of course have privacy. We can carry the same argument through inductively: the keys and hashes decrypted at level i ensure that the ciphertexts at level i plus one decrypt correctly, and we still have privacy. So that's it for forward secret memory checkers. Now let's talk about oblivious FS eRAM. FS eRAM on its own does not require patterns of reads and writes to be hidden, but of course oblivious RAM does require this. On the other hand, oblivious RAM does not allow for leakage of client state, so the question we ask is: can we combine these notions? Well, the strongest notion, which gives forward secrecy and obliviousness even after corruptions, is unfortunately inefficient: we in fact show an omega of n cell probe lower bound, which basically means the trivial solution of decrypting and re-encrypting everything on every read and write is optimal. But we can achieve both modulo client leakage. What I mean is, if leakage occurs, then we have forward secrecy as in FS eRAM, but the access pattern is no longer hidden; if there is no leakage, then our access pattern is still hidden, and forward secrecy holds vacuously. We do this with big O of log n times f of n overhead, for any function f that is little omega of 1, which is almost optimal with respect to both the ORAM lower bound and the FS eRAM lower bound.
So what's the intuition for our construction? If we used tree-based ORAMs, we could overlay the FS eRAM construction on that tree, but tree-based ORAMs unfortunately require log squared n overhead. On the other hand, hierarchical ORAMs have recently been shown to achieve optimal log n overhead, but we can't easily overlay the tree-based FS eRAM construction on these hierarchical ORAMs. So what do we do? We compose the two: we use a tree-based ORAM and replace the position map in it with a hierarchical ORAM. I'll show what I mean on the next slide. First, we have the position map, which maps each data cell to a leaf in the tree-based ORAM, and we store this in a hierarchical ORAM. We do this because the position map is what causes tree-based ORAMs to have the extra log factor. Then we have our tree, with K epsilon stored at the root, and in each interior node we store encryptions, under the key at that node's parent, of data items whose assigned leaf is in the corresponding subtree. We pad with dummies if needed so that every bucket always has the same number of entries, and each node has a bucket with constantly many entries. To augment the tree-based ORAM from the literature with our FS eRAM construction, in each node we also store an encryption of the key at that node under its parent's key. Finally, we have a stash, which stores encryptions, also under K epsilon, of at most little omega of log n data items. So how do we read or write a cell i?
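To keep the pieces straight, here is a toy version of this data structure and its access procedure, in the style of tree-based ORAMs with greedy path eviction. This is my own simplification: the position map is a plain dictionary standing in for the hierarchical ORAM, and the FS eRAM encryption layer is omitted entirely.

```python
import random

class TinyPathORAM:
    """Stripped-down tree ORAM. n must be a power of two; buckets are
    heap-indexed (node v's children are 2v and 2v+1, leaf buckets n..2n-1)."""

    def __init__(self, n, bucket_size=4):
        self.n = n
        self.bsize = bucket_size
        self.bucket = {v: [] for v in range(1, 2 * n)}
        self.stash = {}                                # id -> (leaf, data)
        self.pos = {i: random.randrange(n) for i in range(n)}  # position map

    def _path(self, leaf):
        v, nodes = self.n + leaf, []
        while v >= 1:
            nodes.append(v)
            v //= 2
        return nodes  # leaf bucket first, root last

    def _on_path(self, v, leaf):
        u = self.n + leaf
        while u > v:
            u //= 2
        return u == v  # is node v an ancestor of this leaf's bucket?

    def access(self, i, new_data=None):
        old_leaf = self.pos[i]
        self.pos[i] = random.randrange(self.n)  # remap before writing back
        # read the whole path into the stash
        for v in self._path(old_leaf):
            for bid, blf, d in self.bucket[v]:
                self.stash[bid] = (blf, d)
            self.bucket[v] = []
        out = self.stash.get(i, (None, None))[1]
        data = new_data if new_data is not None else out
        if data is not None:
            self.stash[i] = (self.pos[i], data)  # store with the new leaf
        # greedy eviction: push stash blocks as deep as possible on old path
        for v in self._path(old_leaf):           # deepest bucket first
            for bid in list(self.stash):
                blf, d = self.stash[bid]
                if len(self.bucket[v]) < self.bsize and self._on_path(v, blf):
                    self.bucket[v].append((bid, blf, d))
                    del self.stash[bid]
        return out

o = TinyPathORAM(8)
o.access(3, b"hello")
o.access(5, b"world")
assert o.access(3) == b"hello"
assert o.access(5) == b"world"
```

Every access reads and rewrites one random root-to-leaf path plus the stash, which is what makes all operations look alike; the full construction additionally re-keys and re-encrypts this path as in FS eRAM.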
First, we just look up i in the position map, which, because of our hierarchical ORAM, has big O of log n overhead. Then we retrieve the direct path and copath of i's leaf and decrypt them as in FS eRAM, which is of course big O of log n overhead, and then we retrieve and decrypt the stash, which is at most little omega of log n overhead. If it's a read operation, we just return the data di and then continue along the next steps; if it's a write operation, we update the data at i, say we're now going to store di prime. In both cases, we pick a new leaf at random for i and store it in the position map, which is again log n because of our hierarchical ORAM. Then we put the item back in the stash with the label for its new leaf, and we greedily evict items from the stash to the nodes on the direct path of cell i's old leaf in the tree, where we put an item in a node only if its leaf is in the subtree of that node. Since there are log n such nodes, this is of course log n overhead. Finally, we regenerate keys for the tree and re-encrypt the nodes as in FS eRAM, and re-encrypt the stash, which is log n overhead. Obliviousness then follows from previous analyses of the tree-based and hierarchical ORAMs: for the position map, we have obliviousness from the hierarchical ORAM, and for the tree, we just decrypt and re-encrypt a random root-to-leaf path and copath each operation, and also the whole stash, so basically every operation looks the same. Now for our final application, we have forward secret multicast encryption. So what is multicast? In multicast, a group manager distributes keys to a group of n users, and these users can be replaced, at which point the group manager distributes a fresh key to the current users in the group such that the key is private to everyone else: namely, everybody that was previously removed from the group and
anybody who will be added to the group in the future. There's a classical tree-based construction that achieves log n communication and computation. Our application hopes for stronger security: recently, the community has desired stronger security notions regarding state leakage. For example, in continuous group key agreement, we have the same setting as multicast, but there's no group manager, and this serves as the core of secure group messaging protocols. In continuous group key agreement, user states might be leaked, and we still want forward secrecy and what's known as post-compromise security. But unfortunately, this primitive can be very inefficient: big omega of n communication and computation per operation in the worst case. So multicast can be useful for the secure group messaging setting where there's only one administrator who adds and removes users, because of course we retain this nice log n efficiency. In this talk, we'll just focus on forward secrecy for the multicast group manager's state leakages only, and disregard user leakages for now. As I just said, we'll retain log n communication and computation, and also aim for a small group manager secret state. For security, what we want is that if the group manager is corrupted, all previous group keys should remain secure. This is optimal with respect to both our FS eRAM lower bound and previous multicast lower bounds. So here's the folklore multicast construction: we have a tree of keys, user keys are at the leaves, and the invariant we want is that users only know the keys at the nodes on their direct path from their leaf to the root. The group manager stores the whole tree. So what happens when we replace user three with user nine? We put user nine's key in the old leaf, then we generate a new key for its parent and encrypt this key to both children, so both to id9 and id4, then sample a new
key for the parent of node 01 prime and encrypt it to both children, and finally sample a new root key, which serves as the group secret. The users in the affected subtrees just decrypt whichever ciphertext comes first along their direct path to acquire the keys they need to maintain the invariant I mentioned on the last slide. Our first step in making this construction forward secret is the following. Let's say we do the same thing and replace user three with user nine. Now, at each node we have, in addition to a key, also a PRG seed, and the key will actually be a one-time pad key. So when we replace user three with user nine, we have a new seed and key, s3 prime and k3 prime, and we sample a new seed and key at the parent, and then we one-time pad the new key at the parent under both children's keys. Really, we also want to be able to obtain s1 prime from these encryptions as well; I sort of swept that under the rug, but basically you just encrypt a seed that derives both s1 prime and k1 prime. Now, importantly, every user in the subtree, as well as the group manager, will ratchet forward the keys of the two children using a PRG computation on s2 and s3 prime. So as soon as a key is used, we delete it, and we have forward secrecy: if the group manager is corrupted, all the adversary sees is keys that have never been used before. But unfortunately, the group manager still stores the whole tree. What we do to remedy this is just overlay our FS eRAM construction on the multicast tree that the group manager stores. So let's say we have n equals four, so four users in our group. We have a RAM key in the group manager's secret state, its client storage, and this RAM key encrypts both the RAM keys at its children, as in our folklore FS eRAM construction, and also the seed and key for the multicast construction at that node. And we just carry this through to the leaves.
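The seed-and-key ratchet just described can be sketched as follows. This is a toy sketch, not the paper's exact scheme: the length-doubling PRG is instantiated with a hash for illustration, and the variable names loosely mirror the slide.

```python
import hashlib, os

def prg(seed):
    """Toy length-doubling PRG: seed -> (next seed, one-time pad key)."""
    return (hashlib.sha256(b"seed" + seed).digest(),
            hashlib.sha256(b"key" + seed).digest())

def otp(key, msg):
    """One-time pad (XOR); key and msg must have equal length."""
    return bytes(a ^ b for a, b in zip(key, msg))

# Group manager samples fresh parent material and sends it one-time
# padded under each child's current one-time pad key.
s_parent = os.urandom(32)                # seed deriving the parent's (s', k')
s2, k2 = os.urandom(32), os.urandom(32)  # current state at one child node

ct = otp(k2, s_parent)                   # ciphertext for that subtree's users

# Receiver side: recover the parent seed, then ratchet own state forward
# and delete the old (s2, k2). Deleting used keys is what gives
# forward secrecy against a later state compromise.
recovered = otp(k2, ct)
assert recovered == s_parent
s2, k2 = prg(s2)
```

After the ratchet, a corrupted party holds only keys that have never been used to encrypt anything, so previously distributed group secrets stay hidden.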
So how does a replace work now? If we want to replace id2 with id5, we just download and decrypt the direct path and copath of id2, as in our FS eRAM construction. Then we generate the new multicast keys and ciphertexts for this path and copath, and we do the same for our FS eRAM component, where we replace the RAM keys on the direct path and regenerate encryptions of the new keys, so both the new RAM keys and the new seed-key pairs. Security composes: we have forward secrecy with respect to the FS eRAM part of the scheme, meaning none of the old keys at the nodes of the FS eRAM tree can be obtained, which means that when a corruption happens, only the current seed-key pairs of the multicast tree can be obtained. And as I said on the last slide, because the corrupted keys will never have been used, we get forward secrecy with respect to previous group keys. Thanks, that's it. Here's a link to our ePrint paper, and I'm happy to answer any questions over email or in the live talk. All right, bye!