Yep, thanks for the introduction. So like Rafael said, I'm Kevin, and I'm here today to present our work on lower bounds for differentially private RAMs. This is joint work with Pino Persiano.

All right, so before we begin, I want to describe the scenario. Differential privacy here means something a little different from what's appeared in the last two talks, so let's go through it slowly. What we're considering is privacy-preserving storage protocols. You have a client on your left and a server on your right, and the client wants to outsource data to the server for storage, and wants to do it in a private way. The first thing you could do is client-side encryption, so you can imagine all these blocks being encrypted. However, it turns out that the way you access data can compromise privacy. For example, suppose the client wants index i and naively goes and accesses block i. The problem is that the adversarial server can see this and say, oh, index i was downloaded. Maybe for one access that's fine, but over time the server gathers statistics: things like "index 15 was the most frequently downloaded" or "index 2 was never downloaded." And it's been shown in the literature that these leakages can be quite damaging to privacy.

So what do we really want? We want some ideal protocol: the client says "I want index i," it does a sequence of accesses in some randomized way (maybe the data is shuffled, et cetera), it gets block i, and from the server's point of view there's no telling which index was requested. This is actually a well-studied problem under a different name: ORAM, or oblivious RAM.

So what is an oblivious RAM, more formally? Consider two sequences of access patterns. The top sequence accesses block 5, block 27, block 13, and the bottom is the same kind of thing, but they differ in many locations. What an ORAM does is take these logical accesses and compile them into private, or oblivious, accesses. What do I mean by compilation? It's this red box: it takes an access to index i and produces a sequence of physical accesses that are private. And privacy is what you would expect: the two resulting ORAM access patterns are indistinguishable. It turns out you can get both computational and statistical security here, and both have appeared in the literature.

All right, so we've talked about ORAM. What is a differentially private RAM? It's a weakening of oblivious RAM. What we're saying is: let's consider two sequences that are now neighboring. What does neighboring mean? It means they differ in exactly one location. So here they are identical everywhere except at this bolded index. And we want to compile them as well: take these accesses to logical indices and convert them into physical accesses that are private. What does privacy mean now? Well, since we're doing differential privacy, let's go all the way: the two compiled access patterns should satisfy the standard (ε, δ) differential privacy notion. Since that definition has appeared a lot already today, I'll just skip it.
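To make this compilation picture concrete, here is a minimal sketch in Python. This is my own illustrative code, not a construction from any paper: a trivial "linear scan" compiler that touches every block on every logical access, so the physical access pattern is identical no matter which index was requested. It is perfectly oblivious, but with O(n) overhead per access; the whole question of this area is how much cheaper you can get.

```python
import secrets

class LinearScanORAM:
    """Toy 'compiler' from logical accesses to physical accesses.

    Every logical access touches all n blocks in the same fixed order,
    so the server's view is (read 0, write 0, read 1, write 1, ...)
    regardless of which index i was requested. A real scheme would also
    re-encrypt each block on the way back so writes look fresh."""

    def __init__(self, n, block_size=32):
        # Stands in for the client-side-encrypted blocks on the server.
        self.server = [secrets.token_bytes(block_size) for _ in range(n)]

    def access(self, i, new_block=None):
        result = None
        for j in range(len(self.server)):   # physical probes: j = 0..n-1
            block = self.server[j]          # physical read of block j
            if j == i:
                result = block              # found the requested block
                if new_block is not None:
                    block = new_block       # logical write, hidden in the scan
            self.server[j] = block          # physical (re-)write of block j
        return result
```

Since the physical pattern is the same for every i, even an unbounded adversarial server learns nothing about the requested index; the cost is that one logical access becomes n physical reads and n physical writes.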
But one thing I would like to note is that throughout this talk we're considering computational differential privacy, so the adversary A is probabilistic polynomial time.

All right, so this leads to the very interesting question we came across: a differentially private RAM is a weakening of ORAM in terms of privacy, and it looks like a significant weakening. So can we construct differentially private RAMs that are faster than ORAMs? A very natural question, and it seems like it should be doable. When I first thought about the problem, I figured: there's only a single differing operation whose access pattern we have to hide, this should be easy.

Well, OK, let's go through the ORAM bounds to make sure we understand what we're competing with. On the upper bound side, just in the past year there's been a lot of work getting down to logarithmic: there was PanORAMa, which came from our group, as well as OptORAMa, which achieves the optimal logarithmic overhead. And there's been a lot of work on lower bounds as well. Dating back to the '90s, one of the original ORAM papers showed an Ω(log n) lower bound, but with the caveats of statistical security and a non-coding ("balls and bins") assumption, which were removed in a CRYPTO paper last year by Larsen and Nielsen, which I think won one of the best paper awards.

All right, so now we understand what we're up against. Let's see what we can do. We have a couple of contributions, but the main one is that for typical differential privacy parameters, ε a constant and δ less than 1/3, differentially private RAMs must also incur Ω(log n) overhead. And note that this condition on δ is very permissive: you typically want δ much smaller than 1/3, so the bound covers essentially every parameter setting you'd actually use. So implicitly, a differentially private RAM is not faster than ORAM.

And we actually show something even stronger. Since differentially private RAMs are such a weak notion, you can imagine variants where neighboring sequences may differ only in a read, or only in a write. Now you can separate the cost of reads from the cost of writes: maybe one of them is log n and the other is constant, or vice versa. What we show is that this still can't happen. Suppose we have a differentially private RAM that uses o(log n) overhead on writes; then we prove it must use super-logarithmic overhead on reads, and vice versa. So in summary, what we show is that differentially private RAM is not asymptotically faster than ORAM, for any typical differential privacy parameters, and even for much weaker notions.

All right, so let's get into how we prove these bounds. Before we begin, let me quickly describe the model we're proving things in. It's a model that has appeared very frequently in data structure land, but only very recently in oblivious data structure land: the cell probe model. We have a server, and the server consists of cells, each of W bits. The only cost in the cell probe model is the client accessing a single cell. So, to quickly summarize, the only cost is what we call a probe: either a read or a write to a cell of W bits. Computation is completely free; you can assume as much computation as you want. And as a result of this very weak cost model, we get a very strong lower bound, because the bound holds even when computation is free, so it carries over to any model that also charges for computation.
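As a sketch of what this cost model looks like in code (again, my own toy harness with hypothetical names, not anything from the paper): the server is just an array of W-bit cells, and the only thing we ever count is the number of probes.

```python
class CellProbeServer:
    """Toy cell probe model: memory is an array of W-bit cells, and the
    one and only cost is a probe (a single cell read or write). Client
    computation is free and is deliberately not counted."""

    def __init__(self, num_cells, w_bits=64):
        self.cells = [0] * num_cells
        self.w_bits = w_bits
        self.probes = 0                     # the only cost counter

    def read(self, addr):
        self.probes += 1                    # a read probe costs 1
        return self.cells[addr]

    def write(self, addr, value):
        assert value.bit_length() <= self.w_bits   # a cell holds W bits
        self.probes += 1                    # a write probe costs 1
        self.cells[addr] = value
```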
All right, so before we get to our lower bound, let me quickly go over the previous lower bound, the seminal ORAM lower bound of Larsen and Nielsen, and give you an idea of why we couldn't use it for our proof. Their lower bound uses the famous information transfer technique of Pătrașcu and Demaine, which appeared in 2006. The idea is to construct a binary tree with n leaves. I've drawn it sideways, so the n leaves are on the right. And what you're going to do is assign each operation of your hard sequence uniquely to a leaf: operation one, all the way to operation n.

The hard sequence Larsen and Nielsen actually used is a very simple one: a read of index 0, a write to index 0, a read of index 0, alternating. And by a write, we always mean writing a uniformly random, independently generated block. This seems quite easy to handle, right? You're only dealing with one index. But the key idea they use is that this sequence must look oblivious, indistinguishable from many more complex sequences that do many different things. I'll discuss that more in a moment; that's the key idea.

So, as we've done here, I've taken my hard sequence, these n operations, and assigned them uniquely to the leaves. Now, thinking about it more, what are we actually doing? Each of these operations, a write to index 0 or a read of index 0, is implemented by cell reads and cell writes, these probes. So, for example, maybe the first operation, writing a random block to index 0, is implemented by reading cell 15, writing to cell 72, writing to cell 228, et cetera.

Now I'll show you how to construct the information transfer tree. You take any cell probe that's a read, say this read of cell 15, and you find the most recent write to cell 15; say it's up there, and assume the current operation doesn't itself write to cell 15 in between. Then you simply assign the cell probe to the lowest common ancestor of those two leaves, which in this case is this node.

Now let me tell you a very cool fact, at a high level. Let's go slowly: pick a node and look at all the probes assigned to it. It turns out these probes carry all the information that can be transferred from that node's top subtree to its bottom subtree. What do I mean by that? Any answer given in the bottom subtree of this red node that uses information from the top subtree of the red node can only get that information through the probes assigned to this node.
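Here is a small sketch of this probe-to-node bookkeeping in Python; the log format and the (level, index) naming of tree nodes are my own, made up for illustration.

```python
def lca(u, v):
    """Name the lowest common ancestor of leaves u and v of a complete
    binary tree as a (level, index) pair, where level 0 is the leaves;
    climbing is just dropping low-order bits of the leaf index."""
    level = 0
    while u != v:                     # climb one level at a time
        u, v = u >> 1, v >> 1
        level += 1
    return (level, u)

def assign_probes_to_lca(probe_logs):
    """probe_logs[t] is the list of ('read', cell) / ('write', cell)
    probes made while serving logical operation t, one operation per
    leaf. Each read probe is assigned to the LCA of its own leaf and
    the leaf of the most recent write to the same cell; the per-node
    counts are the 'information transfer' through that node."""
    last_write = {}                   # cell -> leaf of most recent write
    counts = {}                       # tree node -> assigned probe count
    for t, probes in enumerate(probe_logs):
        for kind, cell in probes:
            if kind == 'read':
                if cell in last_write:
                    node = lca(last_write[cell], t)
                    counts[node] = counts.get(node, 0) + 1
            else:                     # a write probe: remember its leaf
                last_write[cell] = t
    return counts
```

(A read whose matching write happened within the same operation just lands on the leaf itself, which matches the convention in the talk of ignoring intervening same-operation writes.)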
To quickly explain why, let's consider a couple of examples. First, why can't information cross through a probe assigned to a descendant of the red node? Well, if you think about it carefully, any probe assigned to this blue descendant node has both its write and its read inside the blue node's subtree, which sits entirely inside one half of the red node's subtree. So you can quickly see that no information is being transferred this way from the top subtree of the red node to the bottom subtree of the red node. And similarly, if you look at any ancestor node: it could be that the ancestor is assigned probes carrying information from the top subtree of the red node, but those probes are only read at a location after the bottom subtree of the red node. I'm going very fast here, but that's the core idea of the information transfer technique.

So what you can do now is focus on each subtree and construct a hard sequence for it, and it's actually a very simple one. In the top half of the subtree, you write random blocks to unique indices, and in the bottom half, you just read them back: write 1, write 2, then read 1, read 2. And you can quickly see that for any subtree with 2ℓ leaves, if each block is B uniformly random bits, then ℓ·B bits must be transferred from the top subtree to the bottom subtree. That's just a simple information-theoretic lemma. So if you believe me that the only information transferred from top to bottom goes through the probes assigned to the root of that subtree, then that root must be assigned Ω(ℓ·B/W) probes, since each probe carries at most W bits of information.

And fine, we did this for one subtree, but by obliviousness it must hold for all subtrees simultaneously. Because if you ever observe that some specific node doesn't have enough probes assigned to it, you can rule out a set of operation sequences, and even a computational adversary can do that, because it just counts probes. So, long story short, whatever the worst case demands for each node of the tree must hold for every node at once; otherwise you would break obliviousness.

All right, so that's the Larsen and Nielsen paper, and now you might ask: why does this not work for differentially private RAM lower bounds? Well, look carefully at our hard sequence for each subtree. For a subtree with ℓ leaves, we're changing the original hard sequence into a different hard sequence in on the order of ℓ operations. And this is really bad for differential privacy, because differential privacy guarantees deteriorate exponentially in the number of operations that differ. So what I'm essentially saying is that when the subtrees are too large, we have no meaningful privacy guarantee left; we can't run the gluing argument the way obliviousness let us, because the guarantees are just too weak. Another way to look at it, maybe the way I like to look at it, is that if you change a single operation, you can only affect on the order of log n subtrees. Here's a quick example why: suppose we change this read of index 0 into a read of index 1. How many subtrees can that affect? Just the subtrees rooted at the ancestors of that leaf, which is only O(log n) of them.
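And just to pin down the counting this technique does give you for ORAM, here is my back-of-the-envelope paraphrase of the argument, in the talk's notation:

```latex
% A node v whose subtree has 2l leaves sees l*B bits forced from its
% top half (l random writes) to its bottom half (l reads), and each
% probe carries at most W bits, so v is assigned >= l*B/W probes.
% Summing over the tree (n/(2l) nodes at the "2l-leaf" level):
\[
  \sum_{\ell \,=\, 1, 2, 4, \dots, n/2}
    \frac{n}{2\ell} \cdot \frac{\ell B}{W}
  \;=\; \frac{nB}{2W} \cdot \log_2 n
  \;=\; \Omega\!\left( n \cdot \frac{B}{W} \cdot \log n \right),
\]
% i.e. amortized Omega((B/W) log n) probes per operation, which is
% Omega(log n) in the natural setting B = Theta(W).
```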
So, at least with this technique, changing one operation only buys you a very weak bound; it improves things by a factor that's insignificant for the lower bound.

All right, so how do we get around this? Well, we went to a completely different technique that's actually much older: the chronogram technique of Fredman and Saks, introduced back in 1989. I'll caveat this by saying I'm only going to explain the Ω(log n / log log n) lower bound, because the full Ω(log n) bound is much more complicated, but I think the simpler one gives you the flavor of what the chronogram technique is.

The hard sequence for our technique is very simple: write to index 1, write to index 2, and so on all the way to index n, where again these are all independently generated, uniformly random blocks. And then a read of index 0. I highlight the read of index 0 because that's the operation that will change later.

OK, so what we do is take the writes and partition them into exponentially decaying epochs, numbered in reverse order: epoch 0 is at the very end, and epoch k, the largest, is at the very beginning. They decay geometrically in size by some factor b, which we'll set later.

All right, now let's go back to our cell probe model, where we have all these cells on the server. We're going to number them in a very specific way: each cell is labeled with the epoch of the last operation that overwrote it. As an example, consider the leftmost cell: it's a five, which means the last operation that modified the contents of this cell occurred in epoch 5. We do this for the whole memory; of course some cells were never modified, but we just remove those.

All right, so this lower bound is actually very intuitive and very simple. What we do is fix an epoch i; for concreteness, let's pick i = 5. And all we're going to do is analyze where information about the writes from epoch 5 can be stored. Remember, epoch 5 is just a bunch of writes, and all we're trying to figure out is where information about them could ever be stored in our data structure.

Looking at it first: of course, information could be written in any cell that was last overwritten in epoch 5 itself; that's simple. It can also be written in future epochs, the ones that occur after, so any epoch with a smaller number. You might ask: wait, why can't information about epoch 5 be written in the larger-numbered epochs that appeared before it? Well, because we're generating these blocks uniformly and independently at random, a previous operation cannot write information about blocks that haven't been generated yet. So it turns out these two are the main locations where information about the writes in epoch 5 can be stored. That's exactly what's written here, and I want to focus on the second bullet: the cells written in the epochs after epoch 5, which are epochs 0, 1, 2, 3, and 4.
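Here is a quick sketch of this epoch construction and cell labeling; the helper names are my own, for illustration only.

```python
def build_epochs(n_writes, b):
    """Partition write times 0..n_writes-1 into geometrically decaying
    epochs, numbered in reverse: epoch 0 is the final b**0 = 1 writes,
    epoch 1 the b writes just before those, epoch i the b**i writes
    before those, and so on (the earliest epoch may be truncated).
    The number of epochs is roughly log base b of n_writes."""
    epochs, end, i = [], n_writes, 0
    while end > 0:
        start = max(0, end - b ** i)
        epochs.append(range(start, end))   # epochs[i] = times in epoch i
        end = start
        i += 1
    return epochs

def label_cells(last_write_time, epochs):
    """last_write_time maps cell -> time of the last write that touched
    it; label each cell with the epoch containing that time (cells that
    were never written simply don't appear, as in the talk)."""
    def epoch_of(t):
        return next(i for i, ep in enumerate(epochs) if t in ep)
    return {cell: epoch_of(t) for cell, t in last_write_time.items()}
```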
So going back to our weird epoch construction, you might ask: why these geometrically decaying sizes? And it turns out this is the exact reason, as I was claiming. If you look at all the epochs that occur in the future of epoch i, because their sizes decay geometrically, they contain at most on the order of b^(i-1) updates in total, assuming b is at least 2 or so. And what we're trying to say is that if you choose b large enough, there are simply not enough operations in the future to write down a significant amount of information about epoch i. For example, if b is, say, log n, then even if the data structure spent all its future probes writing only about epoch i, it could only record about a 1/log n fraction of the information that was actually generated in epoch i. This is very important: the future is just too small; it doesn't have enough time to write anything substantial about epoch 5.

I guess there's one more place I'd forgotten to mention: there's also client memory. But for simplicity, let's assume it's really small, say constant size, or the size of a key. So these three are the exact locations where information about epoch 5 can be stored, and two of them are really small: client memory is tiny, and the cells written after epoch 5 are tiny.

What I'm building up to is this: suppose we have a read operation for some random index X, where X is chosen uniformly at random among the b^i indices overwritten in epoch i. Then it must be that this read operation reads Ω(B) bits from cells last overwritten in epoch i. And you might ask: why? Well, the other two locations that could have stored information about epoch i are just too small; the data structure can try its best, but it could only have stored something like a 1/log n fraction of the needed information there. So, on average, Ω(B) bits must be read from cells last overwritten in epoch i by this read operation.

And now we apply differential privacy. Right now we have a bad, worst-case sequence for each epoch, and we want to do the same gluing we did with obliviousness before. In Larsen and Nielsen, they glued together the worst cases for all the subtrees using obliviousness, which we couldn't do, because differential privacy deteriorates with the number of differing operations. What we do here is actually very simple: we only modify the read operation. We consider k different random sequences: in the first sequence, the index X is chosen uniformly at random from all the updates in epoch 1, and it's the same thing for epoch 2, all the way to epoch k. And you can quickly notice that all of these sequences differ in only one location: the final read. So the differential privacy guarantees still hold quite strongly, and now we can actually glue this together.
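To make the gluing step concrete, here is a sketch of one of the k neighboring hard sequences, reusing the build_epochs helper sketched above; the function and its log format are my own, hypothetical.

```python
import random

def hard_sequence(n_writes, epochs, target_epoch):
    """One of the k neighboring hard sequences: write a fresh random
    block to each of the indices 1..n_writes, then issue a single read
    whose index is drawn uniformly from the writes of `target_epoch`
    (epochs as produced by build_epochs above). Any two such sequences
    differ in exactly one operation, the final read, so the
    (eps, delta)-DP guarantee still binds any pair of them tightly."""
    ops = [('write', t + 1) for t in range(n_writes)]   # write 1..n
    t = random.choice(list(epochs[target_epoch]))       # a time in epoch i
    ops.append(('read', t + 1))                         # read the index written at time t
    return ops
```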
So what happens is that this Ω(B) bits that must be read from cells last overwritten in epoch i: this holds for all epochs simultaneously. Otherwise, a computational adversary can quickly say: let me look at the read operation, let me count all the probes it makes into cells from each epoch, and if there aren't enough probes into some specific epoch, I can rule out specific read operations. So, from a high level, what we've done is find a way to glue these epochs together, which is exactly what we couldn't do for the subtrees in the information transfer technique.

All right, so like I said, I haven't actually given you the Ω(log n) lower bound; I've given you a weaker version. What I've shown is essentially an Ω(log_b n) lower bound, where b is the decay factor of the epochs. And with this specific technique, you need b to be roughly polynomial in the average number of cell probes a read operation makes and in the cell size. So this technique peaks at Ω(log n / log log n), which is not the Ω(log n) I promised you. So now you're probably asking: did I cheat you? How do we get this log log n factor back?

OK, getting back the log log n factor. It's funny: log log n, for all practical purposes in this world, is like five, but it took us ten pages to prove. The key idea is to randomize the query location. The worst case, and the reason this log log n factor exists, is the one I described before: the data structure doesn't even care about future epochs; all it does is attack one specific epoch, spending its entire permitted write budget storing as much information as possible about that epoch. To prevent this, we randomize the query location: consider a long sequence of write operations, and then pick the position of the read operation at random, without the adversarial data structure knowing it. What you can then prove is that, since the data structure doesn't know when the read occurs, it can't know how the epochs will be laid out, and the best it can do is spread its effort and attack all the epochs equally, simultaneously, which gives you the log log n factor back. And once you do this, the epoch construction can get away with only a constant decay factor: you still get Ω(log_b n), but now b is a constant, so that's Ω(log n).
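Putting the counts together, here is my paraphrase of the final calculation, in the talk's notation:

```latex
% Each of the K = Theta(log_b n) epochs forces the (DP-protected) read
% to probe Omega(B/W) cells last overwritten in that epoch, so
\[
  \text{probes per read}
  \;\geq\; \sum_{i=1}^{K} \Omega\!\left(\frac{B}{W}\right)
  \;=\; \Omega\!\left(\frac{B}{W} \cdot \log_b n\right).
\]
% With B = Theta(W): taking b = polylog(n), as the basic technique
% requires, gives Omega(log n / log log n); randomizing the query
% location lets b = O(1) and yields the full Omega(log n).
```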
OK, going forward: we also have the stronger trade-offs I promised, where if you have o(log n) writes, we give you a super-logarithmic lower bound on reads, and vice versa. Once you do this randomized-query-location trick, you can get these very strong trade-offs between read and write operations by playing with the epoch construction. What I described before was a very simple construction that decays by some fixed factor b, but you can modify it to be especially bad for data structures with very cheap reads or very cheap writes.

So as an example, suppose your writes are very cheap. Then you can actually make the epochs even smaller, and therefore get significantly more of them, because, remember, the worst case before was these future epochs attacking a specific epoch, and if the data structure doesn't have much write budget, small epochs are all it could fill anyway. Having more epochs then lets you prove an ω(log n) lower bound on reads by applying the same technique over all the epochs. And you can do the same thing, vice versa, for the reads.

All right, so that's it. I guess before I end, I want to describe some newer lower-bound work from our group. The first one is a little unusual; it's a different kind of lower bound. In all of the literature, what we've essentially done is fix a specific privacy definition and ask: what is the best efficiency we can achieve? So we flipped the question. In a lot of database settings, it's essentially impossible to tolerate anything more than constant overhead for these highly accessed databases and machines. So we asked: suppose you're stuck with small overhead, what is the best storage access privacy you can achieve? And we characterize this very precisely for small-overhead differentially private data structures. Spoiler: ε must be Θ(log n), which is not great.

Another work I've been a part of is lower bounds for oblivious near-neighbor search. So far, all of these oblivious lower bounds have peaked at log n: we've taken data structure problems with constant-overhead non-oblivious solutions, a plain RAM has constant overhead, stacks, queues, et cetera have constant overhead, and proved Ω(log n) lower bounds for their oblivious versions. So there's a natural question: can you take a data structure problem where the best non-oblivious lower bound is log n or lower, and prove something super-logarithmic, say an Ω(log² n) lower bound? And we did that for oblivious near-neighbor search. It's actually pretty cool in that we developed new techniques along the way that we believe apply to ordinary data structure lower bounds, even non-oblivious ones. And that's it; I'll take any questions, I guess.