Good morning, everyone. I'm Kartik Nayak, and today I'll be talking about our work on locality-preserving Oblivious RAM. This is joint work with Gilad Asharov, Hubert Chan, Rafael Pass, Ling Ren, and Elaine Shi. So let me start off by quickly defining an Oblivious RAM. In general, in a computation, you have a request sequence I, which looks like the following: read memory location a1, then write something to memory location a2, and so on and so forth. Your CPU executes this access pattern and sends responses back. The problem we are trying to solve with an Oblivious RAM is to hide which addresses are accessed in this request sequence. So we assume an adversary that is snooping on your address bus, but it cannot look into the contents of the CPU or the actual contents of the data in memory; for instance, you can think of your data as being encrypted. What you want is to convert this logical access sequence into a physical sequence, call it ORAM(I), in such a way that the adversary does not learn what I is. So for security, you want that for any two access patterns I and I' of the same length, ORAM(I) is indistinguishable from ORAM(I'). Typically, when you obfuscate these accesses, you end up accessing more memory locations than the logical sequence would, and that is why one important metric you are trying to reduce is bandwidth: the number of memory locations you access for every logical access. Separately, in systems, another notion that is very important is data locality. What is data locality? If a memory location is accessed, a nearby memory location will be accessed soon enough. For instance, if you have a file server where a file spans five blocks, then when you access the entire file, after accessing the first block you will end up accessing the next four blocks as well.
And you will have a similar observation if, for example, you have range queries. This notion of locality has motivated the design of the hard disks and SSDs we have today: accessing the first block is expensive, while accessing the subsequent nearby blocks is cheaper. At this point, I would like to distinguish this notion of locality, which is spatial locality, from temporal locality. Temporal locality says that if I access a memory location, soon enough I will access the same memory location again, and that has motivated the design of memory hierarchies, caches, and so on. But here we are only concerned with spatial locality. So the problem is that with an Oblivious RAM, this obfuscation of accesses inherently breaks locality. In an ORAM scheme, if I access memory location five, I end up accessing five and some additional spurious memory locations. But more importantly, after performing this access, I end up shuffling my data around, and such shuffling inherently breaks data locality. So the question we ask in this work is the following: can we construct a bandwidth-efficient ORAM that preserves data locality? This notion of bandwidth-efficient is important here. But before we get to this question, let us first define what I mean by locality. At a very high level, locality is just the number of head seeks, or head jumps, or the number of discontinuous regions that I'm going to access. So if I'm making one ORAM access, and the head moves in the following fashion, where I access three blocks, that contributes three to bandwidth and one to locality. If the head then performs a jump and makes another set of accesses, my locality and my bandwidth increase correspondingly. So at a high level, I'm just trying to reduce the number of head seeks, or head jumps.
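To make the two metrics concrete, here is a small sketch (not from the talk, with a hypothetical helper name) that counts bandwidth and locality for a sequence of physical block addresses:

```python
# Hypothetical sketch: bandwidth counts every block touched;
# locality counts head jumps, i.e. discontinuous regions.
def bandwidth_and_locality(accesses):
    """accesses: list of physical block addresses in the order touched."""
    bandwidth = len(accesses)
    locality = 0
    prev = None
    for addr in accesses:
        if prev is None or addr != prev + 1:  # a head seek / jump
            locality += 1
        prev = addr
    return bandwidth, locality

# Three contiguous blocks, a jump, then two more contiguous blocks:
print(bandwidth_and_locality([10, 11, 12, 40, 41]))  # (5, 2)
```

A full linear scan has locality 1 but bandwidth N, which is exactly the trivial solution ruled out next.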
So if reducing locality were my only goal, the following simple solution would just work: I scan the entire memory, and I get a locality of one, which is great. The problem is that the bandwidth is going to be very high. So in that sense, I'm trying to reduce both locality and bandwidth together, and intuitively, I'm trying to get both of them to be polylogarithmic in N. So the first question is, can I actually get this? It turns out we show a lower bound saying that you cannot: in general, if you want sublinear locality, that implies linear bandwidth. Well, this is not great, but on the flip side, we show that if you allow some leakage, something is possible. So the next question we ask is whether we can construct a bandwidth-efficient ORAM scheme that preserves data locality while only leaking the lengths of the contiguous regions that are accessed. And under such leakage, we show the first ORAM scheme with data locality. Specifically, if your access sequence makes T requests spanning K discontinuous regions, we show an ORAM scheme whose bandwidth is T times polylogarithmic in N. If you know the ORAM literature, this is similar to what all ORAM schemes achieve. But the main part of the result is that we have locality that depends only on the number of discontinuous regions K, with a blow-up factor that is polylogarithmic in N; specifically, it does not depend on the number of memory locations T that are accessed. We achieve this while using constant client storage. And interestingly, for some part of the computation, we end up using two disks instead of one, and that is essential in our scheme to achieve locality. And as I mentioned earlier, we end up leaking the lengths of the K regions that we are accessing.
So in a nutshell, we have this upper bound and this lower bound, and for the rest of the talk, I'm going to focus on the upper bound result. Before I explain how to achieve this for a standard ORAM, let me simplify the problem and show how it works for a simplified input pattern. In general, my request pattern can look like the following: read memory locations 2, 3, 4, and then write to memory locations 7 and 8. So T equals 5 and K equals 2 here. The first simplification I'm going to make is that instead of these requests coming one by one, I know my ranges offline: I know ahead of time that the accesses are going to be for the ranges 2 to 4 and 7 to 8. The second simplification is that I will only allow reads for now. So let us see how we can handle this simple scenario. The high-level idea is to use what we call a read-only range ORAM, and the idea is very simple. Instead of storing data in one ORAM, I'm going to store data in log N plus 1 different ORAMs: ORAM 0 to ORAM log N. ORAM 0 stores my primitive data blocks, the way you would normally store data in an ORAM. ORAM 1 stores N/2 blocks, but each block is now a super block, which is a concatenation of two primitive blocks: the first super block is blocks 1 and 2, the second is blocks 3 and 4, and so on. So you have N/2 super blocks of size 2, and in general, ORAM i has N over 2 to the i super blocks, each of size 2 to the i. The important invariant here is that all ORAMs store identical information: if I access some primitive block i through ORAM 0, it should be the same as primitive block i inside any of the other ORAMs. So let's say I now want to access a range of size 2 to the j; if the range is smaller than that, I can pad it to the nearest power of 2. The observation is very simple.
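The relevant arithmetic is just a couple of integer divisions (a sketch with a hypothetical helper name, not from the talk): level j stores aligned super blocks of 2 to the j consecutive primitive blocks, so a range of length 2 to the j can straddle at most two of them.

```python
# Hypothetical sketch: which level-j super blocks cover a range?
# Level j partitions the address space into aligned chunks of 2**j
# consecutive primitive blocks.
def covering_superblocks(start, j):
    """Indices at level j of the super blocks covering [start, start + 2**j)."""
    size = 1 << j
    first = start // size                # super block containing `start`
    last = (start + size - 1) // size    # super block containing the last address
    return first, last                   # equal when the range is aligned

# Range [5, 9) of size 4 (j = 2) is covered by super blocks 1 and 2,
# i.e. primitive blocks [4, 8) and [8, 12):
print(covering_superblocks(5, 2))  # (1, 2)
```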
A range of size 2 to the j is covered by two super blocks of size 2 to the j, and if you think about this for ten seconds, you'll realize that it is true. Which two super blocks these are, I can compute with some simple math. Now, if I want to access a range of size 2 to the j, I just go to ORAM j and perform my access. That has a bandwidth of 2 to the j times polylog(N): the ORAM has a blow-up of polylog(N), and each block has size 2 to the j. In terms of locality, observe that the locality is independent of 2 to the j, because each chunk of size 2 to the j is stored and accessed contiguously. The reason this is a read-only ORAM is the invariant that all ORAMs store identical information: if I were to update a range in ORAM j, I would need to update it in all of the other ORAMs as well, and that breaks either bandwidth or locality. So in order to support writes, we use a slightly generalized data structure, which we call a range tree. A range tree is the following tree: again, the leaves are primitive blocks, sorted by address. The distinction from the last slide is that here the leaves can contain discontinuous blocks; for example, here they are 1, 2, 5, and 6, and blocks 3 and 4 are not present. Every internal node holds a super block, which is a concatenation of its children, so the data is again replicated at every height, and so on up the tree. And looking ahead, each of these levels will be stored in a different ORAM. If you think about what we did for the read-only range ORAM, it is exactly this range tree for the full set of all elements. So to support writes, I'm going to use a hierarchy of range trees, where the largest tree contains all of my data, and then I have many smaller range trees. And I'm going to maintain these two invariants. The first is the range tree invariant, which says that within a tree, blocks are consistent.
That is, if I'm looking for a primitive block, it is stored at all of these different heights, and each copy has the same information. The second invariant we maintain is that smaller trees are fresher. Across trees, blocks may not be consistent, but whenever I have two copies of a block across trees, the one in the smaller tree is the fresher one; smaller trees act as a cache, or a stash, for the larger trees. So let us see how we can perform an access with this data structure. At a high level, each access consists of two parts. First, I fetch some data and store it at some contiguous locations on the server. Then I engage in a maintain, or rebuild, process, where I write this data back into the ORAM data structure. So let's say I have a request to read the range of locations s to t. For range trees whose root is smaller than this range, all I do is access the root. For range trees that are larger, I need to find the two super blocks that store this range [s, t]. Earlier, with the read-only range ORAM, this was easy, because I could use simple math to figure out where these two super blocks are stored. But now it turns out I cannot do that, because my data need not be the set of all N memory locations. So in order to find where these two blocks are, we use a separate auxiliary data structure, which must itself be oblivious; at a high level, something like a binary search on top of an ORAM is sufficient to locate these two blocks. Then I access the appropriate blocks from ORAM j, where j is the log of t minus s plus 1. At the end of this, I have data from each of these ORAMs, all written to the server, and the same block can be repeated multiple times. So I need to perform some sort of deduplication to get the freshest copy of the range [s, t].
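As a toy illustration of that deduplication step (an assumed in-memory layout with a hypothetical helper; the actual scheme does this obliviously, with a sort, rather than with a dictionary): the fetched results are ordered smallest tree first, and for each address we keep the first, i.e. freshest, copy we see.

```python
# Hypothetical sketch of deduplication across the tree hierarchy.
# Invariant: smaller trees are fresher, so when the same address
# appears in several trees, the copy from the smallest tree wins.
def deduplicate(fetched):
    """fetched: list of {address: value} dicts, smallest tree first."""
    result = {}
    for tree in fetched:                     # freshest tree first
        for addr, value in tree.items():
            result.setdefault(addr, value)   # keep the first (freshest) copy
    return result

# Address 3 appears in both trees; the smaller tree's value wins:
small = {3: "new"}
large = {2: "a", 3: "old", 4: "b"}
print(deduplicate([small, large]))  # {3: 'new', 2: 'a', 4: 'b'}
```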
So let us try to analyze the locality of this algorithm. Again, remember that locality is the number of discontinuous accesses, or the number of jumps. For the first step, we are just making up to log N discontinuous accesses of the size of the range. For the second step, accessing the oblivious data structure, we are again making polylogarithmically many accesses, but these are of size one, because all we are fetching from that data structure is an index. For the next step, when I actually perform the access, I'm again making a polylogarithmic number of discontinuous accesses of the size of the range. And finally, to perform deduplication, we'll use some sort of locality-friendly oblivious sort. For now, I'm going to punt on the oblivious sort; I'll revisit it when we reach the maintain phase. But if you look at all of these steps, none of them depends on the size of the range in terms of locality; that is, locality is independent of the size of the range. With that, let us see how to perform the maintain, or rebuild, operation, where the goal is to write the accessed data back into the data structure. Updating a range tree in place is hard, because we want to store consistent values across the different heights. If you think of accessing a range of size k and then updating it across all heights: updating through the root incurs a large bandwidth, because I have to download the entire data, and similarly, updating the leaves incurs a large locality, because I need to do it one by one. The fix is to instead write into a smaller range tree and rebuild range trees. If you take a bird's-eye view of what we are doing here, it is similar to how rebuild operations are performed in hierarchical ORAMs in general. But the important constraint we need to maintain is locality.
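A natural candidate for such a locality-friendly oblivious sort is bitonic sort, because its compare-exchange schedule depends only on the array length, never on the data, which is what makes its access pattern oblivious. Here is a minimal single-array sketch (my own illustration, ignoring the two-disk layout used in the actual scheme):

```python
def bitonic_sort(a):
    """In-place bitonic sort; len(a) must be a power of two. The sequence
    of compare-exchanges depends only on len(a), not on the data."""
    n = len(a)
    k = 2
    while k <= n:            # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:        # compare-exchange distance within a merge
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a

print(bitonic_sort([7, 3, 0, 5, 6, 2, 1, 4]))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

The network has O(log² n) rounds of fixed compare-exchanges, which lines up with the log-squared locality figure stated next.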
So we require ORAMs that have a locality-friendly initialization and a locality-friendly rebuild procedure. Specifically, I cannot initialize an ORAM by starting with an empty ORAM and adding blocks one by one; I need to do it as a batch. An important ingredient, it turns out, is a locality-friendly oblivious sort, and here we use bitonic sort. We show that if you have two disks, bitonic sort can be implemented with O(log² N) locality and O(N log² N) bandwidth, and this is where the notion of two disks comes into our final result. All right, so we have seen how to do a read-only range ORAM using a specific range tree, and how to do reads and writes using a hierarchy of range trees. Let us now see how to move from offline to online. Again, the idea is simple: requests come one by one, and all I do is some sort of predictive prefetching. Whenever I see that the request sequence is discontinuous, I just access a block of size one. On the other hand, if requests are contiguous, I prefetch a larger super block of double the size of the previous fetch, so the fetch size grows geometrically. So if I have requests that look like 2, 3, 4, 5, 6, followed by 1: first I access a block of size 1, then a block of size 2, then a block of size 4, and then I observe that locality is not maintained when the request for 1 comes in, so I start all over again. As you can easily see, a range of length L requires up to log L accesses. So I can move from offline to online by throwing in another log N factor in terms of both locality and bandwidth. To summarize, here are our contributions. I discussed the upper bound in this presentation; for the lower bound, I would urge you to read our paper. But before ending my talk, let me quickly discuss whether such a length leakage in an ORAM is reasonable.
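As a brief aside before that, the predictive prefetching just described can be simulated in a few lines (a hypothetical sketch with a made-up helper name, not the paper's code):

```python
# Hypothetical simulation of online predictive prefetching: a
# discontinuous request fetches a block of size 1; while requests stay
# contiguous, the fetch size doubles.
def prefetch_sizes(requests):
    """Return the size of each super-block fetch for a request stream."""
    sizes = []
    run_start = None   # first address of the current contiguous run
    cover_end = None   # first address beyond what has been fetched
    size = 1
    for addr in requests:
        if run_start is not None and run_start <= addr < cover_end:
            continue            # already covered by an earlier prefetch
        if cover_end is None or addr != cover_end:
            size = 1            # discontinuity: restart with size 1
            run_start = addr
        sizes.append(size)
        cover_end = addr + size
        size *= 2               # grow geometrically while contiguous
    return sizes

# Requests 2,3,4,5,6 then 1: fetches of size 1 (covers 2), 2 (covers
# 3-4), 4 (covers 5-8); the jump back to 1 restarts with size 1.
print(prefetch_sizes([2, 3, 4, 5, 6, 1]))  # [1, 2, 4, 1]
```

A contiguous run of length L triggers at most about log L fetches, matching the extra logarithmic factor mentioned above.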
Whether this is reasonable largely depends on the application; that is my umbrella statement. If the lengths are already public knowledge, range ORAMs are useful, and they give you efficiency because of locality. On the flip side, I'll also mention that range ORAM is a strict generalization of regular ORAM: I can always turn off the locality feature by simply not using predictive prefetching. And finally, people have recently considered hiding the access pattern length using differential privacy; perhaps something like that can be used here to hide the lengths of ranges. Let me end my talk by posing some open questions. The first open question, and to me the most interesting one, is: can we preserve the number of disks while achieving obliviousness? Our solution uses two disks, and that stems from using bitonic sort. It turns out there has been some partial progress in a subsequent work published at NDSS 2019: they assume a larger client storage, namely storage the size of the range being accessed, and with that they are able to do this using a single disk. A second interesting open question is: can we have a locality-preserving oblivious parallel RAM? And finally, until now I have discussed everything in terms of polylog(N), so an interesting question is whether we can get tight asymptotic efficiency. In fact, a previous draft of our own work discusses the precise asymptotics for the same scheme I described, and the NDSS 2019 work also talks about the actual asymptotics. The interesting part of that work is that they use a tree-based ORAM instead of a hierarchical ORAM to achieve locality. Thank you, and I'd be happy to take questions. So yeah, questions? If you have a question, please come to the mic. All right, thank you. OK, let's thank the speaker again.