Hopefully the slides will hold together enough for me to actually give this talk. Thank you for the introduction. As was said, I'm Ariel Hamlin, and today I will be presenting our work on private anonymous data access. This is joint work with Rafail Ostrovsky, Mor Weiss, and Daniel Wichs.
So without further ado, let's consider a setting where we have a single server storing some database, in this case a medical database. And we have a set of clients who want to make queries into this database, for example to get information about flu symptoms, especially given the flu season we've had here in the States. You might have someone who is concerned about their headache or headache causes. But other clients may be concerned about their identities being linked to their queries. So in this setting, they want both privacy for those queries and anonymity for their identity. This is the setting that we consider: a single server with multiple clients, where we want to provide privacy and anonymity for the clients asking queries. As with all works, we have an acronym, namely private anonymous data access, or as we like to call it, Panda, because who doesn't like adorable fuzzy creatures? So for the rest of the talk, I'll be referring to our system as Panda.
So what does Panda actually consist of? We have a setup protocol that takes in a database, generates for each of our clients a unique client key, and gives the server an encoded database. So in this case, both the client and the server are stateful. We also support reads: a client may have a value that they want to read within the database. Using their client key, they interact with the server and get back the value at that particular database location. We also support writes in Panda, but in this talk I will be talking primarily about our read-only Panda scheme.
If you're interested in writes, we do present the high-level ideas, but I would encourage you to refer to the paper or track me down after this session.
So now that we've established the syntax, let's talk about what we want from this definition. As I mentioned in our motivating example, we want security against a server who may be trying to learn clients' queries and their identities. Namely, we want privacy of access patterns against the server, and we want anonymity of the clients' identities, meaning the server cannot correlate clients' identities across different queries. We want this not only against a server who observes its interactions with the clients, but also against a server who is allowed to collude with some number of clients and gain access to both their actions and their secret keys. At setup time we set an a priori bound on the number of clients that the server may collude with, and we call this number t our collusion bound. We also want a single-server scheme, so not something that relies on multiple servers, especially not on servers that don't collude. And finally, we have efficiency goals: we want the client and server read complexity (or write complexity, if we're talking about writes) to be sublinear in the number of clients, n, and in the database size, and we want it to be linear in the collusion threshold. So it's this combination of privacy and anonymity with bounded collusion and these efficiency goals that we call the Panda setting.
This may sound fairly similar to some of the primitives that have come before me in this session, namely oblivious RAM, and to another primitive called private information retrieval. So I want to spend the next part of the talk comparing and contrasting with the Panda setting, and asking whether ORAM and PIR actually work to solve our problem.
Spoilers: they don't.
I think the previous talk did an excellent job motivating and explaining ORAM. At a high level, we have our single server, and we have a single client who wants to make queries into a database that the server is storing while hiding their access patterns. There's a huge asterisk on the "no" up here, but ORAM is not inherently multi-client. There are schemes that attempt to support it, but most of the common schemes do not. Runtime is going to be polylog in the database size, we assume a setup phase, and the server has state. And we're able to build ORAM essentially from one-way functions.
We also have a primitive called private information retrieval, or PIR. We're talking here about the single-server PIR constructions. Here, again, we have a single server and multiple clients who want to make queries and hide their access patterns. Unlike ORAM, PIR is inherently multi-client. Namely, the client does not need a secret key to make a query, so anyone can come along and query, and clients do not need to be defined at setup time. Unfortunately, this requires linear server-side work for any read access. And we're able to get PIR from public-key crypto.
So how does Panda compare? Unlike ORAM, we are able to get multi-client. And unlike PIR, we're able to achieve polylog performance in terms of our database size. We do allow for setup and state, and we're able to construct Panda schemes from FHE.
So these are common primitives that you might compare Panda against, but there's actually a third one that I want to talk about a little bit, and that's doubly efficient PIR, or DEPIR. This was first introduced by Beimel et al. in 2000 as a multi-server scheme. What DEPIR does is allow for preprocessing, unlike typical PIR. What you get as a trade-off is that you can now achieve a polylog runtime for server access complexity during reads.
What I want to compare against is the more recent work by Canetti et al. and Boyle et al. in 2017 that reduces multi-server DEPIR down to a single server, and namely their public-key DEPIR variant. As with PIR, we have a single server, and any person can come along and make queries against the database the server is storing. We are, as I said, able to get multi-client and polylog runtime. We allow for setup and state, or PIR with preprocessing, as it's also called. However, in order to achieve this, it requires new coding assumptions and virtual black-box obfuscation. So public-key DEPIR does in fact solve our Panda problem, but it does so with some fairly heavy assumptions and heavy-duty tools. And we were wondering: is it possible to get a construction that relies on less onerous assumptions? And yes, because my talk isn't ending and I have 14 more minutes to talk at you guys.
So without further ado, I want to go into our results. We're able to achieve a read-only Panda scheme relying only on FHE, with client read complexity t times polylog L, where t is our collusion bound and L is our database size. We also support two variants of Panda that support writes. We have a public-writes Panda, where we essentially have a public database that everyone writes to, and we have a secret-writes Panda, where everyone essentially has their own personal database. I want to spend most of the rest of the talk on how we construct our read-only Panda scheme.
Our main starting idea here is the original multi-server DEPIR scheme introduced by Beimel et al. In this scheme, we have multiple servers, in fact K servers, and they're all involved. For setup, we have a trusted setup that takes in a database and outputs an encoded database, and it gives this encoded database to each of the K servers. It is the same copy.
There is no difference between the databases that each of these servers gets. When someone wants to come along and make a query, what they do is send out K unique indices into this encoded database, for the codeword symbols that they want to read. The servers then return the codeword symbols at those locations, and the client is able to decode them and get the message symbol that it wanted to read at that particular address in the database. For correctness, we say that the client must issue these K queries to the servers. And we say that privacy holds in multi-server DEPIR if fewer than s of the servers collude: in that case, when they have fewer than s indices that they're comparing, these indices appear to be uniformly distributed. So when I say multi-server DEPIR privacy later in the talk, that is what I'm alluding to.
Now, if you'll remember, we had a list of requirements for Panda, one of which is that we wanted to use only a single server. So how do we move a multi-server scheme to a single server? We do this by having one server emulate all of our K servers as virtual machines, or virtual servers. Now we're down to a single server, but obviously in this case privacy is not preserved. As the single machine is emulating all of these virtual servers, it gets to see the access patterns into these encoded databases.
When we're considering access patterns and wanting to hide them, the primitive that jumps to mind might be something like oblivious RAM, or ORAM. So what if, instead of handing out the same encoded copy to all of our virtual servers, we first put these encoded databases into K unique ORAMs? Each of these ORAMs would now have its own secret key.
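The multi-server read described above (K indices out, K codeword symbols back, local decoding) can be illustrated with a toy code. The talk does not specify the underlying code, so everything below is an assumption for illustration: a tiny prime field `P`, a low-degree-polynomial (Reed-Muller-style) encoding, and the hypothetical helper names `encode`, `make_queries`, and `decode`. The parameter `t` plays the role of s - 1 in the talk: any `t` of the queried points are jointly uniform, because they lie on a random degree-`t` curve through the target.

```python
import itertools
import random

P = 13      # toy prime field size (assumption; real parameters are far larger)
H = [0, 1]  # the database is laid out on the grid H^M
M = 2       # number of variables, so the database holds len(H)**M = 4 symbols

def lagrange_basis(points, j, x):
    """Value at x of the polynomial that is 1 at points[j] and 0 at the other points."""
    num, den = 1, 1
    for i, xi in enumerate(points):
        if i != j:
            num = num * (x - xi) % P
            den = den * (points[j] - xi) % P
    return num * pow(den, -1, P) % P

def low_degree_extension(db, point):
    """Evaluate, at any point of F_P^M, the polynomial that agrees with db on H^M."""
    total = 0
    for value, grid in zip(db, itertools.product(range(len(H)), repeat=M)):
        term = value
        for j, x in zip(grid, point):
            term = term * lagrange_basis(H, j, x) % P
        total = (total + term) % P
    return total

def encode(db):
    """Encoded database: one codeword symbol per point of F_P^M (same copy for every server)."""
    return {pt: low_degree_extension(db, pt)
            for pt in itertools.product(range(P), repeat=M)}

def make_queries(index, t, rng):
    """Pick a random degree-t curve through the target; return one query point per server."""
    target = list(itertools.product(H, repeat=M))[index]
    coeffs = [[rng.randrange(P) for _ in range(M)] for _ in range(t)]
    degree = M * (len(H) - 1) * t          # degree of the polynomial restricted to the curve
    assert degree + 1 < P, "field too small for this many servers"
    lambdas = list(range(1, degree + 2))   # K = degree + 1 servers
    points = [tuple((target[j] + sum(c[j] * pow(lam, e + 1, P)
                                     for e, c in enumerate(coeffs))) % P
                    for j in range(M))
              for lam in lambdas]
    return lambdas, points

def decode(lambdas, answers):
    """Interpolate the restricted polynomial and read off its value at 0 (the target)."""
    return sum(a * lagrange_basis(lambdas, j, 0) for j, a in enumerate(answers)) % P

db = [3, 7, 1, 12]
encoded = encode(db)
lambdas, points = make_queries(2, t=2, rng=random.Random(1))
answers = [encoded[pt] for pt in points]   # server i answers the i-th query
assert decode(lambdas, answers) == db[2]
```

With `t=2` this toy uses K = 5 servers. A single machine emulating all five would see all five points on the curve and could interpolate back to the target, which is the loss of privacy the talk points out.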
The question now is, how do we query these ORAMs when a client wants to come along and make their K accesses to these virtual servers? We would have to give each of the clients all of the ORAM secret keys. This would seem to work, except that Panda allows for collusion between clients and the server. As soon as this collusion happens, the server gets access to the client's secret keys, which is all of them, and any notion of privacy provided by the ORAMs is now lost. So why even put it in an ORAM in the first place?
So we need each client to have some unique secret state that is not shared with all of the other clients, so that when collusion happens we still have some notion of secrecy. What we do here is introduce the idea of committees. Instead of getting all of the secret keys for the ORAMs, clients get only the keys for the virtual servers on their assigned committee. This is done at setup: each client is assigned some subset of the servers, K of them, that they can now make their queries to. So in this example, client one gets access to the first and the K-th virtual servers, client two is given access to the first and the second virtual servers, and so on and so forth. As you'll note, we actually have to use more than K virtual servers in this case, because we don't want complete overlap between committees; we want those committees to be unique for each client.
So what happens now when collusion happens? The server only gets the keys for the committees of the colluding clients, which means that it only gets to see some accesses in plaintext: only those where there is an overlap between your committee and a colluding client's committee.
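The committee idea can be sketched in a few lines. This is only an illustration under stated assumptions: the talk does not say how committees are generated, so the uniform random sampling below is a stand-in that does not by itself guarantee the overlap bound, and the function names `assign_committees` and `exposed_servers` are hypothetical.

```python
import math
import random

def assign_committees(num_clients, num_virtual_servers, committee_size, seed=0):
    """Give each client a distinct committee: a sorted tuple of virtual-server ids.

    Assumption: committees are sampled uniformly at random and re-drawn on
    collision; the real scheme must also bound overlaps between committees.
    """
    if num_clients > math.comb(num_virtual_servers, committee_size):
        raise ValueError("not enough virtual servers for distinct committees")
    rng = random.Random(seed)
    committees, used = {}, set()
    for client in range(num_clients):
        committee = None
        while committee is None or committee in used:
            committee = tuple(sorted(
                rng.sample(range(num_virtual_servers), committee_size)))
        used.add(committee)
        committees[client] = committee
    return committees

def exposed_servers(committees, honest_client, colluding_clients):
    """Virtual servers on the honest client's committee whose keys the server learns."""
    leaked = set()
    for client in colluding_clients:
        leaked.update(committees[client])
    return set(committees[honest_client]) & leaked

committees = assign_committees(num_clients=5, num_virtual_servers=12, committee_size=4)
overlap = exposed_servers(committees, honest_client=0, colluding_clients=[1, 2])
print(f"client 0 committee: {committees[0]}, seen in plaintext: {sorted(overlap)}")
```

The size of `overlap` is the number of accesses the colluding server sees in plaintext for client 0; the argument in the talk needs this number to stay below the multi-server privacy threshold s.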
Because the overlap between your committee and the colluding clients' committees is going to be less than s, based on how these committees are generated, we can say that multi-server DEPIR privacy still holds: the server sees fewer than s accesses in plaintext. The rest of the access patterns are hidden by ORAM. So privacy holds.
So we have a single server and we have privacy. But if we remember, in Panda we also want anonymity for clients, and now clients have some unique information, namely a unique committee associated with them. So the server can tell clients apart based on which virtual servers they access, since those are the ones assigned to their committee. So we lose anonymity.
How can we fix this? One could say: when we query the servers on our committee, we can also simulate random accesses into the ORAMs that are not on our committee, and the server won't be able to tell the difference. However, this comes down to a fundamental problem. We don't actually know how to simulate these accesses into the ORAMs, because we don't have the secret keys for those ORAMs. And what's even worse, our adversary, the server, might in fact have those secret keys if it's colluding with a client that has that virtual server on its committee. So what we need is the ability to make smart dummy accesses: the ability to simulate a random access without the secret key. And this seems to be a property that most ORAMs, outside of trivial ones, don't necessarily have.
So instead, we try to get the next best thing: ORAM-like behavior up to a certain number of queries. To achieve this, instead of dropping our encoded database into an ORAM, we simply permute it using a pseudorandom permutation.
So instead of each encoded database in these virtual servers being dropped into an ORAM, it is uniquely permuted as defined by a permutation key. And these keys are given to the clients in lieu of the ORAM keys. When a client wants to make a query, it takes the index that it wants to query in the unpermuted database, permutes it according to its permutation key, and sends those queries to the virtual servers on its committee. For servers that are not on its committee, it simply asks for a random index in the encoded database, because now it can simulate a totally random access, and the server, even if it has the key, can't tell the difference between that and an actual legitimate access.
So for servers that are not on my committee, I have uniform queries by definition; I'm just asking for things randomly. For servers that are on my committee but whose key the server doesn't have, we say that for a bounded number of queries we achieve an ORAM-like hiding of access patterns, thanks to a lemma from Canetti et al.'s 2017 paper. And for servers that are on my committee and whose key the server does have, the accesses are going to appear random based on multi-server DEPIR privacy. So we're covered in all three of our cases.
So we're able to achieve Panda up to a certain number of queries. How do we move beyond this bound? This is where our reliance on FHE comes in. For every B reads, where B is our bound, the server is capable of regenerating the permutations on its own, because instead of having them specified by a pseudorandom permutation, we have them defined by a PRF. And this PRF key is encrypted under FHE, allowing the server to evaluate the permutation on its own and refresh it. For worst-case efficiency... I think it's telling me that y'all are hungry for lunch and that I should hurry up.
But we can say that for efficiency, we can amortize this action across our bound, essentially our epoch, in order to obtain reasonable worst-case efficiency. So this is how we achieve unbounded read-only Panda.
We also have, as I alluded to previously, Panda that supports writes. Namely, we have a public-writes Panda with a public database, where writes are public, so both their content and where you're writing it to. But the identity of the writer is still hidden, so we still have anonymity. Reads, of course, are still private and anonymous. We also have a construction for secret-writes Panda, where each client has their own database that they can write to. And in that database, privacy and anonymity of writes are maintained. Obviously, reads are also still private and anonymous. The cost of this, however, is that performance in terms of database size, namely L, now scales with the total number of writes across the entire system.
That's not the end of my talk. Hello. It really doesn't want me to show you guys performance numbers. I'm almost at the end, so I can finish up without them if we have to. I don't necessarily... This one, try this one. Enjoy beautiful seals. This was taken at Crypto in Santa Barbara, so. Anyhow, while they're fighting with that, I'll simply finish off. So we're able to achieve public-writes Panda and secret-writes Panda with the same read complexity as read-only Panda. All you have to do is hit, I think, just Start, and it should work. Nope. All right. This is actually going to be my second-to-last slide, so if you guys will bear with me.
So client write complexity is simply going to be O(log L), where L is database size; for client write complexity, secret-writes Panda achieves the same complexity as reads; and server write complexity depends on both the number of colluding clients and our database size. And here, epsilon is any epsilon greater than zero.
So that's what we achieve for Panda with writes.
In summary, we're able to achieve a read-only Panda scheme from FHE where client and server complexity is polylog in the database size and linear in the collusion bound. We support two forms of writes, secret and public. And we are interested in pursuing the Panda problem in other directions: namely, it would be interesting to see if we can remove the bounded-collusion aspect of Panda and have performance scale independently of the number of collusions or corruptions, and to remove the reliance on FHE in favor of more efficient primitives.
So thank you for sitting through this before lunch with all the technical difficulties, and at this point, I am happy to take any questions. Thank you very much. Any questions about Panda? Yes, they're adorable.