 I am Shai Halevi, and I will talk about random-index private information retrieval and its applications. This is joint work with Craig Gentry, Bernardo Magri, Jesper Buus Nielsen, and Sophia Yakoubov. Let's begin by reminding ourselves what private information retrieval is. We have a client that wants to fetch an entry from some public database, but the server holding the database shouldn't learn which entry the client is interested in. Of course, one way to do that is for the client to just download the entire database, and the whole point of PIR is to do better than this trivial solution, meaning to have the server send less to the client than the entire database. PIR was introduced by Chor, Goldreich, Kushilevitz, and Sudan in the 90s, and the first solutions they gave relied on multiple non-colluding servers, but later on Kushilevitz and Ostrovsky showed that you can do it also with a single server, and there has been a very large body of research on PIR; it's useful for many different things. In this work, we look at PIR with a twist, and the twist is that the client is not interested in any one particular index. It just wants to get one index, a random index, from the database, and the point is to do this without the server knowing which one it got. You should think of a lottery, for example, where people sign up with the server and give their personal details, and then the client chooses a random person to get the jackpot, but the server shouldn't know who was chosen before the winner is announced, because then the server could extort them or do some other nasty thing. We call this primitive random-index PIR, or RPIR for short. Why look at it?
So, one thing that should be obvious is that RPIR is a weaker primitive than PIR, because if you can do PIR then you can definitely do RPIR: the client just chooses a random index, and then you run your PIR protocol. We can therefore hope that RPIR is easier to build than PIR. For example, maybe you can make it more efficient, maybe you can use weaker assumptions, maybe you can run it in settings where PIR is impossible, etc. That's on one hand. On the other hand, RPIR is sufficient for some applications. I already gave you the lottery example. The thing that motivated us to look at it is an application to the keeping-secrets-on-the-blockchain architecture of Benhamouda et al. from TCC of last year, and I will spend the end of this talk on that application. Also, during the course of this work, we found a nice little application of it to PIR with preprocessing, and I'll spend one slide on that later on. This work has two mostly orthogonal threads. One of them is looking at RPIR as a primitive: defining it, looking at variations, how it relates to PIR, how to construct it, and then the application I mentioned to PIR with preprocessing. The second thread is the application to keeping secrets on the blockchain, which was our motivation for this work, with some optimizations specific to the context of that application. So let's start with RPIR as a primitive, and we'll begin by trying to define it. In this work we only looked at the semi-honest case, so the server is honest but wants to learn the index of the client. This particular definition is for a single-server protocol, so it's a two-party protocol. There is a client with no input, and there is a server that has the database, which is modeled just as an n-bit string. The server has no output. The client's output is an index and the corresponding database bit.
If both of them are honest, then the index that the client gets is uniform, or close to it. In terms of what we need: we need correctness, that is, the bit that the client learns is indeed the corresponding bit of the database; we need non-triviality, which means that the server sends less than n bits; and we need privacy for the client, which means that even given the server's view, the index that the client got is indistinguishable from another index chosen uniformly at random, independent of the protocol. One variation that will be used later is batch RPIR: instead of a single index, you want multiple ones. The definition is identical, except that instead of a single index the client gets a vector of indexes. In the context of batch RPIR you can even consider weaker security: instead of the vector of indexes that the client gets being indistinguishable from uniform, it is indistinguishable from some other distribution D which is "random enough". What "random enough" means might depend on your application, but a reasonable general-purpose definition is that for every subset of vectors in [n]^k, if that subset has negligible probability mass under the uniform distribution, then it also has negligible probability mass under the distribution D. That definition says that any bad event that happens with negligible probability when the indexes are truly random will also happen with negligible probability in this protocol, which is a useful property to have. As I said, we only treat the honest-but-curious case. It is actually an open problem even to define what RPIR means in the malicious setting; it's not trivial, and we did not try to do it. The first result that I want to tell you about is a theorem that says that RPIR as a primitive is equivalent to PIR, up to a small increase in communication and rounds.
On one hand this means that you cannot hope for too much in terms of RPIR being easier to build than PIR, because they're equivalent, but we will still show that there are gains to be made, so it's still interesting to look at. But for now, let's prove that RPIR as a primitive is equivalent to PIR. One direction is trivial: we already showed that if you have PIR, you can implement RPIR. So let's try the other direction. Say you have an implementation of a non-trivial RPIR, and let's try to build from it an implementation of PIR — full-blown PIR, where the client has a particular index i that it wants to get and the server has the database. What they do is the following. First they run an RPIR protocol, where the client gets from the server a random index j, not necessarily the index that it wants. Then the client sends to the server delta, which is the exclusive-or of the index that it wants and the index that it got; think of these two indexes as log(n)-bit strings, and the client just sends their xor. The server partitions the index set [n] into n/2 pairs (k, k xor delta) — notice that one of these pairs is (i, j) — and for every pair it computes the xor of the two database bits at those two indexes. It sends these n/2 bits to the client. Now the client already knows the database bit at position j, and it knows the exclusive-or of that bit with the database bit at position i, so it can compute the database bit at position i. If the RPIR takes r rounds and has communication c_c for the client and c_s for the server, then the simple PIR protocol that I just described takes r+2 rounds, because you first run the RPIR and then two more rounds, and the communication is an additional log(n) bits that the client sends and n/2 bits that the server sends.
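To make the reduction concrete, here is a minimal Python sketch of the simple protocol just described, with the RPIR step mocked as an oracle (in a real protocol the server would not learn j). The pairing requires n to be a power of two, and the handling of the j = i edge case is my own addition, not spelled out in the talk:

```python
import secrets

def rpir(db):
    """Mock RPIR oracle: the client learns a uniform index j and db[j];
    in a real RPIR protocol the server would not learn j."""
    j = secrets.randbelow(len(db))
    return j, db[j]

def pir_from_rpir(db, i):
    """Client wants db[i]; n must be a power of two so XOR pairing works."""
    n = len(db)
    # Step 1: run RPIR; the client gets a random index j and the bit db[j].
    j, bit_j = rpir(db)
    # Step 2: client sends delta = i XOR j (log n bits). If j == i the
    # client already knows db[i]; it sends a random nonzero delta anyway,
    # so the message distribution is independent of i.
    delta = (i ^ j) or (1 + secrets.randbelow(n - 1))
    # Step 3: server partitions [n] into n/2 pairs {k, k XOR delta} and
    # sends the XOR of the two database bits of each pair (n/2 bits).
    sigma = {min(k, k ^ delta): db[k] ^ db[k ^ delta] for k in range(n)}
    # Step 4: client recovers db[i] from db[j] and the pair containing j,
    # since db[j] XOR (db[j] XOR db[j XOR delta]) = db[i] when delta = i XOR j.
    if i == j:
        return bit_j
    return bit_j ^ sigma[min(j, j ^ delta)]

db = [1, 0, 1, 1, 0, 0, 1, 0]
assert all(pir_from_rpir(db, i) == db[i] for i in range(len(db)))
```

The server's online message is n/2 bits, matching the non-trivial communication bound from the talk.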
Since n/2 is less than n, this is a non-trivial protocol. But it's not great — n/2 is still a lot — so we want to do better. And can we? It turns out yes. The observation is to look at the last two steps of this protocol: they are just the trivial PIR protocol for a database of size n/2. The server sends the n/2 bits, the client looks up one of them. So instead, how about we replace this trivial PIR with a recursive call to the protocol itself? That gives us a recursive PIR. At every level i, you run an RPIR protocol on a database of size n/2^i, then the client sends log(n) - i bits, which is the delta at that level, and then you make the recursive call, all the way down until you get to a database of size one, at which point the server just sends that one bit to the client. So if the RPIR protocol takes r rounds and has communication c_c and c_s, then the recursive protocol takes at most (r+1)*log(n) rounds of communication, and the communication is at most log(n) times that of the RPIR protocol, plus the client sends about (log n choose 2) bits in total and the server sends one more bit. So that's nice, but there is still the question of the number of rounds: we multiplied the number of rounds by log(n), which is not great. Can we do something better than that — can we have a protocol with fewer rounds? It turns out that in some sense we can, via a generalization of the simple PIR protocol. Instead of a single RPIR run, the client and server now run t-1 of them, so the client gets t-1 random indexes that the server doesn't know. Then the client partitions the index set, not into pairs, but into t-tuples. One of these tuples includes the index i that it wants together with all the j_k's that it got, and all the other tuples are just random.
And then for every t-tuple, the server sends the xor of those t database bits. The client knows all the bits at positions j_k, and it knows the xor of everything, so it can compute the bit at position i. This takes r+2 rounds. The server communication is n/t bits. The client communication is large, though, because sending a random partition takes many bits; maybe you can improve it, but we didn't quite find a way to do that. An obvious open question is to find better, tighter reductions from RPIR to PIR. The next thing I want to show you is that you can actually use RPIR in a setting where PIR is not applicable, namely a non-interactive setting. You may have an initial setup phase, independent of both the database and the client's index, and after that, every time the server wants to convey a random index to the client, it just sends a single message; the client never speaks again. Clearly you cannot do PIR this way, because the client has no opportunity to input the index that it wants, but you can do RPIR. One very simple example uses FHE. There is a setup phase where the client sends a public key and an encryption of a PRF seed, and then in the online phase, every time the server wants to send a random bit to the client, it chooses a nonce and homomorphically computes i as the PRF of the nonce under the seed. So the server gets an encryption of i, continues homomorphically to compute the database bit at that position, and sends it to the client along with the nonce. This is a very easy protocol. In the paper we also have a more complicated non-interactive scheme based on just pseudorandom permutations; it is essentially the protocol of Kushilevitz and Ostrovsky from 2000, adapted to be non-interactive.
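The FHE-based scheme is easy to mock up in the clear. The sketch below shows only the correctness logic — deriving the index from a PRF of a nonce — using HMAC-SHA256 as a stand-in PRF of my choosing; in the real scheme the server performs the PRF evaluation and database lookup homomorphically under the client's key, so it never sees i or DB[i]:

```python
import hashlib, hmac, secrets

def prf_index(seed: bytes, nonce: bytes, n: int) -> int:
    """PRF mapping a nonce to an index in [n]. HMAC-SHA256 is a stand-in;
    the real scheme evaluates an FHE-friendly PRF under encryption."""
    digest = hmac.new(seed, nonce, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % n

# Setup: client picks a PRF seed (really: it sends pk and Enc(seed)).
seed = secrets.token_bytes(16)
db = [1, 0, 1, 1, 0, 0, 1, 0]

# Online: the server picks a fresh nonce, derives i = PRF(seed, nonce)
# (homomorphically, in the real scheme), looks up db[i] (also under
# encryption), and sends a single message; the client never speaks.
nonce = secrets.token_bytes(16)
i = prf_index(seed, nonce, len(db))
message = (nonce, db[i])          # server -> client

# Client: recomputes i from its seed and the nonce, learning (i, db[i]).
recovered_i = prf_index(seed, message[0], len(db))
assert recovered_i == i and message[1] == db[recovered_i]
```

In the clear this is trivially insecure, of course; the point of the FHE wrapper is precisely that the server computes i and DB[i] without ever seeing them.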
Let me also take a small detour and talk about the multi-server RPIR case. It turns out that with multiple servers you can also do this non-interactively, and in fact you can do slightly better in some sense, because we have two constructions. One is an information-theoretic construction that doesn't even have a setup phase: with just two servers, both holding the database, every time they want the client to get a random index, each of them sends one message to the client; the client gets a random index, and together they send less than n bits. So that's nice. The thing that's not so nice is that one of them has to send half the bits of the database, and we actually don't know how to do better than that. Then we have another, "generic" approach for converting an m-server PIR into an m-server non-interactive RPIR using pseudorandom correlation generation. I put "generic" in quotes because we really only have one example where we know how to make this transformation work, but it's plausible that there are others. Let's start with the information-theoretic construction. It is very similar to the simple PIR reduction that I described before. Server one chooses a random index j and sends j and the database bit at position j to the client. Server two chooses a random delta in {0,1}^log(n) and partitions [n] into n/2 pairs (k, k xor delta) — so in particular one of these pairs is (j, j xor delta) — and computes for each pair the exclusive-or of the two database bits. It sends delta and the n/2 bits to the client, and the client recovers i as j xor delta, and recovers the database bit at position i as the bit that it knows xor'ed with the pair bit sigma for the pair (i, j). One thing I didn't say before: you need to handle the case delta = 0. In that case server two just sends delta.
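Here is a small Python sketch of this two-server information-theoretic construction, including the delta = 0 edge case; n is assumed to be a power of two so that the XOR pairing is a partition. Note that server one never sees delta and server two never sees j, so neither learns i = j xor delta on its own:

```python
import secrets

def server1(db):
    """Server 1: sends a random index j and the bit db[j]."""
    j = secrets.randbelow(len(db))
    return j, db[j]

def server2(db):
    """Server 2: sends a random shift delta and, if delta != 0, the XOR
    of db over each pair {k, k XOR delta} (n/2 bits). n is a power of two."""
    n = len(db)
    delta = secrets.randbelow(n)
    if delta == 0:                      # edge case: just send delta
        return delta, {}
    sigma = {min(k, k ^ delta): db[k] ^ db[k ^ delta] for k in range(n)}
    return delta, sigma

def client(msg1, msg2):
    """Client: combines the two messages into a random (i, db[i])."""
    j, bit_j = msg1
    delta, sigma = msg2
    i = j ^ delta                       # uniform, since j is uniform
    bit_i = bit_j if delta == 0 else bit_j ^ sigma[min(j, j ^ delta)]
    return i, bit_i

db = [1, 0, 1, 1, 0, 0, 1, 0]
i, bit = client(server1(db), server2(db))
assert bit == db[i]
```

Total communication is log(n) + 1 + log(n) + n/2 bits, dominated by server two's n/2 XOR bits — the half-the-database cost mentioned above.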
There's no point in partitioning anything there, and the client just outputs the database bit at position j; you can check that the probability distribution is the one it should be. It would be really nice to extend this in some way and get a construction where the servers send less than n/2 bits, but we were not able to find one. You cannot do the recursive thing, because it's interactive; you cannot do the partition one, because describing the partition takes too many bits. So that is an open problem. Moving on to the transformation: let me try to describe how to transform a multi-server private information retrieval scheme into a non-interactive RPIR. Look at a typical multi-server PIR protocol with only two rounds of communication. The client sends the servers queries that are individually random but correlated; each server, holding the database, answers its own query and replies to the client, and the client reconstructs the bit that it wants. So the question is: is there a way for the servers to generate the random correlated queries themselves, without any interaction with the client? We can hope that there is, because there has been a lot of work in recent years on pseudorandom correlated randomness generation, so maybe some of that technology can be used here. And indeed there is one example where we know that this works, and that example uses the Reed-Solomon-based PIR protocol from the original CGKS paper. In that protocol the database is encoded by a multivariate polynomial with v variables and degree d, over some Z_q. The way it's encoded is that inside a cube of size (d+1)^v, the evaluation of the polynomial at every entry contains some bits of the database.
So the evaluation at every point is an element of Z_q, so it contains log(q) bits of the database. This scheme uses d+1 servers, each holding the database and therefore knowing the polynomial F_DB. The client wants a particular part of the database; in particular, it wants the evaluation of F_DB at a particular point a inside the cube. So it chooses a random line in (Z_q)^v that passes through the point it's interested in: L(x) = a + x*b, where a is the point that it wants, b is a random point, and x is a scalar variable that ranges over Z_q. It sends to the j-th server the j-th point on the line, c_j = L(j), and the server replies with the evaluation of the database polynomial at that point, y_j = F_DB(c_j). Note that the y_j's are computed by the composition of the database polynomial with the line polynomial, G(x) = F_DB(L(x)), so the y_j's are just evaluations of some degree-d univariate polynomial G. Since there are d+1 servers, the client now knows d+1 evaluation points of this degree-d polynomial, so it can recover the entire polynomial, and it finds the value it's interested in as G evaluated at zero. This is how the client recovers the point that it's interested in. To convert this to a non-interactive RPIR, we are going to use pseudorandom secret sharing (PRSS) techniques. These techniques date back to Gilboa and Ishai, and then Cramer, Damgård, and Ishai later on. I'm not going to tell you exactly how they work, but the main idea is that in a pre-computation, a setup phase, you distribute PRF seeds among the servers, with different subsets of servers getting different PRF seeds, and then the servers can generate pseudorandom degree-t Shamir sharings locally, without any interaction: every server just uses the PRF seeds that it knows.
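The line-query protocol just described can be sketched end-to-end in a few lines of Python. The toy parameters (q = 7, v = 2, d = 2) and the concrete coefficient table for F are illustrative choices of mine; the sketch takes the encoding polynomial as given and shows only the query, answer, and interpolation steps:

```python
import secrets

Q, V, D = 7, 2, 2                        # field size, variables, degree (toy)

# The database encoding: a v-variate polynomial of total degree <= d,
# given as {exponent-tuple: coefficient}. (Building F_DB from the actual
# database bits is a separate interpolation step, omitted here.)
F = {(0, 0): 3, (1, 0): 2, (0, 1): 5, (1, 1): 1, (2, 0): 4}

def evaluate(F, point):
    total = 0
    for exps, coef in F.items():
        term = coef
        for xi, ei in zip(point, exps):
            term = term * pow(xi, ei, Q) % Q
        total = (total + term) % Q
    return total

def lagrange_at_zero(points):
    """Interpolate the degree-d polynomial through (x_j, y_j); return G(0)."""
    total = 0
    for xj, yj in points:
        num, den = 1, 1
        for xk, _ in points:
            if xk != xj:
                num = num * (-xk) % Q
                den = den * (xj - xk) % Q
        total = (total + yj * num * pow(den, -1, Q)) % Q
    return total

# Client: wants F(a); picks a random line L(x) = a + x*b through a.
a = (2, 5)
b = tuple(secrets.randbelow(Q) for _ in range(V))
line = lambda x: tuple((a[t] + x * b[t]) % Q for t in range(V))

# Server j (j = 1..d+1) sees only the single point L(j) -- uniformly
# random, hence revealing nothing about a -- and returns F(L(j)).
answers = [(j, evaluate(F, line(j))) for j in range(1, D + 2)]

# G(x) = F(L(x)) has degree <= d, so d+1 evaluations determine it;
# the client recovers F(a) = G(0).
assert lagrange_at_zero(answers) == evaluate(F, a)
```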
For the first Shamir sharing they all evaluate the PRFs at one, for the second they all evaluate them at two, etc. So they can generate an unlimited number of pseudorandom Shamir sharings without talking to each other at all. Here we are going to use PRSS for random lines, that is, for degree-one polynomials. The servers just generate the query — the thing they would have received from the client — by themselves: they compute c_j = L(j) for a random line L that they generate using their pseudorandom seeds, so the lines are pseudorandom lines. Now, there is a subtlety here: what the client eventually needs is the evaluation of F_DB at a random point, but a random point inside the cube, and it's not guaranteed that a random line will intersect that cube. If it does, then the client learns a random database entry (if the line intersects the cube in more than one point, it chooses a random one of them). But if the line doesn't intersect the cube, we're stuck. So the servers actually need to send multiple lines, so that with high probability at least one of them intersects the cube. What kind of parameters do you get? You have a database of size n, and you need to find the parameters d, q, and v. The constraints are, first of all, that you need to be able to encode the entire database: if you look at all the evaluations inside the cube, there are (d+1)^v of them, each of which can hold log(q) bits, so the total number of bits that you can encode this way has to be at least n. Also, q has to be bigger than d+1, because you need to interpolate a degree-d polynomial.
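Here is a minimal sketch of the PRSS idea for degree-one sharings, following the Cramer-Damgård-Ishai blueprint in the smallest interesting setting (three servers, threshold one); the field size and SHA256-based PRF are illustrative stand-ins of mine. In the RPIR transformation, each coordinate of a pseudorandom line is generated this way, with the servers' shares serving directly as their query points:

```python
import hashlib, secrets

Q = 101                                  # a prime field for the shares (toy)
N = 3                                    # number of servers, threshold t = 1

def prf(seed: bytes, counter: int) -> int:
    """Stand-in PRF into Z_Q (SHA256 of seed || counter)."""
    h = hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
    return int.from_bytes(h, "big") % Q

# Setup: for each server j there is one seed r_j, handed to every server
# EXCEPT j (the maximal unqualified sets here have size t = 1).
seeds = {j: secrets.token_bytes(16) for j in range(1, N + 1)}

def f_A(j, x):
    """Degree-1 polynomial for the set A = {j}: f(0) = 1 and f(j) = 0."""
    return (j - x) * pow(j, -1, Q) % Q

def share(i, counter):
    """Server i's share, computed from only the seeds server i holds.
    This equals g(i) for g(x) = sum_j PRF(r_j, counter) * f_{j}(x),
    because the one missing term, f_{i}, vanishes at x = i."""
    return sum(prf(seeds[j], counter) * f_A(j, i)
               for j in range(1, N + 1) if j != i) % Q

# With one PRF call per seed, the servers hold -- without interacting --
# degree-1 Shamir shares of the pseudorandom secret sum_j PRF(r_j, counter);
# incrementing the counter gives a fresh sharing each time.
counter = 0
secret = sum(prf(seeds[j], counter) for j in range(1, N + 1)) % Q
shares = {i: share(i, counter) for i in range(1, N + 1)}

# Check: interpolating any two shares at zero opens the secret.
# Through (1, s1) and (2, s2): g(0) = 2*s1 - s2.
assert (2 * shares[1] - shares[2]) % Q == secret
```

No single server can compute the secret, since each is missing exactly one seed; security here is only against one corrupted server, matching threshold t = 1.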
And the last constraint is that (d+1)^v, the size of the cube, has to be a large enough fraction of the entire space so that random lines intersect the cube with noticeable probability; in fact you can see that the number of lines that the servers need to send is something like (q/d)^v in order to get an intersection with high probability. For each line the communication is essentially d+1 points (or 2d+1, depending on the variant), so roughly d*log(q) bits, and the total communication of this protocol is d*log(q) * (q/d)^v. Now we can set the parameters to get various tradeoffs. We will always set q to be about d+2, because that's the smallest possible. If you want the minimum amount of communication, you set both d and v to be around log(n), and then you get polylog communication; and if you want a constant number of servers, you get n^epsilon communication — the more servers, the less communication, obviously. An obvious open problem here is to find other instances where this transformation works; this one was very simple using PRSS, but maybe there are other ways to get the correlations that you need for PIR protocols. Okay, the next thing I want to tell you about — just one slide — is how to apply the ideas so far to PIR with preprocessing. Here we notice that the simple and partitioned PIR protocols that I described before have the following structure: first the client and server run some RPIR protocol on the original database, and the client learns some bits; then, once the client knows what index it's interested in, it sends a single message to the server; and then the client and the server run a PIR protocol on a smaller database, of size n/2, or n/t if it's the partitioned one.
That means that step one can be done as preprocessing: steps two and three are done online, but the work there is reduced by a factor of two, or a factor of t. So this is a very lightweight way of doing preprocessing. With that, I've told you all I wanted about RPIR as a primitive, so let's spend a few minutes on the application of RPIR to the keeping-secrets-on-the-blockchain architecture. In this context, when I say blockchain, what I really mean is a system with many nodes, most of them honest; there is nothing blockchainy about it. The motivation slides that I'm going to go through now are courtesy of Sophia. Think of secure MPC as a service: we have clients, they want to compute some function, and they want to send it to the cloud to compute it for them. But for privacy reasons, and for resilience reasons, you want the cloud to be implemented as a secure MPC, so that it gives you both the guarantee that you get the correct output, with guaranteed output delivery, and privacy. Of course, that only holds as long as less than some threshold of the servers that make up this cloud are corrupt. So that's nice — you get these additional benefits for the clients. But now think of doing the same thing with millions of parties: you have a blockchain, and you want the blockchain to compute that function for you, where the entire system consists of who knows how many nodes. You can in principle do the exact same thing: they could run a secure MPC protocol and give you correct output and privacy as long as not too many are corrupted. But there is an efficiency issue here: a run-of-the-mill MPC protocol typically has every node talk to every other node, and that's very, very expensive when there are many nodes. So in addition to the security goals from before, we now have an efficiency goal, and the goal we focus on here is sub-linear communication in the number of parties: we do not want
every party to talk to every other party. An obvious way to try to do that is to use a small committee: choose a random committee to represent the entire population. If a majority of parties in the overall population are honest, then hopefully, with high probability, a majority of the parties in the committee are also honest, and then you can just run a regular secure MPC protocol inside the committee. The problem with that is: what happens if the adversary is adaptive? As soon as these nodes start talking, the adversary knows who the committee is, and then it just goes and corrupts them — or maybe not corrupts them; maybe the adversary is not all that powerful, but it can at least DDoS them. It just knocks them off, and your entire computation, all the state that you built so far, is dead. So what do we do? One thing we'll do is switch to a YOSO-style secure computation protocol. This is a style of protocol that was studied at CRYPTO this year by Gentry et al. In this type of protocol we have evolving committees, each active for just one step: as soon as you say something, your role is over. So after the first step the adversary maybe corrupts the first committee, but now there's a whole new committee that needs to talk, and after the adversary maybe corrupts them, there's a new committee, and so on. Every time, by the time the adversary learns who the committee is, that committee is no longer active, and there's nothing to be gained by DDoSing it or corrupting it. This is the style of protocols we would like to use here, and Gentry et al. showed that essentially every function can be computed in this style of protocol, even if you want things like guaranteed output delivery, as long as you have an honest majority among the entire population. But there is a problem, and the problem is how to forward state between committees. In this setting we want the committees to be hidden from the adversary, so that they will not be corrupted or DDoSed, but then, if nobody knows who the committee is,
how do you send them the messages that they need to see in order to participate in the computation? This was considered by Benhamouda et al., who talked about the notion of target-anonymous channels for sending messages: everybody should be able to send a message to party i in the next committee without knowing who that party is. How to implement those was left explicitly out of scope in the YOSO paper of Gentry et al., but it was addressed by Benhamouda et al., and they gave a solution — I'll talk about it a bit in the next slide — but that solution is only somewhat effective: it can only withstand corruption of up to roughly a quarter of the overall population, and not more than that. So the goal here is to construct target-anonymous channels. First of all, we are going to assume PKI and authenticated broadcast. The reason I don't have a problem assuming that is that we're thinking of blockchains, and blockchains provide those sort of for free: as long as the blockchain works, you have them. And if you have PKI and authenticated broadcast, then really all you need is some way to re-randomize the public keys of parties. If, by some process, a re-randomized version of my public key appears on the broadcast channel, then everybody can send me messages without needing to know who I am: all they need to know is "this is the public key of party number five in the next committee, let's encrypt to it," and I know that this is my public key and I can decrypt. So it boils down to the question of how you choose and re-randomize a key without the adversary learning who that key belongs to. Benhamouda et al. did offer a solution. In their case there was an auxiliary committee that chose the committee that is going to hold the shares and participate in the computation: each member of this auxiliary committee chose one member of the secret-sharing committee and re-randomized its key. But that has the problem of a double-dipping attack, because now
the adversary learns a committee member if either that member itself is corrupted or the person that nominated them to the committee is corrupted. So there is double dipping, which is why you cannot get better than resilience against a quarter of the parties being corrupted. The idea that we want to explore here is: we already have committees, and they are already running a secure MPC protocol; how about the previous committees also do the work that's needed to establish the target-anonymous channels? We bootstrap these things off of the previous committees, and that works. Think of the target-anonymous-channel function: it takes n public keys as a public input, it takes randomness as a private input from the parties (the k committee members, say), and it outputs k re-randomized keys. And that works — from a security point of view, this is exactly what you want — but it has a problem, in that it's not scalable. Our point was to use less than n bits of communication, and this definitely doesn't do that: if you think of the circuit that computes this function, it has a long input, in particular the n public keys, so if you just apply a run-of-the-mill MPC protocol, you get at least n bits of communication. So we want to do better, and here is where we use RPIR. How do we use it? We can break the computation of the target-anonymous-channel function into two parts: first do a batch RPIR to fetch k random public keys from the PKI, and then re-randomize these k keys. For the batch RPIR, the previous committees will jointly simulate the RPIR client, and since the database is public, each member can individually play the server in its head. The communication here is o(n), because the RPIR is non-trivial. But notice that this is a really weird use of PIR/RPIR, because you don't need to broadcast what the server says: everybody plays the server in their head, and they
can imagine what the server would say without needing to broadcast it. The thing that you really care about is having very short communication for the client, so it is a weird use of PIR. The other thing to notice is that since we have a committee implementing the secure computation, its output — the k public keys — is shared among the committee at the end of this step, so the adversary still doesn't know them. Then you run the re-randomization part, which is a function that doesn't depend on n, so it's scalable, and then you reconstruct the re-randomized public keys and post them on the broadcast channel. This is how you establish the target-anonymous channels. So if this is what we want to do, what do we need from the RPIR protocol? We need it to be very efficient: small communication and simple processing. Because we're simulating the client with a secure computation — and a YOSO secure computation at that, which is typically heavier MPC — it had better be the case that what the client computes in this RPIR protocol is very, very simple. From security, we want to make sure that the adversary doesn't learn who's on the next committee, which means that the next committee should at least be unpredictable, in the sense that the adversary cannot guess more than half of its members to corrupt. So we can use batch RPIR with the weaker security notion, because all we need here is unpredictability and not full randomness, and that helps a little bit. Let me show you one very simple thing that we can do just because we can use the unpredictability version instead of full randomness. Instead of choosing k random entries from the entire database, we partition the database into m bins, where m is a parameter, and we choose a random k/m entries from each bin. Now, clearly there are many subsets that cannot be chosen this way, because many k-subsets don't have exactly k/m elements in each bin, but you can
see that at least this saves a factor of m in server work, because you fetch the same number of points — you still fetch k points — but you fetch them from databases of size n/m instead of databases of size n. So you definitely save at least a factor of m in server work, and how much you save in client work really depends on the underlying PIR protocol you're using. In terms of probability: yes, as I said, many subsets can no longer appear, but the subsets that can appear are distributed uniformly, and in fact the fraction of subsets that can appear is not too tiny — it's a fraction that's exponentially small in m and only polynomially small in n. So if you set m to be logarithmic, you still have a polynomial fraction of the subsets that can appear, and therefore the probability mass of each one of them grows by at most a polynomial factor. Therefore you get exactly the weak-security definition from before: any bad event that happens with negligible probability when you use completely random indexes will still happen with negligible probability in this case. That's all I wanted to tell you today. We introduced random-index PIR; it's a weaker variant of PIR; it can be somewhat more efficient than PIR, but not too much, because they are equivalent as primitives — still, there are gains to be had. It is motivated by our application to very large scale MPC, which we call keeping secrets on the blockchain, and it allows us in particular to construct target-anonymous channels that are resilient to compromise of up to half-minus-epsilon of the parties, as opposed to the solution of Benhamouda et al. that can only tolerate about a quarter of the parties being corrupt. And that's all I wanted to tell you — thank you very much.