Welcome to my talk on VOLE-PSI: Fast OPRF and Circuit-PSI from Vector-OLE. This is work I did during my time at Humboldt University of Berlin, and it is joint work with Peter Rindal from Visa Research.

This talk is about private set intersection, so let me quickly introduce what that is. Private set intersection, or PSI for short, is a protocol between two parties. Each of them has a set of input elements, and the goal is for the parties to learn the items they have in common, in this case the blue triangle and the red circle. If, as a result of the protocol, all the parties learn is the intersection of their sets and nothing beyond that, we call the PSI protocol secure. There are many variants of PSI. For example, both parties or only one party could receive the output, the inputs could carry associated values, or the output could be secret-shared, which allows us to subsequently perform a secure computation on the secret-shared output without the parties learning the intermediate results.

In the last couple of years, the most common approach to building private set intersection protocols has been based on oblivious pseudorandom functions. What is an oblivious pseudorandom function, or OPRF for short? It is the distributed equivalent of a PRF in the centralized setting. Alice holds the key to the PRF, which she receives as a result of the OPRF protocol. Bob, on the other hand, only learns the output of the PRF on certain input values that he chooses at the beginning. Here, for example, Bob chooses two input values and gets the corresponding output values. Alice, who has the key, can compute the PRF value on any input she chooses, offline. The main observation is that Alice can evaluate F locally, whereas Bob can only do so interactively, and only on a fixed number of inputs.

Now, what do OPRFs give us; how can we use them to build PSI protocols? The approach is actually quite simple. Bob inputs all his elements into the OPRF protocol and in turn gets the corresponding PRF values. Alice gets the OPRF key and can locally compute the PRF values for all of her inputs. She then sends those over, in shuffled order, to Bob, who compares each of them with the PRF values he got for himself. Whenever he has a match, he knows that the corresponding input is in the intersection. That is a very simple PSI protocol, and the work I am presenting here follows the same approach: we are also going to build an OPRF, but one that is more efficient than previous work, and that in turn gives us a more efficient PSI protocol.
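To make this blueprint concrete, here is a minimal, purely illustrative Python sketch of the PSI-from-OPRF flow. HMAC-SHA256 merely stands in for the PRF F(k, ·); in the real protocol Bob would obtain his F(k, x) values through the OPRF without ever seeing k. All names here (F, k, the example sets) are hypothetical.

```python
import hmac, hashlib, random

# Toy illustration of the PSI-from-OPRF blueprint.
# HMAC-SHA256 stands in for the PRF F(k, .). In the real protocol, Bob
# obtains F(k, x) for his inputs via the OPRF without learning k, and
# Alice obtains k without learning Bob's inputs.
def F(k: bytes, x: str) -> bytes:
    return hmac.new(k, x.encode(), hashlib.sha256).digest()

k = b"oprf-key"                       # Alice's OPRF key
alice_set = {"apple", "pear", "plum"}
bob_set = {"pear", "cherry", "plum"}

# Bob: PRF values on his inputs (via the OPRF in the real protocol).
bob_values = {F(k, x): x for x in bob_set}

# Alice: evaluates F locally on her inputs and sends them over, shuffled.
alice_values = [F(k, y) for y in alice_set]
random.shuffle(alice_values)

# Bob: every match reveals an element of the intersection.
intersection = {bob_values[v] for v in alice_values if v in bob_values}
print(intersection)                   # {'pear', 'plum'}
```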
The basic protocol underlying our OPRF construction is called Vector-OLE, so let me quickly introduce that as well. OLE stands for oblivious linear evaluation, and a Vector-OLE generator is essentially a correlation generator run between two parties. It outputs three vectors A, B, and C and a scalar Δ: Alice gets B and Δ, and Bob gets the two vectors A and C. All of these are pseudorandom, but correlated, and the correlation is such that C = Δ·A + B. You can think of this as an additive secret sharing of a vector-scalar product: Δ·A is the product, and C and -B are the secret shares. There are several works that explore how to build Vector-OLE generators, and we build on them for our OPRF construction.

Now, how can we get from Vector-OLE to an OPRF? Suppose for a minute that A, B, and C are very long, in fact exponentially long, meaning they have as many entries as there are possible input elements for our PSI protocol. Of course that is not going to be possible in practice, but assume it for a moment, and then we will see how to get a realistic OPRF. If we had such a long Vector-OLE correlation, and if in addition Bob could choose the vector A to be zero at certain positions, we would already have an OPRF: Bob chooses A to be zero at all positions that correspond to his input elements. If A[x] is zero at all positions of Bob's inputs, then at those positions B and C are equal, irrespective of Δ, which is still random. If we now put these values through a random oracle, we immediately get a PRF, one that Bob can evaluate at all of his inputs and that Alice can evaluate anywhere, because the key is simply Alice's vector B. Observe that at positions where Bob does not program A to be zero, C and B are not equal, so the corresponding OPRF outputs differ. We therefore keep the property that Bob can only evaluate the OPRF at positions he chose at the beginning.

That sounds very simple, but of course there are several problems with this approach. First, with just a Vector-OLE generator, Bob cannot choose A, since it is generated pseudorandomly. Second, as I said, A, B, and C would have to be as large as the OPRF domain, which is not realistically possible with any Vector-OLE generator. And third, if Bob acts maliciously and can program A at all, he could program A to be zero in more than n positions, in fact to be zero everywhere, and thereby learn Alice's key, which makes the OPRF protocol insecure. I am now going to show you how to fix each of these issues, and at the end we will have an actual OPRF protocol from Vector-OLE.

For the first problem, that Bob cannot choose A, we use a standard construction that reduces a chosen-input OLE correlation to a pseudorandom OLE correlation. Bob cannot choose A, but if he instead wants some particular vector P, he can simply send P + A to Alice, where A acts as a one-time pad masking P. Alice's key then becomes B + Δ·(A + P), which together with Bob's C again satisfies the Vector-OLE correlation, now with the chosen vector P, so we still have correctness. But now Bob can actually choose this vector, which allows him to program it exactly the way we want. That was simple, and it is a known construction, so nothing new.
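Here is a small numeric sketch of this chosen-input reduction. A toy prime field stands in for the large field used in practice; all constants and names are illustrative. Note that the identity B + Δ·(A + P) = C + Δ·P holds over any field, since C = Δ·A + B.

```python
import secrets

q = (1 << 61) - 1          # toy prime modulus, for illustration only
n = 8                      # toy vector length

# Random Vector-OLE correlation: Alice holds (Delta, B), Bob holds (A, C)
# with C = Delta*A + B componentwise.
Delta = secrets.randbelow(q - 1) + 1
A = [secrets.randbelow(q) for _ in range(n)]
B = [secrets.randbelow(q) for _ in range(n)]
C = [(Delta * a + b) % q for a, b in zip(A, B)]

# Bob's chosen vector P (here: zero exactly at the positions he cares about).
P = [0, 0, secrets.randbelow(q), 0, secrets.randbelow(q), 0, 0, 0]

# Bob sends rho = A + P; the random A acts as a one-time pad hiding P.
rho = [(a + p) % q for a, p in zip(A, P)]

# Alice's new key: K = B + Delta*rho = C + Delta*P.
K = [(b + Delta * r) % q for b, r in zip(B, rho)]

# Wherever Bob programmed P to zero, Alice's K agrees with Bob's C,
# and nowhere else (since Delta is nonzero).
assert all((K[i] == C[i]) == (P[i] == 0) for i in range(n))
```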
The second problem is somewhat more involved. For the ideal Vector-OLE protocol we would need A, B, and C to be as large as the full OPRF domain. Since we cannot have that, we take a different approach and assume we have a public, exponentially large matrix M. This matrix again has as many rows as the vectors we would ideally want, but it is not very wide: it is very tall, and the number of columns is just slightly more than the number of inputs Bob chooses.

What Bob now does is choose his vector P such that the product of this exponentially tall matrix M and P is zero at exactly the positions he wants. That allows us to use a Vector-OLE correlation of size only m, where m is the number of columns of M, which is quite small. Let's see how that looks. On the left we have the matrix M. Again, it is very tall and cannot be represented in memory, but we can index it by the elements in Bob's input set, so Bob can select exactly the rows of M that he wants. We have a short vector P, and we want the product of M and P to be zero at exactly the positions corresponding to Bob's inputs. How can we achieve that? It is rather simple: we take the rows of M that correspond to the inputs and solve the linear system M_X·P = 0. We will see several approaches that make this linear system solving quite efficient.

As for the third problem, that a malicious Bob could now make M·P zero in more than n positions: we are not actually going to program zero there; we can program any value we want, and in particular a value derived from the input element. So we use a random oracle and program M·P, at the row for x, to equal the hash of x. The only thing that changed compared to the previous picture is that M·P now takes the hash of the corresponding row as its value. The intuition for why this solves the earlier problem is that if Bob were able to find a P that hits these hash values in more than n positions, that would amount to compressing a random oracle, which is information-theoretically impossible. So Bob cannot find a vector P that programs more than n positions, which ensures he can only query the OPRF at n positions, and that gives us security in this respect.

The question now is how to find a large public matrix M with the property that Bob can efficiently pick out submatrices of it, that is, efficiently index into it and obtain the row corresponding to an input, and additionally solve the linear system: M restricted to the chosen rows, times P, should equal the hashes of the row indices. There are several natural approaches. One is to just take a pseudorandom matrix, where each row of M is defined by a pseudorandom generator applied to the row index i. That still gives a concise representation, and if we do this over a large field or over the binary field, we only need slightly more than n columns for the system to have a solution. The problem is that finding that solution takes cubic time, because the best we can do here is Gaussian elimination, so this is quite inefficient in terms of computation. Another approach that has been used in the literature quite often is a garbled Bloom filter. Here the rows are not completely pseudorandom; instead we have a binary matrix that is one at only very few positions, where the number of one-positions is the statistical security parameter λ, so each row has only λ nonzero entries. That means we can solve the linear system in linear time, since it is sparse. The issue is that we now need a very large number of columns to ensure a solution exists, which makes the communication quite inefficient, since the row length is also the size of the Vector-OLE correlation we have to generate.
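Before turning to the solvers we actually use, here is a toy sketch of the first straw-man above: rows of M derived pseudorandomly from the element (here simply by hashing), the system solved by Gaussian elimination over a toy prime field, and the right-hand side programmed to the random-oracle value H(x) as in the third fix. The field, the hash derivations, and the helper names are all assumptions for illustration.

```python
import hashlib

q = (1 << 61) - 1                    # toy prime field

def prg_row(x: str, m: int) -> list:
    """Row of the (conceptually huge) matrix M indexed by element x."""
    return [int.from_bytes(hashlib.sha256(f"row|{x}|{j}".encode()).digest(), "big") % q
            for j in range(m)]

def H(x: str) -> int:
    """Random-oracle value programmed at position x (third fix)."""
    return int.from_bytes(hashlib.sha256(f"target|{x}".encode()).digest(), "big") % q

def solve(rows, targets):
    """Gaussian elimination mod q: find P with rows[i].P = targets[i] (free vars = 0)."""
    n, m = len(rows), len(rows[0])
    aug = [row[:] + [t] for row, t in zip(rows, targets)]
    pivots, r = [], 0
    for c in range(m):
        if r == n:
            break
        pr = next((i for i in range(r, n) if aug[i][c] != 0), None)
        if pr is None:
            continue
        aug[r], aug[pr] = aug[pr], aug[r]
        inv = pow(aug[r][c], q - 2, q)           # modular inverse of the pivot
        aug[r] = [v * inv % q for v in aug[r]]
        for i in range(n):
            if i != r and aug[i][c]:
                f = aug[i][c]
                aug[i] = [(vi - f * vr) % q for vi, vr in zip(aug[i], aug[r])]
        pivots.append(c)
        r += 1
    P = [0] * m
    for i, c in enumerate(pivots):
        P[c] = aug[i][m]
    return P

bob_set = ["pear", "cherry", "plum"]
m = len(bob_set) + 2                             # slightly more columns than inputs
P = solve([prg_row(x, m) for x in bob_set], [H(x) for x in bob_set])

dot = lambda row, vec: sum(a * b for a, b in zip(row, vec)) % q
assert all(dot(prg_row(x, m), P) == H(x) for x in bob_set)   # programmed positions
assert dot(prg_row("apple", m), P) != H("apple")             # elsewhere only by chance
```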
One approach that is efficient in terms of both computation and communication is what we call the Vandermonde solver. Here M is the Vandermonde matrix: each row contains all powers of the row index, from the 0th up to the (n-1)th. Multiplying a row by any vector v is then the same as evaluating the polynomial with coefficient vector v at the row index x. With that structure, solving the linear system, that is, taking the rows of M that correspond to Bob's inputs and solving M_X·P equal to the hashes of the x's, is the same as interpolating a polynomial whose coefficients are P through points given by the row indices x. Going the other way, multiplying the matrix M_X by any vector v is multipoint polynomial evaluation. Here is an example of what such a polynomial could look like: at any point x in Bob's set, we want the polynomial to take the value H(x); at any point not in Bob's input set, we do not care, we only care about Bob's actual inputs. The nice thing is that the number of columns this matrix needs is exactly n: it contains the powers 0 through n-1, so there are n of them. Communication-wise, this is as efficient as we can get. Computation-wise, polynomial interpolation and multipoint evaluation can be done in time n·log²n, but ideally we would like to get rid of that log² factor and have something linear.

For that we have a second approach, which we also present in the paper, based on previous work, namely the PaXoS solver. PaXoS is nice because it has linear-time encoding and decoding algorithms, and the rows have size O(n): they are larger than n, but only by a constant factor. The idea behind PaXoS is inspired by cuckoo hashing; I am not going to go too deeply into it, but let me give a high-level intuition. As in cuckoo hashing, we have two hash functions mapping into a space of a fixed size, here n, and exactly the two positions given by these hash functions have their bits set to one. Additionally, each row of our matrix has O(λ) uniform bits. If we construct the matrix this way, we can solve the linear system in time linear in n. This works by observing that the sparse left part of the matrix, which has only very few one-bits, in most cases contains very few cycles when interpreted as a graph. If we take all the columns that are part of cycles, together with the dense part on the right, and solve that using Gaussian elimination, for example, then the remainder of the matrix can be solved very efficiently, in linear time. That is the approach taken in the original PaXoS paper, and it is essentially what we use here as well.

So now we have two approaches to generate this matrix M: one is a bit more communication-efficient, the Vandermonde solver, and the other, PaXoS, is very efficient in terms of computation and also pretty efficient in terms of communication.
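The Vandermonde view can be illustrated directly with polynomial interpolation. The sketch below uses naive O(n²) Lagrange interpolation over a toy prime field rather than the fast n·log²n algorithms mentioned above, and hashes elements both to field points and to target values; all names and parameters here are assumptions for illustration.

```python
import hashlib

q = (1 << 61) - 1                      # toy prime field

def to_point(x: str) -> int:
    """Map an element to a field point, i.e. its row index in the Vandermonde matrix."""
    return int.from_bytes(hashlib.sha256(b"pt|" + x.encode()).digest(), "big") % q

def H(x: str) -> int:
    """Random-oracle value the polynomial should take at x."""
    return int.from_bytes(hashlib.sha256(b"h|" + x.encode()).digest(), "big") % q

def interpolate(points):
    """Naive O(n^2) Lagrange interpolation mod q; returns the coefficient vector P."""
    n = len(points)
    P = [0] * n
    for i, (xi, yi) in enumerate(points):
        basis, denom = [1], 1          # Lagrange basis polynomial for xi
        for j, (xj, _) in enumerate(points):
            if j == i:
                continue
            new = [0] * (len(basis) + 1)
            for k, c in enumerate(basis):      # multiply basis by (X - xj)
                new[k] = (new[k] - xj * c) % q
                new[k + 1] = (new[k + 1] + c) % q
            basis = new
            denom = denom * (xi - xj) % q
        scale = yi * pow(denom, q - 2, q) % q
        for k, c in enumerate(basis):
            P[k] = (P[k] + scale * c) % q
    return P

def evaluate(P, x):
    """One Vandermonde row times P = evaluating the polynomial at point x."""
    acc, power = 0, 1
    for c in P:
        acc = (acc + c * power) % q
        power = power * x % q
    return acc

bob_set = ["pear", "cherry", "plum"]
P = interpolate([(to_point(x), H(x)) for x in bob_set])
assert all(evaluate(P, to_point(x)) == H(x) for x in bob_set)   # programmed points
assert evaluate(P, to_point("apple")) != H("apple")             # elsewhere only by chance
```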
So how does the full OPRF protocol look? We start with a Vector-OLE generator: Alice gets Δ and B, and Bob gets A and C, such that C = Δ·A + B. Bob now uses whichever solver he chooses, either Vandermonde interpolation or PaXoS encoding, to compute a vector P such that the matrix indexed by his elements, times P, equals the hashes of those elements. He then masks P with the output of the Vector-OLE generator and sends the masked vector P' over to Alice. Alice simply adds Δ·P' to her output of the Vector-OLE correlation, and that is her key. Now both parties can compute OPRF values locally, but the values will only agree for the elements that Bob, when solving the linear system, programmed to equal the corresponding hash. That is the full OPRF protocol, and if we plug it into the PSI-from-OPRF construction from the beginning, we get a secure PSI protocol.

As I said at the beginning, you might also want a variant of PSI where the output is secret-shared. That is useful if, for example, the two parties want to perform a subsequent computation on the output without learning the intersection elements themselves, only a function of them, such as their sum. An approach to this, presented in 2019, is to make the OPRF programmable. That means Alice can fix the value of the OPRF at her inputs y to certain values she chooses. Given such a programmable OPRF, one can use cuckoo hashing and generic circuit-based MPC to obtain a circuit PSI protocol, and we follow the same approach. So let me show you how to make the OPRF I just presented programmable.

Here, at the beginning, Alice has, in addition to her inputs y, some labels z, and she wants to program the OPRF such that for y1 the output is z1, and so on. We start by running the plain OPRF protocol I just showed you: Bob inputs his elements x1 to xn and in turn receives the PRF values of x1 to xn; Alice gets the key and can locally evaluate the PRF on all of her inputs. She does this and subtracts the results from her labels z. Observe that the resulting z' is essentially pseudorandom if you do not know the corresponding y values. If we now have an encoding that maps the y values to these pseudorandom z' values and that also hides the y values Alice inputs, then we can send it over to Bob without revealing anything. What we did in our paper is analyze the PaXoS solver and show that it can in fact be transformed into such an encoding: wherever the linear system solving inside the PaXoS encoding has a choice between multiple options, we simply choose randomly, and that satisfies the condition. As long as z' is pseudorandom, the encoded vector p reveals nothing about y, and since z' is masked with OPRF outputs, which by definition are pseudorandom, this suffices. So Alice encodes this vector p and sends it over to Bob. Both parties can now locally decode the PaXoS-encoded vector at their inputs and add the results to the OPRF values they already computed, which gives them the programmed OPRF value. If you do the computation, you will see that if x equals one of Alice's inputs y, the programmed output at x is equal to the corresponding label z that went into the computation, so we have correctness. And since p reveals nothing, because z' is pseudorandom, this also ensures privacy for Alice's inputs.
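To see this correctness argument in code, here is a toy check of the programming layer. HMAC stands in for the VOLE-based OPRF F, XOR plays the role of field addition, and a table indexed by a small explicit universe stands in for the PaXoS-encoded vector p (which in the real protocol both compresses this table and hides Alice's y values). Every name and constant here is illustrative.

```python
import hmac, hashlib, secrets

def F(key: bytes, x: str) -> bytes:
    """Stand-in for the OPRF: Alice evaluates it with the key, Bob via the OPRF."""
    return hmac.new(key, x.encode(), hashlib.sha256).digest()

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(u ^ v for u, v in zip(a, b))

universe = ["apple", "pear", "plum", "cherry", "fig"]   # toy OPRF domain
key = b"oprf-key"                                       # Alice's OPRF key

# Alice wants the programmed OPRF F' to output z_i at her inputs y_i.
alice_prog = {"pear": b"label-1".ljust(32, b"\0"),
              "plum": b"label-2".ljust(32, b"\0")}

# Alice: z'_i = z_i XOR F(y_i), then encode so that Decode(p, y_i) = z'_i.
# Here p is an explicit table with random filler; PaXoS would compress it.
p = {x: secrets.token_bytes(32) for x in universe}
for y, z in alice_prog.items():
    p[y] = xor(z, F(key, y))

# Programmed OPRF: F'(x) = F(x) XOR Decode(p, x). Bob computes the same
# value from his OPRF outputs, without ever seeing the key.
def F_prog(x: str) -> bytes:
    return xor(F(key, x), p[x])

assert F_prog("pear") == alice_prog["pear"]             # programmed points hit z_i
assert F_prog("apple") not in alice_prog.values()       # other points stay pseudorandom
```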
With that, let's evaluate the protocols I just presented. We implemented both the standard PSI protocol and the circuit PSI protocol, and for standard PSI we have both a semi-honest and a malicious variant. This first comparison is the semi-honest variant of PSI on a rather small input set size, compared against previous work: the PaXoS paper, the 2020 work by Chase and Miao, and the classic KKRT PSI protocol. We see that in the LAN setting KKRT is still the fastest, even though our new protocol has the lowest communication. We also see that if we exclude a one-time setup phase that is only present in our protocol, then ours is actually the fastest in every setting except the LAN. If we include this one-time setup, the protocol of Chase and Miao is still faster. However, if we increase the input size to a million elements, then in very bandwidth-constrained settings our protocol is the fastest even when the one-time setup is included. That becomes even more apparent at 2^24, so really large input sizes: in the LAN, KKRT is still the fastest, but in any bandwidth-constrained setting, the very low communication of our protocol gives us an advantage in total running time as well.

In the malicious setting we see a similar picture. Here I show all three input sizes at once, and only in the LAN is the original PaXoS PSI protocol faster than ours, by a small margin; in bandwidth-constrained settings, ours is faster.

For circuit PSI we actually have two choices in our implementation, because circuit PSI ends with a generic circuit-based MPC protocol. We implemented that using GMW, and for that we have a choice of OT implementation: either standard IKNP OT extension or the more communication-efficient silent OT. We implemented and compared both, and we see that in bandwidth-constrained settings silent OT outperforms IKNP. Both of them, in turn, outperform the original circuit PSI protocol that ours is based on in terms of communication, and at least one of them outperforms it in terms of total running time.

That concludes my talk. I just want to quickly highlight two related works presented at Crypto this year that can also be used to improve the protocols presented here. One of them is Silver, a faster Vector-OLE generator. We use the generator from CCS 2019, and what is nice about the Silver Vector-OLE is that it is faster than the one we use in terms of both computation and communication, so implementing our VOLE-based PSI with Silver would strictly improve it. The second paper that appeared at Crypto is about oblivious key-value stores, which are essentially more efficient variants of PaXoS. Just like the original PaXoS, these can be used in both our PSI and circuit PSI constructions and can significantly reduce their communication overhead. With that, I conclude my talk. Thank you all for your attention, and I am happy to take any questions.