Hello, I'm Peihan Miao. I'm going to present Private Set Intersection in the Internet Setting from Lightweight Oblivious PRF. It is joint work with Melissa Chase. First, what is private set intersection? Let's start with a simple example. Say Alice and Bob are two cryptographers. They meet at a crypto conference for the first time, and they want to figure out who their common friends are. For example, here is a list of Alice's friends, here is a list of Bob's friends, and they want to figure out which friends they have in common. What can they do? Well, they can take out their phones and put them together to compare their contact lists. From this, they can easily tell that, for example, Melissa and I are their common friends, and so on. However, if they take this approach, Alice would have to reveal all her other friends to Bob and vice versa. As cryptographers, they might not want to reveal that kind of information to each other. What is even worse, nowadays all the conferences are virtual, so this is not even an option. What they can do instead is run a cryptographic protocol where they exchange encrypted messages and compute on those encrypted messages, and by the end of the protocol they learn their common friends, but nothing more. In particular, Alice would have no idea about Bob's other friends and vice versa. This is what we call private set intersection, or PSI for short. It is a special case of secure two-party computation. Although the functionality they want to jointly compute is very simple, PSI has found many applications in practice, for example private contact discovery, password breach alerts, ad conversion measurement, and many more.
There has been a lot of work on PSI in the literature in different scenarios and different settings, but in this work we focus on the semi-honest security model, where both parties follow the protocol honestly but are curious about the other party's input, and we assume the two sets have roughly the same size. In this scenario, which protocol should we adopt in practice? We obviously want the most efficient one, but when we think about concrete efficiency, most of the time there is a trade-off between computation and communication. If we draw a line between computational efficiency and communication efficiency, there is actually one PSI protocol at each end. On the one hand, in 2016, Kolesnikov et al. gave a PSI protocol (KKRT) that is computationally very efficient but requires a bit more communication, so it is the best fit for networks with very high bandwidth. On the other hand, last year Pinkas et al. gave another PSI protocol called SpOT-Light, which achieves very low communication but requires more expensive computation, so it is the best fit for networks with very low bandwidth. In the same work, they proposed a new metric called monetary cost, which takes both computation and communication into consideration and asks what the most cost-effective PSI protocol is. In this work, we ask whether we can achieve a better balance between computation and communication, and here is what we achieved. First, we construct a new PSI protocol that achieves a better balance between computation and communication, so it is the best fit for the Internet setting, where the network bandwidth is not too high and not too low. For example, between 30 and 100 Mbps, our protocol is the most efficient. Second, our PSI protocol is semi-honest secure, but we can actually prove it maliciously secure against one party almost for free. I will get back to this point later in the talk.
And finally, because we achieve a better balance between computation and communication, our protocol also achieves the lowest monetary cost compared to existing works in many real-world scenarios. So those are our results. Now let's move on to the more technical part. Our starting point is the KKRT PSI, which is computationally very efficient. Their major building block is a primitive called single-point OPRF, and they constructed a single-point OPRF from OT extension and symmetric-key cryptographic operations only, so overall it is computationally very efficient. However, to construct PSI from single-point OPRF, they need to use a data structure called cuckoo hashing, and that's why they require a bit more communication; I will explain that later. In our work, we construct PSI from another primitive called multi-point OPRF, so that we can get rid of the cuckoo hashing and reduce the overall communication. We can also construct multi-point OPRF from OT extension and symmetric-key cryptographic operations only, so our protocol is still computationally very efficient. As a side note, the work of PRTY, or SpOT-Light, also follows the same paradigm, but they constructed multi-point OPRF from OT extension and polynomial interpolation over a large field, and that's why they are computationally much more expensive. In the remainder of the talk, I will first tell you what single-point OPRF is and how to construct PSI from single-point OPRF using cuckoo hashing, and then what multi-point OPRF is and how to construct PSI from multi-point OPRF. Next, I will briefly talk about how to construct single-point OPRF from OT extension and symmetric-key cryptographic operations, and how to generalize these ideas to achieve multi-point OPRF. First, what is single-point OPRF?
It is short for single-point oblivious pseudorandom function, which is a special secure two-party computation protocol between Alice and Bob, where Alice has no input and Bob has a single input Y. From the protocol, Alice learns a PRF key K and Bob learns the PRF evaluated on his input Y. This is the functionality of single-point OPRF, and given this primitive we can construct a single-point PSI, where Alice has a set of elements X while Bob has a single element Y. To achieve the single-point PSI, the two parties first run the single-point OPRF, and then Alice evaluates the PRF on every element in her set and sends all the PRF values to Bob. By comparing these PRF values with his own PRF value on Y, Bob can easily tell whether Y is in the set X or not. But how can we achieve general PSI? A naive approach would be to run a single-point PSI for each element in Bob's set, but then both computation and communication grow quadratically in the number of elements. What they did in KKRT instead is to use a data structure called cuckoo hashing, and the high-level idea is the following. Bob constructs a hash table that is slightly bigger than his set, and then for each element in his set, he computes two public hash functions on the element to generate two candidate positions in the hash table. For example, Y1 could be put into these two bins of the hash table, Y2 could be put into these two bins, and so on. Cuckoo hashing has the guarantee that every element can be put into one of its two bins and every bin holds at most one element. Then the two parties run a bunch of single-point OPRFs, one OPRF per bin of the hash table. Then, for each element in Alice's set, she also computes the two public hash functions on the element to generate two bins of the hash table.
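To make the cuckoo hashing step concrete, here is a toy Python sketch. The two public hash functions are instantiated from SHA-256, and rare insertion failures fall back to a small stash, as real cuckoo hashing schemes do; all names, sizes, and instantiations here are illustrative assumptions, not the parameters from KKRT.

```python
import hashlib

def h(i, x, num_bins):
    # i-th public hash function, mapping element x to a bin index
    d = hashlib.sha256(f"{i}|{x}".encode()).digest()
    return int.from_bytes(d[:8], "big") % num_bins

def cuckoo_insert(table, stash, x, num_bins, max_evictions=500):
    cur = x
    pos = h(0, cur, num_bins)
    for _ in range(max_evictions):
        if table[pos] is None:
            table[pos] = cur
            return
        table[pos], cur = cur, table[pos]      # evict the current occupant
        p0, p1 = h(0, cur, num_bins), h(1, cur, num_bins)
        pos = p1 if pos == p0 else p0          # move it to its other candidate bin
    stash.append(cur)                          # rare failures go to a small stash

# Bob's table is slightly bigger than his set
names = ["melissa", "peihan", "carol", "dave"]
num_bins = 6
table, stash = [None] * num_bins, []
for x in names:
    cuckoo_insert(table, stash, x, num_bins)

# every element sits in one of its two candidate bins (or the stash),
# and every bin holds at most one element by construction
for x in names:
    assert x == table[h(0, x, num_bins)] or x == table[h(1, x, num_bins)] or x in stash
```

Because Alice does not know which of the two candidate bins (or stash slots) an element ended up in, she must send one PRF value per candidate location, which is where the extra communication comes from.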
But she doesn't know which of the two bins Bob could have put this element into, so she has to evaluate the PRF on this element under both keys and send both PRF values to Bob, and similarly for X2 and so on. In fact, Alice has to send even more PRF values per element in her set, because if you are familiar with cuckoo hashing, there is actually an extra small stash that elements could have been put into, so Alice has to compute a PRF value on every element under each key in the stash. But if you're not familiar with cuckoo hashing, that's fine; forget what I said. The key takeaway here is that Alice has to send multiple PRF values per element in her set, and that's why it requires more communication, especially from Alice to Bob. The fundamental issue is that they can only achieve single-point OPRF. If we can achieve multi-point OPRF, where Bob has multiple elements as input and, as the output, learns the PRF evaluated on all his elements, then we can achieve PSI very easily: the two parties first run the multi-point OPRF, and then Alice evaluates the PRF on every element in her set and sends all the PRF values to Bob. By comparing these PRF values, Bob can easily figure out the intersection of the two sets. So far we have seen how to construct PSI from single-point OPRF using cuckoo hashing, and how to construct PSI from multi-point OPRF to get rid of the cuckoo hashing and reduce the communication, especially from Alice to Bob. In the remainder of the talk, I will briefly talk about how to construct a single-point OPRF from OT extension and symmetric-key cryptographic operations, and how to generalize these ideas to multi-point OPRF. But before that, a key primitive that we will use is oblivious transfer, or OT for short, which is again a special secure two-party computation protocol between two parties, called the sender and the receiver.
The sender has two arbitrary messages M0 and M1 as input, and the receiver has a single bit B as input. From the protocol, the receiver learns one of the two messages depending on her choice bit B, while the other message stays hidden from her, and the sender learns nothing about the receiver's choice bit. Oblivious transfer requires public-key operations, so it is computationally expensive. But if we are doing a large number of OTs, then using OT extension they can actually be done with a small number of public-key operations, on the order of lambda of them where lambda is a security parameter, and all the remaining operations are symmetric-key, so the overall computation is still very efficient. Now let's see how to construct a single-point OPRF from OT. First, Alice generates a random lambda-bit string S, where lambda is a security parameter, and Bob generates two lambda-bit strings R0 and R1, where R0 is random and R1 is computed as follows. On his input Y, he first computes a function F on Y which outputs a lambda-bit string. We want this function F to be pseudorandom and deterministic; for example, you can think of it as a PRF with the PRF key known to both parties. So F(Y) is a pseudorandom lambda-bit string, and R1 is computed as the bitwise XOR of R0 and F(Y). After this, Alice and Bob run lambda instances of OT, where Alice is the receiver and Bob is the sender. In particular, the i-th OT has the i-th bits of R0 and R1 as the messages and the i-th bit of S as the choice bit. By the end of the OTs, Alice learns a lambda-bit string Q, which we can rewrite as R0 XOR (S AND F(Y)), where AND is bitwise. This string Q looks random to Alice, so it doesn't reveal any information about Y. So what is the output of Alice and Bob?
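Here is a toy Python sketch of this single-point OPRF, with the lambda OTs simulated locally as bit selections and with F and H instantiated from SHA-256 purely for illustration; these instantiations, and all names here, are assumptions for the sketch, not the paper's choices. It checks the rewriting Q = R0 XOR (S AND F(Y)) along the way.

```python
import hashlib, secrets

LAMBDA = 128  # security parameter in bits

def F(x):
    # public, deterministic, pseudorandom map to a LAMBDA-bit string
    # (think: a PRF whose key is known to both parties)
    return int.from_bytes(hashlib.sha256(x.encode()).digest()[:LAMBDA // 8], "big")

def H(v):
    return hashlib.sha256(v.to_bytes(LAMBDA // 8, "big")).hexdigest()

# Bob's strings: R0 random, R1 = R0 XOR F(Y)
y = "melissa"
r0 = secrets.randbits(LAMBDA)
r1 = r0 ^ F(y)

# Alice's choice string S; the i-th OT hands her bit i of R0 or of R1
s = secrets.randbits(LAMBDA)
q = (r0 & ~s) | (r1 & s)          # simulated outcome of the lambda OTs
assert q == r0 ^ (s & F(y))       # the rewriting from the talk

# Alice's PRF key is K = (S, Q); evaluation on an input X:
def Fk(x):
    return H((s & F(x)) ^ q)

assert Fk(y) == H(r0)             # X == Y: the F terms cancel, output is H(R0)
assert Fk("oded") != H(r0)        # X != Y: many uncancelled bits remain masked by S
```

The two final assertions are exactly the two takeaways discussed next: agreement when X equals Y regardless of S, and unpredictability for Bob otherwise.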
The output of Bob is the PRF value on Y, computed as H(R0), where H is a hash function. On Alice's side, the output is the PRF key K, which consists of the two bit strings S and Q. To evaluate the PRF on an input X, Alice first computes F(X), which produces a pseudorandom bit string, then takes the bitwise AND of S and F(X), XORs it with Q, and finally computes the hash function on top: F_K(X) = H((S AND F(X)) XOR Q). This can be rewritten as H(R0 XOR (S AND (F(X) XOR F(Y)))), which is a bit hard to parse, but the key takeaways are the following. First, if X is equal to Y, then F(X) is equal to F(Y), so that term cancels out and the output equals H(R0), no matter which S was chosen. On the other hand, if X is not equal to Y, then many of the bits of F(X) and F(Y) differ, so they cannot cancel out. As a result, the output F_K(X) is hard for Bob to guess unless he can correctly guess many of the bits of S. Those are the two takeaways, and now we are going to generalize them to multi-point OPRF. Here is the construction. First, Alice generates a random lambda-bit string S, and Bob generates two matrices R0 and R1, each of dimension n times lambda, where R0 is a random matrix and R1 is constructed as follows. For each element in his set, for example Y1, he first computes F(Y1), which outputs one position in each column of the matrix. Again, we want this function F to be pseudorandom and deterministic, so F(Y1) outputs one pseudorandom position in each column, and Bob copies the bits in those positions from R0 to R1. Similarly for Y2: he computes F(Y2), which produces a pseudorandom position in each column, and copies the bits in those positions from R0 to R1. He does this for every element in his set, and then, for all the remaining positions, he flips the bits from R0 to R1.
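Bob's matrix construction can be sketched as follows, again with F instantiated from SHA-256 and with toy dimensions; everything here is an illustrative assumption rather than the paper's parameters.

```python
import hashlib, secrets

N, LAM = 64, 32  # toy matrix height n and width lambda

def positions(x):
    # pseudorandom and deterministic: one row position per column
    return [int.from_bytes(hashlib.sha256(f"{j}|{x}".encode()).digest()[:4], "big") % N
            for j in range(LAM)]

def bob_matrices(bob_set):
    R0 = [[secrets.randbits(1) for _ in range(LAM)] for _ in range(N)]
    # start R1 as the bitwise complement of R0, then copy R0's bit back at
    # every position touched by some element of the set: touched positions
    # agree, and all remaining positions are flipped, as in the talk
    R1 = [[b ^ 1 for b in row] for row in R0]
    for y in bob_set:
        for j, i in enumerate(positions(y)):
            R1[i][j] = R0[i][j]
    return R0, R1

R0, R1 = bob_matrices(["melissa", "peihan"])
# the matrices agree at every position of an element in the set...
assert all(R0[i][j] == R1[i][j] for j, i in enumerate(positions("melissa")))
# ...and disagree at many positions of an element outside it
# (we conservatively check at least one disagreement)
assert any(R0[i][j] != R1[i][j] for j, i in enumerate(positions("oded")))
```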
Okay. After this, Alice and Bob again run lambda instances of OT, where Alice is the receiver and Bob is the sender. In particular, the i-th OT has the i-th columns of R0 and R1 as the messages and the i-th bit of S as the choice bit. By the end of the OTs, Alice learns one column per OT, and together these columns form a matrix of dimension n times lambda; this matrix is her PRF key K. To evaluate the PRF on an input X, she first computes F(X) to produce a pseudorandom position in each column of her matrix, and then takes all the bits in those positions and computes a hash on top of them; this is the output of the PRF. On Bob's side, to compute the PRF on Y_i, he also first computes F(Y_i) to produce the pseudorandom position in each column, takes the bits in those positions of R0, and computes a hash on top; this is the PRF value F_K(Y_i). If you think about it, we have the same guarantees as before. In particular, if X is equal to some Y_i, then all the bits in those positions are the same in both R0 and R1, so the output F_K(X) equals F_K(Y_i) no matter which S was chosen. On the other hand, if X is not in Bob's set, then among those positions there are many differences between R0 and R1, so F_K(X) is hard for Bob to guess unless he can correctly guess many of the bits of S. I also want to briefly mention that we can actually prove security against a malicious Alice, because what Alice learns from the OTs is a matrix that is information-theoretically random, so she learns nothing about Bob's inputs, and if we model the hash function as a random oracle, it is easy to extract her inputs. And then we are done.
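Putting the pieces together, here is a toy end-to-end sketch: Bob builds the two matrices, the lambda column-OTs are simulated locally with Alice's choice string S, both sides hash the bits at the positions given by F, and the final PSI step has Bob match Alice's PRF values against his own outputs. As before, the SHA-256 instantiations, the dimensions, and all names are illustrative assumptions, not the paper's parameters.

```python
import hashlib, secrets

N, LAM = 128, 64  # toy matrix height n and width lambda

def positions(x):
    # pseudorandom and deterministic: one row position per column
    return [int.from_bytes(hashlib.sha256(f"{j}|{x}".encode()).digest()[:4], "big") % N
            for j in range(LAM)]

def H(bits):
    return hashlib.sha256(bytes(bits)).hexdigest()

def multi_point_oprf(bob_set):
    # Bob: R0 random; R1 agrees with R0 exactly at positions touched by his set
    R0 = [[secrets.randbits(1) for _ in range(LAM)] for _ in range(N)]
    R1 = [[b ^ 1 for b in row] for row in R0]
    for y in bob_set:
        for j, i in enumerate(positions(y)):
            R1[i][j] = R0[i][j]
    # Alice: choice string S; the j-th OT hands her column j of R0 or of R1
    s = [secrets.randbits(1) for _ in range(LAM)]
    K = [[(R1 if s[j] else R0)[i][j] for j in range(LAM)] for i in range(N)]
    # Alice's PRF evaluation under key K, and Bob's outputs from R0
    evaluate = lambda x: H([K[i][j] for j, i in enumerate(positions(x))])
    outputs = {y: H([R0[i][j] for j, i in enumerate(positions(y))]) for y in bob_set}
    return evaluate, outputs

# PSI: Alice sends her PRF values; Bob compares them with his OPRF outputs
evaluate, bob_outputs = multi_point_oprf(["melissa", "peihan", "carol"])
alice_values = {evaluate(x) for x in ["melissa", "peihan", "oded"]}
intersection = {y for y, v in bob_outputs.items() if v in alice_values}
assert intersection == {"melissa", "peihan"}
```

For elements in both sets the selected bits agree with R0 at every position, so the hashes match exactly; for everything else the hashes match only with negligible probability.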
To summarize, we construct a new PSI protocol from multi-point OPRF, getting rid of cuckoo hashing so that we can reduce the communication from Alice to Bob. And since we can construct the multi-point OPRF from OT extension and symmetric-key cryptographic operations only, our computation is also very efficient. Finally, to mention a couple of open problems: our PSI protocol sits somewhere between KKRT and PRTY, so an interesting question is whether we can achieve the best of both worlds, that is, the best computation as well as the best communication. The second question is that our protocol achieves malicious security against Alice almost for free; can we achieve malicious security against Bob without much loss in efficiency? And that's it. Thank you for watching.