Hi, I'm Mike Rosulek, and I'm happy to be presenting our paper, "PSI from PaXoS: Fast, Malicious Private Set Intersection." This is joint work with my fantastic co-authors, Benny Pinkas, Ni Trieu, and Avishay Yanai. Let's get started.

If Alice and Bob each have a set of items, they can use a private set intersection (PSI) protocol to identify which items they have in common. They learn which items are in the intersection, but they don't learn anything about the other items: Alice won't learn about any of Bob's items outside the intersection, and Bob won't learn about any of Alice's items outside the intersection.

PSI has several nice applications. For example, if I sign up for a new encrypted messaging service, I'll probably want to know which of my friends already use the service. The information I want is the intersection of my phone's contact list and the service provider's list of clients, and I can use PSI to compute this intersection without sending my contact list to the service. As another example, Google has a service for the password manager in Chrome, where any user can learn which of their passwords have appeared in a breach. Again, this is an intersection computation that can be performed with PSI, so no one has to send all their passwords to Google. Finally, Google uses PSI to measure the effectiveness of advertising campaigns. Google has a list of people who saw a certain ad, and the company that placed the ad has a list of people who purchased the item; they can use PSI to learn how many people are in both lists. I'll point out that Google uses a protocol that releases only the size of the intersection, but in this talk I'll only consider the case where the parties wish to reveal the contents of the intersection.

To understand the context of our new results, I'll show you the state of the art in PSI. Here's a graph showing the running time and communication cost of different two-party protocols executed on sets of size one million.
Faster protocols are to the left, and protocols with less communication are towards the bottom; note that both axes are in log scale. First, the semi-honest protocols. The fastest one is due to Kolesnikov, Kumaresan, Rosulek, and Trieu (KKRT); it requires about four seconds for the intersection of sets of size one million. The fastest malicious PSI protocol is due to Rindal and Rosulek, and it requires about 14 seconds. Note the sizeable performance gap between the fastest semi-honest and the fastest malicious protocol. Our new protocol almost closes this gap: it achieves malicious security but is only 25% slower than the fastest semi-honest protocol. Asymptotically, it is the first linear-time PSI protocol with malicious security based on OT extension.

In the rest of this talk, I want to elaborate on this performance gap between semi-honest and malicious PSI. Why is semi-honest PSI so much more efficient? Why is it harder to achieve malicious security? Finally, I'll explain the new ideas that let us achieve such improved performance. I'll also explain what the abbreviation PaXoS from the title of the paper means.

The fastest semi-honest PSI protocols use a building block called a batch oblivious PRF, or batch OPRF. In a batch OPRF, the receiver, Alice, has a list of m selection strings, one for each slot. The batch OPRF generates a different random function Fi for each slot. The sender, Bob, learns these functions in their entirety, meaning that he can evaluate each Fi on any input he likes. Alice learns only one output from each Fi, corresponding to her selection string xi. Bob learns nothing about Alice's choice of selection strings, and Alice learns only one PRF output per slot. Because these are PRFs, any other outputs of the Fi's will look random to Alice. Batch OPRF protocols can be realized very efficiently from OT extension; for example, it takes only a few seconds to execute a batch of one million OPRFs.
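To make this interface concrete, here is a toy Python sketch of what each party ends up with after a batch OPRF. This models only the functionality, not the cryptography: in the real protocol the sender never sees Alice's selection strings and Alice never sees the per-slot keys, and every name here is illustrative.

```python
import hashlib

def toy_batch_oprf(selection_strings, seed=b"demo-seed"):
    """Toy model of the batch OPRF *interface* (no actual security):
    slot i has its own PRF F_i; the sender can evaluate every F_i,
    while the receiver learns only F_i(x_i) for her string x_i."""
    m = len(selection_strings)
    # Sender's view: one key per slot, i.e. the ability to evaluate any F_i.
    keys = [hashlib.sha256(seed + i.to_bytes(4, "big")).digest()
            for i in range(m)]

    def F(i, x):
        # The sender can evaluate F_i on any input x.
        return hashlib.sha256(keys[i] + x).hexdigest()[:16]

    # Receiver's view: exactly one output per slot, F_i(x_i).
    receiver_outputs = [F(i, x) for i, x in enumerate(selection_strings)]
    return F, receiver_outputs
```

Any output Alice did not receive, such as F(0, y) for y different from her x0, looks random to her in the real protocol.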
You can also think of a batch OPRF as another name for 1-out-of-N OT, since in each slot the sender has many random values (the outputs of Fi) and the receiver learns only one value of her choice.

Now I can tell you how the KKRT protocol works; remember, it's the fastest protocol for two-party, semi-honest PSI. I'll describe the high-level idea of KKRT, which is the same overall idea as the protocol of Pinkas, Schneider, and Zohner. In this example, Alice has items A, B, C, D, and Bob has items C, D, E, F. First, they agree on two random functions that map their items into m bins. For example, h1 could map item A into bin number 2, and h2 could map item A into bin number 7. These hash functions indicate two possible bins for each item. Alice places each item into one of its two possible locations; this can be done with a process called cuckoo hashing, if there is a sufficient number of bins. Bob can't anticipate which of the two locations Alice will select, so he places each item into both of its possible locations.

Now the parties use a batch OPRF to perform an OPRF for each bin, with Alice playing the role of OPRF receiver. If she places item x into bin i, then she'll learn Fi(x). If she has no item in a bin, then it doesn't matter which F output she learns, so I haven't included it in this picture. Bob learns the PRF function for each bin, and he can evaluate these PRFs on any input. Bob computes a PRF output for each item in each bin and sends these PRF values to Alice; since each item is in two bins, Bob sends two PRF outputs per item. Alice compares these PRF outputs to the ones she knows. In this case, she recognizes the outputs F3(C) and F7(D), so she can conclude that C and D are in the intersection. The guarantee of the OPRF is that all other PRF outputs look random to Alice.

This protocol is secure against semi-honest parties, but not malicious ones, and let's see why. Here's a reminder of what Bob is supposed to do in the KKRT protocol.
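Alice's placement step can be sketched with the standard cuckoo insertion loop. This is a simplified sketch: real implementations add a stash or rehash on failure, and the hash functions are placeholders supplied by the caller.

```python
def cuckoo_insert(table, x, h1, h2, max_evictions=100):
    """Place x into bin h1(x) or h2(x); on collision, evict the current
    occupant and move it to its other candidate bin, repeating as needed."""
    pos = h1(x)
    for _ in range(max_evictions):
        if table[pos] is None:
            table[pos] = x
            return True
        table[pos], x = x, table[pos]            # evict the occupant
        pos = h2(x) if pos == h1(x) else h1(x)   # send it to its other bin
    return False  # give up; a real implementation would rehash or use a stash

# Bob, by contrast, simply puts each of his items into *both* candidate bins.
```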
Specifically, he's supposed to send two PRF outputs for each item. What happens if he sends only one of these two PRF outputs, let's say for item C? Now imagine that Alice has item C, and let's see whether she includes item C in her PSI output or not. In this example, if she places item C into bin number 3, then she learns F3(C), and since Bob is sending F3(C), Alice will consider C to be part of the intersection. But remember, she's also allowed to place item C into bin number 7, and in that case she would learn F7(C). Since Bob doesn't send F7(C), Alice will not consider C to be part of the intersection. So in summary, she includes C in the PSI output only if she happens to place it in bin number 3. But Alice's placement of item C into bin 3 or 7 depends on all of her other input items. Depending on what other items she has, it might even be impossible for her to place item C into bin 3, for example. Because of this issue, there's no way to simulate Bob's behavior in the ideal world.

Now I'm going to talk about how we overcome this problem. Remember that the batch OPRF is the main building block of the semi-honest PSI protocol. If we want malicious PSI, we will certainly need a maliciously secure batch OPRF. The state-of-the-art protocol is due to Orrù, Orsini, and Scholl. Fortunately for us, it's essentially as efficient as the semi-honest batch OPRF. It achieves malicious security using a consistency check, which takes advantage of a homomorphism property. I'm simplifying things a lot, but the main idea is that XORing the outputs Fi(x) and Fj(y) gives the output of another function, which I'll call Fij, on x XOR y. Here Fij is a different function from Fi or Fj, but it's something that Bob can also compute on any input. The outputs of Fij also look random to Alice, except for the one output that she can compute through this homomorphism property. This extra homomorphism property is the key to our malicious PSI protocol.
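Here is a stripped-down illustration of the kind of homomorphism involved. This toy F is my own simplification, not the exact construction from the paper: its outputs are linear in x, so they do not look random (the real protocol applies a hash on top), but it shows why the sender can also evaluate the combined function Fij.

```python
import secrets

delta = secrets.randbits(64)                   # sender's global correlation
a = [secrets.randbits(64) for _ in range(10)]  # one random offset per slot

def F(i, x):
    """Toy slot function: F_i(x) = a_i XOR (x AND delta)."""
    return a[i] ^ (x & delta)

def F_pair(i, j, z):
    """The combined function F_ij, which the sender can also evaluate."""
    return a[i] ^ a[j] ^ (z & delta)

# Homomorphism: F_i(x) XOR F_j(y) == F_ij(x XOR y), because AND with the
# fixed string delta distributes over XOR.
```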
Our protocol uses the same conceptual setup as the semi-honest protocol. The two hash functions in this example have assigned Alice's item A to bins 2 and 7. Instead of placing the item into one bin or the other, we'll ask Alice to secret-share it between the two bins. Similarly, the hash functions assign item B to bins 3 and 9, and we'll ask Alice to secret-share item B between bins 3 and 9. Now Alice has a vector S that satisfies many of these linear constraints. She uses this vector as her OPRF input and learns the output of a different PRF on each component of the vector. Item A is associated with the PRF outputs F2(s2) and F7(s7); if she XORs these PRF outputs together, she gets the result F27(A), because of the homomorphic property I mentioned. Similarly, she can XOR the values F3(s3) and F9(s9) to get the result F39(B).

Now suppose Bob has item C, which is associated with bins 3 and 7. Recall that Bob can compute the function F37 on any input, so he simply computes F37(C) and sends it to Alice. Similarly, he has item D, which is associated with bins 4 and 7, so he sends F47(D) to Alice. The important thing here is that Bob sends only one F output per item. It's because of this fact that we avoid the attack I mentioned before: for every item, there's only one F output for Bob to send. He either sends it or he doesn't; he can't halfway send it, like in the previous example. So it's much easier to simulate. Of course, Alice can identify the intersection by seeing which of these PRF outputs she recognizes; the other PRF outputs look random to her.

Our protocol requires Alice to secret-share her items into this vector S. Now how exactly does she go about doing this? Let's imagine that the vector S is initially empty. She can set one location arbitrarily, and then find an item where only one of its two bins is unset. For example, item A is assigned to bins 2 and 7, and at this point only bin 7 is unset.
Since s2 and s7 are related by a linear constraint, Alice can easily solve for the correct value to put in s7. And she can repeat this process again and again. For example, item C is assigned to bins 3 and 7, and so far only bin 3 is unset, so she can solve for the value to put in s3. She can continue until all relevant positions of S are set. If any positions are left over, their values don't matter and they can be set arbitrarily.

Unfortunately, this process doesn't always work as I described. It only works if the assignment of items into bins induces an acyclic graph. The graph I'm referring to is called the cuckoo graph: each item we want to insert corresponds to an edge in this graph, and the endpoints of the edge are the two bins associated with that item.

So if this process doesn't work in general, what do we actually do in our protocol? Well, we slightly generalize the problem. Abstractly speaking, the protocol asks Alice to generate a vector S that satisfies a linear constraint for each of her items. That linear constraint involves the XOR of two positions of the vector, corresponding to the two bins associated with the item. However, the protocol still works if we consider constraints that are the XOR of more than two positions in the vector. We call this generalized data-structure problem a probe-and-XOR of strings, or PaXoS for short. In the simplified version I've shown you so far, the probe positions for an item x were h1(x) and h2(x), but the more general PaXoS allows any set of probe positions for each item.

In order for our protocol to work, we need the corresponding set of constraints to actually admit a solution. The simple PaXoS I showed you, with only two probes per item, does not lead to a satisfiable set of linear constraints often enough to be useful. The size of the PaXoS vector corresponds to the number of OPRFs that we use in the PSI protocol.
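The iterative solving process for the two-probe case can be sketched as follows. This is illustrative Python with items as integers; it assumes the cuckoo graph is acyclic and raises an error if it hits a cycle.

```python
def paxos_encode_2probe(items, h1, h2, m):
    """Build a vector s of length m with s[h1(x)] XOR s[h2(x)] == x for
    every item x, by repeatedly finding an item with exactly one unset
    probe position and solving its linear constraint for that position."""
    s = [None] * m
    remaining = list(items)
    while remaining:
        progress, nxt = False, []
        for x in remaining:
            a, b = h1(x), h2(x)
            if s[a] is not None and s[b] is not None:
                if s[a] ^ s[b] != x:      # both ends already set: a cycle
                    raise ValueError("cuckoo graph has a cycle")
                progress = True
            elif s[a] is not None:
                s[b] = s[a] ^ x           # solve the constraint for s[b]
                progress = True
            elif s[b] is not None:
                s[a] = s[b] ^ x           # solve the constraint for s[a]
                progress = True
            else:
                nxt.append(x)             # neither end set yet; retry later
        if not progress and nxt:
            s[h1(nxt[0])] = 0             # seed: set one position arbitrarily
        remaining = nxt
    return [v if v is not None else 0 for v in s]  # leftovers are arbitrary
```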
Since this is the main cost of the PSI protocol, ideally the size of the PaXoS encoding should be linear in the number of items. Finally, you can think of PaXoS encoding as solving a system of linear equations, but solving a system of equations can have cubic cost in the worst case, and we would like a PaXoS scheme with enough structure to allow for a linear-time encoding process.

As I mentioned, the simple idea of probing two positions per item only works when the cuckoo graph happens to be acyclic. Since this condition turns out to be rare, it's not a useful PaXoS construction by itself. You could also think of probing each position of the vector with probability one-half for each item. This indeed leads to a good PaXoS construction with optimally small size, but unfortunately the encoding process requires solving a random system of linear equations, which does take cubic time, as I mentioned. Previously, Dong, Chen, and Wen proposed a data structure called a garbled Bloom filter. It's similar to what I described earlier, but instead of only two probes per item, there are lambda probes, where lambda is a security parameter. Unfortunately, garbled Bloom filters require the encoded vector to be lambda times larger than the number of items.

In this work, we introduce a new PaXoS scheme that we call a garbled cuckoo table. It has linear size and a linear-time encoding procedure. In the next slide, I'd like to give you some of the main ideas behind this new PaXoS data structure. We start with the simple secret-shared cuckoo idea I mentioned before, where each item is secret-shared between two positions in the vector. As I mentioned earlier, we can't guarantee that the cuckoo graph will be free of cycles; this example has a cycle of length three. To cope with such cycles, we're going to add k extra components to the PaXoS vector, which I'll call auxiliary positions.
Now, each item x is secret-shared not only across the two positions h1(x) and h2(x), but also across a random subset of the auxiliary positions. For example, item A is meant to be secret-shared across positions s2 and s7 and a random collection of the auxiliary positions.

Now, how do we efficiently find a solution to all of these linear constraints? First, we start with an empty PaXoS vector. Next, we identify all the items that participate in any cycle in the cuckoo graph; recall that items correspond to edges in the cuckoo graph. The cycle in this example corresponds to items B, C, and E. Next, let's look at the system of linear equations induced by these cycle items. Their linear constraints refer not only to the vertices in the cuckoo graph, but also to the auxiliary positions. Since each item is assigned a random subset of auxiliary positions, we have what is essentially a system of random linear constraints. There are as many constraints as there are cycle items, and if the number of auxiliary positions is just a bit more than the number of cycle items, then a solution exists with overwhelming probability. Since a solution to this system of linear equations exists, we can compute it, but our only option is to compute it the hard way, using cubic-time Gaussian elimination. Fortunately, the cost is cubic only in the number of cycle items, so if the number of cycle items is very small, this computation will be cheap when measured in terms of the total number of items.

Now we have fixed all the auxiliary positions of the PaXoS vector, as well as some of the primary cuckoo positions, and all the cycle items have their linear constraints satisfied. The only thing left to do is to satisfy the constraints of the other items, and we can do this using the iterative process I described earlier: we find an item with only one of its probe positions unset, and then solve for that unset position.
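Putting this together, decoding in a garbled cuckoo table looks like the following sketch. An item's probe set is its two cuckoo positions plus a pseudorandom subset of the auxiliary positions; deriving everything from one SHA-256 digest is purely for illustration, not the paper's construction. Encoding must then arrange the vector so that XORing the probed positions recovers the item.

```python
import hashlib

def probe_positions(x, m, k):
    """Probe set of item x: its two cuckoo bins among the m primary
    positions, plus a pseudorandom subset of the k auxiliary positions
    (k <= 240 so the subset fits in one SHA-256 digest)."""
    d = hashlib.sha256(x).digest()
    h1, h2 = d[0] % m, d[1] % m
    aux = [m + j for j in range(k) if (d[2 + j // 8] >> (j % 8)) & 1]
    return [h1, h2] + aux

def decode(s, x, m, k):
    """Decode(x) = XOR of s over x's probe positions; a correct PaXoS
    encoding makes this equal to x for every encoded item."""
    out = 0
    for p in probe_positions(x, m, k):
        out ^= s[p]
    return out
```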
This process succeeds just as I described before, because after processing all the cycle items, the remaining cuckoo graph is acyclic. As I mentioned, we can't guarantee that the original cuckoo graph is acyclic, but we can choose parameters so that the number of cycle items is very small, logarithmic in n, with overwhelming probability. With such a small number of cycle items, even in the worst case, both the size and the encoding complexity of this data structure are linear in the total number of items.

So that's the big picture of our new malicious PSI protocol. I didn't have time for all of the technical details, so I invite you to find them in the full version of our paper. In summary, I showed you our new malicious two-party PSI protocol, which is the fastest one to date; it nearly closes the performance gap between semi-honest and malicious PSI. Along the way, we introduced an interesting new data structure called PaXoS, which encodes a set of items into a vector, and we showed how to construct the first PaXoS with linear size and linear-time encoding.

Well, we've reached the end of my talk. Thanks for your time and your attention. I hope you learned something, and I hope to see you all in person at the next face-to-face IACR conference. Bye.