Hello, I'm Peihan. I'm going to present multi-party threshold private set intersection with sublinear communication. This is joint work with Saikrishna, Srinivasan, and Peter from Visa Research. We start with the problem of private set intersection, or PSI for short, where there are two parties, Alice and Bob. Each of them has a private set, X and Y. They want to jointly compute the intersection of the two sets, but nothing more. In particular, we don't want Alice to learn anything about Bob's other elements and vice versa. They can run a secure two-party computation protocol, and by the end of the protocol, they learn the intersection. In this case, it has two elements, B and D, but nothing more. In particular, Alice would have no idea about Bob's other elements and vice versa. More generally, we can consider multi-party PSI, where there are multiple parties and each party has a private set, X1, X2, up to Xn. They want to jointly compute the intersection of all the sets. Similarly, they can run a secure multi-party computation protocol, and by the end of the protocol, they learn the intersection of all the sets, but nothing more. The PSI problem has a lot of applications in practice. For example, private contact discovery, ad conversion measurement, password breach monitoring and alerting, and many more. But in certain scenarios, PSI as a functionality is not sufficient. We need something stronger. For example, threshold PSI, where the two parties are only supposed to learn the intersection if the intersection is sufficiently large. Similarly, in the multi-party scenario, they're only supposed to learn the intersection if the intersection of all the sets is sufficiently large. Threshold PSI also has a lot of potential applications. For example, in privacy-preserving machine learning, multiple parties might want to jointly run some machine learning algorithm on their common data.
But they might only want to collaborate if the common data is sufficiently large. In privacy-preserving ride sharing, multiple parties might only want to share a ride if their common trajectory is sufficiently long. Now the next question is how to achieve threshold PSI. A natural approach is to first compute PSI cardinality, which is the size of the intersection, and then check if it is sufficiently large. The best communication complexity that we can achieve using this approach is of order m, where m is the size of the smallest set. On the other hand, there is a communication lower bound for PSI cardinality that's also of order m. So by looking at this upper bound and lower bound, it seems like this is the best communication complexity that we can hope for, for threshold PSI. But is it true? Surprisingly, the answer is no. In a recent work by Ghosh and Simkin in 2019, they developed a new approach for two-party threshold PSI, where instead of computing the PSI cardinality and then checking if it is sufficiently large, they perform a private intersection cardinality test to check if the intersection is sufficiently large. This way, they can get around the lower bound for PSI cardinality. They can achieve communication complexity that only grows with the size of the set difference. In particular, they achieve the functionality that the two parties can only learn the intersection if the total set difference is of size at most t, where t is the threshold. So if the intersection is sufficiently large, then the set difference is sufficiently small. So the threshold t can be very small. It could be sublinear in the set sizes. They give two protocols, one based on fully homomorphic encryption, or FHE, with communication complexity of order t, and another based on additively homomorphic encryption, or AHE, with communication complexity roughly t squared, where the Õ (O-tilde) here hides a polylog factor.
They further show a communication lower bound of order t for this kind of cardinality testing. In the rest of the talk, I will refer to this work as GS19. In this work, we ask two questions. The first question is whether we can extend this new approach for two-party threshold PSI to the multi-party case. Second, we notice there's a gap between the upper bound and the lower bound for threshold PSI from additively homomorphic encryption. So can we bridge this gap? Can we achieve better communication complexity from assumptions weaker than FHE? I will talk about our results in a minute. But before we dive into our results, I want to take a step back to consider how to formally define multi-party threshold PSI. At a high level, the parties will only be able to learn the intersection if the intersection is sufficiently large. But what does sufficiently large mean? How should we define sufficiently large? There are actually different ways to define it. The first option is that they can only learn the intersection if the set difference between every set Xi and the intersection I is of size at most t. This is equivalent to saying the intersection is sufficiently large: it is at least the size of each set minus t. So this is one way to define multi-party threshold PSI. Another way to define it is that the parties can only learn the intersection if the set difference between the union of all the sets and the intersection is of size at most t. This is essentially saying that the entire set difference is sufficiently small. If you think about these two definitions, they seem to be very similar. They seem to be defining the same thing from two different perspectives. In fact, in the two-party scenario, these two definitions are equivalent. The reason is that when there are only two sets, the set difference between the union and the intersection is exactly the first set difference plus the second set difference.
And if we further assume the two sets have the same size, then this is exactly two times a single set difference. However, this is not true in the multi-party scenario. In particular, this equation no longer holds in the multi-party case, and this is not a constant anymore. The reason is that in a multi-party scenario, apart from the intersection, the rest of the sets could overlap by very little or they could overlap by a lot. That's why these two definitions are so different. In fact, the set difference between the union and the intersection could be anywhere between two times a single set difference and n times a single set difference, where n is the number of parties. Because these two definitions are so different, in our work we consider both functionalities. We refer to the first functionality as TPSI-int, which considers the intersection to be sufficiently large. We refer to the second functionality as TPSI-diff, where we consider the set difference to be sufficiently small. Now we're ready to present our results. We consider both functionalities for multi-party threshold PSI, and we consider communication lower bounds and communication upper bounds. In terms of lower bounds, we show a lower bound of order n times t for both functionalities, where n is the number of parties and t is the threshold. The lower bound is proved in point-to-point, fully connected networks. In terms of upper bounds, we construct three different protocols, two from n-out-of-n threshold fully homomorphic encryption, or TFHE, and one protocol from n-out-of-n threshold additively homomorphic encryption, or TAHE, with the communication complexity listed in the table. TFHE and TAHE can be thought of as FHE and AHE run in a distributed manner by multiple parties. All the parties can jointly generate a public key and a secret key that support fully homomorphic encryption or additively homomorphic encryption.
But the secret key is shared by all the parties, so decryption can only be done jointly by all the parties together. All these upper bounds, all these protocols, are secure against semi-honest adversaries corrupting up to n minus one parties. So these are our main results. Next, I want to mention a few implications of these results. First, as I mentioned, these two functionalities are equivalent in the two-party scenario. If we set n equal to two, then we actually achieve two-party threshold PSI from additively homomorphic encryption with communication complexity roughly of order t. This solves an open problem from the GS19 work that I mentioned earlier. Second, although these functionalities are only for multi-party threshold PSI, they can actually be used to achieve multi-party PSI with communication complexity that only grows with the size of the set difference, which could be sublinear in the set sizes if the set difference is very small. This can be done by doing a binary search on the size of the set difference. And finally, these protocols can be thought of as compact MPC for specific functionalities, where the communication complexity is sublinear in the output length. That is because in these functionalities the output is the intersection, which could be much larger than the set difference, and so much larger than the communication complexity. These are our results. Next, I want to briefly talk about the high-level ideas of our techniques. For the lower bound, we do a reduction from two-party threshold PSI to multi-party threshold PSI, and then we can rely on the lower bound proved in the GS19 work for two-party threshold PSI. For the upper bounds, all three protocols follow the same paradigm. That is, the protocol has two steps.
The first step is a multi-party private intersection cardinality test to check whether the intersection is sufficiently large or the set difference is sufficiently small. In this step, the parties will only learn a single bit indicating whether the cardinality test passed or not. If it passed, then they move on to the second step to compute the actual set intersection. For the first step, again, we consider two functionalities, CTest-int and CTest-diff, for whether the intersection is sufficiently large or the set difference is sufficiently small, corresponding to the two threshold PSI functionalities. We construct three different protocols, two from TFHE and one from TAHE, with the communication complexity listed in the table. For the second step, we construct a single protocol from TAHE with communication complexity n times t that works for both functionalities. Then by combining the cardinality test and the second step, we achieve these three protocols for the two multi-party threshold PSI functionalities. Next, I want to mention two concurrent works. The first is the full version of GS19, where they extend two-party threshold PSI to the multi-party case. They consider the first functionality, TPSI-int, and they present a protocol from TFHE. The second concurrent work is by Branco, Döttling, and Pu, which was also published in PKC this year. They consider the cardinality test for the first functionality, CTest-int, and they construct a protocol with communication complexity roughly n times t squared from TAHE. By combining this cardinality testing protocol with our second step, one can achieve a protocol for TPSI-int with the same communication complexity from TAHE. So they complement our work. In the rest of the talk, I want to focus on two of our protocols. One is for CTest-int from TFHE and the other is for CTest-diff from TAHE. These ideas are also used in the other protocols.
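To make the two functionalities and the two-step paradigm concrete, here is a minimal plaintext sketch in Python. It computes the ideal outputs in the clear, so it offers no privacy at all and is not the protocol from the talk; the function names `ctest_int`, `ctest_diff`, and `threshold_psi` are hypothetical labels chosen for this illustration.

```python
def ctest_int(sets, t):
    """Step 1, first variant: every set differs from the intersection by at most t elements."""
    inter = set.intersection(*sets)
    return all(len(s - inter) <= t for s in sets)

def ctest_diff(sets, t):
    """Step 1, second variant: the union differs from the intersection by at most t elements."""
    inter = set.intersection(*sets)
    union = set.union(*sets)
    return len(union - inter) <= t

def threshold_psi(sets, t, ctest):
    """Two-step paradigm: run the cardinality test, reveal the intersection only if it passes."""
    if not ctest(sets, t):
        return None  # the parties learn only that the test failed
    return set.intersection(*sets)

# Three parties whose sets share {1, 2, 3} and each hold one extra element.
sets = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 3, 6}]
print(threshold_psi(sets, 1, ctest_int))   # {1, 2, 3}: each single set difference has size 1
print(threshold_psi(sets, 1, ctest_diff))  # None: the union minus the intersection has size 3
print(threshold_psi(sets, 3, ctest_diff))  # {1, 2, 3}
```

With the same threshold, the two tests disagree on this input, which is the multi-party gap between the two definitions: the total set difference here is n = 3 times a single set difference.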
For simplicity, we will assume all the sets have the same size. If they don't, then we can pad the smaller sets with dummy elements to make all the sets have the same size. So, the first protocol, for CTest-int. The functionality is the following. The parties will learn a single bit from this functionality, and the output bit is equal to one if and only if the set difference between every set Xi and the intersection is of size at most t. So this is the functionality. The high-level idea for the TFHE-based protocol is the following. First, every party will define a polynomial based on their private set. It is defined as the product of x minus each element in their set. In this example, there are three sets, and each party will define their own polynomial, P1 of x, P2 of x, and P3 of x. Then we consider a rational function, P of x, which is a fraction of two polynomials, where the numerator is the sum of the polynomials from P2 of x up to Pn of x and the denominator is a single polynomial, P1 of x. If we write out this rational function in this example, then we can see that the elements in the intersection cancel out, and what remains in the numerator and denominator are exactly the set differences. So now we only need to check if the numerator and denominator both have degree at most t. That is, we test whether the rational function P of x has degree at most 2t, or equivalently, whether the rational function P of x can be interpolated from 2t plus one evaluations, and this can be done homomorphically under TFHE. In more detail, the protocol works as follows. First, the n parties will jointly generate a public key and a secret key for the TFHE scheme, where the public key is known to every party and the secret key is shared by all the parties. So decryption can only be done jointly by all the parties together.
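The cancellation in the rational function can be checked on small inputs with the following Python sketch. It works in the clear over a prime field, so it illustrates only the algebra and not the encrypted protocol. The modulus `P`, the helper names, the example sets, and the fixed factors `masks` are assumptions made for this illustration. The second example is a hypothetical input where more than the intersection cancels, a subtlety the talk comes back to shortly, and the third multiplies each party's polynomial by an extra factor (x - r_i), using fixed values rather than random ones so the output is reproducible.

```python
P = 2**61 - 1  # prime modulus; the choice of field is an assumption

def trim(a):
    """Drop zero coefficients of highest degree, in place (lists are low-to-high)."""
    while a and a[-1] == 0:
        a.pop()
    return a

def add(a, b):
    n = max(len(a), len(b))
    a, b = a + [0] * (n - len(a)), b + [0] * (n - len(b))
    return trim([(x + y) % P for x, y in zip(a, b)])

def mul(a, b):
    if not a or not b:
        return []
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % P
    return out

def rem(a, b):
    """Remainder of a divided by b over GF(P); b must be nonzero."""
    a = a[:]
    inv = pow(b[-1], -1, P)  # modular inverse, Python 3.8+
    while len(a) >= len(b):
        c, shift = a[-1] * inv % P, len(a) - len(b)
        for i, y in enumerate(b):
            a[shift + i] = (a[shift + i] - c * y) % P
        trim(a)
    return a

def poly_gcd(a, b):
    while b:
        a, b = b, rem(a, b)
    return a

def from_roots(roots):
    p = [1]
    for r in roots:
        p = mul(p, [(-r) % P, 1])  # multiply by (x - r)
    return p

def reduced_degrees(sets, masks=None):
    """Degrees of numerator and denominator of (P2 + ... + Pn) / P1 in lowest terms."""
    masks = masks or [None] * len(sets)
    polys = [from_roots(list(s) + ([r] if r is not None else []))
             for s, r in zip(sets, masks)]
    den, num = polys[0], []
    for p in polys[1:]:
        num = add(num, p)
    g = poly_gcd(num, den)
    return len(num) - len(g), len(den) - len(g)

# Intersection {1, 2, 3} cancels; one element per party remains.
print(reduced_degrees([{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 3, 6}]))    # (1, 1)
# Here (x-3) + (x-5) = 2(x-4), so the leftover factor (x-4) of P1 cancels too.
print(reduced_degrees([{1, 4}, {1, 3}, {1, 5}]))                      # (0, 0)
# Extra factors (x - r_i) prevent that collision for these values.
print(reduced_degrees([{1, 4}, {1, 3}, {1, 5}], [101, 102, 103]))     # (2, 2)
```

In the second example the reduced function is the constant 2, which would make a set difference of size one look like size zero; the masked version leaves degree 2 in both numerator and denominator, that is, the true difference plus the one added factor.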
Next, every party will evaluate their own polynomial on 2t plus one different points, alpha one up to alpha 2t plus one, and additionally evaluate it on a random point z. Then all the parties will encrypt all these evaluations and send these encryptions to the first party. The first party can then homomorphically interpolate the rational function P of x from the 2t plus one evaluations, and homomorphically test if the interpolation is correct by verifying whether P of z is equal to the value computed from P1 of z up to Pn of z. This homomorphic evaluation will result in an encryption of a single bit b. Finally, the parties can jointly decrypt the final result b, and then we're done. This protocol seems to be correct, but there is actually a subtle issue related to the polynomial cancellation. If we look at this cancellation step, the elements in the intersection will be canceled out for sure from the numerator and denominator, but there might actually be some unexpected cancellation. Here is a concrete example where we consider three sets and every set contains two elements. If we write out this rational function, we notice that the elements in the intersection cancel out, but additionally the polynomials might cancel in an unexpected way. To fix this issue, we add a random term to every polynomial. The first party will add a random term x minus r1 to the first polynomial P1. The second party will add a random term x minus r2 to the second polynomial, and so on. After adding all these random terms, we can make sure that the unexpected cancellation no longer happens. This same idea is also used in the second step to compute the actual set intersection. That's the high-level idea for the first protocol. Next, let's see the protocol for CTest-diff from TAHE. Again, the functionality for CTest-diff is the following.
The parties will learn a single bit from this functionality, and the bit is one if and only if the set difference between the union of all the sets and the intersection is of size at most t. The high-level idea for the TAHE-based protocol is the following. First, every party will construct another polynomial that's different from before. Now it's the sum of x to the power of each element. So they will construct polynomials P1 of x, P2 of x, and P3 of x. Then we consider a polynomial P of x, which is defined as n minus one times the first polynomial minus the remaining polynomials P2 of x up to Pn of x. If we write it out, we will see that only the elements in the intersection cancel out, and the polynomial becomes like this. If we look at it, the number of monomials in this polynomial is exactly the total set difference. So we only need to test if the number of monomials in the polynomial P of x is less than or equal to t. To test that, the work of Grigorescu et al. in 2010 and GS19 noticed that this problem can be reduced to the problem of testing the singularity of a Hankel matrix that looks like this, where u is a random element and P is the polynomial that we're interested in. The GS19 work stopped here and let the two parties jointly test the singularity of this Hankel matrix. But in our work, we further notice that this problem can be reduced to another problem called half-GCD by the work of Brent et al. in 1980. This half-GCD problem can be solved in time t log squared t by the work of Thull and Yap in 1990. So by doing this additional step, we can reduce the communication complexity from roughly t squared to roughly t. To summarize, we formalize the problem of multi-party threshold PSI by two different functionalities, and we study its communication lower bounds and upper bounds. Finally, I want to mention a few open problems.
In terms of efficiency, can we achieve a better upper bound for CTest-int from TAHE? Can we achieve better round complexity and practically more efficient protocols for threshold PSI? In terms of security, can we achieve threshold PSI with malicious security? And finally, our communication lower bounds are proved for networks with point-to-point communication channels. Is there any difference in networks with broadcast channels? And that's it. Thank you for your attention.
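Returning to the CTest-diff encoding described earlier in the talk, the following Python sketch builds the polynomial (n - 1)·P1 - P2 - ... - Pn as a sparse coefficient map and applies the Hankel singularity test in the clear. It is a plaintext illustration under stated assumptions, not the TAHE protocol, and it uses a plain determinant rather than the half-GCD step. The modulus `P`, the function names, and the fixed evaluation base `u = 3` are choices made for this example; in the talk u is a random element, and the test is only guaranteed with high probability over that choice.

```python
P = 2**61 - 1  # prime modulus; the choice of field is an assumption

def difference_polynomial(sets):
    """Sparse map exponent -> coefficient of (n-1)*P1 - P2 - ... - Pn, where Pi = sum of x^a."""
    n, coeffs = len(sets), {}
    for a in sets[0]:
        coeffs[a] = coeffs.get(a, 0) + (n - 1)
    for s in sets[1:]:
        for a in s:
            coeffs[a] = coeffs.get(a, 0) - 1
    return {e: c for e, c in coeffs.items() if c != 0}

def evaluate(poly, x):
    return sum(c * pow(x, e, P) for e, c in poly.items()) % P

def det_mod(m):
    """Determinant over GF(P) by Gaussian elimination."""
    m = [row[:] for row in m]
    n, det = len(m), 1
    for c in range(n):
        pivot = next((r for r in range(c, n) if m[r][c] % P), None)
        if pivot is None:
            return 0  # no pivot in this column, so the matrix is singular
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            det = -det
        det = det * m[c][c] % P
        inv = pow(m[c][c], -1, P)  # modular inverse, Python 3.8+
        for r in range(c + 1, n):
            f = m[r][c] * inv % P
            for k in range(c, n):
                m[r][k] = (m[r][k] - f * m[c][k]) % P
    return det % P

def hankel_ctest_diff(sets, t, u=3):
    """True if the (t+1) x (t+1) Hankel matrix with entries p(u^(i+j)) is singular."""
    p = difference_polynomial(sets)
    seq = [evaluate(p, pow(u, k, P)) for k in range(2 * t + 1)]
    hankel = [[seq[i + j] for j in range(t + 1)] for i in range(t + 1)]
    return det_mod(hankel) == 0

sets = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 3, 6}]
print(difference_polynomial(sets))   # {4: 2, 5: -1, 6: -1}, i.e. 2x^4 - x^5 - x^6
print(hankel_ctest_diff(sets, 3))    # True: 3 monomials, threshold 3
print(hankel_ctest_diff(sets, 2))    # False: 3 monomials exceed threshold 2
```

The intersection {1, 2, 3} contributes coefficient (n - 1) - (n - 1) = 0, so the three surviving monomials correspond exactly to the elements 4, 5, and 6 in the union but not in the intersection.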