Hello, this is joint work with my advisor Mike Rosulek at Oregon State University.

Private set intersection refers to the problem of two, or maybe more, parties each holding a set which they wish to keep private, but they still wish to learn the intersection of the sets. Throughout this talk I'll use the terminology of a sender and a receiver: they both provide their sets, but only the receiver learns the intersection.

To motivate this, one of my favorite applications of private set intersection is the contact discovery scenario. Say Signal has a big list of users, and a new customer has just signed up for the service and wants to learn which of their contacts use the Signal app. The customer doesn't necessarily want to reveal all their contacts, and Signal doesn't necessarily want to publish its list of users. So they can run a private set intersection protocol where only the user learns which of the people they know are also on the service. There are many other applications, but I'll move on.

One of the fundamental building blocks that we use is oblivious transfer. In this functionality, Alice on the left-hand side has two messages, m0 and m1, which are strings, and Bob has a single bit c. Bob wishes to learn m_c and nothing about the other message, and Alice shouldn't learn the bit c. There are highly efficient, maliciously secure protocols for this, which motivates it as the basis for private set intersection that's practically fast.

Another one of our building blocks is known as a Bloom filter. This is a data structure that's sort of similar to a hash table, except that it only allows for testing of set membership.
That is: is an item in my set? It's parameterized by k hash functions h1 through hk and a bit array, denoted here by B, which is initialized to all zeros. To insert an item into this Bloom filter, you simply set all the bit locations indexed by the hash functions to one. You can repeat this operation for other items, simply setting those bits to one regardless of what they were before.

It's not too hard to figure out how to test set membership. We do the same thing again, but now we take the bitwise AND of all these bits, and that reports whether an item is in the Bloom filter or not.

It's easy to see that when you insert n items into a Bloom filter with m slots, there are no false negatives: if an item is in it, we always say yes, it's there. But there is a probabilistic property: for an item that hasn't been inserted, we may falsely report it as being inserted. This bad case can be bounded to be negligible in your security parameter. Intuitively, what happens is that
at least one of the hash functions will likely hit one of the zeros, and we just make this event very likely for all items which haven't actually been inserted.

Another cool property of Bloom filters is that the bitwise AND of two Bloom filters is itself a valid Bloom filter for the intersection of the two sets. To see this, we can just take the AND of these two Bloom filters. Here a is the only item in the intersection, and you can see that we do still report a as being in the bitwise AND of them. However, there's some additional information being leaked here, in particular that bit in the middle, which sort of randomly got set to one. If you think about the simulation of this, the simulator only knows the intersection itself, so it wouldn't know which additional bit to set. So this somehow inherently leaks more information than just the intersection itself.

Dong, Chen, and Wen in 2013 saw this optimization and overcame this limitation by adding some additional machinery. Here on the right, the receiver generates a traditional Bloom filter, and on the left-hand side the sender generates an array of random strings of the same length as the Bloom filter. Then they perform an oblivious transfer for each of these rows, where the selection bit for the oblivious transfer is the Bloom filter bit. By doing this the receiver picks up what's known as a garbled Bloom filter: if they have a bit of one, they learn the message; otherwise they learn this ⊥ string, which you can think of as the all-zero string or anything.
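The plain Bloom filter operations described above (insert, membership test, and the bitwise AND of two filters) can be sketched roughly as follows. This is only an illustration: deriving the k hash functions from SHA-256 with a counter prefix, the `BloomFilter` class name, and the sizes `m=1024`, `k=4` are all assumptions made for the example, not choices from the talk.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: k hash functions mapping into an m-bit array."""

    def __init__(self, m, k):
        self.m = m
        self.k = k
        self.bits = [0] * m  # initialized to all zeros

    def _indices(self, item):
        # Assumption: h_1..h_k are simulated by hashing "i:item" with SHA-256.
        return [
            int.from_bytes(hashlib.sha256(f"{i}:{item}".encode()).digest(), "big") % self.m
            for i in range(self.k)
        ]

    def insert(self, item):
        # Set every indexed bit to one, regardless of its previous value.
        for idx in self._indices(item):
            self.bits[idx] = 1

    def contains(self, item):
        # Bitwise AND of the indexed bits: no false negatives, rare false positives.
        return all(self.bits[idx] for idx in self._indices(item))

    def intersect(self, other):
        # The bitwise AND of two filters is a filter for (at least) the intersection.
        out = BloomFilter(self.m, self.k)
        out.bits = [a & b for a, b in zip(self.bits, other.bits)]
        return out


x = BloomFilter(1024, 4)
y = BloomFilter(1024, 4)
for item in ["a", "b", "c"]:
    x.insert(item)
for item in ["a", "d", "e"]:
    y.insert(item)
both = x.intersect(y)
print(both.contains("a"))
```

Any bit of `both` that is set but is not one of the positions for `a` would be the kind of extra leakage mentioned above: it comes from unrelated items in the two sets colliding on the same position.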
The exact value the receiver gets for a zero bit isn't important. Now we have the sender on the left-hand side construct their own standard Bloom filter. The final, sort of brilliant, thing that they saw is that you can XOR together the messages that are indexed by the hash functions to create an encoding of the corresponding value. So a is encoded as m0 XOR m5, per the hash functions. Then we can send these encodings over to the receiver, and because the receiver used oblivious transfer to pick up all the messages that they are indexing, they can also generate the corresponding set of encodings for the items in their set. As you can see, this encoding for a is in both of these encoded sets, so you can translate back to the original set to find what the intersection is.

In the semi-honest setting, it's not hard to see that this is secure against a semi-honest sender. One way I like to explain it is that all arrows leave the sender, so it can't leak any information; but really, the OTs hide the selection bits, so it's great. It can also be shown to be secure against a semi-honest receiver. Pretty much the only thing they can do is that, for some y' that's not in their set, they may be able to learn the encoding. In this case the encoding of y' is m3 XOR m4, but as you can see, m4 isn't in the set of messages that they know. In fact, Dong, Chen, and Wen showed that there's an equivalence between the false positive rate in a traditional Bloom filter and this style of attack, so this can just be bounded to be negligible. So in the semi-honest setting everything works out great.

Unfortunately, when we go to the malicious setting, things aren't quite as nice, as usual. In particular, it's insecure against a malicious receiver, and once you see this attack,
it's obvious: instead of using zeros in many places, the receiver can simply always use the one bit and pick up all the corresponding messages. Then they can simply probe, a sort of brute-force attack on this X-hat term, and recover all the values.

So in their paper, Dong, Chen, and Wen first proposed a semi-honest protocol, and then, in a sort of property-based manner, put forth a countermeasure to this attack which they claimed makes it maliciously secure. Their main idea was to restrict the receiver to only using valid Bloom filters. So here this all-ones Bloom filter isn't valid, and what this translates to when you actually want to do it is that we want roughly half the bits in this Bloom filter to be one and half to be zero.

Their idea was to make the receiver prove their zero choice bits. To do this they have the sender first sample a random key s and then generate an (m/2)-out-of-m secret sharing of s, denoted by these s_i's. Then, instead of transmitting nothing for the zero messages of the OT, we put these s_i terms in there. Finally, we encrypt the final message under this master secret s. By doing this, to learn the intersection the receiver must decrypt this X-hat term. However, the cheating receiver has a problem: in all the places where they should have used zeros, they used ones, so they didn't get any of the secret shares of s and can't decrypt. So by adding this countermeasure we force the receiver to use many zeros, preventing the initial attack.

One obvious question is: is this secure? We did successfully force the receiver to use roughly half ones and half zeros. However, as we show in this work, and as shown in another work by Lambæk, by adding this countermeasure we actually introduce a selective failure attack on the part of the sender. So before it was insecure against the receiver, and now it is
insecure against the sender. Once you see it, it's pretty obvious: the sender replaces one of these s_i terms with a random string r. If this r term gets picked up by the other party, they will try to use it as part of the secret sharing, and the secret sharing will not reconstruct the correct master secret s. So if they pick up this bad r term, the output of the intersection will be the null set. Whether this succeeds or fails inherently depends on their full set Y, and therefore it can't be simulated, because the simulator only knows the intersection and not necessarily the full set Y.

We tackle this problem in a different way, by making the receiver prove that they used zeros in an input-independent way. First we have the receiver simply use all random bits instead of a Bloom filter as before. Then we have the sender send over a challenge to reveal some subset of these OTs, and the receiver must prove that they used zeros in many of those places. So, for example, we might challenge on these two, and we would expect roughly half of them to be zeros; in this case they need to prove that they don't know r1 and r2. Then we eliminate these used-up OTs and consolidate the remaining ones.

But now we have another problem, in that this doesn't correspond to the Bloom filter that we want in order to continue with the protocol. So what we do is first have the receiver construct the Bloom filter that they wish to have, and then they send a random permutation that maps these random OTs to the desired Bloom filter. Intuitively, one way you can do this is to keep picking random items and bringing them up to the top, and because it's this randomized process, it doesn't actually reveal any additional information. Now we arrive at the Bloom filter that we desire, and we can simply complete the protocol as before.

One challenge in making this technique work is that these
random OTs and this cut-and-choose challenge may not result in exactly half ones and half zeros. In the example I gave, of course, it works, but an equally likely example would give you all ones, and the only sensible thing there would be to abort. So we need some robust way to sample and check these zero bits.

One way to think about this is that we want the property that the probability of accusing a good guy of cheating is negligible. Because this is a randomized process, the number of zeros that we'll see follows some distribution centered at the expected value, and if we see significantly fewer zeros, we're going to abort. So it's pretty straightforward: all we need is to bound the area below the abort threshold to be negligible, and Chernoff bounds are the perfect tool for this.

The other property that we need is that for a bad guy who may provide significantly fewer zeros, we still catch them with high probability. If they just use a couple fewer than specified, it's going to be hard to detect that. So we ask the question: if they use significantly fewer zeros, how many can they get away with and still not succeed with anything but negligible probability? We can again apply a Chernoff bound to figure out how far that distribution can be shifted to the left, and we recover this t value, which is proportional to the number of zeros that they can get away with. By doing this we can bound the advantage of the adversary and just make sure that the Bloom filter is large enough.

One cool property of this is that, in practice,
we only need to check about 1% of all these OTs, so it adds almost no overhead compared to the traditional semi-honest protocol.

Finally, I want to switch gears a little and talk about one of the assumptions that we rely on, the random oracle model. To see why, one of the important things that a simulator in the malicious setting must do is extract the effective input of the corrupt party. In this case we're considering a corrupt receiver, and what the simulator must do is first extract what their input is, in this case Y, and then send it on to the ideal functionality, this PSI box. This is how simulation-based proofs go. The box then gives back the intersection, which allows the completion of the simulation. So that's what we want to do, but the question is: how do we extract this set Y?

The first thing you might observe is that we're doing these oblivious transfers, which allow you to extract the selection bits, so we can learn the corresponding Bloom filter that was used. However, although you might think that's all we need, it's not. Bloom filters by themselves are not invertible, naturally at least. That is, just given a sequence of ones and zeros that form a Bloom filter, it's highly non-trivial to find out what set it corresponds to. And while invertible Bloom filters do exist, what makes this situation worse is that this Bloom filter may not even correspond to any set, in that the malicious adversary may just send you completely random messages, and the simulator somehow needs to interpret this as, I guess, the null set. So that seems very difficult.

So we take a slightly different route: we model the hash functions that map into the Bloom filter as a random oracle. By doing this we allow the simulator to observe which items were potentially inserted into this Bloom filter. In particular, first we observe which items were queried
or hashed with the random oracle, and then we check them against this Bloom filter. This allows us to extract the set Y. An important point is that even if additional calls to this random oracle are made, if those items weren't actually inserted into the Bloom filter itself, they won't get extracted. So it's a two-stage extraction, whose result we can then send to the ideal PSI and finish the simulation. One important property is that we don't require programmability of these random oracles, so it's a weaker sort of random oracle: we just need to be able to extract.

Here I'd also like to note that while we design a private set intersection protocol, another way to view our core construction is as an oblivious PRF protocol, where one party provides a set and gets the PRF F evaluated on it, and the other party gets the function F itself. OPRFs are used pretty widely in crypto, so you might consider using this portion of our protocol. Then to turn it into PSI you simply send over the final encodings. So keep that in mind.

We compare ourselves to De Cristofaro, Kim, and Tsudik's 2010 work, which is a maliciously secure, Diffie-Hellman-style protocol. They use exponentiations and several zero-knowledge proofs for each item in the intersection. This results in significantly more running time, up to around a hundred and thirty-five times slower, and this just goes back to how much more efficient oblivious transfer is compared to exponentiation.

You could also consider the trade-off between the amount of communication and running time. Shown at the bottom dot is their protocol, which requires significantly more time but less data, so there's some sort of trade-off here. If you measure it, we're 38 times faster but we still send 23 times more data, and this is for the 1-million-size sets.
So there's a little bit of a trade-off here. And these exponentiations are inherently parallelizable, but even if you add more threads, we still outperform the competition.

An interesting side note is that we also implemented the original Dong, Chen, and Wen paper, the malicious variant. What's somewhat surprising is that their protocol required significantly more time than all of them, even though it looks pretty similar. This comes back to the point that their protocol requires something like a 50-million-out-of-100-million secret sharing. So while secret sharing is inherently fast, when you scale to that size things slow down significantly. So that was an interesting side note: we observed that our protocol is definitely faster than the previous one, even though the previous one was insecure.

Another interesting point to compare against is how PSI is performed in industry. Traditionally they use an insecure variant where you simply put the items through a one-way function and then compare the results. Unfortunately, this is insecure.
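The insecure hash-and-compare approach just described, and why a small input domain makes it open to brute force, can be sketched as follows. This is a toy illustration only: the use of SHA-256 as the one-way function, the function names, and the made-up four-digit "phone number" space are all assumptions for the example.

```python
import hashlib


def h(item):
    # Assumption: SHA-256 stands in for the one-way function.
    return hashlib.sha256(item.encode()).hexdigest()


def naive_psi(their_hashes, my_items):
    # Compare hashes of my items against the hashes the other party sent.
    return {x for x in my_items if h(x) in their_hashes}


def brute_force(observed_hashes, candidates):
    # Anyone who sees the hashes can hash every candidate input and look for matches.
    table = {h(c): c for c in candidates}
    return {table[d] for d in observed_hashes if d in table}


contacts = {"555-0123", "555-0177"}
sent = {h(c) for c in contacts}  # what the other party gets to see

intersection = naive_psi(sent, {"555-0123", "555-0999"})
recovered = brute_force(sent, (f"555-{i:04d}" for i in range(10000)))
print(intersection, recovered)
```

Here `recovered` contains the full contact list, not just the intersection, because the space of possible inputs is small enough to enumerate.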
You can launch brute-force attacks and other things, but it's interesting to compare against. You can see that our protocol is still significantly slower than this insecure industry private set intersection, but we're making a big step in the right direction. We can also compare against protocols in the semi-honest setting. These protocols use some different techniques, but they also build on oblivious transfer. You can see that we do move quite a ways towards them, but there's still much more work to be done in this area, in the malicious setting at least.

That concludes my talk, so thank you.

Thank you. We have plenty of time for questions.

He can. He can, and if he does, he can...

Yeah, so we just use that. We observe the oracle queries to get a sort of condensed set of items that might be in the Bloom filter, and then we can test whether any of the Bloom filter bits for these have been set. There's a very subtle argument here that I sort of glossed over about why this works, but you can say that if none of the indices that they hash into are set, if all of them are set to zero, then with high probability they... Yeah, I don't know if that answers your question, but you can just be confident that there's at least one message that they would not be able to recover, because there's this XOR term.

Wait for my...

You used the Signal messenger, and the so-called insecure protocol where they hash phone numbers and then compare them on the server, as a motivating example. However, that's a many-party protocol, many users and one service provider, and here you just have two. How would this look, this
I think it would just be you just always write it with the signal service Because they have it's not necessarily like yeah, the signal has a list of all users And so and so you can just run it as a pairwise between you and the server But the server should not know the phone numbers in this case. Well, I think they do though Well, okay, then you you might need to add some interaction. That's future work Just wondering the Samira honest version of the bloom filter solution How does that compare with sort of the state of the art? It's like 40 times slower Lots slower. Yeah, that's because the blue filter is large Yeah, it's just the bloom filter has this like security parameter blow up And so you have to do like the number of elements times the security parameter number of oblivious transfers versus sort of these Pinkest style ones that are the best in semi-honor setting just do one oblivious chancellor per element