Thanks. Good morning, everyone. My name is Xiao Wang, from the University of Maryland. Today I'm talking about faster secure two-party computation in the single-execution setting. This is joint work with Alex Malozemoff and Jonathan Katz, my advisor.

Let me first give a high-level view of two-party computation implementations. On the one hand, we have a lot of implementations and systems for semi-honest 2PC, and we have developed tons of applications, including privacy-preserving machine learning, matrix factorization, genomic computation, and so on. On the other hand, when we look at malicious 2PC, we only have AES, SHA-1, and SHA-256, and that's all. Why is this the case? We can summarize it in two points. On the one hand, malicious 2PC is much slower; it's less efficient. On the other hand, it's very hard to scale to large inputs; in our experience, when we directly scale it to large inputs, it becomes much, much slower. So in this work, we want to partially address these two points and make things better.

Let me first give a high-level view of the previous paradigms for malicious 2PC. We have circuit-level cut-and-choose, based on garbled circuits, where we do the cut-and-choose at the circuit level. It starts with Lindell and Pinkas in 2007, and the most recent line of work achieves only ρ circuits for a statistical security parameter of ρ. We also have gate-level cut-and-choose, starting at TCC 2009; this line of work is also known as LEGO, and there are also other protocols for the amortized and preprocessing settings. In this work, we focus on circuit-level cut-and-choose. We notice that previous works either need a large number of public-key operations, proportional to the input size, or need an additional execution of a separate 2PC protocol to do the input recovery.
Here, in our work, we are able to reduce the number of public-key operations to κ, which is essentially what you always need anyway for the base OTs. And we don't need an additional 2PC protocol.

Let me start with a semi-honest 2PC protocol based on a garbled circuit. Assume Alice and Bob want to compute a function with one input from each side and one output. The first step is for Alice to compute a garbled circuit with its associated garbled keys. Here the solid bars are labels for one-bits and the empty bars are labels for zero-bits. Alice first sends the garbled circuit along with the labels corresponding to her own input, and then they run a protocol called oblivious transfer. (The color is not showing very nicely; there's supposed to be a gray box around "oblivious transfer".) This protocol allows Bob to learn the labels for his own input without letting Alice learn the value of y. Bob can then just evaluate the circuit and get the output. However, this is obviously not maliciously secure, for many reasons.

One of the paradigms to fix this is majority-based cut-and-choose. In this case, instead of sending just one circuit, Alice sends about 3ρ circuits, and they still run oblivious transfer to let Bob learn his input labels. Bob then does a cut-and-choose: he checks roughly half of the circuits to see whether they were generated correctly. If any of them was not generated correctly, Bob knows Alice is cheating; if not, Bob is fairly confident that Alice was mostly honest. At a high level, Bob then evaluates all the unchecked circuits and expects the majority of the outputs to be the correct output. The math works out: if Alice sends about 3ρ circuits, everything goes through. More recently, there's a new paradigm called input-recovery-based cut-and-choose.
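The majority-based cut-and-choose just described can be sketched as a toy, entirely non-cryptographic simulation. All names here (`RHO`, `make_circuit`, `cut_and_choose`) are my own illustrative choices, and the "circuits" are just records of whether they were honestly garbled; this only shows the check-half / evaluate-half / take-the-majority control flow, not any real garbling:

```python
import random

RHO = 8                    # toy statistical security parameter
NUM_CIRCUITS = 3 * RHO     # majority-based cut-and-choose sends ~3*rho circuits

def make_circuit(honest, x, y):
    # A "circuit" here is just a record: was it honestly garbled, and what
    # output does it produce on the fixed inputs (x, y)? The function
    # being computed is a single AND; a cheating circuit flips the output.
    correct = x & y
    return {"honest": honest, "output": correct if honest else 1 - correct}

def cut_and_choose(circuits):
    idx = list(range(len(circuits)))
    random.shuffle(idx)
    check_set, eval_set = idx[: len(idx) // 2], idx[len(idx) // 2 :]
    # Check phase: opening a bad circuit reveals cheating immediately.
    if any(not circuits[i]["honest"] for i in check_set):
        return "abort"
    # Evaluate phase: take the majority output of the unchecked circuits.
    outs = [circuits[i]["output"] for i in eval_set]
    return max(set(outs), key=outs.count)
```

With all-honest circuits the majority output is the correct one; a garbler who corrupts many circuits is caught in the check phase with overwhelming probability, and corrupting only a few cannot swing the majority.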
In this case, the difference is that we only need ρ circuits, which is essentially the best we can get. Bob still evaluates all the unchecked circuits, and some of them may give a result that is flipped, which essentially means Bob obtains both labels for one of the wires. The parties then run a protocol where Bob can input this evidence that Alice is not behaving correctly, and this small box, input recovery, allows Bob to learn Alice's input x. Bob can then compute the function directly, locally. In this case, we only need to guarantee that at least one of the circuits is good; therefore we need far fewer circuits, just ρ of them.

There are three common issues in this kind of protocol: how to design an efficient input-recovery protocol, like the one I just showed, and also some attacks that we need to address efficiently. I will first talk about input recovery. Input recovery is this little box; again, please imagine there is a gray box around it. Prior works require either a large number of public-key operations or another malicious 2PC protocol to compute this input recovery. In this paper, we design a new protocol based on DDH that requires just six exponentiations per circuit, and four of them are fixed-base, actually. At a high level, it follows the non-interactive 2PC protocol of AMPR. First, Alice has a key and a seed for each circuit; she derives all the randomness for that circuit from the seed, and the values that are supposed to be learned from an evaluation circuit are encrypted under the key. The parties run oblivious transfer such that Bob learns the key for each evaluation circuit and the seed for each check circuit.
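The DDH-based step at the heart of this kind of input recovery boils down to releasing a seed to Bob only when two values match. As a rough illustration, here is a textbook-style conditional disclosure of secrets for equality built from ElGamal-like re-randomization. This is my own toy sketch, not the paper's actual protocol; the group parameters are insecure demo values, and all function names are assumptions:

```python
import hashlib
import secrets

# Toy group Z_p^* with a Mersenne prime; fine for illustration only,
# NOT a secure choice of parameters.
P = 2**127 - 1
G = 3

def h(x: int) -> bytes:
    return hashlib.sha256(x.to_bytes(16, "big")).digest()

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def bob_encrypt(omega: int):
    # Bob encrypts his value omega under his own ElGamal key.
    sk = secrets.randbelow(P - 1)
    pk = pow(G, sk, P)
    r = secrets.randbelow(P - 1)
    c1, c2 = pow(G, r, P), (pow(pk, r, P) * pow(G, omega, P)) % P
    return sk, pk, (c1, c2)

def alice_respond(pk, c, delta: int, seed: bytes):
    # Alice re-randomizes so that Bob recovers g^gamma iff omega == delta,
    # and masks the seed under a hash of g^gamma.
    c1, c2 = c
    alpha, beta, gamma = (secrets.randbelow(P - 1) for _ in range(3))
    g_delta_inv = pow(pow(G, delta, P), P - 2, P)
    d1 = (pow(c1, alpha, P) * pow(G, beta, P)) % P
    d2 = (pow((c2 * g_delta_inv) % P, alpha, P)
          * pow(pk, beta, P) * pow(G, gamma, P)) % P
    ct = xor(seed, h(pow(G, gamma, P)))
    return d1, d2, ct

def bob_recover(sk, d1, d2, ct):
    # K = d2 / d1^sk = g^(alpha*(omega - delta) + gamma):
    # equal inputs give K = g^gamma; unequal inputs give a random element.
    K = (d2 * pow(pow(d1, sk, P), P - 2, P)) % P
    return xor(ct, h(K))
```

Note that Alice's side costs a handful of exponentiations, several of them fixed-base (powers of g), which matches the flavor of the "six exponentiations per circuit, four fixed-base" count from the talk, though the paper's protocol itself differs.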
Very differently from previous works, we let Alice run an oblivious transfer as the receiver, with Bob sending random labels M0 and M1. After doing this once for each circuit, Alice masks her input, that is, masks it with the random labels she received in the previous oblivious transfers, and sends an encryption of these masked values. In some sense, this is a commitment to x, because Alice does not know the other label, so she cannot flip it later. And Bob can only obtain such a value for the evaluation circuits, not the check circuits. After that, they run the remaining part of the protocol, and at the very end they run a small protocol, based on DDH, such that if the values δ and ω are equal, Bob learns the seed; otherwise Bob learns nothing. If Bob learns the seed, it means Bob now has both the seed and the key for one of the circuits, and at that point Bob can decrypt and strip off the mask to learn x. This is only semi-honestly secure, and more work is needed to make it maliciously secure; however, it turns out that everything can be incorporated into the big cut-and-choose efficiently. For more details, see the paper.

The next issue is how to handle the selective-failure attack. This is not a new attack; it's very famous, and everybody knows it. Essentially, when letting Bob learn his input labels, the garbler Alice can corrupt one side of the labels, so that whether Bob can evaluate the circuit depends on his own input. In this example, if y is one, Bob learns the solid label, which is valid; but if y is zero, Bob learns the corrupted red bar, which is garbage. The common way to prevent the selective-failure attack is to use a so-called ρ-probe matrix.
In the following, I'm going to introduce our improved ρ-probe matrix. It addresses some hidden non-cryptographic overhead that was largely ignored in previous papers; for just a 64K-bit input, it actually gives a 1,000× improvement in running time. There are other ways to address this kind of attack, but they require a large number of public-key operations.

Let me first recall that a ρ-probe matrix is a matrix E such that if we choose any subset of the rows and XOR them, there are at least ρ ones in the resulting row. Once we have such a matrix and Bob's input y, we just need to find a random y′ such that the matrix-vector product y = E·y′ holds. Here is the old protocol, which is not secure against the selective-failure attack: we have y, and we have E, where E is public and y is known to Bob. In the new protocol, Bob first selects a random such y′ and feeds y′ into the oblivious transfer; the garbled circuit, instead of computing f(x, y) directly, first recovers the real input y = E·y′ inside the circuit and then does the actual computation.

If the ρ-probe matrix translates an n-bit input into an n′-bit input, previous works say we need about n′ oblivious transfers, and we are happy, because computing E·y′ is free thanks to free-XOR. However, if you calculate carefully, the number of XORs is actually quadratic in the input size. I mean, XOR is free in the sense that it's one instruction, but there are still n² of them; and what's worse, the size of the ρ-probe matrix itself is also quadratic. So to compute E·y′ inside the circuit, we need to compute n² XORs, and at each step we also need to look up an n²-bit matrix to decide whether to XOR or not. For example, with a 64K-bit input again, that's about 4 billion XOR operations and a matrix of 4 billion bits, assuming you pack all the bits compactly.
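The definition and the way Bob uses the matrix can be made concrete with a tiny brute-force sketch. This is purely illustrative (the names, the rejection-sampling encoder, and the toy parameters are my own assumptions; real constructions pick y′ by linear algebra over GF(2), not by retrying):

```python
from itertools import combinations
import random

def xor_rows(rows):
    out = [0] * len(rows[0])
    for r in rows:
        out = [a ^ b for a, b in zip(out, r)]
    return out

def is_probe_matrix(E, rho):
    """Brute-force check of the rho-probe property: every nonempty subset
    of rows of E XORs to a row of Hamming weight >= rho."""
    n = len(E)
    return all(sum(xor_rows([E[i] for i in s])) >= rho
               for k in range(1, n + 1)
               for s in combinations(range(n), k))

def encode_input(E, y, trials=1000):
    """Pick a random y' with E * y' = y (mod 2), by rejection sampling.
    Bob feeds y' (not y) into the OTs; the circuit recomputes y = E*y'."""
    n, n_prime = len(E), len(E[0])
    for _ in range(trials):
        y_prime = [random.randrange(2) for _ in range(n_prime)]
        if all(sum(E[i][j] & y_prime[j] for j in range(n_prime)) % 2 == y[i]
               for i in range(n)):
            return y_prime
    raise RuntimeError("no preimage found (toy rejection sampling)")
```

The point of the property is that any set of OT positions the garbler corrupts corresponds to XORing a subset of rows, and the ρ ones guarantee the abort probability is essentially independent of y.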
So this is our next task: reduce the size of this ρ-probe matrix, make it efficient, and keep it secure. The first idea is to just chop off the unnecessary part. Ordinarily this shouldn't be secure, because we lose a lot of randomness. However, notice that if we make sure each of the small green blocks is itself a ρ-probe matrix, then the large matrix where we put them all together is also a ρ-probe matrix. This is because if we select any subset of rows, either they are all in the same small ρ-probe matrix, which is fine, or they are in different ρ-probe matrices, which is even better, because the blocks don't overlap. So the problem reduces to a single question: how to construct this small ρ-probe matrix with a particular size, because we can then repeat the same small matrix along the diagonal.

So now we have this small ρ-probe matrix. Inspired by Lindell and Riva 2015, we first add an identity matrix: we replace the first part of the random matrix with an identity matrix. This is actually very nice in the analysis, because now we know that if we XOR any L rows, we are guaranteed at least L ones in the result: the identity matrix alone already contributes L ones. So when we want to calculate the probability that this is a ρ-probe matrix, we don't need to consider the case where we select more than ρ rows, because that case is guaranteed to have at least ρ ones. We only need to consider the smaller sets S with |S| < ρ, and very nicely, even in this case the identity matrix already contributes a lot of ones, so the bound is actually much tighter. After working out all the concrete numbers, this is our final construction: we chop the input into chunks of 232 bits, and for each chunk we have a blow-up of two in this case.
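Both ideas, the identity prefix and the block-diagonal composition, fit in a short sketch. All parameters here are toy values and the names are my own; the paper's concrete construction (232-bit chunks, blow-up of two) is much more carefully tuned:

```python
from itertools import combinations
import random

def weight_of_xor(E, subset):
    out = [0] * len(E[0])
    for i in subset:
        out = [a ^ b for a, b in zip(out, E[i])]
    return sum(out)

def make_block(n, extra, rho, rng):
    """Small candidate [I | R]: n x (n + extra), identity on the left,
    random bits on the right. We only need to test subsets SMALLER than
    rho: for |S| >= rho the identity columns alone already contribute
    |S| >= rho ones, which is the key point of the analysis."""
    while True:
        E = [[1 if i == j else 0 for j in range(n)] +
             [rng.randrange(2) for _ in range(extra)] for i in range(n)]
        if all(weight_of_xor(E, s) >= rho
               for k in range(1, rho)
               for s in combinations(range(n), k)):
            return E

def block_diagonal(blocks):
    """Stack small probe matrices on a diagonal. The result is again a
    probe matrix: a subset of rows either stays inside one block, or
    spans several blocks whose columns are disjoint, so weights add."""
    height = sum(len(b) for b in blocks)
    width = sum(len(b[0]) for b in blocks)
    M = [[0] * width for _ in range(height)]
    r = c = 0
    for b in blocks:
        for i, row in enumerate(b):
            M[r + i][c : c + len(row)] = row
        r, c = r + len(b), c + len(b[0])
    return M
```

Because only subsets of size below ρ need to be checked, the union bound in the security analysis ranges over exponentially fewer sets, which is what makes the small blocks cheap to find.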
Compared to the prior best construction, we get roughly a 1,000× speed-up in running time at a 64K-bit input. Notice that the cost of our ρ-probe matrix itself is actually slightly higher than the previous probe matrix, but that is fine, because you can see the total cost is dominated by computing this giant probe matrix inside the circuit, ρ times.

Okay, so the last issue is how to enforce input consistency. In the input-recovery paradigm there are actually two input-consistency issues: first, how to ensure that Alice uses the same input across the different circuits in the cut-and-choose; and second, how to ensure that the input used in the input recovery is the same as the one embedded in the cut-and-choose. From a high level, this is the old version: the problem is that these x's can all be different, and we want to enforce some consistency among them. The idea of prior work is to construct a zero-knowledge proof that takes these x's as input and proves to Bob that they are consistent in some way; the intuition is to show that all these lines are consistent. But is that really necessary? Actually, no. In this paper, the high-level intuition is that we are not going to enforce this strong consistency; instead, we only enforce that at least one of the good circuits is consistent with the input in the input recovery, and we define whatever we extract from the input recovery as Alice's input. It turns out this is much better, especially when we incorporate this consistency check into the input-recovery protocol I introduced previously, because again, this can all be done very efficiently in the cut-and-choose, together with the input recovery.
Okay, so now that we have practical malicious 2PC, of course we need to implement the protocol. We implemented it as part of the EMP-toolkit, and everybody is welcome to try it. We ran the experiments on a reasonable machine, of this Amazon instance size; however, we use just a single core, so it doesn't matter that much, and we use AES-NI. The network bandwidth is about 2.3 Gbps, and all the following experiments assume a statistical security parameter of 40 and a computational security parameter of 128.

First, three example circuits of various sizes. The first is the AES circuit everybody uses; it takes about 65 milliseconds. For modular exponentiation on 2000-bit inputs, it's a much, much larger circuit, with about 4 billion AND gates; we take about 5 hours to run it, and most of the cost is for sending the circuits. We also ran another circuit that is very wide, with reasonable depth: a circuit to sort 1,000 integers, each of 32 bits, which is about a 10-million-AND-gate circuit and takes less than a minute.

Here is another experiment we did: we fix the input and output sizes and the number of AND gates, change only one of them, and see how each component of our protocol scales. It is indeed linear in the input size, the output size, and the circuit size: about 20 microseconds per input bit and about 4 microseconds per AND gate. As a side story, it was actually this experiment that made us realize the cost of the ρ-probe matrix is not linear but quadratic: when we ran it, the curve for P2's input was not a line but a quadratic function. Finally, we compare with prior work; since most prior works just report AES, we compare on AES, and you can see a trend here that it's getting better. We are hoping to get even better
from this point. Concurrent and subsequent to this work, there have been a ton of improvements to malicious 2PC. We are actually working on an extension of the protocol that supports MSE checking, also done together with Samuel Ranellucci, and we are also able to drop the DDH assumption, so that we only need OT extension here. There are also concurrent and more recent results on LEGO, which actually improve the function-dependent cost; for example, these papers push most of the computation to a phase where the two parties do not yet know what function they want to compute. However, these kinds of protocols all need a large amount of memory, proportional to the circuit size or even worse. For circuit-level cut-and-choose, we don't really need a large amount of memory: everything can be pipelined, and the memory footprint is only proportional to the statistical security parameter. For example, for a billion-gate circuit, the memory those protocols need to keep the execution constant-round is going to be hundreds of gigabytes, unless you want to use disk. Okay, that's all for my talk. Thanks, and here is the code; you can go and try it. We have time for a quick question.

When you were talking about the ρ-probe matrix for, or rather against, the selective-failure attacks, there was this S; I guess it was a set, but I'm not sure what S is.

S is the set from the analysis: you want to show that for any subset of rows the property holds, so that's the set, and then you need to apply a union bound across all such sets. In this particular case, because we don't need to consider sets with more than ρ rows, the union bound is exponentially smaller. The randomness comes from when we choose y′: we choose y′ randomly such that y = E·y′, where E is a public matrix.

Just curious, you had this modular exponentiation circuit; did you build your own, or...

Thanks for the chance to advertise the framework. In the EMP-toolkit we also have an on-the-fly circuit compiler: we have integers, floating point, and all the other stuff, so I wrote a small program in it to compute modular exponentiation, and it worked directly.

So you didn't try to optimize the modular exponentiation?

We just compiled it; it's a textbook version of modular exponentiation.