Hi, my name is Arkady Yerukhimovich, and today I want to tell you about our work, entitled "The More the Merrier: Reducing the Cost of Large-Scale MPC." This is joint work with Dov Gordon from George Mason University and Daniel Starin from Peraton Labs. As the title suggests, this paper is about optimizing the performance of secure multi-party computation, or MPC, and in particular we're going to look at the setting where we want to run a secure computation between very large numbers of parties. Unfortunately, even though MPC has been a research topic in the crypto community for over 30 years, most of the truly practical protocols are really specialized for small numbers of parties: think two, three, or four parties. The reason for that is that in most of the protocols today, as the number of parties grows, the per-party costs actually increase as well. We're not getting savings from an increased number of parties; instead, the costs increase, and so in general we don't look at the case of many parties. This brings us to our goal: how do we build large-scale MPC such that the per-party cost diminishes as the number of parties grows? With this goal in mind, we achieve the following result. Our main theorem says that, assuming a secure pseudo-random generator (PRG), there exists an MPC protocol that is secure against a static malicious adversary corrupting at most a third of the parties, and that has both per-party communication and computation that decrease as n grows. In particular, the communication behaves as the circuit size over the number of parties, |C|/n, while the computation grows as (s log n / n) * |C|, where s is a statistical security parameter. Today I'm going to present how this construction works. But before I do that, I want to quickly say that we're of course not the first ones to achieve MPC whose costs decrease as the number of parties increases.
Traditionally, such MPC protocols fall under the title of sublinear MPC. The reason for the name is that if the total cost of the protocol grows sublinearly as the number of parties increases, then the average per-party cost is decreasing. There are two general approaches for achieving sublinear MPC. The first of these, introduced by Damgård et al. back in 2010, is to use SIMD computation. The basic idea behind SIMD computation is that instead of operating on individual inputs, we operate on vectors of inputs component-wise. So instead of adding two integers, for example, we add two vectors of integers component-wise. And we can set things up in such a way that the cost of doing this in MPC is essentially the same for adding two vectors as for adding two individual elements. If you pack enough values into these vectors, this achieves significant savings in our MPC and allows us to get costs that grow sublinearly in n. But there is an important thing to remember, which is that there is an overhead: because we want to operate on vectors, our circuit needs to be nicely aligned, into what's called a SIMD circuit, and converting a general circuit into a SIMD circuit costs some overhead. That's going to be the big bottleneck for this approach. The other way of getting sublinear MPC is something called committee-based MPC. Here the idea is that instead of running our MPC protocol among all n parties, we first elect a small committee, in particular a committee of size polylog(n) to preserve the honest-party threshold, and have just those committee members run the MPC protocol, where everybody else just provides input and then sits idle. The point is, of course, that because all of the non-committee members are doing nothing, the total amount of work, and hence also the average amount of work, decreases.
But there's also a drawback: the committee members still do a lot of work. The non-committee members do nothing, but the committee members do a lot of work. So both of these approaches have some drawbacks. Our solution takes these two approaches and combines them, getting the best out of each one, in order to achieve a solution that beats all prior ones. In particular, let's quickly compare our solution to the best prior available solutions. If you look at the best SIMD-based solution, by Damgård et al., you will see that their solution is O(log |C|) times worse than ours. This, of course, comes from having to convert a general circuit into a SIMD circuit. Now, I should say that subsequent to our work, very recently at Crypto 2021, Goyal et al. showed how to reduce this overhead from O(log |C|) to O(1). But unfortunately, their protocol does not have an implementation, so it's very hard to judge how it concretely compares to our protocol, and we leave that for future work. Then, looking at committee-based approaches, the best solution there seems to be a folklore solution where you elect a large number of committees and then have them execute the whole protocol together. When we estimate the cost of such a committee-based solution, it turns out to be O(s) times worse than our solution. And finally, the best implemented solution is actually not a sublinear solution at all: it's a variant of the SPDZ protocol by Damgård et al., whose communication cost is O(n) times worse than our protocol's. So, as you can see, we essentially beat all the previously available solutions by, more or less, combining them in a more efficient way. Now, before I jump into our protocol, I want to give some preliminaries that will help you build up the basic tools our protocol uses.
But before I do that, I want to talk a little bit about how we measure the complexity of large-scale MPC. The traditional measure in this setting is something called average complexity, which makes sense: the idea is that we count the total cost, say the communication cost across all parties, and divide by the number of parties, which gives the average cost. But this has some weird behavior. For example, it benefits from having many parties that do nothing, because they all show up in the denominator but not in the numerator, hence reducing the average. So, for example, in the case of committee-based MPC, the average complexity will decline even if the committee members do exactly as much work as they did before. Even though they're doing just as much work, because all these other parties are now sitting idle, the average complexity actually goes down. For this reason, we feel that a more natural complexity notion to consider in this setting is something called bottleneck complexity, which was proposed by Boyle et al. in 2018. Bottleneck complexity, rather than averaging complexity across all parties, looks at the maximum cost for a single party. This way, of course, the parties that do nothing no longer help: if a party doesn't do anything, it obviously isn't going to reduce the bottleneck complexity of the party that does the most work. Additionally, if you consider synchronous protocols, as ours is, then the bottleneck party will actually determine the time it takes the protocol to complete, because the protocol has to wait until the last party completes whatever it's doing, and this party forms the bottleneck. Because of this observation, our goal, and hopefully you will see how we achieve this, is to distribute the work as evenly as possible among all the MPC parties.
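To make the distinction concrete, here is a toy Python sketch; the numbers are purely illustrative and not from our experiments. A small committee doing all the work among many idle parties has a tiny, shrinking average complexity, but its bottleneck complexity is unchanged by all the idle parties.

```python
# Toy illustration of average vs. bottleneck complexity.
# Hypothetical numbers: 10,000 parties, a committee of 25 members
# that each send 1,000 messages, everyone else idle.

def average_complexity(costs):
    """Total cost divided by the number of parties."""
    return sum(costs) / len(costs)

def bottleneck_complexity(costs):
    """Maximum cost incurred by any single party."""
    return max(costs)

n, committee_size, per_member_cost = 10_000, 25, 1_000
costs = [per_member_cost] * committee_size + [0] * (n - committee_size)

print(average_complexity(costs))     # 2.5 -- shrinks further as n grows
print(bottleneck_complexity(costs))  # 1000 -- unaffected by idle parties
```

Growing n while keeping the committee fixed drives the average toward zero, while the bottleneck stays at 1,000; this is exactly the behavior the talk is pointing at.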
So we want all MPC parties to be equally involved in every stage of our protocol. Now let me quickly cover a few of the basic tools that we use. The first tool is standard Shamir secret sharing. To share a secret, you just pick a random polynomial of degree D whose value at zero is the secret, and the shares are points on this polynomial. The point is that if you have D + 1 points, you can recover the secret, but if you have fewer, then you can't. For this talk, what's going to be important is that you can actually compute on these values. You can add Shamir secret-shared values by just adding your shares locally. You can also multiply Shamir secret-shared values by multiplying your shares locally, but of course this causes the degree to double, and so you have to do a little extra work to drive the degree back down. So that's Shamir secret sharing. We're also going to use an extension of Shamir secret sharing called packed secret sharing. The basic idea here is that there's no reason you have to store only one secret. Before, we stored the secret as the value at zero, but you could also designate the values at minus one and minus two, or really any points that aren't given out as shares, as secrets. The point again is that if you have a polynomial of degree D that stores some number ℓ of secrets, then D + 1 shares of that polynomial let you recover all ℓ secrets, but with D − ℓ + 1 or fewer shares you learn nothing about them. And again, we can compute on these shares, but now in a component-wise fashion. By adding my local shares, I add all of the secrets component-wise. And by multiplying my shares, I multiply the secrets component-wise, but again the degree doubles, and so again I'm going to have to work to reduce that degree back down.
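As a concrete sketch, here is a minimal Python implementation of packed secret sharing over a prime field. The field size, the choice of evaluation points (secrets at -1, -2, ..., shares at 1..n, random padding beyond n), and the function names are illustrative choices of mine, not fixed by the protocol.

```python
import random

P = 2**31 - 1  # an illustrative prime field

def lagrange_eval(points, x, p=P):
    """Evaluate the unique polynomial through `points` at x (mod p)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * ((x - xj) % p) % p
                den = den * ((xi - xj) % p) % p
        total = (total + yi * num * pow(den, p - 2, p)) % p
    return total

def packed_share(secrets, degree, n, p=P):
    """Pack the secrets into one degree-`degree` sharing for n parties.
    Secrets sit at points -1, -2, ...; party i's share is the value at i."""
    pts = [(-(j + 1) % p, s % p) for j, s in enumerate(secrets)]
    # fill the remaining degree + 1 - len(secrets) degrees of freedom randomly
    pts += [(n + 1 + k, random.randrange(p))
            for k in range(degree + 1 - len(secrets))]
    return [lagrange_eval(pts, i, p) for i in range(1, n + 1)]

def packed_reconstruct(shares, ell, degree, p=P):
    """Recover the ell packed secrets from the first degree + 1 shares."""
    pts = list(zip(range(1, degree + 2), shares[:degree + 1]))
    return [lagrange_eval(pts, -(j + 1) % p, p) for j in range(ell)]
```

Adding two packed sharings share-by-share yields a packed sharing of the component-wise sums, and multiplying them share-by-share yields a degree-2D sharing of the component-wise products, exactly as described above.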
Another standard tool that we're going to use is additive secret sharing. For those who don't remember, additive secret sharing is the basic idea where, to share a secret, I just pick n random values in some field that sum up to that secret, and each party gets one of these shares. We also use an extension of additive secret sharing called authenticated secret sharing. Here the idea is that in addition to the additive secret sharing of a secret s, we also add a MAC key alpha, and we secret-share alpha as well as alpha times s. You can think of this as essentially an authentication of the secret: there's no way for you to modify the secret and still successfully open it unless you know alpha. The nice property is that both of these forms of secret sharing still allow non-interactive linear operations; for example, you can still add secret-shared values by just adding your shares. So those are all the basic tools that we need. Now let's dive into our protocol, and I want to start with a very high-level overview. We're going to follow a by-now very standard way of building MPC protocols, which is to use Beaver multiplication triples. This means we have an offline phase in which we generate multiplication triples, and then an online phase that uses those triples to evaluate an arbitrary circuit C. We start by having all parties prepare these triples. But because we want to be communication-efficient and sublinear, we do it using packed secret sharing. So we have each party choose vectors of values a_i and b_i, pack them into packed secret sharings, and produce packed secret sharings of triples. Then we take those packed secret-shared triples and somehow unpack them, because remember, we want to compute general circuits.
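Here is a minimal Python sketch of additive and authenticated (SPDZ-style) sharing and of the non-interactive linearity just mentioned; the field size and the function names are my own illustrative choices.

```python
import random

P = 2**31 - 1  # an illustrative prime field

def additive_share(secret, n, p=P):
    """n random shares that sum to the secret mod p."""
    shares = [random.randrange(p) for _ in range(n - 1)]
    shares.append((secret - sum(shares)) % p)
    return shares

def authenticated_share(secret, alpha, n, p=P):
    """SPDZ-style sharing: additive shares of s and of the MAC alpha * s."""
    return additive_share(secret, n, p), additive_share(alpha * secret % p, n, p)

def reconstruct(shares, p=P):
    return sum(shares) % p

# Linearity: adding shares of s and t (and of their MACs) gives a valid
# authenticated sharing of s + t, with no interaction at all.
alpha = random.randrange(P)
(s_sh, s_mac) = authenticated_share(11, alpha, 5)
(t_sh, t_mac) = authenticated_share(31, alpha, 5)
sum_sh = [(x + y) % P for x, y in zip(s_sh, t_sh)]
sum_mac = [(x + y) % P for x, y in zip(s_mac, t_mac)]
assert reconstruct(sum_sh) == 42
assert reconstruct(sum_mac) == alpha * 42 % P
```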
So we unpack the packed triples into standard additive triples, which we then pass to online committees that authenticate them and run an online MPC protocol; we'll cover all these steps in a little more detail now. Let's start with the triple generation step. Our goal here is to produce packed multiplication triples a_i, b_i, and c_i such that c_i is the component-wise product of a_i and b_i. What we're going to do is use a packed version of the well-known Damgård–Nielsen protocol. The idea is that each party produces a packed secret-shared vector of random values a, another packed secret-shared vector of random values b, and what's known as a doubly shared vector of random values r. What that means is that we have a packed sharing of r of degree D and another packed sharing of r of degree 2D, both of the same values. The nice thing here is that this requires O(n) per-party communication, just to send out these shares to everybody else. Now, the problem is that the parties know their own a_i's, and we want to produce random values that nobody knows. One way of doing this, of course, would be just to sum up all the a_i's, but that's very inefficient, because it requires essentially n contributions to produce one secret value a'. Fortunately, the Damgård–Nielsen protocol shows that if you use randomness extraction, in particular by multiplying these a_i's by a Vandermonde matrix, you can get O(n) secret values such that an adversary controlling a small number of parties doesn't know any of them. You can do the same thing for the b's and the r's, so you get O(n) packed sharings of a's, O(n) packed sharings of b's, and O(n) packed sharings of r's. The important thing is that you've now packed in O(n^2) values with only O(n) communication.
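Here is a small Python sketch of this randomness-extraction step, applied to plaintext values for illustration; in the protocol the same matrix is applied share-wise to the packed sharings, and my choice of evaluation points 1..n for the Vandermonde matrix is just one valid option.

```python
P = 2**31 - 1  # an illustrative prime field

def extract(contributions, t, p=P):
    """Compress n contributions into n - t outputs via a Vandermonde
    matrix M[i][j] = (j + 1)^i. If at least n - t of the inputs are
    uniformly random (the honest parties'), all outputs are uniformly
    random: any n - t columns of M form a square (transposed) Vandermonde
    matrix on distinct points, which is invertible."""
    n = len(contributions)
    return [sum(pow(j + 1, i, p) * contributions[j] for j in range(n)) % p
            for i in range(n - t)]
```

For example, `extract([1, 2, 3, 4], 1)` compresses four parties' contributions, of which up to one may be known to the adversary, into three outputs that nobody fully knows.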
And now we do the following, again still following the Damgård–Nielsen protocol. Each party locally multiplies their packed share of the vector a' by their share of the vector b' and adds to it their share of the degree-2D sharing of r'. The point is that when you multiply a' times b', the polynomial degree goes up to 2D, but you can still blind this with the degree-2D secret sharing of r'. Then you open that value to a designated party, say P0. P0 can now reconstruct all of these secrets, that is, the vector with entries a_i * b_i + r_i, and he reshares it as a packed secret sharing, but now at degree D. All the parties receive their shares of this packed secret sharing. They also have shares of the degree-D packed secret sharing of r_i, so they can subtract that out and actually get a packed secret sharing of c_i = a_i * b_i. This seems great, because now we have O(n^2) packed values and it took us only O(n) per-party communication. But not so fast; unfortunately, there is a problem. The problem comes from an observation by Genkin et al., who showed that this packed version of Damgård–Nielsen is vulnerable to something known as a linear attack. The idea is that if one of the inputs, the a_i's or the b_i's, is malformed, meaning it is not actually a degree-D secret sharing, then the adversary can cause correlations between the packed slots. For example, the second slot of the product vector can acquire a term that depends on the first slot of the input vectors. This is very problematic: if you work through the online phase of the protocol, you'll see that having such correlated products in the triples completely breaks security. But fortunately, there's an easy fix in this setting.
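To make the multiplication step concrete, here is a sketch of one packed Damgård–Nielsen multiplication, simulating all parties in a single Python process and ignoring for the moment the malformed-sharing issue just described. The parameters, field, and evaluation points are illustrative choices of mine.

```python
import random

P = 2**31 - 1          # illustrative prime field
N, ELL, D = 10, 3, 4   # parties, packed secrets per sharing, degree (2D + 1 <= N)

def lagrange_eval(points, x, p=P):
    """Evaluate the unique polynomial through `points` at x (mod p)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * ((x - xj) % p) % p
                den = den * ((xi - xj) % p) % p
        total = (total + yi * num * pow(den, p - 2, p)) % p
    return total

def packed_share(secrets, degree, p=P):
    """Secrets sit at points -1, -2, ...; party i's share is the value at i."""
    pts = [(-(j + 1) % p, s % p) for j, s in enumerate(secrets)]
    pts += [(N + 1 + k, random.randrange(p))
            for k in range(degree + 1 - len(secrets))]
    return [lagrange_eval(pts, i, p) for i in range(1, N + 1)]

def reconstruct(shares, ell, degree, p=P):
    pts = list(zip(range(1, degree + 2), shares[:degree + 1]))
    return [lagrange_eval(pts, -(j + 1) % p, p) for j in range(ell)]

a, b = [3, 1, 4], [1, 5, 9]
r = [random.randrange(P) for _ in range(ELL)]
shA, shB = packed_share(a, D), packed_share(b, D)
shR_D, shR_2D = packed_share(r, D), packed_share(r, 2 * D)  # double sharing of r

# each party locally multiplies its shares and blinds with the degree-2D mask
masked = [(shA[i] * shB[i] + shR_2D[i]) % P for i in range(N)]
# P0 reconstructs the masked products and re-shares them at degree D
m = reconstruct(masked, ELL, 2 * D)
shM = packed_share(m, D)
# everyone locally subtracts their share of the degree-D sharing of r
shC = [(shM[i] - shR_D[i]) % P for i in range(N)]
assert reconstruct(shC, ELL, D) == [3 * 1, 1 * 5, 4 * 9]
```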
Basically, we just need to check that all the original sharings of the a_i's, b_i's, and r_i's are valid sharings, in the sense that they lie on degree-D polynomials. The reason we can get away with this is that we only need to do one layer of multiplication: we never take the output of one multiplication and multiply it by something else, because we handle the rest of the computation differently. Because all the triples are generated in one layer, we just need to check them all, and because they're all done in parallel, we can batch this check and verify all of the sharings at once. This ends up being really cheap, so adding it does not meaningfully impact our communication cost: we still get O(n^2) packed values for O(n) per-party communication. So this is great; now we have these packed triples. So what can we do with them? The first thing we could do, of course, is just perform the online computation on the packed values. We already talked about this: we can do SIMD computation. But unfortunately, this means you pay an O(log |C|) overhead to convert your circuit into a SIMD circuit. You could combine this with a committee-based approach by, say, sending those packed triples to a large online committee, say of size polylog(n), but you still have this log |C| overhead, so this is still not great. A third approach goes in a different direction: let's actually unpack these triples first and then send them to a small online committee. In particular, once we unpack them, we can convert them into additive secret-shared triples and then use a dishonest-majority committee. The nice thing is that this avoids the log |C| overhead.
But the problem is that once you unpack the triples, you actually have O(n^2) triples, because that's what you had packed. Sending all those O(n^2) triples unfortunately costs O(n^2) communication. So what we do is something a little different: we send the packed triples to many small unpacking committees. Now, this is a little counterintuitive, because the whole point was that these triples are packed optimally, so it requires O(n) parties to actually reconstruct such a triple. How can we get away with sending these packed secret shares to a small committee that can't even reconstruct them? Here we use a very useful trick. We observe that each party can take their share of the packed secret sharing and additively share it to what we call an unpacking committee. Suppose this committee is of size O(s); then this requires essentially O(s) communication per packed triple, so sending all O(n) packed triples takes O(s * n) per-party communication (and in fact we improve this to O(n) by using a PRG). The committees are now holding O(n^2) triple values among them. But now the question is: how does a committee deal with this? How does it unpack? Fortunately, it turns out that unpacking packed triples is actually a linear operation, and so we can do the unpacking step entirely inside the additive sharing. Now that we've additively shared our packed shares, the committee can unpack them by just doing linear operations, with no additional communication.
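The following Python sketch illustrates why unpacking costs no communication: recovering packed secret j is the fixed linear map sum_i lambda_{j,i} * share_i over Lagrange weights, so committee members holding additive shares of the packed shares can each apply the same weights locally and end up with additive shares of every unpacked secret. Parameters, field, and point choices are mine.

```python
import random

P = 2**31 - 1
D, ELL, M = 6, 3, 4  # sharing degree, packed secrets, unpacking-committee size

def poly_eval(coeffs, x, p=P):
    y = 0
    for c in reversed(coeffs):
        y = (y * x + c) % p
    return y

def lagrange_coeff(xs, target, i, p=P):
    """Weight of share i when interpolating the polynomial at `target`."""
    num, den = 1, 1
    for j, xj in enumerate(xs):
        if j != i:
            num = num * ((target - xj) % p) % p
            den = den * ((xs[i] - xj) % p) % p
    return num * pow(den, p - 2, p) % p

# a random packed sharing: secrets at points -1..-ELL, shares at 1..D+1
coeffs = [random.randrange(P) for _ in range(D + 1)]
secrets = [poly_eval(coeffs, P - (j + 1)) for j in range(ELL)]
xs = list(range(1, D + 2))
packed_shares = [poly_eval(coeffs, x) for x in xs]

# each packed share is additively split among the M committee members
add_shares = [[0] * len(xs) for _ in range(M)]
for i, s in enumerate(packed_shares):
    parts = [random.randrange(P) for _ in range(M - 1)]
    parts.append((s - sum(parts)) % P)
    for k in range(M):
        add_shares[k][i] = parts[k]

# unpacking is linear, so each member applies the Lagrange weights locally
unpacked = [[sum(lagrange_coeff(xs, P - (j + 1), i) * add_shares[k][i]
                 for i in range(len(xs))) % P
             for j in range(ELL)]
            for k in range(M)]

# summing the members' local results recovers every packed secret
recovered = [sum(unpacked[k][j] for k in range(M)) % P for j in range(ELL)]
assert recovered == secrets
```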
And so again, we now have O(n) per-party communication to extract O(n^2) triples, and these are now additively shared triples. I do want to talk about some advantages of this committee-based approach. One is that by using many committees, we can split all of the triples to be unpacked across the many chosen committees. In particular, with committees of size O(s), we can choose O(n/s) such committees without overlapping parties, and each committee is then responsible for just an s/n fraction of the triples. This lets us reduce the computation cost of unpacking triples by splitting the work equally across all of the unpacking committees. Another benefit, as we've said, is that we switch to dishonest-majority committees by using additive secret sharing. The observation here is that, concretely, if we wanted to use honest-majority committees and achieve, say, a 2^-40 probability of failing to select a good committee, then, remembering that we started with a t < n/3 adversary in the overall population, you would actually have to select about 430 parties for your committee. Whereas to guarantee at least one honest party in each committee, you only need a committee of size approximately 25. So as you can see, even concretely, this saves a factor of almost 20 by electing many, many more committees and keeping them as small as possible. Now, moving on: we've unpacked these things into additively secret-shared triples, so let's finalize how we actually compute on them. For this, we just follow the details of the SPDZ protocol. Essentially, we have these unpacked, additively secret-shared triples.
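The dishonest-majority committee size can be checked with a couple of lines of Python. This reproduces the roughly-25 figure for getting at least one honest member, assuming committee members are sampled with replacement from a population with a one-third corruption rate; the ~430 honest-majority figure comes from a binomial tail bound that I don't reproduce here.

```python
import math

def min_committee_size(corrupt_frac, fail_prob):
    """Smallest committee size c with corrupt_frac**c <= fail_prob, i.e.
    at least one honest member except with probability fail_prob, when
    members are sampled with replacement from the population."""
    return math.ceil(math.log(fail_prob) / math.log(corrupt_frac))

print(min_committee_size(1 / 3, 2**-40))  # 26
```

With a one-third corruption rate, a committee of 25 fails with probability (1/3)^25, just above 2^-40, while 26 members bring the failure probability below 2^-40, matching the "approximately 25" figure above.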
The first thing we do is authenticate them. For this, we use a standard technique that originates from the SPDZ paper, which requires using eight unauthenticated triples to produce one authenticated triple. We also use the same approach as SPDZ to handle the inputs, and we end up running the SPDZ protocol in each of our committees to actually evaluate the circuit. So now let's think about how well our protocol performs. Again, what we've done is use an initial packed secret sharing step to produce our triples, and then a committee-based approach to unpack them and run the online phase of the protocol. So how well does this work? For the first comparison, we came up with a folklore solution that most resembles our approach, which is as follows: you take committees, either honest-majority or dishonest-majority committees, and use as many of them as possible to compute the offline and online phases of the protocol, transferring triples in between, and we worked out what the cost of that would be. If you look at the communication, across the board our protocol beats the best possible folklore committee-based approach, precisely because we combine packed secret sharing with the committee-based approach. As a comment, we use the protocol of Chida et al. for honest majority and SPDZ for dishonest majority. We also implemented our protocol to evaluate how it performs in the real world. Now, I do have to say that we didn't have millions of parties to run this on, so the communication is estimated and simulated, but the computation costs are actually measured, using the libiop library.
And you can see from the results that for n up to 2^20, performance improves as n increases; in particular, for n = 2^20, we're able to produce a million authenticated triples in about 10 milliseconds. So hopefully you can see that as n grows, the per-party amount of work decreases and the efficiency of our protocol grows as well. Or, as we like to say: the more the merrier. Thank you very much for listening.