Hi, I'm Mayank, and today I'll be talking about our work on function secret sharing for mixed-mode and fixed-point secure computation. This is joint work with Elette, Nishanth, Niv, Divya, Yuval, and Nishant. We will be working in the 2PC-with-preprocessing model, where we have two phases. In the first phase, a dealer distributes correlated randomness to the two main parties, which then use this correlated randomness along with their secret inputs in an online phase to compute a joint function on their inputs. For the majority of this talk, we will assume that we have a trusted dealer and that, of the other two parties, at most one is corrupted semi-honestly. For a more realistic setting, one typically emulates the dealer using either a 3PC protocol, which is quite efficient, or a 2PC protocol, which can be much more expensive, and I'll come back to this later in the talk. To handle malicious parties, we can use a SPDZ-style MAC approach, which incurs a very small overhead over semi-honest, and all of the ideas that I discuss in this talk are applicable to those settings as well. What is function secret sharing? Think of a function f with outputs in a finite abelian group. With FSS, you can split the function into two shares, f0 and f1, and give one share to each of the parties, who can now locally evaluate the function on inputs x in the domain, such that when their local evaluations are added together, you get back the output of the original function f. These function shares are typically called FSS keys. There are two main properties that one needs from an FSS scheme. The first one is correctness, meaning that when the local evaluations are added together, you get the same output as the original function. The next one is security, where you want a single FSS key to hide the original function f.
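To make the correctness and security properties concrete, here is a toy Python sketch of a trivial, deliberately non-compact FSS scheme: one key is a uniformly random table, the other is the function minus that table. The names, the ring size, and the example function are all illustrative, not from the paper.

```python
import secrets

N = 2**16  # toy ring Z_N; everything below is mod N (illustrative choice)

def fss_share(f, domain_size):
    """Trivial FSS: f0 is a uniformly random table, f1 = f - f0 pointwise.
    Each key alone is a uniformly random table, so it hides f."""
    f0 = [secrets.randbelow(N) for _ in range(domain_size)]
    f1 = [(f(x) - f0[x]) % N for x in range(domain_size)]
    return f0, f1

def fss_eval(key, x):
    """Local evaluation: just a table lookup."""
    return key[x]

# Correctness: the two local evaluations add up to f(x) mod N.
f = lambda x: (3 * x + 7) % N
k0, k1 = fss_share(f, 32)
assert all((fss_eval(k0, x) + fss_eval(k1, x)) % N == f(x) for x in range(32))
```

The keys here are as large as the entire truth table, which is exactly why a compactness requirement is needed to rule out such trivial constructions.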
But it turns out a lot of trivial constructions satisfy these two properties, so we add a third property, compactness, where we want the FSS keys to be compact in the input and output sizes. In 2019, Boyle et al. showed how to do 2PC with preprocessing using FSS. It all starts with representing your computation as an arithmetic circuit, where wires can possibly carry elements from different abelian groups, but to keep things simpler, let's just assume that wires carry ring elements, which suffices for the purpose of this talk. Let's focus on a single gate here. Here, gate f takes as input x and outputs y. Their proposal was to move from this evaluation to a different evaluation where we work with masked values. The input wire is going to carry x plus r, where r masks x, and similarly for the output wire. For correctness to hold, we need to change the gate's functionality f to a complementary function f hat, where f hat is parameterized by these shift factors r and s, takes x plus r as input, and outputs y plus s. To make it easier to refer to both of these evaluations, I'm going to call the first one the unmasked world and the second one the masked world. The key idea here is that since r and s have to be kept secret, one can use FSS to secret-share this function f hat, and that will still hide r and s. Once we do that for all the gates and all the wires in the circuit, this is what it looks like. In the larger 2PC landscape, what happens is that the dealer gives out FSS keys for all the gates in this new masked evaluation in the preprocessing phase, and in the online phase, the parties evaluate each gate locally using the FSS shares that they got from the dealer. So, for example, let's look at the gate f.
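A minimal sketch of this masked-world gate transformation, assuming wires carry elements of a toy ring Z_N; the gate function and the constants are hypothetical stand-ins:

```python
N = 2**16  # toy ring Z_N

def f(x):
    """Some gate function over Z_N (squaring, as a stand-in example)."""
    return (x * x) % N

def f_hat(masked_x, r, s):
    """The offset gate f_hat_{r,s}: takes x + r, returns f(x) + s (mod N).
    Secret-sharing f_hat with FSS keeps the shifts r and s hidden."""
    return (f((masked_x - r) % N) + s) % N

# Correctness of the masked evaluation: f_hat(x + r) == f(x) + s.
x, r, s = 42, 1234, 5678
assert f_hat((x + r) % N, r, s) == (f(x) + s) % N
```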
Both parties input x plus r, but the way FSS is defined, their outputs are going to be additive secret shares of y plus s, which is not exactly what we wanted, because we want the output to be y plus s itself. That can easily be handled by both parties revealing their shares of y plus s to each other, and in this way the computation can keep going. But this masked-world setting might look unfamiliar to you, because MPC protocols typically work in the secret-shared setting. What I claim here is that both of these settings are equivalent for this particular evaluation. To see that, let me add another gate g to the evaluation, which outputs z plus t. If you look on the left, between the green segments, this segment takes as input additive shares of y plus s and outputs additive shares of z plus t. So this is what the typical secret-shared-to-secret-shared setting of MPC looks like. On the right, we have the masked-to-masked setting, and both of these are equivalent; it's just a matter of where you place your segment. A great thing here is that since we can cast our evaluation as secret-shared-to-secret-shared as well, it becomes modular and can be combined with other protocols. So if your favorite protocol doesn't have an efficient way to evaluate the gate g, you can use FSS as a plug-in solution: it gives you a one-round way to evaluate that gate g and leaves you with secret shares of the output, from which you can carry on with the rest of your protocol. There are two important things to note here. First, the communication per party is just one ring element per gate, and second, this communication happens over just a single round. Hopefully by now I've convinced you that you can use FSS to do 2PC with preprocessing, but how well does it compare to other approaches? Let's look at that now.
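The equivalence between the masked and secret-shared settings boils down to two local conversions, which can be sketched in a toy model (the sharing function and values here are illustrative):

```python
import secrets

N = 2**16  # toy ring Z_N

def share(v):
    """Additively secret-share v over Z_N."""
    v0 = secrets.randbelow(N)
    return v0, (v - v0) % N

y, s = 100, 900
masked = (y + s) % N          # masked wire value y + s, known to both parties

# Masked -> secret-shared: each party subtracts its dealer-given share of s.
s0, s1 = share(s)
y0 = (masked - s0) % N        # party 0's share of y
y1 = (-s1) % N                # party 1's share of y
assert (y0 + y1) % N == y

# Secret-shared -> masked: the parties exchange their shares of y + s
# (one ring element each, in one round) and add them up.
a0, a1 = share(masked)
assert (a0 + a1) % N == masked
```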
To interpret this table, think of a commonly found non-arithmetic gate, meaning something more complex than a typical addition or multiplication. In the first row, we have garbled circuits adapted to the trusted-dealer model. Online communication in this setting is quite high; you have just two online rounds, but the correlation size, which is the size of the correlated randomness, is also quite high. For GMW and other secret-sharing-based approaches, you have moderate online communication, which depends on which non-arithmetic gate we're talking about; you have a high number of online rounds, but the correlation size is quite low. And finally, with FSS, you have low online communication and just a single online round, as I mentioned on the previous slide, but the issue is that you have a high correlation size. On the online metrics, FSS is a clear winner, and its biggest bottleneck is the correlation size. Now let's understand how correlation size affects the efficiency of the preprocessing and online phases. In particular, there are two costs that are directly affected by the correlation size. The first one is the storage cost, which directly affects the online phase, because these correlations have to be stored and then consumed in the online phase. The second one is the preprocessing cost, which captures the cost of securely realizing the dealer. This can be done in either of two settings. The first one is 3PC or trusted hardware. In this setting, the cost comes down to how quickly we can generate these correlations locally and stream them to the two main parties. In the case of 2PC, for garbled circuits, it's quite similar to 3PC. But for FSS, we now need to emulate the dealer via a 2PC protocol. For smaller inputs, for example up to 16 bits, we have a construction from Doerner and shelat, which is black-box in the PRG, meaning that we don't need to run the PRG inside 2PC.
But the issue is that it requires a large number of local PRG evaluations. For larger inputs, we can resort to using 2PC-friendly PRGs. The main point is that no matter which setting we are in, reducing the correlation size yields improvements across all of these costs. This is the perfect time to motivate our work, so to quickly recap: the FSS approach has the benefit of a fast online phase, but it's bottlenecked by a high correlated-randomness size, which implies a slower preprocessing phase. In this work, we reduce the correlation size for the FSS approach, which implies faster preprocessing. To give you an idea of what our improvements look like, think of 16-bit values. Here I'm showing the key size, or correlation size, for garbled circuits, prior FSS, and our work. In the first row, we have the interval containment gate, which checks whether an input x lies in an interval from a to b. For this gate, we achieve a 3x improvement over garbled circuits and 7x over prior FSS. For ReLU, the improvements are 2x and 6x. For sigmoid, which is approximated with a 12-piece spline, the improvements are much more substantial, at 15x and 22x. For bit decomposition, our improvement over prior FSS is 11x, but we are slightly worse than garbled circuits. And finally, for right shift, we achieve a key size comparable to garbled circuits, and we provide the first efficient FSS construction for this particular gate. That should give you an idea of our improvements in key size, but let's see how they translate when we emulate the dealer using 2PC. Here I'm considering the case of the sigmoid gate. For the communication required for key generation, we are 6x better than garbled circuits, but that comes at the cost of 113 times more AES calls. At first, this might look like too much, but if you consider a wide-area network, then we can actually beat garbled circuits in key-generation runtime.
However, I should point out that the main message here isn't that we will always beat garbled circuits for preprocessing; in fact, in most cases we will perform worse than garbled circuits in the preprocessing phase. The main point is that we require less storage and have a much faster online phase, and that is the key idea here. To understand our sources of improvement, let's divide the gates that we encounter into two categories. On one side, we have simple gates like additions and multiplications. On the other, we have complex gates, for example interval containment, ReLU, and right shift, and a common theme across all of these complex gates is that they use comparisons in some form. Our first source of improvement comes from improving the key size of FSS for comparisons by 4x, which helps all of these complex gates. Secondly, we provide improved key sizes for common gates, which are significantly better than what was known before. In the next two slides, I'll go over each of these improvements in a little more detail. What do I mean when I say comparisons for FSS? For that, let's first understand comparison functions. A comparison function is a simple function parameterized by alpha and beta: when you give it an input that is less than alpha, it outputs beta; otherwise it outputs zero. A DCF, or distributed comparison function, is an FSS scheme for these comparison functions. When alpha and beta are both n bits, the prior state-of-the-art construction required 8n·lambda bits of key size, where lambda is the security parameter. In this work, we reduce that to just 2n(lambda + n) bits, and for the common case where n is much less than lambda, this amounts to an approximately 4x improvement in key size. These improvements come from providing a direct construction rather than going through FSS for decision trees, as was done in prior work. For some commonly found gates, we provide further improvements.
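The comparison function and the claimed key sizes can be written down directly; the formulas below simply restate the numbers from this slide (8nλ for the prior construction, 2n(λ + n) here) and are not the actual key-generation algorithms:

```python
def comparison_fn(alpha, beta):
    """Plaintext comparison function f^<_{alpha,beta}: beta if x < alpha, else 0.
    A DCF is an FSS scheme for exactly this family of functions."""
    return lambda x: beta if x < alpha else 0

def prior_dcf_key_bits(n, lam):
    return 8 * n * lam           # prior state of the art (via FSS for decision trees)

def new_dcf_key_bits(n, lam):
    return 2 * n * (lam + n)     # this work's direct construction

g = comparison_fn(10, 7)
assert g(3) == 7 and g(10) == 0

# For n much smaller than lambda, the ratio approaches 4x.
n, lam = 16, 128
assert prior_dcf_key_bits(n, lam) / new_dcf_key_bits(n, lam) > 3.5
```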
For IC and ReLU, the prior construction required 2 DCF keys, while we reduce that to a single key. For bit decomposition, the prior construction needed n minus 1 keys, while we reduce that to n/w DCF keys, where w is a window-size parameter: increasing w reduces the key size but increases compute, so the exact choice of w depends a lot on your application. And finally, for splines with m pieces and degree d, the prior construction required 2m DCF keys, while we reduce that to a single key plus a small amount of correlated randomness. A common theme across all of these numbers is that for most of these gates, we require just a single DCF key, no matter how many comparisons actually happen inside the gate. These improvements come from a crucial insight; let's understand that. In the unmasked world, the prior construction assumed that the intervals inside the gates are secret. For example, for IC, it assumed that the interval boundaries a and b were secret, but it turns out that that's overkill for most applications, and we relax that assumption in this work by letting the interval boundaries be public. This leakage is actually okay because the function being computed is typically known to both parties. If you're not convinced, let me give you the example of neural networks. In a neural network with ReLU activations, the ReLU function has public intervals, because everyone knows that you're checking whether x is in the positive range or not. The intervals there are public, so this leakage is fine for most applications. In the rest of the talk, I'll be focusing on the IC gate, where I will assume that the boundaries a and b are public, and this is what it looks like in the unmasked setting. But since we will be working in the masked setting, the IC gate is going to look a little different: we will be checking whether x plus r lies between a hat and b hat, where a hat and b hat are a plus r and b plus r, respectively.
And we will be blinding the output with plus s, but let's ignore the plus s for now, because that can easily be handled by the dealer giving out additive shares of s in the preprocessing phase, along with the other correlated randomness. And since a hat and b hat are both shifted by r, they are now secret values. First, I'll show you the IC construction from prior work, and then how we improve that construction to use just a single key. Let's start the technical part of this talk by taking a quick look at the IC construction from prior work; I'll show it pictorially. Here, this long block is a representation of the ring. On the left, we have zero; on the right, we have N minus one. So this is the ring of N elements, and anything that goes beyond N minus one wraps around and comes back on the left side. On top of this ring, I'll show different points in the ring, and on the bottom, the output values we want for those points. For example, for the IC gate, we want the output to be one for all values from a to b and zero everywhere else. Let's put this on the right side for a moment. This is what the unmasked world looks like, but since we will be working in the masked world, let's take a look at that. We will have two cases here. In the first case, our shift factor r is small enough that it doesn't cause the green region to wrap around; it just shifts it to the right a little bit. The idea in the prior construction was to use two DCF keys. The first one is a DCF key for a hat, by which I mean that this key outputs minus one for all inputs less than a hat, and zero everywhere else. The second is a DCF key for b hat, which outputs one on everything less than b hat and zero elsewhere. The proposal was to add the outputs of these two DCFs together, and that gives you precisely what you wanted: one in the green region and zero everywhere else.
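The two-key construction for this no-wraparound case can be checked with a toy plaintext model of a DCF. Real DCF keys produce additive shares of these values; here we just model the reconstructed output over a small illustrative ring, with hypothetical values for a, b, and r:

```python
N = 64  # small ring Z_N for illustration

def dcf(alpha, beta):
    """Plaintext model of a DCF: outputs beta on inputs < alpha, 0 elsewhere."""
    return lambda x: beta % N if x < alpha else 0

A, B, r = 10, 20, 5                      # public boundaries, mask small: no wraparound
a_hat, b_hat = (A + r) % N, (B + r) % N  # shifted (secret) boundaries: 15 and 25

key_a = dcf(a_hat, -1)       # -1 on everything below a_hat
key_b = dcf(b_hat + 1, +1)   # +1 on everything up to and including b_hat

# Adding the two DCF outputs gives exactly the shifted interval indicator.
for x in range(N):
    want = 1 if a_hat <= x <= b_hat else 0
    assert (key_a(x) + key_b(x)) % N == want
```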
So this takes care of the first case. Now let's look at the second case. Here, the shift factor r is large enough that it causes the green region to partly wrap around, so the green region is now split into two parts, one on the right and one on the left, and we still want one in the green region and zero everywhere else. Let's follow the exact same strategy and see what happens. We have a DCF for a hat and a DCF for b hat; we add them together and we get something, but this time it doesn't look exactly like what we wanted. If you look closely, though, you'll see that what we are getting versus what we want are offset by an additive factor of plus one. So we can look at both of these cases slightly differently: the first case requires a correction of zero, and the second case requires a correction of one. And since the dealer knows which of the two cases we are in, because the dealer knows r as well as a and b, the dealer can give us a correction of zero or one depending on the case, and that allows us to evaluate the IC gate. This is the construction from prior work. Our plan for improving it is the following. The prior construction required two keys: one for computing x less than a hat and one for computing x less than b hat. What we're going to do is move to the unmasked world for just a second, pick an arbitrary c that is greater than both a and b, and then come back to the masked world and give out a key for x less than c hat, and we will reduce the keys for a hat and b hat to this single special key. If we can achieve that, then we will have reduced the keys for IC from two to just one. Obviously, this cannot happen without giving out some extra terms.
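In the same toy plaintext model, the wraparound case indeed comes out uniformly offset by one, and a single dealer-provided correction fixes it; this is a sketch with illustrative values, not the real keyed construction:

```python
N = 64  # small ring Z_N for illustration

def dcf(alpha, beta):
    """Plaintext model of a DCF: outputs beta on inputs < alpha, 0 elsewhere."""
    return lambda x: beta % N if x < alpha else 0

A, B, r = 10, 20, 50                     # mask large enough that the interval wraps
a_hat, b_hat = (A + r) % N, (B + r) % N  # 60 and 6: the green region splits in two

key_a = dcf(a_hat, -1)
key_b = dcf(b_hat + 1, +1)
correction = 1   # the dealer knows r, A, and B, so it knows this case needs +1

for x in range(N):
    want = 1 if (x >= a_hat or x <= b_hat) else 0
    assert (key_a(x) + key_b(x) + correction) % N == want
```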
So there will be some more terms needed, but the good thing is that those terms are pretty easy to compute and don't need FSS keys of their own. Let's look at our final construction now. On the right, I have the unmasked world, where I've chosen an arbitrary c greater than a. Let's see what things look like in the masked world. We again have two cases. In the first case, r is small enough that it doesn't cause either a hat or c hat to wrap around, so both of them shift to the right a little bit. I'm going to divide the ring into four regions. The first region is everything less than a hat; the fourth region is everything greater than c hat; and everything in the middle is divided into two regions, partitioned at the special point N minus one minus (c minus a). The relevance of this point will become clear in a moment. And this is what we want: the DCF for a hat. But we will not be given access to that key, and we somehow need to produce the same map using the key for c hat. Our idea is to shift everything right by c minus a. If you do that, a hat jumps onto the place where c hat used to be, and the special point becomes the rightmost point in the ring. Let's see what the DCF for c hat gives us here. It gives us one in regions 1, 3, and 4, but we actually wanted one in just the first region, so regions 3 and 4 are the problematic ones, where we get something other than what we want. A crucial observation, though, is that these two are exactly the regions that wrap around when everything is shifted right by c minus a. So if we penalize everything that wraps around when c minus a is added, and add that penalty to our DCF evaluation for c hat, we get the exact same map as the DCF for a hat. And this solves the first case.
And the major insight here, which lets this transformation work so nicely, is that this overflow is locally computable by both parties: they both know c, a, and x plus r, so they know which elements wrap around when c minus a is added to x plus r. Let's look at the second case. Here, r is big enough that it causes c hat to wrap around, but not a hat. We again divide the ring into four regions and follow the exact same strategy. This is the map we are looking for. We shift everything right by c minus a, and this is the map for the DCF for c hat. Regions three and four wrap around, so we penalize them, and when we add that penalty to our DCF evaluation for c hat, this is what we get. It's not exactly the same as the DCF for a hat, but if you look closely, it's offset by just an additive plus one. So we can again ask the dealer to give us a correction of zero or one, depending on whether we are in case one or case two. And this is our entire construction. We have a lot more in the paper. For example, we show how to do two-round fixed-point multiplication and prove a barrier for doing it in a single round with symmetric-key cryptography; we provide a construction for distributed key generation for DCFs, quite similar to the Doerner-shelat construction for DPFs; and we show how to handle malicious evaluators. If you are interested in any of that, I encourage you to take a look at our paper. Let's give ourselves a quick pat on the back for making it this far in the talk. The key takeaway that I want you to remember is that whenever you have multiple comparisons in a gate, we can ask the dealer to give us just a single key and some ring elements, and everything else can be taken care of by the two parties in the online phase, no matter how many comparisons there are in the gate.
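The shift-and-penalize trick can be checked end to end in the same toy plaintext model: the DCF key for c hat, plus a locally computable wrap penalty, reproduces the DCF for a hat, with an extra dealer correction of one in the second case. All concrete values are illustrative:

```python
N = 64  # small ring Z_N for illustration

def dcf(alpha, beta):
    """Plaintext model of a DCF: outputs beta on inputs < alpha, 0 elsewhere."""
    return lambda x: beta % N if x < alpha else 0

def lt_a_hat_via_c_hat(y, key_c, delta):
    """Evaluate 1_{y < a_hat} using only the key for c_hat.
    delta = c - a is public, and the wrap bit is locally computable
    by both parties from the masked input y and delta."""
    shifted = (y + delta) % N
    wrap = 1 if y + delta >= N else 0   # penalize inputs that wrap around
    return (key_c(shifted) - wrap) % N

A, C = 10, 30
delta = C - A

# Case 1: r small, neither a_hat nor c_hat wraps; no correction needed.
r = 5
a_hat, c_hat = (A + r) % N, (C + r) % N
key_c = dcf(c_hat, 1)
for y in range(N):
    assert lt_a_hat_via_c_hat(y, key_c, delta) == (1 if y < a_hat else 0)

# Case 2: r large, c_hat wraps but a_hat does not; the dealer sends +1.
r2 = 40
a_hat2, c_hat2 = (A + r2) % N, (C + r2) % N
key_c2, correction = dcf(c_hat2, 1), 1
for y in range(N):
    got = (lt_a_hat_via_c_hat(y, key_c2, delta) + correction) % N
    assert got == (1 if y < a_hat2 else 0)
```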
And when we couple that with our improved DCF key size, we achieve a much smaller key size for commonly found FSS gates, which implies faster preprocessing, which was the biggest bottleneck for the FSS approach. A concurrent work, AriaNN, also proposes optimizations for FSS, catered towards training and inference in neural networks; if you are interested in that, I encourage you to take a look at their paper. Thanks a lot for tuning in to the talk. Have a nice day.