Hi, everyone. I'm Mahimna Kelkar, a PhD student at Cornell. Today I'll be talking about our paper on MPC-friendly symmetric cryptography from alternating moduli. This is joint work with several wonderful co-authors from all over the world. So let me start with some motivation. Symmetric primitives are used extensively throughout cryptography. Computation was initially done by a single party, but recent applications require evaluation in a distributed setting, where inputs and keys are secret-shared among two or more parties. This is non-ideal for standard constructions like AES and SHA, which were optimized for centralized evaluation and not designed with this distributed setting in mind. This motivates a natural area of research: designing so-called MPC-friendly primitives from the ground up. Our work continues the exploration of this area. So let's take a step back and try to understand what goals would be useful for MPC-friendly cryptography. To start, the constructions should have as little non-linearity as possible, since non-linear functions are more challenging to implement in MPC. Concretely, we want to minimize two parameters: non-linear depth, which is the depth of the non-linear components, and non-linear size, which is the total number of non-linear gates. The type of non-linear operation does matter, but these are good enough proxies. Our other design goals revolve around security. First, the constructions should have high algebraic degree, which is beneficial for security. Second, we want simple designs with simple security conjectures, because they're easier to implement with fewer errors and are usually easier to reason about and cryptanalyze. So how can constraints like small non-linear components and high algebraic degree coexist?
So towards settling this apparent conflict, a line of work started by Boneh et al. and continued by Cheon et al. looked at a new paradigm for constructing primitives. This was advertised as part of the so-called crypto dark matter, which they defined as a large space of unexplored areas of cryptography, focusing primarily on simple constructions and efficient protocols for practical applications. Their focus was on the alternating moduli paradigm, where the high-level idea is to build constructions that compose linear functions over different moduli. Linear functions are easy to evaluate, so these constructions are quite efficient, and since most of the computation is linear, the non-linear size is also low. The security of these constructions needs to be carefully analyzed, and the two papers I mentioned started the study in this direction. Before I describe the contributions of our work, I'll set the stage with the candidate weak PRF proposed by Boneh et al. in this alternating moduli paradigm. As a quick reminder, a pseudorandom function is a keyed function that looks like a truly random function: it's difficult for a poly-time adversary to distinguish between an oracle that uses the PRF and an oracle that uses a truly random function. For a weak PRF in particular, there's the added restriction that inputs are chosen uniformly at random instead of being picked by the adversary. The main candidate proposed by Boneh et al. was the 2-3 weak PRF, which composes linear functions mod 2 and mod 3. It is parametrized by the length n of the input, the length m of the intermediate vector, and the length t of the output, and it uses a public Z3 matrix B of size t by m. For the construction, the key is a Z2 matrix K of size m by n, the input is an n-bit vector, and the output is t Z3 elements. The candidate works as follows.
First, it multiplies the secret key matrix with the input; this is a non-compressive secret linear map over Z2. Next, w = Kx, which is a Z2 vector, is reinterpreted as a 0-1 vector over Z3. Finally, this is multiplied with the public matrix B over Z3 to get the output y. In other words, the moduli are mixed by first taking a linear map over Z2, then converting the intermediate vector to Z3, and then computing a linear map over Z3. So why is this conjectured to be secure? As it turns out, a linear map over Z2 is a high-degree function over Z3 and vice versa, and high degree means the function cannot be approximated by low-degree polynomials. Another conjecture from Boneh et al. implies hardness of learning for low-complexity classes, like depth-2 ACC0 and width-3 branching programs. Our work continues the exploration of new candidate constructions in this alternating moduli paradigm. While Boneh et al. only consider PRFs, we ask whether we can construct simple candidates for other primitives, like one-way functions and pseudorandom generators. We'll also present a candidate weak PRF where the key, input, and output are all over Z2; this is useful for settings where the output of one PRF evaluation is kept secret and fed in as input to the next evaluation. For all of our candidates, we perform substantial cryptanalysis and use it to inform our parameter choices. Later, we'll give efficient distributed protocols for several settings, with and without preprocessing, and all of the protocols consider semi-honest parties. Finally, we'll talk about some applications; the two primary applications we considered are OPRFs and signatures. I'll now proceed to introduce our candidate constructions. The general structure of all of our candidates is quite similar, parameterized by n, m, and t as before.
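The 2-3 weak PRF described above can be written out as a short sketch. This is a toy Python rendering with list-based matrices; the function name `wprf_23` and the matrix sampling are my own illustration, and the parameters are placeholders rather than the paper's concrete choices:

```python
import secrets

def keygen(m, n):
    """Sample a random m-by-n key matrix K over Z2."""
    return [[secrets.randbelow(2) for _ in range(n)] for _ in range(m)]

def wprf_23(K, B, x):
    """2-3 weak PRF candidate: y = B * (K*x mod 2) mod 3.

    K: secret m-by-n matrix over Z2; B: public t-by-m matrix over Z3;
    x: n-bit input. Output: t elements of Z3.
    """
    m, n = len(K), len(K[0])
    # Step 1: non-compressive secret linear map over Z2: w = K*x mod 2.
    w = [sum(K[i][j] & x[j] for j in range(n)) % 2 for i in range(m)]
    # Step 2: reinterpret the 0/1 vector w over Z3 (no computation needed).
    # Step 3: compressive public linear map over Z3: y = B*w mod 3.
    t = len(B)
    return [sum(B[i][j] * w[j] for j in range(m)) % 3 for i in range(t)]
```

Here the only non-linear behavior comes from mixing the two moduli; each individual map is linear over its own modulus.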
The core design is to first map the input x to an intermediate vector w with non-compressive linear maps, and then map w to the output y with compressive linear maps, where these linear maps are over different moduli. The first construction I'll describe is our candidate one-way function. Recall that a one-way function is an efficiently computable function that's hard to invert: given a random y, it's difficult for an efficient adversary to compute a pre-image. Our candidate 2-3 one-way function has the same structure as the 2-3 weak PRF candidate, except that the key K is replaced with a public matrix A. So now we have a public linear map over Z2, which uses A, followed by a public linear map over Z3, which uses B. I'll briefly describe an interesting attack on the 2-3 one-way function based on a reduction to subset sum. This is the only attack I'll describe in this talk; a lot more cryptanalysis was done for this candidate as well as the other candidates, and all of it can be found in the full version of our paper. For the 2-3 one-way function, the attacker is given an output y and attempts to invert it. Given a vector w, there exists a parity-check matrix P such that Pw = 0 mod 2 if and only if there exists an x such that Ax = w. So when there is an inverse x for a given y, we have Pw = 0 mod 2 and Bw = y mod 3. This boils down to finding a subset J of indices from 1 to m such that the sum over j in J of P·e_j mod 2 is 0, and the sum over j in J of B·e_j mod 3 is y. Here e_j is the j-th unit vector, which has zeros everywhere except for the j-th entry, which is 1. It's now easy to see that this is the subset sum problem, except over Z2 × Z3 instead of the integers, with the unit vectors as the variables. We choose our parameters to resist attacks based on advanced subset sum algorithms.
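To make the one-wayness goal concrete, here is a toy sketch of the 2-3 one-way function together with a naive exhaustive inversion. The subset-sum attack above is far more clever; the brute-force search below is only my own illustration of the 2^n baseline it improves on, runnable only at tiny parameters:

```python
from itertools import product

def owf_23(A, B, x):
    """2-3 OWF candidate: y = B * (A*x mod 2) mod 3, with A and B both public."""
    m, n = len(A), len(A[0])
    w = [sum(A[i][j] & x[j] for j in range(n)) % 2 for i in range(m)]
    t = len(B)
    return [sum(B[i][j] * w[j] for j in range(m)) % 3 for i in range(t)]

def invert_brute_force(A, B, y):
    """Exhaustive inversion in time 2^n; real attacks (e.g. via the subset-sum
    reduction) do better, and parameters are chosen to resist them."""
    n = len(A[0])
    for x in product([0, 1], repeat=n):
        if owf_23(A, B, list(x)) == y:
            return list(x)
    return None  # y has no pre-image
```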
The second construction I'll introduce is our candidate weak PRF where the key, input, and output all need to be over Z2. For this candidate, we multiply the key and the input both over Z2 and over Z3. More specifically, we compute u = Kx mod 2 and v = (Kx mod 3) mod 2, and the intermediate vector w is taken as u + v over Z2. Finally, as in the previous construction, y is computed from w using the public matrix B; the only difference is that B is now over Z2 rather than Z3. For this construction, each bit of w can be thought of as a deterministic LPN instance with noise rate one third. The noise is provided by v, and each noise bit depends on the input x and a specific column of the key K. It turns out that this candidate is quite similar to the alternative weak PRF candidate proposed by Boneh et al., but with a couple of key differences. For one, their output was a single bit, which was basically the intermediate vector, and consequently they did not have the final compressive linear mapping using B. From a cryptanalysis standpoint, having multiple output bits decreases security, but from our analysis, the final compressive mapping offsets this loss. Finally, I'll briefly introduce our candidate length-doubling PRG, which needs to stretch an n-bit seed into a 2n-bit string that looks uniformly random. The PRG has essentially the same structure as the LPN weak PRF, but instead of using the key K, we have a public matrix A over Z2, and for security we need the intermediate vector to be slightly longer. We performed extensive cryptanalysis on all of our candidates, including the one from Boneh et al., and used it to influence our parameter choices. In this work, we primarily focused on combinatorial attacks and statistical tests for S-bit security. Here's the parameter table for our candidates.
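The LPN-style candidate above can be sketched directly from its description. As before, this is a toy illustration (the name `wprf_lpn` and parameters are my own); the PRG candidate is obtained by replacing the secret K with a public matrix A and lengthening the intermediate vector:

```python
def wprf_lpn(K, B, x):
    """Mod-2/Mod-3 "LPN-style" weak PRF: key, input, and output all over Z2.

    K: secret m-by-n matrix over Z2; B: public t-by-m matrix over Z2;
    x: n-bit input. Output: t bits.
    """
    m, n = len(K), len(K[0])
    # u = K*x mod 2, and v = (K*x mod 3) mod 2, with the same bits of K and x
    # reinterpreted as Z3 elements for the second product.
    u = [sum(K[i][j] * x[j] for j in range(n)) % 2 for i in range(m)]
    v = [(sum(K[i][j] * x[j] for j in range(n)) % 3) % 2 for i in range(m)]
    # w = u XOR v; v plays the role of deterministic LPN noise of rate 1/3.
    w = [u[i] ^ v[i] for i in range(m)]
    # Final compressive public linear map, now over Z2.
    t = len(B)
    return [sum(B[i][j] * w[j] for j in range(m)) % 2 for i in range(t)]
```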
So we give both aggressive and conservative choices for our 2-3 candidates. Notably, our one-way function and PRG candidates have S-bit inputs, which is minimal for S-bit security. This is possible in principle for the LPN weak PRF as well, but at the cost of other parameters, which hurts MPC-friendliness. One other thing to note is that since our constructions are quite new, it's not unlikely that even the conservative parameters could be broken, so we definitely welcome more cryptanalysis. I'll now move on to describing efficient protocols for evaluating our candidate constructions. As mentioned before, we focus on the semi-honest setting and provide protocols for several distributed settings, with and without preprocessing. All of our constructions use just five types of gates, and protocols for these gates can be generically composed to evaluate any candidate. The five gates are pretty straightforward. The first three are commonly used linear, addition, and bilinear gates. Then we have two modulus conversion gates to convert between shares in Z2 and Z3: the Z2-to-Z3 gate converts shares of x over Z2 to shares of the same x over Z3, and the Z3-to-Z2 gate converts shares of x over Z3 to shares of x mod 2 over Z2. We provide efficient protocols for these modulus conversion gates. In terms of gate protocols, the linear and addition gates can be computed locally, which follows directly from the linear homomorphism of additive secret sharing. For the bilinear gate, where both the matrix and the input are secret-shared, we can either use standard multiplication triples, or something like OLE or vector-OLE correlations for unkeyed primitives or when the same key is used to evaluate multiple inputs. For our modulus conversion gates, we use a special type of preprocessing. To convert from Z2 to Z3, we use a 2-3 correlation, which is a random bit that's shared both over Z2 and over Z3.
And for the conversion from Z3 to Z2, we use preprocessing of the following form: a random r shared over Z3, r mod 2 shared over Z2, and (r + 1 mod 3) mod 2, also shared over Z2. Given these correlations, the protocols for evaluating the gates turn out to be quite straightforward, so I won't go into detail here; later I'll describe efficient techniques to generate the correlations we require. These gate protocols can be easily composed to compute the overall protocol cost for our constructions, and also for other similar constructions in this alternating moduli paradigm. We can optimize some of the preprocessing cost using standard techniques, like using the same mask for the same input, or compressing the preprocessing using a PRG. For our constructions in particular, since the bilinear gate often feeds into a modulus conversion gate, which requires reconstructing the masked input as its first step, we can also reduce the preprocessing by masking the output within the bilinear gate itself. Another highlight is that all of our preprocessing is PCG-friendly, in that the correlated randomness we require can be generated with sublinear communication cost and good computational efficiency for multiple instances at the same time. Here's a table with the concrete costs for evaluating our candidates in different settings, including distributed 2PC, distributed 3PC with replicated shares and no preprocessing, and public-input 2PC with preprocessing. Online communication is given in bits, number of messages, and number of rounds, and preprocessing is given both without and with compression. As I mentioned earlier, I'll now talk briefly about how we can distribute the dealer. For the bilinear gate, the preprocessing can be easily compressed using existing PCGs for OLE and vector OLE, and there are results on getting k correlations with far fewer than k bits of communication.
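To make the modulus conversion gates concrete, here is one natural way the two correlations described above can be consumed in a two-party protocol. This is my own sketch derived from the correlation definitions, not necessarily the paper's exact protocol: in each conversion the parties open one masked value and then apply a public correction to their correlation shares. The `share*`/`deal_*` helpers simulate a trusted dealer, and the "opening" is simulated by combining both shares in one place:

```python
import secrets

def share2(x):
    """Additive sharing of a bit over Z2."""
    s0 = secrets.randbelow(2)
    return s0, x ^ s0

def share3(x):
    """Additive sharing of a Z3 element."""
    s0 = secrets.randbelow(3)
    return s0, (x - s0) % 3

def deal_23_correlation():
    """Dealer: a random bit r shared both over Z2 and over Z3."""
    r = secrets.randbelow(2)
    return share2(r), share3(r)

def convert_2_to_3(x_sh, corr):
    """[x]_2 -> [x]_3: open m = x XOR r, then use x = m + (1 - 2m)*r,
    evaluated on the Z3 sharing of r."""
    r2, r3 = corr
    # In a real protocol each party sends its masked share; simulated here.
    m = (x_sh[0] ^ r2[0]) ^ (x_sh[1] ^ r2[1])
    y0 = (m + (1 - 2 * m) * r3[0]) % 3   # one party adds the public m
    y1 = ((1 - 2 * m) * r3[1]) % 3
    return y0, y1

def deal_32_correlation():
    """Dealer: [r]_3 plus Z2 sharings of u = r mod 2 and v = (r+1 mod 3) mod 2."""
    r = secrets.randbelow(3)
    return share3(r), share2(r % 2), share2(((r + 1) % 3) % 2)

def convert_3_to_2(x_sh, corr):
    """[x]_3 -> [x mod 2]_2: open m = x + r mod 3, then pick a public
    Z2-linear combination of u and v depending on m (checkable case by case)."""
    r3, u2, v2 = corr
    m = (x_sh[0] + r3[0] + x_sh[1] + r3[1]) % 3
    if m == 0:
        return 1 ^ u2[0] ^ v2[0], u2[1] ^ v2[1]   # x mod 2 = 1 + u + v
    if m == 1:
        return v2                                  # x mod 2 = v
    return u2                                      # x mod 2 = u
```

Both conversions use one opening, so they cost a single round given the preprocessed correlation.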
For the modulus conversion gates, we show how to efficiently generate the preprocessing from OT correlations, and since OT correlations can be easily compressed with PCGs, all of our preprocessing can be efficiently generated. Overall, distributing the dealer does not add too much to the online cost; as a concrete example, for the 2PC setting with the 2-3 weak PRF, the total online cost is only 23% higher. So now I'll talk briefly about how we generate our specific correlations. The functionality for the 2-3 correlation is to generate shares of the same bit over Z2 and Z3, and for this we use 1-out-of-2 OT correlations over Z3. Here, Party 1 is given x0 and x1, and Party 2 is given a bit b and the corresponding xb. When x0 and x1 are different, the conversion between the OT correlation and the 2-3 correlation can be done locally, so all that remains is for Party 1 to communicate to Party 2 which correlations to throw out; in expectation this takes 1.3 bits of communication per instance. For the 3-2 correlations that we require, the functionality is to generate Z2 shares of u and v and Z3 shares of r such that u = r mod 2 and v = (r + 1 mod 3) mod 2. For this, we use 1-out-of-3 OT correlations where each entry is a 2-bit string. The parties use these correlations in an OT protocol where Party 1's input is a random permutation of the Z3 truth table for u and v. So essentially we have two bits for each Z3 value, and a total of six bits of communication in one message from P1 to P2 per instance. I'll now move on to talk briefly about our implementation and benchmarks. We implemented our 2-3 weak PRF candidate; I'll first talk about the centralized implementation and the optimizations we used. First, we represent Z2 vectors using 64-bit machine words and operate on them in a SIMD manner.
Next, we represent a Z3 vector with two Z2 vectors, one holding the least significant bits and one the most significant bits of its entries. Third, we use a lookup table for the public matrix multiplication. Overall, these optimizations make the implementation 25 times faster than the baseline. We also implemented our two-party distributed evaluation protocol, along with the original protocol from Boneh et al., using the same optimizations from the previous slide. We found that our distributed protocol is about two to three times better than theirs on all fronts: preprocessing, computation, and communication. The core reason is our efficient modulus conversion gates, versus Boneh et al., who used OT in their protocols. I'll now briefly describe some of the applications we considered. The first application is oblivious evaluation of PRFs. Here there are two parties: the server, who has the PRF key, and the client, who has the input. The goal of the protocol is for the client to learn only the output, while the server learns nothing about the input. We provide two efficient protocols for computing the 2-3 weak PRF in an oblivious manner. When the same key is used to evaluate multiple inputs, we don't need to mask the key again, so we pull this part out into a key-update phase, which only needs to be run when the server wants the key to be changed. For the actual input evaluation, the overall protocol is quite similar to the distributed evaluation, but since the key has already been masked, some of the messages can be combined and sent in earlier rounds. In the protocol, the client computes its masked input and its share of the masked intermediate vector and sends both to the server.
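As an illustration of the LSB/MSB representation, here is a bit-sliced addition mod 3 on packed 64-bit words in Python. The boolean formulas are my own derivation from the encoding 0 = (0,0), 1 = (0,1), 2 = (1,0); the paper's implementation may use different formulas:

```python
MASK = (1 << 64) - 1  # 64 packed Z3 elements per (high-word, low-word) pair

def z3_add(ah, al, bh, bl):
    """Lane-wise addition mod 3 of packed Z3 vectors.

    Each element x in {0,1,2} is stored as two bits (h, l) = (x >> 1, x & 1);
    the h bits of all 64 lanes live in one machine word, the l bits in another,
    so every boolean operation below acts on 64 lanes at once.
    """
    a_is0 = ~(ah | al) & MASK   # lanes where a == 0
    b_is0 = ~(bh | bl) & MASK   # lanes where b == 0
    # sum == 1 exactly for the pairs (0,1), (1,0), (2,2)
    cl = (a_is0 & bl) | (al & b_is0) | (ah & bh)
    # sum == 2 exactly for the pairs (0,2), (2,0), (1,1)
    ch = (a_is0 & bh) | (ah & b_is0) | (al & bl)
    return ch & MASK, cl & MASK
```

With native 64-bit words (or wider SIMD registers in C), this processes 64 Z3 additions per handful of bitwise instructions, which is the style of speedup described above.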
The server then computes its own share, uses it to reconstruct the masked intermediate vector, computes its output share of y, and sends both to the client. The client uses these to compute its own share of y and then uses the server's share to reconstruct the output locally. Note that for the input evaluation the communication is optimal: just one message from the client to the server and one message from the server to the client. Our two protocols differ only in how the key gets masked: in the first protocol the mask is additive, and in the second it is multiplicative. We compared our two OPRFs to each other as well as to the standard DDH-based OPRF. The second OPRF has a smaller evaluation cost, both in communication and computation, but a higher cost to update the key, since that involves a matrix multiplication. Still, both of our constructions are much faster to compute than a DDH-based OPRF, though they do require preprocessing and slightly more communication. In fact, a highlight is that our protocols are faster than a single modular exponentiation or elliptic curve point multiplication, and there's still room for hardware-based improvements, so the gain might be even higher. Our protocols are also two to three times better than the OPRF protocol from Boneh et al. for the same 2-3 weak PRF. From the table, you can see that our protocols require 897 and 641 bits of communication respectively, compared to 512 for the DDH-based one, so the communication cost is only slightly higher. Even if no preprocessing is allowed, our online cost is only about 30 to 39% higher, depending on how frequently the key needs to be changed. Further, if we compare communication and computation together, on roughly a six Mbps network our first protocol would be overall faster than the DDH-based OPRF.
And for the second protocol, this happens with a two Mbps network, which should be very reasonable in practice. The other application I'll discuss is signature schemes. At a high level, given a one-way function and a distributed protocol for computing it, Picnic provides a generic method to construct a signature scheme using MPC-in-the-head. Basically, you run M instances of an N-party protocol in your head, each computing the one-way function, and then open tau of these instances so they can be checked by the verifier. Opening here means revealing the transcripts of those parties in the MPC and allowing the verifier to check that they're consistent. Finally, you can use the Fiat-Shamir transform to get non-interactivity in the standard way; these signatures can also be made post-quantum using other transforms. We compared the signature sizes when using our 2-3 one-way function versus LowMC, which is used in standard Picnic. For the same parameters, the signature sizes are quite comparable; in fact, they're slightly smaller across the board using our one-way function. For example, at 128-bit security with the same Picnic parameters, our one-way function gives a signature size of 10.66 kilobytes versus 12.36 kilobytes with LowMC. Another benefit is that using our one-way function could give much better communication for distributed signature generation, where many different parties compute the signature jointly: LowMC results in many rounds of communication since it has high depth, whereas our 2-3 one-way function requires much less communication. I'll also briefly comment on other applications. Our biggest competitive advantage over other approaches is likely to be the two- and three-party evaluation of PRGs and the weak PRF in the fully distributed setting; in particular, current techniques like OPRFs don't work in this setting.
So for these kinds of applications, our candidates are best suited. For instance, our fully distributed weak PRF protocols can be used for distributed searchable encryption. Also, since our LPN weak PRF has inputs and outputs over Z2, it's natural to use it for applications like hierarchical key derivation, where the output of one evaluation needs to be kept secret and fed in as input to the next evaluation. Another application is to use the length-doubling PRG to securely compute keys for function secret sharing. Finally, I'll conclude with some remarks. The space of symmetric primitives with simple designs is still very much unexplored, and more cryptanalysis is definitely needed. As for open directions, it would be interesting to have constructions for other primitives like block ciphers and hash functions, and it would also be nice to have efficient protocols with malicious security, which is a natural follow-up question. Thank you.