Hi, my name is Stan Peceny. In this talk, I will present our paper, Masked Triples: Amortizing Multiplication Triples Across Conditionals. This work was done by myself, David Heath, and Vlad Kolesnikov, and we work at Georgia Tech in Atlanta, Georgia. In this work, I will be discussing a classic secure multi-party computation protocol, which uses preprocessed multiplication triples to evaluate arbitrary functions represented as Boolean circuits. A Boolean circuit consists of nonlinear AND gates and linear XOR gates. The circuit is evaluated gate by gate on secret-shared values. While evaluating the linear XOR gates is essentially free, the nonlinear AND gates require costly interactive primitives. More specifically, AND gates are evaluated with Beaver triples, which are expensive to generate. In our work, we reduce the cost of generating multiplication triples, which significantly improves both the communication and the runtime of the triple-based protocol. We do so by carefully reusing multiplication triples across the conditional statements of the source program. Hence, if a program has an if or a switch statement, we can take advantage of the fact that only one of the branches is actually taken. Traditionally, it has been assumed that we need multiplication triples for all branches in a conditional. This is because we need to ensure security, and each multiplication triple can be used to evaluate at most one AND gate. In this work, we show that this is unnecessary: the number of triples we need is proportional only to the size of the single longest branch. Prior work has looked into conditional branching improvements under other protocols. Stacked garbling is a line of research that has looked into the conditional branching problem in Yao's garbled circuits, that is, in the two-party setting.
The first work to achieve significant conditional branching improvements in the multi-party setting was MOTIF, a work done by myself, David Heath, and Vlad Kolesnikov, presented at Asiacrypt 2020. MOTIF works with the GMW protocol, which is closely related to the triple-based protocol: both protocols evaluate Boolean circuits, work for any number of parties, and are secure against a dishonest majority. Our approach thus works in a setting similar to MOTIF's. However, our solution is distinct and yields a significant improvement for conditionals over MOTIF. I will now elaborate on MOTIF's limitations. The key cost in the GMW protocol is precomputing random oblivious transfers, which are then used to evaluate the circuit's AND gates. Oblivious transfers, or OTs for short, are expensive public-key primitives, and MOTIF's improvement stems from amortizing OTs across conditional branches. In GMW, all available AND gates are computed simultaneously in layers. At any time, only the AND gates whose input shares have already been computed are evaluated. MOTIF's limitation is that it cannot amortize OTs over AND gates in future layers: in each layer of the GMW computation, one can amortize OTs only over the gates that are ready. Thus, MOTIF's improvement depends on branch alignment, meaning the branches need to have a similar number of AND gates in each circuit layer for optimal performance. In the sample conditional, I show two layers, both of which are unaligned. In this conditional, MOTIF would pay the cost of four AND gates, while the longest branch has only three AND gates. Thus, the example conditional is not an ideal fit for MOTIF's protocol. Our work shows that this limitation of MOTIF can be quite significant. Our new work is called Masked Triples, and its benefit is that our improved protocol truly pays only for the longest conditional branch, independently of branch alignment.
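To make the alignment issue concrete, here is a rough cost model in Python, under a simplifying assumption of ours: MOTIF-style amortization pays, in each circuit layer, for the branch with the most AND gates in that layer, while an alignment-independent approach pays only for the branch with the most AND gates overall. The per-layer counts below are hypothetical, chosen only to reproduce the four-versus-three gap described above; they are not read off the actual slide.

```python
from itertools import zip_longest

def per_layer_cost(branches: list[list[int]]) -> int:
    """Simplified MOTIF-style model: each layer costs as much as its largest branch layer."""
    return sum(max(layer) for layer in zip_longest(*branches, fillvalue=0))

def longest_branch_cost(branches: list[list[int]]) -> int:
    """Alignment-independent model: pay only for the branch with the most AND gates."""
    return max(sum(branch) for branch in branches)

# Each inner list holds the number of AND gates per layer of one branch (hypothetical).
unaligned = [[2, 1], [1, 2]]
aligned = [[2, 1], [2, 1]]
print(per_layer_cost(unaligned), longest_branch_cost(unaligned))  # 4 3
print(per_layer_cost(aligned), longest_branch_cost(aligned))      # 3 3
```

In this toy model, perfectly aligned branches make the two costs coincide, and any misalignment can only make the per-layer cost larger.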
Recall that Masked Triples improves the triple-based protocol, which is closely related to the GMW protocol. Similarly to MOTIF, we reduce the number of expensive random OTs. In the triple-based protocol, random OTs are one way to generate multiplication triples, which we amortize across conditional branches. Unlike in MOTIF, the number of OTs we need to generate is proportional to the size of the longest conditional branch. So, I have introduced the key idea of our improvement. Now, before getting into its technical aspects, I will briefly review how the triple-based protocol works. The triple-based protocol evaluates an arbitrary Boolean function in four steps. First, the function is represented as a Boolean circuit. Next, the parties secret share their input bits and send the appropriate shares to the other parties. Then, all parties step through the circuit gate by gate, ensuring they hold a valid XOR secret share of the true bit on each wire. After evaluating all gates, the parties reconstruct the output. Let's start with a simple circuit. For simplicity, we consider only two parties, Alice and Bob, but the extension to any number of parties is straightforward. The circuit on this slide consists of a single AND gate and a single XOR gate. Alice holds input X and Bob holds input Y. Alice secret shares her input X into X1 and X2. Symmetrically, Bob secret shares his input Y into Y1 and Y2. Then, Alice sends X2 to Bob and Bob sends Y1 to Alice. Alice and Bob now each hold one XOR share of X and one XOR share of Y. Note that the XOR gate's constant input 1 can be trivially secret shared by Alice setting her share to 1 and Bob setting his share to 0. Alice and Bob next evaluate the circuit gate by gate and compute valid XOR shares on the output wires of all gates. The last step is to reconstruct the output: Alice and Bob send each other their shares of the output and compute XY XOR 1.
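A minimal Python sketch of the input-sharing and output-reconstruction steps for Alice and Bob, assuming bits are Python ints and using the standard `secrets` module for randomness. The helper names `share`, `share_constant`, and `reconstruct` are ours, for illustration only; the AND gate itself needs a triple and is not shown here.

```python
import secrets

def share(bit: int) -> tuple[int, int]:
    """Split a secret bit into two XOR shares; each share alone is uniformly random."""
    first = secrets.randbits(1)
    return first, bit ^ first

def share_constant(bit: int) -> tuple[int, int]:
    """A public constant needs no randomness: Alice holds the bit, Bob holds 0."""
    return bit, 0

def reconstruct(alice_share: int, bob_share: int) -> int:
    """Both parties exchange their output shares and XOR them together."""
    return alice_share ^ bob_share

x, y = 1, 1
x1, x2 = share(x)   # Alice keeps x1 and sends x2 to Bob
y1, y2 = share(y)   # Bob keeps y2 and sends y1 to Alice
alice = (x1, y1)    # Alice's shares of X and Y
bob = (x2, y2)      # Bob's shares of X and Y
one_alice, one_bob = share_constant(1)
assert reconstruct(alice[0], bob[0]) == x and reconstruct(alice[1], bob[1]) == y
assert reconstruct(one_alice, one_bob) == 1
```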
So, I have explained how a Boolean circuit is evaluated in the triple-based protocol, and now I will demonstrate how each gate is evaluated. XOR gates are virtually free and are evaluated locally without interaction: Alice and Bob simply add (XOR) their shares. Unlike XOR gates, AND gates are expensive, since they require expensive, interactive primitives. The key idea, due to Beaver, is that if you have a way to compute a random AND gate, you can then perform cheap linear arithmetic and some cheap broadcasts to compute any AND gate. These random AND gates are the multiplication triples, and they are expensive. We thus separate AND gate evaluation into two steps: a preprocessing step, where a triple is generated, and an evaluation step, where the triple is used alongside linear arithmetic and bit broadcasts to compute the AND gate. First, let's look at the first step, that is, preprocessing a triple, which means computing a random AND gate. In this step, the parties draw two bits A and B uniformly and compute their product. The output is a uniform sharing of A, B, and their product AB. This is achieved by each party uniformly drawing a bit for its share of A and of B. These shares are then input to an MPC protocol, which outputs a share of the product AB to each party. The MPC protocol can be, for example, a single GMW AND gate. The point is that this MPC protocol requires communication and is expensive. Now that we have preprocessed a triple, we are ready to evaluate the AND gate. We input the preprocessed triple, that is, the bits A and B and their product AB, as well as the actual AND gate inputs X and Y. We then use cheap linear arithmetic and broadcast two bits to get a sharing of the output Z. This is a well-known protocol; for the algebraic details, please see our paper. We have shown how to evaluate an AND gate. Let's now analyze its cost. The evaluation phase is cheap and requires broadcasting only two bits. The preprocessing phase is much more involved.
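The evaluation step is Beaver's standard technique. Below is a minimal two-party Python sketch, assuming the triple comes from a trusted dealer: `dealer_triple` is a hypothetical stand-in for the OT-based preprocessing described above, not how the protocol actually generates triples. Each party broadcasts its shares of D = X XOR A and E = Y XOR B; since XY = (D XOR A)(E XOR B) = AB XOR DB XOR EA XOR DE, the parties then derive shares of Z = XY with only local arithmetic.

```python
import secrets

def share(bit):
    """Split a bit into two uniformly random XOR shares."""
    first = secrets.randbits(1)
    return first, bit ^ first

def dealer_triple():
    """Hypothetical stand-in for preprocessing: shares of uniform A, B and of C = A AND B."""
    a, b = secrets.randbits(1), secrets.randbits(1)
    return share(a), share(b), share(a & b)

def xor_gate(u, v):
    """Local and free: each party XORs its own two shares."""
    return u[0] ^ v[0], u[1] ^ v[1]

def and_gate(x, y, triple):
    """Beaver evaluation of one AND gate using one (never reused) triple."""
    a, b, c = triple
    # Each party broadcasts two bits, its shares of D and E; both then know D and E.
    d = (x[0] ^ a[0]) ^ (x[1] ^ a[1])
    e = (y[0] ^ b[0]) ^ (y[1] ^ b[1])
    # Shares of C ^ D*B ^ E*A ^ D*E; only party 0 adds the public term D*E.
    z0 = c[0] ^ (d & b[0]) ^ (e & a[0]) ^ (d & e)
    z1 = c[1] ^ (d & b[1]) ^ (e & a[1])
    return z0, z1

# The talk's example circuit: (X AND Y) XOR 1, with the constant shared as (1, 0).
for x_bit in (0, 1):
    for y_bit in (0, 1):
        z = xor_gate(and_gate(share(x_bit), share(y_bit), dealer_triple()), (1, 0))
        assert z[0] ^ z[1] == (x_bit & y_bit) ^ 1
```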
If we generate triples with a GMW AND gate, we need to compute two OTs in the two-party setting. Recall that OTs are public-key primitives and thus are expensive. If we generate OTs with the OT extension of Ishai, Kilian, Nissim, and Petrank from CRYPTO 2003, each one-bit multiplication costs communication on the order of the computational security parameter κ, which is usually 128 bits. We note that recent advancements have shown that oblivious transfers can be extended without the expensive O(κ) communication overhead. However, these works still require the parties to perform significant computation to generate the correlated randomness associated with OTs. While we stress communication reduction, more generally we simply reduce the number of needed OTs. Thus, even when using these new and powerful primitives, it is worthwhile to use our approach in order to reduce computation. I have demonstrated that generating triples for AND gate evaluation is expensive. Hence, we would like to reduce the number of triples needed in the protocol. Let's consider a conditional with n branches and n AND gates across all branches. Normally, we would need to generate n distinct triples, that is, one triple per AND gate. This is because the triple values A and B are essentially one-time pads on cleartext values in the circuit. Thus, if we use a triple for two different AND gates, we violate security. We would like to be able to generate a single triple for all these AND gates, but we need to take care to do it securely. If we are able to securely reuse a triple on AND gates across all branches, we have effectively made all untaken branches free. I will now demonstrate how we can reuse triples securely. Our approach works for any number of branches, but for simplicity, let's consider just two branches, where the top branch is the active, that is, the taken branch. I emphasize that neither party knows which branch is active.
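To see why a triple cannot simply be reused, here is a tiny Python check, assuming the same public broadcast D = X XOR A as in Beaver's evaluation. If one triple bit A masks two different wire values, XORing the two broadcasts cancels A and reveals the XOR of the two cleartext values, which is exactly the failure of a reused one-time pad. The function name `reuse_leak` is ours, for illustration.

```python
import secrets

def reuse_leak(x_first: int, x_second: int) -> int:
    """Broadcast D = X ^ A at two AND gates that share the same triple bit A,
    and return what an observer learns by XORing the two public broadcasts."""
    a = secrets.randbits(1)    # the one-time pad from a single triple
    d_first = x_first ^ a      # public broadcast at the first AND gate
    d_second = x_second ^ a    # public broadcast at the second AND gate
    return d_first ^ d_second  # A cancels out

for x_first in (0, 1):
    for x_second in (0, 1):
        assert reuse_leak(x_first, x_second) == x_first ^ x_second
```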
We would like to use the same triple on both AND gates in the two branches. This idea of reusing the same expensive resource is that of material reuse, and it stems from the stacked garbling line of research discussed earlier in the presentation. The key idea behind reusing triples securely is that we can carefully apply secret-shared masks to the triples. For the inactive branch, the parties mask their triple shares with uniform XOR shares of uniform masks r and s, randomizing the triple and thereby preserving the security of the one-time pad. By randomizing the triple, we violate the correctness of AND gates on the inactive branch, but this is of no concern: the output of each inactive AND gate is ultimately discarded. For the active branch, the parties use the triple as is, meaning the active branch is evaluated normally. Of course, the parties should not know which branch is inactive, so from their perspective it should appear plausible that either branch used a randomized triple. To achieve this, the parties also XOR masks onto the triple for the active branch, but in this case each mask is a uniform sharing of zero, hence the XORing is a no-op. We stress that masking only the A and B bits of the triple is necessary and sufficient. This is because these two bits are used to mask the broadcasts during the AND gate evaluation phase. The third value in the triple does not need to be masked: each party holds a share of it but uses it only in local computation, on the inactive branch together with garbage values, and nothing learned from these local operations gives any useful information. Importantly, our approach yields significant concrete improvement because generating masks is much cheaper than generating additional triples. I have shown that if we are able to cheaply generate triple masks as described on the previous slides, we can reuse triples across conditional branches, thereby driving down the cost of conditional evaluation.
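A minimal two-party Python sketch of masking one triple for two branches, assuming a Beaver-style evaluation with a dealer-generated triple (`dealer_triple` and `mask_triple` are hypothetical names of ours). Only the A and B shares are XORed with masks; the C shares are left untouched. The active branch receives sharings of zero, so its AND gate stays correct; the inactive branch receives sharings of uniform r and s, so its broadcasts are freshly padded and its output is garbage to be discarded.

```python
import secrets

def share(bit):
    """Split a bit into two uniformly random XOR shares."""
    first = secrets.randbits(1)
    return first, bit ^ first

def dealer_triple():
    """Hypothetical stand-in for preprocessing one triple (A, B, C = A AND B)."""
    a, b = secrets.randbits(1), secrets.randbits(1)
    return share(a), share(b), share(a & b)

def and_gate(x, y, triple):
    """Beaver evaluation: broadcast D = X ^ A and E = Y ^ B, then combine locally."""
    a, b, c = triple
    d = (x[0] ^ a[0]) ^ (x[1] ^ a[1])
    e = (y[0] ^ b[0]) ^ (y[1] ^ b[1])
    return (c[0] ^ (d & b[0]) ^ (e & a[0]) ^ (d & e),
            c[1] ^ (d & b[1]) ^ (e & a[1]))

def mask_triple(triple, r, s):
    """XOR shared masks r and s onto the A and B shares; C is not masked."""
    a, b, c = triple
    return (a[0] ^ r[0], a[1] ^ r[1]), (b[0] ^ s[0], b[1] ^ s[1]), c

triple = dealer_triple()                    # one triple serves both branches
active_masks = (share(0), share(0))         # uniform-looking sharings of zero
inactive_masks = (share(secrets.randbits(1)), share(secrets.randbits(1)))

x, y = 1, 1
z = and_gate(share(x), share(y), mask_triple(triple, *active_masks))
assert z[0] ^ z[1] == x & y                 # active branch: correct AND
garbage = and_gate(share(0), share(1), mask_triple(triple, *inactive_masks))
# The inactive output may be wrong; in the protocol it is discarded later.
```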
I will now give a concrete example of our approach. Suppose our conditional consists of two branches, and the first branch is the active branch. Recall that neither party knows which branch is active. In the standard triple-based protocol, we would need to generate one triple per AND gate in all branches, hence a total of five triples. In our approach, we need to generate triples only for the longest conditional branch. As the upper branch is the longer branch in our example, we need to preprocess only three triples, one for each AND gate in this branch. So, we have preprocessed our triples, and now we generate a sufficient number of masks. In our example, we need six pairs of masks, two pairs for each triple, as we are masking two bits in each triple. Each pair consists of a mask for the active branch and a mask for the inactive branch. Once the masks are ready, we proceed with the conditional evaluation. As the upper branch is active, its triples are masked with all-zero masks. Hence, the triples' semantic values are unchanged, and the parties use them as is on the active branch. Note that the parties are not aware that the masks are sharings of zero, as the shares are uniform. In our Masked Triples protocol, we take care that the all-zero masks are aligned with the active branch. On the inactive branch, we randomize the triples by applying uniform masks. In this case, we change the triples' semantic values and invalidate the correctness of AND gates on this branch. As we discard the outputs of the inactive branch, this is of no concern. So now we hold the outputs of both branches, but we need to propagate only the value on the active branch. This can be achieved with ordinary Boolean logic. Specifically, we use a multiplexer, which takes as input the branch condition, a secret-shared bit that determines which branch is active, alongside the outputs of both branches.
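The bookkeeping in this example, and the multiplexer, fit in a few lines of Python. This is a sketch on cleartext bits: in the protocol, the single AND inside `mux` would itself be evaluated on shares with a triple. The branch sizes 3 and 2 come from the example above; the function names and the convention that `cond = 1` selects the upper branch are our assumptions.

```python
def triples_needed(branch_sizes: list[int], masked: bool) -> int:
    """Standard protocol: one triple per AND gate in every branch.
    Masked triples: one triple per AND gate of the longest branch only."""
    return max(branch_sizes) if masked else sum(branch_sizes)

def mask_pairs_needed(branch_sizes: list[int]) -> int:
    """Two masked bits (A and B) per triple, one mask pair per masked bit."""
    return 2 * max(branch_sizes)

def mux(cond: int, out_upper: int, out_lower: int) -> int:
    """Select the upper output when cond = 1, the lower one when cond = 0, using one AND."""
    return out_lower ^ (cond & (out_upper ^ out_lower))

sizes = [3, 2]  # AND gates in the upper and lower branch of the example
print(triples_needed(sizes, masked=False), triples_needed(sizes, masked=True),
      mask_pairs_needed(sizes))  # 5 3 6
```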
The multiplexer then propagates the output value of the active branch and discards the invalid output of the inactive branch. The multiplexer costs one AND gate per conditional output. I have demonstrated that our protocol works, assuming triple masks can be generated much more cheaply than additional triples. What remains is to show how we generate these masks. Recall that we have to generate a large number of masks; I will refer to this number as n. We need one triple for each AND gate on the longest conditional branch, and we need twice as many masks. In order to evaluate a conditional statement for the cost of a single longest branch, we must ensure that the cost of generating masks is independent of n, the number of masks we need. In fact, we need just a small constant number of OTs of long strings to generate any number of masks. For simplicity, I demonstrate our mask generation protocol in the context of two parties and two branches. Alice and Bob first sample a uniform bit S and a uniform n-bit string R. Alice and Bob then compute the vector-scalar product S times R and receive a sharing of this product. This product can be computed cheaply with only two OTs on n-bit strings. I will not discuss the details here, but you can find them in our paper. Alice and Bob then output their shares of the masks S times R and S times R XOR R. Depending on the value of the bit S, one mask is a sharing of zeros, while the other is a sharing of the random bit string R. Now we have successfully generated all-zero and uniform masks. These masks are currently ordered according to the uniform bit S. If we go back to our example with two branches, we need to associate the all-zero mask with the active branch. However, the parties know neither which mask is all zeros nor which branch is active. This is easy and cheap to arrange given the branch condition bit and the bit S, with a single bit broadcast and a simple local permutation.
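The following Python sketch simulates this mask generation for two parties and two branches. It writes S times R = s0·r0 XOR s0·r1 XOR s1·r0 XOR s1·r1 over the parties' shares, computes the two local terms directly, and obtains the two cross terms from two OTs. The OTs are replaced by a hypothetical ideal correlated-OT function (`correlated_ot`), so this shows only the algebra, not a secure implementation; n-bit strings are Python ints. The alignment step, revealing the bit t XOR S and letting branch j take mask number j XOR (t XOR S), is our reading of the "single bit broadcast and simple local permutation" above; please consult the paper for the exact protocol.

```python
import secrets

def correlated_ot(choice: int, string: int, n: int) -> tuple[int, int]:
    """Hypothetical ideal OT returning XOR shares of choice * string:
    the sender (holding string) gets a random pad, the receiver (holding choice)
    gets pad ^ (choice * string) and learns nothing else."""
    pad = secrets.randbits(n)
    return pad, pad ^ (string if choice else 0)

def generate_masks(n: int):
    """Shares of M0 = S*R and M1 = S*R ^ R for a uniform bit S and n-bit string R."""
    s0, s1 = secrets.randbits(1), secrets.randbits(1)  # parties' shares of S
    r0, r1 = secrets.randbits(n), secrets.randbits(n)  # parties' shares of R
    pad1, recv0 = correlated_ot(s0, r1, n)  # party 1 sends r1, party 0 chooses with s0
    pad0, recv1 = correlated_ot(s1, r0, n)  # party 0 sends r0, party 1 chooses with s1
    sr0 = (r0 if s0 else 0) ^ recv0 ^ pad0   # party 0's share of S*R
    sr1 = (r1 if s1 else 0) ^ recv1 ^ pad1   # party 1's share of S*R
    masks = [(sr0, sr1), (sr0 ^ r0, sr1 ^ r1)]  # mask number S is the all-zero mask
    return (s0, s1), masks

def align(t_shares, s_shares, masks):
    """Reveal u = t ^ S (uniform, since S is) and give branch j the mask M_{j ^ u}."""
    u = (t_shares[0] ^ s_shares[0]) ^ (t_shares[1] ^ s_shares[1])
    return [masks[j ^ u] for j in range(2)]

n = 16
t = 1                                  # hypothetical: branch 1 is the active branch
t0 = secrets.randbits(1)
t_shares = (t0, t ^ t0)                # the secret-shared branch condition
s_shares, masks = generate_masks(n)
per_branch = align(t_shares, s_shares, masks)
active = per_branch[t]
assert active[0] ^ active[1] == 0      # the active branch receives the zero mask
```

Note that the OT count stays at two regardless of n; only the length of the transferred strings grows.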
So, I have introduced the complete Masked Triples protocol. We have implemented our approach, and now I will show some of our results. We ran our experiments on circuits with randomly generated alignments across 100 runs. Please see our paper for details of the circuits we used. This first plot is in the 2PC setting and shows how we reduce the number of OTs as a function of the branching factor, that is, the number of conditional branches. In our solution, the number of OTs is virtually constant, whereas the number of OTs grows linearly in the standard triple-based protocol. Recall that at the beginning of the talk, I said MOTIF was limited by the branch alignment in the conditional and that our work improves on this. This box plot shows MOTIF's dependence on branch alignment in the case of two branches. We plot the distribution of the number of random OTs needed for two parties to evaluate each protocol. Masked Triples and the standard triple-based protocol each needed the same number of OTs in all 100 runs. On the other hand, MOTIF's performance differs depending on branch alignment. Because we sample alignments uniformly at random, MOTIF often faces unaligned branches, which increases the number of OTs it consumes. For two branches, our approach required on average 1.5 times fewer OTs than MOTIF and consistently required 2 times fewer OTs than the standard triple-based protocol. Our next plot shows the total per-party communication as a function of the branching factor in MOTIF and in our work. On 16 branches, our approach improves total communication on average by a factor of 2.6 over MOTIF. Our improvement over the standard protocol was by a factor of 12 in the same setting. Our last experiment emphasizes how our approach scales to the multi-party setting. In this experiment, we fixed the number of branches to 16 and plotted per-party communication as a function of the number of parties. Our optimization does not add any cost compared to MOTIF and the standard triple-based protocol.
Each technique consumes communication quadratic in the number of parties. While our approach scales efficiently to the multi-party setting, we also introduce more efficient mask generation protocols in the two- and three-party settings. You can find details about these protocols in our paper. So, this was Masked Triples. The key contribution is a protocol whose communication is proportional to a program's longest execution path in the multi-party setting. Unlike the state-of-the-art MOTIF, our improvement is independent of the circuit topology. We implemented our approach and obtained significant concrete improvements over MOTIF in all tested settings. So, thank you for listening!