Hi, my name is Stanislav Peceny. In this talk, I will present our paper, MOTIF: (Almost) Free Branching in GMW via Vector-Scalar Multiplication. MOTIF stands for Minimizing OTs for IFs. This work was done by myself, David Heath, and my advisor Vladimir Kolesnikov. We work at the Georgia Institute of Technology in Atlanta, Georgia.

In this work, I will be discussing the GMW protocol, which is a foundational secure multi-party computation, or MPC, protocol. In an MPC protocol, an arbitrary number of parties can compute any function of their private inputs, and the security guarantee is that nothing is leaked about the inputs beyond what one can learn from one's own input and the function output. GMW is also a circuit-based protocol, meaning that the function to be evaluated is represented as a Boolean circuit consisting of AND and XOR gates. The circuit is then evaluated gate by gate under encryption.

In the GMW protocol, communication, in terms of both bandwidth and network latency, is the bottleneck. Latency is expensive because the number of rounds is proportional to the multiplicative depth of the circuit. Bandwidth, on the other hand, is expensive because of the need to pre-compute oblivious transfers, which are used to evaluate the circuit's AND gates. In this work, we reduce bandwidth by taking advantage of the conditional branching that can occur in the source program. Hence, if a program has an if or a switch statement, we can take advantage of the fact that only one of the branches is actually evaluated. Traditionally, it has been assumed that we need communication proportional to the total size of all branches in a conditional. This was due to the need to ensure security: no player should learn which branch is the active, that is, the taken, branch. In this work, we show that this is unnecessary and that communication proportional to only a single branch suffices.
Our method, at a high level, is based on the observation that conditional branching is really a vector-scalar multiplication. We show that we can do vector-scalar multiplication for the same cost as a single AND gate, and as a result, we get conditional branching almost for free.

We start by reviewing the GMW protocol, because our improvement directly modifies the protocol. The GMW protocol evaluates a function in four steps. First, the function is represented as a Boolean circuit. Next, the parties secret-share their input values and send the appropriate shares to the other parties. Then all parties step through the circuit gate by gate and ensure they hold a valid XOR secret share of the true value on each wire. After evaluating all gates, the parties reconstruct the output.

Let's start with a simple circuit. For simplicity, we consider only two parties, Alice and Bob, but the extension to an arbitrary number of parties is immediate. The circuit on this slide consists of a single AND gate on the left and a single XOR gate on the right. Alice holds input A and Bob holds input B. Alice secret-shares her input A by uniformly sampling A1 and computing A2 such that A1 and A2 add to A. Symmetrically, Bob secret-shares his input B. Then Alice sends A2 to Bob and Bob sends B1 to Alice. Alice and Bob now each hold one XOR share of A and one XOR share of B. Note that the wire with input 1 can be trivially secret-shared by Alice setting her share to 1 and all other parties (Bob only, in our case) setting their shares to 0. Alice and Bob next evaluate the circuit gate by gate and compute valid XOR shares on the output wires of all gates. At the end of the evaluation, Alice and Bob each hold a share of AB XOR 1 on the circuit's output wire. The only remaining step is to reconstruct the output AB XOR 1: Alice and Bob send one another their shares of the output and compute AB XOR 1.

I will now demonstrate how each gate is evaluated. XOR gates are free and are evaluated locally, that is, without interaction.
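To make the sharing step concrete, here is a minimal Python sketch of XOR secret sharing and the free local evaluation of an XOR gate. The helper names `share` and `reconstruct` are my own for illustration, not from the talk:

```python
import secrets

def share(bit):
    """XOR-secret-share a bit into two shares that XOR to the bit."""
    s1 = secrets.randbelow(2)   # uniformly random first share
    s2 = bit ^ s1               # second share fixes the sum
    return s1, s2

def reconstruct(s1, s2):
    return s1 ^ s2

# Alice shares her input A; Bob shares his input B.
a, b = 1, 0
a1, a2 = share(a)
b1, b2 = share(b)
assert reconstruct(a1, a2) == a
assert reconstruct(b1, b2) == b

# XOR gates are free: each party XORs its local shares, no interaction.
c1 = a1 ^ b1   # Alice's share of A XOR B
c2 = a2 ^ b2   # Bob's share of A XOR B
assert reconstruct(c1, c2) == a ^ b
```

Each individual share is uniformly random, so neither party learns anything from a single share; only the XOR of both shares reveals the wire value.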
Alice and Bob simply add their shares of A and B. For example, Alice computes her share of C, C1, as A1 XOR B1. While XOR gates are essentially free, AND gates require the use of expensive interactive primitives. At the bottom of the slide, we see that after expanding the multiplication AB, we get four terms. The first term, A1 B1, and the last term, A2 B2, are evaluated by Alice and Bob locally, since Alice and Bob each hold both factors of their respective term. The middle two terms require communication. I will now demonstrate how to compute A2 B1, as A1 B2 can be evaluated symmetrically.

We use 1-out-of-2 oblivious transfer, or more simply OT, to get a sharing of A2 B1. Bob acts as the OT sender, while Alice is the OT receiver. First, Alice inputs her bit B1 to the OT. I emphasize that if B1 equals 0, the term A2 B1 equals 0, and hence Alice and Bob need to hold shares that add to 0. On the other hand, if B1 equals 1, the output equals A2, and Alice and Bob need to obtain shares that add to A2. So Bob samples his share of A2 B1 uniformly and inputs it as his first OT input. This will also be Alice's share of A2 B1 in case her input B1 equals 0. Bob's second OT input is for the case that Alice holds 1 for B1. I repeat that in this case the output shares must add to A2. Hence, Bob XORs his share of A2 B1 with A2 and inputs the result to the OT. The OT protocol is then executed and Alice receives her share of A2 B1. Alice and Bob then output their respective shares.

We showed that in order to evaluate an AND gate between two parties, we need to evaluate two terms interactively. Each of these interactive terms is evaluated via OT. As OTs are very expensive, the cost of an AND gate thus corresponds to two OTs in the two-party case, and for P parties the cost is P times (P minus 1) OTs. Now, OTs are expensive since they are public-key primitives.
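The AND-gate evaluation described above can be sketched in Python. This is only a simulation for checking the share arithmetic: the `ot` function below is an idealized stand-in for the real interactive OT primitive, and all names are my own:

```python
import secrets

def ot(choice, m0, m1):
    """Idealized 1-out-of-2 OT: the receiver learns m_choice and nothing else.
    Stands in for the real cryptographic protocol."""
    return m1 if choice else m0

def and_gate(a1, b1, a2, b2):
    """XOR shares of (a1^a2) & (b1^b2) using two (simulated) OTs.
    Alice holds (a1, b1); Bob holds (a2, b2)."""
    # AB expands to a1*b1 ^ a2*b1 ^ a1*b2 ^ a2*b2; the outer terms are local.
    # Cross term a2*b1: Bob is sender with secret a2; Alice's choice bit is b1.
    r = secrets.randbelow(2)          # Bob's uniform share of a2*b1
    alice_t = ot(b1, r, r ^ a2)       # Alice receives r ^ (a2 & b1)
    # Cross term a1*b2: Alice is sender with secret a1; Bob's choice bit is b2.
    s = secrets.randbelow(2)          # Alice's uniform share of a1*b2
    bob_t = ot(b2, s, s ^ a1)         # Bob receives s ^ (a1 & b2)
    c1 = (a1 & b1) ^ alice_t ^ s      # Alice's output share
    c2 = (a2 & b2) ^ r ^ bob_t        # Bob's output share
    return c1, c2
```

XORing the two output shares cancels the random masks r and s and leaves exactly the four terms of the expansion, i.e. the AND of the true wire values.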
In order to reduce these costs, GMW implementations use the efficient OT extension protocol by Ishai, Kilian, Nissim, and Petrank from Crypto 2003. The idea of OT extension is that a small number of base OTs, along with much cheaper symmetric-key operations, can generate a large number of OTs. Furthermore, these OTs can be pre-computed via a trick introduced by Beaver at Crypto 1995. Despite these improvements, OTs consume OT extension matrix rows of size kappa, the computational security parameter, which is often set to 128. Thus computing a one-bit multiplication requires O(kappa) communication.

Our work shows that conditional branching can be efficiently achieved via vector-scalar multiplication. Therefore, we demonstrate how vector-scalar multiplication can be efficiently implemented. The naive approach for a vector of size n would require n AND gates and thus n times P times (P minus 1) OTs. We can do much better than that. We introduce a natural generalization of the AND gate and call it a VS gate. A VS gate requires only P times (P minus 1) OTs for vectors of any size, just like a single AND gate. The basic idea is that instead of using n times P times (P minus 1) OTs of one-bit secrets, we use P times (P minus 1) OTs of n-bit secrets. This significantly reduces communication, as we consume fewer kappa-bit-long OT extension matrix rows.

We add that recent groundbreaking work has shown that oblivious transfers can be extended without the expensive O(kappa) communication. However, this work still requires the parties to perform significant computation to generate the correlated randomness associated with OTs. While we stress communication reduction, more generally we simply reduce the number of needed OTs. Thus, even when using this new and powerful primitive, it is worthwhile to use our approach in order to decrease computation.

We demonstrate our VS gate on a simple example where we consider a vector of two bits.
We also show how to evaluate the VS gate between two players only. The generalization to any number of players and to vectors of any size is immediate. Now, for a scalar A and vector (B, C), we can naively use two AND gates to get AB and AC. Instead, we replace the two AND gates with our VS gate, which takes the scalar A and vector (B, C) as input and outputs (AB, AC).

Now I demonstrate how the VS gate works. Let's first expand all terms on the slide. As in the AND gate evaluation, we get four terms. The first and the last term can once again be computed locally, while the middle two terms require interaction. In the interactive terms, one player holds the scalar and the other holds the vector. I will show only how to compute A1 times (B2, C2), as the other term is computed symmetrically. We follow the same steps as in the AND gate OT, except Bob's inputs are now longer. As the vector has two bits, we use 1-out-of-2 OT on two-bit secrets. Bob acts as the OT sender, while Alice is the OT receiver. First, Alice inputs her bit A1 to the OT. Bob samples his share of the term uniformly. More specifically, he draws two random bits, X and Y, and inputs them as his first OT input. (X, Y) will also be Alice's share of the term in case her scalar input A1 is 0, so that Alice's and Bob's shares add to 0. The second OT input is for the case when Alice holds 1 for A1. Recall that in this case the output shares must add to the vector (B2, C2). Thus, Bob XORs his share of the term with the vector and inputs the result to the OT. The OT protocol is run and Alice receives her share of A1 times (B2, C2). Alice and Bob output their computed shares.

Now that we have introduced our VS gate, let's first review some related work on conditional branching in the two-party setting. Then we will show how conditionals are evaluated in GMW and how the VS gate facilitates efficient conditional branching.
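The interactive cross term of the VS gate can be sketched as follows. The point is that one OT on n-bit secrets, rather than n OTs on one-bit secrets, suffices for a vector of any length. As before, `ot` is an idealized stand-in and the function names are my own:

```python
import secrets

def ot(choice, m0, m1):
    """Idealized 1-out-of-2 OT on bit-vectors (lists of bits)."""
    return m1 if choice else m0

def xor_vec(u, v):
    return [x ^ y for x, y in zip(u, v)]

def vs_cross_term(scalar_bit, vector):
    """Share scalar_bit * vector with a SINGLE OT on len(vector)-bit secrets.
    Bob (sender) holds `vector`; Alice (receiver) holds `scalar_bit`."""
    n = len(vector)
    bob_share = [secrets.randbelow(2) for _ in range(n)]  # e.g. (X, Y)
    # First OT message: bob_share (choice 0, shares add to the zero vector).
    # Second OT message: bob_share XOR vector (choice 1, shares add to vector).
    alice_share = ot(scalar_bit, bob_share, xor_vec(bob_share, vector))
    return alice_share, bob_share

# Shares XOR to scalar_bit * vector, at the OT count of one AND gate.
a1, b2c2 = 1, [1, 0]
al, bo = vs_cross_term(a1, b2c2)
assert xor_vec(al, bo) == [a1 & v for v in b2c2]
```

This is exactly the AND-gate OT with longer sender messages, which is why the number of OTs, and hence the number of consumed OT extension matrix rows, does not grow with the vector length.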
Three prior works have looked at the problem of reducing the amount of communication in Yao's garbled circuit in the presence of conditional branching. The first work, Free If, answered the question of how we can get free branching in case the circuit generator knows which branch is taken. The second work symmetrically answered what one can do in case the circuit evaluator knows which branch is taken. The general case, when neither party knows which branch is taken, was answered in the third work. We repeat: we are the first work that yields a significant conditional branching improvement in the multi-party setting.

Now we demonstrate what conditional branching looks like in a Boolean circuit. We have a branch condition, which controls which branch is taken and is the output of some prior circuit. We also have two branches. For now, we abstract away the branch details and simply represent the branches as circuits C0 and C1. The branch condition bit determines whether C0 or C1 is taken. In our example, we consider C0 to be the active, that is, the taken, branch. In order to propagate only the output of the active branch C0, we add a multiplexer which takes the branch condition as input and propagates the output of C0.

In order to show how the VS gate can efficiently evaluate conditionals, we first state our invariant: on each wire of each inactive branch, players hold a share of 0, while on each wire of the active branch, players hold valid shares. We now consider a specific circuit where one branch consists of two AND gates and a single XOR gate, while the other branch consists of two AND gates. There are three steps when evaluating a conditional in MOTIF. First, we need to establish our invariant: we demultiplex the input values based on the condition bit. Next, we need to maintain our invariant while evaluating the branches. XOR gates trivially support our invariant. We show that VS gates maintain our invariant in the following slides.
The multiplexer then propagates the output values of the active branch and discards the outputs of the inactive branches. We will return to the details of the demultiplexer and multiplexer later. For now, let's look at the key idea of our approach. Our key observation is that if our invariant holds, then we can substitute each pair of AND gates across branches with a single VS gate. Recall that VS gates use the same number of OTs as a single AND gate. Thus, by making these substitutions, we drive down the cost of the overall circuit.

Now we demonstrate how we substitute the circuit's AND gates with VS gates and show how VS gates maintain our invariant. We substitute the first two highlighted AND gates with one VS gate, and these two highlighted AND gates with another VS gate. So we have two AND gates that we want to substitute with a VS gate. In our example, the AND gate on top is active and the one at the bottom is inactive. By our invariant, this means that the inputs X and Y to the top gate hold true values, whereas the inputs A and B of the bottom AND gate equal zero. Now we construct the VS gate. We set the scalar to X XOR A. Since we know that one of X or A is zero, this value always equals the scalar input of the active gate. The input vector then consists of the second inputs to the AND gates, Y and B. We expect to get XY for the first output bit, as that is the output of the active gate, and zero for the second output bit. To see that we indeed get this output, let's plug in zero for A and B, and then simplify the expression. We get the expected output: XY on the active branch and zero on the inactive branch. Our invariant is maintained. Again, the key point is that by replacing two AND gates with one VS gate, we reduce the number of needed OTs and hence the cost of conditional branching.

Now we return to how we establish our invariant with a demultiplexer and propagate the output values of the active branch with a multiplexer.
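The algebra behind this substitution can be checked exhaustively in a few lines of Python. This is a plain cleartext check of the identity under the talk's invariant (the inactive branch's inputs A and B are zero), not protocol code:

```python
# Under the invariant, the inactive gate's inputs are zero: A = B = 0.
# One VS gate with scalar (X ^ A) and vector (Y, B) then reproduces both
# AND-gate outputs at once: X*Y on the active branch, 0 on the inactive one.
for X in (0, 1):
    for Y in (0, 1):
        A = B = 0                      # invariant: inactive-branch wires are 0
        scalar = X ^ A                 # equals X, the active gate's first input
        out = (scalar & Y, scalar & B)
        assert out == (X & Y, A & B)   # (X*Y, 0): both gates in one VS gate
```

The symmetric case, where the bottom gate is active and X = Y = 0, works identically, so the substitution is sound regardless of which branch is taken.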
We demultiplex the input values by multiplying the branch inputs with the branch condition. Conveniently, vector-scalar multiplication is an elegant fit for demultiplexing. As all wires on the inactive branches are zeros, we multiplex the output wires of the active and inactive branches by XORing corresponding wires together. Thus, the multiplexer can be implemented for free, using only XOR gates. Note that, unlike in traditional conditional branching, in MOTIF it is the demux that takes the condition bit as input; usually, the branch condition is input to the multiplexer.

We showed a simple example with only two branches. We can get an arbitrary branching factor, that is, number of branches in the conditional, recursively by nesting conditionals.

We implemented our approach and compared MOTIF to the standard GMW protocol. In the experiment above, we plot the overall per-player communication as a function of the number of branches in the conditional. The experiment was run in 2PC and on a circuit that compared two bitstrings in each branch. For 16 branches, MOTIF outperformed standard GMW by a factor of 9.4. We did not get a factor-16 improvement for two reasons. First, both approaches use the same number of base OTs to set up an OT extension matrix. Second, the communication in the evaluation phase is not improved; that is, in 2PC we still use three bits per OT. We also emphasize that our experiment uses the same circuit in each branch, and hence we achieve perfect alignment, which we explain on the next slide. Due to the perfect alignment, our experiment shows the maximum benefit that our technique can provide.

On this slide, I discuss what branch alignment is and how it affects our improvement. In GMW, we simultaneously compute all available AND gates in rounds. The number of rounds is proportional to the multiplicative depth of the circuit. At any time, we can only evaluate gates whose input shares have already been computed.
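As a sketch, demultiplexing and multiplexing look like this on cleartext values. The names are my own; in the actual protocol the multiplications would be evaluated as VS gates on shares, while the XOR-based mux is local and free:

```python
def demux(cond, inputs):
    """Route `inputs` to branch `cond`; the other branch gets all zeros."""
    branch0 = [(1 ^ cond) & x for x in inputs]  # zeroed out when cond == 1
    branch1 = [cond & x for x in inputs]        # zeroed out when cond == 0
    return branch0, branch1

def mux(out0, out1):
    """Inactive-branch wires are all zero, so XOR recovers the active output."""
    return [x ^ y for x, y in zip(out0, out1)]

ins = [1, 0, 1]
b0, b1 = demux(0, ins)       # condition 0: branch 0 is active
assert b0 == ins and b1 == [0, 0, 0]
assert mux(b0, b1) == ins    # XOR-multiplexing needs no AND gates
```

The demux is where the condition bit enters: each branch's inputs are the original inputs scaled by a bit, which is exactly a vector-scalar multiplication and therefore costs only one VS gate per branch.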
We cannot include future-round AND gates in the current VS computation. In each round of GMW computation, we can only amortize OTs over the ready gates. The more aligned the circuit branches are, meaning the more similar the number of AND gates in each circuit layer, the higher the performance improvement. Due to this problem, our work does not achieve communication proportional to the longest execution path. Recently, we discovered a different approach to improving branching which does achieve communication proportional to the longest execution path. This work shows that Beaver triples can be amortized across branches, and it is now under conference submission.

So, this was MOTIF. Again, our contribution is that we reduce communication in the standard GMW protocol by taking advantage of conditional branching in the source program. Our improvement results in almost constant communication in the number of conditional branches. We are the first work to achieve a significant conditional branching improvement in the multi-party setting. We do this by efficiently evaluating vector-scalar multiplication via our VS gate. Thank you for listening.