Hello, my name is Stan Peceny. In this talk, I will present our paper, "Garbling, Stacked and Staggered: Faster k-out-of-n Garbled Function Evaluation." This is joint work with David Heath and Vladimir Kolesnikov, and we work at the Georgia Institute of Technology in Atlanta, Georgia. Consider a server that offers a suite of various services to clients, and suppose that these services have privacy concerns for both the server and the client. As an example, suppose a telehealth company offers services that screen patients for a variety of medical conditions, represented as functions 0 through N minus 1. In this example, both parties may have privacy concerns. The patient, Alice in our case, may not wish to disclose her health data, and the telehealth company may use sensitive health data of other individuals to aid in the screening, or use proprietary data. The patient may know a priori that a number of medical conditions are unlikely to be the source of her symptoms. For example, it is unlikely that the patient's headaches are caused by athlete's foot. Hence, the patient may only wish to be screened for K health conditions out of the possible N. That is, the patient wants to evaluate K out of N public functions to which the parties have some private inputs. I emphasize that the patient knows which K functions she wants to evaluate. In this talk, I present our K-out-of-N garbled-circuit-based construction, which yields a significant concrete improvement over the state of the art, also a garbled-circuit-based construction. As some background for this work, garbled circuit is an efficient protocol that allows two parties to securely compute an arbitrary function, represented as a Boolean circuit, of their private inputs.
What I mean by securely compute is that the two parties provide their private inputs to a public function, which they evaluate, and the security guarantee is that an adversary corrupting a single party cannot learn anything about the other party's input just by looking at the transcript of messages he received and his internal state. The only thing he can learn is what can be deduced from the corrupted party's input and the function output. Garbled circuits are interesting because they are often the most efficient technique for secure two-party computation. In particular, garbled circuit is a constant-round protocol, and this means we do not need to suffer from high network latency. I repeat that in our construction Alice's K choices are known to her, either a priori or revealed by the computation. I will start by presenting why a naive garbled circuit solution to the K-out-of-N problem is not satisfactory. An important point is that the generator should not know nor learn which K circuits Alice will evaluate. Hence, he garbles all N circuits, compiles them into a single string, and sends this string across the wire to Alice. I would like to emphasize that this sending of the circuit garblings, which I will henceforth refer to as the material, is expensive. In fact, this process of sending the material is traditionally considered the most expensive part of the garbled circuit protocol. In 2020, David Heath and Vladimir Kolesnikov introduced stacked garbling, which reduces precisely this bandwidth consumption for one-out-of-N circuits. This work showed that it is not necessary to send a garbling for each of the N circuits. In fact, it suffices to send a single material of length equal to the single longest garbling among the N circuits. While it is communication that has been viewed as the garbled circuit bottleneck, stacked garbling changed the status quo.
Indeed, even with stacked garbling and in many settings, communicating the garbled circuit is usually only a small factor slower than garbling it, as garbling consumes significant computational resources. Thus, even for only three or four branches and in many settings, computation becomes the bottleneck. The stacked garbling technique can be generalized to arbitrary K. While the communication improvement is preserved, that is, the communication cost is proportional to K rather than N, naively extending this technique to K-out-of-N circuits incurs a factor-K increase in computation. That is, computation cost grows with K. Hence, there seems to be an inherent tradeoff between computation and communication with the current techniques. Stacked garbling reduces communication consumption but increases computation consumption. This tradeoff forces us to choose between communication and computation, which is undesirable. This is the key problem we are trying to solve. More specifically, we ask whether we can have the best of both worlds. That is, can we pay communication only for K materials and still incur computation on the order of N? We answer in the affirmative and match the communication complexity of stacked garbling and the garbling complexity of standard garbled circuit. Central to our idea is the fact that material is viewed differently in each technique. Traditionally, garbled circuit was viewed as a collection of encrypted truth tables. The key idea of stacked garbling was that the material should instead be viewed as a bit string. This idea made it possible to manipulate these materials as bit strings using natural operations such as bitwise XOR to reduce communication. We take this idea further and view the material as an element in a large Galois field. As a result, we can perform linear algebraic operations on the material, which will help us reduce the computation costs associated with stacked garbling.
Now that I have shown the costs associated with the different approaches and also stated a central observation of our work, I will get into how the individual approaches, stacked garbling in particular, actually work, since this is necessary to understand our technique. More specifically, I will start with K equals 1 and then generalize to arbitrary K. Assume that N equals 3, that is, we have 3 circuits C0, C1, and C2, and C0 is the taken circuit. A key point is that each circuit encryption, that is, the material, the bit string, can be viewed as an expansion of a pseudorandom seed. In particular, we will let the generator start from a short seed for each circuit and use this as the source of all the randomness that he uses to encrypt each circuit and produce M0, M1, and M2. That is, a seed becomes a compact representation of each circuit. Now that the generator has garbled all N circuits, he can concatenate them into a single string and send this string across the network to Alice. This would be the next step in the standard Yao's garbled circuit approach. After receiving this material, Alice would simply retrieve the circuit encryptions she wants to take and evaluate them. In stacked garbling, instead of sending M0, M1, and M2 separately, the generator adds these three values together using bitwise exclusive-OR. Here is where we win in terms of communication. In the standard garbled circuit approach, we would have sent M0, M1, and M2 separately, but now we are sending the XOR sum of these three values. The generator then sends this compact stacked material to Alice. And from here, we want Alice to somehow recover M0, that is, the material corresponding to the circuit that she wants to evaluate. Recall that the generator holds the short seeds, which compactly represent M0, M1, and M2. Indeed, Alice could learn M0 by receiving seed 0. However, we want Alice to learn M0 without receiving the seed corresponding to the taken circuit, as that would be insecure.
This is because from this seed, Alice can derive all the garbled circuit keys. Instead, we convey seeds to Alice for the circuits that will not be evaluated, either by oblivious transfer or by encrypting them as part of the garbled circuit. Since circuits 1 and 2 are not evaluated, Alice obliviously receives seeds 1 and 2. Upon receiving these seeds, Alice can re-expand them into materials M1 and M2. Then, using XOR, Alice can extract M0. Now we have achieved our goal that Alice retrieves M0, but importantly, she has never seen the seed corresponding to M0. Alice then evaluates M0. Note that the communication consisted of sending only a single longest material. In terms of computation, the generator garbled only N circuits, while the evaluator garbled N minus 1 circuits, that is, all but the taken circuit. Thus, for K equals 1, communication is improved and the computation complexity is the same as in the standard garbled circuit approach. Now, let's consider the more general case when K is greater than 1, more specifically, K equals 2 and N still equals 3. That is, we have 3 circuits C0, C1, and C2, of which C0 and C1 are taken. First, we try the same approach as before. The garbler produces the material for each circuit by expanding a short pseudorandom seed. Then, he stacks the material and sends it to Alice. Alice now needs to reconstruct M0 and M1, corresponding to the taken circuits. She also obliviously receives the seeds for the circuits not taken. In this case, this is only seed 2. Alice expands the seed into M2 and XORs it out from the stacked material. It is impossible for Alice to reconstruct M0 and M1 from M0 XOR M1 alone. So what can we do? Importantly, we observe that receiving seed 2 is not sufficient to reconstruct the taken material. Furthermore, as I explained, it is not secure to send Alice a seed that corresponds to a taken circuit.
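To make the K equals 1 flow above concrete, here is a small Python sketch under simplifying assumptions: the per-circuit material is modeled as a SHA-256 expansion of a short seed, standing in for a real garbling, and the oblivious transfer of the untaken seeds is elided. All names are illustrative.

```python
import hashlib

MATERIAL_LEN = 32  # assumed material length in bytes


def expand(seed: bytes) -> bytes:
    """Stand-in PRG: expand a short seed into pseudorandom 'material'."""
    return hashlib.sha256(seed).digest()[:MATERIAL_LEN]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


# Generator: one short seed per circuit compactly represents its material.
seeds = [b"seed0", b"seed1", b"seed2"]
m0, m1, m2 = (expand(s) for s in seeds)

# Generator sends one stacked string instead of three separate materials.
stacked = xor(xor(m0, m1), m2)

# Alice (taking circuit 0) obliviously receives only seeds 1 and 2,
# re-expands them, and XORs them out of the stack to recover m0.
recovered = xor(xor(stacked, expand(seeds[1])), expand(seeds[2]))
assert recovered == m0  # Alice has m0 without ever seeing seed 0
```

The point of the sketch is only the accounting: one material's worth of communication, while Alice re-garbles the N minus 1 untaken circuits herself.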
This implies that to naively solve the K-out-of-N problem with stacked garbling, we need to generate K separate stacks, and each stack must use N fresh garblings. More specifically, since K equals 2, we need two stacks. Note that the materials in stack 1 are different from those in stack 0, although they represent the same functions. This is exactly where the computation cost proportional to K times N garblings comes in, as N new materials are needed for each of the K stacks. Instead, we would like to reuse the same N materials across the stacks, which is precisely what we achieve in our stacked and staggered construction. Recall that our key observation is that we can view circuit materials as elements in a large Galois field, and hence we can operate on them with linear algebra. This will enable us to compute only a linear number of materials and then construct K linearly independent combinations of these same materials as the stacks. First, I will describe this visually and only then algebraically. In this slide, I show the two stacks in our stacked and staggered construction. Note that M0, M1, and M2 are the same across the stacks. That is, there is a total of N materials. While stack 0 is the same as in the stacked garbling approach, the materials in stack 1 are shifted before being XORed together. Now, it should be believable that these shifts provide some additional information that will enable Alice to unstack the taken materials bit by bit. I will get into the details of unstacking later. At this point, I would like to emphasize that since we view the materials as elements in a Galois field, we can multiply them by different powers of 2, which simply appends zeros and shifts each material within the stack. This means we can pick our linear algebraic operations such that everything can be implemented with simple bitwise XORs, and hence we obtain high performance. We simply shift each material and XOR it into a stack.
These simple stacking operations are far, far cheaper than the garbling operations. The result is that although we still need to construct K stacks, we obtain performance that matches the computation of the standard garbled circuit approach and the communication of stacked garbling. We call our construction stacked and staggered because, as in stacked garbling, we stack, that is, XOR the materials together. However, we also stagger, that is, shift the materials such that we construct K linearly independent combinations of these same materials. Now that we know how to stack, I will get into how we unstack. Again, I will first present the idea informally and only then algebraically. Recall that Alice needs to learn M0 and M1, since those materials correspond to the taken circuits. M2 is not taken, and hence Alice obtains a seed that will enable her to reconstruct M2. After reconstructing M2, Alice can XOR it out from each stack, taking care to shift M2 by the appropriate offset first. Now Alice is left with stacks containing only the taken materials. Note that Alice can just retrieve 1 bit of M0 directly from the second stack. She can then take this bit and XOR it with the corresponding position in stack 0. This will give her 1 bit of M1, which can then be used to obtain another bit of M0 from stack 1. Alice continues this bit-by-bit unstacking in this manner until she has recovered M0 and M1 in their entirety. Alice then evaluates M0 and M1. I would like to emphasize that when K is greater than 2, this unstacking procedure gets much more complicated. The algorithm must carefully coordinate the order in which bits are unstacked, as we need to ensure we do not attempt to recover bits prematurely from each stack. While I will not get into details, we introduce a notion of a per-stack delay, which ensures that we do not try to unstack bits prematurely. Now that I have given the intuition, I will describe our construction algebraically.
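The bit-by-bit unstacking just described can be sketched as follows. This is a toy model, not the paper's implementation: the two taken materials are Python integers with bit 0 first, and M2 has already been removed from both stacks.

```python
import random

NBITS = 64
m0 = random.getrandbits(NBITS)  # taken materials, unknown to Alice
m1 = random.getrandbits(NBITS)

# The two staggered stacks Alice holds after removing the untaken M2:
s0 = m0 ^ m1         # stack 0: plain XOR of the taken materials
s1 = m0 ^ (m1 << 1)  # stack 1: m1 is shifted (multiplied by 2) first

# Alice recovers m0 and m1 one bit at a time. Bit i of s1 equals
# bit i of m0 XOR bit i-1 of m1, so bit 0 of m0 comes out directly.
r0 = r1 = 0
prev_m1_bit = 0
for i in range(NBITS):
    b0 = ((s1 >> i) & 1) ^ prev_m1_bit  # bit i of m0, from stack 1
    b1 = ((s0 >> i) & 1) ^ b0           # bit i of m1, from stack 0
    r0 |= b0 << i
    r1 |= b1 << i
    prev_m1_bit = b1

assert (r0, r1) == (m0, m1)
```

Each recovered bit of one material unlocks the next bit of the other, which is exactly the back-and-forth between the two stacks described above.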
Following up on our informal discussion, we need two stacks, as K equals 2. Stacking is simple. We construct the first stack as a simple exclusive-OR of the materials. In the second stack, we multiply M1 and M2 by 2 and 4, respectively. This in practice shifts each material by one position from the preceding material. Hence, we get the same stack as in the pictorial representation. Again, we multiply the materials by powers of 2 such that everything can be implemented with simple bitwise XORs. Hence, stacking is very cheap. Additionally, we reduce the number of bitwise XORs needed to further optimize our stacking. We observe that it is not necessary to include all N materials in each stack. In practice, each stack is used to recover a single material. This means we need to ensure that each stack has at least one taken material to unstack. Hence, K minus 1 materials can be excluded from each stack. As a result, each stack needs to have only N minus K plus 1 materials. More formally, to stack N materials M0, M1, and M2 into K stacks S, the generator simply multiplies the vector of materials by a stacking matrix A. A is a key object in our formalization. It is a K-by-N matrix and formalizes the bit-shift distance for each material in each stack. A key property of the stacking matrix is that any choice of K columns is linearly independent. This means that any submatrix with K columns is invertible. We will use this property when unstacking. Next, I will show a larger stacking matrix A so that it is easier to understand its structure. I show a stacking matrix for K equals 4 and N equals 6. Notice that each matrix row has K minus 1, that is, 3 zeros. The other entries can be expressed as powers of 2. The powers of 2 increase within each row. That is, in row 0, the powers do not increase. In row 1, they increase by 1 per entry, and in row i, they increase by i.
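As a sketch of the algebraic stacking step, the example's 2-by-3 stacking matrix can be represented by the power of 2 applied to each material, so that multiplying the vector of materials by A amounts to shifts and XORs. This exponent-table representation is my illustration, not the paper's implementation.

```python
import random

NBITS = 64
materials = [random.getrandbits(NBITS) for _ in range(3)]

# A[i][j] holds the exponent e such that material j is multiplied by
# 2**e (i.e., left-shifted by e bits) before being XORed into stack i.
A = [
    [0, 0, 0],  # stack 0: plain XOR of all materials
    [0, 1, 2],  # stack 1: multiply M1 by 2 and M2 by 4
]

stacks = [0] * len(A)
for i, row in enumerate(A):
    for e, m in zip(row, materials):
        stacks[i] ^= m << e  # multiplication by 2**e is a bit shift

assert stacks[0] == materials[0] ^ materials[1] ^ materials[2]
assert stacks[1] == materials[0] ^ (materials[1] << 1) ^ (materials[2] << 2)
```

Note that no Galois-field multiplication routine is ever invoked: because every nonzero entry of A is a power of 2, the whole product is shifts and XORs.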
These properties again allow for an efficient algorithm that stacks material using only simple XORs, because multiplying by powers of 2 simply shifts the material in each stack. Now, we return to our former example. After the generator computes the vector of stacks S, he sends S to Alice. Alice then obliviously receives N minus K seeds, that is, 1 seed in our case, corresponding to the circuits not taken. In our example, this is seed 2. From here, Alice reconstructs M2. Now Alice is ready to unstack M0 and M1. First, Alice removes all materials not taken, M2 only in our example, from her stacks. She does that by shifting M2 by the appropriate amount, that is, according to A, before XORing it with each stack. The resulting vector S' contains linear combinations of only the two taken materials. To define how Alice transforms these two stacks into the two taken materials, we note the following equality. On the left, S' is the vector of stacks of only the taken materials. We can get these stacks by striking out from the stacking matrix all columns corresponding to the indices of circuits not taken. In our case, the taken indices are 0 and 1, and hence we strike out the column at index 2. Then we multiply this modified stacking matrix by the vector of taken materials. Note that Alice can compute S', the stacks of the taken materials, as shown on the previous slides. She can also compute the modified stacking matrix, as the stacking matrix is public and she knows her K taken indices. What Alice wants to learn is the vector of taken materials. She can thus solve the equation on this slide for the vector of taken materials. Thus, to transform the two stacks into the two taken materials, Alice needs to invert the modified stacking matrix. She then multiplies this inverted matrix by S' to recover the vector of taken materials. I repeat that, like our stacking procedure, this unstacking can also be achieved using simple XORs only.
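The first unstacking step, removing the untaken M2 from both stacks using the shifts A prescribes, might look like this in the same toy integer model as before (all names are illustrative):

```python
import random

NBITS = 64
m0, m1, m2 = (random.getrandbits(NBITS) for _ in range(3))

# The stacks the generator sent, per the stacking matrix A.
s0 = m0 ^ m1 ^ m2                # row 0: multiply everything by 2**0
s1 = m0 ^ (m1 << 1) ^ (m2 << 2)  # row 1: multiply m1 by 2, m2 by 4

# Alice re-expands seed 2 into m2 and strips it from each stack,
# applying the same shift A prescribed when stacking.
s0_prime = s0 ^ m2
s1_prime = s1 ^ (m2 << 2)

# S' now contains linear combinations of the taken materials only.
assert s0_prime == m0 ^ m1
assert s1_prime == m0 ^ (m1 << 1)
```

From S', Alice proceeds by inverting the modified stacking matrix, which in this toy model reduces to the bit-by-bit recovery sketched earlier.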
But unlike our stacking procedure, this procedure is non-trivial, so I hope that if you are interested, you will take a look at our paper. Alice then evaluates M0 and M1. We implemented our stacked and staggered construction in C++ and report experimental results obtained when running our system against both standard stacked garbling and standard garbled circuit without stacking. We used all three implementations to handle K-out-of-N circuits, where each circuit was SHA-256. In this experiment, we set N to 16 and then varied K from 2 to N minus 1. Recall that our technique improves computation when evaluating K out of N circuits. This figure demonstrates that our method achieves this computation improvement without sacrificing communication. Specifically, our technique has similar communication to the stacked garbling method. In this plot, we show the wall-clock runtime for all three approaches on a wide area network. Our experiment shows that we indeed concretely improve computation. Thus, we do achieve the best of both worlds, meaning we capture the low communication utilization of standard stacked garbling but without the high computation. Notice that our performance is roughly upper bounded by the performance of standard garbled circuit without stacking. Specifically, our wall-clock time approaches that of standard garbling as K approaches N. This can be explained by our choice of stacking matrix A. As K approaches N, A features increasing numbers of zeros, which reduces the cost to both stack and unstack material. In the special case N equals K, A features zeros everywhere except on one diagonal, where it has ones; it is a mirror of the identity matrix. Hence, in this special case, our scheme and standard garbled circuit perform essentially identical actions. In summary, the key contribution is that we improve garbled circuit evaluation of K out of N functions, where the K choices are known to the garbled circuit evaluator.
We retain the stacked garbling communication complexity while simultaneously retaining the computation complexity of standard garbled circuit. The result is that we get a significant concrete improvement over both stacked garbling and the standard garbled circuit approach. For example, for N equals 128 and K equals 16, we improve over stacked garbling by a factor of approximately 7.7, and over Yao's garbled circuit by a factor of 4.8. Thank you for listening.