This is joint work with Sean Kennedy and Gordon Wilfong from the graph theory department. This talk is on circuit-based secure computation, and it is about tricks for making the circuit smaller before you apply the techniques that people know and use very efficiently.
So we're solving the problem of secure computation. Most of this talk is about the two-party case, but everything applies to the multi-party case as well, because the most interesting part is about how to generate a circuit that's more efficient. I will be talking about overlaying circuit clauses, but in the back of your mind, when we get to explaining what that is, you can think of it as generating a circuit that's universal for a given set of circuits. We don't want to generate a circuit that's universal for all possible circuits of a given size, because that's expensive, and a lot of the time you don't actually need that. So we try to get an improvement by solving this simpler problem.
Okay, so at a very high level, doing secure computation is two steps. Step one: given a function F, you generate a circuit that implements F. Step two: you evaluate the circuit using secure computation. The vast majority of work, though not all, focuses on step two, and we are pretty good at step two. For example, in garbled circuits we need only a couple of PRF evaluations per Boolean gate, and we send two encryptions per Boolean gate, which is 32 bytes. So on a simple laptop and on a LAN, you can evaluate millions of gates per second. This is really efficient, and we're beginning to hit the limit of what we can hope to achieve. But the first step people mostly ignore, because the circuit is kind of given to us, and we just assume that the circuit is given.
There's some work that looks at this problem, mainly from the programming languages people, where they try to optimize which part of the circuit is evaluated and which part is not. I think this is a promising direction, and this is where this work is focusing.
Okay, so I'll talk about Boolean circuits, why we're using them, and their limitations. This will lead to the motivation for set-universal circuits, or overlaying conditional branches. Then I'll propose a heuristic and discuss its performance, and then discuss how this can be used. We need to custom design the evaluation protocol, because the naive things don't directly work.
So again, we want to do SFE, we're given a function F, and we want to focus on step one here, for computing the function F. Before I say why the circuit representation is not the best, I want to say why it is the best. That is because the basic step, evaluating a single gate, is very cheap. People try to solve secure computation by representing the function in this or that other way, but practically and concretely those things don't work as well as circuit-based approaches, for example garbled circuits, simply because the basic step is so fast.
But some applications don't map very well to circuits. Some functions F are not well represented as Boolean circuits, and this is where we begin to have problems. One very well recognized issue with representing a function as a Boolean circuit is random access. There is a long line of work on oblivious RAM that addresses this, and it's really celebrated and really important. So what's the issue there with random access?
The issue is that circuits access a specific data element; that's how they're programmed, that's how they're designed, that's what they do. Whenever you want to access, let's say, an element of an array, we are a little bit stuck, because if that array element is indexed by some private variable, some private index inside the computation, then we cannot go directly and retrieve that data, right? Because that would reveal information about our internal state. So what we have to do, naively, is scan the entire memory and then multiplex out the data that we want. The oblivious RAM line of work improves the asymptotics of that at the cost of a much more expensive basic step. I wanted to mention it as another example of what people look at that deals with issues of representing functions as Boolean circuits.
What we focus on today is that if you have conditional statements in your function, then when you put it into the circuit representation, you're going to pay a penalty. It's pretty clear what is going on here, but let's visualize it. Let's look at an example, but we should be clear that this example naturally generalizes to many settings that we want to think about. So let's look at semi-private function evaluation. This is where one player knows which of the functions they want to compute and the other player doesn't. Again, it will be clear later that this applies to general secure function evaluation. So let's say the two players have the function big F, which is basically a collection of functions small f indexed by the choice bits c, right? That's how all these functions are selected.
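The naive linear scan mentioned above, for reading an array at a private index, can be sketched in plain Python as follows. This is only an illustration of the data flow a circuit would have, not a secure implementation; the helper names `mux`, `equals`, and `oblivious_read` are made up for this sketch.

```python
def mux(bit, a, b):
    """Return a if bit == 1 else b, using arithmetic only (no branch on bit)."""
    return bit * a + (1 - bit) * b

def equals(i, j, width):
    """Return 1 if i == j, compared bit by bit as a circuit would do it."""
    eq = 1
    for k in range(width):
        bi = (i >> k) & 1
        bj = (j >> k) & 1
        eq &= 1 - (bi ^ bj)
    return eq

def oblivious_read(array, secret_index):
    """Touch every element, so the access pattern does not depend on the index."""
    width = max(1, (len(array) - 1).bit_length())
    result = 0
    for j, value in enumerate(array):
        result = mux(equals(secret_index, j, width), value, result)
    return result
```

The cost is linear in the array size for every single access, which is the overhead the oblivious RAM line of work reduces asymptotically.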
This setting was actually motivated by my prior work on private databases, where the database client wanted to query the database and the query was implemented as a garbled circuit. The database policy would allow, let's say, 30 different types of queries, each of which corresponds to a certain circuit structure. We want to hide not only what exactly the query is, what the data looks like, but also what the type of query is. So this is an actual problem; this is the formalization of a problem that arose from this database work.
Okay, so what would the circuit for function big F look like? It will look like this, right? This is the natural way of doing it, and it is the best we know that always works. There would be a number of these subcircuits, f1 through f30, each implementing one of the functions we look at. Then at the bottom, the outputs of all of these subcircuits are fed into a selection function, and that selection function drops the outputs of all but one of them and propagates the output of that one. So here you see that if you have many clauses, the circuit has to evaluate all of them, because if you don't evaluate even a single one of them, then you know that that case didn't happen, and that leaks information, so we cannot do that.
Okay, so what can we do? The first, kind of obvious, step is to realize that in the garbled circuit context the evaluator does not see f1, f2, f3; instead he sees only the topologies of f1, f2, f3, and so on. This is because the gate functions are hidden inside the garbled circuit. There are some caveats here, because with free-XOR the gate functions are not fully hidden: the XOR gates are special, and you have to know where the XOR gates are.
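As a rough illustration of the naive conditional circuit described above (every clause evaluated, then a selection function keeps one output), here is a plaintext sketch. The three clause functions and their names are placeholders invented for the example; the talk's setting has about 30 query types.

```python
# Hypothetical clauses standing in for f1 ... fK.
def f_add(x, y):
    return x + y

def f_mul(x, y):
    return x * y

def f_max(x, y):
    return max(x, y)

CLAUSES = [f_add, f_mul, f_max]

def naive_conditional(c, x, y):
    """Evaluate ALL clauses, then select the output of clause c.

    Returns (selected output, number of clauses evaluated).
    """
    outputs = [f(x, y) for f in CLAUSES]   # every clause is paid for
    result = 0
    for j, out in enumerate(outputs):
        selected = int(j == c)             # 1 for the chosen clause, else 0
        result += selected * out           # selection without skipping work
    return result, len(outputs)
```

The point of the sketch is the second return value: the work done is the sum over all clauses, regardless of which one is selected.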
But for this talk, let's ignore this. For now, I just want to say that we handle this in a decent way, so let's just talk about the case where the topology is all that the evaluator perceives.
Okay, so now let's pretend that we could magically put all of these circuits on top of each other. Then we can get rid of all of these extra clauses and simply evaluate one circuit, because the circuit generator can program it as any of these 30 clauses: we just put one on top of the other, so each one of them is possible to implement. The circuit generator simply assigns the right garbled table to each of these gates and sends a single garbled circuit, and that's really effective. This is basically what works immediately for the semi-private setting that I mentioned.
So the next question is how this embedding can be done. We have a theorem saying that this is not so simple to do, optimally at least: for a given set of circuits, finding the optimal embedding is NP-hard. So what can we do? Of course, we can do the naive thing from the earlier picture, where all of the clauses are evaluated. That's one way, and that circuit is a circuit that is universal for this set of circuits, but we want to do better. Another way is to use a universal circuit. Universal circuits are getting more attention recently, but a universal circuit doesn't work too well for smaller circuits because its cost is high. The universal circuit is solving a much harder problem at a much higher cost. For most reasonable functions that you want to compute, the naive way will be better than a universal circuit, because it's hard to imagine a really huge conditional statement.
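To make the overlay idea concrete, here is a small plaintext sketch of one fixed topology that is programmed with different gate functions to compute different clauses. The wire numbering, the `TOPOLOGY` layout, and the two example programs are assumptions made up for this illustration; in the real protocol the gate functions would be hidden inside garbled tables.

```python
GATE_FUNCS = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
}

# Shared topology: inputs are wires 0, 1, 2.
# Gate 0 reads wires (0, 1) and writes wire 3.
# Gate 1 reads wires (3, 2) and writes wire 4, the output.
TOPOLOGY = [(0, 1), (3, 2)]

def evaluate(topology, program, inputs):
    """Evaluate the fixed topology with one gate function chosen per gate."""
    wires = list(inputs)
    for (a, b), func in zip(topology, program):
        wires.append(GATE_FUNCS[func](wires[a], wires[b]))
    return wires[-1]

PROGRAM_F1 = ["AND", "OR"]    # computes (x0 AND x1) OR x2
PROGRAM_F2 = ["XOR", "XOR"]   # computes x0 XOR x1 XOR x2
```

Both clauses run on the same two gates, so the evaluator does the work of one clause while seeing only the shared topology.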
People looked very briefly at hand designs, how you could design those things by hand. This is the work of Paus, Schneider, and, I think, Ahmad-Reza Sadeghi, the paper PSS09. That was more of a definitional work, and they proposed some ways to do this, but only for very trivial circuit combinations. I think that people really don't know how to generate circuits, how to program circuits. I tried to talk to hardware engineers; nobody really understands how circuits look. Computers generate circuits, okay, but people are not able to process them.
Okay, so what do we want to do? Very naturally, very naively: the circuits have some common topologies, and when we put one circuit on top of the other, we want to exploit this commonality. The question is how you do this; that's not clear. Another observation is that we don't have to stick to the circuits we are given. We can massage them, we can stretch some branches, we can insert no-op gates anywhere we want that just pass data through, we can insert extra edges, and so on and so forth, as long as those operations still allow the original function of the circuit to be implemented.
Now we want to move from the circuit world into the graph world, which is a simple move. This was work with graph theory people, and they wouldn't be able to understand the circuit picture, but they would be able to understand the graph picture, and I think crypto people are sort of the same, basically. So given a circuit, we are going to create a DAG, which graph theory people understand, by simply calling the Boolean gates nodes and calling the wires edges. That's our transformation from one world to the other.
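The circuit-to-graph move described above can be sketched like this. The gate-list format `(name, type, inputs)` is a hypothetical one chosen for the example, and primary inputs are left out of the graph for simplicity.

```python
# Hypothetical circuit description: (gate name, gate type, input wire names).
CIRCUIT = [
    ("g1", "AND", ["x", "y"]),
    ("g2", "XOR", ["y", "z"]),
    ("g3", "OR", ["g1", "g2"]),
]

def circuit_to_dag(gates):
    """Gates become nodes; wires between gates become directed edges."""
    gate_names = {name for name, _, _ in gates}
    nodes = {name: gate_type for name, gate_type, _ in gates}
    edges = [
        (src, name)
        for name, _, inputs in gates
        for src in inputs
        if src in gate_names      # skip wires coming from primary inputs
    ]
    return nodes, edges
```

Keeping the gate type as a node label is one simple way to carry the gate weights mentioned in the talk, such as distinguishing XOR gates from non-XOR gates.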
There is a little bit more to it, because we want to deal with the costs, the weights, of the gates. For example, because of free-XOR, we want to try to map non-free gates onto non-free gates, but I'm not going to talk about this now.
Then we want to define formally, in the graph theory world, what an embedding is, because we're going to embed circuits inside of other circuits, the container circuits. An embedding of a graph D into D' is a mapping that maps nodes of D into what in graph theory lingo are arborescences, which are basically directed trees, and maps edges to edges. We want it to preserve certain properties. The most important one is property two, which basically ensures that we can evaluate the function, so that the new graph can serve as a topology for implementing the function that was implemented by graph D. It ensures that the embedding preserves the connections: any of these arborescences is connected by an edge to the next one, according to the edges in the original graph. So if there is a way to program and connect the gates here, then the same thing can be done on the other side.
Okay, so how can we do all this? The heuristic is actually relatively simple at a high level, but somehow it took a long time to come up with. The first observation is that for formulas this is easy, and we can solve it exactly and efficiently using dynamic programming. We start with two graphs, and we want to generate a third graph that embeds both. For each pair of nodes, v in D1 and w in D2, we find the best embedding for the sub-trees rooted at v and w, right?
Then using dynamic programming we continue doing that. So for formulas we say that there exists an algorithm, running in big-O of the product of the sizes of the two graphs, that finds the optimal embedding for D1 and D2.
But formulas are not really what we want, because they're not that useful; what we really want is circuits, and this is where the heuristic comes in. The heuristic is basically to take the circuits that are given to us, drop a bunch of edges to make a bunch of formulas out of the circuits, apply the formula algorithm, then restore the dropped edges, and see where that gets us.
In a little more detail, let's say D1 and D2 are the circuits that we're given. We're going to generate a spanning forest of D1 by randomly choosing how to do it, so there is a lot of randomness in this process, and we need to drop some of the edges because each of these pieces has to be a tree, a formula, right?
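The dynamic program over pairs of nodes can be illustrated with a deliberately simplified variant. The sketch below is not the algorithm from the talk: it assumes the two roots are placed on top of each other, children are matched one to one, and no stretching with no-op nodes is allowed. Trees are nested tuples of children, a representation chosen only for this example.

```python
from functools import lru_cache
from itertools import permutations

def size(tree):
    """Number of nodes in a tree given as a nested tuple of child subtrees."""
    return 1 + sum(size(child) for child in tree)

@lru_cache(maxsize=None)
def overlay_cost(t1, t2):
    """Fewest nodes in a tree containing both t1 and t2 with roots aligned."""
    k = max(len(t1), len(t2))
    a = list(t1) + [None] * (k - len(t1))   # pad the shorter child list
    b = list(t2) + [None] * (k - len(t2))
    best = None
    for perm in permutations(b):            # try every child matching
        total = 1                           # the shared root
        for c1, c2 in zip(a, perm):
            if c1 is None:
                total += size(c2)           # unmatched subtree copied as is
            elif c2 is None:
                total += size(c1)
            else:
                total += overlay_cost(c1, c2)
        best = total if best is None else min(best, total)
    return best

LEAF = ()
T1 = ((LEAF, LEAF), LEAF)   # 5 nodes
T2 = (LEAF, (LEAF,))        # 4 nodes
```

Here `overlay_cost(T1, T2)` is 5, against 9 nodes for keeping the two trees side by side. Memoization means each pair of subtrees is solved once, which for bounded fan-in mirrors the product-of-sizes bound stated in the talk.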
So it's going to be a forest instead of a general graph: we have a spanning forest here and a spanning forest there, and all of these are trees. Then we pairwise compute the best possible combinations, how a tree from here will be overlaid with a tree from there, and there is going to be a quadratic number of those. We calculate the cost of each overlay, which means the size, in the weights, of the overlaid trees in the forests, and then we choose the best mapping between the sub-trees here and there. That is going to be our basis for the container circuit.
But that's not all, because we still need to put the edges that we dropped back into what we built, and that creates a couple of small problems that are not too hard to solve. One problem is that when we insert the edges, there can be nodes with large fan-in, and that can be expensive in garbled circuits because the cost of a garbled gate is exponential in the number of inputs to the gate. But this is easy to solve: if a gate receives too many inputs, we can split it into a number of gates, the other inputs are fed there, and you can still implement the function that you want. Another problem is that cycles can be introduced into the DAG when we insert the edges that we dropped. Here is an example of how a cycle may be introduced. But the cycles can be broken, again by introducing a couple of nodes, and it works out.
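The random edge-dropping step can be sketched as follows, under the assumption that a formula means every gate has fan-out at most one. The function name and the edge-list format are invented for this example, and the talk does not spell out how the random choice is actually made.

```python
import random

def random_spanning_forest(edges, rng):
    """Keep one randomly chosen outgoing edge per node and drop the rest.

    With fan-out at most one, each component of a DAG is a tree.
    Returns (kept edges, dropped edges); dropped edges are restored later.
    """
    out_edges = {}
    for src, dst in edges:
        out_edges.setdefault(src, []).append((src, dst))
    kept, dropped = [], []
    for candidates in out_edges.values():
        i = rng.randrange(len(candidates))
        kept.append(candidates[i])
        dropped.extend(candidates[:i] + candidates[i + 1:])
    return kept, dropped

EDGES = [("g1", "g3"), ("g1", "g4"), ("g2", "g3"), ("g3", "g5"), ("g4", "g5")]
kept, dropped = random_spanning_forest(EDGES, random.Random(0))
```

Running it with different seeds gives different forests, which is where the randomness of the heuristic comes from.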
So this is the heuristic at a high level. To reiterate, this is a probabilistic heuristic, and the randomness comes in how we choose to split the circuits into sub-trees, that is, which edges are dropped. We can be lucky: if we randomly choose trees that map well to each other, then we're going to have a good, small container circuit, and if we are unlucky, we'll have a large container circuit. So what we can do is keep trying until something works. This is a slow process, and we did this in Python, so it is a little extra slow, but the point is that this is a one-time compilation expense: once you've generated a circuit, you're done, and you can keep using it.
It's really hard to estimate analytically what we're getting, so we did it experimentally. We generated a bunch of circuits using a CPC compiler. We tried to get circuits as diverse as possible, but there is a limited number of smaller-sized circuits that make any sense. So we did all kinds of arithmetic operations and combinations of them, hoping that this would be confusing for our heuristic. At the same time, it reflects reality, where the circuits that we're dealing with consist of basic building blocks.
Given all these 32 circuits, we did a tournament-style pairwise competition. We looked at all pairs of circuits, ran the heuristic for a while, and chose the lowest-cost containers. So basically, at the top level, we ran one round of lowest-cost pairing. And then we continued.
So we went from 32 circuits, pairwise, to 16 circuits, and each of the 16 circuits was able to implement two of the input circuits. And then we continued, until we got one circuit in the end that is universal for all of them. As for the total number of non-XOR gates per round across the circuits: we started at round zero, where the input circuits had about 20,000 gates, and each round reduced the number of non-XOR gates. In the end, the circuit that embedded everything had quite a small number of gates. So this is good.
If you have a private function evaluation scenario where one player knows the function that he wants to compute, then you can directly use this. Given this circuit that embeds everything, that player can simply program the circuit to implement the function that he wants. So there's no extra overhead compared to standard garbled circuits, for example.
But if we go to general secure computation, where your conditional statement is probably based on an internal variable, then that doesn't work. So what works? The most straightforward way is that the circuit garbler generates K garblings: you have K clauses in your conditional statement, and you generate K garblings. Then you do OT, where one of these 32 is sent, right? That's fine, but it doesn't actually help, because under the hood in OT you have to send all of those K encryptions. So that cost will be equal to the cost of the naive method where you just send them all.
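The tournament-style reduction (32 circuits, then 16, and so on down to one) can be sketched as below. Everything concrete here is an assumption for illustration: circuits are modelled as counts of gate types, `overlay` is a toy stand-in that takes the larger count per type, and pairs are picked greedily by lowest container cost. The talk does not state the exact pairing rule, and the real overlay is the randomized heuristic.

```python
from collections import Counter
from itertools import combinations

def overlay(a, b):
    """Toy stand-in for the heuristic: per gate type, keep the larger count."""
    return a | b

def cost(circuit):
    return sum(circuit.values())

def tournament(circuits):
    """Pair circuits round by round until one container remains.

    Expects a non-empty list; returns (container, total cost after each round).
    """
    current = list(circuits)
    totals = [sum(cost(c) for c in current)]
    while len(current) > 1:
        unpaired = list(range(len(current)))
        next_round = []
        while len(unpaired) > 1:
            # Greedy choice: the cheapest container among the remaining pairs.
            i, j = min(
                combinations(unpaired, 2),
                key=lambda p: cost(overlay(current[p[0]], current[p[1]])),
            )
            next_round.append(overlay(current[i], current[j]))
            unpaired.remove(i)
            unpaired.remove(j)
        next_round.extend(current[i] for i in unpaired)  # odd one advances
        current = next_round
        totals.append(sum(cost(c) for c in current))
    return current[0], totals

TOY = [Counter(AND=3, OR=1), Counter(AND=2, OR=2), Counter(AND=1), Counter(OR=4)]
container, totals = tournament(TOY)
```

The `totals` list plays the role of the per-round gate counts reported in the talk: it starts at the sum of the input circuits and shrinks each round.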
What you can do instead of generating K garblings is generate K programming strings, where a programming string consists of a number of positions, and each position says which gate function is assigned to a given gate. Then you do the OT on the programming strings; the evaluator receives a programming string and uses it to run an OT per gate. So now, per gate, when we evaluate, we run a 1-out-of-a-small-number OT. That number doesn't depend on the number of clauses; it's going to be 1-out-of-5, because there are three possible gate functions that you can have, and then we introduce a couple of additional gate functions that the construction needs. So you do 1-out-of-5 OT. You do have to do some additional work, because you can't just let the evaluator learn the programming string; of course, he would then learn the function he is evaluating. But you can mask it, and then during the oblivious transfer you can unmask it, so this works.
This cost is kind of significant, actually, and a simple calculation shows that we're about 13 times slower per gate compared to standard garbled gates. So to break even, we need to have a larger number of clauses. We expect to break even at something like 64 circuits, and I think we can do better with optimizations. But if we take not the garbled circuit approach but the GMW approach, then we can actually do much better.
Let me see if I can convey the intuition for why GMW works better. With the garbled circuit approach, the reason why doing OT on the whole thing doesn't work well is that you need to send long secrets. The entire garbling is one long secret, and if you're doing 1-out-of-K OT, then you need to send K of those long secrets.
But in the GMW case, the secrets that you're sending inside the OT are short; they're one-bit secrets, because they're just shares of the wire values. Because of that, you can send a larger number of secrets cheaply. I don't have time to go into more detail, but just to state the result: in GMW you can have gates with multiple inputs, let's say eight binary inputs, at a cost comparable to that of two-input garbled gates. This observation we made, and it was also made independently and concurrently in work by a number of authors, Thomas Schneider being one of them. The paper was titled something like computing on lookup tables. Anyway, in the GMW setting, the circuit reduction that we get maps directly into a performance improvement in the evaluation, and there is no additional overhead due to the need for the OT.
And that's it. Thank you very much.
Thank you very much. We have no time for questions, so if you have any questions, you can contact the speaker later. Let's thank the speaker again. Thank you very much.
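For readers who want to see why the OT secrets in GMW are single bits, here is a minimal plaintext sketch of a textbook two-party GMW AND gate on XOR-shared wires. It is the standard construction, not the multi-input variant from the talk, and `ot_1_of_4` is an insecure stand-in that simply indexes the table.

```python
import secrets

def share(bit):
    """Split a bit into two XOR shares."""
    r = secrets.randbits(1)
    return r, bit ^ r

def ot_1_of_4(table, choice):
    """Placeholder for 1-out-of-4 oblivious transfer (NOT secure)."""
    return table[choice]

def gmw_and(a1, b1, a2, b2):
    """Party 1 holds shares (a1, b1), party 2 holds (a2, b2).

    Returns XOR shares (z1, z2) of (a1 ^ a2) & (b1 ^ b2).
    """
    r = secrets.randbits(1)    # party 1's output share, also the mask
    # One masked entry for each possible pair of party 2's shares.
    table = [
        r ^ ((a1 ^ x) & (b1 ^ y))
        for x in (0, 1)
        for y in (0, 1)
    ]
    z2 = ot_1_of_4(table, 2 * a2 + b2)   # party 2 learns only its entry
    return r, z2
```

Each table entry is one bit, so a gate with more inputs only needs a longer table of one-bit entries, which matches the intuition given in the talk for why larger gates stay cheap in GMW.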