Hello everyone, and thank you for listening to this talk about breaking the circuit-size barrier for secure computation under quasi-polynomial LPN. This is joint work with Geoffroy Couteau, and the way I like to present our result is to say that we show how to achieve sublinear-communication secure computation from LPN, by using a tool known as homomorphic secret sharing. I'll get into what HSS is in a moment, but for now let's just have a look at our main result.

Our main result is that, under a reasonable flavor of LPN, there is a two-party protocol for securely computing any (layered) circuit which uses an amount of communication that is sublinear in the circuit size S. More precisely, and ignoring the assumption for now: if all you require is very slightly sublinear communication, for instance S / log* S, then the amount of computation required is nearly linear. You may wonder how far the factor of sublinearity can be pushed using our techniques. It so happens that communication can be as low as S / log log S, in which case computation becomes cubic.

For the rest of this talk I'll be explaining how our protocol works, but in order to do that I'll have to start by introducing a few objects. Once that's done, we can go over the protocol step by step.

I should start by mentioning that our protocol is of course not the first one to achieve sublinear two-party computation. This was known both in the correlated randomness model and from a variety of computational assumptions, so our contribution, as it were, is to add LPN to this list. As a matter of fact, our techniques are very much derived from two lines of work: the first is the line of work on homomorphic secret sharing, and the second is Couteau's Eurocrypt 2019 protocol in the correlated randomness model.

But what is the LPN assumption? I'm sure most of you are familiar with the standard formulation of the learning parity with noise assumption: it is parameterized by three values, namely the dimensions of the matrix A and the density of the noise (or error) vector e, and it states that (A, A·s + e) is computationally indistinguishable from uniform. For our purposes, it's more convenient to consider the dual form of LPN, which is essentially equivalent. Here goes: instead of the matrix A, I can always give you its parity-check matrix H, a matrix satisfying H·A = 0; that's roughly equivalent. Now let's look at what happens to the second term: if we multiply it by the parity-check matrix, then since H and A cancel out, all you're left with is H·e. So this is the dual LPN assumption: essentially, if you wish to generate a pseudorandom vector, all you need to do is multiply a public compressing matrix by some sparse noise vector. The parameters are still there, as the dimensions of the parity-check matrix H and the density of the noise.

Let's look at the values of the parameters for our specific flavor of the assumption, which is the quasi-polynomial (or superpolynomial, if you will) LPN assumption. Roughly, we assume that the adversary is allowed to run in slightly superpolynomial time, while the best known attack takes exponential time. I'll leave you to judge whether the gap between the best known attack and the level of security we require is large enough to justify this assumption.
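To make the dual form concrete, here is a minimal numpy sketch of sampling a dual-LPN instance; the dimensions and noise weight are toy values I chose for illustration, nowhere near secure parameters.

```python
# Toy sketch of a dual-LPN sample (illustrative parameters, not secure ones).
import numpy as np

rng = np.random.default_rng(0)
n, N, t = 64, 256, 8                 # output length, noise length, noise weight
H = rng.integers(0, 2, size=(n, N))  # public compressing matrix over F_2

e = np.zeros(N, dtype=int)           # t-sparse noise vector
e[rng.choice(N, size=t, replace=False)] = 1

r = (H @ e) % 2                      # dual LPN: r is pseudorandom given H
```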
The reason we use the LPN assumption is that it enables us to instantiate the two key primitives around which our protocol is built. The first is a pseudorandom correlation generator, or PCG. The way it works is that there is a target correlation, and we wish to be able to generate a pair of very compact seeds (it's very important that the seeds be compact) which can each be expanded in such a way that the joint distribution of the expanded seeds is computationally indistinguishable from the target correlation. Here, we'll only be interested in a PCG for one very specific correlation: what we want is to generate seeds which expand to additive shares of the first D tensor powers of a pseudorandom vector r, that is, of r, r⊗r, and so on up to the D-th tensor power. The reason this is interesting is that any degree-D polynomial in the elements of r can be obtained via a linear combination of the elements of these D tensor powers.

This specific PCG allows us to build our second primitive, function secret sharing. Given a function, we want to be able to derive a pair of evaluation keys which can be used to generate additive shares of the evaluation of the function at any point. An important requirement is that the function be kept secret: more specifically, if you're only given one of the evaluation keys k0 or k1, you shouldn't learn what the function is. Function secret sharing is defined with respect to a function class, and in this case we're interested in function secret sharing for the class of all depth-D circuits; note that a fan-in-two circuit of depth D computes outputs of degree at most 2^D in its inputs, which is why the tensor-power correlation above is exactly the tool we need. I should briefly mention that function secret sharing is the dual of homomorphic secret sharing, where instead of a secret function we have a secret input; but for the purposes of this talk we needn't be concerned with HSS, beyond mentioning that this dual form exists.

Finally, and just before I can present the protocol, we need to talk about circuits, because our protocol doesn't work for the class of all circuits, but rather for the class of all layered circuits. A circuit is layered if you can partition the gates into layers such that wires don't cross layers: each wire only goes from one layer to the next. There is a generic transformation from circuits to layered circuits, but it incurs a quadratic blow-up in size; that's why sublinear protocols don't carry over from layered circuits to general circuits. In this presentation, we'll only be considering the simplified case of rectangular layered circuits, where every layer has the same size, which we call the width w of the circuit; the width times the depth is then equal to the size of the circuit. I'd just like to emphasize that this simplification is only made because it makes the protocol simpler to explain: the protocol does work for non-rectangular layered circuits.

We now have everything we need in order to understand the protocol. Consider for now a circuit C1 of width w and depth k, and let's assume that there exists a protocol which allows two parties to convert additive shares of x into additive shares of C1(x), and that this protocol only uses communication roughly proportional to the width. If such a protocol exists, and we now consider another such circuit C2, then the parties can reuse the protocol to convert the additive shares of C1(x) into additive shares of C2(C1(x)), and they can do so over and over for any such circuit. This is how we use the assumption that the circuit is layered: first, you divide the layered circuit into chunks of k consecutive layers, and then you apply such a low-communication protocol to each chunk, as in the sketch below. If you do so, you'll obtain a secure computation protocol whose total communication is S/k, the circuit size divided by k. So if k can be superconstant, we've won: the protocol is sublinear.
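As a minimal sketch of this chunking loop, from one party's point of view and assuming a hypothetical subprotocol `convert_shares` (my name, not the paper's) that turns a share of a chunk's input into a share of its output:

```python
# Minimal sketch of the chunk-by-chunk evaluation from one party's point of
# view, assuming a hypothetical subprotocol `convert_shares(chunk, share)`
# that converts a share of the chunk's input into a share of its output
# using communication proportional to the width w.
def evaluate_layered_circuit(layers, k, input_share, convert_shares):
    """Process a layered circuit in chunks of k consecutive layers.

    With depth d = len(layers) and width w, the total communication is about
    (d / k) * w = S / k, i.e. sublinear in S whenever k is superconstant.
    """
    share = input_share
    for i in range(0, len(layers), k):
        chunk = layers[i:i + k]               # the next k consecutive layers
        share = convert_shares(chunk, share)  # one low-communication step
    return share                              # additive share of C(x)
```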
Fortunately for us, such a low-communication protocol for shallow circuits exists from LPN. The amount of communication is all right; unfortunately, the computation is too large. As soon as k is superconstant, the computation becomes superpolynomial. In order to understand why that is, we'll have to open up this secure computation protocol for shallow circuits, which actually uses function secret sharing. For the next few minutes, the talk is going to be very technical; fortunately, there's only one key idea which needs to be extracted from all of this, so I'll first go into the details, and then we can extract the main idea.

So consider a shallow circuit C1 of depth k, and recall that the goal of the parties is to convert additive shares of the input x into additive shares of the output of the circuit. At a high level, the protocol has three steps. The first is to generate additive shares of the first 2^k tensor powers of some random vector r1 we'll call the mask; we call it the mask precisely because we're going to use it in order to mask the input. The second step is for the parties to exchange shares of x + r1 in order to reconstruct this masked value of the input. The third step is to use what was generated in the first two steps in order to locally, so without communication, generate shares of the output.

Let's see now how this works. The first step can be done by considering a PCG for the correlation we defined much earlier, and running a generic two-party computation protocol in order to distribute the seed generation. Since the seed generation circuit is compact and the seeds are very short, this step needn't be sublinear: just any efficient enough two-party computation will do. Then the parties can locally run the seed expansion in order to get the shares they need. The next step is simply for each party to take the sum of their share of x and their share of r1, that is, their share of the input plus their share of the mask, in order to define their share of the masked input. The parties can safely exchange these shares in order to reconstruct the masked input.

The third step I'll explain now. The way it works is that you consider the circuit which first removes the mask r1 and then applies the circuit C1, that is, the circuit with the mask hardcoded which maps the public value z = x + r1 to C1(z - r1). It can be expressed as a polynomial in z whose coefficients are themselves degree-2^k polynomials in the elements of the mask. Since shares of the first 2^k tensor powers of r1 are enough to get additive shares of each of those polynomials, the parties have all they need in order to locally compute additive shares of C1(x); a worked toy example follows in a moment.

Now let's just have a look at the communication complexity. Step one only uses a small amount of communication, because the only thing which requires interaction is the distribution of the seed generation, and the seeds are tiny. The second step requires communication proportional to the size of the input, i.e. the width of the circuit; this is going to be an important point in the rest of this talk, so I'll remind you of it when it's relevant. And what's remarkable is that the third step is completely silent: it requires no interaction. If you'd like, you could pause the video and observe that hidden in what we've done is an instance of a function secret sharing scheme for the function which first removes the mask and then applies the circuit C1. But this observation isn't actually central to our protocol design.
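To make the silent third step concrete, here is a worked toy example over F_2 for a single AND gate (depth k = 1, so degree 2^k = 2); the function and variable names are mine, not from the paper.

```python
import numpy as np

def local_eval_share(b, z, r_sh, rr_sh):
    """Party b's additive share (mod 2) of C1(x) = x0*x1, given the public
    masked input z = x + r, a share r_sh of r, and a share rr_sh of r⊗r.
    Over F_2, C1(z - r) = C1(z + r) = (z0 + r0)(z1 + r1)
                        = z0*z1 + z0*r1 + z1*r0 + r0*r1,
    a polynomial in z whose coefficients are degree <= 2 polynomials in r."""
    return (b * z[0] * z[1]      # public term, contributed by one party only
            + z[0] * r_sh[1]     # share of z0 * r1
            + z[1] * r_sh[0]     # share of z1 * r0
            + rr_sh[0, 1]) % 2   # share of r0 * r1, an entry of r⊗r

rng = np.random.default_rng(0)
x, r = rng.integers(0, 2, 2), rng.integers(0, 2, 2)
z = (x + r) % 2                          # step 2: reconstructed masked input
r_sh0 = rng.integers(0, 2, 2);  r_sh1 = (r - r_sh0) % 2
rr = np.outer(r, r) % 2                  # step 1: second tensor power of r
rr_sh0 = rng.integers(0, 2, (2, 2));  rr_sh1 = (rr - rr_sh0) % 2
y0 = local_eval_share(0, z, r_sh0, rr_sh0)
y1 = local_eval_share(1, z, r_sh1, rr_sh1)
assert (y0 + y1) % 2 == x[0] * x[1]      # the shares reconstruct C1(x)
```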
Now that we've seen how the protocol for computing a shallow chunk of layers works, let's try and understand why its computational complexity is of the form w^(2^k). It all has to do with how the PCG works. As you may have guessed, the pseudorandom mask is in fact generated using an instance of dual LPN: in other words, we have some public compressing matrix (which is not as square as it looks here) multiplied by some secret sparse noise vector. A fact of linear algebra is that the tensor power of a product is equal to the product of the tensor powers: (H·e)^⊗d = H^⊗d · e^⊗d. Therefore, if we wish to compute additive shares of a tensor power of r, we can do so by multiplying a public matrix, namely a tensor power of H, by additive shares of the corresponding tensor power of e, the noise vector.

So our PCG, as I stated it before, is in fact a wrapper around another PCG: the one which generates additive shares of the tensor powers of the noise vector. We don't need to have a look at how that inner PCG works; all we need to know is that there is some clever way to exploit the fact that the noise vector e is very sparse in order to obtain a very compact seed generation. But now it should be clear why the computational complexity is of the form w^(2^k): it's simply the size of the public matrix H^⊗(2^k), that is, H tensored with itself 2^k times.

What we just saw is that the computational complexity of this primitive is prohibitively high if we apply it directly to a chunk of layers. Now, this can't be the end of it, or I wouldn't be giving this presentation. As a matter of fact, we can use an observation made by Couteau, who faced a similar problem in the correlated randomness model. This observation is that we're dealing with circuits where each gate has fan-in at most 2. Combined with the shallow depth of the chunk, this means that it's highly local: each output depends on at most 2^k inputs. More generally, if you take w' outputs, they depend on at most 2^k · w' inputs. So we can decompose the computation into small blocks this way, and each of those blocks is now small enough that applying the previous protocol to it can be done in polynomial time. Indeed, if k and w' are small enough, the computation is (2^k · w')^(2^k), which can be polynomial in w.

So the question is: are we done here? Is all we need to do to apply this decomposition into blocks and treat them as independent instances? If the answer were positive, the protocol would be very simple, but we wouldn't be contributing anything, so obviously the answer is no. To understand why, you have to recall how the protocol worked: step number two required the parties to reconstruct the masked value of the input, so the communication was equal to the size of the input. For each block the input has size 2^k · w', so plug that in, and you'll see that, summed over all w/w' blocks, the communication becomes 2^k · w per chunk, which is too large.

At this point you may pause and wonder: you may be convinced that this particular decomposition, the first w' outputs followed by the next w' and the next and the next, doesn't work, but does this mean that there is no block decomposition which works? Let's express this as a combinatorial problem. The circuit, which is adversarially chosen, is essentially, for our purposes here, just a list of subsets of inputs.
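To make the accounting concrete, here is a toy sketch (the names and the contiguous decomposition are mine) that measures the step-2 communication of a block decomposition: each block must open the masked values of the union of its inputs.

```python
# Toy accounting of step-2 communication for a block decomposition of one
# chunk. The chunk is modelled as a list `deps`, where deps[j] is the set of
# input wires output j depends on (|deps[j]| <= 2^k by fan-in-2 locality).
def step2_communication(deps, blocks):
    """Total masked values opened: for each block (a list of output indices),
    count the union of the inputs its outputs depend on."""
    return sum(len(set().union(*(deps[j] for j in block))) for block in blocks)

# Contiguous batching of w' outputs per block: in the worst case the unions
# barely overlap, so each block opens about 2^k * w' values and the chunk
# total is about 2^k * w, a factor 2^k more than the w we are aiming for.
def contiguous_blocks(num_outputs, w_prime):
    return [list(range(i, min(i + w_prime, num_outputs)))
            for i in range(0, num_outputs, w_prime)]
```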
Each output corresponds to a subset of at most 2^k inputs, and our goal is to find the nicest decomposition, one which works for us. What we want to do, instead of just batching the outputs arbitrarily, is to batch together outputs with the most overlapping inputs, in the hope of finding a decomposition into blocks where each block has a small enough number of inputs that, if we were to apply the protocol to each of those blocks, the overall communication would still be linear in w.

Now, unfortunately, no such decomposition can exist. To understand why, we have to remember that the circuit is adversarially chosen: our protocol has to work for every single circuit, even circuits which are worst-case for us. In particular, if the matching of outputs to inputs is just a random bipartite graph, then with high probability it is an expander, and being an expander means exactly that we won't be able to find such a decomposition into small blocks. So there is no way of simply finding a decomposition which would allow us to just invoke independent instances of the previous protocol.

Well, if we can't treat these blocks as independent, we'll just have to deal with them all at the same time, in a correlated fashion. Since the issue we raised just before was that we couldn't generate too many little masks, what about just generating one single mask, and whenever we want to compute one of those blocks, we'll only consider a submask? Unfortunately, this doesn't actually solve the problem of computation, and we can very easily see why, visually. Here is the mask r, which covers the entire span of the inputs; it has size w. Now let's say we're only interested in a submask r' of size 2^k · w', and what we want to do with this vector is compute its first 2^k tensor powers. Unfortunately, as you can see, the corresponding submatrix H' still has one of its dimensions depending on w, because the submask is still a linear combination of the entire noise vector. So if you actually raise r' to the 2^k-th tensor power, the computation is still of the form w^(2^k): we've only reduced one of the dimensions of the matrix, not both. On a more intuitive level, the input-locality observation simply does not translate to compactly generated submasks.

At this point we've reached the heart of the problem. We need to define a PCG for a new type of correlation, namely the following: when the seeds have been expanded, we want the parties to get two things. The first is additive shares of a long mask of size w. The second is that for every submask of size 2^k · w', the parties should be able to get additive shares of the first 2^k tensor powers of that submask. The only question that remains is whether we can instantiate this new PCG under LPN. The answer is yes, and the solution is the following: instead of generating the random mask directly as a public matrix times a sparse noise vector, we're going to generate it as the sum of many shorter masks. The point is that if we're only interested in a given submask r', it will only depend on a local subset of the shorter masks, as the sketch below illustrates.
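Here is a rough sketch of this layout; I've simplified the sum of shorter masks to disjoint windows (so the sum degenerates to a concatenation), which is the easiest special case to picture.

```python
import numpy as np
# Toy sketch: the w-long mask is assembled from many short dual-LPN masks
# r_i = H_i @ e_i, shown here in disjoint windows for simplicity. A submask
# of r then only involves the few (H_i, e_i) pairs whose windows it touches.
rng = np.random.default_rng(0)

def sparse_noise(n, t):
    e = np.zeros(n, dtype=int)
    e[rng.choice(n, size=t, replace=False)] = 1
    return e

num_short, short_len, noise_len = 8, 16, 64     # toy parameters only
Hs = [rng.integers(0, 2, size=(short_len, noise_len)) for _ in range(num_short)]
es = [sparse_noise(noise_len, t=4) for _ in range(num_short)]
r = np.concatenate([(H @ e) % 2 for H, e in zip(Hs, es)])  # full mask, size w
# A submask spanning windows i..j depends only on H_i..H_j and e_i..e_j.
```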
Now, if each of the shorter masks is generated as a small instance of LPN, so as some small matrix H_i times a sparse noise vector e_i, it so happens that even if you concatenate all of the matrices defining the masks which intersect with that subset, the concatenation of all these matrices is still a small matrix. In particular, it can be raised to the 2^k-th tensor power while remaining polynomial in w. One observation is that, since we've drastically reduced the size of our LPN instances, we now need to assume the quasi-polynomial, or superpolynomial, security of LPN. So if we go back to our main result, we can understand why we require the superpolynomial hardness of LPN in order to instantiate our protocol.

Let's now briefly summarize how the protocol goes. The first step is to take the layered circuit and divide it into chunks of k consecutive layers, where k is the desired factor of sublinearity. Then we take each chunk and break it down into blocks, where the goal of each block is to compute only a small subset of the outputs. Finally, each block can be evaluated using our new PCG. When you plug all of this together, we get a certain number of constraints on the dimensions of the LPN problem, the size of the blocks, and the width and depth of the chunks; playing around with all these parameters, we find a sweet spot where everything works together. The sublinearity factor can be anywhere from an arbitrarily small superconstant factor to almost log log S, and the computation follows from whatever the communication is.

With this, I'd like to thank you very much for your attention, and goodbye.