And in this work, what I'm going to talk about is an asymptotically efficient way to do this evaluation procedure. The only thing that I'm going to say about how homomorphic encryption works in general is that, in some sense, these schemes all follow the original approach of Gentry, where the ciphertext has some noise in it; this is why you get security. And the noise grows with the homomorphic evaluations. You cannot do too many homomorphic evaluations, or else the ciphertext gets too noisy and you cannot decrypt anymore. That's pretty much the only thing I'm going to say about how these homomorphic encryption schemes work. The consequence of that, and it was very clear in Jean-Sébastien's presentation, is that you have very big ciphertexts. The ciphertext size has to be large enough so that you have room for all of this noise to grow in it. In particular, you definitely cannot get below something which is quasi-linear in your security parameter, and there are many things that can be hidden inside this quasi-linear term. The practical numbers, as Jean-Sébastien said, are that a ciphertext is something like a few million bits, and it encrypts a single bit. Now, when you're going to evaluate things on these huge ciphertexts, it's going to take you a whole lot of time. If you want to evaluate a single arithmetic gate on bits, when you do it homomorphically, you have to take in these two very, very large ciphertexts, so it's going to take you a lot of time; at the very least, you need to read the input ciphertexts. So if you just follow this approach the way it is and encrypt individual bits, the best that you can expect is a blowup in running time which is the size of your ciphertext. So at least a quasi-linear overhead in your security parameter: if it takes you some time to evaluate the original circuit, at least a factor of the security parameter larger is what you can expect with homomorphic encryption.
The result that we have in this paper is a more efficient way of doing things that lets you reduce this overhead. In particular, in some cases you can get overhead which is only polylogarithmic in your security parameter, as long as the original circuit that you were trying to evaluate was wide enough. The approach is based on batching: on packing many bits inside a single ciphertext. So if your circuit was wide enough, so that you had enough bits to pack in a single ciphertext, then you can really decrease the overhead to polylogarithmic. Otherwise, you get a more complicated expression, which tells you: if you have a circuit like this, it has T over W levels, where T is the number of gates and W is the average width, and each level takes this expression to evaluate. The overall approach that we're using is: we use homomorphic encryption over polynomial rings, we pack many bits of plaintext inside a single ciphertext, and we use ring automorphisms to move bits around inside these arrays of encrypted bits. The main technical thrust of this work is how to do that efficiently, how to move bits to the places where they're supposed to be in an efficient way. And I'm going to describe all of this, at least at a very high level. So let's start with a little bit of background: homomorphic encryption over polynomial rings, how you pack many bits inside a single ciphertext, and how you do operations in a SIMD fashion on these packed ciphertexts. So, homomorphic encryption over polynomial rings. We start from the currently most efficient family of homomorphic encryption schemes: there's the paper from earlier this year by Brakerski et al., there's the López-Alt et al. paper in STOC, and there's another paper of Brakerski's on ePrint.
All of them are slightly different variants of the same family of schemes, and the native plaintext space in all of these schemes is actually polynomials. The thing that you manipulate, the thing that's encrypted there, is a polynomial in some polynomial ring. When you do multiplication, what you get is multiplication of polynomials in that ring; when you do addition, you get addition of polynomials in that ring. Now, you can use those polynomials to encode single bits, but as I will show in a minute, you can use them to encode other things as well. The plaintext space is binary polynomials modulo phi m of x, where phi m of x is the m-th cyclotomic polynomial. This is a polynomial whose degree is phi of m, where phi is the Euler totient function. Those polynomials are irreducible when you look at them over the integers. But for the plaintext you think of them modulo 2, and modulo 2 they are not necessarily irreducible: they can factor into a product of irreducible polynomials, and it happens that those factors all have the same degree. Depending on what m is, the number of factors can vary, but you can choose a family of parameters m such that the number of factors grows almost as quickly as m, up to a log factor. So the polynomial that defines your ring factors into a product of very many things; it's a very smooth polynomial in that sense. What do you do with that? Well, if the ring polynomial factors into many small factors, then you can use Chinese remaindering to encode many things inside a single ciphertext. So a polynomial a would represent L different elements, and the representation, one way to do it, is to think of a modulo each one of the factors of your ring polynomial. This is just like representing an integer by its Chinese remainders modulo many different small primes.
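The integer analogy at the end there can be made concrete in a few lines of Python. This is just a toy illustration of how CRT residues act as independent slots; the small primes are my choice, and none of this is part of any encryption scheme:

```python
# Toy illustration: CRT residues behave like independent "slots".
primes = [2, 3, 5]

def to_slots(n):
    """Represent an integer by its residues modulo each small prime."""
    return [n % p for p in primes]

a, b = 4, 7
# Multiplying the integers multiplies the slot values pointwise:
assert to_slots(a * b) == [(x * y) % p for x, y, p in zip(to_slots(a), to_slots(b), primes)]
# Same for addition:
assert to_slots(a + b) == [(x + y) % p for x, y, p in zip(to_slots(a), to_slots(b), primes)]
print(to_slots(a), to_slots(b), to_slots(a * b))  # -> [0, 1, 4] [1, 1, 2] [0, 1, 3]
```

The polynomial setting works the same way, with the irreducible factors of the ring polynomial playing the role of the small primes.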
And if you want to encrypt bits, then you just set it up so that each one of these polynomials a mod f_i is just a bit: a constant polynomial with everything but the free coefficient being 0. Just like when you do Chinese remaindering, you can have an integer that's 0 mod 2, 1 mod 3, 0 mod 5; so it's a 0 or a 1 modulo each one of the small primes. Bigger things fit in these slots, but you can put bits in them. And when you do that, it behaves exactly like Chinese remaindering over the integers. When you multiply two polynomials, what happens is that, modulo each one of the factors, that particular part of the polynomial gets multiplied, so you get multiplication. If you take two of those and add them, you get pointwise addition on the slot values; if you take the two and multiply them, you get pointwise multiplication of the plaintext slots. So you get an L-Add and an L-Mult: we just operate on arrays as opposed to single elements. And that gives you your SIMD operations, single instruction, multiple data. Smart and Vercauteren suggested that possibility. You want to compute the same function on L inputs at the price of one computation, so you're just going to pack your inputs into the slots; you do a bit-sliced implementation. All the inputs to your first instance are placed in the first slot of all your ciphertexts, all the inputs to the second instance in the second slot of all the ciphertexts. And then when you do your function computation, you compute these L instances of the same function on different data all at once, and you get your efficiency this way. One useful operation that you can do, once you have these pointwise additions and multiplications, is an L-Select operation. So suppose I have two ciphertexts, and I want to mix and match the plaintexts in them; I want to get another ciphertext that only has the red data in it.
So what I can do is multiply by a vector of ones and zeros. I'm going to get a vector with only the red data and zeros elsewhere, and then when I add, I get my vector that mixes and matches between the two. This is a very useful operation; I'm going to use it a lot later. So let's forget about encryption for a moment and just talk about how you compute on data arrays. What do you do when your input is presented to you as arrays? Suppose you wanted to compute some function, forgetting about arrays for a second. Clearly, addition and multiplication are enough to encode any function that you want; this is a complete set of operations. But what happens when your input is packed in these arrays, and the things that you can do are these L-Add and L-Mult operations that just do pointwise addition and multiplication? Well, now it's not a complete set of operations, because when I want to add x1 to x2, there is no way that I can do that: x1 and x2 belong to two different slots, and there's no way these two can interact. I need a way to move the data around so that things can interact when I have my array-wide operations. So this is our input. But what we really want, in order to do the operation, is the same input presented in a different way, things packed in different slots so that we can add them. We want to add x1 and x2, so they both need to be in the same slot of two different ciphertexts. What we suggest is: suppose you had, in addition to L-Add and L-Mult, an L-Permute operation, which would take one array and a permutation specified in the clear. You know what permutation you want, and you just apply that permutation. Clearly, if you had that, then things would work out. For example, you want to have these three gates. You have x1 up to x5 as your input, but they're packed in this particular way in your array. You just copy the array. You copy the ciphertext.
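In plaintext, the slot-wise operations and the select trick just described look like this. A minimal Python sketch on bit arrays, standing in for the homomorphic versions; the function names are mine:

```python
# Slot-wise operations on plaintext bit arrays, standing in for the
# homomorphic L-Add / L-Mult, plus the L-Select built from them.

def l_add(u, v):
    """Pointwise addition of slots (XOR, since the slots hold bits)."""
    return [(a + b) % 2 for a, b in zip(u, v)]

def l_mult(u, v):
    """Pointwise multiplication of slots (AND)."""
    return [a * b for a, b in zip(u, v)]

def l_select(u, v, mask):
    """Mix and match: take u's slot where mask is 1, v's slot where mask is 0."""
    inv = [1 - m for m in mask]
    return l_add(l_mult(u, mask), l_mult(v, inv))

u = [1, 1, 1, 1]
v = [0, 0, 0, 0]
print(l_select(u, v, [1, 0, 1, 0]))  # -> [1, 0, 1, 0]
```

The mask is known in the clear, so homomorphically the two multiplications are just multiplications by a plaintext 0/1 vector.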
You have another copy of the same array of bits. You apply whatever permutations you want in order to move the plaintext slots to where they need to be. You apply this mult by 0 and 1, you get things aligned, you add them, and you get what you need. So if we had this L-Permute, that would have been enough. So here is our plan for how to compute on encrypted SIMD data. Basically, we're going to work in SIMD fashion within each individual level. But between levels, when we're done with level i and we need to go up to level i plus 1, things are not where they need to be, so we're just going to use L-Permute to route the outputs of level i into the appropriate places in the inputs of level i plus 1. That's the plan. It's not entirely obvious how to do it. L-Permute clearly helps you if you have it, but there are a few other things that you need to answer. First of all, you need to say how to implement L-Permute; it's very nice to want it, but if we really want to have it, we need to tell you how to do it. And then, once you have L-Permute, it's not enough, because, A, what you need between levels is not quite a permutation: some gates have high fan-out, so you need to clone values, and you want to do that in an efficient way. And B, L-Permute only lets you permute things inside one array; these are permutations on little arrays of only L entries, but a level itself could have width much larger than L, so you need to take this basic operation of little permutations and turn it into a big permutation on the entire level. That's another thing that you need to do. I'm basically just going to have time to talk about the first bullet there, which was supposed to be a 1. Is it a 1? No, it's not. Anyway, let's talk about how to implement L-Permute. Recall that the native plaintext is binary polynomials modulo the m-th cyclotomic polynomial.
Such a polynomial encodes L different plaintext elements, and additions and multiplications are done pointwise. So the question is: is there a natural operation on polynomials that would move these values between slots? It turns out that there is. This was used in BGV. Here is the operation; it's an algebraic operation on polynomials. Take the polynomial a and an integer j known in the clear, and map a into the polynomial a applied to x to the power j. This is an operation inside the ring, so a applied to x to the power j is then reduced modulo the ring polynomial. Similar techniques were used in LPR two years ago. Now, I'm not going to say anything about why this works; it is in the paper in quite a lot of detail. But I'm just going to tell you that, roughly speaking, the effect of that operation is to rotate the array in a cyclic way. That's not exactly true, but it's a good enough approximation that we can work with it. So for example, if a of x is a polynomial that encodes some L elements alpha 1 through alpha L, then maybe a of x to the power 5, or some number other than 5, but there is some number j such that a of x to the power j encodes the array cyclically shifted by 1. Now, even if this were exactly true, and it almost is, it still doesn't tell you how to do it on ciphertexts. But at least it's an algebraic operation, so you can hope that it plays nicely with the algebra of the encryption scheme. And it turns out it actually does; it works well with the algebra, and once you jump through enough hoops, you can implement these cyclic shifts of arrays. So we have cyclic shifts of arrays that we can implement based on the native cryptosystem. But this is not what we want; we want generic permutations. How do you get generic permutations out of cyclic shifts?
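Here is a tiny worked example of that substitution map on plaintexts, for m = 7, where the cyclotomic polynomial factors mod 2 into two cubics and so gives two slots. The parameter choices and helper names are mine, purely to make the slot-moving effect visible on a toy scale:

```python
# GF(2)[x] polynomials encoded as Python ints (bit i = coefficient of x^i).

def pmod(a, m):
    """Reduce polynomial a modulo polynomial m over GF(2)."""
    dm = m.bit_length() - 1
    while a and a.bit_length() - 1 >= dm:
        a ^= m << (a.bit_length() - 1 - dm)
    return a

def subst_power(a, j, phi):
    """The slot-moving map a(x) -> a(x^j) mod phi."""
    out = 0
    for i in range(a.bit_length()):
        if (a >> i) & 1:
            out ^= 1 << (i * j)   # x^i becomes x^(i*j)
    return pmod(out, phi)

# m = 7: Phi_7(x) = x^6+x^5+x^4+x^3+x^2+x+1 factors mod 2 into two cubics,
# so a plaintext polynomial carries two slots (its residues mod F1 and F2).
PHI7 = 0b1111111
F1, F2 = 0b1011, 0b1101       # x^3+x+1 and x^3+x^2+1

a = 0b10110                   # x^4+x^2+x, which CRT-packs the slot values (0, 1)
assert (pmod(a, F1), pmod(a, F2)) == (0, 1)

# j = 3 is not a power of 2 mod 7, so x -> x^3 moves values between the slots:
b = subst_power(a, 3, PHI7)
print((pmod(b, F1), pmod(b, F2)))  # -> (1, 0): the two slots swapped
```

With only two slots, a swap is exactly a cyclic rotation by 1; for larger m you get longer rotations the same way.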
So here we can use the classic result from the '60s of Beneš and Waksman about permutation networks. A reminder of what permutation networks are: these are two back-to-back butterfly networks. A butterfly network consists of many exchanges; each exchange has two straight edges and two cross edges. At the first level, the exchanges are between adjacent elements; at the second level, between elements at distance 2; at the third level, between elements at distance 4, et cetera. And the result of Beneš and Waksman says that you have a control bit for each exchange, saying, if you have the data on the left and you want to send it to the right, whether you want to use the cross edges or the straight edges. By setting the control bits appropriately, you can implement any permutation that you want. So you have a target permutation, you set the control bits appropriately, you run your data through the network, and out the other end comes that permutation of your original input. And now my claim is that every level of this butterfly network can be realized using just two shifts and two selects. So I want to implement a whole line of these exchanges, each of which uses either the cross or the straight edges. And I claim that with these array operations of shift, add, and mult (shift and select, in this case), you can use a constant number of them to implement a single level of the butterfly network. So here is a proof by example. This is the second level of the butterfly network, where entries at distance 2 from each other either exchange or go straight. And this is the permutation that we want to implement: 0 and 2 should be exchanged, 1 and 3 should go straight, et cetera. It's the same permutation from the previous slide: 0 and 2 need to be exchanged, 1 and 3 go straight. We replicate the input array twice.
We do a shift by minus 2 and a shift by plus 2. And then all we need to do is select the appropriate bits from the different ciphertexts. So 1 and 3 need to go straight, so they come from the first ciphertext; 2, 6, and 7 go to the left, so they come from the second ciphertext; and 0, 4, and 5 go to the right, so they come from the third ciphertext. One select operation, another select operation, and we have what we want. Here's the same proof in text. The main point, the thing that makes this possible, is the fact that in the butterfly network all the exchanges are between elements that are at the same distance from each other: at level i, all the exchanges are between elements at distance 2 to the i. So a shift by 2 to the i in one direction and a shift by 2 to the i in the other direction is all you need. OK, now that we have this observation, realizing permutation networks is easy in an asymptotically efficient way. Every level takes a constant number of operations, and there are order of log L levels, so you can implement an entire L-Permute in just order of log L operations on arrays. Of course, if L is not a power of 2, there are some additional complications, but they're not that difficult; if you wanted to program them, you would maybe have a hard time, but in terms of asymptotics it's not a problem. At most, you multiply the number of operations by 2. Let's step back a little bit. What we want is to route values between levels. The first thing you need to do is implement this L-Permute, and this is what I talked about: you use this mapping from a of x to a of x to the j to get simple shifts, and you use Beneš networks to get arbitrary permutations, and all of that takes only order of log L operations. Then you need to do cloning to handle high fan-out. Then you need to implement permutations on arrays wider than L. All of that can also be done in log time; I don't have time to talk about it.
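The proof by example above can be written out in a few lines of Python on plaintext arrays. The function name and the exchange indexing are mine, but the structure, two cyclic shifts plus selects against 0/1 masks, is exactly the one described:

```python
# One level of a butterfly/Benes network at exchange distance d, realized
# with two cyclic shifts of the array plus slot-wise selects.

def benes_level(v, d, cross):
    """cross[e] says whether exchange e uses its cross edges (True) or its
    straight edges (False). At distance d, slot i is paired with slot i +/- d,
    and exchanges fit inside blocks of size 2*d."""
    n = len(v)
    left = v[d:] + v[:d]      # cyclic shift: left[i]  = v[(i + d) % n]
    right = v[-d:] + v[:-d]   # cyclic shift: right[i] = v[(i - d) % n]
    out = []
    for i in range(n):
        e = (i // (2 * d)) * d + (i % d)   # index of the exchange containing slot i
        if cross[e]:
            # first element of the pair takes from the left shift,
            # second element from the right shift
            out.append(left[i] if (i // d) % 2 == 0 else right[i])
        else:
            out.append(v[i])               # straight edges: value stays in place
    return out

# The slide's example: 8 slots at distance 2; exchanges (0,2), (4,6), (5,7)
# use their cross edges, exchange (1,3) goes straight.
print(benes_level(list(range(8)), 2, [True, False, True, True]))
# -> [2, 1, 0, 3, 6, 7, 4, 5]
```

Homomorphically, the two selects are multiplications by plaintext 0/1 masks followed by an addition, just like the L-Select from before.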
The end result is that the routing between levels takes order of log of the width operations for a level of width w. Going even farther back, what we want is low-overhead homomorphic encryption. So we pack things into arrays; the arrays can be made as large as your security parameter, up to a polylog factor. Then you use SIMD operations to implement each level, and you route the values between the levels using the techniques that I described. Every level, including the permutation that precedes it, takes that much work. The quasi-linear factor in the security parameter is essentially the time that it takes you to read a ciphertext: the operations on ciphertexts are typically quasi-linear because they're polynomial multiplications, and you can use FFTs and such. And the number of ciphertexts that you need to represent each level is the ceiling of w over lambda. So the total work for a size-t, width-w circuit is the number of levels times the time to do each level, which is what's written on the slide. OK, I'm done. Any questions? No? OK, let's thank the speaker. Thanks again.