Okay, so hopefully people here know a little bit about what fully homomorphic encryption is all about. In a nutshell, it allows for arbitrary computation on encrypted data. In this talk I'll be focusing on linear transformations, and more specifically on applying a fixed public linear transformation to an encrypted vector. There are many other variations you could consider, like computations involving an encrypted matrix and a plaintext vector, or an encrypted matrix and an encrypted vector, but I'll be focusing on the situation where we have a public plaintext matrix and an encrypted vector. And I'll be focusing on the case of the BGV fully homomorphic encryption scheme, though a lot of what I talk about today actually applies to other schemes as well. So I'll present some new algorithms and talk about their implementation. The implementations are all in the HElib library, which Shai Halevi and I have been working on for a few years. We get speed-ups of up to 75 times. You should take that with a bit of skepticism, or a grain of salt, truth in advertising: yes, we do get a 75-times speed-up on some parameter settings that do arise in practice; sometimes we get less, but that's about the best we get out of what we do. Of course, you could also look at this more pessimistically and say, well, our old implementation was 75 times slower than it really should have been, and now we're doing better. So why this problem? Well, one reason we focused a lot on this problem is that it arises in bootstrapping. There are a few different ways of doing bootstrapping, but inevitably it involves some kind of change of basis somewhere during the computation, and that's where this comes up.
There's a new way of doing bootstrapping that came out this year at Eurocrypt, and we implemented it, and we found that most of the time is spent performing this change of basis, so improving the linear maps is very important here. This problem really needed to get fixed, and we get a speed-up of up to about six times for the bootstrapping as a whole — not all of the bootstrapping time is spent on linear transformations, but a lot of it is, and so with our improvements we get a six-times speed-up. Okay, so, a review of the BGV cryptosystem. We're going to be working a lot in a ring R, which is the ring of integer polynomials modulo a cyclotomic polynomial Phi_n(x). The plaintext space is going to be this ring of polynomials modulo the cyclotomic polynomial and modulo a small prime p, and the ciphertext space is going to be built from the ring of polynomials modulo an integer q, where q will actually be a large number; a ciphertext will be a pair of these ring elements in R_q. A secret key will also be a pair of these ring elements, where the first one is actually the unit, and the second one is kind of a random element in this ring — chosen with a specific kind of distribution, though.
It's a small-norm ring element. If you're given a ciphertext, which is a pair of these ring elements, and you want to decrypt, you basically just take the inner product of the ciphertext and the secret key, reduce it mod q, and what you're left with is the message plus some noise, where the noise is a small multiple of p. Okay. So a lot of the computation is in this ring R_q, and I want to talk about a couple of different ways of representing elements in this ring, because that will really have a bearing on the algorithms and their efficiency. The most natural one is just the coefficient representation: you're working with polynomials, and you just write down the coefficients. The other one we call double CRT, and here we're going to impose the restriction that the modulus q is a product of small primes. Each of these small prime fields has to contain an nth root of unity — we're working with the cyclotomic polynomial Phi_n, and we need nth roots of unity modulo each of these small primes. An element of R_q in double CRT is obtained by taking our polynomial, reducing it modulo each of these small primes, and then evaluating each of those polynomials at these roots of unity; that's an equivalent representation. The nice thing about the double CRT representation is that addition takes linear time — you just add things modulo each of these small primes — and so does multiplication by a constant, assuming the constants are themselves represented in double CRT format. The one thing to note is that switching back and forth between double CRT and coefficient representation is somewhat expensive.
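To make the double CRT representation concrete, here's a toy Python sketch with made-up tiny parameters (n = 4, so Phi_4(x) = x^2 + 1, and q = 5 * 13; both primes are 1 mod 4, so each prime field contains the primitive 4th roots of unity). This only illustrates the idea — it is not HElib's actual data layout:

```python
# Toy double CRT for R_q = Z_q[x]/(x^2 + 1), i.e. n = 4, phi(n) = 2.
PRIMES = [5, 13]

def roots_of_unity(p):
    # roots of x^2 + 1 mod p, i.e. the square roots of -1
    return [r for r in range(p) if (r * r + 1) % p == 0]

def to_double_crt(f):
    # f = [a0, a1] represents a0 + a1*x; reduce mod each prime and
    # evaluate at each root, giving a small matrix of residues
    return [[(f[0] + f[1] * r) % p for r in roots_of_unity(p)]
            for p in PRIMES]

def add_dcrt(a, b):
    # entry-wise addition: linear time
    return [[(x + y) % p for x, y in zip(ra, rb)]
            for p, ra, rb in zip(PRIMES, a, b)]

def mul_dcrt(a, b):
    # entry-wise multiplication: also linear time
    return [[(x * y) % p for x, y in zip(ra, rb)]
            for p, ra, rb in zip(PRIMES, a, b)]
```

For instance, (1 + 2x)(3 + x) = 3 + 7x + 2x^2 ≡ 1 + 7x mod x^2 + 1, and indeed `mul_dcrt(to_double_crt([1, 2]), to_double_crt([3, 1]))` equals `to_double_crt([1, 7])`. Going back to coefficients is the expensive direction: it needs CRT interpolation and inverse FFTs.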
You have to do Chinese remaindering, and you have to do FFTs — fast Fourier transforms — to go between the coefficient and the evaluation representations. To multiply two ciphertexts in double CRT representation — I won't go into the details, but basically you're just multiplying these double CRTs, which is itself linear time — you end up with a ciphertext that's defined with respect to a different secret key. So you have to do an operation which we call key switching. What we have to do is encrypt this other key: we get an encryption with respect to the wrong key, so we encrypt this other key under the original public key. That goes into the public information, in what we call a key-switching matrix, and using this we can convert the product ciphertext back to an equivalent encryption under the original key. So that's key switching, which is kind of a key part of any homomorphic encryption scheme. Key switching, though, is expensive. I'll talk a little more about it later, but it does require conversions between coefficient and double CRT representations, and those are actually somewhat expensive. Now, before I talk about linear transformations, I need to talk a little bit about the structure of the plaintext space itself. Remember that the plaintext space is the ring of polynomials mod p and mod Phi_n, basically. If we look at how the cyclotomic polynomial factors mod p, we get a bunch of irreducible factors, and by the Chinese remainder theorem for polynomials, we get that this plaintext space is isomorphic to a product of finite fields. So we can think of a plaintext as a vector of finite field elements.
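As a quick sanity check of this slot structure — a toy example with parameters I picked for illustration — take n = 8 and p = 3. Then p has multiplicative order d = 2 mod n, Phi_8(x) = x^4 + 1 factors mod 3 into h = 2 irreducible quadratics, and d * h = phi(8) = 4:

```python
from math import gcd

n, p = 8, 3

# d is the multiplicative order of p mod n; there are h = phi(n)/d slots
d = next(k for k in range(1, n) if pow(p, k, n) == 1)
phi_n = sum(1 for a in range(1, n) if gcd(a, n) == 1)
h = phi_n // d
assert (d, h) == (2, 2)

def poly_mul_mod(a, b, p):
    # multiply coefficient lists (lowest degree first) mod p
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % p
    return out

# the two degree-d factors of Phi_8 = x^4 + 1 mod 3
f1 = [2, 1, 1]   # x^2 + x + 2
f2 = [2, 2, 1]   # x^2 + 2x + 2
assert poly_mul_mod(f1, f2, p) == [1, 0, 0, 0, 1]   # x^4 + 1 mod 3
```

Each slot is then a copy of GF(3^2), and a plaintext polynomial corresponds to its pair of residues mod f1 and f2.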
So there'll be h of them — h entries in the vector — and each entry is going to be an element of the finite field of cardinality p to the d, where d times h is equal to phi(n). So we can view the plaintext space as a vector in this sense, and we can do addition and multiplication on plaintexts — and also homomorphically on ciphertexts — in parallel like this, in a SIMD fashion. In addition to that, we can move data around between the slots. If we look at integers j that are relatively prime to n, then each such integer j defines an automorphism on the plaintext space that basically just takes the monomial x, maps it to x to the power j, and leaves all the coefficients alone. Homomorphic evaluation is easy, especially in double CRT representation — you just shuffle things around in the double CRT representation — but again, just like for multiplication, it gives us something that's encrypted with respect to the wrong secret key, and it requires another type of key switching. But once we have that, this gives us a set of rotations that allows us to move data between the slots. So we can do things in a SIMD fashion, slot-wise, and then we can also move data around between the slots. Just to give you a more concrete idea, here's a simplified but actually not very typical setting. If p, the prime defining the plaintext space, is 1 mod n, then the cyclotomic polynomial splits completely over Z_p, and in this case the plaintext space is isomorphic to a vector of elements where each element is in GF(p). Then we have an isomorphism that basically maps a polynomial f(x) to f evaluated at the powers omega^i, where omega is some primitive nth root of unity mod p.
And then if we look at the automorphism that sends x to x^j, viewing the plaintext as a vector of elements, what it's doing is taking the component whose value is f evaluated at omega^i and producing f evaluated at omega^(i*j). So in effect — you have to reverse this — what used to be in slot i*j is now in slot i. If p is not 1 mod n, then something similar happens, but the algebra is slightly more complicated and I won't go into it here. In the general case, the set of data movements, or rotations, is determined by the structure of the group (Z/nZ)* modded out by the subgroup generated by p. If you look at the structure theorem for finite abelian groups, you'll get some kind of decomposition of this group. So, for example, maybe the group structure is a product of two cyclic groups of order three, in which case we have nine slots, which we can view as a three-by-three array. And using the set of automorphisms that we have, we can either rotate all the rows simultaneously by any amount, or rotate all the columns simultaneously by any amount. So we have an encrypted vector with h slots — that's where we're at — and we want to apply a public matrix to this encrypted vector. That's the whole point of this talk.
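Here's a small Python sketch of that slot permutation in the split case, with toy parameters of my choosing (n = 5, p = 11, so p ≡ 1 mod n, and omega = 3 is a primitive 5th root of unity mod 11). Slot i of the image of f under x → x^j holds what slot i*j mod n of f held before:

```python
n, p = 5, 11
omega = 3    # has multiplicative order 5 mod 11

def slots(f):
    # f is a coefficient list of degree < 4; slot i is f(omega^i)
    return [sum(c * pow(omega, i * k, p) for k, c in enumerate(f)) % p
            for i in range(1, n)]

def sigma(f, j):
    # the automorphism x -> x^j on Z_p[x]/Phi_5(x).  Since Phi_5
    # divides x^5 - 1, we have x^5 = 1 in this ring, so exponents
    # reduce mod 5; then fold x^4 = -(x^3 + x^2 + x + 1) via Phi_5.
    out = [0] * 5
    for k, c in enumerate(f):
        out[(k * j) % 5] = (out[(k * j) % 5] + c) % p
    return [(out[t] - out[4]) % p for t in range(4)]

f = [1, 2, 3, 4]            # 1 + 2x + 3x^2 + 4x^3
s = slots(f)
for j in (2, 3, 4):
    # slot i of sigma(f, j) equals slot i*j (mod n) of f
    assert slots(sigma(f, j)) == [s[(i * j) % n - 1] for i in range(1, n)]
```

The assertion is exactly the "what used to be in slot i*j is now in slot i" statement from above, for each j relatively prime to n.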
There's an obvious approach, which is naive and you shouldn't do it, so I'll go over it quickly. You first learned about matrix-vector multiplication somewhere in high school or college, and maybe you learned that you can think of it as a column times v1, plus a column times v2, plus a column times v3. So you might say, well, maybe we can do something with that. But you have to remember the vector is encrypted. To apply this idea naively, you'd have to come up with a way to take this encrypted vector and get three encryptions: one of all v1, one of all v2, and one of all v3. In the Intel SIMD lingo, that would be a broadcast type of thing. You could do that — in fact, you could do it with order h rotations and multiplications by constants, where h is the number of slots — but it's overkill, and it's not the most efficient way to do it. A better idea — and thankfully Dan Bernstein straightened us out early on and got us not to do the naive thing — is an old idea that was known to people who work in parallel computing: do something more directly with rotations. So we start with the vector we have, and if you look at what happens when we multiply by the main diagonal component-wise, we get this vector. Then if we rotate the vector by one position, we can multiply by kind of a diagonal — a sort of diagonal that wraps around and picks up some other elements.
You get this vector, and then you rotate one more time and pick up another diagonal, and you can check that what you get is exactly the matrix-vector product we want. So you can do it with three rotations — really, your initial vector plus two. Now, this matrix, remember, is public and everybody knows it, so the diagonals are constants: we can compute the ring elements formed by Chinese-remaindering the three entries on the main diagonal, and then the entries on the wrapped diagonals, and we can even convert them to double CRT format as a precomputation. What we're left with is an algorithm that takes about h rotations — which are expensive, because they involve key switching and a change of representation — plus these multiplications by constants. So here's a better idea: baby-step/giant-step. It's a very old idea, always something to try, and indeed it helps a lot here. Let's define rho^i as the operation of rotating a vector v by i positions, and here's what we want to compute, based on the idea of the previous slide: the sum over all indices i of some constant c_i that we precomputed, times the i-th rotation of the vector. The observation is that rho is actually an automorphism on the plaintext space, and we can exploit that fact. So let's write each index i as j + f*k, where f and g are about the square root of h, j runs up to f, and k runs up to g. I'm just decomposing i like this, so the sum is the same thing, and then on the next line we pull out the rho^(f*k). Of course, pulling it out messes up the constant, but rho is an automorphism, so we have a handle on what everything is.
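In plaintext, the diagonal method looks like this — a sketch, with `rotate` standing in for the homomorphic rho^i and ordinary products standing in for slot-wise multiplication by the precomputed constants:

```python
def rotate(v, i):
    # rotate left by i positions (stands in for the homomorphic rho^i)
    return v[i:] + v[:i]

def diag(M, i):
    # i-th wrapped diagonal: entry j is M[j][(j + i) % h]
    h = len(M)
    return [M[j][(j + i) % h] for j in range(h)]

def mat_vec(M, v):
    # M*v as sum over i of diag(M, i) (slot-wise *) rotate(v, i)
    h = len(M)
    out = [0] * h
    for i in range(h):
        d, r = diag(M, i), rotate(v, i)
        out = [o + a * b for o, a, b in zip(out, d, r)]
    return out
```

For example, `mat_vec([[1, 2], [3, 4]], [5, 6])` gives `[17, 39]`, the usual matrix-vector product.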
So when we pull it out, we just have to replace the constant by rho^(-f*k) applied to the original constant, and these are all constants, so they can be precomputed, and we don't care. That's it. So the algorithm becomes: first, compute the powers rho^j(v) for j running up to about the square root of h — those are the baby steps; second, compute all of these sums of multiplications by constants and add everything up; and third, the giant steps, where we apply rho^(f*k) for the different values of k. The cost, then, is about square root of h rotations for the baby steps; step two is still h multiplications by constants — we don't reduce that; and step three is about square root of h rotations itself. So that's what we gain with the baby-step/giant-step method. Now let me try to cover the other idea I wanted to get at. Here's an even better idea — or in other words, if two-times-square-root-of-h rotations are good, then maybe the cost of a single rotation is even better. So can we do this at a cost equivalent to a single rotation? To answer that, I really need to dive into what happens when we do a homomorphic rotation. We want to apply a rotation to an encrypted vector — and remember, generally speaking we want to apply a bunch of rotations. A ciphertext, remember, is a pair of these ring elements. The first thing we do is apply the rho automorphism to the components of the ciphertext itself, and that's just shuffling around some data in the double CRT representations. But then we have to do this thing called key switching, which now I have to show you a little bit of.
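The baby-step/giant-step version, again as a plaintext sketch. Here the rho^(-f*k)-twisted constants are derived on the fly for clarity; in the real algorithm those twists are precomputed:

```python
from math import isqrt

def rotate(v, i):
    i %= len(v)
    return v[i:] + v[:i]

def bsgs(consts, v):
    # computes sum_i consts[i] (slot-wise *) rho^i(v), using about
    # 2*sqrt(h) rotations instead of h
    h = len(v)
    f = isqrt(h)
    g = -(-h // f)                            # ceil(h / f)
    baby = [rotate(v, j) for j in range(f)]   # f baby-step rotations
    out = [0] * h
    for k in range(g):
        acc = [0] * h
        for j in range(f):
            i = j + f * k
            if i >= h:
                break
            # twist the constant by rho^(-f*k); precomputed in practice
            c = rotate(consts[i], -f * k)
            acc = [a + ci * bi for a, ci, bi in zip(acc, c, baby[j])]
        # giant step: one rotation by f*k per outer iteration
        out = [o + a for o, a in zip(out, rotate(acc, f * k))]
    return out
```

With `consts[i]` taken to be the i-th wrapped diagonal of a matrix M, this returns M*v, matching the plain diagonal method, but with roughly 2*sqrt(h) rotations and still h multiplications by constants.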
To really understand the first step in key switching, you have to understand all the issues around managing the noise in a homomorphic encryption scheme. But the first thing we need to do is take the component c1 — or actually c1', after we've applied the automorphism to it — and decompose it as a sum of digits. You can think of the radix R_k as powers of some number, and we just want to decompose c1' into digits: each coefficient of these polynomials gets written as a sum of digits, where each digit is small enough that we can manage the noise appropriately. This is expensive — it requires a double CRT to coefficient conversion, because the only way we really know how to do this breaking-into-digits business is with things in coefficient form. Once we have that, we take the public information — the key-switching matrices that are really part of the public key — and apply some simple public linear maps to the digits we computed. Everything here, you can assume, is in double CRT representation, so it's fast and cheap. So the expensive part is just the digit decomposition. The idea is to refactor these steps: basically, we're going to swap the first two steps, using again the fact that rho^i is an automorphism that doesn't change the norm of anything by very much. So we're going to initially do part one of the key switching — the break-into-digits part — on the original ciphertext instead of on the rotated ciphertext. We break it into digits, and that's expensive. But then we do the rho automorphism step applied to the individual digits we got from step one — that's the cheap step — and then we do the key-switching step just as before.
And this is equivalent because rho^i really is an automorphism on the ring and it doesn't change the norm very much, so all the things you need to make the key switching work still work here. Why is this better? Because we can perform the first, expensive step just once for many rotations rho^i. The only thing that really used rho^i was the automorphism step, so we can do the break-into-digits just once — it's independent of rho^i — and then the other parts are cheap, and we do those for each individual rho^i. In the paper we call this idea hoisting, because compiler writers call the optimization of pulling a common computation out of a loop hoisting — you hoist the computation outside the loop — and that's what we're doing here. So for a given encryption of a vector v, we can compute encryptions of many rotations of v with one expensive step and, if we need h rotations, h cheap steps. That's the takeaway. If we apply this to matrix multiplication, then on the one hand it's going to be faster than the basic method that uses h rotations, because now computing all h rotations requires just one expensive step and h cheap steps.
On the other hand, it may actually be slower than the baby-step/giant-step method, because we're doing h cheap things — they're cheap, but they're not free — and doing square root of h expensive things can be faster than doing h cheap things, depending on the relative cost of everything. In practice you really have to look at the constants in the running time, and for very large h this can be slower than the baby-step/giant-step method. But then again, we can combine both techniques: in the first step of baby-step/giant-step, where we compute the baby steps, we can use the hoisting technique to hoist all of those rotations out. So we save a factor of two: instead of two-times-square-root-of-h rotations, we get square-root-of-h rotations. So that's it. Thank you.