[The opening of the talk is unintelligible in the recording.] ... and make it go efficiently, okay. So that would be a good idea. Third, it's a very, very good design space. There's lots. And we know a lot about AES. We know there are various different ways of implementing it. It's got lots of algebraic structure. It's amenable to parallel computation. So it allows us to explore a design space that is well defined, and it allows others to explore different aspects. And the fourth answer: it is also used as a benchmark in MPC protocols.
[Partly unintelligible; the speaker appears to refer to MPC evaluations of AES and to comparing FHE with MPC.] Great. OK. So why BGV? Because it's there. OK. The main differences between BGV and other schemes, for example the one coming up in a few moments, and the NTRU scheme that was at some conference, I can't remember where, was that Eurocrypt or was that...? I can't remember. [inaudible] OK. So they're all roughly the same idea. And BGV, the Brakerski scheme, the NTRU scheme, they seem to be a lot better than the old stuff, yeah? The ancient stuff, stuff from 2009, 2010. So these modern schemes, as opposed to the dark-age schemes, are significantly better and more efficient, and have a lot more structure that we can play with. It's simple. It's very easy to write down. You're going to see a scheme that's very, very easy to understand. And we picked BGV because the other simple schemes weren't around at the time when we started this work. Because this is a fast-moving, dynamic research area that you should all get involved in, because it's rather cool.
And also, if we look at the schemes, like the NTRU-based scheme, the Brakerski scheme, the BGV scheme, it's not actually clear which one is going to win in the long run, OK? Each has different trade-offs, advantages and disadvantages. And what I call on you to do is to duplicate the work in this talk: take the scheme from the next talk, do exactly the same stuff we did, and see how fast you can evaluate AES. And then we can have a good exploration of the design space and we will know which one is best. It's not clear just by looking at the schemes, OK? Dum-de-dum, go. Right.
OK, so the basics are: we have a ring R, which we take to be the ring of integers of a cyclotomic number field, so there's some parameter m which we're going to fix later, OK?
The ring R_q is this ring of integers reduced modulo q, for an integer q which is not necessarily prime. We're not going to be working with prime moduli here, OK? We're then going to take some secret key, which is going to be an element in the ring which is in some sense small, OK? We're going to take an element which is a polynomial in this ring, and we're going to take something which has very, very small Hamming weight and coefficients 0, 1 or minus 1, OK? And then there's going to be a sequence of moduli, otherwise known as levels, which go from q_0, where 0 is small, so that's at the bottom, up to q_{L-1}, L being big, that's at the top, OK? So think of L as about 60, so we've got 60 different moduli here.
OK, what's a ciphertext? A ciphertext is three things: two elements in the ring R_{q_t}, that's R modulo q_t, and the level t. So the ciphertext is said to be at level t, OK? And then we decrypt with a very, very simple equation. We compute c_0 minus s times c_1, we take that modulo q_t, and then we take the result modulo 2, and out will pop the message, if we've done everything correctly and we know what we're doing and blah, blah, blah, OK? And because you're all experts and you've all been at Crypto before, or you've at least read the papers, addition and multiplication and modulus switching are going to be done just as per normal BGV. So we're not doing rocket science here, OK?
OK, so here's where we make some improvements. We are going to be using SIMD operations like there's no tomorrow. So what does that mean? We choose m such that, blah, blah, blah, the m-th cyclotomic polynomial Phi_m(X) splits into ℓ factors of degree d modulo 2, OK, for some sufficiently large ℓ. And then, using some work by some people, this means that the message space, which is R modulo 2, factors into ℓ copies of the finite field F_{2^d}, so the degree-d finite field of characteristic 2.
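The decryption rule described above (compute c_0 - s*c_1 modulo q_t, then reduce modulo 2) can be sketched on a toy instance. This is a minimal illustration, not the talk's implementation: it assumes the power-of-two cyclotomic ring Z[X]/(X^N + 1) with N = 16, a symmetric-key encryption, and a tiny modulus Q. The names `keygen`, `encrypt`, `decrypt` and all parameter values are hypothetical and offer no security.

```python
import random

N = 16           # toy ring Z[X]/(X^N + 1); assumption, the talk uses a general cyclotomic
Q = 2**20 + 7    # toy odd modulus standing in for q_t (real moduli are far larger)

def centered(x, q):
    """Representative of x mod q in the range (-q/2, q/2]."""
    x %= q
    return x - q if x > q // 2 else x

def ring_mul(a, b, q):
    """Schoolbook multiplication in Z_q[X]/(X^N + 1) (negacyclic convolution)."""
    out = [0] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k = i + j
            if k < N:
                out[k] = (out[k] + ai * bj) % q
            else:                       # X^N = -1, so wrap around with a sign flip
                out[k - N] = (out[k - N] - ai * bj) % q
    return out

def keygen(rng, weight=4):
    """Secret key with low Hamming weight and coefficients in {-1, 0, 1}."""
    s = [0] * N
    for idx in rng.sample(range(N), weight):
        s[idx] = rng.choice([-1, 1])
    return s

def encrypt(m_bits, s, rng):
    """Toy symmetric encryption chosen so that c0 - s*c1 = m + 2e (mod Q)."""
    c1 = [rng.randrange(Q) for _ in range(N)]
    e = [rng.randint(-2, 2) for _ in range(N)]      # small noise
    s_c1 = ring_mul(s, c1, Q)
    c0 = [(x + 2 * ei + mi) % Q for x, ei, mi in zip(s_c1, e, m_bits)]
    return c0, c1

def decrypt(c0, c1, s):
    """c0 - s*c1 mod Q (centered), then mod 2."""
    s_c1 = ring_mul(s, c1, Q)
    return [centered(a - b, Q) % 2 for a, b in zip(c0, s_c1)]

rng = random.Random(1)
s = keygen(rng)
m = [rng.randrange(2) for _ in range(N)]
c0, c1 = encrypt(m, s, rng)
print(decrypt(c0, c1, s) == m)
```

Decryption works because the centered value m + 2e is much smaller than Q/2, so reducing modulo 2 strips the even noise term and leaves the message bits.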
And this means we can do SIMD operations: we can do ℓ-fold SIMD operations in a finite field of degree d on one side. If we add one message to another message, we're actually doing this ℓ-fold SIMD, OK, which is nice. And then we use some new stuff, which was discussed at the last Eurocrypt, with which we can do all sorts of other things with this representation for free. We can square elements for free without doing any kind of homomorphic operations, yeah? Well, kind of, yeah, it's a homomorphic squaring, but we don't have to do the expensive homomorphic operation. And we can move data around from one SIMD slot to another. So if you're used to SSE operations in a modern chippy thing, then you know what I'm talking about. If you've never used SIMD before, you have no clue what I'm talking about. OK.
OK, how are we going to hold data? So remember, these elements of the ciphertext are polynomials of degree, well, phi(m), where m is going to be very big, modulo q_t, OK, and we can hold these in various ways. We can hold one as a polynomial of degree phi(m) minus 1, where all the coefficients are modulo q_t. OK, so that's one way of doing it. That's the naive way of doing it. But let's pick q_t to be the product of lots of small primes. In fact, we'll pick the primes to be fixed, and then q_t is the product of the first t plus 1 primes, etc. And then what we can do is, instead of holding the polynomial modulo q_t, we can hold it modulo p_i for all the p_i which divide q_t. Yeah? And then we map backwards and forwards by the Chinese remainder theorem. OK, so that's going to make things easier. Doing things with the Chinese remainder theorem, you know, that's crypto implementation 101. We know that from RSA, so this is going to be good. OK, then we are also slightly more clever, and we pick the moduli p_i such that m, that's the order of the roots of unity we're taking, divides p_i minus 1.
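The moduli-CRT idea above, holding a value modulo each small prime p_i instead of modulo their product q_t, can be sketched for a single coefficient as follows. The primes in `PRIMES` and the helper names are hypothetical choices for illustration; in the scheme every polynomial coefficient would be handled this way.

```python
from math import prod

PRIMES = [97, 193, 257]   # hypothetical small primes; q_t is their product

def to_crt(x, primes):
    """Residues of x modulo each prime."""
    return [x % p for p in primes]

def from_crt(residues, primes):
    """Recombine residues into the unique value modulo the product of the primes."""
    q = prod(primes)
    x = 0
    for r, p in zip(residues, primes):
        q_i = q // p
        x += r * q_i * pow(q_i, -1, p)   # pow(..., -1, p) is the modular inverse (Python 3.8+)
    return x % q

def crt_add(a, b, primes):
    """Addition modulo q_t done independently on each small residue."""
    return [(x + y) % p for x, y, p in zip(a, b, primes)]

def crt_mul(a, b, primes):
    """Multiplication modulo q_t done independently on each small residue."""
    return [(x * y) % p for x, y, p in zip(a, b, primes)]

q_t = prod(PRIMES)
a, b = 123456, 654321
product = from_crt(crt_mul(to_crt(a, PRIMES), to_crt(b, PRIMES), PRIMES), PRIMES)
print(product == (a * b) % q_t)
```

The point of the representation is that all arithmetic happens on word-sized residues; the large modulus q_t never has to be handled directly until a value is converted back.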
So that means that the finite field of characteristic p_i has a primitive m-th root of unity. So we can then, instead of holding the polynomial modulo p_i, actually hold the polynomial evaluated at the m-th roots of unity. And what that means is that when we do multiplication of polynomials, we don't multiply polynomials, we just multiply things coordinate-wise. This is taking a Fourier transform, effectively, OK? Or it's a Chinese remainder theorem representation, a polynomial-based Chinese remainder theorem representation. So we have a polynomial CRT and a moduli CRT, so we call the whole thing a double CRT.
OK, there are lots of advantages. Addition and multiplication take linear time, which is nice, whereas multiplication of polynomials is quadratic time, as we all know. The disadvantage is that moving backwards and forwards between the polynomial representation and the double-CRT representation costs us: it costs us FFT operations or inverse FFT operations. So what we need to do is modify some procedures to minimize the number of data conversions we do. So we spend a lot of time in the paper looking at how we can modify the operations to reduce the total number of FFT and inverse FFT operations we have to do. OK, so that's trick one, which is just a data representation trick.
Trick two is modulus switching. Modulus switching is the noise-control operation that happens in the BGV scheme, and it allows us to scale down the noise that we have in an FHE scheme. So the noise grows with every operation, and we want to keep pushing the noise down so that decryption is going to work in the end. And we have this thing called modulus switching, which was in the original BGV paper, and we need to modify it slightly to work with the fact that we're working in this double-CRT representation. Again, we're trying to minimize the number of FFT operations. Again, see the paper for full details. OK, this is the cool one.
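The evaluation half of the double-CRT representation can be illustrated on a toy case. The sketch below assumes m = 8, so the cyclotomic polynomial is X^4 + 1, and the prime p = 17, for which m divides p - 1. It uses plain Horner evaluation rather than an FFT, so it shows only why coordinate-wise multiplication is correct, not the speed-up. All names and parameters are hypothetical.

```python
P = 17    # toy prime with m | P - 1 (assumption for illustration)
M = 8     # Phi_8(X) = X^4 + 1, so phi(m) = 4
DEG = 4

# Primitive 8th roots of unity mod 17 are exactly the roots of X^4 + 1.
ROOTS = [r for r in range(1, P) if pow(r, 4, P) == P - 1]

def evaluate(poly, x):
    """Horner evaluation of poly (lowest-degree coefficient first) at x modulo P."""
    acc = 0
    for c in reversed(poly):
        acc = (acc * x + c) % P
    return acc

def to_eval(poly):
    """Evaluation representation: values at each primitive m-th root of unity."""
    return [evaluate(poly, r) for r in ROOTS]

def poly_mul(a, b):
    """Quadratic-time multiplication in Z_P[X]/(X^4 + 1), for comparison."""
    out = [0] * DEG
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k = i + j
            if k < DEG:
                out[k] = (out[k] + ai * bj) % P
            else:                        # X^4 = -1
                out[k - DEG] = (out[k - DEG] - ai * bj) % P
    return out

def pointwise_mul(ea, eb):
    """Linear-time multiplication in the evaluation representation."""
    return [(x * y) % P for x, y in zip(ea, eb)]

a = [3, 1, 4, 1]
b = [5, 9, 2, 6]
print(to_eval(poly_mul(a, b)) == pointwise_mul(to_eval(a), to_eval(b)))
```

Combining this per-prime evaluation with the residues modulo each p_i gives a matrix of small values, one row per prime and one column per root, on which both addition and multiplication are coordinate-wise.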
This is the thing that shouldn't work, but it does. We have a new key-switching operation. So what does key switching do? Well, key switching takes a three-element ciphertext, d_0, d_1, d_2, which decrypts via that equation there for some new secret key that's somehow constructed, s' (s dashed), and returns it to a standard ciphertext which decrypts via the standard equation. The usual way to do this is to store loads and loads of information. Now, the thing is, if each piece of your stuff takes a large amount of space, then storing huge numbers of those pieces means that you run out of space. So we have to do something a bit more clever. So we're going to do something stupid. We're going to make the noise worse. What we do is multiply up by a very large number, then do a key switch, which means we only have to store one piece of information, and then squash the noise back down again. So we do the wrong thing: we make the noise worse before we make it better. Okay?
So here's the basic idea. We have some extra large modulus P, and we store some extra stuff in the public key, which is just going to be an LWE instance like that. Blah, blah, blah. We use a single P: we use the same P for all levels. So we only have to store this extra bit of key-switching data in the public key once. And then when we do the key switching, we just have this input here. We multiply by P, which is the stupid thing to do. That increases the noise by a factor of P. Remember P is big, so we've increased the noise by a large amount. We then do the key switch, which we can do with one piece of data because everything's now very big. We then come back down again, using the standard modulus switching to squash the noise by a factor of P. So we blow it up by P, do a simple operation, and then squash it down by P again. Again, see the paper for all the details.
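The "blow up by P, switch, squash down by P" idea can be written out as a short derivation. This is a sketch under stated assumptions rather than the paper's exact formulation: the sign convention for the three-element decryption, the form of the key-switching data (a, b), and the noise names e and e' are all choices made here for illustration.

```latex
\begin{align*}
% Assumed decryption of the three-element ciphertext under (s, s'):
d_0 - s\,d_1 - s'\,d_2 &\equiv m + 2e \pmod{q_t} \\
% Assumed key-switching data, an LWE-style pair stored once in the public key:
b &\equiv s\,a - P\,s' + 2e' \pmod{P\,q_t} \\
% Multiply up by P and fold in the d_2 term:
c_0' = P\,d_0 + b\,d_2, \qquad c_1' &= P\,d_1 + a\,d_2 \pmod{P\,q_t} \\
% Decrypting the result under s alone:
c_0' - s\,c_1' &= P\,(d_0 - s\,d_1) + (b - s\,a)\,d_2 \\
               &\equiv P\,(m + 2e) + 2e'\,d_2 \pmod{P\,q_t}
\end{align*}
```

Under these assumptions, if P is odd the right-hand side is still congruent to m modulo 2, and scaling back down from P q_t to q_t divides the extra term 2e'd_2 by P, so for sufficiently large P the final noise is close to the original e plus a small rounding contribution.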
And here's the code for doing it. So this is used in multiplication and when you want to Galois-conjugate a ciphertext. Go. There we go.
So when do we change levels? In the old work, this was kind of like: every time you do a multiplication, you change a level. Every time you do an addition, you change a level. That's a stupid thing to do. What we do is carry around with each ciphertext an estimate of how much noise it's got. And then we only switch levels when we think the noise is going to get out of control. So we have some heuristics for doing that. But generally speaking, what we do is switch levels before we go into a multiplication gate, not afterwards. It turns out that that controls the noise much better. Exercise for the reader.
Okay, so when I said parameters are big, how big are the parameters? We do an extensive analysis of the security. And so for different numbers of levels, we have a different value of phi(m). So this is the dimension of the lattice we're interested in. Okay, so this is the size of the primes we need to make the LWE problem hard, assuming kind of like Chris Peikert knows what he's talking about, which we have no reason not to believe, so we assume all that. And then this is the size of the large primes. So these are very big numbers. So to store something in the lattice, we basically take something of 72,000 dimensions and about 14,000 bits in each dimension. Yes, this is a bloody huge thing. And then two of them, because you need two things in each ciphertext.
Then we have to pick the finite fields. And what we want is to pick the finite field so we can pack as much stuff into the message space as we possibly can. So we do some basic algebra. So that's kind of boring. And then we've got specific parameters. We've got a specific parameter here for m.
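The claim above, that switching levels before a multiplication controls noise better than switching afterwards, can be explored with a deliberately crude toy model. The model below is an assumption made for illustration and is not the paper's noise analysis: multiplication multiplies the two noise magnitudes, and a modulus switch by a prime p divides the noise by p and adds a fixed rounding term `floor_noise`. All function names and numbers are hypothetical.

```python
def mod_switch(noise, p, floor_noise):
    """Toy model: scaling down by p divides the noise by p and adds a rounding term."""
    return noise / p + floor_noise

def multiply(n1, n2):
    """Toy model: the noise of a product is the product of the input noises."""
    return n1 * n2

def switch_before(n1, n2, p, floor_noise):
    """Drop one level on each input, then multiply."""
    return multiply(mod_switch(n1, p, floor_noise), mod_switch(n2, p, floor_noise))

def switch_after(n1, n2, p, floor_noise):
    """Multiply first, then drop one level on the result."""
    return mod_switch(multiply(n1, n2), p, floor_noise)

p = 2.0 ** 20            # hypothetical size of one level's prime
floor_noise = 2.0 ** 10  # hypothetical rounding noise added by a switch

big = 2.0 ** 40          # inputs whose noise is about to get out of control
print(switch_before(big, big, p, floor_noise) < switch_after(big, big, p, floor_noise))

small = 2.0 ** 15        # inputs whose noise is still small
print(switch_before(small, small, p, floor_noise) < switch_after(small, small, p, floor_noise))
```

Both strategies consume one level, so the resulting noise values are directly comparable. In this toy model switching first wins when the input noise is large, while for inputs with little noise a switch is not worthwhile, which is consistent with carrying a noise estimate and switching only when needed.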
If you want to embed finite fields of degree 8, you know, finite fields of size 256, that's one choice; if you just want to encrypt bits, we have a different set of finite fields. We encrypt different types of elements.
Okay, so our AES implementation has three variants. It has a packed representation, where you put the entire AES state into one ciphertext. So you only have to carry one ciphertext around. But there are lots of slots in the ciphertext, and you only need 16 of them for one AES state. So you can put lots of AES states in the ciphertext and do lots of AES encryptions at once, okay? Or we do byte-slice, where you have lots and lots of AES states across each ciphertext, but you need 16 ciphertexts to represent a single state, okay? Or you do bit-slice, where you just evaluate the binary circuit bit by bit. Okay, so for bit-slice we use the Boyar-Peralta circuit, because that seems to be the best circuit; otherwise we just use lots and lots of kind of cool maths tricks. This is the coolest thing. As everyone knows, the AES S-box has a nice algebraic description. Here's the nice algebraic description, of which you can see more details in the paper.
So as we come to the end, these are the results. We ran this on a machine called BlueCrystal, which, when the University of Bristol bought it, was the 50th most powerful computer on the planet or something like that. We ran on one core, because NTL isn't multi-threaded, but we used a machine that had 256 gigabytes of RAM, just because we had it, yeah? So here's the... Okay, so the bottom line is: how fast can you do FHE AES, yeah? So if you have the packed representation, that's really nice, because remember everything is just one ciphertext and you just carry one ciphertext around. We need about 60 levels. Key generation takes 43 minutes; to encrypt the AES state takes two minutes. That's really good, yeah?
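The algebraic description of the AES S-box mentioned above is inversion in the field of 256 elements (equivalently, raising to the power 254) followed by a fixed affine map over bits. The plaintext sketch below shows why this suits a scheme in which squaring is cheap: the power 254 is reached with many squarings and only four general multiplications. The particular addition chain is one possible choice made here, not necessarily the one used in the paper, and the helper names are hypothetical.

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def gf_sq_k(a, k):
    """Square k times, i.e. compute a^(2^k); these are the cheap steps."""
    for _ in range(k):
        a = gf_mul(a, a)
    return a

def gf_inv(x):
    """x^254 using four general multiplications (0 maps to 0)."""
    x2 = gf_sq_k(x, 1)
    x3 = gf_mul(x2, x)           # multiplication 1
    x12 = gf_sq_k(x3, 2)
    x15 = gf_mul(x12, x3)        # multiplication 2
    x240 = gf_sq_k(x15, 4)
    x252 = gf_mul(x240, x12)     # multiplication 3
    return gf_mul(x252, x2)      # multiplication 4: x^254

def rotl8(x, k):
    """Rotate an 8-bit value left by k bits."""
    return ((x << k) | (x >> (8 - k))) & 0xFF

def sbox(x):
    """AES S-box: field inversion followed by the affine transformation."""
    y = gf_inv(x)
    return y ^ rotl8(y, 1) ^ rotl8(y, 2) ^ rotl8(y, 3) ^ rotl8(y, 4) ^ 0x63

print(hex(sbox(0x53)))
```

In the homomorphic setting the same chain would be applied slot-wise to encrypted bytes, with the squarings replaced by the cheap operation the talk describes and only the four multiplications consuming significant noise budget.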
Two minutes to encrypt your AES state, but then to do the key schedule takes 23 minutes, and then to evaluate the round function takes 34 hours. Remember, it's not totally and utterly impractical any more. It's only totally impractical, right? Okay, but we do 54 AES encryptions at a time, so the total time per block is 37 minutes, right? So you should compare this to, I don't know, like, EDSAC maybe, I don't know.
Okay, byte-sliced. This is where we have 16 ciphertexts per state. We need 50 levels, and here's the cool thing. It takes 65 hours, so it takes longer. But we can do 720 at a time, which means it only takes five minutes per AES block. So we can encrypt an AES block with fully homomorphic encryption in under five minutes on average, and the goal for you, with 10 seconds to go, is: go faster. Thank you.