After some technical difficulties we got the screens to show here as well, which helps the speakers. So it's my pleasure to open the first session, which is on lattice-based cryptography, and to announce the first speaker, Gregor Seiler, who is speaking on joint work with Vadim Lyubashevsky on "NTTRU: Truly Fast NTRU Using NTT".

So good morning, it's nice to open this CHES. I hope not everybody in the room is as tired as I am. I'm going to speak about a lattice-based KEM called NTTRU, which is basically a version of NTRU using the NTT, the number theoretic transform, for fast polynomial arithmetic. This is joint work with my advisor Vadim.

To motivate this work, let me make a few observations. First, if we look at LWE schemes at the moment, we see that polynomial arithmetic has become so fast that symmetric-key tasks, like generating pseudorandomness, make up the majority of the runtime. So we basically cannot speed up these schemes in software very much anymore. On the other hand, if we look at NTRU, the situation is that you actually need much less pseudorandomness, and the main reason for this is that there is no big uniform public polynomial that you have to expand at some point. Also, the arithmetic is in a sense more complicated, because during key generation you have to invert a polynomial, and this is a pretty expensive operation.

The next observation we made is that if you look at the NIST competition, there are a couple of NTRU schemes among the submissions, but none of them uses the NTT for faster arithmetic. So we thought that we can actually speed up NTRU beyond what is possible with LWE schemes, and this was the goal of this work. So we designed a variant of NTRU-HRSS, which
is one of the NIST NTRU submissions, by using a variant of the number theoretic transform. What is also important here is that if one does this in the same way as, for example, NewHope, where one uses a ring whose dimension is a power of two, then one is in a bit of a bad situation: one basically only has the choice between too little security and too large sizes. For example, NTRU-HRSS uses a ring of dimension 700, and this seems to be somewhat of a sweet spot for lattice-based cryptography at the moment. If you use 512 then you don't get enough security, and 1,024 gives you very big sizes. So this means we needed to use some non-power-of-two NTT. I want to spend the whole talk not so much on the NTTRU scheme itself, but more on how one can do a non-power-of-two NTT for fast polynomial arithmetic.

So after this introduction, let me start with a recap of the basic, so-called cyclic NTT. What one does there is take a primitive n-th root of unity, let's call it zeta, and what the NTT then computes is basically this: it takes a polynomial f of degree smaller than n and evaluates f at all the n-th roots of unity. Now, since all these roots are roots of the polynomial x^n - 1, we get that this is actually an isomorphism from this ring... and this didn't work.
Yeah, so from this ring to basically n copies of Z_q. Because of this isomorphism we can do fast arithmetic, in particular multiplication and division, in this ring by doing very easy coefficient-wise arithmetic in Z_q^n.

Now the problem is that in lattice cryptography we want to use a ring of this form, Z_q[x] modulo some polynomial phi, but this polynomial phi needs to be irreducible over the integers. Therefore x^n - 1 is not really a good choice for lattice cryptography. What many schemes that use the NTT do is use the very similar polynomial x^n + 1, which is irreducible if n is a power of two. If you then run an NTT algorithm in this ring modulo x^n + 1, this is called the negacyclic NTT.

Now, to explain how we handle the non-power-of-two case and how to compute such an NTT quickly, I introduce a more algebraic viewpoint. The fact that there is a primitive n-th root of unity in Z_q means that the polynomial x^n - 1 splits into linear factors. Then by the Chinese remainder theorem we know that the ring we are interested in is actually a product of many rings of rank one. And since the Chinese remainder map works by reducing a polynomial modulo all these linear factors, we see that the NTT is really nothing other than this Chinese remainder map.

This algebraic viewpoint also directly gives us an understanding of how we can compute this in a fast way. The idea is that we can use the Chinese remainder theorem in a recursive fashion. We start with our original ring and see that the polynomial x^n - 1 is a product of two polynomials of degree n over two. Then again by the Chinese remainder theorem we can, in the first step, just reduce our
polynomial modulo these two smaller polynomials, and then we continue this in a divide-and-conquer fashion. The reason why the NTT is such a nice algorithm, and so fast, is that reducing modulo such simple polynomials, which only have two monomials, is a very fast operation. So we just continue this. On the left side it is exactly the same as the first step. On the right side we notice that plus one can be written as minus minus one, where minus one is zeta to the n over two, and then we split this by extracting a further square root. And then one just continues for as many steps as one wants.

This is also an important observation: there is actually no reason, for fast polynomial arithmetic, to do this NTT all the way down to linear factors. You can stop as early as you want and still have an algorithm to multiply, or to do arithmetic, because all these splitting steps are actually isomorphisms. And to see what an optimal way for the negacyclic NTT is, you basically just have to compute the right part of such a tree.

With this view of the NTT we can understand what we can do if we don't have a power of two. So this is what we did in our scheme. We chose this ring, which is modulo the cyclotomic polynomial of degree 768. As modulus we wanted to choose a prime which is not much bigger than the modulus in NTRU-HRSS, so we chose 7681, which has the property that this cyclotomic polynomial only splits into factors of degree 3. So not linear factors, but as I said before this is not really a problem.

Now, to get an NTT-like algorithm to split this ring, the most interesting step is how to make the first splitting. If you look at this polynomial of degree 768 and play around with it, you find that it actually has many factors of lower degree which retain this
structure, so which still have three monomials. But this is not really what you want, because reducing modulo a polynomial of this form is more expensive than you want in an NTT algorithm. If you look further, you find that this polynomial is a product of two factors of the form x^384 minus a sixth root of unity, for the two primitive sixth roots of unity in your ring. So after you have computed the reduction of the polynomial you start with modulo these factors, you are in the situation you have in a power-of-two NTT. You have these nice polynomials which have only two monomials, and splitting them further is basically the same radix-2 type of thing as in the power-of-two NTT: you just extract further square roots of the roots you had before.

Now if we look at the first splitting, then at first sight this looks a bit more expensive than a usual radix-2 splitting, because there are these two different roots of unity. But since they are solutions of this equation, we see that the other sixth root of unity is actually just one minus the first one. If you substitute this in the second polynomial, it becomes obvious that we just have to multiply by one root of unity to do this splitting. So also the first splitting is really not much more expensive than a usual radix-2 splitting. So much for how this NTT of dimension 768 works.

Now the question is how to implement this in a fast way on AVX2. The part which determines how fast your algorithm is, is how you do these multiplications with the roots of unity. Remember, you have a polynomial and you want to split it, so you need to multiply the upper half by a root of unity and then add this to, or subtract it from, the lower half. For these multiplications you basically have two choices in a vectorized setting. You can either pack only eight 16-bit coefficients into a 256-bit register, so you leave room of 32 bits for each coefficient. And then
if you multiply by a root of unity, you have space for the intermediate 32-bit results that appear before you reduce modulo q. Or, a different approach, you use that AVX2 has these two nice instructions where you can multiply 16-bit integers and get the low and high words separately, in separate registers. This is the approach we used. But since you don't really want to do multi-precision arithmetic with these low and high parts, what you need is a reduction algorithm that naturally handles the case where your input is given in this low-and-high-part representation. Such an algorithm actually exists, and this is what I want to explain next.

It's basically just a variant of the Montgomery reduction algorithm, but we really need signed arithmetic here. Normally Montgomery reduction is defined with unsigned arithmetic, but the situation is very similar if you use signed arithmetic. Remember that this algorithm computes something which is called a Hensel remainder, which differs from a usual Euclidean remainder only in that your remainder is scaled by the factor 2^-16 modulo q. If you look at the defining equation of this Hensel remainder, you immediately get how to compute it. You multiply your integer c by q inverse mod 2^16, which gives you m. Then you multiply m by q and subtract this from c, so only this part remains, which is a multiple of 2^16. And then you just divide out the 2^16 by a shift and get the Hensel remainder.

In our situation we have these products which are given to us separately as low part and high part, and the crucial observation is that the Hensel, or Montgomery, reduction algorithm can naturally work with these two inputs separately, without ever needing to compute on 32-bit numbers. In the first step we needed to multiply by something modulo 2^16. So it's
immediate that here we only need the low part. Then in the second step we multiplied the m we got here by q and subtracted this from c. But if one analyzes this, one sees that this product mq actually has the same low word as c. So we don't need to compute it, and it is sufficient to compute just the high word and subtract it from the high word of c. Then one immediately gets r and doesn't even need to divide by 2^16. So this is a very fast algorithm which fits nicely into this vectorized setting.

There is one further optimization we do in NTTRU. Since we multiply by precomputable constants, which are these roots of unity, we can also precompute these constants multiplied by the factor q^-1, and then we don't have to do the first step in the reduction. What you get in the end is that for a full modular multiplication in Z_q with such a precomputed root, it is sufficient to do just three half-multiplication instructions, which is very fast.

Okay, so I hope this gave you some idea about how the arithmetic in NTTRU works. On this slide you see the result of this. These are the numbers for the finished KEM, which is a CCA-secure key encapsulation mechanism built around the NTRU one-way function. The cycles for key generation, which are usually much higher in other NTRU schemes because they have this polynomial inversion, are now extremely fast using this NTT arithmetic, and encapsulation and decapsulation are also faster than in basically any other scheme.

Size-wise, remember I said in the beginning that NTRU-HRSS has a ring of dimension 700. Since our ring has the slightly bigger dimension 768, we get slightly bigger public keys and ciphertexts, so they have 1,248 bytes. But this actually brings me to the last thing I want to say: one can actually do something about this. So what
would be a possibility is to not use this prime 7681 but some smaller prime, which would be 3457. There one would need to do a slightly different splitting, but since this prime has one bit less, one basically gains exactly what one loses from the higher dimension compared to NTRU-HRSS. So one would basically get the same sizes as NTRU-HRSS. There's a small caveat, which is that a smaller modulus results in a slightly higher decryption failure probability. So one would need to analyze this and maybe do something about it, but at least this would be a way to get the sizes down. Another tweak I want to mention, as the last point of this talk, which might also be interesting, is to switch to deterministic noise in NTRU and thereby further reduce the amount of pseudorandomness that is used. Okay, thank you.

You started a minute late, so if there are any questions, please go to the microphones and line up there. I guess lots of math in the morning. All right, if there are no questions, then I'll quickly ask something; maybe then somebody else gets up and asks one. So you apply this to NTRU-HRSS, and of course there are some people who have other NTRU-like schemes. So we're making your life harder with NTRU Prime by having a different polynomial. On the other hand, could you go to some extension fields, extension sizes, to do the same?

So you mean to basically compute NTRU Prime with an NTT? Yeah, sure. There's one way to do this: you can do arithmetic over the integers, not reducing modulo any prime, and you can do this modulo many NTT-friendly primes and recombine the results with the Chinese remainder theorem, and then you reduce later. This would give you a speed-up for NTRU Prime, maybe, or at least it would be competitive with NTRU Prime speed.

Yeah. Thank you. I don't see anybody lined up.
So please join me in thanking the speaker again.
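
For readers following along, the signed Montgomery reduction on separate low and high words that the talk describes can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the NTTRU implementation: the helper names `to_int16`, `mullo`, `mulhi`, `precompute` and `montmul` are made up here, `mullo` and `mulhi` are assumed to model 16-bit multiply instructions that return the low and the high word of a signed product, and the value of `zeta` is an arbitrary example rather than an actual root of unity. Only the modulus 7681 and the factor 2^16 are taken from the talk.

```python
Q = 7681                 # prime modulus mentioned in the talk
R = 1 << 16              # Montgomery factor 2^16
QINV = pow(Q, -1, R)     # q^-1 mod 2^16 (three-argument pow needs Python 3.8+)


def to_int16(x):
    """Wrap an integer to a signed 16-bit value, like one 16-bit register lane."""
    x &= 0xFFFF
    return x - R if x >= 0x8000 else x


def mullo(a, b):
    """Low 16 bits of a*b (hypothetical model of a low-word multiply)."""
    return to_int16(a * b)


def mulhi(a, b):
    """High 16 bits of the signed 32-bit product a*b, i.e. floor(a*b / 2^16)."""
    return (a * b) >> 16


def precompute(b):
    """Return b * q^-1 mod 2^16 for a constant b, which saves the first reduction step."""
    return to_int16(b * QINV)


def montmul(a, b, b_qinv):
    """Return r with r * 2^16 == a*b (mod q), using three half multiplications.

    Assumes a is a signed 16-bit value, |b| < q and b_qinv == precompute(b);
    under these assumptions |r| < q.
    """
    m = mullo(a, b_qinv)   # m = a*b*q^-1 mod 2^16, only low words are needed
    hi = mulhi(a, b)       # high word of c = a*b
    t = mulhi(m, Q)        # high word of m*q; its low word equals that of c
    return hi - t          # equals (c - m*q) / 2^16, so no shift is required


# Usage: multiply by a constant stored in Montgomery form, zeta * 2^16 mod q.
zeta = 1234                                  # arbitrary example value, not a real root
zeta_mont = to_int16((zeta * R) % Q)         # fits in 16 bits because q < 2^15
zeta_mont_qinv = precompute(zeta_mont)
r = montmul(4321, zeta_mont, zeta_mont_qinv)
assert (r - 4321 * zeta) % Q == 0            # r is congruent to 4321 * zeta mod q
```

The result is a Hensel remainder, so it carries the extra factor 2^-16 modulo q. Storing the constant as zeta times 2^16 mod q, as in the usage lines, cancels that factor, which is one common way to get an ordinary product out of the reduction.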