Thanks, everyone, for being here. So the name of the game is learning with errors, and lo and behold, here is once again learning with errors. Learning with errors goes back to the work of Oded Regev. What you're given is a tuple (A, c), where A is an m by n matrix mod q, and your job as the adversary is to decide whether (A, c) is uniform, so c is a uniform vector, or whether c was formed as shown on the slide: c = A*s + e, where e is small. In this formulation, which is the old-school formulation, s is uniform. But as you might know, there is also a normal form of LWE where you pick s from the same distribution as the error e, so you can also think of s as somewhat small.

LWE is everywhere, but for the purpose of this talk I only care about two applications. In particular, you can build homomorphic encryption from LWE. For example, the BGV scheme, which is implemented in HElib, IBM's homomorphic encryption library, comes down to LWE, and the FV scheme, which is implemented in SEAL version 2, Microsoft's homomorphic encryption library, also relies on the hardness of the LWE problem. So you want to look at LWE if you want to convince yourself that these libraries are secure.

Now, what these libraries do, and many other constructions in cryptography, is that they do not pick the secret as I've just told you. The secret is neither uniformly random, nor does it follow the normal form, i.e. the error distribution. In particular, what HElib does (typically you can change this, but it is the default, or what you are encouraged to do, shall we put it like that) is choose the secret such that exactly 64 entries are non-zero, so they are minus one or one, and every other entry, regardless of the dimension n, is zero. So it is very sparse. SEAL, in contrast, picks the secret uniformly at random from the set {-1, 0, 1}.
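To fix notation, here is a toy sketch of such an instance. The dimensions and modulus are made-up illustration values, not the libraries' parameters:

```python
import numpy as np

# Toy parameters for illustration only; real HElib/SEAL dimensions are much larger.
n, m, q, sigma = 32, 64, 12289, 3.2

def lwe_sample(n, m, q, sigma, rng):
    """Return an LWE tuple (A, c) with c = A*s + e mod q.

    Old-school formulation: s is uniform mod q; e is a rounded Gaussian of
    standard deviation sigma.  For an HElib-style secret you would instead
    pick s sparse with entries in {-1, 0, 1}."""
    A = rng.integers(0, q, size=(m, n))
    s = rng.integers(0, q, size=n)
    e = np.rint(rng.normal(0, sigma, size=m)).astype(np.int64)
    c = (A @ s + e) % q
    return A, s, e, c

rng = np.random.default_rng(0)
A, s, e, c = lwe_sample(n, m, q, sigma, rng)
```

The adversary sees only (A, c) and has to tell this apart from a uniform c.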
So it's not sparse, just short. The natural question to ask at this point is: how much security does this cost? We have this LWE problem, we have some understanding of how hard it is to solve, and now we have a variant of it. Natural question: does it cost you any security or not?

If you ask theory, that is, if you want strong guarantees and you look at the reductions, then the theory tells you that in order to achieve the same security for a binary secret (not even sparse, just binary) you have to extend the dimension from n to n log q. So you have quite a bit of blow-up in the dimension of your LWE problem. On the other hand, if you look at actual constructions, this issue is typically simply ignored: the security analysis proceeds by pretending this is an LWE instance in normal form, and parameters are chosen based on that, because really you can't do much better with a binary secret. So again the question is: which one is it?
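A quick sketch of that theoretical blow-up; this is just the headline n log q figure, with the constant factors of the actual reduction glossed over:

```python
import math

def binary_secret_dimension(n, q):
    """Dimension which, per the reduction, a binary-secret LWE instance
    needs in order to match the security of a dimension-n instance with a
    full-size secret: roughly n * log2(q).  Constant factors in the actual
    reduction are ignored here."""
    return math.ceil(n * math.log2(q))

# e.g. n = 1024 with a 47-bit modulus balloons to dimension ~48k.
blown_up = binary_secret_dimension(1024, 2 ** 47)
```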
Okay, so let's look at how you would pick your parameters, i.e. what attacks you would consider. For these kinds of schemes, where the dimension is very large, it really boils down to lattice attacks; combinatorial attacks don't play a role in this regime. You can either mount a primal attack, which means you find some linear combination of the columns of A that gets you close to c, so you're solving the bounded distance decoding problem. As mentioned in the previous talk, you can do this either by running some enumeration, as just discussed, or by embedding your problem into another lattice and solving a unique shortest vector problem. The other approach is to solve the short integer solutions problem, also previously mentioned: you find a short w such that w*A = 0 mod q. Then, when you take the inner product of your w with c, and c was indeed formed as A*s + e, you get the inner product of w and e. w is short, e is short, so the inner product is short, because in cryptography the inner product of two short things is always short. Okay, so how do you execute that?
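To make the dual idea concrete, here is a toy check of the identity <w, c> = <w, e> mod q. Finding a genuinely short w with w*A = 0 mod q is the hard part, which is what the lattice reduction below is for; in this sketch we cheat and construct A so that a short w is known in advance:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, q = 16, 32, 7681

# Cheat: fix a short w with last entry 1, then choose the last row of A so
# that w @ A = 0 mod q.  A real attack must find such a w by lattice reduction.
w_top = rng.integers(-1, 2, size=m - 1)          # entries in {-1, 0, 1}
w = np.concatenate([w_top, [1]])
A_top = rng.integers(0, q, size=(m - 1, n))
a_last = (-(w_top @ A_top)) % q                  # forces w @ A = 0 mod q
A = np.vstack([A_top, a_last[None, :]])

# An LWE-style c = A*s + e with small noise (a stand-in distribution).
s = rng.integers(0, q, size=n)
e = rng.integers(-3, 4, size=m)
c = (A @ s + e) % q

# Since w @ A = 0 mod q, the inner product <w, c> reduces to <w, e> mod q,
# which is small; a uniform c would give something uniform mod q instead.
lhs = (w @ c) % q
lhs = lhs - q if lhs > q // 2 else lhs           # centre mod q
```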
In more detail, you run some lattice reduction; in this talk I'm only going to take a very high-level view of lattice reduction. You run lattice reduction on this dual basis, and you get a vector of length delta_0^m times the volume of the lattice to the power 1/m, and the volume of the lattice we construct here is q^n, so you end up with the formula delta_0^m * q^(n/m). So you construct a basis for the dual lattice, run your lattice reduction algorithm, check whether these inner products are small, and then you conclude that this looks like an LWE instance, or you conclude otherwise. This delta_0 depends on the lattice reduction algorithm: the higher the block size, the more expensive the algorithm, the smaller this delta_0, and the shorter this vector v.

So, as I've just described, you try to find some short v, take the inner product of v and e, and check whether that looks very much unlike the uniform distribution. There's clearly a trade-off here: the shorter your v, the better you can distinguish the inner product from uniform, but also, as I've just mentioned, the harder it is to actually get something that small. So there's a trade-off between running time and distinguishing advantage. What I'm plotting on the slide, for some particular instance under some particular BKZ cost model (if you pick your favourite cost model it won't look too different), is that if I decrease the distinguishing advantage to, say, one over a thousand, then my BKZ cost drops rapidly. Going down to maybe 2^-10 is beneficial; going to 2^-30 doesn't really give me much of an advantage. And of course you're comparing two strange things, two algorithms that have different success probabilities, so maybe you want to normalize. And how would you normalize? You say: okay, I can solve with low advantage for each of my instances, and then I run my experiment sufficiently often and take a majority vote to still get some constant distinguishing advantage; we are dealing with a decision problem. So you have to run this roughly 1/epsilon^2 times, and then you end up with the plot that looks something like that. And indeed, per instance you want to target maybe 2^-10 as your individual distinguishing advantage.

Okay, a brief word from our sponsor. The discussion so far, and the plot I've just shown you, was premised on the assumption that if you want 1/epsilon^2 short vectors, you have to run BKZ 1/epsilon^2 times. But this is not necessarily true. The outline I gave you was: take a random basis, run your reduction, take the shortest vector you found in the newly reduced basis, and check whether the inner product is short. But after we've done all this hard work of getting a short basis, why don't we start re-randomizing there? This is indeed what we do in lattice reduction libraries when we do extreme pruning and want to re-randomize; you want to re-randomize in a way that doesn't completely destroy all the hard work you've done. So a very simple heuristic approach is to say: take my basis L, run lattice reduction in block size beta, and then, instead of starting from scratch each time, start from this already reduced basis, do some light re-randomization (multiply by some sparse unimodular matrix that re-randomizes the basis), and then run BKZ with some block size beta' which is maybe smaller than beta, and I still get something out.
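The light re-randomization step can be sketched as follows. This is a heuristic toy version of my own, not the routine an actual lattice reduction library uses: the sparse unimodular matrix is built from a few random elementary row operations.

```python
import numpy as np

def light_rerandomize(B, ops=8, rng=None):
    """Re-randomize a (reduced) integer basis B by a sparse unimodular
    transform: a short product of elementary row operations
    b_i <- b_i +/- b_j.  Each step is unimodular, so the lattice spanned
    is unchanged, but the basis is only lightly perturbed; a cheaper BKZ
    run with block size beta' < beta can then start from it instead of
    from scratch.  Heuristic toy version, not library code."""
    rng = rng or np.random.default_rng()
    B = B.copy()
    d = B.shape[0]
    for _ in range(ops):
        i, j = rng.choice(d, size=2, replace=False)
        B[i] += int(rng.choice([-1, 1])) * B[j]
    return B
```

Because the transform is unimodular, invariants of the lattice such as the determinant are preserved, which is the sanity check you would run on it.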
That's okay. This whole thing is very heuristic and very hand-wavy; all I have to offer you is some experimental evidence in block sizes 60 and 70 that tells us this is more efficient than just doing it from scratch. There are clearly many open questions, like how does it really behave, and can you really argue that you get something sufficiently random? But we can give you some empirical evidence that it seems to be okay, at least for the instances that we looked at. So that brings down the cost somewhat: instead of running 1/epsilon^2 BKZ instances in block size beta, you only run them in smaller block sizes.

All right, moving on. What I've told you so far really has nothing to do with small secrets, and I promised you small secrets, so let's talk about small secrets. The first observation is: we don't actually need v*A = 0, because the secret is also small. It's enough to find some v such that w = v*A is also small. So if you look at the normal form of the dual attack, you have some x and y with x*A congruent to y, and if you have something short there and multiply it through, you get an inner product of w with s, which is short, and an inner product of v and e, both of which are also short. So that's great.

The next thing you want to do: s, as I've just described it, is actually much shorter than e. So you want to give the algorithm the freedom to pick a bigger w, because we don't need w of the same quality as v; we are multiplying it by something that's smaller in the end. And if you do that, it means you just scale a few columns of your lattice. You can go back to some work by Shi Bai and Steven Galbraith, where they did this for the primal attack; this is just applying the same trick to the dual attack.
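The column scaling can be sketched as a balancing act. The choice below, c = sigma_e / sqrt(h/n), is a back-of-the-envelope version of my own for illustration, assuming the two dual vectors have comparable norms; the exact constant derived in the paper differs.

```python
import math

def scaling_constant(sigma_e, n, h):
    """Back-of-the-envelope choice of the column-scaling constant c.

    We want the two noise contributions c*<w, s> and <v, e> to be of
    comparable size.  Assuming ||w|| is about ||v||, match per-coordinate
    sizes: a weight-h secret with entries in {-1, 1} has per-coordinate
    'standard deviation' sqrt(h / n), the error has sigma_e, so take
    c = sigma_e / sqrt(h / n).  A simplification for illustration; the
    exact optimisation in the paper differs."""
    return sigma_e / math.sqrt(h / n)

# HElib-style numbers: sigma = 3.2, n = 1024, h = 64 non-zero entries.
c = scaling_constant(3.2, 1024, 64)
```

The sparser the secret (smaller h), the larger c you can afford, i.e. the more you can relax the quality of w, which is exactly the effect on the lattice determinant described next.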
What you end up with, then, is some constant c times the inner product of w and s, plus the inner product of v and e, and if things are small enough then you still win. The advantage of the scaling is that it has quite a significant impact on the determinant of your lattice, which in turn determines how big your vectors are. Figuring out the constant c is also pretty straightforward: you just try to balance the contributions of the two sides, and then some really trivial algebra gets your constant c out and you're done. In the formula on the slide, h is the Hamming weight of the secret: if you're only multiplying by 64 non-zero entries, you can afford each of the things you multiply with to be a bit bigger, because you're not adding up that much.

Okay, so this was small secrets; let's talk about sparse secrets. Here the key point is: we're working really hard to find a vector v such that when I multiply it from the left by A I get a zero, but most of the columns are completely irrelevant; they play no role in the final result, because s at that index is zero. So I don't need to do all this work. I don't need to work really hard to make the linear combination zero if the thing doesn't matter in the end. So let's just ignore them; that's literally what I'm going to propose. Let's just ignore some columns. If you then think about your probability of success: I have dimension n, so I have n slots, and h of them are non-zero, the unlucky ones, and I'm sampling from these slots without putting them back, asking myself: what's the probability of getting lucky every single time? You're looking at a hypergeometric distribution, so it's pretty straightforward to work out how often you can guess and how it affects your success probability.

So you just ignore some columns and then solve the smaller instance, which is easier to solve; smaller-dimensional lattices are easier to reduce. If you got lucky, great, you have distinguished your instance; if you got unlucky, you have to do it again. And of course the algorithm that you then run 1/p_k times has to succeed with sufficiently high probability so that you can actually distinguish.

Here's how this looks, and I should mention that in my world vectors are either column or row vectors, depending on what the dot product wants from them, so please excuse that abuse of notation. What I'm trying to do is find some small vector v such that when I multiply it by A I get zeros in the right-hand columns, from index k to n-1. Then I ask myself: do I get zero? And I get zero whenever the first k entries of s in this particular instance are zero: my vector v makes all the components of the linear combination from index k onward zero, and because s_0 up to s_{k-1} is zero by assumption, that zeros out the remainder. So I get lucky if my s_i up to index k are all zero.

Now let's look at what happens if I get unlucky. Getting unlucky in the instance I'm looking at means I either have a one or a minus one there. So let's assume I have a one there. Well, then the final thing I'm getting is my vector v in inner product with the corresponding column of A; I get this a_0' term, which is just the inner product of these two. So I still get a distribution that I can understand: I have the small thing, and it is shifted by something that I know.
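The lucky-guess probability is simple hypergeometric bookkeeping; a sketch (the HElib-style numbers n = 1024, h = 64 are used for illustration):

```python
from math import comb

def p_lucky(n, h, k):
    """Probability that k ignored columns, drawn without replacement out of
    n, all land on zero entries of a secret with h non-zero entries: the
    all-successes case of the hypergeometric distribution,
    C(n-h, k) / C(n, k)."""
    return comb(n - h, k) / comb(n, k)

def expected_repeats(n, h, k):
    """Expected number of runs of the smaller-dimensional attack until the
    guess is lucky (ignoring the post-processing refinement)."""
    return 1.0 / p_lucky(n, h, k)

# HElib-style secret: n = 1024, h = 64, ignoring k = 100 columns.
p = p_lucky(1024, 64, 100)
```

Dropping more columns makes each lattice reduction cheaper but the guess less likely to be lucky, which is the trade-off being optimised.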
I know that a_0', and then I have some small noise, the inner products of w with s and of v with e. So this suggests some form of post-processing: I've done all my hard lattice reduction, which is really expensive, and then, instead of just saying, no, this thing doesn't look small, let me check all these shifts, because that's a cheap thing to do. So again you're looking at a hypergeometric distribution, and now you ask yourself: okay, I'm ignoring k columns; k - j of them are zero and on j of them I actually get unlucky. Establishing that probability is easy, and that's how often you then have to repeat the experiment. But now this is much cheaper, because you run your lattice reduction once and then you do these really cheap checks: you check against many different distributions. Of course, you want to make sure that the distinguishing advantage is high enough so that you know when you have found the right one, because now you have to win against all these other checks.

Okay, so these are the three fairly straightforward ideas that I wanted to talk to you about, and if you put everything together we arrive at our final algorithm, which we call SILKE. It's a variant of the dual attack for small and sparse secrets, and here's what it does. You run your BKZ-beta reduction once, and then you run BKZ-beta' for some small beta' to produce many short vectors from all the hard work you have done. The actual dual attack that you run is scaled, as inspired by the Bai and Galbraith paper. And finally, if your secret is sparse, you play this game where you ignore some components, and you do the post-processing to deal with getting unlucky.

All right, so putting everything together, here's what you get, looking at SEAL.
I should stress this is SEAL 2.0; the parameters were updated in SEAL 2.1. In SEAL 2.0, the paper contained values for log q that should give you 80 bits of security. In all these instances the standard deviation of the noise is always 3.2, and then, essentially, if you want 80 bits of security, the size of q becomes a function of n: you can go for log q up to 47.5, so q up to 2^47.5, and you still have 80 bits of security. For the same instances, if we just apply our cost estimates for how long it actually takes to run lattice reduction, we get 83 bits of security; that's our cost model for how long it takes to just run the dual attack as described earlier. And then, when you actually start using the fact that you have small secrets, you get this down to 68. That's what SILKE small is: small stands for using the small secret, but not yet the sparse secret. Then HElib: the difference between SEAL and HElib is that HElib uses a sparse secret. Again, the log q values are what you would get using the cost estimates from the HElib paper, and when you exploit the sparse secret you get this down to 61 bits. You can play the same game for 128-bit security, and there the gap is even bigger.

All right, that's all I want to say. Thank you for your attention.

Okay, we do have some time for a few questions. What does SILKE stand for?
Thanks. So, as you do, it's named after a goddess of wisdom and wit. But if you are looking for a backronym, you can take comfort in the fact that the letters of SIS, BKW and LWE are all in there, and I'm sure we can work something out offline for a cool name.

Can you remind us how many samples from the LWE problem you need, and how many samples HElib and SEAL provide?

You can do it with one sample, so one ring-LWE sample; that's n LWE samples, and this is what you get. So this is assuming that you're using only the number of samples provided by the scheme.

The sizes of the savings are not the same. Do you have a sense of how much saving you get from each of these ideas?

Yes, we have a running example in the paper where we spell this out. To give you the overview: clearly the sparse trick doesn't apply to SEAL, so SILKE small only has the amortization of costs and the scaling, and the amortization, for these instances, doesn't make that much of a difference, maybe five bits. So for this particular instance it really is the scaling that makes the big difference, it seems. For the primal attack, on the other hand, no: if you run Bai and Galbraith's algorithm on these instances, or, you know, you run your estimator on that, you get like three bits. So it really depends on where you apply it, and then what you get out.