Welcome to this video on multi-target decryption failure attacks and their application to Saber and Kyber. I'm Jan-Pieter D'Anvers, and this is joint work with Senne Batsleer. In this video I will first give an introduction to decryption failure attacks, and then I will explain the contributions of our paper: a leveled multi-target attack, an improvement to the cost estimation of decryption failure attacks, and specifically an improvement of the cost estimation for existing schemes like Saber and Kyber.

So let's start with an introduction: what are decryption failure attacks? Decryption failure attacks are attacks on public key encryption schemes. Imagine we have Alice and Bob, who want to communicate but have no pre-shared secret key. Alice can generate a public key and a secret key using a key generation procedure. She sends the public key to Bob, and Bob uses it in an encryption procedure to generate a ciphertext for a certain message or shared secret key. Bob sends this ciphertext back to Alice, and Alice uses her secret key to decrypt it to the message or the shared secret key.

Now Eve is in the middle, and Eve, being evil, wants to learn Alice's secret key. The information she has is the public key. What she does is some evil stuff to generate a ciphertext, or something that looks like a ciphertext, and she submits it to Alice. Alice, who just sees something coming in, decrypts this ciphertext using her secret key and ends up with a message. And based on this message, or on the reaction of Alice, Eve learns something about the secret key. These kinds of attacks are possible against many schemes and are described in the literature.

But luckily we can also protect against such chosen ciphertext attacks, by changing the encryption procedure a bit. Bob has a public key and a message he wants to encrypt. But now, instead of using fresh randomness in his encryption procedure, he uses pseudorandomness derived entirely from the message he is going to send. This means the whole encryption procedure becomes deterministic: anyone who knows the public key and the message can redo the encryption and will end up with the same ciphertext.

So now, when Alice receives a ciphertext, she decrypts it with her secret key and finds a message. At this point she has the message and the public key, which means she can redo the encryption; we call this re-encryption. Alice re-encrypts the message under the public key into a ciphertext, and now she can compare. If both ciphertexts, the input ciphertext and the re-encrypted ciphertext, are the same, then the ciphertext is valid and she just continues. But if the re-encrypted ciphertext differs from the input ciphertext, she knows something is up: the ciphertext was not honestly generated, or at least not generated according to the right procedure. In that case she knows she is under attack, and she simply aborts without releasing any sensitive information. This is called the Fujisaki-Okamoto transformation: a transformation to protect against chosen ciphertext attacks. So if an attacker now submits a chosen, invalid ciphertext, it gets decrypted and we end up with some message, but Alice's re-encryption will produce a different ciphertext. She will notice, she will not give away any information, and the attacker learns nothing.
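To make the re-encryption check concrete, here is a minimal Python sketch of an FO-style decapsulation. The pke_enc and pke_dec functions are toy stand-ins I made up purely so the sketch runs; a real scheme would plug in its lattice-based encryption here, and the exact hash derivations differ per scheme.

```python
import hashlib
import secrets

# Toy deterministic "PKE" (a keyed SHA3 stream) standing in for the real
# scheme's encryption; it exists only so the FO check below is executable.
def pke_enc(pk, m, coins):
    pad = hashlib.sha3_256(pk + coins).digest()
    return coins + bytes(x ^ y for x, y in zip(m, pad))

def pke_dec(sk, ct):
    coins, body = ct[:32], ct[32:]
    pad = hashlib.sha3_256(sk + coins).digest()   # toy: sk == pk here
    return bytes(x ^ y for x, y in zip(body, pad))

def encapsulate(pk):
    m = secrets.token_bytes(32)
    coins = hashlib.sha3_256(b"coins" + m).digest()   # randomness derived from m
    ct = pke_enc(pk, m, coins)                        # deterministic encryption
    key = hashlib.sha3_256(b"key" + m + ct).digest()
    return ct, key

def decapsulate(sk, pk, ct):
    m = pke_dec(sk, ct)                               # decrypt to candidate message
    coins = hashlib.sha3_256(b"coins" + m).digest()
    if pke_enc(pk, m, coins) != ct:                   # re-encrypt and compare
        return None                                   # reject: not honestly generated
    return hashlib.sha3_256(b"key" + m + ct).digest()

pk = sk = secrets.token_bytes(32)                     # toy keys (identical in this stub)
ct, key = encapsulate(pk)
assert decapsulate(sk, pk, ct) == key
```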
So is this the end of the story? No, of course not, otherwise we wouldn't have a paper. Let's first look at the NIST round three candidates for encryption. There are four schemes: Kyber, Saber, NTRU and Classic McEliece. Two of these schemes, namely Kyber and Saber, have a special property: they can have decryption failures. This means that with a very low probability, the probability given on the slide, even if you input a correct ciphertext, these schemes can fail to recover the message. So we have a decryption failure, a failure to recover the correct message, with a very low probability.

How does this enable an attack? As explained before, if we input an invalid ciphertext, the message comes out wrong and Alice will notice. So the only thing we can do is input a valid ciphertext, and hope it triggers a decryption failure. Because if it does, the decrypted message is incorrect, the re-encryption produces a different ciphertext, and Alice rejects the ciphertext even though it is actually valid. So we submit valid ciphertexts, we hope a decryption failure happens, and when one does, we know something went wrong, which gives us some information about the secret key.

So what has been done already? The first thing I want to talk about is a paper published at PKC 2019 on these decryption failure attacks. The first thing that paper showed is that if you find decryption failures, valid ciphertexts that fail to decrypt, you can use them to reduce the security of the underlying LWE problem; in essence, each failure acts as a hint about the secret key. This is shown on this graph: on the x-axis you have the number of decryption failures found, and on the y-axis the security of the scheme. You see that if you can find enough decryption failures, you reduce the security of the underlying scheme by a certain amount. So finding decryption failures clearly works, and this immediately gives an attack where you just submit random valid ciphertexts and hope to hit decryption failures. But remember that these decryption failures only happen with a very, very low probability; for Saber, for example, it is 2^-136. So this attack is going to be very costly, and the natural question is: can we do better?

That is the second contribution of that paper: a way of speeding up the search for decryption failures. To get an idea of how this works, I will first give a high-level introduction to lattice-based encryption. Lattice-based encryption is built on the notion of a learning with errors (LWE) sample. What is that? We have a matrix A, which is public and uniformly random, and two small vectors S and E. We publish A together with B = A·S + E. The learning with errors problem states that, given A and B, it is hard to recover S and E. It even states that it is hard to distinguish (A, B) from uniformly random values. We call (A, B) an LWE sample, and we say it is secure.
So Alice generates such an LWE sample and sends it to Bob. Bob takes both values, A and B, which look uniformly random to him, and uses them in new LWE samples: he multiplies each by his own small secret S' and adds small errors. This gives him B' = A^T·S' + E' and V' = B^T·S' + E'', and he additionally adds an encoding of the message to V', because we want to transmit a message. He sends B' and V' back to Alice.

Alice can now recover the message by calculating V' − B'^T·S. If you write this out, you get a long expression, but it simplifies because the two terms involving the big matrix A cancel each other out. We end up with the decryption being equal to the message plus a small error term, E^T·S' − S^T·E' + E''. As long as this error is small enough, there is no decryption failure and the message is transmitted correctly. But if this error term is larger than q/4 in absolute value in any coefficient, we have a decryption failure.

We can simplify this even further. Collect all of Alice's secret values into one big vector S̄ = (S, E), and all of Bob's values into one big ciphertext vector C̄ = (−E', S'). Just for the sake of this presentation, we also assume E'' is approximately 0. On the left there is an image representing this; be aware that it is a toy example in two dimensions, while in reality these vectors would be around 1500-dimensional. Given this big S̄ and C̄, the failure condition can be rewritten as an inner product: if |⟨S̄, C̄⟩| is bigger than q/4, we have a decryption failure.

So let's keep working on this equation. The inner product of two vectors can be rewritten as the norm of the first vector, times the norm of the second vector, times the cosine of the angle between them. We want a decryption failure, so we want this inner product to be big. What can we do? We would like the norm of the secret to be big, but we cannot influence the secret, so there is nothing to do there. We would also like our ciphertext to point in the direction of the secret vector, because that gives a larger cosine and a higher failure probability; however, we do not know the direction of the secret, so we cannot do anything with that term either. That leaves the ciphertext norm: we want a ciphertext with a high norm, and the higher the norm of the ciphertext, the higher the probability of a decryption failure.
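As an illustration, here is a toy numpy sketch of this scheme and its failure condition. All parameters are made up for readability; real schemes like Saber and Kyber work over polynomial rings with compression and carefully chosen noise distributions.

```python
import numpy as np

# Toy LWE encryption sketch with hypothetical small parameters.
rng = np.random.default_rng(1)
q, n = 3329, 64                       # toy modulus and dimension

def small(shape):                     # narrow noise; stand-in for a binomial sampler
    return rng.integers(-2, 3, shape)

# Alice: key generation, B = A*S + E
A = rng.integers(0, q, (n, n))
S, E = small(n), small(n)
B = (A @ S + E) % q

# Bob: B' = A^T*S' + E' and V' = B^T*S' + E'' + m*(q/2), one message bit
m = 1
Sp, Ep, Epp = small(n), small(n), int(small(1)[0])
Bp = (A.T @ Sp + Ep) % q
Vp = int(B @ Sp + Epp + m * (q // 2)) % q

# Alice: V' - B'^T*S = m*(q/2) + (E^T*S' - S^T*E' + E'')
noisy = int(Vp - Bp @ S) % q
m_dec = 1 if q // 4 < noisy < 3 * q // 4 else 0

# The error term decides correctness: a decryption failure occurs
# exactly when |E^T*S' - S^T*E' + E''| exceeds q/4.
err = int(E @ Sp - S @ Ep) + Epp
print(m_dec == m, abs(err) > q // 4)  # expect: True False
```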
So, solution found: we just choose a ciphertext C with a very large norm? Well, this does not work, because remember the Fujisaki-Okamoto transformation from before: Alice is going to check our ciphertext, so we cannot choose an arbitrary ciphertext; we need to follow the correct encryption procedure.

This leads us to failure boosting. What is the idea? We have a secret vector S, and based on this secret vector there is a certain area such that any ciphertext in this area causes a decryption failure. We do not know the secret vector, so we do not know this area, but we want as high as possible a probability of landing in it, and a random valid ciphertext fails with the failure probability delta. So what we do is generate random but valid ciphertexts, which land all over this space, and we only keep the weak ciphertexts: ciphertexts with a high failure probability, or equivalently, ciphertexts with a high norm. In other words, we keep exactly the ciphertexts outside a certain circle. If we plot these weak ciphertexts over the failure area, we see a larger overlap: we have increased the probability of a decryption failure from delta to a larger value beta.

This came at a cost, because we had to do a precomputation: we had to generate a bunch of ciphertexts before finding such a weak one. So we have transformed plain random guessing into a two-step procedure, where we first do a precomputation and then send in weak ciphertexts with a higher failure probability. The total expected work to find a decryption failure is now the precomputation we do per submitted ciphertext, times the inverse of the failure probability.

And now comes the trick: we can speed up this precomputation. It is just a search over something we can completely predict and compute ourselves, so we can run it on a quantum computer. The second part, sending in the ciphertexts, we cannot run on a quantum computer, because we cannot simulate it: we are missing information, namely the secret key. So in essence we have split random guessing into two procedures: a precomputation that we can accelerate with a quantum computer using all the knowledge we have, and a query phase with an increased failure probability that we cannot accelerate.

The result is shown in this graph. On the x-axis you have the amount of work invested in the precomputation, and on the y-axis the total work to generate a failure. On the far left you have random guessing: no precomputation at all. You can see that using a quantum computer with this Grover search, investing in precomputation actually brings down the total cost of finding a failure with this failure-boosting procedure. And the advantage of failure boosting is not only this lower cost, but perhaps more importantly, that we have to send fewer queries: before, the failure probability was delta, so we had to send about delta^-1 queries; now the failure probability is boosted to beta, so we need fewer queries. That is a quick overview of the results of this first paper.
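The bookkeeping behind that curve can be sketched in a few lines. This is my own simplified illustration, not the paper's estimator: alpha stands for the probability that a random valid ciphertext is weak, beta for the failure probability of a weak ciphertext, and the boosted point in the example is hypothetical.

```python
from math import log2, sqrt

# Simplified failure-boosting cost model (illustration only).
# alpha: probability that a random valid ciphertext counts as "weak"
# beta:  failure probability of a weak ciphertext (beta > delta)
def expected_work(alpha, beta, quantum=True):
    # finding one weak ciphertext is a predictable search:
    # ~1/alpha classically, ~sqrt(1/alpha) with Grover
    search = sqrt(1.0 / alpha) if quantum else 1.0 / alpha
    return search / beta          # one search per query, ~1/beta queries

delta = 2.0 ** -136               # Saber's failure probability for random ciphertexts
print(log2(expected_work(1.0, delta)))                   # no boosting: 136.0
# hypothetical boosted point: 1 in 2^60 ciphertexts is weak, failing with 2^-100
print(log2(expected_work(2.0 ** -60, 2.0 ** -100)))      # 130.0 (cheaper, quantum)
print(log2(expected_work(2.0 ** -60, 2.0 ** -100, quantum=False)))  # 160.0 classical
```

Note that the query count also drops from 2^136 to 2^100 at the boosted point, which is the second advantage mentioned above.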
I will also quickly talk about the second paper, which looks at the scenario where you have already found one or more decryption failures. Finding a decryption failure gives you information, because a failing ciphertext points roughly in the direction of the secret, which means we now have an idea of the secret.

So as before, we have the failure condition, the inner product between S̄ and C̄. We still cannot influence S̄, and we still want a large norm of C̄, but now we additionally have extra information: we roughly know the direction of the secret. So we can adapt the weak-ciphertext area from a circle into a shape fitted to this direction, so that there is a bigger overlap between the weak ciphertexts and the failure area. We have a better prediction, and we can search more efficiently for the next decryption failure; this is directional failure boosting.

This graph depicts the results. On the x-axis you have the number of available failing ciphertexts, the failures you have already found, and on the y-axis the total work and the total number of queries needed to find the next failure. Take into account that this graph is logarithmic: once you have found one failure, finding the next is much, much cheaper, and the one after that cheaper still. So it is kind of a bootstrapping problem: once you find one failure, it becomes a lot cheaper to find the next and the next, so essentially you are limited by finding the first failure. Finding that first failure really speeds up the rest of the process.
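The geometric idea can be shown with a tiny scoring experiment. This is my own simplification, not the paper's method: one earlier failing ciphertext serves as a noisy estimate of the secret's direction, and candidates are ranked by their inner product with that estimate.

```python
import numpy as np

# Directional failure boosting, toy illustration: use one earlier failing
# ciphertext as a (noisy) estimate of the secret's direction, then rank
# candidates by |<estimate, C>|, a proxy for |<S, C>| = ||S||*||C||*cos(theta).
rng = np.random.default_rng(2)
dim = 512                                         # toy; in reality ~1500

secret = rng.normal(0.0, 1.0, dim)
failing_ct = secret + rng.normal(0.0, 1.0, dim)   # points roughly along the secret
est_dir = failing_ct / np.linalg.norm(failing_ct)

cands = rng.normal(0.0, 1.0, (10_000, dim))       # candidate valid ciphertexts
score = np.abs(cands @ est_dir)                   # ||C|| * |cos(angle to estimate)|
best = cands[np.argmax(score)]

cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
print(f"|cos| to secret, random candidate: {abs(cos(cands[0], secret)):.3f}")
print(f"|cos| to secret, best candidate:   {abs(cos(best, secret)):.3f}")  # typically larger
```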
So that was the introduction. For people who want to get deeper into decryption failures, there are two more resources I can recommend, apart from our paper of course. The first is chapter 4 of my PhD thesis. It contains no radically new ideas, but gives an overview of the two previous papers, some extra thoughts and discussion on all of these methods, and a generalized framework; if you want to get into decryption failure attacks, I highly recommend taking a look at it. The second is a very interesting paper by Nina Bindel and John Schanck that looks at the effect of successful decryptions: what if you do not find a decryption failure, but a decryption success? Can you use that to find the next failure more efficiently?

So now let's focus on the contributions of our paper, the first of which is a leveled multi-target attack. In our paper we focused on the real-world impact of decryption failure attacks, which means constrained attacks: attacks in a more realistic scenario with practical constraints, and attacks on real schemes like Saber and Kyber.

What do we mean by constrained attacks? Remember that before, we were talking about an amount of work and an amount of queries, ciphertexts we need to send to Alice. In the attacks discussed so far, we would see numbers like 2^100 or 2^128 ciphertexts sent to a single target. More than a key recovery attack, an attack where we learn the secret key, that is a denial-of-service attack: we are sending so many queries that, well before we learn anything about the secret, the server will go down and people will notice something is up. In that sense these attacks are not practical, so we look at attacks where the number of ciphertexts sent to any one particular server is reduced, and we compensate with multi-target attacks. The idea is to spread the queries over multiple targets: say we can send a maximum of 2^64 queries to one target, but we consider an abundance of targets, and our goal is to retrieve the secret key of one specific target, a random target out of this whole heap. For example, with a maximum of 2^64 queries per target and 2^64 targets, we still have 2^128 queries in total, but now distributed over multiple servers, multiple IoT nodes.

A naive approach is the following. Considering all these servers, in a first phase we try to find one failure anywhere, with a maximum of 2^127 queries for that first failure. Once we find a failure, we focus on that server and look for follow-up failures with at most 2^63 queries. Why does this first step help? Remember that once we have found a decryption failure, finding follow-up failures is much more efficient. This can be seen in this graph: on the x-axis you have the failure rate after failure boosting, which is inversely proportional to the number of queries you expect to send, and on the y-axis the total work to find a failure. The blue curve, for finding the first failure, lies much higher than the orange curve: it is much more efficient to find the second failure than the first. But there is something else to take into account: we have 2^127 queries to find the first failure, so in that phase we can sit essentially anywhere on the blue curve and choose the most efficient point. In the follow-up phase, however, we are limited to 2^63 queries, so we cannot sit at the optimal point of the curve: we need to stay to the right of the dotted 2^63 line. This is the naive multi-target attack that was already hinted at in the Eurocrypt paper I presented before.

In our paper we present a leveled approach. We do the same as before, attacking 2^64 targets with a maximum number of queries per target, but instead of trying to find one failure, we try to find n failures, so n failing servers, with a maximum of 2^127 queries for that phase. Once we have n failures, we look for a second failure at these n servers, which means we now have a maximum of 2^62 times n queries, because we have n targets to spread them over. Afterwards, we just continue the attack as before.

Let's look at this in the table. The details are not important, but if you compare the two attacks, the differences are shown in bold. The work for the first failure goes up by a factor n, because we run the first-failure procedure n times. In exchange, the constraints on the second failure are much more relaxed, because we can do n times more queries to find it. And the graph looks like this: on the blue curve, instead of sitting at the cross, we do n times more work, so the dot sits higher. But this corresponds to a lower cost on the orange curve, where instead of staying to the right of the dotted line, we can relax by a factor n, shifting left and reducing the attack cost there. By balancing the two out, we can reduce the overall attack cost; in this scenario, from 2^120 down to approximately 2^115.
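As a back-of-the-envelope illustration of this budget bookkeeping (using the numbers quoted above; n is a free parameter that the paper optimizes against the failure-boosting curves):

```python
from math import log2

# Query budgets in the naive vs. leveled multi-target attack, using the
# talk's numbers: 2^64 targets, at most 2^64 queries per target.
targets, per_target = 2 ** 64, 2 ** 64
print("total query budget (bits):", log2(targets * per_target))     # 128.0

# naive: ~2^127 queries spread over all targets for the first failure,
# then at most 2^63 follow-up queries at the single failing server
naive_first, naive_followup = 2 ** 127, 2 ** 63

# leveled: the first phase is repeated until n servers have failed
# (n times the first-phase work), but follow-up queries can then be
# spread over n servers, relaxing the constraint to n * 2^62
n = 2 ** 4                                 # hypothetical level size
leveled_first_work = n * naive_first       # n-fold work in phase one
leveled_followup = n * 2 ** 62

print("follow-up budget (bits), naive vs leveled:",
      log2(naive_followup), log2(leveled_followup))                 # 63.0 vs 66.0
```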
Next, improving the cost estimation. What do we mean by this? We have this failure boosting idea, the idea of decreasing the cost of finding decryption failures, and the theory behind it is rather simple. But if you want to calculate what it would actually cost, if you want to compute this curve, that turns out to be very expensive and very hard to do. Previous papers, and also our paper, use approximations and simplifications just to get numbers out in a reasonable amount of time and with a reasonable amount of computational resources. In our paper we include a few additional effects, a few additional constraints, to get a better idea of the real cost of decryption failure attacks. The ideas we worked on in this paper are weak keys, a worse-than-expected angle, a limited quantum computer depth, and a limited message space. In this presentation I will focus on the weak key effect and the worse-than-expected angle.

So, weak keys. Remember that we are doing a multi-target attack: we are not targeting one server, we are targeting a whole lot of servers, 2^64 of them. Each of these servers has its own secret key S, and some servers will have a secret key with a high norm while others have one with a low norm. This also means that some servers are more prone to failures: with a high norm of S, failures are more likely. So if we find a failure while querying all these targets, we can assume that this particular server actually has a higher-than-average norm of S, which also means that follow-up failures will be easier to find. Because we are looking across this whole pool, we preferentially hit servers with a higher failure probability, and so the rest of the attack becomes easier. Once we move to the second phase, finding the second, third, fourth failure, it gets easier: this is the weak key effect. In our paper we describe how to incorporate this weak key effect into the calculations, and all the curves and numbers we generate in the paper include it.
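A small Monte Carlo experiment conveys the intuition. This is purely illustrative with made-up parameters, not the paper's model: each target's failure probability is modeled as a Gaussian tail whose width scales with the secret's norm, and we compare the average secret norm with the norm we expect at the server behind the first observed failure.

```python
import numpy as np
from math import erfc, sqrt

# Weak-key effect, toy Monte Carlo: servers with a larger secret norm
# fail more often, so the first observed failure is biased toward them.
rng = np.random.default_rng(0)
n_targets, dim, thresh = 100_000, 256, 70.0   # made-up toy parameters

# one secret per target; its norm varies around sqrt(dim)
norms = np.linalg.norm(rng.normal(0.0, 1.0, (n_targets, dim)), axis=1)

# toy per-target failure probability: Gaussian tail at thresh, with the
# error term's width proportional to the secret's norm
p_fail = np.array([0.5 * erfc(thresh / s / sqrt(2.0)) for s in norms])

# conditioning on "this server produced the first failure" weights each
# server by its failure probability
print(f"average secret norm:            {norms.mean():.2f}")
print(f"expected norm at first failure: {(p_fail * norms).sum() / p_fail.sum():.2f}")
```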
A second effect is a worse-than-expected angle. Remember that a failure happens when the norm of S̄, times the norm of C̄, times the cosine of the angle between the secret and the ciphertext, is bigger than q/4. Due to the nature of these failures, in practice this term will be approximately equal to q/4. Now recall the weak key effect from the previous slide: we have a higher norm of S̄ than you would expect at random. We are also doing failure boosting, so we have a higher norm of C̄ than average. The two combined therefore lead to a lower-than-average cosine of the angle between the two vectors.

This means that when you do find a failure, there is actually less information in it, because the failing ciphertext points less precisely in the direction of the secret. So we use all these tricks, failure boosting and the weak key effect, to find failures more easily, but finding them more easily means each failure gives us less information: a lower cosine between the two vectors. We quantified this effect as well, described in the paper how to take it into account in the calculations, and all the numbers account for it.

If we combine the weak key effect and this worse-than-expected angle, we get the following curve: the blue is the previous estimate from earlier papers, and the orange is our new estimate. On the x-axis you have the weak ciphertext failure rate, so beta, and on the y-axis the cost to find the next failure, here the second failure. On the complete left of the curves we are doing random guessing: no preparation at all. There you see that the blue curve lies above the orange curve: the previous estimates were too high, and because of the weak key effect the cost of the attack is actually lower than previously calculated. But you also see that the two curves cross: if you go to the right, doing more failure boosting, the cost becomes higher than previously calculated. The reason is exactly this worse-than-expected angle: if the angle between the secret and the ciphertext is worse than you would expect, the directional failure boosting procedure also performs worse than expected, and this shows the further right you go. So in this curve you clearly see both the weak key effect and the worse-than-expected angle effect, and we have taken both into account in our new model.
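In the notation from before, the two effects can be summarized in one heuristic relation:

```latex
% Heuristic summary: conditioned on a failure, the failure term sits
% near the threshold, so
\[
  \lVert \bar{S} \rVert \cdot \lVert \bar{C} \rVert \cdot \lvert \cos\theta \rvert
  \;\approx\; \frac{q}{4}
  \qquad\Longrightarrow\qquad
  \lvert \cos\theta \rvert \;\approx\; \frac{q/4}{\lVert \bar{S} \rVert \cdot \lVert \bar{C} \rVert}.
\]
% Weak keys raise ||S|| and failure boosting raises ||C||, so the cosine,
% and with it the directional information per failure, goes down.
```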
Finally, improving the cost estimation for existing schemes. What do I mean by this? In the previous paper that looked at directional failure boosting, so finding follow-up failures, the second paper we discussed in the introduction, all calculations were done on a toy scheme: a scheme that closely resembles Saber and Kyber, but with two differences, two simplifications. The first simplification is that the variance of the secret term S equals the variance of the noise term E, which is not the case for Saber and not the case for Kyber. The second simplification is that there is no rounding of the ciphertext, so the term E'' is assumed to be approximately 0; again, this holds for the toy scheme, but it is clearly not the case for Saber nor Kyber. So in our paper we developed a new model to take into account schemes that do not fulfill these simplifications: schemes where the variance of S is not equal to the variance of E, and where the rounding of the ciphertext cannot be approximated as 0.

In this graph, our new model is given in orange and can be compared to the old model in blue. If we apply both to the toy scheme Katana, the two models, the old geometric model and the new geometric-uneven model, give approximately the same results, which is what we want, because Katana fulfills the conditions of the old model. However, if we apply them to Saber, you see a clear difference between the old blue model and the new orange model: the blue model gives a very large underestimation of the attack cost, of about 20 to 40 bits. So it is very clear that if you want to calculate the cost for Saber, you need to use our updated model, and you cannot just assume the simplified model from the Eurocrypt paper we discussed before.

So we applied our new model, including the weak key effect, the worse-than-expected angle, and the removal of these simplifications, to Saber and to Kyber, and the results are in this table. I will not go into the details, the details are in the paper, I will just give the conclusions you can draw from it. For current schemes, and by that I mean Saber and Kyber, single-target attacks are really, really impractical: the number of queries you would need is far beyond the limit that NIST set, far beyond any practical limit. For multi-target attacks it is essentially the same: if you take a very extreme case with 2^64 targets and 2^64 queries per target, then theoretically you get a very small security reduction, a few bits, in these very extreme cases. But in practice this attack is really not efficient, and you would rather mount a direct attack on the underlying LWE problem.

So if these attacks are not practical, why did we do this research? The first reason is that we want a better understanding of the attacks. There is a possibility that one of these schemes, Saber or Kyber, will be standardized in the future, and having a better understanding of the attacks is really important, so that we know our future standard is actually secure against this type of attack. The second reason looks more towards the future: if we have a better understanding of the attacks, if we know what we can allow without risking security problems, we can relax our parameters.
We can allow a higher failure probability, which means smaller parameters, lower bandwidth, faster schemes: more efficient schemes in general. So by doing this research, we hope that future schemes can afford a higher decryption failure probability without being subject to attacks that break the scheme, and thus be more efficient.

As for future work, what can we do next? The first thing is that we made this attack framework work for Saber and Kyber, but maybe we can also apply it to, for example, Frodo or other lattice-based encryption schemes. Maybe we can also apply it to schemes with error-correcting codes, which have different properties and where more research is really needed to predict the threat of decryption failure attacks, for example a scheme like LAC. A third direction is code-based crypto: some code-based cryptographic schemes also have a decryption failure probability, so can we port some of the ideas from the lattice-based field to code-based crypto? So that is it for the attack framework. On another front, we can look at improving the security bounds, the proofs: the gap between what we can prove secure and the actual attacks is at the moment really big, so maybe we can improve the proofs and bring the provable security of our schemes closer to the actual attacks.