Okay, thank you for the introduction. My name is Daniel Genkin, and this is Yuval Yarom. The title of our talk is "May the Fourth Be With You: Side-Channel Attacks on Implementations of Curve25519", and this is joint work with Neil Samwell and Luke Valenta. Okay, so in the past few years we've seen a cat-and-mouse game between security researchers and cryptographic libraries. Researchers need to publish papers, so they find side-channel attacks on cryptographic implementations. This, of course, forces implementations to deploy countermeasures, and then, as a community, we wait until the next conference season. Come the next conference season, we need to publish more papers, so we find more side-channel attacks, which of course results in more countermeasures, and this vicious cycle keeps going. Now, in the past few years we've seen a change in this paradigm. Implementations have started learning from past mistakes, and instead of repeatedly patching old cryptographic code, we've started deploying cryptographic primitives that are designed with what I call real-world security in mind. Take, for example, Curve25519. This curve was carefully designed by Daniel Bernstein to avoid many problems that have plagued cryptographic implementations for years. As a concrete example, it thwarts small-subgroup and invalid-curve attacks. These are two families of attacks in which the attacker learns the other party's secret key by exploiting the curve's mathematical structure: the attacker sends a bad input to the Diffie-Hellman protocol, and instead of the protocol staying secure, the other party's response to that bad input leaks information about its secret key. The way we typically handle this is that there is a set of bad inputs we do not allow into the elliptic-curve Diffie-Hellman protocol, and we check whether the input is one of them.
If it is, we are talking to an adversary, so we stop the protocol: we reject the input and we terminate, hopefully. I say hopefully because this puts a lot of trust in the developer. The developer needs to be aware of the problem, know how to check for it, and know how to stop the protocol when it happens. Implementations fail at each of these stages, and we still see these attacks in the wild despite their being decades old. For Curve25519, this is solved by design. Daniel Bernstein designed the curve not to have bad inputs or these corner cases: every input is a valid input for the Diffie-Hellman protocol. We don't need to trust the developer to check for anything; we just take the input and run the Diffie-Hellman protocol on it, and it is guaranteed that the output will not leak information about the other party's secret key. Implementations have also started deploying state-of-the-art side-channel protection. Doing away with old code that has key-dependent branches and memory accesses indexed by key bits, we have moved to the Montgomery ladder algorithm. This algorithm is highly regular: each iteration performs one doubling and one addition. The key is handled only in a constant-time swap operation, which we will assume is leakage-free. Overall, this means that if the constant-time swap is properly implemented (and we still trust the developer to do that), the rest of the computation should not leak much. This entire construction is standardized in RFC 7748, which also came with a constant-time reference implementation that is commonly believed to be secure against microarchitectural attacks and simple power analysis. So let's take a look at the RFC. It clearly says that the cswap function should be implemented in constant time, i.e., independently of the swap argument, and then it spells out exactly how to do this.
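The masking trick the RFC describes can be sketched in Python as follows. This is a minimal sketch that assumes 256-bit field elements held in Python integers; Python's big-integer arithmetic is itself not constant-time, so the code only illustrates the branch-free logic (build an all-ones or all-zeros mask from `0 - swap`, then XOR), not a side-channel-safe implementation.

```python
WORD_BITS = 256                      # assumption: field elements fit in 256 bits
ALL_ONES = (1 << WORD_BITS) - 1


def cswap(swap: int, x2: int, x3: int) -> tuple[int, int]:
    """Return (x3, x2) if swap == 1 and (x2, x3) if swap == 0, with no branch on swap."""
    mask = (0 - swap) & ALL_ONES     # all ones when swap == 1, all zeros when swap == 0
    dummy = mask & (x2 ^ x3)         # either x2 ^ x3 or 0
    return x2 ^ dummy, x3 ^ dummy


print(cswap(0, 5, 7))  # (5, 7)
print(cswap(1, 5, 7))  # (7, 5)
```

The same sequence of operations runs for both values of `swap`; only the data flowing through the XORs changes, which is exactly why the operands of later arithmetic, rather than the control flow, become the thing to look at.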
The RFC then says that the procedure above ensures that the same sequence of field operations is performed for all values of the secret key, thus eliminating a common source of side-channel leakage. However, this alone does not prevent all side-channel leakage. It is also important that the arithmetic used not leak information about the integers modulo p. On some architectures, even primitive machine instructions, such as single-word division, can have variable timing based on their inputs. And then there is another paragraph saying that side-channel attacks are an active research field that still sees significant new results, and that implementers are advised to follow this research closely. So Yuval and I read this RFC, and there is an assumption here that basic machine instructions do not leak too much. So we set out to test experimentally what can and cannot be leaked by primitive machine instructions. What we found is that basic machine multiplications leak information through electromagnetic (EM) emanations. And they don't just leak slightly; they leak quite grossly. We can distinguish between different arguments of the multiplication instruction even at very, very low sampling rates. Okay, so we have a leakage source. We then exploited it using Curve25519's mathematical structure. As it turns out, the curve has additional structure that is very useful for performance but also facilitates the exploitation of side-channel leakage. So if you have a side-channel weakness, Curve25519 is a good candidate to start exploiting it. Not only that, the problem is compounded by the lack of point validation. Because we dropped the previous countermeasures that rejected specific inputs, here every input is valid for Diffie-Hellman, so the adversary has an additional degree of freedom in choosing inputs that exploit both the curve structure and the leakage source.
All this resulted in key extraction from several popular implementations: GnuPG's cryptographic library, libgcrypt, and also the reference implementation as given in NaCl. Our attack is a chosen-ciphertext attack that produces key-dependent operands in multiplications, and we then get the key out through the EM side channel. A nice piece of irony is that this works on Curve25519, and it also works on Curve448, but one place where it doesn't work is the older constructions, the NIST curves, because they don't have this mathematical structure and, moreover, they use point validation. We'll spend most of the time today on the reference implementation of Curve25519. Okay, so with that introduction, let's see what can be revealed by primitive machine instructions. To do this, I have a demo with me. I don't know what you can see from there, but whatever is here will also be mirrored on the screen. I have a phone, this phone right here, and I would like to measure its electromagnetic radiation. To do this, I put a coil on the back of the phone that acts as an antenna. I connect it to an amplifier, which amplifies the signal picked up by the antenna, and then to a software-defined radio, which digitizes it. This gives me a bit stream, which I feed into my computer, which analyzes it and puts the results on screen. In terms of sampling rates, this is a 1 GHz device, and we'll be measuring it much, much slower, at 1 MHz. So we are sampling three orders of magnitude below the device's clock rate, which means we can do this with cheap, commonly available equipment, such as this very cheap software-defined radio and a coil. We don't need anything special here. Okay, now let's see what we can find. May the demo gods be with me. This diagram is called a spectrogram; it's a way to visualize leakage signals.
In this diagram, time is the vertical axis, so time moves up; frequency is the horizontal axis; and signal intensity at a given time and frequency is shown in green. For example, here we have some signals that keep popping in and out at a specific frequency. Okay, let me run a toy example on this phone. As soon as I run something, you see that the screen becomes much greener, so at least I'm not lying to you: I'm doing this live on stage. And we have some structures here. First, let's look at these three lines. What am I doing here? I wrote a small program that squares zeros: it literally takes zero, multiplies it by zero, and does that for one second. That's this thing here. This one is squaring random numbers, and this one is squaring a number that has all its bits set to one. So we can distinguish between squaring zero, squaring random numbers, and squaring numbers that have all their bits set. So much for the assumption that arithmetic doesn't leak grossly. It's actually worse than that, because we have this line right here. This is a different test program. It starts by squaring zeros, then turns on a bit and starts squaring one, turns on another bit and starts squaring three, and keeps going like that all the way to a number that has all its bits set. So we have gross leakage of the arguments of the multiplication operation, or more precisely of their Hamming weight, just by low-bandwidth means. Okay, that's not good. But now that we have a leakage source, let's see how we can exploit it to extract keys from Curve25519. We're going to attack the elliptic-curve Diffie-Hellman protocol, and to do that I need to establish some notation. We have our two standard parties, Alice and Bob, and they would like to communicate securely without Darth Vader being able to eavesdrop on their communications.
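The operand schedule of the second test program can be reconstructed as a short sketch. It assumes 64-bit machine words (the talk does not state the word size), and `operand_schedule` and `hamming_weight` are hypothetical helper names, not code from the demo. The point is that each step turns on exactly one more bit, so the Hamming weight of the squared operand climbs by one per step, which is what the slanted line in the spectrogram tracks.

```python
WORD_BITS = 64  # assumption: the test program squares 64-bit machine words


def hamming_weight(x: int) -> int:
    """Number of set bits in x."""
    return bin(x).count("1")


def operand_schedule(word_bits: int = WORD_BITS) -> list[int]:
    """0, 1, 3, 7, ..., 2**word_bits - 1: each step turns on one more bit."""
    return [(1 << i) - 1 for i in range(word_bits + 1)]


schedule = operand_schedule()
weights = [hamming_weight(x) for x in schedule]
print(schedule[:5], weights[:5])  # [0, 1, 3, 7, 15] [0, 1, 2, 3, 4]
```

The first test program corresponds to three fixed points on this scale: zero (weight 0), a random word (weight around 32 on average), and the all-ones word (weight 64).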
To communicate securely, Bob generates a private key k, which is a scalar, and a public key k·G, with G being the group generator, and sends k·G to Alice. Alice generates an ephemeral private key b and an ephemeral public key b·G, and then computes a shared secret by taking k·G, whatever Bob gave her, and multiplying it by her ephemeral key, which gives the common secret S. She also sends b·G to Bob so that he can reconstruct it: Bob takes b·G, multiplies it by his secret key k, and gets the secret S. All of this works because scalar-by-point multiplication commutes in the scalars: k·(b·G) = b·(k·G), since both equal (kb)·G. Now, from a side-channel perspective, if we have an evil Alice trying to side-channel Bob, where is Bob using his secret key? Right here: a scalar-by-point multiplication of an input given by Alice, b·G, by the secret scalar k. Not only that: because there is no point validation, Alice can send bad inputs to Bob, and since Bob does nothing with them except compute the scalar-by-point multiplication directly, they go straight into his scalar-by-point multiplication. This entire thing is, of course, executed over Curve25519. So what is this curve? This is its equation, y^2 = x^3 + 486662x^2 + x, though you don't need to know it for this talk. It works over the field that gave it its name, the integers modulo 2^255 - 19, and offers roughly 128 bits of security. It's a Montgomery curve with a cofactor of eight, carefully designed by Daniel Bernstein. It has no invalid-curve attacks and no small-subgroup attacks, and therefore implementers are explicitly advised not to perform point validation: take the input, whatever it is, put it into the scalar-by-point multiplication, and it shouldn't leak, at least not through the outputs. Now, it also has low-order elements, because its cofactor is even. So first of all, what's the order of an element?
The order of a point P is the smallest positive integer q such that q·P equals the point at infinity. In other words, it's how many times we need to add P to itself to get back to the point at infinity, the identity element. Moreover, because the cofactor is even, there must be elements of order two, four and eight. For this talk we'll care about the order-four element, which I'll denote P4; remember that 4·P4 equals infinity. This will be very useful later. Okay, a quick recap: Alice is side-channeling Bob, so she wants to attack Bob's scalar-by-point multiplication, and to do this we need to show you how the scalar-by-point multiplication is actually implemented. For that, I'll hand the stage to Yuval. Okay, thank you, Daniel. Here we have the scalar-by-point multiplication algorithm, which is the standard Montgomery ladder, and the steps are as follows. We initialize two variables, R0 and R1, that we track throughout the algorithm. We have a loop that goes over the bits of the scalar, from the most significant to the least significant bit. Here we do a constant-time swap of the two variables that depends on the values of the key bits: if consecutive bits are the same, i.e., there was no transition, we leave the values of R0 and R1 as they are. If the bits are different, i.e., there was a transition from zero to one or from one to zero, we swap their values. Then we continue: we always do one addition and one doubling. As we see here, the key is used only once, in the constant-time swap, and nowhere else in the algorithm. So when we look at it, we don't see any obvious leakage source, and that makes Daniel and me very sad. So what we need to do is look a bit deeper into the algorithm.
When we look a bit deeper, the first thing we want to show is an invariant: throughout the algorithm, R0 - R1 is always ±P, where P is the input point, so the two variables we track always differ by one copy of the input point. That is relatively easy to see. At the start of the algorithm we set R0 to infinity and R1 to P, so the difference is P. After the swap, the difference becomes -P if we swapped and stays the same if we didn't. And in the addition and doubling we add R0 to both R1 and R0, so the difference does not change. This takes us back to the beginning of the loop, so the invariant follows by induction. Actually, we can see even more than that. Every time we reach this point, the top of the loop, R0 is an even multiple of P, and by the invariant R1 is an odd multiple of P. The reason, again, is that we start with R0 being infinity, which is 0·P. At this point, if we did not have a transition, R0 remains an even multiple of P. If we had a transition, it becomes an odd multiple of P, but that doesn't really matter, because at the bottom of the loop we double R0 and again get an even multiple of P. The part we care about is what happens at this point, right after the swap: what is the value of R0 as a function of the key bits k_i and k_{i+1}? To answer that, we need to look at how addition and doubling on the group are implemented. The formulas are here, no need to remember them. The important thing is that this is just field arithmetic modulo p: no ifs, no questions, no branches; it is always the same sequence of operations.
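Both claims can be checked mechanically by running the ladder's control flow on integers instead of curve points: each register is replaced by the multiple m such that R = m·P, so addition becomes integer addition and doubling becomes multiplication by two. This is a sketch of the lazy-swap ladder structure used in RFC 7748 (the swap is driven by k_{t+1} XOR k_t), not code from any particular library; `ladder_on_multiples` is a hypothetical name, and the assertions encode the invariants stated above.

```python
import secrets


def ladder_on_multiples(k: int, bits: int = 255) -> int:
    """Montgomery-ladder control flow with each register replaced by the
    integer m such that R = m*P. Returns the final multiple held in R0."""
    r0, r1 = 0, 1                    # R0 = infinity = 0*P, R1 = 1*P
    swap = 0
    for t in reversed(range(bits)):
        # Top of the loop: R0 is an even multiple of P and R1 - R0 = +/-P.
        assert r0 % 2 == 0 and abs(r1 - r0) == 1
        kt = (k >> t) & 1
        swap ^= kt                   # 1 exactly when k_{t+1} != k_t
        if swap:                     # stands in for the constant-time cswap
            r0, r1 = r1, r0
        # Right after the swap: R0 is odd exactly when there was a transition.
        assert r0 % 2 == swap
        swap = kt
        r0, r1 = 2 * r0, r0 + r1     # double R0; add R0 to R1
    if swap:
        r0, r1 = r1, r0
    return r0                        # equals k, i.e. R0 = k*P


k = secrets.randbits(255)
print(ladder_on_multiples(k) == k)
```

The second assertion is the crux of the attack: the parity of the multiple sitting in R0 after the swap reveals whether a transition occurred, even though the sequence of operations never changes.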
The two calculations we are really interested in are, first, the calculation of a variable we call L2, which takes the two field values that represent the point R0 and subtracts them modulo the field size, and second, the squaring of that value. For the squaring, the reference implementation uses standard schoolbook multiplication: it goes over the bytes of the number, multiplies each byte by the whole number, shifts as needed, and adds the results. So at this point we ask: what happens if Alice sends Bob a malicious input? The malicious input she sends is the element of order four. If the order-four element comes in here, there are two things to note. First, any even multiple of the order-four element is either an element of order two or the point at infinity, just by the definition of an order-four element, and any odd multiple of it is again an element of order four. Second, the representation. The algorithm uses projective Montgomery coordinates, so each point is represented by two field values, termed X and Z. The point at infinity has X equal to some nonzero field element and Z equal to zero. The point of order two has X equal to zero and Z nonzero, and the point of order four has X and Z both nonzero and equal to each other. All other values of X and Z represent other points on the curve. Now let's look at what happens when we have either the point at infinity or the point of order two here, which is what an even multiple of the order-four point gives us, and the point of order four here. These go back into the next loop iteration, and now we need to check two possible cases.
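A plain schoolbook squaring over byte limbs, as described above, can be sketched as follows. The 32 little-endian one-byte limbs are an assumption for illustration, and the attacked code's exact loop structure may differ (a dedicated squaring routine, for instance, can reuse symmetric products). The helper names are hypothetical. Besides checking that the limb-by-limb products add up to the true square, the sketch counts how many byte products have both operands equal to 0xFF when the input is a multiple of p, which is the case that matters if a subtraction hands the squaring an unreduced zero.

```python
P = 2**255 - 19
NLIMBS = 32  # assumption: 32 little-endian limbs of one byte each


def to_limbs(n: int) -> list[int]:
    """Split n (< 2**256) into 32 little-endian bytes."""
    return list(n.to_bytes(NLIMBS, "little"))


def schoolbook_square(limbs: list[int]) -> tuple[int, list[tuple[int, int]]]:
    """Multiply every limb by every limb, shift by the limb positions and add.
    Returns the unreduced square and the byte operand pairs, in order."""
    acc, pairs = 0, []
    for i, a in enumerate(limbs):
        for j, b in enumerate(limbs):
            pairs.append((a, b))
            acc += (a * b) << (8 * (i + j))
    return acc, pairs


def count_all_ones_pairs(pairs: list[tuple[int, int]]) -> int:
    """Number of byte multiplications whose two operands are both 0xFF."""
    return sum(1 for a, b in pairs if a == 0xFF and b == 0xFF)


for name, value in [("p", P), ("2p", 2 * P)]:
    _, pairs = schoolbook_square(to_limbs(value))
    print(name, [hex(x) for x in to_limbs(value)[:2]], count_all_ones_pairs(pairs))
```

In little-endian byte order, p = 2^255 - 19 is 0xED, then thirty 0xFF bytes, then 0x7F, giving 30 × 30 = 900 all-ones products; 2p = 2^256 - 38 is 0xDA followed by thirty-one 0xFF bytes, giving 31 × 31 = 961. Which multiple of p an implementation actually produces depends on its subtraction routine.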
The first case is that we did not have a transition. Then these values propagate into the algorithm and R0 is the element of order two or the point at infinity. That means that one of X0 and Z0 is zero and the other is an arbitrary field element, so the difference is that arbitrary field element or its negation, and that is what goes into the squaring routine. In other words, we are basically multiplying random-looking numbers by each other. The other case is that k_i differs from k_{i+1}, so we had a transition. In this case we swap R0 and R1, they go back into the arithmetic, and we get the element of order four at this point. So X0 minus Z0 is zero, because they are equal, and that zero goes into the squaring here. And as it turns out, the subtraction does not return zero as an all-zero string: it returns an unreduced multiple of the field size p = 2^255 - 19, in which 31 bytes have all their bits set and the remaining byte has only a few bits set. So most of the multiplications we do here are multiplications of all-ones bytes. And because there are 961 (31 × 31) of these all-ones multiplications, we amplify the difference between the two cases by about three orders of magnitude, which is exactly what we lose by sampling at a low rate. So let's see what happens when we run this on the phone; I'll let my trusty assistant do the hard work. When we look at this spectrogram, we start seeing different patterns. As we saw earlier, when we do multiplications, the Hamming weight of the values we multiply appears as a different frequency. So what we have here is a frequency modulation of the signal the algorithm emits. While we were running this test, we had MATLAB code analyzing the signal and doing FM demodulation, and we get this beautiful graph. When we look at it, we start seeing patterns.
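The two cases can be reproduced in an idealized simulation. The sketch below follows the RFC 7748 ladder formulas for Curve25519 and, in each iteration, records one bit: whether x_2 - z_2 (the value that is then squared) is zero modulo p. That bit stands in for the EM leak; real measurements are noisy and need thresholding, as in the demo. With input u = 1, the x-coordinate of an order-4 point (X = Z), the flag is set exactly when k_{t+1} != k_t, so XOR-accumulating the flags from the ladder's known starting state (R0 at infinity, previous bit 0) recovers the scalar without a guess. `ladder_with_leak` and `bits_from_transitions` are hypothetical names, and the `if swap:` branch replaces a branch-free cswap only for readability.

```python
import secrets

P = 2**255 - 19
A24 = 121665  # (486662 - 2) / 4, the doubling constant used in RFC 7748


def clamp(k: int) -> int:
    """RFC 7748 scalar clamping, applied to an integer."""
    k &= ~7                  # clear the three low bits (cofactor 8)
    k &= (1 << 255) - 1      # clear bit 255
    return k | (1 << 254)    # set bit 254


def ladder_with_leak(k: int, u: int) -> tuple[int, list[int]]:
    """RFC 7748 ladder on x-coordinates that also records, per iteration,
    whether x_2 - z_2 is zero mod p (idealized one-bit leak)."""
    x1, x2, z2, x3, z3 = u % P, 1, 0, u % P, 1
    swap, leak = 0, []
    for t in reversed(range(255)):
        kt = (k >> t) & 1
        swap ^= kt                           # 1 exactly when k_{t+1} != k_t
        if swap:                             # illustrative; real cswap is branch-free
            x2, x3, z2, z3 = x3, x2, z3, z2
        swap = kt
        a, b = (x2 + z2) % P, (x2 - z2) % P  # b is the value squared next
        leak.append(int(b == 0))
        aa, bb = a * a % P, b * b % P
        e = (aa - bb) % P
        c, d = (x3 + z3) % P, (x3 - z3) % P
        da, cb = d * a % P, c * b % P
        x3, z3 = (da + cb) ** 2 % P, x1 * (da - cb) ** 2 % P
        x2, z2 = aa * bb % P, e * (aa + A24 * e) % P
    if swap:
        x2, x3, z2, z3 = x3, x2, z3, z2
    return x2 * pow(z2, P - 2, P) % P, leak


def bits_from_transitions(leak: list[int], prev_bit: int = 0) -> int:
    """Read the scalar off the transition flags, most significant bit first."""
    k = 0
    for flip in leak:
        prev_bit ^= flip
        k = (k << 1) | prev_bit
    return k


secret = clamp(secrets.randbits(256))
_, leak = ladder_with_leak(secret, 1)   # u = 1: an order-4 point
print(bits_from_transitions(leak) == secret)
```

In the talk the first bit is guessed and the candidate is checked against public information; the simulation can skip that step only because it observes every iteration, including the first.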
Basically, we have a pattern of five peaks that keeps repeating throughout the scalar multiplication. Comparing this to the ground truth, we know that each group of five peaks is a single iteration of the loop. More importantly, in some iterations the last peak is higher than a threshold we set here, and in others it's lower. Again comparing to the ground truth, we know that when this peak is higher, we had a transition in the bits of the scalar. So now all we need to do is guess the first bit of the scalar. Let's say we guess that it is zero. Here we did not have a transition, so the next bit is also zero. Here we had a transition, so the next bit is one, and so forth; we can read off all the bits of the scalar. We then check the result against what we know, the public information about the scalar. If it matches, all is well; if not, we try again with the first bit being one. So what we did here is recover the whole scalar just by reading its leakage from the phone. Okay, a little bit about countermeasures. The first possible countermeasure is to validate all input points. This will stop this specific attack, but it doesn't remove the underlying problem, because the leakage is still there, and more advanced attacks may be able to exploit it. Another way to block the attack is point blinding: Bob takes the input Alice gives him, shifts it to some other location, does the calculation, and then shifts the result back. This blocks all chosen-ciphertext attacks, but it costs performance. So these are the possible countermeasures, or you could redesign the phone not to leak, but I don't think that is going to happen. With that, we'll finish, and we'll be happy to take questions. Okay, any questions? Yes? Can you repeat that? It's a bit difficult to hear.
So you can multiply the... yeah, can you repeat? So I think the question was: if you multiply the point by four, or by eight, you no longer get an order-four element, and therefore this attack doesn't work. Did I get that correctly? Yes, that's a way to do point validation without actually validating the point, just eliminating the low-order component. But what about all the other points? There are plenty of tricks out there for getting chosen values into the middle of a cryptographic algorithm. They might be less effective, and might require more measurements or more ciphertexts, but the leakage is still there, and you find a way in with your chosen input. So it would stop this attack, but there are plenty of tricks out there for doing these things. Question up there. Hello, my question is about the applicability of coordinate blinding. Obviously, for the particular attack you're using, where you get a value of zero, coordinate blinding is of no help. But if we ban only the low-order points, does coordinate blinding protect you or not? Well, again, if you remove the point of order four, then yes, this attack will not work. But because there is leakage there, there are other attacks, and there have been other attacks on implementations of elliptic curves that leak only a little. They take more time, you need many more samples, and you can't do SPA, but the attacks do work. Okay, if there are no more questions, let's thank the speakers.