Hello, this is "Faster Montgomery and double-add ladders for short Weierstrass curves." I'm Mike Hamburg at Rambus.

The goal here is to speed up elliptic curve scalar multiplication. For this problem we're given an elliptic curve E, a point P on that curve, and a scalar k, which is just an integer, and we wish to compute kP, another point on the elliptic curve. More specifically, we'll be working with short Weierstrass curves, which are the most common and most general form of elliptic curve used in cryptography. We're given only k and P, so we don't have access to, for example, a precomputed table of multiples of P. And we're looking to produce a regular algorithm, which is better for side-channel resistance: in a regular algorithm, no matter what k is, the algorithm always does the same operations in the same order, although possibly swapping the data depending on what k is.

Depending on the protocol, in some cases P is given as x and y coordinates, and in some cases only the x-coordinate is given and only the x-coordinate of the output is desired. We'll handle both of those cases. The main application of this is elliptic curve Diffie-Hellman key exchange, but in an embedded device it would also be very useful for ECDSA signatures, key generation, and so on.

The algorithms that we'll be using are called the Montgomery and Joye, or double-add, ladders. For these ladder algorithms, we take two point registers, Q and R or P and Q, and we initialize them with the base point P, zero, or double the base point. Then we scan through the bits of the scalar. For the Montgomery ladder we scan from most significant down to least significant, whereas for the Joye ladder we scan from least significant up to most significant. At each step, we conditionally swap the two points in the ladder state, and then we double one of them and add it to the other point. At the end, one of the points in the ladder state is the desired output.

The ladder that I'll be
working with in this paper is a three-point ladder. By adding either the difference of the two points to the Montgomery ladder, or the sum of the two points to the Joye ladder, we get a ladder in which the state is wider, but this may help with the ladder operation. In both cases, for the three-point ladder, the ladder operation takes P, Q, and R and has to output P, Q + R, and 2R. Furthermore, we're given at the beginning of this ladder step that R is Q + P, or possibly, due to swapping, Q - P.

Now it's worth noting that the representations of these three points in the ladder state need not be the same. For example, in some of our formulas we will be representing the x-coordinates of P, Q, and R, but the y-coordinate only of P. However, for the Montgomery ladder Q and R have to have the same representation, because you have to swap them with each other conditionally, and likewise for the Joye ladder P and Q have to have the same representation. So this will constrain the way that we can represent the ladder state.

There's been a lot of previous work on the subject of Montgomery ladders, starting of course with Montgomery's work in 1987, which applies only to the so-called Montgomery curves. It is, however, extremely efficient for these curves. For short Weierstrass curves, the more common case, the Montgomery ladder has been less efficient, but it has slowly been improving in efficiency over the past decade or so: from 14 multiplies per bit in 2011, down to 12 multiplies per bit in 2017, down to 11 multiplies per bit today.

For this work we'll be using Jacobian coordinates. It's well known that it's best to represent the x and y coordinates of the elliptic curve in a projective form such as x/z and y/z, so as to defer the costly division to the end of the algorithm. However, in some cases it's better to instead use x/z^2 and y/z^3, because this homogenizes the leading term of the curve equation with a z^6 term. It's
worth noting that z = 0 is allowed, and in fact there is a point on the elliptic curve which is at infinity, which is the neutral point, or identity point, of the curve. This is represented with nonzero coordinates for x and y but a zero coordinate for z. However, if you ever get to the state (0, 0, 0), where x, y, and z are all zero, then you've failed, because you're representing the coordinates as zero over zero.

It's also possible, instead of storing a separate z-coordinate for each point, to store the three points so that they all have the same z-coordinate, which is called a co-Z representation. The main benefit of this is that the first step of a point addition is typically to rescale the points so that they have the same z-coordinate, so that you can subtract their x and y coordinates. This is actually somewhat expensive, especially for Jacobian coordinates, where you have to compute z^2 and z^3. So you might be able to save this if you could statically guarantee that the three points all have the same z-coordinate, and it may be less expensive to maintain this invariant than it is to rescale the z-coordinates during the operation itself.

Furthermore, instead of storing three z-coordinates, you have to store only one of them, or in fact none at all. The main reason you need the z-coordinates in the regular ladder operations is so that you can rescale the points to have the same z, and here you're guaranteeing that they already have the same z, so you don't need it for that reason. You do need it at the end of the ladder to divide by, so that you can get the final output. However, depending on how the ladder is constructed, you may be able to solve for z instead of tracking it through the ladder step.

There is, however, a significant downside to co-Z ladders, for all their advantages: you always fail if any one of the points P, Q, or R becomes the neutral point. This is because the neutral point has z
equals zero. So suppose that Q becomes the neutral point: it has z = 0, but all three points have the same z-coordinate, so they also have z = 0, and therefore their representation is this invalid zero-over-zero representation.

Our work today is based on the Kim et al. formulas from 2017, which improved the state of the art from 14 down to 12 multiplications per bit. The representation that they use is a co-Z representation that has seven different coordinates of somewhat complex interpretation. This interpretation depends on an additional bit b, which depends on the key bits, and in fact the ladder uses two key bits instead of only one. However, also in 2017, after reading this paper, I distilled it down to a simpler and faster set of formulas that I presented at the CHES 2017 rump session.

From this I would say that the key insight, at least of the distilled version, is that the three points P, Q, and R, or rather P, Q, and -R, or something similar depending on the sign, add to zero, which means that they all lie on a line intersected with the elliptic curve. This line has some slope, and it's better to calculate and store that slope rather than calculating and storing the y-coordinates of all the points.

A second insight, which is especially useful in the work today, concerns the addition formulas for elliptic curve points.
You need the difference of x-coordinates, or perhaps the difference of y-coordinates, but not the x-coordinates themselves. So it may be that you can save effort by calculating the difference of the coordinates directly, instead of calculating the coordinates and subtracting them.

For the 2017 formulas, I used a modified Jacobian state in which x_Q and x_R, which have to have the same representation because they're going to be swapped, are represented as their difference from the x-coordinate of P, multiplied of course by z^2, which is the Jacobian form, whereas x_P is represented directly. Furthermore, only the y-coordinate of P is stored, and not those of Q or R. The slope m is stored with a single power of z, so that when you multiply it by an x difference, which has z^2, you get a y difference, which has z^3.

One mistake in those original formulas, at least from a performance point of view, is that m is stored doubled, as 2mz. This leads to a halving operation in the last line, where I have to compute m' over 2, squared. It's better to compute only a single m times z, and then in the last line it won't have to be doubled or halved. It turns out that the rest of the computations can be rearranged to use the same number of operations, so overall you save one halving. This is the baseline for the formulas that I'll be discussing today.

The main observation of the new formulas is that x_P is only used to compute the next version of itself, x_P'. Furthermore, x_P' is only used to compute the next version of x_RP, and it's done by subtracting x_P' from something. That means that x_P doesn't have to be part of the state, because you can recover it from the other variables of the state: m, x_QP, and x_RP. It might be that it doesn't save time to do this, but let's see what happens if we do a substitution here. If we substitute in the recovered value of x_P, and then if we substitute
this x_P' for the x_P' in the last line, and then we do a bunch of algebra, which I don't really have time to get into in this talk, then we can come up with a direct formula for computing x_RP'. However, this direct formula has quite a number of multiplications in it, mostly by y_P. So we're calculating y_P times z times this new intermediate term G. However, we're also calculating in this formula y_P times z^3, and we don't use z^2 or z^3 anywhere else in these formulas. So by rewriting y_P z^3 as y_P times z times G times K, which shares a very large common subexpression with this last line, we can save quite a few multiplications, and so in the end we net out one multiplication ahead: 11 multiplications per bit, three of which are squarings, plus seven additions. You can also massage these slightly to turn one of the multiplies into a squaring at the cost of a couple of additions.

This is generally the form of the manipulations that I did to produce the other formulas in this paper. There's not necessarily a deep new insight, just the observation that in these existing formulas from 2017, some of the variables might be redundant or might not be computed in the optimal way, and calculating some difference of them instead, or calculating them in some different way, might produce a better result.

So this new Montgomery ladder, as I said, takes 11 multiplies and seven adds, and it parallelizes on either two or three parallel units, so you could get it down to a latency of only four multiplies. It also requires six field registers' worth of memory, which is pretty efficient. It also has an advantage over previous work in that it doesn't reuse the output of the ladder step within the ladder step itself, and this gives an advantage against correlation attacks.

There's also a different Montgomery ladder presented in the paper.
This one stores y_Q and y_R instead of y_P, and it can be simplified down to the same eight multiplies, three squares, and seven adds. But it has a parallel version that has slightly more operations, all of them parallelized four ways, so the total latency is only three multiplication latencies and three addition latencies. This makes it competitive with the most efficient formulas for the Montgomery ladder on Montgomery curves, if you have a four-way SIMD machine.

I've also got a new formula for the Joye ladder. As I remarked towards the beginning of this talk, the Joye ladder uses the same ladder step as the Montgomery ladder. However, the representations of the points need to be slightly different, because you have to be able to swap P and Q instead of swapping Q and R. In particular, you can't just have a y-coordinate stored for P without having one stored for Q. Computing that extra y-coordinate ends up costing, after some simplifications, one extra multiply, so the Joye ladder with our formulas is slightly less efficient than the Montgomery ladder. It does, however, use less memory, and in fact less memory than any other prime-field elliptic curve formulas I'm aware of: it only needs five field registers, plus the scalar and the curve constants. And in fact you can do the setup and the finalization of the ladder with only those five field registers, so you could do the entire operation with very little memory.

So how do we do the setup? If you have x and y and you need to compute coordinates of P and 2P, then you can do this easily using formulas that you would find on the internet. However, if you have only the x-coordinate, it's not totally obvious how to do this, and the reason is that you don't have y, so how are you going to compute y times z^3? Now, you do have y^2 from the curve equation, right?
It's x^3 + ax + b. It turns out that if you set z = y, then all the coordinates that you have to compute are functions only of y^2, and not of y itself, and that means that you can compute them using the y^2 that you recover from the curve equation.

A second interesting question is how we finalize the ladder, which is to say: given the ladder state, which more or less represents x z^2 and y z^3 for the three points, how do we recover x and y? Well, we just divide through by z, but the ladder state doesn't contain z, so we need to solve for z.

If we're doing the Montgomery ladder, then the P in the ladder state is the same as the base point. That means that if we've stored the initial point, then we have both it and its coordinates scaled by z^2 and z^3. From this it's easy to solve for z, or for 1/z, and then clear the denominators. Except that it doesn't work if the original point had an x-coordinate of zero, in which case we would divide zero by zero. So this technique works only on the Montgomery ladder, and only on curves where x_P = 0 is not a possibility.

However, there's another way to finalize the ladder, using curve invariants. If you take the curve equation and you intersect it with a line, then you get a cubic equation in x. Furthermore, you know the three roots of this cubic equation. Because the coefficients of a cubic equation depend on the roots in the way shown on this slide, you can recompute the curve coefficients a and b, which presumably you also have stored in a ROM or something somewhere. However, when you do this you get a times z^4 and b times z^6, because they have a different number of factors of x in them. By dividing these two you can solve for z^2, or 1/z^2, which will give you the x-coordinate of the output. You can combine this on the Montgomery ladder with the previous slide, where you can get z^3 from y z^3, and that
will also allow you to extract y_P. However, on the Joye ladder this only allows you to output x. It furthermore requires that the curve coefficients a and b are not equal to zero, so it won't work on secp256k1; but that curve doesn't have any points with x = 0, so you can use the previous slide for that curve. In other cases, for example if you want to use the Joye ladder but also output the y-coordinate, you can do this by tracking z: just add it to the ladder state, and you'll have to multiply it by something on every ladder step. So it costs you an extra multiply per bit, but otherwise it basically just works.

A final question that we might ask is: are these ladder formulas complete, or do they have failure cases? But you already know the answer to this: they're co-Z formulas, and therefore they always fail if they encounter z = 0, in particular if they encounter the neutral point. In fact the neutral point is the only way that this can happen, and they're otherwise complete. For prime-order curves you can work out that, at least in the common case, there are at most four different scalars for which you will see a failure, and in fact some of them are, like, the scalar zero, so those can't be salvaged anyway: there, the answer is the point at infinity. So if you're using this for ECDH, for example, then you might just say that the probability that any of this happens is negligible, because the scalar is a uniformly random value that's not influenced by the adversary.

However, in some applications you might want perfect correctness, and for this you need to construct a ladder that avoids the neutral point. We have a way to do this, at least if the curve's order doesn't have any tiny prime factors, that is, its order is not divisible by two or by three. The way to do this is to track the number of ways that the neutral point can occur. Remember, the ladder step takes P, Q, and R to P, Q + R, and 2R, give or take a sign. I'll just be sort of ignoring
sign here, because it depends on swaps; the details are in the paper.

If the curve's order is odd, then you can't have 2R equal to the neutral point unless the state is already at the neutral point. So the only way that you fail is if Q = -R, so that Q + R is O, the neutral point. This also means that, because you know that P + Q = R, you have P = 2Q, or perhaps -2Q. And once you're in the neutral state, there are also only a few ways that it can go. For the Montgomery ladder, you can stay in the same neutral state, or you can go to this other state, which ends up being 2Q, 2Q, and 4Q, up to sign.

So let's map that out. You have a state that starts as 2Q, Q, and Q, and then you add or subtract these, and you'll end up with zero, which would cause a failure in a co-Z ladder. Then finally you'll go to 2Q, 2Q, 4Q; or you'll stay in that state for some number of ladder steps, by laddering it swapped, and then later you'll exit to 2Q, 2Q, 4Q.

What we can do, though, is a different permutation on the state, and this will allow us to shadow the state of the Montgomery ladder that would fail because it would go through the neutral point, without actually going through the neutral point in that representation. The way we do this is that we swap the first two points, and then applying the ladder step goes to the state Q, 3Q, 2Q, which we call the shadow state, because it shadows the neutral zone without actually going there. If you permute this shadow state and apply the ladder operation, you get back to the same state. So you can keep tracking for as long as the ladder you're shadowing is in this neutral state.
You can stay in the shadow state using ladder operations, and then when the ladder that you're shadowing exits to the state 2Q, 2Q, 4Q, you can do a different permutation of this state, and you'll end up in the correct output state, again up to sign. The details are in the paper. Actually, I should say that the details are in the auxiliary material, whose URL is given on this slide, because there's this whole state machine, a complicated arrangement of swaps and so on, so it has more detail than really fits into the 20 pages.

This ladder, which dodges the neutral point, is slightly slower. It needs the extra y-coordinate, just like the Joye ladder, because you have to swap all three points instead of just two of them. However, you don't need to use this representation for every iteration of the ladder, even if you're going for perfect correctness, because you can't get to the neutral point immediately, but only after some number of iterations. In the common case, which is a prime-order curve, you can't get to the neutral point until, like, the last two iterations, so you don't need to do anything special until the very end. This means that while the complete version of these curve operations is significantly more complicated, it's not actually that much slower, and it's entirely a viable option if you really need perfect correctness.

So that's all. Please direct any questions to me during the question session. Here are the references.
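
As a rough companion to the ladder description near the start of the talk, here is a minimal Python sketch of the Montgomery ladder's control flow: a conditional swap, then one addition and one doubling per scalar bit, scanning from the most significant bit down. It is not the co-Z formulas from the paper. It uses plain affine arithmetic on a toy curve, and the curve parameters, the function names `add` and `montgomery_ladder`, and the `nbits` argument are all assumptions made for illustration. Python integers and `if` statements are not constant time, so this shows only the regular structure and not actual side-channel resistance. It assumes Python 3.8 or later for `pow(x, -1, p)`.

```python
# Toy curve y^2 = x^3 + a*x + b over F_p. Illustrative parameters only
# (an assumption for this sketch), far too small for cryptographic use.
p, a, b = 97, 2, 3
INF = None  # the neutral point (point at infinity)


def add(P, Q):
    """Affine point addition, handling the neutral point and doubling."""
    if P is INF:
        return Q
    if Q is INF:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return INF  # Q == -P, so the sum is the neutral point
    if P == Q:
        m = (3 * x1 * x1 + a) * pow(2 * y1 % p, -1, p) % p  # tangent slope
    else:
        m = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p  # chord slope
    x3 = (m * m - x1 - x2) % p
    y3 = (m * (x1 - x3) - y1) % p
    return (x3, y3)


def montgomery_ladder(k, P, nbits):
    """Compute k*P with one add and one double per bit (0 <= k < 2**nbits)."""
    Q, R = INF, P  # invariant after every iteration: R - Q == P
    for i in reversed(range(nbits)):
        bit = (k >> i) & 1
        if bit == 0:  # stand-in for a constant-time conditional swap
            Q, R = R, Q
        Q, R = add(Q, R), add(R, R)  # ladder step: (Q, R) -> (Q + R, 2R)
        if bit == 0:  # swap back so Q again holds the running multiple
            Q, R = R, Q
    return Q


P = (3, 6)  # on the toy curve: 6^2 = 36 = 3^3 + 2*3 + 3
five_P = add(add(add(add(P, P), P), P), P)
print(montgomery_ladder(5, P, 8) == five_P)
```

The sketch keeps the two swaps per bit separate for readability. Real implementations commonly merge them into a single swap driven by the XOR of adjacent scalar bits, and replace the `if` with a masked, branch-free swap.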