All right, so I will give a short introduction to curve-based crypto, then I will talk about a top-down approach to our implementation: I'll give some info about our signature and key exchange schemes, then go down to the building blocks, so the Jacobian and the Kummer arithmetic, and finally some details about the finite field arithmetic. I will conclude by comparing our results to the state-of-the-art implementations.

So, a short executive summary. We are basically the first software-only implementation of hyperelliptic curve crypto on microcontrollers, and in particular the first to do a signature scheme based on Kummer surfaces. We show significant improvements in terms of speed, code size, and also stack usage compared to other implementations. Everything I will be talking about is available on my website, which is displayed here, so feel free to download it and have a look.

Right, so for an introduction to curve-based crypto: most of you will have heard about elliptic curve crypto, maybe not so much about hyperelliptic curve crypto, so I will talk about both of them here. Basically, we can classify curves by their genus, and here we just care about genus one, which gives us elliptic curve crypto, and genus two, in which case we talk about hyperelliptic curve crypto. And why do we care about this for crypto purposes? Well, the idea is that an elliptic curve gives us a group: we can take the points of the elliptic curve and define a nice group operation, which we can then use in cryptographic protocols that rely on groups. So Schnorr signatures or Diffie-Hellman can be done using groups, and the same is true for Jacobians of hyperelliptic curves.
In that case we don't take the points of the hyperelliptic curve itself; we have to do something a bit more generic, but the details are not important. The idea is that we still have a group, and with a group we basically have one operation, which is addition, and a special case of it, which is doubling. It's very easy to see that we can combine these operations into some sort of scalar multiplication, using some double-and-add or Montgomery ladder kind of technique, and using scalar multiplication we can very easily build Diffie-Hellman: you do one scalar multiplication on one side and one on the other side, and you combine them into a shared key. Moreover, for signature schemes we have Schnorr-type signatures, such as ECDSA or EdDSA, which basically rely on scalar multiplications; but I also want to emphasize that these rely on generic group additions as well, which will be important for my next slide. I also want to say that these operations on Jacobians are generally not as easy to make fast as the operations on elliptic curves, and also not as easy to make constant time, which is important against side-channel attacks, of course.

Since elliptic curves have been studied so much, there's also a lot of research on how to make these things fast. One idea is to take the points on the curve and identify each point with its inverse. In practice, where we represent a point by two finite field elements x and y, that means we just drop the y-coordinate. By doing that we lose the group structure, which looks like it could be a problem, but we still have these operations called xDBL and xADD, which salvage the situation: we can still use them to build a scalar multiplication routine, and that's what we need for Diffie-Hellman.
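To make the ladder idea concrete, here is a minimal sketch of a Montgomery-ladder scalar multiplication. As a stand-in for the curve's xDBL/xADD it uses plain addition modulo a prime, so it is purely illustrative of the ladder's structure; a real x-only or Kummer ladder would use the differential addition formulas of the curve or surface instead.

```python
# Montgomery-ladder scalar multiplication over a toy group:
# addition modulo a prime stands in for the curve's xADD/xDBL.

P = 2**127 - 1  # the talk's field prime, reused here as a toy modulus

def double(x):          # stand-in for xDBL
    return (x + x) % P

def add(x, y):          # stand-in for xADD
    return (x + y) % P

def ladder(k, x, bits=128):
    """Compute k*x by scanning the bits of k from the top down.

    Every iteration performs exactly one double and one add; only
    *which* variable receives which result depends on the key bit.
    This fixed operation pattern is what makes the ladder easy to
    implement in constant time (with a constant-time swap in practice).
    """
    r0, r1 = 0, x                       # invariant: r1 = r0 + x
    for i in reversed(range(bits)):
        if (k >> i) & 1:
            r0, r1 = add(r0, r1), double(r1)
        else:
            r0, r1 = double(r0), add(r0, r1)
    return r0
```

In a real implementation the branch on the key bit is replaced by a constant-time conditional swap of r0 and r1, so the memory access pattern is independent of the secret scalar.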
So key exchange is still okay, since we still have scalar multiplication. On the other hand, by losing the group structure we lose generic additions, so we lose the ability to do Schnorr-type signatures. For example, Curve25519 uses these x-only operations to do very fast key exchange, but for Ed25519 signatures one moves to the twisted Edwards form of the curve, which is an equivalent curve.

Right, so why do I talk about all this elliptic curve stuff? Because it's kind of easy, it's used a lot, and the Kummer case is really close to it. Instead of working with these points, we now work with Jacobian elements, and we can do the same thing: we identify elements with their inverses. Again we destroy the group structure, but just as for elliptic curves we are still left with these xDBL and xADD operations, which allow us to do a scalar multiplication, which again allows us to do key exchange. So key exchange on Kummer surfaces is kind of easy, but again, by destroying the group structure we lose the ability to do signatures using Kummer surfaces.

Summarized: we can either use elliptic curves or Jacobians, and do both key exchange and signatures, or we can go down to the Kummer varieties, which gives us a much more efficient and easily constant-time key exchange, but we lose signatures. That was basically the situation up until a couple of months ago. Now there's a new result, presented last week at SAC by Chung, Costello, and Smith, where they say: you cannot do signatures on the Kummer, but you can still exploit this fast, efficient Kummer arithmetic for them. The idea is that you start with a point on the Jacobian, and instead of doing all these Jacobian operations, you map down to the Kummer, do all your efficient operations there, and then somehow recover a point on the Jacobian at the end. The point is that the Kummer operations are really easy to make constant time, and they are the key-dependent operations; so if we can make this projection and recovery reasonably efficient, an implementation benefits a lot.

So what has been done so far? On larger platforms there are some implementations of Kummer surfaces doing Kummer-based key exchange, by Bernstein and others, and they got really good results; they're basically only beaten by the FourQ implementation by Costello and Longa, which uses endomorphisms on elliptic curves. For microcontrollers, the situation is that there's really not much hyperelliptic-curve-based crypto: there are some implementations which use hardware acceleration, but software-only implementations barely exist. So this leads to two questions. The first one is: how well would Kummer-based key exchange do on a microcontroller?
And the answer to that is reasonably predictable: there's no reason to believe it wouldn't do as well as on large platforms, and since the fields are generally smaller than in the elliptic curve case, you could maybe even expect it to do better. The second, more interesting question is: how do these Kummer-based signature schemes perform? They have never been implemented anywhere; there's this proposal by Chung, Costello, and Smith, but it's really not obvious how it translates to a real-world implementation. So that's the main question we want to answer here.

Right, so what did we implement? This is our signature scheme, and there's a lot of stuff on this slide; I would urge you not to try to read everything. It's basically a Schnorr signature, very close to EdDSA with some minor modifications. What we need is a public generator, a point on the Jacobian, and some kind of hash function; our implementation uses SHAKE128. We need a secret key and a message, and we get three functions: a key generation function, a signing function, and a verification function. The first thing I want to point out is that these elements really live on the Jacobian, because, as I said, without the Jacobian it's hard to do signatures.
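The keygen/sign/verify structure just described can be sketched as follows. This is only a toy illustration: the additive group of integers mod q (with q the talk's prime 2^127 - 1, and an arbitrary "generator" G) stands in for the Jacobian, so discrete logs are trivial and nothing here is secure. It does show the Schnorr shape of the scheme, the use of SHAKE128, and the compression trick of sending the hashed challenge instead of the commitment point.

```python
# Toy Schnorr-style scheme: Z_q under addition stands in for the Jacobian.
# Illustrative of the *structure* only -- discrete logs here are trivial.
import hashlib
import secrets

q = 2**127 - 1   # toy group order (the talk's field prime)
G = 5            # arbitrary nonzero "generator"; every nonzero element works

def H(*parts):
    """Hash to a scalar using SHAKE128, the hash the talk's scheme uses."""
    shake = hashlib.shake_128()
    for part in parts:
        shake.update(str(part).encode() + b"|")
    return int.from_bytes(shake.digest(32), "big") % q

def keygen():
    sk = secrets.randbelow(q)
    return sk, (sk * G) % q            # pk = [sk]G

def sign(sk, msg):
    r = secrets.randbelow(q)           # per-signature nonce
    R = (r * G) % q                    # commitment [r]G: one scalar mult
    h = H(R, msg)                      # challenge
    s = (r + h * sk) % q
    return (h, s)                      # send H(R, msg) instead of R:
                                       # the size optimization from the talk

def verify(pk, msg, sig):
    h, s = sig
    R = (s * G - h * pk) % q           # recover [r]G; note this needs a
    return H(R, msg) == h              # generic group add/subtract
```

Note how verification recombines [s]G and [h]pk with a generic group operation; this is exactly why the scheme needs the Jacobian and cannot live on the Kummer surface alone.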
So these elements really have to be on the Jacobian, but the actual work is still done through the Kummer. It looks like everything is going on on the Jacobian, but underneath we are all the time mapping down to the Kummer, doing the hard work there, and then recovering back to the Jacobian. And by going through the Kummer surface we do lose the opportunity for efficient batch verification. Since we already lose batch verification, we may as well do some optimization of the signature size. This was already proposed by Schnorr: instead of sending over a point, we can send over a hash of the point, which has only half the size. That reduces our signature size from 512 bits to 384 bits, which is quite nice, and it also means that in verification one of our scalars is a bit smaller, so we get some speed optimization there as well. Finally, this is not very apparent on the slide, but there's also a lot of compression of Jacobian points going on, which is not really relevant at the moment.

The key exchange scheme is much simpler. For this we can do everything on the Kummer surface: our generator is just a point on the Kummer, and we have some secret key. We basically have two operations, which are actually the same: we can take the public generator and multiply it by the secret scalar, which gives key generation; or we can take someone's public key, multiply it by our secret scalar, and we have the exchange function. So it's a key exchange, but it's really nothing more than plain Diffie-Hellman.

So what did we actually implement? We use the Gaudry-Schost curve, which is a genus-2 hyperelliptic curve. It's defined over a very nice finite field, F_q, where q is 2^127 - 1, a prime with a really nice shape. This curve has a Jacobian, which you can simply define, and a Kummer surface, which you can define by identifying points with their inverses. The point I want to make here, and I don't want you to read all these operation counts, is that you can very easily express all of these operations as a number of operations in the finite field, so it's very straightforward to make all of this constant time. We have an addition function on the Jacobian, which we need basically only in verification. Then we have the projection function, which takes a constant number of field operations. There's this xDBLADD, which is the function underlying scalar multiplication on the Kummer: it does an xDBL and an xADD at the same time, which gains some efficiency. And then there's the recovery function, which goes back to the Jacobian from the Kummer and is reasonably expensive; but since the Kummer operations are so much better than the Jacobian ones, it's worth it.

Right, so finally some notes on the finite field arithmetic. The first platform we implement on is the AVR ATmega, a family of 8-bit microcontrollers. What is really important is that we need big-integer multiplications and squarings. Field elements can be represented in 128 bits, since we work over the prime field with q = 2^127 - 1. So here we kind of cheat.
There's this implementation by Hutter and Schwabe of 256-by-256-bit big-integer arithmetic, where they use three-level Karatsuba multiplication and two-level Karatsuba squaring. The way Karatsuba works is that it reduces a 256-by-256-bit multiplication or squaring down to a couple of 128-by-128-bit multiplications or squarings, and since they already highly optimized those, there's really no reason not to use their multiplication and squaring routines directly. This reduces the number of Karatsuba levels we do by one: we end up with two-level Karatsuba multiplication and one-level Karatsuba squaring.

On top of that we need some kind of reduction, since we are working in a field, and this prime has a very nice shape again: 2^128 is congruent to 2 modulo the prime. So whenever a big-integer multiplication produces bits above 128, we can multiply that overflow by 2 and add it to the bottom, and that's already enough for our reduction. Combining this, we get field multiplication and squaring. One property of the Kummer surface we are using, the Kummer surface of the Gaudry-Schost curve, is that it has a lot of small constants, so for some efficiency gain we define a separate multiplication-by-constant function, a 16-by-128-bit multiplication. Finally, we have an inversion, which is just an exponentiation based on Euler's theorem, and which is therefore also constant time.

The second platform is the ARM Cortex M0. The idea is basically the same: it's a 32-bit microcontroller, so instead we have four 32-bit words. Again we have one bit of redundancy, which is very nice for the implementation because it makes additions and subtractions easier. And again, not by coincidence, there's another 256-by-256-bit big-integer multiplication implementation, by Düll and others. Here they use three-level Karatsuba multiplication and squaring.
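The field arithmetic just described can be sketched in a few lines. This is a Python model, not the assembly from the talk: one-level Karatsuba on 64-bit halves stands in for the multi-level word-size splits used on the AVR and the M0, and the reduction uses the 2^128 ≡ 2 folding trick exactly as described. The inversion-by-exponentiation follows Fermat's little theorem (the prime-field case of Euler's theorem).

```python
# Field arithmetic mod p = 2^127 - 1, modeled with Python integers.
# One-level Karatsuba on 64-bit halves illustrates the technique; real
# implementations split further, down to 8-/32-bit machine words.

P = 2**127 - 1
MASK64 = 2**64 - 1

def karatsuba_mul(a, b):
    """128x128 -> 256-bit product using 3 half-size multiplications."""
    a0, a1 = a & MASK64, a >> 64
    b0, b1 = b & MASK64, b >> 64
    lo = a0 * b0
    hi = a1 * b1
    mid = (a0 + a1) * (b0 + b1) - lo - hi   # = a0*b1 + a1*b0
    return lo + (mid << 64) + (hi << 128)

def reduce_p(c):
    """Reduce a <=254-bit product modulo p."""
    # 2^128 = 2 (mod p): double the overflow and add it to the low bits.
    c = (c & (2**128 - 1)) + 2 * (c >> 128)
    # Finish with folds at bit 127 (2^127 = 1 mod p), then fix c == p.
    c = (c & P) + (c >> 127)
    c = (c & P) + (c >> 127)
    return 0 if c == P else c

def fmul(a, b):
    return reduce_p(karatsuba_mul(a, b))

def finv(a):
    """Inversion as a fixed exponentiation, hence naturally constant time."""
    return pow(a, P - 2, P)
```

The nice part is that the reduction is branch-free apart from the final conditional subtraction, which real implementations also do in constant time.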
So we lift out their 128-by-128-bit big-integer multiplication and squaring, and we are left with two-level Karatsuba multiplication and two-level Karatsuba squaring. Reduction, multiplication by constants, and inversion are done in exactly the same way.

Right, so I have quite some time left, so on to results and comparison. On the AVR ATmega, I want to emphasize that I'm really comparing scalar multiplications here: the other implementations have not implemented a full scheme, so no full Diffie-Hellman or full signatures, only scalar multiplication. So I compare our Kummer scalar multiplication against implementations targeting key exchange, and our Jacobian scalar multiplication against those targeting signatures, because that's what you would need for the respective scheme. How well did we do? Well, the fastest implementation out there for key exchange was the Curve25519 implementation by Düll et al. We reduce the number of cycles by 32 percent, but we also almost halve the code size, and we reduce the stack usage by even more, almost 80 percent. For signatures, there's the implementation by Wenger, Unterluggauer, and Werner; there we reduce the number of clock cycles by 71 percent, but we do increase stack usage a bit, while the code size is similar. And there's one other implementation which really implements full signatures, by Nascimento, López, and Dahab: they do EdDSA signatures using Ed25519. There we almost halve the number of clock cycles, and we also reduce the stack usage compared to them; they do not report code size, so we cannot compare that.

On the other platform, the ARM Cortex M0, I'm again only comparing scalar multiplications. For key exchange, Düll et al. again have the fastest Curve25519 implementation; we reduce the number of clock cycles by 27 percent, and we halve the code size and stack usage, so there we are doing much better. For signatures, it's again Wenger, Unterluggauer, and Werner; we reduce the number of clock cycles by 75 percent, although we again increase code size and stack usage.

That was already it, so thanks for your attention.