Hello and welcome to CTIDH: faster constant-time CSIDH. So what is CSIDH? CSIDH is a post-quantum key exchange protocol based on a group action on a certain set of elliptic curves. The secret keys, sampled from some key space, give group elements, and we use these group elements to evaluate the group action to obtain public keys. What is CTIDH? CTIDH is a new key space and a new algorithm to compute the group action in CSIDH. Moreover, our new algorithm is constant-time, and we verify this claim using Valgrind. In addition, we obtain significant speedups compared to the previous best constant-time implementations. So let's get started on CSIDH. In CSIDH, we use elliptic curves over a finite field F_p, where p is chosen of a very special form: p = 4 · ℓ_1 ⋯ ℓ_n − 1 for small primes ℓ_i. Then there is a somewhat mysterious abelian group acting on the set of elliptic curves in CSIDH. All we need to know is that all these curves are represented in Montgomery form, so we only need to keep track of the A coefficient. By construction, for every one of these primes ℓ_i, we have a group element g_i in the mysterious group such that evaluating its action can be done very efficiently using ℓ_i-isogenies. You can think of the group action as starting from one elliptic curve and, by acting, being transported to a new elliptic curve. And isogenies are maps of elliptic curves: starting from one curve, you map to another elliptic curve. The group action that we need to evaluate in CSIDH is given by composing the small elements g_i, which we know how to act with efficiently, to get more group elements to act with. Moreover, we want to evaluate this group action in constant time. What does that mean? We want to evaluate the group action in such a way that the timing of the algorithm provides no information about the private key and no information about the output. That will be the goal for the constant-time claims in this talk. First, we start with the new key space.
Like I said, the group action that we want to compute is the action by the small elements g_i, each taken e_i times. This information is captured in an exponent vector (e_1, …, e_n), which is just an integer vector. These exponent vectors are sampled from some key space, and we just need that key space to be large enough. In the original CSIDH paper, you take the same bound for every entry of the vector, bearing in mind that the resulting key space needs to be large enough. Sometimes you may want to allow only non-negative entries in your exponent vectors. And also, because evaluating the group action by g_i takes a different amount of effort for different i, it is very useful for efficiency to allow the bounds on the different entries of the vector to vary. Now I'm going to explain how we change the key space in CTIDH. For concreteness, we take the CSIDH-512 prime, just to have some numbers. So for every prime there is a group element, and we can look at the corresponding entry in the exponent vector. Then we batch these primes together: we pick some splitting of the set of primes and consider each group of primes together as one batch. Then we notice that a given exponent vector lies in the subset of integer vectors in which we compute exactly three isogenies — three group actions — for primes in this batch, exactly five isogenies for primes in that batch, and so on. To get a larger key space, we observe that the same exponent vector also lies in the subset where we compute up to three isogenies in this batch, up to five isogenies in that batch, and so on.
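To make the shape of this key space concrete, here is a minimal Python sketch of sampling a CTIDH-style exponent vector, where the 1-norm of the entries within each batch is bounded. The batch split and the bounds are made-up toy values, and this rejection sampler is not constant-time (the real CTIDH sampler is); only the shape of the key space is the point here.

```python
import random

def sample_batch_exponents(batch_size, bound, rng=random):
    """Rejection-sample an integer vector e of length batch_size with
    sum(|e_i|) <= bound, uniformly over that set."""
    while True:
        # Uniform over the box [-bound, bound]^batch_size, restricted
        # to the 1-norm ball by rejection => uniform over the ball.
        e = [rng.randint(-bound, bound) for _ in range(batch_size)]
        if sum(abs(x) for x in e) <= bound:
            return e

def sample_key(batches, bounds, rng=random):
    """A CTIDH-style key: one exponent sub-vector per batch of primes."""
    return [sample_batch_exponents(len(b), m, rng)
            for b, m in zip(batches, bounds)]

# Hypothetical tiny parameters: three batches of small primes with
# per-batch isogeny bounds (nothing like the real CSIDH-512 setup).
batches = [[3, 5, 7], [11, 13], [17]]
bounds  = [3, 5, 2]
key = sample_key(batches, bounds)
```

The per-batch bound replaces the per-prime bound of the original CSIDH key space; the key space is the product of these 1-norm balls.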
So this is our resulting new key space, and it is not at all clear how this speeds up any of the computation. But the second contribution of CTIDH is that we also change the algorithm for the group action evaluation, so that we can evaluate the group action for any prime in a batch equally efficiently. To explain this, I first need to tell you how the group action is evaluated in general. So this is where the isogeny magic comes in. Remember, for all of these small primes we have group elements that we can act with efficiently, and the group action takes you from one curve to another. We know that this action is evaluated using isogenies, which also take you from one curve to another. Isogenies are algebraic maps of elliptic curves: if I have two elliptic curves, then the map is given by some rational functions. But they are also homomorphisms of groups, so a point on one elliptic curve gets mapped to a point on the new elliptic curve. The special property that we will be using, which is inherent to CSIDH, is that if I take a point whose order is a multiple of ℓ_i and evaluate it under an ℓ_i-isogeny, then the factor ℓ_i drops out of the order. So the group action is given by isogenies, and now I'm going to give you more details on how to compute it. There are two steps. First, we find a point of order ℓ on the first elliptic curve. Then we compute the isogeny whose kernel is generated by that point of order ℓ. For the first step, finding the point, one easy way is to generate a random point, whose order divides p + 1, which is rather easy, and then multiply by a suitable cofactor to get a point of exact order ℓ. The second step, computing the isogeny, also splits into three parts, whether you use the Vélu formulas or the square-root Vélu formulas. But in any case, we always enumerate some multiples of the point.
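The point-finding step above can be sketched in a toy model: replace the group of points E(F_p), whose order is p + 1, by the additive group Z/n, with n playing the role of p + 1. All numbers here are illustrative, not real CSIDH parameters.

```python
import random

def point_of_order(ell, n, rng=random):
    """Toy model: the curve group is replaced by the additive group Z/n
    (think n = p + 1).  Multiplying a random element by the cofactor
    n/ell kills every prime factor of the order except ell; retry if we
    hit the identity (the random point had no ell-part)."""
    assert n % ell == 0
    cofactor = n // ell
    while True:
        P = rng.randrange(n)          # "random point" on the toy curve
        Q = (cofactor * P) % n        # the one large scalar multiplication
        if Q != 0:
            return Q

n = 4 * 3 * 5 * 7                      # toy stand-in for p + 1
Q = point_of_order(5, n)               # Q has exact order 5 in Z/n
```

In the real algorithm the scalar multiplication happens on the Montgomery curve, but the cofactor-clearing logic is the same.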
Then we construct a polynomial whose linear factors are built from the x-coordinates of these multiples, and from this polynomial h we somehow compute the A coefficient of the new curve. The second step is really efficient: all of it takes fewer than 6ℓ multiplications in F_p. But the first part is the crux of the matter, because it contains a very large scalar multiplication. The prime that we start with is a 500-bit prime — and maybe we need to use larger primes for CSIDH — so this scalar multiplication is much more costly than whatever happens in the intricate second step. The way to get around this is to compute the group action by several elements in one go, and hence pay for this costly scalar multiplication only once. What do I mean by that? Let's just do an example. Suppose we want to evaluate the exponent vector (1, 1, 1), which means that we want to compute a 3-, a 5-, and a 7-isogeny. The procedure is similar: we first find a suitable point, and then we compute the isogenies. But this time, instead of finding points of order 3, 5, and 7 separately, we find a point of order 3 · 5 · 7. So from one costly scalar multiplication we get a point of order 3 · 5 · 7, and while computing the isogenies we only ever have to do small scalar multiplications to get the correct points of order 3, 5, and 7. We also use the fact that we can cheaply evaluate isogenies on points: if we evaluate the 3-isogeny on the point T_1, which had exact order 3 · 5 · 7, then the resulting point drops the 3 from its order and has order 5 · 7. So in this way, we replace three very large scalar multiplications by one large scalar multiplication, two very small ones, and two rather cheap point evaluations. And this is the way we always evaluate the isogeny action: as a sequence of isogenies, pushing points through.
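The order bookkeeping in this trick can be sketched in the same toy group Z/n standing in for the curve group of order p + 1; the real point evaluation uses Vélu-style formulas on Montgomery curves, which are not modeled here.

```python
def push_through_isogeny(T, n, ell):
    """Toy ell-isogeny on Z/n: quotient by the unique subgroup of
    order ell.  The image lives in Z/(n // ell), so the factor ell
    disappears from the order of T -- the order drop used in CSIDH."""
    assert n % ell == 0
    return T % (n // ell), n // ell

n = 3 * 5 * 7
T = 1                                  # a point of full order 105 in Z/105
for ell in (3, 5, 7):
    K = (n // ell) * T % n             # small scalar multiple: kernel point of order ell
    # ... compute the ell-isogeny with kernel <K>, then push T through:
    T, n = push_through_isogeny(T, n, ell)
```

One expensive point of order 105 feeds all three isogenies: after each push, the remaining order is exactly what the later steps still need.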
The one disadvantage is that if you look at the timing, you can immediately tell that we computed a 3-, a 5-, and a 7-isogeny, because you know how long these individual steps take. Fortunately, it is very easy to adjust this procedure so that it does not reveal which of the isogenies we computed. So how do we do that? Suppose we want to evaluate a 3- and a 7-isogeny, but no 5-isogeny. The only thing that changes is the step for the 5-isogeny, where we still do the same computation as before, but now we throw away the result. And remember, the group action moves you from one curve to another, so if we stay on the same curve, that is the same as not computing the group action. But to fit into the flow, we need to make sure that the next step receives a point of the correct order. That is why we add a small line of code to always compute the correct multiple; if you check it, this multiple has the correct order, and you can use it in the next step. So in this very simple way, albeit using dummy operations, you can turn the code that computes a 3-, 5-, and 7-isogeny into a piece of code that computes any subset of these three isogenies with the same timing. In our paper, we formalize this into atomic blocks. An atomic block is a probabilistic algorithm that computes the group action for any subset of indices in the exponent vector, in such a way that the time distribution of the algorithm does not depend on the input: it does not depend on which curve you started from, and it does not depend on whether you choose zeros or ones for the exponents; it only depends on the subset of isogeny degrees you were willing to compute. This might be a complicated definition, but we already saw it on the previous slide: we saw how to build an atomic block that always "computed" the 3-, 5-, and 7-isogenies.
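The "compute anyway, then keep or discard" pattern above is usually implemented with a branchless select. Here is a sketch; note that Python integers are not actually constant-time, so this only illustrates the mask trick used in real C or assembly code, and `compute_isogeny` is a made-up stand-in.

```python
def ct_select(bit, a, b):
    """Return a if bit == 1 else b, without a secret-dependent branch."""
    mask = -bit                        # bit = 1 -> all-ones mask; bit = 0 -> zero
    return (a & mask) | (b & ~mask)

def compute_isogeny(curve):
    # Stand-in for the real isogeny step; it is executed unconditionally.
    return curve + 1

def step(curve, real):
    """One atomic-block step: always do the work, then keep the new
    curve (real = 1) or silently stay on the old one (real = 0)."""
    new_curve = compute_isogeny(curve)
    return ct_select(real, new_curve, curve)
```

Both branches cost the same amount of work, so the timing of `step` is independent of the secret bit `real`.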
That is, the action by the first three elements, in a way that did not leak whether we actually computed each isogeny or not. And once you have atomic blocks, it is rather easy to combine them into a constant-time algorithm that computes the group action in CSIDH. So atomic blocks are a way to formalize the previous approaches to constant-time isogeny implementations. Okay, so how do we use all of this with batching? Remember that for atomic blocks, we need to make sure that the timing depends only on which of the actions we are computing and leaks nothing else. But with batches, there is one extra piece of information: we also need to protect which of the isogenies in the batch we use. Okay, so let me give you the algorithm that extends what we just had to isogenies in batches, and then tell you what we need to fix. Again, if we want to compute isogenies using batches, we need to find a suitable point, and we can arrange that the order of this point depends only on the batches and not on the individual primes. Then again we need to take some scalar multiples, and in most of the steps we can arrange that the procedure depends only on the batches. But you see that there is a bunch of scalar multiplications that depend on which prime in the batch we chose, and there are isogeny evaluations that also depend on the actual degree we chose — those are the steps in red on the slide. Fixing the scalar multiplications is easy: with very small overhead, we can just multiply by all the primes in the batch, where multiplying by the chosen prime is done as a dummy operation. But how do we fix this 5-isogeny and this 11-isogeny? How do we make this computation the same for all primes in the batch, in an efficient way?
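The "multiply by every prime in the batch, with a dummy for the chosen one" step can be sketched as follows, again in the toy group Z/n and with an illustrative (not truly constant-time) Python select:

```python
def ct_select(bit, a, b):
    # Branchless select (illustrative; Python ints are not constant-time).
    mask = -bit
    return (a & mask) | (b & ~mask)

def clear_batch_cofactor(P, batch, j, n):
    """Turn P (order divisible by every prime in the batch, modeled in
    the toy group Z/n) into a point of order batch[j].  Every prime in
    the batch costs one scalar multiplication, so the timing depends
    only on the batch, not on the secret index j."""
    for i, ell in enumerate(batch):
        Q = (ell * P) % n              # always executed
        keep = 1 - int(i == j)         # real code derives this bit branchlessly
        P = ct_select(keep, Q, P)      # dummy multiplication when i == j
    return P

batch = [3, 5, 7]
n = 3 * 5 * 7
P = clear_batch_cofactor(1, batch, j=1, n=n)   # secret choice: ell = 5
```

Whatever j is, exactly len(batch) scalar multiplications happen, so the trace is the same for every prime in the batch.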
We could also just compute the isogeny for every prime in the batch somehow and throw away the results, but that would be very inefficient. The answer is the Matryoshka isogeny. To explain it, we again look at what actually happens when we compute these isogenies. To compute an 11-isogeny, we enumerate some multiples of the point, we construct a polynomial which is the product of linear factors taken from the x-coordinates of the points in step one, and from this polynomial we somehow derive the A coefficient. Well, if we were computing a 13-isogeny instead, we would only need to add one more multiple and multiply by one extra linear factor. For a 17-isogeny, we would need two more multiples and two more linear factors. Or, the other way around: if we are computing a 7-isogeny, the code for an 11-isogeny or a 13-isogeny already computes everything that is needed. So for the primes in a batch, we can compute an isogeny for any prime in the batch at the cost of the largest prime, at the cost of using dummy operations. This Matryoshka property of isogenies — you can just keep adding things to compute isogenies of larger degree — was already known. What is new is that we noticed that this property also holds for the new square-root Vélu formulas. And the reason why we are getting these speedups is that we realized this works well with batching: you don't want to pay for a small isogeny at the cost of a large-prime isogeny, but if the primes in a batch have similar size, then paying the cost of a slightly larger prime is not such a big overhead. But of course, now you need to know how to set up the batches so that they actually give you this efficiency. Well, in general, we don't know how to set up batches optimally.
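A sketch of the Matryoshka idea for the kernel-polynomial product: always loop up to the largest degree in the batch, and turn the factors beyond the real degree into dummy multiplications by 1. The concrete numbers below (the modulus, the x-coordinates) are made up for illustration; real code evaluates such products inside the Vélu or square-root Vélu formulas.

```python
def ct_select(bit, a, b):
    # Branchless select (illustrative; Python ints are not constant-time).
    mask = -bit
    return (a & mask) | (b & ~mask)

def matryoshka_product(xs, ell, ell_max, x, p):
    """Evaluate prod_{i=1..(ell-1)/2} (x - xs[i-1]) mod p, but always
    run (ell_max - 1)//2 iterations: factors past the real degree
    become dummy multiplications by 1, so the cost is that of the
    largest prime in the batch, whatever ell actually is."""
    real_steps = (ell - 1) // 2
    acc = 1
    for i in range((ell_max - 1) // 2):
        factor = (x - xs[i]) % p       # always computed
        use = 1 - int(i >= real_steps) # real code derives this bit branchlessly
        acc = acc * ct_select(use, factor, 1) % p
    return acc

# Batch with largest prime 11: a 7-isogeny costs the same 5 iterations.
xs = [2, 3, 5, 7, 11]                  # made-up x-coordinates of multiples
h7 = matryoshka_product(xs, 7, 11, x=50, p=101)
```

The loop length depends only on ell_max, i.e. on the batch, never on the secret degree ell.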
We don't know how to set it up optimally because it looks like a very complicated optimization problem. Now, what we can do is estimate the cost of the whole group action evaluation for any batch configuration: if you split the primes into batches and give the bound for each batch, then we can give a pretty accurate estimate of the resulting cost. And then we can use this cost function in a greedy algorithm: start from some configuration and adaptively try to change it, so that we get a configuration with a smaller cost. This is how we arrived, for instance, at our current best batching for CSIDH-512. You see that the primes in one batch are usually pretty close to each other, so that there is no big overhead — remember that within a batch, the cost for the smallest isogeny is the same as the cost for the largest one. That is why the first batches, with small primes, are rather small, and the batches get larger as the primes increase. The one prime that is a lot larger than all the other primes is isolated in its own batch, because you don't want to pay the cost of this prime already for the smaller primes. Okay, we also claim that our algorithm is constant-time. And beyond just understanding atomic blocks and having conceptually good ideas about how constant time should look, we also use Valgrind to check it. So what can Valgrind do for us? It can check whether there is any flow from secret data to any branches or any array indices. If you just execute your code, the secret data might somehow impact the code execution; but if you declare the secrets as undefined memory in Valgrind, then whenever they actually do impact the code execution, Valgrind will complain. And then you can do a manual check, see what is happening in your code, and fix it.
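The search described above can be sketched as a generic greedy descent. The cost model and the neighbor move below are toy stand-ins (the real CTIDH cost function estimates field multiplications for the whole group action, and the real moves also change batch boundaries), so only the search pattern is meant literally.

```python
import random

def greedy_search(config, cost, neighbors, iters=1000, rng=random):
    """Greedy descent: repeatedly try a random neighboring batch
    configuration and keep it whenever the estimated cost drops."""
    best, best_cost = config, cost(config)
    for _ in range(iters):
        cand = neighbors(best, rng)
        c = cost(cand)
        if c < best_cost:
            best, best_cost = cand, c
    return best, best_cost

def toy_cost(bounds, weights=(1, 3, 9)):
    # Hypothetical cost model: batches with larger primes (larger
    # weight) make each extra isogeny more expensive.
    return sum(m * w for m, w in zip(bounds, weights))

def toy_neighbor(bounds, rng):
    # Move one isogeny from one batch to another, keeping the total
    # fixed (a stand-in for "the key space stays large enough").
    b = list(bounds)
    i, j = rng.sample(range(len(b)), 2)
    if b[i] > 0:
        b[i] -= 1
        b[j] += 1
    return tuple(b)
```

On this toy model the search shifts isogenies toward the cheap batch, mirroring how the real configuration concentrates work where it costs least.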
So if you do these checks with Valgrind, you can have pretty solid confirmation that your code does not leak timing information about secret data. Finally, let's talk about the speed of our CTIDH software. The green lines in the table are the new CTIDH algorithm, and here is how to read it: this column is the size of the prime, this is the size of the public key, and depending on the one or two in the third column, we either do just the group action evaluation, or also the public key validation needed for the CSIDH protocol. You see that if you count the number of multiplications, squarings, and additions, then no matter how you estimate their relative costs, we get significant speedups compared to previous constant-time implementations. We also measured cycles on Skylake, and there, too, we have a significant improvement. So, to sum up: what is CTIDH? CTIDH is a new key space for CSIDH using batching. CTIDH is a new constant-time algorithm to evaluate the group action using Matryoshka isogenies. In CTIDH we also formalize atomic blocks, formalizing the evaluation of the group action as a sequence of isogenies. We verify our constant-time claims using Valgrind, and we obtain significant new speed records. You can see our article and, most of all, you can get the code at our website. Thank you.