Welcome to this presentation on the cryptanalysis of SIKE. This is joint work with my co-authors Craig Costello, Patrick Longa, Michael Naehrig, and Fernando Virdia, and it was done while we were at Microsoft Research. The main topic of this presentation is SIKE. SIKE is short for Supersingular Isogeny Key Encapsulation, which is a round-two candidate in the NIST standardization process for post-quantum cryptography. I won't go into too much detail here; if you want to know more, please visit the website, sike.org.

I want to highlight the cryptanalysis work that has been done on SIKE: first of all, the work by Adj et al., who looked at classical cryptanalysis, and then the work by Jaques and Schanck initially and, much more recently, a very recent eprint paper by Jaques and Schrottenloher, who looked at quantum cryptanalysis. Interestingly for SIKE, from round one to round two, these works showed that the attacks that were considered practical were actually not as easy to execute as initially thought, meaning that the security actually went up a bit, so the parameters could be decreased, resulting in smaller public keys but also faster execution. In this talk we consider further analysis of classical attacks only; we will not look at quantum attacks. It's hard to say which of the two really determines the security of SIKE at this point, which makes it interesting to still consider both of them.

Okay, so let's start with a bit of an introduction to isogeny-based crypto. It started as SIDH, which is Supersingular Isogeny Diffie-Hellman. The idea behind it is that it's a graph-based protocol. We build this graph from nodes, and these nodes are going to be isomorphism classes of supersingular curves defined over F_{p^2}, where p is some prime chosen as a parameter. So these are sets of curves with the same j-invariant. If we take the set of all these curves, it turns out that they form a connected graph, where the connections, the edges between these nodes, are what's called isogenies. An isogeny, in the simplest terms, is a very natural map between two elliptic curves: it's determined by two rational functions, f and g, written here, that map the x- and y-coordinates on one curve to the x- and y-coordinates on the other curve. So these are the edges in this graph.

When we look at the whole graph, it turns out we have about p/12 nodes for a parameter p. If we take a local view, looking at a single node, we see that if we fix some prime ℓ, then each node has ℓ + 1 outgoing isogenies. In this picture I drew a big blue dot, which is a starting curve, so it's one node, and from that node we see three outgoing edges which go to other nodes. We can continue this picture, since every node, or essentially every node, is going to have three outgoing edges, so we can keep drawing it on and on. I won't draw it at the top because it won't fit, and in a generic SIDH setting it will just continue growing and growing.

So what does SIDH do? An SIDH public key computation starts at this blue node and then takes a random walk through the graph. Here it takes four green steps, landing on this green node: the green node would be the public key, and this random walk would be the secret key.
Another example of a walk would be the following, ending up on a different public key. So now you can ask how many public keys there actually are. I've drawn 15 here, but you also have to consider the ones that go upwards, which I didn't draw. If you work this out, you can see that in this case there would be 24 different public keys.

So let's look at the cryptanalysis of this. The most naive way you could go about breaking this is a brute-force attack, where you simply try each and every walk until you find the right one. This has the following complexity, assuming we take e_2 steps in the graph: the first step has three possibilities, and each step afterwards, since we're not allowed to walk backwards, only has two possibilities. So we get 3 · 2^(e_2 - 1) possibilities, which is the complexity of the attack; see the toy sketch below.

One remark we can immediately make here concerns moving from the generic SIDH problem to the SIKE problem as defined in the NIST proposal: this blue dot is actually not a generic node. If you look at it, it does have three outgoing isogenies, as you can see here at the bottom, but one of them turns out to loop back to itself, and the two other ones end up on a single node. So this first step, which should give you three possibilities, actually only gives you a single one: assuming we don't want to loop back to the curve itself, we can only end up on one node. And so the complexity of this attack decreases by a factor of three. You could remark that we're working with big-O terms, so this factor of three doesn't change the complexity, and that's true, but the constant hidden in this brute-force attack is simply the cost of computing an isogeny, and the actual attack does speed up by a real factor of three.

This is not the best way to attack this, though. If we assume we have some public key that we're attacking, that we're trying to retrieve the secret key from, then instead of just starting from the public parameter curve, which is the blue node here, we can also start walking from the public key and try to end up on a shared node in the middle. So we walk halfway from both sides, and by the way the SIKE parameters are set up, it's guaranteed that there is only a single node which connects the two halves; in this case, the node labeled four, which I made blue.

And what is the complexity of this? Well, in the most ideal case, you can walk halfway from one side and store everything in the middle, and then you get a square-root complexity, a baby-step giant-step kind of thing. If your memory isn't large enough to store all of that, which for typical SIKE parameters it won't be, then you can choose a smaller memory, which does increase the runtime: you get a linear trade-off, where with a memory of size w you get a speed-up by a factor w. In practice, the memories we have available are actually fairly small compared to the parameter sizes we work with. And so there's an additional algorithm, vOW, the van Oorschot-Wiener algorithm, which works better than meet-in-the-middle when your memory is small: if your memory is smaller than roughly the square root of the number of public keys, then van Oorschot-Wiener will outperform the generic meet-in-the-middle attack.
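As a toy illustration of the counting argument above, here is a short Python sketch, my own addition for these notes, that enumerates all non-backtracking walks of length e_2 (three choices for the first step, two for each later one) and confirms the 3 · 2^(e_2 - 1) count, matching the 24 public keys in the example, under the talk's assumption that distinct walks end on distinct nodes.

```python
from itertools import product

def count_walks(e2):
    # 3 choices for the first step; 2 for each later step, since the
    # third edge at every node would walk back the way we came
    return 3 * 2 ** (e2 - 1)

def enumerate_walks(e2):
    # a walk is a tuple of edge choices: one of 3, then one of 2 per step
    return [(first,) + rest
            for first in range(3)
            for rest in product(range(2), repeat=e2 - 1)]

assert len(enumerate_walks(4)) == count_walks(4) == 24  # the talk's example
```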
The van Oorschot-Wiener algorithm has the following complexity: 2^(3(e_2 - 1)/4) / √w. You can work out that if w is the square root of the number of public keys, it has basically the same complexity as meet-in-the-middle, and for any larger memory, meet-in-the-middle will be better.

Here we can make another couple of remarks about SIKE. There are some choices made in the SIKE proposal which have an influence on its security; a small influence, but an influence nonetheless. The first choice is the fact that the starting curve is a subfield curve. By subfield curve I mean a curve defined over F_p, which is a subfield of the field F_{p^2} in which the j-invariants of all these curves are guaranteed to lie. The influence that has is that subfield curves have the Frobenius endomorphism, and the Frobenius endomorphism leads to conjugate classes: we get sets of nodes which are connected by Frobenius. This is very similar to the case of Pollard rho on classical elliptic-curve Diffie-Hellman, where a point and its negative form an equivalence class under the minus-one map; here we have a similar class, only induced by the Frobenius map. And this means that the entropy from one side, walking from the subfield curve, is decreased by a factor of two. On the other hand, starting from the other side, due to the choices of arithmetic (SIKE works only with Montgomery curves), it turns out that the kernel of the dual of the final isogeny is always going to be generated by the point whose x-coordinate is one. This gives away some information about walking backwards: basically it gives away the first two isogenies on the way back. So this again decreases the number of nodes that we can land on, and therefore it decreases the complexity of the attack. If we look at vOW on SIKE, we go from 2^(3(e_2 - 1)/4) / √w down to 2^(3(e_2 - 4)/4) / √w.

Okay, so this is van Oorschot-Wiener, but I didn't actually explain how it works, and since it's such an important algorithm, I want to spend some time going through it. The problem it solves is a meet-in-the-middle problem: we have two functions, h_0 and h_1, that map from some set S, which is just the integers from 0 to N - 1, to some other set T, which in this case is going to be the set of nodes in our graph, and we want to find some x and y such that h_0(x) = h_1(y).

The way we do this is to define a family of functions f_n, which work as follows. They take as input, first of all, an element z of the set S, but also a bit b. The function looks at the bit b: if it's zero, it evaluates h_0 at z; if it's one, it evaluates h_1 at z. Then to the output it applies g_n, where g_n is just some random function indexed by n. For example, it could be SHA-3 or SHA-256 or some other hash function with random-looking behavior, and to generate a family of those, we can just domain-separate them by the index n. A minimal sketch of this construction follows.
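Here is a minimal sketch of the family f_n; it is my toy illustration, not code from the talk. The set size, the byte encodings, and the stand-in h_b are assumptions, with SHA-256 playing the role of the random function g_n.

```python
import hashlib

N = 2 ** 16   # size of the set S = {0, ..., N - 1}; a toy value

def h(b, z):
    # hypothetical stand-in for h_b : S -> T; in the SIDH application these
    # would be half-length isogeny walks ending on a j-invariant
    return hashlib.sha256(b"h" + bytes([b]) + z.to_bytes(8, "big")).digest()

def f(n, bz):
    # f_n takes (b, z), evaluates h_0 or h_1 at z depending on the bit b,
    # then applies g_n: a random-looking function indexed by n, realized
    # here by domain-separating SHA-256 with n. The output is mapped back
    # into {0, 1} x S so that f_n can be iterated.
    b, z = bz
    t = h(b, z)
    d = hashlib.sha256(b"g" + n.to_bytes(4, "big") + t).digest()
    out = int.from_bytes(d, "big")
    return (out & 1, (out >> 1) % N)

# The golden collision is the pair (0, x), (1, y) with h_0(x) == h_1(y);
# every other collision of f_n is an artifact of g_n and is thrown away.
```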
So now we have a family of functions indexed by n, and each of these functions has a golden collision, meaning a collision of the function f_n such that, if we find it, it leads back to a solution of the meet-in-the-middle problem. A downside we have to deal with here is that we don't have only a single collision: these functions f_n are going to have many other collisions, which are useless for our purposes. So we're going to do collision searches on these f_n, but there's only a single collision we're interested in, and all the other ones we basically throw away. This is called a golden collision search.

So finally, we have to find a collision in one of these functions f_n. How do we do that? We assume n to be fixed, and we have a set S of size N and a memory of size w. The way the algorithm works is very similar to Pollard rho. There's a distinguishedness property that says a fraction θ of the set S is distinguished, where θ is defined to be √(w/N). We start the algorithm by randomly sampling an element z of S and iteratively applying the function f_n to it until we hit an element that's distinguished. Then we go to the memory and check whether this distinguished point already appears there. If it does not, we simply store our point in memory, sample another element, and start over. If the distinguished point is already in memory, then we have a collision, and we check whether it's the golden collision. If it is, we're happy and the algorithm concludes. If it's not the golden collision, then we just store our distinguished point in memory again, sample a new element, and start over.

It could happen that you're very unlucky and keep sampling elements and finding distinguished points without ever finding the golden collision. In that case you don't want to stay stuck in the same function forever, so you want some rule for moving to the next function version. There's a heuristic analysis by van Oorschot and Wiener that says: once you have found about 10w distinguished points, so once you have filled the memory up ten times with distinguished points, you should move to the next function version and try your luck there.

So how do we apply this to SIDH or SIKE? Here, the set has size roughly the square root of the number of public keys, so √(2^(e_2)) in this case. And we define the f_n as follows. It looks complicated, but basically it says: we start from E_i, where if the index i is zero, we start from our starting curve, which is a public parameter, and if the index is one, we start from the public key. Then from either of those two curves we compute an isogeny walk corresponding to our input z: z is just some bit string that tells us, at each node we end up on, which of the two choices to take. Finally we end up on a j-invariant, and we apply AES, or some variant of AES, keyed with n, to this j-invariant; that's the random-looking function we apply at the end.

Okay, and so we implemented all of this; a sketch of a single instance of the search loop is given below. We ran it for different parameter sets, so for different depths of the secret-key walks and different memory sizes.
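Putting the pieces together, here is a minimal sketch of a single vOW instance. It is a toy of my own: the set and memory sizes and the stand-in f are assumptions, the golden-collision test is left as a placeholder, and the chain-length guard is the standard vOW recommendation of abandoning chains longer than 20/θ; but it shows the distinguished-point loop and the 10w rule described above.

```python
import hashlib, random

N, W = 2 ** 16, 2 ** 6          # toy set size and memory size (assumptions)
THETA = (W / N) ** 0.5          # fraction of distinguished points
MAX_STORED = 10 * W             # heuristic: move to f_{n+1} after ~10w points
MAX_CHAIN = int(20 / THETA)     # abandon chains that are likely cycling

def f(n, x):
    # toy random-looking function on {0, ..., N-1}, domain-separated by n;
    # stands in for the f_n built from isogeny walks plus keyed AES
    d = hashlib.sha256(n.to_bytes(4, "big") + x.to_bytes(8, "big")).digest()
    return int.from_bytes(d, "big") % N

def is_distinguished(x):
    return x < THETA * N        # a simple distinguishedness property

def is_golden(t0, t1):
    return False                # placeholder: here one would locate the
                                # collision and test h_0(x) == h_1(y)

def vow_instance(n):
    memory = {}                 # distinguished point -> (start, chain length);
                                # a real memory would be capped at w entries
    stored = 0
    while stored < MAX_STORED:
        z0 = random.randrange(N)
        z, steps = z0, 0
        while not is_distinguished(z):
            z, steps = f(n, z), steps + 1
            if steps > MAX_CHAIN:           # stuck: resample a fresh start
                break
        else:
            if z in memory and is_golden(memory[z], (z0, steps)):
                return memory[z], (z0, steps)   # golden collision found
            memory[z] = (z0, steps)             # store (or replace) and go on
            stored += 1
    return None                 # give up on f_n; the caller moves to n + 1
```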
For those parameter sets, you can do a theoretical analysis, also due to van Oorschot and Wiener, that tells you how many queries to this function f_n you expect; that's plotted here as the expected value. The work by Adj et al., who also implemented this, came close to these expected values, but there was still a bit of a gap. It's unclear where it comes from; it could be a minor detail in the implementation, or it could be that they didn't average over enough runs. If we run the same algorithm on SIDH, we see that we get extremely close to the expected value: 0.09 bits of difference, for example, for the first row.

What's maybe more interesting is what happens when we move from generic SIDH to SIKE. We can now include the optimizations for the SIKE parameter choices, so the equivalence classes under Frobenius, but also the known kernel of the dual. This decreases the set size by a factor of six, and therefore the cost of the attack decreases by a factor of about 15. If you run this, you can see the expected value goes down a lot, and again our implementation gets very close to the expected values and is about a factor of 15 faster than on generic SIDH.

So far this is theory: it just measures how many queries we need to the oracle, where the oracle is basically an isogeny computation. We implemented all of this on top of the SIKE submission. One advantage of that is that it uses the fastest available arithmetic: the isogeny walks are computed on Montgomery curves, just like in the SIKE submission, which speeds up this oracle. Asymptotically that doesn't have much influence, but when actually running the attack in practice it speeds things up quite significantly. All of this will be published soon on the GitHub pages shown here. I say soon; it could be so soon that by the time this talk is put online, the GitHub repository will also be online. So keep a look out for it.

An important remark about implementing van Oorschot-Wiener in practice: naively it's quite simple, it's a simple algorithm to explain, but there are actually many subtleties to it. For a single instance with a certain memory available, it works very well, and there's no significant overhead for that single instance querying memory every time it finds a distinguished point. This changes when you go to a more realistic setup. Theoretically, van Oorschot-Wiener parallelizes perfectly: if you have m instances with some shared memory of size w, the runtime should decrease by exactly a factor of m, as depicted here. But if you try to implement this, you see that all these m instances now have to query the same memory all the time: lots of distinguished points are found, and the instances have to query the memory to check for collisions every single time, and this becomes a big overhead. Furthermore, if these instances start running on different function versions, so on different indices n, the collisions that are put into the shared memory become meaningless. So you have to make sure that all instances are running on the same function version at all times, which means there are some synchronization issues to deal with; the sketch after this paragraph illustrates this.
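To make the synchronization point concrete, here is a minimal sketch of a shared state that m instances could submit their distinguished points to. This is my illustration of the problem, not the design of our implementation, and the names (SharedVowState, submit) are hypothetical; the single global lock models exactly the contention overhead just described.

```python
import threading

class SharedVowState:
    """Shared memory of distinguished points plus the current function
    version n; a worker whose version is stale discards its chain and
    resynchronizes, so all instances stay on the same f_n."""

    def __init__(self, limit):
        self.lock = threading.Lock()
        self.version = 0        # current function index n
        self.memory = {}        # distinguished point -> (start, length)
        self.stored = 0
        self.limit = limit      # ~10w points before switching versions

    def submit(self, n, point, triple):
        # returns (current version, previously stored triple or None)
        with self.lock:
            if n != self.version:
                return self.version, None   # stale worker: resynchronize
            old = self.memory.get(point)
            self.memory[point] = triple
            self.stored += 1
            if self.stored >= self.limit:   # memory filled ~10 times
                self.version += 1           # everyone moves to f_{n+1}
                self.memory.clear()
                self.stored = 0
            return self.version, old
```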
So what we did: we didn't consider a true distributed implementation yet, only a multicore one. We assume a single machine with many cores; we have a 28-core setup, and we also tried bigger setups. The cores share a memory, which is just RAM, but we have many threads trying to access that memory at the same time, which already causes a number of issues. And if you move towards a distributed setup, one interesting area of research would be how to use local memory: you assume you have one very big database of size w, but if you have, say, a Raspberry Pi running, it still has a little bit of local memory that you could try to use in some way to speed up the computation. We'll look at some available options in the next few slides.

If we look at the parallelization: this is our implementation, where on the x-axis we have the number of cores and on the y-axis the actual wall time of our implementation, measured in seconds. The expected value depicts the ideal linear speed-up compared to a single core, the average is the average of our runtimes, and it matches the expected value extremely closely, which shows that we really are getting linear speed-up, up to 28 cores.

Okay, so finally some remarks on how to use local memory to speed up the algorithm. The first idea is to use it to speed up the collision checking, so here's a short reminder of how we check for collisions. Assume we have a collision consisting of two triples: one chain starts at a point z, ends at a distinguished point z̄, and takes exactly d steps; another starts at a point y, also ends at the distinguished point z̄, and takes e steps. We assume that e is smaller than d, and we know these lengths because we store them in memory. To check for a collision, we start applying our function f_n to z until we have as many steps left as the chain length of y. If at this point the value of the chain starting at z is the same as y, we are unlucky, because we've reached the collision without having two distinct inputs; this is what we call a Robin Hood. If this does not happen, we simply continue walking both chains forward until they hit the same intermediate point, which is guaranteed to happen because at the end they reach the same distinguished point. Then we have a collision and can check whether it's the golden collision.

So one idea is that, if we have local memory available, we can simply store all the intermediate points of one of the chains, namely the chain starting at z. Now if we want to check for a collision, instead of having to walk from z first and then walk from y, we don't have to walk from z at all: we simply keep applying f_n to y until we hit one of the stored points of the chain from z, at which point we have a collision and can check for the golden collision. Note that instead of having to walk two chains forward, we only have to walk a single chain forward, so we could hope for maybe a two-times speed-up here. In practice, of course, if these chains get longer, you may not be able to store all of these intermediate points. Both variants are sketched below.
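Here is a minimal sketch of both collision-checking variants, again my own illustration, assuming a function f(n, x) like the toy one in the earlier search-loop sketch. The triples (z, z̄, d) and (y, z̄, e) are as just described, and `prev` stands for the local memory holding the chain from z.

```python
def locate_collision(f, n, z, d, y, e):
    """Basic check: the chain from z reaches the distinguished point in d
    steps, the chain from y in e steps, with e <= d."""
    for _ in range(d - e):      # advance z until both chains have e steps left
        z = f(n, z)
    if z == y:
        return None             # Robin Hood: y lies on z's chain, no new info
    for _ in range(e):          # walk both chains in lockstep until they merge
        fz, fy = f(n, z), f(n, y)
        if fz == fy:
            return z, y         # two distinct preimages of the same point
        z, y = fz, fy
    return None                 # cannot happen for a genuine collision

def locate_collision_local(f, n, prev, y, e):
    """Variant with local memory: prev maps every point on the chain from z
    to its predecessor, collected while that chain was being computed, so
    only the chain from y has to be walked (roughly half the work)."""
    for _ in range(e):
        fy = f(n, y)
        if fy in prev:
            if prev[fy] == y:
                return None     # Robin Hood again
            return prev[fy], y  # collision between the two chains
        y = fy
    return None
```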
If you can only store, say, t intermediate points, there are some smart tricks and ideas to store these points optimally, so that you still get really nice results and still approach the two-times speed-up with fewer memory requirements. If you look at the actual improvement: on the x-axis here you have t, the number of intermediate values that we can store, and on the y-axis the number of steps we need to check a collision. You can see that as t grows, it gets closer and closer to about half the number of steps, which matches what I said before.

Okay, so finally, another nice thing you can do. These devices are all computing isogeny walks: every iteration of f_n is an isogeny walk, and this consists of a tree computation, as follows. Instead of starting at the blue node every time and computing all the way down the tree, you can fix some precomputation depth δ: you precompute all the nodes up to depth δ, and then you start each walk only from that depth. So the walks you compute are shorter by δ steps, and you have to store a table consisting of 2^δ of these nodes, where each node consists of six elements of F_{p^2}; that's 6 · 2^δ elements of F_{p^2} in total. This grows very quickly with δ, of course, but it could still give you some small speed-ups on these devices; see the small calculation below. Again, there are some intricacies here: you have to be careful how you actually store these tables and which bases of the torsion points you choose, but for more detail, see the paper.

Two other optimizations I want to mention. You can look into multi-target attacks: if you assume two public keys and you only want to retrieve one of the two, you can get some small optimizations. And there are all kinds of practical speed-ups; for example, you can choose the distinguished points in such a way that some of their bits are zero, in which case you don't have to store those bits, which lets you store more points in the fixed memory size you have available.
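To get a feel for how quickly this precomputed table grows, here is a small back-of-the-envelope calculator; it is my illustration, not from the talk. The 434-bit prime is just an assumption in the spirit of SIKE-like parameters, and an F_{p^2} element is assumed to be stored as two F_p elements.

```python
import math

def table_bytes(delta, p_bits):
    # one precomputed node = six F_{p^2} elements (as stated in the talk);
    # one F_{p^2} element = two F_p elements of ceil(p_bits / 8) bytes each
    fp2_bytes = 2 * math.ceil(p_bits / 8)
    return 6 * 2 ** delta * fp2_bytes

# example with a 434-bit SIKE-like prime (an assumption for illustration)
for delta in (8, 12, 16, 20):
    print(delta, table_bytes(delta, 434))
# roughly 165 KiB, 2.6 MiB, 41 MiB, 660 MiB: exponential growth in delta
```

Okay, well, thank you for listening, and enjoy the rest of the conference.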