Hello, my name is Patrick Longa. Today, I will talk about our project titled "The Cost to Break SIKE: A Comparative Hardware-Based Analysis with AES and SHA-3". This is joint work with Wen Wang and Jakub Szefer.

Before jumping to the main part of the presentation, let me give you a brief summary of the project. Let's start with the problem that we are tackling. SIKE, which stands for Supersingular Isogeny Key Encapsulation, is an alternate candidate in the third round of the NIST post-quantum cryptography standardization process, and it is in fact the only isogeny-based scheme in the competition. This is important given that we think diversity should play an important role in the selection process, in which case SIKE becomes a very attractive alternative, especially once we notice that most finalists in the competition are lattice-based schemes. Maybe the single most critical drawback of SIKE is that it is relatively slow. On the other hand, SIKE has demonstrated solid security. The problem is that, currently, SIKE is being penalized because its parameters are chosen very conservatively, using a random access memory model. We observe that this hurts SIKE's performance more than it should.

What we do in this work is to analyze the security of SIKE, AES, and SHA-3 using a more realistic budget-based cost model. We use AES and SHA-3 here because these are the reference points used to define the NIST security levels. Our model considers both computing and memory costs, together with historical price data and conservative projections of future costs. For our analysis, we also propose an efficient architecture for the single most critical computation in the cryptanalysis of SIKE, that is, the large-degree isogeny computation. This architecture is then used to model an ASIC-based instance of the van Oorschot-Wiener (vOW) parallel collision finding algorithm, which is the most efficient algorithm currently known to break SIKE.

Now, let me briefly comment on our results. As conjectured, we concluded that the current SIKE parameters are very conservative and offer a wide margin of quantum and classical security. Our model allows us to quantify this security gap. Accordingly, we propose new parameters that match the NIST security targets more closely; these new parameters are still chosen conservatively. We also report new implementations of SIKE using these new parameters and show significant gains in performance and bandwidth. This is a win-win situation: as I said before, execution time is the most critical drawback of SIKE, while bandwidth is one of its most attractive advantages, so reducing key sizes makes SIKE even more attractive.

Okay, so now let's go deeper into the details of the project. Let me give you some basics about post-quantum key exchange from supersingular isogenies. SIKE is in fact based on SIDH, which stands for Supersingular Isogeny Diffie-Hellman key exchange. This protocol was proposed back in 2011 by David Jao and Luca De Feo. As we mentioned before, SIDH has a solid security history in spite of its young age. The best known classical attack is the vOW algorithm, and on both classical and quantum computers the complexity of the best known attacks is exponential. Now, SIKE is the IND-CCA secure version of SIDH. It was designed in 2017 and then submitted to the NIST post-quantum cryptography standardization process.
Let's recall some facts about elliptic curves and isogenies that are important to understand how SIDH and SIKE work. Assume we have two elliptic curves, E1 and E2, defined over an extension field L of characteristic p, where p is prime. An isogeny is then defined as a non-constant rational map from E1 to E2 that also preserves the identity element. Now, let's mention some relevant properties that matter here. For example, isogenies are group homomorphisms. Another important one is that isomorphism classes are essentially subsets of elliptic curves that share the same j-invariant. Finally, given a prime p, we have approximately p/12 isomorphism classes of supersingular elliptic curves, all defined over F_{p^2}.

One way to understand SIDH is through supersingular isogeny graphs. In these graphs, the vertices represent the isomorphism classes of supersingular curves, meaning that each vertex contains a set of curves with the same j-invariant. The edges that connect these vertices are the isogenies, which have a fixed prime degree, so we can have graphs of different degrees. For example, on the left we have the degree-2 isogeny graph, and on the right we have the degree-3 isogeny graph. As you can see, the two graphs connect the isomorphism classes in different ways, and this is the idea behind the SIDH protocol.

Now, let me explain how SIDH and SIKE work in a nutshell. Let's start with SIDH, and assume Alice is represented in red and Bob in blue. Again, we have our vertices, which are the isomorphism classes, just visually expanded on this slide. We start by fixing an initial elliptic curve that we call E_0, which belongs to a certain isomorphism class. Then Alice proceeds by generating a secret isogeny that we call phi_A, and with that secret she maps E_0 to another elliptic curve E_A belonging to another isomorphism class. Bob does similarly, using a secret isogeny phi_B that maps E_0 to another elliptic curve E_B. Now, E_A and E_B are the public key information, and they are exchanged between the parties. Then we proceed as follows: Alice takes Bob's public key E_B and, using another secret isogeny phi'_A, she maps E_B to another curve E_AB. Bob does similarly to map E_A to another curve E_BA. If everything is done properly, then E_AB and E_BA are expected to belong to the same isomorphism class, meaning that they should have the same j-invariant, which can then be used as a shared key.

All right. But the problem with SIDH is that it is not secure when keys are reused; it is only recommended in ephemeral mode. That's why we need SIKE. SIKE uses a variant of the Hofheinz-Hövelmanns-Kiltz transform (itself a variant of the Fujisaki-Okamoto transform) to convert an IND-CPA secure public key encryption scheme into an IND-CCA secure key encapsulation mechanism. In this slide, you can see the details of the SIKE protocol. I won't go into too much detail here; I just want to highlight the three stages that make up SIKE. It has a key generation stage, where we generate the secret key and public key pair. The public key is then sent to the encapsulator, which proceeds to encrypt a randomly generated message m, producing a ciphertext c and also a shared key. The ciphertext is sent back to the decapsulator, which proceeds to do a decryption using the secret key. Then there is a re-encryption to make sure that the ciphertext was well-formed. If that is the case, the shared key is output.
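Before moving on to security, let me make the isogeny-graph picture concrete with a toy Python sketch. This is illustrative only and not part of our implementations: it walks the supersingular 2-isogeny graph for a tiny prime using the classical level-2 modular polynomial Phi_2, where two j-invariants are adjacent exactly when Phi_2(j1, j2) = 0. The prime p = 83 and the brute-force root search are chosen purely for readability and speed.

```python
# Toy walk of the supersingular 2-isogeny graph (illustrative sketch only).
# p = 83 satisfies p = 3 (mod 4), so F_{p^2} = F_p(i) with i^2 = -1,
# and j = 1728 is supersingular for such p.
p = 83

def add(x, y): return ((x[0] + y[0]) % p, (x[1] + y[1]) % p)
def mul(x, y): return ((x[0]*y[0] - x[1]*y[1]) % p, (x[0]*y[1] + x[1]*y[0]) % p)

def phi2(x, y):
    """Classical level-2 modular polynomial Phi_2(x, y) over F_{p^2}."""
    c = lambda n: (n % p, 0)
    x2, y2 = mul(x, x), mul(y, y)
    t = add(mul(x2, x), mul(y2, y))                       # x^3 + y^3
    t = add(t, mul(c(-1), mul(x2, y2)))                   # - x^2 y^2
    t = add(t, mul(c(1488), add(mul(x2, y), mul(x, y2)))) # + 1488 xy(x + y)
    t = add(t, mul(c(-162000), add(x2, y2)))              # - 162000 (x^2 + y^2)
    t = add(t, mul(c(40773375), mul(x, y)))               # + 40773375 xy
    t = add(t, mul(c(8748000000), add(x, y)))             # + 8748000000 (x + y)
    return add(t, c(-157464000000000))

def neighbors(j):
    # Brute-force the (at most 3) roots of Phi_2(j, Y) over F_{p^2}.
    return [(a, b) for a in range(p) for b in range(p)
            if phi2(j, (a, b)) == (0, 0)]

# Graph walk from j = 1728; the supersingular graph is connected, so this
# visits every supersingular j-invariant (expected: floor(83/12) + 2 = 8).
start = (1728 % p, 0)
seen, stack = {start}, [start]
while stack:
    for jn in neighbors(stack.pop()):
        if jn not in seen:
            seen.add(jn)
            stack.append(jn)

print(f"supersingular j-invariants found: {len(seen)} (p/12 ~ {p/12:.1f})")
```

The vertex count matches the roughly p/12 isomorphism classes mentioned above, and each vertex has three degree-2 edges, which is the structure the SIDH pictures on the slides are drawn from.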
Now, the security of SIDH and SIKE is based on the so-called computational supersingular isogeny problem, which is defined as follows. Assume we have two supersingular curves, E1 and E2, defined over F_{p^2}. In our setting, these curves are connected by an isogeny phi of large but smooth degree. Then, given points P and Q lying on E1, together with the images of these points under the isogeny on E2, the hard problem is to compute the isogeny phi.

Let's talk about how the security strength of a given cryptosystem is determined, and then focus on the specific case of parameter selection for SIKE. For a given cryptosystem, one can use a simplistic but very conservative approach based on the query complexity of the cryptanalysis of the scheme. In the case of AES-128, for example, a brute-force attack requires 2^128 AES calls, and this determines the security of the scheme in this case. One can do a slightly more sophisticated estimation using the random access memory model, for example by using the complexity of one iteration of the attack in terms of gates, instructions, or cycles. In our example with AES-128, an implementation of the scheme takes approximately 2^15 gates, which means that the security of AES-128 is estimated as 2^128 times 2^15, giving a total of 2^143 classical gates. And this is precisely what is used to define security level one in the NIST process. What are the disadvantages of this approach? Well, communication and memory costs are totally ignored, and that crucially penalizes cryptosystems whose cryptanalysis requires significant amounts of memory. This is precisely the case of SIKE.

All right, in the specific case of SIKE and its parameter selection, first of all we need to determine the search space. This is essentially given by the number of cyclic subgroups defining the half-degree isogenies, which is approximately p^(1/4), and this is the parameter that was used by the SIKE team to determine the round-one parameters. In this setting, the computational supersingular isogeny problem can be modeled as a black-box claw-finding problem that can be solved classically with meet-in-the-middle in time and space complexity O(p^(1/4)). So, for example, to match the 128-bit security of AES-128 at level one, we needed a prime of approximately 512 bits. Later on, Adj et al. pointed out that the amount of memory required by the meet-in-the-middle attack is unrealistic. They suggested using the vOW algorithm instead; you can see the corresponding time complexity on the slide. They also suggested limiting the storage to a certain amount, specifically to 2^80 memory units. Under the light of this new analysis, the bit length of the prime required for security level one could be reduced to 448 bits, and the round-two parameters were updated accordingly. For example, you can see here that SIKEp503, which uses a prime of 503 bits, was replaced by SIKEp434, which uses a prime of 434 bits; this is for NIST level one. This reduction in sizes translated into an important speedup and a reduction of public key and ciphertext sizes.
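To see how these RAM-model numbers fit together, here is a hedged back-of-the-envelope sketch in Python. The per-iteration gate count used for the vOW estimate is an illustrative placeholder, not the calibrated figure from our paper, and the single-processor form of the vOW cost is the standard 2.5 * sqrt(N^3 / w) iterations.

```python
# Back-of-the-envelope RAM-model estimates (hedged sketch).
import math

# NIST level-1 reference: 2^128 AES calls at ~2^15 gates per call.
print(f"AES-128: 2^{128 + 15} classical gates")

# Round-1 sizing: meet-in-the-middle over a search space of size N = p^(1/4),
# so matching 2^128 queries requires a prime of about 4 * 128 = 512 bits.
print(f"meet-in-the-middle prime size for level 1: ~{4 * 128} bits")

def vow_log2_gates(log2_N, log2_w, log2_iter_gates):
    """log2 of the vOW golden-claw gate count: about 2.5 * sqrt(N^3 / w)
    iterations, each costing 2^log2_iter_gates gates (placeholder value)."""
    return math.log2(2.5) + (3 * log2_N - log2_w) / 2 + log2_iter_gates

# SIKEp434 walks a degree-2^216 isogeny, giving claw-finding sets of size
# N ~ 2^108; memory capped at 2^80 units as in the analysis of Adj et al.
print(f"vOW estimate: 2^{vow_log2_gates(108, 80, 20):.1f} gates")
```

With these placeholder inputs the vOW total lands in the neighborhood of the 2^143 level-one threshold, which is exactly why the table we show next has entries sitting right at, or slightly below, the NIST gate requirements.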
In this table, we can see a summary of security estimates. For SIKE, the numbers correspond to round two and also round three, because the parameters remained unchanged. In the column with the NIST classical gate requirements, we display the estimates that define the different security levels, using AES for levels one, three, and five, and SHA-3 for level two. The last two columns on the right correspond to SIKE, with numbers estimated by Adj et al. in the first case and by Costello et al. in the second case. In both cases, the memory requirement is fixed to 2^80 memory units, as explained before. As can be seen, the estimates marked in red are slightly below the gate requirements from NIST, and this is the first issue that we observe here. This is justified as follows in the SIKE specification document, and I quote: "although their times fall slightly below NIST required gate counts, the corresponding conversion to gate counts would see these parameters comfortably exceed NIST requirements," end of quote. So in some cases there is a gap between the NIST levels and the estimates for the official SIKE parameters, and this remains somewhat unexplained. The second issue is that there is no fundamental reason to use the memory requirement of 2^80. In this work, we resolve these two issues, as we will explain later on.

Let's now proceed to explain the budget-based cost model that we used to estimate the cost of cryptanalyzing SIKE and, hence, its security. This model consists of a few easy-to-understand steps. First of all, we fix the budget of the attacker. Then we distribute the budget between computing and storage resources such that the time to run the cryptanalysis is minimized. Finally, the security of the scheme is estimated as the time it takes to break it. Similar approaches have already been used in the literature; for example, van Oorschot and Wiener did it in 1994 and 1999. However, there are some drawbacks in most previous studies, the main one being that the analysis is typically focused on a specific point in time. We improve the cost model in two crucial aspects that we believe increase the confidence in the analysis significantly. First, we analyze historical price information for semiconductors and memory components in order to establish the soundness of using this information to evaluate the cost of cryptanalysis. And second, we apply simple but conservative projections to estimate costs in the future, which is ultimately the information that is relevant for a cryptosystem.

The piece of information that we need for the analysis is the cost of components, including gates and memory. What we did was to compile publicly released prices of microprocessor units (MPUs) from Intel and AMD for the years between 2000 and 2020. We used public transistor counts, together with the standard assumption that a gate consists of four transistors. In addition, we applied some scaling to these release prices to approximate actual production costs, given that release prices include additional costs such as a margin for profit. To do this approximation, we followed an established procedure from the literature. We did a similar estimation for memory components, such as hard disk drives, DRAM, and SSD memories. Finally, we also validated our results against the forecasts provided by the International Technology Roadmap for Semiconductors (ITRS), an international consortium that coordinated the progress and development of semiconductors in the 2000s.
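As an illustration of the price-to-cost conversion just described, here is a hedged sketch. The transistor count, release price, and the scaling factor from release price to production cost are all made-up placeholders, not our calibrated data; only the four-transistors-per-gate assumption comes from the talk.

```python
# Hedged sketch of the gates-per-dollar conversion from MPU price data.
TRANSISTORS_PER_GATE = 4  # standard assumption used in the talk

def gates_per_dollar(transistors, release_price_usd, production_scale=0.5):
    # production_scale approximates production cost from the release price
    # (which includes profit margin and other costs); the 0.5 here is a
    # placeholder, not the calibrated factor from the paper.
    production_cost = release_price_usd * production_scale
    return (transistors / TRANSISTORS_PER_GATE) / production_cost

# Hypothetical MPU: 10 billion transistors released at $500.
print(f"{gates_per_dollar(10e9, 500.0):.2e} gates per dollar")
```

Running this kind of conversion over twenty years of MPU and memory price data is what produces the per-dollar curves on the next slide.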
Here, you can see a graph with the cost estimates for the different components. We display the number of gates that can be bought per dollar as a blue line, and the number of bytes that can be bought per dollar as the red lines, in this case considering different types of memories. As can be seen, the purchasing power per dollar for the different components has increased in a very uniform way. This is visible by looking at the solid green line at the bottom, which represents the ratio between the cost of gates and the cost of memory. This result provides evidence of a relatively stable relationship between the two costs, which is one of the key aspects that gives us confidence in using this cost information to determine the cost of cryptanalysis, and therefore the security, of a scheme.

All right, so we obtained robust cost information for the analysis we were carrying out. What we needed next was to collect area and time data for the different cryptosystems and their cryptanalysis. Let's start with SIKE. For SIKE, what we did was to design an efficient large-degree isogeny accelerator, which is the main component in cryptanalyzing the scheme. The goal in this case was to achieve an optimal area-time product on an ASIC. We implemented all the main computations, including point addition, point doubling, isogeny evaluation, and isogeny computation, for the degree-4 operations, which are actually the most efficient ones for SIKE. Here on the screen, we show a diagram of our architecture. Under the hood of these operations, we also need an efficient field arithmetic architecture, and that is what we built: we proposed a novel unified multiplier over F_{p^2} that enables a theoretically optimal parallelization of the internal operations.

All right, so we implemented this efficient hardware accelerator and then proceeded to obtain all the data required to plug into our cost model. First of all, we collected cycle counts for the different operations. You can see the table with the best results that we obtained for each parameter set. At the top, you can see the case of the round-two and round-three parameters, and at the bottom there are the new alternative parameters that I am going to discuss later on. Note that we applied a very conservative approach: we ignored all the control, computation, and data communication overheads. This gives us a safety margin in our analysis. Then we proceeded to obtain area and time results using synthesis tools for ASICs. The results were obtained with the open-source NanGate 45 nm library. The results corresponding to the large-degree isogeny accelerator are displayed in the table on the right. As before, we applied a conservative approach: all the control circuitry required to implement the rest of the vOW algorithm, such as the hash function, is also ignored.

We then proceeded to do the same analysis for AES. In the case of AES, we did the analysis with the parallel attack based on rainbow chains. The complexity of the attack is shown on the screen; in this case, it corresponds to using n parallel engines targeting a key k of bit length b. To obtain the area-time information, we used, to our knowledge, the most efficient AES implementation available in the literature for ASICs, where efficiency is defined as the best area-time product. We followed a similar approach for SHA-3, in this case using the vOW algorithm for collision finding. Okay.
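Before plugging everything in, here is a hedged miniature of the kind of optimization at the heart of the model: given a budget and per-dollar costs, split spending between vOW engines and memory so that the expected attack time is minimized. All concrete inputs (engines per dollar, bytes per dollar, bytes per stored entry, the budget) are placeholders, and the scan over splits stands in for the full calculation our actual script performs.

```python
# Hedged miniature of the budget split at the core of the cost model:
# buy m vOW engines and w memory entries so as to minimize the expected
# runtime, T ~ (2.5 / m) * sqrt(N^3 / w) iterations.
import math

def best_split(budget_usd, engines_per_usd, bytes_per_usd,
               log2_N, bytes_per_entry=64):
    best = None
    for pct in range(1, 100):          # percent of the budget spent on memory
        f = pct / 100
        w = budget_usd * f * bytes_per_usd / bytes_per_entry
        m = budget_usd * (1 - f) * engines_per_usd
        log2_T = math.log2(2.5) - math.log2(m) + (3 * log2_N - math.log2(w)) / 2
        if best is None or log2_T < best[0]:
            best = (log2_T, f)
    return best

# Placeholder inputs: $100B budget, N ~ 2^108 (a SIKEp434-sized claw search).
log2_T, f = best_split(1e11, engines_per_usd=1e-1, bytes_per_usd=1e8, log2_N=108)
print(f"memory share ~ {f:.2f}, attack time ~ 2^{log2_T:.1f} iterations")
```

Since T is proportional to 1/((1 - f) * sqrt(f)), the scan recovers the analytic optimum of spending about one third of the budget on memory; converting iterations to wall-clock time with the measured cycle counts gives the break times shown next.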
So at this point, we had area-time information for SIKE, AES, and SHA-3, and what we needed to do was to plug all this data into our model. For this purpose, we wrote a Python script that carried out all the calculations for different years, using different budgets from one million up to one trillion dollars. As I mentioned before, we also included simple cost projections up to the year 2040.

Let me illustrate our results with the case of a hundred billion dollars; I should mention that other budgets yield similar conclusions. The results for the years between 2000 and 2020 were obtained with historical price information, while the results for the years between 2025 and 2040 use our projections. On the y-axis, you can see the logarithm of the number of years it takes to break a given cryptosystem. The estimates for AES are in red, corresponding to level one with AES-128, level three with AES-192, and level five with AES-256. The estimates for SIKE are in blue, corresponding to SIKEp434, SIKEp610, and SIKEp751, respectively, for the same security levels. As can be seen, there is a very wide gap between the security estimates for AES and SIKE, which confirms that the current round-three SIKE parameters do fulfill the NIST targets and, in fact, offer much higher security than required. So what we did was to choose parameters that follow the NIST targets more closely. These are the ones now shown on the screen, labeled with the word "new": SIKEp377, SIKEp546, and SIKEp697, for levels one, three, and five, respectively. Let me stress that these still include some conservative safety margins, which we think makes our parameters safe to use for targeting the NIST levels.

To finish this presentation, I will share some software implementation results that showcase the potential benefit of using the proposed parameters. The results in the table on the screen correspond to an x64 machine with a Skylake CPU. The SIKE round-three parameters are displayed on the left and the alternative parameters on the right. In each case, there is a very nice reduction in bandwidth, that is, in the public key sizes, which are displayed in bytes. Given that one of the most attractive features of SIKE is that it provides the smallest key sizes of any of the post-quantum candidates in the NIST competition, this reduction makes SIKE even more attractive. Also in the table, we display the computing time in millions of cycles, where you can see the significant speedup when moving from the round-two/three parameters to the new alternative parameters. For example, SIKEp377, intended for level one, can be executed in close to four milliseconds on a 3.4 GHz machine, which is about 1.4 times faster than SIKEp434, the current round-three parameter set for level one. So again, this provides a very important speedup in the computation, which arguably pushes SIKE in the right direction to make it more amenable for many more applications.

To finish, let me point out a couple of links with additional information. We have our full paper on ePrint, containing many more results and analysis. You can also visit our GitHub repository, which contains our hardware implementations, including a software-hardware co-designed prototype of the full vOW algorithm, as well as our software implementations of the new parameters. Thank you.