All right. The second and last presentation of this session is titled Standard Lattice-Based Key Encapsulation on Embedded Devices. The authors of the paper are James Howe, Tobias Oder, Markus Krausz, and Tim Güneysu, and James is going to give the presentation.

OK, thank you for the introduction. So yeah, we have some results here for this candidate, FrodoKEM, in hardware and software, and I'm here to present that research. I'm sure you've heard a lot today already about post-quantum crypto and learning with errors, so I'm going to skip the introduction and go straight into the motivation for why we looked at Frodo. Then I'll describe the microcontroller designs and the hardware designs, and then give you some results, comparisons, and performance analysis.

So why have we decided to do this? Basically, NIST, as you know, started a post-quantum standardization competition, and they have suggested that in the future this will likely involve evaluations on constrained devices, such as smart cards, as well as comparisons of schemes in hardware. The reason we focus on Frodo is mainly that, as practitioners, this is a fairly appealing scheme to concentrate on. It's extremely versatile and has really strong security properties; it's probably the most secure lattice-based candidate. Also, there have been fewer implementations of standard-lattice schemes: even at this conference, all of the implementations have focused on ideal-lattice or module-lattice candidates. So we're trying to shorten the gap between what you expect from standard lattices in theory and in practice. We also consider Frodo an ideal candidate for long-term security use cases, especially on constrained hardware platforms. Frodo was designed to be a conservative yet practical post-quantum candidate.
Its security is derived from the straightforward, standard learning-with-errors problem. Another appealing property is that parameter selection is much simpler for Frodo, because it is based on plain learning with errors: there are no restrictions on the size or format of the modulus, or on any of the other parameters. For practitioners, this is obviously very appealing, and for future IoT it can be appealing too; having a long-term, efficient crypto scheme is a good thing. As we've seen in the last talk, microcontrollers, especially the M4, will probably play a big role in the IoT era, and FPGAs will be part of that future as well; they're already being used in cloud services, such as Microsoft's. A suitable use case for this research would be something like satellite communications, where you need long-term, highly secure cryptography.

Here's a shortened version of the encapsulation module. The main operations within FrodoKEM's encapsulation are essentially in lines 6 and 8. This is where we do the learning-with-errors calculation, which consists of a matrix-matrix multiplication plus the addition of a sample from the error distribution. We also have pseudo-random number generation in the first few lines, as well as the generation of these matrices from the error distribution. And in line 9 we have the use of a random oracle to ensure CCA security: it takes in the ciphertext and the keys we want to send and creates a hash of them. So essentially, the key modules within Frodo's key generation, encapsulation, and decapsulation are matrix-matrix multiplication up to dimension 976, uniform and Gaussian error generation, and random oracles. The proposed way to do this in the specification is via cSHAKE.
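The learning-with-errors calculation the speaker describes, a matrix product plus a small error term with everything reduced modulo q, can be sketched in a few lines of Python. This is a toy illustration, not the paper's code: the dimensions and sampling below are placeholders, and the real scheme draws S and E from a narrow Gaussian-like distribution.

```python
import random

def lwe_matrix(A, S, E, q):
    """Compute B = A*S + E (mod q), the core LWE operation.

    A is n x n, S and E are n x m; all arithmetic is modulo q.
    """
    n = len(A)
    m = len(S[0])
    B = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            # Multiply-accumulate one dot product, then add the error term.
            acc = sum(A[i][k] * S[k][j] for k in range(n))
            B[i][j] = (acc + E[i][j]) % q
    return B

# Toy demo; real FrodoKEM uses n = 640 or 976 and q = 2**15 or 2**16.
q = 2**15
A = [[random.randrange(q) for _ in range(4)] for _ in range(4)]
S = [[random.randrange(-2, 3) for _ in range(2)] for _ in range(4)]
E = [[random.randrange(-2, 3) for _ in range(2)] for _ in range(4)]
B = lwe_matrix(A, S, E, q)
```

Because there is no ring structure, this is a plain schoolbook matrix product; that is exactly why the unrestricted parameter choice mentioned above is possible.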
But probably the biggest design challenge for this research was balancing memory utilization without deteriorating the performance of the modules. We didn't want to over-exert the limited computing capabilities of the embedded devices; essentially, we wanted to leave some room to do other things on the device. You're not just going to have this device doing encapsulation; you probably want to run other operations on it as well.

Frodo comes in two sizes: there are two parameter sets, targeting 128- and 192-bit security. It takes its pseudo-randomness from either AES or cSHAKE. We focus on the key encapsulation mechanism rather than the key exchange scheme proposed at CCS, and all of the designs we propose here cover all of these parameter sets and both sources of pseudo-randomness.

For the ARM implementation, probably the biggest contribution is the optimized memory allocation, which is what enables us to actually fit the scheme on the embedded devices. We also propose an optimized assembly multiplication routine, which speeds up the implementation and helps us reach the performance needed for certain use cases. All three modules run in constant time, to protect against simple side-channel analysis. The total execution time of FrodoKEM for the 128-bit parameter set is 838 milliseconds running at 168 megahertz.

Here's a brief overview of the encapsulation. In the middle you can see the matrix multiplication and addition, which takes its inputs from the samplers; the outputs go into the ciphertext. We had to analyze the memory occupancy during each operation, and wherever possible we reused already-allocated memory, which saved us a lot of memory.

Here's an overview of the cycle counts for the design. The top half of the table is our implementations, with both parameter sets and both AES and cSHAKE.
As you can see, there is a clear difference between the AES implementations and the cSHAKE implementations. This is essentially because with AES we get just enough randomness output each time that we can keep it in registers, whereas the cSHAKE outputs are too large, so we have to store them in RAM and then spend time saving and loading them, which is where the hit on the cycle counts comes from. The only other really comparable scheme is the pqm4 implementation of FrodoKEM using cSHAKE, and our optimizations save about 2.5 million clock cycles over it, which is quite substantial. Compared to the ideal-lattice and module-lattice-based schemes there is a clear difference, but I think that's to be expected.

For stack usage, our memory optimizations save a significant number of bytes: compared to the pqm4 implementation we save between 30% and 40%, and versus the reference designs we save 66%. Although the cSHAKE implementation is significantly slower, it does give slight memory savings compared to the AES implementation.

For the FPGA design, we propose a generic LWE multiplication core, which computes vector-matrix multiplication and addition. So instead of doing matrix-matrix multiplication, we do vector-matrix multiplication and just repeat over the rows of the left-hand matrix. We also generate future randomness in parallel, which minimizes delays between the multiplications; this essentially makes the multiplication the bottleneck of the scheme in hardware. And we have on-the-fly memory management, which means the next values are always ready for us to use and we use no more memory than we need. This design also runs in constant time, and we achieve that by making multiplication the bottleneck, so that randomness generation happens in parallel to it. The 128-bit FrodoKEM has a total execution time of 60 milliseconds at 167 megahertz.
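The row-at-a-time trick described above, replacing one big matrix-matrix product with repeated vector-matrix products, can be illustrated with a short Python sketch. This is only a model of the idea, not the paper's HDL; function names and shapes here are my own.

```python
def vec_mat_mul(v, M, q):
    """One vector-matrix product modulo q: the unit the FPGA core repeats."""
    cols = len(M[0])
    return [sum(v[k] * M[k][j] for k in range(len(v))) % q for j in range(cols)]

def mat_mat_by_rows(S, A, q):
    """A full matrix-matrix product done one left-hand row at a time.

    Working row by row means only one result row (and the randomness
    freshly generated for it) has to be held at any moment, which is
    what keeps the memory footprint small.
    """
    return [vec_mat_mul(row, A, q) for row in S]
```

The result is identical to computing S*A in one shot; the difference is purely in how much intermediate state is alive at once.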
Here's an overview of the encapsulation design. In the middle we have the learning-with-errors multiplier. It takes inputs from BRAM, which we use to store the key B, and from the random matrix A, which is generated on the fly by a cSHAKE module. It also takes input from the error distribution, the Gaussian block I've drawn there, and it uses a DSP slice to do the multiply-and-accumulate operations. Then we simply add in an error at the end of the vector-matrix multiplication to form the learning-with-errors calculation. This is output as the ciphertext, and also fed into the random oracle at the end in order to calculate the shared secret.

Here are some results. The first half of the table is our results for both parameter sets and all three modules. We also give area consumption for the submodules we use, the cSHAKE core and the error sampler. With regard to area consumption, we do compete with NewHope, which is quite comforting, but as you can see the design does suffer somewhat in performance; there's a clear distinction in operations per second. The only other really comparable design is another standard learning-with-errors encryption implementation in hardware, and we significantly outperform it in terms of BRAM, since that implementation doesn't really optimize memory usage; even for our larger parameter set we significantly outperform it. We also run at a higher frequency, although there is some hit on throughput.

So in conclusion, I think we minimize the performance distance between standard- and ideal-lattice KEMs. I think we also show that the ideal platform for Frodo is hardware: it benefits a lot from parallelization and the ability to pre-compute randomness for future use.
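The last step in the pipeline above, feeding the ciphertext into a random oracle to derive the shared secret, can be sketched with Python's SHAKE. Note this is a simplification I'm assuming for illustration: the real scheme uses cSHAKE with domain separation and a specific input ordering from the specification.

```python
import hashlib

def derive_shared_secret(ciphertext: bytes, key_material: bytes,
                         out_len: int = 16) -> bytes:
    """Hash ciphertext || key material into a fixed-length shared secret.

    Binding the ciphertext into the hash is what the CCA transform
    relies on: any tampering with the ciphertext changes the secret.
    """
    h = hashlib.shake_128()
    h.update(ciphertext)
    h.update(key_material)
    return h.digest(out_len)
```

Two parties that hold the same ciphertext and key material derive the same secret; an attacker who modifies the ciphertext in transit derives a different one.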
The microcontroller implementations show that we significantly improve memory usage through our memory optimizations: we show savings of 66% against the reference designs and 40% versus the optimized designs. For future research, it would be interesting to see how increasing the number of multipliers, the DSPs, in the hardware design would work: how much it would benefit performance, how fast we could get the implementations, but also how much of a hit we would take on area consumption. We'd also like to consider more side-channel protections, such as masking. And it would be interesting to see how this compares to other NIST post-quantum candidates; as far as I know, not many NIST candidates have been implemented on FPGA, so we're not exactly sure how this compares to those. All we really have are the pre-NIST versions of the schemes. So yeah, these results are really meant to help with the standardization process. I think we've shown that FrodoKEM is efficient, and we hope we've made some contribution here. We're obviously not saying you can't use rings anymore. Thank you.

Thank you, James, for the nice presentation. Are there any questions? We have time for a couple of questions, maybe. I do have a question; you already commented on it. Your architecture only uses one multiplier. Any insights on what the expected performance improvement would be if you introduced more multipliers? In this case, with Frodo, right?

Yeah. With the hardware design, you mean?

Yeah, well, the idea is that it's based on matrices, which can be highly parallelized.

Well, I did look at that, and if you increase the number of multipliers, you then need twice as much randomness ready to use. So if you are going to increase the number of multipliers, you also need a better-performing cSHAKE or AES core, and when you increase the performance of cSHAKE, it does get fairly big.
Our designs there, as you can see, fit under 2,000 slices, whereas I think if you look at the high-performance reference implementation of Keccak, it goes up to something like 5,000 slices, so it's really, really big. So if there were to be more work in this area, we'd have to look at the cSHAKE module and optimize it somehow to make it more efficient.

OK, interesting. Any other questions? All right, if not, let's thank James again.