So, the next talk of this session will be given by Louiza Papachristodoulou on the practical evaluation of protected RNS scalar multiplication.

Thank you for the introduction. So, this work is the practical evaluation of protected RNS scalar multiplication. It is joint work with my co-authors Apostolos Fournaris, Kostas Papagiannopoulos, and Lejla Batina. I'm currently working at NavInfo Europe, but this work was done while I was a PhD student at Radboud University, and I was also visiting the University of Patras in Greece.

This is the outline of my talk. First, I'm going to talk briefly about the residue number system (RNS) and how we can use it in elliptic curve cryptography. Then I'm going to present a TVLA threshold calibration that we propose in this work, then the results of the TVLA analysis of the RNS traces, then some location-dependent and data-dependent template attacks, and finally some conclusions.

RNS has the potential to be used broadly in public-key cryptography because it offers a fast, efficient, and parallel way of performing operations, and it is already used in RSA and some ECC implementations. The potential of RNS as a side-channel countermeasure has been shown in some publications, but those works are theoretical; this is the first work that presents extensive practical experiments and evaluates the resistance of RNS against side-channel attacks.

RNS builds on the Chinese remainder theorem. It is a non-positional arithmetic system where a number X is represented by a set of residues x1, x2, ..., xn with respect to a given base B. The base consists of pairwise co-prime moduli mi, and a number can be reconstructed from RNS to binary by applying the Chinese remainder theorem. RNS was initially used for digital signal processing and for parallelizing arithmetic operations in order to increase computational speed. As an example, the number 50 in the RNS base {3, 7, 11} is represented as (2, 1, 6). As you can see, especially when we work over large prime fields, we can break the operations into smaller dynamic ranges, and in this way we reduce complexity and increase the computational speed of the calculations.

In this work, we used elliptic curves over prime fields and implemented RNS on them. All the usual modular operations can easily be turned into RNS modular operations, so addition, subtraction, and multiplication happen in the regular way. The only problem is division and inversion, because RNS is a non-positional representation. For these, there are base extension algorithms that are used to change from one base to another. RNS modular multiplication is usually realized through RNS Montgomery multiplication in order to avoid division and inversion, but the base extension algorithms it requires add some overhead to our implementations. As in the binary case, the critical operation is the scalar multiplication, so we want to protect the secret scalar k.
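To make the example concrete, here is a minimal Python sketch (illustrative, not the implementation from the talk) that converts 50 into the base {3, 7, 11} and reconstructs it with the Chinese remainder theorem:

```python
from math import prod

def to_rns(x, base):
    """Represent x by its residues modulo each (pairwise co-prime) modulus."""
    return [x % m for m in base]

def from_rns(residues, base):
    """Reconstruct x with the Chinese remainder theorem (CRT)."""
    M = prod(base)  # dynamic range of the base
    x = 0
    for r, m in zip(residues, base):
        Mi = M // m
        # pow(Mi, -1, m) is the modular inverse of Mi modulo m (Python 3.8+)
        x += r * Mi * pow(Mi, -1, m)
    return x % M

base = [3, 7, 11]
print(to_rns(50, base))           # [2, 1, 6]
print(from_rns([2, 1, 6], base))  # 50
```

Addition and multiplication act component-wise modulo each mi, which is what makes the representation naturally parallel.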
This is an example of how the Montgomery power ladder is usually implemented: we choose the input points and convert them to Montgomery representation, then we have a for loop over the bits of the scalar, and in every iteration we perform both an addition and a doubling. Depending on the manipulated key bit, the results are stored in different registers, but it is a very regular algorithm, as you know. It can easily be extended to RNS. At the beginning we choose the initial bases that we work with and represent our numbers in those bases, we transform the points on the curve to RNS format, and we choose a permutation p_t, which is a permutation of the co-prime elements of the base. In order to add some extra randomness, these elements can be permuted randomly, so for every computation we choose a different permutation of the elements. Then, every time we represent a point on the curve, we represent it in RNS format under the chosen permutation. At the end of the algorithm, we transform the result back to binary format. These are the extra steps we need if we want to run the Montgomery ladder in RNS. This algorithm, in different variations, is the one we used in our experiments; it is the basic algorithm we work with.

First of all, we performed TVLA, the test vector leakage assessment, which is a methodology broadly used for evaluating the security of a device. The methodology consists of statistical tests between two sets of acquired traces, one obtained with random inputs and one with a fixed input. The test provides results simultaneously for all intermediate values and pinpoints potential leakage. This leakage is not directly exploitable through TVLA itself; further attacks are needed to exploit it. Commonly, a threshold of 4.5 on the t-value is used, and values above this threshold indicate leakage.

While applying TVLA in our experiments, we saw peaks in places where we didn't expect them. Going to the literature, we found that similar ghost peaks appear in public-key implementations. TVLA was mostly used for symmetric key; in public key, where many more samples per trace are used, it is common to have peaks above 4.5. So we proposed a special calibration for TVLA. It is an algorithm whose inputs are the number of traces in the two groups, the number of samples per trace, and the sample standard deviation of the measurements, and whose output is the threshold value to use for TVLA. We noticed that with a larger number of samples, the family-wise error rate also grows, so we need to apply the Šidák correction, as was also proposed in 2017 or 2018. We just put all the formulas together, and then you can calculate the threshold for your current acquisition using the last formula. For example, in our case, where we had traces in the order of thousands, four to ten thousand traces, with 400,000 to 800,000 samples per trace and our standard deviations, the threshold was 6.3. Other papers, like one from Smilevsky from 2017, had 32 million samples per trace; when we put those numbers into our algorithm, the threshold for their case should be 7.3. So you see that we shouldn't take the 4.5 for granted when we do analysis on public key.
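To illustrate the calibration on concrete numbers, here is a minimal Python sketch of the per-sample Welch t-statistic that TVLA computes, together with a Šidák-corrected threshold. The significance level alpha and the degrees-of-freedom choice are illustrative assumptions, and the paper's exact formula may differ:

```python
import numpy as np
from scipy import stats

def tvla_t_trace(fixed, random):
    """Per-sample Welch t-statistic between fixed-input and random-input traces.

    fixed, random: 2-D arrays of shape (n_traces, n_samples).
    """
    m1, m2 = fixed.mean(axis=0), random.mean(axis=0)
    v1, v2 = fixed.var(axis=0, ddof=1), random.var(axis=0, ddof=1)
    n1, n2 = fixed.shape[0], random.shape[0]
    return (m1 - m2) / np.sqrt(v1 / n1 + v2 / n2)

def calibrated_threshold(n_samples, df, alpha=1e-5):
    """Threshold controlling the family-wise error rate over n_samples tests.

    Sidak correction: alpha_per_test = 1 - (1 - alpha)**(1/n_samples), then
    the two-sided quantile of the t-distribution with df degrees of freedom.
    """
    alpha_per_test = 1.0 - (1.0 - alpha) ** (1.0 / n_samples)
    return stats.t.ppf(1.0 - alpha_per_test / 2.0, df)

# Usage: flag the samples whose |t| exceeds the calibrated threshold.
# t = tvla_t_trace(fixed_traces, random_traces)
# leaky = np.abs(t) > calibrated_threshold(t.size, df=fixed_traces.shape[0] - 1)
```

The point of the correction is that with hundreds of thousands of independent tests per trace, a fixed 4.5 bound lets false positives through; the threshold has to grow with the number of samples.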
Further, we had our C software implementation. We put it on a BeagleBone and experimented with RNS Montgomery multiplication, different curves, and dedicated and unified group laws. We also had different variations of the algorithm. First, an unprotected one, with no countermeasures applied. Then, typical side-channel countermeasures: randomization of the scalar and randomization of the point. Then, RNS-specific countermeasures: one is leakage-resistant arithmetic (LRA), which basically performs random base permutations; we also tried a random order of operations, where operations that could happen in parallel are executed in a randomly chosen order.

For the analysis, we needed to do some processing of the traces. We applied a low-pass filter, we took the absolute value, and we also needed to align the traces. Usually we were collecting about 50K traces; after alignment, we were left with about 20K traces, which we used for the t-test.

This figure shows the t-test for a random versus a fixed scalar on the twisted Edwards curve with these parameters. In the first picture, you see the leakage of the unprotected scalar multiplication. When we randomize the scalar, the leakage is a bit lower, but we can still identify the rounds; we were collecting traces for seven rounds, and we could see them. The LRA countermeasure actually works better than randomization of the scalar: there is only some leakage in the beginning, and then it stays under our threshold. What we didn't expect was that when we combined LRA with randomization of the point, we actually had much more leakage in the beginning. We tried different things, and these were not ghost peaks; the peaks were really there. Probably it's because of the device: there is a lot of leakage in the beginning, and maybe some values are sent in the clear. That's why we have this big leakage in the first round, while in the later rounds it disappears.

Then we tried another curve, a second Edwards curve with these parameters, and the results there were a bit better. Again, in the unprotected case, we have a lot of leakage. With randomization of the scalar, the leakage is much lower than before. With the LRA countermeasure, in this case, we have a lot of leakage that propagates through all the rounds, but with the combination of the countermeasures, the leakage is not there. Finally, there is the t-test for a random versus a fixed point: when we randomize the point and test for this, then indeed the countermeasure works, and the leakage, as expected, is low.

Then we performed some template attacks. First, we exploited data-dependent leakage, which is observed when the value of a secret variable can be monitored by an adversary. This can happen when the variable is unprotected or when, at some point in the algorithm, it is sent in the clear. In our case, we triggered around the key-dependent assignment, the if statement. Again we did alignment, and we ended up with 28K traces. We used half of them for building the templates and the other half for classification. The success rate of correct classification was 90 to 91 percent for the unprotected case, and with the LRA countermeasure activated it was 82 to 97 percent, depending on the curve; so again the attack was successful, we could classify. In the cases where we had scalar randomization, and LRA with randomized RNS operations, the classification rates were lower, 55 to 58 percent and 65 to 72 percent, and we assume that in those cases we couldn't recover the scalar bits.
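For illustration, here is a minimal sketch of such a template attack, assuming multivariate Gaussian templates over a few points of interest. This is the generic textbook construction, not the attack code used in the work:

```python
import numpy as np
from scipy.stats import multivariate_normal

def build_templates(traces, labels):
    """Profiling phase: one multivariate Gaussian per key-bit value.

    traces: (n_traces, n_poi) array restricted to points of interest.
    labels: (n_traces,) array of known key bits (0 or 1).
    """
    templates = {}
    for b in (0, 1):
        group = traces[labels == b]
        templates[b] = (group.mean(axis=0), np.cov(group, rowvar=False))
    return templates

def classify(trace, templates):
    """Attack phase: pick the key-bit hypothesis with the highest likelihood."""
    scores = {b: multivariate_normal.logpdf(trace, mean=mu, cov=cov,
                                            allow_singular=True)
              for b, (mu, cov) in templates.items()}
    return max(scores, key=scores.get)

# Success rate over the held-out classification half:
# hits = sum(classify(t, templates) == b for t, b in zip(test_traces, test_labels))
# print(hits / len(test_labels))
```

The reported percentages correspond to exactly this kind of correct-classification rate over the held-out half of the traces.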
Then we performed location-dependent template attacks. Here, the templates were created based on the storage structure that handles the key-dependent instruction; in our case it was the doubling, so we triggered around the doubling operation. As I said before, the algorithm is very regular, but what changes is where the values are stored, and this indeed leaks some information. We had a very successful template classification of 95 to 99.9 percent, and when the LRA countermeasure is activated with randomized operations, the classification rate falls to 70 to 83 percent. Actually, location-dependent leakage was not something we were expecting to see, because we have large values and we were splitting them into four chunks of 50 bits, so each value is stored in different registers. We expect that this happened because the normal distributions of the collected traces were indeed different, which became obvious from the experiments. Also, the platform is leaky; the capacitors are next to each other, and probably that is also a reason why the location-dependent leakage appears.

We saw that scalar randomization is not an efficient countermeasure in all cases: it sometimes reduces the accuracy of the classification, but it is not as efficient as LRA with randomized operations. This is the evaluation table of all the different variations of the algorithm that we used, with the performance overhead at the end, showing how much more time each implementation needs. From the last rows, LRA with randomized operations seems to resist the most attacks, at a 76 percent overhead.

To conclude, I would like to say that TVLA bounds should not be taken for granted. Every time, according to our acquisition, the distribution of the traces, the number of samples, and the number of traces, we should compute our own threshold bounds. Randomization of the scalar, randomization of the input point, and the regularity of the Montgomery power ladder are good countermeasures, but not enough in all cases to avoid leakage. Different RNS representations do not lower the templates' success rates, and neither do different shapes of the curve. Randomization of the RNS operations does protect against template attacks, and it is less expensive than randomizing the input point. What we are now considering for future work, and have already started looking into, is classification using machine learning algorithms; if the machine learning algorithms are successful, we could also avoid some of the alignment and use all the traces of the acquisition. It would also be interesting to have an evaluation on FPGA, where operations really do happen in parallel, because this was a software implementation and we didn't do performance optimizations. With a hardware implementation, we could examine this further and get more insight into the security of RNS as a countermeasure. Thank you for your attention, and if you have any questions, I'm happy to answer. Thank you.
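Before the questions, here is a minimal sketch of the algorithm at the heart of the talk: the regular Montgomery ladder with the RNS-specific steps, including the random base permutation. All names are illustrative placeholders rather than the speaker's C implementation, and the toy check reuses the to_rns/from_rns helpers from the first snippet:

```python
import random

def rns_montgomery_ladder(k_bits, P, O, base, to_rns, from_rns, add, dbl):
    """Regular Montgomery power ladder with the RNS-specific steps.

    Every iteration performs exactly one addition and one doubling; only
    which register receives each result depends on the key bit, which is
    what the location-dependent templates exploit.
    """
    # LRA-style base randomization: draw a random permutation p_t of the
    # fixed co-prime base at the start of this scalar multiplication.
    perm = random.sample(base, len(base))

    R0 = to_rns(O, perm)  # R0 = identity element
    R1 = to_rns(P, perm)  # R1 = P; loop invariant: R1 = R0 + P

    for bit in k_bits:    # scan the scalar most-significant bit first
        if bit == 0:
            R1 = add(R0, R1, perm)
            R0 = dbl(R0, perm)
        else:
            R0 = add(R0, R1, perm)
            R1 = dbl(R1, perm)

    # Transform the result back from RNS to binary format.
    return from_rns(R0, perm)

# Toy check with integer addition standing in for the group law,
# so the ladder computes k*P over the integers: k = 0b1011 = 11, P = 5.
base = [3, 7, 11, 13]  # dynamic range M = 3003
add = lambda a, b, perm: [(x + y) % m for x, y, m in zip(a, b, perm)]
dbl = lambda a, perm: add(a, a, perm)
print(rns_montgomery_ladder([1, 0, 1, 1], 5, 0, base,
                            to_rns, from_rns, add, dbl))  # 55
```

Randomizing the order of the two independent operations inside each branch would give the "LRA with randomized operations" variant that resisted the most attacks in the evaluation.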
So I think we have time for a quick question.

Thank you for the presentation. A very short question. You showed experimentally that there is leakage even when scalar randomization or the RNS representation is used. But did you succeed in identifying where the leakage is coming from? Because we could expect that there is no leakage: with scalar randomization, there is normally no longer information available. So why do you see leakage? What in your implementation is causing the leakage?

Yeah. One thing could be the platform that is leaking, but in the implementation we used functions from GMP, like the mpz urandom functions, so we used software randomization. I guess that was not a good source of randomness.

So, about the conclusion, because your conclusion is quite strong: you say scalar randomization is not sufficient to defeat such an attack, for instance template attacks. But from your answer, it now seems that this may not be an issue of the scalar randomization itself, but of the way you implemented the randomness in the randomization, so the random number generator or something like that.

Yeah. We tried different things, different functions for the randomization. We didn't have hardware randomization, so maybe if you have five sources of randomness combined together in a true random number generator, then maybe you don't have a scalar randomization issue. But implemented in a simple, straightforward way, using random functions from software, then it's all...

Just a very short question. How did you generate the random base for the RNS representation? Did you regenerate the full base each time, or did you take the primes from a fixed basis at random?

From a fixed base of primes, of small primes, at random.

So my question is, do you really regenerate the full base for each execution?

Not for each execution; at the beginning of the algorithm. We don't take a new random base for every execution.

So not for every execution?

Yes.

So there is not so much randomness in the representation.

Yes. Thank you.

Okay, so as we are running out of time, let's thank the speaker again.