would be given by Emmanuel Prouff, and it's about how machine learning can be used to assess the security of RSA.
Thank you, thank you for the introduction. The work I'm going to present to you now is a joint work between ANSSI, the French national agency for information systems security, and the three main hardware security evaluation labs in France, namely CEA, Thales, and Serma. So this is joint work between me and several experts of those labs. Let me start with a few words about the context of the study and also a very general overview of the target we have used to do the tests. First, about the context: ANSSI asked the three hardware security evaluation labs in France to test the applicability of deep learning attacks against secure RSA implementations, in a quite, I would say, practical context, meaning against a quite reasonable target. For that, we asked another company, a well-known crypto expert, to develop an RSA implementation including the classical countermeasures on a quite well-secured platform. So the RSA implementation we are going to attack has been developed by crypto experts. The hardware on which the software implementation has been loaded includes a Montgomery arithmetic coprocessor. The evaluations performed by the labs had to include horizontal attacks, which have been discussed this morning, and also, of course, machine learning techniques. But today I am only going to discuss the deep learning aspects of the study done by the labs. When it comes to implementing RSA on a smart card device, for instance, there are typical countermeasures that we are going to implement as a developer. Most of the time, the exponentiation is done at software level, so on the CPU, and it involves a Montgomery accelerator, so an arithmetic coprocessor, to efficiently process the arithmetic operations.
To deal with the classical physical attacks, some countermeasures are classically developed, and they have been developed in the context of the study by the crypto experts. For instance, against simple power analysis, introduced by Kocher in 1996, we have to develop the exponentiation so that the execution flow is independent of the private exponent, which is quite classical; you can, for instance, use a square-and-multiply-always exponentiation algorithm. To defeat chosen-message attacks, for instance attacks asking for the signature of the message minus one, we classically ask to blind the message, which allows defeating the attack proposed by Yen in 2001 and extended by Fouque and Valette in 2003. Essentially, it consists in adding a random multiple of the modulus to the message, and in order to ensure that the random mask is not removed at the first step of the exponentiation, we also have to extend the public modulus N by multiplying it by r'. Essentially, r' will be a random value of 32 bits or 64 bits, something like that. To defeat DPA-like attacks and statistical attacks, it is very classical to use exponent blinding, which essentially just consists in adding a multiple of the Euler totient of the modulus to the exponent before the exponentiation. Usually, you should also consider two other attacks, which are quite classical in our context.
These are the address-bit DPA attack and the horizontal attacks, but it sometimes appears that neither of them is considered during the development phase, because both are considered less practical than the other ones, meaning that sometimes there are no countermeasures against them. This is exactly the case of the implementation developed by the crypto experts: on purpose, they chose not to consider the address-bit DPA attack and the horizontal attacks. Let me say a few words about the hardware specification. As I said, the exponentiation has been developed for specific hardware, and I cannot tell you all the details about this hardware. I'm not allowed to give the name of the device, so let us call it boombo here. During the evaluation, one of the labs made the following picture of the device after depackaging, and you can see almost nothing, of course. It's too basic as a picture, but believe me, there is an arithmetic coprocessor. What is also nice, and we discovered that after the fact, is that this hardware has been certified, not in Europe but in Asia, meaning that it is not defenseless hardware. Maybe the crypto experts did not follow the recommendations of the hardware developer when implementing the exponentiation, but it is not a basic arithmetic coprocessor; it is a defensive arithmetic coprocessor. When we delivered the smart cards to the evaluation labs, we provided them with a very minimal operating system, enabling them to execute the RSA through a very simple API, which is given on the slide. Essentially, what you have to remember is that the RSA implementation includes three types of countermeasures: first, the blinding of the message, then the blinding of the exponent, and finally the blinding of the modulus. And all the random values used for the blinding are 64 bits long.
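As an illustration of the three blinding countermeasures listed above, here is a minimal Python sketch. It is not the evaluated implementation: the function name `blinded_rsa_exp`, the toy primes, and the use of a seeded `random.Random` (chosen only for reproducibility) are assumptions made for the example. It adds a random multiple of N to the message, works modulo r' times N, adds a random multiple of the Euler totient to the exponent, and reduces modulo N at the end.

```python
import random

def blinded_rsa_exp(m, d, n, phi, rng, rand_bits=64):
    """Compute m^d mod n with message, exponent and modulus blinding (toy sketch)."""
    # Fresh random values for each execution (64 bits each, as stated in the talk).
    r_msg = rng.getrandbits(rand_bits)
    r_exp = rng.getrandbits(rand_bits)
    r_mod = rng.getrandbits(rand_bits) | 1  # force a non-zero multiplier r'

    n_ext = n * r_mod                 # modulus blinding: work modulo r' * N
    m_bl = (m + r_msg * n) % n_ext    # message blinding: add a random multiple of N
    d_bl = d + r_exp * phi            # exponent blinding: add a random multiple of phi(N)

    s_ext = pow(m_bl, d_bl, n_ext)    # the exponentiation that would be observed
    return s_ext % n                  # the final reduction removes the blinding

# Hypothetical toy parameters, far too small for real use.
p, q = 2**61 - 1, 2**31 - 1           # two Mersenne primes
n, phi = p * q, (p - 1) * (q - 1)
d = 0x1234567890ABCDEF                # arbitrary stand-in for the secret exponent
rng = random.Random(2019)             # a real device would use a hardware RNG
message = 123456789

result = blinded_rsa_exp(message, d, n, phi, rng)
print(result == pow(message, d, n))
```

Every call draws new random values, so the intermediate values differ from one execution to the next while the final result stays the same. That is also why the attack discussed in the talk has to succeed with a single trace.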
Okay, just to finish the presentation of the context, I have to tell you a few words about how we usually use such an arithmetic coprocessor. When you need to perform, for instance, a multiplication or a squaring during the exponentiation, you will make use of what we can call an arithmetic Montgomery multiplication module. To use this module, you first load into the memory of the coprocessor the values on which you want to operate the exponentiation, and maybe you also precompute what we call some Montgomery constants. Then when you need to do, for instance, a multiplication between two intermediate values during the processing, instead of giving the values themselves, you give indices: the indices of the two values you want to multiply and the index of the location in the coprocessor memory where you want the result to be stored. Essentially, this means, in my example, that if you want to multiply the value in segment one by the value in segment two and store the result in segment four, then you just call the Montgomery multiplication module with index one, then index two, and at the end index four, okay? As for the exponentiation algorithm developed by the crypto experts, I can just say that it's a very basic one. This is a regular exponentiation algorithm, so square and multiply always. What is important to notice here is that before calling the modular multiplication module, for the squaring or for the multiplication, there is a step each time which is just there to decide which part of the coprocessor memory I'm going to use for the multiplication, so which segments I'm going to multiply and where I'm going to store the result.
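The index-based calling convention described above can be sketched with a toy software model. Everything here is hypothetical: the class `ToyMontgomeryCoprocessor`, its methods, the number of segments, and the radix size are invented for illustration and do not describe the real device, whose interface is not given in the talk. The sketch also multiplies by a precomputed inverse of R instead of running a real Montgomery reduction, which is enough to show that the caller only ever passes segment indices.

```python
class ToyMontgomeryCoprocessor:
    """Toy model: operands live in numbered segments, calls take indices only."""

    def __init__(self, modulus, num_segments=8, r_bits=64):
        assert modulus % 2 == 1 and modulus < (1 << r_bits)
        self.n = modulus
        self.r = 1 << r_bits                    # Montgomery radix R = 2^r_bits
        self.r_inv = pow(self.r, -1, modulus)   # R^-1 mod n (needs Python 3.8+)
        self.seg = [0] * num_segments           # the coprocessor memory segments

    def load(self, index, value):
        """Store a value in a segment, converted to Montgomery form x*R mod n."""
        self.seg[index] = (value * self.r) % self.n

    def read(self, index):
        """Read a segment back, converted out of Montgomery form."""
        return (self.seg[index] * self.r_inv) % self.n

    def mont_mul(self, i, j, k):
        """seg[k] <- seg[i] * seg[j] * R^-1 mod n. The caller passes indices only."""
        self.seg[k] = (self.seg[i] * self.seg[j] * self.r_inv) % self.n

# The example from the talk: multiply segment 1 by segment 2, store in segment 4.
cop = ToyMontgomeryCoprocessor(modulus=1000003)
cop.load(1, 1234)
cop.load(2, 5678)
cop.mont_mul(1, 2, 4)
print(cop.read(4) == (1234 * 5678) % 1000003)
```

Because the operands never appear in the call itself, the only secret-dependent data the CPU handles just before each multiplication is the triple of indices, which is the step the attack targets.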
This is very important because it is well known that if you do that, there is an attack path which is usually called address-bit DPA, and which just uses the fact that there is a direct relation between the sequence of registers manipulated during the square-and-multiply-always algorithm and the value of the exponent. Let us just look at an example. Here the values of the exponent bits are one zero, one one zero, one zero, one zero. This is a blinded exponent, but my purpose is to perform an attack with one trace. So if I recover the blinded exponent, I in fact recover a valid secret exponent, because it is congruent to the true exponent modulo the Euler totient of the modulus. If we look at the values taken by the segment indices during the exponentiation, we see that if the segment index used for the squaring at step i is equal to the segment index used for the subsequent squaring, then the exponent bit is one. It's very simple, you can validate that by hand. If we see two equal segment values, then we can conclude that the exponent bit is one. There is also another attack path which is very classical. It just uses the fact that there will be collisions between two operands: the operand used for the squaring at step i and the first operand used for the multiplication during the next iteration of the loop. So this means that if you are able to recover the index values, you have one attack path which gives you the value of the exponent bit, and we have another attack, a collision attack or horizontal attack, which just requires that we are able to detect when two operands are equal. Okay, so to perform the attacks, the labs performed two campaigns, one for the power consumption, one for the electromagnetic emission.
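The relation between segment indices and exponent bits can be checked with a small simulation. This is a sketch under assumptions: the actual algorithm of the evaluated implementation is not given in the talk, so the segment layout below (two working segments that swap roles after a dummy multiplication) is a hypothetical design chosen because it reproduces the stated rule, namely that two consecutive squarings reading the same segment mean the exponent bit is one. The bit sequence is the one quoted in the talk.

```python
def square_and_multiply_always(m, bits, n):
    """Left-to-right square-and-multiply-always over indexed segments (toy model).

    Returns the result and the list of segment indices read by each squaring,
    followed by the index that finally holds the result.
    """
    seg = [1 % n, 0, m % n]      # segments 0 and 1 are working space, 2 holds m
    cur, tmp, msg = 0, 1, 2      # cur: accumulator index, tmp: scratch index
    squaring_inputs = []
    for bit in bits:             # most significant bit first
        squaring_inputs.append(cur)
        seg[tmp] = seg[cur] * seg[cur] % n   # squaring: tmp <- cur^2
        seg[cur] = seg[tmp] * seg[msg] % n   # always multiply: cur <- tmp * m
        if bit == 0:
            cur, tmp = tmp, cur  # dummy multiplication: keep the square instead
    squaring_inputs.append(cur)
    return seg[cur], squaring_inputs

def recover_bits(squaring_inputs):
    """A bit is 1 exactly when two consecutive squarings read the same segment."""
    return [1 if a == b else 0
            for a, b in zip(squaring_inputs, squaring_inputs[1:])]

bits = [1, 0, 1, 1, 0, 1, 0, 1, 0]           # exponent bits from the example
exponent = int("".join(map(str, bits)), 2)
value, leaked = square_and_multiply_always(7, bits, 1000003)
print(value == pow(7, exponent, 1000003), recover_bits(leaked) == bits)
```

In this model an attacker who learns only the sequence of indices, and nothing about the operand values, recovers every exponent bit from a single execution, which is why exponent blinding alone does not stop the attack.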
Because of time constraints, I'm just going to speak about the electromagnetic measurement campaign. Here we have a signal acquired at around 2.5 gigasamples per second. This is a very high sampling rate, quite classical for electromagnetic analysis. Each trace we are going to use is composed of five million samples, and this corresponds to just the first seven most significant bits of the exponent. If we zoom in on one of the steps, so this is one iteration of the loop, we see that there are two Montgomery multiplications. One is for the squaring and the second one is for the multiplication, and before those steps we see that there are small steps in red. These steps are just dedicated to the processing of the indices, and this is where we are going to attack, because this is where the indices given to the Montgomery multiplication module are processed. Once you have your traces, and you assume that they are quite good because there is no desynchronization, which is the case as you have seen in the previous slide, the second step you want to perform as an evaluator is to validate that there is information. So essentially, is there a dependency between what I am observing, the trace, and the value I want to extract, meaning here the index of the register I'm using for the modular multiplication? Here I'm just taking the first 30,000 samples of a Montgomery multiplication by the coprocessor. This is where the indices are assumed to be manipulated, and I am just computing an SNR, a signal-to-noise ratio, and I see that indeed there is information here. There are many places where we see that information is leaking. We can do the same, not for the values of the indices, but for the values of the operands, and we see that indeed there is also a very large leakage of information during the manipulation of the operands by the Montgomery module. So this is nice.
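One common way to check for such a dependency is a per-sample signal-to-noise ratio: the variance across classes of the class mean, divided by the average within-class variance. The sketch below assumes that definition and runs it on synthetic data. The traces, the eight-sample length, the noise levels, and the leaking position are all invented for the example and are not measurements from the study.

```python
import random
from statistics import fmean, pvariance

def snr_per_sample(traces, labels):
    """For each time sample: Var over classes of the class mean,
    divided by the mean over classes of the within-class variance."""
    classes = sorted(set(labels))
    snr = []
    for t in range(len(traces[0])):
        # Group the values of sample t by the label of their trace.
        by_class = [[tr[t] for tr, lab in zip(traces, labels) if lab == c]
                    for c in classes]
        signal = pvariance([fmean(col) for col in by_class])
        noise = fmean([pvariance(col) for col in by_class])
        snr.append(signal / noise)
    return snr

# Synthetic traces: 8 samples each, where only sample 3 depends on the index bit.
rng = random.Random(0)
labels = [rng.randrange(2) for _ in range(400)]
traces = []
for lab in labels:
    trace = [rng.gauss(0.0, 1.0) for _ in range(8)]
    trace[3] = lab + rng.gauss(0.0, 0.1)    # leaking sample
    traces.append(trace)

snr = snr_per_sample(traces, labels)
leaky_sample = max(range(len(snr)), key=snr.__getitem__)
print(leaky_sample, [round(x, 3) for x in snr])
```

On real traces the same computation would be run over the 30,000 samples preceding each multiplication, with the segment index as the label. Peaks in the resulting curve mark the samples where the index leaks.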
We see that there is information, so now the question is how to exploit it. I'm not going to go back over what has been said by my colleagues; there are many techniques available, and today what we want to use more and more is deep learning techniques. So here I'm just going to focus on the results. I don't have time to go into more detail. First, just for comparison, we tested classical template attacks. What is important to notice here is that, for the recovery of the index values, we extracted very few points, because to perform template attacks we need very, very short traces. So we extracted very few points and we computed the success rate to recover one bit of index for each Montgomery multiplication, and what we see is that we get a success rate of around 80% to recover one bit. If we do that with an MLP with a well-chosen architecture, I'm not going to enter into the details, you no longer need to extract points of interest: you give the full trace, meaning the 30,000 points, for the analysis, and then you observe a success rate of around 98% to recover one bit for each Montgomery multiplication. So you recover the exponent bit very precisely. And if you do that with a CNN, it's even better: you have a success rate of around 99%. I can just say that the same analysis has been done on the power consumption campaign, and in this context the labs also tested other machine learning techniques, but as you can see, the techniques based on the multilayer perceptron and on convolutional neural networks work better. For the attacks targeting the values of the operands, we have quite similar results. I'm going to go fast, but just to conclude: with template attacks, where we extract exactly the points of interest, meaning that we simplify the problem a lot, we have a success rate to recover the values of the operands of around 93%.
Whereas when we use a CNN, we get a success rate of 97%. Something I have to say is that it is very important, especially for the exponent bits, to recover each bit with very high accuracy, because if you recover, for instance, each bit with probability 0.9, which is good for one bit, then at the end, if you want to recover 1,000 bits of the exponent, you will not succeed. So it's very important to have something which is as close as possible to one, meaning that you want perfect recovery of each bit, which is almost the case here. So to conclude about this work, and you can find many more details in the paper, especially the models which have been used and the strategy which has been used to train the models. To conclude about this work, I can say that for us, so this is the position of ANSSI, we think that deep learning may be very efficient against secure RSA implementations, and not only RSA implementations but also elliptic curve cryptography, and maybe GCD processing, or any processing which takes a lot of time and where the information is small and hidden in a lot of time samples. What is also important to take from the presentation is that the selection of points of interest is less important for deep learning techniques than for template attacks. The deep learning techniques currently used are very basic, as also said by my colleagues previously, and we think that the attacks can be greatly improved. We are just, I would say, implementing the first chapters of the books dealing with deep learning. And the reported tests are for a toy implementation, so this is not an RSA evaluated in CC. We think that, of course, RSA implementations evaluated by labs and developed by, I would say, the good companies in the field should resist deep learning techniques better than this implementation, but I would say, take care. Thank you, and if you have questions, but we don't have time, I think.
That was a wonderful talk.
Unfortunately, we have to stick to the timeline, but I have learned a lot, thank you. Unfortunately, we don't have enough time to take questions, but I encourage you to take it offline and talk to Emmanuel if you have any questions regarding this nice study done by his team. Please join me in thanking all the speakers of this session, and that's it for this machine learning session. Thank you all for your attention. Thank you.