Hi, welcome to the presentation of our paper, "A Side-Channel Attack on a Masked IND-CCA Secure Saber KEM Implementation." I am Kalle from KTH, and today I have the privilege of presenting this work at CHES on behalf of my co-authors. Today, we will briefly introduce Saber and how masking works. Then we will describe the presented attack, show its results, and finish off the presentation with a key recovery demo.

Saber is a key encapsulation mechanism which leverages public-key cryptography to securely transmit a shared secret, such as an ephemeral symmetric key, which is then used for bulk encryption. It is one of the finalists in the NIST post-quantum standardization competition, which is one of the reasons why we selected it for evaluation. Saber belongs to the family of lattice-based cryptosystems. It relies on module learning with rounding, a modification of learning with errors in which the fuzziness comes from lopping a few bits off of each coefficient rather than sampling from an error distribution.

Masking is a common countermeasure against side-channel analysis. Say you have a code section that is known to be leaky, or that you would like to protect. With the masking countermeasure, you split your sensitive data into two shares for first-order masking: a random mask as one share, and the secret data XORed with this random mask as the other. You then execute the code section on both shares, and at the end, you XOR the results back together to get the true answer. The idea is that at no point during the code execution did you manipulate the original secret data directly. Therefore, anybody watching the power rails or listening to EM emissions does not directly get to see what you are working on.
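The split-and-recombine idea just described can be sketched in a few lines of Python. This is a toy illustration on single bytes with Boolean (XOR) masking, not the paper's implementation:

```python
import secrets

def mask_split(secret_byte):
    # First-order Boolean masking: one share is a fresh random mask,
    # the other is the secret XORed with that mask.
    mask = secrets.randbelow(256)
    return mask, secret_byte ^ mask

def mask_recombine(share0, share1):
    # XORing the shares cancels the mask and recovers the secret.
    return share0 ^ share1

share0, share1 = mask_split(0xA5)
# Neither share alone reveals 0xA5; together they do:
recovered = mask_recombine(share0, share1)
```

Each share on its own is uniformly random, which is why watching the device process either one reveals nothing about the secret.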
Masking is considered an effective but computationally expensive countermeasure, because we have essentially doubled the computational effort by executing the protected code section twice, once for each share.

Here we see the portion of the Saber key encapsulation mechanism that we are focusing on. K is the session, or ephemeral, key, whereas s is the long-term secret key. Our presented message recovery retrieves the message, from which we can derive the ephemeral key directly, but we will go one step further by combining it with a chosen-ciphertext attack to derive the long-term secret key. Our attack point, in red here, is in the Saber PKE decryption function on line 1 of the KEM decapsulation algorithm.

Our presented message recovery attack is a profiling attack, which consists of two stages: profiling and attack. During the profiling stage, since we have the public key of the device under attack, we can generate ciphertexts that contain a known message. We then send these to the device and ask it to decrypt them. This gives us a set of power traces with known message bits. Normally, with the known message bits as labels, we could train a neural network to recognize what the power trace looks like when the message bit is a 1 and when it is a 0. Easy peasy. But because the data has been randomly bit-flipped by the random mask, what looks like it should be a 1 may actually encode a 0, and other times a 1 may actually mean a 1. We just don't know. This is the power of masking. So in previous works, during profiling, you would need to explicitly know the mask being used, or you would disable the mask altogether, setting it to a constant 0 or some other known value. That way, you know what the bit value is while it is being processed and what the power trace looks like for that bit. With deep learning, on the other hand, from the attacker's point of view, we don't really care what the mask is.
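The profiling-stage collection loop described above can be sketched as follows. Here `encrypt_with_message` and `capture_decapsulation_trace` are hypothetical stand-ins for the real Saber encryption and scope-capture routines (replaced by seeded random data so the sketch is self-contained); the key point is that the labels are the chosen, unmasked message bits, and the mask is never needed:

```python
import numpy as np

def encrypt_with_message(pk, bits):
    # Hypothetical stand-in: build a ciphertext carrying `bits` under
    # the device's public key `pk`. Here: deterministic dummy data.
    rng = np.random.default_rng(int(bits.sum()))
    return rng.integers(0, 256, 32)

def capture_decapsulation_trace(ct):
    # Hypothetical stand-in: ask the device to decapsulate `ct` while the
    # scope records power samples covering both shares in one trace.
    rng = np.random.default_rng(int(ct.sum()))
    return rng.standard_normal(1000)

def build_profiling_set(pk, n_traces=1000, seed=0):
    rng = np.random.default_rng(seed)
    X, y = [], []
    for _ in range(n_traces):
        bits = rng.integers(0, 2, 256)       # known chosen message bits
        ct = encrypt_with_message(pk, bits)  # we picked the message ourselves
        X.append(capture_decapsulation_trace(ct))
        y.append(bits)                       # labels: unmasked bits
    return np.stack(X), np.stack(y)
```

The network is then trained on (trace, bit) pairs directly; no mask recovery or mask disabling step appears anywhere in the pipeline.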
We just show the neural network the entire trace, such that it contains both shares, of course, and tell it what the original unmasked bit is supposed to be. And of course we know it, since this is a chosen ciphertext; we picked it. We show it to the neural network and just tell it to figure it out. And believe it or not, it sees right through it. It figures out that one portion of the trace is related to the other, and it can derive the hidden bit value behind them. This is without us ever having to explicitly tell it what the mask is, even for a particular execution.

As I mentioned, previous methods require you to fix or modify the mask in order to profile. This generally removes the ability to profile on the device under attack: there are usually measures in place to prevent somebody from reflashing it with a program that just reads out the key with a printout function or something; reflashing it would erase the key or otherwise render the secret key storage inaccessible to the new program. This gives our method an advantage, because we can capture traces and train on the exact device under attack. This largely removes the device and process variations that would come from having to profile on a cloned device.

In previous attacks, you would have to perform the attack stage in three steps. First, you would recover the mask. Second, you would retrieve the second share, which is the secret combined with the mask. Finally, third, you would uncover the secret by combining both shares. This is of course in contrast with ours, for which we just show all the shares at once and the neural network spits out what it thinks the bit is going to be.

Traditionally in side-channel analysis, we would rely on the t-test to help us home in on the leakage points in the trace. But because this implementation is protected with masking, we won't find much, if anything, that passes the conventional statistical threshold.
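The t-test referred to here can be sketched as a pointwise Welch's t-test between two sets of traces, with the conventional TVLA threshold of |t| > 4.5 (a generic sketch, not the paper's tooling):

```python
import numpy as np

def welch_t(traces_a, traces_b):
    """Pointwise Welch's t-statistic between two sets of power traces.

    Each argument has shape (n_traces, n_samples). Samples where |t|
    exceeds roughly 4.5 are conventionally flagged as leaking; on a
    sound first-order masked implementation, few if any samples pass.
    """
    m_a, m_b = traces_a.mean(axis=0), traces_b.mean(axis=0)
    v_a = traces_a.var(axis=0, ddof=1) / len(traces_a)
    v_b = traces_b.var(axis=0, ddof=1) / len(traces_b)
    return (m_a - m_b) / np.sqrt(v_a + v_b)

# Two identically distributed trace sets: no sample should stand out.
rng = np.random.default_rng(1)
t = welch_t(rng.standard_normal((500, 100)), rng.standard_normal((500, 100)))
leaky_points = np.flatnonzero(np.abs(t) > 4.5)
```

In a real fixed-vs-random evaluation, `traces_a` would hold traces for a fixed input and `traces_b` traces for random inputs.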
In previous methods, one would just fix the mask, use the t-test to find the leakage points, and Bob's your uncle. But of course, we can't do that here. However, if you take a look at the averaged power traces, a very clean structure starts to emerge, and when you start counting the peaks and comparing the loop counts to the algorithm, you quickly figure out where you are. On this slide, you see the beginning of the decapsulation algorithm. The pseudocode is on the left over here. We can see a few functions: the long one here is poly_A2A, and following it is POL2MSG. As you zoom in, you can actually see when the loop and state registers reload and start processing the 32-byte message in POL2MSG. And indeed, if we zoom in further, you can see that there are 32 peaks for each share, each one representing one byte. Do keep in mind, though, that each byte looks the same to our eyes here only because it has been averaged; the peaks and valleys of the individual traces are much more varied.

As I mentioned before, POL2MSG was previously known, but for the masked implementation, we stumbled upon a second leakage point in poly_A2A, which is a custom masking primitive developed by the authors of this Saber implementation. The trace in front of you contains the processing of all 256 bits of the message. The leakage of both shares happens side by side, so that each interval between the spikes contains information on bit n of both shares. The regularity of the bits gave us the idea to cut across bits and take their union. This significantly reduces the trace acquisition time, because from one single trace containing, say, the poly_A2A function, we actually get 256 training intervals to show the neural network, reducing the acquisition time proportionally.
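The "cut across bits" idea can be sketched like this: given the first peak position and the regular spacing between peaks observed in the averaged trace, one captured trace is sliced into 256 per-bit training intervals (the concrete offsets and window width below are illustrative assumptions, not the paper's values):

```python
import numpy as np

def slice_per_bit(trace, first_peak, stride, n_bits=256, width=None):
    """Cut one long trace into n_bits equal intervals, one per message bit.

    first_peak : sample index where bit 0's interval starts
    stride     : samples between consecutive bit intervals (taken from
                 the regular peak spacing in the averaged trace)
    Returns an array of shape (n_bits, width).
    """
    width = width or stride
    starts = first_peak + stride * np.arange(n_bits)
    return np.stack([trace[s:s + width] for s in starts])

trace = np.arange(10_000, dtype=np.float32)  # stand-in for a captured trace
segs = slice_per_bit(trace, first_peak=100, stride=30)
# One trace now yields 256 training examples instead of one.
```

Since each interval carries the leakage of both shares of its bit, every interval is a complete training example on its own.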
So if you, say, needed a million examples to train the neural network properly, this would cut the acquisition by a factor of 256.

For the evaluation of our method, we deliberately sourced our boards from different suppliers to get an idea of the effect that process variation has. D1 and D2 we sourced from the same supplier, whereas D3 was from a different supplier. If we look at the laser etchings, D1 and D2 suggest that they were manufactured in a factory in China, whereas D3 was made in a factory in the Philippines. In the table here, in the rightmost column, you can see that the average accuracy does fall off as we get further away from the profiling device. Before I go to the next slide, though, I would like to bring your attention to the fact that for POL2MSG, the very last bit of every byte, bit 7 in this column, is the most difficult for the neural network to retrieve, and this pattern repeats itself across all 32 bytes.

These are the results for the poly_A2A leakage point, the second, previously undocumented one. The overall average is a bit lower, at 95% versus 97% on the last slide. But what is different and interesting for this leakage point is that if we look again at the first byte, only the first bit, bit 0, plays hard to get; there is no major difference in the other seven bits. This actually holds for all 256 bits except the very last one. So for this leakage point, out of all 256 bits, only the first and the last are a tad more difficult to retrieve, whereas for the leakage point on the last slide, we get a repeating pattern of every eighth bit not wanting to cooperate, but higher overall accuracy for the unafflicted bits. Of course, as you see by now, you are probably asking what would happen if we show both points to the neural network, as they are kind of complementary. And indeed, this is the case.
The combination actually increases accuracy across the board, and we get our best results, an overall average of about 98.6%.

As mentioned before, we will go one step further and combine the message recovery attack with a chosen-ciphertext attack to derive the long-term secret key. The chosen ciphertexts are constructed so that we can derive information about the long-term secret key coefficients from the recovered message. The new secret key recovery approach we developed is based on maps from error-correcting codes that can compensate for a small number of errors in the recovered message. The full details and decoding tables are in the paper for you to take a closer look at. But now we will show a quick five-minute demo of the full process, from gathering traces straight through to recovering the long-term secret key.

What we have here is the ChipWhisperer UFO platform. It consists of a modular carrier board that these blue target boards plug into. Over here is the ChipWhisperer-Lite, and for our purposes, it essentially acts as a cheap oscilloscope that helps us record the power fluctuations in the STM32F4 board as it processes its data. The board also helps relay communication between the host computer and the ARM core over USB. For example, we can send a ciphertext to its buffer and then follow up with a command to start deciphering it, all while we carefully watch, through the ChipWhisperer, the power fluctuations occurring over the shunt resistor. We have also sourced three target boards that have the same chip model number on them: B1, B2, and B3, which in the paper are D1, D2, and D3 respectively. B1 and B2 we sourced from the same supplier at the same time, whereas B3 over here was sourced from a different one. And if we take a look at the laser etchings on these chips, there is a closer picture of these in the paper.
B3, or D3 in the paper, has very different markings from the other ones, and a quick googling seems to suggest that it actually came from a different manufacturer, a different factory.

So we can begin the demo now. First, we can ask the ARM core to generate us a key pair. These are random, and as you can see, every time we run it, the result will be a little bit different. What we have also done is ask the ARM microcontroller to send a copy of the secret key to our computer, just so that we can follow along. Normally we wouldn't have to do this, but it makes the demo easier. Now that we have this, we can also generate the ciphertexts; we will choose the ECC option, which generates these 24 ciphertexts. Then we run the get-traces command with ECC enabled. What's happening in the background right now is that it is taking the ciphertexts we have just generated, sending them to the ARM core, and asking it to decrypt them, and while it is decrypting, the ChipWhisperer is carefully watching the power fluctuations on the voltage rail and recording them in a NumPy array, which we can later use to perform the attack. This process takes about a minute, so we can give it a moment. That actually took just under a minute. Now we have the traces, which you can see here; they are just in a NumPy array. We can use these traces along with these four files here, which are our neural network models. They have already been trained. So what we can do is show these traces to our neural network. Please excuse the warnings; I have deliberately removed one of the libraries to prevent the computer from using GPU acceleration. This is just to show that this is a very lightweight implementation that the CPU can easily handle. So what have we got here? These messages here are what the neural network thinks the traces represent.
Now, because we have the private key, we can do the true decryption of the ciphertexts and compare it against the messages we think they contain. As you can see, there are a few errors in here, and this is why we have the ECC: we can still do the key recovery, and it should work. All right, so this is what we have figured out the key supposedly is. According to this, there is actually one question mark: the ECC has identified that there is something that doesn't quite belong. It's coefficient 568, so it's approximately here. Yeah, here it is. This question mark means that we suspect all the other coefficients are correct, but this one seems a bit fishy. And as you can see here, the true coefficient is supposed to be a zero, but of course the algorithm doesn't know this. All we have to do is enumerate this one coefficient, and it has figured out the key. The key enumeration works exactly how you would think: because we have the public key, we can encrypt a known pattern, then decrypt it using our guesstimated key and see if it matches the known pattern, repeating for all the possible coefficient values. In this case, it says coefficient 568 is zero, which is exactly what we expect, as this portion here is supposed to be a zero. And welcome back.

To summarize, we have seen that the neural network is able to see through the masking: during training, it figures out that the shares are related to each other, and it can derive the correct bit. We then combined this with a secret key recovery based on maps from error-correcting codes, which ultimately let us successfully recover the long-term secret key from a first-order masked Saber implementation. Our future work includes evaluating higher-order masked implementations. And we very recently showed that first-order masking combined with the shuffling countermeasure can also be broken.
A preprint can already be found on the ePrint archive, with the final version being published later in November this year. Our longer-term goal is to design countermeasures against our own attack.
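As a footnote, the key-enumeration step from the demo can be sketched generically. Here `pke_encrypt` and `pke_decrypt` are hypothetical stand-ins for the real Saber PKE routines (the toy XOR versions below exist only to make the sketch runnable), and the idea is exactly as described: encrypt a known pattern under the public key, then try each candidate value in the suspect coefficient position until decryption with the guessed key reproduces the pattern:

```python
def enumerate_coefficient(pk, key_guess, index, candidates,
                          pke_encrypt, pke_decrypt):
    """Resolve one coefficient flagged as suspect by the ECC decoder.

    Tries every candidate value at `index` in the guessed key until
    decrypting a known-plaintext ciphertext yields the known pattern.
    """
    pattern = b"known test pattern"
    ct = pke_encrypt(pk, pattern)
    for c in candidates:
        trial = list(key_guess)
        trial[index] = c
        if pke_decrypt(trial, ct) == pattern:
            return c
    return None  # no candidate matched

# Toy stand-ins: XOR "encryption" keyed by the first coefficient,
# so the public and secret key coincide (unlike real Saber).
def toy_enc(pk, msg):
    return bytes(b ^ pk[0] for b in msg)

def toy_dec(sk, ct):
    return bytes(b ^ sk[0] for b in ct)

# True toy key is [7]; our guess has a wrong value in position 0.
found = enumerate_coefficient([7], [0], 0, range(16), toy_enc, toy_dec)
```

For Saber the candidate set per coefficient is tiny, which is why enumerating a single flagged position, as in the demo, is essentially free.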