Thank you all for coming to my presentation. I'm Pieter Robyns, a PhD student from Hasselt University working on wireless security and privacy. In this presentation I'll talk about performing low-cost electromagnetic side-channel attacks using RTL-SDR and neural networks, which is what one of my students likes to call black magic. So I hope you guys will enjoy it.

First of all, I'd like to give you an idea of my motivation for putting this talk together. When I first started out doing this kind of stuff, about two to three years ago, information about performing side-channel attacks using SDR was quite scarce. You had a few academic papers, and I think there was also one PhD thesis that used SDRs to perform these kinds of attacks. But unfortunately, the code used in these projects is very often closed source. One exception is ChipWhisperer, which is an open source hardware and software platform for performing side-channel attacks. But it's more focused on power side-channel attacks, and it also uses custom hardware. So what I want to do in this talk is give you an idea of how to get started using very cheap hardware like an RTL-SDR, and also using open source software. In this presentation we'll use the EMMA framework, which is something I developed for over a year during my PhD. As a next step, we'll use some machine learning to improve existing side-channel attacks.

Now, for those of you who don't know what side channels are all about: when your hardware is performing computations, there is obviously some current going through a conductor, which will in turn emit an electromagnetic wave. This is just the way physics works, and the amplitude of this emitted electromagnetic wave turns out to be proportional to the power that is being consumed. And obviously some computations require more power than others.
If you're mining bitcoins, your computer will probably be consuming a lot more power, whereas if it's just idling, it will probably not consume so much. Electromagnetic side-channel attacks try to do the reverse process: based on the observed electromagnetic radiation, they try to infer what was being computed on the chip. Some interesting examples are encryption algorithms, where you would want to find out what the secret key was, or even just pressing a key on your keyboard, because this will also produce some current in your keyboard and this will be observable in the electromagnetic spectrum. And lastly, memory reads or writes. Those are all interesting things that you could observe as an attacker.

If you're interested in this kind of stuff, there are some previous papers that discuss sniffing keystrokes from a keyboard, RSA and ElGamal key extraction, or even, in one of the first papers, just reading the raw emanations from a CRT or LCD screen.

Now if you look at the typical attack scenario, what you have is two entities. You have an attacker here and a device under attack, which in this presentation will be an Arduino device. What happens is that the attacker sends a plaintext to encrypt. Then the Arduino does the encryption and calculates the ciphertext, but in the process it will also inadvertently leak some EM radiation during its computations. The attacker can capture that signal and try to infer the encryption key that was used through a statistical analysis.

One of these attacks that is used in practice is called correlation electromagnetic analysis (CEMA), and we'll see how it can be applied to AES. As a first step, it's a good idea to take a look at the algorithm and see where the secret key is being used.
For AES this is pretty trivial, because immediately in the first stage of the algorithm you can see an AddRoundKey stage, which takes the key and the plaintext as inputs and just XORs them together. Then it goes to the SubBytes stage, where you have a substitution: one byte is substituted by another using something called an S-box, which is really just a lookup table. And it turns out that when you load this value into a register, it will leak some information.

Now what exactly is leaking in this case? If we have a CPU register in a known state R and we do the AddRoundKey and SubBytes steps, then the register will be updated with D, which is equal to the S-box of the plaintext XORed with the key, which is just those two stages performed. I'll use S to denote the index of the key byte. For AES-128 we have 16 key bytes, so S can be 0 to 15 or 1 to 16 depending on your convention.

Then we have some power consumed, and this will depend on the number of bit flips. One way to calculate that is to take the Hamming distance between R and D. For example, if R is equal to this value and D is equal to that value, we XOR them together and count the ones, and we get a Hamming distance of four because we have four bit flips. In practice you often have R equal to zero because the registers are initialized to zero, so then you can use the Hamming weight, which is just a complicated word for counting the number of ones in a byte.

Okay, now comes the somewhat harder part, because now we're going to simulate what the leakage would be if the key is, for example, zero.
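The bit-flip leakage model described above can be sketched in a few lines of Python. This is only an illustration: the two register values are hypothetical and were picked so that the result matches the four bit flips mentioned in the talk.

```python
def hamming_weight(value: int) -> int:
    """Count the number of 1 bits in a non-negative integer."""
    return bin(value).count("1")


def hamming_distance(r: int, d: int) -> int:
    """Number of bit flips when a register goes from state r to state d."""
    return hamming_weight(r ^ d)


# Hypothetical register states, chosen only for illustration.
r = 0b10110100
d = 0b10010011

print(hamming_distance(r, d))  # 4

# If the register starts at zero, the distance reduces to the Hamming weight.
print(hamming_distance(0, d) == hamming_weight(d))  # True
```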
So we're going to take multiple encryption operations: we send multiple plaintexts to the Arduino and keep the results of those encryptions. If we assume that the key is zero, then we model what the leakage would look like using our power consumption model from the previous slide. We take all these Hamming weights and put them in one big matrix, with one entry for each of the M encryptions you see here. And we repeat that for each key byte S, so that we have a simulation for all the key bytes in the AES key.

Then as a final step, of course, we still need to do our real measurements, and then we're going to correlate our measurements with each of the models that we just constructed. If you capture an EM signal, you have a signal x with a time index, so x_t, because you have for example 10,000 samples. We're going to check each individual sample and see which one has a very high correlation with any one of the models from the previous slide. To do that we can use any metric, but what's used in the CEMA attack is the Pearson correlation, which gives you a negative or positive linear correlation, which is why it's called a correlation attack.

Okay, so this was all about the theory, and now we're ready to apply it in practice. I did a little experiment for FOSDEM on an AES implementation running on an Arduino, so let's take a look at how that went.

If you want to do something like this yourself, there are usually a few steps that you have to go through. One is the measurement setup, so you have to set up your measurement equipment correctly. Then you have to identify which frequencies leak from the device. Then we're going to actually capture the leakage traces using an RTL-SDR. And then we're going to perform a standard CEMA attack on AES. Finally, we'll see how we can use machine learning to improve this attack further.
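To make the theory above concrete, here is a minimal pure-Python sketch of the CEMA idea on a single key byte. Everything in it is an assumption for illustration: the traces are simulated (Gaussian noise plus a Hamming-weight leak at one sample), the function names are hypothetical, and none of this is EMMA's code. The S-box is derived from its mathematical definition rather than typed in as a table.

```python
import random


def gmul(a: int, b: int) -> int:
    """Multiply two bytes in the AES finite field GF(2^8)."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B
        b >>= 1
    return p


def rotl8(x: int, n: int) -> int:
    """Rotate a byte left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def build_sbox() -> list:
    """AES S-box: multiplicative inverse followed by the affine transform."""
    sbox = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if gmul(x, y) == 1), 0)
        sbox.append(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2)
                    ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63)
    return sbox


SBOX = build_sbox()
HW = [bin(v).count("1") for v in range(256)]


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5 if vx and vy else 0.0


def simulate_traces(key_byte, plaintexts, rng, n_samples=10, leak_at=4):
    """Simulated traces: noise everywhere, Hamming-weight leak at one sample."""
    traces = []
    for p in plaintexts:
        trace = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        trace[leak_at] += HW[SBOX[p ^ key_byte]]
        traces.append(trace)
    return traces


def cema_byte(plaintexts, traces):
    """Return the key guess whose model correlates best with any sample."""
    columns = [[t[i] for t in traces] for i in range(len(traces[0]))]
    best_guess, best_corr = 0, -1.0
    for guess in range(256):
        model = [HW[SBOX[p ^ guess]] for p in plaintexts]
        corr = max(abs(pearson(model, col)) for col in columns)
        if corr > best_corr:
            best_guess, best_corr = guess, corr
    return best_guess, best_corr


rng = random.Random(0)           # fixed seed so the sketch is repeatable
secret = 0xB1                    # illustrative key byte
plaintexts = [rng.randrange(256) for _ in range(300)]
traces = simulate_traces(secret, plaintexts, rng)
guess, corr = cema_byte(plaintexts, traces)
print(hex(guess), round(corr, 2))
```

Real captures are far noisier than this simulation, which is why the attack in the talk needs tens of thousands of traces rather than a few hundred.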
Okay, so for the measurement setup, we already know our target is the Arduino, and we're going to assume that there's some software AES implementation running on the device. What the victim has done is upload a program to the Arduino with a static key, so this one is fixed. The attacker can only send plaintexts to the device, and the attacker can also measure the electromagnetic radiation that is emitted by the device. For that we'll use the RTL-SDR as our capture device. We'll also use an EM probe, which is just a special kind of antenna for capturing this kind of electromagnetic radiation, but you could in principle also use a directional antenna of some sort. We're also going to use an amplifier, which is convenient because you guys all got an amplifier today, so that's nice. And then we can use a laptop and GNU Radio for some signal processing.

Okay, so time to position the probe. I found that a good location to do this is near the VCC and ground pins, because presumably there's more current going through there, and this gives us a better quality signal. You can see the amplifier that I used is quite expensive, but I think that you can use a cheaper one. If you're not sure, then it's of course better to use a more expensive one. And then you have the RTL-SDR, which is only 20 euros. But anyway, I think it will also work without the amplifier. Oh, and if you don't know which pins to look for, you just look up the pinout diagram and check it out.

Okay, so now it's time to use your favorite frequency analyzer or spectrum analyzer to look for leaking frequencies. If the Arduino is idling, then you get something like this. And when we send some random plaintext to encrypt at regular intervals, then obviously something is changing. We can see some new frequencies appearing here. That's interesting, because these frequencies might leak some information about the secret key being used. And if we zoom in, it's even clearer.
So we don't need a lot of bandwidth in order to observe leakage occurring.

Okay, so now we need to actually capture the traces. So far we just used Gqrx to look at the random encryptions, but we still need to capture them and store them in a data set. For that I developed a small tool called emcap. It's part of the EMMA framework and it's basically just a wrapper around GNU Radio blocks. You can specify a sample rate to capture your traces. You can specify a center frequency, which is the same one that I used here. And then you can specify the number of traces that you want and an output directory, which here is just the FOSDEM Arduino test directory. What emcap will then do is automatically start capturing, send a serial command to the Arduino to perform an encryption, and then automatically stop the capture. If we instruct the target to perform random encryptions with this static key, then we have a data set in our FOSDEM Arduino test directory. This is the key that the attacker has to guess, so it is unknown to us.

Okay, now we can use EMMA to plot the data. In EMMA you can specify multiple commands as a string, and EMMA will execute those commands sequentially. As a first command I gave the abs command, which just AM-demodulates the data, because we are interested in changes in the amplitude of the signal, so it makes sense to use AM demodulation in that case. And there's something interesting going on between 10,000 and 20,000 samples, as you can see, because here the amplitude rises, right?

But there's a problem. If you remember from the CEMA theory slides, the encryption operations need to be aligned, because we're going to select a single point in this trace and use that to correlate with our model. But the encryption operations in this case are not aligned, because sometimes the capture will start a bit later or stop a bit earlier, like in this case.
So we still need to perform an alignment. In EMMA this is implemented as a simple cross-correlation. You can give the align command, and after that I also applied a filter just to remove some of that high frequency noise. And now some interesting patterns start to emerge, because now we can really see what the AES algorithm is doing: this section here corresponds to the AES init function, and this section is really the AES encryption function. If you look closely, you can count 10 peaks, which correspond to the 10 rounds that AES is executing.

So now we're ready to actually perform our attack. If you implement the CEMA attack that I discussed earlier (you can also do this with EMMA), then after 51,200 traces you will get a table of correlation results, and the key byte with the highest correlation will be listed at the top. In this case it correctly predicted that the first byte is B1. However, for the second byte it incorrectly thought that the key byte was 52, whereas we really used D3. So there's clearly some room for improvement, and let's see if we can do that using machine learning.

There were some issues with the classical approach, namely that it only uses a single point. That doesn't really sound good, because there could be multiple points in the trace that leak information, but we're only using one, and that's the one with the highest correlation. So let's see if we can fix that. Oh yeah, you also need to align the traces, which is also inconvenient.

It seems that machine learning and deep learning are a good solution to that problem, because what those algorithms really do, if you consider a signal as a 1D image, is automatically extract all the features from that image and try to make an accurate prediction, and they can do that using very complicated filters or complicated features. So this seems like a good thing to do.
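Going back to the preprocessing described above (amplitude demodulation followed by alignment through cross-correlation), the idea can be sketched as follows. This is a toy illustration with hypothetical function names and made-up data, not EMMA's implementation; a real pipeline would typically also remove the mean and normalize each trace before correlating.

```python
def am_demodulate(iq_samples):
    """Amplitude demodulation: the magnitude of each complex I/Q sample."""
    return [abs(s) for s in iq_samples]


def best_shift(reference, trace, max_shift):
    """Shift (in samples) that maximizes the cross-correlation with reference."""
    best, best_score = 0, float("-inf")
    for shift in range(-max_shift, max_shift + 1):
        score = 0.0
        for i, ref in enumerate(reference):
            j = i + shift
            if 0 <= j < len(trace):
                score += ref * trace[j]
        if score > best_score:
            best, best_score = shift, score
    return best


def align(reference, trace, max_shift):
    """Return the trace shifted to line up with reference, zero padded."""
    shift = best_shift(reference, trace, max_shift)
    n = len(trace)
    return [trace[i + shift] if 0 <= i + shift < n else 0.0 for i in range(n)]


# Toy data: the same pulse, but the second capture starts three samples late.
pulse = [0.0, 1.0, 3.0, 1.0, 0.0]
reference = [0.0] * 5 + pulse + [0.0] * 10
late = [0.0] * 8 + pulse + [0.0] * 7

iq = [complex(v, 0.0) for v in late]  # stand-in for raw I/Q samples
aligned = align(reference, am_demodulate(iq), max_shift=5)
print(aligned == reference)  # True
```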
And indeed, in 2017 there was a paper by Prouff et al. that showed that this is possible, and there was similar work at Black Hat 2018 where they basically just take your standard image classification neural network and apply it to side channels, and it actually works. So that was quite cool to see.

But is this really the best approach? If we compare images to what the input data for electromagnetic radiation looks like, we can see that they are very different. In images you have classes that are easily observable by humans. If we get a class corresponding to a car, for example, we get one image and we can see that it's a car, right? But for EM traces that's not the case, because if you look at the different traces that I captured, each class looks very similar to the others. There's only a very, very small difference in amplitude between the classes, which are the key bytes in this case. So maybe there's a better approach and we can do it in another way.

Here's what you can do. Let's assume that we have an artificial encoded trace, which we call ŷ_t, and we're going to construct it from the original traces. We're essentially going to combine the information from multiple points x_t of the original trace, using activation functions or linear functions, it doesn't really matter, and the neural network is going to learn that automatically for us. The goal of this encoded trace is of course to approximate the true value of the leakage, which is equal to the Hamming weight of the S-box result.

So how do we determine which weights to allocate to these samples and how to combine them? Well, we can just optimize any neural network architecture. We can use, for example, a multi-layer perceptron, or a convolutional neural network if you have some time-shifted data. The architecture doesn't really matter, as long as you end up with these encoded samples that combine the information from the original trace.
Maybe also interesting to note is that in this case I used 16 samples for the encoded trace, where the first sample should contain all the information that's required for guessing the first key byte, and the last one for the last key byte.

Okay, so there's one more thing to do, and that's to tell the neural network how to optimize, or rather how to penalize certain weights, so how to determine which weights are good and which are bad. For that we can use a loss function. Since we want to maximize the correlation, what you want to do is take one minus the correlation. In that case, if you have a perfect negative correlation, the loss will be two. If you have no correlation, the loss will be one, and if you have a perfect positive correlation, your loss will be zero, which is exactly what we want. Then as a general cost function we take the sum of those 16 loss functions, one for each byte, and we have our final cost function that we can optimize.

If you implement this using TensorFlow, then you don't have to manually calculate the gradients to update the weights. They are calculated automatically, and for the optimizer you can use something standard like RMSprop or Adam, it doesn't really matter.

That's something you can also do with EMMA. In EMMA there are some modules that allow you to automatically train on the input data. What we're going to do is generate a new dataset of completely random but known encryptions. So in this case we know what the plaintext is and we know what the ciphertext is. The only goal of this dataset is to let the network learn what the relationship is between the original samples and the encoded samples. This can be done using the corrtrain command. And this is then what the neural network will see: you have the AES init section right here and you have the AES encryption rounds here.
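The loss described above (one minus the correlation per key byte, summed over the 16 bytes) can be illustrated in plain Python. This sketch only evaluates the loss on small hand-made vectors; in the talk the loss is implemented in TensorFlow so that an optimizer can minimize it, which is not shown here. The function names and the example values are hypothetical.

```python
def pearson(xs, ys) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5 if vx and vy else 0.0


def correlation_loss(encoded, leakage) -> float:
    """Loss for one key byte: 0 at correlation +1, 1 at 0, 2 at -1."""
    return 1.0 - pearson(encoded, leakage)


def total_cost(encoded_per_byte, leakage_per_byte) -> float:
    """Sum of the per-byte losses (16 terms for AES-128)."""
    return sum(correlation_loss(e, l)
               for e, l in zip(encoded_per_byte, leakage_per_byte))


# Hypothetical Hamming weights for four encryptions of one key byte.
leak = [1, 2, 3, 4]

print(correlation_loss([2, 4, 6, 8], leak))      # close to 0: positive correlation
print(correlation_loss([1, -1, -1, 1], leak))    # close to 1: no correlation
print(correlation_loss([-1, -2, -3, -4], leak))  # close to 2: negative correlation
print(total_cost([leak] * 16, [leak] * 16))      # close to 0 for a perfect encoder
```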
So if you train your neural network correctly, what you should expect is that there are some important samples in the first round of AES, because that's where the key is used. One way to verify that is to visualize the saliency after learning, which is just another way of asking: if I change some value of the input, which inputs will have the biggest effect on the output? For the first key byte this seems to be okay, because we have a few samples here around the first round of AES that were learned automatically by the neural network. The same goes for the seventh key byte, except that we have another leaking point here. And then for the twelfth key byte we have another few leakage points.

So now we're ready to combine all these leakage points together and obtain our encoded trace by putting the original trace through the neural network. If we then execute our attack again, we'll end up with 51,200 encoded traces, which just combine the information from the original ones, and we can run a standard CEMA attack once again. In this case the algorithm was able to correctly determine the entire key, so we don't have to do anything more than that, and we clearly improved upon the original attack.

If you find that interesting, there is also another test that I did in my paper, which I will present at CHES this year. There I also make a comparison with some state-of-the-art techniques based on convolutional neural networks, where they used a 19-layer CNN for doing essentially the same attack. And it turns out that this technique using the encoded traces works quite a bit better. So feel free to check it out if you found that interesting.

So in conclusion, we've seen how spurious EM emanations can leak information about the state of the device. We've performed a CEMA attack using a low-cost RTL-SDR, and we've shown that it is feasible against an Arduino running software AES.
And we've used neural networks to improve upon this and essentially extract the knowledge from those traces to improve the CEMA attack. So it looks like I finished well before my time. Thank you very much, and if you have any questions, feel free to ask them. Also, if you're interested in checking out EMMA, it's open source on GitHub. I also put the data sets that I used for the presentation online, so you can download them if you want to.

Is the training happening online or offline? The training is happening offline, but you can also specify that you want to do it online. There's an option for that in EMMA. It will then just stream the samples over UDP and train on a small batch, so you can train it online.

Am I assuming that the key is fixed? Yes. For this case, if you want to attack a certain key, then you have to assume that the key remains static for each of the encryptions.

Okay, so how to protect against this? What you could do is of course shield your device so that no electromagnetic leakage can occur. There are also some software-based countermeasures. For example, at certain points in the algorithm you can mask the sensitive values by XORing them with random values, and then later on in the algorithm reverse that operation to get a correct result. But it turns out, at least in the implementation that I tested in my paper, that if you use techniques like neural networks, they will be able to find the relationship between those two points, because you just shift the problem from having only one point to investigate to having two points to investigate: you have two leakages that are important and you somehow have to combine them. So maybe there are software-based countermeasures that work, but in any case you have to think about that and test it in order to be sure.
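As a rough illustration of the masking countermeasure mentioned in the answer above, the following sketch masks a value with a random byte, performs the key addition on the masked value, and removes the mask afterwards. The function names and byte values are hypothetical. It only covers the linear XOR step; masking the non-linear S-box lookup requires additional machinery (for example a recomputed masked table) that is not shown here.

```python
import secrets


def masked_add_round_key(plaintext_byte: int, key_byte: int):
    """XOR the key into a randomly masked plaintext byte.

    Returns the masked state and the mask. The unmasked intermediate
    value never appears in a variable on its own.
    """
    mask = secrets.randbelow(256)           # fresh random mask per encryption
    masked_plain = plaintext_byte ^ mask    # value the device actually handles
    masked_state = masked_plain ^ key_byte  # still hidden behind the mask
    return masked_state, mask


def unmask(masked_state: int, mask: int) -> int:
    """Reverse the masking to obtain the correct result."""
    return masked_state ^ mask


# Illustrative byte values only.
state, mask = masked_add_round_key(0x32, 0xB1)
print(unmask(state, mask) == (0x32 ^ 0xB1))  # True
```

As noted in the talk, both the mask and the masked value leak, so an attack that can combine two leakage points may still succeed against this kind of scheme.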
So the value of the correlation, because for example, our model was exactly 180 degrees. So you also had some DPA stuff and you took the absolute value of the correlation. Well, that's something that you would do in a standard CEMA attack, of course, because there could indeed also be a negative correlation between your observations and the models. But if you're optimizing a machine learning algorithm, then you don't have to do that, because the optimizer will automatically set the weights so that you always have a positive correlation; it will just invert everything. So, yeah, I hope that answers your question.

Have you tried doing this attack multiple times for different kinds of keys, and if so, did you have to train it again? Have I tried it on multiple keys? Yes, of course, it works. The training also happens on random keys, so the test key that I used here is a key that the network never observed. So you train on the random keys and then you evaluate on the unknown key.

You always manage to get the full reconstruction? Yes. Sometimes, if your alignment fails or if you have a very noisy signal, it could be that you require more traces, for example. Adding noise means that you need more traces in your test set.

How well does this generalize? Have you tried learning on one Raspberry Pi and then doing the attack on a second one? That's a very good question. I didn't test whether that works, but intuitively it should, because the devices should be similar, so we would expect that they leak in a similar way. But that's just a guess, I would have to try that.

Can you speak up, please? Okay. Yes, exactly, that's true. So the question was: the Arduino is a pretty simple device, how would you extend that to other devices?
This is an active topic in my research. Actually, I'm trying to do similar attacks on the Raspberry Pi and also on more complicated algorithms. This was just for demonstration purposes, because when I first started out I had a lot of trouble finding the information that I needed to get started, and hopefully this presentation will help you with that. But yes, if you use multi-core CPUs and more complicated devices, then those attacks will be much, much harder.

Okay, so what about hardware AES? I personally didn't test that, but there are some papers that also attack hardware AES, and it seems to be feasible. As long as you have some leakage, then yes, you can do that.

You're doing 70 megahertz, but have you seen how far it radiates with a better antenna? So how far does it radiate? In my experience, not that far, because I tried to do it with a directional antenna, and I think you have to amplify the signal quite a bit in order to get usable results. In any case, I just did it with a probe, and that works. But from a long range, well, I don't know what you precisely define as long range. Some people have been able to capture keystrokes through walls, but that requires a specialized antenna and a powerful amplifier. So it kind of depends.

Is this about securing the device? So what is my major concern? Well, I think it's just fascinating that just by observing the electromagnetic waves emitted by a chip, you can derive sensitive information like secret keys. So obviously, if you could improve this attack so that you only require, let's say, one trace or three traces, then this would be a very big threat to the security world. And especially if you could do it at long range, that would be concerning, I guess. Okay, thank you.