So, how many of you have trained a machine learning model before? How many of you know what TCP and UDP are? Great. And how many of you know what DoS attacks are? Okay, cool. So, hi, I'm Deborah. Thank you for being here. Today we'll talk about training a deep learning model to detect DoS attacks on microcontrollers. Okay, let's start.
There are a lot of microcontrollers out there. For this research we use the ESP32. It's an MCU, a microcontroller unit, and it's widely used in IoT projects for a lot of reasons. First, I think, because it's affordable: in Brazil it costs 30 reais, which I think is around $5, so it's not very expensive. It has a very good development framework and a large community. And the MCU itself has a lot of nice things, but for me the nicest is that it has Wi-Fi and Bluetooth connectivity on the chip. You don't need another MCU to get this connectivity, and that's really awesome.
So it's affordable, and yet I don't see many IoT projects, mainly in Brazil, where I live. My thinking used to be that to implement an IoT project you need a lot of money and things like that. But when I found out about the ESP32, I realized it is not very expensive. So when I saw the device I thought, okay, I'll play with it to make people aware that it exists and that we can build nice things with it.
Cool. What's the project, then? IoT devices are easy targets for cyber attacks for a lot of reasons, but mainly because of their poor update cycles. Our cell phones today have a lot of computational power and are connected to the internet all the time, so it's easy to update their software. But these devices don't have many computational resources, and they are out there in the field, so it's hard to update them. Here we are focusing on one type of attack, volumetric attacks, specifically DoS attacks, and we will talk more about this later.
DoS attacks are when the attacker sends a lot of packets to try to interrupt the system, to make it stop working. Today I'll talk about three main challenges in this project and tell a little of the story of how it went.
The main one was: do the features of my training dataset match the features in my production setting? The dataset I was using is the CIC IoT Dataset 2022, created by the Canadian Institute for Cybersecurity. It's a state-of-the-art dataset for these things, and the nice thing is that it's built specifically for IoT devices; it's not easy to find datasets for this. If you look at the title of the talk, it says DDoS attacks. That's because at first I was using a dataset for DDoS attacks, but it was very old and it didn't have legitimate traffic for me to work with. This dataset is newer and it has both attack packets and legitimate packets, so I went for DoS attacks.
To communicate with other network devices, the ESP32 implements the TCP/IP model, which has a lot of protocols, a lot of letters and names. For that it uses lwIP, an implementation of the TCP/IP protocols for microcontrollers. The model has four layers, and lwIP implements all four of them.
So the dataset has pcaps. Pcaps are the files that network analyzers, packet-capture programs, use to store the data. We can take these pcaps and extract the features. In my research I extracted 10 features, and I extracted them to CSV files. The problem was: okay, I have these features in my training dataset, but I need to get the same features inside the ESP32. That was a challenge, because it was my first time working with this stack, the lwIP stack, and it was very hard to find out how to get these features there, mainly because there's no documentation on the lwIP stack. But to my surprise, ChatGPT was really helpful. I was skeptical, but it really worked. I could ask the model and it would give me the answer. Of course, I had to make some tweaks and things like that, but it was really helpful.
Also, the work in [1], which I'll show in the references, is by some Brazilian researchers who did the same thing I was trying to do, and they open sourced the code on GitHub. Without them I wouldn't have been able to advance the research this far. It's really nice: they did the work, they open sourced the code, and now I can build something on top of it. That's amazing, and that's why I love open source. So that was the main challenge, done.
How much time do I have? Okay. The next challenge was: how do I split the data into training and test sets? The dataset has legitimate traffic, real traffic, and it was captured by day: they had a room with a lot of IoT devices, and they captured the network packets.
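A minimal sketch of the per-packet feature extraction described above, assuming each packet is available as raw IPv4 bytes. The function names and the feature names (`ip_len`, `ip_ttl`, `tcp_syn`, and so on) are hypothetical: the talk's actual 10 features are not listed in this transcript, and on the device the same fields would be read inside lwIP rather than in Python.

```python
import struct

IPPROTO_TCP = 6
IPPROTO_UDP = 17


def packet_features(ip_packet: bytes) -> dict:
    """Return a few raw header fields of one IPv4 packet (hypothetical feature set)."""
    (ver_ihl, _tos, total_len, _ident, _frag, ttl, proto,
     _checksum, _src, _dst) = struct.unpack("!BBHHHBBH4s4s", ip_packet[:20])
    ip_header_len = (ver_ihl & 0x0F) * 4  # IHL is counted in 32-bit words

    # TCP features default to zero, so non-TCP packets keep them at zero.
    features = {"ip_len": total_len, "ip_ttl": ttl, "ip_proto": proto,
                "tcp_syn": 0, "tcp_ack": 0, "tcp_window": 0}

    if proto == IPPROTO_TCP and len(ip_packet) >= ip_header_len + 20:
        tcp = ip_packet[ip_header_len:ip_header_len + 20]
        (_sport, _dport, _seq, _ack, _offset, flags, window,
         _tcp_checksum, _urgent) = struct.unpack("!HHLLBBHHH", tcp)
        features["tcp_syn"] = 1 if flags & 0x02 else 0  # SYN bit
        features["tcp_ack"] = 1 if flags & 0x10 else 0  # ACK bit
        features["tcp_window"] = window
    return features


def sample_packet(proto: int) -> bytes:
    """Build a tiny synthetic packet for the demo (made-up addresses and ports)."""
    if proto == IPPROTO_TCP:
        payload = struct.pack("!HHLLBBHHH", 1234, 80, 0, 0, 5 << 4, 0x02, 1024, 0, 0)
    else:
        payload = struct.pack("!HHHH", 1234, 5005, 8, 0)  # bare UDP header
    header = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + len(payload), 1, 0, 64,
                         proto, 0, bytes([192, 168, 0, 2]), bytes([192, 168, 0, 1]))
    return header + payload


print(packet_features(sample_packet(IPPROTO_TCP)))
print(packet_features(sample_packet(IPPROTO_UDP)))
```

Each dictionary corresponds to one CSV row; the point of the sketch is only that every feature comes from a single packet's headers, so the same values can be computed on the device.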
For the attacks, they only have pcaps containing attack packets only. The attacks were carried out over three protocols: HTTP, UDP, and TCP. Our goal in machine learning is always to create training and test datasets that resemble the real setting: when the model is out there, will it receive the same kind of data? I have three pcaps for each device, so I thought, okay, I could use two for training and one for testing. But it was not that simple, because I couldn't just concatenate the pcaps. I will explain.
Here are examples. These pcaps of legitimate traffic have only real data, no attacks in the network packets. And here, for the Atomi coffee maker IoT device and the HTTP protocol, we have three pcaps, and each pcap has only attack packets. Our goal is always to create datasets that resemble the data the model will see live. After a lot of thinking and back and forth, we came to a solution. I don't know if it's the best one, but it's something, and I will explain it quickly.
The green ones are the legitimate packets, and the red ones are the attack packets. The numbers are the timestamps of the packets. First, we pick a random legitimate packet; here I picked packet 2. Then we add the timestamp of that random packet to the timestamps of the attack packets. Then we concatenate them, and finally we sort them.
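The four steps above can be sketched as follows. This is a minimal illustration, not the project's code: `mix_traffic` is a hypothetical name, packets are reduced to `(timestamp, label)` pairs, and rebasing the attack timestamps so that the first attack packet starts at zero before adding the offset is an assumption.

```python
import random


def mix_traffic(legit, attack, rng=None):
    """Shift an attack capture into a legitimate capture and merge by time.

    legit, attack: lists of (timestamp, label) pairs, label 0 = legitimate, 1 = attack.
    """
    rng = rng or random.Random()

    # 1. Pick a random legitimate packet; its timestamp is the attack start.
    anchor_ts, _ = rng.choice(legit)

    # 2. Add that timestamp to the attack packets
    #    (assumption: attack times are first rebased to start at zero).
    first_attack_ts = min(ts for ts, _ in attack)
    shifted = [(ts - first_attack_ts + anchor_ts, label) for ts, label in attack]

    # 3. Concatenate, then 4. sort by timestamp.
    return sorted(legit + shifted, key=lambda packet: packet[0])


legit = [(10.0, 0), (10.4, 0), (11.2, 0), (12.0, 0)]
attack = [(0.00, 1), (0.05, 1), (0.10, 1), (0.90, 1)]
for ts, label in mix_traffic(legit, attack, random.Random(42)):
    print(f"{ts:7.2f}  {'attack' if label else 'legit'}")
```

With fractional timestamps the attack packets form bursts with legitimate packets in between, rather than one block of attacks followed by one block of legitimate traffic.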
Here we don't see the main point of this technique, because we are working with integers, without the digits after the decimal point. But in the real captures the packets come very fast, so we will see blocks of attack packets with legitimate traffic packets between them. This is ideal because it is what happens in real life: when you're under attack, you don't see only the attack packets, you also see the real traffic going on. And yeah, this is how we generated the training datasets.
The last challenge was learning C/C++. I'm a Python developer, and I thought, okay, I've worked with Python for a long time, it will be fine. For the most part it was fine, but there were some things that were really hard, even to search for. I thought I would have had a lot of time today, but here I am, bringing ChatGPT to the conference, and the model helped me here too. For example, here we have a Boolean expression. What do you think this will do? What will be the result of this code? Zero or one, right, if they are equal? No, wrong.
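A sketch of why the answer is not simply zero or one, written in Python, where `&` on integers is bitwise just as in C. The slide's exact expression is not in this transcript, so the mask value `0x02` (the SYN bit of the standard TCP flags byte) and the function names are assumptions chosen for illustration.

```python
TCP_SYN_MASK = 0x02  # assumed flag mask; SYN is bit 1 of the TCP flags byte


def raw_flag(tcp_flags: int) -> int:
    """Bitwise AND keeps the masked bit in place, so the result is 0 or 2."""
    return tcp_flags & TCP_SYN_MASK


def flag_feature(tcp_flags: int) -> int:
    """Compare against zero to get a 0/1 feature like the training data."""
    return 1 if (tcp_flags & TCP_SYN_MASK) != 0 else 0


syn_ack = 0x12   # SYN and ACK set
ack_only = 0x10  # only ACK set

print(raw_flag(syn_ack), flag_feature(syn_ack))    # 2 1
print(raw_flag(ack_only), flag_feature(ack_only))  # 0 0
```

Feeding the raw value 2 to a model trained on 0/1 values would put that feature outside the range seen during training.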
It turns out this ampersand is a bitwise operator in C, and for this TCP variable the expression actually gives us zero or two. Why is this important? Because in the training dataset this feature is zero or one, and if we change it to two, our distribution is different from the training dataset, and that breaks the assumption the model needs to work well. (Five, thank you.) So, to handle that, we had to use another expression that checks whether the result is different from zero: if it is, we output one, and if not, we output zero. It was hard to find, because if you go to Google and type "ampersand C++", it always gave me the Boolean ampersand. It was hard to find with Google searches, but ChatGPT was able to explain it to me.
Cool, nice. But does the model work? It has the same features now. I'll go to the architecture. It's very simple: we have two dense layers. I'm not going to explain this; if you want to learn more, talk with me later. The one thing I want to focus on is the recall for the attack packets. We have 99%, which means that of all the attack packets in the dataset, the model was able to catch 99% of them.
But actually, the answer to whether the model works or not depends on the application and the IoT system in place, and I wanted to get a glimpse of that. So I tested it with a UDP server running on the ESP32. UDP is a transport-layer protocol. TCP has a connection, so if you send a packet and it didn't arrive, okay, let's try again. With UDP you just send, and you don't care whether the receiver gets it or not. It's a realistic setting for IoT devices, sensors and things like that, because it's much faster than TCP; that's why it's used. I used one Python script to simulate the legitimate traffic and another one for the DoS attacker. The attacks ran for 60 seconds and the real traffic ran for 80 seconds, and the firewall worked better than I 
expected. With the firewall on, 72% of the malicious packets were dropped, and the best thing is that it didn't disrupt the legitimate traffic, so it was really helpful. But we can see that the recall is very different from the one on the test dataset.
The lesson we learned from that is that DoS attacks vary significantly in nature. The attack I simulated was already different from the ones in the training dataset. A model trained on one dataset might work well there, but it's not guaranteed to work as well on another dataset. What we recommend is that, if you are ever building an IoT system, you collect your own data and train your model for your system, and then you can replace only the model in the firewall. We called it DoS Guard 32, because of the ESP32.
So those were the main results we got, and this exact model will not necessarily work as well in your system. All the code is available on GitHub. Our main goal was to make people aware that this device, this chip, exists, and that we can build IoT devices and IoT systems with it. And yeah, that's it, thank you very much. If you have questions, this is my Twitter... I don't have Twitter, it's my GitHub handle, to see the code and things like that. Thank you. And these are the references.
Do we have questions for Deborah? Yes. Okay, I think here was first, and then I'll come to you.
Hello there, thank you. How did you deploy... I mean, I guess you deployed the model inside the ESP32, right? What did you use, like TensorFlow Lite, or?
Yeah.
Yeah, so do you have to do some kind of conversion, or does the model already come ready for it? Cool, cool.
A lot of headache to do that, but yeah. We trained the model using Keras, and then they have a tool, TF Lite, to convert it to .tflite. Then we need to convert it to .cc, because the MCU doesn't handle files, so we need to add the model to the code. I had a lot of trouble getting this working, but yeah: you train a normal, usual machine learning model, then you convert it with TF Lite, then you convert it to C++.
Can I keep going? Yeah, another one quickly. How large was it in the end?
Three kilobytes, something like that. It's tiny, but I only have two layers, so it was simple.
Okay, actually that was my question: how big could you make the model, and did it make any difference, did a bigger model give better results?
Yeah, I didn't go further in improving the model because I was afraid of overfitting the dataset. This was my first attempt, and then, oh, 99, okay, let's go. But I tried a sequence model, an LSTM, and it actually got worse than the simple model. But yeah, I didn't do many... there's a name for when you're modeling. I didn't do much modeling because I was hitting my head against the wall working with C++. But yeah, thank you.
This is a cool project.
Oh, thank you. Yes.
Thank you for the presentation. Could you please show the slide about the features once again? My eyesight couldn't really see what the features were.
Sure, sure, let's go back.
And could you tell us what you took into consideration when selecting those features?
Great question. This was one of my main research questions. The features are here. My main criterion for selecting the features was: is this feature available in lwIP, and do I know how to get it? But actually my main resource for selecting the features was this reference, this one, DeepDefense.
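For the conversion answer earlier in this exchange (Keras model, then a .tflite file, then a .cc source file), here is a minimal sketch of the last step only: turning the bytes of a .tflite file into a C++ array, similar in spirit to what `xxd -i` produces. The function name, the variable name `g_model`, and the placeholder bytes are assumptions; the .tflite bytes themselves would come from the TensorFlow Lite converter, which is not run here.

```python
def bytes_to_cc(data: bytes, var_name: str = "g_model") -> str:
    """Render raw model bytes as C++ source defining a byte array and its length."""
    rows = []
    for start in range(0, len(data), 12):  # 12 bytes per source line
        chunk = data[start:start + 12]
        rows.append("  " + ", ".join(f"0x{byte:02x}" for byte in chunk) + ",")
    body = "\n".join(rows)
    return (
        f"alignas(16) const unsigned char {var_name}[] = {{\n{body}\n}};\n"
        f"const unsigned int {var_name}_len = {len(data)};\n"
    )


# Placeholder bytes standing in for the contents of a real .tflite file.
fake_model = bytes([0x1C, 0x00, 0x00, 0x00]) + b"TFL3"
print(bytes_to_cc(fake_model))
```

The generated text is compiled into the firmware, which is how the model reaches a device that cannot load it from a file system.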
They trained it... because, one thing I didn't mention is that models for this kind of DoS attack detection usually use statistical features, like the number of packets during a certain period of time, or how many packets are in the flow. But since we are deploying the model on the ESP32, computing these features would need a lot of computational resources, and it's not feasible. So we tried to build the model using raw packet features. This paper was one of the few I've seen that also uses raw packet features, and they used 20 features. I used those features as my starting point. I'm using 10 because I couldn't find the other ones in the lwIP stack. So that was my main criterion for selecting the features.
The features are from two layers of the TCP/IP model. IP is the layer that handles the delivery of the packets: okay, this packet needs to go to this computer, and things like that. The other features are TCP features, from the transport layer, which handles the actual transport... No, the transport layer is the third one. I forgot the name. I had a slide explaining these things, but I didn't have the time, so I took it out. I'm using TCP features mostly. And I'm not using UDP features here, mainly because, for example, if I have a UDP packet, these will all be zero. And it's kind of... there's the one-hot encoding thing. No, the one where you don't need to have all the features, but if you do have something, it's on or off... Sorry? No, I think it has another name, but whatever. Well, thank you for the question.
Thanks very much, everyone. I think we need to go. The day is over, but actually the party keeps going: there's going to be a reception sponsored by Code for Science and Society. Before we leave this room, I just have two announcements: we are missing an HDMI adapter and a plug adapter, so if anyone in this room has them, please give them back. And a big round of applause for Deborah Mesquita. Fantastic talk.
Thank you.