Thank you very much. So good afternoon and welcome to the IIDS talk, an intrusion detection system for IoT. My name is Vivek Ramachandran. My team members Nishant and Ashish were also responsible for a lot of the work. They aren't here, unfortunately, but just so you know.
A little bit about myself. I started as a programmer in the Layer 2 security team. Wrote a lot of the 802.1X and port security for Cisco's Cat 6K. Moved on to do security research. Broke WEP cloaking at DC-15. This is actually my 10th year here. And discovered a lot of other wireless attacks. I speak and train at a lot of conferences. I have run the Wi-Fi class at Black Hat for the past five years. I also run Pentester Academy and SecurityTube.net. Authored a bunch of books.
And for today's talk. So why an IoT IDS? We all know how secure IoT devices are, right? When I looked at the problem over the last six months, what I was trying to do was come up with a solution where we could monitor IoT devices on the network and see if we can create an intrusion detection and then a prevention system. As a lot of people have predicted, an average home will have 10 or more devices. Of course you have industrial IoT and enterprise IoT devices as well. The key challenge in IoT devices is the fact that there is no built-in protection, right? There is no built-in firewall, no IDS, no AV. Unfortunately, most of these devices do not even have good logging capabilities. So we cannot rely on the device itself to send us any sensible information in order to do any form of threat hunting or intrusion detection.
So this is where I went ahead and defined the problem scope. Now IoT is vast, right? You have devices communicating over Wi-Fi, Bluetooth, Bluetooth LE, custom frequencies and what not. What I did find is that most IoT devices seem to be converging towards Wi-Fi as the backhaul network, with 802.11ac and extremely high bandwidth.
Additionally, of course, most IoT devices do not send out a lot of data, right? It's pretty bursty. They could be sensors and such devices which might not always be communicating. Now when I looked at creating an intrusion detection system, the only option was to take an overlay network approach. What I mean by that is, because we cannot have code running on the IoT devices, and maybe not even on the network the devices connect to, I'd like to monitor the air with different IoT IDS sensors, gather information on how these devices are communicating and see if we can do any form of threat detection.
So when you look at the architecture, of course, you can have a single standalone sensor, right? One sensor plugged in, running your code, trying to figure out if an intrusion is happening. The other possibility is more of a spatially distributed sensor network. We distribute the sensors, they push metadata, packets or whatever we decide towards the central collector, and that is where all the intelligence resides. I'm going to talk about both these architectures today.
If I were to do each and every demo, I would probably take five to six hours. So I'm going to do a couple of them, the initial ones at least, live. I have videos, all the source code is open source, feel free to download. If you'd like to see more live demos, you can talk to me after the talk. And of course it's DEF CON and I wouldn't want to be running a Wi-Fi sensor.
So, IDS sensors. When you look at securing IoT devices, the question of course is how do you want to build your sensor? I decided to pick an IoT platform for it. I've chosen the ESP8266. Anyone used it? Okay, two, three people. The ESP8266 is an SoC. It has a full Wi-Fi stack in it, capable of b/g/n. It supports WEP, WPA, and WPA2 networks. I think the latest update even supports WPA2 Enterprise. It's a 32-bit RISC and can run at up to 160 megahertz. In my demos we're going to be using the 80 megahertz option.
But you can go ahead and run it at 160. Now the ESP8266 has been created by this company called Espressif. In the early days their SDK wasn't well documented. So unfortunately a lot of the work which I had to do early on for this, probably a year back, was really just looking at how they've programmed a couple of things and even reverse engineering some components. Good news: now they have a much better SDK. So it should be easy for anyone who'd like to extend my code.
The ESP8266 comes in different shapes and sizes. The one I'm actually going to be using is the NodeMCU. The key difference between the regular chip and this is that this has the chip and the USB-to-TTL converter. So I can just plug it into the USB port on my computer and program it rather than having to build a circuit just for this purpose. So there it is, the NodeMCU. I think this retails for $3 on AliExpress. So $3 to $3.5. If you are willing to go a little back and forth on the chat with some of the vendors they can even give it to you for $3 for 10 pieces.
Okay. The programming environment. The ESP8266 has its own SDK, as I mentioned, but what people have done is wrap a lot of that functionality using the Arduino framework. So a lot of the code which you'll see looks like regular Arduino, but I have to add a lot of SDK-specific functionality and call those functions as and when we need to do things. The code which I'm going to give out is very well commented, so hopefully you shouldn't have trouble using it, but you can always contact me later.
So what could be the building blocks of an intrusion detection system over Wi-Fi? To begin with, of course, you have to have scanning functionality, right? You plug in your IoT IDS, it scans the air, starts looking at which access points are around, eventually can start monitoring certain access points of interest, which is your home AP or your office AP, and also starts monitoring clients.
Now what this means is you have to have some kind of a raw interface so you can get the packets directly. How many of you have programmed with raw sockets? Okay. Three people? Okay. So we will be using something similar. Their SDK does expose some low-level APIs which function similarly to raw sockets.
After that, once we start the monitoring exercise, comes detecting mis-association. So what is mis-association? Let's say I have an authorized client connected to an access point, right? Now an attacker creates a honeypot and starts sending deauth packets. The client may get disconnected from the authorized AP and connect to the attacker's honeypot. This is one example of a mis-association attack. So a wireless IDS ideally should look at this and say, hey, this authorized client should only be connected to this authorized AP or set of APs. When I say AP or set of APs, we define them using the access point's MAC address and SSID. I will also talk about how to detect MAC address spoofing, right? That's for the very end.
Detecting honeypots. So what are honeypots? Same SSID as that of the authorized access point. The attacker brings it up, and authorized clients may try to connect to it. After that we look at distributed sensors and a central collector. We'll talk about advanced attack detection. I'm going to pick up some examples of how to detect evil twins and MAC address spoofing, and at the very end even how we can use machine learning to do device fingerprinting and automatic learning.
Okay, so the Wi-Fi scanner functionality. So that is the attacker, the graphic is custom made. Typically what you want is the ability to monitor what is happening, right? That is the only way you can actually pick up that there is an attacker in the vicinity. So I'm sure the code is visible at the end. The whole idea is not to make this a coding exercise. I'm going to give out all the code for free, completely open source, use as you want.
What I'm going to do is run a couple of these snippets, and for the more complicated examples with APs and clients I have videos, but you can always catch me after the talk. So let me go ahead and run this. I have my ESP8266 connected. This is a simple scanning program. All it does is scan Wi-Fi and figure out the list of APs, their MAC addresses, the channel, the RSSI levels, encryption and all of that. This is one of the core building blocks, which is the ability to first find the access point, SSID and BSSID of the network you're interested in protecting. So once this runs, let's open up the serial monitor, and if you notice it says starting to scan and it immediately starts scanning all of that. Why isn't anyone clapping? It's $3. It's a $3 device.
So as you can clearly see it has a reasonably robust implementation of the low-level Wi-Fi scanning stack. Some of the SDK is open source. I've been a C programmer for over 16 years. I think it's pretty high quality code from what I've seen. So this is how we implement a simple scanner on this $3.5 chip so that you can look at what is around.
Monitoring access points and clients. Now scanning is generally quite easy. Why? Every operating system has low-level APIs with which you can query the card, talk to the card, have it scan. But monitoring is really where it becomes very, very architecture and system specific. You have to figure out how to sniff packets using this device. The good news is that they did provide some low-level APIs. All of this code is going to be available. I wrote part of this code for a tool which we released at Black Hat Singapore called WiDy. Typically what I'm doing here is getting the packet in promiscuous mode and parsing it to see whether it is a management, data or control packet. Based on that, what I really do is create these flows. So a flow could be an access point and its connected clients.
That's an example of a flow. What you ideally want is that an authorized AP should only talk to authorized clients, right? Of course you will have to have the admin give the list of authorized clients. We'll talk about some machine learning techniques with which you can auto-detect some of those as well. But I'll save it for the end.
So I'm not going to run this code. Let me just show you a quick demo. What this code does is low-level sniffing: it gets the packet, parses it and starts looking at what APs are there and what clients are connected. This code is specific to AP enumeration. All of this code is available after the talk. So right now we are enumerating the access points. The key difference is that this is using absolute low-level raw sockets, right? For people who haven't programmed with raw sockets, this is like trying to reimplement what tcpdump or some of those APIs do, right? libpcap. Trying to implement that. That's supposed to be me.
So we've looked at monitoring APs. Let's look at monitoring clients. What we have here is: the leftmost column is the MAC address of the client. The column after that is the SSID that client is connected to. And then of course you have the BSSID in some cases. Is it probing? Is it connected? Have you seen data packets? So think of this basically as a state machine. If we see a client probing, we say, hey, this is in probing state. Once we see it send a couple of data packets, we know it's in connected state, right? So you have different clients and APs moving through these multiple state machines at the same time. And what you ideally want to do is find anomalies in this.
So as you can clearly see, the total number of clients detected is three. These are the MAC addresses. This is the SSID they are probing for. And two of them are actually in probing state. They haven't yet connected. The video actually looks pretty good on my screen. But I apologize for the way it looks here.
So you'll have it after the talk. One of the clients, the third one, is actually connected to the AP, because we can see the AP's MAC address, right? So this shows there have been data packets exchanged.
So let's step back and summarize what we've done so far. We have used the ESP8266. We've put it in monitor mode. We are sniffing packets. We are figuring out which access points exist, which clients exist, and which clients are talking to which APs. When I say talking, that could mean a probe request, just trying to figure out what capabilities the AP has, or actual data packets, which means both the client and AP are already connected and exchanging information, right? So that is what we've accomplished so far.
Now, in order to convert this into an IDS, we will require a little bit more information. That information is the list of BSSIDs of APs which are authorized and the list of MAC addresses of the clients which are authorized, right? You could use some interesting learning techniques. As I said, I'll talk about that at the very end, so that you can try to auto-learn, assuming of course you don't have an attack under way.
So let's move forward. Detecting mis-association attacks and detecting honeypots. I'm going to increase the font size a bit. Is this a little bit more visible? Okay. I'm going to take you. So what we've done is comment every part of the code so later on you can just add whatever MAC address belongs to your network, right? You can add a BSSID or a set of BSSIDs, and that's basically the BSSID of the access point serving our network. The clients over here, which are client one, two and three, are supposed to be the authorized clients. Again, you can fill these up. You don't need to have only three clients. You can have a hundred. Whatever count you have, just make sure that you have changed it. I'm also documenting the code on how to use it.
Hopefully you should have that next week so you know where to change what. So what we do now is use all of that state machine which we created by monitoring the APs and the clients, and combine that with this additional knowledge. First, which AP are we interested in? Because as you can imagine, if you're just monitoring the air you're probably seeing dozens of access points and you really don't want to spend a lot of time monitoring everything. At the very same time, this only has one radio. So you probably want to scan, figure out the AP of interest, which is your authorized AP, change the card's channel, go to that channel and just monitor that access point's MAC address.
This is exactly what the code does. It'll scan and find the MAC addresses of all the APs around. The moment it sees one matching the one in the config file, it'll automatically change its channel and put itself on that network. One of the things which I have also added: let's say the AP automatically changes channel. If this does not find that AP there, it'll automatically go back into scanning mode. So a little bit more intelligence is built into it. This is really important because you have a lot of access points which do automatic channel hopping based on the interference in the channels they are on. Right?
So once we do that, we know this is the authorized AP. We immediately start monitoring just the packets from that access point. And then we create these flow graphs which I talked about. These are just internal data structures. At that point what we really have is what we see on the network, which is one whole tree, versus what is allowed, or the whitelist. Right? Which is a separate tree. Any time there is any mismatch we automatically send out an alert saying, hey, this client or this AP is probably malicious. Right? AP malicious is a honeypot which is trying to broadcast the same network.
Client malicious really is an external client, not on the authorized list, trying to connect to the authorized AP. And misassociation is an authorized client trying to connect to an external AP. Right? So let me show you what this looks like when we run it. Okay?
So now the code is running, and of course we want to show attacks. These are different clients trying to connect to our authorized AP. What IIDS does is, any time a client sends a probe request or a data packet, or for that matter any management or data packet, it automatically figures out that this MAC address doesn't belong to the list of authorized MACs and starts flagging it.
We also have a back-end MQTT integration. The radio once in a while can go into client mode, connect to a back-end access point and send you an MQTT message. Anyone, MQTT? Okay. It's basically short, bursty messaging for IoT devices, extremely popular. Some people like to call it Twitter for IoT devices and some people hate calling it that. So either way. It's a very flexible, simple protocol where you can just put a message inside and off it goes. The good news is that if you have an MQTT server you can even have client apps on your phone and automatically get alerts. So you don't have to do all of that alerting mechanism and workflow on your own. It's kind of built into most apps nowadays.
So in this example we have multiple clients. If you notice, now we have one that says misassociating and others which say intruding. Right? As I explained, the difference is that misassociation is an authorized client trying to connect out to an external AP. This could be a honeypot. Intruding is an external unauthorized client trying to connect to one of our authorized APs. Okay. Another custom graphic. Right? Honeypots and misassociation.
Okay. So, building a distributed sensor collector.
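The alert types described for the standalone sensor (intruding, misassociating, honeypot) can be summarized in a small Python sketch. This is not the talk's ESP8266 code, only an illustration of the whitelist comparison; the SSID, the MAC addresses and the function names `classify_flow` and `is_honeypot` are hypothetical.

```python
# Hypothetical whitelist; in the talk these values live in the sensor's config.
AUTHORIZED_SSID = "HomeNet"
AUTHORIZED_BSSIDS = {"aa:bb:cc:00:00:01"}
AUTHORIZED_CLIENTS = {"aa:bb:cc:00:00:10", "aa:bb:cc:00:00:11"}


def classify_flow(client_mac, bssid):
    """Label one observed client-to-AP flow, or return None if nothing to flag."""
    client_ok = client_mac in AUTHORIZED_CLIENTS
    ap_ok = bssid in AUTHORIZED_BSSIDS
    if ap_ok and not client_ok:
        return "intruding"        # external client talking to our AP
    if client_ok and not ap_ok:
        return "misassociating"   # our client talking to an external AP
    return None                   # allowed flow, or a flow that is not ours


def is_honeypot(bssid, ssid):
    """An AP advertising our SSID from a BSSID that is not on the whitelist."""
    return ssid == AUTHORIZED_SSID and bssid not in AUTHORIZED_BSSIDS


if __name__ == "__main__":
    print(classify_flow("de:ad:be:ef:00:01", "aa:bb:cc:00:00:01"))
    print(classify_flow("aa:bb:cc:00:00:10", "66:66:66:66:66:66"))
    print(is_honeypot("66:66:66:66:66:66", "HomeNet"))
```

Note that this simple check cannot catch an attacker who also spoofs an authorized MAC address; that case needs the sequence-number analysis covered later in the talk.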
Now when you talk about any form of distributed sensor architecture, the key question is how much processing you'd like to do on the sensor and how much you would ideally like to offload onto the central collector. Right? And keep in mind this is a $3 device. You cannot expect this device to do a lot of heavy-duty number crunching, metadata and statistics collection. So what I actually did was: the device scans the air, figures out a lot of this metadata, immediately flips over to client mode, connects back to our backhaul network and sends all of that metadata to a remote server. In this case I've actually sent the whole packet. You can of course send something much smaller, with only certain header fields, depending on whatever your algorithms would like to use for threat detection. Right?
Once I receive the PCAP files, I use a tool which I had written two years back and released at DEF CON, which is PCAP to XML and SQLite. What this does is take PCAP files, look at every single header field and map that to a column in a SQLite database. So now you can run arbitrary queries on any header field, and this is extremely useful when you want to create an IDS on a budget.
So, some attacks which we can detect very easily. Deauthentication floods, anyone? Wi-Fi? Deauth floods? Okay, so deauthentication packets are packets which are meant to disrupt a connection, to terminate a connection between any two devices. It's a very common way by which a denial of service attack happens. Right? It's also a common way by which attackers have authorized clients disconnected from the real AP and then have them hop over to their honeypots. Generally deauth attacks are very bursty in nature. The attacker is going to be sending a couple of hundred packets a second. Slower speeds will probably do as well, but most people just rely on ready-made tools.
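The idea of running arbitrary queries over header fields can be sketched with Python's built-in `sqlite3` module. The table name `frames` and its columns are assumptions made for this illustration; the schema produced by the actual PCAP to SQLite tool is not described in the talk. The one protocol fact relied on is that a deauthentication frame is a management frame (type 0) with subtype 12.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical one-row-per-frame table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE frames ("
        "ts REAL, fc_type INTEGER, fc_subtype INTEGER, src TEXT, dst TEXT)"
    )
    rows = []
    # A burst: 200 deauth frames from one sender inside a single second.
    for i in range(200):
        rows.append((100 + i / 200.0, 0, 12,
                     "66:66:66:66:66:66", "aa:bb:cc:00:00:10"))
    # One isolated deauth from the real AP, which is normal behaviour.
    rows.append((50.0, 0, 12, "aa:bb:cc:00:00:01", "aa:bb:cc:00:00:11"))
    # A few beacons (subtype 8) that the query should ignore.
    for i in range(10):
        rows.append((100 + i / 10.0, 0, 8,
                     "aa:bb:cc:00:00:01", "ff:ff:ff:ff:ff:ff"))
    conn.executemany("INSERT INTO frames VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def deauths_per_second(conn):
    """Count deauth frames per sender per one-second bucket, largest first."""
    query = """
        SELECT src, CAST(ts AS INTEGER) AS sec, COUNT(*) AS n
        FROM frames
        WHERE fc_type = 0 AND fc_subtype = 12
        GROUP BY src, sec
        ORDER BY n DESC
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for src, sec, n in deauths_per_second(build_demo_db()):
        print(src, sec, n)
```

A bursty flood shows up as one sender with a very large count in a single bucket, while ordinary traffic produces only small counts.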
So what we do is figure out the number of deauth packets and the different timestamps at which they have arrived. Keep in mind that this is interesting because eventually you also want to do both time-based and spatial correlation. Right? Let's say you deploy this sensor around your enterprise and the attacker is driving around in the parking lot trying to attack or trying to do something. You will be able to spatially correlate that if you preserve a lot of that information. Which is really what I was trying to do. Excuse me.
After that of course you can run Python. You can pretty much do a lot of things. I prefer Python. It's pretty simple to use. All of this code is available along with the talk slides. So you can try it yourself. What we really try to do here is, for every single device, see how many deauth packets it is either sending or receiving over time. And we have time thresholds. Based on that, if you see more than n in a specific time slot, we say that's a deauth attack.
Now what is that threshold? The truth is the thresholds change based on the network a lot of the time. And this is where, at the very end, I'll talk about machine learning. So maybe you can even try to decide or learn baselines on your own. Right? My home network might hardly see any deauths, but a larger enterprise network might see genuine deauth packets getting sent a lot. So baselining, just like in any other intrusion detection system, is a hard thing to get right. The best way of course is that you really know your network well. Alternatively, at the very end we'll talk about ML.
So, detecting different attacks: deauth flood, evil twins. Anyone, what is an evil twin? I'm going to give out like a free ESP8266. Evil twin. Okay, very good. And a lot of times evil twins might even have the same MAC address as the authorized AP but on a different channel. Here you go. Clap. Yeah, so what is an evil twin?
An evil twin is, let's say we have an authorized access point and now an attacker wants to create an AP with the same SSID. A lot of times even with the same BSSID. The reason to do that, and also to mimic all the security parameters, is that the attacker would ideally want to deauth the authorized client from the real AP and have that client hop over to the attacker's evil twin or honeypot. It's a very common attack and a lot of enterprises actually have to deal with this.
Okay, so evil twin detection. Now evil twin detection with MAC address spoofing is actually very hard to get right. There are companies which get bought and sold in Silicon Valley based on how they solve this problem. One of the approaches which is very common is to look at a field called the sequence number, which is supposed to increment once every packet, typically. When you have two access points with the same MAC address communicating, you are going to end up seeing different patterns in how the sequence numbers move forward. Even if the attacker were to try to trail the authorized AP as closely as possible, you would still end up seeing a lot of duplicates, right? Because the real AP sends sequence number 10, the attacker sees that and also sends 10. But now there is a collision in a very small time slot.
So you can detect it with Python. Interestingly, you can even visualize it. Just by plotting the sequence numbers with respect to time for a given MAC address, if you end up seeing two patterns which overlap during the same time slots, you know for sure that MAC address spoofing is happening. This is one of the most common ways industry intrusion detection systems for Wi-Fi pick this up. It's a very common technique.
Client spoofing detection is very similar to evil twin. The only difference is that now you're dealing with a client device and not an access point.
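The sequence-number check described above, for an AP or a client, can be sketched in a few lines of Python. This is a simplified illustration and not the code released with the talk: the function names, the `max_gap` and `min_anomalies` values are assumptions, and legitimate retransmissions (which repeat a sequence number with the retry bit set) are assumed to have been filtered out beforehand. The only protocol fact relied on is that the 802.11 sequence number is 12 bits wide and wraps at 4096.

```python
SEQ_MODULO = 4096  # 802.11 sequence numbers are 12 bits and wrap around


def count_sequence_anomalies(frames, max_gap=64):
    """Count suspicious steps in the sequence numbers of ONE MAC address.

    frames: list of (timestamp, sequence_number) tuples, in any order.
    A step is suspicious if the number repeats (gap 0) or jumps by more
    than max_gap, which a single transmitter rarely does even when the
    sniffer misses some frames.
    """
    ordered = sorted(frames)
    anomalies = 0
    for (_, prev_seq), (_, cur_seq) in zip(ordered, ordered[1:]):
        gap = (cur_seq - prev_seq) % SEQ_MODULO  # forward distance with wrap
        if gap == 0 or gap > max_gap:
            anomalies += 1
    return anomalies


def looks_spoofed(frames, max_gap=64, min_anomalies=5):
    """Flag the MAC address if enough suspicious steps are seen."""
    return count_sequence_anomalies(frames, max_gap) >= min_anomalies


if __name__ == "__main__":
    # Timestamps are integer milliseconds to keep the example exact.
    real_ap = [(i * 10, (4000 + i) % SEQ_MODULO) for i in range(200)]
    evil_twin = [(i * 10 + 5, (100 + i) % SEQ_MODULO) for i in range(200)]
    print(looks_spoofed(real_ap))              # one transmitter
    print(looks_spoofed(real_ap + evil_twin))  # two interleaved transmitters
```

Two transmitters sharing a MAC address produce two interleaved counters, so almost every consecutive pair of frames shows either a duplicate or a large jump.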
Again what we really do is figure out how the sequence numbers are getting bunched over time, and then we try to see if there is a feed forward or feedback on the sequence number trails for the same device. In a sense we are essentially doing what we just talked about, and again it looks pretty similar.
So, device fingerprinting with ML. How much time do I have? Till now what we've really seen is different mechanisms by which we can create an intrusion detection system which requires help. You have to tell it what the authorized devices are. At the very same time, think about MAC address spoofing. The case which I've taken right now is really when the real authorized device and the MAC address spoofing device both exist at the same time. So what happens if the attacker brings in his device, which spoofs the MAC of an authorized one, when the authorized device is not around? How do you detect MAC address spoofing now? Anyone?
So the problem is now actually quite difficult, because when two devices coexist with the same MAC address at the same time you can do a lot of the analysis which I talked about. But what happens if now only the attacker device exists with the authorized MAC address? The real device has left. Maybe it's 6 p.m. and, you know, the authorized client device just went home.
Okay, what exactly? You're on the right track, but what exactly in the probe request could be interesting, or what are we going to do with that information? So you're actually moving in the right direction. However, keep in mind that even in a probe request packet or a beacon frame, where you have all of these information elements, some of them end up varying for the same device as well. So we would really have to hard code those values, which isn't scalable. This is really where machine learning comes in. Yeah, go ahead. Absolutely. But keep in mind, how do you program that? Because what you're essentially saying now is learn from data. Right?
Because if we just had the code, we wouldn't know for your network what the inter-beacon time could be or how frequently the client sends out probe requests. There is no fixed metric. You can have a new device driver update and that can completely change. So this is a fantastic problem to apply machine learning to, because essentially what you want to do is look at different characteristics of all of these devices and figure out, by looking at more and more data, how each of these characteristics or features behaves. Some of those features may always have the same value. Some may vary. Some may vary within bounds. Some may vary randomly.
So what we also did, and I'm going to give that out along with all the code which I showed you today, is look at different machine learning techniques. Let me get you to a summary slide. So here is what we did. We picked up different elements, and you can visualize every element as a dimension in an n-dimensional space. Based on that we can try to see if we can figure out interesting clusters and patterns. Machine learning is very interesting, a topic by itself. We can talk about it after the talk. But what we did find by using K nearest neighbors and a lot of other algorithms is that we got pretty high accuracy scores in detecting at least classes of devices.
What I do want to do after this is also see how we can combine different metrics out of different frames and use what are called ensembling techniques. I don't know if any of you have heard of it. Ensembles, which is really combining multiple algorithms, taking their scores and weighing them in some logical way, and trying to see if we can then find out which one is a better predictor. Okay, I'm going to release all of this. A couple of other things I want to mention: when you look at information elements, or IEs, it's a pretty big chunk of data.
So what I did was first just take a boolean, which is whether that IE is present or not, use that as a feature vector and look at how good the scoring is. The next stage was to pick up that along with the length. Then to pick up the length with the hash of the value, right? Because IEs can vary in size. By taking a hash, what you're dealing with is: is this value fixed, does it vary, what values can we have, etc. And then of course finally the actual value, which is very difficult to work with in general. So this is really where we have progressed so far.
All of this code and everything is going to be released, hopefully today. I'm tempted to connect to the DEF CON network and upload everything, but I'm not very sure about the security of the server which I've rented out for this. But I'm going to do it tonight. By tonight you should have all the code. Any questions? Any questions?
And that is really what we're looking at: how do you correlate that with something like time series and see what all we can do. It's an extremely interesting, challenging problem. But that's where we are in the research. It's a good point. Could you mention? Forgot about it.
So his question was, what about prevention? Good news. We can actually send out deauth packets using the ESP8266. So the prevention, which I haven't implemented yet but plan to, is: at any point, if we see a mis-association of any kind, send deauths to break that connection. That way we can ensure that the authorized client does not get connected to an unauthorized AP. So that's what we can do, and that's what all the industry APs, sorry, WIPSs do as well. Wireless IDSs and IPSs. So here is an actual deauth attack using the ESP8266. The code is there as a separate file. I haven't integrated that yet. Questions? Okay. Thank you so much. I hope you enjoyed it.
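The information-element feature progression described in the machine learning part of the talk (a presence flag, then the length, then a hash of the value) can be sketched as follows, together with a tiny k-nearest-neighbours vote. This is an illustrative sketch, not the released code: the function names, the choice of SHA-256 truncated to 4 bytes, the Hamming-style distance and the sample data are all assumptions. The IE identifiers used in the example (0 for SSID, 1 for supported rates, 45 for HT capabilities, 221 for vendor specific) are standard 802.11 element IDs.

```python
import hashlib
from collections import Counter


def ie_features(ies, known_ids, level="presence"):
    """Turn one frame's information elements into a fixed-length vector.

    ies: dict mapping IE id (int) to the raw value bytes.
    known_ids: ordered list of IE ids; each one is a dimension.
    level: "presence" (0/1), "length" (-1 if absent), or
           "hash" (first 4 bytes of SHA-256 as an int, -1 if absent).
    """
    vector = []
    for ie_id in known_ids:
        value = ies.get(ie_id)
        if level == "presence":
            vector.append(0 if value is None else 1)
        elif level == "length":
            vector.append(-1 if value is None else len(value))
        elif level == "hash":
            if value is None:
                vector.append(-1)
            else:
                digest = hashlib.sha256(value).digest()
                vector.append(int.from_bytes(digest[:4], "big"))
        else:
            raise ValueError("unknown feature level: " + level)
    return vector


def knn_predict(train, query, k=3):
    """Majority label among the k training vectors closest to query.

    Distance is the number of positions that differ, which also works
    for the hash features because they are categorical.
    """
    def distance(item):
        return sum(a != b for a, b in zip(item[0], query))

    nearest = sorted(train, key=distance)[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    ids = [0, 1, 45, 221]
    camera = {0: b"HomeNet", 1: b"\x82\x84\x8b\x96"}
    phone = {0: b"HomeNet", 1: b"\x82\x84\x8b\x96",
             45: b"\x2d\x1a", 221: b"\x00\x50\xf2"}
    train = [
        (ie_features(camera, ids), "camera"),
        (ie_features(camera, ids), "camera"),
        (ie_features(phone, ids), "phone"),
        (ie_features(phone, ids), "phone"),
        ([1, 1, 1, 0], "phone"),
    ]
    print(knn_predict(train, ie_features(camera, ids)))
```

With real captures the vectors would be much longer and the classes less cleanly separated, which is where the accuracy comparison between feature levels becomes meaningful.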