So I just want to start off, even though I don't have a whole lot of time, by saying it's really an honor to be up here. You guys are awesome. The security community is so cool. So thank you very much for coming, and thanks to DEF CON for giving me the opportunity. All right, let's get started. I want to start by saying what we're talking about: anti-reconnaissance. Generally speaking, there are three main phases of a network attack. You won't find this in a textbook anywhere, because I completely made it up, but I think you'll find it's roughly true. Step one is gaining access to the network you want to attack. This can happen in a number of ways. It could be that you exploit some externally visible web server that's also connected to the private network. One more time, let's do the empty-seat hand thing: if you've got an empty seat next to you, raise your hand so we can fill people in, because it looks like we've got a pretty full room. Okay. Just keep your hands up, because a lot of people will be coming in. And if you see somebody with their hand up, do take that seat, because otherwise you might not get one. All right. So you can gain access in a whole bunch of ways. You can drop a bunch of USB sticks in the company's parking lot, or you can crawl through the air ducts of the building and plug into their switch. We don't really care how. You gain access to the network by some means. Step two is performing reconnaissance: you have to find out all kinds of things about the network you want to attack. I'm going to talk about this in great detail later. And step three is exploiting a vulnerability.
Exploiting something bad about the network that you found in step two. One thing I'm going to assert, though I don't have the time to really justify it right now, is that step two doesn't get a whole lot of attention within our community, for whatever reason. I have a couple of ideas about why, which I put forward here. Step one obviously gets a lot of attention: gaining, and preventing, access to a network. Step three certainly gets a lot of attention. But for whatever reason, people kind of give up on phase two and just assume that reconnaissance has been done. So we're focusing explicitly on the second phase, not one or three. This is anti-reconnaissance. We're talking about obscuring, or more specifically obfuscating, your network. We're not talking about intrusion prevention. If you want to say, "Hey, we found this bad guy, kick him off the network," that's not what we're doing. We're talking about fooling the attacker. Once you find somebody bad, you don't necessarily want to kick them off the network, because you can't always do that. What you want to do is try to fool them further. We're also not doing access control. A few other quick things we're not going to be talking about: there are lots of ways of performing reconnaissance, and only one of them is through the network. There's social engineering, or maybe just googling for somebody who posted a picture of their network on the internet. We can't handle that sort of thing. We're just talking about network reconnaissance: using Nmap to actually probe the network, and so on. All right, enough intro. How does reconnaissance actually get performed?
One of the first things you want to do once you gain access to a network is find out how many systems are on it: just what machines exist on this network you want to exploit. One of the major ways to do that is an ARP sweep scan. You send out an ARP request that says, "Hey, 192.168.1.10, what's your MAC address?" If he answers, he exists. If he doesn't, maybe he doesn't exist, or maybe he's just on a different segment. ICMP Echo is another major one. You send a ping, and if you get an ICMP Echo Reply back, he exists. If you don't, maybe pings are filtered on your network; it could be a number of things. But if he responds, he's definitely there. The next thing you want to do is figure out what types of systems are on the network. Now that you know which computers exist, you want to find out which operating systems they run. There are a number of ways of doing this, but the way Nmap does it is by sending a particular set of OS detection probes. These include a couple of packets that are illegal under the TCP/IP spec. It sends weird, malformed packets, and if you were programming a network stack and asked, "How should I respond to this packet?", the answer just isn't in the spec. So every operating system ends up with its own undefined behavior, and they all respond differently. That lets you figure out which operating system is there. Then there are open ports: you want to find out which ports are open on each machine. You can send a TCP SYN packet, and if you get a SYN-ACK back, the port is open. Or you can do a full connection. Then there's network topology, using traceroute. I'm not going to go through the details of traceroute.
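To make the two probe types above concrete, here is a minimal Python sketch using only the standard library; it builds bytes and interprets flags but sends nothing. It constructs an ICMP Echo Request with the RFC 1071 checksum, and it interprets the reply to a single SYN probe the way a SYN scan typically does. The function names are hypothetical, and a real scanner such as Nmap handles many more cases (ICMP unreachable replies, retransmissions, rate limiting).

```python
import struct

# TCP flag bits (RFC 793)
SYN, RST, ACK = 0x02, 0x04, 0x10


def internet_checksum(data: bytes) -> int:
    """RFC 1071 ones'-complement checksum, as used in the ICMP header."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def icmp_echo_request(ident: int, seq: int, payload: bytes = b"") -> bytes:
    """Build an ICMP Echo Request (type 8, code 0), i.e. the body of a ping."""
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)  # checksum field zeroed
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, checksum, ident, seq) + payload


def classify_syn_probe(reply_flags):
    """Interpret the TCP flags of the reply to one SYN probe.

    reply_flags is None when no reply arrived before the timeout.
    """
    if reply_flags is None:
        return "filtered"  # silence: a firewall dropped it, or nothing is there
    if reply_flags & SYN and reply_flags & ACK:
        return "open"      # SYN-ACK: a service is listening
    if reply_flags & RST:
        return "closed"    # RST: the host is up but the port is closed
    return "unknown"
```

A receiver verifies an ICMP packet by running the same checksum over the whole packet, checksum field included; a valid packet yields zero.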
I assume you all know how that works. And you can find out the running services. Once you find an open port, you want to find out what service is running on it, and which version, along with all kinds of other information. Once you've learned all this, you can actually launch an exploit, because now you've found, say, a vanilla Windows XP machine with no service packs, running a terrible version of MSRPC that gives you root whenever a light breeze comes by. The major tool for doing this is Nmap. Except maybe for a specific case in the OS detection scans, nothing here really has to do with Nmap specifically; there's nothing Nmap does that other tools can't or don't. But Nmap is the canonical network scanning tool, so we're going to talk a little bit about it. So why is detecting reconnaissance hard? I think one of the major reasons is that signatures completely fail. If you think back to the things I just described about how you perform reconnaissance, at the packet level they're all identical to normal traffic. Take a TCP SYN. TCP SYNs happen all the time. You can't put a firewall signature on that, or if you did, you'd just be inundated with false positives. ARP, TCP SYNs, ICMP: all of this is just the normal working of networks. That's just how TCP works. You can't write a firewall signature rule for it. Another reason it's hard is that speed is an issue. You can be really, really slow, which can be stealthy: you send one packet a day or something like that. Or you can be really, really fast, which can also be stealthy, because you finish before anybody notices.
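The speed problem can be illustrated with a sketch of a naive rate-based detection rule. The rule itself, its 60-second window, and its threshold of 20 probes are hypothetical, chosen only to show the failure modes described above; they are not from Nova or any particular IDS.

```python
def max_probes_in_window(timestamps, window):
    """Largest number of events that fall inside any half-open window of `window` seconds."""
    ts = sorted(timestamps)
    best = 0
    start = 0
    for end in range(len(ts)):
        # shrink from the left until the window [ts[start], ts[start] + window) holds ts[end]
        while ts[end] - ts[start] >= window:
            start += 1
        best = max(best, end - start + 1)
    return best


def rate_rule_fires(timestamps, window=60.0, threshold=20):
    """Hypothetical rule: alert when one source sends more than `threshold` SYNs in any window."""
    return max_probes_in_window(timestamps, window) > threshold


fast_scan = [i * 0.01 for i in range(1000)]     # 1,000 probes in about 10 seconds
slow_scan = [i * 3600.0 for i in range(1000)]   # 1,000 probes, one per hour
page_load = [i * (10 / 30) for i in range(30)]  # a browser opening 30 connections in 10 s
```

The fast scan trips the rule, but the slow scan never puts more than one probe in any window, so it stays invisible, while the ordinary page load trips the rule as a false positive. Moving the threshold only trades one failure for the other.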
There's not really any rhyme or reason to it; there's no particular speed that reconnaissance has to run at. Also, the attacker is already inside your network, so your border security is pretty much pointless. Even if you had some awesome signatures on the firewall at the edge of your network, they wouldn't do any good, because the traffic isn't going through that machine. Your border security has already been bypassed. But I think there's something more fundamental going on: what we're really talking about is not data but metadata, network metadata. Imagine you have a packet. We can encrypt the packet's data. We can throw a key on it and be really proud that nobody can get inside that data. What you can't do is encrypt the metadata. You can't encrypt the time the packet was received or how big it was. You can pad the data, but you still can't encrypt the metadata itself. All of this is metadata, and you can't encrypt it. What we can do is obfuscation; I might equivalently have called this talk network steganography. A lot of the theory behind what we're doing here is about steganography: obscuring, or obfuscating, the network. Even though you can't encrypt the metadata, you can put a whole bunch of bogus data in there, which makes it much harder to find the real data. We're going to try to make finding the real nodes in your private network like finding a needle in a haystack. (Here's a picture of a big needle inside a haystack.) What we're going to do is drown out the real nodes of your network with realistic-looking fake ones. And I say realistic-looking only from the network's perspective.
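The needle-in-a-haystack effect can be put into rough numbers. The model below is an illustrative assumption, not something from the talk: the attacker cannot tell decoys from real hosts and probes addresses in a uniformly random order. Under that model, with r real hosts among n total, the expected number of probes before hitting the first real host is (n + 1)/(r + 1), and before having seen all of them it is r(n + 1)/(r + 1). The sketch checks both formulas against a brute-force average over every probe order.

```python
from fractions import Fraction
from itertools import permutations


def expected_probes_to_first_real(real: int, decoys: int) -> Fraction:
    """Expected position of the first real host in a uniformly random probe order.

    With r real hosts among n total, this is (n + 1) / (r + 1).
    """
    n = real + decoys
    return Fraction(n + 1, real + 1)


def expected_probes_to_all_real(real: int, decoys: int) -> Fraction:
    """Expected number of probes before every real host has been seen: r(n + 1) / (r + 1)."""
    n = real + decoys
    return Fraction(real * (n + 1), real + 1)


def brute_force(real: int, decoys: int):
    """Check both formulas by averaging over every probe order (small inputs, real >= 1)."""
    hosts = [True] * real + [False] * decoys
    orders = list(permutations(hosts))
    first = Fraction(sum(order.index(True) + 1 for order in orders), len(orders))
    last = Fraction(
        sum(max(i + 1 for i, is_real in enumerate(order) if is_real) for order in orders),
        len(orders),
    )
    return first, last
```

For example, hiding 2 real hosts among 200 decoys gives 203/3, roughly 68 probes on average just to reach the first real one, and every one of those probes against a decoy is a chance to be noticed.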
From the network's perspective they look realistic, but not necessarily any other way. To do this, we're going to use a tool called honeyd; perhaps you're familiar with it. Honeyd is unique in that it makes virtual machines, but not in the normal VMware or VirtualBox sense you're used to. It doesn't have a virtual hard disk or actually emulate an operating system. It's just a network daemon that runs and responds to packets as if it were those machines. So it looks real from the network, but it isn't actually a full computer. That means you can run hundreds or thousands of these on a single box, relatively efficiently. We have two goals for the system. First, we want to obfuscate the network: give reconnaissance lots and lots of bogus results, making it much harder or possibly useless. Second, we want to identify reconnaissance. One of the unique things about this setup is that, because you've got all these decoys, any traffic going to the honeypots is presumptively hostile. You can ask: why are you talking to the decoys? Why did you send one SYN packet to every decoy? So not only does this give us a leg up in discovering reconnaissance, but there's also a certain telltale, I don't want to say signature, but an overall traffic pattern to reconnaissance that's really hard, perhaps impossible, to get away from. This is our free software tool that does everything we're going to be talking about. It's called Nova. You can get our source code and so on at projectnova.org. I have some stats up here. It's written in C/C++. The number one way you can help out is by running our software. We just need some users to give us feedback.
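The "any traffic to a decoy is presumptively hostile" idea translates directly into a per-source statistic. A minimal sketch, assuming you already have observed (source, destination) pairs and the set of decoy addresses; the function name and data shapes are hypothetical, not Nova's internals. It reports, for each source that touched the haystack at all, what fraction of the decoys it contacted.

```python
from collections import defaultdict


def haystack_contact_fraction(flows, decoys):
    """Map each source that touched a decoy to the fraction of the haystack it contacted.

    flows:  iterable of (src_ip, dst_ip) pairs observed on the network.
    decoys: set of decoy IP addresses. Legitimate hosts have no reason to
            contact these, so any source appearing in the result is suspect.
    """
    touched = defaultdict(set)
    for src, dst in flows:
        if dst in decoys:
            touched[src].add(dst)
    return {src: len(dsts) / len(decoys) for src, dsts in touched.items()}
```

A sweep scan tends to push this fraction toward 1.0, while a host that only talks to real machines never appears in the result.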
We've been working on it for a while now, but a user base is definitely one of the things we need, so help us out. All right, a little bit about honeypots and decoys generally. Low-fidelity honeypots are not real machines, or virtual machines as you know them, so they can't be exploited the way a virtual machine can. I understand the danger of telling a room full of hackers that something can't be exploited. But honeyd's virtual machines are an order of magnitude or more simpler than a real one, because they don't have a full networking stack or actual services running. When you have services in honeyd, they look like real network services, such as Telnet and FTP servers, but they're not actual services; they're just shell scripts. So we've got really fun stuff like FTP auto-fail services. You connect to this thing and it gives you an FTP banner, then asks for your username and password. No matter what you say, it just says: no, authentication failure. And you can produce these things en masse. So we use honeyd. One problem with honeyd, if you're familiar with it, is that it hasn't been updated since around 2007. That's a problem, because since then Nmap has added new operating system probes, so honeyd completely fails to present the right operating system. One of the things we had to do was update that. If you go to our repository at github.com/DataSoft/Honeyd, you'll find an updated version of honeyd that responds properly to the new Nmap OS detection probes, so it works with current versions of Nmap. Now let's take a step back. Imagine you're the attacker. You've gained access to some network.
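An FTP auto-fail service like the one described is only a few lines. The talk's services are shell scripts; this is a Python equivalent sketch, assuming, as with honeyd's script-based services, that the network connection is wired to the script's standard input and output. The banner text and reply wording are hypothetical; the numeric codes are standard FTP replies (220 ready, 331 password required, 530 not logged in, 221 closing).

```python
import sys

BANNER = "220 Microsoft FTP Service"  # hypothetical banner; pick one matching the decoy's OS


def ftp_autofail(lines):
    """Yield FTP replies for client command lines; every login attempt fails."""
    yield BANNER
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        command = line.split(" ", 1)[0].upper()
        if command == "USER":
            yield "331 Password required."
        elif command == "PASS":
            yield "530 Login incorrect."
        elif command == "QUIT":
            yield "221 Goodbye."
            return
        else:
            yield "530 Please login with USER and PASS."


def serve(infile=sys.stdin, outfile=sys.stdout):
    """Speak the protocol over a stream pair, e.g. the stdin/stdout a honeyd script gets."""
    for reply in ftp_autofail(infile):
        outfile.write(reply + "\r\n")
        outfile.flush()
```

A wrapper script would simply call `serve()`. Keeping the dialogue in a generator that takes any iterable of lines makes it easy to test without a socket, and the banner is written before the first line is read, just as a real server greets the client first.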
It's some massive network with a whole bunch of nodes, most of which are fake, and you can't tell the difference between them. You can spend hours probing these machines trying to figure out whether they're real or fake. Reconnaissance then becomes ineffective, cumbersome, and obvious. Now let's talk about high-fidelity honeypots. One of the normal ways people deal with honeypots is with an actual physical machine: a hardware box that you tell your security guy to set up, and he sets it up with some really vulnerable services and so on. Then if your boss comes to you and says, "Hey, found any bad guys today?", how do you answer that question? Unfortunately, mostly you do it by inspecting log files. Maybe you've got some automatic cron job that tells you if the shadow file changed or something like that. But mostly it's manual, which is really unfortunate. Sometimes it's automated tools, but not any general-purpose automated tools. And signatures, such as IDS or antivirus, mostly fail for the reasons I talked about before. So we don't want to do that. We want automated detection using machine learning. Specifically, we use the k-nearest neighbors machine learning algorithm. k-nearest neighbors, or kNN, is a totally standard machine learning algorithm that lots of people use for lots of things that have absolutely nothing to do with network security. I'll give you a quick intro to it here. You have N statistical features, however many you want. These are scalar values that you measure for each suspect.
Every node on your network is a suspect, and you measure statistics for each one: packet timing, IPs contacted, how many unique ports it contacted, and how much of the haystack, meaning our fake nodes, it contacted. What percentage of them? A hundred percent? Zero percent? Then, since these are all scalar values, you plot them on an N-dimensional graph. Here it's shown as a two-dimensional graph, for budget-cut reasons. You have to train the system with some training data, like a spam filter: you have to tell it what's good and what's bad. There's a training button in our system. You say: listen for a while, and all of that traffic was good. That's the benign data, and you put it into your data set. Then you say: all right, I'm going to start scanning now, and you throw a whole bunch of attacks at it and say: all that stuff was bad. Now that's your training data. You plot all those points in your N-dimensional space, and then you have a query point, this green dot in the center. You say: I just found a new suspect. Is he good or is he bad? How do you answer that question? You just search for the k nearest neighbors, where k is some constant that you choose, and then you can take a majority vote. We actually do something a little more complicated than a majority vote in Nova, but take the simple case of a majority vote with k = 3. You look at the inner circle and see two red triangles and one blue square. So the green guy is probably more like a triangle, and therefore he's bad. If k were 5, or a little bit larger, you'd take more data into account and could get a different result. So you want to tune your classification to your data set. We also use a distance metric for the neighbor search.
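The majority-vote case described above can be sketched in a few lines. This is plain exact kNN with Euclidean distance, not Nova's classifier: the talk says Nova does something more elaborate than a majority vote and uses an approximate search. Returning the hostile fraction of the neighbors is one simple, assumed way to get a score between 0 and 1.

```python
import math
from collections import Counter


def knn_classify(training, query, k=3):
    """Classify `query` by majority vote of its k nearest training points.

    training: list of (feature_vector, label) pairs, label "benign" or "hostile".
    Returns (winning_label, fraction_of_neighbors_that_are_hostile).
    """
    if not 0 < k <= len(training):
        raise ValueError("k must be between 1 and the number of training points")
    # exact search with Euclidean distance; sorting is fine for small data sets
    neighbors = sorted(training, key=lambda point: math.dist(point[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    winner = votes.most_common(1)[0][0]
    return winner, votes["hostile"] / k
```

With two hostile points at distance 1 from the query, a benign one at 1.5, and two more benign ones at 2, k = 3 votes hostile while k = 5 votes benign, which is exactly the flip the slide illustrates. Because Euclidean distance lets the feature with the largest numeric range dominate, features such as port counts and haystack percentages would normally be scaled to comparable ranges first.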
We use libANN, the Approximate Nearest Neighbor library, to do that. It's approximate nearest neighbors because it introduces a tiny bit of error into the search in exchange for huge performance gains. Some other features of the software: we have a haystack auto-configuration utility, which is really cool, though it's super alpha right now because we're just finishing it. You press the auto-config button, it runs Nmap against your network, figures out what's there, and builds you a haystack based on that, so you get a haystack that looks real. One of the hazards of doing this is that your fake nodes have to look believable. They can't look super fake, or the whole system kind of goes to hell. If you have a server farm of all Linux boxes and suddenly your fake machines all pretend to be Windows XP, that's bad, because there's an obvious separation. Or if your fake nodes say they're Linux machines but run some known BSD-only server, or if you've got BSD servers running MSRPC, that's bad. So automating the process of setting up the haystack, or making it as simple as possible, is one of the things we try really hard at. We include a lot of simple front-end tools for setting up haystack configurations. We have a whole bunch of UIs: a web UI, a local Qt user interface, and a scriptable terminal interface. There's importing and exporting of training data, things like that. And of course it's free software, licensed under the GNU GPL v3. So that's about it. Now I'm going to do a demo. I have no idea what I'm doing, dog. So I'm going to go ahead and start.
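The believability problem lends itself to an automated sanity check on a proposed haystack. A minimal sketch, with hypothetical data shapes and a deliberately tiny, illustrative rule table (only the MSRPC example from the talk); it is not Nova's auto-config logic. It flags decoys whose OS family never appears on the real network, and decoys exposing a port that is implausible for their claimed OS.

```python
# Hypothetical plausibility rules: a port listed here should only appear on these OS families.
PORT_OS_RULES = {
    135: {"windows"},  # MSRPC endpoint mapper
}


def inconsistent_decoys(decoys, real_hosts):
    """Flag decoy profiles that would stand out against the real network.

    decoys, real_hosts: lists of (name, os_family, open_ports) tuples.
    Returns a list of (decoy_name, reason) pairs.
    """
    real_families = {os_family for _, os_family, _ in real_hosts}
    problems = []
    for name, os_family, ports in decoys:
        if os_family not in real_families:
            problems.append((name, f"OS family {os_family!r} never appears on the real network"))
        for port in sorted(ports):
            allowed = PORT_OS_RULES.get(port)
            if allowed is not None and os_family not in allowed:
                problems.append((name, f"port {port} is implausible on {os_family}"))
    return problems
```

Against an all-Linux server farm, a Windows decoy is flagged once (wrong OS mix), and a BSD decoy running MSRPC is flagged twice; a real rule table would need many more entries to be useful.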
It takes about a minute and a half to run, so I'm going to start it and explain what I'm doing as it goes. First, the network I have set up here. My host machine is up in the upper left. Then I have two virtual machines, just desktop guys each running a simple service, so each has a port open, here and here. And then we have our bad guy here, who's doing some scanning. If I remember correctly, this should take about 70 seconds to finish. This gives us an Nmap result of what our actual network looks like; I'm going to do a before and after. So this shows us the actual network, and you can see how fast it scans and how easy it is. Then I'm going to turn on Nova and see what it looks like afterward. He's doing some pretty normal scans. He starts with an ARP sweep scan to figure out which IP addresses are in use on the subnet. Then he does a TCP port scan using SYN packets against all the ports Nmap considers common by default. I didn't hard-code which ports are here. I tried to use as many default options as possible in these Nmap scans, to convince you that I'm not running some contrived scan that only works for me. Now I'm opening the results in Zenmap, an Nmap front end, to show you what it looks like. I've got five machines here. 56.1 is my host machine; it's running a NetBIOS server for some reason. Then there are the two virtual machines, victim one and victim two. 101 is the attacker himself, who doesn't have any open ports because he's running BackTrack. And 100 is the DHCP server that VirtualBox always gives you for some reason. So that was pretty easy.
You can see all the actual ports. It gives you the operating systems; it'll tell you, yes, this is Linux 2.6.38. Now let's go ahead. Start. Clear any old data. And let's start this scan. This one takes a little longer, so I'll talk while it runs. Let's go over the options here. We're using capital -O, which is Nmap's OS detection scan, with fuzzy matching on, since Nmap doesn't get OS detection perfectly right every time. It's good to turn on fuzzy matching, which makes it guess: it'll say something like, I'm 99% sure it's this operating system. -oX tells it to save the results to this XML file. -T4 tells it to go a little faster, because otherwise this thing takes forever. The last scan took, what does it say, 74 seconds, whereas this one can take a very long time, since I've got a lot of fake machines running. Oh, one thing I'm going to do is ping my host a little bit from the virtual machines. This is actual traffic, from a real machine to a real machine. You can see 103 is our real machine. At first, the system considers our attacker probably benign. The system reports a classification between zero and one, where zero is almost surely benign and one is almost surely hostile. And there you can see he just turned hostile. He's red, which means he's a bad guy: Nova has flagged him, because it has all kinds of information about him. He contacted 19 different IP addresses and 1,022 ports. So it looks at its training data and says this guy is probably bad. Whereas our good guy still sits at 0.03, because all he did was send real traffic from a real machine to a real machine. There's nothing hostile about that at all. And our Nmap scan finished. Excellent. Let's look at the results.
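Nmap's XML output is easy to post-process, which helps when comparing the before and after scans. A minimal sketch, assuming a scan along the lines of `nmap -O --osscan-guess -T4 -oX scan.xml 192.168.56.0/24` (the subnet and filename are placeholders; `--osscan-guess` is Nmap's option for the fuzzy OS guessing described above, with `--fuzzy` as an alias). It lists each IPv4 host's open ports and its highest-accuracy OS guess, using only the standard library.

```python
import xml.etree.ElementTree as ET


def summarize_nmap_xml(xml_text):
    """Return {ipv4_address: {"open_ports": [...], "os": best_guess_or_None}} from Nmap -oX output."""
    root = ET.fromstring(xml_text)
    summary = {}
    for host in root.iter("host"):
        address = host.find("address[@addrtype='ipv4']")
        if address is None:
            continue
        open_ports = sorted(
            int(port.get("portid"))
            for port in host.iter("port")
            if port.find("state") is not None and port.find("state").get("state") == "open"
        )
        # pick the OS match with the highest reported accuracy, if OS detection ran
        matches = host.findall("os/osmatch")
        best = max(matches, key=lambda m: int(m.get("accuracy", "0")), default=None)
        summary[address.get("addr")] = {
            "open_ports": open_ports,
            "os": best.get("name") if best is not None else None,
        }
    return summary
```

Feeding it the file contents, for example `summarize_nmap_xml(open("scan.xml").read())`, gives a dictionary whose size alone shows how much the haystack inflated the apparent network.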
Now you can see we've got all kinds of machines in here. We've got some BSD boxes and a Windows machine. We've got a Nintendo Wii game console and a Barracuda spam firewall. This guy didn't come up as anything. More BSD machines. What's really cool is to look at one of these Microsoft Windows servers that happens to have an FTP port open. Let's check that out. This guy is 115. I actually already have it in my history; I'm going to give up if I don't see it in a second. All right: FTP into 192.168.56.115. And there it is: Microsoft FTP server. Let's try to log into it, and it just fails you no matter what you give it. You can put all kinds of more complicated services on top of this as well. Some already exist, for example for MyDoom and Kuang2, the viruses that open ports on infected machines. The decoy will report that it's a Windows machine infected with MyDoom, and the service will actually respond to its command-and-control server as if it were infected, when really it's just a shell script of about a hundred lines of code. So that is the demo. My contact information is up here. If you want to meet me in person, come on down to Phoenix 2600. It's really cool. We meet the first Friday of every month in Phoenix, Arizona. I want to give special thanks to the other guys who helped program Nova: David Clark, who should be in the audience somewhere, Dave Scott, and Edison Waldo. Those guys are awesome. And Fyodor from Nmap, if you happen to be here, you're awesome, and I owe you a drink, as well as Niels Provos.
We'll be taking Q&A in the Q&A room, which is down the hall, and if you see me during the rest of the conference, come and talk to me; I have time for you. And that's it. Thanks a lot.