I would actually like to thank the few people who actually made it to this talk. My purpose here is this: we have a company called Recurity Labs that did this research, and I get to have brilliant young people working for me who do cool stuff, stuff that we need. And I get to wrestle myself into the talk so I can claim the fame, while they present the technical details and I run around going, see, I'm a speaker. That's what I'm doing right now. So we're talking about a port scanner, and as I can see, most people do not consider it a cool topic. First of all, if you have written a port scanner before, no matter if it's a widely used one or a barely used one, this is not religion. Port scanning is, as the name suggests, a sport. We do have competition, and yes, it's more fun if more than one is playing. So if you have written a port scanner, don't be upset that we did too; enter the competition. Now, why would we actually develop a port scanner? This is a research project of a company, so we have a reason for doing it; otherwise we would do another research project, because for us, port scanning is not fun. For most people it is: you turn on random scanning, you find random boxes, you own them. For us it's work. We need accuracy, and we need speed, because our customers pay us for the time we spend on a task. If we spend a long time port scanning a network, they spend more money, and if they ask us how much it's going to cost to scan this network, we need a predictable port scanner to tell them. So we needed a port scanner that runs on dedicated machines and provides all of that. Now, the obvious question: why not Nmap?
We had some experiences with Nmap that were less than pleasant, and I'm coming of age: you can only jerk off so many times a day to internet porn while waiting for your port scanner to finish, and that number does not increase over time. So I needed a faster port scanner. Professionals, in fact, don't really scan machines that are powered off, disassembled, or currently being carried around the office. So we have a very limited set of requirements for our port scanner, and large network scanning is really stock-taking for the company we do this for, rather than vulnerability identification, because that comes afterwards. On a higher level, the point is this: all the discovery methods, all the fingerprinting, all the banner grabbing, everything else that we know and use depends on a single data item, and that is the list of open TCP ports. So we wanted to concentrate on obtaining this single data item before doing anything else fancy, like fingerprinting or vulnerability scanning. Accordingly, this also takes most of the time. Once you know which ports are open, which are filtered, and which are closed, you can do all that fancy shit, but most of the time goes into the scanning. And while we're at it, developing it from scratch, how about having actual algorithms, stuff you can write down on paper and calculate, instead of a whole bunch of if statements nested into each other in the hope that it will produce a fast and accurate result? So these are our requirements. We wanted a TCP SYN scanner, nothing fancy, no Christmas trees, they're just taking up too much space in the office anyway.
We don't do UDP scanning, because frankly, in today's world, UDP scanning is of no value whatsoever, especially when you're scanning firewalled machines, and every machine nowadays should have a local firewall. UDP scanning is a negative scan method: you send a packet, and you get an ICMP packet back saying this port is closed. Which means if it's firewalled, you don't get a packet back, and if it's open, you don't get a packet back either. So what's the point? We wanted constant access to the data we get: once something happens, I want to know right now, because then I can offload other work to other machines instead of waiting until the whole thing finishes, or crashes. It is designed for embedded use. We write a port scanner that runs on a dedicated embedded machine, which has this wonderful effect that we can take a small Soekris box or something, ship it to the customer, and tell them: connect this to the network, and when it stops blinking, send it back. That makes the whole exercise really, really cheap, because there is no human intervention involved. And that's about it. We wanted to do one thing and do it right. And this is the point where I hand over to the researcher, fabs, who actually did the work, so he can tell you how we did it. Hello. Since you've probably never seen me before, thank you. Okay, I thought I'd tell you a few things about me before we start talking about the scanner. My name's up there, I'm 22 years old, I live in Berlin, I work for Recurity Labs, I study computer science and electrical engineering, and I'm really into networking and reading and writing code. So that's my random nerd profile, so to say. Okay, now about PortBunny. The goal is very clear, and I guess pretty much every port scanner has this goal.
You want accurate results and you want them as fast as possible. Now, the difficulties associated with this are, first, that performance questions in data networks tend to be a really complex topic; it's not simple at all. And secondly, there's a huge variety of setups that you will encounter, and you want to deal with all of them without ending up just waiting for email from people telling you that in this one network it didn't work, why don't you insert a delay here or do something there. Because then you end up with this giant patchwork, lots of if statements and no real algorithm. So our approach is: let's develop algorithms for this which have a strong theoretical foundation. Okay, but before we go into that, let's quickly review the basics. You probably all know this, but I don't want to lose anybody on the first couple of slides. It's really simple, actually; you wouldn't expect it to be that big of a problem. Since you want to know if you can connect to a port, you just connect to each port in a row by sending a connection request in the form of a TCP SYN packet, and you observe the answers. If you get an RST/ACK, the port is closed. If you get a SYN/ACK, the port is open. And if you get no response at all, the port is filtered. That sounds really simple, so why can't you just write a port scanner like this? Why can't you just wrap a loop around the send call, wait for responses, and output the results? Well, the problem is, if you do this, the network is going to drop lots and lots of packets, because you're sending as fast as you possibly can, but the network can't handle that. So there is some kind of optimal speed that you need to find, and it's not at all clear how you're going to get there.
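The SYN-scan decision rule just described, SYN/ACK means open, RST means closed, silence means filtered, can be sketched in a few lines. This is an illustrative Python sketch with invented names, not PortBunny's actual code (which runs as a Linux kernel module); real probing additionally requires crafting and sniffing raw packets.

```python
# Minimal sketch of the SYN-scan classification logic.
# Packet I/O is abstracted away: a real scanner sends raw
# TCP SYN probes and captures whatever comes back.

def classify_port(response_flags, timed_out):
    """Map the reply to a SYN probe onto the three port states.

    response_flags: set of TCP flag names seen in the reply, or None.
    timed_out: True if no reply arrived before the timeout.
    """
    if timed_out:
        return "filtered"      # no answer at all: something ate the probe
    if "SYN" in response_flags and "ACK" in response_flags:
        return "open"          # SYN/ACK: the port accepts connections
    if "RST" in response_flags:
        return "closed"        # RST/ACK: actively refused
    return "unknown"           # anything else needs closer inspection
```

The hard part, as the talk goes on to explain, is not this classification but deciding how fast the probes may be sent.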
So you're probably saying: wait a minute, if I do my regular socket programming and I just connect somewhere and send until there's no data left, that works. Yes, that does work, but only because TCP is doing the work for you; it's implementing the congestion control. When we're port scanning, we never have an established TCP connection, because we're only ever sending connection requests. So this is all still on top of IP. The first approach to finding this optimal speed would probably be to ask: can we just ask the network how fast we can go? The general answer is no, although there are exceptions, for example ICMP source quenches and ECN. People kind of want to have this, but if you write a port scanner, you can't expect the network to actually support it. So the only information we do have is this: if a response is received, we have a round-trip time. And if we know that some packet should produce an answer and it doesn't, we can detect packet drops. That's really all we have for our timing code. Okay, so let's take a look at the environment we're working in, the network. We have a bunch of nodes, all connected through links of different quality: different throughput, delay, and reliability. And each of these nodes, and that's an important fact, has a queuing capacity: packets are stored in the nodes and then forwarded. If we simplify this and think about the term bottleneck and our experience with socket programming, we might come to a model like this pipe model, where one part of the pipe is just not as thick as the rest, which is the bottleneck. If you were to try to find the optimal speed with that model, you would just be trying to find the optimal spacing between two packets that you send.
So it's slow, faster, and then at optimal speed there's just no spacing left. But with that model you are ignoring the queuing capacity, because if you think about it, just because you can fire 10 packets with a delay of exactly zero and they will all produce answers, that doesn't mean you can do the same with 100 packets, because of the queuing capacity. That fact leads to a new, more professional model that we used: the bucket model. You can think of each host as a bucket with a hole in it, and basically what you want is to have all the buckets filled at all times. So you're no longer asking how long the delays between your packet sends should be, but rather how much data can be out in the network at once. And the beauty of this is that it's self-clocked, meaning you never have to ask yourself when to send data, because you send data when other data leaves the network. Funnily enough, this question of how much data can be in the network at once is exactly the question that TCP congestion control algorithms ask. That is a very active research field, so it really makes sense to make your port scanner compatible with what you find in TCP congestion control, so that you can make use of those research results. But that's not directly possible, because the situation is very different. In TCP, the receiver actually acknowledges the packets, while when you port scan, as you saw earlier, if you send a probe it might just happen that you get no response at all, because the port is simply filtered. For that reason, a timeout in TCP is an error condition, while in port scanning it can be the correct behavior. Furthermore, TCP has sequence numbers, and unless you integrate them into your probes, you usually don't have sequence numbers with your port scanning probes. In other words, the TCP receiver is cooperative; a port scanned host is not.
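The bucket model described above amounts to maintaining a window of probes in flight and refilling it as responses drain out, which is exactly where the self-clocking comes from. Here is a toy Python sketch of that behavior; all names are invented for illustration, and a real scanner would of course adjust the window size dynamically.

```python
# Toy illustration of the "bucket" (congestion-window) model:
# instead of choosing inter-packet delays, cap the number of probes
# outstanding in the network and send a new one whenever a response
# drains one out. That is the self-clocking property.

class WindowedSender:
    def __init__(self, cwnd):
        self.cwnd = cwnd       # max probes allowed in flight at once
        self.in_flight = 0     # probes currently out in the network
        self.sent = []         # record of everything sent, for inspection

    def try_send(self, queue):
        # Fill the window from the work queue, then wait for responses.
        while queue and self.in_flight < self.cwnd:
            probe = queue.pop(0)
            self.sent.append(probe)
            self.in_flight += 1

    def on_response(self, queue):
        # One response leaves the network -> room for exactly one new probe.
        self.in_flight -= 1
        self.try_send(queue)
```

Notice there is no timer anywhere: arrival of responses is the clock.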
Of course, that does not mean we can't coax the host into being cooperative. What we do before we start the scan is try to get a response from the host by sending the packets you see there: we send probes to different ports, we try ACK probes, echo requests, and so on. Once we get a response, we have a packet which we know should produce a response. What we do then is send our probes in so-called batches, which we terminate with a trigger. So while we don't know whether the individual probes will produce answers, we do know that the batch as a whole must produce an answer, and if it doesn't, that's an error condition. And if the trigger returns, the probability is high that all of the other packets made it through, too. The reason is that there are basically two causes of packet drops. The first is queue overflow, in which case entire batches are discarded. The second is physical transmission errors, especially in wireless scenarios, and in that case, too, entire batches of data are discarded. There is one exception, which is random early drop; I won't go into that, but it's a rare exception. So if the trigger does drop, we say that all of the probes which were in that batch need to be resent. Okay, so what is this whole trigger thing good for? Well, the trigger responses now play the same role as the acknowledgments do in TCP congestion control. We have made our scenario compatible with the one you find yourself in when you do TCP congestion control, and a timeout is once again a signal of error. If you compare this with what Nmap does: Nmap does what I call probe-based congestion control, which means they don't have these triggers, they just send the probes and hope that they produce answers.
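The batch-plus-trigger scheme can be sketched as a small loop: send a batch terminated by a trigger, and if the trigger's reply never arrives, treat the whole batch as dropped and resend it. This Python sketch invents its own names and abstracts the packet I/O behind a callback; it is not PortBunny's kernel code.

```python
# Sketch of trigger-terminated batches. Probes may legitimately go
# unanswered (filtered ports), but the trigger is a packet known to
# elicit a response, so a missing trigger reply signals that the
# entire batch was most likely dropped and must be resent.

def scan_in_batches(probes, batch_size, send_batch_and_wait):
    """send_batch_and_wait(batch) sends the batch plus a trailing
    trigger and returns True if the trigger's response arrived."""
    pending = list(probes)
    confirmed = []
    while pending:
        batch = pending[:batch_size]
        if send_batch_and_wait(batch):
            # Trigger answered: the batch very likely made it through.
            confirmed.extend(batch)
            pending = pending[batch_size:]
        # else: trigger timed out -> error condition, resend same batch
    return confirmed
```

A real implementation would also back off its window on a trigger loss, just as TCP shrinks its congestion window on a timeout.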
And if they do, it all works out just fine; what you see in the graph is the number of packets which are out at once. But if there are no responses, if the host is mostly filtered, it all just breaks down. There is a slight exception to this, which is the port scan ping system, which basically means that if the host does not respond for five seconds, a ping is sent, and a response to it is counted like three of the regular responses you would normally have gotten. Okay, so this is actually the first real research result. What we get when we use the trigger is that it does not matter whether the host is mostly filtered or not; we get pretty much the same scanning times. Of course, these are just example numbers, but look at the difference in magnitude: we have 12 minutes and 18 seconds for Nmap on this host, and 15 seconds for PortBunny. This is the point in the talk where the audience needs to go: oh. Okay, so let's take a look at this again. Now that we have trigger-based scanning, triggers are acknowledged, just like TCP segments; timeouts are error conditions; and we use sequence numbers in all triggers. So it's totally compatible. Now we can implement any congestion control scheme we want, and that's pretty much what it means. And it's clean: since you just can't expect that a probe will produce an answer, you cannot do congestion control with probes. Of course, there are problems with this approach. Notably, not all triggers have the same quality. For example, ICMP may be rate-limited, so if your trigger is an ICMP trigger but your actual probes are not rate-limited, and there's QoS involved, that might pose a problem. We have a bit of a fix for this, but it remains a problem.
We try to find the good triggers first and only use the others as a fallback, and if, while scanning, we find something we can use as a trigger which is better than the triggers we currently have, we discard the worst ones. So is the problem completely solved? Not quite, because the bucket model is not valid for rate-limiting firewall configurations, while it's totally fine for normal congestion situations. And secondly, we can implement any congestion control scheme, but how will the user know in advance which one to choose? So the scanner needs to perform some kind of detection: is there a rate-limiting firewall? Are we in a wireless scenario? Which timing algorithm do we want to use? The scanner is the expert on this issue, because it is directly talking to the host being scanned, and the user is not. What I'm going to present now is how we did rate-limiting firewall detection. Let's first take a look at how Nmap does this. There's this comment here; you can find it in the Nmap source code. Basically, if packets just drop and drop and drop, if everything goes totally wrong, they say: that's a firewall, most probably. And then they make this decision: we're going to revert to timing delays instead of doing our normal congestion control. That's a major decision, and it breaks the entire timing concept, because they keep using the congestion control algorithms, but now the congestion window no longer corresponds to the amount of data out in the network at once, because they have these extra delays between the probes. And the whole self-clocking property, which was so beautiful about the congestion control, is just lost in this approach. And this is the effect of a false positive.
Here in this scenario, Nmap thought it detected a rate-limiting firewall, so the scan took 24 minutes, while PortBunny just did the normal congestion control, which was correct in this scenario, and took eight minutes. Okay, so what's our approach? Well, we don't just want to take the packet drops into account, but also the RTT, because if you look into your networking textbooks, you will find this: it tells you that under congestion, the delay rises exponentially. So can we detect this somehow? I mean, with a firewall configuration there's just no reason why the RTT should rise exponentially. So why not look at the RTT around a drop and see if it differs between a rate limiter and a congestion situation? Given that we're constantly changing the speed, trying to adapt to the network conditions, this graph could be rather ugly, but it turns out that in reality it looks like this. And that's really cool, because the thing at the top, you probably can't read it, sorry, is the RTT development with a rate-limiting firewall, and the thing down there is a normal congestion situation. What you see is that in the normal congestion situation, the RTT rises just before the drop, and then of course we regulate, we send less data, and the RTT shrinks again. A rate-limiting firewall, on the other hand, is pretty much a ticket system: it generates tickets that you can take at a fixed time interval. So what you see is that our timing decides we can only send about one packet at a time, and so we produce a very low network load, which means we stay very close to the base RTT, the round-trip time you get when there's just no load on the network. With these kinds of graphs you could, well, it's not easy, but you can use all your signal processing skills and just make the decision.
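The RTT observation above suggests a simple heuristic: if the round-trip times climbed noticeably above the base RTT just before a drop, queues were filling up and it was congestion; if the RTT stayed flat at the base value, the drop came from a rate limiter. The following Python sketch is only an illustration of that idea; the threshold factor and window of three samples are arbitrary choices, not values from the talk.

```python
# Rough sketch of the RTT heuristic: under real congestion, queues
# fill up, so the RTT climbs just before a packet drop; behind a
# rate-limiting firewall the path stays unloaded and the measured
# RTT hugs the base RTT. Threshold and sample count are invented.

def drop_cause(rtts_before_drop, base_rtt, factor=1.5):
    recent = rtts_before_drop[-3:]             # last few samples before the drop
    if all(r > base_rtt * factor for r in recent):
        return "congestion"                    # queue build-up preceded the drop
    return "rate-limiter"                      # drop arrived without RTT build-up
```

In practice, as the speakers note, the measured curves are noisy, so a robust classifier needs more signal processing than this.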
So we're working on this approach, and we're not quite done with it, but we found another nice approach which I want to show you. It's kind of a two-shot approach. Observe: this is a packet, but this is a packet, too. Now, if the bucket says I can fit four of those KitKats, or I can fit four of the rice bags, then it's obviously not really a bucket. The thing is that a rate limiter limits the number of packets, while congestion is caused by too much data. So if you enlarge the packets and the network still tells you, I can fit these larger packets just as well, then that's really not a normal congestion situation. In reality, of course, if you just enlarge the TCP SYN packet by 40 bytes, which is what you can do with TCP options, that won't change much. If you take ICMP echo requests, you can enlarge those, and that works just fine. But what you can also do is create background traffic. You don't actually need packets which will reach the host; you just need packets which will travel large parts of the network path, because congestion is something that spreads. So let's take a look at this in reality. We made a little web interface to switch our firewall on and off, and we first switched it off and ran the test. For the little packets we got 49 responses, and for the big packets we got 28. That looks very much like congestion, right? And with the rate-limiting firewall, in both cases we got 20 responses. And you can actually see in the configuration that this was exactly what was put down for the rate limitation, for the burst size limitation. So we not only detect the rate-limiting firewall, we also recover parts of its configuration. And this is what it looked like in reality: in the back you see the small packets, and in the front you have this mixture of the small packets and background traffic, which were UDP datagrams.
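The two-shot experiment boils down to comparing how many responses a burst of small probes gets versus a burst of inflated probes: equal counts point at a packet-counting rate limiter (and the count itself hints at its burst size), while a lower count for the big packets points at byte-driven congestion. A hedged Python sketch, with `run_burst` standing in for actually firing a burst of probes:

```python
# Sketch of the packet-size ("two-shot") test: a rate limiter counts
# packets, so it passes the same number regardless of their size;
# congestion is about bytes, so inflating the probes (or adding
# background traffic) reduces the number of responses.
# `run_burst(burst_len, payload_bytes)` is a stand-in for real I/O
# and returns the number of responses received.

def size_test(run_burst, burst_len):
    small = run_burst(burst_len, payload_bytes=0)
    large = run_burst(burst_len, payload_bytes=1400)   # near-MTU probes
    if small == large:
        # Same packet count passes either way: looks like a rate
        # limiter, and `small` even hints at the configured burst size.
        return ("rate-limiter", small)
    return ("congestion", None)
```

With the example numbers from the talk, 49 versus 28 responses would classify as congestion, and 20 versus 20 as a rate limiter with a burst size of about 20.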
Okay, using PortBunny. You can download the newest version at portbunnyrecurity.com. It's really simple to use: you just give it a host name or a network, and you can use the -d flag, then it only does the discovery. We don't have a huge number of options, because that is exactly what we don't want to have. What you can also do is use the -l flag, which generates a detailed scan log, which looks something like this. If you send this to us, or if you just load it into your favorite spreadsheet program, we can generate the data from it that we need to identify whether what we did was correct or not, and where the bugs are. Okay, thank you. So that's what you get when you have small, half-Japanese researchers: they're just way too ninja fast. So we've got plenty of time for questions, if anyone has any. There is one. Do you mean run it on Solaris or against Solaris? No, this is a Linux kernel module; it runs entirely in the Linux kernel, which, surprisingly, I think you actually didn't mention. No, not really; if you want to talk about kernel stuff, you can buy me a beer and then we will talk about that. The reason we're running this in the Linux kernel is simply that nothing interferes with our traffic, and we also get very precise timing that you never get in userland, because there you get, like, fucking context-switched. That's the reason. No, we're not running this on anything except Linux. We've run it against Solaris quite well, yes. Oh, I think there are, perhaps you know more about this, there are ports going on. People actually ask us; we never planned on having this on other operating systems, because, as I said in the beginning, we only wanted it for embedded use. But there are actually people porting this stuff to BSD, as far as I know. We have actually set up an SVN exactly because someone wanted to port it to BSD.
How far has he got? He just started. Yeah, he just started, not very far yet. If you want to jump in, there is a public Subversion repository that you can play with; read the code, get access, and just submit. So, the question is synchronous or asynchronous. I think what you're actually asking is whether it's stateful or not. Yes, it is stateful. It has to be; otherwise it doesn't produce decent results. So you've seen several ways in which the network can delay your packets: congestion, bucket, firewall. Have you seen other ways? I mean, there are devices that try to do both, like the Packeteer or Stealthwatch or any one of those smarter network devices. Maybe they reveal a mix of behaviors like that; maybe they are both a bucket and a pipe. Yeah, that's quite possible. Right now we're going one step at a time. The first really important case was normal congestion, the second is obviously the rate limiters, but then we're hopefully going to get into these kinds of things. I see, so are you looking to identify what might be on the path? Because, say, at some point on our campus we saw something really strange: after a certain Cisco firmware update, every tenth TCP packet would be delayed more than the others. We would have loved to know exactly what changed from those response characteristics. So I'm thinking, once you're already doing that, some sort of frequency analysis on that signal... you know, I'm clearly overstepping my time. No, it's actually a very valid question, and this is exactly the reason why fabs had that slide about sending us your scan logs. In some scenarios you do not scan hosts that you want to own; you're scanning your own network, for example. And if you experience strange behavior from PortBunny, the only way we can fix the algorithms is by getting the logs.
And, well, we cannot have the office full of networking equipment and try to build complex networks, so this is why we are asking for the logs. We don't want to know what ports are open on your machines, because you are owned anyway. More questions? There is one. No, no, there is no product attached to it. We're just using those Soekris boxes, have you ever seen them? They're small, and they're called Soekris. There are others too, the little Swiss boxes. You take a regular PC platform for self-built home routers, have you ever seen those? There are different types. You take a regular one like this, and you just put it into the startup scripts: once you're booted, scan this IP address range. And then you usually have a command line tool in those Linux distributions for the small boxes where you can turn on the lights on the front. That's what we do; we don't have a fixed product setting or anything. It's mostly a Geode platform. So thank you very much. Oh, there are more questions. No, no, the thing is this: you can get any performance you want out of Nmap, and that is exactly the point of having all those command line switches. For me, the design requirement was to have a scanner that figures that out itself, on a predictable basis. In many scenarios you can tune your scanners any way you want, and they will in most cases be faster, but that is not the point. If you're scanning something when you already know how it's going to react, why are you scanning it? This is why I said right at the beginning that this is a sport: the performance is not directly comparable; it is comparing apples to oranges in many cases. Does that make sense? No? Okay, I think, yeah, this is pretty much the graph that you're asking for.
On a filtered host we outperform massively, and on a non-filtered host it really depends. The thing is, the configuration you get with PortBunny is basically: do whatever you think is best for the network you're working in. Nmap has that too when you just start it without any flags, with maximum retries and minimum delay and all these kinds of things. So the most comparable thing you can do is maybe start it with T4 or T5 or something, and that's what you get then. And as a home exercise: scan an iPhone over wireless, and wait. That is actually something he's... For the rate limiter detection? No, no, for port detection I don't think we have any false positives, because that just wouldn't make sense; I cannot imagine how to have one. However, false negatives, I think we do have them in some cases. I know we had them back in December or so; I don't know what the current experience is. fabs is testing it all. Well, we don't have measurements on that, that's pretty much what you're asking, right? About a number. No, we don't have a number right now. The thing is, if you ever find a false positive or a false negative, be very, very sure to send us an email, because that is a very major issue, and we care about it. I found false negatives in a very old version, way back then, so I can't speak for the current version because, as I said in the beginning, I'm just the guy saying: you do this. So thank you all for coming.