I'm Daniel Burroughs, and I'm going to be talking, a little bit less than I'd planned, about correlating information gathered from distributed intrusion detection systems. This is work I've been doing at the Institute for Security Technology Studies (ISTS) at Dartmouth College. First, a little background on ISTS, where I work right now. It's a security and counterterrorism research center funded by the NIJ, which is the research arm of the Department of Justice. Its main focus is on computer security, and the group I work in, Investigative Research for Infrastructure Assurance, is mainly an engineering-based group run jointly with the Thayer School of Engineering at Dartmouth College. To give an overview of what we're trying to do with this system, let me talk about its needs and goals a bit, and give a nutshell picture of the internet and what's going on out there. We have some systems connected out there that have literally no defense on them. Some people who are a little more concerned with their security put up firewalls and various other defensive mechanisms. Then you've got others running information-gathering systems, either IDSs or other types of network sensors, that collect information about what's going on out there. And of course we've got the people trying to get into everything. So we might see alerts occurring on systems distributed across many networks, and we're trying to gather that information, find out if it's related, and rearrange what's gathered from these distributed systems to make the job of the security analyst a little bit easier.
So the first thing we're trying to detect is distributed and coordinated attacks: either an attack coming in from many places, with many people trying to attack one site simultaneously, or possibly a single group or single person going after multiple sites that have some sort of relationship between them. These have of course been increasing in rate and sophistication, whether they're automated ones like the worms (Code Red and Nimda, which we saw last summer) or other similar types of attacks. We're also interested in infrastructure protection. This is part of the mission of ISTS, and particularly of the group I work in: detecting either a coordinated attack against infrastructure or attacks against multiple infrastructure components. That's something like a group or individual trying to take out a particular sector of infrastructure, whether power generation, emergency response, or communication systems: either one sector on a wide scale, or multiple sectors of infrastructure within one local geographic area. We also want to reduce the overwhelming amount of data being produced by intrusion detection systems, and reduce the requirements and workload on the security analyst who has to analyze this data and pull out what's really interesting from all the noise being generated on these systems. So we're interested in taking data that's already being generated out there and reorganizing it so that it produces a clearer picture of what's going on.
We're not trying to build a better intrusion detection system, or even a new type of sensor, but to make better use of information that's already out there on these systems: bringing together information from various types of IDSs and other network sensors, and using it along with some other knowledge about the relationships between the types of events we're seeing, to track the activities of individual attackers. The traditional view of a network is a very self-centered, defensive viewpoint. We're interested in keeping people out of our network, building a big wall around the castle, keeping the barbarians outside. But that way we're gathering only part of the information available about each person attacking the network. This makes sense from a defensive point of view, but it doesn't give us very much information about the overall goals of someone whose attacks spread across multiple networks. There are also weaknesses here, because in certain types of attacks, such as a distributed denial of service, the actual victim of the attack need not be part of any of the preparation stages. In the little scenario shown here, we've got a number of systems out there, a couple of them protected, a few of them not, and we've got someone attacking them. They're only able to get into the systems that are unprotected, but later on they're able to use those to launch a denial-of-service attack against someone who is actually sitting behind some amount of protection. So that person may not have seen, or may not have been bothered by, the initial stages, the preparation, the zombie collection for the denial of service, but they're still vulnerable to that type of attack.
So we want to change to a wider view, where we can get more information about each attacker out there, try to figure out what their goals are and what they're going after, and get a more attacker-centered viewpoint. This requires, of course, first gathering the data from the various systems, and then being able to fuse it and correlate it in a meaningful fashion. We're using techniques that come out of the realm of radar tracking systems, which is in some ways very similar to what we're doing with IDS: we have multiple sensors, and multiple types of sensors; we have multiple targets out there; and we're trying to do this tracking and correlation in real time. So we're taking this model and applying it to one where the different types of intrusion detection systems are our sensors, and the targets we're trying to track are the people attacking our networks. As I said, there are two stages to this. The first is gathering the data, collecting it. A lot of work has been done in this area in the past and is currently being done; there were a couple of other talks here at DEF CON this year focused on it. So we're concentrating more on the correlation aspect. We're trying to tell how various events are related, and we're doing this based on the basic assumption that an attacker's goals are going to determine their behavior, and different attackers with different goals are going to have different behavior; therefore, they'll be recognizable from one another. We're doing this through the use of multiple hypothesis tracking. To give an overview of what multiple hypothesis tracking is, look at the radar tracking analogy again: each time we do a radar sweep, we see two targets out there, and we see them moving in a certain fashion as they go through the space that we're looking at.
But we're not sure exactly what's going on, because we can see what happens at each sweep, every time we get information from the sensors, but we're trying to relate events from one sweep to another. So in these two cases, there are two possibilities: the targets could have followed either of these two paths. We're trying to do a similar thing in determining the path an attacker has taken while going after multiple systems. In our system, the tracker analyzes events being collected from various IDSs, whether they're distributed across many networks or are different types of IDSs on a single network. As these events arrive, they're analyzed. For example, initially we get a port scan. That's all we've seen so far, so we know the only possible scenario so far is that there is one attack occurring, and it consists of that port scan. Later, we also see a buffer overflow attack. Now there are two possibilities for what's going on out there: there could be one person doing both of these, doing the port scan and then trying to break into the system, or there could be two unrelated attacks that just happen to be occurring at the same time. These are our multiple hypotheses, and we're trying to figure out which one of them is most likely the case. We do this by evaluating the hypotheses based on the behavior of the attacker, of the target, and of the sensor. We want to know what real-world event caused the reading we're seeing from our intrusion detection system, what actually happened out there to cause it to go off and give us an alert. And we also want to know how likely it is that someone would have done this, and how likely it is that someone we saw before on our intrusion detection systems would have caused this event to occur. I'll skip over the math for now.
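To make the branching concrete, here is a minimal sketch (not the speaker's implementation) of how hypothesis enumeration works in this style of tracking: each incoming alert can either join an existing track or start a new one, so the port scan followed by the buffer overflow yields exactly the two hypotheses described above. The event names are illustrative.

```python
# Minimal multiple-hypothesis branching sketch.  A hypothesis is a list of
# tracks; a track is a list of events attributed to one attacker.

def branch(hypothesis, event):
    """Return all child hypotheses after observing one new event."""
    children = []
    for i in range(len(hypothesis)):
        child = [list(t) for t in hypothesis]
        child[i].append(event)          # the event continues track i
        children.append(child)
    child = [list(t) for t in hypothesis]
    child.append([event])               # the event starts a brand-new track
    children.append(child)
    return children

# A port scan arrives first: only one possible scenario so far.
hyps = [[["port_scan"]]]
# A buffer overflow follows: now two hypotheses, as in the talk --
# the same attacker (one track) or two unrelated attacks (two tracks).
hyps = [c for h in hyps for c in branch(h, "buffer_overflow")]
```

This is also where the exponential growth mentioned later comes from: each event multiplies the number of hypotheses, which is why pruning is essential.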
For modeling the sensors, we're looking at the two basic methods of intrusion detection: signature detection and statistical anomaly detection. Signature detection, of course, has a very low false positive rate, but it has the weakness of detecting only known attacks. Anomaly detection has a higher false positive rate, but it has the chance of detecting a wide range of attacks. And we use both of these on both host- and network-based systems. Modeling the sensors with signature detection is fairly easy: if a known attack occurs in an observable area, the probability of detection is one; otherwise, you're not going to see it. Anomaly detection is a bit more difficult. Whether it detects something is very dependent on the type of attack. Noisier and more unusual attacks are likely to be seen: things like denial-of-service attacks, port scans, or someone accessing unused services. Other types of attacks may be missed more often: things involving malformed web requests, some types of buffer overflows, things that look more like normal traffic. Also, when we're looking at the sensor, we're interested in the information it's giving us, the measurements it's making about the particular event. We try to use a sort of minimal feature set from the various IDSs we're currently using. This consists of the source and destination IPs, the ports, the type of the attack, and the time the attack occurred. These measurements that we take about each attack are then used to describe a space through which the attacker is moving as they go through various types of attacks against one system or attack multiple systems out there. To use this information to evaluate the response of the sensor, we're using Bayesian inference, and this is used to solve the inverse problem.
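The two sensor models above can be sketched as follows. This is only an illustration of the idea, not the actual model from the talk; in particular, the detection probabilities assigned to the anomaly detector are made-up numbers standing in for whatever was empirically measured.

```python
# Hedged sketch of the sensor models described above.

def p_detect_signature(attack, known_signatures, observable):
    """Signature IDS: detects iff the attack matches a known signature
    and occurred somewhere the sensor can see."""
    return 1.0 if (attack in known_signatures and observable) else 0.0

# Anomaly IDS: detection probability depends heavily on how "noisy" the
# attack is relative to normal traffic.  Values are illustrative guesses.
NOISY = {"dos": 0.95, "port_scan": 0.90, "unused_service_access": 0.85}
STEALTHY = {"malformed_web_request": 0.30, "buffer_overflow": 0.40}

def p_detect_anomaly(attack):
    # Unknown attack types fall back to an uninformative 0.5.
    return {**NOISY, **STEALTHY}.get(attack, 0.5)
```

The point of having explicit detection probabilities is that a *missing* alert then carries information too: an attack a sensor should have seen, but didn't, makes a hypothesis less likely.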
Basically, when we have a sensor, an IDS, out there, we know its forward response very well, meaning that if a particular event X occurs out there, someone tries a particular attack, we know what the sensor is going to tell us. That's how they're designed: they're designed to respond to their stimuli. But our problem is that we have to go through this backwards, because our only view of the world is through the sensors. We only see their output, and we're trying to infer the input that caused it. Given the output, what was the real-world event that occurred? To do this, we look at both the forward response, which is known, meaning the likelihood of getting the sensor response we saw given a particular real-world event, and the prior distribution of events, meaning the probability of that real-world event actually having occurred. What's the chance that attack actually happened? Or is it more likely that this was a false positive, something that didn't really occur, or at least wasn't significant? So, modeling the attackers. They're not as easy to observe or model as the sensors, mainly because we build and design the sensors and can look at how they operate, and oftentimes the sensors are all we can ever use to look at the people attacking the networks. So we have a very limited view of what's going on there at times, and it's very difficult to describe the state of the attack, which I'll discuss briefly in a moment. So far, in modeling the attackers, we've used three sources of data in this work: one is a simulation that we have designed, another is local networks that we have under our control, and the third is the capture the flag game here at DEF CON. For the simulation, there are a couple of ways we use it. One is purely generated data: we just have a complete simulation that simulates the network noise out there and the attacks occurring.
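The Bayesian inversion just described, going from the known forward response back to the real-world cause, can be shown in a few lines. This is a toy example with assumed numbers, not figures from the talk:

```python
# Toy Bayesian inversion: we know the forward response P(alert | event) and
# a prior P(event), and want the posterior P(event | alert).

def posterior(likelihood, prior):
    """likelihood: {event: P(alert | event)}, prior: {event: P(event)}."""
    joint = {e: likelihood[e] * prior[e] for e in likelihood}
    z = sum(joint.values())                  # normalizing constant
    return {e: p / z for e, p in joint.items()}

# Did a real buffer overflow happen, or is the alert a false positive?
# All four probabilities below are illustrative assumptions.
likelihood = {"real_overflow": 0.9, "benign_traffic": 0.1}
prior = {"real_overflow": 0.01, "benign_traffic": 0.99}
post = posterior(likelihood, prior)
# Even with a strong alert, a rare attack can still be the less likely
# explanation -- the classic base-rate effect, and exactly why the prior
# matters when deciding whether an alert was a false positive.
```
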
This is, of course, highly controllable for the development stages. We also have generated attacks, that is, a simulated attack inserted into normal network background noise that we record off of live networks. This is a little more realistic and a bit more interesting for testing purposes. This is just a diagram of one of our testing systems on the network at the engineering school. We've got the main line coming into it, it goes through a number of switches, and we've got IDSs sitting at various locations on the inside, where we can view them either together or in isolation from one another. We take the different segments of the network and pretend they're a distributed system, try to gather information on each one, and then correlate it all together to see if we can pick up the big picture. We also have another system at the ISTS location and another with a company nearby in town, to get a wider picture of what's going on out there. And of course, like I said, we also use the DEF CON capture the flag data. This data is unrealistic in some respects, because there's a general lack of stealth, a lack of network defenses, et cetera, on that network. But what's great about it for our purposes is that there are many, many attacks and many scenarios. A lot of the more traditional datasets for testing and evaluating IDSs, such as the Lincoln Labs datasets, only have single-event attacks in them. They don't have sequences of events, like someone coming in, doing recon, and then coming back and attacking particular systems, or attacking one system and moving on to another. They're just individual events, so they don't help us much in developing correlation techniques. But of course the capture the flag game does have this. For example, and this is from two years ago, a 2.5-hour time slice of the network data was played back against network intrusion detection systems.
There were over 16,000 events registered on it. Oliver Dain and Robert Cunningham from Lincoln Labs, mainly Oliver Dain I think, hand-classified these into scenarios. They went through all these events, figured out what was really going on, and there were about 89 individual scenarios in this 2.5-hour time slice. Now, the state problem for an attack. I won't go into too much detail about this, but in a traditional tracking system, you want to be able to describe the state the target is in in as simple terms as possible; for an airplane, you give its X, Y, and Z position and its velocities. It's a little more difficult to do this with an attack, because the space the attack is moving through is non-linear and non-contiguous, so there's no simple method for describing the state. What we do is use a small history of events in each track to describe the state of the attack, and see whether a new event coming in fits in with this short history that we keep. So we use a sort of windowed history of previous events, and look at the relationship between the new event we're trying to fit into the track and the previous event, and the event before that, and the event before that, with a weighting function so we can give more weight to the more recent events. We calculate the relationships between the pairs and sequences of events to see if the new event makes sense with what has come before it. One thing that's important to keep in mind here is that for describing the state of the attack, we don't really care how the attack got to this point; we just care which state it's in. We don't need to know the entire sequence, just enough so we can differentiate one attack from another.
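The windowed history with recency weighting can be sketched like this. The similarity function, the window size, and the decay factor are all assumptions for illustration; the real system computes richer relationships over the feature set described earlier.

```python
# Sketch: score how well a new event fits a track by comparing it to the
# last few events in the track, weighting recent events more heavily.

WINDOW = 4  # assumed window size

def similarity(a, b):
    """Toy pairwise similarity: fraction of matching features."""
    keys = a.keys() & b.keys()
    return sum(a[k] == b[k] for k in keys) / len(keys)

def fit_score(track, event):
    recent = track[-WINDOW:]
    # Exponential decay: the most recent event gets the largest weight.
    weights = [0.5 ** (len(recent) - 1 - i) for i in range(len(recent))]
    total = sum(w * similarity(e, event) for w, e in zip(weights, recent))
    return total / sum(weights)

track = [{"src": "10.0.0.5", "type": "recon"},
         {"src": "10.0.0.5", "type": "port_scan"}]
new = {"src": "10.0.0.5", "type": "buffer_overflow"}
score = fit_score(track, new)  # source matches, attack type differs
```

Note that the score depends only on the recent window, matching the point above: we don't need the whole sequence, just enough state to tell tracks apart.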
To do this, we need some predictive model that can give us an idea, based on what we've seen before, of what's likely and unlikely to happen in the future. This is where the different types of attacks, the different things someone's going after, are going to have different probability distributions over what occurs. For example, in these two graphs here, one shows a probability distribution of events for someone scanning a network, doing some sort of reconnaissance, and the other is for a denial of service. On these graphs, the further to the right something is, the faster the events are arriving. So in a denial of service, things occur very quickly; otherwise it wouldn't be much of a denial of service. Going back into the graph along the other axis, this shows the spread in the source IP addresses we're seeing. With something like a distributed denial of service, we're obviously going to see things coming from many different IP addresses. With a scanning attack, chances are, in most cases, they're all going to be coming from a similar area of the internet if it's the same person. So this shows just a couple of features: the IP source spread and the arrival rate of the events. We're interested in figuring out which of the features we're measuring give good information. So we've been looking at the historical datasets, either what we've captured off of the Thayer network or the capture the flag game, to see what makes good differentiating features for these events. What's shown here is the two graphs we saw on the previous chart, now plotted against each other, looking at one axis at a time. On the left is the arrival rate; on the right we're looking at the spread of the source IP addresses.
You can see there's significantly more overlap on the arrival rate than there is on the spread of the source netmask. This means that looking at the source IPs things are coming from is much more likely to differentiate between different types of attacks than the arrival rate, at least for these two scenarios. Then we look at this against the historical data we have. What this shows is the amount of overlap between the various types of attacks, depending on how finely you divide things up; I'm not going to get into that, it takes too long to explain right now, but basically this shows how much overlap there is between the various combinations of feature sets. This was taken from the capture the flag data from last year's game. So, for time purposes, let me skip ahead a bit. The material I skipped over basically explained that after we determine what the good feature sets are, we've been using a machine learning approach to classify the events. The various relationships between events, how well they match up on a particular set of features, feed into a machine learning system using neural nets, which then differentiates the events as they come in and separates them into the various tracks. As in the multiple hypothesis tracking method, as new events come in, we make all the possible hypotheses of what could have happened, and then try to evaluate them to determine which one is the most likely. In the brute-force approach, the number of hypotheses we have to maintain doubles every time an event arrives, so it grows very quickly if we don't prune back aggressively. In traditional multiple hypothesis tracking, you create all the hypotheses and then prune away the ones that don't seem very likely.
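The overlap measurement described above can be approximated with normalized histograms: build a histogram of a feature for each attack class and sum the per-bin minimum, so that a lower overlap means the feature separates the classes better. This is a sketch with made-up sample values, not the talk's actual data or exact method.

```python
# Histogram-overlap sketch for comparing how well a feature separates two
# attack classes.  All sample values below are illustrative.

def overlap(xs, ys, bins, lo, hi):
    width = (hi - lo) / bins
    def hist(vals):
        h = [0] * bins
        for v in vals:
            i = min(int((v - lo) / width), bins - 1)
            h[i] += 1
        return [c / len(vals) for c in h]          # normalize to sum to 1
    return sum(min(a, b) for a, b in zip(hist(xs), hist(ys)))

# Inter-arrival times (seconds): DoS events arrive fast, scans slower.
dos_rate  = [0.01, 0.02, 0.02, 0.05, 0.1]
scan_rate = [0.05, 0.5, 1.0, 2.0, 5.0]
# Source-IP spread (shared netmask bits): DDoS sources are scattered,
# scan sources are clustered.
dos_spread  = [2, 4, 6, 8, 3]
scan_spread = [24, 26, 28, 24, 30]

rate_overlap   = overlap(dos_rate, scan_rate, 10, 0.0, 5.0)
spread_overlap = overlap(dos_spread, scan_spread, 8, 0, 32)
# spread_overlap < rate_overlap: source spread discriminates better here,
# mirroring the observation about the two scenarios in the talk.
```
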
We have to be a little more aggressive in ours, so we use more selective branching. If we look at an event and there's clearly a winner, if it clearly fits in with a scenario that already exists, we don't branch on the other possibilities; we just put it in the one we think it belongs to and go on from there. If there's no clear winner for a particular event, if it's not absolutely clear which scenario it goes in, then we branch and maintain those hypotheses for a while, until more events come in and there's more evidence to determine which one is the most likely, which one appears to be the real set of events that occurred. To cut down on the data and on the number of hypotheses, we do some preprocessing, where sequences of events that are simply related are pulled out and turned into single meta-events immediately. Things like port scans, where it's very obvious they're all coming from the same place: these are very noisy, they generate lots of events, and they cause the system to get bogged down. By pulling these out and turning them into single meta-events, we reduce some of the computational workload on the system. A bit about testing and evaluation. We've done this on data collected from the Thayer and ISTS networks, where we have live data, and on some of the DEF CON datasets from the past few years. We used the Thayer sets earlier with the probability distribution method, and the machine learning approaches have now been applied to the DEF CON datasets. The Thayer dataset had about 1,500 events; these are actual intrusion detection system events from the sensors on those networks. They made up about 20 scenarios, and about 50% of the events were single events unrelated to anything else going on. This chart shows the accuracy of the system in placing events into scenarios, based on the number of hypotheses the system was allowed to maintain.
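The meta-event preprocessing step can be sketched as follows. The field names and the time threshold are assumptions for illustration; the idea is simply that a run of scan alerts from one source collapses into one meta-event before the tracker ever sees it.

```python
# Sketch: collapse runs of port-scan alerts from one source into a single
# meta-event, reducing the event count the tracker must branch over.

GAP = 5.0  # assumed: scans more than GAP seconds apart start a new meta-event

def collapse_scans(events):
    out, current = [], None
    for e in sorted(events, key=lambda e: e["time"]):
        if e["type"] != "port_scan":
            out.append(e)                      # pass other events through
            continue
        if (current and current["src"] == e["src"]
                and e["time"] - current["end"] <= GAP):
            current["end"] = e["time"]         # extend the open meta-event
            current["ports"].add(e["port"])
        else:
            current = {"type": "port_scan_meta", "src": e["src"],
                       "start": e["time"], "end": e["time"],
                       "ports": {e["port"]}}
            out.append(current)
    return out

# Ten raw scan alerts, one per second, from one source...
alerts = [{"type": "port_scan", "src": "10.0.0.5", "port": p, "time": t}
          for t, p in enumerate(range(20, 30))]
merged = collapse_scans(alerts)
# ...become a single meta-event covering ports 20-29.
```

Since the hypothesis count grows with every event the tracker ingests, collapsing ten noisy alerts into one meta-event directly cuts the branching workload.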
With a very low number of hypotheses, it doesn't do very well, because you're not looking at as many possibilities, so if a mistake is made, the hypothesis set is already corrupted. With a limited number of hypotheses, if the correct one isn't in the set you decide to keep and promote to the next generation for future evaluation, it's corrupted forever, and you're never going to get the right answer out. As we let the number of hypotheses grow, we got up to about 89% accuracy in placing events into scenarios with the Thayer set. The DEF CON set is from the DEF CON 9 capture the flag, which was interesting. Last year at DEF CON they changed the way the data was collected a little bit; it was collected from multiple points on the network, which was more interesting for us, because it essentially gave us different IDSs to correlate events between. On this one, using the machine learning approach, it's less stable early on, but eventually we got up to about 95% accuracy with enough hypotheses allowed. One issue is that this is based on evaluating how many of the actual events in the scenario it got. A lot of the hypotheses also had extra garbage events in them, but that's an area for improvement, something we're not that concerned with right now, because if we can pull out the scenarios, the actual sequences of related events, and put those together, then even if there's some noise in them it still creates a much clearer picture than what we had before.
So, just to give a quick summary: what we're doing with this is organizing data that's already being generated out there to make better use of it, make the job of the security analyst a bit easier, and provide a higher-level view, kind of like the overhead view, to see what's going on on each network and what each attacker is trying to do across multiple networks, reducing the work of security analysts through the application of techniques out of radar tracking systems. Some of the future, and now current, work on this is to incorporate a wider variety of sensors, more host-based IDSs, system logs, things like that, to get multiple views of the same attacks and help reduce the false positive rate of the system. If you can correlate what your network IDS is telling you, what it says is an attack out there on the wire, with what hosts are actually seeing on themselves, you know whether it was actually a successful attack or not, whether it's something you need to be worried about or not. There's also integration with other network analysis tools, and we're interested in incorporating this with a few other projects at ISTS. One of these is distributed ICMP backscatter. This is a system where they've modified the kernels of some routers to report back to a central logging facility when they're seeing large amounts of ICMP data, all the packets coming back through them that you'd see when a worm is out there scanning large portions of the internet, hitting many places, trying to scan many hosts that don't actually exist. You get this flood of ICMP traffic coming back toward the source of the scan, and by looking at this on a wide scale and trying to coordinate it across many areas, we hope to pinpoint where the worm is coming from very early in its scanning stages. Another is the Border Gateway Protocol (BGP) routing analysis. We're looking at the
information being passed back and forth by the border gateways as they propagate their known routes around the system. We're getting information from these gateways, scattered around the world, and trying to determine when certain systems are being routed around, when there are problems in certain parts of the internet, by looking at the updates to the routing tables. And of course there's the larger-scale implementation, addressing all the scaling, timing, and communications issues that come with that. So that about does it. Are there any questions? I can't hear you... okay, the question was how to get a copy. It's not available right now, but within the next few months we're hoping to have something up and available, and it'll be at www.ists.dartmouth.edu. I think there's a bit of information about the location in the version of the talk that's on the CD you were given. I saw some other question, maybe it was the same one. The question was whether, in the DEF CON analysis, what the guys from Lincoln Labs provided was the correct set of scenarios to evaluate ours against. We did use what they did, that two-and-a-half-hour time slice, as a starting point; the graph here was actually from the DEF CON data from two years ago. For last year's data, we went through and did sections of it ourselves: we took the data from the capture the flag game last year, went through it by hand like the guys at Lincoln Labs did, and made our own judgments as to what we think the correct scenario answer is. The next question was how the system deals with someone deliberately feeding it garbage data, and that's something we haven't really addressed yet. Certainly, if you know that this thing is out there looking at what you're doing, you could damage its evaluation by putting a lot of garbage out there. But in a lot of cases, the traditional methods of trying to generate that sort of garbage
data and create a lot of noise are exactly the things this tries to correlate and pull out of the system. So I wouldn't go as far as to say it's safe or does well against that, because we didn't really test against it, but by the nature of the system, it should be able to handle it, at least with perhaps some modification from how it operates now. Yes, the question was whether the traffic capture from the capture the flag is available, and it is. It's the Shmoo Group, which is at www.shmoo, and I think it's .org but it might be .com. They record the raw network data of the capture the flag game, and it's a lot of data, but they make it publicly available for research purposes. The next question was whether we take into account sensors running different IDS policies, and how that affects what we're seeing. That's taken into account in the sensor modeling stage. As I talked about, when you're looking at, say, a signature-based IDS, its probability of detecting something is based on, one, what it's looking for, and two, the area it looks in, and what it's looking for is directly correlated to how you have it configured. So that is taken into account there, and it's mainly important to know what you're *not* looking for, so that if something's missing in a scenario, it can be explained away as not being seen: we should have seen it, but the reason we didn't is that we weren't looking for it. The question was what's the minimum number of events in a sequence it takes to get correlation, and that's very dependent on what the sequence is, what type of attack it is. With certain types of attacks, something where you see recon coming in and then attacks going after what was reconned, it doesn't take much, because they're very similar on a lot of
feature sets. The things that are a little more out there and separated get more difficult quickly, and it's a really hard thing to answer, because it depends on the type of attack and on the event rate of the attack. Things like scans have high event rates; if it's just a single buffer overflow attack coming in and then not much else happening, oftentimes there's not enough information in that to do accurate correlation. So, I think that's about it. Thank you.