Okay everyone, if you could please make your way to your seats. This is the final shared-track talk today before the lightning talks start, and I would very much like to introduce you to Dan Lambright. Thank you. That's better. Okay, so I'm going to be talking about intrusion detection on OpenStack. I work at Red Hat. I've been there the last couple of years in the storage group, and I also teach at a local university in Massachusetts, the University of Massachusetts Lowell, and that's basically how I got into intrusion detection: it's part of the class that I teach. There are also a lot of OpenStack engineers around who I work with on Swift and other components, so there are plenty of folks I can talk to, and it's a very interesting subject. I first gave this talk last summer at the OpenStack Summit in Vancouver, and since then the story has changed: it's actually become a lot easier to set this up in OpenStack, so I'll be talking about that. What I'll cover are two types of intrusion detection systems. First, the network IDS: what it does, how you can configure it with OpenStack and integrate it, and the different use cases. The networking plumbing is basically the key to it, so I'll show you how to set that up and dive a little into the performance. Because of my work in storage I wasn't able to do all the scalability testing I wanted, but we do know a lot about how it would be practical to set it up. The second part of the talk is about host intrusion detection. That's not about the network; it's more like virus scanning and a list of other things those systems are capable of doing, and the question is how that fits into OpenStack scenarios, so we'll talk about that. Then we'll put it all together using some open source tools and look to the future. Okay, so first: network intrusion detection.
The job of a network intrusion detection system is to analyze packets as they arrive on the system, scan them looking for malicious behavior, and, if such behavior is seen, do something: typically tell the administrator that malicious behavior is underway, and maybe more. That's the response. There's an open source tool called Snort, which is of the variety called signature-based. It works according to rules which you supply to it. The rules describe the malicious behavior in the packets, sort of like the offset and the particular bytes that are bad. You can download these rules, which the community maintains, called rule sets, or you can purchase extra support from the company. There's another flavor of intrusion detection system which is a little more complex. It's more programmable, or script-based, and it can look for anomalies: things that are hard to describe in a one-line or multi-line rule you can actually describe in a script, such as peculiar denial-of-service attacks which look like ordinary traffic but really aren't, and which take a little more intelligence to detect. Bro is a well-known IDS in that category. So here are some examples. These are one-liners to look for scanning attacks. A typical rule, at the very top: look for TCP packets from anywhere to my own network, on any port, with both the SYN flag and the FIN flag set in the TCP flags, and if that's seen, display a message. That one is just a toy for demos. To show another example, last year's Shellshock attack was put into the community rule set quickly, like in the same week. It turned out there was a simple four-character signature that could be used to detect it; you can see that in the content field. The rule in the middle is the demo I use in the class I teach; the one on the bottom is the actual rule, which has all the nuances to make it hardened for real-world use. Right, so now we've got that. How would this work in OpenStack?
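To make those rule examples concrete, here is a sketch of what the two toy rules described above might look like in Snort's rule language. The `sid` values and the `local.rules` file name are illustrative, not taken from the talk's slides:

```shell
# Append two illustrative Snort rules to a local rules file.
# flags:SF matches TCP packets with both SYN and FIN set (the toy scan rule);
# content:"() {" is the short four-character signature associated with
# Shellshock probes (the hardened community rule has more nuance).
cat >> local.rules <<'EOF'
alert tcp any any -> $HOME_NET any (msg:"SYN+FIN scan"; flags:SF; sid:1000001; rev:1;)
alert tcp any any -> $HOME_NET 80 (msg:"Possible Shellshock probe"; content:"() {"; sid:1000002; rev:1;)
EOF
```

A file like this would then be included from snort.conf so Snort loads it alongside the community rule sets.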
When you think about OpenStack, you might have two use cases. In one of them, you are the administrator of a tenant and you're responsible for all the instances in your tenant. In this case you may want to run Snort, or something like it, inside an instance, and that instance would be charged against you like any other instance in the tenant. So this is the case where you want to do IDS but you do not have access to the underlying hardware. The second use case is where you're the administrator of the hypervisor itself; that is, you have access to the hardware itself, the machine, the network, and can do whatever you want. In that case, of course, you might have two tenants with the same IP address, and you've got some administrative overhead to do things like distinguish tenants. You might have one set of rules for one tenant and another set for another, and you may use VLAN tags, for example, to distinguish the two tenants, which Snort allows you to do. So those are the two high-level use cases. I'm mostly interested in the second. So let's say you are the OpenStack administrator and your next question is: okay, I want to use a network IDS. The IDS basically involves sniffing packets. Where am I going to sniff the packets? Here is what the network looks like. This is a somewhat famous picture if you're involved with OpenStack. The blue rectangles are network bridges. The one that says br-int is the so-called integration bridge, and all traffic flows through it. The green circles are instances, and the blue rectangles between the green circles and the integration bridge are Linux bridges, which are the firewalls. There is also a blue rectangle called br-tun; that's the tunnel bridge, and it connects to other nodes. So that's the internal network layout, and if you're an administrator your next question is: where do I plug in my IDS? Here's one option.
You could just create another instance and put it between that firewall, the Linux bridge I mentioned on the previous slide, and the instances. I know people who have tried this, and it does work, but it's slow, because it's a store-and-forward model: you have an instance which reads every packet, checks it, and then forwards it on to the ultimate destination. It's technically possible, but I'm not crazy about it because of the performance overhead. Better, I think, is to mirror the traffic somehow. The traffic goes to its destination as normal, but you receive a copy of it at the IDS, where you can analyze it and take preventive action. Of course there's a window between the time you see it and the time it reaches the destination, but I think this is more like the way it's typically set up in real life. What you actually want is a tap. A tap does the mirroring, and then the next question is where to put it in that network, and the integration bridge is a logical place to sniff, or tap into, some subset of the instances you're interested in analyzing. So how is that accomplished? Well, last summer there wasn't really a good way to do it. OVS has long had a mirroring capability, but it was difficult to use. Last summer I used it, and it was fantastically difficult to set up administratively; if anybody's messed around with networking in OpenStack, it's a bear. Fortunately, the community has since come out with a new service for Neutron which wraps around that OVS mirroring capability with a very nice user interface. This has come about just in the last six to twelve months, and that's what I'll hopefully demo in a few minutes. You can snoop bidirectionally, or ingress only, or egress only. It came out of the telecom community; they built it not just for security but for analytics and some other use cases, but it works great for IDS.
So let's say you're an admin and you want to use this. You can just go to, say, your DevStack local.conf file and add these lines. You're enabling two plugins: one is the agent for OVS and one is Tap-as-a-Service. Tap-as-a-Service is built in an extensible way so it can work with multiple network implementations, but OVS is the one they did first. So it's a pluggable model: OVS now, maybe other network back ends in the future. Another thing you have to do is set the port security flag, and what that does, I believe, is say that it's okay to forward a packet through the bridge to a destination whose MAC address is not the packet's destination MAC. That's normally blocked, but with port security disabled you can do it. It means traffic that was originally destined for one MAC address can go somewhere else without being blocked, and you need to say that explicitly. So let's say you've tried this, you've downloaded it. What you'll see is that a new bridge is created, called the tap bridge. It's another blue rectangle in the diagram, and it's connected to the integration bridge and also to the tunnel bridge, because the nice thing is that you can have multiple OpenStack nodes and one can tap into another node. The API is straightforward. Basically you have a fan-in model, where there's a single destination, in our case the IDS, and multiple sources. The multiple sources are called tap flows, and the single destination is called a tap service. So you first create the service with the interface shown up there: you supply a Neutron port, which can be the port of the IDS instance. Then you create flows; those take the ports of the sources you want to sniff. Right, so you do that. This is just a depiction, I don't know if you can see it, of `ovs-vsctl show`. It shows all the bridges, and you can see the tap bridge that's been created. Another interesting thing, and I'm not sure you can see this either.
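As a sketch, the fan-in API described above amounts to a couple of Neutron CLI calls. The exact flag names below are from memory of the Tap-as-a-Service extension and may differ by release, and the port IDs are placeholders:

```shell
# Disable port security on the IDS (destination) port so mirrored frames,
# whose destination MAC is not the IDS port's MAC, are not dropped.
neutron port-update $IDS_PORT_ID --port-security-enabled=False

# One tap *service*: the single destination, bound to the IDS instance's port.
neutron tap-service-create --name ids-tap --port $IDS_PORT_ID

# One tap *flow* per source port you want to sniff; BOTH mirrors
# ingress and egress traffic on that port to the tap service.
neutron tap-flow-create --name attacker-flow --port $ATTACKER_PORT_ID \
    --tap-service ids-tap --direction BOTH
```

In DevStack, the prerequisite is enabling the tap-as-a-service plugin and its OVS agent in local.conf, per that project's README.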
I had to learn this to set it up. It's quite nasty, but basically these are the flows which go through the OVS bridge, and the cool thing is that once you set up a Tap-as-a-Service flow, you can actually see the flow that's created in the integration bridge, representing the MAC address you're sniffing. So I've got a quick demo here; let's see if this works. The font size is a little big here, and those of you who use OpenStack know it's got some crazy outputs. But I've got three instances here. Looks like we're working; we're SSHed to a machine in the U.S. right now. So I've got three instances: one is an attacker, one is a victim, and the third one is the IDS. And I've set up a flow between the attacker and the IDS. So let's just see. Yeah. Okay. So I've created two... these are the two Neutron ports of the two instances I just mentioned, the attacker and the IDS, and you can see the identifiers of the two ports. And let's try this: tap-service-list. See if this works. Yeah, that worked. Okay, so that's the tap service I created. That's the destination, and you can see that's the port of the IDS. And then we have tap-flow-list. This is, yeah, and that's the corresponding port of the source, which is the attacker. All right. So let's just quickly see. Let's see here. Right, this is the IDS. I'm going to run tcpdump and only look for ICMP packets. I'm going to go over to the attacker and ping Yahoo. And I should see... yeah. Okay. So this just shows that I'm going from the attacker and pinging Yahoo, and mirroring happened: the mirroring intercepted the packets and routed them to my sniffer, the instance which is sniffing that traffic. You can see that happened. All right. I am running Snort right now. Right? Oh, am I? Oh, that's a... yeah. Oh, this is the wrong one, I think. Let's try this one. Yes. Okay.
Usually demos always fail, but this one just may succeed. All right, so let's run Nmap. Nmap is a nice scanning utility; I'm going to do a FIN scan. And that IP address is actually an internal IP address of the victim, so it's not going to the IDS directly. So I do that, Nmap is running, and it takes a little bit of time to work through all the ports. And there you can see Snort has found it: the FIN scan has occurred and it raised an alert. So that's just a proof of concept of Tap-as-a-Service, and it shows that it's possible to make this work. Okay. Right. So that shows it's practical to do, that you can build an IDS this way. That was not true last summer; I think it is true now. The next question you might have is: okay, do I want to run the IDS on the same node as the instances I'm protecting? If you do that, you start to worry about the cores you're swallowing up for this monitor, and what might be preferable is to run the IDS on a separate node. Doing that means the separate node will have all the capability it needs to analyze these packets, but it's going to be more expensive. So what are the performance costs? What happens when the CPU is overwhelmed? Well, it will give you false negatives, meaning the IDS saw nothing wrong when there may well have been something wrong. This happens pretty quickly when you fan in a lot of instances to one single recipient, one single NIDS: it will start to drop packets, and then you'll miss the attack. So what is the performance overhead here? On this sort of setup you've got two components to look at: one is Open vSwitch, and the other is the CPU overhead of Snort. And then you can look deeper by scaling up to the maximums, which I haven't done yet, unfortunately, but would like to. So, Open vSwitch performance by itself: OVS has a reputation for being quite fast.
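Roughly, the demo above boils down to commands like these. The interface name, config path, and target address are placeholders, and `-A console` is just one convenient way to see alerts interactively:

```shell
# On the IDS instance: run Snort quietly, printing alerts to the console,
# against the NIC that receives the mirrored traffic.
sudo snort -q -A console -c /etc/snort/snort.conf -i eth0

# On the attacker instance: a FIN scan (-sF) against the victim's
# internal address; Snort should raise the scan alert shortly after.
nmap -sF 10.0.0.5
```

The same pattern was used for the ICMP check earlier: `tcpdump -i eth0 icmp` on the IDS while pinging from the attacker.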
There's a table in the kernel mapping MAC addresses to destination ports, and as long as you stay in the kernel, on the fast path, you're probably going to be so quick that you won't see OVS as a bottleneck. The only time you come up to user space with OVS is on a configuration change, maybe a new flow that you've added, which is a rare event, perhaps an instance going up or down. So OVS should be quick, and shouldn't be a problem for setting up an IDS. But there is a problem with mirroring. I've seen, and others have seen, a performance degradation when you start to mirror traffic specifically. Now, I haven't redone that measurement with Tap-as-a-Service, and I'm not sure it does mirroring the same way I did last summer, but people have measured that the more mirroring you do in OVS, the more performance degradation you get; it's close to linear. So that's an issue which should be looked at. The second part of the equation is Snort performance itself: that's the program you're running to analyze the packets. Snort has, up until recently, been single-threaded. The very latest Snort might be multi-threaded, and if it isn't yet, it will be very soon. To get around this, people have in the past run multiple Snort instances; that's an option. An alternative to Snort is Suricata, which is compatible with those rule sets, so it can read the standard Snort rule files, but it is multi-threaded and has a reputation for being faster. So you could investigate that as an alternative. But in any case, as far as I know, Snort is going to become multi-threaded. So that's the network IDS. How about host intrusion detection systems? What is a host intrusion detection system? It looks for hackers attacking the system itself: not the network, but your system.
So it's looking for things like: maybe somebody's planted a virus, maybe somebody's planted a backdoor by tampering with a binary, maybe somebody changed a configuration file. Those are the sorts of things it can detect. The popular open source HIDS is OSSEC. It does the things I just mentioned, and on top of that it does something great for the administrator of a big distributed system, which is log aggregation. You can have many different machines that you want monitored, and you don't need to check each one of them individually, which would take forever. You can have one master OSSEC server, which is responsible for protecting all those instances. Each of the instances it's responsible for protecting runs an OSSEC agent, which does the information collecting and then reports back to the server. So you have, and this is a really ancient diagram, but it basically looks like this: one server, with all the clients feeding their information back to it. All right, that's the way OSSEC has worked since its inception, which is some time now. Translating that into the OpenStack world, how does that go? You have the same use cases as before. You could be a tenant administrator, with some instances you're responsible for. If you want to run OSSEC in that environment, you can set up one of your instances, on the same node or a different node, as the OSSEC server. But you'd probably want to, actually you would have to, force your tenant users to run OSSEC, and that means having the OSSEC agent, and that means putting it in your Glance images. Now, the other use case is if you're responsible for the whole thing, that is, you're the hypervisor administrator. Then you can do some additional things: you can actually look at the OpenStack configuration itself and protect that. We'll talk about that for a few slides as well. So, here's an example OSSEC alert.
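The agent side of the server/agent layout described above is configured in ossec.conf. A minimal sketch, with a placeholder server IP, written to a local file here just for illustration (a real deployment puts this in /var/ossec/etc/ossec.conf and also registers the agent with the server's key management tool, both omitted):

```shell
# Minimal OSSEC agent configuration: point this agent at one central
# OSSEC server, which aggregates logs and alerts from all agents.
cat > ossec-agent.conf <<'EOF'
<ossec_config>
  <client>
    <server-ip>192.168.1.10</server-ip>
  </client>
</ossec_config>
EOF
```

In the OpenStack tenant case, this fragment is the sort of thing you would bake into the Glance images your tenant users boot from.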
Somebody tries to log in to your system as the root user, and they put in the wrong password, and /var/log/messages reports that this happened. OSSEC will see it, because it can parse logs from multiple sources; that's part of log aggregation. It can read Snort logs, /var/log/messages, web application firewall logs. It can take all these different sources and parse them, so you only need to read one set of logs. So let's say it read /var/log/messages. It can then trigger an active response, like alerting the administrator. In OpenStack, it could do a bit more. A parser could be created for the many OpenStack logs. OpenStack has Neutron logs, Nova logs, Keystone logs, et cetera, et cetera, and a parser could be written, just like it is for any other application, and OSSEC could read those and flag, say, that an illegal network topology change occurred. I don't know if anybody's doing that. It would probably be quite challenging, because parsing the OpenStack logs would be quite a job, but it's theoretically possible, like it is for any other application OSSEC works with. File integrity checking is another thing OSSEC is capable of. This is for when somebody modifies a file, say a binary: a hacker says, I'm going to take a file and modify it so it does something malicious beyond what it's supposed to do, and you don't know, because it looks like the original file. OSSEC will find those kinds of backdoor attacks by scanning your file systems, whatever directories you wish it to scan, computing checksums for each of those files, and storing them. If somebody modifies a file, it'll come up with a new hash, and when you compare the saved hash with the new one it won't match, and then you can say, wow, this changed behind your back. Now, in the OpenStack case, I think it would be interesting that you could actually do this with the Python files.
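The file integrity checking just described is OSSEC's syscheck module. A sketch of the relevant ossec.conf fragment follows; the directory list, the Python path, and the scan frequency are illustrative choices, not values from the talk:

```shell
# syscheck: periodically hash the listed directories and alert on changes.
# check_all compares checksums, size, owner, and permissions.
# The Python site-packages line is the hypothetical "catch modified
# OpenStack code" case mentioned above.
cat > syscheck.conf <<'EOF'
<syscheck>
  <frequency>21600</frequency>
  <directories check_all="yes">/usr/bin,/usr/sbin,/etc</directories>
  <directories check_all="yes">/usr/lib/python2.7/site-packages</directories>
</syscheck>
EOF
```

Note the frequency: as comes up in the Q&A, this is a slow background crawl, so a change may only be detected hours after it happens.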
So if somebody went and modified OpenStack code, the OpenStack software itself, you could quickly find it with this mechanism. Now, active responses in OSSEC are interesting because they're actually shell scripts. So OSSEC could do anything you can do in a shell script, basically, including killing an instance, turning off a tenant, or adding a new firewall rule to the Linux bridge, a new security group; that's all within the realm of possibility. So, all right. Putting all this stuff together, you might have two nodes, call one a monitoring node and one a compute node, and you have an instance, and maybe the instance that's running has the OSSEC agent in it. So you're doing host monitoring on that system, and you're also monitoring the network. Some malicious traffic comes in; that traffic gets mirrored to the other node and read by Snort there, and then OSSEC, which is capable of reading Snort logs, parsing them, and understanding them, will capture that. And then it can go ahead and tell the agent to run a shell script, which could block that traffic. And if OSSEC is too much for you, it's overkill and you just want to do network intrusion detection, the same idea works fine, because Snort is capable of doing active responses as well. So, in conclusion: last summer this was, I think, a much more difficult job. Today I think it's quite feasible, and you could do it, and that's all because the Tap-as-a-Service capability has been introduced. Performance bottlenecks are still an issue: if you're running many dozens of instances or more and feeding all that traffic into a single IDS, I think it's likely to drop packets. You've also got to look at the cost; that's going to add up, because you're probably going to want a monitoring node, and you might need more than one, depending on the size of your cloud.
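Since an active response is just a shell script OSSEC invokes, here is a dry-run sketch of the OpenStack-specific response idea above. The function name, the argument handling, and the use of the `openstack server pause` command are my assumptions for illustration, not something shown in the talk:

```shell
# Hypothetical OSSEC active-response hook: given an instance ID pulled
# out of an alert, act on it through the OpenStack API.
# DRY RUN: this version only echoes the command it would run, so it is
# safe to experiment with before wiring it into OSSEC.
block_instance() {
    local instance_id="$1"
    echo "openstack server pause ${instance_id}"
}

block_instance "badc0ffe-1234"
# prints: openstack server pause badc0ffe-1234
```

Swapping the `echo` for the real command (and sourcing admin credentials first) turns this into the kind of response Snort alone cannot perform, since Snort knows nothing about OpenStack.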
You might want to consider using a different IDS than Snort, although Snort is improving in this respect. And OVS itself, the network itself, may become a bottleneck, although that's an area for investigation. So yeah, that's IDS on OpenStack, and thank you very much. So: I have a scarf to give away, or we can go home. Does anybody have a question? [Audience question] Yeah, you have to tone down those logs, and that's part of the tuning. There are lots of ways you can tune it. There's basically a file called threshold, which I used myself when setting this up, to turn off a lot of the bogus alerts that are overly paranoid. If you don't do that, I agree, it's not terribly useful. An IDS is not for everybody; firewalls probably handle the majority of the cases for most people. But for those of you who want to use an IDS, probably the first step is to tone down the log messages, set those thresholds, and screen out some of the junk. [Audience question] So your question is: who should do the active response, Snort or some other mechanism? I think the flexibility you get from making your own active response, your own shell script, is worth the trouble. It's a little more administrative overhead, but you can fine-tune exactly what you want to do. And particularly with OpenStack, you'd probably want to do something through the OpenStack API; you might want to kill an instance or something, and Snort doesn't know how to do that. Snort doesn't know anything about OpenStack. And I think this is an area people will be contributing to more and more, now that we have Tap-as-a-Service and this is viable: you might see OpenStack-specific active responses which Snort won't know anything about but can invoke. [Audience question] Yeah, last I looked, it was still single-threaded; I might be wrong now. Because it was single-threaded, it tended to drop packets, and when it drops packets, you don't see the attack.
So there are alternatives out there which would fit just fine in this model. I'm not saying Snort is the most popular, but that's why I brought it up. There are other ones, like Suricata, which I mentioned. I guess I like Snort because it has such a large community that people add rules almost immediately. Shellshock came out last year, and immediately there was a new rule added. That's really important. Let me give you a gift. Oh, okay. [Audience question] I think that was measured in isolation. I did it myself and reproduced it, but I had seen it written up in a paper, which I reference in the slides. That was not Tap-as-a-Service; that was just vanilla OVS port mirroring. OVS has a capability to do this, and I see a signature when you set up port mirroring with OVS: I can run `ovs-ofctl dump-flows`, one of those commands, and actually see the word "mirror" in there. Now, when I use Tap-as-a-Service, I don't see that any longer, so I'm wondering if they figured out a different way to accomplish the same thing without using mirroring. That's something I'm not clear on. I suspect, though I don't have proof, that mirroring is the origin of the 30% drop. You're asking whether adding a VXLAN ID is what incurs the copy? Yeah. Well, here, this is going to be a long shot, but I'm going to try it. Oh, I see what you mean. That's a great question. So when you do an upgrade, it's going to change all your checksums. I think what you would practically do is shut off OSSEC temporarily. Yeah, you would have to, and you'd have to recompute. OSSEC also does a slow background crawl over the file system; it doesn't necessarily discover immediately, if I remember right, that a file has changed. So a hacker might do something and you find out hours later that it changed. But practically, yes, you do have to shut off OSSEC. Yeah. Here you go. Oh, you got one too. Okay. [Audience question] I have not done a great deal of looking into that.
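For reference, the vanilla OVS port mirroring being discussed here (what I used last summer, before Tap-as-a-Service) is set up with ovs-vsctl roughly like this; the bridge and port names are placeholders:

```shell
# Create a mirror on br-int: copy traffic selected on the source port
# to the port where the IDS listens. Each "--" starts a sub-command;
# --id=@x captures a record UUID for use in later sub-commands.
ovs-vsctl \
    -- --id=@src get Port tap-victim \
    -- --id=@out get Port tap-snort \
    -- --id=@m create Mirror name=ids-mirror \
         select-src-port=@src select-dst-port=@src output-port=@out \
    -- set Bridge br-int mirrors=@m

# Tear the mirror down again.
ovs-vsctl clear Bridge br-int mirrors
```

This is the setup whose flows show a mirror signature in `ovs-ofctl dump-flows` output, and which the paper measured a near-linear degradation against.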
So I think, in general, the first thing you've got to think about is that you can fan in from multiple network nodes to a single Snort. But if you have many nodes, then one Snort is probably not going to be powerful enough to handle all that traffic, so you're going to, in fact, need multiple Snorts in order to scale up to the amount of traffic being received. But I believe Tap-as-a-Service has global visibility across all the nodes. It uses ports, and ports are visible throughout the entire network, as far as I know. So it will handle the underlying mechanics of setting up tap bridges on each of those nodes, and you give it a port, and the port may live way over on another node, but Tap-as-a-Service will set things up such that it routes the packet to the correct location without you, as the admin, having to worry about it. Right. Okay. That's it. Thank you.