testing. All right, let's get started here. I think a few more people are coming in. I'm going to be talking a little bit about intrusion detection in OpenStack. I work at Red Hat in storage, actually, but I also teach as an adjunct professor at the University of Massachusetts Lowell, where I teach some courses in security and intrusion detection. At Red Hat, I work with a bunch of OpenStack people in Swift and other areas. That teaching job outside Red Hat was the impetus for me getting interested in intrusion detection. In this talk, first of all, I hope to meet people who are also interested in this subject. I'm going to show you what I've come up with to get intrusion detection working in OpenStack and sort of where I see it going, and I'd like to hear feedback from people on whether this is the right direction or not. I hope to do a lot of that this week.

What I'll talk about first are network intrusion detection systems. I'll look at the design, some different use cases, and the network plumbing changes in Neutron that I see as needed in order to make it work. We'll look a little bit at the performance, which is kind of a slippery area in OpenStack, and I'll highlight some of the key considerations you run into when you set up an IDS. I'll spend a bit of time on host intrusion detection systems. I think host intrusion detection systems are quite interesting and probably have a great role to play in OpenStack; I can actually devote half my course to HIDSs. In particular, I'm interested in putting an HIDS and a network IDS together so they work together. This is the way commercial vendors sell their products: they don't necessarily give you a network IDS or a host IDS, they give you a combination package. I think we can do that too, and I'll talk about that. Then some next steps on how we can make this easier, because a lot of this, I've found, is do-it-yourself, which is sometimes quite difficult.

First, network intrusion detection systems, and a recap of what these are for. What a network intrusion detection system basically does is analyze packets as they enter the system. It looks at them, inspects them, and if it detects something malicious or suspicious, an alert is created: the operator is somehow contacted or a log entry is generated. Potentially there's also something in the lingo called an active response, which is some action taken to stop whatever the bad behavior is from happening. Network intrusion detection systems come in two flavors, broadly. One of them is signature-based, which basically means you have rules that spell out in fine-grained detail exactly what the suspicious activity is; I'll have an example on the next slide. They spell out the patterns that are bad. These rules are sometimes called core rule sets and can be maintained by the community, by us. If a new attack comes out, say a zero-day, someone will write a brand new rule and upload it, and it will get into the configuration very soon. Snort is really the best-known IDS of this sort. Another one is called Bro. This one is anomaly-based: it is capable of learning the normal traffic and sending an alert when there's a deviation from that norm. It's also script-based, almost like a programming language, so it's possible to do some extremely complex analysis. I'm not going to talk about Bro.
I actually have found it a little complicated to use, but I think a lot of the techniques I'll be talking about will be applicable to Bro.

Here's just a quick example of some rules you might find in Snort. These are reconnaissance rules, for port scanning. The rule here is alerting if a TCP packet comes in from any IP address or any port, that's the first four columns there, and arrives at the home net, that's my IP address space, on any port. If the flags SYN and FIN, that's the top line, are set, this is a so-called SYN-FIN scan, and that rule will trigger; it has a unique identifier and a message is displayed. Here's another example, the Shellshock rule. There's a simplified one, which we did in our class. You see the content is just looking for a four-character string: open parenthesis, close parenthesis, space, open brace. That was acceptable for demonstrating how to find Shellshock. The actual rule used by Snort, as you can see below, is way more complex and goes into very, very fine-grained detail. Those are some example rules to give you a flavor.

Why, or how, would you want to use a network IDS in OpenStack? I use the words tenant and project interchangeably throughout these slides. Well, if you're a tenant (project) administrator, you probably want to guard your instances. So a likely scenario would be to download and run a network IDS inside an instance and use it to guard the other instances, and that instance would be charged just like any other instance. Another plausible use case is the administrator of the cloud itself. In this case, you could have a network IDS guard all the instances across all the tenants. Here you have a very practical problem: you could have IP addresses that are the same across two tenants. Not a big deal, though. Snort can recognize VLAN tags, each tenant has its own tag, and you can even have different configuration files or different rule sets in Snort for each tenant. So those are the two use cases. The first one I think is a bit more interesting, and that's what I'm sort of focusing on; we could call it IDS as a service.

So, IDS as a service: where would we want to place a network IDS? Well, this is, I think, probably a diagram that everyone should be familiar with; I assume most of you are. The key points are that I'm using RDO, the Red Hat OpenStack distro, so I'm using Open vSwitch, and I'll be talking about Open vSwitch. The major piece here that matters is the integration bridge. All the instances go through there, and right above it, the one with the three letters qbr, is a Linux bridge acting as a firewall. A Linux bridge, because it has to use iptables. Those connect to the green circles, which are the instances. Okay. So one way you could put a network IDS into OpenStack would be to just have a gateway: you create a new instance and feed all traffic through this gateway, perhaps after the firewall. It could be sort of a store-and-forward model, so packets come into this gateway and then go out to the instance. It would be slow because it's store and forward. Now, I know some people are actually trying to implement this. I ran into some students in Australia who are doing some work on this, and I think it's very interesting, but it's not my approach. So I'm just going to cross that out, because I have a goal, an interest in unobtrusiveness.
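Just to circle back to those two Snort rules for a second before we dig into the plumbing: here is roughly what the simplified versions look like written out. The SIDs and messages are placeholders of my own, the port 80 in the second rule is an assumption on my part, and the real community rules are considerably more detailed than this:

    # SYN-FIN scan: alert on any TCP packet arriving at my network with both SYN and FIN set
    alert tcp any any -> $HOME_NET any (msg:"SYN-FIN scan detected"; flags:SF; sid:1000001; rev:1;)

    # Simplified Shellshock check from class: just look for the four-character string "() {"
    alert tcp any any -> $HOME_NET 80 (msg:"Possible Shellshock attempt"; content:"() {"; sid:1000002; rev:1;)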
So what I'm trying to do is create what's called a mirror, or use port mirroring. What this does is take a tap, basically, on the wire somewhere in that software-defined network, make a mirror copy of all the traffic, and feed that to the network IDS, which can then do what it needs to do. The downside with this approach is that some of your virtual cores still have to be allocated to run that network IDS, but at least you're taking the traffic out of band. So if you're going to add this tap to the OpenStack network, where would be a good place to put it? Well, I believe the integration bridge is an appropriate place, because all the instances go through there. That's the main intersection of them all, and you could choose all the instances in your tenant or some subset.

So how is this done exactly? This is the fun part, and where I think OpenStack is explosively complex if you don't come from a network background. Let's say you want to do this, and you have created an instance and loaded it with an IDS; there are a bunch of nice distros out there that come preloaded with security IDSs. So you have this basic setup. Out of the box, creating the instance creates what's called a tap interface, which is connected to that instance, and that in turn connects to the lower box, which is the Linux bridge, a.k.a. the firewall. All right, how are we going to do this tap? Well, my goal is to move traffic from other instances to this NIDS so it can read it and react. So what I'm going to do is create a new port and attach that port to this instance. There. So far I've not really done anything that interesting: I've created a brand new firewall, as you can see, and a new tap interface. Well, that firewall is just going to block all my traffic, which is going to come from the instances that this NIDS is guarding. So I've got to somehow get rid of that firewall. No problem. There, I just got rid of it. Now, I understand I'm flashing the word hack in your face. Well, that's part of the reason I'm here. Another point is that there's something called Tap-as-a-Service, which is being talked about this week, and I'm really looking forward to it. But this is how I got as far as I did as of now. So we got rid of that, and now we can add the tap directly to the integration bridge.

Okay. I still need to move traffic into the network IDS. How am I going to do that? Well, here's a cool command for Open vSwitch. It's a long one, but basically what it does is select a port and create a mirror to an output port. Here I have a single instance which I've directed it to mirror, but I could have selected more than that, or I could have selected all of the instances belonging to a particular tag, i.e. tenant. So this is the low-level, nitty-gritty way of accomplishing this, and of course you want to make this easier; you don't want to have to go through this yourself. But this is what's going on under the hood, I think.

Alright. So I've just talked about a way to accomplish this on the same node, and as I mentioned, we're consuming virtual cores by doing this. Wouldn't it be nice if we didn't have to consume virtual cores on that compute node? Well, a way to avoid that would be to create another machine, a monitoring node I call it, which would be responsible for monitoring only, and perhaps it has a different set of hardware to do that. Maybe a better set of hardware, maybe tuned.
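Before we leave the single-node case, for the curious, here's roughly what those steps look like at the command line on the compute node. All the names (tapNIDS for the port I attached to the IDS instance, tapVM1 for a guarded instance, qbrXXXX for the new firewall bridge) are placeholders; on a real node they're UUID-derived:

    # the "hack": pull the IDS tap off its Linux firewall bridge and plug it straight into br-int
    brctl delif qbrXXXX tapNIDS
    ovs-vsctl add-port br-int tapNIDS

    # the "cool command": a mirror on the integration bridge that copies traffic
    # to and from the guarded instance's port (tapVM1) out the IDS port (tapNIDS)
    ovs-vsctl -- --id=@src get Port tapVM1 \
              -- --id=@out get Port tapNIDS \
              -- --id=@m create Mirror name=ids-mirror \
                   select-src-port=@src select-dst-port=@src output-port=@out \
              -- set Bridge br-int mirrors=@m

I believe the Mirror table also has a select-vlan column, which would be the way to grab everything carrying a particular tenant's VLAN tag instead of naming ports one by one.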
So you can create a tunnel to talk to that monitoring node, and here it is in ugly graphics, but you'll get the idea. I create a patch cable and attach it to the integration bridge, and I mirror traffic to that patch cable. Then I attach that patch cable to another bridge, which I'm calling the tunnel bridge. And then you can use the same kind of command as before, where you create a mirror, this time mirroring from the patch cable to something else, and that something else is a generic routing encapsulation (GRE) interface, which is connected to the other node. Again, this is low-level stuff, and one of the things I hope to get out of this week is to make this easier and simpler, but this is how you make it work.

Okay, so if you do play with this stuff, I just have a few tips to make your life easier. Basically, whenever you create a new component in your network, ping it, or ping the element right before it. There are counters at each step of the way, and you'll know whether the packet reached that new point by looking at the counters. It can be dreadfully difficult otherwise.

Alright, so now that we've got a way to do it, let's analyze it at a little bit of a higher level. Why do we care about performance? Well, one reason is that if the limits of the network IDS's CPU power are reached, if the power of its virtual cores is exceeded, packets will be dropped, and when packets are dropped you can get false negatives: you may think nothing's wrong when something really is wrong. And keep in mind we're doing fan-in here. We're taking multiple instances within your tenant, potentially, and fanning all that traffic into the one network IDS. We want to avoid that. How do we analyze this? There are two areas to look at. One is the Open vSwitch component, and the other is the CPU overhead of the network IDS. There are some upper bounds, which are right here. I haven't gotten anywhere near that level, but that's the ultimate upper edge of the envelope.

Alright, Open vSwitch performance. I'm sure many people here are familiar with it; for those who aren't, I'll just recap that it maintains a large cache of flows in the kernel. So when a MAC address comes in and it has to determine the destination port, it can consult this table in the kernel, I guess you could call it a CAM table, and not have to exit to user space. That's really fast, and this table is very large. So normally, the happy path is that you're in the kernel and it's quick. If a MAC address shows up that doesn't exist in that table, you jump to user space, but that should only happen the first time. Another point is that we're not copying packets as they traverse the network from hop to hop. Very good. But here's the problem: there's a 30% reduction when you do port mirroring in Open vSwitch. I measured this, I ran into it myself, I did a double take, and then I found a paper written last year which had this great table; I put the citation below it. The numbers on the bottom are the number of times the packet has been mirrored. There's no mirror in the first bar; in the second there's a 30% reduction because we're mirroring once; in the third we're mirroring twice; and so on. You can see that each time you mirror, your total packets per second and megabits per second drop. This is a serious overhead and something to talk about this week.
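Going back to that cross-node plumbing for a moment, here is a rough sketch of the patch-cable-plus-GRE version. Again, the names (br-mon for the extra bridge, 192.0.2.50 for the monitoring node) are mine, and this is just one way to wire it, not the only one:

    # a separate bridge to hang the tunnel off, plus a patch "cable" between it and br-int
    ovs-vsctl add-br br-mon
    ovs-vsctl add-port br-int patch-ids     -- set Interface patch-ids     type=patch options:peer=patch-ids-mon
    ovs-vsctl add-port br-mon patch-ids-mon -- set Interface patch-ids-mon type=patch options:peer=patch-ids

    # first mirror: on br-int, copy the guarded instance's traffic to the patch port
    ovs-vsctl -- --id=@src get Port tapVM1 \
              -- --id=@out get Port patch-ids \
              -- --id=@m create Mirror name=to-mon \
                   select-src-port=@src select-dst-port=@src output-port=@out \
              -- set Bridge br-int mirrors=@m

    # a GRE interface pointing at the monitoring node
    ovs-vsctl add-port br-mon gre-mon -- set Interface gre-mon type=gre options:remote_ip=192.0.2.50

    # second mirror: on br-mon, copy whatever arrives over the patch cable out the GRE port
    ovs-vsctl -- --id=@src get Port patch-ids-mon \
              -- --id=@out get Port gre-mon \
              -- --id=@m create Mirror name=to-gre \
                   select-src-port=@src output-port=@out \
              -- set Bridge br-mon mirrors=@m

    # the debugging tip from above: per-port counters tell you how far packets actually got
    ovs-ofctl dump-ports br-int
    ovs-ofctl dump-ports br-mon

One caveat: "set Bridge ... mirrors=@m" replaces whatever mirrors are already on that bridge, so if you're layering this onto a bridge that already has one, use "add Bridge ... mirrors @m" instead.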
Snort performance is the other component to study. Snort is single-threaded at the moment. It's going to become multi-threaded in 3.0, so I've read, so at the moment it won't necessarily (and I'll put "necessarily" in quotes) scale with additional cores. It is possible to run multiple Snort instances and give each its own configuration file. Something you might want to tune is string searching, that is, how packets are searched when you're looking for, say, that Shellshock string. There are faster algorithms to do that, and there are recommendations for that tuning. An alternative to Snort is something called Suricata. This is multi-threaded from day one and is also compatible with the Snort rules. I cite one paper, and there are a few out there, which have done head-to-head bake-offs between Suricata and Snort and shown that Suricata is faster.

Okay. So that's the network IDS and how it can be made to work in OpenStack. Let's talk about host intrusion detection systems. I'll have a few slides where I motivate their usefulness. Basically, a HIDS does virus scanning and some other things, policy checking and file integrity checking. In the open source world, a popular one is called OSSEC, and it does a few other nice things which I think could be useful in the cloud space, such as log aggregation. Say I'm an administrator: I do not want to look at ten different logs, one for Snort, one for /var/log/messages; maybe I have a web application firewall, or WAF, running ModSecurity. All these logs are driving me nuts. What OSSEC can do is consolidate them all. It can parse each one of them, so you have only one single set of logs to deal with. That's useful. The second thing that's nice about OSSEC is that it is centrally controlled. So I don't have to look at the logs on each of my instances; I can go to one instance, which is called the OSSEC server, and it will read from the different instances, which run what are called OSSEC agents, and monitor via those agents for bad behavior. So here's kind of an icky picture, but basically you can see that there is an OSSEC server on the right, and logs are being sent from the six servers on the left, parsed on the OSSEC server, and read from those six systems. Now, you can just imagine those six systems to be OpenStack instances.

So here are a few things that OSSEC can do. One of them is file integrity checking. A bad guy might come in and, say, modify the ls command, or modify some other command. How would you know that they modified that command and perhaps put in some sort of Trojan horse? Well, if you take a hash of /bin/ls and store it somewhere, and then periodically take another hash of /bin/ls and see whether it matches the known one, that's how you'll know whether it changed. So that's one thing OSSEC can do, and you can configure which directories to check. Another thing OSSEC can do that's very nice is what's called policy checking. This is checking configuration files to make sure they haven't been tampered with. So for example, let's say you have a rule in your organization that says you must run SELinux, and say a bad guy wants to make people miserable. What this does is check that SELinux is in enforcing mode, and it will send an alert if it's not. One thing to note: OSSEC doesn't necessarily check for these in real time. It's a very slow process to compute the hashes of all the files, so this is done perhaps on an hourly or greater basis, unlike the network IDS case.
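To give a flavor of how that's configured, here is a small ossec.conf fragment of the kind you'd put on an agent. The directory list and the frequency are just illustrative values I picked, not a recommendation:

    <syscheck>
      <!-- re-hash the watched trees roughly every two hours; this is why it isn't real time -->
      <frequency>7200</frequency>
      <!-- hashes, ownership and permissions on these directories, so a tampered /bin/ls gets caught -->
      <directories check_all="yes">/bin,/sbin,/etc</directories>
    </syscheck>

    <rootcheck>
      <!-- policy/audit checks, e.g. rules that verify SELinux is enforcing (this path is illustrative) -->
      <system_audit>/var/ossec/etc/shared/system_audit_rcl.txt</system_audit>
    </rootcheck>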
So let's say somebody tried to log in as root with the wrong password, and an OSSEC alert is generated. What happens then? Well, you get an active response. Now, an active response in OSSEC is actually very nice because it's a script; it can do anything. A shell script can in turn invoke Python or call the OpenStack APIs, and this, I think, is another very useful property. Here are some things that are typically done: blocking the user, maybe disabling the user, destroying the user, turning off the tenant somehow, maybe adding a firewall rule. These are all within the range of possibility. Now, Snort can also do active responses, but my understanding is that it doesn't let you run an arbitrary script. It's a little more limited in the sorts of active responses it allows: namely, it can reset the TCP connection or send an ICMP host unreachable, if I remember right. So it's not as powerful in that respect. And here's just an example: on the top I've created a script called wall.sh, and on the bottom I say to run it if rule 5503 triggers.

Okay, so back to use cases. Well, it goes back to this: if you're an administrator and your role is to administer a tenant, then you would perhaps like to run OSSEC on all the instances under your control. The only downside here is that you need to run the OSSEC agent on each of those instances; the agent is the one that feeds information back to the OSSEC server. So those are the Glance images that you would have to provide to people. As for the other use case, the hypervisor administrator, I can see OpenStack's own configuration files being watched, with rules created to check their validity.

Okay, so let's put it all together. Here we go. Say you have perhaps two instances here, which may or may not be on separate machines, and you have a monitored instance, which also happens to be an OSSEC agent. Traffic comes in, say malicious traffic, and that traffic gets mirrored to the other instance, your monitoring instance, which is running Snort. It generates a log file. The log file is read by OSSEC, which in turn runs an active response, which is a shell script, which will ultimately block the traffic. That's the goal. And you don't strictly need to run OSSEC, because, as I mentioned, Snort supports a limited but useful and important set of active responses of its own, so the same workflow can basically work.

Alright. Given this whole background, this architecture of how IDSs can work with Snort, and a way to do it in Neutron, boy, how can we make this easier? Because it's too many moving parts. Well, first of all, I believe there's going to be a talk later this week called Tap-as-a-Service. That is exactly what this IDS needs, so I think that's something to look forward to. We need a way to create a tunnel; that's easy. And it would probably be nice to have some prebuilt images which run the network IDS and the HIDS. We need some workflows. One workflow would be setting this all up in the very beginning. We want to monitor the IDS; both Snort and OSSEC have nice GUIs over HTML, so that's something you could probably access via the monitoring instance. And you want to be able to make sure that when you add and remove instances, they're included in the port mirroring infrastructure, however that ultimately pans out.
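To make the Snort-to-OSSEC-to-active-response chain concrete before wrapping up, here is roughly how the wall.sh example from the earlier slide is wired up on the OSSEC server. The script itself is just a sketch; a real response could equally add an iptables rule or call the OpenStack APIs:

    <command>
      <name>wall-notify</name>
      <executable>wall.sh</executable>
      <expect>srcip</expect>
    </command>

    <active-response>
      <command>wall-notify</command>
      <location>local</location>
      <rules_id>5503</rules_id>   <!-- the failed-login rule mentioned above -->
    </active-response>

And the script, dropped into OSSEC's active-response/bin directory; OSSEC hands it the action, the user name, and the source IP as positional arguments:

    #!/bin/sh
    # wall.sh - trivial active response: broadcast the offending source IP to logged-in users
    # $1 = add/delete, $2 = user, $3 = source IP
    SRCIP="$3"
    echo "OSSEC active response triggered: suspicious activity from ${SRCIP}" | wall
    # a real response might add an iptables rule or call Neutron/Nova here to isolate the source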
So, in conclusion, IDS on OpenStack is very much a do-it-yourself thing in my experience. I have not found too many people who are working on this actively, and I'd like to meet some people who are. I think the orchestration is quite complex; what can we do to make this easier? Performance: having an additional monitor node is ideal. It may not be necessary for everybody if you don't have a high amount of traffic, and it's also an extra expense, because it's extra hardware. You would want to look at Suricata if you're very interested in high performance, or at least work with multiple Snort instances. And in the short term, a lot of people who are interested in IDSs are looking forward to the new Snort 3.0, which is going to be faster, presumably. And lastly, and this is another thing I hope to get into this week, why are we seeing the port mirroring drop in performance? Is it because we're actually copying the packet under the hood, and this is resulting in a 30% drop? What can we do to make that faster? Could we use reference counters or something and not delete the packet until they all go to zero? Who knows? That's another area to talk about. And that's my talk, so I'll open it up to questions now. Thank you very much.

So, we are playing around with some of the same things at AT&T. In your example that you showed, the OVS control examples, they tended to be static, like you do in a lab. In a production cloud environment where tenants are spinning up and networks are spinning up, you have to spin your Snorts up at the same time. That's been a little harder for us, and we're playing around with doing that. Have you thought about how you'd want to make that work with your examples?

So that would be the workflow of adding new instances to existing tenants.

And new networks, all the dynamic things that happen in OpenStack clouds.

Okay. Well, first, when you add new instances to an existing tenant, I think that would probably mean changing the port mirroring, so I suppose you'd have to tear down the old one and create a new one. So yeah, that would cause an interruption; something could sneak in the middle there. I haven't thought about brand new tenants, and that would of course be a whole new port mirroring construct. In general, this is sort of the next step for me, and I'd like to talk to you more about this afterwards.

Okay, I'm a bit too tall again there. Another question about where you monitor the traffic. Now you do it on the bridges on the compute hosts themselves with that solution. It seems like this is really hard to scale when you run 100 or 200 compute nodes, where you would have to do it per tenant, per node, and monitor all the traffic from each node for each tenant. Wouldn't it be better, if you run Neutron L3 routers, to actually mirror it from the L3 routers themselves? Because I guess the incoming traffic from the internet is what you're mostly concerned about, and there you would get all of that traffic in one place.

So you're saying that as the traffic enters the cloud, you intercept it at that point, rather than at the integration bridge?

Pretty much, yeah. Depending where you do it, either for the whole cloud or per tenant.

So it's sort of like on the public network. Right before the public network you could have a tap.

Or in the L3 router itself, if that would have the...

Oh, in the L3 router. That's an interesting idea. I guess I don't have a good feeling for the scalability problem.
My understanding is that Open vSwitch is so-called screamingly fast, and to be quite honest I haven't done big scalability tests myself, but I haven't encountered a problem with scalability. Did you mean in terms of the number of IDS instances?

I'm thinking more of the complexity when you scale, not the actual performance.

Oh, the complexity. I see. So in other words, I have many IDS instances, one for each tenant, and there's a lot going on there.

And maybe one for each tenant per host even, depending on how you set it up, because you might have a hundred compute hosts.

Yeah, I guess I leave that kind of... I punt that to the administrator of the tenant, a.k.a. project. I basically say, you know, you can do it the easy way or the hard way, and it's sort of on you... I wonder if what you're saying is more hypervisor-level intrusion detection rather than tenant-level.

I think you could do both.

Okay. Well, it doesn't occur to me how that's simpler, but I'd like to talk to you more. We can discuss after this talk.

Hi, Marisa from HP. We're a team working in collaboration with the TippingPoint security team, so we're working on facets of this, and we'd love to follow up with you. But my question to you is that it seems you discounted the middlebox use case entirely. By middlebox, I mean the intrusion prevention use case, where you're not just reporting. I see an IDS as still being useful in a tool set, to be able to report when there's a security violation. However, in many instances, people want to take the next step and actually do active filtering. And in the discussion today, it seems like you completely discounted the possibility...

I had some stuff on it; I didn't discount it at all. Let me go back, go back to here. So you're talking about active responses?

Yes, but real time, I guess. Like in the case of an IPS, it may do a block.

There would be a delay, yeah, in my case.

So do you see that as a next step beyond addressing the basic IDS use case?

Okay, well, in the vision that I outlined here, there is a delay, admittedly. I don't know how big the delay is. It can't be a big delay, because if it were, we'd be falling behind and eventually dropping packets, and I say a priori that we cannot drop packets. So I don't know if the delay would be that big, but if you want exact real time, then you really do need some sort of gateway model where you inspect the packet first, store and forward, and you do not forward that packet to its destination until you've inspected it and somehow vetted it. And I think that's what you mean.

Exactly, so we'll follow up, because we've got some ongoing work like this.

I see some big performance issues with that, but technically, with enough hardware, you could do it, yeah. Okay, thanks.

Hi, Rob Clark from HP. One of the things I think is really interesting is to extend perhaps a little bit on the L3 router concept. So when you were talking about how to apply IDS, I'm really interested in how I can see a tenant-wide view. I want to understand when a tenant or a project that's spread across multiple compute nodes, and potentially across multiple different sites, is either doing really bad things or having really bad things done to it.
And that becomes more difficult if you're operating at the compute node level, because then you need to introduce some aggregation point elsewhere where you try to correlate different attacks that are going on against different parts, or against different compute nodes. So I'm just wondering if you've had any thoughts in that space, and if a few of us are getting together later to go over some of the IDS stuff, I think that would be an interesting topic.

Personally, I don't know, because I've struggled enough just to get it working on a single compute node. However, there are lots of people at Red Hat who are working more along the lines of what you're talking about, where you have different regions and clouds separated geographically and so on. I think also that, you know, what I want to try to do, what I'm urging and pushing, is that we somehow agree on a way forward and somehow make this easier, so it's not so much of a do-it-yourself project. And perhaps as part of the agenda for getting there, we should include regions, geographically dispersed clouds, and so forth, tenants which are maybe on opposite sides of the world from each other, and so on.

That makes a lot of sense. Amongst other things, I'm also the PTL for the security project in OpenStack. We do a number of things, including the security guide, which is there to provide deployers with exactly this sort of guidance, and I think this would be an interesting thing to include in that. So I'll follow up.

Sounds good. Thank you.

Hi, I'm Chandr from Cisco. Have you considered what the implications of this kind of solution are in a Linux Bridge environment? I'm mainly looking at it because I see a lot of NFVs interested in that, because of the VLAN, because they do need the actual packet as-is, and that's why Linux Bridge is becoming more common in that case, instead of OVS, instead of Open vSwitch.

So you're saying, have I looked at Linux Bridge as opposed to Open vSwitch?

Yeah.

No, actually, because I'm using the Red Hat RDO distro, which has Open vSwitch built into it. Okay, so my understanding is that a Linux Bridge has the same functionality to do port mirroring. However, I'm not sure it's got the same performance. How about that?

Well, the performance will be okay as long as you stay with VLAN, not VXLAN. But the command that you put out there to do the port mirroring, is that supported, that's my... On the Linux Bridge.

On a Linux Bridge? No, no. The syntax is completely different. You would have to start... But the functionality would be the same, I believe. And again, I don't know that for sure, because I haven't monkeyed with the Linux Bridge. I'm quite happy with Open vSwitch, actually. It does a good job. Yeah.