This is Bulletproofing the Cloud. So first off, who am I? I've been working in InfoSec for about 11 years, and I had about five years of hobby work before that. My primary interests are penetration testing, intrusion detection, and log correlation. I'm currently employed as an InfoSec generalist at a cloud provider, and I've previously worked at several Fortune 100 companies. After this talk, if you need to reach me, get me at blindedscience@gmail.com.

So what is this? The idea of cloud isn't new anymore; a lot of people really understand what cloud is at this point. The problem we have is less an understanding issue and more one of knowing how cloud affects what we do day to day. A lot of the companies in the cloud space are approaching it as a very traditional type of environment; they treat it just like any other environment they'd put servers into. The research behind this presentation is primarily a response to last year's talk, Cloud Computing: A Weapon of Mass Destruction?, which I'll get into in a few minutes; a dissection of the current security postures of the cloud providers out there; and a proposal for what they could do to improve their posture.

So what is the cloud? This is a very dictionary-type response to that question. I'm just going to let you read it rather than go into it in depth, but in general, as people understand cloud, it's the use of shared resources to accomplish some sort of task. We've got a variety of flavors: software as a service, infrastructure as a service. For the purposes of this talk, I'm really focusing on cloud as infrastructure as a service, or to a lesser extent SaaS. And just to keep things lighthearted: cloud is very serious business for a lot of people, so for us, here is a picture of a kitten.
All right. Last year, we had the talk Cloud Computing: A Weapon of Mass Destruction?, but prior to that, we had the DEF CON 17 talk Clobbering the Cloud. Clobbering the Cloud included an example of using the Salesforce environment to build a Nikto-based scanner: they showed a tool called sifto that was run from the Salesforce environment to scan remote servers, which is obviously not what Salesforce was designed to do.

Last year, though, we had Cloud Computing: A Weapon of Mass Destruction?, and like I said, this talk is primarily built as a response to that one. During that talk, they showed that cloud providers essentially aren't doing much internal policing of their clients. Additionally, there's kind of an unofficial policy: while the official policy says there's no scanning from the cloud, the unofficial policy is that as long as complaints aren't received, nothing's going to be done about it.

So what's wrong with the cloud? Why is there a problem here at all? The first problem we have is easy access. Most cloud providers build their environment so people can get into it and out of it very quickly and easily. That enables other problems, such as anonymity and fraud. Being anonymous isn't necessarily a problem in itself, except that it leads to fraud. A lot of people use cloud resources, and I'm sure we've got a few of you in here who have used cloud resources in ways they weren't intended to be used. Some of that's fraudulent, some of it isn't. That leads to another problem, though: you end up with contention for resources. Cloud really is an environment that's meant to be shared, so when you have one client who's misbehaving, it can affect the others through contention. The cloud provider also faces problems with damage to the infrastructure.
As I mentioned, fraudulent customers frequently turn out to be using false credit cards that can't be charged, or that get charged back, so they've used cloud resources without paying for them. And there's the providers' kind of proven inability to address their own security, based on the previous presentations.

The client, however, faces another series of problems, and if you are a cloud consumer, these things should concern you. One compromised client of a multi-tenant environment can affect others, and that gets back to contention. The larger problem for most customers, though, is that they're no longer in control of their entire IT infrastructure; they've offloaded that to someone else, especially with SaaS and IaaS. So a user who's compromised may have no way of knowing it, and may never know.

Now I'm going to get into what most of the cloud providers are doing in this space. The providers are treating cloud security like a traditional hosting environment. That's pretty straightforward, and it seems logical in most cases. Clients are given a virtual firewall with inline IPS services; this is on the IaaS side. Providers frequently offer a vulnerability assessment for free, and this is great: it shows the providers are at least somewhat concerned for their clients, and giving this service away is in a lot of cases a good, positive move forward. But each client's virtual instance is independent, which means the clients are essentially fending for themselves with no coordinated enterprise security.

So, the problems with the conventional solutions a lot of cloud providers are using. IPS first of all: it's very difficult for providers to offer a pre-packaged IPS that works for all clients and won't block legitimate traffic, and that's pretty straightforward.
When you put together an environment where people can drop in and out very quickly, it's tough to give them something cookie-cutter that will work for everyone. IPS in particular is a danger because you're talking about blocking traffic, which in some cases may be legitimate. Information coming from an IPS is frequently incomplete: you have encryption that keeps you from seeing the data inside the traffic, and a lack of awareness of what's going on in the endpoint. The last problem with IPS is that it has to work at line speed, so very complex correlations aren't possible. What do I mean by very complex? A lot of the IPS solutions out there do a really good job of vulnerability-assessment-to-IPS correlation and other basic correlations. When I talk about complex correlations, I'm talking about behavioral correlations that go back months. The IPS can't keep that amount of data in memory in some cases, or it can't look at it quickly enough to make line-speed decisions, so it sometimes has to make incomplete decisions.

Then we get to the problem of traditional network design. As security practitioners, I'm sure we've all dealt with this in the past: you have the turtle-shell approach, hard on the outside, soft on the inside, so you're focusing on your external threats. You're assuming the internal hosts are trusted, and beyond that, your clients aren't benefiting from security data generated by the other clients.

So, how is that working? I have access to a series of logs from a variety of my clients. I took those logs and did what probably isn't exactly a scientific analysis. What I looked for were the large cloud providers, hosts that recurred from those large cloud providers, and how long it took, after I sent an automated alert to that provider saying there was something wrong in their network, for them to respond to it.
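That guessing process, taking the gap between a complaint and the last sighting of a host as a floor on the provider's response time, can be sketched roughly like this. The timestamp format and the function shape here are my assumptions for illustration, not tooling from the talk:

```python
from datetime import datetime

FMT = "%Y-%m-%dT%H:%M:%S"

def estimated_response_hours(sightings, complaint_sent):
    """Guess a provider's response time as the gap between an abuse
    complaint and the last sighting of the offending host.

    sightings: ISO-8601 timestamps at which the host hit the network
    complaint_sent: timestamp of the automated abuse email
    Returns hours elapsed, or None when the host never recurred after
    the complaint (that data gets thrown out, per the methodology).
    """
    complaint = datetime.strptime(complaint_sent, FMT)
    later = [t for t in (datetime.strptime(s, FMT) for s in sightings)
             if t > complaint]
    if not later:
        return None  # can't tell whether the provider actually acted
    return (max(later) - complaint).total_seconds() / 3600.0

# A host seen again 14.5 hours after the complaint:
print(estimated_response_hours(
    ["2011-03-01T02:00:00", "2011-03-01T16:30:00"],
    "2011-03-01T02:00:00"))
```

Hosts that vanish right after a complaint return None and get discarded, mirroring the thrown-out data the methodology calls for.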
Because most providers don't respond to those types of emails, I had to make guesses about how they're performing on their side, and that may lead to worst-case scenarios. So I would look for a host that recurred. If I sent out an email and then never saw that host again, that data got thrown out, because I couldn't be sure the provider had actually responded. Also, if I saw a host, saw it again, and then never saw it after that, I couldn't be sure the provider had actually fixed it. Again, I just had to make guesses based on the data I was seeing. So I can't say for certain what the security posture is inside a company, but I can guess at the nature of that posture based on the behavior of the network and their personnel. Guesses were based on how frequently a particular host contacted my network, and how long it took for it to stop. All my data is from the first six months of 2011.

The first one is Amazon Web Services. I had a single recurring host from Amazon Web Services, and given their size, that's probably a really good indicator that they're very responsive to emails or complaints of any sort about hosts on their network. This is the raw data, and I'll explain what it is. This is coming from what is essentially a SIEM. These are two entries related to one event, and I color-coded them so you can see they're two different things. In this particular case, what you're looking at is a sweep of my network for open Tomcat servers, followed by an attempt to brute-force those servers. Based on this, we know Amazon's response time to complaints and incidents is probably at least 14 and a half hours, because that's the amount of time that elapsed between the two incidents, after I complained about them to Amazon.

Next up is Rackspace, which acquired Slicehost. There were 10 recurring hosts from Rackspace.
Now, I should note that with Rackspace, I can't tell from the outside, unless I scan their network, whether these are actually cloud devices or traditional hosted devices. But again, this is more about their responsiveness to complaints and less about their cloud in particular. This is an event from Rackspace that represented an SSH scan of my network. Rackspace's response wasn't as good as Amazon's, but I'm sure a lot of us who deal with incident response know that 48 hours, while probably a little longer than we'd want, is not completely unreasonable. In a lot of cases you're trying to contact customers, verify what you're seeing, and make sure there was actually an abuse of the network going on, and sometimes that can take time.

The next slide is from a provider where, and I know some of you are in the crowd here, if you're looking for services that won't be shut off no matter how bad you are, go to SoftLayer. I had five recurring hosts from SoftLayer, all of them spanning multiple days. As far as I can tell, SoftLayer never responds to complaints or incidents, or at the very least, you can measure their response in months. I have two pieces of data here. The first one is actually an ICMP sweep that they do of my network every month. The second one is an SSH sweep. I'm not really sure why SoftLayer feels the need to scan my IP space once a month and ignore my complaints about it every month, but I can kind of understand that someone may say, okay, it's just an ICMP sweep, I'm not too worried about that, I'm going to ignore it. But if you look at the other case, the SSH scan, it's a little more obvious. It took them nine and a half days to respond to that; that's a week and a half. That's not the greatest response time, and I can't even be sure they shut the host down.
The host may have just given up trying to talk to me.

So how can we tighten this up? Clients should still have their own IDS and firewall, but hosts that are attacking multiple clients should be detected and shunned by the provider. Clients aren't capable of sharing information with each other, or if they were, it would take a lot of effort on their part. The provider is in a unique situation where they can see all the traffic on the network if they try. So as a provider, if you've got a series of outside systems attacking pretty much all your customers, those customers may not have any idea they're part of a coordinated attack, but you should, because you can see all of it. The provider should be taking steps to help the clients protect themselves. They're not. The provider should also be looking for intentionally malicious internal clients. Consolidating events from all client environments to look for enterprise-threatening external agents improves things from the outside, but if you look at how the providers are approaching security in the cloud, the single largest unaddressed threat is the client networks themselves. The client networks are a danger to both the provider and the other clients, again because of contention and damage to the infrastructure.

So why aren't the providers doing this? What challenges are the providers faced with? The first one is frequent, rapid client changes. The nature of cloud is such that clients come and go all the time. You can't really be sure what they're doing on your network, and they're probably all doing something different altogether. Clients are going to have a wide variety of services, users, and ways of utilizing their resources. Your clients are in an unknown state, and by unknown, I mean this: most of the clients are probably going to be normal, law-abiding citizens. They're going to sit on your network, use your resources, maybe run a web server, a database server, things like that. But you can't be certain of that.
Some of the clients are going to be bad guys. Some of the clients aren't going to be bad guys, but they're going to be compromised by bad guys. So you've got a whole host of clients who are in an unknown state. If you're going to take action on incoming traffic, or even outgoing traffic, you need to be as close to zero percent false positives as possible, because you're taking action on behalf of the clients. A client in his own environment knows what he's expecting to see; he can probably be pretty certain that a certain type of traffic does not belong in his network. You as a provider can't be as certain, because you're trying to put together a cookie-cutter solution that works for everyone. So you need to make sure you're as close to zero percent false positives as possible if you're going to be taking action on these things.

What stays the same? An IPS, owned and controlled by the client; there's no reason why you shouldn't give them that. A firewall, again owned and controlled by the client; in most cases these are virtualized firewalls at the cloud providers. Vulnerability assessment, we leave that in. All this stuff is well-understood technology, and it allows clients baseline control over their own networks within the cloud. A lot of clients aren't going to use this stuff. A lot of cloud customers are using cloud specifically because they don't want to run their own IT infrastructure, and their level of expertise is a bit lower than what yours or other security practitioners' might be. So they may not use these technologies at all, but at least you're providing them.

All right, so what do we add to the infrastructure? How do we start building the system that will protect our clients and us from potential damage? The first piece seems pretty straightforward to me: doing NetFlow inside your environment.
Understanding what traffic is there, what traffic should be there, and getting a baseline for what things are supposed to look like. We should be adding an enterprise-wide IDS. This particular IDS should be completely out of the visibility and control of the clients, because again, they're in an unknown state. You don't want clients being able to see that you're watching them and will take action if they keep misbehaving, because if they can see it, they can try to evade it. They're going to try to evade it anyway, but at the very least you're keeping that within your own eyes. Network access control, this is pretty straightforward too; it's a little different for cloud than it would be for a traditional environment, and I'll get to that in a minute. You're going to throw in an event correlator, which is what's traditionally known as a SIEM. For my use here, we're talking about a forensic-analysis SIEM rather than the regulatory SIEM most people are familiar with. And log consolidation and on-access misconfiguration detection; that phrase just means watching the network for new servers and services and checking them for very basic misconfigurations.

All right, so when I talk about these things, a lot of people ask me: why not use OSSIM? I've thrown the URL up here in case anybody wants to investigate open-source SIEMs. OSSIM uses many of the same tools I'm suggesting. The problem I have with it, and the reason I don't personally use it, is that it makes assumptions about the network it's placed into, which means the tools it uses are all prescribed. When you use OSSIM, you know you're going to use its asset database and BASE as your system for managing data, and all the tools that go along with OSSIM. Additionally, OSSIM's correlation engine is not as flexible as SEC, which is what I'll be talking about in a minute. But OSSIM does have advantages.
If you're looking for something you won't have to manage yourself, OSSIM is a good fit; part of the problem with building something yourself is that you've got to deal with updates and keep track of what's vulnerable, and OSSIM of course takes care of all that for you.

For NetFlow in particular, I like to use a tool called NFDump. There are a lot of NetFlow analyzers out there in the open-source community, and by the way, all the tools I'm going to talk about are specifically open source, because I wanted to make sure that if any of you wanted to replicate this, you could. Most of the open-source NetFlow projects out there are inactive, though. I chose NFDump from the variety of inactive or semi-active projects mostly because all it does is throw everything into a database and provide a lot of command-line tools for querying that database. There's a front end for it called NfSen that I don't personally use, but NFDump itself is convenient for scripting.

NetFlow, as we know, is used to monitor flows into and out of the network. We can use it to monitor for excessive, prolonged network utilization. It can also be used to trend network performance and flag suspicious spikes. Data is sent from internal switches and other network devices for analysis; you're not relying on the customer environment, so you're not relying on their virtual firewall. You're relying on all your infrastructure devices to give you this data. And it can help provide network, server, and service inventory data for keeping track of what's running inside the client networks and your own network.

I like Snort for the enterprise-wide IDS. The URL's here for those of you who haven't looked at it before, but I'm sure most of you have seen it; it's well known and widely used. It's independent of the clients; they cannot see it. It's attached to the network egress and ingress points. And there are no trusted networks.
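That stance, no trusted networks, maps directly onto Snort's network variables. A minimal snort.conf sketch, just the two relevant lines with everything else omitted:

```
# snort.conf fragment: nothing is trusted, so watch everything.
# With HOME_NET set to "any", rules fire regardless of direction.
ipvar HOME_NET any
ipvar EXTERNAL_NET any
```

In a conventional deployment you'd scope HOME_NET to your own address space; leaving it wide open is the deliberate choice here, since the client networks themselves are in an unknown state.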
In Snort, you've got that HOME_NET variable; we're not setting it to anything. We're looking for everything; everything is untrusted in our design. This will also provide you some network, server, and service inventory data.

PacketFence is the NAC I've chosen to use, and there's a PacketFence talk going on tomorrow in case any of you want more information than what I'm going to give you here. There's the URL. Now, NAC breaks down into two types of technology: pre-admission and post-admission. Pre-admission is probably what most of you who run corporate networks are familiar with. When a host comes onto the network, pre-admission NACs check for things like patch levels, updated antivirus, and all the configuration controls you're expecting on the host, and admit it to the network based on those things being in line with what you expect. Post-admission devices, on the other hand, look at behavior once the host is already on the network. Ideally you're doing both in a corporate environment, but it's much more difficult to do pre-admission in a cloud environment, because clients really could be anything; the client is buying space from you. So the NAC here is for post-admission behavioral quarantine. It's going to take input from other systems and use it to make decisions about quarantining devices.

syslog-ng, which I'm pretty sure most of you have been exposed to at some point, is again well known and widely used. All your infrastructure devices, your servers, switches, IDS, et cetera, log here. Again, these are not your customer devices, because the customer devices they have control over and can do weird things with, and part of what you're trying to protect yourself and your clients from is those client environments.

Okay, the on-access misconfiguration detection. I use a variety of tools for this.
I use Medusa for basic brute forcing; I have a list of about 10 usernames and 13 passwords, so it runs a scan within about 30 seconds. Metasploit for some more advanced things. And then Nmap. I use Nmap primarily to look for customers who have bought firewalls from us but aren't using them. And this has happened: customers come into our environment, set allow-all while they're testing something, and just leave it that way. I'm sure you all know that's a nightmare; the clients may not, so Nmap is used to detect that and warn them, hey, your firewall's turned off right now. I have a few other tools for odds and ends, specialized things like looking for open proxies, but these are the workhorses of this particular system. These tools are called by the correlation system to run basic misconfiguration checks on any new servers and services seen on the network, and that information comes from the NetFlow data as well as the Snort data.

Finally, we have where the magic occurs, which is the correlation system. My correlation system of choice is the Simple Event Correlator, SEC. It's a pretty simple, non-vendor-specific correlation system that will keep track of events from a variety of sources. It's not inline, and this is a big difference from the inline IPS: because we're not inline, I can make slow, well-informed decisions. I can keep all kinds of things in memory as long as I have enough memory to hold them, and I can spend minutes making decisions instead of having to spend split seconds. The SEC system is in charge of coordinating everything else in the environment.

All right, so how does this work? We've got here our rather geeky-looking correlator, our cloud with our client environment, and our outgoing ISP link. This spot right here is where our NetFlow data is coming from.
I'm using it to generically represent our switches and various network devices and our firewall, and down below we have the AAA server. On the outside, we have the enterprise-wide IDS, again running independent of the clients. From time to time, our correlator will fire off our on-access misconfiguration detection as well as our vulnerability assessment. This is used to keep track of what's going on with the client systems, to give us an idea of where there might be security problems.

So let's say we have a customer start sending out an event; this could be any kind of malware, could be worm propagation, could be a scan. All these devices are configured to send logs back to a central location, our correlator in this case. Additionally, the network devices, the switches and our firewalls, send NetFlow data back to the correlator. Hopefully this allows our correlator to understand there's a threat going on and fire off the NAC, which will then cut off the server and the malware.

But how does that occur? I'm going to go through a few quick scenarios of easy ways this can be done; towards the end of the talk, I'll cover some more complex ways to detect misbehaving hosts. Here we have our IDS, our firewall, and our switches, and these are our happy clients. We'll throw in a bad guy. Someone buys a server from us, turns it on, and starts doing bad stuff. In this particular scenario, our bad guy has started using a known hacking tool to attack outside the environment, and this is actually a really simple way to detect them. The one I see most frequently is SIP scanners. The external IDS knows there are signatures related to specific tools that are known to be bad; it detects that and notifies the correlator. The correlator says, hey, I've got malware in here, and shuts it down.

But there are other scenarios. In this case, maybe we have a pattern of traffic going outbound.
A series of packets that individually don't mean anything, but all together represent some sort of larger pattern. In this case, let's say we've got an ARP storm going on on the inside. We could capture that with NetFlow. The NetFlow would tell the correlator something weird was going on; the correlator would have to make its own decision about that NetFlow data and, again, shut the host off.

Now, a quick note about NFDump, and this is common with all NetFlow data: unusual traffic patterns alone don't dictate an incident. The NFDump data has to be compared with IDS, firewall, and other data to look for anomalies. An example: a traffic peak combined with ARP collision messages coming from your switches could be indicative of, and I should probably say highly indicative of, an ARP cache overflow. A traffic peak combined with many IRC events is probably some sort of botnet participation.

As for the correlated IDS logs, there's a lot more information there, but it's limited to what we can see. A single event type that entered a server and was then replayed by that server outbound several times might be a worm. It might be email, but it might be a worm. Or a server contacts an excessive number of servers using the same administrative protocol, so it's scanning outbound for something like SSH: protocol scanning.

I threw this slide together pretty quickly earlier today to give you an idea of the types of correlation. These are the most common ones I catch on my network. The red ones are the ones I've marked as being close to a 100 percent chance of bad stuff going on. The rest are things where I'd send out an alert and have an administrator take a closer look. The first one's a sweep. A sweep is just where an event is replayed from one host across multiple other hosts, pretty straightforward.
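That sweep pattern, and the scan pattern that pairs with it, can be told apart with a little bookkeeping over (source, destination, event type) tuples. A rough sketch, where the threshold of 10 is an invented tuning knob rather than a value from the talk:

```python
from collections import defaultdict

def classify(events, threshold=10):
    """events: (src, dst, event_type) tuples from IDS/firewall logs.
    A sweep is one event type replayed from one source across many
    targets; a scan is one source hitting one target with many
    different event types."""
    targets = defaultdict(set)   # (src, event_type) -> destinations hit
    kinds = defaultdict(set)     # (src, dst) -> event types played
    for src, dst, etype in events:
        targets[(src, etype)].add(dst)
        kinds[(src, dst)].add(etype)
    verdicts = set()
    for (src, _), dsts in targets.items():
        if len(dsts) >= threshold:
            verdicts.add((src, "sweep"))
    for (src, _), etypes in kinds.items():
        if len(etypes) >= threshold:
            verdicts.add((src, "scan"))
    return verdicts

# One host probing SSH across twelve targets looks like a sweep:
evts = [("203.0.113.7", "10.0.0.%d" % i, "ssh-probe") for i in range(12)]
print(classify(evts))
```

Because the correlator isn't inline, state like this can accumulate for minutes or hours before a verdict is needed, which is exactly the trade-off described above.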
It's usually indicative of inventorying a network, sometimes protocol scanning, a variety of things. A scan is where one host contacts one host and plays a bunch of different types of events; that's like Nessus, Nikto, those types of tools. A storm is one loud, noisy host generating a lot of traffic. That may or may not be something bad; sometimes it's just someone sending out a bunch of email. It could be any number of things.

Baseline delta is a much more complex correlation. Your correlator should be keeping track of what's normal for your network: the types of events that occur, how clients normally behave, and what the network traffic looks like from the NetFlow data. Baseline delta is just a general name for anything that deviates significantly from that, say a 150 to 200 percent spike in traffic, new Snort event types you've never seen before, things like that.

The worm I talked about briefly: you have an event come in and then play out multiple times. A web scan is a Nikto-style scan; it's also where you've got a lot of 404s and 403s outbound, things that aren't completely normal for web traffic. Behavioral, which I probably shouldn't actually include, is based off the Snort behavioral events. Admin protocol covers things like SSH scans. Attack tools are known attack tools, like the SIP scanner I mentioned before; SIPvicious is a big one, by the way. And zombies, which as I mentioned can sometimes be detected using IRC when you've got a big peak in network traffic.

There are limitations to this kind of solution. The first one is that you have to err on the side of caution. There may be traffic that looks very, very bad, but a customer is expecting it to be allowed on your network because it's totally normal for them. Getting back to the example of IRC connections.
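The baseline-delta idea reduces to a simple comparison against the learned norm. A minimal sketch, where the 1.5 factor is the low end of the 150 to 200 percent range mentioned above; a real correlator would baseline event types and per-client behavior too, not just byte counts:

```python
def baseline_delta(history, current, factor=1.5):
    """Flag the current interval's traffic when it exceeds the
    historical average by the given factor (1.5 = a 150% spike).

    history: past per-interval byte counts for the client
    current: this interval's byte count
    """
    average = sum(history) / len(history)
    return current > factor * average

print(baseline_delta([100, 120, 80, 100], 300))  # → True, a 3x spike
print(baseline_delta([100, 120, 80, 100], 120))  # → False, within norm
```

A flagged spike alone isn't actionable; it has to be combined with other evidence, which is exactly what the IRC example illustrates.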
Say a customer has been running an IRC server on their system for a while, or even if they haven't, they just turn one up, and then they suddenly start streaming out a lot of video. Neither of those by itself is very bad, but a traffic peak combined with a bunch of IRC traffic could look like a zombie. So you've got to be really careful about what you turn off and what you take action on.

The system is also primarily reactive. You may be taking action after damage is already done. IPS is meant primarily to prevent these types of things; the goal of this system is instead to slow the attacker down. It's somewhat similar to a honeypot, but different in scope. What I've found is that most attackers aren't targeted attackers; they're opportunistic attackers. They're looking for a specific thing that's vulnerable on any server anywhere, and they don't care what server it is, they just want in. So they try the same thing against a bunch of different systems. This system is primarily designed to keep those kinds of guys out. They start scanning, they attack client number one, and with any luck they're unable to attack anybody else after that, because my enterprise-level system has detected it, without the client needing to take any action at all.

So, to conclude: cloud providers really don't appear to be internally policing their clients' networks at all, and they should be taking reliable measures to detect both malicious clients and compromised clients. And I guess at that point I'll take questions.

No, I have the advantage, since I'm from a corporate environment, of an excellent marketing person who did the majority of the visual design for me. I'm nowhere near this talented.

Well, I guess there's no more danger of a client compromising another client.
When we talk about compromise, meaning breaking in from one client environment to another client environment, there's no more danger of that than there is of an external attacker breaking into that same client environment. The concern for client-to-client is primarily one of contention. So when a client, and this is another thing we've seen, starts an ARP storm, broadcasting a lot of ARPs trying to bypass the switches, that puts a load on the network, and since it's a shared network, everyone else has less throughput. What we do is detect those sorts of things using an internal series of IDSs that I didn't really talk about here, or the NetFlow data.

Anybody else? No? Okay. Thank you for sitting through; I appreciate your time. Thank you.