OK. So welcome. My name's Rob Clark. I'm a security architect for Hewlett Packard. I've been working with OpenStack for around about the last three years. And in that time, I helped bootstrap the vulnerability management team, which acts as a CERT, and I'll tell you a little bit more about them. And I started the OpenStack security group. For this release cycle, I am the elected leader of the security group. So during this talk, I'm going to talk to you a little bit about vulnerability management and mention OpenStack security notes for those of you that weren't in the previous talk. We're going to go through OpenStack security advisories in review. We're going to look at the impact and try and work out some ways forward. I was supposed to be sharing the stage today with Cody Bunch from Rackspace, who unfortunately wasn't able to make it. So any mistakes in the slides are his fault. I should point out this is going to be a very fast talk. I've got a large number of slides. I got a little bit PowerPoint happy. So we'll just see how we go.

So the vulnerability management team exists to receive publicly reported vulnerabilities in OpenStack. They are an independent group of security-minded engineers. They have representatives from HP and other organizations on there, but their primary role when acting as the VMT is not to represent those organizations. They are there to work with and improve OpenStack. The VMT receives vulnerabilities normally in private. They triage them, patch them, work with PTLs. If the vulnerability turns out to not be as bad as perhaps the reporter thought, then they will perhaps just move it to an OpenStack security note or do something similar.

The OpenStack security group, which I need to tell you a little bit about for those of you that weren't here in the last session, was established about two years ago by myself and Brian Payne from Nebula. It consists mainly of security engineers, architects, consultants, and developers. We consult with the vulnerability management team on security advisories. So when something comes in and they're not sure about it, they'll go, hey Rob, how does this affect the public cloud? Or, hey Brian, how does this affect a private, appliance-driven deployment? Something like that. So we consult on advisories, we run a number of security initiatives, we have over 150 members, and we publish security notes, which kind of mirror advisories. Advisories are the main point of this talk, but the security notes exist to provide you with information about parts of your configuration or bits in OpenStack that might cause you problems or cause you security issues.

So this is the real difference between an OSSA and an OSSN. An advisory is a bug in OpenStack that's going to create some sort of security issue for you. A security note is basically: if you configure it like this, use this technology naively, deploy it in a certain way, you're going to have a bad time. So security advisories are published to the mailing lists. And security notes, in a very similar format, are published more widely. So we have the mailing list, and we also have a wiki page that lists all of the security notes one after another.

So what I'm going to do in this talk is take you through a little journey in terms of trying to understand how vulnerable OpenStack has been in the past by looking at the various issued OpenStack security advisories. Now, the advisories are listed on release pages.
So in theory, you can go and have a look and say, well, Diablo had this many and Essex had this many. And it turns out to not quite work like that. So I had a quick look at the spread of vulnerabilities throughout OpenStack in terms of the OpenStack security advisories that were issued. And here you see a scary picture for anyone who's still running Grizzly, dropping off toward Havana. There's no obvious reason why Grizzly should have so many more security advisories logged against it, and we'll investigate that a little bit more going forward.

So this is a format of the data that I prefer to look at. I appreciate it's going to be difficult for some of you to read. It's OK. I don't expect you to know the individual notes. I just like to see the data in this sort of format. And one of the things we looked at doing was breaking down the data by when the vulnerability was introduced. So we see that there's actually a lot of repetition. You can see OSSA 2020-05 propagates forward through three releases. We also notice that there's not a single shared vulnerability between Folsom and Grizzly, according to the release notes that we were basing our analysis on. But between Grizzly and Havana, there's a massive number of replicated or duplicated OSSAs. Around about two-thirds of the vulnerabilities reported in Havana were actually inherited from, or introduced in, Grizzly. Again, making Grizzly seem really bad, and we couldn't find any good reason for it.

So then we decided to have a look the other way. So when a vulnerability is reported, we assume that it is reported in the most recent release. So if a vulnerability was reported in Grizzly and Havana, we take Havana to be the point where we care about the vulnerability and see how it propagates down. So by doing it this way, we have the same mappings as before, but we start knocking data out of Grizzly as you would expect. We see a lot of data being knocked out, and we see this small spread across Folsom. And there's, again, this gap between the two of them that we couldn't explain. We looked at release notes. We looked at the various services that were released in Grizzly, and between Folsom and Grizzly, and asked, was Horizon a problem? Probably not. And was Heat a problem? Well, it came a little bit later. And it kind of didn't make a whole bunch of sense, which is not good when you're trying to establish a security baseline for a system. It just didn't pass the sniff test. That data doesn't look right to me.

So what we ended up doing is manually going through every single security advisory. And what we found is that what an advisory actually addresses isn't necessarily reflected in what is easily reported today, which was bad, because a load of the graphs and a lot of the analysis we'd already done were on the previous set of data. So what we end up seeing here is the vulnerabilities in terms of where they're introduced and propagate forward. We suddenly see that there's a large number of vulnerabilities introduced in Folsom that actually affect Grizzly. So we no longer have that gap. And if we go the other way, we can see that there are a large number of vulnerabilities reported against Grizzly that actually affect Folsom. In fact, that's all those bold ones towards the bottom of the Folsom release. And what this means is we really have two ways of looking at security advisory data in OpenStack. We can have a look at when the vulnerability was introduced.
So we take a look at an advisory that says, this piece of code was broken in Folsom, Grizzly, and Havana. And we decide that, OK, well, we're going to attribute that to Folsom and knock it out of the other two lists. Or we go the other way and say, well, this was reported on a certain date, so we'll assign it to that release, and then the releases before it will drop off. And what we end up with is these two different graphs. So the first one is a graph of when bugs were written, in which release. And the second graph is more what we were expecting to find when we were looking at this data, which is: where more inspection has been applied to the code, when more people are looking at the code, as the project grows, where do we see these vulnerabilities coming up?

So we found that, firstly, you shouldn't get security guys to do statistics. And we found that simply looking in the obvious places for the data wasn't really good enough. So we had to go through every single security advisory and do the mappings ourselves. We noticed that the way that things are discovered is very different to when the vulnerabilities are introduced. We see this fairly steady upward trend for discovery, but we see a mixed trend for when they're introduced. And this is the data all side by side. And we can see that Grizzly is no longer considered to be the most terrifying release. Now it's probably Folsom. We find that Havana is kind of trending along the right lines in terms of the number of advisories against it, and that the number of vulnerabilities being detected, which is what you see in orange here, has gone up year on year, release by release, which is probably a good thing.

So I'm going to take you through a little bit of the vulnerability spread between the different releases to have a look, see if there are any obvious trends that we can recognize. So in Diablo, it was Keystone and Nova, and a little under half of the vulnerabilities were critical. So orange denotes critical here. Essex: Keystone and Horizon, about a third. And then we start seeing trends. Folsom: the number of projects having security vulnerabilities introduced that subsequently resulted in OSSAs starts to spread. So we have Glance, Keystone, Nova, and a mixed one that affected Nova, Keystone, and someone else. And we see that well over half of these vulnerabilities were rated as critical. And then as we go into Grizzly, more projects, more big vulnerabilities. Going into Havana, the number of projects seems to sort of stabilize. The criticality of a lot of these bugs continues to grow.

I need to clarify a point here. Around the Essex time frame somewhere, the vulnerability management team stopped assigning impact ratings to security advisories. So you no longer got a critical, medium, or high advisory. You were just told, there's this bug, go work out whether or not it affects you. Which is great, because they were saying, well, we don't necessarily know for every type of deployment how important this bug is going to be for you. However, having question marks all over these slides would have made them fairly terrible. So I went through and assigned impacts based on how it would affect HP Helion, how it would affect the products that I'm involved with. And because it was me that was doing this, and because of the sorts of products we deliver, I would swing them all towards critical. I want to treat something as being more important rather than less, because there's less chance of it coming back to bite me in the ass.
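To make those two attribution schemes concrete, here is a toy sketch of the counting involved. The advisory list below is entirely made up, since, as mentioned, the real mapping had to be built by hand from each individual advisory.

```python
# Toy illustration of the two ways of attributing an advisory to a release:
# to the earliest affected release (where the bug was introduced), or to the
# latest (where it was reported). The advisory data below is invented.
from collections import Counter

RELEASE_ORDER = ["diablo", "essex", "folsom", "grizzly", "havana"]

ADVISORIES = {                                   # hypothetical affected lists
    "OSSA-X": ["folsom", "grizzly", "havana"],
    "OSSA-Y": ["grizzly", "havana"],
    "OSSA-Z": ["havana"],
}

def attribute(advisories, pick):
    """Attribute each advisory to exactly one release and count per release."""
    counts = Counter()
    for affected in advisories.values():
        counts[pick(affected, key=RELEASE_ORDER.index)] += 1
    return counts

introduced_in = attribute(ADVISORIES, min)       # "where the bug was written"
reported_against = attribute(ADVISORIES, max)    # "where it was discovered"
print(introduced_in)     # Counter({'folsom': 1, 'grizzly': 1, 'havana': 1})
print(reported_against)  # Counter({'havana': 3})
```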
Vulnerability classification: we're going to look a little bit at the vulnerabilities here in terms of what the root cause was. What caused OpenStack to be vulnerable? And we've really just focused on the critical ones here, because there are some really fun vulnerabilities in there. Might be a slightly geeky thing to say. But unvalidated input, we see this all over the place. Data leakage, there's some really interesting stuff in Nova around data leakage. Credential disclosure and access control failures. Unfortunately, ACF is going to hit Keystone a lot. I'm sure there are some Keystone people in here. I'm not trying to beat you up. It's just that if there are access control failures, that's where they're going to be. And denial of service. I'm not actually terribly comfortable with denial of service being up there as a vulnerability classification, because to my mind it's not. It's an impact of a vulnerability. You have a buffer overflow or resource exhaustion or something that results in your system no longer being stable, and that result is the denial of service. However, for some of them, it kind of just made sense to lump them together. So where I've done that, I've tried to point out what the vulnerability was, as well as it being a denial of service.

So looking at some of the unvalidated input vulnerabilities, we see Nova 2012-08, arbitrary file injection. This one was really great. Any authenticated user on your system could inject a file onto the compute host just through standard API calls. The reason for this is that the input wasn't validated, so the paths weren't checked. And the assumption was, and this is a theme running right throughout, that as long as the user was authenticated with the system, it was kind of the responsibility of the cloud operator to make sure their users weren't doing bad things.

Access control failures. 2012-01, malicious command injection. It was as simple as this, which is lovely. This is the only vulnerability where I've taken the time to pull out the git diff for you. I just kind of wanted to show you what was happening in the original code, which was that, if you provided a project ID, it would just overwrite the one it already had for the project you'd authenticated against. So you could do anything as any user on the system as long as you were able to authenticate at one point in time. I think that's awesome. It's a great example, and I'm going to use it in security training for a long time. 2014-18, routers could be cross-plugged by other tenants. Again, this was a problem with too much trust being placed on things that were happening within the API. So it's an authenticated user who could go on and basically just re-route the networking, which sounds like a bad thing. We see this a lot in terms of the users that are allowed to use the system.

So the denial of service ones are interesting. These ones were, again, unvalidated input. But instead of talking about the APIs and the things that are being done in there, we're talking about files that are provided. So the QCOW2 images that were provided: if they declared a certain size and got put on disk, and then they kept growing and growing and growing, they could exhaust the disk, resulting in a denial of service. But it's a result of not validating input to the system, in this case, the QCOW2 files.
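To illustrate the kind of check this class of bug comes down to, here is a minimal sketch of validating a QCOW2 image's declared virtual size before committing real disk to it. The limit and function name are hypothetical, and this is not the actual Nova fix; it just shows the idea of not trusting the file you were handed.

```python
# Minimal sketch: reject a QCOW2 image whose header declares a virtual size
# larger than we are willing to provision. Illustrative only.
import json
import subprocess

MAX_VIRTUAL_SIZE_BYTES = 40 * 1024 ** 3   # hypothetical per-instance limit

def check_qcow2_virtual_size(image_path):
    """Return the declared virtual size, or raise if it exceeds the limit."""
    out = subprocess.check_output(
        ["qemu-img", "info", "--output=json", image_path])
    virtual_size = json.loads(out)["virtual-size"]
    if virtual_size > MAX_VIRTUAL_SIZE_BYTES:
        raise ValueError("image claims %d bytes, limit is %d"
                         % (virtual_size, MAX_VIRTUAL_SIZE_BYTES))
    return virtual_size
```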
The OS type metadata one was very interesting. So a backing file would be created on the Nova host when certain operations were performed. And these backing files are always supposed to be very, very small, and for that reason, they don't necessarily get cleaned up in the way you would expect. But a malicious user who started a cycle of creating a small VM with a randomly generated OS type, and then tearing it down and spinning it up and tearing it down, could create thousands and thousands of these very, very small files on the host, which you might think is not that bad. But who here is familiar with inode exhaustion and how you can break Linux file systems? OK. So you actually don't have to create that many small files before Linux can't figure out where to put things, which is bad and results in a denial of service. And again, it's because of the trust placed in the remote authenticated user.

Long server names flooding the API logs. This, again, is a great one. So you could just POST a really long server name through the API, and it would go in the log as this really, really, really, really, really long server name. And you could do that basically as fast as your network connection would allow. Or, if you were kind of smart, you'd maybe spin up 30 VMs and have all of them do it as fast as they could do it on the local network, and take down the whole logging system for OpenStack. Which seems like a bad thing to let people do.

Credential disclosures: OpenStack used to be full of these, and there aren't many now. It's where OpenStack logs things in inappropriate ways, typically user credentials in URLs. So the first one I'm going to talk to you about is the Glance Swift store. So when you're using the Swift store with Glance, there was a problem where, if it wasn't able to connect to Swift for some reason, the URL that it was trying to connect to would be logged without any sanitization or filtering. Which meant that the username and password that Glance was using to access Swift, which in most deployments is likely the same for every single tenant, were available in log data. And that meant that, I mean, who here runs a large scale cloud deployment? OK, a few people, sort of large-ish, middling. When you're running these things above a certain size, especially if you're dealing with the public, you're going to have to have support teams. You're going to have to have separation of responsibilities and least privilege. When you do that, your customer support team will normally get access to your log files, but they won't get access to go and delete everybody's images. This vulnerability broke that down by making all the credentials you'd need to access Swift and delete everybody's images, or tamper with them, or replace them, available to anybody who had access to the log data. 2013-31 was, again, a credential disclosure and privilege escalation vulnerability, exactly the same as the last one: stuff was being logged that shouldn't have been logged.

Data leakage vulnerabilities in OpenStack are really, really interesting. So this is the mixed vulnerability I mentioned earlier, 2013-04. And the result of the attack was that an attacker could consume lots and lots of resources by abusing the way certain libraries were being used on the system.

So what we end up with is this kind of cut, where we see that access control failures actually cover, for critical vulnerabilities, almost half of the spectrum.
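Most of those access control failures come down to the same missing check: is the resource actually owned by the project the caller authenticated as? Here is a minimal sketch of that check. The context object, the in-memory stand-in for the database, and all the names are illustrative rather than real OpenStack APIs, and the commented-out line shows roughly the trusting pattern that causes the trouble.

```python
# Sketch of a tenant-scoping check. Never let a project ID in the request
# body override the one that came from the validated auth token.
from dataclasses import dataclass

class Forbidden(Exception):
    pass

@dataclass
class RequestContext:
    user_id: str
    project_id: str          # taken from the validated token, nowhere else
    is_admin: bool = False

INSTANCES = {"vm-1": {"project_id": "tenant-a"}}   # stand-in for the real DB

def get_instance(context, instance_id, body=None):
    # Vulnerable pattern, roughly what the project-ID overwrite boiled down to:
    #   context.project_id = (body or {}).get("project_id", context.project_id)
    instance = INSTANCES[instance_id]
    # The check that was missing: scope access to the caller's own project.
    if instance["project_id"] != context.project_id and not context.is_admin:
        raise Forbidden("%s is not owned by project %s"
                        % (instance_id, context.project_id))
    return instance

ctx = RequestContext(user_id="alice", project_id="tenant-b")
try:
    get_instance(ctx, "vm-1", body={"project_id": "tenant-a"})
except Forbidden as exc:
    print(exc)   # tenant-b cannot touch tenant-a's instance
```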
So in almost half of the times that a security advisory was issued, it was to address a failure of the system to adequately separate user operations. We're bound to have a whole bunch of technical contributors in the room, as well as a load of other people, so I'd just encourage you to go and have a look at the security guidelines, and just have a think about where you are in terms of making sure you keep things separate. Because we still see it happen now. There's stuff happening in Neutron and Quantum relatively recently where we have the same thing again. And I'm not beating those guys up. It's just that there are new projects doing lots of highly multi-tenant stuff, and it's easier to spot where they have problems.

So what to do? Today, the security group has a number of operations, two of which directly apply to the problems we see in OpenStack security. The threat analysis work is really where I want to see a lot more investment from my peers in the OpenStack community. For those people that are running large clouds, those people that care about security, a lot of these people are doing threat analysis work already. The developer security guidelines: we have a published set of guidelines. They're not as polished as they could be, but what they do is address all of these different issues that were root causes in the vulnerabilities that I've just run you through very, very quickly. And what we want to do is get core developers and PTLs to agree that they will evangelize these sets of guidelines within their teams when they're doing development work. And hopefully as time goes on, we can codify a lot of that stuff into either Jenkins or Tempest, depending on what's going on, and other places where we can bring in some checks to see if these guidelines are being followed. So again, you probably won't be able to see this too well. It's just a demonstration of the fact that the guidelines are up on the wiki. They're available. We need to continue building them out and improving on them and iterating on them. It's a great place to get involved with OpenStack security. If you're looking for a way to get involved with this stuff and to modify things, then that's your opportunity.

The threat analysis effort that's going on right now is community-led. It's got a growing list of people getting involved. I really want to see lots of people doing more in terms of threat analysis. So I work for HP. We have a team of people who, amongst other things, spend a lot of time doing threat analysis. The Nebula guys do the same. The Red Hat guys do the same. There are lots and lots of people all doing threat analysis work. They're all missing things. My team will miss things. Everyone's team will be missing things. I'd like it if we could get everybody together to not just share the results of the threat analyses that they have done, but to all contribute to a larger set of threat analyses. Hopefully the result of that being that the various companies who are involved can just end up doing small delta reviews for their own little bits of secret sauce, the bits that are significantly different to OpenStack core. And we end up with an iterative threat review, or threat analysis, for at least all the major core projects, but I'd like to see it go wider. We're doing a lot of work on that in the security group. It's one of our main focuses for the next release.
So if anyone wants to get involved, or anyone's already organizing those sorts of activities and thinks that they'd like to work with us on that, then I encourage you to reach out to me or reach out to the security group. And we can reduce this massive duplication of effort. We can hopefully improve the quality of the threat analysis that we all benefit from. And we can stop people making really obvious mistakes in terms of multi-tenant failures, data leakage, ACF, denial of service, and all that stuff.

So I've actually run through this deck much faster than I thought. I was a bit worried that I had too many slides. So I'm more than happy to go back through, and we can talk about any of the vulnerabilities that you thought were interesting. Or, failing that, I'm available for questions, and maybe we can talk a little bit more about threat analysis and the security group, those sorts of things. So there's a mic just there if you want to ask a question.

As far as the secure development guidelines, how much external influence do you encourage? Because I personally know that Cisco has its own secure development lifecycle. Others, other corporations as well, have their own kind of way of developing code securely and methodologies for that. And I was kind of wondering how much corporate versus personal influence goes into that.

That's a good question. So everybody is welcome to contribute. Obviously, if a team of four or five people come in and say, hey, security group, we're all from whatever large company, and we've been doing a whole bunch of work on this, and we want to share it, obviously we're going to spend a lot of time looking at what they've produced and hearing their ideas on that, because it represents a whole bunch of effort that's been put in. However, it is open for anyone who wants to contribute and wants to come in. It's also a good learning opportunity. We occasionally get people who are just developers, or just coming by to see what's happening, to see the sort of rigor that is being applied in terms of threat analysis and that sort of stuff. Thank you.

So, more on threat analysis: you said that there is a lot of replication and duplication of work between different teams. So first, what is the threat analysis process you follow in your own company, and how would you create this effort that you are suggesting of getting different companies to work together?

That's a good question. In fact, that's probably the biggest question for how we're going to get people to integrate. Within HP, we use some services that are HP secret sauce, things that we sell to people and they give us money for. We also have operational security review processes where we spend time going through code review. We look at SDLC-type approaches. I don't think there is really a finalized way to do this yet. The threat analysis team is running right now. They are doing some really cool work right now. A lot of their stuff kind of looks like the OWASP analysis that's laid out. They do a lot of interesting work there. I am open to opportunities to integrate some of this stuff together. I find, however, that as much as we have this process that we use at HP, and I like it, and other places will have their own as well, and people will be doing sort of Common Criteria type analysis and all that sort of stuff, nothing really beats getting engineers in a room with security guys and doing something that's roughly analogous to a thesis defense.
And if I can get more and more smart engineers and more smart security guys in the same place at the same time to do these defense-of-design sessions, to me that works better. For the last six, seven years of my career I've been doing threat analysis of one type or another at one company or another. And they're great for ticking boxes and demonstrating the things that have been done. And it's really important to document flows and to understand how different systems interact. But getting the right experts in the right place is critical to finding the really interesting stuff. And there are a lot of really smart people spread out amongst these teams, and I just want to find a nice way to bring them together. So if somebody has an amazing process that they bring to us and say, hey, this is an amazing process, I would talk to the guys that are running the threat analysis and say, I'd really like it if you had a look at this and tell me what you think, and is it worth investing time in? We're not rigid about any of it, and I'd like to just see it move forward.

You mentioned that currently most of that is post factum: somebody else found the problem and brought it to you. Can anything be done upstream beforehand, either extending Tempest with some instrumentation so these things are just straightforwardly caught, or extending the current testing suites?

Absolutely. There are a whole bunch of things we can do. In some places, we're waiting for technology to catch up. So the common response to this would be, well, we'll start applying static and dynamic analysis. Well, you can't really do that with Python. It doesn't really work very well today. There are other opportunities. So a couple of our stretch-goal projects for the OpenStack security group are to add infrastructure jobs to check for basic things that have been done wrong, or things that we think we can catch simply with rules (there's a toy sketch of that sort of rule check below). So in some places, we're looking at doing some Tempest stuff there. In fact, there's someone, I think he's in the room here somewhere, who's doing some awesome work in terms of Tempest fuzzing QA stuff. There's a session Thursday morning. If you want to see the direction people are taking on some of this stuff, this is it. And this is what the security group exists to do. And now we're actually in a position where we have enough membership that we can start building this out and bringing more and more people in to do things like this. So you will see more and more upstream projects over the next year to try and address basic security mistakes that happen in the code. But there's nothing that is going to match being in the room, and it will be a virtual room, and going through code with people.

Is there any work being done around certificates, certificates for all the APIs? For the API endpoints.

So that should be a reasonably well understood problem to solve. If you're talking about certificate management for API endpoints, you don't have enough of them for it to really be a big problem. If you're talking about the way to manage those certificates in terms of deploying them, the security guide, I think, has some good guidance on that. But in general, what you want to be doing is terminating SSL, probably locally, and then passing off. So a lot of the OpenStack services will run their own SSL through Eventlet or whatever, but it's kind of clunky, and you don't really want to do that in a production environment. It's a great question, but it's a great question that's been asked before, and I think there are good answers out there documented for you in the security guide.
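On the earlier point about things we can catch simply with rules, here is a purely illustrative example of the kind of small check an infrastructure job could run. The regex and layout are made up, and a real job would want proper AST-based analysis rather than a grep, but even something this naive flags the credentials-in-the-logs class of mistake.

```python
# Toy gate check: fail if any log call appears to include obviously sensitive
# words. Illustrative only; not an actual OpenStack infrastructure job.
import re
import sys
from pathlib import Path

SUSPICIOUS = re.compile(
    r"LOG\.\w+\(.*(password|auth_token|secret)", re.IGNORECASE)

def scan(root="."):
    hits = []
    for path in Path(root).rglob("*.py"):
        for lineno, line in enumerate(
                path.read_text(errors="ignore").splitlines(), 1):
            if SUSPICIOUS.search(line):
                hits.append("%s:%d: %s" % (path, lineno, line.strip()))
    return hits

if __name__ == "__main__":
    findings = scan(sys.argv[1] if len(sys.argv) > 1 else ".")
    print("\n".join(findings))
    sys.exit(1 if findings else 0)   # non-zero exit fails the job
```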
One more question, if possible. Is there any work being done on security in between tenants, or in the tenant itself? In other words, IPS or IDS inside the tenants themselves?

So yes, to an extent. There's the firewall-as-a-service work that's going on within Neutron, but it's still early days for that in terms of really wanting to put a lot of trust assertions behind it. When you get much beyond that, you're relying on a couple of things. All your typical network IDS stuff, or host IDS stuff, you can deploy onto VMs in the cloud, obviously, if you wish. And what you do there is reduce your concern to what's going on in terms of hypervisor attacks or attacks across your shared fabric. They are interesting. There's a whole bunch of things you can do to do with hardening, to do with reducing the attack surface of the hypervisor platform. Again, that's actually reasonably well documented in the security guide. I also spoke about that topic at the summit in Hong Kong, and I'd be more than happy to fish the slides out for you. That goes through, in the space of, again, about 30 or 40 minutes, all the various things that you can do to protect your hypervisor and, in turn, protect the separation of tenants on the same machine. OK, great.

OK, so I think that's everyone. Thank you very much for coming to my talk. I know I managed to race through the slides pretty quickly. But if you have any questions, just come on by and we'll have a chat.