Hello, welcome. I guess that means we're rolling. So this presentation is titled "DEFCON 3: OpenStack Meets the Information Security Department." How's that mic working? You don't hear me? Great.

What I'm going to talk about today is basically recurring themes or problems that we have encountered when deploying OpenStack into enterprises. These are issues that may be familiar to you, they may be new, but they're not necessarily issues that arise in a model greenfield implementation. These are issues that arise in real OpenStack deployments, where you have a legacy information security department.

OK, so before I go on, I'll give a little bit of context so you know where I'm coming from. I work for Solinea. Solinea has three areas of business. First, consulting services. In consulting, we try to help customers accelerate their adoption of open infrastructure, primarily OpenStack. Within consulting, we have four main phases that we work through: conceive, architect, integrate, and adopt. The issues that I'm going to talk about come mostly from the architecture and integration sides. Of those steps, conceiving is basically building a cloud strategy, working out a roadmap for implementation, and understanding the use cases that the customer has. Architecting is then taking that information and building an architecture for the client. Integration is actually building it: deploying, and integrating with any external systems they have. And then adoption is helping that customer use the cloud, which moves into the next area, which is training. We offer training services to our customers, and in that area we train on OpenStack operations and architecture, and also open source software such as Docker and other components essential to the cloud ecosystem. And last but not least, we have Goldstone, which is a product that has come from our experiences in actually deploying OpenStack for customers.
This product is basically a platform to help customers operate OpenStack: metrics, compliance, et cetera.

I'll introduce myself. My name's James Clark, and I work in the Seoul office. Solinea is headquartered in San Francisco, and we have two Asia-Pacific offices, in Tokyo and Seoul. My customer area is mainly Asia and Europe, so if any of the things that I talk about here seem a little unfamiliar, that may be the reason why.

Just before I move on, I see there are quite a few people here. Can I get some idea of what you're involved in? How many people here are actually involved in delivering or deploying OpenStack to outside clients? And how many are involved in deploying within your own organization? Excellent. Great.

OK, so the topics I'm going to work through today. First of all, I'm going to talk about network security. The reason I tackle that first is that network security is one of the things that really shapes the architecture of the cloud, so if it is not handled well, the cloud is never going to be deployed properly. Then I'm going to move on to talk about the control plane, and by that I mean the OpenStack controllers themselves: your three-node HA controller cluster, typically, or however the vendor deployment of OpenStack is structured. Then I'll talk a little bit about identity. That is the Keystone side of things, and how it integrates with the organization. Next, I'll talk about IPAM. Then, as a special exception, I'm going to talk about CI/CD. That's because the way you architect the cloud and your CI/CD toolchain need to be tightly integrated; if that's not done well, they'll never work together properly after they're deployed. And then lastly, I'm going to talk a little bit about compliance, so some of the issues that we find with maintaining the customer's legacy compliance requirements. We're a little short of time, so I'm going to try to work through this quickly.
The things I'm going to focus on are based on the OpenStack reference implementation, and by that I mean without any additional special sauce or vendor products. The reason I'm going to focus on that is to keep maximum relevance. For some of these things there may be other solutions, but I'm going to stick to the reference implementation so you can make your own judgment about whether you can fix it yourself or obtain an external product to do that. I'm going to be talking about implementation issues for OpenStack itself: not so much about migration of apps onto OpenStack, but actually deploying OpenStack. As I mentioned before, there'll be a little bit of CI/CD talk in there, just because it's very critical to the architecture. And these are, again, actual implementations. They're not reference-model implementations where we get to do everything exactly the way we want; these are implementations where things are imperfect and we have to bend things to make them fit.

I'm not going to talk about the actual platform security itself. There's some really good content on the OpenStack website for things like how to secure the controllers themselves, and best practices for all of that. I'm also not going to talk about formal compliance, so PCI or other particular targets; again, there's plenty of resource material on that which has been around for some time. And as I mentioned, no application security design talk.

So I'd just like to introduce the information security team here. When we start out deploying, we need to look at how the high-level shape of the cloud is going to look. The two key people that we need to talk to (and when I say people, that could be a department) are the CISO themselves and also the risk assessment team. Risk assessment is also sometimes known as information security.
What they do is look at threats, the value of the information that's going into the system, what the possible attack vectors are, and what mitigations you have in place, and then they work out whether or not that actually meets their policy guidelines for implementation. Later on, when we come to implementation, we need to talk to security engineering, access management, security operations, and the CERT team.

I'll just briefly mention here why I titled this DEFCON 3. Oftentimes, when you're dealing with the information security department and you're trying to roll out a cloud, they view that, as they should, as a threat: a potential threat to information security. So the atmosphere can be a little defensive, and there'll be a lot of justification of why things are the way they are. That's where the title comes from. It's not always the case, but it often can be. Excuse me.

OK, network security. One of the first things that you will often encounter when you walk into a large organization is that their security model is largely based on the concept of perimeters. You'll be familiar with this; this is the firewall model. It's a very blunt instrument for enforcing security, and there will be a lot of manual controls there. To get firewall rules opened, you need to actually go through a policy check: someone will evaluate the request, determine if it's sufficient for the risk or the information involved, and then implement those rules. These things have usually been in place for a long time, maybe even a decade. And what you'll find is that a lot of the internal people have developed their own workarounds for this security model. They may have things like SSH tunnels, or VPNs punched through holes in the firewall, that they're using to get their job done. But as an outside consultant, you can't actually use those. You have to build a clean architecture that the information security team will sign off on.
So often our task is a little more difficult than it is for the teams you're actually working with inside the company, the company's infrastructure team, for example. One of the other complicating aspects you might run into is that the information security department may actually recognize that there are all of these tunnels, that their firewalls are quite holey, and they want to raise the bar. So they're going to try to use the cloud to tighten up security, which makes our job a whole lot harder.

When we look inside the firewall, though, we find something completely different. Oftentimes the subnets are completely open, so where applications are deployed there are no host-based firewalls, for example. The company may run network intrusion detection systems, and they may also use vulnerability scanners to prove that the software or applications they're deploying are secure. And that, you can see, is an obvious conflict with the concept of security groups. Security groups are a much finer-grained, more targeted security model, and they're actually much better. So you will end up having a discussion about whether you keep security groups, in which case they have to forego the use of their vulnerability scanners, or you disable security groups so that they can keep scanning to try to make things more secure, which sometimes seems counter-intuitive.

And then, to ratchet the complexity up another notch, this is quite often what we find inside customers. Just to take a step back: the customers I'm talking about here range from telecommunications (telco service providers, though not their internet-facing service provider business; rather internal use, say migrating their legacy data centers onto cloud) to manufacturing, and financial services is another example where a lot of these problems come up.
What they've often done is take the concept of the perimeters and shrink them down into smaller units inside the organization, which helps them simplify firewall rules. But the problem we have is: where are we gonna put OpenStack? Where are the controllers gonna sit? Where are the hypervisors gonna sit? And how can this actually be used? Now, this depends a lot on the scope of the OpenStack project that you're involved in. If you're just using OpenStack to handle a single application or a single specific use case, or if all of the applications you're concerned about are related and can be lumped together, you may not have a problem. But on the other hand, if you need to have developer access, and developers don't have access to the production zone, you can have a lot of problems. You may think that developers having access to production is not a good thing, and that may well be the case. But if you look at the converse, if we put one cloud into the development zone and another one in production, we may not gain the benefits of being able to run a CI/CD toolchain to actually do deployment. We may find that what comes out of the development zone, because it's untrusted, still needs to go through a whole lot of legacy testing before it can go into production. So if the customer is looking for agility, they may not get it.

And one more step up the problematic side of things is the fact that the OpenStack controllers are multi-homed. Oftentimes customers will, to simplify things, choose a provider-VLAN type of deployment; if they're very conservative, they won't want to go for SDN. So provider VLANs using their existing firewall and router infrastructure are quite common. And when you do that, one of the things they will notice is: why do we have to have this trunk from all of our hypervisors flowing back into the controllers?
That's a problem, because they have rules which say you cannot mix security zones: if your controller is in a particular zone, you cannot have all these network trunks flowing back to it. You may be familiar with this; the trunk is there to provide things like DHCP and metadata services to virtual machines. So one solution is obviously to disable those, but then that breaks all of your orchestration functionality. So, another big problem.

Okay, so moving on to some of the solutions that we have actually run into on this side of things. For control plane isolation, particularly where the customer requires physical separation (say they have a trading platform whose networks cannot exist on the same hardware, or even the same switches, as other software), we actually have to have separate installations of the cloud software. One particular customer comes to mind: when we were designing this, we got to about 11 clouds and then thought, well, hang on, this is not really gonna work. We can't have the solution to every problem be "just add another cloud"; that's not gonna scale, and operationally it's often gonna be worse than when we started. But to some degree, multiple clouds may be required.

Another problem that we often run into is firewall automation. Customers will often have the very large firewalls that they use to divide everything up into their security zones, and they don't wanna let go of their procedure for implementing changes to those firewalls. That may be due to regulatory requirements; they may need a paper trail of all of the changes and justifications. So having OpenStack reach out and modify those firewalls is not an option either. This is very difficult to change. So typically the solution we go for here is to negotiate with the security team to agree on a set of application classifications.
We would survey the organization's application catalog, divide the applications into groups which have very similar architectures and very similar connectivity requirements, and then agree on a set of firewall rules which can be used for each application classification. Then we can preload certain subnets with those rules before deployment. When end users come along and want to deploy a new application, the orchestration system will know where that needs to go and will deploy the application there. The firewall access for that VLAN is already in place via a subnet wildcard, so as soon as the VM is deployed, it's up and working. So instead of a 30 or 60 day wait, they can have a VM in a few minutes.

To wrap up on perimeters, the basic lesson here is that perimeters have a very strong influence on the overall shape of your cloud. It's very important to engage extremely early, to work out what these obstacles may be before you start on your architecture. Also, a lot of questions will be raised about governance. The reason is that when the risk assessment team is looking at your cloud, they're going to want to know that the necessary operational controls are in place to make sure that everything that happens in the cloud is done according to a set of agreed governing principles. Also be aware that the perimeter security model is going to have a lot of influence on your design that cannot be handled entirely with OpenStack role-based access control, so you're going to have to do some engineering outside of that to meet the organization's security requirements. And finally, changing the security policy is almost always impossible, in my experience anyway. A lot of these things are based on regulatory requirements, so getting a security policy change is something that could take a year or two.

Okay, moving on to the control plane. Just to briefly explain this.
One of the things I mentioned before was needing 10 or 11 clouds to satisfy the network separation requirements. You can often reach a trade-off there by implementing very tight controls on the control plane instead. Some of the measures here: I've illustrated two-factor authentication, so that every user who needs to access the control plane goes through a two-factor authentication service. That gives the level of audit logging that is needed for their regulatory requirements. Also, API access for things like your CI/CD toolchain will need to run through a WAF, which is going to restrict access, or restrict the policy, beyond what the OpenStack RBAC can do. You're also gonna want some privileged access management there for the system operators who are actually gonna be managing the cloud. If they ever need to, for example, SSH into the platform, they're gonna need, often for regulatory requirements again, basically video screen recording of all the actions that they're doing, so that it can be used later if there is an event that needs an investigation. The lesson here is that you need to incorporate this in your design very early, too. You also need to consider the entire toolchain. Things like Jenkins or Packer that are gonna be building VMs on the cloud need API access, and they're gonna need very special handling, because that's not gonna work through any kind of two-factor system. You'll need an alternative solution for mitigating that security risk.

Okay, identity. Excuse me. This one is usually not too much of a problem area. One thing you will always notice is that enterprises will never use the Keystone internal SQL database; they will always want to have their identity team look after access, and the identity team will want to use their existing LDAP or Active Directory system.
A problem arises, though, in that security recognizes that this is actually a legacy configuration, and they do not like the fact that you have plain-text passwords exposed outside the control of their AD environment. For example, if you bind OpenStack to an AD or LDAP server, users still send their plain-text password. It'll be inside SSL, but when it gets to Keystone, it gets unwrapped. So if anyone has compromised the OpenStack box, they can potentially sniff all of these passwords, which security people don't like at all. Another thing is that, at least with Keystone v2, when you integrate with LDAP you have to have all of your service users (your Nova, Cinder, Glance, et cetera) stored in plain text in the file system, and that's another no-no from a security perspective. An additional thing I mentioned earlier that they want is two-factor. That is, in my experience, often implemented through a VDI solution. There'll be something like a Citrix or VMware View environment that users have to jump through a hoop with an authenticator and a PIN to access, and then from that environment they can access the cloud. You can see there's a problem there if you wanna run an API through that, because that's not gonna survive all of those hops.

Moving on to the solutions here: for the plain-text passwords, what we can do now with Keystone v3 is use the domain configuration. We can separate the service users into their own local domain, so they reside in the Keystone database, and then have all of the user domains using the AD system, and that makes security much, much happier. For two-factor, though, it gets a little more complicated. If we don't wanna use the VDI solution, what we can do is use federation and single sign-on.
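A sketch of that domain separation, assuming Keystone v3. The option names are the standard Keystone domain-specific configuration ones; the domain name and AD details are made-up examples:

```ini
# /etc/keystone/keystone.conf -- enable per-domain identity backends;
# the default domain (holding the service users) stays in SQL
[identity]
driver = sql
domain_specific_drivers_enabled = true
domain_config_dir = /etc/keystone/domains

# /etc/keystone/domains/keystone.corp.conf -- hypothetical "corp" user
# domain backed by the enterprise directory
[identity]
driver = ldap

[ldap]
url = ldaps://ad.example.com            # assumption: your AD endpoint
suffix = dc=example,dc=com
user_tree_dn = ou=Users,dc=example,dc=com
user_objectclass = person
user = cn=svc-keystone,ou=Service,dc=example,dc=com   # read-only bind account
password = <bind-password>
```

With this split, the Nova, Cinder, and Glance service accounts never touch the directory, and the directory bind credentials live in one file the identity team controls.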
Many of these enterprises are now also recognizing that this LDAP model is not as secure as they would like, so they are starting to run their own identity providers. The cases we've encountered have been OpenAM (SAML, I should say) with an LDAP backend, so they can retain their existing directory store and front-end it with OpenAM. In these instances, the credentials never pass through OpenStack: users are redirected to a single sign-on portal and then back to the dashboard or other tools that you might have in the cloud environment. It's a little more complicated for API operations, but they can also be made to work in this manner.

So, IPAM. IPAM is often an issue, because with OpenStack we like the ability to just deploy a VM: Neutron can grab an IP address from its pool, assign it, DHCP it, and everything works okay. But from the enterprise perspective, particularly for regulatory requirements, they may have a CMDB or IPAM system that needs to capture particular details for every IP address on the network. And when you think about the perimeter security model with the open subnets, this makes a lot of sense, because when things are open like that, you've got to have controls on both sides to make sure that everything plays well together. For example, they may need a security classification for the data that's behind that IP address, the application, the owner, who's going to operate it, et cetera. When you're setting up OpenStack this is not an issue, but obviously for the operation of the stack, for the deployment of VMs, it needs to be resolved. As I mentioned earlier, we're talking about Juno and prior deployments, so the only solution is to have an external orchestration platform update your IPAM database out of band. You can negotiate to have a pre-allocated subnet done manually and added into the Neutron IP pools ahead of time.
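Once those subnets are pre-allocated, the per-VM registration step can be sketched like this. Everything below is illustrative: the IPAM endpoint, the field names, and the exact shape of the server record are assumptions, not a real product API.

```python
# Sketch of the out-of-band IPAM sync step in the orchestration platform.
# The IPAM REST endpoint and required metadata fields are hypothetical.

def ipam_records(server, owner, classification):
    """Extract fixed IPs from a Nova-style server record and build the
    per-address metadata records the enterprise IPAM/CMDB requires."""
    records = []
    for network, ports in server["addresses"].items():
        for port in ports:
            if port.get("OS-EXT-IPS:type") != "fixed":
                continue  # floating IPs would be registered separately
            records.append({
                "ip": port["addr"],
                "network": network,
                "hostname": server["name"],
                "owner": owner,                    # required for the audit trail
                "classification": classification,  # e.g. data sensitivity tier
            })
    return records

def sync_to_ipam(session, records, base_url="https://ipam.example.com/api"):
    """POST each record to the (hypothetical) IPAM REST API."""
    for rec in records:
        session.post(f"{base_url}/addresses", json=rec).raise_for_status()
```

The orchestration platform would call `ipam_records()` as soon as the VM goes ACTIVE, with `session` being an authenticated `requests.Session` against the IPAM system.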
The pre-allocation would be done by an operator, and then your orchestration platform, whenever it deploys a virtual machine, extracts the IP address and updates the IPAM database with all of the necessary details. In Liberty, or maybe Mitaka, and beyond, there is a pluggable IPAM API. This is very new, and I'm not aware of any implementations which integrate with external products, but this would be the way to go moving forward: you can have OpenStack request IP addresses through a driver to your existing IPAM platform.

So, moving on to CI/CD. Now, this is another real problem area. It's very much the same problem you have with the security zones at the start: you have all of these different components which need to talk to the cloud. You may have your CI orchestration and image build; you may be using Packer to build images for Glance. These all need API access, but where are you gonna put them such that they can talk to the cloud with all the controls that you've put in there? What we have found as a workable solution is to build an API proxy. That could be implemented with a commercial WAF (web application firewall) product, or with something like Apache and ModSecurity, and then you can construct a JSON filtering approach to restrict the operations that can be done from your lower-level-of-authentication environment. In this diagram, for example, we have some two-factor users which are potentially coming from a VDI environment, so they're all safe. And on the top right, we have simple authenticated clients. This could be your toolchain components. They need to run through a separate access point to reach the API endpoints, and that access point is gonna use a ModSecurity rule which verifies the request against some integration point with their IAM platform.
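The kind of allow-list those proxy rules encode can be sketched in a few lines. The role names and permitted operations below are made-up examples, not a real policy:

```python
# Sketch of the allow-list logic behind the API proxy. Real deployments
# express this as WAF/ModSecurity rules keyed off the client's authenticated
# identity; the roles and path patterns here are illustrative only.
import re

# role -> list of (HTTP method, URL path regex) the client may perform
ALLOW = {
    "ci-toolchain": [
        ("POST", r"^/v2\.1/servers$"),            # boot instances
        ("GET",  r"^/v2\.1/servers(/[^/]+)?$"),   # query instances
        ("POST", r"^/v2/images$"),                # upload Glance images
    ],
}

def permitted(role, method, path):
    """Return True if this role may perform method+path against the API."""
    for allowed_method, pattern in ALLOW.get(role, []):
        if allowed_method == method and re.match(pattern, path):
            return True
    return False
```

In practice the allow-list would be sourced from the IAM platform rather than hard-coded, and anything not matched is rejected before it reaches the OpenStack endpoints.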
We look at what groups or what operations they're allowed to do, then reject the things they shouldn't be doing and let the good operations through. So it's a little complicated, but it does actually work.

I'll quickly talk about compliance. One of the other big issues we have with cloud is that these companies previously had something like a 30 or 40 day process for starting up a VM, which you can see there on the left. Part of the output of that process is a paper trail of all of the compliance checks and authorizations that were needed to get that VM deployed. So when you wanna make it so that someone can request a VM through an API, or even just through the web interface, and have it in a few minutes, there's a problem. What we need to do is build the equivalent audit controls into the environment, so that the security team have all of the logging and approval records they need to prove that things were done properly. So we change the model when we're doing CI/CD such that we codify the process. We need to work through refactoring the business operations across all of their different teams, turning that into automation code, and changing their approvals instead into reviews of the operations for the CI platform, with the output of that being virtual machine images or packages which can then go into a secure repository. On the OpenStack side, we usually need to integrate with the SIEM platform, so that we get all of the audit events from each of the OpenStack services. The way we do that is to enable the WSGI audit middleware, which ensures that all of the operations are output in CADF format. So they have who was doing the operation, where they were coming from, and what the object is that they're operating on, in quite a detailed format that you will not get in the normal OpenStack logs. You then need to aggregate that together and store it in a secure data store.
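Enabling that audit middleware is a small paste-pipeline change. The filter factory below is the real keystonemiddleware entry point; the file paths are typical but distribution-dependent, and you still have to splice `audit` into your API pipeline after `authtoken`:

```ini
# /etc/nova/api-paste.ini (other services are configured similarly)
[filter:audit]
paste.filter_factory = keystonemiddleware.audit:filter_factory
audit_map_file = /etc/nova/api_audit_map.conf
```

With this in the pipeline, every API call is emitted as a CADF event (initiator, target, action, outcome) over the notification bus, which is what the SIEM aggregation then consumes.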
You can build that yourself, and there are products around which, to some degree, help you do that. One that we've actually been working on is Goldstone, and we have a feature in it called OpenTrail Auditor. If you're familiar with AWS CloudTrail, this is a very similar thing. It enables you to capture all of that logging information and then push it to a protected, read-only system which is outside the domain of control of the cloud.

We're almost out of time, so quickly recapping lessons learned. Firstly, it's very difficult. It's hard to deploy into these organizations that have very old and very rigid security environments, and it's also very difficult to retrospectively fix your design. So it's very important to engage extremely early in the process of architecting your cloud with the CISO team, particularly with the risk analysis team, to make sure that you fully understand their environment before you start drawing the first architecture diagrams and circulating them, and then to continuously review that as you go forward. It's sometimes very difficult to understand what all of these problems are until after you try to implement, because things don't surface until some guy you never met before realizes that what you're asking for hasn't gone through the right checks. So you need to be very vigilant in that upfront engagement with the CISO team and, of course, include all of those requirements in your architecture. Okay, and that's all. I think we just have about two minutes for questions. Anyone? No questions? Tough crowd.

Q: On the compliance stage, you said something about getting sign-offs for each part. Did you say there was a product that can do that?
A: It won't actually get the sign-offs, but what I'm mentioning there is that you need to capture all of the actions. You may have some external approval process, but you need to electronically capture all of that into a trail, basically an electronic paper trail, that can later be used to verify that everything was done according to process.

Q: And the CADF stuff you had, did you say that was in-house special sauce, or is that something else?

A: No, the CADF audit middleware is part of OpenStack. It's a module that you can activate, and it needs a little bit of manual configuration in OpenStack.

Q: That's what I was talking about. You have that?

A: Yeah, it's actually in OpenStack, so it's always there. It just needs enabling. No more? Okay, thank you very much.