All right, thanks for coming to this session. My name is Raghu Yeluri, and I work at Intel Corporation, in the data center and cloud products group, on cloud security. I've been doing virtualization and cloud security for some time. Today I'm going to talk about infrastructure security. Specifically, there are many applications and workloads that have regulation and compliance requirements, and they need to be controlled in terms of where they run and where they migrate. That's the focus of today's session. I'm going to walk through what we are doing at Intel, and what we are trying to bring through the OpenStack community into OpenStack: a set of security controls that help you monitor and meet your policy and compliance requirements. The traditional security controls in today's data centers are definitely necessary when you move to virtualization and cloud, but they are not sufficient. So we are trying to enable a set of new controls, so that you have the same visibility and the same control you have in a traditional data center setting, even in this abstracted world of virtualization and cloud.

Just to frame the problem, I'm going to set up what we are trying to address, but we'll transition to the solution quite quickly and show you what we are doing in OpenStack. I'm going to close with a couple of areas where we are trying to go next, taking this location and boundary control into the future, and I'll walk through a couple of use cases and examples and give you a sense of which versions of OpenStack we are targeting for those.

This is the big picture of security challenges in the cloud. It boils down to three things: visibility, control, and compliance. Beyond that headline, you start looking at specific questions. How do I protect my VM workloads, my payloads, when I'm in the cloud? How do I get visibility into the integrity of the infrastructure at a cloud provider, so that I have the same confidence I have running applications in a well-controlled environment like my own internal data center? Each is a huge topic. So today, the focus is on the one in red: how do you do segregation and location control for your workloads?

So what's the challenge, really? Policy requirements. There are a lot of policy regulations where sensitive data and apps cannot leave the organization in which they are being run. There are data privacy and data governance requirements that constrain the movement of workloads. And there are cases where extremely sensitive data and apps have to run on certain qualified hardware and in certain security zones. In a typical, well-controlled enterprise data center, it's relatively easy to do this. But as you get into more of a cloud model, with multi-tenancy and all that, it's extremely difficult. And to add to the policy problems, the agility of the technology has added more challenges. At the end of the day, anything that is virtualized, a set of virtual servers, is essentially a set of files. You can copy them, you can move them, you can run them wherever you want. And with the notion of hybrid clouds and overlays, it becomes even easier to move workloads from one place to another, so the lines are getting pretty blurred. This is the challenge, and it drives a very specific set of requirements. You need to run applications in internal data centers. You need to run apps in particular places due to governance and data sovereignty issues.
You have to run in the right data center in the right geography. And in some cases, you have to have a specific set of hardware on which your applications run. Translating these higher-level requirements into a bottom-line technical requirement: how do you ensure asset location, or geolocation, in a multi-tenant cloud environment? That is the fundamental technical requirement we at Intel set out to address through our technology.

How many of you have heard of Intel TXT? Wow, that's pretty good. TXT is a set of technologies in the x86 architecture that provides a certain level of integrity assurance rooted in hardware. It's rooted in our CPUs, it's rooted in our chipsets, and it can provide a level of assurance of the boot process, including the software stack that runs on the hardware. You can make very specific security claims about the integrity of the hardware on which your workloads are going to run. The good thing is, as a service provider, you can make that integrity transparent to the customer. You can say, customer X, your workloads are running on this set of clusters, and here is the integrity of that cluster. You can actually provide that as a report. There is a concept called measured boot, where every step in the boot process, from the time you power on the server, is measured, and those measurements are stored in a secure location, typically a TPM. The measured boot process, plus the attestation of it, provides that assurance. I'm not going to get into a lot of detail on TXT, but at a high level, that's what it is. The chain of trust we can assert starts from the hardware, through the firmware and BIOS, into the OS and the hypervisor.

We are using the same set of controls available in hardware to write an asset descriptor into the TPM, and we leverage the same measured boot and attestation process to attest that asset descriptor. That's what we call geotagging, or asset tagging. Asset tagging is a way to write an asset descriptor into the TPM of the server and, through a trusted launch process, make it visible to higher-level entities. So an OpenStack scheduler, for example, when it is trying to schedule a VM, can know which servers are trusted servers, or trust-verified servers to be precise, and then it can deploy workloads on those.

So what's a tag? An asset tag is one or more name-value pairs that are bound to a unique ID of the platform, typically the motherboard ID or, in some cases, the endorsement certificate of the TPM. It is digitally signed; we take a hash of it and write that into a non-volatile index in the TPM, and that index is write-once. The asset descriptor could be anything. It could be GPS coordinates. It could be a functional description: this is my PCI cluster, this is my web front-end cluster. You can name it whatever you want. It could be geographical information, not necessarily GPS; you can say, hey, this is my US location, this is my California location. So it can be a textual description of geographies as well. Anything that makes sense for your business, you can use as a tag and make visible to higher levels of the stack. It's protected in the TPM, and it is made visible through the measured boot and the attestation process, so I have integrity of it.
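To make the tag structure concrete, here is a minimal sketch of how an asset tag descriptor could be assembled and reduced to the digest that gets written into the TPM's non-volatile index. The field names, the UUID, and the JSON-to-certificate step are illustrative assumptions, not the exact format used by Intel's provisioning tools.

```python
# A minimal sketch of preparing an asset tag descriptor for provisioning.
# The field names, UUID usage, and signing step are illustrative assumptions,
# not the exact Intel provisioning-tool format.
import hashlib
import json
import uuid

def build_tag_descriptor(hardware_uuid, attributes):
    """Bundle one or more name-value pairs with the platform's unique ID."""
    return {
        "id": str(uuid.uuid4()),          # identifier for this tag certificate
        "hardware_uuid": hardware_uuid,   # binds the tag to one platform
        "attributes": attributes,         # e.g. {"country": "FR", "function": "PCI"}
    }

def tag_digest(signed_tag_cert_bytes, tpm_version="1.2"):
    """Hash the signed tag certificate down to what fits in the TPM NV index:
    20 bytes (SHA-1) for TPM 1.2, 32 bytes (SHA-256) for TPM 2.0."""
    if tpm_version == "1.2":
        return hashlib.sha1(signed_tag_cert_bytes).digest()
    return hashlib.sha256(signed_tag_cert_bytes).digest()

if __name__ == "__main__":
    tag = build_tag_descriptor(
        "00ecd3ab-9af4-e711-906e-001560a04062",   # placeholder hardware UUID
        {"country": "France", "region": "EU", "function": "web-frontend"},
    )
    # In the real flow this descriptor would become a signed X.509 certificate
    # issued by the tag-provisioning authority before being hashed.
    digest = tag_digest(json.dumps(tag, sort_keys=True).encode())
    print(digest.hex())
```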
Beyond that, I can then do an attestation of that tag, so I can verify and assert that when a server presents its geotag, it is the one I expect that server to have. So this is the geotag process, and the way it works in OpenStack is as follows. Step zero is an out-of-band mechanism to provision the tags onto the servers. You see three clusters here, and all the clusters are provisioned with the right tags; I have UK, US, and France as the example. Step one: I have a workload I'm trying to deploy, so I take the image and put it in Glance. In addition to putting the workload in Glance, I do one other thing: I set launch policies for that workload. Today, we use the Glance image registry to set those properties. In this example, I set the policy that the server it runs on has to be trust verified and has to have a geo of France. So I upload my workload to Glance and set the launch policies for it. Whenever I decide to instantiate this workload, that's step two, through the API servers. We have created a new filter for the OpenStack scheduler; it's called the trusted launch filter. So as the scheduler runs all of its filters, this trusted launch filter runs too. Essentially, it takes the subset of servers that the scheduler got from all the other filters and then runs a set of attestations, which is step five that you see here. There is an attestation authority; I don't know how many of you are familiar with it. It's called Open Attestation today, OAT. The filter calls OAT with that subset of servers and says, hey, tell me which of these are trusted. The Open Attestation server challenges all the servers in the various pools and comes back with a set of results. Based on those results, the best server is picked by the scheduler, and it deploys the workload. The exact same process happens when you are migrating workloads as well, when you're doing a live migration.

There are hardly any manual steps here, by the way, even though I may have made it sound manual. The only step a system admin goes through is step zero, really: when you're racking and stacking servers, as they come in, you tag them with the appropriate set of tags based on your business requirements. When you need to repurpose, you take the servers, run the provisioning tools, and repurpose them with a different set of asset tags: same index, different tags.

At a high level, this is how boundary control works, and here are the extensions we made to OpenStack. We have a new location filter on the Nova scheduler. We have a set of Horizon changes, as plugins, so that you have the ability to select what are called the launch policies for your VMs. And there are some extensions to the image registry so that you can assign a set of launch policies to a specific image. The ones in blue are the attestation and provisioning tools that we provide; they come either from Intel or from a set of ISVs. For the ones in yellow and red, there are OpenStack and open source components for all of this, so you can have the whole end-to-end system in open source.
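As a rough illustration of how such a scheduler filter fits together, here is a sketch in the style of a Kilo-era Nova host filter. The attestation endpoint, its response shape, and the image property names ("trust", "geo") are assumptions for illustration; the actual extension talks to Open Attestation through its own client rather than plain HTTP calls.

```python
# A minimal sketch of a location-aware host filter in the style of a Kilo-era
# Nova scheduler filter. The attestation URL, its JSON response shape, and the
# image property names are illustrative assumptions, not the actual extension.
import requests

from nova.scheduler import filters


class GeoTrustFilter(filters.BaseHostFilter):
    """Pass only hosts whose attested trust status and geotag match the policy."""

    attestation_url = "https://oat.example.com:8443/AttestationService/resources"

    def _attest(self, hostname):
        # Ask the attestation authority for this host's trust status and tags.
        resp = requests.get("%s/hosts/%s" % (self.attestation_url, hostname),
                            verify=True, timeout=10)
        resp.raise_for_status()
        return resp.json()   # assumed shape: {"trusted": True, "geo": "France"}

    def host_passes(self, host_state, filter_properties):
        props = filter_properties.get("request_spec", {}) \
                                 .get("image", {}) \
                                 .get("properties", {})
        required_trust = props.get("trust")   # e.g. "trusted"
        required_geo = props.get("geo")       # e.g. "France"
        if not required_trust and not required_geo:
            return True                        # no policy, any host will do

        report = self._attest(host_state.host)
        if required_trust and not report.get("trusted"):
            return False
        if required_geo and report.get("geo") != required_geo:
            return False
        return True
```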
There are a bunch of blueprints, and I just want to call out a couple of them. There is a trusted asset tag blueprint for Nova, there's one for Horizon, and there is a wiki that walks through how the process actually works. We are targeting Kilo to upstream the code. The code's done, the code's available, it's vetted. As you can tell, the big focus of the OpenStack Foundation is on stability, so new features take some time to get into upstream. We tried for Juno, but it didn't happen, so we are hoping it will land in Kilo. But we will provide downloadable scripts that are backportable to Icehouse and Juno, so as you are ramping on Icehouse and Juno, you can at least use those scripts until we upstream the code into the mainline.

I was going to show a demo, but I'm having some issues with my wireless connection, so I'm going to fall back to a set of screenshots. This is how you set tags. I wanted to show you that a tag doesn't have to be one thing or the other: in this example you see state, you see country, and you actually have the latitude and longitude of where I live. This is how you create a set of tags, and there is no constraint on how many; I could even add a couple more, saying functional PCI server and something else, so I can set a heterogeneous set of tags. In this screen, what I'm trying to show is: I have my list of servers, and I can pick one at a time if I want a separate tag per server, or I can say, here are my tags, go provision all of the servers with them. Then a process on the back end, a PXE-based process, securely writes those tags into the TPMs of all the servers you specify. This is the only step that you, as a system admin, have to do: one time, when the racking and stacking happens, or if you are repurposing the server for something else.

Here, I'm showing you how you set policies for VMs. You see the little extensions to Horizon that we made; I don't know if you can see it very well from back there, but at the bottom you see trust and location, and you have a set of policies you can pick. Like I said a little while ago, there are no restrictions on how many policies, but in our reference implementation we let you select up to five tags as policy. That magic number came out of quite a lot of discussion with folks from NIST and a couple of ISVs, where we agreed that this is how many tags somebody would typically want to manage for a given set of servers. So five was the magic number. Again, there's no hard rule; it can be more or less, but the reference implementation gives you five attributes to set values for: country, state, region, function, and one called "other" that you can use for any purpose. In the column that shows trust policies in the middle, the lock icon shows that the host is trust verified, meaning the TXT measured boot has happened and the attestation has happened as well, and the second icon shows that it has a location tag as well.

Here is the actual instantiation of that image. The scheduler ran the filters, found the right set of servers, and launched the workload. That's where you see an instance name, the base image that was used, and the trust policies. I don't know if you can see the little highlight there, but it shows the state and the region where the VM was launched.
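For reference, setting launch policies as Glance image properties might look something like the following sketch using python-glanceclient. The property names ("trust", "country", "region") and the credentials are placeholders, not the exact keys the reference implementation uses; in practice the Horizon plugin described above sets these for you.

```python
# A minimal sketch of attaching launch-policy properties to a Glance image.
# The property names and auth details are illustrative assumptions.
from keystoneauth1 import session
from keystoneauth1.identity import v3
from glanceclient import Client

auth = v3.Password(auth_url="http://controller:5000/v3",
                   username="admin", password="secret",
                   project_name="admin",
                   user_domain_id="default", project_domain_id="default")
glance = Client("2", session=session.Session(auth=auth))

IMAGE_ID = "11111111-2222-3333-4444-555555555555"  # placeholder image UUID

# Up to five tag attributes can serve as launch policy in the reference
# implementation: country, state, region, function, and a free-form "other".
glance.images.update(IMAGE_ID,
                     trust="trusted",
                     country="France",
                     region="EU")
```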
So that's pretty much it. Once you set the policies, the scheduler filters take care of controlling the placement of your workloads and controlling their migration. If some policy doesn't match and it can't find a server, the instance won't launch, and we write a set of logs for you in syslog format. And if you have a mechanism to call into some other logging system, we can easily plug that into the code base.

So that gets you to controlling VMs. A lot of people use a shared-nothing kind of model, meaning the VMs and their data go together: you launch them on a server and that's where they run. But there are cases where storage volumes matter as well. So one of the use cases we are looking at is extending this geo mechanism to storage volumes, and I'm going to talk a little bit about where we are with it. The second use case is this: when you put your VMs and your data at a service provider, you would like some kind of protection for that data. The typical model, of course, is encryption. That's the easy answer, but who controls the keys? That's the harder question. You have two choices: no encryption, or encryption that is controlled by the service provider. We heard enough from customers that they want control of the keys. So we are exploring a new option, what we call tenant-controlled VM encryption and decryption, where the decryption keys are provided to the service provider only when the service provider can assert some level of trustability of the infrastructure. I'm going to walk through that one as well.

So, geotag location control for volumes. Scenario one: I have two VMs, VM one and VM two, with no attached storage; everything is local storage. When VM two launches, it's going to have a database using the local storage of whatever machine it's launched on. The boundary control model we have today works perfectly fine here: the cloud controller talks to the attestation system, finds out that I have a geo constraint of France, finds the trusted pool there, launches VM one and VM two, and all policies are honored. But in scenario number two, VM two has a storage volume that is mounted later. My current implementation goes through the same process as before and launches on the trusted pool on the left of the picture, but once VM two is launched, it attaches a volume that may not meet the same policy. It could be somewhere else, or it could have a different constraint than the one you had before. The current implementation of boundary control through the Nova scheduler filter does not address this scenario. So what we are trying to do is exactly what we did for Nova, but for Cinder. Cinder has a Cinder scheduler with the same plugin architecture that the Nova scheduler has, so we are writing a new location filter that plugs into the Cinder scheduler; a rough sketch of what that could look like follows. And assuming the storage is on x86-based infrastructure, either DAS or scale-out storage, we can do the same trusted geotagging process that I described a little while ago.
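Here is that sketch, mirroring the Nova filter. The base-class import is release-dependent (older Cinder releases use BaseHostFilter and host_passes, newer ones BaseBackendFilter and backend_passes), and the metadata key and attestation helper are assumptions; it is meant only to show that the plugin model mirrors the Nova one.

```python
# A minimal sketch of a location filter for the Cinder scheduler. The base
# class and the request_spec layout vary by release; the "geo" metadata key
# and the attestation helper below are illustrative assumptions.
from cinder.scheduler import filters


def attested_geo(backend_host):
    """Placeholder for a call to the attestation authority (e.g. OAT) that
    returns the geotag asserted for the storage node backing this backend."""
    raise NotImplementedError


class GeoLocationFilter(filters.BaseHostFilter):
    """Pass only storage backends whose attested geotag matches the request."""

    def host_passes(self, host_state, filter_properties):
        # The policy could be carried as volume-type extra specs or volume
        # metadata; here we assume a simple "geo" key in the volume metadata.
        spec = filter_properties.get("request_spec", {})
        required_geo = spec.get("volume_properties", {}) \
                           .get("metadata", {}).get("geo")
        if not required_geo:
            return True
        return attested_geo(host_state.host) == required_geo
```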
With that, I can use the same attestation system, the same provisioning, and the same filtering process, and I can find the right set of storage backends that meet my policy. To be very clear, it does not work for SAN and NAS systems yet, because they run different operating systems, and even though some of them are x86-based, they are not TXT-enabled. But as long as it is not NAS or SAN, if you have scale-out storage or direct-attached storage with JBODs, this will work. So there is a new location filter, and when you're creating a volume you can enforce the policy: right at the time a volume is being created, you can say, I need this policy, and the scheduler goes through, creates the volume on a backend that meets it, and you get it. The second case is when the volume is already there and I'm trying to attach it. That is a slightly more intrusive change; it's not the same plugin model. We have to go into the check-attach function in the Nova API, which does a few checks before it mounts any volume, and add a couple of lines of code that verify that the VM's location policy matches the volume's location policy. With those two changes, we are able to enforce boundary control for storage volumes on create and attach. For folks who are familiar with Cinder, there are two other functions: migrate and backup. We think the same model works for migrate as well, because the scheduler runs a very similar filtering function through the location filter when it's migrating volumes. The challenge is backup. Backups can go to any number of places, but ideally, if you want to continue the chain of trust and boundary control, you need to get to backup as well. One option we are considering, assuming the backup goes to Swift, is to leverage what are called storage policies in Swift. Starting in the Juno timeframe, there is the ability to set storage policies at the Swift container level, and you can enforce policy there. So again, this is very early thinking on our part, but we could leverage those same Swift storage policies to extend this boundary and location protection down to backups as well.

Here are a couple of screenshots; nothing different from what you have seen before. On the left, you see trust and location for creating volumes, then you see volumes created with the right geotags, and here is the attach. You can see that volume NC-volume-one is attached to instance one, showing that both the trust and the location policies for the attach have succeeded. In the other two cases at the top, we couldn't find an appropriate policy match, so even though those two volumes were trying to attach to a VM instance, they didn't get attached.

So I'm going to switch to the second use case, the idea of tenant-controlled VM protection. The idea is simple: when the VMs and data are at rest or in transit, right up until execution, you want them to be encrypted. You want to release the keys only when there is sufficient assertion that, number one, the servers on which these VMs are being launched are trusted, and, number two, they meet the location policy. Only then are the decryption keys released. This works well across all the hypervisors; I probably failed to mention that early on.
Even though it's implemented in OpenStack, at the hypervisor level we support all the hypervisors, ESX, Xen, KVM, Citrix XenServer, across all kinds of operating systems: Red Hat, SUSE, Ubuntu, any of them. This is actually being demonstrated, in fact both of these use cases are being demonstrated, at the Intel booth today. So if you have time later today, I urge you to go down and take a look at boundary control for VMs as well as tenant-controlled encryption and decryption.

The architecture, and I'm sorry, it's kind of a busy slide, works as follows. The items in red are the new components. When you have images you're trying to upload into a cloud, you have the ability to encrypt them: you create your keys, you encrypt the images, and the keys are stored in a key management server that's built on top of Barbican. How many of you are familiar with Barbican? Awesome. We extended Barbican a little bit so that we can have a policy module in there, which controls when the keys get released and how. Step three, the encrypted blob goes into Glance. Step four, whatever the initiation is, a launch of a VM is triggered. Step five, the cloud controller downloads that encrypted blob from Glance onto a compute node; actually, that is step seven. And there is a plugin we built, called a policy plugin, that plugs into Nova compute, and it does a few things. As it is processing the blob, the VM image, it knows that it is encrypted, so it talks to the KMS, the key management system sitting at the enterprise. It says, hey, I need this key, and by the way, here is my TPM key, the TPM ID of the server I'm running on. The key management system uses that TPM ID to get an assertion that the server is a trusted server, then takes the decryption key and wraps it with the TPM key. The only entity that can unwrap that is the TPM, because that's where the private key is. Even if somebody tries a man-in-the-middle attack or sniffs the traffic, they can't get the decryption key, because the private key is in the TPM. So I get the decryption key, wrapped with the TPM key, into Nova compute. I use the TPM to decrypt that envelope, get the decryption key, decrypt the VM image, and launch it. The only time the VM image and its data are in the clear is when it is running on that specific server. Until then, everything is encrypted, and nobody owns the keys other than the enterprise.

This is working end-to-end. The extensions to OpenStack are there, the extensions to Barbican are there, the extensions to the trust authority, the Open Attestation server, are there, and we are demonstrating it down at the Intel booth. The blueprints are getting done; we're going to upload them relatively soon. I would like to see this in Kilo, but we know that's not going to happen, so we're trying to see if we can at least target the L release. We hear a lot of customers telling us this is a desired feature, so hopefully we will get it pushed through the process.
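To make that key flow concrete, here is a minimal envelope-encryption sketch using the Python cryptography package. The TPM binding key is simulated with an ordinary RSA key pair and the attestation check is reduced to a boolean; in the real system the private key never leaves the TPM, and the release decision comes from the attestation service and the Barbican policy module.

```python
# A minimal sketch of the envelope-encryption flow described above, assuming
# the "cryptography" package. The TPM binding key is simulated with a plain
# RSA key pair; in the real flow the private half never leaves the TPM.
from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import padding, rsa

# --- Tenant side: encrypt the VM image before uploading it to Glance --------
image_key = Fernet.generate_key()              # per-image symmetric key
encrypted_image = Fernet(image_key).encrypt(b"...raw VM image bytes...")
# image_key stays in the tenant's KMS (Barbican plus the policy module);
# encrypted_image is what goes into Glance.

# --- Compute host: a TPM-resident binding key pair (simulated here) ---------
tpm_binding_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
tpm_public_pem = tpm_binding_key.public_key().public_bytes(
    serialization.Encoding.PEM, serialization.PublicFormat.SubjectPublicKeyInfo)

# --- KMS side: release the key only after attestation, wrapped for that TPM -
def kms_release_key(tpm_public_pem, host_is_trusted):
    if not host_is_trusted:                    # trust + location attestation
        raise PermissionError("host failed trust/location attestation")
    pub = serialization.load_pem_public_key(tpm_public_pem)
    return pub.encrypt(image_key, padding.OAEP(
        mgf=padding.MGF1(algorithm=hashes.SHA256()),
        algorithm=hashes.SHA256(), label=None))

wrapped_key = kms_release_key(tpm_public_pem, host_is_trusted=True)

# --- Compute host: unwrap inside the TPM, decrypt the image, and launch it --
unwrapped = tpm_binding_key.decrypt(wrapped_key, padding.OAEP(
    mgf=padding.MGF1(algorithm=hashes.SHA256()),
    algorithm=hashes.SHA256(), label=None))
assert Fernet(unwrapped).decrypt(encrypted_image) == b"...raw VM image bytes..."
```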
I think I'm at the end, so let me summarize a little bit; I'm not sure how much time we have, maybe five more minutes. Boundary control is critical. It's not required for all workloads, but a good chunk of business-critical, mission-critical workloads have compliance requirements, whether around trust or location, that boundary control needs to meet. Intel TXT and the whole measured boot and attestation process provide the root of trust on which you can build this assurance of integrity of the launch process. The extensions for geotagging are there; I would urge you to go take a look, support the blueprints, and help get them through the process so they can land upstream. Then you don't need to deal with downloaded scripts and all that; it's part of OpenStack upstream, so you get it by default with the distributions. And for the two forward-looking items I shared: if you think those are interesting extensions and interesting use cases, let us know, either here or at the booth. We'd be happy to get your input so we can use it to drive these use cases forward as fast as we can. Okay, with that, I'm done; open for questions. Anything I can address for anybody?

Yes, I can actually walk you through that a little. The way we do it is: one or more tags, which are name-value pairs. We tie a UUID to them and create an X.509 certificate. We take a SHA-1 or SHA-256 hash, depending on TPM 1.2 versus TPM 2.0, and we write either 20 bytes or 32 bytes into one of the indexes in the TPM. At least for TPM 1.2, the index is 0x40000010. For TPM 2.0 we would do 32 bytes, assuming SHA-256, or if somebody really wants SHA-512, we can do 64 bytes as well. But for the initial release, since TPM 1.2 modules are the more prevalent ones, we do 20 bytes. Any other questions? Yes. For the image, there is one decryption key per image in the current prototype. But that is just the way we did it; if you think every instance needs a different one, you can do that as well. In the current prototype, though, we have one decryption key per image. Yeah, absolutely agreed, yes. Okay, sorry, was that a question there? Okay, any other questions?

One of the things I looked at is that Congress is where they're looking at putting policy. Is that something that you're looking at? Absolutely, yes. Policy is something we are looking at very carefully, because we do a lot of policy definition and policy management, so Congress is very promising for us: a unified model for defining policy, and a unified model for executing and orchestrating policy in the cloud. So we are looking at it. One thing I forgot to mention in all this is that we are actively working closely with NIST, and there is an interagency report, NIST IR 7904, that talks about trusted geolocation in the cloud. We actually have, I think, Mike in the back there from NIST. Mike, do you want to add a sentence or two on what you guys are doing?

I'm from NIST, the National Institute of Standards and Technology, part of the U.S. Department of Commerce; we write standards and guidelines for the U.S. government. We've been collaborating with Intel for the past couple of years, working on this geotag and trusted hardware boot process.
We originally started out with just a VMware stack, and we are now moving to doing this with OpenStack; we are in the process of building the OpenStack integration with their attestation service. So we are currently implementing the trusted boot process and the geotagging, and then we're going to work with Raghu and his team down the line to also build on the encryption and decryption of the VMs.

Thanks, Mike. Any other questions? I have a couple more minutes here. Yes, in a federated model. To me, Congress is probably the most appropriate way to have a loosely coupled definition of what tags are and what policy is, so that you can use it for federating across clouds. So I have a lot of hope for Congress, actually. All right, I think that's it; we are running out of time. Oh, was there one more question? Yeah, go ahead. A VM getting the asset tag from the host? A VM can get an attestation of the tag: the VM can talk to an attestation system and get the attestation of the tag, and when we attest, the assertion also includes the tag that the server presented. So conceptually, yes, the VM can get the tag through that assertion. Okay, all right, thanks for your time.