Hi, welcome. As you can see, this is a talk on trust and security in the cloud. My name is Sean Mullen, I'm from IBM, and I'm a cloud security architect. There are chairs pretty much on this side. We want to make this interactive; it's a small room and a technical talk, so if you have questions as we go along, just shout out and we'll address them. Either we can answer it or maybe somebody in the room knows.

All right, let me go through a little bit of our architecture. This is what we built: a hosted managed private cloud. We have a great IBM name for it, ICOS, a pronounceable acronym: IBM OpenStack Cloud Services. We released this about a year ago, and we added the trust and integrity technology from Intel, TPM and TXT, around June.

The general architecture is this: you pick up the phone and order "I need an ICOS," and we build it out. It starts with three controller nodes behind HAProxy, so the three controller nodes give us high availability, and they run all of the OpenStack components, Nova, Neutron, and so on; you can see the list there. In this picture the big squares are bare metal systems and the little ones are VMs. Then, in the corner, we have dedicated Vyatta gateways, and you can see the Ceph block storage and the Swift object storage that come with this. And then here are the compute nodes.

What the customer gets at the end of the day is a VPN into their dedicated OpenStack. If you were to look at a SoftLayer data center, it's vast: all the bare metal systems and racks of a true public cloud. What we do is build out this private cloud. We carve that out and dedicate all of these bare metal systems to the customer. They get to access Horizon and the OpenStack pieces, create their VMs, and so on. Basically it's a turnkey solution; they just start using OpenStack with as many bare metal systems as they need to run their VMs and workloads. And there are chairs along there if you want to sit down; we're going to go about an hour.

Okay, so here was our requirement. We have this public SoftLayer data center, and we need to hit its APIs and quickly build the architecture I just described, rapidly and in a very secure manner: not just access and firewalls, but hardening everything else around it. Once we do that, the hosted managed private cloud is dedicated to that customer and the public side cannot get to it, so we have to make sure the security is in there. And then, as we go through the demo and what Ravi is going to talk about, you'll see how we embed some very hardware-based, leading-edge security around this.

Okay, so here's the picture: multiple customer private clouds, and the customers access them through the VPN. We have a single central management piece, basically two bare metal systems that run everything needed to call the SoftLayer APIs, acquire the hardware, and then start layering OpenStack on top.
And again, there are chairs down in front if you want to make your way down. We have the support team spread around the world, so we get 24-hour coverage; they're basically monitoring everything, and if there's a security patch they add it in, and things like that. So to me this seems like a fairly standard deployment.

I'm going to go back to this one more time: bare metal systems, but you see all these VMs in there. How do we make sure those VMs are secure? We start with what a lot of people call the secure golden image. We happen to call ours Lucy, because it's the genesis of everything else, just like the fossil Lucy, right? We take a single image, we harden that image, and then we stamp every VM from the Lucy image. So we know it's hardened, in our case to IBM's ITCS 104, which is basically a compliance standard. It's your typical thing: password hygiene, how often you have to change your password, locking down ports, everything like that.

I'll go into a little bit of how we do that. We use a standard from The Open Group called ACEML, the Automated Compliance Expert Markup Language. I don't want to talk ahead of my slide, but we did it with the IBM standard, and we also did it with PCI and HIPAA; we have those in our back pocket, though we haven't deployed them yet. The idea is that a compliance standard comes as a PDF. You have to read the PDF, and it says set your password length to eight, change your passwords regularly. It's human readable, and you have to go and configure all of that yourself. What we did is take the human-readable part and put it in XML. So you see, say, PCI section 8.3.2: your password length needs to be eight or greater. And then in the ACEML template we put a keyword, something like password.security.length, and the argument, eight. That's all that's in the ACEML template; it doesn't say how to do it.

The idea is that you can take that template to any device that's ACEML-aware and push the template onto it: Ubuntu, Red Hat, a toaster oven. The device gets it and goes, oh, I know how to set my password length to eight, and this is how I do it. Then it writes that back into the XML. So now we have the human-readable part, how the device implemented it on the device, and its success or failure. It's completely compliance-auditor-ready; you can export it into an Excel spreadsheet or a report or anything like that. The compliance guy knocks on your door and you have everything. We put it in Logstash. We can also run it in check mode: you can push the template and say, set yourself to this, Ubuntu or toaster oven, or you can push it to the device and say, just report on yourself. We get that report back, compare it to the baseline, make sure there are no mistakes, and push it into Logstash. The auditor comes and we have all of the artifacts to help pass our compliance.
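As a rough illustration of that template flow (the element names and handler here are invented for illustration, not the actual O-ACEML schema): the rule carries the human-readable requirement plus a keyword/argument pair, the target device applies it however it knows how, and it writes the result back into the same XML so the document itself becomes the audit artifact.

```python
# Minimal sketch of an ACEML-style compliance flow (illustrative only; the
# real O-ACEML schema and keywords differ). The template names the control,
# the target device decides how to implement it and writes the result back.
import xml.etree.ElementTree as ET

TEMPLATE = """
<compliance standard="PCI-DSS">
  <rule id="8.3.2" keyword="password.security.length" argument="8">
    <description>Passwords must be at least 8 characters long.</description>
  </rule>
</compliance>
"""

def set_password_length(arg):
    # On a real Ubuntu/RHEL host this might edit /etc/login.defs or PAM config;
    # here we just pretend it succeeded and report what we did.
    return True, "set PASS_MIN_LEN to %s in /etc/login.defs" % arg

# Hypothetical per-device handlers: how *this* device implements each keyword.
HANDLERS = {"password.security.length": set_password_length}

def apply_template(xml_text):
    root = ET.fromstring(xml_text)
    for rule in root.iter("rule"):
        handler = HANDLERS.get(rule.get("keyword"))
        ok, how = handler(rule.get("argument")) if handler else (False, "no handler")
        # The device writes its implementation and status back into the XML,
        # so the same document becomes the auditor-ready artifact (ship it to
        # Logstash, export it to a spreadsheet, and so on).
        ET.SubElement(rule, "result", status="pass" if ok else "fail").text = how
    return ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    print(apply_template(TEMPLATE))
```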
And this is where you were going to come in, Ravi? Okay, so just real quick: the next thing that we added, around June of this year. SoftLayer runs on Intel with TXT; it has the TPM technology.

I like to say the trusted platform module is a brand new technology that's 20 years old, right? And I'll tell you, I've been doing TPM stuff for a long time, and here was the issue with it: it's hard to use. There's a bit of difficulty in understanding it and everything around it. So we would offer it to customers, and this was your traditional environment, the enterprise where they owned the hardware, and they would look at the TPM stuff and go, wow, that is so secure, we love it, it's great. But I don't have the bandwidth to monitor it and understand it; I'd have to put personnel on this. And with the cloud, that all shifted. Since we are the hosted managed private cloud, we can integrate it in. Our staff monitors it and makes sure all the systems are secure, giving you the security, the boundary control, and additional compliance and reporting. So now we'll get into the detail.

Thank you, Sean. Good afternoon, everyone. I'm Ravi Varanasi; I'm a general manager at Intel. Sean actually brought out the fact that for a hosted, dedicated private cloud environment, or for any hybrid environment, security assurance is a catalog offering that an enterprise using the IBM cloud will actually be assured of. And this assurance can come from a hardware root of trust that cannot be tampered with, in a way you can verify: not just the boot process, but from the boot process through the bare metal OS, the virtual machines and the hypervisor, all the way to the application. So how can we ensure and verify that? That's the vision of it. Okay, I'll just leave it there.

All right. With that in mind, I'll quickly go through the prioritized security use cases we heard from multiple customers, especially cloud service providers and enterprises: how you orchestrate information security with OpenStack, and how you maintain control of data even when it sits in a third-party environment, with the keys staying in the control of the tenant or the enterprise. That's the critical piece, and it's one thing we solved jointly with IBM: data in flight, data at rest, and data in use, essentially data in memory, where you actually encrypt data or create enclaves protected by the processor.

Now, the second and most important piece Sean mentioned is the chain of trust. This is where we talk about trust and attestation. Can't hear me? Why don't you try that other mic. There you go. The second priority we talked about is the chain of trust starting from the boot process, initiated by TXT and the TPM. Yes, Sean can certify that TXT was hard to configure; the reason is we had to go touch every server and change the BIOS settings. That doesn't fly when customers ask for trusted compute, trusted network, and trusted storage as their use cases and we just hand them TXT as a component. So Cloud Integrity Technology, the software layer that gets orchestrated by OpenStack, by the various OpenStack schedulers for compute, network, or storage, hides the details of TXT, the TPM, and the hardware underneath, but provides the trusted use cases, and that flew well with IBM's customers. I'll talk about that, and the last one is about secure NFV.
When you have SDN and controllers, and obviously with OpenStack, Neutron, and the other sessions you've had, the moment control is compromised, or something happens where you lose control of the SDN interfaces or the controllers themselves, you can bypass the firewall, you can bypass many configuration scripts. So essentially it's about assuring the integrity of the SDN controllers while also ensuring that the NFV functions, whether a virtual firewall, a virtual switch, or a load balancer, all come up integrity-checked. That is the use case people were asking us for: hardware assurance from the network.

Those are the things we'll touch on in general, but let's focus on trust and attestation, which is the essence of today's topic: assuring the integrity of infrastructure and visibility of workloads. What we heard most of the time is that people don't want to move their workloads, whether containers or virtual machines, into the cloud because there is no visibility, no traceability for trust. We don't know what's running out there, both in the VM environment and from the infrastructure perspective. So ensuring that, right from the boot process, is part of integrity assurance: hashing the images, hashing the boot images, hashing the virtual machines, hashing the hypervisors, storing these hashes in specific PCR registers of the TPM, and maintaining them as golden hashes somewhere else that you can verify against. That is the system, and it is what lets you have a dashboard in OpenStack that shows that such-and-such image is trusted. You have a list and a report of what you believe is assured infrastructure and what is not, and if you know that a specific image is not trusted, you can initiate action to move that workload onto trusted infrastructure. OpenStack Ironic actually helps here, and I'll touch on that.

So these are the problems: essentially visibility, confidentiality, and integrity. Once we delivered the trust attestation solution, the next question we were asked was about confidentiality in OpenStack Glance. If you're storing a virtual machine image in Glance, can we store that ISO in encrypted form where the keys remain with the tenant? And when this image is loaded by the Nova scheduler into a hybrid environment, how do you bring it up with the keys shared by the owner at that point, before decrypting the virtual machine and bringing it up in a hybrid or even a hosted private environment? So trust attestation, along with encryption and confidentiality, is the new addition to this whole stack. This uses Intel's AES-NI and a few other hardware crypto instructions to make sure we can guarantee this trust and this encryption for you.
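Purely as an illustration of the keys-stay-with-the-tenant idea (this is a generic AES-GCM sketch using the Python cryptography package, not the actual CIT or Glance mechanism): the cloud stores only ciphertext, and the tenant releases the key at launch time.

```python
# Illustrative only: encrypt an image blob with a tenant-held key so the cloud
# stores ciphertext and the tenant releases the key at launch time. Generic
# AES-GCM envelope, not the actual CIT / Glance integration.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def tenant_generate_key():
    # The tenant generates and keeps this key (e.g., in their own KMS or HSM).
    return AESGCM.generate_key(bit_length=256)

def encrypt_image(key, image_bytes, image_name):
    nonce = os.urandom(12)                       # 96-bit nonce for AES-GCM
    ct = AESGCM(key).encrypt(nonce, image_bytes, image_name.encode())
    return nonce + ct                            # what gets stored in Glance/Swift

def decrypt_image(key, blob, image_name):
    nonce, ct = blob[:12], blob[12:]
    return AESGCM(key).decrypt(nonce, ct, image_name.encode())

if __name__ == "__main__":
    key = tenant_generate_key()
    stored = encrypt_image(key, b"...vm image bytes...", "sap-hana-california")
    # At launch time, after the platform attests as trusted, the tenant shares
    # the key and the compute side decrypts and boots the image.
    assert decrypt_image(key, stored, "sap-hana-california") == b"...vm image bytes..."
```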
And location and boundary controls, that's the third and most important piece, with asset tags that can be stored in the TPM. Since the Icehouse release of OpenStack we have had a scheduler filter for Nova that calls specific interfaces of Cloud Integrity Technology in the IBM software environment. Whenever a virtual machine is being brought up, this filter gets called for location checks and for integrity checks before the Nova scheduler is allowed to bring up the workload on a stated server or in a stated location. This plugin is available; it has been waiting to be upstreamed since Icehouse, and we've crossed the Kilo and Liberty releases, but it is available as of today if customers want to take it and load it, and hopefully it will get integrated soon when the scheduler features get opened up by OpenStack. All of this is done with tamper-proof, verifiable hardware.

So those are the problems, and the solution is launching VMs and applications on servers with boot integrity and platform trust. This is the trust chain I talked about, enhanced by the Nova scheduler. The second and important solution here is workload boundary control, and when we say workloads, it doesn't have to be just the virtual machine; it can be a container with apps embedded in it. So I'll talk about plugins for Docker, Mesos, Kubernetes, and other container environments as well. The bare metal OS gets CIT, the Cloud Integrity Technology, integrated in it by default, and then there is a plugin on top of that to take care of the hypervisor and above, up to the application stack. This is the same thing we do for container integrity; there's a session today from Intel and some of our partners, I think around five o'clock, that goes deeper into container integrity, so feel free to attend that. And thanks for your participation in the birds-of-a-feather this morning, where we discussed container integrity and the problems you brought out.

The last one is extending this chain to enterprise ownership, where the keys remain with the enterprise. You've all heard of the Safe Harbor ruling in Europe, and we heard about a few other exposures, especially on Amazon about two weeks back, where the L1 cache was used to copy the contents of a key and compromise plaintext keys through side-channel attacks on virtual machines; that was on the Amazon public cloud, just as an example. There are two things we can do to solve the problem. One, the cache management technology on the Intel E5 v3 processors lets you lock cache lines to specific virtual machines, and that prevents the side-channel attacks. But as a typical Intel answer we are not saying, okay, now go upgrade all your processors to v3, right? libgcrypt is the best known method that we have: if you load the right version of the libgcrypt libraries, it one hundred percent prevents the side-channel attack without the processor having to take care of the cache lines. So in the container environment, with trust and attestation, if libgcrypt is mentioned in the trust manifest file of the initramfs (I'll tell you about that if you have questions later), there is a method where we can enforce that the right version of libgcrypt is loaded on this virtual machine before we allow the services to run.

And typically, any Linux path, if I just pick Linux as an example, any Linux path name can be part of your trust manifest; it doesn't have to be just the boot process or the hypervisor. If I name a path, say OVS, Open vSwitch, or a firewall environment, and that path is mentioned in what we call a manifest file, the master file used for verifying any service or daemon that gets loaded, then we check the integrity of that service and make sure the code that is about to be loaded has a known-good integrity check value. So this extends to any Linux path.
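A minimal sketch of that manifest idea, assuming the manifest is simply a map from path to expected SHA-256 (the real CIT manifest format and measurement scheme differ): before a daemon such as Open vSwitch or a firewall is allowed to start, hash the binary on disk and compare it against the known-good value.

```python
# Minimal sketch of manifest-based integrity checking: hash a file and compare
# it to a known-good value before the service is allowed to start. The manifest
# layout here (path -> sha256 hex digest) is invented; CIT's real manifest differs.
import hashlib

TRUST_MANIFEST = {
    # path: expected SHA-256 of the known-good binary (values are placeholders)
    "/usr/sbin/ovs-vswitchd": "0f343b0931126a20f133d67c2b018a3b...",
    "/usr/lib/x86_64-linux-gnu/libgcrypt.so.20": "a94a8fe5ccb19ba61c4c0873d391e9...",
}

def sha256_of(path):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_path(path):
    expected = TRUST_MANIFEST.get(path)
    if expected is None:
        return False, "path not in trust manifest"
    measured = sha256_of(path)
    return measured == expected, measured

if __name__ == "__main__":
    ok, measured = verify_path("/usr/sbin/ovs-vswitchd")
    print("trusted" if ok else "blocked", measured)
```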
And that's exactly what we mean in terms of the touch points. If you really look at the stack, we have to enable TXT and the TPM, which are there by default on the hardware. Then the boot loader: there's an open-source tboot environment you can use. The third touch point is within the OS: include the Cloud Integrity Technology trust agent, which comes free and by default in most cases. And last but not least, have this plugin in the OpenStack Nova scheduler, or in the associated Cinder scheduler if you're talking about storage, and for networking we have some plugins we're working on for Neutron. If you have these four touch points, where the fourth one is where you actually have to load the plugin, you're good to go with the whole chain of trust that takes you, with trust attestation, from the boot process and platform integrity through workload integrity. We're actually working on runtime integrity at this point, but as of today we have solutions from platform integrity all the way to workload integrity.

I wanted to mention this as just a small demo and use case. We talked about trust attestation, but this is boot-up, right, with Docker and a container environment. These containers move almost two or three times a day on a public or hybrid cloud, from what we've learned, so every time a container moves, these attestation checks are triggered. Now, in this particular case, an asset tag is nothing but a random number combined with the host UUID and a signature, all of which forms an asset certificate; that's what we call an asset tag. This tag gets loaded into the TPM registers and also into the Intel Cloud Integrity Technology server, which sits outside the particular server being protected. So the golden measurements, what you'd call the reference hashes, are all in the Intel CIT server, and the asset tag is loaded into the TPM.

What does this give you? The asset tag can carry location information; it can even carry controlled application information if you want. You can overload this tag with anything; as far as Intel is concerned, it's a number. We use it for location controls today. So with GPS, RFID, or system-administrator-configured input, you can tag a server saying this server belongs within the boundaries of this country, and if this server, physical or virtual, is moved outside that boundary, the tag check fails and the platform is not allowed to boot up. And the OpenStack Nova scheduler with this plugin, before it launches the VM, checks the geo policies and then verifies them with the Intel CIT server before launching.
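A rough sketch of what such a scheduler filter does (a simplified stand-in, not Intel's actual filter code; the attestation URL, property names, tag names, and host name below are assumptions): in a real deployment this logic would live in a subclass of Nova's BaseHostFilter and the class would be added to the scheduler's filter list.

```python
# Simplified stand-in for a trust/location scheduler filter. In a real Nova
# deployment this would subclass nova.scheduler.filters.BaseHostFilter and be
# listed in the scheduler's enabled filters; the attestation URL and the
# property/tag names below are assumptions for illustration.
import requests

ATTESTATION_URL = "https://cit-server.example.com/v1/hosts"   # hypothetical

class TrustAndLocationFilter(object):
    def host_passes(self, host_name, image_props):
        """Allow the host only if it attests trusted and its asset-tag
        attributes satisfy the policy carried in the image properties."""
        resp = requests.get("%s/%s" % (ATTESTATION_URL, host_name), timeout=5)
        report = resp.json()   # e.g. {"trusted": true, "asset_tags": {...}}

        if image_props.get("trust_policy") == "trusted" and not report.get("trusted"):
            return False

        # Every tag required by the image must be present on the host.
        required = image_props.get("required_tags", {})    # e.g. {"state": "california"}
        host_tags = report.get("asset_tags", {})
        return all(host_tags.get(k) == v for k, v in required.items())

if __name__ == "__main__":
    f = TrustAndLocationFilter()
    policy = {"trust_policy": "trusted", "required_tags": {"state": "california"}}
    print(f.host_passes("havana-node4", policy))
```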
So let me just see if I can show you a one-and-a-half-minute demo. This video will demonstrate the control of workload placement in the cloud based on location policies. To begin, create the various tags that will be provisioned to each server; here we create a country tag with possible values USA and Canada. Next we create a selection of the specific tags that will be provisioned; in this case we use the state tag and select California. The OpenStack hypervisors page has been modified to include a provisioning tool; here we demonstrate provisioning a host with a selection of tags. This process writes tag information to the hardware. After rebooting the host, the provisioned tags are visible by hovering the mouse over the location icon. Once all hosts have been provisioned with the necessary tags, policies can be associated with image objects to define workload placement. In this example we've defined a policy for the image sap-hana-california that allows it to be launched exclusively on hosts provisioned with the SAP HANA and California tags. These tags correspond to the tags provisioned to the host havana-node4, and we can see that the scheduler has launched the instance on the host that matches the policy requirement. If no hosts have tags matching the policy set on the image, as in this example where we define a policy requiring both the Hadoop and California tags, the location policy will not allow the image instance to be launched; since no hosts have been provisioned with both tags defined in the policy, the instance is not launched. That concludes the demonstration of geotagging and boundary control in the cloud.

So what you have seen is how, using the OpenStack dashboard, you can enforce location controls on virtual machines, and in a container environment as well. With that, let me show you a couple of slides on extending this to Docker containers in addition to VMs. If you look at the bare metal OS attestation, as part of the tboot plugin and the standard plugins you can have this embedded within the bare metal OS; we're working with Docker on this, and it's being made default. Once that is in place, there is a virtual plugin, a Docker plugin, that sits outside the context of the host OS, within the Docker daemon context, and helps you take this trust control to, say, IBM WebSphere running as a container, or any other apps running as containers. So this provides two plugins: one at the bare metal OS itself to take care of the platform and OS environment, and another that sits slightly above the hypervisor, or in this case the bare metal OS context, to help you attest containers and the apps within them. This is something we'll discuss in a bit more detail at five o'clock today in the container session, but this is how you attest containers.

And the last one I wanted to show you: once you boot up the environment, if for some reason the platform gets compromised, Intel CIT will know that you've lost trust in that specific platform, and through information injected into a monitoring virtual machine, which could be on this platform or a different one, that VM can trigger actions through the Nova elements. In this case let's talk about Ironic from OpenStack: you can use Ironic to move the virtual machine off a trusted or untrusted platform based on policy. If the platform is untrusted, Ironic will help move that virtual machine off the platform in question to a trusted environment. So even after things boot up, with OpenStack, whether it's Nova or Ironic, or Cinder if you're talking about storage, you'll be able to look at and track the trust of the platforms and move your workloads out in case a platform becomes untrusted.
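A minimal sketch of that remediation loop, with the attestation endpoint and the migration hook both stood in by placeholders (in ICOS the real flow goes through CIT's monitoring VM and OpenStack Nova/Ironic): poll each host's trust status and, the moment one drops out of trust, move its workloads elsewhere.

```python
# Sketch of post-boot remediation: keep polling attestation status and move
# workloads off any host that drops out of trust. The CIT endpoint is
# hypothetical, and trigger_migration() is a placeholder for whatever the
# orchestration layer uses (e.g. Nova live migration or an Ironic-driven flow).
import time
import requests

ATTESTATION_URL = "https://cit-server.example.com/v1/hosts"   # hypothetical

def host_is_trusted(host):
    report = requests.get("%s/%s" % (ATTESTATION_URL, host), timeout=5).json()
    return bool(report.get("trusted"))

def trigger_migration(host):
    # Placeholder: in ICOS this would be driven through Nova/Ironic policies.
    print("moving workloads off untrusted host:", host)

def watch(hosts, interval=60):
    while True:
        for host in hosts:
            if not host_is_trusted(host):
                trigger_migration(host)
        time.sleep(interval)

if __name__ == "__main__":
    watch(["havana-node3", "havana-node4"])   # host names are illustrative
```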
So now I'll ask my colleagues here to do a live demo of what we're doing in the IBM environment; I'll hand it over to Elvin, and then we'll open it up for questions.

Hi, my name is Elvin. I'm a software developer at IBM, and I integrated Intel's Cloud Integrity Technology with IBM's ICOS. What we're going to do is a live demo of launching a trusted image onto a trusted hypervisor, but before we get to that, let's talk about the automation we did, because it's important to explain that provisioning the TPM and tying it into OpenStack does have its difficulties. I'll give a brief overview of how we did this. What we use here is Chef: we use Chef to install all the packages and prepare the VMs for OpenStack. So we have a Horizon VM, and we installed the CIT attestation server, which is a Tomcat server, alongside Horizon, and we also have to add the Nova filter to Nova's list of scheduler filters. This particular Nova filter works in the scheduler: it gets the request to launch an image, grabs the statuses from the CIT attestation server, and checks, is this a valid server to launch this VM on? If it isn't, it errors out, as you saw before; if there's nothing wrong, it goes ahead and proceeds with the launch. So we use Chef to provision the servers. Do you want to talk about the rest of the installation?

Hi guys, my name is Michael. I'm also a software developer at IBM, and I also worked on the CIT integration. In addition to the CIT server and the Nova trusted filter for the scheduler, Intel actually has some extensions to Horizon that we also installed via automation in the Chef recipes. We added a Horizon panel that lets the user see everything related to the TPM and TXT directly in Horizon, so they don't have to go to a separate area to view that information. On the Nova machines, the KVM nodes, we also installed Intel's CIT extensions to the API to allow for these trust tags, and installed the trust agent to allow the CIT server to interact with the TPMs on the Nova nodes. We also installed tboot to allow for asset tagging of those same Nova nodes.

Did you want to talk about Jenkins? So, the version of Chef that we use didn't allow automatic reboots, and in order to build trust from the boot process you have to actually reboot the server, so we had to make Jenkins jobs to reboot the servers and install the things we needed. I think for one compute host it takes around two or three reboots to install everything properly, so we automated that through a Jenkins job. What happens is, when a customer comes to us, they give us the tags that they want on their compute hosts, and we provision the TPMs to carry the asset tags for them. This is the job that we use to push the tags out, and we also have a job to back up the TPM data in case a server crashes; that data allows us to recover ownership of the TPMs after an OS reload or some other disaster.
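As a sketch of what that tag-push job amounts to (the endpoint and payload shape are assumptions for illustration, not CIT's actual REST API): the job posts the customer's tag selections for a host to the attestation server, and the host is then rebooted so the asset tag takes effect.

```python
# Sketch of the Jenkins-driven tag provisioning step. The endpoint and payload
# are assumptions for illustration, not CIT's actual REST API. The idea is
# simply: push the customer's tag selections for a host, then reboot the host
# so the asset tag lands in the TPM (as shown in the video demo).
import requests

CIT_URL = "https://cit-server.example.com/v1"    # hypothetical

def push_asset_tag(host, tags, auth):
    payload = {"host": host, "tags": tags}       # e.g. {"country": "usa", "state": "california"}
    resp = requests.post("%s/asset-tags" % CIT_URL, json=payload, auth=auth, timeout=10)
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    cert = push_asset_tag("havana-node4",
                          {"country": "usa", "state": "california"},
                          auth=("provisioner", "secret"))
    print("asset certificate created:", cert)
    # A separate Jenkins job then reboots the node (two or three reboots in
    # practice) so tboot measures the platform and the tag takes effect.
```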
So now we're going to get to the live demo, but let's make sure we're actually authenticated, because we get logged out all the time. Okay, so now we're going to show you a trusted image, and how we extended the Horizon functionality to create images and tag them. We added this part down here where you can define a trust policy and also define a tag. Here you can see all the tag names and the tag values; in our case these are the names of the KVM nodes.

Okay, so now we're going to launch a VM from an image. Here we have the trusted images, the images we defined the policies for. One is correctly tagged, meaning one of the compute hosts actually has that tag, and the other image is tagged with a tag that the compute hosts don't have, so one will succeed and the other will fail. What happens here is that we are now building the instance of the image whose tags match one of the KVM nodes; as you can see, it found the correct tag and it's running as expected. Now we'll show you what happens when you launch from an image whose tags don't match any of the KVM nodes, and as you can see, an error pops up that says "No valid host was found."

Now we're going to show you the additional trust panel that we added to Horizon. This gives you the basic information about each KVM node: as you can see, the host name and the trust status. This is reporting back the attestations that the CIT server made; these run periodically, and as you can see everything is trusted, so no one has modified the BIOS or any of those very important parts of our bare metals. Here we also have the tag names and values, and here is the trust report you can download. It shows you the last 90 days of attestations, so it will satisfy something like ITCS 104, one of those compliance standards that requires you to keep 90 days of logs of everything. And I think that concludes our demo.

Let me talk really quickly about some difficulties we had. Since it's hardware, you have to work with software to provision the TPM. Say you're trying to take ownership of the TPM and you accidentally delete the password; you don't know what happened to it. What happens is you need to open up a ticket, and someone has to go into the BIOS and clear the TPM for you. It's a real hassle; sometimes it takes only an hour, sometimes it takes five hours. You don't want the customer to deal with that, so we do it for them. Another thing was OS reloads, and that's why we had that job to back up the TPM data. Otherwise you would basically lose ownership of the TPM, because the trust agent instance on those KVM nodes is also deleted with the OS reload. So ownership of the TPM has already been taken, but there's nothing left to actually claim that ownership, because the trust agent was uninstalled. What we did to work around that was the Jenkins job you saw earlier: it saves some values very securely and uses them to recover TPM ownership.
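To make that backup idea concrete, here's a minimal sketch, assuming the values to preserve are the TPM owner secret and related registration data (the talk doesn't spell out exactly what CIT stores): encrypt them to a file before an OS reload and restore them afterwards so ownership can be re-claimed.

```python
# Minimal sketch of backing up the secrets needed to re-claim TPM ownership
# after an OS reload. Exactly what needs preserving (owner auth, trust-agent
# registration data) depends on the CIT version; this only shows the
# encrypt-to-file / restore pattern that the Jenkins job automates.
import json
from cryptography.fernet import Fernet

def backup(secrets, path, key):
    # secrets: dict like {"owner_auth": "..."} (placeholder field names)
    token = Fernet(key).encrypt(json.dumps(secrets).encode())
    with open(path, "wb") as f:
        f.write(token)

def restore(path, key):
    with open(path, "rb") as f:
        return json.loads(Fernet(key).decrypt(f.read()))

if __name__ == "__main__":
    key = Fernet.generate_key()   # kept off the node, e.g. in the Jenkins credential store
    backup({"owner_auth": "example-owner-secret"}, "/tmp/tpm-backup.bin", key)
    print(restore("/tmp/tpm-backup.bin", key))
```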
Yeah, so I think that's it. I guess we can move on to questions; there's a mic right there. Does it work? Yeah? Okay.

I had a question. You mentioned in your slides that you detect changes and then you respond to them. So for those changes, what registers are you looking at? Are you just looking at the ones, like the BIOS one, that only change on reboot? Do you only detect on reboot, or are you doing other things at runtime?

The quick answer is that we detect, and actually look at, the hash of the platform, and that includes not just components of the BIOS but the base OS and the virtual machine or the hypervisor. The combination of these hashes becomes a master hash that gets loaded into certain TPM registers, like PCR 19. That value is checked against the golden value of what it needs to be, and if that check fails we consider the platform out of its trust requirements. At that point CIT injects a message into the monitoring VM, which sits either on the current server or somewhere else, to tell OpenStack Ironic, in this case, to initiate a virtual machine move off the now-untrusted platform. So essentially the hash covers these components of the platform, and the combination of those hashes is loaded into the TPM register, which is monitored by CIT from an external environment.

One of the things with the TPM is the hashes, right? They're very brittle. Are you doing any work to maybe provide a library of different hashes or anything like that?

In ICOS we manage that ourselves. If you're familiar with the TPM, the original idea was that everybody would publish their OS and say, here are the hashes. And you're right: if you have to update the kernel, your hashes change and your whitelist changes, because you get the hash from the TPM and compare it to the known-good whitelist. Again, this goes back to the difficulty of managing it, and that's why we do it in our solution. We know when we're updating the KVM hosts, we know what the new whitelist is going to look like, and we can make sure it's a known change, as compared to the hash changing out of the blue, in which case we know it's some type of attack and can mitigate it from there. Good question. Any other questions? All right, we'll be around in case there are more. Thank you, guys, thanks for your time. Thank you.