Hi everyone, I'm Malini Bandaru from Intel, and I'll be talking about bare metal trust. My team, Tan Lin, Wei Chen, Jimmy (that's Wei Gang), myself, and Shane, were involved in this project. We basically developed a proof of concept, and the next steps will be taking all your feedback and upstreaming this code. So welcome to today's talk.

Our agenda is to motivate the talk, but I guess you're already motivated because you're here. I'll briefly touch upon what we have in terms of software and hardware to establish trust, then Ironic with this attestation, then our demo, and our next steps, which will be our blueprints and references.

So, the motivation. Can we trust? Especially with bare metal, you're giving some tenant your machine. What if they've installed a new BIOS on it, or new PCIe firmware for any devices that are attached to it? What if they've changed the kernel? I mean, you've just given them your whole machine. Are you confident, after they've left it, that there's no malware on it? Can we be sure that the next tenant we allocate this machine to is safe, that the machine is free of malware, that they will not be harmed? There are a lot of liabilities involved here.

We were talking to a couple of people about this, and Robert Collins from TripleO and Devananda from Ironic were like, this is something we really need, but is there adequate technology to make this possible? The sense was, "I would never want to give my bare metal again to another tenant." That was the way they were talking. And we said, yeah, we can do this. We have Intel TXT; let's make it do this.

Earlier on in OpenStack we had something called trusted compute pools. It was mainly used for hypervisors, and there's already a lot of protection with the hypervisor: your VM is in there.
It's isolated from other VMs, so other tenants are not hurt. So there it was something nice to have, but this was absolutely essential to have when we started talking about bare metal in OpenStack and Ironic giving people appliances to use for their own purposes.

So our motivation is to detect malware. Can we detect changes in the BIOS? Can we detect changes in any PCIe device firmware? You can't really go in there and plug in a new device, because it's in a faraway cloud, but anything that's changed you should be able to detect. Any kernel changes, any operating system changes, whatever: we want to detect those.

So what are we going to use towards this? We're going to use Intel TXT technology. That definitely can detect changes in the BIOS. It can detect changes in the number of PCI devices, their firmware, kernel updates, the whole shebang. But there is one caveat: it detects this at boot time. So think about this as a chain of trust. This is your ground level, just the basics: what can you change, and can you detect that you've changed it, as soon as you boot the system? Another place where this does recalibrate and detect any changes is if your device has gone to sleep. When it wakes up, all these numbers are rechecked. That's how this whole Intel TXT works. It measures all the pieces that are coming up: the BIOS, then there's this thing called the authenticated code module. So the chain of trust works that way.

What are we looking at in terms of the cloud?
I have these little green circles. They represent compute nodes on which you're going to have hypervisors, and we already have trusted compute pools for that. But the nodes that we are now concerned with, with Ironic for your undercloud launch or for your bare metal delivery, are these little orange and yellow ones. Those are the ones we want to target now. We want to be able to claim that they are trustable, at least when you boot them up, and when a machine is released and I'm going to give it to a new tenant. So we're focusing on these little orange and yellow ones. And why do we want to focus on the service nodes? Because you want to trust your cloud service nodes, too.

Okay, so let's talk about what we're using to achieve this. We're achieving trust by using the TPM, that's the Trusted Platform Module, then Intel Trusted Execution Technology, and OAT, which is an open attestation service.

Let's take a quick look at what it looks like. This is a very enlarged picture, but that little chip up there is the Trusted Platform Module. It's a separate piece of hardware. There's a 1.2 release out there and a 2.0 release that's coming out. The 2.0 release supports additional hash algorithms. It'll be more globally available, so it'll be usable in China and other markets. The original one wasn't; there were all these export licenses.

So typically what does a Trusted Platform Module have in it? It has a random number generator. It has SHA-1 hash generators. It has an encryption/decryption engine. It's slow, but it does the job and it's secure. It allows you to save some information in what's called non-volatile memory. Then, when the system is released, the OEM or the chip manufacturer installs a little key on it, and that's part of its identity.
It's called the endorsement key. So there's a whole bunch of stuff in there that makes security and attestation claims possible.

And now there's this thing called platform configuration registers, PCRs. There are about 24 of these, and this is where measurements are maintained. Once you start loading the BIOS, it's measured, that is, hashed, and the hash is saved in some PCR, I think like seven or something. Then your kernel is measured, and that goes into other PCRs, like 17 and 18. How you've configured your system goes into another PCR. And there's no way to modify these directly; they only record measurements. That's something you can then expose to others through a software stack and say, hey, this is the measurement, and they say, oh yeah, that is exactly what I expect, so it's going to be trustable at that point.

So, our trust stack. The TPM we've seen; what's the software stack that I mentioned? You have an Intel platform at the bottom, then a little chip on top of the platform called the TPM. On top of that we're using Intel TXT technology, and for that you have to enable virtualization; the hardware, the memory, and the BIOS, all of these pieces have to be enabled.

Then a piece of software called tboot comes in. You can normally load with GRUB or EFI or something else, but this is another way to boot the system, and it's called tboot, for trusted boot. It's open-source software, so it's accessible to anybody who's using a Linux system. Intel is developing it. It basically makes the right calls, uses all this Intel TXT technology, initiates an instruction called SENTER, and things like that. That starts the whole measured launch process. It starts right from an authenticated code module, then the BIOS being measured, and once the BIOS is done, then the option ROMs, and so on and so forth. And tboot itself is part of that initial authenticated code module step. It
measures it to see whether this tboot is a legitimate, valid one, and it's all signed by the provider; in this case that would be Intel, who's the platform provider. The BIOS is also going to be signed, but it's going to be signed by the OEM, which might be HP or Dell or IBM, whoever is the provider.

Then a layer of software on top of that is called TrouSerS, which is really a user-level library to access all these PCR contents. And then at the very top we have something called Open Attestation, OAT. There's also a commercial product called Mount Wilson. There are two components to it: the client and the server. The client is the one that will reside on your host nodes, the ones you're going to be giving away through Ironic to different tenants. And then there's a server component that will say, hey, give me your PCR values, show me your ID card, pretty much, and then it'll check against a whitelist that you should have provisioned and say, oh yeah, good to go. So that's pretty much how this whole software and hardware stack works.

Let's briefly look at what the setup involves to do this in your cloud. You'll have to set up the OAT server. You'll have to provision known good values. It's like taking photographs of who's Malini, who's Devananda, etc., so you know how these trusted entities look, what their values should be. Then, for the bare metal images that you want to launch and have entirely trusted, you have to give their signatures too. Typically from the OEM you can get your BIOS and those kinds of measurements. And then for the image itself, is it Ubuntu or is it Red Hat, whatever, what is that one signature?
You have to capture that.

Then at each node level, when you've got the hardware in your data center and you open the box, you'll enable the TPM in the BIOS, and you'll enable Intel TXT, VT-x, and VT-d. You enable VT-d so that you don't have any direct memory access type of attacks on your system. And you have to take ownership of the TPM. When it comes out of the factory, it's not owned by anybody; so that other subversive attacks can't happen, you, the data center provider, will have to accept it and set a password. Currently these are all manual steps, and it would be nice to have scripts, but we'll need OEM support, and there's a bit of a chicken-and-egg problem here. The OEMs will say, if a lot of people are going to use it, then it's worth making it, and a lot of people say, hey, we'd use it if it was already available and easy to use, that kind of stuff. After you take ownership, there's also this issue of saving the password, and this is where maybe Barbican, another OpenStack project, can be used.

What else do we need to do as part of this setup in Ironic? Just like in Nova compute you specify flavors and can say "I want trusted," you would have an Ironic flavor and say "I want trusted." As I said, we have to provision the whitelist. You can do this through either iPXE or PXE boot; both work. You'll need to inject the OAT client into the images that you want to launch. In some cases you don't need to: there are certain distributions, like Ubuntu, where the OAT client is already baked into the distribution, but if not, you need to do that. Then there's the Ironic second boot. First you have the deployment boot, for those who know how Ironic works, and the second boot is when you actually boot up the image that the end tenant wants. That's when all this attestation piece kicks in.

So how does the workflow go?
You have the hardware, with the BIOS, tboot, and the image. First you enable everything. Then you have your Glance image that you want to download, so that's the second step; it boots. (What happened? The number two is hidden there. That's Ironic, sorry.) Then the PCR hash values are sent up to OAT, and it compares, contrasts, checks everything, and says, hey, it's all good to go, it's trusted, allocate it to the tenant. The last steps, four and five, are not quite that way yet. We want it that way: if it comes back untrusted, you don't want to give it to the tenant. So that's a piece we still have to address.

So let's go to the demos, and go to the beginning. Here we go. The very first one out there is our OAT server, and the plan now is to launch a new VM, and we call it our OAT client. Everything is pretty much how you would do it today. You choose your image, you choose your flavor, and you say go. And it's a real nice system. It does this little thing. It'll come soon. Okay, good, and there it came. We still don't know if it's trusted. Okay, and this is the second PXE boot, as I mentioned. The monitor's a little dirty. This is its monitor. It's booting up, so we're just seeing that whole thing run by, and it's the tboot boot loader that we're using. As it works, all those measurements are going into the PCR registers. Oh, and by the way, the client is a Java client. That's been a bit of an issue, with SUSE and others saying, hey, we'd prefer a C client. So if there's enough adoption, it makes sense to rewrite it in C. Hey, and look, it's trusted.

Okay, so how did all that magic happen? It booted, it had all those PCR values, and we told it to go talk to the OAT server, and the OAT server says, yeah, it is what you had provisioned. Those are the exact values that I'm expecting, and it says good to go.

So, the things that we did. Okay, where is it? Back to here.
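
For readers following along, the measure-then-compare flow from this first demo can be sketched in a few lines of Python. This is a minimal illustration, not the OAT or tboot implementation: it models the TPM 1.2 extend operation (new PCR value = SHA-1 of the old value concatenated with the SHA-1 of the measured component) and a simple whitelist lookup. The component names, the use of a single PCR, and the `attest` helper are all assumptions made up for the example.

```python
import hashlib

PCR_SIZE = 20  # bytes in a SHA-1 digest, the hash used by TPM 1.2


def extend(pcr: bytes, component: bytes) -> bytes:
    """Model of a TPM extend: new PCR = SHA1(old PCR || SHA1(component))."""
    return hashlib.sha1(pcr + hashlib.sha1(component).digest()).digest()


def measure_boot(components) -> str:
    """Fold every boot component, in order, into one simulated PCR."""
    pcr = bytes(PCR_SIZE)  # simplification: this sketch starts from all zeros
    for blob in components:
        pcr = extend(pcr, blob)
    return pcr.hex()


# Hypothetical "known good" boot chain, measured once and whitelisted.
good_boot = [b"bios-v1", b"tboot", b"kernel-a"]
whitelist = {measure_boot(good_boot)}


def attest(components) -> str:
    """Hypothetical server-side check: is the reported value whitelisted?"""
    return "trusted" if measure_boot(components) in whitelist else "untrusted"


print(attest(good_boot))                             # same chain as provisioned
print(attest([b"bios-v2", b"tboot", b"kernel-a"]))   # BIOS changed
```

Because each value depends on everything measured before it, changing any one component, or the order in which components are measured, produces a different final value, which is why a changed BIOS or an added device shows up as untrusted until the whitelist is updated.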
Okay, let me just go back to my slides for one second. The first of the demos is what we just looked at; that one was just bare metal trust. The second thing we wanted to demonstrate was detecting a firmware change. (Oops, I made it a "fire wire" change. Sorry.) So we wanted to detect any firmware change, but we had, like, a 10 gig PCIe NIC, etc., and we didn't want to mess that up, because what if we can't put it back together properly? So we said, let's see what else we can detect. How about we add another PCIe device; will we detect it? So the second demo basically says: I have a new PCI device attached to my system; will its measurements change? That's what we're going to demo.

Okay, back here, and I have to show you number two, and I have to rush here. So we added a NIC card. All this is the same old stuff. We can ask how's the weather in Paris while this is going on. I really didn't want to do this live, because what if it takes longer; at least here we can fast forward. Okay, so this is our second appliance. We're launching it. It's still unknown. I think we've done some fast forwarding here. And it's untrusted. Okay.

So what if you wanted to change your kernel because there was a security update, or there's a BIOS update because there was some vulnerability (I keep getting those from IT)? You want to apply them, and then what happens? It's going to be untrusted if you have the old whitelist values. So the next thing you have to do is update your whitelist. That's exactly what we're going to do. We're going to pretend this PCI device that we attached is something we want, and let's see what happens with number three after updating. Let's go back here.

Okay, so see that "untrusted"? What they're doing now is provisioning a new whitelist value at the OAT server. You can't read it; even I can't, so don't worry. We're doing it there. Absolutely.
We'll put them up. If you go really close you can see, but we'll give you more information. So the untrusted should become trusted. And notice this action called "poll instance." We're not rebooting the machine; it's just that the whitelist value has been updated. Among these actions, where you have create snapshot and delete and start and all, we have another one called poll: go check, is it still trustable? That's essentially what's going to happen, but I don't see that happening. I didn't see him do that click there. He did? Okay, I missed it. I wasn't paying attention. Bad student here.

So that's essentially what you can do. But what's the advantage of having this poll action? Somebody wants to come and check audit logs for compliance: is this machine still trusted, at least at boot time? Was it what we expected? Any time they want, they can make that API call and get back the answer, true or false, and it does not upset any activity on the machine, any applications running, because it just checks the PCR values. And that's the guarantee of Intel TXT: these PCR values are only going to be changed at boot time, so they are what they were at boot time. There's another aspect: if you made any kernel change or anything else, it'll measure again when it comes out of sleep. Let's say your platform is asleep and it comes back up; then it'll remeasure. So you can track any changes on your system that way.

Okay, so let's go back here and see what our next steps are. (Go back to "from current slide.") Okay, we saw this; we've seen enough of that. What are the limitations?
Of course, the bare metal nodes in our OpenStack cloud are all going to be running Linux bare metal images. You can call it a limitation, but that's what we're going to deal with: Linux bare metal, and our trust too is limited to Linux bare metal. It's not like your hypervisor, with a VM image that can be Windows. These are all Linux, and tboot etc. all work on Linux.

Today what we needed to do was inject the client into the tenant's end-user image. That's what I mentioned about OSV adoption and the chicken-and-egg problem. We're talking to Keith, and if they see value, maybe this will happen: it'll get integrated into Red Hat proper as opposed to being in a side repository. And as I said, SUSE had the issue with the client; they want a C version of it, and if there's enough interest, we can do that. And we had one manual step here, that is, all that enabling of TXT, VT-x, and VT-d, and that's where I think we can get help from our OEMs. If they see enough adoption, we can convince them to please make some scripts, because these are very specific to the vendor, is it Dell, is it HP, is it IBM, because it's their BIOS and how to change it. It's not something that's Intel specific at that point.

What are our next steps? Let's say you have a machine that's untrusted and you wanted a trusted machine. This is a moment when you want to alert an admin. There are a few possibilities here. Was it an intended change? Definitely during any integrations, updates, etc., there will be such changes, and there will be a moment when you have to update your whitelist. BIOS, firmware, any of those security updates are good changes. Or is it just a missing whitelist entry that you have to add? Or is this something really rogue?
So you have to determine that piece. With TXT there are logs, so you can check what happened when and do your forensics there.

Then, let's say you wanted to assign a trusted bare metal machine to a tenant and you weren't able to: the machine that came up, that you thought would be trusted, is really untrusted. You can ask: shall I retry? Should I try finding another machine after I quarantine this one, and alert the admin to see what to do about it? But there we might have to limit it to two tries, three tries, whatever, something configurable, because you do not want this to be another source of insecurity, like a denial-of-service thing where some tenant uploads some random image that is invalid and has no whitelist entry, just to make us spin our wheels in the cloud.

So we have two blueprints out there. These are just placeholders right now; they don't have their spec .rst files in there for Ironic. But the ideas that went in here, the things we were asked to address, we got from Robert Collins and Devananda; these were the things they were really concerned about, and I think these are addressing those. So we would look to upstream all this code, and we'd of course love your... oh, I didn't need this one. And I'll add a few more references, but there's a whole lot of stuff on TXT, and Intel's website has a bunch of things. Even for our Horizon demo and modifications, we started from something in Ironic. It was helpful that the Horizon PTL said to start with this, so it made our job easier. So that's the status of this work.

Any questions?

Yeah, the hardware that we support is... No, no, no. The limitation is that it's for Intel.
It's the Intel platforms that support Intel TXT. There might be something else in AMD land, but it's not Intel TXT at that point, so it would have to be something similar, with a plug-in type of thing. But it works on all our processors from the last four or five years or more, both client desktops and servers. The thing is, TXT is a little more complex on the server end, because there are multiple cores, and there'll be one core that'll be loading the BIOS while the others are in a sleep state. So the technology behind it is more complex on the server end, but it's available across our chips.

On the flip side, after we've validated that the BIOS and the kernel are correct, is there anything in the blueprints that will be able to prevent someone from going in and reflashing the firmware while it's running? As I understand it, that wouldn't be detected until the system got rebooted.

Some things you can't do totally and escape, like if you were to update the BIOS. There are different ways in Intel TXT; I was reading up in case that question came. It does get detected, because there's something called a secrets flag, from somebody updating the BIOS like that.

Ah. We can't control the client once they get that machine. Do we have that control? No. It's just that after they've done their mischief, we at least can be sure that we can detect that they messed up the machine before we give it to another unwary client and it destroys their system. So that's where this stops. But there's other work going on, and it's not yet there, where you can start doing this at runtime, like, hey, someone's messing around with this piece, and detect it. But that's more of an in-your-face kind of thing. It's going to use up your CPU cycles to detect.

Okay, yeah. You can use your server for everything; the same server will work.
Yeah. No, this one is mainly for your communication, so that you don't have replay kind of attacks. I'll ask you, hey, give me your PCR values, and this is for generating a nonce type of thing. Or if you ask for some more public/private keys, then it'll bubble down into this and give you those keys.

No, no, no. Yeah, the TPM stuff is all very low bandwidth. But that was one of the considerations, and it all comes over some serial lines, so it was basically more secure. But just to meet this need... Devananda, you?

No, I can't, I'm sorry. So let's say I'll take that as an action item and find out, and I'll update the blueprint with that question.

Yeah, there are some advantages and disadvantages, and I think this is more useful for what they call compliance and logging: you can always query and say, hey, give me your values. So for compliance logs and audit purposes, this is more useful. You don't have that kind of footprint of what happened with secure boot type of solutions. So that's the advantage here.

Yeah, it's so little. It's all in your cloud.

Which changes are you talking about? Yes, yes, yes. And no, but this is not your performance per se when you're trying to give a tenant the machine. You know this: you're the data center operator. You know you're doing a BIOS upgrade, or you know you're going to add some new firmware. You've got some nice 100 gig NIC or something and you're going to add it to your system. So you've opened the box and you're going to do it then, or you're going to apply these upgrades at your regular downtime. That's when you'll change the whitelist. That's all. It's not a big deal. It's no different from upgrading from Havana to Icehouse. In fact, it's much less of an issue.

Yes? Huh. Then you don't need me to inject the client, and there were a few more slides in the backup on that. So that's what our team did.
They used diskimage-builder.

Yes, it's part of OAT that you have to do this whitelisting, and you would be the trusted entity, because you're going to be entering the whitelist. So we're not going to hand the OAT server information to everybody. And you use diskimage-builder, and I'm thinking along the lines that if a tenant wants trusted bare metal... actually, we should kind of demand that all tenants who want bare metal go the trusted path, that being trusted is part of the image. That means they have the OAT client in them. So that's the way I'm thinking about this.

Yeah. Thank you. That's a good idea. Yeah, that would be a great tool, so that making that change means, hey, let me update the whitelist, that type of thing. And in fact we should also blacklist the old ones that might not have had a security update, so they can't get into the system. Good point, Rob.

Absolutely, they have to be 100 percent the same. That's the caveat; that's the starting point of all this. You have the same hardware, the same BIOS, the measurement is the same, and it should be. Why? Because the BIOS is binary code, zeros and ones, and the hash of any binary string, as long as you haven't changed the string, is the exact same hash value. So you're not using anything variable in the platform, any entropy-related stuff. It's basically software that you're measuring. You can, except the signature would be different.

I was just asking, can there be any level of heterogeneity in the hardware or not? It sounds like you're saying yes, there can be.

So heterogeneity is possible, except that, let's say your Gen9 and Gen7 had a different BIOS; the signature would be different. So you would have two whitelist entries.

No, no clients. The OAT client is on the client machine and it runs. It's the guy that's going to answer the question, like a "what's your name?" type of thing.
So let's look at it this way. We want to give this tenant a trusted machine. At that first point, if we have the OAT client working and running and it reports trusted, then I'm done. I can give you a machine and say I did my bit; I gave you a trusted machine. Then you, naughty client, go and upgrade the BIOS or do some other mischief. That's fine. It's just that when you hand back the machine, we'll detect it before we give it to another tenant. Yes, we don't want to give it to another tenant without sanitizing and checking. When you return a machine, you'll clear it out anyway, and you should, and there's another talk later in the week about clearing out any memory, etc., so that you're not leaking information. And then when you give it to somebody else, that's when you do this sort of check.

Yeah. Should be, yeah. So we can't prevent; we can at least detect. Okay.

Yes. It's no problem. Just like you would move your workload from one place to another, after you go there, you say "I want trusted," and they should also have some kind of attestation service and things like that. So it's perfectly possible. Normally, when you have them across clouds, they'll each have their own services. Typically the OAT server will be in the same LAN, etc., so it doesn't talk to a remote OAT server, at least today. So they wouldn't share; you would have to whitelist over there. And it makes sense, because suppose in your hybrid cloud your private cloud had all Gen7s and the public cloud out there had maybe Gen9s or some other hardware. The whitelist values would be different anyway. The only thing you want to claim is that it's trusted or not. That's all.

Any other questions? Yes?

I didn't understand the "management workload going up" thing, but if you have a custom kernel you would have to tell the cloud provider, hey, this is my personal kernel. I totally trust it.
It has this SHA-2 signature, whatever; please make a whitelist entry for it. And that's possible. Typically what happens when you're doing this sort of whitelist measurement is you put the machine in a little isolated box away from any network, then just boot it up, and whatever readings you get you then push to the OAT server and say, these are my whitelist values. That's typically how it's done. So your custom kernel would be doable that way. But from a self-service point of view, we need to somehow create a workflow to whitelist it. Good question.

Yes? Say that again. That's possible. Good question. We can break this up into two parts. Remember I mentioned there were 24-odd PCR registers. You could break this down: I'll call it trusted only if the hardware, the BIOS, and the option ROMs, that part, is right. Then I call it trusted, and, customer, you can run any kernel, be my guest. We can do it that way. With the OAT attestation piece you get to choose which PCRs you want. Do you want all 24? Do you want only numbers 17 and 18? You do want to control the configuration one, so maybe more than two. So there might be seven or eight instead of all 24. So that's a possibility.

Any other questions? Hey, awesome audience. Thank you.
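
The idea from that last answer, attesting only a chosen subset of PCRs, can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the PCR indices, and the digest strings are hypothetical placeholders, and none of this reflects OAT's actual API or the PCR assignment on any real platform.

```python
def check_selected_pcrs(reported, expected, selected):
    """Return (trusted, mismatched_indices) for the chosen PCR subset.

    `reported` and `expected` map a PCR index to a hex digest string.
    A selected PCR that is missing from either side counts as a mismatch.
    """
    mismatched = sorted(
        i for i in selected
        if i not in expected or reported.get(i) != expected[i]
    )
    return (not mismatched, mismatched)


# Hypothetical whitelist entry and a node reporting a custom kernel.
expected = {0: "aa", 7: "bb", 17: "cc", 18: "dd"}
reported = {0: "aa", 7: "bb", 17: "cc", 18: "custom-kernel"}

PLATFORM_ONLY = [0, 7]        # assumed "hardware and BIOS" policy
EVERYTHING = [0, 7, 17, 18]   # assumed strict policy, kernel included

print(check_selected_pcrs(reported, expected, PLATFORM_ONLY))  # (True, [])
print(check_selected_pcrs(reported, expected, EVERYTHING))     # (False, [18])
```

Under the looser policy the node with a custom kernel is still reported as trusted, while the strict policy flags the one register that differs, which is the trade-off described in the answer above.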