Welcome, everyone, and thank you for coming and filling this room to the gills. For all the people who weren't able to make their way into this room, the talk is being recorded for their viewing pleasure later. Really, thank you for coming and showing an interest in Keylime. Just a quick show of hands: how many of you have heard of Keylime before? Perfect, about half of you; that's a good number. For the other half, I hope this is a good introduction to the community and to what Keylime offers.

So you may be asking yourself: what is Keylime? In short, Keylime is a technology to centralize the remote attestation of distributed systems. To give an example of what that means, we can look at the reason Keylime was created in the first place. Back in 2013, members of MIT Lincoln Laboratory were enlisted to help protect the nodes of the Mass Open Cloud project, and they built this initial system to do that. The solution needed to solve the problems of root of trust, remote attestation, and remote node provisioning (key concepts there), and it also needed to use the available industry-standard mechanisms to secure cloud servers from the hardware on up through the stack.

To enable these functions, they relied on the TPM chips available in most common hardware today, from laptops to servers to even IoT ARM devices. Most of these have the chip either on board or on a header, so you can buy a chip on the secondary market and plug it in. TPM chips have been used to secure a multitude of different use cases, and one of their strengths is the ability to act as a hardware root of trust for a system. That's really how Keylime uses it: as the integrity-management hardware root of trust.

The typical user's view, when they come into a cloud system or an IoT system, is: you know what, I can trust the company that's providing me these systems. It's kind of a blind trust; you're relying on
the provider to have done all the due diligence to lock down the systems. But trust is built from the bottom up, from the hardware all the way up through the top of the software stack, and as we know, data centers are huge. When malware gets somewhere onto your systems, sysadmins can't be checking every individual stack on every individual system for it, and when it gets down into the firmware, it's even harder to find. Ripped from the headlines a couple of years ago: these attacks are out there, and they are common. So how do you protect against them? Without a security view of the entire system, you can't trust any piece of the system, and that's where Keylime comes in.

A quick overview of the high-level architecture: there is a Keylime agent that lives on each of the cloud nodes, and that agent communicates with back-end tenant management systems that do the verification and registration of each node in the overall cloud system. Those tenant systems can be in the cloud, as a dedicated cloud resource, or offsite at the tenant's actual home base, and they communicate over secure channels. So Keylime can secure anything from a single multi-user system all the way up to a multi-tenant distributed cloud system with multiple data centers around the world, and it can run on bare metal as well as on VMs.

Now a little bit about how the technology works. Keylime relies on a chain of hashes maintained by the TPM. As the boot process goes through, each stage hashes the next one and extends that hash into the TPM before handing over control. The TPM can then sign those accumulated measurements with a private key, and you can attest that the entire boot stack is valid by checking that signature with the TPM's public key. The private key lives all the way down in the TPM itself; it cannot be accessed or retrieved by any software, so it's a secure, hardware-based private key. So here you see
from the hardware up: first the firmware is measured (hashed), then control passes to the boot loader and the shim, which are measured in turn and extended into the TPM. The TPM has memory locations called platform configuration registers, or PCRs, where it stores these hashes, and we'll see some of them later in the demonstration. Really, all you need to know here is that as the boot process moves on, each step is hashed, extended, and stored securely in the TPM, and the result can be signed with the TPM's private key. That's the boot process.

You can also secure your running system by providing an approved whitelist of processes. Keylime will ask the TPM to sign a measurement of what's running on your system, so that you can compare the hashes of what's running against the golden hashes you created from a gold image on your offline, air-gapped system.

Pretty much the way this works is: you have a cloud provider, you're the tenant, and you say, hey Mr.
Cloud Provider, can I trust your infrastructure? And the cloud provider says, "Sure, but if you really want to trust me, here's the TPM-signed log showing that this system is what you expected it to be." If you do the comparisons and it is what you expected, you say, "All right, cool, go start my workload." Otherwise, you can say, "Hey, this is crap. I can't trust this; we're out of this node, go get me another one."

Likewise with continuous runtime attestation: it's a constant ask. You're always asking, "Can I still trust you? Can I still trust you? Can I still trust you?" The agent, which talks to the TPM, keeps reporting back, "Here, here, here." It's not really saying yes; it's saying, "Here are my signed logs. You tell me whether you can trust me," and then you do what you need to based on that decision. That's the high-level overview.

Now we'll go into some of the lower-level technology. This is going to have to go fast, and I apologize: it's an hour-long presentation condensed into half an hour. I'll be at the Red Hat booth after this, so if anyone wants more information because it went too fast, please come see me.

First, we'll start with your cloud node. As I mentioned, the cloud node runs the agent, which is the only piece of the Keylime system you need to install on the node itself. You also have your cloud user as the tenant, and again, that tenant system can be anywhere. The other Keylime components, which run somewhere in a tenant facility, are the Keylime verifier, which constantly collects those TPM reports from the agent, and the Keylime registrar. The registrar is a centralized location where each agent, when it comes up, reports its TPM's public
keys. It's a central repository for that information, so that not everyone in the system has to keep pinging everybody else's nodes to ask for their keys; everyone knows where to get it.

Here's a high-level, or mid-level, view of the Keylime secret sauce. I've lost my keys on the slide: that's supposed to show a key being split into two, and I don't know why it has changed from five minutes ago. We have a bootstrap key, and that key is cryptographically split into two pieces. That is so that, number one, we can delegate the integrity checking to the verifier, while, number two, the verifier can't just start spinning up nodes on its own. The other half of the key is held by the tenant and only passed to a node at the time the tenant wants that node to actually come up and provision. So the first half of the key goes with the delegated check to the verifier, and the second half demonstrates the tenant's intent to actually bring up the node.

What happens is that the verifier checks the validity of the TPM's key by asking the registrar, "Is this key valid?" The registrar reports back, "Yes." The verifier also checks the integrity of the cloud node, and if it checks out, it passes its half of the key down. Then, whenever the tenant wants to bring up the node, it sends the tenant half. At that point the node has both halves of the key, can recombine them, and can start its provisioning.

So the two new concepts Keylime introduces are registration of TPM keys in a centralized location, and splitting of the key to ensure that provisioning happens securely and only when the tenant wants it to.

Now we get into the lower-level details: Keylime key definitions. We have a number of keys that are
used within the system. The endorsement key (EK) is the hardware-based TPM key that is burned in at the manufacturer's site and can't be accessed by anyone. It's used in very few functions of the Keylime system: in case there were ever some weakness in the cryptography used to generate that key, we don't want to be spreading it all over the place. Because of that, and because we also want the TPM to be signing all of this stuff, we need a secondary key that is also TPM-based, so the TPM also creates an attestation identity key, or AIK, which is used to actually sign all of the quotes that are requested. Then we have a challenge key that is used in the Keylime handshake (we'll get into all of these in the walkthroughs), and the bootstrap key, which is the key shown being split into two pieces; those pieces are U and V. Finally, we have the NK, which is used to protect those key halves during the initial transmission, because at that point they're going over non-secure communication channels.

The first step in bringing up a Keylime node is to start the cloud node with the agent on it. The agent has the TPM enabled, and the TPM has an endorsement key. At that point the agent needs to create its attestation key and communicate those keys to the registrar along with its node ID. That node ID can be anything; generally, in a cloud system, it'll be your cloud node ID. All of that gets transmitted to the registrar and stored away for the other parts of the system to use.

Once the agent has sent its keys and ID across, the registrar comes back with a challenge to make sure it is actually talking with the correct TPM. It uses the TPM's public keys to wrap up an ephemeral challenge key, and once the cloud node receives it, the node uses that key, which it has now
decrypted using its internal hardware keys, to prove its identity: it computes a keyed hash of its ID with the challenge key and sends that back to the registrar. Since the challenge key was only sent wrapped with the TPM's public keys, only the genuine TPM could have recovered it. The registrar checks the response and says, "Yes, that's who I thought I was talking to. Great, valid." Now the registrar knows it can trust that attestation key: the AIK isn't burned into the hardware, it's just something that was generated, so this step is what ties the AIK to the EK.

The next step is bootstrapping. At that point we have the bootstrap key (which, again, disappeared from my slide). The tenant splits it into U and V and sends V over to the cloud verifier, along with the node ID and a whitelist telling it what's good on the system. The verifier then checks the condition of the cloud node: it verifies that this is the cloud node we think it is, and it asks for a TPM quote of the measurements. It also passes down a nonce, to make sure it gets a fresh quote and that something in the middle isn't just replaying a quote that was good weeks ago. The reply from the cloud node is checked against the registrar's known-good AIK, and once the verifier has confirmed that the AIK is good, it passes the V half of the key down, encrypted with the NK, the key that's used just to secure the key halves.

Now the verifier knows the node is good: it's running and doing everything it needs to. Next, the tenant can tell it to go run; this is the "demonstrate intent" step. Similarly to what the verifier did, the tenant asks for a quote, verifies it with a nonce to make sure it's fresh, and checks that it was signed with a valid AIK. If everything comes back good, it passes its half, U, down. Now the cloud node has both halves and can recombine
them, decrypt the encrypted provisioning manifest, and go on its way and bring itself up.

Now we've got a system that's running, and we know its initial state was trusted, but we want to make sure it can continue to be trusted. So the verifier constantly asks for TPM measurements of the running system, using the same kind of process: it creates a nonce, makes sure it gets back a good, fresh quote, and checks that everything checks out, including the AIK. Everything is good: okay, keep going.

But now some malware gets onto the machine. The mere presence of a file on the machine is not going to trigger anything; it really doesn't matter if a file is sitting there. But if it starts running, then when the verifier gets the measurements of the running system and compares them against the golden whitelist, it's going to say, "No, something else is running," throw a flag, and do whatever is needed to cut that node off. Those actions are programmable add-on hooks that you can write, depending on what you want the system to do when it discovers something out of the ordinary.

One of the things this can be used for is certificate revocation. How would certificates work within the system? A certificate authority creates the certs and passes them to the tenant. The tenant encrypts the certs with the bootstrap key that it split into two halves earlier and transmits them down to the node. Once the node has recombined both halves, it has the bootstrap key, so it can decrypt the certs and go. Now that the certs are in place, the cloud verifier keeps doing its integrity measurements, and if it finds that something went wrong, it can communicate, "Okay, revoke these certificates." So that's kind of how that works.
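The continuous check described above (fresh nonce, signed quote, comparison against the whitelist, then a hook on failure) can be sketched as a small simulation. This is a minimal sketch, not Keylime's implementation: the TPM is faked, and an HMAC with a shared `AIK_SECRET` stands in for the AIK signature (a real verifier checks an asymmetric quote signature using the AIK public key held by the registrar). The names `make_quote`, `verify_quote`, and `on_failure` are hypothetical, the whitelist is assumed to map a path to its expected SHA-256 hash, and the log is simplified to one entry per executed file, extended into a PCR with the rule new = SHA-256(old ‖ measurement).

```python
import hashlib
import hmac
import secrets

# ASSUMPTION: a shared HMAC secret stands in for the AIK. Real Keylime checks an
# asymmetric TPM quote signature against the AIK public key from the registrar.
AIK_SECRET = secrets.token_bytes(32)


def extend(pcr: bytes, measurement: bytes) -> bytes:
    """PCR extend rule: new value = SHA-256(old value || measurement)."""
    return hashlib.sha256(pcr + measurement).digest()


def replay(log):
    """Recompute a PCR from a measurement log of (path, file_hash) entries."""
    pcr = bytes(32)  # PCRs start out as all zeros
    for _path, file_hash in log:
        pcr = extend(pcr, file_hash)
    return pcr


def make_quote(nonce: bytes, log):
    """Simulated agent/TPM side: bind the current PCR value to the verifier's nonce."""
    pcr = replay(log)
    sig = hmac.new(AIK_SECRET, nonce + pcr, hashlib.sha256).digest()
    return {"pcr": pcr, "sig": sig, "log": list(log)}


def verify_quote(nonce: bytes, quote, whitelist):
    """Simulated verifier side: returns (ok, reason)."""
    expected = hmac.new(AIK_SECRET, nonce + quote["pcr"], hashlib.sha256).digest()
    if not hmac.compare_digest(expected, quote["sig"]):
        return False, "bad signature or stale nonce"  # e.g. a replayed old quote
    if replay(quote["log"]) != quote["pcr"]:
        return False, "log does not match quoted PCR"
    # Only executed files appear in the log, so a file merely sitting on disk is ignored.
    for path, file_hash in quote["log"]:
        if whitelist.get(path) != file_hash:
            return False, f"{path} not found in whitelist"
    return True, "ok"


def on_failure(reason: str) -> None:
    """Hypothetical revocation hook; a real deployment might revoke certificates here."""
    print(f"REVOKE: {reason}")


if __name__ == "__main__":
    whitelist = {"/usr/bin/bash": hashlib.sha256(b"bash").digest()}
    log = [("/usr/bin/bash", whitelist["/usr/bin/bash"])]

    nonce = secrets.token_bytes(16)  # a fresh nonce for every poll
    print(verify_quote(nonce, make_quote(nonce, log), whitelist))  # (True, 'ok')

    # The evil script runs, so its hash lands in the log.
    log.append(("/tmp/evil.sh", hashlib.sha256(b"evil").digest()))
    nonce = secrets.token_bytes(16)
    ok, reason = verify_quote(nonce, make_quote(nonce, log), whitelist)
    if not ok:
        on_failure(reason)  # prints: REVOKE: /tmp/evil.sh not found in whitelist
```

Checking the nonce inside the signed data is what defeats the "quote that was good weeks ago" replay: an old quote verified against a new nonce fails the signature check before anything else is examined.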
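The bootstrap-key split and recombination described in this walkthrough can be sketched as a two-share XOR split, K_b = U ⊕ V. Treat the XOR scheme as an assumption about how the split is done, and `split_key` and `combine` as hypothetical names; encrypting the provisioning manifest or certs with K_b, and wrapping the halves with the NK in transit, are not shown. Because U is uniformly random, either share on its own reveals nothing about K_b, which is why the verifier holding V cannot provision a node without the tenant's U.

```python
import secrets
from typing import Tuple


def split_key(key: bytes) -> Tuple[bytes, bytes]:
    """Split a key into shares U and V such that key == U XOR V.

    U is uniformly random, so neither share alone reveals anything about the key.
    """
    u = secrets.token_bytes(len(key))
    v = bytes(k ^ r for k, r in zip(key, u))
    return u, v


def combine(u: bytes, v: bytes) -> bytes:
    """Recombine the two shares on the node: U XOR V gives back the key."""
    if len(u) != len(v):
        raise ValueError("key shares must be the same length")
    return bytes(a ^ b for a, b in zip(u, v))


if __name__ == "__main__":
    k_b = secrets.token_bytes(32)  # bootstrap key generated by the tenant
    u, v = split_key(k_b)          # V goes to the verifier; the tenant keeps U
    assert combine(u, v) == k_b    # the node rebuilds K_b only once it has both halves
    print("recombined bootstrap key matches")
```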
There are a number of different things certificates can be used for, and Keylime can enable them: secure configuration management, encrypted hard disks, and so on. The one we're really looking at here, though, is IPsec encryption, because that's what you can use to communicate securely between your nodes, and if something goes wrong, you can cut it off and fence off a node.

What would that look like? First, the nodes come up, they have their certs, and they make an IPsec connection between them. The attestation loop goes on, checking both systems constantly; all the quotes look good, okay, keep going. Then malware gets on, and we've got an evil imposter in there somewhere. The verifier checks, catches it, and knows something went wrong: all right, let's cut off the certs. It communicates the revocation to the tenant and to all of the other nodes in the system, and the other nodes can then cut off their IPsec tunnels to it. That's a rushed, lower-level example of all this; I can go into much more detail at the booth if you want.

What we really want to get to is a demonstration. You can also run through these demonstrations yourself; all of the information is on the Keylime site, in the user's guide pages. Let me quickly bring this down onto my screen to make things easier.

The setup we have here is two nodes. These are both remote nodes; they're not running on my laptop. One is called Neptune, which acts as the cloud node being monitored, and the other is Saturn, which is the tenant node. I mentioned the TPM PCRs earlier, so here on the node we'll check what those PCR values are. The PCRs are loaded with those values during boot. The one we're interested in right now is PCR 9, which is part of the shim phase of the boot. We're going to manipulate it in this demo to show what happens on a non-secure boot, as it were. We
come over here. The first thing we need to do is start the Keylime verifier, which generates any keys needed within the system. Now we'll start up the registrar. Again, these run on the tenant side, on Saturn, and both are now running. Next we need to run the agent on this node, the cloud node whose TPM values we just looked at. Now everything's up and running, but we haven't yet given it that second key half that tells it to come up.

So we'll go back over to Saturn. If you recall, the PCR 9 value was B6D59; I've deliberately changed it to B6D58, and that's the PCR value we're telling the verifier to expect. We pass this in and come over to the verifier, and there we go: the verifier caught it. PCR 9 does not match, so the verifier says, "It doesn't match; I'm not going to trust it," and cuts it off. It's a very easy demo on that side.

Now we want to actually bring that node up, so we go back, delete the node from the system, and add it back in with the correct value. Over on the agent, you can see that it's now constantly being asked for integrity checks via those TPM quotes. This is a normal boot: everything checked out during the boot phase.

Next we want to show what happens when something we don't like is introduced, so we delete the node out again for this check. All of these parameters we're passing on the command line can also live in a config file, so you can make all these settings there and just let it go; we're passing them on the command line because we're playing around with the values. Now we want to pass in a list of the known-good programs that are running, plus the things we know are there but don't care about. The known-good list is the whitelist, and the ones we don't care about go in excludes.txt. We add the node back in, and here you can again see the POST of the V key, so
that's the half of the key that was just passed in by the tenant. Now we're back in that integrity loop.

Now I've got this evil script that I'm going to copy into a new file. It's very simple and doesn't really do anything, but it's not on the whitelist. We'll make it executable. It's on the system but it's not running, so the system doesn't care; everything's still good. Now, if we actually run it and go over to the verifier, we can see that it caught it: file not found in the whitelist. Back on the agent, it's not being queried anymore; the agent has been cut off from the verifier.

So that's really how simple it is via the command line. The good thing is that it's all REST APIs as well, so you can build your own management systems around it to do all of your provisioning, error catching, and dissemination.

Let's go back to the presentation for a quick wrap-up. Well, that's not the way that should have worked; excuse me for a minute, I'll mirror it then. Okay.

To wrap up, here's what's going on in the Keylime community. We've got a couple of tasks we're tackling right now. There's heavy development on our version 5 release, a major port from Python 2 to Python 3, and we're also porting the cryptography library we use from PyCrypto to the Python cryptography library, because that library can be FIPS compliant and we want the systems to be FIPS compliant. We're also working on Fedora packaging, documenting everything as we go (building out the documentation, user files, and so on), and doing some virtual TPM development for the KVM hypervisor.

Again, because this is a shortened presentation, we'll run through this very quickly. Initially, a virtual TPM was developed for the Xen hypervisor, because it was the only one supporting TPMs back in 2013, so that's why it was chosen. To enable a virtualized TPM that's actually tied to the
hardware TPM, each VM that comes up is assigned its own virtual TPM VM, so there's a one-to-one correspondence. It's a very small VM, but it does grow the system. The intricacy is that we now have a dual-TPM system: the virtual TPM, which is controlled by the tenant, and the hardware TPM, which is controlled by the provider. So we introduce a provider registrar into the system to maintain the validity of the hardware TPMs, and the Keylime agent only worries about the virtual TPM.

But because the virtual TPM is software, its integrity state can be hacked, or someone could install their own keys into it. To be able to really trust it, we've come up with a deep quote system: when you ask for a deep quote, you get not only a quote from the virtual TPM, but the virtual TPM also goes all the way down to the hardware TPM to grab a quote of the full stack. A deep quote is always done at boot time, because you want that initial deep quote to verify the entire system. Then, when the verifier runs its continuous loop, it does a deep quote on its first check to make sure, again, that the entire integrity stack is intact; it does a lighter-weight quote during most of its checks, and every once in a while it does a deep quote again just to ensure that full-system integrity is maintained. That's a quick overview of the virtual TPM; again, we can go into more at the booth if you want.

What's going on in the community going forward? The Keylime agent is being developed in Rust to ensure its security, since Python isn't as secure, and we wanted to make sure the pieces running on the actual cloud nodes were secure. We're also looking at porting the non-agent components to Rust, and investigating which other pieces need to be more secure or more performant. That's the current work, and we're also looking at
compatibility with different OSes. Keylime was initially developed on Ubuntu and the Xen hypervisor, because that's what was around at the time supporting TPMs. Now we're working on Fedora packaging, and we're looking to ensure that it runs on RHEL, CentOS, Fedora, Ubuntu, ARM systems, and anything else the wider community wants to use Keylime on. We're also doing system diversity testing, to make sure that if it works on my machine, it also works on your machine; there are so many variations and so many different TPM chip manufacturers that we need to test more widely. The problem is that we're only a team of six right now, and there's a lot of work to be done, so we're really looking for people to join the community and help out.

One of the more fun things in the near future is containers: containerizing Keylime itself, but also integrity-checking containers at runtime. So if that interests you, help protect your slice: come join us in this wide-open community. That's your landing page, so have fun with it, and thank you for your time. Any questions?

This doesn't have a direct relationship with Enarx. That's another security project we're fostering at Red Hat; Keylime is more about upper-level node security, not necessarily lower-level protection of code at runtime. Anything else?
Oh, we've got another one, good.

Well, you're only worrying about protecting the good nodes. If other nodes are compromised, they can keep communicating among themselves, but the outlet path, the outlet pipe, is still protected: you cut that pipe off based on those certificates. So the compromised nodes can still be running whatever they're trying to do, but they're only able to communicate with each other. That's how that would look.

Well, exactly: those will all be protected by your TLS connections, so as soon as those certs are revoked, that cuts off the connections.

Anything else? Well, thank you for your time. I hope you enjoyed it, and if you have more questions, come find me.