Hi, my name is Paul Moore, and I'm here today with Joy Latten to talk about securing TPM secrets in the data center. So we'll go ahead and get started, but first, some quick introductions. Joy, did you want to introduce yourself? Sure, Paul. Thanks. Hi, my name is Joy Latten. I've been working in open source for quite a while. I started at the IBM Linux Technology Center in 2002, and I've worked on several projects. I had the pleasure of working with Paul on labeled networking, and I've also contributed to openCryptoki, IPsec, and the kernel crypto API. And I'll hand it back over to you. Thanks, Joy. So, like Joy, I've been playing in the Linux security space for quite a while now at various companies: I started off at HP and Compaq, moved on to Red Hat, and now I'm here at Microsoft. Currently I serve as the kernel maintainer for a few subsystems: SELinux, audit, and labeled networking. I also put together libseccomp on a cold, rainy weekend, and I now co-maintain that with another individual, Tom. And there's my Twitter handle as well, so if you want to talk to me after the presentation, you're more than welcome to get in touch with me over Twitter. All right, as you've probably noticed, both Joy and I are attending the Security Summit virtually this year, and we're pre-recording our presentation, which of course presents some challenges for questions and answers. But we are going to be on IRC during this presentation, and probably a little bit afterwards if there are still questions. The information is here on the screen. Hopefully all of you are familiar with Libera Chat, or however that's pronounced; my apologies to the Libera Chat people. We'll be in the data center TPM secrets channel, and you're welcome to join. Those are our nicks right there. I'll leave that up for just another second or two in case you didn't get the information. Okay, let's go ahead.
So before we get to the meat of our presentation, the problem and our solution, we wanted to give you a few background slides to make sure we're on the same page regarding some of the important technologies we're going to be talking about and relying on in our solution. The first and most important is the TPM, the Trusted Platform Module. I'm sure most of you in the room today are familiar with TPMs, but for those of you who aren't, there are just a few basic things you need to know. One is that TPMs contain registers, which we commonly call PCRs, and some non-volatile RAM, which you'll often hear us refer to as an NV index or an NVRAM region. PCRs are updated by extending them: you write a value into the PCR, the TPM hashes that value together with the PCR's existing value, and it puts the resulting hash back into the PCR. That's the only way to update a PCR. You can't write an explicit value to it; everything that gets written there is hashed. TPMs come in a variety of implementations, from standalone hardware chips to, nowadays, firmware implementations. But the important thing to remember about PCRs, other than the fact that they're written by hashing values, is that the TPM protects them from tampering, so they are a trustworthy source of information. Now, the NVRAM in the TPM, these NV indices, can be secured using these PCR values. When you define an NVRAM region or an NV index, you can give it a set of PCRs and their values such that those PCRs effectively act as a key to lock and unlock that NV index. Whenever those PCRs hold the specific values you set when you defined the NV index, the index is effectively unlocked and you can access it.
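To make the extend-and-unlock behavior concrete, here is a minimal Python model. It assumes a single SHA-256 PCR bank, and the class `PcrLockedNvIndex` is a hypothetical stand-in for the enforcement the TPM performs internally; it is a sketch of the idea, not TPM code.

```python
import hashlib

PCR_RESET = bytes(32)  # a SHA-256 PCR holds 32 zero bytes after reset


def extend(pcr: bytes, digest: bytes) -> bytes:
    """TPM-style extend: the new value is SHA-256(old value || digest)."""
    return hashlib.sha256(pcr + digest).digest()


class PcrLockedNvIndex:
    """Toy NV index readable only while the PCR holds the value it was defined with."""

    def __init__(self, secret: bytes, expected_pcr: bytes) -> None:
        self._secret = secret
        self._expected_pcr = expected_pcr

    def read(self, current_pcr: bytes) -> bytes:
        if current_pcr != self._expected_pcr:
            raise PermissionError("PCR mismatch: NV index is locked")
        return self._secret


# Boot: each component's digest is extended into the PCR, in order.
pcr = PCR_RESET
for component in (b"firmware", b"bootloader", b"kernel"):
    pcr = extend(pcr, hashlib.sha256(component).digest())

nv = PcrLockedNvIndex(b"disk-key", expected_pcr=pcr)
print(nv.read(pcr))  # b'disk-key': the PCR matches, so the index is unlocked

# Any further extend changes the PCR, and the same read now fails.
pcr = extend(pcr, hashlib.sha256(b"one more measurement").digest())
try:
    nv.read(pcr)
except PermissionError as err:
    print(err)  # PCR mismatch: NV index is locked
```

Note that there is no way, in this model or on a real TPM, to set the PCR back to the unlocking value short of a reset and a repeat of the exact same measurement sequence.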
If you change one of those PCR values by extending a new value into it, that NV index is automatically locked, and this is all managed and enforced by the TPM itself. All right. The next thing we want to talk about beyond the TPM is the intersection of UEFI Secure Boot and the TPM, and how they work together. Once again, I'm sure many of you here today have a great deal of understanding of UEFI Secure Boot, so we just want to touch on a few basic things. One is that when a UEFI Secure Boot system comes up, it measures different portions of the system into the TPM PCRs. Those measurements range from the firmware itself to configuration values, the boot loader, the kernel, and so on. If you change any of these, say your firmware configuration, you're going to see that reflected in the PCR values. The one PCR we're going to talk a lot about today is PCR 7, which contains information about the kernel and boot loader signing authority. It's a fairly unique PCR in the sense that it doesn't record a checksum or hash of a binary being executed or of a configuration value; it records a hash of the certificate or public key used to authenticate a file's signature. That's significant because it allows you to have multiple binaries, and as long as they're signed with the same signing key, they still result in the same PCR 7 value. We're going to talk a lot more about that as we go on, but it's an important thing to remember. However, PCR 7 is not always stable: you can have multiple binaries signed with the same signing key and PCR 7 can still change. There are two primary ways that happens. The first is a firmware update that changes the UEFI key databases, that is, the public keys and certificates stored in the firmware. That will affect the PCR 7 value.
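A toy model of why PCR 7 survives kernel rebuilds but not key-database changes. It assumes PCR 7 receives exactly two measurements, the Secure Boot key database and the certificate that verified the loaded image; real firmware logs more events (Secure Boot state, dbx, and so on), so the values are illustrative only. The function name `pcr7_after_boot` and the byte strings are hypothetical.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def extend(pcr: bytes, digest: bytes) -> bytes:
    return sha256(pcr + digest)


def pcr7_after_boot(key_db: bytes, signer_cert: bytes, image: bytes) -> bytes:
    """Simplified PCR 7 after boot.

    The image is verified against signer_cert, but in this model only the
    key database and the signing certificate are measured into PCR 7; the
    image bytes themselves never reach it.
    """
    del image  # deliberately unused: PCR 7 records the authority, not the binary
    pcr = bytes(32)
    pcr = extend(pcr, sha256(key_db))
    pcr = extend(pcr, sha256(signer_cert))
    return pcr


DB_V1 = b"db: distro CA"
DB_V2 = b"db: distro CA, new vendor CA"  # after a firmware key-database update
PROD_CERT = b"production signing cert"

kernel_a = pcr7_after_boot(DB_V1, PROD_CERT, b"kernel build A")
kernel_b = pcr7_after_boot(DB_V1, PROD_CERT, b"kernel build B")
after_fw = pcr7_after_boot(DB_V2, PROD_CERT, b"kernel build B")

print(kernel_a == kernel_b)  # True: same signer, same PCR 7
print(kernel_b == after_fw)  # False: the key database changed
```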
The other way is an OS update. In the post-BootHole world we have things like SBAT, which is interesting but unfortunately a bit beyond the scope of what we can talk about today. If an OS update changes the SBAT value because it revokes a boot loader or a kernel, that also affects the PCR 7 value. So if you have a long-lived system deployed out in your data center, there's a reasonable probability that over time some future update is going to affect your PCR 7 stability. That might not be a big deal on a desktop or a laptop. But in a data center environment, resealing your TPM-based secret because PCR 7 changed means somebody accessing the system and keying in a TPM owner password to reseal it, and that's simply not suitable for the scale and accessibility of the systems we're dealing with. Our last background slide is about TPM extended authorizations. This is probably not as well understood or as well known as the TPM and PCR-based protection of an NV index. TPM extended authorization policies, or EA policies, allow for much richer access controls than the simple PCR checks you may be familiar with. An EA policy can be composed of multiple assertions, and these assertions can include the simple PCR value checks we already talked about. They can look at values in other NV indices in the TPM, and they can take into account passwords, PINs, smart cards; there's a whole variety of things you can put into an extended authorization policy. Further, extended authorization policies support signing, so you can sign an EA policy. That's important because it effectively allows you to have multiple EA policies that authorize access to a given NV index.
The way that works is that instead of assigning a specific security policy to an NV index, you assign a public key to that NV index, and any EA policy whose signature that public key verifies can be used as the access control policy for that NV index. All right, enough background. Let's jump into our problem, which basically comes from the title of our talk. We've got a data center environment: hands-off, lights-out. We've got production systems, and we want to make sure that only authorized OSes, only authorized kernels, can access the secrets we've stored in the TPM. But we still need provisions so that development systems, CI systems, test systems, burn-in systems, you name it, can access that secret regardless of the OS authorization. You should be able to boot an unsigned kernel on these systems and still get access to the secret. And as we talked about, all of this needs to work across OS and firmware updates, because we have long-running systems to support. So here's our solution. If you were following along with the background slides, you probably have a good idea of what it looks like, at least in broad terms. We store our secret in a TPM NV index, and we leverage UEFI Secure Boot and TPM EA policies to unlock it. We do that by relying on the kernel signature, the UEFI Secure Boot measurements into PCR 7, and the EA policies. And if we have a breaking update, either to the OS or the firmware, we resolve it by shipping a new EA policy in an OS update. All right, let's talk about this in a little more detail. I'm going to walk you through a normal production boot process, and we'll see how all of this works.
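A minimal sketch of the signed-policy idea: the NV index is bound to a public key rather than to one policy, so a firmware update that breaks PCR 7 is fixed by shipping a newly signed policy, without touching the TPM. The signing here is textbook RSA over two Mersenne primes, an insecure toy substituted for the real key pair (the talk's tooling uses OpenSSL-generated keys); `Policy` and `KeyBoundNvIndex` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass

# Textbook RSA with tiny, INSECURE parameters: illustration only.
_P, _Q = 2**61 - 1, 2**89 - 1
PUB_N, PUB_E = _P * _Q, 65537                  # public key, bound to the NV index
_PRIV_D = pow(PUB_E, -1, (_P - 1) * (_Q - 1))  # private key, kept offline by the vendor


def _digest_int(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % PUB_N


def sign(data: bytes) -> int:
    return pow(_digest_int(data), _PRIV_D, PUB_N)


def verify(data: bytes, signature: int) -> bool:
    return pow(signature, PUB_E, PUB_N) == _digest_int(data)


@dataclass(frozen=True)
class Policy:
    pcr7: bytes  # the only assertion in this sketch: PCR 7 must equal this value

    def encode(self) -> bytes:
        return b"PolicyPCR7:" + self.pcr7


class KeyBoundNvIndex:
    """Toy NV index that accepts any satisfied policy signed by the bound key."""

    def __init__(self, secret: bytes) -> None:
        self._secret = secret

    def read(self, policy: Policy, signature: int, current_pcr7: bytes) -> bytes:
        if not verify(policy.encode(), signature):
            raise PermissionError("policy signature does not verify")
        if policy.pcr7 != current_pcr7:
            raise PermissionError("PCR 7 does not satisfy the policy")
        return self._secret


nv = KeyBoundNvIndex(b"secret")
old_pcr7, new_pcr7 = b"\x01" * 32, b"\x02" * 32  # before / after a firmware update

old_policy = Policy(old_pcr7)
old_sig = sign(old_policy.encode())
print(nv.read(old_policy, old_sig, old_pcr7))  # b'secret'

# After the update the old policy no longer matches, so the vendor ships a
# newly signed policy in an OS update; the NV index itself is unchanged.
new_policy = Policy(new_pcr7)
print(nv.read(new_policy, sign(new_policy.encode()), new_pcr7))  # b'secret'
```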
Later on, Joy's going to go a bit more into the implementation and show a little demo, so you'll get to see all of this work. So, when you first power on the system, it bootstraps itself: the hardware comes up and starts executing the firmware. We've got the normal UEFI boot process with Secure Boot enabled, and Secure Boot verifies all the binaries based on their signatures. It gets to the point where it looks at the boot manager, figures out where the boot loader is, and verifies the signature on the boot loader. We're using a Microsoft-signed UEFI shim boot loader, like almost all the other Linux distros out there. You can work around this by adding a MOK entry for your specific shim boot loader, or, if you're able to, you can create your own UEFI key database and load that into your firmware. Either way, you need to get your shim boot loader started up. Here we're doing something a little bit different. Most distros use a second-stage boot loader such as GRUB; we boot our kernels directly from shim. Part of that is because we have a reasonable amount of confidence that the UEFI boot manager is going to work the way we want it to, which seems to be a fair statement on modern systems as well as QEMU. That removes GRUB from the equation, so if there are any problems with GRUB, they don't affect us. Your mileage may vary on that one. So we pass the kernel path name as an argument to the boot loader, and that's the file the shim boot loader verifies and executes. Once the boot loader runs, the first thing it does is verify the signature of this kernel file, using both the UEFI and the built-in shim key databases. This is nothing new; this is how shim works. Our particular shim boot loader has three keys stored in it.
It's got the production key, which is used to verify kernels and establish a PCR 7 value that allows read-only access to the TPM-based secrets. We have a TPM management key, which we'll talk about in a little bit, which grants full read-write access to the TPM. And we have a limited key, which, as the name suggests, is fairly limited: it doesn't grant any special privileges on the TPM. You can only access the publicly available information you can get out of the TPM. It's very similar to walking up with Ubuntu on a thumb drive, sticking it into the system, and booting it: you wouldn't get access to any of the secrets or anything privileged on there. The other somewhat interesting thing we do is combine the kernel, the initramfs, and the kernel command line into a single UEFI binary. That way we can sign the entire binary for UEFI Secure Boot, and the boot loader verifies all of those components for us. Normal Linux distributions typically only sign the kernel, which means the initramfs and the command line are not protected by the normal UEFI Secure Boot process. This way we leverage Secure Boot to protect everything in early boot, and that's a security advantage we really enjoy. Last but not least, shim extends the kernel and signing key measurements into the TPM PCRs; that's our special sauce, that's what makes all of this work. Finally, once it has verified the kernel, the boot loader hands execution over to it, and the kernel is off and running. Things are a little bit different in our case once the kernel gets execution from the boot loader, because we've bundled, or "smushed" (that's the technical term), everything together.
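Why bundling matters can be shown with digests alone. In this sketch, a signature is reduced to "the digest the signer committed to", a simplification of a real Authenticode signature, and the section names `.linux`, `.initrd`, and `.cmdline` are assumed from the systemd EFI stub's naming. Signing only the kernel lets a tampered command line through; signing the whole bundle does not.

```python
import hashlib


def image_digest(sections: dict[str, bytes]) -> bytes:
    """Digest over every section of a combined UEFI image, in a fixed order.

    Each section's name and length are hashed with its contents, so bytes
    cannot be shifted from one section into another without changing the digest.
    """
    h = hashlib.sha256()
    for name in sorted(sections):
        body = sections[name]
        h.update(name.encode() + b"\0" + len(body).to_bytes(8, "big") + body)
    return h.digest()


signed = {
    ".linux": b"<kernel image>",
    ".initrd": b"<initramfs with EA policies>",
    ".cmdline": b"console=ttyS0 ro",
}
# What the signer commits to under each scheme.
kernel_only_digest = hashlib.sha256(signed[".linux"]).digest()
bundle_digest = image_digest(signed)

# An attacker edits only the command line.
tampered = {**signed, ".cmdline": b"console=ttyS0 init=/bin/sh"}

print(hashlib.sha256(tampered[".linux"]).digest() == kernel_only_digest)  # True: not detected
print(image_digest(tampered) == bundle_digest)  # False: detected
```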
Because we've smushed the kernel, the initramfs, and the command line together, we have to include a small binary stub, and that stub effectively comes from systemd-boot: we use its EFI stub. The stub locates those kernel components, loads them into memory, and makes the transition from the boot loader into the kernel's entry point. The kernel then boots normally, much as it would on any other UEFI Secure Boot system: it initializes the hardware, brings up all the subsystems, and then executes the init located in the initramfs. The initramfs is where things start to get interesting, because we ship our TPM EA policies in the initramfs, and we can ship as many EA policies as we want; they're all very small. So if we have a system that needs to support multiple different firmware builds or whatnot, we can include multiple EA policies. The init in the initramfs uses these EA policies, in conjunction with the PCR 7 value set during the boot process, to unlock, or unseal, the secrets we have in the TPM. From there, depending on what the secrets are, you can do whatever you want with them. One thing that's kind of neat is that once you've done whatever you need to do with the secret, you can relock the TPM. We rely on PCR 7, so if you extend another value into PCR 7, that effectively locks the TPM and you can no longer gain access to that secret. That means you can arrange things so that the only thing that ever has access to those secrets in the TPM is the initramfs. By the time it transitions into your proper root filesystem, the TPM is locked again, and the only way to regain access is to do a system reset, reinitialize the TPM, and go back through the whole boot process.
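A sketch of what the init in the initramfs might do, under simplifying assumptions: each shipped policy is reduced to the PCR 7 value it expects, policy signature checking is left out, and `ToyTpm` and `unseal` are hypothetical names. It tries every shipped policy, uses the secret, then extends PCR 7 so nothing later in boot can unseal it.

```python
import hashlib
from typing import Optional


def extend(pcr: bytes, digest: bytes) -> bytes:
    return hashlib.sha256(pcr + digest).digest()


class ToyTpm:
    """Minimal stand-in: one PCR 7 and one secret guarded by PCR 7 policies."""

    def __init__(self, pcr7: bytes, secret: bytes) -> None:
        self.pcr7 = pcr7
        self._secret = secret

    def extend_pcr7(self, data: bytes) -> None:
        self.pcr7 = extend(self.pcr7, hashlib.sha256(data).digest())

    def read_secret(self, policy_pcr7: bytes) -> bytes:
        if policy_pcr7 != self.pcr7:
            raise PermissionError("policy not satisfied")
        return self._secret


def unseal(tpm: ToyTpm, shipped_policies: list[bytes]) -> Optional[bytes]:
    """Try each policy shipped in the initramfs; return the secret or None."""
    for policy_pcr7 in shipped_policies:
        try:
            return tpm.read_secret(policy_pcr7)
        except PermissionError:
            continue
    return None


FIRMWARE_A_PCR7 = b"\xaa" * 32  # placeholder values for two firmware builds
FIRMWARE_B_PCR7 = b"\xbb" * 32
tpm = ToyTpm(pcr7=FIRMWARE_B_PCR7, secret=b"secret")

# The initramfs ships policies for both builds; the second one matches.
print(unseal(tpm, [FIRMWARE_A_PCR7, FIRMWARE_B_PCR7]))  # b'secret'

# Relock before switching to the real root filesystem.
tpm.extend_pcr7(b"initramfs done")
print(unseal(tpm, [FIRMWARE_A_PCR7, FIRMWARE_B_PCR7]))  # None
```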
Depending on exactly what you're doing, relocking like that might be a plus for you. At this point, the system boots normally, just as you'd expect. So that was our common production use case. There are a couple of special cases I want to talk about very quickly. The first is managing the TPM. Much like we can expect a long-lived system to need a firmware update, it's not unreasonable to think that a long-lived system might need some management done on its TPM to deal with, who knows, some future calamity. The problem is that privileged TPM operations typically require the TPM owner password, and the TPM owner password is effectively root, so we really need to control access to it. We obviously want a unique password for each system, for each TPM. But if we do that, we need some sort of centralized password database, which comes with its own issues and liabilities. So to work around that, we store the TPM owner password in the TPM itself, using the same security mechanisms we use for our other TPM secrets. The only difference is that its EA policy uses a different kernel signing key, and therefore a different PCR 7 value, than our production build. We obviously don't want a normal production kernel to have access to the TPM owner password. Instead, you can compose a special one-off OS image: just the kernel and the initramfs, never mounting a root filesystem. It goes in, does whatever TPM management tasks you need, and resets the system. Obviously you want to guard this image carefully, because if it could be manipulated, it could be used as an attack tool; something to keep in mind.
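The separation between the three signing keys can be summarized as an access matrix. This sketch assumes each key yields a distinct PCR 7 (modeled here by hashing the key's name, a hypothetical shortcut for measuring a real boot) and that each NV index is guarded by a policy bound to exactly one of those values.

```python
import hashlib


def pcr7_for(signing_key: str) -> bytes:
    """Hypothetical PCR 7 after booting a kernel signed with signing_key.

    Real values come from measuring an actual boot; here we just hash the name.
    """
    return hashlib.sha256(b"signer:" + signing_key.encode()).digest()


# Each NV index is guarded by a policy bound to the PCR 7 of one signing key.
NV_POLICIES = {
    "secret": pcr7_for("production"),
    "owner password": pcr7_for("tpm-management"),
}


def readable_indices(booted_with: str) -> list[str]:
    current = pcr7_for(booted_with)
    return [name for name, pcr7 in NV_POLICIES.items() if pcr7 == current]


for key in ("production", "tpm-management", "limited"):
    print(f"{key:>15}: {readable_indices(key)}")
```

The management image never sees the production secret directly here; it gets the owner password, and with that it can administer the TPM as described above.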
The other special case is tightly related to what we talked about before: EA policy revocation. Whenever we talk about security, we have to talk about revocation. It's a difficult task in a lot of cases, but here it's actually not too bad. Say you've been shipping your product for X number of years, you have a number of TPM EA policies out in released products, and something comes along that requires you to revoke all of those EA policies. How do you go about doing that? Well, as we talked about before, EA policies allow for multiple assertions. So beyond the PCR 7 check, we add a second assertion that checks a policy version. This is another NV index in the TPM that stores a simple value; in our case we chose a 32-bit integer, but it could be whatever. It's normally publicly readable, but changing it requires privilege, namely the TPM owner. So a production system can't change the version number, or PCR 7 for that matter. The way it works is that normal production systems do simple equality checks: the policy version stored in the TPM has to match the one in the policy, and PCR 7 has to match the policy. If the policy version is different, it doesn't matter what the PCR 7 value is; the EA policy denies access. So if you've shipped all of your EA policies with a policy version of one and suddenly need to revoke them, you can simply do a TPM management update that changes the stored policy version to two. Now none of the shipped EA policies with a policy version of one will authorize access to the TPM-based secret. Once again, this isn't something you'll want to do very often, but should you need it, the capability is there.
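Revocation by policy version, modeled as two equality checks that must both hold, the way assertions chained in one EA policy combine. `EaPolicy` and `ToyTpm` are hypothetical names, and the PCR 7 value is a placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EaPolicy:
    pcr7: bytes
    version: int  # must equal the value stored in the policy-version NV index


class ToyTpm:
    def __init__(self, pcr7: bytes, policy_version: int) -> None:
        self.pcr7 = pcr7
        self.policy_version = policy_version  # only the TPM owner may change this

    def satisfies(self, policy: EaPolicy) -> bool:
        # Both assertions must hold; a version mismatch denies regardless of PCR 7.
        return policy.version == self.policy_version and policy.pcr7 == self.pcr7


PROD_PCR7 = b"\x07" * 32
tpm = ToyTpm(PROD_PCR7, policy_version=1)
shipped = EaPolicy(PROD_PCR7, version=1)
print(tpm.satisfies(shipped))  # True

# Revocation: a TPM management boot bumps the stored version to 2.
tpm.policy_version = 2
print(tpm.satisfies(shipped))  # False: every version-1 policy is now refused
print(tpm.satisfies(EaPolicy(PROD_PCR7, version=2)))  # True: newly shipped policy
```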
The last special case I'm going to talk about before I hand it off to Joy, who will talk more about the implementation and give you a demo, is development systems. Building on the previous two slides and their solutions, how do we come up with a development system that lets you run developer kernels, unsigned kernels, or new user space in the initramfs that a developer wants to test? We leverage the policy version. On traditional production systems, we check both the policy version and the PCR 7 value. On a development system, we have developer EA policies that only check the policy version, against a well-known value of zero, and we never ship a production TPM EA policy with a policy version of zero. This actually works out pretty well. A developer can order a production system like any other customer or user would. They run a specially crafted TPM management OS image that goes in and sets the policy version stored in the TPM down to zero. Now they can run whatever they want on that system, as long as they have the TPM EA policy that specifies a policy version of zero and skips the PCR 7 check. What's kind of neat is that for your CI and test systems, you can safely include this development TPM EA policy with a policy version of zero, because you know you're never going to ship a production system with a policy version of zero stored in the TPM's NV index. So that's our solution: the problem we're trying to solve, some of the special cases, and some background information. At this point I'm going to hand it off to Joy, who can talk a bit more about the implementation and give you a demo. Thanks, Paul. Thanks a lot.
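Paul's development-system case fits the same model: a developer policy pins the version to the reserved value zero and omits the PCR 7 assertion. In this sketch, `None` for `pcr7` is a hypothetical encoding of "no PCR 7 assertion", and the PCR values are placeholders.

```python
from dataclasses import dataclass
from typing import Optional

DEV_VERSION = 0  # reserved: never used by a shipped production policy


@dataclass(frozen=True)
class EaPolicy:
    version: int
    pcr7: Optional[bytes]  # None means the policy has no PCR 7 assertion


def satisfies(policy: EaPolicy, tpm_version: int, tpm_pcr7: bytes) -> bool:
    if policy.version != tpm_version:
        return False
    return policy.pcr7 is None or policy.pcr7 == tpm_pcr7


PROD_PCR7 = b"\x07" * 32
UNSIGNED_KERNEL_PCR7 = b"\x99" * 32
dev_policy = EaPolicy(version=DEV_VERSION, pcr7=None)

# A production TPM (version 1) rejects the developer policy, whatever kernel booted.
print(satisfies(dev_policy, tpm_version=1, tpm_pcr7=PROD_PCR7))  # False
# Once a management image sets the stored version to 0, any kernel can use it.
print(satisfies(dev_policy, tpm_version=0, tpm_pcr7=UNSIGNED_KERNEL_PCR7))  # True
```

This is why the developer policy is safe to ship in CI images: it can only ever match a TPM that someone with owner privileges has deliberately set to version zero.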
Okay, so I'm going to talk a little bit about the implementation. We are still implementing the design, so consider this a work in progress, and I'm only going to touch on a few implementation points: stubby, and the TPM provisioning work that we've done. So, stubby. Stubby is a simple EFI boot stub that can act as a boot loader, and it's based on the systemd-boot EFI stub. We wanted something agile, small, and standalone, and systemd is a lot of code, so we decided to extract just the EFI stub code from systemd and use that. Stubby is open source, and I've provided a link to the source. We refer to the combination of stubby, the kernel, the kernel command line, and the initramfs, which together make up that single UEFI application, as our smushed kernel; I believe Paul mentioned some of this. That smushed EFI binary is signed with one of the three keys Paul mentioned: the production key, the TPM management key, or the limited key. Okay, now I'm going to move on to the TPM provisioning requirements. Our requirement was to store a policy version, a TPM password, and a secret in the TPM's NVRAM, and to use PCR 7 values and EA policies to secure and authorize access to the NV indices that hold that data. So what do you need to do this? First, we need the PCR values, and we need them from each booted kernel. Remember, a single kernel version can have one copy signed with the production key, perhaps a copy signed with the TPM management key, and perhaps a copy signed with the limited key. So we need the PCR values from each of these signed, booted kernels.
We also need the NV indices that the data is going to be stored in, and of course we pick those within the appropriate ranges defined by the TCG. And we need an EA policy signing key pair. The private key is used to sign the digest of the EA policy: we're using tpm2-tools, which output a policy, and for verification the tools require a signed digest of that policy as input. So we need the private key to sign that digest, and the public key and the signed digest are what we include in the initramfs. Let me go into a little more detail about how we implemented some of this. First, of course, you have to take TPM ownership using the password, and then we load the public key into the TPM, because, remember, we want to tether it to the policy. To set up the policy version, we define an NV index for the policy version and write the value into it; the policy version was a little easier than the others. For the TPM password, we need the PCR 7 value from a booted kernel signed with the TPM management key, because, remember, only kernels signed with the TPM management key will have access to the TPM password. We create a policy bound to that PCR value, associate the public key with that policy, define an NV index for the TPM password, and write the password into the newly defined NV index. One thing to point out: the result of this is a policy, and we have to sign that policy's digest. The signed digest and the public key that was passed in are what we include in the initramfs.
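The digest that gets signed is built up incrementally by the TPM's policy commands. As a sketch, here is my reading of the TPM2_PolicyPCR update in the TPM 2.0 Library specification (Part 3), for PCR 7 in the SHA-256 bank: the new digest is SHA-256(old digest || TPM_CC_PolicyPCR || PCR selection || SHA-256(selected PCR values)), starting from 32 zero bytes. The command code 0x17F, the 3-byte selection bitmap, and the helper names are assumptions to check against `tpm2_policypcr` output before relying on them; the PCR 7 value is a placeholder, not a real measurement.

```python
import hashlib
import struct

TPM_CC_POLICY_PCR = 0x0000017F  # TPM2_PolicyPCR command code (assumed; check the spec)
TPM_ALG_SHA256 = 0x000B


def pcr_selection(pcrs: list[int], size_of_select: int = 3) -> bytes:
    """Marshal a TPML_PCR_SELECTION with one SHA-256 bank (big-endian)."""
    bitmap = bytearray(size_of_select)
    for n in pcrs:
        bitmap[n // 8] |= 1 << (n % 8)  # PCR n is bit n % 8 of byte n // 8
    return struct.pack(">IHB", 1, TPM_ALG_SHA256, size_of_select) + bytes(bitmap)


def policy_pcr_digest(pcr_values: dict[int, bytes], previous: bytes = bytes(32)) -> bytes:
    """Policy digest after one TPM2_PolicyPCR assertion over the given PCRs."""
    selected = sorted(pcr_values)
    pcr_digest = hashlib.sha256(b"".join(pcr_values[n] for n in selected)).digest()
    return hashlib.sha256(
        previous
        + struct.pack(">I", TPM_CC_POLICY_PCR)
        + pcr_selection(selected)
        + pcr_digest
    ).digest()


# Placeholder for the PCR 7 reading collected from a kernel signed with the
# TPM management key; the real value comes from booting that kernel.
management_pcr7 = bytes.fromhex("ab" * 32)
print(policy_pcr_digest({7: management_pcr7}).hex())
```

Because the digest depends only on the selected PCR values, it can be computed offline from collected readings, which is why the provisioning step needs the PCR 7 value from each signed kernel rather than access to that kernel at provisioning time.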
Now for the secret, it was pretty similar to what we did before, just a little different, because for the secret we want to check both the PCR value and the policy version before granting access. Again, we need the PCR 7 value from a booted kernel signed with the production key. We create a policy based on the policy version and that PCR 7 value, associate with it the public key that will verify the policy's signature, define an NV index for the secret, and write the secret into the NV index. Again, the output is a policy; we create its digest, sign the digest, and include the signed digest and the public key in the initramfs. Okay, we've done that; what does it give us? Now the TPM secret can be read during boot by a kernel signed with the production key. Remember, the public key and the signed digest are included in the initramfs. We check the PCR value, verify the signature, and determine whether the policy version matches, and if all goes well, we're allowed to read the TPM secret from the NV index. Kernels signed with the TPM management key can access the TPM password and update the policies, because once they have the TPM password, they're allowed to administer the TPM. Okay. Here are some links to the information we've talked about, which has been open sourced. I'm going to go ahead and proceed to the demo, if everybody's ready for that. Okay.
Here I've booted into a production kernel, and I'm going to simulate what the initramfs would do. I have a TPM setup script, tpm-setup.sh, that does the provisioning we just described. I need to pass the script the PCR readings: recall that I needed the PCR 7 reading from booting into the production kernel and the PCR 7 reading from booting into the TPM management kernel. I've already collected those and have them ready. We also need the public key, which is the public portion of the EA policy signing key pair. So let's run the script and pass in the public key and the two PCR readings. As you can see, my output is a TPM management policy and a TPM secret policy. Now, when we want to read the NV index to get the secret, we need to pass in the public key and the signed digest of the policy, so I need to create a digest and sign it. I'm going to use OpenSSL to do this. Okay, before I attempt the read, I'm going to boot back into my production kernel; as you can see, I'm running a production kernel. I'm going to run a script that does the verification needed before I attempt to read the secret, passing in the public key and the signed policy digest. Now let's see if I can get access to that secret. Ah, there it is; there's my secret. Remember, only a kernel signed with the production key has the ability to access the TPM secret, because the policy checks the PCR 7 value and the policy version. So let's say I boot into a kernel signed with the limited key and see what it can do. As we recall, the limited key doesn't have this ability.
There's no policy included in its initramfs that would allow it to access the secret. So let me quickly... This is a special-purpose OS, so I have to copy over my efibootmgr. If you take a look here, you'll see that my limited kernel is at boot entry 0008, so let's choose that. Okay, and let's boot into that kernel. Okay, now the same script, passing in the same public key and the same signed policy digest. There you go: you can see that it failed. So hopefully the demo has shown you that a kernel signed with any key other than the production key does not have access to that TPM secret. Okay, handing it back over to you, Paul. Great, thanks. That was cool. As you may have noticed on the More Information slide, at the time we recorded this we didn't yet have the scripts used in the demo available on public GitHub or some other place, but we will have that done by the time you're watching this. So go ahead and hop over to the IRC channel and we'll share that link with you; you can see what the scripts do and should be able to replicate this yourself if you want. So anyway, thank you very much for attending our session. Hopefully this was useful, hopefully you learned something, and I hope you enjoy the rest of the Security Summit. Thanks. Thank you.