So, as James mentioned, my name is Matthew Garrett, I work at CoreOS, and I've recently been concentrating primarily on security work; part of that is a focus on trusted computing and measured boot. So I'm going to be talking about some of the work that we've done on making this work out of the box, what still needs to be done for this to be generally useful, and why we care. Security of the boot chain is a pretty vital component of any other security solution. If someone is able to tamper with your boot chain, then any other security functionality you have available can be subverted. If it's possible for me to interfere with your kernel, then it doesn't matter how much self-protection the kernel would otherwise have: if I've modified the kernel, I have arbitrary code execution in the kernel. If I'm able to modify your firmware, I'm probably still able to circumvent any security functionality the kernel has, and the bootloader is in a kind of intermediate position. The bootloader can modify the kernel before it passes control to it, and there's no way for the kernel to then verify itself once it's running. In terms of what we have deployed in the Linux ecosystem so far, the primary protection we have against this is UEFI Secure Boot in the desktop and server space. UEFI Secure Boot is a firmware-level feature where the firmware verifies a signature on the bootloader before it executes it. The bootloader in turn then verifies a signature on the next step of the boot process, and so on. You can build an arbitrary number of levels of signature validation before you give up and just let arbitrary code run. Microsoft have extended this on the Windows RT platforms to the point where the Secure Boot root of trust ends up locking down the individual executables you can run in user space: you can't run any user space binaries unless they're signed by Microsoft. 
Outside that world, in the embedded world, we also have a large number of different solutions, many of which are rooted in the SoC itself, in the firmware in the SoC. The SoC will not execute additional firmware code unless it matches either a hash or a signature that is flashed into the SoC itself at provisioning time. So in that world, again, you have a chain of trust that goes to an arbitrary level, and it's rooted in the SoC. The problem with these is that they rely on the security of the firmware or of the SoC. Whichever component is performing the initial validation, if that can be subverted, then you have no actual trust. If someone's able to modify the firmware, then there's not necessarily any actual validation of the firmware before you hand off control to it. Beyond the signature validation, it's conceivable that you can modify the lowest level and then own the system. So that allows attacks on the system, but the other problem is that there's no way to prove that the verification happened. The only thing that is doing this validation is the prior stage of the boot process. Once you're running a kernel, you can't ask that kernel whether it booted in a verified manner, because if someone was able to boot an unverified kernel, they could have just modified the kernel to lie to you about that. And then suddenly you're happily sitting there running something that says, yes, I booted securely, and it's actually not secure. Its behavior has changed behind your back, and you're in a bad position. It's easy to think that this doesn't matter as long as the system verifies signatures appropriately: I can just assert that this root of trust is sufficient, and if material I expected to be rejected is rejected and material I expected to be accepted is accepted, that's probably okay. But we're seeing people starting to move towards providing bare metal server hosting. 
So a lot of people want the agility and convenience of cloud hosting, but they also want the performance of bare metal. They don't necessarily want to use your virtualization infrastructure; they just want to use your cloud management infrastructure. And we're seeing vendors start to offer bare metal. The problem with offering bare metal is that if you have root on a system, you often have the ability to access hardware resources, and many of these systems have not been designed to be resilient against that. The trust model has basically been that if you're an admin, then access to the hardware is outside the threat model. This has resulted in it being possible in various cases to escalate from root into the system firmware, potentially in a persistent way, such that the next time somebody installs an operating system on there, there's now an agent running in the firmware that can compromise that system. So if you know that a company you want to compromise is using a particular bare metal hosting provider, you could just get a large number of short-term leases on hardware in their facility, install this persistent firmware agent, get out, and then later use that firmware agent to exfiltrate data about the target company. And Secure Boot will not help you there. Verified boot is not going to help at all: there's no point at which the initial system firmware on x86-class hardware will necessarily be completely verified. This is slightly untrue, in that Intel's Boot Guard allows some degree of control around this, but there are a lot of systems that don't actually chain all the way up through Boot Guard terribly well. There have also been cases where people are perhaps legitimately concerned that if their laptop is taken away from them when passing through borders, the firmware may be modified in a way that results in system compromise. 
How much of this is paranoia and how much of this is legitimate concern, I'm not going to make any firm statements about, but it's the kind of thing that you do perhaps want to think about. Obviously, we can't protect against hardware attacks in a particularly straightforward way. If someone adds a hardware keylogger, there's pretty much nothing we can do in software, or even using some degree of hardware protection, to protect against that. Something people have talked about is publishing X-ray images of known-good laptops, such that you can then X-ray your laptop and check whether anything appears to have been added to it. And, yeah, well. But there are a bunch of other cases that we can protect against, and measured boot is a component of this. Measured boot is based around trusted computing, which got a fairly bad reputation in the early 2000s because it was very easy to portray it as basically a DRM mechanism. In practice, that was not necessarily a particularly plausible outcome even then, and what we've certainly seen is that nobody has ended up trying to use trusted computing for DRM. There are lots of reasons why it's very difficult to actually deploy it in a particularly user-hostile way. The hardware component of trusted computing is the Trusted Platform Module, which is a small chip. The idea is that the spec defines the pinout for any given bus, and they're manufactured by several different manufacturers, so you should be able to buy one that has the functionality you want and drop it onto a board, and it will work regardless of which vendor you got it from. They mostly differentiate based on the amount of NVRAM storage they have and on performance; otherwise, they're pretty much all the same. In practice, being basically embedded devices, they obviously have a bewildering array of different bugs. But most of those you don't end up hitting. It could be a lot worse. 
TPMs do a lot of different things, but I'm going to focus on some functionality provided through what are called platform configuration registers. Platform configuration registers are typically between 16 and 24 registers on the TPM. They are not directly accessible: you cannot directly write them. You have to ask the TPM to store some data in there, and then something magic happens, which I'll get to in a moment. Platform configuration registers are commonly referred to as PCRs, so I'll probably end up doing that for the rest of the presentation. On TPM 1.2 devices, the PCRs are a static set, and they're 20 bytes each, which is the length of a SHA-1 hash. On TPM 2 devices, you can have several different banks of PCRs, which vary in size depending on the hash algorithm that's used for that bank. The idea is that rather than limiting the spec to a specific hashing algorithm, TPM 2 allows you to use a range of different hashing algorithms based on your feelings about the security of those hashes. So when we boot, we measure things, and we store those measurements in the PCRs. The idea of measurement is to store something that represents the next part of the boot process: the firmware measures the boot sector of the hard drive, the boot sector measures the rest of the bootloader, the bootloader measures the kernel. Each of these measurements gets stored in the PCRs. And by convention, different PCRs are used for different purposes. Zero through seven are used as part of the boot process and are defined by the spec; once you get past seven, it's up to the operating system for the most part. PCRs, as I said, are not directly addressable. If I want to store a measurement in a PCR, I need to call a TPM function that says, here is a measurement and here is the PCR that I want it to be measured into. 
Now, if I were able to just say, set this PCR to this value, then at any point after you booted some untrusted code, I'd be able to go back and say, okay, set PCR 0 to this known-good value, set PCR 1 to this known-good value, and just go through and overwrite the history of what was actually booted and replace it with known-good values. And that would defeat the object entirely. Instead, when you perform a measurement, the existing value of the PCR is taken, the new measurement is taken, and they're concatenated. So for TPM 1.2 devices you now have a 40-byte value; the hash of that is taken, and that hash is stored in the TPM. This means that you can't set a value unless you perform exactly the same sequence of writes, or, alternatively, unless you can somehow calculate the value that you would need to measure in order to get from value A to value B. And doing that involves basically breaking SHA-1, which is not yet practical, so we're not too worried about that. TPM 2 devices allow you to use more secure hashing algorithms, so there we have even more confidence; right now, SHA-1 is basically sufficient. So this has the property that it is very difficult to reach a given value unless you perform the same sequence of writes. But that also means that if you change the order of the writes, you'll end up with a different value: if I measure A then B, I'll have a different value than if I measure B then A. And that'll become relevant later. Now, outside the TPM, stored in system memory, there is also a log, and each measurement event usually results in the log being extended. You add a new entry to the log describing the value that you measured: where it came from and what the hash of it was. As I said, that's just stored in RAM, so there's no direct protection of it. Any untrusted code can overwrite the log and make it have arbitrary contents. That's not as much of a problem as it sounds like, and again, I'll get to that later. 
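The extend operation described above can be sketched in a few lines of Python. This is an illustrative model of TPM 1.2 behaviour, not real TPM interface code, and it also demonstrates the ordering property:

```python
import hashlib

def extend(pcr: bytes, measurement: bytes) -> bytes:
    """Model of a TPM 1.2 PCR extend: new value = SHA-1(old value || measurement)."""
    return hashlib.sha1(pcr + measurement).digest()

# A PCR starts out as 20 zero bytes at power-on.
pcr = b"\x00" * 20

# Measure component A, then component B...
a = hashlib.sha1(b"component A").digest()
b = hashlib.sha1(b"component B").digest()
ab = extend(extend(pcr, a), b)

# ...which gives a different final value from measuring B, then A.
ba = extend(extend(pcr, b), a)
assert ab != ba
```

Because each new value depends on the old one, the only way to reproduce a PCR value is to replay exactly the same measurements in exactly the same order.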
So in order to have a measured boot, you need to be able to measure each component of the boot chain. First of all, that requires you to have firmware that implements this. Pretty much all x86 systems shipped these days have firmware that does this, and we're also starting to see this appear on some ARM devices. That's assuming you have firmware that does this measurement, and assuming that the firmware actually performs the measurement in all cases: I've seen systems that are fine at measuring stuff booted off disk but which forget to measure anything that you PXE boot, which is kind of a problem. But I reported the one of those that I saw, and they seem to have fixed it, so that's great. But you need support in the later parts of the boot chain as well, and the big issue here is that no standard Linux bootloader does this out of the box. In the past, we had Trusted GRUB. Trusted GRUB was a set of patches on top of the old legacy version of GRUB, 0.97. It was not particularly good. The first problem is that that old version of GRUB is unusable in a lot of modern contexts; it doesn't have a lot of functionality that people want. The quality of the TPM support code on top of it was also not great. It's never really a good sign when patches have completely different code formatting from the original code; it usually indicates that maybe people weren't paying that much attention to what was there in the first place. So this worked, but it wasn't particularly nice. A German company called Rohde & Schwarz have a TrustedGRUB2 project. This was originally being developed by a group called Sirrix, whose GitHub repo now, as I discovered today, redirects to Rohde & Schwarz, so I assume that something interesting has happened there. And this was a version of GRUB 2 which had TPM support, but it had no UEFI support. It only had support for using the legacy BIOS interfaces to the TPM, and it only supported TPM 1.2 devices. 
It had no support for TPM 2 devices. So when I was starting to think about this at CoreOS, I basically decided... well, I actually started on this before seeing the Sirrix code, and ended up just writing my own support code, in some cases based on the code from the original Trusted GRUB patches and in some cases written from scratch. So you'd think this is okay: I now have a bootloader that supports measuring the kernel, and arguably that's a complete job. Things are a little more complicated than that. There's a question of what more I should measure, and the reason you have to ask that is: is the kernel the only thing that influences the booted state of the system? If I boot a kernel, and I measure the kernel, and I know that the correct kernel was booted, was there anything else that could be done that would alter the security of the system? And the answer is yes. The configuration that's passed to the kernel is very relevant; the command line that's passed to the kernel is very relevant. GRUB 2 has a significant scripting language that allows you to do basically anything you want, including poking values into RAM or doing port I/O. So you could reprogram a bunch of hardware to do DMA over the kernel shortly after you've booted it, before the IOMMU's been set up. You could disable the IOMMU before passing control to the kernel, or possibly even overwrite the tables that cause the kernel to believe that there is an IOMMU, and therefore disable that protection entirely. And the kernel measurements would still be the same. So measuring what happens during the GRUB phase is pretty important in terms of being able to say the system is in the state you want it to be in. The traditional approach here was to measure each component into a separate PCR. This is a little less than ideal, firstly because there's not an unlimited number of PCRs and secondly because some of those PCRs are being used by other things. 
PCR 10, I believe, is used by the Linux IMA project to measure user space binaries and have the kernel provide measurements of those. If you've already started putting stuff into PCR 10, that interacts somewhat poorly. So putting each individual component into a separate PCR doesn't scale. If you wanted to measure GRUB into one PCR, the GRUB configuration into another PCR, the kernel into a PCR, and the initramfs into a PCR, then you've already used four PCRs. And if GRUB supports loading modules, are you putting each module GRUB loads into a separate PCR, or all into one PCR? If you're putting multiple things into one PCR, then the ordering starts mattering. So reusing PCRs is important here, but then the order of loading matters, and you probably want to avoid those constraints. If you're reusing the same PCR, if you're trying to actually measure everything into a small number of PCRs, which is basically a requirement, you also have to deal with the problem that unimportant configuration changes will alter values. If you measure the entire GRUB configuration and then someone changes a comment, does that result in a change in measurement? If someone changes a particular constant in the GRUB configuration that doesn't have any security impact, that will again alter the measurements. And it's not really straightforward to teach GRUB everything it needs to know about whether something matters or not, because there is no single correct answer: you have to deal with this differently depending on your security policy, depending on your threat model. So just sticking everything in PCRs and then looking at the static values of the PCRs is suboptimal for multiple reasons. This is where we go back to the log file that's not actually a file: using the TPM event log. Now, as I mentioned, the log is just in memory, so an untrusted component can overwrite the log. 
And if you're just looking at the log, that means that a hostile component can fool you: if you trust the log, then a hostile component can cause you to believe that everything's okay. But there is a way around this. As I mentioned earlier, the mechanism that the TPM uses to change the values in the PCRs is very straightforward. It's easy for you to take the log and replay it in the same way that the TPM extended the PCRs during the boot process, and you should end up with the same values that are in the TPM. If replaying the log gives you the same answers as the values that are currently in the TPM, you know the log has not been tampered with, and at that point you can look at the individual log entries. There are two ways of doing this. The traditional one has been to just have the log contain a description of the binary and the hash of that binary. So you might have "kernel" and the hash of the kernel; you might have a graphics module and the hash of that module. That's great for being able to measure a large number of small binaries: you're now able to look at the log, verify that it hasn't been tampered with, and then say, okay, I don't care about the order that these things were loaded in, I merely care that each of these was measured and gave me the measurements I expected. That doesn't work hugely well for configuration. Again, as I mentioned, there are certain changes that you could make that are irrelevant, and there are certain changes you can make that are relevant depending on your threat model. So instead, you can have a log entry that contains some text, like the command that grub executed, and then the hash is the hash of that text. 
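That replay check can be illustrated roughly as follows, assuming a simplified log of (PCR index, measurement hash) pairs rather than the real TCG event log format:

```python
import hashlib

def replay_log(events):
    """Replay an event log the way the TPM extended the PCRs during boot.

    events: list of (pcr_index, measurement_hash) pairs, in boot order.
    Returns the expected final value for each PCR index seen in the log.
    """
    pcrs = {}
    for index, measurement in events:
        old = pcrs.get(index, b"\x00" * 20)       # PCRs start zeroed
        pcrs[index] = hashlib.sha1(old + measurement).digest()
    return pcrs

def log_is_untampered(events, tpm_pcrs):
    """The log is only trustworthy if replaying it reproduces the values
    the TPM itself reports for those PCRs."""
    expected = replay_log(events)
    return all(tpm_pcrs.get(i) == v for i, v in expected.items())
```

Only after this check passes does it make sense to start interpreting individual log entries.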
So in this case, when looking at the log, you can verify that the text hashes to the hash that's in the log entry, and then you verify that the log gives you the correct PCR values, and now you know that the text in the log was the text that was executed by grub. So you can have a policy that describes each binary and says these are the set of permitted binaries, and you can also have a policy that provides a set of regular expressions that you can match against the text entries. Then you can have a policy that says something like: this grub command followed by any argument is acceptable; this grub command can only be followed by these three arguments in order to be valid for my particular use case. But if you're writing policy, where does it come from? Ideally, you don't want to have to write a brand new policy yourself for every release of an operating system and for every operating system you need to support. So at CoreOS, as of three months ago, every release that we build ships with a file describing the known-good hashes for every component of the operating system and a policy that describes a valid grub configuration from our perspective. And that's not complete, because unfortunately you still need some way of validating the firmware values. The firmware measurements are things that are provided by the manufacturer; they're not things that are provided by us. And at the moment, I'm not aware of any vendor that provides known-good values for their firmware. Until that happens, there's still an amount of manual policy development work involved. But that's a case where you can have a known-good system, you can extract the firmware values from that known-good system, and as long as you never then upgrade the firmware, those values will remain static and everything will be fine. But we're still faced with a few problems. 
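A policy check along those lines might look something like this. The hashes and command patterns here are purely hypothetical, invented for illustration, and not CoreOS's actual policy format:

```python
import hashlib
import re

# Hypothetical policy: a set of known-good binary hashes, plus regular
# expressions describing acceptable grub commands.
known_good_hashes = {
    hashlib.sha1(b"example kernel image").hexdigest(),
}
command_policy = [
    re.compile(r"linux /boot/vmlinuz-[\w.-]+ .*"),  # any installed kernel
    re.compile(r"set timeout=(0|5|10)"),            # only these timeouts
]

def binary_ok(data: bytes) -> bool:
    """A binary is acceptable if its hash appears in the policy."""
    return hashlib.sha1(data).hexdigest() in known_good_hashes

def command_ok(cmd: str) -> bool:
    """A command is acceptable if any policy pattern matches it entirely."""
    return any(p.fullmatch(cmd) for p in command_policy)
```

The point of the regular expressions is that they let the policy author decide which variations matter: a changed timeout value might be fine, while an arbitrary extra kernel argument might not be.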
In the CoreOS case, the initramfs is something that's statically generated by us and is actually attached to the kernel image, so it's easy for us to provide a static value here. Most distributions generate the initramfs at install time. And what's even worse is that even if you don't change the contents, when you build the initramfs, you will typically get different output, because timestamps end up being embedded in it. So something that we need is a reproducible initramfs. And ideally we could also do with known-good generic initramfs images being distributed by operating system vendors. There will be users for whom these will not be suitable, but this at least gives you a case that would work for the majority of people and sidesteps that problem. Another thing is: where do we store boot data? You may need some amount of data that is normally stored in the initramfs and which is relevant to the TPM. But if you put it in the initramfs, then the initramfs measurements will vary, and so suddenly you're outside your known-good state. One answer here is that on UEFI systems, some of this information can just be stashed in UEFI variables, which are portions of NVRAM that are available in the firmware but also in the running operating system. The TPM itself also has a small amount of NVRAM, so you can do things like put data inside the NVRAM on the TPM and have it stored there. And what sort of boot data am I talking about? There are some things that you can use the TPM for that are particularly entertaining. 
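The reproducibility problem can be demonstrated, and fixed, in miniature: if you pin the timestamps, ownership, and file ordering when building an archive, identical content always yields identical bytes. This is a sketch using Python's tarfile rather than a real cpio-based initramfs:

```python
import hashlib
import io
import tarfile

def build_archive(files: dict) -> bytes:
    """Build an archive with fixed metadata, so that identical content
    always produces identical bytes (the property an initramfs needs)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in sorted(files):           # fixed file ordering
            data = files[name]
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mtime = 0                   # no embedded timestamps
            info.uid = info.gid = 0          # no embedded ownership
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

files = {"init": b"#!/bin/sh\nexec /sbin/init\n"}
first = hashlib.sha1(build_archive(files)).hexdigest()
second = hashlib.sha1(build_archive(files)).hexdigest()
assert first == second                       # two builds, one measurement
```

With a build process like this, a distribution could publish the expected measurement of the initramfs alongside each release.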
One problem with using a TPM for boot measurement is that I've been talking about this as something that needs to improve local security, and that's still very difficult, because even if you have values stored in the TPM, even if it's impossible for a compromised kernel to cause the TPM to give you good values, you still need to get the TPM values by asking the kernel, and it would be possible for the kernel to then lie and give you back fake values that didn't actually come from the TPM. What you can do is ask the TPM to encrypt some information and only decrypt that information if the PCR values match a defined policy that you imposed when you encrypted it. So that means you can encrypt a secret with the TPM, and the TPM will only decrypt that secret if the boot process has not been tampered with, and then the system can display that secret. The problem with a static secret is that it's possible for someone to just shoulder-surf you, steal that static secret, and then present the static secret even if your system has been tampered with. So the way I got around this was, rather than using a static secret, I'm sealing a seed for a TOTP, a time-based one-time password generator. You run this, it prints a QR code, you scan the QR code with your phone; this is familiar to anyone who's used 2FA on Gmail, for instance. And then when the system boots, it shows you a six-digit number, and you check whether the six-digit number that your phone is displaying is the same as the one that your laptop's displaying. If so, then the TPM decrypted this information, and you know that your boot process has not been tampered with. If the number's different, or if there is no number, then you have to be suspicious. Another traditional thing that's used here is disk encryption keys: the system will refuse to decrypt your disk unless you booted in a good state and the TPM handed over the AES secret for your disk encryption. 
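For reference, the TOTP side of this is just standard RFC 6238; only the seed needs to be sealed by the TPM. A minimal generator looks like this (the seed used below is the RFC test secret, not anything TPM-specific):

```python
import hashlib
import hmac
import struct

def totp(secret: bytes, timestamp: int, step: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP: HMAC-SHA1 over the 30-second counter, then
    dynamic truncation to a short decimal code."""
    counter = timestamp // step
    msg = struct.pack(">Q", counter)
    digest = hmac.new(secret, msg, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % (10 ** digits)).zfill(digits)

# RFC 6238 test secret at T=59 gives the documented value "287082".
assert totp(b"12345678901234567890", 59) == "287082"
```

Both the laptop (with the TPM-unsealed seed) and the phone (with the same seed from the QR code) compute this independently, so matching codes demonstrate that the TPM was willing to unseal the seed, without ever displaying a reusable secret.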
We can also store SSH keys on the TPM, such that if the system's been tampered with, you won't be able to SSH into another system. A problem that we haven't solved here yet is that if you upgrade the operating system or if you upgrade the firmware, then the PCR values will change, and suddenly the TPM will refuse to decrypt your disk encryption key or your TOTP key. And then your system doesn't boot, or you can't validate your system. So we need a mechanism that permits automatically unsealing and resealing this information when we perform system updates, and that's a problem that we really haven't started solving yet. There are a lot of details that make this quite difficult to do. So even once we've solved the rest of measured boot, in order to make it useful, we still need to solve this one. Now, there's an argument that this is all made much easier using Intel's TXT technology. TXT is what's called a dynamic root of trust, rather than a static root of trust. What I've been describing so far is a static root of trust: you have the firmware as the root of trust, it measures the thing beyond the firmware, and each component measures the next component. You have to get to the desired outcome by performing exactly the same sequence of operations. A dynamic root of trust allows you to remeasure at basically arbitrary times, and only really cares about the state you're in, not how you got there. And this simplifies a lot of stuff. It doesn't really matter if someone modified the boot process, as long as the boot process still resulted in the same kernel being loaded in the same place and had not tampered with any other components. TXT allows you to, just before exiting the bootloader and passing over to the kernel, do that measurement, write those values into the TPM, and then go from there, which makes writing policy much more straightforward. So in theory, TXT lets us sidestep a bunch of these problems. In reality, not so much. 
TXT does not currently solve many of these problems, because TXT does not have meaningful support for secure boot. TXT does not allow any signature verification of components. In fact, the tboot loader that is used to perform TXT measurement when booting Linux uses the Multiboot protocol, which doesn't have any real concept of signature validation, and so there's no way to verify that tboot itself has not been tampered with, which is not ideal. It actually gets a little worse than that, because many systems, including servers, are now booting with UEFI by default rather than legacy BIOS. And TXT nominally supports UEFI boot only as long as you entirely disable UEFI functionality in the kernel, which is a problem, because then you can't tell the firmware where your bootloader is after install, and there are various other things that we need UEFI for. So it's basically, at the moment, incompatible with the way we boot operating systems. Until that's fixed, TXT, while theoretically nice, is just not particularly helpful. So, a quick summary: in order to get to a point where we have measured boot out of the box, we need to ship bootloader support. The code for that's written. The big problem is that CoreOS is mostly deployed on servers, so I have a full expectation that as we push this out to desktops and laptops, we'll find a large number of fascinating firmware bugs that will break in all kinds of entertaining ways, and when your bootloader breaks, users tend to get upset. We need to ship known-good measurements. We need to have integration with firmware updates, so that we know when to expect the firmware values to change, and ideally we need to convince vendors to start shipping the known-good values with firmware updates, so we can do this automatically rather than just having to assume that because we performed the firmware update, the new value is good. 
And ideally we also need deterministic initramfs generation, so that we know in advance what legitimate measurements are going to be there. So that's everything I wanted to cover, and we have five minutes until the break. So, questions? Are there plans to upstream the trusted GRUB code to upstream GRUB? Yes, there are plans to. After a certain amount of discussion with Richard Stallman, I finally managed to convince him that we could use TPMs for purposes of good rather than their being intrinsically evil, which I really hope is something that I can put on my performance review. In addition to that, GRUB 2 recently gained new maintainers who are much more active than the previous maintainers, so I'm going to start working with them. The big problem here is that, as a GNU project, GRUB requires copyright assignment, and some of my code is based on the original Trusted GRUB code, and I can't assign the copyright of code that wasn't written by me. So I'm going to need to figure something out about that. It's somewhat annoying, because this is code which you can basically only write one way, especially within the constraints of having to fit it in a 512-byte MBR. So ideally I'm going to be able to track down whoever did this originally, hope that they're still employed by the same people, and get them to also sign a copyright assignment, because otherwise we're going to need to jump through a few hoops on that front. It is theoretically possible that the GRUB maintainers could be convinced not to require a copyright assignment. I should say that I'm a member of the Board of Directors of the FSF, and therefore I have something of a conflict of interest in this situation, so I'm not going to say anything on that topic right now. But yeah, ideally this will be upstream. I do not want a situation where people have to carry a bunch of out-of-tree patches just to get this kind of functionality; it's suboptimal for everyone. Anyone else? 
Have I implemented anything to hand over the event log from the bootloader to the kernel? Right, so in TPM 1.2 land there's an ACPI table which just provides a pointer to the TPM event log, and that's very nice and straightforward, because that is still there after you've got into the kernel; you can just read it out of RAM. In TPM 2 land, that's not the case. When you exit from the firmware, that memory is reclaimed, and so the kernel cannot access the event log anymore. I've got a written but as yet untested implementation where the EFI stub loader within the kernel finds the event log and then passes that to the kernel as loader data, and then that gets pulled out and handled by the TPM 2 code. As I said, this is not as yet meaningfully tested; with luck, I'll get the opportunity to do that in the near future, and then we can submit it for review and inclusion. Okay, so the question was: why is there a difference between TPM 1.2 and TPM 2 in this respect? My understanding is that the reason was people didn't want to have this lump of RAM that they couldn't touch sitting around if they weren't using the functionality. This way it's basically opt-in: there's no requirement that the operating system maintain this RAM; instead, it's free for the operating system to use if it doesn't care about it. If the operating system does care, then it has to explicitly do something about that; if it doesn't, it's fine. That is my understanding. Right, so the point that was mentioned there was that UEFI does not necessarily imply ACPI, and therefore requiring that an operating system have ACPI in order to be able to obtain these logs was unhelpful. So I think we're out of time. I'm around for the rest of the week; if anybody does want to chat to me about any more of this, let me know. There is a further TPM discussion session tomorrow morning, at 10-something, sometime around 10. So please also come along to that if you're interested. 
Thank you.