Good afternoon, everyone. Today we're going to be talking about how we're using TPMs to improve the security of Google's enterprise fleet, specifically using remote attestation built on TPM primitives to gain an understanding of the security state of a device and subsequently make decisions based on that. We're also going to be talking about how you can make use of the same technology in a way that benefits both enterprises and individual end users. Just by quick way of introduction, my name's Tom. Together we work on the enterprise platform security team, and our role on this team is to improve the security state of Google's enterprise fleet. That's typically workstations, laptops, and corporate servers. To be clear, we are not going to be talking about the production Google servers at any point today. We are talking primarily about normal off-the-shelf systems and security techniques that apply to those.
So, just signposting: first we're going to be talking about the problem that we're trying to solve. It is fundamentally a trust problem. Then Matthew will go over how we're using the TPM, the Trusted Platform Module, to give us the primitives that we need to do remote verification. Then we'll go over an end-to-end example of remote attestation and walk you through some of the tools that we've built and are continuing to open source that allow you to build this capability yourself.
So put simply, we're trying to solve a trust problem, one that arises when users try to use a device over the network to access resources. Typically an organization will provision a managed device, perhaps load an image, load a TLS client certificate or some other kind of credential, and then issue that device to users. At this stage, in a typical organization, the user may access corporate resources from just about anywhere.
They may go take meetings in another country. They may access resources from a coffee shop. And the problem is that we don't really get much insight when we see these devices phoning home. We see a bunch of network packets on the wire. Perhaps they're authenticated using a password or a TLS client certificate. But that's authentication. We don't get insight into the actual security state of the device. How do we know that the machine hasn't been compromised? And as a result, how do we know that this traffic is authentic? And how do we know that we're providing access to resources to a real user as opposed to an attacker?
There are attempts to solve this problem, and it is a hard problem. Typically, you may deploy some security agents on the end devices. These could take the form of simple remote logging, sending those logs up to an organization server where they are analyzed for signals of badness. It could be antivirus with quarantine capabilities. But regardless, these suffer from the same problem. You have a situation where you are sending data to a remote endpoint, and that data can be faked, replayed, or manipulated. So as long as the attacker can compromise the host, this mechanism gives no strong assurance that the information you're receiving over the network is authentic. And this is the problem we're trying to solve. We're trying to solve the remote trust problem by using cryptographic assurances from the TPM.
So we've generally found that it helps when talking about security solutions to actually explain what we're talking about first before making any assumptions based on that. So I'm going to give you a brief introduction to what a Trusted Platform Module is. Apologies to those of you who are already aware of this.
The fundamental point of a TPM is to provide a solution to the problem that Tom described: if you have a system that has been compromised, and you are relying on the operating system to provide you with the information you need to determine whether or not the system has been compromised, you lose. If the operating system can be compromised, any validation that you perform of the operating system state can be faked. If the kernel is fundamentally the thing that's saying, yes, everything is OK, then the moment an attacker can get code into the kernel, all your presumptions about your state being correct are no longer viable. So the goal here is to push the trust down to a point where you have a sufficiently simple, sufficiently straightforward, segregated device that you can ask: did everything go OK? Was any point of the boot process tampered with? And you can get an answer back and you can trust that answer. You can believe that there's no viable mechanism for someone to have lied during that process.
The Trusted Platform Module is a small additional piece of hardware that sits on the motherboard of the majority of x86 systems that are sold these days. For many ARM systems, it is implemented in TrustZone on the SoC. So at this point, basically all PCs that are sold, certainly all PCs that have Windows compatibility stickers on them, and a surprisingly large number of phones have an implementation of a TPM. The TPM is a small device. It is not a particularly competent device, and I say that in the nicest possible way. The TPM does a small number of jobs. It takes information you give it and stores that information in a way that can't be tampered with. It has the ability to generate cryptographic keys, and it has the ability to prove to you that it was the device that generated those keys.
The TPM has no direct visibility into the operating system. That is a flaw, in the sense that the TPM cannot validate the runtime state of the system, but it's also a benefit, because it means the operating system has no ability to interact with the TPM outside a very limited number of ways. This means that the TPM is largely resistant to attacks from the operating system. If you compromise the operating system, the TPM itself is still trustworthy.
The TPM feature that we're going to be talking about today is based around the TPM's ability to measure. This is a combination of TPM functionality and firmware and operating system functionality. Basically, the TPM contains a number of registers called platform configuration registers. You can put information in those registers, but you cannot directly control the contents of them. We call this measurement. You look at something, you create a measurement of it, generally a cryptographic hash, and then you pass that measurement to the TPM. The TPM takes the existing content of the register, appends this new value, hashes the result with a cryptographic algorithm, and then stores the final value. This means that the specific value that's stored within any platform configuration register on the TPM depends on the precise sequence of writes. If you modify any of those writes, then the value that's stored within the platform configuration register will be different. And this means that the only way to directly control what is contained within a platform configuration register is to break the underlying cryptographic hash algorithm. TPM 1.2 devices, which are still very common, use SHA1, which is at this point considered a weak hashing algorithm but is still considered basically cryptographically secure. TPM 2, which is now becoming much more prevalent (basically all new devices ship with TPM 2), uses SHA256, which has no known security vulnerabilities.
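The extend operation described here can be sketched in a few lines of Go. This is only a model of the arithmetic for a SHA256 register, not code that talks to a TPM; the `PCR` type and the `Extend` and `Measure` helpers are illustrative names, and the register is assumed to start at all zeroes, as boot-time PCRs typically do after reset.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// PCR models one SHA256 platform configuration register.
// Real PCRs live inside the TPM; this only illustrates the arithmetic.
type PCR [sha256.Size]byte

// Measure hashes a boot component so the result can be extended into a PCR.
func Measure(component []byte) [sha256.Size]byte {
	return sha256.Sum256(component)
}

// Extend returns the new register value: SHA256(old value || measurement).
// There is no operation that sets a register to a chosen value.
func Extend(pcr PCR, measurement [sha256.Size]byte) PCR {
	buf := make([]byte, 0, 2*sha256.Size)
	buf = append(buf, pcr[:]...)
	buf = append(buf, measurement[:]...)
	return sha256.Sum256(buf)
}

func main() {
	firmware := Measure([]byte("firmware image"))
	bootloader := Measure([]byte("bootloader image"))

	// Same measurements, different order: the final values differ.
	var a, b PCR // assumed to start at all zeroes
	a = Extend(Extend(a, firmware), bootloader)
	b = Extend(Extend(b, bootloader), firmware)

	fmt.Printf("firmware then bootloader: %x\n", a)
	fmt.Printf("bootloader then firmware: %x\n", b)
	fmt.Println("values equal:", a == b)
}
```

Because each new value folds in the previous one, reaching a chosen final value without performing the same sequence of writes would require breaking the hash function.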
So basically, each component in the boot chain, starting on PC devices from the management engine, measures the next step in the boot process. You turn on your system. On Intel-based x86 systems, the management engine measures the initial block of the firmware and puts that into the TPM. The initial block of the firmware then starts executing, measures the rest of the firmware, and puts that into the TPM. Any code that the firmware then executes, including code that is built into ROM on a PCI card, is measured too. So for instance, your GPU option ROM gets measured as well. Anything that's executing in the firmware environment from hardware gets measured. The firmware then measures whatever bootloader is being booted. At this point, things stop being the firmware's responsibility and are passed over to the bootloader. The bootloader is then responsible for identifying which information needs to be verified later and pushing that into the TPM as well. So overall, by going through this chain, we get to the point where the operating system itself, and potentially the configuration that's part of the operating system, is all measured into the TPM.
Now, the raw hashes are not themselves necessarily particularly useful. So we also provide additional information that goes into the event log. That information can then be verified against the values that are written in the TPM, which Tom's going to talk about a bit later. In the case of Linux, the platform firmware and EFI measurements are handled without the operating system being involved at all. On Linux, generally on EFI systems, we have a two-stage bootloader setup where you have the shim bootloader, which is required in order to provide a handoff from the Microsoft-rooted Secure Boot root of trust to the distribution root of trust, and then generally, but not always, GRUB as a second-stage bootloader.
As of last year, both shim and GRUB will perform measurements of the next thing they boot without the firmware having to do that for them. GRUB will not only measure the kernel and the initramfs, which are critical components of the operating system, it will also measure the kernel command line, which is also security critical. So at this stage, the TPM has (knowledge isn't entirely the right word) stored information regarding the security-critical components of the boot process up until this point.
But we still face a problem. At some later point, I'm going to ask the TPM what I booted, and I'm going to verify whether or not that was what I expected to be booting. But how do I know that the results I'm getting are genuinely coming from the TPM? How do I prevent someone faking those results? The answer is that every TPM has what's called an endorsement key generated on it at manufacturing time. The other thing that TPM 2 gives us over TPM 1.2 is support for both RSA and elliptic curve cryptography. So for endorsement keys on TPM 2, we have both an RSA endorsement key and an elliptic curve endorsement key, which means you can have more than one endorsement key per TPM. On TPM 1.2, we just have an RSA one. At manufacturing time, a key pair is generated, and the private half of that key never leaves the TPM. The TPM is the only thing that has that private key. The TPM manufacturer produces an endorsement certificate, which is a statement from the TPM manufacturer that this public key was produced on a TPM made by this manufacturer and corresponds to this private key. The endorsement key chains back to the TPM manufacturer. You have roots of trust basically equivalent to SSL, like you use for web services. So that means that when the TPM generates a key, you can ask the TPM: OK, did you generate this key?
And you can get back a signed assertion from the TPM saying, yes, I generated this key, and you can chain that back through the endorsement key to the TPM manufacturer. So you can say, OK, this key definitely came from a real TPM; this was not faked in software.
In general, we don't use the endorsement key directly though. In general, we use something called an attestation key, and attestation keys allow you to separate the verification process from the TPM itself. The fundamental reason for this is that one of the use cases that was considered for TPMs is that you might want to prove to arbitrary third parties: I'm a real TPM and this is the system state. But if you use the endorsement key for that, then every site that you attest to can tie you together and know that you're the same system. So the attestation key allows a level of indirection. Each TPM has only one endorsement key per crypto algorithm, but you can generate an arbitrary number of attestation keys, and there's a mechanism by which you can prove that this attestation key corresponds to this specific endorsement credential. So this means that we can get a separate key for the TPM that can be used to sign any cryptographic information the TPM produces. Then we can look at whether the attestation key chains to the endorsement key and prove that, OK, this came to us not just from something capable of doing cryptography, it came to us from a TPM, and specifically from a TPM with this endorsement credential. That's our use case. We're not particularly worried about the privacy impact because, for what we're doing, we literally own all the computers that are doing this.
So we already know who has those computers. But in principle, you can have an independent third party who's responsible for taking the endorsement key and the attestation key and then generating a credential saying, OK, this is a legitimate attestation key, and that way you avoid the privacy implications. So basically, when we do a measurement, all these measurements get put into the platform configuration registers on the TPM, and we can then get a copy of those register values signed with the attestation key. This process is called quotation. We get a quote from the TPM that contains the values of the platform configuration registers, and we can then chain that back to the TPM that it came from. That way we know not only that these measurements were pushed into a TPM, we know specifically which TPM they were pushed into.
Now that we've heard a little about how the TPM itself works and the primitives that it provides, I'm going to go over how we put these primitives together to build a capability: the ability to introspect these running machines, understand their state and how they booted, and have a hardware-backed identity for them. Before we do that, just a quick overview of the different pieces you need to build in order to develop this capability. You need to have a piece of software running on the system that you wish to attest. This piece of software is responsible for communicating with the TPM and performing the operations that Matthew described. You need to establish an association between the EK, the endorsement key, and the AK, the attestation key that you generate, and then prove that remotely. And then you need to gather the event log, gather the PCR values, generate a quote, which is a signature over these PCR values, and ship them off to the verification service. The verification service is the second piece necessary in order to build this capability.
It is a service, a server running somewhere inside your organization or within the trust boundary of your network, and it's responsible for verifying the information that the clients provide. Specifically, that means verifying that the quote the client produced has a valid signature and that the signature corresponds to the attestation key of the device it is claiming to be. Then you need to verify that the platform configuration registers, which are a hashed summary of the booted state of the device, match the values of the event log. Finally, if you know that these match, you know that the event log is authentic, and therefore you can use it to make access control decisions.
This brings us on to the third component of this kind of system, an access proxy. Typically, with an environment that allows remote access to sensitive or corporate resources, you will put them behind some kind of access control mechanism. This could be a reverse proxy, or it could be a VPN or something of the like. And so you'll want to start using the information contained within the event log to make decisions about the trustworthiness and security state of the device, allowing access to legitimate machines and blocking access to those which do not meet the bar for security.
To give you an example of how this all comes together, from first boot all the way up to access control, consider this. First, the device boots. As Matthew mentioned, the platform firmware and the management engine take part in measuring each component as it boots, extending it into the TPM, so recording the hash into a platform configuration register, and then also writing information about that event to an event log. This event log is stored in memory. This is why we use the TPM PCRs: an attacker with control of the system could mutate memory. However, in doing so, they're not able to mutate the values in the TPM, so the two wouldn't match and we can detect this.
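The check that the event log agrees with the PCR values can be sketched as a replay: start each register at zero, extend it with each logged digest in order, and compare the result with the value the TPM reported. This is a minimal model assuming SHA256 registers. The `Event` type, the helper names, and the PCR indices are illustrative and do not reflect the on-disk event log format.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// Event is a simplified event log entry: which PCR was extended, and with what digest.
type Event struct {
	PCRIndex    int
	Digest      [sha256.Size]byte
	Description string
}

// NewEvent builds a log entry by hashing the measured data.
func NewEvent(pcr int, description string, data []byte) Event {
	return Event{PCRIndex: pcr, Digest: sha256.Sum256(data), Description: description}
}

// Replay recomputes PCR values from the log, starting each register at zero
// and applying new = SHA256(old || digest) for every event in order.
func Replay(events []Event) map[int][sha256.Size]byte {
	pcrs := make(map[int][sha256.Size]byte)
	for _, ev := range events {
		cur := pcrs[ev.PCRIndex] // zero value if this PCR has not been extended yet
		pcrs[ev.PCRIndex] = sha256.Sum256(append(cur[:], ev.Digest[:]...))
	}
	return pcrs
}

// LogMatches reports whether replaying the log reproduces every quoted PCR value.
// PCRs that appear only in the log and not in the quote are not checked here.
func LogMatches(events []Event, quoted map[int][sha256.Size]byte) bool {
	replayed := Replay(events)
	for idx, want := range quoted {
		if replayed[idx] != want {
			return false
		}
	}
	return true
}

func main() {
	events := []Event{
		NewEvent(0, "platform firmware", []byte("firmware image")),
		NewEvent(4, "bootloader", []byte("bootloader image")),
		NewEvent(4, "kernel", []byte("kernel image")),
	}
	// Stand-in for the PCR values a real TPM would return inside a signed quote.
	quoted := Replay(events)
	fmt.Println("untampered log matches:", LogMatches(events, quoted))

	// An attacker rewrites one entry in memory; the TPM values are unchanged.
	tampered := append([]Event(nil), events...)
	tampered[2] = NewEvent(4, "kernel", []byte("backdoored kernel image"))
	fmt.Println("tampered log matches:", LogMatches(tampered, quoted))
}
```

In a real verifier the quoted values would come from a quote whose signature has already been checked, so a matching replay ties the readable log back to what the TPM actually recorded.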
Anyway, boot continues, and each stage of boot measures information into the TPM and into the event log. As this happens, the machine boots up before finally loading the OS. At this stage, there's a log with a wealth of information about the booted state of the device stored in memory, with corresponding hashes stored in the platform configuration registers inside the TPM. This is the point where the on-system agent runs. You can do this on a cron job, so you run it every few hours, or it can be triggered on demand. But regardless, some agent running on the system you wish to attest needs to request the TPM to perform a quote operation and collect the event log. That information is then sent to the verification service, which performs the verifications that I described before: checking the signature and verifying that the PCR values match those of the event log. And then finally, if everything checks out, it pulls information and security attributes out of the event log.
Finally, you use the information to make access control decisions. There is a wealth of data inside the event log that you can use to make decisions. If you are concerned with the state of the OS that was booted and the state of the platform, there is information about Secure Boot and about the hashes of firmware, OSs, and bootloaders that is extended into the TPM and stored in the event log. So once the event log has been validated, this information can be used by simply checking the entries that you care about in the event log and matching them up to a database of good values. This is easier said than done for things like hashes. If you have a hash of the platform firmware, how do you then know what the correct hashes are for all systems in your fleet? A typical large organization may have 100,000 or more devices, many of them different makes, models, and firmware versions. So assembling a set of hashes for a good device is certainly no easy task.
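The lookup against a database of good values might be sketched as follows. This is a hypothetical policy check, not the verification service described in the talk: the `KnownGood` map, the component names, and the `Evaluate` function are all assumptions made for illustration, and the entries are assumed to come from an event log that has already been validated against a quote.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
)

// Digest is a SHA256 measurement taken from a validated event log.
type Digest [sha256.Size]byte

// KnownGood maps a component name to the set of digests accepted for it.
type KnownGood map[string]map[Digest]bool

// Entry is one measurement extracted from the event log.
type Entry struct {
	Component string
	Digest    Digest
}

// HashOf is a helper for building example digests.
func HashOf(s string) Digest {
	return sha256.Sum256([]byte(s))
}

// Evaluate returns a sorted list of policy violations. Only components present
// in the database are checked; each of those must appear in the log with an
// accepted digest.
func Evaluate(entries []Entry, good KnownGood) []string {
	seen := make(map[string]bool)
	var violations []string
	for _, e := range entries {
		allowed, tracked := good[e.Component]
		if !tracked {
			continue // not an entry this policy cares about
		}
		seen[e.Component] = true
		if !allowed[e.Digest] {
			violations = append(violations,
				fmt.Sprintf("%s: unknown digest %x", e.Component, e.Digest[:8]))
		}
	}
	for component := range good {
		if !seen[component] {
			violations = append(violations, component+": no measurement in event log")
		}
	}
	sort.Strings(violations)
	return violations
}

func main() {
	good := KnownGood{
		"firmware":   {HashOf("vendor firmware 1.2"): true, HashOf("vendor firmware 1.3"): true},
		"bootloader": {HashOf("distribution bootloader"): true},
	}
	entries := []Entry{
		{"firmware", HashOf("vendor firmware 1.3")},
		{"bootloader", HashOf("unexpected bootloader")},
	}
	for _, v := range Evaluate(entries, good) {
		fmt.Println("violation:", v)
	}
}
```

The code is the easy part. As the talk notes, the hard part is populating the database for every make, model, and firmware version in a fleet.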
On the other hand, there are entries in the event log that are much simpler to parse. For instance, Secure Boot state is a simple boolean, and on Windows, information about which drivers were loaded and whether code signing is enabled is also much simpler to parse. As you can see in the example on the slides, a specific security module was loaded in early boot, and hence we can infer that this security module was started.
So obviously it wouldn't be particularly appropriate to be at the Open Source Summit and talk about this unless there was a significant open source component to it. So I'm pleased to say the code that we are building all of this on top of is open source and is already available on GitHub. We decided to implement this in Go. Firstly, a bunch of our internal tooling is based on Go, so it's straightforward to do so. But also, this is tooling that is taking untrusted input. We are going to verify that information at some stage, but we're still taking untrusted input off the network, and so doing this in a memory-safe language, as opposed to doing this in C, is a more sensible thing to do. The situation right now is that the majority of TPM tooling is built around C. We've found multiple cases where TrouSerS, the TPM 1.2 stack that's used on Linux, has crashes that are not particularly difficult to trigger, and as a result it is probably also interestingly exploitable in various ways. On Linux we do still rely on TrouSerS for TPM 1.2 devices, because the kernel does not support multiple applications accessing the TPM simultaneously, so we use TrouSerS as a mechanism for multiplexing access. But for TPM 2 devices we use pure Go all the way down. We have in this repository a complete implementation of both the client side and the server-side verification. Tom will give a demo of that in a moment.
At the moment, to be clear, we have this already running internally and have about 25,000 machines providing attestations on a regular basis. So we've verified that this code works, and we've verified that this code scales. The actual clients and server we're not providing, because those are very tightly tied into our internal implementations and it's just not useful to provide that code, but we do have example clients and servers available in this pull request, and we are looking at building more interesting examples.
So for instance, we've largely been talking about this from an enterprise perspective, where you have clients attesting to a remote server. The reason we want a separate server is that if the client is compromised, you can't trust the client to tell you whether the TPM values are good or not, because an attacker could have compromised the client. That's sort of the point. But it doesn't need to be remote in the sense of over the internet. A completely viable way of doing remote attestation would be to run the verification server on, for instance, a phone, and then have your laptop attest to your phone when you boot it or on demand. That way you're able to do the attestation locally. The ideal long-term solution would involve all the information about good values: the values that the firmware has pushed into the TPM, and the values that correspond to the bootloader that your distribution has booted. If those were available on the internet in a defined format in a known location, you could produce a client that ran on your phone and grabbed that information. Then you could boot a laptop, and your phone could tell you in a verifiable manner that your laptop booted this firmware from this vendor and then booted this bootloader and this kernel signed by this distribution.
And that matters for people who have to worry about this scenario: you fly into a country, security people take your laptop away, and then they bring your laptop back. You now have a mechanism to determine whether your laptop is still running the firmware and the software you expect it to, without having to rely on an internet connection to do so.
We've aimed to build this in a way that makes it possible for people to build stuff of basically arbitrary complexity on top of it. This is very much a library implementation with example clients, as opposed to an implementation that is tightly bound to a particular client-server model. We have an implementation of event log verification and parsing, and will very shortly be landing an implementation of not just verifying the event log, but pulling out individual events and providing you with the information contained within those events. That means information about bits of firmware, but also, on UEFI systems, information such as: What did the Secure Boot database look like? What were the X.509 certificates within the Secure Boot database? Who did those chain back to? So you can verify that your system booted not just the software you expected, but that it booted with the firmware configuration that you expected. That will be landing in the very near future in the same repository. So I think we are going to do a very quick demo of the fact that this code actually works, as opposed to being a myth.
So as Matthew mentioned, this code at the current stage, with the current demo client in the pull request, demonstrates the very basic primitives. Here we're creating an attestation key. I can find that. And this creates the attestation key on the TPM, but there's no way to link this back to the identity of the device, the EK, the endorsement key, without performing an additional step called credential activation. I forgot to run the server, so I'll start that.
So credential activation is a very, very complicated procedure. Basically, the device asks the server for a challenge, and the server will generate a challenge, combine it cryptographically with the endorsement key and the attestation key, and then send this blob to the device. The device sends it to the TPM, and the TPM will only decrypt the credential if the attestation key is from that TPM and the endorsement key is that of the TPM. This allows us to prove that we are actually that device without using the endorsement key all the time. Fantastic. And the server can record this. The server records the fact that the AK successfully activated against the EK, and thus we can now use the AK as an identity key.
For clarity here, the server is able to verify that the EK chains back to the TPM manufacturer. At the moment, we only have infrastructure for verifying that the EK came from the TPM manufacturer, but the industry is gradually moving to a situation where you can get that guarantee from the device manufacturer. So you could buy a laptop and get a certificate that says this endorsement key belongs to the laptop with this asset tag or this serial number. That means you're able to, say, ship a machine to a remote location and never touch it. You just have someone at that location rack the machine and netboot it, and then the system can provide this certificate. You can verify that this was the machine you were expecting to show up on your network, and then you can provision that system without anything else. You've managed to establish trust in the system without having to touch it at any point yourself.
The final stage for what we have in this repo is performing an actual attestation. The on-system agent will run, collect the event log, collect a quote, and send that to the verification server.
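The idea behind credential activation can be illustrated with a deliberately simplified model. This is not the TPM 2.0 construction, which derives symmetric keys from a seed and the attestation key's name. As a stand-in, the sketch encrypts a secret to the EK with RSA-OAEP and uses a hash of the AK's public area as the OAEP label, so decryption succeeds only for the holder of the EK private key who also presents the same AK. All function names are hypothetical.

```go
package main

import (
	"bytes"
	"crypto/rand"
	"crypto/rsa"
	"crypto/sha256"
	"fmt"
)

// NewDemoEK generates a software key pair standing in for an endorsement key.
// A real EK private key never leaves the TPM.
func NewDemoEK() *rsa.PrivateKey {
	key, err := rsa.GenerateKey(rand.Reader, 2048)
	if err != nil {
		panic(err)
	}
	return key
}

// AKName stands in for the TPM's name of an attestation key: a hash of its public area.
func AKName(akPublicArea []byte) []byte {
	sum := sha256.Sum256(akPublicArea)
	return sum[:]
}

// MakeChallenge is the server side: encrypt a secret to the EK, bound to one AK.
// Simplification: the binding is done with the OAEP label.
func MakeChallenge(ekPub *rsa.PublicKey, akName, secret []byte) ([]byte, error) {
	return rsa.EncryptOAEP(sha256.New(), rand.Reader, ekPub, secret, akName)
}

// ActivateCredential models the TPM side: it holds the EK private key and
// releases the secret only if the blob was bound to the AK it has loaded.
func ActivateCredential(ekPriv *rsa.PrivateKey, loadedAKName, blob []byte) ([]byte, error) {
	return rsa.DecryptOAEP(sha256.New(), rand.Reader, ekPriv, blob, loadedAKName)
}

func main() {
	ek := NewDemoEK()
	akName := AKName([]byte("public area of the AK created on this TPM"))

	secret := make([]byte, 32)
	if _, err := rand.Read(secret); err != nil {
		panic(err)
	}
	blob, err := MakeChallenge(&ek.PublicKey, akName, secret)
	if err != nil {
		panic(err)
	}

	// The genuine device recovers the secret and returns it to the server.
	got, err := ActivateCredential(ek, akName, blob)
	fmt.Println("correct AK recovers secret:", err == nil && bytes.Equal(got, secret))

	// A different AK (for example, one made in software) cannot use the blob.
	_, err = ActivateCredential(ek, AKName([]byte("some other key")), blob)
	fmt.Println("other AK rejected:", err != nil)
}
```

When the device returns the recovered secret, the server learns that the AK it was shown lives on the TPM holding that EK, which is the association it records.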
The verification server then needs to check the signature on the quote; check that the quote signature is using a key that we expect, so an attestation key we expect; check that the nonce is correct, since there is a challenge-response to prevent replay; and check that the platform configuration registers match the values provided by the event log. Finally, if that is the case, then we know the event log is authentic, and we pull out information from the event log to make security decisions on. We do not, as of right now, have the code in the repository for pulling out information, but we have written it and it does work. Over the next few days we will be shopping it around and finally getting it committed into the open source repository. So we have, I was going to say a couple of minutes, but it is in fact literally a couple of minutes for questions, if anyone has any.
So I'm assuming that the client state itself gets hashed into the PCRs as well when you're doing remote attestation. So let's say someone was able to somehow load a virus, or someone downloaded a virus, and it somehow replaces the client code itself, or they find a zero-day-like ability to go under the radar of measurement or something.
So the entirety of the quote is based on information provided by the TPM. The client itself is not in a position to put any information into the quote. The TPM has no visibility of the code that's running on the host system itself. So if you're able to compromise the client after boot, then in the model we're describing, which is called a static root of trust, we don't have any insight into that at that point. However, for any quote you get from the TPM, we can guarantee that it has not been tampered with by a compromised client. So if you compromise the boot process and as a result compromise the client, then any quotes you get will indicate that the boot process was tampered with.
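The server-side checks listed at the start of this part of the talk (signature, expected key, nonce, PCR match) can be sketched in order. This is a simplified model: the `Quote` struct is not the TPM's attestation structure, `FakeTPMQuote` signs in software with an ECDSA key standing in for the attestation key, and the expected PCR digest is assumed to come from replaying the event log.

```go
package main

import (
	"bytes"
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/sha256"
	"errors"
	"fmt"
)

// Quote is a simplified stand-in for a TPM quote: the verifier's nonce,
// a digest over the selected PCR values, and a signature by the AK.
type Quote struct {
	Nonce     []byte
	PCRDigest [sha256.Size]byte
	Signature []byte
}

// NewDemoAK generates a software key standing in for an attestation key.
func NewDemoAK() *ecdsa.PrivateKey {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		panic(err)
	}
	return key
}

// DemoPCRDigest builds an example PCR digest from a label.
func DemoPCRDigest(label string) [sha256.Size]byte {
	return sha256.Sum256([]byte(label))
}

// quoteDigest hashes the fields that the signature covers.
func quoteDigest(nonce []byte, pcrDigest [sha256.Size]byte) []byte {
	h := sha256.New()
	h.Write(nonce)
	h.Write(pcrDigest[:])
	return h.Sum(nil)
}

// FakeTPMQuote models the device side: the TPM signs the nonce and PCR digest.
func FakeTPMQuote(ak *ecdsa.PrivateKey, nonce []byte, pcrDigest [sha256.Size]byte) (Quote, error) {
	sig, err := ecdsa.SignASN1(rand.Reader, ak, quoteDigest(nonce, pcrDigest))
	if err != nil {
		return Quote{}, err
	}
	return Quote{Nonce: nonce, PCRDigest: pcrDigest, Signature: sig}, nil
}

// VerifyQuote performs the server-side checks in order: signature under the
// expected AK, nonce freshness, then PCR digest against the replayed event log.
func VerifyQuote(akPub *ecdsa.PublicKey, q Quote, wantNonce []byte, replayed [sha256.Size]byte) error {
	if !ecdsa.VerifyASN1(akPub, quoteDigest(q.Nonce, q.PCRDigest), q.Signature) {
		return errors.New("signature does not verify under the expected attestation key")
	}
	if !bytes.Equal(q.Nonce, wantNonce) {
		return errors.New("nonce mismatch: possible replay of an old quote")
	}
	if q.PCRDigest != replayed {
		return errors.New("PCR digest does not match the replayed event log")
	}
	return nil
}

func main() {
	ak := NewDemoAK()
	nonce := make([]byte, 16)
	if _, err := rand.Read(nonce); err != nil {
		panic(err)
	}
	pcrDigest := DemoPCRDigest("expected boot chain")

	q, err := FakeTPMQuote(ak, nonce, pcrDigest)
	if err != nil {
		panic(err)
	}
	fmt.Println("fresh quote:", VerifyQuote(&ak.PublicKey, q, nonce, pcrDigest))
	fmt.Println("replayed quote:", VerifyQuote(&ak.PublicKey, q, []byte("a newer nonce"), pcrDigest))
}
```

Only after all three checks pass would a verifier go on to read security attributes out of the event log.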
So if that's your attack vector, the runtime state, that's not being verified. Basically, we've verified that the boot process and the device identity are correct, but not, in this model, the actual runtime state.
Is the event log also stored in the TPM? Or if not, how is it best protected?
So the event log is not stored within the TPM. It's stored in system memory, and an attacker could compromise and modify the event log. However, were an attacker to do that, we can replay the events in the event log in the same way that the TPM stored them originally. So basically we verify that when we hash the values in the event log, those match the value the TPM gave us. If those don't match, then the event log has been tampered with. If they do match, then even though it was stored in host memory and could have been attacked, we know that the event log is legitimate.
Oh, how does the event log pass between boot stages? The firmware gives the operating system a pointer to where it is in memory, and then the operating system handles that. For TPM 1.2, it just points you to memory. For TPM 2, it's hilariously more complicated in very annoying ways, but the kernel now does that for you as of 5.3.
So with regards to the user-mode security for Windows, you were talking about an endpoint agent, something that runs that you trust, that can be verified with ELAM, and the hash of that goes to the TPM and you can verify that. Is there any way that you can use this to sort of attest that you've got something that's monitoring your user mode on other platforms, you know, OS X or Linux?
Yeah, so fundamentally, it's a chain of trust, a chain of measurement going all the way from the boot stage up as far as you can. When it gets to the OS, it gets harder to do this, because more code runs, and if you don't measure code that runs and that code happens to be compromised, you can't be sure that the measurements that come afterwards are going to be authentic.
So typically what we try to do is measure as much as we can about the boot, the boot state, and the OS that's loaded, and then diversify our monitoring strategy. So in this case, we're using an on-system agent, Windows Defender, whatever, in addition to trying to use remote attestation. There is also the ability to use TPMs and remote attestation going far down into userland as well. You can extend additional PCRs with additional measurements for as long as you want, through the boot and beyond boot, and you can attest that as well. And I believe IMA on Linux does that.
So the Integrity Measurement Architecture on Linux, or IMA, basically does this. It will perform measurements based on a policy. It can measure kernel modules. It can measure individual binaries. It can measure configuration if you want it to. Also, at any point in the trust chain, we can push additional measurements in there. On the Apple side, Apple do not use TPMs. The closest equivalent is the secure element in the Touch Bar and later devices, so the T1 or the T2 chip. That doesn't have the same boot integrity measurement, and it doesn't conform to the Trusted Computing Group specification, so we're not able to use it in the same way.
Do you think we have time for one more question? Anyone?
Do you guys use the TrustedGRUB project, or did you guys do the bootloader part yourself?
So the TrustedGRUB implementation was a little bit complicated. That was for basically what's now called GRUB Legacy, GRUB 0.98. And also, GRUB is a GNU project, and the copyright is owned by the Free Software Foundation. The people who wrote the TrustedGRUB code never signed the paperwork to pass the copyright of their code over to the FSF, so it couldn't be directly incorporated into GRUB. So we rewrote it. GRUB as of now supports TPMs, but only on UEFI systems. It does not support this on legacy BIOS systems. It does support both TPM 1.2 and TPM 2 devices.
That was built on top of the verifier framework in GRUB, which was intended to allow you to basically verify applications, either by looking at the signature, or by measuring them, or by just comparing a hash to something in flash or something. So the TPM code hooks into the verifier framework and then pushes each verified event into the TPM.
So I think we're out of time. We'll be around for the afternoon, so if anybody does have any further questions, please feel free to just find us and ask them. Thank you.