I appreciate that, everybody. My name is Luke Hinds and we're going to be doing a talk on a new project called Keylime. I'll go into the ins and outs of what Keylime is and what it covers.

So, a quick intro. I'm a security engineer. I work at Red Hat in the Office of Technology. I've been around security for a good number of years now, mainly tooling and vulnerability management and research. I recently joined the Product Security Committee for Kubernetes, but I've done stuff in OpenStack and OpenDaylight and various open source security committees. I live in Wiltshire in the UK. It's a rural area. It's mainly software developers and farmers. That's about as much as you get there. I'm a keen runner as well, so if anybody wants to talk running, I could probably talk about that all night.

So let's define the problem. What is it that we're actually looking at here? Because without a problem you can't really have a solution, can you? Essentially we're talking about remote IoT trust. Now the key takeaway here is the remote part. With IoT, consider that devices are often in physically easily accessible places, so they can be tampered with. They're not within a secure place like a data centre or an office. A lot of the time we might be talking about units that are up in the roof or outside, situated in areas where it's very difficult to maintain a level of physical security. And once the physical security goes, so does the software security itself, because people can access that device.

So if we consider the security... I want to see if I can turn this round because I can't see my own slides. OK, so first of all we have physical protections. If we have a unit, it's not very economical to have a human watch over it all the time, so security guards, CCTV, all of these things are not going to work. There are various types of tamper prevention that you can use. Adhesives can be used to seal the box. You get special security bits, Torx.
These are like screwdrivers. I don't know whether this is a universal term, but typically we have a Phillips and a flat head, and then they have these sort of Torx bits, where there are these variations of screw heads. Again, they're not really going to provide any decent level of security. And then you have tamper monitoring, which is effectively where there are trip switches, so if you open a device, it will trigger an alert. Now the problem with physical protection schemes is that it's a game of cat and mouse. Eventually somebody's going to get hold of it. There's going to be a design flaw that wasn't originally anticipated, and somebody's going to be able to open the unit and bypass the security protections that you have. If we look at the history of safe breaking, it's just a continuous cycle of more security and then somebody circumvents that security.

Then we also have software security. We have mandatory and discretionary access control, so our standard permissions, and SELinux or AppArmor labels. And we can have cryptographic assurance, so we can sign objects. We can look at the hash, compare it, and then make a decision that somebody has not tampered with that particular object, to suggest a trust state. And then there's obviously integrity verification systems, like Tripwire and AIDE. There are many. We could go on all day about the various software security solutions that there are. But again, the problem is that the software trust resides either in memory or on disk. There's a private key that will be resident in memory or be on a disk, and so that can be accessed. It can be tampered with, it can be spoofed, and so forth.

And then the other aspect is that you're at the mercy of the lower levels of the stack. So what do I mean by that? If we take, for example, that we're talking about a remote device here, essentially you have a chain of trust from the firmware, the bootloader or the shim.
Then your initramfs, your kernel, your modules, your userland and your runtime. So again, if we think about a device that we've remotely instantiated and provisioned, you can really only see the latter stages of that instantiation. You can log in with a shell session, you can move around, you can change accounts. But you can't really get an implicit level of trust in the lower levels of the stack itself. You have to kind of place a loose level of trust in that.

So this brings us on to a hardware root of trust, aka trusted platform modules. Now I'm not really going to go into the deep ins and outs of TPMs. There's probably a few people here that may well know them even better than I do. So we're just going to do a 101 to bring those who are new to the technology up to speed. It's a specialized chip. It's not a crypto accelerator. It just has a very simple engine that can perform certain operations such as signing and hashing and so forth. It has an RSA key pair that is inaccessible to software. There's only a particular bus that you can connect over to make requests for various operations. There are actually multiple keys. There's an endorsement key, there's an attestation key, and you can create your own keys from those. But essentially the key part to consider here is that the private counterpart is siloed within that chip and it can't be accessed.

The TPM is able to hash critical sections of firmware and software. By hash we're talking effectively about just creating a cryptographic hash to show the state of that particular object. But the extra part that a TPM can do is it can make those hashes public, and it will sign them with the private key, which remember is physically inaccessible within the TPM chip itself. So using the public key you can verify that the hash list has been signed by a TPM and that it's not been tampered with. Because obviously if it had been tampered with, that would break the cryptography.
So that's effectively something that we call an event log. And with this event log being made public you can then do something called remote attestation. So you can remotely, outside of the device, look at the hash measurements, be sure that a TPM actually signed that hash list, and then verify the integrity of the system remotely.

So, some of the usages for a TPM. It's quite commonly used for disk encryption. If a drive was removed from a laptop, for example, the TPM would no longer be present, so whoever took it wouldn't be able to get the data from the drive. It's also used for password and key protection, and machine identification. It's been used in gaming to stop people cheating. And then, as said earlier, platform integrity, which is what we're interested in here. That's the particular use of the TPM that we're making in Keylime.

Okay, so what is Keylime? Keylime is a project that provides open source remote attestation. It was originally devised at MIT Lincoln Labs. They have a security team there. They came up with the white paper and the cryptographic relations that we use, and they put together some early code as a proof of concept. Since then it's become an open source project which various different people are working on.

So Keylime provides a measured boot. Again, the TPM will measure various artefacts such as the firmware, the bootloader, your initramfs, your kernel and your modules, and then that list of hashes is made public and you can verify that nobody has tampered with any of those particular objects. Now there's a bit more to it than that. There's an extend operation. It's a one-way hash, so hashes are concatenated together and then they're rehashed. This provides a level of assurance because it's very difficult to go back on a one-way function.

It also provides remote runtime attestation. In the kernel there's the Integrity Measurement Architecture, IMA. It's been something in the kernel since, I think, Linux kernel 2.6.
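The extend operation mentioned above can be sketched in a few lines. This is only an illustration, assuming a SHA-256 register that starts as all zeros; the artefact names and the `extend` and `replay` helpers are made up for the example and are not Keylime or TPM library calls.

```python
import hashlib

PCR_SIZE = hashlib.sha256().digest_size  # 32 bytes for a SHA-256 register


def extend(pcr: bytes, measurement: bytes) -> bytes:
    """New register value = H(old value || measurement digest)."""
    return hashlib.sha256(pcr + measurement).digest()


def replay(event_log: list[bytes]) -> bytes:
    """Fold a list of measurement digests into one final register value."""
    pcr = bytes(PCR_SIZE)  # assumption: the register starts as all zeros
    for digest in event_log:
        pcr = extend(pcr, digest)
    return pcr


# Hypothetical boot artefacts standing in for firmware, bootloader and kernel.
artefacts = [b"firmware-image", b"bootloader-image", b"kernel-image"]
event_log = [hashlib.sha256(a).digest() for a in artefacts]
good = replay(event_log)

# Swapping one artefact changes the final value.
tampered_log = list(event_log)
tampered_log[1] = hashlib.sha256(b"evil-bootloader").digest()
tampered = replay(tampered_log)

# So does replaying the same measurements in a different order.
swapped = replay([event_log[1], event_log[0], event_log[2]])
```

Because each value feeds into the next hash, a remote party that holds the event log can recompute the final value and compare it with what the TPM reports, which is the idea behind replaying the boot.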
And IMA is effectively a subsystem where, whenever a certain action occurs and a system call happens, so I have here BPRM, mmap, SELinux labels being changed, files being executed, there are various policies that you can set up. What will happen is that when that occurs the hash will be captured, it will be put into securityfs, and then the TPM will perform an extend on that hash, so it will build up that cryptographic hash tree again. And one of the things we do with Keylime is we can continuously monitor that list as it's populated and compare it to a whitelist of values that we consider good. So we have a golden state that we want the system to be in. We can then tell if somebody has remotely executed something that we don't have whitelisted, or they've changed something, or they've changed an SELinux label. Then we can know within a matter of seconds that the machine is being compromised and start to take an action.

We can also do encrypted payloads. What we do here is effectively monitor the machine, so we can monitor the boot and monitor the runtime with IMA, and then if the cryptographic state of that machine is as we expect it to be, we can unlock an encrypted payload on that machine. That could be some certificates, some secrets, perhaps a config file that has database password strings, any sort of sensitive data.

And we also have a revocation framework. When a node fails its state, there's a series of actions that you can take afterwards. I'll go into those a bit more. We've got some more details on those.

So this is the architecture. The way to consider this is that over to your right, we effectively have the remote domain. This is where our IoT devices are. And then over to your left, we have on-premise. These would be systems that you have within your local control, perhaps within your home network. First of all, we have the verifier.
So the verifier continuously monitors the integrity state of the agent, which runs on the actual node that we wish to monitor. This is where all of the cryptographic verification happens for the boot and runtime and so forth. And for anybody interested in TPMs, this continuously polls for a TPM quote, which is where we effectively request a quote from the TPM on the current cryptographic state of the machine. We have a registrar where we keep the public keys that the TPM manufacturers provide, which can be used to attest the signatures that we get back. And we also register the agent there. We have a simple database. And then we have a revocation service, which is the framework where we can kick off specific events should a node fail its integrity state. And then we can also interface with a certificate authority, for example CFSSL. So again, if a machine fails its state then we could revoke a certificate, which in turn would perhaps tear down all the TLS or IPsec connections.

And last of all, quite interesting. Some of you may have noticed in the middle it says HTTP. That's intentionally put there because we don't have any secrets at all that pass across this connection. This can actually be open, because effectively all that comes back is a TPM quote with a nonce. If somebody tried to tamper with that it would break the cryptography and the node would be seen as failed. So we have no reason to protect that connection at all.

Okay, so it can be deployed in many different architectural models. A single site to a single node or device. A single site to many, many different devices or nodes. A multi-site, multi-node setup, so you could connect multiple data centres together and then have a whole many-to-many relationship of nodes. It works very well within multi-tenant, within a cloud scenario where I would be a cloud consumer.
I would have a workload that has a sensitive angle to it and I can effectively ask a cloud provider, can I trust your hardware? So I can remotely attest their hardware. If I can trust it then I can schedule my load to execute on that particular hypervisor. And multiple parties can attest a single node, so you could effectively have a user attest a machine and then the provider of that machine attest the machine as well.

Okay, so let's have a look at some of the use cases. The first one is where we actually bootstrap the machine and then we tag on an encrypted payload. It's not mandatory, but we're going to tag one on. So again, at the bottom we've got our machine that we're monitoring, and this is running the Keylime agent. This has a TPM chip. Then at the top we have the verifier, which performs the cryptographic verification, and then we have the registrar, where we register the node and keep the public TPM keys. And then over to the far right here we have our user, and they're going to use something called the Keylime tenant, which is effectively just a CLI application that we provide. But there are REST APIs for all of this as well, so you could develop your own system to integrate with Keylime.

What's going to happen is, using this Keylime tenant, the user is going to create a key. Just checking on my keys. Okay, I don't know why. I've brought it out of full-screen mode because my keys disappeared. I've got my keys back. Right, so effectively what's going to happen is we're going to create a key, and this is called the bootstrap key, and it's going to be cryptographically split into two pieces. We're going to call these V and U. First of all we're going to delegate to the verifier that we want to monitor the integrity of a machine, so we're going to send the verifier the V counterpart of the key that's been split into two. Then we're going to send the U half to the device itself, which is running the Keylime agent.
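As an illustration of splitting a bootstrap key into U and V halves, here is a minimal sketch. The talk doesn't say how the split is done, so the XOR scheme is an assumption, and `split_key` and `combine` are hypothetical helper names rather than Keylime functions.

```python
import secrets


def split_key(bootstrap_key: bytes) -> tuple[bytes, bytes]:
    """Split a key into two shares; either share alone reveals nothing useful."""
    v = secrets.token_bytes(len(bootstrap_key))          # random share for the verifier
    u = bytes(a ^ b for a, b in zip(bootstrap_key, v))   # share sent to the agent
    return u, v


def combine(u: bytes, v: bytes) -> bytes:
    """Recombine both shares into the original key (XOR is its own inverse)."""
    if len(u) != len(v):
        raise ValueError("shares must be the same length")
    return bytes(a ^ b for a, b in zip(u, v))


bootstrap_key = secrets.token_bytes(32)  # hypothetical 256-bit bootstrap key
u, v = split_key(bootstrap_key)

# The agent holds only u until the verifier releases v.
recovered = combine(u, v)
```

The point of a split like this is that the agent's half is useless on its own, so a device that fails its integrity check never sees enough material to decrypt the payload.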
We're then going to ask the verifier to perform an integrity check, so it's going to request a TPM quote from the agent, which is then going to communicate with the TPM itself using the TPM2 software stack. What's going to happen is, if that integrity is shown to be sound, in that we know that nothing has been tampered with and it cryptographically checks out, then the second part of the key is going to be provided to the agent, who can then put the two together and has the bootstrap key to be able to decrypt the payload. Now this payload could be something that we send over the wire, or it could be stored within an OS image, so it could be embedded into the image itself, whether that be an ISO, QCOW2 or whatever format you have. So the keys are recombined by the agent, and then it's able to execute the payload and safely have the secrets delivered to the device. If this device failed the verification, for example, then the counterpart of the key is not going to be made available to the agent, and the agent is not going to be able to do anything with that information. So effectively, if somebody hacked the machine, they're not going to be able to get hold of your secrets. So that's the first use case that we have, which is an encrypted payload.

The second one is what I spoke about earlier, which is continuous remote attestation. For this one what we have is a whitelist. Let me give you an example so you can see what I'm talking about. So this is a whitelist. It's pretty simple. We've got a hash and then we have the POSIX path to the file. This was generated by unarchiving an initramfs of a stock operating system. So as I said, we've got the hash and the file itself. And this is IMA, the Integrity Measurement Architecture. It stores this on securityfs, and again there are a few more labels, but effectively we've got a hash and then we have the POSIX path to the object that's been measured.
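Since both lists described above boil down to a hash plus a path, the comparison between them can be sketched roughly as follows. The line layouts are assumptions made for the example: a whitelist line is taken to be `<hash>  <path>`, and a measurement line is taken to have five fields ending in `<algo>:<hash> <path>`. The helper names and sample data are hypothetical.

```python
import hashlib


def parse_whitelist(text: str) -> dict[str, set[str]]:
    """Map each path to the set of hashes considered good for it."""
    allowed: dict[str, set[str]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        digest, path = line.split(maxsplit=1)
        allowed.setdefault(path.strip(), set()).add(digest)
    return allowed


def parse_measurement(line: str) -> tuple[str, str]:
    """Assumed layout: pcr template_hash template_name algo:digest path."""
    _pcr, _template_hash, _template_name, filedata, path = line.split(maxsplit=4)
    return filedata.split(":", 1)[-1], path.strip()


def check(measurements: list[str], allowed: dict[str, set[str]]) -> list[str]:
    """Return the paths whose measured hash is not in the whitelist."""
    violations = []
    for line in measurements:
        digest, path = parse_measurement(line)
        if digest not in allowed.get(path, set()):
            violations.append(path)
    return violations


# Made-up sample data; the hashes are of placeholder strings, not real files.
ls_hash = hashlib.sha256(b"contents of /usr/bin/ls").hexdigest()
sh_hash = hashlib.sha256(b"contents of /usr/bin/sh").hexdigest()
evil_hash = hashlib.sha256(b"something unexpected").hexdigest()

whitelist_text = f"{ls_hash}  /usr/bin/ls\n{sh_hash}  /usr/bin/sh\n"
template = "0" * 40  # placeholder for the per-entry template hash
measurements = [
    f"10 {template} ima-ng sha256:{ls_hash} /usr/bin/ls",
    f"10 {template} ima-ng sha256:{evil_hash} /usr/bin/sh",
    f"10 {template} ima-ng sha256:{evil_hash} /tmp/dropper",
]

violations = check(measurements, parse_whitelist(whitelist_text))
```

In this sketch a modified binary and a file that was never whitelisted are both flagged. A check like this is only meaningful if the measurement list itself is trusted, which is why the list is also extended into the TPM.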
Now what happens is, every time one of various syscalls is made, IMA will update this list, so it will generate a new hash, but that hash will be extended by the TPM, so you have a hardware root of trust. Now using these two we have a point of comparison between the current state of the machine and our whitelist, the state that we expect the machine to be in.

So as we can see, again we have the Keylime tenant, our CLI application, and we've got the whitelist. What we do is we send the whitelist to the verifier. Remember the verifier is on premise. We're not sending it out into the dangerous world where the agent resides. So we send this to the verifier. The verifier generates a nonce so that we don't have any susceptibility to replay attacks and so forth, and then it requests a TPM quote from the device. The device will return the quote, and then the verifier will perform that cryptographic comparison between the expected state and the current state. Now if that fails then there's a series of actions that we can take, which we'll look at next.

But just to refresh: we have the whitelist, which is the golden state. We have IMA, which populates a list based on the current state, updated in real time every time an event occurs. And Keylime remotely attests the system state from IMA against the golden state, and that happens continuously. It's a configurable interval, but I think the default for our poll is every two seconds, and one verifier can do that for thousands of devices. It's very light traffic we're talking about. It's a very small GET request. We did do some benchmarking and we took it up to 2,000 devices against one verifier.

Okay, so we'll look at the revocation framework. What we're talking about here is effectively where a device fails its trust state.
Again we have the verifier at the top, and then at the bottom we have the Keylime agents, and the difference is that this connects into a certificate authority. What happens when a device fails its state is that the verifier will send out a revocation event to all of the other nodes that are alongside the failed node, and it will tell them to perform some local actions. We'll look at what local actions are in the next slide. At the same time, should a node fail, remember we had that bootstrap key that we created. That could be part of a certificate authority, so a request could be made into a certificate authority to revoke the certificate. Then if you had built up a TLS structure or some IPsec tunnels based on that certificate authority, that node would effectively be cut off.

Okay, so let's have a little bit more of a deep dive on what the revocation framework is. It's a custom framework, so you can effectively come up with anything. I mean, the world's your oyster. If you can write some simple Bash or some Python, you script it yourself, and as I say, anything that you can script can be kicked off locally on the machines. A good example would be: a node fails, the verifier sends out a revocation event, which is signed so that you know that it's the actual verifier that generated that revocation event, and you could tell all of the local machines apart from the failed machine, "this machine has failed, knock out its entry from authorized keys". That's something that would be relatively simple to script. Another one would be, again, a node fails and the verifier calls your certificate authority to make a certificate revocation. And again, the example that I used earlier, this would invalidate all of the TLS connections, and then effectively you'd cut the node off because it's been compromised.

Okay, a little bit more about the project itself. We've had some really nice organic growth. People have found the project from various Google searches and they've come along,
they've shown an interest, and they've actually turned into people that are contributors. So we're kind of a multi-vendor project. There are a couple of independent developers that have come along and started working on the project as well. What you can see here is a kind of auto-generated metric of a GitHub project's current state. As you can see, there's an increase in year-on-year commits. We've got a young but established code base developed by a large development team. The first commit was made in October 2016, which is when MIT uploaded the code to their GitHub repository, and then it says the most recent one was an hour ago.

We're very welcoming and friendly to people that want to come along and get involved. You don't have to be a security expert. You don't even have to be a developer. We need people that just try out the solution, tell us how to improve it, perhaps write documentation. We're very open to any help that we get. We do mark issues as good first issues, and this has attracted some people to the project. And we have a contributions guide which will tell you exactly how to make a pull request and what sort of issues we need help with. We are a friendly community. I like to think we're a welcoming community.

Since the project was opened by MIT we've worked on trying to have as much continuous integration and automated testing as we can. Every time somebody makes a patch we instantiate a container which runs a TPM emulator, and then all of those scenarios that I've just outlined, where a machine is attested and fails and so forth, we carry out functional equivalents of those tests within a container. If the test fails then Travis will mark the pull request as failed, and the developer will need to fix it, do a git commit --amend and send up the patch again, and we'll retest it. We are also in the midst of doing the same with all of our upstream dependencies. We use the TPM2 software stack. There's a software stack, a set of tools and a resource manager
that they develop, and they have various versions and of course their master branch, so we're going to start testing against that so that we can capture any breakages as they happen. We have all our documentation auto-generated by Sphinx and rendered on Read the Docs, and we're also going to set up build testing.

As far as coverage goes, at the moment we're working on a lot of the standard Linux distributions, so Fedora, Ubuntu and Debian. The Raspberry Pi Buster release has recently been verified. We're also open to working with as many distributions as we can, so if somebody came forward and they wanted to get it working on X, we as a community would support them. We'll discuss things over an issue, we'll troubleshoot it with you, and so forth.

We currently have weekly meetings. For every meeting the agenda is a GitHub issue. That way we can easily link into issues and pull requests, and do an @ mention of a particular person and so forth, and that helps us easily track everything. We meet once a week, Wednesday at 1500 UTC. We have a Gitter channel where we meet and discuss the various items that we need to go through. This Gitter channel is also there 24/7, so if somebody was trying to achieve something with a system or something didn't work, you'd be able to jump in there. Quite a few of the core developers hang out on that channel and you'd be able to get support. That's the other key thing that we wanted for this project, really: that when new people come along and try to get it to work, and something blows up and they get an exception, they have support, people that will rally around and help them to get it to work.

So what's coming next? At the moment we're working on vTPM support. A vTPM is a virtual TPM, so effectively a virtual TPM could be rendered within a virtual machine or a container. Now the key thing with a virtual TPM is that you don't have that hardware root of trust, so effectively the keys for the vTPM are stored on disk. But vTPMs are very
useful because you can create them at scale, and every container or virtual machine can run its own local TPM. So one of the things we've been looking at doing is extending that hardware trust from the hardware TPM into the virtual TPM. At the moment we're working on this with Boston University and one of the professors from MIT that originally worked on the project. We're effectively going to pool all of the quotes together, build them into a Merkle tree, and then have one central hardware quote request to the TPM, which will effectively allow us to pool quotes for thousands and thousands of devices into a single operation. Because one of the things with hardware TPMs is that they're not designed for handling multiple requests. They typically expect to just work with the one bare-metal machine themselves. So this would allow us to have mass scale, and it extends the cryptographic trust from the hardware TPM to the vTPM.

The other thing we're working on at the moment: Keylime is developed in Python and we're porting the agent to Rust. The agent is the part that runs on the remote machine, so that's in a more hostile environment. We're going for Rust quite simply because it's very performant, there's no runtime garbage collection, and for the security. Rust has a very strict compiler around memory safety and thread safety, so you tend to deal with a lot of your security technical debt when you're actually trying to get your code to compile.

So last of all, just to round up. We're a young project. We're looking for people to get involved. Anybody. You don't have to be a security or crypto expert or even a developer. Architects, users, people that have got TPMs, people that are producing their own boards, anybody. We welcome you to come along and get involved. We have a website where you can get hold of everything easily, like our documentation, find out where the repositories are and how to get a simple system up and running, and then as mentioned earlier we have a
community chat channel where you can jump in 24/7 and ask any questions. So, talking of questions, we have a few minutes, should anybody have anything they'd like to ask. Sure. You're probably going to need a mic.

Is this one good? One, two. Thanks for the talk. The connection to the TPM is usually via I2C on the board, so that would be subject to local attacks. Is there some way to authenticate that, if I have a trusted boot or something on my SoC?

This part is looked after by the TPM software stack. They have a built-in session authentication system, so we're quite lucky. What we do is effectively wrap around the command set that they provide, and that allows us to put in commands to create a session. They've got lots of things in there to prevent dictionary attacks and replay attacks and stuff like that. So luckily, because we sit on top of a quite actively developed stack that provides a resource manager, a set of tools and the software stack itself, the TSS, that's actually all part of that system. Does that answer your question? Someone else? Sure.

Thank you for sharing these community insights, that was very interesting. One question: who is sponsoring the CI infrastructure, and who is maintaining and operating the pipeline?

Nobody at present. It's a free Travis account. They provide a free account for open source projects, and I tend to set it up myself. So I sponsor it, but not with cash, just time. Okay, sure.

In the beginning you mentioned a few times that the TPM is measuring various code at boot. I'm a bit surprised, because I thought the TPM is not doing DMA and controlling all the buses. So how exactly is the TPM measuring anything?
Effectively what happens is this. The TPM never makes a decision on the trust state. All it does is sign the object itself. This is where I'm not a TPM expert, but from what I understand there's a core root of trust, a CRTM, which is like a seed. It will take the next part of the boot chain, whatever that might be, say the firmware, and it will take a hash of that, and it will add the two hashes together, so it will concatenate them, and then it will sign those, and then the next part will be measured. And that will continue until you have a complete one-way hash function, which will allow you to effectively replay the boot. Is that what you were asking about, or something around the communication?

So the TPM is not the thing on which the security depends, but rather another component, which reads all the data and feeds it to the TPM. So even if the TPM is working correctly, you have no guarantee that the rest of the system hasn't been tampered with, because the rest is reading the data.

So within the context of Keylime there are two parts that feed the TPM. One of them is IMA, which is a Linux kernel subsystem. I wouldn't really be able to tell you about the attack vectors in there, but that code has been around for, I think, close to nine or ten years now, so it's pretty well audited. That came out in kernel 2.6, I think. So that's IMA, which measures the runtime. What will happen is, if IMA senses a TPM, then it will send its measurements into the TPM, into a platform configuration register, which is a particular part of the TPM where a hash is stored. Now if it's the boot, then there is in fact a shim which can do the same thing. If a TPM is present then the shim will extend its measurements into the TPM itself. Around the protections on that I wouldn't be able to comment. I'm relying on the maintainers and the people that are developing that, so we're more of an application that's sitting on top of
that software stack. But it's very widely audited, I believe, so I'd be fairly confident in the security of that. No exploits have been reported since TPM 1.2, which was deprecated quite a long time ago, and I think that was a case where somebody had an oscilloscope and a soldering iron, that sort of attack. No worries. Anybody else have a question? Sure.

So you have been talking about having a certificate for the key in the TPM. On many embedded systems you also have certificates and keys for things like VPN tunnels or something like that. Is managing and renewing certificates of that kind also in scope for Keylime, or is that something that another component would do?

That would be managed by a certificate authority, so with Keylime that's not something that we do at present. All we would be able to do is effectively tie the certificate into a CA, in the sense that if the machine is compromised you can revoke and invalidate that certificate. You could do an encrypted payload which would have your SSH keys, your TLS certificates and so forth, so if the trust state of the machine is good then the second part of the key will be released, which would allow the device to unlock its secrets. So you could do a follow-up provision. That's quite possible, where you could restage some more secrets, but again you'd ensure a non-compromised state on the machine first, and you could tie that into a CA as well. Yep. Okay, anybody else?

Cool. Okay, so thanks everybody for your help. Sorry about me having to keep bending round. I couldn't see the slides on my laptop, so that's why I had to keep craning my head round. I'm here all week, so do come and grab me if anybody wants to learn some more details or find out a bit more about the project. I'm also here for the Linux Security Summit, where I'm hopefully going to do a demo of Keylime. We'll do a live demo if we can. So thank you.