My name is Eric Richter, and I work for IBM. I've been there for about a year now, which is roughly when we started working on this endeavor and when I really started getting spun up. So this is a bit of a journey through the decisions we had to deal with for this particular component within our trusted boot stack. I know, sorry, trusted boot again, which means you're probably going to hear a lot of information that you've already heard before, so I'll try to blaze through some of that, and the coffee might help.

So, a quick overview. I have some background information, again a quick overview of both the TPM and trusted boot. Then I'll get into Petitboot, what it is, and what we needed to do to make it a trusted boot loader. All the links to the software mentioned in here, and some extras, are on a slide at the end. I know some of you like to take pictures of the slides with the links; they'll all be at the end.

Starting off, one of the core pieces of all of this is the Trusted Platform Module, the TPM. For this presentation at least, the only thing we really care about is the extend operation. The TPM has a set of platform configuration registers, PCRs, which can't be written directly. The only thing we can really do with one is give it some data, and it'll take the previous value, concatenate the new data onto it, and hash it all together.

This is where it's useful for our trusted boot stack. For each component, say some firmware component A, the first thing it does before it transfers to the next component is measure it, log the event somewhere in memory (I'll talk a little more about how that works in our case later), extend the PCR, and then transfer execution. This goes on through the rest of the stack, so eventually we end up at our target, D.
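The extend operation and the measure, log, extend, hand-off sequence just described can be sketched in a few lines of Python. This is a minimal illustration, not the firmware's code: it assumes a single SHA-1 PCR that starts as 20 zero bytes (as on a TPM 1.2), and the names `extend`, `boot_chain`, and `replay`, along with the component images, are made up for the example. `replay` shows how a verifier could recompute the PCR value from the event log alone.

```python
import hashlib

PCR_SIZE = 20  # SHA-1 digest length; assumption: a TPM 1.2 style SHA-1 PCR


def measure(image: bytes) -> bytes:
    """Measure a component by hashing its image."""
    return hashlib.sha1(image).digest()


def extend(pcr: bytes, digest: bytes) -> bytes:
    """New PCR value = hash(previous PCR value || new measurement)."""
    return hashlib.sha1(pcr + digest).digest()


def boot_chain(components):
    """Each stage measures the next one, logs it, and extends the PCR."""
    pcr = bytes(PCR_SIZE)  # PCR starts out as all zeros
    log = []
    for name, image in components:
        digest = measure(image)
        log.append((name, digest))  # event log kept in ordinary memory
        pcr = extend(pcr, digest)   # the only way to change the PCR
        # ...execution would be transferred to `name` here
    return pcr, log


def replay(log):
    """Recompute the expected PCR value from the event log."""
    pcr = bytes(PCR_SIZE)
    for _name, digest in log:
        pcr = extend(pcr, digest)
    return pcr


if __name__ == "__main__":
    # Hypothetical component images, for illustration only.
    stack = [("A", b"firmware A"), ("B", b"firmware B"),
             ("C", b"firmware C"), ("D", b"target kernel")]
    pcr, log = boot_chain(stack)
    print("log matches PCR:", replay(log) == pcr)
```

Because each extend hashes in the previous value, changing, dropping, or reordering any log entry produces a different final value than the one held in the PCR.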
Now by the end of this, we should have a log saying, okay, we measured firmware components A, B, C, D, and of course there may be some extra components, not just these. We can take that log, replay it, and compare the result against the value in a particular PCR in the TPM. If they match, then we know the log is an accurate record of what was measured.

All right, enough of that background information. This is what it looks like on OpenPOWER. The part that we really care about, though, is Petitboot, the part that sits right in between the earlier firmware and our target OS: Ubuntu, Fedora, RHEL, and so on.

So what is Petitboot? First, before I get into that, I'm going to pick on EFI and GRUB a little bit. In the case of EFI, say we want to find where our boot loader is living, or maybe we just want to directly access the kernel. We're going to need some kind of disk driver to talk to the disk, something that will speak ATA or similar. If we want to netboot, we're going to need a network driver, and a TCP/IP stack. Our boot loader might want that too. We'll probably want to actually see what we're doing in the boot loader, so we'll throw a GPU driver in there as well, something basic. And USB for input, and SCSI, iSCSI, PCI, DHCP, FAT32. That's kind of a lot.

So this is the kind of wheel we've been reinventing for this firmware. What we figured with Petitboot is, why reinvent it? We already have all of this in the kernel. So why don't we just use the kernel and reuse all these existing stacks as our boot loader? And that's what we ended up with. We use kexec to do the actual booting, to swap from one kernel to whatever we happen to be booting. This also lets us reuse the drivers. We don't have to reimplement FAT32. We don't have to reimplement any disk driver. It also provides us a user space, and this is an interesting part that is covered a lot more in a different talk.
Also, full disclosure, I shamelessly stole those slides from Jeremy Kerr. If you're interested in this, I do recommend looking up his talk, Petitboot: doing interesting things in your boot loader. That covers a little more of the coolness of what you can do with this.

But anyway, what this means is that our actual boot loader part, the part that investigates the drives and tries to find bootable objects, can all be written as a regular user space application. We don't have to work in some kind of weird 16-bit mode, shove it in the MBR, or do anything strange. We can write it like any other application, which also means we can test it on our regular booted OS. We don't have to do anything funky like rebooting or using a VM to test this.

It also gives us platform independence. Any architecture that the kernel supports, and specifically supports kexec for, we can use this on, whether it be x86, Power, ARM, and so on.

It's also well-tested, sort of. I mean, as we've seen in previous talks, there are of course bugs in the kernel. But just as often as those are found, they're patched, and as long as you keep your firmware patched in the same way, they can be mitigated. That means we don't necessarily have to completely audit every tiny little bit of our boot loader anymore to make sure there are no bugs or overflows. There's already, for the most part, a good community looking for these exact things.

So, I kind of lied, there's a little more background information. For those of you unfamiliar with kexec, say we have some kernel A. There's one syscall to load the next kernel into memory somewhere. For those of you who are familiar with this, you'll see that I have kexec_file_load there rather than kexec_load. I'll talk about why we use one over the other later on, but for this case, we are using kexec_file_load and not kexec_load. Next, we use the reboot syscall with a special flag.
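The two steps just mentioned, staging the next kernel with kexec_file_load and then calling reboot with the kexec command, can be sketched from user space roughly as follows. This is a hedged illustration in Python via `ctypes`, not what Petitboot or kexec-tools actually run: the helper names are hypothetical, the syscall-number table only covers x86_64 and 64-bit POWER, and a real tool does considerably more (for example shutting down services and syncing disks first). Both calls require root, and the second one does not return if it succeeds.

```python
import ctypes
import os
import platform

# kexec_file_load syscall numbers differ per architecture. Only the two
# platforms mentioned in the talk are listed; others are unsupported here.
KEXEC_FILE_LOAD_NR = {"x86_64": 320, "ppc64": 382, "ppc64le": 382}

# Command value from <linux/reboot.h>: jump into the previously staged kernel.
LINUX_REBOOT_CMD_KEXEC = 0x45584543


def file_load_args(kernel_fd, initrd_fd, cmdline):
    """Build the five kexec_file_load arguments (hypothetical helper)."""
    buf = cmdline.encode() + b"\0"  # cmdline_len must count the trailing NUL
    return (kernel_fd, initrd_fd, len(buf), buf, 0)  # flags = 0


def stage_next_kernel(kernel_path, initrd_path, cmdline):
    """Step 1: hand the kernel and initramfs file descriptors to the kernel."""
    nr = KEXEC_FILE_LOAD_NR[platform.machine()]
    libc = ctypes.CDLL(None, use_errno=True)
    kfd = os.open(kernel_path, os.O_RDONLY)
    ifd = os.open(initrd_path, os.O_RDONLY)
    try:
        k, i, n, buf, flags = file_load_args(kfd, ifd, cmdline)
        rc = libc.syscall(ctypes.c_long(nr), ctypes.c_int(k), ctypes.c_int(i),
                          ctypes.c_ulong(n), ctypes.c_char_p(buf),
                          ctypes.c_ulong(flags))
        if rc != 0:
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err))
    finally:
        os.close(kfd)
        os.close(ifd)


def jump_to_staged_kernel():
    """Step 2: reboot with the kexec command; does not return on success."""
    libc = ctypes.CDLL(None, use_errno=True)
    libc.reboot(ctypes.c_int(LINUX_REBOOT_CMD_KEXEC))
```

Note that the kernel receives file descriptors rather than pre-built memory segments, so it reads the files itself.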
That jumps execution from the first kernel into a little intermediary called purgatory, which sets up the next kernel and then jumps to it. So now we have kernel B running, essentially without having to do a full re-initialization of the hardware or firmware. And then, hopefully, it should clean up and reclaim that memory.

So what do we have to do to add trusted measurements to this? It's a user space application, right? We just have a small thing that's running. Why not just take the measurements in there, right before we boot? We have a menu that looks similar to GRUB; I wish I had included a screenshot here.

Except there are some problems. We need some way to speak TPM. It's great that we have a /dev/tpm device, since we can just use the TPM device driver. But we might have to reimplement the entire stack, or at least figure out how to extend. And I don't want to write that part, personally.

We also need some kind of write access to the event log. Again, the event log was started in the earlier parts of the firmware, so we need some way to access it. For those of you that were at the BoF yesterday, this is one of the major problems that we ran into: normally this event log is supposed to be read-only at this point, at the kernel point. So either we'd have to implement some kind of write-through to it, or find some other way to access this memory. That got a little sketchy.

Finally, there's a full shell available to you in Petitboot, which is of course probably the greatest idea you can ever have for your boot loader when you want to keep it secure. Eventually, we're looking to lock that down, but for now we're not particularly worrying about it. But of course, you can always just exit to the shell and manually craft a kexec. That means the Petitboot application that's supposed to do the measurement has been circumvented.
And later on, we can just add in a new measurement and pretend that we did what we were supposed to do. Which is bad.

All right, so that didn't work. Let's maybe change kexec-tools instead. There's a little binary that actually kicks off and does all the syscalls for us. So let's modify that and have it do the measurements. That fixes the manual kexec issue, but the other two issues are still there.

So, all right, back to the drawing board. What do we need? Let's step back, I guess. We need a TPM driver. All right, that's handled by the kernel; we don't really need to worry about that one. Some way to interface with the TPM, okay, slightly annoying, but not impossible. A way to measure the next component, obviously; that's what this whole thing is for. A way to log the measurement; all right, that was one of our tricky spots. And we need this to be upstreamable. The maintainers for our firmware don't particularly like carrying patches, especially when they have to be backported every single time there's a new release. So obviously, we want this to somehow make it into the upstream kernel.

If only we had an integrity subsystem. We do. IMA, the Integrity Measurement Architecture, can handle talking to the device driver. It can handle extending the data; it already does that. It can handle measuring the next component, and I'll get into that in a minute. It already has a log of its own. And it can handle some small modifications.

Last background slide, I promise. Quick overview of IMA: it measures and appraises files based on a set policy. So here's a really simple policy. All this does is measure all files read by root. And that, I'm sorry, that's a little small. That's essentially what the log looks like. We have which PCR it was measured into, the hash of the template, the template used, and so on. And you can see which files were measured on the far right, for example init, busybox, and so on.

It's not a perfect solution. There are some changes that we will definitely still need.
Like including a hook to actually do the measurements that we care about, probably one of the more important ones. It needs support for extending more than just one PCR. By default right now, it only measures into one PCR, PCR 10, which isn't particularly useful when we want some kind of fine granularity with our measurements. Maybe we want to seal a key to a particular value.

Next, we need to preserve the IMA log. Great, there's a log. But as soon as we kexec, it poofs along with the rest of the original kernel's memory. So somehow we need to keep that. At least this way we can either merge it with the other log or find some other way to validate it.

And in our case on Power, we use a device tree. The kexec_load syscall gives user space a way to pass a device tree through, but kexec_file_load doesn't. So that's a bit of an issue that we're running into now. And how are we going to end up measuring it?

Okay, so let's tackle problem number one. We need to measure the kernel and initramfs. Well, IMA measures things based on a policy, so if we have a file loader, then we can just include a hook. And this has already been done. Mimi Zohar included a common file loader for the kexec syscalls, which adds an extra hook so we can measure both the kernel and the initramfs right as they're being loaded, and we're all set. And the policy would look something like this.

The next issue is extending other PCRs. In a boot loader, maybe we want to measure into PCR 4 and 5, not PCR 10 as the default configuration does. So again, sorry for the really small log image, but the first field is which PCR it was measured into. The solution is just to add flexibility to the policy. Rather than have some kind of really strange Kconfig option to say, maybe I want these hooks to go into this PCR or whatever, just add it to the policy.
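To make the policy-driven approach concrete, here is a small Python sketch that formats IMA measure rules for the two kexec hooks, optionally with a per-rule PCR, and splits a measurement-list line into its fields. It is an illustration under assumptions rather than the policy shown on the slides: the helper names are hypothetical, the rule syntax (`func=KEXEC_KERNEL_CHECK`, `func=KEXEC_INITRAMFS_CHECK`, `pcr=`) follows mainline IMA policy syntax, and the sample log line uses placeholder all-zero hashes and a made-up path.

```python
# IMA hook names for files loaded through kexec_file_load.
KEXEC_HOOKS = ("KEXEC_KERNEL_CHECK", "KEXEC_INITRAMFS_CHECK")


def measure_rule(func, pcr=None):
    """Format one IMA 'measure' rule, optionally naming the PCR to extend."""
    rule = f"measure func={func}"
    if pcr is not None:
        rule += f" pcr={pcr}"
    return rule


def kexec_policy(pcr=None):
    """Policy text that measures the kexec'd kernel and initramfs."""
    return "\n".join(measure_rule(hook, pcr) for hook in KEXEC_HOOKS) + "\n"


def parse_log_line(line):
    """Split one ASCII measurement-list line into named fields."""
    pcr, template_hash, template, file_hash, path = line.strip().split(maxsplit=4)
    return {"pcr": int(pcr), "template_hash": template_hash,
            "template": template, "file_hash": file_hash, "path": path}


# Placeholder entry: the hashes are dummy zeros, not real measurements.
SAMPLE_LINE = "11 " + "0" * 40 + " ima-ng sha1:" + "0" * 40 + " /boot/vmlinux\n"

if __name__ == "__main__":
    print(kexec_policy(pcr=11), end="")
    print(parse_log_line(SAMPLE_LINE))
```

On a running system the policy text would typically be written to `/sys/kernel/security/ima/policy`, and the log read back from `/sys/kernel/security/ima/ascii_runtime_measurements`; the sketch deliberately touches neither.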
So for the example I had before, take that example policy and just stick a pcr= on the end. And now, as you can see, if you can read that (again, sorry for the small font), the leftmost value now says 11 rather than 10. And if you were to check your PCRs, you would see that 11 is now being extended with whatever the policy was set to measure. This was upstreamed in 4.8, so that should be on its way. I'm going to, of course, give myself a shout-out: it was my first patch. Thank you. Thank you, Mimi, for working with me on that.

Finally, preserving the IMA log: somehow we need to validate the PCRs on the other side. The solution's not as easy. We have to somehow take some memory and make sure that it lasts through the kexec. So what we end up doing is serializing the IMA log, storing it, or rather a reference to it, in the kexec image, doing the reboot, and then IMA checks for it on init. The next slide is, unfortunately, just a blob of text.

It's a little complicated for Power. Since we're using the device tree, we end up creating nodes to say, okay, the memory for the IMA log exists from here to here. And if IMA finds these nodes, then it loads the log from there. What that ends up doing is appending to the log; rather, all new entries get appended onto the restored one. So for those of you familiar with it, you would end up having two boot aggregates, and any measurements from your boot loader would now appear prior to the second boot aggregate. I think I might have an image for that somewhere, I'm not sure.

Okay, now for the device tree. We need to measure and pass the device tree through kexec_file_load. Now, I didn't really fully explain kexec_file_load, so I'll do that now. The way kexec_load operates is that user space takes the next kernel, chunks it up into these structures, and tosses them along to the call.
kexec_file_load is a lot simpler: it just takes in file descriptors. That's what allowed us to do the hook before. But unfortunately, this doesn't include that device tree option anymore, so we have to work around that. And our solution, I don't know. This is completely bleeding edge. I mean, I've been hacking together these slides all week, and we still don't really have an answer to this one yet. But some questions we do have are: what should be measured in the device tree, and what should user space be allowed to supply? In the original usage of Petitboot, we did have a device tree that we were able to supply. Now we don't have that option anymore. But not only that, we didn't need to measure it back then. The device tree is no different than, say, a command line. It's configuration data that can alter the next kernel, so we can't just throw anything in there. We have to figure out what should actually be allowed to go through.

So, some final remarks, and I'm probably way under time. I blame the coffee.

It works. That was one of the biggest surprises. Actually playing through all of this, it was remarkably easy to get working, in hindsight at least. And it works cross-platform. I was able to spin this up in an x86 VM as well as on our test hardware, of course, and the measurements still work. The only thing lacking there currently is the mechanism to pass the IMA log. If it had a device tree, for example, then maybe it would work. But that's where we need to figure out whether we should put it in some kind of EFI config table or something else.

This also leads the way to secure boot. Since we're using IMA for the measurement, we can also appraise the file while we're at it. Provided that the previous firmware components are doing the same signature validation, we can just continue that chain right here. And that would be even more trivial than getting the rest of this working so far.
The only downside is that we currently need xattr support in the CPIO.

Some things we still need, though, are command line measurement. This was brought up in previous talks, too. We need some way to measure the configuration. The command line can change a lot of the options or the way a kernel runs, so that should be measured as well. IMA is not as good at measuring buffers, unfortunately. The same goes for other config. For example, Petitboot has some other options, like automatically booting from the network versus the hard drive. That option should probably be measured somewhere too, and again, that would require some kind of buffer measurement.

Any questions? Sorry, I went way under time on that. Yeah.

So you asked the question, what should we measure? So why isn't the answer there? It's more a matter of mapping out everything we can. You measure everything. The value of our PCR is used for sealing. Right. It will modify the whole time, but it will verify the log. So some time early in the trusted sequence, there should be a policy of what actually we use for sealing, which would be appraised and verified. And then we would still log the whole thing, but it uses a separate PCR to have the intrinsic measurements we're interested in, policy, and then we see what's left.

Given that we have the ability, at least at this step, to have a flexible policy for all of that, you could of course just add a different policy for how you measure it. So maybe we want to measure these buffer measurements into a different PCR and seal against that. And the point is that if you don't really... Right, of course. Again, that's kind of what we have in the works, at least now. We're trying to figure out what can be measured and what should be measured. And of course, there are a lot of things like addresses that may not necessarily be the same. So measuring everything in that case is, well, that's pretty much like measuring random data. True. Oh, of course.
Right. Absolutely. Anything else? Yeah.

So if you wanted not starting from zero again, 16 and 23, and filter the step operation, so you have some control for the TSS or the driver, sorry, you know, you'd be able to have some, who can actually do the reset, that's a simple reset. So you might want to think about using a resettable PCR. Okay. Well, I think we have to. I mean, for you to say. Yeah. Yeah. Otherwise, you do it for once.

A resettable PCR might be kind of interesting. Something interesting that comes out of this too, which that just reminded me of, is that you could also continually, infinitely, kexec and continue that exact same chain. Right. So obviously using a resettable PCR at that point would make sense if you wanted to simulate a full reset. But in the case where maybe you don't, I don't have any examples off the top of my head, but. Right. Right. Right. In order to have flexibility. That's a good point. I'll bring that up actually, because that's an interesting point that I don't think we've considered.

Anything else? Yeah. Performance. Hmm. I'm going to decline to answer that. As for a performance hit from what we're doing here, I see it as nearly insignificant. Honestly, from watching the console myself as it boots, I don't see any difference. I think it's a bit early to do actual measurements on it to see what it is. But I have not seen anything myself, any major hit at least.

Anything else? Performance. Alrighty. Thank you very much. Thank you.