Hello! Welcome to this CNCF webinar, where we are going to talk about trusted boot and, specifically, how the Kairos project implements it. First, we are going to look at trusted boot in general: what its ingredients are and which challenges it tries to solve. Then we are going to look at how Linux handles the boot process, how trusted boot differs, and how Kairos implements booting. After that we will take a deep dive into measurement versus encryption and go through some security considerations about the implementation we chose. A short demo will follow, where we will see a trusted boot system booting, along with some examples that underline the security considerations behind some of our design choices. First of all, I'd like to introduce myself. I'm Ettore Di Giacinto, Head of Open Source at Spectro Cloud. I've been involved with open source for almost 18 years, wearing different hats. I'm also a developer, a Gentoo Linux developer, and my last team was at SUSE. So what is all the fuss about cloud and the edge? Edge has become very important lately because it's being used almost everywhere. Computation is moving not only to the cloud but also closer to the real use cases. We can think about edge in retail shops, for instance, in government or public sector deployments, on trains and, of course, in ATMs. So it's not only about Kubernetes, even if there is a strong push towards using Kubernetes everywhere. Edge has its own set of challenges because, as you can imagine, different use cases require different sets of features: for instance high availability, data encryption to ensure that the data is kept secure, protection from cold boot attacks or physical attacks, setup and maintenance.
Maintenance is also a very strong point, because not everybody can afford to have technical or skilled people close to the machines that are actually running the workloads. So it's very important to have some sort of management plane, but it's just as important to have a mechanism that protects from theft, from cold boot attacks, or from people who can physically tamper with the device. This is why trusted boot is important and solves a critical issue in this field. So why are we talking about trusted boot, what are the goals, and what are we trying to solve here? What we want is to prevent modification of the system, both at runtime and offline. We are going to take a very deep dive into what we're trying to do and why. And, of course, we want to prevent the theft of customer data. Whatever the device is storing or processing at the edge, we don't want anybody to have access to it. If somebody steals the device, powers it off and walks away with it, they shouldn't be able to reverse engineer it in another location. We also want to keep the simple UX that we have today, because we are going to see how Kairos does things differently and what sets it apart from other distributions. What is not a goal of this conversation is preventing the theft of the device itself. We are not talking about a physical security layer, but rather about a software security layer that protects against physical attacks. What are the most important gotchas that we would like to underline here? There is a key difference between measurement and encryption. Measurement, in the way we are discussing it here, guarantees immutability and verifies the software integrity. Measurement is a very complex topic, so I'm simplifying it in this conversation, but the outcome is making sure that the software that runs matches its expected measurements.
Most importantly, if the device is equipped with a certain piece of software, we don't want the device to boot again if that software has been modified. So the system has to somehow assess itself during the boot process to know whether it was modified by anything. It has to be tamper proof both online and offline. We are going to dig into why we made some design choices to make sure that we are also tamper proof online, or at least that it's not possible to change the system online. Encryption is different, right? With encryption, we refer to the fact that the data is not readable offline. If somebody steals a device or takes the disk out of the system, they cannot see the data. Of course, encryption doesn't give you the guarantee that the data is immutable, because while the system is running the data can be manipulated. There are other security frameworks that take care of protecting data online as well, like confidential computing, but this is out of scope for this conversation. So what are the ingredients that make trusted boot work, and how can we solve the challenges at the edge? Trusted boot is split into several components, some of which we have already touched on. One of the main components is what we call the UKI, the unified kernel image, or the USI, the unified system image. From a technical standpoint, Kairos chooses USI, and we are going to see later what the difference is. Secure Boot and measured boot are key ingredients as well. Secure Boot is a security framework that has been well established in firmware for well over a decade, and we have seen its adoption grow. Now basically any recent hardware should have support for Secure Boot and, most importantly, for UEFI. The BIOS has been there since the beginning of computers, and attackers have kept challenging its security.
UEFI came with a security mechanism to make sure that we boot only trusted parts of the system. Now we are going to see how this concept is extended a little to make sure that we can use Secure Boot to boot larger EFI files. Another key ingredient is the TPM, which can be either discrete or emulated by firmware, so we are not constrained to a hardware implementation. However, the hardware implementation seems to be somewhat safer. Of course, verified kernels, that is signed kernels, are part of this as well, and so is encryption of the disk with the TPM, which is not something new. Again, Secure Boot is not something new either; what is new is the way all of this is implemented together. You can read more in the references, which are the Kairos docs and also the fantastic blog post by Lennart Poettering, the author of systemd, "Brave New Trusted Boot World". So what are we looking at here? It's time to distinguish between standard Linux booting, on the left, and the Kairos booting mechanism, on the right. Kairos already differs a little from how a standard Linux distribution boots. Looking at the picture on the left, we can see that the boot starts from the firmware and then goes to the boot partition, which is typically unencrypted. Then the GRUB bootloader kicks in. Of course, you could have a different bootloader, but we are looking specifically at GRUB because it's the most commonly used and has been there for a long time. From GRUB, or whichever bootloader you like to use, you jump directly to the kernel and then to the initrd. The initrd is just a very small system that has the proper tools to mount the necessary mount points and switch to the real system. The real system can be small or big, it really depends. It can live in a different partition, and of course it can be encrypted as well.
It really depends on the use case, because it could be an encrypted partition unlocked, for instance, with a simple passphrase entered from the keyboard. It can be a PIN, and it can even be a PIN combined with the TPM chip, which gives you another degree of security. However, this kind of setup requires human interaction. Of course, there are other mechanisms that avoid user interaction, and you can also use the TPM chip directly for that, to unlock the partition from GRUB. Now, on the right, we're going to see the difference with Kairos. The Kairos approach is slightly different because it starts with a boot partition, which does contain GRUB as a bootloader, but the real system is already a single image, which includes the kernel, the initrd, and the rootfs. So when we do the pivot root, we really do a pivot root, but inside the same image, and we rely, for instance, on GRUB being able to mount those image files as loopback devices. That is very different, because the system is then a single unit, represented by a single file. This is very important because we reuse this concept during upgrades, during resets, and throughout the life cycle of the machine. It streamlines the different life cycle operations of the box by requiring fewer steps. Of course, when you have to upgrade the machine, you need to rebuild the image from scratch, and this is why we talk about immutability a lot in the context of Kairos. We think this is a very important aspect, and it helps when managing thousands of nodes, because you build one image once and deploy it across different devices, and you expect them to all be the same. So there are no snowflakes after the upgrades. We don't use a package manager during the upgrade; we use the package manager when building the image that is used for the upgrade.
Now we are going to look more closely at what's different in the UKI approach. It's not too different, in the sense that we still have a single image file. In this case, the image file is a kernel and a rootfs. The bootloader, however, changes: from GRUB we switch over to systemd-boot, because systemd has a tight integration with trusted boot. It has most of the software pieces already tied together to get the best out of the trusted boot implementation. The files still reside in the boot partition, which is unencrypted so that it can be accessed. The EFI files are composed of the kernel and the rootfs. Here we can draw a distinction between UKI and USI, because by specification UKI files contain only the kernel and an initramfs. A UKI is meant to switch over to another system, pivoting as we do in the legacy Linux system shown over here. When we do the pivoting, we are basically switching over to the software in another image. That can mean, for instance, switching to another systemd which is included in that partition or rootfs that we'd like to mount. And of course, there can be different kernel modules. When we do a pivot root, we are switching the context of the system. We are also handing over the init process: PID 1 is going to hand over execution to another init. The only thing that does not move during a pivot root is the kernel. Everything else moves. In our approach, instead, which we call USI, unified system image, there is no pivot root. We jump from the kernel directly to the rootfs. From there, we mount the system and overlay it on top of the encrypted user data. This is what sets it apart from implementing exactly UKI, and this is why we call it a unified system image. Now let's have a look at what is different in a measured boot process.
A measured boot process differs in that it stores each of the measurements in what we call the TPM PCR registers. There is a core root of trust for measurement, the CRTM, and that is the first thing that boots, even before UEFI. It does its own measurement and stores it in the TPM PCR registers. Each of the steps you see here that store a measurement is actually extending the previous measurement, creating in fact a chain of measurements. So each measurement depends on the measurement of the previous block. This is done transparently by the TPM PCRs: you cannot store anything that is not an extension of the current value. That means that, whenever we go through the boot process and pass through the steps shown here, from the CRTM to the UEFI BIOS to the bootloader and finally to the OS, all the measurements have to match a certain expectation. And this is where the challenge comes in. When we build the OS, we pre-compute the measurements, so they cannot be modified, and the OS is going to refuse to boot if they change. This is the core of how the measured boot process works. Now, what's inside the unified system image? There is nothing really confidential in there. All the data contained in a USI is public. We can think of it as an initramfs, but actually it's the whole rootfs. There is a kernel, there is systemd as the init system, and there are the file system tools needed to discover the partitions in the system and mount them. Then there are the Kairos components, which are needed to set up the immutable system or, for instance, to configure the machine with the cloud configs and unlock the partitions. So there is nothing sensitive to see here: everything you find in a USI image is already public. And how does it all work together?
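To make the chain of measurements described a moment ago more concrete, here is a minimal Python sketch of the extend operation. It is a toy model and not TPM code: the stage names are placeholders made up for illustration, and only the chaining idea (the new register value is the hash of the old value plus the digest of the new measurement) reflects how PCRs behave.

```python
import hashlib

PCR_SIZE = 32  # size of a register in a SHA-256 bank


def extend(pcr: bytes, component: bytes) -> bytes:
    """Fold one measurement into the register: H(old_pcr || H(component))."""
    digest = hashlib.sha256(component).digest()
    return hashlib.sha256(pcr + digest).digest()


def measure_chain(components: list[bytes]) -> bytes:
    """Replay a boot: start from a zeroed register and extend it stage by stage."""
    pcr = bytes(PCR_SIZE)
    for component in components:
        pcr = extend(pcr, component)
    return pcr


# Placeholder stages standing in for CRTM, firmware, bootloader and OS image.
boot_chain = [b"crtm", b"uefi-firmware", b"bootloader", b"os-image"]
expected = measure_chain(boot_chain)  # what would be pre-computed at build time

# Changing one stage, or only the order of the stages, gives a different value.
tampered = measure_chain([b"crtm", b"uefi-firmware", b"bootloader", b"os-image+backdoor"])
reordered = measure_chain([b"crtm", b"bootloader", b"uefi-firmware", b"os-image"])

print("tampered matches expected:", tampered == expected)
print("reordered matches expected:", reordered == expected)
```

Because there is no operation to set the register directly, the only way to reach the expected value is to replay exactly the same stages in the same order, which is what makes the final value usable as a gate later in the boot.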
We can think about it in two steps. First there is the signature verification. Since it's a single EFI file, we are able to sign it with Secure Boot and make sure that this file is the only one that can be booted on the system. This is very important because, if somebody tries to replace the file with one carrying a wrong signature, the system won't boot at the signature verification phase, and this is gated by the Secure Boot mechanism. Then the TPM enters the stage, so to say, because we decrypt the encrypted portion of the disk for the target only if the measurements match and if they match the PCR policy keys. The PCR policy key is another important aspect of all of this. When we generate those USI images, we first generate a set of keys that are going to be needed throughout the life cycle of the box, and those keys and certificates are going to be needed every time we build a new image. For instance, if we replace the PCR policy keys, the system is no longer able to decrypt the disk, because they have to match. This is what makes it tamper proof. Whenever we modify the system, that is going to be detected by the measurements, and the system also needs the PCR policy keys in order to correctly decrypt that portion of the disk. The target that we see here is the system after we boot with the UKI. The target is a mixture of what is inside the USI plus the content overlaid from the encrypted portion of the disk, once it has been decrypted. Why we don't switch to a second root is very important: switching to a second root which is not measured would be a security flaw. Switching to a second image that is not measured means that, at runtime, malware or a rootkit could modify and tamper with it, and it would go unnoticed.
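The second step above, where the disk is unlocked only if the measurements match a policy signed with the PCR policy key, can be sketched as a toy model in Python. Everything here is an assumption made for illustration: `ToyTPM`, `build_image` and `boot_and_unlock` are hypothetical names, a single register stands in for the PCR banks, and an HMAC stands in for the asymmetric signature a real setup would use, so this shows the shape of the check rather than how Kairos or a real TPM implements it.

```python
import hashlib
import hmac


def extend(pcr: bytes, data: bytes) -> bytes:
    """PCR-style extend: H(old || H(data))."""
    return hashlib.sha256(pcr + hashlib.sha256(data).digest()).digest()


def sign_policy(policy_key: bytes, expected_pcr: bytes) -> bytes:
    # ASSUMPTION: HMAC is only a stand-in for an asymmetric signature.
    return hmac.new(policy_key, expected_pcr, hashlib.sha256).digest()


class ToyTPM:
    """Hypothetical model: one register plus one sealed secret."""

    def __init__(self) -> None:
        self.pcr = bytes(32)
        self._secret = None
        self._policy_key = None

    def reset(self) -> None:
        self.pcr = bytes(32)  # registers are zeroed on every boot

    def seal(self, secret: bytes, policy_key: bytes) -> None:
        # The secret is bound to the policy key, not to one fixed PCR value.
        self._secret, self._policy_key = secret, policy_key

    def unseal(self, expected_pcr: bytes, signature: bytes) -> bytes:
        valid = sign_policy(self._policy_key, expected_pcr)
        if not hmac.compare_digest(signature, valid):
            raise PermissionError("policy not signed with the enrolled key")
        if self.pcr != expected_pcr:
            raise PermissionError("measurements do not match the signed policy")
        return self._secret


def build_image(image: bytes, policy_key: bytes) -> tuple[bytes, bytes]:
    """Build time: pre-compute the expected PCR value and sign it."""
    expected_pcr = extend(bytes(32), image)
    return expected_pcr, sign_policy(policy_key, expected_pcr)


def boot_and_unlock(tpm: ToyTPM, image: bytes, policy: tuple[bytes, bytes]):
    """Boot time: measure what actually runs, then try to unseal."""
    tpm.reset()
    tpm.pcr = extend(tpm.pcr, image)
    try:
        return tpm.unseal(*policy)
    except PermissionError:
        return None


key = b"pcr-policy-key"  # placeholder key material
tpm = ToyTPM()
tpm.seal(b"disk-encryption-key", key)

v1 = build_image(b"image-v1", key)
v2 = build_image(b"image-v2", key)  # an upgrade built with the same key
print(boot_and_unlock(tpm, b"image-v1", v1) is not None)          # original image
print(boot_and_unlock(tpm, b"image-v2", v2) is not None)          # signed upgrade
print(boot_and_unlock(tpm, b"image-v1-modified", v1) is not None)  # tampered image
print(boot_and_unlock(tpm, b"image-v2", build_image(b"image-v2", b"other-key")) is not None)
```

In this model the upgrade unlocks the disk because its policy is signed with the same key, which is the reason the same keys have to be supplied every time a new image is built, while a modified image or a policy signed with a different key gets nothing back.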
So it's very important to have a measured-first mindset in this case, because it gives a much stronger security posture than relying on a second stage that can be modified without anyone noticing. It's not possible to measure the switch root, the second stage, right now, for various reasons; one of them is that it's not easy to measure that big a chunk of files with the TPM chip. Possibly there will be enhancements in this area in the future, but the choice here is also to keep things very simple, because having one single file which is signed and measured at the same time also makes maintenance very simple. We do have a much stronger security posture in this case. So how are those USI files composed? First, they are unencrypted. As we were saying, they don't contain anything confidential, so the content of those files is public. In the first section we have systemd-stub, which allows the EFI file to be booted directly from the firmware without any bootloader. Then there is the kernel, and then comes the initramfs. There are other portions of the EFI file which I've left out of the slide to keep things simpler; these are the biggest and most important parts of the EFI file that we care about. And how do we create the EFI file? What we do is basically run the ukify tool, which is part of systemd and takes the different portions of the EFI file as input: the initramfs, the kernel, os-release, uname, and the cmdline, for instance. The cmdline cannot be tampered with either, so we cannot boot an EFI file with a different cmdline from the one it was signed with. This is very important and sets this approach apart, because the system we end up with cannot be booted differently from how it was supposed to be when it was constructed.
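As a rough illustration of why the cmdline cannot be changed after the build, here is a small Python sketch that folds named sections into a single expected value. The section names follow the usual UKI layout, but the ordering, the hashing layout and all contents are assumptions made for this example; it is loosely modelled on the idea of measuring each section and is not the actual algorithm used by systemd.

```python
import hashlib

# ASSUMPTION: an illustrative, fixed section order; the real order may differ.
SECTION_ORDER = [".linux", ".osrel", ".cmdline", ".initrd", ".uname"]


def extend(pcr: bytes, data: bytes) -> bytes:
    """PCR-style extend: H(old || H(data))."""
    return hashlib.sha256(pcr + hashlib.sha256(data).digest()).digest()


def precompute_pcr(sections: dict[str, bytes]) -> bytes:
    """Fold each section's name and content into one expected value."""
    pcr = bytes(32)
    for name in SECTION_ORDER:
        if name not in sections:
            continue
        pcr = extend(pcr, name.encode() + b"\0")  # which section it is
        pcr = extend(pcr, sections[name])          # what the section contains
    return pcr


# Placeholder contents; in the USI case ".initrd" would hold the whole rootfs.
image = {
    ".linux": b"<kernel bytes>",
    ".osrel": b"ID=example\n",
    ".cmdline": b"console=ttyS0 quiet",
    ".initrd": b"<rootfs archive>",
    ".uname": b"6.x-example",
}
build_time_value = precompute_pcr(image)

# Same kernel and rootfs, but somebody appended an option to the cmdline.
edited = dict(image)
edited[".cmdline"] = b"console=ttyS0 quiet init=/bin/sh"

print("unchanged image matches:", precompute_pcr(image) == build_time_value)
print("edited cmdline matches:", precompute_pcr(edited) == build_time_value)
```

Since the cmdline is just one more measured section, editing it has the same effect as editing the kernel or the rootfs: the value computed at boot no longer equals the one computed at build time.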
The initramfs, in the specific case of Kairos, is not an initramfs but the full rootfs, which makes this much stronger. How this is split is shown in the following slide. The whole EFI file is signed with Secure Boot. The portions which are measured are the kernel, the rootfs, the cmdline, os-release, uname, and other smaller components that can be measured as well, like the splash image or the device tree. The measurements of those parts are signed with a private key. The public key is then made part of the EFI file, and we also have the PCR policies, which are tied to the TPM PCR banks that are responsible for decrypting the disk. Only if the measurements and the signatures match can we unseal from the TPM what allows us to decrypt the portion of the disk that we are interested in. Now we're going to see a small demo where we first generate the keys required to build the OS, in this specific case the installer ISO, but those keys are also needed to build the other artifacts used to install the system. The keys are needed to create the EFI file or the container image used for the upgrade. So the same keys are used for the whole life cycle of the box, or the boxes, or the cluster. Next we are going to see a system booting, all virtualized in a VM: how the ISO is booted, how it is installed, and then a deeper look at what our system looks like after installation. The key generation gives us keys for Secure Boot but also for the PCR policies, which are responsible for decrypting the encrypted partitions in the system. To do that we will use a container image which is published by the Kairos team, and this container image has all the tools needed to generate the specific artifacts.
Here we can see that the keys have been generated. Among them are the Secure Boot certificate and the keys needed to actually boot the UKI files, and there are also all the certificates and other pieces needed to do the auto-enrollment from the installer ISO. Right afterwards we can go and see what the process of building an installer ISO looks like. We'll embed the public part of the certificates, which is also needed to do the UEFI enrollment of the Secure Boot artifacts. We can see here that we start from a container image in this case, and then we have a tool called osbuilder, which is one of the artifacts built by the Kairos team, to actually create an installer ISO. The installer ISO is built off of a container image, as I just explained, and the container image can be used as a source and allows for customization with a simple Dockerfile. The input of this process is both the container image and the keys that we generated a few steps ago, and this also means that whenever we have to build another container image for upgrading the system, we will need to supply the same keys. Now we are going to boot the ISO and do the installation. In order to boot the ISO we are going to create a simple Docker container which contains the QEMU dependencies, so we will start from that. The commands that you see here are available in the documentation on the Kairos website, and you can check them out there. It's just a set of commands which basically acts as a wrapper to avoid installing any dependencies on the system. So we build this container image, and it is used for two things. The first one, which we see in the top pane, is to start an emulation of the TPM chip, and in the bottom pane we're going to start QEMU with the ISO that we have just built. The commands are documented on the Kairos website, and you can reproduce the same locally.
In this case we are specifying the CD-ROM and the ISO that we have just built, and you can see here how the system is automatically enrolling the Secure Boot keys from the directories available in the ISO. So all the certificates that we generated in the first steps are being used now. You can see the ISO is now booting, and the EFI file is going to boot. The EFI file here is a UKI image, so it has the kernel and the initrd included. The initrd in this case is the whole OS: the approach that Kairos takes here is to have a unified system image, so the single EFI file is the whole OS. The OS is signed as an EFI file with Secure Boot and also measured by the TPM chip. Now we move on to the installation process. The installation of Kairos can happen in different ways, and this one is the interactive installer. It's just a bunch of questions that we answer, and we have a full installation straight away. You can see that the installation is very quick, because in this case we just have to copy the EFI file and encrypt the partitions. We can have a quick look, and here we see the OEM and the persistent partitions, which are very Kairos-specific. The OEM partition contains the configuration of the system along with the cloud configs, and the persistent partition contains everything else, which is the user data accumulated through the use of the OS. We can also see the partitions with lsblk, and now we are going to stop the system, reboot, check what happens on the first boot, and take a deeper look at what the partitions are and how they are used in the system. The reboot is happening, and you can also see in the top pane that the TPM chip emulation has terminated.
So now we start the first boot. Basically, this means that we restart the TPM chip emulation process, and in the bottom pane we boot the virtual machine without the CD-ROM, so it's booting only from the disk. You can see there are three entries, active, passive, and recovery, which are common to any Kairos system. We have selected the active entry and we see the system booting. You can see a bunch of dmesg output directly, because this is running in QEMU from the console. Now we are able to log in to the system. We have installed it with kairos as the user and kairos as the password, just to test things out, and you can see a Fedora Linux container image displayed in the welcome message. Now we can list the partitions, and we can see that the OEM and the persistent partitions are encrypted, with the crypto type shown over there. We can also see, with lsblk and mount, how they are actually used in the system. As a reminder, in a Kairos system part of the mount points are overlaid on top of the running system: for instance /home, /opt, and /usr/local are entirely on the persistent partition, /etc is ephemeral, and all the paths you see there are overlaid from the encrypted portion of the disk. Now, after the demo, we have a better understanding of how those pieces fit together. We have also seen in the demo how nice it is to start from a container image. I think that's a very nice part of Kairos, because it allows a higher degree of customization: you can customize the system beforehand with the Dockerfile and add the tools that you like. At the same time it keeps the flow very simple, because it goes straight from generating the keys to building the ISO and then booting it. So what are the consequences of using a USI image rather than a UKI image? Let's imagine a malware or rootkit attack, something that hits the system remotely. That can happen, for instance, if we have a cluster that's exposed
outside somehow, or there is an application bug, anything that can be used by a remote attacker. Let's say the remote attacker also manages to run malware or a rootkit inside the system. Here we have a much higher degree of security, because if they modify the EFI file the system won't boot anymore, and they cannot modify anything in the running system either, because it is completely loaded and mounted read-only. So they cannot modify the system as it is seen by the booting OS, and they cannot tamper with it afterwards, because they cannot tamper with the files that are used to boot: the EFI files are entirely measured. And even if they manage to get the Secure Boot keys, the system is still measured, so it won't be able to decrypt the encrypted portion of the disk, which is one of the most important things that we want to protect here. What happens if somebody has physical access instead? Say they try to insert a live CD, or try to hack around with those rubber ducky USB keys or compatible devices; a Flipper Zero, for instance, can emulate a keyboard. Let's say the attacker manages to find and leverage a security hole in the box by emulating a USB drive, something lower level that can trigger a bug from the kernel or driver standpoint. In that case we are still secure, because they can read the content of the EFI file, even if they take the box with them, but they cannot read the encrypted part of the disk, which is the most important part, because that is what gets overlaid from the encrypted partition. Now, specifically, when we look for instance at how user passwords are stored in Kairos, we store them in the OEM partition because they are part of the cloud config. Everything in /etc in Kairos is ephemeral, including the passwd and shadow files, which means that any modification to them is going to be thrown away during the boot cycle. It also means that the only way to persist that information on the system is to re-apply the
configuration over and over again during the boot process, and that configuration is part of the OEM partition, which is encrypted. So what sets this apart from using a stage 2? What we see here in the slide is the internal component of Kairos which is responsible for setting up the system in an immutable manner, and it is also responsible for different things like decrypting the partitions. Stage 1 is what we call the USI: we have the EFI file including the kernel and the whole rootfs, and that is what gets booted. What happens in this boot process is that there is a handover to systemd, but it is still done within the same rootfs, so there is no real switch. Stage 2 would be another step: after decrypting the partition we would prepare another environment, or actually another partition, and then switch root to that one. Even the context of systemd would change. When we do the switch root, there could be another systemd version inside stage 2, and that would go completely unnoticed because it cannot be measured. So what are the implications of that? Let's take the same case that we talked about before, a remote malware or rootkit attack. Somebody getting access to the system could change stage 2 in a way that would go unnoticed. They could, for instance, replace the systemd version with a backdoored one, and nobody would actually notice. In the same way, if somebody gets physical access to the machine, they cannot decrypt the encrypted portion of the disk, but they can manipulate stage 2 at runtime. Let's say they cannot manipulate it from a live CD, with the machine turned off and booted into a different system. But if the system is up and they manage to rely, for instance, on a kernel bug or a driver bug to get a console on the running system, then they can do any manipulation, and that would go completely unnoticed. So what's
the future? What are we looking at after all of this work? We are looking for a way to measure systemd system extensions, because we think security is very important and we don't want to sacrifice it. We would like to have a mechanism to extend the OS with additional packages which are not part of the USI, for instance when we want to make a large modification to the OS and add very large packages. Also, from a UX standpoint, we would like to support overlaying portions of the disk from the encrypted disk. Then we would like to have a way to have a measured rootfs. This is not something we are dismissing completely, but we are still looking at it with a grain of salt, because we think one big image is still better for maintenance. However, we would like to explore ways to do that, if we can allow it at least from a security standpoint. We would like to improve the UX on top of this. It's already quite easy to get started by following our docs, but we would like to improve the UX even further. We would also like to extend the support to more flavors: currently this works only with Fedora and Ubuntu, and we would like to extend it to all the distros that we already support. And then we would like to blend this concept with remote attestation and be able to control the booting of the machine from the management control plane as well. You can visit kairos.io to learn more. You can already try the trusted boot experience that we have been discussing here by following the documentation on the Kairos website; you can see the link over here. You can learn more about trusted boot by checking out the architecture page on the Kairos website, and then you can engage with the community. We have regular office hours and monthly meetups where we share updates and plans for the next releases, and you can get in touch with the team on Slack, Matrix, or GitHub Discussions.
Thanks for watching and have a nice day.