Okay. Hi everyone. My name is Roberto Sassu. I'm from the Huawei Munich Research Center. Today I would like to present my work on integrity: I first did the implementation in the kernel, and now I'm trying to do it with eBPF. First I will show the goal of this presentation, give you an integrity overview and the state of the art, introduce the implementation of DIGLIM on eBPF, and then the conclusion. The goal of this presentation is to show a flexible and scalable way to do integrity protection at the operating system level, and to do it with eBPF. During this summit I would like to discuss with you how it should be integrated with the eBPF kernel subsystem.

When we talk about the integrity of an application, we talk about three different aspects: executable code, configuration files and keys, which are all immutable, and application state, which is mutable — and how these three aspects influence the behavior of the application. For example, if there is an alteration of the executable code, function one will provide a different output to function two, and then the overall output will not be the expected one. If we change the configuration file or the application state, we could take another branch in the execution flow, and this will cause the output to be unexpected. How can we solve this problem, at least for the first two aspects, executable code and configuration files? We could have an access control mechanism that lets only approved code or configuration files be loaded by the application. We can take a reference value from the Linux distribution or from the end user itself. The goal is to prevent the attacker from executing arbitrary code or loading a malicious configuration file.
This is a very high-level view of how an integrity protection solution works. Basically we have the software vendor, which builds the software and provides a proof of authenticity in the form of a signature. This signature goes to the target system of the end user, and the end user has an access control mechanism which consists of the enforcement part itself plus the policy, which consists of the approved vendor keys and also which parts of the application we want to protect: whether we want to protect only executable code, also configuration files, or even application state — but that last one is not the case for this presentation. This work is inspired by the Integrity Measurement Architecture (IMA), which is part of the integrity subsystem of the kernel, but as you can see here it works in a slightly different way. It takes a signature from the software vendor as evidence of authenticity, and this is what IMA verifies on the end user system. What is the problem? The problem is not in the mechanism itself but in the ecosystem, because if you look at the left side, this is the current format of RPM, where there is only one signature for all the files, while IMA requires a signature for every file. So we have a different granularity; on the right side you can see the redundancy. The important limitation is that a Linux distribution vendor which wants to provide IMA enforcement needs to rebuild all the packages. IMA has more limitations. For example, we currently cannot do enforcement in the initial ram disk, because there is a lack of extended attribute support there; Huawei's operating system, openEuler, has a non-upstream patch that provides this support. Also, there is an unusual default policy: IMA appraises files based on the owner, which can be easily circumvented if there is no metadata protection — it is sufficient to change the owner of a file to not have the file verified.
Now, Fedora 37 has an accepted feature to include the file signatures in the package. This is great, but what happens if we also want metadata protection? The feature is only for content protection, so we would add another signature, and this would cause additional overhead both for the software vendor and for the package size, which would increase even more. We solved this metadata protection problem in one way, and it has been available since September 2020 with our IMA Digest Lists extension, which is basically a modification of IMA to do that. So now we come to the solution for this problem. First, on the left side: we call a digest list every source of information — be it an RPM package, a Debian package, or even the manifest of a Docker container — which provides the checksums of the files. How are these existing sources of information used? The access control mechanism, which before was only one component, we split into two parts: one package-format-dependent part, which basically attempts to parse all the digests from the packages, and another part which is package-format-independent and works regardless of which information we get. Here you can see that it is modular, because we can provide a parser for each package format we want to support — for now we are supporting RPM, but we could introduce a parser for Debian packages. Once we extract this information, the file digests, we put them in a hash map.

This is what I wanted to clarify in the previous discussion we had. So this digest list — it is essentially a file, and like a signature, or...?

Yes, it's basically the RPM header: I put this header in a file, and I also add the signature in the form of a module signature. I know that it seems strange, but it's because in the kernel we have this routine for signature verification, and I put it exactly in the same format that the kernel expects.
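The appended signature mentioned here follows the kernel's module-signature convention: the signature bytes are followed by a small descriptor (`struct module_signature`) and the marker string `~Module signature appended~\n`. The following is a minimal userspace sketch in Python, assuming that standard layout — the actual parsing and verification happen in the kernel, and the helper names there differ:

```python
import struct

# Marker the kernel appends after module-style signatures.
MAGIC = b"~Module signature appended~\n"
# struct module_signature: algo, hash, id_type, signer_len,
# key_id_len, three pad bytes, then a big-endian 32-bit sig_len.
SIG_INFO = struct.Struct(">BBBBB3xI")

def split_appended_signature(blob: bytes):
    """Split <data || signature || module_signature || MAGIC> into (data, signature)."""
    if not blob.endswith(MAGIC):
        raise ValueError("no appended signature")
    rest = blob[:-len(MAGIC)]
    if len(rest) < SIG_INFO.size:
        raise ValueError("truncated trailer")
    *_, sig_len = SIG_INFO.unpack(rest[-SIG_INFO.size:])
    body = rest[:-SIG_INFO.size]
    if sig_len > len(body):
        raise ValueError("bad sig_len")
    return body[:len(body) - sig_len], body[len(body) - sig_len:]
```

Reusing this format is what lets the digest list be verified with the kernel's existing signature-verification routine instead of a new one.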
I know this is not part of your presentation, but because this discussion is going to go a little bit deeper, can you show me an example of this digest list?

Oh yes, sure. Actually I have a demo later, but I can start it now, just so that we don't lose time. Okay, maybe I'll increase the font size first. So the first part of the demo is to generate the digest lists, so I will do it now. Basically I'm doing all the steps that are required, starting from an almost unmodified distribution, to have DIGLIM. I say almost unmodified because I needed to install my kernel with some new helpers that I'm using for the signature verification. Now what the script is doing is iterating over the RPM database: it's taking the header of each package and writing it to disk. Now we also generate the digest list for the custom files, the custom version of DIGLIM that I installed in the system — it will just take a few seconds. I'm also signing the custom digest list, I'm enabling our RPM plugin and the logging service, and I'm rebuilding the initial ram disk. That's it, because the digest lists need to be in the initial ram disk at the time init starts, since this is when the hash map is populated. Wow, this exited without failure — a demo that works.

I mean, a demo that works is impressive.

Let's see later, when we reboot. So it's not enforcing yet. Okay, now we are ready to show a digest list, and I'll take one — for example, coreutils. What you can see now: this is the appended signature, and also the string that identifies this as an appended signature, and in the top part you see the file digests. This is the information we want to extract, and we know that it's authentic because we have the signature of the vendor here.
So — I did not sign this package, so this signature comes from the vendor, because there is a section in the RPM header from which I take the signature.

Because the RPM header carries the signature for the package?

Yes, that's it. Do you want me to continue the presentation, or keep running the demo?

Continue the presentation.

Okay. So now I'll show the components of DIGLIM and the trade-offs that we have. The first one is obviously that we are not constrained to a specific data format; we can extend support with additional parsers.

Could you just go one step back on that diagram? There is the parser that is parsing these digest lists, right? But then there are other components — we should discuss what is happening in the first box there and in the second box there.

Okay, yeah. Basically the first step is: we take the digest lists from the disk and we run an eBPF program which extracts the digests — that is the parser — and we push the elements into the hash map. You will see later that the hash map has 26,000 digests. The next part is the security module of DIGLIM, which attaches to the security hooks, very similarly to IMA: IMA attaches to exec, to mmap, and to mprotect. What is very surprising is that this security module is very simple, because, as you will see later, we are just doing a lookup in the hash map: if the lookup is successful we allow the operation, otherwise we deny it. That's it — it's very, very simple. Currently the policies are hardcoded: we are doing appraisal of executable code, but we could, in a similar way to IMA, also do appraisal of configuration files and switch to policy-based enforcement. For now I just wanted to verify that with eBPF we are able to do this enforcement; policies are a feature we can add on top.
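The allow/deny logic just described can be modeled in a few lines. This is a userspace Python sketch of the decision, not the actual eBPF program — the real code does a map lookup from an LSM hook, and the names here are illustrative — but the shape is the same: compute the file digest, look it up, allow only on a hit.

```python
import hashlib

# Userspace model of DIGLIM's decision logic. In the kernel this is
# an eBPF LSM program doing a lookup in a BPF hash map.
digest_items = {}   # (algo, digest) -> metadata, e.g. {"immutable": True}

def add_digest(algo, digest, meta=None):
    """Populate the map, as the parser does after signature verification."""
    digest_items[(algo, digest)] = meta or {}

def appraise(content):
    """Allow the operation only if the file digest is in the map."""
    digest = hashlib.sha256(content).digest()
    return ("sha256", digest) in digest_items
```

The simplicity is the point: once the parsers have populated the map from authenticated digest lists, enforcement reduces to a single lookup per file access.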
The most important part here is that with DIGLIM we can do IMA-style appraisal with almost no modification for the Linux distribution vendor. This last part is for signature verification, but before that I just ran DIGLIM without touching the distribution at all: I chose openSUSE Tumbleweed and Fedora 36, and I was able to do enforcement of the executable code. It's very, very surprising. The other part is that this is more efficient, because we are verifying just one signature for all the files in a package. For example, the kernel package has many files, and the same for systemd, while with IMA we would be verifying a signature for every file — so this is clearly more efficient.

Now, the less good part. Currently the eBPF map is fully preallocated, because I understood that since we are executing a sleepable program we need to do preallocation, and this maybe creates too much overhead in the system: with dynamic allocation, we could allocate only the memory necessary for the digests that we want to store in the map. Currently we have to set a limit so that we are sure the map is not filled, but this means we are likely allocating more memory than necessary.

You are storing all the digests of all the things in the map, right?

Yes, but the problem is that I had to put a limit, to say this map has a maximum of one million entries.

What would be the alternative? You would load it on demand?

Or slices: for example, when we reach the limit we allocate another page, so we can still have this partial, not fully dynamic allocation — something which could be handled by the BPF kernel subsystem.

But what is the key?

The key is the digest algorithm plus the digest.

So what is the key in this hash map? The value is a digest, right? But how do you look it up — based on what?
The key is the digest of the file, so we are adding—

Oh, the key is the digest itself, and the value is, like, yes or no — pretty much the existence of the digest.

We are adding more information as the value, like the type of the file: for example, we store the information that this file is immutable, because maybe what we could then do is deny writes to the file.

And what's the length of the digest — how many bytes?

It's 65, because it's the maximum size of a digest plus one byte for the algorithm. But what we could do is to have different maps, one with the right key size for each algorithm: for example, a map for SHA-256 which has a key of 33 — no, 32 — bytes, because it is a map for just that algorithm. For now I didn't optimize the implementation; it's the most generic version, and also because with maps of maps I understood there are limitations on the inner map.

Just a general suggestion: if you think some problems are there, we can address all of them. Like you mentioned that sleepable programs can only have preallocated maps — well, that's only because it wasn't addressed yet, and we've been talking about a fully resizable hash map forever.

Okay, that would be great.

So all of this is coming, it's just not done today — don't think the current implementations will somehow stay forever.

Yes, I'm showing all the problems that I encountered, but I'm sure that we can solve them together. Another objection you could raise is that this would add too much memory pressure, because if you store all the digests in memory maybe the requirement is too high. But when I did the in-kernel implementation, just for storing the digests of the executable code, the numbers were very good in my opinion — less than one megabyte. So if we have a resizable hash map we could probably reach something similar.

Just go with the one-megabyte thing for now; I think it's fine.
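The key layout being described — one byte identifying the digest algorithm, followed by the digest, zero-padded to the 65-byte maximum — can be sketched as follows. The numeric algorithm IDs here mirror the kernel's `HASH_ALGO_*` constants from `hash_info.h`, but treat the details as illustrative rather than DIGLIM's exact encoding:

```python
import hashlib

# Key layout sketch: 1 algorithm byte + up to 64 digest bytes
# (SHA-512 is the largest), zero-padded to a fixed 65-byte key.
ALGO_ID = {"sha256": 4, "sha512": 6}   # kernel HASH_ALGO_* values
MAX_KEY_LEN = 65

def map_key(algo, content):
    """Build the fixed-size hash map key: <algo><digest><zero padding>."""
    digest = hashlib.new(algo, content).digest()
    return (bytes([ALGO_ID[algo]]) + digest).ljust(MAX_KEY_LEN, b"\x00")
```

A per-algorithm map, as suggested in the discussion, would drop the algorithm byte and the padding, shrinking a SHA-256 key from 65 to 32 bytes.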
Yes, okay. Also, when we boot the system you will observe a small delay, because we are parsing all the packages. But when I did the in-kernel implementation, I compared the boot time of IMA appraisal and the boot time of my implementation: my implementation was faster — it took one quarter of the time.

Is IMA doing more than just executables? Because that may explain the delta, right?

Just executables — I think the total number was less than 6,000 files.

Okay, so you are comparing the same thing.

The same, yes. Okay, that's all for the trade-offs. Now: why eBPF? Basically, I did several versions of this solution. The first was called IMA Digest Lists, but it was not accepted because it was too invasive. It had more features: it was not just for appraisal but also for remote attestation. The maintainer said that if this were accepted, there would be a risk that something in IMA would not work anymore, and Mimi asked me to do something more modular, in a way that we don't add everything to IMA but keep it outside. And this is what I did: I made a standalone module which has the functionality of a hash map, and the change to IMA was very small — just to do the query on the database and decide what to do. In the case of remote attestation, for example, I wanted to avoid extending the trusted platform module every time we access a file; we do it only if the file is unknown, plus we do the measurement of the digest lists, of course, for completeness of information. But this version was also not accepted: on the first version I received comments, and on the third version, which should have addressed all the comments,
I didn't receive responses, so basically it means it was not accepted. I knew that eBPF could be another option, but without knowing it well — I just knew that it could be useful for processing network packets — it took a while for me to see that eBPF could be used for this solution. I think the most important part was KP Singh's work on the BPF LSM, which bridges the eBPF functionality with the security side. I should probably not explain what BPF is, but maybe it helps you to check whether I understood what eBPF is. It's a framework to add functionality to the kernel without modifying the kernel. It allows loading sandboxed programs after statically checking the code for safety, and it provides integrity assurance — no illegal memory access, and finite program termination — and confidentiality assurance: for example, when lockdown is in confidentiality mode, eBPF is not allowed to access some memory areas, though maybe not all; I don't know this part exactly. As I said a few moments ago, the BPF LSM is the critical component: it allows attaching to the security hooks in exactly the same way as a conventional Linux security module, so eBPF security modules can make a decision, and a denied decision causes the syscall to terminate. The LSM framework currently requires that all the code is built into the kernel, and the main advantage of eBPF is that we can attach programs without pushing them upstream.

Now, DIGLIM is split into two parts: the conversion stage and then the access control stage. This is the first part. We start from user space — well, actually we start first from the kernel space. There is a loader, which could be the skeleton, or we have a user space loader; for now it doesn't matter, but it matters for the security guarantees that we get out of it. What is running is basically each parser for the digest
list plus the security module. Here we have just the first part, so we stop at the hash map containing the file digests. When these eBPF programs are running, we have a user space tool, the DIGLIM user loader, which pushes the digest list to a data input map. Because the parser is attached to the bpf() system call, when we see that the operation being done is a map update element, we process the file that was pushed to the map. The first thing the parser does is check whether the signature is valid: I introduced a new helper, bpf_mod_verify_sig, which uses the kernel keys in the primary or secondary keyring — this is very similar to IMA. If the signature verification is successful, then the eBPF program does the parsing and pushes the extracted file digests to the hash map. And if you want to add a new parser, we simply attach—

I mean, I think when we were on the call we were arguing about whether this was the right design, and we could probably argue that for a while here if we wanted. But is the bottom line that you just need a helper, that bpf_mod_verify_sig, to build your thing? Is that the missing piece for this stage of it?

Yes, the non-existing part is the new helper.

And that would be called from sleepable context, right?

From any sleepable context, yes.

To me that seems generically useful. I don't think we have to agree on your total model to agree that a signature verifier inside the kernel — a helper that runs from sleepable context and can read keys — is a good idea. Though I think something more generic: here it is based on the module keyring and it has "mod" in the name, right? Maybe something more future-proof — some way to read a key and verify that something matches it — like bpf_verify_sig, with the keyring as a parameter, so if there are more keyrings in the kernel you could use those. Something that is more
future-proof, that doesn't have just "mod" in the name. But then we could argue about that helper, and we don't have to argue about whether we think this is a good design or not, because there are probably other use cases for it.

I agree with your assessment that the helper is generically useful. One of the challenges is that not everyone should be able to add keys to the keyring, so if you wanted to have a BPF-specific keyring, we would need to be sure that it has the same restrictions that the IMA keyring has.

That should be orthogonal, I think. In BPF we should not worry about how keys got there, and only allow the helper to access keys it can read.

So what is happening right now is that I'm calling verify_pkcs7_signature, and actually I have a patch set for supporting PGP signatures, because RPMs are signed with PGP. I took the patch set from David Howells and just updated it to work on the current kernel, so now we support two signature formats, PKCS#7 and PGP. When the parser calls the helper, it verifies the PGP signature of the RPM header.

But all of that aside, could I use that helper from, say, IMA or from an LSM hook to verify that a signature of an executable matches something in the keyring? I'm a little out of my depth here because I'm not entirely sure how the keyring API looks.

For now I'm just using the primary and secondary keyrings of the kernel, to minimize the changes that we are doing.

I think the goal here, if I remember correctly from your previous discussion, is that the reason we're doing this is because we don't trust the package on the disk, right? We want to make sure that this is the header this signature claims to be the signature of — let's verify this in a component we trust; we trust the kernel, so let's verify it there. And the bone of contention in the BPF office hours was: can we
trust something beyond—

Okay, I have the next slide for that. This is a comparison between IMA and DIGLIM. As you can see here, in IMA the granularity is the file, and we have all the information inside the kernel to do the verification — the signature verification part and the digest calculation — so the trust boundary is inside the kernel; it's just the kernel. The problem is that when we switch to a different granularity, package granularity, we need another component, the digest extraction, which needs to change the granularity from package to file, and if we put this component outside, we need to provide guarantees on it.

Yeah, but the idea we were throwing around — and this has implications, which you pointed out in the office hours — was that this component, the thing that does the extraction, could be run fairly early in the boot process, before anything else runs. But then you had concerns that it needs access to the RPM packages, which means waiting for the file system to be mounted, right?

Mm-hmm.

I thought about it, and I'm okay with it if you consider it trusted. I mean, part of the flexibility BPF is supposed to offer is that you should be able to choose your trusted computing base, right? If upstream were to dictate how you should implement your LSM — let's just not have an upstream-blessed LSM, right? I think these are good discussion points, but at the end of the day you decide what your trust boundary is.

Mm-hmm. I think — because Alexei in the BPF office hours asked why the RPM parser is in the kernel — the answer is that if we push it outside, we need to protect it, and that means increasing the complexity of the security module to protect the user space process which is doing the conversion.

So here are a couple of things, right? We agree
— and please correct me if I'm wrong here — we agree that the helper you're trying to use to build that digest extractor in the kernel is useful, so let's implement that, let's have it there. Now, for a user who says, "look, I don't want a BPF program doing the digest extraction, because I sufficiently control the way my boot image is constructed" — I can put the image onto my cloud before workloads are scheduled and create the digest extraction there. This is the unenforced mode of the thing, where I trust the user space process: I build this list, then I reboot, and I am done. One could do that, or one could create it using a BPF program — both are fine.

Okay, I have another question — can you go back one slide? This digest items hash map: how do you make sure that nothing tampers with it?

We were just discussing that. The problem was that I wanted to implement the bpf_map security hook, to catch the write attempt and deny it when the write is on the digest items map. The problem is that if I did that on that hook, "bpftool map list" stopped working. So I tried a different solution, which is to implement the bpf security hook: I get the command of the system call, and if it is a map update element, I use a new helper I added, which takes the file descriptor from user space and the pointer to the digest items map and compares them in the kernel; if the maps are the same, I return permission denied — I reject the operation of updating the element. I discussed this with Alexei, and probably the solution is just a small fix in bpftool to not request write permission when it only wants to get the info for a map; that way I don't have to add a new helper, and it's more straightforward.

Okay. I was thinking, you could just freeze this map from user space, right?

No, that's — like this bpftool
stuff is, I don't think it's related. It's just what we found out: bpftool, when it's iterating maps, is doing get-next-id and then fd-by-id, and there is only one fd-by-id path in BPF, which requests everything by default — by default we request read-write when we request an fd by id. That's all. But let's say that's a trivial extension to BPF and to bpftool: when it just needs the info for a map — when it just wants to say how big the map is — it doesn't need to request write permissions. That's just a bpftool problem. If you have some other application that tries to tamper with it, I guess from user space you could just do the freeze thing, and updates from user space would be denied, but a BPF program could still do the update. But then the question is: what if other BPF programs would try to tamper with that map? How do you—

So I think we need to check the signature of the BPF programs that we load, so that we load only the valid parser. And I don't know if we could rely on freezing the map from user space, because user space is malicious in my threat model, and we need to prevent all the attempts to write.

So I think basically the sequence — which I think Roberto was describing — is: in user space, if you populate the map and then freeze it, there's a race, right? There's a window where somebody could exploit that. In the kernel, what I think the LSM policy does is: the LSM policy is loaded up front, and there is some way in the LSM policy — there are no signatures yet, so something else — that figures out that this is the only BPF program that is supposed to access and update that map, and everything else is rejected.

Yes.

Okay, sounds okay. This we discussed; if there is no other question I can move to the next part. Okay, so this is the access control stage,
where we take the hash map that we just populated from the parser. First, we call IMA to calculate the digest of the file that is being accessed — in this case executable code that an application wants to run. After IMA returns the file digest to the security hook, we do a lookup in the hash map, and depending on the result of the lookup we allow or deny the operation. For performance reasons we also cache the result of the lookup in an inode local storage map, and we invalidate the result when the file is written.

Wasn't there a patch that we had to pull the file hash out of the IMA machinery, so that you didn't have to do all the — is that what this is? Did it get merged?

Oh yes.

Cool, great, I'm happy. So my only question, looking at this: IMA is caching the results — why do you need another inode local storage to cache the digest?

Because the IMA cache depends on the IMA policy, and with DIGLIM I'm not initializing IMA with a policy; I'm just asking for the digest calculation functionality.

Sorry, just a double check: so it's calculating, but the IMA policy flag is still off, so it's not doing integrity—

Right.

So you're saying — also, about bpf_ima_file_hash — you're calling this new helper that you added, bpf_ima_file_hash, right?

Well, bpf_ima_inode_hash actually existed already, but it was still relying on the IMA cache. The problem was that in order to get the file digest from eBPF, you needed to ensure that IMA had calculated the digest before you accessed it from eBPF.

Yes, and that was fixed, right? So this is a solved problem. So my question then is — because you just mentioned it will either calculate it or find it — in your scenario it always calculates
fully, it's not cached, there is no cache at all, right? Because you don't enable any IMA policy, IMA doesn't do any caching of the digest, so the first time you're accessing it with bpf_ima_file_hash it calculates the whole digest, right?

Basically, yes — I'm using the IMA hash the first time.

But after the first time you're doing the caching in the inode local storage, right?

Yes, the inode local storage.

Okay. And I think it is also more efficient, because the IMA cache is still a tree, so you have logarithmic time, while in this case it is constant time, because the inode storage is associated with the inode.

Yes. If IMA were implemented as an LSM it would be the same. Okay, so now I think is the most difficult part: how to parse the packages with eBPF. With the verifier I had many, many challenges, but in the end I managed to do the parts that I needed. These are the details of the formats and what I needed to do. I rely heavily on the bpf_loop functionality. In the case of the original digest list format, there are two nested loops: one over the blocks of digests, and then over the count of digests in each block. For RPM I think it's easier, because there are three sequential loops: first to get the offsets in the data section of the RPM, and then I also process the directories of the RPM header, because I take only the files that have executable permission, plus the files that are in /lib/modules or /lib/firmware, because they are not executable.

I think the key takeaway for us here could be: what are the verifier issues you ran into? Because typically, when you're trying to write something complex — there is reading something from user space, you load it, then you try to do something with BPF, you run into these things — and typically it's
because you don't know how you could make it more efficient in BPF, right? So that is a development thing that we should sort of fix in general.

It was not just for efficiency but also for trust, because the first implementation of DIGLIM relied on executing a user space parser which linked against the RPM library, and so there is this indirection; since we consider root malicious, the problem is that root could trace, could attach to the process, and try to alter the extraction of the digests.

So I don't quite understand why we need to parse the RPM. Why do we not just explode the RPM into the individual hashes in the beginning? You're parsing the RPM in eBPF, yes — but if you're building a hash map of files that are trusted, why not just build the flat file hash map and explode that at pre-processing time? If I understand correctly, you're using the RPM packaging format to get the digests of the files. Why not just take the digests of the files and store those in the map, lock, stock and barrel, instead of having to parse the RPM file format — and remove that complexity?

I see. So — your threat model doesn't trust user space.

Yes. So you could have the hash map pre-populated, right? You don't do the extraction there: you have the hash map populated, you walk through the hash map in BPF, and then you verify the signature in your BPF program. All you care about is the verification of the signature.

Yeah. So, as in the previous slide: if I put the digest extraction component in the kernel, I rely on the kernel keys, I rely on the signature verification facility, so I don't need to trust anything else.

I guess the only sort of malicious attack here is that user space adds a rogue digest that shouldn't be there, right? But such a digest should never exist, because it is
not signed. So I think it depends on where the generation is happening — whether you're self-rolling the system or you have a dedicated CI/CD pipeline building your images — whether this is a threat in your threat model. That is the question. Okay, so the threat model is: everything outside the kernel is untrusted; user-space processes can be fully untrusted. I have a threat-model slide here, maybe we can talk about it. With the green lines you see everything that we need to trust in order to do correct enforcement, and you can see that the attacker's user space can tamper with the process that is pushing the digest lists to the data input map. There are also some other issues, but the main point is that it is the kernel which takes care of verifying the signature of the digest list, and after that it does not go through user space anymore.

Then you don't need to do the parsing in BPF — you could do the parsing in user space, right? But this means that I need to trust the user space. Not necessarily the user space of the machine you're on, but the user space of a trusted machine, with a trusted computing base. We're not trusting the machine — I'm assuming this is bigger in scope than the currently running machine: I'm running a fleet of machines, and I might have a trusted machine, a trusted place where I build my images, right? Then I don't have to worry about this threat model, and if I take that out of the threat model, the BPF side becomes much simpler.

Okay, I think you're asking about a trusted initialization process, right? So the problem is that on the end-user system you have a set of packages that you don't know in advance. You are starting with a system, and the only thing you can trust is the signature of the package. So if we do trusted initialization, it means there is a part of the process we need to protect in another way — I'm not sure how. Because the way I was thinking to do it is: the kernel is enforcing, so it does not allow anything to execute except something which is signed. If the BPF program is built into the kernel, we don't have the problem, because the programs are running from the beginning, and that's it. If we have the BPF program in a kernel module, we are also covered, because the kernel module is signed. The problem is when we load the BPF program from user space, because we're not sure about everything that runs before that.

Doesn't that mean this has to be done in early boot, which would mean it has to be part of the image, to get the trust chain all the way back to the TPM or whatever's at the bottom? The problem is that the image means you have a fixed set of packages. Well, otherwise you're racing, right? I don't see a third option. The options I see are: you do it in early boot, there's no race, things are simple, it's built into the image — that's how a production fleet would be deployed, where you have control of the image. And then you're saying this is an end user who has just got some random image running. Yes. So how do you know that that thing is going to run first? There's always that. And you say you don't trust root — but do you have to trust root enough to start this? Otherwise your trust lines over here are broken, right? Yes, this is the problem that we have. The red lines are basically what the kernel cannot verify; the green lines are what the kernel can verify. And we could avoid trusting user space if, for example, we used the light skeleton: if we compile DIGLIM eBPF with the light skeleton, we embed it in a kernel module as a BPF program, and the kernel module is signed, so we don't have the problem. But
that means you have some control over the image then, right? At which point, why don't you just put the whole list in the image to begin with? Then you have no user-space app at all: you load your module early, somehow you get into this early boot, it's all signed — why don't you sign your digests as well? Then there's no user-space piece at that point. Yes, I think we're agreeing. It seems like you either go early boot, or you have a race — and then you either say a race is better than nothing, or you say 'I want correctness' and you modify your image, whether after it's been built, as a sort of addition, or at the time it's built. Is that a fair summary? The race window is before the thing starts, right? Yeah, that's what I'm saying. You can't keep saying 'I don't trust root' — I think that's what's throwing me. You keep saying you don't trust root, but you are trusting root, because if you don't trust root, you need to be in early boot. Yes. Everything needs to be in early boot; there's no user-space application necessary, except maybe for convenience. Or you trust root enough to launch your application. No — actually it was your second option, in which I need to explicitly allow the loader of the digest lists to run, because once you start DIGLIM enforcement, you cannot run any application. And it's root that's going to do that, right? Actually, it is the kernel module which does the pre-initialization: it loads the light skeleton, then takes the signed digest of the loader from the disk, loads it into the digest items map, and then the security module can allow the loader to run. So maybe, to go back to what I said earlier: I think we can do your helper, and you can do this. I think now we're arguing over your use case,

which maybe doesn't help you get what you need. But I'm still trying to see: if you put the skeleton into early boot, you have a way to run before everything else gets called — why can't you also embed your digests in that, and just go from there? One problem is that it failed: when I tried to add the digests that early, it didn't work. There is also another problem I'm thinking of now: we cannot do signature verification until the keys are loaded in the keyring. But that's pretty early on, right? Why don't you hook in after that? It's late — it's very late. I looked at what is executed without enforcement, and there are 368 executions of modprobe before the keys are loaded. How does module loading work then — aren't those signed? The modules are signed, so you can restrict that with sig_enforce; the problem is that the user space is not signed — modprobe could be arbitrary. But you wouldn't care, right? Because it's loading signed things, so even a malicious modprobe is doing the same thing the legitimate modprobe would be doing. I'd have to check this, but you could replace modprobe with a shell, with an SSH server, and then the system is in an unknown state, because you have some processes running that you did not enforce on. So basically what I wanted is an even more restrictive environment: before the security module gets to approve execution, you deny everything. That could break firmware loading, could break some things which are needed, but the idea is that if you want to ensure there is only executable code that you approve on the system, you need to deny until you can check, right? But that only works if you get into early boot, in my view, right? Because otherwise you can't. How would
you load anything? How would the system come up if nothing can exec? So you need to be in early boot, I think. And then why don't you put yourself in the BPF LSM activation path, way back there? Because here we have to wait for the keyring — we have to deny until the keys are loaded into the keyring. Do you know if there's any blocker to moving that to an earlier initialization point? Probably not — at least for the keys there are no blockers. I know there is a blocker for IMA, because it needs to wait for the TPM to be initialized, but that is an IMA-specific thing. For the keys, the key types are registered early — I'd have to check — but it would be possible to make that independent of the other one. Yes, for the sake of BPF, move it earlier. I'm not sure we'd catch every execution, but we can get pretty close.

Okay, so here I put the two options for DIGLIM eBPF: in the kernel, or in a kernel module, loaded as early as possible. In the in-kernel case we don't need, for example, extra support in the initial RAM disk, because we have appended signatures on the regular files and it works — but it requires the kernel to be modified and the changes pushed upstream. Otherwise we use the kernel module, but the kernel module needs to be loaded by the BPF subsystem.

Aren't you using your own kernel? Or are you trying to do this for the standard Fedora kernel — what's the goal here? The goal is to have this feature approved for Fedora and for openSUSE, and what I'm trying to do is minimize the changes they need to make. In the past I proposed the in-kernel implementation of DIGLIM, but it was rejected because it required DIGLIM to be upstream, and since I could not do that, I started to implement DIGLIM with BPF. Because in this case, with BPF, the only change needed is to invoke DIGLIM early enough, and the vendor just needs to sign the eBPF program. This is much less invasive, because before they needed to accept patch sets for IMA and for the digest lists, and in this case it's quite minimal, quite contained.

Saying 'sign the BPF program' — how does that work? Okay, so if we use the light skeleton, I think it should be signed with the boot image, right? No — sign the kernel module. Okay. So in my repository I included a Makefile for the kernel module, and that kernel module is what needs to be signed. And if we load it via bpf_preload — bpf_preload, I think, happens no earlier than when you mount the bpffs. If you build it as a module, then according to you it's late, right? It depends, because we can have an initcall that invokes request_module() and causes DIGLIM to be loaded. Yeah, actually that's true.

So have you talked to the Fedora folks about this current approach? I thought to wait until it is more clear what happens upstream, because the feature was rejected when they needed the code to be upstream. So I wanted first to understand what the BPF subsystem could offer in terms of loading, and also whether the helper would be accepted. If we have this basic infrastructure that DIGLIM requires to run, I think it should be fine for them to accept it. If I were to get a bucket of popcorn and enjoy the discussion with Fedora — all you need is the helper to be upstream, right? Yes, that's the only blocker: the helper to be upstream; and loading the kernel module, I think, is more reasonable. I'm not sure whether we discussed putting DIGLIM in the kernel tree, or whether it should be a separate repository.
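Going back to the parsing discussed earlier, the loop structure can be sketched in plain Python. This is an illustrative toy layout, not the real DIGLIM digest-list or RPM header format, and in the actual implementation these loops are `bpf_loop()` calls in BPF C subject to the verifier; the block just shows the shape of the two nested loops and the file-selection rule described above.

```python
import struct

DIGEST_LEN = 32  # assuming SHA-256 digests


def parse_digest_list(buf: bytes) -> list[bytes]:
    """Toy block-structured digest list: a block count, then for each
    block a digest count followed by that many fixed-size digests.
    Mirrors the two nested bpf_loop calls described in the talk."""
    off = 0
    (num_blocks,) = struct.unpack_from("<I", buf, off)
    off += 4
    digests = []
    for _ in range(num_blocks):            # outer loop: blocks of digests
        (count,) = struct.unpack_from("<I", buf, off)
        off += 4
        for _ in range(count):             # inner loop: digests per block
            digests.append(buf[off:off + DIGEST_LEN])
            off += DIGEST_LEN
    return digests


def want_file(path: str, mode: int) -> bool:
    """Selection rule from the talk: keep files with executable
    permission, plus anything under /lib/modules or /lib/firmware
    even when not executable."""
    if mode & 0o111:
        return True
    return path.startswith("/lib/modules/") or path.startswith("/lib/firmware/")
```

The constant-bounded loops are what makes this pattern expressible under the verifier; `bpf_loop()` exists precisely so the iteration count does not have to be unrolled.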
Because another possibility is that DIGLIM lives in the BPF directory and is compiled there, and this would solve... So, for people who can change things: what is the story for bpf_preload, where currently the source — the light skeleton — effectively has to be in the kernel tree? What are your thoughts on this, Alexei? How could we make it more flexible, with a config file that points to things that could be built as part of the kernel image? I would just say that adding this to the kernel tree seems questionable to me, because as soon as you add this, I'll ask you to add my thing, right? Or we can fight about which one it's going to be. It feels to me like this is going to be specific to your implementation. Yeah, for me it's fine if we just invoke DIGLIM from a separate kernel module. I suspect it needs to reach a level of stability and usefulness to be accepted. If, let's say, Fedora indeed starts using it, and a year later they say, well, we want stronger security, so it should be not just a kernel module distributed with Fedora and signed by a Fedora key, but part of the kernel — at that point I think it would make sense to open the discussion and make it indeed part of the kernel, but right now it's too early. Yeah. For now the number of lines of code is quite limited — I calculated 820 in total — and I think the complexity is manageable for a review. We could probably work more on the security module, but at least the parser is quite stable. I think the thing we're missing is the configuration-file-based approach: currently we have this iterators.bpf.c under the BPF preload directory, which is the only implementation of a preload thing.
It has hard-coded which programs get loaded, and that needs a bit more work, so that we could say: this is the manifest of BPF programs, or light skeletons, we want loaded as part of the kernel — and that could be passed in the boot process, not necessarily checked in, because otherwise it defeats the flexibility purpose. So for Roberto to have Fedora adopt this, they would need something: I think either patches on top of the Fedora kernel, which would go into the kernel's BPF preload code, or — the better option — something that could be indirected, like via a manifest file, and then this is just loaded as part of the boot process. A configuration file, I think, is not a problem for them to enable, but they require the code to be upstream. I'm not sure whether they also need the BPF program itself to be upstream, but at least the infrastructure part, yes. It could work — if you have some glue that makes bpf_preload more flexible, and this code is checked in or put somewhere, then you could put those together as part of your build process. I guess this is a feature request for somebody implementing this. To preview the talk on Wednesday: the block folks have been talking about adding BPF enforcement in the block layer, and they would need it really early — not as a kernel module, but part of the kernel, at boot time, even before firmware loading. Really early. So we'll have this discussion Wednesday. I think the early part is fine, but what we need is that we don't want everybody else's stuff in the kernel sources, right? I think the concept is good, but we need to remove the RPM-specificness and make it a very generic thing that could be used by any platform anywhere.
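The generic, format-agnostic flow being argued for here — the kernel's only job is to verify a signature over a flat digest list and then put the digests in a map — might be sketched as follows. This is illustrative only: the real implementation uses the kernel keyring and (proposed) PGP signature verification, so the symmetric HMAC check and the `TRUSTED_KEY` name below are hypothetical stand-ins that just make the control flow runnable.

```python
import hashlib
import hmac

# Hypothetical stand-in for a key on the kernel's trusted keyring.
TRUSTED_KEY = b"distro-signing-key"

# Stands in for the BPF digest items map.
digest_map: set[bytes] = set()


def ingest_digest_list(blob: bytes, signature: bytes) -> bool:
    """Verify the signature over the whole flat digest list, then
    populate the map. Nothing coming from user space is trusted
    until verification succeeds."""
    expected = hmac.new(TRUSTED_KEY, blob, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, signature):
        return False                       # reject tampered/unsigned list
    for i in range(0, len(blob), 32):      # flat list of SHA-256 digests
        digest_map.add(blob[i:i + 32])
    return True
```

The point of the sketch is the trust boundary: the untrusted pusher can hand over anything, but nothing reaches the map unless the signature checks out, which is exactly the property the kernel-side verification provides.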
If it has to be in the kernel source — that's my newbie opinion. Yeah, so the mechanism itself could be inside the kernel and the parser could be outside. The parser should be separated out from the security mechanism as such. I mean, really you're just saying: I'm giving you some signatures, put them in a map. I can repeat myself, but yeah — if you get early enough in the boot process, all you're asking is: here's a list of signatures, put them in a signature map for me that I have access to from BPF. And just to add to that: if it's part of the Fedora build process, they have all the packages, right? So they could extract all of this and build it into the kernel they are building, for example. They could extract... but they will not put the RPM headers in — the RPM headers are in a separate directory, which is populated, for example, at installation time. I mean, you have to start trust somewhere, right? This goes back to the 1984 'trusting trust' paper: where do you start your root of trust? And if they're building the RPMs that are trusted for the base install, they should be able to generate this initial map, so you can start this bootstrap cycle of ingesting the signatures. Okay, I think it's straightforward with the parser.

Well, just want to say, time-wise, that's been the longest presentation so far today. And we have a few public service announcements. We have a dinner somewhere here — it's not a room, maybe it's outside, I don't even know where it is — but it is at six. And at 5:30 we have lightning talks, which is only about 45 minutes from now, so I'd like to finish up soonish,
so folks can have some rest and join at 5:30 in the main hall, where Joseph will give a lightning talk, and maybe a few more. And then dinner at six — I guess it's five. Okay, then we'll probably conclude soon. Would you like to see the demo? Okay, let's see if it works. And tomorrow we'll start at nine, similar to today, beginning with program signing.

Just to show that now we have a file for each package — this is what we use to extract the digests that we use for the access control. So now the system is ready; I previously did the configuration step. This is just openSUSE with the modified kernel. Now I tell the bootloader to run DIGLIM, so on the next boot we will enforce verification of executable code. I decided to do a very large installation with a desktop, so there are many packages, just to stress the implementation and see if it works. It's loading... okay, it worked, and I have a script that executes some steps to show that the enforcement is actually done.

First I copy `id` into the current directory and execute the copy — it works. Now I append some data to it and try to execute it, and it doesn't work, because the code is different. We also see the attempted execution in the journal, with some useful information. Now we see the number of digests that we have: 26,177. Next I do an installation — we also catch installation and removal of packages and update the digest items map. We can see that the number of digests increases by one, because we take just the digest of tmux. We execute tmux and it works; we copy tmux to the current directory, execute the copy, and it works. Then we uninstall tmux and see that the hash map is updated — the element is removed. Now, when we execute tmux, surprisingly it still works, because the digest lookup is still cached, and we need to invalidate the cache by writing to the executable — we just touch it.
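The caching behavior seen in the demo — a cached per-inode verdict that survives removal of the digest from the map until a write invalidates it — can be sketched like this. The `Inode` class and its `cached_ok` field are hypothetical stand-ins for BPF inode local storage; this just models the observable semantics, not the actual kernel code.

```python
# Stands in for the BPF digest items map.
digest_map: set[bytes] = set()


class Inode:
    """Toy inode; cached_ok stands in for BPF inode local storage."""
    def __init__(self, digest: bytes):
        self.digest = digest
        self.cached_ok = None          # None = no cached verdict yet


def may_exec(inode: Inode) -> bool:
    """Constant-time cached verdict; falls back to the map lookup."""
    if inode.cached_ok is not None:
        return inode.cached_ok
    inode.cached_ok = inode.digest in digest_map
    return inode.cached_ok


def on_write(inode: Inode) -> None:
    """Any write to the file (even just a touch) drops the cache."""
    inode.cached_ok = None
```

Run through the demo's sequence with this model: execute (allowed, verdict cached), uninstall the package (digest removed from the map), execute again (still allowed via the stale cache), touch the file, execute (now denied) — matching what happened with tmux on stage.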
And if we now try to execute, it does not work, because the lookup is not in the cache anymore; we see the second attempted execution. Now we install another package, but from Fedora, for which we don't have the key, so the signature verification will fail. We see that this tmux is not allowed, and we see the third attempt, with the digest of tmux. The problem now is that this setup is not secure, because it's running from user space: I pinned the program to bpffs, and it's sufficient to just remove the files — and now it works, so the enforcement is not running anymore. So I compiled DIGLIM as a kernel module; now it's doing a modprobe, and IMA is ready to calculate the digests. And when we execute the tmux for which we don't have the digest in the hash map, it does not work. Okay, that's it.

You can freeze the program too, right? Well, we freeze maps — in this case it's frozen for all user-space processes, so the parser can still add a new element, but user space cannot. I was just saying that deleting the program could be stopped by ensuring that the bpffs entry is never removed. The kernel could just add a flag — like we do for frozen maps, you could say 'freeze the program', and no amount of finagling with the file system would cause it to be removed.

How do you plan to deal with updates? I get a new RPM package and I need to install it — how does that work now? It's not my tmux, it's an untrusted binary. Okay, so this is the untrusted part: RPM here is completely untrusted. I created a new plugin which takes the header of the package being installed and pushes it to the data input map. The header is signed, so if RPM starts to behave in a wrong way, we can detect it from the kernel when the signature verification is done. Just one minute for the conclusion.
Okay, so we saw that enforcing verification of executable code is not difficult; the problem is the ecosystem, because we need to provide not only the mechanism but also the reference values — the proof of authenticity for the files. So the fact that Fedora 37 will include file signatures is a positive thing, because it means the integrity feature is available for users. But we think DIGLIM eBPF is more flexible, because it's not tied to a particular package format and does not require the Linux distribution vendor to do a mass rebuild. And since DIGLIM is modular, we could take only the part that collects the reference digests and do the second part, the access control, with IMA itself, for example. There are minor dependencies, as I showed: the helpers required for signature verification, and the patch set supporting PGP keys and signatures, which is not yet in the kernel — but David Howells is here, so we'll try to convince him to accept it. I hope we can have some fruitful conversations and manage to add the requirements of DIGLIM to the kernel. Thank you.

All right, thanks a lot. So that concludes the first day. And in eight minutes there's the lightning-talk session mentioned earlier, in the main hall, right? All right.