So, my name is Matthew Garrett. I work at Google on the security of our internal Linux desktop systems. Effectively, my job is to make desktop Linux secure, which is interesting.

Today I'm going to be talking about some work I've been doing around various existing pieces of functionality that have not necessarily been terribly well advertised or terribly widely used, but which are intended to provide more accountability and verifiability for the files on a Linux system. Why do we care about this kind of thing? Why is it important that we think about signatures on files, given that we already have a robust mechanism for verifying the signatures on the packages that contain those files in the first place?

The first answer is being able to tell where a file came from. The fact that you installed a package containing a file at point A in time does not mean that, at later point B, the copy of the file on your system is the same file that came from the package. We unfortunately know that we're not as great, as an industry, as we would like to be at ensuring that people cannot get into infrastructure they're not supposed to be able to get into, and attackers who do get in may be able to replace files on those systems. So being able to verify the state of a file, not merely where the file nominally came from, is important.

We do have some amount of that already, in the form of debsums: you're able to verify whether the files on the system correspond to the metadata that came from the package. If you're using Tripwire, you can even do that in ways that give you some meaningful trust that the stored copy of the hashes has not been tampered with. But that leaves you open to something we call a time-of-check/time-of-use attack. The fact that you verified at, say, midnight in a cron job that the files had not been tampered with does not guarantee that an attacker has not swapped a file after that validation, so that at the point where you run the file, it has been modified and is now malicious.

And finally, there are various security mechanisms where we have metadata associated with a file that is responsible for informing the kernel what sort of privileges it should give that file. Generating signatures that cover that information gives us more trust that it, too, has not been tampered with.

The majority of what I'm going to be talking about today is a piece of kernel functionality called the Integrity Measurement Architecture, or IMA. Quick show of hands: how many of you have heard of IMA? Almost ten — that's quite a lot more than I was expecting. And how many of you have deployed IMA? Okay, zero. That's pretty much what I was expecting.

The Integrity Measurement Architecture is intended to measure files. Measurement is a sort of hand-waving crypto jargon term that means hashing, and recording the hash. So when you run a binary, IMA will, based on policy, generate a hash of that binary. And on a correctly configured system, it will only do this once per binary during a particular boot of the system. It makes use of a feature called inode versioning: if you modify a file, the kernel increments the inode's version counter. So in the simple case, you run a file, the kernel measures it — hashes it — and then records that hash.
That record can take the form of an audit log event, or it can even take the form of pushing the hash into a TPM, so you can later perform remote validation of it in a cryptographically secure way. If you then run the same file again, it will not generate a second measurement; it only does that the first time. If you install a new version of the file, or if you modify the file, the inode version counter is incremented and the kernel will measure it again. You can then go back through this log information and say: oh, this system ran a file that hashed to something I wasn't expecting; I should go and look at that. You have evidence that it happened, but you didn't block it at the time.

More recently, a feature called IMA appraisal was added to the kernel. In this mode, the expected hash is stored in an extended attribute on the file system, alongside the executable. So in this setup, when you run the file, the kernel hashes the executable and compares that to the on-disk copy of the hash. If those don't match, you again have a couple of choices, set by policy: you can either generate an audit event, or you can block execution completely. And when I say execution — none of what I'm talking about is actually strictly limited to executable files. You can write policy that enforces this on everything. Now, you probably don't want this enforced on something like proc, because you can't set extended attributes there, you don't have an expected hash for everything, and everything in there is changing all the time; that would obviously not work. So you can filter out specific file system types in your policy — you can say, don't look at stuff in proc or sysfs. Otherwise, though, if you want, you can insist that all files be utterly immutable: even on a read-write file system, you can say that any file opened on the system must have a hash that corresponds to the on-disk copy of the hash. That's not necessarily a good policy, but it's something you can do. You might ask why you would do that, given that dm-verity allows you to do something very similar — but only in a read-only way. Here, you still have a read-write file system: even while insisting that everything be immutable, you can still install new copies of files, as long as the hash shipped with the new copy is correct.

Now, the obvious flaw in this approach is that if an attacker modifies a file, they can just modify the extended attribute as well, and then the whole thing is bypassed. So we need more than that. Skipping even further forward in time, we now have support for digitally signed copies of these hashes. In this configuration, when the kernel reads the extended attribute, it verifies that the attribute is signed with a trusted key. If it is, it then compares the hash of the file to the hash in the signed extended attribute, and if that's valid, the file is allowed to proceed. If not, again depending on policy, execution is blocked or the event is logged. We now have a much higher level of assurance, because if anybody tampers with the file — even if they tamper with the extended attributes — they won't be able to generate an appropriate signature, unless they have a copy of the private half of the trusted key, in which case you've made a terrible series of mistakes. So let's assume that you haven't made those mistakes. Good: you now have real assurance.
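As a concrete illustration — a minimal sketch, assuming securityfs is mounted at /sys/kernel/security and using a hypothetical signed binary /usr/bin/frob — you can inspect both the measurement list and the appraisal metadata from userspace:

    # Read the measurement list IMA has accumulated during this boot
    # (columns: PCR, template hash, template name, file hash, path)
    cat /sys/kernel/security/ima/ascii_runtime_measurements

    # Dump the security.ima extended attribute that holds the expected
    # hash or signature (getfattr is in the attr package; reading
    # security.* attributes generally requires root)
    getfattr -m security.ima -d --absolute-names /usr/bin/frob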
Now, obviously, at this point people are potentially going to be a little concerned, because generally, when we talk about signing files, we're talking about doing so in ways that restrict user freedom. We're talking about having a built-in set of keys that can be used to restrict people from building and running their own software — scenarios where suddenly users aren't in control of their systems anymore. And we saw a great deal of controversy around exactly this in the Secure Boot discussions. So I'm going to back up a little and talk about some additional kernel functionality that's relevant here.

The kernel has support for keyrings. These are used for various purposes, but effectively a keyring is just an in-kernel information store. There are several different key types, one of which, "user", is effectively just an arbitrary data store: a way of putting small amounts of information into the kernel and then imposing fairly fine-grained access control on that information, making it impossible for it to be accessed by things that shouldn't be able to access it. Obviously, we're assuming the kernel itself is secure here. But the kernel also has support for treating keys as X.509 certificates — actually interpreting the information you give it. And one of the ways we can use that is to set up chains of trust. You can configure a keyring such that anybody can add new keys to it, provided those keys are in turn signed by a trusted key the kernel already knows about — which means you can inject new keys at runtime.

The keyctl utility is used for doing this. The man page for keyctl is technically correct in every way, but it doesn't give you a great deal of context about anything. So in terms of documentation, I would actually recommend reading the kernel documentation. It's much more focused on the API rather than the userland side of things, but there's a very good mapping between the kernel API and the keyctl functionality — they were written by the same person, which is helpful in this respect. The kernel documentation gives you a much better conceptual understanding of what's going on than the keyctl man page does; the keyctl one is really very much a reference document.

So: we want to create a keyring; we want it to be possible to add new keys to that keyring; but we want those keys to be verified against an existing trusted key. The kernel documentation is what you should read, but here is a nice, straightforward example of using the functionality. First line: we're creating a new keyring — keyctl newring rootcerts @u. The keyring is called rootcerts; the @u means it's created on the user keyring. You want to do this at the beginning of boot, so that everything spawned from there on inherits these keyrings. keyctl padd just means "add from a pipe" — it reads the key from a file; "asymmetric" is the key type; the empty string is the name (you can give a name to each key you load); and we tell it to add the key to the ID of the rootcerts keyring. Each keyring has a unique numerical ID, so when we create the keyring, we assign its numerical ID to a variable. We then add the root1.cert key to the rootcerts keyring. Next, we create another keyring, in this case .ima, which is the keyring the kernel will use for validating IMA signatures. And finally, we restrict the .ima keyring, insisting that any keys added to it must chain back to one of the keys in the rootcerts keyring.
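Roughly, that example looks like the following sketch. The keyring names and the root1.cert filename are placeholders from the slide; asymmetric keys must be DER-encoded X.509 certificates, restrict_keyring needs a reasonably recent keyutils and kernel, and exactly which keyring IMA consults depends on your kernel configuration.

    # Create a keyring for root certificates on the user keyring;
    # keyctl prints the new keyring's numeric ID, which we capture
    rootring=$(keyctl newring rootcerts @u)

    # Load the trusted root certificate ("padd" reads from stdin;
    # "asymmetric" is the key type; the description is left empty)
    keyctl padd asymmetric "" "$rootring" < root1.cert

    # Create the keyring that will hold the IMA signing keys
    imaring=$(keyctl newring .ima @u)

    # Restrict it: new keys are accepted only if they chain back to
    # a key already present in the rootcerts keyring
    keyctl restrict_keyring "$imaring" asymmetric "key_or_keyring:$rootring:chain"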
So at this point, even though the .ima keyring is still writable, there's no way to add anything more to it unless the new key is signed by a trustworthy root key. And this is relevant because it allows you to generate different keys for different purposes. In Debian, for instance, you could sign main with one key, contrib with another key, and non-free with a third key. A user would then be able to choose which keys to trust. If you had a Debian key that all of those were signed with, you'd be able to load all of them — but a user could also load just, say, the main key and lock the keyring down, at which point nothing more could be added. And then you'd have a system that could only run Debian-verified free software. You wouldn't actually have the freedom to build any of your own software, which could be construed as an issue, but if you want to be very, very sure your system is not going to run any non-free software, you can do that. Can you handle this much freedom?

You do still have something of a problem, which is that for any of this to be useful, the signatures actually need to end up on the user's system in the first place. If somebody has the files but not the signatures, there's no way to do any of this. As I'm sure most of you know, a Debian package is an ar file containing tarballs. When we extract the tar contents, that's how we end up with things like file ownership and the appropriate permissions, because that information is stored alongside the file contents within the tarball. Tar is a kind of old format — tar was designed to let you store and restore information on tape. On the plus side, it does post-date some other formats, but for the most part tar is not, in itself, particularly helpful here, because tar doesn't allow us to store extended attributes. There's no way in the tar format to add that metadata.

There's a more recent archive format called pax, and pax has the benefit of allowing extensions. You can extend pax's functionality, you can add additional metadata, and — this is the great part — the metadata can all be nicely namespaced, so you're able to add extensions that aren't in the core format without having to worry about colliding with anybody else's extensions. There's even an extension to the pax format that stores extended attributes. So this seems fine: we just add support to dpkg for extracting pax archives instead of tar archives, and we rejig the entire build infrastructure to generate pax archives instead of tar archives. That's viable; it's not too much of an issue. The greater issue is that the extended-attribute extension lives in the SCHILY namespace, having been designed by one Jörg Schilling. And it turns out that Debian's relationship with Jörg Schilling is not particularly good — nobody's is, as far as I can tell. So there is some reluctance on the part of various people on the dpkg maintenance team to add support for a feature that would rely on having a good working relationship with your Schilling.

Some additional context here: the vast majority of the kernel IMA support, and the associated userland support, was written by IBM. IBM contributed patches that added pax support to dpkg, which then allowed the IMA extended attributes stored within the pax archive to be written out.
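As an aside — purely for illustration, and not what dpkg's built-in extractor does — GNU tar can already store and restore extended attributes through the pax SCHILY keywords, so a sketch of the pax route looks like:

    # Create a pax-format archive, preserving security.ima attributes
    tar --create --format=pax --xattrs --xattrs-include='security.ima' \
        -f data.tar usr/

    # Extract it, restoring the attributes (writing security.* xattrs
    # generally requires root, and a kernel that will accept them)
    tar --extract --xattrs -f data.tar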
IBM's pax series was not received with much enthusiasm — partly because of the Schilling concern — and it didn't really go anywhere. So we're still left without a way of actually distributing these signatures to users, which means none of this is particularly useful yet.

Debian packages contain a bunch of metadata in the control archive. We have information about, for instance, the hashes of the package contents, which are then used by debsums so you don't need to generate those locally. There's also metadata that's stored locally, in the form of, say, dpkg-statoverride, where you have a local database that says: even though the tar archive contains a file with this ownership and these permissions, ignore those and use these instead. That information is stored locally. So we already have both metadata inside packages and metadata that has to be stored locally. If we could standardize on the metadata format, the nice thing is that we could merge all of this into a single database. We could get rid of the several different metadata files being shipped and embed all of that information in one place. And then we could also add support for including extended attribute contents in that metadata. So this seems like something that would let us clean up various bits of existing functionality tacked onto the side of dpkg, and also solve this problem.

In conversations with the dpkg maintainer, I discovered that this was already under consideration, and that adding extended attributes to it would be pretty straightforward. I would strongly encourage anybody who is about to start work on a feature to talk to the maintainers of the relevant code first, and find out whether they've actually already planned to write it themselves. I don't really understand how this works, but for some reason, doing 50% of the work yourself and then writing on your performance review that you got someone outside the company to do the other 50% seems to be received better by management than saying you did all the work yourself. Apparently collaboration is something that's valued. Hooray.

The format that has been suggested is called mtree. mtree is kind of niche in the Linux world but pretty common in the BSD world. It's a flat text file that, in its more complex forms, contains a bunch of directory and file names — it's very context-sensitive as you parse it — followed by additional entries containing various bits of metadata about each file, such as its size, its hash, its ownership. On the BSDs you can use this to, say, generate the metadata corresponding to a file system hierarchy, and then — after obliterating all the metadata from that file system by copying everything to a FAT partition and back — restore all the metadata from the mtree file. The version we're looking at is a little simpler, in that it's going to be based around absolute path names; the standard mtree format allows a lot of relative path names, which makes parsing more difficult. But it's simple to parse and simple to store: you can just ship an mtree file within the control archive, which then ends up in /var/lib/dpkg, so you have the initial contents there, and you can also merge all that information into a local database.
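A sketch of what an entry in such a file might look like — the path, size, digest, and base64 values here are all invented, and the exact keyword set is whatever dpkg ends up standardizing; the xattr keyword is the extension discussed next:

    /usr/bin/frob type=file uid=0 gid=0 mode=0755 size=14328 \
        sha256digest=6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b \
        xattr.security.ima=AwIEmNqsBQBIMEYCIQDexampleinventedvalue==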
It's then simple to validate that the files on disk have metadata corresponding to the entries in the mtree file. Extended attributes are not part of the core mtree format, but they had already been added to a Go mtree implementation that was conveniently available on GitHub, so I chose to use the same format rather than inventing a new one. It's just a base64-encoded copy of the extended attribute contents, under a keyword that is simply "xattr." followed by the extended attribute name. So when you dpkg -i a package, control.tar is extracted and the mtree file is parsed; then, as data.tar is extracted, we look at each filename, compare it to the mtree contents, and if there's a match, apply any extended attributes present in the mtree entry, using fsetxattr.

The proof-of-concept patches I wrote for dpkg, to verify that this worked without causing unexpected explosions, came to about 30 lines at most of code touching dpkg itself, aside from the mtree parser — so we're not talking about a lot of additional complicated functionality for the simplest case. Guillem would like to go further than this and support using mtree as the on-disk database format. That's going to be rather more code, but he's working on it, and is aiming to get support into dpkg 1.19.0 or 1.19.1, which we're talking about in the reasonably near future.

So, great: if you have your extended attributes and your signatures, you can put those in an mtree file, put that in the deb, install it, and the extended attributes get written out to disk. Brilliant. But how do you generate the signatures in the first place? The answer — the tool you're recommended to use — is called evmctl. evmctl has a couple of minor issues, the primary one being that it doesn't actually build against current OpenSSL, which is exactly what you want to see in security software. That's actually a fairly straightforward patch; I just need to send it upstream. But obviously, generating cryptographic signatures is not a particularly novel problem, and generating the signatures used in IMA is very straightforward — reimplementing that code is trivial. The only problematic aspect is that the signature format carries a small amount of metadata, and the documentation for that format is something I need to write up and submit. The signatures are reproducible: if your binaries are reproducible, you'll always get the same signature, assuming you sign with the same key. So this is not in conflict with any goals around reproducible builds.

The other fun thing is that signatures can be added after the package is built. You don't need to know anything about how a package was built in order to sign its contents. You can take an existing binary deb, run through data.tar, generate signatures for everything, generate an mtree file, embed that in control.tar, put the package back together, and then put that package in the archive. So you've got the choice of generating the signatures during the package build — we could imagine a debhelper step at the end of the package build process that would allow this — but you could also just have it happen at the point where something is added to the archive. You have options. And I'm sure that if someone actually wants to add this, they'll have the opportunity to discuss it with many people who will have strong opinions on the matter.
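Concretely, generating one of these signatures might look like the following sketch, using evmctl from ima-evm-utils and assuming a private key in a hypothetical privkey_ima.pem. The signature lands in the security.ima xattr, from which it can then be collected into an mtree file:

    # Sign the file: writes a signature over its sha256 hash into
    # the security.ima extended attribute (requires root)
    evmctl ima_sign --key privkey_ima.pem -a sha256 /usr/bin/frob

    # Inspect the result: the xattr now holds the signature blob
    getfattr -m security.ima -d --absolute-names /usr/bin/frob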
But the signatures don't enforce anything by themselves. There is no way, in the configuration I've described, that this can be used to prevent a user from running software of their choice on a system they own. That's not true under all circumstances: there's support in the kernel for building in a set of trusted keys, and you can configure the kernel so that the only keys that can be loaded are ones signed by those built-in trusted keys. Anyone who boots that kernel will only be able to add keys signed by whoever holds the corresponding private key. I would suggest that Debian probably should not configure its kernel that way — that is pretty much a straightforward way to prevent users from running software of their choice.

Otherwise, though, even if you are controlling the keys that can be loaded, the default IMA policy doesn't actually enforce anything; the default policy just does measurement. So here is an example of an IMA policy that would do enforcement. "appraise" just tells IMA to appraise whatever matches the rest of the rule. "func=BPRM_CHECK" — it's somewhat unfortunate that internal kernel jargon leaks across this boundary, and it will until someone fixes the whole policy format — means: perform this appraisal at the point where something is executed by the kernel. But not all executable code is code that you execute directly; it's not all code that the kernel pulls out of an ELF file. Running code needs to obtain other code — dynamic libraries, for instance — and it does that by opening a file and mmapping the contents, with an executable mapping, into the process's address space. So the second line tells the kernel that it should not only appraise files that are directly executed, it should also appraise any file that someone attempts to mmap executable. That way you get appraisal of dynamic libraries, and also of any file that someone has written code into and then tries to map executable.

You can even integrate this with Linux Security Modules. The example here, "appraise obj_type=bin_exec_t", means: perform appraisal on any file that has the SELinux file label bin_exec_t. This means you can have multiple different flavours of executable validation. In this configuration you would not appraise files on arbitrary execution; you would instead appraise any file carrying that executable label. Why would you want that? An example: you could have an SELinux configuration that gives different levels of privilege to different contexts. bin_exec_t could correspond to files that came from an installed package — those executables end up labelled bin_exec_t, get appraised, and get the full set of privileges — while locally built stuff would not be appraised, and would run in a different, somewhat more constrained context without full access to the system. So you can build configurations where you allow users to build and execute their own code, but you don't give it the full set of privileges. In a general Debian context that's probably not particularly useful, but in contexts where you have, say, confidential or privileged information that you only want accessible to executables you have verified, it lets you build a setup with real teeth.
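Pulling those rules together, such a policy might look like the following sketch. The fsmagic exclusions and the bin_exec_t rule are illustrative; the policy is written into /sys/kernel/security/ima/policy once per boot, and depending on what loads it, comment lines may need to be stripped first:

    # Don't appraise pseudo-filesystems whose contents change constantly
    dont_appraise fsmagic=0x9fa0        # procfs
    dont_appraise fsmagic=0x62656572    # sysfs

    # Require a valid signature on anything directly executed...
    appraise func=BPRM_CHECK appraise_type=imasig

    # ...and on anything mapped executable, e.g. shared libraries
    appraise func=FILE_MMAP mask=MAY_EXEC appraise_type=imasig

    # Or key appraisal off an SELinux label instead of execution
    appraise obj_type=bin_exec_t appraise_type=imasig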
In that setup, even if an attacker gets onto the system — even if they're able to build their own code, even if they're able to execute that code — that code will be blocked by SELinux. And even if they're able to apply the correct SELinux label to that code, executing it will then fail, because it will not be appropriately signed. So you can build much more around this than might at first seem possible.

One of the problems there, though, is that there's no cryptographic validation of the SELinux label itself. If an attacker is able to tamper with the label on a file, they could potentially do more than you'd like them to be able to do. The other problem is that there's no validation that the file carrying a given label is the file that's supposed to have that label — only that the file with that label has a valid signature. So if Debian shipped something labelled bin_privileged_exec_t and something labelled bin_exec_t, and you then took the bin_exec_t file — the less privileged one — and gave it the bin_privileged_exec_t label, validation would still succeed, because the file is still signed. There's no way to say "anything with this label must be signed with this key, and anything with that label must be signed with that other key"; being signed with any trusted key is sufficient.

EVM solves this problem. EVM is the Extended Verification Module, and it adds an additional security.evm attribute containing a cryptographic validation of not just the IMA attribute but also the LSM attribute. That way, you get cryptographic verification that the SELinux label has itself not been tampered with. The problem is that the EVM information also covers the inode number of the file, which means you can only generate the EVM information once the system has been installed: when you're shipping something in a package, you have no idea what the inode number is going to be when the file is actually unpacked. So EVM is pretty much intended for a very specific set of configurations, which are not necessarily what people want. I'm looking at whether it can be made a little more generic, in a way that would give us validation of a wider set of things. An obvious consequence would be that the LSM labels would need to be shipped as part of the package, rather than being generated locally from policy later — which is very much not how SELinux does things at the moment. SELinux policy tends to be shipped entirely independently, with the labelling applied at package install time or, if a policy upgrade occurs, by automatically triggering a relabel, without the source package for those files having changed. So there's some complexity there.

One major shortfall of this whole approach is that interpreted languages are generally not polite enough to open the things they're going to interpret with an executable flag, so any appraisal keyed on execution won't cover them. Similarly, if you run, say, Python and pipe a bunch of code into it, the SELinux context Python ends up running in is that of Python, not that of the code you're piping into it. I don't have good answers for this; there's definitely more work to do there.

Configuration is another problem: configuration often legitimately has to change, but it also has an impact on your local security posture. If someone is able to modify the kernel command line — and the grub configuration is not signed, because grub configuration is generated locally — then they can pass a kernel parameter that disables IMA appraisal.
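For example — assuming a kernel built with CONFIG_IMA_APPRAISE_BOOTPARAM — the difference between an enforcing boot and a neutered one is a one-word edit to an unsigned grub.cfg:

    # What the machine is supposed to boot with:
    linux /vmlinuz ... ima_policy=appraise_tcb ima_appraise=enforce

    # What an attacker who can edit the config boots instead:
    linux /vmlinuz ... ima_appraise=off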
So you need to come up with a mechanism for dealing with that as well — you can measure those things into a TPM if you want to. This is not a panacea: it does not turn an otherwise insecure system into a secure one. But it is a set of functionality that improves the security of a system and imposes some additional security boundaries that would otherwise be difficult or impossible to impose. I would love to see Debian at some point shipping this additional data with packages, in order to let people come up with and impose their own policies using the metadata Debian provides. So that's everything I have, and I think we've got a few minutes for questions.

Q: You said it doesn't conflict with reproducible builds "as long as you have the same key" — is my understanding correct? One of the fundamental premises of the reproducible builds project is that any arbitrary person can reproduce the build, so "as long as they have the same key" seems like a questionable requirement if we start shipping packages this way.

A: Fundamentally, you can verify that the binary is the same. This kind of signing doesn't alter the binary in any way, because the signature ships as an extended attribute alongside the signed file. So you can do a build and verify that the binary still matches. You can't yourself regenerate the signature, but if you have the public half of the key, you can verify that the signature does correspond to that file. So you can't reproduce the signature, but you can verify that it's the signature that should be there.

Q: Right — and in that case it would be important to ship these some other way, not in the actual package itself. That was one of the big problems with RPM: they embed GPG signatures in the package itself, so you can't verify that you've rebuilt the same package.

A: Right. If you're expecting the .deb to be bit-for-bit identical, this approach would break that. If you're willing to unpack the .deb, the data.tar should be identical; you then need to look at the individual files in control.tar, verify that they match apart from the signatures, and then verify the signatures.

Q: Possibly a stupid question, but regarding putting the data in tar: I can imagine probably five reasons the upstream tar developers wouldn't accept this, but why can't the metadata live in a hidden file inside the tar archive, which tar then knows how to extract and, when it unpacks, apply in the right places?

A: You could, but you'd end up in a situation where someone using an older version of the tool ends up with a strange additional blob somewhere. You could do it, but it's not clean. Having it in the control metadata also means that, for the reproducible-builds case, you can look at the data.tar and verify that it's identical, which you otherwise couldn't do as easily. So I think, overall, shipping it out of band is a small win.

Q: Just from a very quick look at the tar man page, it seems to suggest that you can embed ACLs in tar archives now.

A: ACLs are implemented on top of extended attributes, which means yes — extended attributes as well. GNU tar embeds them in the archive using the SCHILY keywords. But let's step back from that.

Q: Could IMA be used to verify signatures on firmware blobs as they're loaded by the kernel?
A: Yes — in fact, that is one of the supported use cases. While I've been talking about this being used to verify files, as part of the policy you can say that firmware must be appraised. You can also say that kernel modules must be appraised, so we now have two ways to control module signing.

Q: First of all, I don't think it's fair to step back from the tar question. The tar we've been shipping for years does support storing extended attributes. It's a GNU tar extension — I don't know whether it falls back to doing the pax thing or not — but the tar binary we all use has supported extended attributes for a long time. I understand that the tar unpacker in dpkg doesn't, but I think you may be dismissing that too quickly, and I'd urge us, as a community, to have a more considered answer than "let's step aside from that".

A: Part of this is pragmatic: I'm not the dpkg maintainer, so if I'm unable to convince the dpkg maintainer to change their mind, the functionality doesn't get written. But beyond that, from a pragmatic perspective, I think there is genuine benefit in shipping this information outside the data.tar. It makes it easier to verify whether a local build of a package is equivalent to the distributed build, and it makes it easier to ship additional metadata that we don't necessarily want to incorporate into the tar directly. So while it's certainly the case that we could push more aggressively for dpkg to support this in its tar implementation, I don't think that's the approach that brings the strongest wins. I should also say that if the kernel doesn't have IMA support and you try to write out these extended attributes, you'll get an error, because the security xattr namespace is magic — so you could end up needing special cases in tar so that extraction doesn't produce unexpected errors, and some way to recover from them.

Q: I'd urge you to update your presentation a little on that point, then. A second issue I have: when you say the signatures are reproducible, that makes me twitch, because it's not best cryptographic practice. I understand it's common cryptographic practice, but if you look at the theoretical models of public key cryptography, any scheme in which the signature doesn't have a random component has theoretical weaknesses, and we've seen those turn into real attacks again and again. This is the reason you have things like PSS as opposed to PKCS#1 v1.5, and you saw a lot of this in the SHA-3 discussions. I understand that's not your problem to solve, but I'd say it's not necessarily a good thing for us to sit here treating reproducible signatures as a feature.

A: Sure. Okay — I think we're out of time now. Thank you all for listening.