Okay, welcome back to day two of the Linux Security Summit North America. We have another round of talks today. Apologies for the streaming delay yesterday; hopefully that was recorded and will be available later. We had BoF sessions last night: there was an SELinux discussion and I believe an IMA discussion. I'm not sure if there will be any this afternoon, as a lot of people will be wanting to get to the airport and travel, but if there are, certainly write them up on the board. I don't really have much to update today. Again, thanks to everyone for attending, both online and here, and to all of the organizers from the Linux Foundation, the sponsors, and the program committee. The first talk will be Unprivileged Access Control in AppArmor, and we'll wait until 9:05 so that we sync up with the streaming. Okay, are we live? Yep. Alright, I'm John Johansen and my cohort is just down here. We're going to talk about unprivileged access control in AppArmor, specifically application-directed unprivileged access control; user-directed is a whole other ball of wax. By application-directed I mean the application is defining the policy, and the advantage is that the policy can be dynamic, because the application knows things that an outside MAC doesn't know. So basically, think about sandboxing. There are different reasons you see sandboxing (privilege separation, reduced privilege) and it's nothing new; there's been lots of other work in this space. Various mechanisms have been used in the past to do it, but those tend to require privilege. There's OpenBSD's pledge and unveil, seccomp does some stuff like this, unprivileged user namespaces and bind mounts, Landlock. So why AppArmor?
So our policy is the same, right? We've got a basic API, and the "why AppArmor" part comes down to this being something we've wanted to do for a long time; some design decisions were actually made around this years and years ago. The basic API just lets you define a profile in your code and then tell AppArmor that you want to confine your application with it. Confine is really just a simple wrapper: we compile that policy and then load it into the kernel. The compile part is a wrapper around our parser's compiler code. Right now it's very crude, very much a work in progress still. The reason we have to have this library is that AppArmor uses a state machine, and the state machine was designed a long time ago to be very bounded so that we could verify it. It's much simpler than, say, BPF, so it can be verified safe for a user to load, with some other restrictions on it. The compiler converts whatever is there, generally just text policy in the application, into the state machine, and then you load it, with the bounding and runtime guarantees that brings. Compiling also makes sure you don't have errors up front, unlike, say, pledge or unveil, which feed things into the kernel and can get an error thrown back from the kernel. And like our binary policy, you can do that too: you don't actually have to have text, you could have pre-compiled binaries and just load them. But that's not dynamic, so it kind of defeats the purpose of putting it in the application.
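To make the "text policy in the application" idea concrete, here is a rough sketch of the kind of profile text an application might embed and hand to the compile-and-load wrapper. The profile name and paths are invented for illustration; the syntax follows ordinary AppArmor policy.

```
# Illustrative application-supplied policy (names and paths are hypothetical)
profile my_app_sandbox {
  # read-only access to shared libraries and the app's config
  /usr/lib/** mr,
  /etc/my_app/** r,

  # writes restricted to the app's own per-user data directory
  owner /home/*/.my_app/** rw,
}
```

The application would pass this text to the confine call, which compiles it to the bounded state machine and loads it into the kernel.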
Once it's loaded into the kernel, we've verified it, and we can update it unless it gets locked. It's applied at the process level. The API we saw is very simple and static; again, not very interesting from an application point of view, though it has its place. But there's a dynamic API, and this is where it gets interesting for applications: you can expand the policy. You can start with a basic policy and then add rules to it. So you create your base policy, whatever that is, then add rules as you need them, then tell it to confine the application. And even after confining, as long as you haven't locked it, you can update it some more. In the example here, the application updates the policy with a rule for argv[1], some file that was passed in, and then loads it and confines itself with that extra bit. You can lock that policy. That's done on the kernel side, and once it's locked, updating the profile isn't going to happen anymore; you can only add new restrictions through stacking, and we'll get into that in a bit. It's done through a flag saying this profile is immutable, because it's all just profiles. On the kernel side, of course, users are loading into the kernel. Again, we have the state machine, so we're bounding things there, but you also have to be careful: you want to be able to disable this. Not every user necessarily needs or should be able to use it, or maybe there's a bug and you want to disable it at the system level. We also need memory controls, so a user can't just DoS kernel memory. And again, it's verified: it's deterministic, it has very well-defined runtime bounds based on the inputs, and the user can't control the jumps; it's all laid out and well controlled.
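As a rough illustration of the dynamic API's semantics (base policy, adding rules, confining, then locking), here is a toy Python model. The class and method names are invented; this is not the real AppArmor library interface, which is still a work in progress.

```python
class AppPolicy:
    """Toy model of the dynamic application-policy API described in the talk.
    Names are invented for illustration; not the real library interface."""

    def __init__(self, base_rules):
        self.rules = list(base_rules)
        self.locked = False
        self.confined = False

    def add_rule(self, rule):
        # updates are allowed until the kernel-side immutable flag is set
        if self.locked:
            raise PermissionError("policy is immutable once locked")
        self.rules.append(rule)

    def confine(self):
        # the real API would compile the text and load it into the kernel here
        self.confined = True

    def lock(self):
        # kernel-side immutable flag: no more updates, only stacking can
        # add further restrictions
        self.locked = True


p = AppPolicy(["/usr/lib/** mr"])
p.add_rule("/tmp/input-file r")   # e.g. a rule for a file passed in argv[1]
p.confine()
p.add_rule("/etc/extra r")        # still allowed: confined but not locked
p.lock()
try:
    p.add_rule("/etc/shadow r")   # must be rejected after locking
    rejected = False
except PermissionError:
    rejected = True
```

The key point the model captures is that confinement and locking are separate steps: a confined process can keep expanding its policy right up until it locks itself down.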
And of course it's going to apply to the children. So, dealing with processes: this is a pain in the butt. The kernel deals with tasks; tasks get their own cred; threads are just tasks sharing some of the data. So what we call a process is really a thread group, with each task getting its own cred. When you do updates, because in the kernel a task updates its own cred, that can cause problems. What we could do is say we'll only allow user space to confine something if it's single-threaded, right? That's not so good. There are a lot of multi-threaded applications, and part of the target here is pledge and unveil, which actually require that you be able to update threaded applications, because those applications have already been updated to work that way. So it's a bit of a pain. There are a couple of ways we can fix it: in user space, or in the kernel. In user space we could do it in an early init; users could do it manually in code, but we don't want users to have to deal with that. The less work the programmer has to do, the better. We can do it via the library with ctors and such. The issue there is that, depending on your threading libraries and your application, we can't guarantee something like that runs before a thread pool is created, but generally it's pretty good. And like I said, other language runtimes may create their own libraries. We've had that before, where Go interfaces actually don't use a Go wrapper around the C library; they create their own, and they've ignored some of the details, so they break at times. Basically, it's just harder to make guarantees in user space. In the kernel you can do it, but there's some overhead if you're doing it as tracking. We'll talk about the tracking in a minute.
There's also the seccomp style, where you kind of freeze the task and then run through it, and that's a real pain as well. So right now we're doing it primarily via the ctors and such, where we flag a task that it's going to be using application policy, so there's no overhead on applications that don't. But once that's flagged, it's tracked in the kernel the same way as if we were tracking everything. There can be cases where that's a little over-broad, and we can actually track everything if we just set a config option. So, tracking: a process has a task, the task gets its cred, and the cred points to our label. We've got an unconfined profile, right? When you clone a thread within that process, you get two tasks. Sometimes they share a cred, sometimes they don't, but they both end up pointing to our label. They don't have to share, but like I said, they can point to the same label. Fork, when you're creating a new process, gives you a split, and generally speaking the label can be different: different processes, different applications, different confinement, whatever. That's okay. We can have them share as well, and that can be inherited. This is controlled by exec rules. It's not that complicated (well, okay, it is), but it's just part of regular AppArmor policy; we do this all the time. With application policy, though, we want a little more control over it. So when we enter confinement, the application runs through these steps in order: one, we load the new profile; two, we create the cred; three, we update the structure to point to the new profile; four, the old profile, whatever is on the task, is marked stale; and five, the task is updated to point to the new cred. In a threaded environment, what's happened is we already have that cred, and we have things tracked already.
It's basically the same thing, except the other tasks don't get updated right away. When one task enters confinement, the other tasks don't enter the new confinement for the application profile until they enter an LSM hook. At the hook we check, see that the profile has been marked stale, and then the task updates itself. That's what's going on here: it enters its critical section, grabs the new label, and updates itself. So the creds of individual threads get updated asynchronously, and we have to have that tracking information there. That's why there's a bit of overhead, and why we wouldn't want to do this for every task in the system; as it is, tasks under application confinement have a little bit of extra work going on at clone. Updating confinement is the same idea: we can replace the profile, and it looks just like what we did when entering confinement. Under threads it looks pretty much the same too; we're just marking the old profile stale. This is all regular profile replacement for us; updating confinement runs through our regular dynamic profile replacement. Not a huge deal, except for the tracking. One thing we do keep with application policy is all of our exec-type behaviors and rules. We have a rather rich set of exec behaviors for how you can transition profiles. These application profiles mostly don't use them; they just inherit or go unconfined. But in some cases they can actually load multiple profiles (we'll get into that with stacking later), and they can use some of the other transitions.
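The lazy, asynchronous cred update described above can be sketched as a toy model: one thread swaps in the new label and marks the old one stale, and sibling threads refresh themselves the next time they pass through an (emulated) LSM hook. All names here are illustrative, not kernel code.

```python
class Label:
    """Stand-in for an AppArmor label a task's cred points at."""
    def __init__(self, name):
        self.name = name
        self.stale = False
        self.replaced_by = None


class Task:
    """Stand-in for a kernel task; each task owns its own cred/label ref."""
    def __init__(self, label):
        self.cred_label = label

    def lsm_hook(self):
        # critical section: notice staleness and chase to the live label
        while self.cred_label.stale:
            self.cred_label = self.cred_label.replaced_by


old = Label("app_v1")
t1, t2 = Task(old), Task(old)       # two threads pointing at the same label

# profile replacement: load new label, mark old one stale
new = Label("app_v2")
old.replaced_by, old.stale = new, True
t1.cred_label = new                 # the updating thread switches immediately

# t2 still points at the stale label until it next enters a hook
stale_seen = t2.cred_label is old
t2.lsm_hook()                       # lazy refresh happens here
```

This is why the extra bookkeeping only costs anything for flagged tasks: unflagged tasks never need the stale check on their hot paths.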
The interface right now is a prctl, or eventually maybe a syscall. We don't want to use the file interface we usually use for loading policy; the LSM, our own policy, will block that. We couldn't use the LSM syscalls because they're not for loading policy, right? We could create an AppArmor-specific syscall, but for the moment we're just using a prctl because it was easier. As this gets into better shape, we may try adding a syscall instead. Introspection is just through the regular LSM interfaces we're all familiar with: the /proc attr interfaces or the new LSM syscalls. What you'll see is this weird "//&" thing; that's stacking, and that's how we expose the stacked label to user space. So "unconfined//&app" means the task is unconfined at the system policy level, but we have an application policy on this application. So how does it interact with system policy? When we have application policy, it's always stacked against system policy, so both are in effect at the same time. What that basically means is that the application gets the intersection of however many profiles are stacked on it. It could be two, which is probably the most common, but it could be more. What it's allowed is just the intersection of all of them; with three, it would be the middle of the standard Venn diagram. It's also put into its own policy scope. What that means is, when you have applications that are interacting, say one process with another over Unix domain sockets, communicating IPC-wise, your policy wants to say this task can talk to that task, this subject can talk to that subject or object, whatever you want to call it. Being in its own scope means the application's policy doesn't interact with or interfere with the system policy.
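The "intersection of the stack" idea can be sketched as follows. The rule representation here is invented for illustration, not real AppArmor policy; the point is that every profile in the stack must independently allow an access for it to succeed.

```python
def stacked_allows(profiles, path, perm):
    """An access is allowed only if every profile in the stack allows it:
    the effective permission set is the intersection of the stack."""
    return all(perm in prof.get(path, "") for prof in profiles)


# Illustrative profiles as path -> permission-string maps
system_prof = {"/data/file": "rw", "/etc/conf": "r"}
app_prof = {"/data/file": "r"}          # application policy: read only

stack = [system_prof, app_prof]

read_ok = stacked_allows(stack, "/data/file", "r")   # both allow read
write_ok = stacked_allows(stack, "/data/file", "w")  # app profile denies write
conf_ok = stacked_allows(stack, "/etc/conf", "r")    # app profile never grants it
```

With three or more profiles the same rule applies: only what sits in the middle of the Venn diagram is permitted.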
So system policy doesn't have to worry or care that the user policy is in place; you just use system policy, and it doesn't change. And that's just showing that they can talk to each other: the applications can at their policy level, and the system does at its policy level. Now, one issue with application policy is that you have to go update your applications to use it, and nobody is really going to do that except for a few special cases. Part of the motivation here is that pledge and unveil have been around for a while and applications have already been updated for them. So can we map pledge and unveil onto the AppArmor API? Some of this work is also about improving the AppArmor side of things so that we can do better. It's a work in progress. How close can we get? It's not going to be a hundred percent, but we think we can get pretty close. So, pledge: that's the pledge() syscall you get from OpenBSD. It takes a set of promises and exec promises. The promises are text strings, and those are the current set of promises that BSD lets you make. Each of them has a whole bunch of defined semantics. It's very interesting when you go through some of these; if you talk to BSD people, it's like, well, we don't have an exact definition of what they are, we kind of revise them as we test things, and they have revised the behaviors over time. So updates will have to be made at some point. The set of promises is, like I said, just a text string listing which promises you want to make. The NULL for the exec promises just means we're not making any exec promises; exec isn't affected, we're not doing anything there. With pledge, you can't update to add new promises; you can only take away. Once you've pledged, you are only ever reducing the permissions you're granting. So in this case you can't add the "unix" promise.
But you can go down like that, right? And if you don't put any promises in there, an empty string instead of NULL, that basically says: I don't want to be able to do anything except basic compute. The way they define it, basically only the exit syscalls are allowed. So that's bringing in some syscall-level stuff. What we're doing now is we have a little library that gives you the pledge/unveil interface. We take that text string, parse it in user space, and convert it into tokens, obviously. Then we convert that into a set of AppArmor rules for each one of those promises; it could be multiple rules, or in some cases just one. For development purposes, we're actually allowing the library to call out to some user-space rule definitions, so we can include those and make revisions without having to recompile the library. It just lets us tweak and improve things easily during development. Eventually those will have to be folded back in and disallowed, because if you think about pledge and unveil, some of those pledges mean you can't even get out to read those external files. We don't have a perfect mapping, like I said. Some things map really well, but some of pledge is syscall-level, right? We are looking at bringing some seccomp in to help make this a better, more accurate emulation of pledge. We'll see; the seccomp side is interesting in itself. One of the things we have to do to get closer to pledge is, like I said, revising AppArmor, updating and extending it. Breaking a promise in pledge has defined behavior: you get a SIGABRT and the application gets killed.
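A minimal sketch of the user-space side described here: tokenize the promise string, map promises to rules, and enforce the reduce-only update semantics. The promise-to-rule mappings below are invented placeholders; the real mappings are per-promise and, as the talk says, still being tuned.

```python
# Hypothetical promise -> AppArmor-ish rule mapping (illustrative only)
PROMISE_RULES = {
    "stdio": ["/dev/null rw", "signal (receive)"],
    "rpath": ["/** r"],
    "unix":  ["unix (create, connect)"],
}


def parse_promises(promises):
    """Tokenize a pledge promise string ('' means compute-only)."""
    return [] if promises == "" else promises.split()


def rules_for(promises):
    """Expand each promise into its set of rules (one or many per promise)."""
    rules = []
    for p in parse_promises(promises):
        rules.extend(PROMISE_RULES[p])
    return rules


def pledge_update(current, requested):
    """pledge may only ever reduce: the new set must be a subset."""
    cur, req = set(parse_promises(current)), set(parse_promises(requested))
    if not req <= cur:
        raise PermissionError("pledge can only drop promises")
    return requested


state = "stdio rpath"
state = pledge_update(state, "stdio")      # dropping rpath is fine
try:
    pledge_update(state, "stdio unix")     # adding unix must fail
    added = True
except PermissionError:
    added = False
```

The empty-string case maps to the "basic compute only" behavior: no promises, so no rules beyond what exit needs.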
That's not generally how AppArmor works; we generally return EACCES. So a little bit different there. We do have a flag you can set on profiles to kill processes if they violate rules, but it's normally SIGKILL, not SIGABRT. We've added the ability to set the signal on that flag, so the profile can change which signal gets sent. That gets us the SIGABRT side of things. Except pledge isn't always SIGABRT: there's the exec promise and the "error" promise. The error promise turns all the other promises into returning errors instead of signals. That's not too hard to handle in the mapping; the parser just recognizes that we have "error" and then doesn't set the kill flag. Exec is harder, because exec never kills. Lots of fun there; again, not AppArmor's normal behavior. So in this case we've also added the ability to set the error code, as with the error behavior. With the error pledge it's not EACCES, it's ENOSYS that gets returned; for exec, EACCES is the behavior. So no kill, not ENOSYS, you get EACCES. Kind of a pain. This is some of the work that has not landed yet; we're working on it. What we've done here, to give us more flexibility, is instead of just a global profile flag, we can set these on a per-rule basis. Now we can say: if I want to kill on this set of rules, I can do that; if I want to error out on this set of rules, I can do that. The priority stuff lets the compiler know, when rules overlap, which one is important, how and when they overlap, and where to resolve what. Even the error code can sort of be set; in reality you only have two of them, but which one you're using can be set at the rule level. That gets us all the combinations for pledge.
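As a sketch of the per-rule behavior described (this work has not landed upstream, so the flag syntax below is hypothetical and illustrative only, not final AppArmor syntax):

```
# Hypothetical per-rule kill/error flags for pledge emulation
profile pledge_emulation {
  # violations of this rule kill with a configurable signal (SIGABRT,
  # matching pledge's default violation behavior)
  kill=SIGABRT /etc/** w,

  # with the "error" promise, violations return an error instead;
  # pledge defines that error as ENOSYS
  error=ENOSYS network inet,

  # priority tells the compiler which behavior wins where rules overlap;
  # exec violations return EACCES with no kill
  priority=10 error=EACCES /usr/bin/* x,
}
```

The point is just that kill-vs-error, the signal, and the error code all become per-rule attributes rather than one global profile flag.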
This is very much still a work in progress. So what does it look like? On BSD, pledge will show up in lastcomm; it sets a P flag on applications that have pledged. We don't work with that right now; we're just working through regular introspection. What you'll see is, again, stacking: you see the system profile is unconfined, and then you see P. We're just following what pledge does there; it uses P and U, so you see a P on the application. We haven't looked at extending introspection the way, say, SerenityOS has, with a /proc unveil entry (what is it?) and something else for pledge. There are other extensions, other ways to introspect, that we haven't looked at yet. So, pledge and exec: unless you set an exec promise, exec is what we would call a "ux", a transition to unconfined. We drop the application profile, and you basically go back to just the system profiles, if the exec is allowed at all. When you have an exec promise, that changes the behavior: we change the application profile to do what we call a "cx". The cx transitions to a second profile, loaded along with the pledge profile, that handles the exec promise, and we transition to that on exec. It's kind of like a regular AppArmor policy transition; we've just set it up so the two profiles go together when we build them. Promise reduction: we mentioned that pledge can only ever reduce, right? And we said the AppArmor API, until you lock it, allows you to increase your privilege. So how do we do this? We want guarantees on promise reduction, but we also need to allow updating in a sense, because we want to be able to reduce what's allowed. To make that guarantee, we want the guarantee to come from the kernel, so we're not doing it through profile generation in user space.
So we set the immutable flag on the profile we're loading for pledge, to guarantee that you can't replace or update it. What we end up doing right now is stacking, and this is not ideal: every time you reduce your promises, you get an extra pledge profile, and you get the intersection of them. You could technically try to increase your promises, though user space will try to reject that; but even if you manage it, on the kernel side you still get the guarantee, because you get the intersection, and it only ever reduces. It looks ugly, and the introspection output gets worse as it gets longer. There's a bit of a runtime cost too: normally, when we compile an AppArmor profile, we optimize it, and it takes the same amount of time to match a thousand rules as a single rule. Here, because it's dynamic, three profiles means three profiles' worth of walking, so there's a linear cost to the matching as well. We don't want that. That's what you'd see on introspection: P and P and P. So we're working on moving to Boolean conditionals. What happens there is we lock the profile, but we allow some Boolean variables that are exposed, and user space can toggle those Booleans. This gives us our promise reduction: the right to set a Boolean variable is tied inside the Boolean variable's conditional, so to set it you have to already have it, and once you've dropped it you can't access it again. And it takes away the rules as well. This can be done at run time. So some of the work going on right now is adding these Boolean conditionals, so we can have a single profile and keep the performance.
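The one-way Boolean scheme can be modeled in a few lines: the permission to toggle a Boolean is guarded by the Boolean itself, so it can only go from set to cleared, never back. This is a toy model, not kernel code; names are invented.

```python
class PromiseBools:
    """Toy model of promise reduction via kernel-exposed Boolean variables:
    dropping a promise clears its Boolean, and the right to touch the
    Boolean is guarded by the Boolean itself, so the change is one-way."""

    def __init__(self, promises):
        self.flags = {p: True for p in promises}

    def drop(self, promise):
        if not self.flags.get(promise, False):
            # you no longer hold the right guarded by this Boolean
            raise PermissionError("cannot touch a dropped promise")
        self.flags[promise] = False

    def allows(self, promise):
        return self.flags.get(promise, False)


b = PromiseBools(["stdio", "rpath"])
b.drop("rpath")                    # reduction: fine
try:
    b.drop("rpath")                # re-toggling must fail: one-way only
    retoggled = True
except PermissionError:
    retoggled = False
```

Compared with the stack-of-profiles approach, this keeps a single (immutable) profile, so matching stays constant-time instead of growing linearly with each reduction.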
That changes the way the library has to do updates, but that's not a big deal; it's all hidden. So, unveil. At first glance, this one seems easier to map to AppArmor: it's all about the file system, and AppArmor tends to like files. You specify a path and some permissions, and the permissions are just r, w, x, and c for create. And it's a straight mapping on the error code it returns: EACCES. That's nice. Mostly, anyway; not quite. Unveil is very easy in one respect: once you start doing unveil, you can just keep adding new paths, and that maps back to the AppArmor API pretty well. We just add a new rule, load it, and you have access. With directories, we just map them over to directory and file rules, whatever's appropriate. The paths are expressed as text strings, but they have some interesting properties under unveil, and we'll get into that in a minute. Relative paths we just map into a variable, and same thing for the current directory; there are a couple of different ways to handle it, whether as a kernel variable or a user-space variable. So, like I said, directories have some interesting behavior under unveil. Once you unveil a directory, you get everything underneath it: the whole directory tree underneath is accessible with the permissions you specified. Sort of. Unless you make another unveil that is different and more specific within that directory; then that subtree gets a different set of permissions. This is actually hard to express in AppArmor at the moment. It's possible to do programmatically, but if you're trying to write an AppArmor rule by hand, it's genuinely hard to express.
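The directory semantics described (a tree is covered by its unveil, but a more specific unveil overrides it for that subtree) can be sketched as a longest-matching-prefix lookup. This is an illustration of the semantics, not OpenBSD's in-kernel implementation.

```python
def unveil_perms(unveils, path):
    """Resolve the permissions for path: the most specific (longest)
    unveiled prefix covering the path wins; no covering unveil means
    no access at all."""
    best, best_len = None, -1
    for prefix, perms in unveils:
        covered = path == prefix or path.startswith(prefix.rstrip("/") + "/")
        if covered and len(prefix) > best_len:
            best, best_len = perms, len(prefix)
    return best or ""


uv = [
    ("/home/user", "rwc"),           # whole tree: read/write/create
    ("/home/user/secrets", "r"),     # more specific subtree: read only
]

general = unveil_perms(uv, "/home/user/notes.txt")     # covered by /home/user
specific = unveil_perms(uv, "/home/user/secrets/key")  # more specific wins
outside = unveil_perms(uv, "/etc/passwd")              # not unveiled at all
```

This longest-prefix-wins behavior is exactly the part that is easy to compute programmatically but awkward to spell out as hand-written AppArmor path rules.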
We're doing some work to make this easier for a human to express in policy; interestingly, that one is actually hard for us to express at the moment. There are some other interesting directory behaviors too. Directories added after you make an unveil, even if they're inside the area covered by the unveil, are hidden. This is a different behavior, and we don't handle it right now. We have some ideas on how to handle it, but it's one of the areas where we currently fail on unveil: the hidden-directory case. With the unveil mapping, again, unveil(NULL, NULL) means lock unveil. There we just set the immutable flag and load it up, so it's very much like pledge in that sense: once you've done that, you can't replace or update your unveil profile. On introspection, you'll see a U instead of the P. Okay, so let's talk about how we bring these together. Like I said, you get profiles: for pledge you get a profile, for unveil you get a profile, and those stack with the system profile. The pledge profile can be locked, and the unveil profile stays unlocked while it's being updated, until you remove your unveil promise or lock it. We also have the different semantics we already talked about: SIGABRT versus EACCES; unveil builds up until you lock it down, while pledge starts locked down and only ever reduces. We can handle all of that at the profile level, so the profiles can coexist within the stack. Just because your pledge profile is killing things doesn't mean your unveil profile will; on violations it will do the EACCES behavior. Same with system policy: system policy will behave the way it's supposed to.
And when you have them both, on introspection you'll see your system policy, your pledge profile, and your unveil profile all showing up. Again, it's not exactly the same as BSD's introspection, but the regular Linux system interfaces aren't the same as OpenBSD's interfaces either. So, do we have any questions? Any questions early on? Yeah. That's working. What order do they stack in? Is it system first, followed by application, or application followed by system? Okay, it's an interesting question, and it will depend; it's a little more complicated than that. It's based on policy namespaces, and policy namespaces are hierarchical: the first in the hierarchy is the first in the listing. So say you're at the root, and then you have your pledge and unveil or whatever. The reason I say it's complicated is: say we do a pledge or unveil at the system level, and we've also created a child policy namespace, and that child namespace has some other application policy in it, say for a container or something doing its own system policy, say LXD. The system-level pledge/unveil will show up before the policy that's in the container's namespace. So it sorts canonically based on level of hierarchy, then by a canonical text order based on name, with the address as the fallback. Hi, nice talk. I had a sense of deja vu. It looks to me a lot like... Yeah, like I said at the beginning, this is something that has been done several times, right? It's a bit like Landlock, it's a bit like pledge and unveil; it maps better to pledge and unveil than it maps to Landlock. And there are some other ones out there as well. And like I said, today we talked specifically about application-directed policy.
There's another side to this where the user can have a user policy namespace and define policy for the user's applications, different from the system policy, and that policy can be loaded and stacked with the application policy and the system policy. So it's quite a flexible mechanism. Yeah. So that's exactly the goal of Landlock, and that's why you have the same challenges. And I'm sure you took a look at it; I was wondering what the difference is? I mean, from an API point of view, yes, there's a difference, but you could implement unveil and pledge the same way with just a user-space library using Landlock. You can, yes. Like I said, these are each approaches to it. We want to make sure that, how to put this, things are available, and that it integrates into the system the way we want it to integrate. We're not against Landlock at all; we're just trying to provide options, and it's a different way to do it. And when I said this has been in the works a long time: some of the decisions around the state machine were actually made back in the 2005, 2006 era, with the stated goal of being able to expose this to users. So this has actually kind of been in the works for a very long time. Okay, so, yeah. It will be interesting to see how it goes, and I'm wondering how it could be received, mainly upstream. So if you need to add a new syscall, which I guess you will, it would potentially be either a new syscall or a prctl. Like I said, right now we're just using the prctl, but whether that's the best interface? Probably not. So you can use prctl first and then add a dedicated syscall later; that's why with Landlock we created new dedicated syscalls. Yes, exactly. But there is also, from an LSM standpoint, the fact that a lot of the LSMs have wanted a syscall for a long time.
There are operations we have within AppArmor that we would rather have a syscall for, where we've been forced in the past to go through the file system interface. Some of those have been taken care of by the LSM syscalls, for example set proc attr; others have not. It would be very nice from our point of view to be able to load policy through a syscall instead of through the file system interface, because we do have situations where we'd like to load policy without having to mount securityfs into, say, the container. Yeah. And so it's very similar to what's going on with Landlock. There's no question or debate about it; it's just a different way of achieving it, and it integrates with what we were already doing. It's just trying to complete out the model. And there are some use cases, for example integrating at the AppArmor level, where we'd have something like a system label, and the application policy could say: I want to take the system label and do something based on it, where we want to interact with the stacked application policy. So, like I said, there are some system integration bits there. It's just what it is, right? And a last question. Same again: Landlock was designed to be able to implement unveil with the application. I'm not sure, maybe you just highlighted it because Linux is not OpenBSD. If you go look, on OpenBSD there are hundreds of applications using pledge. Did you try to run them? I mean, OpenBSD is a whole system, providing user space as well, right, and there are a lot of forks of open-source projects for OpenBSD. And I think most of the unveil and pledge calls are in those forks, not in the upstream source code that's used by Linux distros and so on. So I was wondering: yes, in theory, that's a good thing, and I would like to do the same with Landlock.
But in practice, it looks like that's not really the same source code, and as you highlighted, the semantics keep moving. You're right. And that's one of the big problems with the application policy side of things, the unprivileged application policy: the source code gets forked, and the semantics between different code bases drift. So cherry-picking patches from BSD, say, into a Linux version of an application, you get conflicts, and even if you get it to go in, that doesn't mean it's actually doing what it's supposed to do. This is one of the things about developing these promises: they have to be tweaked, and one part of this work is trying to come up with a good set of mappings, trying to run different things and get a mapping that is good for most of them. Because it's going to be a little different on Linux than it is on BSD, and even across different versions of BSD those promises change what they do, because they've been refining them as well. You're right, it's a problem. But it's better for applications for us to take that approach than to say: if you're going to do pledge and unveil, use our API. Our API is there, there are a few cases where it will get used, and it is more flexible, but in reality it's only going to be used in a few special cases. [Audience] That's where most of the pledge and unveil patches were done. Well, it will be interesting to see how it goes. In fact, there's already a library implementing unveil for Linux with seccomp and Landlock. Did you think about contributing to this library? We have looked at it a little bit. Will we contribute? Possibly. How to put it: where we can probably contribute most is in picking patches for pledge and unveil into applications.
So we have people who are interested in getting pledge and unveil into the Linux application base, patches for those specific applications. And that's one of the places where there will be contribution for Landlock as well, because the more applications that use pledge and unveil on Linux, the better that works for a Landlock-based pledge and unveil, right? Can we contribute to those libraries? Possibly. I can't promise any work time related to that; could it be personal time? Possibly. Okay, thank you. All right, anything else? [Audience] I just had a quick comment around the syscall discussion, and I know we've talked about this a little. We have three LSM syscalls right now. None of them, like you said, is anything that would be suitable for loading policy. But just because we have these three now doesn't mean we can't expand the list in the future. It doesn't make sense for it to be an AppArmor-specific system call, but there are a number of LSMs which load policy, and if we wanted to create an LSM policy-loading syscall, I think that would be a reasonable discussion to have. The LSM system calls are not limited to just the three we have today. Correct. Like I said, prctl for now because it's easy, and we'll see where we go from there. I do expect it's going to be a syscall, but that's how that goes, right? That takes a long time to design, and there will be a lot of back and forth on stuff like that. Okay? Great, well, thank you very much.

Okay, here we go, we're getting started. Hello, I'm Casey Schaufler. I'm going to be talking a bit about some work we're doing on a library for LSM-using applications. If you don't know me: I've been doing operating systems for a long time, I've been doing security for a long time, I'm the author of the Smack security module, and I'm currently working primarily on LSM infrastructure.
And I'm a hobbyist, which means I get to do it when I want to and under the conditions I want to. Now, we can't really talk about a library for dealing with Linux security modules until we've talked a little about Linux security modules themselves, so let's dive in. Linux security modules provide additional restrictions on top of the normal system policies. They've been around long enough that a lot of us think of them as normal system policies these days, but it really is an attempt to have a general mechanism for additional controls. Traditionally, mandatory access control was where this was initially focused. Today, though, we're using it for sandboxes and we're doing some hardening work; Yama is a good example of the hardening we've done with it. For sandboxes we've got Landlock, and for mandatory access control we've got AppArmor, and AppArmor is now going to be doing some sandboxing too, so we've got a good collection of additional restrictions you can have on top of the normal system policies. Now, sometimes when you want to do something different, you need to introduce additional attributes. If you're doing mandatory access control, for example, you might want to have a label on a file. If you're doing integrity measurement, you might want a hash of some sort associated with a file or a process. So there are all kinds of attributes you could add: timestamps, all kinds of things. And you're going to put those attributes on all kinds of things: processes, files, other system objects. Ports, for example; you might want to put a security label on a port, or do an integrity measurement on a tape drive for some reason. There are all kinds of attributes you might want to add that are specific to your security module, and sometimes you're going to want to access those.
As you saw in the previous talk, we have ps -Z to tell you the attributes on a process. You might want to do that for administration purposes. If it's a sandbox, you might want to allow your user to modify those attributes. If you're a program like D-Bus, you might want to look at some of these attributes to decide whether to deliver messages from one process to another. So there are all kinds of things you might want to use this information for, and you need a way to get it into user space. Now, traditionally, back in the bad old days before 6.8, you did the administration via special file system entries: SELinuxFS, SmackFS, and securityfs are all mechanisms you can use to administer the LSM. Some examples here: you can see that we can use securityfs to look at the list of security modules on the system, and use SmackFS to look at one of the network configuration values. We can also do process attribute manipulation and identification using /proc/self/attr. There are several entries there; some LSMs use all of them, some use one, some use none. So we can look at the information about the current process through these interfaces, and that's fine if you're in a shell script, but it's really not very convenient for an application to use. Very recently, we added LSM system calls. These are relatively new, but the problem we've got with them is that they're not in any library anywhere. We have three system calls. One will list the security modules available on the system. An interesting fact here, which I probably should have a slide for, is that we introduced the notion of an LSM ID, a numeric value which identifies the security module in addition to the name, and this makes the interface a whole lot easier. You don't like interfaces where you have to pass two strings, the name and the value; that's really inconvenient because you end up parsing, and parsing is bad.
So we've introduced LSM IDs. Another system call will get an attribute for the current process, and you can either tell it to give you just the one you've specified, or you'll get the whole list of all the attributes with that ID. So if you have AppArmor and SELinux at the same time, you'll get two entries, one for each, and they'll be identified so you know what to do with them. You can also set an attribute, and this has interesting aspects, because of course each LSM may have its own policies regarding whether you can or can't set a given attribute, and again, not all LSMs support all attributes. But these are system calls; they're not in libc, they're not in any library anywhere, so we need a library for them. How about liblsm? That's what we're working on right now. Now, there are a lot of things you're going to want to do in an application with security module attributes: read them, print them, all kinds of things. So we want to make it easy to do this; not so easy that everybody just does it for you, but we want to actually make it pretty. One of the first things we want to do is manipulate the LSM context structure. The LSM context structure is the fundamental unit of data that the get and set operations use, rather than having multiple parameters, which makes it really difficult to return more than one value, for example. This is the basic structure: we've got IDs, flags, lengths of various sorts, and then the actual attribute value itself. This structure is not difficult to deal with, but boy, if I've got a list of them, it would be really nice to be able to get to the next one in the list.
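As a sketch of what that structure and list traversal look like, here is a Python model of a packed buffer of contexts. The header layout (four unsigned 64-bit fields: id, flags, total record length, value length, followed by the value, padded for alignment) follows my reading of the kernel's uapi struct lsm_ctx; treat the exact field layout as an assumption to check against uapi/linux/lsm.h.

```python
import struct

# Assumed lsm_ctx header: id, flags, total record length, value length,
# all unsigned 64-bit little-endian, then the attribute value itself.
HDR = struct.Struct("<QQQQ")

def lsm_context_next(buf, off):
    """Return (id, value, next_offset) for the record at `off`.
    The next record starts `total_len` bytes after the current one,
    which is the pointer arithmetic the library helper would hide."""
    lsm_id, flags, total_len, ctx_len = HDR.unpack_from(buf, off)
    value = bytes(buf[off + HDR.size : off + HDR.size + ctx_len])
    return lsm_id, value, off + total_len

def iter_contexts(buf):
    """Walk every (id, value) pair in a packed list of contexts."""
    off = 0
    while off < len(buf):
        lsm_id, value, off = lsm_context_next(buf, off)
        yield lsm_id, value

def pack_context(lsm_id, value, align=8):
    """Fill a record the way the talk's buffer-fill helper would:
    compute the padded total length so the caller never has to."""
    total = HDR.size + len(value)
    total += (-total) % align            # pad to the assumed alignment
    rec = bytearray(total)
    HDR.pack_into(rec, 0, lsm_id, 0, total, len(value))
    rec[HDR.size:HDR.size + len(value)] = value
    return bytes(rec)
```

With two stacked modules, `iter_contexts` yields one (id, value) pair per module, which is exactly the "get to the next one in the list" operation described above.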
So we're going to have a function, lsm_context_next, which is really just adding the length of the current value and returning the pointer to the next one. Some people are afraid of doing pointer arithmetic on their own, and based on some of the casting I've seen when people try, I can sure understand why. So we want a function to do that. We also want a function to fill the buffer. One of the reasons is that we want the LSM context to be aligned, and calculating the alignment isn't necessarily obvious, so we'll just compute the length properly rather than having the user specify it when filling the buffer. That makes it pretty easy. I mentioned earlier that we have ID mappings. If you have a numeric ID, you'll sometimes want the name, say for printing it. If you're interpreting a command line, you probably want somebody to specify the security module by name rather than number, because nobody really wants to remember that SELinux is LSM number 100 or that AppArmor is 102; it's just hard to do that. So we're going to have a function, lsm_id_to_name, which maps an ID and gives you back a pointer to the name, and a name-to-ID function, which takes the name and gives you the ID. You don't have to remember the numbers anymore; everybody's a lot happier. Now, we also have to have IDs for our attributes. The attributes we currently have map directly to the entries in /proc/self/attr. There are six entries there, so we have six IDs; no reason we couldn't have more. Again, not all LSMs support all of these IDs, but just like with the LSM IDs, you want to be able to map back and forth. Again, pretty simple. Everything so far is really pretty straightforward.
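The ID/name mapping can be sketched as a pair of table lookups. The numeric values below just echo the ones mentioned in the talk and are illustrative only; the authoritative LSM_ID_* values live in the kernel's uapi/linux/lsm.h.

```python
# Illustrative ID table: placeholders for the mapping pattern, not the
# authoritative values from uapi/linux/lsm.h.
LSM_IDS = {
    100: "selinux",
    101: "smack",
    102: "apparmor",
}
# Reverse table, built once so both directions are O(1) lookups.
LSM_NAMES = {name: lsm_id for lsm_id, name in LSM_IDS.items()}

def lsm_id_to_name(lsm_id):
    """Map a numeric LSM ID to its name; None for an unknown module."""
    return LSM_IDS.get(lsm_id)

def lsm_name_to_id(name):
    """Map a module name (case-insensitive) to its numeric ID."""
    return LSM_NAMES.get(name.lower())
```

A command-line parser can then accept "apparmor" and hand the syscalls a numeric ID, which is the whole point of the mapping functions described above.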
But if we're going to have a library, we want it to work everywhere, because we don't want somebody saying in their application: if I have a kernel after 6.8, then I can get this information, otherwise I have to go do a whole bunch of other stuff. So we want some LSM system call emulation: we want to look in the traditional places and pretend you ran the system call when the system call is not there. So we'll have some new functions for the older kernels that don't support the system calls: lsm_get_self_attr_proc, which gets the attribute from /proc, lsm_set_self_attr_proc, and so forth. You don't code your application to use the /proc interface; you code it to use the system calls, and if you don't have the system calls, you get the values anyway. Now, /proc has some issues. One is that you don't necessarily know which LSM is giving you the value if you look at /proc/self/attr/current. This is my bad; I made that mistake. But we started looking at ways to address it, and one of them was: let's put in a subdirectory for each of the LSMs, which makes it easy; if you're looking for SELinux, you look in the SELinux subdirectory, and so forth. So we got a subdirectory for Smack in version 5.0, but the SELinux maintainers have declined to have a subdirectory. So we're working with this; it does make the emulation a little trickier. In the current implementation, for Smack we look in the subdirectory, for AppArmor we look in the subdirectory, and we assume that what's not in a subdirectory is the SELinux value. If you don't have SELinux, we have to figure out, based on what's active, which one it's going to be; but if it's not SELinux, we don't care, because then we have a subdirectory we can look in.
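The fallback logic described here, per-LSM subdirectory first, then the legacy shared entry that is assumed to belong to SELinux, can be sketched like this. The function name is hypothetical; the real library entry points may differ.

```python
import os

def get_self_attr_emulated(lsm, attr="current"):
    """Sketch of emulating lsm_get_self_attr() on pre-6.8 kernels:
    prefer the per-LSM subdirectory (Smack and AppArmor have one),
    then fall back to the shared legacy entry, which the talk assumes
    is SELinux's. Returns the raw attribute bytes, or None if neither
    entry can be read."""
    candidates = (
        f"/proc/self/attr/{lsm}/{attr}",   # per-LSM subdirectory
        f"/proc/self/attr/{attr}",         # legacy shared entry
    )
    for path in candidates:
        try:
            with open(path, "rb") as f:
                return f.read().rstrip(b"\x00\n")
        except OSError:
            continue
    return None
```

An application calls this only when the real syscall is unavailable, so it sees the same get-an-attribute behavior either way, which is the emulation goal stated above.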
/proc also has some privilege issues. The reading is pretty consistent, but updating entries, setting attributes, is not; it's not always the same as with setprocattr, and it can be policy dependent. You can have SELinux policy that doesn't allow you to update that entry, you can have Smack policy that doesn't allow you to update that entry, you can have Smack policy that doesn't allow you to update the SELinux entry, and so on. So the emulation is not going to be perfect; it's going to be very close, but you're not going to be able to make it identical, and that's a problem with emulation. This is one of the reasons we went with system calls: so we could have a more consistent interface and more consistent behavior. We're also going to have to do some things on our systems to actually make use of this, because we don't have a whole lot of user space that's aware you might have multiple security modules. One aspect we would really like to have is that applications don't care which LSM they've got. Some system call behavior might be a little different, but applications are supposed to be able to handle that. There are some applications, however, that actually do depend on a particular security module and have to be aware of it. In fact, on Fedora 39 there are 93 packages with SELinux in the name and 157 that depend on libselinux; that's a lot more than I expected. On Ubuntu, there are 16 with AppArmor in the name and 38 depending on libapparmor, and there's virtually no overlap. So if you look at this and say, well, I want to make my distribution LSM-agnostic, you have a lot of work to do. Just determining which of these actually have issues is going to be tricky. Most of them you're probably not going to have to do much with, but on occasion you're going to have something that really stands out as a whopper, and of course I'm talking about systemd. systemd
is really very good about being aware of Linux security modules; it does a whole lot of things in support of the various security modules and their behaviors, so it clearly needs attention, but it probably isn't going to need more than a few tweaks. Now, one thing that is going to have to change is a line of code that's actually in systemd 234, where there's an assertion that you're not running SELinux and Smack at the same time. This caused me a considerable amount of vexation during testing of module stacking, where I was asking why systemd wouldn't come up: oh, because it says this is a bad state. It turns out there's actually an easy fix for that, and in general, looking at applications, there are easy fixes for most of the security-module-specific things they're doing. id -Z, for example, doesn't need to test whether SELinux is enabled, because the kernel refuses to tell you the attribute values if you don't have SELinux enabled, even though it could. We also want to have a few simple commands that map essentially directly to the system calls, so you can put them into your shell scripts. These are tentative names; having played with it a little, underscores in program names are not convenient, so that will probably get changed at some point. Another aspect of user space we have to deal with is logging. Most of the logging we do is pretty independent of which security module you have, but audit records are an exception. An audit record, if you've got multiple security modules that have attributes, needs to produce them in such a way that you can get all of the attributes, not just the first one, or in the current state, the only one. So we need a new audit record type to describe the attribute contents of the various LSMs. That means a new audit record type and new audit parsing; it's not difficult, it's not new technology, but it's something that's going to be different. We don't have to do anything in a library
for this; we just have to do it in the audit code. Now, we've got a bunch of things still to do. The current liblsm implementation is pretty basic, and there is more to come. One item is the peer context: right now there are two mechanisms, SO_PEERSEC and SCM_SECURITY, where you can get the value for your peer on a network connection, but you can only get one, and we'd like to provide an LSM context instead, so that it works whether you've got one module or more than one. File attributes: we'd like to be able to say, give me the attributes of this file, and give me all of them, and set them as well. Currently you have to go in and deal with the extended attributes directly, and dealing with extended attributes is not very convenient, so with this we can just do one thing and get it all. So, to sum up: we now have LSM system calls, and we need a way to get at them, so we need to provide a library. It will help us get away from using /proc, and ideally it will help us get away from having to know the gory details of how things are implemented with extended attributes, and it should make the entire process of moving applications so they can be LSM-agnostic, so they can work with all LSMs rather than one in particular, a little easier. There's a balancing act here: we're going to put out a fairly simple library initially, get some people using it, and then of course they will say, why don't you do this as well, at which point we'll extend it and make it a little better. Or we could wait for everybody to start screaming and doing their own implementations, but being proactive is a good approach. We'll see what happens, carry it forward, and get some good interfaces out there so people actually start thinking in terms of writing their applications for LSMs, as opposed to for SELinux or for AppArmor or for Smack, and
that's really all I've got to say about it. I've got a URL here for the current code; it's not very big, but it has a few ideas in it. I haven't done man pages; I haven't written a man page in a long time. So I think that's about all I've got. Questions? Don't push. Well, if there are no questions, I think we're on to a break now. Thank you.

Hello, can you hear me? Am I audible? Good morning, everyone. My name is Tushar Sugandhi, and I'm going to talk about IMA log snapshotting today. Welcome to Seattle, those of you travelling from places across the globe; it's beautiful weather outside. Okay, let's get started. A little bit about myself: I've been in the software industry for about 15 years, in software security for about 7, and in Linux security for about 4. I work for Microsoft; that's pretty much it. So let's see what we're going to discuss today. I want to discuss the motivation for snapshotting the IMA log: it's about kernel memory pressure, long-running devices without reboots, and attestation server-side processing. We not only want to snapshot the logs, we also have to preserve the integrity of the IMA subsystem for remote attestation. We'll discuss what needs to be done on the kernel side to make the feature work, what needs to be done on the client and service side, and some other design considerations. If the demo gods are happy with me, we'll see a demo, followed by Q&A. A big disclaimer: this has been discussed in the community on the IMA mailing list, and I was fortunate enough yesterday to have a conversation with the IMA maintainers, Roberto and Mimi, so I modified the slides a bit to take their feedback into account. But I'm flying the plane as I'm building it, so not everything I say in this presentation will be reflected in the final design and documentation; of course it will evolve. This is the current state I'm presenting. So let's get into the basics of remote attestation. Imagine you have two parties who are
talking across the internet. Alice wants to talk to Bob, and she wants to get some work done by Bob, but before delegating the work to Bob she wants to ensure that she can trust him, that Bob is trustworthy. Alice trusts Charlie. So what happens is Bob sends his information to Charlie for verification, and Charlie vets it, verifies it, and sends the verified report to Alice. In attestation terms, Alice is a relying party, Bob is the attestee, and Charlie is the verifier, or attester. So in the cloud era, you have relying parties, you have attestees, and you have verifiers who verify that attestees are trustworthy by checking their various parameters and signals, and then sharing those signals with the relying party so that the relying party can trust the attestees. A little deeper into the IMA world, as you saw yesterday: IMA, the Integrity Measurement Architecture, is a kernel subsystem which monitors events on the kernel side, extends them into a PCR, and keeps an IMA log in kernel space. Events are recorded into the IMA log, and the user mode agent sends that IMA log, together with the signed and trusted TPM PCR quotes, to the remote verifier. Since the TPM PCR quotes are signed by the TPM, you have a chain of trust and a root of trust: you can trust those quotes, you can replay the entries in the IMA log and verify that the log is not tampered with, and then you can validate the values inside the log and determine whether it meets your security bar. If everything looks good, you send an attestation report, in the form of tokens, certificates, and whatnot, which marks that attestation has succeeded. So what is the motivation behind IMA log snapshotting? Currently it's a monolithic log which sits in the kernel, and when we were using it we observed a few problems, and that's why we're proposing to snapshot it. Obviously, kernel and
user mode memory pressure is one of the issues, along with devices that run for a long time without reboots: stuff accumulates and we need to deal with that. We'll get into these motivations in detail. And as you know, the verifier sits in the cloud; it is not local to the client or the node, unlike the scenario Roberto was discussing yesterday, so we want to make the remote server-side processing fast. That's another motivation. Let's dig a little deeper into the memory pressure issues. As I said, the IMA log is stored in kernel memory, and depending on the IMA policy and the system state, you can generate a lot of events. IMA measures various critical data: device mapper block devices coming and going, various file system measurements whenever files in those file systems are touched. So it generates a lot of data, and it is very valuable data, needed for attestation, no doubt about that. But depending on the policy, the IMA log can grow, and of course the kernel doesn't flush the IMA log; it is ever-growing, because if you flush it there is a loss of information, you cannot replay it on the server side, and you cannot validate that the log has not been tampered with. So it's genuinely good security design that the IMA log is not flushed; truncating it would cause irrecoverable attestation failures. On the user mode side, user mode processes combine various logs: you have boot logs, and of course the IMA runtime log, the attestation log, and sometimes you have to be consistent with the format the server understands, so some processing happens on the server side. This leads to multiple copies of the IMA log, and when you are dealing with a big chunk of IMA log, there is memory pressure on some user mode processes as well.
That's what we have observed. So together, kernel mode and user mode handling of IMA logs adds to the memory pressure of the system. That leads to the next point: long-running devices with no reboots. As I said, when the IMA log grows, it can only be flushed by the kernel; it's read-only for user mode processes. And in today's cloud era, servers run for a really long time, months, sometimes years, no kidding, and some industries depend on these long-running servers: gaming, e-commerce, finance, whatever. Even a blip of a few seconds costs them money; they just don't accept the gray-outs. So to flush the IMA log and relieve the memory pressure, you can warm reboot or cold reboot, but that takes time, and even a few seconds of delay is unacceptable. What can we do here? That's another motivation: systems run for a really long time, the IMA log grows, and it is non-trivial, both from the security threat model point of view and from the practical implementation point of view, to recover from the bloated memory on that node. Last but not least, you are sending those logs to the remote attestation server, and the server needs to go through the logs quickly and replay them. You can do seal on seal, that's another story, but if you are replaying the log to match it against the PCR quotes, you need to respond really quickly, and in the current model, the more the log grows, the longer the server takes to respond. That's another motivation to solve this problem. Now that we have discussed the motivations, genuine business cases, what are the solutions we thought about? One is: how do you trigger the snapshot? That is one thing to think about when discussing the solutions. Then of course you need to persist it, copy it from kernel memory to some persistent or network storage, basically to relieve the memory pressure but still
preserve the information; how do you do that? Do you need a marker event to logically separate the snapshots? It's an optional thing, and we'll get into the details. And last but not least, truncate the in-memory copy of the IMA log. On the client side, the client needs to be aware that the kernel supports snapshotting capabilities, and it needs to interact with and pass over not just the single IMA file as before, but multiple snapshot files plus the latest in-memory instance of the IMA log. And of course the server needs to be aware that the clients talking to it may be sending not a single monolithic IMA log, but an in-memory IMA log plus snapshot files. These are the things we need to take into account when talking about the solutions. So let's dig deeper into each of them. First, the triggering mechanism: how do you trigger the snapshots? It can be triggered by a user mode agent running on the system, and it could be as simple as a syscall, or a file open; you can implement that syscall in the IMA subsystem and it can start the snapshot. Some kernel mode events, like a kexec into a new kernel, could be another logical point to trigger a snapshot. There could be thresholds defined in the IMA policy, say, trigger a snapshot after every 10,000 events in the IMA log, and IMA does keep track of the number of measurements in the log. Or, a little less reliable and harder to implement, you can keep track of how much memory the IMA log has consumed, and if it reaches a certain threshold, trigger the snapshot. Or it could be a combination of them; there is no reason why multiple snapshot triggers can't be present. So that's snapshot triggers. As I mentioned, we discussed the
snapshot aggregate, or snapshot marker, which logically separates two snapshots, and it can be optional, just as a disclaimer. The goal of the snapshot aggregate marker is to make the server-side processing of the IMA log more efficient, and we'll get into the technical details of how we can use these events to denote that a snapshot has been taken, so the server can exploit them to make its processing more efficient. A little more detail on the aggregate event: it can be a simple empty event with just the event name. Those who are familiar with IMA templates know you have the event name, event template, event data, the PCR into which you are extending the event, the digest, and so on and so forth. So you can have just the event name, or you can say, give me the current time, very cheap and simple, or just keep a counter of how many snapshots have been taken so far and increment it. Or it could be something sophisticated and very expensive, like getting the PCR states from the TPM, which are eventually used for replay, and calculating the final value. It can be done from kernel mode or from user mode; doing it from user mode makes logical sense, while doing it from kernel mode is clearly very expensive, and if we want to go that route, it needs to be well justified. Maybe we can get away without this complicated snapshot marker; as I said, it's optional, things would still work, and we'll see on the service side how the server needs to be stateful anyway. Moving forward: snapshot trigger mechanism, snapshot aggregate marker event, what next? The persistent storage location. So far the IMA log has been in securityfs, well, in kernel memory, and now you want to move it to some persistent location. The relying party is the agent, and you can have different agents running on the node for different purposes: one for certificate management, one for, I don't know,
talking to a mail server, one for OS updates, and whatever else. Multiple agents need this information, so the location where the snapshot is stored needs to be well known; it needs to be standardized. It can be configured by IMA policy; currently that's not there, but you can have an entry in the IMA policy which says, this is my storage location for snapshots, and of course you can have a Kconfig such that the IMA policy stays read-only and is not appendable at runtime, so it's a robust, secure location: the path is immutable and the location can't disappear. That's a separate problem, but as long as you have a reliable location, the relying parties can take a dependency on it. Once you have the location, IMA can provide a symlink to be consistent with it, so that when you refer to the IMA log from its current location, it points to the actual location on disk. Or you can use file systems like FUSE, which was recommended by some people in the community when we had this discussion on the IMA mailing list: use the file system to do the snapshotting so that it appears as a single monolithic file but is actually multiple files. These are a few things we considered when designing this feature. I shared this diagram without snapshotting; now with snapshotting, you have the kernel mode copy of the IMA log and you have snapshots. So let's play out a high-level workflow on the client. On the kernel side you have the IMA log, say with events E1 and E2, and in user space you have a snapshot file which is currently empty, and this control file; it was part of an earlier discussion, so it's a remnant in my presentation, and we can get away without it, but for the sake of discussion let's pretend we are dealing with this control file. So one of
the trigger happens and once the trigger happens the snapshot aggregate event gets generated in the by IMA in the kernel which can be a PCR digest of extend one extend of event one plus event two and then you can copy kernel can copy the event the event one and event two from kernel memory to the control file technically it can directly be copied to its final destination on the disk and that can be done by user mode process as well and then when user mode says that hey I am done copying you can finalize it or you can truncate the you can truncate the in memory events then once IMA gets that signal IMA truncates it but the snapshot aggregate event stays so and when the new events come they can continue upending to the existing in memory IMA lock so you are left with in memory portion of the IMA lock plus a snapshot and when you have multiple such triggers happening then you can you will accumulate multiple snapshots but they will be on the disk not in the kernel memory so let's talk a little bit about the server side processing the stuff that I talked about the marker event which I claimed it will help with making the server side processing more efficient so let's pretend that on the top side you have a monolithic IMA lock with events E1 to E12 and let's see some specific events you have a device mapper you have encrypted disk and you are loading the disk and resuming the disk so let's say that is event 8 and event 10 and event 11 is a Selenax policy hash and by the way these events are measured in IMA lock today I was proud to do that work with optical data and device mapper work so these events can be used on the server side to say that does my client meets a specific disk configuration or a Selenax policy hash to trust that client and in the snapshotting world let's say you create a snapshot after every 3 events and store it on the disk and that is the bottom part so notice that here are marker events are being added and this is the complicated sophisticated 
marker event that I talked about, which has the PCR hash, and as I mentioned earlier it can also be given on the side by the user-mode process, so it doesn't necessarily need to be part of the IMA log, but for the sake of discussion let's pretend it is. Now on the server side, rather than getting the full monolithic IMA log, you just get the latest truncated copy, and now that we have the marker after events 7, 8, 9, it gives the starting PCR state, and when you replay the extend operations for E10 and E11 you arrive at the same PCR values. Only this much information is sufficient for you to validate that this portion of the IMA log was not tampered with: it is signed by the TPM and validated by the PCR quote. So if your attestation policy says I will succeed the attestation of the IMA log, then you do not need all those earlier events. Or you say, hey, my device-mapper device needs to have this configuration, but the resume event does not tell me sufficient information about that device; that information is present in the load event, so you need to stitch event 10 together with event 8. The server says, hey, I only have event 10, where is event 8? Then it can ask the attestation client, give me one more snapshot from your repository, and then you are just dealing with that snapshot, the one on disk, and the in-memory copy; you do not need the previous two snapshots to succeed the attestation. What else? Of course there are drawbacks, there is no free lunch. So far, with monolithic IMA logs, you just pass the entire log and you do not need to save client state on the server side, but if we are going with this approach, and you are not doing seal and unseal but the replay method for attestation, then you need to preserve some state. As I mentioned, there are different types of markers: just the event name with no value, or a timestamp, or a PCR digest, or, the last case, the snapshot aggregate. What this table on the server side does is tell whether the client was ever attested. If it was never attested, the server would say, give me the in-memory log and all the snapshot files, because this client was never attested and I cannot trust it before analyzing each and every event in the IMA log. But if it was attested until some point, let's say before this presentation started, you can say the events before this were already attested, just give me the snapshots or the in-memory log after that point, and then you should still be able to replay and attest the remaining portion, given that the previous attestation succeeded for the older snapshot files. Or, with the PCR digest method, you can say I had attested until snapshot 4, so if you are giving me an in-kernel-memory IMA log which says the system is at snapshot 7, the server would say, give me 5, 6, and the in-memory 7, and then I will be able to do the attestation. So, who does what? My earlier push was to try to do everything in the kernel so that it is trustworthy, measured by IMA, but there is always a trade-off between security and efficiency; performance matters. Taking those things into consideration, the trigger can be either on the kernel side or on the user-mode side: it could be, as I said, a kexec soft boot, or reaching a number of events, or a user-mode process poking the kernel saying, hey, please generate a snapshot. And as I said, the snapshot marker event can be generated by the kernel, or, since it is a PCR read, you can just use the TPM PCR read command to generate it from user mode; there could be some optimization issues, and as I said it is optional, but it is possible. Copying the log is obviously a user-mode thing, because you are just transferring the log from memory to disk at a predetermined location guided by the IMA policy, but truncate has to be in the kernel, because as I said earlier the IMA log memory is controlled by the IMA subsystem, not by user mode; user mode can read it, but only with root access, and it cannot modify it, for security reasons. So truncate has to be performed by the kernel. And let's pray the demo gods are happy with me today. Let's see. I'll first show you: ascii_runtime_measurements and binary_runtime_measurements, this stuff already exists; runtime_measurements_count keeps a count of runtime measurements; and this snapshot file is the new thing that we added, the control file I mentioned, which triggers the snapshot, or you can just make a filesystem syscall on it and it will trigger the snapshot. For this demo I'll use that to trigger the snapshot. So let's see what we have in ascii_runtime_measurements. By the way, are we good on time? Yeah. And is the text legible, you can read it, right? Okay. So boot_aggregate is the first entry, then the kernel version is measured, then you have the built-in trusted keys and a bunch of other entries, so this is the initial state of the IMA log. Then let's say we have an agent; let me open it. As I said, you can trigger, you can copy, you can truncate. I'm using that open syscall on the snapshot file to trigger the snapshot, so let's trigger it. Okay, I have triggered the snapshot, and let me open another window to show you what's happening. The snapshot aggregate event has been generated in the IMA log, but as you can see, the previous entries are not yet removed, because my user-mode agent hasn't copied them to some secure location. So let's copy them; I'll just copy to a default location, and if I go to the default location there is the on-disk IMA log. Of course it's binary, because servers usually process the binary log; ASCII is for human readability. And then I will instruct the
kernel to truncate the IMA log. There is some glitch, but the IMA log has been truncated; let's verify that. That's it: you have the snapshot aggregate, but the previous entries have been removed from the log. I want to thank Mimi and Roberto for very lively discussion on the mailing list, and of course in person as well, for giving their guidance, feedback, and expertise while designing this; Paul from SELinux, who was also a huge helping hand in guiding me in the right direction; and Swish, my partner in crime, who was helping with whiteboarding, acting as my sounding board, and typing some of the stuff. Here are a bunch of references to the mailing list threads that you can go and comment on, or read if you want to learn more, and this presentation is also available at this location. That's it; if you have any questions I'll be happy to answer. Generally with something like this, when you're changing what is being measured, you publish a profile for how to appraise the measurements and the collateral that you're generating. Is the only source of that the patch, or do you have that published as a document? It is not published as an RFC yet; the design slash approach was published, and as you can see it is fundamentally changing things, because if you truncate the IMA log and clients and servers are not yet ready to make use of this feature, it's data loss and attestation will simply start failing. So it's a disruptive change; even though it brings benefits, it's a disruptive change. We wanted to first vet the idea and get consensus from the community that it is okay to do this, and I think we are getting consensus that this is needed, this adds value, we should do it, so now is the right time to send the RFC. Do you have the attestation verification logic in an open source project, or is that one internal?
You have multiple verifiers in the industry; some implementations are open source, like Keylime, and some are internal. Of course, since the functionality is not yet implemented in the kernel, the attestation clients and verifiers are not aware of it; there is no code to implement against. My first focus was to prove that it is doable and to do the kernel-side changes first. Once those are upstreamed, if you don't truncate, things should still work as a monolithic log, so clients and servers will have time to adopt this feature, but I don't think clients and verifiers have done the work to make use of it yet, because the feature doesn't yet exist in the kernel. Thanks. My question is regarding the IMA log: your first comment about it was that the kernel does not flush the IMA log, and then a couple of slides later I saw that the kernel can flush the IMA log. Is that to suggest that the kernel does not automatically flush the IMA log, but it can be flushed manually; is that what the message was intended to convey? So when I said the kernel does not flush the IMA log, I meant that the capability does not exist in the kernel; there is simply no way, unless you hard or warm reboot the system. Because it would be data loss, and if you modify even a single bit in the IMA log it goes out of sync with the PCR digest and you get a PCR mismatch when you are doing attestation. What I showed in the demo is the implementation of the proposal: in order to delete the entries before a snapshot, can I do that? I wrote the code to demo that it can be done, and that's what the demo was. So it is not yet in the kernel; the kernel cannot truncate it today. And is the amount of memory that the IMA log uses configurable? IMA log memory is not configurable; currently it is not configurable.
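The PCR-mismatch property mentioned in that answer follows from how TPM extends work; here is a minimal sketch (event names and the use of SHA-256 digests are illustrative, not real IMA template data):

```python
import hashlib

def extend(pcr: bytes, event_digest: bytes) -> bytes:
    # TPM-style extend: new PCR = SHA-256(old PCR || event digest)
    return hashlib.sha256(pcr + event_digest).digest()

# Illustrative event log; real IMA entries carry template data and digests.
events = [b"boot_aggregate", b"E1", b"E2"]

pcr = bytes(32)  # the PCR starts at all zeros
for e in events:
    pcr = extend(pcr, hashlib.sha256(e).digest())

# Replaying the same log reproduces the PCR value...
replay = bytes(32)
for e in events:
    replay = extend(replay, hashlib.sha256(e).digest())
assert replay == pcr

# ...but flipping even a single event desynchronizes the replay, which is
# why the log cannot be edited after the fact without a PCR mismatch.
bad = bytes(32)
for e in [b"boot_aggregate", b"e1", b"E2"]:  # one character changed
    bad = extend(bad, hashlib.sha256(e).digest())
assert bad != pcr
```

The same property is what the snapshot aggregate marker relies on: recording the running PCR state lets a verifier resume the replay from that point instead of from zero.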
So there are some checks in the IMA code base which say, hey, don't surpass more than half of the available RAM, but it is not configurable. Thank you. I had a question about two different operations here: you said trigger generates the snapshot event, and then copy copies that to disk. It seems odd that there are two separate events, because couldn't we get into a situation where, if we trigger and some other events are generated before we copy, the snapshot is a few events back? That can be handled by truncate, and thanks to the suggestion from Mimi yesterday: when you give the truncate signal, you can say I have copied entries 1 to K, so it's safe to delete entries 1 to K. Got it. And because user mode is not in control of what will get measured and what new entries will get added, right? Right; it can only say confidently, I have copied the first K entries, so truncate those K entries. The snapshot marker can come immediately at K plus 1, or it could be at K plus Y. Got it. Truncate should handle it. And I haven't kept up with the conversation on the list, but has there been any discussion about maybe a handler? Like, trigger being one option, maybe it's kexec, maybe it's just writing one to a file in securityfs, but then the action of snapshotting and copying, maybe, like the core dump handler for example, some process gets registered as the thing that's going to handle these events being truncated and copied to disk. If there were a specific handler that could be registered at boot time, perhaps some of these operations could be simplified to just the kernel executing this handler, piping all these events through standard in, and letting the handler operate however it wishes.
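The copy/truncate race raised just above can be sketched as follows; the list-based log and event names are illustrative, not the kernel's data structures. The key point is that the agent tells the kernel how many entries it actually copied (K), so entries that arrive between copy and truncate are never lost:

```python
# Toy model of the copy-then-truncate scheme discussed above.
kernel_log = ["E1", "E2", "E3"]      # in-memory IMA log (illustrative)

snapshot = list(kernel_log)          # user-mode agent copies to disk
k = len(snapshot)                    # K = number of entries copied

kernel_log.append("E4")              # a new event lands before truncate

del kernel_log[:k]                   # kernel drops only the first K entries

assert snapshot == ["E1", "E2", "E3"]   # everything copied is on disk
assert kernel_log == ["E4"]             # the late arrival survives in memory
```

Because the kernel deletes exactly the K entries the agent claims to have copied, no locking is needed to freeze the log during the copy, which matches the keep-it-simple design discussed in the answer that follows.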
We started with a lot of complex desires when we proposed it, and if you follow the links on the thread, it went in multiple directions: what clients can do, what servers can do, consistency and reliability concerns. But to summarize, it was a very valuable suggestion from the Linux IMA maintainers to keep it simple, and the simplest approach is to just let the kernel handle the truncate; that keeps the logic very simple, and the rest can be done from user mode. The trigger slash copy can be handled from user mode. There is a benefit to having multiple triggers, some from the kernel side, like on kexec or when a certain IMA event count is reached, but that doesn't really need to happen; you can just say, hey, I have copied the last K entries. Technically you don't even need a trigger IMA event, but the truncate has to be there, and if you just have truncate, I think it is still doable. Thank you. So, to answer the question: the original design required a lot of locking, and because of the locking we decided to keep it simple and avoid the additional locking. Thanks, Mimi. The desire behind the locking was to make sure it is synchronous with no loss of events, but maybe it is not needed. I think we are at time, so thank you everyone for attending this one and for your questions, I appreciate it. Our next speaker had a visa issue; she was supposed to be here but could not come, but she did send a pre-recording which we are going to play. Hello everyone, my name is Ahra. I work for Microsoft on various systems and security projects, and in this talk I want to discuss some of the ways that we can use large language models for improving bug discovery for our products.
I'll mostly discuss some of the ways we can use them better, and some of the challenges that we still need to resolve if we want more automatic ways of utilizing LLMs for patching or vulnerability discovery, so let's get started. As you know, large language models are becoming more and more advanced, and that's why in recent years we see integrations of LLMs in a wide range of use cases: software stacks, security scenarios, monitoring, finding adversarial patterns; these are exciting areas now that we can integrate LLMs in different use cases. One of the reasons is that the entire software stack around LLMs is becoming more flexible and more mature for designing customized models, and I really like this description of LLMs as an operating system from Andrej Karpathy, which notes that even from a distribution perspective LLMs are becoming similar to operating systems: there are closed-source LLMs and open-source ones, similar to Linux, and all of the components are getting more complex, with interactions analogous to the file system, caches, and IPC interactions with other models. This is an exciting time for designing customized models. So one of the questions we were looking at as a use case for LLMs was: can we use LLMs to further automate the patching process for the Linux kernel? The architecture we were discussing was, can we use, for example, generative adversarial networks, putting several models together to generate a good input before going through the whole patch generator, where these models interact with each other and go through an iteration loop to finally create a patch that can be deployed, maybe starting from small patches. The question was what this architecture would look like, and what data input we would need to provide for both the generator and the discriminator. One of the main problems we were also looking at was: can we, as part of this process, before the generator, also detect that a piece of code has vulnerabilities, so it won't generate vulnerable patches? Can it find bugs or suspicious patterns, or detect that some of this code is, I don't know, copied from Stack Overflow or similar sources and should be flagged for checking, so the generated patch won't be a vulnerable patch, which would be even worse than not using LLMs at all? That's why we looked at three different techniques for improving that particular part of the bigger problem of automated patching: to make sure we have a process for finding some of the bugs, and for making sure the model knows the vulnerability descriptions we provide and can look for those specific bugs. So we're going to discuss these three approaches: first, the simplest, prompt engineering plus RAG, which is basically giving more information to your LLM; second, the LLM-based assistants API from OpenAI; and then I'll discuss the fine-tuning process and some of its details. The first step for adding more information to your LLM is through prompt engineering and retrieval-augmented generation, or RAG. There are several papers that focus on this area for vulnerability detection, not specifically on the Linux kernel; there is one, actually, that improves fuzzing by integrating LLMs into syzkaller, but that's a different story and I don't want to go through the details of that. We took a look at those papers to see if they had useful suggestions for prompt engineering, and some of the suggestions were useful, such as describing the vulnerability first, describing the API first, and describing the data flows for the code. These are general suggestions; some were more useful for C, for example one paper's finding was that GPT-3 is mostly very good with Java, but for C those recommendations were more helpful. Other techniques they suggested included first discussing a high-level description of the code and then asking the LLM for line-by-line or function-by-function investigation. For the evaluation environment, we used a small set of 32 mainstream patches we picked; mainstream patches, because with those you can see the reviewers' opinions, the actual final fix for a bug, and useful detailed information about how the bug happens and how you can reproduce it, with specific debugging information, so that's a good source to compare what we can expect from the LLM against what we have currently. The other source was 15 kernel source files from kernel version 5.19, from some of the areas that we were working on, such as SELinux, IMA, Hyper-V, and VSM; the 15 kernel files were picked from those areas. For RAG, to feed more information to our model, the mainstream patches themselves are good sources, as I mentioned, along with some of the security blogs, such as Google Project Zero for more complex vulnerabilities, and CVE details, not much but a little bit. This was basically a small environment to design experiments with OpenAI GPT models. So for example, this patch fixing a BPF issue: the problem is that the code assumes the current process has unconditional access to a page that is always available, but when the page size changes, that address is not available, and it causes a memory leak. So when we asked a model about detecting any vulnerability, a memory vulnerability, a memory leak, in that piece of code, the unmodified code,
not the patch, the model couldn't detect it; it said no, there's no memory vulnerability here. Then we manually wrote a description saying there can be places where there's a specific dependency on, or assumption about, an address that may not be available to the current process. That's not giving the whole patch to the model, but a small manual process for describing the high-level issue. Then we asked again: this is our code, what is the problem? And this time, surprisingly, it described exactly what can go wrong. After that: can you create a patch for that? As you can see, it could detect exactly where that variable should have a condition added, so that if the page size changes there is another condition for handling it. That is not the right way of fixing this part, and it's completely different from the mainstream patch, but it could detect where the issue is, and that, with a better description of the possible vulnerability, can be really helpful. Another feature we wanted our model to ideally have was the ability to detect code copied from well-known sources that are not specifically secure, such as Stack Overflow, which may contain vulnerable code; the model could just mark that a patch is coming from insecure sources. We tried several pieces of code copied from Stack Overflow, and we could see that the model could really not detect any suspicious pattern. We asked it, breaking the task down to finding the specific parts that are copied, and it gave a lot of description, like this is a well-known pattern, for example for capabilities to be used like that, but in the end it said, I can't really identify any specific evidence that this is Stack Overflow code. So, to add more information to the model, we picked a lot of similar code from Stack Overflow in specific related kernel areas, for example the crypto API; these are, for example, the file IDs of files copied from different Stack Overflow responses, but all tagged with the kernel's crypto API. Then we used the assistants API, which is useful for feeding in more information: you attach these files, so you don't need to repeat a lot of instructions again and again, and it can be useful for feeding a smaller number of sources that you want the model to be aware of. This was a very simple and naive way of feeding the model; the RAG approach I described here, just feeding files, really cannot scale to more complex patterns, but even for this case it was helpful: at the end our assistant said, yes, that code seems to have specific patterns and is copied from Stack Overflow. So for this kind of pattern detection we really need a larger amount of data and more sophisticated ways of fine-tuning the model, but even this small improvement, feeding a small amount of code from exactly the related parts of the kernel and from Stack Overflow and using the assistants API, was helpful. The results we were seeing from both previous techniques were not really good; they still involved a lot of manual interaction, and they were not really something that could be useful and reliable for kernel bug discovery. So the next step was: can we really improve the LLM's knowledge of the specific criteria we care about, the vulnerabilities we care about, and get a proper improvement in our LLMs only for those things? That's why the next step was using the OpenAI fine-tuning API. Not all GPT models can be fine-tuned; there are some models you can fine-tune, and it's suggested that GPT-3.5 is better for fine-tuning, so we tried that one; it's also less expensive than GPT-4. We were looking for data that could help us fine-tune, and kernel CVEs are actually not that helpful; the only thing we found them useful for is ranking the severity of a vulnerability, so the model can say, okay, this attack is similar to this previous CVE, so it has a specific severity, high or low, but it wasn't really helpful for the detection process, because the descriptions are not that detailed. The other thing I was curious about was that we have more complex forms of attacks with a lot of good security blog posts about them; can we use those sources, for example Project Zero, to improve our model's understanding? This is one of the examples: before this process our model couldn't really detect what's happening, that there could be an mremap issue on anonymous virtual pages, and after the process, surprisingly, the model could really pick out what was wrong, describing in detail that the problem is, for example, that we have these mapped pages with a dangling page pointer, and this causes that. So I asked, now that you know the attack, can you write a patch for mremap to fix it? And though it could describe exactly where the problems are, it couldn't really fix them; all the patches were really not useful, really basic, but it could detect that this anonymous VMA structure shouldn't have this pointer. The actual patch is very detailed and not even comparable in accuracy to our model's output, but the model could detect the places that should be changed, so that was an improvement, and it could identify that this code can cause the complex attack that was described. Another source for fine-tuning our models on more advanced sorts of attacks is academic papers; they usually come with a lot of technical description of the attack categories and all the related work, and if they have open-source data sets, that's also very useful: after you fine-tune the model you can test whether the model has really absorbed all the details. For example, in this case we picked three papers that come with advanced source tests, the kernel code tests they used for their evaluations, and we could see that, for example, in one simple case, after the fine-tuning the model could say exactly which lines are a problem, exactly where the type confusion is happening, and how this can cause more complex attacks, which matched the high-level description in the paper, so it was an interesting result.
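The OpenAI fine-tuning format mentioned above is JSONL of chat-style examples; here is a minimal sketch of turning a write-up into that shape. The prompt and answer text, the system role wording, and the file name are invented placeholders, not the talk's actual training data:

```python
import json

# One hypothetical training example: a vulnerability write-up reshaped
# into a chat-format fine-tuning record (all contents are made up).
example = {
    "messages": [
        {"role": "system",
         "content": "You are a kernel security reviewer."},
        {"role": "user",
         "content": "Does this mremap-related code have a dangling "
                    "page pointer issue?\n<code snippet here>"},
        {"role": "assistant",
         "content": "Yes: the anonymous VMA keeps a pointer to a page "
                    "that may be unmapped, leading to use-after-free."},
    ]
}

# Fine-tuning data is uploaded as JSONL: one JSON object per line.
with open("train.jsonl", "w") as f:
    f.write(json.dumps(example) + "\n")

with open("train.jsonl") as f:
    records = [json.loads(line) for line in f]
print(len(records), records[0]["messages"][0]["role"])  # prints: 1 system
```

In practice each paper or blog post would yield many such records, and the token cost the speaker mentions later comes from the length of the `content` fields.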
Similarly for another paper that was more on like basically the memory leaks that you could resemble through the memory ownership and again we fine-tuned the model and it could detect test cases that at the end we got from the paper and their open source test and so these are not a large number of papers so the results are not that reliable and basically we were just trying to know what are the limitations and can be like helpful to read the model so there is also interesting but at the same time there are number of tokens that if you basically break the tokenized papers and those are like really lot of tokens and can be really expensive to scale it in a large number of papers so it can be useful for specific sets of vulnerabilities like if you are looking for this specific for example memory vulnerabilities but for scaling that it needs more work more advanced techniques for fitting these details so another useful data source specifically for low level kernel development like when you are dealing with hardware or hypervisor the architecture docs and like arch manuals like as you know these are really longer documents and full of details that can be really helpful for finding like bugs and debugging so I started with relatively a small one because the first it's very expensive to like be that like large probably millions of tokens if like for example it started with Intel manuals or something like that so I started with a smaller one it's about 200 pages for Hyper-V and it has like images table so one of the things that was useful was having two versions like take space tokenization and then converting the PDF to a lot of images and extract them separately and so this was basically helpful for finding like the fine tuned model was actually really knowing better about details of like basically MSR configurations for Hyper-V and Hyper-Calls and it was like basically completely different in describing my like Hyper-V like related quotes that for example or VSM kernel 
module and it could find four issues that they're not part of the main stream yet like we didn't test on the analysis on main stream it was mostly on my dev environment like on VSM codes that needs to be verified but yeah it was like really surprising that we could understand like specifically Hyper-Calls and MSR configurations much better than the previous version our final fine tuned model could do really much better on our small dataset so from our 32 main stream patches it could identify like 29 of them like the part was identified exactly that part that was expected from the 15 kernel codes that like we asked for file function like line-by-line investigation it could find 11 security issues that these codes are internal they're not part of the main stream code so they need more investigations before discovery and so this is a promising result and of course it cannot be like generalized because our sample is very small it needs like a lot of improvements much better set up like complex data or better sets of like vulnerability description and like for example architecture video descriptions or attack conferences that are specific on like for example like attack and vulnerabilities or audio descriptions of vulnerabilities so sources can all be combined together as a multimodal approach that I think it would be a very promising approach and at the same time if we go to that direction we should also consider that there are a lot of security problems with LMSS as well we should be very rough like specifically if we're using like formats like images and videos they can be like poisoning attacks injection attacks on LMSS so there are like other security considerations to like to consider for those kind of data but still I think it's like really promising for us to integrate LMSS for improving security after current so thank you for attending my talk and please feel free to ask any questions I also want to thank James Morris and Paul Moore for their help their 
feedback, and their help with the brainstorming and finding patches; that was very helpful. Thanks a lot. Okay, let's get started. Hello everyone, it's so nice to see you here. My name is Corinne, I'm currently a Ph.D. student at UC Riverside under the guidance of Professor Zhiyun Qian. Our group is actively researching system security, especially for the Linux kernel, and my research mainly focuses on kernel vulnerabilities and program analysis. Today I'm going to introduce our recent work, a hybrid alias analysis, and its application to the Linux kernel. This is the agenda: first I will introduce the problem that we are working on, then I will introduce some background around it, and finally I will jump into the program analysis world. The motivation is that we realized many kernel access control mechanisms are rarely used, because their rules are not easy to derive, or the derived rules are actually not sound because the analyses themselves are not sound, which can cause runtime issues if you have false negatives in your analysis. To address that, at a high level, we propose a novel alias analysis framework: precise, scalable and sound alias analysis on the whole Linux kernel becomes real. We can even analyze the allyesconfig Linux kernel, which turns on all possible modules and is quite large for analysis. We also made many efforts to specifically support Linux kernel code features such as type casting and pointer arithmetic, which were not well handled before. Eventually we used our analysis to help derive access control rules for some existing kernel access control mechanisms, showing promising results and some improvements. For the background: the kernel is always under exploitation, mostly because of memory safety bugs such as use-after-free and out-of-bounds, or generally speaking dangling pointers that point to places they should never point to. The kernel will crash because of
the memory safety bugs, but the exploitation starts from there. One way to mitigate kernel exploitation is access control, which determines the access rights of a subject to an object, ideally following the principle of least privilege. We are all familiar with several mechanisms, such as user authentication, file operations like read, write and execute, and resource isolation mechanisms like namespaces and seccomp. But let's think about it at a lower level, the program level, from the memory safety perspective, or say the relationship between pointers and the objects they point to. The most ideal access control here should follow least memory-safety privilege, which means a pointer should only access the permitted memory objects, and a memory object should only be accessed by the permitted pointers: no dangling pointers that point to objects they should never point to in the source code. To achieve that in practice there are many access control mechanisms or prototypes. For example, the read-only memory page maintains objects that should never be accessed through a write, such as constant variables: once you define a constant variable, you should never change its value during program execution. There are many other mechanisms, and we will introduce two of them real quick. The first is read-only after initialization, or RO after init. It's a memory-page-based mechanism used for protecting global variables. We know that after kernel compilation a constant global variable is put into the read-only section, since it is statically defined and never changed dynamically. But there are other global variables that are only initialized during kernel boot and kernel initialization and are never changed afterwards, so RO after init was proposed to make such variables read-only right after kernel initialization, because they will never be changed after that point. The challenge is that it's hard to tell whether a global variable will be changed or not: given the complex nature of the Linux kernel, it's hard to see where the address of a global variable flows to, and it's hard for developers to track global variables and confirm that all changes happen only in the initialization stage. So after six years of this mechanism being around, only about 500 global variables have been manually tagged by developers, and there are also some global variables that indeed could be changed after kernel initialization. For example modprobe_path, which is a highly corrupted global variable in many recent kernel exploits: we cannot put it into the read-only section even after initialization, because it can be changed if we turn on the sysctl config. But the truth is that only a few pointers, in one single function of the sysctl module in the Linux kernel, legitimately modify it, and all the exploits, for example those abusing IPC objects or BPF filters, corrupted it through dangling pointers in unrelated modules. In other words, if we can identify all the legal pointers that point to modprobe_path, we can filter out the illegal pointers and protect it through some pointer-level access control. So we realized that the key building block to derive memory-level access control rules is alias analysis, or say pointer analysis. For example, given a global variable, if we can find all its pointer aliases, that is, all the pointers that might point to it, and check that they are all used read-only after initialization, then we can confirm it can be marked RO after init. And for more fine-grained access control like we discussed before, alias analysis naturally derives the points-to set of a pointer and of course the pointer set of an object. In past decades we have had so many great works on alias analysis.
So why do we need yet another alias analysis? The quick answer is that there is no good enough solution yet, because all of them take different design trade-offs, and none of them performs well for the Linux kernel in the fundamental dimensions: soundness, precision and scalability. Soundness means no true alias is ever missed; precision means fewer false aliases; and it's also very important to be scalable to the whole Linux kernel, which is quite large, with reasonable precision. Generally speaking, all of these analyses can be divided into two categories, data-flow-based and type-based. First let me introduce the data-flow-based methods; now we are jumping into the program analysis. Assume there is a pointer p1 in the Linux kernel. It could be the address of a global variable, or say it points to the global variable initially, and to find the aliases of this global variable we need to figure out where p1 goes. Data-flow analysis is straightforward: it follows the data flow of p1, and eventually it figures out that it flows to p2 and p3, or say that p2 and p3 may point to the global variable. Well, sometimes it's not that easy a case; it can actually be much more complex in the Linux kernel. The data flow might go through millions of nodes, across many global variables, heap objects, system calls and so on, making the situation much more complex and difficult, so intuitively data-flow analysis often cannot finish in a reasonable time. Then let's take a look at type-based alias analysis. Type-based methods are more straightforward: they don't trace the data flow, instead they look at the declared type of the variable. Assume p1 is a struct A pointer; a type-based analysis will directly derive p2 to p5 as aliases, since they share the same type, but p4 and p5 are false positives here, because there is no data flow from p1 to them. You might think the result is reasonably good because it only introduces two false positives, but it can be much worse in practice. Assume p1 is now an integer pointer: that means millions of integer pointers in the Linux kernel will be reported as aliases, which is definitely an unacceptable result. Well, there are type-based variants that try to refine the results based on some context, for example here we find context 1 and context 2 to differentiate out some integer pointers, but those type analyses are based on strong assumptions that do not always hold in the Linux kernel, and things get much worse due to type casting and some undefined language features, so the result is still unacceptable; it's too imprecise to use on the Linux kernel. In summary, the data-flow-based analyses are precise but not scalable, and the type-based analyses are scalable but imprecise. We will illustrate our idea with a simple example. Still starting from p1, at the beginning we are in data-flow mode, tracing the data flow, but instead of tracing the data flow all the way down, we look for a suitable time to change. For example at node n1, which is a struct B pointer, with the insight from the type-based methods, just like taking a shortcut, we switch to type mode and teleport the data flow to the other struct B pointer nodes, and continue the data flow there. It's worth noting that the data flow from p1 to n1 is not a trivial cast of an integer pointer to a struct B pointer; we consider the case where p1 is actually an integer pointer field of struct B, and correspondingly those teleport terminals are the same field of struct B, for example p2 is the same field of another struct B object, n2. At a high level, we do data-flow analysis by default, meanwhile we look for a good chance to take the type-based shortcut, and then we continue the data-flow analysis afterwards. More interestingly, there is actually a more precise strategy: this time we won't take the
shortcut at n1; instead we keep tracing the data flow, and at n4, which is a struct C pointer, we choose to take the shortcut through struct C and potentially get a more precise result. One reason this is more precise is that struct C is less commonly used than struct B, so taking the shortcut through struct C reaches fewer nodes, which simply means a more precise result. In the extreme case, if we don't take any shortcut, the analysis falls back to a purely data-flow-based analysis. This shows that UNIAS is not a simple data-flow analysis; it's actually an adjustable and unified framework that lets you apply different data-flow and type strategies and achieve different trade-offs in terms of precision and scalability. In addition, UNIAS provides customization for different variables, since it's a per-variable analysis, meaning that you always give it an input variable, which allows you to apply different strategies to different variables and tasks. For example, some variables may need a more precise result that requires more data-flow analysis, and you have the freedom to do that. Let's go back to the insight. We now know the key is the structure field: we collapse the same structure-type fields so they are connected, and then we can teleport from one to the others. Such a strategy is actually an over-approximation, since there might not be a true data flow between the same-type field nodes, but we can still improve the precision given the adjustable framework. As we just discussed, we can choose to take the shortcut through the less commonly used structure types to limit the search space, and we can also do more data flow before taking the shortcut. In the previous case the precision improved not only because we chose the less common type, struct C, but also because we did more data flow before the shortcut. Such a data-flow constraint requires a symmetric data flow after the shortcut. For example, here before the struct C shortcut the data-flow constraint is a store to a struct B field, where that struct B comes from a struct C field; after the struct C shortcut, after the teleport, we look for the same struct C to struct B field, and then a load instruction from the struct B field. Since we stored something before, it is necessary to match a load afterwards. You could imagine there might be three store instructions on the left, and correspondingly they require three load instructions on the right; you always need a symmetric balance. In other words, even if we take a very common structure as the shortcut, when we add more data-flow constraints before it, we can expect fewer matched data flows afterwards, which is still precise. Then we try to handle some kernel-specific code features. The first is type casts: when dealing with types we always need to handle type casting, especially in the Linux kernel, which has a bunch of casts. For example, here in the source code there is a type cast from struct A to struct B, in which case, if we want to take the shortcut through struct A, we also need to take care of struct B because of that cast. Traditional type-based methods handle this through a union-find strategy: they do a linear scan of all the type-cast instructions in the whole kernel and collapse types that are ever cast to each other into the same type. In this case they would consider every use of struct B as a use of struct A, and thus derive p3, p4 and p5 as aliases of p1. In UNIAS we have a more precise solution: we also scan all cast instructions, but instead of collecting the equivalence at the type level, we collect it at the object level, which means we only teleport to the exact pointer that experienced this cast instruction, here A3 to B1.
So the data-flow analysis will only continue from B1, and then derive p3 as the alias. And if there were indeed a data flow between B1 and B2, or between B1 and B3, the data-flow analysis would automatically trace and handle it; but in this case there is not, so it does not need to, and the result is more precise, exactly the ground truth. There are also many undefined code features used in the Linux kernel source code, mostly pointer arithmetic. Instead of assuming they don't exist, we chose to address them through some formal strategies. For example, due to type casting, a structure object might be accessed through different types of pointers, like some void star pointers, in which case, if you want to match such field accesses, matching on the store side and the load side, you need to normalize them into a byte offset instead of field indices, because different field indices in different types might access the same memory offset in the object. We also relaxed the field arithmetic rules for the Linux kernel: for example, we allow field arithmetic from a field back to the base, and we even allow negative offsets in field accesses. A typical case here is container_of, which essentially accesses the structure base from a field through void-pointer arithmetic. It's worth noting that unlike many previous works, which try to handle container_of specifically through some model, UNIAS handles it naturally through the formal handling of pointer casts and pointer arithmetic. We didn't handle it specially, but by handling these two code features we automatically handle container_of, and UNIAS can handle more pointer arithmetic and undefined behaviors in practice. We designed and implemented our analysis based on the grammar of a context-free-language reachability framework, which solves the analysis as a graph reachability problem. For example, if you give me a pointer, then through graph reachability search on the graph I can tell you the points-to objects of this pointer, or if you give me a memory object node on the graph, I can search the pointers for you. UNIAS seamlessly unifies the data-flow and type-based alias analyses, and it's sound by design; that's why we call it UNIAS, unified alias analysis. It also handles the multi-entry nature of the Linux kernel through flow-insensitive analysis, which means different syscall execution orders don't affect the data-flow results; our result is always a conservative approximation over the possible syscall execution orders. As mentioned before, it's a per-variable analysis, so you can choose different strategies for different variables to achieve different trade-offs of precision and scalability. But there are still some limitations in UNIAS. First, some undefined code behaviors beyond the pointer arithmetic we mentioned before: kernel developers even write code that casts a pointer to an integer and casts it back to a pointer somewhere else, either because they don't want to accidentally dereference the pointer during propagation, or because they do pointer arithmetic through integer arithmetic. Since UNIAS is built on SVF, a static analysis framework based on LLVM IR, we only work on the SVF pointer assignment graph, a data-flow graph provided by SVF that so far represents only the pointers, not the integers; thus it cannot recognize such arithmetic and handle it. That might be future work. There is also a lot of inline assembly code in the Linux kernel, which is not represented semantically in the LLVM IR; some potential solutions are to model the assembly code as external APIs, or to lift it to LLVM IR through some decompilation strategies. Also, the on-demand analysis somewhat limits the scalability of UNIAS: even though the search space of UNIAS is the whole
kernel code space, it's not like a whole-program analysis, which analyzes the whole program first and answers queries afterwards; instead UNIAS takes the query first and starts the analysis on demand, starting the graph search after you query. Thus if you query more variables it will be more time consuming, but it's fully parallelizable, so if you have enough CPU cores and threads you can get good scalability. We evaluated our analysis on some applications, for example the RO-after-init mechanism we mentioned before. As a result, UNIAS can analyze all 12,000 global variables in Linux kernel version 5.14 with the default configuration within a one-hour time budget, while an Andersen-based data-flow analysis can only analyze about 3,000, one fourth of UNIAS, showing that UNIAS is more scalable than the data-flow-based methods. Compared to the type-based methods, UNIAS can protect six times more global variables by identifying them as RO after init, because the type-based results are too over-approximated to use, meaning that UNIAS is more precise than the type-based methods. As a result, UNIAS found about 10 times more protectable global variables than previous human efforts. We tested the result with two weeks of fuzzing and confirmed that only a few false negatives were introduced, due to the limitations we discussed before, where kernel code may cast a pointer to an integer; so there is still some room for improvement, and we are researching strategies to decrease the false negatives in practice. The design itself is still sound; we just need to handle some implementation-level details. We also tested the tunability of UNIAS: by setting different shortcut strategies, for example taking the shortcut more aggressively, we do improve the scalability of UNIAS but also sacrifice the precision a little bit. Going back to the highly exploited global variable, modprobe_path: the ground truth is that it has three pointer aliases in one single C file, and UNIAS finds 19 pointers in six C files, which slightly over-approximates the truth but no matter what is good enough for preventing all related exploits. As a comparison, the data-flow-based methods find 11 pointers in four C files, while the type-based methods are unusable here, because modprobe_path is actually a character array global variable, which is essentially a void star pointer in the LLVM IR. We also applied UNIAS to protect global variables with two mechanisms, RO after init and general software fault isolation. To our best knowledge, in the recent six years about 13 out of the 37 public exploits we studied leveraged global variables, and UNIAS derived that all of those exploits are using illegal pointers; so if we apply the access control mechanisms with the rules derived by UNIAS, we can prevent them all. There are also many other potential applications of UNIAS, for example deriving the rules for access control mechanisms such as pointer authentication and write integrity, which still need their access control rules derived automatically, and also for general program analysis, given that alias analysis, or say pointer analysis, is a very basic building block of almost all program analyses, such as taint analysis and maybe some bug-finding analyses, because UNIAS is much more precise. And even if you are not doing anything around program analysis, you might still benefit from UNIAS: you can try it when you're writing code for the Linux kernel, because you might wonder about the potential points-to set given a pointer, or want to find the pointers given an object in the Linux kernel source code. So feel free to try it; it should be a tool that helps you figure out the pointer and object relationships. To summarize, in this work we propose a novel
alias analysis framework that unifies the data-flow-based and type-based methods in a principled way. It is sound by design, can adjust precision and scalability, and can be applied to different variables and tasks on demand. We applied UNIAS to analyze the use of global variables, and with suitable access control mechanisms it would prevent all the global-variable-related exploits of the recent six years. That's it for my presentation; thanks everyone for your time. Any questions? At Google we find that static analyses are best understood at code review time, and for that to work you need them to be pretty stable under change; have you seen whether analyzing the kernel at different points in its history provides a fairly consistent view of the same variables that existed in previous versions? Oh, you mean can we support incremental analysis somehow? No, I'm talking about how stable the analysis is under refactoring, because whenever you're tuning a static analysis it can be very finicky, and that finickiness, just because you added extra curly braces or not, can make it unsuitable for inclusion in the software development life cycle. Okay, so one thing I want to mention is that we always tried our static analysis on the most recent Linux kernels, so it should be suitable as the Linux kernel gets updated, and the framework itself is written in a very brief way, so that developers can easily add or delete logic in the framework; the framework itself is basically a depth-first search, you could think about it like that, so it is quite easy and friendly for developers to work on. Go ahead. You said you found 5,000 instances where we can mark read-only after init; are we expecting some patches as output from this? This is a really good question, but one truth is, if you check our paper, we mention that we find 5,000 but there are still 23 false negatives, and all of them are because of the inline
assembly code and the pointer-to-integer casts. And since marking a variable read-only when it shouldn't be is quite dangerous, because after initialization, if a variable is read-only and some legitimate code writes to it, it will just crash the kernel, we would suggest that developers use our results and double-check them; that could be a better strategy, since there are many implementation-level dirty things in static analysis for the kernel, so it's not that safe to apply blindly. Is that list easily accessible, a list of 5,000 places that maybe we could use? You mean whether the result is accessible? Is the list published somewhere that someone could actually sit down and look at each of those and say, oh yeah? We are going to release that. Okay, great, I'm looking forward to it, thank you so much. I have another question: you have a claim of soundness, and in order for you to prove a mathematical property about a programming language you need the semantics as a mathematical object; which one are you using for C? You mean the semantic foundation of the soundness? In order to be a sound analysis, that's a mathematical statement over mathematical objects, and here you have a semantics of a programming language, perhaps C; where is that model, what model did you base this on? So the thing is that for most of the C language features, if you indeed follow the defined behavior, we should already handle that; the only exception is those undefined code features, where you do some loose arithmetic on pointers or integers and turn an integer into a pointer, which is complex for pointer analysis and still an open problem for general program analysis, because soundness is always based on some assumptions, or just like you said, some semantics or some model. We can confirm that we handle most of the code features, but there are indeed a few implementation-level things that are really tough. So is your model built from the ground up, or are you taking it from somewhere else? For example, CompCert has an entire big-step semantics specified for C, and that is the kind of model I'm talking about as a mathematical object. Okay, that sounds like some specific thing that I might not be very familiar with, and we can discuss that offline maybe, thank you. Thank you, okay, that's pretty much it for the presentation, thank you. Hello, hello everybody, it's exactly 2:50pm and that means it's my session now. My name is Petr Tesarik; I've been part of the Linux community since the early 2000s, on and off, spent time on crash dump analysis and various other smaller things, and lately I have been working on this thing called sandbox mode. It's a catchy name and it attracts some attention, thank you. Up front: I know that sandbox is a loaded term, and this thing has nothing to do with sandboxing like namespaces and containers or whatever; it's a completely new thing that I tried out to protect the kernel from itself, and this is original research funded by Huawei Technologies, thank you very much. I was contracted as an external consultant, so I'm not here as a Huawei employee, but it was paid by Huawei, so yes, they deserve the credit. Let's get to the matter. Okay, so I'm saying sandbox mode is a new mode, like a new processor mode: until now we had user mode and kernel mode, and I'm proposing sandbox mode. Why? If you look at the project goals, it starts with the premise that all software contains bugs; everything is buggy, it's just a matter of how buggy it is. The goal is to improve kernel self-protection by reducing the attack surface, and I'll get to that. I mean, the problem with kernel mode is that if there is a memory safety bug, then the whole system might be compromised, and usually is. I also had a second goal: if we make a sandbox mode, if we reduce the attack surface, can we do it so that we can run kernel workloads without interference from less privileged entities? Meaning,
for example, if your kernel runs with lockdown, then UID 0 is not as privileged as the kernel, and that means if the kernel is locked down, then user space, user mode, is not an option. Okay, I'll start with a very high-level overview; excuse my drawings, I'm not an artist, but they should illustrate the point. Let me first explain where we stand today. The options, if you want to run something as a kernel workload: well, obviously, run it in kernel mode. The problem with that is, these barrels are like gunpowder, and if you allow something buggy into this storage and it contains a bug, well, you can imagine what happens. So that's generally not a good idea, and although the Linux kernel is doing that, with a growing code base it might not be sustainable, and sometimes you can't even completely audit what you need to run. So this is suboptimal; this is the full attack surface. We have alternatives. We could make an eBPF program; that's better, but it has some limitations. The main thing I'm showing here is that it's no longer really kernel mode: eBPF code gets translated, like 3D printed in my illustration, and this also has some limitations, like the size of the 3D printer, meaning the size of the eBPF program. Most importantly there's this verifier; we can only run the code in the kernel because some code has verified that it is safe to run, and this verifier is quite complex, and it also limits what can and can't be run. eBPF itself is Turing complete, so it could run anything, but the verifier will not let just anything run; maybe that's good, but it has some limitations. Also, eBPF programs are loaded from user space, so that's not an option if you really can't afford to allow interference from less privileged entities. That could be fixed, I'm just mentioning it; that's a limitation of the current implementation, while the other points are inherent to the solution. So if we don't run an eBPF program, can't we use user mode? Really, there's this user mode driver.
The user mode driver lives in its own address space, separate from kernel mode, with some communication path, so that if there's a bug in the user mode driver it does not affect the rest of the kernel. Well, this did not work so well. These are attacks on the UMD: one of them is ptrace, another one is kill, so signals and so on; the communication path is also in danger, since it's possible to find its file descriptor through procfs, for example; and it is also possible to change the scheduling priority of the user mode driver, which matters if there is anything in the kernel that needs to run before or after it. So these are attacks, and most of them are currently mitigated; it's just that user mode was not designed to run kernel workloads. So my fear is, if you rely on user mode being fully protected, that may not play well in the future, because someone adds a new feature to the kernel, like io_uring, and forgets that we also have to add protections for the UMD. So yeah, it kind of works, but it may become a whack-a-mole game. So what do I suggest? I'm proposing something called sandbox mode. You can see this is very much like kernel mode, but there is a wall between code that runs in the sandbox and the rest of the kernel, which means the code is exactly as dangerous, but if the sandbox blows up, the rest of the kernel can continue. That's the basic idea. Now you may ask, this is good, but if it is isolated, what can you do with the sandbox? So this is contained damage; right now I'm not protecting confidentiality, so kernel data is mapped read-only: if sandbox mode needs something from the kernel, it can simply dereference a pointer. Kernel code is mapped into the sandbox, which might not be a good idea, I'll get to that, but by default it is, and some things can be called directly: if a function has no dependencies on pages that are not writable from the sandbox, which are behind the wall so to say, then yeah, you can just call it directly. If you need to modify something that is behind the wall, well, that will fail; for example, kmalloc will fail, and that's a big limitation. It is also possible to share some data with the sandbox: there is this shared page on top of the wall, something that both the kernel and sandbox code can write into. If you do that, keep in mind that it is a hole in the isolation, so to say, which might be acceptable sometimes and not acceptable other times; we'll get to that. The thing is, if a write fails, it is possible to intercept it and, depending on the isolation strength and policy, either fix it up or abort. First, the isolation strength. The way I designed it, there is weak isolation and strong isolation. Weak isolation really does not do much: it provides the APIs, it uses the guard pages that are implicitly provided by vmalloc, and it copies data into the sandbox and out of the sandbox. The thing is, if I go back to this slide, the sandbox starts with some data already, like these two barrels, which were copied into the sandbox before it was started, and when the sandbox terminates, kernel mode can copy out some data; that's how it communicates. Yeah, weak isolation does not provide much. There is also strong isolation, which is the main goal, and that switches to its own address space while sandbox mode is running. That requires changes in some arch-dependent code, like the interrupt handlers and possibly other code, to make sure it works. That was rejected, by the way; at least I wrote it for the x86 architecture, and it was rejected by the architecture maintainers. We'll see if the idea has some merit; I may try with a different architecture. So let me quickly present how sandbox mode works internally, now that we know what it should achieve. I'll start with a slide showing how process modes work today: we have user mode and we have kernel mode; user mode can use
sysqual APIs it is unprivileged the kernel mode is privileged internally there are kernel APIs these are not really well documented at least some of them are not kernel APIs means function calls inside the kernel how does this change with sandbox mode okay so we still have this user mode kernel mode but we also have the sandbox mode the sandbox mode is isolated and it uses kernel APIs which means theoretically you can have sandbox mode for each process in the system it's like a new mode that's why I'm saying that's a new mode the essential features sandbox mode is isolated how does it differ from user mode because user mode is also isolated can't write into the kernel data the big difference is sandbox mode is nested inside kernel mode so if you look at user mode we'll just call into kernel and the kernel will lazily do as little as needed and return sandbox mode is started from kernel mode externally it looks like it is still executing in kernel mode if you make a process listing or for profiling, tracing whatever it looks like kernel mode is just internally executing something else on the CPU it can access kernel data, I said that kernel data can be accessed and it prevents buffer overflows with byte granularity because we are copying in copying out so we are copying only as much as the trusted kernel mode said the buffer size should be violations are detected with page granularity because it uses page tables to give the isolation just to make I'm listing the APIs just to give an idea how that looks in code each such sandbox mode has an instance variable you initialize that then you define the input and output buffers with sbm copy in copy out or copy in out if that's data that is copied in and then copied out you can also share it with map read only or map writeable so these are the shared pages and then you call the function and then you clean up so that is easy to make a function callable in sandbox mode there is a macro which will just wrap around the target 
function. It will provide a call helper and a function which makes sure the parameters are passed correctly, and you then just call sbm call with the parameters. This is an example that was posted on the mailing list, which converted aa_unpack — the AppArmor unpack function — to run in a sandbox. You can see that it has some changes outside the target function, but no changes were necessary to aa_unpack itself; you just call it with sbm call. That's the last line before the commentary — there is this sbm destroy at the very bottom, then a comment line, and then this "error equals sbm call". So I am only changing how aa_unpack is called; I am not changing aa_unpack itself.

We still have a bit of time, so I will spend a bit more of it on the strong isolation. Now that we have an idea how that looks in source code, what happens at the CPU level? In a strongly isolated address space, we make most of the kernel address space non-writable, and only the buffers that are dedicated to sandbox mode are writable — and those are subject to the content damage that I mentioned earlier. And there is no user space. I think there was a misconception about this — that if I run it like this, the protection against accessing user space is lost. That's not true, because user space is not even mapped in this address space. Now, that is good. What I had to take care of is interrupt handling. I make a distinction between intercepted and not-intercepted interrupts. Not intercepted is where sandbox mode only wants the interrupt handler to run but does not really care what it does — like external interrupts from devices and so on. When such an external interrupt comes in, the CPU must leave the sandbox and run the interrupt handler in kernel mode — because obviously the interrupt handler was not designed to run in this sandbox — and then switch back. That's easy. Then we also have the intercepted interrupts: these are processor faults, and these may have to be handled. They start the same way — they enter kernel mode — and then they check the policy for whether this kind of fault is expected; if so, it will fix up the fault, but if it is unexpected, it will abort.

So what do I mean by a fixup? Since sandbox mode is isolated and not privileged — it runs at CPL 3 in my implementation; that's not a prerequisite, but it has some merits, and I can get to that if people are interested — it can't do a lot of things that kernel code normally does. For example, it can't allocate dynamic memory. But this code is expected to contain calls to such privileged code, like kmalloc. The thing is, if this code does call kmalloc, for example, and the policy says "yes, we want the sandbox to be able to allocate dynamic memory", then the interrupt handler will switch to kernel mode, see "okay, we are expecting this kind of fault", and allocate the memory on behalf of the sandbox. In some way this is related to eBPF helpers, but it's different: eBPF helpers are what eBPF programs call instead, while here we are trying to run native kernel code which contains a real CPU call instruction, and we will just execute whatever is needed. I also call it a fixup because it does not just call kmalloc — it also makes sure that the newly allocated memory is mapped into the sandbox so the sandbox can use it. It may also do parameter sanitization; that would probably be needed for kfree, so that if there is a bug you can't free a buffer that was not allocated in the sandbox. All of this is theoretical — it has not been implemented — but the idea is, yes, the fixup should do some sort of sanitization, and that sanitization might actually be shared with these eBPF helpers, possibly. Okay, so that's it, essentially. I think this is the main core idea behind sandbox mode, and I want to give you some time for questions.
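The call lifecycle described above (initialize, copy in, call, copy out, destroy) can be sketched as a userspace Python analogy. To be clear, this is not the real interface — the actual API is kernel C (sbm_init, sbm_call and friends in the posted patch series) — and the class and method names below are purely illustrative:

```python
# Userspace Python analogy of the sandbox-mode call lifecycle. The names
# here (Sandbox, copy_in, copy_out, call) are illustrative stand-ins for
# the kernel C API discussed in the talk, not its actual signatures.

class Sandbox:
    def __init__(self):
        self._in = {}     # name -> private copy of an input buffer
        self._out = {}    # name -> private copy of an output buffer

    def copy_in(self, name, data):
        # Data is copied INTO the sandbox before it starts; the target
        # function never touches the caller's original buffer.
        self._in[name] = bytearray(data)

    def copy_out(self, name, size):
        # Declare an output buffer; only `size` bytes will ever be copied
        # back out, which is what gives byte-granularity overflow protection.
        self._out[name] = bytearray(size)

    def call(self, func):
        # Run the target function against the private copies only.
        return func(self._in, self._out)

    def result(self, name, size):
        # Copy out at most the size that trusted code declared.
        return bytes(self._out[name][:size])

def unpack(inputs, outputs):
    # Toy "target function": uppercase the input into the output buffer.
    data = inputs["blob"]
    outputs["result"][:len(data)] = data.upper()
    return 0

sbm = Sandbox()
sbm.copy_in("blob", b"profile")
sbm.copy_out("result", 16)
err = sbm.call(unpack)
print(err, sbm.result("result", 7))
```

The point of the sketch is the data-flow shape: the target function only ever sees the private copies, and the caller only gets back as many bytes as it declared up front.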
Setting up the memory mapping — how expensive is it to set one up and tear down? So, setting it up may not be that expensive; obviously we have to allocate all the page table hierarchies. Entering and exiting the sandbox can be expensive, because obviously you have to make sure the new mappings become active. Luckily — and that's why I said I run it at CPL 3 on x86 — that allows lazy TLB invalidation. But obviously that's not the full story: if there's any user space, well, the user space translations in the TLB are flushed when you start the sandbox, so that is expensive.

What use cases did you have in mind when you wrote this code? So, the trigger is probably not the main use case I had in mind. Okay, there are a few. First: if you have to run a binary-only module for any reason and you don't want to trust it, maybe you want to run it in a kind of sandbox — and I know that binary-only modules are evil and you should not be using them, but still, there are people who have to use them. Second: for things that do not run in process context, this TLB flush is not so expensive, because we are only flushing the few pages that are actually used by the sandbox. So things that do something complex — like parsing user-supplied data that might be crafted to trigger a bug — could easily run in sandbox mode, and this does not happen too often; like unpacking the AppArmor profile, usually you only do that once. So things like that: parsing of user-supplied data. So you are trying to act as a safety net against untrusted user input? More than that: to make it harder to exploit an existing bug. Thank you.

Hi, nice talk. So can this be implemented for arm64 as well, and would we use the same exception levels, like EL2, EL3? Yes, it can be implemented for any architecture that has paging — anything that has an MMU. I haven't really implemented an arm64 version, but I had some notes — okay, that's in the backup slides. If you had sandbox mode on arm64, you would run it at EL0, because otherwise, if you ran it at EL1, you would have access to privileged registers, so you could escape the sandbox. The idea was: load a new value into TTBR1_EL1 when starting the sandbox, and then at interrupt entry and exit implement something like KPTI for the sandbox. I wrote down a note about CONFIG_UNMAP_KERNEL_AT_EL0 — yeah, that's the KPTI for arm. There are some TLB considerations again: we would have to allocate an ASID for each sandbox instance, and yeah, global entries... we can also implement the lazy TLB flushing here. It also needs its own stack — it needs that on x86 too; I skipped it in this presentation because it's an implementation detail, you can read it in the submitted patch series, but we would need something like that. There was a trick on x86 to distinguish user mode from sandbox mode which used the saved CS register; we don't have that on arm, so we would probably use a per-CPU variable. But essentially it can be done.

This sounds a bit more expensive in terms of CPU, though. I mean, imagine going from user space into kernel space — you do the context switch — and then from kernel space you have to go into SBM mode, more context switching. Have you done any profiling, and what's the possible impact on performance? I definitely didn't do any profiling on arm, because I didn't write the code — I just made a sketch of how that code should look. On x86, loading a new value into CR3 is a relatively expensive operation, but it's manageable. On a 3 GHz CPU, the latency added by entering and exiting sandbox mode was under one microsecond — which is still a lot if you're talking about nanosecond latencies; it was noticeable, on the order of hundreds of nanoseconds. Even if I optimize it, it's still expensive. And you usually go from user space to the kernel for things like memory allocations, and there
will be frequent calls to the kernel, and then you're switching again into SBM mode. I think it will be frequent — it's going to be more expensive. That's why I'm saying my target was things that do not run frequently. I wonder if the performance could be justified for some scenarios. It depends a lot on — yeah, like I said, it depends a lot on factors that are unknown, like: is there a user space that needs to be repopulated into the TLB? In that case it's probably a bit expensive, but if there is no user space, then it might work. Okay, thanks. You're welcome.

The kernel text has instructions in it that require CPL 0, so what does failure look like? I mean, you're going to get general protection faults — does it blame the sandbox? Okay, so it depends on what you want. Generally, yes: if it raises a general protection fault, it will abort the sandbox and return -EFAULT to the caller. But if you so wish, you can write a fixup for the general protection fault — you may allow disabling interrupts, for example — but keep in mind, well, then the sandbox is allowed to DoS the CPU it's running on. No more questions? Then thank you, and that's it.

Is this mic on? Oh, did I miss the ice cream? No ice cream? Good — as soon as I say "ice cream" the room just empties. Alright, it's 15:50, so I'm going to kick this off. My name is Bill Roberts, and I'm going to talk about systemd, its integration with TPM 2.0, and some of the fallacies and pitfalls that pop up when you're doing that type of integration work. So, with this crowd, probably a lot of people here have some background — I noticed a lot of the discussion and the presentations mentioned TPM somehow — but I'll do a really brief intro. If I say TPM, I mean TPM 2.0. There is an older 1.x device that kind of started the whole TPM thing; it's really old, it's deprecated, I haven't seen one in a while — they're still out there — but yeah, we're talking about TPM 2.0, and
so: the TPM is a thing called a Trusted Platform Module, and it's a standard by the Trusted Computing Group. At a high level it provides storage — you can say "hey TPM, create me a key" and store it under some protection mechanism. It does measurement — we've seen a lot of talks mention those PCRs; that's the measurement state the TPM can provide. Reporting — there's a whole attestation protocol around the TPM, which has to do with signing the PCR measurements, getting them to a remote server, and performing the attestation protocol. And cryptography — not only can it create keys, it can use them for asymmetric and symmetric cryptography: RSA, HMAC, AES, etc. — but not all TPMs support AES. This talk is mostly about storage, the protection of those stored keys, and specifically the disk encryption key. If anybody's familiar with Windows BitLocker or something like that: it uses the TPM to store and protect the disk encryption key, and that disk encryption key is eventually released later.

And what is systemd? It's the init process — it replaced the old init system, and it bootstraps most Linux distros, at least the ones I use. It provides a lot of stuff; it's not just an init system, it's an ecosystem of things, and one of the things it has is native LUKS disk mounting and TPM2 disk enrollment. It has sets of utilities integrated into systemd that let you say "I want to take this hard disk and enroll it with a TPM key for protection". This is not a talk on systemd versus init, and this is not a talk to be critical of systemd — I know Lennart very well, and other projects have made the same mistake you're going to see here, like Parsec and a couple of others. I want to talk about what these problems are and what we can do; and again, as far as systemd goes, this talk is specifically about that LUKS disk enrollment.

So what is the problem? There is a potential attack vector with TPM 2.0. By default, whenever you send a command to the TPM and get a response back, it is all plain text. Certain TPMs are discrete — i.e., their own little chip hanging out somewhere — and others are built into the CPU firmware. The typical attack vector is the bus: if you have a discrete TPM, you can hook up some wires on the bus and sniff all the traffic going to the TPM. But it could be other parts of the system too — a lot of the other talks at this conference have brought up different execution levels, execution environments, and usages of TPMs, including virtual TPMs. However you're getting those bytes to and from a TPM, whether it's virtual or physical, if those bytes cross your trust boundary, you probably want to protect them. This is a known attack, and the TPM 2.0 architecture documents describe it in detail. And if you're looking for a real-world demonstration of how to perform these attacks, just Google "TPM Genie" — there's a great GitHub page with source code that walks you through it; it's actually kind of fun to do.

While this is a known problem, that architecture documentation also describes how to prevent it — mitigations. There are these things called sessions; unfortunately, you have to opt into them. There are a lot of different sessions if you look at the TPM documentation, which is very terse at times; the ones you care about are HMAC sessions. And you go "oh, HMAC is great — but that's just integrity, right?" Well, there are other bits you can set when you create the session: bits for encrypting the data going to the TPM, and bits for telling the TPM to encrypt the response data. So you want to enable those types of bits.
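The idea behind those encrypt/decrypt session bits can be sketched in a few lines. This is not the real TPM scheme — actual sessions derive keys with the TCG KDFa and encrypt the first command/response parameter with AES-CFB — but the toy HMAC-based keystream below shows why a bus sniffer only sees ciphertext once both ends share a session secret:

```python
# Simplified illustration of session parameter encryption. NOT the real
# TPM scheme (which uses the TCG KDFa and AES-CFB); the HMAC keystream
# here is a stand-in to show the effect of the "encrypt" session bits.
import hmac, hashlib

def keystream(session_key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy keystream: HMAC-SHA-256 in counter mode.
    out, counter = b"", 0
    while len(out) < length:
        out += hmac.new(session_key, nonce + counter.to_bytes(4, "big"),
                        hashlib.sha256).digest()
        counter += 1
    return out[:length]

def xcrypt(session_key, nonce, data):
    # XOR with the keystream; the same call encrypts and decrypts.
    ks = keystream(session_key, nonce, len(data))
    return bytes(a ^ b for a, b in zip(data, ks))

session_key = b"\x11" * 32   # in a real session, derived from the salt
nonce = b"\x22" * 16
secret_param = b"disk encryption key bytes"

on_the_bus = xcrypt(session_key, nonce, secret_param)
assert on_the_bus != secret_param             # sniffer sees only ciphertext
assert xcrypt(session_key, nonce, on_the_bus) == secret_param
```

The security of this, of course, rests entirely on how the session key is established — which is exactly where the vulnerability discussed next comes in.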
You can think of this conceptually the same as TLS, and you want to do the same things you would do in a TLS session. Again, the attack vector is typically the bus, but the point I really want to drive home is: any time your bytes of communication leave the trust boundary, think about it. A lot of people just go "oh, I'm using the TPM, it's good" — well, maybe not, depending on the scenario. Another nice thing is that when you enable sessions, whatever client application is communicating with the TPM, the protection is endpoint-to-endpoint — so if that application gets moved, ported to a different system, it should still work.

This attack has gotten a lot more press than I think it deserves, because it was known. Here's just one example, but if you Google the attack you'll see all sorts of news articles: "BitLocker's broken", or — pick a vendor making a TPM chip — "their security doesn't work". It typically centers on the firmware-based TPMs built into manufacturers' processors. Again, this could be thwarted with that encryption. And the problem I'm going to show you is that people were even using sessions and thinking they were getting these protections, but they were just saying "hey TPM, create me a key pair, now encrypt the communication channel with the key pair you just created". In a TLS session, you walk the certificate chain back from the public key to make sure it's signed by a CA you trust; you don't ask the server "hey server, create me a key pair, send me the public key, and I'll trust it". And that's essentially the problem here.

So, the systemd support. systemd added support for a BitLocker-type thing: encrypting and decrypting hard drive contents. The way it works with the TPM is that when you create one of these disk encryption keys for your hard drive, you generate the disk encryption key, not the TPM. There are essentially two types of keys in the TPM: one where you say "hey TPM, create me the key", and one where you say "hey TPM, here's the key". Disk encryption keys are the "hey TPM, here's the key, I'm going to give it to you" kind. You might ask: why not just let the TPM handle all of this? The TPM is slow; it's not good for bulk encryption — in fact, most TPMs don't support something like AES for this. You actually want that key back in raw bytes, because you can hand it off to certain types of hardware disk controllers that perform the encryption and decryption themselves, so it's not even done in software. If you want that hardware acceleration, they need the key to do the work. So because this key travels to and from the TPM, and the TPM is really just providing data-at-rest protection for the key, that man-in-the-middle attack is actually useful: you can get the disk encryption key.

Here's the timeline of changes — these are systemd version numbers on the left. Version 248 had the initial TPM 2.0 support, with no attempt at any type of encryption. Then they enabled session support, which is when I went to look at their implementation and noticed there was a problem, and I worked on mitigation steps after that, through versions 252 and 254. I forget what version they're on now — it's the upper 250s, might even be 260s by now, I don't know, they move pretty quick — but most recent distributions (I'm on Fedora 39) are new enough that you're fine. Visually, this is what I've been explaining: in the plain-text scenario, the key comes from the TPM, and an attacker on the bus — or on whatever the communication channel is — gets the key. They have your disk encryption key, and they can use that
in an offline attack — not particularly interesting, unless you're the attacker. Version 251: this is where systemd said "hey, let's fix this" and tried to solve the problem. They enabled the session protections — again, just like TLS — but the weakness is that they failed to verify the public key. This is the second time I've seen this happen in a public repository, and I'm sure it's elsewhere — I've seen it in a lot of proprietary software. When you're using the TPM, verify your public keys; if you're using them for something, well, verify your public keys. It's a good idea. This would be akin to trusting any root CA for TLS, or accepting any key pair over SSH — "oh sure, that's a fine key, it's good enough".

So how would the attacker actually do this? Actually, let me show you. This is their vulnerable code snippet, and this is generally the code flow I see in other people's software: they call make-primary, get a key pair from the TPM, and then do something like "alright, start up that encrypted session". If you're auditing code bases or working around TPM stuff, keep an eye out for code like this, and make sure that something like the make-primary or make-encryption-session step is actually verifying the public key. One thing to note: the way the TPM works internally, keys are in a hierarchy — the primary key is the root key of the key hierarchy, and you can make child keys under it. So when you see "primary key", it just means the root parent key for a key hierarchy.

Okay, so this is where they added the session encryption. How would an attacker take advantage of it? The attacker says: alright, you want a key pair for the session encryption — so when they call make-primary, as the attacker on the bus I just hand back a malicious primary key where I control the private key, and then I perform what the TPM would do, which is the session encryption. I do all of that; if I need to forward commands to the actual TPM, I can. I'm in full control: I can decrypt the communication stream, I can send stuff, I can see all the data coming back, and they have no idea — they would never know. So in this case, when the TPM sends the disk encryption key back, you just decrypt the communication stream and you have the key, and the client application — in this case systemd — would never know.

When I saw this, I thought: okay, what can I do really quickly to make this a little harder to pull off, while I figure out how to fix their code and whether they want patches? The simplest step was to add something called a bind key. In the TPM, when you set up a session to enable encryption between the client and the TPM, there are essentially two ways to do it: one is called the TPM key, and one is called the bind key. With the TPM key, you use an asymmetric key pair to encrypt a salt with the public key, and you send that when you start the encrypted session; the TPM has the private key, so it can decrypt the salt, and that salt is used to seed the session key. The bind key works like this: suppose I make an object in the TPM and put a password on it. That password is known by me on the client side, because I created it, and it's known by the TPM, because the TPM has to check against that password. You can use the bind key to establish the session key, because it's a shared secret between the two sides — instead of encrypting a salt and sending it over, you just use the password as a symmetric secret to enable everything. The problem with this, though, is that any recorded traffic can actually be decrypted offline.
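The offline weakness of a password-seeded session can be sketched directly. This is an illustrative stand-in (the real TPM derives session keys with KDFa over the auth value, not a bare SHA-256), but the shape of the attack is the same: record the traffic, then grind a PIN dictionary against it at leisure:

```python
# Illustrative sketch (NOT the TCG session-key derivation): if the session
# key is seeded directly from a weak bind password, recorded bus traffic
# can be brute-forced offline with a small dictionary.
import hashlib

def session_key(pin: bytes) -> bytes:
    # Stand-in derivation; the real TPM uses KDFa over the auth value.
    return hashlib.sha256(pin).digest()

def xor_encrypt(key: bytes, data: bytes) -> bytes:
    # Toy cipher, enough to show the offline-guessing problem.
    return bytes(d ^ key[i % len(key)] for i, d in enumerate(data))

# Legitimate exchange, recorded off the bus by the attacker.
recorded = xor_encrypt(session_key(b"1234"), b"disk-encryption-key")

def crack(recorded, dictionary):
    # Offline dictionary attack over common PINs.
    for pin in dictionary:
        guess = xor_encrypt(session_key(pin), recorded)
        if guess.startswith(b"disk"):   # attacker's known-plaintext check
            return pin, guess
    return None, None

pin, plaintext = crack(recorded, [b"0000", b"1111", b"1234", b"4321"])
print(pin, plaintext)
```

Nothing in the recorded traffic rate-limits the attacker, which is exactly why the salted PBKDF described next matters.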
Something like a dictionary attack can derive the password and then decrypt the traffic. Think weak PINs: if you've used this feature in systemd, it says "put in a PIN", and most people are going to type 1234 — so that's not really useful. Let's skip the rest of that. Version 252 has the bind key, so we set that up. The attacker in this case could still sit there faking the encryption for the primary key, but they couldn't do the other half with the bind key. They could capture the traffic and see the layer of encryption from the bind key, but they couldn't do anything useful with it directly — they'd have to take it offline and try to crack it.

So how can we make that better? That was a quick stopgap — literally a two-line change, it took like two seconds, they accepted it immediately and put out a release — but again, the traffic can be brute-forced. How do we fix weak PINs? We add entropy to weak PINs using a PBKDF. We used HMAC-SHA-256 — they already had HMAC and SHA-256 in their crypto library, so I just had to put them together into a PBKDF. The other nice part is the salt, which gets stored in the LUKS metadata in the superblock. Before, if your disk encryption key was protected by PIN 1234, anybody could go around systemd offline, go straight to the TPM, and say "give me the key, I know the password, it's 1234". Now you need the salt to do that, so this also greatly protects users with weak PINs from having that backdoor into the TPM. And this really just leaves one more task. Here's a visual of what I was just describing: it's the same as before, but they can't take it offline now and mount that attack — it'd be much more difficult. The attacker is sad.
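The PIN-strengthening step can be sketched with the standard library's PBKDF2. systemd's actual implementation composes HMAC and SHA-256 from its own crypto library, and the iteration count and salt size below are illustrative, not systemd's real parameters:

```python
# Sketch of strengthening a weak PIN with a salted PBKDF before it is
# used as a TPM auth value. hashlib.pbkdf2_hmac is a stand-in for the
# HMAC-SHA-256 construction described in the talk; iteration count and
# salt size are illustrative, not systemd's actual parameters.
import hashlib, os

def strengthen_pin(pin: str, salt: bytes, iterations: int = 10_000) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", pin.encode(), salt, iterations)

salt = os.urandom(32)          # stored in the LUKS superblock metadata
auth = strengthen_pin("1234", salt)

assert len(auth) == 32                                  # full-width auth value
assert auth == strengthen_pin("1234", salt)             # reproducible with salt
assert auth != strengthen_pin("1234", os.urandom(32))   # useless without it
```

The key property is the last assertion: without the salt from the LUKS metadata, knowing the PIN alone no longer lets an attacker go straight to the TPM.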
So the last thing — and this is the thing I really want people to be looking for if they're working on TPM: if you're going to start an encrypted session with an asymmetric key pair, do this. This is the tpmKey parameter in TPM2_StartAuthSession, if you're looking at the documentation. I'm a maintainer of the tpm2-tss project on GitHub, and we have something called the Enhanced System API (ESAPI) that makes this process easier. If you use the native ESYS_TR types — which some people do not like — the idea is to bind the object's location in the TPM to the name of the object, which is the cryptographically unforgeable identity of the actual object. TPMs just have handles to objects; an attacker would need both the name and the object location, and matching them guarantees you're talking to the right thing in the TPM. So if you use ESYS_TRs with the Enhanced System API, and you use them to start sessions, you're fine. If you use the Feature API — which is probably not useful for most of this audience; it's definitely a higher-level API, but if you're doing user-space systems-level programming up there, it'll be fine — the Feature API does all of this under the hood for you. You just call functions like "give me the key" and you don't have to worry about it.

The ESYS_TR gets stored in the LUKS metadata — where we stored that salt in the superblock, this gets put in there as well. The security model around this is first-to-set: when you enroll the disk encryption key, at the time you seal it to the TPM, that's when we record what the primary key should be for the encrypted session. This is like SSH: when you're logging in, you accept the remote server's public key fingerprint. Again, ESAPI takes care of everything, and this prevents the attacker from supplying the public portion of the key — instead of getting it from the TPM, you're getting it from the superblock of the disk. And all the unseal traffic within systemd is protected by sessions now, even when no PIN is present — the bind-key approach only worked if the user specified a PIN, but a PIN-less setup now has those protections as well. This also allows different auth sessions, different models of protection for the keys — Lennart was putting patches up to systemd just yesterday where you can do PCR policies and other things, with all those protections in place. In this case, the attack is just not possible; the attacker is sad. So if you're on version 254 or greater, everything should work and be secure.

There were also some miscellaneous issues I fixed along the way. systemd was creating a primary key on every seal and unseal, which is really, really slow — even though they were using ECC, which is way faster than RSA — and you don't really need to do this. And there was no way to specify the owner auth for the hierarchy the primary key was made in, which means that if you deployed a TPM with an owner password set — which you should; it's recommended in the provisioning guidance documentation — there was no way to make this work. So I don't really know how they were deploying this in practice. What I did is change the primary key to use what's called the storage root key (SRK). This is part of that provisioning specification; it's really just a well-known key at a very specific address in the TPM, and it avoids the owner-auth requirement. It's like a scratch space for everybody who wants to use the TPM without knowing the owner password — which is typically controlled by an IT organization, not the end user — a spot where they can create keys in the TPM without needing special passwords.

For future work, it would be really nice to do the TPM2 seal path: right now the seals go unencrypted to the TPM.
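The first-to-set model described above is essentially trust-on-first-use pinning, like SSH's known_hosts. A sketch under illustrative assumptions — the store and fingerprinting below are stand-ins; systemd actually records the primary key via ESYS_TR serialization into the LUKS superblock metadata:

```python
# Sketch of the first-to-set (trust-on-first-use) model described above,
# analogous to SSH host-key pinning. The dict and SHA-256 fingerprint are
# illustrative stand-ins -- systemd records the primary key's serialized
# ESYS_TR in the LUKS superblock metadata.
import hashlib

metadata = {}                  # stand-in for LUKS superblock metadata

def enroll(tpm_public_key: bytes):
    # At enrollment time we trust the TPM and pin its primary key.
    metadata["pinned"] = hashlib.sha256(tpm_public_key).hexdigest()

def start_session(tpm_public_key: bytes):
    # Every later unseal checks the offered key against the pinned
    # fingerprint, so a bus attacker can't substitute their own key pair.
    if hashlib.sha256(tpm_public_key).hexdigest() != metadata["pinned"]:
        raise ValueError("primary key mismatch: possible MITM on the bus")
    return "encrypted session bound to pinned key"

genuine, malicious = b"real TPM primary key", b"attacker key"
enroll(genuine)
assert start_session(genuine) == "encrypted session bound to pinned key"
try:
    start_session(malicious)
except ValueError as e:
    print("rejected:", e)
```

As with SSH, the trust decision is made once, at enrollment, when the environment is assumed clean — which is also the residual weakness the future-work discussion below addresses.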
But at that time we're assuming that the TPM is in a good state — that was the first-to-set model: when you do the seal, that's the set. It would be nice to be able to say "I want to do a seal, and I want to use this public key to protect it", for when you're in a state where you know what to expect from the TPM, so we don't have to blindly trust what's there — you could add an option to supply the name of the object, or something like that. And I guess we could drop the bind-key argument now; there's really no use for it anymore, I just didn't do it — so if somebody wants to write a one-line patch to delete it, that'd be great.

I'll also throw out some thoughts on SPDM. SPDM is pretty great, because it's going to be a generic way to encrypt and decrypt traffic to peripherals, and TPM 2.0 devices are going to support it — I've seen some early hardware; I haven't seen anything on the market, but it's supposed to be coming. But you're going to need to add support for it and do work to get it going, whereas the TPM already has sessions, and you won't need the newest hardware to use them. And as you port to different systems that may or may not have SPDM, you know that if you port this design, it will work the same everywhere. So I'm a huge fan of adding session support for the TPM and not relying on SPDM, at least until it becomes more widespread.

In summary: use the Feature API if you can — you don't even have to worry about any of this, it just does it for you. If you're going to use the Enhanced System API, do it correctly. Use the SRK; you don't need to be creating primary keys unless you have a very specific use case. And that's my contact information, and I'll open it up for Q&A. I thought I was going to get away with no questions... oh, sorry. Correct me if I'm wrong, but it seems like you're saying that to verify the public key of the TPM, you basically just read it from disk, and that would probably be totally unencrypted — so what's stopping
an attacker from just modifying the disk and saying oh the TPM's public key should be me so in the attack this is why the attack I kind of have questions on is that somebody's on the bus between the TPM and the host CPU right well they don't have access to the disk okay they can only be there they can't be on the disk that's why I'm saying that's the attack now in instances where maybe your bytes are going across an untrusted boundary like maybe you're in like a T and the rich OS is moving those bytes to the TPM for you if you set up that that will prevent that rich operating system for being able to tamper with the communications they could doss you but they couldn't do that yeah if somebody actually has physical access to the machine your physical TPM they have physical access to the device right that's why this attack I think it's blown out of proportion but the press is an amazing thing yeah I guess could you like somehow say oh here's the public key of all the TPM manufacturers and then verify it's an authentic TPM device so there is a way to do that so there's this thing called the endorsement hierarchy and in there is you can create this thing called the endorsement key and that endorsement key is going to map back to a manufacturer certificate which so it gets really difficult to do that because there's a lot of different CAs that are used to sign these and so like if you're in like really really low level part of the stack and you're trying to walk a cert chain back and verify like I think our code base can do it but I think it's hard coded for like at least 12 root CAs right now now one thing you can do is some TPMs actually let you remanufacture them so essentially you take over this endorsement hierarchy and you set the endorsement key and by changing all that you could then just issue your own certificates all tied to one root CA and then you could just have one root CA to verify to and that would be way more feasible for like firmware when you're 
trying to do this really low-level; down that low, it's really difficult to do. Right, thanks, good talk. Yep. So with v254, man-in-the-middle attacks go away, is that correct? In theory. Okay, so with versions prior to 254 (and I know you indicated that this was not intended to be a talk on systemd), do you know if there's any way to discover or detect if systemd has been compromised, for the versions prior to 254? Does my question make sense? Yeah, it does. I never really thought about trying to figure out if it's currently being compromised. Okay. You'd have to be looking at the communication, and then understanding what key was being used, and then maybe verifying that that key isn't certifiable in the TPM to, say, the EK, or to a key you trust. So there's a way to ask the TPM, hey, is this key actually in the TPM? You certify that key to another key that you already know and trust. Most people will certify to the EK or something, but you can do it with any key as long as it meets the correct attributes. That could be a way, and then you could see, okay, this key isn't in the TPM. But it'd be tricky to do; probably better just to upgrade your systemd, if that's the attack you're worried about. Upgrade my systemd to v254 and hope that, in theory, I'll have no man in the middle? Yeah, it depends on what your attack model is here. The classic attack model is somebody on the discrete bus, but if somebody's on the discrete bus, they can probably get at your disk too. But I mean, there are all sorts of weird ways to use this session encryption; for example, we have the ability to run your TPM connections over an SSH connection. So when you're starting to cross potential trust boundaries there, it's about properly setting up your encrypted session and not relying on SSH, which is only going to get you from machine to machine, not inside the machine. Okay, thank you. You don't get to ask questions. Hey Will, latest version is 255, by the way. My
question is: which PCRs does systemd use for sealing the password when you're using this? Oh, so, okay. At the time, the PCR policy stuff wasn't there, so it was just PIN-based or nothing. Since then Lennart has been adding PCR stuff; I didn't look super closely at it, but just yesterday he uploaded a bunch of stuff for being able to tie it into PCRs. There are currently three different ways to tie it into PCRs right now; don't ask me what they are. I know the simplest one is just to say, here's a PCR policy. That gets really brittle, right, because as updates happen and your PCR policy goes out of date, you can't get to the key, and I'm not sure what he has for a fallback mechanism; maybe you could unseal with just the user auth or something and grab it, I'm not sure what he did. The newest one is essentially using signed-policy type ideas, so you can actually have a mutable PCR policy, but I literally just looked at that stuff yesterday, so I'm not really the right person to ask. Just from an attacker perspective, if I can swap out systemd because it wasn't measured into some PCR, that seems to be a pretty clear attack: I just, you know, swap out systemd to log whatever PIN the user is entering. Well, yeah, if you're swapping out things, and it's not somehow reflected in a state that's going to get caught by some policy, then yeah, you win. Cool, thanks. Yep. All right, looks like we're done. Okay, so that goes to sleep, that microphone. So that's the end of the scheduled talks. It's weird being in front of the microphone, in front of the speakers. Looks like there are no more BoF sessions. Did anyone have a BoF session? So that will be it for the Security Summit North America this year. Thanks to everyone who participated, especially thanks to the Linux Foundation, and to Jennifer and everyone, and the AV folks, and everyone who's keeping everything running here, and to the sponsors. If you are a speaker and you haven't uploaded your slides yet, please do so as soon as possible, and also feel free to grab more stickers there and just redistribute them. Okay, thanks everyone.
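To make the "supply the name of the object" idea from the seal discussion concrete: in TPM 2.0 an object's name is its name-algorithm identifier followed by that algorithm's hash of the marshalled public area, so a caller that already knows the key it expects can compute the expected name itself rather than trusting a public key read off disk. A minimal sketch, assuming SHA-256 as the name algorithm; the `public_area` bytes here are a stand-in, not a real marshalled TPMT_PUBLIC:

```python
import hashlib
import struct

TPM_ALG_SHA256 = 0x000B  # TPM_ALG_ID for SHA-256

def tpm_object_name(public_area: bytes, name_alg: int = TPM_ALG_SHA256) -> bytes:
    # Name = 2-byte nameAlg (big-endian) || H_nameAlg(marshalled TPMT_PUBLIC)
    return struct.pack(">H", name_alg) + hashlib.sha256(public_area).digest()

# A caller that knows what key to expect can compute the expected name and
# compare it with the name the software stack reports, instead of blindly
# trusting whatever public key happens to be cached on disk.
expected = tpm_object_name(b"example-marshalled-TPMT_PUBLIC")  # stand-in bytes
print(expected.hex())
```

If the computed name and the reported name differ, the key on disk is not the key the caller intended to seal against.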
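The session encryption that closes the pre-v254 man-in-the-middle window rests on the TPM's KDFa construction, an SP800-108 counter-mode KDF, to derive symmetric keys from a session's shared secret and nonces. A rough sketch, assuming HMAC-SHA-256; the secret and nonce values are made up, where a real session would use the negotiated shared secret and the caller/TPM nonces:

```python
import hashlib
import hmac
import struct

def kdfa(key: bytes, label: bytes, context_u: bytes, context_v: bytes,
         bits: int) -> bytes:
    # SP800-108 counter mode with HMAC-SHA-256, as TPM 2.0's KDFa uses it:
    # each block is HMAC(key, be32(counter) || label || 0x00 || contextU ||
    # contextV || be32(bits)), with the counter starting at 1.
    out = b""
    counter = 1
    while len(out) * 8 < bits:
        msg = (struct.pack(">I", counter) + label + b"\x00" +
               context_u + context_v + struct.pack(">I", bits))
        out += hmac.new(key, msg, hashlib.sha256).digest()
        counter += 1
    return out[: bits // 8]

# "CFB" is the label the spec uses when deriving the key+IV block for
# parameter encryption; inputs below are placeholders.
key_iv = kdfa(b"shared-secret", b"CFB", b"nonce-caller", b"nonce-tpm", 256)
print(len(key_iv))  # 32 bytes, e.g. a 128-bit AES key plus a 128-bit IV
```

Because a bus interposer never sees the shared secret, it cannot derive these keys, which is why it can DoS the session but not tamper with it.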
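The brittleness of plain PCR policies raised in the Q&A falls straight out of how a trial session accumulates the policy digest: the digest binds the exact PCR values, so any update that changes a measured PCR yields a different digest, and the key sealed under the old one becomes unreachable. A simplified sketch for a single SHA-256 bank; the command code and selection bytes follow my reading of the spec and should be treated as illustrative:

```python
import hashlib
import struct

TPM_CC_PolicyPCR = 0x0000017F  # PolicyPCR command code (illustrative value)

def policy_pcr(pcr_selection: bytes, pcr_values: list) -> bytes:
    # A trial session starts from an all-zero digest and extends it with the
    # command code, the marshalled PCR selection, and the hash of the
    # selected PCR values.
    pcr_digest = hashlib.sha256(b"".join(pcr_values)).digest()
    start = b"\x00" * hashlib.sha256().digest_size
    return hashlib.sha256(start + struct.pack(">I", TPM_CC_PolicyPCR) +
                          pcr_selection + pcr_digest).digest()

# Illustrative selection: one SHA-256 bank, PCR 7 (count=1, alg=0x000B,
# sizeofSelect=3, bitmap 0x80 0x00 0x00).
sel = b"\x00\x00\x00\x01\x00\x0b\x03\x80\x00\x00"
before_update = policy_pcr(sel, [b"\x11" * 32])
after_update = policy_pcr(sel, [b"\x22" * 32])
assert before_update != after_update  # any PCR change invalidates the policy
```

This is what the mutable, signed-policy approach works around: instead of binding to a fixed digest, the sealed object trusts any policy digest signed by an authorized key, so updates can ship a new digest without re-sealing.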