Hi everyone, I'm Mickaël Salaün. I work for the French Network and Information Security Agency. This talk is about Landlock. I talked about Landlock last year, so I will not go through all the details of the implementation, but I will focus my talk on the file system access control, which is pretty different from other LSMs. The first part is a quick recap of why Landlock, what it is, and how it works. The second part is dedicated to the file system part. This talk is about the eighth version of the patch series, which was sent in February. There is no newer series for now, but there will be one soon.

So let's start with the first part. For Landlock's design, the threat is bug exploitation or backdoors in applications, and the goal is to mitigate them. The application may be server-side or client-side. This may be, for example, a text editor or a web server. Here we want to protect the user of the application against unintended accesses from these applications to the user's resources. So really, we want to slow down an attack.

There are several important features provided by Landlock. The first one is to empower any application developer to create a tailored security policy for their needs. Some use cases: this may be used to create the security model which best fits your application. It is also useful to embed the security in your application. This way, when you update your application, it is easier to update the security which is tied to it as well. And then, for example, it may be useful for the user to have only one configuration file which integrates the features of your application and some security properties too.

Another important feature is the ability to compose different access controls on the same system. For example, on an end user computer, you may find a system administrator who wants to enforce some security policy. Also, one or multiple end users who may want to isolate their activities.
And also, well, the applications the user is using, which may want to sandbox themselves to be more secure, which is, for example, the case for some web browsers. It may also be useful for multiple cloud services which may have multiple clients.

Another important feature is the ability to update the access control on the fly if it is deemed necessary. This may be really useful, for example, to implement some powerbox support in your application. A powerbox is a way for a sandboxed application to access resources outside of the sandbox. To an end user, it may look like a file picker, for example. You may find such a powerbox in macOS, Android, or even on Linux with Flatpak or Snappy, where it may have other names like portals. Another example may be to update a security policy according to some external factors, like user behavior or application behavior changes, or the time, for example office hours or things like that.

Let's first start with a simple demonstration. Here we are sandboxing a web server. There are multiple paths which should be accessible in a read-only way and some which need to be accessible in a read and write way. For example, /public, which contains most of the web files, /etc, and /usr should be read-only, and /tmp should be accessible in a read and write way. Here I will not launch a web server, but instead we launch a shell, which is easier for the demonstration. So we are in the /public directory. You can see there are multiple files in here, not really web ones, but now there's one web file, because I created an index.html file. Then I will launch a user space application, which is a sandbox helper, which takes multiple arguments. The first one is a list of paths which should be accessible in a read-only way. And the second one is a list of paths which should be accessible in a read and write way.
The helper here is called landlock1, and it will launch /bin/bash. Now we are in the sandboxed process. So we are in the shell, but in the sandbox. We can still list the files in the same directory. However, you may notice that some accesses are denied. For example, the .. directory is denied, which is, in fact, the / directory. And of course, we cannot write to the file now. But we can still go to the /tmp directory, see what's inside, and create some files. And still, the / directory is not accessible. We cannot do a stat on it. And we cannot go into it either, or even list some other properties.

So how does it work? In a nutshell, this diagram gives you the intuition. Basically, it's pretty similar to how seccomp works, how seccomp can apply a security policy. First, the process which wants to sandbox itself creates a security policy and loads it in the kernel. This policy is a set of Landlock programs, which may be triggered for some specific actions and can only restrict the process which loaded them. That's the first simple example. So if this process wants to call the open syscall to access a file, this set of programs will take a look at this request and allow or deny the access.

Landlock is built on multiple important parts of the Linux kernel. The first one is the Linux Security Module framework, which provides a way to implement kernel code dedicated to enforcing a security policy on user space. There are a lot of security hooks, more than 200. The second big part of Landlock is the use of eBPF. The extended Berkeley Packet Filter is an in-kernel virtual machine, which is dedicated to safely run, well, interpret, bytecode in the kernel at runtime. You can load it and unload it. Nowadays it is mainly used in the network part of the kernel, but also in the tracing part. And some other projects are coming.
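
The sandboxing model described above, where a process loads a set of programs that can only restrict it further, can be illustrated with a toy model. This is a minimal sketch in plain Python, not the Landlock API: the `Sandbox` class and its `add_program` and `is_allowed` methods are hypothetical names, and real Landlock programs are eBPF programs run by the kernel.

```python
from typing import Callable, List

# A "program" here is just a function: action name -> True to allow.
# It is a hypothetical stand-in for an eBPF program attached to a hook.
Program = Callable[[str], bool]


class Sandbox:
    """Toy model of a process with stacked restriction programs."""

    def __init__(self) -> None:
        self._programs: List[Program] = []

    def add_program(self, program: Program) -> None:
        # Programs are only ever appended: a new layer can restrict the
        # process further, but it cannot give back what another layer denies.
        self._programs.append(program)

    def is_allowed(self, action: str) -> bool:
        # An access goes through only if every stacked program allows it.
        return all(program(action) for program in self._programs)


def deny_write(action: str) -> bool:
    """Example program: allow everything except writes."""
    return action != "write"


def allow_everything(action: str) -> bool:
    """Example program that never denies; stacking it changes nothing."""
    return True
```

With this model, stacking `allow_everything` on top of `deny_write` still denies writes, which is the property that makes it safe to let a process add its own layers.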
Two really important properties of the eBPF virtual machine are that you can call dedicated functions in the kernel, which are dedicated to one type of program, and that there is a way to exchange information between two eBPF programs, or between one eBPF program and a user space process. So this is a kind of new IPC.

Here, Landlock brings a set of hooks, which are dedicated to a set of actions on specific kernel objects, for example files. There is also a set of programs, which are in fact eBPF programs, but dedicated to Landlock. These programs can be stacked on the Landlock hooks. And they are interpreted, triggered, when the process asks for a specific action, for example a read, a write, or a similar action. So really, Landlock is a new layer of security. It is not meant to replace any LSM. The goal is to provide a new way to enforce security and to secure your application ecosystem. So it should be used on top of other security modules.

A really important part of Landlock is the ability to be used by unprivileged processes. This is quite challenging because it is not the case for other LSMs. There are two main challenges here. The first one is to protect resources from the applications which are sandboxed. For this, a process which wants to sandbox an application needs to be able to ptrace this process. This means that it is not a threat if the requesting process impersonates the sandboxed process, because it is already allowed to. But there is not only a need to protect user space. There is also the need to protect the kernel, and especially to prevent information leaks. In fact, an eBPF program should not have access to information which is not otherwise accessible to the process requesting the sandboxing. Otherwise, we would have a privilege escalation.
Another important aspect is to avoid side channels, which may be avoided, for example, by only interpreting eBPF programs on objects which are visible to the process which requested the sandboxing, and, of course, after the other LSMs. It is a kind of discretionary access control, but not really, because it is implemented by the developer. Another important aspect is the need to be able to account for the kernel resources which are used by these new access controls.

So now let's take a look at the second part, which is dedicated to the file system. Why and how is the file system access control different between Landlock and other LSMs? First, there are two ways to enforce an access control on the file system. You may use extended attributes, which are a way to tag, to label files, or you may use only paths.

Let's see the first way. Extended attributes, which are metadata attached to each file, are really interesting because they are native to the kernel, easily accessible, and efficient. But for Landlock, there are some drawbacks. First, it is not possible to use extended attributes to achieve composability, that is, to implement different security policies and run them side by side, because there is only one label per file, which means there is mainly one view of the file system by the kernel. For example, if you use hard links, bind mounts, or namespaces, in containers for example, the file, which may have different paths, will only have one inode, and then only one label. For the unprivileged part, we need to be able to account for which process loaded which sandbox security policy. Also, if you want to use extended attributes, the file system you are using needs to support them, which is not the case for every file system. And last but not least, if you want to label a file, you need to be able to write on this file system, which is not a good thing for untrusted users.
You don't want a user to be able to write anything on the file system, of course. And for the dynamic part, you may not want to impose persistent labeling on the file system, but may prefer to label on the fly.

Now the file path, the other way to enforce an access control in the kernel. First, it is really interesting because it is the point of view of the user. So it really reflects the view of what you want to express in an access control. But for Landlock, there are some drawbacks. First, the composability. For every file access, we need to remember how this file was accessed, because you may use bind mounts, namespaces, and multiple hard links. There are multiple ways to access a file, so multiple paths for a file. And for the unprivileged part, we need to deal with some underlying inode stuff, which may be tricky, like, for example, accessing a file with a partial path. For example, if you are using the openat syscall, you get a file descriptor, and then you add a part, a relative file path. You may also use anonymous inodes, chroot, or namespaces. So it can be tricky to implement an access control with all these constraints. Of course, there is a risk of leaking path information because you cannot assume that the sandboxing process is trusted. You don't want this process to be able to gather more information than it should normally have access to, for example the depth of a file or some underlying directories.

So the idea to solve that is to create a new eBPF map, which is called the inode map. You may see a map as a hash map. Most of the time, it is a kind of array with multiple entries, each with a key and a value. The idea here is to create a dedicated map to be able to identify an inode and tie this inode to an arbitrary value, like a label. In practice, this map is filled with file descriptors.
But in the map, the file descriptor is not stored, only the inode which is referenced by this file descriptor. This way, it is easy to fill the map. But still, to fill a map, you first need to have access to this file, to the file descriptor. Because we deal with inodes, it is quite efficient to check whether an access matches a known inode or not. Because it is an eBPF map, it can be updated by user space on the fly if user space wants to keep the map open. Otherwise, it is easy to lock the map. And of course, it is usable by unprivileged processes because it is, again, an eBPF map feature. So here, we achieve a way to identify an inode without storing any information on the file system. But still, we can account for how much memory it takes and which process requested this memory. And as I said before, we are now able to tag any inode. But this is not tagging the file system, only the files in memory. The tagging only happens in memory.

So now, let's see another demonstration, which is about updating an access control on the fly. Here, there are two shells. The first one is the one which will be sandboxed. And at the bottom, you can see another shell, which will be used to update the sandbox on the fly. First, I use a file system which is dedicated to eBPF, to pin either an eBPF program or an eBPF map. But there are also other ways to do it. Then, I run almost the same sandboxer we have seen before. So there is a list of paths which are accessible in a read-only way and a list of paths which are accessible in a read and write way. But there is also a new argument, which says where we should pin the eBPF map. And then we run the shell.

So now, we are in the sandbox. And you can see that we can still access the files. But you can also see that some stuff here is not really great, because there is no mapping between UID and user. So do you know why? We cannot see root here, and we only see UID 0, because something is missing.
In fact, it is the same for the prompt. There is no username, because two files are missing: /etc/passwd and /etc/group. So we can add them on the fly, while the sandbox is running. Here, I launch another helper, whose sole purpose is to update the Landlock inode map, and here, to add two paths. So now, without relaunching the sandbox, the new accesses are granted. But of course, if you want to have a nice prompt, you need to exec bash again, because it is a shell limitation.

OK, so let's see how it works. There are two types of Landlock programs for now. The first one is dedicated to walking through the file system, to walking through directories. And the other type of program is dedicated to allowing or denying a specific action, a read or a write. So the first one here is dedicated to going through the directories and identifying a path. You can see it as a state machine. This program, the fs_walk program, can then pass a state to another one. If it is a file access, here it is an fs_pick program. The first one is triggered for open, chdir, and getattr, so mainly read-only accesses. And the other one, which is also an fs_pick, can be chained to the two previous ones. This third program will only be triggered for specific write actions.

So let's see an example. We have here at the bottom an eBPF map, which holds three files, three inodes, /etc, /public, and /tmp, each with a number as a value, which is a kind of tag. If we are walking through /public/web/index.html, the first file which will be seen by the eBPF program is the / directory. So here is the first invocation, the first interpretation, of the fs_walk program. But because / is not in the map, it is not known. So nothing happens. Then, when we go through the path, the /public directory is seen. And this directory is present in the eBPF map. So it matches.
And then the fs_walk program can tag it and change its state. Then it can pass its state through a variable in the eBPF context, called here the cookie. This example is a really simple one, and it only saves the depth of the path. Then there is the web directory. Again, web is not referenced by the map. But fs_walk knows that one of its parent directories was seen before. So we are still in this file hierarchy. And then finally, we reach index.html, the final target. The fs_pick program can look at the state of this chain of programs, and if it is not zero, it can accept it. So again, it is a really simple example to illustrate the way to identify a file. And then this program can allow the access.

This way to identify a path has many advantages for Landlock. The first one is to be completely agnostic to chroot and all namespaces. There is no need for extra information which is not already available to the requesting process. It is easy to account for how many resources are used. It is updatable on the fly. It does not rely on string matching, which may introduce a lot of security issues. But we can still detect file hierarchies. And there are multiple ways to do it. Here was only an example, a simple one. And also, because it is fully unprivileged and fully in user space, it is quite easy to test this kind of security policy.

But there are some drawbacks. The main one is that this path identification relies on the way the kernel does path name lookup. So mainly, how does it resolve a symlink, or the . and .. directories? And also, I needed to add a security blob to nameidata, which is used to record which path walk we are in. So there is some concern from some file system developers, because this might rely too much on the current path name lookup implementation, which changed multiple times. But this seems to be quite stable now. You can take a look at some headers in the fs directory.
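
The walk through /public/web/index.html described above can be replayed with a small simulation. This is a minimal sketch in plain Python under stated assumptions: the inode numbers in `FAKE_FS` are made up, the function names only mirror the program types named in the talk, the final path component goes through the same state update as the directories, and symlinks, `.`, and `..` are not modeled. Path strings are used only to look up the fake inode numbers; the matching itself is done on inodes.

```python
# Made-up inode numbers standing in for real kernel inodes.
FAKE_FS = {
    "/": 2,
    "/etc": 10,
    "/public": 11,
    "/tmp": 12,
    "/home": 13,
    "/public/web": 20,
    "/public/web/index.html": 21,
    "/home/notes.txt": 30,
}

# Toy inode map: inode number -> arbitrary tag (/etc, /public, /tmp).
INODE_MAP = {10: 1, 11: 1, 12: 1}


def fs_walk(cookie: int, inode: int) -> int:
    """Update the state (the cookie) for one path component."""
    if inode in INODE_MAP:
        return 1           # entered a known hierarchy: depth 1
    if cookie != 0:
        return cookie + 1  # still below a known inode: one level deeper
    return 0               # nothing known has been seen so far


def fs_pick(cookie: int) -> bool:
    """Allow the access only if the walk went through a known inode."""
    return cookie != 0


def check_access(path: str) -> bool:
    """Replay the walk from / down to the target, then take the decision."""
    cookie = fs_walk(0, FAKE_FS["/"])
    current = ""
    for part in (p for p in path.split("/") if p):
        current += "/" + part
        cookie = fs_walk(cookie, FAKE_FS[current])
    return fs_pick(cookie)
```

In this toy model, `check_access("/public/web/index.html")` is allowed because the cookie becomes non-zero at /public, while `check_access("/home/notes.txt")` is denied because no component of that path is in the map.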
And well, I think this logic may still change. But right now, it is already visible to user space, and especially to discretionary access control and also to LSMs, which may indirectly rely on the way paths are resolved. And of course, the user-defined queries.

So to wrap up, Landlock is a user space hardening mechanism, which is a programmatic way to create security policies and embed them in your applications. And it is designed to allow unprivileged processes to use it. This way, you don't have to have setuid binaries and so on. Currently, there are around 2,000 source lines of code, which is not much. There are ongoing patches, so you can follow them on the LKML or on my Twitter account. The main concern right now, I think, is the path name lookup. But one good thing which is coming, hopefully, is a way to stack multiple LSMs, which, of course, will be really useful for Landlock, because distros which use SELinux, AppArmor, or other LSMs will then be able to use Landlock as well.

There are multiple pieces of future work, just to cite some of them. The audit support, to be able to have at least minimal audit support. But this may be a bit tricky. Of course, to extend the access control to multiple subsystems, like the network and IPC. Maybe to create real capabilities for Landlock. And of course, to have a decent library and tools to implement security policies easily. And of course, all this will not be possible without kernel developer reviews. So if you want to take a look at Landlock, please do it. Thank you very much. And if you have any questions, I'd be pleased to answer them now.

Thanks. Do you have an example of a Landlock program that you could show us? Like what the policy looks like? Or is it a binary policy?

I can show you an example. So it is a C example. This is one. OK. So I need to go quickly. But mainly, the first struct here is a way to describe some properties of a Landlock program, an eBPF program.
First, I need to describe the type of this program, which is, for example, an fs_pick program. Then some options. I will not go too deeply into that. Then the chaining: which program was chained before? Like we saw in the example, you may have a chain of different programs. And then some triggers: you want this program to be triggered for append, create, and similar actions. The main program may look like this, but there are multiple other lines, which are in the update cookie function. You can take a look at the code. For example, take a look at landlock.io, and you will see the real code.

But let's say this program is the one that allows or denies access for a write operation. It takes the value of the cookie, like we saw before in the chaining operation. It updates the cookie. If the cookie is 0, it will stay 0, except if the inode, which is here in the context argument, is present in the BPF map. Otherwise, it will increase the cookie. It is a way to identify the file path. Then in this example, I add a mark to the cookie to say, OK, I saw this inode, and it was in the set of my inodes which are allowed to be accessible in a write way. And if it is the case, I return allow. Otherwise, it is deny. So that's basically the way you can write a policy right now. This may not be convenient, but it is really a programmatic way to do it. Of course, you may think about an easier way to write a policy and still have some programmatic way to do some tailored and custom security policy.

And then a follow-up question to that. I seem to have understood that you said that you can't match on file descriptors specifically, only on inodes. Is that right?

No. The map stores references to inodes. So if a process wants to access a file descriptor, the Landlock program can see the underlying inode. So it works. It is an intended goal.
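
The write-decision program described in the first answer above (a trigger set, a cookie update, and a mark for inodes allowed in a write way) can be sketched as a toy model. This is plain Python under stated assumptions, not the code from the patch series: the `Trigger` flags, the `Program` dataclass, `WRITE_MARK`, the tag values, and the inode numbers are all hypothetical, and chaining is flattened into a simple loop over two programs.

```python
from dataclasses import dataclass
from enum import IntFlag
from typing import Callable, Iterable


class Trigger(IntFlag):
    """Hypothetical action flags a program can ask to be triggered for."""
    OPEN = 1
    GETATTR = 2
    APPEND = 4
    CREATE = 8
    WRITE = 16


TAG_RO, TAG_RW = 1, 2
# Toy inode map with made-up inode numbers: inode -> tag.
INODE_MAP = {10: TAG_RO, 11: TAG_RO, 12: TAG_RW}
WRITE_MARK = 1 << 31  # high bit of the cookie: "below a writable inode"


def update_cookie(cookie: int, inode: int) -> int:
    """Stay at 0 until a known inode is seen, then count the depth."""
    tag = INODE_MAP.get(inode)
    if tag is not None:
        return 1 | (WRITE_MARK if tag == TAG_RW else 0)
    if cookie != 0:
        return cookie + 1  # the mark bit is kept while the depth grows
    return 0


@dataclass(frozen=True)
class Program:
    kind: str                      # program type, e.g. "fs_pick"
    triggers: Trigger              # actions this program is run for
    decide: Callable[[int], bool]  # cookie -> True to allow


PICK_READ = Program("fs_pick", Trigger.OPEN | Trigger.GETATTR,
                    lambda cookie: cookie != 0)
PICK_WRITE = Program("fs_pick",
                     Trigger.APPEND | Trigger.CREATE | Trigger.WRITE,
                     lambda cookie: bool(cookie & WRITE_MARK))


def is_allowed(action: Trigger, inodes_on_path: Iterable[int]) -> bool:
    """Walk the inodes of a path, then run the programs the action triggers."""
    cookie = 0
    for inode in inodes_on_path:
        cookie = update_cookie(cookie, inode)
    for program in (PICK_READ, PICK_WRITE):
        if action & program.triggers and not program.decide(cookie):
            return False
    return True
```

For example, with inode 11 tagged read-only and inode 12 tagged read-write, `is_allowed(Trigger.WRITE, [2, 11, 20, 21])` is denied while `is_allowed(Trigger.CREATE, [2, 12, 40])` is allowed.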
But what I was talking about is when you want to fill a map with inode references. You just fill the map with the bpf syscall. And as an argument, you use a file descriptor, which references the inode you want to put in the map. It is really a simple way and a unique way to fill a map. That's it. So you just need a single reference to the inode.

It's the opposite of what I asked. OK.

OK, just one more question here. So other LSMs also use extended attributes. Does your use of them collide in any way?

Could you repeat, please?

Other LSMs use extended attributes. Does your use collide with those at all, or how do they work?

No, well, I don't use extended attributes. I only use the blob for the struct inode. So for this to be usable with other LSMs, I need a way to stack Landlock with SELinux or Smack, for example. Right now, it is not possible in a clean way. But with the LSM stacking patch series, it will be possible. So I hope it will be upstream soon.

OK, we'll move on to the subsystem updates now. Thanks.

Thank you.
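
The map-filling step described in the answers above, where a file descriptor is passed in and only the inode it references is recorded, can be mimicked in user space. This is a minimal sketch in plain Python, assuming a Linux file system that supports hard links in the temporary directory; the `InodeMap` class and its methods are hypothetical, it uses `os.fstat` rather than the bpf syscall, and unlike a kernel map it holds no reference on the inode, so it is only meaningful while the files exist.

```python
import os
import tempfile


class InodeMap:
    """Toy map keyed by inode identity instead of by path or descriptor."""

    def __init__(self):
        self._tags = {}

    def add_fd(self, fd, tag):
        # Only the identity of the inode behind the descriptor is stored,
        # never the descriptor itself.
        st = os.fstat(fd)
        self._tags[(st.st_dev, st.st_ino)] = tag

    def tag_for_path(self, path):
        # Demo-only lookup: resolve the path to its inode, then match on it.
        st = os.stat(path)
        return self._tags.get((st.st_dev, st.st_ino))


def demo():
    """Tag a file through a descriptor, then look it up through a hard link."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "index.html")
        alias = os.path.join(tmp, "alias.html")
        other = os.path.join(tmp, "other.txt")
        for path in (original, other):
            with open(path, "w") as handle:
                handle.write("x")
        os.link(original, alias)  # second path, same inode

        inode_map = InodeMap()
        fd = os.open(original, os.O_RDONLY)
        try:
            inode_map.add_fd(fd, tag=1)
        finally:
            os.close(fd)  # the map entry does not depend on the descriptor

        return inode_map.tag_for_path(alias), inode_map.tag_for_path(other)
```

Calling `demo()` tags the file once and then finds the same tag through the hard link, while the unrelated file has no tag, which is why a single reference to the inode is enough regardless of the path used to reach it.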