Hello, everyone. My name is Mickaël Salaün. I'm a senior software engineer at Microsoft, and I'm the maintainer of Landlock, which is a new Linux security module that was introduced in mainline last year, in the 5.13 kernel. This talk is about the updates since then, which are mostly some improvements to the file access control types and some ongoing work to support network access control types.
Some important news, not directly related to the kernel but useful for everyone, is that Landlock is now enabled by default in some Linux distros: for example Ubuntu, Fedora, Arch Linux, Alpine Linux, and Gentoo in its default configuration, and others are ongoing. For example, Chrome OS is working on supporting Landlock too.
Just to give a bit of context: at first, Landlock was a minimum viable product. The initial version, which was merged in the 5.13 kernel, only contained a subset of what Landlock could do. That was required, of course, to minimize the code, the review, the documentation, and the tests, which were already big enough. So Landlock is kind of a journey; it is an incremental development. In the first part, I will try to explain some limitations that were there, or that are still there, and how we can lift them. And in the second part, I will talk about the ongoing work to support network access control.
But first, let's recap a bit. What is sandboxing? Maybe not everyone is on the same page. Sandboxing, especially security sandboxing, is a security approach to isolate software components from the rest of the system. A component might be an application, a container, or a set of processes. So, nowadays, we know what isolation means: lockdown, stuff like that.
But concerning application processes, we should keep in mind that even a trusted application, a trusted process, can become malicious over time. That's a really important point. Even if you develop your application and control it, and the user trusts you and your application, it may change its behavior and become malicious because of bugs exploited by attackers. That's why there is sandboxing.
The two main properties of sandboxing are, first, to follow the least privilege principle, which means not requiring more privileges in order to drop accesses that you already have. In a nutshell: do not use setuid binaries. The other important property follows from the fact that, because these security features should be accessible to everyone and anyone, they can also be used by attackers. So being able to enforce a security policy should be innocuous for the rest of the system. We should also be able to compose different security policies, because you have multiple applications, and each of them could enforce a security policy dedicated to its use case. So, from the kernel point of view, it is really a composition of different and independent security policies that may be loaded over time, during the lifetime of the system.
And here, what is Landlock? Landlock is a mandatory access control system which is kind of special, because it is not dedicated to the system administrator or the Linux distro maintainers. Landlock is a set of features which are used through three syscalls. It is dedicated to app developers at first, but of course it can be used by users, by system administrators, and so on. It makes it possible to add built-in sandboxing to applications, which helps to follow the logic, the behavior of the application.
For example, you may configure a web server which accesses a directory containing HTML files. It is legitimate for this server to read all the files in that directory, but it may not be legitimate for it to read whatever secrets may be in the /home directory, for example. Another important point is that Landlock can be used for sandboxing your own application, but we should also consider third-party components. This might include libraries, but also services, which may or may not be as trusted as your own developments. We have some solutions for a secure supply chain, but sandboxing is another layer of security.
Let's go to the first part of the talk: lifting the file reparenting limitation. This is mostly about the rename and link syscalls. In a nutshell, rename is used to move a file, and link to create new paths pointing to the same content. Initially, the file system access types supported by Landlock were mainly to control execution, reading and writing a file, listing a directory or removing files from it, and creating files according to their type: for example, creating a regular file, a named pipe, and so on.
To be able to enforce this, specifically for an unprivileged access control system, we need to be able to identify file hierarchies. That was the first challenge, because we cannot rely on extended attributes on files. We may not be able to rely on paths either, because a sandboxed process may be executing in a specific namespace, so it may not be aware of the full absolute path of a specific hierarchy. And it may not be able to write to a file or to a file system. That's the reason why Landlock uses a kind of ephemeral inode tagging. On the fly, an application wishing to sandbox itself can identify a set of files, a set of inodes in fact, and put some restrictions on them, or to be more correct, some exceptions.
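
The idea of tagging inodes instead of paths can be illustrated with a small user-space model. This is only a sketch in Python, not how the kernel implements Landlock: the `Sandbox` class, the `READ` and `WRITE` bits, and the `site`/`www` directory names are all hypothetical, and the check simply denies by default and collects the exceptions attached to each inode met while walking up from the target.

```python
import os
import shutil
import tempfile

# Hypothetical access bits for this sketch; they are not the kernel's constants.
READ, WRITE = 1, 2


def inode_key(path):
    """Identify a file by (device, inode number) rather than by its path."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


class Sandbox:
    """Deny by default; every rule is an exception attached to an inode."""

    def __init__(self):
        self.rules = {}  # (device, inode) -> allowed access bits

    def allow(self, path, access):
        key = inode_key(path)
        self.rules[key] = self.rules.get(key, 0) | access

    def is_allowed(self, path, requested):
        # Walk from the target up to the root and collect the rights
        # granted by every tagged inode met on the way.
        granted = 0
        current = os.path.realpath(path)
        while True:
            granted |= self.rules.get(inode_key(current), 0)
            parent = os.path.dirname(current)
            if parent == current:
                break
            current = parent
        return (granted & requested) == requested


def demo():
    root = tempfile.mkdtemp()
    try:
        site = os.path.join(root, "site")
        os.mkdir(site)
        page = os.path.join(site, "index.html")
        with open(page, "w") as f:
            f.write("<p>hello</p>")

        sandbox = Sandbox()
        sandbox.allow(site, READ)  # the only exception: read under site/

        can_read = sandbox.is_allowed(page, READ)
        can_write = sandbox.is_allowed(page, WRITE)

        # Renaming the directory changes its path but not its inode,
        # so the tag still applies through the new path.
        moved = os.path.join(root, "www")
        os.rename(site, moved)
        still_read = sandbox.is_allowed(os.path.join(moved, "index.html"), READ)
        return can_read, can_write, still_read
    finally:
        shutil.rmtree(root)


print(demo())
```

In this model, reading the page is allowed, writing is denied, and the read rule keeps applying after the directory is renamed, because the rule is tied to the inode and not to the path string.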
We should also keep in mind that there are multiple independent security policies running on the same system, so each of them may identify the same file hierarchies, and you should be able to identify these hierarchies, but only for a specific set of processes.
Let's start with an example of file system policy composition, which might make this clearer. Landlock makes it possible to have multiple security policies at the same time for different applications, but it also makes it possible to create nested sandboxes. For example, you may have a system service that launches and sandboxes itself. Let's say it is an SSH server: after a user connects to this server, a new process might be spawned, and new restrictions could be added to this specific process. This way, you can have multiple layers of security policies.
The first layer here is quite a generic security policy, which mostly restricts access to the /dev directory, the user's home directory, and some directories like /tmp and /var that may contain cache files or temporary files.
The second layer applies, for example, when a picture viewer application is launched. The developer might think: my application only needs to access some cache files, which are in .cache/app in the home directory, and it might access some of its configuration files in a read and write way, because you might want to change the configuration through the application. And because it is an application dedicated to displaying pictures, you know that it should be able to read some files, some pictures, in the pictures directory. So that's the second layer.
Once the user wants to open a specific file, this picture viewer could create another layer of security. At this point, the application knows that only a specific file, here the cool.jpg file, should be accessed, and in a read-only way.
But the cache directory might also be useful, to store some parsing results for this picture. The configuration might not be required anymore, and the other accesses are the same. So here we have three layers.
Now let's see how the kernel identifies whether an access request is legitimate or not. First, it looks at the target file, which in this case is the cool.jpg file. The third layer allows this file to be read, so it is okay for the third layer, but two layers remain. So the kernel continues the walk to the parent directory, which is here the pictures directory. The second layer allows this directory to be read, so that's okay: that's the second check. At the third check, going to the parent directory again, the kernel finds that the home directory is in fact allowed to be read too. It is also allowed to be written, but only by the first layer, and that's not requested, so that's out of scope anyway. The read request is legitimate for all three layers, so it is allowed, and you can view a cat picture. That's good.
Okay, but the thing is, Landlock relies on file hierarchies to identify files and to map accesses to these sets of files. It also means that we may not be safe while modifying these file hierarchies. If you identify something with a specific path, but you rename this path or move the directory, that might change things. That's especially the case for renaming and linking. And that's one of the most annoying limitations, which prevented, for example, generic containers from using Landlock as is, well, until now.
So let's see an example of what could go wrong here. When I have explained the current rules, I will ask a question. Here is a sandbox with three rules. The first one allows read access to the home directory. The second rule allows write access to the work directory.
So the work directory can be read and written, because the parent of the work directory is the user's home directory. And the third rule allows files in the tools directory to be executed, so it may contain a set of scripts and things like that. What could go wrong if the user wants to link the tools/foo file to the work/foo file? Does anyone have an idea?
Yeah, that's correct. The answer was that it is a way to gain more privileges, so it's kind of a privilege escalation. Accessing the foo file through the tools directory only allows execute and read. Accessing the foo file through the work directory only allows read and write. Except that the underlying inode and the underlying data are the same. So this is a kind of mirror, which gives on one side the ability to read and execute a file, and on the other the ability to write to it. If the link were allowed, it would make it possible to read, write, and execute the same content, which might not be what we want for this use case.
Now enters a new access control type, which is called LANDLOCK_ACCESS_FS_REFER: to refer to something, to refer to an item. To be able to link or rename a file, it is now required to have this access right on the hierarchies of both the source and the destination. But that is not enough to be able to link or rename a file. Of course, you should also be allowed to write to the destination directory, and potentially to remove a file from the source directory in the case of a rename. And last but not least, we need to be sure that this will not lead to a kind of privilege escalation. So the kernel checks that the destination directory will not give more access rights to the file than those already allowed by the source directory.
Okay, so let's take almost the same rule set and see how we could allow this workflow. It's kind of a strange policy, but it serves the example. We still have read access to the home directory, but we also have the new access type.
That is, the refer access right, LANDLOCK_ACCESS_FS_REFER, on the home directory. We still have write access to the work directory, and the third rule still gives access to the tools directory in an execute way, but also in a write way, which is kind of weird, because you can put whatever you want in this directory and execute it.
So let's look at which checks are performed to link a file from a source to a destination. Here we can see that the tools/foo file inherits the read, execute, and write accesses, and the work/foo file only inherits the read and write accesses. So the destination rights are a subset of the rights allowed on the source file.
Now, how does this work from the kernel point of view? The kernel first walks the file hierarchy, like it does for a common access request. It starts with the source path, which is here the tools/foo file, and looks for some access rights. Here, again, this is only for one layer, but keep in mind that there are potentially multiple layers. We go to the parent directory, which is the tools directory, and there, as you can see at the bottom left, we collect the write and execute access rights. Then we continue to the parent directory and collect the read and refer access rights. And we continue until the first mount point.
Do you have an idea why we can stop at this mount point? What is special about mount points for renames and links? Yeah: you cannot create a hard link, or rename a file from one tree to another, across two different mount points. So that's kind of a small optimization, but it's interesting anyway.
So here we are. For the source path, we collected four access rights. Now the kernel walks the destination path and collects the write access right, then the read and refer access rights, and then stops at the mount point. So that's it.
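
The walk just described can be modeled in a few lines of Python. This is a simplified, single-layer sketch and not the kernel code: the `/home/user` paths, the string-valued rights, and the `collect_rights` and `may_link` helpers are assumptions made for illustration, the mount point is assumed to be `/`, and "write" stands in for the more specific rights the kernel checks on the destination directory.

```python
from pathlib import PurePosixPath

# Hypothetical names for the access rights used in the talk's example.
READ, WRITE, EXECUTE, REFER = "read", "write", "execute", "refer"


def collect_rights(path, rules, mount_point="/"):
    """Walk from `path` up to `mount_point`, collecting the rights of each rule met."""
    rights = set()
    current = PurePosixPath(path)
    stop = PurePosixPath(mount_point)
    while True:
        rights |= rules.get(str(current), set())
        # Links and renames cannot cross mount points, so the walk stops there.
        if current == stop or current == current.parent:
            break
        current = current.parent
    return rights


def may_link(src, dst, rules, mount_point="/"):
    """Single-layer model of the checks described for linking src to dst."""
    src_rights = collect_rights(src, rules, mount_point)
    dst_rights = collect_rights(dst, rules, mount_point)
    # The refer right is needed on both the source and destination hierarchies.
    if REFER not in src_rights or REFER not in dst_rights:
        return False
    # The destination directory must be writable.
    if WRITE not in dst_rights:
        return False
    # The destination must not grant more than the source already allows.
    return dst_rights <= src_rights


# Rule set from the second example: refer on home, write also allowed on tools.
talk_rules = {
    "/home/user": {READ, REFER},
    "/home/user/work": {WRITE},
    "/home/user/tools": {EXECUTE, WRITE},
}
# Variant where tools is execute-only: linking would add write access.
strict_rules = {**talk_rules, "/home/user/tools": {EXECUTE}}

src = "/home/user/tools/foo"
dst = "/home/user/work/foo"
print(sorted(collect_rights(src, talk_rules)))
print(may_link(src, dst, talk_rules))
print(may_link(src, dst, strict_rules))
```

With the rule set from the example, the source walk collects four rights and the destination three, so the link is allowed in this model. In the execute-only variant, the destination would add write access that the source does not have, so the same request is refused.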
We can see that the destination access rights are indeed a subset of the source access rights. So that's okay: we can link the file, or even rename it.
This path walk was also a reason why we needed to limit the number of layers. Right now, the maximum number of nested sandboxes is 16. I think it's enough for most use cases, maybe not all, but that's pretty good. Again, we need to take into account that this kind of path walk collects all the access rights for all the layers, so 16 layers at most.
One other thing to keep in mind is that when you rename a file, you may get an EACCES error code, which means that permission is denied, or an EXDEV error code, which is what you get when you want to link a file from one mount point to another, which is denied by the kernel. This is interesting because if you want to move a file from one place to another with the mv command, and the destination is denied, then even creating a new file will be denied. So when the denial comes from Landlock, which performs such a check, it returns an EXDEV error code in the case where creating a new file would be allowed and removing the source file would also be allowed. We'll see an example just after.
We also introduced a new ABI version, which is a version specific to Landlock. Landlock enables user space to get some information which is important to enforce a best-effort security approach. This works simply by calling a syscall, landlock_create_ruleset, with a specific flag, LANDLOCK_CREATE_RULESET_VERSION. This way, an application can get and understand the features which are supported by the running kernel. This is required because we are in a sandboxing context and we want to enforce as much as we can, while application developers don't know in advance everything that the running kernel can do.
Okay, let's quickly go to the last part, the networking part, which is actually developed by Konstantin.
I think he's in the chat. In a nutshell, the idea here is still to be able to restrict a process and to protect the processes outside of it. So it's not a system-wide firewall. We want to be able to control the what: what an application can access. In practice, that means TCP ports or UDP ports, for example. The who, the IPs, might be more difficult, so that's not in scope for now. And, well, that may change, so that may not be really relevant for a sandbox anyway.
This is in review: the sixth version was sent some days ago. In a nutshell, there are two new access rights, TCP connect and TCP bind. You can create a rule with one of these access rights and a port, to allow connecting to a specific port or binding to a specific port. There are some questions which aren't solved yet, so if you have any ideas or wishes, we can discuss them. Here's a list of some questions that might come up later.
Okay, let's switch to the demo. I hope you can see well enough. We are in a virtual machine with a new kernel, with the network support. We have one sandboxer, and I also copied an old sandboxer to show the difference. Here I will launch a shell, bash, with the initial version of Landlock, so without the FS refer access right. Here we are in a new sandbox. You cannot read your directory, but you can go to /tmp, read what's inside, and create files. Good.
Now let's say you want to move this file. For example, you create a directory and move a to x/a. But before that, let's look at the inode, the effective inode number of this file. You can see it here. And if you move a to x/a, this works. Great. But I told you before that it was denied. In fact, the rename is denied with EXDEV. So the mv application knows that it cannot rename or link the file.
So what this application does is that it copies the content of the file: it creates a new file and removes the old one. You can see that by looking at the inode of the new file, which is different. So it's not the same file, although it has the same content. It works, but it may not be as efficient.
Let's do the same thing with the new sandboxer. We go to the same directory. It's still the same file, so the same inode. Now let's move this file here. And you can see that it is indeed the same inode. So it's not a magic trick: it's really the same file, which was renamed properly.
This new version of the sandboxer was also launched with new rules that you can see here: LL_TCP_BIND and LL_TCP_CONNECT. This means that this new sandboxed shell can only bind to port 2000 and only connect to port 3000. We can check that quickly. Let's say I listen here on port 2000. At the top, I'm outside the sandbox, and at the bottom, I'm in the sandbox. If I want to connect to this port, it is denied, because this port is only allowed for bind, not for connect. It has to be the other port. So let's listen on that port: now this works. And we can do the same for bind. Binding to this port works, but if you want to bind to another port, it will be denied. And if you want to connect to something else, it will be denied too. Okay, that was it.
So what's next? The roadmap in the short term is to add new audit features, to be able to easily debug sandboxes, and also to add new access control types, for networking and processes, and to improve performance, which would be good too, of course. In the medium term, it is again to extend what already exists. For some use cases, that may mean being able to follow a deny-listing approach: for example, to allow every access except to your .ssh directory.
Okay, that was it. It was a bit long, but I hope you understood.
And, yeah, that it was interesting for you. Do you have any questions, about the networking part or something else?
Great. Yeah, one question, about the networking part? Yeah, I can show that. I'm not sure the text is big enough, but let's try. So that's the code of the sandboxer.
In a nutshell, we create a rule set here. In the middle, you can see there's a ruleset attribute with handled_access_fs and handled_access_net. Both of these contain a set of flags: for example, here, all these flags for file system access control. It's not very clear here, but you can find it in the kernel source code. Then we populate this set of handled access rights, and the rule set gets created.
The network part is here. You create a landlock_net_service_attr structure and set which actions are allowed (this part gets populated later), and a field which contains the port that is allowed in this case. So it's either for TCP connect or TCP bind. Once that is set, you add the rule, with the landlock_add_rule syscall, to the rule set which was created before. You specify that it is a net service that you want to add, which is the type of the rule, and then the definition of this service, which is the action and the port.
So that's maybe not the best example, but feel free to look at the source code, I mean the sample source code. Thank you.
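
The network rules walked through in this answer can be approximated with a small Python model. It is only a sketch of the semantics described in the talk, not the sandboxer's C code or the kernel API: the `Ruleset` class, its method names, and the `TCP_BIND` and `TCP_CONNECT` values are hypothetical, and the ports 2000 and 3000 are the ones used in the demo.

```python
from dataclasses import dataclass, field

# Hypothetical flag values for this sketch.
TCP_BIND = 1 << 0
TCP_CONNECT = 1 << 1


@dataclass
class Ruleset:
    """Model of a rule set: handled actions are denied unless a rule allows them."""

    handled_access_net: int
    port_rules: dict = field(default_factory=dict)  # port -> allowed actions

    def add_rule(self, allowed_access, port):
        # A rule can only allow actions that the rule set handles.
        if allowed_access & ~self.handled_access_net:
            raise ValueError("rule allows an action the rule set does not handle")
        self.port_rules[port] = self.port_rules.get(port, 0) | allowed_access

    def is_allowed(self, action, port):
        # Actions that the rule set does not handle are not restricted by it.
        if not (action & self.handled_access_net):
            return True
        return bool(self.port_rules.get(port, 0) & action)


# Same policy as the demo: bind only to port 2000, connect only to port 3000.
ruleset = Ruleset(handled_access_net=TCP_BIND | TCP_CONNECT)
ruleset.add_rule(TCP_BIND, 2000)
ruleset.add_rule(TCP_CONNECT, 3000)

for action, name in ((TCP_BIND, "bind"), (TCP_CONNECT, "connect")):
    for port in (2000, 3000, 4000):
        print(name, port, ruleset.is_allowed(action, port))
```

As in the demo, connecting to port 2000 is refused in this model even though binding to it is allowed, because each rule pairs a port with specific actions.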