Thanks. I'll be doing a talk about sharing code and concepts between user space and kernel code. The subject of the talk comes from what I usually hear from maintainers when I tell them I'm going to take their code and use it in user space: they're kind of against it. So, a pretty basic slide: how do you write good code? First of all, you start by looking at whether someone else already did something like that, and you just take their code. Next, you get people to look at your code, because "given enough eyeballs, all bugs are shallow": if enough people get to see your code, any problem will be obvious and it will get fixed. Next, you test and you test again, because no matter how good your code looks, as soon as you test it you'll find that it sucks and you have to rewrite it. And then, once your code is ready and it all works and everything's up and running, you share your code. You let other people use it, take it, and do anything they want with it. That's the concept of open source: you let other people use it, they'll give you back patches, and they'll improve your code for you. So a good source to take code from is kernel code, obviously. It's open source; it's probably the most recognized open source project in existence. The people who work on the kernel are probably some of the smartest and most talented people in their field, and those are the people you want to take code from. The kernel gets tested a lot, by a bunch of individuals and a bunch of corporations. It's probably also the most tested project in existence, both directly and indirectly; there are so many users that test it just by being users. It's amazing. And it obviously works: the user base of the kernel keeps growing every year, and increasing users is probably the best thing you can show for the success of a software project.
So now you might say: sure, let's just take all the code from the kernel, shove it into our user space projects, and we'll be happy and get our job done. It's not that easy. Most of the code inside the kernel is pretty much useless for user space. I mean, when was the last time you needed a task scheduler, memory management, or device drivers in your user space code? Probably never. Beyond the code, the kernel also has some concepts which are mostly irrelevant to user space, like spinlocks, atomic contexts and sleeping rules, interrupts, or paging. Those don't exist in user space, but the kernel has to work around them. It's also very hard to just take a small piece of code out and put it into a user space project, because first you have a bunch of kernel headers which forcefully refuse to be compiled from user space. Even if you get past those headers, there are code dependencies: you have that little bit of code, but it depends on some other code somewhere else, and that thing depends on a whole bunch of headers. And even if you get past that point, you have all those config options sprinkled all over the kernel, so even if you have a small independent piece of code, you usually still can't compile it by itself without dealing with config options somewhere or another. So after this slide you might say: no, the kernel is useless for this, let's just leave it there and rewrite our code. Not exactly. There is some small stuff we can get from the kernel and reuse. The kernel implements quite a lot of data structures: several lists, several trees, and hash tables, not one implementation but many of them. The kernel really needs them; those data structures are used all over the kernel. And we know that the implementation of those structures is good and tested, because it's in the kernel. The kernel also has a bunch of algorithms.
The kernel does lots of encryption, lots of hashing, checksumming, whatever. And we know those things are also correct and tested. Then there's the concepts part, things which are generally kernel-specific. Take RCU: the implementation of RCU is very kernel-specific, you can't just rip it out and put it in user space. But people have taken the concept of RCU and created userspace RCU, which builds on the lessons learned from the kernel's RCU. Now there's a nice userspace RCU available for everyone, just because it was done in the kernel first. So let's talk about the problems we mentioned before. The first one was the header jungle. There are a couple of sub-problems here. The first is header dependencies, which are a mess. It appears that somehow everything depends on everything else, even though it shouldn't, and everything ends up depending on kernel.h. And kernel.h on its own is very user space unfriendly; it really doesn't want to live in user space. Next, we have those massive monolithic headers. They started out small and nice when the kernel was young, but no one ever bothered to split them into manageable parts. So you end up with a massive 3,000-line mm.h, which is impossible to use anywhere else because it has so many things you don't need and that don't work in user space. The headers also have compiler and linker directives. It makes sense in the kernel to say, for example, which section a function should go into; that makes a lot of sense for the kernel, but kernel section layout doesn't mean anything in user space. And there are also data definition dependencies, which is when a data structure isn't defined in one place but spread across a couple.
A good example of that is the list data structure, where the data type itself is defined in one header but operated on in a completely unrelated header. So getting all of those in is quite problematic. What they did to try and improve that is add something called UAPI. Basically it's an attempt to split the headers into headers that are intended to be included from user space, and kernel-only headers. That's a good start for what we need, because it means that the kernel now has a collection of headers which are intended to be used by user space, and that's awesome; that's what we want to achieve. The problem is that UAPI is basically trying to provide an ABI to user space. It doesn't try to provide any common code; it mostly just provides the things that get passed between user space and the kernel. It doesn't provide lists or hash tables or whatever. Just as an example, the UAPI version of the kernel.h we talked about before has just two really basic macros; it could have had thousands. So UAPI would basically be able to solve the header problem; it just needs quite a lot of work splitting the kernel headers into user space and kernel-only parts. What can we do about it? First, there are already projects that did that job: perf, kvmtool, and liblockdep already took kernel headers and sanitized them for user space usage. Each of those projects has a collection of kernel headers that can be used in user space. So we can take those headers and put them back into UAPI inside the kernel tree, and both sides benefit: the kernel gains more functionality it gives to user space, and all those projects can delete their locally sanitized headers because they now live in the kernel. Next, we could do what those projects did, but from scratch: take common code from the kernel and move it into UAPI headers. That's pretty much what happened with lib/, which mostly has common code inside.
lib/ is a host to lots of common code that's not specific to any subsystem in the kernel, and we can do exactly the same for headers. If we go even further, we could take all that common code from the kernel and put it completely outside the kernel in a shared library, give it its own maintainer, and have changes flow back and forth with the kernel. But that's a wild dream and probably not going to happen anytime soon. So, we've talked a bit about the theory behind using kernel code in user space; now I'll talk about some actual examples. The first one will be using actual kernel code directly from user space, and we'll look at liblockdep for that. Then we'll look at potentially moving projects inside the kernel tree, and we'll look at Trinity for that. And we'll talk about reusing more concepts from the kernel, more coding style, more data structures, and we'll look at Subsurface for that. Let's begin with lockdep. Lockdep was introduced in 2006 as an attempt to solve the kernel's horrible deadlocking issues, because the kernel was a mess when it came to deadlocks. Deadlocks are quite difficult to reproduce because they're timing-dependent, and there's not much you can do to improve that. Someone had to come up with a solution, and that was lockdep. Ever since it was introduced, lockdep has managed to eliminate most locking issues in the kernel, and it catches them pretty early: most of the time it's the developer who sees the problem and can fix it while developing the code, and if he didn't catch it, it usually gets caught in linux-next. So it rarely gets to Linus's tree. That's what's nice about it: it prevents so many bugs from ever reaching Linus's tree. Lockdep is probably one of the most used pieces of debugging infrastructure in the kernel, because anyone who even touches locking is using lockdep, even if he isn't aware of it.
So even though lockdep was developed specifically for the kernel, there's nothing really kernel-specific about it. It's basically a locking dependency analyzer. It has some kernel-flavored pieces, like interrupts and the difference between spinning and sleeping locks, but those don't really matter once you push it out to user space. liblockdep was suggested by Ingo Molnar, one of the guys who wrote lockdep. The idea is basically to take lockdep as it is and provide the same functionality to user space. The entire project took a small four-line patch to the kernel, just so we could simplify the user space code a bit. We preferred doing a small patch to the kernel over writing lots of code in user space, and the maintainers agreed to take it; that made us happy. What this means is that the kernel's lockdep.c is now being compiled directly from user space, which is quite surprising: you can take a code file from the kernel and just compile it in user space. There are two other things inside liblockdep. One is a small header wrapper that wraps pthread mutexes and lets liblockdep track them. The second, which is pretty much the only real code in liblockdep, is a trick that lets us analyze binaries without having to recompile them: we use the LD_PRELOAD trick to inject ourselves into binaries and take over the locking there. That way we can use lockdep to analyze locking in existing projects without recompiling everything against liblockdep. If good UAPI headers had existed when liblockdep was written, the project would pretty much be a Makefile that compiles the kernel's lockdep.c against the UAPI headers, and we'd be done. Zero work.
And when you think about what you get from that: with a small Makefile, you provide user space with a locking dependency analyzer that's been working for almost eight years and is proven. That's amazing stuff you can do just by grabbing chunks of the kernel and providing them to user space. It would be mostly just a Makefile; we'd still want the LD_PRELOAD thing in there, but the amount of code beyond that is pretty much nonexistent. Currently the pull request is somewhere in Ingo's locking tree; it's bound to get into the kernel in one merge window or another, hopefully the next one. The next project is Trinity. It was created, and is maintained right now, by Dave Jones from Red Hat. Trinity is a smart, intelligent syscall fuzzer, and it has done an awesome job: it has caught hundreds of bugs even before they got to Linus's tree. It's being developed by multiple individuals and companies right now, which means that linux-next gets lots of fuzzing every time there's a new linux-next, and that catches a couple of good bugs a week, preventing quite a lot of bugs from ever reaching Linus's tree. As you might imagine, a syscall fuzzer for Linux is very specific to the Linux kernel; you can't take it to Windows and fuzz syscalls there, it doesn't make sense. So you might ask why it lives outside of the Linux kernel tree in the first place. Trinity is really integrated with the kernel: it reuses a bunch of stuff from the internal kernel headers and from the UAPI headers. And since Trinity is a smart fuzzer, it will only test combinations of syscall parameters that actually exercise something. It won't call a syscall with parameters that make no sense and immediately return; it calls syscalls with valid parameters, so actual kernel code gets tested.
That means that Trinity needs to know everything about the kernel's ABI: everything about the syscalls, everything about the ioctls. Most of Trinity's line count goes to a big chunk of files that describe the kernel's syscalls and ioctls. It also means that when you test the kernel, you want the version of Trinity to match the version of the kernel exactly. If your Trinity is newer than the kernel, you have a bunch of syscalls that aren't getting tested; they just get disabled because they don't exist in the kernel. And if your Trinity is older than the kernel, the kernel has quite a lot of syscalls that also don't get tested, because they're not described in Trinity. So you really have to have that one-to-one match between the kernel and Trinity. Given all of that, does it really make sense to keep it separated from the kernel? Let's see what happens if we try to merge Trinity into the kernel. It means we'd have to go through the kernel and annotate every syscall and ioctl, which does two things. One, you now have better documentation of the kernel, because everything is annotated. Two, it becomes harder to break the ABI, because once you break it, it's easy to find out who or what broke it and how, so the annotation improves protection against breaking the ABI. It also means that Trinity could just delete the part that describes the ABI and use the one the kernel provides. It also means that more people would be able to test the kernel: once Trinity lives inside the kernel tree, anyone who clones the tree has Trinity, which encourages more people to develop Trinity and use it to test the kernel. And it means Trinity would get better coverage. I said before that Trinity must have all the syscalls and all the ioctls described in it, but that description isn't perfect.
I mean, humans sat and entered all of that manually. It's possible they made a mistake or forgot something, which means that even if we think the version of Trinity exactly matches the version of the kernel, we might still be missing stuff, and therefore missing important kernel bugs which aren't getting found even though they easily could have been. If we do the merge, Trinity comes down to being just a small, tiny fuzzer without the whole directory describing the ABI, and the kernel gains documentation and protection against breaking the ABI. Subsurface. Subsurface was started about two years ago, when Linus decided there was no good dive log software for Linux, so he just decided to write one. This isn't the first time Linus has gone to user space; most of you have probably heard of Git, which he did a couple of years back. Even though Git and Subsurface provide very different services, one's a dive log and one's a version control system, they're actually quite similar. They follow this thing I call "kernel hacker does user space", which basically means a kernel hacker decides he needs something in user space. So he decides to write it in C, maybe C++. He follows exactly the kernel's coding style. He tries to reuse stuff from the kernel, mostly keeping the same function names, and he tries to use some of the kernel's data structures if he can possibly get anything from that. He also probably uses Git, with all the stuff you know and love from the kernel: the Signed-off-by, the Tested-by. This one in particular even has Linus's commit messages, so that's an extra bonus. So given that Linus started working on it, you'd imagine it would be perfect code, that it would look just like the kernel, that it would be amazing and do pretty much everything I've talked about so far. Yet when you look at the code, you see stuff like this.
And the kernel hackers sitting here go: what exactly is going on? It's a very trivial piece of code, but it took them more than a second to understand. It obviously does something with linked lists, but it looks weird; it doesn't look quite right. Now, if this code had instead been this thing, it would look much more obvious to kernel folks. This little use of hlist suddenly makes things obvious. Kernel hackers look at it and say: oh, I know that, it iterates over a linked list. That's obvious. Making that small change both makes the code more obvious to more people and attracts kernel hackers to your project. When they look at code like that, they kind of feel at home: that looks like kernel code, I know how that works. I mean, if you look at this code, you can't tell me whether it's user space code or kernel code; it could pretty much be either, there's nothing specific to either one of them. The question here is: why wasn't it written this way in the first place? And the answer is that the kernel doesn't make it easy to use the lists it has inside. Even though the list is actually designed very generically, it's very abstract and there's nothing kernel-specific in it, you can't just easily take the list out and use it in your favorite user space project. You have to jump through hoops. It's not easy, and obviously Linus, or the guys who came after him, didn't want to go through that. So they gave up, and you got the first thing. That's a shame; it could have been the second thing, which is much prettier. So what we've seen so far with these examples is that it would be nice if the kernel and user space worked much closer together, if they shared more code, more headers, more data structures. And doing that sharing isn't too complicated. I mean, tearing out the list code is quite easy.
It's not that much of a project, but it was still too difficult for the Subsurface folks to bother with, for example. So it has to be much simpler than that. People have to be able to just include list.h and use it in their user space code; they don't want to jump through hoops to get lists. It also means that if we do that, we might get the opportunity to have more hybrid hackers, a term also used by Ingo Molnar. A hybrid hacker is basically a hacker who can easily jump between kernel and user space. He can work on the kernel, and if he gets tired of that and sees something missing in user space, he'll just go and write user space code. Usually he'll have a couple of projects, with a bunch of shared code between them; there'll be a common shared code base he uses in every project, because that makes sense. You don't want to rewrite lists every time you start a project; you grab them from somewhere else, like we said on the first slide. The code will look very similar across all of these projects. He'll probably use the kernel coding style, because that makes sense and people are familiar with it. It also means that the hacker's talent benefits all those projects. Think about it: if you have a user space project and Linus decides to help it, that's a huge help for the project. And maybe not Linus, but there are a bunch of other kernel hackers who would be happy to work on projects if it looked easier for them to join in without having to learn everything from the beginning. They could jump into a project and everything would look familiar, the list iterators would look the same, and they'd be happy to work on it. A good example of a hybrid hacker is Linus: he usually works on the kernel, but he saw a problem with version control, so he wrote Git; then he saw a problem with dive log software, so he wrote Subsurface.
So, to wrap it up, a couple of pointers about where we can either find opportunities to grab shared code from, or where we can put in some work to extract more shared code. The first is lib/, which is basically the kernel's shared library code. It has a bunch of functions which are usually not kernel-specific but are definitely common across the kernel. Those tend to be encryption functions, hashing, everything we talked about at the beginning. Basically, if you're working on something that does encryption, that's the best place to look at first: you just take the encryption implementations from it and use them in your project. Next, tools/ already has user space code. The biggest thing there is perf; the rest is mostly small stuff that tests various kernel things. But there's no reason not to add more functionality into tools/, or even to look at how the existing tools are implemented and take ideas from them if you're working on a different user space project. You can see how they did lists, how they did hash tables, and take that from them. Then we have include/uapi, which is basically where the user space API lives. It's useful to look at it and see what exactly the kernel exposes right now; it's bound to keep growing all the time as people keep adding new code to it. It's also possible, if the kernel has something you want, to take it and move it into a UAPI header: just create a new header and move the stuff in there. You'll benefit yourself, and you'll benefit other people who will now be able to use the code just like you. As for the part about sharing coding style: people just don't understand how important it is that when people look at your project, they can easily understand the code. The smallest things, like that list iterator, are what make the difference. That's how you attract kernel hackers to work on your projects.
They don't really want to start getting to know everything from the beginning; they want to look at the code and feel at home, meaning they want it to look like the kernel. So basically, if you adopt the kernel's coding style, kernel hackers will feel that way. It'll encourage them to come and help your project, and to take code from the kernel and move it into your project. That's the way to attract talent. And the small macros and code snippets: it's possible to fish those out of the headers. There are tons of small helper macros that are easy to come by if you just put the effort in. And if someone does see something he wants to take from the kernel, please put it in the UAPI directory and let other people use it as well, because the kernel is full of so many small and useful things that are just buried down there, and no one can use them because the kernel doesn't make it any easier. And that's all. If anyone has any questions, I'll pass this around.

Q: Is it actually possible to have a library, build a static and a dynamic version of it, and then directly link the static version of the library into the kernel, or do you actually have to build it twice?

A: So, does it need to live inside the kernel? It has to live inside the kernel's own code, basically; that's unfortunate. You can't just take a .a file, add it to the kernel, and completely pull out the code for, say, your data structures or encryption algorithms. No, it gets built differently; you can't just shove it in. You'd be dragging so much stuff into the kernel that it would explode. The kernel has its own thing for libraries, basically: its own small implementation of what libc does.
If you try dragging stuff in from user space, it will collide with what the kernel already does, like the string functions and everything; it'll just hit what the kernel already has. Any other questions?

Q: It seems that in the beginning you were arguing for taking stuff out of the kernel which is useful in both worlds, but in the case of Trinity you were arguing quite the opposite way. Did I misunderstand this, or was that just a case of a thing which should really have been done in the kernel right from the start?

A: The Trinity argument is more about having stuff that's tightly integrated with the kernel tree inside the kernel tree. Putting Trinity inside the kernel tree doesn't mean it becomes hidden and no one can use it; it just makes the development cycles of Trinity and the kernel work together and share more code. It's less about taking stuff out of the kernel or out of user space; it's about sharing code between them. We can do that either by ripping code out of the kernel or by adding code into the kernel. If you have a bunch of common code, as with Trinity and the kernel, you can eliminate that duplication.

Q: Okay, thanks. I had a couple of questions. Firstly, why not actually add a ulib, instead of UAPI, to stop conflating stuff that the kernel exports as an API with stuff that the kernel shares as a library?

A: That makes sense, but that's a very hard political maneuver to pull off; it won't go in easily, you'd need to fight over it. So it's easier to just move stuff into UAPI: move the header parts into UAPI, move the shared code into lib/, and somehow share it from lib/, because adding a ulib to the kernel is a very difficult political move.

Q: And secondly, for something like Trinity, could you export, when you build the kernel, a list of the entry points as well while you're doing that?
So you don't have to keep recompiling Trinity as well when you're building the kernel to test.

A: Trinity has to know more than entry points: it knows about all the parameters each syscall takes and the expected return values. So it's much more than just the functions; it's pretty much exactly what each parameter is and what it can accept as a valid value.

Q: But could that then be processed and put out with the kernel as some sort of data file of all of this?

A: Oh, that definitely could be exported from the kernel, but it means you have to annotate all that stuff inside the kernel, so you've already done that job. At that point, the only part of getting Trinity inside the kernel tree that's still relevant is making it so much easier for people to access it, work with it, and touch it. Anyone else?

Q: Well, it's not really a question. As a user space guy, I'm not really fully convinced that it would be a 100% good idea to make more code look like the kernel, for various reasons. Linus might tell us as much as he wants, but I strongly disbelieve that it's a good idea to write everything in C in user space; people make way too many errors that way and it's way too expensive. Just because it's the right answer for the kernel doesn't mean it's the right answer for everything else. In fact, it would probably be better to stop reinventing the wheel on data structures and hash tables all the time and instead use sensible programming languages which hide all that complexity away, so that the code snippets you showed get more abstract and even easier to read, and we let compilers worry about the gory bits. And there are lots of user space libraries around which allow that, like higher-level languages such as Vala, or even GLib, which makes more complex data structures bearable in C.
Of course, I guess someone like Linus would never get even close to that because it's not cool enough. But I just wanted to say that as a statement, sorry.

A: I agree. I wasn't trying to say that if you're starting a project, you should write it in C; I wasn't trying to force everyone to use C. I'm just saying that if you're starting a project and you need to write it in C for some reason, it's probably better to adopt the kernel's coding style. I wasn't trying to imply that all of you now have to use C and forget everything else. Either you're starting a project and have to use C for some odd reason, or you're Linus, who just won't use anything else; it's usually better to adopt the kernel coding style and the kernel structure if you do decide to write in C. That's your choice.

Q: For all the right reasons and for all the right languages, the kernel does have a very specific make setup. One problem I foresee just from our own use: for some of our GPL stuff, we'll quite happily go in and steal ideas from the kernel, because super intelligent guys have thought about it, got the code right, and it does the right thing. glibc, less so, quite honestly. But one problem we've hit in the lib scenario is that we have a rather different view in user space of what the compiler should be doing to our code, specifically things like whole-program optimization and link-time optimization, which we typically rely on. I don't see that I could argue in front of Linus that he should turn on -flto in the kernel build any time soon.

A: There's actually a project rolling around to enable that. It's quite huge, though, because the kernel has tons of stuff that isn't happy with it.

Q: So yeah, okay, that basically answers my question; it's on the way.
A: LTO is on the way. Basically, I was also trying to say earlier that it might be a good idea to move anything compiler- or linker-specific out of the headers and keep it inside the code itself. User space can work around it, basically it can disable anything the kernel says about linker and compiler directives, but that's just slightly difficult. Compiler and linker stuff shouldn't be part of headers; they don't belong there, if you know what I'm saying.

Q: One last thought: is there any danger of GPL infection across this?

A: Definitely. If you take stuff from the kernel, if your stuff looks similar enough to the kernel, it obviously has to be GPL. Yeah, pretty much, if you take code from the kernel, it has to be GPL; that's very important, and that's how it is with Trinity and Subsurface, they're all GPL. If your project is BSD and you decide to take code from the kernel, that might not be a good idea down the road. You know, lawyers and everything.

OK, thank you very much then.