So, our next talk will be delivered by Samuel here, but before we start, may I just ask the audience not to leave the room before the talk is over, because it disturbs the speaker and also the people who are watching. It's not very considerate. I would also like to tell the same thing to the people who are about to come in, but unfortunately they are not here yet. So, it's yours.

Okay. So, good morning everybody. We'll talk about the Hurd in general, and in particular this morning the PCI arbiter, which is something that was developed quite recently. So, I am Samuel Thibault, but it's Joan who did all the work, so the credit is his.

The Hurd, basically, is all about freedom zero of free software: the freedom to run the program for any purpose. And the thing is, on Linux you don't quite have that: you cannot do things that only root can do, and things like this. For instance, why are fdisk, mke2fs, et cetera hidden away in /sbin? If I have to prepare a disk image or something, why do I have to get the tools from /sbin? I should be able to tinker with an image; I have a home directory where there is room and I can store things. I have access to the network; I would like to be able to set up VPNs, and plug the applications I want into the VPN and not the others, and things like this.

And it's also the freedom to innovate. If you want to experiment with a file system, well, you have to patch your kernel, and maybe it will become less stable. On a machine you don't administrate, you will not be able to do that at all, because the administrator doesn't trust you. So that's really a problem. Maybe you want to tune your workflow, because you have your preferred ways of doing things, and being confined to just your home directory is not enough: you would like to combine processes the way you want, et cetera. And one of the things I'll talk about today is giving a PCI card to a process. Say you have, I don't know, music processing software, and you want to give it the PCI card so it can really drive the thing as efficiently as possible, for real-time processing, things like this.

It's also freedom from the software itself. Maybe you do not trust some software, because you haven't read the source code, or you cannot read the source code, or you know there are bugs, and you want to isolate the program from the others, to avoid something crashing and taking everything down with it. And drivers, of course: I mean, we know that most of the bugs in the kernel are in the drivers.

So, this is how it looks. It's really a microkernel layering: you have a kernel which only manages tasks, memory and IPC, and the rest is in userland. So we have a proc server, which knows about processes: their PIDs, UIDs, owners, things like this. pfinet is the TCP/IP stack. ext2fs manages the file system. And auth is something people don't usually know about: it just knows which processes belong to which UIDs. The auth server ties in with all the rest: ext2fs knows about the UIDs of files, but it's auth which tells it, okay, this shell is allowed to do this with the file system. So it's a sort of rendezvous point for authorization.
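To make that rendezvous idea concrete, here is roughly the dance glibc performs when it re-authenticates a file port against the auth server. This is a hedged sketch: it assumes the MIG-generated stubs that a GNU/Hurd development install provides (<hurd/io.h>, <hurd/auth.h>), and the exact prototypes may differ slightly.

```c
#include <hurd.h>
#include <hurd/io.h>    /* io_reauthenticate stub */
#include <hurd/auth.h>  /* auth_user_authenticate stub */
#include <mach.h>

/* Re-authenticate IOPORT (e.g. a file port obtained from ext2fs)
   against AUTH, getting back in *NEWPORT a port that the filesystem
   now associates with the identity auth vouched for.  */
static error_t
reauthenticate (io_t ioport, auth_t auth, mach_port_t *newport)
{
  error_t err;
  /* A fresh port, used purely as a rendezvous token.  */
  mach_port_t rendezvous = mach_reply_port ();

  /* Ask the filesystem to contact auth about this port...  */
  err = io_reauthenticate (ioport, rendezvous, MACH_MSG_TYPE_MAKE_SEND);
  if (!err)
    /* ...and ask auth to vouch for us to whoever presents the same
       token.  auth matches the two sides up; that match is the proof
       the filesystem needs about which UIDs this task carries.  */
    err = auth_user_authenticate (auth, rendezvous,
                                  MACH_MSG_TYPE_MAKE_SEND, newport);

  mach_port_destroy (mach_task_self (), rendezvous);
  return err;
}
```

This same port-exchange pattern comes back at the very end of the talk, in the answer to the last question.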
And it's all flexible like this, so we can do crazy stuff; I can refer you to the videos of previous FOSDEM microkernel room sessions for some examples. If a server crashes, that's not a problem: the "computer bought the farm" error we see on Hurd systems is just an error, not something that takes the whole system away. For instance, if the TCP/IP stack crashes, okay, all the TCP sessions are lost, but everything else is fine and I can recover the system: I won't have network access to it, but if I have console access, then nothing is lost. It's also easy to debug and tune: you can run GDB on the TCP/IP stack, on ext2fs, things like this.

And then you can do crazy things. For instance, the console: there is the Mach console, which has very few features, but there is also the Hurd console, which has dynamic font support. That is, the PCI VGA card only allows around 500 glyphs to be shown on the screen, but you can allocate dynamically which glyphs you want to show. So you can have Chinese, which has many more characters than that: since they don't all show up at the same time, you can dynamically choose which ones appear. So Chinese characters, emoji; we didn't even have to do anything special to get emoji on the console, just load a font which has the glyphs, and that's fine.

And the kernel is quite small: it only handles tasks, memory and IPC; everything else is in userland. So, for instance, alongside the same servers I talked about, I can have an ftpfs running as a user, because it doesn't need any kind of special authorization; it just needs network access to connect to the FTP server. And then there is an isofs, and a shell running cp. What is happening there, basically, is that you have mounted, as an isofs, an ISO image which lives on an FTP server, and then you can access the files within that ISO image. And the thing is, we don't need to download the whole ISO to do that, because ISO is indexed: ftpfs just serves the bits of the file that isofs needs, and isofs provides the file to cp, which can copy it. So it's extremely efficient, actually.

Some examples: I have on my home system /ftp:, which is a host multiplexer for ftpfs. So I can just do, I don't know, vi /ftp:/ followed by an FTP URL, same for HTTP, and I don't need to implement FTP in vi; it's just provided by the system. I can mount under my directory the isofs for something which is on an FTP server, then look at it; it just works. On my .signature file I can put something which says: each time you open it, run fortune. And so each time you actually read .signature, you get a new fortune. So it's cool stuff.

And one thing I mentioned, getting your own environment: there's the remap translator, which gives you a new shell in which /bin/sh is actually pointed at, say, ~/bin/sh. And so from then on, all the scripts I run, et cetera, see that shell instead of the system one, which might be convenient if you have, I don't know, scripts which want bash as the shell in /bin/sh but the system provides dash. Or you can get a whole /bin which is actually your own bin, in which you have put everything. This is basically the kind of thing we get with Stow, Nix, Guix, et cetera, except that it's at the operating system level.

So how does it work? It's quite simple, actually: it's glibc which implements this. Everything that happens in the system is interposable, because everything is already an RPC; it's how it works from the start. And so you just expose what you want in the file system, in /servers/something, and then the user can decide what to put here and there.
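For a flavour of what "everything is an RPC" means in practice: attaching a translator like ftpfs to a node is itself just an RPC on that node, which is what the settrans tool wraps. A minimal hedged sketch, assuming the standard Hurd headers and a hypothetical ./ftp-node file:

```c
#include <hurd.h>
#include <hurd/fs.h>          /* file_set_translator stub */
#include <hurd/hurd_types.h>  /* FS_TRANS_* flags */
#include <argz.h>
#include <fcntl.h>
#include <error.h>
#include <errno.h>

int
main (void)
{
  char *argz = NULL;
  size_t argz_len = 0;

  /* The translator command line, NUL-separated as the RPC expects.  */
  argz_create_sep ("/hurd/ftpfs ftp.gnu.org/gnu", ' ', &argz, &argz_len);

  /* Get a port to the bare node, ignoring any translator already set.  */
  file_t node = file_name_lookup ("ftp-node", O_NOTRANS, 0);
  if (node == MACH_PORT_NULL)
    error (1, errno, "ftp-node");

  /* Record a passive translator: it will be started on demand the
     first time somebody looks the node up.  */
  error_t err = file_set_translator (node, FS_TRANS_SET, FS_TRANS_SET, 0,
                                     argz, argz_len, MACH_PORT_NULL);
  if (err)
    error (1, err, "file_set_translator");
  return 0;
}
```

After this, an ordinary ls ftp-node/ would transparently start ftpfs and list the remote directory, with no privileges involved anywhere.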
The idea, if you think about, I don't know, fakeroot or chroot for instance: fakeroot is actually something that preloads a library to redirect open, close, et cetera, so that instead of opening that thing, it opens something else. It works well enough for what it's used for, but it's not perfect, and as soon as there are new system calls, it has to be taught about them so it can redirect them as well. Well, here, since it's glibc itself, anything that gets added will be interposable that way. So it's really virtualization at a very fine-grained interface. When you look at containers and everything like that, it's all very coarse-grained compared to this.

And so the idea is: as a user you have your home directory, you have network access, and then you can do whatever you want with it. Crazy example: you run OpenVPN, then your own TCP/IP stack on top of it, then ftpfs to get at a disk image, which is partitioned, so you take one of the partitions, then you get inside the ext2fs of that partition, and there's an ISO image, and then you open something inside it. And it does not require root to do this, actually.

Okay, so getting to what I'd like to show today. Quite recently, what we did: originally we had the network drivers inside the kernel, and that was not a good thing. So we said, okay, let's move them to userland, so that eth0 actually lives somewhere, on /dev/eth0, where I can see it. It's not something you see on Linux, for instance, but here we actually have /dev/eth0. And what we did was just take the Linux drivers, put them in a process, plug that into pfinet, and give it access to the hardware to drive the PCI cards. This is really nice because, for instance, I know that some of the Realtek drivers get stuck from time to time, and apparently it really is the driver which has a bug. And so when it's stuck, okay, just kill the driver: it gets restarted, and everything else continues to work. The pfinet stack is fine; it just reopens the new driver, and all the TCP sessions are still there. Nothing is lost. So it's really cool.

If you want a firewall, then just put a process which filters the frames between the TCP/IP stack and the actual device, and then you have a firewall. If you want to be crazy, you can put an OpenVPN on top of this and then a new TCP/IP stack. The idea is that the system-provided stack sits in its usual place, and your own stack you can put in your home directory; with the remap translator you just tell applications that /servers/socket/2 is actually redirected to somewhere else, and then it just works. You have virtualization made easy, actually.
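To see why the remap trick is enough to swap a whole TCP/IP stack, here is roughly what glibc's socket() boils down to on the Hurd. A hedged sketch: socket_create is the MIG stub from the Hurd's socket.defs, and the header name is what a Debian GNU/Hurd development install is assumed to provide.

```c
#include <hurd.h>
#include <hurd/socket.h>  /* socket_create stub (from socket.defs) */
#include <sys/socket.h>
#include <error.h>
#include <errno.h>

int
main (void)
{
  /* Look up the PF_INET socket server: normally pfinet.  If remap
     points this path at your own stack instead, applications use it
     without noticing anything.  */
  mach_port_t server = file_name_lookup ("/servers/socket/2", 0, 0);
  if (server == MACH_PORT_NULL)
    error (1, errno, "/servers/socket/2");

  mach_port_t sock;
  error_t err = socket_create (server, SOCK_STREAM, 0, &sock);
  if (err)
    error (1, err, "socket_create");

  /* SOCK is now a port to a TCP socket object living inside pfinet;
     glibc wraps such ports into ordinary file descriptors.  */
  return 0;
}
```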
So, about PCI, the problem we have, which is not too much of a problem yet, because we are lucky. The netdde driver, which accesses the network PCI cards, has to access the PCI config space when it starts. But then, if you start Xorg, Xorg will need that as well; and then if you start a rump daemon for sound, you need that as well. The list can go on. We need something to make that safe, because PCI config access is not thread-safe, not concurrency-safe. For now it's fine enough, because GNU Mach starts first, then netdde, and then Xorg, so it never happens that they actually work on it together. But quite soon we will have situations like that happening, so it's not a good idea.

So the idea, which was implemented by Joan, is to have a translator that provides PCI config access: you have /servers/pci/ followed by the path to the device, so the PCI domain, PCI bus, device, function, et cetera. And then, when you open a file there, you can use RPCs to read and write the configuration, and to get the device regions and the ROM, to map them so that drivers can work, et cetera. And basically that was enough to implement libpciaccess and pciutils backends. So that way, anything that uses libpciaccess, for instance Xorg and things like this, just transparently goes through the PCI arbiter, and they can all run concurrently. So that's good.

So we have this: the PCI arbiter, which lives in userland and provides access to netdde for the network, to Xorg, and to a rump daemon for sound. And so, for instance, my Firefox uses the network, talks to Xorg, and plays sound at the same time, while w3m just uses the network. Okay, good.

But then we can go further: accessing PCI cards as a user. While we are at it, we have files in /servers/pci, so maybe I can just chmod a file and thereby provide access to a user for that device. So just change the permissions: you can say, okay, on /servers/pci, this user is allowed to access this PCI card. You can do it on the fly with fsysopts, or you can use settrans to record it on the file system so it stays that way, and then that user can read and write the config, et cetera.

For now, one thing I mentioned, the get_dev_regions and get_dev_rom calls, these only provide the addresses, and then you have to open /dev/mem to access the actual memory, and you have to be root for that for now. One thing we would like to do is to add operations so that the user can just mmap the device there, to get the actual resources, the memory of the PCI card, things like this. And some PCI cards are driven not only by memory but also by I/O ports, the old legacy I/O ports. The nice thing is that GNU Mach does implement a notion of token which allows access to a given set of I/O ports. So the idea is that the PCI arbiter would ask the kernel: please make me a token that allows access to these I/O ports, the ones for this card. Then it can give that token to a program, and the program can use it to access the I/O ports. That way, the PCI arbiter is the only thing which has to be root to access the actual hardware, and everything else can run as nobody. Nobody on the Hurd is not a UID; it's really nobody: processes which don't have any permission. So, for instance, the netdde driver can start, maybe as a user which is allowed to access network cards; once it has opened the network card with the PCI arbiter, it can forget about its permissions: it tells the system, remove all my permissions, and then it cannot do anything else than just drive the PCI card. So that's extremely safe. And same for Xorg, for the rump driver, et cetera. And then everything on top of this, we don't trust it, but that's not a problem: it cannot do much harm, it cannot even open a file. So that's fine.
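The I/O-port tokens that this plan builds on already exist as GNU Mach RPCs. A hedged sketch of both halves, with a made-up port range standing in for a real card's I/O BAR; the stubs are generated by MIG from mach_i386.defs, and the header path shown is an assumption:

```c
#include <mach.h>
#include <mach/machine/mach_i386.h>  /* i386_io_perm_* stubs (mach_i386.defs) */

/* Arbiter side (root, holding the device master port): mint a token
   covering just this card's I/O ports.  0x1000..0x10ff is a
   hypothetical range read from the card's BAR.  */
static kern_return_t
make_io_token (mach_port_t device_master, mach_port_t *token)
{
  return i386_io_perm_create (device_master, 0x1000, 0x10ff, token);
}

/* Driver side (no privileges needed): after receiving TOKEN over an
   RPC from the arbiter, enable those ports for our own task.  From
   then on, inb/outb on 0x1000..0x10ff work, and nothing else has
   been granted.  */
static kern_return_t
take_io_token (mach_port_t token)
{
  return i386_io_perm_modify (mach_task_self (), token, TRUE);
}
```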
Maybe I can even move my rump sound daemon to a user. That is, so far all this is controlled by the administrator of the system, who provides the drivers for the hardware, okay? But maybe the user says: I would like to drive the sound card myself, because it has special features, experimental features, et cetera; I would like to run this as a user. Okay, why not? But that's dangerous. Why? Because PCI cards can do DMA anywhere in physical memory. But you have the IOMMU, which allows you to prevent this kind of thing: you actually control it, you say this card can only access that part of the memory, and then it becomes safe. You just allow the user to access the memory it owns, and that's all. So if you know about PCI passthrough with hypervisors, it's the same feature, except that it's at the shell level: instead of having a hypervisor to configure, you just configure it in the shell and you have access to the PCI card.

And maybe you don't want to give the whole PCI card to the user, because others would like to use it too. There are quite a few high-end cards, like Cisco cards and things like that, which actually provide various functions: you have just one Ethernet card, but when you do lspci you see a lot of functions. They are just virtual cards that you can assign to different domains on a hypervisor system; on the Hurd it would be different users. So each user has access to the same card, and they are just isolated from one another.

And you could even go further. The thing is, as a user in that situation, if you don't trust this code, it's a problem, because it can open your files, maybe, I don't know, the passwords you have stored in Firefox, things like this. So, you don't trust that code: just start a subhurd. A subhurd is something that you can start as a normal user and which is completely separated: it has its own notion of users, and it cannot access the files you can access. So I have w3m, which I trust: okay, I like it, and I know it's small. Okay, fine, I trust it. But Firefox, with Flash and all these things, I run it as a separate sub-user, and the driver as well. To be able to do this, I actually nest the PCI arbiters. There is the system-provided one, in which the user is allowed to access only the sound card; and then the user starts another PCI arbiter, in which it says: okay, this sub-user is allowed to access the sound card and only that, and maybe something else we do not provide access to, et cetera. So this is a way to run untrusted code safely, even as a user.

Okay, so that was the PCI part, and now the news about the Hurd. The status hasn't changed so much. We support 32-bit; we have a beginning of 64-bit support, which hasn't moved so much recently, because we have so many other things to do; it's not so much a priority. For now we have the Linux 2.6.32 drivers; we would like to use rump kernels to get newer drivers, of course. We have Xorg, we have AHCI, so that's fine enough for most hacking use, not desktop use, but fine enough for having fun. We have experimental sound support through a userland rump kernel: you start the player, and it actually starts a kernel with the sound drivers, and then that drives the sound device. We had a talk about this last year, or two years ago, I don't remember. We don't have USB yet, but again, it's a matter of taking rump and plugging things, and that should be fine.

So it's quite stable. The thing is, the boxes I maintain for building packages, et cetera: I don't remember when I last reinstalled them. More than a decade ago, probably; I don't even remember. So yes, it's quite stable. And the build daemons keep building packages, which is something really intensive: you keep copying files, compiling things, downloading stuff. So it really stresses everything, and we don't get hangs or memory issues, et cetera, after weeks of doing that all the time.
So yeah, it does work quite well. It does crash sometimes, but it's not so frequent. We build a lot of the packages from the Debian archive. The figure goes up and down, depending on new things that are required and things to fix, but basically we are at around 80%. Help is really welcome to make that figure bigger. Usually it's just a matter of fixing a few lines of code in the software, because they assume that if it's not Linux, it's Windows. But we have Firefox, Gnumeric; I mean, big software does work, actually. We have the standard Debian installer, so it's actually as easy to install Debian GNU/Hurd as it is to install Debian at all. So that's fine.

So, more recent news. We have Guix, which is going quite well. GNU purists are happy, because they have a real GNU-only GNU system. The nice thing is that it's actually bootstrapped from scratch: they have the whole path, starting from a C compiler on Linux, building a cross-builder, et cetera, and then building the whole system from scratch from an existing Linux system. So we can trust that thing quite a lot. And it helped us in Debian with the rebootstrap effort, to be safe in saying: okay, we can also reconstruct the Debian GNU/Hurd architecture from scratch if it would ever be necessary. I think there's still some work to do so that it's actually bootable, but yeah, people are hacking on it and it's going quite fine.

So, recently: I mentioned the translators, /servers/socket, /servers/pci, et cetera. These are actually special files, and so far we used a GNU/Hurd extension of ext2 to store them. Recently we added some code to use xattr, that is, a standard feature of ext2fs, to actually store them; a small sketch of what such a record looks like follows below. We have optimizations and stabilization, quite a few things; I won't detail everything, but basically we got a futex implementation, which provides really nice performance. We have high-memory support. It was quite something, because we are still with a 32-bit kernel, so to access more than 4 gigabytes of memory you have to map memory on the fly, et cetera. This was implemented, so, okay, for now we can access more than 4 gigabytes of memory that way; we are fine on that front.

I mentioned subhurds: a user can start something and be sure that it's isolated, and this is something one can really do now. Before, you had to be root to run a subhurd, that is, a separate notion of users; now we can do it as a user. The really cool thing about this is that you can really think about containers when you think about subhurds. The difference is that it's safe from the ground up. It was actually difficult to implement precisely because it's safe: it's deep inside the construction of subhurds that they are safe; they cannot be unsafe. Whereas on Linux they keep saying: oh, for containers we forgot to isolate the sound card, or we forgot to isolate this part and that. I don't know where it will end, because there will always be somebody who adds something to the Linux kernel which then needs to be contained. On the Hurd, it's not so easy to implement something unsafe, because from the start everything is contained.

We've also worked on using the lwIP TCP/IP stack, because for now we use an old Linux stack, which is fine enough, but let's get something which is maintained and put it in a process. And there's somebody who also worked on a distributed system: you do ps and you see all the processes running on your machine, but also on another machine; they have a coherent view of what the system is. For now it's really experimental, but you could imagine migrating a process from one machine to the other with everything still working, because everything is interposable: you just need to push the messages through the network to get things working.
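One concrete detail on the xattr change mentioned a moment ago: a passive translator record is now an ordinary extended attribute named gnu.translator on the ext2 node, holding the NUL-separated translator command line. A hedged sketch with a hypothetical node name; this is what settrans ends up recording on disk, and since it is plain xattr, the record is even visible from Linux with getfattr:

```c
#include <sys/xattr.h>
#include <stdio.h>

int
main (void)
{
  /* Translator command line for a hypothetical ./ftp-node,
     NUL-separated; the string literal's trailing NUL is included
     via sizeof.  */
  static const char record[] = "/hurd/ftpfs\0ftp.gnu.org";

  /* "gnu.translator" is the attribute the Hurd uses for passive
     translator records on ext2fs.  */
  if (setxattr ("ftp-node", "gnu.translator", record, sizeof record, 0) < 0)
    {
      perror ("setxattr");
      return 1;
    }
  return 0;
}
```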
We have had releases; there's that old one, which is still fun. We used to have Arch Hurd; it's not really active nowadays. We have upstream releases from time to time, and we have Debian snapshots: we had the last one one year ago, and within one year we will normally have another one. So there are stable sets of things you can try.

So, future work. I think the one thing we would like to have a look at is drivers, to get support for sound, USB, et cetera, by going through rump. As Jean-Lucas said, rump is something which is maintained, rump is mainstream, it is flexible, and we should really leverage this. So it's just a matter of plugging things together, fixing a couple of compilation issues, and it should be working fine. And with the PCI arbiter, it's fine to have the disk, the network, et cetera in separate processes, so if one crashes, no problem. Then 64-bit support, sometime, maybe. Read-ahead is still missing, so the performance may not be so good; for now it's fine enough, but at some point we should really work on that.

And then there's the crazy stuff. For instance, somebody said: well, the startup scripts, it's all in C, it's complex; you could do this in Scheme, and then everything from boot up to the shell would be in Scheme. Why not? I mean, the idea of the Hurd is that it's flexible, and there are crazy things you can do because you can plug things together; you just have the file system to plug things together. So you're welcome to bring any kind of hack you would like to have in the system.

So, thanks for listening, and thanks to everybody who works on this. I really welcome help, because there are a lot of things I have to do just to maintain the system so that it works stably, et cetera. I would really like to welcome help. For instance, the 64-bit support is still on hold just because I don't have the time to do it, because I'm working on something else. So if people want 64-bit support, then help me with that other stuff I have to do, and then I'll have time to do the 64-bit support. So thanks, and I welcome questions.

I've heard of the patches, I haven't read them myself, on the Linux kernel to mitigate Spectre and Meltdown. Would it have been easier on the Hurd? Or would it be easier on the Hurd? Do you mean running Apache? No, no, patches. There are patches on the Linux kernel that have been published a few weeks ago, one month ago, to mitigate... Ah, you mean Meltdown and Spectre. Meltdown and Spectre. So, Meltdown and Spectre are really awful in that... I mean, yeah. So the question is: would it be easier to patch against Meltdown and Spectre on the Hurd? And the answer is no, because managing memory here relies on the CPU to do things. The thing is, it's always a trade-off between performance and security. Ideally, the kernel wouldn't map all of memory; it would always copy from here to there, and then from there to there, and then what you would be able to see with Meltdown would be only what the kernel sees, that is, the message currently being transferred. So not everything; that would be less of a problem. But for performance, we prefer to let the kernel see all the memory, so it can copy directly from one process to the other. And then, with Meltdown, you can see everything. So yes, Meltdown is really awful in that.
You cannot trust even the hardware to maintain the isolation between processes. Okay then, we're screwed. Yes?

You talked a lot about contributing and getting help. What I found on the contributing pages on the Hurd website is a lot of pasted IRC conversations and not very much description of tasks for someone new to the project to get involved with. So, the question is about the contributing page on the wiki. It has some things, but it's not so easy. The thing is, there are so many things to do. I could spend time on improving it; I welcome people doing this for me. I agree it's a bottleneck for getting people to do things. Well, one issue we have is that people come and say: what can I do? And the answer is: what can you do? That is, what are you able to do, and what are you willing to learn to do? And these are really different things, which make what you can actually do really different. So, on the contributing page, there is a list of small hacks. They are not so small, in a sense: for somebody who knows what he's doing, maybe a few days; for somebody who doesn't, it will be a few months, because he will have to learn a lot of things. But that's cool, because, I mean, really almost everything I know about the printing system, et cetera, which got me a job in the United Kingdom and things like this, came from things I learned through pet projects. So, yes, there is that list of small hacks you can have a look at. If you don't understand something, ask on the mailing list; we can describe what's required, give pointers, et cetera. The idea is that it should be something you would like to work on, because then you are motivated to make it work. But yes, we should work on that part, indeed. Yes?

You said that for the 64-bit support, you are also working on the 32-to-64-bit RPC translation. Yeah. Is it just to... is this necessary for being able to run 32-bit userspace applications on a 64-bit kernel? So, the question is about 64-bit support. For now, we have just the kernel support. The idea is that you don't need 64-bit so much in userland, because 4 gigabytes of memory is quite big. I mean, when I see Firefox taking 4 gigabytes of memory, I say, no, it's not really normal. So, for now, the idea would be that the kernel is 64-bit, so it can manage memory the way it prefers, and then we just have to translate the RPCs: the parameters, the addresses and things like this. It shouldn't be so hard, because it's really just a layer, and then everything else will work, all the RPCs, et cetera. And at some point, we can bootstrap a 64-bit userland; it will be just a matter of defining things, porting the assembly snippets, et cetera. That's not too much of an effort. Sure. My point was just that having this translation is additional work to do, so switching to a 64-bit userspace from day one might actually save you some time; we would just need to recompile, or maybe not. So: either implement the translation between 32-bit and 64-bit, or just go native 64-bit, maybe. The thing is, we have a 32-bit distribution which exists, which we know is stable, et cetera; we would like to leverage that, maybe. Myself, I prefer 32-bit to 64-bit. If somebody prefers to bootstrap 64-bit, then yeah, go for it, and you're welcome. Yeah? Just the last question then, because we are out of time. Yeah. Is this a general mechanism, or is it implemented as a special case? So, I mentioned that you can have a token which gives access to I/O ports, and you can give it to another process.
So, is that something general, or something just for this case? It is something really general. Actually, when I mentioned the discussion between ext2fs and auth: the arrows I showed are actually a port exchange. The idea is that the shell passes a rendezvous token both to ext2fs and to auth, and ext2fs can then match the one it got from the shell against the one auth presents, which is the proof that this process is actually allowed to access the file. So, yeah, this is something deep in the system. Okay? Thanks, everybody.