I'm Martin from Genode Labs, and I'd like to talk about our custom kernel approach. The talk is structured the following way: in the first chapter, I'd like to talk about what motivated us to create our own kernel; in the second chapter, I give a little overview of the general qualities of this kernel and how it works; and in chapters 3, 4, and 5, I will go into detail about some features of the kernel.

Let's start with the motivation. One big advantage of Genode is that you can run it on various kernels, like NOVA, the L4 kernels, Linux, and so on, and that gives you great flexibility in the development and application of Genode. Some kernels, for instance, have a nice debugger built in, and some kernels are specialized on security, like seL4, so you can choose according to the application; and for the user applications, it doesn't matter which kernel you use for the basic features. The second big advantage of supporting many kernels is that you have a lot of different ways of testing the system on top: you have different timing, different scheduling, in general a different behavior at the base, and so your system on top gets tested much harder.

When you look at this combination of Genode and the kernel, it is normally the case that you have a microkernel that is developed on its own and that aims for a comprehensive security concept, so it likes to be self-contained and mistrusts everything on top of it. Of course, this brings some problems, because you then have the Genode core component, which has to bring this paranoid kernel perspective in line with the Genode API on top of it. The misery here is that the core component of Genode must be trusted by all components on top anyway, so you don't gain much from this extra effort. So there are some drawbacks that come from this aspect.
First, you have to shape some concepts the way you want them to be. For example, for asynchronous communication, some kernels only give you semaphores, and these semaphores must be bent into shape to support the asynchronous communication API that we use on Genode, the signals in this case. The other thing is that work is often done redundantly: the memory management, for example, must be done redundantly in core, because the Genode API differs a lot from the memory-management API that kernels normally provide. And at some points there are even deficiencies that you can't solve, so you have to work around them. For instance, with capability delegation on some kernels, you have to remember capabilities at certain places where you normally wouldn't have to when using the Genode API, but you have to, because the kernel would otherwise revoke the capability.

So we came up with the idea of creating our own kernel. This kernel should trust core, because core is trusted anyway, as I said, and it should be completely tailored to serve the needs of core. That way, it can be reduced to a minimalistic library that is linked against core and simply enables core to run directly on the hardware. And the most critical parts of the system can be implemented in a simpler manner than in core: core, for instance, is a multi-threaded application, and some things are much easier if you are not multi-threaded.

So, let's talk a little bit about the kernel itself. The tasks behind the kernel API are these. First, the exception vectors: setting up the exception vectors and catching exceptions is the main task of the kernel. Then the scheduling, of course, for the multi-threaded components on top. The kernel also controls the interrupts, because it owns the exception vectors. It provides communication, in our case only two communication channels: IPC and signals, for synchronous and asynchronous communication. It supports capabilities, so you have trusted capabilities on top, with local names in each protection domain that can't be misused. It does cache and TLB maintenance, because on most architectures these are privileged operations. And last but not least, it does virtualization, for the same reason as cache and TLB maintenance, and because it owns the exception vectors.

The API that arises from these tasks is pretty simple. In the top row, you can see the syscalls that are core-only, so only the core component of Genode is allowed to use them at all, and in the bottom row you see the public API of the kernel, the syscalls that can be used by every user. These are not many syscalls, 14 if I counted right, and they are mostly about threads, signals, messaging, and some capability syscalls to manage the capabilities in the local PD. All the memory management, the new and delete syscalls, is in the core-only section.

This brings me to the first quality of the kernel: all dynamic memory is accounted, because for each kernel object that is created at runtime, memory has to be put in from the outside, from core. In core we have good memory management, and it reflects the cost of the kernel objects to the session quota on top, via the Genode API. So, in general, for each kernel object you create, you have to pay as a user. A consequence is that the kernel can never run out of memory.
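As a minimal sketch of this pay-per-object principle (all names hypothetical, not the actual base-hw code): each kernel-object creation withdraws its cost from quota the creator donated beforehand, and fails if the quota is insufficient.

    #include <cstddef>
    #include <new>
    #include <utility>

    /* a client's donated memory budget, as maintained by the kernel */
    struct Quota
    {
        std::size_t available;

        /* try to reserve 'amount' bytes, return true on success */
        bool withdraw(std::size_t amount)
        {
            if (amount > available)
                return false;      /* insufficient quota: the creation
                                      is refused */
            available -= amount;
            return true;
        }

        void replenish(std::size_t amount) { available += amount; }
    };

    /* create a kernel object, paid for out of the client's quota */
    template <typename OBJ, typename... ARGS>
    OBJ *create_kernel_object(Quota &client_quota, ARGS &&... args)
    {
        if (!client_quota.withdraw(sizeof(OBJ)))
            return nullptr;        /* the kernel never spends its own
                                      memory on behalf of a user */

        /* the backing memory itself was donated via core beforehand */
        return new (std::nothrow) OBJ(std::forward<ARGS>(args)...);
    }

In the real system, this budget corresponds to the session quota that core accounts via the Genode API, as described above.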
That's a big advantage of this kernel. The other thing is that, this way, the whole kernel is a mere state machine, so you can keep it single-threaded. You don't have any dynamic stuff being created inside, and this makes the kernel pretty small and really low in complexity. And whenever you are in the kernel, you are the only one in the kernel, so you are really fast: you don't block, you are not interrupted, and kernel paths get really fast in general.

Speaking of low complexity, I have a little picture here. Normally, when you look at Genode on one of the third-party kernels, you have in the basic setup core, the kernel, the init component, and some bootstrap tools. On some kernels, that is at least 30,000 lines of code that you have to trust from every component on top, and this is already pretty small. But with the kernel-library approach, we can reduce it to about 22,000 lines of code.

About the hardware support of the kernel: initially it was developed on ARMv7, so it provides the most features on this architecture. It supports several boards, as you can see here, like Freescale boards, Exynos, or the OMAP4, for example, and various features like ARM TrustZone, multiprocessing, virtualization, or the FPU. Of course, we also support other architectures: we have ARMv6 support for the Raspberry Pi, and we also have a version for the Muen separation kernel, which is a project from Switzerland.

Let's go into more detail. First, I'd like to talk about the scheduling. Normally, other kernels use priorities for scheduling, and the big problem with priorities is, of course, that if one of your components goes mad at some point, the other components that have a lower priority are not executed anymore. This is a problem we faced, and we thought about how to solve it in our custom kernel. Having only the priority, a quality value, is not enough; we need another value that acts as a quantity value for CPU resources, and so we came up with the CPU quota.

CPU quota works like time slices: you have a super period that represents 100% (currently it is set to one second in our kernel, but you can configure it, of course), and of this one second you can donate slices to the components. The principle is that a component only has a priority as long as it also has quota: if you are out of quota, you have no priority and you only get scheduled round-robin, and as long as there is a component that has both a priority and quota, it is scheduled before the other components.

Let's look at our example from before. We again have our USB driver that goes mad at some point, and you can see that within this super period it exceeds its own quota, and then it drops down to round-robin scheduling, because there is no component left that has quota in this super period. At the end of the super period, all the quota gets reset, so priorities apply again, and the USB driver, still going mad, starts again and exceeds its quota again. So this is a nice way of keeping the system alive even when a high-priority component goes mad.
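Here is a minimal sketch of this scheduling idea, with invented names and details: threads that still own quota compete by priority, and once all quota is spent, the scheduler degrades to plain round robin until the super period ends and every quota is replenished.

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    struct Thread
    {
        unsigned      priority;    /* quality: who wins among quota owners */
        std::uint64_t quota;       /* quantity: slice of the super period  */
        std::uint64_t quota_left;  /* what remains in the current period   */
    };

    /* pick the next thread to run */
    Thread *pick_next(std::vector<Thread *> const &ready, std::size_t &rr_cursor)
    {
        if (ready.empty())
            return nullptr;

        /* first pass: highest priority among threads that still own quota */
        Thread *best = nullptr;
        for (Thread *t : ready)
            if (t->quota_left > 0 && (!best || t->priority > best->priority))
                best = t;
        if (best)
            return best;

        /* all quota spent: degrade to plain round robin, no priorities */
        rr_cursor = (rr_cursor + 1) % ready.size();
        return ready[rr_cursor];
    }

    /* at each super-period boundary, all quota gets reset */
    void super_period_boundary(std::vector<Thread *> const &all)
    {
        for (Thread *t : all)
            t->quota_left = t->quota;
    }

The essential property is visible here: a thread that has exhausted its quota cannot starve the others within the super period, no matter how high its priority is.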
Okay, another nice thing about the CPU resources is that they can be distributed the way resources are normally distributed in Genode. The parent in a component tree has a specific amount of CPU quota, and to fill its children with life, it can hand them some of its quota, and these children can again pass parts on (a launcher, say, gives VirtualBox a bit of its quota), and so on. So it's in the hands of the parents how much quota their children get. The same holds for the priorities: at the root of the tree, init has all priorities, 1 to 8, and can give a sub-range of these priorities to its children, and so on.

Okay, let's talk about the capabilities a little bit. One cool thing about capabilities on the base-hw kernel is that they get automatically created and translated. Normally, you get capabilities into your protection domain by receiving IPC. In this IPC message there are local names of the other component that sent the message, but I cannot use these local names of the other component, so the kernel goes ahead and translates these local names of the other protection domain into my own protection domain. I can use them directly, without looking up any object or anything like that: when I receive the message, the local names in the IPC message are already valid for me, and I can directly invoke them. And if the kernel realizes that a local name of the sender has no counterpart in my protection domain, the kernel goes ahead and automatically creates one for me. So I never have invalid local names when I receive a message.

This implicitly means that you have no name diversity in your protection domain: if there is one object in the kernel that you want to reference, you can be sure that you have exactly one name for this object in your protection domain, because names don't get duplicated.

Another cool thing is that all costs in the kernel that arise from receiving capabilities get accounted to my own protection domain's session quota. A user that wants to receive capabilities into its protection domain has to give the kernel some quota beforehand: okay, you can use this quota for my capabilities. As soon as the kernel needs to create some local names for me, it takes some of this quota and creates them from it. And if the quota is not sufficient for creating these local names, the kernel says no and throws an exception: here, I have some capabilities I want to give you, but there is not enough quota. Then I can go ahead, give it some more quota, and receive the capability. This is again in line with the concept I mentioned at the beginning: the kernel never gives its own memory to users, it always wants to be paid for its actions.

Last but not least, the cool thing about capabilities is that the kernel helps you with the lifetime management. It looks a little bit complicated, but I'll try to explain it. In general, when I receive a new capability, the kernel has to create a local name object, as I explained on the last slide, and it has to use my quota for this. Of course, it's in my interest to get rid of this local name object as soon as I don't need it anymore. The kernel helps me with this, but there is a problem, because there is a transition phase.
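Before going on to the transition phase, here is a minimal sketch of the translate-or-create step just described. All names (Pd, Object_id, translate) are invented for illustration, not the actual base-hw code; it only shows the two cases: reuse the unique existing name, or create one paid from the receiver's quota.

    #include <cstddef>
    #include <cstdint>
    #include <map>

    using Object_id  = std::uint64_t;  /* kernel-global identity of an object */
    using Local_name = std::uint64_t;  /* per-PD name referring to it         */

    struct Pd
    {
        std::map<Object_id, Local_name> names; /* one name per object */
        std::size_t quota;                     /* donated beforehand   */
        Local_name  next_name = 1;
    };

    /* translate a sender-side capability for the receiving PD; a result
       of false means: not enough quota, the receiver must donate more */
    bool translate(Pd &receiver, Object_id obj, Local_name &out)
    {
        /* if the object is already known, reuse its one and only name,
           so no name diversity can arise */
        auto it = receiver.names.find(obj);
        if (it != receiver.names.end()) {
            out = it->second;
            return true;
        }

        /* otherwise create a name on the fly, paid from the receiver's
           quota: the pay-first principle again */
        std::size_t const cost = sizeof(Local_name); /* simplified cost */
        if (receiver.quota < cost)
            return false;
        receiver.quota -= cost;

        out = receiver.next_name++;
        receiver.names[obj] = out;
        return true;
    }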
So, when the kernel has created the local name object, it has to hand it to me: it puts this capability into my UTCB, for example. Then I have to fetch it from the UTCB, create a local instance, start reference counting, and so on, the whole lifetime management in user land. But between these two points there is a phase where the local name object already exists but is not yet known to user land. So if one thread in my application receives it and another thread also receives it, and while the second thread is still unmarshalling the capability from its UTCB the first thread starts to delete it, that would be fatal. Therefore, user land has to acknowledge when it has received a capability, and as long as this acknowledgement is pending, the kernel defers the deletion.

Okay, talking about communication; this is a short topic. First, about IPC. A cool thing about IPC on the base-hw kernel is that it implicitly delegates CPU resources. Imagine you have these four components: two components that use a terminal session (a session where you can put in characters, and these characters go somewhere), a multiplexer for the terminal sessions, and a UART driver where the characters that you put into the terminal session get printed. If the two terminal clients at the top start an RPC to their terminal multiplexer, they hand their CPU resources over to the terminal multiplexer. The terminal multiplexer then does its own RPC to the UART driver and passes them along as well. So the UART driver now has a lot of CPU resources to process the RPC and, in particular, to get back to the original components.

About signals, the asynchronous communication channel: again, the kernel helps with the lifetime management, and here the problem lies with the user-land object that you have to manage, the signal context. For the different kinds of signals, like a timer signal, you have one such object per type of signal, so that when a signal arrives, you know what type of signal it is. In general, you again have a phase, a sequence, where the signal arrives, it sits in my UTCB, and I have to update the signal context so that the user-land lifetime management is up to date. During this time window it would be fatal if another thread went ahead and destroyed the signal context, because then the pointer that I got with the signal would be dangling. This is why the kernel has the kill-signal-context syscall. If another thread now comes and wants to delete the signal context, it uses this kill syscall, and the kernel then knows: the signal context is killed, but there is also an acknowledgement pending, so I have to wait, and I don't accept any more submits for this signal context. The context is dead, but it's still there. As soon as the acknowledgement of the receiver arrives, the kill syscall of the other thread gets unblocked, and it can destroy the signal context without any fear that a signal is still in flight.

The main advantage of this is the following: normally, when you receive a signal, you would have some data structure containing all signal contexts, and you would do a lookup on it every time a signal arrives, to ensure that the signal context is still alive. With this scheme, you don't have to look up signal contexts anymore, so signaling in general gets faster.

Okay, that's it from my side.
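To recap the kill/acknowledge handshake in one place, here is a minimal sketch. All names are invented, and while the real kernel is a single-threaded state machine that suspends the calling thread, the sketch models that blocking with ordinary user-level primitives.

    #include <condition_variable>
    #include <mutex>

    struct Signal_context
    {
        std::mutex              mtx;
        std::condition_variable cv;
        bool killed      = false;  /* no further submits are accepted */
        bool ack_pending = false;  /* a delivered signal awaits its ack */

        /* the kernel delivered a signal into the receiver's UTCB */
        void deliver()
        {
            std::lock_guard<std::mutex> guard(mtx);
            ack_pending = true;
        }

        /* the receiver finished updating its user-land bookkeeping */
        void acknowledge()
        {
            std::lock_guard<std::mutex> guard(mtx);
            ack_pending = false;
            cv.notify_all();       /* unblock a possibly pending kill */
        }

        /* a thread wants to destroy the context: block until no
           delivered signal is in flight anymore */
        void kill()
        {
            std::unique_lock<std::mutex> lock(mtx);
            killed = true;         /* dead, but still there */
            cv.wait(lock, [this] { return !ack_pending; });
            /* now the context can be destroyed safely */
        }

        /* a sender tries to submit: refused once the context is killed */
        bool submit()
        {
            std::lock_guard<std::mutex> guard(mtx);
            return !killed;
        }
    };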
Maybe you have some questions?

Question: If I wanted to play with it, just to check it out, what would I do?

Yes, of course. You get the source code of Genode on GitHub, and there you have a tool for creating build directories. The build directories are per architecture, so we have an x86 build directory, for example. In this build directory you would have to set the KERNEL variable to hw; that's the name of the kernel. Then, whatever you build in this build directory is automatically built for this kernel.

Maybe I can show you. I have created my build directory, and now I have this build.conf file, where I can configure my build. Yes, here is the KERNEL variable. You can also set this variable temporarily, only for one make call, but you can also set it in here: hw instead of nova. And that's it. I start the test with this, and now it starts building. You can see here that it builds the kernel-dependent library for the hw kernel. We have some kernel-dependent libraries that are linked against the generic binaries, so the binaries stay independent from the kernel. Oh, I forgot, I can also enable multiprocessing here. And here it compiles the kernel, and core takes a little time. There is the kernel library; it gets linked into core. You can also run the whole scenario with your tests, of course, but that will take a little bit longer. I hope this answers your question.

Question: It's fine. It's not a question, it's a suggestion: you should definitely come up with a better name.

Yeah, I thought about it a lot of times, about lots of different names. And then there was the point, I think everybody knows it, where all the other guys said: okay, we need a name, we need a name. And I didn't have a name. So they started: okay, let's just call it hw, for hardware, for example. It's not the coolest name, but the kernel is not an outstanding component; it doesn't have the kind of popularity where the name matters that much. That's different for other kernels that are used without any surrounding system.

Question: Two questions, about the scheduling and about the memory management. Can you go back a slide to the schedule where you explained the super period? I didn't get the red dots. The red dots mean that they didn't get CPU time, right? Why don't they get CPU time if the super period is not over and there are no more prioritized tasks?

Okay. This is the super period here. At the beginning, you have lots of components that have a priority and quota, and they use their quota, like here. At this point, for example, the quota of the misbehaving component is exceeded: it has no quota left for this super period. And at this point, the quota of the audio component is exceeded, and so on. So at this point, all components except the timer have no quota anymore.

Question: So you mean the slide should actually show which quota is used and which is still available?

No, no. It's a little bit confusing because at this point you are already in the next super period: this shows the quota for the next super period, but for the last super period, the quota was exceeded at this point.

Question: But more tasks become ready, and there is still time left in the super period; why don't they receive it?

I'd say the timer barely gets involved:
when it isn't needed, it doesn't require time. For prioritized tasks, of course, you can only consume your quota while you are awake. This is why the timer still has quota left at the end of the super period.

Question: Is there a pending acknowledgement that would make the kernel wait?

That's a good question. I said that the kernel would wait, but that's actually not true, because the kernel only sets the calling thread to a waiting state. When I do this delete, the delete happens through a syscall, but it's not that the kernel itself waits. Actually, for capabilities it's a bad example, because at this point nobody waits: the delete is only a hint for the kernel, and the syscall returns directly. But for signals it's true: the kill is a blocking syscall. When you say you want to delete the signal context and the kernel realizes there is an acknowledgement pending, it blocks the thread, so the thread doesn't return from the syscall until the acknowledgement arrives. The kernel itself stays live; from the kernel's perspective, it is only the delete that won't happen until the acknowledgement comes.

Question: It's possible that user space never sends this acknowledgement. What mechanism does the kernel have to handle that?

The delete then simply has no effect yet. The kernel only notes: somebody wants to delete the capability, but there is still a representation of it in the protection domain that has not been resolved yet. When it gets the acknowledgement, the kernel knows that user land now has the new local name, and then the deletion can proceed.

Question: There's a notice in the question: what happens if that acknowledgement never comes in? Doesn't it keep burdening the kernel?

That's not a problem, because this acknowledgement is local to the protection domain of that user. If the user is not clever enough to do the acknowledgement, that's its own problem, because it's its own quota that is tied up for the local name in the meantime.

Question: I have another question, on the quota. If you have a process that needs to communicate with another process, a file system for instance, who is paying for the names?

You mean in the case of the capabilities? Each protection domain has to pay for its own capabilities. So if I'm a server and I provide a session for clients, and over the session I receive capabilities from the clients, I have to pay for them as the server. I'm willing to spend some of my quota to fulfill the service I provide for the clients in this case, so I have to spend some quota beforehand, before I say: okay, now I'm ready. And if I don't, and capabilities arrive and the kernel throws an exception, then I can still decide whether I want to spend more quota for this client or not. It's my own choice.

Question: Is the exception a client thing?

The server will receive the exception from the kernel if the client sends a message with a capability inside and the kernel tries to create a local name for me but does not have enough of my quota. Then I have the choice to spend more quota, or I say: no, the client has to send quota first; the quota must be paid to the kernel. I don't know if I understood the question correctly.
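As a sketch of the choice just described, here is how a server loop might react to such a quota exception. The exception type and helper functions are hypothetical, invented for illustration, not actual Genode API.

    #include <cstdio>
    #include <exception>

    /* thrown by the (hypothetical) receive path when the kernel cannot
       create local names from the remaining quota */
    struct Quota_exceeded : std::exception { };

    /* invented placeholders for the two possible reactions */
    void donate_own_quota(unsigned bytes) { std::printf("donating %u bytes\n", bytes); }
    void ask_client_for_upgrade()         { std::printf("client must pay first\n"); }

    /* receive an RPC message that may carry capabilities */
    template <typename RECEIVE_FN>
    void receive_with_caps(RECEIVE_FN const &receive, bool generous)
    {
        for (;;) {
            try {
                receive();         /* kernel may need to create names */
                return;
            }
            catch (Quota_exceeded const &) {
                if (generous) {
                    /* the server spends its own quota for this client */
                    donate_own_quota(4096);
                    continue;      /* retry receiving the capability */
                }
                /* or insist that the client pays first */
                ask_client_for_upgrade();
                return;
            }
        }
    }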
Question: Can you make an analogy with a hypervisor and its guests for this kind of allocation? Server and clients would be hypervisor and guests; wouldn't this analogy with the quota be easier? With a hypervisor, the quota for the guest always comes first, and if the guest requires something more, it has to ask, and you already know that.

Yes, I think it's a little bit similar to hypervisors. Each component has a certain amount of quota, and there is a trading concept that allows the components to trade this quota. When a client wants to use some service from a server, it opens a connection to the server, and with this connection it donates a bit of its own quota. But this happens out of band, before the situation we discussed: the client has already donated some of its own quota to the server, so the server knows that this particular client has given it this amount of memory, and the server can account against that donated quota when it spends a little bit more of its own. So the accounting follows the session contract between the components, and that is exactly the concept we were talking about.

Okay. There are no more pressing questions.