A wonderful good morning to all of you on day four. I hope you made it through last night halfway intact. Our next speaker, Mindy Preston, is a core developer on the MirageOS team and the release manager for its latest release, and she will be talking to you about library operating systems: the solution if you're dabbling in microservices and Docker is just too fat of a thing. Give a warm round of applause to Mindy.

Hello, thank you. If you can't read this, now is the time to shout. But it seems like it's fine. So today I'm going to talk to you about ways that we can use library operating systems and unikernels to reject the default reality of abstractions and substitute our own. Here's a quick roadmap of what I'm going to talk about today. This talk is in the resiliency track, which is new at CCC this year, and I want to talk about why I think the concept of resiliency is very relevant to the work we're doing on library operating systems and unikernels. I'll talk about some of the properties that can make a project resilient, especially as they relate to whether a project is understandable. I'm going to talk about what I think of as the ultimate bad dependency in your project, which is a traditional monolithic operating system. I'll talk about how you can replace that bad dependency with one that fits better into the way you usually handle your dependencies. And I'll talk about how we can actually increase resiliency with this approach, by applying the good software tools we use for applications to the libraries we build this OS out of. To me, a really important measure of whether a given project is resilient is how many humans can disappear before the project grinds to a halt. We have a name for this, which you've probably heard: the bus number, which Wikipedia euphemistically defines in terms of a "sudden disappearance".
If we want to increase the bus number of our project, the simple answer is: well, if I want a higher number of humans who could go away while my project keeps going, I should get more humans involved, the bus number will go up, and that'll be fine. But in fact it's not that simple; a human who's been exposed to my project is not automatically a knowledgeable or competent person in the context of this project. One way we can help people become knowledgeable and competent in the context of our project is to make the project easier to understand. But our project isn't just our code. We can do a ton of refactoring, use the best practices we know of, document the hell out of everything, use nice language features and all that stuff. But we also have to think about the language our code is written in and how we interact with it from day to day: how we build it, where we run it, and what we need in order to actually use it in some meaningful way. Our project isn't just the code that sits in our repository; it's also the list of all the things it depends on. And you might be saying, okay, my project doesn't have any dependencies, or it has a really short list of them, I understand all of them really well, and they're all really simple. And I would say to you that's probably not true. It's very likely that your project does have a really heavy dependency, which is a conventional operating system. So I've made a nice little ASCII art diagram of how I think about applications, their dependencies, and a monolithic OS in a holistic view, where you put all of these things in the same category of stuff that I actually need. In this diagram I've cheated a little bit: the dependencies are just a box that says "don't worry about them", but this one is drawn really big.
And in fact, it really is big. The operating system is huge, and it almost always has more stuff in it than you need. The way you interact with it is different from the way you interact with the rest of your dependencies. Its documentation is probably not where you normally look for documentation. If you have a problem with it, you probably can't debug it the way you normally debug your code. And when you're writing code that deals with it, you're likely going through an API that doesn't look much like the APIs you like to use in application-level code. To give you a concrete example of what I mean, this is a selection from section 2 of the Linux man pages. The signature for the function is here. This is socket, which is the first thing we'd call if we want to do some network communication. The socket call wants a magic number, another magic number, and another magic number, and it gives us back a magic number, or a different magic number telling us that we got an error and should look somewhere else, in errno, for yet another magic number that says what the error was. If you program in languages other than C, this is probably not what you're used to seeing as a nice interface for doing external stuff. And worse, the magic number we got back from socket has to be passed to another function, connect, which also does a whole bunch of stuff with magic numbers and gives us the same weird out-of-band error communication. You might say, okay, but if you're doing application development and complaining that the C interface is bad, that's because it's a C interface; that's probably not what you're actually dealing with in your higher-level language. But in fact, often you are. The language I usually work in is OCaml. It's a functional programming language with a lot of nice features, and most people consider it a higher-level language than C.
I'm going to give you a quick tour of how to read OCaml type signatures. This is the equivalent of socket; it's also named socket, so that's helpful. This is a function. It takes some arguments, and these are their types. The type socket_domain is also essentially a magic number, but at least it's restricted to a small set of possible values. The same goes for socket_type. We have a plain integer for the protocol, where the documentation at least tells us that zero selects the default and other numbers mean other things. We get back a file_descr, which under the covers is, again, a magic number. And on failure, we get an exception. If you've ever programmed in a language that has exceptions, you're usually not very pleased to see something that throws them as a matter of course when you're trying to do normal stuff. The connect function is even worse. It takes the file descriptor we got back from socket and tries to connect it to a socket address. The return type is unit, which is roughly equivalent to void: I just went off and did something, it was side-effecting, don't worry about it, look somewhere else for errors. In this case we might get an exception, which just wraps the errno information from the underlying Unix call. So this is really not very fun to program against in OCaml either. And we can see why this API is the way it is: it's just a wrapper for the C API, with a few nice OCaml language features, but not very many. Really, all you can do in this situation, trying to interface with the operating system like this, is write a higher-level API on top of it. You can build more and more abstractions to get to something that looks nice to work with as an application developer, or in the context of your language. But somewhere down at the bottom, you always have to talk like this.
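The usual escape hatch described above, writing a higher-level API on top, can be sketched as follows. connect_exn is a hypothetical stand-in for an exception-raising binding like Unix.connect (it touches no real sockets), and the integer codes 22 and 113 are arbitrary placeholders playing the role of errno values; the wrapper turns them into named cases returned in band as a result.

```ocaml
(* Hypothetical stand-in for a thin binding over a C call: it reports
   failure by raising an exception that carries a bare error code. *)
exception Os_error of int * string

let connect_exn (host : string) (port : int) : int =
  if port <= 0 || port > 65535 then raise (Os_error (22, "connect"))
  else if host = "" then raise (Os_error (113, "connect"))
  else 3 (* pretend file descriptor *)

(* The higher-level layer we end up writing on top: named error cases,
   returned in band as a result instead of raised out of band. *)
type connect_error =
  | Invalid_port of int
  | Host_unreachable of string
  | Other of int (* any code we do not model explicitly *)

let connect (host : string) (port : int) : (int, connect_error) result =
  match connect_exn host port with
  | fd -> Ok fd
  | exception Os_error (22, _) -> Error (Invalid_port port)
  | exception Os_error (113, _) -> Error (Host_unreachable host)
  | exception Os_error (code, _) -> Error (Other code)
```

The catch is the one described in the talk: this layer only hides the magic numbers; someone still has to know which codes mean what at the bottom.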
And what if you don't want to? Since we're in the open source universe, you can say, okay Mindy, if you really don't like that API so much, go change it your stupid self. And I would say that's non-trivial for most people who are writing code they just want to run somewhere. If I want to change how I interface with the kernel, I first have to learn to program in the language of the kernel, which is probably not the language I use day to day. I have to learn how to do that kind of programming inside a kernel, which is another skill set I probably don't have. I have to write a test, write a patch, make it a decent patch, and then somehow get that patch accepted. The community I'll have to talk to in order to do that is different from the one I normally talk to. It's not the community of my project, and it's not the community of my language. It has different norms and different processes. And it's probably not very excited about a patch that's really for me, for my application or my language. Generally speaking, the kernel doesn't care about whatever stupid user-space thing I want to make easier for myself. And if you think I'm just whining that kernel development is hard, it's not just me who thinks so. You may have heard of the Eudyptula Challenge, a series of programming exercises for the Linux kernel that was run entirely by email. It started with writing a basic kernel module, and the challenges were gated: you'd send in your solution saying, okay, I've solved this challenge, please give me the next one. Then some scripts would sit and think about it for a little while and say, oh yeah, okay, I think you did a good job, here's the next challenge. There are 20 steps in the Eudyptula Challenge. Around 19,000 people started it, and I was one of them.
Of those, 160 finished, and I was not one of them. I would argue that a success rate of under 1%, among people who set out specifically wanting to do this one thing, is a good indication that it is in fact difficult. And since we're talking about resilience, I want to bring up another point. Members of your project might want to limit their exposure to communities that are known for hostility and toxicity, right? They've decided to be in your community. They haven't decided to be in every community touched by your project's dependencies. One way we can protect each other is to limit our exposure to communities that we know might not be safe for people. And the projects famous for their toxicity do include some very famous monolithic kernel projects. Okay, that all seems really bad and kind of a bummer. So let's think a little more about what operating systems are actually doing for us. I've zoomed out from the view I gave you before of our application, its dependencies, and an operating system, and added a bit more detail about how we communicate with the things that sit below us in the way we normally think about the stack. Crucially, I've added this last block here. When we deploy applications on top of some operating system, that operating system usually isn't running on what we'd consider bare metal. It's normally virtualized. So there's a thing underneath it, the hypervisor, which in this view of the stack does lower-level management of system resources and then farms them out to individual virtual machines.
So while the application's view is that the OS takes care of memory management, scheduling, networking, file systems, the clock, video and all of that, when you're deployed in this sort of environment it's actually the hypervisor that's responsible for most of it. A lot of the time, what the operating system is really doing is passing calls through to the hypervisor, or getting coarse-grained resources from it and doling them out in a more fine-grained manner to its individual applications. So what if we could talk to the hypervisor directly, get the same easy virtualized interface the operating system gets when it runs in a virtual machine, and then use libraries to take care of the things kernel code would otherwise handle for us above the level of just talking to a device? Networking isn't just "push this data into a network card"; I need to assemble TCP flows, and I need to know how to do DNS lookups. Storage isn't just "write all of this data to that sector over there"; there's an entire file system layer we're usually interested in. Timekeeping isn't just "go get the time". We can implement these things in libraries written in the language we're used to and know how to deal with. As for the parts that seem complicated in a traditional operating system, because we've been told operating systems need lots of drivers: in this hypervisor context, we can write small shims for the few virtual devices the hypervisor provides and have the language runtime take care of the rest for us. This, by the way, is a unikernel. So I'm going to show you some examples of what these libraries actually look like, and what it feels like to interact with a system like this.
In MirageOS, the library operating system I work on, we have a common set of interface definitions for these libraries. We have a set of module types that say, okay, if you're a timekeeping device, you should have these functions that people can call. If you're a low-level network device, I should be able to send a packet. If you're a TCP module, I should be able to interact with you in terms of flows. And then we have a whole bunch of implementations of those interfaces: specific file systems, specific network stacks. This is also where we have specific ways of handling networking on different hypervisors, so that we can be general when we write our code and then deploy to whatever target we actually end up on. So let's look at what the function I've been complaining about, in the man page, in the C API and in the OCaml wrapper for the C API, looks like here. In MirageOS, we have a module type TCP that says, okay, I'm going to expose to you this function, create_connection. This type signature is quite a bit more idiomatic OCaml, which unfortunately makes it a little harder to read if you're not an OCaml programmer, but I'll try to make it clear. In OCaml, we usually carry around the context or state for a thing in a type we often call t. So this is the current state of whatever we're going to use to send stuff over TCP. We take an IP address and a port, and we try to make a connection. The return type is either a flow, which you can then write to with other functions provided by an implementation of the TCP module type, or an error, which is returned in band. So this whole (flow, error) result type is saying: you'll either get a flow, or I'll tell you what went wrong. And the whole thing is wrapped in io, which is a concurrency type.
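For readers less used to OCaml signatures, here is a minimal sketch of an interface shaped like the one described, plus one toy implementation. It is not MirageOS's actual TCP module type: the names tcp_error and Loopback_tcp are hypothetical, addresses are plain strings, and io is defined as the identity type so the example runs with only the standard library instead of a cooperative-threading library.

```ocaml
(* Hypothetical, simplified TCP interface in the spirit of the one
   described above. Errors are ordinary values in the result type. *)
type tcp_error = Refused | Closed

module type TCP = sig
  type t                       (* state of the TCP layer *)
  type flow                    (* an established connection *)
  type 'a io = 'a              (* stand-in for a concurrency monad *)
  val create_connection : t -> string * int -> (flow, tcp_error) result io
  val write : flow -> string -> (unit, tcp_error) result io
  val close : flow -> unit io
end

(* One toy implementation: an in-memory loopback that accepts connections
   only on ports it was told are listening. *)
module Loopback_tcp : sig
  include TCP
  val create : listening:int list -> t
  val written : flow -> string list
end = struct
  type t = { listening : int list }
  type flow = { mutable is_open : bool; mutable sent : string list }
  type 'a io = 'a

  let create ~listening = { listening }

  let create_connection t (_addr, port) =
    if List.mem port t.listening then Ok { is_open = true; sent = [] }
    else Error Refused (* returned in band, not raised *)

  let write flow data =
    if flow.is_open then begin
      flow.sent <- data :: flow.sent;
      Ok ()
    end
    else Error Closed

  let close flow = flow.is_open <- false

  (* Everything written so far, oldest first. *)
  let written flow = List.rev flow.sent
end
```

Because errors come back as ordinary values, a caller can match on Refused or Closed without any exception handling, and any other module satisfying TCP could be substituted for Loopback_tcp.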
Since this might block, we expose some machinery for lightweight cooperative multitasking. This type signature also uses something that was merged into the OCaml standard library fairly recently: the result type, which you had to implement yourself until OCaml 4.03. Once it was in the standard library, we were able to change all of these module type signatures, which had custom error types all over MirageOS, and unify them on this result type within, I think, about eight months of that release, which is a turnaround time you probably won't see with most larger operating system projects. The implementations also look like idiomatic OCaml. If we look under the hood at what create_connection is actually doing, it's not making any weird buffers, and it's not digging around in global state. It just says: okay, give me your TCP state, give me your destination address and port, and I'll make a PCB (protocol control block) for it. If that failed, I'll log it and send you the error. Otherwise, you'll get the flow I got. So how do we actually assemble these into things you can really use? What I've shown you so far is, conceptually, how we deal with these lists of implementations and dependencies. What we do in MirageOS is take the application the user wants to run, its list of dependencies, and a list of implementations the user can pick when they actually build their thing, and compose those into either a process that runs on a traditional operating system. We have a whole bunch of shim layers in these libraries, so when you're developing your application and just want to run it locally, you can do that, it's fine.
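As a small illustration of why a single shared result type helps, the sketch below chains two hypothetical library functions, resolve and connect (not MirageOS APIs), whose error cases differ. It assumes OCaml 4.08 or later for Result.bind and the let* binding operator; the result type itself has been in the standard library since 4.03.

```ocaml
(* Two hypothetical library functions that both return the standard
   result type, each with its own error cases as polymorphic variants. *)
let resolve (name : string) =
  if name = "mirage.io" then Ok "192.0.2.1" else Error (`No_such_host name)

let connect (ip : string) (port : int) =
  if port = 443 then Ok (ip, port) else Error (`Refused port)

(* Because both use result, one binding operator chains them, and the
   error types merge automatically. *)
let ( let* ) = Result.bind

let open_site name port =
  let* ip = resolve name in
  let* flow = connect ip port in
  Ok flow

(* Callers handle every error in band with an ordinary, exhaustive match. *)
let describe name port =
  match open_site name port with
  | Ok (ip, p) -> Printf.sprintf "connected to %s:%d" ip p
  | Error (`No_such_host n) -> "no such host: " ^ n
  | Error (`Refused p) -> Printf.sprintf "port %d refused" p
```

With per-library custom error types, each step would need its own conversion before it could be chained like this.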
Or you can build a minimal virtual machine that runs in the hypervisor context I showed you in the diagram a few slides ago, and which can now run on several different hypervisors. So I'm going to show you a quick demo of what it's like to interact with that. I have a web page. Well, actually, I have a web application. It is a MirageOS unikernel, and it has a list of things that it needs. This is the configuration document for the unikernel, which includes a whole bunch of information about how I want to get my networking. This generic_stackv4 just says, give me some IPv4 stack; the user can decide how they want to get it when they build the application. It also says where my content is going to be, that I want to build an HTTP server, and that I want to use TLS. I can do some customization of ports, of where I'll get my certificates from for the TLS connection, and some more customization of the HTTP ports. These are my application-level package dependencies, the things you might have in a manifest in another language. And there's some composition that says, okay, get all this stuff together. From the library operating system I also need a POSIX clock and two places for data: first, the actual data you want me to serve, and then something with the certificates. Then give me something that knows how to deal with HTTP, and we're good to go. The actual logic for my application is in this dispatch file here, which I won't show you, because it's a whole bunch of OCaml that's sort of beside the point. Here's what I want to show you. When I invoke mirage configure, it says, okay, I'm going to look at that configuration document and the list of things you said you wanted, and I'll figure out how to make that. Part of what it does is generate a Makefile, so now I can just type make. It goes off and builds the application into a binary called main.native. I can run it, and I've got a web page. Well, thanks.
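The configuration step can be loosely mirrored in plain OCaml: the application is a functor over the devices it needs, and the build picks concrete implementations. This is a hypothetical sketch that does not use the mirage tool or its configuration DSL; CLOCK, KV_RO, Dispatch, Fixed_clock and In_memory_data are illustrative names with simplified signatures, not the real MirageOS ones.

```ocaml
(* Hypothetical device signatures: the application names the capabilities
   it needs, not where they come from. *)
module type CLOCK = sig
  val now_s : unit -> float
end

module type KV_RO = sig
  val read : string -> string option
end

(* The application logic is a functor over its devices. *)
module Dispatch (Clock : CLOCK) (Data : KV_RO) = struct
  let key_of_path path =
    if path = "" || path = "/" then "index.html"
    else if path.[0] = '/' then String.sub path 1 (String.length path - 1)
    else path

  (* Returns (HTTP status, timestamp, body). *)
  let handle path =
    match Data.read (key_of_path path) with
    | Some body -> (200, Clock.now_s (), body)
    | None -> (404, Clock.now_s (), "not found")
end

(* Toy implementations standing in for whatever the build selects. *)
module Fixed_clock : CLOCK = struct
  let now_s () = 1000.0
end

module In_memory_data : KV_RO = struct
  let files = [ ("index.html", "<h1>hello</h1>") ]
  let read key = List.assoc_opt key files
end

(* Composition: the analogue of "get all this stuff together". *)
module App = Dispatch (Fixed_clock) (In_memory_data)
```

Swapping Fixed_clock for a real clock, or In_memory_data for a disk-backed store, changes only the last line; Dispatch itself stays the same.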
Now let's make a unikernel out of it. This is the thing I just showed you; when I deploy it, I send it off to Google Cloud, and the hypervisor on Google Cloud wants something that knows how to talk to virtio devices. So I say, Mirage, please make me a unikernel that will thrive in this environment. Make me one of those. And now what I have is this https.virtio, which I should show you more fully. It's 16 megabytes, which is the entire virtual machine I need to run that same website. You might say 16 megabytes is kind of a lot when all you're really doing is serving web stuff. But I have all of the content for that page rolled up in an in-memory file system in there, which is part of the reason it's that large. And I should mention, for fairness, that the binary we built for Unix, the one that runs on top of the OS (main.native is actually a symlink, so let me show you the real file), is actually bigger. So I hope I've convinced you at least to look into whether library operating systems make it easier to do systems development. And I hope you have a language you like enough that you'd enjoy building some of these operating system components in it. Now I'd like to convince you that being able to use your language's tools makes it easier to do a good job of building the modules you might want for these systems tasks. On library operating systems, there's a really interesting implementation of the ideas behind them in the rump kernel project, a NetBSD project that took the kernel drivers and basically made NetBSD more like a library operating system.
And the person who put the time into doing this tweeted fairly recently that his original motivation was to make it possible to actually debug and test these kernel drivers. Once you have these drivers in isolation, and you can run them in user space or even by themselves in some other kind of environment, it's much easier to draw conclusions about how they behave in the presence of certain kinds of input. The tests we write for MirageOS look just like really boring unit tests for any other bit of OCaml code we might write. This is a test of something that a lot of TCP implementations get wrong: options padding. It's very, very boring bit-slinging code that you want to test a lot and not screw up, especially if you're in a memory-unsafe language, which luckily in OCaml we're not. A feature of this code is that it is boring, and the environment we run it in is also very boring. We just run it in our normal test framework, the way we normally run tests. We don't have to set up a special test harness; we write our tests and execute them. No drama. Another thing we can do when all of these implementations are just individual libraries is say: okay, there's the normal library for some specific thing, which does its best to implement a spec. But for my testing purposes, I want one that's kind of broken. I want one that does something unexpected, so I can see what my application does when that unexpected thing happens.
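Here is a sketch of the kind of boring test described, for a hypothetical options-padding helper rather than MirageOS's actual TCP code. It relies on one fact from the TCP header format: the data offset field counts the header length in 32-bit words, so the options area is padded with zero bytes to a multiple of 4. It uses plain assert; a real project would more likely use a test framework such as Alcotest.

```ocaml
(* Pad a serialized TCP options buffer with zero bytes so that its length
   is a multiple of 4; RFC 793 specifies that the padding is zeros.
   Hypothetical helper, not MirageOS's implementation. *)
let pad_options (opts : Bytes.t) : Bytes.t =
  let len = Bytes.length opts in
  let padded_len = (len + 3) / 4 * 4 in
  let out = Bytes.make padded_len '\000' in
  Bytes.blit opts 0 out 0 len;
  out

(* Data offset for a 20-byte base header plus padded options, in words. *)
let data_offset (opts : Bytes.t) : int =
  (20 + Bytes.length (pad_options opts)) / 4

(* A boring unit test: window scale (kind 3, length 3, shift 7) is three
   bytes long and needs exactly one zero byte of padding. *)
let test_window_scale_padding () =
  let ws = Bytes.of_string "\003\003\007" in
  let padded = pad_options ws in
  assert (Bytes.length padded = 4);
  assert (Bytes.sub padded 0 3 = ws);
  assert (Bytes.get padded 3 = '\000');
  assert (data_offset ws = 6)

let () = test_window_scale_padding ()
```

Nothing here needs a kernel, a VM or a network: it is an ordinary function tested in an ordinary program.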
So in MirageOS we've come up with several fun (if you like breaking stuff) custom implementations of various things, like network interfaces that always have new packets waiting. A really nice thing is being able to swap out the random number generator that some code might be reading from, so that suddenly all your tests can be deterministic, even when the code isn't cooperating with you and wants to read from some source you haven't set up. In the same vein, you can imagine that entropy sources that always block are interesting. Application code often fails in situations like when the file system is full, or a block device is busy, or it gets short reads from the network, or maybe your DNS has been hijacked and you want to make sure you don't do something stupid. When the implementations you have for these things are just modules, it's really easy to drop in one that's broken in a way you think is cool. And there's more we can do that's maybe a little less obvious. In a conventional operating system, you have a whole bunch of global state, and the way we're used to interacting with it is to open a shell and say, okay, let me poke around in here and see what's going on. In a unikernel, we have access to the state these modules keep track of, and we control how that state is surfaced. So instead of asking ourselves, the application is doing something weird, so what are all the things in the massive set of global state in our operating system that might be affecting how it behaves, we can have the operating system tell us what state it's in and, even more interestingly, why it's in that state.
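The swap-in idea can be sketched with functors: application code depends only on an interface, and a test instantiates it with a deterministic or deliberately broken implementation. RANDOM, STORE, Seeded_random, Always_full, Port_picker and Saver are hypothetical names for illustration; Seeded_random uses the standard library's Random.State, which gives the same sequence for the same seed on a given OCaml version.

```ocaml
(* Hypothetical device interfaces an application might depend on. *)
module type RANDOM = sig
  val int : int -> int (* integer in [0, bound) *)
end

module type STORE = sig
  val write : string -> string -> (unit, [ `Disk_full ]) result
end

(* Deterministic entropy for tests: same seed, same sequence. *)
module Seeded_random (S : sig val seed : int end) : RANDOM = struct
  let state = Random.State.make [| S.seed |]
  let int bound = Random.State.int state bound
end

(* A deliberately broken store that is always full. *)
module Always_full : STORE = struct
  let write _key _data = Error `Disk_full
end

(* Application code under test, written against the interfaces. *)
module Port_picker (R : RANDOM) = struct
  (* choose an ephemeral source port in 49152..65535 *)
  let pick () = 49152 + R.int (65536 - 49152)
end

module Saver (S : STORE) = struct
  let save key data =
    match S.write key data with
    | Ok () -> "saved"
    | Error `Disk_full -> "disk full, keeping data in memory"
end
```

For example, two copies of Port_picker built on Seeded_random with the same seed pick identical ports, and Saver (Always_full) exercises the disk-full branch without ever filling a real disk.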
We did some experiments using Irmin, a distributed data store that presents Git-like interfaces, to see: what if every change we made to the operating system's state had a commit message? Everything that wanted to change anything about this state would have to tell us why, and we'd get a log of the reasons. And if we wanted to change it ourselves, we could commit to it, say why we did it, and then see what changes in the running OS and in our running application. Another interesting thing: I mentioned that we have schedulers we can swap out. Well, what if our scheduler left traces of what it was doing? Thomas Leonard, in our project, wrote a really nice visualizer for all the decisions the cooperative multitasking scheduler makes in a MirageOS application. It saves them to a shared memory buffer, and then you can examine them interactively in two dimensions to see, okay, this thread failed, why? And a couple more things. You might say, okay Mindy, you've convinced me that I don't need the operating system anymore, but I still need a hypervisor to do this, and a hypervisor isn't a tiny bit of code either. That's a pretty big dependency as well. So what do you think about that, smart guy? And I'd say, well, we can do stuff with hypervisors too. There's a really interesting project called Solo5, which has a component, ukvm, that takes the device requirements we surface when writing these configuration documents for MirageOS unikernels and says, oh, okay, you said you don't need a network device, so I'm not surfacing anything about network devices to you. As far as you know, this hypervisor doesn't know what a network is. There's a link to that project on the end slide; I think it's a very interesting project and very cool. And we have more ongoing work.
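The commit-message idea can be illustrated with a toy in-memory structure in which every change to a piece of state must carry a reason. This is not Irmin's API and has no Git-style storage or branching; tracked, set, why and history are hypothetical names chosen for the sketch.

```ocaml
(* Hypothetical sketch: a piece of OS state whose every change must carry
   a message saying why, kept as a commit-like history. Not Irmin's API. *)
type 'a commit = { value : 'a; message : string }

type 'a tracked = {
  mutable head : 'a commit;      (* current value and why it is so *)
  mutable past : 'a commit list; (* older commits, newest first *)
}

let create ~message initial = { head = { value = initial; message }; past = [] }

let get t = t.head.value

(* Why is the state what it is right now? *)
let why t = t.head.message

(* Changing the state is impossible without an explanation. *)
let set t ~message value =
  t.past <- t.head :: t.past;
  t.head <- { value; message }

(* Full history of reasons, oldest first. *)
let history t = List.rev_map (fun c -> c.message) (t.head :: t.past)
```

Asking why on, say, an ARP cache then answers the "why is it in that state" question directly, instead of leaving it to be reconstructed from global state.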
There's work on opening up more hypervisors so that you can run more unikernels on more desktops, and some work on taking these libraries and using them outside the context of library operating systems. Nothing says you have to use them only in this context; you can use them wherever you want, to do whatever you want. We've been doing a lot of interesting work with Qubes OS. Qubes OS is a desktop operating system; it's running on my laptop right now and powering the slides before you. Its conceit is that you run a whole bunch of different virtual machines to enforce separation of concerns between the different things you're doing. A problem in Qubes OS is that if you do all of this with conventional operating systems, you need a lot of disk space, which they solve in part by doing some sharing, and a lot of memory to run all those VMs. So it's really nice to have something smaller, both in size and in memory consumption. We've had a whole bunch of small projects to replace bits of Qubes OS that run on big, fat Linux VMs with really tiny unikernels. The most widely used and successful one has been qubes-mirage-firewall, which I'm using right now. I replaced a Fedora VM that needed, I think, around 500 megabytes of memory with a unikernel that needs 32 megabytes. We have a whole bunch of other projects ongoing right now; if you're interested and want to see the slide later, I can pull it back up for you. But your idea can go here. There's plenty of interesting stuff to do with library operating systems and unikernels, there's plenty of space here, and I'm interested in hearing your ideas. I think we have maybe enough time for a question. But thanks a lot for your time. Thank you very much, Mindy.
We actually have time for one question, and we're going to pick the one from the signal angel via the internet, because those guys can't catch up with you later on. No question from the internet? Then we can have one question at microphone one if you're quick. Have you explored using capability-based languages? For example, there's a project called Emily, which is a capability-based version of OCaml that provides more isolation between processes. You can reason more easily about what parts of your application can use what state. I personally haven't. I had heard of Emily, but I didn't realize that it had that interesting feature. So I should, for sure, look at it. Thank you. OK, that's all the time we have right now. If you have more questions, please catch up with Mindy outside the lecture hall. And thank you again, Mindy, and give her a warm round of applause.