So the microphone's turned on. Let me get rid of this. Yeah. So now I'm going to talk about CloudABI for FreeBSD: how does it work? I'll briefly explain how this talk came to exist. The deadline for submissions for the BSD devroom was before the submission deadline for the main track, I think. I wanted to get into the main track, but I still wanted to have a backup plan. So I made two submissions and both of them got accepted. They both had exactly the same abstract. So I was planning to drop one of them, but then Rodrigo poked me hard enough when he said, no, you have to give two talks. So this is why I've spun off a bit, and this is about a different topic. During the main track, I gave a talk about CloudABI itself, and now I'm going to explain how it works on FreeBSD. What's going to be interesting about this talk is that I don't have any slides. One of the reasons for that is that while I was preparing this talk and figuring out what to put in slides, I noticed that I wanted to make so many demos and play around in the terminal so often that I decided to just stick to this. So for the people who have seen Brooks's talk earlier today, with those really nice flame charts: you're not going to see any of those today. It's not as fleshed out as his talk. So, who has seen Brooks's talk earlier today about Hello World, or has seen it before? A couple of people. Who has seen my talk about CloudABI during the main track? Only a couple of hands. So first, I'm going to give a really brief introduction to what CloudABI is. It's basically a programming environment, similar to the C programming environment on Linux, FreeBSD, etc., but it's stripped down in such a way that you can use it to build programs that are really strongly sandboxed. So if you just stick to the APIs that are provided by this runtime environment, then in theory, you should end up with a program that is really easy to sandbox. So I'll give a demo of how it works.
This was a demo that I wanted to show during my main talk, but unfortunately I didn't manage to because of time constraints. So what I've done is written this tiny web server in C, and there's this configuration file. I'll first show the configuration file, because it shows how the program is going to be started up. You could think of it as the replacement for command line arguments, in a certain way: it describes, we want to start this web server, and it needs to be started in this way. So this is the entire configuration: it's a YAML file with a couple of key-value pairs in it. First we're saying we need to have a socket, and the socket needs to be this file descriptor bound to a TCP port, on which the web server can receive its incoming TCP requests. Then we have a file descriptor where we want to write out log file entries, and then there is this fixed response message that the web server needs to print. So this is the entire config file for a web server. Then there is the C file over here that implements this web server, and I'll zoom out a bit, because the font size should be good enough for you guys. At the top there is this chunk of code, and this is all we need to put in a source file to parse this YAML file, or extract the values out of it. So you see that we're extracting the socket, logfile, and message attributes from the YAML. Then here at the bottom, we have the main loop of the web server, where we're using the socket that was specified in the config file, and for every request that we get, we print a log message and we write a response back to the browser. So I can show you what happens if I run it. There's this launch utility called cloudabi-run, where we say we want to run this executable and specify this configuration file.
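The demo's structure (an inherited socket, a log descriptor, and a fixed message from the config file) can be sketched in plain C. The helper below is hypothetical, not the actual demo code; it just shows the kind of fixed HTTP response the `message` value from the config file would turn into.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: format the fixed response that the config
 * file's "message" value would be served as. In the real demo, the
 * main loop would accept() on the inherited socket descriptor, log a
 * line to the log descriptor, and write this buffer back. */
static int build_response(const char *message, char *buf, size_t buflen) {
    return snprintf(buf, buflen,
                    "HTTP/1.1 200 OK\r\n"
                    "Content-Type: text/plain\r\n"
                    "Content-Length: %zu\r\n"
                    "\r\n"
                    "%s\n",
                    strlen(message) + 1, /* body is message plus '\n' */
                    message);
}
```

The point of the CloudABI version is that the socket and log descriptors come from `cloudabi-run`, so the program never opens anything itself.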
The reason I'm passing in this -e flag over here is that normally on FreeBSD you don't need to do this, but on macOS I'm using a stock operating system that's not capable of running CloudABI software. So I'm passing in the -e flag to turn on emulation. If I run this, I get this big warning message saying that emulation is not ideal and not secure, but if I now open up the browser and go to localhost:12345, you actually see that this web server is serving requests. So now I can just refresh this page and you see requests ending up at the web server. And basically the idea behind CloudABI is that because we have this policy file, or configuration file, that explains how the program needs to be started up, cloudabi-run can make sure that only those resources are available to the program. So only the socket and the file descriptor of the log file are available to the process, nothing else. If an attacker were to take over this application, that person couldn't do anything other than interact with the socket and the log file descriptor. So you could even run this web server as root, in theory; it doesn't matter, because the program is constrained to its file descriptors at all times. So that's the five-minute introduction to what CloudABI is. It's nothing more than this: a constrained programming environment that allows you to build these sandboxed programs. The goal of today's talk is to show you how it works on the FreeBSD side, or if you're developing programs on FreeBSD: what are the modifications I had to make to FreeBSD and to some of the related components to actually get this to work? This talk has two target audiences. The first group is people who love CloudABI and want to know more. The second group is people who hate CloudABI and want to know how to design something like that themselves. That's the trade-off.
So the first thing we're going to take a look at, on this FreeBSD system on which we can also run CloudABI programs, is: how does development of CloudABI programs work? I'll clear this terminal because I don't want to spoil anything. So on this system, here is a simple source file. I'll resize this virtual machine a bit because some of it is going off-screen. There is this cross-compiler that you can invoke like this. So instead of calling cc to build your programs, you can use the CloudABI cross-compiler. This can be used to build software, and now we're first going to reverse engineer this compiler. How does this compiler work? What's so special about it? This compiler is actually provided by a package called cloudabi-toolchain. If we take a look at its contents, you see that this package installs this bunch of files. Basically, for every architecture that CloudABI supports, namely ARMv6, ARM64, and x86 32- and 64-bit, it installs the same files: a cross product of all of the development tools. So this is pretty nice. You install a single package and you end up with a compiler for all of the architectures in one go. There's no need to install four copies of GCC and binutils, etc. So now we're going to take a look at those binaries: where do they come from? When we run ls on one of the compiler executables, we see that it's actually a symbolic link over to LLVM 3.9. So this is actually the compiler binary provided by the stock FreeBSD LLVM 3.9 package. That's actually maintained by Brooks, right? Yeah, you also maintain the stable versions of the compiler packages. So now I'm going to spend the next couple of minutes browsing through the LLVM source tree and showing you what kind of changes I've made. So, one of the first things you'll notice is what happens if you run some of the development tools on the CloudABI executables.
So now we're going to run readelf and take a look at what a CloudABI executable looks like. It turns out a CloudABI executable is nothing more than an ELF program. It's the same executable format as what you see on Linux, FreeBSD, Solaris, etc. But there are two things that are different compared to FreeBSD, or at least two things. First of all, what you see over here: this thing, "unknown 0x11". What happens when FreeBSD starts programs, for the people who've seen Brooks's talk, is that there is an image activator in the kernel that loads up executables and determines whether it can start those programs. It has a separate loader for ELF and one for a.out, etc. And one of the things that FreeBSD's loader does is check this OS/ABI field to see whether it's actually an executable meant for the operating system itself. So if you run an OpenBSD executable on FreeBSD, FreeBSD will just say: sorry, I don't understand this. This is for the wrong operating system. Or: you're running the wrong operating system, as the OpenBSD people would say. Also, one of the interesting things, and we'll talk about this a bit more later on, is over here: it says ET_DYN, even though for FreeBSD executables this would say ET_EXEC. CloudABI uses position-independent executables exclusively. So they are not loaded at a fixed memory address; the operating system can decide where to load them in memory. And the reason for this is that it's needed to run those programs on macOS, where we don't have a choice about where we can load our program into the address space, because that's something that's already decided, or forced upon us, by macOS. So that's the reason why CloudABI uses position-independent executables: so that the emulator can deal with that properly. So now we're going to take a look at the LLVM source tree.
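The two header fields being discussed can be checked in a few lines of C. The struct below is a trimmed-down stand-in for the real `Elf64_Ehdr`, assuming only the standard `e_ident` byte at index `EI_OSABI` and the `e_type` field:

```c
#include <assert.h>
#include <stdint.h>

#define EI_OSABI            7
#define ELFOSABI_CLOUDABI   17   /* the 0x11 that readelf prints as "unknown" */
#define ET_EXEC             2    /* what a FreeBSD executable would say */
#define ET_DYN              3    /* CloudABI executables are always ET_DYN */

/* Trimmed stand-in for the start of an ELF header. */
struct elf_ident_view {
    unsigned char e_ident[16];
    uint16_t      e_type;
};

/* Mimics the check an image activator would do: the right OSABI, and
 * a position-independent (ET_DYN) image. */
static int looks_like_cloudabi(const struct elf_ident_view *hdr) {
    return hdr->e_ident[EI_OSABI] == ELFOSABI_CLOUDABI &&
           hdr->e_type == ET_DYN;
}
```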
I'm going to switch over to a different window. So this is on my Mac, and I'm going into the LLVM source tree. I'm going to show you a couple of source files in which I had to make modifications. So if you ever want to add your own operating system to LLVM, these are the files that you need to modify, and you'll see that it's only a very small number of changes. I mean, if you take a look at what you have to change in the GCC source tree to add one platform, it's such nonsense. It's really: if my operating system, then my operating system, in 100 places scattered throughout the source tree. It's just a mess. So compared with GCC, with LLVM I want to show you that it's actually pretty easy. One of the first files I made modifications to is this one over here. If we search around for CloudABI, you see I added this piece of code to the compiler, which is: if CloudABI, then CloudABI. So again, this is not that interesting. But basically what this piece of code says is: if we're generating executables for CloudABI, then put that 0x11 value in the executable header. If you don't do this, Clang still works, but it generates binaries that have the number 0 in that header field. So if you then start one up on FreeBSD, FreeBSD thinks: oh wait, this is a Linux executable. Then it starts to interpret it completely incorrectly and it crashes. So don't remove this piece of code. Also worth mentioning: there's a central registry where you can request your own numbers. So if you're building your own operating system, you have to contact the people at SCO, or Xinuos as they're called nowadays. You just have to send them an email, and then hopefully within a couple of days or weeks they send you a response saying: yeah, your number is 18, 20, whatever. So that's the process I went through. Another change I had to make.
So I had to make some other small changes to LLVM, but that's about all of it. Most of LLVM could remain as it is. But the most interesting change I had to make is actually in Clang, the C/C++ compiler, and I can show you what that looks like. If you're ever adding an operating system to this compiler, you probably want to add a class here as well, in a file called Basic/Targets.cpp. This is where you add your own tiny class; it doesn't need to be really big, and you can use this one as a template. It says: if we're targeting CloudABI, then these are all of the defines that need to be set. So if you're building for FreeBSD and there's this __FreeBSD__ definition, this is actually the class where that's coming from. For every operating system, this source file has a class that defines constants like these. So also for OpenBSD, probably. So just add a class over here and it should be good. Then there are two other source files that I needed to modify, in Driver/ToolChains, and this is basically where you have to explain to Clang where it can find its header files. What's fairly annoying about operating systems is that they have different conventions for where they store their header files. On FreeBSD, /usr/local/include; on NetBSD, they tend to use /usr/pkg/include, I think. So this is just some logic here and there to make all of that stuff work correctly. There are also some compatibility things: operating systems use different names for their C++ library, so you also need to add this function saying, on this operating system we're using these C++ libraries. And that's almost it. There's some stuff related to: should position-independent code be enabled by default? In our case, it should. So there's also a function for that. But that's about it. That's what you need to modify over here.
And then the final source file that I needed to modify was this one, where I'm basically providing a function that describes how the linker should be invoked. In the case of CloudABI, I'm making use of LLVM's own linker nowadays, LLD. And this explains: if you're linking, you need to pass these command line flags to the linker, and if we disable position-independent code, then we need to pass in these flags as well. And that's about it. So a couple of classes, one commit, bam, put it in the LLVM source tree, and any binary of Clang from that moment on actually supports your operating system. There's no need to ship any custom compiler for your target, because this is built into all copies of LLVM; even the one shipped by Apple on my Mac supports CloudABI out of the box nowadays. So that's pretty cool, not a lot of work. This concludes the changes I had to make to LLVM, and now we're actually going to take a look at what I've been doing to FreeBSD to make this work. So, the FreeBSD source tree. For people who are not that familiar with it: all of the kernel source code is stored in a directory called sys. And I'll run this. Oh, is there a way to... yeah, the colors. Yeah, I mean, during the next couple of examples, you won't be bothered by all of these colors. So in the source tree, I've created a couple of directories, and at first it looks a bit messy. These are all of the directories where the CloudABI source code is stored. So at first it may look like complete chaos: there are, what is it, ten directories where CloudABI source code is stored? This is what's needed to make FreeBSD support CloudABI executables. But I'll drill down a bit, and then you'll see it's not as scary as it looks at first. So these directories contain makefiles for building kernel modules.
So if you're not building CloudABI support into the kernel, this is where it ends up: just a couple of makefiles in there. Then these three directories that you see over here, compat/cloudabi, cloudabi32, and cloudabi64, contain the actual C code, the kernel bits that make CloudABI work. And this is architecture-independent, so it doesn't contain anything specific to x86 or ARM. It's divided into a couple of directories: generic code that doesn't depend on pointer sizes, because there are both 32-bit and 64-bit CloudABI executables, and some code that is separate per pointer size, because we need to do different tricks to get a 64-bit read or a 32-bit write to work. That's where this distinction comes in. But most of the source code is actually generic, so that's placed in there. And then the other directories that you see are architecture-specific. On i386, you can run 32-bit executables, but on amd64, you can actually run both 32-bit and 64-bit executables. So that's how this division of directories is made. Finally, there's this directory, contrib/cloudabi, and that is where code ends up that's not actually maintained by the FreeBSD devs, so to speak. A nice thing about CloudABI is that a lot of the code that's needed to add support is generated. We have this one repository where there's a master definition of: these are all the system calls that exist, these are all the data types, for example EINVAL is defined as value 123. All that stuff is written down in this abstract definition, and we have a whole bunch of scripts to turn that into header files that can be used by FreeBSD or by Linux. And these are stored in this contrib directory. So now I'm going to trace, or show you, what happens when you run a CloudABI program: what code is being executed.
So Brooks already mentioned really nicely during his talk that the layer inside the FreeBSD kernel for loading ELF files is called imgact_elf, and there's some trickery in that source code to make it work for both 32-bit and 64-bit binaries. It's actually compiled twice, with a whole bunch of macros that are being expanded. What's really impressive about the ELF layer that's part of the BSD operating systems, especially if you compare it to the crap that's provided by Linux, is that it's built with supporting multiple operating systems in mind. It's built with binary emulation in mind. The Linux layer is so incredibly stupid. What it does is check the first couple of bytes of the executable and say: this is an ELF file, so this must be a Linux ELF file. And it starts to run it like a Linux program. So if you copy a FreeBSD executable over to Linux and you run it, it actually loads it into memory, goes to the first instruction, and starts running it. And then it explodes massively, because you're starting this binary in an environment that's completely wrong for what it was expecting. But FreeBSD doesn't do it this way. It has this function in imgact_elf.c, called elf64_insert_brand_entry, and it's over here. This function can be used to explain to the ELF layer: hey, I want to register an environment. I want to register FreeBSD binaries, or I want to add support for Linux binaries. That's the function you call to register your own brand. So now I'm going to show you how the CloudABI layer calls this and puts an entry in there, and what this brand specification looks like, because that's actually really interesting. It gives you a really good insight. So now let's take a peek at compat/cloudabi64/cloudabi64_module.c. At the very bottom of the C file, there's this function called cloudabi64_modevent.
And this is the standard function that gets called by FreeBSD when you're loading or unloading a kernel module. So if you run kldload cloudabi64 to load 64-bit support into the kernel, this is the piece of code that gets run. And you actually see that over here, on this line, is where we call this elf64_insert_brand_entry function, and this is where we register our brand. If it fails, well, then the kernel module can't be loaded properly. So now what we're going to look at is this thing, because this is the most interesting part: it specifies the brand. And this thing is defined in another source file, unfortunately. Yes, it's over here. This is where cloudabi64_brand is. So this is the instruction to the ELF layer; it explains which binaries we expect. In the case of CloudABI, it's actually pretty simple. What we say is: when you see an executable that has that 0x11 in it, so again, this ELFOSABI_CLOUDABI, then this should be matched, we should run this. The same holds for the machine architecture. If you copy over a CloudABI executable for SPARC or MIPS, then you shouldn't be running it. That doesn't work. You don't want to load it into memory, jump to the first instruction, and let it explode in your face. So that's why there's this check: you can set this field to say, be picky about the architecture, and we're only matching this one. For all of the machine-specific files, we use different constants there, of course. And then this one says: this is an environment that can run position-independent executables. This is, I think, not set on FreeBSD, or is it? I don't know. But this needs to be set to run position-independent executables. And then finally, this is the most important one.
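The matching logic just described can be sketched like this; the `struct brand` below is a simplified stand-in for FreeBSD's real `Elf_Brandinfo`, with made-up table contents, showing only the "be picky about OSABI and machine" part:

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

#define ELFOSABI_CLOUDABI 17
#define EM_X86_64         62
#define EM_AARCH64        183

/* Simplified model of the brand table: each registered brand says
 * which (OSABI, machine) pair it is willing to run. */
struct brand {
    unsigned char osabi;
    uint16_t      machine;
    const char   *name;
};

static const struct brand brands[] = {
    { ELFOSABI_CLOUDABI, EM_X86_64,  "cloudabi64" },
    { ELFOSABI_CLOUDABI, EM_AARCH64, "cloudabi64" },
};

/* The ELF layer walks the registered brands; no match means the
 * kernel refuses to start the binary instead of letting it explode. */
static const struct brand *match_brand(unsigned char osabi, uint16_t machine) {
    for (size_t i = 0; i < sizeof(brands) / sizeof(brands[0]); i++)
        if (brands[i].osabi == osabi && brands[i].machine == machine)
            return &brands[i];
    return NULL; /* e.g. a SPARC or MIPS CloudABI binary */
}
```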
What we say is: whenever you decide that the executable is of the right format and go ahead and start it, this is the structure describing how it should be run. So the brand applies to loading the executable and determining: should we run this? And this structure over here that's being pointed at describes: if we decide to run it, how should we do it? That's the logic behind it. Oh, where did it go? OK, let me close the browser. Yeah. So now we scroll up. Where's this cloudabi64_elf_sysvec? Well, it's right above it, so we're in luck. And now we're going to roughly walk through the things that are described in this structure, maybe in slightly random order, and later on we're going to dig through some of them a bit more. So the ones that you have at the top over here are actually quite important. The fun thing is, there are also some NetBSD people sitting in the audience, and I have to confess that NetBSD's version of this structure is a lot more beautiful than ours. So NetBSD people, don't laugh at us. This is what I have to deal with. So yeah, these two fields at the top are quite important: they describe the system call table. CloudABI doesn't have that many system calls; compared to FreeBSD, which has something like 400, CloudABI only has 57, or 55 once we throw out even a couple more. It's just a really tiny environment. And this is nothing more than a pointer to a table of system calls and the number of system calls that are in there. This actually comes from machine-generated source code, so we don't need to worry about how that's being defined. Then there's this function over here, the fixup function. That explains: if we're going to start this process, how should the stack be initialized? This is executed before we actually run any machine code of the program, but it explains, at the very last stage right before starting it, how should I fix up the stack? Pretty bad name.
The sv_name field is, I think, mainly used for debugging. So for example, truss uses it so it actually knows the kind of executable that you're running; also, when you get a kernel panic, you get a stack trace in FreeBSD, and it prints the kind of executable you were running. So it can say: this was a Linux executable that was causing the crash. Don't do that. Then there's also how to write out core dumps. For now, I'm just relying on the standard FreeBSD core dump function, so it generates a FreeBSD-style core dump. Good enough. These fields over here, the minimum and maximum user addresses, specify the minimum and maximum virtual address space that may be used by the process. And you might wonder: why does it even matter? This actually matters for running 32-bit executables on a 64-bit kernel. Because you have the 64-bit address space (well, technically speaking it's 48 bits, but say 64 bits), and you don't want the kernel to decide to start up a 32-bit process and load it all the way at the top of the address space, in a location where the program can't reach it anymore. Because then the program starts to run and the kernel says: yeah, good luck, I've managed to map your executable 40 gigabytes in. And the program itself is like: I can only address up to four gigabytes. So those fields up there control that; you can constrain it: don't go higher than four gigabytes. In our case, we just pick the minimum and maximum addresses. That's good enough. What's below here is the standard stack permissions. On FreeBSD, unfortunately, we're still forced to add the executable permission to the stack. On CloudABI, we don't need to. This one is meant to be used for copying out command line arguments; I'll talk about it a bit later because it's interesting. And this is how the initial registers of the process should be set on startup. Then there are a couple of flags over here.
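The point about address limits can be made concrete with a tiny check. This is an illustrative sketch, not kernel code; `MAXUSER_32` is simply the 4 GiB boundary a 32-bit pointer can address, the kind of constraint the maximum-user-address field encodes:

```c
#include <assert.h>
#include <stdint.h>

/* A 32-bit process on a 64-bit kernel must live entirely below
 * 4 GiB, whatever the kernel's own address space looks like. */
#define MAXUSER_32 0xffffffffULL

/* Would a mapping of `len` bytes at `addr` still be reachable by a
 * 32-bit pointer? Mapping "40 gigabytes in" fails this check. */
static int fits_32bit_user(uint64_t addr, uint64_t len, uint64_t maxuser) {
    return len <= maxuser && addr <= maxuser - len;
}
```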
This one is interesting: SV_CAPSICUM. That one is used by the kernel because, what you want to have, of course, is that when you start a CloudABI executable, it starts running with Capsicum already enabled, so that it is actually running in sandbox mode. If this flag weren't here, then FreeBSD would just start the program up immediately, and it could execute system calls in a way that completely ignores the Capsicum security model that's being used. So that's what that flag does. And these ones, we'll also look into those in a minute: they describe how system call arguments and return values should be extracted from and set in the registers. Are there any questions at this point? Then I think I'll just go on. Yeah, so the first one I'll be looking at is the fixup function. Right now, at first, you might think: why is this useful? But later on you'll discover that this is actually a really important function in our case. Module fixup... yeah, here: cloudabi64_fixup. And what this does (I'll make it a bit smaller, otherwise it wraps around a bit, which I don't like) is initialize the stack of the process. There are a couple of things that are copied out onto the stack. I'll start with this one, which you see at the top of the screen right here. This runs whenever a CloudABI program starts up. CloudABI programs may make use of stack smashing protection, so at startup a program already needs to have a random number, a random seed, so that from that moment on, all of the stack frames can be marked with those random numbers, and they can be checked afterwards to see whether there was a buffer overflow. So what we do first is copy out some random garbage, well, garbage, random numbers, onto the stack, and the program may use that any way it likes. And then what we're doing over here, I'm going to skip for now.
This is the interesting part. Brooks also mentioned it during his talk: auxiliary vectors. The kernel can give a list of key-value pairs over to the program, with all sorts of interesting things that might be useful during the program's execution. A really good example: in CloudABI, there's no special system call for a thread to get its own thread ID. It has to remember that on its own. Why would there be a system call for that? It's a constant: a value that's picked at thread startup, and it doesn't need to be fetched every time. So one of the values that's sent over to user space when the first thread of the program starts is the thread ID, which is placed in the auxiliary vector. Also other constant things, well, I assume they're constant, like the number of CPUs and the page size. It's good enough for now to keep those as constants in there. Some other things that are passed on are the address and length of the argument data, so the command line arguments that are passed in, and also the location of that random stack smashing data. One of the most interesting fields in there, really the most interesting one, is this one. But I won't explain that one yet. I'll keep it a secret. Stay tuned. So now that we've seen how this stack is prepared, this thing is eventually copied out to user space and put on the stack, we're going to take a look at how system call argument fetching works. Going back to this file, there were two functions, namely fetch_syscall_args and set_syscall_retval. So this one, the one that I'm going to show you now, is the function that's invoked every time a CloudABI program performs a system call. How it works is that it gets a so-called trap frame, and a trap frame is a copy of what the registers of the thread looked like at the time it jumped into the kernel.
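The auxiliary vector is just a key/value list terminated by a null entry. A minimal sketch of how a runtime would read it back; the tag numbers here (`AT_TID` and friends) are invented for illustration, not CloudABI's real constants:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tag numbers, for illustration only. */
enum { AT_NULL = 0, AT_TID = 1, AT_PAGESZ = 2, AT_NCPUS = 3, AT_CANARY = 4 };

struct auxv_entry {
    uint32_t a_type;
    uint64_t a_val;
};

/* Walk the key/value list the kernel copied onto the stack until the
 * terminating AT_NULL entry; this is how a thread "remembers" its own
 * thread ID without a system call. */
static uint64_t auxv_get(const struct auxv_entry *auxv, uint32_t type) {
    for (; auxv->a_type != AT_NULL; auxv++)
        if (auxv->a_type == type)
            return auxv->a_val;
    return 0;
}
```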
All of the arguments of CloudABI system calls are stored in registers. The largest system call that we have only has six arguments, so you can fit all of that in registers on amd64. So what you have over here are these six lines in a row where we're fetching the arguments, numbers 0 to 5, from the respective user-space registers. There is some madness going on where the value of R10 actually gets lost, but a backup copy is saved in RCX. So user space writes it in R10, but in this specific code we need to grab it out of RCX. Not that awesome. And then the second function is: how do we set the return value? The system call runs, and after some time it completes; it can either succeed or fail. This piece of code then determines how we set the registers when returning back to user space. If the error number is 0, the system call succeeded. We write the return value: for example, the number of bytes written, in the case of the write system call. Yeah, that's the best example. Or, if you call mmap to allocate memory, the memory address of the pages that you managed to allocate. That's written into RAX, and some system calls have longer return values, so part is also written into RDX. And we clear the carry bit. The carry bit is used as an indicator to keep track of whether a system call succeeded or not. Then, if the system call fails, so we have a non-zero error value, what we do is write the error number into RAX, and we do set the carry bit. So that's basically how user space distinguishes between whether something succeeded or not. It invokes a system call, and later on, when it gets back to user space, it checks the carry bit to see: did we succeed or did we fail? That's the logic in there.
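The success/failure convention just described is easy to model in plain C. A sketch with a minimal stand-in for the trap frame; `set_syscall_retval` mirrors the kernel-side logic and `syscall_failed` the user-space check:

```c
#include <assert.h>
#include <stdint.h>

/* Minimal model of the relevant parts of the trap frame. */
struct regs {
    uint64_t rax, rdx;
    int      carry; /* stands in for the CPU carry flag */
};

/* Kernel side: success puts the return values in RAX/RDX and clears
 * carry; failure puts the error number in RAX and sets carry. */
static void set_syscall_retval(int error, uint64_t r0, uint64_t r1,
                               struct regs *tf) {
    if (error == 0) {
        tf->rax = r0;
        tf->rdx = r1;
        tf->carry = 0;
    } else {
        tf->rax = (uint64_t)error;
        tf->carry = 1;
    }
}

/* User-space side: the carry bit says whether RAX holds a result or
 * an error number. */
static int syscall_failed(const struct regs *tf) {
    return tf->carry;
}
```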
So now that I've shown you how system calls are invoked, how we fetch the arguments and how we deal with the return values, it's also important to show you what an implementation of a system call looks like, because without that this would be really out of context. So in this specific file, syscalls.master... contrib/cloudabi... yeah, syscalls64.master. This is the list of CloudABI system calls. If you go to the bottom, well, it starts counting at zero, so there are 57 in total. These are just C prototypes of all of the system calls. FreeBSD has a bunch of these files, not only for CloudABI but also for FreeBSD itself: syscalls.master, an automatically processed list of system calls. But what makes CloudABI a bit unique, compared to how FreeBSD itself works, is that the generator we use for this file has logic built into it, which Maurice, in the back, wrote, to distinguish between system calls that depend on machine-dependent data structures and ones that don't. A really good example: the fd_dup system call, which is equivalent to the dup system call in Unix, doesn't depend on any machine-dependent types. A file descriptor goes in, a file descriptor comes out. Nothing special about it. So this one is called cloudabi_sys_fd_dup. But then right underneath it, there is this annoying system call, fd_pread. And this one does depend on machine-dependent structures: you pass in this list of iovecs, and the layout of the iovec structure depends on whether you're using a 32-bit or 64-bit system. So the scripts that Maurice wrote make this really nice distinction between them, and try to keep system calls generic as much as possible. In the case of CloudABI, I think 48 of them are pointer-size independent, so to speak, and 9 or 10 or so are pointer-size dependent.
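The pointer-size problem with iovecs is visible in plain C: the same logical structure has an 8-byte and a 16-byte layout. A sketch, with hypothetical struct names, of the kind of conversion a 64-bit wrapper would do before handing off to generic code:

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* The pointer-size-dependent part: the same logical structure has two
 * layouts, so calls like fd_pread and fd_write need per-pointer-size
 * wrappers. Names are illustrative, not CloudABI's exact ones. */
struct cloudabi32_iovec { uint32_t buf; uint32_t buf_len; };
struct cloudabi64_iovec { uint64_t buf; uint64_t buf_len; };

/* "Native" view used by the generic code, like the iovecs inside a
 * struct uio. */
struct native_iovec { void *base; size_t len; };

static void convert_iovec64(const struct cloudabi64_iovec *in,
                            struct native_iovec *out, size_t n) {
    for (size_t i = 0; i < n; i++) {
        out[i].base = (void *)(uintptr_t)in[i].buf;
        out[i].len  = (size_t)in[i].buf_len;
    }
}
```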
So in case 128-bit CPUs ever appear, or we want to port this over to 16-bit CPUs, run it on 286s even, then we only need to implement those nine or ten missing system calls, the pointer-size dependent ones. All of the pointer-size independent ones we can just reuse. So that's pretty awesome. If you look at the compatibility layer for running 32-bit programs on FreeBSD, it's a bit clunky. We had to make this hand-written syscalls.master file where we're manually determining whether stuff is pointer-size dependent or not, and in some cases I think we might even get it wrong. In this case it's actually really structured, because we use scripts to solve this problem for us. So now I'm going to show you what an implementation of such a system call looks like. So if I go into compat, cloudabi, cloudabi_mem, and, yeah, mem_map: this is what the implementation of mem_map looks like. mem_map is CloudABI's equivalent of the BSD mmap. What we do is convert the flags that you pass in: if you use CloudABI's MAP_ANON flag, we translate it to FreeBSD's, and it just goes on like this. And then in the end we just call into FreeBSD's own mmap, and that does the rest for us. So these wrappers look a bit like this. Then I can also show you that here in cloudabi64, cloudabi64_fd, fd_write is a good example. This is a machine-dependent system call because of the iovec structures. You see that this is actually a two-stage rocket: in the first part, we convert the 64-bit CloudABI iovecs into a native format, a struct uio, as you see over there. We then invoke the generic counterpart of that function and free everything up afterwards. So these wrappers are just like this. For those 50-odd system calls it's, I think, something like 4,000 lines of code to get all of them working.
This sort of summarizes what the kernel-space code of CloudABI looks like. It's nothing more than this. There's actually one thing where CloudABI differs wildly from FreeBSD, and that is, when I was showing what the fetch and set-return-value functions looked like, I didn't want to spoil one tiny detail, and this is what we're getting at right now. CloudABI programs don't invoke system calls by using hardware instructions. Normally on FreeBSD, a system call stub does something along the lines of: move the number 12 into the register, I want to invoke system call number 12, then execute the syscall instruction, and when we're done with that, do a jump-on-carry to set the error number in libc, and ret. Something like that is what a system call on FreeBSD normally looks like. But for CloudABI, I didn't want to put this stuff in user-space programs, and the reason is that I want to be able to run these programs on macOS as well. If CloudABI programs actually executed this kind of instruction on macOS, they wouldn't call into the emulator; they would call into macOS itself. And that's pretty hard: you can't easily trap system call instructions from within user space. So what I've done instead, and I think it's actually a pretty nice solution, is I've written a so-called vDSO, a virtual dynamic shared object. And I can show you what it looks like here. It's this assembly file that's about 500 lines big, and it has that syscall invocation in there. So what happens is that the kernel has this really tiny shared library included, only two or three kilobytes big, and when the process starts, it gives that library to the program in user space and says: whenever you want to invoke a system call, do it through these functions. So what's really nice about this approach is that we can actually also dynamically add and remove system calls.
If there's some system call we don't like anymore, we can just remove it from the kernel entirely. The program in user space will start up and notice that it's not part of the vDSO that was pushed in by the kernel, and then it simply knows from the start that that system call is absent. It's not like we eternally need to keep track of that system call in kernel space, saying: system call 48, that one was reserved for this function, we need to keep it there forever. So it really makes the coupling between user space and kernel space a lot looser, and allows us to dynamically add or remove things. Also, and this is pretty awesome, it's not there right now, but if we ever added support for symbol versioning, you could change the prototypes of system calls entirely: we could have a vDSO that exports two different versions of the same system call under the same name. So programs in user space, in theory, have no understanding of how to invoke system calls, how to switch over to the operating system. The only thing they do is get a library from kernel space and call into it, and they hope that library does the right thing. And that's the reason why emulation is also so quick when you're running on macOS: there are no traps or complex hardware instructions. It's really the program working together with the emulator, in a certain way. The emulator just says: here are some stubs that you need to call into, and I'll translate them to macOS system calls. So this is really what's quite different between FreeBSD native programs and CloudABI. This concludes what I wanted to tell about the kernel-space side of FreeBSD, and now I want to show you how things look on the user-space side. So when a CloudABI program starts up inside the C library, what's going on there?
So Brooks already mentioned during his talk that on FreeBSD there is a function called _start, and that's the entry point that's called by the kernel when the program starts up. CloudABI also has a _start function, but it's implemented in a different way. So now I'm going into the C library, into crt0, and I'm going to show you what our _start function looks like. What happens is that once the program starts, the kernel invokes it with a single argument. It just makes sure that the registers are set up correctly; you know, there's a setregs function that I briefly hinted at earlier on. And the kernel makes sure that this _start function is invoked with a single argument, namely the auxiliary vector. So the _start function immediately gets access to these key/value pairs of all of the interesting attributes that we want. And then what happens if you scroll down? You see that there's this ugly loop in here that extracts all of the things from the auxiliary vector that we're interested in. So it extracts its own thread ID, et cetera. And this field that I didn't want to spoil earlier on, this is actually the base address where the kernel stored the vDSO. So it loaded the program into memory, and next to it, at some place in the address space, it also loaded that tiny assembly-written library. And this is the base address of that library. This is the kernel explaining: if you want to invoke system calls at some point in time, you need to go through this library over there. So after going through this loop, this is also where the madness starts. CloudABI executables are position independent, at least on AMD64 and ARM64, because on other architectures it's not feasible to do. But this is where the insane thing happens. What happens is that programs can't be made fully position independent.
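The loop over the auxiliary vector boils down to something like the following sketch. The entry layout and the tag values here are placeholders, not CloudABI's real ones; only the scanning pattern is the point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder auxv tags; the real values come from the ABI definition. */
#define AT_NULL         0
#define AT_TID          1
#define AT_SYSINFO_EHDR 2

typedef struct {
    uint32_t a_type; /* which attribute this entry carries */
    uint64_t a_val;  /* its value */
} auxv_t;

/* Sketch of the loop in _start: walk the key/value pairs the kernel
 * passed in until the terminator, pulling out the fields we care
 * about, among them the base address at which the vDSO was mapped. */
static uint64_t find_vdso_base(const auxv_t *auxv) {
    for (; auxv->a_type != AT_NULL; ++auxv)
        if (auxv->a_type == AT_SYSINFO_EHDR)
            return auxv->a_val;
    return 0;
}
```

Everything _start learns about its environment (thread ID, vDSO base, and so on) comes out of exactly this kind of scan.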
There are some things that can't be stored in the executable in such a way that they're completely position independent. If you have a global variable that's a pointer to some other global variable, and that's pointing to yet another global variable, those kinds of things can't be filled in at compile or link time. They really need to be filled in at runtime, when the program starts up. So the crt0 file is written in such a way that it doesn't use such constructs itself. It can run in this sort of crippled environment where we've loaded the executable at the wrong memory address, so to speak. It's completely relocation-free, but there is a tiny relocator in there. What happens is that in the headers of the executable there's this huge list that says: if this executable is moved to a different place in memory, then you need to make sure that you patch up these memory addresses to point at these other variables in the program. Those are called relocations. So this is a relocator; a relocator applies relocations to make sure that the program can be relocated. This is the start of what we're seeing here. What we're doing is keeping track of the so-called dynamic section, and the dynamic section is also where the relocations are stored. And we're also keeping track of this RELRO header. And this is really funny: there are certain pieces of memory that need to be writable to apply the relocations, but once you're done relocating, you can make them read-only again. So we keep track of this information so we can make those pages read-only afterwards, once we're done relocating. Then thread-local storage, I won't explain too much about that right now. Going a bit further, we're going to extract the list of relocations.
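What applying such a relocation amounts to can be sketched in a few lines. This models an R_X86_64_RELATIVE-style fixup in miniature; the structure layout and the word-indexed "image" are simplifications for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t r_offset; /* where in the image to patch */
    int64_t  r_addend; /* value relative to the load base */
} rela_t;

/* Each relocation entry says: "at offset r_offset, store the address
 * base + r_addend".  Here "image" stands in for the process image,
 * indexed in 64-bit words to keep the sketch simple. */
static void apply_relocations(uint64_t *image, uint64_t base,
                              const rela_t *rela, size_t n) {
    for (size_t i = 0; i < n; ++i)
        image[rela[i].r_offset / sizeof(uint64_t)] =
            base + (uint64_t)rela[i].r_addend;
}
```

Once this loop has run, every global pointer-to-global in the program points at the right place, regardless of where the kernel loaded the executable.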
And then over here is the loop where we're applying them. Only on x86-64 and ARM64 do we have support for position-independent executables, and these are the only relocation types that we need to take care of. There's a whole bunch of different types of relocations, but in our case these are the ones we need to apply. So the only thing it does is, for a whole bunch of memory addresses in the program, change the value to be the base address of the program plus the offset that was requested. This is the patching-up of the program that's being done. Once that's done, the program is in a sane state. All of this code and everything that comes above runs in this really crippled state where the program is only half functioning, because it's loaded at the wrong address. This piece of code tackles that, and now things are starting to become sane. Well, we're only halfway there, because we still can't invoke any system calls. That's still madness. All of the code you saw above there can't allocate any memory, it can't do anything; it's really running in this really constrained environment. So now we're going to make system calls work. How does that work? CloudABI programs have a built-in system call table. It's an array of function pointers where the system calls are placed, and by default it's initialized with a function that does nothing more than, which I can show you, return ENOSYS. This is a really simple stub system call: whenever you call a system call that the kernel doesn't implement, it only returns ENOSYS. And now what we're going to do is, well, this is some legacy garbage that you should ignore, so let's remove this. You didn't see this. Also this TODO entry can be removed entirely. Yeah, see? This is what I want to show. It invokes this link_vdso function.
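The table-of-stubs idea looks roughly like this. The error value and the function names are invented for the sketch (the real table is generated from the system call list); the point is that every slot starts as an ENOSYS stub until the vDSO fills it in.

```c
#include <assert.h>

/* Assumed error value; the real ENOSYS number comes from the ABI spec. */
#define CLOUDABI_ENOSYS 52

typedef int (*syscall_fn)(void);

/* Default stub: calling any system call the kernel doesn't provide
 * simply fails with ENOSYS, with no kernel involvement at all. */
static int sys_enosys(void) { return CLOUDABI_ENOSYS; }

/* Stand-in for a real entry point exported by the vDSO. */
static int sys_fd_dup_stub(void) { return 0; }

static syscall_fn syscall_table[3] = {sys_enosys, sys_enosys, sys_enosys};

/* Sketch of what link_vdso() achieves: overwrite exactly the slots the
 * kernel's vDSO actually exports, leaving the rest as ENOSYS stubs. */
static void link_vdso_sketch(void) {
    syscall_table[1] = sys_fd_dup_stub;
}
```

This is why removing a system call from the kernel is painless: its slot just keeps the default stub and the program sees ENOSYS.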
And this is just a tiny piece of code that gets the shared library that the kernel passed in. It gets it from the AT_SYSINFO_EHDR entry, and it walks over the ELF headers, looking for the symbol table, as I said over here. Then eventually we loop over all of the symbols in that library that the kernel pushed into the program, and we look for everything that starts with cloudabi_sys_. Those are the system call functions that the library provides, and we patch them into that table with this ugly loop over here. So once link_vdso is done, now we can invoke system calls: we have an array in the program that contains function pointers to the proper implementations of the system calls. Now, here's a remnant of the relocator: if relocation for some reason failed, only at this point in time can we abort. Before, we couldn't abort, because we didn't have an abort system call yet. So, ah, it's a bit messy. And again, that RELRO stuff: because relocation is now finished, we can mark some memory read-only. So this is the first system call that a CloudABI program makes: it calls mprotect to make that memory read-only. And now global variables work as well, because we applied the relocations. So we store a whole bunch of stuff that was placed in the auxiliary vector into global variables, so the C library can end up using it. Setting up stack-smashing protection is done over here. Initializing thread-local storage, I won't explain anything about that; that's all voodoo. Initializing pthreads, because we now know our own thread ID. That's something that's needed to make threading work: if you don't know your thread ID, you can't acquire locks, because you have to write your own thread ID into a lock to say, I'm the thread that owns this lock. So this is some stuff needed to make pthread_join work, and this is some stuff needed to make mutex locking work.
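The symbol scan itself hinges on a simple prefix match. A minimal sketch, leaving out the actual ELF header walking:

```c
#include <assert.h>
#include <string.h>

/* Sketch of the test at the heart of link_vdso(): every symbol in the
 * vDSO whose name starts with "cloudabi_sys_" is treated as a system
 * call entry point and patched into the function pointer table. */
static int is_syscall_symbol(const char *name) {
    static const char prefix[] = "cloudabi_sys_";
    return strncmp(name, prefix, sizeof(prefix) - 1) == 0;
}
```

Anything else the vDSO happens to export is simply ignored by the loop.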
Global constructors. Brooks also explained this during his talk: a simple loop to call the constructors. And then we call into program_main, which was that alternative entry point that uses the YAML data passed into the program. Now, there are still eight minutes left. Yeah. What should I explain next? I've got something, I can strip it down; I've got something that takes about nine minutes to explain, so, yeah, I'll shorten it a bit. So, there was also the question, with that web server I showed you, the demo web server: I showed you the config file, where you have both config attributes and file descriptors in there. It's this magic tree of strings, integers, Booleans, dictionaries, et cetera, but it also contains file descriptors. And the question is, how does this get passed on to programs? How does it work under the hood? How can the kernel copy one tree of stuff over to the other program? Well, what I can show you is that in the C library, first of all, there is this API called argdata.h, and these are functions that can be used to create this tree structure. What you can do is say: okay, let's create a string, and let's create a floating-point number, and let's create a file descriptor, and then later on we place them into a list, a YAML-style list. So you end up with this argdata object, argdata_t, and that represents a list containing the string, the floating-point number and the file descriptor. And this is what a program that wants to start another CloudABI program invokes: first it constructs this tree of its arguments in memory. And what happens then is there is this program_exec function, and that's the function you use in user space if you want to execute a different CloudABI program.
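A toy model of such a tree helps make the next step concrete. The real argdata_t from argdata.h is an opaque, much richer type; this self-contained stand-in only illustrates how one structure can mix plain values with file descriptors.

```c
#include <assert.h>

/* Toy model of an argdata-style node: a tagged union that can hold a
 * string, a floating-point number, or a file descriptor. */
typedef enum { AD_STR, AD_FLOAT, AD_FD } ad_kind;

typedef struct {
    ad_kind kind;
    union {
        const char *str;
        double f;
        int fd;
    };
} ad_node;

/* Count the file descriptors in a flat list.  The serializer needs
 * this kind of walk to build the descriptor array it hands to the
 * kernel alongside the binary blob. */
static int count_fds(const ad_node *list, int n) {
    int c = 0;
    for (int i = 0; i < n; ++i)
        if (list[i].kind == AD_FD)
            ++c;
    return c;
}
```

The interesting property is exactly this mixing: descriptors sit in the same tree as ordinary configuration values.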
So you say: I've got a file descriptor of a program, the new program I want to start, like the CloudABI web server, and I want to start it with these arguments. That's what this call does. And how that works in practice is actually, I think, pretty cool. program_exec, let's take a look at it in the CloudABI C library; this is where the magic comes in. How can we get this tree transferred over to the other side? It decomposes it into two things. First of all, it serializes the entire tree into a binary blob: all of the strings, integers, Booleans, dictionaries, lists, they're all turned into a binary YAML, in a certain way; a really compact representation. But then we also need to deal with the file descriptors that need to be sent over to the other side. So what happens is that this serialization function splits the two apart: it serializes all of the non-file-descriptors, and puts all of the file descriptors in an array. And the serialized data, the one chunk of data, has references to those file descriptors. It says: of the file descriptors that we extracted out of the data, the first one corresponds with the socket, the second one corresponds with the log file, the third one corresponds with the web server's root. And these two things are then passed on to the kernel separately. The fun thing about this model is that the kernel has absolutely no understanding of argdata_t. The only thing it deals with is a binary blob that needs to be passed on to the process, like: here's all the junk, the strings and Booleans, that the new program needs to have, and here's a bunch of file descriptors that need to be preserved in the other process. And this keeps the kernel, you know, lean and clean.
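The essential trick in that split can be shown in a few lines. This is a sketch under assumed names, not the actual serializer: descriptors never go into the blob by value; each one is appended to a side array, and the blob records only its index.

```c
#include <assert.h>

/* Sketch of the descriptor side of serialization: remember the file
 * descriptor in a side array and return its index.  The binary blob
 * stores that index instead of the descriptor's numeric value, so the
 * kernel can renumber descriptors in the new process without having
 * to understand the blob's contents at all. */
static int register_fd(int *fds, int *nfds, int fd) {
    fds[*nfds] = fd;   /* the kernel receives this array separately */
    return (*nfds)++;  /* the blob references the descriptor by index */
}
```

This indirection is what lets user space invent arbitrary new data types later without ever touching the kernel.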
And it also makes things a lot more flexible, because now, in user space, we can introduce new data types. Say we want to add a complex-number type, or bidirectional maps, or any other stuff: all of that can be implemented in the argdata library, while the kernel only needs to be able to copy over binary junk plus a list of file descriptors. And then on the other side, in the new process, that's combined into one unified tree again. So it's two separate things, but as a programmer you won't notice anything about it. So this is, in bird's-eye view, or, well, the really low-level view, how CloudABI works. It's a lot of code; if you understood half of it, then I'm already pretty glad. Are there any questions? Did you find this informative? Not at all? Yeah. Oh, yeah. So, that's a really good question. The broader question is: is there any support for dynamic linking? Can I use shared libraries in CloudABI programs? And the answer to that is no. So, one of the things, and now I'm going to make a statement and five people are going to correct me, which I think is really horrible: with shared library support, you either get both features or you get neither of them. What's really annoying about shared libraries is that the entire API for dealing with them is actually two things. First of all there's linking at startup, where you create a program and link it with -lwhatever, and then there's dlopen. And those two are completely separate beasts that need to be tackled separately from each other. In the case of CloudABI, I wouldn't mind having a dlopen function, where you can just say, I've got a file descriptor to a library, like an fdlopen: load this into the program, I want to call its functions. That's perfectly fine to me.
But what doesn't really work that well for CloudABI is linking against libraries at startup. And the reason for that is, CloudABI programs don't have a global file system namespace: there's no /usr/lib from which you can load libraries. And that's also not something I want to have; I don't want a global namespace. But as far as I know, looking at the way ELF works, looking at the way LLVM's lld linker works, looking at the way Clang works, you get both or neither. [From the audience: they're essentially the same thing. Well, from the perspective of the dynamic linker there are some differences due to initialization and things like that; otherwise, they're really the same thing.] Exactly, but for CloudABI, they aren't. Because you can't, in the header of an ELF executable, say: I want to load libssl.so, and I want you to use file descriptor five. That doesn't make any sense. So already doing this linking at startup, when the program is loaded and started up, I can't really think of a sane way to get that working with CloudABI. Because with CloudABI, the first instruction the program executes in _start is already running in sandboxed mode. I can't just access /usr/lib from there. And there's no easy way to decouple these two for now. So what I've done is just focus on only doing static linkage. No dependency libraries. It does have its disadvantages, but it also makes certain things a lot easier. One of the pretty cool things that I like about not dynamically linking against libraries is that, because you already know which functions need to be present during the entire lifetime of the process, you can also do some really aggressive garbage collection of functions. So with lld, and this is something I really skimmed over briefly, if you look at the Clang source code, it always calls the linker with --gc-sections.
So it really trims off any code that's not actually being called. If you link in a huge library but only use a fraction of it, the rest is not part of the resulting executable. So, yeah, no dynamic linking, but good enough. We'll tackle that later on if people really need it. Yeah. So, you're talking about the interpreter header. Yeah, yeah, yeah. So, what happens is that in Clang, and this is something we skimmed over really briefly, it calls the linker with --no-dynamic-linker, and that makes lld not add one of those interpreter headers to the executable. So the kernel starts it up directly; it doesn't try to load an interpreter there as well. And that's the reason why our crt0 is a bit fatter than it normally is: it really needs to be freestanding, in a certain way. The program starts up and it needs to fix itself up, and that makes it all a bit more complex. That's also the reason why there's no position-independent executable support for 32-bit Intel: that's pretty hard to accomplish if you're starting up in this unrelocated way on a 32-bit system, because there's no RIP-relative addressing. So, yeah, it's a really good question. Are there any other questions? None at all? All of you are baffled by all of the technical stuff. Well, it's already late, I guess. I guess most people want to head over to a restaurant, grab some Belgian beers. Oh, thanks for attending anyway.