So I'm going to go ahead and get started. I'm going to give a talk called "So You Want to Add a System Call." This is a talk that I wrote because I did a lot of work writing system calls and working on different ABIs, and in the process learned a lot. I also noticed that many of our most senior developers in FreeBSD who add system calls did not understand how the compatibility interface worked, and were really consistently wrong in all sorts of different directions. It was remarkable and surprising. So I decided I'd write everything down, and we'll see how that goes.

OK, so why listen to me? A long time ago, I added the wiki page on how to add a system call, because I was trying to figure it out and couldn't find any documentation. That was really annoying. Ironically, the system calls I was adding at the time have never landed in the tree, but nonetheless I did write that page. I've also implemented two complete alternate ABIs outside of the FreeBSD tree in CheriBSD, which I'll talk about a bit more later. As a result of that work, and not wanting to type 6 million copies of syscalls.master, I automated the process of creating alternate ABI files. I brought that into FreeBSD in the last year, which should hopefully eliminate a lot of the confusion: now you run some code and the code says you need to do this compatibility thing, or you don't need to do any compatibility thing, as long as you wrote the definition right.

So why add a system call? Well, obviously you do it to access kernel resources. You might do it because you need a trusted intermediary: if you have two processes communicating and they need to do it in a way that is secure, they would do it via the kernel rather than via some shared page or something like that. You might do it to avoid excess context switches.
For instance, the sendfile system call was added so that you could say, "I want to send all this data from this file out to the network, just tell me when you're done," rather than having to do reads and writes and reads and writes. Or an existing system call might be insufficiently expressive. We've had a number of cases where we have a system call that's almost what you want, so we added a new version that takes a flag. Those are the common reasons why you might want to add a system call.

Next, why not? Well, the main one: system calls are mostly forever. FreeBSD supports virtually every system call that any version of FreeBSD has had, back to our origins in 4.3BSD. There are a few big exceptions, like the KSE threading thing, where we had to remove the system calls when we ripped out all the kernel bits. But for the most part, we support everything. We can run almost any binary that's ever run on a FreeBSD system. That adds maintenance burden and, frankly, an accretion of attack surface. So don't add them for no reason.

It might also be that something like an ioctl makes more sense. Anything that's device specific or file descriptor type specific is probably something for an ioctl. I'm not going to talk about them today, but they have many of the same issues that system calls do, and the design of the interfaces has a lot in common, so many of the things you'll learn here apply there. Or it might be that you want a sysctl: for changing system-wide defaults or obtaining information, this can be good. We also tend to be a little sloppier about compatibility there and view them as a user space thing, so it's fine if one changes a bit. That's not totally true, since some of them are very much permanent, but there's a little more flexibility.

So how do system calls work? We're going to start with a really simple, but not the simplest, example.
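The slide itself is not part of this transcript, so the following is only a minimal sketch of what such a hello world might look like. The names `hello_pwritev` and `hello_demo` are made up for illustration, and the sketch writes to a temporary file rather than standard output because `pwritev` generally fails with `ESPIPE` on non-seekable descriptors such as pipes.

```c
#define _DEFAULT_SOURCE     /* exposes pwritev() and mkstemp() on glibc */
#include <sys/types.h>
#include <sys/uio.h>
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Write "Hello, world\n" to fd at offset using a two-entry I/O vector. */
ssize_t
hello_pwritev(int fd, off_t offset)
{
    char hello[] = "Hello, ";
    char world[] = "world\n";
    struct iovec iov[2];

    /* Each entry is a pointer and length pair. */
    iov[0].iov_base = hello;
    iov[0].iov_len = strlen(hello);
    iov[1].iov_base = world;
    iov[1].iov_len = strlen(world);

    /* The libc stub loads the system call number and traps to the kernel. */
    return pwritev(fd, iov, 2, offset);
}

/* Hypothetical self-check: write to a temporary file and read it back. */
int
hello_demo(void)
{
    char path[] = "/tmp/hello_pwritev.XXXXXX";
    char buf[32] = { 0 };
    int fd = mkstemp(path);

    assert(fd >= 0);
    ssize_t written = hello_pwritev(fd, 0);
    ssize_t got = pread(fd, buf, sizeof(buf) - 1, 0);
    close(fd);
    unlink(path);
    return (written == 13 && got == 13 &&
        strcmp(buf, "Hello, world\n") == 0) ? 0 : -1;
}
```

Calling `hello_demo()` returns 0 when both vector entries were written back to back at offset 0.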
We have a slightly contrived hello world here where we use pwritev, which is sort of the most generic way to write a bunch of data. We construct an I/O vector array, which has pointer and length pairs, and then we send that out and call the system call implementation.

So what does that system call implementation look like? Well, here's the implementation on amd64. The parts we care about are that we set the system call number, the number of the system call we want to call, and stuff that in a register. There's this little thing that does something with the ABI that I don't understand and didn't care about, so I didn't research it very much. And then the syscall instruction tells the kernel, hey, I want to make a system call. It causes a trap into the kernel. On return, if there was an error flagged, we call this cerror function to handle that; otherwise we just return. So that's the basics of what user space does to make system calls, and we're done with user space for the day.

So, the kernel side. Let's give a quick overview. There's that trap handler that I talked about, which runs when you execute the system call instruction. It calls a machine-dependent system call handler. That in turn calls the syscallenter function, which does a bunch of things, not all of which we're going to talk about in any great detail, because we're interested in the interfaces rather than the gory details. It calls cpu_fetch_syscall_args to fill in a per-thread syscall_args structure, and we'll talk about that more. There's a bunch of tracing and auditing, and checking whether you're in capability mode and whether this system call is allowed to run in capability mode. Then the actual implementation, which is sys_ followed by the system call name, gets called. Once that's done and it returns, there's cpu_set_syscall_retval, which adjusts the registers prior to return to user space to set the return value so that user space knows: is there an error flag?
What was returned? And then syscallret is called, which does debugging and tracing stuff that's not really to the point today.

So, cpu_fetch_syscall_args. The interaction of this with the actual system call implementation is kind of where all the tricky bits come in. This thing's job is to initialize the return value, setting a default, and then fill in the syscall_args structure. It takes that code, the number we saw in the assembly, and stuffs that into the code member. It also fills in the pointer to struct sysent, which points to the actual details of the system call implementation, like the function pointer to the sys_ function, and it fills in an array of arguments. We can take up to eight arguments for each system call, and each one of them is a machine-register-sized integer: on a 64-bit machine, that's a 64-bit integer; on a 32-bit machine, a 32-bit integer.

Now, here is the implementation of pwritev, at least the very top of it. What it does is copy in that I/O vector we created and make a copy of it in the kernel ABI format. If the kernel is using the same ABI as user space, that's pretty much just copying it in: it allocates some space and copies the array. If that were a 32-bit version, it would have to copy the elements and translate the pointers and the sizes. But mostly its job here is to get that into the kernel and have it ready for use. Then we call the underlying implementation, kern_pwritev. Most system calls have one of these, though not all; some of them immediately go off and call other kernel interfaces. But most of them these days have a kern_ version. However, we're only interested in the interface today.
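The path just described, from registers through the argument structure to the implementation and back, can be sketched as a small user-space model. This is not kernel code: every `model_` name is invented for illustration, the structures are simplified guesses at the shape of the real ones, and an unknown system call number simply yields `ENOSYS` here.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MODEL_MAXARGS 8

typedef intptr_t model_register_t;      /* one machine-word-sized argument */

struct model_thread {
    model_register_t frame_regs[1 + MODEL_MAXARGS]; /* [0] = syscall number */
    model_register_t td_retval[2];      /* return value slots */
    int error_flag;                     /* stands in for the CPU error flag */
};

typedef int model_sy_call_t(struct model_thread *, void *);

struct model_sysent {
    int sy_narg;                /* number of arguments */
    model_sy_call_t *sy_call;   /* the sys_<name> implementation */
};

struct model_syscall_args {
    unsigned int code;
    const struct model_sysent *callp;
    model_register_t args[MODEL_MAXARGS];
};

/* Toy system call: adds its two arguments, rejects negative input. */
int
model_sys_add(struct model_thread *td, void *uap)
{
    model_register_t *a = uap;

    if (a[0] < 0 || a[1] < 0)
        return EINVAL;          /* implementations return 0 or an errno */
    td->td_retval[0] = a[0] + a[1];
    return 0;
}

static const struct model_sysent model_sysent_table[] = {
    { 2, model_sys_add },       /* system call number 0 */
};
#define MODEL_NSYSCALLS \
    (sizeof(model_sysent_table) / sizeof(model_sysent_table[0]))

/* Models cpu_fetch_syscall_args(): registers into the args structure. */
int
model_fetch_syscall_args(struct model_thread *td, struct model_syscall_args *sa)
{
    sa->code = (unsigned int)td->frame_regs[0];
    if (sa->code >= MODEL_NSYSCALLS)
        return ENOSYS;
    sa->callp = &model_sysent_table[sa->code];
    for (int i = 0; i < sa->callp->sy_narg; i++)
        sa->args[i] = td->frame_regs[1 + i];
    td->td_retval[0] = 0;       /* default return value */
    td->td_retval[1] = 0;
    return 0;
}

/* Models syscallenter() followed by cpu_set_syscall_retval(). */
model_register_t
model_syscall(struct model_thread *td)
{
    struct model_syscall_args sa;
    int error = model_fetch_syscall_args(td, &sa);

    if (error == 0)
        error = sa.callp->sy_call(td, sa.args);
    td->error_flag = (error != 0);
    /* On error the result register carries the errno value for cerror. */
    return error != 0 ? error : td->td_retval[0];
}

int
model_dispatch_demo(void)
{
    struct model_thread td = { .frame_regs = { 0, 2, 3 } };

    assert(model_syscall(&td) == 5 && td.error_flag == 0);
    td.frame_regs[1] = -1;
    assert(model_syscall(&td) == EINVAL && td.error_flag == 1);
    td.frame_regs[0] = 99;      /* no such system call in this model */
    assert(model_syscall(&td) == ENOSYS && td.error_flag == 1);
    return 0;
}
```

In user space, the error flag is what tells the stub whether to hand the register value to cerror, which stores it in errno and returns negative one.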
So we're not going to talk about what's going on underneath: how it looks up the file descriptor, finds what type it is, and figures out how to actually schlep data to it. We're just going to talk about interfaces.

I'm going to jump to the end briefly and get return values out of the way. For most system calls, the interface is that they return either zero or negative one: zero being success, negative one meaning there's an error, go look at errno. The cerror function that I mentioned briefly is what's responsible for taking whatever the kernel/user-space calling convention is for passing that error back and getting it into the errno variable in user space. Normally the sys_ function just returns either zero or an error code, and then some magic happens so that you get what you expect in user space. System calls that return something other than zero or negative one instead set the thread-specific td_retval value, and very rarely they'll set a second one. The reason for the second value is that there are some historic system calls, for instance pipe, which returns two file descriptors. It's also used for 32-bit programs or 32-bit interfaces where you need to return a 64-bit value, so it's chopped up into pieces and returned.

Now let's talk more about argument handling, because this is where all the confusion and misconceptions tend to happen. Here's the system call argument structure, the so-called uap argument, and you see the arguments are passed in to the function as a pointer to it. So how do we get from that array of integers we filled in to this structure? Well, the simple answer is we just cast it, and we commit a horrible aliasing violation, and hopefully it doesn't blow up in the compiler.
So someday it's probably going to bite us, but for now we're okay, and this is how it works.

Let's see how that maps in. In the first element of the array we have our integer file descriptor; in the second element we have a pointer; in the third we have another integer, this time an unsigned integer; and then we have an offset. We're not using the offset in this case, but here's the offset of zero. That's all well and good, but you might notice this is all little-endian, where the actual non-zero numbers are all on the left and the parts we don't use are on the right. What about a big-endian system? If we brought this number in as a 64-bit integer because it was just in a register, we need to handle that. So on big-endian we put some padding in before the integers, so everything lines up properly and we get the right values in the right parts of the structure. On little-endian, for consistency, we do the same on the other side. We don't strictly need it on any of our current 64-bit architectures, because the two 64-bit values have 64-bit alignment in the structure, but it's more complete to pad everything out.

And now here's what it actually looks like in the kernel. You can run screaming and ignore this part, but if you're ever looking at the definition of a uap structure as part of writing a system call interface, unfortunately this is what the machine-generated code looks like. Sorry. There's probably some argument to be made that we could restyle the definitions so they are slightly less horrible and every line isn't 120 characters, but that has not happened at this point. Actually, it would be pretty easy now that Kyle has rewritten the system call generation code in Lua; it's much easier than the pile of awk and shell we had before.

So now you have a general idea how the argument parsing works. Let's talk about adding a system call and how this works in practice.
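The mapping from the register array onto the argument structure can be sketched as follows. This is a simplified model, not the generated kernel header: the `MODEL_ARG32` macro and the structure name are invented, the pointer is held as a plain 64-bit integer, and `memcpy` stands in for the cast so the sketch stays free of the aliasing problem mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t model_syscallarg_t;    /* one 64-bit argument register */

/* Pad each 32-bit member out to 8 bytes; the side depends on byte order. */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define MODEL_ARG32(type, name) char name##_pad[4]; type name
#else
#define MODEL_ARG32(type, name) type name; char name##_pad[4]
#endif

struct model_pwritev_args {
    MODEL_ARG32(int32_t, fd);
    uint64_t iovp;              /* user pointer, already register sized */
    MODEL_ARG32(uint32_t, iovcnt);
    int64_t offset;
};

_Static_assert(sizeof(struct model_pwritev_args) ==
    4 * sizeof(model_syscallarg_t), "one register slot per argument");
_Static_assert(offsetof(struct model_pwritev_args, iovp) == 8, "slot 1");
_Static_assert(offsetof(struct model_pwritev_args, offset) == 24, "slot 3");

/* Reinterpret four argument registers as the argument structure. */
void
model_decode_pwritev(const model_syscallarg_t args[4],
    struct model_pwritev_args *uap)
{
    memcpy(uap, args, sizeof(*uap));
}

int
model_uap_demo(void)
{
    const model_syscallarg_t args[4] = { 1, 0x1000, 2, 0 };
    struct model_pwritev_args uap;

    model_decode_pwritev(args, &uap);
    assert(uap.fd == 1 && uap.iovp == 0x1000);
    assert(uap.iovcnt == 2 && uap.offset == 0);
    return 0;
}
```

Because the padding moves to the other side of each 32-bit member on a big-endian host, the same register values decode to the same field values either way.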
So first off, we add an entry to sys/kern/syscalls.master. This is where we declare the interface to a system call. One great thing about FreeBSD as opposed to Linux is that every system call has the same number on every architecture. On Linux, the system call numbers start with whatever numbers were used by the most popular Unix platform at the time the port was created, so you might have Solaris or HP-UX system call numbers. We have them all the same, which is much nicer, and we declare them all in one place. The kernel is responsible for that; there's no confusion as to whether glibc or the kernel owns things. We have an integrated system, and I think that's true of all the BSDs.

So you add it to syscalls.master, and you add it at the end. All the gaps in the system call registry are now reserved for local extensions. You implement your sys_foo function that does whatever it is you need to do. If you need to, you implement a freebsd32 version of it, and we'll talk more about that in a bit. You run make sysent at the top level of the source tree to generate some files that we'll talk about a little more. You export the symbol from libc, so that you actually call the system call directly as opposed to going through the syscall system call. And you add a man page. Always add a man page. Don't make people sad.

Here's what an entry looks like in syscalls.master. On the upper left, we have the system call number. We then have the audit event type. Allocating audit events is slightly awkward in that you need to get them allocated in the OpenBSM project and then get that merged back, but it's not a huge deal, and there are people who monitor the mailing list there to get that done. This is probably the weirdest out-of-band step in the process. And then we have some flags. In this case, we have STD, which means it is a standard system call.
I mean, it's always part of the FreeBSD ABI and it's not, say, a compatibility interface that's only there if you turn it on. And then CAPENABLED says this doesn't access arbitrary namespaces, so it's okay to use it when you're using Capsicum in your program and you have entered capability mode. The part in the middle looks just like a C function declaration, and that's by design, so we have our types and all that stuff. It's worth noting that the tool doesn't really know what types are; it's not parsed by something that understands C. We'll get to a few extensions in a moment that we take advantage of to work around that issue.

So we add a couple more things to the declaration. The first one, _In_reads_(iovcnt), tells a program or tool what the memory footprint of this system call is. It says that from the iov pointer we read at most iovcnt objects. That's useful if you're writing wrappers or loggers that want to be able to deal with any system call fairly naively, so they don't have to know all the details of the implementation. They just need to know that this is what's going to be read from user space, that if anything outside those bounds is read, that's maybe bad, and that it's a read versus a write.

And then _Contains_long_ptr_ is a new addition, which indicates that there are members of the structure which might change depending on the ABI. There are three sorts of values for "contains." There's long: is a long 64-bit or 32-bit? There's pointer: is a pointer 64-bit or 32-bit? In FreeBSD those are always tied, but in CheriBSD that's not true, so I've gone ahead and generalized and put both in here. And then there's a third, which is time_t, which basically asks: is it i386 or is it a sensible architecture? So those are the indications. They're an addition and are not strictly required.
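To make the footprint idea concrete, here is a minimal sketch of how a generic tracing wrapper might use the information carried by an `_In_reads_(iovcnt)` annotation. The `model_footprint` structure and both functions are hypothetical; they only illustrate the kind of check such a tool could make, not any existing tool's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Hypothetical description of one annotated pointer argument. */
struct model_footprint {
    uintptr_t base;     /* value of the pointer argument */
    size_t len;         /* number of bytes the kernel may touch */
    bool is_write;      /* false for a read from user space */
};

/* _In_reads_(iovcnt) on iovp: at most iovcnt objects are read from iovp. */
struct model_footprint
model_in_reads(const struct iovec *iovp, unsigned int iovcnt)
{
    struct model_footprint fp = {
        .base = (uintptr_t)iovp,
        .len = (size_t)iovcnt * sizeof(*iovp),
        .is_write = false,
    };
    return fp;
}

/* Would an access of len bytes at addr stay inside the declared footprint? */
bool
model_access_ok(const struct model_footprint *fp, uintptr_t addr, size_t len,
    bool is_write)
{
    if (is_write != fp->is_write)
        return false;
    if (addr < fp->base || len > fp->len)
        return false;
    return addr - fp->base <= fp->len - len;
}

int
model_footprint_demo(void)
{
    struct iovec iov[2] = { { 0 } };
    struct model_footprint fp = model_in_reads(iov, 2);

    /* Reading the second element is inside the footprint. */
    assert(model_access_ok(&fp, (uintptr_t)&iov[1], sizeof(iov[1]), false));
    /* Reading one element past the end is not. */
    assert(!model_access_ok(&fp, (uintptr_t)&iov[2], sizeof(iov[0]), false));
    /* Neither is a write, since the annotation declares a read. */
    assert(!model_access_ok(&fp, (uintptr_t)&iov[0], sizeof(iov[0]), true));
    return 0;
}
```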
So if you're writing system calls for your own platform, local extensions, whatnot, you don't have to do this stuff. But we do require those in FreeBSD, and they are very useful.

So let's talk about the user space bits. I said we were done with user space, but we weren't quite; there's a little bit in the system call adding part. First of all, for a system call whose declared interface is the public interface, a stub is created automatically. There's no effort required to create the stub, but you do need to expose it, so you need to add it to the Symbol.map file. One bit that people get wrong on a regular basis is they add it to the wrong section. There is a new FBSD_1.<number> version for every major FreeBSD revision. The numbers don't have anything to do with FreeBSD version numbers. Yay, because why would we make sense? But you need to add it to the right one. And even if you merge it back to an older branch, it stays in that version, which is a little weird and confusing. Also, despite the fact that the majority of system calls export their __sys_foo and _foo versions, that was a mistake, so don't do that on new ones. We should probably rip them all out at some point, or at least all the ones that aren't used by some specific library, because that wasn't intended. There were some misunderstandings early in the symbol versioning adoption process.

Now, obviously every new system call needs a man page. Do that.

Less commonly, we'll add a new system call whose interface is not the public interface. A recent example of that is the __specialfd system call. I can't even remember which type it is actually used to create, but you would use it for something like porting Linux's eventfd or timerfd. I think maybe it was added for eventfd or signalfd.
If we have a single consolidated system call that can handle any of those cases, then you need to add a wrapper, and you don't expose the system call itself but rather expose the wrapper. That's not very common. The best way to handle it is to find the most recent example that looks like the one you want to do, see what was done, and check for follow-up commits in case it was done wrong, because that's the sort of edge case that can get tricky.

You run make sysent and it generates a whole bunch of files. There's init_sysent.c, which sets up a big array of all the system calls and all the data about them, including the function pointers. There's syscalls.c. systrace_args.c is this massively horrible, ugly file that handles tracing; it schleps data back and forth between the argument structure and arrays that are stored in the tracing code. It is unfortunately large. syscall.h is where the system call numbers are actually declared. sysproto.h has the internal declarations of things like the sys_ functions, the implementations, as well as the args structures. And syscall.mk is what causes system call stubs to be generated automatically in libc. Then there are 32-bit versions of the same things. The results may be slightly different due to compatibility issues, so let's talk about that.

When do we need to provide 32-bit compatibility? One obvious case is a 64-bit argument: on 32-bit it doesn't fit in one register, so it has to be spread out, and we need to do some special handling there. Signed longs are a problem because longs are different sizes: if we declared it as a long in the system call argument array and didn't sign-extend it, we'd have the wrong number, and that would be bad. When you have pointers to objects where the ABI differs, you need to handle that. So, like that struct iovec in the pwritev case, those structures differ between the ABIs.
So you need to handle that at copyin. And then there's the little bit that I alluded to about handling 64-bit return values. This is one of those cases where maybe you just don't make the public interface do that, do a little translation at the edge instead, and your life will be simpler and the code less weird. But you don't have to handle ints, unsigned longs, or pointers, because of the way we do the casting from that syscall_args structure. We'll talk about that a bit more.

So here we go: we've converted the pwritev args into the 32-bit version, where we're now pointing to iovec32s and we have split the offset into two pieces. In the regular version we had this simple mapping. In compat mode, we're still mapping onto that array of 64-bit integers, but we only fill the bottom half of each of them when we copy in the arguments, so we need to get those mapped back across. So here are the changes: we know the pointer points to a different type, and we know the offset is split. We need to unsplit those arguments, and then we need to handle the iovecs.

Now, it turns out it's a little more complicated than that. One of the things that makes everything weird, and used to make syscalls.master really strange, is that on 32-bit platforms that aren't i386, 64-bit integers are strongly aligned, and that includes in the calling convention and in the registers used by the calling convention. So we actually have to add some padding in order to align the 64-bit parts, even though they're split into two pieces. That gets inserted automatically; you don't have to think about it anymore, thankfully. It did turn out, though, that as I was automating the handling of freebsd32 calls, I found there were two system calls that were in fact broken and had never worked.
Fortunately, what probably happened in practice (I think it was preadv and pwritev) is that the last argument was always zero because no one ever uses the offset, or if they did, they didn't notice because they were using small enough files. So it worked kind of by accident. This is why automation is good: computers are better at remembering all these stupid little details. I had even forgotten some of these details when I went to review my slides, so thankfully we have computers.

Let's look quickly at the implementation of the 32-bit pwritev. Two things differ. We have this copyinuio, except it's the 32-bit version: it allocates space for the vector, copies in the one from user space, and then updates it so that we have 64-bit pointers and 64-bit size_t's. That's all it does. And then we have this handy little PAIR32TO64 macro, which knows that there are two members in the structure, offset1 and offset2, and knows to glue them together. It does the right thing for whatever your endianness is, and it all just works. That's all you have to do in a simple case like this, and if you design your interface right, you won't have to do any more than this. If you design your interface badly, life can get complicated, so when your 32-bit compat function starts to get horrible, it's time to rethink your interface.

So, a little guidance on new system call APIs. First of all, as I mentioned, signed longs are kind of weird, so maybe avoid them. Unsigned longs are fine, and they appear in the form of size_t's on a regular basis; that's all sensible. But signed longs get weird, so maybe don't use them; maybe use a fixed 64-bit type instead. For objects, try to make things that don't contain pointers have the same ABI everywhere, so use fixed-width types where possible and appropriate.
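Going back to the 32-bit pwritev arguments described above, the split offset and the alignment padding can be modeled in a few lines. This is a sketch under assumptions: the structure and macro carry a `model_` prefix because they are not the generated kernel definitions, the per-slot register padding is left out, and the `pad` member is shown unconditionally even though it would be absent on i386.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Glue two 32-bit halves together; which half is high depends on byte order. */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define MODEL_PAIR32TO64(type, name) \
    ((type)(((uint64_t)(name##1) << 32) | (uint64_t)(name##2)))
#else
#define MODEL_PAIR32TO64(type, name) \
    ((type)(((uint64_t)(name##2) << 32) | (uint64_t)(name##1)))
#endif

/* Modeled 32-bit argument structure for pwritev. */
struct model_freebsd32_pwritev_args {
    int32_t fd;
    uint32_t iovp;      /* 32-bit user pointer to an array of iovec32 */
    uint32_t iovcnt;
    uint32_t pad;       /* aligns the 64-bit pair; not present on i386 */
    uint32_t offset1;   /* first half, in argument order */
    uint32_t offset2;   /* second half */
};

/* Recover the 64-bit offset the 32-bit caller passed in two pieces. */
int64_t
model_freebsd32_pwritev_offset(const struct model_freebsd32_pwritev_args *uap)
{
    return MODEL_PAIR32TO64(int64_t, uap->offset);
}

int
model_pair_demo(void)
{
    const int64_t want = INT64_C(0x100000002);  /* just over 4 GiB */
    uint32_t halves[2];
    struct model_freebsd32_pwritev_args uap = { .fd = 1, .iovcnt = 2 };

    /* Split the value the way the host byte order lays it out in memory. */
    memcpy(halves, &want, sizeof(halves));
    uap.offset1 = halves[0];
    uap.offset2 = halves[1];
    assert(model_freebsd32_pwritev_offset(&uap) == want);
    return 0;
}
```

If the padding slot were missing from the definition on a platform that needs it, the two halves would be read from the wrong slots, which matches the "works only while the offset is zero or small" behavior described above.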
If you do need to store a pointer, make sure to use a pointer, an actual one, or a uintptr_t. Don't use a long, and don't do what Linux does and use u64 everywhere. Use the right type, because in the future pointers won't be integers. If you need to store an address, I suggest using kvaddr_t. It's kind of oddly named, but the goal is basically to share something that is definitely an address and not a pointer. For instance, if you're using the address of an object in the kernel as a token that you don't have to trust, for convenience or just as a unique value, kvaddr_t is your friend. If you are adding explicit padding, consider the possibility that pointers might be bigger than 64 bits someday. Otherwise you might save yourself trouble now, but hopefully in just a few years pointers will be bigger.

Overall, try to ensure that the memory footprint can be described with the SAL annotations I talked about very briefly. They're documented at the top of syscalls.master as well. This is handy for people writing tracing tools, sanitizers, and things like that. It's not always possible, but do try.

Don't write new system calls that take a variable number of arguments. They work today because every ABI we use, except maybe POWER, just assumes that, up to however many registers you're allowed to use for passing arguments, the calling convention is the same between variadic and non-variadic functions. So the optional argument to open that everyone forgets to pass, so that garbage gets used as the permissions, is just read because the kernel simply looks at a register. Try not to do that anymore.

And if you're adding a system call and it doesn't have a flags field, maybe think about adding one, even if it's gratuitous and you have to write a wrapper to hide it, because it means you have an extension mechanism later.
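Two of these guidelines, fixed-width members and a flags argument, can be illustrated with a small sketch. The `model_foo` interface is entirely hypothetical, and rejecting unknown flag bits is my own assumption about how to keep the flags usable as an extension mechanism; the talk only says to have a flags field.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical argument object: fixed-width members, explicit padding. */
struct model_foo_params {
    uint64_t length;
    uint32_t mode;
    uint32_t pad0;      /* explicit padding, required to be zero */
};
_Static_assert(sizeof(struct model_foo_params) == 16,
    "same layout for 32-bit and 64-bit callers");

#define MODEL_FOO_NONBLOCK  0x1u
#define MODEL_FOO_ALLFLAGS  (MODEL_FOO_NONBLOCK)

/* Validate the parts of the interface that leave room for extension. */
int
model_foo_check(const struct model_foo_params *p, uint32_t flags)
{
    if ((flags & ~MODEL_FOO_ALLFLAGS) != 0)
        return EINVAL;  /* unknown bits stay free to be defined later */
    if (p->pad0 != 0)
        return EINVAL;
    return 0;
}

int
model_foo_demo(void)
{
    struct model_foo_params p = { .length = 4096, .mode = 1 };

    assert(model_foo_check(&p, 0) == 0);
    assert(model_foo_check(&p, MODEL_FOO_NONBLOCK) == 0);
    assert(model_foo_check(&p, 0x80) == EINVAL);
    p.pad0 = 7;
    assert(model_foo_check(&p, 0) == EINVAL);
    return 0;
}
```

Because the structure contains no longs or pointers, a 32-bit compat path could copy it in unchanged.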
So, a little bit of bonus content on freebsd64 and how that works in CHERI. We don't have time for a full introduction to CHERI; it's easy to give a two-hour talk on CHERI, and I'm going to give you the two-minute one. The short, short version is that CHERI introduces a new hardware type, the capability. Capabilities grant access to regions of memory. Their validity is maintained, in both memory and registers, by a tag. If you store arbitrary data over a register or memory location that held a capability, the tag is cleared in the process and you can't use it as a pointer anymore. You can only use guarded manipulation to change the values. That means you can't take a pointer to some object and make it point to some other object; that's simply not allowed by the architecture. Capabilities can only be derived from other capabilities, and only in a way that reduces their permissions or maintains them. So you can think of them as 128-bit unforgeable fat pointers: they are bounded and you can't just make them up. In a CHERI system, all memory access is via a capability one way or another. Either it's via an explicit capability load, or it's via a default capability when using a conventional load, store, or jump instruction. But that default capability can also be restricted, and in fact in our CheriABI it is null and has access to nothing, so you can only make explicit accesses.

This leads to some ABI differences. First, there are some ABI similarities: CheriABI is a 64-bit ABI, so longs, time_t's, and so on are the same, and 64-bit types are strongly aligned just like normal. However, pointer size is now 128 bits, and pointers are aligned to 128 bits. Also, pointer provenance is strict, which is the property that you can only create a pointer from another pointer.
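The rules just listed (bounds, a validity tag, and derivation that can only narrow) can be mimicked in a toy software model. This is purely illustrative: real capabilities are enforced by hardware, the `model_cap` structure is not the architectural encoding, and treating an out-of-bounds derivation as an untagged result is a simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy capability: an address plus the bounds and tag that guard it. */
struct model_cap {
    uint64_t base;      /* lowest accessible address */
    uint64_t length;    /* size of the accessible region */
    uint64_t addr;      /* where the pointer currently points */
    bool tag;           /* cleared means it cannot be dereferenced */
};

/* Derive a narrower capability; an attempt to widen yields no access. */
struct model_cap
model_cap_set_bounds(struct model_cap parent, uint64_t base, uint64_t length)
{
    struct model_cap child = { base, length, base, parent.tag };

    if (base < parent.base || length > parent.length ||
        base - parent.base > parent.length - length)
        child.tag = false;
    return child;
}

/* Plain data never carries a tag, so integers cannot become pointers. */
struct model_cap
model_cap_from_integer(uint64_t value)
{
    struct model_cap forged = { 0, 0, value, false };
    return forged;
}

bool
model_cap_can_access(struct model_cap cap, uint64_t addr, uint64_t len)
{
    return cap.tag && addr >= cap.base && len <= cap.length &&
        addr - cap.base <= cap.length - len;
}

int
model_cap_demo(void)
{
    struct model_cap root = { 0x1000, 0x1000, 0x1000, true };
    struct model_cap obj = model_cap_set_bounds(root, 0x1100, 16);
    struct model_cap wider = model_cap_set_bounds(obj, 0x1000, 0x1000);

    assert(model_cap_can_access(obj, 0x1100, 16));
    assert(!model_cap_can_access(obj, 0x1110, 1));      /* past the bounds */
    assert(!model_cap_can_access(wider, 0x1100, 16));   /* cannot widen */
    assert(!model_cap_can_access(
        model_cap_from_integer(UINT64_MAX), UINT64_MAX, 1));
    return 0;
}
```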
You can't just write down a bunch of Fs and say, I want access to some kernel memory. You have to have derived that from somewhere else.

On CheriBSD, our default ABI these days is CheriABI, where every pointer is a capability. But that alone is not a viable transition path. CHERI C is not quite the C language. It's very, very close, but not quite, so we need to be able to run hybrid code. A big part of our compatibility story is that you can take a CHERI processor and run your existing code on it: you get no benefit, but you also don't have any cost. So you can deploy CHERI hardware and the risks are quite minimal, and as you start adopting, you get more and more benefit. So we added this freebsd64 compat layer. One thing that's important about the compat layer is that once you're in the kernel, we've made it so that all access to user space is via a capability. So we actually have to derive capabilities from system call arguments and from pointers inside structures in order to access user space. We do that using the default capability for that process or thread, but it is something that has to be done. Otherwise it's a lot like freebsd32, except that we don't have to handle time_t, because we always have 64-bit time_t's, and we don't have to handle longs and size_t's.

So, constructing user space capabilities. How does this work? We have a bunch of macros that help out. There's a little bit of ugliness here, because it turns out we have a bunch of weird POSIX APIs where magic sentinel values are passed as pointers. For instance, MAP_FAILED, returned from mmap, is passed as a void * but it's minus one. Likewise, coming from user space there are things like SIG_IGN, to ignore a signal, and SIG_DFL is the other one. These are just small numbers passed as pointers. We don't want to create a pointer to that, because it's not a valid pointer.
So what we do instead is a little bit of a hack: for anything in the bottom page, which you're not allowed to map because that's bad (we had a nasty bug around that once), and also anything above the valid user address space, we create null capabilities, or null-derived capabilities, which don't have access to anything. That works pretty well. We have a couple of macros here. One of them sets bounds on an object: when we know what the size of something is, we can set bounds on it, and then if there's a kernel implementation bug elsewhere, we will get a fault in copyin or copyout. Sometimes we don't know what the bounds should be, so we have another macro that just doesn't set any bounds, and that works just like a conventional ABI does today; it's no worse than it was before. We have a bunch of other helper macros, but I didn't want to bore you with another slide full of them. This is basically how it works. And we actually have found some cases where those bounds would have mitigated things we issued security advisories for. So that is a case where your completely unmodified code gets some security benefit at a system level from running on a CHERI system.

So, a quick look at what the freebsd64 pwritev implementation looks like. It turned out I had four copies of this thing, with the copyinuio and everything, so I created user_pwritev, yet one more layer of abstraction like any proper computer scientist, which meant that I had to write a tiny bit less code. Here we construct a capability to the array of iovecs, and then we pass in a copyinuio function pointer. And here's the actual thing: we call the function pointer and then we call the implementation. This looks like the old sys_ version, except that now the per-ABI versions are a tiny bit smaller.
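The shape of that refactoring, a shared body that takes a bounded user reference and an ABI-specific copyin routine, might be modeled like this. Everything here is a user-space stand-in with invented `model_` names: the "capability" is a pointer plus a length and a tag, the top-of-address-space limit is assumed, and summing the lengths stands in for the call to the real implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u
#define MODEL_IOV_MAX   16u

struct model_iovec64 { uint64_t iov_base, iov_len; };   /* native layout */
struct model_iovec32 { uint32_t iov_base, iov_len; };   /* 32-bit layout */

/* Bounded reference to user memory; untagged means "not a pointer". */
struct model_ucap { const void *ptr; size_t len; bool tag; };

/* Bottom page or top of the address space: treat the value as a sentinel. */
struct model_ucap
model_user_cap(const void *uptr, size_t len)
{
    uintptr_t a = (uintptr_t)uptr;
    bool sentinel = a < MODEL_PAGE_SIZE || a > UINTPTR_MAX - MODEL_PAGE_SIZE;
    struct model_ucap cap = { uptr, len, !sentinel };

    return cap;
}

typedef int model_copyinuio_t(struct model_ucap, unsigned int,
    struct model_iovec64 *);

int
model_copyinuio(struct model_ucap iov, unsigned int cnt,
    struct model_iovec64 *out)
{
    const struct model_iovec64 *in = iov.ptr;

    if (!iov.tag || cnt * sizeof(*in) > iov.len)
        return EFAULT;          /* models a bounds fault during copyin */
    for (unsigned int i = 0; i < cnt; i++)
        out[i] = in[i];
    return 0;
}

int
model_copyinuio32(struct model_ucap iov, unsigned int cnt,
    struct model_iovec64 *out)
{
    const struct model_iovec32 *in = iov.ptr;

    if (!iov.tag || cnt * sizeof(*in) > iov.len)
        return EFAULT;
    for (unsigned int i = 0; i < cnt; i++) {
        out[i].iov_base = in[i].iov_base;   /* widen pointer and length */
        out[i].iov_len = in[i].iov_len;
    }
    return 0;
}

/* Shared body: only the copyin routine differs between ABIs. */
int
model_user_pwritev(struct model_ucap iov, unsigned int cnt,
    model_copyinuio_t *copyin, uint64_t *total)
{
    struct model_iovec64 kiov[MODEL_IOV_MAX];
    int error;

    if (cnt > MODEL_IOV_MAX)
        return EINVAL;
    error = copyin(iov, cnt, kiov);
    if (error != 0)
        return error;
    *total = 0;                 /* stand-in for the real implementation */
    for (unsigned int i = 0; i < cnt; i++)
        *total += kiov[i].iov_len;
    return 0;
}

int
model_user_pwritev_demo(void)
{
    struct model_iovec32 v32[2] = { { 0x1000, 7 }, { 0x2000, 6 } };
    struct model_iovec64 v64[2] = { { 0x1000, 7 }, { 0x2000, 6 } };
    uint64_t total = 0;

    assert(model_user_pwritev(model_user_cap(v64, sizeof(v64)), 2,
        model_copyinuio, &total) == 0 && total == 13);
    assert(model_user_pwritev(model_user_cap(v32, sizeof(v32)), 2,
        model_copyinuio32, &total) == 0 && total == 13);
    assert(model_user_pwritev(model_user_cap(v64, sizeof(v64)), 3,
        model_copyinuio, &total) == EFAULT);    /* read beyond the bounds */
    assert(model_user_pwritev(model_user_cap((void *)-1, 16), 1,
        model_copyinuio, &total) == EFAULT);    /* sentinel, not a pointer */
    return 0;
}
```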
Whether it's worth doing in this case is probably pretty mixed, but at one point in CheriBSD we had two additional compatibility ABIs plus the system one, and I was just like, I don't want any more duplicate code than necessary. So I started refactoring.

So, a few final comments. Adding a system call is relatively straightforward, other than the dirty bits down at the bottom that actually do the work. But you should ask yourself: should I do this? And also, is the interface the right one? Seek review early and often, and let us help refine the process. We've got a number of people who are experts on how to do these sorts of interfaces, so if it's something you feel needs to be done, get advice.

So, happy to take questions.

Yes. So, what is the status of CheriBSD and Morello? It is experimental in the sense that Morello is a prototype. It's a 100 million pound prototype, but it is nonetheless a prototype, and the architecture exactly as it is is not what would become a product. There will be changes. However, it is very much real. CheriBSD runs on it. We have been running on Morello hardware since November or December last year. Our next release, probably in October, is going to have packages that work with the GPU that's on the SoC. We should have a Wayland-based GUI environment with KDE and basic stuff up and running, and at least be able to run a legacy Firefox.

So Morello is a proving ground. It exists to deal with the fact that hardware people don't want to spend enormous amounts of time, energy, and money implementing something that software people won't use, and software people really don't want to spend huge amounts of time and energy building software for hardware that doesn't exist.
And Morello is intended to get us to the point where we can all say, yes, this technology is what we need. It is worth the cost. The benefits are worth it and we will make this massive change. And it's very promising, both in terms of being able to keep using legacy C and C++, because the cost estimates for replacing that with a safe language like Rust are insane: tens or hundreds of billions of dollars to rewrite all the open source code. Whereas porting to CHERI is considerably less than that. Not free, but considerably less than that. Also, we're seeing, for instance, in an early prototype of Rust on CHERI, that we can eliminate many of the checks. We might be able to make Rust as fast as C with the help of CHERI and really close that gap. So now we can have a better programming environment that's also as fast. So when replacement makes sense, things are also better. Yes. So the question is, is there a formal review process for system calls, particularly because if you introduce a new system call that does horrible things and is insecure, it's hard to fix that. There's not really a formal process per se. I mean, there are a few people who are flagged on reviews. I'm always flagged on any review that touches syscalls.master, so I see them all. And I make sure to tag people who I think should be involved. There's always some risk there. I don't know that we have a, we don't have a really, yeah. I'm not sure what that process would look like if we had one. So, a question about dynamic system calls, where a module provides them. So generally speaking, in FreeBSD we actually add an entry in the system call table, I mean, in syscalls.master, for any system call, even system calls that are loaded by modules. It is possible to load them otherwise. I guess the rules about compatibility are kind of the same.
My personal preference is that if it's in the base, it should be in syscalls.master even if it is always provided by a module. The one exception in the current case is, I don't remember, it's PMC something. There is no reason for it to be that way and that should be fixed. It just serves no purpose and is kind of bad from a tracing perspective. And it's problematic on CHERI. It is possible to add loadable system calls that have not been declared, but it's more complicated. And if they are particularly weird and require special argument handling, it just won't work. Yeah, so I guess the comment was that I suggested ioctl, but ioctl is kind of kludgey. And so ioctl, I think, makes sense if it is specific to a particular file descriptor or a particular device and fairly narrow. I agree it is a kludgey mess. I hate it. Everything about the interface is wrong. I want to fix it, but no one will pay me to fix it. I think system calls are generally cleaner. It's very clear what you want to do, but they are very heavyweight in terms of just the accreted cost of maintaining the system. So caution is required. I don't know that I have a great answer. Yeah, I mean, I kind of feel like we're maybe a little too conservative in adding system calls. You mentioned the cost of maintaining it in libc. If the interface is the public one, then, I mean, yes, the headers get a tiny bit bigger, but that's kind of it. libc grows a five-instruction sequence. It's not that big a deal. We shouldn't just spray them everywhere and add a dozen a year, but I don't think we should be overly conservative. And I've had a lot of problems implementing multiplexed system calls like, what is it, _umtx_op, which is used for mutexes. It's just confusing and complicated. And my life would be simpler if it was a bunch of separate system calls. It's just easier to understand what's going on.
You don't have argument type confusion, where one multiplexed operation takes a pointer and one takes an integer. In all current CHERI architectures, that's fine. In our MIPS implementation that we've deleted, we had a separate register file, so you had to figure out where to go grab the value from. It was awful. Mark. Yeah, so Mark asks about whether we could add some introspection to ioctl so we could know what types a device uses. Yeah, it's not something I've really thought much about. One of the things I have thought about with ioctl is that one of the ways we differ from Linux, and I think everyone else actually, is that we do the copyin of the data, the object from user space, in a central location, and that's nice in a lot of ways. It's simplifying, it's great, but it's horrible for compatibility. It would be more convenient to make the transform at the lower level in some cases, and there are some tricks that Conrad did in his, I guess, master's thesis, where he generated custom copyins to do translation. That only works if you do it where you know what the type is, and with ioctl you only know what the type is when you get all the way to the device and the actual implementation, so it's an awkward mismatch. I think some things would be simpler if we would defer the copy, but I can't imagine us doing the work without also, like, revving the whole darn interface. Because you'd have to go and change every implementation. It'd be awful. There is a bit pattern in the ioctl command. There are three bits that, set in a particular way, are always invalid, so we could use that and have 29 bits that are still usable, which we could reuse to do a whole new implementation on top of it, but that seems kind of awkward. Yes.
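As an illustration of the bit layout mentioned above, here is a minimal C sketch. The constant values are assumed to match FreeBSD's `<sys/ioccom.h>` (three direction bits at the top of a 32-bit command, then a 13-bit length, an 8-bit group, and an 8-bit number, which adds up to the 29 remaining bits). The `SK_` and `sk_` names are hypothetical, and treating the all-clear direction pattern as the reserved one is an assumption made for illustration, since the talk does not say which pattern is meant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Direction bits: assumed to mirror IOC_VOID, IOC_OUT, and IOC_IN. */
#define SK_IOC_VOID      0x20000000u  /* no parameters */
#define SK_IOC_OUT       0x40000000u  /* kernel copies parameters out */
#define SK_IOC_IN        0x80000000u  /* kernel copies parameters in */
#define SK_IOC_DIRMASK   (SK_IOC_VOID | SK_IOC_OUT | SK_IOC_IN)

/* Parameter length field: 13 bits, stored starting at bit 16. */
#define SK_IOCPARM_SHIFT 13
#define SK_IOCPARM_MASK  ((1u << SK_IOCPARM_SHIFT) - 1)

/* Build a command word from direction, group, number, and length. */
uint32_t sk_ioc(uint32_t dir, uint32_t group, uint32_t num, uint32_t len)
{
    assert((dir & ~SK_IOC_DIRMASK) == 0);
    return dir | ((len & SK_IOCPARM_MASK) << 16) |
        ((group & 0xffu) << 8) | (num & 0xffu);
}

/* The three direction bits. */
uint32_t sk_ioc_dir(uint32_t cmd)
{
    return cmd & SK_IOC_DIRMASK;
}

/* The parameter length that a central copyin would use. */
uint32_t sk_ioc_len(uint32_t cmd)
{
    return (cmd >> 16) & SK_IOCPARM_MASK;
}

/* The 29 bits left over once the direction bits are removed. */
uint32_t sk_ioc_payload(uint32_t cmd)
{
    return cmd & ~SK_IOC_DIRMASK;
}

/* Assumption: a command with no direction bit set is never a valid ioctl. */
bool sk_ioc_dir_is_reserved(uint32_t cmd)
{
    return sk_ioc_dir(cmd) == 0;
}
```

A new encoding built on the reserved pattern would be free to reinterpret the 29 payload bits, for example to carry no length at all and leave the copy to the driver, without colliding with any command word that an existing driver accepts.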
Yeah, so Hans points out that in Linux ioctls, while there is an in and an out, there's no actual checking, because the copies are decentralized. That's actually something we might be able to address with CHERI. We could make those pointers read-only or write-only as we pass them down. That could be entertaining. There'd be lots of explosions, but they'd be good explosions. Anyone else? So shall we flee to lunch? All right, thank you, everyone.