Welcome everyone to the first session of this year's EuroBSDCon. Please welcome Paul Goyette. He's going to talk about the challenges in replacing #ifdef messes with run-time code selection, so give him a nice welcome.

Good morning everybody. Oh good, this thing's actually working. So anyways, my name is Paul Goyette. I've been with NetBSD now for a very long time, at least like 12 years, and been contributing for 25 years, so it's been a little while. Today we're going to talk about some work that I did over the last year and a half in changing the modular approach to the NetBSD compat code. I'll talk a little bit about the motivation, why I did it, some of the issues that were involved, the approach, the implementation status, and finally at the end some thank-yous. If there are any questions along the way, feel free to interrupt; this is not going to take the full session time unless something miraculous happens, so feel free to interrupt with questions along the way.

So anyways, NetBSD has always prided itself on being able to maintain compatibility with older versions of NetBSD. We actually have code such that if you have a NetBSD 0.9 image, it will still run today, provided you've included the compat code in your kernel. Pretty much anything that's ever run on NetBSD will still run. We also have the ability to load kernel components dynamically as needed and unload them when we're done with them, either automatically or under user control. At one point the compat code was just monolithic and built into the kernel; now we can do it in modules. That makes life a little easier, and it made the work I was doing a little bit less difficult to do.

The main reason I started all this is because I got bit by one of the main problems that we have. On my systems I only run highly modularized kernels. I basically strip everything out of the kernel and load everything as a module if and when I need it.
So the GENERIC kernel for NetBSD has somewhere around 250 modules that are built into the kernel, so that everybody gets everything. When my kernel boots it's got 20 modules, or 22 I think I counted last night, and at peak runtime I have about 70 modules running. That's all the stuff that gets loaded dynamically when needed.

So one of the things that changed one day while I was running: somebody changed the way getifaddrs works; they changed the code in rtsock.c. And my kernel didn't work anymore, even if I loaded the compat module with all the compat code in the world built in. It turns out that if you hadn't defined the COMPAT macro when you built your kernel, there was nothing to call the compat code. It didn't matter if you had the code available: if you don't call it, it doesn't work. So I actually had to go back and start running a GENERIC kernel until I fixed the problem. Oh my god, what was that all about, right? Even if I loaded the compat module it couldn't work; I had to do it manually.

Some of the big issues: the compat code was all controlled by a bunch of #ifdef spaghetti. There's a COMPAT_XX option for every single version of NetBSD that's ever been issued, and it gets kind of nested at times. It got really messy, and the hard job was to pull it all apart and separate the code out: the code that you need for COMPAT_50 goes in one module, the code you need for COMPAT_60 goes in another module. And sometimes compat code replaced other compat code. There were a few situations where we'd gone through multiple revisions of certain syscalls in particular. I forget which one it was now, but there was actually one that had gone through three separate revisions, so there was an old and an old-old-old version of the syscall as well as the current one. It got really nasty.

And the worst part was that modules can be unloaded when they're done. Well, how do you know when you're done with compat code?
How do you know you're not already executing it on another CPU? It's really nasty if you do a modunload of some syscall while the syscall is executing, or an ioctl.

There are lots of other options involved in building these modules, too. For example, the FFS file system code is built conditionally on WAPBL (Write Ahead Physical Block Logging); that's the logging, the journaling portion of the file system code. It's not LFS, it's the journaling for FFS. The FFS module depends on whether you've defined that option or not. So the compat modules have their own copy of those macros to define which code gets built in, and you've got to keep them in sync.

And there's no clear way to find out whether the optional code was included. In the example I had before, where the rtsock code changed, there was no way for the current version of the rtsock code to determine if it even needed to try to look for the COMPAT_50 code or the COMPAT_60 code. It just didn't know. So it was really nasty: you don't know if the code's there, but you've got to call it if it is. And that rtsock code basically always assumed that the compat code was built into the kernel; it didn't bother checking.

Standard builds only provided one version of the compat module, the monolithic module, and there were no provisions in the original model for... well, let's say I built my kernel with COMPAT_50, so anything that was built for NetBSD 5.0 and above would work. Oh crap, now I've got a 4.0 image that I want to run. You couldn't add the 4.0 compatibility code to your kernel without removing all the existing compatibility code, building a new module with the 4.0 code built in, and then loading the new module. It just didn't happen. You couldn't incrementally increase the number of versions back you went.

Device driver modules have a way of determining if they have any instances created. Syscall modules can actually tell whether they're active or not; we actually keep track every time a syscall is entered.
The syscall table entry has a refcount, and even the buffer queue strategy modules have a reference count. But some of this compat code isn't syscalls; it's additional options within an ioctl call. There was no way to know that the code was there and no way to know if the code was active. And again, if you happen to unload a module while you're executing it, bad things happen. We didn't have the ability to deal with that.

So I worked with one of our other developers, Taylor Campbell, to define a module hook mechanism: when the compat module code is loaded, it installs a hook in a known location in the kernel. Code that might want to call compat code, if it exists, does an indirect call through that hook. If the hook is set, it calls the compat code; if the hook's not set, it doesn't do anything, it just returns EPASSTHROUGH. That would have been great all by itself, just having the function pointer, and in fact some of the code had previously been modified to use indirect calls. But it didn't solve the problem of unloading the code out from underneath the code being executed.

The second thing that was done was to split the compat code into version-specific modules, so that we could incrementally increase the number of versions back that we go, and then unload just the piece that we don't need anymore when we're done with it, without having to unload everything. There's no more monolithic compat module; now it's compat_50, compat_60, compat_70, and so forth.

So as I said, when the module code is loaded, it sets the hook. When a caller needs to possibly call compatibility code (it's determined, for example, that it doesn't know what the current ioctl command is; maybe compat code handles it, maybe not), it calls through the hook and sees if the compat code can handle it. If the hook doesn't handle it, you haven't lost anything except a few instruction cycles for the call. Most of the time, if the hook's not set, we simply return ENOSYS instead.
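The basic idea can be sketched in plain C. This is a minimal userspace model, not NetBSD's actual implementation; the hook and handler names are invented for illustration, and the synchronization the talk describes next is deliberately omitted. The caller makes an indirect call through the hook when it's set and otherwise gets EPASSTHROUGH back:

```c
#include <stddef.h>

#define EPASSTHROUGH (-4)	/* "not handled at this layer", as in NetBSD's errno.h */

/* A hook is a "set" flag plus a function pointer living in the kernel
 * proper.  (The real hook also carries synchronization state.) */
struct example_hook {
	int set;
	int (*func)(unsigned long cmd);
};

static struct example_hook compat_ioctl_hook;	/* hypothetical hook */

/* Caller side: try the compat code if a module has installed it. */
static int
call_compat_ioctl_hook(unsigned long cmd)
{
	if (!compat_ioctl_hook.set)
		return EPASSTHROUGH;	/* no compat module loaded */
	return compat_ioctl_hook.func(cmd);
}

/* Module side: install the handler at load time, remove at unload. */
static void
compat_ioctl_hook_set(int (*f)(unsigned long))
{
	compat_ioctl_hook.func = f;
	compat_ioctl_hook.set = 1;
}

static void
compat_ioctl_hook_unset(void)
{
	compat_ioctl_hook.set = 0;
	compat_ioctl_hook.func = NULL;
}

/* A stand-in compat handler, purely for demonstration. */
static int
demo_compat_handler(unsigned long cmd)
{
	(void)cmd;
	return 0;			/* "handled successfully" */
}
```

Whether the compat module is loaded or not, the caller's code path is the same: one indirect call, with EPASSTHROUGH as the "nobody home" answer.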
So the mechanism that was used for protecting the hook from unloading is actually fairly heavyweight; I couldn't think of a very lightweight way to do it, unfortunately. We actually use passive serialization (pserialize) to prevent someone from starting a new acquire of a localcount synchronization object. The localcount is essentially the reference count for the actual invocation of the compat code. If you want to unload the compat code, you have to wait and drain the localcount, and the localcount has a requirement that you must provide a mechanism to prevent further acquires from happening. In a multi-CPU environment, it was actually possible for the check of whether the localcount needed to be drained and the actual start of the draining to race with each other, so there was still this possibility of having the code unloaded from underneath itself. So the passive serialization is used to prevent new acquires, the localcount is used to track the active references, and before we unset the hook, which is a prerequisite for unloading the module, we drain the localcount, which makes sure that no new local references can get added.

So the module hook is just a macro that defines the hook. It's got a mutex and a condvar and a localcount and a pserialize object, and then it's got the two important things: a flag that says whether the hook is set, and, if it's set, a function pointer with an argument list inside the structure. So the hook structure itself is mostly synchronization variables, and oh, by the way, we have a function pointer. It's fairly heavyweight, but you don't do it that often, and it's not really all that much memory. Because all the hooks are different, they all have unique prototypes, so we used the macro approach rather than a fixed type for the hook function itself. And you can see here we've got the individual macros that are used to access the hooks.
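The gate-then-drain sequence can be modeled with two atomics. This is only a toy, single-threaded illustration of the idea under invented names; the real kernel uses pserialize to fence off new acquires and localcount (with a condvar) for the waiting:

```c
#include <stdatomic.h>

/* 'enabled' plays the role the passive serialization protects: it
 * gates NEW entries.  'refs' plays the localcount role: it tracks
 * in-flight callers of the compat code. */
struct hook_gate {
	atomic_int enabled;
	atomic_int refs;
};

/* Caller: try to enter the compat code; returns 1 on success. */
static int
hook_enter(struct hook_gate *g)
{
	if (!atomic_load(&g->enabled))
		return 0;		/* hook is being torn down */
	atomic_fetch_add(&g->refs, 1);
	/* The real code closes the race between this check and the
	 * increment with passive serialization; omitted here. */
	return 1;
}

static void
hook_exit(struct hook_gate *g)
{
	atomic_fetch_sub(&g->refs, 1);
}

/* Unloader: first forbid new entries, then wait out existing ones.
 * Only after this returns is it safe to unset the hook and unload. */
static void
hook_drain(struct hook_gate *g)
{
	atomic_store(&g->enabled, 0);	/* no new acquires */
	while (atomic_load(&g->refs) > 0)
		continue;		/* the kernel sleeps on a condvar instead */
}
```

The ordering is the important part: disable first, drain second, unset the hook last, so no CPU can be executing compat code when the module goes away.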
For invoking the optional code, before, you would basically check a function pointer to see if it was set, and if it was, call it. Now you call the hook macro, and it does the same thing; it does a little bit more behind the scenes, but it accomplishes the same result. The module initialization code sets the hook; the fini code clears the hook. Pretty simple stuff. There's a compat_stub.h that defines all the hooks as types, and the actual hook memory is allocated in compat_stub.c.

As I said earlier, the second major change besides defining the hooks was to split everything up. We had a lot of different versions of compat code going all the way back to 0.9, even though some of it's been disabled by default; by default we start with 1.5 and up, but we can add the other code if we want. So the 0.9 code assumes that you've got the 1.0 code, because if you're going to go back to 0.9, you'd better have everything in between your current version and where you're going back to, because you never know what's going to happen. So 0.9 depends on 1.0, 1.0 depends on 1.1, and so forth; 1.5 depends on 2.0, which depends on 3.0 and 4.0. The dependency list got pretty long, and, oh, we used to have a hardwired constant for how many dependencies you could have in a given module. Well, we exceeded that. We could possibly have made the module initialization code recursively call the module load code for the things it needed explicitly, rather than just depending on the required-module list, but there was a limit on that too, and guess what, we exceeded that as well. So we had to get rid of a couple of compile-time constants.

And, oh yes, this is the key thing: syscalls.master. Some of this compat code implements actual syscalls, and syscalls.master, several years ago, learned how to auto-load the compat module if a compat syscall was needed. Oh, by the way, that was my fault; I did that too.
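Because every hook has a unique prototype, the macro approach stamps out a per-hook structure and accessors instead of forcing one fixed function-pointer type. Here is a compilable sketch of that idea; these macro names and the compat_50 handler are invented for illustration, not NetBSD's actual compat_stub.h macros:

```c
#include <stddef.h>

/* Stamp out a uniquely-typed hook: a "set" flag plus a function
 * pointer with this hook's own prototype. */
#define HOOK_DEFINE(name, ret, arglist)					\
	struct name {							\
		int set;						\
		ret (*func) arglist;					\
	} name

/* Module init code sets the hook; fini code unsets it. */
#define HOOK_SET(h, f)	do { (h).func = (f); (h).set = 1; } while (0)
#define HOOK_UNSET(h)	do { (h).set = 0; (h).func = NULL; } while (0)

/* Call through the hook, or yield a default result if it isn't set. */
#define HOOK_CALL(h, defret, args)					\
	((h).set ? (h).func args : (defret))

/* One hook with its own prototype, as compat_stub.c would allocate. */
static HOOK_DEFINE(compat_50_hook, int, (int, void *));

/* A stand-in version-specific handler, for demonstration only. */
static int
compat_50_handler(int cmd, void *data)
{
	(void)cmd; (void)data;
	return 0;			/* "handled" */
}
```

Each invocation site stays a one-liner, and the default return (EPASSTHROUGH or ENOSYS in the kernel, -1 in this sketch) is chosen per call site.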
But now that we're splitting the compat module into all these little version-specific compat modules, we had to go back and fix syscalls.master so that it did the same thing, but only for the specific version-specific module that it needed to load.

As I said, there were the compile-time limits. We used to have a maximum of 10 dependencies, and if you start counting versions back from 0.9, it adds up to a lot more than 10. And even if you went through the recursion process, the explicit recursion of having one module's initialization code do a module_autoload, there was a hard-coded limit of six on that as well.

So where are we now? There's a lot of work that went into this. It took me about a year of elapsed time; I didn't even start counting the hours or days of actual work on it. There were days when I'd code all day long, and then there were weeks when I did nothing because I was getting frustrated and couldn't figure something out. All together, it took about a year to do. We merged it into HEAD in the middle of January of this year. That was back when we thought we were going to pull the NetBSD 9 branch imminently. Of course, we pulled the branch; we just haven't released the code yet. So NetBSD 9 will ship real soon now, and the changes that we're talking about today will be in that code.

There are still a few little bits and pieces that didn't get done. The compile-time restrictions were removed, and oh, by the way, because we changed the way the modules worked, we had to add some compat code to make it compatible with older versions of modstat, which still thought there was a fixed limit. I actually had to build a compat_80 module to hold the code that allowed me to have compat_80. Yeah, that was interesting. We subsequently had some other changes go in as well, mostly in the rtsock code. That rtsock code really is pretty ugly stuff.
A little bit more on that when we get to the thank-yous at the end; that's one of the things.

So all the version-specific modules were created, and pretty much all the compat code calls were converted to use the hooks. There are a couple of cases where optional code cannot be modularized, and at this point I got lazy and didn't use hooks for things that could never get unloaded, because they're not modular code yet. I think some NTP code falls into that category: either your kernel has the NTP code or you can't load the NTP code as a module. So the NTP hooks that exist as a result of including NTP are still just old-style indirect function pointers, without all the synchronization.

It turns out that the NetBSD32 compat code also needed pretty much all of these changes, because if you want to run 32-bit compat binaries on your 64-bit AMD64 processor, well, yeah, you still need the compat code. And it didn't make sense to split up the monolithic 64-bit code without also splitting up the 32-bit code, so I bit the bullet and did that as well.

There are still a few areas that aren't done. There are some machine-dependent bits and pieces. Most notably, on AMD64 the code that does microcode update is optional and modularized, and there's actually some compatibility code in there as well because something changed; I don't even remember the details off the top of my head. It doesn't work on Xen: the build infrastructure doesn't include the right headers for the modules. Even though it knows how to build Xen versions of modules, it doesn't include the right headers in the right order, and I just never got it to compile. Christos says he's looking at it in his copious spare time; yeah, we'll see when that happens. There are still a couple of old-style calls in the GPIO code and in wsmux. And I have not yet done a full audit to see if there's anything else that I missed. I tried to find everything, but you know how that goes.
You always miss something; I just don't know where yet.

A couple of areas for improvement: the hook definition mechanism, as I said at the beginning, is a little bit heavyweight. It's got a lot of synchronization structures built into the hook just to make sure that you don't pull the hook out from underneath the code while it's executing. I can't figure out a way to get rid of that, and nobody else could either, but it'd be really nice if we could make it a little bit simpler. It'd also be nice if we didn't have to have the kern_stub stuff to actually allocate the hooks. The hooks themselves have to be there whether or not the code that they point into is there; otherwise, you wouldn't have a place to go through for the indirect pointers. So there's a little bit of static, permanently allocated memory in the kernel to hold these, and it'd be nice if we could get rid of that, but again, I couldn't figure out a way to do it.

There are a couple of comments I've received from other people that, gee, adding a new hook is laborious. Yeah, well, it could be worse. It could be better, but it could be worse. So possibly some sort of non-procedural definition mechanism might help. Anybody think about config and maybe teaching it how to do this kind of stuff? I don't know; I don't want to touch config, it's a mess. This was an example that somebody sent me as one possible way to think about doing this, and it's a future project. Not done yet.

So I did most of the work here, but I definitely could not have done it without a lot of help. Taylor Campbell came up with the initial structure for the hook and the synchronization. It turns out I actually had to add the passive serialization to prevent the localcount race, where a new localcount acquire could run at the same time as the localcount drain was starting. Yeah, that would have been pretty nasty, because the counters would get all whacked out.
And Christos Zoulas provided major encouragement. I said earlier there were a lot of places where I just shut down for a week or two at a time because I couldn't do it. It was Christos who kept nudging me and making suggestions on how to approach things, which got me over a lot of the humps along the way, especially with the rtsock.c code. The rtsock.c code is really ugly. It basically #included most of itself into the compat code with different definitions of compile-time options, so essentially you ended up compiling the same code twice into one kernel, once with and once without various things being defined. Really nasty stuff. I pulled that out, and the duplicated code is now a separate source file, so you can actually find out where the code is, and whether it's built once or twice doesn't matter. The place where the common code gets included defines all the compile-time macros appropriately.

The real help came after the merge, when all the bugs that I created got found. There were a few; I think we found all of them. But the entire BSD community, users and developers both, made a major contribution here just by finding and identifying the bugs so we could get them fixed. So that's it. Anybody got any questions?

I'm curious, what do we do for auto-loading, like for ioctls? How do we configure that now?

A little backstory: when GCC was upgraded some time ago, it started to produce new kinds of relocations, and the in-kernel linker code on some ports, which was not prepared for that, started spewing warnings each time you tried to load a module. If you run vi, it issues some terminal ioctl, and some of the ioctl routines try to auto-load a module to see if it will handle that ioctl. So on each invocation, you get the module auto-loaded; it checks, oh, I cannot handle that ioctl; it's unloaded. Then the next time you run vi, it gets auto-loaded again.
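The pattern of compiling shared code twice with different macro settings can be imitated in a single file. In the fragment below, the two "inclusions" stand in for the current kernel and the compat_50 module each including the common rtsock source; the function names and message sizes are invented for illustration:

```c
/* Before the cleanup, rtsock.c effectively #included most of itself
 * into the compat build with different compile-time options.  The
 * shared code now lives in one common file; each consumer defines the
 * knobs and then includes it.  Simulated here by expanding the
 * "common code" twice with different macro settings. */

/* --- first "inclusion": the current kernel --- */
#define RTSOCK_FN	rt_msg_size
#define RTSOCK_MSGLEN	64		/* invented size */
static int RTSOCK_FN(void) { return RTSOCK_MSGLEN; }
#undef RTSOCK_FN
#undef RTSOCK_MSGLEN

/* --- second "inclusion": the compat_50 module --- */
#define RTSOCK_FN	compat_50_rt_msg_size
#define RTSOCK_MSGLEN	40		/* invented size */
static int RTSOCK_FN(void) { return RTSOCK_MSGLEN; }
#undef RTSOCK_FN
#undef RTSOCK_MSGLEN
```

Each inclusion yields a distinct symbol, so the current and the compat versions of the code can coexist in the same kernel while the shared logic lives in exactly one place.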
You've got to figure out a way to make it take nine-plus seconds between the load and the unload so that the auto-unload won't happen, or manually load the module that's involved, because manually loaded modules will not get auto-unloaded. But yeah, that problem wasn't part of the scope of this project. There are lots of problems with modules in general. If anybody really wants to look at it: I got so tired of having to remember all the problems that I actually wrote them down and then collected some enhancements to that list. So there's actually a TODO file in the src/doc directory, TODO.modules. I think at last count it had more than 20 items, areas that need improvement in the module subsystem, not specific to the compat code. Thanks. Feel free to help.

In that case, I want to thank you.