So, five o'clock, let's get started. I'm David Strauss; I work with Pantheon. We're one of the earliest adopters of systemd on a pretty wide scale, and I've done substantial work in the code bases of a few projects, everything from Drupal to some database systems. What I'd like to talk about today is iteratively improving the quality of the systemd code base: how we structure our memory management, how we structure our data structures, how we audit our code base, and the improvements we can make toward that. We have a lot of patterns that get used all over the systemd code base. There are over 1,000 instances of cleanup routines, where we have to write code that undoes, in reverse order, all of the allocations that are not automatically cleaned up. This often manifests in the code as: check for an error condition; if it's an error condition, set a return value, then goto a finish label at the end of the function to do the cleanup. Sometimes we even have multiple finish labels, because there are different nesting levels of allocations. I don't have any problem with the goto itself, but it's pretty tedious to maintain the reversal of the allocations at the end of the code when we have patterns like that. We have over 4,000 instances of cleanup macros in the code, where we manually annotate variables using GCC's extensions to C. This is non-standard C: the idea of running some sort of routine, almost like a destructor, to clean up memory once an object goes out of scope. On the upside, this means the object gets cleaned up automatically, without the sort of finish routine I just described. On the downside, it means we need cleanup macros that are often type-specific, and some cleanup macros go several levels deep in their freeing operations in order to provide the right cleanup for a particular data type.
And when you use this, you have to pick the cleanup macro that matches the data type, because there's no implicit link between the two. And we have over 6,000 lines of code providing functionality that's extremely similar to what's in the C++ standard template library: strings, sets, hash maps, bitmaps. Overall, we dedicate an enormous amount of code to recreating C++-style functionality in the systemd code base. We even have our own attempt at a class hierarchy with things like units, where a unit is basically an abstract superclass, and services, sockets, targets, you name it, are all concrete subclasses: certain attributes are inherited by all units, and each unit type also provides its own. We do this through a combination of structs, macros, and lookup vtables that take the methods stored on each of these types, with different implementations for the different unit types, and map them to the actual routine that runs them. It's not really plain C. In fact, systemd can't compile on a compiler that only does standard C. Some of these things, like the cleanup routines, use compiler-implemented features that go beyond standard C and actually use parts of the C++ runtime. When you use a cleanup macro in GCC, it's handled by the same portion of the compiler that would run a destructor on that object, because it tracks the lifecycle the same way. And in fact, almost every C compiler these days is built as a C++ compiler that then implements a subset of the functionality to compile C.
And I think a lot of this is really clunky in terms of the developer experience, for people improving the systemd code base, for people reviewing it, for people learning it. I think it creates a pretty high barrier to entry for getting additional developers, and even for ourselves working on improved code. So take something like this: relatively fresh code in the systemd code base, a pretty leaf-level project in the sense that this isn't a core routine, it isn't part of PID 1, it's just a utility you run from the shell. We have a vector of strings here, and you can see a cleanup macro being attached to it. That cleanup macro was written specifically for this use case of a variable-length vector of strings, which is why it's _cleanup_strv_free_. For a lot of the data structures, there are specific behaviors being implemented on them; for the hash map, string_hash_ops defines that it's operating on strings. And then we also have to figure out whether ordered_hashmap_free_free_free is even the right call for cleaning up the hash map we've created. If we get this wrong, we leak memory. In the case of this utility, that's not a terrible consequence. The reason I chose this utility is that later I'm going to walk through a conversion I did of it, incrementally converting just this one utility to what is, in my opinion, a much more concise and safe, minimalist C++ code base. And as I mentioned before, with these finish routines that we goto, we have to order the cleanups correctly, because things don't automatically get unwound in the reverse of the order they were created. If we order them incorrectly, we can end up with use-after-free scenarios, for example if we free the container structure before we free the members of the structure. So there's a lot you can get wrong even in this small amount of code.
Additionally, we've overloaded the concept of int throughout the code base to carry both success indications and actual return values: the convention is that values below zero indicate an error, values above zero are sometimes a returned value, and zero means success. But there's no great way to enforce this, especially if you're refactoring code from something that returns, say, a boolean-style success value to this kind of convention. And in fact, if I run analysis on this particular piece of code, it isn't clean: freeing the hash map leaves some lingering allocations reported as "still reachable" by the dynamic analysis. No, it's not leaking; it's just the implementation. So even though this isn't a leak, it creates a false positive in the analysis of the code. Yes, I hear it's since been fixed, but I ran this analysis this morning, using Valgrind, on the master code base. In any case, and you'll see in the later slides that this is part of my point, it's often not clear whether we're doing something wrong or whether we've implemented it right and are merely confusing the dynamic and static analysis. Moving on to some of my other concerns: we're so oriented around these event loop structures, and we lose a lot of type information in the process. We have things like this, where we pass a void star: the pointer got de-typed earlier to void star, and then we re-type it back to a Manager pointer here. This is really hard to validate.
Some systems like Coverity can get a little bit of validation going on this, but they're using a lot of heuristics to do so, and there's almost no runtime or compile-time validation of this sort of setup: if this is not actually a pointer to a Manager, it will just crash. We just can't analyze it well, and so we're confusing tools all over the place. In this case, this is Clang Analyzer, a pretty popular static analysis tool that's built into the LLVM code base. It's totally open source, it's a great tool for diagnosing things, and it doesn't work on our code base, because we use functionality like _cleanup_free_ that it doesn't understand. In fact, this code does not leak: the cleanup free gets invoked if the variable has been set to something and it goes out of scope, but the analysis tool considers it a potential memory leak. There are about 306 instances of this in the code base where the Clang static analyzer thinks there's a problem even when there isn't. And it's really hard to get down to where the actual problems are, and to keep improving code quality, when we have that many false positives. I think we can also start looking at some of the constructs that have been added in modern C++. This is totally different from the C++ that a lot of people have grown to hate, and there are plenty of reasons to hate old-school C++: it has awful templates, it has awful pointer and smart pointer systems. But around 2011, they decided enough is enough, and they added things to the language mostly aimed at supporting confidence in design around memory safety: ensuring things are freed, ensuring things aren't used after free, ensuring the handoff of ownership of data is extremely clear in the code and enforced at the compiler level. And all of this is supported down to the static and dynamic analysis layer.
As a sort of spectrum here, there's a big difference between a lot of these systems, but I see a span where you have standard C, which doesn't do some of the things we rely on, like cleanup. Then you have the GCC-flavored C that systemd is written in, which has extensions that rely on some of the C++ functionality even though it's not C++. And then you have systems gravitating more and more toward enforcing safety as the means of using the system. Look at something like Rust: it's designed from the ground up to prevent you from doing things that cause unspecified behavior, or unpredictable memory allocation, or the other patterns we see in broken code. But C++11 goes a long way toward that, and we can actually convert the current code base iteratively toward it and gain some of these constructs that let us be confident in the memory lifecycle of a lot of the implementations we're doing. The specific conversions I think would be useful: going from cleanup macros and goto-finish to proper destructors on these objects. That would remove a substantial amount of code from the code base, ensure that things get unwound in precisely the opposite order they were set up, and simplify a lot of our error-handling return paths, because we wouldn't have to create these multiple jumps to different layers of unwinding through goto-finish or multi-level unwinding structures. We could have typed data going into callbacks: instead of handing void stars through things, we could use lambdas that get built at compile time and maintain type safety all the way through an event loop callback.
And we could be using references, move semantics, and smart pointers to rule out cases like the possibility of handing a null pointer to something that expects a value, which means we could remove quite a few assertions in the code base, because we wouldn't need to assert that a memory address has been assigned for something that is being handed over one of those ways. And as we provide APIs to different parts of the code, handing over the ownership of data objects, where, say, one function creates an object and then hands the ownership of that memory to its caller, can actually be enforced at the compiler level from early development time. In fact, when we look at the total results, almost all of the things we're getting dinged on are things we could mitigate by selectively using some aspects of C++: destructors, smart pointers, RAII. RAII is the idea of "resource acquisition is initialization." There are many cases in the code base where we construct something, as far as we think, but we're not 100% sure it's constructed, and then we have to check error values to see whether it's been constructed properly. RAII means that if it's been constructed, and we have the data, and we're moving forward in the code, the invariants of whatever we're working with actually hold. Just looking at the distribution of issues here, possible null pointers, possible leaks: I'm not saying we're actually doing these things wrong, but it's creating so much noise that we can't look at a report like this and know with confidence that we're getting a good result, free of a bunch of false positives. And there's really no way to solve this with suppressions. I mean, you could put comments in the code to tell the static analyzer to ignore those lines, but that doesn't really help.
We could do manual freeing of these things rather than using the cleanup attributes, but I think that's more dangerous than what we're doing today. So in some ways, if we want these tools to work for us, we have to work within the bounds the tools expect. Because it's not really just about the code working, or about having achieved a low defect rate in production after long periods of iteration. It's about whether we can look at a piece of code, analyze it, and be confident before it goes out the door that it's actually correct. So just to clear up what I'm not pushing for at all: I'm not pushing for any kind of generalized conversion to idiomatic C++. I'm not pushing for any extensive use of objects, or even adding many exceptions to the code base. It's about using some of these facilities for memory management, and about using some of the high-level data structures. And we totally wouldn't be the first project to do this; GCC did this years ago. Sorry if these slides are a little text-heavy, but I'll just summarize what they're saying. When GCC looked at this, they started around 2008, and by about 2010 they had decided to use a subset of C++: they basically said, we have a C-based code base, let's take a handful of features from C++, allow those specific ones to be used in GCC, and thereby allow more elegant structures. One of my favorite arguments around this, and I think this is Ian Lance Taylor of Google saying it, is that C++ supports cleaner code in many significant cases, and it never forces us to do anything more ugly, because we can always use C. C, with very few exceptions, is directly usable within a C++ code base. In terms of the effect on the project, on modern-style multi-processor, multi-core systems:
They saw no impact on bootstrap times, and they actually saw improvements in the execution time of a lot of their code, because they were able to use better-optimized data structures available in libraries like the STL, which are generally better tuned, especially on a platform-by-platform basis, than the data structures they could write manually in their C code. And they saw absolutely no drop-off in contributors, no drop-off in commits; there was no exodus from the project. I know a lot of people panic at the idea that as soon as something touches C++, everyone runs away from it, from a kernel developer perspective; I know the kernel team is really ideological on this. And I think the argument is actually way stronger today than it ever was for GCC, given the advancements made in C++11 and C++14. So I think we could do a lot of small improvements. We could take leaf utilities, command-line tools and certain daemons that no other code relies on, as an initial foray into this. We could focus on cleaning up analysis results: replacing these cleanup attributes with destructors, replacing some of the freeing routines with destructors, and replacing some of the ways we hand off data with methods that are more semantically validatable at compile time. Then we could eventually graduate to deeper work on core code that other systems actually rely on, while still exporting symbols as C wherever we want for libraries. So there's really no limit to the scope we could choose, because we can always maintain the libraries and symbols that we export to the rest of the world just as we do today; there's no real dependency there. And eventually we could remove all those now-unused macros and data structures that are largely redundant with what C++ supports.
So I did this to sysctl.c. It dropped the line count by about 26%, and that's not even counting any retirement of libraries within systemd; I'm talking purely about this single file. And just to compare the structures I'm using here: we can forgo a lot of the boilerplate we have to apply today. We don't need the finish routines, we don't need the manual freeing, we don't need the bespoke systemd invocation of these hash maps, the ordered_hashmap_new stuff, because that's handled at a higher level here. And I'm rather a fan of the semantic error codes, because they provide a way to distinguish between an integer and an error code, which is useful when refactoring. So we can simplify the code a whole lot: typed error codes, just returning when we're done, letting the destructors take over, and using these high-level specifications of how the data structures work. In the conversion, I was able to drop many of the requirements on the internal libraries and add a handful of external ones. I don't actually have a strong opinion on extra libraries beyond the STL; I think we could benefit a ton just from the basic C++ facilities. In this rewrite, I did use some Boost, just because I thought it would be fun, and that helped cut the code considerably. But from the memory safety and analysis perspective, we could get all of the gains I've talked about without touching Boost at all.
There are some weird things: C++ doesn't really like the hash map container we have in systemd, because it relies on GCC features that are not part of standard C and that conflict with parts of the C++ standard; they're GCC add-ons that actually use parts of the C++ engine for things like type checking. So it's not a problem in the C code itself; it's only a problem in code that actually gets converted to C++. This doesn't create a widespread problem; it just meant I had to replace the hash map with a vector or a map from the STL in the code I was converting. So with that, I'll open it up to questions. We have a couple of minutes left. I have a question. On the first part, I agree: we can improve the code base. On the second part, you're referencing an example from a project that happened about ten years ago, GCC's move to C++. And at the same time, you're saying you want something which is correct or safe, so aiming at Rust, and then you say, but I actually want to use C++, which is a kitchen sink, just a proper subset, so we still have to be very careful. So the question is: given that other projects like Firefox, VLC, Tor, GStreamer, and a few more are already moving in that direction, and they are aiming directly at what you mentioned first, a safe language, why are you suggesting we start rewriting these things now in something that doesn't have what you said you wanted? I was trying the same kind of experiment in Rust with the new MazeZone BIOS system, and it's nice, but I cannot force people into that, and I'm not proposing that. So I'm surprised you're proposing to move in that direction but not going all the way, and my question is why? I don't think Rust is viable for a substantial portion of the systemd code.
It doesn't support forking, for example, as part of the core operations. I mean, you could do the system call, but it's not supported, and forking is a pretty core part of running a PID 1 for the system. The other issue is that there's no real way to move iteratively. We would basically have to go file by file, rewrite entire subsystems, and create bindings for C, which you can do, but it's a much higher-impedance bridge between C and Rust than between C and C++. It's impossible to do what we can in C++, where we change part of one file to improve some of its verification properties. And I agree, C++ is a kitchen sink and you have to be careful which parts you pick, but I think there's substantial redundancy between what we've built in the current code base and what C++ is designed to offer. And the example I gave is somewhat proof that you can do this sort of thing iteratively. I took one file, changed the build information in the Meson setup to support C++ for the project, just a few lines changed there, and a few lines changed in sysctl.c, and suddenly I'm able to build that particular utility as C++. So your point is, first, that the standard library of Rust is not flexible enough, and second, that you want to retain C compatibility at every level, internal and external? Yes, I think that. I think it would be totally reasonable to rewrite certain isolated parts of systemd in Rust, but I think there's no real way to move PID 1 to Rust. But you're not arguing for that here; you're saying you don't want to move all of systemd, you want to start somewhere, right? sysctl or... I think, really from a risk mitigation perspective, we could do it for small parts of the external project and then work our way inwards. And part of what I was showing with the GCC example is how low-impact a change it is.
And so it doesn't create any kind of gigantic burden of work for the whole project. For GCC, the conversion didn't create any period of time where they weren't implementing new features or making improvements, which is the thing I would worry about for a larger conversion. One point on Rust: the official compiler supports x86 and ARM, and for everything else you're on your own. So if you want to support something like MIPS or one of these other platforms, you also get to be a compiler maintainer. Yep, that's a decent point. Part of why I was emphasizing the support in GCC and the compilers we already use: everything I'm talking about here would be supported on all the same build environments we already support. Oh, no more time, sorry. Thanks.