I would say that it's great to be back, except that we are not really back, of course; we are going to have to do things this way this year. So I'm going to talk about how we set up debug packages in OpenBSD. It's somewhat old work by now. I was supposed to present this last year at EuroBSDCon, which didn't happen, obviously. So I'm doing this from memory; I hope I didn't forget too many details. It's been a long while, actually. The good point is that we have had more feedback about how it works, but some details might be a little fuzzy, especially about who did what specifically.

So this all started very quickly. As usual in OpenBSD, it was at a ports hackathon. In November 2019, Paul Irofti set up a ports hackathon in Bucharest. Many thanks, it was wonderful. And as usual, I was planning on working on something else. On about the first day of the hackathon, Paul came to me, telling me: hey, you know, there's this new stuff in objcopy and GDB that we can use to actually build debug information into packages, because we can now split the debug information off from the main program. And, well, why not? Let's try that.

So how does it work? It's about three lines with the binutils tools (see the sketch at the end of this section). The first one keeps only the debug information, in a separate file: you run objcopy --only-keep-debug, and you get the debug information on the side. Then you run some strip invocation on the program proper, so that the program loses its debug information. And finally, there is a GNU extension for GDB which allows the program to say that its debug information actually lives in the new file we created. So this is very stupid: basically, you just cut along the dotted lines and you get your debug information on the side. Quite simple. We tried that, and it worked. And that's about all.

Okay, no, of course not. There are still about 30 to 45 minutes of talk to go. The thing is, when you get a proof of concept like that, where you figure out that you can take the debug information out and put it on the side, it's only the first step. There are lots of infrastructure details to get right to make things as painless as possible for the poor folks, including me, who are going to actually port software to OpenBSD and try to provide debug information for people.

The first thing to do was to actually ship that debug information, which means that alongside normal packages, you're going to have debug packages. Should we make them visible? By this I mean that in the infrastructure, when you do make package, you actually depend on cookies that correspond to building each package. So should we add debug packages to the list of cookies, making them full-blown packages? This was complicated to do, so I decided to try something different: make debug packages, let's say, phantom packages that don't really exist. More specifically, instead of creating two packages each time I do make package for a simple port, I create just one package, and on the side I also create the debug package, so it doesn't really show up all that much in the infrastructure.
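Spelled out, that three-line dance looks like this; a minimal sketch, where the exact strip options may differ from what the infrastructure really uses:

```
mkdir -p .debug
# 1. copy only the debug information into a side file
objcopy --only-keep-debug prog .debug/prog.dbg
# 2. drop the debug information from the program proper
strip --strip-debug prog
# 3. the GNU extension: record a debug link so GDB can find the side file
objcopy --add-gnu-debuglink=.debug/prog.dbg prog
```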
As for updating stuff at runtime, this was actually the trivial part, because we've got update signatures for normal packages, and it was completely straightforward to just say: okay, the debug package is going to have the exact same update signature as the normal package, so that when you update the normal package, the debug package updates as well, at the same time, in synchronization. That was really the only part that worked from the start without needing any changes to make things work better.

Since we were at a hackathon, it was the perfect location for some rapid development. Like I said, make package in a nutshell is just making package cookies, because it's OpenBSD and we have multi-packages, etc.; I will come back to that later. As far as the debug packages go, I chose a very simple naming convention: I just prepended "debug-" to the front of the package name, for two reasons. One reason is that we don't have packages whose names start with "debug-". The second reason was simply that when you look at the full listing of all packages on a snapshot, if you put everything with "debug-" at the front, you get a perfectly normal listing with just a few extra pages of debug packages; you don't scatter debug package names all over the place. I could also have decided to put them in a separate directory, but for mirrors and things like that, this would have meant adding logic that I didn't want. I wanted to make things as simple as possible.

As for creating the debug information, the first iteration was simply to add a variable to the port Makefile, DEBUG_FILES, which listed explicitly which files we wanted debug information for, and to which we applied the objcopy transformation. In that first iteration I also wrote the packing list manually. It was just enough of a proof of concept to let people play with it and figure out whether they managed to get debug packages that made sense for them. Obviously it was not going to stay that way, but having lots of beta testers in the room, crash-test developers I would say, that was the way to go, I think. At that point, creating debug packages was done manually; just having DEBUG_FILES defined in the Makefile was enough. Also, we obviously added extra arguments to the configure step. At this point, most of it was just going through configure, adding -g to CFLAGS and removing strip from the install part. That was it. I think there were something like 20 developers in the room; most of them tried that setup and figured out that, okay, we have actual debug packages that work. Which was great. And of course, I got some suggestions.

Okay, so let me show you how this looked at that point. This is just a small fragment of the ports infrastructure. Basically, we have three variables: DEBUG_PACKAGES, to say that we are going to have some actual extra packages; DEBUG_FILES, the stuff that drives the objcopy routine; and DEBUG_CONFIGURE_ARGS, just in case configure cannot do things automatically and you have to override them (a sketch follows below).

Since we are OpenBSD, we have multi-packages and flavors. How does that work? This is just a summary of basic OpenBSD information: we build stuff once, we install into a staging area called fake, and then we possibly split the installed files into several packages.
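As a sketch, a port Makefile from that first iteration might have contained something like this; the variable names come from the talk, but the values and exact semantics here are hypothetical:

```
# hypothetical first-iteration port Makefile fragment
DEBUG_PACKAGES =	-main
# files to run the objcopy transformation on, listed explicitly
DEBUG_FILES =		bin/foo lib/libfoo.so.1.0
# extra configure arguments, in case the defaults aren't picked up
DEBUG_CONFIGURE_ARGS =	--enable-debug
```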
To illustrate multi-packages: if you have big stuff like parts of Qt, you might have the main program, some sample demos, and some extra documentation on the side, which gives you three separate packages. So at first, we just named a debug package for each of those packages. And to split off documentation from the start: any package which doesn't have an actual architecture, where PKG_ARCH is equal to *, is the same package for every architecture, which more or less means a documentation package. For those, we don't generate a debug package cookie at all. And that takes care of almost every case where you would otherwise get an empty debug package.

So, like I said, if some of those variables are non-empty, like on line 19 in this slide, you get some really straightforward debug options: you don't strip during install, you add some debug flags, and that's it.

And finally, there was adding the creation of the debug packages proper. More or less, instead of doing a straightforward create-package, I put that into a Makefile variable, a create-package command per subpackage. And in case DEBUG_PACKAGES is set, starting from line 41, you get a second call to create-package to actually create the debug package, with more or less the same options as the normal create-package, apart from taking the packing list from somewhere else; really straightforward, very simple stuff. The only fun thing here is that we reuse the exact same dependency information as the normal package, which ensures that we get the exact same update signature for the debug package and the normal package.

Finally, there was the use of DEBUG_FILES: just an extra target which does the objcopy dance for every file in DEBUG_FILES. At first, I didn't know what to do with static libraries, so there was a test on line 62: do the dance for each binary and shared object, and skip static libraries. Later on, I realized that I didn't want to do anything with static libraries at all, so that part is really not interesting.

So this was the first iteration. And even though it worked, it was a lot of manual integration for people. The first feedback I got was: okay, this works, but it needs to be simpler to use. So I decided to reuse previous work. If you refer to one of the past EuroBSDCons, I talked about update-plist, and update-plist does most of what we want. It can read existing packing lists, and it can scan the installed files under the staging area, under fake. And packing lists under OpenBSD are annotated: you've got annotations that say, okay, this is a binary, this is a shared library. We just needed an extra annotation for loadable modules without any version information, which was trivial to add. And after that, we can just reuse the exact code from update-plist to actually grab the debug information and do something with it.
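For illustration, those packing-list annotations look something like this (the paths are hypothetical; @bin and @lib are the long-standing annotations for binaries and versioned shared libraries, and @so is the kind of annotation used for unversioned loadable modules):

```
@bin bin/foo
@lib lib/libfoo.so.1.0
@so lib/foo/plugins/codec.so
```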
Let me check the time. Okay. So why was it simple to do? It's because update-plist, as I said in a previous talk, is already fully object-oriented. It grabs all the information it needs directly from pkg_create's arguments, which means we already have a parser common to pkg_create and update-plist, with a specific derived subclass for update-plist. We just needed to reuse that class and change a few details to have our build-debug-info tool reuse most of the code. One very, very important part of that is that whenever we change something in the infrastructure, add some new annotations, some new features, most of the code is inherited. You don't have to rewrite everything from scratch. If you have, say, 95% of the support required to handle some new annotation or some new class of files, then you just need to write the five remaining percent. And that's it.

Okay. The only part that was a bit clunky at that point was how we were going to do the objcopy dance. The second iteration of the tool created a debug-info file which contained two or three pieces of information for each file: the debug path where we're going to put the data, the original program name, and the actual debug-info file. This was still a bit clunky, for various reasons. I don't know why I did it that way, because the right way to do things should be obvious. I'll let you think about it; you'll get the answer in a few slides. Don't peek. But it seemed to work just fine for starters.

At which point I got some very specific feedback from friends. I've selected the two most significant pieces. Stuart Henderson is in charge, more or less, of everything mirror-related for packages. He figured out that debug packages were going to grow the mirror size a lot, which is why we decided to make them opt-in. It does not make sense to create a debug package for everything in OpenBSD. There are lots of programs which take less than one minute to compile, without any dependencies, and unless they are really critical, anybody can create debug packages for them on their own. The real benefit of debug packages is for stuff that is large: you don't want to recompile Qt5, or Mozilla, or Gimp from scratch. Those are the cases where you gain a lot of time by having debug packages.

Also, coping with every architecture we support didn't seem like a good idea at the time, for two reasons. The first was that it would grow the mirror size even more. The second is that we still compile things natively on OpenBSD, so 32-bit architectures are not likely to get debug packages easily: if you're already close to the limit when you compile stuff on a 32-bit architecture, then debug information is very likely to push it over the edge. We can revisit those decisions later, but we focused first on getting debugging working on amd64; then we can possibly extend it to the other 64-bit architectures, and then, for some packages, to 32-bit architectures. So that was it with Stuart, and it was a good idea.

Antoine Jacoutot also gave me some fairly pertinent advice. As usual with Antoine, it's: hey, I tried that stuff and it doesn't really work. Some Python packages didn't want to produce debug packages. Looking closer, it was obvious in retrospect: if you have hard links, the objcopy dance won't work. For instance, if you have two links to the same program, the first strip is going to remove the debug information from that program, and of course the second objcopy won't be able to find anything.
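To see why hard links break the naive loop, consider this minimal sketch (the file names are hypothetical):

```
ln bin/a bin/b					# two names, one file on disk

objcopy --only-keep-debug bin/a .debug/a.dbg	# fine: a's debug info extracted
strip --strip-debug bin/a			# strips the shared file: b too

objcopy --only-keep-debug bin/b .debug/b.dbg	# too late: nothing left to extract
```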
So let me give you a better picture. The second iteration of debug packages was simply to say: okay, we have the packing-list data; we process it, using pkg_create's arguments, with build-debug-info, and that gives us a debug packing list and some kind of list of debug files. But if we want to take hard links into account, we have to take the staging-area information as well. So build-debug-info, on top of the packing lists, also reuses the information from the staging area to figure out which files are actually hard links. And as you can see from the warning signs, the debug-files list is going to be a bit more complicated, because you can't do the same thing with normal files, which have a single link, and files which have an actual hard link. More specifically, in the simple case, if you have two links, two names A and B for the same program, then you create the debug information for A first, and naturally B is going to point to the debug information for A. With most binaries, you don't have to do anything more. The only snag is that the debug link you stuff into your program does not contain the full path. So if A and B actually live in separate directories, you still need a link for the debug information, because, if I come back to one of the first slides, that one: you see here, on line three, that the debug file is called .debug/program.dbg. It's located under the same directory as your original program. So if your hard link lives somewhere else, you have to create a link to the debug information as well. That's more or less the algorithm we have to use.

So, like I said earlier, we are in multi-packages land, and there is another part of multi-packages which needed to work here: we don't actually build all sub-packages on all architectures all the time. We have this thing called pseudo-flavors that you set to say: in this case, because I don't have this dependency, I'm not going to build this specific sub-package. So there was a bit of glue to write in the infrastructure to make sure that I would only try to build debug packages for sub-packages which are arch-dependent, no debug packages for documentation, and also to trim the list to remove sub-packages that we are not actually building at this time for this architecture. This is just a detail; I'm not going to expand on it. It's the kind of stuff where you have tens of very small details to take care of until everything works perfectly.

So once multi-packages were sorted out, our infrastructure already removes stuff to compute BUILD_PACKAGES. In the end, we just set DEBUG_PACKAGES equal to BUILD_PACKAGES, and since make is lazy, this won't get evaluated until BUILD_PACKAGES is actually set. This means we don't have much more to do: we just need to remove the arch-independent sub-packages, and that's it; we have a perfectly accurate DEBUG_PACKAGES set. You could still set DEBUG_PACKAGES manually to a subset. For instance, if you have a main application and everything else is stuff you don't really need to debug, you could generate just the one debug package. But so far, I don't think we have actually used that for real.
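This relies on BSD make's lazy evaluation of plain variable assignments; a sketch of the idea, not the literal bsd.port.mk code:

```
# = assignment is lazy: BUILD_PACKAGES may be computed later,
# after pseudo-flavors have trimmed the sub-package list
DEBUG_PACKAGES = ${BUILD_PACKAGES}
# arch-independent (PKG_ARCH=*) sub-packages are then filtered out
# by the infrastructure before any debug cookie is generated
```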
The next problem was a problem of workflow. If you remember, at this point we create the debug information during the staging part. But when we update a port, at some point we need to run update-plist, and for that, we need fake to have finished. At this point, in the second iteration of the debug packages work, we did the staging area normally and ran the copy-debug-info target at the end of fake. So if some files are no longer there, which is quite possible because the packing list hasn't been updated yet, then our loop to extract the debug information is going to fail. Possibly even worse, if there are new binaries, they won't go through that loop at all. So first, it might fail, which is a bad idea; so we had copy-debug-info just warn if there were any issues, which isn't a good idea either, because people don't always look at warning messages. They need actual errors to stop and reflect on what's going on. And also, you might create debug packages that do not contain everything.

The actual solution was to wait until make package to run build-debug-info, and have it, not exactly extract the debugging information, but create a file that will extract the debugging information. So instead of having make package extract the debugging information each and every time, make package depends on a Makefile that actually extracts the debugging information. And that Makefile gets regenerated every time make fake is run again, which means we do it just once. Which is great: we finish fake, we run update-plist, then we do make package. If the packing list has been updated, it recreates a Makefile which is up to date with every binary we need to extract debugging information from. And because it's a Makefile, each debug file actually depends on the underlying file with the debugging information, so we run objcopy just once. If it's already been run, it won't happen again next time, so we don't have the problem of trying to extract debugging information from a file that's already been handled. In retrospect, this is much simpler than the second iteration I had, with its chicken-and-egg problems between debug-information files and scripts that extract debugging information. We have files that depend on other files; that's what Makefiles were made for.

So how does it look in practice? Here is an instance of a generated Makefile. We have two rules which are a bit complicated. The objcopy rule: we create the .debug directory first. Then, if we are privileged, which means we are actually running stuff as a different user for the staging area, we have to save the stats of the file we want to change, and we have to make it read-write, because most binaries are usually read-and-execute only. I added a check, on lines 8 and 9, of whether we are actually getting debug information. This is the first indication that something went wrong during configure or build: if you don't find any debug information in some file, it means that some stupid framework managed to strip the debug information before you had any access to it. Then you do the objcopy dance we talked about; I'll talk about dwz later. Then you restore the original permissions of your file, and that's it. For links, you have a second rule, lines 18 and 19, which is stupidly simple: you create a link from the original debug information to the final debug information. Then you simply have a list of targets where every debug file depends on the actual normal file and invokes either the objcopy rule or the link rule. Okay, this Makefile does fall off the right-hand side of the slide, but what's going on on the right side is not really interesting.
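A hedged sketch of what such a generated Makefile might look like; the names, the permission handling, and the exact objcopy and strip options here are illustrative, not the real generated output:

```
# extract debug info for bin/prog, exactly once (make dependency)
bin/.debug/prog.dbg: bin/prog
	mkdir -p bin/.debug
	chmod u+w bin/prog			# binaries are often r-x only
	objcopy --only-keep-debug bin/prog bin/.debug/prog.dbg
	dwz bin/.debug/prog.dbg || true		# shrink the DWARF when it works
	strip --strip-debug bin/prog
	objcopy --add-gnu-debuglink=bin/.debug/prog.dbg bin/prog
	chmod 555 bin/prog			# restore the saved permissions

# hard link libexec/prog lives in another directory: the debug link only
# stores a file name, so link the debug info next to the other name too
libexec/.debug/prog.dbg: bin/.debug/prog.dbg
	mkdir -p libexec/.debug
	ln bin/.debug/prog.dbg libexec/.debug/prog.dbg
```

Note that in this sketch dwz runs before --add-gnu-debuglink on purpose: the debug link records a checksum of the side file, so the side file must not change afterwards.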
So this made for a much improved workflow for creating debug packages. In the old process, you ran fake, the staging area, which generated debug info which might or might not be accurate. Then you ran update-plist, which invalidated the meta-information needed for the files to debug. So before actually packaging, you needed to remove the staging area and recreate it, which takes some time for big ports. It was frankly a major pain to do things that way. In the new process, you do make fake; there is no debug info involved at this stage. You run update-plist; then, at the start of make package, you create debug info which is accurate, which corresponds to the staging area and the updated packing list. And you end up with up-to-date packages. End of story; this was much cleaner and it worked much better.

Okay, just some details. If we already ran make package once and created the debug info, the up-to-date dependencies in the Makefile mean we don't extract the debug information again. On the other side, if we run update-plist, then we regenerate the Makefile for the debug information, and that Makefile considers each file, one after the other, and extracts debug information, or not, for that file. There is a small edge case where it might not work: specifically, if we had a given program in one version of the packing list, and in the next version we have the same program plus a hard link to it, then the order might be wrong and we might try to extract debug information from the wrong file name. So far, we haven't run into it. It's the one case where you might have to clean fake and do things again; otherwise, it just works.

So, to sum things up from the porter's point of view: you need to opt in, to declare which packages you want debug stuff for, in order to keep the repository at a manageable size. It's not only the size needed on the mirrors; you have to realize that we have new snapshots every few days for amd64, like every two or three days, which means that a large number of files has to find its way to each mirror, which obviously takes bandwidth. If you grow the repository too much, it takes more time for packages to end up on each mirror, which makes things less useful. Opting in is mostly DEBUG_PACKAGES = ${BUILD_PACKAGES}, and that's it. In most cases, 95% of the time, you don't need to do anything more to have debug information generated correctly. And the part that actually processes the staging area and creates the debug packages is entirely automatic; I haven't had to add any exceptions to that part. As long as the packing lists are up to date, as long as you have the actual annotations for binaries, shared libraries and modules, the part that creates debug packages is entirely automatic.

A quick flashback to dwz, if I have time. Yeah, I have. This is a tool that was found by, I think, Brian Callahan, maybe with a bit of help from Jérémie Courrèges-Anglas as well, to make the debug packages a bit smaller. It turns out that whatever ends up as DWARF debug information is not as small as it could be, and some people wrote a nice tool that makes it a bit more compact. There's just the detail that we depend on dwz for everything, except dwz itself, which has to be able to call itself to compress its own debug information. It turned out to be interesting: there are some cases where the debug information is complex enough, Mozilla, that dwz won't work.
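dwz works in place on the detached debug file, so the invocation is as simple as this (and, as in the Makefile sketch above, a failure on overly complex DWARF is tolerated):

```
dwz .debug/prog.dbg || echo "dwz failed; keeping the uncompressed DWARF"
```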
But apart from those cases, it shrank debug packages by about 10%, which is not too shabby considering that debug packages are already compressed with zlib.

On the pkg_add side, there were a few things to do, actually. At first, I simply said: okay, you just install the debug package for the stuff you want, and you are able to debug it. But you get some skew effects. Most specifically, in order for GDB to work correctly, it has to have a perfect match between the binary package and the debug information; otherwise, it's going to fail in awful ways. The way the DWARF people made things work is that each program gets a specific hash computed over its binary code (the build ID), and the exact same hash goes into the debug information that matches this program. Each time you recompile, you create a new matching hash for the debug information. So if you have a given package and a debug package which are off by just a little bit, that's enough for the debug information not to work. If you are working with releases, this is not a problem. But obviously, if you work with snapshots, you're going to have some issues. If you install some application and then later on figure out that, okay, I have a bug in this application, I want to debug it, then most of the time you have to reinstall the application along with the debug package, because the debug packages you will find on the mirror are no longer the ones the application was compiled with.

So I added two very simple options to pkg_add to deal with it. The first one was an option, -D; surprisingly enough, I had not used -D for anything yet. It automatically installs and updates debug packages when available, silently, which means that if there is no debug package for something, it's not an error; you just add debug packages to the list of stuff to be added or updated if they exist. This takes a lot of room on your system. The second option, which is what we use most of the time and which is the preferred option for developers, is to set a directory as a debug-package cache. Each time you add or update a package, pkg_add looks for the corresponding debug packages and saves them for you, in their compressed state, in that debug directory. This means that later on, if you want to actually debug something, you just install the debug packages from your cache, and you have a guarantee that those debug packages are 100% up to date with respect to your package. The package on the snapshot mirror may well be different by then, but each time you update the package on your machine, pkg_add looks for the debug package and downloads it as well into the debug cache. And that's enough: it ensures that each time you get a package, you have the matching debug package on the side as well.
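In practice, the cache mechanism can be used something like this. A hedged example: I am assuming the cache directory is the one given by the PKG_CACHE environment variable from pkg_add(1); check the manual for the actual knob on your release, and the paths here are illustrative:

```
export PKG_CACHE=$HOME/pkg-cache
pkg_add -u				# matching debug packages land in the cache

# later, when something needs debugging, install the cached copy,
# which is guaranteed to match the installed binary package
pkg_add $HOME/pkg-cache/debug-foo-1.0.tgz
```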
I think that's about all. We are something like two years later now, and I've looked at the numbers: these days we have something like 700 debug packages in OpenBSD, which means that most significant stuff actually has debug packages. Like I said, it's opt-in, which means that other stuff doesn't have debug packages because no one saw a need for them: small packages, or stuff that is actually interpreted, for the most part. We could easily create debug packages for more stuff, but so far it hasn't been useful.

There are still a few remaining issues. Some frameworks make it complicated to have debug information on the side. Those poor souls who work with CMake know what I'm talking about: you have a release mode and a debug mode which are completely different, and basically what we want here is to stay in release mode, but compile stuff with -g and strip during the fake stage. CMake doesn't make that very easy to do, but I think that, more or less, it works this way these days. There's always the open issue that compilers and linkers are not perfect, so adding -g might change the compilation and linking a bit, and you might end up with binaries which are slightly different from what you get when you just run the optimizer. And finally, there's the question of newer or smaller platforms. Which platforms do we want debug packages for? Do we want them for riscv64, or for sparc64, or arm64? That's really easy to do: we just need to add the platform to the list of debug architectures and see what crashes. This is not work I've been doing; this is stuff I'm delegating to the people who are in charge of such architectures. Suffice it to say that what we have works just fine for amd64 and can be used for about anything else.

Okay, more or less on time. Do I have any questions? Yes. Yes. That was a question from Fred Finster, who raised his hand. [crosstalk] Kristof says in the chat: please feel free to ask any questions; the room will pause in about nine minutes before the next talk. Yeah, sure. I hope it was possible to understand things so far. Yes. We'll be looking at whatever turns up in the chat then. Several people are typing. [inaudible] Yes, well, it's good. It was mostly to explain that something that's actually very simple in concept can lead to lots of small details to implement.

I think we're on to the applause then. Thank you very much, Marc. I hope that next year, hopefully, we'll do this as a physical conference. The thing is that this year it was supposed to be in Vienna, and actually there was no Vienna, but it wasn't all that bad to do it this way. I think we will probably be announcing, today at the end of the conference, that there will be a physical conference in Vienna next year. I'll have to work on something to be able to come. Yeah, so basically we'll need to think of something to say then. I think there are no further questions. As I said, this room will go away for a few moments in a few minutes. The next talks are at 11:45. In this room, the next talk is from Péter Czanik, about working with the BSD ports; I think he's a FreeBSD guy. In the other room, there's a GhostBSD talk by Andy [inaudible]. So please feel free to mingle; there is also the special little chat that you've got a link for, which has lots of rooms for hanging out. Can you repost the link to the chat? I didn't see it last time. The special chat was, hang on... Okay, Luna already pasted it. Okay, so since there don't seem to be any more questions, I'm going to give back the microphone and the webcam. All right, thank you very much. You're welcome. See you next year.