Right, the next talk is "Debian's Debugging Debacle: The Debrief". I'm not quite sure what the debacle did to merit that, but it's by Erinn and AJ. Good afternoon, everyone. So we're going to talk about Debian and debugging. Debugging is one of those things that, in computing, is widely recognized as being unglamorous but essential. You have to debug programs because, unless you're the perfect programmer, there will be bugs, and you have to discover them and fix them. And it's one of those things, especially in free software where you have access to the code, that isn't just for developers. Users are able to jump in, debug things, and start fixing them themselves. Okay, so the point of debugging is to know what's actually happening in your program rather than just what's supposed to happen. Everyone knows what's supposed to happen: it's meant to just work and do exactly what you want all the time. But unless you actually have an implementation of DWIM ("do what I mean"), that's not going to happen. So the way you debug is basically to see what the computer is actually doing, which means monitoring the internal behavior of the program. How many people here have done debugging? How many people haven't? Wow, oh my goodness. How many people here have not done debugging but have had a program crash on them? So put your hand up again. That's you, Natalie, very good. No, come on, put it up, put it up. Okay, and how many of you have never had a program crash? Look, no hands up, imagine. Except for Lamont, right. So the whole point of debugging is that if a program crashes on you or doesn't do what you want, you can look into it and work out how to fix it. How many people here haven't used strace? If you don't know what strace is, you haven't used it. Okay, that's very good. If you've run strace, you've done debugging.
Okay, so it's a matter of monitoring the internal state. Sometimes that's going to involve slowing the program down, so you're not doing a billion instructions a second, because you can't follow that yourself. And it's also observing the effect the program has on the system. So strace will show you the effects of function calls on libraries, or you can just look at the files that your program's modified or whatever else, and get a general idea of the implications of what's happening even if you can't see it happening yourself. And basically the other way of doing it is just changing the input to see what the output is. Has anyone tried reverse engineering stuff here? Has anyone tried hacking through to discover how... So one of my first hacks, which is my favorite, I think, was reverse engineering the password encryption in Eudora. Has anyone tried anything like that before? It's awesome, you should try it. And Eudora, at least a few years ago, was very easy to crack because it was just an XOR that depended on the password length. So give it a go. All right, you want to have a go? There are several different kinds of debugging. A relatively popular one is called printf debugging, and you can also use asserts. That's basically where you put printfs, or prints, depending on the language, all over the code to see what the result is and figure out where things might be going wrong. One of the major benefits of this is that it's really easy to do. It's certainly one of my favorite kinds. Oh, yours too. But there are some cons: at least with asserts, it makes the programs run slower. And it does require familiarity with the code or the language, because if you don't know where you need to be looking for problems, you're likely to put printfs just all over the place. So it can take a while to figure out what you're meant to be doing and where.
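To make the printf-and-assert style just described concrete, here is a minimal sketch in Python. The function `mean` and its inputs are made up for illustration and are not from the talk; the debug output goes to standard error so it doesn't mix with the program's real output.

```python
import sys


def mean(values):
    # An assert states an assumption; the program stops here if it is violated.
    assert len(values) > 0, "mean() needs at least one value"
    total = 0
    for v in values:
        total += v
        # "printf debugging": dump intermediate state to see where things go wrong.
        print(f"debug: v={v} total={total}", file=sys.stderr)
    return total / len(values)


print(mean([2, 4, 6]))
```

The trade-off is the one mentioned above: it's trivial to add, but you need to already have a guess about where to put the prints, and the checks cost something at run time.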
strace is another mode of debugging, which AJ was just talking about. A lot of the time you just run strace on the program and it follows it, and you see a lot of opens and writes and reads and suchlike. And you can kind of figure out what's going wrong by looking at it, to see if it's maybe not opening a file or whatever. The pros are that it gives excellent visibility into the program's interaction with the operating system, and it doesn't require source code access, so you don't need to be able to put printfs all over the place, because it'll just show you what's going on between the program and, I guess, the kernel. The con is that there's a lot of information. Anyone who's run strace knows that it basically just spews tons of garbage. Some of it is relevant, but a lot of it can be thrown away immediately, so you have to filter through a lot of that. And another problem is that it doesn't relate the problems to the source code directly. So while you may be able to see that it's not opening this file, or that the permissions aren't right on something, you can't tell exactly where the problem is in the source code, and that can be problematic. Okay, so printf debugging, or echo debugging, or whatever you're echoing with in your programming language, is great, and strace is great, especially if you don't have the source code, but the best sort of debugging is generally symbolic debugging. Who here's run gdb or xxgdb or whatever else? Who hasn't? Okay, that's only a few, that's good. And after running it, who's done more with it than just try to print a backtrace or some very basic stuff like that? Okay, yeah, that's about right. There's a lot you can do with a debugger. One of the best things to do with a debugger is stepping through the code, so that you actually get to see what's going on and how variables change, and make sure that the code does actually reflect what you think is going on.
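On the point above about strace producing far more output than is relevant, here is a small sketch of the kind of filtering people end up doing by hand: keep only the calls that failed. The sample lines are illustrative, written in the general shape of strace output rather than captured from a real run, and the function name `failed_calls` is an assumption made for this example.

```python
import re

# Illustrative lines in the general shape of strace output (not a real capture).
SAMPLE = [
    'open("/etc/ld.so.cache", O_RDONLY) = 3',
    'open("/etc/example.conf", O_RDONLY) = -1 ENOENT (No such file or directory)',
    'read(3, "\\177ELF"..., 512) = 512',
    'access("/var/lib/example", W_OK) = -1 EACCES (Permission denied)',
    'write(1, "hello\\n", 6) = 6',
]

# A failed call ends in "= -1 ERRNO (description)".
FAILED = re.compile(r'^(\w+)\(.*\)\s+=\s+-1\s+(E[A-Z]+)')


def failed_calls(lines):
    """Return (syscall, errno) pairs for every call that returned -1."""
    results = []
    for line in lines:
        match = FAILED.match(line)
        if match:
            results.append((match.group(1), match.group(2)))
    return results


for syscall, errno in failed_calls(SAMPLE):
    print(syscall, errno)
```

This narrows things down to a missing file or a permissions problem, but, as noted above, it still says nothing about which line of source made the call.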
So the main benefit of symbolic debugging is that you just have to run a debugger, point it at the source code, point it at the program that's running, and then it'll step through what's happening in line with the source code itself. So it's also a reasonable way of actually learning how the language works. Even if you don't know it in advance, you can step through a couple of lines, have a guess at what they mean, pause it there, look at your system, see what's changed, and say, okay, well, that did what I thought it did, let's continue on to the next thing. And it has the benefits of both printf and strace debugging, because you can see the source code, but you also don't need to modify the source code to do it. You can just run it through. So it obviously requires source code, and it requires extra stuff. All right, Debian's approach. So everyone knows, I presume, that we strip debugging information out of all our source code, or rather out of all our compiled binaries. Yeah, we strip it out of our source code, we frustrate everything so no one can use it, it's great. So we basically focus on smaller size rather than ease of debugging, because everyone wants to download our binaries, and increasing the size of X by 100 megabytes or so per architecture just isn't going to be a useful thing to do. Now, we go halfway, though, in that we do actually build all our packages with debug symbols initially and then strip them out. And that's okay because it lets us debug things while we're trying to build them, but it doesn't really help our users out. So we've also got the debug packages. Who here's actually used a debug package? Yeah, you'll notice my hand's down at the moment. So debug packages are fairly rare, and most programs don't have them. I think there's one program in the archive that has one, and all the rest are libraries. And I think that's GIMP, right? Game. Yeah, something that begins with G, who cares, whatever.
And so that allows you to install a separate package which does include all the debugging information. Did Joey leave, or is he still in the room somewhere? Okay, so Joey's also integrated some stuff in debhelper which incorporates some of the stuff we'll be talking about later into the debug packages. And obviously the problem with debug packages is that it's an extra step in maintaining a package, and if the maintainer doesn't want to do that, which most of us don't, then there's no debug information. What's the next slide? Yeah, if we go back. So who here is familiar with the original FSF debugging... well, debacle? VDAL and... Oh, okay. So does everyone remember that once upon a time the Debian project was a subproject of the Free Software Foundation? It was, initially. Ian Murdock basically got it started. We split from the FSF after a while, basically because we decided that we would rather choose our own direction for what we were going to do rather than just following suggestions from the Free Software Foundation. One of their suggestions, well, yeah, "suggestions" will do, was that... Sorry? One of Richard's suggestions, right, as well as the whole "Debian GNU/Linux" rather than "Debian Linux" thing, was that we include all the debugging information, because that encourages people to look at the source code and make use of it, and that's free software for you. But we decided that, no, this wasn't our goal. This was, okay, the Free Software Foundation's goal, but it didn't meet the needs of our users, so we weren't going to do that, and we weren't going to take the suggestion of the Free Software Foundation board, or Richard. And basically that was the split between Debian and the Free Software Foundation. That continued for a while, and we'd kind of gotten back in their good books, and then we've done the whole GFDL thing and whatever. Okay, now it's your turn. Thanks.
So some formats that are very common in Linux, and computers in general, are ELFs and DWARFs. ELF is the Executable and Linkable Format, and it's a common standard for executables, object code, shared libraries, and core dumps. But that by itself doesn't define the debugging information, and for that you need DWARF, which stands for Debugging With Attributed Record Formats. It standardizes the symbol table used for debugging. There are several ways to manipulate the information in these, and binutils has a couple of them. One of them is... I never know how to pronounce this because I've heard it so many different ways. objcopy, which has --only-keep-debug, which pulls the debugging symbols out into a separate file, and --add-gnu-debuglink, which links the program with that symbol table. And there's also strip, which is pretty self-explanatory: it just strips the debugging symbols completely. It also has similar options so that you can keep the debugging symbols if you want. elfutils is something Red Hat has been working on, specifically Ulrich Drepper. And it has a very clever program. Okay. So on Red Hat systems, in RPM, they have standardized on keeping debug symbols in, I think, /var/lib/debug or something like that, or /usr/lib/debug. And basically what debugedit does is rewrite the source location for the debugging symbols, so that you can just download the debugging information. And I think when you get a core dump and you submit a bug report, it automatically pulls in the debugging symbols. But the problem is that it's not free, so we can't use it. It's licensed under the OSL, if I remember correctly. Excuse me? "As of about two weeks ago, 0.120 is now under the GPL. I have the tarball just here. I haven't done a full license audit, but I noticed this the other day." Oh, they've GPL'd it. So we can use it now. Okay. Well, talk over. We're done.
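The objcopy and strip options mentioned above are usually combined into a three-step sequence. The sketch below only builds the command lines rather than running them, so it works on a machine without binutils installed. The helper name `split_debug_commands`, the `.debug` suffix, and the example path `./hello` are assumptions made for illustration, not anything shown in the talk.

```python
import shlex


def split_debug_commands(binary, debug_file=None):
    """Build the commands that move debug info out of `binary` into `debug_file`."""
    debug_file = debug_file or binary + ".debug"
    return [
        # 1. Copy only the debugging sections into a separate file.
        ["objcopy", "--only-keep-debug", binary, debug_file],
        # 2. Remove the debugging sections from the binary that gets shipped.
        ["strip", "--strip-debug", binary],
        # 3. Record the debug file's name in the binary so a debugger can find it.
        ["objcopy", f"--add-gnu-debuglink={debug_file}", binary],
    ]


for command in split_debug_commands("./hello"):
    print(shlex.join(command))
```

To actually run the steps, each list could be passed to `subprocess.run(command, check=True)`; that part is left out here so the example has no side effects.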
Because basically what we were thinking is that we were going to have to reimplement this program, and that was going to be problematic. So, you know, I guess our work is done. We just need to kind of fix it up for debs instead of RPMs and we're good to go. Great. Thanks. So, okay, actually, it's not quite that simple. For Debian debugging, sorry, I... Sorry, you completely threw me for a loop there. Actually, here, do you want to go ahead with that? Okay, so elfutils allows you to have the debug info separate from the actual binary. Users normally just download the binary package, not the source, and if they then need to debug, they can download the symbol table that's been stripped out with elfutils. But they need somewhere to download it from, right? So the question is what we do with that. One option is just to put it in the deb, but again, that would increase the size of numerous programs that have lots of symbols by a huge amount. It's probably not really acceptable for X, and if you're in a backwater like Australia with no bandwidth, then it's not really acceptable at all. Can you sound like Kristen? Yeah, screw you, hippie. So we really want to package it separately somehow, and the question is how. We can just do debug packages. That would require a lot of extra work for every maintainer, essentially, though, so maybe that's not a great idea. And another option is a variant packaging format. A couple of options that seem straightforward would be just to have a tarball of the information that elfutils or objcopy would spit out, or we could have it as a separate deb, possibly somewhat like udebs or something like that. Okay, so if we had it as a deb package, that would increase the size of the Packages file. If we just had it as a tarball, then we wouldn't be able to use apt to get it, or we'd have to reimplement something, because apt doesn't download tarballs in some strange format that no one knows.
And if we increase the Packages file by the size it currently is, because we're adding a debug package for just about every package, then that's probably bad. And a variant packaging format like a tarball would require changes to dpkg, because we want all these debug things to be tracked and kept in sync with the programs they're linked to; it would require changes to apt, so that it actually knows how to download it; and it would require changes to dak, because we'd like to put it in the archive somewhere and keep it synced up and whatever else. Another alternative is to install it in a separate repository. "Like we have deb-src, and we can make deb-dbg, sort of. Like we package source files separately. Because at the package build stage you can strip but keep the debugging information, put it in a separate package, and then, instead of putting it in the main pool, put it somewhere separate which you can then hook into if you need it. Another option, which may not be viable, just came to my mind." No, we were thinking of doing something like that, and also making it so you could just "apt-get debug" and then the package name, which would possibly require either splitting it up archive-wise or something else. But also, in light of the SCC mirror splitting, I think that would probably make it easier for mirrors to decide whether they wanted to carry all that extra debugging information and so forth. Just one other thing with splitting it up is that we obviously want this debug information available for everyone. So if you upload an AMD64 package with debug information and it gets rebuilt for i386, we want the buildd to upload the debug information too. So having that have to be uploaded to two separate places might be more complicated than changing the archive. But yeah.
And then there's also the format for debug packages. They could correspond to binary packages, which would mean you have one debug package per binary package, which could be really quite a lot of debug packages. Or they could be combined into one debug package per source package. I guess a negative thing about that is that it could be really big and not that useful, especially for source packages that build a bunch of binary packages. But it must be easy to generate, it can't require source package modification, and it has to be automatic, because you can't have all of these debug packages going through NEW, and people waiting on processing and sending nasty emails to debian-devel. Okay, so it's no good just getting the debug symbols right, because you need the source that they link to. It's no good saying, okay, this machine code operation is line 56 of the source, if you don't actually have a copy of the source. And one of the things with actually getting the source is that no one builds it in the same place. How many people build their source code under /home/aj/something-or-other? Yeah, so Keith and I are great, but everyone else, we're not going to be able to have it work for you. So one of the things that was going to be a problem, but evidently isn't now, is that without elfutils there's no way to rewrite the source directory. With elfutils there is a way to rewrite the source directory, so that's great. We can basically get the GPL'd elfutils installed on everyone's system, hopefully, and then once you build in whatever scratch directory you have, the maintainer scripts, probably through debhelper, can say: okay, here's the stripped debug information, I know it refers to this scratch directory, but we're going to change that to refer instead to /usr/src/debian/package-name-version/ and all the source code.
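The source directory rewrite described above comes down to a prefix substitution on every source path recorded in the debug information. The sketch below shows only that path arithmetic; it does not touch ELF or DWARF data. The function name, the `/usr/src/debian` root (one of the candidates floated in the talk), and the `hello` package are assumptions for illustration.

```python
import posixpath


def rewrite_source_path(path, build_dir, package, version, root="/usr/src/debian"):
    """Map a path under the maintainer's scratch directory to a standard location."""
    build_dir = posixpath.normpath(build_dir)
    path = posixpath.normpath(path)
    # Paths outside the build directory (system headers, say) are left alone.
    if path != build_dir and not path.startswith(build_dir + "/"):
        return path
    relative = posixpath.relpath(path, build_dir)
    return posixpath.normpath(posixpath.join(root, f"{package}-{version}", relative))


print(rewrite_source_path(
    "/home/aj/build/hello-2.1/src/hello.c",  # where the maintainer happened to build
    "/home/aj/build/hello-2.1",
    "hello",
    "2.1-13",
))
```

Whether the version in the directory name should include the Debian revision, and whether the root should be /usr/src or something under it, are the open questions the talk raises; the defaults here are just one choice.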
So yeah, or something like that. There's an example up there of a name. You'll notice that it has the "-13". We don't normally build packages with the Debian version in the directory name; normally we just build them as the upstream version, so I'm not sure what the best pattern there is. Maybe it should be /usr/src/debian, maybe it should be just /usr/src. How often we're going to want to have sources for different versions, all that sort of stuff, is an open question that people need to consider. And the other problem, for glibc and similar packages, is that if you just dpkg-source -x, you don't actually get the source. You get a tarball of the source and a bunch of patches, and that obviously isn't any good for gdb either, because it can't look through tarballs, and it's not going to know which patches to apply and whatever else. So who here is familiar with the Wig and Pen format? Who here has never heard of it? Okay, the Wig and Pen is a pub in Canberra that was really near the linux.conf.au 2005 venue, and Scott James Remnant and James Troup and Brendan O'Dea and a few other people got together over beers and invented the Wig and Pen source format, which is basically a multiple-tarball and multiple-patch format. If you have multiple tarballs that you need, you just put them straight in the archive. If you have multiple patches that you need, you put them in a debian.tar.gz, and dpkg will apply those patches automatically for you, so that you don't need to do debian/rules unpack, or debian/rules make-source, or debian/rules apply-patches, or whatever strange thing it is that might run all sorts of strange stuff on your system. You can just do dpkg-source -x. But unfortunately the Wig and Pen source format isn't actually supported in the archive. Okay, yeah, well, this is obviously irrelevant now, so great. Is there anything else we've got? Yeah, so are there any questions?
One of the problems I've run into with the debug packages in the archive is that, since Debian policy mandates building with optimizations, these programs are optimized, some of the code gets optimized out, and it's impossible to debug, so you have to rebuild them with -O0 anyway. Is there a good solution for that?
No, we hadn't actually thought of that, so that's a good point. We'll have to look into that. Simon?
Right now, when you get source through apt, like apt-get source a package, you only really get that package's source code, which makes sense because perhaps you want to build it with other libraries or something like that. But for debug packages, when you automatically get debug stuff, have you considered automatically pulling in all of the debug symbols that this package build-depended on, and what kind of implications that has for users?
So I think there are obviously two implications. One is that when you step through something you end up going into libraries, and maybe you don't want to do that. The other is that it will just download a lot more stuff. For example, if you're trying to debug a GNOME app, you probably don't actually want to get all the debug information for all of X and step through all that.
I don't think that would be a major issue, because apt already should have logic to download dependencies and so forth, and that should just be able to be brought across. Another interesting sort of thing is saying that I want everything on my system to be debuggable, so when I run apt-get install some package, it automatically downloads the debugging information, automatically downloads the source, and then I've got everything.
Do you have an estimate of how large the debugging information in the whole archive will be? Some of the debugging packages I've been looking at were incredibly huge, especially for X and probably for some other packages' backports, and uploading them wasn't very fun because of that.
"Huge" is the best estimate we've got,
but not all the packages in the archive need to be built with debugging symbols at all, because not all of them are in C or C++ and so forth. You're not going to have debug packages for Perl programs and Python modules and stuff like that. We don't have any real statistics, but we're working on it, because we wonder whether, if it's too much, the usefulness is sort of diminished. It would be a huge mirror hit, and it would be a hit on the buildds, which could potentially be just way, way too much. But at the same time, maybe having selective debug packages that not many people use is not terribly useful, and it might be better if it were more spread out.
I just wanted to mention that I experimented with building OpenOffice.org a few months ago, and it was 324 megabytes, and one gigabyte installed size, so it could be a real problem for archive space or disk space on the system.
It could definitely be a huge problem. We welcome all help. So basically there isn't enough free space on ftp-master to even do this at the moment. We've filled up all our disk space with debug and all the packages that have been removed but kept in the morgue just in case. So we obviously need to do something about that and get more disk space before we can even do debug packages for base or standard or so. But once we've done that, uploading it to the regular archive and then splitting it off to a separate site, a separate mirror network, much as we've got a separate mirror network for the first-class architectures, i386 and AMD64, would also be possible. But yeah, that depends on how huge the mirror hit is, and it also gets multiplied by every architecture, of course.
What advice would you give to developers who'd like to include debug packages? As far as I know, CDBS supports some kind of magic for that, for example.
Building a debug package is a reasonable thing to do if you have enough bugs that you expect it will be useful for users to actually look through the symbol table. But more
generally, I'm not sure that's worthwhile very often, and I think that's reflected in the number of people who do it already. There are two other options. One is just to say, hey, here's the source code, rebuild it, run it again, and send me the core file from that so I can look through it myself. The other option is, as you build the package, to keep the unstripped binaries from your build tree around. Then if someone sends a core file from their stripped binaries, you can just compare that to your original source. But obviously that only works for the stuff you build, not the stuff the buildds build, so if they're on AMD64 and you're on i386, or vice versa, it doesn't help.
I was a little confused earlier. You had an implication that you wanted to produce debug information packages with no source changes. Do you have a plan for that? Because I'm kind of racking my brain and I can't figure out how you would. For one thing, not every package uses debhelper or anything like that, so you can't necessarily rely on just changing dh_strip, although that would be a good start. Do you have a concrete idea of how you would produce an additional package and not have to change a typical debian/rules file?
Okay, so if you're willing to accept a typical debian/rules file as having lots of debhelper stuff in it, then that covers most of the archive. Obviously, if you've programmed exactly what your package is going to build yourself, like Manoj usually does, then you'll have to change your packaging to get the benefit of this. So dh_strip can already generate all the appropriate debug information in the appropriate place, and presumably with elfutils it'll be able to standardize it on whatever source location we need. It's then a matter of dpkg-buildpackage, probably, actually grabbing all that stuff from somewhere under debian/DEBUG, perhaps, as per debian/tmp/DEBIAN/control. Once it's collected all that and put it in this new package, it's
then a matter of dpkg-genchanges noticing it, and whatever tool you're using to upload the package uploading it, and then dak obviously taking care of that. Does that...?
So you think it would mostly be a matter of changing dpkg-dev, and then maybe a few changes to debhelper as well?
I think it's mostly a matter of debhelper changes and a few changes to dpkg-buildpackage, but yeah.
Just to comment on that: in one of the latest releases of debhelper, if you have the compat level set to five, all you have to do is put a package in your control file that ends in -dbg and it'll do everything automatically.
Sorry? It's exactly the same, except it means creating a new package and requiring changes to the source, so maintainer uploads and so forth.
The other option would be to replace strip. I mean not only dh_strip but also strip itself. But that would have consequences for people building local code, so maybe with a wrapper using a special environment variable or something.
Did everyone hear that? Okay, so the option proposed was that instead of changing dh_strip we change strip itself to basically do what dh_strip does, and rather than just deleting the debug information, put it somewhere appropriate where it can be found later. And of course that would require no source changes, but it would also break the expectations of regular users running strip. They don't expect some /usr/src debug directory to be created underneath where they're working and populated. So that would require fewer changes but might have more of an effect. Yeah, and it could be conditional, based on some environment variable or something. So, any other questions? Cool, then I think we're done. Thanks very much.