Okay, my name is Jan Blunck. I'm currently working as a technology architect for the L3 Maintenance Security Department at Novell, so I'm responsible for the open source and enterprise products. And I will talk about application crash reporting. So what is this all about? It's all about bugs and finding bugs. This is a copy of the logbook with the story of the first bug that was found: it was actually a moth trapped in a relay of one of the very old computers. In those days people kept logbooks for their computers, recording what they were doing and what problems they found. Nowadays, at least I don't know anybody who keeps a logbook for their computer, and most people don't even read their dmesg output, or at least not usually. And most application crashes are not logged anywhere in the system. Most of the time we don't even write out core files. So the user usually doesn't realize that the application has a problem and is segfaulting or whatever. So what do the others have? Others like Microsoft have the Windows Error Reporting service. I think Windows XP was the first release that included it. It's also available for ISVs: you can register in the Winqual project, and the only thing you need is a valid VeriSign certificate. Apart from that, the program itself is free, even for ISVs on Windows. Windows Error Reporting collects certain information about the application itself; as far as I know it saves the core file and uploads it to a server. The Mac has it as well; they also have a problem reporter, but I could not really find out whether it is also available to other companies or ISVs on the Mac. I'm not sure. Even the iPhone has it.
If you connect your iPhone to iTunes, the application crash reports are automatically downloaded from the iPhone and sent to Apple so they can analyze the bugs. And they do it system-wide, so you can also see kernel problems in the crash reports there. Even Sun has it. I didn't find a really good picture of a Solaris brand, so I chose the Solarium. And they are cheating, because they use DTrace for it. We could use SystemTap for generating this report as well, but actually we don't. So what does Linux have? Thanks to Ubuntu we have apport. It's an application crash reporting system that is kernel-supported but runs in user space. Apport was ported to Fedora, so it's also available on Fedora. And thanks to Google's Summer of Code program, a student was interested in porting apport to openSUSE as well, so now we have application crash reporting since 11.1 too. So what is apport doing? Apport is basically a collection of Python code that is called automatically by the kernel when an application crashes. Instead of writing the core file to disk, the kernel calls this application, and it collects information about the crash: information about the user, the process environment, and the operating system. I'll go into more detail later, but it's a two-step process. The first step is to just get the dump, get the /proc maps, and get all the information that you cannot gather later: everything about the process environment is collected right away and written out to disk. When the user is notified, additional information can be collected. So it runs in multiple steps. It notifies the user through a small applet. There is an applet for Qt and one written in GTK, both also written in Python.
What the applets are doing is very simple. They just put a notify watch on the directory where the crash reports are stored. Then you get a pop-up, and you can do the additional steps. Optionally, the applets let you send the crash report to the developers. At the moment there is only one server to send to, but I'll go into more detail later: it is possible to support multiple servers, for example one openSUSE server, one Firefox server, and one server for GTK, and to upload the reports into different databases. So what does it look like? This is the GTK applet, because I'm one of the very few people in-house who actually use GNOME; most people use KDE, but I use this one. Here you can see an application whose only purpose is to segfault. You get this pop-up notification, you press "Report problem", and then you can send it or look at the contents of the report. The report itself looks like this. It's just a plain text file, structured as key-value pairs. Here you can see the output of the /proc maps (this was a Compiz crash), then the command line and the distribution release, which is all information that you can gather afterwards. Then the process environment, so which PATH was active at that moment, and the core file, and here's even more information: build IDs and load addresses of the shared objects that were loaded at that time. Package dependencies, and which files in the package dependencies were modified. That is quite interesting sometimes, because you can rule out certain things just by looking at that list. So, a very long list. And then it dumps out information from GDB: the stack trace, the stack trace top, which is the topmost frames, and the threaded stack trace. So what happens with the reports?
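Because the report is a plain key-value text file, reading one back is straightforward. The sketch below is a minimal parser, assuming the layout visible in the on-disk example: each field starts with `Key: value` at the beginning of a line, and multi-line values such as the /proc maps start on the line after `Key:` with every following line indented by one space. It treats every value as text; it is an illustration, not apport's own parser.

```python
def parse_report(text):
    """Parse a plain-text crash report into a dict of field name -> value.

    Assumed layout: "Key: value" at column 0; continuation lines of a
    multi-line value are indented by exactly one space.
    """
    report = {}
    key = None
    for line in text.splitlines():
        if line.startswith(" ") and key is not None:
            # Continuation line: drop the one-space indent and append.
            line = line[1:]
            report[key] = line if report[key] == "" else report[key] + "\n" + line
        elif ":" in line:
            # New field; split only at the first colon so values may contain colons.
            key, _, value = line.partition(":")
            report[key] = value.strip()
    return report


sample = (
    "ProblemType: Crash\n"
    "ExecutablePath: /usr/bin/compiz\n"
    "ProcMaps:\n"
    " 00400000-00452000 r-xp 00000000 08:01 123 /usr/bin/compiz\n"
)
print(parse_report(sample)["ExecutablePath"])
```

Checking for the leading space before looking for a colon matters: lines of the maps output contain colons (`08:01`) and would otherwise be mistaken for new fields.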
As I said, we have a crash database server. At the moment it only supports openSUSE, so no enterprise products, and only starting with 11.1, for various other reasons. You can find the crash database server at crashdb.opensuse.org. As I said, application-specific servers are also possible. I have a local version of the database here. This is what it looks like; I tried to integrate it into the common openSUSE website look and feel. You can search for specific reports, for example all reports from 11.1 (only 11.1 because I don't use Factory yet, otherwise you would see a few more things here). And here you can see the report that we looked at just now on disk, and this is what it looks like once it's uploaded. It gets a UUID. You can see when the crash happened and when it was uploaded. And here you can see that the core dump itself is removed. Oh no, here it is. So that's a bug. Usually it should be removed, and the others don't have a core dump. Free-text search is also available, so you can search for specific words in the application crashes. What is missing, and what I'm working on at the moment, is the further processing of crash reports. Ideally, when one person sends a crash report, duplicates of it would be detected automatically and connected together. I want to gather all the reports, so I don't want to prevent the uploading, but on the database side it's easier to look at the reports if you have the duplicates grouped as well. Also missing at the moment is searching for available fixes or workarounds, that is, adding a feedback channel that says "your problem is fixed with this maintenance update" or something like that. And searching for regressions: it would also be interesting to have regression detection.
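The automatic duplicate detection described as missing could start from a crash signature. The sketch below assumes, purely for illustration, that two reports are duplicates when they crash in the same executable with the same function names in the topmost frames (the stack trace top field, here called `StacktraceTop`). The signature scheme and the helper names are hypothetical, not how crashdb.opensuse.org works.

```python
import hashlib
import re


def crash_signature(report, frames=5):
    """Hypothetical duplicate key: executable path plus the function names
    of the topmost stack frames, ignoring argument values and addresses."""
    names = []
    for line in report.get("StacktraceTop", "").splitlines()[:frames]:
        # "g_str_hash (v=0x1) at gstring.c:5" -> "g_str_hash"; unknown frames -> "??"
        match = re.match(r"\s*([\w:~<>]+|\?\?)", line)
        names.append(match.group(1) if match else "??")
    key = report.get("ExecutablePath", "") + "|" + "|".join(names)
    return hashlib.sha1(key.encode()).hexdigest()


def group_duplicates(reports):
    """Group report IDs by signature; every report is kept, none is rejected."""
    groups = {}
    for report_id, report in reports.items():
        groups.setdefault(crash_signature(report), []).append(report_id)
    return groups
```

Grouping instead of rejecting matches the stated goal of still collecting every uploaded report while making duplicates easy to browse; the number of frames compared is a tuning knob.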
So if the backtrace is found and the database server knows that the bug was already fixed in a specific version, and then the bug shows up again in a later maintenance update, you could detect that automatically. What happens if my application crashes and I click on send the report, but I don't have internet access at the moment? Is it stored or is it lost? It's stored. It's saved under /var/crash, and then you can send it later. At the moment you have to run it from the command line and give it the full path: apport-cli -c and then the report file. Would it be possible not to send the core but only the backtraces, because some people may not want the memory of the process to be sent? Yes. I don't know why it happened here, probably because I uploaded this crash report with curl, but the normal applet removes the core dump; I have a page on that as well. It also anonymizes the reports. It's not only the core file that might contain sensitive information, but also user names and such things. So the account name and the GECOS fields are replaced by the string "username". The current working directory from /proc is also removed, because most people don't want to send that information. Nevertheless, the user always has the possibility to review the report before it gets sent, and should really do that, because in certain situations you just don't want to send the report automatically. Hi. So you generate the backtraces on the client side. Do you have full debug info available when doing that? No. But this is one reason why we only have it since 11.1: I enabled build IDs for openSUSE as well. And that was not enough for generating backtraces; you need to have unwind info.
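The anonymization step described in the answer can be sketched as follows: the account name and the GECOS full name are replaced by the literal string `username`, and the core dump and working-directory fields are dropped. The field names `CoreDump` and `ProcCwd` are assumptions for this sketch, and replacing only whole words is our design choice so that unrelated words containing the name stay untouched.

```python
import re

# Assumed field names for the core dump and the current working directory.
DROPPED_FIELDS = ("CoreDump", "ProcCwd")


def anonymize(report, user, gecos_name):
    """Return a copy of report with the login name and GECOS full name
    replaced by the literal string "username" and sensitive fields removed."""
    cleaned = {}
    for key, value in report.items():
        if key in DROPPED_FIELDS:
            continue
        # Replace the full name first, so a login name that is part of it
        # (e.g. "Jan" in "Jan Blunck") cannot leave half the name behind.
        for secret in (gecos_name, user):
            if secret:
                value = re.sub(r"\b%s\b" % re.escape(secret), "username", value)
        cleaned[key] = value
    return cleaned
```

As the talk stresses, this only reduces what leaks; the user still has to review the report before sending it.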
So we built everything with asynchronous unwind tables (-fasynchronous-unwind-tables), and we don't strip the unwind tables. For C++ and all applications written in languages that support exceptions, it was necessary to build with them anyway. For applications that don't support exceptions, the asynchronous unwind tables are usually quite small, so they don't bloat your application binary. I think it was roughly 5%, so the distribution got bigger, but that was only a problem for the live CDs and not for the DVD; there was enough space available. What we can also do is retrace a report without the core. I parse the report, extract the addresses from the stack trace, and look up the correct symbol information in the debuginfo files. This application exists and is written in Python, like apport itself. The problem is getting the debug info data for the build IDs into a database, and that is missing at the moment. Also missing is uploading the retraced backtrace to the server, which isn't implemented yet because we created our own server. I have a second question, if I may. You've created a separate crashdb.opensuse.org rather than using the bug tracker. Are you going to end up with a parallel bug tracker where people have to check two places? The thing is, due to internal political complexity, the idea of extending Bugzilla so that it supports uploading crash reports was dropped immediately. It is close to impossible to get certain extensions in. It seems like you could use crashdb as the upload target but have it file bugs for you. Would that be possible? Yes, but that would still mean communicating with or scripting Bugzilla. It would be possible.
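Retracing without the core starts by working out which loaded object each stack-trace address belongs to. The sketch below does that first step only, assuming the report's maps field holds the usual /proc/<pid>/maps lines (address range, permissions, offset, device, inode, path). It returns the object path and the address translated into that object's file offset; turning this into a symbol name is left to a symbolizer such as addr2line run against the debuginfo file matching the object's build ID. Whether the offset or the absolute address is the right symbolizer input depends on whether the object is position-independent, so treat this as a simplification.

```python
def parse_maps(proc_maps):
    """Parse /proc/<pid>/maps text into (start, end, file offset, path) tuples."""
    mappings = []
    for line in proc_maps.splitlines():
        # At most 6 fields, so paths containing spaces stay intact.
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping without a backing file
        start, end = (int(x, 16) for x in fields[0].split("-"))
        mappings.append((start, end, int(fields[2], 16), fields[5]))
    return mappings


def resolve(address, mappings):
    """Return (object path, offset within that file) for an address, or None."""
    for start, end, offset, path in mappings:
        if start <= address < end:
            return path, address - start + offset
    return None
```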
It is also planned to add links back to the crashdb in Bugzilla, because that would be much easier: you have your reports in Bugzilla saying that a crash report for this bug exists in the crash database. But this way it was much easier to come up with a basic implementation than to start working on the Bugzilla side. It is actually planned, though. What else is planned is notification. We have a notification service for the Build Service called Hermes, and the idea is to connect the crash database to it so that the responsible developer gets a notification about the application crash. Really, to annoy them. So how does it work technically? It's a kernel patch that has been in the upstream kernel since I don't know when. It introduces a sysctl called core_pattern. It was intended for writing a special format of the core file name, but Andi Kleen extended it to support piping into an application. So instead of writing the core to disk, the kernel pipes the core dump into the apport application itself and gives it the process ID, a core size limit, and such things. Apport then writes the crash report out to disk. Here you can see it's usually under /var/crash, and this is how the file name is set up: it's the name of the binary and the user ID the application was running under. There it's picked up by the notification applets. For GNOME this is supported by the GNOME settings daemon; it's just a patch to monitor a directory. For KDE it's a module for the KDE daemon. Do you use a unique ID for the files? Because in the use case I'm thinking of, where you go on with your computer for maybe a month without network access, I'm sure you will end up overwriting crash reports. Yes.
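The core_pattern mechanism can be sketched as a small pipe handler. This is a minimal illustration, not apport's code: it assumes it is installed at a hypothetical path and registered with `echo '|/usr/local/bin/crash-handler %p %s %c' > /proc/sys/kernel/core_pattern`, so the kernel passes the PID, signal number and core size limit as arguments and writes the core dump to standard input. The report name follows the scheme described above (binary path with slashes turned into underscores, then the user ID), and an existing report with the same name is treated as a duplicate. The field names are assumptions modelled on the report shown earlier.

```python
#!/usr/bin/env python3
import base64
import os
import sys
import zlib

CRASH_DIR = "/var/crash"


def report_name(executable, uid):
    """Report file name from binary path and user ID (illustrative scheme)."""
    # /usr/bin/compiz run as uid 1000 -> _usr_bin_compiz.1000.crash
    return "%s.%d.crash" % (executable.replace("/", "_"), uid)


def write_report(path, fields):
    """Write "Key: value" lines; multi-line values go on one-space-indented lines."""
    with open(path, "w") as f:
        for key, value in fields.items():
            lines = value.splitlines()
            if len(lines) <= 1:
                f.write("%s: %s\n" % (key, lines[0] if lines else ""))
            else:
                f.write("%s:\n" % key)
                for line in lines:
                    f.write(" %s\n" % line)


def main(pid, sig, _core_limit):
    # A real handler would also honour the core size limit passed as %c.
    proc = "/proc/%s" % pid
    executable = os.readlink(proc + "/exe")
    uid = os.stat(proc).st_uid
    path = os.path.join(CRASH_DIR, report_name(executable, uid))
    if os.path.exists(path):
        return 0  # same binary, same user: keep the existing report
    # Gather what is lost once the process is gone.
    with open(proc + "/cmdline", "rb") as f:
        cmdline = f.read().replace(b"\0", b" ").decode(errors="replace").strip()
    with open(proc + "/maps") as f:
        maps = f.read()
    core = sys.stdin.buffer.read()  # the kernel pipes the core dump in here
    write_report(path, {
        "ProblemType": "Crash",
        "ExecutablePath": executable,
        "ProcCmdline": cmdline,
        "Signal": sig,
        "ProcMaps": maps,
        "CoreDump": base64.b64encode(zlib.compress(core)).decode(),
    })
    return 0


if __name__ == "__main__" and len(sys.argv) == 4:
    # Only act when invoked by the kernel as "handler %p %s %c".
    sys.exit(main(*sys.argv[1:4]))
```

Running the handler in its own process, rather than inside the crashing one, is the design point defended later in the talk: it does not depend on the crashed address space still being intact.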
The interesting thing is that there's a cron job which removes crash reports older than one week. So at the moment it's expected that you send the report within that time. After that, you can always download the raw report, as it was sent upstream, from the database server. But this file name is also used to detect duplicates: when your Compiz or your Pidgin is constantly crashing, you don't want to fill the disk with 100 duplicate reports, so the reports are deliberately not all named differently. I think this was the simplest way to do it, and it's how it's done in Ubuntu and in Fedora as well, so I think we'll stick to that naming in openSUSE too. But it would be an idea to extend it, or at least to make it configurable; at the moment it isn't. You could use a unique name for the file and then have the same information in the first line. Yes, something like that. So, a question. Oh, DrKonqi. It's quite easy: the desktop-specific crash handlers hook into the segfault signal handler, so those applications never actually segfault anymore, and therefore apport is not called. It's very simple. But you can disable the desktop-specific crash handler and then use apport. I think I have a slide about that later, but I extended apport to be more flexible for openSUSE, because I know that the KDE project is very proud of DrKonqi. I made it possible to call DrKonqi from the crash handler as well. That would enable us to disable the handler in the KDE applications while keeping the user experience the same, because DrKonqi is called as well. So it's very, very flexible, and it integrates quite well with GNOME and Google Breakpad and whatever. And that's very good, because at the moment the GNOME Breakpad implementation is broken on openSUSE: it segfaults itself.
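The weekly cleanup could look like the sketch below, which a daily cron job might call. The `.crash` suffix and the /var/crash default follow the file naming described above; the function names are hypothetical. The age test is kept separate from the file system access so it can be checked on its own.

```python
import os
import time

MAX_AGE = 7 * 24 * 60 * 60  # one week, in seconds


def expired(entries, now, max_age=MAX_AGE):
    """Return names of crash reports older than max_age.

    entries is an iterable of (file name, modification time) pairs.
    """
    return [name for name, mtime in entries
            if name.endswith(".crash") and now - mtime > max_age]


def clean_crash_dir(crash_dir="/var/crash"):
    """Delete expired reports from crash_dir (e.g. called from a cron job)."""
    entries = [(e.name, e.stat().st_mtime)
               for e in os.scandir(crash_dir) if e.is_file()]
    for name in expired(entries, time.time()):
        os.remove(os.path.join(crash_dir, name))
```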
But I can capture the reports of GNOME's segfaulting crash handler and report them as well, so at the moment it's good to have it. What technologies are used? As I said, the core_pattern feature with the piping support, which has been upstream since 2.6.24; the linker feature for build IDs, which is basically a toolchain feature that is new in openSUSE 11.1; the compiler feature of asynchronous unwind tables, to be able to correctly produce a backtrace without full debug info; and on the package management side you need the libzypp bindings for Python, which are only available on 11.1 because the bindings for 11.0 are broken and nobody wants to fix them. So what's in it for developers? As I said, it's very flexible, so you can add your own hooks, which are called from the applet. When you click the "Report problem" button, it starts all these hooks and gathers data. You can easily hook in there just by adding another file: it executes all the Python files in that directory. You can also do that per package, with package-specific hooks. All these hooks need to implement an add_info function, which adds certain information to the report. It's also possible to delete certain information from the report. So the hooks are very powerful: you can execute arbitrary Python code. This is what it looks like. This is an example hook from Ubuntu. You just define add_info, create a new key in your report, and put the information there. This one adds a lot of files. Here's another example, which is quite interesting because with it you can disable the report generation, or the sending of the report upstream, if you detect certain things. This one looks for specific things in the backtrace.
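The two hook examples can be combined in one sketch of a hook file. Following the talk, the hook defines an `add_info(report)` function and treats the report like a dict. The attached file paths, the `Stacktrace` field name and the "known bad" frame are illustrative assumptions; the `UnreportableReason` field mirrors the unreportable-reason tag shown on the slides.

```python
import os

# Hypothetical frame name whose presence marks a crash report as not useful.
KNOWN_BAD_FRAME = "hypothetical_broken_driver_fn"


def attach_file(report, key, path):
    """Store the contents of path under key if the file is readable."""
    if os.access(path, os.R_OK):
        with open(path, errors="replace") as f:
            report[key] = f.read()


def add_info(report):
    # Example 1: add information to the report under new keys.
    attach_file(report, "ProcCpuinfo", "/proc/cpuinfo")
    attach_file(report, "XorgLog", "/var/log/Xorg.0.log")

    # Example 2: block sending upstream if the backtrace shows a known-invalid crash.
    if KNOWN_BAD_FRAME in report.get("Stacktrace", ""):
        report["UnreportableReason"] = (
            "This crash report is likely invalid and cannot be sent.")
```

The report is still saved locally in the second case; only the upload is blocked.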
If it shows up in the backtrace, the hook says the crash report is likely invalid and sets an unreportable reason, so you are not able to send it upstream to the server. Then there's something unique to openSUSE: the developer mode. You can enable it by setting developer mode in the config file. It generates crash reports also for unpackaged applications, meaning applications that are not officially signed with the openSUSE build key, and also applications like my segfault application, which doesn't come from a package, so the packaging system doesn't know about it. Here you can also see the unreportable reason tag: "This is not a genuine SUSE package." You cannot send that report upstream, but you can save it and then look at it or do whatever you want with it. Another thing unique to openSUSE is an "on app crash invoke" environment variable. If the application has this environment variable set while it's crashing, the crash handler defined there is run. Here you can see this application crashed and had set a path, or file name: invoke.sh, just a shell script, so you can run arbitrary code there. You can also see the security feature: my user name is of course not "username", it was replaced. The script was then run, and its output is simply attached to the application report. That way it's very easy to add other crash reporting handlers as well. When I say it's unique to the openSUSE version, that's just because I haven't been able to push it upstream yet. The collaboration with Ubuntu, with Martin Pitt, is quite good.
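Developer mode and the invoke variable can be sketched together. The talk does not give the exact variable name or report key, so `APP_CRASH_INVOKE` and `InvokeOutput` below are hypothetical placeholders, and whether a binary comes from a signed package is passed in as a boolean (in practice the package manager would decide that). The handler is looked up in the crashed process's environment, read from the NUL-separated /proc/<pid>/environ format.

```python
import subprocess

# Hypothetical names: the talk does not spell out the real variable or key.
INVOKE_VAR = "APP_CRASH_INVOKE"
OUTPUT_KEY = "InvokeOutput"


def parse_environ(raw):
    """Turn the NUL-separated bytes of /proc/<pid>/environ into a dict."""
    env = {}
    for entry in raw.split(b"\0"):
        key, sep, value = entry.partition(b"=")
        if sep:
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def check_origin(report, from_signed_package, developer_mode):
    """Return False if no report should be written at all."""
    if from_signed_package:
        return True
    if not developer_mode:
        return False  # unpackaged or unsigned binaries are ignored by default
    # Developer mode: keep the report locally, but block sending it upstream.
    report["UnreportableReason"] = "This is not a genuine SUSE package."
    return True


def run_invoke_handler(report, env, timeout=60):
    """Run the handler named in the crashed process's environment, if any,
    and attach its standard output to the report."""
    handler = env.get(INVOKE_VAR)
    if not handler:
        return
    result = subprocess.run([handler], capture_output=True, text=True,
                            timeout=timeout)
    report[OUTPUT_KEY] = result.stdout
```

Because the handler path comes from the crashed program's environment, a real implementation has to decide carefully which user the script runs as.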
He's taking patches from us as well, but of course he wants me to prepare the patch rather than just tell him where the branch is and let him do the merging work. But this will be upstream, I hope, in a few weeks or so. The list of missing features is quite long. First, proper integration with Bug Buddy, DrKonqi, and kernel oopses. The problem is that since I only use GNOME, that would be the easier part, but somebody should do the KDE part; I hope I can find somebody in-house or somebody in the community. Then a rewrite of the core application, which is located under /usr/share/apport. This is the application that is run by the kernel. I really wanted it to have minimal requirements, and I think requiring Python at that point is not optimal, so I want to rewrite at least that part, probably in C or in C++, I don't know. And then something to disable apport per process. Yes? I have a question: when you say rewriting, do you mean as an executable again or as a library? No, as a standalone application, the one that is called from the kernel. It does the initial report gathering, which is very simple work: it only writes out the core dump and collects all the /proc files. That's exactly the opposite of what Windows does. In the Windows world, when your application crashes, you use a statically linked library, because when your application crashes it might very well be because your system is unstable due to hardware or memory, and the last thing you want to do is start something new. You want to run something that you already have in memory. No, the interesting thing is that this runs in a separate address space. It's not like the segfault handler implementations: you start apport, which is a new application.
So if my problem was that I was running out of memory and that was the reason I crashed, you are making things worse, not better. Yes, but with Python, you know, you have to die one death in that case. No, I mean when you rewrite it in C, it would make sense for it to be a library, a static library. Yes, probably. But I don't want to link it in or have it loaded multiple times. If you have it loaded as a library and don't want to run another application during the crash, then I have to be sure that the code of the library, its text, is not modified during the application crash. So I really want to run in a separate address space. In that sense it makes sense to start another application, because the worst problem you have with DrKonqi and Bug Buddy is that if the application goes crazy, it's not able to start the segfault handler anymore, or the segfault handler itself crashes and dies. Getting this right is very, very complex. So what you can do is map the part of the code that you need to run during the crash, lock it, and map it read-only so that it's not touchable by the application itself. But that would be a lot of hacking for a really small benefit, I think. That's exactly what the Windows world has been doing for 20 years. Okay. There are some other things missing, also on the crash report database side, as I said: connections to Novell Bugzilla and automatic report analysis. I'm currently working on the report analysis and the report retracer. So that's it. Several people are involved: Nikolai, who did the initial port; Martin, for merging our patches; and Andreas Bauer and Markus Rückert, for helping me with Rails. And here are the wiki pages, apport and apport for developers, if you're interested. That's it. Questions? Anybody? Yes, about the developer mode.
You said that if a piece of software is not a package built by openSUSE, I mean signed by SUSE, it will not be reported unless you enable developer mode. Yes. What if I develop some software, say commercial software, that is not signed by SUSE, but I want it to have reporting features? The plan, with the connection to the openSUSE Build Service, is that if a package in the openSUSE Build Service has a bug reporter or bug ID set, our crashdb server also accepts that report. Not an option: the software is not built in the openSUSE Build Service. Then you can always add your own crash database server; it is possible to have multiple servers. Yes, but in developer mode it doesn't send, it only saves. Yes, but in a hook you can override all of that, in a package hook. So that's possible. Okay. Good. Okay, thanks.
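A package hook that overrides the defaults for a third-party package might look like the sketch below. It assumes, as an illustration, that the hook selects the upload target by setting a `CrashDB` field naming an entry in a table of configured databases, and that it may remove the "not a genuine package" block. The table, its host names and `target_host` are hypothetical placeholders, not apport's real configuration format.

```python
# Hypothetical database table; names and hosts are placeholders.
DATABASES = {
    "opensuse": "crashdb.opensuse.org",
    "vendor": "crashes.example.com",
}
DEFAULT_DB = "opensuse"


def add_info(report):
    """Package hook for a vendor package: route its reports to the vendor's
    own server and lift the "not a genuine SUSE package" block."""
    report["CrashDB"] = "vendor"
    if report.get("UnreportableReason", "").startswith("This is not a genuine"):
        del report["UnreportableReason"]


def target_host(report):
    """Pick the upload server for a report, falling back to the default."""
    return DATABASES[report.get("CrashDB", DEFAULT_DB)]
```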