Not really more serious than that. The cycle of life is of course that we fork a process and then it crashes, so we fork it again, but then it crashes again, so we fork it again. It's annoying when software crashes, but it's what happens, right?

So what does a crash look like? We get a segfault, for example, then KCrash comes in, intercepts the signal, and launches DrKonqi; DrKonqi starts GDB, GDB spits out a trace, and then the user has to file a bug report. All of this is incredibly tedious and horrible. Who agrees? Okay, okay, I will convince you all.

So in KDE software we kind of have a problem. It is software, so it crashes, and then those crashes end up in Bugzilla. Who has used Bugzilla? Who here liked using Bugzilla? Yeah. Case in point: sad developers, sad users. It's not ideal. So we have a crash, and this lovely popup pops up; everyone has seen that, I guess. And then this lovely view comes up, which is now even more gorgeous than it was before, because now it's Kirigami. And then you go about your business and file a bug report, because of course for every crash we file a bug report, right? Right? Yes? Uh-huh, uh-huh. You know what I'm talking about. And then this happens: the backtrace is not complete, please install the debug info. What is the debug info? What is a backtrace? We are asking a lot of the user here.

So there are a bunch of problems with this. First of all, the user has to do something. You were laughing when I asked if we file a bug report for every crash we get, because of course we don't, because it is annoying and tedious and horrible. So we don't do it. The second problem is the debug symbols. They are often missing or incomplete; the distro might have them in separate packages in a separate repository, or perhaps not available at all.

And there's also a reliability problem with this. Currently what we are doing is just-in-time debugging, which is, if we go back to the earlier slide, blah, blah, blah, this one: KCrash launches DrKonqi. So there is a time frame where the application has technically crashed but has not yet been attached to a debugger. Is anyone unclear about this? In this time frame, the application can do whatever. It can crash in another thread, it can keep running its threads and do magic, I don't know, it can try to do auto-recovery. All of these things. And that is a problem, because with just-in-time debugging we want the state as it was when the application crashed, but we don't necessarily have that, because of the time frame in between where DrKonqi is not yet attached. So it's a reliability problem.

There have been bug reports for Discover, our software center, that showed crashes in the Flatpak thread, which was weird, because there was nothing weird going on in that particular frame. But it was crashing there. The reason it was crashing there was precisely this time frame between the actual crash and when DrKonqi jumps in and saves the show: the Flatpak thread kept running, and eventually it would crash in the Flatpak thread because its state was messed up. So it's entirely horrible.
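To make that window concrete, here is a minimal sketch of the race, not KCrash's actual code: a handler intercepts the crash signal, but while it is still busy "launching a debugger", a background thread keeps mutating state. All names are invented for illustration, and the crash is simulated with raise_signal, since a real invalid dereference would not hand control back like this.

```python
# Hedged sketch (not KCrash's implementation) of the just-in-time race:
# between the crash being intercepted and a debugger attaching, other
# threads keep running and can change the process state.
import os
import signal
import threading
import time

shared_state = {"valid": True}

def flatpak_worker():
    # Stand-in for Discover's Flatpak thread: keeps mutating state
    # even after the "crash" has happened on the main thread.
    while True:
        shared_state["valid"] = not shared_state["valid"]
        time.sleep(0.01)

def crash_handler(signum, frame):
    # Stand-in for KCrash launching DrKonqi/GDB: this takes a moment,
    # and during that window flatpak_worker() is still running.
    snapshot_at_crash = dict(shared_state)
    time.sleep(0.5)  # the racy window before the debugger is attached
    print("state at crash: ", snapshot_at_crash)
    print("state at attach:", shared_state)  # may already differ
    os._exit(1)

signal.signal(signal.SIGSEGV, crash_handler)
threading.Thread(target=flatpak_worker, daemon=True).start()
# Simulate the crash on the main thread.
signal.raise_signal(signal.SIGSEGV)
```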
So there are a bunch of solutions that have come up in the past couple of years. One of the lovelier ones is coredumpd, which is systemd's answer to catching crashes. Oddly enough, systemd is now also in charge of that, so it's not far from being the complete operating system now. It is a system-level core handler. Who knows what a system-level core handler is?

When an application crashes and doesn't handle the crash, the crash gets deferred to the kernel, and the kernel then does what's called a core dump. That's where the concept of a core comes from: it basically takes the entire memory image of the process and dumps it into a file. The more evolved concept is that of a system-level core handler, which is essentially a program that receives the crashing application and then writes it to a file, or does whatever with it. In coredumpd's case, it saves a bunch of metadata and then dumps the core to a file for later inspection.

Now, the lovely bit about this is that it has disk usage limits. So it stores, I don't know, the last five different crashes on disk, and all older ones get discarded. You get automatic rotation of what is on disk, which means it doesn't grow out of control. If, I don't know, KWin is super crashy and you get a bunch of cores, your disk doesn't eventually fill up; that doesn't happen. And as I've said, it records metadata, which is also very lovely.

Another interesting and important part of the entire ecosystem is debuginfod, which solves this debug symbols problem that I've alluded to. One server to debug them all, one server to find them, one server to bring them all, and in the symbols bind them. It is precisely that. It is super simple, and I'm somewhat baffled that no one had this idea earlier. It's basically a cross-distro server that ships debug symbols through a simple REST API to a client. Now, the client might be GDB, might be LLDB, might be some completely random other process. Who has seen debuginfod in action? The rest of you will surely see it in time. It's pretty much become the standard in Arch Linux, I think. So you need GDB 12, a fairly recent release, and your distribution needs to provide a server, and then you're basically set. You set an environment variable, and then GDB just goes, give me the symbols, and the server goes, yes, my time to shine. And yeah, it's fairly new; that's why there were not a lot of hands raised.
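The REST API really is that simple. Here is a hedged sketch of what a client does under the hood, using debuginfod's real endpoint scheme (GET /buildid/&lt;id&gt;/debuginfo); the server URL is Arch's public instance as an example, and the build ID is made up:

```python
# Hedged sketch of a debuginfod client: fetch debug symbols for a
# binary's GNU build ID with a plain HTTP GET.
import urllib.error
import urllib.request

DEBUGINFOD_URL = "https://debuginfod.archlinux.org"  # example server
build_id = "b0b1b2c3d4e5"  # hypothetical build ID of the crashed binary

# GDB does essentially this when the environment variable is set:
#   export DEBUGINFOD_URLS=https://debuginfod.archlinux.org
url = f"{DEBUGINFOD_URL}/buildid/{build_id}/debuginfo"
try:
    with urllib.request.urlopen(url) as response:
        with open("symbols.debug", "wb") as out:
            out.write(response.read())
    print("got debug symbols")
except urllib.error.HTTPError as err:
    # 404 means the server does not know this build ID.
    print("no symbols for this build:", err.code)
```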
Now, the industry standard is a bit different from that. The industry standard is: you call home, first of all, that's super important, but it's also automatic, right? If your application crashes on Windows, then this annoying popup comes up and goes, ah, the application has stopped responding, do you want to close it? And then you click close. But that also allows the industry standard to cover way more data and gather statistics and stuff. So wouldn't it be lovely if we had that too? We have debuginfod, which gives us debug symbols, that's amazing. We have coredumpd, which allows us to not do just-in-time debugging and eliminates that sort of quirkiness. So wouldn't it be lovely if we had something like that? Well, yeah, we do now, kind of, with many, many asterisks attached. It's called Sentry.

So what is Sentry, exactly? It's a web application, and it does a bunch of stuff. It does symbolication: if you have a crash and it's missing symbols, we don't need to care anymore, because Sentry takes care of it. It's going to talk to debuginfod and say, I have this frame at this address in this shared object, can you give me the symbols? And then debuginfod hands out the symbols, and it's going to be amazing. It also does event aggregation: if you have crashes coming in from two different users that look exactly the same, Sentry goes, yeah, that's the same crash, let's group them together. It also gives us metadata, like what distribution was being used, what version of the distribution, what version of the software, and a bunch of other lovely stuff that I'm sure we will eventually see in production. And that's kind of what it looks like, and of course that's super boring. So, live demo time, wish me luck. This is where a zero-day exploit gets demonstrated live on stage. I should hope not.

Okay, so this is the Sentry view. Does everyone see what's going on? Okay, perfect. We have a bunch of projects, and while preparing for this talk I noticed that there was a bug in our Bugzilla bot. Our Bugzilla bot sends data there, and we can already see a bunch of stuff. First of all, in the past 14 days it had no crashes; it was 100% crash-free, and it has two releases. Now, this down here is something specific to it: because it's kind of like a web service, it's sending performance metrics to Sentry. It basically goes, I've done a request, I've done a request, I've done a request. And then Sentry goes, this request was way too slow, the user is going to be in agony, what are we going to do?

But what I've noticed is, down here we have a new issue. So let's look at it. This is essentially a crash. It has happened three times so far, and it's probably going to happen a bunch more times, because what appears to be happening is that I've made a boo-boo and there's an uninitialized constant. And indeed, here we have the code where the uninitialized constant is being used; that's this one here, and apparently that line just always blows up. We can see that this happens in our production environment, so the actual Bugzilla bot that talks to Bugzilla, and we get a bunch of metadata about the system it was running on.

Now we can also look at something else, perhaps. Performance I've already mentioned; let's maybe take a closer look at it. What we can see here is the Projects API, that is the tech that powers releaseme, a bunch of our release scripts, the localization tooling, blah, blah, blah. It's hosted at projects.kde.org/api, I think. And we can see that it has performance metrics on that stuff. If there were a problem where one of the endpoints was too slow to respond, Sentry would send out a notification and go, users are in agony, help, help.

Now let's look at a C++ crash, because obviously the first one was Ruby and wasn't particularly interesting for most of us. This is what a C++ crash looks like. This is actually from an actual system somewhere in the world that is using DrKonqi from a master build. What we can see: the binary that crashed was Konsole, and this is the binary ID, the debug ID, essentially, of the binary that was built. We can see that this happened on Arch Linux rolling, and it was the 22.08 release. Now, if we scroll down, we can see the actual involved frames. So the frame is process info, is valid, crashed; I would imagine that it was a null pointer, but we don't know yet. We can also switch between the unsymbolicated view and the symbolicated one. As I've mentioned, local debug symbols aren't necessary anymore, because Sentry can just fetch them directly from debuginfod, if available. So we might only get some of the symbols resolved: here we have the Qt symbols resolved, but the Konsole ones aren't actually resolved yet. Those will get resolved eventually, with some dedication.
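For a feel of what the server is working with, here is a hedged sketch of roughly what a "native" Sentry crash event looks like, following Sentry's documented event schema: raw instruction addresses per frame, plus a list of loaded images with their debug IDs so the server can symbolicate against debuginfod. Every concrete value below is invented for illustration.

```python
# Hedged sketch of an unsymbolicated native Sentry event; not DrKonqi's
# exact payload, just the general shape the schema prescribes.
import json

event = {
    "platform": "native",
    "release": "22.08.1",  # e.g. the application version
    "contexts": {"os": {"name": "Arch Linux", "version": "rolling"}},
    "exception": {"values": [{
        "type": "SIGSEGV",
        "stacktrace": {"frames": [
            # Just addresses; Sentry resolves them server-side
            # against the debug images listed below.
            {"instruction_addr": "0x7f3a12345678"},
            {"instruction_addr": "0x55d09876abcd"},
        ]},
    }]},
    "debug_meta": {"images": [{
        "type": "elf",
        "code_file": "/usr/bin/konsole",
        "debug_id": "5f8c2e7d-0000-0000-0000-example",  # the debug ID shown in the UI
        "image_addr": "0x55d098000000",
    }]},
}
print(json.dumps(event, indent=2))
```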
So now we have context for the source file where the error appeared. Now, this becomes even more interesting when we look at a very specific issue that I may or may not have caused in KIO. If we look at the past 90 days, what we can see is that there was an increase in crashes here, and then it decreased again. Oddly enough, there was a bug here, right? So let's look at the bug. We can see that this bug has happened 404 times. Let's look at it. Did I find the bug? Yeah, I found it and I fixed it; it just takes a month to ship out, right? That's the problem with the release cadence of Frameworks. So I analyzed the problem, right? And then, as Nicolas said, I did do both things, so I did fix it properly, I think.

But if we look at it, it's actually not a super useful backtrace, because it failed in KIO directly. In KIO we have a state assertion system, and we failed an assert. Super obvious what the problem was. But what is interesting about this is here on the side: here we can see different shades of colors for the OS. So this happened on multiple OSes, the biggest one being Garuda Linux apparently, then Manjaro, and then Arch. So what we can tell from this information is that this bug must clearly not be specific to one particular distribution. We can also tell that it is possibly not in one specific version either, because it had different build versions. Now, obviously they could all still have the same patch applied that breaks it, but it's more and more information, and the more information we get, the better. Another interesting thing, perhaps, is the device family, where we can see whether it happens only on a laptop or only on a desktop, laptops of course having different use cases, like more plug and play than you would ordinarily do on a desktop. Does anyone want to see anything specific that you had your eye on, perhaps? Very good.

So the other thing then is: how does this tie back into Bugzilla? Oh, I actually didn't really mention this yet: the data gets sent to Sentry the moment the user opens the developer information, not when filing a bug report. So we've removed this entire problem of having to open the thing, and then "I was doing this and that", and then DrKonqi goes, this is not enough information, you cannot possibly file a bug report. We've bypassed all of that. The information gets sent right away, and then Sentry just fills in the gaps. But if the user does decide to file a bug report, then we also get the information here. So "Plasma crashes when trying to configure sysguard widget" probably deserves looking at, and apparently a bunch of people have looked at it already. Why is it not fixed yet? But yeah, the ultimate plan would then be to have Bugzilla also point back to Sentry and Sentry to Bugzilla, so the two are kind of linked together, we can have both systems at the same time, and perhaps eventually move away from Bugzilla altogether.

Any questions so far? Are you all right? There are some online questions? Oh yes, let's do some online questions. Koshan asks: how does device detection work? Does it detect Plasma Mobile as mobile? That is a very good question, and I can't answer it without looking at the code. Actually, let's look at the code, because the code is amazing. The code has no bugs, because obviously there were no crashes.
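As we will see in a second, the answer is that systemd is asked for the chassis type. A hedged sketch of what such a lookup could look like, not DrKonqi's actual code: systemd's hostnamed exposes a Chassis property ("laptop", "desktop", "handset", and so on) on D-Bus, which can be read with busctl.

```python
# Hedged sketch: ask systemd's hostnamed for the chassis type,
# which is how laptop vs desktop vs handset can be distinguished.
# Requires systemd; output parsing kept deliberately minimal.
import subprocess

def chassis_type() -> str:
    # busctl prints something like: s "laptop"
    out = subprocess.run(
        ["busctl", "get-property",
         "org.freedesktop.hostname1", "/org/freedesktop/hostname1",
         "org.freedesktop.hostname1", "Chassis"],
        capture_output=True, text=True, check=True,
    ).stdout
    return out.split('"')[1] if '"' in out else "unknown"

print(chassis_type())  # e.g. "laptop", "desktop", "handset"
```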
Oh, we're asking systemd for it, so it wouldn't know. So once systemd knows that it's mobile, we will know too. Yeah.

Okay, we have another question: "Sentry is really cool. I've used it in the past, mostly for web stuff; never thought about using it for C++ compiled stuff. The question I have is, have you found a way to make this stuff work offline?" No? I mean, yes, but no, but yes, but no. So what happens is, and it's probably not super interesting to look at, we compile a payload through GDB Python scripting. We construct a JSON payload and then send that off to the Sentry server. In an offline scenario, we would just store the JSON payload for submission at a later date, and then you would have, like, a systemd timer that runs continuously and checks if there are pending payloads. So it can be done; it's just not done right now. Store and forward is where it's at. Yes.

Other questions from the room? I see a question way back there: "Can you expand a little more on how it works, like what it sends? You've shown it sends some payload out, but I can't imagine you're sending a core file over the network. Can you expand on how it does it and what it sends, especially in terms of privacy?" Yeah, I can show you. So it does this, right? Do with that information what you want. Essentially, and I have to go way back to the beginning here, we're still invoking GDB, and GDB has scripting APIs. So we can have a Python script that runs inside GDB and allows us to inspect the process from within the Python script. What we then do is run through all the frames in the trace, and for each frame we record the file that was at that memory location and, if available, also the debug information. This is essentially the payload. We do that for all the threads, and then additionally we attach the metadata payload, which is the stuff down here: timestamps, the release from your os-release file, that sort of thing.

The question behind you: "You're still launching GDB, so how does that solve the latency problem around the crash?" Ah, yeah, okay, so coredumpd solves the latency problem. I'll show you. Maybe. Maximum wobbliness. If we run coredumpctl, these are all the crashes that have happened on this system, and we can see most of them are quite old; the column here says "missing", meaning the core file is no longer available. So we'll look for one that is actually available, which would be this one. This is the information that coredumpd has recorded about this crash. So far so good: it recorded the PID and the UID and so on and so forth. That is more or less all the information we would ordinarily have available at runtime. What then happens is that if we attach GDB to that, we can run a backtrace on the core. So this is no longer just-in-time: KWrite crashed a while back, but since coredumpd recorded the core, we can still debug it. So this is no longer just-in-time, right? But that is unrelated to Sentry; that is an improvement that has already landed, while Sentry is still under evaluation right now. Okay, thanks.

"How do you use it in your application?" Ideally, you don't need to do anything, because it wires into the existing DrKonqi framework. So if DrKonqi works, Sentry will also work.
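To make the payload construction from the earlier question concrete, here is a hedged sketch using GDB's real Python API: walk every thread's frames and collect addresses and source locations into JSON. The field names are illustrative, not DrKonqi's exact schema; run it inside GDB (for example via gdb -x) against a live process or a core.

```python
# Hedged sketch of frame-walking with GDB's Python API to build a
# JSON crash payload; only runs inside GDB, where the gdb module exists.
import json
import gdb

payload = {"threads": []}
for thread in gdb.selected_inferior().threads():
    thread.switch()
    frames = []
    frame = gdb.newest_frame()
    while frame is not None:
        sal = frame.find_sal()  # source file and line, if known
        frames.append({
            "instruction_addr": hex(frame.pc()),
            "function": frame.name(),
            "file": sal.symtab.filename if sal and sal.symtab else None,
        })
        frame = frame.older()
    payload["threads"].append({"id": thread.num, "frames": frames})

# In DrKonqi, something shaped like this would be sent to the Sentry server.
print(json.dumps(payload))
```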
There's some management stuff that I need to do to actually filter projects out, though. If we look at our projects here, there's one very notorious one called Fall Through, which, oddly enough, collects everything that has fallen through the net of the other projects. So, unsurprisingly, it has a number of errors. But yeah, ideally you don't need to do anything.

"Question for you, and that is: how do we do this on FreeBSD?" Oh. Oh. Please do reach out and ask questions. Yes, of course, yes, yes. I'm certain he is. "And on other non-systemd, non-coredumpd systems?" Yep, yep. So the entire coredumpd thing is separate, right? We can still do just-in-time debugging; nothing's changed, everything's still the same as it was two years ago, except if you're on systemd and you opt into this coredumpd stuff. But the Sentry stuff is unrelated, so you can still have the Sentry bit on FreeBSD without coredumpd. Well, if you use Git master, then it already works, but only then.

So this is currently under evaluation. Sentry as a platform still has some quirks we need to iron out, and there are still some features we're not quite happy with in how they integrate with our GitLab instance. Technically we can link up the traces we've looked at to the actual source file, like we've seen with the Ruby crash, where it actually referenced the source code. We can do the same for C++, but we need GitLab linking, and that currently is not quite where we want it to be. So maybe six months, maybe twelve months, we'll have to see.

Question from the audience: "You mentioned that the trace only gets sent if we go to the developer information. Will it be sent for a normal user who doesn't go there?" It is this way right now, but ultimately we want to just send the information right away. Since it's still under evaluation, though, it's kept behind the developer information for now. There is also a more difficult problem to solve there, which is: how do we distinguish a dirty build from a pristine build? If you are doing development on Plasma Shell, and Plasma Shell crashes because of something you've done in your dirty tree, we don't really want to know about it in Sentry.

Another question. That is a good thought, yeah. The comment is not on the audio, so let me repeat it: the comment was that it should be relatively easy to figure out if the tree was dirty by just looking up the build ID in the distribution, right? If the distribution doesn't know about the build ID, the crash is worthless anyway. Yeah, good thought.

So, as I've said, the Python scripting is really amazing. If you ever do anything with GDB, do try the Python scripting, especially if you find yourself repeating the same steps, like, I don't know, starting an application, setting a breakpoint, running it five times, and having the breakpoint hit each time. You can totally script that with the Python scripting. It's really amazing.
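As a small illustration of that kind of automation, here is a hedged sketch using GDB's documented Breakpoint class: a scripted breakpoint that counts its hits and only stops for real on the fifth one. The program and function name are hypothetical placeholders; run it as, for example, gdb -x script.py ./myapp.

```python
# Hedged sketch: automating a repetitive debug session with GDB's
# Python API; only runs inside GDB.
import gdb

class CountingBreakpoint(gdb.Breakpoint):
    """Counts hits at the target location and stops on the fifth."""
    def __init__(self, location):
        super().__init__(location)
        self.hits = 0

    def stop(self):
        self.hits += 1
        # Returning False tells GDB to continue automatically,
        # so the first four hits just get counted.
        return self.hits >= 5

bp = CountingBreakpoint("MyClass::doWork")  # hypothetical function
gdb.execute("run")
print(f"breakpoint hit {bp.hits} times")
```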
And with that, I'm really done now. Yes? No, from Git master builds. So that is real, live data. There are some crashes in there that I've created, but not those ones.

"You said that Sentry pulls the symbols from debuginfod, but how do our symbols get into it, given the near-infinite number of distributions?" From the distributions themselves. They just submit them, or they run their own server. Debuginfod is a federated service, essentially: it can talk to other debuginfods, or it can source the material directly. And so each distribution just runs their own debuginfod, and then we have a federated server that talks to all the known ones. Yeah, you should totally create a debuginfod.

Okay, if there are no other questions, comments, or poems to be read, then you have an extra long coffee break; there are a couple of minutes left in our session. And join us next, for KF6, or for licensing in the other room. Thank you.