So, welcome. Hi, my name is John Ogness. I'm one of the happy hackers at Linutronix, and I'm here to talk to you about how we can capture it when applications crash. I know everyone likes to think their applications don't crash, but they do. So the first step is acknowledging you have crashing applications. For example, two days ago I was going down to the metro to visit a pub, and, you know, when you go down the escalator, some of them have all these screens, like 50 screens showing advertisements. One of the screens had crashed. It's running Linux, and they're using the VT console, so I could actually see a bit of dump information. I actually tried to debug it a little, but you have to walk backwards on the escalator to do that, and I realized it was getting dangerous, so I stopped. But their application crashed, right? So these things happen. What I'm particularly talking about is applications crashing on your products. I'm not talking about when you're debugging and developing; you already have tools for that. I'm talking about your products when they're crashing out there in the field. Now, I'm sure everyone here is aware of core dumps, but just to cover the basics: if we look at the man page, what is a core dump? It's just a file containing an image of the process's memory at the time of the crash. So this is a feature of Linux: when an application crashes, the kernel can take a snapshot of the entire virtual address space of that application and save it to disk, which is really nice. These core dumps, by the way, are in the ELF format, so it's very easy for tools like GDB to find out where the stacks are, what the library mappings were, and all of these things, because these core dump files are ELF files. One advantage of core dumps is that the kernel does it, so you don't need any special tools or anything in user space, nothing you have to specially turn on; it's just part of the kernel.
You have everything, right? So you have all the stacks, the registers, everything you could possibly need as an application developer is in that core dump. And since it's in a file, after the application has crashed you can actually investigate what happened afterwards. You can do post-mortem exploration: this thing died, and now I want to look at why it died. Looking at the stacks, looking at the variables, perhaps looking at some things in the heap will give me clues as to why it died, because probably it's a bug in my code. So having all this information is, of course, very helpful. And it's very nice to do offline debugging. You can copy that file off of the device, and then on your nice powerful workstation, with your cross toolchain, you can spend a lot of time going through it in a graphical environment like Eclipse, if you want, and all these things. So these are really nice features of core dumps. And as a developer, there is nothing better than getting a core dump file if my application crashed. If we just know our products are crashing in the field, and no one really knows why, that's horrible. We have to try to reproduce it if we can. It's a nightmare, actually. But if our products in the field are actually capturing core dumps, then we just have to get one of those crashed products back, and now the developer can see exactly what's going on. So it's really a great thing that we have that. And really the main point of my talk, more important than the package I'm going to present, is that you get the idea that we should be capturing this crash data on our products. That's the core thing I'm trying to push: you should be doing this. Even if you're not using this tool, you could build your own tool, but you need to be capturing core dumps. You need to be capturing this crash data, because Linux is everywhere.
Soon it's going to be in every light bulb, and if we have things crashing, we need to capture that data. We need it to help make our code less buggy. We're always going to create new bugs, but at least we have an active tool, or a mechanism, that we can use to fight that. Now, of course, there are the disadvantages of core dumps, which everyone immediately brings up: yeah, but they're huge. I don't have space on my embedded device for these core dumps. Or: you need special debugging tools; if I don't happen to have the toolchain with GDB with me, I can't actually look at what happened. Or: it's nice to know that this crashed, but in a big multi-process system, maybe there's information I want from other processes that are running that didn't crash. They also contain important information. So the core dump doesn't have everything that I need, or it's too big. The project I'm going to present is the minicoredumper project, which exists exactly to address these issues. The minicoredumper project has basically three goals. The first one, which you could guess from the name, is to make small core dumps. We want to take away this excuse that they're too big. If we can get core dumps down to really small sizes, then you don't have excuses anymore, and even on your light bulb you can be capturing core dumps. The second is the idea of custom core dumps: you can actually choose what you want in that core dump. What exactly do I want? Do I want the stacks? Do I want part of the heap? Maybe I just want a couple of buffers. You can choose what goes into it. And the third is that you can do state snapshots, so that if something crashes, I can actually see a lot of information about other things in my system. So these are the three goals that the minicoredumper project is attacking. And the way it does this is basically with three different components.
I'm going to talk about each of the three components. Definitely the most interesting component is the first one, so I'm going to be spending the most time on it: the minicoredumper itself, because this is the one that makes the small core dumps, the one we're really interested in. The minicoredumper is just a regular user space application. It has a configuration file so that you can specify which applications you want to capture data from when they crash. You can specify exactly what kinds of data you want to capture. It has features like in-memory compression, so that rather than writing a core dump, even a small core dump, to disk, we can compress it in memory and put a compressed image onto the file system to get it even smaller. It doesn't have to touch the disk first; it can compress directly in memory. Obviously it's targeted at embedded devices, so it doesn't have a lot of dependencies. And the most important thing is that no kernel patches are required. The minicoredumper project has been around for quite a while, actually, and the mainline kernel supports this. So how can we do this from user space? How can I be doing a core dump from user space? It makes no sense. Well, if we look a little further down on the man page for core, you see that there is a /proc/sys/kernel/core_pattern file where you can specify the name of the core file. It doesn't have to be called core. You can use %-specifiers, like %p and things like this, to give it a name built from values that are assigned at runtime. This is already a really nice feature, because just using this, you can control which partition these core dumps land on: you can specify, for example, a path and a name. So this alone is a nice feature.
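To make the %-specifier idea concrete, here is a minimal sketch, in Python, of the kind of expansion the kernel does for core_pattern. The function name and the subset of specifiers are my own choices for illustration; the real expansion happens inside the kernel, not in user space.

```python
import time

def expand_core_pattern(pattern, pid, exe_name):
    """Mimic a subset of the kernel's core_pattern expansion:
    %p -> PID, %e -> executable name, %t -> UNIX timestamp, %% -> '%'.
    Illustration only; the kernel performs the real expansion."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern[i] == '%' and i + 1 < len(pattern):
            spec = pattern[i + 1]
            if spec == 'p':
                out.append(str(pid))
            elif spec == 'e':
                out.append(exe_name)
            elif spec == 't':
                out.append(str(int(time.time())))
            elif spec == '%':
                out.append('%')
            i += 2
            continue
        out.append(pattern[i])
        i += 1
    return ''.join(out)

# e.g. a per-program, per-PID core file on a dedicated partition:
print(expand_core_pattern('/var/crash/core.%e.%p', 1234, 'myapp'))
# -> /var/crash/core.myapp.1234
```

So a pattern like `/var/crash/core.%e.%p` keeps one core file per crashing instance, on whatever partition /var/crash lives on.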
You can also build the name using these % shortcuts, these %-specifiers, based on, say, the name of the crashing program. That means if the pattern includes the name of the crashing program, you're always going to have the last core file of that program. If you want to keep all of the core files of that program, then add the PID or a timestamp to the name. So you actually have control: do we want all the core files, or just one core file per program, and things like this. Just from this feature you have a lot of control, particularly over where they're stored; you can have a separate partition, on separate media, that you use to store them. But if we go a little bit further in the man page, we see that if the first character of this file is a pipe, then you're not specifying the name of the core file. You're specifying a program that should run, and this program will take care of creating the core file. The way it works is that this program receives the core image over standard input. Now, you could do this yourself, right? You could say, let's just make a script, and when something crashes, it calls our script, we get the whole core dump on standard input, and we do whatever we want with it. Yeah, you can do that. So that's also all part of the kernel. And this is how the minicoredumper works. All you do to install the minicoredumper is put into /proc/sys/kernel/core_pattern a pipe, the path to the minicoredumper, and the different parameters this program wants: the PID, the UID, the GID, the signal, and things like this. This is all information that gets filled in dynamically at the moment the thing crashes. Now, you'll see I also put something in /proc/sys/kernel/core_pipe_limit.
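A do-it-yourself pipe handler really can be that simple. Here is a hedged sketch of one: it just streams the core image it receives on standard input into a file. The handler path and argument in the comment are hypothetical; minicoredumper reads only the ELF header from this stream and fetches everything else via /proc instead.

```python
def handle_core(core_stream, out_path):
    """Sketch of a core_pattern pipe handler: the kernel hands the core
    image to the handler over standard input, so the stream can only be
    read front to back. Here we just save it verbatim to a file."""
    with open(out_path, 'wb') as f:
        while True:
            chunk = core_stream.read(65536)  # stream in modest chunks
            if not chunk:
                break
            f.write(chunk)

# A real handler would be installed (as root) with something like:
#   echo '|/usr/local/bin/core-handler %p' > /proc/sys/kernel/core_pattern
# and would call handle_core(sys.stdin.buffer, ...). The handler path
# and argument list above are made up for illustration.
```

This is all the kernel needs from you: a program on the other end of the pipe that does something sensible with the stream.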
This is a feature of the kernel: if you're using this user space mechanism, you can limit how many core dumps are handled at a time. So if, say, 10,000 programs crash at the same time, it doesn't start 10,000 handler programs, because on an embedded system you might not actually be able to do that. Here I'm setting it to the maximum value; you probably won't have that many core-dumping programs running at once on your machine, I hope, or you've got a lot of crashing going on. Okay, so the way it ties into the kernel is very easy, nothing special, and you could actually write your own tooling around it. Now, the minicoredumper itself has a main configuration file, and in this main configuration file you're specifying where you want the core dumps to be written. Then you specify a set of match rules for different programs, because, as I said, different programs can have different settings. For example, in one program I just want the stacks; in another program I want heaps and all kinds of other stuff. So I can have a separate configuration for different programs if I want, or maybe I don't want any core dump files except for this program or that program. I can basically tell the minicoredumper what it should be dumping and not dumping. Each of these application-specific dump configurations is called a recept. This is kind of a German-English disaster of a name; it's historical, I'm just going to leave it at that, but they're called recepts. These are just the application-specific configuration files, also in JSON format, and this is where you can specify exactly what you want to dump for this application.
You can also specify things like particular memory mappings or particular symbols that I want to dump, or the compression options I want to use, for example. So here's an example of the main configuration. It's just specifying where you want to dump the stuff, and then these are the different match expressions for the different applications. There are two different things you can use to match on. You can match on the real path of the executable that's running; if it's a symbolic link, it'll resolve that. For example, if you have a BusyBox-based system and it's an ls that crashed, then exe will actually be the BusyBox binary, the full path to the BusyBox binary. Or you can match on the comm field from the task struct. This is typically the thing that you ran, so on a BusyBox system, if I run ls, then it's going to be ls. Keep in mind that the comm field can actually be modified by the application as well. This can be an advantage: maybe several processes run the same binary, but in some cases you want them to be treated differently, with different rules. You can modify the comm field yourself so that different crash configurations are used. You can also see here that wildcards are supported. It's not regular expressions, just wildcard support, so you can say, for example: I don't care where this binary is, if the real path ends in example_app, then this is the configuration, the recept, we want to use. The way matching works is that it just goes from one rule to the next, and as soon as you have a match, that's the one it uses. That's the decision it takes. You'll see I have one rule here where I don't have any recept specified. This just means: use the minicoredumper defaults.
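The first-match-wins rule walk just described can be sketched in a few lines. The rule list, key names, and recept file names below are hypothetical stand-ins for the real JSON configuration; the point is the matching semantics: shell-style wildcards on exe and comm, missing keys default to match-anything, first hit wins, and no hit means nothing is dumped.

```python
from fnmatch import fnmatch

# Hypothetical stand-in for a minicoredumper main-config match list.
RULES = [
    {"exe": "*/example_app", "recept": "example_app.recept"},
    {"comm": "ls", "recept": "busybox_ls.recept"},
    {"recept": None},  # no exe/comm: matches everything, use defaults
]

def pick_recept(exe, comm, rules=RULES):
    """First match wins. A missing 'exe' or 'comm' key defaults to '*'
    (match anything); if no rule matches, nothing gets dumped."""
    for rule in rules:
        if fnmatch(exe, rule.get("exe", "*")) and \
           fnmatch(comm, rule.get("comm", "*")):
            return rule.get("recept")  # None means "use the defaults"
    return "NO_MATCH"  # sentinel: no rule matched, dump nothing

print(pick_recept('/opt/bin/example_app', 'example_app'))
# -> example_app.recept
```

If the catch-all rule at the end were removed, a crashing `/usr/bin/foo` would hit the `NO_MATCH` case, which is exactly how you suppress dumping for everything you didn't list.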
I don't know why you'd want to do that, unless, I guess, you're too lazy to write a recept yourself. The defaults are actually quite okay, so you could do that. In this last rule, you see there's no comm and no exe specified. The default values for exe and comm are the star, which matches everything. So if I just specify a recept without any comm or exe, I'm saying: if anything crashes, this is basically the default I want to catch it with. Now, if I were to leave this last rule off, it's possible that some things don't match anything, and they just won't be dumped. So that's how you control what's not dumped: there's no match, and the minicoredumper won't dump anything. This is the sole dump mechanism once you've set it up; when you tell the kernel you want to use this program, then if that program decides not to dump, nothing will be dumped. Now, if we look into one of the recepts, the per-application configurations, you can see that this isn't everything, but these are the most interesting parts. There's a stack section where you can say whether you want to dump stacks, and you can say I only want the crashing thread, because usually, as a developer, all I care about is the crashing thread. If the thing has 50 threads, okay, the others might be nice, but if I can just get that crashing thread, that's a huge advantage. That's actually a typical use case, and for example the default Debian configuration has that set to true. You can also limit the stack size. I'm not sure if that's really valuable, but if you do want to limit it, you can. Maybe you have applications with 3, 10, 50 megabyte stacks, maybe you're doing some Java stuff with 20 megabyte stacks, and you just want the top part, which is the bottom-most part of memory. If you set that limit, it's going to grab the most relevant part.
You're going to see the stack frames where the crash occurred, which is the most important stuff. Here are some examples of maps you can set, like for the vDSO, or you can say I want the whole heap, and things like this. These are just some mappings; maybe there are some shared memory mappings you also want to dump. You can specify individual symbols. This is an example of a global variable that I'm specifying: you give the symbol and the amount of memory behind that symbol that you want to grab. It also supports the case where the symbol is a pointer. In this example I'm saying the symbol is a pointer, which means it's not going to dump 42 bytes starting at that symbol; it's going to treat the symbol as a pointer, resolve that address, and then dump 42 bytes from there. Because sometimes we have a pointer to a struct, and the pointer is the thing you can get to, not the struct itself, so you need to go through the pointer to get to the struct. For compression, you basically specify a compressor; this is actually a binary that you name. And then you can specify the extension for the generated file. For xz it's not very interesting, because xz and xz are the same, but gzip is a good example: the program is called gzip, but the extension I want on the file is typically .gz. That's why there are two different settings. It can use any compressor, but the compressor needs to be able to take the data stream as input and write the compressed result to standard output. Any compressor that supports that, which in Linux is pretty much all of them, you can use to compress your data. Now, this in-tar option is actually quite interesting, and the reason we have it is the ability to pack the core file into a tarball. The reason this is interesting is that the tar format supports sparse files.
So if I have the virtual address space of a program, and it has maybe 500 megabytes of virtual address space, there may be only a few megabytes that are actually being used. Most of it is zeros. I don't want to compress 500 megabytes of data; I just want to compress the data that's actually non-zero, and I can do that with sparse files. But it's not possible to send a sparse file over a pipe to gzip. I can't do that. What I can do is manually create a tarball in memory that is sparse-aware and give that to gzip. So if we have a 500 megabyte virtual address space, we're just grabbing those pieces that are non-zero, and that's what we actually send to gzip, so it goes quite fast. Even if LibreOffice and things like that were to crash, you get these core dump files immediately, because I'm just grabbing what we want, sending it to gzip, and we're done. The last option here, which is interesting — this isn't all of them — is write proc info. This is interesting because it writes additional information from proc at that time: for example, the maps file, or — I really like this one — the fd directory, so you can see all of the currently open file descriptors and where they're pointing. This is really interesting information; you're getting a copy of what proc looked like for that process at that time, which can be a nice overview. Most of it you can actually find out through the core file, but it's really nice to have this immediate visual feedback, without needing the debug tools and all that stuff to look at it. Now, the way it works: I already mentioned that the core file itself comes in through standard input, and the minicoredumper uses this basically just to parse the ELF header. It's not going to use it for getting all of the information.
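The sparse-file point above is easy to demonstrate. The sketch below (names are mine) creates a file with one real byte at the end of a big hole and compares the apparent size with what the file actually occupies on disk, assuming a filesystem with sparse-file support. This is exactly the shape of a core file: huge address space, mostly holes.

```python
import os
import tempfile

def sparse_sizes(total_size):
    """Create a file that is one real byte preceded by a hole, and
    return (apparent_size, bytes_actually_on_disk)."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, 'wb') as f:
            f.seek(total_size - 1)  # seeking past the end creates a hole
            f.write(b'\x01')        # a single byte of real data
        st = os.stat(path)
        return st.st_size, st.st_blocks * 512  # 'ls -l' vs 'du'
    finally:
        os.unlink(path)

apparent, on_disk = sparse_sizes(500 * 1024 * 1024)
print(apparent, on_disk)
# On a sparse-capable filesystem, on_disk is a few KB while the apparent
# size is 500 MB. A plain pipe to gzip cannot carry the holes -- the
# stream would be 500 MB of zeros -- which is why a sparse-aware tar
# stream is built in memory and compressed instead.
```

That gap between apparent size and blocks on disk is the entire motivation for the in-tar option.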
The reason is that standard input is not seekable. Once I've read something and then decide, oh, I do need that, it's already gone; I'm already past it, and I don't want to buffer all of it. So I only use standard input to read the ELF header. This is where I get all the important information about where everything is in memory. After I've read that, I use the proc file system to actually grab the data. I can get all of the memory mappings through the maps file. For the stack pointers I use the stat file, which is really nice because the 29th field of the stat file gives the stack address, and it does it in a platform-independent way. If I'm on MIPS or PowerPC or some totally different architecture, actually finding the stack address is not that easy; GDB has lots of special code for all these different architectures, and honestly I don't want to implement all that in the minicoredumper. But proc stat totally abstracts that for me; I can just grab that number and I'm good. So it's a great use of that. The auxiliary vector is basically so I can get some link map information, mostly to create a core dump file that makes GDB happy, not because I really need it, although I do use it for some library stuff. And then there's the actual grabbing of the memory, the actual things we want to put in the core file; for that I use the mem file in proc. That's how I get everything. As I mentioned, the core files are sparse, so I'm only grabbing the things I want, and those are the things I put in the core file. The minicoredumper also appends its own custom ELF section, a note, that has the list of things that it dumped.
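Reading those stat fields correctly is a little subtle, so here is a sketch. Per proc(5), field 28 (1-indexed) is startstack, the bottom of the stack, and field 29 (kstkesp) is the saved stack pointer; note that recent kernels report kstkesp only for blocked tasks, such as one that is dumping core. The parsing wrinkle: the comm field is wrapped in parentheses and may itself contain spaces or ')', so you split at the last ')' rather than naively on whitespace. Function name is my own.

```python
import os

def stat_fields(pid):
    """Parse /proc/<pid>/stat into its fields, handling a comm field
    that may contain spaces or ')' by splitting at the LAST ')'."""
    with open('/proc/%d/stat' % pid) as f:
        data = f.read()
    rpar = data.rindex(')')
    pid_field = data[:data.index('(')].strip()
    comm = data[data.index('(') + 1:rpar]
    return [pid_field, comm] + data[rpar + 1:].split()

fields = stat_fields(os.getpid())
# Field 28 (index 27) is startstack, reported identically on every
# architecture -- this is what spares a user-space dumper from needing
# per-architecture stack-locating code like GDB carries.
print('startstack:', hex(int(fields[27])))
```

The same platform-independence holds for the maps and mem files, which is why the whole dumper can stay architecture-neutral.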
And this is interesting because if it's a 500 megabyte virtual address space that crashed, you're going to have a 500 megabyte core file even from the minicoredumper. It's going to be sparse, so it's going to be really small on disk, but its size is going to be 500 megabytes, and most of that is going to be zeros. So now, when GDB looks at it and you look at a variable and it shows the value zero — is it really zero, or was it just not dumped? I don't know. This additional note, which is added onto the core dump file, is a list of everything that was actually dumped, so that GDB can say it's not available. At Linutronix, on our GitHub, we actually have a relatively old fork of GDB that implemented this as a kind of proof of concept, and it's really nice: you say print this variable, and it says unavailable. The variable is there, but the data is not. So that's kind of nice. And then, of course, the in-memory compression I already talked about, with the tar format support, is really nice, just to get these sparse files compressed quickly. So, to quickly simulate this — I'm not going to do it live, I just have screenshots here — we take an application like Firefox, which is relatively large, and I simulate a core dump by sending it a segmentation fault, an invalid memory access signal. That's all you have to do to simulate core dumps. And then we can see what happens. Just a little word about this: if you're going to be doing anything where you're trying to catch core dumps, definitely try it out. Just send a signal, crash your programs, and make sure you're getting what you think you're getting. Don't wait. Yeah, theoretically it should work, and then we have the things in the field, and later we get one back and there are no core files, right?
You can test these things really easily. You don't have to actually crash your programs; just send them a signal that causes core dumping, to make sure that your whole core dumping solution is working correctly. So, this is Firefox. I did this with Debian Bookworm a couple of days ago, just to get some actual current numbers. Here you see the first row: this is with the normal kernel core dumping. The numbers are a little bit different because I had to run Firefox three different times, but they're pretty much the same for the actual file size. You see we're dealing with about half a gigabyte of virtual address space. Even the kernel, of course, uses sparse files, because why would it not? So even if you're using the normal kernel core file, it's only going to be 170 megabytes large, but that's 170 megabytes taken up on your actual disk. And to be fair, just to have a comparison, I packed that into a tarball and used xz to compress it as well as possible, and I got it down to 17 megabytes, which is pretty good actually, considering it's a 500 megabyte address space. Now, if we look at the default minicoredumper settings: the defaults are that we're interested in all of the thread stacks, but none of the heap. Again, the file size is about the same; that doesn't play any role. But you can see the disk usage dropped to about two megabytes. Just grabbing all those stacks, plus the additional information that GDB needs, we have about 2.3 megabytes. And the minicoredumper will never even let that touch the disk: on the fly, it creates the tarball and compresses it, and that's what lands on the disk. So in this case, you're seeing 108 kilobytes actually landing on your disk. That's much, much better.
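The "just send a signal" test really is a one-liner. Here is a sketch that kills a harmless child process with SIGSEGV; if your core_pattern handler is configured, this exercises the whole pipeline end to end without needing a real bug.

```python
import signal
import subprocess

# Send a core-generating signal to a throwaway child instead of waiting
# for a real crash. SIGSEGV's default disposition kills the process
# (and triggers core dumping, subject to ulimit/core_pattern).
victim = subprocess.Popen(['sleep', '60'])
victim.send_signal(signal.SIGSEGV)
victim.wait()
# Popen reports death-by-signal-N as return code -N.
print(victim.returncode == -signal.SIGSEGV)
```

After running this, go look in your crash directory: if no core file appeared, you've found the gap in your setup now rather than after a field return.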
But if we're developers and we say all we really care about is the crashing thread, let's just do that. Then of course it gets even smaller. Now we can see that the actual disk usage of the core file, when I unpack it, is 1.3 megabytes, and the compressed file is 72 kilobytes. That's all that touches the disk. So we can have an application as big as Firefox — this thing had, I think, 109 threads running; all I did was start it and go to the OSS website — and I've got a core dump file that's 72 kilobytes. And keep in mind, all of these variants give me the full backtrace for the crashing thread, all of them. So this is huge for developers; this is great. This slide is just showing you the name of the additional note section: it's using type 80 and the name there. Of course, the official binutils don't know what this is, but it is there. In this case, we're actually adding about 5 kilobytes to the core file. This example is taken from the dump with all of the threads, so there was considerably more dumped: about 5.8 kilobytes of additional information added. The dependencies aren't too bad. We've got libjson, which is probably the only odd one there, because the configuration files are in JSON, and libz is actually a dependency of libjson; we don't need libz for anything else. You could probably configure libjson not to support gzip-compressed JSON files, but I just used out-of-the-box Debian here. So these are the dependencies you get. Not too bad, I would argue. Most of these things — I mean, libelf is not part of glibc, but most of these are coming from glibc. So, in summary, with the minicoredumper you have this low storage overhead. There's zero runtime overhead, because you're just setting up the kernel to call it when something crashes; it doesn't affect anything at runtime.
I would argue the configuration isn't too bad, even if the name recept is insane. Maybe JSON configuration is also insane, but I would argue it's not too bad to use. The crash data that you're getting is extremely useful. These are real core dumps: that 72 kilobyte file, I can actually unpack it, start GDB, and GDB is totally happy, no complaints, no warnings, and I can do a bt and look at that backtrace. So it's extremely useful information. And it's small enough that hopefully you can't say we don't have space. Do you have an EEPROM? Do you have somewhere I can just put 10 kilobytes? Usually you have something, right? So this gives you really great leverage to say, hey, we can actually start capturing core dumps in our light bulbs, because we only need 10 kilobytes of space to get some useful data. And this is really the point of this whole talk: you really should be doing this. Whether you're using this program or creating your own — this one's nice because it parses that file for you and basically does what you want, so it's actually quite nice — but if you want to do it yourself, that's okay. The point is, you should do this with your products. And actually, yesterday, Eric Johnson from Memfault gave a great talk about Zephyr. I'm not a Zephyr guy, but he talked about something similar they're working on in Zephyr. It's not as nice as this, and it's kind of a work in progress, but it's nice to see that the Zephyr people are already saying, hey, we need to be capturing core dumps, we need to be doing stuff like this. Maybe you're not running a system like Linux, you're running Zephyr, but you should still try to capture your crash data. So if you didn't see the talk yesterday, it was recorded; you should check it out if you're a Zephyr person. Okay, but that's not everything.
Remember, I said there are three components. The first is the most interesting component, but it's not all of it. The next one is libminicoredumper. This is basically just a library that you can link into your application to specify special data that you want to dump. If I crash, I would like certain things — this linked list, or these symbols — I would like some special stuff also dumped. That's what libminicoredumper is for. It supports dumping the data directly into the core file, or you can dump it into external files. If you just want the raw binary data, or maybe you want a text interpretation of that data, it supports that too. This is interesting because now you can say: maybe we have some important linked lists in the heap. We don't have to grab the whole heap; just grab those couple of things that we need, and we can still keep our core files small. And the fact that it also supports a text representation means that if we crash, there are some values we would like to see, and we can have them nicely string-formatted, so we don't have to have our whole toolchain with us. We can just look, really quickly, in text, at the principal things for our application that are really important, without needing a toolchain. With libminicoredumper, you can do that. Basically, the way it works is that it exports two symbols. The interesting one here is the data head, which is just a linked list of the things that you've said you want to dump. When the minicoredumper handles the crash, it looks for these two symbols and says, hey, you've got some special data you'd like to dump too, and it handles that for you. The API lets you register binary data that you want to dump, and you can have it dumped into an external file or into the core itself.
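The real API is C, but the bookkeeping just described — a list of registered items that the dumper walks at crash time — can be sketched in Python. All names here are illustrative, not the real libminicoredumper API; the one behavior worth noticing is that the data is read at dump time, not at registration time.

```python
# Toy model of the register/dump bookkeeping: the application registers
# named pieces of data, and at crash time the dumper walks the list and
# writes each piece out. Names and structure are illustrative only.
class DumpRegistry:
    def __init__(self):
        self._items = []  # stands in for the data-head linked list

    def register_bin(self, name, get_bytes, external=None):
        """get_bytes is a callable: the value is fetched at *dump* time,
        not at registration time."""
        item = {"name": name, "get": get_bytes, "external": external}
        self._items.append(item)
        return item

    def unregister(self, item):
        self._items.remove(item)

    def dump(self):
        """Walk the list as a crash handler would and collect the data."""
        return {item["name"]: item["get"]() for item in self._items}

reg = DumpRegistry()
buf = bytearray(b"old")
reg.register_bin("s", lambda: bytes(buf))
buf[:] = b"new"            # the value changes after registration...
print(reg.dump()["s"])     # ...and the dump sees the value at dump time
```

The `external` field here only hints at the real library's choice between dumping into the core file and into a separate file.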
There are two variants for registering text data — one taking variadic arguments and one not — so that it's formatted really nicely. And you also have the opportunity to unregister data. So if it's a linked list, and this linked list is changing, you can say: okay, we've registered some items on this linked list, and now we're removing an item, so I'm also going to unregister that item, and things like this. You can actually control, at runtime, the data that will be dumped. So here's an example of a simple program. You can see we have a character string that we're allocating, and an integer pointer that we're allocating, and I'm setting the integer it points to to 42. I do a binary dump of s, the string; the fact that I'm passing NULL here means I want that data to go into the core file. I also do a binary dump of i, and in this case I say I want it to go into an external file, not into the core dump itself. And I also register some text: I would like it to be called out.txt, and I would like it to say s equals, in quotes, the value of the string, and then the value of dereferencing i. Now, these values are not going to be interpreted until the actual crash happens. So if these values were to change at some point before the crash, you'd see the new values; it's not the values at the moment of registration, it's the values when it crashes. So we run this, it crashes, and we go to the crash directory, /var/crash/minicoredumper — that's the name of the directory that's created — change permissions so I can look at it as a normal user, and look at what's in there. Yeah, so we see we have the compressed core file; we have this i.bin, since we said we wanted this i variable saved externally; and we have the out.txt.
That's the text file that we wanted to generate with some data in it. There's also an additional symbol.map file that's created, which has information about this i.bin, in case we want to reinsert it into the core dump. So the data is not in the core file, it's external, but we can actually inject it later; that's what we need the symbol.map file for. And if we look at this out.txt, we can see it says s = "my string" and that dereferencing i gives 42. We can unpack the core and start GDB, and GDB is totally happy. It tells me the line of code where I crashed. I can say I want to look at s, and it shows me s, right? So the thing that s is pointing to, the string, was actually inserted into the core file. We can look at i and see the pointer, but if I try to dereference it, we see a value of zero, because that data actually wasn't dumped. Now, if I did this with the Linutronix-patched version of GDB, it would say "unavailable", right? But normal current GDB is going to say zero, because that data is not dumped, and we can actually verify that by checking the note. However, there's also a tool called coreinject that's part of this package. I can give coreinject the core file, the symbol map, and this i.bin file, and it will re-inject the data into the core file. And now when I start up GDB again and print the value behind i, we see it's 42, right? So we can re-inject these things. These are just options you can use for yourself; whether it's important to have data external or internal is your decision. But it's possible to have certain variables stored externally, and then you can just look at them quickly with a hexdump or something, right? The point is to give you the power to not require an entire toolchain to look at stuff, because maybe you want to ssh into the board and just look at things right there.
And you're not going to have a toolchain on the board, right? Or maybe you don't want to go through gdbserver and things like this, because the thing is sitting in Timbuktu or whatever, right? Okay, dependencies: next to none for this libminicoredumper. So the summary here is: you have your application, and you can register additional data that you want to have dumped. There's no runtime overhead, although obviously, if you're registering and unregistering stuff, there is some overhead there, because it allocates some memory, adds it to a linked list and things like this. So if you're constantly registering and unregistering data, there's going to be some overhead, but otherwise there's no runtime overhead. And you can dump exactly the precise data that you want. So the last component is live dumps. This is kind of the snapshot idea: we don't just want the crashing application, we would also like to know what some other applications' values are set to. So this is the ability to get a pseudo-snapshot of multiple applications, or of data from multiple applications, when one of them crashes. And you can trigger it on a crash or manually, and you should definitely trigger it manually to test it and see how it works. So this is interesting because you can have these pseudo-snapshots. And I say pseudo-snapshots because obviously it's not instantaneous. When something crashes, it's not that everything is instantly frozen. Of course, the kernel first has to load the minicoredumper, the minicoredumper has to parse its configuration file, and then it can see which other applications have things registered and start working there. So there is time involved, and I have some numbers later, but it's not instant. It's the same thing with GDB: when something crashes, the threads take a moment before they've all been stopped.
So the way this works is that we have a registration daemon, minicoredumper_regd, that runs and opens a local Unix domain socket. This Unix domain socket works with credentials, so that when an application connects to it, I immediately know the real PID of that application, even if it's in some PID namespace or something like this, and I can store that information. Basically, I have a shared memory area where I keep a list of PIDs that have special information the core dumper should also look at when something crashes. So when you're using libminicoredumper and you've said you want to use this feature, then when you initialize the library, it registers with the regd and basically says: hey, when something crashes, I have stuff for you too. And this happens automatically, the first time you say you have some special data that you want to dump. And as soon as you've unregistered the last piece of data, you're unregistered from the daemon automatically, right? So you won't be in that list if there's no data to dump for you. And when the minicoredumper is actually invoked because of a crash, it reads this PID list, and then for each of those PIDs it uses the ptrace interface to freeze them all, and then it goes through those variables to grab the data that you want. You're not going to get core dumps of those processes; it's not creating a core file of them. It's just going to grab the specially registered data from all of them. Also, basically no dependencies. So with these pseudo-snapshots, I want to give you an idea of the latencies.
You should expect 2 to 30 milliseconds of latency between when the crash actually occurred and when the first dump is generated. And then, for freezing all these other tasks, there might be 30 microseconds to 4 milliseconds. So you might have a 40 or 50 millisecond difference between the crash and the data that you get from the other, still running, tasks. But that's kind of a pseudo-state that might be interesting. Okay. So the summary for the live dumps: it's the ability to grab this information from other programs that are running. They have to be using libminicoredumper to register themselves. But then, if I have a complex multiprocess application, we can have a coordinated effort: if something crashes, we want all this data from these tasks. And it doesn't have runtime overhead, but be aware: if you've registered yourself and somebody else crashes, you will be frozen for maybe 40 to 80 milliseconds, right? So even though you're not crashing, you might freeze because someone else crashed and the minicoredumper wants to get a reliable state. And there's no deep technical reason why they're being frozen; I'm just telling them to stop so that I can try to get a reliable snapshot. Okay. Last slide. The current version of the project is 2.0.3. On Debian, it's apt install minicoredumper; on OpenEmbedded, bitbake minicoredumper; on Gentoo, there's the ebuild or whatever they have there. Like I said, on the Linutronix GitHub we have a proof of concept for GDB. It would be nice if I got that into GDB, but I've let that fall behind a bit.
Some things we're working on right now: implementing a modern tar format, because the tar format we're using right now has a limit of, I think, two gigabytes or eight gigabytes or something like this. So if your core files are giant, they're actually going to get cut off. There's also someone working on pax format support. There have been a lot of requests for post-processing: after the core file is written, can you run some scripts and stuff? So we're going to do that. And recently it was pointed out that I need a kselftest for this, because the kernel developers keep breaking this core_pattern thing. It's not just the minicoredumper; I mean, they're breaking it for everybody, not just for me, but I seem to be the only one that cares about this. So I've had to fix it twice already, and I just realized mainline has broken it again. Not for the crashing thread, but for the secondary threads; they just broke it again. So I need a kselftest in the kernel so that we can detect that early. We have one minute, if anyone has any questions. [Audience] From a security perspective, it looks very doable, but if the customer wants proof that you are not taking any sensitive information out with a dump, it looks like you can implement that with this library in your code? [John] Yeah, I mean, coming up with the proof is your thing, but you can actually pick and choose, so that you're avoiding customer-sensitive information. Sure. Anyone else? [Audience] I assume this is using ptrace underneath for accessing stuff. Is that what's breaking? Because when I was doing crash handling with ptrace... [John] No, it's not ptrace. It's actually the /proc stat file, the 29th element, which says where the stack is. Normally you cannot look at that; they don't want you to see it. It's also really racy, with a running program, to ask for the stack address.
So usually you just see a zero there, but there actually is code that says: if it's core dumping, it's a program that's crashed, we know it's dead, then show it. And they just recently broke it so that if it's a secondary thread, not the one that crashed but another thread in that application, I'm getting zeros there again. I actually know who broke it; I'm not going to mention names. So we just need a kselftest in there so that people don't accidentally break this, because they just don't think about it, right? [Moderator] There's a remote question from Stefan Agner: can this be a replacement for systemd-coredump, and would I lose features if I replaced it? [John] Yeah, so systemd also does core dumping; actually, even in Debian there are multiple core dumpers available. The focus of systemd-coredump is not minimal core dumps, right? So if you really want minimal dumps, where you're only picking certain areas that you want dumped, then systemd-coredump's core dumping feature can't do this. I've had people say: well, why don't you implement this for systemd-coredump? Maybe I could; I don't have time for that, actually. But it would be possible for systemd to pick up some of these features. That would maybe be nice, because if people are married to systemd-coredump and they want to use these features, they can only pick one; you can only have one core dumper. So you'd have to trade off: do I want the other features that it offers, and I'm not sure what features it really offers there, or do I want the mini stuff? You're going to have to choose which one you want. [Audience] Did you look into using control groups to maybe freeze a set of processes and then capture the state from that, to make it more consistent? [John] Very good idea. I actually thought about this this morning in the shower.
I'm like, why am I not just using control groups for this? I don't know how to do this with control groups v2; with control groups v1 it would actually be quite easy. I'd have to think about it, but that's actually a much better way of doing the freezing, because if we put all these processes into a control group, then we can immediately freeze them all, right? And this is really nice. So this is definitely a great suggestion, and something for the future, I hope. [Moderator] Well, thank you, John. [John] Okay, thank you.