Okay, it's 1:45, so I guess I'll start anyway. Hi, I'm Steve Rostedt, I work for Google, and today I'm going to be talking about something I decided I needed to do a few months ago. Back in February I said, you know, I need to relearn kexec, kdump, and the ftrace part, because this is something I used to do at Red Hat, which was over five or six years ago. So what's the best way to relearn something? You write an abstract about what you want to learn and present it. It gets accepted, and now you're on the hook to learn it. So I've been relearning this the whole time. Anyway — oops, I just turned it off; okay, I turned it back on — as people who have been to my talks before know: smile. There you go, perfect. I'll post that later on Twitter. It's my selfie with a real camera — that's a real selfie. Anyway, today I'm talking about post-mortem kexec, kdump, and ftrace. Back at Kernel Recipes about three weeks ago, there was a guy who said, well, you can't use ftrace for post-mortem debugging. And I raised my hand: excuse me, in two weeks I'll be giving a talk at Open Source Summit about, you know, post-mortem kexec, kdump, and ftrace. So: debugging a panic. That's what this is all about — what happens when the system crashes. It can be difficult, because sometimes you have no idea what happened. You get limited output from the console — you may not have much output at all, or you may have no output whatsoever. You ever have that GUI where suddenly the system just locks up and you don't know why? If you have a serial console there's a chance there might be something, but how many people have serial consoles anymore? My laptop doesn't have a serial console. Yes, you still have to — yes, us people sometimes really work hard. I try to make sure all my boxes have a serial console. Even Chromebooks have serial consoles if you know how to do it — I want a console. I work for Google.
I have that cable. Anyway. Serial is great, but this talk is for when you don't have that access. Say you're doing customer support and there's a system you don't have access to at all. This is what happened at Red Hat: we had a customer that had these crashes, and we needed a way to debug those kernels, but they were running proprietary software they did not want to share with us. So we needed a way to get the data from that machine and bring it over to be able to analyze it and figure out how to debug it. It's out in the field, so we don't know what it is; they have to send us some information. This is where kexec and kdump come in. Real quick: how many people have heard of kexec/kdump? Okay. Actually, how many people have not heard of kexec/kdump? Okay, good — one person, you'll learn something. How many people have used kexec/kdump? Okay. How many people have gotten it working? Same amount of people — okay, that's one of the things. So: it can create a core dump of the Linux kernel, and that core dump can be analyzed by other tools like gdb and whatnot. You can send the core dump to another machine at crash time — the crash happens, and you can actually scp the core dump to another machine from the crash kernel. The problem is it can be difficult to set up. That's one of the issues.
kexec/kdump has gotten much better over the years, but let me talk about kexec first. kexec acts like the exec system call. If anyone remembers what the exec system call is: you usually do fork and exec. fork splits the process into a duplicate process with the same address space, the same program running; exec changes all the page tables so that you see a new world, and changes the new process to be running something else. That's basically how everything works in Linux for all the applications you run. With kexec we do basically the same thing with the kernel: we take an old kernel and replace it with a new kernel. It could be the same kernel — I mean, built from the same source — but it's actually going to be another instance of that kernel, so this is almost like a fork, kind of. A lot of times it'll be a different kernel from the one that's running. The old kernel still exists in memory, and you have access to it; it just doesn't execute. You just run the new kernel. In some cases you can use this for fast reboots. If you have one of these machines that takes like 20 minutes to get through the BIOS setup — and there are some like that — a lot of times kexec is used when you need a kernel update and you don't want to spend the downtime in the BIOS: you do a kexec boot. What it does is load the new kernel into memory, and when it reboots it just jumps to the new kernel, drops the old one, and runs. This requires a relocatable kernel, because you want to be able to put it anywhere in memory — it can't be at fixed addresses. The basic way this works for crashes is you put crashkernel= on your kernel command line; you give it some parameters and it will reserve some memory, and on a panic it jumps to that location and runs, and everything's fun. So, about this kernel command line:
you have crashkernel=. Let me take the second part first — the @128M. Since this is a relocatable kernel, you can tell it where in memory you want to put it. I believe this is a physical memory address: you say, okay, at the 128-megabyte location, boom, that's where I'm going to load the kernel. That @offset is optional, by the way; you don't have to tell it. The kernel will actually figure out its own location for the crash kernel. But sometimes that doesn't work, and sometimes you have to modify it — this is the way you can. The 256M is how much memory you want to reserve for it. This can be tricky to get right. In fact, half the time I spent trying to get this working, I thought I didn't have enough memory or something, but what really happened was I had a typo. I had debug on my command line, and somehow instead of putting 256M, I put 256, space, M, and the M attached itself to debug. I spent hours trying to figure out why this wasn't working until I realized I was giving it 256 bytes of memory to hold a kernel. Some documentation says 128 megabytes for the size. That is incorrect — I had it working until I started playing with the kdump part, which also needs memory, and then it didn't: that was not enough. I needed 256 megabytes to make this work. kexec/kdump has kind of grown, so you have to account for it. When things go wrong you'll get some information in the kdump log under /var/log, but I found that's actually quite useless information.
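To make the syntax concrete, here's a sketch of the reservation and the typo that bit me — the sizes are just what worked for me on my setup, not universal values:

```shell
# On the boot kernel's command line (e.g. GRUB_CMDLINE_LINUX in
# /etc/default/grub):
#
#   crashkernel=256M          let the kernel pick the physical offset
#   crashkernel=256M@128M     reserve 256 MB at the 128 MB physical address
#
# The typo that cost me hours -- a stray space:
#
#   crashkernel=256 M debug   parsed as 256 *bytes*; the "M" just dangles
#
# Check what the running kernel actually saw:
grep -o 'crashkernel=[^ ]*' /proc/cmdline
# ...and what was actually reserved (needs root to see real addresses):
grep -i 'crash kernel' /proc/iomem
```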
I didn't really debug much at all from reading that log. kdump uses kexec to jump to the new, fresh kernel. It does not need to be the same kernel — some people like to have a different, smaller kernel, especially if you're worried about space: turn off all the modules, load only the right modules for the devices you need. The original kernel — it's funny, when I was reading all this documentation, I saw "boot kernel", "first kernel", and "panicked kernel" used interchangeably. So when you're reading it and you see boot kernel, first kernel, panic kernel: that's the kernel that first started and crashed. The new kernel could be called the capture kernel, second kernel, or kdump kernel. Love the documentation for consistency. It needs a kind of special initramfs — there are tools that help you make it, and I'll discuss more of that later — although you can use the old one. Like I said, you could use the normal RAM disk, but it's not really helpful for making a core dump. You can play with it if you just want to learn, which is what I was doing: I'm relearning all this, so let's just use the normal RAM disk too. It takes several options for creating the core dump. So there are some things you can do with it. You can create it on the local file system — that's what I usually do, and that's what I'm doing in my examples: I just reserve a spot in the local file system, and that's where I put all my core dumps. Now, what happens if the file system gets corrupted because the crash corrupted the file system?
Well, now you're kind of in trouble, because your core dump won't have anywhere to live. But you can actually hook it to a remote file system over NFS, and you can even use a raw partition reserved only for kdump files — this is all in the documentation. You can also set it up to do SSH or SCP, so when it creates the core dump it actually sends it to another machine someplace else. This is all available in kexec/kdump, but like I said, I'm not going to explain all of it — I actually didn't play with it; I only did enough for here. I've got 200 slides, so I didn't have enough room to do everything. So, did you get the picture? Going back to kdump: like I said, you have the reserved memory; it loads the kernel and it loads an initramfs. The kdump code is actually the user-space side of the code that sits in that initramfs. It calls a special application called makedumpfile — a utility that creates the core dump file — and it reads /proc/vmcore from the crashed kernel. There's a kernel parameter that makes the proc file system create a vmcore file, and this is basically a file into the memory of the crashed kernel. makedumpfile runs in the second kernel: when you jump to the second kernel, it's executed there. It's not in kernel space, but it runs there.
It runs in the world of the second kernel, and it creates everything you need. The thing about makedumpfile is that it needs access to the debug symbols of the first kernel, so somehow you need to store the debug information on the initramfs that makedumpfile has access to, because otherwise it won't know how to read the vmcore file. You can specify exactly what you want to save from that file — I'll get back to that a bit later — and like I said, it can send it over the network if there's not enough disk space available locally. So, the bare minimum. If you can read that little fine print up there — I'll be posting this; I just finished the slides last night, and some of the examples even have dates from last night. I'll be uploading these slides so you can download them and have the links and everything once I'm done with this talk, so no need to take pictures. On Fedora 36 — I did this on both Fedora and Debian; for those that use Ubuntu, Gentoo, SUSE, I'm sorry, I didn't have enough time — here's the bare minimum. I did the kexec call shown here. -p, I believe, means it's for the crash kernel — the boot after a crash or panic; I think it says, on a panic, do this. Then you pass in the kernel that you want to load — I'm actually using the same kernel that I booted with — and then you can pass in an initrd, and this is where I'm using the same initrd that I booted with. But if I just did that, it would basically be a fresh reboot, so I added the append saying, okay:
I want you to boot into bash on the initrd. So I ran that. And if you want to crash or test all this, you echo c into /proc/sysrq-trigger, which runs a bunch of commands inside the kernel — c is for testing this crash-kernel setup, because it will actually crash your kernel. So if you want to bring down someone's machine and you've got root access, and you say, oh, the guy left his terminal open: you go in there, echo c, hit enter, and you just crashed his machine. Don't do that. Boom, crash. So after the boot-up I got to see, hey, give root password — the normal prompt you usually see. I typed in my password and said cat /proc/cmdline, just to see what I had, and it had added this elfcorehdr= option — I think that's the command-line option that gets added to create /proc/vmcore; it tells it where to look or something. And just for the heck of it, sure enough, I did an ls and there was /proc/vmcore. On Debian — Debian testing, which I updated on the 18th to the latest and greatest — I did apt-get install kdump-tools. kdump-tools is the package on Debian to get kexec/kdump to work. The files installed on Debian are, you know, /etc/default/kdump-tools, and then in /usr/share/doc, if you want to read about it, there's a README.Debian. It's very minimal — Debian didn't really put a lot of effort into this. So this is the very minimum by which I got everything working. If you want to adjust things, you put them in that kdump-tools default file. It gives you a utility you can use called kdump-config: if you say kdump-config load, it reads the information in the defaults file, knows what to load, and creates symlinks to help do things for you, so you don't have to worry about it. I just do kdump-config load.
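Pulling that bare-minimum sequence together — this is a sketch; the paths and the rdinit override are my reading of what the slide showed, and the echo really does crash the box:

```shell
# Hand-load a panic (capture) kernel, reusing the running kernel and initrd
# (Fedora 36 style paths):
KVER=$(uname -r)
sudo kexec -p /boot/vmlinuz-$KVER \
     --initrd=/boot/initramfs-$KVER.img \
     --append="$(cat /proc/cmdline) rdinit=/bin/bash"

# Test it -- THIS CRASHES THE MACHINE, only do it on a test box:
#   echo c | sudo tee /proc/sysrq-trigger
# After the capture kernel comes up, the proof it worked:
#   grep -o 'elfcorehdr=[^ ]*' /proc/cmdline
#   ls -l /proc/vmcore

# The packaged way on Debian:
sudo apt-get install kdump-tools     # config in /etc/default/kdump-tools
sudo kdump-config load               # kernel + initrd into the reservation
sudo kdump-config status             # is it operational?
sudo kdump-config test               # dry run: what would happen on a crash
```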
It will load the kernel into the reserved space, and it will load the initrd into the reserved space. If you want to see how it's doing, you can say kdump-config status, and it will show you whether it's operational or not. And you can do a test, which doesn't really test anything — it just says, here's what's going to run when the thing crashes. On Fedora 36 — I updated my virtual machine to Fedora 36 — it's called kexec-tools. I love the consistency across distros here. Like I said, SUSE could call it something else, Gentoo could make up something else, who knows. Ubuntu usually pulls from Debian, so it probably has the same name as Debian. Anyway, it's dnf install kexec-tools. The files you get are /etc/kdump.conf and /etc/sysconfig/kdump — I don't know exactly why they have the two. But since this is mostly developed by Red Hat, and Red Hat supports Fedora, the kexec documentation in there was extensive — lots of documentation that's really informative to read. It tells you how to do it for different architectures: PowerPC, Arm, whatever. It's really informative; I was actually really impressed with the documentation that was there, and I learned a lot. Their tool, for consistency with Debian, is called kdumpctl. You do kdumpctl start: it reads kdump.conf and loads all the stuff. Oh, by the way, if you modify kdump.conf, it will then rebuild the kdump initramfs, because the kdump initramfs is built dynamically and depends on what's in the kdump.conf file. So in kdump.conf you tell it what you want to do — scp it, run a shell, do whatever you want — and it creates the initramfs from that. If you modify it, then when you do kdumpctl start, it'll say, oh, kdump.conf was modified:
Let's rebuild the initrd. Now, let's say something happened and it didn't rebuild it: you can say rebuild, and it will rebuild regardless of whether you modified it or not. And kdumpctl status will show you whether or not kdump is actually operational. You want to run this before you ever do any crash tests, because there were a few times I did the echo c and the machine just rebooted. What happened? I went: oh, I never started it. So, triggering kexec/kdump: as I said, you can do echo c, but that's not its purpose. That's the way you test to make sure you set it up properly; it's not what you want in production — unless you have that rogue person that finds your open terminal. You can disable that as well; I recommend it. Anyway: controlling panic and oops. You want to enable panic_on_oops, because a lot of times an oops — you get those bug reports with the nice stack trace — may not trigger kexec/kdump. It might just stop right there, and you want it to trigger a panic. So you set panic_on_oops. But be careful, because any time a BUG() is hit — not a WARN(), but BUG() inside the kernel —
Well, it will trigger a panic. So something that could actually have been recovered from — there are some cases where bugs are recoverable, though not always; it's often best just to reboot the machine anyway — will now cause the machine to panic and then reboot and do all this. You can also set up the NMI watchdog so it will trigger a panic if it detects a hard lockup. Now, makedumpfile is very interesting — something I'm going to invest more time in in the future — because it's what creates the core dump file from the vmcore file after the crash happens. It runs in the initramfs after the panic, once you've rebooted into the new kernel: it runs makedumpfile to create what you need, and after it's all done, it reboots the machine back to the original kernel. You can create a normal ELF file, which is readable by gdb — that's the default — but I think it's highly recommended you use the kdump-compressed format. It's much smaller than a normal ELF file, and this is where you can strip a lot more out. One thing we had to do at Red Hat, and what we're going to do at Google: you can't be recording user-space data. You have to be very careful about what you record, for privacy reasons. You can use this thing to strip it down to only the data that you need, so it's much smaller than the ELF version. And here are the flags. By default — and I plan to make it strip more than what it does today — everything is page-relative. Think of it as five bits, and depending on which bit is set, it will exclude things. If you pass in -d 31, all of these are excluded. Then when you reboot into the new kernel, by default there should be a directory called /var/crash where all these core dumps end up. Read the documentation about it.
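The panic knobs and the five exclusion bits from this section, sketched out — the sysctl names are the standard kernel ones and the bit values are from the makedumpfile(8) man page; the makedumpfile invocations themselves are normally issued by the kdump initramfs, not by hand, so they're shown only as comments:

```shell
# (as root) make oopses and hard lockups fatal so they reach kexec/kdump:
#   sysctl kernel.panic_on_oops=1
#   sysctl kernel.hardlockup_panic=1    # NMI watchdog panics on hard lockup

# The five -d dump-level bits, each excluding one class of pages:
ZERO=1           # pages filled with zero
CACHE=2          # non-private page-cache pages
CACHE_PRIVATE=4  # private page-cache pages
USER=8           # user-space pages (the privacy-sensitive ones)
FREE=16          # free pages
level=$(( ZERO | CACHE | CACHE_PRIVATE | USER | FREE ))
echo "$level"    # 31: exclude all five -- smallest dump, no user data

# Roughly what the capture environment runs:
#   makedumpfile -l -d 31 /proc/vmcore vmcore       # kdump-compressed
#   makedumpfile -E -d 31 /proc/vmcore vmcore.elf   # plain ELF for gdb
```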
It says you can put it elsewhere — like I mentioned, it can be loaded into other locations, a raw partition, or, as I already said, sent over a network. But usually you want each crash to get its own directory. So /var/crash is where it goes when a crash happens: it creates a directory that it puts the crash in. On Fedora, the directory name will be the IP address followed by year, month, day, hour, minute, second. But, you know, for consistency, Debian had to do something different: it creates year, month, day, hour, minute, with no separators. That's what you see when you do an ls of those directories. So, the crash files on Fedora: I ls what's inside there and I see three files — this is all by default. The kexec dmesg.log is actually the dmesg — everyone knows their dmesg, right? Do I have to explain it? Anyone who doesn't know what dmesg is? It's the kernel output that you see. Well, this is the dmesg of the kexec kernel — the second kernel, the capture kernel. So you get to see what happened: all the messages from what the second kernel — the panicked kernel's recovery kernel, or whatever you want to call it — was doing. See the inconsistency?
I don't know what to call it anymore — second kernel. The vmcore is the core dump that you want to look at, just like a core dump of a normal process for gdb or whatever. The vmcore-dmesg is the dmesg of the kernel that died, so that's the one that's actually more interesting to me. That's the one you want to look at, because if there was an oops message in there, you'll see it. Debian, for consistency, has these two files: the dmesg one is just the dmesg of the kernel that died, and the dump file is the vmcore file — for some reason it also appends the date, which is the same as the directory name, just in case you forgot which directory you're in. So, I want to talk about the crash utility. For all those that use kexec/kdump — who here that uses kexec/kdump doesn't know the crash utility? Okay, good. So that's basically the consistency: if you use kexec/kdump, you know about the crash utility. It's a gdb wrapper — a wrapper around gdb that understands Linux kernel structures. It was written by David Anderson. I put a question mark on 2003 because I'm not exactly sure when he created it, but the documentation always says copyright 2003, so I'm just assuming that's when it was created. He has a white paper about it that you should read. There's where you download the source code, which you might need — I needed it for this demo. It's distributed by both Fedora and Debian and other distributions. It reads, like I said, the vmcore file — or the dump file, as Debian calls it, but you know what I'm talking about. And here's the usage: obviously crash, the vmcore, and you need a file that has the debug information in it. I said "debug info" there.
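Put together, the invocation looks something like this — a sketch assuming the crashed kernel is the same version as the running one, and that the distro's debuginfo/dbg package for it is installed:

```shell
# Use the *crashed* kernel's version here if it differs from the running one.
KVER=$(uname -r)
cd /var/crash/*/     # each crash gets its own directory
crash /usr/lib/debug/lib/modules/$KVER/vmlinux vmcore
```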
It could really be the vmlinux file, if you compiled it with CONFIG_DEBUG_INFO. Debian and Fedora — and I'm sure other distributions — store the debug information in some other location. Running crash on Fedora: you cd into the place, you run crash with the vmcore, and this is the path name where I found it — it's /usr/lib/debug/lib/modules, uname -r, vmlinux. That's just because I knew it was this kernel I was using; obviously you have to put in the version number of the kernel that crashed. On Debian — actually, this is the first time they actually match: it's in the same location. Thank you! So I didn't have to relearn two different long paths. The location of the vmlinux debug information was actually the same for Fedora and Debian. Thank you very much. So when I first ran this on Debian, I ran crash on the dump and it gave me this nasty error message. This was Saturday, and I'm like, oh crap, I have to get this done for this presentation; I have to fix it. So, like I mentioned before, you've got to download the latest crash. I downloaded the crash utility, then cd crash — and that second sudo apt-get install line, I put it there because I didn't do it the first time. When you type make, it downloads gdb and then builds gdb — crash pins the gdb version it wraps. But what happened was I didn't have g++ on this box — remember, this is a virtual machine, not my desktop at work; it was a virtual machine
I was doing this on, so it had a limited supply of available packages, and the make died. And when I installed the new stuff and typed make, make wouldn't work again. The only way I was able to get going again was to delete the entire directory, re-download it, and start over — and then I had to do it again for the next missing thing. Actually, I think once I got bison and flex installed, the build continued. That's why I tell people: make sure you have those packages. Then, once you start running crash — since most of you have run this before, this is not going to be a tutorial; for those who have not, this is really quick. The first thing you do is bt, a backtrace, which gives you the backtrace of the thing that crashed. If you look here, you'll see the sysrq handler right there — here's actually the system call, and it goes backwards all the way up: the sysrq handler, crash, panic, crash_kexec, machine_kexec, boom. That's the backtrace, so you get to see that. You can switch between different tasks and look at other tasks. You can do ps to see a nice little listing of all the tasks that were running and what state they were in — lots of nice information. You can do mount to look at everything that was mounted, and dev to look at all the devices that were set up — it shows the f_ops of all the files, and you can actually go look at those. And then you can do foreach task — this is only a cut; I only had enough room on the slide — but it showed me every single task, the task_struct of each task, everything else. There's a lot more: you can type struct something with an address, and it'll give you the contents of that structure. So it's really powerful, but this talk is not about crash. What I'm going to suggest is — I'll upload this; this is the man page — look at the crash utility. It's extremely useful. There is documentation.
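For reference, the commands just walked through, as they look inside a crash session (output omitted; the address argument is a placeholder):

```
crash> bt                       # backtrace of the panicking task
crash> ps                       # every task and its state at crash time
crash> mount                    # mounted filesystems
crash> dev                      # registered devices, with their fops
crash> foreach bt               # backtrace of every task
crash> struct task_struct ADDR  # dump a structure at a given address
```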
I wish there was more documentation, but you can play with it. Also, there's a guy in the audience who has another utility that makes this work with gdb — well, maybe he's not here. The only problem is, crash is not very good at knowing how you got there. So it crashed — great, this is what happened: we hit a crash, we look at things, and we're like, okay, this is wrong; you can find what's wrong in the kernel with this. Now the question is: how did we get to this state? We don't know. It takes a lot of forensics work and knowledge to do this. I mean, once you've been doing this long enough, you start seeing patterns and you can figure things out, but that's a big learning curve. So we all have to become a bunch of, you know, detectives — CSI: crash, gdb, kdump, new TV show. So, a tangent — not quite off-topic. One day there was this question about trace-cmd. Okay, who here does not know about trace-cmd? Okay, good. It's my utility; it's a front end to ftrace. So on March 19th, 2010, I get this email from Lai Jiangshan from Fujitsu, and he's saying, hey, do you have some documentation you could send me on the trace.dat file — your trace-cmd output? I said, well, if you have the man pages installed, you just do man trace-cmd.dat, and it gives you all of it. Yes, I'm a documentation freak — I like to write, I like to document things, so I documented the format and put it into a man page. So that's all I said, and he just said thank you and disappeared. Disappeared. May 20th, two months later, I get cc'd on this patch to the crash utility from Lai Jiangshan. The patch title is: crash: trace command — generate trace.dat file from core file.
I went — I can't say what I said. Holy wow. I'm glad for that man page. So, the crash trace extension. For the longest time — in fact, the funny part is it kind of freaked me out — it was in the git tree of the crash utility, and then when I went to go look for it, I couldn't find it. It freaked me out: did they get rid of it? Was it unmaintained? What's the problem? They said no — they actually moved it, in April of last year, to its own repository, because I guess they were doing so many updates and such that they said, look, we're just not going to bother the maintainer of crash; we're going to pull it out and do it on our own. So it's on GitHub: fujitsu/crash-trace is the repository. I've actually submitted patches and fixed some of the things there. It's actually kind of fun learning how to walk through the kernel from a crash core dump — I was going to talk about that, but I ran out of time. Unfortunately, Debian hasn't caught up yet: there's no package. So if you download the crash utility — crash tools, or kdump-tools, or whatever Debian calls it —
you will not have the trace extension. You actually have to download and build the crash utility from scratch, and then you have to build this from scratch as well. It doesn't matter, because like I said, I tried running kexec/kdump from the Debian packages and it didn't work anyway, so I had to download and do it myself. But on Fedora, they have a package called crash-trace-command. So you can't just download crash; you have to download the package called crash-trace-command — say that ten times fast. It reads the ftrace ring buffers, as well as kallsyms and the event format files — everything the trace.dat metadata needs — and builds a trace.dat file for you. So then you can just run trace-cmd on it. So, using the trace extension. Obviously I did this on Fedora, because Fedora had all the pieces — I did it on Debian too, but I'm just talking about Fedora now. So I do the start, I do the status, and okay, let's run the function tracer, and then echo c into /proc/sysrq-trigger. Crash. No, it really did crash — I mean really: kexec did not come up. It triple-faulted and rebooted. This was Sunday. I was like, oh crap, what happened? So — one of my tools. This is great: I get to talk about one of my tools that very few people know exists inside the kernel. Function tracing traces almost all kernel functions, and every so often it traces something it shouldn't, and when that happens, bad things happen.
For example, some functions are called when memory is being shut down — it just disappears. So basically you're executing special code while the actual data of the kernel is disappearing. Especially in shutdown, when turning off CPUs, or in suspend and resume, there are locations that do weird things where you can't write to memory, or memory turns read-only — they might just say, okay, memory is all read-only now. So when we try to do a trace of something, we try to write to the ring buffer and we take a fault. Then we go into the fault handler; the fault handler tries to write something, it can't, so it faults; and then it goes into the fault handler again and says, oh, we've done this three times, we see a pattern — just reboot the machine. So if one of these functions on the CPU-shutdown path is traced and writes to memory that's not available, we get the fault and the reboot. Now, how do we find that function? The problem is finding one function out of tens of thousands, so: we bisect. There's actually a script inside the Linux kernel repository, in scripts/tracing, called ftrace-bisect.sh. The way you use it is, first you copy it from there to someplace else — unless you want to run it from the repo; I usually like to keep it available, so I copy it to /usr/local/bin so I can execute it. Then you cat available_filter_functions — which lists all the functions ftrace will trace — into a file I just call full_file. Then I run ftrace-bisect: you pass in all the functions you want to trace, which is what's going to be tested; then a test file, which is the file you want to do your test on; and a non-test file, which gets all the functions that are not being tested. So basically it splits full_file in half: half of it goes in the first file and the rest in the second file. Then you cat the test file into set_ftrace_filter, which limits function tracing to only
the functions that are in that file — when you enable function tracing, it will only trace the functions in this file. So I'm doing "cat the test file into the filter" — don't do this. Great, I just wasted all this time just to tell you not to do something. Why? Well, there are 50,000-some functions in that file. If I do this — I timed it with time, and I only passed in a thousand of those functions; remember, there are 50,000. I put the top thousand of available_filter_functions into set_ftrace_filter, and it took almost a minute to complete. Why? When you require thousands of functions to be filtered into set_ftrace_filter, it can take several minutes, because it's an O(n²) algorithm against kallsyms. You're passing in names, but everything in ftrace is addresses — it doesn't know anything about names. All the addresses get converted through kallsyms, and since you're passing in names, we now have to match each name against 50,000 functions: we actually have to search all 50,000, and anything that matches gets enabled, walking through all of that each time. So if you're passing in 25,000 names, that's 25,000 times 50,000 kallsyms lookups. It takes a long time — it could be five, ten, fifteen minutes or more to execute half of that. And I got tired of that. So in 2019, because someone said, hey, this config breaks on shutdown or on CPU hotplug, and I went and tested it, and sure enough, something broke, I was like, I don't have time to do this.
Yeah, something broke and I was like I don't have time to do this So I modified the kernel to allow set ftrace filter function to do something different If you do the index of the available filter functions into the set ftrace filter It will just go to that index enable it because the available filter functions It's just a directive mapping one-to-one in the exact order of Sorry, so the direct mapping of the What's called how the layout in the index inside the kernel is the same as the layout in the available filter functions So when you say index here, you just say that it's easy It could easily find that function enable it so let's say we did we I want the hundredth file So I did head dash a hundred of available filter functions tail minus one So it gives me this arc perf update user page Which is the hundredth entry into the available filter functions I echoed a hundred into set ftrace filter and boom it does arc It gives me exactly that function that I saw so let's go back to the original thing of member I did the first thousand and it took 50 seconds Well, I might copy the set ftrace filter into a temp director into the temp directory called file called a I Did the second one look how fast a thousand I did seek a thousand Which is just gives you one through a thousand right there It took real time 93 milliseconds and both user and system time was like zero So it's an o1 operation extremely fast So let's copy that into B and do a diff to see if it's everything's okay. 
Well, they weren't the same. Can anyone guess why? Static functions. All of the differing names have multiple instances within the file. When you pass a name, it matches everything with that name: take something like type_show; there are lots of static type_show functions in lots of various areas of the kernel, and passing the name enables all of them. That's why the output differs. By echoing the number instead, you can distinguish which type_show you want. I checked, and every one of the differences was a duplicated name within the first thousand functions. Still, I don't care about that here; I want to know which function is crashing.

So going back to the bisect: instead of catting the whole list of names into full_file, I do wc -l to get the line count. There are 52,340 functions in available_filter_functions, so I do seq 52340 into full_file and then run the same bisect; full_file is now just a list of numbers, and that's all I care about. The script splits it half and half, I echo the test half in, and run the test. If it locks up, reboots, or does something bad, I copy the test file that just failed to full_file.1, and that starts the second stage of the bisect. By the way: always keep the numbered levels, .1, .2, and so on, and work from the latest one each round. Sometimes you'll screw up and want to go back to find out where you were. If you just keep renaming everything to full_file, which I used to do, then when I screwed up I had to restart the whole process all over again.
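Before running this for real, it's worth knowing how long it will take: halving 52,340 candidates down to one takes only about log2(52340) rounds, which a quick sketch confirms.

```shell
# How many halvings to get from 52,340 candidates down to one?
n=52340
iters=0
while [ "$n" -gt 1 ]; do
    n=$(( (n + 1) / 2 ))    # keep the larger half, as a worst case
    iters=$(( iters + 1 ))
done
echo "$iters iterations"    # prints "16 iterations"
```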
That wasn't fun, so now I always keep the levels; if I screw up, I can figure out where I screwed up. Also, if the test half doesn't lock up, always test the non-test file before going further: if the test side is clean, the problem function should be on the non-test side, and you want to confirm that. That's exactly how I messed up once, by not testing the other side. Then you copy the next test set to the next level file and go on. Wash, rinse, repeat. I also made the script report when there's only one function left, since that must be the bad one, and after 16 iterations it told me: number 749. I echoed 749 into set_ftrace_filter, catted set_ftrace_filter back, and there it was: hv_is_isolation_supported.

Hmm, OK. So what I did was start kdump and get it ready, then run trace-cmd with the -n option, which means do not trace this function: -n hv_is_isolation_supported. Then I catted set_ftrace_notrace to make sure it really was in there before running the test. I triggered the crash, and it worked. So that was the actual problem, and I was going to submit a patch marking it notrace. But when I tested this on the latest kernel, 5.19-rc2, it worked fine; only the 5.17 kernel had the issue. Still, this is the kind of thing you want to know how to do. So then: run crash on the vmcore, and load the extension.
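The capture side of the test above, as a rough sketch. These commands need root and crash the machine by design, so they are not meant to be run casually; kdumpctl is the RHEL/Fedora tool name, and other distros use a different one (kdump-config on Debian, for example).

```shell
# Reserve the crash kernel and make sure kdump is armed
# (tool name varies by distro).
kdumpctl start
kdumpctl status

# Function tracing with the suspect function excluded (-n = notrace):
trace-cmd start -p function -n hv_is_isolation_supported

# Double-check the notrace list before pulling the trigger:
cat /sys/kernel/tracing/set_ftrace_notrace

# Crash the kernel on purpose; kdump captures the vmcore.
echo c > /proc/sysrq-trigger
```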
Loading the module works like this: in crash, you run "extend trace.so". If you built it yourself, you sometimes have to give the full path to it, but once you add the extension, crash reports it as loaded, and "help trace" gives you all of its commands. The only one I cared about for this talk happened to be the bottom one, trace dump -t with an output file: it dumps the ring buffers and all the metadata and creates the trace.dat file for you. So that's the one I ran: trace dump -t /tmp/trace.dat, then exit, then trace-cmd report, and boom, right there is the exact trace of what the machine did before it crashed. You can even see an NMI handler that triggered shortly before the whole thing rebooted. I've done this several times, and the run I cut and pasted into the slides was the one time an NMI handler showed up; that doesn't usually happen, so I thought that was kind of cool.

So let's try a little more information. I added scheduler events and interrupt events, made sure hv_is_isolation_supported was still set to notrace, and did the whole thing again: triggered the crash, brought up crash, extend trace.so, trace dump to a file, exit, and this time ran KernelShark. Boom: there's the trace from the crashed kernel's core dump in the KernelShark GUI. This is just to show you that you can do this.

So let's try demoing. I said I wanted to demo. Where are we... oh, it stopped. Ah. Come on. OK, 30 seconds. Come on. I want to do the Debian one, the one with the camera. So: trace-cmd start -e sched, just the sched events, that's it. Start kdump and check its status. Then echo c into /proc/sysrq-trigger. And... where is it?
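The whole post-crash sequence, collected in one place as a sketch. The vmlinux and vmcore paths are examples, and the trace extension commands are written the way they were used above.

```shell
# Open the captured dump in the crash utility (paths are examples):
crash /usr/lib/debug/vmlinux /var/crash/127.0.0.1-2022-06-22/vmcore

# Inside the crash session:
#   crash> extend trace.so              # load the ftrace extension
#   crash> help trace                   # list its commands
#   crash> trace dump -t /tmp/trace.dat # write out a trace.dat
#   crash> exit

# Back in the shell, read it like any live trace-cmd recording:
trace-cmd report /tmp/trace.dat
kernelshark /tmp/trace.dat    # or view it graphically
```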
There it is, that's my Debian box. I'm hoping it worked; it's rebooting now. And then from here: ssh in, cd /var/crash. What's today's date? The 22nd; yep, that's the only entry there. cd into the crash directory and run crash on the vmcore and the kernel. Then "extend trace.so". Hurry up, hurry up, it needs to be as fast as me. Then "trace dump -t /tmp/trace.dat". Exit. KernelShark isn't on this one, I'd have to copy the file over, so: trace-cmd report. And... there it is. sched_switch, all the sched events. It worked.

That was 200 slides. Okay, thank you.