I work for Red Hat as a technical support engineer, meaning that I help our clients administer their Red Hat Enterprise Linux systems. More specifically, I work on the kernel team, so we spend a lot of time helping customers analyze what happened when their system crashed because of a problem on the kernel side. As you might guess from the title, this is going to be an introduction to some of the techniques we use on the team to analyze kernel problems encountered on live systems. What I want to do today is tell you what exactly happens when a kernel crashes, what you should be prepared to do, and talk about a couple of tools that can help you do the analysis after you encounter the problem.

Let's start with something easy, something I hope you are more familiar with. Assume we have a normal application, a normal user-space program, that encountered a problem and crashed, and you need to figure out what happened to it. There are a couple of ways you can do this. You can start by reading the code. If you're the developer, or you simply have access to the code, you can take what you know about the environment of the application when it crashed, plus your knowledge of the code itself, walk through the code, look for a possible problem, fix the bug, recompile, and make it work. You don't always have this option: you might have a big application, and you might not want to spend years and years debugging a software problem.

You might instead modify the code to provide more information about the application while it's running, some feedback messages. And even though most developers won't like to admit it, the usual way to do this is to put some printfs in the code and see what the state of the program is while it runs. This affects the code: you have to recompile it, and sometimes it modifies the application's behavior so much that it no longer gives you the right picture.

So you might want to do something else. Another good approach is to use GDB. How many of you are familiar with what GDB does? Yes, thank you. GDB is awesome. You should use it; developers use it all the time. Why is it awesome? Because it can debug processes either by starting them itself or by attaching to them. You can have GDB start an application, and while the process is running you can set breakpoints, inspect the environment at some point in the life cycle of the process, alter the process's virtual memory, and do several other things. So one approach is to attach GDB to an already-running process; another is to start the process as `gdb myapplication` and have GDB attached from the start.

But sometimes you want to find out what happened to the application after it crashed; you want to do a post-mortem analysis. In such situations the crash, typically in the case of a segfault, produces something called a core dump: a file which is a dump of the virtual memory of the process. What does that mean? If I have access to the virtual memory of the process, I have information about the environment of that application at the time of the crash, and based on that I can work out what was going on at the moment it crashed.
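As a minimal sketch of these workflows — enabling core dumps, starting under the debugger, attaching, and loading a core post-mortem — assuming a placeholder binary called myapp:

    # Allow core dumps to be written (the limit is often 0 by default)
    ulimit -c unlimited

    # Start the program under GDB from the beginning
    gdb ./myapp

    # ...or attach GDB to an already-running instance
    gdb -p "$(pidof myapp)"

    # Post-mortem: load the binary together with the core file it produced,
    # then inspect with: bt (backtrace), info registers, frame N, print <expr>
    gdb ./myapp core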
I can also use GDB for this. I can start GDB on an application and provide a core dump as well, which lets me recreate the application in the state it was in when it encountered the problem and crashed. This way I have, inside GDB, what is basically a clone of the crashed process, which I can dissect and run GDB operations on. So these are some of the approaches for debugging an application.

If we move into the kernel world, some people will say that things are completely different: we have no idea what is going on in the kernel, how to debug it, how to do anything. Kernel space is its own universe and everything works differently. Not so much. Most of the principles still apply, and you still have most of the same debugging mechanisms. Maybe not printf, but you have printk, which admittedly comes with problems of its own because of the way the kernel works: with no terminal attached, you won't see the messages on your console. You would typically see printk-generated messages on a serial port, for example, or by default in dmesg, the kernel log buffer, and you can look at those messages for more information. We can use GDB, too; we can attach GDB to a live kernel. Slightly more complicated, but we can do it, and we can get some information out of a live kernel that way. I also want to mention SystemTap, which is a very interesting way of debugging things in the kernel; it deserves its own presentation. As you can see, we can mostly think about debugging the kernel the same way we think about debugging applications in user space. However, there are some restrictions, and I want to mention what they are and how we can work around them.

Let's talk about what a kernel crash is. A lot of people say: okay, something happened, the kernel crashed. Let's be more specific about what that means. First of all, we should look at the severity of a problem in the kernel. At the low end, we can simply have warnings in the kernel log buffer, in dmesg, telling us that something is happening in kernel space. Either the kernel itself or a kernel module printed some information into the kernel log, and those messages could mean something; they range from purely informational warnings to actual device errors. These won't cause problems in the kernel itself. There may be problems on the system, and a kernel module will tell you about them, for example a problem with a disk: you get I/O errors on the disk. That's a problem, but it's a problem for the disk, for the hardware; it's not a problem for the kernel itself. From that perspective, everything we get in the dmesg buffer through these printk-generated messages is informational from the kernel's point of view.
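As a concrete sketch, reviewing those printk-generated messages on a running system might look like this (the dmesg flags are from util-linux; journalctl needs a persistent journal to show earlier boots):

    # Kernel ring buffer, warnings and errors only
    dmesg --level=warn,err

    # The same kernel messages via the journal, here from the previous boot
    journalctl -k -b -1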
Then we have the kernel oops. An oops is specific to kernel space: a type of problem in the kernel that we usually don't want to see. What does it mean? These are anticipated problems, a specific set of conditions the kernel might run into which could cause further trouble if the kernel continues to run. The kernel warns you about such an issue as an oops, and it may well keep running, but with a disclaimer from the kernel: I am now unstable, use me at your own risk. You might not want to keep going; you might want to reboot the machine. You might want to troubleshoot what caused the oops, which is usually a software issue either in the kernel itself or, most likely, in a kernel device driver. Sometimes an oops has a side effect later on that leads to a kernel panic, or the system in general will simply be unstable.

Then you have kernel bugs. Kernel bugs are situations the kernel developer anticipated: conditions the developer thought could occur in theory but should never happen in practice. If the kernel reaches a point in the code, a spot in a function, that it should never reach, that's a problem. It's anticipated, so the developer said: I'm going to catch this, but you should never end up here. Sometimes the bug is just informational: okay, that happened, do something about it, you may see side effects later. But a bug can also cause a kernel crash, actually bringing the system down.

I keep saying kernel crashes and kernel panics. An actual kernel panic is when the system is completely down, inoperable. It's the point where the kernel has reached a state in which it no longer works. You can't do anything with it: processes are not being scheduled, you can't type anything on the keyboard, nothing new appears on the monitor, nothing is going to happen. Technically the kernel is still running: it's the only thing on the machine, still sitting in RAM. But it's not actually doing anything. It's frozen. That's a kernel panic. It's the equivalent of what you'd see in the Windows world as the blue screen of death. And this is what I want to talk about in this presentation: what we do with these kernel crashes, these kernel panics.

When it panics, the Linux kernel specifically wants to help the administrator of the system, and it does that by providing some information about the panic. Before it draws its last breath, the kernel prints a series of messages on the screen, or on the serial console. Those will be the last thing you see on the screen before the kernel stops. Depending on how the system is configured, the kernel will then reboot automatically or not; usually it doesn't, because you want the system administrator to see that last message on the screen. These messages contain things like: which process caused the panic; the state of the CPU registers at the time of the panic; and the call trace of kernel-space functions, which is the history of calls leading up to the panic.

It looks something like this. On the first line you would see something like "unable to handle kernel NULL pointer dereference". That's the cause of the panic. It's roughly the kernel-space equivalent of a NULL pointer dereference in user space, like a segfault, except that here it's something the system cannot recover from, so it panics. You see the reason it panicked at the top. Then you get some more informational messages: the kernel version, which modules were loaded. You also get the list of hardware registers on the system, their state, their values. And you get the call trace of the functions that were called before the kernel reached the state where it panicked.

Now, let me give you a bonus tip about presentations, not about kernel crash analysis.
If you ever give a presentation and you put in a piece of code or text that spans two slides, you're doing it wrong. Don't do that. But here I actually have a point. The kernel tries to give you as much information as possible, so the output spans, say, two monitor screens. The problem is that when the kernel panics, everything stops working: you cannot press Page Up and look at the previous screenful. So even though the kernel tried to tell you a lot of things, you can't see them. You only see the last screen; you never see the one before it. Unless, for example, you have a serial console, everything the kernel printed went to it, and you have that output logged. But most of the time you don't. So the information exists, but you can't access it, and you can't really do anything with it. Or can you? Let's see what we can do about it.

Another example: a panic caused by an out-of-memory condition. At the top I see that there is a kernel panic due to out-of-memory. That means no more RAM, no more swap, nothing left on the system that can be allocated: I have to panic. But the call trace doesn't really show me anything useful. It doesn't show me what caused the out-of-memory condition, or which processes were eating up the resources on the system. We need more information. Let's see how we can get it.

Using a tool called Kdump, we can gather more information. We can prepare the system before a possible crash, so that it is ready to provide us with more information at the moment of the panic. What does Kdump do? It has one job: to generate the vmcore. A vmcore, just like a normal core dump, is a copy of memory to a file. The vmcore is simply a file containing the memory of the running entity. In the case of a normal process, that was the virtual memory of the process; in the case of the kernel, it's the contents of RAM plus swap, everything the kernel considers memory. So that's what we want: we want to generate the vmcore, and Kdump takes care of that. When the kernel panics, so as we go through that panic phase, Kdump starts up, copies the contents of RAM and swap into this vmcore file, and stores it on some persistent disk so we can access it later.

But there's a problem. Kdump is something that needs to run on the system, and I mentioned earlier that when a kernel crash happens, nothing on the system works. We need a kernel that can't do anything, because it's in a coma, to schedule something like Kdump to extract the contents of memory and dump them somewhere. So we have a problem: we want to run something, and there is nothing to run it.

This is where Kexec comes in. Kexec is an awesome concept in the Linux kernel. As the name suggests, Kexec executes something from the kernel. Kexec is a feature independent of Kdump, and it can be used for several things. If you've heard of Kexec, you've probably heard of it in connection with fast reboot. That means: you have a running kernel, you do a yum update of the kernel package, and you want to boot into the new kernel. Kexec can execute the new kernel image from disk without going through a full reboot, without powering off the machine, going through POST, the BIOS or UEFI, reading from the master boot record, and so on. The old kernel simply executes a new kernel. So that's what Kexec does.
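As an illustration of the fast-reboot use of Kexec, a hedged sketch (kexec-tools must be installed, and the version string is a placeholder):

    # Stage the new kernel and its initramfs, reusing the current command line
    kexec -l /boot/vmlinuz-5.14.0-example \
          --initrd=/boot/initramfs-5.14.0-example.img \
          --reuse-cmdline

    # Jump straight into the staged kernel, skipping firmware and POST
    # (systemctl kexec does the same after a clean shutdown of services)
    kexec -e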
It executes a new image, a new executable so to speak, from within the kernel. Not to be confused with kGraft and Ksplice, which are live-patching mechanisms, meaning you have one kernel running and you replace pieces of the running image with another kernel's code, live. Kexec actually executes a whole new image.

How can it do that? The new image has to be stored somewhere. And since I can't store it on disk and read it later, because disk operations need a working kernel, the code has to already be in place at the moment I'm about to execute it. That means it has to be in RAM, in physical memory, whenever I might want to execute it in the future. So to run Kexec on a new image, I have to reserve a piece of physical memory just for storing an image that I will later execute with Kexec. I pre-allocate it at boot time: when the kernel loads, I tell it to permanently reserve a certain portion of RAM to be used by nothing else, and something will later populate that region of physical memory with a binary image that Kexec can execute when it needs to.

So, using Kexec, Kdump provisions this reserved piece of physical memory with a Kdump kernel image, which will be used to do the things we discussed: copy the contents of RAM and put them on a disk or a network location. To reserve the memory region, I pass the crashkernel parameter to the kernel at boot time, which means the boot loader has to pass this parameter to the kernel. Kdump knows about the reserved region and puts the needed binary into it. Small disclaimer: it's not necessarily a new kernel image; it can be the same kernel image with a new initramfs, but those are details. Just think of it as a new, special kernel running just for Kdump. You have the running kernel, and you have the special Kdump kernel which does only the two things we care about: copying the contents of memory and storing them somewhere.

So we have Kdump configured. When the kernel panics, meaning it does everything a kernel should do at panic time, for example printing those messages on the screen, then, instead of doing the usual last thing, which is to do nothing and just sit there, the kernel does a Kexec into this memory region containing the Kdump kernel. Thus we get a new running kernel on the system, a new image doing something else. Everything that was in RAM at the time of the crash is still there: nothing cleared it, the old kernel did no clearing of RAM, the machine was never powered off, so no contents were lost. I just ran a new kernel, and everything else remains, which means I can copy the contents into the vmcore. The Kdump image takes care of this: it takes the contents of RAM, and swap if needed, and stores them in a file. That file can be on a partition on the disk: it can be on the root filesystem (not recommended), or on a separate partition or LV. It can even be copied over the network, via scp, NFS, or even iSCSI, to another system. In the end, Kdump has copied everything and stored it in a place where we can do a post-mortem analysis.
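Putting the pieces together, a rough sketch of setting this up on a RHEL-7-style system; the reservation size and the paths are illustrative examples, not recommendations:

    # Reserve memory for the Kdump kernel at boot
    grubby --update-kernel=ALL --args="crashkernel=256M"

    # /etc/kdump.conf chooses the dump target and the filtering, for example:
    #   path /var/crash
    #   core_collector makedumpfile -l --message-level 1 -d 31
    # (-d 31 drops zero, cache, user-space and free pages; -l compresses)

    systemctl enable --now kdump    # then reboot so the reservation takes effect

    # Once it's back up, a deliberate test crash verifies the whole pipeline:
    echo 1 > /proc/sys/kernel/sysrq
    echo c > /proc/sysrq-trigger    # this WILL panic the machine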
Now, how much space does this vmcore occupy? In theory, the size of the RAM plus the size of the swap; let's just say the size of the RAM, for simplicity. That might be a big file: on a server with 128 GB of RAM, that's 128 GB of data stored in the file. But we usually don't need to store everything. For example, we don't need the contents of user space, the virtual memory of the processes: if we're doing kernel debugging, we don't need to know what was going on inside the applications running on the system. So we can zero those pages out. And we actually want to do that, because it's also a matter of privacy: when Red Hat customers send us a vmcore of their system, they need to trust that we're not getting access to their company's private data. So we zero it out; or rather, we recommend that the customer set up Kdump in such a way that everything that has nothing to do with the kernel itself is zeroed out. In the end we have an image that is mostly zeros, and that compresses well: we have a big image, but after running a compression algorithm on it, we can reduce the vmcore to a very small size, a couple of gigabytes.

So we have the vmcore, and we can analyze it to see what was going on on the system at the time of the crash. What do we use for that? There's a utility called crash. Crash is something we run on our own system, or on the crashed system after the reboot, to analyze the vmcore that was generated. Crash is actually based on GDB: it uses GDB in the background. This means that if you know GDB, crash will feel quite familiar: some of the same workflows, some of the same features, plug-ins like in GDB. You get what GDB gives you, plus kernel-specific things on top. As input it of course needs the vmcore, to see the state of the system at the time, but it also needs the original kernel image, the kernel binary, just like GDB does for a normal application. When you do post-mortem debugging of an application crash with a core dump, you run GDB with the application and the core dump; same thing with crash: you run crash with the kernel image, which must be the same kernel as the one in the vmcore, and then the vmcore itself.

What you get is an environment with the same things in the crash session as you would have had on the real system at the time of the crash. For example, you can access the logs: you can read the dmesg buffer, because the kernel log buffer was in RAM at the time of the crash, therefore it is in the vmcore. You can read it and get more information, the entire history of the kernel logs, without needing to press Page Up on a frozen screen. You also have access to all the data structures of the kernel: everything from filesystems to processes to network connections, everything that was in RAM and used by the kernel. Crash has some tools built in, for example ps: crash contains a ps command that gives you output similar to ps in user space. It won't have all the features of a normal ps, but it has enough information for you to figure out what was running on the system. There are built-in commands to get information about memory, and you can run mount to see the mount points at the time of the crash.
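A hedged sketch of what such a session looks like; the vmlinux path is where the RHEL kernel-debuginfo package installs it, and the vmcore path is an example (adjust both, and note that $(uname -r) only matches if you analyze on the same kernel that crashed):

    crash /usr/lib/debug/lib/modules/$(uname -r)/vmlinux \
          /var/crash/127.0.0.1-2016-01-01-00:00:00/vmcore

    crash> log        # kernel ring buffer (dmesg) as of the crash
    crash> ps         # processes that existed at crash time
    crash> kmem -i    # overall memory usage, similar to free(1)
    crash> mount      # mount points at crash time
    crash> bt         # backtrace of the panicking task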
If there isn't a specific built-in crash command for what you need, you can always dissect the actual structures yourself. You have pointers into memory; you can declare that a given pointer represents, for example, a file structure, and read the contents of that structure as it was at the moment of the crash by casting the address to a struct file. Since you have the full contents of RAM, you can debug in as much detail as you want within the crash session.

For example, here's something easy that we do all the time: out-of-memory debugging. We saw on the screen earlier a crash caused by an out-of-memory condition, so we know that something was occupying the RAM, but not what. On a normal system, the OOM killer will try to kill memory-hungry processes to free up RAM, but if at some point it cannot cope with the amount of memory being used, it gives up and lets the kernel panic. In the panic message you'll see that one specific process failed an allocation, and that this ultimately caused the kernel panic. But that does not mean that the process which happened to be running the panic function was the one occupying the most memory. You want to see what was actually occupying the memory. For that, there's a built-in command in crash: you run kmem and see the usage of RAM and swap, a similar output to free, formatted another way. From the kmem output you can see whether all of the swap and all of the RAM really were used, whether it really was an out-of-memory condition.

Then you can run ps. Sorry, yes? [Audience: can you also figure out how much memory was used by kernel structures?] I honestly don't remember. I think you can get that information, but I really don't know the answer off-hand. Does somebody know? No? Usually, though, the amount of memory occupied by kernel structures is famously small compared to the user-space data. You would have to dig into all the information about the slab caches, and you can access that information, probably not with this specific command, but in the end, yes, you can get all of it from the vmcore. It depends how deep you want to dig.

So, back to crash: you can run ps and see each process, the process ID and name, all the information you would have with a normal ps, and see which processes or threads were using physical memory; you also get the virtual memory of each process.

Lockups. Say you have a problem where the system froze because a resource was being used by several processes. You can end up in a deadlock situation where all of the CPUs are stuck on an instruction and no new task can be scheduled on them: the generic lockup situation. In a vmcore you can run bt, a backtrace, across all the CPUs, to see which process was running on each CPU and which kernel function it was stuck in. If, for example, all the CPUs are stuck on a spinlock, you can see which spinlock was keeping all of the CPUs occupied, and determine the common resource that caused the lockup.

There are several things you can do with crash; I just gave two examples.
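The commands behind what we just walked through might look like this; the struct address is a made-up placeholder:

    crash> kmem -i                        # was RAM and swap really exhausted?
    crash> ps -G                          # one line per thread group; look for big memory users
    crash> bt -a                          # backtrace on every CPU: where is each one stuck?
    crash> struct file ffff88012a4b3c00   # cast a raw address to a struct and print it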
We also have some plugins for testing specific known issues, for example with filesystems: say the filesystem was frozen and something tried to access it. With those plugins you can build specific tests and check whether, at the time of the crash, particular things were happening on the machine, and whether they match a known case. But these were just a couple of examples of how you would ordinarily use crash to debug some fairly basic things in the kernel.

What I wanted to share with you today is that you need to be prepared for these kinds of crashes. Kdump is not enabled by default. In some newer distributions it comes as an option installed by default, but you still need to activate it, and you still need to reserve that memory region in RAM, because it costs you: it depends on whether you find it worthwhile to have a region of memory permanently set aside. So you have to configure it yourself. But if you do have Kdump configured, you will at least have the option of doing the analysis later: you can gather more information after the system has been rebooted and restored to production. And once you have the vmcore, you have tools like crash to analyze it and get useful information about the system after the fact.

Install Kdump. If you're running a production server, we highly recommend it. In half of our cases we basically say: please install Kdump, it will be useful for you in the future if you're running this RHEL system in production.

Any questions? [Audience: does Kdump impact system performance?] Well, let me ask you this: is it running while the system is in production? Nothing runs before the actual panic. So apart from that piece of physical memory you have to reserve, so that it's there in case you need to execute it, nothing is actually running, and it doesn't impact performance while everything else is running on the system.

[Audience: how much memory needs to be reserved?] It depends. I don't know the numbers off-hand, but as a rough estimate, if you have, say, 100 GB of RAM, you might reserve around 200 MB via the crashkernel parameter. The reservation is needed because the Kdump kernel needs room to run its operations: copy, compress, copy over the network or to a local disk. Does that answer your question? Ah, sorry. Anything else? Can you catch me afterwards? Yes.

[Audience: what about compiler optimizations: can you still do reasonable debugging after a crash?] When you run crash, you need to run it against a debug kernel image. Not a kernel compiled with debug options, but the kernel binary with its symbols, that is, not the stripped kernel. Okay?

[Audience: what happens when the kernel crashes again while your handler is running?] It can happen that the Kdump kernel itself hits a problem and panics while it's running. In that case you won't get a Kdump of the Kdump. [Audience: so everything has to go right?] In short, yes. There are also situations, for example in cluster environments, where the node that crashed is fenced by the other cluster nodes: the crashed node must be reliably taken out of the cluster, so some entity powers the machine off. If that happens while Kdump is copying the data, you won't have the vmcore, or at least not a complete one.

[Audience: so is it recommended to run a different kernel as the Kdump kernel, then?]
So that you don't run into the same issues? Well, you would run into the same issues anyway. In reality, the Kdump kernel is the same kernel; it just has a limited set of tools in its initramfs, the same way a normal kernel has its own initramfs contents. [Audience: so you don't use another kernel version?] No. You could, but in practice you don't: it's the same kernel image for the Kdump kernel. Why bother maintaining a separate kernel when you can use the same one? Newer kernels are always better, anyway.

Okay, I think we're running out of time. I left some links in the presentation. The first is for the Kdump feature in the Linux kernel; the second is the white paper regarding crash. It's actually a rather old tool, but the paper explains how it's made. For Red Hat customers we have some specific articles on how we recommend setting up Kdump, and we actually have a small script that prepares the environment for Kdump: you go to the customer portal, run through a wizard in the web interface, it generates a script, you download the script, you run the script, and you have Kdump working. It takes you three seconds instead of one minute, because you don't want to work out that reserved memory region by hand every time; it can have an impact on the system. I would recommend everybody to have it. Okay, thank you very much. You'll find the slides there. Thank you, and enjoy the rest of the conference.

[After the talk] [Audience: about the documentation for Kdump configuration on the customer portal: is it available for RHEL 7? The articles I was sent only link RHEL 5 and 6.] It should still be there; note that the link is only accessible to people with a Red Hat subscription. I'm almost positive there is one for RHEL 7, give me a second and I'll check. ... There it is: the Kdump Helper, and it has options for RHEL 7.

[Audience: by the way, can you generate the vmcore on a working system, without any crash?] Not without crashing the system; for the demo I triggered the crash myself.