Thanks, Candice. So today we're going to talk about Linux kernel debugging. My name is Joel Fernandes. I'm one of the maintainers of the kernel; I co-maintain the RCU subsystem, which is a core kernel subsystem, and I've been working on the kernel for about 13 years now. I currently work at Google. Just to mention up front: there's nothing confidential in any of these slides. All the information is public and all the code is open source, but do your own research, don't take anybody's word on anything, and definitely double-check before you try anything. So let's talk briefly about kernel debugging as an introduction. In my experience, there is no magic formula. It usually requires some creative, detective-type work, and it requires some imagination, because different problems have required different tools and different ideas. So you have to lean on creativity and imagination quite a lot. As Mark Twain said, you can't depend on your eyes when your imagination is out of focus. You need some imagination, and you need the attitude of trying different things. This talk is not about the creative part of the detective work, though; it's more about what's available, and the choice is yours. As Steve Jobs said, you cannot connect the dots looking forward; you can only connect them looking backward, so you have to trust that the dots will somehow connect in your future. I take that quite seriously: I need to accumulate enough dots so that I can connect them. That's what this talk is going to focus on: those dots, and what's available out there. We won't cover much intro-level software debugging; we'll focus on the actual tools, with demos to show the concepts as we discuss them. And there are many ways to arrive at the same result. My first kernel patch, about 13 or 14 years ago, didn't use any kernel debugging tools at all. There was a bug in the bridging code of the Linux kernel, and that first patch came purely from looking at traces in Wireshark, the network analysis tool, forming some theories about the issue, and trying different things until the issue went away. So, like I said, there are many ways to arrive at the same result and many different tools you can use depending on what you're doing, but we'll be discussing some of the most common and easy-to-use tools you should know, the dots you should have so you can connect them at some point in the future. It's also important to have an attitude of trying new things: when you hit a real problem, experiment, try different things. Always keep trying; it always seems impossible until it's done, as Nelson Mandela said. So let's jump into GDB.
So why use GDB? I'm talking about using GDB on a live running system. One reason is to understand the code flow: with GDB you can walk through a piece of code line by line in the debugger and understand how control flows through it, what the variables are, and so forth. You can also dump data structures with GDB and see what they look like, look at the assembly, and so on. GDB is also very good at debugging hangs, because when the system hangs you can connect to it and look at the state of the system at that moment in time, which is very useful. Some reasons not to use it, in my opinion: if the issue is not easy to reproduce, then a debugger on the live running system cannot help you, because you can't even reproduce the issue in the first place. Also if you don't know what to look for, and sometimes you simply cannot run GDB on a system at all because it's in production or something like that. Those are some reasons why GDB is not necessarily that helpful. But GDB can also be helpful offline: if you want to analyze a core dump, you can pass the core dump to GDB and look at the state of the system that way. So it does have some use if you have a core dump, but otherwise I find it not that useful if you cannot reproduce the issue. I will only talk about and demo QEMU plus GDB, but the same principles apply to other ways of using GDB as well. For example, KGDB is a way to connect GDB to a real Linux system that's not emulated, and there are other tools like OpenOCD that work with GDB; OpenOCD is more for embedded devices. QEMU is very nice to practice with: you can test different things with QEMU, sharpen your debugging abilities, and then apply the same concepts to those other technologies in other environments. You can also run gdbserver on a remote host, run GDB locally, and connect to that remote host; that's useful for debugging a user-space application running on a remote host. But we are only going to talk about QEMU. Inside QEMU there is a GDB server that the GDB client connects to, and you enable it by passing the options -s and -S. Lowercase -s starts the GDB server, on port 1234 by default, and uppercase -S makes QEMU wait for a GDB client to connect before starting the virtual machine. So the virtual machine will wait until you actually connect to it from a GDB client and hit continue. You start the GDB client by running "gdb vmlinux" and then "target remote localhost:1234", which connects to the server; because QEMU is running on the same machine, you connect to localhost:1234. It will show some output like this, and then you hit continue. I can show this as a quick demo. I have a script that already passes -s and -S, so here I'm starting QEMU with these options and it's waiting for GDB to connect. So I connect with GDB like that.
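For reference, a minimal version of that setup looks roughly like this (the kernel image path and the append string are placeholders, not my exact script):

  # start QEMU with its GDB stub listening on port 1234 (-s) and the CPU halted until "continue" (-S)
  qemu-system-x86_64 -s -S -kernel arch/x86/boot/bzImage -append "nokaslr console=ttyS0" -nographic

  # in another terminal, from the kernel build directory
  gdb vmlinux
  (gdb) target remote localhost:1234
  (gdb) continue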
You can pass that target remote command on the GDB command line itself when you start GDB, instead of entering it later, using the -ex option, which is very useful. So right now we're connected to the VM, I just hit continue, and you can see the VM starts up like that. Okay, that's what I wanted to show so far; let me go back here. A couple of things I want to mention: without CONFIG_DEBUG_INFO you won't get any line information. You have to enable CONFIG_DEBUG_INFO, which adds the symbol-to-line mapping to the vmlinux binary that we passed to GDB. I already have it enabled, so I'm going to see line information for my symbols; if you didn't, you would not see any line information, and you may or may not see symbols at all. I'll go over why you may not see symbols. There's a feature in Linux called KASLR, which loads the kernel at a random address when the system boots. The reason for that is security: by loading the kernel at a different address you make things like buffer overflow attacks much more difficult. But this is a big problem for debuggers, because GDB would not know where the kernel is loaded. If you have KASLR enabled you'll see something like this: if you do a backtrace, it just shows you addresses and no symbols. So you have to pass the nokaslr boot parameter, which I'm already passing in my QEMU setup; that's why we saw the symbol names along with the addresses. So nokaslr and CONFIG_DEBUG_INFO are the two things you have to have. Now that we've connected to the VM, what can we do? One thing: if you run "info threads" in GDB, it shows you the current stack frame, basically the function we're in, on all the CPUs. The star points to which CPU GDB is currently focused on, the one that commands like backtrace will run against. If you want to switch CPUs, you use the thread command: if I want to switch to the second CPU, I say "thread 2", and now GDB is focused on the second CPU and I can do a backtrace and get that CPU's stack. If I want the third one, "thread 3", and backtrace again. So you can keep switching between the different CPUs and analyze their stacks at this point in time. Now let's set a breakpoint while we're in the same session, and continue to make sure the VM is still doing okay. I'm going to set a breakpoint on the panic function, like that, and now I continue. The panic function is called whenever the kernel panics, whenever it cannot proceed with its execution. So let me crash the kernel, and there we go, we hit the panic function. The VM stopped right there, and if I do a backtrace I can see the whole chain of events leading to panic, which is the sysrq crash trigger we just did. So that's setting a breakpoint. And what else is it showing me?
Yeah, so it also shows you the arguments to panic, as you can see, and you can also see the code. If you type "list", it shows you the actual C code for panic, which is super useful: right when the breakpoint is hit, just type list and there's the C code. Again, this is only possible because of CONFIG_DEBUG_INFO, because GDB knows which lines of code the panic symbol corresponds to. You can also disassemble at this point: if you type "disassemble", it shows you the assembly instructions for panic. That's very useful if you want to look at the assembly and step through it instruction by instruction. So if you don't want the C code and just want the assembly, type disassemble. Another thing you can do in GDB is type "info registers", and it shows you all the registers and their values at the point the breakpoint was hit. You can see here the RIP register pointing at panic, because that's where we are. It also shows the special registers, for example CR3, which is used for the page tables, and the XMM registers, which are floating-point related. It shows you a lot of state, including the flags register. You can also look at the arguments that were passed to panic: "info args" shows them, so fmt is the argument that was passed to panic. And you can look at the local variables: "info locals" shows the locals in the panic function. If we do a list we can actually see some local variables here, args, i, state and so forth, and info locals shows them. The reason some say "optimized out" is that GDB has not yet reached the instructions that populate the registers holding those values, so it doesn't know what those locals are yet; as we step through, they will start showing up. Okay, now I'll show you GDB's TUI mode. I'll start the VM again and start TUI mode. I'm connected to the VM, and TUI mode gives you this nice display of two windows: the top window is the source code where we are, and the bottom is the same GDB prompt we've been seeing all along. I'll do the same thing, set the breakpoint at panic, continue, and now I'm going to try to crash the kernel. We hit the breakpoint, and you can see the top window is already showing the source code. This is nice because you see the source automatically instead of having to type list. If I say "next", it steps to the next line in the top window. And if you just hit enter in GDB, it reissues the previous command, which was next, so if I keep hitting enter it keeps doing next, next, next. And say I want to step into this function preempt_disable_notrace that's called from the panic function: if I just say "step", it steps into it. So this is where we are, and if I do a backtrace now, it shows that we are in the preempt_count_add function; the backtrace confirms that.
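To recap, the GDB commands used in this part of the demo, all standard GDB, nothing kernel-specific:

  (gdb) info threads          # one line per virtual CPU, with its current frame
  (gdb) thread 2              # switch focus to the second CPU
  (gdb) bt                    # backtrace for the focused CPU
  (gdb) break panic           # software breakpoint on the panic() function
  (gdb) continue
  (gdb) list                  # C source around the current location
  (gdb) disassemble           # assembly for the current function
  (gdb) info registers        # general and special registers (RIP, CR3, flags, ...)
  (gdb) info args             # arguments of the current frame
  (gdb) info locals           # local variables of the current frame
  (gdb) next / step / finish  # step over, step into, run until the frame returns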
And if you want to go back to the panic function's stack frame, we just say "finish", and yes, we're back: we came out of preempt_disable_notrace back into panic. Now if I say next, we can keep continuing like that. And notice, when I execute the pr_emerg line for the kernel panic message, it shows the kernel panic message on the left. There you go, it showed that message. So you can execute things step by step and see the results as you do that, which is very useful. So I think that concludes our GDB demo. I wanted to talk about a recent real-world use of GDB. There was an issue in RCU where the whole system would hang after two hours, and it was very reproducible. The whole system would hang and be completely unresponsive, all the CPUs unresponsive, and there was nothing in the kernel logs or on the kernel console. Without anything on the console, you're left only with your imagination about what exactly happened; there's nothing you can do. GDB helped me a lot in this situation. I actually thought it was a QEMU bug because it was so strange that all the CPUs would lock up like that, but it turned out to be a kernel bug. There's code in the kernel called stop_machine, which makes all the CPUs stop for some amount of time with interrupts disabled, and we were not coming out of that stop_machine state because one of the CPUs was misbehaving. Even the kernel watchdog wouldn't crash the system; the watchdog basically tries to make sure the system is responsive, and I'll show a demo of that later, but even with the watchdogs enabled it wasn't firing on this system. That's where GDB was super useful, because I was able to connect to the system while it was in this hung state and dump the stacks: the same way I showed you, info threads, then switch to each CPU and look at the backtrace to see exactly what was going on on each one. I found that a certain stack kept showing up on one particular CPU, which led me to believe it was something related to timer interrupts constantly running on that CPU and not giving it up. So then I could focus my effort on timers, apply other tools to figure out what exactly was wrong with them, and we ended up fixing the bug. So GDB is really useful in the real world. That concludes GDB; I think we're about 30% through the presentation. Shuah, do you want to open it up a little bit for questions? We don't have any questions at the moment. There were questions about KGDB versus GDB; I think all of those have been answered. So, yeah, go ahead. Actually, on second thought, I have a question of my own. In some cases you cannot reproduce the problem with GDB, right? For any timing-related issue, GDB could get in the way of being able to hit race conditions. So could you elaborate a little bit on situations where GDB is very useful? You're talking about this particular case, understanding the code flow, but what types of bugs? Will you be able to elaborate on that as you go along?
Yeah, that's a good point: because we're stopping the whole system, we're changing its timing, so we may not be able to reproduce certain bugs related to race conditions that depend on timing. GDB is not really useful for those situations; tracing is more useful. And there are other tools besides GDB, like KCSAN, for example, which can detect concurrency bugs and data races, and lockdep, which can detect issues with locking and so forth. So no, GDB is not useful for those kinds of situations. So it looks like we have a couple of questions in the Q&A. Can you see the Q&A or would you like me to read it out? I can try to see it, but I can read it, let's see. The first question is from Dimitrios: how would you debug a reproducible complete freeze of the kernel in a QEMU VM, which causes the kernel to ignore even the GDB interrupts? Yeah, in such a situation, what you probably want to do is try to get a core dump of the memory of the system and then run GDB on that offline. Obviously, if you can't connect using GDB, then you can't use it live, so live GDB is out of the question there. The other thing is that sometimes GDB doesn't work well with software breakpoints; the kernel doesn't always play well with them, although the demo I just showed you used a software breakpoint. So use hardware breakpoints instead: you would use hbreak, and that would give you better results. Okay, great, thank you. The second question is: can GDB be used on a live machine, not a VM, specifically when accessing PMU registers directly using rdmsr_safe_on_cpu? Yes, I believe you can do that with KGDB. You would run KGDB on the machine, run GDB on another machine, connect to it, and do those operations; I believe you can dump MSRs as well. So you can do a lot with GDB. All right, there is one more question: can we use GDB to debug third-party kernel module problems? Yes, you certainly can. And there are these GDB scripts in the kernel. One of the things you can do with them is dump the kernel logs: you can see this is me dumping the kernel logs from inside GDB itself and then browsing through them. These GDB scripts are very useful; I haven't used them recently, but they're out there, and they're very useful for debugging kernel modules. One of the challenges with kernel modules is figuring out how the symbols in the module map to addresses and vice versa. That requires you to know the layout of how the module is loaded into memory, which is very difficult to do manually, and the GDB scripts in the kernel help you do that. They can do a lot of other things as well. Besides the kernel logs, sometimes the console will not show you the kernel logs because the console driver has a problem or something like that, and if you can connect GDB, you can just extract the kernel logs from the system's memory using lx-dmesg. So definitely check out the GDB scripts; to build them you say "make scripts_gdb", and that generates the GDB scripts.
Then you just run GDB in the same directory as the kernel and you can use the scripts; they're very useful. Okay, one more question: do you have examples in your demo later on using JTAG? I don't have a demo using JTAG, but the concepts are the same. If you use the open source tool OpenOCD, it can expose a GDB server which you then connect GDB to, and OpenOCD talks to JTAG. So the same concepts apply, but no, I don't have a demo of that for this presentation. It looks like we have one more question: when it comes to certain registers, GDB seems to have some limitations; is there a way to force GDB to read the value of registers such as MSRs and others that it doesn't display by default? I don't know off the top of my head, because I haven't done that, but I would be very surprised if there's no way to do it; I think there should be GDB commands for those kinds of operations. But yeah, I don't know off the top of my head. I think there is just one last question before we move on: is QEMU plus GDB the preferred setup to start with when working on kernel bugs, for newbies who are exploring the kernel? Yes, I think so. A lot of kernel bugs can actually be reproduced using just QEMU, because they don't really depend on specific hardware; they have more to do with memory and page tables and that kind of thing. So yes, you can certainly look at some issues and debug them using just QEMU and GDB. In fact, QEMU is really, really important for kernel development, because a lot of testing happens with QEMU. The nice thing about QEMU is you can spin up hundreds of instances and test different kernels in parallel on the same machine; you don't need real hardware to do that. So QEMU is a very useful tool for kernel development and for reproducing bugs, and there are other projects like syzkaller which heavily use QEMU for fuzzing the kernel. So yeah, it's really useful, and Shuah will definitely agree with me on that. Absolutely, absolutely. Kernel CI runs QEMU, the build and test systems run it, absolutely. QEMU definitely has a lot of advantages for getting started quickly, and then you can experiment with different kernels and different things. However, as Joel mentioned, for anything architecture-related, anything devicetree-related, anything ACPI or firmware-related, and definitely CPU and DRM stuff, you can't really exercise that with QEMU. They both have their places, and some things you can do with a very cheap laptop that you won't be able to do with QEMU. It just depends on what the bug is and what your use case is. So there is one question though, Joel, but I will let you decide whether you want to answer it now or push it out for later. Okay, go ahead, we can quickly look at it. Let me check how we're doing on time... yeah, 34 minutes. So I think we can take one question, and if it's something that's already answered later, we can just discuss it then. I think you might have answered this question: can we keep the kernel running under a GDB server environment in cases where the kernel freezes after some time? Yeah, you can certainly do that, and that's exactly how I debugged that two-hour issue I was referring to with RCU.
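For the kernel GDB scripts mentioned a moment ago, the typical flow looks roughly like this (the auto-load path is whatever your kernel build directory happens to be):

  make scripts_gdb                 # generates vmlinux-gdb.py and the helper scripts
  gdb vmlinux
  (gdb) add-auto-load-safe-path /path/to/kernel/build   # allow the scripts to load
  (gdb) target remote localhost:1234
  (gdb) lx-dmesg                   # dump the kernel log buffer from memory
  (gdb) lx-symbols                 # load module symbols at their loaded addresses
  (gdb) lx-ps                      # list tasks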
So you just run the GDB server, keep it active, and once it hangs you connect to the instance at that point in time. Okay. And there is one more question: does adding ccache to speed up kernel builds and tests complicate debugging? Ccache speeds up the build; it has nothing to do with actually running the kernel and debugging it. Optimization might, though, depending on the compiler options you use, so check and see. And ccache should be aware of that: if you change the optimization level, ccache should just take a cache miss, because the object it provides has to be identical to what would have been generated without the cache. But yes, compiler options definitely matter. There's an option I forgot to mention that I have on one of the slides... oh, actually I'm talking about it later; it's about generating good-quality backtraces. Yes, it's this slide: there's an option called CONFIG_FRAME_POINTER, and if you don't enable it, then the stacks you get when a kernel crash happens might sometimes have certain symbols missing. So CONFIG_FRAME_POINTER, if possible, is a good thing to enable so that the unwinder inside the kernel does a better job of unwinding the stack. Otherwise the kernel uses this thing called ORC, a concise form of debug information embedded in the kernel; at runtime it refers to the ORC data to do the unwinding, similar to how GDB uses debug information to unwind. So if you want good runtime unwinding, definitely enable frame pointers, so symbols are not missing from your backtraces. So, Shuah, shall we continue? Yes, please. Okay, so the next dot I want to talk about is trace-cmd with the function graph tracer. This is a very nice tool that gives you an overview of what exactly is going on if you want to look at what's happening inside a function. I can show you a quick demo. In this case, I want to see, up to three levels deep inside of k3, what functions are being called. So I can run this command; let me start my VM again and let it finish booting. Okay, and let me pass a max graph depth. If I do this, the function graph tracer in the kernel is now running and it's tracing just k3. Now I stop, and with trace-cmd report I should get these long traces of k3 and what it was calling. You can see it called these functions. This is very useful; I've been using it for many, many years to understand how different kernel functions work and to understand the code flow. You just run trace-cmd record -p function_graph with --max-graph-depth and so on, and it shows you the flow. I would highly recommend this to anybody trying to understand the kernel source code flow, or any part of the kernel where you have a suspicion, like me, to understand better what it's doing. So in this case, I did system-wide tracing.
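The commands behind this demo, and the single-function timing shown next, look roughly like this (the traced function names and the depth are just examples):

  # trace a function and everything it calls, up to 3 levels deep, system-wide
  trace-cmd record -p function_graph -g <function_of_interest> --max-graph-depth 3
  # ... let the workload run, then stop the recording ...
  trace-cmd report

  # time a single function: trace only vfs_read and report each call's duration
  trace-cmd record -p function_graph -l vfs_read
  trace-cmd report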
So across the whole system, it's looking at what exactly k3 is doing. To go over the options: -g stands for graph function, the function from which we start tracing; trace-cmd then keeps tracing all the functions called inside that graph function. That's -g. The next thing, another useful use of the function graph tracer, is finding function execution times. Say you suspect that a certain function is really slow and you want to measure it. One way is to insert timestamps at the beginning and the end, but the function graph tracer already does that for you, so you don't need to waste your time. For example, if I want to find all instances of vfs_read and how much time they took, I can pass -l vfs_read; -l basically filters to only that function. It will trace only that function, not the functions that call it and not the functions it calls, and it builds a graph out of that. Since it's only tracing vfs_read, it only prints vfs_read and shows how much time each call took. I can show a quick demo of that: I run trace-cmd record -p function_graph -l vfs_read, stop it, and when I do report it shows me I only had two instances of vfs_read in the entire trace. It shows the first and the second, and it puts a nice plus sign next to functions that took a long time to run; I believe it shows an exclamation mark if a call took even longer, hundreds or thousands of microseconds. So definitely use the function graph tracer as you explore the source code and do your debugging. And there are lots of options: if you want to trace only the kernel functions called during a certain execution, you can pass different options, I believe it's the -F option, where you give it a command, and it will only trace that execution rather than the whole system. So those are the two things I wanted to cover for trace-cmd; we'll take questions about it later as well. Now I'll jump into a real demo of a kernel bug that I introduced, just to show you another tool which is useful for finding long scheduler latencies. What I did is instrument the kernel: I intentionally added a loop in the context switch path to slow it down, just to introduce this bug. I'll show you the code and then the demo. The code is here: this function, rcu_note_context_switch, is called from the context switch path. All I do there is, if the boot argument that enables my hack is set, I spin for a certain amount of time every thousand occurrences. So I don't spin every time, only every thousandth time. And I don't do this for the first five seconds of boot, because I had some issues getting timestamps early during boot, before the clocksource is up and so on; so I just wait five seconds and then start doing the spin. So it's every thousand context switches, not something that happens all the time, so it doesn't cripple my system, but it's also not ideal. So I'll show you the demo.
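What that instrumentation looks like, as a minimal sketch: the boot parameter name, the delay value, and the use of udelay() are my assumptions, not the exact patch from the demo.

  /* Sketch only: slow down every 1000th context switch, after the first 5 seconds of boot. */
  static bool cs_lag_enable;                       /* hypothetical "cs_lag" boot parameter */

  static int __init cs_lag_setup(char *str)
  {
          cs_lag_enable = true;
          return 1;
  }
  __setup("cs_lag", cs_lag_setup);

  void rcu_note_context_switch(bool preempt)
  {
          static atomic_t count;

          if (cs_lag_enable &&
              ktime_get_boottime_ns() > 5ULL * NSEC_PER_SEC &&
              atomic_inc_return(&count) % 1000 == 0)
                  udelay(200);                     /* artificial latency; value is arbitrary */

          /* ... original rcu_note_context_switch() body continues here ... */
  }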
So now I just boot my kernel, passing the boot parameter so that the bug starts showing up, and it boots. It also does a print every thousand occurrences: every thousand context switches it prints 4,000, 5,000, 6,000, like that. And now I run some commands, like perf sched record. So what am I doing here? I'm running the find command, and I'm running perf to record scheduling traces. Perf is another tool used for recording traces, and it has other features as well, like profiling, but perf sched is particularly useful for recording scheduling traces. So I say perf sched record, and I want to kill the find command once 10 seconds have passed. I do that, and now I'm recording traces while find does its thing. The find command is running, and it's definitely hitting this bug, because find does a lot of work and is definitely context switching. Now perf is done running and writes the traces into the perf.data file. If you do perf script, it shows you all the traces with timestamps, which is great; that's all the information you could ever need. But what's really cool is you can run perf sched latency, which looks at that perf.data file and all the traces in it and shows you the worst-case latencies the system experienced. If you sort by max, it shows you this table, and you can see that different processes experienced different latencies. The worst was actually the rcu_preempt thread, which experienced a max delay of 300 microseconds and an average delay of 150 microseconds. And what's really cool is perf sched latency will show you not only the processes that experienced the worst latency, but also where the latency started in the trace and where it stopped. So if you take this timestamp, 78.06 seconds, do perf script, and search for that timestamp, it takes you right to it. Somewhere around here the latency started: you can see that rcu_preempt was preempted by a migration thread, and that's the point from which it started to wait. So perf sched will show you not only the amount of time processes, or tasks as we say in Linux, are waiting for the CPU after they wake up, but also when they were preempted. In this case, the R state here tells us that rcu_preempt was running and was forced off the CPU so that another, higher-priority thread could run. And perf sched latency with the sort option will also show you where the latency stopped. These two fields were actually added by me, because I really wanted to see where the latency starts and stops in this giant trace; I wanted to narrow it down, so I added that to the perf tool myself. I've used this tool a lot in the last three or four years to fight various scheduling latency issues. I upstreamed a major scheduling feature that took us 10 months to upstream.
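Summarizing the workflow from this demo, roughly (the workload and the 10-second timeout are just examples):

  # record scheduler events for about 10 seconds while the workload runs
  perf sched record -- timeout 10 find / > /dev/null

  # raw events with timestamps, written to perf.data
  perf script

  # per-task wakeup/run latencies, sorted worst-first
  perf sched latency --sort max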
I found tons of bugs in the scheduler using just these two commands, because I can run a use case, run them, and instantly see where the problem started and where it ended. So I could iterate very quickly on finding and fixing a lot of scheduling-related bugs. Another thing I do is run perf sched and trace-cmd at the same time; you can actually run them together. The reason that's useful is that trace-cmd is really good at event tracing and perf is really good at summarizing a large trace; trace-cmd is not good at that, and Steven Rostedt is actually working on tools to improve it. But this is what I do, just to show you my secrets, so to speak. I'm giving away all my secrets in this presentation because I want everybody to do really well, I want the kernel community to prosper and Linux to be great. So this is what I do, and I would encourage you, if you have scheduler problems, to check this out, run these tools, and make the scheduler better. Another thing you can do is pass options to perf sched: by default it only records scheduler events, but say you also want to record CPU idle trace events and have them in the perf trace as well. You can pass -e and the trace event that you want, and it will be interspersed with the scheduling traces. So I can quickly run this and show you. Let me run find again; now perf is running and collecting all the traces. How are we doing on time? Around 40 minutes. Okay, awesome. So now when I do perf script, I see not only the scheduler events but also the cpu_idle events. If the scheduler traces were not enough, say you see that the latency started here and ended there but you want to go deeper and see whether there were any power management events, any CPU idle events, and so forth, you can pass -e, enable those, and then it might tell you that a particular event actually caused the latency. That's why enabling events like that can be really, really useful. Okay, so that's trace-cmd and perf. Now let me jump into another thing that I do quite a lot. This is not too difficult, but it's worth keeping in mind: there's a technique called shotgun debugging, which Steven Rostedt actually taught me; he does this a lot. Basically, when you take a piece of code and put printk's in it, sometimes you don't need one message here and a different message there. Instead, you can just copy-paste the same line and put it all over the place, as in the sketch below. Let me show you a quick demo of that as well. Here I have this shotgun diff that I apply to the kernel sources. I applied it, and what I've done is take this trace_printk line, which prints the file name and the line number, and copy-paste the exact same line about six times. Because what I'm trying to show here is that, just like with GDB, you can do this and look at the flow of execution: you can see whether these branches were taken or not.
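The shotgun pattern is literally the same line pasted at every spot you care about; a tiny illustrative sketch (the surrounding condition and helpers are made up):

  /* paste the identical line wherever you want to know if execution reached it */
  trace_printk("%s:%d\n", __FILE__, __LINE__);

  if (some_condition) {
          trace_printk("%s:%d\n", __FILE__, __LINE__);   /* did we take this branch? */
          do_thing_a();
  } else {
          trace_printk("%s:%d\n", __FILE__, __LINE__);   /* or this one? */
          do_thing_b();
  }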
What exactly is this function doing? If you do printk or trace_printk (trace_printk prints the message to the kernel trace buffer instead of the log), it's the same idea: you can quickly identify how a certain function is flowing. So here I sprinkled trace_printk around, I believe along the same path where that context switch bug was; that's exactly where I'm doing this trace_printk of file and line numbers. Let me copy the kernel binaries for the demo and boot the kernel. Now in this kernel it's executing those trace_printk's, and it shows output like this: this function was in this file at this line number. You can put this across different files and it'll show you the files and the line numbers, and then you can go back to your editor and see, okay, it went here and here and here, so that's what this function is doing. This is another really useful thing I do on a regular basis to debug the kernel and get a better sense of what is going on. It's simple. Another thing you can do to understand what the kernel is up to is dump the stack at any given point in time into the trace buffer. For this demo, let me check out the code: I'll take away the other demo's code and show you what I'm going to demo now. Here in the scheduler's schedule loop, which executes every time a process goes to sleep or we switch to another task, what I'm doing is this. It came out of a real problem I was facing: I was seeing that rcutorture, the RCU testing kernel module, was entering uninterruptible sleep quite a lot. I could see that it was entering uninterruptible sleep, but I didn't know why, or which path it was taking to get there, because it enters uninterruptible sleep in a lot of places, so I didn't know where to look. So I went to the schedule loop and put some code there that says: every 10 occurrences of an uninterruptible sleep, dump the stack to the trace buffer. That way I wasn't flooding the trace buffer, because a lot of things might be doing uninterruptible sleep; I took a sampling kind of approach and only did it every 10 instances, not every time. Let me demo this as well. Okay, start QEMU. I'm not fully sure I'll see traces, but I should see something entering uninterruptible sleep if I run some commands. Okay, and now let me read the kernel trace buffer. Yeah, ignore the prints from the previous demo; this is the important part. You can see there's a kworker here that entered the schedule loop, and we dumped the stack. This worker thread was the one that called schedule; I'm not sure why, but that's what it did. So this is another useful trick: just by looking at the stack, you know the system is behaving in a certain way, in this case entering uninterruptible sleep, and now I want to know why it is doing that, and who is doing it.
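A rough sketch of that sampled stack dump, assuming it sits in the __schedule() path and uses trace_dump_stack() so the stack lands in the ftrace buffer (the counter name and the sampling rate are illustrative):

  /* sample every 10th task that is about to go into uninterruptible sleep */
  static atomic_t unint_sleep_count;

  if (prev->__state == TASK_UNINTERRUPTIBLE &&          /* ->state on older kernels */
      atomic_inc_return(&unint_sleep_count) % 10 == 0)
          trace_dump_stack(0);   /* dump the current stack into the ftrace ring buffer */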
So now you can go back to the kernel and say, okay, every time this happens, I'm going to dump the whole stack to the trace buffer. I've debugged a lot of issues like this. Another one was an SELinux violation: I was seeing the SELinux messages and wondering who was doing these things, and just by dumping the stack you can see the whole code flow and why you ended up there. Next I want to show you kernel hangs. Here I have a kernel module that I wrote that basically hangs the system, and the way it hangs it is by storming a certain CPU with a lot of IPIs, inter-processor interrupts, to the point where that CPU can't do anything but service those IPIs. That causes a lot of problems. Let me show you the code really quick. In the module's init function, I create a kernel thread on the CPU the init function is running on, and I select the target CPU as the next CPU, modulo the number of CPU IDs. What the kernel thread does is flood that target CPU with IPIs using the function smp_call_function_single_async. For 50 seconds, it sends 1,000 IPIs every five milliseconds: after 1,000 IPIs it sleeps for five milliseconds, then does another 1,000, and so on. That's enough to hang the kernel, or at least hang that CPU; sometimes the kernel doesn't hang immediately, but eventually you start seeing issues, and I'll show you those. That's what this module does. To demo it, let me go step by step. What I actually want to do is show you GDB looking at this IPI flood. So I start QEMU with the GDB option, make sure it's the right kernel, connect GDB, and we're booting. Now I load my kernel module that storms IPIs, as I mentioned. Hmm, I'm not able to load my module for some reason; let me try something else. Okay, my kernel module loaded, and CPU 3 is flooding CPU 0 with IPIs. Now I stop in GDB, and if I do info threads, you can see that's exactly what is happening: CPU 0 is in this IPI handler, and the whole call stack is basically that IPI handler doing an mdelay inside the IPI. If I continue and stop again, I always see the same stack trace on CPU 0, which basically tells you that the CPU is locked up; it's never going idle. If you look at some of the other CPUs, say I switch to CPU 2, it shows we're in the idle loop, but CPU 0 is never going to go idle because my kernel module is giving it a hard time. Okay, so now I continue, and eventually we will see RCU stalls showing up. I could have accelerated the RCU stalls; an RCU stall is basically this situation where RCU is not able to make progress because some CPU is locked up. It takes 20 or 30 seconds to happen, but there's a way to accelerate it by telling RCU to reduce the stall timeout. So while this runs, let me talk a little bit about RCU stalls. I showed you the stacks showing that the CPU was hung.
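For reference, an IPI-flood module of the shape described above could look roughly like this; it's a sketch under my own assumptions (names, CPU selection, cleanup), not the exact module from the demo:

  /* sketch of an IPI-flood module: one kthread hammers a target CPU with async single-CPU IPIs */
  #include <linux/module.h>
  #include <linux/kthread.h>
  #include <linux/smp.h>
  #include <linux/delay.h>
  #include <linux/jiffies.h>

  static struct task_struct *flood_thread;
  static call_single_data_t csd;
  static int target_cpu;

  static void ipi_handler(void *info)
  {
          mdelay(1);              /* burn time in the IPI handler on the victim CPU */
  }

  static int flood_fn(void *arg)
  {
          unsigned long end = jiffies + 50 * HZ;   /* run for ~50 seconds */
          int i;

          while (time_before(jiffies, end) && !kthread_should_stop()) {
                  for (i = 0; i < 1000; i++)
                          smp_call_function_single_async(target_cpu, &csd);
                  msleep(5);      /* 1000 IPIs, then sleep 5 ms, repeat */
          }
          return 0;
  }

  static int __init ipi_flood_init(void)
  {
          target_cpu = (raw_smp_processor_id() + 1) % nr_cpu_ids;
          INIT_CSD(&csd, ipi_handler, NULL);
          flood_thread = kthread_run(flood_fn, NULL, "ipi_flood");
          return PTR_ERR_OR_ZERO(flood_thread);
  }
  module_init(ipi_flood_init);
  /* no clean unload path: this is a throwaway crash-test module */
  MODULE_LICENSE("GPL");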
The thing about RCU stalls is that you won't see one unless there is some RCU activity in the system. That's why RCU stalls are not a good way of detecting hangs: sometimes the stall will show up much later if there isn't any RCU activity. So what I'm going to do is try to hang the system and see if an RCU stall shows up now. Yeah, there we go: RCU detected a stall, and you can clearly see in the stall warning that CPU 0 is the one not responding; the "0-" marker points at CPU 0. When you see an RCU stall, it usually indicates that a CPU is not responding and is locked up, though there are many reasons why RCU stalls can happen, and they're really bad. But this is not the best way to detect a lockup in the system. For that, there's another tool, the hard lockup detector, which I want to talk about next. There are different kinds of lockups: there's a hard lockup, a soft lockup, and then there's something called hung tasks, which we're not going to talk about right now. A soft lockup happens when the system is spinning in kernel mode, spending a lot of time there, but it's still able to process interrupts and so forth. A hard lockup means even interrupts are not being serviced. With this IPI flood, the CPU is always executing the IPI handler, so other interrupts can't run, and you start seeing a hard lockup. So instead of the RCU stalls showing you the hang, I'll show you the hard lockup detector catching it. Let me run the demo again. Before I load my kernel module, I'm going to configure the hard lockup detector. This watchdog_thresh setting basically says we want to detect lockups within two seconds: every two seconds, each CPU has to respond to the watchdog timer interrupts, and if a CPU stops responding, we conclude it's locked up and dump that CPU's stack. On some systems the detection happens on the locked-up CPU itself via the perf interrupt, which is an NMI; if you don't have an NMI, another CPU can do the detection instead. Right now all the CPUs are responding within two seconds, which is why the system is doing okay. Writing the 1 is what enables the lockup detection. So now when I load my kernel module, it should print a dump. Let's see if the lockup is detected... it should have detected it by now. Something is going on, let's see. Okay, it actually did detect it; it just printed it to dmesg rather than the console, because I passed the quiet option to the kernel, I believe. You can see it says hard lockup on CPU 0, which is exactly the CPU we targeted with the IPIs, and the hard lockup detector successfully detected the lockup. So this is a better way to debug lockups than relying on RCU stall detection.
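The knobs used here are the standard watchdog sysctls, roughly as follows (the two-second value is from this demo; the defaults are larger):

  echo 2 > /proc/sys/kernel/watchdog_thresh   # detection window in seconds
  echo 1 > /proc/sys/kernel/nmi_watchdog      # enable the hard lockup (NMI) detector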
And you can see from the backtrace on CPU 0 that we were in that IPI handler; it's the exact same stack I showed you earlier in GDB. Okay, so that was hard lockup detection. It's 2:10 Eastern time, and I do want to talk about ftrace. The next thing I want to cover is dumping the ftrace buffer to the console. This is another super useful thing: you can start the kernel in a mode where it's collecting traces right from boot, and then if it hits an oops, it dumps the whole ftrace buffer to the console and the kernel logs. It's almost like a flight recorder: you don't know when something bad is going to happen, but you start this flight recording, tracing goes on forever, and when a bad thing happens, like an oops or a panic, it dumps the whole trace buffer. There are a couple of boot parameters you can use to dump the traces when different bad things happen. You have ftrace_dump_on_oops, which dumps the ftrace buffer to the console and the kernel logs on both oops and panic. An oops doesn't mean the kernel has stopped working; it just means something bad happened, so in that case it dumps the trace and the kernel may continue executing normally. When a panic happens, obviously the kernel cannot proceed anymore and everything stops; in both cases ftrace_dump_on_oops dumps the ftrace buffer. Then there's a boot parameter called trace_event, to which you pass which events you want to enable and record in the trace buffer, and another for what buffer size you want. A few more options: ftrace_dump_on_oops dumps the trace buffer on oops and panic, but maybe you want to dump it for other reasons. For example, if you have a kernel warning, you might want to dump the trace buffer then. A nice trick is to set panic_on_warn, so whenever a warning goes off, the kernel panics, and that panic causes the trace dump to the console. That's very useful in situations where you know a warning happens and you want the trace buffer at that moment: you just panic the kernel at that point with panic_on_warn and the trace gets dumped. Essentially the pattern is: pass ftrace_dump_on_oops, and then pass the panic-on option for whatever the problem is, like an RCU stall or a warning, and with that combination you dump the ftrace buffer to the console whenever any of those problems happen. Let me show a quick demo where I'm passing ftrace_dump_on_oops, enabling the scheduler events and the cpu_idle event, asking for a per-CPU ftrace buffer size of 1k, and setting panic on warn. Let me copy the right kernel, shut this down, and run the kernel with those options. Now tracing is going on in the background; to confirm, I just cat the trace buffer, and I can see it's tracing. So as soon as any warning goes off in the kernel via the WARN_ON function, it will dump the traces to the console. I have a kernel module that I wrote that just triggers a warning, so if I load that, boom.
So now it's not only printing the warning, it's also dumping all of the traces. These are my kernel logs being printed to the console, and you can see it's showing you the trace buffer. This is super useful; it's another thing I do all the time. When a crash or something bad like a warning happens, I can just dump the trace buffer to the console and look at exactly what happened. We're trying to improve this in the kernel, because sometimes the console is not fast enough to take in all the messages coming from ftrace; that's a work in progress, but it's still very useful. I could show you a demo of the IPI flood issue causing RCU stalls, but I think we might be out of time for that. Again, if you pass panic_on_rcu_stall and ftrace_dump_on_oops, then when the RCU stall happens, it dumps the whole trace buffer to the console, which is super useful for figuring out why the RCU stall happened, why the CPU got locked up. I could demo that as well if we have time, but you get the idea. Another very useful option is traceoff_on_warning. The problem with this flight recorder is that tracing keeps going even after the warning fires, so the interesting events can get buried. traceoff_on_warning is for when you know a bad thing happened, the warning, and you don't want to trace anymore; you just want tracing disabled at that point. That way, once the traces are dumped to the console, you can jump straight to the end of the trace, and you know that's where the warning fired. It cuts out all the garbage after that, which might not be relevant, and lets you focus only on the traces that matter. I can quickly show this as well. We have 15 minutes, so this should probably be our last demo so we can take some questions. As you can see here, I'm passing traceoff_on_warning, which is the only difference from last time. Tracing is going on, and now when I load the warning kernel module, it dumps all the traces, and you'll see at the end, after the dump completes, "disabled tracing due to warning". So the warning fires and tracing stops, which again is very useful because I can just go to the end of the trace and look at what the last events in my flight recorder were before my flight crashed, so to speak. That's another trick I'd encourage you to look at. Now, to quickly go over some tools I did not get time to cover: KASAN is a tool that is very useful for diagnosing memory corruption bugs, like use-after-free, out-of-bounds accesses, buffer overflows, and things like that. It has a lot of overhead, but if you suspect your system might be undergoing some kind of memory corruption, it's a useful option to enable to see whether a kernel bug is causing any of these C-language memory safety issues. And there are a couple of other tools I want to mention.
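Before moving on to those, here are the flight-recorder boot parameters from the last two demos gathered in one place (the event list and buffer size are just the demo values):

  ftrace_dump_on_oops trace_event=sched:sched_switch,sched:sched_wakeup,power:cpu_idle trace_buf_size=1K panic_on_warn=1
  # add traceoff_on_warning to stop tracing as soon as a warning fires
  # for RCU stalls, the kernel.panic_on_rcu_stall sysctl can be set instead (on newer kernels, sysctl.kernel.panic_on_rcu_stall=1 on the command line)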
Lockdep is very useful for detecting locking issues like ABBA deadlocks, or other kinds of safety issues, for example taking a lock both inside and outside of an interrupt handler, which can deadlock; lockdep detects all of that. The preempt and irqsoff tracers are very useful for finding places in the kernel where preemption or interrupts stay turned off for too long; when that happens the scheduler doesn't run, which can cause a lot of latency problems, so those are very useful to look into as well. KCSAN is useful for finding data races in the kernel, where you access shared memory locations from two threads or CPUs concurrently, which can have bad effects; those are called data races, and they're another source of bugs. KCSAN is relatively new and has false positives, but it also picks up a lot of bugs. And then the hung task detector is very useful for situations where some task is just hung in an uninterruptible sleep state while the system is otherwise proceeding normally; you can run the hung task detector to diagnose those. Other than that, I wanted to talk about VS Code really quickly. This is something I started to use a lot more this year, just because I can run it with clangd. Clangd is what's called a language server, and what's going on is that as I'm typing, clangd is actually compiling this translation unit, so I can see compiler errors as I type, which is super useful because it saves me a lot of time. I almost never have to go back and fix build errors now, at least not 90% of the time, ever since I started doing this. So clangd is super useful, and VS Code is integrated with clangd, so this sort of thing is very handy. VS Code also has a couple of other things that are super useful. Git blame is built into it, so you don't need to run git blame: for example, if I go to this line, it shows me in gray who the author is, and if I hover over it, it shows me the commit and everything. So I can quickly go to different lines and see who modified a line, what the commit was, and what the context is; I can click on the commit and it shows me more details about the commit and the code that changed.
VS Code also has built-in Git diff support, which is really cool. If I type any code, it shows me this marker on the left; it shows me green, and if I click on that, it shows me, okay, this is the code I modified, this is the code I added. So this is very useful for seeing what I changed in the code. And when you delete code, it will actually show you, on the left in a different color, the code that was deleted. I'm not able to see that right now... yeah, see, it shows this red arrow here; if I click on that, it shows me a red bar and shows me that I deleted this code. So this makes development a lot of fun and a lot easier, because you reduce your errors and so forth. So yeah, maybe it's a good time to take questions, because VS Code crashed, but that's pretty much all I had; I'm happy to take any questions. I did want to show that VS Code also has a built-in terminal, if you hit Ctrl+backtick. This is really useful because I can be in an SSH session to a machine somewhere, and it will not only let me edit code but also show me a terminal on which I can do different things. So definitely use VS Code and make your life easier, but use whatever makes you comfortable.

Sure, yeah. Yes, Joel, so we do have a few questions; I fielded some of them. I do have a question: somebody is asking whether your QEMU setup script is open source and, if it is, where we can find it. It's not open source, but I could certainly do that; if you don't mind reaching out to me, and if enough people are interested, I can put it somewhere, or I can just send it to whoever requests it. The reason I didn't open source it is that I'm doing a lot of stuff in it that is specific to me, and I don't want the burden of maintaining it. That's the only reason why I kept it to myself, but I'm happy to share it.

Okay, thank you for that. Another question: I have a system installed in the field and I do not have console access; how can I check the reason for an oops, how can I access the oops log, and how can we debug a live system where traffic is running and an enormous amount of data is being transacted? Okay, yeah, that's a good question. For that there are some very useful tools that I didn't demo today, called kexec and kdump, where whenever a kernel panic happens, you can reboot into another kernel and dump all the relevant parts of the kernel memory into a core dump, and then analyze it afterwards. That's one way to do it. Another way would be to collect traces, upload the traces periodically to some server or something like that, and analyze the traces. So those would be some techniques to help with those kinds of issues.
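A very rough sketch of what that kexec/kdump flow looks like. Most distributions ship a kdump service that automates all of this, and the kernel paths, root device and crash-kernel parameters below are only placeholders, so treat it as an outline rather than a recipe:

    # 1. Reserve memory for the crash kernel on the normal kernel's command line:
    #      crashkernel=256M
    # 2. Load a panic ("crash") kernel with kexec -p:
    kexec -p /boot/vmlinuz-$(uname -r) \
          --initrd=/boot/initrd.img-$(uname -r) \
          --append="root=/dev/sda1 irqpoll nr_cpus=1 reset_devices"
    # 3. After a panic, the crash kernel boots and exposes the old kernel's
    #    memory as /proc/vmcore; copy it somewhere persistent:
    cp /proc/vmcore /var/crash/vmcore
    # 4. Analyze it offline with the crash utility against a vmlinux with debug info:
    crash vmlinux /var/crash/vmcore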
Okay, next question: what's the best way to debug a specific kernel failing to boot on specific hardware? In this case it doesn't seem like the kernel comes alive. Is it possible to boot a working kernel and use kgdb, or kgdb plus kexec, to boot the kernel and figure out what's going on? So the question is... I guess I'm confused, because the question says the kernel is not booting, but then it talks about booting it. Yeah, it really depends. Right, you're right, it does seem like the kernel is not booting. Okay, so usually the way to do that is to start from the bootloader, and the problem with kgdb is that kgdb has to come up for you to be able to start debugging. If the kernel is not booting up to the point where kgdb comes up, you can't use kgdb. For that you need lower-level debugging tools like JTAG; you'd probably use JTAG and see whether the kernel is loading or not. There's a phase of the kernel called decompression, where the kernel is compressed and has to be decompressed, and then you jump into it; you may have to step through that code, and for that you need these lower-level debugging tools. One more thing: there's also initcall debugging, which is very useful. If you enable initcall debugging, you can see what is going on in the boot process, but that is much later; if the problem is happening before initcalls are started, then you need other tools. You could enable early initcall debugging as well, so there are some tools available early on, but it depends on the stage you get to. I did a blog post on that; I'll post the link in a little bit. I was playing with initcall_debug. But yeah, like Joel is saying, if the kernel doesn't boot, kgdb needs to come up first, and there are different stages.

Ah, okay, another question: what rootfs are you using with QEMU? Do you have a custom-built rootfs? So what I do is I build my own CPIO archive using BusyBox, so it's a BusyBox CPIO archive, and I use that as my initial RAM filesystem, an initramfs. And then what I do is I chroot from there into a Debian distribution. So I do something like that; it's a funny setup. Again, this was not meant for the public, it was meant only for me. But there are many different ways to run QEMU; Shuah might have other ideas as well. I think people use libvirt quite a lot, and virt-manager. I didn't want to do that because I want to pass all the options bare-bones to QEMU; I want full control over it, so that's why I didn't do that. But you could build a rootfs in many different ways and boot it with QEMU. And especially if you are debugging any of the problems on syzbot and those sites, they provide you the disk image, the kernel image and all the artifacts, so you can just bring those in. I also use QEMU a lot, Joel, for the same reasons you mentioned; I like having more control. What I do use, though, is virt-manager as a front end for QEMU, so that I can manage the virtual machines and keep them around for me to play with, and then I can create different virtual machines to pick up and compare. I see; can you use virt-manager if you don't use libvirt to boot the VM? Like, if you use QEMU directly, can you use virt-manager after that? Well, let me see... no, you can't. So I kind of go back and forth; in some cases I keep the libvirt setup so that I can keep the machines around, so that I can play with different options, say different configuration options or whatnot, and have something to go back to. You could do it manually, managing QEMU images as well, but what libvirt allows you to do is, once you've figured out all of the options that you want to enable, you can just save them and then kick the virtual machine off, instead of remembering all of that. I see, yeah, definitely worth checking out.
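For anyone who wants to try the BusyBox initramfs approach mentioned above, here is a minimal sketch; the directory layout, the location of the busybox binary and the QEMU flags are illustrative, not the exact setup from the talk:

    # Lay out a tiny initramfs around a statically linked busybox:
    mkdir -p initramfs/bin initramfs/sbin initramfs/proc initramfs/sys initramfs/dev
    cp busybox initramfs/bin/
    cat > initramfs/init << 'EOF'
    #!/bin/busybox sh
    /bin/busybox --install -s
    mount -t proc none /proc
    mount -t sysfs none /sys
    exec sh
    EOF
    chmod +x initramfs/init

    # Pack it as a CPIO archive:
    (cd initramfs && find . | cpio -o -H newc | gzip) > initramfs.cpio.gz

    # Boot it under QEMU with a freshly built kernel; initcall_debug makes the
    # boot process chatty, which helps with early boot problems:
    qemu-system-x86_64 -nographic -kernel arch/x86/boot/bzImage \
        -initrd initramfs.cpio.gz \
        -append "console=ttyS0 initcall_debug"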
So let me see, there was one question from earlier about a dead process's stack. Let me read the question: is there a way we can get dead-process stack messages as part of a crash dump or the kernel logs, instead of using ps aux, grabbing the PID, and then going and looking at it under /proc? There is actually a sysrq option to dump the stacks of all processes, even ones that are not running, so check that out; I don't know off the top of my head which sysrq option it is, but that's another thing you can use to see what all the tasks on the system are blocked on: is it a futex or something else, is it some kind of I/O request? Another thing you could do is pass the panic option to the kernel, and it will dump many different useful things when a panic happens; it can dump the locks that are held, and one of the things I believe it dumps is also the stacks of all the tasks in the system. Another way to get the stacks is through crash. Crash is basically a tool that can analyze kernel core dumps; you can collect a core dump using the kexec/kdump mechanism, or you can actually collect a core dump using QEMU directly, and then pass that to crash and get the stacks that way as well. I've not personally done that, and I'd have to try it myself, but that's another way to collect a core dump and get the stacks; I know people who do that.

Sure. We're out of time. There was one question about debugging production systems; I think Joel already addressed that. Really, with production systems you have to figure out a simple way to recreate the problem, to reproduce it. That's what I do: I ask what the minimal things are that you can do to reproduce and debug it, and then it's gathering the logs in the field and asking for the logs, all of those. stress-ng is a different tool; it can be used for debugging, but it's more of a testing tool than a debugging tool. It can stress the system. So those are the questions that were there. Joel, what are good tools apart from stress-ng? I'm guessing the question is about stressing the system. So those are more stress and stress-testing tools, things like sysbench, and fuzzing as well. The LTP project also has a lot of tests, unit tests and stuff, and of course kselftest, which Shuah maintains, has a lot of good stuff that we use a lot to test the kernel as well. Right, and KUnit is another one as well. KUnit, yeah. So, all right, sorry, we are two minutes past. Candice, how are we doing? Yeah, I can wrap up, or if you two have any other questions you want to answer for a few extra minutes, it's totally up to you. I don't think so. Thank you for coming in, Joel, thank you so much. Absolutely, I hope it was helpful, and thank you very much. See you next time. Thank you, Joel and Shuah, for your time today, and thank you everyone for joining us. As a reminder, this recording will be on the Linux Foundation's YouTube page later today, and a copy of the presentation slides will be added to the Linux Foundation website. We hope you join us for future mentorship sessions. Have a wonderful day.