Okay, hello everyone, thanks for joining the session. Let me bring up the presentation here. I had a few technical problems, but hopefully they are solved, because I have prepared a lot of hands-on material in my terminal, and I hope we can do that right now and that you enjoy the session.

My name is Sergio, I'm from Brazil, and I have been doing embedded work for 25-plus years. I have a company called Embedded Labworks where I do consulting and training, everything embedded Linux related, and I'm a kind of part-time contributor to a few open-source projects, including Buildroot, the Linux kernel and the Yocto Project.

Our idea here is to talk about debugging. That's a topic I find fascinating, because it's a kind of byproduct of development: we usually don't learn about debugging in theory, only in practice. We learn about programming and developing software, but when it comes to finding and solving issues, we have to learn that in practice. There are a few books out there, but I always felt we should have more documentation and more resources about this, and that's why I'm giving these debugging talks. I did one three years ago, at the last Embedded Linux Conference I participated in, and it was focused on the kernel. This one is broader: I'm going to cover both kernel and user-space debugging.

I have here a board from Toradex, running a really, really small system, booting from the network. When I power on the device, it downloads the kernel and the device tree, and mounts the root file system over the network using NFS. It's a good architecture for trying stuff out and debugging.
That's what we're going to try to do here. I want to start with a little introduction. Let me manage my time, because I have a lot of stuff to show, so I'm going to be very fast in the introduction: I want to focus on the hands-on part, showing real cases and using the tools, GDB, ftrace and several others, to find bugs in an embedded Linux system.

One of the ideas here is to show that there are a lot of tools, and that you may have different kinds of problems; depending on the problem, there is one tool, or a few good tools, to solve that kind of problem. If you're having performance problems, you probably don't want to use GDB to debug them, because GDB will impact the performance of the system. So depending on the problem, you will have the right tool to solve it. My idea is to talk a little about the problems you can have and the tools you have to solve them, and to show everything in practice.

There's a joke I like: as developers who write software, when someone finds a bug in it, we go through these stages, starting with denial: "I wrote the software, there is no bug there." Then we move on to "well, it could be happening on your machine, but on my machine it works," until we reach the final question: "How did that ever work?" That's usually true.

Well, we know what debugging is. We decided to use the word "bug" to represent errors in software, and debugging is removing those errors from software. If we think about the process of debugging any error or issue, I think we can break it into five steps. The first is understanding the problem. That's very important, because if you don't understand the problem, how can you solve it?
If you see a kernel oops, but you don't know what an oops is, or how to interpret that dump from the kernel, how can you solve the problem? So understanding what is happening is very important.

The second step is reproducing the problem. That's also very important, because if you don't know how to reproduce the problem, how can you confirm you fixed it in the end? You have to know the complete steps to reproduce it, so that later, when you apply the fix, you can run those steps again and make sure the problem is fixed.

The next step is finding the root cause, and that's usually what takes most of the time. Say you're having a crash in the kernel: you have the kernel oops or the kernel panic, you have to be able to analyze and understand it, you have to be able to reproduce it, you have to know the steps that cause the crash, and then you have to identify the root cause, what is actually causing it. That usually takes, or can take, quite a bit of time.

The fourth step is that, now that you've found the root cause, you just have to change the code, rebuild it, deploy it and test it. That's usually fast; of course it depends on the size of the software, but it usually doesn't take much of our time. And finally, if it's fixed, celebrate; if not, go back to one of the earlier steps and see what is going on.

Looking at different problems, I can divide them into five categories. Maybe you know another category; if so, just let me know. Crashes are the kind of problem where the software just stops abruptly. Lockups are when the software hangs, doing something we don't know about. Logic or implementation problems are those where everything looks like it's working, but the output is not what we want.
So that's a kind of implementation problem. Resource leakage is another kind of problem: it could be memory, it could be file descriptors, anything that can be allocated by an application can be leaked if the application doesn't deallocate it in the end. And then there is performance, or lack of performance: everything works, but the performance is bad. Usually a performance issue is a kind of usability issue: the system becomes slow, it's difficult to use because something is consuming a lot of CPU. Sometimes you get consequences like reboots: your software starts using lots of memory, the kernel's out-of-memory killer triggers and the system reboots, but the underlying cause is a performance issue. There are tools and techniques to debug all of these kinds of problems.

In the end, I came up with five categories of tools and techniques. Our brain is one of them; that's possibly the main tool we have. The second category is post-mortem analysis, a technique where you just collect information from the system to do the analysis somewhere else, later. Maybe you don't have direct access to the device, so you collect the information and take it to your machine, where you have more tools to do the analysis.

Tracing and profiling is the kind of analysis most of us already do: when you put prints in the code, you are doing tracing. Tracing is a technique where you instrument the code: you add instrumentation points, either at build time or at runtime, and when you add prints to the code you are doing just that. But as we will see here, there are a lot of tools that can put the prints in the code for you.
You don't even have to open the source code.

Interactive debugging is another well-known class of debugging tools. GDB, for example, is the best-known tool in this space: it makes it possible to interactively debug the system, run the code step by step, look at the memory, things like that.

The last category, which I'm calling debugging frameworks, is tools that were built to debug specific kinds of problems. Valgrind is a very well-known example: it's a framework on which you can build tools to debug memory problems, do profiling, things like that.

Hopefully, if we have time, we're going to see some hands-on with all of these tools. My idea is to talk a little about each of those categories and show in practice how they work, starting with post-mortem analysis.

Crashes are very common, both in kernel space and in user space, and we have tools to analyze them. A crash can happen because the software is misbehaving, trying to access an invalid memory address, or trying to execute an invalid instruction. If it's user-space code, the kernel will send a signal to the process: for example, if a process tries to access an invalid memory address, the kernel, via the MMU, will identify that and send the process a SIGSEGV, and the process will be aborted.

How can we debug problems like that? I have everything in these slides, in case the hands-on doesn't work, but everything in the slides I'm going to do in the terminal, to show you how it works. As I mentioned, I have here a small embedded system, and I added a few bugs to it so we can do this hands-on. The first one: when I plug in a pen drive, I get a kernel oops message, a crash in the kernel.
So this is a kernel oops message. Should we ignore it? Of course not. Should we be afraid of it? Of course not. There are a lot of numbers here, that's true, but there is also a lot of useful information.

If we start from the beginning of the oops message, we can see the reason: "Unable to handle kernel NULL pointer dereference." And we can see where the problem happened. We can even see the function, because there is a config option, CONFIG_KALLSYMS, usually enabled by default in the kernel, that makes it possible to resolve symbols, like the names of functions. So we can see storage_probe there; that's the function that crashed.

Because KALLSYMS is usually enabled, we usually have those symbols in the kernel, and that's why, at the end, we also have the backtrace with all the functions that were called. At the bottom we have kthread, because the enumeration of the USB device happens inside a kernel thread, via a workqueue worker. So: kthread, the worker thread, and you can go up until the function that caused the crash, storage_probe.

How can we analyze this? We have the address, so we can just convert the address to a line of code. For that you basically need two things. You need the source code of the kernel.
So I have here the source code of the kernel I'm running on the board. And you need the ELF file of the kernel with the debug symbols. That's the file called vmlinux, in the root of the kernel source tree: when you build a kernel, this file is generated and used to produce the image you boot in the end, but the vmlinux file itself is not used to boot. You can use it to debug the kernel, though, because it contains all the symbol tables, so you can resolve symbols with this file.

We can use two tools for that. One of them is addr2line. You just give it the ELF file; I'm also going to ask it to print the function name, I think that's the -f option; so the ELF file and the address. Which address? You can take it from the backtrace, or from the PC, the program counter. You give that to the tool, and it gives you back the source file and the line of code that caused the crash, so I can just open it and look at that line of code.

I can do the same thing with GDB. I open the vmlinux file with the GDB from my toolchain, and in GDB there are a few commands I can use to resolve symbols. One of them is list: I can give list an address, or the symbol name plus an offset. I'm going to do it differently here.
I'm going to give it this, which is the same thing as the address: the name of the function plus the offset inside the function where the crash happened. I give that to GDB's list command, and it shows me the same thing: the line, in that file, that caused the crash.

It's important to notice that if you compile the kernel with certain security options, for example address randomization, you will not get this directly, because the addresses will be randomized. Security usually works against debuggability: the more secure the system, the more difficult it is to debug. That's just how it is.

What about user-space applications? How can you debug a user-space application? It's pretty much the same thing in terms of how you do it, but with user-space applications you don't have an oops. Instead, you can ask the kernel to generate a core dump, which is a kind of snapshot of the process memory at the moment it crashed.

On Linux, by default, when a process crashes, no core dump is generated. Let me show you. I have a command here, I think it's fping.
Yes. I added a bug to this command, and it's crashing with a segmentation fault. There is a built-in bash command on Linux called ulimit that is able to, let me run it again, enable the generation of core dumps. Let me show all the limits first: there is a parameter for the maximum size of core files created, and by default it is zero, so no core dump is generated. You can just change this if your distribution has it disabled: I run ulimit -c unlimited, and that enables the generation of core dumps. Now, if I run the command again, I get a core file that I can debug with GDB.

So here we have this core file, this snapshot of the memory, and I can take this file to my machine. What do I need to debug it? I need the core dump file, the source code of the application, and the binary with the debug symbols. I'm using Buildroot here to build the system, so let me go to the Buildroot directory, into output/build, to the fping build directory. Here I have the binary with the debug symbols, and I have the source code, so I can just debug this core dump.

How to do it? First I need the core dump, so I'm going to copy it from the root file system, which is on my machine because I'm using NFS. I take the core file from the rootfs, move it to my machine, fix the permissions for my user, and open it with GDB: gdb, the binary with the debug symbols, and -c core, my core dump. As you can see, GDB opens the binary with the debug symbols, analyzes the core dump, and shows me the line of code that crashed, automatically.
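The whole core-dump round trip condenses to a few commands (the fping name comes from this demo, and the paths are illustrative; ulimit only affects the current shell):

```shell
# On the target: allow core files for this shell.
# The soft limit for core file size often defaults to 0 (no cores).
ulimit -c unlimited
ulimit -c            # now reports "unlimited"

# Run the crashing program; on SIGSEGV the kernel writes a core file:
#   ./fping ...

# On the host: open the unstripped binary together with the core dump,
# e.g. from the Buildroot build directory:
#   gdb output/build/fping-*/fping -c core
```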
I don't need to do anything else. There is a nice feature in GDB called TUI; I like it because it gives a kind of graphical view of the source code. With the -tui option you get the command line below and the source code above, so you can interact more with the source code and see the lines.

What is nice about the core dump is that, since we have a snapshot of the memory, we can just investigate it. I can use print, for example: print variables. If I print options, I can see its value. Options is a pointer, and we can see that options is not NULL, so that's probably not the issue. Probably argv is the issue here: if we print options->argv, we see that argv is NULL, and that's why it crashed.

My point here is that with just a few commands, using some tools, with the right infrastructure, we can easily find the offending code that is causing a problem. All of these commands are in the slides, so you can check them out later.

Tracing. Tracing is another technique for debugging, and the idea of tracing is to instrument the code. There are several different ways to do it: you can trace at build time, adding tracepoints to the code, or you can trace at runtime, and the kernel provides a complete infrastructure for that, so you can trace kernel code and user-space code.

I want to show you this one, for example. I have here a command, let me see if I have it here. So, I have a command that is taking a lot of time: it's just trying to turn on an LED, but it's taking four seconds. How can we debug that? Of course,
we could start adding prints to the kernel code to see where it's spending that much time, but the kernel provides a complete infrastructure to debug this kind of problem, and here we're going to use ftrace for that. ftrace is part of the kernel tracing subsystem. To enable it, you go to the kernel configuration, into the "Kernel hacking" menu; there is a submenu there for tracing, and when you enable it there are several options you can turn on, to trace kernel function calls, to trace latencies, things like that. So we're going to use ftrace to debug this issue.

For those who don't know ftrace: it provides an interface via files, which is very nice because you can interact with ftrace from the command line. You just have to mount the tracefs file system, which I don't remember if I already mounted or not. Maybe not; let me mount it. "Device or resource is busy," so it was already mounted, and I have it here.

It's a file-based interface to interact with the kernel. Let's say I want to trace function calls. We have here the available_tracers file, with all the tracers we can enable. Say I want to enable the function_graph tracer, just to see how it works: I echo this tracer into the current_tracer file. So I just write the name of the tracer into the current_tracer file, and that should be it.

I forgot about one thing: since my kernel crashed before, I have to reboot it. The system will be unstable; every time the kernel crashes, I have an oops.
The kernel is unstable. I discovered during my tests that ftrace does not work after I crash the kernel with the USB stick, so I'm just rebooting; it should be very fast.

Okay, so I'm going to enable the function_graph tracer again. Now I have a file, trace_pipe for example, where I can follow all the kernel functions being called, so I can trace everything that is happening inside the kernel in terms of function calls.

What is this output? I'm just printing the contents of the trace_pipe file. This file is a kind of buffer of the tracing happening in the system, and since it's a pipe, it never stops: it keeps showing the buffer.

There is a tool built on top of this tracefs file system called trace-cmd, and I'm going to use it here to record the execution of that program. trace-cmd basically writes those tracefs files for you, so it's just a tool to help you interact with the tracing system: trace-cmd record, the function_graph plugin, and this command. It will run this command, trace only this command and the kernel function calls made by it, and put everything in a file for me.

It should take a few seconds, and after that it starts generating the file. Oh, I'm in a directory where I don't have write access, so let me do it again. Very good, I got it. Now I have a trace.dat file that I can analyze.
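The whole tracefs flow from this demo condenses to a few file writes (these are device-side commands needing root and a kernel with the function_graph tracer enabled; my_led_command stands in for the slow LED command):

```shell
# Mount the tracefs interface (often already mounted by the system)
mount -t tracefs nodev /sys/kernel/tracing

cd /sys/kernel/tracing
cat available_tracers               # lists e.g. function_graph function nop
echo function_graph > current_tracer
cat trace_pipe                      # live stream of kernel function calls

# Or let trace-cmd drive the same files, tracing only one command:
#   -p selects the tracer plugin, -F traces only that process
trace-cmd record -p function_graph -F my_led_command
trace-cmd report > trace.txt        # readable report from trace.dat
```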
There is a very nice graphical tool called KernelShark that we can use to analyze this trace, but here I'm just going to use trace-cmd report, and I'm sending the report to a text file because I want to open it with vi. This is the result of the tracing, and here we can see all the functions that were called by this specific program.

Now we have to use a little bit of creativity to find out what is happening. If I search for "led", because I'm trying to turn on my LED, I get to the LED functions, and we can see that here, in the middle of the LED function calls, we have an msleep, and this function call took a few seconds. I don't know if vi is going to the right place, but yes: if we open this gpio_led_set function in the kernel, we find that there is an msleep inside that is causing this delay. Of course, I added it there for this demonstration. I would say this is much faster than starting to add prints to the kernel code to try to find out what's happening. Another approach would be to run the debugger and see what's going on; that could be another approach. I guess I have 10 minutes, so let's see what we can do now.

User-space tracing: there are several tools you can use for that. One tool that I really like is strace: very simple, very efficient, easy to install. It's just a small tool that traces system calls, and tracing system calls can help a lot in finding problems. In this example I ran nc, and it's returning an error: "couldn't set up listening socket". We could use GDB or some other tool to debug this issue, or maybe just strace. If you have it on your system, you just run the tool under strace, you get all of the system calls, and at the end you can see what is happening.
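As a minimal illustration of the technique (assuming strace is installed; the nc case is analyzed the same way, just with bind() instead of openat()):

```shell
# Trace only the file-open syscalls of a command that fails:
strace -e trace=openat cat /nonexistent
# The trace ends with a line like
#   openat(AT_FDCWD, "/nonexistent", O_RDONLY) = -1 ENOENT (No such file...)
# which pinpoints the failing syscall and its errno, even when the
# program's own error message is vague or missing.
```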
For example, when it calls the bind system call, it passes NULL as the second parameter, and if you look at the bind interface, you see that you have to pass a socket address there, not NULL. So that's the problem. You can spot this kind of problem when you run a tool and it doesn't return anything, or returns some mysterious error and you don't know what's going on: just run it under strace. You can identify, for example, that the program is trying to open a file, not finding it, and returning an error, without ever telling you that a file is missing. It's a very nice, simple and useful tool.

The kernel has an infrastructure for user-space debugging, for probing user-space applications at runtime, called uprobes: user-space probes. When you enable uprobes in the kernel, you are able to add tracepoints to the function calls of your application at runtime. Because strace, and also ltrace, can help you trace system calls and library calls, but not the calls inside your own application; if you want to trace the calls from your application, you will probably have to use some kind of user-space probing tool, and perf, for example, is able to do that.

I don't have time to do the demonstration now, but everything is in the slides, so you can check it out later. The perf tool does a lot of stuff in terms of tracing and profiling. One of the things it can do is use the uprobes subsystem: it takes the symbols from the application, finds the addresses, and adds tracepoints at those addresses, so you can instrument the application at runtime. In this example, there is a small script that puts tracepoints on all the functions of an application.
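In command form, that kind of script boils down to perf's uprobe interface (device-side commands needing root; ./app and my_func are placeholder names for an unstripped binary and one of its functions):

```shell
# Create a uprobe on a function of a user-space binary
perf probe -x ./app --add my_func

# Record every hit of that probe while the application runs
perf record -e probe_app:my_func ./app

# Inspect the recorded events from perf.data
perf report
```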
That's ETH to And then you can just run and record What is going on it will generate a File with the the functions that were called it call it Perf dot data And then with this file you can see all of the functions that were called from your application And you can try to figure out what's happening in this specific case It's application that is freezy hanging so you can identify these kind of issues There are two more kind of Class of tools that is very useful. I have just a few minutes Interactive debugging that's very commonly useful, right? Gdb is our main tool for interactive debugging and For the most of cases it helps a lot in the embedded Linux space that is just one small problem because You have all of the tools in your machine, but you have the code running on your another machine So you have you need a kind of a client server Architecture so you run Gdb server on one side and Gdb client on the other side and you can do do just this With user space code and with kernel code right the kernel has a and and Gdb server inside of it So you can use it you can run the kernel step by step or you cannot you can also do it with user space code I was also planning a demonstration on this, but we don't have time This is kernel being the buggy it and User space is more simple right to the bug and user space application is much much more simple I can do it here in one minute Let's let's see we want let's say we want to debug this application here This application here is Freezing right so we want to debug this application. What we need to do is start Gdb server on the device Gdb server We're gonna use an internet connection here To communicate with the client so Gdb now Gdb server Now it's waiting for a connection. Then we go to the host machine we have to go to the Search code of the application that is 3 where is 3 3 is here then here I start the Gdb client passing my Application with the bugging symbols and I'm also going to start it to emote now. 
I just have to connect: the command is target remote, then the IP address and the port number. Connected. I can put a breakpoint on the main function, for example: break main. Continue execution, and now I can control the execution of the program. Let's just run it: it's freezing, so we can just interrupt it and see why it is freezing. Very easy to spot this kind of freeze.

Okay, I'm out of time here, so just to conclude the presentation. My point, in the end: I tried to show several different tools and techniques, like tracing, interactive debugging, crash analysis and things like that. When we start our careers, the only thing we do to debug is add prints to the code, and most of the time that's not the right way to do it. Over time you learn different tools and techniques. Prints in the code are not the right solution for most situations: there are different situations, different kinds of problems, and different tools, and it's important to understand all of those tools and know how and when to use each of them when we need to solve bugs in a system. So I guess that's it, guys. Thanks a lot for your time. I hope you enjoyed it.