First in-person conference in two years, and we already have a special setup. Some of you know me, some of you do not, so I need to introduce myself a little bit. My name is Marta, and I hate debugging. So now you are asking me why I'm covering a subject like that. As I hate debugging, I have been trying to find the best techniques, tools, and ideas to make debugging easier.

First fact, one that you are not going to like: we spend most of our time debugging. Unfortunately, even with the best intentions, it's still going to be like that. But we can improve things in an important way, and I'm going to talk about ways of improving debugging in embedded Linux today. So here we go.

So what is important in debugging? You may say tools. You may say the knowledge you already have. You may say experience. What I would like to argue today is that the most important thing is the approach you take. If you have an organized approach, it's going to be way, way better. Oh, my dear clicker. Okay.

But let's start with things I have seen over the years: frequent errors when people are debugging applications. The first, the most common one, is trying random solutions. "I don't know what's happening, but I'm going to change this and that. Let's change the file system. Let's change the network card. Let's change the component." Usually it's not really fixing the problem.

The second common error is adding waits: sleep, usleep, or whatever you have in your framework. When you hear "when I add a sleep, the test case actually passes", that's a common sign. And I will tell you a small secret after 10 years working in silicon companies: if you are fixing a problem by adding a sleep, it's going to show up just before the important presentation, and it won't be happening just from time to time, it will be happening all the time. No sleeps, except if you can really tell why you're adding it. In embedded space, quite often you have a protocol specification.
You have a chip specification telling you that you need to wait one second for the controller to initialize or for something to boot. In this case, that's okay. It's good to write a comment above that sleep saying why it is there, linked to the documentation, so that everyone who comes after won't be wondering what this thing is doing here.

Then, inefficient tools: either doing everything manually, or writing crappy tools, because "it's just for debugging, this code is just for debugging, I'm going to throw it away", or avoiding writing tools whatsoever. As we are going to cover a little bit later, in a good debugging methodology you need to reuse your tools. If they are crappy, you will have a crappy experience.

And then, managers will hunt me for this: estimating time for your debugging, especially when you have a hardware/software problem. Some people will ask you to estimate how long it's going to take. If you are really experienced and you think you nearly have the solution, you might try to tell how long it's going to take. Unfortunately, you might end up saying three days, and three months later you will still be at the problem. After quite a long time, this worked for me with managers: "I'm sorry, I'm debugging. I don't know how long it's going to take." After repeating it and repeating it, it finally worked. You can try it; hopefully it works for you too.

So this is my secret methodology that I'd like to share with you today. Let's go through it rapidly. First, defining what the problem you are working on is. Then, listing what the possible reasons for this problem are.
After that, I look into ways to verify that this reason is actually the reason for your problem. Then you write a test case, you run the test case, you analyze what you got, and then you look back, or you have a solution.

Now, it's going to be easier with an example. The example is not going to be the HDMI in the laptop, because that's a little bit too complex, so I'm going to go with something a little bit more fun. I have been installing a weather station recently in my garden. The garden is in the mountains, so we have a slope, and it's pretty long, more than 100 meters. You have trees, so we have different things that can interact with the weather station. The weather station has the base station, temperature sensors, a wind sensor, and a rain level sensor. The base station was working, the temperature sensor was working, the wind sensor was working, but the rain sensor wasn't working. So let's go into the debugging of my weather sensor.

What is the problem? I define it as: the rain sensor doesn't work. What could be the reason? We are in embedded space, so the first possible reason: there's no power. The second possible reason I can think of just like that: it's radio, so maybe it's too far from the base station, it's not receiving a signal, and it doesn't work. Okay, so I have those two possibilities. Let's go with the radio one.

How can I verify that it's true? I start looking at the sensor. There's no LED, nothing showing that it's really working. But when I look at the base station, I can see that there is a small icon for the radio link with the rain sensor. So it means that there is a radio link. So this is my test case: the icon on the base station. We have radio. Okay, so what have I learned? I have learned that neither of my possible causes is the real cause, because if there is radio, there is also power on the device. That's not the problem.
So we look back. As I had no clue, let's disassemble the sensor. I do not want to use photos of the actual device, because of copyright and all that stuff. What is important to know is how it works. Oh, we'll be in the right place in a moment. Okay, here we are.

It works with a scale. You have raindrops going down and running onto the scale, and when you have enough drops, the scale goes down, the water goes out, and the other part of the scale comes up. When it switches, it clicks, and the sensor counts the number of clicks. It's pretty easy to understand how it works.

So, looking at it, the possible reason: mechanically, it actually doesn't click. What can we do to verify it? Bring a bottle of water and see if it's going to click and if we get an actual result. Okay, so I do the experiment. It works. What have we learned? Mechanically, it works. So what could it be?

In fact, there were more steps like that during the whole thing, until I realized that it works with the bottle of water, but it doesn't work with real rain, or at least it works for only one click during a real rain. And after the rain, I got that aha moment. You remember the slope. The sensor itself was also on the slope, like that. So what if this thing doesn't work with a low amount of rain if it is not exactly horizontal? How do we test it? Easy: when you have the sensor on a slope, you just turn it 90 degrees and wait for the next rain. It worked, and it has been working ever since.

Time to get into what we have learned from the methodology. So, what is the problem?
Write down everything you know about the problem. Then list all the reasons you can think of. By the way, I'm typically using an electronic file, but you can use paper, whatever you want.

Having listed all the possible reasons you can find, you choose one, the one that is the most interesting for you or maybe the most probable. You grab that reason, you keep it, and you find a way to verify that reason. And you keep to that reason, not all the other reasons. This one, you grab it and you design an experiment to verify that specific reason.

It may be that the experiment is pretty hard to do, and you would prefer to go back to another reason that may be easier to test. When you have an experiment that you can actually do, you write the test case. It's either something to note down, how exactly to reproduce it, or, if it's just code, you commit the test case.

And then, what are we learning from the result? Does it support the idea that the reason you were thinking about is the real reason? Have you discovered any new possible reasons that you could look into? Or maybe, and it inevitably happens quite often, do you need to recheck your previous results, because one of your experiments was broken for some reason or was incomplete? That's why I'm telling you to commit the results and keep all the history, so you're not going to be lost because one of the experiments was broken.

And then we look back. Either we have found the solution, we are happy, and we implement the actual solution. At the end, we run the very first test case from the beginning to make sure that we have actually fixed the issue. Or, if you do not have enough information, go back to any point above. Typically you will go to the list of possible causes and evaluate which one to work on next.
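The steps above can be kept in a very small journal. The following is only a sketch of one possible layout, not a tool mentioned in the talk: the class names `Experiment`, `Reason`, and `DebugJournal` are made up for illustration, and the entries are a retelling of the weather station example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Experiment:
    """One test case designed to verify a single possible reason."""
    how_to_run: str
    observed: str = ""
    supports_reason: Optional[bool] = None  # None until the test has been run


@dataclass
class Reason:
    """One possible reason for the problem, with the experiments run against it."""
    text: str
    experiments: List[Experiment] = field(default_factory=list)

    def status(self) -> str:
        results = [e.supports_reason for e in self.experiments]
        if True in results:
            return "supported"
        if False in results:
            return "ruled out"
        return "open"


@dataclass
class DebugJournal:
    """The problem statement plus the full list of possible reasons."""
    problem: str
    reasons: List[Reason] = field(default_factory=list)

    def open_reasons(self) -> List[str]:
        # Candidates for "which one to work on next".
        return [r.text for r in self.reasons if r.status() == "open"]


# Illustrative entries following the weather station example.
journal = DebugJournal(problem="The rain sensor doesn't work")
power = Reason("No power")
radio = Reason("Too far from the base station, no radio link")
slope = Reason("Sensor is not exactly horizontal")
journal.reasons += [power, radio, slope]

radio.experiments.append(Experiment(
    how_to_run="Check the radio link icon on the base station",
    observed="Icon is shown",
    supports_reason=False,
))
# A working radio link implies the sensor has power, so this reason is ruled out too.
power.experiments.append(Experiment(
    how_to_run="Same observation: a working radio link needs power",
    observed="Icon is shown",
    supports_reason=False,
))

print(journal.open_reasons())
```

Committing a file like this next to the test cases keeps the history, so a broken or incomplete experiment can be found and rerun later.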
Okay, that was the methodology. Then it's about tools, because you need tools to understand what is happening in your system, what exactly does not work, or how the system at large works, so that you can get more insight into the issue. And of course you also need tools to design your experiment. There are so many tools out there. I'm going to cover a few of the most frequently used ones. Then it's up to you to work on your toolbox, to go around and find things you can use. There are a lot of interesting presentations happening around here, for example one about tracing later on today.

So, the first one. Some people are not going to like me again here, because printf debugging, that's for beginners, right? All you people working in embedded space: yes, no? Well, for beginners or not, everyone is doing that. Printing is printf in C, printk or pr_something in the Linux kernel, and a specific function in the language or tooling that you use. What it allows you to show is the state of some of your variables, or the code path you are finding yourself in, basically. But, but, but, beginners beware: the code may behave differently with and without the prints. Why is it so? For quite many reasons.

First, my advice would be to show only what you actually need. One reason: too much data may be really hard to analyze. If you are just lost in pages and pages and pages of data, your piece of gold may be somewhere in there.
You may just not see it. And then, watch out for the bandwidth. In code, especially with loops, it's really easy to generate pages and pages and megabytes of logs. When you are in embedded, with a serial port or a slow network connection, you can actually saturate it. Messages may be lost in this case. Imagine a situation where you have the right message at exactly the right point, showing exactly the right variable at the right moment, and you are just not seeing it because it got lost. It happened to me. Not funny at all. So make sure you get all the messages, and save your bandwidth for the important ones.

Then, a quite frequent technique to limit the amount of messages and make your debugging easier is using log levels. What is a log level? In general, you have a variable that controls the printing of some of the information. In the code you have a check: if that variable is higher than some value, then you print the message, otherwise you don't. Why is it useful? Because you may keep all this conditional debugging in the code and just enable it by setting the variable, with a debugger, from the command line, or, if it's in the kernel, in sys. And it just happens. It saves you time.

And if you think that adding a printf looks like a fix for your problem, I would tell you to retest, because my bet is that you have a race condition.

Okay, then let's go to the kernel special files. For debugging the Linux kernel, and for debugging your application running on Linux, they are also pretty interesting. Yeah, my dear clicker. Okay.

So there are two main directories with special files that are used for that, /proc and /sys. I'm not going into the details of the history of why you have both; it's not important for our use case. In /proc you have various statistics, especially related to your processes. For example, /proc/self/limits gives the limits of the process itself. There are way, way more things there. It's useful to learn what you can find there that applies to your case.
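The log-level idea described above (a variable compared against a threshold before printing) can be sketched in a few lines. This is a minimal illustration in Python rather than C; the names `DEBUG_LEVEL`, `debug`, and `process` are assumptions made for the example, not something shown in the talk.

```python
import sys

# Hypothetical verbosity knob: 0 prints nothing, higher values print more detail.
# In a real program it could be set from the command line or an environment variable.
DEBUG_LEVEL = 1


def debug(level, message, out=sys.stderr):
    """Print the message only if the current DEBUG_LEVEL is at least `level`."""
    if DEBUG_LEVEL >= level:
        print(f"[dbg{level}] {message}", file=out)
        return True
    return False


def process(samples):
    total = 0
    for i, value in enumerate(samples):
        # Per-iteration detail stays in the code but is silent at level 1,
        # which keeps a slow serial console from being flooded.
        debug(3, f"sample {i} = {value}")
        total += value
    debug(1, f"processed {len(samples)} samples, total={total}")
    return total


process([4, 8, 15])
```

With `DEBUG_LEVEL = 1` only the summary line is printed; raising it to 3 enables the per-sample messages without touching the rest of the code.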
The other is /sys. It includes files for each bus, and it allows you to enable and disable certain features dynamically. For example, /sys/kernel/debug/dynamic_debug allows you to enable log levels, conditional debugging, in the kernel code. Quite useful: you can enable things without recompiling. It saves time too.

Then we come to domain-specific situations, and in your own domain there will probably be many other tools. Two examples are strace and perf, and how you can use them.

So, my first friend, strace. It allows you to see all the system calls that the program is making, with the parameters and the results of the system calls. It has saved me a number of times from applications that do not check the error codes of system calls. For example, someone is opening a file, not checking if the file was actually opened, and carrying on with their work. If there are wrong permissions or the file doesn't exist, it usually ends badly. So strace saves you a lot of time in such cases, just by looking for minus one as the return code and verifying whether the failures all look reasonable.

Then perf. I'm not going to talk a lot about perf, because if I run into the subject of perf, we are not going to get our Guinness this evening. You can get a lot of information from many places, and perf gives you a lot, especially related to your CPU and general performance monitoring. For example, just perf stat sleep 10 gives you the CPU statistics over 10 seconds. Pretty useful. And there's excellent documentation with perf one-liners. You do not have to remember the command, you just search the site for the most important commands, and you will look like an expert just by knowing the site and being able to find the right command for your use case. And there are plenty.
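The "look for minus one" check on strace output can be automated once the trace is saved to a file. The following is a rough sketch that assumes strace's default text output; the function name `failed_syscalls` and the sample trace are made up for illustration.

```python
import re

# Matches failed calls in strace's default output, for example:
#   openat(AT_FDCWD, "/tmp/missing.conf", O_RDONLY) = -1 ENOENT (No such file or directory)
# An optional PID prefix (seen when child processes are followed) is skipped.
FAILED_CALL = re.compile(
    r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?(\w+)\(.*\)\s+=\s+-1\s+(E[A-Z0-9]+)"
)


def failed_syscalls(trace_text):
    """Return (syscall, errno_name) pairs for every call that returned -1."""
    failures = []
    for line in trace_text.splitlines():
        match = FAILED_CALL.match(line)
        if match:
            failures.append((match.group(1), match.group(2)))
    return failures


# Made-up trace excerpt, only to show the shape of the input.
SAMPLE_TRACE = """\
openat(AT_FDCWD, "/etc/app.conf", O_RDONLY) = 3
read(3, "mode=fast", 4096)              = 9
close(3)                                = 0
openat(AT_FDCWD, "/tmp/missing.conf", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid  4242] connect(4, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("192.0.2.1")}, 16) = -1 ECONNREFUSED (Connection refused)
"""

for name, errno_name in failed_syscalls(SAMPLE_TRACE):
    print(f"{name} failed with {errno_name}")
```

Not every failure is a bug; some lookups are expected to fail. The list is a starting point for checking whether each failure looks reasonable.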
Okay, another domain: networking. pcap, tcpdump, and Wireshark. What is pcap? pcap is a format that allows you to save your network traffic and then reanalyze it. tcpdump is a tool that allows you to dump, to display the traffic in a text format. Wireshark is a graphical interface. So what I'm going to show you is the graphical interface.

For people who haven't done much networking, it may sometimes be a little bit complex to understand, but the tool analyzes the protocols for you, so you already have some information. In this case, for example, I have a TLS connection, and I can see a port number in the tool. Usually it shows you in red the things that are not going well in the network connection, so it may be a hint. Of course, you need to be careful with that, and have your protocol reference handy next to you when you are debugging.

So what do I use Wireshark or tcpdump for? To answer questions like: Is network connectivity really working correctly at all levels? Is the protocol I'm working on actually working as it should? Yet again, you need to have the protocol spec somewhere handy to be able to verify that. Or: is there some unexpected traffic, something that you were not expecting in your network but that is actually happening and interacting with what you want to have, in a way that generates specific effects?
So, typical findings using Wireshark or tcpdump. The same MAC address twice in the same network. For everyone who hasn't run into that yet: when it happens to you, your network behaves randomly. And it happens quite often in embedded, because we have Ethernet controllers where it's the software that has to set the MAC address. Or you're prototyping a device and someone forgot to set the MAC address, or all the devices are clones. And we connect two identical devices to the network and then, poof. It may actually take some time to debug. In the network trace you see two requests from the same MAC, and you just have to find the offender.

Then, another interesting situation is when you have your network connection or protocol connection blocked by some misconfiguration or firewall. Typically, you are seeing exchanges between two parties, and then one of those parties says: stop, I don't want to talk to you anymore. In the network trace you can see who said that. And then you go to that side to find out what happened, in the log or by other means. You look in the configuration files, whatever it could be.

Okay, so let's wrap up and repeat what is important, or at least what I find important. So, the magical debugging procedure, which you can adjust as you want. Define clearly what the problem is. List the possible reasons. Think about how you can verify whether it's really the reason. Write a test case and save it for later. Find out what you have learned from the result of the experiment, and then look back, depending on the needs.

On the tool side, printing quite often helps with initial questions.
I haven't gone through debuggers in this session, because otherwise it would have been a shopping list. Debuggers help in similar situations, especially if you want to look into some specific code. You also have logs from your system and from your applications. Look into them and find out what is important, what is interesting, and what you can reuse. And then there are specific tools for each domain. You have performance information from tools like perf. You have the networking tools, Wireshark and tcpdump, if you are on Ethernet; there are other tools if you are on InfiniBand or other things. And there are way more tools available for your toolbox for different domains. Add them as you find them useful.

So, if you want to discuss this subject, or the specific subject of why a security person talks about debugging, okay, the answer is very simple: security bugs are bugs first. So fewer bugs means fewer security bugs. That's why I'm working on that [unclear].

So, if I'm calculating correctly, we have some time for questions now. Everyone's shy, it seems. Yeah, we have a question, because otherwise I have, of course, things to add. We have someone.

Yes, hello. [name unclear], I work for Ford Motor Company, and something that I've seen is the differences between going through university and then working in real-world environments. How do you propose we translate some of this experience in real-world environments back into the university environment? Because, you know, especially when you're starting out, printf is very common, and, at least in my experience, it's usually code fast and debug later, usually two, three, four hours before assignments are due. So do you have any ideas on how we can better teach these advanced techniques at the university level?

Okay. That's a very insightful question.
In fact, just two [unclear] away, we are discussing the same thing for security. I've noticed the same thing: people coming from school do not really have debugging skills, no security skills, no experience. So what I'm trying to do is to educate people on actual techniques, because I personally find that tools they can learn. It's easy to give them tools for a specific task when they need them. It's more complicated to teach people to actually work step by step.

I'll give you an example here. When the result of that analysis, as I've presented it, is available, done by a junior developer, they can easily show it to a more advanced developer. And that person is going to look: okay, have you considered this situation? Have you considered this and this? You have considered this, yes, but there's one more possible reason that you should consider. It's very easy to help them in this case, rather than having them tell you for 30 minutes what they have tried to do, and then, well, you are going to redo the same experiment. So I think that what we are missing is actually teaching them the approach, and the tools are going to come. I think that part is easier. Hopefully that answers your question.

We have an online question: What do you think about debugging real-time embedded systems? A lot of the tools you presented, such as strace, cannot be used on those.

I have been using strace on... Okay, maybe not on hard real-time; there's the question of how you define real-time.
Yeah, so if you are on hard real-time, then you will need a slightly different toolbox. For example, printing usually won't be useful, because it's going to completely break your runtime constraints. But you can try to adjust those techniques. For example, without using printf, what you can do is re-implement the printing into some buffer somewhere and then get the information out later on. The same with tracing: you can also try to do it in a non-intrusive way.

My advice would be mostly to think about whether your problem is really linked to the hard real-time requirements or not. If it's not, you can just use the tools as on a typical Linux system. If it is, if the problem is really showing up in that hard real-time part, then there will be a need for specific tooling for your test case, depending on where exactly you can put the data and what exactly you are looking for. That will really depend on the situation. It could be pretty complex.

Okay, we have a question, like fifth row, I think, on this side.

It's quite funny, I ran into a MAC address duplication problem a few weeks ago, and I found that a very helpful tool can be looking at your ARP table. That's a little easier than running Wireshark in the first instance.

Yes, looking at your ARP table when it happens, that's also a reflex to have. I use this example because quite often we are not considering that this is what is actually happening. We are thinking it's something else, and then we figure it out. And of course, after the fact, having seen the duplicate in the trace, you run the arp command and you just see: okay, there are two of them. And it's clear, you have verified the case.
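The ARP table check mentioned above can also be scripted. The sketch below parses text in the layout of Linux's /proc/net/arp and reports any MAC address that shows up for more than one IP address. It assumes the cloned devices were given different IP addresses; the function name `duplicate_macs` and the sample table are invented for the example.

```python
from collections import defaultdict


def duplicate_macs(arp_text):
    """Map each MAC address seen with more than one IP to the sorted list of those IPs."""
    ips_by_mac = defaultdict(set)
    for line in arp_text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 6:
            continue
        ip, mac = fields[0], fields[3].lower()
        if mac == "00:00:00:00:00:00":  # incomplete entry, no answer received yet
            continue
        ips_by_mac[mac].add(ip)
    return {mac: sorted(ips) for mac, ips in ips_by_mac.items() if len(ips) > 1}


# Made-up table in the /proc/net/arp layout.
SAMPLE_ARP = """\
IP address       HW type     Flags       HW address            Mask     Device
192.168.1.1      0x1         0x2         02:00:00:aa:bb:fe     *        eth0
192.168.1.20     0x1         0x2         02:00:00:aa:bb:01     *        eth0
192.168.1.21     0x1         0x2         02:00:00:AA:BB:01     *        eth0
192.168.1.30     0x1         0x0         00:00:00:00:00:00     *        eth0
"""

print(duplicate_macs(SAMPLE_ARP))
```

On a Linux machine the same function could be fed `open("/proc/net/arp").read()`. A hit is only a hint, since one host with several IP addresses on the same interface looks the same in this table.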
Yeah, for quite many situations you have multiple tools you can use, and with experience with different kinds of projects, you will have more of them to use. I'm unable to cover all of them in one hour, sorry.

I would say that we are done now, and I think it's lunchtime, right? So, thank you all.