Hi everybody, my name is Thierry Bultel, I'm a software architect working in a French company named IoT.bzh. IoT stands for Internet of Things, obviously, and BZH stands for Brittany, which is the place where we live. The subject of this presentation is cross-debugging on Linux. In past years I was working for the Wind River company, where we were developing development tools; in particular, I was working on the Linux user-mode debugger. We will see what is available today in terms of cross-debugging, and also what can be expected from modern tools. The issues addressed in this talk are obviously remote debugging, but we will also take a deeper look at the debugger internals. For those who have ever wanted to know how a debugger works, it will probably be of interest. At the end of the talk, I will also say a little bit about other essential tools for developers, like LTTng and Valgrind, and we will see how Zephyr OS works and what kind of debugging it offers. IoT.bzh is currently developing, and recently released the first official version of, redpesk. redpesk is both a factory which helps developers and integrators to make products more easily, and also redpesk OS, a new Linux distribution based on CentOS that has long-term support. First, I would like to be a little more specific about what remote debugging means. Remote debugging is when you debug an application which runs on a remote embedded target. Usually this is a target with a different CPU architecture than your development machine, and as a rule, always considering the target architecture to be different from your host's will spare you bad surprises. What it is not is, for instance, just connecting with SSH to a target that runs an SSH server and launching the debugger on it: that does not suit small embedded targets, and it also won't work if you want to do system-mode debugging.
In the past 15 years, I have often seen people who weren't using remote debugging. There were several reasons for that. Some people were just working on code that was not architecture-dependent or that was not using board devices; in that case, they were able to reproduce their bugs on their host OS. Some others said that the debugger was badly integrated in popular IDEs like VS Code, and they were right. Also, GDB and gdbserver suffered from instabilities for years and were prone to crashes or bad performance. Some other people were very special ones, because they weren't using a debugger at all, simply because they didn't trust them, or because they could catch their bugs without one. There are two kinds of remote debugging: system-mode and user-mode. System-mode basically means kernel debugging when you are on Linux. It can be performed either with hardware assistance, with a JTAG probe; through software, with things like KGDB; or with software emulation, for instance when you run KVM with a special debug port that emulates a GDB server. One thing to keep in mind is that when you perform a stop, it suspends the whole system, which means that you lose communication. In case you are using a software debugger like KGDB, that means it will have to keep the communication alive with IRQs disabled. For doing that, the debugger implements a special communication channel based on polling the communication interface, like serial or Ethernet. It is interesting to notice that some high-level debuggers may have what is called OS awareness: they know that the underlying debugged system is Linux, and they are able to display some thread information, such as thread backtraces or thread-local storage. User-mode debugging means application debugging. The debugger relies on the ptrace API that is built into the kernel. It is interesting to notice that over the last years there have been several significant improvements in the ptrace API.
Signal handling is far better since Linux 3.4, and security has been improved with seccomp since Linux 4.8. For using the ptrace API, as GDB does, you have to get the appropriate security credentials. If you have ever wondered what the mechanics are when you insert a breakpoint in your favorite debugger GUI, what happens behind the scenes is that the debugger inserts a special instruction into the process memory. That instruction is completely architecture-dependent, and its role is to generate a software interrupt when the program counter comes to it. The debugger will catch that software interrupt. As it maintains a list of the breakpoints and of the original instructions that were at their places, the debugger is able to step over the breakpoint. Maybe the most important thing to remember here is that the process memory is modified, and that can have significant side effects, as I will explain a little later. Hardware breakpoints use hardware support. The number of available hardware breakpoints and how to use them depend on the CPU. There are two types: instruction breakpoints, which, just like software breakpoints, hit when the program counter reaches a location, and watchpoints, which are very useful to detect or debug crashes due to altered memory. It is worth noticing that a watchpoint usually covers not a single address but rather a range of addresses. The main advantage of hardware breakpoints is that the tracee memory is not modified at all. Another important feature of a debugger backend is the single step. The single step consists in executing a single machine instruction and stopping immediately after. That feature may or may not be supported by the hardware. Usually the Linux kernel exposes it through the PTRACE_SINGLESTEP request, which can rely on hardware support or be emulated by the kernel itself. But sometimes there is no support at all, as was the case on older kernels on the MIPS architecture.
This is the trickiest case, because you have to figure out what the next program counter location will be, depending on the current instruction. As you can imagine, jumps are the trickiest thing to handle. Once you know the location where execution will go next, you have to put a temporary breakpoint on that location, resume your process until it hits it, and then remove the breakpoint. In other words, the tracee memory is modified twice, and that can lead to annoying side effects. One of the most complicated problems a debugger has to deal with is multi-thread debugging, especially when there is a lot of shared code and when you want to use thread-specific breakpoints. In both cases, you are in the same address space. When you put a breakpoint, it will eventually be hit by a thread it was not aimed at. When this happens, the victim thread must be resumed. But before modifying the process memory to do that, you have to stop all the other threads. You have to suspend them, or you will eventually lose the breakpoint for the other threads. As you can guess, things get even worse when you don't have hardware support for single step, and emulate it with temporary breakpoints. The reason why I have been working on the Linux debugger is that we wanted a debugger, within an IDE whose name was Tornado, with the same features as the VxWorks one. So everything began in 2005, with the first proof of concept of a unified software debugger for Linux. It was dual: it could perform both system-mode and user-mode debugging. The first proof of concept was done a couple of months after the beginning of 2005. It consisted of a set of ugly patches on top of a 2.6.x kernel. It was using the VxWorks WDB protocol, and it was using serial and network polling routines, because there was no netpoll API yet. And, almost unthinkable, it was able to spawn processes from the kernel context, which is unusual.
Nowadays, when the kernel wants for some reason to spawn a user process, it usually needs help from userland. After some months, the project was finally abandoned, and a split was made between system mode and user mode. KGDB was chosen for system mode. By the way, I would be interested in getting testimonials from the listeners if they know of people, or themselves, still using KGDB today, because I don't know many people who do. For user mode, the ptrace API was chosen, and we started to develop an application called the user-mode agent, which was leveraging, specifying and implementing the WDB protocol. It was using UDP, and was also supporting TCP and a serial backend. All these architectures were supported: x86, x86-64, ARM32, PPC32, MIPS32 and MIPS64. It was possible to get the tracee I/O in the debugger console through a virtual console. There was support for hardware breakpoints, for the ARM Thumb mode and interworking, and for both flavors of threads that were available at that date, which were NPTL and LinuxThreads. And the key feature, to my mind, was the possibility to have non-stop debugging; I mean stopping, stepping and continuing a single thread without having to stop the whole process. At the end of the project, we had finally achieved the following results. Our debugger was able to debug up to 100 threads with lots of shared code, and each thread could be debugged individually or not. A set of more than 2,000 unit tests for threads could be run from the host debugger. Facing the complexity of the single-step emulation problem, I decided to base my design on a finite state machine. That finite state machine was holding more than 80 states, and it was possible to design it with a GUI. The debugger was, and has continued to be, internally used by Wind River developers to develop the firmware of the company's JTAG probes. What you can see here is the finite state machine displayed in its GUI, with all its complexity. That tool is now quite deprecated.
It is no longer maintained; it was originally written by a French engineering school. I recently published the code of that application on GitHub, just so it is not lost, but it is not maintained at all anymore. The inconvenience of this design is that it does not suit system-mode debugging. Moreover, the problem of stepping over ghost breakpoints (ghost breakpoints are the ones that are not intended to be hit, because they are specific to another thread) led to performance issues with SMP. So the trick used by other projects was to step over a breakpoint by performing out-of-line execution: putting the executed instruction somewhere else, in a part of the address space that is not shared with the other threads. Let's go back to the present, if you don't mind. I will show some remote debug session examples. In the first setup, the communication will be done by a GDB running in the redpesk local builder. The local builder is a light container that has all the development tools, including the cross-compiler, a cross GDB and so on, and everything you need to package your application with the redpesk factory. It will connect to an instance of redpesk OS running in KVM. What you can see in the running video is that the KVM instance is booted. Meanwhile, the developer compiles a very simple application. As you can notice in the makefile, there is a separation between the application binary and the debug information. Once the target is booted, the gdbserver instance is launched with --multi, which means that we don't want to close the connection when the GDB session exits, and we specify a port. On the host, we launch GDB, and we attach to the target with the special "target extended-remote" command and the given port, which has been mapped to my host because the target is running in KVM. Then we put a breakpoint on main and launch the program. We can see that the breakpoint on main is correctly hit. The following example matches reality better.
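In command form, a gdbserver session of this kind boils down to the following sketch; the port number, binary name and target address are illustrative, not the ones from the demo:

```shell
# On the target: listen on TCP port 2345 and keep serving
# after each GDB session ends, thanks to --multi.
gdbserver --multi :2345

# On the host: drive it with a cross GDB.
gdb-multiarch ./myapp
(gdb) target extended-remote 192.168.0.42:2345
(gdb) set remote exec-file /usr/bin/myapp
(gdb) break main
(gdb) run
```

With --multi and extended-remote, the "set remote exec-file" command tells gdbserver which binary to spawn on the target when you type "run".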
In this case, we still have GDB running on the host machine, and this time it will connect to an instance of gdbserver running on an actual board, a Raspberry Pi 4. In the current video, I have booted the Raspberry Pi 4 and connected to it through SSH. First of all, we compile the simple application, having modified the makefile a little in order to invoke the cross-compiler, thanks to the CROSS_COMPILE prefix. We then upload the compiled binary to the target without pushing the debug information; the debug information file is kept on the host side. In the SSH session, we launch gdbserver just like we did in the previous example. We connect to it, put a breakpoint on main again, and execute the program. There is still the upload of the unknown target libraries from gdbserver to GDB, because it needs them to debug. We can see that the breakpoint is hit, that we can continue to get the program's output, and that's it. As you can see, there is no significant loss of performance or responsiveness, despite the fact that the target is less powerful than the host. In the next example, we will be debugging a much bigger process, still on the Raspberry Pi 4. This time, the process is the OpenCPN application. This is an application for displaying navigation charts. It uses X11 and OpenGL and a lot of chart libraries, up to 180 of them, and it is written in C++. This time, since it is a graphical application, I do SSH port forwarding for the X11 port. What you can also see is that there is completion for the C++ symbols. I am putting a breakpoint on ChartCanvas::ChartCanvas, the constructor of that class. Then, when the program is started, what happens here is a synchronization of the missing libraries. It takes a while, because there are a bunch of them, but this amount of time is acceptable because it only happens once.
After a while, the breakpoint we aimed at is finally hit, and we already see the first graphical windows coming up. I am then stepping, then exiting the current function, and at the end we have the application launched, displaying the map. By the way, this is the place where we live, close to the Isle of Groix. I wanted to make a kind of evaluation of alternatives to the GDB and gdbserver couple. I identified an interesting one, which is LLDB, from the LLVM environment. LLDB can work with remote connections as well; it has lldb-server. To my mind, the main advantage of this debugger is that it can perform an automatic upload of the debugged binaries. The synchronization to the target is done when the binary is recompiled, when it changes on the host. However, I got a very bad experience on the Raspberry Pi. It was possible to debug my simple sample, but it crashed when I wanted to debug a bigger application. When that happened, the lost connection was not detected. Also, the documented way to list and delete pending or actual breakpoints didn't work, and the documentation also seems outdated regarding the "process launch" command. As a conclusion, there is also completion for the lookup of symbols, including C++ symbols, but it is not user-friendly: instead of suggesting the symbols on a single line, it presents them on a whole page, in a piped-to-more way. So to my mind it is less practical than what GDB does. Another alternative to GDB and gdbserver that I identified is TCF, which means Target Communication Framework. The TCF project was initially launched at Wind River as a replacement for the user-mode agent. That project was launched by Felix Burton and is currently maintained by Eugene Tarassov. It is hosted and integrated in the Eclipse IDE. Unfortunately, I didn't have time to perform a lot of tests with TCF, as I wouldn't have had time to put that in the video, but I took a look at the code. It seems still maintained, as I said.
It supports a lot of backends, including backends for other OSes than Linux, and it has a very cool feature, which is auto-discovery via UDP broadcast. The TCF architecture is cool, but it is still unclear to me why they removed all the multi-thread logic that I had implemented in the user-mode agent. I have finished with what I had to say about Linux debuggers. I would now like to introduce something new for those who don't know it, which is the Zephyr OS, and to make a small demo of how debugging works on it. In short, what is Zephyr OS? Zephyr OS is a real-time OS aimed at microcontrollers. It has very nice documentation. Most APIs are very similar to the ones of the Linux kernel. It also supports device trees, but the device trees are statically compiled. Since we are on microcontrollers, most of the time we don't have Ethernet, and there is no implemented software debug agent. That means that for debugging you need a JTAG probe. The setup of the following demo is as such. The target will be a Renesas H3ULCB board. That board has several CPUs on it: four Cortex-A57, four Cortex-A53, and a Cortex-R7, on which we will run Zephyr OS. We plug an Olimex JTAG debug probe on the board, and on the host we run OpenOCD, which makes an abstraction of the JTAG communication and provides a GDB server to which we can connect a GDB debugger. OpenOCD offers one TCP port per debugged core on the target. It also has one TCP port for internal monitoring and management. In the coming demo, we will see how the Zephyr tools make it even easier, with the "west debug" command. The left terminal shows the project directory, where the command is just "west build" to generate the firmware to inject into the target. On the right, there is just a picocom running on the serial port. When you type "west debug", it performs several things.
It uploads the firmware to the board and directly connects GDB to OpenOCD. We put a breakpoint on main, we do a continue, we see that the firmware has started, and after stepping over the printk you can see that the "hello world" is displayed. So it is as simple as debugging a Linux process. Before concluding my presentation about cross-debuggers, I'd like to tell you a little bit about two tools that I find very cool and that have saved my life a number of times. Those tools are Valgrind and LTTng. I think that most people who have already used Valgrind know it for its memcheck tool, which is really useful when you want to track memory leaks for instance, but it also has profiling tools that are maybe less popular but that can help you track down bottlenecks in your code when you are facing performance issues. It is worth noticing that it works well, but that this kind of instrumentation has a big impact on performance, so don't expect your monitored code to be as fast as the production one. Using Valgrind is really simple. You just have to type the valgrind command, then --tool= with the tool that you want to use (in this example it is callgrind), eventually other options, then your program name and your program options. When your program ends, a callgrind.out.<pid> file is generated, which can be processed with the KCachegrind tool. KCachegrind is a graphical tool that shows the call graph of your functions, with the amount of time spent in each function. When should you use Valgrind? I would say all the time, and it should be in your CI. Special attention must be paid to the memcheck exclusion rules. For more information, go and see the Valgrind documentation. Another tool which is really cool is LTTng. I really like their logo and find their mascot very cute. The slogan of LTTng could be: instrument, trace and investigate. It can perform either kernel or user-space instrumentation.
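To give a taste of how lightweight getting started is, a minimal kernel tracing session could look like the following sketch (assuming lttng-tools and lttng-modules are installed and you have tracing rights; the session name is arbitrary):

```shell
lttng create demo                    # create a tracing session
lttng enable-event -k sched_switch   # record scheduler switches
lttng start
sleep 2                              # let the system run while tracing
lttng stop
lttng view | head                    # quick text dump via babeltrace
lttng destroy
```

The recorded trace can then be opened in Trace Compass for graphical analysis instead of the text dump.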
In my case, it saved my life in the past when I was chasing really nasty scheduling issues. Typical examples that LTTng can solve are when you are facing scheduling or priority-inversion issues, or when you want to chase IRQ storms. There are other well-known alternatives to LTTng, like ftrace, SystemTap and perf, but they are not as cool; I mean, they are not as easy to use. Maybe the strength of LTTng is that it can combine both userland and kernel traces, and it is especially useful when you have an application that relies on a dedicated kernel driver. LTTng is composed of three things: lttng-tools, lttng-modules, which are kernel modules, and lttng-ust. It is fast because it implements high-performance buffers based on the RCU technology. Starting with LTTng is quite easy. The documentation is very well written, and you don't have to follow it all. You need to add some specific kernel options: CONFIG_MODULES of course, but also CONFIG_HIGH_RES_TIMERS and, of course, CONFIG_TRACEPOINTS. By the way, CONFIG_TRACEPOINTS is not visible in kconfig, but it is activated implicitly by CONFIG_FTRACE. CONFIG_FTRACE just adds a 5-byte NOP operation at each function entry. On the slide, I show how with 4 commands you can get your first LTTng trace. You get one trace file per CPU core. For using that trace, you can either use the babeltrace tool or the best graphical application for this, Trace Compass, which was formerly an Eclipse plugin; it is still at eclipse.org, but it is now a standalone application. As you can see, the application shows which thread or which CPU is active at a given time, or when it is preempted by an IRQ, with some statistics. We have finally reached the end of the presentation. I hope it was interesting. I would like to thank everybody for their attention. We will now start the question and answer session. Feel free to ask anything you have in mind and I will try to answer.
If you come to France, please visit us. The next picture shows, once again, the place where we live. We will be happy to show you our beautiful country.