Okay, hello, welcome everybody. So we will have this talk about LTTng: Kernel and User Space Tracing in Debian, by Michael Jeanson. Please welcome him.

So, hi everybody. Today I'll talk about LTTng, which stands for the Linux Trace Toolkit: next generation, a set of tools for kernel and user space tracing. First of all, who am I? I am a software developer at EfficiOS, a consulting company and the main developer of the LTTng tool set. I am also a Debian maintainer, mostly for the LTTng packages in Debian and Ubuntu, and I do LTTng packaging work for other distributions as well.

What I'll be talking about today is basically what tracing is, for people who are not aware of what it is exactly, and then more specifically tracing with LTTng. So, has anybody here ever used tracing in a general sense, like strace or tcpdump? Well, I guess then that most of you are aware of what tracing is in a general sense. In the case of LTTng, you can see the system as a black box or a flight recorder for your kernel: it will log a variety of different types of events that you've configured. The data you can extract from a running kernel using LTTng is mostly syscalls and function entry and exit, which is classic debugging information. It also lets you enable and disable those events at runtime, so while your system is operating you can add or remove, live, the types of information you are extracting from it. It also has a very low overhead, so it can be used on a production system that is actually doing work. In that way it is different from most other kernel debugging systems, which are not recommended in production because they have a large impact on the system, sometimes even making it unresponsive or crashing it. One of the core design principles behind LTTng is that we want to have the least possible impact on the running system, and we never want to affect or in any way modify the behavior of the system while it is traced.

So, why would we use tracing? Most of the time, it's to debug problems that are not easy to fix. When you're at the point where you would use tracing, it's because you've already tried all the classic debugging tools that you use in your daily work. It can be used to narrow down bugs and latency problems. As I said before, it has a very low performance impact, and it is meant to be used in production on real workloads, so you can debug real problems while they are happening.

I'll quickly describe the different tools that are part of the LTTng stack. There are four main categories of software. The tracers are the components that collect the information. The control utilities drive those different tracers through a single unified interface. The viewers are the graphical or command line utilities used to visualize the information you've collected in the trace files. And there are also post-processing and analysis tools that are used to pinpoint information in traces.

The two tracers that are part of the LTTng project are, first, LTTng-modules, which is an out-of-tree kernel tracer. It is compatible with kernels starting from 2.6.32 up to the latest RC kernel. And I want to explicitly state that you do not need to recompile your kernel. That requirement is something that's been following us for a while, because the first versions of LTTng required heavily patching the kernel, and they were a bit frowned upon by most people because that involved too much manipulation of the system. But for many years now you have not needed to recompile your kernel: you just build some out-of-tree modules and load them into the kernel. If you are the kind of person who likes to roll their own kernels, you can also build the LTTng modules directly into your kernel image as built-in modules.
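To give a concrete idea of what driving the kernel tracer looks like, here is a minimal sketch of a tracing session using the lttng command line tool. The session name and the extra event added mid-run are just illustrative; kernel tracing needs root or membership in the tracing group.

    # Create a tracing session (trace files go under ~/lttng-traces/ by default)
    sudo lttng create demo-session

    # Enable some kernel events: all system calls plus two scheduler tracepoints
    sudo lttng enable-event --kernel --syscall --all
    sudo lttng enable-event --kernel sched_switch,sched_wakeup

    # Start recording and reproduce the problem you want to observe
    sudo lttng start

    # Events can be added (or disabled) while the session is running,
    # for example block I/O completions, without restarting anything:
    sudo lttng enable-event --kernel block_rq_complete

    sudo lttng stop
    sudo lttng destroy        # frees tracer resources; the trace stays on disk

    # Read the binary trace as a text log
    babeltrace ~/lttng-traces/demo-session*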
In Debian, there is the lttng-modules-dkms package, which you can install on all supported Debian systems. This set of modules is at least built and tested against all Linux stable tags starting at 2.6.38, all Linux RT tags, and all Ubuntu LTS kernels, including the LTS backports kernels. Adding all those kernels together, each time we commit a change to the modules we build them against approximately 1,500 different versions of Linux, to make sure that they at least build against those. We're planning to add some runtime tests sometime in the future, but right now it's just build tests; still, it gives us a good idea that it's going to work on most kernels.

The second tracer we support is the user space tracer, which is basically an in-process library that lets you trace your own applications. However, those applications need to be instrumented, so tracepoints have to be added to the source code, like logging statements. There are also agents for popular logging frameworks and interpreted languages: we have agents for the Java logging frameworks java.util.logging and log4j, and we also have an agent for the Python logging subsystem. The rest of my presentation will mostly focus on the kernel part, but the UST tracer is also available, and it lets you correlate traces from user space applications with kernel traces.
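As a sketch of what the control side of user space tracing looks like, assuming an application instrumented with a tracepoint provider named my_app and a Python logger named my.logger (both names are made up for the example), you enable its events much like kernel events, just in a different domain:

    # User space tracing does not require root; a per-user session daemon is used
    lttng create app-session

    # Events from a C/C++ application instrumented with LTTng-UST
    # ("my_app" is a hypothetical tracepoint provider name)
    lttng enable-event --userspace 'my_app:*'

    # Events going through the logging agents instead
    lttng enable-event --jul    'MyJavaLogger'   # java.util.logging agent
    lttng enable-event --log4j  'MyLog4jLogger'  # log4j agent
    lttng enable-event --python 'my.logger'      # Python logging agent

    lttng start
    # ... run the instrumented application ...
    lttng stop
    lttng destroy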
The control utilities are actually a single package called lttng-tools, which contains the main command line tool, called lttng, and also some background daemons that are used to collect traces and either write them to disk or stream them over the network to other hosts.

There are two main viewers. The first one is Babeltrace, the command line viewer, which basically gives you a text log from your traces. I forgot to mention that traces are written in a binary format, optimized for size and writing speed, so you cannot just open a trace file with a text editor; you need something like Babeltrace, which converts it to a human-readable text representation. It can also convert between different trace formats. I don't have the exact list, but if you use other tracing systems, it's possible you can use Babeltrace to import those traces as CTF traces and then correlate them with LTTng traces. The other viewer is called Trace Compass, the graphical front-end for LTTng. It's written in Java as an Eclipse plugin, but there's also a standalone RCP version, and it lets you visualize and explore traces. It also includes analysis tools, for latency and frequency analysis and a lot of other analyses.

And finally there's lttngtop, which is unfortunately broken in Debian at the moment, but it is a re-implementation of the top utility. Instead of reading the /proc file system it uses a live tracing session, so it has a lower impact on the system and can be used on a very heavily loaded system to extract top-like information with minimal impact.

The last kind of tool is the post-processing and analysis tools, the lttng-analyses project. These are basically post-processing scripts: the way you use them is that you record a kernel trace of your running system when you have some kind of problem, then you can take this trace to another system that has a lot of computing power and do offline analysis on it to debug your problem. At the end of this presentation I'll do a short demo of these tools.

Just a quick comparison with other tracing systems you may be familiar with. The first one is strace, which is probably the one used by most people, because there's not a lot of setup: it's a purely user space tool, so it's very useful. At the same time, used on a production system it has a very high impact, and it also sometimes causes side effects, because it intercepts syscalls between the kernel and the application you are tracing. There are also two other tracers included in the upstream Linux kernel sources, ftrace and perf. I won't cover them because I don't know them well enough, but if you want to understand what LTTng does and you are familiar with those systems, LTTng does mostly the same thing, with a focus on performance and low overhead.

What about SystemTap and the eBPF-based BCC suite? How do they compare to LTTng? Well, I know that eBPF is something new in the kernel. It is also a tracer, but it's more of a profiler: it mostly does statistics or sampling. It's more of a sampler than a tracer, but it has very similar use cases. It has only recently been included in the kernel, but eBPF is a really interesting project. I am not really familiar with SystemTap, though. And I didn't mention it before, but I welcome questions during the presentation, so feel free to ask.

So, LTTng in Debian. All the tools are packaged; there are two maintainers, myself and Jon Bernard. The latest versions of these tools are currently packaged in testing and unstable. Unfortunately, for older releases, stable has the 2.5 version of the stack, which is by now unsupported upstream, but we still fix it when bugs are reported in Debian. Hopefully 2.8 will soon be backported to stable-backports, so if you use Debian stable you'll be able to use the latest and greatest LTTng version. What's included in oldstable is really ancient stuff and I don't recommend using it; at that time the LTTng project was on a really fast release cycle, and you should not consider using that version.

As for LTTng in Ubuntu, the packages are the same as in Debian, and they are maintained there too. We also offer PPAs for those of you who use Ubuntu, of three different kinds. We have daily builds, built straight from the master branches, which are the packages most of our developers use. We also build the latest commits of the current stable branches in a separate PPA. And we have release builds, which are basically a kind of unofficial backports PPA where we build the latest packages from Debian unstable for the Ubuntu LTS releases.
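For reference, on Debian or a recent Ubuntu, getting the pieces I just described is mostly a matter of installing the usual packages; the list below is the typical set, and the PPA line is only a placeholder since the exact PPA names are on the LTTng Launchpad page:

    # Kernel tracer (DKMS builds the modules for your running kernel),
    # control tools, user space tracer development files, text viewer
    sudo apt install lttng-modules-dkms lttng-tools liblttng-ust-dev babeltrace

    # On Ubuntu, newer versions are available from the LTTng PPAs
    # (daily, stable-branch and release flavours; check Launchpad for exact names):
    # sudo apt-add-repository ppa:lttng/<flavour> && sudo apt update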
So, use cases for LTTng. First, debugging complex and hard to reproduce problems that involve different parts of the system. When you have a slowdown that you cannot pinpoint to a specific subsystem or a specific daemon, this is the kind of tool that is really useful, because you can extract information directly from the kernel and then analyze it to pinpoint latencies, which is usually the easiest way to localize your problem.

It's also used a lot in embedded development, with what we call remote tracing, where you stream your traces over the network: on a system that doesn't have local storage, instead of writing the traces to disk, they can be streamed over the network to a remote host.

There's also what we call snapshot mode, which is really useful when you have a server that runs fine most of the time, but every now and then, maybe once a week, a problem happens. You can run a tracing session in snapshot mode, which traces into a circular buffer in memory and never touches disk; when the problem occurs, you fire a trigger that says "write the buffers to disk". That gives you not only the events that happened after the problem but, depending on the size of the buffers you configured in the kernel, also the events leading up to it, which is really useful when you afterwards try to find the root cause. (A sketch of the remote streaming and snapshot commands follows below.)

It can also be used for low-level metric collection: you can imagine a scenario where you have a lot of hosts, you configure a network streaming tracing session on all of them, stream those traces to a central collecting host, and then feed those events into your existing metric collection systems. And finally, there's low-overhead top-like monitoring with lttngtop, which, as I mentioned before, can be very useful to get diagnostic information from a system that is overloaded at that moment.
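Those two recording modes map onto a couple of commands; the host and session names below are illustrative:

    # Remote streaming: on the collection host, run the relay daemon
    lttng-relayd --daemonize

    # On the traced (for example diskless embedded) system, point the session at it
    sudo lttng create streamed-session --set-url net://collection-host
    sudo lttng enable-event --kernel --all
    sudo lttng start

    # Snapshot mode: trace into in-memory ring buffers, never touching disk
    sudo lttng create flight-recorder --snapshot
    sudo lttng enable-event --kernel --syscall --all
    sudo lttng start
    # ... when the rare problem is detected, dump the buffers (the trigger):
    sudo lttng snapshot record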
The main workflow with tracing is, given a reproducible problem, to gather a trace and then, using the tools I mentioned before, pinpoint a specific moment in the trace where you see something interesting. Actually, when you do tracing, there are so many events in a trace that it's always difficult to find where your problem occurs, just because of the overload of information. So it's easier to start by enabling only certain events, and then, as you progress in your debugging session, add other tracepoints that you've identified as interesting.

So I'll try to do a short demo of the lttng-analyses tools. I'll use a trace I collected on a system running a simple PHP application which sometimes had high latency: the mean response time was around 20 milliseconds, but once in a while a request took 400 milliseconds. We'll try to find the root cause of this latency. The workflow, which I've already done on another system, is to run the command line tools to gather the trace, and then, either on that system or on another one, run the lttng-analyses tools on the trace files.

The analyses are written in Python, so they are quite inefficient; they require a lot of processing power and time. It's something we're working on at the moment, but the objective was to have a working system first and then optimize it as time permits. I've run these analyses on a really short trace, only 16 seconds long, and as you saw it still took quite a few seconds to analyze. On a real-world problem the trace would be a lot bigger and sometimes the analysis takes multiple hours to run, but that's something we're working on improving.

Okay, so can you actually see what's on the screen? Hopefully. What we see here are the top 10 processes performing I/O system calls while the trace was enabled. That's what you're supposed to see here; well, I guess you'll need to trust me on that one. So we should see the top 10 processes performing I/O, and then the same information on a per-file basis: there are reads to a database file, and we also see the LTTng processes themselves, which also do reads and writes. But that doesn't give us a lot of useful information for now, so let's try another analysis.

Okay, I'm now running the I/O latency statistics analysis. It's hard to tell on the screen, but I'm using a minimum size of two, which means it will filter out any syscall that transfers less than two bytes. Obviously, while I was recording this demo my shell was waiting on a read for a character in bash, so it polluted the trace with unrelated events, but we can filter them out with this kind of option. Here we see the I/O latency, which for a system call is the delay between the system call entry and the system call exit, or, for block I/O events, the delay between the block request and the block complete. Per process, we should see that there's a maximum latency which is orders of magnitude above the average latency for the open operation, which tells us that something wrong is happening on some open syscalls.

We can then use the frequency distribution analysis to try to identify these problematic latencies. Sorry, I'm trying to scroll. Here we see a frequency distribution of the open syscalls, and we see that there's only one system call whose latency is way off from the others: there are 1,414,000 syscalls that take less than 25 microseconds, and then there's one that takes over 400 milliseconds. So now we can try to pinpoint this specific latency that is so much bigger than the others. I'll use the I/O latency top analysis with a limit of four, which gives me the four worst latencies of the whole trace, with specific information about each of them. And I can see that my worst latency is an open syscall on a PHP session file that happened at a specific timestamp.

The next step is to find out what was happening at that specific timestamp. And obviously, because this is a demo and demos never work, even though I tested it before this session, this analysis script is broken on my laptop; fortunately I have the output from a previous test that I'll just show you. Here I've run the I/O log analysis, but with a specific time range which includes the timestamp of the latency I just found. And there I can see that I have a sync syscall which started before the open of the PHP session file and ended after it, so I can easily determine that this sync syscall is what caused the open syscall to take so much longer than the others. If I had more time, I could use other analyses to find the source of that sync syscall. So you can see how you can use a single trace and extract information from it to go backwards and trace the source of your problems.
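For reference, the sequence of analyses I just walked through corresponds roughly to invocations like the following; I'm writing the option names from memory, so check each script's --help, and the trace path and time range are only placeholders:

    TRACE=~/lttng-traces/demo-session*/kernel

    # Top processes and files doing I/O system calls during the trace
    lttng-iousagetop $TRACE

    # Per-syscall I/O latency statistics, ignoring transfers under 2 bytes
    lttng-iolatencystats --minsize 2 $TRACE

    # Latency frequency distribution, to spot the outlier
    lttng-iolatencyfreq --minsize 2 $TRACE

    # The four worst I/O latencies of the whole trace, with details
    lttng-iolatencytop --limit 4 $TRACE

    # Log of I/O events restricted to a time range around the outlier
    lttng-iolog --timerange '[15:30:00.000000000,15:30:01.000000000]' $TRACE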
So I'll welcome your questions if you have them.

Are the uprobes compatible with the ones from ftrace or SystemTap? Sorry? Are the uprobes compatible with the ones from ftrace or SystemTap? You mean the command line part, or? No, the user probes. Oh, uprobes. Yes, the LTTng kernel tracer will bind to all the tracepoints in the kernel, so kprobes, uprobes and everything that is instrumented in the kernel. We basically share the same instrumentation infrastructure as ftrace and SystemTap in the kernel; LTTng is just a different collector of that information. Yes?

In the example that you showed, you were able to pinpoint and demonstrate that, for example, Apache 2 was the process which took some time while working on a PHP session file. But if I had to dive deeper into the Apache subsystem, to find out exactly which routine or function call of Apache was at fault, does LTTng give a view of that, or is more work required when taking the traces? Well, I've only shown a really small subset of the different analysis scripts. All those scripts will also take PIDs, for example: your trace file includes everything from the system, and you can then limit an analysis to a single PID, or filter on several other types of information; I can't recall all of them off the top of my head, but you can do a lot of filtering. Also, if your application is instrumented with UST, you can have a trace that combines the kernel events with the user application events and do some more correlation. And the lttng-analyses code base is meant to be extended, so using that framework you could implement your own system-specific analyses on top of LTTng.
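To make that answer a bit more concrete: besides filtering at analysis time, you can also narrow things down at recording time and mix domains in one session. A small sketch, where the PID and the tracepoint provider name are made up:

    sudo lttng create apache-debug

    # Only record kernel events for one specific process (PID tracking, LTTng 2.7+)
    sudo lttng track --kernel --pid 1234

    # Kernel events, plus context fields to correlate events with processes
    sudo lttng enable-event --kernel --syscall --all
    sudo lttng add-context --kernel --type pid --type procname

    # User space events from an instrumented application, in the same session
    sudo lttng enable-event --userspace 'my_app:*'

    sudo lttng start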
Any other questions? We still have eight minutes, so please.

Is it possible to instrument based on DWARF symbols? DWARF symbols? I know that we've worked on adding debugging information from DWARF binaries, so you can add that, but I could probably answer your question in more detail if... I mean, I mostly do the packaging work, and the main developers would know a lot more about that, so I can relay your question to them if you'd like; I'll be happy to, rather than saying things that are not in my area of expertise.

Hi, you said it's a low overhead system. Yes. How low? Just a ballpark figure. Yeah, overhead is always something that's really hard to measure, but we ran some benchmarks, for example on MySQL. I don't remember which one exactly, but it was a MySQL benchmark suite, and we did runs with no tracing at all, with LTTng tracing and with strace. Usually the overhead on the main metric, requests per second, was one or two percent with LTTng enabled, and it scaled pretty much linearly as you added cores to the system. strace had a much bigger impact, maybe five or ten percent, and once you ran the benchmark on multiple cores the performance was a lot worse with those other systems, while the impact with LTTng stayed small. But yeah, there are multiple benchmarks, done by EfficiOS and by third parties, that you can easily find by googling for them.

It's an out-of-tree kernel module, and the way you have packaged it in Debian you've simplified things with DKMS, but in the longer run, are there any plans on getting it mainline? Not for the moment. Keeping it out-of-tree gives us more flexibility in what we can do, and we can control our own release cycle. And since the kernel usually aims to have only one system doing one thing, and there are already ftrace and perf, it would be a really hard sell to have LTTng included in the upstream kernel. Thank you.

We have to wrap up. Okay, thank you. So thank you very much. Thank you.