So good morning everyone. This talk is about system performance, covering the kernel as well as user space, and I'll be focusing a bit more on LTTng. So without any delay, I'll just start. I'm Suchakra, a PhD student at Polytechnique Montreal, which is affiliated with the University of Montreal. I work on debugging, tracing, compilation, and dynamic instrumentation. And that's about it.

So today we're going to talk about tracing. What is tracing? Before tracing itself, you need to know what group of tools tracing belongs to, what you can do with it, and why we really need it. We'll talk about some tracing tools like ftrace and SystemTap — we won't cover all the tools, but at the end I'll give you a pointer to what the best choice of tools would be if you want to analyze the complete performance of your system. The focus will be on LTTng: what its internals are, how it has been developed, and some details about it. And then the most important part: trace analysis tools.

Now you may be looking at this and thinking, "it's all tracing — what is this thing?" Okay, so I'll give you a brief introduction. Let's start with tracing 101. Tracing is part of a group of tools you can use to check system performance: you can do debugging, profiling, and all sorts of performance analysis, and tracing is one of those tools. But when we already have debuggers, when we have everything, why do we need tracing? I didn't really know what kind of audience would be here, so this will be a semi-technical, semi-introductory sort of talk.

I love bike rides, so here's a story about a bike ride. Consider that you have a bike, which is actually a program, and riding your bike is running your program. You are riding to your favourite vada pav place to get some vada pavs. While you're riding, you hear a sound — I can't make the sound properly, so just read it. You hear some sound, and then suddenly it's gone. Then it comes back again, and then, oh, it's suddenly gone again. So what's the problem? You want to diagnose it: it's your program running, and you want to know what's going on. You know there is some problem with the tire, something is wrong — but what exactly?

So you pause. This pause is your breakpoint: you put a breakpoint in your program, pause it, and start analyzing. What's this? What's that? Okay, let me put a breakpoint here and try to analyze. Then you continue riding your bike, and then you pause again — another breakpoint — and you analyze some more. The thing is, you're getting late now. And what does getting late mean? Obviously you will not have your vada pavs; the store is probably closed and you can no longer buy them. Not having the vada pavs means you had a program that was originally intended to give you vada pavs — if it had run properly, if you had biked properly and reached in time, you would have got them — but now you don't.
It means that the result of the program's execution is inaccurate; in a normal execution it would have been fine. So there are problems, but you cannot diagnose them, because when you try to use conventional tools, things start failing. So what do you do? You could ignore the problem and hope it goes away. This is a very good solution — it'll get you fired, but you can still try it. What else can you do?

Let's take a scenario: you take some paint and put it on your tires. These blobs of paint on the tires are actually tracepoints. So you've put in your tracepoints, you've enabled them, and then you ride. What's going to happen? As you start riding, the paint leaves a trace on the road, and the marks you see reflect exactly how you were riding. All of these marks are events. It sounds a bit ridiculous to think of events being enabled on a tire, but think about it like that: you run your program and it leaves traces, and those traces come with accurate times — these are the timestamps you see on every tracepoint you hit.

And this is actually all tracing is; that's the truth about it. You have a specialized technique, like the paint on the tire, you run your bike or whatever program you have, and you leave traces on the road. So now you can reach in time, get your vada pav, your program runs properly — and you can look at all the traces that were generated and analyze them later to find out what the problem was. Each trace mark also carries some data, the trace payload; for a tire, this would be something like the pattern of specialized grooves. And as you move faster, more trace data gets generated. It's not just one event: more and more events are generated as you ride. Still more, a lot more. So what do you do with all these events? It becomes very difficult to analyze them. Deal with it? No — we're going to do something about it, and I'll tell you what.

But before that, let's come back to tracing and try to analyze what a tracer is. I've laid out some characteristics and benefits here. A tracer should be fast: it should not alter the performance of your program, and it should not alter the program's output — that's the main goal. So a tracer is very fast, has low overhead, and should not interfere with the target. A tracer will obviously generate large amounts of data, and we have to figure out what to do with that data. A tracer also needs high accuracy: the timestamps have to be very accurate, so there has to be a mechanism to get the most accurate timestamps from the system in the quickest possible manner. One of the benefits is that you can use tracing to detect Heisenbugs — those hard-to-reproduce errors that occur sometimes and then don't occur. You can let a tracer run for a long, long time on your system, probably even a production system, and people do use it that way.
And then you can analyze your traces later on. A tracer can be classified in several ways — this is just how I think about it. You can have a kernel-space tracer, which traces things happening in the kernel of your system, and a user-space tracer for your normal user-space programs. And a tracer can be static or dynamic: static means you put the tracepoints inside your program before compiling it; dynamic means the program is already running, or the binary is on disk, and you insert tracepoints into it at that point and use them.

I'll also talk about the building blocks of a tracer — probably not all of them, but the most important ones. Debugger-style tracers can use the ptrace syscall; most debuggers do, and you can write your own debugger in maybe a hundred lines of code using just ptrace. But ptrace is slow, because it is inherently a syscall and you have to do the re-parenting of the traced process and everything. Inside the kernel you have much more intelligent techniques, such as the TRACE_EVENT macro — you should go to the kernel source and have a look at it once; we're going to talk a bit about it. And then we'll talk about kprobes, kretprobes, jprobes, and so on, which are facilities provided by the kernel so you can dynamically insert tracepoints into your code — dynamic instrumentation.

So let's talk a bit about the TRACE_EVENT macro. It gets a little detailed, so I'll mostly keep it simple. It's a macro in the kernel that automatically adds callbacks to a tracepoint call. You have trace_<event_name> calls sprinkled everywhere in the kernel; the macro just wires callbacks onto those tracepoints, so any other infrastructure built on top of TRACE_EVENT can receive the event data and do whatever it wants with it. It's used by almost all the tools that do tracing inside the kernel: ftrace, LTTng, SystemTap. A small TRACE_EVENT definition for the sched_switch event looks roughly like this: you give the event name, a function prototype, and you specify how to store the data. It's a very detailed process; if you look at the kernel source code you'll find a lot of existing tracepoints defined this way. We'll keep using sched_switch as the example, so the presentation stays consistent.

There are a few more macros involved when you define tracepoints inside the kernel — it provides a very clean way to add your own. You define TRACE_SYSTEM for your subsystem and then use the CREATE_TRACE_POINTS macro. And here is the tracepoint actually being used inside the kernel: if you go to kernel/sched/core.c, the sched_switch event sits inside prepare_task_switch(), so the moment a task is about to be switched, you get an event right there. At any given time there are thousands of tasks being switched again and again in the system, so you get all of those trace details. A trimmed-down sketch of such a TRACE_EVENT definition follows.
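As a rough illustration, here is a simplified version of what such a definition looks like — modelled on the real sched_switch definition in include/trace/events/sched.h, with several fields (priorities, task state) omitted for readability. In the kernel this lives in a trace header that also sets TRACE_SYSTEM and is pulled in once via CREATE_TRACE_POINTS:

```c
/* A trimmed-down sketch of a TRACE_EVENT definition, modelled on the
 * real sched_switch event in include/trace/events/sched.h. */
TRACE_EVENT(sched_switch,

	/* Prototype of the trace_sched_switch() call sites */
	TP_PROTO(struct task_struct *prev, struct task_struct *next),

	TP_ARGS(prev, next),

	/* Layout of the data stored in the ring buffer for each event */
	TP_STRUCT__entry(
		__array(char,  prev_comm, TASK_COMM_LEN)
		__field(pid_t, prev_pid)
		__array(char,  next_comm, TASK_COMM_LEN)
		__field(pid_t, next_pid)
	),

	/* How to fill that structure when the event fires */
	TP_fast_assign(
		memcpy(__entry->prev_comm, prev->comm, TASK_COMM_LEN);
		__entry->prev_pid = prev->pid;
		memcpy(__entry->next_comm, next->comm, TASK_COMM_LEN);
		__entry->next_pid = next->pid;
	),

	/* How to pretty-print the event, e.g. in the ftrace output */
	TP_printk("prev_comm=%s prev_pid=%d ==> next_comm=%s next_pid=%d",
		  __entry->prev_comm, __entry->prev_pid,
		  __entry->next_comm, __entry->next_pid)
);
```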
That's obviously a huge amount of data. Other tools can then hook onto these tracepoints. ftrace, for example, has ftrace_event_reg(), which uses tracepoint_probe_register(): it registers a probe on the tracepoint and can then consume it. LTTng uses a similar technique — some wrappers around tracepoint_probe_register() — so you can call the LTTng wrapper in your own module and instantly get your own tracepoint enabled inside the kernel.

Then there are the dynamic tracing approaches, which go through kprobes and jprobes. With kprobes you can trap into almost any kernel function: the target instruction is patched with a breakpoint, and you get a handler for that trap where you can do whatever you want and gather some data. This can be optimized with jump optimization instead of taking a trap every time. There are also jprobes and kretprobes you can go through, but that's enough — I'm not going to go into the details now. I can see the sympathy in your eyes: "oh please, what is this guy doing?" That was just too much detail, so let's quickly jump to the tracing tools and what can actually be done.

One of the famous tools, which almost everybody knows and which is the kernel's own favourite, is ftrace. ftrace is provided through the debugfs interface, and here I'm showing a small script of how you can use ftrace to get a function graph, using the function_graph tracer. There are several tracers available — nop, wakeup, and so on — but I'm going to show you the one I really like the most. This script does a tiny function trace of whoami: when the whoami command executes, ftrace builds a complete function graph of it. The output looks something like this — it's just a snippet, there's a lot of it. It shows what went on in the kernel: which functions were executed, where the tracepoints were, and how much time each function execution took. ftrace records the entries and the exits and calculates the delta, so you can see the latency of each call.

So ftrace is really powerful, but it's a bit difficult to use directly. That's why there are tools built on top of it, like trace-cmd, done by Steven Rostedt, and KernelShark, which gives a good GUI. For example, if you want sched_switch events, you just do trace-cmd record for the sched_switch event while running whoami, and then trace-cmd report prints it in a nicer fashion. This small snippet shows the process name, the CPU from which it was switched, and the target process it was switched to. That's how you can use ftrace; a rough sketch of the commands is below.
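Here is a minimal sketch of both approaches — assuming debugfs is mounted at /sys/kernel/debug, a kernel with the function_graph tracer built in, and root privileges:

```sh
# Raw ftrace via debugfs: function-graph trace around a single command
cd /sys/kernel/debug/tracing
echo function_graph > current_tracer
echo 1 > tracing_on
whoami
echo 0 > tracing_on
cat trace | head -40        # entries/exits with per-function durations

# The same idea with trace-cmd, hooking the sched_switch tracepoint
trace-cmd record -e sched:sched_switch whoami
trace-cmd report | head -20
```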
And then there is SystemTap, the favourite of the Fedora folks. SystemTap is quite good — I really like it, it's a very feature-rich thing. I have done some analysis and it's a bit less scalable, but you can still use it very well on the Fedora infrastructure, or on Red Hat, where it works really well. The best part is that it's quite similar to how DTrace was: you write SystemTap scripts, they get translated to C and compiled into kernel modules, you execute those modules, and the data is sent back. So it has some interference, but it's still very good, because you can write a script to do almost anything and it will just work.

For example, you can do probe syscall.read.return: for every read syscall that is made, it finds the time it took, and then it can print a histogram — there are very cool features like this, and you get a nice histogram as output. There are more probe points you can use: kernel.trace() hooks onto the kernel tracepoints, so you can name whichever event you want, like sched_switch; and kernel.function("sys_open") puts a probe on a kernel function, after which you can do whatever you like inside the probe handler, just as in the script. You can even trace user space directly: through the stapdyn backend, which is Dyninst-based, or through the uprobes-based backend. So you can do probe process("...").statement("...") — if you've built everything with debugging information, you can say "at this source file, at this location, I want a probe to be set, and this is what I'll do inside it". It's a really powerful tool.

Then you have GDB tracepoints — yes, I know, probably a lot of you didn't know that you can do tracing inside GDB too. You do trace foo, where foo is some function, and you specify actions. For example: collect the registers and locals first, and then, while stepping through the next nine instructions, keep collecting the registers. Then you do tstart to start tracing, do whatever you want in your program, and tstop to stop. Afterwards you use tfind to find a trace frame, dump it, and analyze it — it gives you the values of the registers and local variables, because we asked it to collect them — and you can switch between frames, say to the fourth frame, and analyze each one. So you can do really cool stuff with GDB. There is also an option of fast tracepoints: instead of the normal tracepoint technique it inserts a jump, but it's specialized — the instruction being patched has to be long enough to be replaced by the jump, so on most architectures it doesn't work at every location. A sketch of such a session is below.
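As a rough sketch of the session just described — foo is a made-up function name, and tracepoints generally require a remote target such as gdbserver:

```
(gdb) trace foo                 # set a tracepoint on function foo
(gdb) actions
> collect $regs, $locals        # grab registers and local variables
> while-stepping 9              # and for the next nine instructions...
  > collect $regs               # ...keep collecting the registers
  > end
> end
(gdb) tstart                    # start the trace experiment
(gdb) continue                  # let the program run
(gdb) tstop                     # stop collecting
(gdb) tfind start               # select the first recorded trace frame
(gdb) tdump                     # dump everything collected in this frame
(gdb) tfind 4                   # switch to the fourth frame and inspect it
```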
Now I'm going to talk about the most important thing: LTTng. LTTng is a really awesome tool. It has very low overhead, and it's built upon the kernel's existing tracepoint and trace-event infrastructure. The kernel part is provided by lttng-modules, and the user-space part is provided by the lttng-ust library.

The workflow is: you create a session, enable an event, start tracing, do something, and then do lttng stop, view, and destroy. This generates lots of data — very important data, but just too much data, you know that by now. In the output you have a timestamp, a delta, and then the event you asked for; here we wanted sched_switch, a kernel event, so it gives you all the data about the switch. That's what you get when you do lttng view. The data is stored in the Common Trace Format (CTF), and when you do lttng view it uses a tool like Babeltrace to parse the whole thing: it parses the metadata, pulls all the event data out of the CTF-formatted files, and makes it readable. You can also send this data anywhere else. But it's too much information, I know — we will soon see some trace analysis tools.

So what goes on inside LTTng? One of the important pieces is the LTTng session daemon, which is always running, and you give it the commands we just saw. If there is a user-space application with tracepoints inserted statically — or possibly dynamically too — and the kernel tracepoints are enabled as well, the session daemon recognizes them and sets up a consumer daemon attached to some shared memory. The trace data is written directly into that shared memory, and the consumer daemon consumes it. It has several outputs: the trace data can be stored on disk, or relayed over the network, and yes, you can even do live analysis — we're going to see that. Then on top you have the trace viewing tools: Babeltrace, and Trace Compass, a GUI tool. I'm going to show you how it works, because the data is huge and we have to do something about it — that's the interesting part.

Here's an example of UST, user-space tracing. You have an instrumented app, and because you built it with the tracepoints, a UST listener thread runs alongside it. When the session daemon sees it, it uses Unix sockets to make the links between the session daemon and a consumer daemon — these two daemons are the core. Then it brings up the shared memory I already showed you, which is basically a ring buffer: you fill it, and you consume it as it's getting filled, and that is what makes this thing really scalable. How exactly does UST work? You can talk to me afterwards — it would take a while — or go to the LTTng docs and read about it there.

Some more cool features of LTTng: trace snapshots. You can do lttng snapshot record and it will instantly take a snapshot of the trace buffer, which you can store on disk or send over the network — a really interesting feature. You can use it in flight-recorder mode: the tracer constantly writes snapshots into the buffer and overwrites them periodically, just the way a flight recorder in an airplane works. It's a really interesting feature — you save a lot of disk space, and you can do live analysis when you look at what you captured. And you can trigger a record from events: a small use case, if you have Fedora's automatic bug reporting tool (ABRT), is to trigger a record and take a small snapshot every time something happens. There's a kernel panic? Trigger a snapshot. The command sequences are sketched below.
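To make this concrete, here is a minimal sketch of both workflows — the session names are made up, and this assumes root privileges (or membership in the tracing group):

```sh
# Basic kernel tracing session
lttng create demo                       # create a tracing session
lttng enable-event -k sched_switch      # enable the kernel sched_switch event
lttng start
sleep 2                                 # ...do something worth tracing...
lttng stop
lttng view | head                       # pretty-print the CTF trace via Babeltrace
lttng destroy

# Flight-recorder style: snapshot mode keeps an in-memory ring buffer
lttng create flight --snapshot
lttng enable-event -k sched_switch
lttng start
# ...when something interesting happens (an alert, a bug report)...
lttng snapshot record                   # dump the current buffer contents
lttng stop && lttng destroy
```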
So you can record alerts in infrastructures like Nagios and then correlate those logs with the traces you have received from the system — it's easier to do it that way. It's remote, you can do remote analysis, and it's very lightweight. Then you have the live trace view, where you can use the LTTng relay daemon: the consumer daemon takes all the events, and the relay daemon can write them to disk as well as relay them onward, so you can observe the trace from any place. For example, if you have your Fedora infrastructure servers out there, with LTTng sessions running on all of them, they can send their trace streams directly to a relay daemon running somewhere else, on some other infrastructure, and you can live-view any of those traces from any of those servers. This is one small use case; a rough sketch of the setup is below.
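A minimal sketch of that setup, assuming a viewing machine reachable as collector.example.com — the hostnames and session name here are made up, and the exact attach URL comes from listing the sessions first:

```sh
# On the viewing/collection machine: run the relay daemon
lttng-relayd --daemonize

# On each traced server: create a live session streaming to the relay
lttng create web01-live --live --set-url=net://collector.example.com
lttng enable-event -k sched_switch
lttng start

# Back on the viewing machine: list live sessions, then attach to one
babeltrace --input-format=lttng-live net://localhost
babeltrace --input-format=lttng-live \
    net://localhost/host/web01/web01-live
```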
Then the most interesting part: there are thousands and thousands — millions and millions — of events generated, and we have to make sense of them. How do we make sense of all that data? You have tools for trace viewing and analysis — that's what we are going to do now; we're going to deal with it. For CTF, as used in LTTng, the default viewer is Babeltrace, and the most comprehensive one is Trace Compass, an Eclipse-based tool. There's an RCP build of it as well, and you can do live views and remote views inside it, and very detailed analysis — I'm going to give you a demo. You can download the latest Trace Compass RCP; I have the links on the slides. There's an experimental critical flow view, done by my colleague Francis Giraldeau: it gives you the critical path of what was happening in user space as well as in kernel space, and what the processes were doing, so it gives you a good correlation of how your process was faring as the execution went along.

So I'll give you a small demo of that. This is the experimental build — it's taking some time, and it's experimental, so if it crashes, don't say anything to me. What I'm going to show you is an experiment Francis did some time back: he traced apt-get install tree — he did this trace in December 2013, I think. And the good thing about traces is that you have all this data, you can give it to someone, and they can analyze it — you just generate the traces and analyze them somewhere else, or hand them to someone. So I took his traces and fired up Trace Compass. And to compare — it's probably not a fair comparison, but I just wanted to see what's going on — I did a dnf install sl; sl is a nice little tool to try out. These traces tell you what happened in the whole system: this is the complete list of events as they occurred — this one, for example, is a sched_wakeup — and you can see how everything happened.

This is your control flow view: how control was moving through the system. You have all these processes; say we're only interested in apt-get, so I uncheck everything, search for apt-get, and check its subtree. So this is apt-get here; let me zoom out — this is the complete execution of apt-get, it started somewhere around here, and its parent was a sudo. There was a clone here, there was a select — you hover over it and you can see it's a sys_select syscall — and then the clone syscall was made and control was passed on. You have the legend here for the colours: green is user mode, blue is a syscall, yellow is blocked waiting. So you can quickly scan the complete trace of apt-get while it was running and get a sense of what was happening: there's lots of green, user-mode things are happening — oh, and then there's a big wait-block; we see it stopped somewhere here.

So Francis analyzed it a bit more, with the critical flow view, a pretty nice view which gives you the complete critical path of what your process was doing. You can see here there are so many mandb processes that got started — it's really ugly. This here is actually your apt-get, which started here, and all of this part is synchronized with the other views: if I select this small part, I get the time delta below and can analyze that interval, and see what happened in the rest of the system at the same moment. And if I zoom in here — oh my god, something is really going on here. Instantly you know there's an issue: why is mandb communicating so much with apt-get? So you can analyze what's happening in your complete system right there. The legend for this colour code is a bit different: these are timers, and this is network. It would obviously fetch something from the network, so probably somewhere near the end there was some network interaction — and then it's interacting with dpkg as well; you can see here it's talking to dpkg for a while. You can hover over any of this and get more details about each event, down to the timestamp at which it was running.

So that was apt-get; now I'll show you how dnf looks. I loaded the dnf trace — we just have to find where dnf is — and here is its execution. There are some things I observed; I did not check the dnf source code, but there is an insane number of syscalls being made — so many clone syscalls — and I'm not sure what's going on. Probably it was a problem with my system, or it's really a problem in the tool; I'd have to look at the source code. You can select a zone here and then quickly click on Resources, and the resources view tells you what was being scheduled on which CPU, and which IRQs fired. If you zoom out you see lots of soft IRQs, so there is really something going on at this moment.
So you have a good resource view, you have the control flow view, and you have the whole event list here — you can quickly browse left and right and see all the events and what was happening. And the interesting thing again is the critical flow view: you see dnf is actually doing pretty well — this is my system, and it was probably waiting on the network. There's a timer running here; it tried to do something here, there was some block access here — probably reading the repo files from the disk to see what's going on — then you have the normal execution of dnf, then another timer, and here you see some network interaction, so there is network activity: probably at this point it was fetching data. And then it's writing something to the disk here — you see some block-device writes being done — and this is the journald stuff, writing here, probably logging the sl install. And you can select any part of it and analyze what was actually going on in the system at that time. So it's a pretty good tool: Trace Compass gives you very, very detailed analysis. This was just a small sample of what you can do — there's much more; contact me later and I'll give you another demo.

Coming back to the slides: we talked about the live trace view. Yes, there's a viewer called lttngtop which gives you a live view of the traces on your systems: you can pause the live trace, go forward and backward, and look through the events. I'll give you a small demo — I'm running lttngtop -k here, just a second... is it visible now? So I can do something like this, and it gives you details about your CPU execution: you have a CPU top, a perf top, an I/O top. These are traces which are actually being generated right now, and you are seeing them live. We were talking about sys_open earlier, so I tried that here: it uses kprobes, and whenever there is a sys_open it shows up. You can pause the stream with p as the traces are being generated — all the details are on the right — and then you can go back in time and see the traces it just recorded, and go forward in time again. There's a lot more to lttngtop; it's still in development, version 0.3, but it's reasonably stable.

Which brings us almost to the end of the talk. One more thing: this is an amazing slide by Brendan Gregg — I follow him and read his books, and he does really good stuff. I have printouts of this; I can distribute them later if you want, and whoever works on system performance tools should pin this at their desk. It gives you a complete view of all the tools that can be used, depending on which part of the system you are trying to diagnose. It covers static analysis tools as well, and very low-level tools like rdmsr, which reads machine-specific registers directly; for libraries, for example, you have ltrace; and then perf, ftrace, SystemTap, and LTTng, which you can use throughout the stack. These days I'm actually working on eBPF, which is the upcoming thing in this space. And then —
you can learn more about system performance here; these are the references. Thanks to the EfficiOS folks in Montreal, the DORSAL lab, and of course the Fedora community for bringing me here. Don't forget lttng.org/docs — the docs are really awesome, the LTTng folks are working really hard. It's an open source tool, my lab works on all the open source stuff, and you can just contribute to it — it's really an amazing place to be. If you have any questions, you can find me as Suchakra on #lttng, and this is my email ID and my details. So that's it. Questions?

[Audience] Can someone please bring the mic there? When compared to SystemTap — for example, in Red Hat we use SystemTap to probe something, a specific crash in specific areas of the code, and we have to insert the probe before the code is run: the SystemTap script has to be compiled and inserted beforehand. So in terms of advantages between SystemTap and LTTng, what other features does LTTng have, or does it have something similar to SystemTap?

[Speaker] I assume you're talking about dynamically inserting probes. LTTng is an ever-evolving project, and there is some work going on for inserting tracepoints dynamically — I did some part of it myself, and some of it is a little bit stable now. But the issue I saw myself is that if you have thousands and thousands of tracepoints and you insert and enable all of them, it takes a lot of time. What I observed while working on it is that the main benefit of LTTng is that it's insanely scalable: I have tried it from zero to 64 cores and it stays very scalable — a few events are dropped sometimes, but it's really scalable, and it's really fast. So the tools probably don't have the same goal. Obviously, in one sense they do — all tools want to be the best — but some tools have this feature and some have that, so you have to choose properly. Here I just have to do a small check, so I can probably use SystemTap, insert a probe, and write my own script; there I want a very detailed analysis on a production system — I don't want it to crash at all, I want minimum latency — so I instrument it statically with LTTng. There are thousands and thousands of tools, and it's more about selecting the right one than about a competition between tracers over which is better — that's how I see it. Probably the goal is eventually a very unified system, but everything is still in development; this is still a new phase.

Thank you — I'll take more questions later on if you need. Thank you very much; I'll post the slides on Twitter, so you can get them there. Thanks a lot.