So today I'm going to talk to you about how we can use tracing to solve complex problems in production. The topics we will cover are the different kinds of tracing, because there is usually a mix of things that can be done with trace buffering, aggregation and sampling. Then we will talk about what LTTng is and why we want to use it for these cases. And then we will discuss the four main modes of LTTng, how to configure them, and why they are useful for various use cases.

My name is Julien Desfossez and I'm a software developer. We develop and support LTTng, so I work on LTTng and the related tools such as Babeltrace. I'm also the author and maintainer of the latency-tracker and LTTng analyses projects; we'll talk about those during the presentation.

The first kind of tracer I want to talk about is trace buffering. That's usually what we refer to as tracing: it's a fast and efficient logger. You instrument the code and it generates logs, but the difference from normal logging is that it usually generates a much larger amount of data, and that's why we use buffering. We want to extract as much information as possible and get the complete picture of what is happening on the system. It's usually application-specific instrumentation created by the developers, so it's really good for debugging. The common trace-buffering tools on Linux are ftrace, which is in the kernel; perf, which can be used for trace buffering too, although that's not its main purpose; and of course LTTng.

The trace-buffering use case is to understand complex problems: you need as much information as possible because you want the low-level data, and then you process the trace offline, read the data and see what was happening. It can be used to solve concurrency issues, race conditions, driver problems and other low-level problems. There is also hardware tracing, which does this kind of trace buffering. But tracing is usually considered the tool of last resort, because by the time you end up using a tracer, it's because you didn't find a solution with simpler tools first. That's usually why tracers are not that popular and why you always have to re-learn the tool, because you don't use it every day. But with what I'm going to show you today, you will see that it can also be used proactively, in combination with cloud and monitoring use cases.

The second type of tracing is trace aggregation. If you have heard about DTrace or SystemTap, that's what they do. It's the same source of information, usually tracepoints in the kernel or manual instrumentation, but instead of logging the data you aggregate it to extract metrics, and then you output reports: averages, min, max. It can also be used to produce a profile of the I/O, or of the kind of I/O operations for a specific file. It can be configured to help you understand one subsystem, but it only reports after the run: when you stop the tool or the program, you get a report and that's it. You cannot go back in time and see what was happening; you just have this aggregated data. The common aggregation tools are SystemTap; eBPF, which we can use with BCC, a framework that helps create user-space programs that load eBPF code; and latency-tracker, which is the tool that we work on. We'll talk about that later.
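To make the aggregation style more concrete, here is a small illustration using one of the prebuilt BCC tools; this is just a sketch, the install path varies by distribution, and biolatency is only one example of a tool that aggregates tracepoint data in the kernel instead of logging every event:

    # Aggregate block I/O latencies into a histogram, printed every second
    # for 10 seconds; no per-event log is kept, only the summary report.
    $ sudo /usr/share/bcc/tools/biolatency 1 10

A trace-buffering tool such as ftrace or LTTng would instead record every single block request, so you can replay the sequence of events offline.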
The third kind of tracing tool I want to talk about is sampling, or profiling. It just takes the current activity of the system at a particular point in time. It can be time-based, or it can be triggered when a counter overflows: you could say, for example, every so many thousand instructions, record what is happening on the system at the moment the counter overflows. It's used to extract statistics, but mainly hotspots: if you have a CPU problem, you want to see exactly what was running on the system, and it really helps you find where the activity was and where your bottleneck was.

So, LTTng. The main component is a fast kernel tracer, about as fast as ftrace. The main difference is that LTTng also extracts the payload of the system calls: ftrace outputs the values of the parameters, but if there are pointers in there it doesn't dereference them, while LTTng does. It's also a fast user-space tracer that doesn't rely on a system call for each tracepoint: compared to tracing or logging schemes where you issue a write system call every time you want to output a line, LTTng writes to buffers in memory, and a consumer process takes care of extracting the data. It's designed to run in production environments, so it's not just a debugging tool for when you have no other choice. And it ships with a lot of tools for post-processing the traces, in text or graphical form; we will see some examples later.

LTTng can generate a lot of events. If you enable all the kernel tracepoints on a workstation that does nothing, you generate about 54,000 events per second. If you run the same configuration on a server that's busy, it will generate around 2.7 million events per second, which is about 95 megabytes of trace per second. That's a lot of lines to process afterwards, so the default mode is maybe not for everyone, and we will see why we have the other modes and why they are useful.

The first mode, like I said, just extracts the trace buffers and writes them to disk or to the network. You can have an LTTng relay daemon that just waits for the data packets to come in, so if you don't have space on your local system you can send the trace to a remote machine. You are only limited by disk space and by your willingness to process the traces. The main use case is when you want as much information as possible: you don't know what you are looking for, you don't want just a thin slice of what the system is doing, and maybe you just don't know when the problem happens, so you want to record everything and do the post-processing afterwards. It's also used in continuous integration systems, where you can record a trace while the tests are running and do the post-processing somewhere else.

To create this kind of tracing session, you just create the session, enable events, start the session, wait for the problem or the activity to happen, stop the session, and then you can process the trace. You cannot process the trace while it's running; you really have to stop it before being able to read the data, so that can be a pain point.
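As an illustration, that workflow looks roughly like this with the lttng command-line tool; the session name and the particular events are only examples:

    $ lttng create my-session                        # create a tracing session
    $ lttng enable-event --kernel --syscall --all    # record all system calls
    $ lttng enable-event --kernel sched_switch       # plus scheduler activity
    $ lttng start
    # ... wait for the problem to happen ...
    $ lttng stop
    $ lttng destroy                                  # tears down the session, keeps the trace
    $ babeltrace ~/lttng-traces/my-session*          # read the trace offline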
As a real-world example of this mode, there is a full write-up with all the details (the link is in the slides): users complain that the website is slow, and you don't know why or when, just that sometimes they complain. Standard monitoring tools really don't catch this kind of activity: they just take a sample every minute of, say, how many requests per second, so you only know the average; you really don't see the outliers in your data. That's where this kind of observability tool comes in really handy, because now what we do is record every I/O operation, so it works more like a flight recorder. We can specify the processes to watch, and when someone complains about a problem, we can go to the tool, run some post-processing scripts and extract really detailed information.

This slide is a bit dense and I can't really read it all now, but just to give you an idea, it's the I/O latency output: it takes the trace and extracts the worst latencies for the open, read, write and sync kinds of operations. The first one shown here is the open of a PHP session file, so that's the website trying to create a new session on disk, and it took around 500 milliseconds to open that file. If we look at the bottom, we also see that there is a sync happening on the system at the same time: the disk is busy, and the Apache server that is creating a session file is just hanging. That's the kind of short stall that we wouldn't see otherwise, but now it's possible. Look at the blog post if you want more details on how we found this and how to explain where the sync comes from.

The second mode I want to talk about is live. It's literally the same process to create a session, but the main thing is that you can attach to the session and process the trace while it is still running. That's a big difference, because in the other case you have to stop the trace and then process it; with live you can attach to it and start processing the data right away. Another thing we can do, and this also works in normal mode, is limiting the trace file size: if you don't want to keep the full history on disk, you can limit the disk space used by the trace. The session keeps writing to disk, but at least you have a bounded disk usage.

Live mode in itself is useful for low-throughput logging, because you don't want 54,000 events per second scrolling by on your console. If you configure a few very specific events that are really low throughput, you may want to attach to a live session and see the events as they happen. It's also used for distributed or embedded systems and for monitoring: you send the trace to a relay and then you do your analysis on the relay, so you don't have too much impact on the system being traced; it just traces and sends packets, and on the other side you have a system that processes the data. The big difference from the first mode is that you have to pass the live option on the create operation, and then you can attach while the trace is running instead of having to stop it first. And of course you can also send the trace to a remote server. For the bounded disk usage, you configure your channels: in this example, 4 files of 10 megabytes, so we rotate over those 4 files for each CPU on this channel, and that way you have at most 40 megabytes per stream of disk usage, and of history.
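A rough sketch of that kind of setup; the relay hostname, channel name and sizes are placeholders, and the exact option syntax should be checked against your LTTng version:

    # On the analysis machine: run the relay that receives the trace
    $ lttng-relayd --daemonize

    # On the traced machine: a live session streamed to the relay,
    # with 4 trace files of 10 MB per stream to bound the disk usage
    $ lttng create web-live --live --set-url=net://relay-host
    $ lttng enable-channel --kernel --tracefile-size=10M --tracefile-count=4 small-chan
    $ lttng enable-event --kernel --syscall --all --channel=small-chan
    $ lttng start

    # Attach a viewer while the session is still running
    $ babeltrace --input-format=lttng-live net://relay-host/host/$(hostname)/web-live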
One of the most interesting modes is the snapshot mode. Instead of having the trace buffers in memory and then extracting the data straight to disk, in this mode we stay in flight-recorder mode by default: we just trace to the memory buffers and that's it, it just writes events, and when new events come in we overwrite the oldest ones. So it's low overhead, because instead of doing I/O we just write in memory, and when you want to extract the trace you just ask for a snapshot record: it takes the content of the buffers in memory, writes them to disk, and then you can do the post-processing.

So you can configure triggers in your applications or in any monitoring tool: when you detect something, you just record a snapshot and you get the backlog of maybe a few seconds before the incident happened. You can also hook this into core file generation: when a core file is generated, you can trigger a snapshot record and you get the full activity right before the segmentation fault. Of course, since we are writing in memory and overwriting the data, the size of the snapshot depends on how much memory you want to dedicate to the tracing session. That is something you can configure, and depending on your event rate and event size you will end up with a bigger or smaller history in the snapshot.

The use cases: for test cases, when you know when the problem is happening, you can trigger the snapshot yourself. It can also act as a profiling tool, because you can say "every 5 minutes, I want the history of the last 2 seconds and see what was happening on the system". And of course continuous integration can also use it, because when a worker hits an error you can just record the snapshot and at the end you will be able to analyze the failure.

The difference from the default mode is that on creation you pass the snapshot option, and that's it; then you enable events and start the session as usual, and you record a snapshot whenever you want one. Recording a snapshot does not flush the data from memory, so if you record a snapshot twice in a row you just get approximately the same snapshot.

For a complete example: we are monitoring the website with the latency-tracker, which is an aggregation tool. It uses tracepoints and computes the time to first byte, between when the server receives a request and the actual send of data, and we just track the latency of this operation. From the graph we see some spikes in this time to first byte. That's fully automated at this point: compared to the first example, where we had to wait for users to report problems, here the aggregation tool detects the spikes, and with the snapshot mode we can extract a snapshot every time a spike is detected and then post-process just a small trace, with about one second of history from just before the spike. So on the dashboard we have metrics for each latency measured by the latency-tracker, and every time you see a value that goes above the baseline, we also have an annotation with a small orange triangle. If you click on this orange triangle you get reports that are fully automated: the latency-tracker takes the high latency and sends it to several processes that extract, for example, the high-latency frequency distribution, and it's all linked together, so you just click on the link and you get a full picture of what was happening, for example a frequency distribution with a couple of outliers in the range of four or five microseconds, and the time period to look at in the detailed trace.
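A minimal sketch of that snapshot workflow; the session and channel names are examples, and the buffer sizing just shows where the amount of in-memory history is configured:

    $ lttng create flight-recorder --snapshot        # buffers stay in memory, overwrite mode
    # The in-memory history is bounded by the channel's sub-buffers
    $ lttng enable-channel --kernel --subbuf-size=2M --num-subbuf=8 snap-chan
    $ lttng enable-event --kernel --all --channel=snap-chan
    $ lttng start
    # ... later, when a monitoring alert or a crash handler fires ...
    $ lttng snapshot record                          # dump the current buffer contents to disk
    $ babeltrace ~/lttng-traces/flight-recorder*     # post-process the small resulting trace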
Finally, the last mode, which we have been working on for the past eight months, is the rotation mode. It's expected to be released around March 2018; it's currently under review. The idea of the rotation mode is that it combines the normal tracing mode, where you just trace to disk, with the snapshot mode, which creates small traces. Now we write to disk all the time, and whenever you want, you just run lttng rotate: it takes the current chunk of data, writes it into a separate folder, and the trace keeps being recorded to disk. So if, for example, a cron job executes lttng rotate every minute, you get one one-minute trace per minute. Instead of one trace that covers the whole tracing session, or small snapshots, you now get a series of chunks, so it's a big difference, and again it depends on the use case; sometimes snapshots are better, sometimes this is.

This is really useful for trace archival. We have users that trace all the time, and the traces become really huge, so every hour they want to close the trace, compress it and keep it in storage. If they need it some time later, they can just get it from storage instead of having to stop and restart the session, which would lose events between sessions; with this mode you don't lose any events, you just switch to a new directory, and you can use each chunk as a really small self-contained trace. It also helps with disk usage: you can rotate, process the chunk and then delete it, so you don't keep too much data on your machine. You can use it to extract really low-level data every minute, and you choose how long each chunk covers. And it helps with post-processing, because sometimes just generating a graph from a 10-second trace can take more than 10 seconds, depending on the load and the number of events. With this kind of workflow you can just extract a chunk and send it to a worker that does the post-processing of that trace while the tracing continues.

Creating the trace is just like creating a disk session: you just create it, and after the start, when the activity is running, you can type lttng rotate and you get the path of the chunk that was produced between the start and the rotation. If you wait a bit and invoke lttng rotate again, you get the path of the chunk between the previous rotation and now, so you really have full control over the chunks.
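As a rough sketch; note that at the time of the talk this feature was still under review, so the exact command output and behaviour shown here are an assumption based on the description above:

    $ lttng create archiver                          # a normal disk session
    $ lttng enable-event --kernel --syscall --all
    $ lttng start
    # ... the workload runs ...
    $ lttng rotate         # closes the current chunk and prints its path
    # The returned directory is a complete, self-contained trace: it can be
    # compressed, shipped to a post-processing worker, or deleted afterwards.
    $ lttng rotate         # returns the chunk produced since the previous rotation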
Then, to look at the traces, we have two main tools. There is Trace Compass, which is a graphical tool if you want to look at the data graphically and see what's going on: it gives you ways to explore by process or by resource, to zoom into time ranges, and to use histograms and other graphical views. And we are also working on a new tool, which is also a trace-processing tool, to really dig into a trace and find what's going on. Those are especially useful for race conditions: you don't want statistics like the tools I presented before, you really want to see the full history, have the nanosecond-scale timestamps, and really see what's happening, what's running on the system, why this process got created; you will see the interrupts, you will see everything. And for that you want smaller traces, because it can take ages to process them.

To summarize: this makes it easy to extract whatever information you need from a production environment. It's an efficient kernel and user-space tracer, and the traces can be correlated at the end, so you can see the kernel events and the application events together. You can use it in production; we know of deployments in telecommunication and other environments. And there are lots of ways to configure LTTng, so depending on your use case you can just select the one that is most interesting for you. So, if you have any questions...

[Audience question, off-mic, about tracing Java and Python applications.] Yes, there are agents that run inside Java or Python applications. They are built on the user-space tracer, so they are used to output events from the application itself: instead of using a kernel tracepoint or switching to the kernel every time, it's fully in user space.

[Audience question, partly inaudible, about when to use each approach.]

So, the main difference, as I said at the beginning, is that you have trace buffering and trace aggregation, and those are really complementary. When you use aggregation you get statistics, you get reports when something changes, but you cannot go back in time and look at the detailed trace; compared to BCC, for example, you would just get a histogram or statistics. So usually we use trace aggregation to detect problems, and then, if we want to go back to that exact moment when the high latencies happened, for example, we look at the trace. You can use perf for tracing too, but it's really not made for high throughput; as a sampler it's just fine, but you can't expect a lot more from it, because it's not meant for continuous recording.

Any other questions?