So here we are, in the continuation of this topic. The idea is: how can we take libtracefs and all those things to the extreme, in optimization and in readability of the output? That is basically the idea behind timerlat and the tools inside RTLA: how can we create them all integrated? There is a set of tools inside RTLA, and we can integrate tools like the ones Steven was mentioning. Just to explain where this comes from: I work for Red Hat in the real-time team, Steven was there when I started working on these things, and we have worked together since then. So maybe this will be shorter, because Steven already covered some of it.

Linux has been used as a real-time operating system, and there are multiple reasons for people to use it: the software stack and the availability of new software, people that are trained on it, but also because Linux achieves the desired timing behavior. Some features that help there are the fully preemptive mode (PREEMPT_RT), real-time scheduling, SCHED_DEADLINE, and so on. One of the problems, however, is the way we show the timing properties of Linux. For example, we use cyclictest, a black-box tool that mimics a real-time workload and gives a report of the latency without explaining why. Cyclictest sets a timer in the future, the timer fires, and it measures the latency; that is all it measures. You cannot say the black-box approach does not work, it helped Linux reach this place, but the root cause analysis is not given by the tool, and it is generally done using tracing. And okay, tracing is nice for me, for Steven, but not for my manager to understand, not for a newcomer to understand. The interpretation of the abstractions that compose the latency, and of the tracing itself, is not straightforward. When we read the trace and try to trace a latency back, we have to glue things together by hand and figure out this and that. After ten years of reading those traces by hand, one gets annoyed at repeating and repeating and repeating that. That is the path that leads to RTLA.

So who cares? Who cares about making this analysis simpler and more readable, and about optimizing it? The poor guy trying to debug it; he is the first person. That was me inside Red Hat, repeating the same ritual: enable the same events, do the same interpretation. Also, by not joining the analysis with the workload, users many times report just the problem, not its root cause. So you get ten bugzillas saying "I have a scheduling latency problem", go figure out what it is. Maybe all ten problems have the same root cause, and we are spending ten times the effort to figure them out. How can I make that easier? How can I relieve the poor guy from doing this debugging over and over again, with an interface that my program manager or project manager can read, so they can say: okay, this bug has the same signature as that bug, I can correlate them and mark this one as a duplicate of the same problem? How can I bring that? And also, how can I optimize this tracing to the maximum with the knowledge we have nowadays? And there is also the point of real time for the masses.
All kernel developers will have to run some sort of RT testing on the kernel to see that they are not breaking it with their new algorithm, because PREEMPT_RT will be part of the mainline kernel one day or the other, right? And there are a number of projects that need analysis going beyond looking back in the trace at the one sample that broke my latency. They need more information: what else happened during the execution that I did not catch, but that could have broken my latency? How can I take that to the next level? That is the idea behind RTLA: actually integrating multiple tools.

So RTLA timerlat is a new approach based on the things we have learned over time. It is a composite tool: there are pieces in kernel space and pieces in user space, all glued together with the library Steven was explaining here. All of that "find this field inside that trace event" work is actually what timerlat does. In the kernel we have an optimized tracer that avoids writing to the trace buffer as much as possible, to save time. It does in kernel most of what timerlat reports in user space, it optimizes the synchronization of these values, and it does processing in kernel that would otherwise cost time in user space. For example, computing the delta of the execution time of an interrupt: it is done in kernel, optimized. It also controls nested events. For example, if I have a thread interrupted by a softirq, interrupted by an IRQ, interrupted by an NMI: how can I compute all those numbers in an optimized fashion, without false positives, producing a trace I can read without interpreting too much, and without exposing too much data to user space? I just want a report. It has one workload in kernel, and now, in linux-next, there is also a user-space workload.

The good thing about the tracepoints we use in RTLA, the osnoise tracepoints, is that they are based on the formal model of PREEMPT_RT, and they can be used for multiple purposes. Here I am using them to explain the pieces of the latency; they are used for osnoise, and here is the paper where I explain them; and they can be used for RTSL, a tool that will come to RTLA and that shows the worst-case scheduling latency one could have on the system. Neither timerlat nor cyclictest gives that answer; it is a step beyond them. Timerlat and cyclictest work by sampling, so they have that limitation, but timerlat already goes in that direction, and you will see that later with some data.

Timerlat also has a user-space part that tries to make the tracer as easy to use as possible. I do not need two tools, I do not need to reinvent the wheel: the tool does everything. It has a benchmark-like interface that shows the numbers you are seeing now, and it has an auto-analysis module that goes and tries to explain the latency we saw. And now, in linux-next, there is also the user-space workload. So timerlat, instead of working in one step and showing only the thread latency, works in two steps: it shows the IRQ latency and the thread latency.
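For reference, a minimal run of the demo below looks like this sketch; the flags are taken from the rtla timerlat man page, and the exact output columns depend on the rtla version:

    # run the in-kernel timerlat workload on all CPUs for one minute,
    # printing the benchmark-like summary with IRQ and thread latencies
    rtla timerlat top -d 1m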
The reason for the split is that these two parts of the delay have different root causes. The timerlat IRQ latency is also useful on its own for use cases where users handle the critical workload in the IRQ.

Here is one example of the tool running; I put it on YouTube because it is easier to control. It is running on my workstation, with a kernel compiling in the background, on a PREEMPT_RT 6.3, I think; it was last week. So, rtla timerlat top gives us this benchmark-like interface, and we can see the results for the IRQ latency and the thread latency: the current, minimum, average, and maximum values for the thread latency, and the same for the IRQ. So we get these two views of the system. Go ahead, Daniel from the past.

And here is the histogram output, a histogram of the occurrences of each latency. I am limiting it to CPUs zero to three because the histogram generates a lot of columns: one for the IRQ and one for the actual thread latency. You can see that the IRQ latency is always smaller than the thread latency, and you can see the patterns they create. And here we have a summary with the maximums; on this system I was reaching around 30 microseconds maximum.

So RTLA can be used standalone, as a benchmark. But when testing a system, we generally have a maximum acceptable latency: say my system needs to react within 100 microseconds or less, or my maximum IRQ latency should be 10. Timerlat can be set up to produce a report when a latency threshold is hit: there are options to stop if the IRQ latency passes a value, or if the thread latency passes a value. And there is the -a option, which enables the set of default options we generally use. Here is one example of the auto-analysis: I run rtla timerlat top with -a 30, so it stops if the latency reaches 30 microseconds, and the tool gives me a human-readable description of the problem. It is showing here, Daniel from the past will point at it, that there is a spinlock there: the IRQ handler delay, the IRQ latency, and so on. I will explain these fields, but the idea is that the tool breaks the latency down into small pieces, so you can look at them independently, and it gives hints about where in the code the problem was. Here it was a btrfs write, and cgroups, and so on. But let me explain the auto-analysis output better.

So, the RTLA auto-analysis. Under the hood it does something similar to what Steven was showing: it uses libtracefs and parses the trace, but it is based on abstractions taken from real-time theory, which makes it easier to understand and interpret. The timerlat auto-analysis decomposes the latency into a set of variables, and those variables can then be analyzed independently. Whoever saw my first presentation about scheduling latency, with the formally proven scheduling latency bound: this inherits part of that research, and you can build different analyses of different importance on top of it. The auto-analysis works for all preemption models: it works for PREEMPT_RT and for non-PREEMPT_RT kernels.
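The threshold and histogram modes from the demo map to invocations like these (the -a value is the stop threshold in microseconds, per the rtla timerlat man page):

    # histogram of IRQ and thread latencies, restricted to CPUs 0-3
    rtla timerlat hist -c 0-3

    # stop and print the auto-analysis if the thread latency exceeds 30 us
    rtla timerlat top -a 30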
It addresses the same problems, but in a more general way, not just by parsing a single trace. To understand it, in real-time theory we have the execution time, which is the time needed to accomplish the job I am interested in executing; we have blocking, which is when a lower-priority task causes me delay; and we have interference, which is when a higher-priority task interferes with my workload. And if we look at Linux through that lens, we see that Linux has a hierarchy of tasks. The most common are threads. On top of them we have softirqs, which can preempt threads but cannot preempt IRQs or NMIs, although a thread can disable IRQs and cause blocking for those two. Likewise, the IRQ can preempt both of them, both can cause blocking to the IRQ, the IRQ can suffer interference from the NMI, and so on.

Now, IRQ latency examples, to understand the timerlat output. Here is one output, and I drew a timeline to make it easier to see what is going on and to understand the variables. We have an IRQ latency of 32 microseconds. Using the osnoise tracepoints, which I will show later, I can figure out how much time came from each piece: there was an IRQ delay of 31; the blocking thread was objtool, which was running because of the kernel compilation; and here is the stack trace of the timer IRQ. Looking at the stack trace, I can see the lock that caused the delay: a lock inside the cgroup code, in a btrfs write. So I can look at this and say: it was a VFS write, writing on btrfs, going through the cgroup code, and there is a spinlock there, and that caused my latency. As simple as that. And that is why it is easier for my manager to understand: she will look at this and say, 59 percent, what was causing this? Who added this IRQ-disabled section here? It is easy to map that to the code. This speeds up our workflow inside Red Hat a lot. In most cases, if a signature starts repeating in bug reports, even support people who are not that aware of the internals, or my manager, can relate one bug to another and say: boom, duplicate. It gives a really easy-to-read output; you do not need to interpret it much.

Another example is when the system is not set up correctly for idle, so I can see exit-from-idle latency. The auto-analysis shows it: there was an IRQ handler delay, but it is an exit-from-idle case; if there were also interrupts postponing it, that would be shown as well. Here it prints: exit from idle, 67.6 percent. Yes, I am leaving idle. And it also prints a hint: the exit-from-idle latency is high on CPU 9, try to adjust the setup there. RTLA has a workaround for this, which is setting the DMA latency to zero, so you can bypass the problem and say: okay, I will fix the idle setup later, for now I use this option. Cyclictest does this by default. I did not make it the default in RTLA because I considered that cheating: you are lying to the user if you do it without letting them know there is actually a problem.
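That workaround is, as I understand the rtla timerlat man page, the --dma-latency option; it holds /dev/cpu_dma_latency at zero for the duration of the run, keeping the CPUs out of deep idle states the way cyclictest does by default:

    # measure with deep idle states effectively disabled, so
    # exit-from-idle latency does not dominate the numbers
    rtla timerlat top -a 30 --dma-latency 0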
And now a thread example, this one done on a non-real-time kernel, just to show that the tool also works in that case. Here the IRQ latency on my system was very good: the blocking that postponed the IRQ was zero, and the IRQ latency itself was just 1.6 microseconds. Okay, I am good there. The timerlat IRQ duration was nine microseconds; that is not small, but it is still a low value in this analysis, not the main problem. Then I can see that there were IRQs and softirqs, which run at higher priority and postponed the context switch, but they only account for a few percent of the time, so they are not the problem either. And here is where I see that most of my problem is: 94 percent of it is this thread blocking the scheduler. The value is high because it is a non-preemptive kernel. And what was running there? It was a kworker doing btrfs work (I am running btrfs on my system), compressing pages. And in this part of the code, from here to the next scheduling point, there is a long stretch without a scheduling point. So if one wanted to optimize the non-real-time kernel to be more responsive, they would need to find a place around here to add a cond_resched(), for example. The takeaway is not "do not use btrfs"; yeah, that would be another conclusion. Or you can confine your kworkers and move them to another CPU.

So the analysis automates everything, trying to be as optimized as possible with regard to overhead, without limiting the use case. Because timerlat is also tracing, right? It is based on tracing, and I can leverage timerlat through that tracing as well, again trying to make it as easy as possible. The timerlat tool is a front end for the tracer. The tracer activates the osnoise tracepoints, which are used to report the blocking and the interference that the thread suffers, and we have one tracepoint for each type of task. So instead of sending two tracepoints to user space and computing there ("this one took this amount of time, but it was preempted by that one, so I need to discount that time"), we do it in kernel, minimizing the amount of data pushed into the trace buffer and saving time. In the paper where I presented the runtime verification system, I showed that when this kind of processing is cheaper than writing to the buffer, it is an advantage, and this is one of those cases. It is better to do simple processing in kernel and emit a single tracepoint than to keep writing trace after trace after trace. And the reported values are free from nested interference. Again, if a thread preempting my workload was itself preempted by an IRQ, a softirq, an NMI, and so on, the values reported by the tracepoint are net values. You can just read them; there is no need to correlate one with the other.

So here is one example of the trace plus the auto-analysis. I am saying: trace, stop if it hits 30 microseconds, and I am tracing just one CPU. And boom: it prints the auto-analysis, and if you look at the last line, it also saved a trace file. I can use that trace file to go a step further, if I want to.
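That run corresponds to options like the sketch below; per the talk, -a sets the stop threshold, prints the auto-analysis, and saves the stopped trace (a plain -t would save the trace without the analysis):

    # stop when the thread latency exceeds 30 us on CPU 1, print the
    # auto-analysis, and keep the raw trace file for deeper inspection
    rtla timerlat top -c 1 -a 30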
So here it is showing that it was a thread latency, with this IRQ interference here, and I can read the trace file: the things under the hood are exposed there too, and you can go ahead, read it, and extend it. Timerlat can likewise gain integrations with other things, like the locking analysis. But we can do more with tracing. If you play it back, you will see this value being built from that value; just let me pause here. You can see this irq_noise here: it simply says it started here and that was the duration. And this thread_noise says: at this time this thread started postponing the timerlat context switch, and it postponed it by nine microseconds. I can just read that; I do not need to correlate it later. And all of this is done in kernel, in an optimized way.

And then, as I said, we can go further and enable any other events through RTLA, without using another tool. Here it is: I run that command line again and say I would like to see the sched events, the workqueue events, the IRQ vector events, and the IRQ events. Then I start it, read the trace, and correlate all those events, when they started and when they finished, to enhance the analysis. Also, because timerlat controls the timer, here is one example of optimization: I do not need to enable the hrtimer events to analyze the latency. It is taking tracing to the extreme of optimization for this use case. Here you can see all the events I enabled: this is the IRQ work entry, the context switch, the IRQ latency, "the timerlat thread was awakened here", and so on. So even though by default timerlat does not give you all the verbosity, because it is not required, if you want to go deeper you can, using a single interface.

And this is where the analysis shows the power we can have. Those tracepoints that report interference and blocking are always firing while I am running the analysis. When I hit the latency threshold, I can print what caused that latency, but I can also collect all those values during the whole execution, to figure out if there was one bad piece that did not compose my final latency, but that alone could have; it is just that timerlat did not hit the latency at that moment. So I can go deeper and do an in-depth analysis of what could have happened but did not, because the sample did not land at that time. Here is the link where I composed the command line; I will add this as an option to timerlat, say "histograms", but for now I am using the raw trace functionality. So I copy the command line using histogram triggers and run it. You see here: I am running timerlat and saying I would like to use this event as a noise source, and I would like to attach this histogram trigger. I would like to see the CPU and the duration of the NMI, sorted by CPU and by duration in microseconds, and how many times that thing happened: how many times an NMI happened and took one microsecond, two microseconds, and so on.
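One plausible shape for that command, combining the -e and --trigger options from the rtla man page with the histogram-trigger syntax from the kernel's histogram documentation; the .buckets key modifier needs a 5.17+ kernel, and the exact trigger used in the demo may differ:

    # count NMI noise occurrences per CPU, bucketed by duration
    # (osnoise:nmi_noise reports the duration in nanoseconds,
    #  so 1000-wide buckets give microseconds)
    rtla timerlat top -a 30 \
        -e osnoise:nmi_noise \
        --trigger "hist:keys=common_cpu,duration.buckets=1000"

    # the resulting histogram can also be read back directly from tracefs
    cat /sys/kernel/tracing/events/osnoise/nmi_noise/hist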
And the same for the IRQ noise: how many times each IRQ took place on each CPU and how much time it took in microseconds, sorted by CPU and duration, with the hit count as the histogram value. So I start the tool here; press enter, Daniel. Okay, Daniel from the past gives me some time in the future to explain it. The tool runs for a while, and when it stops, timerlat already knows you enabled a histogram, so it saves the histogram output. In the future I will add an option to enable and parse them automatically, so this part can be hidden. So, for example, there were some NMIs on my system, even though they did not show up in the final report. And wow, there is one here that took 10 microseconds. Okay, let us look at the interrupts; there are more examples. Here you can see timer interrupts going to 17 microseconds, 14 microseconds, so I know there is a case where the timer interrupt takes a lot of time. It was not the case I caught while analyzing, but there was a moment when the local timer IRQ took 32 microseconds. If I am targeting 30 microseconds, I can have a problem there, and timerlat, or even cyclictest, would not be lucky enough to sample at exactly that time. It happened in the past, they did not observe it, but it is there in the histogram: there is a problem there. And we can keep enhancing this analysis; if we go deeper and deeper down this path, we reach the presentation I gave two years ago, with the formally proven worst-case scheduling latency. That is something we will have in RTLA soon. Here, just showing the other cases of interrupts, and also the thread noise: how much blocking time did threads add to my timerlat thread? You can see the thread blocking was not taking much time, just a few microseconds. So most of what we are seeing here are IRQ latencies, not thread latencies. In the future we will parse these histograms to make them easier to read, so you do not need to read raw files.

And here is something that is coming soon; it is in linux-next: running the timerlat thread in user space. The timerlat tracer exposes a file descriptor that a tool can use to drive timerlat from user space. With this, I can have three latency reports. I still have my IRQ latency. I have the exact moment the thread was scheduled, without any overhead of going to user space, just the scheduling; isolating the variables more and more to get more precision. And when the thread returns from user space to kernel space, I print another latency, the return-from-user latency. The good thing about this approach is that, for now, it is just timerlat dispatching the workload, but it generalizes timerlat to any workload. Any workload could use timerlat as a way to sleep; when it wakes up, it can do any computation, and at the end of the computation it will see the execution time, the response time, of that computation, and it will get a report of everything that happened in between. For example, if my thread was running and received interference from an IRQ, RTLA timerlat prints: okay, there was this event in the middle that you need to check.
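In rtla terms this is the -u option, still in linux-next at the time of the talk; a minimal sketch:

    # run the measurement threads in user space: the tool then reports
    # the IRQ latency, the thread latency, and the return-from-user latency
    rtla timerlat top -u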
So it is hiding all the analysis and trying to be as easy to use as possible: one tool, simple options, and some added value with the auto-analysis. And here is the histogram for the user-space latency as well. RTLA was built on the idea of using the tracing libraries, so it can be expanded; for example, it could easily be extended the way Steven did the locking things. It is the same libraries. Great work.

Now some by-the-ways: tips when using the tool and how to proceed with the debugging. Timerlat has a rich set of options. I can set the period; I can limit it to the CPUs I would like it to run on; I can set the duration of the session; I can print debug information about what the tool is doing; I can change the priority of the threads, for example making them run with SCHED_DEADLINE, and it just works. I can pin the rtla threads themselves to a housekeeping CPU, so they do not interfere with the workload; the same for cgroups. And I can run only the auto-analysis: okay, I do not want to see the trace, just print the analysis. And there is the dump-tasks option, which is useful for a special case: when hardware latencies from another CPU influence my CPU. For example, Juri and I were working on a system that was idle but showing a high IRQ latency. The idle setup was perfect, but I was still seeing this idle IRQ latency, and what was causing it was a workqueue on another CPU writing into the DRM driver. With this option, at the end of the auto-analysis, the tool also prints the tasks that were running on the other CPUs, so I can catch the case where the latency is influenced by another CPU through hardware effects, something I could not see by tracing only my own CPU. And this is the idea and the flexibility behind RTLA: we accumulate the knowledge of how to interpret this data, and how to optimize it, all in a single place, so people can take advantage of it instead of creating more and more tools.

Another thing: before trying to figure out whether your system can meet a scheduling latency target, you first have to check whether the system is adding hardware latencies. For example, CPUs with SMIs enabled will cause hardware latency, or a driver may interfere with my CPU from another CPU. You need to understand those before starting the thread latency analysis, because hardware latencies can happen at any point on the timeline, and then you get timelines that look very odd: sometimes the big latency is before the IRQ, sometimes during the IRQ, sometimes after, and the system is running idle. It does not make sense, for example, to take 40 microseconds to schedule on an idle system. When we see these odd behaviors, we are generally dealing with hardware latencies. For that, there is this other tool inside RTLA, hwnoise, which uses the osnoise tracer; that is another tracer RTLA uses, and hwnoise runs it with IRQs disabled. With it I can see, for a period of time, how much runtime the thread had, how much noise it observed, and how much CPU time was available to the system without that noise.
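As a sketch of how those options combine, and of the hwnoise run shown next: the flags are taken from the rtla man pages, the SCHED_DEADLINE numbers are arbitrary illustration values, and rtla hwnoise needs a 6.3+ kernel:

    # a fuller timerlat session: 200 us period on CPUs 2-3, measurement
    # threads under SCHED_DEADLINE, rtla's own housekeeping threads on
    # CPU 0, 10-minute run; stop at 50 us and dump tasks on all CPUs
    rtla timerlat top -c 2-3 -p 200 -P d:100us:200us -H 0 -d 10m \
        -a 50 --dump-tasks

    # hardware noise: the osnoise tracer with interrupts disabled;
    # stop and save the trace if a single noise sample exceeds 15 us,
    # recording the NMI events as well
    rtla hwnoise -s 15 -t -e nmi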
So: during my period, how much noise did my CPU suffer? Here is one example. This column is the sum over one period, and this one is the largest single piece of noise, the max single: I saw an 11-microsecond gap in time that I cannot relate to IRQs or threads; they were not running. And I have a counter of NMIs, because I cannot mask NMIs: how many NMIs happened, and how many of these gaps I could not relate to anything in the operating system, so they came from outside the operating system. On top of it, Paolo Bonzini and I, and Federico, a student at PoliMi, are working on expanding this output to show where KVM added to the hardware latency; when it was not the hardware of the host, but the KVM vCPU being preempted. So we can augment this, and that is why it is good to have these tools running integrated: we can integrate everything in a single place. And here too, hwnoise, like timerlat, can enable other events to figure out what was causing the noise. I have a max latency of 31; I say stop if a single noise is higher than 15, and I would also like to see the NMI events. It starts running, boom, it stops the trace and shows it: there was a sample over the threshold, there was one interference, it was an NMI, and you can see what was running inside the NMI. One thing I would like to add here is, instead of writing a plain text file, to write a trace-cmd data file, so I can join perf and trace-cmd data into a single file.

Final remarks. RTLA was born from the idea of doing this kind of tracing analysis in, let us say, the most perfect way we could: reducing overhead, automating the analysis, generalizing it, and integrating multiple ways of looking at the system into a single tool; integrating tracing into the tool. It produces a summary of the root cause of latency spikes, and it is a good starting point for the analysis even for non-experts. That is the turning point of timerlat. And it does this without blocking the user from using other, more advanced tracing, so it can only get better; for example, we could integrate that locking tool inside RTLA and move forward. Also, I call RTLA a meta-tool, because it is not one tool: it is a single binary that dispatches to all the others. So I can have multiple tools inside RTLA, and with a single binary that is part of the kernel tree we can do this kind of analysis, integrate students' code, and do other tracing and other analyses. As I said, RTLA is the home of tools like timerlat, osnoise, and hwnoise, but there are more to come. There is a tracer that uses the osnoise tracepoints to show the execution time of tasks; that comes for free, the osnoise events already provide it, so we can leverage them. We could focus on IRQ execution time; that is a request from people at Red Hat. And there is the work I did before this, where we decompose the latency into independent variables, trace all those variables, and glue together the worst case of each of them; that is one of the motivations for RTLA. And why am I not exploring that first? Because it requires some tracepoints that are not always enabled.
Those are the preempt- and IRQ-disable tracepoints that Steven was mentioning. They are generally not enabled by default because they cause overhead. We need to optimize that, and then add the full worst-case latency analysis. But timerlat is already taking a step forward with those histograms. We are also working on integration with KVM, so we can understand whether the root cause is on the virtual CPU or on the host, and detect, for example, hardware latencies on the host even from inside the KVM guest, and report that, for example, to a provider. And whatever the community needs, the doors are open inside RTLA. And that is it, thanks. Questions?

Do you have locking analysis on the list for RTLA?

We can add it.

Because I wonder why no one is adding tracepoints to the locking primitives. In Steve's presentation, he was using the function tracer, actually.

Overhead. The function tracer is lower overhead than a tracepoint.

Really?

Yes. There are also other options: people in perf have been using BPF to augment those lock-taken events, to get the lock name and more information. RTLA can also use BPF if people need it. There are multiple ways to do these things, and the idea behind RTLA is to figure out which one is the most efficient, to reduce the noise to the extreme while doing the analysis. But yes, locking analysis is something that can fit inside RTLA.

Because if you do real tracepoints, you can grab more data.

Yes, but that is the point: we would have to either optimize those tracepoints so distros can enable them by default, or work around it using the function tracer, or use BPF. What is the best way to get there? The idea of using a single tool is that we can aggregate people's knowledge and select the best approach. But today the tracepoints are not enabled, because of performance.

And where is your kernel part merged?

The timerlat part? It is merged; these things are all part of the kernel. The user-space option is the only thing that was not merged yet; it is in linux-next. With some luck, it will be in the next release.

So what do I need to look for, 6.3?

5.14 was the first kernel with the timerlat tracer, and 5.16 was the first with RTLA. On anything from 6.0 onward, the things are pretty stable and usable. Fedora already packages RTLA and the tracers.

Is there a way, especially on embedded systems, with hwnoise or one of the tracers, to induce the delays that we know are possible on the hardware, because there is a contending GPU running, or something hammering the DDR, or other things happening? To artificially introduce this noise into the test we are running and see what the outcome would be?

That would be possible, but the tool today just measures; it does not exercise the system. It measures what the system is, in the conditions the system is in. There are tools people use to exercise portions of the system to generate those latencies: there is rteval, and there is stress-ng. Exercising the system is work for other tools; here we measure it. But if one would like to add a workload stressor inside RTLA, why not?
I was just thinking of adding a hypothetical load and seeing how the system behaves; to model the worst case, kind of.

Yeah, it depends on the use case, on what you use, and on building a history of what causes each latency. It would be a nice tool, but it is a different scope. Thanks.

There is a question from a virtual attendee: do we benchmark, or plan to benchmark, RTLA on the mainline RT kernel branch?

It is all mainline; you can just run the tool and benchmark it. The code I am talking about is already mainline. The one option that is still not mainline is the -u option, which should be merged, with some luck, probably this week. You see, Steven and I are not competing, we are working together; it is in the merging process.

I understand I need those libraries, right, libtracefs and the trace libraries? Do you recommend keeping those libraries on my production system, or only for analysis and development? Are there any bad effects from those libraries on the real-time behavior?

That question is better for Steven. Well, if you have tools that use the libraries, then you need them in production, and some people actually use them in production; it is nothing heavy. It is actually an easy way to interact with the tracing system. If you disable tracing on your system for whatever reason, then you do not need them; but if you are going to use tracing at all, you will want them. I am actually giving a talk on Wednesday, at the ELC part of OSS, about ureadahead; the resurrection of ureadahead, if you are familiar with it. ureadahead is a way to bring something up fast: for boot services, or anything you want to start quickly, it records which pages are pulled in for an application and saves that, so that before you run your application you can start pulling into the page cache all the things the application will use; say, a VM you want to bring up fast. ureadahead uses libtracefs, the same library RTLA uses, to implement this. So it is not just for real time: it is a tracing utility, but it is used for fast boot-ups and fast loads on a system.

But one can use these tracers directly, right?

One can. The timerlat tracer is all in kernel, so you can use it even without RTLA: you can use the kernel part by yourself, enabling the tracer and the events directly through tracefs. There is a way even without the libraries, but the libraries make it much easier. What else? Yeah, thank you all. Oh, one more question just came in on Zoom.

Can you talk a bit more about the user-thread feature that is coming? Will there be multiple user threads? What work will happen in the thread?

There will be one thread per CPU. And if you look at the tracing documentation, the timerlat documentation explains the main loop for using that interface; there is an example of a simple user-space tool in C that uses it. And you can add any code you want inside that example: you can do bubble sort, you can stress your video card.
You can do anything you would like. But timerlat itself is not running any code there: timerlat adds its own threads, and those threads run no workload. They are just sleeping; waking up, going back to sleep, waking up, going back to sleep. With this interface, the user can dispatch their own workload instead of asking RTLA to do it, and they can put in whatever workload they would like. That option was inspired by a question from an attendee who mentioned he did something similar: he added his own workload and tried to measure its response time. It was at OSS North America last year. Thanks.