Good morning, everybody. Thank you for taking the time to come to my session. I'm Frank Rowand. I currently work at Sony, and in about a month and a half my funding disappears, so if you have a spare job, keep me in mind. Or I might still be at Sony in a different location.

I'm not going to read this whole slide to you; it's just what the program says, to give you an idea of what the talk is about. There's a benchmark that is used and quoted very frequently in the real-time world. Someone has a new machine and says, "I'm running such-and-such a real-time kernel. Here are my cyclictest results. Is this good or bad?" You'll hear this number a lot, and I'll be talking about cyclictest today.

Cyclictest gives you essentially a single number that gives you a sense of your real-time performance. Whether people talk about hard real-time or soft real-time, you get a similar definition: you want to know the latency of a response to a stimulus. Something in the real world happens, and your system needs to respond to it within a certain amount of time. That's what we're trying to measure when we measure real-time behavior and performance.

So what happens is we have some sort of external interrupt trigger, such as a clock expiring, but it could be a sensor triggering; it can be any sort of device. Lots of things can happen before we actually transfer control to our real-time program, and in our case the real-time program is going to be cyclictest. Lots of things can go wrong and slow us down. IRQs might be disabled when the interrupt comes in, so it might not be handled right away. It takes a little bit of time to run through the IRQ handling code, saving state and figuring out which interrupts are pending and which one to service.
We actually wake up the cyclictest program, because it's registered for this interrupt, but it hasn't started running yet. There might be a delay because preemption might be disabled, and we have to wait for preemption to be re-enabled. Cyclictest might not be the highest-priority program; there might be some other higher-priority program that's also awake, ready to run, and needs to run first, so we may have to wait for that process to be preempted or to finish. And to start running us, we have to invoke scheduler code. It's amazing how much code is involved in this entire path.

So that's what we're trying to measure: from the time we get an interrupt, our trigger, how long until our real-time application is actually executing code to do our real-time work?

We have an expert here, Mr. Steve Rostedt, a real-time expert. [Steve: "Also, you could be blocked on a lock."] Yes, there are lots of other ways you can get delayed. Steve is giving a talk after lunch in this room, and I highly recommend it. Some of the things he'll talk about are trebuchets, pumpkin chunkin', how the Linux real-time technology is set up to try to solve and reduce some of these delays, and how to create the maximum amount of determinism possible.

As Steve mentioned, that really isn't the full list of what could be delaying your real-time task. There are other things, such as lock contention and lock issues, that you might have to worry about. And it's not in a specific order like the list I gave you; these things can happen in random orders, and there could be additional interrupts coming in. It's very much not a sequential thing; it's asynchronous and out of order.

There are a lot of other factors that can create latency. Like I said, extra interrupts could come in during that whole list of events. SMIs: you do not want system management interrupts if you have a platform with those.
I'm sorry, get another platform. There are a lot of talks about SMIs, or we can talk later; I don't know if Steve will cover them, but we'd be glad to talk about SMIs. The processor might be in a sleep state; it takes some time for the processor to start executing at full speed again, so that can slow you down. If your real-time process is moving from one processor to another, it also needs to migrate its dirty data from one cache to another. And as Steve mentioned, there are a lot of lock issues.

So we have our nice cyclictest program, which as I said is a very common, predominant real-time Linux benchmark. I'm going to show you some pseudocode, and it's going to be very straightforward, very trivial. You'll be surprised at how simple it is, given that it's almost 3,000 lines of actual source code. So here we go. This is the key core of cyclictest. If you know this, you know 90% of what you need to know about cyclictest.

We have a thread executing, and we find out what time it is right now. Like I said, this is pseudocode; the double parentheses are there so you won't get upset and say, "clock_gettime really has two or three parameters, what is this lie you're showing me?" I'm really going for the simplest version of this. We find out what time it is now, we say the next time we want to wake up is now plus some interval, and then we go into a loop. We sleep until the time we wanted to wake up. Once we've woken up, after that whole list of possible delays has occurred, we ask what time it is now. Then we take the difference between when we planned to wake up and when we actually did start executing, and that's our latency.
Diff is the magic key variable that we'll be reporting on throughout the cyclictest reports. The next thing we do in our loop is update all our statistics: we may have a new smallest value for the minimum or a new largest value for the maximum; we collect total latency so we can calculate averages; and we might keep data for each event, or put it in bins for histogram data. So we do all that data manipulation here. Then we say the next time to wake up is incremented by the interval, and we just spin in this loop collecting data. You are now cyclictest experts.

[Question] Right, the question is: if the latency is larger than the interval, then what happens? Yes, there is more code; it can spin to catch up and report several late latencies. There are several different clock sources you can select, and the actual algorithm differs between the sources, so you have to figure out which clock type you're using and look at the code to see how it does the catch-up for that one. I'll show you some interesting results about catching up very late in the talk; it's very interesting data.

So like I said, it's really trivial, and it's magical that something so simple really captures all the complexity of all those delay sources, for the most part. I'll give some caveats; I love to give caveats.

Stepping back to the bigger picture, let's look at the entire cyclictest program. First of all, we create a certain number of threads, however many we want; those are the timer threads we were just looking at, and we already saw the algorithm of what's going on in there. Once we've created all the threads, we just spin, and optionally, once every 10,000 microseconds, we might print out the current updated statistics. Once we've decided to shut down, for whatever reason (we might have run a certain number of predetermined loops),
or we may have run for a certain time duration; there are a lot of controls for that. Then we shut down the threads, and we optionally print out the accumulated data, for instance the histogram data. Overall, that's the entire program. Very, very simple, again.

Let's look at the timer thread in a little more detail. We saw the core loop of the timer thread on the first slide. There are really two blocks in the timer thread: the first is setting up the thread, where we modify ourselves; the second is the test loop we already saw.

Here's the thread setup. We're doing a few system calls, essentially, and the key here is that we are expending cycles; we're moving forward in time in an asynchronous manner. Every thread is doing this on its own. On an SMP system those threads might be off scheduling and bouncing around processors while they do their system calls. So it's not deterministic that every thread starts its timer cycle in total lockstep with every other thread; with multiple threads there's a bit of jitter in there. And again, there's the test loop, which we saw before, shown just for completeness; every thread does this separately.

So every thread does its own setup: every thread potentially sets its own processor affinity, its scheduling policy and priority, and its signal mask. Every thread does this independently, to itself. The main program is running at a normal priority, so it's kind of a low priority.
The timer threads we tend to run at higher priority, so the main cyclictest program is just kind of a background load. In most cases it has no impact on the actual timer threads, because the timer threads are typically running at a higher priority.

And this is what I was just saying: those timer threads are not necessarily in lockstep. Since they're not in lockstep, they're probably waking up at slightly different times. Depending on how many threads you have versus how many processors, you might be able to overload it enough that they do wake up and overlap a little bit. Even in the vanilla case, who knows; it's kind of random timing. Maybe they'll overlap a little, maybe not. I can't tell you ahead of time for any given system.

So here are the caveats I promised earlier, about how cyclictest might not be able to capture all the delays. The measured latency that gets reported is not the theoretical maximum you could ever hit; it's a floor under that maximum. When you've run cyclictest and come up with a maximum latency, it's telling you that you should occasionally expect to hit at least this latency. You might hit one a little bit worse, or not. But don't take it as gospel that you've actually measured the worst your system can be. Definitely leave headroom.

The cyclictest wakeup happens on a regular timed interval; the default is once every 1,000 microseconds. So if you somehow have some event that's synchronous with the timers, some other real-time task, for example, it might get scheduled right in the middle of those slots when cyclictest is not running, causing delay there. If that happens, cyclictest will never measure the delays you're creating. You have to be aware of what your workload actually is on your system.
[Question] Okay, so a cyclictest thread will wake up every 1,000 microseconds on a regular basis. I've got wakeup points, and if there's some evil thing happening on the system in the middle of that period, while the cyclictest thread is sleeping, causing delay, there's no way cyclictest can measure the existence of that delay, because it's never awake in the middle of it. That's really a pathological case. For instance, you might have this evil thing that disables interrupts, and three-quarters of the way through it the cyclictest interrupt pops, so you're not going to measure the first three-quarters of the interrupt-disabled time. Like I said, that's really pathological; how often do you have something else that's specifically in lockstep in between your wakeups? And you can adjust that 1,000 microseconds to different values.

[Question from the back] The question is: is there an option to add some randomness in there? Not explicitly. There are other clock modes which are more "an interval from now" rather than a fixed cadence: once the thread wakes up and measures the latency, it says, "I want to schedule my next wakeup one interval in the future." So there's a little bit of slop in that one. But it would be a great enhancement to actually add some sort of randomness to that interval. Right, exactly.

[Question: Howard] Same question. The question is: could you use Monte Carlo testing to find that? At the very end I'll have a slide that doesn't really address that. I started out with that whole list of delay causes. In reality, in my world, I measure those delay sources directly, and that's a different answer to your question. I measure: what are my IRQs-disabled time periods? What are my preempt-disabled time periods? There are tools in the real-time kernel to do these measurements, and I highly recommend those tools.
They're very, very good at focusing in on the exact behavior of your system, what's causing those delays, and how you can reduce the individual components that make up the overall delay. Cyclictest also gives you some tools to help find those things; I'll mention those later in passing.

Cyclictest is also not measuring the IRQ source that your real-time application actually uses. If you have a hardware sensor with its own specific driver, its IRQ handler runs in IRQ context, potentially wakes its IRQ thread, that thread runs, and then that thread wakes up your real-time process. That's one model; there are other models for how you can structure your application. The timer handler is not doing that: it goes into IRQ context and directly wakes cyclictest. There's no intermediate IRQ thread, for example. And who knows what the size of your IRQ handler is versus the timer IRQ handler? So if you're being really hardcore and you're really tight on measurements, take a look at that.

You need to have your proper workload running, something that actually creates these delay causes; cyclictest itself is not causing those delays. So you need to figure out what workload to run to actually characterize your real application.

There are some really bad delay sources. SMIs I mentioned; they're out of your control if you have them. Stop machine is one that most people aren't aware of, and it will get you totally unexpectedly. If you do a module load or unload, you're calling stop_machine: it's potentially optional on the load, depending on conditions, and it happens on the unload. stop_machine has to stop every single CPU on your system while that work is going on. That's an incredible latency you don't want to deal with. So something totally unrelated to your real-time application could be causing you grief.
You have to control that stuff. And Steve says enabling function tracing on ARM has the same problem: stop_machine. Hotplug uses stop_machine. Hotplug, fortunately, is in the process of being re-architected by Thomas Gleixner, but there will probably still be nasty overheads. And there are people in the ARM world who think hotplug is a great way to do power management: if you have idle processors, just hot-unplug them, and when you need them, hotplug them back in. Not good for your real-time performance.

So how do you deal with this? Just don't use cyclictest. You have a real real-time application: instrument it. Put in measurement points where you have your wakeup event and where you actually start executing code in your real-time application. That's the holy grail; that's really what you want. But this is a talk about cyclictest, so we can't go with that solution; we need another one.

You can run your normal real-time application as your load, and then run cyclictest at a higher priority than your real-time application, so cyclictest always gets scheduled before your real application. It will measure a maximum latency that your application will then see in the real world, when cyclictest is not running. It's a very easy workaround, and it works very well.

Typically you're not going to have one thread; you'll have several real-time threads, typically running at different priorities with different needs. So you'll actually need to measure the latency for each of these different priorities. You'll run your benchmark with cyclictest at a higher priority than your most important thread and measure its latency. Then you'll run the benchmark again with the cyclictest priority in between the two real-time priorities, and you'll measure the latency of the lower-priority real-time task.

Right, let me get to that in a few minutes; we'll come right to that. [Steve:] Just have to make one correction.
Yeah, on a previous slide. He's actually not working on this. Okay, so Steve is giving credit to [name unclear] for doing the hotplug work; Thomas is working with him and helping get it in there. But there are a lot of people actually doing work in that area, and a lot of pieces are going to come together, a lot of individual pieces that are making that work easier too. So there's a big crowd of people to give credit to.

Here's an example of what I was just saying. I might want to measure latencies for two different real-time threads in my application. In this example, I have one thread that's critical: it needs to be woken and running within 80 microseconds, and I've set its priority arbitrarily at 50. To measure the expected latency for that thread, I'll just run cyclictest at one higher priority, at 51. I have a second real-time thread, and it can tolerate missing its deadline up to 0.1% of the time without catastrophe occurring, and it's running at a lower priority, 47. So if I want to measure its latency, I can run cyclictest at a priority of 48, which is less than the critical real-time task but greater than the second one.

Here's an example of that, actually measured. This purplish magenta line is the latency the highest-priority task can expect; that's why I ran cyclictest at priority 51. A quick explanation of the graph: this is a histogram. The x-axis is the latency in microseconds; the vertical axis is how often I encountered that specific latency. So roughly 5,500 times I had a latency of around 20 microseconds for that most critical application, and the high watermark was out around 65 microseconds. So going back: is a 65-microsecond worst case good enough? I only needed 80 microseconds, so I've got a little bit of headroom; looking good. For my less critical application, I run cyclictest at a priority of 48.
I get this more spread-out graph, with a high watermark of almost 130. 130 is worse than 100 microseconds, so we're going to occasionally miss that deadline. I said we could miss it 0.1% of the time, and the deadline was 100 microseconds. So is this tail more or less than 0.1% of all the samples? I'll wave my hands and claim yes, it's less; we're in good shape. We actually have the histogram data, so we can count how many items are in that tail and say what percent it is. And we have another way to measure when those misses occurred: were they all clustered together at the same time, or nicely spread out over time? That might matter to your application; not just how many events it can miss, but how many in a certain time frame.

Looking at that same graph: it's really good to use a log scale on your vertical axis when looking at latencies and causes of delay, because you often get these tail events out here, and with a linear y-axis they don't stand out as much. This little bump here, 80 to 90, really jumps out when you go to a log scale; you can get a better appreciation for what's going on out in those tails. So I'll tend to show both graphs in a lot of these cases.

Just a quick aside: real cyclictest output looks like these two lines. It has a lot of data, but if I try to put that much data on a slide, I have to go to a small font. So I've cheated: I've cut a lot of these details out in a lot of the slides. These red fields typically disappear, and this is what an output will look like on my slides. So when you really run cyclictest, don't be surprised that it doesn't match my slides.

Cyclictest has a lot of command-line options. Can I just run the stupid program? Do I really have to read through all 60 options and figure out which ones to use? Here's the answer: I run it mostly with the defaults.
I'm adding a couple of options: go for 100,000 loops; quiet, so don't show me an update every interval, wait until the end; and run at priority 80 in a real-time scheduling class. Looking at the results, my maximum latency is 337 and my average is 281. What if I say use clock_nanosleep instead of the default clock? My maximum plummets from 337 down to 68, and my average plummets from 281 down to 43. So clearly there are some options that, without question, you need, and you really need to pay attention.

Here are the graphs. I like to have graphs: again linear, and again log. You see it's a dramatic difference. It's not just the maximum and the average changing; it's also the shape of the curve. It's actually behaving differently.

Here are some more examples of how options can impact cyclictest, and another caveat: a lot of the tests I ran here only ran for, say, a hundred thousand or a million loops. That's not very much. There are people who run this for a day or a week or whatever, to try to find those corner cases and find real maximums. So if you look at my numbers when you go home (the slides are available), run a similar system, and say, "My maximum was twice yours; what's wrong with your stupid slides?": don't expect my maximums to be reality. You really have to measure your own system. These are just examples.

And one more detail. Since we're going to be talking about the priorities cyclictest runs at: I changed my priorities. Most of my real-time threads are running at 70 or 50. Of the kernel threads that are always there, there's one I cannot change: the migration thread is set to 99. You try to change it, and it won't let you; sched_setscheduler just blows you off.

Here's a whole list of different options. I'm not going to go through these in detail; you can read them at home. But I'm going to point out a couple of them. Each line or pair of lines is a single cyclictest run. Again, I've cut down the output display, and then in this section
I've listed which cyclictest command-line options I gave it, and at the end is kind of an English explanation of what's going on. Okay, I'll get to "pinned" in a second.

In the very first run I go with no options at all, and my maximum is 2699, which is way worse than anything we saw on the previous slides, on a totally idle system with no background load. What's going on is that I'm running cyclictest from an SSH session across the network, and cyclictest is updating the results every 1,000 microseconds. So it's sending data back and forth over the network, involving the networking IRQ handler, and the softirqs for networking are doing a lot of thread activity at a priority much, much better than my default priority, which is SCHED_OTHER. So we'd expect a terrible maximum.

If I turn on the quiet flag, which says don't give me those live updates, I've just gotten rid of all that network traffic, and my maximum gets way better. That's an easy one. If I set my priority to a real-time priority, it gets a little bit better, but the average got worse. Who knows; I'm not going to analyze these, I'm just trying to give you examples of how things change. I said clock_nanosleep was important: that drops our maximum down very nicely. "Pinned" means: is each timer thread locked onto a specific CPU, or is it allowed to float around as the scheduler feels the need?
So pinned means it's stuck on one specific CPU; there's a command-line option on that run that tells it to pin, and our maxima are big. Going to high-priority real-time scheduling with clock_nanosleep, our results start to look pretty good right here: a maximum of 68, and we're not really going to get any better in some of these other variations. You'll notice that in almost all the examples I give, I do not use the -m option. That's laziness. -m says lock the memory that cyclictest is going to use, so that you don't take the overhead of memory management. So you really should be using -m, even though I don't in this presentation; you can explore the rest at home. So that's running a hundred thousand loops.

[Question] I'll show you a graph in a second. The question is: why is pinned so much slower than not pinned? So, maxima of 91 and 111: we have two threads here running, one on CPU zero, one on CPU one. When they're not pinned, the maxima are significantly smaller; the averages are about the same. I have a theory, but don't quote me on this. My theory is that sometimes a thread is blocked on one processor, because interrupts or preemption are disabled there, and it can migrate. [Steve: "Yeah, you can migrate."] So there are cases where a real-time thread might be able to migrate to a different processor and avoid some of that extra delay. [Steve:] The RT scheduler is an aggressive scheduler. If you block at any time, it will look for any CPU that is running something at a lower priority, and push the task there aggressively. So if you don't pin, it can migrate; if you're pinned and blocked on something, you actually have to wait, since you can't migrate. Yeah. I would say don't try to extrapolate that particular pairing to your system; measure your system. Does it matter? Is it better or worse when you pin specific processes?
And in general, people tend to pin real-time processes, because it tends to be better. Yeah, Chris? Exactly. Right, so Chris was saying that the thread isn't slower once it starts running; it's just less deterministic, because the wakeup can potentially be longer. And the second point: it's a subtlety, but he's correctly saying we're measuring jitter, not latency. The data is there; pull out of it what you need. But yes, we're definitely looking at jitter. They're both key items: you're measuring latency because you want to know when your real-time task is going to wake up, what that wakeup latency is, and that's what we're measuring, but that measurement is showing jitter.

So this is the same measurement, 100,000 loops, or a million, however many zeros are in there. Looking at any specific value, you'll see the maximums change in ways you may or may not expect. Again: you really, really need to run it a lot of times. And you can't necessarily just run the benchmark once for a long period of time; you probably want to run it several times for that long period, and watch how much consistency you have, how much difference (to use that word again, jitter) there is between the runs, how much noise. Because this is not a deterministic result. This is just giving you a general feel, a general picture. Like I said, it's a floor, not a ceiling, on what your latency might be.

Yeah, the question was about pinned versus non-pinned. Here's just a graph of that data, which is kind of interesting. The red is not pinned; the blue is a run where it was pinned. So it's not just the maximum that's different; it's also the shape of the curve. There's definitely interesting behavior there, where you're probably seeing the extra work of moving from one... whoops, did I get this the right way around?
Yes, red is not pinned. You're probably seeing the extra work of moving to another processor, which is kind of the opposite of the results we just saw a minute ago. And like I said, CPU caches: if you're migrating, you potentially need to migrate data, and depending on your cache architecture that could be enormous or it could be moderate. It depends on your system; you really need to measure your systems.

Let's defer that question to the end, because I probably don't have the answer off the top of my head, and if I do know, the answer is probably going to be long and involved. But hold on to that question; don't forget it.

Right, so the question is: what is cyclictest measuring if your real real-time application is taking page faults? Clearly cyclictest is not measuring that. That goes back to the case where I said: if you want to answer that sort of question, you really, really need to instrument your own application. That will give you the precise gory details; this is just going to give you an overview of how the overall system behaves.

I'm going to skip the demo because we're running short. I can show you live, when we come to the end if we have time, or in the hallway, what it actually looks like to run cyclictest. There's a demo thing tonight; maybe I should sit at a table and demo cyclictest. Look for me, if I can win a table and if I remember.

I'm going to show a lot of information about what is, kind of sort of, a normal result or not a normal result. Say I have my MIPS system with three CPUs, four levels of cache, and 64 megabytes of memory, and I'm running my XYZ application. What should my cyclictest result look like on that system?
I have no idea, to tell you the truth. But there's an okay source and a really great source. The okay source is the real-time wiki, which actually has a table of people who've measured the various machines they have, and sometimes, in the comment section on the right, they'll say what their average or maximum latency was. So: two very small numbers that don't really mean much in isolation; not the big graphs, just two numbers. If you go to the linux-rt-users email list, people are quite often saying, "I got a cyclictest result of X. Is this expected? Is this good? Is this bad?" So you can mine that list for the last few years and see what other people are getting. Again, not a really comprehensive resource, but it gives you some indications.

The goldmine is osadl.org. They have a QA farm, and every day they measure a very wide variety of machines: different architectures, different sizes, different frequencies, different kernel versions. You can go there; don't bother writing down the URLs, the slides are available. And this is what all their machines running on a given day look like. So that's what a typical cyclictest result looks like. There you go: a lot of different things, and nothing there is consistent, is it?

To give you a sense of what he's doing: he's running that many zeros of loops, something like a hundred million; locking memory; running a very short interval, waking up every 200 microseconds, so he's stressing the system a little bit more; and he has various background loads. You have to go to his URLs and read all the gory details. Actual OSADL members get a lot more detailed information, the actual data behind those histograms. He also has each histogram for every day it's been run over the past years, and he's made a three-dimensional graph.
I've been showing you two-dimensional graphs; he's taken those two-dimensional graphs and lined them up edge to edge to edge, in three dimensions. So you can see, for a given machine, how consistent it has been over time, or, if he's changed operating system versions, how that impacted the results. So, great benefits to OSADL members; a great resource.

I'm just going to pick a few graphs out of that big spaghetti graph, to show you some specific examples you can actually see. This is the one everybody loves: a really small latency, a nice tight peak right at the low values. All these graphs are on the same scale, zero to four hundred microseconds of latency, just so you can compare them; that's the way he does it. I'm not going to try to explain these different shapes and what's going on; I'm just trying to make you aware that "normal" can take different shapes on different systems. Different things in an architecture can impact machines differently.

It's very common to see a secondary hump out to the side. If you have, in this case, 32 processors, it's very common that different processors actually behave differently. CPU zero quite often has more latency, because it might have more timekeeping work to do. There are other loads that may be specifically tied to different processors, and there may be interconnect issues that make certain processors slower than others. If you start getting a new machine, you get all the new issues.

This is a slower machine: much, much longer latency, way out to about 270 microseconds, again showing multiple peaks down low. But even at the high end there's something that occasionally happens that slows that machine down. This is a real oddball machine. I asked Carsten about this one specifically; he sent me an email, and I'm still trying to percolate through my head quite what's going on here.
So, sorry, no explanation for this one. Another multi-peak. Another multi-peak. This one is more of a single peak, but offset to the right, with still a couple of secondary peaks early: two processors, one processor. So even a boring graph can be interesting. I showed you before a very spiky-looking graph; this one also has some low data points way out here, so there actually are some latencies out here. If we zoom in on that, we can see the different processors actually have different maximums out here. It might just be an artifact of this measurement; I don't know. If I looked at the same system for different measurements, for different days in a row, would each CPU show that same pattern or not? If I were analyzing this system, I'd be looking for that, and if I saw that same pattern, I'd be asking: what is it that's driving a certain CPU to have a longer latency, and can I fix that? And zooming in again.

Okay, cyclictest. It's a hairball of an interface. Here are all the options, and we have five minutes to explain them all to you, so I'm not going to; I'll just tell you the groups. There's a set of options that impact how the thread behaves: things like what sched class you're in, what priority you're running at, how many threads are running, what the interval between wakeups is, whether you're using clock_nanosleep.
Just a whole lot of variables you can use to influence the way cyclictest behaves, to try to better mimic your actual real-time application. There are some options that are useful for benchmarks. There are options that impact the display, the data output, what format that's in, which then determines how you can actually analyze and view that data. And there are some debug options. I could do a whole talk on the debug options, so I punted and said I really don't have time for it, but they're really, really cool, and they answer an earlier question: I have a large latency that occurs; how do I analyze what that latency was? You can set various triggers to invoke various tracers that have been built into the kernel, which measure things like: what is your IRQs-off time? What is your latency? The ftrace maintainer is up here if you want to talk to him later. It can capture what was going on in the period leading up to your delay, so you can get a picture; you can trigger on certain latencies to capture that data. That's extremely valuable; I highly recommend you look into it for analyzing, debugging, and fixing these issues.

Back to the hairball nest of cyclictest options, this is a real gotcha; just be aware of it. The code that handles the options is not very robust, so it's best if you separate every single option: dash option, space, dash option, space, dash option. These are just examples of how you can give apparently the same options but the result ends up being different. The takeaway is just to be very clean and separate each option; don't assume you can clump them together like you normally can on a Unix system.

Another topic is data formats. I've shown you the overall statistics: the minimum, average, and maximum are one format. The second format was all that histogram data, so we can draw those really nice graphs of what the latencies look like. The third one is: for every single latency, collect that data point and save it away.
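That third, per-sample format is easy to post-process yourself. As a sketch, here is a small parser for cyclictest's verbose per-wakeup output; the three-column `thread: count: latency` layout shown is an assumption, since the exact verbose format varies between cyclictest versions, so check your version's output before relying on it.

```python
# Sketch: parse cyclictest-style per-sample (verbose) output into per-thread
# latency series. The "thread: count: latency_us" column layout is an
# assumption -- verify against your cyclictest version's actual output.
from collections import defaultdict

sample_output = """\
       0:       0:        5
       0:       1:        7
       1:       0:       12
       0:       2:      210
       1:       1:        9
"""

def parse_verbose(text):
    """Return {thread_id: [latency_us, ...]} in sample order."""
    series = defaultdict(list)
    for line in text.splitlines():
        parts = [p.strip() for p in line.split(":")]
        if len(parts) != 3:
            continue  # skip headers or malformed lines
        tid, _count, latency = parts
        series[int(tid)].append(int(latency))
    return dict(series)

series = parse_verbose(sample_output)
# Per-thread worst case -- the kind of outlier you'd then go debug.
print({tid: max(vals) for tid, vals in series.items()})
```

With the full series in hand you can plot latency against time, which is exactly the view the next graph shows.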
So now you can look at every single event, and look at it over time. Here's an example of a graph of that, where the x-axis is time and the y-axis is the latency for that specific event. Each point on this graph is one of those wake-ups, so you can see over time whether there's some time pattern where a certain latency occurs.

Here's an interesting case of how you can use that sort of data to find something. There's a throttle for real-time scheduling, a kind of fail-safe, so that if you have a runaway real-time process you can say: I want to limit all my real-time processes to X percent of the CPU. That defaults to 95 percent: if real-time tasks have used more than 95 percent of the CPU in a certain interval (the period, which you can also set), then a normal-priority process is allowed to run for a while, so you can try to recover your system and kill that runaway process. I recommend you don't rely on that in a real production system; I'd actually put in a different mechanism, but we can talk about that later if you're curious.

Here's an example of running cyclictest where I've changed the percentage down to 80 percent. Once the real-time tasks have used 80 percent of the CPU, then for 20 percent of the time my real-time process is not allowed to run, so you can see that every once in a while my latency spikes to a really big value. With a log y-axis it's just an interesting shape to see: you can see that nice sharp pattern. What I have is a background real-time task running at priority 40, which is lower than my priority-80 cyclictest; that other task is trying to monopolize the CPU. It's a runaway process, and cyclictest is detecting that.

Here's another way of looking at that, in kind of the reverse direction, where I'm running cyclictest not at a real-time priority but at normal scheduling priority, so it's a low-priority task.
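The throttle arithmetic described above is simple enough to sketch. The sysctl knobs are `kernel.sched_rt_runtime_us` and `kernel.sched_rt_period_us`, with stock defaults of 950000 and 1000000 (95 percent); the 80 percent case mirrors the experiment in the talk. Note the throttle stops all real-time tasks once the budget is spent, including cyclictest itself, which is why its latency spikes.

```python
# Sketch of the RT-throttle arithmetic: per period, how long the kernel
# forcibly deschedules real-time tasks once they exhaust their budget.
# Defaults match kernel.sched_rt_runtime_us / kernel.sched_rt_period_us.
def rt_throttle_window_us(runtime_us=950_000, period_us=1_000_000):
    """Microseconds per period reserved for non-real-time tasks."""
    return period_us - runtime_us

# Stock kernel: RT capped at 95%, so 50 ms per second is left for
# normal tasks to run (and for you to recover the system).
print(rt_throttle_window_us())                      # 50000

# The talk's setting: cap at 80%. While the runaway RT task burns the
# budget, even the priority-80 cyclictest thread is throttled and can
# see latency spikes up to roughly this window.
print(rt_throttle_window_us(runtime_us=800_000))    # 200000
```

That 200-millisecond figure is the order of magnitude of the periodic spikes visible in the graph.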
And so what it's measuring here is how much time is left over after all the real-time tasks have taken the CPU. Again, I have that throttle: the real-time task is monopolizing the system, and I've capped it at 80 percent, so there's a 20 percent window in which cyclictest can actually run, since it's a lower priority. This also answers the question of what happens to the next wakeup if I miss one. You can see my latencies come down in a curve; they're not just dropping, so we're actually getting varying latencies, and we're catching all those missed wakeups. And here's the log graph, just an interesting shape to keep in mind.

You could run cyclictest this way just to get a sense of how loaded your system is: how much CPU is left for normal processes. I've never actually done this on a real system, so I'm not sure quite what this graph would look like on a 70-percent-busy real-time system; it's probably going to vary between busy and idle periods. If anyone runs this on a real application, I'd love to see the results.

Again, I'll show a demo tonight. We're about one minute over, or I can start showing the demo before they kick me out. This is what it looks like, but I can do a live demo. Coming out of Red Hat there's an oscilloscope tool; you can get it from the Fedora distribution. It's Python, so it should be nice and portable, hopefully. It shows the latencies over time, and it's actually churning this out as cyclictest is running, updating this line and sliding it over as new events come in on the right side. It's just kind of cool to watch what's happening in real time. There's another data format coming out, hopefully in the next version, for catching those periodic peaks that are way outside normal, with less overhead. So that'll be coming.
Hopefully soon. Here's just a slide on where to get cyclictest, for later when you have these slides.

So: I've talked about how cyclictest uses a really simple methodology and algorithm, but it ends up being very powerful and can capture fairly well all sources of latency and delay, with all the usual caveats I gave earlier. Use your options very carefully; you really have to pay attention to them, and they give you a lot of power. Use the different data formats to get different insights into the data. And if you have latencies you don't like, there are the nice debug options, which I did not talk about, but which you should use to debug those issues.
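To make that "really simple methodology" concrete, here is a minimal sketch of the measurement loop in Python: sleep to an absolute deadline, then record how late the wakeup was. The real cyclictest does this in C with clock_nanosleep(TIMER_ABSTIME), SCHED_FIFO priority, and locked memory; plain time.sleep() only approximates that, so the numbers it produces are illustrative, not real-time results.

```python
# Minimal sketch of cyclictest's core loop: wake at absolute deadlines and
# record the wakeup latency. Approximation only -- real cyclictest uses
# clock_nanosleep(TIMER_ABSTIME) under SCHED_FIFO with mlockall().
import time

def measure(loops=50, interval_us=200):
    latencies_us = []
    next_wakeup = time.monotonic_ns() + interval_us * 1_000
    for _ in range(loops):
        delay_ns = next_wakeup - time.monotonic_ns()
        if delay_ns > 0:
            time.sleep(delay_ns / 1e9)        # sleep until the deadline
        now = time.monotonic_ns()
        latencies_us.append((now - next_wakeup) / 1_000)  # how late we woke
        next_wakeup += interval_us * 1_000    # absolute, so misses accumulate
    return min(latencies_us), sum(latencies_us) / len(latencies_us), max(latencies_us)

lo, avg, hi = measure()
# The familiar min/avg/max triple -- the first of the three data formats.
print(f"min {lo:.0f} us, avg {avg:.0f} us, max {hi:.0f} us")
```

Because the next deadline is absolute rather than relative, a late wakeup eats into the following interval, which is exactly why the missed-wakeup latencies in the earlier graph come down in a curve instead of dropping straight back to zero.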