Okay, so I'm going to go ahead and start. My name is John Ogness. The talk is about proposing a new tracer. That doesn't necessarily mean it's something that's going to land in ftrace; maybe it becomes part of rtla, or something else, or it just remains a set of scripts on my desktop. The idea here really takes a lot from the first three talks. I was impressed because there's a lot of overlap, and I'm going to continue with that overlap, but from another angle than what we've had before. So hopefully this will get us thinking about another approach that we need to have, or should perhaps have.

The reason I wanted to do this talk is to make it clear that PREEMPT_RT is gaining adoption; more and more people are using it. I've been working for Linutronix for 10 years and we do support for various companies. In my opinion the companies are actually getting much better with Linux. However, they're not good with PREEMPT_RT. They're trying to do real-time systems, and that's independent of PREEMPT_RT; PREEMPT_RT is not the problem. The problem is they don't know how to use it correctly. So they take this PREEMPT_RT that they've heard a lot about, if you need sub-millisecond latencies you have to take PREEMPT_RT, it's not an option, and they implement it incorrectly. Their user-space applications implement it incorrectly. And unfortunately, most of the time it works great. They're choosing things wrong and implementing them wrong, but when they do their QA tests, everything works great. They're making products; there are companies that spend three years developing products based on this, and they're actually doing it wrong. Then they ship, and in the field all of a sudden they're getting latency issues and they don't understand what's going wrong. The problem is: they implemented it wrong.
So what they'll do is maybe come to Linutronix and say: hey, can you debug this problem? What's going on there? And I do the same things we saw with Steven, and the stuff we saw with Daniel, going through the traces and analyzing. And at the end I come to the conclusion: they implemented their application wrong. I come to this conclusion very, very, very often. It's not that we need a PREEMPT_RT patch; it's that they have to fix their application. And the problems are almost always the same. All companies are making the same mistakes over and over again, because they don't know how to use PREEMPT_RT correctly.

So one of the things I'm pushing here is to provide a tracer so that companies can do a bit of do-it-yourself on the things they're doing wrong, something that just automatically tells them what they're doing wrong, without having to involve commercial PREEMPT_RT support. What that means is that when they do come to me, there's a real problem. They may still have problems, but now it's not going to be that they're doing something stupid in their application; now they're going to have real problems, which is also what I want. I don't want to spend my time debugging some stupid thing that turns out to be just a page fault or something like that. I want real issues on my table. So by putting something into the hands of the developers, we give them a chance to take care of the simple stuff, so that the real hard problems come to us. And I'm not just talking about user-space problems; I'm talking about real hard kernel problems, of which we still have a few to tackle.

So what are these common problems I'm usually seeing? First off, they're using the wrong APIs; they're using things like timerfd. You can't use timerfd with PREEMPT_RT: its expiry is never handled in hard interrupt context, so you're guaranteed to have unreliable latencies.
But people don't know this; a lot of real-time developers don't even know this. Just last week I was with a company that spent three years developing an entire product, huge, 50 real-time threads, and it's all based on timerfd. That's just not going to work. And of course everyone here knows that the defaults for the mutexes are not sufficient: the priority inheritance features need to be turned on. I also mention poll() there, because some companies are doing a poll on, say, five different file descriptors from a real-time context, and the different file descriptors have different importance. But if you're polling all five, then when the events come in on the poll you have automatic priority inversion. You can't be listening to multiple priorities at the same time; it's not going to work.

Memory management is something they're also getting wrong. Everyone knows the magical mlockall(); it's been mentioned in all three talks before me. But that's not enough: mlockall() does not pre-fault the stacks. If you call mlockall() and your stack grows, you'll suddenly get a page fault in your stack. You have to pre-fault your stacks; it's just not known.

IPC, networking, I/O, which almost every application needs these days, are extremely complicated. The problem with IPC, if we're talking about pipes or message queues or anything like this, is that the two sides are not directly connected by a mutex. For example, if I'm writing into a message queue and that message queue is full, then I'm just going to go to sleep. There's nobody I can boost on the other side, no reader; I'm just going to go to sleep. These things are complicated and you have to know about them, otherwise you think it'll just magically priority-boost when the message queue is full or something. No, this doesn't happen. And then, of course, latencies are also something that's not quite understood.
I'm going to take a second to pick on cyclictest, just as an example; there's nothing against cyclictest, but we've actually seen this now, I think all three presentations mentioned a form of this: capital -S, which is the same as -t -a. This is the typical thing you see. You're told you run this and you can see what the max latencies for your system are, but actually this command is not very good for finding that out when you're developing a product.

The first issue: what about the tick? There's a lot of stuff that goes on in the tick. The tick could cost you up to 10-15 microseconds if you're unlucky. So if you just call cyclictest this way, you may never hit the tick, and then you see a histogram with one mountain and it looks great. But if you ever hit that tick, you're going to have a second mountain about 15 microseconds shifted to the side. It depends on your luck. But there are options for cyclictest so that you can make sure you hit that tick. Your histogram should always show two mountains, right? We want to see those two mountains, when I miss the tick and when I hit the tick, because it's going to make a big difference in your maximum latency.

Then, priority 99: why is this example always with 99? (I saw yours, that one's 80, that's good; I hadn't noticed that.) But a lot of the stuff you see on the internet is with 99. First of all, there's no reason for any application ever to use 99, because the migration kernel threads are at 99, running FIFO. (Since when? Oh really? Okay, this is new. That's good. They even get deadline. Wow, okay. That's good.) Then I guess 99 is not as dangerous anymore. It used to be that 99 was a big deal, because you can't trump the migration threads anyway, so what's the point of even running cyclictest there? You'd run cyclictest underneath it so that it always trumps; I guess it always trumps now anyway.
Generally, in case it's an older kernel, I would still never use more than 98. That should be the highest that user space ever touches, really.

Then there's the question of other real-time tasks. If we're running cyclictest at this high priority and seeing this max latency, how much does that really tell me about my product? Because my product, if I think of the company from last week, has 50 threads, all real-time, at different real-time priorities. They did this test once and said: yeah, okay, we can plan everything for a maximum 80 microsecond latency. That's not going to be true, because their different threads also cause latencies for the other threads. It's not just about checking the highest priority. I want to know: what's the maximum latency at 52, what's the maximum latency at 24, while my application is running, so that I can understand the picture I have. This thread is actually going to experience two-millisecond latencies because of my other real-time stuff on top of it. It's important that they can build a full picture; it's not just about one max number for a product.

Then, and a couple of slides also specifically mentioned this, there's the /dev/cpu_dma_latency trick, basically to avoid the lower C-states and make sure the CPU is running at full speed. This is actually the default behavior of cyclictest. A couple of times I've caused trouble on the mailing list about this, because you have a measuring tool that's modifying the behavior of the system to get better performance numbers. Of course you could argue: yeah, but we want to see what the hardware technically could do. But most people don't understand that. They don't understand that this measuring tool is actually modifying the behavior of the system. If your system is not configured to stay in C-state zero, then you're going to have totally different latencies in the field. There are arguments to cyclictest to avoid this hack.
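An invocation along the lines the talk argues for might look like this; my reconstruction, not the speaker's slide. The histogram size, duration, and the 1042 µs interval (chosen so wakeups drift across a 1 kHz tick) are all assumptions:

```shell
# Measure only -- don't let cyclictest touch /dev/cpu_dma_latency
# (--default-system used to be called --laptop).
cyclictest -a -t -n -p98 -m -h 400 -D 10m --default-system

# Second run with an interval that is not a multiple of the tick
# period, so the wakeups drift across the tick and the histogram
# shows the second "mountain".
cyclictest -a -t -n -p98 -m -h 400 -D 10m -i 1042 --default-system
```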
This is how I always like to do the commands. With the second line, you're making sure that you're hitting that tick, so that we get the two mountains in our histogram; we should always see those two mountains. And there's --default-system. It used to be called --laptop; then they said, okay, that's a stupid name, and now it's called --default-system. All --default-system says is: don't change the system behavior, just measure. That's what we want to do, just measure the system, not modify its behavior. These are both really important features; people just don't know about them. (Partiola doesn't have the problem. Very good.)

That was just one example of how something as simple as cyclictest actually has a lot of little details people don't know. I see histograms and I can immediately tell from the histogram: that's not worst case. If I don't see that second mountain, it's not worst case.

We kind of talked about this earlier; all day we've been talking about great topics here. Creating real-time applications on Linux is actually quite hard. Of course our marketing says you have POSIX and it's much easier, because you can take normal C programmers and they can immediately start doing stuff. This is all true. But if you actually want real-time performance in your real-time applications, it's quite difficult. The reason it's so difficult is that you have to understand what the kernel is doing; you have to understand the complexity of the kernel and how it's managing everything, so that you can do things correctly. A lot of vendors and product companies have problems with this, because maybe they previously did something with a real-time OS where all the APIs are allowed to be used, or the real-time APIs have an "rt" at the beginning of their names or something like that. Then they come to Linux and it's not clear at all: am I allowed to use a message queue? Am I allowed to do this?
Am I allowed to do that? What am I allowed to do? You just have to know; you have to learn it, because it's not labeled anywhere. It's a general-purpose operating system. This is really the big hurdle a lot of people have when they're moving to PREEMPT_RT. These are just a couple of the main points, which have all been talked about today; Alana actually mentioned all of this. Avoiding page faults. Proper locking, which means using mutexes with the inheritance that we have to activate, and for notification plus locking, the condvar combination. IPC: you have to understand, how does this affect my priority inversion? What am I allowed to use? I always recommend shared memory. With shared memory you can actually do correct avoidance of priority inversion, because you control the notification mechanism, you control the data flow, and you allow priority inheritance to happen. If you're using something like Unix domain sockets, message queues, these kinds of things, then you really have to ask yourself: is that really a real-time channel I'm using there? Hardware communication: when we're actually talking to controllers, talking on I2C, doing GPIOs with ioctls, is this something that's actually capable of real time? There are some systems that are hard; anyone here who has done networking knows this pain. There are a lot of things you have to be aware of, and a lot of special situations you just have to handle and know about. The reason is that most hardware drivers and subsystems these days are using kworkers, and softirqs are still around; they're a real headache for the PREEMPT_RT people. This is how the kernel works.
Then of course there was the great talk about RT throttling. I'm definitely in the always-minus-one camp. If you need a software watchdog, you can implement your own that sits above your highest-priority real-time application and have it adjust, or steal, real-time priorities if someone's going crazy. But with RT throttling, you immediately have a broken system. Like Daniel said, your system is broken when this happens. It's like an airbag in your car: when it fires, you're not going to be driving afterwards. Your car is done, you get out, and that's the end of the trip. You could maybe call it safety, but if the safety mechanism explodes, then I wouldn't actually call it safety. It's good for the non-RT people, because they don't trust us or whatever, and they have their non-RT kernels and all this stuff. But if you're serious about RT, then RT throttling is as dangerous as the priority-inversion bombs are.

And the last thing, which makes it even more difficult: they have to watch out for libraries. For someone developing applications for Linux it's great, I can use all these libraries; that's one of the great features of Linux, that we have libraries. But are these libraries taking all these things into account? Even glibc: is it really doing these things? If I do an sprintf, is there a danger that some sort of malloc will happen, or some sort of page fault? Do I really know? Can I trust even these base libraries? It's complicated; you have to look at it, and every person has to look at it and decide for themselves.

So how can we make this situation easier for these developers? Of course, documentation, but that's not the scope of this talk. Obviously, if we had someone out there writing books about this: that's missing, that is a huge hole right now.
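As an aside, the "always minus one" throttling setting mentioned a moment ago is just a sysctl, shown here for reference:

```shell
# Disable RT throttling entirely ("always minus one"), leaving runaway
# detection to your own watchdog instead of a system-breaking safety net.
sysctl -w kernel.sched_rt_runtime_us=-1
```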
The information you get from Google is very mixed and misleading, so we need some definitive documentation at some point, definitely. It's not Google's fault, it's just finding what's out there, and ChatGPT or Bard will be spitting it out soon, right?

Okay, so what am I proposing with my talk? Providing a built-in tracer, or maybe something that's part of rtla, that will monitor a real-time application and just look for it doing stupid things. So it's different from what we heard from Steven and from Daniel, where it was more like: we have a certain problem and we want to analyze it. That's not what I'm going for. I'm coming from the angle that we have an application and we just want to see if it's doing stupid things, even if it's hitting all of its deadlines. Stupid things like using timerfd. Just monitoring that will help a lot of application developers to say: okay, I guess I'm not supposed to use that, and then maybe they'll look at why.

We'll take a look in a minute at a live demo of something I scripted up to show these four points. It does live reporting of page faults. It does live reporting if a real-time task goes into the S state, unless it's doing it via clock_nanosleep, that's good, or because it's blocking on a priority-inheritance mutex; those are the two legitimate reasons to go into the S state. This is only for RT tasks. (Question from the audience: there are multi-threaded applications where some of the threads are not doing real-time activity.) Right, so instead of RT applications I should really say RT tasks. I'm talking about monitoring RT tasks, which generally means RT applications, but it doesn't have to. So unless a task has some kind of RT policy, I do
not care what it's doing. But if it does have one, I'm checking whether it's page faulting, and I'm checking whether it goes into the S state for some other reason, whatever the reason is. I'm also reporting how much time a task was preempted by a higher-priority task. So there's a bunch of threads in my system that are higher priority and they do a bunch of stuff, and I just get the information: hey, I was preempted for two milliseconds, not because of a bug in the kernel, but because my own software is preempting itself, and I'm simply made aware of these things happening. And of course, any time an RT task blocks on a mutex without priority inheritance, we should know about it. Maybe we're using some library, something from Qt or who knows what, you'd be amazed what everyone's using these days, and it just has a normal mutex. We need to know: hey, we're blocking on mutexes that aren't priority-inheritance capable. There are a lot more things I'm interested in; I'll talk about that at the end. But I have implemented all of these things in an example that we can take a look at right now; there are only four examples we'll go through.

I did this with a bpftrace script. I don't know if everyone's familiar with bpftrace. bpftrace uses BPF, but it has its own high-level scripting language, so it's very easy to actually write scripts; it compiles them into BPF programs and loads them into the kernel to run. The reason I chose bpftrace is that for what I did, and we'll take a look at it later, I needed BPF. I needed to get at things that are not available in ftrace. For example, when some function is being called and I want to know, is this a real-time task, I need to get at the prio field of the task struct. That's not available with ftrace; with BPF I can get this information really easily. So I can actually see whether something is related to real time or not, and then I can just
ignore it, or I can decide to start tracking it. This is why my suggestion was: if it were a built-in tracer, like the IRQ latency tracer and things like that, then we can add all the hooks we want; or we can say we want to do it in rtla, and then we use BPF for that and grab all the information we need. The possibilities are endless.

Okay, so let's go ahead and take a look at this live demo. I hacked it up this morning, actually, not last night. The purpose of it is not to look nice; it's just so you get a feel for the things it automatically detects and reports, which can be really interesting. This is actually running in a KVM, because I wanted a more modern kernel. These two windows are running on the same machine, and in this window, I think everyone can see that, I'm going to start the bpftrace program that loads everything in. All I'm doing is using print: one of the features of bpftrace is that you can do printing, and it does the formatting in a different context and just streams it out, so you can actually see the information live. It's not super efficient, but for a proof of concept where you can see what's going on, it's fine.

Let's take a quick look at the script, just to see what this thing looks like. I mean, it's not huge, right? This is about 100 lines. There are a couple of big sections, but mostly it's pretty small, and that's the whole thing; all four of those checks are covered by this. The way it works is that there are certain situations where I want to say it's allowed to go into the S state. For example, the IRQ threads: they're going to go into the S state every time they handle an IRQ, and that's okay. Or the smpboot threads that are started there. For example, there's, what was the one that kept bothering
me? I forget. But there's a handful of threads there, all running at real-time priority 1, like the RCU threads and such; they're going to keep going into the S state, and I didn't want to debug those. That's all kernel stuff. This isn't about debugging the kernel; this is about checking user-space applications.

The way it works is, for example, I have a kprobe for irq_thread(). This is the function that invokes the handler: we're in a while loop, we invoke the handler, and at one point there's a schedule() there, and that's what actually makes it go to sleep. So here you see I'm inserting a kprobe at offset 191 into irq_thread(), right before we call schedule(), and this BPF program is inserted at that point. Basically I have a hash table here called allow-S-state, and keyed on the task ID I just set a value of one, which means: we're right before schedule(), so if we go to sleep at this point we're okay, we're in the irq_thread() function and it's totally normal that we sleep there. And then offset 6 is when we come out of the schedule (I just used objdump to get these offsets), and I can delete the entry, because now we've finished with the schedule and I don't want it in my hash table anymore; I remove the entry from the hash table. This is just to show you the elegance of bpftrace, how easy it is to use a hash table so I can start tracking things at an arbitrary place inside the kernel, without there being any special tracepoints or function tracing or anything like that. That's the magic of kprobes, kprobes plus BPF. I do the same thing for the smpboot threads: at offset 219 I insert into my hash table that it's okay for this task to go into the S state, and at offset 224, when we come out of that schedule, I remove it from the hash. This is
just so that we're not getting noise from those things, because it's something I think is okay to go into the S state.

If we look at something a little more interesting, for example do_nanosleep: I have a kprobe on do_nanosleep, and here too, if we're going into do_nanosleep, it's okay to go into the S state; I'm expecting that. What you should also notice here, I guess I should have mentioned, is the second line: in bpftrace this is a predicate expression that controls whether or not the BPF program is executed. I'm using the prio field of the current task struct to decide if it's less than 99, which means it's a real-time priority; that's the only time I even go into these BPF programs. If the task is not a real-time task, nothing changes. On every single one of my bpftrace scripting functions, I don't know what you call them, BPF snippets, I use this exact same condition, so they only execute when we're a real-time task. So when we go into do_nanosleep as a real-time task, we note that the task is allowed to go into the S state, and when we return from do_nanosleep, we remove it from the hash table again. Same thing with futex_lock_pi: if I'm a real-time task and I go to lock a mutex that has priority inheritance, then I'm allowed to go into the S state; that's totally normal. So basically I'm just marking regions where going into the S state is allowed.

The last section, which gets a little trickier, is sched_switch. The sched_switch probe is checking two different things. It keeps track of when I get preempted, takes the timestamp of that, and whenever I get rescheduled it captures the difference and prints the output. The first part here is really
just keeping track of when I was preempted by someone with higher priority. Here you see I'm using another hash table tracking the scheduled-out state; this is just to track whether I was scheduled out. If I was scheduled out, then, I didn't do it very nicely here, but basically I calculate the difference and produce the "preempted" output, with all the information that the task was preempted for a certain number of nanoseconds. And you see that if my previous state is zero, which means I'm runnable, think about it: we're in sched_switch, my previous state is runnable, and I'm an RT task; that's where I want to start tracking. That's where the entry goes into the hash table: when I'm being scheduled out while runnable and RT. Then when I get scheduled back in, that's when I can remove the hash entry and print. So that's the tracking of how long I was scheduled out.

The second part is about when I'm sleeping, going to sleep on something. First of all I check whether the previous state is one, because if the previous state is one, the task is going to sleep; if it's something else, it's not interesting at all. So we come down here and we know: okay, I went to sleep. And this is where it gets interesting: I just check, are you allowed to go to sleep? If you're allowed to go to sleep, I just delete the entry and say, yeah, that's okay, you're allowed to go to sleep. But if you are not in that allowed-to-sleep hash table, then I report: someone went to sleep, with all the information, this task went to sleep and it wasn't because of nanosleep and it wasn't because of blocking on a PI mutex.

Then the last ones here are really simple, just showing page faults. These are the page_fault_kernel and page_fault_user tracepoints, which do not actually catch all the page faults; these are just the exceptions. So, in my opinion, things like mlockall or
page faults due to the stack growing are not counted there, because the kernel is doing it on purpose, so it doesn't actually cause an exception. In my opinion we should be tracking those as well, but I'm just using what's there right now; although I could also have put a kprobe into the mlockall code, or where the stack page-faulting happens, I could have added some kprobes there.

Okay, and then the last one here is just the futex probe. This is just to show one real, specific example: I'm a real-time task and I'm doing a futex call with op 0, which means a lock via FUTEX_WAIT. In this case it's not priority inheritance, because you only use op 0 when it's not priority inheritance; otherwise you use, I think it's number 6 or number 5, something like that. So the fact that I'm using op 0 means priority inheritance is not in play, but I am a real-time task, so I report that information too. And then the rest is just cleanup at the end and initialization stuff. So there's really not a whole lot there.

I did a couple of tests to try to trigger something and I couldn't. (Audience: condvars, wasn't it?) I know we had that problem before; I'm not sure if that problem still exists. But the condvars were completely rewritten, and I thought they changed the semantics there, because I wanted to trigger that exact situation and I couldn't; maybe I needed a more complicated example of blocking. Okay, so that's the whole thing, and you see at the top my interpreter line. (Audience comment.) Yeah, maybe that's being used for the condvars; anyway, I just wanted my demo to work. You need to take some time for it, but you see how easy it is to get these things: so easy to just grab, you want the third argument, you need something from the current task struct, anything, you can just grab them. It's
really nice. Okay, so let's actually do the demo. At the top I specify bpftrace as the interpreter, so I just run this like a script: it runs bpftrace with my script as the first argument. It's set up to run for, I think, 10 minutes, which should, oh, it has to be root. So that's running now. The first time it takes a little while, because of my virtual machine and this machine; it sets, I think, 15 kprobes, and then it's ready. And you already see something came up: this ktimers test was actually preempted by something. I don't show what it was preempted by, because it could be all kinds of stuff, but it was definitely preempted by something higher priority, for 390 microseconds. This is running in KVM, so even though it's an RT kernel, you can't trust the numbers right now.

Okay, so the first thing I want to show is just running a bash. Let me go ahead and start a real-time bash shell here, and you immediately see lots of things: page faults, you know, things we'd expect, nothing exciting here. I'm doing the kernel stack trace for some of the events; for example, if it's going into the sleeping state, I actually show it, and you can see that a select is being done. Every time I press the L key, it's reading, then it goes back to sleep; I press the S key and it does some more; I hit return and that ran ls, so there's a bunch of page faults and all this stuff. You're just seeing the things happening live on the system. And if I'm doing my real-time application correctly, I shouldn't see anything; it's showing me things that I'm probably doing wrong at a real-time priority. I shouldn't be at real-time priority if I'm doing these kinds of things.

So let's go ahead and go to the next demo. This is an IPC program; it uses shared memory to communicate data, so
basically data will be written into shared memory. It's doing things correctly, with priority inheritance and mutexes and condvars and all these things, just to see what happens there; actually, when we do trainings and such, I use this example to show people how to do things correctly. However, when I start this example, oops, there's a reader and a writer, because it's using shared memory between two processes, and you see that I'm externally giving them real-time priorities. That means there are going to be page faults showing up, because I'm immediately giving them real-time priority and then they do all their page faulting. I start the receiver in the background, then I start the sender and type some message, and it gets sent to the receiver. So I'll just go ahead and run that. It asks me for a message, "hello world", and it gets sent. Now, you'll see that nothing happened after that message came through. Maybe I should go a little slower here: when I start this at the beginning, we get all these page faults, and we have a sleeping event happening, because I'm reading, I'm doing an fgets, the sender wants to know what it should send, and I'm doing all of that in a real-time context. But from the moment I actually type my message onward, we see nothing else, because the rest of the application is fine; it was just this setup that was wrong. And this made me realize I'm doing the setup wrong: you shouldn't actually be in a real-time context while you're setting yourself up, because there might be other real-time applications on the system and I'm affecting them. So this whole idea of really being aware: do I need to be in a real-time context or not? To pre-fault my stack and do all this setup, I don't need to be in a real-time context, and I should not be in a real-time
context right and so this kind of helps to point that out which even for me was a little bit of a aha moment that I saw that there the next example is just with a mutex so this is a really horrible coded example but this is just to show that we have a thread that's going to take a mutex and I'm not setting the inherent so it's commented out and then I'm going to start a secondary thread so basically it's just going to take it and hold it for ten seconds and the secondary thread is going to try to grab it and we're in real-time context here with a schedule priority of one this is just to show that we're seeing those messages so if I run the mutex program so you see here that we're seeing the non-pi-futex weight and we're seeing the sleeping situation right because both things are happening here I'm sleeping on that mutex so I went into sleeping and that wasn't one of the two criterium and like a fine detail is I'm doing a non-priority inheritance I'm blocking on a mutex so we're seeing both of those examples there I didn't need to end that and then two more demos then there's a busy demo this is just also really quick program basically I started thread that's just busy waiting with a priority with a priority of one and I have another thread that's just going to sleep at the beginning and it's just going to jump in there for some loops and then jump out so it's just to show that we're interrupting another thread just to show that that's accounted for so let me just go ahead and run the busy program so this is just to show you that what that looks like so I don't do a backtrace or anything like this but this is showing you that this task the PID 572 was actually interrupted twice it was scheduled the way and for the first time that was actually when my program did it this is when my program ended I don't know who that was I might have been some interrupt handler or something that ran there really quick but you can see I can actually see I have real-time tasks that are 
You should know that these things are happening, and now I can see them. And then the last demo is cyclictest itself. If I use cyclictest, the question is: what does it look like? Cyclictest does mlockall and its usual setup; I'll go ahead and run it at priority 98, second-aligned, otherwise defaults, just to see what it does. Now this machine only has one CPU, but what you see is that, obviously, running at priority 98 it's going to be interrupting lots of other tasks the whole time — which it's actually doing here. You can see RCU and all these things being interrupted. Cyclictest is only there for a moment, but it's interrupting these other real-time tasks; you can see real-time tasks being scheduled out for 39 milliseconds, 36 milliseconds. Again, this is KVM, so don't take the numbers too seriously, but the point is you see all of that. What you didn't see with cyclictest is page faults; you didn't see it sleeping; you didn't see it taking mutexes that weren't priority inheritance. So cyclictest is doing those things correctly, which is really great — it just started off by preempting everybody. So my vision really is: when we have something like this and people can see these things, how is it going to affect how people develop their software? First of all — and this is how it would affect me — they would probably drop their real-time priority whenever they have to do something that's not real-time critical. This is actually quite similar to security applications: they drop their capabilities, do some stuff, and only take the capabilities back when they're actually doing something that needs them. And you can start to see this kind of thing: if I do need to log to a file, I know I shouldn't be in a real-time context for that, so I just drop my real-time priority — I don't want to log to a file from a real-time context anyway.
So I'm actively making sure that I'm only using RT when I actually need RT. Maybe there are certain system calls, certain I/O I do, where I say: okay, our system is offline, so this task also comes out of real-time priority and can now do non-real-time stuff. For example, people are going to have to simplify their IPC models, because Unix domain sockets, message queues, pipes — that kind of thing is a disaster for real-time, and a lot of people just don't know that. When they see their application sleeping all the time — they keep getting sleeping, sleeping, sleeping — it's because they're using these IPC methods. They're going to have to use shared memory or something like it, so they can guarantee they don't go to sleep. And I think they will also tend to prefer multi-process over multi-threaded, because when you're multi-threaded there may be things you do in glibc where you're actually contending with threads that are not real-time. Maybe you do an sprintf or something like that, and there's a memory allocation, and for a moment you're actually contending with another non-real-time task inside your own application. These kinds of things would show up — why am I sleeping, what's going on there? — and it's because you're contending, or maybe the non-priority-inheritance mutexes would show up, something like that. So people will realize that with multi-process you're only dealing with one thread and you don't have to worry about these things. A multi-threaded application that's half real-time and half not real-time is, in my opinion, a bit of a dangerous game, because you're sharing the common resources of the libraries you're using. And of course this will also help people use RT only when they really need it. I'm not just talking about dropping RT priority temporarily; I'm talking about whether this application
doesn't need to be RT at all. I see a lot of user-space developers who say: okay, I need a preempt RT system; we have 50 threads, and all 50 threads have to be at RT priority. That doesn't make sense — maybe only three or four threads are actually doing real-time work. They're abusing real-time just to make things go faster or run smoother, and that's garbage. For the non-RT stuff you've got cgroups with CPU scheduling, you've got nice values — that's an enormously flexible toolset for the non-real-time parts. Use that, and really decide what needs to be real-time. And when they have something complaining at them, they'll make as little of it real-time as possible to get rid of the complaints. Then of course the biggest benefit: they'll start using the correct APIs. They'll realize — oh, I can't use that mutex, I can't use timerfd, I can't do all this — and they'll look at the documentation, or ChatGPT or whatever, hopefully it knows, and find the correct APIs. And then they run their application and it doesn't complain at all; it's just silent the whole time. That doesn't mean their application is correct, but it's a lot more correct than it was. So when they do have a problem and come back, it's probably a really complex problem — and that's something different. Okay, and then these are just some final thoughts I had while writing up this script. What about the D state — is that something we need to be tracking? What about being woken up by a kworker, or by ksoftirqd, or by a non-RT task — is that an alarm, something I should flag? Because if a kworker is waking me up, I'm probably doing something I shouldn't be doing. I'm also not checking the clock on nanosleep — the realtime clock versus the monotonic clock — that's maybe also something to check. It was mentioned earlier about CPU frequency scaling; that may also be something we want to flag: hey, there's CPU scaling going on here — just something we can complain about. There might be some more safe syscalls — maybe in the GPIO subsystem it's okay to do the ioctls; I haven't looked at that closely — so there might be a whole set of places where we can say it's okay to go into the sleeping state here, and mark those. And then there's the idea that condition variables are really designed for wake-up-plus-lock: if the pattern is "I wake up and I have to grab a lock," that's what condvars are actually for. So if someone is using a mutex with a semaphore, we can notice: hey, you're reacting off the semaphore — which is okay, it's a notification method — but grabbing that mutex afterwards is not good, because we have a special type for exactly that. These are things we could auto-detect: just track things in hash tables in BPF and say, okay, you just did this and now you did that — maybe you should be using condvars. So there are places where we could actually make suggestions. And that's it. Are there any questions, comments, feedback?

Actually, it's very good, and bpftrace is fine and everything, but I'm thinking, for general purpose, a lot of this information would probably be useful in a tracer — create a new tracer; that's what tracers are for, to do something unique. Then you could add everything you want in the tracer; you'd probably have a little more flexibility because you're in the kernel. And then, on top of this, I would have RTLA, because I was thinking about other use cases — like when you were showing here, I don't care about cyclictest interrupting all these other tasks — so I'd have a user-space tool on top.
It reads the trace — maybe the trace is showing all of this, but you can tell RTLA: I only want this trace, only these things traced, ignore everything else. So I think having a combination of a user-space tool and an in-kernel tracer would be advantageous, and you could probably do a lot more. And again, having it in the kernel — as for bpftrace, there are a lot of environments where BPF is not really available, but if you can just say, hey, enable this event, run your thing, then you might be able to do a lot more tracking.

Yeah, that was the original idea — that's why the title says "new tracer." And really we're talking about a general set of rules that people should be following.

Yep. These rules we can all agree on — we want to get a notice if that happens. And like I said, I could see a lot of cases where you want to ignore things, or do different things, or set up a bunch of parameters, which with bpftrace would require editing the bpftrace file. Or, like I said, we have a user-space tool like RTLA — in fact, what's nice about RTLA is that it could detect something like you said, the pattern with the semaphore and the mutex; it could actually say: we noticed this pattern, perhaps you want to use this instead. So it could give more input to the user.

Yep. That's the idea behind RTLA — it's a helper for the tracers. We can also use RV for this. But there was a question here.

Maybe something to enhance: what about ugly things happening on the kernel side while I develop my drivers — such as disabling interrupts, disabling preemption, or using spinlocks, raw spinlocks? That's more what RTLA is for, right? But maybe this tool can also reveal those situations when I use it. The reason I would say maybe not is because I want to have something I can
trust, so that I can really whitelist the kernel threads — because that makes the rules a lot easier, when we can whitelist the complexity in the kernel and just focus on what user space is doing. If we're saying, yeah, I'm developing a driver and I don't want to trust kernel threads anymore, then the rules might have to get a lot more complex. But that might be something where we can say: for this thread, I do want to make sure it doesn't go into the D state, these kinds of things.

So, one thing that came to my mind — I talked to you before, and this was on my to-do list — is that we can easily write a state machine that represents all the states you have here and the composition of them, and those are candidates for RV monitors, where you can track those things. RV monitors run in the kernel — they're C kernel code — and we can add any type of configuration option for a monitor. So we can say: I'd like to start this monitor, and this monitor only traces RT, for example; but we can also say: filter only this task; and then we can add any kind of filtering we'd like. And the good thing about doing it inside RV is that the state machine is already the documentation. I think the only tricky part is whether we'd want to go the BPF
route, or somehow try to integrate it with the tracer, just because at the end of the day you need to grab that data.

That's not a problem — RV is connected with the tracing subsystem, so you can send output to the trace subsystem; it's part of the tracing subsystem.

I had one comment and one question. The comment: if we don't have a note in the man pages saying "this is not RT-safe," we shouldn't really blame users — we should really update the documentation. But that's beside the point. My question: you showed the script, and it seemed to refer to kernel symbols and offsets, so that's not very portable, right? What do we need to do this properly, in a maintainable way?

Yeah — like Steven suggested, it could be an actual RT tracer, one of the currently available tracers, and then we're just adding code into the kernel at these points. The reason I had to use these offsets is that this is an out-of-the-box Debian RT kernel. I didn't touch the kernel code, so obviously I had to use objdump to disassemble, find those points, and insert the kprobes. So this will break with the next kernel update — yes. Thank you.

It's also possible to add a trace hook that you can attach at that point, or add a tracepoint.

It's just a proof of concept, so you see it — I mean, it might work, it depends; a symbol moves, a symbol changes.

As another enhancement that could be done: when you're catching incorrect mutex access, or incorrect memory allocation, or page faults in your case, it may be better to point out where in the program it happens. For that you need the user-space stack. Actually, lots of systems just don't allow you to trace user space from kernel space, so maybe it's better to send a signal to the application that misbehaves and rely on — I don't know — killing the application and getting a core dump.

Actually, bpftrace does support user-space stacks. Originally I had it turned on, but
there were so many user-space page faults that it was distracting — you see all these stacks. But it does support user-space stacks.

Does it support user-space stacks on all platforms?

I could add it back really quickly — and signal handling can usually get you a user-space stack on almost all platforms.

Sort of — if frame pointers are enabled, yes, it does. If frame pointers are disabled, a lot of the time — I know perf does this hack where it just records 2k or 4k of the user-space stack and then post-processes it, looking for the addresses. But we are working on SFrames — that's probably two or three years out — where you just look up the SFrame data. It puts into the ELF format a kind of ORC unwinder that the kernel can read to figure out the stack trace at that time.

SFrames? SFrames, yes.

But one thing he mentioned is that maybe you'd like to take an action when your tracer hits an exception, and that's why I see RV connected here: with RV we can take reactions when you go outside the design.

There's a question from a virtual attendee: "I'm listening to the criticism of cyclictest not allowing high C-states by default, and I'm thinking: would you not recommend an RT application writer use the same trick? Should your tool detect a transition to a high C-state?" I mean, that's a good question. The question is really: is my product allowed to run that way? Obviously it's going to run a lot hotter, with a lot more power consumption; you can actually have warranty issues with the chip vendor by running the product in C-state zero the whole time. There's no universal answer, because it depends on your requirements. If your latency requirements are very small, you probably don't want deep C-states. On the other hand, if it's a relaxed real-time system where you can tolerate the wake-up time out of the deeper C-states — which is
up to a hundred microseconds for the deepest ones — and you still make your deadline, why wouldn't you save power? I mean, there's no single answer; you have to know your requirements. Real-time is hard because you have to design the system based on the requirements, not throw a real-time kernel at the thing and say, oh, it's real-time now — that doesn't work. It's like a firewall: just because you have a firewall installed on your computer doesn't make your computer secure; you have to actually engineer the firewall configuration to make it correct. The mere fact that there's a firewall doesn't tell you anything. So this is really engineering, and people have to put engineering resources into it at the system level. And it gets more complex if you add safety — but then it's nothing other than system-level engineering. It's a great question. It should be clear that I'm not advocating that you use or don't use the C-states; I was just pointing out that cyclictest modifies the behavior. If you want your product to do that, that's fine — that's your decision. But the fact that the measuring tool is doing it for you is a problem, because I see it with real customers: cyclictest has great results, but for some reason their application sees much worse results. The measuring tool is doing a tuning that your workload is not necessarily doing — so it's the workload, not the measuring tool, that needs to set that up. Great — thank you.