Hello, my name is Ramesh Thomas. I work in the Open Source Technology Center at Intel, and I'm part of a team that contributes to real-time Linux. By real-time Linux I mean the distribution with the PREEMPT_RT patch, which some of you may be aware of. We generally look for optimizations in the kernel and tools that can help real-time applications. One particular area we look at is how we can take advantage of certain hardware features that generally get turned off in most real-time environments. The subject of this talk is C-states, the CPU idle power management. I'm going to be talking about that, and about how we can include power-managed CPUs as part of real-time application design.

Everything I'm going to talk about is documented in detail in a wiki at the Linux Foundation. I'm also going to upload these slides after this presentation, so you will have them.

I've been looking at what goes on behind the scenes: why do deep C-states impact latencies? I've also been looking at what options are available in the kernel to mitigate them, so that instead of disabling them we will be able to tune them to our needs. These are the methods we are going to talk about.

There are several advantages to power saving. One thing we should do is try to avoid the all-or-nothing approach. There are several flavors of C-states available, with different levels of latency impact.
So we should be able to dynamically choose the ones we can use and disable the ones that may be problematic. Similarly, a system has several cores, and not all of them may be running real-time applications; some may be running something like graphics, which is not real-time. We don't want to disable C-states system-wide, so we should also be able to control them individually on each core. And we should avoid static configurations at boot time: we should be able to change the configuration dynamically over the lifetime of the application, as conditions change.

I've also worked on Zephyr power management, and one thing I realized is that any power management solution, especially in these embedded areas, is effective only if you tailor it to a specific need. In the real-time application case, determinism is the main goal, so the solutions are designed around solving the problem of determinism.

So let's talk a little more about what determinism is. These graphs are generated from histograms produced by a tool called cyclictest, which we'll talk about. The x-axis is the latency and the y-axis is the number of samples. On the left-hand side there is some latency, but it is fixed and consistent: that is deterministic. In the real-time application we can budget time for it and compensate for it. The right-hand side is not good, because the latency is jumping all over the place, and it becomes very difficult to come up with any solution. Just note that the right-hand graph has deep C-states in play, while the left-hand side has been tuned to block deep C-states. This variation is what we call jitter.
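The measurement behind these histograms can be sketched as follows: sleep for a fixed interval in a loop and record how late each wake-up is, which is the same idea cyclictest implements. This is only an illustration, not a replacement for cyclictest, which uses `clock_nanosleep` under real-time priority:

```python
import time

def measure_wakeup_latency(interval_s=0.001, loops=100):
    """Sleep for a fixed interval in a loop and record, for each
    iteration, how much later than scheduled the wake-up happened.
    A histogram of the returned latencies is the kind of graph shown
    in the slides."""
    latencies = []
    next_wake = time.monotonic() + interval_s
    for _ in range(loops):
        time.sleep(max(0.0, next_wake - time.monotonic()))
        now = time.monotonic()
        latencies.append(now - next_wake)  # positive = woke up late
        # Schedule the next wake-up relative to the actual wake time.
        next_wake = now + interval_s
    return latencies

if __name__ == "__main__":
    lat = measure_wakeup_latency()
    print("samples: %d  max latency: %.1f us" % (len(lat), max(lat) * 1e6))
```

On an untuned system with deep C-states enabled, the spread of these latencies is what shows up as jitter.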
So let us see where this jitter comes from. As we know, a deep C-state saves more power, but it also has to do a lot more while entering and exiting: it has to turn a lot of things on and off. The shallower ones also save power, but they retain most of their state. From C3 onward, the cache and TLB get flushed. Now, the problem is not just that the deeper states do more things; the problem is in what they do. When the cache and TLBs get flushed, an application executing code or accessing memory that is in the cache runs much faster than one exiting a CPU idle state where the caches were all lost: it has to access memory directly while the cache gets repopulated. This varies a lot, and this is where the jitter and the main latency variations come from. There are other things too: when things that were turned off are turned back on, the kernel may need to do synchronization, interrupts may have to be disabled, and PLLs may need relocking. All of those introduce jitter. Now, if all these things happened in a consistent manner, it would be okay; even a larger latency is fine if it is consistent. But it is actually jumping around, and that is the problem.
That's what we are going to solve. So let's talk about these charts, because we are going to rely on these graphs a lot. As I said, they are generated by a tool called cyclictest, and it is run over a long period of time to get reliable latency data for a particular platform and kernel configuration; that is important. In a particular platform and kernel configuration there will be a certain latency behavior, so we run the test for a long time to get reliable data. On the left-hand side we get a certain latency behavior, for which the application can later budget some time to compensate. On the right-hand side is the worst-case condition: we run it for a long period, 24 hours or more, to get this worst case, which we will use in later tuning. We should know the worst case, so that when we reach a critical phase we know what to watch out for and how much buffer time we need to keep.

So during those critical phases we don't want this jitter to happen. How and why is it happening? Because deep C-states are in play. So one obvious thing that comes to mind is to block the deep C-states at critical phases. That means we need to be able to dynamically control the C-states, right? So let's see how we will control the C-state selection.
To control anything in software we need some attributes: things we can use to set conditions and do filtering. C-states have two such attributes: one is exit latency and the other is target residency. The exit latency is the time taken by the hardware to exit that C-state. As for target residency: a C-state consumes some power just to enter and exit, because the hardware has to do some things when it enters and when it exits. The minimum time you need to stay idle to compensate for the power that was spent just entering and exiting is called the target residency.

The kernel policy, which is implemented by the CPU idle governor, uses these two attributes to select the appropriate C-state for a given idle time. So how do we filter with these attributes? One thing we notice is that the deeper the C-state, the higher the exit latency. Similarly, the deeper the C-state, the more it has to do, so it consumes more power on entry and exit, and the target residency is also larger: it has to stay idle for a longer period to justify entering that C-state. So now we can start thinking: what if we give the governor, the policy, some kind of constraint saying, don't select C-states above a certain exit latency, or above a certain target residency?

Based on these attributes we can think of two methods. These are independent methods: even though the governor uses both attributes, for our purposes we can use them independently. So we need to look at a way to specify a constraint that filters on exit latency, and a way to filter on target residency. We'll talk about both.
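The way the governor applies these two attributes can be sketched as a simple filter. This is an illustration of the selection idea, not the actual governor code, and the state table numbers here are made up for the example (real values come from the cpuidle driver):

```python
# Illustrative C-state table: (name, exit_latency_us, target_residency_us).
# The numbers are invented for this example; on a real system they come
# from /sys/devices/system/cpu/cpu0/cpuidle/state*/.
CSTATES = [
    ("POLL", 0,   0),
    ("C1",   2,   20),
    ("C3",   30,  100),
    ("C6",   130, 200),
]

def select_cstate(predicted_idle_us, latency_constraint_us, states=CSTATES):
    """Pick the deepest state whose exit latency fits the latency
    constraint and whose target residency fits the predicted idle
    interval -- the two filters the CPU idle governor applies."""
    chosen = states[0]
    for name, exit_lat, residency in states:
        if exit_lat > latency_constraint_us:
            continue  # would violate the latency constraint
        if residency > predicted_idle_us:
            continue  # idle period too short to justify entering
        chosen = (name, exit_lat, residency)
    return chosen[0]
```

With this table, a 30 us constraint filters out C6 no matter how long the idle period is; with no constraint, keeping the idle interval below C6's 200 us residency filters it out the other way. These are exactly the two methods described below.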
So let us start by looking at exit latency. In this graph, and in the next slides, the scales are just for illustration and are exaggerated; they just show that the deeper the C-state, the higher the exit latency. In our example, the shaded region is a constraint; we'll see later how to specify it. Say we specify a constraint on the exit latency we will tolerate. The governor will then apply this constraint, and any C-state whose exit latency is higher will be rejected. In this example C6 was rejected, and C6 was the one causing the huge jitter you saw in the graph where the latency was going all over the place. So when we remove it, the jitter goes away.

One thing to note: when we are calibrating, trying to find the proper exit latency constraint that will block the problematic states, we don't need to know the specifics of the C-states. We only need to run a calibration process, trying out latency constraints until we get good results. Once we get a latency constraint that gives good results, we'll refer to it as a safe latency constraint.

The next method is filtering using the target residencies. The deeper the C-state, the larger its target residency, and the governor selects C-states by filtering on target residency. When the CPU goes idle, the governor gets called, and the first thing it checks is how much time it has to idle. It has algorithms to determine that; one of the inputs is the next event. It knows when the next event is. So what could the next event be?
If there is a timer tick scheduled, it could be that one; but if you use the dynamic (adaptive) tick instead of the periodic tick, the next event will be the next event that is actually going to happen. So the idle interval could be the time that your application is sleeping before it is scheduled to wake up.

Now, for example, say C6 has a target residency of 200 microseconds. If the governor sees that the next idle interval is less than 200, say 199 microseconds, then C6 will not fit into the idle interval, so it will find the next-deepest C-state that does fit. That means if we can control the idle interval, we should be able to filter out C6. How do we do that? In our application, when we sleep, we never sleep for more than (in this example) 199 microseconds at one time. If we want to wait for longer periods, we break the wait into chunks of 199 microseconds.

Let us name these two methods based on the attributes they work against: the safe latency constraint, which filters using the exit latency attribute, and managing the maximum idle time your application sleeps at once, which controls using the target residency attribute.

Now let us quickly look at what goes on in the kernel. When there is nothing to do, the kernel goes into its idle loop. If no C-states were enabled, it would just wait in a tight idle loop; that's a total waste. If C-states are enabled, it calls the CPU idle driver; for example, intel_idle is the driver for Intel CPUs, and if there is no platform driver it will use the ACPI idle driver. The driver calls the governor to check the policies and pick the appropriate C-state. After that, the CPU idle driver passes the selection to the hardware, and there is other logic in the hardware, called the power control unit, which can demote the state even further. But demoting, lowering, is not a problem, because the shallower the C-state,
it is safer, right? So we don't have to be concerned about that. We only need to focus on the governor, and all our solutions will interface with this governor. The governor uses exit latency and target residency to filter out C-states: it compares the target residency with the idle interval that it has, and the exit latency with the constraint. Now, we have been talking about giving it a constraint, but we don't yet know how to give the constraint. In the next slide we'll see how to specify it.

We specify the exit latency constraint using an infrastructure framework called Power Management Quality of Service (PM QoS). This is a very simple interface: all you give it is a number, in microseconds. The PM QoS framework interfaces with the governor and provides it with the constraint that was given, and the governor uses the constraint to compare against the exit latency. We'll talk about it in more detail.

Let's see the advantages of this method. It gives the application the option to change the constraint dynamically at different phases: you can enable all C-states, disable all C-states completely, or fine-tune in between. It also gives the option to control it per core, so you can turn off C-states on some cores while keeping them on on other cores. So it gives a lot of flexibility. And the interface is pretty simple. You can use PM QoS system-wide, turning C-states on or off for all the CPUs in the system, or per core; what I'm showing here is per core. It also has a driver interface and a user-space interface.
This is the user-space interface. Every CPU has a pm_qos_resume_latency_us attribute; you just write the constraint to it and that's it. There is also an option to disable all C-states completely by writing n/a, or to enable them by removing all restrictions, writing 0. This is a new change that was added, I think, at the end of last year, so it is there in 4.16; RT Linux is currently on 4.14, so these changes are not there, and these are the three patches you may need to pull in if you are using the RT Linux distribution. One more thing I want to note: the system-wide interface and the per-core user-space interface behave a little differently, so you can refer to the wiki page that I pointed at for details; the references also cover it. Another thing: I don't see this in the documentation yet, it is missing from the PM QoS documentation, so you may need to refer to this, or file a bug against the documentation.

So let's recap the main concepts we covered. We know that deep C-states cause high-latency jitter, and during critical phases we want to filter them out. How do we filter them out? We filter them out using those attributes of the C-states and a set of corresponding user controls: for exit latency, we are able to give a latency constraint; for target residency, we limit the idle interval our application sleeps at one time.

With this information, let us look at an example tuning. For tuning we will first have to calibrate: we'll need to find out some values. First we find out the worst case, which we talked about, and then we find out how we can achieve good results.
That is, a latency behavior which is consistent. We have two methods to do this: one using the latency constraint and the other using the idle interval. We'll go through the steps in detail now.

We talked about cyclictest; we use cyclictest to generate the histograms. Cyclictest has a histogram option, and the wiki page gives more details. The parameter we are most interested in is the interval: this is the -i option, and the argument is in microseconds. The way cyclictest works is that it sleeps for this interval in a loop; each time it wakes up, it checks the time and sees the difference between the amount of time it was expecting to sleep and the actual time it woke up. That difference is the jitter, the latency. It runs this for a number of samples, or if you give it a time duration it keeps running in the loop for that duration. The longer we run it, the more reliable the data.

When we are running it to get the worst-case scenario, we remove all restrictions: in PM QoS we write 0, which means no restriction, and cyclictest keeps a high interval so that all the C-states can get in.

The first step is finding a latency constraint that will block the deep C-states. For this, we keep the cyclictest interval high, so that the interval itself does no blocking, and then we calibrate the PM QoS constraint, trying different latency constraints until we get a good result. Similarly, when we want to find a good idle interval, we remove all restrictions in PM QoS and try different intervals until we get good results. A good result is basically the bottom graph. In this example, and this is only an example, say we found the worst-case maximum to be
400 microseconds. For this platform and kernel configuration, if you run the test for a long enough period, this is a reliable number. And to get good results, using the latency constraint method we found that a 30-microsecond constraint can filter out the problematic C-states, and a sufficiently short idle interval (199 microseconds in the earlier example) also gives similar results.

So with this information, let us look at an example tuning. In this example we have a deadline at 1000 microseconds. We have time to idle for those 1000 microseconds, but at the end of them we need to be ready: the response should be reliable, right on the 1000-microsecond mark. How do we ensure that? We have to wake up a little earlier, because if we wait there with all restrictions removed, the deeper C-states will also get in, and as we know, when the deeper C-states come in we'll be having that kind of jitter. We know that 400 microseconds is the worst-case latency, so we can use that knowledge to wake up that much time before the problematic point, and then do some things to keep the jitter from happening.

So what we do is wake up a little bit earlier than even the worst case, because after we wake up we'll have to do some operations: we have to prime the cache, that is, access the critical code and data so that they get repopulated into the cache before we reach the deadline. Another thing to notice: we may ask to wake up 400 microseconds early, but we may not actually wake up exactly there. It could be anywhere around that point, because the wake-up latency is not fixed; it could wake up any time within that window, right?
So that is what we need to be aware of. When we come up, we need to look at how much time remains until the deadline, and during the remaining time make sure that the deep C-states don't enter; we don't want to take any more chances. The first example uses the idle interval method: we start waiting in 199-microsecond chunks, so deep C-states will not enter and we are sure to meet the deadline accurately. Another option is the PM QoS method, which may be a little easier in certain use cases: as soon as we wake up, we set the latency constraint to the safe latency constraint that we found, in this case 30 microseconds. The other option is to just write n/a to block all C-states. Once that is done, and the cache is all in place, we are safe when we reach the deadline.

This pretty much covers all my findings on using CPU idle power management in real-time workloads. There are some additional strategies that help with CPU idle use in general. Entering C-states depends on the topology, on how the CPUs are organized and grouped together: if two logical CPUs are in a core, the core enters a particular C-state only when all the logical CPUs in it enter that state. So if you group your idling tasks, the tasks that have opportunities to idle, together, you have a better chance of getting more power savings. The other one I mentioned is priming the cache; that is important, and you may have to develop algorithms to be able to do it. It is a standard real-time application development procedure: you have to make sure that things are consistent when you reach the deadline.
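The critical-phase sequence described above can be sketched as follows. The numbers (400 us worst case, 199 us chunk, 30 us constraint) are the example values from the talk, not universal constants; `set_resume_latency` writes the per-core PM QoS sysfs attribute, which needs root and a 4.16+ kernel, so it fails soft here and the chunked waiting still protects the deadline:

```python
import time

WORST_CASE_S = 400e-6        # calibrated worst-case wake-up latency (example)
CHUNK_S = 199e-6             # below C6's 200 us target residency (example)
SAFE_CONSTRAINT_US = 30      # calibrated safe exit-latency constraint (example)

def set_resume_latency(cpu, value):
    """Write the per-core PM QoS constraint (needs root; kernel >= 4.16).
    value: microseconds, "n/a" to block all C-states, "0" to remove the
    restriction. Returns False if the attribute is not writable here."""
    path = "/sys/devices/system/cpu/cpu%d/power/pm_qos_resume_latency_us" % cpu
    try:
        with open(path, "w") as f:
            f.write(str(value))
        return True
    except OSError:
        return False

def wait_deadline(deadline, chunk_s=CHUNK_S):
    """Wait for a deadline without ever sleeping longer than chunk_s at a
    time, so the governor never sees an idle interval long enough for a
    deep C-state. Returns how late (seconds) we were past the deadline."""
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return -remaining
        time.sleep(min(remaining, chunk_s))

def critical_phase(deadline, cpu=0):
    # 1. Sleep freely (all C-states allowed), but wake up worst-case early.
    time.sleep(max(0.0, deadline - WORST_CASE_S - time.monotonic()))
    # 2. Block deep C-states for the remaining wait: via the PM QoS
    #    constraint if available, otherwise the chunked sleeps do it.
    constrained = set_resume_latency(cpu, SAFE_CONSTRAINT_US)
    # 3. (Here the real application would prime the cache: touch the
    #    critical code and data so they are resident at the deadline.)
    late_by = wait_deadline(deadline)
    if constrained:
        set_resume_latency(cpu, 0)  # remove the restriction afterwards
    return late_by
```

Calling `critical_phase(time.monotonic() + 1000e-6)` plays out the 1000-microsecond deadline example: a free sleep up to 400 us before the deadline, then a protected wait to the deadline itself.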
So we prime the cache before we reach there. Another thing is the kernel configuration and boot parameters: the kernel has to be built with certain configurations, and these are documented in detail in the wiki page. One of interest is NO_HZ, the scheduling-clock tick configuration. You can save more power by disabling the periodic tick: you don't want a tick coming every period, so you disable it. But some of the options are known to create some problems with CPU idle; that is documented there. nohz_full is undergoing more work in the Linux kernel; improvements are constantly being made for the real-time use case.

So the key takeaways: the primary goal of these methods is to make sure we don't compromise any of the real-time requirements. It's okay to completely disable C-states if you have to; we have to meet the real-time constraints, otherwise there's no use for the real-time application. Beyond that, this provides options to enable or disable C-states dynamically, and multiple options to do it, so there is flexibility, and you can do it in degrees: instead of disabling at boot time, you have options to do it dynamically. And the tools and infrastructure, everything we talked about, are in the Linux kernel, ready and available, and you can improve them if you need to. And then there is scalability.
That is another advantage: you don't need to know the specifics of the C-states on a particular platform or architecture. A design that depends on them does not scale, and this also makes it easier to provide one solution across platform and kernel configurations.

There are good references, and this wiki page also has a lot of details; you can give feedback or contribute improvements there. Thank you. Any questions?

[In response to an audience question] The kernel's policy manager is called the governor, and PM QoS is a framework in the kernel that interfaces with it. We don't change anything in the hardware behavior; we just interface with the policy manager. Yes, it is one common interface: PM QoS is the one interface, but you see there are two different faces to it, one system-wide and one per core, and the two are inconsistent. So when you look at the documentation you may be confused, because one behaves differently than the other. In the system-wide interface, you open a file and write to it, and when the process closes the file the constraint goes away. With the per-core user-space interface, whatever an application writes stays; it persists. So when you provide a solution, you need to be aware of what all the other applications are doing, because the constraints all go to one common place.

Any more questions? Okay. Thank you. Thank you very much.