Hello, everyone. Today I'll be speaking about adding runtime power management capabilities to device drivers. Before going on to the topic, a little bit about myself. I'm an associate software engineer at Collabora, and I've mostly been working on hardware enablement projects. One of the recent projects we've been working on is the RK3588 upstreaming work. If you'd like to check it out, all of our work is online and just one search away. I previously also worked on Steam Deck kernel development, and currently I'm also part of the KernelCI team, where we've been doing some work on regression tracking. In 2020, I was an Outreachy intern for the Sound Open Firmware project. I started my Linux kernel development journey through the IIO subsystem, so if anyone is new to device driver development or the Linux kernel in general, I would definitely recommend going through the IIO subsystem.
Today I'll be speaking about the relationship between the PM core and runtime PM in general, and then we'll look at subsystem-level runtime power management. Next, I'll give a brief description of some of the helper functions. Then we'll look at a light sensor driver and try to understand how it actually works, so that we can add runtime PM support to it. And finally, some issues and their solutions.
This is the big picture of what power management looks like in the Linux kernel today. There are two models, static and dynamic. Runtime power management falls under the dynamic category, specifically for idle devices. I won't be speaking about the other frameworks for two reasons: they are not relevant here, and I don't know about them. But there's a very nice talk from Kevin Hilman that gives an overview of all the frameworks, so I definitely recommend it if you'd like to understand the other frameworks.
Runtime power management was first introduced in 2009 by Rafael Wysocki.
Back then, we just had the traditional way of system suspend and resume, where all the devices would be put into a low-power state together or brought up together. But this kind of model is not efficient in all scenarios. For example, if you would like only one device to be active, all the other devices are forced to be active as well. That is why a mechanism to put individual devices to sleep was needed, and that's how runtime power management was introduced. One example: if you'd like to listen to some music on your mobile phone or your laptop, you don't really need your screen to be active. It would just consume power without doing any work for you. That's where you can add runtime power management support, and it will give some performance benefit and consume less power. And this is just one device. If all devices start adding runtime power management support, it makes a huge difference in the overall system power consumption.
Runtime power management is not completely independent. It integrates with the PM core for some of its features. The PM core is where all the system-wide power transitions happen, and it handles all the system-wide power management. Runtime power management also needs to work with the PM core in order to synchronize some of its functions with system-wide power management. The PM core also provides a workqueue, so devices that have runtime power management support put their work items related to suspend or resume into this workqueue.
Next, you have the dev_pm_ops structure, which has all the callback functions related to system-wide power transitions. But the three that you see here, runtime_suspend, runtime_resume, and runtime_idle, are specific to runtime power management. Device drivers implement these callback functions in their code to perform a suspend, a resume, or handle an idle request.
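To make the three callbacks concrete, here is a minimal userspace sketch. It is not kernel code: struct model_device, struct model_pm_ops, model_invoke() and the sensor_* functions are hypothetical stand-ins, and only the three member names are taken from the kernel's struct dev_pm_ops.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for a device; this is NOT the kernel's struct device. */
struct model_device {
    int powered;     /* 1 while the hardware is enabled */
    int idle_calls;  /* number of idle notifications received */
};

/* Mirrors the three runtime PM members of the kernel's struct dev_pm_ops. */
struct model_pm_ops {
    int (*runtime_suspend)(struct model_device *dev);
    int (*runtime_resume)(struct model_device *dev);
    int (*runtime_idle)(struct model_device *dev);
};

static int sensor_runtime_suspend(struct model_device *dev)
{
    dev->powered = 0;  /* a real driver would write a "disable" register */
    return 0;
}

static int sensor_runtime_resume(struct model_device *dev)
{
    dev->powered = 1;  /* a real driver would write an "enable" register */
    return 0;
}

static int sensor_runtime_idle(struct model_device *dev)
{
    dev->idle_calls++;
    return 0;
}

/* The driver fills in the table; the (modelled) core decides when to call. */
const struct model_pm_ops sensor_pm_ops = {
    .runtime_suspend = sensor_runtime_suspend,
    .runtime_resume  = sensor_runtime_resume,
    .runtime_idle    = sensor_runtime_idle,
};

/* Hypothetical core helper: a missing callback is treated as success. */
int model_invoke(int (*cb)(struct model_device *), struct model_device *dev)
{
    return cb ? cb(dev) : 0;
}

/* Small usage example. */
void model_demo(void)
{
    struct model_device dev = { .powered = 1, .idle_calls = 0 };

    assert(model_invoke(sensor_pm_ops.runtime_suspend, &dev) == 0);
    assert(dev.powered == 0);
    assert(model_invoke(sensor_pm_ops.runtime_resume, &dev) == 0);
    assert(dev.powered == 1);
}
```

The point of the table is that the driver only describes how to suspend and resume its hardware; deciding when to do so is left to the core.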
There are also certain runtime power management fields. These fields are used by device drivers to update information related to how power management is done, and they are then used by the PM core. Before carrying out any suspend or resume request, the PM core first checks these fields to see whether all the criteria are satisfied, and only then goes ahead with the request.
Finally, you have helper functions provided by runtime PM. These helper functions are again used by device drivers. Once a driver calls them, the helper functions go to the PM core and put their work items into the workqueue, and the PM core then executes the callback functions provided by the device driver.
It's not necessary that a suspend or resume request always comes from the device driver level. There are a bunch of layers on top of device drivers as well. For example, you can have a PM domain, where a group of devices falls under a certain domain. Callback functions like suspend and resume can then be called through the PM domain instead of at the device driver level. Then you have classes and types. An example of a class: network devices fall into a network class, and instead of all the network devices doing runtime PM individually, it can happen at the class level. Similarly, you have types. For example, network devices have different types, like wireless devices and Ethernet devices, so such a group of devices can do PM together. And finally, you have the bus level as well. One example: if you have an I2C bus with two or three devices, then instead of the two or three devices each executing their callback functions, it can happen at the bus level, so that there is no code duplication.
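The layering above can be sketched as a simple lookup. This is a userspace model, not kernel code: struct model_device and model_pick_callback() are hypothetical names. The priority order used here (PM domain, then type, then class, then bus, with the driver's own callback as the fallback) follows the kernel's runtime PM documentation.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*model_cb)(void);

/* One optional runtime-suspend callback per layer; NULL means "not provided". */
struct model_device {
    model_cb pm_domain_cb;
    model_cb type_cb;
    model_cb class_cb;
    model_cb bus_cb;
    model_cb driver_cb;
};

/* Pick the callback of the highest-priority layer that provides one. */
model_cb model_pick_callback(const struct model_device *dev)
{
    if (dev->pm_domain_cb)
        return dev->pm_domain_cb;
    if (dev->type_cb)
        return dev->type_cb;
    if (dev->class_cb)
        return dev->class_cb;
    if (dev->bus_cb)
        return dev->bus_cb;
    return dev->driver_cb;  /* may be NULL if the driver has none either */
}

/* Callbacks that just return an id so the demo can tell them apart. */
static int bus_suspend(void)    { return 1; }
static int driver_suspend(void) { return 2; }
static int domain_suspend(void) { return 3; }

void model_demo(void)
{
    /* Device on a bus with its own callback: the bus wins over the driver. */
    struct model_device on_bus = { .bus_cb = bus_suspend,
                                   .driver_cb = driver_suspend };
    assert(model_pick_callback(&on_bus)() == 1);

    /* No subsystem-level callback at all: fall back to the driver. */
    struct model_device driver_only = { .driver_cb = driver_suspend };
    assert(model_pick_callback(&driver_only)() == 2);
}
```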
And if none of the subsystem levels is present, the PM core will look at the device driver level to check whether there are callback functions, and it will execute them accordingly.
This is an example of what bus-level runtime PM looks like. If you see, there is a PM runtime suspend and a PM runtime resume function; these are the bus-level suspend and resume functions. If you check the code of the runtime suspend, it internally calls pm_generic_runtime_suspend(). This will actually suspend the device associated with this particular bus. So if this bus is being used by two or three devices, the suspend can be handled directly at the bus level.
So this is Lola. Lola was initially single and happy, and she could do anything. She could sleep anytime. She could wake up anytime. That is how devices with independent runtime PM support work as well: they don't depend on anyone. Next, after some years, Lola has a kid. If the kid decides not to sleep at night, Lola cannot sleep at night either. Devices have the same kind of dependency in runtime PM. If a child device is active and it has a parent, the parent cannot sleep either. That is, the parent cannot be suspended while the child is active. One more year later, Lola has two kids. Again, even if only one child decides not to sleep, the parent cannot sleep. It works the same way with runtime PM: if even one of the child devices is active, the parent cannot go to sleep.
But runtime power management gives a way around this. If pm_suspend_ignore_children() is called for the parent device, the parent can be suspended even while its children are active. Only when this function has been called is the children's activity ignored.
The very first step when you want to add runtime power management support for your device is initialization and enabling.
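Before going into initialization, the parent and child rule from the Lola example can be sketched as below. This is a userspace model under assumptions, not kernel code: struct model_parent and model_parent_try_suspend() are hypothetical names, and the ignore_children flag stands in for what pm_suspend_ignore_children() sets in the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a parent device and its children. */
struct model_parent {
    int active_children;   /* children that are currently not suspended */
    bool ignore_children;  /* models the effect of pm_suspend_ignore_children() */
};

/* Returns 0 if the parent may suspend, -EBUSY if a child keeps it awake. */
int model_parent_try_suspend(const struct model_parent *p)
{
    if (!p->ignore_children && p->active_children > 0)
        return -EBUSY;
    return 0;
}

void model_demo(void)
{
    /* Lola with two kids, one of them still awake. */
    struct model_parent lola = { .active_children = 1, .ignore_children = false };
    assert(model_parent_try_suspend(&lola) == -EBUSY);

    /* Both kids asleep: Lola can sleep too. */
    lola.active_children = 0;
    assert(model_parent_try_suspend(&lola) == 0);

    /* With the flag set, an awake kid no longer blocks the parent. */
    lola.active_children = 2;
    lola.ignore_children = true;
    assert(model_parent_try_suspend(&lola) == 0);
}
```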
By default, the runtime PM status of every device is suspended, but that doesn't necessarily mean your device is unable to do input-output operations initially. It might be active, and still the runtime power management status will be suspended by default. So you need to tell the PM core that the device is able to do input-output operations, that it is active, and that is done using the pm_runtime_set_active() function.
Next, you also need to enable runtime PM. This allows you to use the other helper functions for runtime power management. Otherwise you won't be able to use the helper functions provided by the runtime power management framework; they will give you an access-denied kind of error code. There's also a devm version of this, devm_pm_runtime_enable(), and not many drivers use it. There are still people using pm_runtime_enable(), but the devm version also disables runtime PM automatically, so you don't have to explicitly call the disable function when the device is being unregistered.
Now, this is interesting. If you want to run your callback functions in an atomic context, you can do that using the pm_runtime_irq_safe() function. The callbacks will then run with interrupts disabled, so the suspend function must not block or sleep while it is being executed. But it is usually not advisable to use this function, because if the child device is IRQ safe and its parent is not, that creates a problem. What runtime power management does in that case is not let the parent sleep at all. It will never let the parent execute its runtime suspend function, so the parent will not be suspended. That is why it is not advisable to use it unless you are fine with the parent not going to sleep.
So after initialization and enabling, the next step is resuming and suspending.
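The initialization and enabling step can be sketched with a small userspace model. It is not kernel code: every model_* name is hypothetical. The model assumes what is described above (status starts as suspended, helpers are refused until runtime PM is enabled) and uses -EACCES for the access-denied error; it also assumes the status may only be set by hand while runtime PM is still disabled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum model_status { MODEL_SUSPENDED, MODEL_ACTIVE };

/* Hypothetical per-device runtime PM state. */
struct model_device {
    enum model_status status;
    bool enabled;
};

/* Initial state: status "suspended", runtime PM disabled. */
void model_init(struct model_device *dev)
{
    dev->status = MODEL_SUSPENDED;
    dev->enabled = false;
}

/* Models pm_runtime_set_active(): only record that the hardware is up. */
int model_set_active(struct model_device *dev)
{
    if (dev->enabled)
        return -EAGAIN;  /* assumed: refused once runtime PM is enabled */
    dev->status = MODEL_ACTIVE;
    return 0;
}

/* Models pm_runtime_enable(). */
void model_enable(struct model_device *dev)
{
    dev->enabled = true;
}

/* Models a resume helper: refused with -EACCES until enabled.
 * Simplification: this model does not distinguish "already active". */
int model_resume(struct model_device *dev)
{
    if (!dev->enabled)
        return -EACCES;
    dev->status = MODEL_ACTIVE;
    return 0;
}

void model_demo(void)
{
    struct model_device dev;

    model_init(&dev);
    assert(dev.status == MODEL_SUSPENDED);
    assert(model_resume(&dev) == -EACCES);  /* helpers unusable before enable */

    assert(model_set_active(&dev) == 0);    /* tell the core the device is up */
    model_enable(&dev);
    assert(model_resume(&dev) == 0);
}
```

The order in the demo, set active first and enable second, matches the order used later in the talk for the light sensor driver.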
Resuming is usually done before you want to do any input-output operation, and the simplest functions for that are pm_runtime_resume() and pm_request_resume(). After you're done with the input-output operation, you have pm_runtime_suspend() and pm_schedule_suspend(), which will just suspend the device.
But usually devices are not that simple. You might have multiple users using the device. Suppose user A is using the device and suspends it once its input-output operation is done, while user B is still trying to access the device. That will give a fatal error, and that's why reference counting was added to runtime PM, where you have get and put functions to keep track of the device usage requests. If you use a get function, it will do a resume, but it will also increment a usage count. Only when this usage count is zero will the device go into suspend mode.
Previously, many drivers used the pm_runtime_get_sync() function, but the problem with it is that it increments the usage counter and, if the resume fails, does not decrement the counter back. That is why a new helper function was introduced, pm_runtime_resume_and_get(). Drivers were then converted to this new function, which resumes the device but decrements the counter again if the resume fails.
There are also synchronous functions provided by runtime PM. If you want the resume or suspend to complete before the next code executes, instead of only queueing a request, the synchronous helpers can be used: pm_runtime_get_sync() and pm_runtime_put_sync().
Next, autosuspend. Imagine you have a device which does an input-output operation every one or two seconds. In that case, adding runtime power management is not very helpful, because the device will immediately go to sleep and immediately come back, and a lot of time and energy is wasted in this process.
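Going back to the reference counting described above, here is a minimal userspace sketch of the usage counter and of the difference between the two get helpers. It is not kernel code: all model_* names are hypothetical, the resume_fails flag is only there to simulate a failing resume, and -EIO is an arbitrary error chosen for the illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device with a usage counter. */
struct model_device {
    int usage_count;
    bool active;
    bool resume_fails;  /* test knob: make the next resume fail */
};

static int model_resume(struct model_device *dev)
{
    if (dev->resume_fails)
        return -EIO;
    dev->active = true;
    return 0;
}

/* Models pm_runtime_get_sync(): the counter stays raised even on failure. */
int model_get_sync(struct model_device *dev)
{
    dev->usage_count++;
    return model_resume(dev);
}

/* Models pm_runtime_resume_and_get(): the counter is dropped on failure. */
int model_resume_and_get(struct model_device *dev)
{
    int ret = model_get_sync(dev);

    if (ret < 0)
        dev->usage_count--;
    return ret;
}

/* Models a put: drop one reference, suspend once nobody is left. */
void model_put(struct model_device *dev)
{
    if (--dev->usage_count == 0)
        dev->active = false;
}

void model_demo(void)
{
    /* Users A and B share the device; A finishing must not suspend it. */
    struct model_device dev = { 0 };
    assert(model_resume_and_get(&dev) == 0);  /* user A */
    assert(model_resume_and_get(&dev) == 0);  /* user B */
    model_put(&dev);                          /* A is done */
    assert(dev.active);                       /* B can still do I/O */
    model_put(&dev);                          /* B is done */
    assert(!dev.active);

    /* Failed resume: the old helper leaks a reference, the new one does not. */
    struct model_device broken = { .resume_fails = true };
    assert(model_get_sync(&broken) == -EIO);
    assert(broken.usage_count == 1);
    broken.usage_count = 0;
    assert(model_resume_and_get(&broken) == -EIO);
    assert(broken.usage_count == 0);
}
```

With the older helper, the caller has to remember to drop the reference in its error path; the newer helper does that itself, which is why drivers were converted to it.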
That is why a new feature was added, autosuspend, where you specify an inactivity period for the device, and only if the device has been inactive for that whole period is the suspend request executed. This is done using the autosuspend helper functions.
The name autosuspend seems a bit confusing. It doesn't mean the device is suspended automatically. It just means that the suspend is deferred until the inactivity period has elapsed. To add an inactivity period to your device, you have a delay time that can be set using pm_runtime_set_autosuspend_delay().
After that, the PM core will use the autosuspend functions, but how does the PM core know when to start counting this inactivity period? How does it know when the last input-output operation was done? The device driver needs to tell the PM core: I'm done with the input-output operation, so you can now start counting the inactivity period. Suppose you have an inactivity period of one second. Only after one second will the PM core execute the suspend function. That is done using the pm_runtime_mark_last_busy() function, which sets the last_busy field of the device's power member. You can also change this inactivity period through sysfs, using the autosuspend_delay_ms attribute.
But there are certain race conditions with autosuspend. Suppose the inactivity period has started and a new input-output request comes in. At that point race conditions are possible, and the device driver needs to handle them. It can check in its suspend function that there are no pending input-output requests, and only then go ahead with the suspend.
Finally, there is removal: when your device is being unregistered, it can also be unregistered from the RPM framework.
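The deferral rule for autosuspend described above can be sketched as follows. This is a userspace model, not kernel code: the model_* names are hypothetical, time is passed in explicitly as milliseconds instead of being read from a clock, and model_autosuspend_tick() stands in for whatever timer the PM core uses internally.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical autosuspend state for one device. */
struct model_device {
    long last_busy_ms;  /* time of the last I/O, set by the driver */
    long delay_ms;      /* inactivity period, e.g. 1000 for one second */
    int usage_count;    /* users currently holding the device */
    bool active;
};

/* Models pm_runtime_set_autosuspend_delay(). */
void model_set_autosuspend_delay(struct model_device *dev, long delay_ms)
{
    dev->delay_ms = delay_ms;
}

/* Models pm_runtime_mark_last_busy(): "I'm done with I/O, start counting". */
void model_mark_last_busy(struct model_device *dev, long now_ms)
{
    dev->last_busy_ms = now_ms;
}

/* Suspend only if unused and idle for the whole delay; true if suspended. */
bool model_autosuspend_tick(struct model_device *dev, long now_ms)
{
    if (!dev->active || dev->usage_count > 0)
        return false;
    if (now_ms - dev->last_busy_ms < dev->delay_ms)
        return false;  /* inactivity period not over yet: keep deferring */
    dev->active = false;
    return true;
}

void model_demo(void)
{
    struct model_device dev = { .active = true };

    model_set_autosuspend_delay(&dev, 1000);
    model_mark_last_busy(&dev, 0);
    assert(!model_autosuspend_tick(&dev, 500));   /* only 500 ms idle */

    model_mark_last_busy(&dev, 800);              /* new I/O restarts the count */
    assert(!model_autosuspend_tick(&dev, 1000));  /* 200 ms since last I/O */
    assert(model_autosuspend_tick(&dev, 1800));   /* a full second of idleness */
    assert(!dev.active);
}
```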
When the device is unregistered, pm_runtime_remove() is called, which also disables any pending runtime power management requests and unregisters the device from the runtime PM framework.
There are also ways to control these RPM options through sysfs. If you echo "on" into the control file, pm_runtime_forbid() is executed, which tells your device driver not to use RPM. Similarly, if you write "auto", you bring back RPM support for the device. And if a device doesn't want to be controlled through sysfs, it can remove the attributes from sysfs. That is done using pm_runtime_no_callbacks(), which removes the files from the power directory that you see here.
Now a little bit about the light sensor driver, because if you want to add RPM, the first thing you need to do is understand how the driver works. This is an I2C-based light sensor driver. There is some terminology. The light sensor fetches the light intensity from the environment and stores it in a register: we have the ALS data register, where the raw value of the intensity is stored. Then you have the processed value as well. The processed value is the result of some processing on the raw value, because the raw value is affected by factors like the window factor, and you don't want to count infrared light, you just want visible light. So some processing is done on the raw value, and that new value is called the processed value. This driver also supports multiple integration times. The integration time is the time taken to capture one light intensity sample, and the longer the time, the better the data.
All the information about the light sensor driver that I just described is exposed through this info variable, and info has read_raw, write_raw, and read_avail.
read_raw is used when you read from the sensor, and write_raw is used when you write back to the sensor. This is where our input-output operations are actually done. If you look at the details of read_raw, we read the ALS data register, and that is our raw value. For the processed one, get lux is where we again read from the sensor, but return processed data. And get integration time returns the current integration time. As I said, the device supports various integration times, so this will display the current one. If you want to write back to a sensor register, you do that with set integration time, which is called through write_raw.
Then you also have channels. If you would like to access all these details that we captured from the sensor through channels, you need to define those channels. Again, you have raw, processed, and whatever details you need; you can add them to the channels. It will look something like this, and these are the sysfs files that you get after you add the channels.
Now we will actually add RPM support to this driver. The very first thing I mentioned was to inform the PM core that the device is active, and you do that using pm_runtime_set_active(). Next, if you want to be able to use the other helper functions, you need to enable the RPM framework. That is done using devm_pm_runtime_enable(). Next, I want to use autosuspend for this driver, so I inform the PM core with pm_runtime_use_autosuspend(). I also set a delay of one second. This is the inactivity period: only if my sensor remains inactive for one second should the suspend function be executed. That is what I'm telling the PM core here.
As we saw, we do the input-output operation in the read data function.
So before doing the input-output operation, you want to resume the device, and that is done using the set power state function. I will get into the details of that function later, but assume for now that it does the resume. After doing the input-output operation, I want to suspend the device. That is again done using set power state, but with the second argument set to false. The same thing is done in the get lux function as well. If you see, again, before reading the data I first resume the device, and afterwards I suspend it.
This is what the set power state function looks like. If the second argument is true, it does a pm_runtime_resume_and_get(). That is, it increments the usage counter for my device and executes the resume callback function. If the second argument is false, I now need to inform the PM core that this is the time to start counting the inactivity period, the one second that we had set. That is done using pm_runtime_mark_last_busy(): since I'm done with the input-output operation, it marks the last busy time. And finally, pm_runtime_put_autosuspend() is executed, which will suspend the device only after the inactivity period has elapsed.
This is how it is actually hooked up to the PM core, where we have the runtime suspend and runtime resume callback functions. How you implement them is very device specific. In this case, we just write some value to a register. Suspend disables the device and resume enables the device by writing some values to the register.
Once you have added all this support, there will be a power directory created for your device, and it will have all these files: autosuspend delay, control, and so on. I think we discussed most of them, but there are two of them that I would like to discuss, which are used by PowerTOP.
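The set power state wrapper and the read path described above can be sketched in userspace as below. This is a model under assumptions, not the driver's actual code: struct model_sensor and all model_* functions are hypothetical, the clock is a plain field advanced by hand, and the raw field stands in for the ALS data register.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sensor state, including a fake clock in milliseconds. */
struct model_sensor {
    int usage_count;
    bool powered;
    long now_ms;
    long last_busy_ms;
    long delay_ms;  /* autosuspend delay, 1000 in the talk's example */
    int raw;        /* stands in for the ALS data register */
};

/* on == true: resume and take a reference (pm_runtime_resume_and_get()).
 * on == false: mark last busy and drop the reference
 * (pm_runtime_mark_last_busy() followed by pm_runtime_put_autosuspend()). */
int model_set_power_state(struct model_sensor *s, bool on)
{
    if (on) {
        s->usage_count++;
        s->powered = true;  /* stands in for the runtime resume callback */
        return 0;
    }
    s->last_busy_ms = s->now_ms;
    s->usage_count--;
    return 0;
}

/* Read path: resume, do the I/O, then allow a deferred suspend. */
int model_read_data(struct model_sensor *s, int *val)
{
    int ret = model_set_power_state(s, true);

    if (ret < 0)
        return ret;
    *val = s->raw;  /* the actual input-output operation */
    return model_set_power_state(s, false);
}

/* Advance the fake clock; suspend if unused and idle for the whole delay. */
void model_advance_clock(struct model_sensor *s, long ms)
{
    s->now_ms += ms;
    if (s->powered && s->usage_count == 0 &&
        s->now_ms - s->last_busy_ms >= s->delay_ms)
        s->powered = false;  /* stands in for the runtime suspend callback */
}

void model_demo(void)
{
    struct model_sensor s = { .delay_ms = 1000, .raw = 42 };
    int val = 0;

    assert(model_read_data(&s, &val) == 0 && val == 42);
    model_advance_clock(&s, 400);
    assert(s.powered);   /* still inside the inactivity period */

    assert(model_read_data(&s, &val) == 0);  /* second read restarts the count */
    model_advance_clock(&s, 900);
    assert(s.powered);   /* only 900 ms since the last read */
    model_advance_clock(&s, 100);
    assert(!s.powered);  /* a full second without I/O */
}
```

The demo also illustrates why frequent reads do not cause a suspend and resume cycle per read: each read moves the last busy time forward, so the device only suspends once the reads stop for the whole delay.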
The two files are runtime_active_time and runtime_suspended_time, which PowerTOP uses to give you the usage details in the device stats for each particular device. This can be useful if you have multiple devices that have added runtime PM.
Now the issues that you might face. What I faced after doing this was that once the device goes into suspend mode and then resumes, all the register data that was previously stored is lost, so the registers come back to their initial values. For example, the initial value for the integration time was 100 ms and I changed it to 400 ms for getting some values, but after a suspend it comes back to 100 ms. So you need to restore the values of your registers. In this case we used regcache, which can be used to restore the values of the registers.
Also, devices are usually not independent. You have a bunch of devices that work together, and now there are also power domains. Runtime PM on its own is not always enough, so a new framework called genpd was added. It is built on top of runtime PM and manages the power for a group of devices. There is a governor that manages all the devices under the same PM domain, and it will suspend them when all of the devices go into idle mode.
Thank you, that's it. Special thanks to some of my colleagues who are not here and who helped me prepare this presentation. Any questions?
Hi, thank you for the presentation. I have a question. I don't know if I saw it correctly, but for the sensor driver you used autosuspend, and yet when reading the sensor data you were also doing a resume and then a suspend. Isn't autosuspend enough? You read the data and then it just autosuspends after 1,000 milliseconds.
No, that is not enough.
You need to say explicitly that you would like to suspend the device. As I said in the beginning as well, autosuspend is just used to add the inactivity period, but you still need to explicitly use the autosuspend functions given by the RPM framework to suspend the device. It isn't handled automatically.
Okay. And doing a suspend and resume each time you're reading a value, to me that seemed like a big waste of time, to resume and suspend, resume and suspend each time.
It actually makes a difference. There are certain devices which have more idle time. In this case it's only one second, so it doesn't seem very useful, but in certain cases there could be more idle time as well.
You don't actually do a resume and suspend every time. If you read once per second, the device won't get the chance to enter suspend, so it'll always be active. The readings come much more often than the suspend delay, so the suspend happens only once you finish your readings. The device won't actually get into suspend if you continuously read from it.
Yes, if I continuously read from it, it won't be suspended, and that's what the inactivity period is for. Only if the device stays inactive for that whole period of time, after the inactivity period has elapsed, will the suspend function be executed.
I have another question, if you know. You add runtime PM suspend, but some drivers also have system PM suspend. Do you know how the two interact?
That's a whole topic of its own. I know that the PM core handles how runtime suspend and system suspend work together. So it's handled at the PM core level, but I don't know the internal details.
Thank you. Any more questions? Thank you.