Hello, everyone. I'm Vaibhav Gupta. I hope you are all doing well and having a great day. I'm a senior-year undergraduate computer science student studying in India. I'm a coffee lover and an OS kernel enthusiast. I'm passionate about embedded systems, firmware, bootloaders, literally any technology functioning close to the hardware.

Recently, I worked as a Linux kernel mentee under the Linux Kernel Mentorship Program. My project was to upgrade the power management framework for PCI drivers. I was also a Google Summer of Code student developer in 2019. I worked with the RTEMS project on the kernel, on the kernel API layers. This year, they invited me as a mentor for the same program.

First of all, I would like to thank the Linux Foundation for giving me a chance to speak here. I'm really glad to be here. I got the inspiration for this talk from the project I worked on. It revolves around the PCI subsystem and PCI drivers. Bjorn was my mentor for this project; he is the Linux PCI subsystem maintainer. The whole program was provided by Shuah; she is a Linux Fellow, developer and maintainer. During the project I interacted with many other developers, especially the driver maintainers, because I was working on a very large set of drivers.

So we'll be discussing PCI, PCI drivers, and power management in the context of PCI drivers. We'll see the code, the framework and the other features that make power management possible for PCI drivers, how the legacy framework started creating problems and created the need for a generic framework, how we went from legacy to generic, and the fall of the legacy. We'll also discuss the next steps.

Power management and PCI. Power management has always been a focal point in Linux. And talking about PCI power management is like talking about powers that are very subtle to handle. I'm saying this because PCI in itself is very complex.
I mean, when I started working on this project, I thought I would read the PCI spec, because I wanted to know more about PCI capabilities and registers. And when I first read the PCI spec, I was like, is this some alien text? Many of the things really went over my head. Things got easier when I started to work on the code, but I worked on only a part of a part of the kernel, and I still feel the code that runs PCI devices is nothing less than magical spells.

So, talking about PCI PM. I want to make this clear first: whenever I talk about power management during this talk, I'll be talking specifically about the suspend and resume transitions. And I'll generally be using the term "framework" for the power management framework for PCI drivers.

Okay. So, legacy versus generic. Before going into legacy versus generic, let's see what the framework is. A framework generally describes the flow of execution of the code. If you check out any driver code in the kernel repo, you'll see that there are many functions, literally one for every feature the device can perform. The basic ones are probe, initialization, device attach, suspend, resume, device detach, etc. So the driver code has functions or methods for each and every feature that a device can perform. But when and how that code is executed is decided by the framework. And again, the framework is also code. This power management framework you can find in the PCI core code in the repository. As given here, for the suspend path the device first goes through prepare, then suspend, then suspend with no interrupt requests (suspend_noirq). Similarly for resume. So this is how the framework decides.

Generally speaking, we needed the framework because in earlier days, you know, devices were very custom and random. There was no particular standard. Okay.
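The split described above, where the driver supplies callbacks and the framework decides when each one runs, can be sketched in plain userspace C. This is only an illustrative model, not kernel code: every toy_ and drv_ name is hypothetical, and only the three suspend phases named in the talk are modelled (the kernel has more phases than these).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a device; the kernel's struct device is far larger. */
struct toy_device {
    char log[64];               /* order in which the callbacks ran */
};

typedef int (*toy_pm_cb)(struct toy_device *dev);

/* Callbacks a driver may provide. Any of them may be NULL. */
struct toy_pm_ops {
    toy_pm_cb prepare;
    toy_pm_cb suspend;
    toy_pm_cb suspend_noirq;
};

/* Append a phase name to the device's log. */
int toy_note(struct toy_device *dev, const char *phase)
{
    assert(dev != NULL && phase != NULL);
    strncat(dev->log, phase, sizeof(dev->log) - strlen(dev->log) - 1);
    return 0;
}

/* Driver side: what each callback does. */
int drv_prepare(struct toy_device *dev)       { return toy_note(dev, "prepare "); }
int drv_suspend(struct toy_device *dev)       { return toy_note(dev, "suspend "); }
int drv_suspend_noirq(struct toy_device *dev) { return toy_note(dev, "suspend_noirq "); }

/* Framework side: decides when each callback runs and stops on the first error. */
int toy_framework_suspend(struct toy_device *dev, const struct toy_pm_ops *ops)
{
    assert(dev != NULL && ops != NULL);
    const toy_pm_cb phases[] = { ops->prepare, ops->suspend, ops->suspend_noirq };

    for (size_t i = 0; i < sizeof(phases) / sizeof(phases[0]); i++) {
        if (phases[i] == NULL)
            continue;           /* driver has nothing to do in this phase */
        int err = phases[i](dev);
        if (err)
            return err;
    }
    return 0;
}
```

The driver never calls its own callbacks; it only fills in the table. The ordering lives in one place, the framework function, which is the point the talk is making.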
Devices were there, but things started getting very complex, messy and ugly. There were too many devices, and it was getting hard to interact with them. So there was a need for standards, like USB and PCI, and these standards form subsystems. PCI has the PCI subsystem. Drivers are generally a layer between devices and the kernel. Once we talk about standards, like the PCI subsystem, the PCI core is again a layer, between the driver and the kernel. Okay. So it's just another layer, but these things simplify a lot; they make things easy to understand and easy to code.

So let's talk about the PCI subsystem. Before going into the power management stuff and all that legacy stuff, let's go through a few things that we are going to see again and again. In pci.h you'll find the structure struct pci_dev. When you look into the code, the pci_dev structure is really very big. It has a lot of member variables and function pointers. But we are concerned with only a few of them. struct pci_dev describes a basic PCI device. It embeds another member variable, which is of type struct device; we'll come to that later. First of all, we should look at this one: it has a member pointer, driver, which is a pointer to a variable of type struct pci_driver. Let's look at that. Yeah. Okay. So, struct pci_driver. This struct pci_driver describes the PCI driver, and it has two function pointers, suspend and resume. These pointers we call the legacy pointers. Right.

Talking about devices: struct device describes a basic device. The earlier one, pci_dev, described the PCI device.
It means that structure is telling us that this particular device belongs to the PCI subsystem, and other things. But struct device describes a basic device. It doesn't care whether the device is of char type or block type, or belongs to USB or PCI or anything. It's just a basic device. It has another member called driver, which is a pointer to a variable of type struct device_driver. Okay. Now here is the main thing. struct device_driver has a member variable pm, which is a pointer to a variable of type struct dev_pm_ops. Whenever we talk about PCI PM and PCI drivers, wherever this structure, struct dev_pm_ops, is used, it means that driver belongs to the generic framework. Okay.

So if we compare the legacy framework and the generic framework from the core's point of view, it is going to look something like this. Of course, suspend and resume are the same functions in the driver itself, because you are not going to write different functions for the legacy framework and the generic framework, although there is some modification when you go from legacy to generic. But this is how the functions were identified in the legacy framework, and in the generic framework they are identified through the device type structure. So you can see that in the generic framework we have made it more generalized. We call it through dev->driver->pm->suspend. Here, in legacy, it was using pdev, which is of type pci_dev. It was literally saying that this device, or this driver, is of the PCI subsystem. But this is generic now. And as I said, although suspend and resume in the generic framework are identified through dev, the PCI driver still has to be reached. So the driver again has another member for that. Let's see. Okay. struct pci_dev embeds a struct device, dev, as we talked about earlier.
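To make the two lookup paths concrete, here is a heavily trimmed userspace model of the structures named above. It is a sketch under assumptions: the toy_ types mirror only the members the talk mentions, an int stands in for the legacy state argument, and toy_bind and the sample_ callbacks are hypothetical helpers added so the example is self-contained.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace models; only the members named in the talk are kept. */
struct toy_device;
struct toy_pci_dev;

struct toy_dev_pm_ops {
    int (*suspend)(struct toy_device *dev);
    int (*resume)(struct toy_device *dev);
};

struct toy_device_driver {
    const struct toy_dev_pm_ops *pm;    /* non-NULL: generic framework */
};

struct toy_device {
    struct toy_device_driver *driver;
};

struct toy_pci_driver {
    /* Legacy pointers with PCI-specific signatures; int stands in for the state argument. */
    int (*suspend)(struct toy_pci_dev *pdev, int state);
    int (*resume)(struct toy_pci_dev *pdev);
    struct toy_device_driver driver;    /* embedded generic driver */
};

struct toy_pci_dev {
    struct toy_pci_driver *driver;
    struct toy_device dev;              /* embedded generic device */
};

/* Recover the enclosing PCI device from its embedded generic device. */
struct toy_pci_dev *toy_to_pci_dev(struct toy_device *dev)
{
    assert(dev != NULL);
    return (struct toy_pci_dev *)((char *)dev - offsetof(struct toy_pci_dev, dev));
}

/* Bind a driver to a device, wiring up both the PCI view and the generic view. */
void toy_bind(struct toy_pci_dev *pdev, struct toy_pci_driver *drv)
{
    assert(pdev != NULL && drv != NULL);
    pdev->driver = drv;
    pdev->dev.driver = &drv->driver;
}

/* Legacy path: pdev->driver->suspend. */
int toy_has_legacy_suspend(const struct toy_pci_dev *pdev)
{
    return pdev->driver != NULL && pdev->driver->suspend != NULL;
}

/* Generic path: dev->driver->pm->suspend. */
int toy_has_generic_suspend(const struct toy_device *dev)
{
    return dev->driver != NULL && dev->driver->pm != NULL &&
           dev->driver->pm->suspend != NULL;
}

/* Sample callbacks, present only so the structures can be populated. */
int sample_legacy_suspend(struct toy_pci_dev *pdev, int state)
{
    (void)pdev;
    (void)state;
    return 0;
}

int sample_generic_suspend(struct toy_device *dev)
{
    (void)dev;
    return 0;
}
```

The generic path never mentions PCI at all. When PCI-specific data is needed, it is recovered from the embedded generic device, which is what toy_to_pci_dev models.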
And struct pci_driver has another member variable, driver, which is of type struct device_driver. So this is how things work. It's really just here for reference; you don't need to worry about it to understand the rest.

Now, "drivers should do only device-specific jobs". Yes, that's what we want, because that's why there is a subsystem layer in the kernel. You see, PCI is a standard, and PCI devices are built upon that standard, right? And if you talk about the PCI specs, even power management is part of the spec. The spec describes what power states a PCI device can go into, and what intermediate steps the device has to go through during a suspend or resume transition. So these are standard things. All PCI devices are going to follow them; all PCI devices are going to have those standard features. So why are we writing the code for all those standard features again and again in each and every driver? That doesn't make any sense.

But the legacy framework was too simple. Actually, "too simple" is not the right way to put it: it was not doing much of the standard work. It did some part of the tasks regarding power management and gave the rest of the responsibility to the driver: go and do this, this and this. These are the problems that actually created the need for the generic framework. There is literally no point in including the code for the standard power management jobs again and again in each and every PCI driver.
Now, the problems with legacy. As I said, even the standard jobs were done by the PCI driver itself, so there is a possibility that a driver messes them up. Suppose those standard tasks are not performed in the right order, or not at the right point in time, or the code itself is buggy. That can be a serious issue, and it resulted in redundancy, messy code and buggy code.

Talking about redundancy, let's go through an example. These are patches I worked on during the project. So let's go through this one: smsc9420, use generic power management. See, this is the function for suspend. You can see the various function calls; these are PCI helper functions: pci_save_state, pci_enable_wake. These are the standard things a PCI device should do: save the state, enable wake if the device wants to, disable the device, set the new power state. And then again in resume: restore the state, enable the device and pci_set_master. Each and every driver is including this code again and again, and these are literally just standard jobs. All of this could be performed by the PCI subsystem layer itself. There's no point in including this code in the driver again and again. But this is how things worked. You can see there are some other significant changes too, like the change in function signature; we'll see that later. But again, the main point is that there was too much redundant code in the drivers.

Now, here's another example: rt2x00. In this case, the suspend and resume code is shared by other drivers too. So you see, again the same problem.
The same standard jobs: save the state, disable the device, set the power state. Again, all that stuff. So if this code messes up at some point for any reason, all these drivers are going to suffer.

Messy code is really fun. If I go to this particular patch, you see this line. As I said, the PCI spec defines the power states for all PCI devices. They are D0, D1, D2, D3hot and D3cold, and there are many intermediate states too. But generally, not all devices go into all those states. Generally it is the D0, D1 and D3 states. And generally the same suspend method is used for suspend, hibernate and freeze, and the same resume method on the resume side, because the code written for suspend works for hibernate and freeze too. Right. That particular device doesn't need to go into any other state, so it does not need to perform any other specific tasks.

Now, look at the suspend code. This pm_message_t is passed to the suspend method, and this code is checking whether it was called for suspend, that is, when the device is going into suspend mode. Only then does it perform those tasks. Otherwise, it just packs up and goes back. The PCI core doesn't know that this device is not going to do anything during freeze or hibernate. It will simply call, and the driver will check whether it has to perform those steps or not. But here is the real problem. Because of the legacy framework, there was a function call.
Even in the case of hibernate and freeze there was a function call into the driver, and the function checks: no, this call is for freeze or hibernate, let's go back. So you see, this is just extra code, an extra operation that the core is going to perform, although it doesn't need to call this code for freeze and hibernate. This was a huge simplification achieved with the generic framework. In the generic framework, I can literally say that for suspend I need to call the suspend method; for freeze there is no function, just define it as NULL; for poweroff, NULL; and resume, you know. That extra step of calling is gone. I don't need to tell you how costly it is to call a new function. If you look at the CPU level, the CPU is going to store all the registers and variables needed onto the stack, then call the new function, and then it hits that if condition. Now the CPU knows that everything it has done was useless, so it brings those things back again and resumes execution. So this was literally a mess.

Next one. This code has a similar problem. It does not want to do anything specifically for freeze. So again, there was an unnecessary function call in the event of freeze. But with the help of the generic framework, we can just keep it aside.

Let's take a look at this one. This is a different case. There is code for suspend and code for resume, and all the actual work these two functions do is the standard stuff. They save the state and set the power state, and in resume they restore the state and set the power state. So you see, again there is an unnecessary function call in the case of suspend.
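The difference between filtering inside a legacy callback and leaving a generic callback NULL can be modelled in userspace. This is a sketch under assumptions, not kernel code: the toy_ event names are hypothetical stand-ins for the event carried in pm_message_t, and mapping hibernate onto a single poweroff pointer is a simplification.

```c
#include <assert.h>
#include <stddef.h>

enum toy_event { TOY_EVENT_SUSPEND, TOY_EVENT_FREEZE, TOY_EVENT_HIBERNATE };

struct toy_dev {
    int driver_calls;   /* how many times the core entered driver code */
    int work_done;      /* how many times the driver did real suspend work */
};

/* Legacy style: one callback for every event; the driver filters inside. */
int legacy_suspend(struct toy_dev *dev, enum toy_event event)
{
    dev->driver_calls++;
    if (event != TOY_EVENT_SUSPEND)
        return 0;       /* called for nothing: pack up and go back */
    dev->work_done++;
    return 0;
}

/* Generic style: one pointer per event; NULL means "do not call me". */
struct toy_pm_ops {
    int (*suspend)(struct toy_dev *dev);
    int (*freeze)(struct toy_dev *dev);
    int (*poweroff)(struct toy_dev *dev);
};

int generic_suspend(struct toy_dev *dev)
{
    dev->driver_calls++;
    dev->work_done++;
    return 0;
}

const struct toy_pm_ops generic_ops = {
    .suspend  = generic_suspend,
    .freeze   = NULL,
    .poweroff = NULL,
};

/* Toy core, legacy path: it cannot know the driver's intent, so it always calls. */
int toy_core_legacy(struct toy_dev *dev, enum toy_event event)
{
    assert(dev != NULL);
    return legacy_suspend(dev, event);
}

/* Toy core, generic path: pick the pointer for the event and skip it if NULL. */
int toy_core_generic(struct toy_dev *dev, const struct toy_pm_ops *ops,
                     enum toy_event event)
{
    assert(dev != NULL && ops != NULL);
    int (*cb)(struct toy_dev *) = NULL;

    switch (event) {
    case TOY_EVENT_SUSPEND:   cb = ops->suspend;  break;
    case TOY_EVENT_FREEZE:    cb = ops->freeze;   break;
    case TOY_EVENT_HIBERNATE: cb = ops->poweroff; break;
    }
    return cb ? cb(dev) : 0;
}
```

Running one suspend, one freeze and one hibernate through each path does the same amount of real work, but the legacy path enters driver code three times and the generic path only once.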
And in the case of freeze and hibernate too, there will be an unnecessary function call, because all the work these two functions do is the standard stuff. So again, with the generic framework they no longer need to be defined, because all of this is performed by the PCI core itself in the generic framework now.

Now, the buggy code. Yeah, this one is an interesting one. Let's check it out. Okay, we don't have the full code here. So, the suspend function. Yeah, suspend and resume. Okay. Where is that particular line? Okay, no worries. What this particular driver was doing is calling pci_enable_wake in both suspend and resume, and all it was doing was disabling wake. So really, the device doesn't want to enable wake in either case, suspend or resume; it is just disabling it. And this can be buggy. Maybe that was an unnecessary function call, or maybe it was placed there intentionally because the device did some weird stuff if it wasn't done, or maybe things were ignored. We don't know, because there are many problems related to the pci_enable_wake function itself. So that part is buggy, and that part is there in many device drivers. Even if it is worked upon, many drivers will still be using that particular buggy code.

So we saw how this legacy stuff creates problems. Now, legacy to generic: how the conversion took place. First of all, we change the function signature of suspend and resume, as I showed you earlier in the function itself. Let's see.
Yeah, so we change the function signature. Earlier the function accepted a parameter of type struct pci_dev, a pointer, and a pm_message_t variable, state. But now, with the generic framework, because we are going generic, the driver doesn't need to be told that this device belongs to the PCI subsystem; the PCI core itself knows that this device is PCI. So let the core do all that stuff; the driver should do only device-specific jobs. Okay. So yes, we change the function signature, and accordingly we derive the pci_dev and those things from it where we need them.

Then, step two: remove the PCI helper functions. As you can see, these four are PCI helper functions, and literally everything these functions do is standard jobs. These can be taken care of by the PCI core itself, so the driver code does not need to include them.

Step three, manage wake. As I said, pci_wake_from_d3 and pci_enable_wake are used by many drivers in a very buggy manner. They should be avoided or replaced with device_set_wakeup_enable, because that is the more generic version. And in most cases we should just avoid them. Okay.

Now, step four. This one should actually be the first step. CONFIG_PM: as you see, in the legacy version the suspend and resume code was inside a preprocessor directive container, #ifdef CONFIG_PM. This was done because power management support is not always wanted, and in that case we don't want unnecessary code to go into the kernel. But with generic, we don't need the CONFIG_PM condition; we need CONFIG_PM_SLEEP.
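The first two conversion steps, changing the signature and dropping the helper calls, can be shown side by side in a userspace model. This is a sketch under assumptions rather than a real patch: the toy_ helpers stand in for just two of the standard steps (saving state and setting the power state), an int stands in for pm_message_t, and toy_core_suspend is a hypothetical simplification of what the core does on the generic path.

```c
#include <assert.h>
#include <stddef.h>

enum toy_power { TOY_D0, TOY_D3HOT };

struct toy_device { int unused; };      /* stand-in for the generic device */

struct toy_pci_dev {
    struct toy_device dev;      /* embedded generic device */
    int state_saved;            /* standard: configuration saved */
    enum toy_power power;       /* standard: current power state */
    int hw_quiesced;            /* device-specific work done */
    int std_steps_in_driver;    /* standard steps executed by driver code */
};

struct toy_pci_dev *toy_to_pci_dev(struct toy_device *dev)
{
    assert(dev != NULL);
    return (struct toy_pci_dev *)((char *)dev - offsetof(struct toy_pci_dev, dev));
}

/* Toy versions of two of the standard steps named in the talk. */
void toy_save_state(struct toy_pci_dev *p) { p->state_saved = 1; }
void toy_set_power_state(struct toy_pci_dev *p, enum toy_power s) { p->power = s; }

/* Before: PCI-specific signature, standard steps repeated inside the driver. */
int drv_suspend_legacy(struct toy_pci_dev *pdev, int state)
{
    (void)state;                        /* stands in for pm_message_t */
    pdev->hw_quiesced = 1;              /* device-specific */
    toy_save_state(pdev);
    toy_set_power_state(pdev, TOY_D3HOT);
    pdev->std_steps_in_driver += 2;
    return 0;
}

/* After: generic signature, device-specific work only. */
int drv_suspend_generic(struct toy_device *dev)
{
    struct toy_pci_dev *pdev = toy_to_pci_dev(dev);

    pdev->hw_quiesced = 1;
    return 0;
}

/* Toy core, generic path: call the driver, then do the standard steps once, here. */
int toy_core_suspend(struct toy_pci_dev *pdev, int (*cb)(struct toy_device *))
{
    assert(pdev != NULL);
    int err = cb ? cb(&pdev->dev) : 0;

    if (err)
        return err;
    toy_save_state(pdev);
    toy_set_power_state(pdev, TOY_D3HOT);
    return 0;
}
```

Both paths leave the device in the same final state. What changes is where the standard steps live: two of them inside every legacy driver, none inside a converted one.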
But while I was working, I saw many PCI drivers that are already using the generic framework. They either use CONFIG_PM_SLEEP, or they have removed the conditional compilation entirely and marked the functions as __maybe_unused. So this part depends on how the maintainer wants the code to be. It is not a standard thing, whether to mark all the functions as __maybe_unused or to use CONFIG_PM_SLEEP. It depends on how the code was written earlier. Okay.

Now, step five. As I said, wherever struct dev_pm_ops is used, it means that driver is using the generic framework. There are two possible ways to use it. As you saw, in the redundant example I modified suspend and resume; now we just have to embed them in this variable. So I used SIMPLE_DEV_PM_OPS. There is also UNIVERSAL_DEV_PM_OPS; in that case, runtime suspend and runtime resume are also going to use the same code. Or I can explicitly initialize the members, as we saw in the messy example, where we don't need to call the suspend and resume functions in the case of freeze and hibernate. In that case, I manually initialize those members as NULL. So, use this variable.

Now, step six: finally, include it in the driver. Okay.

So how to finally do it: loop the previous steps for 210 drivers. Yeah, that's literally what I did. When I started working on this project, my first task was to find out how many drivers were actually using the legacy code. Bjorn suggested that I just remove the legacy pointers from the code, compile the kernel with make, and record all the errors thrown. So I did that, and I redirected the output and the errors to a file.
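Steps five and six, filling a dev_pm_ops-style structure and pointing the driver at it, can be sketched in userspace as follows. This is an illustrative model only: TOY_SIMPLE_DEV_PM_OPS is a hypothetical analogue of SIMPLE_DEV_PM_OPS that reuses one suspend/resume pair for every system-sleep transition, the structure keeps only six callbacks, and the flat toy_driver stands in for a PCI driver, which sets the pointer through its embedded driver member.

```c
#include <assert.h>
#include <stddef.h>

struct toy_device { int suspended; };

typedef int (*toy_cb)(struct toy_device *dev);

/* Six system-sleep callbacks; the kernel's struct dev_pm_ops has more members. */
struct toy_dev_pm_ops {
    toy_cb suspend, resume;
    toy_cb freeze, thaw;
    toy_cb poweroff, restore;
};

/* Toy analogue of SIMPLE_DEV_PM_OPS: one pair serves every transition. */
#define TOY_SIMPLE_DEV_PM_OPS(name, suspend_fn, resume_fn) \
    const struct toy_dev_pm_ops name = {                    \
        .suspend  = suspend_fn, .resume  = resume_fn,       \
        .freeze   = suspend_fn, .thaw    = resume_fn,       \
        .poweroff = suspend_fn, .restore = resume_fn,       \
    }

int drv_suspend(struct toy_device *dev)
{
    assert(dev != NULL);
    dev->suspended = 1;
    return 0;
}

int drv_resume(struct toy_device *dev)
{
    assert(dev != NULL);
    dev->suspended = 0;
    return 0;
}

/* Option 1: the macro, when the same code suits suspend, freeze and hibernate. */
TOY_SIMPLE_DEV_PM_OPS(drv_pm_ops_simple, drv_suspend, drv_resume);

/* Option 2: explicit members, when only suspend and resume should ever be called. */
const struct toy_dev_pm_ops drv_pm_ops_explicit = {
    .suspend = drv_suspend,
    .resume  = drv_resume,
    /* freeze, thaw, poweroff and restore stay NULL */
};

/* Step six: point the driver at the chosen structure. */
struct toy_driver {
    const char *name;
    const struct toy_dev_pm_ops *pm;
};

const struct toy_driver my_toy_driver = {
    .name = "toy",
    .pm   = &drv_pm_ops_simple,
};
```

Which option fits depends on the driver: the macro suits the redundant example, where one pair of functions is right for every transition, and the explicit form suits the messy example, where freeze and hibernate should not reach the driver at all.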
Then I performed all the operations on it and I had a list of 210 drivers. I was like, this is a very large number. But things got easier; as I said, once I started working on the code, I started to understand it. And although these are basic steps, one to six, you need to apply them to each and every driver, and each driver is a new task in itself. One driver performs something besides the standard tasks. Some drivers perform some steps in a particular way; another does it in another way. And you have to take care of that while converting from legacy to generic, so that you don't break things. You should not break things.

Now we can look at the PCI core in detail. As we saw, for suspend there is particular code for each phase. When the device has to go into suspend mode, there is pci_pm_prepare, then pci_pm_suspend, and pci_pm_suspend_noirq. This is the point: in pci_pm_suspend, the function checks whether the driver is using the legacy code or the generic code. To find out whether the driver is using generic code, it just checks whether it is using the pm ops variable or not. Then it goes on, and it goes similarly for resume.

So as you can see, there was a major simplification when we switched from legacy to generic. The drivers are no longer performing the standard steps again and again, each and every one of them, and a lot of code was removed. You can say that earlier the PCI core was like: knock knock, hello PCI driver. Now is the time to suspend; please perform all those standard tasks. Just tell me, are you going to perform them, or should I? And it waits. Yeah, you are going to do it; you are using the legacy framework. Okay, do it, but do it properly, please.
But now, with the generic framework, the PCI core is like: knock knock, hello driver. This is the time to suspend. I have done all the standard tasks; the device-specific tasks you are supposed to do, and return. Yeah. So now there is much more stability, and there is much more power in the core, because most of the standard tasks are now done by the core itself.

So, the fall of the legacy. This is the target, and it is good that this is happening. The legacy framework, yeah, it was good. It helped, it performed well, it gave good results. But now things are changing. There are a lot of standard tasks, and drivers repeat those standard tasks again and again. Suppose there is an update in the PCI specs. It should not happen often, because the gap is generally very long, but suppose something new has to be performed: then you literally have to modify all the PCI drivers for it. With the generic framework, you just include it in the PCI core code, and the driver does only the device-specific job and doesn't have to worry. We are still waiting, because the submission process takes time, with verification and regression testing; we are waiting for all the patches to go in.

The other major thing I found out while working on my project: many times I was facing a new driver, so most of the time what I did was look at the neighbouring drivers that already use the generic framework, and look at their code to get an idea of how they handle things. They work on sibling devices, so this driver should be able to tackle its problems in a similar way. That is when I realized that many of the generic drivers are still invoking those PCI helper functions.
As it is given in the documentation too: if a generic driver saves the state of the device or changes the power state of the device itself, it is expected that the standard tasks will be done by the driver itself, and the core won't do the rest of the job. So that part can be troublesome. These drivers should not use those PCI helper functions, and they need cleaning. First of all, I need to research what the helper functions are and whether it is safe to remove them. But yeah, I have to work on that too.

And the third task: we need to figure out pci_enable_wake and pci_wake_from_d3, because, as I said earlier, they are either used in a buggy manner or can be a bug in some cases. So that needs a check.

So, happy hacking. I hope you enjoyed the talk. If you have any problem or any doubt, just mention it and I'll try to clear up whatever confusion there is. Thank you once again, and have a great day.