All right, I'll try to get started. My name is Pat Mochel, and I'm here to talk about the Linux kernel driver model and sysfs, which are both important features in the 2.6 kernel. I'm going to talk about how they're making the kernel a little better one subsystem at a time, and how sysfs fits in. We're starting a little late, about 10 minutes, and I know there are talks starting at five which will probably be well attended, so I'm going to try to finish within that time. If anybody has any questions, feel free to raise your hand. If I'm talking too fast, or too slowly, feel free to raise your hand and tell me that as well.

So, moving on. I'm going to talk about the kernel driver model and sysfs, and I'm going to answer these basic questions, which are probably all pertinent ones: What is the driver model? What is sysfs? Why do I care? How do they work together? And can you prove it? I'm going to explain each one of those, and of course I can prove it, and prove to you that you should care about them very much.

So first of all, what is the driver model? The driver model is a collection of data structures and programming interfaces inside the kernel. It's really quite simple. It just describes common types of objects. I know a lot of you are probably familiar with object-oriented programming. If any of you have done kernel work, you know there's a strong aversion to object-oriented programming there, because people mostly know it through languages like C++ and Java. Well, there is a use for object-oriented programming, and I think the point is that object orientation is in the programmer's mind, not in the language. The driver model follows that very well: we're trying to describe self-contained objects using C inside the kernel.
So what the driver model does is try to identify the objects that are common to a lot of different subsystems and capture them in structures. It also consists of a set of library functions for subsystems to use to manipulate those objects. It's not for device drivers at all; it's only for the subsystems to use. That has a couple of consequences. One is that it's completely transparent to device driver programmers, as well as to users. The other is that it's also very transparent at the subsystem level.

So that's it. That's all of it. Most of you will never see it. You'll never experience it and you'll never have any problems with it. It will just work perfectly, of course. That's pretty much all there is to it. However, I traveled very far to come here, so I'm going to say a few more things.

First of all, the core concepts of the driver model are in the data structures. There are four basic data structures. There are a few more beyond those, which I'm not going to talk about.

The first one is struct device. It represents a single device in the system. It has no hardware access whatsoever. It doesn't care about I/O ports, it doesn't care about memory addresses, it doesn't care about location on the bus. What it has is really simple: a name, a bus ID, a parent pointer, and a list of children. These are fields that pretty much any device has, no matter what bus it's on. Whether it's a serial device or a PCI network card doesn't matter; they all have those common fields.

The next one is struct bus_type. This is not a bus, like an instance of a PCI bus. It only describes a type of bus, so there's one per subsystem. There's one for the PCI subsystem.
There's one for the USB subsystem. There's one for the serial subsystem. Et cetera. All it does is say, hey, this is a type of bus that's in the system. It has basically two things: a list of the devices and a list of the drivers that are attached to that bus. There are some locks and some other things in there, but for the most part that's it.

The next one is struct class. And it really is called struct class. It's not a C++ class; the name is just there to piss off the C++ people, and the fact is that it confuses people. It describes one functional class of devices, and there's one per subsystem, like networking or input. Like a bus type, it only has a list of the devices that are attached to it and associated with it.

And then of course there's struct device_driver, of which one exists for each driver in the system. Take a bus-specific driver like the e1000 driver: it's a network driver that controls Intel PCI gigabit Ethernet cards, and it's registered through the PCI subsystem. A driver basically only has the list of devices that it supports. All of the lists I've mentioned are contained in the core objects themselves. These structures are pretty much never used by the drivers themselves. They're only used by the subsystems and by the driver model core, which I'll explain in a second.

The interfaces that the driver model has come down to four basic things. First, register and unregister, which are used when a device is discovered or an object is created. The subsystem registers the device with the core, which inserts it into the list. If you go back, we have this list of devices and this list of drivers that the bus has. So when a bus finds a device, it registers it with the driver core, which adds it to the bus's list of devices. That's it. Well, why do the buses do this? I'll explain that a little bit more in a second.
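The objects and the register step described above can be sketched in plain user-space C. This is only an illustrative model under stated assumptions: the `model_` names, the field layout, and the singly linked lists are invented for this sketch and are not the kernel's actual definitions, and the locking the talk mentions is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified user-space model. These are NOT the kernel's real structures. */
struct model_bus_type;

struct model_device {
    const char *name;
    const char *bus_id;
    struct model_device *parent;       /* parent pointer */
    struct model_device *next_on_bus;  /* link in the bus type's device list */
    struct model_bus_type *bus;        /* set by the core on registration */
};

struct model_driver {
    const char *name;
    struct model_driver *next_on_bus;  /* link in the bus type's driver list */
    struct model_bus_type *bus;
};

struct model_bus_type {
    const char *name;
    struct model_device *devices;      /* list of devices on this bus type */
    struct model_driver *drivers;      /* list of drivers for this bus type */
};

/* The core, not the subsystem, owns the list manipulation. */
static void model_device_register(struct model_bus_type *bus,
                                  struct model_device *dev)
{
    assert(bus != NULL && dev != NULL);
    dev->bus = bus;
    dev->next_on_bus = bus->devices;
    bus->devices = dev;
}

static void model_driver_register(struct model_bus_type *bus,
                                  struct model_driver *drv)
{
    assert(bus != NULL && drv != NULL);
    drv->bus = bus;
    drv->next_on_bus = bus->drivers;
    bus->drivers = drv;
}

/* Returns 0 if the device was found and unlinked, -1 otherwise. */
static int model_device_unregister(struct model_device *dev)
{
    struct model_device **link;

    if (dev == NULL || dev->bus == NULL)
        return -1;
    for (link = &dev->bus->devices; *link; link = &(*link)->next_on_bus) {
        if (*link == dev) {
            *link = dev->next_on_bus;
            dev->next_on_bus = NULL;
            dev->bus = NULL;
            return 0;
        }
    }
    return -1;
}

static size_t model_bus_device_count(const struct model_bus_type *bus)
{
    size_t n = 0;
    const struct model_device *d;

    for (d = bus->devices; d; d = d->next_on_bus)
        n++;
    return n;
}
```

A subsystem in this model would fill in a `struct model_device` when it discovers hardware and hand it to `model_device_register()`; it never touches the list pointers itself, which is the division of labor the talk describes.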
The main reason buses register with the core is that the locks and the lists are contained in the central driver core objects. All access to them is mediated through the core itself, which reduces the complexity of the subsystems and keeps all the common code in one place.

The other thing is get and put: reference counting. We use a lot of reference counting in the driver core because it's scalable. It's much more scalable and much more efficient than using any type of lock. It also guarantees that we're safe against devices or drivers going away, like modules being loaded and unloaded and devices being plugged in and unplugged. By using reference counting, we make sure that we're always accessing valid data at the right time.

We also have sysfs operations, which are mediated through the driver core. I'll talk about sysfs in a moment. For the most part, just know that any time subsystems want to access sysfs, they're actually calling it through the driver core.

And then, of course, there's driver binding. When a device is discovered by a bus, it's registered with the core. When a driver is loaded as a module, it's registered through its subsystem with the core. And then there are bus hooks to control and mediate the association of the driver with devices, because different buses handle it in different ways. We'd like every bus to handle it in the same way, but unfortunately that's not the case.

Talking a little more about the driver model: it's a simplification layer. It's not an abstraction layer. In the kernel world, we don't like abstraction layers. Abstraction layers are bad. So remember that. We never tried to capture everything that everybody did.
We only tried to take the most common data elements in the structures that are shared across all the different subsystems and distill them down to common objects, and essentially simplify the subsystems that use them. We never wanted to create something that encapsulated every single subsystem in the entire kernel, because that just doesn't make sense. Buses are quirky, classes are quirky. We wanted to find a common set of functionality that most people could use.

By doing so, we've been able to create a very gradual, evolutionary system: take the common components out of the structures, then gradually convert subsystems to use these data structures by embedding them in the larger structures that the subsystems already use. By using statically allocated, embedded data structures, we can convince a subsystem to adopt the driver model by simply inserting the data structure into its larger data structure, and all of a sudden it's ready for the driver model. Just like that. It's very simple. To convert many of these to the driver model took three lines of code, and it wasn't bad. It just worked. That also lends itself to the transparency of the entire model. No one knows what's happening. No one needs to bother with it. It just works.

The library functions that I mentioned before, the interfaces, are helpers. They're not replacements. They're not meant to replace the PCI device register or PCI driver register functions. They're only meant to help the PCI subsystem do the job it's already doing, by taking care of the list management, the locking, and whatnot. The driver model takes code away and augments the functionality; it doesn't replace it. It takes care of redundant chores like list management and locking, so the subsystems don't have to implement them themselves.
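The embedding idea and the get/put reference counting mentioned earlier can be illustrated with a small user-space sketch. Everything here is an assumption made for illustration: `core_object`, `demo_pci_dev`, and `container_of_sketch` are hypothetical names, and the counter is a plain `int` rather than the atomic counter a real kernel implementation would need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical common object: a reference count plus a release hook. */
struct core_object {
    int refcount;                               /* not atomic: sketch only */
    void (*release)(struct core_object *obj);
};

/* Recover the enclosing structure from a pointer to its embedded member. */
#define container_of_sketch(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* A subsystem's own structure adopts the model by embedding the object. */
struct demo_pci_dev {
    unsigned int vendor;
    unsigned int device;
    struct core_object core;                    /* embedded, not a pointer */
};

static int released_count;                      /* lets callers observe frees */

static void core_init(struct core_object *obj,
                      void (*release)(struct core_object *))
{
    obj->refcount = 1;                          /* creator holds one reference */
    obj->release = release;
}

static struct core_object *core_get(struct core_object *obj)
{
    assert(obj->refcount > 0);
    obj->refcount++;
    return obj;
}

static void core_put(struct core_object *obj)
{
    assert(obj->refcount > 0);
    if (--obj->refcount == 0)
        obj->release(obj);                      /* last user frees the object */
}

static void demo_pci_release(struct core_object *obj)
{
    struct demo_pci_dev *pdev =
        container_of_sketch(obj, struct demo_pci_dev, core);

    free(pdev);
    released_count++;
}

static struct demo_pci_dev *demo_pci_alloc(unsigned int vendor,
                                           unsigned int device)
{
    struct demo_pci_dev *pdev = malloc(sizeof(*pdev));

    if (pdev == NULL)
        return NULL;
    pdev->vendor = vendor;
    pdev->device = device;
    core_init(&pdev->core, demo_pci_release);
    return pdev;
}
```

The common code only ever sees `struct core_object`; the subsystem gets back to its own type with the offset calculation. Because the object is freed only when the last reference is dropped, a holder of a reference never sees the structure vanish underneath it.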
So, now that you're all convinced that the driver model is a good thing, this is what it does for the kernel. It makes the subsystems simpler. It reduces code size, which is obviously a good thing. It's one of the things that we learned after many rounds of trial and error: we'd try an interface, and when we tried one that made it possible to remove hundreds of lines of code, we knew it was probably the right choice.

It creates a mechanism for more consolidation and simplification. Some of the hardest choices to make, and some of the hardest code to write, is the first step. We found that as soon as we started to implement things in the driver model, a lot of people came to us and said, wow, I've been wanting to do X for a long time, and now that there's an actual infrastructure set up to take care of it, it's a lot easier. You kind of lead the way.

It lets driver subsystems scale up and down. Some of the subsystems weren't able to handle dynamic device discovery or registration. Serial devices, for example, had a statically allocated list, on the assumption that no one would ever need more than that many serial devices. Which is all well and good, since most people don't have many serial devices, but when it comes to hotplugging, and to devices that disappear, a static list gets in the way. By having all the list management and all the registration functions in one place, we were able to ease that pain, so subsystems can scale down as well as scale up, because we have dynamic objects and lists that grow and shrink.

It has also helped power management and hotplugging. Now, you're going to say that Linux has always had hotplugging and power management, almost.
And they've always worked relatively well. What the driver model gives you is much easier use and much easier implementation of power management and hotplugging. I'll talk about power management a little bit in a moment. Moving on. Does that all make sense so far? Is that okay? All right. Is that better? All right. Does that mean you'll buy me a beer afterwards?

So, what is sysfs? How many of you have heard of sysfs? Sysfs is an in-memory file system based on ramfs. It lives in the kernel itself and is created by the kernel when the kernel starts up. It's very similar to the proc file system, but it's based on ramfs. Proc was written before the VFS. When the VFS layer was written, ramfs was written, and it became the world's simplest file system: it is incredibly small and very simple. Sysfs is based on ramfs, so it inherits all the VFS features, which makes it very simple to use.

What it does is export kernel objects, their attributes, and the relationships between the different objects to user space. That's a pretty high-level view, and I'll say more in a moment. It also gives you an organized view. Proc, for example, is total anarchy. Things are placed randomly, you can never find anything unless you happen to know where it is, and when new things are added, they go wherever the driver author or the subsystem author wants them to. Sysfs is not like that. We impose restrictions that are implicit in the entire system, and they give you an organized, hierarchical view of what's in the kernel.

Sysfs also fixes other things that proc gets wrong. Some things in proc are dynamically created, but you're not really supposed to do that. The locking is really bad, it has always been really bad, and it's really unsafe to remove things from proc once they've been created. That's not so in sysfs; we fixed those problems. We also make it easier to expose user-accessible controls. We try to impose
restrictions: one value per file, in ASCII, so that values can be read and written right from the command line using cat and echo. There are some exceptions, but those are special cases that don't need to be discussed now. We make it really easy to expose these attributes. Instead of having code that people copy over and over, 50 or 100 lines long, to export one single value to user space, and thereby having the same bug in 200 different places, as has happened with proc a number of times, we make it possible to use about three lines of code to export an attribute to user space and to read and write it. Now of course there's other setup you have to do, but just let me wave my hands here and believe what I said.

So, a little history about sysfs. It was created as ddfs, the device driver file system, to expose the device tree when the driver model was being created. I worked at a company called Transmeta when I started working on the driver model. I had created this device tree and I needed to debug it, so I walked into the office of a guy I worked with and said, hey, this is what I've done, and I want to debug it, so I'm thinking of using proc. And he said, proc is crap, don't use proc. He didn't actually say that; he said it in a much harsher way, which is not family safe. Nonetheless, he encouraged me to write my own file system based on ramfs, which he had just written. I said, wow, okay, I've never written a file system before, but it sounds easy enough. I went ahead and did it, and it was very simple.

As the driver model improved, we added more objects to it. The driver model started with devices, and we soon added buses and drivers and classes to represent all the different objects that we found, and I wanted to expose those through the file system as well. So I changed the name to something a little more user friendly, driverfs, and that's about the time
it got into the kernel. I had interfaces in driverfs to add all these new objects: buses, drivers, and classes. In doing so I had to extend the interface a bit, so there were type-specific calls into driverfs for adding each different kind of object. It didn't scale very well, and soon other subsystems wanted to start using driverfs and the driver model. The block subsystem, for example, wanted to use it, and fitting it in through the existing device interface was something I thought was really bad. So what we did is create struct kobject and change the name to sysfs to make it even more generic, much to the chagrin and much to the dislike of many people. Sorry about that.

I'll just talk about kobjects for a moment, and only briefly, because it gets very nasty and can get confusing very quickly. Kobjects are simple objects, distilled from common fields in the driver model. I mentioned before that the driver model objects have attributes and fields that are common to different objects throughout the various subsystems. Every device has a name, a bus ID, a parent, and a list of children, no matter where it is. Buses have a name, a list of children, and a parent, and so do classes, and so do drivers. So we took those and put them in this really simple, small object called a kobject, which provides a very simple set of management functionality that the subsystems can use.

Most of this is not important for the sake of this discussion, but just to explain: each sysfs directory relates to one kobject. Everything is done behind the scenes by the driver core, so it's completely transparent. The driver subsystems don't even see the kobject; they only deal with the driver core. Each sysfs directory is created when the kobject is registered, so it happens automatically. The PCI subsystem will walk the bus, discover devices, and register those devices with the driver core, and the driver
core will then register the kobject with the kobject subsystem. By the time you're doing that, you have all of the location information: you know where the devices are, you know who their parents are, and you know who their siblings are. So you register them, you register that with sysfs, and all of a sudden you have this directory hierarchy for free, and you know exactly where everything is.

So, now that your eyes have all glazed over, I'll give you one second. Sorry. So: kobjects are directories. Files are attributes of devices or of any other object in a subsystem; those are regular text files in the file system. Symlinks are relationships between kobjects, which makes some things very easy. You have a struct device bound to a struct device_driver: the device knows who its driver is, and the driver knows which devices it controls. So you use symlinks in sysfs to visibly expose that relationship, because it's already there in the kernel.

So why should you care? First of all, you might already be using it. How many people are using 2.6? Then you're already using it. It has been in the kernel since 2.5.1, in December 2001. Sysfs is mounted by all of the contemporary distros. I know it's done by SUSE 9.2, because that's what this is, and by Fedora and Gentoo. I don't know about the other distros, but I doubt that's a problem wherever they're at. Other utilities have started to use it. CPU frequency scaling used it at some point, although I think that's being replaced by some other utilities. Some of the device tools now get their information through sysfs. And power management needs the hierarchy in order to properly shut down the system or suspend it, walking all the devices in order when you're suspending to S3 or S4.

Now that you're all completely confused, I'll show you what it looks like, because I know that you're all curious. So what do we have here. Is the lighting okay? Is that better? All right.
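From user space, the mapping just described (directories for objects, one-value text files for attributes, symlinks for relationships) needs nothing beyond ordinary file calls. The following is a minimal sketch using standard POSIX functions; the helper names `read_attr` and `read_link_target` are made up for this example, and any path you pass in depends on what your own system exposes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Read a one-value-per-file ASCII attribute into buf and strip the
 * trailing newline. Returns 0 on success, -1 if the file cannot be
 * opened or is empty.
 */
static int read_attr(const char *path, char *buf, size_t len)
{
    FILE *f;

    assert(buf != NULL && len > 0);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, (int)len, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/*
 * Read where a symlink points, for example a link from a bus's device
 * listing back into the device tree. Returns 0 on success, -1 on error.
 */
static int read_link_target(const char *path, char *buf, size_t len)
{
    ssize_t n;

    assert(buf != NULL && len > 1);
    n = readlink(path, buf, len - 1);   /* readlink does not NUL-terminate */
    if (n < 0)
        return -1;
    buf[n] = '\0';
    return 0;
}
```

On a machine with sysfs mounted at /sys, `read_attr()` could be pointed at an attribute file of a PCI device and `read_link_target()` at one of the symlinks under a bus's devices directory; which entries exist is system specific, so none are hard-coded here.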
As you can see, this is SUSE 9.2, the distribution I happen to be using, and at /sys we have, as you can see, sysfs. It's mounted on boot, during the init stage, so you don't have to mount it by hand; it's already there. What we have at the top level of the tree is all the major subsystems that are represented in sysfs. We have the block layer, the buses, the classes, the device tree, the firmware interface, the module interface, which includes all the modules that are loaded in the system, and the power management interface.

Just to show you an example of other people using sysfs: if you look at the module directory, these are all the modules that are loaded on my system. An entry shows up whenever a module is loaded into the kernel.

If we look into the devices directory, we have the top-level devices in the system. The only one we have to care about for now is pci0000:00, which is the root PCI bus in my system. If we look at that, these are all the PCI devices, at least the ones starting with four zeros and a colon, and you can roughly see the hierarchy of the tree. It's only a laptop, so it's not a very interesting PCI tree; there are about a dozen or so devices. You can see some of the devices, like the one with zeros all the way across at the top, which is the north bridge, the very core device. This next one is the PCI AGP bridge, and the device in the directory under that is my ATI video card. Over here, though you can't necessarily tell right now, are the USB devices that are in the system, and some audio devices and whatnot.

In the bus directory we have all the bus types that are in my system: IDE, PCI, et cetera. Remember back at the very beginning, when I was talking about the different driver model objects? The bus type is one of them, and these are all the
struct bus_types that are registered in the system. If you look at /sys/bus/pci/devices, these are all the devices that the PCI bus has found and discovered on the system. Now, all the devices are actually registered in the device tree, which I showed you just a second ago, so what is registered here is symlinks that point to the locations in the device tree. Since there's no need to replicate the directory tree, we just create a symlink in this directory that points back to the device's location in the device hierarchy. What this does is give you a flat listing, because every device has a unique name, a unique identifier, within the PCI tree, and each entry points back to its unique location.

If you look in the drivers directory, we have a list of all the drivers that are registered with the PCI subsystem. As you can see, there are a lot, just in case you happen to have those devices on your system, and they all get directories in the tree. The only one we really care about for now is the UHCI USB controller driver. If we look in there, we see three symlinks: three devices have been bound to the UHCI driver, because there are three USB controllers. Each of those symlinks points back to the location in the device hierarchy where the device lives.

If we look at a more complete view of the USB tree, which is the USB bus type, there are actually four controllers, covering both UHCI and EHCI. We see all of the USB hub devices, and a shadow device that is created behind the scenes. We have the hub driver and the usb driver, which is just an internal driver within the USB core that gets registered, and all the devices that are bound to each one of those drivers. And of course the symlinks point back to the location in the device hierarchy where the devices are located. Does that all make sense so far? All right, much better response than the
last one. As you can see, there are also a few drivers that are not being used by the USB subsystem. And so this is where the fun part is. This is a digital camera. First of all, let's see everyone smile and wave. Hey. So we take a picture and turn it off. This is what the tree looks like now. So we take the device, which is a flash reader with a USB port, and we plug it in, and we see that some new devices have appeared. First of all, we have a USB device here which was not in the previous listing; that is the device that was discovered by the USB subsystem, and it points back to a device in the device tree. The subsystem has discovered that it's a USB storage device, so it gets bound to the usb-storage driver. In doing so, it creates a symlink in the usb-storage driver directory back to the location of the device in the device hierarchy. And if we look at the SCSI subsystem, since USB storage devices are emulated through the SCSI subsystem for some unknown reason, we now have two devices that have appeared there. And just to show you that I'm not faking this: the devices are here now; I unplug the device; and now I've put the device back in. Excuse me for a minute. And we see that there are new devices there, which have gotten new numbers, because the numbering increments each time a device appears. So now we have new devices in the system.

So, where do you find more information? There is the driver model code, which is in the Linux source tree, all contained in drivers/base. The core driver model is about 2,000 lines, and it has removed hundreds of lines from the subsystems. There is some other stuff in there that has been added on to the driver core, so it's now a total of about 5,000 lines. Sysfs is in its own files, and that's about 1,500 lines. Then there's the kobject source, and
I would recommend the first one for casual reading. Of the other two, sysfs is okay, but the kobject code gets a little hairy; it's not your average reading.

For even more information, there is an excellent book that was just published; it came out two weeks ago: Linux Device Drivers, from O'Reilly. Someone in the other building is actually selling it. It's an excellent book. It covers 2.6, up to 2.6.10. It includes a chapter on the driver model, and it also updates all the rest of the information from the previous edition of the book. A lot of the book is based on the LWN.net driver porting series by Jonathan Corbet, which is an excellent series that explains many aspects of the driver model changes as well as the other changes throughout the kernel. And so that's it.

Is sysfs going to replace devfs? Oh, sorry, was that the question? So yes, the question is whether sysfs is going to replace devfs, and the answer is no. It was never designed to do that, and we never intended it to, because devfs, and anything like it, does not belong in the kernel at all. It belongs in user space. What sysfs has enabled is exporting to user space all of the information that's needed to set up the device entries. Now there's a new program called udev that uses that information, plus the hotplug infrastructure in the kernel, to populate the /dev directory dynamically. I think both Fedora and SUSE have been using it for some time. So you can have a dynamically created /dev without having any device entries that exist ahead of time.

Yes? So the question is about the mount point. /sys has become the de facto standard location for sysfs, so it's a new location in your hierarchy that you need, and that you've probably already seen. I don't know if it's an actual standard yet, but it just happens to work. So the question is, will there only be a
/sys and not a /proc? No. Proc has its place. Proc has its use; it's just a very small use. A long time ago people decided that it would be really handy, really useful, to export other information about the kernel to user space. And like I mentioned on one of the slides, sometimes the hardest step to take is the first one, so once somebody has done it, people latch on to what's already there, overload it, and use it. That's what happened with proc: people have latched on to the back of proc and put in sysctl controls and everything else that doesn't belong there. Those belong in some other transport mechanism. Proc was meant to export process IDs and the information related to them, and that's why you see some of the scaling problems and the locking problems that are going on. Ideally, a lot of the things that are in proc, except for the process information, will be moved over to sysfs, and proc would stay for exporting process information. So no, we wouldn't have just a /sys; we'd have both.

Any other questions? Yes, I'm sorry? So the statement is that someone tried exporting sysfs over NFS and it didn't work. My response is that patches to fix that are welcome. I've never had a need to, because I don't use NFS, so I've never encountered the problem, and as time dictates, I don't have the time to fix all of these things. I'm sorry it doesn't work. I'd love to see it fixed.

Yes? On suspending different parts of the tree: it just hasn't been done correctly. We can suspend the entire tree, but we haven't figured out how to say, suspend just this part of the tree. We've had issues with power management patches in the past, but it's nothing personal; it's just technical issues that we have with the patches. Sometimes what's best for the users isn't always best
for the kernel, so we have a hard time implementing all the features that are really important and useful to people, because the patches don't live up to the standards that we follow. Any other questions?

Yes? Right, so: is it desirable to suspend a device even though there's no driver bound to it? Yes, because there are some devices that have default state programmed into them, and other devices, like bridge devices, that will never have a driver bound but that hold important information, like what buses are behind them, which needs to be retained. There have been a number of proposals over the past years; some of them have been acceptable, but none of them have been implemented. They include implementing a default driver for bridges, or default drivers for all devices in a class, or being able to attach multiple drivers to a single device, so that you always have a default driver bound to a device that will handle suspend and resume without necessarily supporting the functionality of the device, like networking or video. That's something we just haven't had the chance to do yet.

Any other questions? All right, thank you.