Good morning everybody, and welcome to this talk on how not to write device drivers. Skip that. If you don't know me already, I've been working in embedded Linux for many years, and you can catch up with me at these places. The important thing is that today I want to talk about device drivers and how to do them in user space. I'll begin by talking a little about device drivers in kernel space, just to give the background. Then we'll talk about user space device drivers, and I'll follow up with some examples using GPIOs, pulse width modulation and I2C.
The conventional driver model is that all the hardware is accessed via the kernel. If you want to access some hardware, you write a kernel device driver that exposes some kind of interface to the user level, and then applications way up here. Can you see that? No, you can't. The application, somewhere way up here, can then call the kernel system interfaces, and the kernel accesses the hardware on your behalf. Down here in the kernel you can handle things like interrupts, and also do fancy things with DMA and all kinds of other stuff. The interface between the kernel and the application is basically through files. As we know, in Linux everything is a file, except network interfaces, which are sockets. That's almost a file. So most of the time when you're writing user applications, you interact with the device, whatever it is, via the POSIX function calls: open, close, read, write, et cetera. The files you are accessing are usually either device nodes, that's the stuff in the /dev directory, or attributes exposed via the sysfs interface under /sys. For example, we'll look in a moment at GPIOs, which you can access through /sys/class/gpio. So this is the normal way of doing things.
Writing kernel device drivers is a lot of fun. Hands up, who has written device drivers for the kernel before?
Yeah, pretty much everybody. I mean, it's great fun, I really, really enjoy it. But it is a little time consuming. You are working within a kernel environment, which is quite a lot different from a normal programming environment. And it has the problem that if you introduce a bug in a kernel driver, you can crash the entire kernel and bring the entire system to a halt. So bugs in kernel code are much more serious than bugs in user space. The reason for this presentation, really, is to say that if possible it would be good if we didn't have to write kernel device drivers, both because it takes a certain amount of time and effort, but mostly because it introduces points of failure into your system. So I want to encourage you, when you are accessing hardware, to think: can I do this from user space? Do I really need a kernel device driver to do this thing? If you can do it from user space, then do so. That is safer and easier for everyone.
A note about device trees. Since we're talking mostly to an embedded audience here, you'll be aware that if you're working on ARM platforms, and several other platforms as well, then in order to access the hardware you need to tell the kernel where the hardware is, and you do that through a device tree. I'm not going to go into this, because it's too much detail for this session, but as part of this exercise you still need to write the device tree or the device tree overlay to give the kernel access to the particular bit of hardware you're trying to get to.
So I'm going to start with GPIOs, general purpose input/output. These are the most basic level of digital I/O on a system. There are a bunch of pins that we can configure as inputs or outputs. As outputs, we can use them to control lights and relays. We can control chip selects and various other things.
And then as inputs, we can use them to read the state of a digital input, which could be a switch or a button or something else. When we read them as inputs, we can read them in polled mode, where we just keep on reading the input and see what it is. Or, on most modern SoCs, we can configure the GPIOs to generate an interrupt. The interfaces we're going to be looking at allow us to wait for that interrupt, so we don't have to keep on polling.
You have the luxury at the moment of having two user space interfaces to GPIOs. We have the old gpiolib interface. It's not a library, by the way, it's a kernel driver. Gpiolib has been around for quite some time, and it exists as a sysfs interface. The great thing about it is that it's scriptable. It's just a bunch of files, and you can write a shell script or whatever to access those files, so it's nice and easy. But it has a few problems. In particular, it doesn't handle interrupts particularly well, and there can be issues if you want to change more than one input or output at the same time. So we have, in addition, the GPIO cdev interface, which is the modern, better way of accessing GPIOs. But since it uses ioctl functions, as we'll see in a moment, you have to write some code to use it. You can't script it.
So, starting off with the gpiolib sysfs interface. First of all, looking in the, no, it's still not working. Looking in /sys/class/gpio, you'll see a number of directories called gpiochip-something-or-other. Each one of those directories represents a GPIO register. Typically, as in this case here, the registers are 32 bits each, and we have four of them. This is actually from a BeagleBone. Then we have these two files called export and unexport. You can write a number to the export file, and that will export a GPIO to user space, which you can then access.
And then when you're done with it, you can unexport it by writing the same number to the unexport file. Now these numbers are a linear range, usually starting from zero and going up to the maximum number of GPIOs you have. So in the example here, gpiochip0, this register has the first 32 GPIOs, numbered from 0 to 31. Then gpiochip32 has the next bunch, numbered 32 to 63, and so on and so on. This is a slight pest, because normally when you look at the schematics you will see the GPIOs labelled with the GPIO register and then the bit within that register. So you have to convert that register and bit number into a linear GPIO number.
Looking within the gpiochip directory, you can see there are a bunch of files. The important ones are base, which tells you the first GPIO number occurring in this register. We're looking at gpiochip0 on the screen, so base will contain 0. ngpio tells you how many GPIO pins there are in this register, 32 in this case, and label is an arbitrary label to identify this thing. In the case of this particular SoC, you will see that it is just the name of the chip, so it will be gpiochip0.
So then if we want to export, say, pin 42, which is bit 10 on the second GPIO chip, you just write 42 to the export file, and then you see that, magically, gpio42 has appeared as a subdirectory. That's assuming it's possible. If the GPIO is already being used by, for example, something within the kernel, then the export will fail. But assuming it is a free GPIO that we can export to user space, it will succeed. And then if you look within that new directory, you get access to the GPIO. There are a bunch of things in here as well. The important one here is a file called direction, which can be either out or in, for an output or an input. It defaults to an input.
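The numbering and export steps described above can be sketched in Python. This is only an illustration: the function names and the `root` parameter are made up for this example (the parameter exists so the code can be pointed at a scratch directory instead of the real /sys/class/gpio), and it assumes the legacy sysfs GPIO interface is enabled in the kernel.

```python
from pathlib import Path

# Default location of the legacy sysfs GPIO interface.
SYSFS_GPIO = Path("/sys/class/gpio")


def linear_gpio_number(chip_base, offset, ngpio=32):
    """Convert a (register base, bit) pair from the schematic to a linear number."""
    if not 0 <= offset < ngpio:
        raise ValueError(f"offset {offset} is outside 0..{ngpio - 1}")
    return chip_base + offset


def read_chip_info(chip_dir):
    """Read the base, ngpio and label files of one gpiochipN directory."""
    chip_dir = Path(chip_dir)
    return {
        "base": int((chip_dir / "base").read_text()),
        "ngpio": int((chip_dir / "ngpio").read_text()),
        "label": (chip_dir / "label").read_text().strip(),
    }


def export_gpio(number, root=SYSFS_GPIO):
    """Ask the kernel to export one GPIO; returns the directory that should appear."""
    root = Path(root)
    # On real hardware this write fails with OSError if the pin is already claimed.
    (root / "export").write_text(str(number))
    return root / f"gpio{number}"
```

With these helpers, `linear_gpio_number(32, 10)` gives 42, the pin used in the talk, and `export_gpio(42)` would be the equivalent of writing 42 to the export file from a shell.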
And then we have the file called value, which represents the level of the pin. If it's an input, when we read value we read the level of the pin: one for high, zero for low. When it's an output, we can write one or zero to value, and that sets the output to be either high or low. Then we have this file called active_low, which we can set to one to invert that logic. It turns out the hardware engineers get a bit confused about high voltage and low voltage, and quite often they get it wrong. So in order to get it right again, you can flip active_low by writing a one to it, which means that, if it's an output, when you write a one it goes low and when you write a zero it goes high. OK, good.
Next thing, interrupts. If you want to monitor an input, you can just poll it, but that's inefficient. If the GPIO hardware underneath can generate interrupts on a level change, then we will have this extra file called edge, and we can write to that to indicate how we want to generate the interrupt. Edge can be none, meaning we don't generate any interrupts, or rising, or falling, or both. So we can interrupt on a rising edge, a falling edge, or both of those. In the example on the slide, I'm writing the string falling to gpio60/edge. That will then give me an interrupt on a falling edge. Unfortunately, you can't use the interrupt mechanism from a script. You need to actually write a bit of code, and you need to poll. There are some examples on my website, but not on the slides, which show you how to do that. Essentially, you call the poll function, which will block until the interrupt comes along. When the interrupt comes along, you unblock, and then you read the value and find out what it was.
OK, so that's the gpiolib interface. The GPIO cdev interface achieves a similar thing, but using nodes in /dev.
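Before leaving the sysfs interface, here is a rough sketch of the edge-and-poll sequence described above. It is not the code from the speaker's website; the function names are invented for illustration, and `gpio_dir` is assumed to be an already exported directory such as /sys/class/gpio/gpio60. Waiting on `POLLPRI | POLLERR` and seeking back to the start before re-reading follows the kernel's sysfs GPIO documentation.

```python
import os
import select
from pathlib import Path

VALID_EDGES = ("none", "rising", "falling", "both")


def configure_input_edge(gpio_dir, edge):
    """Make the pin an input and choose which edge raises an interrupt."""
    if edge not in VALID_EDGES:
        raise ValueError(f"edge must be one of {VALID_EDGES}")
    gpio_dir = Path(gpio_dir)
    (gpio_dir / "direction").write_text("in")
    (gpio_dir / "edge").write_text(edge)


def wait_for_edge(gpio_dir, timeout_ms=None):
    """Block until the configured edge occurs; return the pin level, or None on timeout."""
    fd = os.open(Path(gpio_dir) / "value", os.O_RDONLY)
    try:
        os.read(fd, 8)  # initial read clears any event that is already pending
        poller = select.poll()
        poller.register(fd, select.POLLPRI | select.POLLERR)
        if not poller.poll(timeout_ms):
            return None  # timed out without seeing an edge
        os.lseek(fd, 0, os.SEEK_SET)  # rewind before reading the new level
        return int(os.read(fd, 8).strip())
    finally:
        os.close(fd)
```

On a board, `configure_input_edge("/sys/class/gpio/gpio60", "falling")` followed by `wait_for_edge("/sys/class/gpio/gpio60")` would correspond to the slide's example.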
So when you enable this, you'll find that in /dev you have gpiochip0, 1, 2, 3, et cetera. Each one of those nodes in /dev represents a GPIO register, similar to the gpiochip directory in sysfs. The difference is that there are no subdirectories. The gpiochip is a character device node, so it doesn't have a structure within it. You have to open it, and then you use ioctl functions to access the pins.
So this is actually more complicated. It's more coding, to do simple things at least. What are the advantages? Well, first of all, it has a naming scheme that more accurately reflects what you'll see on the schematic. We name them now using the register, gpiochip 1, 2, 3, et cetera, and then we number the pins within that, exactly as you will see on the hardware diagrams. Also, since we are treating the register as a single entity, you can do several transactions in one single function call. We can set a number of bits, for example, in one function call, and we can do that without glitching. Whereas with the gpiolib interface you could only change one input or output at a time. Sorry, you can only change one output at a time, I should say. And the way it handles interrupts is a little nicer as well.
I'm not going to go through every single aspect of this, because it's quite a complicated programming interface. I'll just give some demo code that gives the basic idea. This bit of code is going to be writing to gpiochip1, pin 21, which is an LED if you have a BeagleBone Black. The first lot is just the various include files you need, and the variables. The interesting stuff is on the next slide. So here we open gpiochip1. On the next line, we set the line offsets to be 21. Remember, that's the pin we wanted: chip 1, pin 21. I set the flags, so we want this to be an output, because we're going to use it to control an LED.
Default values: initially we're going to set it to zero. Then the consumer label: we can give a human readable string, or at least a meaningful label, to these GPIO pins, which helps with debugging and suchlike. And then on the last line, the number of lines we want is one. Having created that request structure, we then do the ioctl, the get line handle ioctl. This is a little bit of magic, because it will return a new file descriptor, a new handle if you like, for this request. It actually returns it in the request structure, in the fd field. Then, if you look at the ioctl down here, we are using that file descriptor to change the state of this GPIO pin. OK, and we could do this multiple times. We could have different file descriptors for different groups of pins on this GPIO chip. And as you can see where we have the line offsets up here, line offsets is an array, so we can create a file descriptor handle which represents more than one pin. So, as I say, we can then manipulate several pins in one go, in a single function call, so we atomically change those outputs without any glitches. It's a nicer interface from a technical viewpoint, but there's a bit more coding involved to make it work.
The GPIO cdev also has an event mechanism, which is tied into the GPIO interrupt mechanism. I haven't got an example of that code, but there's plenty around. Using that, we can listen for an event. We can use poll or select to wait for the event to happen, and then we handle the event.
Alrighty, so GPIO, that's the fun stuff. For the next few things, I need to look at pulse width modulation, I need to look at I2C, and then I need to make a few summing-up remarks. Pulse width modulation isn't used quite as often, perhaps, as GPIOs, but pretty much every SoC has some PWM circuits.
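Before going further into PWM, the line-handle request walked through above can be sketched in Python rather than the C shown on the slides. Treat this as an approximation under stated assumptions: the struct layouts and ioctl numbers are re-derived here from the version 1 GPIO character-device ABI in `<linux/gpio.h>` (which newer kernels mark as deprecated in favour of a v2 ABI), the ioctl encoding is the generic one used on ARM and x86, and all function names are invented for this example.

```python
import fcntl
import os
import struct

GPIOHANDLES_MAX = 64
GPIOHANDLE_REQUEST_OUTPUT = 1 << 1

# struct gpiohandle_request: __u32 lineoffsets[64]; __u32 flags;
# __u8 default_values[64]; char consumer_label[32]; __u32 lines; int fd;
REQUEST_FORMAT = "=64II64s32sIi"
# struct gpiohandle_data: __u8 values[64];
DATA_FORMAT = "=64s"


def _iowr(ioc_type, nr, size):
    """Generic Linux _IOWR() encoding (differs on a few architectures, e.g. MIPS)."""
    return (3 << 30) | (size << 16) | (ioc_type << 8) | nr


GPIO_GET_LINEHANDLE_IOCTL = _iowr(0xB4, 0x03, struct.calcsize(REQUEST_FORMAT))
GPIOHANDLE_SET_LINE_VALUES_IOCTL = _iowr(0xB4, 0x09, struct.calcsize(DATA_FORMAT))


def pack_request(offsets, label, defaults=None):
    """Build the request structure for a group of output lines."""
    count = len(offsets)
    if not 1 <= count <= GPIOHANDLES_MAX:
        raise ValueError("between 1 and 64 lines can be requested")
    padded = list(offsets) + [0] * (GPIOHANDLES_MAX - count)
    values = bytes(defaults if defaults is not None else [0] * count)
    name = label.encode()[:31]  # leave room for the terminating NUL
    return struct.pack(REQUEST_FORMAT, *padded, GPIOHANDLE_REQUEST_OUTPUT,
                       values, name, count, 0)


def request_output_lines(chip_path, offsets, label):
    """Open the chip, request the lines, and return the new handle descriptor."""
    chip_fd = os.open(chip_path, os.O_RDONLY)
    try:
        buf = bytearray(pack_request(offsets, label))
        fcntl.ioctl(chip_fd, GPIO_GET_LINEHANDLE_IOCTL, buf)  # kernel fills in fd
    finally:
        os.close(chip_fd)
    return struct.unpack(REQUEST_FORMAT, buf)[-1]


def set_values(handle_fd, values):
    """Set every line in the handle with a single ioctl call."""
    fcntl.ioctl(handle_fd, GPIOHANDLE_SET_LINE_VALUES_IOCTL,
                struct.pack(DATA_FORMAT, bytes(values)))
```

On a BeagleBone Black, something like `fd = request_output_lines("/dev/gpiochip1", [21], "blink")` followed by `set_values(fd, [1])` would correspond to the demo in the talk. In practice a library such as libgpiod, mentioned in the questions at the end, hides this packing.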
Essentially, the idea of pulse width modulation is that you create a pulse train, as shown at the top of the slide there, where you have a period for the pulse and a duty cycle, which is the percentage of the period that the level is high. By modifying the duty cycle you can change it from 0%, which means it's permanently off, to 100%, which means it's permanently on, so you can do the full range. The two common use cases for PWMs are, first, dimmable LEDs, including backlights: as you change the duty cycle, the LED will be brighter or dimmer. The second case is servo motors. Typically servo motors are controlled by a pulse width modulated pulse train, and the deflection of the servo motor is controlled by the duty cycle. So this is kind of useful, and it's fairly simple to program.
In this case there is just one interface, and it's a sysfs interface very similar to the gpiolib interface. If we look in /sys/class/pwm you will see that, OK, there's a slight missing key actually, you will see that there are a number of PWM chips, one for each PWM interface. Some PWM interfaces can handle multiple channels. So here I'm looking inside pwmchip0, and again this is actually from a BeagleBone. If we look inside the PWM chip, you can see that my arrow needs to move along a little bit, actually. There are two files called export and unexport, so move that mentally along a little bit. The arrow should be pointing to export. We can write the PWM number to export, and that will then export it, and unexport works in reverse. There is also a file there called npwm, which says how many PWM channels we have for this PWM chip. In the case of this particular chip, which again is the BeagleBone, pwmchip0 actually supports two PWM channels, so I can write either 0 or 1 to export. So let's go ahead and export channel 0, and we find that we have a new directory for this PWM channel called pwm0.
If you look within that, we have the controls necessary to set the period and the duty cycle, and there's a flag to enable it or disable it. Initially it will be disabled, which means it's not actually running. We can then set the period in nanoseconds, and the duty cycle, that's the on part of the period, also in nanoseconds. Oh, and there's a file there also called polarity. We can set polarity to 1, which means that it flips it round, so during the on period it will be low and during the off period it will be high. It just inverts the waveform.
We can do this from the command line or from a script. In this case, suppose I want a PWM with a one millisecond period and a 50% duty cycle. One millisecond turns out to be a million nanoseconds, so we write one million to period. We want it to be 50% on, 50% off, so we write 500,000 nanoseconds to duty_cycle. This means the thing will be half on and half off. Then we write one to the enable file, and that sets the whole thing running. OK, so again, that's fairly simple to do from user space.
The last thing I'm going to talk about is I2C. This is inherently more complicated. I2C is a simple serial bus, a two-wire bus, usually used to connect sensor devices, small EEPROMs, maybe control devices like touch screens, although they usually use SPI. With I2C, it is a bus, so you have a bus controller, which is usually part of the SoC, and then on the bus you have a number of peripherals. The peripherals have a seven-bit address, usually hard-wired, so when you buy an I2C chip, the datasheet tells you which address it is using. Commonly there are two or maybe four addresses you can choose from by linking various wires together. If you have a conflict, where you have two I2C devices with the same address, well, you can put them on different buses. Most systems have several I2C bus controllers.
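Returning briefly to the PWM walk-through above, the arithmetic and the three writes can be sketched as follows. The helper names are hypothetical, and `channel_dir` is assumed to be an already exported channel directory such as /sys/class/pwm/pwmchip0/pwm0.

```python
from pathlib import Path

NS_PER_SECOND = 1_000_000_000


def pwm_settings_ns(frequency_hz, duty_percent):
    """Convert a frequency and a duty cycle percentage to (period_ns, duty_ns)."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    if not 0 <= duty_percent <= 100:
        raise ValueError("duty cycle must be between 0 and 100 percent")
    period_ns = round(NS_PER_SECOND / frequency_hz)
    duty_ns = round(period_ns * duty_percent / 100)
    return period_ns, duty_ns


def start_pwm(channel_dir, period_ns, duty_ns):
    """Program one exported PWM channel and set it running."""
    channel_dir = Path(channel_dir)
    # The period goes first: the duty cycle may not exceed the current period.
    (channel_dir / "period").write_text(str(period_ns))
    (channel_dir / "duty_cycle").write_text(str(duty_ns))
    (channel_dir / "enable").write_text("1")
```

For the example in the talk, `pwm_settings_ns(1000, 50)` gives a period of 1,000,000 ns and a duty cycle of 500,000 ns, which `start_pwm` then writes to the period, duty_cycle and enable files.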
So conflicting devices can go onto separate I2C buses, or you can go and buy a different chip. The way the addressing works is a little bit odd. We have 128 addresses from a seven-bit address, but a bunch of them are reserved by the bus mechanics or the bus electrics, so it turns out you only really have 112 nodes per bus.
To access this from user space, you enable the i2c-dev driver within the kernel, and that will expose in the /dev directory a device node for each of the master bus controllers. You might see something like this: here we have two controllers, bus zero and bus one. Then, as before, you can access them using open, close, read and write, but most of the time you use some ioctl functions to actually initiate I2C transactions. The structures you'll need to do all this are defined in the i2c-dev.h file.
Just as an illustration of a couple of things you can do, there is a package called i2c-tools which helps you in debugging these things. This is an example of using i2cdetect on a bus. Essentially it does a probe of every possible address and then prints out the results. In this case here there is a device at address 39, and there are also a bunch of things at 53 to 57, but they are marked with UU, which means they are already used by the kernel, so we cannot access those addresses from user space. In point of fact, since again this is from a BeagleBone, those addresses are the I2C EEPROMs used by the BeagleBone, which contain various IDs and suchlike. They are handled internally within the kernel, and we can't access them from user space without some fiddling around. We have an unhandled device at address 39. Again from the command line, there is an i2cget and an i2cset command, so we can use these for simple diagnostics and simple use cases.
With i2cget, we give it the bus we want to talk to, the chip address, and then the register on that chip. In my example here, it turns out that the chip at address 39 is an ambient light sensor which I bought from Adafruit one time. Reading through the datasheet for it, we can see that register 0x8a contains the ID, so we can check that we have the right chip and the right version of the chip by reading the ID register at 0x8a. It comes back with 50, and then I look in the datasheet and it says yes, 50 indicates this is the right chip. So we can use i2cget to read any 8-bit register. In a similar way we can use i2cset to change a register value: i2cset, then the bus, the chip, the register, and then the value.
You can do a certain amount using i2cget and i2cset. You can do more complicated control of the chip by using the ioctl functions. This example code here is showing how to read a, no, so you were writing a register. Say that again. So actually this code is reading. We're opening the bus, i2c-1. We're selecting, through the ioctl, that we want to talk to chip 39, and then we are doing a read of the ID register, which is at 0x8a. It turns out that the way I2C works is that you have to do a write and then a read. It's basically a loop: you write the register number, and then you read the contents of the register back.
Alrighty, so there are three examples of systems of increasing complexity: from GPIOs, which are comparatively simple, to PWMs, slightly more simple, slightly more complicated maybe, and then with I2C you can do quite complicated stuff, again without having to touch the kernel.
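The register read described above (open the bus, select the chip with an ioctl, write the register number, read one byte back) might look roughly like this in Python. It is a sketch, not the code from the slide: the function names are invented, `I2C_SLAVE` is the request number from `<linux/i2c-dev.h>`, and the addresses in the talk are taken to be hexadecimal, as printed by i2cdetect. The address range check reflects the 112 usable seven-bit addresses, 0x08 to 0x77.

```python
import fcntl
import os

I2C_SLAVE = 0x0703  # ioctl request from <linux/i2c-dev.h>: select the target chip


def usable_7bit_addresses():
    """Seven-bit addresses left after the reserved blocks at each end (0x08..0x77)."""
    return range(0x08, 0x78)


def read_register(fd, register):
    """Write the register number, then read one byte of register contents back."""
    os.write(fd, bytes([register]))
    return os.read(fd, 1)[0]


def read_chip_register(bus_path, chip_address, register):
    """Open an i2c-dev node, address one chip, and read one 8-bit register."""
    if chip_address not in usable_7bit_addresses():
        raise ValueError("address is outside the usable 7-bit range")
    fd = os.open(bus_path, os.O_RDWR)
    try:
        fcntl.ioctl(fd, I2C_SLAVE, chip_address)
        return read_register(fd, register)
    finally:
        os.close(fd)
```

On the board from the talk, `read_chip_register("/dev/i2c-1", 0x39, 0x8A)` would be the equivalent of the i2cget example, and the speaker reports the sensor answering with 50.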
There are similar interfaces, generic drivers, for SPI, for USB, and also a bunch of others I should have mentioned. For example, if you have a bunch of GPIO buttons to control the user interface or something, there's a whole GPIO buttons subsystem which will map a key press, sorry, a button press, into a key code which you can then handle within your user interface. So we have GPIO buttons. We also have the LED subsystem. My example was using the GPIO level to access the LED, but there's actually a whole /sys/class/leds subsystem which does more sophisticated things with LEDs.
You can also go further by using the userspace I/O subsystem, UIO. This is very generic, and it allows you to create, again from user space, a program that accesses the hardware completely. Using the UIO framework you can map the registers of the piece of hardware into the application's memory address space. You can also write a little stub kernel function which will handle interrupts, and then you can do the majority of the handling in user space. The main use case for this seems to be FPGAs. If you create an FPGA design, you will inevitably be creating some kind of interface to that piece of hardware, and the simplest way to access that from user space is to use the UIO subsystem to access the bunch of registers that are exposed on the FPGA.
What are you missing by not writing a kernel device driver?
So there are some things. There are reasons that kernel device drivers exist in the first place. There are a couple of things to do with robustness and performance. It is perhaps a worry that if you are handling a device from user space, a user space program can be killed and terminated, whereas there is no way to kill or terminate a kernel device driver, except by terminating the entire kernel, of course. So it could be regarded as more robust to put the code into the kernel for that reason. Also, within the kernel we have a lot of sophisticated locking techniques: spin locks, mutexes, read-write locks and the read-copy-update mechanism. So if you have highly contended locking scenarios, you can do those better in the kernel than you can in user space. You also have more direct access to the hardware, so you can do fancy things with memory addresses and DMA and suchlike.
Finally, and probably the biggest reason to write kernel device drivers in the first place, is that there are a whole bunch of subsystems within the kernel into which you can just slot another device driver. A simple example of this is the LCD backlight. It's usually a PWM channel that controls that, so you could quite easily control it from user space using the PWM user space mechanism we just talked about. But in fact there is another subsystem, the backlight subsystem, which exposes a different user interface via sysfs, and if you want to expose your LCD backlight using that standard interface, you would have to write some kernel code to do that.
Okey-dokey, and that is everything I have to say on the subject right now. We have a couple of minutes, literally three minutes, spare for questions. If you want to ask a question, I encourage you to come and use the microphones here so that the question is recorded on the video that's being made. But yes, does anybody have any questions or comments on not writing kernel device drivers?
I'm afraid it's a long walk.
Is this working? Seems to. What progress has been made on being able to access DMA from user space? We were interested in that kind of task in our company, but there didn't seem to be anything very stable yet. There was some movement in that area, but not a...
So which interfaces did you look at?
I can't remember which one it was, but there was something that exposed setting up a DMA channel from a user task. It wasn't an official thing yet.
So accessing DMA channels directly from user space is kind of tricky, because the DMA hardware is very specific to a chip, and the way the memory is allocated and the way memory addresses are handled is also very system specific. I'm not aware of any standardised way of accessing DMA channels from user space, and I would guess that there would not be such a thing, because it is kind of tricky and device specific. There may be particular examples in particular board support packages, but I don't expect to find a generic one.
Talking about GPIOs, have you looked at libgpiod, which provides tools and functions for the character device driver that you can either script or use through bindings from your Python code?
That's a good point. The way I described the interface is that you write the ioctls directly in your code to control either GPIOs or I2C or whatever. There are in many cases libraries that will help you do this, including libgpiod, which gives you a high level abstraction and is easier to code to. So in fact I would recommend you use a library such as that. I just didn't really have time to delve into it in the slides.
It's also upstream now. It's also upstream in the kernel, so we can build it and use it.
OK, maybe time for one more question. I think we're coming to the end of our slot. Ah, Michael.
You know, in the Raspberry Pi world there are lots of Python libraries for doing stuff in user space. You never change your kernel.
Any experiments on other platforms? I guess it's portable in the same way to the BeagleBone Black, for example.
It would be nice if there was. Certainly the Raspberry Pi, and the BeagleBone also has BoneScript, something very similar. So these platforms are highly scriptable using Python or JavaScript or whatever scripting language you like. I don't know of very much effort to port that to other, more deeply embedded platforms. In my experience at least, if you get a single board computer or a system on module from one of the many vendors of these things, they don't come with much Python support, and they don't come fully scripted. So it would be nice if they did. Maybe as the Raspberry Pi concept continues to roll through, people will start doing this more. But yeah, it would make our lives a lot easier if we could just write everything in Python code.
OK, I think we're pretty much out of time now, so if anybody has any remaining questions, grab me as we finish this session. Meanwhile, thank you all very much for coming along to this, and I hope it's been useful.