Okay. Looks like my mic's working, so that's always a good start. Thanks, everybody, for coming to Grokking the Linux SPI Subsystem. My name is Matt Porter, with Konsulko Group. Let's dive in, because I broke my rule about how many slides to have in a talk; there are just too many visual aids needed for this one.

A little background: if you're somebody who works on the SPI subsystem core code, or you're already writing controller drivers, this is probably not the content you're looking for. This is aimed at the person trying to use it, right? How do I actually make use of this stuff? So let's get going. Oh, it turned off already. Of course my pointer quit working again.

All right, one thing I try to do, since we get reminded a lot that we have a very diverse community these days, is unobfuscate the title of the session. "Grok" comes from Stranger in a Strange Land; that's where the word comes from, and that's one of the book's covers. So now you know, if you didn't know where that word came from.

So here's what we're going to go through today. We're going to talk about what SPI is, because we can't really talk about how the subsystem works and what facilities are available to us as users, whether kernel users or user space users, until we understand SPI itself. So we'll go over SPI fundamentals and some Linux-specific SPI concepts. Then we'll look at use cases: hooking up a device, writing a protocol driver, writing a controller driver, and what those mean in this context, and then user space drivers. We'll also talk a little bit about SPI performance and then what's coming up in the future in the SPI subsystem.

So what is SPI? Yes, not as many people in Europe get this reference, I noticed, but more people in the U.S. will. SPI is the Serial Peripheral Interface. It's a Motorola de facto standard that goes back to the HC08 microcontroller time frame.
They introduced it right around then. For any of you who started out in university working on HC05s and HC08s like me (dating myself), you might have seen this stuff back then.

So it's a master/slave bus. It's a four-wire bus, except when it's not; nothing is simple. And there's no maximum clock speed, right? So we can go to maximum ludicrous speed, like a Tesla. One joking comment about SPI itself is that it's just a glorified shift register, or a shift register with side effects. If you want to get into the gory detail, you can go to the usual source of knowledge, Stack Overflow, of course, and hope that people haven't defaced the page.

All right, let's talk a little bit about common uses of SPI. If you're an embedded Linux engineer, or a budding one, you've probably seen some of these, though maybe not all of them. Just to give you an idea of the types of peripherals that have SPI bus support: flash memory is a big one; there are a lot of SPI flashes. Analog-to-digital converters are very popular. There are a lot of sensors, thermocouple interfaces in particular, and LCD controllers. The Chromium embedded controller has a SPI interface, to name one particular thing in a known product. So a lot of different things, and many of them have one characteristic in common: they're relatively high data rate peripherals compared to I2C peripherals. That gives you an idea of the flavor of things we see with SPI.

So let's talk about fundamentals. First we need to understand the SPI signals. It's pretty simple, and since it's a de facto standard, we're all driven by the original Motorola naming. So we'll tend to see MOSI, however you pronounce it; that's how I do it: Master Output, Slave Input. But as I show here, there are lots of other names for it when you start looking at data sheets and user manuals. You'll see SOMI, you'll see SDI.
You'll even see I2C naming, which is a bit of a problem sometimes on data sheets. Then there's MISO, Master Input, Slave Output. The names give you a clue to which direction things are going; that's the beauty of them. The same naming issues apply, because it's a de facto standard and people adjust things a bit in their version. Then there's your serial clock, which is a master output, and your slave select, which is also a master output. Those are the basic signals of SPI in its simple, original form. The slave select is also known as chip select, and we'll see that's how it gets referred to later in the kernel.

Okay, so wow. I am excellent at drawing diagrams, as you see, with a little help from tools; that's about as good as it gets. We like things simple, so first we show the simplest thing possible: those four signals, in exactly the directions I promised, and everything is marvelous.

Now we want to look at timing diagrams, because they're important for understanding SPI. The best thing to do is look at what a basic 8-bit read and write look like on a SPI bus. Here you see the spots where the data isn't stable. We've got some bits, and notice that we're sending the most significant bit first on the wire. This is a write in mode 0; we'll talk about modes in a second, because that's a very important concept to understand in SPI. We don't care about MISO here: this is a write, so we only see data on the MOSI line. On the read, the converse happens. What you see is the chip select line asserted low through this whole transfer, or we'll just call it a cycle for the moment, and the data is latched on the rising edge. The same thing is true on a read, and again the chip select stays low during that whole transfer.
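To make the "glorified shift register" idea concrete, here is a small C model of one 8-bit exchange, MSB first, as in the timing diagram above. It's a sketch, not driver code: `spi_shift_regs` and `spi_exchange_byte` are hypothetical names, and the model ignores clock edges and chip select entirely, tracking only which bit moves where on each clock.

```c
#include <stdint.h>

/* Hypothetical model of the two 8-bit shift registers on a SPI link. */
typedef struct {
    uint8_t master; /* master's shift register: TX byte before, RX byte after */
    uint8_t slave;  /* slave's shift register */
} spi_shift_regs;

/* One 8-clock exchange, MSB first. On every clock the master drives its
 * MSB onto MOSI while the slave drives its MSB onto MISO; each side then
 * shifts left and pulls the other side's bit in at the bottom. */
void spi_exchange_byte(spi_shift_regs *r)
{
    for (int bit = 0; bit < 8; bit++) {
        uint8_t mosi = (uint8_t)((r->master >> 7) & 1u); /* master -> slave */
        uint8_t miso = (uint8_t)((r->slave >> 7) & 1u);  /* slave -> master */
        r->master = (uint8_t)((r->master << 1) | miso);
        r->slave  = (uint8_t)((r->slave << 1) | mosi);
    }
}
```

Because both sides shift on every clock, a "write" still clocks a byte back in and a "read" still clocks a byte out; after eight clocks the two registers have simply swapped contents. Which of those bytes matters is up to the device's protocol.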
So that's your basic transfer. Simple, right? Now we need to understand SPI modes. I hit you with "SPI mode" on that slide without telling you what it is, so now I'll tell you. It's essentially the clock polarity, plus the phase of the clock relative to where we latch the data. A simple way to look at it: clock polarity, CPOL, is 0 when the idle state of the clock is low, and 1 when the idle state is high. Clock phase, CPHA, is 0 when data is latched on the leading clock edge (the first edge out of idle) and shifted out on the trailing edge, and 1 when data is latched on the trailing edge and shifted out on the leading edge. We're going to look at this visually, because it doesn't always make sense with just a bunch of words. But if you understand those two bits, you'll see that the mode numbers are a simple mapping in this table: the mode number is CPOL times two plus CPHA. So when you see references to SPI modes 0, 1, 2, and 3, this is what they mean.

Now let's show what that looks like in the timing world. You can see why I'm using the laser pointer a lot now, because this takes a bit of looking. Once again, this is one version of what I showed before, and we're just going to look at writes, so in each case I drop out the MISO line and we only look at output. In write mode 0, my clock idles low, as I promised, and data is latched on the rising edge. That corresponds back to mode 0 in the table. In write mode 1, my clock still idles low, but the data gets latched on the falling edge, and you see the falling edge lines up with where the data is stable all the way down. So that's write modes 0 and 1 visually. Write modes 2 and 3 are both the cases where CPOL is 1 on the slide. In write mode 2, our clock idles high, data is latched on the falling edge, and you see it lining up.
Likewise, in write mode 3 our clock idles high, and data is latched on the rising edge. It's that simple. It works the same way when you're doing a read; we just show the write cases to simplify the timing diagrams.

Okay, great, simple, right? We know everything we need to know about SPI. Except it gets more complicated in the real world. We have lots of simple cases that work like that, where you might have just one device, but in reality you're going to have quite a few, and that's where chip select comes into play. In the mode 0 examples on the timing diagrams, I showed chip select typically being asserted low. Each slave has its own chip select, so if you have multiple SPI slaves, each chip select is asserted independently, so that the right slave handles the transfer.

The next thing you'll see is daisy chaining. The common type chains each device's data output to the next device's input. There's another, rarer one, where the chip selects are daisy chained. I see one person perplexed by that; they've never seen it, but that's the beauty of a de facto standard: everybody can do whatever they want. I can tell you that the Anadigm field-programmable analog arrays actually do this chip select daisy chaining, which is super ugly. So, lots of ugliness.

Then we get into some cool stuff. We talked about flash being a big use case. It started out with single-lane flashes, and now, instead of one MISO, you can have n MISOs. It's a little more complicated than that, but on a read you can typically have dual or quad data lines in those devices. And of course, now you have n times the bandwidth on reads, which is the fast path they're trying to optimize.
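The CPOL/CPHA-to-mode mapping and the latching edges from the mode discussion fit in a few lines of C. This is an illustrative sketch with made-up helper names. The local `MODE_CPHA`/`MODE_CPOL` values follow the same layout as the kernel's `SPI_CPHA` (0x01) and `SPI_CPOL` (0x02) mode flags, but check your kernel headers rather than relying on this block.

```c
#include <stdbool.h>

/* Local stand-ins for the phase and polarity bits of a SPI mode number. */
#define MODE_CPHA 0x01
#define MODE_CPOL 0x02

typedef enum { EDGE_RISING, EDGE_FALLING } spi_edge;

/* Mode number is (CPOL << 1) | CPHA, giving modes 0..3. */
int spi_mode_from(bool cpol, bool cpha)
{
    return (cpol ? MODE_CPOL : 0) | (cpha ? MODE_CPHA : 0);
}

bool spi_mode_cpol(int mode) { return (mode & MODE_CPOL) != 0; }
bool spi_mode_cpha(int mode) { return (mode & MODE_CPHA) != 0; }

/* Edge on which data is latched (sampled).
 * CPHA = 0: latch on the leading edge (first edge out of idle).
 * CPHA = 1: latch on the trailing edge.
 * CPOL = 0 idles low, so its leading edge is rising;
 * CPOL = 1 idles high, so its leading edge is falling. */
spi_edge spi_sample_edge(int mode)
{
    bool leading_is_rising = !spi_mode_cpol(mode);
    bool latch_on_leading = !spi_mode_cpha(mode);
    return (leading_is_rising == latch_on_leading) ? EDGE_RISING : EDGE_FALLING;
}
```

Working through all four modes reproduces the diagrams: modes 0 and 3 latch on the rising edge, and modes 1 and 2 latch on the falling edge.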
And then we have the Microwire and three-wire variants, the ones I promised when I said "except when it's not four-wire." They handle half duplex: MISO and MOSI are combined on the same line. So there are a few different variants of that.

And here's another one of my masterful drawings, where you see how you hook up three different SPI slaves. Notice, as promised, that there are multiple chip selects; I'm using the original slave select terminology there. So slave selects 1, 2, and 3 are each routed only to their own device, and all the other signals go to all three devices. So: one master, many peripherals or slaves.

Now, if you can see this eye chart (apologies, I can't point to both sides), we look at multiple slaves. What does that timing diagram look like? It's exactly the same, except we've got three chip selects. Again, I just have to pick one mode for illustration: the clock idles low and we're latching on the rising edge. But notice that before the clock edge where it's going to latch, the chip select goes low and stays low for that entire cycle. Then here's the next set of data on slave select 2, and then slave select 3. So that's pretty straightforward, and those are the basics, the fundamentals.

So now we can talk about Linux stuff. Let me get my... here. All right. Conceptually, with Linux we have controller drivers and protocol drivers. Controller drivers support a SPI master device. When we looked at that fundamental piece of hardware, whatever houses that peripheral (normally an IP block in our SoC these days), the controller driver is what runs that hardware. Things are carefully separated in the SPI subsystem so that the controller handles just the basic pieces: clock control and chip select control. Think back to our signals, right?
We're dealing with that serial clock signal and the chip selects, and then running that shift register we talked about. There are a lot of details involved in that, but at a high level the controller driver doesn't know anything about your end peripheral. An example, taking the Raspberry Pi series, is the BCM2835 AUX driver. That's a SPI controller driver you might use.

Protocol drivers, on the other hand, support whatever is needed to drive the actual functionality of the SPI slave in our block diagrams. This is based entirely on a concept of messages and transfers, which is Linux terminology, and we'll talk more about that. The protocol driver doesn't know anything about your controller driver. It relies completely on the controller driver to do the chip select work and all the shifting, so it's neatly separated. An example would be an ADC in the IIO subsystem, like the MCP3008; that's a protocol driver. And if you're familiar with spidev, which we'll talk about much later in the user space section, that's an example of a protocol driver as well.

So how do we communicate in the Linux SPI subsystem? Fundamentally, as I promised, communications are broken up into transfers and messages. A transfer is a single operation between a master and a slave. The transfer structure has things like transmit and receive buffer pointers. It has fields to control chip select behavior after operations: whether the chip select gets deasserted between transfers, before the next one. We'll talk a bit more about that later, and about how tricky it can be. And there are things like a delay you may need after your transfer.
All of this comes from the data sheet for your peripheral, which has timing requirements: it may need a certain span of time before you send the next transfer, whatever that is. It depends on the protocol the device implements. You're able to define all of those things in a transfer. A message is just an atomic sequence of these transfers, and the message is the fundamental argument to all of the SPI subsystem transfer APIs. So basically, imagine a linked list of transfers. And here's another great diagram: that's what a SPI message looks like.

So now let's get into use cases, and that'll take us into how we do these various things. If we explore the common use cases, the first thing people seem to want to do is: I need to figure out how to hook up this device on my board, and there's already a kernel driver for it. That's one. The next is: I need to write a kernel protocol driver. The next big one, and less common, because most of them are already written for us, is: I need to write a kernel controller driver. And finally, one of the big ones, especially for people doing testing and for some maker-type projects, is: I want to do something in user space to drive my SPI slave.

All right, let's jump into adding a SPI device to your system. First we need to learn how to read data sheets, because a lot of people can't seem to read data sheets. So I'm going to force you through reading a data sheet very quickly, and I'll show you what I mean by that and the types of things you need to pick out. After we go through that, there are three methods for hooking up a protocol driver in our system. First is the ubiquitous device tree method that most of you will use.
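The transfer and message model described above can be sketched in user space as a linked list. Everything here is hypothetical: the struct and function names only loosely echo the kernel's `struct spi_transfer` and `spi_message_add_tail()`, and this is not the kernel's data structure or API. It only shows the shape: per-transfer buffers, length, chip select and delay settings, chained into one message that executes as a unit.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space model of one transfer (not the kernel struct). */
struct xfer {
    const uint8_t *tx_buf;  /* NULL: nothing meaningful to send */
    uint8_t *rx_buf;        /* NULL: received bits are discarded */
    size_t len;             /* bytes clocked in this transfer */
    int cs_change;          /* nonzero: deassert CS after this transfer */
    unsigned delay_usecs;   /* settle time the device needs afterwards */
    struct xfer *next;      /* next transfer in the same message */
};

/* Hypothetical message: an ordered list of transfers run as one unit. */
struct message {
    struct xfer *head;
    struct xfer *tail;
};

void message_init(struct message *m)
{
    m->head = NULL;
    m->tail = NULL;
}

/* Append a transfer to the end of the message. */
void message_add_tail(struct xfer *x, struct message *m)
{
    x->next = NULL;
    if (m->tail)
        m->tail->next = x;
    else
        m->head = x;
    m->tail = x;
}

/* Total bytes clocked over the whole message. */
size_t message_total_len(const struct message *m)
{
    size_t total = 0;
    for (const struct xfer *x = m->head; x != NULL; x = x->next)
        total += x->len;
    return total;
}
```

A typical use is a one-byte command transfer (RX buffer NULL) followed by a read transfer (TX buffer NULL), appended in order so they go out back to back in one message.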
Then there's the board file method, which some systems still use, especially old vendor trees, if you're unlucky enough to be there, and sometimes some of the x86 stuff, because ACPI is tough to use. And then there's ACPI. You'll mostly see the latter two on x86 and in vendor kernels these days, but we'll look at all of them.

So now we learn how to read a data sheet, by me picking out a few things. You're looking at a data sheet and asking, what is this thing? The data sheet won't always tell you, oh, this is a SPI device. It might just say it's a three-wire or two-wire or some kind of synchronous serial interface, so you might have to do a little detective work. The next important thing is to start remembering these timing diagrams and saying, gosh, it's a good thing it told me it's SPI, because it's showing me I2C signal names here. So that's the proof that there are data sheets that do this. This is actually from the ST7735, and if you've used the fbtft driver, this is one of the parts it supports. So it's a real device.

The idea here is that you need to figure out things like the maximum clock rate your device can support, because you need to tell the system the max clock rate when you hook up a device. It's one of those critical things, so you're going to have to be able to read this timing diagram. It maps exactly to what I was showing you (and my resolution's awful here). You look at these two values, then you look further down in the table and find the minimum clock high and low times. With simple addition you get the period, and if you can do minimal math, you get the frequency. Since those are minimums, you now have a maximum frequency out of them. Those are the kinds of things you're looking for.
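The period-and-frequency arithmetic above is simple enough to write down. This sketch assumes you've already pulled the minimum SCLK high and low times, in nanoseconds, out of the data sheet; `max_sclk_hz` is a hypothetical helper name. It rounds down, so the result never exceeds what the timing allows. If the data sheet also states an explicit maximum clock frequency, use whichever of the two is lower.

```c
#include <stdint.h>

/* Maximum SCLK implied by a data sheet's minimum clock high and low times.
 * The shortest legal period is t_high + t_low; the maximum frequency is its
 * reciprocal. Inputs in nanoseconds, result in Hz, rounded down.
 * Returns 0 if both times are zero (no usable timing data). */
uint32_t max_sclk_hz(uint32_t t_high_min_ns, uint32_t t_low_min_ns)
{
    uint64_t period_ns = (uint64_t)t_high_min_ns + t_low_min_ns;
    if (period_ns == 0)
        return 0;
    return (uint32_t)(1000000000ULL / period_ns);
}
```

For example, minimum high and low times of 125 ns each give a 250 ns period and therefore a 4 MHz ceiling; that number, or something below it, is the max frequency you'd tell the system.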
Don't always expect SPI to jump out at you, or the data sheet to say, hey, your maximum clock rate is this. You need to actually read the timing diagram and understand it. The same goes when you get into the protocol pieces further down in the data sheet. It's one thing to have the timing diagram for the clock and the individual bit clocking, but when you get into the specific protocol of this device, it shows what kind of delay I need between transfers. There's a minimum delay here, and that feeds into the optional delay between transfers I showed you in the transfer structure. You may need to set that depending on how efficient your protocol driver is trying to be: if it's trying to maximize bandwidth, you're optimizing exactly against this timing diagram to get the best performance. So it's important that you understand and can read these.

Here's another example, the MCP3008. I'm going to use it a bit more in this talk, but it's good to show two examples. Here they tell you which modes it uses, too, which gives you a little more information. Don't get confused by conversion rates; they have nothing to do with the actual SPI clock. And again, it's the same kind of thing: T high, T low. We can jump in here and see, if I didn't clip this off, that those are the minimum high and low times. So we get a 250 nanosecond period, and we can get our maximum frequency out of that. And again, the same thing holds true: that's just the basics of what I can do with the clock, but I also have this special protocol, right?
I have to send it commands telling it when to do an actual analog-to-digital conversion, and then it sends data back. So there are start bits and certain delays, and you'll have to figure out this protocol-specific stuff for your peripheral if you're writing a driver. If you're just using an existing driver, you probably just need to know the max frequency, if you don't have another example on the web somewhere, or if you have a variant. In this family, for example, there are something like 30 of these chips, and they all have different serial clock maximums. So if you want to get the maximum rate, double check that before you hook it up.

Alrighty. So now we get into the examples, and I sure fixed this since doing the talk at FOSDEM, because I had a black background there, and that was terrible. This is all about showing you where to get the information, and we're going to use the MCP3008 all the way through for these examples.

The first thing you need to know, if we're doing it the device tree way, is where to find the device tree bindings. You use your documentation source: either you search Google and read it on the LXR that Free Electrons hosts, or you look in your local tree under Documentation/devicetree/bindings. You hunt around in there a while and find, oh, here's my MCP3008 binding. It's one of many chips in this binding, and this is trimmed a little so you can see it, but they show you here (now I'm favoring this side of the room, of course), in that list, a compatible string.
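Going back to the MCP3008's command protocol (start bit, then data back), here is a sketch of building the command bytes and decoding the reply. It's based on my reading of the three-byte framing the MCP3008 data sheet describes for microcontroller SPI ports: a start bit at the end of the first byte, the single-ended/differential flag and channel in the top of the second, a dummy third byte, and the 10-bit result in the low two bits of the second reply byte plus the whole third byte. Verify the bit positions against your copy of the data sheet. `mcp3008_build_cmd` and `mcp3008_decode` are hypothetical names, not functions from the mcp320x driver.

```c
#include <stdint.h>

/* Build the 3 command bytes for one conversion.
 *   tx[0]: seven leading zeros, then the start bit
 *   tx[1]: SGL/DIFF, D2, D1, D0, then four don't-care bits
 *   tx[2]: don't care (just clocks out the low result bits) */
void mcp3008_build_cmd(uint8_t tx[3], unsigned channel, int single_ended)
{
    tx[0] = 0x01;
    tx[1] = (uint8_t)(((single_ended ? 1u : 0u) << 7) | ((channel & 0x07u) << 4));
    tx[2] = 0x00;
}

/* Extract the 10-bit result from the 3 bytes clocked back in:
 * bits 9..8 are the low two bits of rx[1], bits 7..0 are rx[2]. */
unsigned mcp3008_decode(const uint8_t rx[3])
{
    return ((unsigned)(rx[1] & 0x03u) << 8) | rx[2];
}
```

All three bytes go out in a single full-duplex transfer: the TX buffer carries the command while the RX buffer collects the result on the same clocks.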
Just to point out, this isn't Grokking Device Tree, so I expect you to know at least the basics of that already; there's not enough time to do it all. But you need to know that compatible string, and of course they give you an example that should be familiar to people working with device tree. What's important for SPI is the reg property here and spi-max-frequency. The reg is actually the chip select number, so it's essentially a unique address on the bus, which is how that works out in the driver model. And spi-max-frequency is what we've been talking so much about; that's why we needed to read the data sheet. So that example shows you how to hook it up.

One thing I improved here: somebody asked where these IDs get picked up. You can see I trimmed some stuff out just to fit it on a slide. In the driver, the mcp320x ADC driver, you have this ID list. It's a common pattern; you'll see it for other data structures like ACPI IDs, and we'll show an example of that as well. If you look at that driver, you'll see that it handles this compatible string here, and that it's able to bring in some chip-specific info for that variant of the family. That's how it all gets hooked up. If you're wondering where this comes into play, it's part of the driver definition, which is all part of the driver model. Nothing surprising here, but at the end of the day that's how the compatible string gets matched. If you have any confusion about what it's using, always go to the source and look at this.

Last thing: I showed you the driver side, and I showed you where to get the binding information. Now you actually have to use it. I'll show it here as an overlay, and the overlay fragment syntax is changing as we speak. That's a whole other topic, but this is at least an old-style version.
It is what it is. What you see here is exactly like the example in the binding: we hook it up, and that's the max frequency we got out of the period we read off the MCP3008 data sheet. So you do something like that. If you're doing a static DTS entry, the syntax is a bit different, but the meat of it is obviously the same. So we went end to end on hooking that up via device tree.

It's not much different with a board file. Don't use board files, though; I'm just showing you, don't do this. But that's where you would hook up the same things. Notice chip select 0. You have to give the bus number for the actual master here, but it's essentially the same info, and you use modalias to hook it up to that module. That either lives in an old board file built into the kernel, or, as I sometimes saw people on the MinnowBoard project do, in a module written just for this, because it's too hard to do via ACPI.

All right, did I say it was too hard to do via ACPI? It is possible, but as you see, it's not designed for humans to read, or at least not for me to read. If you instantiate a SPI device via ACPI, you have the same things: you're telling it polarity and all these different specs. But the most important thing is the max frequency, which of course they have in hex, so it's not readable. Well, it's readable, but that's ACPI.

All right, now protocol drivers. The first thing is that they follow the standard driver model, just like everything else. When I was hooking up a driver, I showed you the driver IDs and all those pieces; it's all the standard driver model, just SPI using it. So if I want to write a protocol driver in the kernel, I need to do some basic things. We don't show everything here, but you need to find the driver structure. Here I've got my protocol driver. I need to have some power management ops, but those are implementation details for your part.
But I need to have an entry point: probe and remove, all standard driver model stuff. Once you enter probe, though, you can immediately start using the kernel APIs. So let's talk about the kernel APIs now, diving in from that use case perspective.

What are the kernel APIs available to protocol drivers? Well, it's pretty simple: two big categories, asynchronous and synchronous, surprisingly enough. First you have spi_async. It's an asynchronous message request, and you get a callback executed when it completes. You can call it from any context, and we'll see in a moment why that matters. Then there's spi_sync, a synchronous message request. It may sleep, so you can only call it from a context that can sleep; in the common case, that means don't call it from IRQ context. It's essentially a wrapper around the async path, so there's no real magic going on: it just waits for the transfer to complete and then returns to you. And then there are some helper functions that, in turn, wrap spi_sync: spi_write and spi_read.

Recall that I said the message is the fundamental argument to everything. So when you're using these, you're handing over a message with a list of transfers, and you define those transfers to satisfy the protocol of the part whose data sheet you've carefully studied.

There are some other calls for special use cases. We mentioned how flash devices these days are commonly dual-capable, and mostly quad SPI capable now, so there are optimized routines for those. They matter because quad SPI parts have such a low pin count that they're replacing old-style parallel NOR flash; the speeds are very similar now, and so is the reliability. Take, for example, the DRA7xx parts from TI.
They have a memory-mapped controller that translates memory accesses into flash commands, so it can actually execute in place (XIP) out of these SPI flashes that way.

Then we have other helper APIs that let us create and build up the message, which, remember, is that list of transfers, so we need helpers here: spi_message_init and spi_message_add_tail. Some of those patterns will look familiar from how other subsystems in the kernel manage their data structures. So you can build up your message as a series of transfers, each with its own characteristics.

All right, that covers protocol drivers. Now controller drivers. Again, it's the standard Linux driver model. You first allocate a controller with spi_alloc_master, and then you set the controller fields and methods. We show just the basics. The mode bits, for example, are all these flags, which describe the capabilities of the controller. Not every controller can do SPI_RX_QUAD; that applies to controllers that can handle the four MISO lanes in that mode. SPI_LOOP means it can do a loopback in hardware. You have to implement setup and cleanup, and you need to implement one of these transfer methods. Many of the masters have moved to the transfer_one model, because it lets the core support GPIO chip selects. There's a lot of demand to expand the number of chip selects, and a lot of controllers don't even have a hardware chip select mechanism, so that's necessary as well. So you'll see transfer_one implemented more commonly. And finally, when you've got all that set up, you just call spi_register_master.

All right, let's talk about user space. So, user space drivers: many of you have probably heard of spidev or used it. It's intended primarily for development and test. In reality, we know everybody has their Python and Go and Ruby and every other thing built on top of it for their maker use cases, so it's very popular for people doing hobbyist projects.
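Back on the controller side, the mode bits mentioned above work like a capability mask: the controller advertises what it can do, and a device asking for something outside that mask can't be served as requested. Here is a hypothetical sketch of that comparison. The flag names and values are local to this example, not the kernel's, and the real check in the SPI core has more nuance than a single mask test.

```c
#include <stdint.h>

/* Hypothetical capability flags (values local to this sketch). */
#define CAP_CPHA     (1u << 0)
#define CAP_CPOL     (1u << 1)
#define CAP_CS_HIGH  (1u << 2)
#define CAP_LOOP     (1u << 3)
#define CAP_RX_QUAD  (1u << 4)

/* Return the requested mode bits the controller does not advertise.
 * Zero means everything the device asked for is supported. */
uint32_t unsupported_mode_bits(uint32_t requested, uint32_t controller_caps)
{
    return requested & ~controller_caps;
}
```

A device that needs quad reads on a controller without `CAP_RX_QUAD` shows up immediately as a nonzero result, which is the point at which a driver would refuse or fall back.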
It's also used in many test scenarios. If you're using the common device tree case... well, first let me say: normally you're going to want to write a kernel driver, not a spidev user space driver. Probably the reason you're using SPI in the first place is that it's a high-performance part, so you're better off with a kernel driver. But there are obviously lots of cases where you just want to whack at something, maybe for a test case. That's really the most common use, people just hacking around with it. So it's super useful, but how do we hook it up?

You can leverage one of the compatible strings already upstream, since you're just doing some testing, and then you don't have to modify anything. If you needed to add something extra, you could, and I know they do sometimes accept new compatible strings upstream where it's a legitimate device that's never going to have a kernel driver. There are other ways to hook up the spidev driver: you can use the board file method, exactly as we showed before, or the ACPI IDs it supports, if you're doing an ACPI override load. Or you can modify your vendor kernel and add something else. So that's how you hook it up.

So what happens once the driver binds? We've hooked it up and it binds. The magic is that spidev devices get created in the spidev class, named by bus, which is the master's numerical ID as allocated in the system, and then there's a device for each chip select. So if you had three chip selects on bus 0, you're going to have spidev0.0, spidev0.1, and spidev0.2, and you get matching /dev/spidev nodes. It's a simple character device: open, close, read, write. And reads and writes are half duplex, right?
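The device naming follows bus number and chip select, so building a node path is a one-liner. `spidev_path` is a hypothetical helper; it assumes the usual `/dev/spidevB.C` naming, where B is the bus (master) number and C is the chip select.

```c
#include <stdio.h>

/* Build the spidev node path for a bus and chip select,
 * e.g. bus 0, chip select 1 -> "/dev/spidev0.1".
 * Returns snprintf's result: the length the full path needs,
 * excluding the terminating NUL. */
int spidev_path(char *buf, size_t size, unsigned bus, unsigned cs)
{
    return snprintf(buf, size, "/dev/spidev%u.%u", bus, cs);
}
```

Looping the chip select from 0 to 2 on bus 0 gives the three nodes for a three-chip-select bus.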
One thing I think I neglected to say earlier: if we go back to the beginning and look at the SPI bus, it's inherently a full-duplex bus. Every time you clock a bit out, another one comes in. However, in your protocols you'll find that for some parts, like the ST7735, you don't even need to hook up MISO; it's a one-way path. If you think back to the transfers we looked at before, there are TX and RX buffers, and the common case is delivering data in one direction. If I'm transmitting, I have an actual buffer, so the transfer has a pointer to it, but if I'm doing a write only, my RX buffer just points to NULL. That's what happens here: if you write a byte with the write call, behind the scenes it generates a transfer whose RX buffer is NULL.

Many times, though, you don't want a half-duplex transfer. With a lot of devices you're reading something back simultaneously, so you'll have to go to the ioctl interface. The options you have there map exactly to those kernel APIs. I want to send a raw message, and it's full duplex. In that interface you define your buffers, filling the receive buffer while the transmit buffer is sent, and you can also define your chip select control. So all of those options we showed, like changing the chip select behavior or the delays after transfers, you can do through spidev here. Finally, there's a complete set of parameters you can set for both the read and write cases, so you have complete control. As you get into the details, note that this much content on slides isn't the whole story; I show it broken out so you know where to look.
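Here is a sketch of a full-duplex exchange through the spidev ioctl interface, using `struct spi_ioc_transfer` and `SPI_IOC_MESSAGE` from `<linux/spi/spidev.h>`. It assumes a Linux system with the kernel UAPI headers installed and an already-open spidev file descriptor; the helper names `fill_fdx_transfer` and `spidev_fdx` are hypothetical. Setting both `tx_buf` and `rx_buf` is what makes it full duplex, in contrast to a plain `write()`, whose receive side is discarded.

```c
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/spi/spidev.h>

/* Describe one full-duplex transfer: every byte clocked out of tx
 * clocks a byte into rx. */
void fill_fdx_transfer(struct spi_ioc_transfer *tr,
                       const uint8_t *tx, uint8_t *rx, uint32_t len,
                       uint32_t speed_hz)
{
    memset(tr, 0, sizeof(*tr));            /* unset fields (delays, cs_change) stay 0 */
    tr->tx_buf = (uint64_t)(uintptr_t)tx;  /* user pointers are passed as 64-bit values */
    tr->rx_buf = (uint64_t)(uintptr_t)rx;
    tr->len = len;
    tr->speed_hz = speed_hz;               /* per-transfer clock override */
    tr->bits_per_word = 8;
}

/* Run one full-duplex transfer on an open spidev fd.
 * Returns the ioctl result: -1 on error, with errno set. */
int spidev_fdx(int fd, const uint8_t *tx, uint8_t *rx, uint32_t len,
               uint32_t speed_hz)
{
    struct spi_ioc_transfer tr;
    fill_fdx_transfer(&tr, tx, rx, len, speed_hz);
    return ioctl(fd, SPI_IOC_MESSAGE(1), &tr);
}
```

In practice you'd `open()` a node such as `/dev/spidev0.0` with `O_RDWR`, optionally set the mode with the `SPI_IOC_WR_MODE` ioctl, and then call `spidev_fdx()`; for the MCP3008, a 3-byte command in `tx` comes back as the conversion result in `rx`. The examples in the kernel's tools/spi directory cover the full set of options.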
But if you want the gory details on that, as always, the kernel docs are the way to go. And sometimes people forget that the tools directory exists, except when it's referenced on the mailing list. Don't forget that tools/spi has these two examples: spidev_fdx.c, a full-duplex example that's really great for getting started if you want to write a test or something, and spidev_test.c, which is outstanding. It tests all kinds of different cases and has exposed all kinds of problems in various controller drivers. Once you're past beating your head against the raw interface, you might want to look at something like Jack Mitchell's libsoc, which puts a nice common interface across a lot of these user-space things. I think it's a nicely written piece of C code, very easy to follow. What's also popular in the maker community on top of that is the Python spidev binding. There are a couple, and that particular one is the one I see used most of the time. All right, let's talk a little bit about Linux SPI performance. We talk about all this nice abstraction, and abstraction is great, except eventually you find these holes where you have to start knowing how the hardware works. It's like when you need that last 10% of performance that takes 90% of the time. So, just to give examples, you need to be aware of what the controller driver does when you start thinking about performance. Take the OMAP McSPI driver: there are no heuristics for when it engages DMA, just a fixed threshold. It simply says that if the transfer is larger than 160 bytes, it uses DMA; otherwise DMA is too much overhead, right?
It's a typical kind of thing, but I saw firsthand people confused by the fact that DMA wasn't happening and they didn't understand why. Well, they were doing 128-byte transfers on this driver. So you need to be aware of that. There are other cases, very specific to various controller drivers, where you look at the bus on your logic analyzer or scope, or whatever tool you're using, and start seeing delays; then you need to know those details of the controller driver. How it manages chip selects is another area that may not be well optimized. The next big area is knowing when to use sync versus async. You'll find most kernel drivers use synchronous calls. Some network drivers are the big users of async. It's actually fairly rare, but that's the big place, because they're optimizing for bandwidth rather than latency. That makes perfect sense there, whereas everybody else really wants to get that little transfer out on the wire as fast as possible, so they typically use sync. One of the nice things that came in during the 4.x series is that spi_sync() now attempts to execute in the caller's context, and that's made a big difference: rather than sleeping, it tries to run the transfer right there, for a further reduction in latency. It aggressively tries to do that whenever possible. And then there's this big one. I talked a little about how you can set all these different options in your transfer structure. When you're building up your message and creating transfers, you need to understand the characteristics of that slave device and what it wants to see for optimal operation. For example, if your protocol lets chip select go back high after each transfer, you add extra timing delay between the transfers.
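spidev exposes the same chip-select control as the kernel's transfer structure: `struct spi_ioc_transfer` has a `cs_change` field. A sketch under stated assumptions: a hypothetical device that takes a command phase followed by a data phase, submitted as one two-transfer message. The helper names are hypothetical, and whether CS may stay asserted between the phases depends entirely on the part's data sheet.

```c
#include <linux/spi/spidev.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Hypothetical helper: build a write-only command phase followed by a
 * read-only data phase. cs_change on the FIRST transfer decides whether
 * chip select is de-asserted between the phases (1) or held asserted (0). */
void spi_build_cmd_read(struct spi_ioc_transfer xfer[2],
                        const uint8_t *cmd, uint32_t cmd_len,
                        uint8_t *data, uint32_t data_len, int toggle_cs)
{
    memset(xfer, 0, 2 * sizeof xfer[0]);
    xfer[0].tx_buf = (uintptr_t)cmd;    /* rx_buf stays 0: MISO ignored */
    xfer[0].len = cmd_len;
    xfer[0].cs_change = toggle_cs ? 1 : 0;
    xfer[1].rx_buf = (uintptr_t)data;   /* tx_buf stays 0: controller sends dummy data */
    xfer[1].len = data_len;
    /* cs_change left 0 on the last transfer: CS is released when the message ends */
}

/* Hypothetical helper: submit both transfers as a single message. */
int spi_cmd_read(int fd, const uint8_t *cmd, uint32_t cmd_len,
                 uint8_t *data, uint32_t data_len, int toggle_cs)
{
    struct spi_ioc_transfer xfer[2];
    spi_build_cmd_read(xfer, cmd, cmd_len, data, data_len, toggle_cs);
    return ioctl(fd, SPI_IOC_MESSAGE(2), xfer) < 0 ? -1 : 0;
}
```

On the last transfer of a message the flag means something different (a request to leave CS asserted after the message ends), which is exactly the kind of confusion the header comment exists to clear up; that is why the sketch only ever sets it on the first transfer.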
Your particular chip might require that, or it may not. Some can handle stringing transfers together with chip select left asserted. As you can imagine, if you're going to the same device and don't need the controller driver to de-assert chip select after each transfer and then reassert it, you can improve performance in a way you can quantify on the logic analyzer: all of a sudden you can string these transfers together and hit the maximum timing in the diagram from the data sheet. This is the canonical, good explanation of how the cs_change field works, and it's a pretty detailed one. That's how you manage cases like that: if you've got a part where you want to control chip select in any of those different variants, you do it here. The important thing is that this is in the SPI header, include/linux/spi/spi.h. Most of the other important docs are in the Documentation directory. cs_change is mentioned there, but it's often confused; in fact, it was implemented wrong in the IIO subsystem for a long time. They fixed that up. I can't remember where I saw the fix come in, but someone doing performance analysis realized it had been wrong for a while. So keep this in mind as well. The other big thing is that there's no excuse for not having good performance and debug tools these days. You have to have them if you're trying to get decent performance, because you can't fix what you can't see. You can look at these resources here, and I actually prefer sigrok, because they have a great comparison of everything they do and don't support for logic analyzers. So for things that are now well under $50 U.S.
and have been for a while, there's really no excuse not to have even a simple one. You can also use the spi-loopback-test module, the self-test, for some performance work and for debugging controller drivers. That's very useful; it has exposed a lot of problems. And finally, one of the relatively new things that came in is a great set of statistics. Under /sys/class/spi_master/ you now have spiB.C entries, that's bus and chip select, the same notation we've been looking at, which makes sense given how the devices are allocated. They have statistics nodes: counts of messages, transfers, errors, timeouts, all the good things you want to see. I talked about how spi_sync() tries to execute in the caller's context. If you're writing a driver or using spidev, you can look at how many times a transfer went through spi_sync and how many times it actually executed in the caller's context, which is what you really want if you're going for reduced latency. So you can use that information. The other cool thing, when you have something complex with maybe many devices, is a histogram of transfer sizes. You can get data on what buffer sizes are being transferred, and that's interesting to see as you start debugging how different messages and transfers are occurring, maybe if you're looking at DMA efficiency and so forth. Having that kind of analysis of what your system is doing with multiple protocol drivers can be really useful in diagnosing things when something is flaky. Okay, time check. Real quick on the SPI future: slave support is coming.
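A sketch of reading those counters from C to get the caller-context ratio just mentioned. It assumes the per-device layout /sys/class/spi_master/spiB/spiB.C/statistics/NAME and the counter names `spi_sync` and `spi_sync_immediate`; check both against your kernel before relying on them. The helper names are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper; ASSUMED layout:
 * /sys/class/spi_master/spi<B>/spi<B>.<C>/statistics/<name> */
int spi_stat_path(char *buf, size_t len, unsigned bus, unsigned cs, const char *name)
{
    int n = snprintf(buf, len, "/sys/class/spi_master/spi%u/spi%u.%u/statistics/%s",
                     bus, bus, cs, name);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Read one counter; returns -1 if the file is missing or unparsable. */
long long spi_read_stat(const char *path)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    unsigned long long v;
    int ok = fscanf(f, "%llu", &v);
    fclose(f);
    return ok == 1 ? (long long)v : -1;
}

/* Fraction of spi_sync() calls that ran in the caller's context, or -1.0
 * if the counters can't be read or there have been no spi_sync() calls yet.
 * ASSUMED counter names: "spi_sync" and "spi_sync_immediate". */
double spi_sync_immediate_ratio(unsigned bus, unsigned cs)
{
    char path[160];
    long long sync_calls, immediate;

    if (spi_stat_path(path, sizeof path, bus, cs, "spi_sync") != 0)
        return -1.0;
    sync_calls = spi_read_stat(path);
    if (spi_stat_path(path, sizeof path, bus, cs, "spi_sync_immediate") != 0)
        return -1.0;
    immediate = spi_read_stat(path);
    if (sync_calls <= 0 || immediate < 0)
        return -1.0;
    return (double)immediate / (double)sync_calls;
}
```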
Actually, I haven't checked in the last couple of weeks where it's at, since I've been traveling. Hard real-time issues on Linux, the inability to have determinism under very tight timing constraints, have made slave support problematic in a generic way. However, Geert Uytterhoeven noted that there are a lot of real-world use cases you can implement if you limit the scope. He mentioned pre-existing responses: say you have a slave driver that returns the time of day, and it's got that transfer ready to return. You can implement something like that, or commands that are just one way. He currently has an RFC v2 patch series. It looks pretty good now; there's a bit of work to do on it, of course, and he mentioned some bugs and so forth. It works analogously to registering a master: you use spi_alloc_slave() when you're adding a controller driver. After a slave controller is instantiated, a sysfs node gets exposed, and slave protocol drivers are bound through it: if you have a slave protocol driver, you write its name in there, and that binds and activates it. In that RFC series he provides two examples that map neatly to those two use cases: spi-slave-time, which just sends back the latest uptime, and spi-slave-system-control, which powers off, reboots, or halts the system. All right, I think we have questions. I have a question about sync and async. I have a protocol driver. As I understand it, SPI queues the messages up and then we burst the queue out. And we may or may not get an answer or reply at that time. When the send queue empties, we're also going to receive, because I'm paying it directly. Are you talking about on the slave device itself or in the subsystem? The sync versus the async.
Okay. When I fill a message queue and call sync or async, does it immediately empty the queue through the hardware or DMA mechanism, to be sent immediately? Or is it a batched command? spi_sync() will try to execute that immediately. It can sleep, though. It can sleep; well, not block, but sleep. If you don't want it to sleep, you could use spi_async() and hook a callback. That's what... I forget which network drivers. I didn't think it was the ENC one. Somebody might remember... Do you know? There's one that does that and is a great example. There aren't many examples of it that I know of. But it works for the network-driver type of thing, so it's a very limited set of use cases. So, speaking of interrupts... Let's say I had an SPI device. SPI inherently doesn't have an IRQ, but say you have a GPIO that says, hey, I need service now. Can you talk about how you would tie that in? Or is there a way to have SPI poll in the background? So there's a separate subsystem for these things: the GPIO subsystem offers all of that. I get to say the disclaimer: that's out of scope for this talk, a different talk, right? But when you're building a protocol driver, and again, this is all simplified, having that IRQ is an everyday case. It's outside the bounds of SPI. In reality, when you're writing that protocol driver, one of the other resources you have to hook up is that GPIO: you request the GPIO using the GPIO framework and set it up as an interrupt. Well, you just request an interrupt; ignore that it's a GPIO. You don't want to just read the GPIO. Right, right. When I say get the resource, I'm talking about all of that, the same hookup: in device tree you specify the GPIO for this device when you're instantiating it. The subsystem...
His specific question was about interrupts and how they're handled. So the clarification is that interrupts aren't part of SPI, by the very nature of SPI. In your protocol driver you use the kernel's interrupt capabilities, and the GPIO subsystem implements all of that for you. Put those two together. So, I'd seen, using spidev, that you weren't supposed to have it as a compatible string. Is that right? Because it's not a piece of hardware. And with BeagleBone, for example, there was a patch that made it complain really loudly, and I think we've squelched that. This is going to be the last question because we're going to run out of time, but Mark should answer this, because the maintainer of the subsystem is here. What was the actual question? Can I...? You put it into some file? I guess that's what you're actually supposed to do... So you were showing that instead of having spidev, you're supposed to have an actual compatible string for a piece of hardware, in which case you're supposed to add it into some file somewhere? Yeah. Yeah, spidev has a list of compatible strings, so you're supposed to put it in spidev. Maybe like a C store. Yeah. So, yeah, just say spidev is how we handle it on Linux, or write a driver for it. So why was spidev removed from the compatible strings? Because, like Matt said, spidev isn't describing the hardware. It's describing how you want to drive it on Linux right this very minute. So we now complain loudly about it, to make you actually describe your hardware, because somebody might come along later and write a driver for it, and otherwise you have no idea what it is. And even when you were using spidev, it's not very helpful for user space to work out which particular thing to do with that spidev node if we just say spidev. So you couldn't write a generic user-space driver on top of spidev.