Greetings. I'm Sandra Capri from Ambient Sensors in Boise, Idaho. First off, the most important thing: it's Pi Day. Greetings on Pi Day, and I'm giving my speech, so that's pretty cool. So in honor of Pi Day, I just want to say 3.14159265358979323846. Can't go any farther. And that's kind of a lead-in: those of you that know me will know that when I get excited about things like technology, I tend to talk really fast, about the speed at which I just did my pi digits. So if I do that, I've got a co-worker in the back who will tell me, slow down, slow down. I'll try to stay under sublight speed today. And then, oh, if anybody has downloaded the slides already, I apologize, they got changed again last night. So if you want an up-to-date version of the slides, you're going to want to go out and get them again. The subject of the talk is going to be a white paper that I wrote. The white paper is done, but being the engineer that I am, I'm not so good at the whole formatting thing. I have a co-worker who is currently desperately trying to format it into something that's a little more human readable. So hopefully by the end of today or by tomorrow, you will be able to see it. All right, enough of the preliminaries. So I mentioned I'm from Ambient Sensors. We're in Boise, Idaho. We're an engineering services company. And as an engineering services company, we tend to do a lot of different things. You know, whatever a customer pays us to do and we agree to do, so we branch out in a lot of different directions. In the last few months or so, we've been doing a lot of various work. Some of the work has been with Bluetooth Low Energy, specifically version 5, Bluetooth Mesh, Wi-Fi, lots of IoT stuff. And we've even played around with connecting our IoT devices up to Alexa. And that was a lot of fun. Haven't had any customers be very excited about paying for that, but they love the demos. We do a lot of the smaller IoT systems.
And we'll do single-threaded, small-processor systems where there's no operating system, just our thread, on these low-end SoCs. And those work out pretty well. But then some of the more fun stuff is when we get to work on the higher-end processors and we get to do something with Linux. So about a year ago, I was working with a customer and they had a question. Investigating what I could do with Linux for them led me to create this white paper, which is the subject of today's talk. So everyone's heard this Einstein quote, "time is an illusion." And then you look it up and find out Einstein really didn't say that. I'm going to have to read his actual quote because it's a long one: "The distinction between the past, present and future is only a stubbornly persistent illusion." You can see why that got simplified down to "time is an illusion." But who's going to remember that quote? But the other reason for putting it up there is that it makes a great segue for the next part, someone else's quote, a quote from our favorite author of The Hitchhiker's Guide. It wouldn't have gone with the actual Einstein quote, I fear. Now, this painting in the background is Salvador Dali's The Persistence of Memory, which I forgot there for a second. To my knowledge, Salvador Dali really didn't do anything with Linux, but he certainly had an interesting perspective on time here. He's taking his representation of time, these clocks that are all melted and warped, and he's saying, you know, I can manipulate time to be my vision of art. And that's a really cool idea. I mean, you can do that with art. Sadly, with Linux and real-time control, we don't have this kind of control as much as we wish we did. When you're trying to use Linux to control some hardware, you're going to wish you had this capability. All right, I bet everybody knows that's a Bitcoin logo.
So there are a couple of times in this presentation that I'm going to be asking questions of the audience. Now, I'm sorry, I can't give you any Bitcoin if you participate. But if anyone was going to sleep a few minutes ago, you are now awake. So this is just a plea: I can't give you Bitcoin, but if I'm asking a couple of questions and you have a thought, don't be shy. Go ahead and give your thoughts on it. I'm certainly not going to mock you, because I'm definitely in the hot seat here. So, a general question for this talk. This is a rhetorical one, not looking for an answer. Can a small embedded Linux platform be deterministic enough to serve as a device controller? Now, specifically, we're talking about plain old vanilla Linux. We're not talking about real-time Linux. There are no kernel extensions here for this. So, you know, how you do it on an everyday basis. You go to kernel.org, you grab a new version of the kernel and you build your own distro up from scratch. I'm sure we don't do that every day. What's more likely is you grab a small single-board computer, you find out what's the current distro available for it, you throw it on your computer, and then you ask yourself: all right, can I use this as a device controller? All right, first audience participation. Why is this difficult for a Linux system to do anyway? So, I was looking for a giant question mark on openclipart.org and I ran into this awesome pun. So, does anybody know what this is? Not a zipper. It's a good thought, though. A knot? You get a virtual Bitcoin. Yes, it's a knot. When someone asks you why, what does everyone love to answer? Why (k)not? Okay, that was my really bad pun for today. So, I assume that everybody's got some kind of an understanding of what the problem with Linux is, or you wouldn't have bothered coming to this talk, but just in case, here's a quick summary. So, the Linux scheduler's job... let's see if I can find my pointer.
All right, so the Linux scheduler's job is to decide when each task in the system gets a chance to run. We've got task preemption going on for the first part. So, if a higher-priority task wants to run, the scheduler's going to take care of that. Even kernel tasks are preemptible nowadays in modern Linux. When I first started, kernel tasks weren't preemptible. So, any time the kernel is built with CONFIG_PREEMPT, which currently is pretty much the default, you're going to have preemption even in the kernel. So, here you see task A. Task A is just happily running along, and it gets to a certain point, and all of a sudden, an interrupt occurs. Hardware interrupt. The interrupt service routine starts running. Run, run, run. It does something that's going to wake up task B. Task B is a higher priority than task A. So, task B gets to run along, and task B gets done. Fortunately, there are no other higher-priority processes. Finally, task A gets to finish out what it was doing. Okay, if task A was your controller, you just got yourself into a bit of a problem there. So, in general, with most modern operating systems that aren't designed for real-time, you're going to be dealing with all kinds of operating system overhead. You're going to have interrupt latency, context switching time, memory allocation, paging; the list goes on and on. So, how could you ever do any sort of device control with an operating system like Linux anyway? Well, first you have to figure out what kind of control you need. Do you need hard real-time control? A missed deadline is a system failure, quite literally a system failure in this case. This is for mission-critical systems where failure to conform to timing constraints results in a loss of life or property.
So, some examples of hard real-time systems would be airplane control systems, where you're worried about losing the lives of the people in the plane; car engine control systems, where maybe you're going to destroy the engine; heart pacemakers, no explanation needed as to why you have hard real-time there; and industrial controllers, where you could be losing some pretty big equipment. Or do you want soft real-time control? This allows for frequently missed deadlines, and as long as tasks are executed in a timely manner, the results continue to have value. Completed tasks may have less and less value after the deadline, but they can still be of value. Some good soft real-time examples, as you can see there: streaming audio and video, voice over IP, games, that sort of thing. And then people will talk about firm real-time: infrequently missed deadlines are acceptable if they don't happen too often. I guess that's the definition of it, infrequently. And then here is a don't-sue-me, don't-blame-me slide. I really don't want that responsibility. If anyone doesn't know what that logo is, that's the international symbol for a pacemaker. I had to actually look that one up. Seriously, you need to really understand your timing requirements, study them pretty carefully, before you decide what's the best way to try to meet those timing requirements. Don't just go off of what someone else is saying. Now, there is a way to support real-time in a Linux system: for example, the PREEMPT_RT patches. Those are going to remove the unbounded latencies. Okay, so if PREEMPT_RT is so powerful and you can really do real-time Linux, why doesn't everyone just run PREEMPT_RT? Well, there are some engineers, or their companies, that aren't comfortable with trying to support a Linux system that has the PREEMPT_RT patches in it. It takes work. You don't just apply the patches and it's magically real-time. You've got to do a lot of testing. You've got to do a lot of tuning.
If your project has generated new drivers or kernel modules, or is using someone else's drivers or kernel modules that haven't been part of the real-time system, then you're going to have potentially a lot of modification to do. Then potentially there's going to be lag in support for new kernel versions. A patched kernel can be a little more sensitive to certain kinds of bugs than an un-patched kernel. For those that are interested in actual PREEMPT_RT and real-time, that's not what this talk is about; look up a couple of different talks. Andreas Amans did a great talk at last year's ELC. In addition to discussing real-time Linux and how to deal with it, he had a really excellent section on how to do some debugging, finding bottlenecks, testing and tuning. It's actually really good information for anyone working on Linux, whether you're doing real-time or not. Some pretty good tools, and walking through how to do some things. Then just yesterday, Julia Cartwright gave a very cool talk about driver development in real-time. "What Every Driver Developer Should Know about RT" is a good talk to go look up if you can. That gives really good information for driver writers. Now I'm going to restate the question as a customer attempted to explain it to us. Can a Linux single-board computer control a GPIO to create a "very accurate" pulse without the Linux real-time patches? The two proposed systems that they were talking about were the Raspberry Pi and the BeagleBone Black. I'm guessing those are familiar boards to pretty much everyone in here. Then the question: you see I've got "very accurate" in quotes. What did they mean by very accurate? I tried to pin them down on that, and sometimes it can be surprisingly difficult to get a customer to define certain terms. We started out with 11 milliseconds, plus or minus 250 microseconds. They wanted me to be able to, essentially at any time, generate a very accurate (by that definition) 11 millisecond pulse width.
I said, come on, how hard can that be for Linux to do anyway? 11 milliseconds is an eternity in processor time. I started on the Raspberry Pi 3. That's the quad-core Pi, for those of you familiar with it. I loaded up Raspbian on it and I started working. At the start of the investigation, I was using kernel 4.4. That was about a year ago, but I've since re-verified that the results are still pretty much the same on 4.9.73, and that's the latest Raspbian as of December 2017. Those images are, of course, compliments of the respective companies. You see there's a pulse train there. The point of this project was not to generate an accurate pulse train, but I figured the easiest way to test generating an accurate pulse at any given time would be to just generate a whole series of pulses and then measure them, make sure the system is really busy, and if they're all nice and uniform at the right width, then I probably can, at any point, generate a pulse. So I'm not generating a pulse train for the sake of a pulse train; that was just the way to test what I was doing. And then to simulate a busy system, I decided I was just going to swamp the Linux system, just bury the poor thing, and I was pretty good at doing that, because I actually got the poor Raspberry Pi to crash a couple of times. Sometimes it just can't deal with that much stuff in there. And then if I had it swamped that badly and I could still generate fairly accurate pulses, then I would know, okay, I've got something going here. Then for being able to measure it, I figured I could do, in some cases, internal measurements, asking the system to grab the time information and checking on that; and then other times I would use the oscilloscope or the logic analyzer to measure it. So, odd note, don't ask me why: even though my original goal was an 11 millisecond pulse, I did all of my experiments with a 10 millisecond pulse. I don't know why. I was looking at my results. What's that? The number of fingers? The number of fingers.
Well, yeah, that's very possible. And I looked at it and went, 11 milliseconds. It's prime. It's a palindrome. It's a great number. Why didn't I use 11? And I just don't know. So first, excuse me, first I needed to decide what was going to be the most accurate way to make a thread create a pulse of a requested width. So after I assert the GPIO, did I want to do a busy-wait loop and just sit there and spin and spin and spin, and when the time was up, then deassert the GPIO? That burns a lot of processor time. Or do I assert the GPIO and just let the thread sleep? Linux does the work. And then when my thread is awake again, I go ahead and I drop the GPIO. That sounds like a pretty easy thing to do, but how much is accuracy going to suffer because of OS overhead? Good question. And then finally, is accuracy going to be better in kernel or in user space? Is user space going to be sufficiently accurate, or do you have to be in the kernel? All right, kicking off the investigation, I thought I'll just start with a simple user space program. The job of this guy was to get the current time and store it. That's for my internal measurement. Then I'm going to drive the GPIO high. I'm going to do a usleep. It's not a busy-wait; I'm going to let Linux sleep the task for this first pass. Then I drive the GPIO low, get and store the time immediately after, compare it to the start time from step one, and then I store the shortest and the longest times. That way I can see what's my best case and what's my worst case. Now, in this first pass, I'm making no effort to try to minimize any latency. This is our control. I expect the accuracy to be horrible, and we will not be disappointed. This is going to demonstrate the problem. And that way we have something to compare: how much better did we do after we applied some ways to fix the problem? Here are my first results. As expected, the control is pretty pathetic. The shortest pulses aren't terrible.
30, I guess 20 to 60, microseconds of slack there. But what is up with these longest times? I mean, that 20 milliseconds when it should have been 10 milliseconds. How can the processor go so far wrong? Anybody have any thoughts on that? Usleep, does it use signals? I honestly don't know. I didn't look that up. On the Raspberry Pi, I knew that once. I probably knew it when I wasn't standing up here on the platform. But it's actually a more general reason why. It's something that I mentioned earlier. What's that? Definitely, the process could have been paged out. Yes, but more generally, remember that I'm running these terrible swamping tasks? I'm artificially swamping this poor Linux system horribly. It's just pathetic. There are hundreds and hundreds of busy-work tasks that are running. I start up the task, and its job is to fire off another version of itself and then start doing swamping things. So the new guy that gets generated... that's the next couple of slides. That's a really good segue. I like that. Here's your Bitcoin. That will be on the next slide as well. Another Bitcoin for you. So none of that stuff that you guys are talking about, that your mind immediately jumps to, got done here. There's nothing of that sort that we did. This is just plain old, this is how it came out of the Raspberry Pi box with Raspbian. And so we've got hundreds and hundreds and hundreds of these tasks that are running. Let's see, I did a whole bunch of mathematics; there's one giant mathematical calculation. And so that's simulating a processor-bound system. And then I'm doing a sync, which was easy to do in a script. And that sync is doing file-system-type work. So this poor system is just being swamped with hundreds and hundreds of tasks. And at one point I was hoping to get up to like 5 or 10,000 tasks, but the poor Raspberry Pi keeled over at about 1200 of these swamping tasks running. And even when it was keeled over, 20 milliseconds isn't horrible for a totally swamped system like that.
Now, that of course doesn't count the case where the Raspberry Pi just stopped responding entirely because there were too many tasks. But still, it varies a lot. This is not going to meet the "very accurate" time that the customer was asking for. 10 to 14 was too much, much less 10 to 20. Now, I should point out that I only sometimes got the 20: I might have to run it for five, six minutes before I actually got the 20 milliseconds, and normally the worst case was around 14 milliseconds. But here we have Linux pulling the rug out from under us. And I mean, the poor Linux, what else is it supposed to do? So we need to help it out; that's what we want to do. And you two guys gave some great "hey, did you do this?" suggestions, and those are the first things we went for. So I was using, well, ask me afterwards when I'm not on the stage and I can give you all the details about which GPIO driver, but I was using one of the standard user space GPIO drivers. It's not known for being super fast, but it's not known to be onerous either. That was in C. I do play with Python, but at heart I am a C programmer. So I first jumped to doing the scheduling thing. I thought, let's, you know, tell the scheduler to be a little more intelligent. And I went for SCHED_FIFO. And then I said, let's use the top priority of SCHED_FIFO. And then, so I didn't totally kill the system, I thought, let's reserve a core just for this thread. It's a quad core; it doesn't need four cores for Linux. It can use just three cores for Linux. So I stole one of the cores away and I said, we're going to dedicate this specifically to our thread. We're going to make this thread the very highest priority, and we're going to go with that. The idea being that we want no other threads to run on this core at all, just us. Of course, this is only going to work on a multi-core system. If you did this on a single-core system, I think things would go very badly very quickly.
And then the last thing is to lock the task in memory to keep it from getting paged out. And this is something that obviously only applies to user space; we don't so much worry about the kernel threads being paged out. So here are the results with those changes. We're minimizing latency. We're still in user space. We're just doing the things that we talked about. And it's pretty impressive when you see the difference here. We no longer have those scary 20 millisecond pulses. We've got some pretty decent times there. And we're definitely under that 250 microseconds that the customer talked about. And I let this run for quite a while. There were a couple of runs where I maybe saw it pop up to 120, 130, but I never could reproduce those. So I felt pretty good about it: yeah, I think that'll stay under 250. But, well, the customer was only asking for 10 milliseconds, and I always wanted to do the one millisecond and 100 microseconds and 10 microseconds and see where things were. The 10 microsecond one was a little distressing. It's like, really, 10 to 97? That's kind of sad. Even the 100 wasn't great: 100 microseconds to 179. Okay, it still meets the customer's requirements. But really, 10 to 97. That's quite a margin of error there. I just wanted to see how much farther I could go. I certainly wasn't going to stop here. I'm sorry. Yes. Yes. Very good observation. We've got a 70 to 90 microsecond variance for all of them. So you're getting a pretty good look at the system overhead of a very swamped system. Now, again, remember, this is a very swamped system. So, again, this is pretty impressive. So in the swamping tasks that I'm doing, tasks are starting and dying. I didn't intentionally do that, but for some reason these scripts at some point would start getting killed off. I think that the system was saying something's wrong here. And then new ones would start up, because every task is trying to start a new task.
And yeah, that, and a few other things that I did. In fact, that's a very good point, thank you. The 20 milliseconds that would occasionally pop up: I think that may have been in the case when I was secure-shelling in just to start another window. But I also did that in these tests, trying secure-shelling in and doing some different things. Not very deterministic, because I'm just playing and, you know, typing on the keyboard at the time. But I never did see it go above the 92 microseconds with that. Sorry. The interrupt definition? Interrupt affinity. Oh, interrupt affinity. No, no. So far I've only used what I talked about, what I explained on the previous slide. So no, no interrupt affinity yet. It very well could be, yes. And in some further slides you'll be seeing some of the information that can lead you to that conclusion. I noticed that that shortest pulse is never going to be less than 10 milliseconds. One of the things that the customer was very insistent about is that we can never have the pulse be less than 11 milliseconds, so pretend it's 10 here. And I can't explain, for confidentiality reasons, what the customer was doing. But their requirement was that the pulse could not be even a nanosecond less than their target time. It could be a tad larger, but absolutely no smaller. So that's part of the reason I made sure to put up the times for the shortest pulse, to make sure I never went under. But I took a look at the Linux man page for usleep. So this is expected. And that page says the usleep() function suspends execution of the calling thread for at least the requested number of microseconds. The sleep may be lengthened slightly by any system activity, or by the time spent processing the call, or by the granularity of the system timers. And you can add your own "or"s there, I'm sure. Okay, then I pulled out our Saleae logic analyzer, and I took some external measurements.
And you can see, I wanted to make sure that the internal measurements weren't terribly lying. And it turned out they were pretty good. Running this for a nice long four seconds, the desired 10 millisecond pulse was 30 to 80 microseconds longer than the ideal. So I got a pretty good feeling that the internal measurements were at least not horribly bad. All right, so the user space code was an interesting start, but I am a kernel developer. So once I got a few tests done there, I'm like, all right, we're off to the kernel. We're just totally throwing away the user space code; I didn't even try tuning that any farther. So in this case, for the kernel, I decided, just like before, we're going to use SCHED_FIFO. We're going to use high real-time priority. We're going to reserve a core again. And this time we're writing to the GPIOs directly. That is, I understand, a bit of a no-no. You're bypassing the driver that has been put there for a reason, but if this is your system and you know no one else is going to be accessing the GPIOs, you're relatively safe. Just don't try to go do this on someone else's Linux system and go right to the hardware directly. As we all know, in Linux we have drivers for a reason, but I just wanted to make sure there wasn't going to be a problem with any latency from drivers I wasn't familiar with. I really wanted to get good timings. And then the question: can we disable kernel preemption? Another kind of a no-no, but, you know, we're on our own core, and if we're disabling preemption on our core, no one should have been preempting us anyway. Now, just to be a little more polite, I only turned off kernel preemption during the very important timing period, not the entire time I had the core. Let's see. And then I wanted to compare the results between a sleep, using usleep_range(), and a busy-wait, using udelay(). And then, because I'm disabling kernel preemption, I also suppressed the CPU stall warnings.
I kept getting lots of these warnings, you know, the kernel telling me the CPU is stalled, because I've disabled preemption and I'm killing the core. And then I turned off real-time throttling. I didn't want my real-time process not to get time because I'd taken up too much time already. So, here are the results. You can see the times are looking pretty similar to what we had in user space, which I guess isn't terribly surprising, but it's a little bit better in some and similar in others. And I will admit, being a kernel developer, I was a bit disappointed that it wasn't "kernel rules," but we survived. But you'll notice the 40-microsecond delay, and we're using the kernel sleep for all of these. Anybody have any idea: is this going to be, like, way better when we get around to using udelay, you know, a busy loop? Is it going to be about the same, or is it going to be worse? Some could be too short, you say? How many are shorter? Yes, that is very true. That is a distinct possibility. I'm definitely not surprised. But what about the other end of it? Anybody got any guesses? Is it going to be better? Is it going to be worse? Better. Better. Better. Bitcoin for you. Okay, so comparing the busy-wait delay to the kernel sleep: a lot better. You look at some of these numbers. The longest worst-case pulses are 10 to 30 microseconds shorter than they were with the kernel sleep. The averages are 10 to 35 microseconds shorter for the busy-wait. And with the kernel sleep, you notice, we never get right dead on target, whereas we seem to be really good with the busy-wait. Now, definitely take to heart the statement that was made earlier: if you absolutely, positively can't have anything accidentally too short, be very careful using this. But even the averages, think about that. We have a totally swamped, busy system here, and the average is 5 microseconds over what you expected it to be. That's actually pretty impressive.
Okay, now back to the busy-wait loop idea. Oh, sorry, Dave. No, I didn't save them all. Because I was running on a Raspberry Pi, I was only keeping the maximum, the minimum, and the average. Storing a whole bunch of that stuff, I was afraid, might actually end up skewing things; although I guess that would have been even better, having more disk access and all that. But no, I didn't do that. We were just more worried about what's the worst case and what's the best case, the shortest and the longest. For me, what I was looking for was: what's the very worst case? Because the customer wanted to know, am I guaranteed you're going to be able to give me a pulse no bigger than this? I kind of eyeballed the numbers, because I had print statements coming out that I could take a look at. Every time I got a worse number or a better number, I would see, okay, this happened. The worst cases tended to go through cycles, and a distribution actually would have been really interesting to look at. Maybe that's something that would be good for a future investigation. All right, on the busy-wait stuff: normally you're really cautious about running a busy-wait on a Linux system, because you're burning processor time and you're stealing from everybody else. But again, we're on our own core. Who are we stealing from? Nobody that we care about. But then the question comes up: well, what about some other things that might happen because you're running this processor so hard? Is it going to start making the processor too hot? So if you're doing something like this, take a look at it and ask, okay, am I running this busy-wait loop 10% of the time, 20% of the time, 30%, 5%? If you're doing that, you're going to want to take a look: am I going to have a heat problem, or are there any other side effects I'm going to need to deal with? So this is another one of these warnings.
Don't blame me if you do this and something catches on fire. So I brought out the Saleae again and took a look at this four-second measurement, and the pulses range between absolutely spot on and 30 microseconds too long. And then I did a close-up of the pulses so you could kind of see what they look like, and in this little range here, they're pretty much all at 10 milliseconds. And these pulses, like I said, tended to run in cycles. So this was a cycle when it was right there, dead on at 10 milliseconds. And then finally I decided to try this last experiment with kernel preemption off. Oh, was there a hand raised? I'm sorry, Julia. Interesting thought, being out of phase with the scheduler. Oh, now that's interesting. Anybody want to do any experiments on that? Go for it. I would love to hear the results of that. Yeah, I like that. Okay, so here are the results with kernel preemption turned off. The previous one was with kernel preemption on, and I wanted to compare them. And I was kind of disappointed. Things are a little better, but not significantly better. And then I thought about it and went, oh, there's nothing else running on our core, pretty much. So what is there to worry about preemption from? Yes, that slide is coming. Now, I didn't do it specifically for that core, but I did take... I'll be showing you some top output, and top will show you how much time was spent in interrupts. Interrupt affinity. Yes, interrupt affinity is a very good thing that needs to be done in the future on that. I've got that on my list of things to think about. I can give you the end of this sad story already. We told the customer that we had some good results and we thought this could be done. And they went off and did it themselves. So, as we have learned, you don't always want to just tell people answers without making sure that you have a contract signed. So in essence, we never did end up finishing the project, because we started the investigation and we didn't get the contract.
I don't remember, on the previous slide... the worst case was... oh, for the hundred. No, no, that's a very good question. I do remember seeing that earlier and being a little puzzled by it. But again, since our target was the ten milliseconds, that's the one that I concentrated on. But yes, that's a very good question: why do we see the results for the busy-wait being a bit worse than for the kernel sleep for the hundred microseconds and for the ten? And again, if anybody wants to investigate that, it would be a great thing to add to the white paper. Could be interrupts. Yes. Yes, it definitely could be another interrupt going on. Dave? When the system was completely idle, I did do that, and everybody was just working great. I did not print that slide, I'm sorry. If I remember right, it was never worse than about 20 microseconds off. That was for the busy-wait. With the kernel sleep, I don't think it was much worse than that, if I remember, with the unloaded system. But take that with a grain of salt; those are numbers that I foolishly did not print, sorry. Back to... Okay, so here's top. The first question I wanted to know was, do we really own this core? So, let me find the pointer here. You see, I've sorted things in top here by which processor core we're on. And my first shock was, oh my goodness, what are all of these threads? I thought I told Linux not to do that. Well, Linux has its threads. It's going to put them on every core whether you tell it to or not. So here are these threads out here. But when you take a look, they really haven't taken up much of any time. Now, this guy is... I think this is the running one, right? Okay, yes, the R, I can see it. But again, he's really not taking much time. And then the SH here: my kernel thread is running under this shell. And you can see it's taking 100% of its core. If you look up here at the rest of it, 25.2% system. I'm assuming that 25 is my thread. And then 74.8% idle. That's like, oh, okay, pretty much 75%.
The rest of the system is pretty well idle because those three cores have nothing running on them. This is obviously not a swamped version of the system; this is when I'm just running my test without any swamping going on. So if I were to print the rest of this, it would be an eye test. You would see a bunch of stuff on core two, a bunch of stuff on core one, and a bunch of stuff on core zero, and no one's really doing anything.

Then I said, okay, let's do the same measurement when we have the total system being swamped. And pretty quickly, you go, ah, let's check this out. We have, I can't read that number. Is that 24.5? 14.5, thank you. 14.5% user, and some number in system. And down here is system interrupts. So this is back to the question that you were asking. We definitely have got some system interrupts going on here, and there's a very good chance that some of those may have been thrown on our core. By dealing with the interrupt affinity, I could have made sure none of that was happening on our core. So I don't have any good data on that. But taking a look at this, we definitely are completely swamping this poor system. There are no spare cycles anywhere. There is zero idle. And seeing how busy the system is, and seeing how good those results were, I was actually pretty impressed. We're way under what the customer asked for at 250 microseconds.

Okay, now that was the Raspberry Pi. Even though we met with the customer... Oh, I'm sorry, go ahead. CPU frequency scaling? I did not enable any frequency scaling. I don't know if that's standard in the Raspberry Pi Debian or not. It does or it doesn't? Okay, if you think the chip doesn't have frequency scaling, then that would definitely answer that question. Okay, thank you. And what was your first question? I don't remember. Oh, how accurate was the timing? So were you referring to the timing as in my internal timing that I was doing?
I was using the high-resolution timers. So the high-resolution timers go down to... I remembered at one point. Very nice. It wasn't in jiffies, I can tell you that. Pardon? It's in nanoseconds. It's in nanoseconds. But I know the Raspberry Pi hardware doesn't support full nanosecond resolution. So it goes to whatever it is? Yeah, whatever the Raspberry Pi has in that case. So, yes. But that's part of the reason why I did a lot of external timings, just to verify that my internal timings weren't going to be orders of magnitude off. They seemed to be awfully close to what I was saying.

Okay. Oh, thank you, yes. Yes, yes. Not for that one. I actually did throw it on the oscilloscope because I happened to be in the lab when I was first doing the tests. And then I went back to my desk and I was doing stuff with the logic analyzer. I wasn't seeing anything vastly different, so I wasn't terribly worried about what I was seeing. And plus, it was kind of verifying what my internal measurements were. So again, I wasn't really worried about it.

Okay. So even though I met all of the user requests using the Raspberry Pi, you can never get enough accuracy. So I thought, let's see how good we can get with the BeagleBone Black. Before I started working on this, I actually had very little hands-on familiarity with the BeagleBone Black, other than having seen people playing with it in the past and seeing it running. So this was my first chance to really dive in there and get to know it. For those that don't know, it's a single-core ARM for the Linux side; that's a Cortex-A8. And there are two PRU cores. The PRU cores are programmable real-time units. Now, we've got the single-core A8. So this trick that we had of, let's take one of the cores and isolate it over here and make sure that nobody can touch it, obviously is not going to work on a single-core A8.
But these two PRUs, let me give you some of their statistics here. They are 200 megahertz 32-bit processors with single-cycle IO access. Each has 8K of program RAM and 8K of data RAM. There is a 12K shared data RAM between them. There are 25 low-latency IOs for the PRUs, and at 200 megahertz, a PRU gives deterministic sub-microsecond GPIO control down to 5 nanoseconds. And they're not pipelined, so you're actually getting real-time response. So I thought, ooh, these PRUs are pretty nice. We could do some pretty nice stuff with this. Let's check that out. And notice, this is a very, very simplified picture of the BeagleBone Black. I ripped most everything out of there, but notice what I've left in: there's a connection between the PRUs and the Cortex-A8. Keep that in mind for the next few slides.

So obviously you're not going to be running Linux on these PRUs. There's nowhere near enough RAM for that. You're going to be writing your own thread code. So the thread's going to be running on the PRUs. And since there is no OS, there is no OS overhead, which is quite an advantage. All the advantages of an OS go away as well, and we know the advantages of having an OS. So you're going to create a thread, and in my case, I need to create an exact pulse width. So I'm going to count the number of lines of code I've got, because each one of those is going to take an execution time of five nanoseconds. And then I put the delay in there for how wide I want the pulse to be, and that's going to be my pulse width. And as I just mentioned, I've got a connection between the PRUs and the A8. So a thread on the PRU can communicate with Linux on the A8 and vice versa.

You look like you were going to say something, Mike. Okay, no. Oh, okay, okay. Yeah, I didn't look into them. I just knew they were the PRUs. I'm sorry? Oh, they look more like a CPLD. Interesting. Well, you know, I just sat down and learned the programming language and went for it.
Oh, so for more information on the PRUs, three years ago, Rob Burkett did a presentation, Enhancing Real-Time Capabilities with the PRU, and I ended up using that to figure some stuff out, so that was a pretty good presentation. All right, steps to getting stuff working on the PRUs. There's another presentation, one that Jason Kridner did, that's on the BeagleBone website. So you write and compile the firmware that you want running on the PRUs. You use the A8, which is going to be running Debian Linux, to load this firmware onto the PRUs, then create device tree entries to configure the kernel and the IO pins. Obviously, I need to drive a GPIO, so I need to tell the system to let me have access to the GPIO pins. And then implement a communication mechanism between the A8 and the PRUs.

So the PRUs, these guys are very accurate, obviously, but they're limited. They don't have a lot of RAM. You're not going to be implementing megabytes of code on them. You're going to need to figure out how to divide up your logic between the A8 and the PRUs: any non-real-time stuff on the A8 in Linux, and any real-time stuff on the PRUs. I was very fortunate that for my experiments, I really didn't need to have the A8 do much of anything. It just loads up the PRUs, because pretty much all of the work was being done on the PRUs. If you remember, I'm driving the GPIO line high, doing a wait, and then driving the GPIO line low. That didn't take a lot of code, so I was pretty much all on the PRUs.

Now, there are two commonly used drivers to load code on the PRUs from the A8. One of them is called uio_pruss. It memory-maps all of the PRU registers into user space. Now, that can be a bit dangerous, but some people find it easier to use. What's actually much more highly recommended is pru_rproc. This is a proper Linux driver using remoteproc. It's very much preferred. I used pru_rproc.
When I was talking to Jason about it, and I mentioned that I was using pru_rproc, he was like, thank you. Evidently, it's very strongly preferred.

BeagleBone Black results. Wow. My logic analyzer actually couldn't measure any error. I hit the Nyquist sampling limit, so I moved over to our high-speed oscilloscope, and it showed that the error was on the order of no more than tens of nanoseconds. I actually need to take a look at my code a little better and make sure that some of that error wasn't my fault in counting the instructions in my loop. If I were to create a chart for the BeagleBone Black like my previous charts, the ideal pulse, the shortest pulse, the longest pulse, and the average pulse would all have pretty much the same values. That would be a really boring chart, so I didn't bother creating it.

Again, you can never have too much accuracy. What happens if you do a 25-microsecond pulse? Forget the whole 10-millisecond pulse. How accurate are we going to be down at 25 microseconds? Back to the oscilloscope and my high-quality photograph I took here. Some of the pulses are 25 microseconds, and you can see down at the bottom, maybe you can or can't, some of the pulses were 24.95 microseconds. Pretty accurate. Then I decided to zoom way in just to see what the quality of these signals was. I was actually impressed how clean that was. Now, I didn't use any special connectors for the measurement. I just took the probe and popped it right on the little debug output of the header, so that is impressively clean. What's that? I would assume the ringing is the probe. Yes, exactly. I looked at that and went, huh, pretty impressive. These PRUs really impressed me. Yes. On this one, I didn't do any interaction between Linux and the PRUs other than Linux loading up the PRUs and just saying go. At that point, it's completely self-contained.
It goes high, waits its time, goes low, waits a little bit, goes high, waits its time, goes low. The only other thing that I didn't do a slide for is the other PRU. This is one PRU; I set up the other PRU to look for input from another line, and I could just stop the thing from running. I didn't do any timing to see how quick that was. That's a really good question: how quick is the communication between the two? The BeagleBone Black has set up some really good paths between Linux and the PRUs. If you use certain ones, you're definitely going to get a lot more accuracy than with other ways of communicating.

Oh, my goodness, time is just about up. I am so sorry. How did the busy-work processes affect the PRUs? Eh, not really. I didn't expect that they would, and they really didn't. The busy-work processes run on Linux on the A8. The PRUs don't even know or care about them. Linux is the bootloader for the PRUs in this case, yes. That's a good way of looking at it. Now, the one thing to remember: if you do end up having a lot of A8 interaction with the PRUs, any busy work that is happening on the Linux processor could be impacting that.

Which one should you use, the BeagleBone Black or the Raspberry Pi? One of my coworkers is famous for saying, what problem are you trying to solve? Do you need deterministic control, and do you feel comfortable writing threads on the PRUs? Definitely go for the BeagleBone Black. That will work great for you. Do you feel more comfortable writing all of your code in Linux, and you don't mind 100 microseconds, 200 microseconds moving around a little bit? The Raspberry Pi is going to work for you.

Future investigations: interrupt affinity, dedicated PRUs, cache misses, can we lock all the code in that L1 cache? And would we get any advantage driving SPI instead of GPIO? And then, yeah, all of the stuff that people have been asking about. Someone really ought to do that.
For all the details of the experiment, go get the white paper. It actually isn't live right now, but it will be very soon; that white paper is being formatted as we speak. Go to the Downloads tab, which is currently under our Game Changers tab, and download it. Feel free to send us email and let us know if you did anything with this. If you want more details, we'd be glad to talk to you. Thank you.