I am ready. Good morning everyone and welcome to the start of the Embedded Track. We are going to be here two days, today and tomorrow. Today we have a series of talks that will end about 6 p.m., and tomorrow we will start at about 11, right after the keynote, and then go until about 5 or 5:30 in the afternoon. So glad to have everybody here. Our first speaker today is Alison Chaiken. She is going to be talking about the first second of a Linux boot. She has been working in embedded Linux for quite a few years, did some physics work before that, and now she is here to present to us. Alison, take it away.

Thanks very much, Tom. So this is a talk about the Linux boot process, not a talk about Postgres. I don't know anything about Postgres; if you are hoping to hear about that, you are in the wrong room. Also, just to get it out there before we start: I support Muslim immigration into the United States. But enough about all that stuff. Let's talk about nerd topics, like how Linux starts.

Since I am talking about the boot process, the first step of the boot process is off. Off may be a concept that you think you fully understand, but I will try to convince you that you are wrong. The boot process is, in its own way, actually one of the more comprehensible parts of the kernel, I think. I am going to try to convince you that with tools you already have, even if you are not a kernel hacker or an embedded developer, you can learn a lot about how your system boots. Because the beginning of the boot process is single threaded, and because no matter what kind of developer you are you have a kernel image on your system, you can learn a lot about the boot loader and the beginning of the kernel on your own pretty easily. So I am going to try to answer a lot of the questions that people might think of. A common embedded question is: why do we have a shim boot loader? This is my boot loader.
I have two binaries; who ordered the second boot loader binary? I will talk a little bit about ACPI, which is what x86 and some ARM 64-bit systems have as a firmware basis, and compare it to the device tree that 32-bit ARM and Android and so forth are using. I'll talk about how the kernel really gets started, how it becomes multi-core, and when the second thread of the kernel starts after the system boots up. I will answer the puzzling question: what is the initrd, and why do we have one? I'll talk a little bit about how user space starts. I already showed you my hibernate/suspend demo, because I think the last time I presented this, that demo came very late in the talk. Topics I am not going to cover, because they have been truly beaten to death by previous speakers whose presentations are linked here, are systemd, UEFI, secure boot, and fastboot in automotive and other spaces. Resources are out there on those topics, and there is plenty of other material to present.

So first, a warning: there is material in this talk that is for mature audiences. There is potentially a kernel panic and a boot failure, and before your very eyes this morning I will successfully dereference a null pointer. So if you are a person of sensitivity and squeamishness who cannot bear to see these actions taken, you may leave the room at this time.

Before I get too far into this and forget, I will say that a lot of this talk is inspired by this wonderful book, Embedded Linux Primer by Christopher Hallinan. I actually bought this book at my first SCALE, at the book show, a few years ago. I like it so much that, now that the second edition is out, I am willing to give away this copy of the first edition, which is much stained, highlighted, annotated, and altogether beaten up, to the first person who wants it, in the spirit of open source. So come up to me afterwards if you would like to claim this wonderful book. So, on to the technical part of the presentation. Let's talk about the off state.
So the off state is what happens right before the on state, but there are two or three flavors of going from the off state to the on state. There is the most obvious one, very often called cold boot or power-on reset: everything in the system, including memory, has been cleared. This is what happens when you plug in the device, so that seems quite obvious. Then there is warm reset: if you say systemctl reboot or shutdown -r now and the system goes down to the boot loader and comes back up again, that is warm reset. In warm reset, traditionally the system sets the memory, the DDR controller, to self-refresh, so the memory is not cleared, and then a lot of the other hardware can be reset. A lot of the details here differ by processor. Besides manual rebooting, a watchdog reset or a JTAG reset is also in this category. This category is actually very important, for example, because data can be passed in memory by Linux to the boot loader. So if you're working, say, on an IoT project where you want to do software updates, you may want to read from the boot loader the information that Linux has written. And if you are interested in the software-update topic, I direct your attention to a wonderful talk given on this topic by Matt Porter of Konsulko, whose slides and video are up from Embedded Linux Conference Europe.

So let me change the color of this. This is my i.MX6 board sitting here; the i.MX6 is a Freescale ARMv7 processor that's very popular in automotive, where I've worked a lot in the last five years. I'm just going to show you: if we go back here to the very beginning of my net boot, you can see the reset cause is POR, power-on reset. And if I log in and reboot, which is basically what I'm going to do with these systems during this presentation, it would be hilarious if this actually didn't work. Okay: systemctl reboot. So now we're back to U-Boot, and now you can see the reset cause is watchdog.
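To make the reset-cause idea concrete, here is a minimal sketch, in Python rather than real U-Boot code, of how a boot loader might combine the reset cause with a failed-boot counter kept in non-volatile storage to fall back to a backup image. The function name, cause strings, and threshold are all made-up assumptions for illustration, not any real boot loader's API.

```python
# Illustrative sketch only: how a boot loader might pick a boot slot
# from the reset cause plus a failed-boot counter. Names and the
# threshold are assumptions, not real U-Boot code.
def choose_boot_slot(reset_cause, boot_attempts, max_attempts=3):
    """Return (slot, new_attempt_counter).

    reset_cause: e.g. "POR" (power-on reset) or "WDOG" (watchdog)
    boot_attempts: failed-boot counter from non-volatile storage
    """
    if reset_cause == "POR":
        # Cold boot: counters from a previous life are meaningless.
        return "primary", 0
    if reset_cause == "WDOG" and boot_attempts >= max_attempts:
        # The watchdog keeps firing: the updated image never came up,
        # so fall back to the known-good backup.
        return "backup", 0
    return "primary", boot_attempts + 1
```

The point is only the shape of the logic: a cleared reset reason (as on the TI parts mentioned below) breaks the first branch, which is why losing it is such a problem for over-the-air updates.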
So the point about this is: if you are doing over-the-air updates and you've updated your system, you can pass information, for example the reset cause, from Linux to the boot loader, and that's how the boot loader can detect that your boot is failing and, say, choose a backup image. This mechanism looks trivial and boring, but it's actually really important in practice. And it's a major problem in the work project I'm on now that the TI processors we have clear the reset reason, so one of my work tasks is to work around that.

Now, what's interesting about off, and why you might not understand it as well as you think you do: you may have noticed that if your system is off but your LAN cable is plugged in on your laptop, an LED is illuminated. Why is that? That's because there's a third type of power-on called Wake-on-LAN, where the network can actually wake the system up. This is implemented now in all Intel and AMD x86 processors. Intel calls it Active Management Technology. They point out how, over the network, owners of embedded systems can monitor them even when the systems are quote-unquote powered off. In server land, this feature is called IPMI, an acronym whose expansion I'm afraid I don't remember. Someone in the audience who's a sysadmin will remember. Who can remember what IPMI stands for? Yell it out. Nobody? Yeah, something like that. So I don't feel so bad now. The IPMI is run from a microcontroller called the Baseboard Management Controller, and as the name suggests, that controller is on the PC board, the mobo, that your processor is on. What's more interesting about the AMT, the Active Management Technology, is that it's run from the graphics and memory controller hub, a microcontroller which used to be called the Northbridge, and that microcontroller is actually inside the SoC on modern processors.
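Going back to Wake-on-LAN for a second: the wake-up itself is triggered by a "magic packet" with a simple, well-known format, six 0xFF bytes followed by the target's MAC address repeated sixteen times, conventionally sent as a UDP broadcast (port 9 is common). Here is a small sketch; the MAC address used in the usage note is a made-up example.

```python
# Build and send a Wake-on-LAN "magic packet": 6 x 0xFF, then the
# target MAC repeated 16 times, sent as a UDP broadcast.
import socket

def magic_packet(mac: str) -> bytes:
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(mac_bytes) != 6:
        raise ValueError("a MAC address is 6 bytes")
    return b"\xff" * 6 + mac_bytes * 16

def send_wol(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    pkt = magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        s.sendto(pkt, (broadcast, port))
```

For example, `send_wol("aa:bb:cc:dd:ee:ff")` would broadcast a 102-byte packet; the NIC, which stays powered in the "off" state, watches for its own MAC in exactly this pattern.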
And some remarkable reverse engineering of x86-64 by Igor Skochinsky of Hex-Rays, which is described in the talk that I've got the hyperlink for here, shows that that graphics and memory controller hub is actually an ARC-architecture processor, not x86, and it's actually running the ThreadX RTOS, which, if you consider that Intel makes a number of microcontrollers like the Edison and the Galileo and is now pushing its own Zephyr open-source RTOS, seems like an interesting choice. They've turned away from their own dog food here. But the worrisome thing about this graphics and memory controller hub is that obviously it's closed source, it's inside your SoC, and unlike your baseboard management controller you really can't remove it or put probes on it to see what it's doing. So that does seem a little worrisome. There's actually an excellent series of articles about this by Ruben Rodriguez and Donald Robertson from the FSF, which have a lot of interesting technical content. The equivalent is called the Platform Security Processor in AMD land. On ARM, as far as I know, despite the various privilege modes and security states that an ARM processor can have, there is certainly nothing like this.

But enough about off; we've covered that in full now. Let's move on to the next stage of the boot process, which would be the boot loader. Here is a very nice diagram from Intel that kind of shows you what boot loaders do, starting over here on the left. You start in the SoC with some boot ROM that it has. It then performs what you could call early init, which involves initializing a lot of the subsystems. The most important task of the early init is to bring up the DRAM. DRAM has its own controller and is separate from the SoC, so DRAM is actually not accessible to the processor when the power first comes on. Then there is an initialization of a whole bunch of other devices. The irony about a boot loader is that it brings up all the IRQs, GPIOs, and timers and brings you to the boot loader prompt.
Then, before it starts the kernel, very often all that state is lost, and the kernel sometimes has to initialize all those subsystems again. Since this is x86 or ARM64 in this picture, there are ACPI tables; ACPI is Advanced Configuration and Power Interface, if I'm not mistaken. This is where we would have the device tree instead on an ARM system. The device tree is kind of a manifest of the hardware that's in the system, the hardware capabilities. The kernel has tried to separate out the data about a system into the device tree, so that the methods that operate on the data are what remain in the kernel proper. So it's kind of a modern design. The ACPI tables have a different format, but the idea there is similar.

Getting to the topic of why my boot loader has two binaries: why do I have an x-loader, or a shim boot loader, or sometimes it's called MLO, and the main U-Boot image, if you're using U-Boot? The answer is that the shim boot loader does this early part of the init. There are kind of two reasons why people have shim boot loaders. One reason is that very often, across an entire family of processors, u-boot.img will be the same, but the x-loader will be processor specific, and so it just makes code maintenance easier to separate off the hardware-specific part of the boot loader into a separate binary. The other, more important reason from a practical point of view is that before the DRAM controller I was just talking about is initialized, the amount of memory available to the system is really tiny. U-Boot has been accused of being bloatware: it does have a lot of features and supports all kinds of systems, like NFS; this board, for example, is net booting, so it's using the NFS part of U-Boot, but U-Boot now supports all kinds of other devices and protocols and so forth and so on. So it's got a lot of code, and that code won't fit into the memory that's available when the system first comes up.
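That size constraint is the whole story of the two-stage design, and it can be sketched in miniature. All the sizes below are made-up round numbers for illustration, not real figures for any SoC or U-Boot build.

```python
# Toy model of the two-stage boot constraint. All sizes are invented
# assumptions for illustration only.
SRAM_BYTES = 128 * 1024        # on-chip SRAM usable before DDR init
SPL_BYTES = 60 * 1024          # small hardware-specific shim loader
FULL_UBOOT_BYTES = 600 * 1024  # full-featured loader (NFS and friends)

def first_stage_fits(loader_bytes, sram_bytes=SRAM_BYTES):
    """Only a loader that fits in on-chip SRAM can run before DDR is up."""
    return loader_bytes <= sram_bytes

# The full loader is too big for SRAM, so the shim must go first:
assert first_stage_fits(SPL_BYTES)
assert not first_stage_fits(FULL_UBOOT_BYTES)

# The shim's job, in order:
BOOT_SEQUENCE = [
    "boot ROM copies shim loader into on-chip SRAM",
    "shim loader initializes the DDR controller",
    "shim loader copies full U-Boot into DRAM",
    "jump to full U-Boot, which loads the kernel",
]
```

The only "algorithm" here is the inequality: whatever runs before the DDR controller is up must fit in the memory wired directly to the SoC, which is exactly the next point.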
Before the DDR controller is up and there are page tables and virtual memory and all the gigs of memory you may now have, the only thing that's available to the system is the memory that's actually attached to the pins of the SoC. Typical SoCs now may have some SRAM, or maybe they have a parallel NOR chip whose data pins are wired directly to the pins of the SoC, and since this memory has no intermediating microcontroller, the processor can use it as soon as the power comes on. So these shim boot loaders can start in that tiny memory, and then all the shim boot loader really does is bring up the DDR memory controller so that you can then start the main part of the boot loader.

The part that's happening towards the end, potentially, is the initialization before your kernel comes up. In x86 land this is ACPI, the Advanced Configuration and Power Interface. The ACPI is described by tables which are stored in ROM on your system. The ACPI, unlike the Management Engine or the Platform Security Processor that I was talking about a few minutes ago, is available to you, and you can change it and read it to your heart's content if you want. Actually, you may sometimes wish to do so, because it's pretty buggy in my experience, which is why, if you came in early, I was telling you about how I run suspend and hibernate from the command line: in my experience that always works, whereas if you close the laptop lid you're actually triggering ACPI events, at the mercy of whatever is in this (from my point of view) weird firmware. Let's just say I haven't had a lot of luck getting my personal systems to come back on after they have hibernated. With the command acpidump, which is available I think in the ACPI tools package, you can see what's in your ACPI tables.
I'll show you. The ACPI tables are very long, but I will show you my favorite part, since I'm the kind of girl who has a favorite part of the ACPI tables. If you look for the string _OSI, that's Operating System Interfaces, here we find, bless their hearts, that this laptop has been prepared by Lenovo to run Windows 2001, Windows NT, Windows XP, Windows 2001 Service Pack 1, Windows 2001 Service Pack 2... anyhow, you get the idea. And then, for whatever reason, Linux and FreeBSD only have one line each, whereas the Windows content actually continues on after this. It's fascinating. I will not bother to speculate why this is, but if you want to learn more about how to tell whatever system you're connecting to that you're running Windows XP, this would be the way to do it. I think if you open port 80 and broadcast to the world that you're running Windows XP, it certainly would make for a most entertaining couple of hours.

So ACPI tables are a little bit different from the device tree, which is what 32-bit ARM has, as I was saying; I'm going to talk a little bit more about the device tree in just a second. But ACPI actually has a Linux device driver that's running all the time as part of the kernel, so it's active after the system is up, and it's involved in doing things like running the fan and listening to events on the power button and the other hard buttons on the system. The code from the coreboot project can be used to actually replace most of the functionality of the ACPI tables. And I should say also that the coreboot project has a table on its web page that lists the state of implementation of this Active Management Technology in the microcontroller on the x86 SoC, and it is really super good; so if you want to know, for your particular laptop, what is known about that GMCH microcontroller, you can read about it at the coreboot wiki. If you want to learn more about ACPI tables, take yourself to the Arch Linux wiki for a lot of insight into what you can read
there. So the device tree, unlike the ACPI tables, for the most part doesn't have any role after boot. In recent years a developer named Pantelis Antoniou, who some of us know, has added a feature called device tree overlays, which are configuration fragments of the device tree that can be dynamically loaded in accordance with hotplug events on an ARM32 system after the device is up. So "no role after boot" is not quite true, but for the most part the device tree is really a boot-time artifact, whereas the ACPI tables have a role both at boot time and ongoing in these power controls.

The boot loader can be something of a black box for people, but here is just a little bit of information about how to get the boot loader to print out more. Of course, if your system boots, you would be crazy to change the boot loader: you should never touch the boot loader on a system that's working. However, if you did wish to do so, if you recompile your U-Boot with these configuration settings, you get this chatty little dialogue here at the beginning of boot, which I can show you. Here we are at U-Boot, so if I run the net boot here on this i.MX6... whoops, this table went by showing the boot stages; all right, come back here. So the reset and the main loop of the U-Boot state machine are loading here; Ethernet is starting, because I'm net booting, so TFTP and Ethernet have to start; the beginning of bootm, which is the boot-from-memory command that I'm running, is right here; and then the kernel start, which is finally coming to Linux, is here. There are all these IDs that look incomprehensible in between, but if you go and look at the bootstage.h file you can learn what all these magic numbers are, and in fact it's consistent with what we see here right before the kernel starts: for example, ID 15 is "exiting U-Boot, jumping to the operating system". So this type of debug info is really helpful: if your U-Boot is malfunctioning, you can get all this chatty printf output before it stops working. So I
talked a little bit about passing info from Linux to the boot loader. How the boot loader can pass information to Linux is a lot less mysterious, because it can just put it in the kernel command line; that's pretty obvious, and a lot of people know that even with GRUB on a laptop, if you interrupt the boot, usually by hitting "e" at some particular point, you get into a menu where you can simply type things and append them to the end of the kernel command line. I actually had a whole demo where I was going to show that, but as I mentioned at the beginning, I have to be realistic about how much material will fit into this talk.

Another way that's kind of newer and sexier and more embedded to pass information between the kernel and the boot loader is to use some facilities called mtdoops and ramoops. So I'm going to show you a little bit about mtdoops. MTD is Memory Technology Devices; that means raw flashes: raw NAND, SPI NOR. Matt Porter is going to talk about SPI next; maybe he'll talk about SPI NOR, and there's parallel NOR, or it could even be a USB stick, I guess; I think you can write to USB from MTD. Anyhow, mtdoops, as the name suggests, is a method of writing out the stack trace produced by an oops, which is kind of a little mini kernel panic, to a flash device. So here is the device tree for my i.MX6; you can see the device tree kind of looks a lot like JSON. Of course, in the kernel community we didn't simply use JSON for the device tree; we created something almost identical to it, but which is different and annoying in slight ways. I've written four partitions on my SPI NOR, which is this kind of device. This device tree fragment is part of a larger device tree that gets compiled, read by the boot loader, and passed to the kernel at boot, and it's kind of a manifest of the hardware on the system, and it does include things like partitions. I've actually implemented this on this board, and if I go back here... so here's my board, log in again. Is this font big
enough? People, yell out if it's not. So I've got the mtdoops kernel driver compiled in. I can look at the information with modinfo, and we see that it's an "mtdoops panic/oops console logger" driver. What that means is that if this driver is loaded and properly configured, when there's a kernel panic the kernel is going to write out the stack trace to the flash device just before it dies, which means that the boot loader could read the reboot, well, the boot failure, the crash reason, from the flash. This is really useful if you have an embedded device that's out in the field and someone brings it back to you, RMA, and says it doesn't work, it keeps crashing. Well, what were you doing to it when it started crashing? This is a way to actually read that information back even after the device has been powered off; you don't have to rely on somebody taking a photograph of the screen, speaking as someone who has taken many screen photographs over the years.

So I'm going to load this driver, and you may remember from my device tree that my partition is number three, and I'll say dump_oops=1, so that an oops will get written to the flash. And now the kernel says it has attached the oops facility to MTD device three. Now let's load another module of my creation, especially for this demo, called null_pointer. The description of null_pointer is, do something that you shouldn't, without any precautions; I offer this under the GPL for people who would like to try it themselves. Golly, that sounds like fun; let's load null_pointer, I wonder what will happen. Oh no, null_pointer doesn't work! However, the good news is that the mtdoops driver here reports that it's written out to page 11 of the mtdoops partition, and so if everything is working properly, the stack trace is going to look a lot like this. This wasn't a panic, this was just an oops, but if we now say systemctl reboot... now here we are at the boot loader again. So we tell the system to probe the SPI flash, and now say sf read to some
particular address... maybe I should first say sf, to see the help for the sf read command. We store to a particular memory address, from a particular partition offset, a particular length. So we'll say sf read, and you'll probably remember from my device tree that this is where the beginning of my MTD partition is, because that's the type of thing one remembers on embedded. First I have to give the memory address; the reason to become an embedded developer is that if you enjoy typing hex numbers, you'll really love this profession. So I've read that page, and now let's display it... oops, actually, let's display the whole page. So here's the whole page of memory, which looks like a lot of gibberish, because it is. If we look in here closely, we can see, in fact, here's the information from the crash: "Unable to handle kernel NULL pointer dereference". So that's how mtdoops works, since the SPI flash is an MTD device.

The ramoops driver, which I just mentioned, as the name suggests saves the oops to non-volatile RAM, and you might say, hey, I don't think I have any non-volatile memory on my system. Well, this is one of the most exciting developments, in my mind, coming to Linux and computing in general in the next few years: non-volatile memory is coming, not necessarily banks of flash like NVMe that, you know, Bitcoin miners or somebody can afford, but actually DIMM-format non-volatile memory that people can have in personal and embedded systems. I've got some articles about it here; these magenta texts are all hyperlinks. But basically the imminently available non-volatile memory is called 3D XPoint by Micron, and Optane by Intel, and it's actually co-manufactured by them at a joint site in Idaho. This memory has gigabit speed like the DRAM you have now, but it retains the data when the power is off, indefinitely, and it's actually based on a technology similar to optical recording, believe it or not, but in chip form. The advent of this memory is going to be really profound, not only for Linux and its boot process, but
really it's going to have a lot of impact on the Linux kernel. An example that's been pointed out by Matthew Wilcox, who's given a number of interesting talks about this, is that the need for the block layer of Linux goes away. I don't know if people know what the block layer is, but a major subsystem of the kernel is the block layer, which is the in-memory representation of data that is part of file systems. So when you open a file, that most intrinsic Linux activity, you're actually reading data from the file system, from the permanent storage, into memory. But if you had memory that ran at gigabit speeds, you wouldn't need to copy the data; you could just run from the memory, you could execute in place, as people call it. You could have your stack and your heap actually be persistent, and people have pointed out that this is going to have profound security implications. Matthew Wilcox is working on this; he's changed the name from XIP, execute in place, to DAX and is rewriting the subsystem, which is exciting to watch. And this is not science fiction: in fact, Lenovo has announced a laptop, shipping this year, where this Optane memory will be used as a cache. I wouldn't be surprised if the Linux kernel doesn't support this when it first comes out; we may take a little while to catch up, as we sometimes do. But in the next year I think this is going to start being a big development, and obviously the boot process will be affected, because if the system can suspend easily, at almost no cost, and come back up really fast, because it will be reading from memory, then you start to question why you'd ever want to turn a laptop all the way off; it becomes more like the way you treat your phone.

But having covered that in enough detail, I'll probably spend the remaining time talking about bringing up the kernel, and hopefully get a little bit of time to talk about initrds. In the meantime, if anybody has any questions, please feel free to ask. But on to the kernel. The kernel is a regular ELF binary in
some sense. ELF is Executable and Linking Format; all the binary executables on your system are ELF files, whether you knew it or not. If you use the file command you can see that; let me just show you that briefly. If I change to a kernel source directory where my i.MX6 kernel is, and if I use the ordinary file command that you can get with apt-get install, and inquire about vmlinux, vmlinux.o, arch/arm/boot/uImage, and arch/arm/boot/Image, these are all files produced by the ARM 32-bit kernel build, and we see that vmlinux and vmlinux.o are both ELF binaries. If I say file /bin/ls, that's also an ELF binary, so these kernel binaries really are, in some sense, plain system executables. The uImage that we use a lot with U-Boot is the same, except with a special U-Boot header on it. The Image file produced by the kernel compilation is actually the raw binary without an ELF header, so the file command has said it's a "sendmail frozen configuration", which probably is not the right answer, I'm thinking.

Now, if you have an ordinary laptop like this and you don't happen to have a compiled ARM kernel, you might be bored; you might be bored anyway, even if you do. But you can look in your regular /boot at the kernel binary there. The vmlinuz in /boot is a Linux x86 executable, a bzImage. What is a bzImage? Well, a zImage is actually a self-extracting kernel image: it has assembly code prepended to it that instructs the kernel startup code how to extract and uncompress itself. That header is where the message about uncompressing the kernel comes from, if you've seen that when booting an embedded target. This is really quite a slick trick. Now you can actually... well, I'm getting ahead of myself here; let me go back. So the kernel boots like an ELF binary because it is an ELF binary. You can actually read the sections from your kernel using readelf, which is one of the binutils you can use to look at any ELF binary, and you can use the tool extract-vmlinux to get
your kernel out of the vmlinuz. Let me just show you: readelf -e vmlinux, and this shows you the segments of your, or actually, excuse me, the sections of your ELF code. If you've ever looked at a regular C program, you're familiar with .text and .data and .bss, and these are the same things they always are. So this is working because vmlinux is a regular ELF binary. If you have instead the vmlinuz, there's a script called extract-vmlinux that will allow you to get your vmlinux out of the vmlinuz, if you want to look at it with any of the methods I'm about to show you; and even if you don't have a kernel source tree, I'm sure you could just download this script and use it.

So if the kernel is an ELF binary and it starts like an ELF binary, then clearly it starts up in the way that ELF binaries start. But how do ELF binaries start? Who thinks they know how ELF binaries start? You think you know, you cheaters. How many people think that execve jumps into the main function and starts there? This is obviously a trick, right? The good news is that you can use your friend GDB to look at what your ELF binaries are doing, even on a cross system, as I will just show you. First let's look at a regular ELF binary. I have here the program drm-info for ARM; drm-info is part of the DRM tests. Just to show you what it does, if I run the x86 version: this program just prints out a bunch of information about the display hardware on the system. But let's look at what's inside the ARM version to find out how it starts. We'll do that by using our great friend GDB. People say that we only use one tenth of our brain, or something like that; I think we know how to use 0.1% of GDB. GDB has thousands of features, and here's one of many, many that you didn't know about. So we've read drm-info, which is a regular user-space program, with GDB, and now we'll say info files, and here we see once again the .data, .text, .rodata, .bss sections of the
program. But here's what's interesting: the entry point is also printed. So even though this is an ARM binary, the cross GDB will read it, and we can actually list the program where the executable starts up, where the entry point is, and lo and behold, it's in start.S. And this being GDB, we can list a bunch more of it, and we find out how this user-space program, and actually all user-space programs on ARM, start. They go into this start.S; the program knows that the frame pointer and the link register, which holds the return address for the frame, have garbage in them, so the first thing it does is write zeros into them. It continues on, it reads argv, and it creates a stack; it's all very logical. So all this is, is constructors, as they're called (you may be familiar with that concept from object-oriented programming), at the beginning of each user-space executable, that read the environment from the invoking shell, set up the stack, provide the heap, and provide the standard-in and standard-out file descriptors to each process. That's what goes on.

Since I've given away the fact that the kernel is an ELF binary, the logical question you'd have to ask, if you are unnaturally fascinated with such things as I am, is: suppose we tried the same thing with the kernel, since it's also an ELF binary? So we now invoke the cross GDB on the kernel binary; let's use vmlinux.o, which is the first program output by the kernel compilation and has the most information in it. This is a big program, so GDB is taking a moment to read it; come on, GDB. Okay, there we go. If we now say info files, here are the sections in the kernel, and lo and behold, you've got things like .rodata and .bss and so forth, just like you do in drm-info, the user-space program; up here at the beginning is .text, and here's the entry point, which we see is at 0x0. Suppose we dereference that address and see what's there. It works: a NULL pointer dereference! And in fact the kernel is not in start.S, which is
where user-space programs start, because after all it's not starting in a shell, it's starting from nothing. Actually, I mean, it's starting from a system whose hardware has been prepared by the boot loader, but like the user-space program before its startup code runs, it doesn't have a stack and it doesn't have a heap, and no C-language program can proceed without that type of resource: you need to be able to push things onto a stack, and you need to be able to malloc memory and create objects, in any C-language type of program. And head.S does that for the kernel. If we list a little bit more of this program using GDB, we can see what it does; you don't have to know a lot of assembly, I would say, to be able to understand this stuff. Really, the takeaway from this whole talk is that the kernel boot process is surprisingly not one of the harder parts of the kernel to understand; I think if you go and look at it with GDB, you kind of get the picture. So here's what the kernel does at the beginning: it checks to see if it's in a hypervisor, it turns the MMU on, good idea, then we can have virtual memory, and it's got an entry point here for secondary CPUs. I'm not going to belabor this, but the point is just that by using tools like file, nm, and strings, which are available in regular packages, or from Linaro for cross tools, you really can find out a lot about what's going on. And this assembly code is actually very well commented and not that hard to understand with kind of a basic knowledge of the kernel. I'm not going to show it, but if you use extract-vmlinux to get your distro kernel from the vmlinuz file, with regular plain old x86 GDB you can do very much the same thing.

So I'm going to skip past all that; a lot of these slides are actually just the things I've just shown you, but as screenshots for the web. Now let's talk about what really happens when we get past the preliminaries and we have a stack and a heap and all that groovy kind of stuff. Then we actually get into the kernel's main, which is
The kernel's main lives in init/main.c, and once again, this code is actually relatively reasonable to read, I think. So I open it up here with the mighty mighty Emacs and go to the function called start_kernel, which is where the real action is (let me make this font a little bigger; hopefully that is big enough). A lot of the code in this part of the kernel is marked __init. If you have ever read the dmesg buffer and seen a message go by that says "Freeing unused kernel memory" or something like that: the kernel automatically frees all the resources associated with any function marked __init when it gets to that point of boot. And you may become painfully aware of this if you are working on a device driver and you find that it doesn't hot-plug. If it relies on a resource, say a bus driver, that was loaded at boot and marked __init, all the data structures associated with that driver are gone once that initial memory has been freed. So that is a word to the wise; I think it took me about two days to figure that out when I was assigned a bug related to it. If we look in this start_kernel and see what is there: the kernel starts, it sets up its own stack resources, and some highlights are setup_arch and the parsing of the kernel's own command line, which is where it reads the device tree on ARM devices. It starts the memory manager, so it has virtual memory. Here in this start_kernel function is where it prints out the kernel command line string, so if you've watched the boot process, you know exactly where you are. An interesting point to think about is that in this function the kernel is running single-threaded. Unlike most of the kernel, which is very asynchronous, with a lot of preemption, scheduling, and interrupts, where if you try to follow the kernel with KGDB, the kernel's own debugger, you jump around all over the place and it's really, really hard, as with any concurrent system, to figure out what the heck is going on half the time.
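The __init pitfall described above can be sketched in a few lines. This is an illustration in kernel style only, not a real driver: mybus_init and probe_board_quirks are made-up names, and the fragment is not meant to build on its own.

```c
/* Illustrative sketch only. Anything tagged __init or __initdata lives in
 * a section the kernel discards at the "Freeing unused kernel memory"
 * point of boot. */
static int bus_quirk __initdata;          /* this storage is freed after boot */

static int __init mybus_init(void)
{
        bus_quirk = probe_board_quirks(); /* fine while booting... */
        return 0;
}

/* ...but a driver probed later, say on hot-plug, must not reach back
 * into __initdata: by then bus_quirk's memory has been reclaimed. */
```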
That's actually not true here: there is only one thread running; the kernel has not started the second thread. So really, this function runs, then this function, and then this function, and in some sense this is the simplest part of the kernel, I think. So all these resources get started. timekeeping_init is when the timestamps start appearing; if you have something slow that happens before this and you need to debug it, it is very painful to debug speed and performance problems without timestamps, let me tell you, and of course just rearranging the order of these functions is not really going to work. So I'm not going to belabor what's in here: all the subsystems that we know and love are coming up one at a time, and then finally, down here at the bottom, for the grand finale, there's a function called rest_init, and that's where things really get interesting. So let's look at rest_init. Okay, so rest_init: now that the kernel subsystems are up, I'm almost at the end of my talk and I've almost got my machine booted, so I'll go right through this. You see that there are two calls here to kernel_thread. kernel_thread, finally, is the program, or excuse me, the function, that starts the second thread of the kernel. The first kernel_thread invocation starts something called kernel_init, and if we look up at what kernel_init does, this should be pretty obvious: it tries to run /sbin/init, tries to run /etc/init, et cetera. So kernel_init actually starts user space. The kernel has brought up all the subsystems, and the second thread it starts is user space. And this comment is very indicative: we need to spawn init first so it always gets PID 1; that's why that happens. Then it starts another thread immediately, kthreadd, because the first thing user space is going to want is more resources, and so you have to get right down here and start kthreadd. And now if you go into the shell, you can see this for yourself.
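The shape of rest_init that the slides walk through, condensed from the kernel's init/main.c (exact flags and neighboring calls vary by kernel version), is roughly:

```c
static noinline void __ref rest_init(void)
{
        /* Spawn kernel_init first so it always gets PID 1; this is the
         * thread that will eventually exec /sbin/init. */
        kernel_thread(kernel_init, NULL, CLONE_FS);

        /* Then kthreadd, the parent of all other kernel threads (PID 2),
         * because the first thing user space wants is more resources. */
        kernel_thread(kthreadd, NULL, CLONE_FS | CLONE_FILES);

        /* ...the boot CPU then drops into the idle loop... */
}
```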
kthreadd is actually PID 2, because, as we know, init is always PID 1. So, I'm up to 10:54, and in theory I have 5 minutes left. I haven't gotten to all of it; I haven't gotten to the initrds; so I'll just go right to my summary. After this talk there are references, including a link to this book, which everyone should buy if they are interested in this topic; it's really excellent: the Embedded Linux Primer by Chris Hallinan. Someone is welcome to take this copy since, as I said, I have my version 2. I'll just leave the summary up here and take any questions that people have. So thanks very much for your attention. Thank you for coming. This has been recorded, and we're also going to upload copies of the slides, so they'll be available through the website; if you couldn't see them, they're certainly going to be available there, and like I said, the talk has been recorded. So thank you all for coming, and thank you, Alison. Any questions? And thank you. The next talk in this room will be about the Linux SPI subsystem at 10:30, rather, 11:30. Good morning, and welcome to the second talk of the embedded track. This is being recorded, which helps us a lot, and the slides will also be uploaded to the website, so you can get them after this talk if you can't see them now; so be aware of that. Our speaker today is Matt Porter. He'll be speaking on the Linux SPI subsystem; he's been doing embedded development for a number of years and has done some kernel maintenance in different aspects. So take it away. All righty, thank you, Tom, and thanks to Alison for kicking off the embedded track today. We saw in Alison's talk how we got the kernel up and going, and now we're going to talk about one aspect of the kernel subsystems: the SPI subsystem. Very popular. Let's jump right into it. This is the third time I'm giving this talk in a month's period of time, so you all are not victims now. Alright, so this has improved a bit; the cover slide changes for each conference, but some of the slides have been updated.
They reflect some of the questions I got at FOSDEM and ELC in the last month, so let's get into it. So, first thing is the title. Everybody can be overwhelmed by titles, and we have a more diverse community now, so I feel obligated to de-obfuscate the title name. If you have never read Heinlein's Stranger in a Strange Land and don't know where "grok" came from, that's where it comes from, and that's the beautiful cover art from at least one of the paperbacks of it. So we're going to try to understand, to grok, SPI. We're going to go over what SPI is and talk about the fundamentals of SPI itself, because we can't understand the Linux SPI subsystem and the facilities available to us if we don't understand the underlying protocol and hardware specification. We'll talk about Linux SPI concepts, and then we'll investigate the subsystem via use cases, because that's really what matters to people: how do I do different things? So we'll go through four different major use cases that drive you to make use of this subsystem. Then we'll talk a little bit about SPI performance (how to extract the most performance, and what things you need to look at), and then we'll look ahead at what's coming next in the SPI subsystem and what you may make use of. Alright, so what is SPI? The Serial Peripheral Interface was developed by Motorola. It's a de facto standard, and it's a four-wire bus, except when it's not. So we're going to find out that it's not as simple as that, but we'll look at the basic case and then we'll talk about these exception cases; that's the beauty of de facto standards, and even committee standards. There is no maximum clock speed on this bus, and it's often jokingly known as a glorified shift register. If you want to see more details as you go through, you can just take a look at that Wikipedia page. Alright, so what are some common uses? Because if we're here, we're thinking about using SPI, or we've run into it before, and you're going to see it in some key areas. Alison mentioned flash in her talk.
She showed some DTS files and so forth; flash memory is in use in the MTD subsystem. Another big area is ADCs, and then sensors of all types; a good example that I've worked with before is a thermocouple that needs to be accessed at a very high rate. LCD controllers: we'll actually look at a couple of examples of these in the talk. And if you've worked with Chromebooks and so forth, the embedded controller in those talks over SPI. So those are just a few examples; it's not a comprehensive list. Alright, now we get into our fundamentals. We want to understand exactly how the SPI protocol and the SPI bus work. It's very easy: we take that four-wire case, which is the original de facto standard that Motorola specified on some of their old microcontrollers, where this comes from, way back. So we have four key signals. We've got MOSI (it depends on how you want to pronounce it; that's how I pronounce it): master output, slave input. It makes it very easy, if you use those original signal names, to understand what direction they go in, just by the expansion of that acronym. And then you have these other alternative names; it's a de facto standard, so everybody adopting it and being compatible with it came up with all these other crazy variants on it. So you've got SIMO, SDI; you can kind of see where we're going here. Then you have MISO, master in, slave out. So those are our two data paths, and what you may be detecting is that this is a full-duplex bus; by its very nature it's inherently full duplex. And then we have a clock, and as I mentioned, there's no maximum clock rate; there's no committee spec that says "this is the max" and so forth; it's purely up to your peripheral. You'll see it also known as just "clock," and one of the things I wanted you to see here: notice this; anybody who knows I2C will have those jump out at them. Yes, some people actually use I2C naming on SPI signals. It's very maddening and frustrating. And then finally we have slave select. You'll see it referred to a lot of times as chip select,
or a particular enable, and that's what actually activates a particular SPI chip; we'll talk a little bit more about that in the upcoming slides. So this is my masterful diagram (I know nobody can do diagrams better than I can), and what you see is exactly what I promised: the most simple case, one SPI master, one SPI slave, and then our four signals with the expected directions. Chip select, or slave select, is asserted by the master device; we have MOSI and MISO, drawn to reflect the directions they go; and of course the clock, which originates from the master. Now, we really need to understand timing diagrams to understand SPI, so bear with me; we're going to get into it here, and it's actually very simple. I'll show (and don't worry about write modes just yet; we'll get into that detail in the next couple of slides) a write and then a read that's just eight bits. So what we have here is data being stable on MOSI (remember, master out, slave input; we're showing writes), and what you'll see also is this chip select, or slave select, being asserted low for this entire eight-bit write cycle. You'll see that it goes low, this data becomes stable, and in this case we're clocking it in on the rising edge of this clock, and you see that down the line. Very straightforward. You'll see the inverse, if you will, happen here on a read, where the data is stable on MISO (remember, master in, slave out): the slave is driving out some data on this line, and the master is then latching it in here on the rising edge of that clock. Pretty straightforward. So now, this is the part where people tend to get a little confused: what are these modes? Maybe you've looked at SPI before and didn't fully understand what the modes are; well, you're going to understand them when you leave here. You have two clock characteristics that define an SPI mode. You have the clock polarity, typically
called CPOL, and then you have the clock phase, CPHA; typically you'll see those monikers. All this means is: with clock polarity, zero means the clock idle state is low (when it's not being asserted, it idles low), and one means the clock idle state is high. I'm going to show you some timing diagrams so that you visually have this clear. And with CPHA: if we assign zero, the data is latched on the leading edge of the clock and output on the trailing edge; when it's one, the phase is such that it's latched on the trailing edge and output on the leading edge. What you end up with are these SPI modes: you have modes 0 through 3, which you'll see referred to in the Linux SPI subsystem and any other software that you deal with, and they correspond to a simple truth table. Now, having said that it's a simple truth table: it's simple except when it's not, again, which is our theme here today. You go look at a Microchip data sheet, the old ones, and they actually define the modes differently; again, the beauty of de facto standards. This table is what you'll find in the Linux world, and this is what you'll find from everybody else, but in those data sheets they went a different route. Alright, so let's take a look at how these look visually. So again we come back to this write mode; I showed you the overall timing in our simple case, and now I'm going to show you write cycles for these four different modes, modes 0, 1, 2, and 3 (these are all write cycles; I dropped MISO off there just to illustrate this). So what you'll see again: we see our slave select be asserted, our data stabilizes, and just like the original example, it's being latched on this rising edge here, so you see how it lines up with each of these bits of data. And if I go to write mode 1, let me just review on this one: the clock starts off idle low, and once I'm clocking in data,
then it's asserted high, but now the data is latched on the falling edge in mode 1. So if you see the difference: here we're latching on this falling edge, where the data is stable, whereas before we were latching on the rising edge. You see that slight difference in the timing diagram; that's the difference in the phase part. So if we go back to that truth table, we see that from mode 0 to mode 1 it's only CPHA that's changing, right? The polarity of the clock was the same in both of those, but the CPHA changed, so that's what it looks like there. Now, if we go to write modes 2 and 3, the same thing happens, except notice that in this case our clock is idle high and it's driving low. So what we see here in write mode 2 (again, CPHA changing between these two examples) is that the data is latched on the falling edge, each of these times where it's stable, and then in mode 3 the data is latched on the rising edge; again, the clock is idle high in this case, and the data is latched on the rising edge. This is a characteristic of the peripheral you're dealing with: you have an SPI peripheral, and the data sheet will show, in a timing diagram, what mode it wants to run at. It may handle multiple modes, but you have to program, via whatever your software is, one of these modes to be driven. Alright, so that was great, nice and simple: we had a master and a single slave and four wires. But then we get into "well, it can be more complicated," right? Everybody had to do their own thing. The biggest place where it's more complicated is where we have multiple slaves, and that's where your slave select line comes into play: you have one chip select for each slave. And then you can have crazy things where you have daisy chaining of different signals; the common one you'll see (I believe even the Wikipedia page shows a variant of it) is input-to-output daisy chaining, where you're running MOSI to MISO down a set of peripherals.
Another one that's kind of unique is the ability to daisy-chain chip selects. Very rare; I've seen that on the Anadigm field-programmable analog arrays, where they actually daisy-chain chip selects, which is kind of ridiculous. You won't see that one much, but there are a lot of variants; everybody's free to do what they want. For the most part, though, the common practice is this: you'll have a dedicated chip select and not have to worry about this kind of daisy chaining. The other thing is, we talked about it being a four-wire bus. Well, you get into these flash parts that Alison first brought up, which are a major use case, and in order to get higher rates of speed for the read side, which is your fast path when you're dealing with flash, they've gone to dual and quad lanes. Instead of a single master-in-slave-out line, your input into the master from the peripheral can now have two lanes or four lanes. So if you're familiar with, say, PCI Express, you can see how it's the same idea with a serial line: you're increasing your throughput, so you have n times the bandwidth of just that four-wire type of setup. And then Microwire defines a three-wire setup where MISO and MOSI are actually combined and it operates in half duplex. Kind of crazy; that's well documented elsewhere, and it's somewhat rare to have to use it, so we won't go into that one; we're going to talk about more of the common cases. Another masterful diagram shows what it looks like when you start hooking up multiple slaves. As you can see, we have multiple slave selects, in this case SS1, SS2, and SS3, all being driven by this SPI master, and you notice that all the other signals are hooked up together; we'll look at a timing diagram of what that looks like. So what happens, when we think about this from a hardware perspective, is that this becomes essentially your address select: if you think about address decoders and so forth and typical memory-mapped things, you can treat this slave select as that differentiator.
So it's actually a unique identifier on the bus. Yes, in the common case you need a unique chip select for each of those slaves, but notice that we're still at three shared wires, plus an extra one for each of these slave selects, so it does expand from there. (Yes, very good question.) Alright, so this eye chart (I like the warpage at the end there, too) is very similar to the other ones I showed you. Again I show write mode 0, but now in that diagram where we had three different slave selects, or chip selects. We see that the first one gets asserted low; here's an 8-bit write, write mode 0, rising edge. And then slave select two, say we want to do 8 bits to that device: it gets asserted low and we clock in, and so on. Very straightforward; that's how you uniquely address. And typically, when peripherals aren't selected, they'll tristate their output, so if you were doing reads, the selected slave is the only one that would be asserting anything on MISO. Alright, we've got our fundamentals. Now, I've got a question, yes, about this bus fundamentally. Fundamentally, SPI is a synchronous bus, and what that means is that everything's tied to this clock, everything's tied to this clock being asserted, being active, and each data bit is clocked in in terms of, say, a rising edge. So it's fundamentally a synchronous bus. So, getting into the Linux SPI subsystem itself: we want to know how we use this in Linux; we know our fundamentals. What we have today in the SPI subsystem is a concept of controller drivers and protocol drivers; here we start to introduce some Linux-specific terminology, so we'll talk about how that maps. What this means is that controller drivers are what we define as supporting an SPI master controller: go back to our diagram, and that's the thing that drives those signals. It also controls characteristics like the clock frequency and the mode that we learned about. So the controller driver has
all the smarts to drive those bits (since it's an embedded track: on our SoC), controlling that SCLK and what phase it's actually going to latch data in off the MISO line in the case of a read, and so forth. An example, if you are Raspberry Pi fans: I picked out BCM2835 AUX as an example of a controller driver you'll find in the SPI subsystem. Now, protocol drivers are what actually support the SPI-slave-specific functionality, and the key thing to keep in mind here (maybe not obviously) is that the software doesn't run on that slave. The slave is attached over the bus; the software runs on your high-end Linux processor, and it supports the protocol that a particular peripheral implements. So the low-level controlling of those signals is split into the controller driver, and then there's a very firm disconnect, via APIs in the kernel and also in user space, separating out this protocol driver so it's independent of that low-level work. If you work with microcontrollers, typically you don't see the kind of separation we have in the Linux kernel, where you're masked from the details of the controller. And for the SPI-slave-specific functionality in a protocol driver, we have an abstraction: when you work with the slave protocol, you'll implement it in terms of messages and transfers. We'll talk in a moment about how messages are used, but again, the protocol driver relies on that controller driver to program the low level. An example would be this MCP3008 ADC; we'll use that as an example in this talk. So, getting into transfers and messages: fundamentally, any protocol driver that's written is written in terms of Linux SPI subsystem transfers. A transfer is a single operation between the master and the slave. The kernel structures that define an SPI transfer in the SPI subsystem contain things like the transmit and receive buffer pointers. Remember, when we talked about
the fundamentals: it's inherently a full-duplex bus. Every time you're doing a transfer operation, although I don't show it in those original examples, you are potentially clocking in data on MISO at the same time you're outputting. A lot of devices don't use it, but it is happening at the underlying level. So a lot of peripherals may be write-only, and I'll show some examples of that, where you just discard the RX side. To put it in a software perspective: when you define a transfer that's only in the write direction, you would have a TX buffer in your transfer, but you could have your RX buffer NULL, and you'll see that regularly if you look at any protocol drivers in the Linux kernel. Then you can also define all these characteristics of a transfer; we talked about modes and chip select and so forth, and you can actually set, per transfer, how that chip select behaves: is it de-asserted after a transfer, or does it stay asserted? We'll get a little bit more into how you can use that later, in the performance section. And then you can set optional delays; you may need to modify a lot of this depending on what your peripheral's data sheet says to do. And then messages are just an atomic sequence of these low-level transfers, and the thing to keep in mind is that a message is the fundamental argument to all the SPI subsystem read/write APIs, so you'll deal with messages a lot if you're doing anything with a protocol driver. And here's a really poor diagram of an SPI message, just to visualize it: just imagine a linked list of those low-level transfer descriptors. Alright, so let's talk about use cases. The way we get into this: the first thing people tend to want to do is, there's a protocol driver in the kernel, and I need to hook it up on my board; I've just attached the device physically, so how do I get this protocol driver working with my board? So we'll look at that. And the other one is, I've got this new device and I want to write a protocol driver for it.
How do I do that? These get written for new hardware that's out there. Then: I want to write a controller driver; most of you will find that whatever SoC you're working with has already had its controller driver done for some time. And then finally, very popular, especially in the maker community: I want to write some kind of user space driver to control that SPI slave, that peripheral, rather than a kernel driver. So let's start with the first one: I want to add this device to my system. We need to know the characteristics of our device, and the first thing I find is that people don't always know how to read data sheets to get the information they want; it's not thrown out at you. So we'll look at that real quick with an example. And keep in mind there are three ways to do this, because we can't just have one way to do things in Linux, we need choice: we have device tree, since we're talking embedded systems; the board file approach, which is deprecated by both device tree and ACPI, depending on your platform; and then ACPI, which most of you will see on x86 and maybe in the world of ARM servers. Alright, so I know this will be hard to see in the back, but these are just some clippings, and you can always go back and look at these slides. I clipped out an example of a part that I've worked with in the past and wrote a driver for. How do you know that SPI is being used on the part? Well, hopefully at the beginning of the data sheet it says something useful, like "this is a SPI part." So that's our first takeaway. Then, you'll notice, I wanted to prove to you that some crazy data sheets use I2C nomenclature for SPI signals; that's because this part supports both I2C and SPI, but they're actually showing the SPI behavior here with the same signal names. So I did not make that up earlier; this is a real clipping out of that data sheet. And the important thing you'll see here: let's say we need to know what the maximum frequency is, which we need any time we're working with SPI devices. So, to add
a device into our platform, we need to know characteristics like the maximum frequency that can be driven by the controller driver. The data sheet is not going to jump out and tell you "hey, 10 MHz is my maximum frequency." All you get is this nifty little timing diagram, so you have to be able to look at this and say, okay, here's the full period, from here to here, and then you have to look down here at the table that shows you the minimum times that those phases can be held. They give it to you in pieces; for example, here are your high and low pulse widths, so you add those up: 16 nanoseconds would be your period, and if you know the basic math of the reciprocal relationship between period and frequency, now you've got your maximum frequency. So that tells you the low-level piece, but when you write a protocol driver there are also things in the data sheet that tell you what the big picture of the protocol is. This type of part here, this ST7735, is an LCD controller, very high speed; you can run video out to it over SPI if you've got a DMA-capable controller. And the data sheet shows that it has a data/command bit, and this is how you access registers: you write a bit out, and so forth. It also shows you the delays required between each write. So, thinking back to how I showed you transfers: you can control that minimum delay to get the most performance out of the thing while trying to avoid exceeding the capabilities in the data sheet, so you need to understand what kind of delay you need between these accesses. So you're going to look in the data sheet for this overall description of the protocol; this is a quick overview of it (there are really about three pages on how that works), but it gives you an idea that there's more than just that low-level timing. Okay, let's look at the MCP3008; we're going to use this one in more detail. Same thing: it tells us it's SPI, and it's got a similar way to show us t_HI and t_LO. There's our full period; if I go down here, it's 125 and 125, so
250 nanoseconds is our period, and 1 over 250 nanoseconds tells us that the maximum frequency for this device is 4 MHz. We'll use that as we go along. The same thing applies at the macro level: it's one thing to know that low-level timing, but here's an example showing a timing diagram at the macro level of what happens. This is an analog-to-digital converter, and it's showing what happens after the data transfer: it tells you when the conversion takes place and when you start getting data out from the device, and in this case the "out" means on MISO, master in, slave out of the peripheral. Now, I promised we're going to see how to hook it up, so we take the device tree case first, which is the most popular one. The first thing you have to do is go look for your binding, and this isn't a grokking-the-device-tree talk; you can actually look at my slides from two years ago for some more details on the device tree, so I'll assume you have that base knowledge. Where you actually find what you want to know about how to write your DTS part is in the binding, and for an MCP3008, this is just a snippet of it (you can see I kind of trimmed it out where you see the full stops here): we're going to need that compatible string, and the binding gives us a little example of what the DTS file looks like. We need spi-max-frequency, and we need reg. An important thing to note with reg: it corresponds directly to your chip select. I promised you it's your unique identifier on the bus, and that's how it's used. Alright, how does this get used? Well, if you look at the driver to see how this gets matched up, sure enough, there's an mcp3008 compatible string there; it's in this of_device_id table, it gets registered with the kernel driver model, and there's where it gets hooked up. And you'll see this pattern (this is all kind of snipped from different pieces in this driver file, this protocol driver), but you'll see that pattern everywhere; it's just boilerplate stuff.
Alright, how do we actually put this in our DTS file? There are many ways. This is an actual device tree overlay fragment; it's actually the old syntax, and there's a new syntax coming, because everything must change before it goes mainline. This is what you might see if you work with BeagleBones and so forth; you'll see this older syntax that's getting ready to change. If you're actually modifying a DTS file in your kernel build, you would have a slightly different syntax inline; Alison showed an example of that in her talk. But here's the meat of it: you see that it lines up exactly with that binding example. The other, deprecated, way is a board file. You'll find the same information: we needed our max frequency (notice 4 MHz is in here), it needs to know the bus number, and the chip select, which in this case is just 0, the only chip select in this example I'm giving. And then finally, let's say you have the misfortune to be on a system where you need to use ACPI: you get to use this stuff that's not human-readable. You start to see things kind of line up here; it's similar information, though they don't like to use decimal, so you just know that's 4 MHz because you do hex in your head. Same kind of stuff; you see all the data is there in different formats, and there's plenty of documentation on how to do that, but everything lines up the same, with the same fundamentals that we've learned. So that's hooking it up, three different methods. Protocol drivers: the first thing you want to do is learn your driver model, and here are just little snippets of this. All you need to do to instantiate an SPI protocol driver is fill in this struct spi_driver. You give it a name, you may have some power management operations (those are standard pieces defined elsewhere), and the important thing is you just need to define a probe routine and a remove. Once the driver is matched (say, by those compatible strings we talked about; that's what actually matches and binds a driver), this probe gets fired off. That's all driver model standard stuff.
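For concreteness, a minimal node matching the MCP3008 binding might look like the following. The &spi0 label and node placement are assumptions about a particular board; the compatible string, reg-as-chip-select, and spi-max-frequency are the pieces from the binding discussed above:

```dts
&spi0 {                                        /* bus label is board-specific */
        status = "okay";

        adc@0 {
                compatible = "microchip,mcp3008";
                reg = <0>;                     /* chip select 0 */
                spi-max-frequency = <4000000>; /* the 4 MHz we derived */
        };
};
```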
Nothing is different in the SPI subsystem: once you get there, you can actually start making kernel API calls in the probe routine, and that's all there is to a protocol driver. So what are these kernel APIs? First thing: remember, I promised you're going to use SPI messages as the fundamental argument to these things. We start with the asynchronous calls; not asynchronous in terms of the bus, which is synchronous, but in terms of kernel perspective and behavior. spi_async is the fundamental low-level kernel API. It's an asynchronous message request, which means it's going to return immediately, and it can be issued in any context; you can be in an interrupt context and make a spi_async call. But the message is not complete when the routine returns; there's a callback executed when the message is completed. That's not used very often in the Linux kernel; the biggest place you'll find it will be a couple of network drivers. You can imagine network drivers work well in this case because they're typically optimized around high throughput, so they don't care so much about the latency of getting one particular set of messages on the wire quickly. What you'll see almost all the time in use is spi_sync, or some of the helpers around it. It's a synchronous message request, so it's not going to return until the message has completed, and you can't execute it in a context that can't sleep (contrast the example I gave of using spi_async in an interrupt context). What's interesting about it is that it's just a wrapper around spi_async. Finally, what you'll also see very commonly, if you look at some protocol drivers that you like, are these helper functions, spi_write and spi_read, and those just wrap around spi_sync. All of these take an SPI message: that linked list, that descriptor chain of transfers, all those fundamental transfers at the low level. Alright, there are some specialty APIs. spi_flash_read is actually optimized for SPI flash commands; that's such a unique area of SPI devices that they have a standardized command
set, and one of the things that happens on modern controllers — for example, the X7X from TI comes to mind — is a quad SPI controller with a memory-mapped I/O area, where an access to that memory gets turned into a SPI read. So you can actually XIP out of that memory-mapped region, and these routines are optimized around those types of accesses. Yes — absolutely, the slides will be uploaded later today.

All right, the other thing you have is helpers to set up the SPI messages you're going to use. Since a message is just a linked list of descriptors, you have spi_message_init() to get an empty message, and then you add transfers that you've defined with spi_message_add_tail(). You build the thing up, pass it into one of those routines, and go forward.

Let's go over to controller drivers — again, standard Linux driver model, no surprise. You allocate a controller with spi_alloc_master(); notice that we depart from saying "controller" and say "master", just because of legacy. With this you have to go set a number of controller fields and methods; we're just going to show the most basic case that most controller drivers will have. mode_bits is where the hardware advertises which subset of that wider set of features we talked about it can support. For example — I don't show it here — if it can support quad SPI, you would expose SPI_RX_QUAD, and that lets protocol drivers, when they're matching, know what the capabilities of the controller are. So you have that abstraction.

I think you had a question — are you asking in terms of a standard library of Linux kernel drivers? Yes: you'll find all the controller drivers in drivers/spi in the kernel tree. Protocol drivers you'll find scattered throughout the kernel — the protocol drivers for all your peripherals live in the subsystem that matches what they do.
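Putting those pieces together, here is a non-runnable sketch of the protocol-driver shape just described — it needs the kernel build environment, and the device name, compatible string, and command byte are all hypothetical:

```c
/* Sketch of a minimal SPI protocol driver; "acme,frobnicator", frob_*,
 * and the 0x9f command byte are invented for illustration. */
#include <linux/module.h>
#include <linux/mod_devicetable.h>
#include <linux/spi/spi.h>

static int frob_probe(struct spi_device *spi)
{
	/* In real code these buffers should be DMA-safe (e.g. kmalloc'd). */
	u8 cmd = 0x9f;
	u8 resp[3];
	struct spi_transfer xfers[] = {
		{ .tx_buf = &cmd, .len = 1 },
		{ .rx_buf = resp, .len = sizeof(resp) },
	};
	struct spi_message msg;

	spi_message_init(&msg);                /* get an empty message */
	spi_message_add_tail(&xfers[0], &msg); /* build the transfer list */
	spi_message_add_tail(&xfers[1], &msg);

	return spi_sync(spi, &msg);            /* sleeps; not for atomic context */
}

static int frob_remove(struct spi_device *spi)
{
	return 0;
}

static const struct of_device_id frob_of_match[] = {
	{ .compatible = "acme,frobnicator" },  /* what the DT node matches on */
	{ }
};

static struct spi_driver frob_driver = {
	.driver = {
		.name = "frob",
		.of_match_table = frob_of_match,
	},
	.probe  = frob_probe,
	.remove = frob_remove,
};
module_spi_driver(frob_driver);
MODULE_LICENSE("GPL");
```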
A SPI flash device, for example, needs to be part of the MTD subsystem, which covers NAND, NOR, all those parts. An analog-to-digital converter would be part of the IIO subsystem — the industrial I/O subsystem; weird name, but that's where all the ADCs are — because those subsystems provide the API for that class of device, while the low-level SPI interface lives in this subsystem.

Yes — the question was whether there's a standard library for all these SPI drivers. The answer is yes for the controllers: the controller drivers are fundamentally in the drivers/spi subsystem. The protocol drivers are scattered throughout the kernel.

All right, another couple of things you have to provide are setup and cleanup, which are pretty straightforward. Then at the low level you have either a transfer_one_message() or a transfer_one() routine, which does the bottom-level meat of clocking data out or in and passing it back to a protocol driver, or multiple protocol drivers. Those two are mutually exclusive. Typically transfer_one() is used, because it allows GPIO-based chip selects — many parts are limited in how many hardware chip selects they have. Just to spell out what I mean by that: the hardware chip selects are the slave selects, or chip selects, that the SPI master hardware can drive directly as part of programming the controller, and there's typically a limited number of those. Most modern controllers on Linux-capable SoCs don't bit-bang SPI: you program the controller and dispatch a whole set of messages, and that typically back-ends onto DMA. And the last thing is you just register the master, and then that controller will show up in your Linux system.

The question here is whether there's a race condition between the GPIO chip selects and the actual controller. There can be, as it stands now — and I know Michael, who talked about something related, ran into problems with GPIO chip selects.
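To summarize the controller-driver steps just described — allocate, set fields and methods, register — here is a similarly hedged, non-runnable sketch; the hardware access is stubbed out and every "acme" name is hypothetical:

```c
/* Sketch of a SPI controller (master) driver, kernel context required. */
#include <linux/module.h>
#include <linux/platform_device.h>
#include <linux/spi/spi.h>

static int acme_transfer_one(struct spi_master *master,
			     struct spi_device *spi,
			     struct spi_transfer *xfer)
{
	/* Bottom-level meat: clock xfer->len bytes out of xfer->tx_buf and
	 * into xfer->rx_buf — on a real SoC typically back-ended by DMA. */
	return 0;
}

static int acme_probe(struct platform_device *pdev)
{
	struct spi_master *master = spi_alloc_master(&pdev->dev, 0);

	if (!master)
		return -ENOMEM;

	/* Advertise what the hardware supports so protocol drivers can match. */
	master->mode_bits = SPI_CPOL | SPI_CPHA;
	master->num_chipselect = 1;
	master->transfer_one = acme_transfer_one; /* enables GPIO chip selects */

	/* Registering makes the controller show up in the running system. */
	return devm_spi_register_master(&pdev->dev, master);
}
```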
There are still problems in the core with those in some cases — it depends on the controller, and DMA, and so forth, so it's still not perfect. "It depends" is the right answer.

Okay, let's get into user space, the last big use case: spidev. Many of you may have encountered this. It's primarily for development and test — most people will want to write a kernel driver to get proper performance out of their device — but you might want to play with things to start, or do testing on your platform, and that typically happens through the spidev interface, which essentially exports all of those kernel APIs to user space.

So how do we make use of it? There are three compatible strings in the kernel today, and you can hook up to one of those. You could put "spidev" itself as your compatible string, but that would be wrong and not recommended. It will bind, because of how the driver core matches on a modalias — that's just a detail of how the matching happens; it works by luck. If you do that — and you'll see it a lot in Raspberry Pi downstream stuff and so forth — you'll see an error show up. It's a benign error; you can use it, but I didn't tell you to do it. The reason you're asked not to do it is that the compatible string is supposed to describe the production hardware, and using "spidev" as a compatible string breaks that. So use that approach for testing; don't publish it to your end users.

You can also use ACPI, if you love ACPI: there are three special IDs where you can hook up spidev with ACPI, and again the spidev modalias will bind a device there. So that covers our three methods.

So what happens in the user-space driver, once spidev is hooked up on your board, is that in sysfs you get a bunch of devices for however many peripherals you hooked up: spidev, the bus number, and the chip select — remember chip selects, we've got one for every device.
And then you have these device files — just character devices, with the same nomenclature: a bus number, because you can have multiple master controllers, and then a chip select for each peripheral. You have open and close, and you have read and write. The interesting thing is that read and write, as exposed to user space, are inherently half-duplex operations. What that translates to — I gave you a hint of this earlier — is that if you do a write, you end up with a TX buffer but RX being NULL. So you can use those for simple things. If you want to do a complex user-space driver, you use the ioctl interface: SPI_IOC_MESSAGE and the whole set of read and write ioctls, where you can control all the parameters and set up a message with a linked list of transfers, those fundamental units, just like you do in kernel space. So all of that is analogous and available — albeit with the extra latency of context switches and everything else that comes with user space. So again, there's always a good reason to do a real protocol driver if you want proper performance.

If you want to know more about spidev, you want to start, as always, in the kernel documentation area. However, nobody tells you that things are buried in tools/spi: there's a full-duplex example there that uses that ioctl interface — a great one — and spidev_test is great for checking out the fundamentals of your platform. And a couple of shoutouts here: Jack Mitchell has this great libsoc framework — if you don't want to work with the low-level raw character device interface I showed, you can go use that; it's a nice library with a consistent API in C — and a lot of people use the Python spidev bindings.

So we'll hit performance quickly. One thing you have to keep in mind is that we have this great separation between the controller and the protocol driver — and that's all well and good until your performance sucks. That's when you have to start actually understanding what the controller is doing.
So, a great example of this — and I'm going to tell you up front there isn't one fundamental answer to these things; I can only give you overall hints of where to look, because every situation is different. These guys are laughing because "it depends": it depends on your hardware, your system, how good your controller driver is, and what assumptions it has made. So here's a great one: the OMAP McSPI driver. It's been like this since 2011, and to this day — I don't understand why — it doesn't always use DMA. There's a threshold hard-coded in there; it doesn't have heuristics for deciding when to use DMA (DMA always has some setup overhead): if you have a transfer of less than 160 bytes, it just uses programmed I/O. You need to be aware of that when you're chaining things together — if you're trying to talk to something and you've set up messages that are smaller, you're not going to get DMA engaged, and that may cause you a problem.

The other thing — and I touched on this already — is that you need to know where to use the sync and async APIs. spi_async(), I mentioned, is used in network drivers. Network drivers are typically going for bandwidth — there are use cases where you're worried about latency, like voice over IP, but the network drivers end up optimized for throughput — so async works fine there: you can have callbacks hooked into the refills of network buffers, SKBs, and so forth. That whole behavior works a lot like descriptor rings and completions on a memory-mapped controller, and that's why it's literally the only place you'll find spi_async() used in the kernel; I don't know if there are any out-of-tree drivers using it. Almost everybody else wants low-latency access and wants the call to come back knowing the transfer has been dispatched.

One of the key things to keep in mind — and I'll show you how to actually look at this — is that the spi_sync() operation, whether you're using it from user space or kernel space, will now, as of the 4.x kernel versions, attempt to execute in caller context. (This is a good reason to upgrade if you're using a downstream kernel older than 4.x.) Remember the difference with spi_sync(): it may sleep. You may lose your context, get scheduled out, and have to wait to get scheduled back in — a huge hit on latency. So if you're optimizing for performance, you want to be using spi_sync(), and you also want to check whether you're actually getting executed in caller context, so it dispatches straight into your driver — and then it's down to however good your driver is at getting it out on the wire. I'll explain where you can find out how that works.

Some of you in the back may not be able to see this last thing I talked about. There's a characteristic of transfers you can set: the cs_change field. You will not find this detail in the SPI subsystem docs; you need to go look at include/linux/spi/spi.h, which explains in detail how the chip select behaves as you set it. Read it carefully alongside your data sheet, because it's a little bit confusing — I can tell you that until about a year and a half ago, the IIO subsystem had cs_change implemented wrong. But it allows you to do things like this: if your peripheral can tolerate the chip select staying asserted between transfers, you can use cs_change as a hint — say it's the only device in your system — to optimize around keeping that chip select asserted. What happens otherwise, say with GPIO chip selects, is that there's a lot of latency in the line getting deasserted high and then asserted low again, so you get a big gap between transfers. So you can use this to optimize around keeping that chip select asserted.
If your peripheral can handle it — maybe you only have one peripheral in the system — you can just leave the line asserted and squash those transfers close together. So definitely read up on that.

The other thing is that if you're doing performance work, you need visibility. It's sort of like the GDB comment — very powerful tools, but nobody uses all of them. You need a logic analyzer, because you need to see what your transfers look like on the wire. Especially when the Linux kernel is abstracting things away with that controller driver, you don't know what the heck it's doing before your data gets on the wire: it may work, but you need to see it. You can go take a look at sigrok's page of supported hardware — very good detail on all the cheap logic analyzers. There's no reason not to have one; it's $30. Yes — SPI simulators: there is definitely test equipment like that on the market, but you'll mostly want to work on this side of things, where you have control.

The other thing you can do, if you have questions about your controller driver and its behavior, is the SPI loopback test — it's great: it tries every combination of different transfers, and you can look at the performance. And how do you look at performance? The great thing is you don't always have to use a logic analyzer: there are SPI subsystem statistics now, and that came in 4.x as well. You can see how many messages you sent, transfers, timeouts that happened in the SPI subsystem — and, key thing, remember executing in caller context: there's a spi_sync_immediate counter that increments when a call executes in caller context. These are all exposed in sysfs, so you can see what percentage of calls actually executed in caller context — the lowest possible latency — versus ones that just executed and slept. So you have that visibility. And then you also have this nice histogram — the name is wildcarded on the slide because there are a bunch of buckets — that shows you 2-byte transfers, 8-byte, and so on, in powers of two. Really great when you're looking at overall performance, maybe with multiple SPI slaves.

So what's the future? Slave support is coming, believe it or not. That sounds crazy in the context of Linux, because think about the nature of the bus — we talked about it being inherently full duplex. You have one clock cycle to respond and get a bit on the wire, if you think about it at the low level, so there are hard real-time issues: you have to respond quickly. However, as the author of the subsystem has noted — and people have done ad hoc implementations of this outside the SPI subsystem — Geert Uytterhoeven has sent an RFC patch series; it's at v2, and you can look at it here. There are a couple of use cases where you can actually use it without any sort of RT-preempt, or offloading to a microcontroller. One is pre-existing responses: you've got something like time of day and you're just going to respond with it — you can have those buffers ready to go, and those types of things you can respond to as a Linux host providing slave support. The other is just commands — the write path — which you can handle too. Implementing something like real registers, where you have to decode a register and respond within a clock cycle at, say, a 20 MHz clock, is pretty hard.

With all that said, it works just like registering a master: in the setup you just allocate a slave. What it looks like is you end up with this spi_slave area in sysfs — there's a bus, and then a slave node for each slave controller — and when you build slave protocol drivers, the way you use them is you just echo the slave driver's name into that exposed slave controller node. Geert provided two example protocol drivers, and you can look at these on your own: spi-slave-time is the first example I gave, and the second,
spi-slave-system-control, lets you power off, reboot, or halt the system by sending commands over the SPI bus. So that's it — any questions remaining?

Oh good, okay. So the question is — he asked, if I have a feature that's not yet supported by the SPI subsystem, and I'm already leading to the answer here — should I do something completely different? He's got a peripheral where the length of time the chip select is held is part of the protocol, beyond what's supported right now in the subsystem. The answer is: get a feature added to the SPI subsystem that lets you select, as part of the transfer, how the chip select is held. We'd have to talk a little more in detail about that — it's always a visual thing, the timing diagrams — and I'd have to understand it better to support it. Good question, but overall the idea is that the SPI subsystem maintainer is very good about supporting these things, so "patches welcome" is always the answer.

I think somebody else had a question before we run out. So the question is what tool I used for the generic timing diagrams, and that's a great one — I meant to put a comment in there. It's WaveDrom. It takes a little JSON input file, and they've actually got a web-based editor, so you can pop it up right now and try it out. It makes nice timing diagrams; I was all excited when I found it. I didn't want to steal diagrams off of web pages and cite them — I had specific examples I wanted to make. Good question.

What's the question? — and then I'll repeat it — "is that possible?" So the question points out that you can use very low-cost microcontrollers that have a SPI master in them, and hook the slaves to that. It's actually a good point: there are times when that can make sense. For most things you probably don't want the extra abstraction in there, because on the master side you already have plenty of SPI masters on any of these modern SoCs — it's not a problem, and they've usually got DMA support. On the SPI slave side, that's where it becomes more interesting, because you can implement a SPI slave on a microcontroller and maybe have a communication path back to your Linux host if you're doing a mixed system. But there are cases where you might want the master-offload approach too, and I'll give you one: there is a point at which the overhead of the Linux SPI subsystem matters, with some of the high-rate ADCs. One of my colleagues recently did an implementation on the PRU — on the AM335x, which is analogous to what you're describing. They had an ADC for an engine controller, and we couldn't achieve the required response rate in Linux, so we offloaded it to the PRU: extremely tight deterministic constraints. To make up a number, 95% of what's out there doesn't need that — in practice you're usually not doing it — but you'll always find a case that doesn't work in this framework on the master side, and you can absolutely offload that type of thing. You can even have a master out on a microcontroller and, say through remoteproc, create a SPI master abstraction in Linux that controls it — you might not want that in the low-latency, tightly deterministic situations, but it's absolutely a possibility too. There are many system designs, that's for sure.

And yeah — don't stream video over SPI. Well, actually, take that back: on that ST7735 we were doing Big Buck Bunny at 30 frames a second on a 1.8-inch LCD. Anybody that's played with the fbtft driver — we were running its precursor at that rate, because when you've got DMA, you're able to burst the whole frame in on the vertical blanking interval without a problem.
It's worth pointing out, though, that BOM-cost-wise, a buck can be a really big deal on a consumer product — breaking out a coprocessor for a sensor can be a complete non-starter depending on your project; it's way too expensive. One dollar on a small project is no big deal; one dollar on a really big project is potentially millions of dollars right there, so it's not going to happen in a lot of situations. And yeah, there are specific things where people do use it — maybe a power controller, for example. The Chromium embedded controller: we had a system that was almost exactly like that, but it was connected over UART. It could have been over SPI; that was a case where the power controller sat in that role. It just depends on the device.

Okay, one more. Wait — let me understand this before I repeat it. Sure. So, yeah, I'm not normally a philosopher, but when I am, I do it at SCALE. Typically we find there's definitely a divide between the application developers and the driver developers — and it doesn't matter whether it's Linux or something else. Application developers are typically going to work a bit more abstractly, especially with modern frameworks and, you know, higher-level languages. The dividing point is probably where I showed the user-space stuff: we were talking about spidev, but in a good system you would typically have a protocol driver written that is high-performance. A great example is actually this whole LCD controller: we have either the fbdev or the DRM framework, and you can write a high-performance driver that user space talks to through those interfaces. The application developers accessing that could be working in Wayland and Weston way up above, and Qt — so there's a big gap between that and the person who actually wrote the protocol driver. And it's not just the SPI side of it: they wrote a DRM driver, then you get out to user space and there's Wayland and Weston, there's Qt running on top of that, and then finally you have the application developer writing a QML app on top of all of it. So philosophically, yeah, there's generally a gap unless you're writing low-level code — though the people who write the applications can sometimes be the same people, in a small company; I do both user-space and kernel development myself. So, good question.

Yeah — no, not when you're writing the protocol driver. The protocol driver is independent of that: it doesn't know anything about the controller, and the same protocol driver can run against any controller. It's when you're hooking it up — okay, in the ACPI case — and I can actually flip back there, since we're not running into anybody... I guess we are; that's why I'm so hungry, plus I'm on Eastern time. All right. So here, notice that there is no bus number — sorry, I'm on the driver slide — there's no bus number there except in the target reference, and that's specific to the overlay notation. And back in the driver itself — so this is a great question: do you have to have the bus number encoded in the protocol driver? Overall, again, the controller driver and the protocol driver are completely independent; it's when you hook them up — the way your platform describes how it's going to be instantiated — that there's a difference. In the driver itself you'll see no mention of a bus at all. This is the DT version: it just has to know about the compatible string. When I showed the overlay fragment in DT, you saw the target, spi0 — that's actually a textual reference to the SPI master, which is out of the scope of this fragment — but the fragment is otherwise independent of where it lands, so it could land on spi1. So when you do hook it up the bus is there, but we're not
mentioning it in a numerical fashion the way a board file does. When I showed the board file one — out of scope there was which controller driver got referenced; in the old board file stuff you just had to know that the first controller that got set up would be bus number zero. That's why this mechanism is deprecated and bad. With device tree you've got a nice ability to say "I know that spi0 corresponds to spi0" — or whatever it is. For example, McSPI on OMAP: I go to the data sheet, and if my schematic shows that this device is hooked to McSPI1, then I know to put the label corresponding to McSPI1 as the target. So you have a strict definition of the hardware, whereas the bus number in the old board file stuff is a Linux-fabricated thing that depends on the order the controllers were instantiated in. That's why, despite the sometimes obtuse nature of the syntax, you actually have a clear definition of what you're attaching to if you go through the rest of the DTS files for your platform — not clear in this context alone. And then the exception, ACPI being evil again: you have the case where it's showing you the scope, SPI1, buried in there, but what they do is instantiate the SPI0 bus with all the polarity stuff and everything kind of jumbled in with it. They take a different approach, but unless you're working on a very old kernel you shouldn't have to deal with that.

Are you talking about the device tree case specifically? Because that's pretty much... but yes, absolutely true: in the board file case, when you instantiate the spi_board_info, you have to specify the bus number. You do that in a more generic sense with device tree; it takes a bit more to unravel it back to a schematic, because describing the hardware is the end thing we're doing when we're hooking stuff up. Anybody else?

Okay, one more — performance questions. Trying to think back to the last time I did high performance: that ST7735 we were doing. I forget the resolution of it now — you're asking me to do math in my head from years ago — but I think it did 160 by 140 resolution, 16-bit color depth, and we were doing 30 frames a second. With DMA that was no problem, and that was a 40 MHz SPI clock or so running it. You can do the math on the byte throughput — I can't remember if I have the resolution right, but that's actually a fairly high rate of speed, and it's not even touching the capabilities of what most SPI controllers can do these days. So whatever — 160 times 120 times 2 bytes, 30 times per second — that was no problem at all, once we fixed the bugs in the DMA engine driver and so forth that the controller driver back-ends on. That thing just ran fluid, no problem at all — hardly any CPU usage. So to answer your question about performance: I can't tell you an upper bound; it's going to depend on the controller driver. The biggest constraint is latency — that's where you eventually fall over with the Linux SPI subsystem, in rare cases: if you're really trying to get samples at some specific interval for an engine controller, then you have trouble. It's not really on the bandwidth end, and the quad SPI parts for flash actually run reads at higher speeds than what I quoted.

So, one last question, then we've got to go, because it's lunchtime. Had to be you. [The question is about] a long transfer the controller doesn't handle, with no way to divide it up, because the actual part insists on it being one complete transfer — the transfer defining how long the chip select is asserted. You can use a GPIO chip select and screw that controller, to put it lightly — don't use their hardware. If the limitation is that your controller has a hardware chip
select that prevents the line being asserted for the length of that transfer, I would go use a GPIO: wire up the GPIO, and then you can leave it asserted however you want. Yeah — thank you for coming, everybody.

Okay, once again, the actual talk has been videoed, so you can go look at it online — that's available through the SCALE channel on YouTube; that's the place — and the slides will be uploaded as well, so you'll be able to look at those much closer; they'll be on the SCALE website. So thank you for coming, and we'll be back here around two for the next talk. Thank you. 1:30? I don't think so. It's 1:30? Okay, it's 1:30, so we'd better get to lunch or we're not going to get any.

...that's where my recommendation was: why don't you use Greybus and that protocol, because it's worked out all of the funky things. I remember we had the thing done on paper, and we abstracted it all — it was I2C first, and I hooked up an AT25, and we had this prototype rig where we could test against real hardware. And we found that, in our silly way of looking at things, we'd just hooked the stuff up in the SPI system, and then we found things like the AT25 driver relying on timeouts happening, so we had to have the timeouts and everything happening for the AT25 driver. It was like, wow. So there's definitely a good reason not to reinvent the wheel — I mean, unless you're doing a one-off thing for just one device; if you want it more generically... All right, let me come down there, let me take a look — I think I got it. Up the table, up the table. Go away. Bam. Actually, I want to make sure I get you wired up first, and then I'll bring water back for you — or you can put it in here if you'd like; I just want it inside a pocket, and the reason for that is that a lot of people fling things... well, that's a fling. 1, 2, 3, test — cool. It's okay, thanks. I think it should be done in 4T. Good crowd. You know, the hardest part is the language barrier. Actually, you're doing very well — I've been listening to you for the last couple of days, and I'm thinking that's not bad at all. Oh no, you're doing quite well, actually. Thank you — I think this is very, very hard. It is, especially since we don't have just native speakers here; we have lots of people who come from all over the place. Oh yeah, I'm not worried about that.

Hi Allison — these slides are kind of small and hard to see, so if you can move forward it's going to be a lot easier. So, let me get started here. Welcome to the afternoon session — the first afternoon session of the embedded Linux track here at SCALE; I appreciate you coming this afternoon. We're going to be talking about SoCs and FPGAs combined, kind of a new area that people are starting to use, and Marek Vasut is going to be speaking about that today. Thank you.

Thanks. So let me get started. First of all, I'm not a native speaker, so please pardon my slightly non-fluent English. Anyway, here is something about me: I work for this company as a contractor. I mostly do U-Boot, Linux kernel, and OpenEmbedded work — in U-Boot and Linux I'm working as a maintainer or contributor; in OpenEmbedded, just a contributor. I also do FPGA stuff, but that's just a hobby for me. So that's the way I operate; let's get to the talk.

It will be structured like this: first I would like to define what an SoC is and what FPGAs are — and if you put them in the same package you get the SoC FPGA, and I'll explain what that is and how it's beneficial for you. Then I'll go through the available offerings on the market: the small ones, the big ones, and the big ones which run Linux. Then I'll look more into the Linux part and how to use the FPGA part from Linux — the way that's actually easy to use, and the way that's not painful. And then there will be a conclusion, and we can go get some beer or something.

So let's talk about the SoC FPGA. First of all, the SoC: you probably know that it's a system-on-chip — basically it's a CPU core plus some peripherals. You put that
on a single die of silicon connect some pads to it and there you have an SOC FPGA again it's a piece of silicon with some pads but you can actually configure the logic or the behavior of the chip as a user you can program the device multiple times in field that's why it's called a field programmable gate array and if you put these two things together on a single chip you get the SOC FPGA solution usually in that setup you have some sort of buzz interface between the CPU core and the FPGA part on that chip so that's what SOC FPGA is I would like to talk a little bit more about that let me just ask you how many of you actually ever worked with an FPGA oh that's real nice but it's like half so let me explain that a little bit more so what the FPGA really is it's high speed programmable logic you can define the behavior of the programmable logic as a user by describing how the chip should behave when I say high speed like 200 to 800 megahertz clock frequency of the internal fabric but these are awesome numbers but it's not always the case if you have some complex logic which you put into the FPGA you have to slow the internal fabric down because there is signal propagation delay on the inside of the FPGA and you have to account for that so 800 megahertz is like what the fabric is rated for and if you put in the CPU core let's say it goes down to 40 100 or something like that the FPGA also has a lot of IOs so you can connect a lot of inputs and outputs with all sorts of different properties into the fabric and by different properties I mean like different voltage levels from let's say 1.2 low voltage TTL all the way to 3.3 let's say you can have also differential inputs some FPGAs have some specialty blocks which allow you to do simple encoding and such stuff so all that is there and if you have such nice programmable logic device you can for example use that to interface some obscure high speed ADCs and then just some transformations in the FPGA to make it more 
palatable for the SoC on the other side, to get the data from the ADC, maybe. You can also model hardware in the FPGA, all sorts of hardware — CPU cores or some weird bus interfaces, all that is possible. So that's what the FPGAs are for. There are multiple vendors of FPGAs, so you can take your pick; the big brands are Altera, Xilinx and Microsemi, and there are also a couple of smaller ones, but I'm sure you can find those on the internet. I'm not doing PR for any of these vendors. What I would like to show you in the next slide is how the FPGA looks on the inside, to further clarify how that works. On the left side you see a top level die shot of the FPGA, and you can see three distinctive elements here. One is the red blob: this is where you define your logic function. The red blob is connected to other red blobs with this blue stuff, which is called the global interconnect. This is actually programmable as well, so think of it as assembling stuff on a breadboard: the red things are the chips and the blue stuff are like the wires. That's basically a good way to think about it. Then obviously you need to connect it to the outside, so we have these black boxes; these are the IO elements. You can think of them like the pads on the chip, but in fact there is some more logic in those, which allows you to do the voltage level adaptation and stuff — because the internal fabric of the FPGA runs at a reasonably low voltage and it's kind of sensitive, while outside you may have, let's say, an LED, which is pretty tolerant, so you just have this adaptation logic there. If you zoom in at the red blob a little bit more, you will notice you can further divide it — you can see that on the right top side there. What you will see is the green stuff, which is local interconnect, and around it are what's called logic elements; these are the subdivisions of the red blob. The local interconnect is not connected directly to the global interconnect. The reason for the local interconnect in these — in what's called a logic array block — is that you can do quick local connections there, and only if you need to go to the global interconnect do you connect through the local interconnect into the global one and then communicate with the other red blobs, the other logic array blocks. Now if you look at the logic element, which is the smallest building block of the FPGA — that's what you see on the right side down there — it actually consists of a lookup table, a LUT, on the left side, and optionally a register, which you can disable, so the output of the LUT either goes through the register or bypasses it. By basically chaining these smallest elements together you can build any digital logic circuit in the FPGA. The way the lookup table works is basically it has four inputs, and for each state of these four inputs there is a defined output value, which is 1 or 0; that output can either go into the register or directly to the output of the logic element. If you just use the lookup table you get combinatorial logic; if you add the register in there you get sequential logic. That's basically how it works. Now, are there any questions regarding how the FPGA works? How much stuff is in there? It really depends on how much you pay for the FPGA. Okay, let's see, Altera Cyclone IV: the smallest one has a matrix of logic array blocks which is like 24 by 30 big, and it has 16 logic elements per each of these entries, so 24 by 30 by 16, and that's what you get in that chip. But then there are smaller offerings from Lattice, which actually has an open source toolchain, which is super cool — it's not coming from Lattice, but if you want to look at the open source toolchain for Lattice, look at the IceStorm project page, it's really awesome. IceStorm, yeah — it's really awesome, thanks for the question. Any more questions? No? Then I'm moving on. So, why would you want the SoC part of the SoC FPGA? Because you have this FPGA, there's a
lot of logic elements in there, so why would you need to put a CPU next to it if you can put the CPU into the FPGA fabric? The problem is the FPGA fabric is actually quite expensive: if you buy an FPGA which has some basic amount of logic elements and you put a CPU core in there, the fabric is used up just for the CPU core — that's the problem — and it costs you like 50 bucks. So, not awesome, if you think about how much an ARM core costs, which is like 1 dollar or something. Also, if you put the CPU core into the FPGA fabric, it will not run that quickly: in the entry level FPGAs you can go to something like 100 megahertz on a simple core, and if you put, say, an Intel 486 into that, you'll be running at 30 megahertz or something, so it's super slow. Then again, you can have a hard ARM core next to it which runs at 1 gigahertz, and that's not a problem, right? So that's why you don't want to put a core into the FPGA itself. Then again, why would you want to have a CPU and connect an FPGA to it? Well, the reason is, for example: if you want 20 UARTs, sure, you can have 20 dedicated UART chips and connect them to the CPU over USB or whatever, but then your bill of materials will just be huge, right? So instead you use an SoC with FPGA, synthesize 20 UARTs into the FPGA, and you're done — that's perfect. Or you need some really special interface, aviation or something like that; it's just easier to put it into the FPGA than to get a dedicated chip which might not be in the right package which you need — especially in aviation this is kind of sensitive. So that's why an SoC with FPGA is good for you. So, what's available? There's actually a lot of stuff to choose from, and if you search the internet for SoC FPGA you will usually get the Cortex-A offerings, but there are even low-cost parts like the Cortex-M based ones, or an 8051-based SoC FPGA, even, from Cypress. Then there's Microsemi, which is kind of getting closer to being an actually reasonably usable SoC, and then you get the Cortex-A parts, which can just run Linux, and it's awesome. So I'll go through these four offerings and show you how to actually get started with them reasonably, and what the downsides and upsides are. Let's get through it. First of all, the PSoC. This actually evolved from chips which were used in smoke detectors. The idea was: you have a small CPU core and a lot of analog blocks which you can chain together and connect to some circuit which will give you some sort of analog signal, and since all the smoke detectors — the smoke detector analog parts — are different, you need some flexibility in the chip to deal with this difference. So what they thought about is: let's put a CPU core there, let's put some ADCs in there, some op-amps in there, and allow the developer to actually connect them together, configure them somehow, and deal with the differences in the analog part. In the end this became quite popular. You can get one for like 10 bucks, including a development kit, including a programmer, including everything — you can just get this $10 kit, plug it into your PC and do your thing, which is awesome. But there are also downsides, because the software for working with it is Windows only. Also, recently they switched from the 8051 to Cortex-M0, and even more recently they switched to M3, so it's getting increasingly powerful and useful, which is cool. And they added digital blocks, so it's no longer only an analog switch matrix — you also have digital blocks there now, registers, gates and so on, which is nice. Okay, so I mentioned that the design software is Windows only, but there is an open source project associated with the open FPGA IRC channel, which is called GreenPAK, and they are documenting the way the digital switch matrix works in this chip, so the GreenPAK project is definitely something you should check out. Regarding the Windows tool, it apparently also runs in Wine — apparently, I didn't try that — but if
you install it — which is slightly annoying, because you have to hunt down the right link for the right kit you have and then install that, and that kind of sucks — if you manage to do that installation, it looks like this. Basically what you have here is a schematic entry: that's where you configure the programmable part. You just connect this together, click compile, and it will spit out some blob which the CPU core basically executes to configure the switch matrix, and all the blocks which you can see there are actually mapped into the CPU address space, so you can address them, you can access them from your program. Alright, and I think I basically pretty much mentioned all this stuff, yeah. Alright, if you want to use an RTOS on this stuff, you can do that: basically just pull out the configuration blob from this PSoC Creator software, put it into the init code of your RTOS, and before you start the main thread just let this stuff program the programmable part, then start your main thread in the RTOS, and it will just work. So that's it for the PSoC. Any questions about the PSoC? No? Okay, now I have another one which is also Cortex-M, which is the Microsemi SmartFusion. This one is actually getting a little bit more — it looks a little bit more like a real SoC, but not really. It has a Cortex-M3, it actually has an MPU in there, and there exists a port and it runs uClinux 2.6.36, I think — but they were super proud, these guys who did that, that they upgraded to 4.2 right now, which is by the way deprecated for two years, I think. So yeah, it can run Linux, but not really, and if you need security fixes, forget it. It would be real nice if someone actually did update the Linux port and got it mainline; that would be awesome. You can get a kit for 125 bucks, sure, and you will need the design software to actually program it, and I have this slide here which explains how to get the design software running. Yes — and that's not all of it, actually. So first of all, yes, you need to register on the Microsemi website and download the tool, which is called Libero; you need to download a service pack, which comes separately, and then you need to download something which is called the license server daemons — they don't tell you about that, they tell you only about the first two things. So you install the Libero, then you install the service pack, then you figure out you are missing some 32-bit libraries, so you install those 32-bit libraries, right, and then you realize: oh hey, well, it tells me it doesn't have a license. So you go to the Microsemi website and you try to get the evaluation license, but there is this unobvious button, "fill the questionnaire", so you have to fill in the questionnaire and tell them, basically: hey, I'm not going to use it in aviation or anything. Alright, so at that point you get your license, which is sent to you via email, and then you need this sort of huge script, which has a lot of weird variables which they also don't tell you about — I actually had to scrape this stuff off the internet to get the design software to even boot. It boots for like a minute, because it starts checking all the licenses and stuff. And if you actually don't have the licensing daemons it will mostly work, because the licensing daemons ship with the installation of the design software — there is just one of them which is missing, which is the one from Synopsys, and that's in the licensing daemons package, so you actually do need them. I have no idea why, but okay. You will figure out that you are missing some of the licensing daemons at the point where you actually write some of your Verilog code or VHDL code and press synthesize, and it says: hey, I'm missing this one single file. And sorry, but this is not all of it: if you synthesize your design in the Libero tool, you need to flash it into the SmartFusion device, right? And for that there is a tool which is called FlashPro 5 — except this is using some really obscure windowing system from Actel, which is called
WinFlashU, and I just didn't manage to get it working on a modern Linux system, sorry. Actually there is a replacement tool which is called FlashPro Express, which is written in Qt, which is awesome — but if you have some old projects which are still using the FlashPro project files, you actually need to use FlashPro to convert them into the FlashPro Express format. So you are basically stuck: you cannot launch FlashPro to convert these projects into the FlashPro Express format, sorry. I actually wanted to show you the uClinux, but I'm sure you can find some tutorials on how that is done. And here is an example of the Microsemi design software, how that looks. It actually looks pretty nice — the first impression of it is pretty nice — but it's extremely rudimentary, and compared to the bigger design software, the stuff from Intel and from Xilinx, this is a super simple tool and you will soon find that you are missing some stuff. But when you launch it you get some wizards which allow you to configure the FPGA part, and what's in the CPU part, what's enabled there, how the buses between the FPGA and the CPU work, and how the interrupt mapping between the CPU and FPGA is — that sort of stuff. So at least they have wizards for that. Oh, and you can get a radiation hardened version of these parts, so that's, I guess, cool. Okay, now I'm getting to the more interesting stuff, which is running Linux. First of all, the Altera SoC FPGA. The older ones are Cortex-A9 systems, either single core or dual core, with the standard set of peripherals: SD card, SPI, NAND flash controller, QSPI, USB, that sort of stuff. Now there is a new upcoming chip, the Stratix 10 from Altera, which is Cortex-A53, so ARMv8 — but this is bloody expensive, so I'm not sure if this is that interesting. It usually runs Linux and U-Boot, which is cool, but you can also run an RTOS on it; BSPs are available. Oh yeah, and it can run in AMP configuration, Linux and an RTOS side by side on the dual core part, which I find to be quite interesting; there is a tool called Sparrow from Altera which allows for that. This is how the design software looks. It's called Quartus — now an "Intel FPGA" tool, because Altera is now part of Intel. It's proprietary, but it runs on Linux, and it runs real well. The reason for that is that if you're working with huge FPGAs, you just need a build farm somewhere to offload your projects to, and the build farm will usually be Linux — so that's why it runs real well. And — was that a question? No, that's Verilog, thanks for the question, and it's actually probably some broken Verilog, because I just took it and stripped it down and wanted to show you something. Okay, yeah, there is Project Typhoon, which is working on Cyclone IV bitstream documentation, so that's also kind of work in progress to get an open source tool for Altera FPGAs. Okay, regarding software support on Altera. Bootloader: U-Boot. Altera is providing some ancient version, 2013.1.1 — well, the upside is that they actually picked the fixed version, the .1.1, which is awesome — but they disabled a lot of stuff and hacked it up, and the experience is not awesome if you are working with mainline U-Boot a lot. On the other side, mainline U-Boot actually works on the SoC FPGA: it's usable, it's used in production, and Altera is actively helping to maintain it, which is perfect. If you want to use U-Boot on SoC FPGA, just please use mainline. But there is another option: if you have some sort of problem with the GPL, there is a bootloader called MPL, and basically, in its hardware init, it loads some blob from QSPI flash and then just starts it. The problem is this bootloader still contains the bugs which were fixed in U-Boot, so it's borderline unmaintained, the MPL; it just exists. I saw it in use like once, so I thought I would mention it. For the Linux kernel, well, you have two options, obviously. The vendor kernel: Altera is trying to keep track with mainline, so it's a couple of versions
behind mainline. They have some questionable patches there, but most of the IP block support for what's on the HPS is in mainline already — except for, I think, NAND. Oh yeah, there is a Denali NAND controller in there which is not super well supported in mainline, so if you have something using NAND you might want to wait a little; actually there is a guy from Socionext working on fixing that Denali NAND driver, and this will probably happen in the next few releases. The FPGA part still needs a couple of patches in mainline; in the vendor kernel this is actually usable, but I'll talk about that later. There is a competing product from Xilinx. It's basically the same thing: Cortex-A9, either single core or dual core, and they have a new one, the Cortex-A53 based Zynq MP. Again, the usual stuff — it's an SoC: SD card, QSPI flash, DDR DRAM, I2C, CAN, whatever. It has a lot of multimedia stuff, which is quite interesting; compared to Altera they even have a GPU in the Zynq MP, which might be interesting to some people. On the other hand it's a Mali, so no open source driver — you're stuck with blobs, and that's not awesome. And actually, if you're running 32-bit user space on that Zynq MP, there is no Mali driver for that yet; the 32-bit blob for the Zynq MP will be available in the next release from Xilinx. Too bad. And again, it runs U-Boot and Linux; RTOS offerings do exist. So let's see software support. First of all, the design software looks like this. It's called Vivado. It again works well on Linux, because that's what they need. It's proprietary, unfortunately, but the same guy who's working on the IceStorm project is actually now working on documenting the Xilinx 7 series, which is basically the Zynq-grade devices, and eventually there might be something in a couple of years which can produce a bitstream and is not proprietary. As for the bootloader on Zynq, this is a little bit more complicated. There's a guy from Xilinx, Michal Simek, who is doing a lot of upstreaming of the U-Boot stuff, so Zynq support in mainline U-Boot is pretty good, and the Zynq MP is coming; there are a couple of things which are missing, because ARMv8 is a little bit more complicated in the boot process. On Zynq you can just take mainline U-Boot, compile it, and it will work — it will produce this boot.bin file, you put it on the SD card and it boots. With Zynq MP you actually need to go the FSBL way, which is Xilinx's first stage bootloader thing, which starts up, configures DRAM, then programs the power management unit, then loads the ARM Trusted Firmware, then optionally programs the FPGA part, and then starts U-Boot. The thing is, even on the Zynq MP you can start mainline U-Boot and it will get you into the U-Boot prompt, but when you start booting Linux on Zynq MP without the ATF part, Linux will crash. So on ARMv8 you actually need the ATF part, and this will land in U-Boot in like the next two-ish releases — the ATF loading is already work in progress, the patches are on the mailing list, so that's going to happen soon. As for Linux support, again, similar to Altera: a reasonably recent vendor kernel with some patches. Basically the Zynq, the older Cortex-A9 based one, is available in mainline and most of the IP blocks are supported; on Zynq MP some stuff is still not in mainline, because the chip is quite new, and the state of the FPGA part is basically the same as on the Altera part. Any questions regarding this stuff? Yeah. So the question is whether the FPGA has its own set of peripherals, and what the interconnect is between the FPGA and the SoC. So, yeah, the FPGA has this control logic which is available on the CPU bus.
So if you want the CPU to actually program the FPGA, you interact with this control logic, and then there are on-chip buses between the CPU and the FPGA fabric, so that you can synthesize your own peripherals in the FPGA and basically map them into the CPU address space — through these windows in the CPU address space you just access the registers over the AXI buses. Does that answer the question? Yeah, that's correct, but you need to synthesize the register in the FPGA fabric itself, basically. Well, it depends on the implementation: there are non-volatile FPGAs, but there are no non-volatile SoC FPGAs yet, to my knowledge. What Zynq does is the ARM core actually comes up first and then it programs the FPGA, so that happens before U-Boot even comes into play. But this is actually optional: you can then use U-Boot to program the FPGA bitstream as well, or you can use Linux to reprogram the FPGA bitstream in the FPGA. As for having the FPGA build its own bitstream — no one did that. The problem is there is this step called place and route, and it takes a huge amount of resources — it's an NP-complete problem — and the ARM core is just not powerful enough to do that. I was trying to synthesize, let's say, a Nios II chip, and that takes like 20 minutes on an Intel machine, so on the ARM core it would take a huge amount of time. Yeah, because the Lattice FPGA is really quite small. It is, yeah. Any more questions? Yeah — it actually exists as pre-production silicon, to my knowledge, but I don't know what the interconnect is between the CPU and the FPGA. Last time I checked, they basically glued these two things on a single chip and connected them with PCI Express on the inside.
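To make those "windows in the CPU address space" concrete: on the Cyclone V SoC parts, for instance, the lightweight HPS-to-FPGA bridge window starts at physical address 0xff200000, and a register of a soft peripheral sits at that base plus whatever offset you gave it in Qsys. A tiny sketch — the 0x1000 UART offset is purely an assumed placement, not from the talk:

```shell
# Hypothetical sketch: computing the physical address of a register of a
# peripheral synthesized in the fabric. 0xff200000 is the lightweight
# HPS-to-FPGA bridge window on Cyclone V SoC; the 0x1000 UART offset is
# an assumption standing in for wherever you placed the core in Qsys.
H2F_LW_BASE=$(( 0xff200000 ))
UART_OFFSET=$(( 0x1000 ))
REG=$(( H2F_LW_BASE + UART_OFFSET ))
printf 'devmem 0x%08x\n' "$REG"   # prints the command you would run as root
```

Run as root on the target (and only with the bridges enabled), that `devmem` command reads the first soft-UART register; `mmap()` on `/dev/mem` does the same from C. This raw poking is exactly the fragile approach the device tree overlay and FPGA manager machinery later in the talk is meant to replace.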
And this is actually not new, because before that there was something called Intel Stellarton, and I would just ask you to look it up, because that's an interesting chip: they did the same thing, except they used an Intel Atom and an Arria II from Altera, and again they put them on the same package. This never caught on, so you cannot find that much about it on the internet, but it exists. And now they will have a Xeon with an FPGA, and it makes perfectly valid sense, actually, because data centers, right? So that's going to be really cool. Does that answer your question? Cool. Thanks. Okay, so I'll move on. Okay, let's assume that you have a Cortex-A based SoC FPGA and you decide to put it in your product and want to roll out something which is usable. Basically this is how you do it. You get a working bootloader on it. On the Altera SoC FPGA you just use the design tool, the Quartus — you compile the Quartus project, basically the FPGA project. From there you run the BSP editor and generate handoff files, which are a couple of headers, and from that point on you just use mainline U-Boot. There is a script in there which you point at these handoff files; it will process them and make them a little bit less confusing. You put them into your board directory in U-Boot, add a config file into U-Boot for that board and a couple of Kconfig entries to configure it. Just make, and it will produce U-Boot with SPL — the u-boot-with-spl.sfp image — and you put that either on the SD card or into the QSPI flash, boot the board, and you're pretty much done. On Zynq I think this is actually even a little easier. You use the Vivado tool, again, then generate the HDF and unzip the HDF — because that's basically a zip file. From there you extract the psu_init or ps7_init files, depending on which chip you are targeting, put them again into the U-Boot source tree, compile, and this will generate the boot.bin; you put it on the SD card or QSPI flash the same way and boot the machine.
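The Zynq side of that recipe can be sketched as a few commands. Hedged heavily: `zynq_zed` is just an example mainline defconfig name and the mount path is an assumption, so substitute your own board; with `DRY_RUN=1` (the default here) the script only prints what it would do instead of running it:

```shell
#!/bin/sh
# Sketch of the mainline U-Boot flow for a Zynq board, as described above.
# Assumptions: a mainline U-Boot checkout, an arm-linux-gnueabihf toolchain,
# and ps7_init files already dropped into the board directory from the HDF.
DRY_RUN=${DRY_RUN:-1}
run() {
  # dry-run wrapper: echo the command instead of executing it by default
  if [ "$DRY_RUN" = "1" ]; then echo "would run: $*"; else "$@"; fi
}
run make zynq_zed_defconfig
run make -j4 CROSS_COMPILE=arm-linux-gnueabihf-
# the Zynq SPL build produces boot.bin; it goes on the SD card's FAT partition
run cp boot.bin /mnt/boot/
```

Set `DRY_RUN=0` inside an actual U-Boot tree to run it for real; the same shape works for the Altera flow with the handoff files and a socfpga defconfig in place of the ps7_init files.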
That way you get a bootable system image. If you want to program the FPGA, there is an fpga command in U-Boot which you can use to just load the bitstream into the FPGA. If you want to do it in Linux, this is what vendors tell you: use this device node, cat the bitstream into it, and this will program the FPGA; then enable the bridges by poking some sysfs files, and then eventually use devmem to poke the hardware registers in the FPGA. If you are better than that, you can use UIO, but that's also not awesome, because user space drivers suck. And you can implement something like that and bind a driver from user space to the stuff which is in the FPGA — that's doable, okay — but then someone comes and reprograms the FPGA, because he can, and your driver is still using stuff in the FPGA. So what happens when someone accesses the contents of the FPGA then? Your system will usually either hang or start completely misbehaving, because that's just undefined. So no, this is not how it can work. The way it actually can work is by a combination of two things: Linux with device tree overlays, and the FPGA manager. Matt Porter did speak about device tree overlays before. The way this works is that you basically say: my device tree is not static anymore, it's dynamic, and I have these device tree fragments which I can load into the kernel to patch the device tree, and the kernel will bind the drivers based on what's in those fragments; and if I unload the fragment, it will unbind the drivers and just do the right thing. The fragments can be loaded either through configfs or some other means; usually configfs is the way to go about it. How do you load and unload a fragment? Here is an example. You create an overlay directory with mkdir; then you have the fragment, called in this case overlay.dts, and you compile it with the device tree compiler with the special option -@, which tells it to build a fragment with a symbol resolution table; and then you just put the result from the device tree compiler into the dtbo file in the configfs — the dtbo is basically a binary attribute in the configfs. Once you do that, the kernel will recognize: hey, my device tree changed, are there any new drivers which I need to bind? Yes? No? Okay, it will bind them. Cool, new stuff will appear on your system and you can work with it. Once you are done with this device tree overlay, you can unload it by just doing an rmdir, and you're again good. As for the way the device tree overlay looks — I'm not sure if this is really readable, but that's how the DTO looks. Basically you specify the /plugin/ at the top, which indicates that it's a DTO, and then you define fragments in there. Each fragment patches some piece of the device tree: in this case I am patching ethernet there — I am enabling one ethernet in my machine and setting the phy mode to RGMII — and in the other one I am adding an I2C EEPROM under my I2C expander node. So that's how the device tree overlay works. The other part of this is the FPGA manager. This is what actually handles the FPGA control logic and allows you to reprogram the FPGA bitstream. It supports partial reconfiguration, actually — but this is kind of dangerous, and if you don't know what you are doing you are going to shoot yourself in the leg with partial reconfiguration — but we support that. It also manages the bridges between the CPU and the FPGA, so it prevents the user from doing something stupid, which is awesome. It supports the SoC FPGA from Altera and the Zynq from Xilinx, the Zynq MP is coming, but there is no support for the Lattice iCE40 yet — that would let you have a Raspberry Pi with an SPI-connected iCE40, which would be cool. So how do you put it together? Basically you write the special device tree overlay which triggers the firmware loading through the FPGA manager and then binds the content which is in the FPGA, and the user is happy. But there is one detail which you should keep in mind: if you unload the device tree overlay which is using the FPGA manager, the FPGA is not turned off, and this is because there can be some critical logic in the FPGA which you do not want to shut off when you unload the device tree overlay. So this is something to keep in mind; there has been quite some discussion on this topic on the linux-fpga mailing list. Okay, so this is how the device tree overlay with the FPGA manager looks. At the top you can see that the target path is actually going into the FPGA manager bridge, so this specifies that my hardware is under that bridge between the CPU and the FPGA. I have an entry which is called firmware-name — this is my FPGA bitstream — and then I specify at the bottom a UART which is instantiated in my FPGA. So when I load this, the FPGA manager will use the Linux firmware loading facility, load the rbf file — the firmware-name — load that firmware into the FPGA, enable this bridge 0, and bind the UART driver, and I'll have one more UART on my system. So that's how it works. So, conclusion. There are a lot of programmable logic devices available. With the small ones, I guess you should stick with an RTOS or just bare metal stuff. With the big ones, if you want to use the FPGA in Linux in any reprogrammable way, then please take a look at the DTOs and the FPGA manager — this is the way to do it. Please don't use the vendor stuff, because it is so much pain in the backside. And this is all I have; thank you for your attention. Questions? Yeah — that's an interesting question. So the question was: if you have an SPI controller in the FPGA and you have the FPGA firmware on an SPI flash which hangs off that FPGA SPI controller, how do you make sure that you can still access that file? So the thing is, if you have an FPGA flash which has a file system on it and it's mounted, then you will not be able to reprogram the FPGA, because it will say: hey, there is someone using stuff in that FPGA. The system will just prevent you from doing that. If it's a raw flash, well, that won't happen, because the firmware has to be under /lib/firmware actually, so it has to be mounted there somehow. What's that?
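Going back to the overlay-plus-FPGA-manager slide for a moment: a minimal sketch of what such an overlay source might look like. Every label here (`&fpga_region0`, `&fpga_bridge0`), the `.rbf` file name, and the register window are assumptions standing in for the slide's real values:

```dts
/dts-v1/;
/plugin/;

/* Sketch only -- labels, firmware name and addresses must match your design */
&fpga_region0 {
	firmware-name = "my_design.rbf";    /* picked up from /lib/firmware */
	fpga-bridges = <&fpga_bridge0>;     /* enabled after programming */
	#address-cells = <1>;
	#size-cells = <1>;

	/* one extra 16550-style UART synthesized in the fabric */
	fpga_uart0: serial@20000 {
		compatible = "ns16550a";
		reg = <0x20000 0x20>;
		clock-frequency = <50000000>;
	};
};
```

Compiled with `dtc -@` and loaded through configfs as described above, this would make the FPGA manager program the bitstream, enable the bridge, and bind the 16550 driver to the soft UART — and `rmdir` on the configfs directory unbinds the driver again, while leaving the FPGA itself programmed.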
Actually, it would be great if I could get a microphone, that would be awesome, because I didn't really catch that, sorry. I just wanted to be clear that I wasn't only talking about an SPI flash where the controller was a soft core in the FPGA; my understanding was that all the peripherals go through the FPGA, just to get maximum pins, just to reach the outside world, with a normal Linux running inside, say, the SoC part — and I just don't understand how it could read that firmware file and still have the FPGA free enough to be reflashed. So if I understand it correctly, the question is: if the SoC is routing all the pins through the FPGA, how can it access the peripherals on the other side — how could you reflash the FPGA while it's busy being a mux? So the thing is, the HPS part has dedicated peripherals, and they have some dedicated pins which don't go through the FPGA. How do you find which those are? What's that? How do you find which are not going through? It's in the datasheet — so basically you read the datasheet; like Matt mentioned, that's a good skill. I read the HPS datasheet and it didn't even say, in that one. If you actually go into the Quartus design tool, run the Qsys tool and edit the HPS component, there is a pinmux table which tells you which pins are muxed to which functions, so you can find it there. But the Qsys didn't make it clear which ones were going through — it looked like I was pinmuxing everything; that's actually where the confusion came in. Ah, right — so the stuff which is going through the FPGA is actually called LoanIO, and the stuff which is not going through is called nothing, like, it's just the HPS pins; LoanIO is what you use if you want to route some stuff through the FPGA fabric. I have a question back here — if I didn't answer it completely we can talk later about that, thanks. In your experience, which manufacturer of FPGA works the best with Linux? Well, Altera and Xilinx are just okay — they work, basically, because they have to, right — and then the Cypress stuff just works on Windows, and they tell you to use Wine if you want to use Linux, so there is no other way around that, unless you look into the GreenPAK tool, which is an open source tool for the Cypress PSoC 5 LP. And then there is Microsemi, which is like a total disaster, unfortunately — I'm sorry — like, it kind of works, but it's a lot of hassle to get it going. Okay, and in your experience, which of the major manufacturers has a good line for hobbyists who aren't looking for a super high end FPGA to build some custom ASICs, but just want to tinker with it — is there a brand that sticks out in particular? If you're looking for an FPGA without an SoC for a hobbyist application, look at the Lattice iCE40, because it has an open source toolchain, the Project IceStorm, and that's a really awesome thing. It's so much better than all these proprietary tools, because it just works, it's fast, and it's not so painful to install and work with — it's real great. Thank you. You're welcome. Just a little addendum to that testimony about the iCE40: I know somebody who's making a nice little dev kit, coming soon; it's going to be all open hardware, all open source, so that's definitely a good way to go for the iCE40. Can you tell us the name of that kit? I'm blanking — somebody I know up in Santa Barbara. Yeah, well, that's great. I noticed there's actually quite a lot of these coming up, because the official Lattice iCE40 dev kit just disappears from the shelves — it has 29 weeks lead time, because everyone wants one — and now there's a lot of hobbyists making dev kits, which is awesome. Thanks. The man back there asked you a question about something for hobbyists and you said — whatever you said — was awesome, and I'd like to know what brand that was so I could look it up. Oh yeah, it's this iCE40, and the open source toolchain is called Project IceStorm. Project what? IceStorm. What would be a use case for running an RTOS on an FPGA, as opposed to, say, a Cortex-M, where a lot of people run
them. So what would be the use case for running an RTOS on an FPGA core instead of a Cortex-M? The thing is, you can get much better timing with the FPGA, so if you, like, implement the real-time part in actual hardware, you can get much better response time, because you eliminate all this universal instruction thingy which the CPU basically is. The hardware can just latch the data in, do some sort of combinatorial processing, and immediately have the result available, while the CPU has to, like, read instructions, execute the instructions, do some decision making, and there is a lot of hardware involved in that which you probably don't need in some real-time applications. Anybody else? Thanks. I'll give one more, hold on. Just curious, what's your use case for the FPGAs? Well, I'm interested in how the Altera FPGAs work, so I'm running this Typhoon project and looking kind of under the hood, and actually we have customers who are using FPGAs for, for example, video processing, so you have, like, multiple camera inputs, you do some video processing in that FPGA, because you're streaming the data in, do some changes with that, and then provide a reduced data stream to the SoC, so the SoC can deal with that sort of bandwidth of video data. Also, if you have some really high-speed ADCs, you can pre-process the data from the ADCs in the FPGA. That's basically my use case. There are also some automotive people who need, like, a bazillion UARTs, which are kind of slightly special, like the K-Line stuff and LIN stuff, so that's why they use the SoC FPGA systems as well. Thanks. Thank you very much for coming. This has been recorded, so you can find it online on YouTube, and the slides will be uploaded to the SCALE website, so that will be available later on, and of course if you have any questions, feel free to ask. Thank you. You had 126 people in the room and you did very, very well, don't worry about it, it's really not your first language and that makes it a little tougher, but your
content was superb. Welcome to this afternoon's session, an introduction to USB, for those who haven't seen this before. This is the embedded track, but USB is pretty universal, everybody uses it everywhere, so we're hoping that you enjoy this. Today's speaker is Alan Ott; he's been working with the USB subsystem for about 10 years, within the Linux kernel, and he's here to tell us all about it as an introduction. Thank you. Well, I've been doing USB for about 10 years, but not in the Linux kernel for all of that time, but definitely on the firmware side and the user-space side. And I'm really loud, I'm very loud, anyway, it runs in my family. All right, so just a little bit about me: I'm currently the platform software director at SoftIron. SoftIron is a startup and we're making 64-bit ARM servers and data center appliances, well, storage appliances specifically. Currently we have some products that are shipping now, the Overdrive 3000 and 1000 ARM servers, and we have storage products that are in design. I work on the Linux kernel, I do a lot of work in firmware, particularly on USB these days, I've given training on USB, and like I said, one of the things we'll talk about today is my M-Stack USB stack for PIC micros, and I've formerly worked on 802.15.4 wireless. So let's get right into it and we'll talk about USB. So USB, Universal Serial Bus: it's a standard for a high-speed, bi-directional serial bus, created by the USB Implementers Forum, which is a non-profit consortium of member companies that developed it, and they currently still manage it; you can find information about them at usb.org. So, USB bus speeds. Most of you are probably familiar with this: low speed, full speed, high speed, super speed. At this point, most devices that you're going to find are going to be full speed at 12 megabits or high speed at 480 megabits. And when we talk about speeds, we're talking about the rate of bit transmission on the bus. Now of course this is not the data transfer speed, because in any protocol you're
going to have overhead, and depending on how you design your own USB protocol, you could have significant overhead; we're going to talk a little bit about that, and of course it can be mitigated. USB standards: you can see some dates here. One thing that's important to note here is that the USB standard version does not imply a bus speed; they're not the same thing, the standard version and the bus speed, but you'll hear people incorrectly refer to them that way all the time. So in USB you have a host, and then you have devices. The host is something like a PC, a server, or even an embedded Linux system. It's responsible for control of the bus, and it's also responsible for initiating everything that happens on a USB bus. It's also responsible for enumeration: when you connect a USB device to a USB host, it's the host's responsibility to figure out what kind of device that is and configure it properly, and that's called enumeration. On any given bus you have one host. So then there's also devices. A device provides functionality to the host, according to the USB standard, and if you think about USB devices that you might use every day, like a keyboard or a mouse, that's really the way it works: the mouse is providing functionality to the USB host, which is the PC. So you can have many devices per bus, and you can connect them through hubs. The way USB is designed, there is some complexity with respect to hubs and dealing with hubs, but that's all handled by the operating system for you, so it's transparent at both ends, which is very nice. So USB is a host-controlled bus, like I've talked about; this means that nothing on the bus happens without the host first initiating it, so devices cannot initiate a transaction, and devices cannot interrupt the host, there's no interrupt line or anything like that. USB is also a polled bus, so that means that the host polls each device, requesting or sending data whenever the host wants to do it, so it's completely under host control. That's a very important thing to keep in mind as we keep going through this. USB defines device classes. A device class is a standard protocol for common device types, so the nice thing about device classes is that the
same driver can be used for every device that's in a device class. That means you don't have to have a new driver for each brand of thumb drive you use with your PC, because they all implement the protocol of the device class: every thumb drive can be communicated with in the same way, for example. So this allows true plug and play in a way that didn't really exist before USB for PC peripherals. Some examples of device classes are HID for input, mass storage for thumb drives, CDC, which kind of encompasses anything involving communication, so that's like serial ports, network adapters, things like that; audio, hubs, printers, etc. are all device classes. So, a little bit of terminology: IN and OUT. In the USB parlance, the terms IN and OUT indicate direction from the host's perspective. USB is considered to be a host-centric bus, so that means that IN and OUT are always going to be defined from the host's perspective. It's important to keep that in mind, especially in terms of addressing USB concepts. Specifically, OUT means communication from the host going out to the device, and IN means communication from the device coming in to the host. Terms like input and output, and related terms like that, are also going to be the same way. This is the same in the USB specification and also in all of the device classes; this is very common terminology. So let's have a look at the logical USB device. A logical USB device has some hierarchy associated with it: we can see we have the USB device, the main, largest piece there; inside of it we have two configurations, and then in each configuration we have interfaces, so we have in this case multiple interfaces on each side, and inside those we have endpoints. Starting from the bottom, an endpoint is a source or sink of data. An endpoint is where data originates from, or it's where data is addressed to, so you can think of an endpoint somewhat like a networking socket, if you do PC programming: it's where the data comes from or where the data goes to. Endpoints
are implemented in hardware. Above that we have an interface, and an interface is a related set of endpoints which provides some sort of function on the device. So for a single function of your device, you'll have an interface that will group all of those endpoints together. The next level out is a configuration. Notice that we have two configurations here; only one configuration can be active at any given time for a USB device. That's different from interfaces, where for the selected active configuration, all of the interfaces for that configuration are active at the same time. Now, it's pretty uncommon to have a USB device that has more than one configuration; typically USB devices only have one configuration and you don't have to worry about it. But it's very common to have multiple interfaces: having multiple interfaces is how we implement what we call a composite device. A composite device is where you have a device that does one or more, I'm sorry, a composite device is a device that does two or more logical functions. So if you had something like a keyboard, and it had a trackball on it, that could be implemented as a composite device; we'll see another example of a composite device later on. Another thing, too, is that if you're implementing a device class, the device class will specify what types of endpoints need to be in that interface, so a lot of times this type of work is done for you; we'll look at that coming up here. So these slides here describe the things I already said, so I'm going to skip over them, but they're here for reference, talking about device, configuration, interface, and endpoint. The important things to note here: a device can have multiple configurations, and only one configuration is active at a time; a configuration can have multiple interfaces, and all of the interfaces within an active configuration are active at the same time; and an interface can have multiple endpoints, all of which are active at the same time. So like I said, most USB
devices only have one configuration, but if you have multiple interfaces, that's how we implement a composite device. So let's talk about endpoint terminology, because this is a little bit tricky, and people often throw around these terms imprecisely when talking about endpoints. The first thing we're going to talk about is an endpoint number. An endpoint number is a 4-bit integer associated with an endpoint. Endpoints transfer data in a single direction, and those directions are IN or OUT. So then, combining those, an endpoint address is the combination of an endpoint number and an endpoint direction. Let's take a look here at some examples: we have endpoint 1 IN and we have endpoint 1 OUT; those are two distinct and different endpoints, even though they have the same endpoint number. Endpoint 3 IN would be an example of another endpoint. In addition, endpoint addresses are often encoded, as they're passed along the wire in descriptors, which we'll talk about, with the direction and the number being encoded into a single byte. The direction is encoded in the most significant bit, and it uses 1 for IN and 0 for OUT, and then the number is encoded in the lower 4 bits. One thing that makes it easy to remember, and this is the case all throughout USB, at least in every device class I've looked at and all through the main standard, is that 1 always means IN and 0 always means OUT, and it's easy to remember because a 1 looks like an I and an O looks like a 0, so that's just an easy way to remember it. So, some examples here of encoding that you might see: endpoint 1 IN will be encoded as hex 81, so we can see that the high bit is set and the low nibble is 1, where endpoint 1 OUT is 0 1. So the question is, can you have both an endpoint 1 IN and an endpoint 1 OUT at the same time? Yes, and that's very common across USB devices: if you have an in-out pair, you'll use the same number for both, so you have endpoint 1 IN and endpoint 1 OUT together. What about endpoint 0?
We'll talk about endpoint 0 a little bit later, and we'll talk about what it's used for. So, tools like lsusb will show both things: they'll show EP1 IN and hex 81, for example, but it's important to know that these two reference the same thing. Endpoint terminology is tricky, but it's very important in order to communicate about endpoints and about USB precisely. A device can have up to 32 endpoints, so that's IN and OUT endpoints for numbers 0 through 15; the same number is used to describe two endpoints, and there's no such thing as a bidirectional endpoint, regardless of what some manufacturers' documentation might say. As an example, for something I worked on recently at SoftIron: this is a composite device, so it has a communication device class, and then it has a vendor-defined class interface at the bottom. Communication device class actually uses two interfaces; two interfaces are required, one is for control, the other one is for data, and they're set up to look like this. And then we have a vendor-defined class interface at the bottom that we use for firmware upgrades and things like that. So we have two completely different functions provided by the one USB device, and it's implemented as a composite device like this. So let's talk about descriptors. USB is what we call a self-describing bus; that means that each USB device contains all the information that's required for the host to be able to communicate with it, drivers aside. What that means is that we have no manual setting of baud rates, no setting of IRQ lines, base addresses, or anything like that; we just plug devices in and they work. That's something that we kind of take for granted today, but at the time when USB was developed, this was not the norm. This is all implemented in USB by what we call descriptors: devices communicate all of the data that's needed to communicate with them using descriptors. So we'll take a look at what types of descriptors there are and how this works. So at
enumeration time, when you plug your device in, during this process of enumeration, the host will ask for a set of standard descriptors, and these descriptors include things like the device identifier, so that'll have the vendor ID and the product ID; that's one important thing that can be used to determine which driver to load for the device. Other descriptors describe the logical structure of the device, the logical USB device: device, configuration, interface, endpoints; we'll talk about and show descriptors that describe all of those things. So all of that is described to the host using descriptors. Also, which device classes are supported is returned to the host during this enumeration time. Typically devices contain a device descriptor, a configuration descriptor, interface descriptors, class-specific descriptors if you're implementing a class, and then endpoint descriptors. Chapter 9 of the USB specification defines these standard descriptors; I say that here because you'll often find in USB code files named ch9.h or whatever, and they're referring to chapter 9 of the USB specification, where all of these descriptors are defined. So one tricky thing that happens during enumeration is that the host will request all the descriptors which are part of a single configuration as a single block, and this includes configuration, interface, class-specific, and endpoint descriptors. There's a standard request that happens during enumeration, it's called GET_DESCRIPTOR for the configuration, and that request means that the device should return all descriptors that are pertinent to a specific configuration. So let's have a look here at some descriptors, for example a device descriptor. You can see the device descriptor contains a length, and then basically, in this case, what's a #define for the descriptor type, so in this case it's a device descriptor; the USB version; a device class, where in this case 0 means that we're not defining a device class or specifying a device class at
the device level, but we're specifying it at the interface level later on. Then there are the vendor and product IDs, version numbers, and then indexes to strings that describe the device as well. An important thing here is also the number of configurations. So then, after the host has the device descriptor, for each configuration it will ask for the configuration descriptor. Now, this is what I call the configuration packet. In this example, the configuration packet has four descriptor structures; note here that there's a single configuration, a single interface, and two endpoints, so we can see, for configuration one's packet: a configuration descriptor, an interface descriptor, and then endpoint descriptors. So then, when we define this, we'll just populate all of the members of each one of those structs in order. You can see here we have the configuration descriptor; an important thing here is that it says num interfaces, so that's how many interfaces are part of this configuration. So then, knowing that, the host can parse the interface descriptor. In this case there's an interface number, and interfaces start at 0; the number of endpoints that are part of this interface; and then an interface class. Hex FF means that the interface class, or that this interface, is vendor-defined, so we're not implementing a specific device class or defined class. And vendor-defined USB devices are very useful for things that just don't fit the norms; I mean, not very many of us are going to end up making thumb drives, but a lot of us might end up making things that transfer data for maybe industrial I/O or something like that, something custom, so that's what vendor-defined classes are good for; in our case, it's the one that implements the firmware upgrade. Then we have endpoint descriptors; notice that we have two of them here. The descriptor type is set to be endpoint; notice we have the endpoint number in each of them, actually it's the endpoint address, so it's the endpoint number, and one of them has the high bit set, the other one does not have
the high bit set, so we have endpoint one IN and endpoint one OUT. We can also set a max packet size for each endpoint; in this case we're using 64 bytes, and we're setting the type of endpoint; we're going to talk about the endpoint types coming up here. So the preceding configuration descriptors describe a configuration, an interface, and two bulk endpoints, and if you want to see more examples of this, you can look at the files named usb_descriptors.c in any of the M-Stack examples, and we'll look at some of those coming up here. So let's talk about endpoints. There are four different types of endpoints. The first type of endpoint is control. A control endpoint is actually a pair of endpoints, a bi-directional pair of endpoints. Control endpoints are really the most complicated type of endpoint, and they implement multi-stage transfers; some of these terms we're going to define later, but suffice it to say now that control endpoints and control transfers are useful because the transfer of data is acknowledged at the software level, not just at the hardware level. There's a handshaking that goes on, so here you can be sure that once you've completed a control transfer that, yes, the device did receive it, and the device has returned some status to you, so it either accepted the data, or it rejected the data, or it returned the data that you asked for. Control endpoints and control transfers are used during enumeration; they can also be used by an application, and they're mostly used for configuration items. Like I talked about, with control transfers being acknowledged at the software level, that makes them the most robust type of endpoint, so they're for when you really want to make sure that something happens on the device. Interrupt endpoints transfer a small amount of low-latency data. This is implemented by reserving bandwidth on the bus, which we'll talk about next. A classic example of where interrupt endpoints are used, where you have not very much data but it's time-sensitive, is human interface devices.
So it's something like a mouse: the mouse doesn't generate very much data, but when you move the mouse, the pointer on the screen needs to move without any kind of perceptible delay. The next kind of interrupt, I'm sorry, the next kind of endpoint is bulk. Bulk endpoints are used for large, non-time-sensitive data transfers. Examples of these are things like network packets or mass storage. If you imagine something like a thumb drive, it's really not a big deal: most of your latency is going to come from the flash chip that you're reading off of anyway, and if bulk data, or data that you're reading off a thumb drive, is a little bit delayed, it doesn't really matter. So if you imagine a situation where you have a mouse plugged in and you have a thumb drive plugged into the same bus, and you're transferring files off the thumb drive, you still want your mouse to be snappy even though the thumb drive is operating at full speed, and that's how this is implemented in USB, and it actually works very well. Bulk endpoints don't reserve any bandwidth on the bus; they use whatever time is left over that all of the other endpoints don't use. Now, in theory, bulk endpoints can suffer from starvation because of this, but in practice most USB buses are idle most of the time, and so that's not really an issue. In practice, bulk really gives you the most throughput. The last type of endpoint is isochronous. Isochronous endpoints transfer a large amount of time-sensitive data, so you get a lot of data and it's time-sensitive. This is implemented by delivery not being guaranteed: packets for isochronous transactions are not acknowledged at the hardware level, there's no ACK; if a packet gets lost, it just gets lost, and it's not retransmitted. This is really useful for audio and video streams, where late data is really just as good as no data at all. If you imagine an audio stream playing, and a packet gets dropped, it's a whole lot better to just hear a small break in the audio than it would be for the device to basically stop
the stream, request a retransmission, and then pick it up later on. If you've ever used Bluetooth audio, you'll know that this does happen in Bluetooth audio: in Bluetooth audio, the thing that I just described is what happens, so if you drop a packet, and you do, because it's wireless, the stream gets backed up and restarted, which is really distracting if you're listening to music and you lose the beat. This doesn't happen on USB. So let's talk about reserved bandwidth. Some of the endpoint types reserve bandwidth, and this is how we can guarantee bounded latency: for interrupt endpoints, we can guarantee bounded latency by reserving bandwidth on the bus. Interrupt, isochronous, and control endpoints all reserve bandwidth, and bulk gets whatever is left unused each frame. It's up to the host to determine if there's enough bandwidth available on the bus for your device; if there is, that bandwidth will be allocated, and if there's not, the host will fail to enumerate the device. Endpoints also have a maximum length, and the maximum length is the amount of data that an endpoint can support sending or receiving per transaction. We're going to talk a lot about transactions coming up here; a transaction is basically the basic unit of moving data on the bus, the atomic operation, if you will. So we have maximum endpoint sizes; just notice here that bulk and interrupt are small on full-speed devices, but they're larger on high-speed devices. Endpoints for bulk and interrupt that are 64 or 512 bytes, depending on your speed, are very common. So, transactions: like I said, a transaction is the basic process of moving data to and from a device. Remember now that USB is host-controlled, so that means that all transactions are initiated by the host, much like everything else in USB. A single transaction can move up to the endpoint length in bytes, so if you have an endpoint length of 64, that means a single transaction can move 64 bytes, and it means that you have to have hardware buffers on the device and on the host
that can support 64 bytes. The entire transaction process happens at the hardware level, and we'll talk a little bit more about this, I guess it's later on. The entire transaction happens at the hardware level because of the strict timing involved with transactions; basically, it's a hardware-to-hardware process that happens, and then you get an interrupt when it's done. Transactions have three phases: there's the token phase, the data phase, and the handshake phase. The token phase is where the host sends a token packet to the device, and this indicates the start of a transaction, and it also indicates the type of transaction. The type of transaction can be an IN transaction, an OUT transaction, or a SETUP transaction; we'll talk about these. Next is the data phase. The data phase is where the host or the device sends data, based on the direction, so for an IN transaction, after an IN token, the device will send data to the host, for example. Next is the handshake phase. The handshake phase is where the device or host sends an acknowledgement; now, it's going to be the opposite party from the one which sent the data phase, of course, because it's the acknowledgement, and the acknowledgement is sent either way, depending on which direction the data is moving. Transactions happen at the hardware level, like I said, because strict timing is necessary; this means that the software must set up the hardware to handle transactions before they occur, particularly on the device side. That means that the software, or firmware, on the device side has to be prepared for what's coming, not reacting to what has happened. What this means is that you don't wait for an IN token and then send data back; the hardware does this for you. What you do in your firmware is you set up an endpoint to respond to an IN token: you put the data in there that you want to send back when the next IN token arrives, and then the hardware will take care of moving that data and handling the ACK. It will
also take care of retransmitting that data if it doesn't receive an ACK in the proper amount of time. If the hardware is not configured to send data for a specific token, it will NAK, so it will basically say, I have no data, the transaction is over. Endpoints are typically implemented in a hardware peripheral, and typically the USB hardware device is called the Serial Interface Engine, or SIE. An SIE contains registers for each endpoint, and in these registers you'll have things like pointers to a data buffer, and a length, and whether this endpoint is valid for the next transaction or not. Now, it's up to firmware to configure these registers for transactions which are expected in the future; like I said, you don't react to tokens as they come in. The SIE will then generate interrupts when transactions complete, as you might expect. It's always clocking; I don't know if the clock runs all of the time, but the important thing is that it's host-controlled, so the device doesn't do anything unless the host initiates it, so in that sense it is synchronous, from an interrupt standpoint. Does that answer your question? As for transmission errors, or things like that, it's going to depend on your hardware; on the device side, devices are usually very simple hardware, so you may have to retransmit a couple of times, depending on your hardware, but typically not: not a lot of data gets lost on a USB network, because it's connected by wire, and those things are pretty robust these days, which is why isochronous endpoints work, where there's no acknowledgement but you just trust that the bus is good enough, and it almost all of the time is, which is why USB audio works. As for hubs, the host has to know that it's a hub, and that it will need to enumerate those devices; we can talk to those devices from user-space APIs, which is what we're going for; it's the USB host that knows that it's a hub, and that happens in the Linux kernel. So, we've talked about the token phase, we've talked about the tokens. So in the
token phase, the host will initiate every transaction by sending a token, like we said. The tokens contain a token type and an endpoint number. The device's SIE will handle receipt of the token, and will handle the data and handshake phases automatically; what we're saying here is that the device's SIE will need to be configured ahead of time, and once it's configured, it will handle all of that automatically. So for an IN token, this means that the transaction will be an IN transaction, where the device will send data to the host using an IN endpoint; the data phase is device-to-host, and the handshake phase, where the ACK will be, is then host-to-device. The next type of token is OUT. This means that the transaction will be an OUT transaction: the host is sending data to the device using an OUT endpoint; the data phase will be host-to-device, in other words, it's going out, and the handshake phase, or the ACK, will be device-to-host. Then there's SETUP. This means that the transaction will be a SETUP transaction; SETUP transactions are used for control endpoints, so on endpoint 0, typically. It basically indicates that there are more transactions following, and what those transactions will be. A SETUP transaction is like an OUT transaction, but the data phase contains a setup packet, which is defined by chapter 9. So then there's the data phase: the data packet can be up to the endpoint length, the endpoint length being the maximum amount. Like I said, for IN transactions the data packet is coming in; for OUT or SETUP, the data packet is going out. If there's no data to be sent, or the device is unable to receive data, the device can send a NAK. So if you send an IN token requesting data, from the host side, and there's no data, the device can just send a NAK, and at that point the transaction is ended prematurely; the host will be notified of that, whatever software initiated that transaction, and the, I'm sorry, the host's hardware will be notified of that, and it will try again later. It's important to note here that a NAK is not a failure of any
kind; if anything, it's really more about flow control than it is about an error. It basically means that the device can't handle data right now, or it doesn't have anything to send right now, and we're going to keep trying until it does, all the way up to the timeout. The timeout is defined on the host, and it depends on the transaction, or actually depends on the transfer, and how you set it up; we're going to see how this timeout works later. The reason it's designed like this is because USB is designed for the host to be a lot faster than the device. If you imagine something like a PC and a mouse: the mouse has a really slow CPU, and the PC has a very fast CPU with a lot of resources. In USB, anything of any kind of complication is always pushed onto the host, so that the device can be as simple as possible, and this is an example of that. So let's take a look at an IN transaction here. The first thing that happens is the IN token goes to the device, and in this case the device is going to send back a NAK, basically saying, I have no data to send you; more specifically, it means that the hardware SIE for this endpoint is not configured to send any data back. So then the host will retry, and in this case it gets another NAK. The next thing that happens, the host will try again; in this case there is data, and the device sends data back, and then the host sends an ACK to the device. That's an IN transaction. The next is an OUT transaction. OUT transactions are a little bit different: first there's an OUT token that's sent, and then the data packet is sent. Now, in this case the device is not able to receive it, and it sends a NAK back to the host. So what will happen? The host will try again: it will send an OUT token, it will send all the data again, and in this case another NAK comes back. So what happens? The host tries again, sends the OUT token, sends the data, and in this case the device is able to receive the data, and it sends back an ACK, and now the transaction is complete. So one observation here: the host has to send all of the data before the device
will be able to NAK it, so it's kind of an inefficiency in the USB protocol. This will happen on full speed and on low speed, but on high speed there is an additional mechanism, it's called a PING token, and that can be used to mitigate this, so that you're not sending lots of data repeatedly when you have a good idea that the device is not going to be able to handle it yet. So, the hardware SIE handles all of this timing for a transaction, and like I said before, this means you must be ahead of the host, on the device, in a manner of speaking; you have to be setting up the hardware. These slides here describe what we talked about in the previous slides, so I'm going to go over these quickly. What's added here is that when you have an IN transaction, the first thing that has to happen is that the device firmware will put the data to send in the hardware SIE buffer. That means that for an IN transaction that's going to happen sometime in the future, the device basically has to put the data in the buffer, and then it has to wait for the host to ask for it. And it's the same thing with an OUT transaction: the first thing that has to happen is the device firmware configures a hardware SIE buffer to receive the data, and then the host, sometime later, will send the OUT token, and it will send the data, and then the device will have it. That's the basic process of moving data in USB. So now let's talk about transfers. A transaction is defined by the standard as the delivery of service to an endpoint; that's what we've talked about so far. So then, what's a transfer? A transfer is where we have one or more transactions moving data between a host and a device. Transfers can be large, even when we have small endpoints: remember, a transaction is limited by the endpoint length, but a transfer can be arbitrarily large. So how does this work? Let's take a look. A transaction can be full size, like the one in the middle, 64 bytes maybe; it can be smaller, like the one in the top; or it can be zero bytes, like the one on
the bottom. So it looks like this: we can see we have multiple transactions in that transfer. Notice that all of the transactions except for the last one are full size; or rather, they're all the full endpoint length. If we have an endpoint length of 64 bytes, that means all of the transactions except the last have to be 64 bytes. Transfers are ended either when there's a short transaction, which is one that's smaller than the endpoint length, or when the desired amount of data has been transferred, as requested by the host. Now, this desired-amount-of-data thing is where ending transfers can get tricky. Remember, like I said, this matters in a multi-transaction transfer, and it creates a little bit of a problem, because sometimes a host does not know the number of bytes that it really is going to get. An example is a string descriptor: in this case a host might ask for 256 bytes, and the device maybe only has 40 bytes of string to send back. So in this case the device is going to end the transfer early, and this gives us an interesting edge case. There are actually four cases of large transfers, so let's consider them and then we'll talk about the edge case. In case number one, the host asks for a number of bytes which is not a multiple of the endpoint length. In case number two, the host asks for a multiple of the endpoint length. What we're talking about here is whether the transfer is going to end evenly with full-size transactions, and we're going to see cases three and four are where it gets interesting. In case three, the host asks for some number of bytes and the device wants to return fewer bytes, and this number is not a multiple of the endpoint length. Case number four is the most interesting one: the host asks for some number of bytes and the device
returns fewer than requested, but it's also a multiple of the endpoint length. In cases one, two, and three, the device can simply return the number of bytes that it intends to return and everything is going to work. Case one, quickly: we have a 16-byte endpoint length and the host requests 76 bytes, so we get four 16-byte transactions and one 12-byte transaction. The transfer is ended by a short transaction, and the desired amount of data has been transferred. [Audience question] Yes, so we know that a transaction is over; I actually don't know exactly, it's handled in hardware, probably; I don't remember for this device, but you don't have to worry about that. Case number two: the host asks for some number, and again the device just returns that many bytes. And case three: the host asks for some number of bytes and the device returns fewer than requested, which is not a multiple of the endpoint length. So we have a 16-byte endpoint length, and in this case the host requests 255 bytes; the device will return two 16-byte transactions and one 12-byte transaction. The transfer is ended here because we have a short transaction; when the host receives a short transaction, it knows that the transfer is over. Case four is an interesting edge case: the host requests some number of bytes, and the device returns fewer than requested, which is also a multiple of the endpoint length. Since the number of bytes being returned is a multiple of the endpoint length, the transfer will not naturally end with a short transaction, so in this case the device actually must add a zero-length packet. When you're writing the code, it's a real hootenanny to keep track of when you have to do this and whether it's happened or not. So here's the case: we have a 16-byte endpoint length and the host requests 255 bytes; the device in this case wants to return 32 bytes, so it's going to return two 16-byte transactions, but then after that it's going to return a zero-byte transaction. So there's going to be an extra
transaction of zero bytes in length. Yes, it's really over, even though I didn't send you a short transaction: the zero-length transaction, or zero-length packet, is the short transaction. Does that make sense? This might sound like an obscure edge case, but the first thing I did with USB, I ran into this problem on the device side, and we'll talk about why when we look at devices. The transfers we've talked about so far have been bulk or interrupt transfers; control transfers are different and they're more complicated. I've got a few slides here about them, but I'm going to skip over those in the interest of time, and you can definitely go back and take a look at those if you like. Another thing I'm skipping over is split transactions; that's where you deal with a device behind a hub, and they're transparent to your device. The USB spec is really long and it's very difficult; there are lots of pieces in there, and picking out the core pieces was important here to get this into an hour, but yeah, there's loads of other stuff. [Audience question] That's done on the host in software: when a device is enumerated, it has a certain number of endpoints, each with a number of bytes, and the host will calculate how long that takes and determine whether there's enough bandwidth or not, and the host will de-configure the device, or not configure it, if it's not able to accommodate it. I've actually never seen that happen; I've never tried really hard to make that case happen, but you could. So next we're going to talk about M-Stack for PIC microcontrollers, by Signal 11 Software. Signal 11 is my consulting company. M-Stack is free and open source; it's actually dual licensed under Apache and GPL, so it's suitable for use with commercial products or with open hardware products. It implements vendor-defined devices, so devices where you have no device class, and additionally it implements HID, CDC-ACM, and MSC if you want to make devices of those device classes. PIC micros, 8-, 16-, and 32-bit, are supported; on 32-bit, PIC32MX is
supported but not MZ yet; I'll get to the MZ at some point. One of the nice things about M-Stack is that it hides as much as possible, but it's not possible to abstract away all knowledge of USB; that's why the preceding slides still matter even though you're using a device stack. The URL is listed there at the bottom. M-Stack is configured statically through a file called usb_config.h, which is part of every M-Stack application. This header file can enable endpoints for use and set endpoint lengths; it can configure the ping-pong modes on the PIC micro, which is where you basically have two hardware buffers for each endpoint so that you have more buffering available; you can configure it to use interrupts or to be polling; and you can set callback functions for common events, things like when your device has been fully configured by the host, or maybe when it's been attached. M-Stack automatically creates and handles the buffers for each endpoint, so when we talk about the SIE and setting up the buffers, M-Stack handles all of that stuff for you. Any kind of microcontroller-specific constraints, like dual-port memory or any of that kind of stuff, are all inside M-Stack; all you need to know from your application is that you ask for a pointer to a buffer and you get a pointer to the buffer. So your application code is actually very simple, and examples are provided for each device class that work on a wide range of PIC micros. For OUT data, we call a function to get the OUT buffer, then some code in your application can process that data, and after that we re-arm the OUT endpoint, which says I'm done with that buffer, give it back to the hardware. For IN data, we can get a pointer to the IN buffer, populate it with some application-specific data, and call the send-in-buffer function; this puts the data onto the endpoint, and then the host will ask for that data. It's really simple to
create a simple application implementing a USB device. Now, since M-Stack runs on a microcontroller, it operates at the transaction level, so all of those things that happen here in the examples are individual transactions. That means if we have a larger multi-transaction transfer, it has to be handled in your software, and that's why all of those things previously were important: if you have to handle larger transfers, it all has to be handled in software, or in your firmware I guess. [Audience question] Blocking functions? No, nothing in here is blocking; in this case we're just checking to see whether the endpoint is busy or not, so that you don't get a pointer to the IN buffer and put stuff in it while the SIE is in the process of using that buffer already. So let's talk about libusb next, and we're... oh, we have some time. [Audience question about bit-banging USB] You can do that with low speed; if you want to support the protocol electrically you can get close, and you can have something that works in a lot of cases, and there is software for Atmel parts that does do that, but no, M-Stack does not do that; it needs a real hardware USB SIE peripheral. [Audience question] Does that do the bit stuffing? I've never seen that done or heard of that before; the hardware will do all of this stuff for you, especially because of all the...
So libusb is a multi-platform library; this is the host side of USB. It runs on Linux, BSD, OS X, Windows, and even other operating systems, and it runs in user space, so no kernel programming is required. It has an easy-to-use synchronous API and also a high-performance asynchronous API. It also supports all versions of USB, so it doesn't matter whether you're plugged into a 3.1 port or a low-speed port; it all just works. You can find the website there at the bottom. Unlike an M-Stack device, a libusb host runs on a general-purpose multi-process OS, so that means sufficient permissions are required to open a device, and that's not something you have to deal with on M-Stack because you're just on a microcontroller. Opening a device or interface may be exclusive, so only one process at a time can do it. This is something you'll have to deal with: a lot of times you might not have permission to open a device, so you can use stuff like udev to make sure that you have those permissions set properly for your device. Now, from a host perspective, the basic unit of a USB connection is the interface. Maybe that's some imprecise language, but the point I'm trying to make is that USB interfaces are what's important to Linux and to things on the host side, rather than the device itself. This is because devices can have multiple interfaces, and each interface might require a different driver. Remember, before, we had the communications device class part and the vendor-defined part; it's conceivable, and it is what will happen in the default case, that the kernel will bind its own driver to the communications-class part and you'll talk to the vendor-defined part through libusb. That's why interfaces are more important on the host side. We can see here we set up a handle, we initialize libusb, and then we open the device. This is kind of a shortcut function here, libusb_open_device_with_vid_pid, because what this function doesn't handle is if you have two of
the same device plugged in at the same time. If you have two identical devices plugged in at the same time, what you'll need to do instead is get a list of all of the devices, walk over that list, and figure out which one you're going to open. The next thing you'll do is claim an interface; in this case we're claiming interface 0, which means we're claiming it for our process. After that we'll initialize our buffer, which is just some application-specific function, and then we'll send some data to the device. We call this function libusb_bulk_transfer. It takes a handle to the device and an endpoint address, which here is endpoint 1 OUT; it takes the buffer and the length, and then there's an actual length, which is the length returned to us that was actually handled, and then there's a timeout on the end. So if you wondered how the timeout worked, it's just that simple: you pass it into functions on the host side. We call that function, it'll send data to the device, and it'll return us a code as to whether that was successful, whether it timed out, or whether the device was removed completely from the bus, which is something that does happen. Next, we'll receive data from the device. Here we're calling the same function, so we pass in the handle; the difference here is that we pass in an IN endpoint, so 0x81 is endpoint 1 IN, and then we pass in the buffer and the length, which is the maximum that we're set up to receive, but we also pass in this actual length. The actual length will be populated with how many bytes were actually received, so you could receive fewer bytes than you asked for if your device does what we talked about earlier in the talk. After that you can process the data as you see fit, at the bottom. So that's the most simple libusb example. We can see here some observations on that: the length can be arbitrarily long, and it can be longer than the endpoint length, and if that's the case the host will just keep asking for transactions
until it's received that many bytes, or until it's received a short transaction. Notice the actual length, which we talked about; this indicates how much data was actually received. The function takes the endpoint address, which we talked about, and interfaces must be claimed before they're used. For the timeout, 5 seconds is good for general purposes, but recently I made one where I had a timeout of over 90 seconds: I was erasing a flash chip and I wanted to just wait until it was done, and it took about 60 seconds. There's no reason you can't make it arbitrarily long; some people might think that's abuse of some kind, but I don't know, it all depends on your use case. So the previous example was very easy to use, but it's not very fast if you want to transfer a lot of data. If you want to transfer a lot of data, you need to not use that API; you need to use libusb's asynchronous API, and since we've kind of run out of time here, I'm going to leave how to use the asynchronous API as an exercise for the reader. Basically, you create a bunch of transfers in the library, and as they complete you'll get callbacks. The reason this works faster is because there are always transfer objects active in the kernel, so the actual transferring of data is not dependent on user space processing that data: the data can be received while user space works, and the kernel can be given maybe a handful of transfers at a time. If you want more information about USB performance, you can see my ELC 2014 talk titled USB in the Real World; there's a link there, and several different types of communications are compared. And that's it. Thanks for coming today; I don't really have any time for questions, but if you want to come up and talk to me afterwards that's fine, or if you want to email me or call me, just don't do it in the middle of the night. Be aware that the videos are online on YouTube on the SCALE channel, and also the slides will be uploaded to the SCALE website, so they're available there as well. Thank you
for coming. My parents have called me Michael since I was born, so yeah, I get it. So welcome, everybody; this is the last talk of the day here in the embedded track. This is being recorded for YouTube, and it will be available afterward as well, so you can go back and review it; the slides, when we're done, will be located up on the website, so they'll be available as well. If you have any questions, feel free to ask as you go along. Today's last speaker is William Turner, and he's going to be talking about getting on the mesh and getting off the grid, so take it away. Hi guys, thanks for showing up. So this talk is kind of a high-level landscape of — no, it's on — can you guys hear me in the back, or does it need to be a bit louder? I can yell a little bit if needed. OK, alright, check, check, check, everybody? There we go, that sounds good, cool. Yeah, so this talk is at first a high-level landscape of existing mesh networking technology, then a little bit of a deep dive into some of that networking technology, what communities are using it, and how we could potentially use it in our area here. So I think they kind of covered the agenda there, but first I think we need to define what a mesh network is. A traditional definition of a mesh network is a network where all nodes participate in assisting the network; in the context of a routing table, it would be each node in the network sharing a shard of the routing table, or having a complete copy of the routing table, et cetera. But I think for the purpose of this conversation we should probably loosen that definition and really think of a mesh network as just a big decentralized, distributed network that combines aspects of a decentralized network and a distributed network. Most of the projects right now that have built out geographically large-scale mesh networks are kind of a combination of decentralized and distributed networks, in that not every node is
connected to every other node. If you're looking at this decentralized graph here, these nodes would actually be connected together; most of the projects that we're going to look at kind of look like that. When you think about a mesh network, I think it's important to not focus on any protocol or any particular technology, but really think about what the purpose of this technology is, and the purpose is really to ensure connectivity in a diverse array of conditions — problematic conditions, really. The idea is networks should be self-healing; they should have notions of auto-registration, notions of trust, notions of decentralization, et cetera. And if we loosen our definition away from requiring all nodes to participate in the network, then the internet, by that definition, is the largest mesh network in the world. When you think about that, it becomes apparent that mesh networks are not really about any one particular technology or any one vendor; they're about solving connectivity problems: connecting machines and connecting people. So when we think about the Internet of Things, or when we think about internet access in areas in Africa where the internet is not exactly the same as what we perceive here — which is what Facebook is trying to target with Free Basics — situations like that are the areas where mesh networks are currently thriving. But it's even more than that. There's a lot of existing mesh networking technology out there, and it's not just for what we perceive as computers, PCs, laptops; a lot of it's for embedded devices, and even before that, it's for radios. One technology that's been around for a while is called BATMAN, and that's an acronym for Better Approach To Mobile Ad-hoc Networking; it provides layer 2 and layer 3 networking. There's batman-adv, a kernel implementation of the BATMAN protocol operating at layer 2 with various efficiency improvements. There's OLSR; that's another one that we see deployed
pretty commonly. You have IEEE's 802.11s for mesh networks, so when you see a product by Netgear that's an in-home mesh network, or something like a Google mesh network, typically what's happening is a lot of companies are starting to implement 802.11s-based specs to create these kinds of networks. One that's particularly interesting to me is called cjdns; that's a layer 3 routing overlay network, and what's interesting about it is it uses a PKI cryptosystem to actually handle addressing, so it's not exactly a typical IP-based network; it's much more of a trust-based network setup. But getting back to embedded, one of the classic mesh networks that we think about is ZigBee. ZigBee's been around for a very long time, and it provides mesh networking at a hardware layer, so that from a software standpoint you don't really perceive any of the mesh overhead or any of the complications of dealing with reliability, checking and making sure nodes are there, et cetera. You also have Z-Wave; that's used a lot for home automation. It's kind of an interesting one because it bridges the gap between communicating over existing power lines in your home and radio as well, and creates a mesh fabric between both of those. All of this technology is in use in many, many locations; there are a lot of devices in hospitals right now that use ZigBee to communicate — things like heart rate monitors, blood pressure monitors, blood sugar monitors — so they're relying on the fault tolerance of the network to ensure that the data gets to their centralized servers successfully. So in talking about building a big mesh network, we've got to consider a lot of stuff. First of all, if you're thinking about a wired network, layer 2 access is typically pretty easy, especially at a small scale; it becomes a little bit more interesting when you're talking about wifi, in the sense of how do I get these two devices to connect to each other and form this layer 2 network on
which we can establish the rest of the network. If we're looking at batman-adv, it's actually fairly easy to establish that network; I'm not going to go through it here, but once you get the network set up, you're basically going to be able to communicate with it through MAC addresses. Likewise, we're going to need a layer 3 for any sufficiently large network. BATMAN will do that, OLSR will do that, cjdns will do that; there's a whole gamut of different stacks that will accomplish it, and it's very similar to what we would experience with regular TCP/IP or IPv6. You might notice here that this looks like a regular IPv6 address, but there's a cjdcmd-ng in front of that ping command, and it's also not a ping6 command, so it's a little bit tricky here. What's actually happening is that the IPv6 address is being mapped to a string that represents a public key for a node, and then cjdns is using that public key for the communication, so it's not a straight ping. [Audience question] Let me see — in this particular slide, in general I can't, it's an image. It is a ping, though; there's not too much to see, and it looks like a normal ping as well, but you'll notice in the reply there you're not seeing an actual MAC address; you're seeing a string representation of the public key for that node. So it's quite a bit different than just doing a ping6 to an IPv6 address. So let's talk about some existing networks. This is not something new; many communities in the world have set out to create mesh networks for various reasons. One of the big reasons that you'll see in lots of developing countries for creating these mesh networks is actually a lack of internet access. It's a common theme: there's a major city that has decent-ish internet access, and there's a village around 60 miles outside of that city that has no internet access; it's not economically viable to run dark fiber there or to create an ISP there, so what's left is for the community members to figure out a way to get that connectivity to
benefit their community. So there are various technologies that people have developed to handle those sorts of issues. New York City actually wanted to handle this a little bit, and a community there created a project called NYC Mesh. What they did was leverage OLSR, BATMAN, and cjdns at different times, and they created a custom OpenWrt firmware that they support and deploy nodes with. This is a community-driven effort, and right now they're continuing to deploy nodes, but their goal basically is to create a wireless network that can be accessed outside of the internet, essentially — it's their own public LAN that people in New York City can access. So if you wanted to post content there and not have it regulated on the internet, or for whatever reason wanted it to only be available there, that's kind of the service they're providing. But they also establish VPN links to the internet, so it serves as free municipal Wi-Fi existing outside of normal political structures or corporate structures, and I think that's an important thing to acknowledge with a lot of these projects: that distance and isolation, while still co-existing with mainstream connectivity. The biggest one that I found when doing my research is the Freifunk community, started in Germany. This was started as kind of an activist project by a group of people who really wanted to ensure freedom of speech and net neutrality, and their goal was to build a net-neutral network spanning multiple countries. Their goal was a little bit ambitious, so what you're not going to see is multiple countries connected through actual layer 2 links; they rely on VPN tunneling to accomplish this. But currently they have more than a few thousand nodes accessible, and they're continuing development. The way they work is you can sponsor a node, or you can purchase a node from the community and run it, but they also work with local governments to
do these installations within their community and continue to spread the network. They're focusing on creating content within their network, but also providing that kind of free municipal access that you see certain cities providing. That's a very mature project; they're using batman-adv for layer 2 and OLSR for their layer 3. This one is kind of fascinating: this is the Village Telco project. This was started in Africa, and the goal was basically to create a software platform that would enable communities to become their own internet service providers. This is kind of the situation where perhaps someone's able to get a single uplink into their village, but they need some way to distribute it. So the people with Village Telco created this product called the Mesh Potato, and it's basically a little embedded platform that acts as an entry node to their network. But in addition to that, and unlike any of the other communities I was able to find, they created a server-side component that allows you to handle billing; it captures the notions of what existing telecommunications providers do and provides granular enough control to be able to actually charge whatever you want, or make it free, or simply monitor things like that. So it's a little bit different of a vein, in that it has kind of a capitalistic undertone to it that I think a lot of the other ones lack, but they employ a lot of the same technology. Again, you're going to see OpenWrt firmware on embedded devices, kind of the typical suite; anything you can run White Russian on, you're going to be able to run this on. Again, they use BATMAN, and really their goal is to enable the little guy to be an ISP. [Audience question] Yeah, OpenWrt is a firmware that you'll find runs on many MIPS- and ARM-based wireless routers and SoCs; I think it's safe to say that a majority of Broadcom and Atheros chipsets and SoCs support OpenWrt. Right now you can build OpenWrt using BitBake through the Yocto Project and a few different
other ways, but it's kind of been a standard for a while now. [Audience comment] I believe it was developed in South Africa, yeah; it started locally to meet a local need, and they're trying to increase the proliferation there. But what's interesting about the Village Telco project is that it's gained some worldwide traction: there have been people deploying it in Colombia, in Africa, and a couple of deployments in Asia, I believe. So it's definitely the kind of project that's having a real-world impact on some people's lives, even at a small scale. Now, one network that's of particular interest to me is a network called Hyperboria. Hyperboria is a project that uses cjdns as a layer 3 routing engine to create this encrypted, public-key-based infrastructure, and the ultimate goal in the creation of this network, by the people who envisioned it, is to create a separate internet that's completely decentralized. It's not anonymous, but it's based on PKI-style infrastructure. Right now there aren't enough nodes set up to do that, so they rely a lot on VPN tunnels to connect different regions together. The idea basically is that you create this trust-based network: Billy knows Sally and they trust each other, so they exchange public keys and now they're connected. Sally also trusts Jimmy, so they do a key exchange. But Billy wants to talk to Jimmy, and they don't trust each other, so we're going to route traffic through Sally so that those two people can communicate with each other. You saw similar-style networks from Nullsoft when they created WASTE in the early 2000s; that was a highly encrypted chat platform that had a similar idea, except this is taking those kinds of ideas to the next level and extending them to really be a full network stack. Hyperboria is kind of an interesting, weird place when you start browsing around: you'll find there are IRC servers on Hyperboria, there are email clients, email servers, there are Reddit-like
websites on Hyperboria, news websites — but there's no central index of these sites. It's very hard to find things, it's not very user friendly, and it's very wild, wild West. Don't be surprised if a website that was there yesterday is gone today, because that person's Raspberry Pi got unplugged. It's not ready for prime time, and that's part of the point of this talk. So this is taken from the Hyperboria website; this is kind of like their manifesto. The key feature of Hyperboria is that they do this mapping between their PKI system and IPv6, so basically existing applications just work kind of automatically, as long as they can support IPv6, and that creates a low barrier of entry for software developers and really anybody trying to get something onto a network of this sort. Currently there are clients for Linux, Android, and OS X. What this means is that your existing network stack on your machine is not able to connect to Hyperboria — to a cjdns network — without installing more software; it's completely its own thing, a separate layer 3, not related to anything that exists now. So that's kind of a big barrier to entry, and they realized this and provided an IPv4-to-IPv6/cjdns gateway that you can use. So there are ways to allow clientless connectivity to this network, but you're not really leveraging any of the advantages or the cryptographic nature of that network at that point; basically all the traffic going through this NAT layer is going to be using the same public key, so you're not exactly capturing the trust elements of it. [Audience question] It used to be known as Project MeshNet, and they rebranded as Hyperboria, sorry. So, how to get around on Hyperboria: it feels largely similar to the current internet, except with a few different nuances. You're going to have a public key and a private key — this one is not important to me, which is why I'm sharing it — but when you start getting this set up, you're going to need to find a peer to connect to that is public, that
is sharing their public key and IP address, which you can use for discovery to connect to them. Basically everything that happens with cjdns is stored in a file called cjdroute.conf — it can be called anything you want; that's just the traditional name for the file — and it's going to contain some information like what's shown on the screen. But one thing that isn't shown is how you enter the peer there. Historically speaking, the idea and design of the community was to actually establish this trust-based peering agreement, but that fell by the wayside into kind of a free-for-all where everyone needs more peers, so everyone would hop on IRC and just say, hey, peer for peer, you know, what's your public key? And it kind of defeated the purpose of that system. So in the community there's a lot of talk right now about whether or not peering is dead, whether or not there should be a centralized database of everybody's public keys, and how that should work; a lot of things are still up for discussion. But once you have a good cjdroute.conf and a peer, you're going to be able to connect to Hyperboria. Once you do that, you're able to launch cjdns, and you'll see it launch, you'll see it spin up, and eventually you'll connect — and now this thing is fully operational, and it's not a trap. When you're connected, it's going to feel a lot like being on an existing network; you're going to find tooling similar to a regular TCP/IP stack. You're going to have your traceroute, and that's going to do exactly the same thing that traceroute does for you on an IPv4 or IPv6 network: it's going to show you where you're actually routing. So yeah, it feels similar, but you're not going to be able to use the standard toolchain; you're going to have to use a toolchain that understands how to communicate with cjdns networks. [Audience question] Yes, yes, absolutely, yeah — and in my mind that's a very important feature for potential governance models and handling that, and
We'll cover that shortly. You're going to have ping again. It looks basically similar to ping, except you'll notice here (I hope the font is a little bit more visible; yeah, it is) that we started by pinging an IPv6 address, but then when we start seeing responses it looks quite a bit different. What that's doing is processing the mapping from that IPv6 address to a public key and then displaying it in another, mapped format that more closely corresponds to how cjdns is actually performing routing. To discuss exactly how this routing works would be a completely separate topic in and of itself, but there's a white paper online that you can find and read if you're really interested. This one's going to be a little bit different in that, unless you're running BGP, you shouldn't really have peers on your local machine; here, though, you can do a cjdcmd command and see the peers. Now, I warn you that none of this tooling actually comes with cjdns, and there are around 20 different tools that promise to do the same thing, and I've found about two of them that actually work. This is one of the problems with these projects: you have these really disconnected communities of people coming together and making tooling, but maybe this one group of people was interested in this technology in 2014, and they wrote their tools in 2014 to be compatible with that version, and they said, okay, we're done, we're no longer interested in this, it works. Well, they weren't closely coupled enough with the actual cjdns developers, and so cjdns drifts, breaking their toolchain, and that's kind of what you've seen. So in order for this technology to continue advancing and gaining in maturity, there's got to be some better communication within the community. I think it's important that the network is decentralized, but some communication needs to be centralized in order for us to actually be effective and efficient. So, what's missing on Hyperboria?
And the unfortunate answer is that quite a few things are missing from Hyperboria. Quite a few things. Namely, there's no standard solution for DNS, and there's no standard solution for what I'm calling publishable storage. What I mean by that is: if I want to host a file, I'm not going to find an Imgur, I'm not going to find an S3 bucket, on cjdns. So in order to do these projects and make it actually feel like a normal kind of network, these things have to be thought about.

With regards to DNS, there are a couple of different projects going on right now. By show of hands, how many of you are familiar with blockchain technology? Okay, great, that's awesome. So Namecoin and DNSChain are two projects that are attempting to create DNS-style systems that use a blockchain as kind of a backend database. In many regards, some people view this as a way to address man-in-the-middle attacks, or perceived issues with DNS, DNSSEC, etc., but in my mind the takeaway is really creating a way to move away from the centralization of DNS. I mean, if you think about the way DNS works at a high level, at the root-server level it's an inherently centralized infrastructure. Now, it feels kind of decentralized once you start delegating zones and you have your own zones and things like that, but if you think about creating this big, widespread network, it's not really going to work unless you have a central trust entity, and that kind of defeats the purpose in a lot of these networks. Namecoin is the most mature system; it can only register DNS entries under the .bit suffix, and it's a little bit too complicated to go into the details of how that works here. There's another project by okTurtles called DNSChain, and that one's a little interesting because they're actually providing DNS layers on top of multiple blockchains: they can use the Namecoin blockchain, the qid blockchain, or the NXT blockchain.
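To make the Namecoin idea concrete: domain names live in the chain's key-value store under a `d/` namespace, with a JSON value describing the records, so resolving example.bit means looking up the key `d/example`. A toy sketch, with a plain dict standing in for the replicated blockchain (the record layout is simplified here; real Namecoin values carry more fields, and entries are written by name transactions rather than assignment):

```python
import json

# A dict standing in for the Namecoin blockchain's key-value store.
FAKE_CHAIN = {
    "d/example": json.dumps({"ip": ["192.0.2.1"]}),
}

def resolve_bit(name: str) -> list:
    """Resolve a .bit name by looking up d/<label> in the chain."""
    if not name.endswith(".bit"):
        raise ValueError("only .bit names live in this namespace")
    label = name[: -len(".bit")]
    record = FAKE_CHAIN.get("d/" + label)
    if record is None:
        raise KeyError("name not registered: " + name)
    return json.loads(record)["ip"]
```

Here `resolve_bit("example.bit")` returns `["192.0.2.1"]`. The point of the design is that the lookup consults a chain every node can replicate and verify, rather than asking a root server anyone has to trust.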
So that kind of creates a lot of different options, and it's probably important to be open-minded about things like that, because none of these are guaranteed to be successful, and, like with any software project, agility is a valued attribute.

Now, with decentralized storage, one thing you're going to have to think about is this: if you have this network and you're connecting it with Wi-Fi or with land links to your neighbor, you're going to run into some serious bandwidth issues, and caching at some point is going to become vital. When I think about these networks and how a network would operate (say you have four villages in Africa trying to operate these networks and get internet content), I think it's wrong for the focus to be on making it work the way the current internet works. I don't think it's reasonable to expect that you're going to be watching Netflix in your house while your neighbor is watching Hulu and your other neighbor is playing Counter-Strike. It's not, I think, going to be a system like that. It's much more a system like: well, every night we pull down mainstream updates to Wikipedia from the real internet and pull them into our local cache so that people can access them. In my mind, I'm thinking more newspaper than Twitter: you're going to get a copy every day at some point in time, not instantaneous push-notification instant gratification.

Right now there are a couple of different promising technologies trying to address some of these distributed storage problems. One of the more interesting ones, in my opinion, is IPFS. It's a protocol that provides de-duplication and file storage. It stands for the InterPlanetary File System, which is kind of an ambitious name, but we'll see how that goes; currently they are only on planet Earth, but we'll give it some time. Basically what they're doing is hashing all of the blocks and distributing the blocks to nodes within the network. So when you create a file on IPFS, it's going to be present on your local node until someone else requests it, and then they're going to cache part of it. The great thing about doing this is that you're starting to build the notion of a CDN, if you will. Now, granted, it's not going to be fancy and stored in memory and performance-tuned and everything you should get if you're actually paying big bucks for a CDN, but some of that logic is there, in a similar sense. When you request a file, you're basically doing a distributed query to a big DHT, asking, who's got this hash? And you're going to figure out what's closest and where you should get it from. That's kind of the way it works. It's not blockchain-based; it's just math, basically.

Now, Storj is blockchain-based. They're trying to compete with S3. They're a little bit more centralized than IPFS, in that there is a core body of developers and a core community that handles governance and things like that. Basically it's a similar system, except they're using blockchain technology to keep track of where the files are, plus payment information, and that's where these two differ. IPFS doesn't really provide a notion of, say, I want to rent out 20 gigs on my hard drive and charge 15 cents a gig; for that you're not looking at IPFS, you're looking at a technology like Storj, and that's something that could be valuable if you're thinking about creating a decentralized storage platform that individuals can use. Storj has been around, I think, for a little over a year and a half, two years now, but it's just starting to gain traction.

So let's say we looked at these networks in the other countries we discussed earlier, and the one in New York City, and decided, well, what if we wanted to build one here; what would that look like? This has actually kind of happened: there's a cjdns Hyperboria extension in Austin, Texas, and there's one in Seattle.
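The block-hashing scheme described above for IPFS can be sketched in a few lines: split content into blocks, key each block by its hash, and rebuild files from a list of block hashes. This toy version skips the DHT and networking entirely and just shows the content-addressing and de-duplication; the block size is shrunk to 4 bytes for illustration, and none of this is the real IPFS API:

```python
import hashlib

BLOCK_SIZE = 4  # toy size; real systems chunk at hundreds of KiB

class BlockStore:
    """Content-addressed store: each block is keyed by its SHA-256 hash,
    so identical blocks are stored only once (de-duplication)."""

    def __init__(self):
        self.blocks = {}

    def put(self, data: bytes) -> list:
        """Split data into blocks, store each under its hash, and return
        the list of hashes (the recipe needed to reassemble the file)."""
        cids = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            cid = hashlib.sha256(block).hexdigest()
            self.blocks[cid] = block
            cids.append(cid)
        return cids

    def get(self, cids: list) -> bytes:
        """Reassemble a file from its block hashes; in a real network each
        lookup would be a DHT query asking 'who has this hash?'."""
        return b"".join(self.blocks[cid] for cid in cids)
```

Storing `b"abcdabcdxyz"` yields three block hashes but only two stored blocks, because the repeated `"abcd"` block de-duplicates, which is the "just math" property the talk refers to.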
Between those two, they're comprised of around a thousand or so nodes, and this is a community of hobbyists and Linux enthusiasts basically just wanting to create a community-based project. The interesting thing about their networks, though, is that they suffer from all of the problems I mentioned before: there's no DNS, there's no storage idea, there's no idea for caching. So at this point it very much feels like a hobbyist-level, enthusiast-level kind of project. I don't want to discredit the usefulness of it; I think it's immensely valuable and useful, but there are a lot of next steps that need to be taken to really move forward.

So to build our own network, we're going to need hardware, we're going to need software tooling, we're going to need governance, and most importantly we're going to need a community around this system. If you start thinking about the hardware, one of the biggest problems I see is with people who've tried to create something like a Tor router, which is the closest commercial analogy to a project like this that I've seen. A lot of people find reference designs on Alibaba, ship off a custom OpenWrt build, and build it. There was an incident around two years ago where someone did this: they had a Kickstarter project, and I think they raised around a million dollars for it, and the problem was that no one actually knew what was on that board. The people running the project didn't have the capacity to research it; they didn't have the ability to JTAG into a chip and figure out what was actually going on. The point is, for a system they were claiming as a protection against tampering or privacy invasion, they couldn't know whether it had a backdoor, because nothing about it was either custom or open and actually verifiable.

I think another aspect that's important and often overlooked is manufacturing. When you think about things like a Raspberry Pi, it's very true that a Raspberry Pi is open source, but a Raspberry Pi is not easy to manufacture. If I as an individual wanted to make a Raspberry Pi, I'm going to have a very difficult time, just because of the size of some of the components, including BGA devices, and sourcing some of those components; it just doesn't scale down to the level a community might need. Dual WLAN interfaces might be important: it might be desirable to provide traditional IPv4 networks at the edge and only do encrypted mesh networking protocols as a backhaul network. The real advantage there is, if you think about the people you're trying to target with a network like this: if you have to give someone walking down the street the talk I'm giving right now before they can connect to your network, that's a fundamental problem. You shouldn't need to install any software to be able to leverage mesh technology, so by putting a NAT layer and a routing layer between a standard IPv4 network on a separate interface and the mesh, you can deliver that kind of functionality. That ties into the fact that for a network like this to be successful, it's got to be usable by someone non-technical. A kid with an iPhone should be able to see a wireless network, connect to it, and have it work automatically, no matter what's actually going on behind the scenes.

So we start thinking about questions like: how do we administer nodes without centralization? How do we monitor status and availability? From a routing standpoint, cjdns does an excellent job: if a node is down and not responding, it gets removed from the routing table. But as humans, what do we do about that? Do we actually care that that node is down? Do we want to let the human operator know it's down, and ask, well, are you going to bring it back up? So there starts to be a governance issue around these networks, in kind of the same way the existing internet works right now: we
have PeeringDB, which is used for handling BGP peering agreements and things like that, but with cjdns there's nothing like that; it's a total free-for-all. I don't mean to say you need centralization per se, but I think there probably should be some sort of communication platform. How do you assess network performance in a network like this? If you have hundreds of different ways to route to different nodes, how can you be sure you're doing it in the most efficient manner? How can you test performance without actually impacting the performance of real users? These are questions I don't have answers to. And most importantly, how do you prevent congestion? If someone decides to try a denial-of-service attack on a node in the network, how do you mitigate that risk, and what does that situation look like? Currently I haven't heard of anything like that happening with the existing components of the Hyperboria network, mainly because it's just run by a group of Linux enthusiasts, but at scale you would expect issues like that to arise.

From a governance standpoint, when making these nodes, if you're trying to make this network actually reliable, actually useful, and not just kind of a toy, there's going to have to be some sort of software release cycle, some sort of hardware release cycle, and an end-of-life cycle associated with all of this. So what is that, and how do we handle it? What does that look like? If we built one of these out as a community in the LA area, how many years would we say we want to use it? Would it be acceptable to still be using a Linksys WRT54G from the early 2000s running OpenWrt in 2020? Those are the kinds of questions we would need to ask ourselves. How do you connect to this network? Once a big network is connected, should anybody be allowed to join? Within the context of creating a mesh, your network is going to get more efficient with more connectivity, potentially, depending on the uses, so you want people joining the network, but there should also be some sort of criteria, some sort of compliance going on. How do you monitor? How do you handle abuse? There might be people within the network transferring large files back and forth for absolutely legitimate reasons, yet that looks very anomalous compared to an average use case. So these are all questions that would really have to be addressed when trying to design one of these networks, and they're tough questions to answer.

But most importantly, for a project like this to actually gain traction, as we've seen in different areas of the world, the most important thing is you. The most important thing in all of these projects is the community, and without an active community engaged in the belief that people should have open access to information that isn't controlled by governments or corporations, it's going to be very hard for anything like this to exist as anything more than a toy project. And so that's what we're going to talk about here: basically, if people are interested, how do you get one of these going in your area? I've created a meetup group just to see if anybody's interested in pursuing some of this technology in the LA area. I think there's probably a lot of good that could happen at a social level here. Personally, one of the things that interests me most is creating an offline (and by offline I mean not paying an ISP) copy of Wikipedia that updates and synchronizes with Wikipedia. There is an active project in the city of LA to create a municipal wireless network, but that's going to be funded by advertising, and given certain current stances on net neutrality, I'm not convinced it's smart to hedge against that. So I would love to see that change: we like to say that the internet is really a utility, but it's not really
treated as such, and I think it would be fantastic if we as a community could decide that these are important community values that we hold, and kind of increase the spread of that information. So yeah, if you want to, reach out to me on Twitter, send me an email, join the meetup group. We haven't had any meetups yet; it's a very new thing, we're just starting to get organized, and if there's interest we'll have an initial meetup probably within the next two to three months. The structure of this really is to create a bunch of different pods to brainstorm about some of these subjects. This is way too much work for 5 people, 10 people, 15 people, and many of these problems need tons of thought: the DNS problem in and of itself is a huge issue, and the governance problems are huge issues. But if we're able to come together as a community and brainstorm about these ideas, it's my belief we might end up with something really cool, something that makes a really good impact on our local community. So I can take any questions now if anyone has any.

[Audience question] Okay, did they provide tooling to help with that, like software tooling? Yeah, very cool, that's excellent.

[Audience question] Yes, exactly, you hit the nail on the head. The question was basically, why wouldn't you use ham radio? If we were the military, that would be a great idea, but as a civilian you can't do encryption over ham radio, as far as I understand. Based on that limitation, I'm not going to say there couldn't be some use for it, I'm sure there could be, but it would greatly affect the usefulness and security of the whole platform.

[Audience question] Are you referring to something like LimeWire, or eDonkey, or Kademlia? I mean, there have been tons of different peer-to-peer file-sharing protocols over time, but the difference really is in creating a global namespace for those files, which doesn't really exist within a single product. And so what's kind of
interesting about IPFS is that you're able to just import a JavaScript library and include files that are stored on IPFS in a normal webpage, so it feels very similar to existing technology even though it's totally different behind the scenes.

[Audience question] I think you hit the nail on the head there, but what you would be looking at in any of these networks, including the existing ones, is not the current internet; it would be much more similar to hopping onto dial-up in the mid-'90s. You're talking text-based, and there's going to be lots of latency. But the difference is that you're not trying to watch a cute cat GIF; you're trying to download information that's going to enrich your life and potentially help you, or to gain access to news that has some value to you.

[Audience question] Well, yeah, and I think that's where, with a lot of this stuff, especially some of the routing protocols we're talking about here, there doesn't seem to be an implemented caching layer. I think if you could capture the notion of "a bunch of people in my area are requesting this content, let's cache it" and share it across nodes in the area, or stick a copy on a node, what have you, you could definitely achieve a system that actually finds a compromise between performance and scalability.

[Audience question] You're asking, specifically, should you even use Wi-Fi for doing this? I have seen some research on that before. I can't speak to actual numbers, but yeah, you can make an 802.11 network that will go 30 miles or so, and I believe the longest network using repeaters was something like 700 miles; it was very, very large. Don't quote me on that, but you really can. Absolutely, and I think it's going to be different at different points, too. If you think about how this would be laid out against a city skyline, if you're trying to get from one area of town across town, it might make sense to have highly directional point-to-point links on the rooftops that connect to each other, but then use more omni-directional links for the actual edge networks where the clients are connecting, so that you don't have to be as precise with that targeting.

[Audience question] You're asking whether you could connect to one of these mesh networks with Bluetooth or Wi-Fi or any sort of different link? I haven't heard of anything like that. You can do Bluetooth personal area networks, so it would certainly be possible; I don't currently know of an abstraction layer that would let you connect using one of many different types of protocols and then map to some sort of back-end link technology, but there's no reason one couldn't do that if you were inclined to write that software. Oh, I see what you're saying; yeah, that's interesting. So what you're saying is you frequency-hop. Yeah, okay, all right. I've got to agree with you: it's not real security, and the question really is whether it's sufficient for what you're doing. I'm trying to avoid using words like "freedom of speech," because I think the more important focus is on ensuring that people have access to information, rather than necessarily ensuring that it's private. But yeah, I think security is a very important focus, and it's interesting.

[Audience question] Without a cloud in between it, hopefully, yeah. And so that's another great technology that can be used; I think Ubiquiti Networks has a couple of products that do that, and that could certainly be used as a backhaul layer for networks like the ones being built. Absolutely.

Anybody else have any questions? Well, thanks for coming; hope you enjoyed it, and hope to see some of you soon. Thank you for coming today. Tomorrow morning we'll be starting right after the keynote, back in this room again, for more embedded Linux topics. These slides will be uploaded along with the video, which will be online at YouTube. Thanks, bye.