With that, welcome everybody to what is presumably your first slot of real content. Apologize for the small content that it had to be. Okay, so welcome. What does a CPU do before going to work? That's what I'm going to talk about, especially in an embedded system, and we will look at some variations of embedded systems along the scale.
First, my name is Josef, Josef Holzmayr. I'm serving as head of developer relations for Mender.io, which is an OTA solution for embedded Linux systems. This is what I do for a living. So whenever you either want to update your stuff, or you want to talk about updating, or you just want to ask me not about being called Brooks, then you just hit me up. Beyond that, I have a really strong background in the Yocto Project, which actually led up to this position. I'm a Yocto Project ambassador. I run a lot of social media for the Yocto Project. I'm involved with OpenEmbedded. So you can see me on many occasions in the embedded Linux ecosystem. And completely unrelated, I am also a so-called community hero for Gitpod.io, which is like tooling on steroids in your browser.
I'm really, really easy to approach. As you are all here in person, except the ones that are not here in person, just walk up to me and say hello, at whatever COVID distance you prefer. I'm fine with that. Everything. Shoot me a mail, ping me on Twitter, whatever you find. I'm happy to talk to each and everybody. So, with that out of the way, let's get to the good stuff.
We have talked about me. Now I want to know a little bit about you. A lovely lady in the back. I actually forgot your name. It's okay. Christy. Christy. Okay, Christy. Because this is the first interactive part. I want to know a little bit about you, which means the people who are joining remotely, if any, are hereby invited to react and to put something into the chat, a smiley or whatever that thing supports, when I ask things.
So, the first question here is, who in the room or the virtual audience considers themselves a hardware developer? What, local? No remote? We've got like a 30-second lag. Okay, that's no fun. Okay, so I will pretend that the virtual audience is just like here, which means that we probably have like 200 developers. Okay, that's not that much. Who considers themselves a software developer? Lots of software. Okay, who considers themselves an embedded Linux developer? The majority. Who considers themselves an RTOS or bare-metal developer? Not at all. Oh, cool. Then we actually might have some learning activity for you, because a lot of this stuff is not about embedded Linux.
This is how this presentation will work. I already said to Christy, I am highly interactive. You might have noticed. And I encourage my audience to interact with me. And I know people don't need kicks. They sometimes need kicks, but what works even better is rewards. And I reward interaction, which means that anybody who interacts with me in some form, whatever, you can come up here and hug me if you so wish. You can propose to me. You can say that I'm an idiot, that whatever I say is completely wrong. You can just stand up and clap, everything. Whatever you do to make this session here more interactive is rewarded by me with chocolates, delivered usually by airmail, until I run out of them. Then you'll have to do without rewards. Christy, unfortunately, this does not work for you because you represent too many. I'll try to get you one as a representative for the rest of the workshop. I will gladly eat it. Shall I do this? What does your audience say? Oh, you may not put it. I said, oh, you may not put it. But if you want to... It happens. It happens. Okay, so you have noticed my talks are somewhat entertaining, hopefully. Or cheesy. Feel free to play.
Okay, so what will not be in this talk? x86. If you expect anything on x86, you are invited to stay for the show, but not for the content.
If you want content, you better leave now because it's just not here. What also will not be in here is boot time optimization, because it's a running joke that at every proper ELC there is at least one boot time optimization talk, and it effectively consists of two things. Put everything into modules and there's a looper in here. So, see, we've covered boot time optimization. There will be nothing more in this talk. There also will not be any code snippet, blueprint, whatever, that you can apply directly to your code, because we're talking about concepts here. Once you have understood the concepts, you are able to find the code that matches your problem. But if you have no clue what your problem actually is and which concepts you have to tackle, then you end up copy-pasting random snippets from Stack Overflow and complaining because they don't work, because you haven't understood what you're actually looking at. And last but not least, very specific and complicated things like NVIDIA Tegra or the presentation on the AM6, whatever, by TI, which has like a gazillion boot stages and stuff like that, also will not be in here. We are, again, covering generic concepts.
So, what will be in this presentation? The real low-level stuff, what the CPU does before going to work. And we will, at some points, actually go down to the real electrical kind. So, things that affect booting that you will be able to see on the circuit board. We will have a little bit of historical background, hopefully, and we will have a metaphor, a mental image that hopefully serves well enough for you to understand where the stages actually fit in. My colleague said it's okay. Others said it's super cheesy. I kind of like it. Okay, because it will work like that. I tried to be a wannabe Randall Munroe. I got myself a tablet. I scribbled some things. So, I pretend this is like xkcd. It actually is not, probably because the quality of the artwork is much worse, but hey, it makes me feel good.
So, this is what will be used to visualize everything. And here comes the real content. All of you, what do you do before going to work? Sleep and wake up. That's... Sleep and wake up. Okay, I actually only realized that somebody here sleeps. Somebody here wakes up. And I forgot. You did take breakfast, right? Okay, so, we pretend that we are a CPU now. And a CPU, when you switch it on, effectively comes out of something like sleep. It wakes up and it's back. So, what actions do you need to take and what actions does a CPU need to take?
Stage one. The real dirt-cheap flash microcontroller. What does it do when it wakes up? Usually, it has a couple of K, up to a couple of hundred K, of NOR flash. NOR flash, that is important, included on the chip. When such a system boots up, when it gets power, it actually does one thing. It looks at the flash at a specific point, which I'll get to in a second, and then it just runs whatever is there. That's it. There usually is some minimal ROM loader included, which you need to actually get things into that flash. But beyond that, it does nothing. It's completely dumb. Power on, do something. And that's it. Examples for this: contemporary ARM M0, M0+, stuff like that, ESPs, and basically any proprietary small microcontroller that has on-chip flash behaves roughly like this.
The metaphor is this. You stay in your bed all of your life and you have a laptop. You wake up, you start working until you, like, power off, and when you wake up, you just start working again. This is the life of a microcontroller. Home office to the extreme. What does this technically mean? And that's why I said NOR flash. NOR flash is usually connected via a full-width bus, which means it can be directly accessed at any address. It maps directly into the address space that your CPU can see. So the arrow here is the program counter, or where the CPU looks first. And this is how your microcontroller sees the world.
You power it on, it looks at one point and then it just executes whatever is in the flash. Those usually have almost no initialization, a little bit of GPIO, usually one tick to base things off, but sometimes not even that. Peripherals, only on the bigger ones. Yeah, that's it. That's really the absolute bare minimum of what a CPU can ever do to get to work.
Then we enter stage two. We are staying with memory, because I said the small microcontrollers have a couple of K to a couple of hundred K of NOR. Once you start adding stuff, and especially HMI stuff like a graphical display or whatever, you need more memory because you will have graphical assets that have to go somewhere. Or if you have network connections, if you have added connectivity, your software grows. It just doesn't fit in there anymore. So you start adding external flash. This means the CPU now needs to know how to actually get stuff from that external flash. They are very, very similar: it's again a full-width connection, but there are variations. And depending on your pinout, and we come to the electrical kind here now, you might want to connect it over 8 bits or 16 bits. How are the upper and lower bytes addressed, and everything like that. The CPU needs to know that in order to actually be able to boot. And how do I tell this to a microcontroller if there's nothing that even runs there yet? We use pins, usually.
Who of you has heard the term pull-ups and pull-downs? Who of you can actually explain what it is? Okay, I actually had help, no more. So first big learning experience here now. A pin on a CPU should have a defined state. Either it is high or it is low. And this is the magic that happens here. To tell the CPU things really, really early during booting, it looks at the pin state because that's the only thing that you have. How do I tell something to a pin state? I add a resistor that connects the pin to either high, meaning VCC, or ground.
So per pin, I can give the CPU one bit of information, one or zero. Is that clear as a way to pass information? I take that as a yes. So why pull-up and pull-down? The pulling is clear. It pulls the pin to either high or low. But why a resistor? Because it would be really, really wasteful if you couldn't use that pin for anything else after booting. That would be an expensive way to pass information in. Therefore, you add a resistor, which is usually in the 10K or 100K ohm range, and this limits the current that actually flows in and out of the pin. And once the CPU is properly booted, the pin, if it's an output or whatever other peripheral you have, usually can supply a lot more current than this tiny pull-up or pull-down, so it's just easily overridden. We are talking about an initial state that only serves for the first couple of milliseconds, usually, when the CPU powers up. And after that, the pull-ups are completely overridden. And the bigger the resistor's value is, the smaller the current that flows through it. I told you, we are talking about electrics.
And yes, this is stage two. We usually add pull-ups and pull-downs to pass information into the microcontroller to tell it what the external memory actually looks like. And once the microcontroller has found that external memory, it just works like stage one. It looks at a specific address and executes whatever is there. Examples for this are the higher-performance microcontrollers. I've personally done a lot of this on Cortex-M3s. I know that Cortex-M4s can do pretty much the same thing. If you're looking at RISC-V, then the Kendryte is pretty much like this. But again, a lot of the bigger proprietary microcontrollers behave just the same. The analogy: you actually get out of bed, but not that far. You have to make it from your bed to your desk. And the analogy is that the pull-ups and pull-downs are basically like a handrail.
You fall out of your bed and you grab the rail and it tells you where your desk is. And you walk along the handrail until you find your desk and you sit down and then you can finally work. But at least a little brain is involved in grabbing the handrail. Was there a question somewhere? No, okay. Technically, it's really like stage one, and we have already talked about pin state. To visualize, it's external to the CPU, and depending on how it's connected, you get it in there.
Stage three, now we are talking about software. And remember, we are still in microcontroller land. A microcontroller that has a bootloader usually means that you have one small and one big application. You put your first application at the point in flash where things are executed, as I pointed out in stage one and stage two. This bootloader is usually very, very proprietary because you made it for your use case. It can initialize additional memories. It can check if there's a real application loaded somewhere, or it can help you with getting the application in there. It can also help you with board bring-up. That's a classic use case, to have some really initial debugging functionality in there. And what does this mean now, I would say? We are getting more towards real life. You get out of bed and then you at least do something to get you in shape for real work. I think breakfast serves as a good example.
But you see, these are two tables and the tables look the same, because to the CPU it actually is the same. The CPU does not distinguish whether something is a bootloader or an application. These are just two programs that are linked to different places in your memory. The CPU again starts out at a specific place. The bootloader does its thing, and once the bootloader decides, well, cool, booting is finished, we've got to do something else, then it just takes the arrow and puts it somewhere else. So the microcontroller knows where to continue running and then the application goes on. That's it.
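The hand-off described here can be illustrated with a small toy model. This sketch is not from the talk; it is plain Python that only imitates the idea. The addresses `RESET_ADDRESS` and `APP_ADDRESS`, the `FLASH` dictionary and the function names are all hypothetical and chosen purely for illustration. On real hardware the same effect is achieved by linking two programs to different flash locations and jumping to the second one.

```python
# Toy model of a microcontroller with a bootloader (illustrative only).
# All addresses and names are hypothetical, not taken from any real part.

RESET_ADDRESS = 0x0000  # assumed: where this imaginary CPU starts after power-on
APP_ADDRESS = 0x4000    # assumed: where the application was linked


def bootloader(log):
    """First program: does early setup, then says where to continue."""
    log.append("bootloader: init, check application")
    return APP_ADDRESS  # "take the arrow and put it somewhere else"


def application(log):
    """Second program: to the CPU it is just more code at another address."""
    log.append("application: running")
    return None  # None means 'halt' in this toy model


# Flash as the CPU sees it: an address space with code at fixed locations.
FLASH = {RESET_ADDRESS: bootloader, APP_ADDRESS: application}


def power_on():
    """Start at the reset address and follow the program counter."""
    pc = RESET_ADDRESS
    log = []
    while pc is not None:
        program = FLASH[pc]  # the CPU does not care what kind of program this is
        pc = program(log)    # each program decides where execution continues
    return log


if __name__ == "__main__":
    for entry in power_on():
        print(entry)
```

The point of the model is that `power_on` treats both entries in `FLASH` identically; only the returned address distinguishes the bootloader from the application.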
This is how a bootloader on a microcontroller works. You link to two different locations and you just manipulate the program counter. I think we have covered enough gory details of microcontrollers by now, because about everything that I told you so far basically applies to bare-metal stuff, and bare metal here means stuff that really runs directly on the CPU. You can see all of the registers. You can see all of the peripherals. You can do everything. You usually have no higher-level libraries. What does that mean? This is just like you in your home office. If you are in your home office, you're king of the hill. In your flat you can do whatever you want. You can have video calls with no pants on. Nobody will care. You can work from your bathtub as long as you don't turn on the camera. Nobody cares. If you get your stuff done at home, everybody's fine. This is how microcontrollers think of the world.
But sometimes, and depending on your profession, you might need more advanced tooling. You might need a milling machine, or if you're working in electronics, you need pick-and-place machines, surface-mounted device reflow stuff. Usually you don't have that at home. Usually you have that at your company somewhere. So this is where we are stepping up one big notch. We are stepping up to stage four, to microprocessors. My personal rule of thumb, and everybody here is invited to disagree, is that I call it a microprocessor once an MMU is involved and used. There are minor protection-ish things on some microcontrollers too. But if you ask me, that is just a pain to use and nobody does that. So in the end it's just again like a microcontroller. Once you're running something that really uses memory protection, I personally tend to call it an operating system. And here things change pretty much. Because it's much less a custom thing than there.
Bootloaders especially, or the operating system as a whole and also the bootloaders, turn much more into a commodity than they are on your microcontroller. Who of you knows of a bootloader they can download for a Cortex-M0? How many of you know of a bootloader that you can use for a Cortex-A8? That's what I mean. The rest of you certainly know one too. You just didn't make the link here. Because, for example, U-Boot. Who has heard of U-Boot? See, that's a commodity bootloader. It needs some minor patching or tinkering to work on a specific board or processor, depending on how far it is away from the nearest one that is already supported. But essentially it's a thing that is out there and that you can just use.
What does this mean again now for booting? The bootloader that gets kicked off at the first step is, to the microprocessor, actually just like a bare-metal application. Because why should a bigger MPU do other things or behave completely differently than your microcontroller? No, it actually doesn't. Some of you might scream now, but wait for stage five. I've worked on microprocessor units that actually did the very same thing. They had NOR flash somewhere. They booted up, or they started up. They looked at address 0 of the NOR flash and executed whatever was there. And at the time it happened to be U-Boot. And then U-Boot does whatever you want the bootloader to do. It usually does quite a bit more than your classic bare-metal bootloader. It does memory initialization. It has way more driver features. Usually you can also think of the bootloader as a small monolithic OS in itself. It's not very fancy to think of it like this, but I feel it's the truth. And once this smaller OS is happy with whatever it's supposed to do, then it hands off to the bigger OS, which in our case is Linux. And this one gives you the libraries, the middleware, the connectivity, and also the multiprocessing and the separation with the MMU and everything. This is beyond what we do at this moment.
Why did I say just now that I've personally worked on CPUs that do it that way? They have become increasingly unpopular for cost reasons. We are going back to the electrical kind here. I told you that NOR flash is involved if you want to boot with this simple access, and NOR memory is extremely expensive. So if you want to add NOR memory for the sake of simplicity, you pay, I guess, like 100 times the price you pay for NAND memory. Plus you need like 30 pins to connect it, whereas NAND you can connect on 7 or 8 pins, depending on whatever you actually use. So this is more a thing of the past. It still exists for some lower-end A5s, A7s, but actually I've used it in the good old ARM9 days.
This is how I see the metaphor here. You get out of bed, you do your breakfast and then you can just walk over to the company, and in the company you're under more control. This is the memory management and the protection. You and your application are not allowed to do whatever you want anymore, but you have access to all the big boys' toys. They are at the company. They are in the operating system.
What does this mean for the technical details? We've already covered most of it. One addition here. Again, the processor starts out at flash address 0, and I told you about memory initialization and that U-Boot is like a small OS in itself. OSes usually don't like to be executed from flash memory, just because it's much, much slower than RAM. So U-Boot, as the small OS that it is, actually initializes your bigger RAM, your SDRAM or DRAM, whatever you have, then pulls the rest of itself through the controller into the RAM and actually executes it there. This is what basically happens, and once it is running there, it is able to pull Linux from whatever other ROM you have, also put it into RAM, and hand off to Linux. And once Linux is running, it can overwrite U-Boot because you don't need it anymore. Does that make sense so far?
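The sequence just described (run from flash, bring up RAM, copy the bootloader into RAM, load the kernel, hand off, let the kernel reuse the bootloader's memory) can be sketched as a toy model. This is not from the talk and it is not how U-Boot is implemented; it is plain Python that only mirrors the order of the steps. The function name `boot`, the tiny RAM size and the byte strings are assumptions made for illustration. Placing the bootloader copy at the top of RAM is a simplification of what U-Boot's relocation does.

```python
# Toy model of the stage-four boot flow (illustrative only, not real U-Boot code).

def boot(flash_bootloader: bytes, stored_kernel: bytes, ram_size: int = 64):
    """Walk through the steps in order and return (steps, final RAM contents)."""
    if len(flash_bootloader) + len(stored_kernel) > ram_size:
        raise ValueError("bootloader and kernel do not fit into RAM together")

    steps = ["execute bootloader from flash"]

    # External RAM is unusable until the bootloader has configured it.
    ram = bytearray(ram_size)
    steps.append("initialize external RAM")

    # The bootloader copies itself to the top of RAM and continues from there.
    boot_base = ram_size - len(flash_bootloader)
    ram[boot_base:] = flash_bootloader
    steps.append("relocate bootloader into RAM")

    # Running from RAM, it loads the kernel to the bottom of RAM.
    ram[:len(stored_kernel)] = stored_kernel
    steps.append("load kernel into RAM")

    # After the hand-off the kernel owns all of RAM and may reuse the
    # bootloader's region; this is modeled here by zeroing it.
    ram[boot_base:] = bytes(len(flash_bootloader))
    steps.append("hand off to kernel")

    return steps, ram


if __name__ == "__main__":
    steps, ram = boot(b"UBOOT", b"LINUX-KERNEL")
    print("\n".join(steps))
```

After the hand-off nothing of the bootloader remains in RAM in this model, which is the same point the talk makes about Linux being free to overwrite U-Boot.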
I know that I'm rambling a lot about memories, but actually when I prepared this talk and assembled all the information, I understood that if you want to know what's going on there, you have to understand memory, because once you have understood how memories relate to the CPU, then you can do all the rest.
And then we come to the fifth and final stage, which you probably all have witnessed. It's essentially the same MPUs, but we have multi-stage bootloaders these days. They come in a number of varieties and a number of lengths, and the most prominent one is the so-called SPL. Who has heard the term SPL? Yeah, everybody. And essentially it means that the CPU is just able to find a small bootloader binary somewhere, which is just enough to bring up enough memory to run a full U-Boot, because RAM is external to the CPU. It usually does not know how to initialize it, how wide it is, how big it is, what timings it needs. This is stuff that you need to tell it, that you need to put into software. And the CPU can't run anything complicated. It usually has like 16K of RAM internally. What can you do in 16K? It's not that much. So you have to create a secondary loader. I know it's going backwards. Don't blame me for this. I didn't make it up. This secondary loader uses these few K of RAM to set up the bigger SDRAM, load the real bootloader into the big SDRAM, and kick it off. This does its job there and hands off to the actual operating system. This was the classic way of booting until five years ago or so.
And now, chains have become ever more complicated these days. RISC-V is usually adding OpenSBI loading, if you run an Allwinner D1 or Unleashed or whatever they have these days. ARM can load OP-TEE or Trusted Firmware somewhere in this process. These are all things that are looped in somewhere between the stages that I just mentioned. I have looked at the flow of a Tegra and it's not a flow, it's a maze. Seriously.
I think it has like 25 stages, and there are arrows going back and forth and colors to indicate which component can see which other component. Do you want a cool pose for the photo? Stage 7? Oh, come on. Boring.
And the key part here is, it's a chain. This is the one thing that I want to convey. Booting is not like a little bit of this and a little bit of that and then everything comes together and it magically works. No. Booting in almost every case is a chain where each and every step has to be taken, has to be successfully completed, and only then do you have a usable system. And if you fall over somewhere in the chain, well, then sorry, no work for you. This, like I said, applies to about every contemporary processor that I know of in some variation. The Raspberry Pis, who knows what they are called? Not boot image, but kernel7 or whatever, those fancy firmware blobs that you have to have on your SD card in order to actually make it work. It's just like before the SPL there. However your company or your vendors call it, it's always this in various forms.
The metaphor again: all of this happens at home. Some of you might have expected now that this all happens in the company building. No, it does not. These are all things that you do at home, in bootloader land, before handing off to the actual operating system. And that's why I asked in the beginning. I, for example, get up, I take a shower, I brush my teeth, I sometimes shave, I have breakfast, I put on my boots and all the things. So this can become pretty contrived. But again, it's a chain. Brushing teeth before having chocolate cookies for breakfast is not exactly a good idea. It only makes sense in one order. And this is how booting also works.
For the sake of completeness, I also created imagery so you can visualize what an SPL does. We're not at address zero here anymore because, as I mentioned, now we're usually not using NOR anymore.
We more often than not are using NAND memory in whatever form, be it an eMMC, be it plain NAND memory, be it an SD card, whatever. It's more complex memory that is somewhere. So this relies quite a bit more on the internal loader, which, as I pointed out, is just intelligent enough to pull the SPL from somewhere. The internal loader goes to ROM, pulls in the SPL, which again, once it is in RAM, gets other stuff from the flash and puts it in RAM. And this goes back and forth and back and forth until all of the stages are completed and your system is hopefully working.
What should you take away? Chains, I can't emphasize that enough. The application on the MCU is what you want to reach, or the operating system on the MPU, because from the point of view of booting, everything that is in the operating system is the same. You've handed off from your boot process and we don't care anymore. The more complex your CPU is, the more complex your boot chain will be. M0 on the one end, with just one reset vector, up to stuff like NVIDIA Tegra or AM6, whatever. I wasn't in that talk, unfortunately, but was any of you? How long did he take to explain the boot process? See what I mean? So the more complex and high-powered the CPUs or your SoCs become, the more complex the boot chains also tend to be.
Again, the later in the boot chain you are, the less hardware-specific your boot software will be. The SPL is super specific because it needs to know about the very exact hardware type of SDRAM that you have put onto your board. It needs to know the timings. It needs to know the pinout. It needs to know its size and everything. U-Boot proper, which it loads, usually doesn't need to know that anymore. SDRAM is properly set up. It's living there. It works. Everything is fine. The later you are, the more generic and more high-level you can actually be. And if support is there in hardware, you can do fancy stuff, but only if it is there in hardware.
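The chain idea from the takeaways above can be written down as a small toy model. It is not from the talk and does not correspond to any specific SoC. The stage list `STAGES` is a simplification, `SRAM_SIZE` reuses the "like 16K" figure mentioned for on-chip RAM (real sizes vary per part), and the function names are hypothetical. The model shows only two rules: the SPL has to fit into the small internal RAM, and every stage must succeed before the next one runs.

```python
# Toy model of a multi-stage boot chain (illustrative only).

SRAM_SIZE = 16 * 1024  # assumed on-chip RAM size, taken from the talk's rough figure

# Simplified chain; real chains may add OpenSBI, Trusted Firmware and more.
STAGES = ["ROM loader", "SPL", "U-Boot proper", "operating system"]


def run_chain(stage_ok, spl_size):
    """Run the stages in order and return the ones that completed.

    stage_ok maps a stage name to False if that stage should fail;
    stages not listed are assumed to succeed.
    """
    completed = []
    for stage in STAGES:
        if stage == "SPL" and spl_size > SRAM_SIZE:
            break  # the SPL must fit into internal RAM, DRAM is not up yet
        if not stage_ok.get(stage, True):
            break  # one broken link stops everything after it
        completed.append(stage)
    return completed


def system_usable(completed):
    """The system is only usable if the whole chain ran."""
    return completed == STAGES


if __name__ == "__main__":
    print(run_chain({}, spl_size=12 * 1024))
    print(run_chain({"U-Boot proper": False}, spl_size=12 * 1024))
```

A failure anywhere leaves a shorter list than `STAGES`, which is the "no work for you" case.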
The bootloader, just because it is early in the process, cannot do magic. So if you are on hardware that does not have an MMU or does not have support for, I always forget the word, sorry, Russ, this trusted magic, whatever stuff you can hide from Linux on ARMs, or OpenSBI on RISC-V. All of the cool kids have something like this these days. If it is not there in hardware, then your bootloader cannot give it to you. Some people think, well, if I just put it into U-Boot, I can do it early enough so Linux won't see it. No, once booting is finished, your U-Boot is gone. And Linux is king of the hill unless hardware saves you.
Why do I talk so much about bootloaders? Because, as I mentioned really, really early, we do over-the-air updates and we care about our updates kicking in really early in the boot process. That's why we know about bootloaders. So thank you very much. If you want to know something about booting, I will try to help. If you want to know something about Mender, I will try to help. If you want to know something about the Yocto Project, I will try to help. If you want to have a celebration glass of whiskey with me right after this, I just delivered two presentations in two hotels in two hours and I'm going to pretend that I'm awesome. You're all invited. I have something on me. And yeah, with that, I still got chocolates. Try to kill me with your questions.
Yes and no. For the sake of this presentation, I would say, please ask me in 10 minutes. Because even though I'm working for them, this presentation is not on Mender, and I prefer to stay true to the developers that attended it for other reasons. Thank you. But you deserve the chocolate. No more questions? Then I can make up something. Or do you want to run for coffee early? If nobody screams bloody murder in the next three, two, one, I will just keep babbling about how bootloaders are built in the Yocto Project, because that's a common source of pain.
And one thing that you might actually be interested in is that while U-Boot is close to a somewhat de facto standard, I also pointed out that it is usually patched per board. And good vendors, which hopefully all of you are either working with or working for, submit their stuff upstream. And the not so good, and sometimes even bad or evil, vendors ship heavily hacked versions of U-Boot along with their builds. And then poor souls like me and all my fellows in the Yocto Project get annoyed by people who are like, oh, this doesn't build anymore. And why? It's just this U-Boot. And then we look at it. And it's a vendor version from 2015 or 2017. And I kid you not, four weeks ago I looked at the BSP for an i.MX 8. So not something ancient, something that is currently put in production. And the vendor patched their bootloader so badly that it cannot build without Python 2. I have no clue what they did. And seriously, if you work on booting, don't do that. Don't make your users hate you. Okay. I know it's painful making stuff work, and everybody is happy when they're done and it works. And then you just try to get it off your desk and live happily ever after. But please, you have users and you want your users to not hate you. Okay. If that is the one real-life thing you take away from this for your Linux careers, then this shall be it. With that, I think I have technically two minutes left. And this is your well-deserved early coffee break. Thank you, everybody.