Hi, welcome. My name is Andre. I'm with ARM, and today I'll present how to get a single image for single-board computers. This is not an ARM Limited story. I just happen to work for ARM, but I do this in my spare time, so don't claim that ARM does something. It is totally independent.
So what are we talking about? Just briefly: how it is today, why that is bad, what we can do about it, and what works in my lab conditions. As I said, please don't blame this on ARM.
So what's this all about? It's about those so-called fruit pies: basically single-board computers with ARM cores. We have a lot of them. They all seem kind of separate and different. Actually they're not, but if you look at them or want to use them, they are all kind of special, and you need to do something special for each board. It's not about servers. If you get a decent boot story out of the box, then you're fine: you just install distributions from USB drives and everything's fine. But chances are you don't if you're using SoCs from Allwinner, Rockchip, Amlogic or whoever. So those are what we are talking about. Other SoCs are possible; I just don't have them at the moment.
It's also for storage-less boards. What I'm talking about is the idea of having a single SD card that boots several things. If you have a board that already boots from SPI flash, then you're mostly done: you can have the firmware there and update it via the distribution or whatever. But many boards don't have anything on SPI flash initially, for instance, or they don't have a good story.
And as a kind of disclaimer: when I'm talking about firmware, I'm talking about board-specific low-level firmware including the bootloader, and it stops there. So it's U-Boot, SPL, ATF when I'm talking about firmware. It doesn't include any kind of kernels or distributions, because in the Android world, firmware is kind of everything.
It is just to get something booted. And it's mainline: I don't have the time to deal with all this BSP stuff.
So what's the current situation? This is the Armbian download page. That's from last year, so it looks fancier this year. This is one distribution and you have tons of boards. What you see is just 10% of it or something. And keep in mind that you have Orange Pi Win, Orange Pi Zero, Orange Pi R1, Orange Pi Prime. So you don't just have "an Orange Pi"; you have to be careful which exact board you have.
The other way around is that you go to the board vendor. This one is from, I think, Rock64. You go to the board vendor and pick an image from there, which is frankly very dodgy. I'm not sure whether the Debian image they provide is endorsed by Debian, for instance. It's just something hacked up with a BSP kernel, and you don't want to know.
So those are the two problems. There are separate images for each board. Actually, they often contain very similar bits, because something like 99% of the thing is the whole userland, which is exactly the same everywhere. Even the kernel is the same today. It's just the tiny firmware bits at the beginning that are different.
As I said, boards might be mistaken for one another. Orange Pi is special: they're very good at inventing board names which differ only by one letter or something. So if it's an Orange Pi 2 Plus E or something, that's not trivial.
Another problem with different images: if you choose the wrong image, it might work to some degree, but then you get weird failures. I don't know, Ethernet doesn't work or whatever. And that is hard to troubleshoot. Also, not every board is covered, and if your board just isn't supported, then bad luck, right? And the quality differs: if you go to the vendor page, you might have a good supplier for your images or not. So it's all kind of problematic.
So the idea I had: why can't we actually try to get a single image? One SD card, basically, that is able to boot all those boards. That sounds kind of wacky. The advantage would be that it could be centrally maintained. It could be shipped by distributions. If you have one image that covers everything, or at least a lot of things, then distributions wouldn't have to go like Armbian. They wouldn't have to provide all the different boards with separate download images. Or even more: Armbian is kind of a special thing, but think of SUSE or Red Hat, they wouldn't need to ship an image for every board. And in fact, they don't, right? They just provide a subset of boards that they explicitly support. It would also relieve people from having to know beforehand which exact board they have and which image to download.
So what do we need to do to get this working? We have one SD card to put in, and we need to get something booted. So the first step is to get our own code running. The second step is to detect the actual SoC that we are on. And the third step is to detect the board. Once we are there, we get into some common-ground U-Boot which can then load the rest of the firmware. At this point we know the board, so we can pick the actual components that we need. From then on it's the standard mainline firmware support that we have. And then we hand over to some UEFI bootloader or kernel, which is basically shared. It's the same, so you don't need a different kernel for each of those boards. The same goes for GRUB: if you're using UEFI, then it's the same GRUB image. You don't need that separately.
Okay, so let's look at the first three things; that is the interesting part. So the boot process is the first step. A typical SoC boot process involves some embedded boot ROM. It's the ultimate blob, if you like, but it is mask-programmed in silicon, so you can't change it by any means.
This is kind of burned into the thing, so you can't update it. But it seems to be good enough, at least, to get the job done. And the good news is that typically it can be read. You can basically just dump some memory range, and what you typically find is assembly code, either 32-bit or 64-bit stuff. The typical size is 32 to 64 kilobytes. It's the kind of thing you can reasonably disassemble with some help and understand.
The mission of this boot ROM is basically to find the actual real boot source, load some code, and execute it. Many people say, "ah, it's a blob", but actually there's no way to do this otherwise. It's not the 80s anymore, where you could take your flash ROM, hook it to the bus, and have the CPU execute code from it. That is not feasible on a $5 SoC which has a limited number of pins, on a board of this size, right? So you have to do something, and the boot ROM basically goes over the different boot sources and finds one.
The code it finds is then loaded into some SRAM. SRAM, because at this point we don't have any DRAM enabled yet. That also means that the code size is very limited. And the boot order, so where it looks for the boot source, can sometimes be changed. It depends on the SoC.
So let's look at the different SoC families, as I would like to call them. For Allwinner SoCs, for instance, we know that the boot ROM loads up to 32 kilobytes from sector 16 of the SD card. There is a magic burnt into the boot ROM: for the boot ROM to accept the code, there has to be a magic number in there and a checksum, which has to fit. When it doesn't find anything on the SD card, it tries eMMC, NAND and SPI NOR flash, and goes into USB OTG afterwards. And as I recently found out, it tries not only sector 16 but also sector 256, which is very important for the next step.
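As an illustration of that acceptance logic (not code from the talk), here is a minimal Python sketch of a boot ROM walking its boot sources and accepting the first blob whose magic number and checksum fit. The header layout, the `DEMOBOOT` magic and the byte-sum checksum are invented for the example and do not describe any vendor's real format; only the idea of "magic plus checksum, then fall through to the next source and finally to USB OTG" comes from the description above.

```python
import struct

MAGIC = b"DEMOBOOT"               # hypothetical magic, not a real vendor value
HEADER = struct.Struct("<8sII")   # hypothetical header: magic, payload length, checksum

def checksum(payload: bytes) -> int:
    """Toy checksum: sum of all payload bytes, truncated to 32 bits."""
    return sum(payload) & 0xFFFFFFFF

def make_image(payload: bytes) -> bytes:
    """Prepend a valid header to the payload."""
    return HEADER.pack(MAGIC, len(payload), checksum(payload)) + payload

def accept(blob: bytes):
    """Return the payload if magic and checksum fit, else None."""
    if len(blob) < HEADER.size:
        return None
    magic, length, csum = HEADER.unpack_from(blob)
    payload = blob[HEADER.size:HEADER.size + length]
    if magic != MAGIC or len(payload) != length or checksum(payload) != csum:
        return None
    return payload

def boot(sources):
    """Walk the sources in order and return (name, payload) of the first valid one."""
    for name, blob in sources:
        payload = accept(blob)
        if payload is not None:
            return name, payload
    return "usb-otg", None        # nothing valid anywhere: wait for a USB download

good = make_image(b"\x01\x02\x03")
sources = [
    ("sd@16", b"\x00" * 64),      # nothing usable at the first location
    ("sd@256", good),             # valid image at the alternative sector
    ("emmc", good),
]
print(boot(sources))
```

With the first location left blank, the sketch picks the image at the alternative sector, which is the property the single-image layout relies on.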
And Rockchip SoCs are a similar thing, only they load from sector 64 of the SD card, and they have, of course, a different kind of wrapping. Also different is that they try eMMC, NAND and SPI NOR flash first and only then the SD card. And they also have alternative sectors, which I actually just discovered this week. Oh, a lot of them. I knew about the first one, but there are other sectors.
For Amlogic SoCs, I got a board from Antonio two weeks ago. It seems to load from sector one of the SD card. I have something booting there, but it's early stages. I think we have to put in [unclear] there. It can be done.
Raspberry Pi is easy, because that one loads from the first FAT partition and uses magic filenames. So it can easily coexist with all the others, because you can have a FAT partition starting quite late in the game, and then all you need is your magic files.
So this is a graphical version of it. The first thing is the generic SD card layout. You have an MBR, which could be a kind of dummy MBR, and you have a GPT, the GUID partition table, typically up to 16K. The rest of the first megabyte is typically unused, and the first partition starts at one megabyte. That's how typical partitioning tools lay it out. Allwinner, as I said, starts at 8K. Rockchip starts at 32K, and Amlogic just behind it. So that's the basic boot code; those are the parts marked in red here. And this basic boot code then loads the rest of the actual boot code after it has initialized DRAM, because then it has more space to load the stuff.
It is not really to scale, but what you can at least see is that all three overlap in one region. So there are quite some clashes. And even more so, of course, for the other part, the actual U-Boot and ATF: that's quite overlapping. The cool thing is that with those secondary boot locations you can actually work around this, if you have enough space.
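To make the clash concrete, here is a small Python sketch (an illustration, not something shown in the talk) that turns the sector numbers mentioned above into byte ranges and checks them for overlap. The Allwinner numbers (sector 16, up to 32 KiB, alternative sector 256) are the ones quoted in the talk; the 32 KiB size used for the Rockchip first stage is an assumption made only so there is something to compare.

```python
from itertools import combinations

SECTOR = 512   # bytes per SD card sector
KIB = 1024

def region(start_sector, size_bytes):
    """Byte range [start, end) occupied by an image placed at start_sector."""
    start = start_sector * SECTOR
    return (start, start + size_bytes)

def clashes(layout):
    """Return the pairs of names whose byte ranges overlap."""
    found = []
    for (name_a, a), (name_b, b) in combinations(layout.items(), 2):
        if a[0] < b[1] and b[0] < a[1]:
            found.append((name_a, name_b))
    return found

primary = {
    "allwinner": region(16, 32 * KIB),   # sector 16 = 8 KiB, up to 32 KiB
    "rockchip": region(64, 32 * KIB),    # sector 64 = 32 KiB; size is an assumption
}
# Same layout, but with the Allwinner first stage at its alternative sector.
relocated = dict(primary, allwinner=region(256, 32 * KIB))

print(clashes(primary))     # [('allwinner', 'rockchip')]
print(clashes(relocated))   # []
```

Moving one first stage to its secondary location is enough to clear the overlap in this two-family example; a real image would have to run the same check over every first and second stage it carries.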
You can use the secondary locations and then try to sort out a way where everything fits together. My first plan, in case we couldn't do this, was to play some tricks. For instance, you make the Allwinner U-Boot SPL shorter, so that it stops at 32K, and then load some other stuff first. Or you can carve out some space and, for instance, jump over this part and then repeat this. It's fortunately not needed. Or you could, in general, have very small trampoline loaders that don't clash and then load the real thing. But that makes it all complicated.
Fortunately, the secondary boot locations solve this quite neatly, so nothing like that needs to be done. We have some tiny changes in U-Boot, but that's all. It assumes that the location of the secondary images can be freely chosen, which is true if we control the first part, which in turn is true if that is the U-Boot SPL, because there's just one variable which says from which sector to load. Conventionally that is just behind the SPL, because why would you leave a gap? But if you need to combine several, you can arrange them as you like and then just put the right values into each SPL, so that it doesn't look just behind itself but at one megabyte or two megabytes in.
Okay, so that's the first part: you have your own code now booted. The second part is that you want to support multiple SoCs, right? So not just one SoC from each of the families, but multiple SoCs. Detecting this is actually not that hard. It's just that U-Boot is not really good at this, because it has forever been compiled for a single board. When you configure U-Boot, you select basically one board. You don't select the SoC, you don't select a family. You select one particular board, and then it compiles something for exactly that board.
If you look closer at these modern platform ports, they actually configure the whole thing for one SoC, then maybe select a bunch of drivers, and then select the right DTB, so the device tree for this. So there's not much difference in there, because you could say: okay, I use a superset of drivers which covers every board. But you still have this one-SoC problem, and it gets a bit harder because stuff that is quite flexible in the actual U-Boot part is not so flexible in the SPL, where many things get hard-coded. Basically, you have a lot of ifdefs. If you want to see a project with a lot of ifdefs, check the U-Boot code, it's awesome.
So the idea is, and it's easily said: we just convert ifdefs into runtime decisions. We detect the SoC, and what were ifdefs before become ifs, basically.
For detecting the SoC, you use platform-specific MMIO registers or eFuses, or you use heuristics. My favorite is the GIC. The GIC has ID registers at the end, and they're not trivial: it's not one or zero or something, but a number of bytes that are very special. If you can probe this, so if you know that it's safe to read those locations, then you can see whether that is the GIC signature and match it against your known GIC signature. So if you know that this SoC has the GIC at this point in memory, you can verify that it's the SoC that you think it is. We use this for the H6 from Allwinner, which is a bit special.
[Audience] How close are we to the size limit if you want to?
Yeah, that's a good point. That's a problem, but there are ways around this. We can use toolchain garbage collection even with those; I'll show you in a minute.
So that's what it looks like today, the ifdef thing, right? That's a long chain actually; it goes on for two pages or something. So you say: if defined config sun50i, then set those pins in this way.
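The two ideas above, probing for a known GIC signature and replacing compile-time ifdefs with runtime lookups, can be sketched roughly as follows. This is a Python model for illustration only, not the U-Boot patch mentioned in the talk. The SoC names, probe addresses and pin values are placeholders invented for the example, and the four signature bytes are assumed here to be the Arm component-ID pattern found at the end of such a register block.

```python
# Assumed signature: the four component-ID bytes at the end of the register block.
GIC_SIGNATURE = (0x0D, 0xF0, 0x05, 0xB1)

# Hypothetical candidates: SoC name -> address where its GIC ID registers would be.
# Only addresses known to be safe to read on every supported SoC may be listed.
CANDIDATES = {
    "soc-a": 0x0100_0FF0,
    "soc-b": 0x0300_0FF0,
}

def detect_soc(read32, candidates=CANDIDATES):
    """Return the first SoC whose expected GIC location shows the signature."""
    for name, base in candidates.items():
        seen = tuple(read32(base + 4 * i) & 0xFF for i in range(4))
        if seen == GIC_SIGNATURE:
            return name
    return None

# What used to be an #ifdef chain becomes a table indexed by the detected SoC.
UART_PINS = {            # placeholder values, not real pin assignments
    "soc-a": ("PB", 8),
    "soc-b": ("PH", 0),
}

def fake_memory(gic_id_base):
    """Build a read32() over a simulated address space with a GIC at gic_id_base."""
    mem = {gic_id_base + 4 * i: byte for i, byte in enumerate(GIC_SIGNATURE)}
    return lambda addr: mem.get(addr, 0)

soc = detect_soc(fake_memory(0x0300_0FF0))
print(soc, UART_PINS[soc])
```

The table lookup is the runtime counterpart of the ifdef chain: supporting another SoC means adding a row rather than another preprocessor branch.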
You see, one is port B19 and the other is H20 and whatever. The idea is that you can convert this into something like this, where you basically call a function, get the SoC ID, and then assign the pins accordingly. That's from a patch I have, and that works. I did this for Allwinner, so I have all this stuff working for all the Allwinner SoCs. It would then need to be done for more SoCs, for more families.
Another thing is DRAM initialization, which is typically even board-specific. This is also where, if you configure for a particular board, you sometimes get specific DRAM parameters hard-coded for that board. What you could do is try to probe stuff. And what mostly happens anyway is that you go with one-size-fits-all parameters for DRAM. For instance, on the Allwinner side, we assume everything is DDR3-1333 and then use some magic values which we have from Allwinner and which are kind of safe across all boards. Technically that's not optimal: you could run the DRAM much faster if you knew exactly how long the lines are and how much delay you would have to insert. But we don't do this today anyway. And even if we did, we could work out some fail-safe values which at least work on everything. You might not get the full performance, but for some use cases that might be good enough.
Then you have different DRAM types, so LPDDR3, for instance, and DDR3, which is common on Allwinner: about half the boards have DDR3, the other half have LPDDR3. Even though they look similar, they're actually quite different in terms of how you need to program the DRAM controller. There are not many ways to detect this beforehand, but what I figured out is that when I accidentally used the wrong image, it always reported zero megabytes, because at some point it went wrong. And I thought, oh, why not use this?
Basically, if I get zero megabytes, which doesn't make any sense, then I just try the other one. That was easier said than done: two weeks later, I had refactored the whole DRAM driver. It now works, so I can try the other one. And that means it boots whether the board has LPDDR3 or DDR3.
All right. The third thing is detecting boards. That's a bit tricky. Actually, reliably detecting a board is impossible. Just imagine: the board that comes out tomorrow will never be covered by your image of today, but people might expect it. It's also dangerous. You can't just try to set a voltage to something. If the PMIC is wired up differently, then you apply 3 V to something which is meant for 1.8 V, which doesn't sound very healthy.
It can be achieved for a subset of boards, and that's where it gets very interesting. For instance, for the subset of boards that are on my desk, I can have heuristics to detect which board it is. I could look at the DRAM size and the DRAM type. If you have 2 gigabytes of LPDDR3, it must be the Pine64 LTS, for instance. If you have 2 gigabytes of DDR3, then it must be the old Pine64, which has DDR3. So if you can confine yourself to a subset of boards, which becomes interesting for board vendors, you can achieve something like auto-detection. It would be limited to that subset, but it might be good enough. For instance, I think I've figured out a way to detect everything that Pine64 offers, including telling the Pinebook from the Pine64 LTS. But this is not really reliable, and of course it doesn't scale. At some point you can't tell two boards apart.
So the solution, I think, is to present a list and let the user choose. That list should ideally already be shortened by what you know. If you have detected 2 gigabytes of DRAM, you don't need to show all the boards that have only 1 gigabyte.
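A rough Python model of those two steps, the "zero megabytes means wrong DRAM type, try the other one" fallback and the shortening of the board list by what has been detected, might look like the sketch below. It is an illustration under stated assumptions, not the refactored DRAM driver from the talk: `try_init` stands in for a real controller initialization, and the two 1 GB boards in the table are invented so that the list has something to filter.

```python
def init_dram(try_init, types=("DDR3", "LPDDR3")):
    """Try each DRAM type in turn; a reported size of 0 MB means the guess was wrong."""
    for dram_type in types:
        size_mb = try_init(dram_type)
        if size_mb > 0:
            return dram_type, size_mb
    raise RuntimeError("no DRAM configuration worked")

BOARDS = [
    {"name": "Pine64 LTS", "dram": "LPDDR3", "size_mb": 2048},
    {"name": "Pine64 (original)", "dram": "DDR3", "size_mb": 2048},
    {"name": "hypothetical board X", "dram": "DDR3", "size_mb": 1024},
    {"name": "hypothetical board Y", "dram": "DDR3", "size_mb": 1024},
]

def candidates(dram_type, size_mb, boards=BOARDS):
    """Names of the boards that are still possible after DRAM detection."""
    return [b["name"] for b in boards
            if b["dram"] == dram_type and b["size_mb"] == size_mb]

# Simulated board with 2 GB of LPDDR3: initializing it as DDR3 yields 0 MB.
detected = init_dram(lambda t: 2048 if t == "LPDDR3" else 0)
print(detected, candidates(*detected))
```

When the filter leaves exactly one name, the board can be picked automatically; when it leaves several, as with the two invented 1 GB boards, those are the entries a menu would show.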
And then this list basically selects the actual device tree file, because that's mostly the actual board difference that we have today. Apart from the DRAM parameters, it's the device tree which tells how the stuff is connected on a board. The cool thing is that this is already mostly done, because we typically use FIT images for those boards. They can hold multiple DTBs, and there is a board-specific function which lets you choose one of the DTBs according to some hard-coded logic. At the moment we use this, for instance, for the Pine64, where we have one board which is very similar and differs only by the DRAM. By looking at the DRAM, we already decide which DTB to choose. So we just extend this and need to present some kind of menu, whatever.
All right, so the status. I have a Hello World image with some bare-metal code which dumps some information about the CPU. It does the basic initialization, gets the UART running and prints out something. This is pretty short, and I don't need anything fancy from upstream for it. I have one image which works on all of the Allwinner SoCs, even including the 32-bit ones. It works on the Rockchip RK3328 and RK3399. And it works on Antonio's Odroid-C2 that he lent me. So this is kind of the proof that the thing actually works.
U-Boot and beyond is much trickier, for the reasons I just gave. But it works. I have a single image which boots boards with the Allwinner A64 and H5. Other Allwinner SoCs are not supported; I will tell you in a minute why. And it works with the Firefly RK3399, because that is supported in mainline U-Boot, so I can play all the tricks of loading it later on and moving the SPL around. For the Rock64 there's no mainline support and no mainline U-Boot support, unfortunately. So I have to go with the existing BSP version.
The good thing is that because the rest is so flexible, you can start with the inflexible one and then see how the others fit in. You can basically get away with this.
So, the open issues. The SPL needs to know the load address at link time, which is an issue if you have different load addresses for different SoCs. For the Allwinner H6 we have a different SRAM load address than for the other parts, so the U-Boot code doesn't work out of the box. The solution is to make it position independent, which is not too hard in itself, but given how many tricks we already play with the SPL to get things going, it sounds tricky to achieve. I haven't tried.
If you don't have mainline SPL support, as I said, it gets much harder, because you have fewer degrees of freedom to hack things around. That's the case for the RK3328. It's a shame: I found some patches but couldn't make them compile, and some mainline U-Boot, or even the branch, didn't compile. So it needs to be cleaned up and upstreamed, and then you can do this. And at the moment there's no SPL support for Amlogic SoCs as far as I could find. I could get my own code running, but that started in EL1, which doesn't sound too good. That makes coexistence much more complicated. As I said, it can be tolerated for one SoC. If Amlogic is the only one, or if there's one special SoC, you can start with that and then build the others around it.
So the conclusion is: there's one image which can boot multiple boards. It's a proof of concept. Unfortunately I didn't have the time to set things up here. You can imagine it takes some time to set up four boards with all the different cables. But you can ask me and I can show you outside, maybe. There's a lot of implementation work left, and especially upstreaming might get interesting. If you tell the U-Boot maintainers that you want to make the SPL position independent and you want to support multiple SoCs, that might take a while.
I mean, it's not what U-Boot is today.
Use cases. Distribution installers: it sounds quite tempting to support a range of boards with one installer. Firmware flashers: you maybe don't put real kernels on it, but just have something in U-Boot where you have the actual firmware images. You support one image, the user puts it in, selects a board, and that flashes the firmware to either eMMC or SPI flash. Then you take the SD card out and boot with the same environment.
Multi-distribution installers: something like Raspberry Pi's NOOBS thing, where you basically boot into some menu. You can have this quite easily with GRUB. You can have GRUB with all the kernels for Red Hat, for SUSE, for Debian, for Ubuntu, and the installer initrds under different menu items. Then you have one thing which boots, and you can choose what to install and pull the rest over the network. And, of course, your own bare-metal application, if you like. You can support multiple boards.
Yeah, that's it. As I said, sorry for the demos not happening. Questions?