Hello everybody, my name is Angelo, and I work as an embedded systems engineer at Tamsys Corporation. I would like to speak about early boot stages. I know these are probably quite well-known topics that you often find in this kind of talk, but I thought it may be interesting to talk about this again, because once you are past the Linux boot you are on more familiar ground, while the early boot stages may be more complex. These are some abbreviations I will use. This is the roadmap I will try to keep: it ranges from generic boot concepts through the ROM bootloader, touching U-Boot, and touching some hardware troubleshooting and things like that.
I am focusing on systems on chip, which are the majority of the chips used nowadays. They generally offer multiple boot options, because they are full of built-in hardware modules and interfaces. Here the ROM bootloader, so the read-only memory bootloader, comes into play. That is a program that is generally not updatable, and it performs the first initializations and decides from what device to boot. Just after that, static RAM comes into play as the place where the boot code is loaded. These systems on chip have inside a static random access memory, which, as you probably already know, is a kind of memory that doesn't require any refresh cycles, so it's easy to access and use, and it's quite fast. You generally have two options for execution: execution in place, and execution of code that has already been copied into static RAM. When you have a program that needs to be read serially, you generally have a header with a lot of information inside, including the size that the ROM bootloader has to read.
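As a rough illustration of the header idea above, here is a minimal sketch. The layout (a 4-byte magic, a load address and a payload size, all little-endian) and the names `parse_boot_header` and `build_image` are assumptions made up for this example; every real ROM bootloader defines its own header format.

```python
import struct

# Hypothetical header layout, for illustration only:
# 4-byte magic, 32-bit load address, 32-bit payload size, little-endian.
HEADER_FORMAT = "<4sII"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)
MAGIC = b"BOOT"


def build_image(load_addr: int, payload: bytes) -> bytes:
    """Host-side helper: prepend the hypothetical header to a payload."""
    return struct.pack(HEADER_FORMAT, MAGIC, load_addr, len(payload)) + payload


def parse_boot_header(image: bytes) -> dict:
    """Extract what a ROM bootloader would need before copying the payload."""
    if len(image) < HEADER_SIZE:
        raise ValueError("image shorter than header")
    magic, load_addr, size = struct.unpack_from(HEADER_FORMAT, image, 0)
    if magic != MAGIC:
        raise ValueError("bad magic, not a bootable image")
    if len(image) - HEADER_SIZE < size:
        raise ValueError("payload truncated")
    payload = image[HEADER_SIZE:HEADER_SIZE + size]
    return {"load_addr": load_addr, "size": size, "payload": payload}
```

The point is only that the size field tells the loader how many bytes to read serially into static RAM before jumping to the code.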
In some cases, the ROM bootloader may be missing: there are legacy CPUs that boot directly from a parallel bus, like some ColdFire parts or things like that. They also run Linux, but the boot is simpler. Execution in place, instead, requires random access. The code that is executed may contain jumps, so the ability to read from a specific address is needed. Then there are some definitions: the ROM bootloader is generally the first program loader, and then there is the secondary program loader, SPL, which is the first binary read and the next to be executed. You may also find a tertiary program loader in some specific cases that we will see later.
For a Linux-oriented boot we are landing on 32-bit CPUs or greater, so 64-bit; those are supported by Linux. You generally need some synchronous dynamic random access memory on board, so DDR or other kinds of RAM, because you need an appropriate amount of RAM to run the kernel, maybe 10 megabytes, 16 megabytes, something like that. Then a non-volatile memory is needed to boot and to read the root file system and all the rest. An SD card or USB stick may be useful for development, to test the kernel multiple times and have a fast development cycle.
U-Boot is a common choice. You have several bootloaders to choose from, and it is not that hard to write your own bootloader if needed, but U-Boot offers a lot of help, and many of you will know it already. You can do a lot of things from U-Boot. It is also good because it is popular, there is a friendly community, and there are regular releases. In case you have issues with U-Boot, it is better to try recent mainline code or the latest stable release. These are some extremes.
We can go from a simple boot with just a single, uncompressed, standalone binary to the more recent TrustZone secure-world boot, where you have a lot of blobs involved. These are the kind of extremes I tried to figure out. A bootloader may be avoided, but this is unusual and often not as flexible, let's say.
Okay, this is a common generic boot. This is about the normal world and 32-bit systems. You see that in the first stages there is always this ROM bootloader that, after a reset, starts to check and evaluate some conditions. One condition is initially the bootstrap pins. You generally have resistors on the board, or you can have some switches, that define this bootstrap code, so the system on chip knows, for example, which device is selected for the boot. You may also have fuses that you program, like one-time programmable fuses, and generally the ROM bootloader checks those fuses as well. Finally, the ROM bootloader needs to read the first program from the selected boot device, and it reads it into a built-in static RAM, the on-chip RAM. That may range from a small size, like 4 kilobytes, to sometimes 512 kilobytes; it depends on the system on chip, and you have a lot of scenarios. Once the first program has been loaded, you need to initialize and, if you want, train the DDR, and then you can finally load Linux into the DDR and jump to it. This is just a rough schematic.
This is something about bootstrap pins. It's very simple. You generally have resistors. As you can see from the image, there are generally three pads, so you can unsolder a resistor and move it if needed. Or there may be switches. The CPU samples these pins at reset, but just after that you can use the pins for other purposes, like GPIOs or the like.
We can boot from different memory types. This starts to be interesting.
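Going back to the bootstrap pins and fuses described above, the decision the ROM bootloader takes at reset can be sketched roughly as follows. The 2-bit encoding, the `STRAP_TABLE` contents and the rule that a programmed fuse overrides the pins are all assumptions for illustration; each system on chip documents its own table and priority.

```python
from enum import Enum
from typing import Optional


class BootDevice(Enum):
    SPI_NOR = "spi-nor"
    PARALLEL_NOR = "parallel-nor"
    NAND = "nand"
    EMMC = "emmc"


# Hypothetical 2-bit strap encoding; real SoCs define their own tables.
STRAP_TABLE = {
    0b00: BootDevice.SPI_NOR,
    0b01: BootDevice.PARALLEL_NOR,
    0b10: BootDevice.NAND,
    0b11: BootDevice.EMMC,
}


def select_boot_device(strap_pins: int,
                       fuse_override: Optional[int] = None) -> BootDevice:
    """Pick the boot device from the pin levels sampled at reset.

    Assumption of this sketch: if a boot-mode fuse value is programmed,
    it takes priority over the strap pins.
    """
    code = strap_pins if fuse_override is None else fuse_override
    return STRAP_TABLE[code & 0b11]
```

For example, `select_boot_device(0b01)` selects the parallel NOR in this made-up table, while passing `fuse_override=0b11` would select eMMC regardless of the resistors.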
The system architect should define the proper hardware, depending on whether a fast boot is needed or the cost must be kept low.
Okay, this is one of the common boot devices: SPI NOR flash. Many of you will probably know it already. I am talking about standard SPI, so it's a bit simpler. This kind of memory, which is often used for U-Boot or the initial boot, has four wires, so the connection is very simple. You have master output slave input and master input slave output, and each needs to be connected to the pin with the same name on the other side. So it's difficult to make a mistake, but you can still find swapped wires on prototypes, because sometimes the chip manufacturer uses a different naming, like DO and DI for the data lines, and they may get swapped. Then you have a clock, and it's quite easy to troubleshoot and debug: you can check the clock with your oscilloscope to see if something is being read. And there is a chip select, a signal that you keep low to select the chip, and then you can start the conversation.
So SPI NOR is a quite common option. It is generally used to hold U-Boot. It is synchronous and full duplex, but for the first boot it is generally just read. It's simple to wire, and these SOIC packages are simple to unsolder in case you have some issue, as an alternative to reprogramming through JTAG or something else. The transfer rate is clock dependent and also depends on the number of data lines. This chip is not random access, but when you go to quad and octal, there are a lot of chips that are execution-in-place enabled, so you can read bursts of bytes using specific commands that are available in quad and octal bus devices.
Okay, this is an old new friend: the parallel NOR. Those who have worked with old CPUs will probably know it very well.
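As a side note on the SPI NOR transfer rate mentioned above, a back-of-the-envelope estimate of the time needed to read an image can be sketched like this. It is a simplification under stated assumptions: the command byte and a 24-bit address are sent on a single line, the data phase uses `data_lines` lines, and one bit per line moves per clock. The function name and the example figures are made up for illustration.

```python
def spi_read_time_s(size_bytes: int, clock_hz: int, data_lines: int = 1,
                    cmd_bits: int = 8, addr_bits: int = 24,
                    dummy_cycles: int = 0) -> float:
    """Estimate the time of one continuous SPI flash read, in seconds."""
    if data_lines not in (1, 2, 4, 8):
        raise ValueError("data_lines must be 1, 2, 4 or 8")
    # Command, address and dummy cycles: assumed one bit per clock.
    overhead_cycles = cmd_bits + addr_bits + dummy_cycles
    # Data phase: data_lines bits per clock, rounded up.
    data_cycles = (size_bytes * 8 + data_lines - 1) // data_lines
    return (overhead_cycles + data_cycles) / clock_hz


# Example: a 512 KiB image at an assumed 50 MHz clock.
single = spi_read_time_s(512 * 1024, 50_000_000)               # about 84 ms
quad = spi_read_time_s(512 * 1024, 50_000_000, data_lines=4)   # about 21 ms
```

With these assumed numbers the data phase dominates, so going from one to four data lines cuts the read time to roughly a quarter.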
You have just an address bus and a data bus, and some signals to read, to write, and to select the chip. It's interesting to see that this is a chip with 16-bit words, which are very common. The CPU-side address line A1 is wired to the chip's A0 because at each address increment you read 16 bits, not a single byte. For this reason, A1 is mapped to A0.
Parallel NOR chips are still often used for execution in place. The code is simply read by the CPU and executed immediately, so it's just fetch and execute, as in old CPUs such as the 8051 or the like, which were reading from EEPROM or flash chips. These chips are random access, because you can set on the address bus the exact address that you want to read, so you can jump around the code. The read procedure is very simple: you pull chip select low, pull output enable low, and at this point you have to configure the wait states on the CPU, because these chips provide the output after a certain time, so if you read too early you simply find garbage. Once the CPU has waited the proper time, you can find the data on the data bus.
But the throughput with this simple way of reading is not very impressive, so for fast boot these parallel NOR chips are generally used in another way. You need to configure page mode and also set the address range used by the parallel NOR as cacheable. This can generally be done by simply setting some bits in the proper registers. You set the address on the address lines, and chip select and output enable are set low, so you have a first read delay. But then you keep chip select and output enable low and you increment the address, and the next word is available much sooner, in 10 nanoseconds, for example. The speed is much better: it's near 100 megabytes per second.
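The page-mode figure above can be checked with a small calculation. The 16-bit word and the 10 ns in-page access come from the talk; the 100 ns first-access delay and the 8-word page are assumptions chosen only to show how a number near 100 megabytes per second can come out.

```python
def nor_throughput_mb_s(word_bytes: int, first_access_ns: float,
                        page_access_ns: float, page_words: int) -> float:
    """Average read throughput in MB/s (10^6 bytes per second).

    One page costs a slow first access plus (page_words - 1) fast
    in-page accesses.
    """
    page_time_ns = first_access_ns + (page_words - 1) * page_access_ns
    bytes_per_page = word_bytes * page_words
    # bytes per nanosecond equals GB/s, so scale by 1000 to get MB/s.
    return bytes_per_page / page_time_ns * 1000.0


# Assumed timings: 100 ns first access, 10 ns in-page, 8-word page.
page_mode = nor_throughput_mb_s(2, 100, 10, 8)   # about 94 MB/s
plain = nor_throughput_mb_s(2, 100, 10, 1)       # about 20 MB/s
```

Under these assumptions, plain word-by-word reads pay the full first-access delay every time, which is why the simple read mode is so much slower than page mode.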
This is a NAND flash memory, a non-volatile memory. It's quite different from the other memories we have seen. There is a kind of data bus, but it's bidirectional, and there are also two wires, CLE and ALE, that are used to compose commands for the NAND. NAND flash is more error-prone. It's much faster at writing compared to parallel NOR flash, and it's quite fast at reading. But since it's an error-prone technology, you need an error correction mechanism done in software.
Then there is a very interesting chip that is very nice to use: the eMMC. It is mainly like an SD card, but in the package of a chip. The protocol is very similar to SDIO, more or less. You have a clock, a command line, and a data bus that is often eight lines wide. It often works with DDR as well, so it can reach very high speeds, like 400 megabytes per second. But this kind of chip requires an initialization time, because it has inside a circuit that manages the internal NAND flash and takes care of everything, error correction included. So you just see this chip as an SD card that you can partition and use in a very friendly way, but it still requires 50 milliseconds before you are able to read data. It is not random access, so you need to read block by block and shadow the code in static RAM.
This is an SD card, which may be used for development. There is often quite a long delay before you can read data from it. These are some other technologies I have looked at that are available as non-volatile boot methods. There may be others, but I couldn't go into them for lack of time and also of knowledge.
These are some typical cases of boot.
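The "read block by block and shadow the code in static RAM" step mentioned above for eMMC can be sketched as follows. The 512-byte block size, the `read_block` callback and the function names are assumptions of this example; a real loader would call its own block driver and copy into the on-chip RAM instead of a Python buffer.

```python
BLOCK_SIZE = 512  # assumed block size for this sketch


def blocks_needed(image_size: int) -> int:
    """Number of whole blocks that cover image_size bytes."""
    return (image_size + BLOCK_SIZE - 1) // BLOCK_SIZE


def shadow_to_sram(read_block, start_block: int, image_size: int,
                   sram_size: int) -> bytearray:
    """Copy an image from a block device into a buffer standing in for SRAM.

    read_block(n) is a hypothetical driver call returning block n as bytes.
    """
    if image_size > sram_size:
        raise ValueError("image does not fit in static RAM")
    sram = bytearray(image_size)
    for i in range(blocks_needed(image_size)):
        block = read_block(start_block + i)
        offset = i * BLOCK_SIZE
        chunk = block[:image_size - offset]  # the last block may be partial
        sram[offset:offset + len(chunk)] = chunk
    return sram
```

The device cannot be addressed byte by byte, so even a 700-byte image costs two full block reads before execution can start from the copy.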
You generally have a ROM bootloader, and in the non-secure world, if you are able to make U-Boot fit inside the static RAM, on chips with a very big static RAM, you can then easily initialize SDRAM or DDR and boot Linux. Where you cannot make U-Boot fit inside the static RAM, you generally use another mechanism, a secondary program loader, which is smaller so that it fits inside the static RAM. It generally initializes the DDR and then loads, if you want, the full U-Boot and then the kernel. It's also common to find that SPL can boot the Linux kernel directly. There is a Falcon mode, sorry for the error. Falcon mode mainly allows you to initialize and boot the kernel without passing through the full U-Boot. You can still exit from SPL, generally by pressing a key, and load the normal U-Boot, so you have command access and can manage some operations from there. This exit on key press needs to be implemented in the board C file.
This is a case where you may find TPL, the tertiary program loader. There are specific cases where, for example, on OneNAND you can read just one kilobyte. So it's not mainly a static RAM limitation, but a limitation on the size of the code. You then initialize the DDR later from the TPL, and boot U-Boot and the kernel.
Okay, this is an interesting case I found working on this quite legacy ColdFire stuff. The static RAM is small, but a small piece of code has been put in the header that is read by the ROM bootloader, and inside this small part of the bootloader the DDR initialization is performed. You need to check the linker script to put that code in the initial part, and then you are able to load the rest of the binary through a proper simple driver. It's just a thing I have found, so I just mention it.
This is i.MX, some more i.MX scenarios.
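The typical non-secure cases described above (U-Boot fits in static RAM, SPL is needed, Falcon mode, and the TPL case) can be summarized in a small decision sketch. The function name, the size parameters and the exact rules are assumptions for illustration, and the stage order in the last case simply follows the description above; which stages a given board really uses depends on the system on chip and on the U-Boot configuration.

```python
from typing import List, Optional


def boot_chain(sram_size: int, uboot_size: int, spl_size: int,
               rom_read_limit: Optional[int] = None,
               falcon: bool = False) -> List[str]:
    """Return the sequence of boot stages for a simplified non-secure boot."""
    # The first stage must fit in SRAM and within what the ROM can read.
    limit = sram_size if rom_read_limit is None else min(sram_size, rom_read_limit)
    if uboot_size <= limit:
        return ["ROM", "U-Boot", "Linux"]
    if spl_size <= limit:
        if falcon:
            return ["ROM", "SPL", "Linux"]  # SPL boots the kernel directly
        return ["ROM", "SPL", "U-Boot", "Linux"]
    # Even a regular SPL is too big: a tiny first stage loads a further
    # loader, which initializes the DDR before U-Boot.
    return ["ROM", "SPL", "TPL", "U-Boot", "Linux"]


KIB = 1024
```

For example, with an assumed 64 KiB of static RAM, a 400 KiB U-Boot and a 40 KiB SPL, `boot_chain(64 * KIB, 400 * KIB, 40 * KIB)` gives the ROM, SPL, U-Boot, Linux chain, and adding `rom_read_limit=1 * KIB` gives the chain with TPL.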
Those who have worked with i.MX chips probably know this well. This kind of system on chip allows you to prepend some special operation codes to the boot program. So you are able to initialize the DDR, for example, before booting the secondary program loader. You can initialize the DDR and boot a U-Boot bigger than the static RAM size, since you load it directly into the DDR. This method is, by the way, not always welcome, because a DDR training is often preferred, especially in some special cases where you have extreme thermal conditions and stuff like that. A training is always better, so sometimes you still find a training done later inside the U-Boot code. Then there is also a chance to boot the kernel directly. I just briefly tested it, but it's not really versatile.
OK, then some words on TrustZone. It's a more current technology: several systems on chip now already boot from the secure world. There are specifications, but there are sometimes different implementations, and they are too wide to be covered in this talk. You may have an additional core that manages the boot. You have different boot blobs, and you may also find proprietary blobs, although there are a lot of people working to try to keep things open. You will find a lot of information in these two links, where you can find all the platforms supporting this secure-world boot.
OK, here are some common U-Boot commands. These are generally very well-known commands. There are commands to check the device tree, which is generally built with U-Boot, and it's very useful to load the kernel image from the file system. It's simple, you can just access the file, since U-Boot offers a lot of file system support. Then you have the common commands for memory display, and commands to write eFuses. This command is nice, it's on i.MX. So, there are cases where you don't have...
You don't have access to boot mode switches. So you can use this bmode command to reboot in a different boot mode. Or, if you don't have bmode built in, because it needs to be configured in U-Boot, you may use some tricks like these, directly writing registers. It's a kind of tricky way to reboot in a different mode. Then you have the GPIO commands, so you can test on prototypes: you can enable some circuit stages and see if they work before booting Linux. Then here are commands to access, for example, the SPI NOR flash. The SPI NOR flash needs to be supported, so you need to define the support for it in the U-Boot board-specific includes. Then I like to use, for example, a command with which you can just load a binary through the console. It's quite an old way, but I find it useful because you don't need to care about the network, and then you can flash the SPI NOR directly.
These are the most common boot commands. There is go, for an uncompressed kernel, for example; it's a kind of legacy command. Then you have the more common bootm, which boots uImages, and bootz, which boots zImages. And then there is the newer FIT image way: you can load this kind of FIT image, built from a device tree and so on, and boot through it.
Okay, these are some optimization tricks that I have experienced. A good result often starts from the hardware, so mainly from the choice of components. In most cases, for a fast boot, parallel NOR flash with execution in place is chosen. As you have seen, you also have eMMC, which is very fast, but there is a startup time, while the parallel NOR is ready immediately after reset, so it's still often chosen as a boot method. Then in U-Boot you need to check that the drivers are working in the selected mode. You can enable, for example, DDR mode for eMMC and stuff like that.
You need to check that everything is really enabled and working. Then you can disable the console, and that kind of simple tricks. You can check the clock signals on the boot device, so you are sure that you are reading at the proper speed. You can also measure boot time through GPIOs and an oscilloscope: you just set some levels on some pins, so you can see where you are in the boot. And then there are the kernel optimizations, but that is quite a complex story.
These are some tricks to debug the case where you have a prototype and the first boot is not providing any output. You can check the power supplies. All the chips involved in the boot need to have proper power supplies, like 3.3 V, 1.8 V and so on, which must be stable. You need to check the reset signals. Generally there is a reset circuit, and the signals may reach multiple chips, so you need to be sure that one chip is not wrongly reset before another, or things like that. Then you can sometimes check the CPU clock; it's often possible. You can check activity on the first boot device, so you can check the data lines. On the SPI NOR, if something is read by the ROM bootloader, you should see some bursts. With a good oscilloscope, from the size of the data that you see, you can also understand in what phase the ROM bootloader is stopping. You can often find errors on prototypes and custom boards, even when very expensive CAD systems are used. There should be signal integrity checks and stuff like that, but sometimes you may still find errors. And here too you can add some toggling on the GPIOs.
OK, this is just a small table I made to show some well-known systems on chip. I couldn't cover much 64-bit stuff because I have worked only on i.MX8, but you can see that almost all of them offer, for example, execution in place. But not all.
I mean, if you don't have the generic bus, you cannot boot from a parallel NOR. And then you see that newer stuff generally has a bigger static RAM. OK, so that was just a brief comparison.
OK, I finished probably quite early, so if you have any questions, I am happy to try to answer. Thank you very much. Thank you.