Hello, and welcome to this presentation describing the STM32MP1 platform boot.
Booting a Linux platform is very similar to launching a rocket: it is a multi-stage process in which only the last stage is useful in the end. The boot chain shown on the screen is quite standard, in the sense that the steps are similar on other MPUs available on the market. The first stage is the ROM code. This is not a software component from a user perspective, since this binary is embedded in the microprocessor and cannot be modified. The ROM code initializes a minimal clock tree in order to bring up all the peripherals involved in boot device detection. Once this is done, the first stage bootloader, or FSBL, is loaded from the boot device to the embedded RAM, then executed. The FSBL is the first real software component executed, just after the ROM code. It completes the clock tree initialization in order to get access to more peripherals, among them the external RAM, where it loads the second stage bootloader, or SSBL, before executing it. The most famous SSBL, used all around the world, is U-Boot. Its mission is to load the Linux kernel into the external RAM from a given boot device, among the ones supported by the ROM code, or even other ones like Ethernet, for instance. The SSBL has quite a large feature set, and it is also often used to display an image during the startup process, which is called the splash screen. The boot file system, or bootfs, contains most of the binaries needed by U-Boot: the splash screen image, the Linux kernel, and the device tree blob that contains all the initialization data given to the Linux kernel. The last action from U-Boot is a jump to the Linux kernel entry point, and Linux is then alive. The kernel starts by initializing all its device drivers, then it mounts the root file system, or rootfs, that contains all the user space applications and libraries.
The switch to user space happens when the Linux kernel creates the init process, which launches the services and applications stored in the rootfs.
This figure shows the typical sizes of the successive memories holding the boot chain components, from a few hundred kilobytes for the internal memories up to several hundred megabytes for the external memories.
Beyond Linux startup, the STM32MP1 boot chain is also responsible for the startup of two other major components of the processor. The first is the secure monitor, supported by the Arm Cortex-A7 secure context, also called TrustZone. This secure monitor can be used for user authentication, key storage, or tamper management. The second is the coprocessor firmware running on the Arm Cortex-M4 core, which can be used to offload real-time or low-power services. The dotted lines in the diagram mean that the secure monitor can be started by the first stage bootloader or the second stage bootloader. The coprocessor can be started by the second stage bootloader, which we call early boot, or by the Linux kernel.
This diagram introduces, with colored vertical frames, the three hardware execution contexts supported on the STM32MP1 platform: the Arm Cortex-A secure context in pink, the Arm Cortex-A non-secure context in dark blue, and the Arm Cortex-M context in light blue. Gray horizontal frames show the boot chain on the bottom part and runtime services above. The boot stages introduced on the previous slides are then mapped onto those vertical and horizontal frames: the ROM code, the first stage bootloader, the second stage bootloader, the Linux kernel, and user space. On top of that, the secure monitor is on the left and the coprocessor firmware on the right. STMicroelectronics proposes two flavors for the FSBL and SSBL on STM32MP1. One puts the SSBL on the secure side, whereas the second one runs it on the non-secure side. This is the meaning of the arrow in the SSBL box, and those two options are further described in the following slides.
The trusted boot chain is the default solution delivered by STMicroelectronics, with a complete feature set. It uses the Arm Trusted Firmware for Cortex-A, also known as TF-A, as FSBL, because this bootloader is delivered under the BSD license, which is sometimes preferred by customers who want to keep some implementation details of the boot chain private. It was developed by Arm with the goal of being trusted, so it should fulfill all the requirements of customers who are sensitive to security concerns. And it is future proof, since it is widely used on Armv8 architecture platforms. The trusted boot chain then uses U-Boot as SSBL, which is covered by a GPLv2 license. Note that authentication is optional with this boot chain, so it can run on any STM32MP1 security variant, with or without the secure boot option. The OP-TEE secure OS is optional and is used only to run trusted applications on STM32MP1 platforms; otherwise the TF-A secure monitor, called SP_MIN, is used to implement the minimum set of secure services expected to support the platform.
The basic boot chain is also proposed, to generate both FSBL and SSBL from a single source code base, U-Boot. The U-Boot secondary program loader, or SPL, can indeed be used as first stage bootloader. STMicroelectronics upstreams the basic boot chain with a limited number of features to enable the U-Boot community to extend it.
The STM32MP1 boot mode is defined by the combination of several inputs. First, three boot pins, accessible on ST boards; their possible values are shown in the first column of the table. The next column corresponds to the TAMP backup register number 20, which allows the user to force a serial boot when it is set to 0xFF from U-Boot or Linux. And the one-time programmable word 3, or OTP word 3, contains a primary boot source and a secondary boot source, shown in the third and fourth columns respectively. The possible values for the boot sources are listed in the tables on the right.
These are parallel NAND flash, Quad-SPI NOR flash, eMMC, SD card, or Quad-SPI NAND flash. The boot pins have two special positions. All pins at zero force a boot in serial mode. Binary value 100 selects the no-boot mode, which is useful to take control of the coprocessor via JTAG for firmware development without Linux. Let's consider the example highlighted in green. To configure your board to always boot from the SD card, which is the primary source number four in the table on the right, write four as the primary boot source code in OTP word 3. If the ROM code does not succeed in booting from the SD card, then it falls back to a serial boot, as the secondary boot source is left unprogrammed in OTP. The tables in this slide are copied from the ST Wiki article given at the bottom of this slide, so you can easily find them later on.
This slide explains how a serial boot is managed in order to program the board's embedded flash, on the right, with binaries available on a host computer, on the left. STM32CubeProgrammer is a tool delivered by STMicroelectronics that runs on the host computer to program flash memories. The flash memory programming process requires the connection of the board to the host computer via a UART or USB link. Then the user has to select a boot pin combination corresponding to a serial boot and reset the board. From here the boot chain is executed as explained previously. One, the ROM code starts and detects that the selected boot mode is serial, either UART or USB. It downloads the FSBL via the available serial link from the host computer to the embedded RAM and runs it. Two, the FSBL does the same to get the SSBL from the host computer and copies it to the DDR for execution. Three, the SSBL asks the host computer for the flash layout. The flash layout contains a textual description of the expected flash memory mapping, partition by partition.
Four, the boot chain remains in a loop in the SSBL until the end of the flash programming process, following the instructions from the flash layout. When this sequence is finished, the user can change the boot pins to select the freshly programmed flash as boot device and reset the board to boot from it.
Now let's see the boot chain configuration. Any piece of software needs the hardware description of the platform on which it is executed in order to run properly. This includes the kind of CPU, the memory size, the pin configuration, etc. The Linux kernel used to embed this hardware description directly inside its binary. The consequence of this historic implementation was that hardware variant management often relied on compilation switches, which required the kernel to be recompiled for each new board. Later the device tree concept was developed. The idea was to describe the hardware configuration in a device tree source file that is compiled to get a device tree blob. This blob is given as a parameter to the Linux kernel, which can remain the same for multiple platforms. For instance, all microprocessors implementing the Armv7 architecture are supported with a single Linux kernel configuration that is called multi_v7_defconfig. U-Boot adopted the same device tree concept, and Arm follows the same trend in TF-A. So STMicroelectronics widely uses the device tree for all platform configuration data, including DDR configuration. Linux developers usually edit the device tree manually. On the other hand, STMicroelectronics customers widely use the STM32CubeMX tool to configure the STM32Cube firmware for STM32 microcontrollers. So it has been decided to extend the tool with a DDR tuning function and make it able to generate the device tree for STM32MP1 microprocessors, to configure the internal peripherals. This should ease the move from the MCU world to the MP1 for people who are discovering the Linux environment.
U-Boot compilation leads to the generation of two binaries. The first is u-boot-spl.bin, which stands for U-Boot secondary program loader; this one is used as FSBL in the basic boot chain. The second is u-boot.bin, which is used as SSBL in both boot chains and is executed twice at boot time. A first, partial execution occurs from the DDR base address, where it has been copied by the FSBL, in order to relocate itself to the end of the DDR. This execution context is called pre-reloc. A second, full execution then starts from the relocation address. Each U-Boot-generated binary has a device tree blob appended at the end. This means that each binary embeds the executable code and the device tree configuration data that is parsed by the executable. The U-Boot device tree embeds two special properties: u-boot,dm-spl, which stands for U-Boot driver model secondary program loader, and u-boot,dm-pre-reloc, which stands for U-Boot driver model pre-relocation. Let's see how they are used. Since the U-Boot SPL is executed in the internal RAM, which has a limited size, the device tree blob needs to be filtered in order to remove nodes that are useless in the FSBL context. This is done using the fdtgrep U-Boot tool, which removes from the DTB file all nodes that have neither u-boot,dm-spl nor u-boot,dm-pre-reloc. u-boot.bin does not have this memory size constraint, but it would waste time to do too many initialization steps in the pre-reloc context. That is why U-Boot only takes into account the nodes tagged with u-boot,dm-pre-reloc while running in the pre-reloc context, whereas all nodes are taken into account after the pre-reloc phase.
Now, let's see how the U-Boot behavior can be tuned to fit everyone's needs. At build time, it is possible to customize U-Boot by defining a board configuration file with dedicated memory mapping, boot command, etc., and by using the menuconfig command to select the target and decide which features to embed, like distro, which will be explained now.
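As a rough illustration of the device tree filtering described above for the SPL, here is a minimal Python sketch. It is not the real fdtgrep implementation: the tree model (nested dictionaries with `props` and `children` keys) and the function name `filter_tree` are assumptions made for this example, and `u-boot,dm-spl` and `u-boot,dm-pre-reloc` are assumed to be the written forms of the two tags named in this talk.

```python
from typing import Optional, Set

# Assumed written form of the two U-Boot tags discussed in the talk.
SPL_TAGS = {"u-boot,dm-spl", "u-boot,dm-pre-reloc"}


def filter_tree(node: dict, keep_tags: Set[str]) -> Optional[dict]:
    """Return a copy of `node` keeping only tagged nodes and their ancestors.

    Simplified model: a node is {"props": [...], "children": {name: node}}.
    Returns None when neither the node nor any descendant carries a tag.
    """
    kept_children = {}
    for name, child in node.get("children", {}).items():
        filtered = filter_tree(child, keep_tags)
        if filtered is not None:
            kept_children[name] = filtered

    tagged = bool(keep_tags & set(node.get("props", [])))
    if tagged or kept_children:
        return {"props": list(node.get("props", [])), "children": kept_children}
    return None


# Hypothetical miniature device tree, for illustration only.
tree = {
    "props": [],
    "children": {
        "soc": {
            "props": ["u-boot,dm-pre-reloc"],
            "children": {
                "uart4": {"props": ["u-boot,dm-pre-reloc"], "children": {}},
                "ethernet": {"props": [], "children": {}},
            },
        },
        "leds": {"props": [], "children": {}},
    },
}

spl_tree = filter_tree(tree, SPL_TAGS)
```

In this toy tree, only `soc` and `uart4` survive the filtering, which mirrors the idea of shrinking the blob so that it fits in the internal RAM.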
At run time, U-Boot behavior can be changed thanks to the device tree, as seen on the previous slides, and to the distro feature, which allows several boot configurations to be loaded and shows a selection menu at boot time in the serial console, where the user can select the one to use. For instance, imagine one configuration to boot the Linux kernel present on the SD card, and another one allowing the Linux kernel to be loaded from the network, directly from the developer's host computer where it was compiled.
This slide shows which groups of peripherals are initialized all along the boot chain execution. It contains a great deal of information, so let's first explain the legend in the top left corner. The gray color is used for hardware blocks. Pink is used to highlight the Cortex-A7 secure context. Dark blue is used to highlight the Cortex-A7 non-secure context. Dashed lines are used to identify customizations in the device tree, especially with regard to U-Boot and its u-boot,dm-spl and u-boot,dm-pre-reloc properties. Let's read this slide starting from the bottom and progressively moving to the top frames. The bottom axis shows the different execution contexts that are crossed during the boot chain execution. Starting from the reset, there are the ROM context, the FSBL, the SSBL pre-reloc, and the SSBL after relocation, which finally leads to the runtime context where Linux is running. Just above is the mapping of the trusted and basic boot chain components on top of those contexts, with the coverage of the u-boot,dm-spl and u-boot,dm-pre-reloc properties. Since TF-A does not have anything equivalent to the U-Boot fdtgrep tool, the only way to optimize its device tree size is to remove useless nodes directly in the DTS files. The upper part of this slide shows the peripheral initialization order beyond the ROM code context. The firewalling consists of defining which peripheral can be used by which context, so it is set up very early by the FSBL.
The FSBL supports the same set of boot devices as the ROM code, whereas the SSBL extends this list with USB host and Ethernet. The DDR controller initialization is applied by the FSBL and remains visible in the pre-reloc context, since it contains the DDR size used to perform the relocation. The clock tree is one of the first configurations applied by the FSBL. It may be modified at runtime later on. The system time for the Cortex-A7 generic timer is provided by STGEN, which is initialized in the FSBL. The power supplies are provided by the external PMIC, which is controlled by I2C4 on ST boards, so both are first initialized in the FSBL and updated according to the application needs at runtime. UART4 is used as a serial console from all contexts on ST boards. The GPIOs are progressively initialized by each context, step by step. All the other peripherals are initialized by U-Boot or Linux when needed.
The boot process is an incremental approach, with new peripherals being added to the list of the previously initialized ones, but the development flow for the corresponding device tree files is done in the opposite order. First, the Linux device tree source files are created. Then the Linux files are copied to U-Boot and completed with U-Boot add-ons, including u-boot,dm-spl and u-boot,dm-pre-reloc, and DDR settings. Finally, the Linux files are partly copied to TF-A, then completed with security parameters and DDR settings.
Let's zoom into the Linux device tree, which is split into several files. The SoC file corresponds to the STM32MP15 peripheral definitions, most of them being disabled in this file. The pinctrl files define the GPIO banks per STM32MP15 package and the pin configurations used for each package when assembled on a board. Board files are split into two levels: a board family file, factorizing everything that is common to a board family, and a board variant file, which allows the management of the differences across a family.
Finally, each board can be compiled with the device tree compiler, DTC, in order to get the device tree blob to be given as a parameter to the Linux kernel. This diagram covers the cases of the evaluation board EV1 and the Discovery kit DK2. Those boards are variants of the evaluation daughter board ED1 and the Discovery kit DK1. STMicroelectronics maximizes the use of open source software and upstreaming to communities. So all those device tree files are upstreamed to the Linux repository, and this is also true for the two upcoming slides, for U-Boot and TF-A.
Starting from a set of Linux device tree files, this diagram shows which files are added in U-Boot, mainly to overlay them with the U-Boot u-boot,dm-spl and u-boot,dm-pre-reloc properties, but also for the DDR controller initialization. On the right, the build process generates the u-boot.dtb and u-boot-spl.dtb files for each board.
Starting from a partial set of the Linux device tree file content, this diagram shows which files are added in TF-A, reusing the DDR settings from U-Boot and adding the security peripheral configuration.
As explained in previous slides, the STM32CubeMX tool can be used to generate STM32MP1 device tree files. Let's now see this in more detail, starting with Linux device tree file generation. Notice that STM32CubeMX only generates the board file for Linux, which includes the SoC file on one side and the pinctrl file corresponding to the selected package on the other side.
STM32CubeMX makes a copy of the Linux DTS file for U-Boot and completes it with two new files: one file for the DDR configuration, and one file for U-Boot add-ons that mainly consist of the use of the u-boot,dm-spl and u-boot,dm-pre-reloc properties wherever needed. This approach is very close to the development process used for the upstreamed versions of the device tree files.
STM32CubeMX generates a board DTS file for TF-A that is a lighter version of the Linux board DTS file, in order to save space.
This file itself includes the already lighter DTSI file versions on the SoC and pinctrl sides that come with TF-A. Then the same DDR configuration file generated for U-Boot is reused for TF-A.
To boot properly, the right boot device must be selected via the boot pins or the OTP boot source. The selected boot flash must then be properly partitioned to allow the ROM code to boot from it. This section shows how the ROM code looks for the FSBL binary in the selected flash. It also describes the complete flash memory mapping that is implemented in the STMicroelectronics OpenSTLinux distribution.
Let's read this table from bottom to top, to be consistent with the order in which the boot chain considers the partitions. The FSBL partition contains the FSBL binary, which is TF-A or U-Boot SPL depending on the selected boot chain. As seen earlier, this binary also includes the DTB file used by the bootloader. The SSBL partition contains the SSBL, so U-Boot with its DTB file. The bootfs is the boot file system, and it contains the boot distro configuration file, the splash screen image (except for NOR flash, as shown on the next slide), the Linux kernel uImage, and the Linux kernel device tree. Optionally, this partition can contain an initramfs that may be used by the Linux kernel on startup. Note that the bootfs can combine several versions of those binaries in a single image thanks to the flattened image tree format, which is out of the scope of this training. The vendorfs file system is used to store third-party binaries, to ensure that they cannot be contaminated by unwanted licenses, such as GPLv3, that are used in the rootfs. The rootfs file system contains all user space binaries, so mainly kernel modules, executables, and libraries. This is usually the biggest partition, which can be up to 800 megabytes in size. And the userfs file system contains user data and STMicroelectronics examples. Some partitions are optional.
The teex, teed, and teeh partitions contain the open portable trusted execution environment, known as OP-TEE, which is the secure OS supported on STM32MP1 platforms. The logo partition contains the bootloader splash screen image, only on NOR flash. This is because the bootfs is then stored in another device, which can be the SD card, that is not yet initialized when U-Boot tries to display the splash screen.
When booting from an SD card, the ROM code looks for the GUID partition table, or GPT, at the beginning of the device in order to locate all the other partitions. Two FSBL copies are available in raw partitions, to be able to perform a failsafe update of the FSBL binary. Failsafe means that if there is a power failure during the FSBL1 update, then there is still a valid copy in the FSBL2 partition, and vice versa. The SSBL and TEE binaries are stored in raw partitions, since the FSBL does not support a file system. The bootfs is stored as an ext4 partition, which is a file system supported by U-Boot, and the rootfs and userfs are ext4 file systems used by Linux.
An eMMC looks like an SD card, except for some additional special physical partitions. In the boot context, boot area partition 1 and boot area partition 2 are the key partitions. So the strategy consists of putting one FSBL instance in boot area partition 1 and another one in boot area partition 2. The ROM code then directly tries to find the FSBL binary in one of those boot area partitions. Then the user data area is partitioned with a GPT, which is used by all the subsequent boot components to find the other partitions.
For cost reasons, NOR flash memory sizes should not be very big on products: let's say 8 megabytes in most cases, which is enough to hold the small partitions but clearly not big enough to hold the larger file systems. That is why a second flash memory device is needed to store those file systems.
The figure shows the corresponding mapping with an SD card, but this second flash memory could be a NAND flash, for instance. When booting from NOR flash, the ROM code looks for FSBL instances at the offsets 0 and 256 kilobytes.
NAND flash memory is the cheapest flash technology that exists, but it is also the most complex to manage. Being the cheapest, it is widely used, so it is important to understand the physical organization of NAND flash and the two main defects that make it harder to work with. On the physical side, NAND flash is split into blocks that are themselves split into pages. Each page contains a user area, where the data is stored, and a spare area, used to store metadata. The first kind of defect is that some blocks may be bad. Some of them become bad during the product life due to wear, but some of them are already bad when they leave production. A special tag in the spare area is used to identify factory bad blocks. The second kind of defect is that, due to physical phenomena such as electrical leakage or the read disturb effect, some bits may toggle in pages. So error detection and correction mechanisms are needed in order to overcome this issue, with the error correction code stored in the spare area.
Now let's consider the software that manages the NAND flash. Any piece of software must be able to detect and correct errors in the pages. This is mandatory. Software that is not able to manage a bad block replacement strategy simply uses a skip bad block method, which consists of jumping to the next block when a bad block is met. A first level of bad block management comes with MTD, meaning memory technology device, which allows the management of bad blocks inside the MTD volumes. MTD does not mitigate the wear issues. This is why binaries contained in MTD partitions need to be stored in several copies.
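The skip bad block method mentioned above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the flash is modeled as a list of block contents plus a parallel list of bad block flags, and the function name `read_skipping_bad_blocks` is hypothetical, not taken from the ROM code or from any bootloader.

```python
from typing import List, Sequence


def read_skipping_bad_blocks(
    blocks: Sequence[bytes],
    is_bad: Sequence[bool],
    start: int,
    count: int,
) -> List[bytes]:
    """Read `count` good blocks starting at block index `start`.

    Whenever a block is flagged as bad, it is skipped and reading simply
    continues with the next physical block.
    """
    data: List[bytes] = []
    index = start
    while len(data) < count and index < len(blocks):
        if not is_bad[index]:
            data.append(blocks[index])
        index += 1
    if len(data) < count:
        raise ValueError("not enough good blocks to read the whole image")
    return data


# Hypothetical 5-block flash where blocks 0 and 2 are bad.
flash = [b"junk", b"part1", b"junk", b"part2", b"spare"]
bad_flags = [True, False, True, False, False]
image = read_skipping_bad_blocks(flash, bad_flags, start=0, count=2)
```

Here the two-block image ends up read from physical blocks 1 and 3. Note that this method only avoids bad blocks; it does nothing about wear, which is why several copies are still needed.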
The ultimate level of management for the NAND flash is offered by the Unsorted Block Images layer, or UBI, which exposes a perfect volume via logical blocks to the upper layers. UBI then manages the translation to valid physical blocks and takes care of the wear leveling. So there is no need to duplicate data inside UBI volumes and the UBI file systems, or UBIFS, that are defined inside them.
This being said, the diagram above can now be understood. The ROM code uses the skip bad block strategy to look for a valid FSBL instance, starting from offset zero in the NAND flash. The SSBL and TEE partitions are stored in multiple copies inside independent MTD volumes. The bootfs, rootfs, and userfs are defined as UBIFS partitions inside a common MTD volume.
This slide is an overview of the image signature and authentication process as it is used on STM32 MPUs. As a prerequisite, here are the minimal terms to know for a correct understanding. A payload is a binary file on which a cryptographic operation is done. ECDSA 256 stands for elliptic curve digital signature algorithm with 256-bit key pairs. It is an asymmetric cryptographic algorithm able to sign a payload with a 256-bit private key, then verify the signature with the corresponding 512-bit public key. SHA-256 means secure hash algorithm generating a 256-bit hash from an input payload. It is a common practice to store a public key hash in order to check the public key integrity later on. And a signature is made from the cryptographic hash of a payload containing the initial payload to authenticate plus the public key. A signature is often called HMAC, standing for hashed message authentication code. Refer to cryptographic documentation and training for further information.
This diagram explains the complete flow that is put in place to perform FSBL binary authentication with the ROM code.
1. First of all, the key generator tool is used on the host side to generate an ECDSA key pair, so a private key and the corresponding public key.
The tool also generates a public key hash via a SHA-256 operation.
2. STM32CubeProgrammer includes an OTP burning tool that can be used to write the public key hash into the STM32MP1 BSEC non-volatile memory.
3. The signing tool is used on the host side to compute the SHA-256 hash of the FSBL payload plus the public key and the file header.
4. This hash is signed with ECDSA 256 using the private key to get the payload signature. The signing tool finally generates a signed file containing the payload, the public key, the header, and the signature. This signed file is the one used by STM32CubeProgrammer to populate the embedded flash.
5. When the STM32MP1 target is reset, the ROM code starts by computing the SHA-256 hash of the public key available in the signed file. Then it compares this hash with the one stored in the STM32MP1 BSEC non-volatile memory. If it is different, the authentication process fails. Otherwise, it continues.
6. The ROM code computes the SHA-256 hash of the FSBL payload plus the public key and the file header. It then checks this hash against the signature, using the public key that has just been authenticated. If the check fails, then the authentication process fails. Otherwise, the authentication is successful and the ROM code goes on with the boot process.
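To make steps 5 and 6 more concrete, here is a minimal Python sketch of the two checks. It is an illustration under stated assumptions, not the actual ROM code: the function names, the argument layout, and the concatenation order of payload, public key, and header are all hypothetical. The Python standard library has no ECDSA, so the signature check is passed in as a callback, and `toy_sign` and `toy_verify` are placeholders that are NOT real cryptography; they only let the example run.

```python
import hashlib
import hmac
from typing import Callable

# Hypothetical callback type: (public_key, digest, signature) -> bool.
Verifier = Callable[[bytes, bytes, bytes], bool]


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def rom_authenticate(
    header: bytes,
    payload: bytes,
    public_key: bytes,
    signature: bytes,
    otp_key_hash: bytes,
    verify: Verifier,
) -> bool:
    """Sketch of the two checks described in steps 5 and 6."""
    # Step 5: the public key found in the signed file must hash to the
    # value previously written in the non-volatile memory.
    if not hmac.compare_digest(sha256(public_key), otp_key_hash):
        return False
    # Step 6: hash the payload plus the public key and the header
    # (concatenation order is an assumption), then check the signature.
    digest = sha256(payload + public_key + header)
    return verify(public_key, digest, signature)


# Placeholder "signature" scheme standing in for ECDSA. NOT real crypto.
def toy_sign(public_key: bytes, digest: bytes) -> bytes:
    return sha256(b"toy-signature" + public_key + digest)


def toy_verify(public_key: bytes, digest: bytes, signature: bytes) -> bool:
    return hmac.compare_digest(toy_sign(public_key, digest), signature)


# Host side (steps 1 to 4), with made-up values.
key = b"K" * 64
header = b"HDR"
fsbl = b"fsbl-binary"
otp_hash = sha256(key)
sig = toy_sign(key, sha256(fsbl + key + header))

# Target side (steps 5 and 6).
ok = rom_authenticate(header, fsbl, key, sig, otp_hash, toy_verify)
tampered = rom_authenticate(header, fsbl + b"!", key, sig, otp_hash, toy_verify)
```

With these toy values, `ok` is true and `tampered` is false. In a real setup, the callback would wrap an actual ECDSA verification from a cryptographic library.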