Hello, and welcome to this presentation of the STM32MP1 co-processor management. It describes how the processors interact through commands and events, exchange data, and share resources.
The STM32MP1 multiprocessor system allows independent firmware to run on two CPU cores. The master Arm Cortex-A7 processor is optimized to run a Linux-based OS. The slave, or co-processor, Arm Cortex-M4 processor can run an RTOS optimized for microcontrollers or a bare-metal application. In addition, several integrated STM32 peripherals can be assigned to one of these processors. Two internal memory regions are shared between the master and the slave processors. These memories are used to load and execute the co-processor firmware, but also to define common structures, like shared buffers for inter-processor communication. Furthermore, for inter-processor communication, a specific peripheral named the inter-processor communication controller, or IPCC, enables signaling through dedicated mailboxes. Finally, a hardware semaphore, or HSEM, can be used to protect shared resources from concurrent accesses.
Before entering into details, it is important to understand the concepts involved in the management of a co-processor in a multiprocessor system. The first one is the loading and the control of the Cortex-M4 core. The Cortex-A7 core is in charge of loading the Cortex-M4 firmware and controlling the Cortex-M4 core reset. Then, the inter-processor communication is ensured by the RPMSG protocol, which relies on shared memory and the IPCC peripheral. Finally, a resource management service provides facilities to assign a peripheral to a core and to enforce exclusive access to this peripheral by this core only. It also manages access to the common or system resources shared between the two cores, for instance clocks or GPIOs.
In terms of software, the management of a co-processor is done through three main frameworks.
The remote processor framework, or remoteproc, is responsible for loading the firmware and all resources required by the Cortex-M4 co-processor to operate properly. The remote processor messaging framework, named RPMSG, is used for the inter-processor communication and relies on the Virtual Input Output, or virtio, framework. The virtio framework provides a mechanism to manage the shared ring buffer pool, named vring. On the Cortex-A7, these frameworks are native in the Linux kernel distribution. The remoteproc framework is also integrated in U-Boot, allowing the Cortex-M4 firmware to be preloaded before the Linux kernel starts. On the Cortex-M4, the frameworks are available by integrating the OpenAMP library for bare metal and real-time OS.
This slide shows the several RAM banks available on STM32MP1 devices and the typical mapping applied in the STM32 MPU embedded software distribution. The RETRAM is used by the Cortex-M4 to hold its vector table plus some code and data. The MCU SRAM1 and SRAM2 can be used to hold the remaining Cortex-M4 firmware code and data. The MCU SRAM3 is typically used to map the inter-processor communication buffers that are described further on. The MCU SRAM4 can be reserved for DMA buffers for the Cortex-A7 when high bandwidth is needed with the DMA1 or DMA2 instances. It is not mandatory to align this mapping on region boundaries, but doing so can be worthwhile, since hardware isolation for Cortex-M4 memories is supported with per-bank granularity.
The trusted boot chain is the default solution delivered by STMicroelectronics, with a complete feature set. The Cortex-M4 firmware can be loaded and started by the Linux OS or by the secondary boot loader, for instance U-Boot. In normal mode, the firmware is stored in the file system and is loaded from Linux user land through the remoteproc file system interface.
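To make the normal mode more concrete, here is a minimal sketch of what loading the firmware from Linux user land can look like. It assumes the co-processor is exposed as the remoteproc0 instance under /sys/class/remoteproc and that the ELF file has already been copied to the firmware search path, typically /lib/firmware. The function name load_and_start and the firmware file name used in the example call are purely illustrative.

```python
from pathlib import Path


def load_and_start(firmware_name, rproc_dir="/sys/class/remoteproc/remoteproc0"):
    """Load a Cortex-M4 firmware through the remoteproc sysfs files and start it.

    The default rproc_dir assumes the co-processor is instance 0; adapt it
    to the platform at hand.
    """
    rproc = Path(rproc_dir)
    state = rproc / "state"

    # A running co-processor has to be stopped before its firmware is changed.
    if state.read_text().strip() == "running":
        state.write_text("stop")

    # The framework looks the ELF file up by name in the firmware search
    # path, so only the file name is written here, not a full path.
    (rproc / "firmware").write_text(firmware_name)

    # Writing "start" asks the framework to load the firmware and to
    # release the Cortex-M4 from reset.
    state.write_text("start")
```

A call such as load_and_start("my_m4_firmware.elf") would then be run with root privileges on the target. On a real system, reading the state file afterwards should report whether the co-processor is running.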
In early boot mode, the firmware is installed in the bootfs partition and is loaded by U-Boot before the Linux kernel starts.
The Cortex-M4 firmware is stored in ELF format to be loaded by the remoteproc framework from U-Boot or Linux. In addition to the code and data sections, a specific section can be defined to support features related to the co-processor management. This section consists of a resource table structure, which is parsed by the Linux remoteproc framework during the firmware load phase. The first feature declared in this table is the remote processor trace buffer, which offers the possibility to output the Cortex-M4 logs to a ring buffer. This buffer can be monitored on the Cortex-A7 side. The address and the size of the ring buffer are declared in the resource table. The second one is the inter-processor communication protocol. To enable the RPMSG protocol, the virtio device and the associated vring descriptors have to be declared in the table. The Cortex-A7 remoteproc framework, acting as the master, is in charge of allocating the associated RPMSG buffers in MCU SRAM and completing this table accordingly.
The inter-processor communication between the Cortex-M4 and the Cortex-A7 processors relies on the RPMSG messaging service. The RPMSG and virtio frameworks are in charge of managing the buffers involved in the communication. Then, a doorbell, or mailbox, mechanism is in place to inform the processors that a new message is available. This signaling is generated by the inter-processor communication controller, or IPCC.
The IPCC peripheral integrated in the STM32MP1 offers six bidirectional channels for the communication between the Cortex-A7 and the Cortex-M4. The principle is as follows. A core, for instance the Cortex-A7, sets a channel flag, which generates an RX occupied, or RXO, interrupt on the other core, here the Cortex-M4. The Cortex-M4 clears the flag to free the channel.
This generates the TX free, or TXF, interrupt on the Cortex-A7. It is important to understand that the IPCC peripheral does not manage any buffer. It works only like a doorbell. The RPMSG messaging uses two bidirectional channels. A channel is used in one direction to signal that an RPMSG buffer is available and in the other direction to signal that the RPMSG buffer has been released. In the STM32 MPU embedded software distribution, IPCC channel 1 is used for messages from the Cortex-M4 to the Cortex-A7 core, and IPCC channel 2 is used for messages from the Cortex-A7 to the Cortex-M4 core. On the Cortex-A7, the IPCC is controlled by the mailbox software framework. On the Cortex-M4, the IPCC is controlled by the HAL IPCC software driver.
In the STM32 MPU embedded software distribution, the RPMSG protocol is used for the inter-processor communication. This slide is a presentation of the RPMSG protocol and its associated concepts. The RPMSG protocol consists of sending a message from a local address to a remote address. The message is stored in a buffer in the shared memory. The virtio layer is in charge of managing the buffer lifecycle using ring buffers with associated descriptors, named vrings. On each core, an RPMSG client offers a service. The client is identified by its service, defined by a service name, or by its endpoint, defined by an address identifier and operating callbacks. On service registration, the RPMSG framework sends a name service announcement message to the remote processor. In this message, the address of the client endpoint is provided. On the remote processor side, the RPMSG framework checks whether a local client has registered the same service. If so, a channel is created between both RPMSG clients and the local RPMSG client is informed that a channel is bound.
The virtual UART is an example implemented for STM32MP1 devices to demonstrate an application using the RPMSG protocol. The aim here is to simulate a UART over the RPMSG protocol.
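Before going through the virtual UART in detail, the name service handshake described just above can be sketched as a small toy model. This is not the Linux or OpenAMP API. The class RpmsgSide, its methods, the service name "demo-service" and the endpoint address 0x400 are all hypothetical names chosen for illustration, and the shared memory, vrings and IPCC signaling are deliberately left out.

```python
class RpmsgSide:
    """Toy model of the RPMSG framework running on one core."""

    def __init__(self, name):
        self.name = name
        self.peer = None            # the framework instance on the other core
        self.local_clients = {}     # service name -> "channel bound" callback
        self.channels = {}          # service name -> remote endpoint address

    def register_client(self, service, on_bound):
        """A local client declares the service it is able to handle."""
        self.local_clients[service] = on_bound

    def announce(self, service, endpoint_addr):
        """Send a name service announcement carrying the endpoint address."""
        return self.peer.on_announcement(service, endpoint_addr)

    def on_announcement(self, service, remote_addr):
        """Bind a channel only if a local client registered the same service."""
        on_bound = self.local_clients.get(service)
        if on_bound is None:
            return False
        self.channels[service] = remote_addr
        on_bound(service, remote_addr)   # inform the local client
        return True


def connect(side_a, side_b):
    """Link both sides so that announcements reach the other core."""
    side_a.peer = side_b
    side_b.peer = side_a
```

In this model, the Cortex-M4 side would call announce() when its service is initialized, and the Cortex-A7 side binds the channel only if a client registered the same service name beforehand. An announcement for an unknown service is simply not bound.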
On the Cortex-A7 core, the STM32 RPMSG TTY driver is an RPMSG client, implemented to expose a serial TTY console to Linux user land and to act as the adaptation layer between the UART and the RPMSG protocol. On the Cortex-M4 core, a virtual HAL UART service is the RPMSG client. It provides a HAL UART API for the application and also provides the same adaptation layer between the UART and the RPMSG protocol. The RPMSG channel is created by the initialization of the virtual UART on the Cortex-M4. This triggers the creation of an endpoint for the TTY RPMSG service and the sending of the new service announcement to the Cortex-A7. On the Cortex-A7 side, the message is processed and the corresponding service is associated with the STM32 RPMSG TTY driver. The Linux driver is probed to create an endpoint, and a /dev/ttyRPMSG0 file system interface is created. From now on, the Cortex-A7 core is able to send messages through the virtual UART channel. Note that it is possible to create several instances of the virtual UART. In terms of the RPMSG protocol, this consists of creating a new endpoint per instance. The result is the creation, on the Cortex-A7 side, of a /dev/ttyRPMSGx interface.
The co-processor resource management corresponds to the implementation of a mechanism for managing the peripherals in a multi-core system. First of all, to understand the resource management, some concepts have to be defined. The term peripheral assignment means the action of assigning a set of peripherals to a Cortex context. For instance, an I2C peripheral can be assigned to either the Cortex-A7 or the Cortex-M4 core. Conversely, some peripherals or resources must be shared across several contexts. The shared resources are typically system resources, like the RCC for the reset and clock control.
This slide presents the possible peripheral assignments on the STM32MP1 platform. In line with this concept, the legend of this diagram distinguishes two possible states for each peripheral.
Assigned means that only one hardware execution context is using the peripheral at runtime. A single-color box means that the assignment is static, whereas a double or triple-color box with a vertical separator means that a user choice is needed to assign the peripheral to a given context, depending on the application needs. This assignment can be done with the STM32CubeMX tool or manually, and it is reinforced by hardware isolation for the Cortex-A7 secure context and for the Cortex-M4 core. Shared means that the peripheral can be concurrently used by two or even three different execution contexts. This mode implies register banking or other mechanisms that ensure no contention can happen between the given contexts when they access a common resource. For instance, the TIM instances on the left can be assigned to one runtime context and will only be used by this one. The RCC block just below is a system peripheral that can be concurrently accessed by the three runtime contexts. The purpose of the following slides is to explain the concepts implemented on the STM32MP1 platform for the management of the assigned and shared resources. All the mechanisms involved in the assignments are further described in the ST wiki pages. Note that this diagram shows the STMicroelectronics recommendations or choices of assignment in the STM32 MPU embedded software distribution. Additional possibilities might be described in the STM32MP1 reference manual and may be considered later on in this distribution.
To understand the implementation of the resource management, it is important to understand the associated challenges arising in a multiprocessor environment. On STM32MP1 microprocessors, two contexts are running in parallel. When a peripheral is assigned to one context, this peripheral has to be managed only by this context, so the coherency of the global system has to be ensured. To achieve this, the STM32CubeMX tool offers an interface to assign the peripherals.
The STM32MP1 also offers the possibility to isolate a peripheral to the secure part or to the Cortex-M4. The isolation is managed by the secure firmware, TF-A, through the Extended TrustZone Protection Controller, or ETZPC, table configuration. The STM32CubeMX tool can also manage this by configuring the TF-A device tree involved in the isolation. Lastly, some of the resources are shared resources. These resources are generally required to operate the peripheral. Depending on its type, a system resource has either to be protected by a hardware semaphore or to be exclusively managed by the Linux context to ensure coherency in the system, for instance the clock tree management. In the same way, the STM32CubeMX tool can help to correctly configure the shared resources.
This screenshot shows an example of an assignment as it can be done with the STM32CubeMX tool. UART4 is assigned to the Cortex-A7 non-secure context for Linux. UART5 is assigned to the Cortex-M4 context for STM32Cube. Depending on the configuration selected, the STM32CubeMX tool assigns the peripheral to a Cortex context by isolating the peripheral through the TF-A device tree configuration; configuring the Linux device tree to enable or disable the node depending on the assignment; declaring the system resources needed to operate the peripherals assigned to the Cortex-M4 core; and generating the STM32Cube firmware initialization code for the peripherals assigned to the Cortex-M4 core.
To help control and check the resource assignment, some services have been implemented in the STM32 MPU embedded software distribution. The first service is the peripheral assignment check, implemented in the STM32Cube firmware. This utility relies on the ETZPC table and allows the peripheral isolation to be checked before the peripheral is initialized. The second service is the shared configuration set. Implemented on the Cortex-A7 core, it allows the configuration of the clocks and regulators associated with a peripheral.
These shared resources need to be managed by the Linux OS for power management optimization. Then, each Cortex firmware is in charge of the configuration of the GPIO and EXTI resources. The protection of these shared resources is ensured by a hardware semaphore relying on the HSEM peripheral. Lastly, a dynamic resource reconfiguration service has been put in place to allow a co-processor to change the clock and regulator settings on the fly.
The peripheral assignment request service relies on the bus firewall controlled by the Extended TrustZone Protection Controller, or ETZPC, table. Four modes are available, depending on the secure or non-secure context and on the assignment to the Cortex-A7 or Cortex-M4 context. Note that, unlike the other modes, the isolation of the non-secure Cortex-A7 is not a real hardware isolation but a software protection, as accesses from the Cortex-M4 core are still possible. The ETZPC table is filled by the TF-A secure firmware based on its device tree. Then, during the initialization phase, access is granted. On the Cortex-A7 core, U-Boot updates the Linux device tree according to the ETZPC table. Depending on the assignment, it enables or disables the peripheral nodes declared for Linux use. On the Cortex-M4 core, the application has to verify the peripheral assignment before initializing the peripheral. The Resource Manager utility provides services to grant this access.
This picture gives an overview of the software architecture related to the check of the peripheral assignment. In this example, the TF-A firmware has isolated and assigned peripheral X to the Cortex-M4 core and peripheral Y to the Cortex-A7 core. On the Cortex-A7 core, during the boot stage, U-Boot has enabled the peripheral Y node and disabled the peripheral X node. As a consequence, only the STM32 driver for peripheral Y is enabled.
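The Cortex-A7 side of this picture can be sketched with a few lines of code. This is only a toy model under stated assumptions: the Owner enumeration, the function update_linux_nodes and the peripheral names are hypothetical and do not reflect the real ETZPC register encoding or the U-Boot implementation. Only the device tree status values "okay" and "disabled" are the standard ones.

```python
from enum import Enum


class Owner(Enum):
    """Illustrative assignment states, not the hardware encoding."""
    A7_SECURE = "a7-secure"
    A7_NON_SECURE = "a7-non-secure"
    M4 = "m4"


def update_linux_nodes(assignment_table):
    """Return the Linux device tree status for each peripheral node.

    A node stays enabled only when the peripheral is assigned to the
    Cortex-A7 non-secure context, which is where Linux runs.
    """
    return {
        peripheral: "okay" if owner is Owner.A7_NON_SECURE else "disabled"
        for peripheral, owner in assignment_table.items()
    }


# Same situation as in the picture: X goes to the Cortex-M4, Y to Linux.
table = {"peripheral_x": Owner.M4, "peripheral_y": Owner.A7_NON_SECURE}
statuses = update_linux_nodes(table)
```

With this table, the node of peripheral X ends up disabled and the node of peripheral Y stays enabled, so Linux only probes the driver for peripheral Y.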
On the Cortex-M4, the application obtains the access right to peripheral X through the Resource Manager utility and then calls the HAL API to initialize it.
The main purpose of this service is to correctly manage the configuration of the shared resources in the multi-core system. To protect these resources, two strategies have been implemented: the protection of the shared resource by a hardware semaphore, and the management of the shared resource by only one Cortex. The protection by hardware semaphore, relying on the HSEM peripheral, is used by default. This consists of taking a hardware semaphore to gain exclusive access to the critical registers of a shared resource. This is the case, for instance, for the GPIO and EXTI configurations. Some other shared resources, like clocks and regulators, need to be managed by the Linux OS. The main reason is the integrated power strategy natively implemented in the Linux OS. Indeed, the clock and regulator chaining can be represented by a tree with parent and child nodes. This tree is monitored by the Linux OS to disable or enable parents depending on the state of their children. For instance, Linux needs to be informed about the clocks used by the Cortex-M4 core. Without this information, it considers that the clock is not in use and could stop the PLL clock that is the parent node.
This slide shows the software framework involved in the configuration of the shared resources by the Linux OS on the Cortex-A7 core. The remote processor system resource manager, or RPROC SRM, framework has been developed by ST to declare and configure the shared resources used by the Cortex-M4 core to operate the assigned peripherals. This framework parses the device tree for nodes associated with the Cortex-M4 peripherals and configures the shared resources accordingly. An SRM device driver instance is enabled per sub-node. On the Cortex-M4 side, the clock and regulator resources are not initialized.
Only the peripheral clock is gated, as the clock gating registers are Cortex-dependent.
This slide shows the software framework involved in the protection of shared resources by the hardware semaphore. On the Cortex-A7 core, accesses to the GPIO and EXTI configuration registers are protected by the hardware spinlock, which takes an HSEM semaphore if it is free or waits for its release. Please note that the U-Boot bootloader also uses the same hardware semaphore during the boot stage. On the Cortex-M4 core, access protection has to be managed by the application, through the Lock Resource utility API. This utility offers a simple interface to lock or unlock a semaphore depending on the resource to protect.
Regarding the code configuration, the STM32CubeMX tool generates the code to manage the shared resources. This slide describes the generation of the shared resources for an I2C port assigned to the Cortex-M4 core. In the Cortex-A7 device tree, the resource manager node is declared with an I2C1 sub-node defined with its associated clock and GPIO declarations. Please note that the I2C1 node used to assign the peripheral to the Cortex-A7 core has been previously disabled. Note also that the pins used for the I2C port are declared in the device tree. This part is optional, as GPIOs are protected by a hardware semaphore. But this declaration can be used for debug, to allow the Linux OS to cross-check that the pins are not reserved for another peripheral. On the Cortex-M4 side, the clock initialization is protected by the system resource config allowed service. Indeed, in some specific cases, the Cortex-M4 core may have to initialize the clock tree. This is the case if the Cortex-M4 firmware is loaded at the U-Boot stage and so started before the Linux kernel. Using this service allows the same firmware to be used independently of the boot mode. In this example, the GPIO configurations are protected by a lock service that takes a hardware semaphore.
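The lock-then-configure pattern used for the GPIOs can be illustrated with a short toy model. It is a sketch only, not the HSEM driver or the Lock Resource utility API: the class HardwareSemaphore, the function configure_gpio, the bounded retry count and the pin names are all assumptions made for the example.

```python
class HardwareSemaphore:
    """Toy model of one hardware semaphore shared by both cores."""

    def __init__(self):
        self.owner = None   # None means the semaphore is free

    def try_lock(self, core):
        """Take the semaphore if it is free; report whether it was taken."""
        if self.owner is None:
            self.owner = core
            return True
        return False

    def unlock(self, core):
        """Only the core that holds the semaphore may release it."""
        if self.owner != core:
            raise RuntimeError(f"{core} does not hold the semaphore")
        self.owner = None


def configure_gpio(sem, core, registers, pin, mode, max_spins=100):
    """Write one GPIO setting while holding the semaphore.

    Returns False, without touching the registers, if the semaphore is
    still held by the other core after max_spins attempts.
    """
    for _ in range(max_spins):
        if sem.try_lock(core):
            break
    else:
        return False
    try:
        registers[pin] = mode   # critical section on the shared registers
    finally:
        sem.unlock(core)        # always free the semaphore afterwards
    return True
```

In this model, a call such as configure_gpio(sem, "M4", registers, "PA5", "alternate") succeeds when the semaphore is free. If the other core holds the semaphore for the whole retry window, the call gives up and leaves the registers unchanged, which is a design choice of this sketch rather than a description of the real utility.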
Here is a more complex example using most of the concepts previously presented. It explains how to share an I2C port between the Cortex-A7 and the Cortex-M4 cores on an STM32MP1 platform. In this example, the I2C peripheral is managed by the Cortex-M4 core. This means that the I2C has been isolated for the Cortex-M4 through the ETZPC table and that the Linux SRM framework has configured the clocks needed to operate the I2C bus. To transfer the I2C flows between the Cortex-A7 and the Cortex-M4 cores, a virtual I2C link over RPMSG can be implemented, on the same principle as the virtual UART proposed in the STM32 MPU embedded software distribution. On the Cortex-M4 core, an I2C proxy is required to enable the sharing of the I2C resources between the Cortex-M4 application and the virtual I2C HAL.
For more details on co-processor management or on the STM32MP1 product itself, please refer to the STM32MP1 user guide available on ST.com.