Hi, this is Saeed from STMicroelectronics. In this video, I'm going to show you how to use the Open Asymmetric Multi-Processing (OpenAMP) framework to establish inter-process communication between two heterogeneous cores using the STM32H745 microcontroller.

The agenda for this video is divided into eight sections. We'll start with the use cases for inter-process communication, and then we'll do an overview of the IPC protocols. Then we will have a short overview of OpenAMP. In section five, we will discuss the shared memory between the cores. In section six, we will highlight the need for cache maintenance on the shared memory region. In section seven, we will commence the hands-on example using STM32CubeIDE. Finally, at the end of the presentation, we will quickly review the expected result that we should see on the terminal console.

For the hands-on, we will test inter-process communication using the OpenAMP framework, and we will use the remote processor messaging (RPMsg) APIs to achieve this. Since the cache is enabled on the Cortex-M7 core, we need to watch for cache incoherency, and we will take steps to prevent incoherent data. The hands-on project will be implemented on a bare-metal MCU. However, OpenAMP is flexible and can also be used with an RTOS. As a side note, please be aware that you can access all the materials discussed in this video through the link shared in the description.

For this project, we will use a single NUCLEO-H745ZI-Q board, a single micro-USB cable, and a host computer to connect the Nucleo board to. The process is the same for other dual-core STM32H7 MCUs, so you can use a different dual-core STM32H7 if you don't have this specific board at hand. The only required software for this example is the latest STM32CubeIDE. In my case, it's currently version 1.8, but you can use a newer version if available. We will also be printing some strings on a virtual terminal screen. In my case, I will be using Tera Term.
You can also use other serial terminals.

First, let's start by quickly discussing some use cases that require an inter-process communication protocol.

Event notification. Event notification can be used to synchronize the execution of tasks on both cores, especially when data processing pipelining is needed. A practical example of this case is speech activation pattern detection via the CPU2 user interface, with the command execution implemented on CPU1.

Asking for a remote service. To reduce concurrency and driver code duplication, including stacks, when using peripherals, some services can be implemented on just one of the CPUs. This helps to reduce code size and other concerns related to resource sharing. Some examples of such services are file system management and a serial communication interface.

Another example is payload processing. A functional split can set up CPU1 to perform computation-intensive work, while real-time tasks such as sensor acquisition and control are located on the second core. The overall power consumption is reduced by placing CPU1 in low power mode while CPU2 is pre-processing data (for example, data acquisition, handling connectivity, and more) before waking up CPU1 to run intensive data computation algorithms.

Let's review the options available to us for inter-process communication.

Synchronization protocol using the device resources. A lightweight inter-process communication channel can be implemented using the peripherals built into the STM32H745. For example, the hardware semaphore (HSEM) peripheral and the EXTI controller interrupt lines can be used to signal events or communicate the availability of data. Other peripherals, such as DMA channels, can be used to transfer data and generate an end-of-transfer notification from one side to the other.

The next option is FreeRTOS message buffers and stream buffers. Starting from version 10.0.0, FreeRTOS implements inter-process communication APIs.
Message buffers allow variable-length discrete messages to be passed from an interrupt service routine to a task, or from one task to another. Stream buffers allow a stream of bytes to be passed from an interrupt service routine to a task, or from one task to another.

And the last option is the OpenAMP framework. The Open Asymmetric Multi-Processing framework provides the software components required to develop applications for asymmetric multiprocessing systems. It standardizes the interaction between operating environments in a heterogeneous embedded system through open-source components such as remoteproc and RPMsg. RPMsg is a component of the OpenAMP framework. It allows for inter-process communication between applications running on different CPUs.

This slide briefly reviews OpenAMP and the remote processor messaging (RPMsg) used for IPC. OpenAMP uses the OSI layer model for the inter-core communication protocol. In asymmetric multiprocessing systems, the most common way for different cores to cooperate is shared-memory-based communication. The whole communication implementation can be separated into three ISO/OSI layers. The topmost layer is the transport layer, RPMsg. The second is the media access control layer, which is implemented by VirtIO. The bottom layer is the physical layer, which consists of the shared memory and the hardware semaphores. The figure illustrates the software layers used for OpenAMP IPC. The VirtIO abstraction layer allows both RTOS and bare-metal applications on the master processor to interact with the remote processor, making OpenAMP flexible in different environments.

The first step in implementing a shared memory area is to choose a memory area that is available and accessible to both cores. The STM32H745 implements a symmetric memory mapping between the two cores. As presented in the table, this architecture makes approximately 82% of the SRAM directly accessible to both CPUs. We can use that for data exchange.
SRAM4 and the backup SRAM are both accessible to the Cortex-M7 and the Cortex-M4, and they remain available when the D1 and D2 domains are in low power mode. We will use the D3 RAM for sharing memory between the cores and for message exchange. SRAM4, which sits on the AMBA High-performance Bus (AHB), is mapped at address 0x38000000 and is accessible by all system masters through the domain 3 (D3) AHB matrix, as shown in the diagram. SRAM4 can be used as a basic DMA buffer for I/O data from peripherals in domain 3. We can also use SRAM4 to retain code and data while domain 1 and domain 2 enter DStandby mode. In DStandby mode, the RAM and registers of the domain are powered down, and consequently their data is lost. SRAM4 is available to all masters, so it can be used to share data between the cores.

When dealing with shared data and memory while the cache is enabled, it is important to consider data coherency. The developer must define an adequate strategy for cache maintenance. Here are a couple of different implementation scenarios. One option is to perform runtime cache maintenance using the CMSIS library functions. In this scenario, the data cache must be maintained (cleaned or invalidated) before notifying CPU2 and before using data from it. Another option is to use the memory protection unit (MPU) to mark the shared memory region as non-cacheable to ensure data coherency and consistency.

Okay, now it's time to prepare for our hands-on example. Let's do a brief overview. This application will show how to use the OpenAMP middleware to create a remote processor messaging (RPMsg) communication channel between the Cortex-M4 and the Cortex-M7 cores on the STM32H745. We will use the memory protection unit to mark the shared memory as non-cacheable. This will prevent data incoherency issues when the cores are exchanging messages. Furthermore, we have to modify the linker script to create a special memory section for data sharing between the cores. As previously mentioned, this special memory section will be located in SRAM4.
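For the first option mentioned above, runtime cache maintenance, the address range handed to the CMSIS calls is usually expanded to whole cache lines. The sketch below is a host-side illustration in C, not code from this video's project. The names `cache_range_t` and `cache_align_range` are hypothetical, and the sketch assumes the 32-byte data cache line of the Cortex-M7. The CMSIS calls appear only in a comment because they need the target headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: Cortex-M7 data cache line size in bytes. */
#define DCACHE_LINE_SIZE 32u

/* Hypothetical helper type: a buffer range expanded to whole cache lines. */
typedef struct {
    uintptr_t start;   /* rounded down to a cache-line boundary */
    size_t    length;  /* multiple of DCACHE_LINE_SIZE covering the buffer */
} cache_range_t;

/* Expand [addr, addr + size) so that it starts and ends on line boundaries. */
static cache_range_t cache_align_range(uintptr_t addr, size_t size)
{
    const uintptr_t mask = (uintptr_t)DCACHE_LINE_SIZE - 1u;
    cache_range_t r = { addr, 0u };

    if (size == 0u) {
        return r;                          /* nothing to maintain */
    }
    assert(size <= UINTPTR_MAX - addr);    /* range must not wrap around */

    uintptr_t start = addr & ~mask;                  /* round start down */
    uintptr_t end   = (addr + size + mask) & ~mask;  /* round end up */

    r.start  = start;
    r.length = (size_t)(end - start);
    return r;
}

/*
 * On the target (not compiled here), the writing core would then do roughly:
 *
 *   cache_range_t r = cache_align_range((uintptr_t)buf, len);
 *   SCB_CleanDCache_by_Addr((uint32_t *)r.start, (int32_t)r.length);
 *
 * and, before reading data written by the other core:
 *
 *   SCB_InvalidateDCache_by_Addr((uint32_t *)r.start, (int32_t)r.length);
 */
```

The hands-on that follows avoids this bookkeeping altogether by marking the shared region as non-cacheable through the MPU.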
To keep things simple, the example will be implemented in a bare-metal environment without any RTOS. The OpenAMP middleware is composed of two main components. The first is the resource table, defined in the C file rsc_table.c, which is allocated in the shared memory area at address 0x38000000. The second is the messaging protocol, RPMsg, which provides the messaging infrastructure using the mailbox and VirtIO buffers.

The OpenAMP middleware uses the following resources from the MCU. For the physical layer, the hardware semaphore is used within the mailbox API for event signaling between the Cortex-M7 and the Cortex-M4, and the D3 SRAM is used for buffer communication between the Cortex-M7 and the Cortex-M4. When editing the linker script, keep in mind that the D3 SRAM, otherwise known as SRAM4, starts at address 0x38000000 and has a total length of 64 KB. To be more precise, the reserved shared memory region for this example consists of the shared resource table at address 0x38000000 with a total length of 1 KB, and the shared memory region, which starts at address 0x38000400 with a length of 63 KB.

The main clock configuration for both cores is done by the Cortex-M7, so the clock configuration code is located only in the main.c of the Cortex-M7 project. The Cortex-M7 will be the master core, and it will wake up the Cortex-M4 core from Stop mode. Both CPU1 and CPU2 will initialize the OpenAMP middleware in their main.c code, which uses the hardware semaphore and the remote processor messaging APIs. The Cortex-M4 will create the communication endpoint. The Cortex-M7 will send a string message to the Cortex-M4 using the RPMsg service. Then the Cortex-M4 will print the string onto the serial terminal using the USART3 peripheral. We're going to use USART3 because it's connected to the ST-LINK virtual COM port.

Before we start the hands-on, please take a look inside the zip folder that is linked in the video description.
In there, you'll find the source code that we will add to the hands-on project later on. The zip folder contains the completed IPC example hands-on project for your reference, a PDF of all the slides discussed in this video, a code patch for both the Cortex-M4 and Cortex-M7 cores, the code for the Cortex-M7 core main.c, the code for the Cortex-M4 core main.c, the modification script for the Cortex-M7 linker file, and finally the modification script for the Cortex-M4 linker file.

Now let's get started by creating our STM32 project. Under the MCU selector, we will search for STM32H745ZIT. For the project name, we'll go with STM32H745_OpenAMP_demo.

We begin the IOC configuration by first enabling the hardware semaphore interrupt lines for both the Cortex-M4 and the Cortex-M7 cores through the NVIC1 and NVIC2 controllers. Having fulfilled this prerequisite, we can now enable the OpenAMP middleware on both cores.

Next, we enable the data and instruction caches on the Cortex-M7 to increase performance. Doing this also prompts us to enable the memory protection unit on the Cortex-M7, because we want to mark the shareable memory region as non-cacheable. This will help prevent any data incoherency when the two cores are exchanging data. For the control mode, select the fourth option from the drop-down menu. The specific mode is not important here, but you can refer to the programming manual for details on each MPU option.

We're going to create a single MPU region, so enable region 0 here. The region we want to protect and add a cache policy to is SRAM4, which has the address 0x38000000. Make the region size 64 KB, which encompasses the entirety of SRAM4. Each MPU region is further divided into eight subregions. This field allows us to disable those subregions if needed. For now, leave it at zero. The next six fields determine the memory region attributes and the cache policy. Leave the MPU TEX (type extension) field level at zero.
This field defines the cache policy, that is, how data is written between the cache memory and the system memory. You can read more about this field in the Arm Cortex-M7 manual. Next, for the MPU access permission field, select all access permitted. For MPU instruction access permission, select enable. Set the MPU shareability permission to enabled. Set the MPU cacheable permission to disabled. And finally, set the MPU bufferable permission to disabled as well.

The next peripheral we're going to enable is USART3 for the Cortex-M4 core. We're going to send the message from USART3 to the ST-LINK virtual COM port, which forwards it to the host for printing onto the console. Set the mode to asynchronous, which is plain UART operation. Leave all other settings at their default values. Since the virtual COM port is connected to specific pins of USART3, we need to remap the default pins shown to PD8 and PD9, as specified in the user manual UM2408.

Okay, let's start changing the pin assignment. Hold the left Ctrl key on your keyboard and hold the left mouse button on the green GPIO pin. Then drag the pin to where you want to move it. If you notice a magenta warning sign next to the clock configuration, you need to use the automatic clock issue solver to make the issue go away. That was the last step of the IOC configuration.

Let's begin making changes to the C code. Click generate to convert the IOC file settings into initialization code. Let's pull up the main.c file for each of the cores. To save time in this video, we're going to copy the source code from the zip folder that's linked in the video description. Next, we'll add these fixes to prevent compilation errors for both the CM4 and CM7 cores. Please remember that these patches might become obsolete in the future, once the fixes are implemented by our development team. As of making this video, I'm working with the STM32Cube firmware package for H7, version 1.9.0. We're going to add the source code modification for the Cortex-M7 next.
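As a cross-check of the MPU region settings chosen in the configuration steps above, the following host-side C sketch packs them into the value of the ARMv7-M MPU region attribute and size register (MPU_RASR). It is an illustration under stated assumptions, not the generated project code, which configures the MPU through the HAL. The struct and function names here are hypothetical; only the bit positions come from the ARMv7-M architecture.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions of the ARMv7-M MPU_RASR fields. */
#define RASR_ENABLE_POS 0u
#define RASR_SIZE_POS   1u
#define RASR_SRD_POS    8u
#define RASR_B_POS      16u
#define RASR_C_POS      17u
#define RASR_S_POS      18u
#define RASR_TEX_POS    19u
#define RASR_AP_POS     24u
#define RASR_XN_POS     28u

/* Hypothetical container for the region attributes chosen in the GUI. */
typedef struct {
    uint32_t size_bytes;         /* power of two, at least 32 */
    uint8_t  subregion_disable;  /* one bit per subregion, 0 = all enabled */
    uint8_t  tex;                /* type extension field */
    uint8_t  access_permission;  /* 0x3 = full access */
    uint8_t  execute_never;      /* 0 = instruction access enabled */
    uint8_t  shareable;
    uint8_t  cacheable;
    uint8_t  bufferable;
} mpu_region_attr_t;

/* SIZE field encoding: region size in bytes = 2^(SIZE + 1). */
static uint32_t rasr_size_field(uint32_t region_bytes)
{
    assert(region_bytes >= 32u && (region_bytes & (region_bytes - 1u)) == 0u);
    uint32_t log2 = 0u;
    while ((1u << log2) < region_bytes) {
        log2++;
    }
    return log2 - 1u;
}

/* Pack the attributes into an MPU_RASR value with the region enabled. */
static uint32_t mpu_encode_rasr(const mpu_region_attr_t *a)
{
    return (1u << RASR_ENABLE_POS)
         | (rasr_size_field(a->size_bytes) << RASR_SIZE_POS)
         | ((uint32_t)a->subregion_disable << RASR_SRD_POS)
         | ((uint32_t)(a->bufferable & 1u) << RASR_B_POS)
         | ((uint32_t)(a->cacheable & 1u) << RASR_C_POS)
         | ((uint32_t)(a->shareable & 1u) << RASR_S_POS)
         | ((uint32_t)(a->tex & 7u) << RASR_TEX_POS)
         | ((uint32_t)(a->access_permission & 7u) << RASR_AP_POS)
         | ((uint32_t)(a->execute_never & 1u) << RASR_XN_POS);
}

/* Settings from the walkthrough: 64 KB region, all subregions enabled,
 * TEX 0, all access permitted, instruction access enabled,
 * shareable, not cacheable, not bufferable. */
static mpu_region_attr_t shared_sram4_region(void)
{
    mpu_region_attr_t a = {
        .size_bytes        = 64u * 1024u,
        .subregion_disable = 0u,
        .tex               = 0u,
        .access_permission = 3u,
        .execute_never     = 0u,
        .shareable         = 1u,
        .cacheable         = 0u,
        .bufferable        = 0u,
    };
    return a;
}
```

With these inputs, the SIZE field comes out as 15 (2^16 bytes = 64 KB) and the packed register value as 0x0304001F, which can be compared against the MPU registers in the debugger once the Cortex-M7 project is running.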
Similarly, we add the source code changes for the CM4 core.

Next, we have to make changes to the linker script for the CM7 core. The goal is to add the resource table and the shareable memory region to SRAM4. This is described in the linker script. Next up, we make similar changes to the CM4 core linker script. Again, this is done to add the resource table and the shareable memory region sections to SRAM4.

Finally, we are done making modifications to the source code. We can start compiling our project and check for any errors. Do the same for the Cortex-M7 core: click build project. The warning message seen here for the Cortex-M7 project is a known limitation. It is not going to be an issue for us here, so you can ignore it for now.

Next up, let's configure our Tera Term console to receive the string message from the Cortex-M4 core through the ST-LINK virtual COM port. Create a serial connection with the COM port number, which you can find in your device manager. Under setup, select serial port, then configure the baud rate to match the rate we selected in the IOC file settings for the USART3 peripheral. Leave everything else as default.

For the last step, we're going to configure the ST-LINK debugger settings for each core. We will begin with the configuration for the Cortex-M4. As a side note, please remember that you can find the exact details for each step of the dual-core debugging setup in the application note AN5361. Here, we're going to uncheck the download for the Cortex-M4, because the Cortex-M7 project will download the CM4 project for us. The only thing we have to check here is the loading of debug symbols, which is needed when we step through the code.

Next, let's set up the debug configuration for the Cortex-M7. Since we are debugging two projects at the same time, we have to increment the port number for the GDB server by a value that's at least four. We're going to download the Cortex-M4 project along with the Cortex-M7 project.
Note that it's important to have the CM4 project listed first, as shown here.

Finally, we can start the debugger and run our inter-process communication program. Start by launching the CM7 debugger first. Next, launch the debugger for the Cortex-M4. Then click the green play button for the Cortex-M4 project first, and proceed to click the green play button for the CM7 project. We can also pause the individual projects and see which lines they are currently waiting at. You should observe the following string message on the virtual terminal console.

Let's recap. We are establishing inter-process communication using the OpenAMP framework. We are using the RPMsg service to send a string message from the master core, the Cortex-M7, to the Cortex-M4 core. The slave core's job is then to print the string message onto the terminal screen via the USART3 peripheral, which is connected to the virtual COM port.

The relevant application notes, datasheet, reference manual, and user manual can be found at the following links. We hope you found this video helpful. Thank you very much for watching.