So, I am Diego Sueiro, I am an embedded Linux platform engineer at Sepura in Cambridge, UK, and I am the CEO of Embarcados, which means "Embedded", a website with articles about embedded systems development in Portuguese. First of all, I'm sorry about my terrible English, but I hope you will be able to understand what I am trying to convey, okay? So this is going to be our agenda. First we are going to talk about heterogeneous multiprocessors in real-time applications. Then I will introduce OpenAMP, then RPMsg, and then we are going to talk about a variation, another implementation, which is RPMsg-Lite. Then we will see how we enable RPMsg on Linux, how we enable RPMsg-Lite on Zephyr, and how the communication is set up between Linux and Zephyr. And, with some luck, a demo. It's working here; let's see if it is actually going to work. And then the future work. The organizers shrunk the presentation slot to 40 minutes; let's see if I will be able to make it. If not, if I have to go quickly through these slides or I can't show you the demo, I will be more than happy to show you offline and to address any questions after the presentation, okay? So before we start, just a few words. The hardware reference platform for this work is an i.MX7 processor from NXP. The full OpenAMP implementation was not used and is not being totally covered here, and you will understand the reasons during this talk. The full OpenAMP on Zephyr is supported in mainline, but the demo application for it mostly targets an NXP processor with a Cortex-M0 and a Cortex-M4, with an instance of Zephyr on each core. And of course the work is open source. It's not mainline yet, for both the Linux kernel and Zephyr. So let's talk about real-time applications on HMPs, also known as hybrid multiprocessing or asymmetric multiprocessing; you choose the flavor that you like most, okay?
The idea is that different CPU architectures and combinations can be found in the same SoC, like application cores (Cortex-A something), DSPs, FPGAs, low-power and real-time performance cores like the Cortex-M4, or graphics acceleration, video encoding/decoding, and these kinds of things. Some applications may have requirements like real-time performance, performance optimization, low power consumption, fast booting, system integrity and security, usage of certified software solutions, or even reuse of legacy software. The Linux kernel with the PREEMPT_RT patches can meet some of these requirements, but tuning, customizing, debugging, maintaining and updating it is costly in terms of knowledge, time and money. With hybrid multiprocessing, you can have complete isolation and partitioning of the software domains, you can have a sensors and actuators hub, and even a reduction of the BOM costs. But some challenges will come up: inter-processor synchronization and communication, efficient power management, shared resources isolation and protection, and cache coherency management, since the remote processor could be accessing outdated data. And the SoC vendors are investing a lot in these new hybrid architectures for different market verticals. Let's take a look at how a typical hybrid multiprocessing system looks in a top-level view. On the left side here, we have our application core, Cortex-A something, with its dedicated peripherals and memories. And on the right side, our real-time core, again with its dedicated real-time peripherals and memory. And we have shared resources, like peripherals and memory, to be able to exchange data between these two cores. Here in the left diagram, we can see the difference between SMP, symmetric multiprocessing, and asymmetric multiprocessing. In SMP we mostly find one instance of the operating system running on top of identical core architectures.
And in a heterogeneous arrangement, you have different operating systems running on different core architectures. The right side shows the different cores in a shared bus topology, where you have your different cores, a bus fabric that is shared between those cores, and slave devices that they can eventually share. These slave devices will eventually generate interrupts that, again, can be shared between those cores as well. So let's see a real-life example here, the i.MX7 Solo processor. We have our main CPU here, the Cortex-A7, with its own features and caches, and here we have the secondary CPU, a Cortex-M4, with its own caches and features. In this case, for the i.MX processors, NXP has these hardware units: the first one is the Resource Domain Controller, which effectively manages the access of the cores to the bus fabric of the SoC, so they can access the common resources. Then we have the Messaging Unit, which is a set of mailbox registers that enables you to share data between the cores, or even generate interrupts to notify each core. And then you have the semaphore unit, which is basically a hardware-enforced semaphore. So we saw what hybrid multiprocessing looks like. Let's talk now about the framework and protocol options we have available to communicate between these asymmetric cores. OpenAMP is a standard managed by the Multicore Association, and it is implemented in both the Linux kernel and Zephyr mainline. It is composed of remoteproc, which stands for remote processor, a framework for lifecycle operations that allows the master to control and manage the remote processors. Operations like power on, power off, reset and firmware loading are implemented in this framework.
Then you have a messaging framework, RPMsg, the remote processor messaging, which provides the inter-processor communication by using the virtio component for shared memory management when you want to send or receive data between the master and the remote core. It's all about shared memory. You have proxy operations as well, where user-space apps running on the master side have transparent access to the remote using file system calls like open, close, read and write. It's very simple and very transparent in this matter. And, proposed by ST and still in discussion, there is the resource manager, the RPROC SRM, which manages the shared system resources, like memory, resets, clocks and shared peripherals, between the master and the remote without them conflicting with each other. This is still in discussion; there is no implementation that we can test yet. OpenAMP depends on libmetal as an operating system environment abstraction layer and a hardware abstraction layer as well. And there is work in progress to decouple remoteproc from RPMsg so they can be used independently. Because of this, remoteproc will not be used in our demo here, and you will understand the dependency between them in this slide. We are going to take a very quick look at this diagram, because the idea is not to go into detail, but roughly: after the master side receives the remote firmware image, it decodes it to find a resource table that is in the header of this firmware. This resource table basically contains the information about the communication channels supported by the remote. With this information, the master will create the virtio device to communicate with the remote. Then it will actually boot the remote processor.
And the remote processor, when it initializes, will get this resource table information and create its own virtio device, and then advertise the remote channels to the master, and now they are good to go to communicate. And here in the middle, it shows how you generate this resource table and put it in the remote firmware. There is a restriction today: for the i.MX7 devices, the mainline remoteproc driver does not implement the virtio device creation using the resource table data from the firmware. Instead, NXP implemented in their repository all the virtio initialization, device creation, rings and queues inside the RPMsg driver, using data from the device tree. So this data is set in the device tree, and the RPMsg driver consumes this information, instead of remoteproc getting the resource table from the firmware and creating it. So now let's talk about RPMsg. It's very simple, okay? Following the OSI model, we have the physical layer, where we find the shared memory and the inter-core interrupt, the mailbox. Then we have the media access control layer, which is basically implemented with virtio and the virtqueues. And then the transport layer, which is the RPMsg implementation for the messaging transactions. For the physical layer, here in the left diagram, we have the shared memory that is used to exchange the messages, and interrupt lines to notify each processor. On the right side, for NXP, we use the Messaging Unit for the notifications, but the Messaging Unit is not the shared memory. We use the interrupt control registers in the Messaging Unit to notify each processor, and we use one register to tell the other processor whether we want to receive or send data. But the data itself is not in this unit, okay?
For the media access layer, virtio: we will not have time to cover the virtio component in detail, but as a top-level introduction, it is basically used to transfer data in a shared memory region using single-writer, single-reader circular buffers. Each direction, transmission or reception, has two ring buffers, the used and the available ring buffer. The ring buffers roughly contain the address of the shared memory location where the RPMsg data is, okay? In the OpenAMP wiki, there are details on how the RPMsg framework uses virtio and all the data structures, and there are these two presentations that give more detail about virtio itself. And finally, at the transport layer, we find the RPMsg protocol implementation. The message, again, is stored in a shared memory area, and its address is in the virtio vrings. And it's very simple: you have 32 bits for the source local address, then 32 bits for the destination address, a reserved area, then the length of the payload, and some flags. This is it. In a more high-level view of RPMsg, we have here our master and, for example, a remote. On each side you have what we call an endpoint, with a local address, right? And when a communication channel is announced and created, these endpoints create a logical link. So, over the same shared memory, you can have different channels between different endpoints, okay? Here is an example: when the master wants to send some information, it puts in the data, of course, its local address and the destination address. When receiving, the remote side decodes this: okay, this message is for me, it's from this address, so this is the channel, and then it carries on to your application. Now let's take a look at RPMsg-Lite. It's authored and maintained by Marek Novak. This implementation is a simplification of the extensive API implemented by OpenAMP.
It has a smaller footprint compared to the OpenAMP implementation; you can see more details about this on the GitHub page. It has an option to use a static API, so no malloc, to reduce the code size, and this can be very beneficial for a small system. And it's totally decoupled from remoteproc; there is no remoteproc dependency here. It has two subcomponents. One is the queue, a blocking receive API, which is commonly found in RTOS environments and requires an implementation in the environment adaptation layer. The other component is the name service, which allows the communicating nodes to send announcements about the channels, like creation and deletion of those channels. This name service announcement is implemented on the Linux kernel side, and it is mandatory for the remote to send this announcement in order for the Linux kernel to create the channel and be able to communicate with the remote side. The architecture: in this diagram, you can see how the source code is structured. You basically have your application source code, then the API layer, which actually implements the API, the name service, the queues, and the core of RPMsg. Then the engine, which implements the virtio and virtqueue handling. And then the porting layer, which is split into the environment porting layer, so your FreeRTOS, bare metal or Zephyr, and the platform layer, which is an abstraction of the underlying hardware that you find on your system. RPMsg-Lite is fully compatible with what is implemented on the Linux kernel side, with the RPMsg standard. This is a diagram where we can see the interaction between these components and some examples of the function calls that you will find between them. And the source code is very simple and very easy to follow. So let's see how we enable RPMsg on Linux. In this case, we are following the platform that we are using, the NXP i.MX7, and using the NXP Linux kernel source code.
So this is not in mainline, and this is regarding the 4.9 version. When you set the config SOC_IMX7, it automatically selects all these RPMsg-related configs and the MU driver, and here you can see the source code location for these drivers. It also selects, as a module, the imx_rpmsg_tty driver, which exposes the RPMsg channel to user space as a TTY device. So from user space, the remote will look like an ordinary serial port; no secrets on the user-space side. The device tree: in this case, for the i.MX7 Solo DTSI, we needed to add the Messaging Unit node, with the register range, the register address, the interrupts and the clocks it uses, and the RPMsg node with its compatible string. For the hardware, for the board, in this case the WaRP7, which uses the i.MX7 Solo processor, we set the shared memory region on the RPMsg node, and we needed to instruct the kernel that this is a reserved area, that it is not going to map. We disabled the UART2 interface because in this demo I am using UART2 on the M4 side as a normal serial console. So let's see how we enable RPMsg-Lite on Zephyr. The i.MX MU driver is still in review in Zephyr, so I created a fork from that PR and added support specifically for the i.MX7 and the WaRP7 board. The NXP guys tried to add RPMsg-Lite to Zephyr, but the Zephyr Technical Steering Committee chose to include only the OpenAMP implementation as the default IPC mechanism. But it's still very easy to have RPMsg-Lite compiled alongside Zephyr. Because of this, I created a fork of RPMsg-Lite to support Zephyr in the environment layer, and to support the i.MX7 platform as well by using the MU driver. The MU driver on Zephyr implements the Zephyr IPM, the inter-processor mailbox API, which is defined in this header, and here you find all the source code locations related to the MU driver.
And we configure and use the IPM driver aligned with the Linux side, where RPMsg uses 4 bytes, with register index 1, for the messaging direction control using bit 16: if it's 0, it's receiving; if it's 1, it's transmitting. The porting layers: in RPMsg-Lite, the Zephyr porting layer is defined in this header and implemented in this source file, and it provides general OS functionality like memory handling (allocation and deallocation), mutexes and queue operations. For the platform side, for the i.MX7, it implements the API defined in this header and is implemented in this source file, which is basically exposing, or abstracting, how to use the IPM driver on Zephyr. To build it, we need to select the IPM subsystem inside Zephyr, which is controlled with the config IPM, and select our low-level driver implementation of the IPM, in this case IPM_IMX. To build RPMsg-Lite itself, we control it with this configuration, and it is compiled alongside the application using the normal Kconfig, prj.conf and CMakeLists files; it's very much like compiling a normal Zephyr application. And this is a list of all the source code related to this remote echo sample app; we are going to see this app in a little more detail. So now let's see the communication setup between Linux and Zephyr. In a top-level view, on the left side we have our master domain, Linux running on the A7, and on the right side the remote domain, Zephyr running on the M4. For our demo, on power-up, U-Boot on the master side is responsible for loading both the Linux kernel and the Zephyr images and starting them. When Zephyr boots, it creates the virtqueues and waits for the master to signal that the link is up. In parallel, the kernel boots, and then the RPMsg driver creates the virtqueues and endpoints and notifies the remote processor that the link is up.
The master, after notifying, waits for the name service announcement. On the remote side, upon being told that the link is up, it creates the endpoint and sends the name service announcement. After that, both sides are good to go to send and receive messages. Let's see what we have for our demo. Simplistically: I have here the WaRP7 board, it's a very small board. Again, the A7 is the master domain with Linux, and we are connected to UART1 of this system; the remote side is running Zephyr on the M4 core, with UART2 connected. Between those two cores, RPMsg, using the MU and the shared memory for exchanging the data. All the source code, and how to compile and generate the Linux distro and the Zephyr image, is documented here, and you can access it later. Let's take a brief look at the remote app. The code is stripped down to fit on the screen. We set our local endpoint address, and here we have a set of defines that need to be aligned with the Linux side, like the shared memory address, the max size of the RPMsg messages, and the string with the channel announcement, the name service announcement. Then this is our task. We basically declare the variables for the RPMsg instance, the queue and the endpoint; in this case we are using the name service, so we can have a handler: when it receives a name service announcement, it calls back and you can do whatever processing you want. Here is the initialization process. We call the RPMsg remote init with the shared memory address and the link ID, and it returns a pointer to the RPMsg instance. Then we wait for the link, and as soon as the master side notifies us, we create our queue tied to that instance, and then, with the queue and the instance, we create the endpoint, the local endpoint with that address. Now we are good to go to send the name service announcement, and here, inside a while(true), we are good to go to receive and send data.
In this case we are using the blocking receive API: as soon as data arrives, we format it by prepending an echo string and send it back to the master. You can see here that it takes the source address from the received message, does whatever we want with the data, and then sends the reply back to that address. Let me show the demo, now it's time. On this side we have the A7 UART. Can you see? Is the size good? Yes. On this side we have the UART from the M4 running Zephyr. Here we stop it in U-Boot. If I issue a reset, I'm going to reset the system, fingers crossed. We can see that Zephyr booted and is waiting for the master; then, when the RPMsg driver came up, it notified it, Zephyr sent the name service announcement back, and the RPMsg driver on this side, in its initialization, after receiving the name service announcement, sent a "hello world" string. If we look at dmesg, we can see that when the RPMsg driver is coming up, it says that the MU is ready for exchanging data and notifying the other core; then it says that the RPMsg host is online and registers the driver; and then, when it receives the channel announcement, it creates the channel and notifies the TTY driver that the channel is there, and the TTY driver is ready to go. So if I, for example, do an echo here, we can see that it received 7 bytes, "ELCE", and then it sent the echo string back. If we use microcom, whenever I type here, I don't know why it's not updating, but if I keep typing, each character is sent and then received back. We are almost finishing, so let's continue. This demo is very simple; it's just a proof of concept that we can have Zephyr and Linux exchanging data. So, the future work, what needs to be upstreamed: the mainline remoteproc driver dealing with RPMsg and the virtio creation for RPMsg usage, the MU driver, and the RPMsg drivers.
We have some patches in patchwork to deal with this. On OpenAMP, hopefully we will get the decoupling between remoteproc and RPMsg. On Zephyr, actually getting the i.MX MU driver merged, and maybe having RPMsg-Lite as an alternative. And I want to conduct some latency measurement tests by varying the memory type, because on the i.MX7 you can use external memory, the DDR, or internal memories; it has a set of internal memories that are supposed to be much faster than the external memory. Also varying static versus dynamic memory allocation, copy versus no-copy mechanisms, the message buffer size, and even the number of buffers. So, here are the references, and yeah, this is it. Any questions? I have a microphone here. So, on the i.MX7, the architecture is such that the Linux processor is the master, but maybe on another architecture it will be the opposite, maybe the M4 is the master. So is it possible for Zephyr to protect the memory of Linux, and so on? Yes, it is. With RPMsg, from the software perspective, it is possible to have Zephyr as the master controlling what happens with the A7. But we have a constraint in the hardware for the i.MX processors in this asymmetric arrangement: the A7 is the primary core. It is the core that actually starts the M4 core; you can't boot the M4 first and then boot the A7. It's a hardware constraint. Actually, that's a good question, because I don't know, in the Linux implementation, if it is possible for Linux to be a remote. I don't think so; I think that Linux is only a master. I can be wrong; I don't know if someone here has this information. But from your application's perspective, you can implement it such that, okay, the M4 is actually controlling things on the A7. So you can do this from your application, yeah.
Just a short note that you really should have a look at the mainline kernel, because a colleague of mine recently mainlined several patches related to RPMsg there. Okay. I don't know the exact state, but you should have a closer look. Yeah, I looked two weeks ago; in two weeks a lot of things can change, right? You mentioned that in your demo you boot-load Zephyr onto the Cortex-M4. Does that mean it was loaded into RAM memory, or is it copied to flash memory? In this case, Zephyr is being loaded into a TCM, an on-chip memory of 32K. So it's an on-chip memory. I think this demo application has 21K of flash and 15K of RAM; it's very small. Any more questions? Okay, so this is it. Thank you very much.