Hello, everybody. I'm Juliana. I work at NXP; I've been here for over 10 years. I started out working on software tools for software analysis based on hardware trace. Then I moved to open source, working on Linux drivers for the security subsystem. Now I'm part of the audio team, working on Sound Open Firmware, on Linux drivers in the sound subsystem, and also on Zephyr. So today I'm going to talk about how we can run Zephyr on the HiFi 4 DSP from Tensilica. Let's look at the contents. We're going to start with a hardware overview of the i.MX 8M Plus. Next we'll discuss the current support for the HiFi 4 DSP in Zephyr and what's next: what samples we've enabled and what we want to achieve. We'll talk about the Linux and Zephyr communication setup, the generic communication between the two OSes. We'll touch on each of the frameworks listed here: remoteproc, RPMsg, Mailbox and OpenAMP. I'll discuss a bit the challenges I faced while adding the support for the HiFi 4 DSP, and our future plans for this core. So first, this is the diagram for the i.MX 8M Plus. As you can see, it's split into multiple subsystems: we have security, display, audio, video and machine learning, and others. But our focus for this presentation will be the main CPU platform. We have four Cortex-A53 cores and two secondary cores: a Tensilica HiFi 4 DSP and a Cortex-M7. We want to leverage the processing power of the HiFi 4 DSP. We can use it not just for audio and video processing; we can also migrate neural-network workloads to it, because it has support for TensorFlow Lite, a framework that runs machine learning models with just a few kilobytes of memory. And we can also enable optimized, easy-to-integrate third-party software libraries for voice communication, audio processing, neural-network functions, codecs and so on. So how will the interaction between the application processor and the HiFi 4 DSP work?
So we have the application processor. This will be in charge of starting the DSP and also loading the firmware on the secondary core. And at various stages we want to have communication between the two cores, so we'll have inter-processor communication. On the application processor we're going to run Linux, and on the secondary core we want to use Zephyr, because of its rich feature set and also because it has support for multiple platforms. The HiFi 4 DSP is an Xtensa core; it's not an ARM architecture like the others. So, the current support: this was added a few years back. The architecture part, the CPU core, was already there. We added the SoC support and also our board, the NXP ADSP i.MX8M. We added support for Sound Open Firmware; this was the main goal, to support Sound Open Firmware, and for this we also needed to include an overlay in the Xtensa HAL for i.MX8. This current support for Sound Open Firmware has a specific firmware loader and inter-processor communication. On the Linux side it's a custom driver in the sound subsystem, and this is in charge of powering the DSP and loading the firmware on it. Next, what we want to achieve is to move from this specific firmware loader and IPC to a more generic one, in order to use the DSP not just for audio but also for other use cases. As I mentioned earlier, we can use the DSP in multiple ways. So on the Linux side we'll have a generic firmware loader and IPC, which will communicate with the IPC of a generic framework on the Zephyr side. The way we wanted to achieve this was by enabling some samples. We started with a simple Hello World. We moved to Synchronization, a sample that demonstrates how kernel scheduling, timing and communication work. Then Dining Philosophers, which we all know; it's a classic multi-thread synchronization problem. For the inter-processor communication we enabled the OpenAMP resource table sample.
This sample demonstrates, and is compatible with, Linux running on an application processor and Zephyr on a secondary core. We also want to enable others, so we are open to suggestions on what we can improve. The Xtensa HAL was already there; we enabled it, as I said earlier. We also needed to include the NXP HAL, where we have the drivers implemented, and some external libraries; I've mentioned OpenAMP here, and we'll see more about it later. For the generic Linux and Zephyr communication setup there are a lot of discussions and a lot of presentations; I've added some of them here, with the links and everything. But mostly they focus on communication between two ARM cores. So we have an application processor, a Cortex-A, and as a secondary core a Cortex-M4 or M7; one is running Linux, the other one Zephyr, or in some cases Zephyr on both cores. Our scope is to have, as the secondary core, an Xtensa architecture: the HiFi 4 DSP, which runs Zephyr. And we also want to move from a specific firmware loader to a generic one. In the next slides we're going to answer these questions. How is the application loaded and how is the DSP started? We're going to use remoteproc. And how do the two cores communicate? We'll have RPMsg, Mailbox and OpenAMP. So next we're going to take each of these frameworks, detail them a little bit, and see how we used them to enable our samples. First is remoteproc. This is a framework that starts the DSP and also loads the firmware into the coprocessor memory. There are multiple ways to start the secondary core. First, through the sysfs interface, which has specific start and stop commands. You can also load the firmware: you can give just the name of the firmware, but then it has to be in a specific location in the Linux file system, /lib/firmware, or you can give the absolute path of the firmware.
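That first, sysfs-based option can be sketched as a few shell commands. This is a hedged sketch: the `remoteproc0` instance number and the firmware file name are placeholders that depend on your board and build, and the commands are guarded so they only run where a remoteproc instance actually exists.

```shell
# Sketch of the remoteproc sysfs flow; instance number and firmware
# name are placeholders.
RPROC=/sys/class/remoteproc/remoteproc0

if [ -d "$RPROC" ]; then
    # A bare name is resolved under /lib/firmware by default.
    echo zephyr_openamp_rsc_table.elf > "$RPROC/firmware"
    echo start > "$RPROC/state"   # load the firmware and start the DSP
    cat "$RPROC/state"            # expected to report "running"
    echo stop > "$RPROC/state"    # stop the coprocessor again
fi
```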
The second option is to load the firmware when the remoteproc driver is probing. But usually this is not recommended, because the Linux file system might not be ready when the driver is probed. This can be fixed by using an initramfs to boot the kernel, or by having the remoteproc driver not built in but compiled as a module. And a third option is to start the DSP before Linux is booted. That is done from U-Boot and is usually used if you have hard constraints on the boot time. Remoteproc also offers support services to monitor and debug the remote processor. So how do we use this? We're going to use this diagram throughout the presentation. We have the application processor with Linux, we have the HiFi 4 with Zephyr, and on each slide we're going to take and discuss the frameworks we've used. For the firmware loading and control, we have the generic remoteproc driver. This has a set of callbacks used to start or stop the core, to load the firmware, and to parse the firmware in order to set up the associated resources like IPC or memory carveouts. It also has a kick callback, to notify the coprocessor when messages are available between the two cores. And next we have imx_dsp_rproc. This is our rproc; it's platform-specific, so it implements the callbacks that are in charge of specific resources like registers, clocks or memory. In our case, in imx_dsp_rproc, we had to implement the start and stop functions, the parse-firmware and load-firmware callbacks (because we have a write restriction on the DSP), and the kick method, as we'll see later on, because this notifies the coprocessor through the mailbox that a message is available. So in order to have our specific remoteproc driver, we have to define the DSP node in the device tree on the Linux side; we'll see the Zephyr side as well. In the DSP node we have to define the compatible; based on this, we load our specific driver, the imx_dsp_rproc driver.
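The DSP node plus its reserved memory might look roughly like this on the Linux side. This is an illustrative sketch only: the labels, addresses and sizes are placeholders loosely modeled on the i.MX8MP style of node, and they must line up with your SoC dtsi and with the Zephyr linker script.

```dts
/* Illustrative sketch -- node names, addresses and sizes are
 * placeholders and must match the real SoC description. */
reserved-memory {
    #address-cells = <1>;
    #size-cells = <1>;
    ranges;

    dsp_reserved: dsp@92400000 {
        reg = <0x92400000 0x1000000>;   /* firmware code and data */
        no-map;
    };

    dsp_vdev0vring0: vdev0vring0@942f0000 {
        reg = <0x942f0000 0x8000>;      /* vring 0 */
        no-map;
    };

    dsp_vdev0vring1: vdev0vring1@942f8000 {
        reg = <0x942f8000 0x8000>;      /* vring 1 */
        no-map;
    };

    dsp_vdev0buffer: vdev0buffer@94300000 {
        compatible = "shared-dma-pool";
        reg = <0x94300000 0x100000>;    /* shared message buffers */
        no-map;
    };
};

dsp: dsp@3b6e8000 {
    compatible = "fsl,imx8mp-dsp";      /* matched by imx_dsp_rproc */
    reg = <0x3b6e8000 0x88000>;
    memory-region = <&dsp_reserved>, <&dsp_vdev0vring0>,
                    <&dsp_vdev0vring1>, <&dsp_vdev0buffer>;
    status = "okay";
};
```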
And also we need to define the memory-region properties. These specify the buffers and rings: the base address and the size of each of them. This will be used for firmware code and data and, as we'll see, also for inter-processor communication. An important aspect here is that these addresses must be matched with the memory mapping from the linker script in Zephyr. Also, it's not quite related to remoteproc, but on the Zephyr side I've enabled the UART4 node here; I've added it in the DTS because this is the UART used by the DSP, and based on the compatible we have there, we load the driver from the NXP HAL. And with all of the above enabled, we can now run the Hello World, Synchronization and Dining Philosophers samples. And that was it: just some configuration in the DTS. So let's see how the communication between the two cores works. As mentioned earlier, there are multiple frameworks we used. First is RPMsg, which stands for remote processor messaging. This is a messaging mechanism implemented on top of the virtio interface, which uses virtio rings, shared ring buffers, to send and receive messages from one core to the other over shared memory. There are two notions here that are very important for RPMsg: channels and endpoints. An endpoint has a local address and a callback associated with it, and an endpoint actually provides a logical connection over a channel. So when an RPMsg driver creates an endpoint and an incoming message arrives, the driver compares the destination address of that message to the local address of the endpoint, and if they match, the message is delivered through that endpoint. This is basically how the communication works between the two cores. So let's see how we use RPMsg. For our sample we used the OpenAMP resource table sample from Zephyr, and for this you have to add a DTS overlay. There are two important things to add here.
So first is the Zephyr IPC shared memory. We have the DSP SRAM node, where we define the base address of the shared memory and its size. This base address should be in sync with the vrings from the Linux DTS. Also, the size should be large enough to include not just the receive and transmit rings, but also the shared buffers. This is a very important thing to keep in mind. Another aspect: in Zephyr we had to add the resource table section in the linker script of our board. This resource table is actually a global variable, implemented as a structure, in which you can define the resources that the coprocessor requires before it's powered on, like contiguous physical memory. Or you can define a customized resource table, depending on the features you want to enable, where you can include entries for, say, trace buffers, or the virtio resources that the coprocessor needs for inter-processor communication. This resource table has to be, as mentioned, in a specific location, in a specific section in the linker script, because on the Linux side the remoteproc framework looks for this resource table: it allocates the memory described in it, it is in charge of loading the virtio and RPMsg frameworks, and it also allocates the buffers in case you want to use the remoteproc trace. Very important here: we have to implement the find_loaded_rsc_table callback. Otherwise the resource table won't be found, the buffers won't be allocated, and the communication between the cores will not work. The next framework is the mailbox. This is what actually sends and receives messages from one side to the other. It's based on a mailbox controller, which is platform-dependent, and we also have a mailbox client, which actually submits the messages to send and receive. So how do we use the mailbox?
The same as with remoteproc: we have the generic framework, the mailbox framework, and our mailbox controller, the i.MX mailbox. For this we have to add the compatible in the DSP node of the device tree in Linux; based on it, we load our driver. We also need to define the mboxes and mbox-names properties. What do these two do? The mboxes property selects the messaging unit we are using; here we have Messaging Unit revision 2, and in mbox-names we have TX, RX, and RX with doorbell. And what does the i.MX mailbox driver do? It is in charge of handling and configuring the interrupts that come from the hardware, from the Messaging Unit peripheral. As for the mailbox names, the differences between those three channels, TX, RX, and RX with doorbell, come from the registers we use to configure the interrupts: transmit registers, receive registers, or the general-purpose interrupt registers of the messaging unit. We also have a mailbox on the Zephyr side. We added a mailbox node in the DTS, which loads the IPM i.MX driver; this is in charge of handling the interrupts on the Zephyr side. We also have a driver in the NXP HAL, which just does the messaging unit initialization and setup. And this mailbox node on the Zephyr side is used in our DTS overlay for inter-processor communication: as you can see here, we have the Zephyr IPC, which uses the mailbox node, and we enable the mailbox by setting the status to okay. Last is the OpenAMP framework for inter-processor communication. This actually does what all of the above do: it encapsulates the remoteproc and RPMsg frameworks. It's based on these open-source frameworks, so it provides runtime libraries and tooling, and it sets up the communication between the cores. OpenAMP can run on Linux, on real-time OSes, and also on bare metal.
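Going back to the mailbox wiring for a moment, the Linux side might look roughly like this. Hedged sketch: the `mu2` label and the cell encoding (first cell: channel type, with 0 for TX, 1 for RX, 3 for RX doorbell; second cell: channel index) follow my reading of the i.MX Messaging Unit mailbox binding and may differ on your SoC.

```dts
/* Sketch of the mailbox wiring on the Linux side -- label and cell
 * values are assumptions, check the fsl,mu binding for your SoC. */
&dsp {
    mboxes = <&mu2 0 0>,  /* transmit register based      */
             <&mu2 1 0>,  /* receive register based       */
             <&mu2 3 0>;  /* general-purpose IRQ doorbell */
    mbox-names = "tx", "rx", "rxdb";
};
```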
OpenAMP uses libmetal to access the shared memory. But in our use case we use OpenAMP only on the Zephyr side, on the HiFi 4 DSP, because on Linux we have the generic frameworks already up and running: remoteproc, RPMsg and virtio, and virtio communicates with the virtio from OpenAMP in Zephyr. In order to enable this, actually, we didn't have to do anything: we just used the OpenAMP sample, and this enables OpenAMP. It's just a configuration option; you set it to yes, and that's all. But in order for this application to work on our specific target, you have to add the DTS overlay, where you have to mention the inter-processor-communication shared memory and the mailbox node for IPC. Having all this, we now also have communication between the two cores. As a conclusion, these are the steps for Linux and Zephyr communication. On the Linux side, U-Boot starts the kernel. Next, remoteproc loads the firmware onto the DSP and starts the coprocessor. Zephyr boots on the HiFi 4. On the Linux side, the RPMsg driver creates endpoints, and when it does, it sends a notification to Zephyr. Zephyr also creates endpoints and sends a name service announcement to the Linux side. Linux handles this specific message and creates the link between the two cores, and then messages can come and go. But what's up with this name service announcement? It's a feature, enabled by default, mostly used for demos; it's easily enabled using the configuration shown here. What it does is create the channels, the RPMsg devices, dynamically. On the remote side we create a remote service which has a name and a local address. These two are sent to Linux through a special structure, the rpmsg_ns_msg structure. Linux knows how to handle it: it maps this name and address to an endpoint created in the step before, and it sets up the link between the two cores.
So we have the endpoints, and now, as mentioned a bit earlier, we can send messages from one endpoint to the other. Now, the challenges I faced. These are challenges, but more like things to be aware of. There is vast documentation for remoteproc, RPMsg and OpenAMP, but each separately; I couldn't find, let's say, a step-by-step guide on how to enable all of these together. So the documentation could be improved; I will work on that as well. For the Linux remoteproc part, it's very important, at least for i.MX: we had a four-byte write restriction, because our applications are written into IRAM, and IRAM only accepts four-byte writes. So we had to implement our own memcpy and memset functions. Also, for the inter-processor communication, we had to implement find_loaded_rsc_table; this is mandatory. I will explain why the resource table is so important: it's the structure where the rings are defined, and remoteproc on Linux is in charge of allocating the memory for those buffers. So if we don't implement this callback, the resource table won't be found, the buffers won't be allocated, and the communication between the two cores won't work. On the Zephyr part, for the inter-processor communication, be aware that the shared memory should be large enough not just for the rings but also for their buffers. Also, the messaging unit must be correctly initialized and all interrupts enabled; for i.MX this is quite easy, because we just use some configs, and based on those we enable all interrupts or only some of them. And for OpenAMP, here, at least for the HiFi 4 DSP, we have to do a data cache invalidation when reading the status from the resource table; otherwise the status is not updated, and no message gets through to Zephyr. As future work, we plan on enabling, and maybe creating, other new samples in Zephyr that will use the DSP API. But first I have to upstream this OpenAMP support.
The upstreaming is a work in progress, so it will be done soon. We want to benchmark the DSP; we want to have some DSP results, because we want to see how good the DSP is and why we should use it. And maybe use the generic loader for other use cases, like Sound Open Firmware. We are also open to suggestions. So thank you; that's all from me.

Thank you for the presentation, it's very nice. You said before that you need to define the device tree for the core. In Zephyr, is it the same? Do you have a specific device tree for the core?

We have a device tree specific to our board, for this DSP part. And in that device tree for the DSP, you add the mailbox, the UART, or other peripherals that the DSP will use. So in Linux we have a device tree which defines the whole board, with the A cores, with the M cores, and also with our DSP. And in Zephyr we have a device tree only for the DSP, and there we enable the peripherals we want to use on the DSP. Thank you. You're welcome.

I have a question regarding the reusability of this code you made. Is it also planned to port it to an i.MX RT500 or RT600?

Well, actually, because we want to use, and we demonstrated that we can use, generic frameworks for loading and starting the secondary core, we can try it with an i.MX RT. So yes, it's in plan; future plans will also include an i.MX RT.

On the i.MX RT we run Zephyr, so can we use OpenAMP on both cores? We currently use only RPMsg-Lite, so we cannot start or stop the DSP now.

Well, OpenAMP also has remoteproc; OpenAMP has RPMsg and also remoteproc. So we can try to use OpenAMP on the M core as well, which would start the DSP. I think this is possible; I haven't tried it, but I will try to see if it really works.
But theoretically, having OpenAMP with, as I said, both life-cycle management of the coprocessor and RPMsg, that should be possible. Thanks.

You mentioned multiple times the importance of sizing the shared memory in Zephyr. What's the process you go through to determine that sizing?

Well, the sizes and everything are defined in the linker script for the DSP. We know how much SRAM we have for the HiFi 4 core, and we size the shared memory to fit in that SRAM. It's also based on the sizes we give for the vrings in the device tree in Linux. As mentioned, in the Linux device tree you have the vrings, their base address and their size, so you use that size for the buffers.

Hi, thank you for the presentation. My question is, how did you handle the board, the virtual board, in the Zephyr tree? Did you create, like, a separate board?

We don't have a virtual board; it's just the board for the HiFi 4. I think you mean the NXP ADSP i.MX8M board. That was added as a separate board, but it's in charge only of the DSP. To be honest, I added this port, I think, two years back, and I used the getting-started guide from the Zephyr documentation on how to add a board; I don't remember exactly what the documentation is called. I went through that documentation step by step, and this is how I added the board. For the DSP it was kind of easy, because you just add the core, a Tensilica core, LX6-compatible, and I added the shared SRAM and IRAM we're using for the DSP, and that was it. Thank you for the presentation, very nice.
I'm a bit more familiar with the i.MX 8MM, the smaller friend, and it has a Cortex-M4 as a companion core, and we had some trouble with the peripheral partitioning: you have to assign, from U-Boot, one peripheral to either the bigger core or the smaller one.

Well, the partitioning, I think, is also done in U-Boot, I'm not very sure; but for the peripherals you can do it, at least for the DSP, and I think it's also available for the M core, from the device tree. As I showed for the UART, I have UART4 enabled for the DSP, because this is the one used; we have four UARTs, and UART4 is used for the HiFi 4. So you can do this from the DTS.

Any final questions? Thank you, Juliana. Thank you.