Hello, everyone. My name is Sylvio. I've been working at Espressif for a few years now, and today I would like to talk to you about how to enable multi-image support on Zephyr using the asymmetric multiprocessing approach.

The plan for today: first I'd like to talk about the Zephyr project and Espressif's maintainers, just a few people working on that, and about the ESP32 development status as well. Then I'll talk about multi-image support and AMP, how it works, and I prepared a demo that I would like to show you. I'll run through the code and then talk about the next steps when it comes to ESP32 support and asymmetric multiprocessing.

The first ESP32 commit on Zephyr was back in 2017, in release 1.9.0. From 2017 to 2020, ESP32 support covered only I2C, UART, and GPIO, just those peripherals. When you built an application on Zephyr, it ran only from RAM. It was not in flash; there was no execute in place. Then Espressif joined in, and we have been contributing to the project since 2021.

This is the first public talk I have given about this, so I'd like to show the team working on Zephyr right now. Lucas Tamborrino and Marek Matej, who is here, are developers on this project. Ricardo Tafas is the project manager; he works with us and also on internal Espressif projects. And then me, Sylvio Alves. Of course, the effort we have put into supporting ESP32 was not only our own, so I really need to thank the TSC members for all the support we had, all the discussions, public or private, and also the community. We have a lot of commits from the community that fix bugs and bring new features, so this is really helpful for the overall project.

When we started working on ESP32, we took a look at all the peripherals we had, and the first SoC that we started to support was the original ESP32.
We added support for most of its peripherals, including Wi-Fi and Bluetooth. We are still missing a few of them, like I2S and some hardware crypto functionality. After ESP32, we started working on ESP32-S2, a single-core version with Wi-Fi only. Then we started porting ESP32-C3 and, nowadays, ESP32-S3. So this is the current status of the development as of June 2023. We have also started porting ESP32-C6, the latest module. It is the first SoC from Espressif that supports Wi-Fi 6. We have started working on this, so soon there will be a PR on Zephyr.

Back to the motivation for this talk. Why multi-image and AMP? When we started porting ESP32 to Zephyr, the first thing we did was enable SMP, symmetric multiprocessing, because it should be easy, and it is fairly easy to implement. But when we started working with the dual core and SMP, we found ourselves stuck on issues related to kernel awareness, especially in the network and Bluetooth stacks. If we enabled both cores to work as SMP, we started having issues with Wi-Fi, the network stack, and the Bluetooth stack, so it wasn't working as we expected. This was more than a year ago, so it has probably been updated since. Anyway, back then we started looking at the AMP approach. AMP means having two different applications, one running on each core, which is a different approach from SMP. That is one of the reasons I'm talking about AMP and the reason we started supporting it.

So why multi-image and AMP? First, with AMP you have two different applications, so you can increase your product's performance, because you have more processing power to dedicate to a specific task or algorithm. For me, product performance is a really good motivation.
Another one is that you can use the CPUs for different scenarios: one CPU for low power and the other for some DSP algorithm and high-power usage. We can also put the network and Bluetooth stacks on one CPU just to handle all the networking, which takes a lot of processing, and have the application running on the second core. So it allows us to pin services that take up a lot of CPU time to a specific CPU. And of course we can also use a multiple-firmware architecture: on CPU zero, for instance, a Zephyr application running Wi-Fi and Bluetooth, and on the second core some bare-metal or FreeRTOS application. This can also improve real-time operation and critical applications. When it comes to critical applications, we have even started to think about some kind of safety-critical certification. We are still far away from that, but eventually it is on our roadmap as well.

Finally, some example uses of AMP support are motor field-oriented control or sensor-fusion algorithms, which take a lot of CPU processing, so having them on a dedicated core, detached from the network core, is a real advantage for the product. Also, if you're working with an HMI, human interface devices, image handling, or camera video, you can dedicate a CPU to all of this image processing. And if you're working with machine learning or AI algorithms, it's a good approach as well.

Well, how does AMP work on Zephyr, in the overall context? I don't want to dig too deep into this, because explaining how OpenAMP works takes a lot of time, so I put everything on a single slide. The idea is that when you have two CPUs running different applications, you need a framework that handles the communication between those two cores.
In Zephyr, we use the OpenAMP framework. OpenAMP stands for Open Asymmetric Multi-Processing. So in Zephyr, if you want this kind of communication between two different CPUs, you can enable the OpenAMP feature. Once you enable the OpenAMP framework, it brings in a message protocol, RPMsg, remote processor messaging. RPMsg is itself a kind of framework: it is what you use in the code, where the APIs are. RPMsg provides the APIs that you add to your code to receive and send data. RPMsg uses a MAC layer called VirtIO. VirtIO is the library it uses to allocate packets, using ring buffers and some synchronization methods. So that's like the MAC layer. VirtIO then uses IPC, which stands for inter-processor communication, as the physical layer. In Zephyr, the IPC is implemented as IPM, the inter-processor mailbox. In the case of ESP32, if you open the Zephyr RTOS code, you will find ipm_esp32.c. It's the same for all the other vendors: this is where we put our vendor-specific functionality.

Finally, every time a core sends a message to the other CPU, the IPC is responsible for notifying the other core that new data is available. That's where the cross-core interrupt occurs, and that's how we get those callbacks telling us that we got new messages from the other endpoint. The IPC is implemented over shared memory, with cross-core interrupt functionality. Basically, that's the whole idea behind OpenAMP and cross-core communication.

Regarding our implementation of AMP support on ESP32, we have some guidelines we are currently following. For instance, our IPM, or IPC, happens over shared memory. Our RAM is partitioned into three different regions. We have a dual-core CPU.
Core number zero uses one part of the RAM, CPU number one uses another part, and the IPM uses just a small part at the end, about 16 kilobytes of RAM.

Currently, although we are talking about a single ESP32 module that has two cores, to have this working we need to pretend that each core is a separate board or SoC. Although ESP32 is one SoC, in the Zephyr environment we have to tell Zephyr that we have two different SoCs in there, one for CPU number zero and one for CPU number one. The reason is that, the way Zephyr works right now, each CPU needs its own linker script to define the memory regions, the alignments, the flash region, and all the other data the linker script needs.

A second important point is that peripheral sharing is currently not possible. If you are using I2C0 on CPU number zero, you cannot use the same I2C0 on CPU number one. So you need to be aware of the architecture when implementing both applications. Finally, CPU number zero executes from flash, while the CPU number one application, which we call the remote application, still runs from RAM. We have also been working on MCUboot multi-image support, which means that MCUboot will be responsible for booting both applications running from flash, XIP, execute in place. This is ongoing development and will be submitted soon.

All right, so that was the explanation of the reasons for working with AMP. Now I have a setup here that I created to illustrate it. But first, let me tell you what we have here. For this demo, we have a NEMA 23 stepper motor, a driver for the motor, and an ESP32-S3-BOX-Lite device. The BOX-Lite has an ST7789 display over SPI and three buttons on the front. It also has a microphone and speakers for voice-control devices, but we are not focusing on that right now.
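
The three-region RAM split described above (one region per core plus a small shared IPM area at the end) can be sketched as plain address arithmetic. This is only an illustration under stated assumptions, not the actual ESP32 linker script: `ram_region`, `ram_layout`, `partition_ram`, and `regions_overlap` are hypothetical names, and the base address and sizes in the usage note are made-up values. Only the 16 KB size of the shared area comes from the talk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One contiguous RAM region: [base, base + size). */
struct ram_region {
    uint32_t base;
    uint32_t size;
};

/* Hypothetical layout: CPU0 image, CPU1 image, shared IPM area at the end. */
struct ram_layout {
    struct ram_region cpu0;
    struct ram_region cpu1;
    struct ram_region ipm;
};

/* True when the two half-open ranges share at least one byte. */
static bool regions_overlap(struct ram_region a, struct ram_region b)
{
    return a.base < b.base + b.size && b.base < a.base + a.size;
}

/*
 * Split `total` bytes starting at `base`: CPU0 gets `cpu0_size`, the last
 * `ipm_size` bytes are shared, and CPU1 gets whatever lies in between.
 */
static struct ram_layout partition_ram(uint32_t base, uint32_t total,
                                       uint32_t cpu0_size, uint32_t ipm_size)
{
    /* The two fixed-size regions must leave room for CPU1. */
    assert(cpu0_size + ipm_size < total);

    struct ram_layout l;
    l.cpu0 = (struct ram_region){ base, cpu0_size };
    l.cpu1 = (struct ram_region){ base + cpu0_size,
                                  total - cpu0_size - ipm_size };
    l.ipm  = (struct ram_region){ base + total - ipm_size, ipm_size };
    return l;
}
```

For example, with a made-up 384 KB block at `0x10000000`, `partition_ram(0x10000000, 0x60000, 0x30000, 16 * 1024)` gives CPU0 192 KB, CPU1 176 KB starting at `0x10030000`, and the shared area the last 16 KB starting at `0x1005C000`. In a real port these ranges live in each core's linker script, and a check like `regions_overlap` is the kind of invariant the two hard-coded scripts have to respect.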
ESP32-S3 is an SoC with a dual-core Xtensa CPU running at 240 megahertz. It has 16 megabytes of flash and 512 kilobytes of internal RAM. It also has 8 megabytes of external RAM, which we can use for display images, big arrays of data, and a lot of other things.

So this is the architecture we have here, and our focus is on demonstrating asymmetric multiprocessing. You can see the whole ESP32-S3 module, with CPU 0 on the left side and CPU number 1 on the right side. On CPU number 0 we enable Wi-Fi, which connects to the Wi-Fi of the event and uses an MQTT interface for publishing and subscribing. We also have a USB-C connector on the BOX-Lite board. This USB works as a CDC device for logging, and also for flashing and debugging. We also have the display over the SPI peripheral, which shows the temperature and the motor speed in RPM.

On the right side we have CPU core number 1, which we consider the remote application or, let's say, the critical application, which has a smaller footprint. It doesn't have networking enabled. It just controls the motor, measures the temperature, and handles the buttons. CPU core number 1 has a motor interface over the LEDC peripheral, which is a PWM peripheral. The motor speed is controlled with these three buttons over here. These three buttons are read through the ADC, so it's a resistor array. The temperature sensor, an SHT3xD, is connected over I2C. For logging the data from core number 1, we use the UART peripheral. And as I told you before, the two cores talk to each other over the RPMsg framework, which is part of the whole OpenAMP context, and they share the same memory area, using IPC signaling over the cross-core interrupt.

All right, so how did I build the project? First of all, I created an out-of-tree Zephyr project and added the CPU project sources.
This is a picture of it. Summage is the name of the project. We have the boards folder and all the source code, then the CMakeLists, Kconfig, and project configuration. Then I added the remote folder, which is another Zephyr project. So these are two different Zephyr projects, and the remote one also has its board overlay, source code, CMakeLists, and project configuration. I was asked before: okay, but how does Zephyr handle building both applications? The answer is that, currently, in the main CPU zero project, the master project, we have to add this information to the CMakeLists. When we build the CPU zero application, the CMakeLists tells it to build the remote application first, as a Zephyr application.

Now, the CPU zero project. It has a Zephyr overlay where I define all the chosen interfaces. I enable Wi-Fi, IPM, and the USB serial interface, and I add the display. For the display I've been using the LVGL library, and to add some content I also created a splash screen. This splash screen is the image of the OSS picture; you will see that later. It is a binary array with that image. And here is all the source code: the Wi-Fi services, the MQTT services and API, the main file, and the display API handling. For the CPU number one project, we have the same kind of overlay, but for CPU number one, where we define its chosen interfaces, like I2C, ADC, PWM, and UART, plus the source files: the main source file, button handling, and temperature sensor handling.

Okay, so now I'd like to show you a demo of how it works. I'll show the demo first and then some of the logging output. Let me try to find it. Okay. This is the stepper motor, and this is a very, very simple display. I'm not a UI designer, so I just added a gauge here, a label with the temperature sensor value, and a Zephyr Summit title. That's all.
This button decreases the speed, this button starts and stops the motor, and this button increases the speed. So if I start the motor, it runs. I can decrease the speed. And I can also see the actual RPM on the main screen. I'm not sure you can see it, but there is the RPM, and I can increase and decrease it. Oh, sorry, let me try to focus a little bit. I'm not sure you can see it, but there is a temperature sensor over here. No, there's no way I can make this work better than this. Okay. So if I put my finger on the temperature sensor, you can see that it's going to... Yeah, you can see that, right? Sorry, I'm not able to focus properly. So you can see the temperature sensor value over there.

So this is the setup. What happens here is that ESP32 is running both applications. One application measures the temperature and controls the motor, and then updates the other CPU with this information. CPU zero sends these messages over MQTT and also updates the display.

Now let me show you some real-time operation. I have the terminals here. This terminal shows the log messages from CPU number zero. This terminal on the right side shows the messages from CPU number one. For MQTT, I'm using the Mosquitto software from the command line. This one is receiving messages from test.mosquitto.org, and on the right side is where I'm going to publish some information. I'm going to reset the board so we can start all over again. So I just reset the board. You can see on this screen that we are connecting to the Wi-Fi. Yeah, it's connected. It got the IP address, connected over MQTT, and it's done. So core number zero is using Wi-Fi and is connected to MQTT, and below here it has started publishing those messages. If I start the motor, you will see here that the speed is changing. I can decrease it.
I can put my finger on the temperature sensor again to increase the value over here. So core number one is sending messages to core number zero, which publishes them. On the right side, I'm just logging the temperature that is being measured in real time. The last thing we could implement is sending some MQTT information to core number zero. If I send this message, hello, Zephyr, and press enter, core number zero receives it over MQTT and Wi-Fi, and we could use this information to control the motor or any kind of peripheral we want. Okay, that's all.

Let me show you a bit of the code. I'm not sure how many of you have been working with Zephyr, so I'll just show a little of the code here. As I told you before, this is the whole project. Here is the source code for CPU number zero, which implements the display, the MQTT services, and the Wi-Fi services. It's very simple code. In the main code, I just want to show you two things. The first one is the main function here. Is it too small, or can you read it? Okay. So once CPU zero starts, it waits for CPU number one to start. Once CPU one starts, it sends a remote message to CPU number zero saying that it's alive, and then the Zephyr application on core zero starts. What we do here is initialize Wi-Fi, initialize MQTT with a callback, and then just keep the connection up: if Wi-Fi is not connected, connect to Wi-Fi; once Wi-Fi is connected, connect to MQTT; and while Wi-Fi is still up, keep publishing messages every second. That's all. On the remote, on CPU number one, the main code is also very simple.
We just start the temperature sensor and the motor, create callbacks for the buttons, and keep looping every two seconds, measuring the temperature and sending this information over RPMsg to CPU core number zero.

Now, the most important thing here is how we enable OpenAMP and the RPMsg service. In the main code of CPU number zero, this is all the necessary code. First, we create this system initialization function that registers the endpoint, so it is going to be called during initialization. In it we call the RPMsg service to register the endpoint. This tells the whole OpenAMP service context that CPU number zero is alive, with the ID called demo in this case. The same code is added to CPU core number one, which also initializes its endpoint. Here I used the same name, because I'm not actually using this value, but it registers the same RPMsg endpoint, and these two steps are enough to perform the communication. Every time CPU number one sends a message to CPU number zero, this callback is hit, and I have all the information and data from core number one. So it's a protocol mechanism with a callback on each side of the application.

The last thing I would like to show you is the project configuration. In Zephyr, we enable most features using the prj.conf file. This is where we enable all the libraries needed to have this working. For Wi-Fi, we enable networking, the net packets, the size of the packets, and a bunch of other Wi-Fi-related options. But when it comes to OpenAMP support, we need to add those three lines, which say that we want the RPMsg service enabled and that core number zero is not the slave. A similar configuration is added to the core number one project: we add the RPMsg service, set it as the remote, and it's not the master.
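
The endpoint-plus-callback pattern described above can be modelled in plain C without any Zephyr dependency. The sketch below is not the RPMsg or OpenAMP API, and every name in it (`shared_ring`, `endpoint`, `endpoint_register`, `endpoint_send`, `link_poll`, `count_bytes_cb`) is hypothetical. A small array stands in for the shared-memory ring, and `link_poll` plays the role of the cross-core interrupt handler that hands each pending message to the receiving side's callback. It runs on a single thread, so the memory barriers and the real interrupt that two cores would need are only noted in comments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RING_SLOTS 4u   /* number of message slots in the shared ring */
#define MSG_MAX    32u  /* maximum payload bytes per slot */

/* Receive callback, invoked once per delivered message. */
typedef void (*ept_rx_cb)(const void *data, size_t len, void *user);

/* Stand-in for the shared-memory area between the two cores. */
struct shared_ring {
    uint8_t  data[RING_SLOTS][MSG_MAX];
    size_t   len[RING_SLOTS];
    uint32_t head;  /* advanced only by the sender */
    uint32_t tail;  /* advanced only by the receiver */
};

/* One receiving endpoint: a name plus the callback registered for it. */
struct endpoint {
    const char *name;
    ept_rx_cb   cb;
    void       *user;
};

/* Register the receive callback for an endpoint. */
static void endpoint_register(struct endpoint *ept, const char *name,
                              ept_rx_cb cb, void *user)
{
    assert(cb != NULL);
    ept->name = name;
    ept->cb = cb;
    ept->user = user;
}

/* Sender side: copy one message into the ring. Returns 0, or -1 on error. */
static int endpoint_send(struct shared_ring *ring, const void *data, size_t len)
{
    if (len > MSG_MAX || ring->head - ring->tail == RING_SLOTS) {
        return -1;  /* message too large, or ring full */
    }
    uint32_t slot = ring->head % RING_SLOTS;
    memcpy(ring->data[slot], data, len);
    ring->len[slot] = len;
    ring->head++;   /* real hardware: barrier, then raise the interrupt */
    return 0;
}

/* Receiver side: what the interrupt handler would do. Returns the count. */
static int link_poll(struct shared_ring *ring, const struct endpoint *ept)
{
    int delivered = 0;
    while (ring->tail != ring->head) {
        uint32_t slot = ring->tail % RING_SLOTS;
        ept->cb(ring->data[slot], ring->len[slot], ept->user);
        ring->tail++;
        delivered++;
    }
    return delivered;
}

/* Example callback: adds each message's length to the counter in `user`. */
static void count_bytes_cb(const void *data, size_t len, void *user)
{
    (void)data;
    *(size_t *)user += len;
}
```

In the demo's terms, CPU number one would call something like `endpoint_send` every two seconds with the temperature reading, and the callback registered on CPU number zero would forward the payload to the display and MQTT code. Keeping `head` writable only by the sender and `tail` only by the receiver is what lets this kind of ring work without a lock shared between the cores.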
So that's how we enable the whole RPMsg library to perform this communication. Finally, as I told you before, both projects have an overlay file. In the core number one overlay, to allow the cross-core interrupt to work, we have to select these chosen options in Zephyr. These tell Zephyr to use this shared memory area and this IPC service. That's one of the things we have to do to enable this kind of feature on both CPUs. The rest of the file is all the UART information we have to add, the pins we are using, the baud rate, the LEDC PWM, and all the peripherals needed for core number one. Well, that's it for the code, and that's it for the demonstration. Now let's return to the presentation.

So what are the next steps for ESP32 support on Zephyr in the AMP context? On the ESP32 side, we are aiming to add multi-image support using MCUboot, so that both applications can run from flash. We still have to handle the shared memory region and the linker scripts, because currently, in this demonstration, both linker files are hard-coded: we just set the RAM range for each one. That's not a good way to do it, but it works. Also, ESP32-S3 has a peripheral called the World Controller. This peripheral lets you manage peripheral access permissions, so you can do something like this: this is CPU number zero, and I2C access is allowed only for CPU number zero. It gives you a way to avoid crashes caused by both CPUs using the same peripheral.
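
Until a hardware mechanism such as the World Controller is in use, the one-owner-per-peripheral rule has to be kept by convention in the two applications. A minimal software sketch of that rule follows. It does not touch or model the World Controller registers; the `periph` enum, the `owner_table` assignment, and the function name `cpu_may_access` are hypothetical, with the ownership split loosely following the demo described in this talk.

```c
#include <assert.h>
#include <stdbool.h>

/* Peripherals used in the demo (illustrative subset). */
enum periph {
    PERIPH_SPI,
    PERIPH_USB,
    PERIPH_I2C,
    PERIPH_ADC,
    PERIPH_LEDC,
    PERIPH_UART,
    PERIPH_COUNT
};

/* Static ownership: each peripheral belongs to exactly one CPU (0 or 1). */
static const int owner_table[PERIPH_COUNT] = {
    [PERIPH_SPI]  = 0,  /* display */
    [PERIPH_USB]  = 0,  /* logging and flashing */
    [PERIPH_I2C]  = 1,  /* temperature sensor */
    [PERIPH_ADC]  = 1,  /* buttons */
    [PERIPH_LEDC] = 1,  /* motor PWM */
    [PERIPH_UART] = 1,  /* CPU1 logging */
};

/* True only when `cpu` is the registered owner of peripheral `p`. */
static bool cpu_may_access(int cpu, enum periph p)
{
    assert(p < PERIPH_COUNT);
    return owner_table[p] == cpu;
}
```

A table like this only documents and checks the split at the application level; it cannot stop a stray access from the other core, which is the gap a hardware permission mechanism is meant to close.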
We also have to evaluate how flash and cache access is going to happen during multi-core usage, because if both cores access the same flash area, it is probably going to crash, so we have to create some locking and syncing mechanisms to make that work. Finally, there is also an issue open on GitHub. I think Martí Bolívar was the one who opened it. It's called Better Support for Multi-core AMP SoCs on Zephyr. There he brings up three fronts: devicetree updates, hardware models for AMP SoCs, and the build system. I agree these are really important points that we have to address to make the whole AMP solution on Zephyr easier and more elegant, let's say.

And I think that was it. That's everything I had to show you. I hope you got the idea of working with AMP and multiple CPUs. Do you have questions? Yeah, go on. I can repeat it.

Yeah, so we are trying to move away from the RPMsg service. It's kind of legacy now. Why? Because we have a new subsystem in place in Zephyr now, the IPC subsystem. With the IPC subsystem, you are basically abstracting away any notion of the transport protocol. You can write your application in a very generic way and hook up whatever backend you want to do the real IPC. With the IPC subsystem, you can also have a lot of information encoded directly in the devicetree instead of using Kconfig. You can define the instances in the devicetree, and you can have the shared memory in the devicetree directly, without doing some magic in the code itself and in Kconfig. So maybe you want to look into that, because cache management may also already be embedded in the IPC service. A lot of these questions were already solved there, and maybe you are interested in that.
So, for the next generation of your demo, yeah.

Okay, thanks. I don't have much to add regarding this, but I think there is plenty of room for improvement overall. Any other questions?

For any multi-core SoC sharing memory and peripherals, there is a need to coordinate access to those shared resources. Do you have such mechanisms, and are there any constructs in Zephyr that can abstract them?

Sorry, I couldn't hear. Are you talking about accessing the same peripherals from both CPUs?

Or the same resource in a generic sense: shared memory, a peripheral, or a resource such as a register that may be of interest to both applications.

If we take a look at Zephyr itself, we just use mutexes for that, right?

Which, in my understanding, works if the concurrent threads are running on the same core. But if the two threads run on separate cores, how do you envision abstracting the synchronization between them through a mutex?

So you mean, let's say we have two threads, one running on each CPU, right?

Say your networking core is passing packets to be used by the application core through a ring buffer, and you need to move the pointers in a deterministic manner, so the process of pushing and getting data needs to be coordinated.

Yeah, as far as I know, this is part of the OpenAMP framework, especially the VirtIO interface, which has all those ring buffers where the messages are added, and you just get them from the other CPU. For example, it's possible to work with Nordic devices using the CPUNET core for the Bluetooth stack, and then all the HCI messages come from the CPUNET core to the app CPU. This all goes through these IPC and OpenAMP mechanisms. So as far as I know, everything related to synchronization and mutexes is embedded in the RPMsg and VirtIO core libraries. That's what I know at this moment.

Thank you.

Nice presentation. Thank you.
It was useful for me too. So just a question, maybe I missed it: how do you start the secondary core? You have a primary core that starts Zephyr, and it also starts the secondary core. How does it do that? I know OpenAMP also has remoteproc, which is what usually starts the secondary core.

Yeah. In the current implementation, CPU core number zero starts the second core during CPU initialization, in the soc.c file. Before Zephyr starts the RTOS itself, it just starts CPU number one. So right now everything is handled in the application on CPU core number zero. However, we are aware of remoteproc in the OpenAMP framework, so we should change to use the remoteproc calls. Then we will no longer use CPU zero to start CPU number one; that will be handled by the framework. But for this demo, we are still using the CPU number zero application to start CPU number one. We can go through the code after this presentation, if you want to.

Okay, thanks.

Hi. In your demo, you have an application on CPU zero running from flash, and CPU one is running from RAM. Why not both from flash?

Yeah, we have started working on this. We already have a working version that uses MCUboot. The problem right now is that executing from flash on CPU number one is not possible because we have been using the ESP-IDF bootloader in the Zephyr environment. We also support MCUboot, and having the CPU number one application run from flash is much easier when handled by the MCUboot bootloader. Before doing that, we have to add multi-image support to MCUboot, and then the MCUboot bootloader will take care of placing and running the code in the proper areas. So this is a work in progress. We have a working version that should be submitted soon as well.
Then we're going to have full AMP working as expected.

All right, folks. Thank you very much. Hope you got everything.