Welcome to my talk about asymmetric and heterogeneous multiprocessing, AMP and HMP: mainline Linux and Zephyr in unison. Just a quick introduction to myself. I joined Toradex in 2011, where I spearheaded the embedded Linux adoption and introduced the upstream-first policy. At times I was a top 10 U-Boot as well as Linux kernel ARM SoC contributor. We have an industrial embedded Linux platform called Torizon, and it is fully based on mainline technology. So we are using mainline U-Boot with distro boot, KMS/DRM graphics with Etnaviv and Nouveau, over-the-air updates with OSTree, and Docker or Podman as the container platform for applications.
What am I talking about today? I will start with a quick evolution of the microcontroller, then I look at the integration of those into the Linux ecosystem, and I give a quick overview of some open-source real-time OSes that are available out there. Then we're going to dive into the AMP topic by looking at the life cycle: how can you actually launch code on such platforms? Then we're going to look at mainline Linux and Zephyr and how they can work in unison. We look at the remote processor framework, remoteproc, then the remote processor messaging, RPMsg. We also have a look at the communication libraries involved there, like OpenAMP, and at the end I will give a quick real-life demo. I brought some hardware as well, and we can see how one can actually run code.
So, the microcontroller. It basically started with the Texas Instruments TMS 1000, a 4-bit one, in 1971, commercially available in 1974. It combined read-only memory and read-write memory, all together with a processor, including the clock. Then of course Intel with the 8048, which was an 8-bit microcontroller that first shipped in 1977.
It used EPROM for development, and for that you usually had those quite expensive ceramic packages with a quartz window, through which you could erase it with ultraviolet light. In production you then usually had a one-time programmable one. Then of course the Intel 8086, the iAPX 86, was the first 16-bit microcontroller, first shipped in 1978. However, it required several additional ICs around it to really realize the full system. Another family is from Motorola, the HC05s. They had a serial bootloader and EPROM program storage, and they entered in the late 1980s. Another one is Microchip, the PICs. They used EEPROM, they entered in 1993, and for the first time they allowed rapid prototyping with in-system programming, so that you could really use some kind of probe to program it in the system. Also in 1993, Atmel introduced the first microcontroller that used actual flash memory technology. The 16-bit microcontrollers have had the dominant volumes since 2011. That shows that 8-bit microcontrollers are still around and were still dominant in the market before 2011.
Let's have a look at how such microcontrollers integrate into the Linux ecosystem. Of course you can have independent microcontrollers that interact with a Linux system. For interfacing, I2C is often used, for example, or it may be mapped to a memory-mapped I/O bus, or SPI, the serial peripheral interface. All of those are of course supported by in-kernel subsystems. Another subsystem or functionality that might help in integrating these is the generic register map support, regmap. You see here where it's documented and where you can find the source code in the Linux kernel. It allows you to access the registers of such microcontrollers more generically when you want to interact with them from a Linux system.
Another functionality that got integrated on the Linux side is firmware loading. There is an in-kernel firmware API.
This was actually first introduced to update microcode for CPU errata. You might remember the early Intel Pentium days, where they had some issues that required in-field updating of such microcode, and that's when this kind of functionality was born and introduced into Linux. Nowadays it can also be used to update device driver firmware, that is, firmware that needs to be loaded onto the microcontrollers in such devices. So you might have a network controller or a storage controller that has additional microcontrollers whose firmware you can update like that. Another thing it might be used for is to update device driver data, for example calibration data. A device might get calibrated in the factory, and that calibration data can then also be updated with this infrastructure. Of course it might also be stored in some EEPROM or things like that.
How does it work? You can have a firmware request. It can be synchronous: if the requested firmware is available, it is handed to the device's driver, and afterwards that firmware entry can be released. Or it also allows asynchronous requests. Another thing that exists is special optimizations for reboots. If you have, for example, a server with a storage card with such firmware and it just does a reboot, the firmware can be cached so it doesn't have to be reloaded entirely. There is also a firmware upload API. It has a persistent sysfs node, so the user can initiate a firmware update via sysfs at runtime. Another place where this can be used is FPGA programming, for example on the Intel Stratix 10 SoC. The documentation for this is available there.
Another subsystem is the multifunction miscellaneous devices, or multifunction device drivers, the MFD.
This is for when you have some kind of heterogeneous hardware block that contains more than just one kind of non-unique, varying hardware functionality, so you can basically share stuff. It can, for example, use an external bus to interface with the microcontroller, which can then be used by the different functions. One way this is also commonly used is for memory-mapped registers that contain miscellaneous system registers. Those are often known as system controllers, or syscons. You find them used on many SoCs. The documentation for MFD I noted here as well. And then, of course, there are also the AMP or HMP integrations, remoteproc and RPMsg, which we're now going to have a closer look at.
But first, let's have a quick look at the available real-time operating systems. I'm not going to go into detail here; you'll find this on my slides. There is eCos; FreeRTOS is quite popular. There is the new one from Arm, Mbed OS, and µC/OS-III, NuttX. RIOT is an interesting one. Then RT-Thread is more dominant in Asia, I think mostly in China; I'm not sure about Japan. And RTEMS and, of course, Zephyr, which I guess Kate already introduced. It's worth repeating what she said: it's really nowadays the most active open-source real-time OS project.
Now let's have a look at what the life cycle looks like. One possibility is that you boot such a core directly from the boot ROM in some kind of boot container. I had a look at some of the hardware platforms that we use in our company, and it turns out this is not supported on the i.MX 7Solo or the 7Dual. I copied this from the reference manuals. It's also not supported on the i.MX 8M Mini or the 8M Plus. So they do not support booting the M4 or M7 core immediately from the boot ROM. They basically only boot the A core first.
Of course, you can have some kind of small bootloader on the A core that immediately loads the other one. We will get to that later. However, booting from the boot ROM is supported on some special platforms that are also often used for safety-critical systems, where it really allows you to load stuff directly from the boot ROM. For example, on the i.MX 8QuadMax the system control unit is already running on an M4, and the boot ROM really starts executing on that one. However, the application processor and the Cortex-M4s have no direct access to the hardware mechanisms that are used in this system control unit, and the vendor envisions that you load their proprietary SCU firmware onto it. Who knows, maybe one day there will be a community project that rewrites such firmware and makes an open-source option available, but right now this is rather proprietary. And I have not looked at other SoC vendors that might have better options available in this area.
As I mentioned before, another possibility is that, even if you start booting on the A core, you can then load firmware for an M4 or M7 core from a bootloader such as U-Boot. In U-Boot this is done using the bootaux command, which supports booting auxiliary cores. There is CONFIG_IMX_BOOTAUX in our case, when you have an NXP i.MX SoC. It's available for the i.MX 6, where there is the SX, which has an additional heterogeneous core, for the i.MX 7, and for the 8M including the Mini and the Plus. And also for the Vybrid, which is kind of special because it came from the automotive side, but it's more or less a similar kind of architecture. Of course, the implementation of bootaux is SoC-specific: how exactly these cores are started and things like that. This code you might find at the path I've given here.
As for the actual bootaux command, you basically give it the address where you have loaded the code that you then want to execute on that core. It may do optional memory reservation. That is usually done via device tree, so the bootloader also passes this on to the Linux side, and there is a config available for that as well, where you can configure the base address and the size. It supports the M4 or the M7, respectively. Remember, the i.MX 8M Plus actually has an M7, while the other ones have an M4 or M4s.
You can load the firmware as raw binaries; the commands to do that I've given here. There you not only have to load it to a certain address, you also have to make sure to flush the cache before actually booting it with the bootaux command. It is much easier to use ELF files, because ELF files carry the addressing information in their headers. You can just load the file to some load address, it doesn't matter which, because the bootaux command will then parse the header, figure out where exactly it has to put things, and execute it from there.
Then a third way to do it, which is of course more interesting in terms of my talk, is to do it from the Linux side. Modern SoCs with such heterogeneous remote processors are often available in asymmetric configurations. The remoteproc framework allows different platforms and architectures to control those remote processors: to power them on, load firmware, and power them off. It also abstracts the hardware differences, so from a Linux user's point of view this is all transparent. It further adds RPMsg virtual devices, so after you have loaded and started your code on those additional cores, you can also communicate with them.
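To illustrate the point about ELF files carrying their own addressing information, here is a minimal Python sketch, not U-Boot's actual implementation, that reads the entry point and the loadable segments from a little-endian ELF32 image. The function names, the tiny demo image builder, and the addresses used are assumptions made for illustration only.

```python
import struct

PT_LOAD = 1  # program header type for loadable segments


def elf32_load_segments(image: bytes):
    """Return (entry, [(paddr, file_offset, file_size, mem_size), ...])."""
    if image[:4] != b"\x7fELF":
        raise ValueError("not an ELF image")
    if image[4] != 1 or image[5] != 1:
        raise ValueError("this sketch only handles little-endian ELF32")
    # ELF32 header fields: e_entry @24, e_phoff @28, e_phentsize @42, e_phnum @44
    e_entry, e_phoff = struct.unpack_from("<II", image, 24)
    e_phentsize, e_phnum = struct.unpack_from("<HH", image, 42)
    segments = []
    for i in range(e_phnum):
        off = e_phoff + i * e_phentsize
        p_type, p_offset, _p_vaddr, p_paddr, p_filesz, p_memsz = struct.unpack_from(
            "<6I", image, off
        )
        if p_type == PT_LOAD:
            segments.append((p_paddr, p_offset, p_filesz, p_memsz))
    return e_entry, segments


def make_demo_elf32(entry: int, paddr: int, payload: bytes) -> bytes:
    """Build a tiny one-segment ELF32 image (demo helper, hypothetical)."""
    ident = b"\x7fELF\x01\x01\x01" + bytes(9)  # ELF32, little-endian, version 1
    ehdr = struct.pack(
        "<16sHHIIIIIHHHHHH",
        ident, 2, 40, 1, entry, 52, 0, 0, 52, 32, 1, 0, 0, 0,
    )
    # One PT_LOAD header; the payload follows both headers at offset 52 + 32.
    phdr = struct.pack("<8I", PT_LOAD, 84, paddr, paddr, len(payload), len(payload), 5, 4)
    return ehdr + phdr + payload


if __name__ == "__main__":
    # Addresses are made up for the demo, not taken from any SoC manual.
    image = make_demo_elf32(entry=0x20000101, paddr=0x20000000, payload=b"\xde\xad\xbe\xef")
    entry, segments = elf32_load_segments(image)
    print(hex(entry), [(hex(p), o, f, m) for p, o, f, m in segments])
```

A loader that has this list no longer cares where the file itself was placed in memory: it copies each segment from its file offset to its physical address and jumps to the entry point.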
The user API has a call rproc_boot(), which allows you to boot a remote processor. Booting means it loads the firmware and powers the core on; usually that's about clocks, power domains, and stuff like that. It can also be powered off again: there is the rproc_shutdown() call, which turns a previously booted processor off. It does not decrement the rproc refcount, only the power refcount. That means the user can still use the handle in a subsequent rproc_boot() call. So you could boot a certain firmware, shut it down, boot another firmware, and shut it down again. You could even have use cases like that. Then there is another call to find such an rproc handle using a device tree phandle.
How does the implementer's API look, if you are an SoC vendor and want to implement that? It has rproc_alloc(), which allocates a new remote processor handle, and there is a corresponding free. There is also a register call, which makes the processor available within the remoteproc framework, and a call to undo this adding. There is also a way to do a crash report: if you have some mechanism in your SoC to detect a problem, you can report that. There is a call for that as well. Then of course there are the callbacks for start and stop, and there is an additional one for kick. This is mainly used when you have communication between the cores and want to notify the other side: it can interrupt the remote processor to let it know that there is something available in a virtqueue. And then there is also a binary firmware structure. On the Linux rproc side it usually uses ELF, in either the 32-bit or the 64-bit flavor depending on what kind of core you have. The kernel configuration in Linux for that is called CONFIG_REMOTEPROC or, more specifically for the i.MX case, CONFIG_IMX_REMOTEPROC.
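The boot and shutdown behavior described above, where shutdown only drops a power reference and the handle stays usable, can be pictured with a toy model. This is plain Python and not kernel code; the class `ToyRproc` and its attributes are hypothetical names chosen for the illustration.

```python
class ToyRproc:
    """Toy model of a remote processor handle with a power reference count."""

    def __init__(self, name: str, firmware: str):
        self.name = name
        self.firmware = firmware  # image to load on the next power-up
        self.loaded = None        # image currently running, if any
        self.power = 0            # number of users that booted the core
        self.state = "offline"

    def boot(self) -> None:
        self.power += 1
        if self.power > 1:
            return  # core already powered on for another user
        self.loaded = self.firmware  # "load firmware", then "power on"
        self.state = "running"

    def shutdown(self) -> None:
        if self.power == 0:
            raise RuntimeError("shutdown without matching boot")
        self.power -= 1
        if self.power == 0:
            self.state = "offline"  # last user gone: actually power off


if __name__ == "__main__":
    m4 = ToyRproc("m4", "first.elf")
    m4.boot()
    m4.shutdown()
    # The handle is still valid after shutdown, so another image can be booted.
    m4.firmware = "second.elf"
    m4.boot()
    print(m4.state, m4.loaded)
```

The model leaves out everything that makes the real framework useful (clocks, power domains, firmware parsing); it only shows why a handle can be reused across boot and shutdown cycles.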
You also need to add certain things in the device tree. In the i.MX 7 case, for example, you have this imx7d-cm4 node. There you give it the clock, for example; this is the root clock of your M4 core. There is also a special property called fsl,auto-boot. With that, your M4 core is booted via remoteproc immediately when you boot Linux. Then you can give it some memory regions, for example for your code and for the SRAM. Another thing, which I touched on before, is the syscon stuff: some of the registers that are relevant for controlling such cores are located in a memory area of the SoC that is used for miscellaneous stuff, and you have to give it this syscon handle. And then the reserved memory. On the right side there you see how you can do that. On i.MX 7, for example, these are some specific addresses that are reserved, meaning that they are mapped so that both the A and the M cores can access these memory areas.
And then this is, hands-on, how one can use the sysfs interface to do that. We will revisit this in the live demo at the end if I happen to have enough time, so we will be able to execute those commands live on the system. It allows you to check the state of the remoteproc. When you started it from U-Boot, for example, it has a state of attached. Then you can of course stop it and give it new firmware, which is as convenient as writing or copying an ELF file to some sysfs node. Then you can start that firmware again with an echo start, and if you started it like that, it will show a state of running.
Now let's have a look: once you have started such firmware, how can the Linux and the Zephyr side talk to each other? The mechanism for that is RPMsg. So remoteproc, which we just covered, is for the life cycle, how you execute your code, and RPMsg is for the communication.
It's basically a virtio-based messaging bus. It allows kernel drivers to communicate with such remote processors. One thing to note is that it might have some security implications. For example, the remote processor may have full or at least partial access to certain memory or peripherals in your system. That's something you have to keep in mind from a system design perspective. Then the RPMsg device: you have communication channels, which are identified by a name, a local source address, and a remote destination address. You can listen on channels, meaning there is a receive (RX) callback that is bound to a unique RPMsg local address. The RPMsg core then dispatches incoming messages according to their destination address. As mentioned, this is implemented using virtio, which has mailbox-style synchronization, usually with a transmit, a receive, and the so-called RXDB, which is the doorbell, basically how you notify the other side. And it usually uses shared so-called vring buffers.
In Linux, the device tree bindings are documented as I've given here. The first one is for the messaging unit. Then of course there is regular memory-mapped I/O, which is used for virtio, and there is also the virtio device. The documentation for RPMsg you find in the staging part of the documentation. The config for it is CONFIG_IMX_MBOX, with the source code you can find there, and then there are CONFIG_VIRTIO and CONFIG_VIRTIO_MMIO.
One thing to note is that, again depending on your SoC, you have to make sure that whatever peripheral you use from one core is disabled or not used from the other cores in your system, otherwise bad things might happen. So GPIOs or UARTs or things like that. Then in the device tree, this looks as follows.
That imx7d-cm4 node I introduced before can also have additional nodes and properties for RPMsg. For example, you give it the mbox-names: tx, rx, and rxdb. And you assign which mboxes are used for that, which are handles to the MU. In the memory regions you additionally have to add the RPMsg vring buffers, and this is how the memory for those can be allocated. Also, the MUs that are referenced earlier of course have to be enabled, so set their status to okay.
Then, of course, there is also the Zephyr side, and that's where OpenAMP comes into play, which I'm going to discuss now. OpenAMP, open asymmetric multiprocessing, is a framework providing software components to enable the development of AMP applications. It's a Linaro community project. It allows operating systems to interact within a broad range of complex heterogeneous architectures, and it allows asymmetric multiprocessing applications to leverage the parallelism offered by such multi-core configurations. It has integrated life cycle management and also inter-processor communication, IPC, which is basically what RPMsg is. It provides a standard library usable with RTOSes, or you may of course also use it bare metal: you could run your bare-metal code on the M4 core and also use this OpenAMP library. It's compatible with the upstream Linux remoteproc and RPMsg components. It supports the following AMP configurations: either a Linux host with a generic or bare-metal remote, or a generic bare-metal host with a Linux remote. The way that is done is through some proxy infrastructure. It also has some demos that showcase the ability to handle printf, scanf, open, close, read, and write calls from bare-metal-based remote contexts. So you could integrate that easily.
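The RPMsg addressing scheme mentioned earlier, where a receive callback is bound to a local address and incoming messages are dispatched by destination address, can be sketched as a toy model. This is plain Python, not the Linux RPMsg or OpenAMP API; `ToyRpmsgCore`, its methods, and the addresses are hypothetical.

```python
from typing import Callable, Dict

RxCallback = Callable[[int, bytes], None]  # called with (source address, payload)


class ToyRpmsgCore:
    """Toy dispatcher: one RX callback per unique local address."""

    def __init__(self) -> None:
        self._endpoints: Dict[int, RxCallback] = {}

    def create_endpoint(self, local_addr: int, rx_callback: RxCallback) -> None:
        if local_addr in self._endpoints:
            raise ValueError(f"local address {local_addr:#x} already bound")
        self._endpoints[local_addr] = rx_callback

    def dispatch(self, src: int, dst: int, payload: bytes) -> bool:
        """Deliver one incoming message; return False if nobody listens on dst."""
        callback = self._endpoints.get(dst)
        if callback is None:
            return False  # no endpoint bound to this destination: drop it
        callback(src, payload)
        return True


if __name__ == "__main__":
    core = ToyRpmsgCore()
    inbox = []
    # 0x400 and 0x1e are made-up addresses for the demo.
    core.create_endpoint(0x400, lambda src, data: inbox.append((src, data)))
    core.dispatch(src=0x1E, dst=0x400, payload=b"hello from the M4")
    print(inbox)
```

In the real stack the messages travel through shared vring buffers and the doorbell notifies the other side; the model only covers the final step of routing a message to the right callback.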
This is the source structure of the OpenAMP framework. In the Zephyr world, you find it in the Zephyr project under modules/lib/open-amp, so it's integrated in the Zephyr sources. There are also a few other communication libraries available, for example RPMsg-Lite or eRPC. I just quickly wanted to mention that there are other possibilities as well. Those are usually used in even more resource-constrained implementations, for example on a Cortex-M0 or with a very small code size, smaller than 5K.
Very good. Then we can have a look at the actual hardware. I have prepared a real-life demo. We have different hardware available; for today I only brought the one on the left side. This is an i.MX 7 system. The actual system on module is on the back side of this board. The nice thing about this board is that you can just connect it by USB. That powers it, and it has an FTDI on board, which immediately gives you the console. I have another FTDI here, which you also see connected to a header there, that gives you the Zephyr console.
Let's have a look at how that actually looks. Let me quickly first go back here. Remember when I said we can do this hands-on stuff? We can do this use case here one-to-one. I go back here and I have the console. On the left side, this is from the A core. It's now booted into U-Boot. And on the right side, oops. Let's see whether, maybe... that should be fine. Well, it probably just output some garbage from the M core. Let's hope it's nothing to be worried about. From the U-Boot side, for example, I can now list here. This is an MMC-based thing, so it has some files available here. For this demo I can, for example, use this Zephyr blinky ELF file. Let's have a look here. This is from the Linux side; we actually start further up. We can start with that one here, so from U-Boot with the bootaux command.
As I'm a little bit lazy, I'm going to use the ELF file one, which usually makes much more sense anyway. I guess I might have to go here to actually select it. So I can do that one here. However, I want to use the blinky one. So this actually just defines that there is now a boot. So I can run. Thank you very much. There you go. Then we can run that, and it should all go according to plan. It should start blinking. It indeed does; that's the LED here. We can also look at whether it output something here, and it indeed did. So this is Zephyr on the other UART. And we're still in the bootloader.
Now we can also run the A core side of things, for example distro boot, and it starts the Linux kernel side. It's still continuously blinking, so the M4 core is of course still executing. As I said, on the Linux device driver side I made sure I'm not touching any of its peripherals, so that core will just continue to run. But I set it all up so that remoteproc and RPMsg are all known now. We can now switch to this other slide that I had here and check what its state is. As I said, when it is started from U-Boot, it has a state of attached. Then we can, for example, stop it. Once stopped, it doesn't blink anymore. I stopped it while the LED was on, so it just stays on. We can check the state again, and it shows offline. That's how it is done.
Now we can copy some firmware. I have to make sure... somehow it doesn't automatically mount my stuff anymore. Don't know how that happened, but I just mount it again. This is the same eMMC stuff we saw before, and I can now just copy that. Let's see, from the mount, zephyr-blinky. I copy that to this /lib/firmware file. This is the default file that is used. Then we can start it up again with an echo start, and it should start blinking again.
And if it was started like that, we can check its state, and it has a state of running. That's the difference, basically. OK, that is basically it for my talk. You might have some questions. Any questions? Yeah. OK, thank you very much.
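As a recap of the sysfs steps from the demo (check the state, stop the core, provide firmware, start it again), here is a small Python sketch. It assumes the standard remoteproc sysfs attributes `state` and `firmware` under `/sys/class/remoteproc/`; the instance name `remoteproc0`, the class and function names, and the firmware file name are assumptions for illustration and depend on the actual system.

```python
from pathlib import Path


class RemoteprocSysfs:
    """Thin wrapper around one remoteproc instance directory in sysfs."""

    def __init__(self, instance: str = "remoteproc0", root: str = "/sys/class/remoteproc"):
        # "remoteproc0" is an assumption; the index depends on the system.
        self.path = Path(root) / instance

    def state(self) -> str:
        # Typical values include "offline", "running" and "attached".
        return (self.path / "state").read_text().strip()

    def set_firmware(self, name: str) -> None:
        # A file name that the kernel looks up in its firmware search path
        # (for example /lib/firmware), not the image contents themselves.
        (self.path / "firmware").write_text(name)

    def start(self) -> None:
        (self.path / "state").write_text("start")

    def stop(self) -> None:
        (self.path / "state").write_text("stop")


def reload_firmware(rproc: RemoteprocSysfs, name: str) -> None:
    """Stop the core if it is active, select a firmware name, start again."""
    if rproc.state() in ("running", "attached"):
        rproc.stop()
    rproc.set_firmware(name)
    rproc.start()


if __name__ == "__main__":
    m4 = RemoteprocSysfs()  # needs root and a kernel with remoteproc support
    print("before:", m4.state())
    reload_firmware(m4, "zephyr_blinky.elf")  # hypothetical file name
    print("after:", m4.state())
```

Passing a different `root` makes the helper usable against a directory of ordinary files, which is handy for trying it out on a machine without a remote processor.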