Good to go. Very good. Welcome to my talk about asymmetric/heterogeneous multiprocessing, AMP/HMP: mainline Linux and Zephyr in unison. My name is Marcel Ziswiler. I joined Toradex in 2011 and spearheaded their embedded Linux adoption. I introduced an upstream-first policy and at times was a top-10 U-Boot and Linux kernel ARM SoC contributor. We have an industrial embedded Linux platform called Torizon, which is fully based on mainline technology. It uses mainline U-Boot with distro boot, KMS/DRM graphics with Etnaviv and Nouveau, OTA updates with OSTree, and Docker, respectively Podman, for the application container interface.

What will we cover today? We have a look at the evolution of the microcontroller, and then we look at the integration of such into the Linux ecosystem. We have a quick overview of open-source real-time operating systems, RTOSes, and then I'm going to look at the AMP/HMP life cycle, so basically how one can actually launch code on such systems. Then we're going to look at mainline Linux and Zephyr and how they work together with remoteproc and RPMsg. We also have a look at the communication libraries, and at the end, if there is enough time, I will try a live demo.

So the microcontroller. It started with the Texas Instruments TMS 1000, a 4-bit one; that was 1971. Later came the Intel 8048, an 8-bit one, in 1977. Then the Intel 8086, the first 16-bit one, in 1978. Then the Motorola HC05, which is quite a common one. That one actually had a serial bootloader, which allowed EEPROM use for program storage, which was kind of a unique concept. And later, for example, the Microchip PICs in 1993; they also allowed rapid prototyping with in-system programming.

How about the ecosystem around such microcontroller use cases in Linux? Of course, one thing is the whole interfacing.
So there are I2C, memory-mapped I/O, SPI; all these are supported in kernel subsystems, which can be used to communicate with another side, which of course can also be such a microcontroller. There is also the generic register map support, regmap, which helps in such use cases.

Then another thing you usually want is a way to actually download the firmware to such microcontrollers and run it. For that, there is an in-kernel firmware API. It basically got introduced to update the microcode of the main CPUs for all the errata that are there, but it also allows updating device driver firmware. For example, if such a device has an integrated microcontroller, it is often also used to update some more driver data, for example calibration data, stuff like that.

How does it work? For the firmware requests, you can have synchronous ones. So the driver calls request_firmware(), and when that completes, it copies that firmware to the device and then releases it again. Or you can have an asynchronous use case where it won't block on that. Another thing is a special optimization for reboot: if you reboot, it could be that your device is still running and you don't need to reload the firmware. For that, there is the firmware cache. And then there is the upload API, which has persistent sysfs nodes that enable you to kick off such an update from there. That, for example, is also used to program FPGAs. You find more information in Documentation/driver-api/firmware/introduction.rst.

Then another thing that is available is the so-called multi-function device, the MFD framework. If you have such heterogeneous hardware blocks that offer more than one functionality, you can use that to expose them to multiple specific device drivers.
It is also used for external bus interfacing, for example, or for memory-mapped registers, if you have a register set where different bits of the same register might be used by different drivers. Oftentimes, this is the so-called syscon use case.

And then, of course, we also have AMP/HMP integrated. Nowadays, a lot of systems on chip have not only multiple cores in an SMP configuration, but really multiple core complexes in such heterogeneous configurations. And there, we're going to look at remoteproc and RPMsg later.

Then I want to give a quick overview of the real-time OS landscape. I'm not actually going to go through all of those, but just quickly: there is eCos, for example. FreeRTOS is a well-known contender. We have a new one from ARM, the Mbed OS. Then an old player is µC/OS-III. NuttX is often also used. Another one is RIOT. And an interesting one is RT-Thread. It's actually a contender from China, so they wanted to also get more into this real-time space. Another one is RTEMS. And then, of course, Zephyr.

Zephyr originated from the Virtuoso RTOS, which was used for DSPs. Wind River acquired that with the Belgian software company Eonic in 2001. They renamed it Rocket, open-sourced it and made it royalty-free. That was in November 2015. It can be used for much smaller memory needs compared to VxWorks, so it's really suitable for this kind of sensor or single-function embedded device. In 2016, it then got hosted as a collaborative project at the Linux Foundation and got renamed to Zephyr. Early members and supporters of Zephyr included Intel, NXP Semiconductors, Synopsys, Linaro, Texas Instruments, DeviceTone, Nordic Semi, Oticon and Bose. It's based on a small monolithic kernel, and it has a very flexible configuration, which is actually done at build time.
It also includes a set of protocol stacks: IPv4, IPv6, the Constrained Application Protocol, and various other protocols. It has a virtual file system interface, and device management and firmware update mechanisms. And it uses a Kconfig and device tree similar to Linux. However, it's implemented here in Python for portability reasons. The build system is actually based on CMake. And interestingly, it has the largest number of contributors and commits compared to any of the other RTOSes as of January 2022. So it really has the most active user base. And it is under the Apache 2.0 license.

Now let's have a look at the life cycle. So how do we actually get code onto such a core in such a heterogeneous environment? How do we get that running? One approach would be to have it started from a boot container by the boot ROM. Unfortunately, as you can see, I dug that up: none of these, the i.MX 7 or the 8M Mini or Plus, support that. The boot ROM only starts the A-cores, and there is no other way. You cannot strap that differently or whatever. So only your bootloader could later load code for the other core.

However, there are others, like the i.MX 8QuadMax or the 8X, that do support that. They support it through the system control unit. The system control unit is made of a Cortex-M4 processor, so even that has its own M4 core. However, it runs proprietary firmware. The application processor and the Cortex-M4 have no direct access to any of these hardware mechanisms; they are abstracted in that SCU firmware. And as you might know, that SCU firmware is highly proprietary, closed-source stuff. So not really too nice. Anyway, other SoC vendors might have better support for such use cases.

Then, like I said, if we cannot do it from the boot ROM, we can do it once it's in our hands. We can, for example, run U-Boot on the A-core, and there is the so-called bootaux command. That supports running firmware on the M4 or M7 cores.
The option for that is called CONFIG_IMX_BOOTAUX. It's available for the i.MX 6, that is the SoloX variant, the 7, and the 8M, including the Mini and Plus. And also the Vybrid, which actually can also have an M4; it's a little bit of an exotic one. The implementation of the SoC specifics can be found there, and the actual command is in this imx_bootaux.c file. There is also an optional memory reservation mechanism via device tree. And it supports M4, respectively M7, firmware as raw binaries.

I have the commands here, for example on the i.MX 7. One can load such a .bin file, and once it's loaded, one has to manually copy it to the address it is linked to and make sure to flush the cache. Then, with bootaux, that core can be started. Of course, much more convenient is the second option, directly using an ELF file, which already contains in the ELF header the address the binary has been linked to. So one only has to load it and run bootaux on that load address, and it will do all the other magic automatically by parsing the ELF header, copying it to the right place and all that stuff.

Then another way to do it is from Linux, using the remote processor framework, remoteproc. That allows different platforms or architectures to control those remote processors, so power on, load firmware, power off, while abstracting any of the hardware differences. That avoids any code duplication, so it's not that TI needs to have code for that and then NXP needs to have code for that. It really generalizes all this stuff. Another thing it does: it can also add rpmsg virtio devices for remote processors that support or need this kind of communication. In the user API, the call to boot is rproc_boot(). To power off, it's rproc_shutdown().
Note here that rproc_shutdown() does not decrement the rproc refcount, only the power refcount, meaning that if you have a handle to that rproc, you can keep using it for a subsequent firmware load and reboot or whatever. And this is the function to actually get such a handle from the device tree: rproc_get_by_phandle().

Then the implementer's API: you can allocate a new remote processor, free it, add it to the remoteproc framework, or undo that add again, and there is also a call to report when it has crashed. So you can have some kind of mechanism, almost like a watchdog, that checks whether it's still alive, something like that. These are the implementation callbacks: start, stop, and kick. The kick is to actually interrupt the remote processor to let it know that it has pending messages in one of these particular virtqueues. As binary firmware structure, it supports ELF32 and ELF64 firmware binaries. The Linux kernel configuration for that is called CONFIG_REMOTEPROC and, in this particular case, CONFIG_IMX_REMOTEPROC.

The device tree looks like this. You have a node with the compatible fsl,imx7d-cm4. You have the clock for that M4, and there is also a special flag available, which is called fsl,auto-boot. That makes the kernel automatically load the firmware and boot the M4 during kernel boot, if you haven't already done that, for example, in U-Boot. You can reserve some memory regions for it, and you have to give it the syscon handle, because that is where you find all the registers that are needed to actually start the M4. On the right side, you see how you can add those reserved memories, for example the tightly coupled memory or the SRAM.

Then one can also do it hands-on using the sysfs interface. If we have time at the end, I can show you that live as well. So you can basically check on the state.
You can stop it, load new firmware, start it again, all this kind of stuff. You can do that for debugging kinds of use cases via sysfs.

Then let's have a look at mainline Linux and Zephyr working in unison. They can communicate with RPMsg. RPMsg is basically the transport layer. Underneath, it uses virtio and virtqueues as the MAC layer, and on the physical layer, it uses regular shared memory as well as inter-core interrupts, usually a mailbox or messaging unit; you need something like that. And then there is, of course, also remoteproc, which we already covered.

On the RPMsg side, one thing you have to be aware of is that there are also security implications. Oftentimes those M4 cores have more or less full open access to your whole memory, or at least to parts of it, so be aware of that. However nicely you lock down your Linux, it could be that you run firmware on the M4 which can just access all your memory.

Then as for RPMsg itself: an rpmsg device is basically a communication channel. It is identified by a name, a local source address and a remote destination address. When you're listening on a channel, that means that you have an RX callback bound to it, and that usually has a unique local address. The rpmsg core then dispatches incoming messages according to that destination address. And like I said, it is implemented using virtio with mailbox synchronization, so you usually have a transmit, a receive and then an RXDB, a so-called doorbell, basically the interrupt with which one core can interrupt the other and vice versa. And it uses shared vring buffers for the actual transport.

Then on the Linux side, how does that look? You have the device tree binding, so in this case it's this fsl,mu, the mailbox unit that is used for the inter-processor interrupts. And then you have regular virtio-mmio, so the memory-mapped one. And you can have this virtio device.
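The addressing scheme described above (a channel with a local source address and a remote destination address, and incoming messages dispatched by destination address to the bound RX callback) can be illustrated with a small model. This is only a sketch in Python, not the kernel's rpmsg implementation: the names RpmsgCoreSketch, listen and dispatch and the addresses used with them are made up for illustration, and the real transport over vrings and doorbell interrupts is left out entirely.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Message:
    """One message as the receiving side sees it (hypothetical model)."""
    src: int        # address of the sending endpoint
    dst: int        # address of the endpoint it is meant for
    payload: bytes


class RpmsgCoreSketch:
    """Toy dispatcher: maps local addresses to RX callbacks.

    Illustrative only; it models the dispatch-by-destination idea,
    not the vring/mailbox transport underneath.
    """

    def __init__(self) -> None:
        self._endpoints: Dict[int, Callable[[Message], None]] = {}

    def listen(self, local_addr: int, rx_callback: Callable[[Message], None]) -> None:
        """Bind an RX callback to a unique local address."""
        if local_addr in self._endpoints:
            raise ValueError(f"address {local_addr:#x} already in use")
        self._endpoints[local_addr] = rx_callback

    def dispatch(self, msg: Message) -> bool:
        """Deliver msg to the callback bound to its destination address.

        Returns False when nobody listens on that address, in which
        case this sketch simply drops the message.
        """
        callback = self._endpoints.get(msg.dst)
        if callback is None:
            return False
        callback(msg)
        return True
```

A listener would register with something like core.listen(0x400, handler), after which only messages whose dst is 0x400 reach that handler.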
Then there is documentation available in Documentation/staging, and the kernel config is CONFIG_IMX_MBOX for this mailbox, which is implemented in drivers/mailbox/imx-mailbox.c. For virtio, it's CONFIG_VIRTIO and CONFIG_VIRTIO_MMIO. And the rpmsg side is implemented in this virtio_rpmsg_bus.

One thing: don't forget to also disable in Linux the peripherals used by Zephyr on the M4 or M7 side. So, for example, if you need any GPIOs or UARTs or whatever other peripherals, you have to make sure that there is no contention over them. If you have contention, the behavior is kind of undefined: Linux might suddenly turn on something that the M4 core already uses, or vice versa, or change the clocks, or any kind of bad stuff can happen.

Of course, some of the newer SoCs have better mechanisms, a kind of resource control that helps you with that. For example, in the i.MX 8M case, where you could otherwise more or less unintentionally have such interference, there would be such a resource control. That's a bit of a detail, though. Unfortunately, there is a little problem with that resource control: it won't survive sleep states. I think that's almost a kind of SoC bug in the i.MX 8M case. It basically means you cannot configure that nicely unless you can live with not using any of the sleep modes. But that is usually exactly what you want: you might want to put the A-core side to sleep and have the M4 do some processing, and exactly that then won't work. That's why most systems nowadays just configure full access to all peripherals for all cores at the U-Boot level. And then, like I said, you have to be very careful that you don't try to use some peripherals from both sides.

The device tree is basically the same node as before with remoteproc, but you can now also add mbox-names and mboxes.
And in the memory regions, you have to add the rpmsg vrings. It's not very well documented, but if you look at the actual rpmsg bus implementation, somewhere in the code it uses the first such address as the vring address. So it does some special handling based on which one it is; it's just that this is not loudly mentioned anywhere. Some such details you've got to find out yourself. And then, of course, there is the reservation of these vrings and also the messaging unit, the mailbox, for which you have to make sure to set the status to enabled; by default, it's disabled.

And then on the Zephyr side, how does it look there? That's where OpenAMP comes into play. Open Asymmetric Multi-Processing, OpenAMP, is the framework that provides software components to enable such AMP applications. It's a Linaro community project, and it allows operating systems to interact with a broad range of such complex heterogeneous architectures. It helps you with the life cycle management that we talked about as well as with the inter-processor communication. It has a standalone library that is usable with certain RTOSes as well as in bare-metal configurations. And the whole point of this is, of course, that it's compatible with the upstream Linux remoteproc and rpmsg components. So while on the Linux side you use exactly those, remoteproc and rpmsg, on the other side, on the M4 or M7 core, you can use OpenAMP, which implements a compatible peer for that.

As supported configurations, it can have a Linux host and a generic bare-metal remote, or it can have a generic bare-metal host and a Linux remote. So both cases are possible. It also has a proxy infrastructure that allows the host to handle calls like printf coming from a bare-metal based remote context. This is the source structure.
So you see there under lib you also have virtio, rpmsg, remoteproc and, as I mentioned, this proxy use case implementation. And in Zephyr you actually find that under modules/lib/open-amp.

Then there are also other communication libraries. There is RPMsg-Lite. It's a lightweight implementation that NXP Semiconductors has done. It's meant mainly for really much smaller cores, like M0+ based systems, that cannot live with the full OpenAMP implementation. And it is released under a BSD-compatible license. Then another thing available is eRPC, the embedded RPC. That's a kind of standardization approach for talking to the other side. It can use different transports, among which you can also use RPMsg-Lite.

Very good. Now we can actually try to see some of that stuff in action. I have two boards here. Actually, I'll limit myself right now to just the first one. This is an Aster board which I have here. That USB dongle you see is hooked up to the other UART, because the board has a built-in FTDI for the regular Linux console, but of course we want to talk to two cores. So I have another one so that I can see the messages of the other core.

Now let's have a look here. Let me make this a bit bigger. I have some terminals here. Let me spawn a new one. So I have these two FTDI things. Let me open those, this one and the other one. So now we power that thing on. Sometimes that happens. Now we have Linux booting, though I actually wanted to stop in U-Boot first. Let's see. That would take too long. So let me reboot that one again.

So here we are in U-Boot, mainline U-Boot. And you can see on the eMMC I have some Zephyr files. So I can, for example, load this Zephyr blinky thing. Now I can go back to the slides. We had that there. Here. So I can do something like that: load mmc 0:1 ${loadaddr} and this Zephyr blinky. And then I can run bootaux ${loadaddr}.
And if all goes to plan, one can now see the LED blinking. So that is how you can load it from the bootloader. And now we can run bootcmd, so we boot into Linux. And if one has made sure, like I said, that the LED is not somehow taken over by Linux, it just keeps blinking, driven by the M4 core, while the A-core now boots and runs Linux.

Once we're booted, which is now the case, we can go to sysfs, and there we have remoteproc. I'm too lazy to type that out. There you go. So you have remoteproc0. Of course, if you have a system with multiple of those, you could have more than one. Now we can look at the state. It says the state is attached. So Linux is aware that there is a remote processor and that it is currently attached. If we go back to the slides, that is this part. So here we have attached, and I can, for example, echo stop to the state: echo stop > state. Now when I do that, you can see it will stop blinking. There you go. No more blinking. We can check what the state is now: offline. So it went offline.

And the firmware needs to be called rproc-imx-rproc-fw, or you can also change the name. But I usually just copy it there. You have to make sure to mount the eMMC, mmcblk-whatever, p1. Then I can copy that firmware, for example the blinky thing, to /lib/firmware under that rproc name. And if you now echo start again, it will load that firmware via the firmware loading infrastructure, and it should be up again. So it starts blinking again.

And by the way, I totally forgot about that one: the whole point of the second UART here is that you have the messages coming from the M4 core. That is actually its debug UART. Very good. And then, of course, you can also look at the state again, and it's now no longer attached.
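The stop, replace firmware, start sequence from the demo can also be scripted. The following is a minimal sketch in Python, assuming the usual remoteproc sysfs layout with a state and a firmware attribute under /sys/class/remoteproc/remoteproc0; the helper names read_state, write_attr and reload_firmware are made up for this example, and the firmware file is assumed to be already present under /lib/firmware.

```python
from pathlib import Path
from typing import Optional

# Assumed default instance; a system may expose remoteproc1, remoteproc2, ...
SYSFS_DEFAULT = Path("/sys/class/remoteproc/remoteproc0")


def read_state(rproc: Path = SYSFS_DEFAULT) -> str:
    """Return the current state string, e.g. attached, running or offline."""
    return (rproc / "state").read_text().strip()


def write_attr(rproc: Path, name: str, value: str) -> None:
    """Write a value to one attribute file, like `echo value > name`."""
    (rproc / name).write_text(value)


def reload_firmware(rproc: Path = SYSFS_DEFAULT,
                    firmware_name: Optional[str] = None) -> str:
    """Stop the remote core if it is up, optionally pick another
    firmware file name (relative to the firmware search path), and
    start it again. Returns the state read back afterwards.
    """
    if read_state(rproc) in ("running", "attached"):
        write_attr(rproc, "state", "stop")
    if firmware_name is not None:
        # Only needed when the file is not stored under the default name.
        write_attr(rproc, "firmware", firmware_name)
    write_attr(rproc, "state", "start")
    return read_state(rproc)
```

Taking the directory as a parameter keeps the helpers usable against any remoteproc instance, and lets them be tried out against a scratch directory instead of a live system. On real hardware this needs root privileges.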
So attached is basically when somebody else started it, and running is when remoteproc itself started it. Then it knows. Okay. That's basically it.

Then another thing I can quickly show is on the Zephyr side. I don't know if you are familiar with Zephyr. Like I said before, when you're in this Zephyr project thing, it has its own folder, again called zephyr, and the OpenAMP stuff is basically below that. You can find that here in open-amp. And then you also find some samples. There is a samples folder here. And this particular one is, what is it called, basic, I think they call it: samples/basic/blinky, for example. And for the blinky, I can show you that here. It basically just needs to have the GPIO and stuff like that defined. So it needs an LED, which goes to a GPIO. In this particular case, I changed it to the one that is implemented on that board, which goes to this LED, of course. That's why I had to change that here. And that's pretty much it.

Okay. Any questions? Yeah. I don't know, do you guys have a microphone? Okay. So his question is whether you can do that with vanilla or whether you need any vendor kernel. What I'm running here is really mainline Linux stuff. So I run latest mainline Zephyr and I run latest mainline, whatever, Linux stuff. See, it's 5.19-rc2-next, whatever. Okay. Any other questions, anybody? Very good. I guess it's getting late, and I'm the last one standing between you and the closing game. Okay. Thank you very much. Have a good one. Thank you.