Hi everyone. My name is Stefano Stabellini. I work for AMD and I'm one of the Xen maintainers. Today I'll talk to you about static partitioning, mixed criticality, and Xen.

Why is static partitioning useful? In most common scenarios, at least in embedded, the software stack is made up of multiple components, only some of them critical to the functionality of the device. The largest amount of code and the highest number of components are typically not for the core functionality of the device; they are for the UI, for cloud connectivity, and for other things that are important but not critical. So putting all of these components in a single environment is like putting all your eggs in one basket: if something goes wrong, everything goes wrong, and a failure in your non-critical software could end up causing a critical failure in the whole device, where even the critical function stops working.

Using Xen and static partitioning allows you to have multiple baskets. You can have your critical components separated out in a VM, a domain, and your non-critical components, one or more, in multiple other domains. By doing that, you get isolation from interference: fault isolation and real-time isolation. If something goes wrong in your non-critical environment, your critical functionality keeps working correctly; the critical function is unaffected. If your critical function has real-time requirements, you still get your real-time guarantees, no matter how big the software is or how busy the non-critical environment is. It also allows you to deploy a software stack where only the critical component is actually privileged. You can run a large amount of software and drivers as non-privileged, which is great not just from a security point of view but also from a safety point of view, especially when safety certifications are part of the design.

Static partitioning basically looks like this: you take the SoC you're working with and you split it up. You carve out two or three subsets, like in this example, and you directly assign a subset of the hardware to each of these domains.

So what are the Xen features that allow static partitioning to work? The first one is dom0less. Dom0less is a feature called that because it allows Xen to start domains without dom0. Dom0 used to be required; now it's optional. Dom0less enables a static domain configuration to be given to Xen before boot, and Xen starts all the domains you want at boot time, without dom0 having to do anything. Dom0 can still be there, optionally, for monitoring or for rebooting individual VMs, for instance. Other features that are critical for static partitioning are real-time support, meaning low and deterministic interrupt latency, and cache coloring, which gives isolation from cache interference. VM-to-VM communication is often important, to be able to exchange data between your domains. Safety certifiability is also very critical in many environments where safety is part of the equation, and we have a number of ongoing activities related to MISRA C and other items important for safety, but I'm not going to talk about those in this presentation.
So let's go through an example configuration of Xen in a static partitioning scenario with a step-by-step guide, and let's start with a simple first step, which is to configure the environment with three domains. The reference board I'm using is the Xilinx ZCU102. I'm using an emulated QEMU environment, but you could use the physical board or target any other board. The configuration is Xen plus three domains: a Linux dom0, which is not required, as I said, but which I'm keeping around because it is useful at least initially for debugging (in production you could certainly remove it if it's not essential); a Linux RT environment with a minimal BusyBox initrd; and Zephyr. So three VMs, and then we choose the memory allocation and CPU allocation for all of these domains. Like I said, I kept dom0 around, but generally one of the questions when choosing your domains is whether you want to have a dom0 or not. Device assignment is going to be critical, and it is going to be configured in a second step. So it's going to look a bit like this: U-Boot loads everything, and then Xen starts all the domains in parallel.

The first step is to look at ImageBuilder. ImageBuilder is really the reference point for dom0less configurations: it is a set of scripts that you can use to automatically generate your boot scripts from a simple text-based configuration file. The one you see on the screen here is an ImageBuilder configuration file. You use it as a parameter to the uboot-script-gen script that you see here, and ImageBuilder generates for you the U-Boot boot script that loads everything and starts Xen.

So the parameters: MEMORY_START and MEMORY_END define the memory region to be used by ImageBuilder for loading the binaries and the boot script itself. DEVICE_TREE points to the host device tree binary. XEN points to the Xen binary. DOM0_KERNEL and DOM0_RAMDISK specify the dom0 kernel and the dom0 ramdisk. DOM0_MEM specifies the memory for dom0, in megabytes. NUM_DOMUS specifies how many other domains you want to start. DOMU_KERNEL[0] and DOMU_RAMDISK[0] specify the kernel and ramdisk for the first dom0less domU, which is Linux RT with a tiny BusyBox initrd. DOMU_MEM[0] gives the memory for that first domU, the Linux RT environment; I gave it two gigabytes. DOMU_VCPUS[0] selects the number of vCPUs for the Linux RT environment, and I chose two; otherwise the default is one. And then there is the Zephyr domain, which is just a kernel; 128 megabytes is more than enough for Zephyr to run.

So ImageBuilder generates the boot script for you, but you can configure and edit this boot script further in case you want to change command line parameters or tweak some options. Anything you like, you can edit, and then from there, with the command on the screen, a simple mkimage call, you can generate the boot.scr again.
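To give you an idea, the configuration file for this setup looks roughly like the sketch below. The file names are placeholders and the memory region is illustrative; double-check the option names against the ImageBuilder documentation:

```
MEMORY_START="0x0"
MEMORY_END="0x80000000"

DEVICE_TREE="mpsoc.dtb"
XEN="xen"
DOM0_KERNEL="Image-dom0"
DOM0_RAMDISK="dom0-ramdisk.cpio"
DOM0_MEM=1024

NUM_DOMUS=2
DOMU_KERNEL[0]="Image-rt"              # Linux RT
DOMU_RAMDISK[0]="initrd-busybox.cpio"
DOMU_MEM[0]=2048
DOMU_VCPUS[0]=2
DOMU_KERNEL[1]="zephyr.bin"            # Zephyr, kernel only
DOMU_MEM[1]=128

UBOOT_SOURCE="boot.source"
UBOOT_SCRIPT="boot.scr"

# Then generate the boot script with something along these lines
# (see the ImageBuilder docs for the exact invocation):
#   bash ./scripts/uboot-script-gen -c config -d ./ -t tftp
```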
We talked about ImageBuilder, so now we need to provide a few binaries. Let's start with Xen. Compiling Xen is simple, and it compiles the same way as the Linux kernel: you just enter the xen directory under the Xen repository and you type make. You set the cross-compile environment, you set the target architecture if needed, then optionally menuconfig, otherwise it's just going to use the default config, and then make. Menuconfig is interesting: usually you would just use the default, but menuconfig is needed if you want to enable cache coloring, which we are going to look at later. Cache coloring can be enabled via a menuconfig option. Cache coloring is not fully upstream yet, but it is publicly available at this URL, so feel free to take a look at it and use it, on Xilinx boards of course, but also on non-Xilinx boards.

What about Linux? For Linux you can just pull in any vanilla release, like 5.17, which is what I used for the demo. Enable CONFIG_XEN. CONFIG_XEN is not required, but it's nice to have, because it gives you more options for the console, and it also gives you PV drivers, an additional way to communicate between VMs and to share devices. It's also not adding a lot of code, so it doesn't increase the size of the Linux binary by much; I think it's good value. The other one that's good to have is the AMBA PL011 driver, the driver for the PL011 UART, to have a console. That is because when you run Linux as dom0 or as a VM, you get an emulated PL011 UART, so it's good to be able to print on that console. CONFIG_BRIDGE is not required, but it's useful in dom0 if you're going to use PV network, because the bridged configuration of PV network is the easiest to set up. I just used vanilla Linux plus two PV driver patches that are going upstream; they are already reviewed and accepted, but not committed to Linux master yet, and I'll show you later how to use them. As for the rootfs: if you don't have a dom0, or you don't care about using the Xen tools in dom0 to start or stop VMs, then you can use any Linux rootfs that you like, the default Yocto image, Ubuntu, anything. But if you want the Xen tools, to be able to create new VMs at runtime for instance, then you need to pull them in. In that case, at least for Yocto, the instructions on the screen enable Xen in the build and get you a rootfs with the Xen tools.

What about Zephyr? Zephyr thankfully has Xen support out of the box. There is a xenvm board that targets a generic Xen VM, so you just use the xenvm machine for the Zephyr build and it's going to work. In fact, that's all you need to do if you want to run Zephyr as a regular Xen domain, started from dom0. There are a couple of additional steps needed if you want to run Zephyr as a dom0less domU, started directly at boot. It's not very complicated: after setting up the Zephyr build environment, as described at this link here, you need to change a couple of options in the device tree for the xenvm board. You need to rename the xen_hvc console to xen_consoleio_hvc. The reason you need to do that is to switch from the console that is usually available to domUs, regular domains, to the console that is instead used by dom0 and dom0less guests. And then just make sure to run menuconfig before triggering the build, and in menuconfig enable the two Zephyr console options for dom0. These two options are needed, again, to use the console that's typically used by dom0 but can also be used by dom0less guests. That's it: trigger the build and just use it. Make sure Xen has been built with debug enabled for this to work properly.
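As a rough recap of the build steps just described, a sketch of the commands follows. The toolchain prefix and the sample application are illustrative, not prescriptive:

```
# Xen: run from the xen/ subdirectory of the Xen repository
export CROSS_COMPILE=aarch64-linux-gnu-
export XEN_TARGET_ARCH=arm64
make menuconfig     # optional; needed e.g. to enable cache coloring
make -j$(nproc)

# Linux: a vanilla release with CONFIG_XEN, the PL011 driver and
# CONFIG_BRIDGE enabled, built the usual way
make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- Image -j$(nproc)

# Zephyr: build for the generic xenvm virtual machine "board"
west build -b xenvm samples/hello_world
```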
All right. Before we go to the next step, I want to show you a short demo. I'm going to run QEMU here, and I just rebuilt the boot script using ImageBuilder, and here the system is booting already. What's going on here, and I'm going to scroll up while this is booting: the binaries have been loaded by TFTP at boot, and then Xen has been started. Xen is starting dom0 and the dom0less domains in parallel. Zephyr is very quick to start because, well, it's very small, so it's already printing "Hello World" all the way up here. Then the two Linuxes are starting in parallel; well, there is a limited level of parallelism here with QEMU, but you see the mixed messages of the two Linux environments on the same console. Of course, if you have multiple serial ports, I would definitely recommend assigning different serials to different domains so that you don't get mixed messages on the screen, but this also works. So here, this is dom0, and I can see that there are a couple of domains running, and by pressing Ctrl-A three times I switch between domains. The first one is Zephyr; I cannot interact with Zephyr. Ctrl-A three times again and now I am in the other Linux environment, and I can do things, and I can check that there is no network. I'm going to add the network in a second stage. Another thing to look at here is the generated boot script. This is the source, which, if you want, you can customize any way you like and regenerate the boot.scr that is actually used to boot the system. And this is the directory with the binaries and the original configuration file for ImageBuilder, the same one I described before.

Right, so going back: the next step is to add physical devices to each domain. This is really useful in embedded, to be able to drive physical devices directly from your Linux or Zephyr environment. In the example, I'm going to assign the Ethernet device to the Linux RT VM and a timer, a TTC timer, to the Zephyr environment. Device assignment is for sure the most difficult part of the configuration. Device assignment works as seen on the screen: we need to provide a partial device tree, which is just a small DTB, for each domain, describing the devices that we want to assign. The purpose of this partial DTB is both to describe the device for the domain and to tell Xen what to remap for the domain. In this example on the screen you can see there's a passthrough node; everything under the passthrough node is copied into the guest device tree. In particular, there is this ethernet node, which is the one corresponding to the Ethernet device. You can basically start by copying the node from the host device tree into this partial device tree and then make changes, make tweaks. What are the tweaks that you need to make? There are three involved here. First of all, you need to add the Xen-specific configuration. xen,reg tells Xen what to remap in terms of MMIO regions. It is usually one-to-one, like in this case, where source and destination addresses are the same, but the region could be mapped at a different address in the guest. xen,path points to the corresponding node in the host device tree. This allows you, if you wish, to have a completely different device tree description for the guest compared to the host. That is not typically done, but that's what it's for; normally you just use xen,path to point to the host device tree description of the same node.
Finally, you need to make one last change. Xen is also going to remap the interrupts, reading the interrupts property, and the interrupt-parent needs to be changed to point to a magic number that stands for the virtual interrupt controller that gets created for the guest by Xen. So it is no longer pointing to the real host interrupt controller; because this is for the guest, it points to the guest interrupt controller.

That's it. It is not easy, especially because often these nodes have dependencies on external things such as pinctrl, power domains, and clocks. Typically you just remove the pinctrl descriptions and links, and remove the power domain references, but the clocks are required, so what you would normally do is keep the clocks property and import any clocks that are needed for this device. The reason is that otherwise Linux drivers typically wouldn't work, because they try to probe and change clocks at boot. After you have these partial DTBs, you can just add them to the ImageBuilder config file, like this, and they get loaded automatically. There is only one final thing that you need to do, which is to mark the devices for assignment on the host: you just need to add the xen,passthrough property under the host device tree nodes corresponding to these devices, so that Xen knows that they need to be assigned.

We know that this is complicated, a bit too complicated, and we are doing two things to make it easier. The first thing we are doing: at Xilinx we have created a repository with a bunch of example partial DTBs for device assignment, for a bunch of devices. You can look at them, use them as a base, and they can even be picked up automatically by ImageBuilder if you just write the path in the ImageBuilder config with the domU passthrough paths option. However, keep in mind this is not smart: it is not generating anything, it's just matching the examples in the repository by name and finding the right one, that's all. The automatic generation, we'll talk about that in a second. One thing to keep in mind when looking at these examples is that sometimes names change in the device tree. For instance, the parent of all of these devices is usually called amba in older device trees, and in newer device trees it's called axi; make sure that the name is the same everywhere, otherwise there is a mismatch and it's not going to work.

This is easier with examples pre-made for you, but first of all it requires somebody to write these examples by hand, and secondly it's still too difficult. So what we are doing is aiming at automatically generating all of these partial DTBs, through another project called Lopper. We are working on an extension to device tree to describe multiple environments, multiple heterogeneous CPU clusters, and multiple VMs; it's called system device tree, and together with the specification comes an open source tool called Lopper. Lopper is a device tree manipulation tool: it can take a device tree as input and generate multiple device trees as output. So Lopper can be used to generate these partial DTBs.
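Putting the manual approach together, a partial DTB for the Ethernet device might look roughly like the sketch below. The addresses, interrupt numbers, clock, and host path are illustrative and must be copied from your actual host device tree:

```
/dts-v1/;

/ {
        #address-cells = <0x2>;
        #size-cells = <0x1>;

        passthrough {
                compatible = "simple-bus";
                ranges;
                #address-cells = <0x2>;
                #size-cells = <0x1>;

                /* the real device needs its full set of clocks,
                 * imported from the host device tree (one shown here) */
                misc_clk: misc_clk {
                        compatible = "fixed-clock";
                        #clock-cells = <0x0>;
                        clock-frequency = <125000000>;
                };

                ethernet@ff0e0000 {
                        compatible = "cdns,zynqmp-gem";
                        reg = <0x0 0xff0e0000 0x1000>;
                        interrupts = <0x0 0x3f 0x4>, <0x0 0x3f 0x4>;
                        /* 0xfde8 is the magic phandle of the virtual
                         * interrupt controller Xen creates for the guest */
                        interrupt-parent = <0xfde8>;
                        clocks = <&misc_clk>;
                        /* host address, size, guest address (1:1 here) */
                        xen,reg = <0x0 0xff0e0000 0x1000 0x0 0xff0e0000>;
                        /* corresponding node in the host device tree;
                         * "amba" vs "axi" depends on your tree's vintage */
                        xen,path = "/amba/ethernet@ff0e0000";
                };
        };
};
```

In the ImageBuilder config the compiled DTB would then be referenced per domU, with an option along the lines of DOMU_PASSTHROUGH_DTB[0]="passthrough-ethernet.dtb"; check the ImageBuilder documentation for the exact name.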
There is a talk at the Embedded Linux Conference, this conference, discussing how to use Lopper to automatically generate these partial DTBs. It's still very cutting edge, but it's possible, and that's going to simplify this problem greatly.

Once we assign the devices to the domains, we need to use these devices in the guests. What you see on the screen are the changes needed to add the TTC timer to Zephyr. This has nothing to do with Xen; these are just the generic instructions to add a new device, a new peripheral, to an existing board. Here I have added the TTC timer to the device tree description of the xenvm board, and then added the memory mapping for the TTC timer for the same board in mmu_regions.c.
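Since the screen contents aren't reproduced here, the mmu_regions.c side might look roughly like this sketch. The base address below is the ZynqMP TTC0 and is only an example, and the include path and flags follow the pattern of the existing entries in that file in your Zephyr version:

```
/* soc/arm64/xenvm/mmu_regions.c: add a flat mapping for the TTC
 * alongside the existing GIC and UART entries */
static const struct arm_mmu_region mmu_regions[] = {
	/* ... existing GIC/UART entries ... */
	MMU_REGION_FLAT_ENTRY("TTC0",
			      0xff110000, 0x1000,
			      MT_DEVICE_nGnRnE | MT_P_RW_U_RW | MT_NS),
};
```

The matching device tree change is a TTC node in the xenvm board DTS, copied from the host device tree just like in the partial DTB.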
OK, before we move to the next step, I'm going to show you a demo. I'm going to regenerate the boot script using ImageBuilder again, stop the previous run, and start QEMU again. QEMU starts and fetches the new binaries over TFTP, and it's going to start all the guests in parallel. Now let me scroll up again while it finishes booting. This is Xen booting, right, and then loading the domUs. Zephyr booted very quickly, like last time. There is no "Hello World" because I changed the application: now I'm using the TTC timer to print something every time there is a timer interrupt, every few seconds. And the two Linux environments are still starting in parallel, as you see here. I'm going to scroll to the bottom, and as you see, there are these messages printed from Zephyr, printing a timestamp every few seconds. I'm going to switch, using Ctrl-A three times, to the domU that is Linux, and now I'm going to run ifconfig, and we have an Ethernet card: an eth0 interface, which is the one that has been directly assigned. All right, and I can give you a quick look at the directory. What I did here: I added the two partial DTB files, and I'm going to open the Ethernet one. As you can see, there is a clock, after all, in addition to the ethernet node; the xen,reg, the xen,path, and the interrupt-parent pointing to the magic value. And in the host device tree, the xen,passthrough property has been added under the ethernet node.

OK, then let's move on to cache coloring. Many of you will want to run at least one application with real-time properties; interrupt latency is very important and should be deterministic. In many of the new SoCs there is an L2 cache shared across the whole cluster. What that means is that activity of one VM on one core can cause cache line evictions that affect the performance and the latency of another VM. This is not good, because it's going to make memory accesses non-deterministic, and your interrupt latency might go up. The solution is to use cache coloring to make sure each VM has its own dedicated set of cache lines: each VM only accesses its own cache lines, and another VM cannot cause cache line evictions that affect a different guest. Cache coloring is a smart technique that understands the relationship between cache lines and physical addresses and allocates memory accordingly, so that each VM always ends up hitting only its own cache lines. The trick is allocating one page of memory every 16, as you see on the screen, but you don't need to worry about this. The only thing you need to care about is color allocation. A color is this small unit of cache that you allocate to guests. On the ZCU102 there are 16 colors, and each color corresponds to 256 megabytes of RAM. Depending on your memory allocation to guests, you want to give them different amounts of colors. Here we reserve one color for Xen, that is color zero. We reserve five colors for dom0, given that dom0 has one gigabyte of memory; you can always provide more colors, which is okay, but do not provide fewer colors than the amount of memory requires, otherwise there's a problem. One color goes to Zephyr, given that it is only using 128 megabytes anyway; it would do with even less. Everything else goes to Linux RT, which is the largest VM. The way to do it in the config file is by adding the domU colors option with the color configuration for your domU, your regular domain; for Xen and dom0 you just need to specify the full Xen command line by hand, and I'm going to show you how to do it in a second.

So let's switch to the console. The configuration is here: we have again the domU colors, this one for Linux RT, this one for Zephyr, and here we specify the full Xen command line, with the dom0 memory, the Xen colors, the dom0 colors, and the way size. The way size is just for QEMU; on your physical board it is not needed, it is automatically detected. In fact, if you want to know how many colors there are and how big each of them is, you can just start Xen with a minimal coloring configuration and Xen is going to print it out for you. As you can see here, it tells you the way size, the maximum number of colors available, the Xen colors that you configured, the dom0 colors that you configured, your Zephyr colors, and the Linux RT colors. That is enough to configure coloring successfully, and device assignment still works, as you see on the screen. OK, I'm going to stop the demo.
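In config file terms, the coloring setup I just described might look roughly like this. The option and parameter names follow the ImageBuilder and cache coloring branch documentation of the time, so double-check them against the tree you are using; the way_size value is an assumption here and, as I said, only needed on QEMU:

```
# 16 colors on the ZCU102, 256MB of RAM each
DOMU_COLORS[0]="6-14"   # Linux RT: 9 colors cover its 2GB
DOMU_COLORS[1]="15"     # Zephyr: one color is plenty for 128MB

# Xen and dom0 colors go on the Xen command line by hand
XEN_CMD="console=dtuart dom0_mem=1024M xen_colors=0-0 dom0_colors=1-5 way_size=65536"
```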
Now let me tell you about the last part of the configuration, which is shared memory and event channels. For communication, the easiest thing to do is to configure a shared memory area between two VMs. You can then exchange data on your shared memory with a ring buffer, or you can use a library like OpenAMP; and for notifications you can use event channels, which are like software interrupts provided by Xen. The configuration is: you pick a shared memory address, like the one you see on the screen. Keep in mind that the address is important for your color allocation, because each address is going to have a color; the color is determined by the address, so pick your address carefully. The one on the screen means color one. Then you just share the memory: you map it in both VMs, and you can use it for communication. For notifications, you just need to bind event channels. How to do that? There is a BSD-licensed header file that you can include in any of your projects, with the definitions for the hypercalls. To use event channels, just include it, bind an event channel, and send event notifications between VMs the way you wish. In reality, Linux already has full support for event channels, and Zephyr already has partial support for event channels, so most of it is already done; but if you are working on a different project, you might need xen.h. On Arm we are working on a definition of event channels and shared memory that can be done at the ImageBuilder level, at the device tree level, before the system has even started. That's going to make it even easier and better for you, but what I'm going to tell you now works already with the last Xen release.

So you carve out a memory region, removing it from the memory node, and you instead specify it as an SRAM region, as you see on the screen, with a xen,passthrough property to say that this region is going to be assigned to the guest. Then you add this SRAM region to the partial DTBs of both your guests. You also need to add xen,force-assign-without-iommu, because otherwise Xen will always try to configure the IOMMU for you, for safety. In this case this is not a DMA-mastering device, so the IOMMU is not needed and should not be configured for this. Finally, you just map the memory in Zephyr and Linux using the normal techniques to map memory, so there's nothing special there, and then you send each other event channels. That's it.

Now, this is the code in Linux to discover the SRAM region and then map the memory, allocate an event channel, and write the event channel number on the shared memory, for Zephyr to read, bind this event channel, and send notifications. I'm going to show you this demo straight away. I'm starting the system again; coloring is still enabled, booting Zephyr, Linux RT, and dom0 all in parallel. In this case Zephyr is receiving TTC timer interrupts, but instead of printing something directly, it is writing the timestamp on the shared memory. Linux is receiving the event channel notification from Zephyr, reading the timestamp, and printing it on the screen. So as you can see here, it is actually Linux RT that is printing the timestamp; Zephyr is silent. Again: Zephyr receives a physical interrupt, writes the timestamp on the shared memory, and sends the event channel notification to Linux, which prints it. In the demo code, Zephyr is basically waiting until the shared memory is ready, reading the event channel number from the shared memory, and then binding it; from that point onward it is just using the event channel hypercall to send notifications to the other guest. Linux instead is using the full Linux drivers to allocate an event channel and write the event channel number on the shared memory; and finally, for each event channel notification, it is just reading the timestamp and then printing it.
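To make the shared memory setup concrete, here is a sketch of the device tree changes; the address and size are illustrative (remember that the address determines the color), and the cell counts must match the tree each fragment lives in:

```
/* Host device tree: carve the region out of the memory node and
 * describe it as SRAM, marked for assignment */
sram@47000000 {
        compatible = "mmio-sram";
        reg = <0x0 0x47000000 0x0 0x1000000>;
        xen,passthrough;
};

/* Partial DTB of both guests, under the passthrough node */
sram@47000000 {
        compatible = "mmio-sram";
        reg = <0x0 0x47000000 0x1000000>;
        xen,reg = <0x0 0x47000000 0x1000000 0x0 0x47000000>;
        xen,path = "/sram@47000000";
        xen,force-assign-without-iommu;
};
```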
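And since the on-screen code is not reproduced in this transcript, here is a minimal sketch of the event channel side in a guest, using the definitions from Xen's public event_channel.h header. The HYPERVISOR_event_channel_op wrapper is assumed to come from your environment; Linux and Zephyr already provide one, bare-metal code needs a small hypercall stub:

```
#include <xen/event_channel.h>  /* Xen public header; the include path
                                 * depends on your project */

/* Hypercall wrapper provided by your environment. */
extern int HYPERVISOR_event_channel_op(int cmd, void *arg);

/* Bind the port the peer advertised, e.g. read from shared memory. */
static evtchn_port_t bind_peer_port(domid_t remote_dom,
				    evtchn_port_t remote_port)
{
	struct evtchn_bind_interdomain bind = {
		.remote_dom = remote_dom,
		.remote_port = remote_port,
	};

	HYPERVISOR_event_channel_op(EVTCHNOP_bind_interdomain, &bind);
	return bind.local_port;
}

/* Notify the other end of an already-bound event channel. */
static void notify_peer(evtchn_port_t port)
{
	struct evtchn_send send = { .port = port };

	HYPERVISOR_event_channel_op(EVTCHNOP_send, &send);
}
```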
There is one last thing that I want to show you, and this is PV drivers with dom0less. PV drivers are paravirtualized drivers that allow you to not just communicate, but even share a device, like sharing the network, across multiple VMs. In this run I did not assign the network card to Linux RT; I left the network card in dom0. I can still give Linux RT network access by using PV drivers. First I call init-dom0less, which is a binary in dom0, to initialize PV drivers; then I hot-plug a PV network interface into my Linux RT VM, and the network just comes up at that point. So I created a bridge, and then I used xl network-attach, which is a command (sorry for all the messages on the screen), to hot-plug a new PV network interface into Linux RT. I switch into Linux RT, and as you can see here, there is now another eth0 interface; this is the new interface just hot-plugged into my guest. I assign an IP address, and I can just ping dom0 from it; I could wget a page from Google or do anything else. So this is how it works. The good thing about this is that you can still start your critical function very early at boot: you don't need to wait for PV drivers to come up; PV drivers can come up later, when dom0 is ready, and then you know that this is how the communication is going to work. The steps are the ones you see on the screen: you call init-dom0less and then xl network-attach.

All right, I realize there was a lot to cover, but the slides are going to be made available and the presentation is recorded, so you can slow down and look at the details. I hope you enjoyed the presentation, and I'm here to answer questions on the chat.