All right, hi everybody. I'm going to take it slow, just in case some people still want to come in.

I'm going to start by talking about what static partitioning is — something you are probably all familiar with. The idea is that on a modern SoC there is so much hardware that it is difficult to manage it all from a single system; you want to divide it up into different domains, into different partitions, and run a different system on each partition.

In this example, for instance, this is a Xilinx Zynq UltraScale+ MPSoC — a block diagram of it. As you can see there is a lot of stuff in it: four powerful Cortex-A53 cores, a bunch of programmable logic, a number of peripherals, a graphics processor, and much more. This one is from Xilinx, but if you look at the block diagrams of other SoC vendors, you'll see something comparable.

So the point is: what do we do with so much hardware, with so much stuff? Ideally, what we would like to do is draw circles — nice circles — around the blocks that we want to use, and use them with Linux, with a certain OS, with whatever you like, with a special setup to make use of those hardware resources; and then draw a different circle around a different set of resources and do something different there. In this almost random example, I drew a red circle around a couple of the A53 cores, some memory, and the PCIe controller, and I drew another circle around the other two A53 cores, some more memory, and a bunch of programmable logic underneath.

So again, the idea is: there is enough hardware that you can run everything you need independently and separately. You don't really need to cram everything together on a single system anymore. So why is it desirable to
statically partition the system that way? There are a number of reasons; I'm going to go through the top ones, or at least the ones I am aware of.

I think mixed criticality is really the chief concept to understand. There was a very nice talk earlier from BMW saying that each industry is actually quite different, and I find that very true. Still, there are some commonalities across industries, and one that is common to many is mixed criticality: there is often something very critical that you want to run, and something that is far less critical. Different industries can have very different examples of this. In avionics, for instance, the critical application might be the piloting software for your drone, and the non-critical application the control panel. In automotive, the critical application could be the software that takes as input the rear-view camera stream that you look at when you're backing out, and the non-critical one could be just the UI. In industrial, the critical component could be the software controlling the robotic arm, and the non-critical software could be the user interface — something connected to the internet, doing statistics and things like that.

In all these scenarios it is desirable for the critical and the non-critical software to coexist on a single board, because, as we have seen, at least this board — and I'm sure many others — definitely has enough hardware to run them both. But these are two completely different environments, developed differently. Even if they are the same OS — let's say they're both Linux, although typically the critical one is non-Linux — they would be different kernels, developed differently, with different update schedules and different providers. One might be an RT Linux kernel, the other one might not. Very, very different environments. With static partitioning you can run
both of them together, but independently, on different domains on your SoC.

That brings me to the next topic, which is safety. The critical application often — not always, but often — has safety requirements. In the industrial case, if the critical application fails to control the robotic arm properly, the arm might swing around and hit one of the workers. So there is often a safety concern related to people, and when that is the case, there are certifications involved — which is a very big topic, of course. In these cases it is definitely a great idea to keep the application with safety requirements separate from the one that is not safety-critical, because the last thing you want is the non-safety-critical environment compromising or affecting your safety environment in any way. And that goes across the board: it's not just security, but also isolation at the performance level — the non-critical application should not be able to affect the performance of the critical application in any way, in terms of CPU, but also in terms of IRQ latency and everything else.

Security: critical and non-critical applications typically have very different security update schedules; they might be developed on different OSes; they might just have different security policies.

Real-time: the critical application almost always has some sort of real-time requirements, some IRQ latency requirements — and even non-safety-critical apps often have IRQ latency requirements that need to be respected. By partitioning the system and splitting it up cleanly,
you can guarantee those real-time requirements even if one of the two systems has not really been developed against any real-time properties.

Fault isolation: in my experience, one of the first four points almost always applies. But even if none of them apply, it is still good software development policy to not put all your eggs in one basket — to separate the applications so that each one runs with the least amount of privilege needed to do its job, and so that if it crashes, the only thing compromised is the functionality of that application, not the whole system. With static partitioning you can develop a system that works exactly like that: each partition is separate, isolated; each application is only privileged enough to do its own thing — its partition allows it to do only one thing — and if it crashes, the crash is contained within the partition.

Multiple OSes: well, that's the simplest of them all. If you have more than one OS, you typically need to run them in separate partitions.

So how do we deploy, or develop, a static partitioning solution? The goal here — what I've tried to achieve — is to use Xen to set up a static partitioning environment. This looks good on paper, because Xen is a type-1 hypervisor — actually, following on from the previous talk, it is a microkernel, in fact. It's very small, and it starts at the highest privilege level, so it should have — it does have — the capability to set up the whole system, giving each domain exactly the amount of privilege it needs: dedicated CPU cores, dedicated memory, dedicated hardware resources, guaranteed IRQ latency, just as I explained. That's the theory. In practice it's not quite like that — it's close, but not quite as good. Why?
Because yes, Xen definitely runs at the highest privilege level and starts first, so you should be able to do all the partitioning the way we'd like. But in practice, the way it has been used all these years is to start Dom0 first.

What is this Dom0? Dom0 is a special VM, typically Linux, which is privileged: from it you can do privileged operations like starting all the VMs, stopping all the VMs, rebooting the whole machine, and so on. What does that mean? It means a number of things. You could still set up your system from Dom0, but it doesn't look quite as good, does it? And there are concrete consequences of having Dom0.

Here is a non-comprehensive list of those consequences. One consequence — I'll start with the easy one — is that you simply have one more partition in your system. If you don't need a Linux system, why should you have one? It's one more partition that is using resources and, if nothing else, drawing power. You don't need that.

Secondly, Dom0 is privileged, so one has to be very careful about what goes into it, because Dom0 has pretty much the keys to the castle. You cannot really put your least-trusted application in Dom0 — it wouldn't be a good idea.

There are other consequences; the worst one is probably boot time. Let's say what you care about is your critical application, which you want to put in the yellow domain — the yellow partition, or yellow VM, or whatever you want to call it.
So what you care about is the startup time of your yellow app. With Dom0 in the system, you have to boot Xen — well, that's small and quick, fine — but then you need to boot the Dom0 kernel, which again is typically Linux, then the Dom0 user space, all of it, and only then is the xl tool available to start your yellow VM. Your boot time, if you don't do some really good optimization on the whole system, is going to be bigger than three seconds — it's fair to say bigger than five seconds; I think by default it's going to be even bigger than ten seconds if you don't start cutting the system down. This is not what you want for your swinging robotic arm controller, right? That would be ten seconds of panic.

There are other consequences. Safety certification: Linux is large, and it is still problematic to certify such a big system. So what do you do about Dom0? There isn't a simple answer. You could replace it with another OS — nothing says that Dom0 has to be Linux — but if you don't use Linux as Dom0 you are on a bit of a strange path: not many other people use a non-Linux Dom0, so it becomes harder to set up Xen, and so forth. There are other simple solutions: one idea is to use Dom0 to set up the system and then quit, so that Dom0 is only there at boot time. That may be feasible — it could be a decent workaround — but we'll have a better solution in just a couple of slides. Complexity: obviously you have one more VM to build, with Yocto and so on.

Pros — I've only been focusing on the cons so far, but there are pros to having Dom0. You have one place where you can do monitoring — you can monitor the health of your system — you can reboot your VMs, you can do a bunch of stuff.
Some of it is really cool, like doing memory introspection: getting very, very detailed information on the behavior of your VMs and the applications within them. So, you know, you can't have everything, right?

And that brings me to the solution. What I've been working on for the last year and a half is improving static partitioning scenarios with Xen, and I did that by implementing a new feature that we've been calling "dom0less". Why that name? It will become obvious very soon. The idea is to not require Dom0 to boot your system.

How do we achieve that goal? First of all, by having U-Boot load more things into memory than it usually does. What does U-Boot usually load into memory? It loads the Xen hypervisor — obvious — and it also loads the Dom0 kernel and ramdisk — also obvious — but it usually stops there. We extended this protocol to also load the kernel and ramdisk of yellow, and the kernel and ramdisk of purple. That's step one. Step two: we boot Xen as usual, but then we have Xen start all your domains in parallel on separate physical cores. Here we have Dom0, yellow, and purple all starting in parallel. That means that the boot time of your yellow app, without doing any optimization, is going to be less than a second. Why? Because the only thing it needs to wait for is Xen to start, and Xen is pretty tiny — less than 50,000 lines of code — so it's not going to take long. So with some optimization,
you can probably get into the realm of milliseconds of boot time.

How do we do the parallel boot thing? The idea is that when U-Boot starts Xen, it passes Xen some more information, telling Xen that it shouldn't just start Dom0 and stop there, but that it should also start yellow and purple. I'm going to go into the detail of how that is done in a couple of slides. But what I want to say here is: well, I've been saying dom0less, dom0less, dom0less — but that's Dom0 right there! Yes. The term "dom0less" refers to these two VMs, yellow and purple. Why? Because they have been started without any help from Dom0. Dom0 has done nothing whatsoever — zero, nada — to help or provide any services to start yellow and purple, and that's why we call them dom0less VMs. But you can still have Dom0 there, and you can still use it to start more VMs — like, in this case, a regular, run-of-the-mill red VM — or to do monitoring, the usual stuff that you do from Dom0.

How does it work? This is a technical conference, so I wanted to go a bit more into the implementation details. As I said, first of all we ask U-Boot to load more things into memory. Up to now, you would have U-Boot load into memory the device tree (that's the first line), the Dom0 kernel, the hypervisor, and the Dom0 ramdisk; the last command boots Xen. That's what has been done until now. What's new is that now we also ask U-Boot to load the kernel and ramdisk of yellow, and the same again for purple or any of the other dom0less VMs. That's step one. Step two: we have to tell Xen somehow that in this system there isn't just Dom0, but two more VMs. How do we do that?
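To preview the answer in concrete form: per extra domain, there is a node under the chosen node of the device tree, roughly along these lines. The addresses, sizes, and node names here are illustrative, made up for this sketch — the authoritative bindings are documented in the Xen source tree:

```dts
/* one extra node per dom0less domain, under /chosen */
chosen {
    domU1 {
        compatible = "xen,domain";
        memory = <0x0 0x20000>;     /* expressed in KB, so 128 MB */
        cpus = <1>;                 /* number of virtual CPUs */
        vpl011;                     /* PL011 UART emulated by Xen */

        module@5000000 {
            compatible = "multiboot,kernel", "multiboot,module";
            reg = <0x0 0x5000000 0x0 0x800000>;  /* where U-Boot put the kernel */
        };

        module@6000000 {
            compatible = "multiboot,ramdisk", "multiboot,module";
            reg = <0x0 0x6000000 0x0 0x200000>;  /* where U-Boot put the ramdisk */
        };
    };
};
```

The reg properties have to match wherever U-Boot actually loaded the binaries, which is why generating this together with the U-Boot script makes sense.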
Device tree. We already have some Dom0-related configuration under the chosen node in the device tree, so we just extended that set of bindings to also advertise the presence of more VMs. We added a node that is compatible with "xen,domain". It has some memory, some virtual CPUs, a PL011 — just a UART, emulated by Xen. One module tells Xen where the kernel is; the last module tells Xen where the ramdisk is. This is a unit of description for one of these dom0less VMs. Xen is going to read it, find out that it needs to start a VM with the kernel loaded here and the ramdisk loaded there, and it's going to start it.

Okay. Everything I've said so far is good, but it is honestly useless without device assignment, because yes, you can start VMs, but they have no hardware, no paravirtualized drivers, nothing — what are you going to do with them? This becomes useful as soon as you can assign physical resources to each of these domains, or static partitions. How do you do that? With device assignment: you remap the memory regions and interrupts of physical hardware resources into these partitions. In this example the network card is given to the yellow VM.

How is it done? Again with device tree. If any of you are already familiar with Xen: today it is possible to assign devices to VMs that are started from Dom0, and it is done by adding a little device tree snippet to your VM config file. We took the same idea. This device tree snippet describes one device that needs to be assigned to one of your domains — for instance, a network card. You compile it into its own DTB and then you load it, again from U-Boot, adding it to the domain description; this basically tells Xen that there are one or more devices to be assigned. How does it work? It's pretty simple.
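Concretely, such a snippet might look roughly like this — loosely modeled on an UltraScale+ Ethernet controller, with illustrative node names, addresses, and interrupt numbers, so treat it as a sketch rather than a copy-paste recipe:

```dts
/dts-v1/;

/ {
    passthrough {
        compatible = "simple-bus";
        ranges;
        #address-cells = <2>;
        #size-cells = <2>;

        ethernet@10000000 {
            /* this part is the description exposed to the guest */
            compatible = "cdns,zynqmp-gem";
            reg = <0x0 0x10000000 0x0 0x1000>;
            interrupts = <0 63 4>;
            /* host physical address, size, and guest address to map it at */
            xen,reg = <0x0 0xff0e0000 0x0 0x1000 0x0 0x10000000>;
            /* link back to the corresponding node in the host device tree */
            xen,path = "/amba/ethernet@ff0e0000";
        };
    };
};
```

The exact property semantics are in the passthrough documentation in the Xen tree.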
It contains a description of the device to assign, plus a couple of special properties: one with the memory region that you want to remap and its destination, and another one with a path — basically a link to the corresponding device in the host device tree. If you're asking why there are two descriptions: this way you can expose a different description of the device to your domain than the physical host device tree description, and there are cases where that is useful.

All right, that brings me to "true dom0less". True dom0less means there actually is no Dom0 at all. In this example we have only two VMs, only two domains: yellow with the network card, and purple with a PL — a programmable logic — block. That's definitely achievable now that we have dom0less. Not done yet upstream, but definitely doable.

So, going back to that slide: using dom0less we can definitely set up a system where Xen starts and automatically sets up all the static partitions for you according to your configuration, with each static partition having physical cores fully dedicated to it, as well as hardware resources assigned to it. IRQ latency is minimal, and CPUs are fully dedicated, using a special scheduler called "null" that tells Xen to fully dedicate cores to static partitions and not do any scheduling that would increase IRQ latency.

Okay. So yes, the talk is about static partitioning, and this is where I think this feature really shines, but I wanted to spend a couple of words to say that it can also be used for non-static-partitioning cases. Unless you take it out, you still have Dom0, so you can use it for stuff, right?
You can, for instance, have a couple more VMs — which could be, you know, driving your safety-critical or critical applications — started at boot time to minimize their boot time, and then, after the system is fully up and running, you can still use Dom0 to start the non-critical stuff separately.

Okay, pros and cons. With dom0less you can set up a true static partitioning system. You get much faster boot times — easily less than one second — which is excellent for small systems with only a few static partitions. It is much easier to certify, because now you only have to safety-certify the hypervisor, not Dom0; that big chunk of the software stack goes away. Lower complexity: that big chunk of software going away also means you don't need to build it — you don't need to build a Linux system anymore, especially not just to run the Xen tools, so you don't need Yocto or anything. To set up dom0less you basically just compile Xen with the Linaro GCC cross-compiler, the same way you build a Linux kernel. That's it — there is nothing else to build. You just build Xen, you tell it to start two things in parallel, and that's the end.

What else? Well, I have a pretty big con for you: there is no Dom0. Yeah — there is no Dom0! That means if you want to reboot your VMs, you can't; if you want to start more VMs, you can't. Everything is a trade-off: if you want to get rid of Dom0, it means you have no Dom0 afterwards, right?

One thing you should be aware of: what is not implemented today is PV drivers between dom0less VMs. It is technically possible to have a network backend sharing the network card — for instance running in yellow, with the frontend running in purple. Technically possible, but today it doesn't work. Okay.
It's on the roadmap and will be implemented at some point. So today this can really only be used for pure static partitioning scenarios.

That brings me to the to-do list — what's done and what's missing. What is done is basic dom0less, starting multiple VMs in parallel; that's implemented and upstream in Xen 4.12. Device assignment is in progress: we are very close to upstreaming it, and hopefully it's going to be in 4.13, which will be out by the end of the year. It is also available in the Xilinx Xen tree. What's next? True dom0less — getting rid of Dom0 for good — you can very easily do with a one-line patch, but it's not upstream yet; you cannot really do it with just a configuration change. Shared memory and interrupts: it would be really good to be able to share cacheable memory between the dom0less VMs and let them communicate; that's missing. PV frontends and backends are a bit farther out, but they will be done at some point. Shared memory is actually quite easy to implement — I just haven't gotten around to it yet.

All right, so I have quite a few interesting demos to show you. Actually, first I want to show you something else. The first thing I want to show you is that I've been working on simplifying the configuration even further, so that you don't have to manually edit your device tree or manually edit your U-Boot script. What this does: there is a very, very simple plain-text config file, and a very, very simple script that takes it as input and generates the U-Boot script for you that is going to boot your system. The config tells it where the memory is, the device tree blob you're using, the Xen binary you're using, the kernel for Dom0, the ramdisk for Dom0, and how many other domains you have.
This is the configuration for device assignment — it contains a network card — and that's basically the output. So I'm going to run it as an example for you. This generated a U-Boot script. Do you want to see the source? The source is basically loading a bunch of binaries into memory; it also does the modification of the device tree from the U-Boot script itself, so that all the binaries you provided as input stay pristine and unchanged; and then it simply calls boot. I can show you how it works, but I'm running it under QEMU, so it's going to be slow — I can tell you that straight away. The only thing you need to do is set the IP address, load this script into memory, and run it, and you're going to see a bunch of VMs being started in parallel — slowly, because this is fully emulated on my laptop right now. So basically there are three VMs all starting; the mess on the console is because all of them are printing; one of them has a network card; and so on. Okay. So this is pretty basic dom0less — you should be able to reproduce this on your own without too much difficulty.

I wanted to show you something more interesting than this, and something more interesting is the demo I wanted to talk to you about. In this demo there is no Dom0. There is a yellow VM with a network card, which is Linux plus BusyBox — a very, very small Linux environment — and the other one, purple, is a bare-metal app. This is not even Linux; it's tiny, it's bare metal, and it has a UART directly assigned to it — the secondary UART; there are two or maybe four on the UltraScale+, more than you need — and a TTC timer, one of the many timers available on the board. What I'm running on purple is just a tiny application to measure IRQ latency. So I'm going to run it now. This is live over the internet,
so if anything goes wrong, I'm going to blame it on the internet — I'm telling you straight away. Here you'll see the system starting, and you should be able to see the first VM only; on the other tab here, the bare-metal app will start. This is Xen now. And this is the bare-metal app — already finished, obviously, because it's tiny: it already started, booted, ran, and finished, and it ran its own IRQ latency test. If I go back to this, we have only one domain running. Why is it called domain 2? This is because I implemented true dom0less for this demo with this patch — I just stopped Xen from starting Dom0. If you take Dom0 out, domain 1 is the one with the latency test, so domain 2 is the one left with the network card. I'm going to switch input here to this guy, and you can see it has an IP address, and I can ping the host it is connected to, and it works.

Okay, so that's two demos. I can show you one more, if I may, which is a pretty complex setup: Xen starting three VMs — Dom0, yellow, and purple — and then, from Dom0, starting one more regular VM, with frontends and backends, and with the network card assigned to yellow. So the only one that can actually go out is yellow, and the other two can only talk with one another: this is basically a private network, by virtue of the fact that there is no internet connection there at all. This one is recorded, and I can see it is about to boot. So this is Xen starting the three VMs in parallel — a bit of a mess on the console. The other two have finished; only Dom0 is left. Okay, Dom0 is fully booted; I'm logging in now. From here you can see — I switched input to domain 1. Domain 1 is the one with the network card — that would be yellow in the picture — and it has an IP address assigned to it, same as in the demo I just showed you live. Here's what I'm doing:
I'm showing that it can ping the host. Funnily enough, I cannot show you pinging Google, because this is behind my firewall, the Xilinx network, so I cannot actually ping Google — but that's the closest to pinging the real internet that I can get in this demo. Then I switch the input to the second domain, and then back to the Xen Dom0.

Okay, and in Dom0 you can see I can use xl to issue commands. You see these two have no name — dom0less VMs have no name, because names are assigned from Dom0, so they come up as null — but you can see that there are two of them, one with two vCPUs and one with only one vCPU, and both of them with 128 megabytes of memory. And then what I'm doing is simply starting one more VM. This is the config file of the VM I'm about to start. Starting the VM — okay, obviously there are three VMs now; the one called "test" is the one started from Dom0.

And now I'm going to do a silly test: I write "ciao" to index.html and use the BusyBox implementation of httpd, running here. There is no network interface, because I'm using a special paravirtualized networking technique called PV Calls that doesn't create a network interface. Then from Dom0 I fetch the index.html page from localhost — because this localhost is the communication domain between all the VMs and Dom0, using the special PV Calls paravirtualized interface — and you can see there is "ciao" written in the page. So basically you can see that they communicate as usual, as you could without dom0less, and still the network is completely separate, because we assigned the network card to the other domain. Okay, and that's the end.
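For reference, the config file for that "test" VM would have been along these lines — a sketch from memory, with illustrative paths; the PV Calls side was a special build, so I'm showing the conventional network line commented out instead:

```
# xl guest config, roughly what the demo's "test" VM looked like
name    = "test"
kernel  = "/root/Image"
ramdisk = "/root/rootfs.cpio.gz"
memory  = 128
vcpus   = 1
# a conventional setup would add a PV network interface, e.g.:
#   vif = [ 'bridge=xenbr0' ]
# the demo instead used PV Calls, which needs no vif at all
```

You start it from Dom0 with `xl create` pointing at this file, same as any regular guest.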
So if you have any questions, I'd be happy to answer them.

[Audience question] Yes — so, typically, in a traditional setup, Dom0 is the one that owns all the devices — or almost all, except the ones directly assigned to others — and then you start these paravirtualized backends in Dom0 and the frontends in the other VMs. Here, all of that stuff is still architecturally present in Xen, so you could do it, and the idea is that one day you could run these backends elsewhere — it just doesn't quite work yet. You could start yellow with the network card but also with the network backend, as is typically done in Dom0, so that another VM could actually connect to it. And in fact, as part of these protocols, there is already a rendezvous mechanism, based on Xenstore states, to check when the other side is online, so it would work with parallel boot too — it's just that the plumbing is not done yet. Absolutely.

[Audience question] Yeah — so when I say "bare metal", for instance, that's not something proprietary. "Bare metal" is what Xilinx calls this tiny, tiny little library OS — it's not even an OS, it's just a set of library functions to initialize your bare-metal hardware — that can also be run in a VM. We are talking about a kernel plus app that is maybe less than 100 kilobytes; it's a tiny, tiny application, and it's definitely not Linux. This is what you saw starting with the IRQ latency test.
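By the way, the Xen configuration that latency test ran against boils down to a few boot arguments. These are all real Xen-on-Arm options, but treat the exact combination as an illustrative sketch rather than the precise setup used in the demo:

```
# Xen command line, e.g. via the xen,xen-bootargs property in /chosen
sched=null vwfi=native console=dtuart dtuart=serial0 dom0_mem=512M
```

`sched=null` pins each virtual CPU one-to-one onto a physical core, and `vwfi=native` lets guests idle without trapping into the hypervisor — both matter for IRQ latency.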
That's a completely separate kernel that has been run in parallel.

[Audience question] So keep in mind that I have far better numbers than the ones I can show you here. IRQ latency is a big topic — I could talk about IRQ latency for an entire presentation, and I plan to do that next year — but I can explain. The key to getting IRQ latency low is to fully dedicate CPUs, so that you don't do any scheduling or descheduling. You can do that by using sched=null on the Xen command line; that means each virtual CPU is going to be mapped one-to-one to a physical core. That works, and on a system with no interference you already get pretty good numbers. Look — this one is the high mark: the biggest latency in this test is 3.7 microseconds. That is pretty good for an Arm system, I'm telling you — it's very hard to beat. However, the key phrase is "on a system without interference": there are still ways on Arm systems to cause interference between cores and affect the performance of one core from another. But we are close to publishing a set of changes to Xen that basically zeroes out the possibility of interference by separating the cache used by each processor, and we have done extremely detailed measurements of IRQ latency: even in the worst possible case, with the worst possible kind of interference, it is always less than five microseconds. So I feel extremely comfortable on the IRQ latency side — but you need those changes, which are not published yet; about to be.

[Audience question] So no, it's not separated that way.
It's only that each VM has dedicated cache lines — that's a good way of putting it. That way one VM has no chance of affecting the cache-line performance of another VM. But you can still share memory, and on the shared memory, of course — if you share a page, that one page can be affected. Go ahead.

[Audience question] Yes — well, yes and no. The code is all open source, and I'm sure you can find it somewhere under the Xilinx GitHub page. In practice, the classic way of building it is through Vivado, which is a pretty big SDK from Xilinx that comes with the product. Vivado itself is not open source, but all the libraries are BSD-licensed, so you could build it yourself — and I have built it by hand, outside of Vivado, and that is completely open source. And yes, absolutely — I know that people have, for instance, taken it, slightly modified it, and used it on a Renesas board. So it can be done.

[Audience question] Yeah — so Lars is going to give a presentation tomorrow, and he's going to talk about it. We have a lot of thoughts in that area, and we've made some progress, specifically on certification. We've been talking about it for a while, and we recognize it is key for the Xen hypervisor to make forward progress. So check out Lars's talk tomorrow.

[Audience question] So Xen supports x86 and Arm, both 32- and 64-bit. However, dom0less is only implemented on Arm, so you cannot use dom0less on x86. What you could do on x86 is have Dom0 start the other VMs and then quit, and you can actually get close — not quite as good, but close — in terms of startup time, for instance. That you could do today.

Last questions? No? I'm the only thing standing between you and refreshments — okay.

[Audience question] Yeah, I guess the question — so the key is that it needs to be protected by the SMMU, and Xen will automatically use the SMMU to protect every device assignment in the system, if the SMMU is available. On the UltraScale+ it is available.
There are a few other boards with good SMMU implementations. That's definitely something to look at when you evaluate this solution, because if the SMMU is not there, or doesn't quite work, then it becomes a bit difficult. Yeah — thank you all, and have a good rest of the evening!