All right, thank you for coming. Before I start, I wanted to get a sense of the room: who saw Stefano's dom0-less talk yesterday? No one. Okay. And who saw Kate Stewart's talk about certification earlier? Okay, that's almost everyone. Let me introduce myself. My name is Lars Kurth. I'm the community manager for the Xen Project, and I'm also the chairman of the Xen Project advisory board. I happen to work for Citrix, but Citrix has no real stake in safety and embedded; I'm primarily working on this with my open source community hat on. We're going to talk about how Xen is becoming relevant in embedded: a bit of the history, where we're going, and how we're approaching safety certification.

Let's start with why you would virtualize embedded systems at all. In many ways it's quite similar to why we do it on servers: at the end of the day, it's consolidation, though for slightly different reasons. There's also an element of reducing development costs, because you can take some of your existing applications and run them within a guest of a hypervisor, which gives you more flexibility than you had before. Another important aspect is security and safety. One of our goals, and this is why we're doing a lot of the work around safety and the building blocks leading towards it, is to support mixed-criticality configurations. That means applications with different safety, security and maybe real-time requirements running on the same ECU or platform. To be able to do that, we ultimately need safety certification of the hypervisor. Having that also extends the number of use cases we can support, as do a number of embedded requirements which we ultimately have to support.
So how do we actually get there from the Xen Project's perspective? Let's talk very briefly about a little bit of history. Xen originally came from server virtualization and cloud computing. What is interesting is that about nine or ten years ago, the U.S. DoD created a family of Xen-based hypervisors, the Xenon hypervisor family, which was the first time formal methods were applied to the Xen code base, working towards Common Criteria EAL5. That's interesting because it was some of the early work which others then started building upon from a safety certification perspective. Then we had various attempts to bring Xen to the desktop, as well as mobile virtualization. Xen on the desktop we never really cracked. However, Qubes OS, a security-conscious operating system, is still around and very successful, as are OpenXT and SecureView, which are basically defense applications. From that baseline, the project eventually started branching out into more generic embedded use cases, as well as automotive, and we're going to look at that in a little more detail.

The first attempt to bring Xen to embedded devices was started around 2012 by DornerWorks. There were a number of research grants which enabled them to do this at the time. In fact, there was a specific government-funded research project which looked at the problem of whether it is possible to safety-certify a spin-off of Xen towards DO-178 Level A, and I will quote some of the information from it, because it's relevant to the overall discussion. Later on, DornerWorks created a next-generation product called Virtuosity, which is FACE-certified. FACE is the Future Airborne Capability Environment, a collection of different safety and architectural standards.
In 2016, a company called Star Lab started doing something similar, but less is known about them. Then, in 2015, Xilinx got involved with Xen. Xilinx actually shipped the first Xen-based distro for embedded devices for their platform, with a lot of additional functionality on top of open-source Xen. There's currently no safety certification support for it, but it is something we're working on together with Xilinx quite closely. Then we had the first attempt to bring Xen to automotive, around 2015 with GlobalLogic. They just turned up at one of our developer events with a fully-fledged demo showing Android running on a board. It was really quite cool at the time; we had no idea as a community that it was coming. But the problem was that they were simply too early. At that time there was just no way you could reasonably expect an open-source project to be safety-certified, so ultimately that effort failed. However, the team at GlobalLogic all ended up at EPAM, and they had a second go at this, and they have a really interesting platform now. We're also at the point, and you've probably seen this in Kate Stewart's earlier talk, where we're seeing a lot more of the pieces falling into place to make safety certification in an open source context viable. One of the more interesting things announced recently is that DornerWorks has secured NASA funding to bring Xen into space. I don't know much about it yet, but there's a link towards it, and it looks like a very cool project. To summarize, one thing I didn't mention was that around 2016, before EPAM and Renesas started seriously working on their embedded Xen reference port, they funded a study by HORIBA MIRA to assess whether it's possible to safety-certify a subset of Xen, and the answer was clearly yes.
Since then, the focus has primarily been on closing functional gaps, adding real-time capability, reducing the code size, and creating reference implementations for different segments. All of this is open source, but not all of it is upstreamed in Xen at this point. So what has this really meant for the project, and how is it relevant for moving towards safety? We ended up with a lot of features in the Xen hypervisor which are very specific to embedded: a number of different schedulers, a minimal Xen configuration, driver domain support, support for OP-TEE, and a lot of other things; this is just a very high-level list. The two things most relevant for the safety certification effort are the minimal Xen on Arm configuration — I think we're currently at about 47,000 lines of code, which makes safety certification on top of Xen feasible — and, as a key critical component of that, dom0-less Xen. If you look at a traditional Xen system, you have dom0, which is a special VM that typically runs Linux. It could potentially also run an RTOS, but that would normally have to be safety-certified itself, and that is obviously a lot of lines of code. The dom0-less Xen effort is driven by Xilinx — there's a reference to it at the end as well — and it is going to be part of our initial safety certification scope. The upshot of all this is that, almost accidentally — Xen on Arm was originally designed for servers — through the efforts of companies like EPAM and Xilinx and others, it turned out to be a great hypervisor for embedded and mixed-criticality use cases, so we have a really good baseline to build on. That brings us to the topic of safety certification, and to the current attempts to solve it.
There's obviously FreeRTOS, which is open source. There's SafeRTOS, a proprietary rewrite complying with IEC 61508, but that's not really what we're looking for. There was the SIL2LinuxMP effort, which eventually became ELISA, and there are a number of Linux Foundation projects which also have the ambition to become more easily certifiable. One of the things — and this will become a theme of this talk to some degree — is that there's a set of common problems, and behind the scenes these different projects are starting to collaborate on solving some of them. Of course, each of those projects has a different history, culture, and specific problems that have to be overcome in the context of that project. Zephyr, for example, has quite a different culture to Xen from a development perspective; Xen in many ways is quite similar to Linux in how we operate, but obviously a lot smaller. But functionally, at the end of the day, if you abstract it down to some core areas, there are two sets of problems we have to solve. The first realization is that to make a project safety-certifiable, you have to make changes, sometimes major changes, to the code base. It requires tools, infrastructure and expertise, and all of this could potentially be solved by throwing money at it. However, the elephant in the room is that you ultimately need to change how the open source project operates and how the community works. Until relatively recently, everybody assumed this was just not going to be possible, so nobody even really tried. That's why we've seen past attempts where somebody took the open source code base, forked it, built or rebuilt something on it, and did it that way.
However, as we will see a little later, maybe this is actually achievable. The more I think about it, and the more we work with our developers, at least in the context of Xen, this seems to be a solvable problem — and each individual bit you look at isn't entirely new; it's stuff we have solved in some way, shape or form in a different open source context before.

But let's look at some figures. These figures came from DornerWorks — I'm crediting the sources there. They came out of a research grant trying to answer whether you could safety-certify Xen to DO-178, the avionics standard, up to DAL A, the highest assurance level. They sampled a number of components and subsystems in Xen and, at the end of the day, came up with cost estimates in hours per line of code. What these figures really show is that if you're experienced — if you both know the Xen code base and have safety certification expertise — you can get to the standard in the number of hours quoted; if you don't have that expertise, it's going to take significantly longer. The key point is that, as I said earlier, there has already been a lot of investment in the embedded segment over the last few years: we're probably looking at a figure of around 20 to 30, maybe more, person-years of effort. And if you look at those graphs, if we're targeting DAL B or DAL C, then to safety-certify a code base of around 50,000 lines of code you're looking at somewhere between five and ten person-years. If you compare the two, it's not entirely outlandish to want to do this. But this at the time was assuming a one-off effort, and that's exactly what we don't want to end up with.
We don't want to end up with a fork where somebody takes Xen and safety-certifies it. We want to make it easier altogether for everyone building on Xen to achieve that, and to really bring the cost down. This has become a more prominent topic, particularly in the last year and a half, when we as a project started to seriously look at this, experiment with one thing or another, and set up mechanisms to really start that journey.

So where's our starting point? I already pointed out that we have examples of Xen-based embedded products; some have support for safety standards or ship safety certification packages. We have expertise in the ecosystem — companies who have both Xen and safety expertise — which is a significant advantage if we can build on it. We have multiple reference implementations: EPAM has an automotive reference stack, we have a Xilinx stack, and more recently DornerWorks has announced their NASA stack and has a similar effort in progress elsewhere which I can't talk about publicly. So we're already seeing a pattern here around reference implementations. We also have adoption in some use cases, some of them in a non-safety context, primarily military, and a few applications which unfortunately I also can't talk about — that's the problem with the whole avionics space, there's so much secrecy around it — where Xen is used in a safety context but where the safety part can be isolated on a different component.

So what does this really mean, and what's our starting point from an architectural viewpoint? If you're interested, have a look at the slide deck linked down there, or chat to Stefano, who put that deck together, about the dom0-less Xen architecture.
The basic idea is this: you have the hypervisor, and today you have dom0. Normally when you boot, Xen starts dom0, and then dom0 starts all the other virtual machines; you have a tool stack and everything within your architecture. It's very traditional and server-focused. What we really want, also for low startup latency, is the capability to start dom0 and the other VMs alongside each other, in parallel, very quickly. From that point on, if something goes wrong, you would just have a watchdog somewhere in the system and restart things fairly quickly. This is what we have today, with a number of restrictions. Most of it is upstream already, and some of what isn't is currently being posted on the mailing list and should get into the next Xen release towards the end of the year. What we're really working towards is a system where there is no dom0 at all on the platform. That would also be our initial safety certification scope, and by the looks of it we'll get there sometime next year.

So that's our starting point. From there, we could consider going to an architecture where, in addition to what we've done for true dom0-less, we would have to prove that if the non-safety-critical side goes down, it doesn't affect the safety-critical side. That in theory also looks feasible. And then we have the automotive case, which, as you can see, is a lot more complex. This is the current architecture diagram which EPAM is using. The common premise is that there are several ECUs in the car. On the left side, you have an example where one ECU acts as a gateway or application server connecting to the cloud.
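Going back to dom0-less for a moment, it may help to see what "no toolstack" means in practice: guests are described declaratively in the host device tree, and Xen constructs them itself at boot. This is only a sketch loosely following the upstream dom0-less documentation; the node names, addresses and sizes here are illustrative, not taken from a real board:

```dts
/* Sketch of a dom0-less guest definition under /chosen in the host
 * device tree. Xen parses these nodes at boot and builds the VM
 * directly, with no dom0 or toolstack involved. */
chosen {
    domU1 {
        compatible = "xen,domain";
        #address-cells = <1>;
        #size-cells = <1>;
        memory = <0 0x10000>;       /* guest RAM in KB (64 MB here) */
        cpus = <1>;                 /* number of vCPUs */
        vpl011;                     /* emulated PL011 UART for the guest */

        module@42000000 {
            compatible = "multiboot,kernel", "multiboot,module";
            reg = <0x42000000 0x800000>;  /* kernel load address and size */
        };
    };
};
```

The bootloader loads the guest kernels at the stated addresses, and because no dom0 has to come up first, all the guests start in parallel — which is where the low startup latency comes from.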
This would be for things like an evolution of the telematics control unit, for fleet management, usage-based insurance, and all those kinds of use cases where maybe you want workloads coming from your cloud being deployed on your car. The example on the right is a digital cockpit: maybe a cluster with IVI or ADAS integrated on top of it. That is ultimately a lot harder to do.

So how are we starting to break the problems down from a process perspective? We set up a special interest group, with a big kickoff at the beginning of this year with community representatives on it. We have a number of safety assessors engaged in the process, and a number of companies actively doing work in this area. We also have one company which is still sitting a little bit on the fence, observing; hopefully we'll get them to help out actively as well. Before we started this, we had to look at whether it was actually going to be possible to carry the community along. So at the beginning of this year, we got safety assessors, companies who had an interest in driving Xen to become more safety-certifiable, and basically all our key committers into one room for two days to start discussing: can this be done? The first day was all about trying to build a common understanding. If you look at our committer base today, two have a strong embedded background; the other five really come from a cloud and server virtualization background. So a lot of this was about building common understanding on both sides, because — and this was interesting — of all the assessors we had in the room, none of them really understood very much about open source.
Then we had companies such as Arm and others who had already looked at this problem from a wider perspective, who really helped facilitate and build a common understanding. The thing that surprised me most was actually how little the assessors agreed among themselves on how you would approach this problem. We were trying to establish red lines: what, as a community, could we agree to and what would we object to? What level of change would be acceptable in how we operate? And also to identify potential barriers which would prevent us from going there. When I agreed to set this up, I wasn't really sure how it was going to go. I had no idea whether I would be screamed out of the room or whether we could come to a consensus. It was actually really surprising: the two days ended up being really productive, and they showed that in principle we can do this; a lot depends on the detail.

The first thing we agreed is that this can only be done with a split development model, where some things are done in the open and some in a closed part, which could be either an individual vendor or maybe a consortium of vendors pooling together; they could then also act as the accountable body responsible for a safety-certified spin-off. We then agreed that everything that is valuable for the wider community should ideally live in the open part. That means documentation, tests as much as possible, any traceability, automation, infrastructure — all the things which the project as a whole would benefit from should live in the open. Another point — and this was somewhat controversial, and it is an area of friction — is that everything which creates code churn should ideally also live in the open, but this may not be perfectly achievable, as we'll see a little later.
We want to minimize the changes to the development workflow. What this means is that right now we have a very Git-centric workflow, and if you look at typical requirements management tools like DOORS, today they don't fit into that Git workflow. You would have to deal with parallel systems, keep artifacts in step, and all those kinds of things. So one of the red lines is that whatever happens needs to fit into the overall Git workflow; otherwise the changes required of the community would be too disruptive. At the time we looked at this, we weren't sure whether that was doable or not. Another really interesting area was ultimately a discussion around community scalability. We have a community of about 50 people who are active, and most of them today don't have a safety context. If people suddenly come in with a safety background and a very different mindset, that's obviously going to create friction, so we have to somehow manage it. We also don't want to introduce any significant barriers for existing contributors, so when we start doing things, initially we're going to have to ring-fence them. There are also scalability questions. For example, if we reverse-engineer a lot of the documentation needed for safety certification, and one of those vendors steps up, does that, and it gets upstreamed — who's going to review it? Code, or documentation, has to be reviewed to go in. So we have to find a way to bootstrap this. On the other hand, this is a problem every open source project that grows faces to some degree, so there's a standard pattern for it.

Now I want to talk a little about some example challenges that we're going to have to overcome and are working on.
I'm going to focus on two major things, because they're quite different. There's also another presentation I gave in Tokyo at the AGL summit where I look at this in a little more detail — there's a link at the end — so here I'm just going to show you a snapshot of two quite different types of problems we're going to have to overcome.

The first has to do with the V-model. If you want to follow ISO 26262, you're going to have to follow the V-model, which really means you have to have requirements, this whole hierarchy of different documents which are traced to each other, and then, on the other side, validation which tests things at every level. How do you fit that into an open source development process? Can you actually get community buy-in for this at all? The answer, surprisingly, was yes. I was totally surprised, but when you start breaking it down, it isn't actually such an unachievable problem. The first thing to note is that many of the challenges we face there are quite similar to applying that process to an agile development model, which is actually being done in some organizations that do safety certification. It's not easy, but it is doable.

So let's look at what open source projects normally have today, and how you map an open source project onto this model. Typically, in open source, we tend not to do requirements. If they do exist, they happen before stuff gets upstreamed and live somewhere in an entity outside the project. But for a code base like Xen, which is well-defined — it's a hypervisor; there's not that much magic you can do with it — retrofitting this isn't entirely unachievable.
All right, hopefully I'll get the presentation back — the battery died and the power supply doesn't work. Anyway: it's not a huge effort to retrofit this. And as it turns out, once you've done it, it wouldn't actually change that often, because it covers the core functionality; around the fringes, around extra features where innovation happens, maybe it would. It's actually also valuable for developers and newcomers to the project; it's just something which doesn't happen, because normally in open source we don't have to do it. If you look at areas like design and the way we code, that's frequently as good as or even better than in a proprietary environment. If you look at process discipline in most open source projects, there's very strong discipline: the amount of time we spend on code review, at least in Xen, is typically a lot more than in a commercial environment, and nothing gets checked in unless it ticks all the boxes, and so on. What doesn't happen at all is anything to do with traceability. There has to be a way to maintain this automatically — if you had to maintain it manually, it would be an absolute nightmare, and that would be a red line the community wouldn't carry. And then you have the whole topic of testing. Obviously, projects have different approaches to testing, some better than others, but practically none of them test against their requirements or against architecture specs; you write tests against the code. So this side would have to be looked at essentially from scratch, and that's actually the most expensive part of the entire problem. So what must be upstream?
If you look at what must be upstream to enable that whole cycle: everything on the left side of the V — documented requirements, specs and so on, plus the traceability information — really needs to be upstream, otherwise you can't keep the code and all of this material in sync. So we started looking at tools. There were only two open source tools out there which do anything in this space at all. One is pretty dead; the other is called Doorstop — the name is a little play on DOORS — and I started playing with it. It works on the concept of storing documentation artifacts — requirements and other pieces which you can link together — in the tree, so you follow your normal Git workflow. When you validate, it leads you through the chains: if I change a top-level requirement, it makes me go and verify that all the things which depend on that chain are checked and ticked off. It's very similar in many ways to how Git works. It has issues, but they can probably be fixed with a few weeks of development effort in that community — and they're actually responsive; I've submitted some patches and bug reports to those guys already and they've responded, which is encouraging. So the left side is potentially feasible. The validation side, because it's so expensive, doesn't necessarily have to be upstream. The minimum requirement is that if it isn't upstream, it has to be somehow integrated into the project's CI infrastructure — and we have examples of projects actually doing this.
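To give a feel for why Doorstop fits a Git-centric workflow: each requirement is just a YAML file committed to the tree and reviewed like code. A sketch of what a Xen requirement item might look like — the item ID, prefix, link target and requirement text are made up for illustration:

```yaml
# reqs/XEN-REQ-042.yml — one Doorstop item, versioned like any other file.
active: true        # item is part of the published document
derived: false      # traced to a parent, not introduced at this level
level: 1.4          # position in the document outline
links:              # parent items this requirement traces to
- XEN-SYS-007
normative: true     # carries a "shall" obligation
reviewed: null      # cleared whenever the item changes, forcing re-review
text: |
  The scheduler shall not migrate a vCPU that is pinned to a
  physical CPU.
```

The typical flow would be `doorstop create` and `doorstop add` to make documents and items, `doorstop link` to build the traceability chains, and plain `doorstop` to validate that every link in the tree is intact and reviewed — all of which happens in ordinary commits and pull requests.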
If you look at OpenStack, it has this concept of third-party CIs, where companies like VMware can plug into the framework; whenever there's a release or a regular CI build, if there are failures in a third-party CI, the vendor who supplied it is responsible for fixing the issue upstream within a certain time, because otherwise they get kicked out of the CI infrastructure. So again, there are patterns to deal with this; it's not something we would have to invent entirely from scratch.

How much time do I have? I'm running out of time, so I'm going to skip through this a little. The other really interesting thing is MISRA. I picked MISRA initially because, interestingly, it's one of the hardest community problems we will need to fix, primarily because the code review process doesn't really lend itself to this kind of review: you end up getting tied up in bikeshedding arguments if you just submit patches. So we experimented with it. I was really mean: I picked some really hard and controversial rules, told some of our contributors to just submit the fixes to the community, and we watched what happened. It was a really interesting experiment — none of the reviewers knew this was coming, so we got some real-life data on what would actually happen in the community context. What we learned, fundamentally, is that the key issue is community scalability. Say you have a thousand MISRA findings in your code base which have to be fixed, and it takes two to maybe four hours to review each one: that's about two person-years just of review time. Approaching things this way is just not going to scale.
So we have to find a different way of dealing with this, where we deal with classes of issues and upstream them that way. But that fundamentally breaks the normal code review process somewhat, so we're going to have to try to cover most of the common cases that way and then just have discussions around the more difficult ones.

So, getting to a concrete plan: our approach is to go down a low-customization route, and we're currently investigating the best routes. This is done by the SIG I set up — I covered the V-model work beforehand — and we're starting to collaborate with projects like ELISA and Zephyr and others. Actually, I think the most important thing is that we don't quite know yet how quickly, or even whether, we can do this. We need to trial and iterate and see what works and what doesn't, to build the confidence ourselves that we can do it and overcome some of the community problems. And the more we do this, the more we demonstrate to the wider world that we can actually do it, which helps unlock more funding. That's basically it. I'm going to open the floor to questions; the slides are already published if you want them. Yeah, go ahead.

Because we're a fairly old project, we're not using user stories today. But we have a lot of meta-information: typically, when you send a patch series to the list, the cover letter contains quite a large justification of why you're doing it and how, and for large changes that's where a lot of the discussion and clarification happens. The problem is that it's all on a mailing list; it doesn't end up in Git, and it's in a format that's very hard to mine.
Now, one of the things which is actually helping us — I haven't really dwelled on this — is that there's a project going on to take the Xen Arm core, which as I mentioned before was designed for servers, with a number of vendors investigating what it would take to go over that code base and rewrite it with embedded fully in mind. That creates an opportunity to do it with the safety requirements and all the rest in mind as well, so it's another key part of the puzzle we're trying to solve. But yes, a lot of the information is there; it's just in a very inaccessible format, and the same is true of Linux. Projects like Zephyr, which started later, may have used user stories in various tools, and it's probably easier to mine those. We want this to live alongside the code, and that also means that if there are artifacts, either in the code or alongside it, linked to each other, then we also get joined-up history through Git. At the end of the day, a lot of the things around basic core functionality, around isolation and so on, are fairly well understood. So, yeah, exactly. I guess I have time for another question before I get kicked out. Well, nobody is actually checking the time, so we can just wait for the next speaker. Any more questions? All right — if you want to find me afterwards, I'll be outside. Thank you.