Good morning, everyone. We are going to talk about Rust-based vhost-user backends for hypervisor-agnostic solutions today. There is a growing trend in the current market to implement virtualization in markets other than the traditional server environment. The server environment is very uniform in nature, but as we move towards the richer ecosystems of automotive, mobile, medical, and IoT, many more device abstractions are required. At Linaro, we started Project Stratos with the aim of making VirtIO a standard interface between hypervisors, allowing mobile, automotive, and all these different platforms to migrate from one hypervisor to another without the need to implement the backends again.

Myself, Viresh Kumar, I work as a senior kernel engineer at Linaro, and I co-maintain the cpufreq and OPP frameworks in the Linux kernel. You have already met Alex, Alex Bennée. He is a senior virtualization engineer at Linaro, and he leads Project Stratos. We will quickly go through an introduction to VirtIO, vhost, and vhost-user and how they have evolved over the years, and then we will get to what we have been doing at Linaro in Project Stratos and what we are looking to do going forward.

VirtIO was designed as a standard, open interface for virtual machines to offload their data path processing to the host kernel or the host user space instead of the guest VM. VirtIO defines two entities, one at the host and the other at the guest. The entity at the host is called the VirtIO device; it emulates the device, the virtual device. The entity running at the guest is known as the VirtIO driver; it shares its virtqueues, which are then accessed by the host device. There are various transports already implemented for VirtIO, like PCI, MMIO, and Channel I/O, though a point to note here is that the higher-level device abstractions, like a block device, are very much independent of the underlying transport mechanism.

This slide shows the basic sequence of events that take place for a VirtIO transfer from the guest to the host. On the right side, you can see the guest VM; on the left, the host, which is the one hosting the guest VM, so they are side by side. A normal write operation, a write syscall on an FD, eventually reaches the VirtIO block driver on the guest side here, in the red block. This driver is responsible for preparing the data packets, putting them into the virtqueue, and sending a kick to the host kernel. When the kick is sent, the VM exits at that point, and control reaches the host kernel. In this particular case, the backend, the vhost-user daemon, is implemented in user space, so the host kernel passes control to the VMM running in user space, and the backend reads the eventfd to get the notification. It processes the virtqueue, reads the data packets, and if required programs the hardware, issuing the write command to the block device it has already mapped. Once it is done, it may update the status of the virtqueue and then send a call back to the guest VM. One important thing to note here is that VirtIO is an asynchronous protocol, so the guest does not need to wait indefinitely for the host to complete this operation; it can carry on, and a response will be sent at a later point in time.
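To make that sequence concrete, here is a deliberately simplified Rust sketch of the flow just described. All the types are invented for illustration; this is not the code of any of the projects mentioned in the talk.

```rust
// Conceptual sketch of the VirtIO flow described above (not a real virtqueue
// implementation): the guest driver queues a buffer and "kicks" the device,
// the host-side backend drains the queue and later signals completion ("call").

use std::collections::VecDeque;

/// A guest-provided buffer: guest-physical address plus length.
#[derive(Debug)]
struct Descriptor {
    guest_addr: u64,
    len: u32,
}

/// A very simplified stand-in for a virtqueue shared between guest and host.
#[derive(Default)]
struct Virtqueue {
    available: VecDeque<Descriptor>, // filled by the guest driver
    used: Vec<Descriptor>,           // completed entries, read back by the guest
}

impl Virtqueue {
    /// Guest side: place a request in the queue. The real driver would then
    /// write to a doorbell register, causing a VM exit ("kick").
    fn guest_enqueue(&mut self, desc: Descriptor) {
        self.available.push_back(desc);
    }

    /// Host/backend side: run on a kick notification (an eventfd in the
    /// vhost-user case), process every pending descriptor, then notify the
    /// guest ("call").
    fn backend_process(&mut self) {
        while let Some(desc) = self.available.pop_front() {
            // A real backend would map desc.guest_addr, read or write the data
            // and program the underlying device here.
            println!("processed {} bytes at guest address {:#x}", desc.len, desc.guest_addr);
            self.used.push(desc);
        }
        // Completion is asynchronous: the guest is interrupted later and picks
        // the finished entries up from the used ring.
    }
}

fn main() {
    let mut vq = Virtqueue::default();
    vq.guest_enqueue(Descriptor { guest_addr: 0x4000, len: 512 }); // guest writes a block
    vq.backend_process();                                          // host reacts to the kick
}
```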
While at it, let's also see how it happens in the case of Xen. In the case of Xen, the VMM is not responsible for doing all this; the VMM is only there to create and destroy the guest. Rather, in this case, the backend implements a Xen-specific thing, the Xen IOREQ server, which enables the backend to get events and information from the guest kernel, and this is the layer which is responsible for processing the virtqueues.

vhost is a protocol to offload data path processing from the VMM to the host kernel. The processing, of course, in this case happens in the host kernel; the control path, by which the VMM shares the virtqueue information with the host kernel, is provided via ioctls. This slide is pretty much the same on the guest side. On the host side, the main difference is that the processing happens within the kernel. So when the guest does a VM exit, at that point the host kernel comes up, and the vhost driver there goes and processes the virtqueue, reads all the data from it, and then eventually sends the response back to the guest. Here, if you see, the kernel driver implements the vhost ioctls, which the VMM uses to configure the various virtqueue details in the kernel driver.

The vhost-user protocol complements the vhost protocol in the sense that the control plane is implemented via Unix domain sockets. The main purpose of vhost-user is to implement the backend, the vhost-user daemon, in user space instead of the kernel. The vhost-user protocol defines two major entities, a front end and a back end. The front end is the entity which shares the virtqueues; it plays a role similar to the vhost ioctl interface, but it is really the VMM, like QEMU in our case, which shares its virtqueues. The back end is the entity which consumes the virtqueues. This slide shows the vhost-user path. It is very similar to how it looks in the case of vhost; the major difference is that a standalone user space device emulation daemon sits in user space, talks to the VMM over the vhost-user protocol using a Unix domain socket, and processes the virtqueues from the guest.

As I said earlier, Linaro's Project Stratos was started with the aim of making VirtIO the standard interface, allowing all these platforms like IoT and mobile to make the backends portable between hypervisors. Right now, QEMU has some vhost-user backends implemented within it which can run independently of QEMU, but they share a lot of code with QEMU and are not really independent in that sense. We have been looking to make these backends hypervisor agnostic, so the same backend which runs on QEMU with KVM can also run on Xen and on some other hypervisor like Cloud Hypervisor as well.

We have done a lot of work in around one and a half years until now with just a bunch of engineers, across all the different categories. We started with the VirtIO specifications: we wrote a few of our own, reviewed a few, and sent improvement patches for a few of them to the VirtIO specification. We also implemented kernel drivers and reviewed some of them. The main thing that we have done is implement the vhost-user daemons in user space, which are supposed to be hypervisor agnostic. While doing so, we implemented the backend for GPIO and chose libgpiod as the intermediate layer, and we were required to add support to libgpiod as well, which was very time consuming. In the process, because we wanted to emulate the devices on QEMU and Xen, we developed some patches there as well.
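As a rough illustration of what "hypervisor agnostic" means for such a daemon, here is a highly simplified Rust sketch: the device logic is written against a small trait and knows nothing about KVM or Xen. The names are hypothetical; the real vhost-device backends are built on the rust-vmm vhost-user-backend traits rather than anything shown here.

```rust
// Hypothetical skeleton of a hypervisor-agnostic device backend. The
// transport-specific part (vhost-user socket handling, guest memory mapping)
// would live in a shared daemon layer and simply call into this trait.

/// What any of our device daemons must provide, regardless of hypervisor.
trait VirtioDeviceBackend {
    /// Feature bits the device offers during vhost-user negotiation.
    fn features(&self) -> u64;
    /// Called whenever the front end kicks a queue; `queue_idx` says which one.
    fn handle_queue_event(&mut self, queue_idx: u16);
}

/// Example device: a GPIO backend that would use libgpiod underneath.
struct GpioBackend {
    chip_name: String, // e.g. "gpiochip0" on the host
}

impl VirtioDeviceBackend for GpioBackend {
    fn features(&self) -> u64 {
        // VIRTIO_F_VERSION_1 is bit 32 in the VirtIO specification.
        1 << 32
    }

    fn handle_queue_event(&mut self, queue_idx: u16) {
        // A real backend would pop requests from the virtqueue and drive the
        // GPIO lines through libgpiod here.
        println!("{}: request on queue {}", self.chip_name, queue_idx);
    }
}

fn main() {
    let mut backend = GpioBackend { chip_name: "gpiochip0".into() };
    backend.handle_queue_event(0);
    println!("offered features: {:#x}", backend.features());
}
```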
We tried to upstream everything first, and we are upstreaming everything that we have; there is nothing which is closed source as of now.

So, Rust. Rust is a new language which is making its way into the embedded domain nowadays, and we will likely see it in the Linux kernel very soon. Initially we developed everything in C, and then we thought maybe it was a good time to try Rust. Rust is a language which guarantees performance, safety, and concurrency. That is of very high importance in our case, because we have to handle untrusted guest data on the host side, and the guarantees that Rust provides are really significant for us. And then, Rust is cool; everybody talks about it nowadays, so we thought we would give it a shot.

rust-vmm is a framework, hosted on GitHub, for building VMMs. It is organized into crates; a crate is a bit like a Git repository, you could say, but it is still different. So there is a crate for vhost, a crate for the vhost back end, one for the vhost front end; these are all different modules which, when combined together, make the real thing. There are many VMMs which are already using and sharing these modules, like crosvm, Firecracker, and Cloud Hypervisor, which heavily use these rust-vmm components. So whatever we have been developing, we are trying to upstream into the rust-vmm project, because that is the most sensible project to upstream all this stuff to.

When we started, the rust-vmm project did not have any specific crate where we could upstream the device implementations, the device emulations, so we created the vhost-device crate of our own. It is currently maintained by Linaro employees: Alex, me, and Mathieu Poirier, who is not here today. We have already upstreamed quite a few of the backends, like GPIO, I2C, and the random number generator, while some others are still in progress: RPMB, SCSI, and vsock have pull requests open for the Rust implementations, while SCMI and video are C-based right now and we are looking to migrate them to Rust as well. One common thing here is that the current implementations are very much Linux environment specific; for example, we access the device via the /dev interface or sysfs in the kernel. But there are some which are bare metal, like SCMI; it is a bare-metal implementation, but right now in C.
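As mentioned a moment ago, the reason Rust's guarantees matter so much here is that these daemons parse descriptors, addresses, and lengths supplied by an untrusted guest. The following is a minimal sketch of the kind of bounds-checked access involved; the types are simplified stand-ins and do not reflect the actual API of the rust-vmm vm-memory crate.

```rust
// Sketch of bounds-checked access to guest memory: every guest-supplied
// address/length pair is validated against the mapped region before any read.

/// One contiguous region of guest memory mapped into the backend.
struct GuestRegion {
    guest_base: u64,
    data: Vec<u8>, // host-side mapping of the region
}

impl GuestRegion {
    /// Return a slice for a guest address range, or None if the guest lied.
    fn get_slice(&self, guest_addr: u64, len: usize) -> Option<&[u8]> {
        let offset = guest_addr.checked_sub(self.guest_base)? as usize;
        let end = offset.checked_add(len)?;
        self.data.get(offset..end) // bounds-checked; no wild pointer possible
    }
}

fn main() {
    let region = GuestRegion { guest_base: 0x4000, data: vec![0u8; 4096] };

    // Well-formed descriptor from the guest.
    assert!(region.get_slice(0x4100, 512).is_some());

    // Malicious or corrupted descriptors are rejected instead of being read
    // out of bounds.
    assert!(region.get_slice(0x4100, usize::MAX).is_none());
    assert!(region.get_slice(0x1000, 16).is_none());
}
```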
Initially we developed everything with QEMU, because it was easy and fast and we could just do it on our own machines. Once we had a couple of backends in place, we thought: what can we do now? We are claiming that they are going to be hypervisor agnostic, but we have only tested them with one hypervisor, KVM along with QEMU. We thought about which hypervisor makes the most sense to go to next, and Xen was one of the choices we had, because it is a type-1 hypervisor, it fits very well with Arm's model of hypervisors, and it has good upstream support and a good community around it. So we moved to Xen. When we moved to Xen, the problem was that a lot of stuff was not there, like the vhost-user protocol; it was not implemented for Xen. All that we could find was a disk device emulation written in C for Xen, where they had implemented a way to pass the VirtIO packets, the virtqueues and all. But vhost-user was the main missing piece, which was already there in QEMU when we started.

Now we have a working setup where we can test the existing backends, unmodified, with Xen, and we have already proven that the backends are really hypervisor agnostic. There is a lot of work in progress right now to upstream a lot of the Xen bits into rust-vmm and into Xen as well. If you look at this slide, this is the main part that we have added. There are three entities here which were basically worked on by us on the Xen side to support vhost-user and test the existing backends. The existing vhost-user daemon is this one; it has not changed since QEMU. This is the new part which we had to do for Xen, and it has three main components.

The first one is xen-sys, developed by Mathieu Poirier. It basically provides support for Xen hypercalls via ioctls and on bare metal. Then we have vhost-user-master. The basic purpose of the vhost-user-master crate is to provide the vhost-user front end side of the implementation. The rust-vmm community already had the back end side implemented, and we could use it earlier for the backends, but the front end side they did not have. Luckily for us, Cloud Hypervisor did implement the vhost-user master side, and we were able to fetch a lot of stuff from there, add some things on top of it, and make the vhost-user-master crate. This crate is not Xen specific; it is very much hypervisor agnostic. Anybody who wants to implement a vhost-user front end can use it; QEMU could use it too, if they wanted a Rust-based implementation. Xen vhost master is a vhost-user master implementation specifically for Xen, which has the whole sequence of events you need for mapping the memory, the Xen IOREQ server handling, and it uses the vhost-user-master crate as well. It is based on EPAM's virtio-disk implementation which, as I said earlier, was done in C but did not have any vhost-user protocol implementation at all.

When we moved to Xen, we wanted the backends to stay really hypervisor agnostic. They still are, but there are challenges which we have faced. In the case of KVM, the guest memory is mapped via /dev/shm; these are user space pages. With /dev/shm, the main benefit is that we can follow the standard Linux way of mapping the pages: we can just do an mmap and everything will be fine, the guest memory will be available to the backend, and it works. And that is how the rust-vmm projects are aligned: they just do an mmap and expect it to work. But in the case of Xen, the Xen community could not use user space pages. They had to move to kernel space pages, because the kernel sometimes sets the page table entries to an invalid state, and if Xen comes and tries to access the page at that particular time, it gets a fault. Because of that, they could not use user space pages and had to move to kernel pages. For implementing that, Xen supports a lot of hypercalls, not just mmap, and to have a uniform way of doing all the hypercalls, the way they implemented it is that you have to do an mmap first, which sets some flags and so on, but eventually you have to issue an ioctl. This is where our setup broke, because we assumed that mmap alone should be enough. We do not want to implement something which does not work well with the standard Linux calling conventions; we expected mmap to be robust enough to just work, but it does not.
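Here is a rough sketch of that mismatch, assuming the libc crate for the raw system calls. The ioctl number and argument handling are placeholders for illustration only; they are not the real privcmd definitions, and error handling is omitted.

```rust
#![allow(dead_code)]
// The rust-vmm code assumes a plain mmap() of the guest memory is enough,
// while the Xen privcmd interface wants an mmap() of a placeholder range
// followed by an ioctl that actually populates it with guest pages.

use std::os::unix::io::RawFd;

const HYPOTHETICAL_PRIVCMD_MAP: libc::c_ulong = 0x0000_beef; // illustrative only

/// KVM-style path: guest memory arrives as a file descriptor (memfd or
/// /dev/shm) and a single mmap is all the backend needs.
unsafe fn map_guest_memory_kvm(fd: RawFd, size: usize) -> *mut libc::c_void {
    libc::mmap(
        std::ptr::null_mut(),
        size,
        libc::PROT_READ | libc::PROT_WRITE,
        libc::MAP_SHARED,
        fd,
        0,
    )
}

/// Xen-style path: mmap only reserves the range; a follow-up ioctl on the
/// privcmd device asks the hypervisor to back it with the guest's pages.
unsafe fn map_guest_memory_xen(privcmd_fd: RawFd, size: usize, domid: u16) -> *mut libc::c_void {
    let addr = libc::mmap(
        std::ptr::null_mut(),
        size,
        libc::PROT_READ | libc::PROT_WRITE,
        libc::MAP_SHARED,
        privcmd_fd,
        0,
    );
    // Without this second step the mapping is unusable, which is what broke
    // the generic "mmap should just work" assumption in the rust-vmm crates.
    libc::ioctl(privcmd_fd, HYPOTHETICAL_PRIVCMD_MAP, addr, libc::c_int::from(domid));
    addr
}

fn main() {
    println!("compare map_guest_memory_kvm() with map_guest_memory_xen() above");
}
```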
For now, we have hacked the Xen kernel code, in the drivers/xen/privcmd.c file, in such a way that everything which is done by the ioctl for mmap is done in mmap itself. It works for us right now, but it is not a solution we can upstream, and we still need to find a proper solution for this.

What's next? We have been looking to upstream the vhost-user-master crate, which we picked up from Cloud Hypervisor, and then Xen vhost master, which we implemented ourselves. We already have a few pull requests to the rust-vmm community on GitHub to get things started and start merging these things bit by bit. We need a standard mmap interface, as we discussed earlier. We also think there is an opportunity to improve performance by reducing the number of context switches required for the irqfd and eventfd path from the guest to the backend, and then there is scope for guest memory space privatization. We will cover them one by one.

This is the previous slide; it shows the indirect irqfd/eventfd path. When control passes from the guest VM to the host kernel, it eventually reaches the Xen vhost master here, which is implemented via the Xen IOREQ mechanism. When the event reaches here, saying that some data is present in the virtqueue, we send the eventfd from here; it goes into the kernel again and comes back to the backend. So we wake up user space just to wake up another thread, and there is an unnecessary context switch to the kernel and then back to user space which we want to avoid. What we think is that if the kernel could somehow handle this interrupt from the VM and straight away pass it on to the backend, it would be great. We do not have a solution for it right now, but I think Paolo mentioned something this morning regarding in-kernel event delivery that could be very interesting for us.

Then comes guest memory space privatization. In the current setup, the backend can see the entire guest address space, which is of course not ideal, because we may want the guest VM to be secure, and it does not want to expose its entire address space. The virtqueues in the guest VM are always mapped on the host side, so the host can always see the virtqueue area, but the data buffers which are filled by the guest, those are the tricky part. There are three solutions for how we can map them. The first one is a per-request map. This is something the Xen community has already agreed to to some level, and some of it is merged in the Linux kernel; it is known as the Xen grant, or IOMMU-based, solution. The problem here is that there can potentially be multiple buffers for each VirtIO request, and then we would be required to map all these buffers on the fly for every single request, and we worry that it will be too slow and really not efficient. But this is already merged, and we will see. The other solution we can think of is a carved-out region, where a fixed region is set up between guest and host into which all these buffers are copied. It will not be zero copy, but it will be fast, and it will have its own problems. And the last one, which was again suggested from Linaro by Arnd Bergmann, is fat virtqueues. The idea here is that since the virtqueue memory is always mapped, why not put the data right into the virtqueue? The problem is that if the data is small, yes, it can make sense, but the virtqueue memory is not infinite, it is still finite, and we cannot put really big chunks of data into it. So there are a couple of solutions here, but we do not really know yet which is the right one to go forward with.
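To make the fat virtqueue idea a bit more concrete, here is a purely conceptual Rust sketch of the trade-off. The struct layouts are invented for illustration and do not correspond to the actual VirtIO descriptor format or to any proposed specification change.

```rust
// Classic descriptors point into guest memory, which the backend must be able
// to map; a "fat" descriptor would carry the payload inline in the
// always-mapped queue memory, at the cost of a copy and a hard size limit.

const FAT_SLOT_SIZE: usize = 256; // queue memory is finite, so inline payloads are capped

/// Classic descriptor: indirection into guest memory (zero copy, but the
/// backend needs access to the referenced pages).
struct IndirectDescriptor {
    guest_addr: u64,
    len: u32,
}

/// "Fat" descriptor: data embedded in the queue memory itself.
struct FatDescriptor {
    len: u32,
    data: [u8; FAT_SLOT_SIZE],
}

fn main() {
    let small_request = b"read sector 42";

    // Small payloads fit comfortably inline.
    let mut fat = FatDescriptor { len: small_request.len() as u32, data: [0; FAT_SLOT_SIZE] };
    fat.data[..small_request.len()].copy_from_slice(small_request);

    // A multi-megabyte block transfer cannot be inlined and still needs the
    // indirect form, or one of the other mapping schemes discussed above.
    let big_transfer = IndirectDescriptor { guest_addr: 0x8000_0000, len: 4 * 1024 * 1024 };
    println!("inline {} bytes, indirect {} bytes", fat.len, big_transfer.len);
}
```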
That's pretty much it. You can join us at the Project Stratos webpage. We have a dedicated mailing list where we discuss all the things that are going on, and we have fortnightly Stratos meetings, and rust-vmm meetings as well, where you can join and discuss if you would like. So that's pretty much it. If you want to ask any questions, me and Alex are here. Do we have time for questions now? Yeah. Any questions? Obviously not, okay. So if you have any questions, just grab us during the rest of the forum and we will happily answer them. Thank you.

Sure. Sorry, what was that? So the question was what our general experience was of switching over from Rust to C. From C to Rust. C to Rust, sorry. Yeah, neither of us had done any Rust before, so it was a fairly steep learning curve. I had tried to dabble with Rust before a couple of times and generally just got lost after I had gone through the first Hello World examples. What really helped for me was actually going on a week-long intensive course where I could do Rust every day and get the model in my head. How did you find it? So I think Alex started the Rust training first, while in my case I wrote a couple of backends and upstreamed them before I got a chance to do the training, but it is a fantastic language. I cannot really vouch for everything they promise; they do claim a lot of things, like that the performance will be as good as what you can get with C. I cannot confirm that, because I have not tested it at great length, but the language is fantastic. The kind of problems it solves for us, like the races it avoids, is really good, and it is an easy language to learn. It is not really difficult, just a few things and that's it. I think the main thing is getting your head around the Rust model. It is like pointers: you have to understand pointers, and then everything is fine.