Good morning, everyone. My name is Jens Freimann, I work for Red Hat. I heard there's a few of us here. I work in the virtualization team, and most of my time I spend working on DPDK and virtio topics, more specifically virtio-net. Today I want to talk about hardware accelerators for virtio, because they are becoming a thing now, they're becoming real, and there's more interest in them. So today I want to talk about what was done in the virtio specification for this, and what has already happened in terms of implementation in the software stack, in QEMU, KVM and so forth.

Very shortly, virtio: it's a standard way to provide paravirtualized drivers for devices. Basically, it's how you make your guest talk to your host. There's the virtio core, which is the virtqueues, and on top of that you can build devices for networking, for block, for audio, input devices, for anything you want, actually. Two terms I will use a lot during the presentation: the driver is the part in the guest, and then there's an emulated device in the hypervisor, and that's what I would call the device. Virtio is an OASIS standard, so there's a specification for it, and I will talk about how the specification process works and how you can participate as well.

But first of all: hardware acceleration. Why do you even want hardware acceleration for virtio? Well, it seems these days people want hardware acceleration for everything. But why would you not just use a pass-through device? That basically gives you all the performance you would want. There are a few advantages you could have with virtio-capable hardware accelerators, though. For one, live migration with pass-through devices can be a pain, and it can be much easier if you have a virtio device actually in hardware: your guest doesn't know whether it's talking to a software-implemented device or a real hardware device, so for migration you simply use the existing virtio framework for live migration. Obviously, you would like to have lower latency and higher throughput for networking. You could have better hardware isolation for your VMs, so you could assign single queues of the hardware to a specific VM. You could implement bandwidth limitations and things like this with hardware support, without actually spending CPU cycles on that. And all of this implies you free up cores, so you get more cores to run your actual workload, your actual business logic.

So, I said there's a specification for virtio. It helps people who want to implement features in virtio or implement new devices. The next version of the virtio specification will be 1.1, and it's currently in its final public review; I think there's a bit less than 30 days left before 1.1 will be released. The focus for 1.1 was mostly performance optimizations, but also making it easier for hardware vendors to implement the virtio ring, or virtio in general, in hardware. So there are lots of new features, contributed by many different people from the community, from different companies and vendors. The current draft of the specification you can download here: there's a PDF, and also the LaTeX source if you would like to read that.

So what did we do for the specification, what did we change? One big topic was packed virtqueues. Packed virtqueues are a simpler form of virtqueue: the structure in memory is much simpler. Before, we had two rings in memory and several fields that needed to be read and written; now it's much more compact and dense, and that's friendly for hardware implementations, because they only have to go to one location in memory most of the time. The ring size also changed with packed virtqueues: it doesn't have to be a power of two anymore. Packed virtqueues are one big new chapter in the specification, and I talked about them in a lot more detail at last year's FOSDEM; that presentation is on YouTube, so if you're interested in how this works in depth, you can go and watch it.
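To make the layout concrete, here is a minimal C sketch of the packed descriptor format as I read the 1.1 draft; the names follow the spec's conventions, but treat this as an illustration, not the normative definition:

```c
#include <stdint.h>

/* Little-endian fields, per the spec's le16/le32/le64 notation. */
typedef uint16_t le16;
typedef uint32_t le32;
typedef uint64_t le64;

/* One descriptor; the packed ring is just an array of these in memory. */
struct pvirtq_desc {
    le64 addr;  /* guest-physical buffer address */
    le32 len;   /* buffer length in bytes */
    le16 id;    /* buffer id, echoed back by the device on completion */
    le16 flags; /* availability bits plus the usual WRITE flag etc. */
};

/* The driver marks a descriptor available by toggling AVAIL; the device
 * marks it used by toggling USED. Each side keeps a wrap counter to tell
 * new entries from stale ones, so no separate avail/used rings exist. */
#define VIRTQ_DESC_F_AVAIL (1 << 7)
#define VIRTQ_DESC_F_USED  (1 << 15)
```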
Another thing: ordering of memory accesses. Software-implemented devices and actual hardware can have different requirements here. A hardware-implemented device will often need stricter memory ordering, while for a software device the memory barriers suitable between CPU cores are enough. So a feature bit was added for this: if that bit is present, it means the driver has to use the stronger memory barriers. This is also in the specification.

Then, on some platforms, memory accesses can be restricted in some way: addresses might be translated, or not all memory can be accessed. The device could be behind an IOMMU that translates bus addresses to physical addresses, or some special operation could be needed for memory updates to become visible, a cache flush for example. So there's a new feature bit that indicates exactly that. If the bit is present, it means some platform code needs to kick in and handle this; if it's not present, it means the opposite, that the driver sees the same physical addresses that the device sees.

There's now a way to enable and disable notifications, for both device and driver, and this is done through a data structure in shared memory. There's also a way to set a flag plus a value that means: only notify when this specific descriptor has become used or available. Why is this good for hardware? Well, it's a good thing in general, not only for hardware, because you possibly have to handle fewer notifications; but in the case of hardware it also means fewer transactions over the PCI bus.

In a similar vein, there's an enhancement that adds more data to notifications. Before, we would only put the virtqueue number into a notification, so that the device knows: in this virtqueue there's something new. With this, you add additional data: for packed virtqueues the offset into the descriptor ring and the wrap counter, and similarly for split virtqueues. That can save the hardware some speculative reads.
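To pin those last two points down, here is a small sketch of what the event suppression structure and the extra notification data look like for a packed queue, again following my reading of the 1.1 draft; field names are approximate:

```c
#include <stdint.h>

/* Event suppression area, one per direction, in shared memory.
 * flags = 0 enables notifications, 1 disables them, and 2 means:
 * only notify for the descriptor identified by 'desc'. */
struct pvirtq_event_suppress {
    uint16_t desc;  /* bits 0-14: descriptor ring offset, bit 15: wrap */
    uint16_t flags; /* enable / disable / notify-at-desc */
};

/* With the notification data feature negotiated, a driver notification
 * for a packed queue carries the queue number plus the ring position: */
static uint32_t notify_data(uint16_t vqn, uint16_t next_off,
                            unsigned next_wrap)
{
    return (uint32_t)vqn |
           ((uint32_t)(next_off & 0x7fff) << 16) |
           ((uint32_t)(next_wrap & 1) << 31);
}
```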
We've talked about the spec now, and that's all theory, on paper. Let's talk about how this was implemented in software and the status of the implementations. For packed virtqueues there is full support in DPDK: the vhost part was upstreamed in DPDK 18.11, and the virtio poll mode driver will be in 19.02. For QEMU, patches are still under review, but in the final stages, I would say. In the kernel, the virtio-net driver part was accepted upstream; the vhost part not yet, but hopefully soon as well. And of course packed virtqueues, as I said, will be in the 1.1 spec.

But how do you integrate your new hardware card into QEMU and KVM, and how do you make it work with the kernel and everything, if possible in a generic way that avoids the mess where every vendor does their own thing for their card? Intel has been doing a lot of work here and came up with a framework for hardware accelerator cards. It's called vDPA, which is short for vhost data path acceleration. The main goal is to decouple the data path and the control path, because you want the data path (DMA, notifications, queue interrupts, etc.) to be really fast, with no QEMU involved if possible: the device basically DMAs into shared memory, and that memory can be accessed by the virtio driver in the guest. So what they did is implement a new QEMU vhost client, called vhost-vfio, which sets up a vDPA mediated device. Over this interface you run the control path, which can be slow, because it just sets up the device, sets up the vrings, things like this. What are the benefits? You have a consistent device interface towards the guest OS. You have more flexibility in hardware design; the card doesn't even have to be a full PCI device. And you can use the existing live migration framework.

I mentioned vhost-mdev; that's another new thing. vhost-mdev is an mdev-based vhost backend for hardware. The idea is that you can set up this vhost backend in hardware just like you set up a software-based backend, and you would even use vhost messages for it, the same format that is used for vhost ioctls or vhost-user messages. Mediated devices (mdevs) are a standard way to have emulated devices in the kernel, and this implements an mdev device that can accept vhost messages and deliver them to the accelerator driver. The device exposes a non-vendor-specific interface: it has a BAR you can basically write these vhost messages to, and you would use another BAR for notifications. This way you achieve the separation of data and control paths. Status-wise, this is actually only in RFC state. It was proposed by Intel, but there are still discussions going on upstream about where the vendor-specific drivers should live: Intel proposed to have them in the kernel, and there are other opinions saying we could also have small vendor-specific drivers in QEMU. So this is still ongoing.

I stole a slide from an Intel presentation at KVM Forum to give a little bit better overview of how it would work. You basically create an mdev and then pass it on the QEMU command line, and then you can see how QEMU is basically skipped for DMA and for notifications. You can also see that there's a more generic vhost-mdev driver and a parent driver; the mdev parent is then the actual accelerator driver.
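As a rough illustration of the "create an mdev" step: mdev instances are generally created by writing a UUID into a 'create' node that the parent driver exposes in sysfs, and the resulting device is what you would then hand to QEMU. The PCI address and type name below are invented for the example; the real ones depend on the vendor driver:

```c
/* Hypothetical sketch: instantiate a vDPA mediated device by writing a
 * UUID into the parent device's 'create' node in sysfs. */
#include <stdio.h>

int main(void)
{
    const char *create =
        "/sys/bus/pci/devices/0000:05:00.0/"
        "mdev_supported_types/vendor-vdpa-net/create"; /* made-up type */
    const char *uuid = "83b8f4f2-509f-382f-3c1e-e6bfe0fa1001";

    FILE *f = fopen(create, "w");
    if (!f) {
        perror("fopen");
        return 1;
    }
    /* On success the new instance shows up under
     * /sys/bus/mdev/devices/<uuid>. */
    fprintf(f, "%s\n", uuid);
    return fclose(f) == 0 ? 0 : 1;
}
```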
Now, this is the actual hardware that exists today. We've talked a lot about hardware, and there's one card that exists, and even that was only announced last year, at the OVS conference. It's expected that you can actually buy it in the first quarter of 2019. It supports virtio-net hardware offload; actually, it even does full OVS acceleration. It's FPGA-based, there are tools to program it, it's very advanced actually. Live migration would work here as I described before. And they implemented not only virtio 1.0, it goes back to 0.95, so you could even use old drivers, old distros, to run this. There are more detailed talks about this, so I won't go into more detail here: at the OVS conference there were two talks, the first one more high level, so a good overview, and the second one going more into what it all supports, what tools are available and how to use them.

So, actually already my last slide. What I wanted to convey is that we see increased interest from hardware vendors in implementing virtio in hardware, in hardware accelerators. There's a handful that I know of, but Intel is the first one that actually came out and announced something; that's the one device I talked about. Intel also worked on the software side: they sent patches to DPDK, and they are already accepted. In DPDK there's no vhost-mdev or anything like that yet; what we have in DPDK is support for vDPA, and there's an example driver for that card already in there, and an example application that uses the driver, so an example that makes use of the hardware card.

And we have the virtio specification; that's one thing to take away. A new version is coming out very soon, and there's still time to review it. If you work for a hardware vendor, or you want to work on something like this, please take a look, and actually participate in the community. There's a mailing list where everything is discussed, and a mailing list that you submit patches for the spec to. Say you're working on a new feature and you want to make sure you get it in, but it will take you some more time to implement it: the first thing you might want to do is reserve a feature bit, and that can be a one-line patch to the specification, as a first step. Later on we will want to see an actual implementation in QEMU, but that doesn't have to come immediately; the first step can be very simple. You can also take part in our monthly meeting, which takes place on the phone. That's only in addition to the mailing list, but there we can have more person-to-person discussions about topics; if you have questions about something, we can discuss them there. Everything important, however, will be moved to the mailing list: any discussion that everybody needs to be able to take part in will be moved to the mailing list.

And I think that's it from me. Are there any questions?

Yes? Can you speak up? No, I don't think I said that. The question was that I mentioned a limitation on the number of vrings that can be offloaded, but I think that's a misunderstanding. Which slide? Ah, this "available instances": those are the mdev instances that are already somehow instantiated in your host system. I don't know whether there's a maximum number of mdevs, that's outside of the scope here, but I can find that out. This basically just shows the number of available instances at the moment in the system. We can talk offline. Any other questions? Thank you very much.