Thank you, it's good to be here. I'm going to give you a status update on Xen on ARM. A good place to start is about a year ago, at XenSummit in San Diego, alongside LinuxCon North America, when Xen on ARM was presented. About thirteen months have passed, so it's a good time to look at how far we have come. A month later, in September, Xen support for ARM went into Linux 3.7. A month after that, Xen booted for the first time on real hardware, a Versatile Express Cortex-A15. In January came ARMv8, that is 64-bit, support in Xen. In March Citrix announced that it was going to join Linaro, which is a high-profile forum for doing open source development for the ARM ecosystem. And that is relevant because this way Citrix can make sure that Xen is looked after and also plays well with the other components: the Linux kernel, of course, but also bootloaders, GRUB, UEFI, hardware description formats, and so on. In June, Xen support for ARM64, that is ARMv8, went into Linux, in Linux 3.11. And in July, Xen 4.3 was released with support for ARMv7 and ARMv8, 32-bit and 64-bit.

I also tried to measure the interest in the project in a different way, by looking at the emails about ARM on the Xen mailing lists since 2012. More importantly, 39% of these emails are not from Citrix; that means that they are not from the core team. Of course, there are a lot of individuals, Gmail accounts, but there are also lots of companies that are clearly trying Xen on ARM, evaluating Xen on ARM, even helping out and sending patches. Some of these companies are listed here. Some of these companies are going to present afterwards in this track. So there is clearly a lot of interest in the project.

Let's see the status. Today Xen on ARM supports the Versatile Express Cortex-A15, the Arndale board, and the ARMv8 emulator. This is in 4.3. Now, in progress, we have a port to the Calxeda Midway, which is a quad-core Cortex-A15 SoC; the Applied Micro Mustang, which is an ARMv8 64-bit platform; the Cubieboard2, which is a cheap, Cortex-A7 based development board; Broadcom's B15; and OMAP5. So there is a lot of porting in progress.

In terms of features, what do we support? Xen 4.3 supports all the basic lifecycle operations, so you can create VMs, destroy VMs, reboot VMs, pause VMs, and so on. It also supports memory ballooning, so you can dynamically increase and decrease the amount of memory of your VM, and all the scheduler configuration and VCPU pinning. Xen has a pluggable scheduler architecture, so you can select whatever scheduler you like from a set, including real-time schedulers, and they are very tweakable, so there are lots of parameters you can change. You can also pin a VCPU to a physical CPU, so you can say: this virtual CPU gets 100% of the time of this physical CPU. And this is all supported on Xen on ARM today, already in the last release. Linux 3.11 supports booting on Xen on ARM as dom0 and as domU, 32-bit and 64-bit. It supports multiple VCPUs, so SMP on the guest side. It supports paravirtualized disk, network and console, so the main core paravirtualized protocols.

So what's coming in the next Xen release? Xen 4.4 is going to be out at the beginning of next year, in the January-February timeframe. We are looking at filling all the gaps that we have today towards production use, and these are the main gaps. The first is 64-bit guest support.
Even though Linux can run 64-bit on Xen on ARM, we don't have the tool support to be able to create unprivileged 64-bit virtual machines. Ian sent a patch series on this, so it should get in for 4.4. Then live migration: there is going to be a talk later today on this subject by Samsung, and I encourage you to follow it; I also expect it to make it in time for 4.4. And finally, the software IOTLB.

The first two items on the list are pretty clear, but what is the software IOTLB, why do we need it, and why is it important? Well, this is where the clear and simple status update ends and the very convoluted explanation of the software IOTLB starts. There is actually a blog post with a bit more information on this.

So, the problem. The problem is that on ARM all the guests run in an HVM container, with nested paging enabled; this is Xen terminology. Using ARM terminology, all the guests run with second stage translation. It means that what Linux sees as a physical address is not actually a physical address: it needs to be translated into a machine address. And this is true also for dom0. So when dom0 goes and programs a device to do DMA, it is going to use the physical addresses it sees, which are not the real physical addresses. As a consequence, the device is going to go and read or write the wrong pages, ending up corrupting memory (there is a small toy example of this right after this section).

The best solution you can think of would be to use an IOMMU driver in Xen, for example an SMMU-400 driver. This way Xen could probably use the same set of page tables it is using to set up the second stage translation for dom0 also for the devices. So when the device starts the DMA transfer, the IOMMU translates the physical addresses into machine addresses, and therefore everything works: the device ends up reading or writing the correct set of pages. However, not all the SoCs have an IOMMU. And more importantly, we don't have a driver in Xen for the IOMMU anyway. So either way, we need another solution.

What we have been doing so far is to map dom0 memory one-to-one; we call it the one-to-one workaround. That is, physical addresses are equal to machine addresses for dom0, and dom0 only. This works because, of course, if the physical addresses are the same as the machine addresses, dom0 programs the devices with the correct addresses in the first place. Now, the problem with this approach, and the reason why we never liked it, is that it's a rigid solution: Xen is not free to allocate the memory for dom0 from any range in memory, it has to pick one precise range and assign exactly those addresses to dom0. It also prevents us from using ballooning in dom0, so you cannot increase or decrease the memory of dom0, otherwise you would break the one-to-one, or add pages that do not belong to the one-to-one. Page sharing is a feature that is not very commonly used in Xen deployments, but it also breaks with the one-to-one.

Most importantly, probably the worst of these limitations is about foreign grants. When a guest frontend grants a page to the backend to be used as part of a paravirtualized protocol, so block, network and so on, the backend in dom0 is going to map this page and then may or may not do DMA on it. If this page belongs to another guest it is, of course, not going to be part of the one-to-one, therefore if it is used for DMA directly, it's going to break.
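To make the problem a little more concrete, here is a tiny stand-alone C model of it. Everything in it (the translation table, the numbers, the function names) is invented for illustration; it is not Xen or Linux code. It just shows that a device, which is not subject to second stage translation, lands on the wrong machine frame if dom0 hands it a guest physical frame number, unless the two happen to be equal, which is exactly what the one-to-one workaround enforces for dom0's own memory but can never enforce for pages granted by other guests.

```c
/* Toy model (not Xen code): why dom0 DMA breaks under stage-2 translation.
 * All names and numbers here are made up for illustration. */
#include <stdio.h>

#define NPAGES 8

/* Hypothetical stage-2 translation: guest physical frame -> machine frame.
 * With the one-to-one workaround, gpfn_to_mfn[i] would equal i for dom0. */
static int gpfn_to_mfn[NPAGES] = { 5, 2, 7, 0, 1, 6, 3, 4 };

/* The device only sees machine frames: it bypasses stage-2 translation. */
static void device_dma_write(int machine_frame)
{
    printf("device writes machine frame %d\n", machine_frame);
}

int main(void)
{
    int gpfn = 3;                    /* where dom0 thinks its buffer is   */
    int mfn  = gpfn_to_mfn[gpfn];    /* where the buffer actually lives   */

    /* naive dom0: program the device with the address it sees -> frame 3,
     * which is some other owner's memory, so memory gets corrupted       */
    device_dma_write(gpfn);

    /* correct behaviour needs an IOMMU doing this translation in hardware,
     * or gpfn == mfn (the one-to-one dom0 mapping), or a software
     * translation layer such as the software IOTLB discussed next        */
    device_dma_write(mfn);           /* frame 0, the right one            */
    return 0;
}
```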
So for this reason we were very unhappy about this solution, and we tried to improve the situation by porting the software IOTLB driver in Linux to ARM and to auto-translated guests. What I mean is this: first of all, the software IOTLB is a library of functions in Linux, but when I say software IOTLB, I mean the software IOTLB Xen driver that uses this library of functions. Today it exists for PV guests: it solves a similar problem that PV guests also have, which is that physical addresses do not correspond to machine addresses on x86 for PV guests. Now, aside from porting it to ARM, there is also the problem that PV guests are very different: they don't run in an HVM container, they don't have nested paging. So it actually needed to be changed and modified to run in this environment. Once that is done, the software IOTLB gives you a layer of translation within Linux, translating physical addresses into machine addresses before doing a DMA transfer, so that, again, the device is programmed with the right set of addresses from the start. The translation is done cooperatively with the hypervisor: in practice, the software IOTLB asks the hypervisor what the translation of an address is, and then uses it. We also hit a lot of memory coherency problems going from x86 to ARM, because there were a lot of assumptions made on x86 that weren't holding true on ARM.

After fixing all these problems, what we did was introduce a new hypercall so that Linux could allocate a set of pages and pass them to the hypervisor, and the hypervisor would make them contiguous in machine address space, so make sure the buffer is contiguous in the actual machine address space, and then return the machine address of the buffer to Linux. Now Linux has a safe buffer to use for DMA and can bounce all the DMA requests into this buffer, knowing the right addresses of all the pages in the buffer (there is a toy sketch of this bounce step at the end of this section). The good side is that we could completely remove the one-to-one workaround, and at the same time dom0 drivers that do DMA, the network card driver for example, kept working without issues. The bad side is that it introduces an additional memcpy for each DMA transfer. So clearly this was not an ideal solution, and we were still unhappy about it.

So we tried to improve it further. The problem is the additional memcpy; how can we remove it? What we thought about doing was to dynamically translate the physical address of a page into a machine address by asking the hypervisor. Now, it's not that simple. First of all, Xen doesn't make any guarantee that the physical-to-machine mapping stays the same during the runtime of a virtual machine. So it's not just about getting the current machine address of a page, it's also about pinning, like freezing, the physical-to-machine mapping of that page, and then obviously giving it back to Linux. Linux would keep track of these pinned pages, pages with well-known machine addresses, using a red-black tree. At this point the software IOTLB would be able to use the original page that Linux allocated to do DMA; it didn't need to bounce the buffer every time anymore, so the memcpy could be removed. However, the red-black tree maintenance turned out to be expensive in Linux, and in order to do this translation we had to do many guest-virtual-to-machine and guest-physical-to-machine translations in the hypervisor, all uncached. As a result the CPU utilization increased so much that it was higher than in the memcpy case.
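Here is the toy sketch of the bounce step mentioned above: a stand-alone C model, not the actual Linux swiotlb-xen code, of copying DMA data through a buffer whose machine address is known. The buffer, its fake machine address, and the function names are all invented for illustration; in the real implementation the buffer is made contiguous in machine address space by a hypercall to Xen, which is only hinted at in the comments.

```c
/* Stand-alone toy model of the bounce-buffer approach (not kernel code).
 * All names and addresses are invented for illustration. */
#include <stdio.h>
#include <string.h>

#define BOUNCE_SIZE 4096

/* In the real implementation these pages are handed to the hypervisor,
 * which makes them contiguous in machine address space and returns the
 * machine address; here both are simply faked.                          */
static char bounce[BOUNCE_SIZE];
static const unsigned long bounce_machine_addr = 0x80000000UL; /* made up */

/* Memory-to-device direction: copy into the safe buffer and return the
 * machine address the device should be programmed with.                 */
static unsigned long map_for_device(const void *buf, size_t len)
{
    memcpy(bounce, buf, len);
    return bounce_machine_addr;
}

/* Device-to-memory direction: after the transfer, copy the data back
 * out of the bounce buffer into the original driver buffer.             */
static void unmap_from_device(void *buf, size_t len)
{
    memcpy(buf, bounce, len);
}

int main(void)
{
    char driver_buf[64] = "data the driver wants to send";
    unsigned long dev_addr = map_for_device(driver_buf, sizeof(driver_buf));

    printf("program the device with machine address 0x%lx\n", dev_addr);

    /* every transfer pays for one extra memcpy: that is the cost that
     * motivated the later attempts to remove the bounce step            */
    unmap_from_device(driver_buf, sizeof(driver_buf));
    return 0;
}
```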
So, in fact, the dynamic translation with pinning was not the improvement we were hoping for. Of course we could have kept going in this direction, trying to cache the translations, or pinning a larger set of pages and keeping them pinned in Linux, or changing the data structure we use to keep track of these pinned pages. But maybe it was best to go back and revisit our approach from the start.

If we go back to what the problems with the one-to-one workaround are: well, it's a rigid solution, but maybe it's time to climb down from the ivory tower we stuck ourselves in and come to a compromise. No ballooning in dom0? Actually, lots of Xen deployments on x86 today do not use ballooning in dom0; XenServer does not use ballooning in dom0. So maybe it's not such a big deal. Page sharing is the same: not many deployments use page sharing at all. So the only real problem among these four points is the last one. We do need to be able to handle foreign grants, otherwise we break the current paravirtualized frontend and backend protocols.

So the way we did it is that we kept the one-to-one workaround and designed the software IOTLB for ARM, from the start, just to take care of the foreign grants problem. If we start from this assumption, then we don't need any pin and unpin hypercalls, because the grant mapping itself already gives you back the machine address of the grant being mapped. Also, we can take lots of shortcuts, because knowing that dom0 is one-to-one and that we only need to do lookups for foreign grants is a huge advantage. Finally, of course, the tree is going to be smaller, because not all the DMA requests, actually just a small portion of the DMA requests, end up involving foreign grants. So overall the tree is smaller, the lookups are faster, the tree maintenance is faster, and so on (a small sketch of this lookup path is included at the end of this section). And of course all of this is still avoidable if you have an IOMMU on your SoC and an IOMMU driver.

This is the last solution. We tested it on a quad-core 1.5 GHz Cortex-A15 with a one gigabit link, and the result is that we finally had the same throughput as native, and as Xen on ARM with just the one-to-one workaround and nothing else, with less than 2% of CPU utilization increase. That is exactly what we were aiming for, both in terms of performance and in terms of solution. Now, the patches are out there, but they are not in Linux yet. Initially the work required some changes for the pin and unpin hypercall and so on; the latest version, because we dropped that part, is only a Linux-side patch series. It is big, to be honest, so I'm not sure it's going to make it into the next merge window, but if it's not going to be 3.13 it's going to be 3.14. We also have the kernel trees available there.

So, enough about the software IOTLB. What's left to be done? This was the last item for the Xen on ARM 4.4 release, so what's missing? There is a lot of bug fixing and stabilization that we need, and even though we are a bit far from the next release, it would be good to start now. So if you try Xen on ARM on your board and you find a stability issue, or there is a bug to be fixed, I encourage you to submit a patch and we'll apply it as soon as possible. Benchmarks are also interesting: especially if you are working for a hardware SoC vendor and you are trying Xen on ARM on your SoC, it would be very useful if you ran some benchmarks, maybe found some bottlenecks, and fixed them. There is still a lot of low-hanging fruit here, so it's relatively easy to gain a 5-10% performance improvement just by fixing a couple of functions at the moment.
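Here is the sketch mentioned a moment ago: a stand-alone toy model, not the actual Linux patch series, of the "one-to-one plus foreign grants only" lookup path. The structure and all the names are invented for illustration; the real series keeps the foreign pages in a red-black tree populated when the grant is mapped, since that is the moment the machine address is known, while this model just uses a small array.

```c
/* Toy model of the final design (not the real patch series): dom0 pages are
 * mapped one-to-one, so their addresses can be used for DMA directly; only
 * foreign grant pages need a lookup, in a structure filled in at grant-map
 * time (the real series uses a red-black tree). Names are invented.       */
#include <stdio.h>
#include <stdbool.h>

struct foreign_page {
    unsigned long gpfn;   /* pseudo-physical frame as seen by dom0   */
    unsigned long mfn;    /* machine frame returned by the grant map */
};

/* Small, because only a fraction of DMA requests involve pages granted
 * by other guests.                                                      */
static struct foreign_page foreign[16];
static int nr_foreign;

static bool lookup_foreign(unsigned long gpfn, unsigned long *mfn)
{
    for (int i = 0; i < nr_foreign; i++) {
        if (foreign[i].gpfn == gpfn) {
            *mfn = foreign[i].mfn;
            return true;
        }
    }
    return false;
}

/* Translate a dom0 "physical" frame into the frame to program a device with. */
static unsigned long dma_frame(unsigned long gpfn)
{
    unsigned long mfn;

    if (lookup_foreign(gpfn, &mfn))
        return mfn;   /* foreign grant: use the recorded machine frame */
    return gpfn;      /* dom0's own memory: one-to-one, so gpfn == mfn */
}

int main(void)
{
    /* pretend a grant from a guest was mapped at dom0 frame 42 -> mfn 1000 */
    foreign[nr_foreign++] = (struct foreign_page){ .gpfn = 42, .mfn = 1000 };

    printf("dom0 page 7     -> frame %lu\n", dma_frame(7));    /* 7    */
    printf("foreign page 42 -> frame %lu\n", dma_frame(42));   /* 1000 */
    return 0;
}
```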
So that would be another area where you can help us. The biggest items are the last four over there. Of course, the famous IOMMU driver in Xen on ARM, and now you know why it's so important. Device assignment is probably the last high-profile feature that we're missing: it's about assigning a device to a VM that is not dom0. The hypervisor already has all the capabilities needed to do it, but we don't have support in the tools, so you cannot go and write a nice configuration file for it; it's not there yet. And finally, UEFI and ACPI. These are two technologies that are coming to ARM. UEFI is relatively easy to support; ACPI is a different order of magnitude of complexity, but we are working on it within Linaro.

I'm going to use the last few minutes I have to show you a quick demo. This is Xen booting on a Calxeda Midway SoC. It's going a bit quick, but I'm going to slow down... initializing Xen. In a moment I'm going to go back to the top. Yeah. So this is Xen 4.4, so xen-unstable, booting on an ARM 32-bit processor. The platform is Calxeda Midway. It found four physical CPUs and initialized them in the Xen hypervisor. Then dom0 is booting Linux, Linux 3.11-rc2. Going back... and now I'm just going to start a couple of VMs. This is, as you know, the normal xl tool, which works on ARM, listing one domain that is just dom0, with 128 megabytes of RAM and one processor. As you know, you don't need to assign dom0 the same number of virtual processors as there are physical processors, so we just gave it one.

This is the configuration of the first VM I'm about to create. The Linux builder setting doesn't actually mean much in this case, because we only have one type of guest; this is mostly useful on x86 to distinguish between PV and HVM guests. You have the kernel; we are specifying 512 megabytes of memory and two VCPUs; and the disk, which is interesting because it's a physical partition. Being a physical partition means that the pages passed by the frontend to the backend are going to be used directly for DMA. In other words, without the software IOTLB it crashes. Now I'm going to create the VM. This is the boot output that you normally only get using xenconsole; I'm connected to the console. There are two processors, as you can see, in the guest, 512 megabytes of RAM, a network card; we got the IP; we can ping google, always a good sign.
Now I'm starting a second VM. This one is different: as you can see it's called SUSE, and in fact it's openSUSE. systemd is starting; it's an openSUSE virtual machine, kernel 3.12. XenVM is the name of the virtual platform we expose to virtual machines. There is a network card here as well. Now I'm just going back to retrieve the IP address of the first virtual machine so I can show that they ping each other... yeah, they ping each other.

Now I'm creating the last virtual machine, and this one is also different. Julien here ported FreeBSD, in his own spare time, to Xen on ARM in the last couple of weeks. This actually demonstrates how easy it is to port another operating system to Xen on ARM, and the reason is that on ARM we don't require invasive changes to the guest kernel; it's actually just a matter of writing a couple of new drivers. That's also why it was possible to compile it with Clang. And yeah, we have a disk, and yeah, it's FreeBSD for real. And that's it. Now I was just basically showing that vcpu-pin works; if you are already familiar with the vcpu-pin command, it's actually trivial: vcpu-pin works exactly the same as on x86. So that's all. If you have any questions...

With the software IOTLB and the one-to-one workaround, how is that going to work with driver domains? Yes, so that's not going to work with driver domains, but if you want to use driver domains you probably require an IOMMU anyway. If you have an IOMMU then it works. But since we know there are SoCs that don't have an IOMMU available for this... yeah, without an IOMMU it's going to be very hard to do driver domains.

Another question, and I suspect the answer will be the same: do you have problems doing DMA to a grant of memory if it's not contiguous? No, because the software IOTLB takes care of it. But that's using memcpy into contiguous memory? On the block layer we force it to be basically only one page at a time, so yeah. Other questions?

So I suggest we run over a bit and just eat five minutes into the break, and make sure that all the other sessions in the other rooms are aligned, so everything will be five minutes late, because we started five minutes late.

Have you ever tried to take a domU partition and mount it in dom0? I did try, and it works. With the software IOTLB, what is the implication, given that dom0 has its own buffer cache and the domU has its own buffer cache? I mean, of course they do not synchronize; it may crash the entire file system. So yes, of course, when I said yes I didn't mean while the VM is running: it's not a good idea to mount a guest partition in dom0 while the guest is running. I meant after, or before; you can mount it to install things or edit it. Other questions? No? Well, thank you for the demo and a great talk, and maybe we don't have to be five minutes late.