Okay, can you hear me, guys? Can you hear me well? Okay, so I guess it's time to start. Sorry for the delay, but I was told I had to start right on time and also finish on time.

Hey guys, my name is Michal Privoznik. I'm a senior software engineer here at Red Hat, and for all of the time I've been employed there I've been working on libvirt. Today I'd like to tell you how you can performance-tune virtual machines using just libvirt.

So firstly, what libvirt is. Well, no, firstly, let me start with a question: who here has ever heard about libvirt? Oh wow, cool. And who hasn't? One, two, okay. So the majority. You'll have to bear with me for a couple of slides in the beginning, because I have to explain what libvirt is for those who don't know.

So if you guys have played with virtual machines, you know, hypervisors, you've probably noticed that each one of them is controlled completely differently. For instance, in QEMU you have to build this command line where you specify all the devices that you want your domain to have, whereas in, for instance, VirtualBox you have to click it in their GUI or use some API they expose. In Xen you have to create a config file, and stuff like that. So libvirt's aim is basically to hide all these details from you and create a unified configuration and management system, so that you don't have to bother with all these details and can just use virtualization and benefit from it.

We are implemented in C, which means we are just a C library.
A standard C library. However, we expose our APIs in many other languages out there, like Perl, Python, Java, Ruby, whatever, you name it. So from this point of view you are basically not at all limited when using libvirt in your project.

We provide a stable API, which means that whenever an API has been released we keep it around for the rest of libvirt's lifetime, and we try not to break it. As a part of that, the configuration format itself is considered stable as well. And as I said earlier, we have support for many hypervisors out there, like QEMU, VirtualBox, VMware, Xen in all its flavors, User-mode Linux, believe it or not.

Also, when it comes to playing with virtual machines, you know, managing them, you want to prepare the host environment in some way as well. For instance, you want to pre-create all the network devices, detach all the PCI devices that you are going to pass through to your domain, and stuff like that. Well, libvirt has a powerful set of APIs for that as well.

So how does domain configuration, or virtual machine configuration, look in libvirt? I maybe spoiled something there, because domains are how we refer to virtual machines in libvirt. Fun fact: who knows why we call them domains and not virtual machines?
Okay, who knows and is not a libvirt developer? So it has a historical background. In order to answer that question I have to rewind the clock back to 2005. That's when the libvirt project was started, and it started as nothing but a Xen wrapper. Even though we gained support for other hypervisors over time, we basically stuck with the naming that Xen uses, which refers to virtual machines as domains.

So the domain configuration is basically an XML document. Yet again, we had only two options back when we started the project: it was either XML or JSON. And because the guy who started the project also stands behind libxml2 and is part of the W3C XML Core Working Group, or something like that, I think the choice we made was pretty obvious.

So in the XML, the domain configuration is really just putting the correct elements into the correct places, or, you know, setting the correct values on those elements and attributes. And this XML not only describes the guest-visible part, it also describes the host-visible part. For instance, here in the <emulator> element I've told libvirt which emulator binary should be used when trying to execute my domain, and also a couple of things like what should happen when a lifecycle event occurs.

Just a second, please. Okay, so performance tuning with libvirt is really just putting the correct values into the correct places within this document. That's it.
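To give you an idea of what he is describing, here is a trimmed sketch of a domain document, not the full XML from the slide; the domain name and image path are made up for illustration:

```xml
<domain type='kvm'>
  <name>demo</name>
  <memory unit='MiB'>1024</memory>
  <vcpu>2</vcpu>
  <os>
    <type arch='x86_64'>hvm</type>
  </os>
  <devices>
    <!-- host-visible part: which emulator binary libvirt should execute -->
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <disk type='file' device='disk'>
      <source file='/var/lib/libvirt/images/demo.qcow2'/>
      <target dev='vda'/>
    </disk>
  </devices>
  <!-- what should happen when a lifecycle event occurs -->
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
</domain>
```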
Libvirt will then take care of the rest, like setting all the knobs in the host system.

So, yet again, a couple of historical backgrounds. The first approach to virtualization was so-called full virtualization, which is done today as well, but we will get to that. When it comes to full virtualization, the hypervisor basically does something called binary translation, which means it scans the code that the guest wants to run and replaces all the non-virtualizable instructions with a trap. Then, whenever the guest wants to run such an instruction, the trap is hit, control is handed back to the emulator, which will hopefully emulate the instruction and pass control back to the guest. This is nice, this is one big advantage, meaning that you don't have to modify your guest at all. You can just take whatever system you have, place it into a virtual machine, and run it as is.

However, historically this hasn't given you, you know, the best performance possible. So another approach was developed, and it was called paravirtualization. This is something that, for instance, Xen does. Basically, instead of the hypervisor scanning the guest memory and replacing the instructions, the hypervisor will expose a set of APIs and let the guest call those APIs whenever it wants to do something privileged. The problem with this approach is that you have to modify your guest, and, you know, given the rate at which new software is released nowadays, it wouldn't be feasible to catch up and do all the modifications. So even though back in the old days paravirtualization was used a lot, eventually the hardware vendors came in. They introduced something called hardware-assisted virtualization, which basically tries to eliminate the need for paravirtualization. In case you want to check whether your host is capable of this, what you are looking for is either the vmx CPU flag or
the svm CPU flag, depending on whether you are running on an Intel or AMD architecture. And for some reason, please don't ask me why, this may not be enabled by default on your motherboard, so you need to get your hands dirty, go into the BIOS, and enable it there. Don't ask me why, I'm no motherboard vendor.

This has been such a breakthrough that many hypervisors are built on top of hardware-assisted virtualization, notably KVM. Speaking of which, who knows the difference between KVM and QEMU? Come on, raise your hands, don't be shy. Okay. So, for those who don't: KVM is the actual hypervisor doing all the, you know, full virtualization as I've shown here, and QEMU is there to emulate the I/O. It works like this: KVM is a loadable module in your kernel. You will have to enable it if that's not already done by your distro. QEMU is the one who actually sets up, or calls, the APIs in the module, and whenever a guest wants to do some I/O, it gets to QEMU, which will then decide where the data should go.

How do you enable hardware-assisted virtualization in kernel... oh, sorry, in libvirt? It's really simple, you just select the correct domain type, which in this case is kvm. How does it perform? Well, I've done some testing here on my laptop previously, and what I basically measured is how long it takes for a guest to boot up.
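Selecting the correct domain type really is just the root element's type attribute; a sketch of the difference (the domain names here are made up) might look like:

```xml
<!-- plain emulation: QEMU translates every guest instruction in software -->
<domain type='qemu'>
  <name>slow-guest</name>
  ...
</domain>

<!-- hardware-assisted: guest code runs on the CPU via the kvm kernel module -->
<domain type='kvm'>
  <name>fast-guest</name>
  ...
</domain>
```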
So in the KVM case I was able to get a shell in somewhere around 10 seconds, including the five-second GRUB timeout. And if I wasn't using KVM, the hardware-assisted virtualization, just letting QEMU emulate all the instructions in software, after 60 seconds the guest was still booting up and I couldn't get any shell, so I basically just killed it and stopped counting.

Nevertheless, we can see that hardware-assisted virtualization performs really, really well. However, there are still some cases where we want to have paravirtualization join in, and it goes like this. Imagine you have some data that you want to send, I don't know, for instance over the network. You want to download something, or, you know, upload... bless you. However, since QEMU is still emulating real hardware, it has to emulate it with all the drawbacks that that brings, for instance all the interrupts and stuff like that. So what if we could instead just take the data from the guest as is, and let the host deal with it, without needing to do this whole emulation miracle? Actually, it's possible. We can use something called virtio, which is where paravirtualization kicks in once more, and it really boosts your performance.

Basically, virtio defines a new set of APIs for the guests to use, and since it's a new device model, you have to have drivers in your guest operating system for it. However, unless you are running some really ancient system from, I don't know, the 40s or something, you can be sure that the drivers are available for you. And the host-side drivers, because, you know, it's still a new device, are basically built into the hypervisor itself.

How do you enable virtio in libvirt?
Well, this gets a bit tricky, because for instance for disks you have to place them on the correct bus, whereas for interfaces you have to set the correct model, and the same goes for the memory balloon and stuff. So before doing this you should really consult the documentation. But it's really easy: you just put the correct value, the correct virtio keyword, into the correct place, either in an attribute or an element.

How does it perform? So I did a measurement, yet again on my laptop. I had a guest, and I assigned multiple models to its network card. The blue bar is some Realtek device, the red one is an Intel interface, and the yellow one is virtio. We can see that with virtio and one vCPU I was able to get like 16 gigabits per second, which, compared to the Intel one with barely two gigabits, is a real boost. However, you can see some slight decrease in performance here, and I suspect it's because my laptop has only four cores, well, two cores plus hyper-threading. So in the case where I gave it four vCPUs, my processor has to not only process the network data but also run the vCPUs themselves. I think that's the reason why.

At this point I would like to stress one thing that has been desired, that has been demanded, for a really long time and has been implemented just quite recently, and it's called virtio GPU. It basically works like this: you take the OpenGL commands from the guest, pass them through onto your host, let the host GPU render them, and basically render them directly into the guest framebuffer. However, this has some limitations because of the process. For instance, if you want to, I don't know, emulate an NVIDIA or an Intel GPU, you really have to have the corresponding card in the host as well, because, you know, you can hardly emulate NVIDIA while having an Intel host GPU card, right?
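Pulling the virtio pieces together, here is a sketch of the relevant device snippets (the image path is made up; consult the libvirt domain XML documentation for the full details):

```xml
<devices>
  <!-- disk: virtio is selected via the target bus -->
  <disk type='file' device='disk'>
    <source file='/var/lib/libvirt/images/guest.qcow2'/>
    <target dev='vda' bus='virtio'/>
  </disk>
  <!-- network interface: virtio is selected via the model -->
  <interface type='network'>
    <source network='default'/>
    <model type='virtio'/>
  </interface>
  <!-- memory balloon: again a model -->
  <memballoon model='virtio'/>
  <!-- virtio GPU: a virtio video model, with OpenGL
       pass-through enabled on the SPICE side -->
  <video>
    <model type='virtio'/>
  </video>
  <graphics type='spice'>
    <gl enable='yes'/>
  </graphics>
</devices>
```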
And it also uses some host-side software, which is called virglrenderer. You can check it out online if you want to.

So that's really it for that. Now we should do a little quiz, to show me how well you've paid attention, and it's a great opportunity for you to win some swag. So, first question: what format does libvirt use to store the domain config? Come on. Who said it first, so I can give them some? Okay, stop by after the talk. Okay, how do you enable hardware-assisted virtualization in libvirt? Nope... you got it, stop by please. And how do you enable virtio? Yeah, you basically set the correct value on some attributes and elements. Okay, cool.

So now I will explain some primitives that are in the Linux kernel that libvirt uses when it comes to, well, slightly more complicated stuff, and that's cgroups. So cgroups are a feature of the Linux kernel that allows you to limit, or account, or basically just isolate some resources that the host has. And not only that, they can be used for prioritization between processes as well. So, for instance, if you have, I don't know, some greedy application running on your host, you can place it into a blkio cgroup and just basically limit how much bandwidth it has when trying to access a disk.

The cgroups that libvirt uses are: cpuset, this control group basically tells you on which host CPUs you can have a process running; blkio, described earlier; and memory, which is basically the same as cpuset, but it only limits the memory. So, for instance, it tells you on which NUMA nodes the process can have some memory.

The cgroups form a hierarchical tree, but there has been some work in this area lately in the kernel.
So maybe this will be obsolete soon. And, you know, previously you could place a process anywhere within the tree; with the new approach you can place them only into the leaves. But we don't need to get that low-level here.

So how does libvirt use these when you're trying to performance-boost your virtual machine? We use them for pinning. So imagine this big host with many CPUs and many memory nodes, or chunks of memory. In this host, some CPUs are closer to some chunks of memory than others, and vice versa. So if you have a domain running, which on Linux is basically just a traditional Unix process, you want to have the data processing as close to the actual data as possible. So you basically want to say: my memory should live here, along with my virtual CPUs. And of course, even though you are using virtio, your emulator is hopefully still doing some work, so we want to have it as close to the domain as possible too.

This is slightly complicated to configure in libvirt, but it's slightly complicated stuff anyway. It's done with the <cputune> and <numatune> elements: <vcpupin> pins individual virtual CPUs, and <emulatorpin> pins the emulator threads. So, for instance, in this specific case I've pinned virtual CPU number zero and allowed it to run on host CPUs number one, two, three and four. The same goes for the emulator, or for memory, and stuff.

So, how can you boost your storage? When it comes to virtualization you basically have three layers of caching. The first one is in the host operating system itself; you know, it's an operating system, it has caches, all the processes on the system have them. The second one is in the guest operating system, yet again. And the third one is in between them, in QEMU. Whenever the guest wants to write some data, the data basically has to go through all of these caches, unless configured otherwise. However, playing with the QEMU cache has some advantages and also disadvantages. The QEMU cache can basically be turned off using cache='none', which saves us
some time, because whenever data has to pass through these layers it has to be copied, and, you know, that takes time. However, if for some reason your guest gets disconnected from its storage, the QEMU cache can actually save you from data loss. So if you trust that this scenario won't happen, you can turn the QEMU cache off; otherwise, you should be really cautious about it.

One attribute that you may have actually seen here is io, and it has the value native. The other value it can have is threads. So what does this do? I will refer you to the next talk, well, actually the third talk after me, I guess, where this is going to be explained in much more detail, and I think some performance graphs will be shown as well.

So, is it working? So, back to my previous example: you had some data that you want to push onto the network, so, okay, you are using virtio. However, what if you want to use something even
much faster than that, near bare metal, or bare metal? Well, you can basically use PCI passthrough, which is: you take a PCI device from the host, detach it, place it into your guest, and let the guest use it. However, it has the disadvantage that nobody else besides the guest itself can use it. So before you try this on your own, please make sure that you are connected either to Wi-Fi or to another interface, because you are definitely going to lose connectivity.

So, well, yet again, another helping hand from the hardware vendors was needed, and they developed something we call SR-IOV. It means that the PCI device is able to create virtual functions on the fly, after the host operating system has booted up, and you can then pass through only the virtual functions into the guest. So the PCI device is, after all, actually shared, either with the host or with multiple guests. There's really no miracle here: the bandwidth that such a card has is basically shared, so don't expect 10 gigabits on a 1-gigabit network.

So this is how you configure it in the libvirt XML. In this specific example I'm taking the PCI device at this address in the host and passing it through to my guest. And I should mention that these approaches require, yet again, some CPU features: not only VT-x but also VT-d. Anyway, there are a couple of things that your host should have when trying to do all of this. So, is there some tool that you can use to check whether you have everything you need
enabled? Yes, there is, and it's virt-host-validate. It's a really small binary that lives in our repository, and it will do all the checks that are needed, not only for QEMU but for containers as well. It will print some really nice output, and if you failed to set something up, it will at least give you a hint where you should be aiming when trying to enable it.

So, my enumeration of all the possible values that you can set to performance-boost your virtual machine or container is not complete, and really cannot be complete, because we have limited time here. But I would refer you to our documentation, and also to the Red Hat documentation, where this topic is covered in much more detail. In case you have any questions, please feel free to ask them on our mailing list.

And hopefully you will enjoy the demo that I've prepared. Its aim is to show you the storage boost and the computing boost. Graphs I haven't prepared. Just a second... yeah, I'm here. Okay, and hopefully that's visible for you guys. Yeah, so I have prepared two virtual machines. Okay, maybe it's too big, just a second. Yeah. So the first difference that we can see is... okay, it works. So in the devconf-tuned one I am pinning the vCPUs, I'm enabling, well, basically disabling the cache with cache='none', and the other difference is that I'm using virtio, compared to, you know, some Realtek device.

So, let me just... just a second. Yeah, oh, sorry. Yeah. This is a very small program that I have in both virtual machines. You can see it's basically a dummy program that just does some memory accesses. So let me see, how does it perform here on the not-tuned machine? It's going to take some time, I guess. Meanwhile, I'm going to run the same program on the tuned machine. Still working. Yeah, when I tried it at home it took like 40 seconds.
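For reference, the differences he lists for the tuned domain would look roughly like this in its XML. This is a sketch based only on what he describes; the host CPU numbers and the image path are made up:

```xml
<domain type='kvm'>
  <name>devconf-tuned</name>
  <vcpu>2</vcpu>
  <!-- pin each vCPU (and the emulator) to fixed host CPUs -->
  <cputune>
    <vcpupin vcpu='0' cpuset='1'/>
    <vcpupin vcpu='1' cpuset='2'/>
    <emulatorpin cpuset='3'/>
  </cputune>
  <devices>
    <!-- bypass the QEMU cache on the disk -->
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none' io='native'/>
      <source file='/var/lib/libvirt/images/devconf-tuned.qcow2'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    <!-- virtio NIC instead of an emulated Realtek one -->
    <interface type='network'>
      <source network='default'/>
      <model type='virtio'/>
    </interface>
  </devices>
</domain>
```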
So please bear with me here. Meanwhile, what we can do is show you the storage boost. Super-secret password... So, for instance, I can try to write some data to the disk, for instance one gigabyte. Well, this is... okay, cool.

So, in the not-pinned virtual machine it took 43 seconds to complete my little test, and in the tuned one it took only 31 seconds. Cool. Nice. So the write speed was roughly 70 megabytes per second, whereas in this one I was able to get nearly one gigabyte per second, which is kind of strange, because I don't have such a fast SSD. So I guess this second example I'm showing is probably skewed, never mind. So I think that, with the full block size... okay. Yeah, so that's slightly over 600 megabytes per second, compared to 150.

Okay, so that actually brings me to the end of my talk. The conclusion here is: you should enable hardware-assisted virtualization whenever possible, you should prefer virtio, and, as you can see, CPU pinning, or actually pinning in general, makes sense on some small machines as well, but, you know, it's easier to explain on huge machines. So if you have any questions, please feel free to ask them now.

So the question was whether computing directly on GPUs is supported. As far as I know we are still trying to work on it. There have been some patches on the upstream list, but frankly, I'm not a QEMU developer, I'm a libvirt developer, so, you know, I may have missed something, may have not seen something and stuff. You should better ask some QEMU developers.

Yeah, yeah. So, the point was basically that the host scheduler has some influence on the guest's performance as well.
Yes, it has. From all the tests I made, yes, as he said, the deadline scheduler performed the best. You know, try and see whatever suits your workload.

Yes... yes, you are. Yeah, well, basically a partition. So the question was: why are we using cgroups instead of setting the CPU affinity from within QEMU? The problem is that we want to be able to change it afterwards, because sometimes users may decide that they want to move the vCPUs onto some different, you know, CPU set, and it wouldn't be possible for libvirt to do that, because at that time QEMU is itself another process. We would need to instruct QEMU and kindly ask it to move there, because you cannot really modify the affinity from outside the process. So instead we just rely on cgroups. Sorry?

That's already implemented, yeah, it's in the docs. If you check them, you will find it. Okay, so if there are no more questions, thank you very much. Thank you.