How many of you are sysadmins here? Maybe you use KVM? Sorry? OpenVZ, OpenVZ is containers, right? What's that? In one line: what VMware does in one million lines of code, KVM does in 10,000 lines of code. What's that? Yeah, it's not theoretical, that's insane. There is this thing called SPECvirt, and it's a third party. Okay, let's start.

Anyway, my name is Kashyap Chamarthy. I work for Red Hat, specifically in Red Hat's Identity and Security Engineering group. I also deploy a lot of KVM as part of my test infrastructure, and also out of general upstream interest. So today I'm going to talk a little bit about KVM-based virtualization, how hardware virtualization came in, and a little bit about management, which is libvirt, and disk image modification tools with which you can edit, modify and inspect virtual machine disk images in case they're screwed up and you want to rescue them. You can use this tool called libguestfs. Then we'll do a small demo, and if we have time for questions, we'll take them. That's the agenda we just talked about.

KVM virtualization. Am I audible to the last person there? So Intel and AMD started... what's that? You want me to drag it ahead? Okay, that's why they kept it there. So okay, Intel and AMD started providing hardware virtualization extensions around 2005 or so. So what exactly do these hardware extensions provide? They enable a new operating mode in the CPU called guest mode, so that virtual machine instructions can run natively on the physical machine. Not all of them, mainly CPU-based instructions. I/O instructions from the guest OS don't run on the physical machine. We'll talk about that.

So KVM is essentially a kernel module. It was introduced around 2006 or so. It's a loadable kernel module, and there is this engineer called Avi Kivity. He posted the first patch, and it was introduced into Linux kernel 2.6.20, I guess.
So it's a kernel module. You could see three of them there. One is a generic kernel module and two are the vendor-specific kernel modules, Intel and AMD. Intel calls its virtualization VMX and AMD calls it SVM. VMX stands for Virtual Machine Extensions. AMD's SVM is Secure Virtual Machine. It's a similar implementation with just a different name.

One of the neat things with KVM is that it turns the Linux kernel into a hypervisor. Hypervisor is just a fancy term for the ability to run multiple virtual machines. It's also called a virtual machine monitor, VMM; that's how it's referred to in the Intel manuals and stuff. So it essentially exports a character device called /dev/kvm, so that user space can create virtual machines and allocate memory to virtual machines using this character device.

So the neat thing with KVM, and it's a powerful thing with KVM, is that it reuses existing Linux kernel infrastructure. What does that mean? It doesn't reinvent the wheel. Any hypervisor needs standard things like CPU scheduling, memory management, timer handling, NUMA, device drivers, all these things. Instead of re-implementing all of these things with a special kernel for the guest, KVM reuses Linux kernel infrastructure. The kernel has excellent support for all these things, so why reinvent the wheel? That way KVM developers can concentrate on the core part of the problem, which is virtualization itself.

So naturally KVM's design and development is driven by hardware specifications, by whatever innovations Intel and AMD bring in. There's VT-x, x is for CPU virtualization, VT-c is for connectivity, and there is network I/O virtualization as well. It's done by the PCI group. So there are several things in progress.

Anyway, this is a high-level overview of the architecture of KVM. At the bottom of the stack... is everybody able to see that? There's nothing much there anyway.
The bottom of the stack is x86 hardware which supports virtualization extensions, either Intel or AMD. Then there is the Linux kernel which runs KVM, which puts the CPU into guest mode whenever you want to run instructions natively on the physical machine.

And KVM is an interesting piece of software. KVM itself cannot do everything. KVM does CPU virtualization, so it runs CPU-intensive workloads directly on the physical machine. But to run a complete virtual guest, we need a PC-like environment. We need I/O devices like VGA cards, network cards, sound cards, IDE disks, mouse, all these things. Where do you get all these things from? All these things are emulated by QEMU. QEMU stands for Quick Emulator. So it's the I/O emulation device, sorry, software. It does all the I/O emulation.

So why do I note regular applications there, like Firefox and other stuff? Each virtual machine runs as a QEMU instance, and each QEMU instance is a regular Linux process. What does this mean? It means we can use standard Linux process tools like ps, kill and top to monitor virtual machines. So that's the reason I listed regular applications. It's pretty powerful that way. That's just briefly about KVM.

Can you be a bit louder, sorry? Are the applications running as a process? Okay, the question was: are the applications running as a process, or is the virtual machine running as a process? Virtual machines are also regular Linux processes. So a virtual machine is a process, and applications are also processes.

Modify the RAM for what, sorry? For the virtual machine? Can you give more RAM to the virtual machine? Yeah. Okay. Yeah, you can do hot memory adding and stuff like that. You can add memory and reduce memory. There is work on that. This comes under NUMA and stuff. I can show it to you live once the session is done.

There are a couple of management tools. libvirt is essentially a hypervisor-agnostic management library. What does it mean?
It supports several hypervisors like KVM, Xen, etc., so that you don't have to learn hypervisor-specific commands to manage virtual machines.

This is just a high-level view of libguestfs, libvirt and the virt tools. What is libguestfs? I have a detailed slide for that. In short, libguestfs lets you modify, edit and inspect virtual machine disk images. Let's say they're broken. You could rescue them: if a machine is screwed up beyond repair, you could just boot that disk image and copy out the contents so that you don't lose all the data.

So this is just a slightly different view from what we saw in the picture earlier. libvirt interacts with QEMU. We could invoke virtual machines with the QEMU command line. Has anyone seen the qemu-kvm command line? No? I think you can imagine. It's like several lines of... I can show it to you when we're doing the demo. It's several lines of parameters and directives to pass to it. So it's difficult to keep track of, and it's usually not recommended to run qemu-kvm directly. libvirt takes all the best practices and builds the best command line for you.

So libvirt interacts with QEMU, and there are a bunch of tools written on top of libvirt. virt-install is a command-line installation tool. It comes with the package python-virtinst. And then there is virsh. virsh is the virtualization shell interface for libvirt, so that you can do all the general lifecycle management.

And there are a couple of graphical interfaces as well. How many of you have heard of virt-manager? Cool. Okay. Don't... Okay, I won't say that at the moment. virt-manager is one of the most common things which people are aware of, and Boxes, it's called GNOME Boxes, is integrated into GNOME Shell directly. It's for desktop virtualization. It's one of the projects which is being worked on. And libguestfs provides a shell tool called guestfish to interact with disk images.
Just a bit more detail about libvirt. libvirt uses a client-server model, so you can connect to a remote libvirtd daemon from your client machine and do all the general virtual machine lifecycle operations. It also uses an XML format to define virtual machine guests. So everything is an attribute or an element. Right from vCPUs to your memory, disks, all devices, I/O devices, everything is expressed as an XML element. And there's a neat tool to access these XML files and edit them as well.

Like I mentioned, virsh is a shell interface which libvirt provides. This is just a laundry list of some of the things it can do to manage virtual machines. You can manage devices and network interfaces. You can hot add and hot unplug network interfaces. And snapshots. Snapshots are an interesting thing which I really like, because in a development or test environment you should have a known state which you want to revert to. So this gives you a...

Yeah, exactly. He's talking about clipping bandwidth, capping the amount of bandwidth that virtual machines can be assigned. There's this complex technology called cgroups which will do that. You can cap network bandwidth and stuff like that. You can assign and allocate resources. So yeah, you can do that.

These are disk snapshots, actually. There are different flavors of snapshots. There are internal disk snapshots, where the original snapshot and its delta are all stored in a single file. There are external snapshots, where the original image is a single file and all the deltas are different files. And then you have VM state snapshots, which capture the memory state of a particular virtual machine. So there are different flavors of snapshots.
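To make the XML guest definition mentioned above a little more concrete, here is a minimal sketch in Python that reads a guest definition and pulls out the vCPUs, memory, disks and bridges. The XML below is a hypothetical example written for illustration (the guest name, sizes and image path are assumptions, not taken from the demo); it follows the general shape of a libvirt domain definition but is far from complete.

```python
import xml.etree.ElementTree as ET

# Hypothetical guest definition; name, sizes and paths are made up for illustration.
DOMAIN_XML = """
<domain type='kvm'>
  <name>f16-t1</name>
  <memory unit='KiB'>1048576</memory>
  <vcpu>2</vcpu>
  <devices>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none'/>
      <source file='/var/lib/libvirt/images/f16-t1.qcow2'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    <interface type='bridge'>
      <source bridge='virbr0'/>
      <model type='virtio'/>
    </interface>
  </devices>
</domain>
"""


def summarize_domain(xml_text):
    """Return a small dict describing the guest defined in xml_text."""
    root = ET.fromstring(xml_text)
    # Every disk is an element; its properties are attributes of child elements.
    disks = [
        {
            "path": disk.find("source").get("file"),
            "format": disk.find("driver").get("type"),
            "target": disk.find("target").get("dev"),
        }
        for disk in root.findall("./devices/disk")
    ]
    # Collect the host bridge each bridged interface is attached to.
    bridges = [
        iface.find("source").get("bridge")
        for iface in root.findall("./devices/interface[@type='bridge']")
    ]
    return {
        "name": root.findtext("name"),
        "hypervisor": root.get("type"),
        "memory_kib": int(root.findtext("memory")),
        "vcpus": int(root.findtext("vcpu")),
        "disks": disks,
        "bridges": bridges,
    }


if __name__ == "__main__":
    print(summarize_domain(DOMAIN_XML))
```

On a real host the XML would come from libvirt itself rather than from a string literal; the parsing logic stays the same.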
If you do a memory save, there are these VM snapshots, which are memory checkpoint snapshots. They take a snapshot of the memory of a virtual machine and save it into a file. So whatever applications were running, when you revert and invoke it, they're back in the same state in which they were saved.

libguestfs. That is a bit of network tuning. We can get back to it once we do the demo. It's of course a bit of a detailed discussion, so let's not deviate from what we have in hand. Is that okay?

So libguestfs is also a library, an API, and it provides a shell interface called guestfish so that you can script your changes to virtual machine disk images. One practical example: recently one of the virtual machines I was using got screwed up due to SELinux, and I wanted to try to boot that virtual machine by placing SELinux in permissive mode. How do I do that? The virtual machine doesn't boot. So enter guestfish. This boots the disk image when the virtual machine is on or off. Sorry, you should not run guestfish when the virtual machine is live and running; you could corrupt the data inside it. So you invoke guestfish on a disk image which is off, then I log into it, edit the SELinux config file to permissive, and then boot it. So this is just one of the practical aspects.

There are also several virt tools based on libguestfs, which are essentially wrappers around the libguestfs API, so that the tools are easy to access and use. So that's just briefly about libguestfs. It essentially lets you do a lot more with disk images. I just noted a few of the things which it can do. Resizing images; you can even create a disk image from scratch with it.

Okay, let's quickly see a demonstration. I think I should... Is it visible? So, virtual machine creation. There are several different ways to create virtual machines. The common ways which I use are virt-install and Oz.
These let you create virtual machines unattended, so that you don't have to provide any manual input to the virtual machine; you can just fire and forget, and the virtual machine will be up. Let's quickly walk through how it does that. It creates a virtual machine in four minutes or so: a minimal virtual machine which includes all the core packages, so that you have all the necessary tools and infrastructure, and then you build the VM into whatever you want to make it, a mail server or an identity server, whatever it would be. Is the text visible?

So there are two different disk image formats. One is raw and the other is qcow2. qcow2 stands for QEMU copy-on-write. It's a versatile format. Let's talk about raw first. Raw is robust, it's also robust against power failures, and it's fast compared to qcow2. So production disk images usually tend to be raw. qcow2 has improved a lot. These days I use qcow2 for most of my infrastructure, but you have to do a little bit of tweaking to get the disk image to perform as well as raw. So it's not a big deal. qcow2 gives you disk image snapshots; they are possible with qcow2 disk images.

So let's quickly look into this shell script which I wrote. Is this text visible? It's just checking whether there's a bridge or not. Let's go to the meat of it, the part which does the installation, and this is the tool. I'm invoking it this way, and it connects to QEMU. QEMU is the emulator, like I mentioned. KVM cannot emulate everything, so QEMU emulates all the I/O devices. So we are connecting to the QEMU hypervisor there. System is the root access, so that you get all the elevated privileges. And network: libvirt provides a standard bridge called virbr0. It gives you NAT. So I'm just using the standard bridge, and I'm using a minimal kickstart file. So let's quickly look into this.
How many of you are familiar with kickstart files? Cool. This is the minimal kickstart file. That's it. The kickstart file ends here. So it's just a bunch of random things. Root password, I just use the same one. Packages: only the @core packages, you can see that. So it will install a minimal virtual machine. Let's go back to this thing. This is missing some kernel logs. Yeah. Okay. Anyway, let's go through this.

So here you can provide kernel arguments. You provide the minimal kickstart file, and then you provide extra args, as in, you provide a serial console. Let's say you don't have a network inside your virtual machine. How do you access the virtual machine? This provides you a serial console so that you can connect to the virtual machine and do some kind of rescue operation or bring up the network.

And you provide the name, which is self-explanatory, and the disk path. I just gave it as a variable up above. And the format, and we use cache=none. That's a performance optimization parameter, so that your disk images are optimized for better performance. With that, whatever operations are happening in the virtual machine, they're all written to the disk directly. There are other cache modes like writethrough, writeback, etc. We can discuss that.

And check-cpu: it checks that the virtual CPUs which you're providing are not more than the available physical CPUs. And we have cpuset, which will do NUMA. How many are aware of NUMA? NUMA is non-uniform memory access. We can pin virtual CPUs to physical ones, and also control memory access: CPUs can access memory which is local to their node, and stuff like that you can do with this. When you provide auto, you don't have to do all the tuning stuff. It selects the default; whatever is sane on your machine, it will just auto-select that one. And accelerate: it will use KVM acceleration.
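The virt-install invocation being walked through can be sketched as a small Python helper that assembles the argument list. This is only an illustration under stated assumptions: the guest name, image path and URLs are hypothetical, the `ks=` kernel argument and the disk size are assumptions about what the kickstart-based script passes, and the flag set reflects virt-install of that era (newer releases prefer `--memory` over `--ram`). The helper only builds and prints the command; it does not run it.

```python
import shlex


def build_virt_install(name, disk_path, tree_url, kickstart_url,
                       ram_mb=1024, vcpus=1, size_gb=8,
                       os_variant="fedora16"):
    """Build an unattended, text-only virt-install command as an argv list."""
    # qcow2 image with host page cache bypassed (cache=none), as in the talk.
    disk = f"path={disk_path},size={size_gb},format=qcow2,cache=none"
    # Serial console plus kickstart location, passed to the installer kernel.
    # The ks= argument is an assumption about the installer being used.
    extra_args = f"console=ttyS0 ks={kickstart_url}"
    return [
        "virt-install",
        "--connect", "qemu:///system",   # system (root) libvirt instance
        "--name", name,
        "--ram", str(ram_mb),
        "--vcpus", str(vcpus),
        "--disk", disk,
        "--network", "bridge=virbr0",    # libvirt's default bridge
        "--location", tree_url,          # installation tree
        "--extra-args", extra_args,
        "--os-variant", os_variant,
        "--nographics",                  # install over the serial console
    ]


if __name__ == "__main__":
    argv = build_virt_install(
        name="f16-t1",                                        # hypothetical
        disk_path="/var/lib/libvirt/images/f16-t1.qcow2",     # hypothetical
        tree_url="http://example.com/fedora/16/os/",          # hypothetical
        kickstart_url="http://example.com/minimal.ks",        # hypothetical
    )
    print(shlex.join(argv))
```

Keeping the command as a list rather than one long string avoids quoting problems with `--extra-args`, which contains spaces.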
KVM acceleration is actually the default. These two flags are now on by default, so you don't have to provide them. And then location: I can provide the path to an installation tree. Yeah, it usually takes a tree. You could also provide an ISO, but usually with a tree it goes slightly faster. Of course, assuming it's all on the LAN. And OS type Linux and OS variant Fedora 16. These two will optimize the guest further for better performance; for example, it will pick virtio. virtio is the I/O framework for KVM-based virtual machines. And there is nographics, so that I can bring virtual machines up directly on the shell and it won't give me a graphical interface, so I don't have to deal with X.

I'm sorry, what is the question? Is KVM available on Fedora 16? Oh yeah, it's available. It's available on all Linux distros. It's there by default. It's shipped by default. It's integrated into the Linux kernel. So whatever packages you're installing, they're the user-level management software, the libvirt management software. So it's on by default if you have the Linux kernel, assuming your CPU supports the virtualization extensions.

How do you check that? Let's check whether our CPU supports virtual machine extensions. Let's grep for vmx or svm, so we're grepping for Intel extensions or AMD extensions, in /proc/cpuinfo. If you get some kind of output, you do have it. I have four or five processors on this, so that's why it is listing so much output, once for each processor. On a regular clean interface you'd see it much better. So it's in there somewhere, definitely. If you get any kind of output, yes, it does support it. Sometimes.

And also, let's see if our machine is properly configured so that libvirt can manage the guests. So let's run virt-host-validate. It checks that hardware virtualization is available and that the device /dev/kvm is available.
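The two checks just described, looking for the vmx or svm flag in /proc/cpuinfo and confirming that /dev/kvm exists, can be sketched in a few lines of Python. This is a rough stand-in for illustration, not how virt-host-validate itself is implemented; the function names are made up for this example.

```python
import os
import stat


def virt_flags(cpuinfo_text):
    """Return the hardware virtualization flags found in cpuinfo text.

    'vmx' indicates Intel extensions, 'svm' indicates AMD extensions.
    """
    found = set()
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # Line looks like: "flags\t\t: fpu vme ... vmx ..."
            _, _, value = line.partition(":")
            found.update(set(value.split()) & {"vmx", "svm"})
    return found


def kvm_device_available(path="/dev/kvm"):
    """Return True if path exists and is a character device."""
    try:
        return stat.S_ISCHR(os.stat(path).st_mode)
    except OSError:
        return False


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            flags = virt_flags(f.read())
    except OSError:
        flags = set()
    print("virtualization flags:", sorted(flags) or "none found")
    print("/dev/kvm available:", kvm_device_available())
```

Because the flags line is repeated once per processor, collecting the flags into a set gives a single answer instead of the long per-CPU output seen with a plain grep.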
virt-host-validate also checks that a couple of virtual network device files are available, and it checks for Linux containers as well. This is a whole-host check. libvirt, like I said, can manage plenty of hypervisors, right? It can also manage Linux containers, so it is checking for Linux containers too. If you want to check specifically for QEMU, we could just do that. So it'll just show you the QEMU checks.

So let's see what the installation looks like on stdout, on the shell. This is the standard installation on the shell. We are not opening up a graphical interface. Everything is done on the shell, so it just picks up the Linux kernel and image, etc. This is all the output which you get when you provide a serial console in the extra args for the kernel. You can see package installation like that on the shell, so you don't have to invoke X whatsoever. These are the extra arguments, kernel arguments, which provide us a serial console.

Oz is another tool with which you can create virtual machines with as little input as possible from the user. You can also automate this right away. With this you can install virtual machines even from an ISO. Let's say you don't have access to a tree; not everyone has access to OS trees on their machines, right? So if you want to do it from an ISO, you can do it with Oz. Let's quickly see what it takes. This is again just a sanity check for the script. Let's go to the meat of it. The formatting is really screwed up; we'll do better with it.

Okay, it takes a .tdl file. I don't know if it's visible. It's called a template definition file. I provide the name, again the version, standard options, the ISO file path, the root password, etc. So it just takes everything from this file. I'm just making the .tdl using standard redirection; I'm just cat-ing it to a file.
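As a rough illustration of the .tdl file described above, the sketch below generates one from Python instead of cat-ing it from a shell script. The element layout follows the general shape of an Oz template as I understand it (name, OS name, version, architecture, ISO location, root password); the guest name, ISO URL and password are hypothetical placeholders, and a real template may need further elements.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def build_tdl(name, os_name, version, arch, iso_url, root_password):
    """Return a minimal Oz-style template as an XML string."""
    template = ET.Element("template")
    ET.SubElement(template, "name").text = name
    os_el = ET.SubElement(template, "os")
    ET.SubElement(os_el, "name").text = os_name
    ET.SubElement(os_el, "version").text = version
    ET.SubElement(os_el, "arch").text = arch
    # Install from an ISO rather than from an installation tree.
    install = ET.SubElement(os_el, "install", type="iso")
    ET.SubElement(install, "iso").text = iso_url
    ET.SubElement(os_el, "rootpw").text = root_password
    ET.SubElement(template, "description").text = f"{os_name} {version} guest"
    return ET.tostring(template, encoding="unicode")


if __name__ == "__main__":
    tdl = build_tdl(
        name="f16-oz",                                   # hypothetical
        os_name="Fedora",
        version="16",
        arch="x86_64",
        iso_url="http://example.com/Fedora-16-x86_64-DVD.iso",  # hypothetical
        root_password="changeme",                        # placeholder
    )
    # Write the template to a temporary .tdl file, as the demo script does
    # with shell redirection.
    fd, path = tempfile.mkstemp(suffix=".tdl")
    with os.fdopen(fd, "w") as f:
        f.write(tdl)
    print("wrote", path)
```

The resulting file is what would then be handed to the install command in the script.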
And then the script invokes the oz-install command and sends the output to some text files. It's just an alternative way. With virt-install you can't always do unattended installs from an ISO, and Oz can do it with an ISO. Oz is being used as a core part of several cloud projects, like the Aeolus project, OpenStack, all these things. Sorry, not OpenStack. Aeolus uses it. You can check out more about the Aeolus project.

So let's see some quick virtual machine operations with virsh. This will list all the virtual machines which are available on the machine. All these machines are turned off. Let's try to turn on one of the machines: virsh start f16-t1, with the console, so that we can see all the booting on the shell. The console option gives you access to the serial console. You can select the kernel there. You can see all the boot messages. It loads the initial RAM disk. You get root access on the shell. That's good. There we go. So we are logging in without the network. Let's say the network is not there inside the virtual machine: this is a very handy and useful way to see the boot messages and log in to the virtual machine. Again, this is the virtual machine. The host is called Tesla, but the virtual machine is called Stream, so you can see the difference and you won't get lost. Let's say exit. And how do we get out of this? Control... What else have we got? I'm sorry? Yeah, it's running. List. List will show us which virtual machines are currently running.

ps. I want to show that as well. That's a good question. QEMU. Like I said, QEMU works very well with KVM, so we are just grepping for QEMU. This also gives us the whole command line which it uses. This is the example I was talking about: nobody, at least no human, can remember that much of a command line. libvirt makes it easy for us: it picks the best defaults and gives us the best possible performance. Sorry, say that again?
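Since each guest is an ordinary Linux process, the "ps and grep for QEMU" step from the demo can be approximated by walking /proc directly. The sketch below is a simplified illustration: the function name is made up, and matching on "qemu" in the executable name is an assumption about how the binary is named on a given distribution.

```python
import os


def find_qemu_processes(proc_root="/proc"):
    """Return (pid, argv) pairs for processes whose executable name contains 'qemu'."""
    results = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                raw = f.read()
        except OSError:
            continue  # process exited or is not readable
        # Arguments in /proc/<pid>/cmdline are separated by NUL bytes.
        args = [a.decode(errors="replace") for a in raw.split(b"\0") if a]
        if args and "qemu" in os.path.basename(args[0]):
            results.append((int(entry), args))
    return results


if __name__ == "__main__":
    if os.path.isdir("/proc"):
        for pid, args in find_qemu_processes():
            # The full command line is typically very long, which is the
            # speaker's point about letting libvirt build it.
            print(pid, len(args), "arguments:", " ".join(args[:4]), "...")
```

The same PIDs can then be inspected with any standard process tool.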
How much CPU does it consume when it is not running anything in the virtual machine? I mean, if your VM is not running anything inside it, if you're not running any workload, it shouldn't consume much apart from the regular Linux processes which it needs to keep the machine up and running. Yeah, sure. When you're not familiar with it on a day-to-day basis, of course, you have all these kinds of questions.

I just wrote this down in a small shell script so that I remember what I'm trying to do. Let's do a snapshot. virsh snapshot-list: let's see if this machine has any snapshots. So I'm listing the snapshots for the VM f16-t1. It doesn't have any. You can do live snapshots as well, but let's do an offline snapshot by turning off the guest. Like I said, there are several different types of snapshots. There are internal snapshots, wherein the snapshot and the master file are all stored in one single disk image file.

In virt-install there's a flag called --vcpus, and for RAM there's a flag called --ram. Of course, those are the basic parameters to get the machine up and running. Yeah, that's a slightly more advanced question. Again, you could do that, but you need to have NUMA and all kinds of binding available on that machine. There is no CPU pinning for this machine at the moment. It's just a simple machine I created for this test purpose.

Let's shut down this machine quickly; it doesn't take much time. I could do virsh shutdown as well. I just did it this way so that you can see the shutdown console messages on the shell. You can just do virsh shutdown with the VM name as well.

This is the simple syntax for a snapshot. I just shut down the guest; usually you could run this script right away. So: snapshot-create-as for domain f17. That's just another virtual machine I gave; you can replace it with any of the virtual machines you have. snap2 is the name of the snapshot, and the other argument is the description of the snapshot.
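The snapshot commands from the script can be sketched as small Python helpers that build the virsh argument lists: create a named snapshot with a description, list snapshots (optionally as a tree), and revert to one. The helper names are made up for this example, and the domain and snapshot names in the usage section are illustrative. Nothing is executed here; the commands are only printed.

```python
import shlex


def snapshot_create_as(domain, name, description):
    """virsh snapshot-create-as <domain> <name> <description>"""
    return ["virsh", "snapshot-create-as", domain, name, description]


def snapshot_list(domain, tree=False):
    """virsh snapshot-list <domain> [--tree]"""
    argv = ["virsh", "snapshot-list", domain]
    if tree:
        argv.append("--tree")  # show parent/child relationships
    return argv


def snapshot_revert(domain, name):
    """virsh snapshot-revert <domain> <snapshot name>"""
    return ["virsh", "snapshot-revert", domain, name]


if __name__ == "__main__":
    # Domain and snapshot names are illustrative.
    for argv in (
        snapshot_create_as("f17", "snap2", "clean install"),
        snapshot_list("f17", tree=True),
        snapshot_revert("f17", "snap2"),
    ):
        print(shlex.join(argv))
```

To actually run one of these, the list could be passed to `subprocess.run(argv, check=True)` on a host where libvirt is installed.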
So let's do the same for f16-t1. That's our virtual machine, right? We're handling that one. So let's call it snap, "snap for root". It should blink about 10 times or so, because it's writing all the snapshot data inside the single qcow2 disk image, so it's taking a little bit of time. If you're using external snapshots it'll be instantaneous, because everything is written to a different file. It's done. Copy-on-write.

This is the disk snapshot. You can also do a VM snapshot by doing sudo virsh save with the virtual machine name, so it'll save the virtual machine's memory state into a single file.

So the snapshot is created. Let's see that I'm not faking it, by listing the snapshots: snapshot-list. What's the virtual machine? f16-t1. There we go. You can also have a tree of snapshots. I have another machine which has a couple. snapshot-list: I have a RHEL 6 machine as well, and it has 3 snapshots. You can also check the relationship between snapshots by just asking for a tree, so it gives you a tree view of the available snapshots in a neat way.

What else have we got? You can also trivially apply the snapshot by saying virsh snapshot-revert, the virtual machine name, then the snapshot name. Oh yeah, you can do all kinds of things. Both were recently integrated into... You can also hot add devices. You can attach a network interface. Let's say you want to have a second network interface inside the virtual machine. So how do you do that?
While the virtual machine is running, you can just do it like this. So, virsh attach-interface: we are attaching a network interface. Domain: in libvirt parlance a node is a physical machine and a domain is a virtual machine. So if you are confused by this domain thing, domain is just an alternative name for a virtual machine, which libvirt uses. And type bridge. Then source, as in, which bridge we are using; we are using the standard bridge virbr0. And the model is virtio. virtio is an I/O framework. So we can just do that for this virtual machine.

So let's do it for a running virtual machine which we have right now: virsh attach-interface, domain f16-t1, type bridge, source virbr0, model virtio. Let me just change this. We are trying to hot add the network interface, but the machine is not running. So let's turn it on: start f16-t1. You can see what it's doing. It's still coming up. It already has two interfaces, eth1 and eth0. We are trying to add another one. Added successfully. It is giving the boot message as well: it's using virtio-pci, that's the I/O framework. So let's see if it's really added or it's just acting up: ifconfig eth2. There we go. So we've got three interfaces now. You can also detach the interface very trivially, just by doing detach-interface, etc. So it's self-explanatory.

Let's see a couple of quick examples for guestfish. Like I said, guestfish lets you look inside virtual machines and do operations without having to boot the virtual machine. Again, it's dangerous to run guestfish while the virtual machine is running. Where is it? List. Shut down. Like you asked, let's do shutdown again. Let's see what it's doing: stopping all the services. Shut down. Where is it? Not running. So let's do guestfish. Let's do it as root. Because the machine is off, we can do it: guestfish, f16-t1. -d is domain, -i is inspect. So what is it doing? How do you think it's giving
us a file system shell to a virtual machine when it's not running? It's cheating us a little bit. Behind the scenes, guestfish is running a small qemu-kvm appliance to boot the disk image instantaneously. We can see that in the debug messages. So we've got the shell, the file system shell, of f16-t1. You can look at /etc/sysconfig/network or anything like that. SELinux: so yeah, you can turn SELinux on. I'm just editing. Is it visible? It's at the bottom. Yeah, maybe I'll do it up above. It's a bad idea to do it on the bottom shell.

Anyway: guestfish, read-write, domain f16-t1, -i. So it's the operating system Fedora 16, and it mounted the file systems on / and on /boot. So vi /etc/sysconfig/selinux. Let's turn this into enforcing. It's never a good idea to put it in permissive, so it's recommended to run enforcing all the time. Let's see if it is reflected. Exit. So yeah, the machine is not running.

We can also see this by using one of the virt tools which is a wrapper around it, so we don't have to fire up the guestfish shell all the time. So we can do virt-cat, -d for the domain, f16-t1, and then the file we want to look at inside this virtual machine. So we want to look at /etc/sysconfig/selinux. Again, behind the scenes it is invoking a small qemu-kvm appliance. We should see SELinux... there we go.

So let's quickly view, in debug mode, what it's doing behind the scenes. You can see it's invoking a small virtual machine. Actually you can't see it; it's showing all the files which it is trying to monitor. Where are we? Look at that: it is invoking a small qemu-kvm appliance. The disk image is the disk image which we have, f16 qcow2, etc. So it's essentially invoking a small qemu-kvm appliance to boot the disk image and give us the file contents.

So yeah, there are plenty of things which you can do. You can inspect all the applications which are inside a virtual machine, and you can edit the virtual machine's files using its Python bindings. Just a
quick Python example. The one which we did just now, editing the SELinux config file: you can do it using a script, a Python script. It's a trivial one. You import all the standard system modules, you import the guestfs module, we create the guestfs handle, we add the disk image, and we launch. What is it launching? It's launching the qemu-kvm subprocess there. Then we enable tracing, which will list out all the commands it runs, and it mounts the file system. Then there's a bit of replacement logic, and it uploads the file and unmounts. What is it doing? It replaces the hostname in a file. So it can do that for you automatically. So yeah, there are plenty of examples with which we could play around all day. I think that sums up the demonstration.

There are some exciting features coming up. There are live snapshots. There is a storage improvement: earlier there was a limitation of adding only 20 or 28 PCI devices, and that limitation is now removed with virtio-scsi. And PCI device assignment, and nested virtualization. Nested virtualization is an interesting thing. It's essentially a virtual machine inside a virtual machine, so you have layers, like Inception. One might think, what's the need for it? Let's say you have software and you want to test it on three different distributions. So you go to a cloud vendor like Amazon or something and you get three different virtual machines. Instead of that, you can get one big bulky virtual machine and then create virtual machines inside it. It's not all very smooth; it's highly experimental, but I got it working, and a couple of others also got it working. So essentially that's the use case floating around: you get a big bulky virtual machine and create different virtual machines inside it, so that you don't have to get several virtual machines from the cloud vendor. And there are several other features which I just haven't listed.

Some resources: if you want
to reach me, that's my email address; you can mail in any questions. All these slides I've uploaded to my Fedora People page, and it will all be listed on the conference page, so you can just go there and check out the demonstration, all the notes and stuff. Any questions?

Performance monitoring: there's perf kvm. There are a bunch of tools which people use. perf kvm is still in progress. There's a PMU, performance monitoring unit, which is still being worked on. So there are a bunch of tools to do monitoring and stuff.

When the instance is running, can you hot add memory? Yes, you can do that.

Yeah, it is essentially a wrapper. Actually I'm not too familiar with the internal details. It deals with perf kvm as well. So the PMU is a solution for performance monitoring of guest machines.

I'm sorry? Yeah, you can hot add memory. That's one of the features which is being introduced.

The question was about the vi command which I used inside guestfish: how did I use it? guestfish allows you to use a couple of hundred to 400 commands, useful commands which are required to rescue machines. vi is part of guestfish. So yeah, I ran it inside the guestfish shell. It's not the guest OS. It runs a minimal appliance, so it just boots in 4 seconds or so. How is it doing that? It uses the host binaries and copies them into a different location. It uses a small appliance called the supermin appliance. You can read about the architecture on the website which I just listed, libguestfs.org. Richard Jones is the primary developer.

It's like BusyBox, you could say, but it is much more sophisticated; you can script things and so on. BusyBox is like a rescue CD; this is a rescue shell for disk images. So you don't have to set up a server; just do it right away with guestfish.

Anything else? Ok, thank you very much.
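The SELinux edit from the demo, and the kind of Python script described in the talk, can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the speaker's script: the replacement logic is plain Python, while `edit_selinux_in_image` assumes the libguestfs Python bindings (the `guestfs` module) are installed, that the guest keeps its setting in `/etc/selinux/config`, and that the guest is shut down. The function names are made up for this example.

```python
import re

VALID_MODES = ("enforcing", "permissive", "disabled")


def set_selinux_mode(config_text, mode):
    """Return config_text with its SELINUX= line set to mode."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown SELinux mode: {mode}")
    # Match only the SELINUX= line, not SELINUXTYPE=.
    new_text, count = re.subn(r"(?m)^SELINUX=.*$", f"SELINUX={mode}", config_text)
    if count == 0:
        # No existing setting: append one.
        new_text = config_text.rstrip("\n") + f"\nSELINUX={mode}\n"
    return new_text


def edit_selinux_in_image(disk_path, mode, config_path="/etc/selinux/config"):
    """Rewrite the SELinux mode inside a powered-off guest's disk image.

    Assumption: the libguestfs Python bindings are installed. Never run
    this against a disk image of a running guest.
    """
    import guestfs  # imported lazily so the pure logic above has no dependency

    g = guestfs.GuestFS(python_return_dict=True)
    g.add_drive_opts(disk_path, readonly=0)
    g.launch()  # starts the small appliance described in the talk
    roots = g.inspect_os()
    if not roots:
        raise RuntimeError("no operating system found in disk image")
    mountpoints = g.inspect_get_mountpoints(roots[0])
    # Mount shorter paths first so that / is mounted before /boot and so on.
    for mp, dev in sorted(mountpoints.items(), key=lambda kv: len(kv[0])):
        g.mount(dev, mp)
    g.write(config_path, set_selinux_mode(g.cat(config_path), mode))
    g.shutdown()
    g.close()


if __name__ == "__main__":
    sample = "# SELinux configuration\nSELINUX=permissive\nSELINUXTYPE=targeted\n"
    print(set_selinux_mode(sample, "enforcing"))
```

Keeping the text replacement separate from the libguestfs calls means the logic can be checked on any machine, and only the final step needs a host with the bindings and a disk image.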