So it's time to start. Hello and welcome to this talk. My name is Thomas Huth and today I'm going to tell you something about KVM on Power. I'm a software engineer at Red Hat, working there in the virtualization team and taking care of KVM on Power, so I thought it would be a good topic for a talk.

What I'm going to tell you today: first a short introduction to the Power architecture and the Power platforms, then we'll have a close look at KVM and QEMU on Power directly. And at the end I want to talk a little bit about something that is special on this architecture, too: the partition firmware that is used to boot the guests in the virtual machines.

So let's start with a short introduction to the Power architecture and platforms. What is Power in this case? Power is of course a word that has many meanings, but here it's a family of RISC CPUs, and it's an abbreviation for Performance Optimization With Enhanced RISC. It was originally introduced by IBM in the early 1990s with their POWER1 server processor, and IBM continued to develop this processor line for servers up to POWER8, which was released, I think, two years ago, in 2014.

Beside that there's also the term PowerPC, which you maybe know. PowerPC was introduced by a joint venture of a couple of firms in the 1990s: IBM, Motorola and Apple teamed up to build Power-compatible processors that were suitable for desktop PCs, too. Later Motorola, and after that Freescale, also used this term for embedded chips that were compatible with the Power instruction set. However, the PowerPC name is not officially used anymore by these companies; it was officially abandoned, I think, maybe 10 years ago or something like that. But you still see this name in a couple of software projects. For example, in the Linux kernel the architecture is still called PowerPC, and in the QEMU sources you will see the abbreviation for PowerPC, PPC.
So the naming can be a little bit confusing. You have POWER in all capital letters, which normally means the server chips by IBM. Power with one capital letter and the rest small normally means the Power architecture or the Power family in general. And there are these legacy terms, PPC and PowerPC, which nowadays, if they are still used, also just refer to the whole family of Power chips.

At this point let me ask a question: who here in the audience has worked with a Power-based system before? Oh, that's a lot. OK, and who of you has used the OpenPOWER hub here in Brno? OK, that's not that many. So if you are interested, there's an OpenPOWER lab here in Brno where you can get virtual machines, so if you want to experiment a little bit, that's a good place to start.

So what makes the Power architecture special? Generally speaking, you have two different flavors of chips. There are the server chips, mainly manufactured by IBM, and there are the embedded Power chips that you may have in your dishwasher or in your mobile phone or something like that, which are often manufactured by Freescale and other companies.

Yeah, desktop, yes: it was Apple who used them on the desktop, and those were the last PowerPC chips, so they were somewhere in between, I would say. If you're really interested, I've got a nice picture at the end. Here you can more or less see the evolution of the instruction set architecture: there was the server line by IBM, and then the desktop line, which later influenced Book E, the embedded line. And in 2006 people said, oh, it's got too diverse, we've got to unify again. So since 2006 there's at least only one instruction set architecture specification, so things got merged a little bit again. But within the instruction set architecture specification there are two books, Book III-S and Book III-E, so there's still a difference between the server and the embedded chips.
Beside the server and the embedded flavors, you can have 32-bit or 64-bit machines. On the 64-bit machines the instruction set is a superset of the 32-bit instruction set. That means, unlike on ARM, as far as I know, where they completely reinvented the instruction set for 64-bit, the 32-bit instructions are still valid in 64-bit mode.

Another interesting thing is that some of the chips can run in big-endian or in little-endian mode. Traditionally big endian is more common, but this is also changing nowadays. For example, the new POWER8 server CPUs can be switched to little-endian mode, too, and since our last RHEL release we've now also got support for little endian in the RHEL kernel.

Platforms. The Power architecture only talks about the CPUs themselves, so everything that is around the CPU defines the platform. This is the other hardware that can be found, like south bridges or onboard devices, and of course also the firmware interfaces which are used for booting the machines. On Power, each platform most often looks completely different from the others. For example, the Apple Power Macs had a completely different interface than IBM servers or the embedded chips. The embedded chips are often a system on a chip, where everything is in one chip, even the things that you normally find on a south bridge, or network devices and stuff like that. We will talk about firmware interfaces later.

This graphic shows you a little bit of this variety. In the x86 world you basically have one kind of machine, the standard PC, you could say. Of course there are some differences, you can have different chipsets and different south bridges, but more or less the programming model for the operating system looks the same everywhere. OK, nowadays you've also got UEFI beside BIOS for booting, but the machines look very similar. In the Power world, on the other hand, every machine can look very different. For example, the PlayStation was also a PowerPC-based
machine which looked completely different from an embedded PowerPC that is used in your dishwasher, for example.

There are two platforms I want to talk a little bit more about: the so-called PAPR platform and the PowerNV platform. I'll start with the PAPR platform, sometimes also called sPAPR, for server PAPR, to distinguish it from ePAPR, which is the embedded PAPR. PAPR means Power Architecture Platform Reference. To understand this, you've got to know that the IBM servers have shipped with a standard hypervisor since POWER4 already. So your operating system on these machines is normally not running on bare metal but within a virtual machine, a logical partition in IBM's wording, and this logical partition environment is defined by the PAPR specification.

So PAPR defines a set of virtual devices, like a virtual network interface, a virtual SCSI interface, virtualized access to the memory management unit and virtualized access to the PCI host bridge. It defines an environment for booting, a firmware interface, which in this case is Open Firmware; I'll speak a little bit more about Open Firmware on a later slide. And it also defines a runtime interface for the operating system, which is a set of hypercalls and something called RTAS, which means Run-Time Abstraction Services.

Why am I telling you this? It's good to know because QEMU mimics this PAPR environment with the so-called pseries machine. So QEMU, and in this case then also KVM of course, mimics the standard hypervisor that is normally shipped with IBM server processors.

Now you might wonder: if these server machines ship with a hypervisor, how can I use KVM on them, since KVM is also a hypervisor? Well, here comes the PowerNV platform. This is a new offering by IBM: these are the machines that are shipped without this standard hypervisor. So on these machines the Linux kernel can run on bare metal, and of course KVM is part of the Linux kernel, so in this
case you can use KVM on these machines, too. Of course they also come with their own firmware interface, which in this case is called OPAL. So if you ever see that term, it's the firmware interface of the PowerNV platform, but I'm not going into details here.

OK, now we've got our machine, so now we can run KVM on Power. Just to recap a little bit for those who are not that familiar with the terms: KVM is the Kernel-based Virtual Machine. It's the virtualization infrastructure in the Linux kernel, a kernel module which provides access to the virtualization features of the host CPU. Of course, if you just have a kernel module, you cannot do anything with it as a normal user, so you need a user space application which uses these interfaces, and this is normally QEMU, the Quick Emulator. Since KVM only takes care of the virtualized CPU, everything else, which means I/O emulation, is normally done in QEMU: virtual disks, virtual network and stuff like that. On top of this you normally have some kind of management layer like libvirt, virt-manager or oVirt. However, this is not part of this talk; I just wanted to mention it here for completeness.

Speaking about the KVM kernel module: as a normal user you do not have to know that much about it, because you normally do not work with it directly. There's only one thing I wanted to mention, which is that on Power we actually have two different KVM kernel modules.

The first one is the so-called KVM-HV kernel module. HV stands for hypervisor mode; this is a bit in the machine state register which tells the CPU that it's now running in hypervisor mode. This kernel module takes advantage of the virtualization features of the host CPU, so it only works on CPUs that support this feature, for example the latest POWER8 chips from IBM. And because it's using these hardware features, the virtual machines can run very fast there. But it has the disadvantage, for example, that you can
only run a CPU type in the guest that is very similar to the CPU type of the host. So for example you can use a POWER8 guest CPU on a POWER8 host, and I think it's also possible to use a POWER7 guest CPU on a POWER8 host with some compatibility features.

The other kernel module is the so-called KVM-PR kernel module, where PR stands for the so-called problem state bit in the machine state register; this bit says that the CPU is running in user mode. How this works is that the virtual machine is actually run in user mode. The KVM module switches to user mode, and as long as you only run user mode instructions in your virtual machine, they can be executed directly. Now if you run a privileged instruction, a kernel mode instruction, in the guest, then the virtual machine is exited and this privileged instruction is emulated in the host, either in the kernel module or in QEMU.

This is of course a little bit slower if your guest is using a lot of privileged instructions, but it has the big advantage that this kernel module runs on more or less every kind of host CPU. And because the user mode instructions are more or less the same on all Power architecture CPUs, you can even run different guest CPUs on your host; for example, you could run an embedded guest CPU type on a server CPU host. It has some restrictions, of course: you cannot run a 64-bit guest on a 32-bit host CPU. But apart from that it's much more flexible, and it has the nice advantage that you can use it for nested virtualization. You cannot use the KVM-HV kernel module for nested virtualization, but if you want to run a guest in a guest on a host, then you can use the KVM-PR kernel module.

So that's all I wanted to say about KVM. Looking at QEMU... yes, a question? Actually, I'm not sure about the KVM-PR kernel module, but with the KVM-HV kernel module you can run little-endian guests on big-endian hosts and vice versa.

So, looking at QEMU: because of the big variety that you've seen
already, it also supports quite a big number of CPU types and also a big number of machine types or, in my previous wording, platform types. If you want to run a normal server VM on a server host, you should of course use -enable-kvm, and also -cpu host and -machine pseries. Now the good thing is that if you use -enable-kvm, -cpu host is the default, and -machine pseries is also the default, so you can simply omit the last two parameters and you normally still get what you want.

Something that is different from the x86 world is that you also get a set of devices that are specific to this PAPR platform. As I said earlier, the PAPR standard defines what a paravirtualized network interface and a paravirtualized SCSI interface should look like, so QEMU offers these devices, too. It's important to know that if you run QEMU without special parameters, or if you use for example the convenience option -hda, you get one of these devices, the PAPR devices. That's good, because older guests that do not support virtio yet normally support the PAPR standard, so they can work with these devices immediately. However, since the PAPR standard has not been designed with Linux in mind, and since these devices did not get as much attention as the virtio devices yet, if you really want performance you should rather use virtio-net, virtio-block and virtio-scsi devices instead, if possible, when you create new virtual machines. Everybody knows what virtio is? Anybody who does not know? OK, great.

If you are a little bit familiar with KVM on Power already, this is a slide for you: this is what has changed within the last year, what's new. We had a lot of cross-endianness fixes, so it should now work pretty well to run a little-endian guest on a big-endian host and a big-endian guest on a little-endian host. We've got a new feature called dynamic micro-threading, which is also called dynamic split core. These POWER8 chips are
eight-way hyper-threaded, so each core has eight hyper-threads, and dynamic micro-threading uses these hyper-threads in a slightly better way. For example, if your guest does not use hyper-threading but has eight real cores, the scheduler can use one host core with its eight hyper-threads to run them all at once. We've now got hot-plugging of PCI devices and memory; it's partly still work in progress, but the basics are there. And we've got a set of new hypercalls, for example the H_RANDOM hypercall, which allows you to pass high-quality random number data to the guest so that the guest has a good source of random numbers, too. It's an alternative interface to what you also get with virtio-rng, if you know that.

So we've had a look at KVM and we've had a look at QEMU. Now I want to show you another difference to the x86 world, and this is the firmware, or the BIOS, that is used in a virtual machine to boot the operating system. In this case it's a firmware called SLOF, which is an implementation of the so-called Open Firmware standard. It is used because the PAPR specification mandates the Open Firmware standard.

Open Firmware was originally a standard introduced for the Sun SPARC machines, later also adapted to PowerPC and ARM, and I think there was even a version for x86 used by One Laptop per Child. It was an IEEE standard in the 1990s; however, that standard did not get extended, so it's considered abandoned nowadays. It's based on the Forth programming language, and you might wonder, why Forth? It's a stack-based language, and it has the big advantage that you can build an interpreter with a very small footprint, so you could put a whole interpreter into a small ROM, which was important in the 1990s. Have you already been to the IT museum here? I can recommend it. There's, for example, one 8-bit machine which has a Forth interpreter built in as its interface, too, so it made me smile.

So QEMU now uses this Open Firmware interface and
uses the SLOF firmware, which was originally developed as an alternative firmware for some of the IBM servers and was later open sourced for the Terra Soft PowerStation in 2008. The PowerStation was a last attempt to continue with desktop PowerPC machines after Apple said, oh, we are doing x86 now instead. That was never really successful, but at least the firmware has been open sourced for this computer. So when a firmware was needed for the pseries machine of QEMU, this open source release was adapted to run in a QEMU partition instead.

Some features of this firmware: it supports more or less all the typical devices. You get VGA support, you get USB support, even xHCI support. The typical block devices, SCSI and virtio-block, are available. One nice thing is that you even get support for some file systems, so it has support for FAT and ext2, for example. So if you want to boot a file from the USB stick that you plugged in, you can do it immediately without additional boot loader magic. Of course it also allows you to boot via the network, so it supports the typical network devices, virtio-net and also the PAPR network interface. And beside the normal IP version 4 support, IP version 6 support is almost there; there are a few things still left to be done, but that should be doable this year, I think.

When you start your partition, or your guest, your VM, the first output that you normally see is something like this here. The interesting thing is that it says "press s to enter Open Firmware", and then you end up at the SLOF, or Open Firmware, prompt. Now I want to show you a little bit of what you can do there. I'm trying to do it live; I hope it will work. Is it big enough? Can you also read it from the back? OK, great.

So it says "press s", so I should do that. Now the firmware is booting up. It's terribly slow here because it's running on my laptop in emulated mode, so
it's running without KVM. Of course, if you run it on a real machine it just goes flip, and you have to be really fast to press the s key before the operating system is booted. The firmware is written for the most part in Forth, and this gets interpreted during each boot, so it's also not as fast as a firmware that is completely written in C. But it's still usable in an emulator, so that's OK.

So now, do you see it? OK, should be OK, I hope. Now we are at the firmware prompt. What can you do here? You probably have never been here before. You can do everything that you could do with Forth. Forth is a stack-based programming language, so I can say 3 and this puts a 3 on the stack. And as you can see here... it's a little bit hard to see, maybe I go back; I think that's a little bit better. So you can see that the prompt has changed, and the number here says how many items you've got on the stack. So now I have a 3 on the stack. I can put a 4 on the stack, and now I have two items on my stack. If I want to see what's on the stack, I can use for example .s, and it prints out 3 and 4. I can use +, which takes the two topmost stack items, adds them together and puts the result back on the stack, so I've now got only one item on the stack. And if I just use a single dot, that prints out the topmost stack item. So 3 plus 4 is 7, for example. That's just the very basics of Forth; if you want to have a closer look, I can recommend some reading later.

The more interesting parts are maybe analyzing the device tree, for example. The device tree is a structure that tells the operating system later what the system looks like: for example, where can I access the PCI BARs, which paravirtualized devices are available, and stuff like that. To have a look at the device tree, or to move around in the device tree, you use the dev command and then you say which node you want
to go to. For example, I go to the root node and say dev /, and then there's ls. So it's a little bit like a shell: list, and it shows me what the device tree looks like. You can see for example that we've got a paravirtualized SCSI interface here at this address, and there's currently only one CPU in this virtual machine. Now I could for example have a closer look at the properties of that CPU if I go to that node, and then there's the .properties command. Oh, by the way, we've got tab completion, so if you press tab twice you get a list of supported commands. And there you go. So in the CPU node you can for example have a look at how big the caches are, or what frequency it uses, and stuff like this. This is sometimes good for debugging your boot process if something is not working as expected.

Now, if you're done with analyzing the system, the most interesting thing is of course to boot the system. If you want to know where you can boot from, for example if you do not want to boot from the default disk because the default installation has been corrupted and you want to boot from another disk, then it's good to know which devices are available. For this I recommend the so-called devalias command. It prints out a list of aliases to the device tree nodes, and the SLOF firmware adds an alias for each device tree node which it thinks it could boot from.

So for example, if I want to boot from the CD-ROM, I simply say boot cdrom, and it says "load failed", because the CD-ROM image I've added to my virtual machine does not contain a bootable CD-ROM. But I know that the CD-ROM image that I've added to my virtual machine contains a file which I could boot. So since SLOF now has support for file systems, I can specify that file here and say... oh, how did I call it? I think it was test.fs. This is a little Forth script, and yes, it booted the Forth script and says hello. Well, thank you.

So apart from
that, there's also support for network booting. If you've configured a TFTP server, it gets the information from the server automatically, but you can also specify the parameters manually. The first parameter, for example, is the TFTP server address, then you say the file name, and then you say what the IP address of the client should be. In this case I'm using another one, which should also work; for example, now I used the 0.16 address. So yes, we finally booted our virtual machine, our virtual Power system, and now everything is running.

And that's pretty much it. Do we have any questions? I have some minutes left. Yes?

So the question is, what's the real advantage of using Power hardware compared to Intel? I would say I'm not the right person to answer this question; better go to a sales person instead. If you want to have an unofficial answer from me, I would say there are a couple of advantages. For example, these Power chips have a high-speed interface where you can attach GPUs, and there are certain workloads which really run faster on Power, as far as I know. Do you want to have a microphone, maybe?

The advantage of Power is that the threads in POWER8 perform quite a lot better than the hyper-threading Intel has. IBM did release some performance numbers where they had SMT 0, which means no threading at all, and you had a certain amount of performance. Enabling SMT 2, with two threads, it was a little bit up, and up and up, and with SMT 8 it was actually twice the performance compared to without. But that requires that the application is written completely for multithreading, so it can actually benefit from it. And when it comes to virtualization, there are some unofficial reports I've heard from IBM where they actually put up a fully specced machine and started a new VM every second, and they stopped at around 1000 VMs. So, say, in a production
environment, you can run 200 VMs on a quite OK-specced machine. You cannot do that in the same price range with Intel, so there's a benefit. And you mentioned the hardware interface: you have the CAPI interface, which is actually a way of, kind of, PCI out to an external device. IBM replaced half a rack of Intel servers running a Redis database with four units. Two units were a POWER8 machine and the other two units were a disk array with 40 terabytes of SSDs. With this CAPI interface they could get such performance that working with the disk was like working with RAM, which meant that those four units replaced half a rack for the Redis database, and they had better performance. So that's just a few vague indications. And at DevConf, I think last year, Jeff Scheel, who is the chief architect for Linux on Power, had a presentation here where he told much more about these things.

Yeah, I think that presentation is online on YouTube; I've seen it.

Yes, you can write Forth code. That's incredible, isn't it?

Yeah, you mean... OK, if you're happy with the normal IBM systems, you can of course also buy those. With this PowerKVM, or the KVM on Power stuff, you can only run Linux guests; for example, you cannot run AIX on these machines yet. But these machines are of course also a little bit cheaper. So if you are happy with Linux running on Power and you already have some knowledge of how to manage a KVM-based system, then these machines are the ones you want to have. But if you have traditional AIX workloads, then of course you need the traditional systems instead.

So, I am repeating a little bit for those who watch it later online: the PowerKVM offering, so to say, is also meant to attract some people who are
used to the Intel machines, you could say, right? The people who are used to KVM on Intel machines, so that they can easily move over.

And yes, we're pretty much out of time. Last one... I mean, it's still supported, you can still buy it. OK, yes. Thank you. If you have more questions, you can mail me. [inaudible]
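
As a rough illustration of the stack demo at the firmware prompt earlier in the talk (put 3 and 4 on the stack, show the stack with .s, add with +, print with a dot), here is a small Python sketch of how such a stack-based evaluator behaves. The function name run_forth and the output formatting are assumptions made for this illustration only. It is not how SLOF is implemented, and it handles nothing beyond decimal integers and the three words from the demo.

```python
def run_forth(source):
    """Evaluate a tiny Forth subset; return (printed lines, final stack).

    Hypothetical helper for illustration, not part of SLOF or QEMU.
    Supported words: integer literals, "+", ".s" and ".".
    """
    stack = []
    output = []
    for word in source.split():
        if word == "+":
            # Take the two topmost items and push their sum.
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif word == ".s":
            # Show the whole stack without changing it.
            output.append(" ".join(str(item) for item in stack))
        elif word == ".":
            # Print and remove the topmost item.
            output.append(str(stack.pop()))
        else:
            # Anything else is treated as a decimal number (an assumption).
            stack.append(int(word))
    return output, stack


if __name__ == "__main__":
    lines, remaining = run_forth("3 4 .s + .")
    for line in lines:
        print(line)
    print("items left on stack:", len(remaining))
```

Running the sketch with the words from the demo prints the stack contents first and then the sum, and leaves the stack empty, which mirrors the item counter in the prompt going back down.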