Good morning, and welcome to my talk on QEMU in UEFI.

So who am I? I'm Alexander Graf. I'm a KVM and QEMU developer working for SUSE, and I've done things like, for example, the server-class PowerPC port, or nested SVM, in case you've been running virtual machines inside virtual machines on AMD, and a number of other things. I was one of the early members of the SUSE ARM team, so I've been involved a lot with ARM recently, and I did things like UEFI in U-Boot on ARM, with the whole UEFI story in general. And this work happened because I happened to know both ARM and UEFI.

To figure out what the differences are, imagine you have two systems. You have an ARM system, this is AArch64, and an x86 system, this is x86-64. We're not talking about 32-bit anymore. If you take these systems, plug in a graphics card, attach a monitor, and then boot them up, what happens? Well, your x86 system just works, right? You plug in the graphics card, turn it on, and you see the nice logo right there; it's shiny and great. The ARM system boots and basically shows nothing, because it doesn't know how to talk to that graphics card.

So why does your x86 system know how to talk to that graphics card? Well, here's the answer. The graphics card itself has a small storage space on it, which provides a driver for the firmware that's running on your x86 system. These days that's usually UEFI. And that firmware driver is an x86 binary blob. So on ARM it doesn't work.
So a while back, about a year ago, for roughly four days, Ard and I sat down together, because a Linaro member had started complaining: they had just recently acquired another company that produced network cards, and they couldn't use their own network cards in their own ARM servers. They complained about that fact, which I can certainly understand. So we sat down together and figured out what we could do to work around that whole issue. There have always been ways to work around this limitation, but nothing ever got to a place where you could actually use it.

To understand why this really is an issue, let's first take a look at what the difference really is between an x86 system and an ARM system. If you look at it from a 10,000 feet perspective, this is one of those x86 CPUs, and you can see registers. Registers are basically small storage entities in your CPU that you can put data into, integers, numbers, and then you can do operations on them. If you want to do any operation, like an add or anything, it has to happen inside of these small pockets, and then you can combine them, for example with an add. So any operation happens on registers, and as you can see, x86 has vastly different register names and a different number of them; everything is just different from ARM.

If you, for example, just want to copy data, you want to copy the contents of one register into another register. Ignore the names; you still just want to do the same operation, copy something from one register to another. The operation is even called slightly differently, but down here you can see the instruction stream, the hex codes of what the CPU reads, and you can see that they look vastly different. There's basically no way to run one on the other.
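To make the "vastly different hex codes" point concrete, here is a minimal sketch that encodes a 64-bit register-to-register move for both architectures. The function names are made up for illustration; the encodings follow the x86-64 `MOV r/m64, r64` form (REX.W prefix, opcode 0x89, ModRM byte) and the AArch64 `MOV Xd, Xm` alias of `ORR Xd, XZR, Xm`. The slides in the talk may have shown a different pair of registers.

```python
# Illustrative sketch only: helper names are hypothetical, not from any library.

def encode_x86_64_mov(dst: int, src: int) -> bytes:
    """Encode `mov dst, src` for 64-bit registers numbered 0..15 (rax=0, rbx=3)."""
    if not (0 <= dst <= 15 and 0 <= src <= 15):
        raise ValueError("x86-64 has 16 general purpose registers")
    rex = 0x48 | ((src >> 3) << 2) | (dst >> 3)    # REX.W, plus REX.R / REX.B for r8..r15
    modrm = 0xC0 | ((src & 7) << 3) | (dst & 7)    # mod=11: register-direct operands
    return bytes([rex, 0x89, modrm])


def encode_aarch64_mov(rd: int, rm: int) -> bytes:
    """Encode `mov x<rd>, x<rm>` (alias of `orr x<rd>, xzr, x<rm>`) for registers 0..30."""
    if not (0 <= rd <= 30 and 0 <= rm <= 30):
        raise ValueError("this alias only covers x0..x30")
    word = 0xAA0003E0 | (rm << 16) | rd            # fixed-width 32-bit instruction word
    return word.to_bytes(4, "little")              # instructions are stored little-endian


print(encode_x86_64_mov(0, 3).hex())   # mov rax, rbx -> 4889d8
print(encode_aarch64_mov(0, 1).hex())  # mov x0, x1   -> e00301aa
```

Same operation, but three bytes with a variable-length encoding on one side and a fixed four-byte word on the other, with no bit patterns in common that a CPU of the other kind could make sense of.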
And to make one thing work on the other, you need to do something. There is this really nice thing that happens to already implement a way to run x86 instructions on your ARM CPU, or really on any CPU. Let's get to that part.

What is QEMU? QEMU emulates machines, usually. It emulates a system, but what actually comprises your system, and what does system emulation mean? In system emulation, QEMU emulates the whole stack of everything that is your computer: it emulates your CPU and it emulates your hardware, every single device, everything you have. And because it emulates your CPUs, it also emulates every instruction running on these CPUs, which means it emulates your kernel that's running on that virtual hardware, and it emulates your user space applications that are running on that virtual hardware. That's what we call system emulation in QEMU. There's a slight twist to that whole thing, where you can take the CPU emulation part out and push that onto your host CPU, and that's what KVM is about. But in the traditional sense, system emulation basically just emulates the whole system: everything, hardware, the whole stack.

Then there's something called linux-user emulation. With linux-user emulation, you basically draw the line where you start to emulate way higher up the stack. QEMU goes and emulates the kernel to user space interface, what is called the syscall interface. It basically only converts the structures that need converting from one side to the other, and the numbers that need converting from one side to the other, to be able to run, say, a PowerPC big-endian guest binary on your x86 host system. So it has a lot of conversion code in it to convert the structures and to convert syscall numbers, and then it just invokes the same call that was invoked by the guest binary on the host Linux kernel.
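As a rough illustration of the kind of conversion user-mode emulation has to do, here is a small sketch. It is not QEMU's code: the function names are hypothetical, the table covers only three Linux syscalls (exit, write, getpid, with their PowerPC and x86-64 numbers), and the structure example assumes a 32-bit big-endian PowerPC guest with a classic two-field `timeval`.

```python
import struct

# Linux syscall numbers for a tiny subset: PowerPC guest number -> x86-64 host number.
PPC_TO_X86_64_SYSCALL = {
    1: 60,    # exit
    4: 1,     # write
    20: 39,   # getpid
}


def translate_syscall_number(guest_nr: int) -> int:
    """Map a guest syscall number to the host's number for the same call."""
    try:
        return PPC_TO_X86_64_SYSCALL[guest_nr]
    except KeyError:
        raise NotImplementedError(f"guest syscall {guest_nr} not in this toy table")


def convert_timeval(guest_bytes: bytes) -> bytes:
    """Convert a guest timeval (two big-endian 32-bit longs, an assumption of this
    sketch) into the host layout (two little-endian 64-bit longs)."""
    seconds, microseconds = struct.unpack(">ii", guest_bytes)
    return struct.pack("<qq", seconds, microseconds)


guest_tv = struct.pack(">ii", 1, 500000)   # what the guest binary wrote to memory
host_tv = convert_timeval(guest_tv)        # what the host kernel expects to read
print(translate_syscall_number(4), len(guest_tv), len(host_tv))
```

The point is that both the call number and the memory layout of arguments differ between guest and host, so each syscall needs its own small piece of translation before the host kernel can be invoked.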
Which is much more lightweight, much faster, and actually a more elegant way to run code. But that's not what we need, because in firmware this doesn't help us at all; firmware is different. In firmware we have this thing that's just running on our hardware and it does something.

So who of you understands what UEFI really is doing? That's way more hands than I was expecting. Awesome, very good. This is the right audience.

So what is UEFI? When you power on your hardware, UEFI is the first thing, well, not the first thing, but the first thing that you see come up on your PC. You boot up your PC, there's a small piece of initialization code, and then there's your firmware, your actual big firmware, where the UEFI core parts go and run. And those things have drivers. They have drivers for your block device, for your network adapter, for your graphics card, for your CD-ROM; for everything you have in your system there are drivers inside your firmware. These drivers then connect to the UEFI core and can even talk to each other. They're arbitrary drivers that can communicate.

But they can not only communicate with each other. You can even do fancy things like use your block device to load a payload like, say, GRUB, then have GRUB access that same storage back end to load another payload like, say, Linux, and then hand over control of the system to Linux itself. In Linux there's a special mode: when Linux boots up, it basically removes most of the UEFI driver parts and just leaves a small piece called runtime services. That's the UEFI ten-second walkthrough.

Let's take a deeper look at what happens when a driver gets initialized. If you have a driver for, say, some device, that driver on startup calls a function called InstallMultipleProtocolInterfaces and passes a struct to that function. Inside that struct there are multiple fields.
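A toy model of that registration call, written in Python purely for illustration, might look like the sketch below. The class and method names are hypothetical and only mirror the UEFI service name mentioned in the talk; the real interface is a C API that works on handles, GUIDs and interface pointers. The struct is modeled as a mix of callbacks and plain data, which is the property that matters for the rest of the talk.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DriverBinding:
    """The struct a driver hands to the core: callbacks mixed with plain data."""
    supported: Callable[[str], bool]   # "can I drive this device?"
    start: Callable[[str], None]       # "go drive this device"
    version: int                       # just a number, not a pointer


class ToyCore:
    """Hypothetical stand-in for the UEFI core's handle/protocol database."""

    def __init__(self):
        self.handles = {}              # handle -> {protocol name: interface}
        self._next_handle = 1

    def install_multiple_protocol_interfaces(self, handle, *pairs):
        """Attach (protocol, interface) pairs to a handle, creating it if needed."""
        if handle is None:
            handle = self._next_handle
            self._next_handle += 1
        self.handles.setdefault(handle, {}).update(pairs)
        return handle

    def locate_handles(self, protocol):
        """Return every handle that has the given protocol attached."""
        return [h for h, protos in self.handles.items() if protocol in protos]

    def connect(self, device):
        """Ask each registered driver whether it supports the device, then start it."""
        for handle in self.locate_handles("DriverBinding"):
            binding = self.handles[handle]["DriverBinding"]
            if binding.supported(device):
                binding.start(device)
                return True
        return False


core = ToyCore()
started = []
nic_driver = DriverBinding(supported=lambda dev: dev == "pci:nic",
                           start=started.append, version=0x10)
core.install_multiple_protocol_interfaces(None, ("DriverBinding", nic_driver))
core.connect("pci:nic")
print(started)
```

Note that nothing in the stored struct tells the core which fields are function pointers and which are data; only the protocol's specification does.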
There are fields for checking whether a device is compatible with this driver and for starting the driver on a certain device, but the struct also contains fields that are just numbers or pointers or something random. Once it's installed, the UEFI core just remembers that there is a driver.

Now something like a PCI bus driver runs and finds that there is a PCI device. It creates a PCI object and then asks every driver whether it actually supports that PCI device. One of them can then say: yes, sure, that PCI device is one I can drive. The core can then go and say: sure, now you get started. Then the driver goes and creates a block device inside the UEFI object space, again using the call InstallMultipleProtocolInterfaces, with which you can basically create an object or attach protocols to an object. Protocols again are sets of callbacks that you can look up; you can find, for example, all devices in the system that support block-device-like callbacks, and this object includes one of those. But again you can see that you have multiple callback function pointers and data fields, and you don't know by looking which ones are which; it's just part of the specification what a particular protocol includes.

A payload can then, for example, find objects that implement block interfaces and call them. GRUB, for example, can go and say: please give me all block devices. It gets back all the objects that have a block protocol attached to them and can then call a read function on those, which means you can read data from the block device.

Now how do you put all of that into a virtual context and emulate things? What are the problems when you're trying to emulate code like this? Well, for starters, the biggest issue is that different CPU architectures have different application binary interfaces, ABIs, that are the standard for them. The ABI defines things like the structure
layouts and where things are, and things like endianness: which way around you write bytes into your memory. If you take a look at this particular block structure in memory, then on your x86 system it looks the way it does on a little-endian 64-bit system: everything is 64-bit aligned, and you have just a couple of pointers that are one 64-bit number after another. Well, it turns out that on an AArch64 system it looks exactly the same, because the ABI is compatible in this regard between the two architectures. If this were a 32-bit architecture, everything would fall apart, because pointers would be 32 bits long. But because those two architectures are so similar, the structure just happens to be identical.

Then there's padding. Different ABIs again have different standards on how to pad things, how to make sure that a pointer, which is the thing we see down there, the reset function callback, starts on a natural boundary. A pointer in the 64-bit case is 64 bits long, which means it needs to start on an 8-byte boundary, which it does here on x86: we take a 32-bit number, write it to memory, then 4 bytes of memory stay unoccupied, and then we write the pointer. Well, it turns out that on AArch64 it's exactly the same, so that part of ABI compatibility is essentially assured as well.

In a nutshell: imagine you had a CPU that could run ARM code and x86 code at the same time. Everything they keep in memory would look identical; you could basically pass data from left to right without having to convert it. There are corner cases where that's not really a hundred percent true, but at least for everything we care about in UEFI, that works. Basically, the ABIs of those two architectures as they are used in UEFI are a hundred percent compatible for integers.

So that means that in theory, imagine we had an ARM GRUB running,
calling into an x86 read function: because the structures are the same, the ARM GRUB can actually find the pointer to the read function, because it knows exactly at which offset the read function's pointer lives. So it can call it.

But what happens on a call, really? I mean, what happens in the CPU when you're trying to call a function? Function calls are actually pretty simple. You have a function and you have arguments to it, and what happens is that you basically just take all those arguments and put them into registers, which again is governed by the calling convention you have. Then you save the address you came from, and you change the address you are executing at to the new address that you're trying to run from. That's all there is to a call.

But imagine that the instruction you're trying to execute now is an x86 instruction, because we loaded an x86 option ROM into our ARM system. You saw earlier that the instruction sets are completely orthogonal, completely different, so you couldn't just execute the code; it wouldn't work. So we have to do something to actually get us into an emulator at that point.

But how do we get into an emulator? We cannot just go and change all the structures to call into something different, because we don't even know which fields of the structures are pointers and which ones are data; the structures can contain any field at any position. Well, it's not fully random: we could create a huge emulator that knows every single protocol there is in this world, knows all the structures for every single protocol there is in this world, and then patches them to replace function pointers with function pointers that point into the emulator. But that would be an insanely high effort.

So basically our trick back then was: why don't we just use this amazing feature called NX, no-execute? CPUs have a feature called NX that basically allows you to, for certain regions in your memory space,
say: yes, this region can be read and written, but as soon as somebody tries to execute code from that region, just don't run it; tell me about it. This is usually a security mechanism for user space applications. Imagine you have an exploit that tries to put code on your stack. This mechanism is used to protect the application from such an attack, because the application can then say: I will never execute my stack, just mark my stack as no-execute. So if somebody tries to do a stack attack, you get a segfault instead of malicious code being executed.

But in this case we can misuse that feature and just say: every time we load an x86 binary, we mark the region of the code that is x86 as no-execute, which gets us a trap. And on a trap we can do whatever we want. Now we do know: oh wow, this application was trying to execute x86 code. Then we can do something about it. We can swap into our emulation stack and execute an emulator at this point, which then reads the x86 instruction stream and makes sense of it.

But to do that, well, we were just calling a function, so what do these registers actually mean on the other end, which is x86? It turns out x86 again has an ABI which is very different from the AArch64 ABI for function calls, but it's convertible. So what we can do is take all the registers that are used as function call arguments and move them from the AArch64 side to the x86 side. Then there's a special thing on x86 where you can only fit four parameters into registers, so the fifth lives on the stack; well, you can just put it on the stack, that's fine. Then we also put in the return pointer and execute the code in the emulator, because now we can simply execute x86 code.

But imagine that x86 code is now trying to call ARM code, because it, for instance, wants to allocate memory. You can actually do the same trick. Well, it's
not exactly the same, but a very similar trick. Our emulation code knows which regions are x86, because it registered them in the first place; it knows exactly where it put those x86 binary code pieces. So what it can do is say: oh well, this x86 code was trying to jump to an address that is outside of the regions I'm actually supposed to emulate. So it raises a trap internally, where we can just do the ABI conversion backwards again: take all the registers from the x86 state, put them into ARM state, and execute the function call on the ARM side.

What that gives you is a fully chained mode of operation for calls. You can have ARM code calling into x86 code, the driver, which calls into core UEFI code, memory allocation, and the switch-over between where we emulate and where we do not emulate always happens at a function call boundary. So you basically now have a system where every single function could potentially be either x86 or ARM code, and they both communicate with each other and just live happily together.

Does this really work?
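The argument shuffling at that boundary can be sketched roughly as follows. This is a simplified model, not the emulator's actual code: the names are hypothetical, it assumes integer arguments only, it uses the AAPCS64 argument registers x0..x7 on the ARM side and the Microsoft x64 convention that UEFI uses on x86-64 (rcx, rdx, r8, r9, then the stack above 32 bytes of shadow space), and it copies all eight ARM argument registers because the trap handler cannot know how many arguments the callee really takes.

```python
from dataclasses import dataclass, field

X86_ARG_REGS = ("rcx", "rdx", "r8", "r9")   # first four integer arguments
SHADOW_SPACE = 32                           # bytes the caller reserves above the return address


@dataclass
class X86State:
    regs: dict = field(default_factory=dict)
    stack: dict = field(default_factory=dict)   # address -> 64-bit value


def must_emulate(target, x86_regions):
    """A call target inside a registered x86 image has to run in the emulator."""
    return any(start <= target < end for start, end in x86_regions)


def arm_call_to_x86(arm_regs, x86_stack_top, return_marker):
    """Build the x86-64 entry state for a call that trapped on the ARM side."""
    args = [arm_regs[f"x{i}"] for i in range(8)]
    state = X86State()
    for reg, value in zip(X86_ARG_REGS, args):
        state.regs[reg] = value
    stack_args = args[len(X86_ARG_REGS):]
    # Leave room for shadow space and stack arguments, 16-byte aligned, then
    # place the return address below it, as a `call` instruction would have done.
    aligned = (x86_stack_top - SHADOW_SPACE - 8 * len(stack_args)) & ~0xF
    rsp = aligned - 8
    state.stack[rsp] = return_marker
    for i, value in enumerate(stack_args):
        state.stack[rsp + 8 + SHADOW_SPACE + 8 * i] = value
    state.regs["rsp"] = rsp
    return state


arm_regs = {f"x{i}": 10 + i for i in range(8)}
x86_regions = [(0x4000, 0x6000)]            # hypothetical address range of a loaded option ROM
if must_emulate(0x5000, x86_regions):
    entry = arm_call_to_x86(arm_regs, x86_stack_top=0x10000, return_marker=0xDEAD)
    print(entry.regs)
```

The return marker stands for an address outside every x86 region: when the emulated code returns to it, the emulator notices it has left the regions it is supposed to emulate and hands the result back to the ARM caller, which is the same mechanism used for x86 code calling ARM functions.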
Yes, it does work. This is me in front of a monitor, displaying a picture using an NVIDIA PCI graphics card on an ARM machine. This is the firmware; I mean, this is GRUB, by the way. So this is an ARM system displaying something using an x86 driver right off the graphics card.

So why did we use TCG for this? There are a couple of reasons for using TCG. The biggest one for me, for starters, was C. There are a couple of alternatives, like Bochs for example, which is C++ code, but if you look at EDK2, which is the core implementation of UEFI...

Right, Andreas was just saying that I should explain what TCG is, which I completely concur with; the term is not quite obvious. TCG is the emulation engine of QEMU. QEMU used to have a different emulation engine and, about 10 years ago, switched over to something called TCG, the Tiny Code Generator. So this is basically the emulation core of QEMU.

So why did we use that part? Well, it's C code, which fits in really nicely with EDK2. If you were trying to put C++ code in, you would suddenly have to start worrying about things like exception handling, which you have to fit into the C model somehow, so with EDK2 that gets complicated.

TCG is also very well isolated, and it was originally written exactly for the purpose of taking an emulator into different projects. It was originally based on the TCC project, the Tiny C Compiler, which was basically able to compile code within seconds, so you could always compile your own code.

And then there's another really big one: most of the core code is LGPL, and actually everything we use in this project is LGPL, which means that we don't get into real issues with GPL code in an environment that is basically very closed source. I don't want to even remotely get into lawsuits with people complaining that their drivers, or whatever, now need to be open source because the firmware can be considered one big
entity together with the emulator.

And TCG supports x86-64. That sounds trivial, but if you look around at such emulators, you'll find plenty of emulators for 32-bit x86 and hardly any for 64-bit x86. TCG is really one out of three emulators.

So how do you use this thing? I guess every one of you has their own ARM system by now, with PCI Express of course, and you want to put the emulator on it and just try it out. Or maybe you just want to emulate a machine to run your x86 option ROMs, which you can do. There's a Git repository called X86EmulatorPkg, and it's very easy to compile it into the firmware that is used inside of QEMU. If you're running the QEMU virt machine, an ARM virtual machine, on an x86 host or on any other random host, then that firmware is basically the binary that's running while you're running that virtual machine. The description is also on the website, but you can see it's actually not hard: you really just do the same thing you would do if you were building your firmware manually, and you just also say: please also include this module over there.

Now you're going to see how it works for real, not just a picture of me standing in front of a screen, which I could have faked. The demo is going to be exactly what I explained: I'm going to run an ARM virtual machine on an x86 notebook, with an x86 option ROM that does network boot, using an emulated network card.

For starters, we're going to run the normal version without the emulator. So this is a stock virtual machine with a stock EDK2 firmware, exactly the way you would get it from a distribution. You can see that it boots up and basically drops you to a shell, but it didn't find
anything to boot. You can look at PCI and see that it found one e1000 adapter, so it found the PCI device. But if you look in the shell, you can see that there is no network device at all, because the normal firmware you're running does not contain a driver for the e1000; usually the virtual environment comes with virtio.

If we run the same thing again with a firmware that's compiled with the x86 emulator, you can see one major difference: the firmware actually goes and does the DHCP negotiation and then loads GRUB from there, which is an ARM GRUB. This is exactly the case I was explaining. This is an ARM GRUB running on an ARM system, this is an ARM virtual machine, but the whole network traffic, everything that was loaded from the network, goes through the x86 driver. The emulator is still in the path. And you can just use it as usual, because it feels like a normal driver; you don't even realize that it's an x86 driver. It's exactly as if the driver were a native one living in the firmware, and between the two you don't even see a difference.

Do you also have to take care of interrupts? So the question is: we're taking care of function calls and data; do we need to take care of interrupts? What do you mean by interrupts?
Yes, interrupts; maybe some of the drivers rely on interrupts.

So the question is basically: how do we handle interrupts? They are always abstracted away in any case. Even in the UEFI framework, a device driver does not have direct access to the interrupt controller; it doesn't even necessarily know what interrupt controller you have. So you always basically just register a callback that you want to have called when an interrupt arrives, and there's nothing keeping you from running the emulator in that callback. So you have to think about it, but the conclusion basically is that you don't have to do anything.

OK, so two questions. The first question is easy to answer. The question is: this works during boot time, basically while your system boots up, but does it also work during runtime, when Linux is running and has thrown away most of the UEFI code? UEFI can actually leave some services there and make them available to Linux. The answer is very simple. Let's defer the second question to later and first answer this one; I'm terrible at stacking if I get multiple things on top.

So, the first part. At that point we have basically removed the x86 emulator from the whole context, so we would now have ARM code calling x86 code for real. The good news is that it never happens, because there are basically no notifications or anything like that which would do it. So it happens to work by accident, but it is an accident, I do agree.

So the remark is: under normal circumstances, Linux never calls into an option ROM in the first place.
The only thing that could happen is that the driver registers something special for runtime services. That could happen, but I'm not aware of drivers that actually do that.

And the comment is that in the emulator we have code for this: the way a runtime driver would get initialized is that another PE binary gets loaded that is marked as a runtime driver, and we refuse to attach to runtime drivers. So it wouldn't even get loaded; we would just refuse to load that driver. It would simply not work, and we would not even try to emulate it. I do agree.

And yes, the other comment is that there's a special case for network drivers: they are not allowed to register runtime services anyway.

And for the U-Boot question: in general this would be totally possible, but in U-Boot we only implement interfaces that are facing UEFI applications; we implement very few interfaces that an actual device driver would use. So for iPXE we might make it work, but for a real network card that uses PCI and random callbacks which we don't even know about, this doesn't make sense, I think.

So you know what?
You get a microphone, if it works. There's a button that doesn't work. Let's do it now.

More questions?

So the question is: all of this should have been solved by EBC, EFI Byte Code; what happened to that? EBC is a great idea. It doesn't even have a compiler, as far as I know. It's essentially abandoned, so there are no option ROMs shipping with EBC. And the problem we're trying to solve is that you have existing devices that you want to run, and these existing devices will definitely have x86 option ROMs and basically zero chance of EBC option ROMs. So you could either work on vendors to ship an x86 option ROM and an ARM option ROM side by side, so that a native ARM system finds a native ARM driver, or convert to a completely weird, zombie-ish platform that nobody really cares about, which sounds to me like a dead end.

I can add a little bit to that. We had this discussion a while ago, about maybe having an EBC interpreter for ARM, because for years there were interpreters only for other architectures. At the end of the day, when they looked at it, most of the vendors said that if they were going to ship a driver with their hardware, they would just ask the hardware vendor for a native one. So you're only solving a small part of the problem; on servers there's not much of a need for it, and then you have the problem that you have to carry around two option ROMs, one for EBC and one for x86.

What is the increase in the size of the
firmware, because you added the emulator to the firmware?

It grows, of course. I don't have a good number at hand, to be honest, but it was certainly marginal compared to the overall size. I measured it before, but I don't remember the numbers.

And as a follow-up: before this gets upstream, there are basically two things blocking it. The biggest thing is that it needs a hook in the PE loader. When a binary gets loaded, we basically need to be able to say: I know that this binary is not meant for your architecture, but please load it anyway and tell me about it, and then I will fix things up for you. That interface is not in core EDK2, and getting it in is, from all I hear, very painful, because you first have to write about 200 pages of paperwork.

I think it's actually getting into the spec; that's what I heard.

If that works out, I'm more than happy to make a patch and get it in. So, time's up. Thank you.