 It's Friday. Welcome to Friday. Two weeks left to finish assignment three. People seem to be doing pretty well. We have an exam a week from Monday, and just one more week of content. So what I'm going to do is today we're going to talk about virtualization. So we'll talk about virtualization today. On Monday, we have a paper on para-virtualization, which is an interesting twist on virtualization. And a very common way of actually doing virtualization, for reasons we'll discuss today. On Wednesday, we're going to look at James Vickin's paper, which is on, really, you can think of it as a way to virtualize execution environments inside web browsers. So that's something that's pretty somewhat orthogonal. But I think it's a fun paper, and it would be fun to read. And then on Friday, we'll do some sort of review session for the exam, and then the exam will be online. OK, so there are four new groups out of 100 Club, believe it or not. I'm going to butcher most of these names, Krishabu and Girish. I think I saw Krishabu here. Congratulations, guys. Akash and Sagar, are they here? They came to class today. I should just put everyone in the 100-100 Group every day. That way you guys would actually come to class. Kartik and Ananda, you guys here? Don't see them? They didn't come to class. Pranav and Viral, Pranav is usually here. Congratulations, guys. So yeah, eight new people who know how to do assignment three. So find them and get them to help you. What's conspicuously missing from this list so far are undergraduates. We are still waiting for the first undergraduate group. The undergraduates outnumber the graduate students in this class. So far, they have decided to take a more relaxed approach to completing assignment three. Fine, that's fine. Thank you, Jerry. OK, so clearly we're going to need more A grades this year based on how you guys are completing the assignments. I'm really excited about how far people are. 
I think there's some other groups that are knocking at the door, and you guys have two more weeks. So keep it up. OK, so early in the semester, I used the Matrix analogy when we were talking about how operating systems work. And to some degree, that's true. But today we're going to peel off another layer of the onion and blow your mind a little bit more and talk about how we can actually run operating systems on hardware that's virtualized. So you guys have been doing this this semester if you've been using our virtual machine, and you may wonder how this works. Now, this is a topic that really deserves a large amount of time, maybe an entire semester. But I'm going to try to do a little bit of justice to it in just a couple of days. So what is virtualization? So up until now, we've been talking about operating systems running on and managing actual physical hardware resources. So running on what people in the server industry sometimes refer to as bare metal. And bare metal is real hardware resources. We assume that the operating system has exclusive access to these resources. It's managing an entire machine. And it uses these hardware interfaces directly to communicate with the other parts of this machine. Now, this is still a pretty common way for operating systems to work. Actually, it'd be a great thing to find out. I have no idea what the numbers would look like. I suspect that the percentage of operating systems that are running on real bare metal has been dropping. Maybe. Although with mobile devices like smartphones that are now running essentially mature operating systems, maybe that's helping tilt the balance. But there are certainly a lot more cases where this is not true today than there were 20 or 30 years ago. And that's because we can also run an operating system inside what's called a virtual machine.
And we make a distinction here between a virtual machine, which is something that looks like a machine, and the software that creates it. The analogy with virtual memory is not actually too bad here. And we'll come back to it later in the class. A virtual machine looks like a physical machine. And in fact, to perform full virtualization, which we're gonna talk about today, it has to look and behave exactly like the physical machine. But we use a different term to refer to the piece of software that creates a virtual machine. That we refer to as the virtual machine monitor. So VirtualBox, VMware, this is software that falls under the category of a virtual machine monitor. It implements the virtual machine abstraction. You could think of the virtual machine as another type of abstraction. So there's some additional terminology that we have to introduce in order to get this to work. And I don't have as many slides as usual today. We're gonna go slowly at the end to go through some of the examples. I wish I had more pictures, sorry. But this starts to get confusing. So if you have questions, please stop me and I'll try to go slowly and carefully. So the operating system that runs inside the virtual machine we refer to as the guest operating system. The operating system that runs the virtual machine monitor we refer to as the host operating system. There are now two operating systems running on the machine. The type of virtualization we're gonna talk about today is different than para-virtualization, which we'll talk about Monday. This is the type of virtualization you guys are probably used to as an end user. When you use something like VirtualBox you have your own operating system. Maybe it's Windows, sorry, or something else that runs the virtual machine monitor software, VirtualBox. That's an application that provides the virtual machine abstraction. This is even hard for me to talk about.
Inside that virtual machine you have the guest operating system. So in your case, if you're using the OS/161 VirtualBox image that we created for this class, that guest operating system is some version of Linux. I think it's the Lubuntu packaged version of Linux. I don't remember exactly what version it is. We created it a few years ago so it's a little stale. So does that make sense? Does this make sense to everyone? And again, hopefully you guys are familiar with this because you've been using some of this software this semester. You've used it at other times in other courses. So whatever is running on your machine is the host operating system. VirtualBox is the virtual machine monitor, which provides the virtual machine abstraction. Inside that virtual machine runs Linux. So there are now two operating systems, and potentially more. You could probably start up several different virtual machines to run several different operating systems. This is something I used to do. I don't really do it anymore because my students don't let me program anymore. But when I used to program I used to do this just to sort of keep things separate so you had different development environments for different projects. But you can certainly run more than one guest operating system inside the host operating system, but there is only ever one host operating system on a single physical machine. So obviously the virtual machine has to differ from the physical machine in a bunch of really crucial ways. So in order to do this safely, I cannot provide the guest operating system with the same level of access to the machine as I provide the host operating system. If I did that, then the guest operating system would be able to see things outside the virtual machine it wasn't supposed to see, it would be able to mess with other programs or whatever. Remember the operating system is this piece of software that seems to be really powerful. It seems to be fully in control.
But in order to get this to work, I can't allow it to do that. So for example, your Linux that is running inside VirtualBox cannot change the page table entries or otherwise muck with the parts of the system that are being controlled by Windows or whatever the host operating system is on your machine. So that's really important. The term I'm gonna use later in the lecture, and you see this in other places, is this idea of not allowing the guest operating system to pierce the virtual machine. I don't wanna let it get out. It shouldn't be able to see anything outside the virtual machine. So if I give VirtualBox a certain amount of memory to use to create a virtual machine, the guest operating system should be able to manage that memory and see that memory but not see anything else. And of course, these two things are related. I can't actually allow the guest operating system to run with the same privilege levels that it's normally used to. So remember, we talked a long time ago about how we bootstrap the privilege of the host operating system. You installed it on that machine and during boot, it starts running at a high privilege level and then lowers the privilege level before it runs other things. These guest operating systems are being run as applications, and to prevent them from interfering with the host operating system, we have to make sure that they don't have the same level of privilege that they would normally have. Okay, so the virtual machine monitor is, again, a piece of software. VirtualBox is just a program that runs inside the host operating system that can allow another operating system to be run as an application. And, this is the point, alongside other applications. And in order to do this, I have to create this illusion that the guest operating system has access to a virtual machine that behaves in a way that's analogous enough to how an actual physical machine would behave.
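To make the "manage that memory and see that memory but not see anything else" idea concrete, here is a toy sketch. The names (`GuestMemory`, `translate`, `VMEscapeError`) are made up for illustration, and a real monitor enforces this with page tables and hardware support rather than a Python bounds check. The only point is that every address the guest names gets checked against the slice of memory the monitor gave it.

```python
class VMEscapeError(Exception):
    """Raised when the guest names memory outside its allocation."""


class GuestMemory:
    """Toy model: the monitor hands the guest one contiguous slice of host memory."""

    def __init__(self, host_base: int, size: int):
        self.host_base = host_base  # where the guest's slice starts in host memory
        self.size = size            # how many bytes the guest is allowed to see

    def translate(self, guest_addr: int) -> int:
        """Map a guest "physical" address to a host address, or refuse."""
        if not 0 <= guest_addr < self.size:
            raise VMEscapeError(f"guest address {guest_addr:#x} is outside the VM")
        return self.host_base + guest_addr


# A hypothetical 256 MB guest placed at an arbitrary host address.
vm = GuestMemory(host_base=0x4000_0000, size=0x1000_0000)
host_addr = vm.translate(0x1000)  # inside the slice, so this is allowed
```

Asking for `vm.translate(0x1000_0000)` raises `VMEscapeError`, which is the toy version of not letting the guest pierce the virtual machine.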
And again, I mean, this is hopefully more proof, if you needed it, that the OS is just another program. I can run it inside this environment and it will behave properly, provided that I create the illusion of the virtual machine effectively. One of the reasons that I like to cover this at the end of this class is I think if you understand virtualization, if you understand what we're gonna talk about today and Monday, you understand operating systems. If you can understand virtualization and what goes on here, it really means you have some deep understanding of how operating systems work, because this, in a way, ties a lot of things together. So yeah, so remember when Neo meets the Architect and it's like, yeah, we've destroyed Zion like a gazillion other times, right? We thought we were special, right? Not anymore, right? There are a bunch of other matrices. Okay, so we're gonna get to the how question. You guys are probably wondering, how do I do this? How do I take a piece of software that's used to having exclusive access to the machine and allow it to run as an application? But let's talk about why we even got to this point. Why does anyone want to do this? We've been talking about operating systems all semester and I've been trying to convince you that they're fantastic, that they're great, they provide all these nice abstractions, they're really good at managing the physical resources of the machine. So why would I want to go to what's gonna turn out to be a fair amount of trouble to take all the powers that these operating systems have and confine them to this little virtual machine? So let's talk about some of the problems with OS environments. So the first one is something that I would refer to as hardware coupling. The first thing this means, for example, is that you can't run multiple operating systems on the same machine.
And on some level, this is something that maybe you only are going to do if you're a little bit geeky already and you're taking a class on operating systems so you're not a normal human being. But particularly when we start talking about para virtualization, this is a tool that's now used heavily to create what you guys know of as cloud computing, commodity computing. A lot of the load balancing and the elasticity and the other types of things that are provided by cloud computing providers are all based on this idea that I've broken the connection between a virtual machine and the physical hardware that it runs on. So when you purchase, for example, computing, how many people have used EC2? Okay, so let me give you guys a hint about how to get a job, okay? One of the things that you should do to get a job is use things that other people use. So EC2 is pretty prevalent and the best thing about EC2 is there's a free tier where you can play with it for a period of time and get a sense for how it operates. You guys should certainly know how some of these tools work and that they exist and what some of their capabilities are. Anyway, so go out and play around with EC2, it's kind of fun. But the way systems like EC2 work is you pay for computing as a commodity and then Amazon intelligently tries to use as few machine resources as possible to run your virtual machines to some degree of quality. And one of the ways they do that of course is they take a bunch of machines that look different from potentially different clients and they actually run them on the same physical hardware. So there's some big powerful 64 core machine sitting in some data center in Virginia and you've purchased this little micro instance from EC2 and it's running as this tiny little virtual machine next to a bunch of other virtual machines on top of this one big physical machine. And of course there's lots of fun things. There's lots of nice properties that provides. 
For example, it looks like a machine to you despite the fact it's sharing hardware with a bunch of other machines. So something else that virtual machines make possible, which of course we've exploited to provide you guys the development environment that you get when you get our VirtualBox image, is that they allow me to transfer setups between different machines much more seamlessly, because to some degree the virtual machine monitor is providing a more consistent environment than the underlying hardware would. And it turns out if you read more about this and delve more into this, what's happened actually is that the needs and interfaces of virtual machine monitors have really started to change how hardware is built. So do you remember last time we talked a little bit about the fact that these networking cards now have a bunch of different ports to allow them to be used simultaneously by multiple cores? Some of these features started because of virtualization. They started because it made it easier for a bunch of different virtual machines running on the same physical machine to share an underlying piece of hardware like a network interface card. And again, so we can also adjust hardware resources to system needs. And this essentially is cloud computing, right? So rather than having to buy a server and hope that that server can keep up with the demand of a particular application, I can buy a piece of a larger server and allow that piece to fluctuate and essentially be scheduled on a beefy hardware resource in a way that meets the needs of the applications that are running on it. And this makes much, much more efficient use of hardware. If I have 10 websites, for example, that are active at different times of the day, I don't need 10 different servers that are all underutilized most of the time.
Now I can have one server that's heavily utilized all the time by taking those 10 setups and collapsing them onto one piece of physical hardware. And if one of those websites suddenly blows up and starts generating a lot more traffic, I can move it to a bigger machine where it has access to more resources. So this is another nice thing I can do. And these are kind of the same thing, right? So the hardware coupling, and static up-front provisioning of machine resources, where it's hard to respond to demand. Okay, so another problem with operating systems, and this is something I've alluded to before, is application isolation. Operating systems leak a great deal of information between different applications, and in many ways applications running on the operating system compete for the same underlying physical resources. So if my database server is hogging all the memory on the machine, it may mean that the file server slows down. And this caused, you know, okay, so I'll get to this in a minute, this caused software vendors to start to require that in order to certify a particular piece of software, enterprise-grade software, it had to be the only thing running on the machine. Because if you call me up and you say, I paid all this money for, I don't know, some sort of network file server, and it's so slow, and then I realize that you're running eight other applications next to it that are causing it to slow down, I don't really feel like it should be my problem. But the fact that people started to require that you have this one-to-one mapping between services and machines really drove a lot of the need for virtualization in the first place, because I don't want to have to buy a new machine for every piece of software I want to run. There's also issues with software setups. You know, I may want to install certain versions of packages. I mean, this is largely, I think, now a solved problem.
Package managers work pretty well, but in the past, this was a little bit more of a pain. And certain applications may operate best with particular settings. Remember the whole exokernel argument, right? To some degree, a lot of the exokernel arguments and design principles haven't necessarily made their way into the operating systems that you guys use and the operating systems that run the services that you use. So I can take Linux and I can tune it to the needs of the file server or I can tune it to the needs of the database server, but if those aren't the same, then I have to make a choice between those two applications. If I have two virtual machines, where I can configure one to work very well with the file server and one to work very well with the database server, then I'm golden. And, okay, so I talked about this already. So virtualization also allows me to package and distribute an entire software environment. How many people have ever gone to the website of a tool or some sort of thing you were interested in experimenting with and it actually asked you to download a whole virtual image to play with it? There are tools that will do that. It's kind of cool. You have some sort of server software that requires a couple of moving pieces, and rather than giving you all these instructions, which might require that you have a particular machine that you can install Linux on and blah, blah, blah, they say forget it. Here's a VirtualBox image with all of our software installed. You download that, you fire it up and you can play with it. And then of course when you're done, you can just delete that if you don't like the software and it's gone. So there's no need to have an entire spare machine lying around in order to play with the new tool. So I mentioned this before, we can take one big set of hardware resources and dynamically divide it up.
Certainly we can do static provisioning up front but we can also do dynamic provisioning at runtime and use that to run a bunch of different virtual machines with different operating systems, different operating system configurations, different access control policies that might belong to different customers. So again, this is all just what makes cloud computing possible more than anything else. And of course when EC2 has servers that they wanna replace, they can take your application, to them it's just a virtual machine. They can take it, they have to tell you about it. Of course they say, oh no, we're taking this machine down, and they can migrate it. There's potentially some problems with quality of service during the migration, but they can migrate it to another physical machine. And so this allows them to continue to provide relatively seamless service even as they're adding machines to their data centers, decommissioning old machines, things break, stuff like that. So there's some nice properties of this for people who want to run server farms and other large computing resources. Okay. Any questions up to this point? Okay, so now let's talk about what it requires to be a virtual machine. And these requirements were actually outlined in 1974. Pretty impressive, right? That was a long time ago, a long time before this stuff took off. So the first thing is fidelity. So if I run software on the virtual machine, it should run identically to how it would on real hardware. And that should say VM, not VMM. So, modulo timing effects. There are going to be timing effects. It's gonna run slower, maybe, some of the time. It's fundamentally now competing for lower level hardware resources with other virtual machines, but whatever, that's okay. I'm just saying correctness. I shouldn't see differences in how typical applications run between a virtualized environment and a normal bare, I shouldn't say normal, because it's not normal anymore, a bare metal environment.
Performance. So up to this point, you guys may be thinking, what's the difference between virtualization and simulation? So here's the difference. Virtualization relies on a match between the instructions that are being executed inside the virtual machine and the underlying hardware. And in order to achieve good performance, what I want to do is make sure that as many instructions as possible can be executed on the bare metal. That's the goal. The goal is to allow applications and the guest operating system to do as much work as possible using the hardware directly. So most of the time, hopefully almost all the time, they're running hardware instructions and they're using the machine just like a normal application would. Compare that to a simulator. So for example, sys161, which you guys use, is a simulator. sys161 takes MIPS instructions and modifies its own internal state in order to reflect how those instructions would have run on a piece of hardware that's been obsolete for decades. I remember asking David once, it would be fun if we could give students the opportunity to run their OS/161 code on a real piece of hardware at the end of the semester. That was 10 years ago. And 10 years ago, he was like, I don't think we can find that hardware. Might have to go to some MIT garage sale or something like that. The MIPS R3000 hasn't been around for a while. So what we do is we simulate it. Now in a simulator, there's no need for the instructions that are being run inside the simulator to have any relationship with the instructions that the simulator uses to run on the machine. It can be totally different. The price you pay, and you remember me talking about gem5, which is the hardware simulator that we use, the price you pay is performance. Hardware simulators are incredibly slow because they're not able to use the actual underlying hardware to do work directly.
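Here's a toy sketch of what "modifies its own internal state" means for a simulator. The three-instruction set and the register names are invented for this example. This is not MIPS and it is not how sys161 is actually written. Notice that every simulated instruction costs a whole trip through the decode loop on the host, which is where the slowdown comes from.

```python
def simulate(program, registers):
    """Interpret a toy program one instruction at a time, entirely in software."""
    pc = 0  # simulated program counter
    while pc < len(program):
        op, *args = program[pc]
        if op == "li":            # li rd, imm: load an immediate value
            rd, imm = args
            registers[rd] = imm
        elif op == "add":         # add rd, rs, rt
            rd, rs, rt = args
            registers[rd] = registers[rs] + registers[rt]
        elif op == "halt":
            break
        else:
            raise ValueError(f"unknown instruction {op!r}")
        pc += 1
    return registers


program = [
    ("li", "t0", 2),
    ("li", "t1", 40),
    ("add", "t2", "t0", "t1"),
    ("halt",),
]
regs = simulate(program, {})  # regs["t2"] now holds 2 + 40
```

Nothing in `program` ever runs on the real CPU as an instruction. The host only ever executes the interpreter, so the simulated instruction set can be anything at all.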
So to get good performance, virtualization requires this match between the instruction set of what I'm executing and the instruction set that's provided by bare metal. So I can't, for example, take an operating system and compile it to run on ARM, like an ARM instruction set, and run it in a virtual machine running on top of x86, because the instructions that are being run inside the virtual machine now have nothing to do with the instructions that are being run on hardware. Does this make sense? This is an important distinction, yeah. Yeah, so essentially, here's another way to think about it. The virtual machine monitor could execute all of the instructions that are being executed inside the virtual machine on the real hardware. We're gonna talk in a minute about why it doesn't, because doing so would cause some problems, but most of them, like add, subtract, jump, shift, things like this, are totally safe to execute. Most of the instructions that apps execute, that do most of the work they do, are totally safe to execute. On a simulator, so for example, sys161 gets this instruction that's like RFE, and of course it's actually a couple bytes of binary data. If it tried to execute that instruction on your actual computer, it would just blow up or something bizarre would happen. The computer has no idea what that means. It's like they're speaking different languages. That's probably the best way to think about it. To virtualize, the virtual machine and the hardware have to speak the same language. In a simulator, I have a translator in between, and that translator is slow. It's like you're at the UN, everyone has those helmets on. All right, and then finally, safety. The virtual machine monitor should manage all hardware resources inside the virtual machine. Okay, so there are two different ways to accomplish this. So today what we're gonna talk about is something called full virtualization. And the goal in full virtualization is pretty awesome.
It's to take an unmodified operating system. So if I took the Linux or the Windows that's running on your laptop, literally the same binary bytes, I can move them inside the virtual machine and the virtual machine monitor can run that code. That is full virtualization. There is no change to the guest operating system required to use full virtualization. We're gonna talk about why this is hard. The other approach is something called para-virtualization. And this is a lot more common today. So what para-virtualization does is it says, let's make some small changes to the guest operating system so that it plays nicer with the virtual machine monitor and with the underlying operating system. So things like EC2 and stuff like that, really what they're doing is para-virtualization. And Linux and other operating systems have long had support for this. Because as we'll talk about, there are these corner cases in the x86 instruction set that make full virtualization really painful, although it's possible. It's being done by tools like VMware. Okay, so our goal is to run the unmodified operating system and applications in the virtual machine next to other virtual machines. And VMware is the best known provider of these full virtualization software solutions. They invented a lot of these approaches and this technology. So fundamentally, why is this hard? What's hard about this? Again, I want to take an unmodified operating system. It has no clue that I'm about to run it as a guest OS inside a virtual machine. So what makes this difficult? Yeah, well, I mean, yeah, so this is an idea of privilege. Okay, and let me get to traps in a minute. So privilege, operating systems are used to having privilege. They're used to running with kernel privilege. Now I could run it with kernel privilege, but then it would be able to do anything and be able to manage any hardware resources.
So if you go back to the requirements, in order to achieve safety, I cannot run it at the same privilege level. Now it turns out, and this is not something I'm gonna talk about today, but it's kind of a fascinating topic, that the x86 instruction set for years had support for multiple privilege levels, not just two. The x86 actually has four privilege rings. I don't know why they're rings, but you can think of them as four privilege levels. Operating systems and applications, before we started thinking about virtualization, typically only used the least privileged, which is where applications ran, and the most privileged, where the kernel ran. But then when virtualization came along, and I think this is something that's covered in the paper for Monday, people were like, oh wow, there are these other privilege levels. Can these potentially be useful? And it turns out they can. And one of the things para-virtualization allows is for the guest operating system to be run in the second most privileged state. So you can think applications run in what's called ring three, which is the least privileged state. The guest operating system in a para-virtualization approach is modified to be able to run in ring one. And there's something called the hypervisor, which is something you may have heard about, that runs in ring zero. That is now sort of the host operating system, although it's not really a full-blown operating system. It exists entirely to support guest operating systems. That's another difference between server virtualization environments and the virtualization you're used to. It's that there's not necessarily a full-blown, fully usable operating system running on the machine. In many cases, the server has been configured only to run guest operating systems. Okay. So let's talk about some of the mechanics of doing this. I mean, that's the high-level goal.
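If it helps to see the ring assignments side by side, here's a small sketch of the two layouts just described. The dictionary names and the `more_privileged` helper are made up for this example. The ring numbers come straight from the description above, and on x86 a lower ring number means more privilege.

```python
# Classic setup: only two of the four rings are used.
BARE_METAL = {"kernel": 0, "applications": 3}

# Para-virtualized setup as described above: the hypervisor takes ring zero
# and the modified guest kernel is moved to ring one.
PARAVIRTUALIZED = {"hypervisor": 0, "guest kernel": 1, "applications": 3}


def more_privileged(layout, a, b):
    """True if component a runs with more privilege than component b."""
    return layout[a] < layout[b]  # lower ring number means more privilege


guest_outranks_apps = more_privileged(PARAVIRTUALIZED, "guest kernel", "applications")
hypervisor_outranks_guest = more_privileged(PARAVIRTUALIZED, "hypervisor", "guest kernel")
```

Both comparisons come out true, which is the whole arrangement: the guest kernel can still protect itself from its applications, but it no longer has the final say over the machine.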
How do we handle traps? So for a trap by an application in the guest operating system, who needs to handle it? Let's say you're running your shell inside VirtualBox. When the shell causes an exception or uses a system call, who needs to handle that system call? I heard both. Okay, so the guest operating system has to handle it. Remember, this is a Linux system call that I'm making. Windows has no idea what to do. Now, who's going to receive that system call? If I'm running it normally, the host operating system is what's gonna start running when that exception occurs. But somehow I have to get the exception to the guest operating system to be handled. If I don't, it's not gonna be handled properly. The second problem is the guest OS is going to try to execute privileged instructions. How do I know this? It's unmodified. It thinks it's in charge. It thinks this is its universe. It thinks it's supposed to be able to modify any part of the machine. So I need to do something about this. I need to be able to handle cases where the guest operating system tries to do things that it's not allowed to do. So as I pointed out before, if we run the guest operating system with kernel privileges, everything works as expected, except for the fact that the guest can access the entire machine. So we can't do this. This is wrong, this violates safety. If we run the guest operating system with user privileges, what's going to happen when it tries to run a privileged instruction? What's that? So what has to happen, right? So normally, if an application tries to execute a privileged instruction, what happens? The operating system gets to handle it and normally kills off the application. In this case, the application is an operating system, but it looks like an application to the host operating system.
So when that application generates certain kinds of exceptions that would normally terminate it, those exceptions have to be forwarded to the virtual machine monitor. So for example, let's say that I'm trying to modify the page tables. That's going to generate an exception, but the host operating system has to say, aha, that's coming from a virtual machine monitor. I'm going to send that exception to the virtual machine monitor to handle, because it's possible that the virtual machine monitor needs to see this exception to update its own state. There's something about the virtual machine that the virtual machine monitor is providing that's going to change. Okay, so now here's the goal. Ideally when privileged instructions are run at user privilege, the CPU traps the instruction, and the trap is then handed to and handled by the virtual machine monitor. This is what we want to happen. And then, assuming the guest OS is doing something okay, and the virtual machine monitor needs to check this because it's possible the guest OS is buggy and needs to be killed, maybe the guest OS just crashed. Assuming the guest operating system is doing something legitimate, the virtual machine monitor is going to adjust the virtual machine state. And then it's going to continue the guest operating system. So most times that the guest operating system does things that require privilege, this is what needs to happen. Now we refer to instructions that have this property as classically virtualizable. And the approach is referred to as trap and emulate. You guys see why it has that name? I trap into the host operating system. I vector the exception to the virtual machine monitor and the virtual machine monitor emulates. It adjusts the virtual machine state to account for this instruction having been executed on physical hardware. One of the main places this takes place is memory. So adjusting page tables.
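The trap-and-emulate steps above (trap, check, adjust the virtual machine state, continue) can be sketched as a tiny model. Everything here is hypothetical: the class, the `set_pte` instruction name, and the "resume" and "killed" return values are invented for illustration, and a real monitor works on actual CPU state rather than tuples. The sketch assumes the privileged instruction has already trapped and been handed to the monitor.

```python
class VirtualMachineMonitor:
    """Toy monitor that emulates one kind of privileged instruction."""

    def __init__(self, guest_mem_pages: int):
        self.guest_mem_pages = guest_mem_pages  # physical pages given to this VM
        self.virtual_page_table = {}            # guest virtual page -> guest physical page
        self.guest_alive = True

    def handle_privileged_trap(self, instr):
        """Called after a privileged instruction from the guest OS has trapped."""
        op, *args = instr
        if op == "set_pte":
            vpage, ppage = args
            # Check: is the guest asking for something legitimate?
            if not 0 <= ppage < self.guest_mem_pages:
                self.guest_alive = False        # buggy guest, so kill the VM
                return "killed"
            # Emulate: adjust the virtual machine's state, not the real hardware.
            self.virtual_page_table[vpage] = ppage
            return "resume"                     # continue the guest OS
        # Anything this toy monitor does not know how to emulate ends the VM.
        self.guest_alive = False
        return "killed"


vmm = VirtualMachineMonitor(guest_mem_pages=1024)
outcome = vmm.handle_privileged_trap(("set_pte", 5, 42))
```

The guest believes it wrote a page table entry. What actually changed is a dictionary owned by the monitor, and that's the emulate half of the name.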
Okay, so now the question is what is the virtual machine monitor to do with traps that occur within the virtual machine? So here I have a branch point. If the trap is caused by an application, I have to allow the guest operating system to handle it. So this could be something like a system call. A system call has to be handled by the guest operating system. And if the guest operating system caused the trap, the virtual machine monitor has to handle it. Now notice that this requires the host operating system's cooperation. So when you install something like VirtualBox, there are these dialogs that pop up that are essentially asking you to install certain kernel drivers in the host operating system that are required for these tools to work. And that's because without that support, the traps that occur, like for example the attempts to use privileged instructions, are just going to look like a buggy app. So this is the difference between VirtualBox being killed because it tried to do something like modify the TLB, and VirtualBox being allowed to do that because the operating system realizes, aha, there's another OS running in there. So this is something that's required. And in order to make sure that this works, all traps and exceptions originating inside the VM must be passed to the virtual machine monitor and handled. Or again, they may cause the virtual machine to die. For example, I might reboot the machine, I might power it off, I might execute an instruction that would normally cause the guest operating system to crash. I can have a blue screen of death inside my virtual machine. A buggy guest operating system needs to behave in the same way that it would on bare metal. Now, keep in mind, this is not the normal case, because in order to achieve performance, most of the instructions that I execute need to run normally.
And it turns out that most of the instructions that operating systems execute do not require kernel privilege. They're just normal processor instructions. And so even if I take away the privilege behind the kernel's back, most of the time it doesn't care, and it doesn't cause any problems or any extra overhead. Now, of course, there is extra overhead to running things inside a virtual machine monitor. What is the extra overhead? What's one piece of extra overhead? Yeah. No, I'm just saying that things the guest OS could do quickly on hardware take a lot longer in the virtual machine. Why? I mean, there's this whole exception handling cycle that didn't occur before. Those privileged instructions, normally I'm just allowed to execute them and they just execute in a single cycle, right? Boom. I adjusted a co-processor register that's privileged for some reason. Done. Inside the virtual machine, there's this whole long path where that has to generate an exception. It has to be handled initially by the host operating system, which has to pass it to the virtual machine monitor, and then the guest OS can be restarted. So there's a lot more overhead to doing things like this. And again, the only reason we get good performance, the only reason your systems run as well as they do when you're using things like VirtualBox, is that most of what the kernel is doing does not require kernel privilege. It only needs it from time to time. Okay, so now let's talk about what happens when an application makes a system call. This is a good way to make sure that everyone understands this stuff. I think it has like seven bullet points or something. So let's say there's an application running inside the virtual machine and it makes a system call. What is the chain of events that takes place? What's the first thing that happens? Yeah, so the first thing is, I'm gonna trap into the host operating system. I have to.
Remember, the guest operating system does not have kernel privilege. I'm not gonna allow it to install exception handlers. Probably not. So I trap into the host operating system, and then the host operating system has to vector that trap into the virtual machine monitor. So what happens next? Yeah, that already happened. So now I've already vectored the trap to the virtual machine monitor software. But now what does it have to do? No, this is a system call, right? This is not a privileged instruction, yeah. Yeah, so now I have to inspect the trap. I have to see that this is a system call. So this is an instruction that generates a trap. It's not a privileged instruction, but I trap into the host OS. The trap gets vectored, gets handed off to the virtual machine monitor. The virtual machine monitor looks at it and essentially has to get the guest operating system to react as if it had received this trap directly. So it has to pass the arguments into the guest operating system's exception handling path. So now the guest operating system is going to handle it, it's gonna actually do the work. So now the guest operating system starts to run and does whatever is required to handle the system call. Now, when the guest operating system is done, it's also going to generate a trap. It calls something like return from exception, and that's gonna do the same thing. So now I'm gonna go back to the host operating system, bounce back into the VMM, and the VMM is responsible for passing the results back to the process that actually made the system call. Yeah. There could be multiple VMMs. So I need to be able to identify which VMM caused the trap. That's a great question. So a system call from the VMM running Linux, send it to that VMM, right? Or a system call from the VMM running Windows, send it to that one, right?
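To make that chain of events concrete, here is a small simulation of the system call path, including the case of more than one VMM on the same host. It is only a sketch: the class names, the trap dictionary, the VM identifiers, and the one-entry system call table are all invented for this example, and the second trap (return from exception) is recorded in the log rather than simulated separately.

```python
class GuestOS:
    """A guest kernel with a made-up system call table."""

    def __init__(self, name):
        self.name = name

    def handle_syscall(self, number, args):
        # Hypothetical table: system call 1 adds its arguments.
        if number == 1:
            return sum(args)
        return -1  # unknown system call


class VMM:
    """One virtual machine monitor per guest; it shares the host's event log."""

    def __init__(self, guest, log):
        self.guest = guest
        self.log = log

    def handle_trap(self, trap):
        if trap["kind"] != "syscall":
            raise ValueError("this sketch only models system calls")
        name = self.guest.name
        self.log.append(f"vmm({name}): inspect trap, it is a system call")
        self.log.append(f"guest({name}): handle system call")
        result = self.guest.handle_syscall(trap["number"], trap["args"])
        # Return from exception traps again and bounces back through the host.
        self.log.append(f"guest({name}): return from exception traps to host")
        self.log.append(f"vmm({name}): pass result back to the process")
        return result


class HostOS:
    """The host receives every trap first and hands it to the right VMM."""

    def __init__(self):
        self.log = []
        self.vmms = {}

    def add_vm(self, vm_id, guest_name):
        self.vmms[vm_id] = VMM(GuestOS(guest_name), self.log)

    def trap_from(self, vm_id, trap):
        self.log.append(f"host: trap from {vm_id}, hand to its VMM")
        return self.vmms[vm_id].handle_trap(trap)
```

With two guests registered through `add_vm`, a call such as `host.trap_from("vm-linux", {"kind": "syscall", "number": 1, "args": (2, 3)})` only ever touches the Linux guest's VMM, and `host.log` records the steps in the order described above.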
So yeah, the host operating system has to be able to vector or hand off the exceptions properly. Okay, so actually I lied. That wasn't even the really complicated example. So here's a better one. What about a TLB fault? What happens when there's a process inside the virtual machine that generates a TLB fault? What do I need to do? What's gonna happen, first of all? What's the first thing that happens? Trap to the host operating system. Remember, the host operating system is ultimately in control of the machine. This is a good thing to keep in mind. All of the stuff that's going on with the guest OS, that's all happening because the host OS is allowing it to happen. If the host OS says, no way, I'm not running virtual machine monitors, it could just kill things that try to do this. So if you tried to run VirtualBox on a machine that didn't have the right drivers, or where the machine didn't want to do this, it wouldn't work. So this all relies on cooperation on some level. And again, it's normally done with drivers rather than with direct changes to the host operating system. It all requires the cooperation of the host operating system. So I trap into the host, I hand the trap to the virtual machine monitor. The virtual machine monitor sees that it was generated by the application and passes control to the guest operating system, because this is a fault. This is like the fault you guys are handling for assignment three. The guest operating system is gonna begin handling the TLB fault, and it's gonna try to load an entry into the TLB, and now what's going to happen? Is it allowed to do this? Oh, back to the host operating system I go, because this is a privileged instruction and I don't have privilege. So now I have to hand the trap back to the virtual machine monitor, and the virtual machine monitor has to inspect the trap, see that it was generated by the guest OS, and adjust the state of the virtual machine appropriately.
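The TLB fault path has two traps in it, which is easy to lose track of, so here is a toy simulation of it. Everything here is assumed for illustration: the names, the `source` and `kind` strings, and the idea that the guest's memory is one contiguous chunk of host memory starting at `host_base`. In this sketch the VMM also installs the checked entry in a simulated hardware TLB. It does not attempt to model shadow page tables or the x86 case.

```python
events = []        # order of events, kept so the path can be inspected
hardware_tlb = {}  # simulated hardware TLB: virtual page -> machine frame


def host_trap(vmm, source, kind, payload):
    """The host OS sees every trap first and hands it to the VMM."""
    events.append(f"host: {kind} trap, hand to VMM")
    return vmm.on_trap(source, kind, payload)


class GuestOS:
    def __init__(self, page_table):
        self.page_table = page_table  # virtual page -> guest frame
        self.vmm = None

    def handle_tlb_fault(self, vpage):
        gframe = self.page_table[vpage]  # a missing page is not modeled here
        events.append("guest OS: found mapping, attempts privileged TLB write")
        # The guest runs without privilege, so the TLB write traps to the host.
        return host_trap(self.vmm, "guest_os", "tlb_write", (vpage, gframe))


class VMM:
    def __init__(self, guest, host_base, num_frames):
        self.guest = guest
        self.host_base = host_base    # assumed: guest memory is one contiguous chunk
        self.num_frames = num_frames  # assumed: size of that chunk in frames
        guest.vmm = self

    def on_trap(self, source, kind, payload):
        if source == "app" and kind == "tlb_fault":
            events.append("vmm: fault came from the app, pass to guest OS")
            return self.guest.handle_tlb_fault(payload)
        if source == "guest_os" and kind == "tlb_write":
            vpage, gframe = payload
            if not 0 <= gframe < self.num_frames:
                events.append("vmm: frame outside guest memory, refuse")
                return False
            hardware_tlb[vpage] = self.host_base + gframe
            events.append("vmm: checked entry, loaded into hardware TLB")
            return True
        raise ValueError(f"unexpected trap: {source}/{kind}")
```

Starting the chain with `host_trap(vmm, "app", "tlb_fault", 5)` produces two "host:" lines in `events`, one for the application's fault and one for the guest OS's attempted TLB write.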
But what actually happened here, of course, is that the virtual machine monitor would have to load this entry into the actual hardware TLB. Remember, I want to allow the process and the operating system to use the hardware resources. And the TLB is certainly critical, right? Without the TLB, without being able to map memory at runtime, there's no way I can run anything. It's just way too slow. So ultimately, I actually have to allow the guest operating system to modify the address translation hardware. Now, if you want to convince yourself that you have mastered this fully, go out and look up what are called shadow page tables. So on x86, this gets even more interesting, because remember, on x86, the hardware handles TLB faults internally. And so x86 requires this whole extremely intricate approach to handling page faults inside the guest operating system and inside the virtual machine. Okay, and just to finish up, what's really interesting here, and what I want to make sure you guys understand, is that what's being virtualized is the hardware interface. So most of the hardware interface will be virtualized by just letting it run, because it's safe. But then there are parts of the hardware interface where we actually have to apply an approach that's very reminiscent of virtual memory. So the interface to virtual memory was load and store. We ensured safety by translating every access, and we got good performance by caching translations. For hardware virtualization, the interface is all the instructions that the processor could use to modify the state of the underlying hardware, and the cases where changes to the underlying hardware cause the processor to run. We ensure safety by intercepting or, it turns out, rewriting unsafe instructions. I'll come back to this in a second.
But if I can do trap and emulate, which I hope I can, then I just intercept all the instructions that would cause problems and check them. Normally I check them to make sure that they're okay. And then in many cases I actually allow them to execute. So going back to the TLB example, that modification to the TLB that is generated by the guest operating system, assuming the virtual machine monitor says it's okay, it's actually going to run. It's gonna take place. It will be run with the appropriate privilege. But the point is the guest operating system can't make changes to the TLB without the host operating system and the virtual machine monitor checking them to make sure that they're safe. And we get good performance by allowing safe instructions to run directly on the physical hardware. So it's really the same thing that we did with virtual memory. There, most addresses don't need translation because they're cached. Here, most instructions don't need to be virtualized because they're safe. Yeah, remember, when I trapped into the host operating system, I ran in privileged mode, and the drivers that I have to load to support the virtual machine monitor also require privilege, because they have to be able to do things like this. That's a great question. I'm not sure I would describe it that way, but I think it's effectively true. I think if your virtual machine monitor is buggy, it can do some bad stuff, if that's what it boils down to, yeah. Yeah, so on some level, you can really just think of a virtual machine monitor as another application. Now the big difference about the virtual machine monitor, of course, is that usually it has a huge chunk of memory that it pre-allocates. So when you set up VirtualBox, you give it like a gigabyte of RAM. That's a lot more memory than a normal process would need. And so that's one significant difference, but in many ways, these compete for resources with other apps just like any other app.
But they're just big memory hungry apps. Okay, so you might think, oh, this sounds doable, right? Unfortunately, the sad thing, which to some degree sort of created the company called VMware, is the fact that the x86 architecture is not classically virtualizable. So some instructions don't trap properly when you run them without kernel privilege. They either fail silently, or they do something different depending on whether you run them with or without privilege. Some instructions have different side effects depending on which mode they're running in. So the fact that these instructions exist essentially makes it impossible to apply the trap and emulate approach correctly to the x86 architecture. Sorry, doesn't work. So what do I do? What VMware actually does to fix this and provide full virtualization is something called binary translation. So when the guest operating system is running, VMware is scanning its code, looking for these types of problems, certain instructions on the x86 that have these issues, and it rewrites them to safe instruction sequences. And it sounds terrible, it is terrible. It's just really ugly and gross, but it works, they got it to work. And of course you can apply a very similar strategy to what Java virtual machines do and cache translations to improve performance. So portions of the code that I've already rewritten in order to make these instruction sequences safe, I can just leave those there, and the next time the guest executes the same block of code, I can reuse it. Okay, so here is again a short, non-exhaustive list of all the really cool stuff that is out there. I talked a little bit about privilege rings, but not enough, we'll come back to it on Monday. Shadow page tables. And one of the things that's very interesting about this is that virtualization is so powerful and so important to modern computing that it's actually driven significant changes to hardware architecture.
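Here is a very small sketch of the rewrite-and-cache idea behind binary translation, with instructions represented as plain strings. It is not how VMware's translator works internally. The `REWRITES` table, the replacement sequence, and the choice to key the cache by block address are all assumptions for illustration, and the cache is never invalidated, which a real translator could not get away with.

```python
# Hypothetical rewrite table. POPF is a commonly cited x86 instruction that
# behaves differently without privilege; the replacement sequence is made up.
REWRITES = {
    "popf": ["push_guest_flags_arg", "call vmm_emulate_popf"],
}


class BinaryTranslator:
    """Rewrites unsafe instructions in a block and caches the translated block."""

    def __init__(self):
        self.cache = {}             # block address -> translated instruction list
        self.blocks_translated = 0  # counts real translations, not cache hits

    def translate(self, block_addr, instructions):
        cached = self.cache.get(block_addr)
        if cached is not None:
            return cached  # already rewritten: reuse the earlier work
        safe = []
        for ins in instructions:
            # Safe instructions pass through unchanged; unsafe ones are replaced.
            safe.extend(REWRITES.get(ins, [ins]))
        self.cache[block_addr] = safe
        self.blocks_translated += 1
        return safe
```

Translating the same block address twice returns the cached list the second time, so `blocks_translated` stays at 1, which is the performance point made above about reusing code that has already been rewritten.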
So newer x86 processors have all sorts of support and instructions that are designed to make them easier to virtualize. And I think we may talk a little about this on Monday. Okay, so for Monday, read the para-virtualization paper and we'll talk about it in class. I'll see you then, have a great weekend.