OK, so first we're going to talk about virtual machines. And again, we're recording all the lectures, and all the material will be available afterwards. You have notes for all the lectures, but I'm not going to put them up on the screen. So, quick show of hands: who's used a virtual machine before? Oh, wow, great. A decent number of people here. What are virtual machines? Basically, they're simulated computers. You can use software on your computer that lets you run an entire operating system and a bunch of software inside it, in a way that's isolated from your host environment. You can take this virtual machine and configure it in different ways. For example, you can give it some percentage of the resources that are physically available in your machine. You can take this simulated computer and install a different operating system in there, or run various programs that you wouldn't want to run on your host machine. And the reason we're covering this topic first in this course is that, for the purposes of this class, you can use virtual machines to experiment with new operating systems, new software, and new configurations without any risk. So if you set up a VM, set up an environment, and install some software that's kind of buggy or something, you don't want it to throw you off in terms of being able to get other kinds of work done. And so having an isolated environment is great for that reason. You can experiment without risk. You don't have to worry about anything. In general, virtual machines have lots of uses. So as I alluded to a moment ago, you can use them for running software that only runs on a specific operating system.
So for example, if I'm running macOS, like I am, and I want to use some Windows-specific software, it's pretty easy to set up a virtual machine, install Windows, then install the software for Windows and just run it inside my macOS environment, without having to worry about something more complicated like setting up dual boot, that is, actually getting my machine to natively run Microsoft Windows. Virtual machines are also used for experimenting with potentially buggy or malicious software. So if you download some sketchy binary from the web and you want to run the program, but you don't really want to run it in your regular desktop environment where it could steal your files or install a keylogger or do other sketchy things, well, virtual machines give you pretty good isolation. And so you can experiment with potentially sketchy software there. So there are a bunch of different programs you can use. We've linked to some of them in the course notes. One example is VirtualBox. This is a free and open source virtual machine manager, and I'll basically just give you a quick demo to show you some of its features. So I've just gone to the VirtualBox website, which you can find via the course notes, and now I can say, okay, I'm going to go configure a new virtual machine. I can give it a name; it doesn't really matter what I call it. And then this particular software lets me select which operating system I'm going to install inside it. And so I've gone and downloaded a Debian installer CD from the internet. One common way of installing operating systems is that you get these bootable CDs that you can boot up, and then they install the operating system on your machine. So I've gone and downloaded that ahead of time. And now I'm setting up this virtual machine, and I can configure what this simulated machine has access to in terms of the resources of the actual physical machine here.
And so one setting here is how much RAM I give it access to. My host machine has 16 gigabytes of RAM, but I probably don't want to give this simulated machine access to the entire physical memory. Say I'm experimenting with some potentially buggy software and it goes and allocates lots of memory; on my host, that might make the system unstable, but this limit gives me some isolation between the guest and host systems. So I can go and play with these settings. Another thing you need is storage: this virtual machine has a hard disk. And of course even that is isolated from the host machine; it's simulated. And so one thing I can do is configure a virtual hard disk. This is configuring the boot disk for this machine. So I'm going to go ahead and create a virtual hard disk. And there are some details here about exactly what disk format to use. They don't really matter. Also, by the way, is this text big enough for you to read, or should I make it bigger? Bigger. Okay. Also, at any point, if there's anything you don't understand, or if there's text on this screen that you can't read, please just speak up and I can fix it. Is this readable? Okay. And actually, the details here don't matter that much, because you should go through this process on your own after class, and you'll get to read all of this in detail. So this virtual machine needs a disk, and this tool gives me a couple of options. I might say, okay, I'm going to allocate 20 gigabytes of disk space for the simulated computer. I can go ahead and allocate that entire space on my actual physical disk ahead of time. But of course, since this is a simulated disk, you can also just dynamically allocate that space as you go. So if the simulated machine doesn't actually use all the disk space, well, then I don't actually need to allocate all that space on my physical machine.
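To make the dynamic allocation idea a bit more concrete, here is a minimal Python sketch. It is a toy model only: the `SparseDisk` class, its method names, and the 4096-byte block size are assumptions made up for this illustration, and it does not describe how VirtualBox actually stores disk images. The guest sees the full capacity, but the host only stores the blocks that have been written.

```python
class SparseDisk:
    """Toy model of a dynamically allocated virtual disk (hypothetical)."""

    def __init__(self, capacity_bytes, block_size=4096):
        self.capacity = capacity_bytes
        self.block_size = block_size
        # Maps block index -> block contents; only written blocks are stored.
        self.blocks = {}

    def _check_index(self, index):
        if index < 0 or index >= self.capacity // self.block_size:
            raise IndexError("block is outside the virtual disk")

    def write_block(self, index, data):
        self._check_index(index)
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block long")
        self.blocks[index] = bytes(data)

    def read_block(self, index):
        self._check_index(index)
        # Blocks that were never written read back as zeros.
        return self.blocks.get(index, bytes(self.block_size))

    def host_bytes_used(self):
        # Space actually consumed on the host, not the advertised capacity.
        return len(self.blocks) * self.block_size


disk = SparseDisk(capacity_bytes=20 * 1024**3)  # the guest sees 20 GiB
disk.write_block(0, b"\x01" * 4096)
disk.write_block(1000, b"\x02" * 4096)
print("guest capacity:", disk.capacity)
print("host bytes used:", disk.host_bytes_used())
```

In this sketch, two written blocks cost two blocks of host storage, no matter how large the advertised disk is. Allocating everything ahead of time would correspond to filling in every block up front.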
And so I'm going to go ahead and create this virtual machine now. And once you've done that, here's the simulated computer, and it's currently a blank computer. There's nothing installed on it. It has all the simulated hardware of a real machine, but it has no operating system. So then what you need to do is install an operating system. As I mentioned earlier, I've gone ahead and downloaded an OS installer from the internet ahead of time for Debian, which is a Linux operating system. And so on the first boot, the virtual machine manager lets me select the CD that it's going to boot from. And then afterwards, once I configure it, it'll boot from the hard disk. And so what you see here is the Debian installer running inside this simulated machine. And I can click inside here. You'll see more details about this in the software, but it can forward my keyboard and mouse interactions to the simulated machine. And it's telling me some information about how I stop communicating with the simulated machine and go back to interacting with my host machine. So it's talking about this thing called keyboard and mouse capture. If I click inside now and use my keyboard, it's interacting with this simulated computer inside. And if I want to actually control my outer machine, the key to do that, for this software, is the Command key. Again, these are all details that you'll read about as you go through this on your own. So I can go ahead and launch the installer, but I'm not actually going to go through the install process, because you should be able to figure it out yourself. Basically, it's a standard OS installer that will ask you exactly how you want to configure the operating system. It'll ask you to create a user account and stuff like that. And once you go through that process, you should have a working virtual machine.
So I'm going to actually kill this, because it takes a couple of minutes to install the OS, and instead start up a different virtual machine that I configured ahead of time. Any questions so far? I'm just kind of walking you through this tool, and then you'll go through this all on your own, and it'll become a lot more concrete then. Okay, so this is what a virtual machine looks like. This is running the Debian operating system. I have a program running inside here. And it's a really good environment for experimenting with stuff. So as I mentioned earlier, one really useful feature of virtual machines is isolation. I can run potentially buggy or malicious code in here. Another kind of cool feature, which I think you should make use of in this class, is something called snapshots. I won't actually go through the menus and show you this myself right now. But one really cool thing you can do with a simulated machine is, since the whole thing is simulated, you can take a snapshot of the entire machine state, everything from the contents of memory to the disk to what's in the registers, and take what's running and freeze it. And then you can go ahead and do whatever you want with it, and then throw away all your changes and go back to your snapshot. And what that's really useful for is when you want to do something that could be potentially dangerous, and you don't want to accidentally break whatever you've set up in here and then have to set up the whole thing again from scratch. And so what's kind of cool about this is you can run kind of crazy things inside here and not really worry about affecting your host machine. It would be kind of unfortunate if I did that on my host machine, but inside here I can experiment with that and see what happens to a Unix system if you try to delete everything on the boot disk. Yeah, it's kind of broken now. But thanks to snapshots, I could just undo this very destructive action that I just did.
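The snapshot idea can be sketched in a few lines of Python. This is only a conceptual model, assuming a hypothetical `ToyMachine` whose whole state is two dictionaries; a real virtual machine manager saves memory, disk, and device state in its own formats. The point is just that if the entire state can be copied, a destructive action can be undone by copying it back.

```python
import copy


class ToyMachine:
    """Hypothetical stand-in for a VM whose whole state can be copied."""

    def __init__(self):
        self.memory = {"running": "shell"}
        self.disk = {
            "/etc/hostname": "debian-vm",
            "/home/user/notes.txt": "hello",
        }
        self.snapshots = {}

    def take_snapshot(self, name):
        # Freeze a deep copy of the entire machine state under a name.
        self.snapshots[name] = copy.deepcopy((self.memory, self.disk))

    def restore_snapshot(self, name):
        # Copy again on restore so the same snapshot can be reused later.
        memory, disk = copy.deepcopy(self.snapshots[name])
        self.memory, self.disk = memory, disk


vm = ToyMachine()
vm.take_snapshot("clean-install")

vm.disk.clear()  # simulate deleting everything on the boot disk
print("files after the destructive action:", len(vm.disk))

vm.restore_snapshot("clean-install")
print("files after restoring the snapshot:", len(vm.disk))
```

Restoring copies the saved state rather than handing it out directly, so later changes to the machine cannot corrupt the snapshot itself.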
So that's a really brief introduction to virtual machines. And of course, in our list of exercises, we'll have you go and download one of these virtual machine managers, download an OS boot disk, actually go through the process of installing the operating system in the virtual machine, and then experiment with some of these more advanced features, like snapshots, and like just trashing your virtual machine and trying to recover from that without reinstalling the whole thing. Some other useful things to know about: there are a lot of tools that make it nicer to interact with these virtual machines. Some of them are in our notes, and you can also experiment with these by just going through the menus and seeing what kinds of options the tool offers. But one particularly useful thing is something called guest additions; different software uses slightly different names for these. Basically, you can install software inside the simulated computer that makes it easier to communicate with your host machine. And so this can give you a nicer experience. (Yeah, thank you. I just trashed my virtual machine, so it's probably not going to behave super great anymore.) One example of a nice feature is something called seamless mode in VirtualBox, which has slightly different names in other software, where you can make it so that the individual windows running inside the guest machine appear as windows on your host machine. So instead of just seeing this big box, which is the screen of your virtual computer, it'll kind of appear as if you're running the programs natively-ish on your host machine. Just to make it a little bit nicer to interact with. Or there are things which give you clipboard integration: if you copy some text on your host machine and go to paste it in your guest machine, it'll just work.
Or you can drag files back and forth between the machines and it'll just work. So lots of nice things there. You should look into those. And so that's basically it for what I want to say about using virtual machines. Now a little bit about how they're implemented. How would you imagine simulating a computer inside your computer? Does anybody have any ideas how you might do that? So maybe one thing you could do is think about what kinds of operations an x86 CPU supports, and go and write some big interpreter that just interprets x86 instructions and simulates them. So there are potentially slow ways of doing this which are straightforward. What's kind of cool is that modern hardware actually has hardware support for doing this kind of stuff. And so virtual machines are actually pretty efficient. There are some particular applications for which they don't work super well compared to running on bare metal, for example, accessing certain kinds of hardware in certain ways. Say you want to play video games: if you install a Windows virtual machine on your Mac, that Windows VM isn't going to have direct access to your graphics card, so it's not going to work super well. So for certain applications, you really want to install software directly on bare metal, but for many applications virtual machines are actually pretty efficient and work well. So that's my quick introduction to VMs. Are there any questions? Or do any of you have anything to add? No? Okay, cool.
But I do have something to say about containers. Yeah, so next, John is going to talk about containers. Do you need anything on the screen here? No, that's a perfectly good picture for virtual machines. I like it. So how many of you have heard the word containers? Is that a thing? Okay, so this is good. So virtual machines are really nice for doing things in complete isolation.
Like, you want to spin up your own system where you're going to do something in isolation over there, and you don't want to touch anything else. However, it turns out that very often you want to spin up, like, 100 virtual machines. So for example, how many of you have taken 6.858 or 6.824, the computer security or distributed systems classes at MIT? Or any of the other systems classes? All right, not that many. So all of these classes have programming labs, as many other classes at MIT do. And one of the problems we have is that when we want to grade student submissions, we don't trust student code, right? Who knows what code you've written? Maybe it's going to do rm -rf / on our grading machines, and that would be terrible. And so we want to run it in isolation. But at the same time, all of the submissions are written to be run in the same environment, all on Linux. There isn't really any reason to simulate a full computer for each one. So you can think about it this way. In a virtual machine setup, you have your real computer, and inside of it, initially, is just your operating system. So this is Linux or Mac or Windows or whatever. And normally that takes up the entire thing. If you start up a virtual machine, you can think of it as starting up an entirely new box, right? So this is macOS, and this is your Windows VM. This isn't an entirely true picture, it's not an architectural diagram, but mentally it helps to think about it this way: it's a completely different machine. So you might think of macOS as having a bunch of code for dealing with networking, implementing TCP, screen drivers, implementing the file system; all of that stuff exists here. All of that stuff also exists here, right? And all of it needs to run. When you start up a new VM, you have to go through the entire boot process of the operating system.
And every time the guest operating system wants to do something, it does what it normally would do if it were running on bare metal hardware. And then whenever it actually tries to touch the hardware, that's redirected into the true operating system, where it has to go through a bunch of code before it gets to the real hardware. And this is a little bit of a pain. It means that things are slower in a virtual machine, even though it gives you really nice isolation. So containers are sort of a solution to this. The observation is that if you have environments that look a lot the same, then really what you should do is try to share as much as possible. So in a container world, you have your main operating system, right? So macOS or whatever. And the operating system consists mostly of what's known as a kernel. The kernel is what implements things like disk operations, networking, inter-process messaging; all of the primitives for talking to the hardware and talking to the external world are implemented by the kernel. And the observation is that if you want to spin up a machine that looks kind of like the current one, you don't need a second kernel. Think of the kernel as the top half of this diagram, and the bottom half as all of your programs, the programs you would run on a machine, which talk to your kernel. If we want to spin up another machine that looks a lot the same, we can just set up another copy of the bottom half and have that talk to the same kernel. This means that we don't have to boot an entirely unrelated new machine where everything has to be indirected sort of twice. Instead, we can just launch applications and set up a little fence. We're going to set up little guard rails around it, so it thinks that it's running without anything else there. So whenever it asks, like, are there any other partitions on my disk?
The operating system is going to say no, there's nothing else. There are no other files. If it looks for what's in the root of the disk, it's going to say, just your files. So basically the kernel, your kernel, the one that's running on your machine, knows about this thing and makes it think that it's alone. In reality, whenever your applications here try to do something, it really just goes back to the original kernel. This turns out to be a lot more efficient. It's a lot faster, but it also provides weaker isolation. You can think of it this way: your operating system needs to implement all of these fences correctly. Specifically, it has to implement a jail, which is what this is often referred to as. It needs to jail all networking, all disk interaction, all hardware interactions; for everything that this little box could possibly do, it needs to pretend as though there's nothing else in the world. And it turns out that doing that kind of bulletproof isolation is really, really hard. So it provides weaker isolation, but much higher performance. And what that does mean is that you can spin up hundreds of these very quickly on your machine, because you can think of them as basically just normal programs. They're like all of the other programs on your machine, and so they're as cheap to start up as normal programs are. And people have used this technology to build really large infrastructure for automating all sorts of tools. So we use it for grading in a lot of the MIT classes. When we get labs, we spin up a separate container for each student submission, and then we run it there, and to the student code it's as if it were running on its own machine. If it tries to rm -rf /, then all it affects is the student's own code. And there are companies built on top of this too. You may have heard of Amazon EC2. Amazon EC2 is this large cloud service that basically rents out virtual machines.
They also have a service called Fargate, which is essentially the same, but instead of running full virtual machines, it runs your programs in these containers. And that means that instead of taking minutes to start up, they take seconds to start up. This means you can run many more, and much shorter, jobs efficiently on that kind of system. This is also used by GitHub, if any of you use that. For example, whenever you push stuff to GitHub, a lot of the infrastructure that GitHub uses for building pages (the course website, for example, is built from a GitHub repository) is done through containers. Continuous integration and testing, like automated testing of software, is often done using containers. So containers provide this nice alternative for when performance is more important to you. But you should be aware that there are a couple of restrictions. One is that the isolation is a little bit weaker. You can usually tell that you're in one of these jails, whereas with virtualization the host can almost pretend that this isn't happening, so the guest machine might not know that it's in a VM. The other thing is, observe that because this shares a kernel, you can only run containers that are sort of similar to your system. So if I'm running a Windows machine, I can't directly spin up a Linux container, because they don't share a kernel. They're totally different operating systems. And so in that case, I would spin up a Linux virtual machine on Windows, and then I could run Linux containers on that virtual machine. And then I could spin up many if I wanted to. There are a couple of different implementations. There's Docker, which might be the one that many of you have heard of for containerization. There are others like rkt, pronounced rocket. There's LXC, the Linux container stack. There's Amazon's Firecracker, a new one that was released recently.
So there are a ton of different options, and the ways in which they differ are usually the APIs for setting these up and which features of the kernel they use to provide this, to pretend that this jail is here. But under the hood, they're really all the same. They really just spin up a normal program and then set up all these fences around it to make it think that it's alone. The biggest difference is in how you configure these things, and the reason they're really handy is that they let you describe a computer, describe how you want it to be set up, and just say, give me one of these. With virtual machines, you can download a full disk image, or you can download an install CD and go through the process, and what you end up with is basically a virtual disk image. It's like, here, all the bits are on this disk, and I can send it to someone else. They can download it and then start that virtual machine from that disk. With containers, what often happens instead is you write a file, which you might include with some piece of software or post online somewhere for other people to see, that is basically a recipe for how to set that machine up. It usually starts with something like FROM, which defines the base install that you should start from. This might be something like Debian or Ubuntu or whatever. It's just saying, here's the basic system setup I want. And then it dictates a number of commands. It says, run this command inside of that container. Then run this command, then run this command. And these could be things like installing packages that I need, downloading software, cloning repositories, whatever you might have to do. And then you do something like EXPOSE, which is usually the keyword that's used. EXPOSE says these network ports should be visible if I start one of these. So if I have a container that starts a web server, for example, then I want it to expose port 80.
If someone starts this up, I want them to be able to access it on port 80 and look at whatever the web server serves up. And usually there's also an entry point. That is, if you start this, which program do I run? Because you can think of a container as a sort of self-contained program that just happens to also include the entire operating system, right? So with a container, you can usually just start it, and it will download a bunch of stuff, basically build the entire environment for you, and then run a program. And the reason this comes in really handy is, let's say that I've written some really good new web server that I want everyone to use, but it's really painful to set up, right? It requires that you download all these dependencies and all these tools to build it. There's lots of configuration. What I can do is just provide you with this file, and you run it like a program. And then what Docker, or whatever other framework you have, does is download the FROM, the base image, and run the commands that I, the program developer, have written, and you don't have to run any of them yourself. And then at the end, you have that server running, and it will run just the way I intended your system to be set up. And so this makes it very easy to distribute software without other people having to know how to set up their system to make it work. You can almost think of these as self-contained applications. I think that's most of what I wanted to say about containers. You come across them in a lot of places on the web now, more and more. You'll see that this is being used for application installs, for testing, and for deployment of websites a lot.
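The recipe described above (a FROM line, some run commands, EXPOSE, and an entry point) can be sketched concretely. The Python snippet below simply assembles the text of a Dockerfile-style recipe; the `build_recipe` helper is hypothetical, and the particular base image, package, and server command are assumptions chosen for illustration, not something taken from the lecture.

```python
import json


def build_recipe(base_image, run_commands, port, entrypoint):
    """Assemble a Dockerfile-style recipe as text (illustrative helper)."""
    lines = [f"FROM {base_image}"]                   # base install to start from
    lines += [f"RUN {cmd}" for cmd in run_commands]  # setup commands, in order
    lines.append(f"EXPOSE {port}")                   # port to make visible
    # The entry point is written as a JSON array: program, then arguments.
    lines.append("ENTRYPOINT " + json.dumps(entrypoint))
    return "\n".join(lines) + "\n"


recipe = build_recipe(
    base_image="debian:stable",  # assumed base image
    run_commands=["apt-get update && apt-get install -y nginx"],
    port=80,
    entrypoint=["nginx", "-g", "daemon off;"],
)
print(recipe)
```

Saved as a file named Dockerfile, text like this is what a tool such as Docker reads to build and start the container, so whoever receives it does not have to run the setup commands by hand.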
For example, nowadays, if you want to host a website somewhere, you might use Heroku, which is a very common service where you just upload the source code of your website and they set up all the servers and all the Ruby or Node.js or Python installs you need. Really what they're doing is just spinning up a container for every website. Every website has its own container, its own web server, and it thinks that it's running alone, but really they're hosting sort of thousands of these per machine, because all they need to do is share the one kernel and then have all of these web servers run independently. I think we'll stop there for virtual machines and containers. The next segment of the class is going to be on the shell and scripting. It will be a lot more in-depth and technical than this one, because we'll actually go through how to do things in the shell. So there'll be green text on a white background, or rather a black background, type stuff. There will also be a lot of head-scratching, because the shell is really weird in some cases. If you have any questions about any of this stuff, then feel free to come up and chat with us. Otherwise we'll take 10 minutes or so before we start the next session. Are there any questions, like publicly, before we move on?
Are you going to cover dotfiles too? Dotfiles will not be today. We did a little bit of reorganization yesterday, because it's a little weird to talk about dotfiles before we've talked about editors; otherwise you don't know how to edit your dotfiles. So we'll cover that slightly later in the class. All right, let's take 10 minutes and then we'll do the shell.