So just a lot of unanswered questions for me after watching that video. Why is he wearing a raincoat? I don't understand. Anyway, it makes him look very sketchy, right? I wear a long raincoat when there's no rain, because that's the normal thing to do. It's the guy you hope you don't end up in the elevator with. OK, so today we're going to keep talking about virtual memory. So maybe we should play that song every Friday, just as punishment to whoever put it on the playlist. By the way, if you don't like the music that's being played before class, please add things to the queue. I've been going in order, but it's not clear that's a good strategy, so I might start skipping around. OK, so last time we talked about address spaces. We came up with some requirements, some ideals for how we wanted to manage memory, and we started to frame the view of memory that we wanted processes to have. Today we're going to start talking about how that actually works, and we'll talk about the key additional abstraction that we bring in here to make address spaces workable. OK, so assignment 2 is due a week from today. So you have 7 times 24, which is 168 hours. Well, actually I should say you have 165 hours, because clearly you have to come to class for three hours. My class, that is; you don't have to go to your other classes to finish assignment 2. So please get going. I don't know, how many people are done? All right, see, he's messing with you. Yeah, so how many people have started? How many people have something working? OK, good. I mean, a week is not an unreasonable amount of time to do assignment 2, but you actually need to use it. So please get started. This also feels like a good time to remind you guys about the course collaboration policy. Every time you submit an assignment, you are explicitly agreeing that you followed the policy. You are also giving us a copy of your Git repository, which is great.
And we are going to use all the information in that Git repository to try to make sure that you have not violated the collaboration policy. So please, look, if you struggle with assignment 2 and some things work and some things don't, fine. If you struggle with assignment 2 and you decide to submit somebody else's assignment 2, that will be bad. You will regret that decision, I promise. So these assignments are hard. We have office hours all week. We'll add some office hours toward the end. Jinghao is back from China. And Jinghao knows everything about this. Some of you guys have probably been reading his blog, which is out of date, unfortunately. But Jinghao in the flesh is here. And he can solve all of your problems very quickly. So we're going to try to help you as much as we can. But please don't cheat. So back to virtual memory. At the end of Wednesday, we were talking a little bit about some of the conventions in terms of address space layout. So we have this address space abstraction. And because address spaces are so large, there are conventions in terms of where things get put and how things get organized. The ELF file describes where to put stuff. It describes how to load things from the file into the address space. And ELF files can pretty much tell the operating system to put things anywhere. But in terms of compiling, there are conventions about where stuff ends up. So we talked about why we don't load the code at the very bottom of the address space. By the way, when I talk about the bottom of the address space, I'm talking about low numbers. So 0x0 is the bottom of the address space in the view of memory that we provide to the process. The top, for MIPS, for you guys for assignments 2 and 3, is 0x7FFFFFFF; for some systems it's bigger than that. In theory, it could be as high as 0xFFFFFFFF, but usually there's a bit of memory up there that's reserved for the kernel. So we'll come back to this today.
We'll talk a little bit about example physical memory layouts today. So OK, that's where that came from. Oh, and the other thing I just want to point out. I see this code sometimes when people are doing assignment 2. They're like, oh, I'm going to check for a null pointer. The user might have passed me a null pointer. Here's the problem. There are 2 to the 31 ways that a pointer that's passed by the user can be wrong. And you have just wasted several cycles to turn that into 2 to the 31 minus 1. So don't bother to check for null. It doesn't matter. As long as you pass this pointer to the appropriate functions to move things back and forth from user space, you'll be fine. Don't waste time with this. I know in general this is good practice, but in this specific case, it's not necessary. Because again, there are all sorts of addresses the user could pass that are bogus, and you're wasting your time just checking for one of them. Maybe they passed you an offset into a null structure, and the pointer is going to be some small non-zero value. And you're going to think, oh, that's a good address, and you'll try to use it and it won't work. Don't bother with this. OK, oh, and here's the other problem. Null can also be valid. I forgot that. So not only is this check futile, it's also wrong. It's possible, if I decided to do my own layout, that in my address space, 0 was a valid address. So this check is also incorrect. At some point in the future, we'll have a test for assignment 2 that'll make sure that it catches this. We don't right now. OK. So we talked about a couple of different parts of the address space where stuff gets put. Do people remember some of them? What's one region of the address space? Think about memory. What does a process need to have in memory? Yeah, a stack. So one stack per thread; for a single-threaded process, that stack typically lives at the very top of the address space.
Once I start creating multiple threads, I need multiple stacks, and so I need to find a space for each so that they can grow. It's not an issue. What else? What else does a process store in memory? Yeah. Registers. Are registers stored in memory? No, registers are on the CPU. But what else does a process store in memory? Yeah. Yeah, we call that its code, the code section. So the compiled program itself, the instructions that tell the computer what to do. Those get put in memory. We also have a heap. What lives in the heap? How do I get a memory address in the heap? What's the call I make? Yeah, malloc, right, OK. So malloc is how I get addresses in the heap. malloc is just a library that runs in user space. But it does dynamic memory allocation out of the heap area. And the heap is typically put right above where the code is loaded. And as I increase my heap size, as I call malloc more often and don't call free, the heap grows upwards. The stack, on the other hand, if you imagine a single-threaded process, starts at the top and grows downwards. And in general, these two parts of the address space will not collide. And the reason is they start off several gigabytes apart from each other. So by the time you have your stack actually colliding with your heap, either your stack is way too big, and you've probably been killed by then because you have some sort of bad recursion going on, or your heap is way too large, and you've been killed because you've used up all the memory on the computer or something like that. OK. So remember, one of the problems we had with giving processes direct access to physical memory was that the addresses of stuff change every time I run. And this is one of the things that's beautiful about the address space abstraction: every time I run, I know that the memory address of a particular variable or data structure is going to be the same.
And the compiler takes advantage of this, because the compiler uses these addresses as if it has everything from 0 up to 2 to the 32. And it puts things in various places, and it knows that those things are going to be there. Now we're going to talk today about how this actually works, because you may have identified some problems with this approach. But just keep in mind, and we're not going to go into this in this class, in certain cases I still do have code that needs to be moved around. These are called dynamically loaded libraries. Those can be put anywhere in my address space. And in order to support that, when you build those, there's a bunch of extra information that the compiler has to include, which allows me to move them around. This is called relocation information. So those libraries have a whole section that's part of the ELF file that describes how to relocate them into different parts of the process's address space. So this sounds awesome, right? This address space abstraction has solved a lot of our problems. It makes it very easy to lay things out. It means that a process can put things in the same place every time. It doesn't need to worry about the actual memory addresses that are available on the computer. But there are some problems with this. Has anyone ever read Newegg reviews? If you don't read Newegg reviews, you should. Regardless of where you buy your computer hardware, Newegg has by far the best reviews. So even if you want to buy it on Amazon, who cares? Read the reviews on Newegg. One of the things that Newegg people will say when they like something is: that's a great Wi-Fi router; cons: doesn't cook breakfast. So clearly we can't expect everything from a Wi-Fi router, or from address spaces. But can we even build these address spaces? That's the real problem. So what we're going to talk about for the next couple weeks are the challenges in realizing this idea.
And you guys may have foreseen some of the problems that this particular view of memory creates for us. Oops, that was supposed to be Tom Cruise. I don't know where he went. All right, so what is this going to require? What are some of the challenges here? The view of memory that I've been talking about giving processes is pretty radical. Can people see some problems? Look, all I want to do is give every process on the system the same view of memory that starts at 0 and goes up to 2 to the 32. No problem, right? What's hard about this? What are some of the ways that this doesn't map down easily onto actual memory? Yeah, so process A and process B. If process A uses memory address 10 and process B uses memory address 10, should those be the same memory address? They can't be. If they were, those processes would be sharing memory, and I don't necessarily want that. So I'm doing something weird here, right? There's clearly something strange going on if this is actually going to work. But yeah, the same address in one process and in another process has to refer to different memory. From now on, when we talk about memory, in particular memory that's visible to user processes, if you ask what's at memory address 0x10000, that question cannot be answered on its own. You have to tell me for what process. Because I'm giving all these different processes the view that they can all use 0x10000. Those addresses have to be private. That was one of the conditions of the address space abstraction. And so there's clearly something weird going on. But again, from now forward, a process address is meaningless without the process identifier. I have to combine the two in order to figure out how to interpret it. And this comes back to the idea of protection.
And then how do I fit it? So again, on MIPS, when you guys are working on assignment 3, you're going to be providing processes with a two gigabyte virtual address space on a machine that only has two megabytes of memory. And that's supposed to look contiguous. So how do I even get it in there? I have a problem with fitting this huge thing into the tiny little amount of physical memory I have. So clearly I need to play some games to get this to work. So here's the point of the course where I reveal an unfortunate truth. Well, I shouldn't say it's unfortunate. It's kind of beautiful. You guys have actually never dealt with real memory. You might have thought you dealt with real memory. You've used memory addresses. You've called malloc and free and whatever. But it hasn't been real. Sorry. And you believe things that are not true. So this is kind of like, oh, man, what happened to all my background images? It's really sad. This was supposed to be the Matrix. So this is like your Matrix moment. I'm about to introduce you to the real world. Oh, actually, you know what? This was supposed to be Santa Claus. So Santa Claus is not real. Neither are the memory addresses that your programs use, OK? So it's probably obvious to you by now that somehow, if I'm going to get address spaces to work, I have to break the connection between the addresses that are used by processes and physical addresses. Because 10 different processes are going to use the same process address, and that needs to refer to different actual physical memory. What we're going to do here is add what's called a level of indirection. Where have we done this before? There was another place we did this. Yeah, so remember, when we did this with file handles, we used it to reorganize some information to allow a certain amount of state to be shared. Here we're doing it for a different set of reasons.
The reason we're going to introduce this level of indirection is that by translating references to memory, which is what we're going to provide processes with now, we're actually never going to give processes a real view of memory. A process will never use an actual physical memory address. And that's why you as a programmer have never actually used a physical memory address. At least unless you were programming some sort of little 8-bit microcontroller or something like that, in which case, sorry. I'm sorry you had to do that. I did that for a while, actually. It's painful. Now, introducing these references actually gives me a lot of power. And the kernel uses all of these different capabilities when it's managing virtual memory. So instead of giving you access to physical memory, I give you a reference to physical memory. And I can revoke that reference. I can share that reference with other processes and actually allow you guys to share memory if you want to. I can move things behind your back. Because every time you need to use that memory address, you actually have to translate it. And I can change properties of that address behind your back, as long as I maintain some invariants. We're going to talk about them in a minute. So I can take it away. I can allow two different processes to share. This is what we did with file handles: we allowed processes to share a portion of the file state under certain conditions. If I actually want to enable memory sharing between two different processes, I can use the same idea. And I can also move stuff around and change it from whatever color that is to green. So sometimes I want to give you green memory. So I can change it that way. So now let me ask you guys a different question. Wait, did I actually have a question? Yeah, what's up? So how do you protect the data when it's shared like that? It's a great question.
Let's say I've set up something so that two processes can share memory. Does the kernel at that point have any role in ensuring that that happens safely? And the answer is essentially no. The two processes have to coordinate among themselves to determine what to do. So this came up before. I actually got a surprising number of questions about Vim lock files, which I thought was fascinating. So people were asking, here's a great question: what happens if two processes try to access the same file at the same time? Yeah? In the previous one? No. In general, if I have two processes that are reading and writing to the same file, what's going to happen? A mess, potentially, right? The point is the kernel doesn't care. If you're allowed to write and you're allowed to write, and you guys both have the file open, I just let you go at it and assume that you know what you're doing. It's not my problem if you make a mess of things. Now, you guys have noticed with your editor that sometimes when you try to open the same file again, it gives you some sort of warning message. How does that actually happen? Is that the operating system telling Vim, oh, you've got this file open somewhere else? What happens if you try to open the same file in, like, Emacs, or, I don't know, some other terrible editor, Notepad? Ooh, it's scary. What would happen? Nothing, right? So how does that work? It's just a fun software engineering question. So you open one file in Vim, and then you go to another window, and you try to open the same file. And Vim shows you this scary message saying you've already got the file open. How does that work? Yeah. Yeah. So Vim creates a temporary file in that directory. How many people have ever had this happen when you didn't have the other file open? And then you actually have to read the message. It's like, oh, do this, or do this. There's a delete option, so it works out, anyway.
So yeah, all of that stuff happens outside of the OS. The OS really provides no help to processes for that type of file-based coordination. And so in this case, the only reason it works for Vim is because Vim knows how to cooperate with other instances of Vim. If you open up some other editor that doesn't understand that the presence of a Vim swap file means the file is being edited by another editor, all hell will break loose. And you can do whatever you want. You can edit the file in two different places at the same time. Maybe that'll give you a leg up in doing assignment 2. I don't think so. But you could try it. OK, anyway. So what is the memory interface? We talk about things as having an interface. What is the interface to memory? It's not a big interface, not particularly complicated. What can I do to memory? What are the operations that memory supports? What's that? Not really. Dereference? What can I actually do? Memory has an interface. What are the two things you do with memory? Yeah? Read and write. How about load and store? So load says: I give you the address. The address is really just a key. And what I expect to get back is whatever the previous store wrote, or if I haven't done a store, then maybe the result is undefined. So it's just load and store. Now the trick is, to expand our idea of memory, we're going to obey this interface while changing things around behind the scenes. Because it turns out that there are a number of different things that can obey the memory interface. So in order to implement address spaces, we have to break the connection between the addresses that are used by a process and actual physical memory. The way we do that is we introduce this concept of virtual memory. How many people have heard of virtual memory? Awesome. All the memory we've used as programmers, in 99.9% of cases, has been virtual memory.
This concept is so baked into the design of modern systems that it's probably completely invisible; you may have never realized what's going on. But it's cool. So here's what happens. Virtual memory is stuff that acts like memory. It has to preserve the load and store interface that normal memory would. Virtual memory addresses point to something that acts like memory. What does that mean? It means that when I write a value to it, the next time I read a value from that same key, I should get the same thing back. What are the permanence assumptions about memory? So if you're a programmer, you've thought about this. You use memory. Memory's awesome. What's the downside of memory, to the degree there is one? What are the permanence assumptions you make about memory? I write a value. Will the load always return the previously stored value? Always? What are some things that can happen that would cause it to not return that same value? Oh, nothing? What was that? Say it louder. Interrupt? No. Oh, come on. You guys know this. It's probably just so obvious to you. When are the contents of memory lost? When the computer turns off. So that's a good point. So memory is not forever. When else? From a process perspective. When the process exits, thank you. So if you start up a process and you build some beautiful data structures in memory, and then you get killed, where are those things that you put into that beautiful data structure? They're gone. So that's the other assumption that processes have to make about memory. Can I help you? No? OK, sorry, I thought I saw a hand. So as long as we preserve those assumptions, memory is not going to suddenly take on magical powers. A process expects virtual memory to behave the same way as normal memory.
So if I shut down and restart, the contents of virtual memory are lost. If the machine shuts down, the contents of virtual memory are lost. Now, the nice thing about virtual memory, compared with actual physical memory, is that the semantics are a lot richer. So I get all these benefits from using a reference that the kernel forces processes to translate. So, some of the things we talked about. One of the games I get to play with virtual addresses is where the data is actually stored. So a physical memory address, where is a physical memory address located? Not a trick question. Memory. It's one of those sticks of memory that are inside your computer, right? A virtual address can point to memory. It can also point to memory that has actually been moved to disk. So I can translate virtual addresses to physical addresses, right? I can also translate virtual addresses to disk offsets. Has anyone ever used mmap before? Too bad. So mmap is a system call that allows you to tell the kernel: I would like you to take some file and load it into my address space so I can access it like memory. So at that point, you have a chunk of memory, and all the reads and writes that you make to that part of memory will end up on disk. So this is a nice way... remember I just said, when you shut down, the contents of your virtual addresses are lost. If you have your virtual addresses mapped to a file, that's not the case. Those will be preserved. So it's actually a nice way to save data structures. This is something that databases do all the time. This is one of the main ways that databases load data structures into the process and access them. Just think about it. I mean, it's hard to implement complex data structures using read and write and lseek. That's just a mess.
If I make the file look like part of my address space, I can just set up a structure in there and then access it, and let the file system do the work for me. It turns out that, if I wanted to, I could even translate virtual memory to another machine. So a virtual address might not even point to memory that's on my own machine. It might point somewhere else across the network. And so rather than tying the semantics of virtual memory to actual physical memory, there's all this extra richness that I get here. And it's all made possible by the fact that I'm translating these addresses, and the kernel has a lot more control. Okay, and then virtual addresses are also frequently used to access hardware. So rather than communicating with hardware using some other method, I map a part of the hardware ports in, usually into the kernel's address space, so the device driver can use them. And then when I read and write to that memory, those reads and writes actually head out over the device bus and hit some hardware port. So this is a common way to make hardware devices accessible to device drivers. Okay, so when we're talking about virtual memory, the process's expectations should match whatever the virtual memory is supposed to point to. So if I'm using virtual memory to point to physical memory, then I assume that those values are stored transiently. If I shut down, or if the machine shuts down, the data is lost. However, what happens if I use mmap? Let's say that I use mmap to get a portion of a file into my address space and I make some reads and writes to it. How long do I expect those to last? Where should they go? The process told the kernel: I want you to make this part of the file present in my address space and allow me to read and write to it. When I read and write to that part of memory, where should the writes go? They might be cached in memory for a while, but where should they go eventually?
They should go to the file that I told you to mmap, right? So in that case, I expect those values to be stored permanently. Device ports are an interesting case here, because in the case of a device port, hardware can actually change the value of those registers independently. So remember, before we said the interface of memory is: if I store a value and then load it later, I expect to get back the same value. With a hardware device, that's not true, because the hardware device may be changing that value itself, right? But you don't need to worry too much about that. Okay. The other thing I can do with virtual addresses is layer on additional protection. This is pretty important. This can be done in a couple of different ways. It can be done at the hardware level, and it can also certainly be done by the kernel itself. So usually there's some portion of the virtual address space that is only accessible to the kernel. And that gives the kernel the ability to protect memory. It gives the kernel all the benefits of virtual memory that processes enjoy, but it also gives the kernel the ability to protect certain memory addresses from processes. What do you think happens to a process that tries to access kernel memory? What would you do if you were the kernel? Kill it, right? Cause it to exit. Yeah, I mean, that's a misbehaving process. It's either broken or trying to do something it shouldn't. Shut it down. And then virtual addresses can also usually, and this depends a little bit on the hardware support, be assigned even more fine-grained permissions of the kind that you associate with files. So the kernel can establish an area of the process's memory that it's allowed to execute instructions from. What part of my address space would I need to make sure I can execute instructions from? The stack? That's interesting. The code section.
So it used to be, for a while, people would amuse themselves by writing what's called self-modifying code. Has anyone ever written self-modifying code? People would write programs that would rewrite their own instructions as they went along. It's kind of wild, right? I don't know exactly how you do that. It's harder to do now, because in a lot of cases the code section is not only marked as executable, it's also marked as not writable. And that's just a safety mechanism. It means that once the operating system has loaded the code in during exec, it prevents the process from accidentally modifying its own code. So I might have a buggy part of my program that accidentally writes something into my code section, and then, if I hadn't marked it read-only, I might try to execute that garbage later. I'm sure you guys will experience the fun in assignment 3 of trying to execute random stuff. If you take random values out of memory and start telling the processor it should execute them, it's very interesting. It usually doesn't last very long, right? Usually you get to something where the processor says, I have no idea what this is, and then it throws some weird error. All right: read, write, execute. I can also mark certain static variables in my address space as read-only. That's helpful if I know that they won't be modified. Constants that are never modified. Okay, any questions at this point? Virtual addresses, virtual address translation. Now we're going to talk about where virtual addresses actually come from. So one system call that creates virtual addresses, that needs to create virtual addresses, is exec. Because, remember, one of the main parts of exec's job is to lay out the initial address space for the new process.
So it uses the blueprint in the ELF file to figure out where you want the code to go, where you want the stack. There's usually no initialization for the stack, right? It's mainly the code and any sort of static variables. And then if you have any global constants that are used in the program, it creates space for those as well. And so exec creates virtual addresses. You can think of it as creating virtual addresses that point to memory. So my code is stored in memory, along with the variables. The heap is usually initialized, but there's not really any space allocated for it until malloc starts to run. And then exec creates the starting point of the stack, or indicates where the starting point of the stack should be, for the first thread, right? So going back to our pmap mappings. You remember we were looking at bash; this is from maybe a month ago now. So what are these addresses over here? Are those actual memory addresses? What kind of addresses are they? Those are virtual addresses, right? If I started two different copies of bash and ran pmap on both of them, remember, this is showing me the memory mappings for a running process. If I started two copies of bash and ran this command, would these values be different? Yes? They would? No, they wouldn't? Yeah, it depends on if you fork. Let's talk about the code section. So if I start up another copy of bash, am I going to see the same code section at the same place? Yes or no? Okay, why not? Right, but remember, what's the nice thing about virtual memory? What's that? Right, and what was one of the nice features I got out of creating this illusion? Yeah, so I'm glad we had this conversation, right? These are virtual addresses, okay? When bash starts up and uses the ELF file to set up its address space, it does it the same way every time.
So again, let's say I start another shell and fire up pmap. Where is the second shell going to have this blob of code located? I used the same ELF file. So what is the address going to be for the code section in my second copy of bash? Yeah, it's going to be identical. Let's pause here and talk about this for a minute, okay? These are virtual addresses. They point to physical memory, but pmap doesn't tell you where they point. pmap just shows you the virtual addresses that are in use. And actually I want to point something else out. So this address down here: let's say I showed you this output and I asked, are these physical addresses or virtual addresses? What's a clue that these are virtual addresses? Let me give you a hint. Let's say this machine has a gigabyte of RAM. How can you know for sure that these are virtual addresses? What's that? Well, physical addresses would also be hex values. We love our hex when we're talking about memory, right? But a hex value that starts with B, where is that? You don't even need to know your hex very well, right? A 32-bit integer addresses how many gigabytes? Four, right? 2 to the 30 is a gigabyte, and the other two bits get you to four gigabytes. B is like big, right? It's bigger than eight; you're into the letters now. So where is this address? If the machine has one gigabyte of physical memory, is this address something that would refer to a byte of physical memory on this machine? No, it's too big, right? This is almost three gigabytes up. Now, you have to also assume that the memory starts at zero, whatever, right? But yeah, assuming that my physical memory starts at zero, this address is way too big to be part of physical memory. Now, my address space can be much bigger. And on Linux, you can see the stack is right up at the top of the address space. So I think the address space goes up to 0xBFFFFFFF.
I think, though I shouldn't promise, that this is a three-gigabyte virtual address space that's used in Linux. So these are virtual addresses. And because they're virtual addresses, the operating system controls what actual physical memory they point to. But the nice thing about this is every time bash starts, it puts its code at the same place. And so when bash is referring to one of its variables, it knows exactly where that variable is every time, because the variable is in the same place every time, right? Any questions about this before we go on? It's pretty fun, yeah. Yes, yes. All the addresses? This is another good point. Every address used by a program or a process that we will talk about in this class is a virtual address; there's no direct access to physical memory, yeah. We will talk about this, right? Yeah, so we haven't quite got there yet, but thank you for pointing out that this sounds like a huge performance problem: every memory operation now has to translate an address. And the answer is no, but why the answer is no is interesting, and we'll talk about it more. Yeah, do you have a question? Yeah. So the heap location: it matters a little less that the heap is in the exact same spot, but it will be, right? If I run four copies of bash, the heap will be in the same spot, because the location of the heap is determined by how large the code segment is, and the code segment is the same, right? Yeah, that's a good question. All right, I think I have another example. All right, let me come back to this, right? So, well okay, this is another good example of this. Another place where virtual addresses come from is in fork. So when I call fork, one of the jobs of fork was to make a copy of the caller's address space. What this means is that the child can use the exact same set of virtual addresses that the parent had when it called fork.
So whatever memory the parent had allocated, the code that it loaded at exec, any variables that are on the heap, the stacks — remember, of the different threads' stacks, there was maybe only one that we copied — all of that should be preserved when I call fork, and the child has access to those same virtual addresses. But the child has private memory; it is not sharing memory with the parent. So the addresses are identical, but they have to point to different physical memory. So here's my example. Simple piece of code: initialize a variable, call fork. So who am I right here, if the return value is not equal to zero? I am who? The parent, right? The parent gets the child's PID back; the child gets zero. This is going to print the virtual address of this variable. Well, I shouldn't put it that way: it's gonna print the memory address of the variable, but the only addresses that the process gets to see are virtual addresses, okay? So now what the parent does is it sets i to four, and that will print four. The parent has written — it did a store into a location in its virtual memory — and when I print the value back, I get what I expected. So let's assume that the parent ran first. Now when the child starts to run, what is the first print statement going to say? What will the first printf in the child print? Yeah. I'm printing the address of i. What will it print? Yeah. It'll be the same. It'll be the same. Yes, thank you. Someone is optimizing their answers, right? It's 0x20010, the same value right here. Fork has made a copy of the address space, and since I copied the address space, whatever location this variable was at in the parent, it's at the exact same location in the child. Now what will this next printf print? Two? You guys missed a line of the code, yeah. I set it to three, then I print the value, so now it's three.
You can compile and run this code and it'll work every time, regardless of the order in which the parent and child run. And what this illustrates is virtual memory in a nutshell. The parent and child just did a write to what they thought was the same address and yet got two different values. So clearly something is going on here. Clearly this address is not the same as this address, despite the fact that the addresses are the same. So remember, whenever we're talking about virtual memory, in order to translate a virtual memory address, I need the virtual memory address itself and the process. Virtual memory addresses are completely meaningless without an address space, or a process, to translate them in. All right, questions about this before we go on? Okay, so we talked a while ago about the fact that with fork, to do this naively, I have to take all the memory that the address space is pointing to and make copies of it. I have to copy all the physical memory. So for all the virtual addresses in the parent's address space, I need to take all that physical memory, find new physical memory, copy everything, and set up a new set of mappings for the child that point to private memory. This is particularly irritating because a lot of the time the first thing that the child does is call exec. And what exec says is: I don't want that old address space anymore. We'll come back to this when we talk about clever virtual memory tricks, because there is a way to get around this problem that is quite cool, right? It exploits some of the control that the kernel has over how virtual memory addresses are translated. So in terms of our laundry list of system calls that can create virtual addresses, sbrk can also create virtual addresses. Has anyone ever heard of sbrk before? Oh yeah, there we go. Someone is either looking ahead to assignment three or just knows their Unix system calls.
So what sbrk does is it asks the kernel to move the break point — that's the "brk" in sbrk. And what it's used by is malloc, when it wants more heap. So when malloc runs out of memory and needs another chunk of memory from the kernel, it asks the kernel to move the break point. The break point is essentially the top of the heap, okay? And so what would happen is this, and the word "heap" would get thicker. That also has to actually happen, right? The kernel has to make sure of that when you increase the heap size. All right, so mmap, which we talked about a little while ago, is another system call that can create virtual addresses. So mmap asks the kernel to create virtual addresses, and you have to tell the kernel where they point to in a file. So you can imagine what I do is I take a contiguous part of a file and I create a section of my virtual address space that now maps onto that part of the file. And again, this can be a really neat way to access files, particularly if you're doing certain things to them where the read/write/lseek interface gets cumbersome. Okay, so let me see how much time we have. Try to get through this before we're done. Yeah, okay, I have a minute or two. All right, so let me quickly go through this. So now, the virtual memory addresses that are set up by the kernel are all determined by the specific memory layout of the machine and the addresses the machine uses. So on your system — and you guys will get intimately familiar with this — that was very weird. Apparently it's time to leave, I should just shut up. No, I've got two minutes left. So: 32-bit addresses on the MIPS, 0x00000000 to 0xFFFFFFFF. Somebody came up yesterday and said, you know, this is a problem, and now we're onto 40-bit addresses, yeah. So we're going to wider addresses; we need them, there's more memory out there in the world. Someone else wants to hear me get Rickrolled, I guess. It happened a while ago.
So, a 32-bit-wide address space in the MIPS architecture at the hardware level — now I'm talking about physical addresses. It defines four different address regions. The first region is virtual addresses that are translated by hardware; we'll talk about how this works later. The second is what are called kernel direct-mapped addresses. In this case the address is also translated by hardware, but the way the hardware translates addresses in this 512-megabyte region is it takes the address and lops off the top bit. It's very simple: the top bit just gets converted to a zero. So 0x80000000 gets converted to what address? Zero. The goal here — and remember, the MIPS R3000 is a pretty old architecture — was to give the kernel a way to access all the physical memory on the machine without actually requiring the kind of virtual memory mappings that we're gonna talk about next week. Now that machines regularly have gigabytes and gigabytes of memory, this wouldn't work. But at the time it did, because machines had a lot less memory. There's another section of kernel direct-mapped addresses — these ones skip the cache — but I'll leave the mysteries of those to recitation. It doesn't really concern you guys, at least not for assignment three. And then at the very top of the address space are kernel virtual addresses. So addresses in this region will trigger translation, which is what we're gonna talk about next week, but they are only usable by the kernel. If something in user mode tries to access one of these addresses, it'll trap and the kernel will run. Okay, we'll pick up here on Monday. Have a great weekend. Good luck on assignment two.