OK, I apologize to whoever picked that song. It goes on for like 10 minutes, so I got here a little late, sorry. OK, so today — last time, last week, we introduced the address space abstraction, and we kind of created a problem for the operating system to solve, which is that the view of memory we're going to give to processes is going to be a virtual view. It's not going to be real. Today, we're going to start talking about the complication this creates for performance, which is that all of these virtual memory addresses we've been talking about have to be translated somehow to an actual physical address when the contents are in memory. So any virtual address that points to physical memory has to be translated. We're going to start talking about some ways to do that. And this is one of those fun stories where there's a nice interplay between the operating system and the capabilities of the hardware. OK, so ASST2 is due Friday. I've had a couple of emails about the late submission policy. There is no late submission policy. You will get the best score for whatever you turn in before 5 PM on Friday. That's the late submission policy. That's already up on the course syllabus, actually. There are no extensions for the assignments, except in extremely rare circumstances. Scott and Guru are working on getting the assignment submission stuff all set up; that's going to be released pretty soon. Something else that's coming in a new version of test161 is checking for memory leaks. This is not something that we will have points attached to for assignment 2, although you will get warnings about places where your kernel is leaking memory. However, for assignment 3, there will be points attached. So please start fixing your memory leaks. Given that a lot of you aren't familiar with C and doing manual memory allocation, I suspect that your kernels are leaking all over the place.
So this will be a good tool to help you find those leaks and fix them. Oh, by the way, I'm sure there are going to be posts on Discourse about this, but I just want to own up and do my little mea culpa. There is a bug in the current version of test161 that can cause it to loop forever sometimes when trying to process certain configurations. The person you are looking at is the person who introduced that bug a long time ago. So if your test161 hangs, you can thank me. Please upgrade to the new version; we just fixed that this morning. So never write an infinite loop. Just don't ever do it. Even if it's for i equals 0 to a million, it should terminate at some point. Otherwise, you will be embarrassed later. OK. Any questions about virtual addresses? So last time, a couple of lectures ago, we talked about the address space abstraction. We give each process a big uniform view of memory. We pointed out last time that clearly that abstraction requires breaking the connection between physical memory and the view of memory the processes have, and we introduced the idea of a virtual address. Any questions on this before we do just a few minutes of review? Everybody understands this already. Awesome. OK. So what I'm doing here is I've created a new level of indirection. So what does that buy me? I'm forcing processes to translate a reference to memory to gain access to it. What are some of the things that that allows the kernel to do? What are some of the new powers that I get by forcing processes to translate a virtual address into a physical memory address? What are some of the new tricks I can play? Yeah. OK. So I can enforce access. I can enforce permissions and access requirements. So I can prevent processes from stomping on each other's memory. That's good. What else can I do? Yeah. Yeah, I can enable sharing in a nice way.
And two processes can actually share the same physical memory at different virtual memory addresses, which is kind of neat. So if process A and process B want to share memory, process A and process B can both set up different virtual addresses in their own address spaces to point to the same physical memory. That's kind of cool. What else can I do? I haven't talked about probably the most important ability that this gives the kernel. Yeah. Yeah, so that's a cool thing. I can use discontiguous physical memory to create contiguous virtual memory. That's awesome. What else can I do? One more thing that's pretty critical to memory management. Every time you use an address, I'm going to translate it. Yeah. Yeah, so that's great. So this gives me the ability to provide uniformity. If a process forks 10 times, all 10 copies of that process think that a particular local variable or global variable is located at a particular address, despite the fact that the physical addresses are clearly different. They have to be. Each process has a private address space. But there's one other thing: I can move things around. So I can take something that was in memory and put its contents somewhere else temporarily. And when the process needs it again, it has to translate that reference, and that gives the kernel the ability to make it look like memory again. We'll talk about this maybe this week or next week — well, the week after — when we talk about swapping. OK: protected, shared, moved, translated. OK. So clearly the address space abstraction requires that I break the connection between the addresses that a process uses and the physical memory addresses. And if you don't understand this, I would encourage you to go back to the last lecture and look at that example at the end again, because that is critical to understand.
The addresses that processes are using are not real. They are translated. And you can convince yourself of that pretty easily by writing some pretty simple C code. We refer to these as virtual addresses. These are addresses that are going to behave like memory — unless I've asked the kernel to map them onto a file or something, these addresses will behave like memory. They obey the memory interface. But they may or may not be actual memory. So a physical address points at an actual stick of memory. A virtual address points to something that acts like memory — that is, obeys the memory interface, in most cases, although we talked about some variations of that. So which ones do you use? Every address you've ever used in C is a virtual address. Do the experiment. I mean, write a little C program and just play around with this. Write a C program, have it fork 10 copies of itself, and you can see they are using identical memory addresses and storing different values. So clearly something weird is going on. You've known something was wrong with the world all along. I'm just telling you what it is. OK. So yeah, we talked about virtual addresses. And because I'm forcing the process to translate a reference, I have the ability to do a lot more and layer a lot more power into this process. So we talked about four system calls that create virtual addresses. Now remember, when I'm creating a virtual address, all I'm doing is giving the process permission to use this address to address something that is going to behave like memory. In some cases, these things point to memory. In one case, they actually don't. So exec: what virtual addresses does exec create and why? Yeah. Yeah, so it moves — yeah, exactly. You guys probably are figuring this out by now. That is one thing it does. How many people have that working? OK, good luck to the rest of you. Have a plan, all right? You won't get anywhere without a plan. Yeah, so what else?
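The fork experiment just described can be sketched in C. This is my own version of it (the function name run_fork_experiment and the pipe-based bookkeeping are mine, not from the lecture): the child changes a variable and reports its address and value back through a pipe, and the parent observes the same virtual address holding a different value.

```c
/* A sketch of the experiment described above: fork a child, have it
 * change a variable, and compare addresses and values through a pipe.
 * Parent and child see the SAME virtual address for `x`, but after the
 * child stores to it, the two copies hold DIFFERENT values -- so the
 * address cannot be a physical one.  POSIX-specific. */
#include <stdint.h>
#include <sys/wait.h>
#include <unistd.h>

int run_fork_experiment(void) {
    int fds[2];
    if (pipe(fds) != 0) return 0;

    static int x = 1;                  /* lives at one virtual address */
    pid_t pid = fork();
    if (pid < 0) return 0;

    if (pid == 0) {                    /* child: same &x, new contents */
        x = 2;
        uintptr_t addr = (uintptr_t)&x;
        write(fds[1], &addr, sizeof(addr));
        write(fds[1], &x, sizeof(x));
        _exit(0);
    }

    uintptr_t child_addr;
    int child_x;
    read(fds[0], &child_addr, sizeof(child_addr));
    read(fds[0], &child_x, sizeof(child_x));
    waitpid(pid, NULL, 0);

    /* identical virtual address, different values */
    return child_addr == (uintptr_t)&x && child_x == 2 && x == 1;
}
```

Forking 10 copies, as suggested above, just repeats this observation 10 times: every child prints the same &x while storing its own value there.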
But that's just the part of exec that you guys know and that is going to give you nightmares for the next week. Exec does a lot of other things. Most of it's done for you in OS/161. What else — what other virtual addresses does exec create? What other parts of the address space? I just called on you; let's hear from somebody else. Decides where to start the heap, that's true. How does it make that decision? Yeah, say it. Maybe the file tables? Yeah — so the file table lives in kernel memory. We're talking about user virtual addresses. Remember, what other big part of a process's address space is exec directly responsible for? Yeah? Code? Code, yeah, that's the big thing. Code, static variables, everything that's initialized when the program starts up. So when the program starts up, I've got to move all that stuff from the ELF file into the address space. And that's mainly what exec does. If you guys look at loadelf, it's doing quite a bit of work for you. You're just putting the cherry on top. OK, fork. What virtual addresses does fork create and why? Yeah, and what virtual addresses does that address space have when it starts up? Yes, modulo a few things. The child starts up with exactly the same addresses allocated as the parent. That includes all the code, all the static variables, any global variables that the parent may have changed, all of the heap that the parent had when it was running — everything. The parent and child are supposed to be mostly indistinguishable once they start to execute. What about sbrk? What's sbrk for? What does it do? I just called on you. He did give away the answer, though, if you were listening. I'm working on my ability to tolerate uncomfortably long silences in class so I can get somebody to speak up. Sbrk: what does sbrk do? What uses sbrk? I'll give you this one. Extends the process heap.
So when you've been using malloc, malloc calls sbrk internally — sometimes, not all the time, but sometimes. Finally, mmap. Who remembers what mmap does? Mmap is the one that creates virtual addresses that do not point to physical memory. There's a hint. What do they point to instead? Mmap says: map a portion of what into my address space? Yeah, a file — not even the disk, a file. Mmap allows me to create a portion of my address space that looks like memory. I can read and write to it like memory, but all the changes to it are written down to some part of a file. OK. All right, we won't go through this. This is boring. All right, any questions on virtual addresses before we go on? It's important to get this concept firm in your mind. So now we're going to talk about how we actually get this to work. Somebody pointed out last time that we've created a potentially serious problem for ourselves. What is it? The virtual address abstraction requires that the kernel translate, or participate in the process of translating, every memory access. Yeah — I mean, memory just got super slow. Super slow. Imagine if every time you accessed memory, you had to trap into the kernel, and the kernel had to somehow do something to the address. Even your modern, incredibly fast machines would seem incredibly slow. So this is clearly not OK. Our goal is that the kernel works in cooperation with hardware — the only way to make this fast is to get the kernel out of the way. This is one of the few cases where software is just way too slow to do the job. If the kernel had to be involved with every memory access, computers just would never have happened. You guys wouldn't have one, because people would have said, this is way too slow.
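Before moving on to translation, here is a sketch of the mmap behavior described above — a region of the address space that reads and writes like memory, with the changes landing in a file. This is my own illustration (demo_mmap and the temp-file template are my names), using standard POSIX calls.

```c
/* A sketch of mmap: map a file into the address space, store through
 * the mapping like ordinary memory, and observe the change land in
 * the file itself.  POSIX-specific. */
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int demo_mmap(void) {
    char path[] = "/tmp/mmap-demo-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0) return 0;
    if (ftruncate(fd, 4096) != 0) return 0;   /* give the file a size */

    char *mem = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    if (mem == MAP_FAILED) return 0;

    strcpy(mem, "hello");                 /* a plain memory store...  */
    msync(mem, 4096, MS_SYNC);            /* ...flushed to the file   */
    munmap(mem, 4096);

    char buf[6] = {0};
    pread(fd, buf, 5, 0);                 /* read it back via read()  */
    close(fd);
    unlink(path);
    return strcmp(buf, "hello") == 0;
}
```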
So the only way to make this system faster is to get the kernel out of the way — get the kernel off the critical path. And that's because the kernel is too slow. Now, this is another case of a nice divide between software and hardware. So going back to how we did CPU scheduling, what is the kernel's primary responsibility when multiplexing resources? What is it in charge of? Yeah. Yeah, so what we want is for the kernel to control all the policy aspects of address translation — how addresses get translated, permissions attached to them, stuff like that. But we want the hardware to provide mechanism so that this is fast. So the kernel needs to maintain complete control over virtual address translation, while the hardware makes sure that most translations don't require the kernel's help. Does this make sense? So the goal here: the kernel maintains control over how addresses are translated, but is not involved with translating specific memory addresses. So here's one way that we could try this, which is what I refer to as explicit translation. Here's what happens. The process says: I have this virtual address I want to translate. Can you tell me what physical address it translates to? Can I do this? Can the kernel answer this type of query? Yes? How many people think yes? How many people think no? What's the problem with doing this? What have I just given away? Well, there are two problems with this. First of all, remember, can a process use a physical memory address? No. User processes do not get to use direct physical memory addresses. In fact, the kernel doesn't even get to use them anymore. All addresses are translated in some way, according to rules the kernel sets. But all addresses are going to trigger some sort of hardware translation. So there's no such thing — I can't use a physical address.
I also can't tell the process how I'm translating things, because if I give away my translation now, then if I want to change things in the future — if I want to make sure the process stops using that memory — I can't do it anymore. So all addresses must be translated. I have to translate all the addresses, every single one. OK. So this is how things actually work, and this is why you've never noticed this before. When a process does a write or read to a memory address, this triggers a translation process in which something called the MMU — anyone want to guess what that stands for? Memory Management Unit, there we go — is in charge of translating addresses. That's the job of the Memory Management Unit: to organize this address translation process. And the first time a process tries to translate a specific address, the MMU doesn't know how to do it. And so what the MMU is going to do is ask the kernel for help. How does it do this? What's the mechanism here? Yeah, this triggers an exception. Remember how we talked about most exceptions being completely benign and occurring completely naturally? This is a case of such an exception. When the process ran this instruction, it did not know that this exception would be triggered. But this instruction cannot complete until the MMU knows how to translate the address. So it has to figure this out. It traps into the kernel. The kernel has to do one of two things. Either the kernel has to translate the address, or what else could happen? The kernel has to tell the MMU how to translate this address — what physical address it translates to — or I can do what? I hear muttering. Does someone want to mutter louder? Add a mapping? Well, no, that is telling the MMU what to do. So either the translation can complete, or what else could happen? Segmentation fault. It's possible that this address is not translatable by this process.
The process doesn't have permissions to do what it's trying to do to this memory address, or the process doesn't have this address as part of its address space, in which case I can't complete the instruction. Yeah, Josh. Yeah, so how do processes run out of memory? Processes usually run out of memory on most systems because they hit some sort of predefined limit that the OS sets that they can't exceed. On a well-administered system, you have policies in place. So for example, a user program at a particular priority can't allocate more than a certain amount of memory. Now, there are cases where the OS can run out of memory, and there are also cases where there's not a lot to do. We'll come back to this. The way that Linux handles this is just by killing things at random. Basically, right? I'm serious — you guys are laughing, but it's not completely random. I shouldn't make fun of it like this. They have this thing called the out-of-memory killer, and talk about black magic — that thing is impossible to understand. But essentially, at some point you're kind of like: the only thing I can do is kill a process. What's not important? And it tries to figure that out using a bunch of heuristics. So we'll come back and talk about that at some point. Does that answer your question? Yeah. Cool. All right, so this is how I want things to happen. The MMU sees an address it doesn't know how to translate. It generates an exception. The kernel runs. The kernel tells it: hey, by the way, this virtual address maps to this physical address. The MMU says thanks, and the store completes. Notice that the process has no idea what just happened. All the process has done is a load or a store, or whatever the equivalent memory operation is, and all this happened automatically. So from the process's perspective, it just executed one instruction, and now it's going to execute the next instruction. All this happened behind the scenes.
Okay, so here's our example. Let's go through it again. The MMU doesn't know what to do, so it triggers the kernel. The kernel tells it how to do the translation, and that completes the instruction — so now we have a diagram for the visual learners among us. Now, what happens when this process stops running and another process starts to run? Does the MMU still know how to translate 0x10000? It does? People are nodding. Who wants to make an argument why it doesn't? Yeah? It'll point somewhere different. Remember, a virtual address is meaningless without a process. If I ask you to translate a virtual address, you cannot do it unless I tell you what process is running, because every process has virtual addresses in the same range that point to completely different parts of memory. So every time a process switches, I have to reset the MMU and teach it how to translate addresses again. This is another part of the overhead of doing a context switch: the MMU's translation cache, which we're going to talk about in a sec, is flushed, and I have to reload things again. Okay. Now, here's the nice thing. Let's say it's the same process, it's running along, it hits this address again. Now what happens? So the first time I translate the address, the kernel gets involved and tells the MMU what to do. The second time I translate the address, what happens? Yeah? It's in the cache. And so the kernel doesn't get involved. This is the trick that we use to get the kernel out of the way. Now, we're going to talk in a lecture or two about why this works. Specifically, there are patterns to process memory usage. If the memory accesses by processes were completely randomly distributed throughout all the valid parts of their address space, these caches would not work very well. But they're not, as you would expect, right?
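The miss-then-hit interaction just described can be modeled with a toy translation cache. Everything here is mine — the names, the 4-entry size, and the fixed-offset "kernel policy" standing in for a real page table; an actual TLB is hardware with many more entries. The point is only the control flow: hit means the kernel stays out of the way, miss means an exception and a refill, and a context switch flushes everything.

```c
/* A toy model of the MMU/kernel interaction: a tiny translation
 * cache, a "kernel" that fills it on a miss, and a flush on context
 * switch. */
#include <stdint.h>

#define TLB_SIZE 4

struct tlb_entry { uint32_t vaddr, paddr; int valid; };
struct tlb_entry tlb[TLB_SIZE];
int kernel_faults = 0;                 /* how often the kernel ran */

/* "kernel" policy: a fixed offset stands in for a real page table */
static uint32_t kernel_translate(uint32_t vaddr) {
    kernel_faults++;
    return vaddr + 0x40000;
}

void tlb_flush(void) {                 /* done on every context switch */
    for (int i = 0; i < TLB_SIZE; i++) tlb[i].valid = 0;
}

/* the fast path: hit in the cache, or "trap" and refill */
uint32_t translate(uint32_t vaddr) {
    for (int i = 0; i < TLB_SIZE; i++)
        if (tlb[i].valid && tlb[i].vaddr == vaddr)
            return tlb[i].paddr;       /* hit: kernel stays out of it */
    uint32_t paddr = kernel_translate(vaddr);   /* miss: exception   */
    static int next;                   /* simplest eviction: round robin */
    tlb[next] = (struct tlb_entry){ vaddr, paddr, 1 };
    next = (next + 1) % TLB_SIZE;
    return paddr;
}
```

Translating the same address twice runs the "kernel" only once; after tlb_flush(), the next access faults again — exactly the context-switch cost mentioned above.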
If I'm executing code — let's say I'm running a bunch of instructions, fetching instruction after instruction from the same part of memory — all those loads are sequential from my code section. So it turns out there's quite a bit of locality that allows this to work. In general, if I tell the MMU what to do, I don't have to tell it again for a while. It can translate a bunch of other addresses. Okay. Now, this can also happen. So this is the case where the MMU didn't know what to do. This just makes sense. Address translation, okay, yeah. Good question. So when would I need to remove an address from the cache? I've got this MMU. You've made the correct assumption that the MMU cannot cache an infinite number of translations. When do I need to remove translations? Yeah. So if the process exits, or if I do a context switch, I have to clear the cache. What other condition might I hit where I need to clear the cache? Yeah? It gets full. It gets full. And at that point I need to figure out what's the right translation to remove and get rid of it. So those are two reasons that I would remove a translation. The other case is that the kernel has made a change to how that memory is mapped and needs to remove it from the cache. So if I've made a decision that has caused the process to not have access to that memory anymore, then I need to make sure to get it out of the cache. We'll talk about this when we talk about swapping. Yeah. So now, what's the problem with this approach as I've described it so far? How effective is this cache going to be? This cache, according to my silly example, is translating one virtual address to one physical address. So how effective is this cache going to be? Let's say that I'm executing instructions sequentially. Every one of them is going to trigger a new translation lookup.
And the kernel's going to have to run. So this is bad. So in general, what we need to do is find ways to tell the MMU how to translate more than one address at a time. If all I do is tell the MMU, one at a time, how this virtual byte translates to some physical byte, this is going to be a mess — it's going to be really slow. And so what we're going to do now is talk about several ways to optimize the process by being able to instruct the MMU how to translate a bunch of addresses at the same time, so the kernel doesn't have to keep running and slowing everything down, okay? So here's the simplest approach; virtual address mapping works as follows. I assign each process a base physical address and a bound. The base address is the place in physical memory where this process starts — don't worry, I have a diagram in a minute — and the bound determines the size of the virtual address space. And there are two things that the algorithm needs to be able to do. The first thing it needs to be able to do is figure out: is the address valid? The second thing is, it has to be able to translate a virtual address to a physical address. So you can think of this interface as having two parts: one is check, the other is translate. Also keep in mind, this is implemented in hardware. And hardware is pretty limited in certain cases in terms of what it can do, so making this fast means making this as simple as possible. The nice thing about base and bounds is that it's really simple. To check, I just make sure that the virtual address is within the bound that I've assigned to the process. And to translate it, I take the virtual address, add it to the base address, and I'm done. So this is very easy. Here's my example. Let's go through 0x10000 again.
So now, for each approach, there are different pieces of information that the MMU needs to retrieve from the kernel to translate an address. In this case, the process tried to translate an address, and the MMU didn't know how. And the kernel said: here are the base and bounds for this process. Who can finish the job? So how do I translate 0x10000 into a physical address? What would the physical address be that 0x10000 translates to in this case? Yeah. Well, what is it? Exactly. So I take the base — no, no, no, wrong, sorry. How do I do this again? Base plus what? Virtual address plus base. So what's my physical address? Yeah. 0x50600. Let me pause and reveal to you something about me that's going to make you very happy, because I'm a relatively benevolent person. When we do these examples on exams, the arithmetic is always base 10. Thank you. Actually, I should say you're welcome, because no one wants to do hex arithmetic. At some point I'll fix the slides, but when we do this on exams, you guys can do base-10 arithmetic and use your fingers like everybody else. But you should probably at some point learn to add hex. It's not — well, I shouldn't say it's not that hard. I don't know how to do it. Anyway, these examples are all pretty simple. So I take the virtual address; here's my base. Now, here's the thing. You can notice this is the process's entire address space. Its virtual addresses start at zero, its physical addresses start at 0x40600, but my whole address space is right here. And this hints at one of the limitations of this approach. The nice thing is that once I've told the MMU these two pieces of information, I'm done. There's nothing else I need to tell it, and it can translate other addresses completely happily. Here's another one — there's a case where you'd have to do some hex math.
But I can translate all of the addresses in the address space until I'm done. Now, what does this trigger? How do I translate this address? What does this address translate to? Boom — yeah, this is out of bounds, right? My check is to determine: is the virtual address less than the bound? In this case, the size of my address space is only 0x30000. So this address is off the end. This would be out here somewhere, and I can't translate it. So this would cause the process to crash — or rather, this would cause the kernel to fail the translation and abort the process. So the nice thing about this — yeah, sorry. For base and bounds, it's a requirement: the only way this works is if the entire address space is located in memory, and located in contiguous physical memory. So despite the fact that this is simple, it stinks for our address space abstraction. The idea of the address space abstraction, remember, was that I was going to give processes this huge view of memory that was big and contiguous and let them spread out and put the code way over there and the stack way over there. And I can't do this with base and bounds. With base and bounds, I would have to have these tiny little address spaces, and now I create all these problems associated with doing that. So this is the major con of this approach. There also turns out to be a lot of external fragmentation when I try to lay these out, because of how they fit in physical memory. So yeah — remember again, if this is my base and bounds, I've got way too much stuff. And if I encourage processes to spread out, even if I could get this to work, I still have all this wasted space. All those gaps between the parts of my address space have to be located in physical memory. I have to allocate physical memory to them. So yeah, this is a problem. So let's try something else. Let's get closer to a modern approach.
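Before moving on, the base-and-bounds check and translate steps just described can be sketched in C, using the numbers from the example above (base 0x40600, bound 0x30000). The function names and the sentinel error value are mine; the real MMU does this in hardware and raises an exception rather than returning an error code.

```c
/* A sketch of base-and-bounds: check that the virtual address is
 * within the bound, then add the base. */
#include <stdint.h>

#define INVALID ((uint32_t)-1)   /* stand-in for the MMU's exception */

/* check: is the virtual address inside the process's bound? */
int bb_check(uint32_t vaddr, uint32_t bound) {
    return vaddr < bound;
}

/* translate: physical address = virtual address + base */
uint32_t bb_translate(uint32_t vaddr, uint32_t base, uint32_t bound) {
    if (!bb_check(vaddr, bound))
        return INVALID;          /* off the end of the address space */
    return base + vaddr;
}
```

With the example's numbers, bb_translate(0x10000, 0x40600, 0x30000) gives 0x50600, and any virtual address at or past 0x30000 fails the check.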
You may have noticed, if you look at this diagram, that the traditional layout of process address spaces is a bad fit for a single base and bounds, but it's not a terrible fit for more than one base and bounds. So I can extend this idea using something called segmentation. Segmentation is not the modern way that we actually do address translation, but it's intuitive enough that a lot of systems still include the idea of a segment, despite the fact that the granularity of address translation is actually smaller. We'll come back to what this means. The nice thing about segments is that, rather than having one huge segment — base and bounds is like one segment for the entire process — I can use individual segments to cover each contiguous part of my address space. And this means that I can eliminate all of that wasted space. And I can assign different permissions to them, and they can be different sizes. So going back to this example again: I have one segment here, I have a second here, and I can have a segment for each stack. So let's see how this works. There's one additional piece of information I need to get segmentation to work. With base and bounds, it was implied that the process's virtual address space started at zero. Here, each segment is going to start at a different virtual address, so that's an additional piece of information that the kernel has to communicate to the MMU to get this to work. So each segment has a start virtual address, a base physical address — that's where the start maps to in physical memory — and then a size, or a bound. There's also an additional step here that the MMU has to perform, which is that when I try to translate a virtual address, it has to ensure that that virtual address is inside a valid segment.
If the virtual address is not inside a valid segment that the MMU knows about, it'll produce an exception, and the kernel has to either tell it that there's a valid segment that contains this virtual address, or kill the process if the address is invalid. Once I determine the segment that this virtual address falls into, there's one additional little piece of math I have to do, which is that I have to subtract off the start of the segment before I add the offset to the base physical address. I can't believe I said that right. I think I did, actually. Let's use the diagram. Okay, so here we go. 0x10000, our favorite address. The kernel gets involved. It says: yes, MMU, there is a valid segment for that process that contains this address. It starts at virtual address 0x10000, its location in memory is 0x43000, and its size is 0x1000. So what physical address does 0x10000 translate to in this example? Yeah? 0x43000? 0x43000. Remember, the start translates directly to the base location in physical memory. For any other address inside the segment, I have to compute an offset and do the math, but the start is very easy. The start always maps directly to the base. What about 0x10010? Where would that map? 0x43010. Compute the offset with respect to the start of the segment, add it to the physical base. So intuitively, there's a chunk of physical memory out there that corresponds to this segment, and what I'm doing is mapping the virtual addresses corresponding to this segment into that chunk of physical memory. So now, in this case, what happened? 0x400 — is it inside a segment that the MMU knows about? Yes or no? No, it's not contained. This segment starts at 0x10000 and runs up 0x1000, so it ends at 0x11000. This is not inside that. And so the MMU has to ask the kernel: is there a valid segment containing this address?
Now, normally the answer here would be no, because this is close enough to being a null pointer that I would probably make sure that there are no valid translations in this part of the address space. But here, because I didn't want to use big numbers, let's pretend that it's true. There is a segment down here: it starts at 0x100, it has a 0x500 bound, and it maps to 0x16000 in physical memory. What is the translation of this address? 0x16300, exactly. So I took 0x400, I subtracted off the start, which was 0x100, I had an offset of 0x300, and I added that to the base. Any questions about this? This is pretty much just math. But simple math — like third-grade math, right? There's no linear algebra required to do this. Just addition and subtraction, in hex, sorry. What about this address? What do you think's going to happen? I should do this next year and have it be valid, because it could be a valid address. 0xdeadbeef is a silly address. It turns out to always be invalid on the MIPS that you're using, but it could be valid. So if I can't find a segment, I kill off the process, right? And now you understand this error message. This is exactly where this error message comes from. You've seen this error message: segmentation fault, core dumped. It means that you tried to access invalid memory, but the reason it's called a segmentation fault goes back to the idea of using segments to manage virtual memory addresses — which, again, is still something that operating systems do as a logical concept, to understand how to apply permissions to things and stuff like that, despite the fact that the mapping mechanism is done a little bit differently. So now you know what this means, and you can impress all of your friends who are still writing C code for some reason. Tell them to do something different. Okay, so is this an ideal solution? There are some nice things about it. One nice thing is that it's still pretty simple.
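The segment lookup just walked through can be sketched as follows, using the two segments from the example (start 0x10000/base 0x43000/size 0x1000, and start 0x100/base 0x16000/size 0x500). The struct and function names are mine; on a miss, the real MMU raises the exception behind "segmentation fault" rather than returning a sentinel.

```c
/* A sketch of segmentation: find the segment containing the virtual
 * address, subtract the segment's start, add its physical base. */
#include <stdint.h>

#define SEG_FAULT ((uint32_t)-1)   /* stand-in for the exception */

struct segment { uint32_t start, base, size; };

/* the two segments from the example above */
static const struct segment segs[] = {
    { 0x10000, 0x43000, 0x1000 },
    { 0x00100, 0x16000, 0x0500 },
};

uint32_t seg_translate(uint32_t vaddr) {
    for (unsigned i = 0; i < sizeof(segs) / sizeof(segs[0]); i++) {
        const struct segment *s = &segs[i];
        if (vaddr >= s->start && vaddr < s->start + s->size)
            /* offset within the segment, added to its physical base */
            return s->base + (vaddr - s->start);
    }
    return SEG_FAULT;   /* no segment contains it: segmentation fault */
}
```

This reproduces the lecture's numbers: the start 0x10000 maps straight to the base 0x43000, 0x400 lands at 0x16300, and 0xdeadbeef hits no segment at all.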
It's a little more complicated than base and bounds. Obviously, there's a little more information, a little more work I have to do. It's really nice that it allows me to organize and protect contiguous regions of memory, because in a lot of cases that's how address spaces are laid out. And this is the reason this concept has survived to the modern day. It's a better fit for address spaces, because there's not all this internal waste. What are some problems with this, though, that might lead me to want to do something a little bit different? Yeah, so if you think about having to be the kernel and allocate physical memory, which is what you're going to have to do, this is still a little bit of a pain, because every segment is a different size, and so it creates this sort of external fragmentation problem, where I might need to allocate a segment that doesn't fit in any of the areas that I have free in physical memory. So that's too bad. What else, though? Yeah. Yeah, so not only does this create an external fragmentation problem, it also creates an internal fragmentation problem. Imagine you're talking about Microsoft Word, which is always my favorite thing to beat up on. How much source code do you think Microsoft Word has? Like, a small amount? A medium amount? An enormous amount? Yeah, we're getting warmer, right? All of those features that you don't know how to use, all those ways to do the same thing that are super confusing and poorly documented. I'm going to get in trouble one day. I'm never going to get money from Microsoft for anything, clearly. Anyway, who thinks that they're a Microsoft Word power user? Who could teach me how to use Microsoft Word? Really? It's that bewildering? There's one person up here. Okay, so that's awesome. I bet that you haven't used half of Microsoft Word's features. Is that fair? You think there are menus I could open that you've never seen before?
Yeah, okay, so all of that code has to be in memory, despite the fact that you're not using most of it. So the fact that the entire segment either has to be in memory or not in memory is a problem, because there's a lot of potential for internal fragmentation, or internal waste, inside those allocations. And then I have this external fragmentation problem as well. So this is not quite good enough. Have you guys seen the squirrel fishing videos? Has anyone seen those? Google "squirrel fishing," or "squirrel fishing Harvard," specifically. There's a fantastic website set up to document a really important research project that took place right outside where I used to work and involved trying to determine how aggressive squirrels would be if you dangled a peanut on a string. And it turns out some of them are real go-getters. There's a liftoff picture where one of them is holding onto this peanut for dear life as it's being tugged into the air. It's awesome. I mean, that is the squirrel that you want to hire. All right, so let's regroup here and think about our design requirements for this problem. We're getting closer, right? Segmentation is useful; base and bounds is way too simple. So what would we like, ideally? Ideally, it would be awesome if we could map any virtual byte to any physical byte. That's kind of where we started. The problem with doing that is that the number of translations the MMU would have to know about gets really, really huge. Think about base and bounds versus segmentation. If I need to map any virtual byte to any physical byte, the MMU's cache is going to be able to map only a very small amount of physical memory. With base and bounds, I could map the entire address space with one piece of information in the MMU.
With segmentation, I can map the entire address space with only a few pieces of information in the MMU, and I did a better job because I didn't have as much waste. So we're going to keep going in that direction. We're not going to be able to map any virtual byte to any physical byte, and the operating system wouldn't be able to manage that anyway. The question is: is there a middle ground, and is there a way to use a smart piece of hardware? Okay, so let me introduce something called the translation lookaside buffer, or, the only thing you will ever call it, the TLB. How many people think that they know how this works? Really? Okay, I was expecting a couple of you. And this is one of our classic operating system design principles. On some level, your computer, I think I've said this before, is really just a series of caches. You have small, fast things that cache slower, big things. And you can think of that as starting with the registers on your chip and ending with Google's data centers. I mean, that is the modern cache hierarchy. It's incredible how large it is and how many different sizes there are. So this is what we're going to do: something's too slow, so we're going to throw a cache at it. The thing that's too slow here is address translation, and the cache we're going to throw at it is the translation lookaside buffer. These take advantage of a very clever hardware capability that allows me to do a parallel lookup of a bunch of things at once, something called content-addressable memory. Let me see if I can explain this. Most memory is addressed by location. Content-addressable memory, you actually address by its contents. So if I'm looking for something, this is very helpful, because instead of trying to find it by looking at all the places it could be, I tell the hardware what I'm looking for, and the hardware can find it extremely efficiently.
All right, so here's, to some degree, how these things work. Let's say I'm trying to translate a particular virtual address, and I'm using the TLB to do this. One way would be to search every mapping individually. How does this scale? If my TLB has n entries, how long will it take to do a linear search through the n entries to find the mapping that I'm looking for? Order n, that's not too hard. I haven't taken 331, but I still know how to do that type of big-O stuff. So the nice thing about a CAM is that I can do all these searches in parallel, in order one. Now you might be asking why this awesome property is not exposed to you in more ways. The problem is that the amount of circuitry you need to implement this grows roughly proportionally to the number of slots squared. That is not good. So CAMs tend to be small. You might have a CAM that allows you to search 256 entries, or 512, but at some point the amount of circuit area and transistors that you have to devote to this thing doesn't scale well. So you can't make a four-gigabyte CAM. It would be awesome if you could; computing would be very different if that were the case. Very different. On the other hand, I can use a small CAM to help the MMU cache certain address translations and search for them efficiently. And it turns out that on modern systems there are all these different permutations of CAMs that have certain properties, and they have nicknames and stuff like that. Who cares? That's stuff for the hardware people to worry about. Just remember that you can do this efficient search through a small number of things in O(1) time. But again, I can't make this CAM unbelievably large. Now, if I could make this CAM unbelievably large, I could actually do what we wanted to do, which was to translate individual virtual bytes into physical bytes.
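To make the scaling point concrete, here's a small software analogy. The loop is the O(n) linear search through the entries; a dict lookup plays the role of the CAM's search-by-contents, returning a match in O(1) without examining each slot in turn. The entries are invented for illustration, and a dict is only a software stand-in: a real CAM compares the key against every slot simultaneously in hardware.

```python
# Two ways to search a set of cached translations (virtual -> physical).
# The entries here are made up for illustration.

entries = [(0x10000, 0x43000), (0x100, 0x16000), (0x2000, 0x7000)]

def linear_search(vaddr):
    for v, p in entries:   # O(n): examine every slot in turn
        if v == vaddr:
            return p
    return None            # a miss: the MMU would have to ask the kernel

cam = dict(entries)        # O(1) lookup, addressed by contents, like a CAM

print(hex(linear_search(0x100)))   # 0x16000, after scanning the slots
print(hex(cam[0x100]))             # 0x16000, in a single keyed lookup
print(linear_search(0x9999))       # None: no cached translation
```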
Because eventually I could teach the MMU how to translate every virtual byte in the process's address space to an individual physical byte, and that would be pretty cool. But because the CAM is a limited size, I can't do this. So let's think about trying to develop a middle ground. Segments are too large. Individual bytes are too small: the TLB wouldn't be able to cache very many entries. So is there a middle ground here? The correct answer is, of course there's a middle ground, because there's a middle between these two things. And the middle ground is something called pages. So I have ended early for the first time this semester. That's interesting. On Wednesday, we're going to talk about page translation. Good luck with assignment 2. I'll see you guys then.