That was not a bad song. Thanks to whoever recommended that. I'd say the person who recommends the song that I like the most gets 1.1 extra credit points. So far he's in the lead.

OK, so what are we going to do today? Today we're going to pick up where we left off last time. Last time, we talked about multiple ways to translate. We've created this problem for ourselves, which is that we have to be able to efficiently translate virtual addresses that are going to be used by applications into physical addresses that actually address memory. That gives us a bunch of capabilities, but there are a couple of problems that come along with it, and we're talking about one of them, which is how we efficiently perform that translation. So today we're going to introduce you to the modern way of doing this, which is to use something called pages. And then on Friday we'll talk more about pages and some of the things we can do with where pages go and things like this.

OK, so assignment 2 is due Friday. So good luck with assignment 2. I can see people have been cutting their hair to prepare for assignment 2 to be done. That's one of the things I consider to be one of my special abilities. For some reason, I'm very good at detecting when people have cut their hair. I don't know why. Men only. When men get a haircut, it's like they get a little bit taken off here and there, and it's very hard to tell. I don't know why, but I'm good at that. Anyway, maybe I'll find a way to monetize that ability someday. Maybe I'll start a game show or something.

OK, so the midterm: when we get back, you can think of that as being midterm week. So you come back from spring break. On Monday, we will do review during class. On Wednesday, we will have the midterm. And on Friday, we will go over the midterm solution. If the calendar doesn't say so, then I'll fix the calendar after class. But that's how it works. So you sort of pause, you guys get the week off from working on the assignments, and focus on the midterm. And then on Friday, well, assignment 3 is already up, but by Friday after spring break Scott will have finished all the work on the assignment 3 targets and additions to the description and things like that. But you can already start on assignment 3. It's already up there. The write-up is very short, actually. And then, yeah, we're trying to bring in as many people for office hours on Friday as possible. So we'll try to get as many of the ninjas and the TAs and everyone else who's involved, who's available, to help you guys out on Friday. And after class on Friday, we'll sort of try to fix any of the last-minute bugs.

So we're working on trying to incentivize you guys, particularly on assignment 3. We're working on a new leaderboard feature for the website. This is something we'll probably release in the next few days. So we'll put up instructions. But essentially, this gives you a way to kind of brag about your performance on the assignments to people, people who might want to hire you, people who might care. We're going to put up for assignment 3 some performance targets where we run some performance tests on various parts of your submissions and have those numbers up there. Maybe we'll have some extra credit if you can beat the assignment 3 solution, which is not that hard, I don't think. And this is all opt-in. So by default, your submissions are anonymous.
If you want to share your information with people who might stumble by and look at this, I will certainly send a link out to some people I know who might be interested in people who can do well in this class. So anyway, that's coming. We'll tell you more about that.

Oh, and let me make one more point. I've had a couple of questions from people asking various permutations of: why did test161 not work the same way that it worked when I ran sys161? So test161 doesn't do anything mysterious. You can think of it as running sys161 internally and typing characters into it. That's pretty much all it's doing. So it doesn't run your kernel in any special way. The differences are usually created by the fact that it will configure the sys161 simulator differently. So if your kernel works fine when you use it, but when test161 tries to run it, it doesn't boot, look at the configuration that test161 is using. I think Scott's adding a feature so you can print this out as part of inspecting the output of various tests; then you can configure the simulator the same way if you want to reproduce the same results.

OK, yeah, so I hope you guys are figuring this out. When I taught this class at Harvard, the course staff said that every year, one of two things happened: either people crashed and burned on assignment 2, or on assignment 3. I can't quite figure out what's going on this year. We'll see. But yeah, these are hard assignments, and this is just assignment 2. So when we get back, you guys will have like a month, like six weeks, to work on assignment 3. And assignment 3 is worth twice as many points as assignment 2.

OK, so let's go back to address translation. So remember, we had created this level of indirection that allows us to really play lots of games with memory, with what processes think is memory. It allows us to support this address space abstraction, which we really like some of the features of. But the problem is that it's potentially inefficient if the kernel has to be involved a lot. And so the goal here was to make sure that the kernel is not involved a lot. Most address translation should be able to proceed without the kernel's participation or interference, because if the kernel gets involved, it's too slow. Yeah, I just said that. And this is another case where hardware provides a mechanism that is fast. Well, maybe not another case: in context switching, the hardware mechanisms are pretty limited. Here, we have a nice interplay, because hardware is really the part of the system that's in charge of making sure this happens quickly, because that's something that hardware is very good at. But hardware has to provide enough of an interface to allow the kernel to set the policy dictating how translations happen and whether translations are valid. So the kernel is in charge. The kernel maintains control at all times of all memory that each process has access to, and the way it does that is through manipulating the translations that are available to that process. But the hardware is in charge of making sure that this is fast, particularly making sure that the kernel doesn't have to get involved every time.

All right, I'm just going to skip through this. So we talked about base and bounds, which was very simple. Every process had one base and one bound. It was very simple to check if a virtual address was valid, and it was very fast to translate it.
But there were all these problems with that approach: specifically, it leads to a lot of wasted memory, through both internal and external fragmentation. So we didn't like this. We extended the idea slightly into what we called segments. So rather than needing to cover the whole address space with one chunk of contiguous physical memory, segments allow me to cover each contiguous part of the address space with a chunk of physical memory. Now I have multiple bases and bounds per process, and translation and checking that the address is valid are a little bit more complicated. The MMU has to store a little bit more state. But there were advantages to this: it worked a lot better than base and bounds. There was a lot less wasted memory. But this still requires that every segment be contiguous in memory, and it's possible that segments internally have a lot of unused parts. So this still leads to a lot of internal fragmentation. And because the segments are different sizes, we still have this external fragmentation problem. So, some of the same problems as base and bounds in terms of how physical memory is allocated; the only difference is that the segments are smaller. So that's good. It takes less memory within the segments to cover the address space, because most of a process's address space consists of virtual memory that's not valid anyway.

Any questions on this before we go on? Address translation is a topic which is popular on midterm exams, because it's fairly easy to design questions about, because the questions involve math. So this is something that will probably come up again. Yeah? I just want to point out, he asked for that, right? So maybe what we'll do is, if you want, then OK, fine. That's OK. I'll do two examples on the exam. One will be in hex, one will be in base 10. Would anyone like to request other bases? Octal? We could do one in whatever base 7 is called. That's weird, right? Whatever, you can do anything. We could do binary, right? It's going to require a lot of paper. Would you like them in hex as well? Yeah, we can do that. It's not a problem. We will have two parallel problems. One will be in base 10. The other will be in hex. I've never had anyone ask that before. Yeah? Yes. That's the point. You will get a good question. We'll figure that out, right? I'll add the scores in hex, right? And we'll see where you go. Yeah, I don't know.

OK, so any more questions on simple address mapping schemes? Now, neither one of these really did what we wanted, right? What we wanted was a way to map from any virtual byte to any physical byte. Now, maybe this was overkill. Maybe this is a dream that we were never going to achieve. And so we talked a little bit about this piece of hardware called the TLB, or Translation Look-Aside Buffer, that was going to help us. How this helps us is that it allows us to do constant-time searches over a large number, not an infinite number, of potential translations, looking for one that matches. So if you go back to the segment idea, you might be worried that as I add segments to the MMU, it's going to take longer and longer to find the segment. And in that case, that would probably actually be true, because there's a little bit of math I have to do to figure out whether or not the virtual address is inside a particular segment. But with the CAM, or content-addressable memory, that I'm going to use to build the TLB, I can do this very efficiently.
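To make the segment scheme concrete, here's a minimal sketch of the check-and-translate step that a segment-based MMU is logically performing. The struct and function names are my own illustration, not any real hardware or OS/161 interface; note the linear search over segments, which is exactly the cost that grows as you add segments.

```c
#include <stdint.h>
#include <stddef.h>

/* One segment: a contiguous chunk of the virtual address space
 * mapped onto a contiguous chunk of physical memory. */
struct segment {
    uint32_t vbase;  /* first virtual address the segment covers */
    uint32_t pbase;  /* physical address it maps to (the base) */
    uint32_t bound;  /* size of the segment in bytes (the bound) */
};

/* Check validity and translate in one step. Returns 0 on failure,
 * where a real kernel would kill the process. The more segments
 * there are, the longer this linear search takes. */
uint32_t segment_translate(const struct segment *segs, size_t nsegs,
                           uint32_t vaddr)
{
    for (size_t i = 0; i < nsegs; i++) {
        if (vaddr >= segs[i].vbase &&
            vaddr - segs[i].vbase < segs[i].bound) {
            return segs[i].pbase + (vaddr - segs[i].vbase);
        }
    }
    return 0;  /* no segment covers this address: invalid */
}
```

The CAM removes that linear-search cost, as described next.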
So you don't have to get into the hardware details, but you can think of these as something that essentially allows me to search a bunch of different entries in constant time. They're kind of searched simultaneously, and the CAM returns whatever entry contains the key that I'm looking for, in this case 0x800. The problem is that these aren't very big. So what would happen if, let's say, I used one of these to map virtual bytes to physical bytes? What problem does that create? Let's say the way that I translate addresses is that I look up the virtual address in my TLB, and the TLB returns the physical address, the physical byte of memory that that virtual byte maps to. Why would I not do this? This sounds like it allows me to do exactly what I wanted. I wanted a fast way to map any virtual byte to any physical byte. So why not do that? What is the problem here? Want to take a guess? Yeah. Well, I can't make a CAM infinitely large. So let's say my CAM has 1,024 entries. How many bytes of virtual memory can I map? If each entry maps one byte of virtual memory and there are 1,024 entries, how many bytes can I map? 1,024, right? You guys seem tired. What's that? Ran out of fingers. Ran out of fingers, yeah, exactly. So I mean, if there's a one-to-one mapping, and that's a pretty huge CAM, actually. Think of CAMs as having like 32 or 64 entries. I think the one on the MIPS has 64 entries, if I remember correctly. So I can map 64 bytes of virtual memory. How much is that? Not very much.

So let's say I have a program that's running along sequentially. It does not loop. It does not branch. How many times will the kernel have to be involved to load new address translations? Another way to think about it is: once I put an address in the TLB, I want to get some mileage out of it. I want it to get reused. So if each entry maps one byte, and my program is running along sequentially, how many times will that entry be reused? How many times will it be used? Maybe that's easier. Once. So I go to the next address. That address isn't in the MMU. There's a trap to the kernel. The kernel loads it. The address is translated. And I repeat this over and over again. So what percentage of virtual addresses generate an exception that the kernel has to handle? These are not hard math questions. 100%, all of them. So this is not what I wanted. What I wanted was the kernel to be involved as few times as possible.

So at this point, I've got segments that are too large. Mapping individual bytes doesn't work either, because I can't map enough of them. And I've also got these fragmentation problems that are created by segments, because they're different sizes (external fragmentation) and too large (internal fragmentation). So what do we do? Well, we try to make a compromise here. And the modern compromise is this: I choose a translation granularity that's small enough to limit internal fragmentation, yet big enough to make good use of my cache. And we'll come back to this. We'll talk about this a little bit. But that's the trade-off. The smaller these units are, the less internal fragmentation there is, because I'm assuming that things that are close to each other in memory are related, which is normally true. Structure and array accesses are clustered. As they get bigger, though, they allow me to map more and more memory using a fixed-size cache.
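Backing up for a second, here's a software model of what the CAM is doing with those byte-granularity entries, a sketch with made-up names. In hardware, all entries are compared against the key at the same time; the loop below is just the sequential stand-in for that parallel, constant-time search.

```c
#include <stdint.h>
#include <stdbool.h>

#define TLB_ENTRIES 64  /* the MIPS TLB has 64 entries */

struct tlb_entry {
    bool     valid;
    uint32_t key;    /* what we search on: here, a virtual byte address */
    uint32_t value;  /* the translation: here, a physical byte address */
};

static struct tlb_entry tlb[TLB_ENTRIES];

/* Hardware compares every entry against the key simultaneously;
 * this loop models that search in software. */
bool tlb_lookup(uint32_t key, uint32_t *value)
{
    for (int i = 0; i < TLB_ENTRIES; i++) {
        if (tlb[i].valid && tlb[i].key == key) {
            *value = tlb[i].value;
            return true;  /* hit: no kernel involvement needed */
        }
    }
    return false;  /* miss: trap to the kernel for help */
}
```

The lookup is fast either way; what the granularity decision changes is how much memory each of those 64 entries covers.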
OK, so, and depending on the size of this unit of granularity, this also determines the size of the data structures that the kernel has to maintain. So this is another trade-off. As these get bigger, it takes fewer of them to map the same amount of virtual memory, and any kind of data structure that's related to memory management gets smaller.

I mentioned this a little bit last time, but I want to come back to it, because this is pretty critical to understand. A lot of this is driven by this notion of what's called execution locality. You can think of the virtual memory that a process has mapped at some given point in time. It's a combination of the stuff that has its code in it that came in through exec, anything in the heap that was created by malloc via sbrk, the stacks of the threads that are running along, whatever. Imagine that you're watching all the accesses to that memory in time. You could even do like a heat map, showing the number of accesses that are occurring in various parts of memory. And the important thing here is that over small enough windows of time, the memory accesses are highly clustered. That's what's called execution locality. The reasons for this are pretty obvious. So let's say I have a loop of code that's running, that loops 10,000 times. That loop accesses the same series of instructions over and over and over again. A lot of times that loop is manipulating some kind of data structure, maybe some sort of array. That array is also located in contiguous memory. So now it may be that I have a couple parts of memory that are hot at one given period of time. Part of it could be where my code is. Part of it could be some object in the heap that I'm accessing. What might the other part be? In general, I might have three regions that could be hot, three regions where I would see a lot of different accesses. I gave you two of them; what would be the third one? You know, think about a piece of code. Think about a loop, right? It's executing a series of instructions. So some instructions came from its code somewhere, so that part is warm. I just told you it's accessing some sort of dynamically allocated data structure that's on the heap, so it's producing some heat there. What's the third part? Where's the third part of memory that that loop is going to access, probably, in most cases? Stack: my loop iteration variable is probably a locally allocated variable that's part of the function context. That's on my stack. So I might have these three parts of memory that are hot. The rest of my address space is completely unused. It's not touched. There's a whole bunch of the code I'm not executing. There's a bunch of different data structures that I'm not touching right now. And there are other parts higher up on my stack, the function context that was saved by the caller and the caller of whatever function I'm running; that stuff's also quite cold. Over time, these areas move around, but over a small enough time window, there are just small pieces of the address space that are in use. Does this make sense? This is kind of critical to understanding why the rest of this works.

Now, it would be fun, you could try this at home: try to write a program whose memory accesses are entirely randomly distributed. So in that case, if I have a certain amount of virtual memory allocated, every time, it's like I'm picking at random from anywhere in my virtual memory and touching that byte.
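For contrast with that random-access program, here's a tiny example of the three hot regions in ordinary code. While this runs, the loop body's instructions (code), the array (heap), and the loop state (stack) are hot, and everything else in the address space is cold.

```c
#include <stdlib.h>

long sum_array(size_t n)
{
    /* Heap: malloc'd array, one hot region, touched sequentially. */
    int *a = malloc(n * sizeof(int));
    /* Stack: loop counter and accumulator, touched every iteration. */
    long total = 0;
    if (a == NULL) {
        return -1;
    }
    /* Code: the same few loop instructions, fetched over and over. */
    for (size_t i = 0; i < n; i++) {
        a[i] = (int)i;
        total += a[i];
    }
    free(a);
    return total;
}
```

Now, about that randomly-distributed program: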
I think that would actually be a very hard program to write, but that's not how programs work. If your program ran that way, it would be terrible. It would really, really frustrate the cache and the other things that we're going to do. But it turns out programs don't run that way, so that's good.

Okay, so the modern unit of translation granularity is something called a page of memory. This is probably a term you guys have heard before. Page sizes: for a long time, 4K was the really canonical page size. Now you see architectures that are looking at larger page sizes. Why would we start to think about bigger pages today? What's happened over the last 10 or 20 years that might lead you to think that 4K starts to feel a little bit small? Why am I not thinking about, like, a four-megabyte page instead of a four-kilobyte page? What's different about today's computers? What's that? They have more memory, in particular. The disk size is sort of irrelevant here, right? But they have a lot more memory. So a 4K page on a system with two megabytes of memory seems reasonable, but you can calculate how many pages it would take on a system with eight gigabytes of memory; 4K pages start to seem pretty small.

So there's something else that's changed too, though. It's not just the amount of memory. On certain systems, you can configure different parts of your address space to use different page sizes. What's an example of something that a process might be using where it might make more sense to have a larger page size, like a four-megabyte page size? Databases, yeah. So databases are a great example. Databases, I do not want the operating system playing games with these little pieces of memory. If I have a big database, I want that database all in memory all at the same time, and I want it to stay there. What's another example? This is something that I suspect you guys do on a regular basis with your computer. It's a pleasurable activity. Yeah. Oh, okay, games would be, games are interesting, okay? That's not what I was thinking of. Gaming for me is not a pleasurable activity. I'm not very good at computer games. Yeah. Videos, right? Videos are huge, like several gigabytes. And videos show a great degree of locality, but they're so big that if I was loading things in 4K at a time, that would cause more overhead than I want. So once I touch a certain four megabytes of the video, I'm probably gonna watch all of it. Four megabytes of video is maybe, I don't know, 10 seconds or something like that. Don't quote me on that. It's not very much. So in general, you're playing videos from start to end, unless you're strange. So once I see memory accesses to a certain part of the video, I might as well load a big chunk.

Okay. So now, the way you can think about this, when you think about TLBs, is: how much virtual memory will the TLB be able to cache translations for? So if I have 4K pages, then a 128-entry TLB, which could be pretty standard, maybe they're a little bit bigger now, can cache half a megabyte of virtual memory. That means that as long as my process is using less than half a megabyte of virtual memory and all those translations have been loaded, I shouldn't see any new exceptions generated by this. And this is related to execution locality.
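Writing out that "reach" arithmetic, using the same assumed numbers (a 128-entry TLB and 4K pages; real TLB sizes vary):

```c
#include <stdio.h>

#define PAGE_SIZE   4096u  /* 4K pages */
#define TLB_ENTRIES 128u   /* assumed TLB size, for illustration */

int main(void)
{
    /* TLB "reach": how much virtual memory the TLB can map at once. */
    unsigned reach = TLB_ENTRIES * PAGE_SIZE;
    printf("reach = %u bytes = %u KB\n", reach, reach / 1024u);
    /* 128 * 4096 = 524288 bytes = 512 KB. Compare byte-granularity
     * mapping, where the same 128 entries would reach 128 bytes. */
    return 0;
}
```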
So if the data structures and the code segment and the stack that I'm using all fit within the amount of virtual memory I can map in the TLB's small cache, then I can do a lot of work in my program before the operating system has to load new translations. Eventually I break out of that loop and my program goes off to something else, and the operating system needs to load some new translations so it can use some other part of its virtual address space. But to the degree that the clustered accesses over short time windows fall into this amount of memory, I'm good.

Another way, an easy way, to think about pages is just as fixed-size segments. We don't talk about a bound with pages because there isn't one. The bound is always the same: 4K, or whatever the page size is. So for page translation, because these are powers of two and because they're fixed size, there are some games we can play here. We can optimize a little bit. So what we usually do is this. Any memory address is in some page. So if I count pages from left to right across the address space, I can break any memory address into two parts. The first part is the page number, which we call the virtual page number. That identifies the page that the address is on. And then there's an offset. The offset is between zero and what, for a 4K page? Once I've identified the page my memory is on, the offset is between what and what? What's that? Zero and, okay, so someone is forgetting their hex, right? Should I accept that answer or not? It's like Jeopardy. Zero and 3,999? No. How about zero and 4,095, right? So it's zero to 4K, because what I do is I find the page, and every page is 4K, and so the offset within the page has to be between zero and 4K.

When virtual pages are resident in memory, the virtual pages map to physical pages. Now, there are other places that virtual pages can go; we'll talk about that on Friday. But when a virtual page is in memory, when the virtual memory is actual memory, what we do is just map the virtual page onto a physical page. So you split up physical memory into page-size chunks as well. And because each virtual page maps to a physical page, by construction all the addresses within that virtual page map onto the same physical page. So here's how we do this, and again, I have a diagram in a second. To check whether or not a virtual address is valid, we need to make sure that a valid virtual-page-to-physical-page translation exists for this page. And then translation is very easy. To translate an address, I compute the offset within the virtual page, I map the virtual page to a physical page, and then I reapply the offset. Because the pages are 4K, or four megabytes, we can do this using fancy hex number tricks, which I'll show you in a sec.

Oh, okay, so I just wanna show you guys this. How many people have seen this movie? Yeah, you guys remember this scene? So I was actually a TA for this class. And when I saw this movie, did you guys see that slide that's up there in the middle that has like 60,000 arrows going into that box? I remember thinking they clearly made that up. There's no way that my advisor, Matt Welsh, who the professor in this is supposed to be, actually used a slide like that. And then when I was preparing this course, I found that slide, that exact slide. Like, wow, that was real.
Yeah, I don't remember Mark coming to class that often. I do remember his partner in the class, who came back years later and was a TA for the class, fell asleep in class once, and we squirted air in his ear using one of those compressed air things. That's a really mean thing to do. I don't think he appreciated it very much. He did wake up though; that was kind of the goal.

All right, so let's keep talking about virtual memory. So we're using 4K pages, and here's how we do the translation. The nice thing is that the TLB here is not mapping virtual bytes to physical bytes. It's now mapping virtual pages to physical pages. So any memory address can be divided into a virtual page number and an offset. The fun thing about 4K pages is you can actually do this visually. So can anybody tell me what the virtual page number is for this virtual address? The page size is 4K. Can anyone see how to do this? Anyone want to guess? So the offset is the bottom three hex digits. The page number is the top. And this is only because the pages are 4K; if they were something else, you'd have to do something more messy. But with 4K pages, the bottom three hex digits are the offset, and the top, however many you have, are the page number. So this is virtual page number 800. What physical page number does this map to in my TLB? Yeah, 306, right here. So I search the TLB and identify the entry. What physical address is this memory access going to hit? So I've looked up my virtual page number; how do I reassemble the physical address? It's pretty much the reverse of how I took the virtual address apart. I replace the virtual page number with the physical page number, and the offset stays the same. So that's my physical address.

All right, let's do this again. So, a virtual address: the bottom three hex digits are my offset. So I split it into the offset and the virtual page number, which is 800. I look up the virtual page number in the TLB and map it to a physical page number, and I reassemble the physical address by combining the physical page number with the offset. And because these are 4K pages — don't try this at home with a 1K page, it won't work — I can just smack these guys back together, and I'm good.

Questions about this? Yeah. Yeah, I think that's just me being sloppy. Sorry, the virtual page number here was 800, and 346 is the offset. Good question. Any other questions about this? Yeah. So, hex. I wish I knew this offhand. There's some limit on a 32-bit system. Is it eight? Yeah, I think it's eight. So I think 32-bit addresses will go up to 0xFFFFFFFF. That's like, buffalo, buffalo, buffalo, right? So if I had an eight-digit hex address, how would I compute the offset? It's the same. It's the rightmost three hex digits. Everything else is to the left. So if I had eight digits here, I'd have a five-digit virtual page number and a three-digit offset. Does that make sense? Yeah.

Let's do another one. So I want to cut to the chase here. Who can map this to a physical address? Who can play TLB? Yeah? Yeah, split off the virtual page number, look up the virtual page number in the TLB, replace the virtual page number with the physical page number, and snap the address back together. Any questions about this before we go on? So what you can notice is that rather than using the TLB to map bytes, which is what we were doing before, I'm using the TLB to map pages.
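The "fancy hex trick" is just masking and shifting, because 4K is 2^12 and the offset occupies exactly the bottom three hex digits. Here's a minimal sketch; `lookup_ppn` is a placeholder I'm assuming for whatever performs the TLB or page table search.

```c
#include <stdint.h>

#define PAGE_SHIFT  12u                       /* 4K = 2^12 */
#define OFFSET_MASK ((1u << PAGE_SHIFT) - 1)  /* 0xFFF: bottom three hex digits */

/* Assumed to exist: maps a virtual page number to a physical page
 * number via the TLB or the page table. */
extern uint32_t lookup_ppn(uint32_t vpn);

uint32_t translate(uint32_t vaddr)
{
    uint32_t vpn    = vaddr >> PAGE_SHIFT;   /* drop the offset digits */
    uint32_t offset = vaddr & OFFSET_MASK;   /* keep only the offset */
    uint32_t ppn    = lookup_ppn(vpn);
    return (ppn << PAGE_SHIFT) | offset;     /* smack them back together */
}
```

So for the example above, 0x800346 splits into virtual page number 0x800 and offset 0x346, and if the TLB maps page 0x800 to page 0x306, the physical address is 0x306346.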
So my address translation granularity has gone from one byte to 4K. And that's what allows a small cache, which might be only 128 entries, or, like this one here, four entries, to map, in this case, how much virtual memory would this TLB be able to map with 4K pages? 16K. Each entry maps 4K of virtual memory. So if a process started to run, and pretty soon all of its virtual addresses were clustered within a 16K region, the operating system would not have to get involved at all. The MMU would have all the information it needed in the TLB to map all of those addresses without the operating system being involved. Yeah, and these are also translations that have already been loaded by the operating system.

So who controls the TLB? Where do entries in the TLB come from? Yeah. Yeah, the operating system has to load them. This is how I provide processes with access to physical memory. If the operating system doesn't have control over the TLB, then anything can happen. If I allow a process to load TLB entries, the process could map itself anywhere into physical memory, including memory it doesn't actually control. So that would be bad. So the operating system has sole control over loading entries into the TLB. How is that enforced in hardware? What would happen if a program, a user process, tried to execute the instructions that I would need to run to manipulate the TLB? Those are just instructions. They're part of the instruction set. What happens if a user program tries to use one of them? How do I make sure that this is something that only the kernel can do? Permission level. So what happens if a user program tries to execute one of these instructions? It generates an exception. So these are instructions that have to be executed in privileged mode. If I'm not in privileged mode, if I'm not the kernel, I can't execute these instructions. So if a user program tries to run one, it traps into the kernel. The kernel sees what happened, says "that's a terrible thing to try to do, no way," and kills off the user process.

Yeah, there we go. OK, sorry. So this is different. Here's the thing that can happen: a process is running along, and it tries to access a virtual address on a virtual page whose translation is not loaded into the TLB. In that case, this is the normal exception handling path. The MMU traps into the kernel and asks the kernel for help. The kernel needs to load a valid translation, or kill the process, in the case that the process has gone haywire and it's accessing memory that it shouldn't be able to access.

All right, so what's good about paging? Paging allows us to maintain a lot of the benefits of segmentation. I can layer segments on top of pages. In a lot of systems, memory is organized into what are called memory objects, which map a certain region of the address space consisting of multiple pages. Those memory objects have information like permissions and sharing information, which allows you to share regions of memory between multiple processes, and stuff like that. The unit of translation granularity is still the page, but the segments allow me to sort of organize pages into groups that have similar properties. So that's nice. And pages turn out to solve two of the problems that we had with segmentation, both of the fragmentation problems. So how do pages help solve the internal fragmentation problem that segments have?
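While you think about that: here's roughly what the kernel-only TLB load looks like in OS/161, the system you're working with. This is a sketch modeled on the dumbvm code, so check kern/arch/mips/include/tlb.h in your own tree for the real names rather than trusting mine.

```c
#include <types.h>
#include <spl.h>
#include <mips/tlb.h>
#include <vm.h>

/* Load one translation into the TLB. A vm_fault handler would call
 * something like this once it has decided the faulting address is
 * legitimate and found the physical page backing it. */
static void
tlb_load(vaddr_t faultaddress, paddr_t paddr)
{
    uint32_t ehi, elo;
    int spl;

    ehi = faultaddress & PAGE_FRAME;          /* keep just the virtual page */
    elo = paddr | TLBLO_DIRTY | TLBLO_VALID;  /* physical page plus flags */

    spl = splhigh();        /* TLB writes must not be interrupted */
    tlb_random(ehi, elo);   /* let the hardware pick a slot to evict */
    splx(spl);
}
```

The TLB write instructions underneath tlb_random are exactly the privileged operations discussed above: user code that tried them would trap. Anyway, back to fragmentation.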
I feel like I'm just going to watch these videos and write these questions down and put them on the exam. It's always hard to write midterms. So there is both internal and external fragmentation that I had with segments. How does this solve the internal? Or pick one. Well, there's pretty much no wasted space, or at least much less. So it's possible that I'm only using one byte in a page. It's possible that there's a data structure that I'm only using one byte of, and the rest of the page is wasted. But even in that case, I'm only wasting 4,095 bytes of memory. Compare that to a segment, which could be very large: I could have a segment that's megabytes in size and only be using one byte of it, and then I'm wasting everything else. So the maximum amount of memory I can waste in a page is the page size minus one byte, in the case where for some reason I needed a whole page of memory to get access to a one-byte data structure. That's really the only way that would happen. Even if I'm looping, even if I'm in the tightest loop that you can write in C, that still requires, I think, two or three instructions. So there I would be wasting 4K minus three instructions or something like that.

What about the external fragmentation problem? So I'm limiting internal fragmentation. How does this handle the external fragmentation problem? This is very nice. This is even nicer than the other. In the case of external fragmentation, there is now zero external fragmentation. Why? Yeah. Yeah, I mean, remember, I'm splitting physical memory into pages as well. And when I need a page for a process, I find a free page of physical memory. All the pages are the same size. Think of it this way: I always need 4K. If you always need 4K, it's very simple to write a very, very efficient allocator that suffers from no external fragmentation. Now, this isn't quite true, because if I have different-sized pages, I might have problems, and the kernel sometimes needs bigger allocations that are also contiguous in physical memory, blah, blah. But more or less, when I'm talking about user memory allocation, every user memory allocation needs a page. One page, two pages, three pages. And those pages don't have to be contiguous, because of how the mapping works. So: no external fragmentation, thanks to the fixed allocation size. If you ever want to reduce external fragmentation, pick an allocation size and stick with it. That's the easiest way to do it, as the sketch below shows.

OK. Now, the problems with paging really have to do with the fact that pages are sort of small. And so now I need this special hardware to help me out to do the translation. The other issue with paging, as you might have noticed, is that pages have the potential to require a lot of state in the operating system itself. So we talked before about what the MMU needs to know for base and bounds. For base and bounds, how much state per process does the operating system need to store? The entire mapping between a process's virtual address space and physical memory really boils down to how much state? Anyone want to guess? The base and the bounds. So we're thinking eight bytes. That's pretty good. What about segmentation? Now the plot thickens a little bit. So if I asked you, for a particular process, how much state the operating system would need to store if it was using segmentation and not paging, how could you answer? Can you know exactly? Why not?
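Hold that segmentation question for a second. Here's the fixed-allocation-size point in code: a minimal sketch of a physical page allocator, with all names and sizes made up for illustration.

```c
#include <stdbool.h>

#define NFRAMES 2048  /* e.g., 8 MB of physical memory in 4K frames */

static bool frame_used[NFRAMES];

/* Any free frame will do: every allocation is exactly one page, so
 * no unusable gap can ever form between allocations. That is what
 * "zero external fragmentation" means here. */
int frame_alloc(void)
{
    for (int i = 0; i < NFRAMES; i++) {
        if (!frame_used[i]) {
            frame_used[i] = true;
            return i;   /* the physical page number */
        }
    }
    return -1;  /* out of physical memory */
}

void frame_free(int ppn)
{
    frame_used[ppn] = false;
}
```

OK, back to the question: how much state does segmentation need?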
Depends on how many segments the process has. So how much per-segment state do I need to store? Each segment has a base, a bounds, and a start virtual address. So now I've got 12 bytes per segment. That doesn't sound so bad. A lot of programs don't have very many segments: code, heap, stack, three, maybe one more for some of my statically initialized variables. What about pages? How much state do I need? You still can't answer exactly, but what does it start to depend on? Yeah? Page size? Depends on the page size. What else does it depend on? Yeah? How many pages? Eh, close. Yeah? The size of the process. The size of the process in what? How many virtual pages it has. So for each virtual page, I clearly need to store some information. And so now, in terms of scaling the amount of operating system state that's required, I've created a much larger problem. So now it pays us to do some of the type of bitwise engineering that The Social Network lampooned.

So now here's the challenge, right? Here's the state maintenance challenge. You can think about it this way: when the TLB generates an exception, the kernel needs to be able to answer questions about address translations. So first of all, I need to make sure I can store those translations compactly. And I also need to be able to locate the information rapidly, because this is another source of overhead in the operating system. If the operating system takes forever to look up page table entries, or to figure out how to map virtual addresses to physical addresses, the whole system is going to slow down. So usually, and I don't know if this is true anymore, on modern systems we've sort of gone to 40-bit or 48-bit wide addresses, because you guys have too much memory to address in 32 bits. But on a 32-bit system, it's usually possible to pack all the information I need to know about each page into 32 bits. I suspect that these data structures are 64 bits on modern systems, or at least bigger than 32. Now, the nice thing about it is that the physical page numbers are only 20 bits wide: the offset is 12 bits, and the rest of the address is 20 bits. You can work that out at home. And then there's other stuff that I might wanna know about each virtual page. I might wanna know where it is: is it in memory, or is it somewhere else? We'll talk about where it goes on Friday. And then, remember, I can apply permissions to memory now in a way that's analogous to what you guys are used to for files. So I can say: can this page of memory be written to? Can it be executed from? Am I allowed to load and execute instructions from this page?

And this sort of dramatizes what I said before. If the operating system is the source of slowness here, this will start to become overhead that can cause the machine to run extremely slowly. Part of the problem is, when I get here, I'm already slow. I was slow right here, right? So by the time I get to the kernel, something already went wrong. What's the thing that went wrong? Well, you would think of this as a slow path, because something didn't work out for me. What was I hoping would happen? What do I rely on to make this all fast? Yeah. Yeah, I hoped that the MMU already knew how to translate this address. So by the time I get into the kernel, I'm already in trouble. I'm already on a slow path.
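To make "pack everything into 32 bits" concrete before we go on, here's one hypothetical page table entry layout. The exact flag bits are my own choice for illustration; the 20/12 split, though, is forced by 4K pages.

```c
#include <stdint.h>

/* A hypothetical 32-bit page table entry: with 4K pages, the physical
 * page number needs only 20 bits, which leaves 12 bits for flags. */
#define PTE_PPN_SHIFT  12
#define PTE_VALID      0x001u  /* a translation exists for this page */
#define PTE_PRESENT    0x002u  /* page is in memory, not off on disk */
#define PTE_WRITEABLE  0x004u  /* stores to this page are allowed */
#define PTE_EXECUTABLE 0x008u  /* instruction fetches are allowed */

static inline uint32_t pte_make(uint32_t ppn, uint32_t flags)
{
    return (ppn << PTE_PPN_SHIFT) | flags;
}

static inline uint32_t pte_ppn(uint32_t pte)
{
    return pte >> PTE_PPN_SHIFT;
}
```

And note the scaling problem this creates: a flat array of these covering a full 32-bit address space is 2^20 entries times 4 bytes, or 4 MB of table per process, which previews why Friday's page table structures exist. So, back to the slow path.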
I've trapped into the kernel, I had to do a context switch, which is super expensive, and my cache has failed me. So this is already bad. And if the kernel takes a long time to figure out what to do and how to reload the MMU, it just gets worse and worse and worse. So I need to be able to locate this information extremely rapidly. There's another requirement. So now we're really talking about data structures. What's the other requirement for the data structures that I use to map virtual addresses to physical addresses? What's that? Remember, I'm running in the kernel, and the software is now manipulating the TLB so that this address can be translated and the instruction can complete. I need to make sure that these data structures allow me to look up the information rapidly, but I also need to make sure of, what else? Whenever you design data structures, there's usually a trade-off between speed and something else: space. So if these data structures are huge, what are they going to consume? Space, physical space. Where are these data structures located? Memory. The more memory the data structures take up, the less memory there is for processes to use. And as we'll talk about on Friday, once I start having to move things around, out of memory and onto the disk, things really get slow, because memory is already kind of slow compared to registers, but disks are like Pluto-slow compared with memory.

Okay, so we refer to the data structure that we use to perform this mapping as a page table. Page tables are responsible for storing information about virtual-to-physical translations and letting us access it efficiently when needed. On Friday, we'll pick up here, talk about some page table structures, and hopefully get through swapping. So I will see you then. Good luck on assignment 2.