Welcome back everybody. It's hard to believe, but we're on lecture 16. The term has been flying by. We've been talking about virtual memory, and we're going to continue in that vein today. I wanted to fill out a little bit more of the caching discussion from last time, just to remind you about average memory access time. This is 61C material that hopefully you'll be familiar with: the average memory access time is a probabilistically weighted combination of the hit time and the miss time. We're in a situation where we have a processor talking to, say, an L1 cache, which is talking to DRAM, and the trick is to figure out how the cache improves our performance. To do that we come up with an aggregate hit rate for the cache: the hit rate is the percentage of the time that we actually get a hit in the cache, and the miss rate of course is one minus that. The hit time is how long an access takes when we hit, and the miss time is how long it takes when we miss. This is not rocket science, but I wanted to put up some definitions and inequalities just to make sure we're all on the same page. Clearly the hit rate plus the miss rate had better be one, or something weird is going on. The other thing is that the hit time, which is the time to get data from the L1 cache, is actually part of the miss time as well. So when we talk about miss time, it's not only the miss penalty, which is going down to the DRAM and pulling the data into the L1 cache; there's one more hit time afterwards. So the miss time we're talking about up here is actually hit time plus miss penalty. And the miss penalty is the average time to fetch from the lower level, which in this case is DRAM. All right, this is all 61C. If you were to take some of these blue items, put them into the original equation, and rearrange a little, you'd see another way to write average memory access time, which is the one I often prefer for a reason I'll show you in just a second: it's the hit time plus the miss rate times the miss penalty. Okay, and why do I like this form better? Well, suppose we've got more levels: an L1 cache and an L2 cache in front of DRAM. Then we can just take the new equation that was in red up there and say the average memory access time is the time to hit in the L1 cache plus the miss rate in the L1 cache times the miss penalty of L1. I haven't done anything but copy it here. But what's interesting is: what's the miss penalty of L1? It's the time to get the data out of this L2-plus-DRAM combination. So the miss penalty of L1 is just the average time to fetch from the lower level, which is the hit time of L2 plus the miss rate of L2 times the miss penalty of L2. And you can do this recursively: in this case the miss penalty of L2 is just the average time to fetch from DRAM. So the average memory access time of this whole combination is the hit time at the L1, plus the miss rate at the L1 times, in parentheses, the hit time of L2 plus the miss rate of L2 times the miss penalty of L2, and so on. You can keep doing this recursively. Modern chips like the ones we've been talking about typically have three levels of cache on chip these days: L1 and L2 are part of the core, every core has an associated slice of L3, for instance, and there are many cores on the chip. All right, good.
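To make the recursion concrete, here is the same thing written out as formulas (nothing new here, just the definitions above in equation form):

\[
\mathrm{AMAT} = \mathrm{HitTime}_{L1} + \mathrm{MissRate}_{L1}\times\mathrm{MissPenalty}_{L1}
\]
\[
\mathrm{MissPenalty}_{L1} = \mathrm{HitTime}_{L2} + \mathrm{MissRate}_{L2}\times\mathrm{MissPenalty}_{L2}
\]
\[
\mathrm{AMAT} = \mathrm{HitTime}_{L1} + \mathrm{MissRate}_{L1}\left(\mathrm{HitTime}_{L2} + \mathrm{MissRate}_{L2}\times\mathrm{MissPenalty}_{L2}\right)
\]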
Now, the other thing we've been talking about, of course, is caching applied to translation, and that gave us our translation lookaside buffer, or TLB. And there's a good question on the chat, which is: could you start accessing DRAM in parallel with the cache? Yes. In fact, that's an optimization that's often done on server-class chips, where they'll actually start a DRAM access even while they're busy doing the check in the cache. The downside of that is twofold. One, you're burning energy, because DRAM is one of the biggest consumers of energy outside the pipeline, so that's expensive from an energy standpoint. And it also means that if you're accessing the DRAM, somebody else can't be accessing it, so if the DRAM is shared, you've just slowed things down. But in a typical server environment, where maybe you don't worry quite as much about power but you really want performance, you could certainly start the fetch early even while you're looking in the cache. So, applying caching to translation: as we said, the TLB is just a cache. We say, here's a virtual address — is its translation cached? If the answer is yes, we go straight to physical memory, which is itself a combination of caches and DRAM, because we have a physical address. That's the very fast path. On the other hand, if we miss, we've got to go to the MMU to translate, which means walking the page table. We get our result back, we store it in the TLB for next time, and then we go ahead and access. And oftentimes there's also the ability to bypass translation entirely. The question, of course, is whether page locality exists, so that we'll mostly be hitting in the TLB. And we made the argument that yes, there's a fair amount of page locality, certainly in the instruction and stack accesses, but even the data accesses have reasonably good locality. We can also build a TLB hierarchy if we want this lookup to be fast: rather than having a fully associative 128- or 512-entry TLB, we might have a small direct-mapped first-level TLB and a more highly associative second-level TLB, and I just showed you on the previous page the equations for how you might analyze that. Now, the other thing I wanted to point out — this is a slightly different picture of the hierarchy than the one I showed you before — notice that between the DRAM and the secondary disk I've actually got flash or SSD storage, which is pretty standard in more modern systems. I'll point out that the page tables are in memory, and we're hoping that those will mostly be cached. The other thing I'm showing you here is that things like registers, the L1, L2, and L3 caches, and main memory are all accessed in hardware. However, when we're talking about caching in the form of demand paging, which is what this lecture is about, all of that is managed in software by the OS. The main memory becomes a cache on SSD or disk, and that's all managed in software. And today is the day we talk a lot about how we manage those page tables so that we get the best result from our caching. Notice the TLBs are up here, very fast, roughly at the speed of registers, while the page tables in DRAM and below are slower by many orders of magnitude.
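Just to pin down the lookup order described here, a minimal sketch of the translation path — the helper names tlb_lookup, walk_page_table, and tlb_fill are made up for illustration, not any particular kernel's or chip's API:

```c
typedef unsigned long vaddr_t;
typedef unsigned long paddr_t;

int     tlb_lookup(vaddr_t va, paddr_t *pa);  /* returns nonzero on a TLB hit */
paddr_t walk_page_table(vaddr_t va);          /* may raise a page fault       */
void    tlb_fill(vaddr_t va, paddr_t pa);

paddr_t translate(vaddr_t va) {
    paddr_t pa;
    if (tlb_lookup(va, &pa))   /* hit: the fast path, no table walk needed */
        return pa;
    pa = walk_page_table(va);  /* miss: the MMU walks the page table       */
    tlb_fill(va, pa);          /* cache the translation for next time      */
    return pa;
}
```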
And so the key is that we're hoping our TLBs see enough locality to help speed up use of the page tables. Okay. So what we started talking about last time was demand paging mechanisms, and the page table entries in the page table are what make it possible to build demand paging. We know that, in general, PTEs have a valid bit. When that valid bit is set to one, the page is in memory and the hardware can go ahead and do the reference. When it's zero, the page is not in memory and you get a page fault. In the Intel chips this bit is called "present" rather than "valid," but it's the same idea. So, picking up where we left off last time: suppose we reference a page whose page table entry is invalid and we hit a page fault — then what do we do? Well, the memory management unit at that point was walking its way through the page table, found an invalid entry, and caused the page fault. Now we're going to have to do something, which is to pull that page off the disk, for instance. What does the OS do in this case? It has to find the page on disk, and then it's got to find space: which page frame in DRAM is it going to use to handle this fault? At steady state, all of the DRAM is potentially full. So the first thing we've got to do, before we can even handle the page fault, is choose an old page to replace. And that's going to be a big topic for today: how do we replace a page, and which one do we choose? Let's suppose we already know how to do that and we've picked a page. The first thing we do is check whether that page has been modified — the dirty bit is one — in which case we need to write its contents back to disk, because it has more up-to-date contents than what's actually on disk. We've got to clean that page out before we can reuse it. Then we change its page table entry, and any cached TLB entry, to be invalid. If you remember, the reason we want to do that is that we're reusing the page frame for something else. The page table entry originally said this was a valid translation, but now we've got to set it to invalid. And the TLB is a cache on the page table entries, so the cached TLB entry is going to be incorrect, and we've got to throw that TLB entry out. So some process lost a page in the process here; if it needs it back, it will of course get another page fault and pull it back in off the disk. Then we load the new page into memory from the disk, and of course that can take time — a million instructions' worth. So while we're loading the new page off disk (and also while we're writing the old contents back to disk, if we do that inline), we put the process to sleep. Then, when the page comes back and the frame is filled with the new page, we update the page table entry in the page table and invalidate any TLB entry for it. If you think about why we need to do this: originally the TLB entry for that page said invalid — that's why we got a page fault. By invalidating, by throwing that TLB entry out, we know that when the processor retries the original access, it'll miss in the TLB, walk the page table, and pick up the new, valid translation. And then we continue from the original location. And this is really what makes the whole thing act like a cache.
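Pulling those steps together, here's a minimal sketch of the fault path in C-style pseudocode — the structures and helper names (find_victim_frame, write_frame_to_disk, read_page_from_disk, tlb_invalidate) are invented for illustration, not a real kernel interface:

```c
struct frame;                                  /* a physical page frame */
struct pte { int valid, dirty; struct frame *frame; };

/* Hypothetical helpers, declared only to make the sketch self-contained. */
struct frame *find_victim_frame(struct pte **old_pte);
void write_frame_to_disk(struct frame *f);
void read_page_from_disk(struct pte *p, struct frame *f);   /* sleeps the process */
void tlb_invalidate(struct pte *p);

void handle_page_fault(struct pte *faulting_pte) {
    /* 1. Choose an old frame to replace (the replacement policy). */
    struct pte *old_pte;
    struct frame *victim = find_victim_frame(&old_pte);

    /* 2. If the victim is dirty, its contents are newer than disk:
     *    write them back before reusing the frame. */
    if (old_pte->dirty)
        write_frame_to_disk(victim);

    /* 3. The old mapping is no longer valid: fix its page table entry
     *    and throw out any cached TLB entry for it. */
    old_pte->valid = 0;
    tlb_invalidate(old_pte);

    /* 4. Pull the faulting page in from disk; the faulting process
     *    sleeps for the duration (on the order of milliseconds). */
    read_page_from_disk(faulting_pte, victim);

    /* 5. Point the faulting PTE at the frame, mark it valid, and
     *    invalidate any stale TLB entry so the retry re-walks the
     *    page table and picks up the new, valid translation. */
    faulting_pte->frame = victim;
    faulting_pte->valid = 1;
    tlb_invalidate(faulting_pte);

    /* 6. Return; the hardware retries the original instruction. */
}
```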
And if you notice, one of the key things in here is that very first item: how do we choose the old page to replace? That's called the replacement policy, and it turns out to be a topic of extreme importance when we're dealing with demand paging, so it's going to be a topic we cover today. The TLB entry for the new page gets reloaded when the thread continues, because the TLB doesn't have an entry for that address and it'll be pulled from the page table. And as I said, processes are suspended while we're going off to disk. So I wanted to pause here and see if there are any questions on this particular slide, because there's a lot of content here. Everybody take a breath and tell me if there are any questions. Good: why invalidate the TLB for the new entry? All right, can anybody tell me why? Okay, so we're talking about this line here, the second to last one. And the reason is we want to go back through the MMU again — that's correct. The reason is that we've just changed the page table entry. Originally, when we tried to do the reference, we looked in the page table, we pulled that page table entry into the TLB, and it said invalid, which is why we trapped in the first place. So we have to invalidate it purely so that it'll get reloaded from the page table, and when it gets reloaded, it'll be valid the second time around. Does that make sense? Good. Any other questions? Okay — the third step and the fifth step? You mean "load new page"? Oh — one, two, three: "change page table entry and any cached TLB." So the difference between those two is that one is the page table entry we kicked out when we replaced the page, and the other is the page table entry that we're filling. And remember why we're invalidating both of the TLB entries: there's one physical page in the DRAM that's changing from belonging to the original process to this new process that just page faulted, so that physical page is transferring ownership. In the process, there are two TLB entries involved: one from the old process that owned the page, which we have to invalidate so the stale translation stops being used, and one for the new page table entry, for the new process, which we have to invalidate so that when we go back to the page table we get the new, valid entry. So there are two TLB entries here, and we're invalidating both of them because they're both wrong. Okay. This is the point where people usually say, well, why can't you just fix the TLB up directly? And the answer is that most processors don't give you the option of modifying the TLB directly. There are a few architectures, like MIPS, that let you manipulate the TLB directly, but most do not. So the best we can do is invalidate the entries so that the MMU will pull them in from the page table, which holds the actual correct contents at that point. Okay, good. So here are the steps in handling a page fault; I just wanted to show you this graphically. Originally we try to do, say, a load to address M. That reference looks up in the page table and we notice that the page table entry is invalid — that's what this "i" means here — and we get a trap into the kernel, which is a page fault. At that point, we realize that the page we want is on disk, so we start that access coming in off of disk. Meanwhile, we have also found a free page frame. On the previous slide, I said we emptied that frame out, potentially by writing it back to disk.
In a real system, which we'll talk about later, we typically have a free list full of page frames that are available for use, and we're constantly cleaning pages by writing them back out to disk to make sure we have free frames. So let's assume we had a free one. In that case, we pull the page off the disk into the free frame. When that's done, we set the page table entry to valid and we invalidate the TLB entry. We restart the instruction, and this time it works without a page fault. All right. Now, there are some questions we're going to need to answer. Like: during a page fault, where does the OS get the free frame? On the two previous slides I kind of indicated, well, we find one on the fly, and maybe we have to send dirty pages back to disk on the fly. In reality, like I said, there's a free list, but we'll get to the free list after we investigate replacement policies a bit more. How are we going to organize all these mechanisms? They're going to be organized around the replacement policy — the replacement policy being something like LRU or random, et cetera. Another question we want to address is: how many pages does each process get? How many page frames? If I take my DRAM and divide it up, do I give every process an equal amount of DRAM, or do I only give DRAM frames to the processes that need them most? Perhaps that would be a good policy, if only there were a good way to figure out who needs them most; we'll talk about that a little later. Another thing we're interested in here is allocating disk paging bandwidth, because if you have a malicious or badly written program that's walking all over memory, it's causing continuous page faults, which is going to empty out the TLB, it's going to slow everything down, and every process in the system is going to hurt as a result. So one thing we might want to do is figure out how to allocate paging bandwidth fairly. That's a type of scheduling question. So as we start here, we need a working set model. What do I mean by that? Well, on this plot, here's the whole address space on one axis and here's time on the other. And what we see is that at a given slice in time — if we look at this red band as it goes by — we can look at all the addresses that are currently in use; that's the blue part. We slice straight through, and those are all of the pages that have to be resident and mapped in DRAM in order to make progress. And notice a couple of things. One, it's not always the same pages, which makes sense, right? Let me go back and show you that amazing animation — see that? As the little red band moves across, our slices in time contain different sets of pages that are in use. So there are a couple of things to learn from this, and we're going to make it a little more formal in a second. One is that, as the working set — the set of pages currently in use in a given time window — changes, we need a way for the use of the DRAM to evolve, so that the pages we need are in memory and the ones we don't need can be out on disk so that somebody else can use the DRAM, okay?
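To state the working set idea a bit more formally (this is the standard textbook definition, written out here for reference; the notation is mine, not from the slide): the working set of a process at time t, with window size τ, is

\[
W(t,\tau) \;=\; \{\,\text{pages referenced during the interval } (t-\tau,\; t\,]\,\},
\]

and the claim of the model is that if \(W(t,\tau)\) fits in the frames the process has been given, it will mostly hit; if it doesn't fit, the process will thrash.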
The second thing we need to do is figure out how to make sure that everybody's got their working set in memory, because if we can't fit all of the working sets in memory, then we're going to get thrashing, and we're going to be in trouble and things aren't going to work properly, all right? Now, we can look at how cache size versus hit rate works in general. This is what the working set model tells us about: how big a chunk of DRAM or cache are we going to need? And what you find is that as the cache size increases, there are certain plateaus where all of a sudden you reach a new stability point where your hit rate is higher. Before that, you just didn't have enough cache and you were missing all the time. Then you hit a stability point where, even as you vary the number of pages or the amount of cache available, it doesn't change things much, and then finally you hit a size that takes you to a new plateau, and so on. So in the working set picture I'm showing you: I have enough cache for, say, this slice, but not for this other slice over here, which covers a lot more addresses — that might be the first plateau, and the second plateau might be a bit more — and in general, as you increase the cache size, the hit rate goes up, or at least that's what we're hoping. As we transition from one working set to the next, we're hopefully going to kick out things we don't need anymore and bring in things we do, so that we make the best use of the cache space we've got. And of course, just as with the regular hardware caches we were reminding you about, we can run into capacity, conflict, and compulsory misses — although in the case of the memory system and virtual memory, we're probably not going to run into conflict misses very often. Why is that? Does anybody remember? Why are conflict misses unlikely to be an issue in virtual memory? Yeah, great: because the way our page table works, any page in the virtual address space can be mapped to any page in the physical address space, so we effectively have fully associative caching, and therefore there aren't going to be any conflicts. Very good. All right. Now, another model is the Zipf model, which sorts pages by their popularity rank. What you see here is that popularity is this blue curve that drops off as the rank goes up, and the hit rate goes up, but it goes up more slowly than in the working set model. The issue is that the likelihood of accessing the item of rank r is roughly 1 over r to the a, where a is a small constant. So it's rare to access items below the top few — notice how the popularity drops off — but there's a really long tail. What this means is that a small amount of cache does a lot of work for you, but a large cache doesn't help as much as you might think. This kind of locality is very common in web accesses and other things like that. So it's going to be interesting to ask: do we have the stair-stepping working set kind of locality, or do we have a Zipf-style distribution? That's going to tell us something about how much memory we need. It's definitely diminishing returns here. In this picture, by the way, rank corresponds to the size of the cache, in some sense.
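Written out, the Zipf popularity model says the probability of referencing the page of popularity rank r is roughly

\[
P(\text{rank}=r) \;\propto\; \frac{1}{r^{a}}, \qquad a \text{ a small constant,}
\]

so (a back-of-the-envelope consequence, under this model's assumptions) a cache big enough to hold the k most popular of N total pages gets a hit rate of about

\[
\text{hit rate}(k) \;\approx\; \frac{\sum_{r=1}^{k} r^{-a}}{\sum_{r=1}^{N} r^{-a}},
\]

which grows quickly for small k and then very slowly — the long tail.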
But what it's really talking about is this: if I take all of the pages in my virtual address space and sort them by popularity, the most popular page is number one, the second most popular is number two, and so on. And so, yes, if I think of the x-axis as cache size, then going out to 16 means I can hold the 16 most popular pages. So it's both: the rank can correspond to the cache size, and when it does, we can read off what our hit rate is, right? So with this particular distribution there's substantial value from a tiny cache, but very rapidly diminishing returns because of the long tail — substantial misses even from a large cache. So let's see if we can come up with a cost model to see how important it is to make our replacement policy work well and keep our hit rate up. Demand paging is kind of like caching — it is caching — so you can compute an average access time, which we're going to call the effective access time here just to keep it distinct in our minds from average memory access time, but we're going to use the same equations. The effective access time is the hit rate in the DRAM times the hit time, plus the miss rate times the miss time, and that miss time is going to have something to do with going to disk. Now, to answer the question in the chat about why conflict misses aren't a thing: you only get conflicts when you have associativity that's less than fully associative, and with our page table we have a fully associative cache, so there are basically zero conflict misses in that situation. And that's not because of the TLB; it's because the page table can map any virtual address to any physical address. So, if we're trying to figure out the cost in a situation where we have a limited amount of DRAM, lots of disk, and we can compute a hit rate and a miss rate for accessing data in the cache, what have we got? Well, let's try some numbers. A typical memory access time to DRAM might be 200 nanoseconds. The time to deal with a page fault might be 8 milliseconds. And suppose that p is the probability of a miss and 1 minus p is the probability of a hit. Then we can do this computation: if the data is in the DRAM, the access takes 200 nanoseconds; otherwise, with probability p, we have to go all the way out to the disk to bring it into DRAM, and that's 8 milliseconds. And I have to convert my units so they're all the same — nanoseconds. Milliseconds are thousandths of a second, nanoseconds are billionths, so you've got to know your units. So the effective access time is 200 nanoseconds plus p times 8 million nanoseconds. And here's where this pays off: if one access out of a thousand causes a page fault, the effective access time is 8.2 microseconds. So with one out of a thousand accesses causing a page fault, we've just slowed the DRAM down by a factor of 40. That factor of 40 is quite high, right? So that's pretty bad, and that's with only one out of a thousand accesses causing a page fault. So you can see why it's incredibly important to essentially never page fault, right?
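Plugging in those numbers (this is just the arithmetic from the slide, written out):

\[
\mathrm{EAT} = (1-p)\times 200\,\mathrm{ns} + p\times(8\,\mathrm{ms} + 200\,\mathrm{ns}) \;\approx\; 200\,\mathrm{ns} + p\times 8{,}000{,}000\,\mathrm{ns}
\]
\[
p = \tfrac{1}{1000}: \qquad \mathrm{EAT} \approx 200 + 8000 = 8200\,\mathrm{ns} = 8.2\,\mu\mathrm{s} \;\approx\; 40\times \text{ slower than DRAM alone.}
\]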
So if we want to slow down by, say, less than 10%, we can do a computation where 200 nanoseconds times 1.1 is the maximum effective access time we're willing to tolerate, and we come up with the fact that our miss probability has to be less than 2.5 times 10 to the minus 6. That's basically saying that if I want the effective access time to be no more than 10% bigger than the DRAM time, I can only have one page fault in 400,000 memory accesses. So it's extraordinarily important to essentially never page fault, okay? I'm going to pause on that: it's extraordinarily important to essentially never page fault, because the moment you start page faulting, the time to go to disk is so high that you bring your performance to a grinding halt, all right? Questions? Okay, are we good? So, do you have enough DRAM for that? Well, that's a good question, but it turns out it's not quite the right question, because this doesn't just depend on the amount of DRAM we've got — it also depends on the access pattern. If you had a loop that only accessed one page over and over again forever, you could get a 100% hit rate, no misses, and you'd only need one page of DRAM. And yes, we do have one page of DRAM. So the question of whether there's enough DRAM to hit this slowdown target is going to be heavily application dependent: it depends on the application's memory access pattern and on how much DRAM we can give to it. And this brings up the interesting question of whether we should try to predict the access pattern, or maybe do some observations: if something is missing too often, maybe we give it slightly more DRAM, and if something is hitting really frequently, maybe we can take some memory away from it without a problem — maybe we can come up with a dynamic policy for redistributing pages. That's a good observation, and one we'll come back to in a little bit. But the thing to take away from this particular slide is the extreme importance of not page faulting, which really means we've got to be very careful that the pages we have in memory are the right pages, so that we don't miss, and if we do have the right pages in memory, we've got to be very careful not to throw them out incorrectly. That's where the replacement policy comes in. No matter what, if we have to find a new DRAM frame because somebody needs one, we don't want to throw out a page that's going to be useful to us, because if we do, we're going to spend an extra 8 milliseconds — a million instructions' worth of time — and that could be a problem. So what factors lead to misses in the page cache? Well, we're once again back to the three Cs, with the caveat that there aren't any actual conflict misses. First and foremost, we have compulsory misses: pages that have never been paged into memory before. The best we can do with these, of course, is prefetching — predicting the future somehow. This is not quite what the previous question in the chat was suggesting, but if we can somehow figure out that a process is walking its way through memory, maybe we can have a prefetcher that already has the next page coming off of disk, so that by the time we get to it, it's likely to already be in memory. That would be a way to get rid of compulsory misses, and there is some prefetching that goes on in modern operating systems.
Capacity misses are cases where we just don't have enough memory. In those cases, if a process starts getting a lot of capacity misses, maybe we give it a little more DRAM and see if that helps. Now, one option is actually increasing the amount of DRAM in the machine, but the problem with that is you've got to shut everything down, put in some new DIMMs, and start things up again. We'll leave that option off the table for now, because that's the drastic option requiring buying more stuff. The other option is that if you have a bunch of processes, maybe we can readjust who's using which DRAM frames to get better overall paging behavior. Conflict misses, as we already said, don't exist in virtual memory since it's fully associative, so that's good. Now, if you remember back a lecture or two, what did I say? I said the three Cs plus one, because there was an extra C we tossed in there caused by cache coherence: coherence misses. In this case, we're not going to have a fourth C; we're going to have something called P, a policy miss. This is caused when a page was in memory but got kicked out because of a bad replacement policy. And so the next third to a half of this lecture is going to be about how we avoid policy misses, because those are drastically bad in the case of paging: we have to go to disk and burn a million instructions' worth of execution time. So how do we build a better replacement policy? All right, but first let's talk some administration. As you know, midterm two is coming up — they do seem to come rather frequently. I guess the upside is there's no final, so that's good. Timing is five to seven p.m., unless you've talked to us about a conflict. The conflicts with 170 are handled the same as last time, which is that you take the 170 exam after 162, and you will have heard from us about that or asked us if you're not sure. Other conflicts need to have been resolved already; we've talked to several of you, and there may be a couple of outstanding ones we know about that we're still working out. Topics go up through lecture 17, so keep that in mind: today's topics will certainly be on there, as will potentially Monday's. As I mentioned before, we're going to require you to have your Zoom proctoring setup working: screen sharing, audio, and camera working, and no headphones unless you have explicit DSP accommodations for headphones. So try to get your setup debugged and ready to go. The review session is next Tuesday, seven to nine p.m.; Zoom details will be announced on Piazza if they haven't been already — I forget. Any questions about the midterm? By the way, I'm glad that we don't have to have a final on the birthday of the person chatting there — so happy not-yet-birthday, Nikki. Do we have any questions about the midterm, or are we good? No final, no — only three midterms. Okay, there's a last midterm, which technically assumes you remember concepts from the first and second midterms. All right. And don't forget I have office hours from two to three. Come shoot the breeze, talk about whatever you'd like — operating systems, life, the universe, and everything if you wish. These office hours are not necessarily for helping you with lab assignments and so on.
But definitely come talk to me about high-level ideas or lectures or whatever — that'd be great. Otherwise, I'm just sitting there with my Zoom up, doing other things. So come talk. Let's see, the other thing I wanted to mention is: make sure to do your peer evaluations. We talked about this last time, but the basic idea is you get 20 points for each of your partners, not including you. So, for instance, in a group of four, you'd get 60 points to give out to the other partners, and you're going to give them all out. I've had some people ask, can I not give them all out? No, you've got to give them all out. This is your evaluation of the relative effectiveness of your partners. If you're completely happy with them, everybody gets 20 points — that sounds great. If you're less happy, you could give 18 to one and 21 to each of the other two, but notice the sum is still 60. Everything is validated by the TA at the end of the class, and your TA also knows the dynamics of your group, so make sure they do. In principle, the project grades are a zero-sum game, so if you're out there not contributing to the project at all, it's quite possible that your points will get redistributed to your other partners, since you've given them extra stress as a result — because this is a project class. I'd much prefer to see 20 points across the board for everybody, so let's have that as a goal. The peer evaluations are not about giving yourself points; your other partners give you points. Every term somebody tries it: they've got 60 points to distribute, and they give 59 to themselves, one to a partner, and zero to everybody else. It just doesn't work that way; we're going to ignore the 59 points you gave yourself and rescale everybody else. So just do the right thing and hand out all the points to your partners. Okay. Lastly, elections are coming up. Don't forget to vote if that's an option for you — this is one of the most important things you can do in the United States, so don't miss the opportunity. I don't need to tell people that this is probably the highest-stress, most important election for lots of people. Those of you who can't vote, my condolences — but that's all the more reason for those of us who can to do so. Vote your mind; the important part is that you participate. And don't put your ballots in the fake ballot boxes in Southern California; use the post office or something. Okay, good. Now, let's talk about replacement — page replacement policies. Why do we care? Well, I think my effective access time slide hopefully gave you a good sense of why we care. Replacement is always an issue with a cache, but it's particularly important for pages because the cost of being wrong is really high: the cost of going to disk is a million-plus instructions. If you're wrong in a hardware cache that misses to DRAM, the miss time to DRAM is not that high relative to other things, so the cost of being wrong there is lower — and that's why we were talking about things like random replacement working out pretty well most of the time. When you're talking about going to disk, random is really not great, because you're going to do the wrong thing.
And there are so many better things you could do when picking a page to throw out. All right, so let's talk about some simple policies. You could imagine FIFO comes into play — this sounds like what we did with scheduling, right? We started with FIFO: you throw out the oldest page, and you're being fair because you let every page be in memory for the same amount of time. This sounds good, except that it's very bad, for the following reason: it may turn out that a page that was admitted into DRAM a long time ago is still used every other reference. The fact that it was loaded early but is referenced constantly means you'll very definitely do the wrong thing if you throw out the oldest page, because eventually you're going to throw it out even though it's probably the most frequently used page. So FIFO seems like it's probably a bad idea. FIFO was a bad idea for scheduling, and it certainly seems like a bad idea as a replacement policy here. Random we brought up as a replacement policy in the hardware cache setting last time or the time before: it was better than you'd expect in the case of associative caches in hardware. The idea is you pick a random page for every replacement, and this is maybe a good solution for the TLB because it's fast. When you miss in the TLB, you go through the MMU to do a page table walk, and maybe random is an okay policy there because the cost of a page table walk isn't so bad. But it's still pretty unpredictable, and it's really not a great policy for page replacement, because you're as likely to randomly pick something bad as you are to pick something good. Next is my favorite "guaranteed not to exceed" policy, called MIN. If you remember the SRTF policy — if we knew the future, we could pick the best task to schedule, shortest remaining time first — MIN is the same idea: we replace the page that won't be used for the longest time in the future. This is a great policy for page replacement because it's provably optimal, but of course, once again, you can't really know the future. So MIN is going to be our yardstick against which we measure other policies, to see how close they get to MIN. And a little hint about what works well: the past is often a good predictor of the future. Now, to the question in the chat: no, this is not LRU. LRU may be a good policy that behaves a lot like MIN, but MIN is "replace the page that won't be used for the longest time in the future." If I knew the future, then of all the pages I've got, I'd pick the one that isn't going to be used for the longest time, and that's the one I'd throw out. LRU throws out the least recently used page, which means going into the past and trying to make a prediction based on it. So these are slightly different things, and as you've probably already figured out, LRU is going to be an approximation to MIN — a way of trying to use the past to predict the future. All right, good question; so MIN is not LRU. So let's look at the next policy, which of course is LRU: replace the page that hasn't been used for the longest time. Programs have locality.
So if something hasn't been used for a while, it's unlikely to be used in the near future, and it seems like LRU might be a good approximation for MIN — and most of the time it is. Now let's ask ourselves how we would actually implement LRU. Obviously we can't implement MIN: MIN is an ideal oracle that has us use the future, and we don't know how to do that. But how can we do LRU? Well, we could put all the pages in a list, where the tail is the least recently used one, and every time we use a page, we move it to the head. So the thing at the head is the most recently used page and the thing at the tail is the least recently used, and when we're looking for something to replace, we grab the tail. This sounds great, except it's very much not great, and the reason is that every reference requires us to move the page we're referencing to the head of the list. That means every load or store to DRAM potentially has to rearrange a bunch of items in the linked list to put the page we just referenced at the head. So this is basically not an implementable policy in any way that avoids making loads and stores really slow. I'm going to pause there for a second just to make sure that's clear to everybody: in order to do exact LRU, every load or store has to take the ID of the page and somehow move it to the head of the list, which means multiple extra loads and stores are required per load or store. Another thing you could imagine is keeping a timestamp on every page, so that every time you reference it, you stamp it. The problem is that you'd then have to sort by timestamp to figure out which page is the oldest, least recently used one, and that's expensive as well. So it seems like we're being stymied: we want LRU because it seems like a good stand-in for the oracle MIN, but we don't know how to implement LRU exactly. Just to give you a preview: we're going to find a way to approximate LRU that works almost as well as LRU would if we could implement it, and thereby get closer to MIN than we might otherwise be able to get. That's our bit of foreshadowing. So in practice people approximate LRU, and we'll tell you how. But first, let's look at these policies just to understand them. I want to go through some simulations to see what happens on a request pattern. Let's set this up: I'm going to have a really limited machine here, with three page frames of DRAM and four virtual pages in the address space. The virtual pages are called A, B, C, and D, and the processor is going to reference A B C, A B D, A D, B C B — that's got a great beat to it. So let's see what FIFO would do. Here we have the three frames, one, two, and three; these are the physical DRAM frames we've got. When we reference A, which is a virtual address, we need to map A to some frame. That's really easy right now, because I don't have any assignments of DRAM frames yet, so I'll just pick the first one. So now A is in DRAM frame one. B goes in DRAM frame two — I'm just working my way through the frames, doing FIFO assignment here; actually I haven't replaced anything yet, just assigned. C grabs the third one. So now, if you look, all of our frames are currently assigned in the page table.
And we happen to know that address D is marked as invalid in the page table. How do I know that address D is marked as invalid? Anybody figure that out? Why is page D marked as invalid? It's never been accessed, right? What else? It's not in the DRAM — how do we know that? Do we know it's not in the DRAM? Okay, how do we know it's invalid? I'm giving you... yeah: all of our page frames are assigned to other addresses, right? Page frame one, the page table gives to A; page frame two goes to B; page frame three is given to C. We know there's a slot in the page table for D, but because A, B, and C are already taking up all the physical frames, we know that D has to be invalid — or the operating system is broken, but let's assume that's not true for the moment. So when we get to A again, what's great is the MMU finds an entry for A; in fact, we can even guess that the TLB already has A in it, because not only did we get the mapping for A back at the first reference, we also set up the TLB. So this works fine. We get to B, that works fine. We get to D — now D is a miss in the page cache. Why? D gets looked up in the page table, we see it's invalid, and at that point we get a page fault and we have to do something. And what page are we going to pick to replace for D? A — why? Yep, because we're doing FIFO, right? We filled frame one, frame two, frame three, and now frame one holds the oldest page. So voila, we overwrite page A and assign frame one to D. Now A comes along, and look what happened: A is going to be another page fault, because we got rid of A. That means we've got to assign A somewhere, and we're doing FIFO assignment, so A gets assigned to frame two. D — that's good, we don't have to do anything. B — well, B is now gone, so we assign B down here. C — C is now gone, so we have to assign C. And then the final B has no fault. So if you count, we've got one, two, three, four, five, six, seven page faults when the stream A, B, C, A, B, D, A, D, B, C, B runs against a FIFO page replacement algorithm. Seven faults. And notice that when we referenced D here, replacing A was the wrong thing to do, because we were going to need A again immediately. If we had a better replacement policy, maybe we could have avoided this page fault for A, and maybe the one for B as well. So FIFO is not doing well here. Let's look at MIN, which, by the way, is going to do the same thing LRU does in this case — but let's think about MIN for a moment. So here we go: A, B, C, A, B, D, A, D, B, C, B. Ready? A does the same thing, B does the same thing, C does the same thing. Now you might say, aren't you doing FIFO replacement? Well, I'm just grabbing frames off a free list; I haven't replaced anything yet, I've just done the initial assignments. Now A works — no page fault — and B, no page fault. And now we come to D. MIN says: pick the page to replace that's going to be used farthest in the future. So if you look at the reference stream up here, the page that's going to be used farthest in the future is not A, it's not B — it's C, right?
So C is the page that's going to be used farthest in the future, which is why we choose to replace C with D. That's MIN: look into the future, look into your crystal ball, and tell me which page is going to be used farthest in the future. And now we get to A again, and A is in place, right? We get to D, D is in place. We get to B, B is in place. Why did this work out so well? Because we knew the future. Then we get to C, and C is no longer there. What do we do? Well, at this point we don't have much future left to go on, so we just replace A. Great. As Chris stated in the chat, page frames refer to physical memory; we only have three frames of physical memory, and the fact that we have four virtual pages means we have more virtual pages in use than physical frames available, which is the typical reason for a cache, right? We've got more virtual pages, which live out on disk, than physical frames, which act like the cache, and so every page fault is pulling something off the disk and bringing it into the cache. So yes, correct. The other thing I wanted to point out: when we got to D here, which of these pages was the least recently used? Let's look back: if I back up to the reference of D, notice that both A and B were used recently, so C is the least recently used page. So if we had a way to do LRU, it would have picked frame three as well. This is a good illustration of why LRU is often at least a good approximation for MIN — it's not always the same thing. In this case we have five faults instead of the seven in the previous example, where D is brought in by replacing the page that isn't referenced until farthest in the future. So what does LRU do here? The same decision making, all right? Are we good? Now, is LRU guaranteed to perform well? Consider the following: A, B, C, D, A, B, C, D, A, B, C, D. You can imagine what's going to happen; I'll just walk you through it. This is a case where LRU performs exactly the same way FIFO does, and that's because we have three physical frames but four pages being cycled, and we always go A, B, C, D, A, B, C, D, A, B, C, D, and as a result we get this cascading page fault pattern. What's interesting about this — and it is a lovely pattern, I agree — is that it's the kind of pattern you can get when, for instance, this is the page cache, which we'll talk about later when we get to file systems, and you're walking through a file system doing a recursive grep or something: you can end up always page faulting, with none of your cache helping you at all. What I'm showing you here is a fairly contrived example with a working set of n plus one pages on n frames. What's interesting is that MIN does better, because at the point we get to D, MIN will actually make a different choice. Count them: one, two, three, four, five, six — MIN has only six page faults rather than the 12 we had up there. So MIN is still the oracle, the guaranteed-not-to-exceed best case, which LRU mostly behaves like — just not always. Questions? Now, I'll state up front that LRU mostly performs very well; this was a contrived example. The question that's going to be important is: how do we build LRU if we can't implement it exactly?
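Just to make the fault counting concrete, here's a tiny toy simulator — my own illustration, not anything from the lecture or a real OS — that replays the reference string from the example with three frames and counts faults under FIFO and LRU:

```c
#include <stdio.h>

#define NFRAMES 3   /* three physical page frames, as in the example */

/* Toy fault counter: replays a reference string (one char per virtual
 * page) and counts page faults under FIFO or LRU replacement. */
static int count_faults(const char *refs, int use_lru) {
    char frame[NFRAMES] = {0};     /* which virtual page each frame holds */
    int  last_use[NFRAMES] = {0};  /* time of last reference (for LRU)    */
    int  loaded_at[NFRAMES] = {0}; /* time the page was loaded (for FIFO) */
    int  faults = 0;

    for (int t = 0; refs[t] != '\0'; t++) {
        int hit = -1;
        for (int i = 0; i < NFRAMES; i++)
            if (frame[i] == refs[t]) hit = i;
        if (hit >= 0) {            /* resident: just note the use for LRU */
            last_use[hit] = t;
            continue;
        }

        faults++;
        int victim = 0;
        for (int i = 0; i < NFRAMES; i++) {
            if (frame[i] == 0) { victim = i; break; }   /* free frame: take it */
            int vkey = use_lru ? last_use[victim] : loaded_at[victim];
            int ikey = use_lru ? last_use[i]      : loaded_at[i];
            if (ikey < vkey) victim = i;  /* oldest use (LRU) or oldest load (FIFO) */
        }
        frame[victim]     = refs[t];
        last_use[victim]  = t;
        loaded_at[victim] = t;
    }
    return faults;
}

int main(void) {
    const char *refs = "ABCABDADBCB";   /* the reference string from the example */
    printf("FIFO faults: %d\n", count_faults(refs, 0));
    printf("LRU  faults: %d\n", count_faults(refs, 1));
    return 0;
}
```

Run as-is, this reports 7 faults for FIFO and 5 for LRU, matching the walkthrough above.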
What I'm going to show you first, though, before we talk about how to approximate LRU, is this graph of page faults versus number of frames. In the previous slides we had three frames; if we were to add some more frames, presumably our number of page faults would come down. That's a desirable property: as you add extra memory to a process, the overall fault rate goes down. The question is, is it always the case that adding more frames makes the fault rate go down? Unfortunately, the answer is no. There's something called Belady's anomaly, and certain replacement algorithms, like FIFO, don't have this seemingly obvious property: you can actually add more physical memory and the fault rate goes up. I'm going to show you this. So, does adding memory reduce the number of page faults? The answer is yes with LRU and MIN, but not necessarily with FIFO. Here we have a reference pattern A, B, C, D, A, B, E, A, B, C, D, E, and notice that we've got three physical page frames and five virtual pages now. What's interesting is that if we add a fourth physical page frame and do the same FIFO assignment — you can work this through on your own — what you'll find is that there are actually more page faults, even though we've added more memory to the process, which is counterintuitive. It turns out FIFO is just bad for many reasons, not the least of which is that it suffers from Belady's anomaly: the cache contents can end up completely different when you add more memory, and that's part of the reason it has this problem. In contrast, with both LRU and MIN, when you add more physical pages, the number of faults goes down monotonically — it may stay constant for a while and then go down. But this is why we are going to abandon MIN as a desirable policy from this point on, okay? Questions? Now — did I say MIN? Yeah, I meant FIFO. Whatever I said there, I mean we're abandoning FIFO from this point on; we're not going to abandon MIN — of course, we couldn't implement MIN anyway. All right, thanks for catching that. Now, how do we approximate LRU? Well, there's something called the clock algorithm, which I'm sure you've all heard about. The idea is we take every physical page in the system and link them all together in a loop, and we have a single clock hand that points at one page. The hand advances only on page faults, and as it goes it checks for pages that haven't been used recently, marking them in a way that keeps track of that. So what we're really finding is not the least recently used page, but a not-recently-used page — an old page. How do we do that? The details are pretty simple, and I'll walk you through them, but the idea is that every page has something we'll call the use bit. Intel calls this the accessed bit; let's call it "use" for the moment. The hardware sets the use bit in the page table entry (and the TLB) whenever it uses that page: either a read or a write to the page sets the use bit. Now, the hardware never clears it — never puts it back to zero — so that's going to be up to our software.
It's going to be up to the OS to clear the use bit to zero underneath the clock hand. So, abstractly, what happens is: we set the use bit to zero, and when the hand comes all the way around, we take a look, and if the use bit is still zero, we know that the page hasn't been used in all the time it took the clock hand to go around. At that point, we call it an old page, because it hasn't been touched since we were last here. And again, keep in mind that we only move the hand when there's a page fault, so going all the way around means we've had enough page faults to sweep through all of our memory. Now, if the clock hand looks at a page and its use bit is one, that means the page has been touched since the last time we were there, so we set the bit back to zero, advance to the next page, and check that one. Eventually we'll find one whose use bit is zero, and at that point we know it's a good candidate for replacement, because it's an old page. That's the clock algorithm. I'm going to pause for a second here. Notice that the use bit gets set to one by the hardware but cleared to zero by the operating system — it's a funny bit: set by hardware, cleared by the OS. Now, some more details. Notice I said that you first check the use bit, and if it's a one, you set it to zero and keep looking, because you haven't found an old page yet. The question is: will you ever find a page, or will you just loop forever? The answer is that you'll always find a page, because notice that we don't let any other processes run — while we're trying to find a page, we're only in the operating system, everything else is suspended — so we keep setting bits to zero, and in the worst case we may walk all the way around, but then the first page we cleared is the one we end up replacing. You can imagine that if we have to go all the way around before we find something, maybe there's a lot of thrashing going on, but the algorithm is guaranteed to find a page as I've stated it. Now, what if the hand is moving very slowly? That's actually good. Why is it good? It either means that page faults are infrequent — remember, I only advance the hand on a page fault — or that I'm quickly finding pages to replace. Either way, it means I'm not sweeping through all of the pages just to find one to replace, so that's a good sign. If the hand is moving quickly, that means we have lots of page faults or lots of use bits set — a high rate of access to lots of pages and/or a lot of page faults — and either of those means trouble: what I'd call memory pressure. Okay. So one way to view the clock algorithm is as a crude partitioning of the pages into two groups, young and old, and we throw out a page from the old category. Now, you might say, why not partition into more groups? Well, we can do that: there's something called the Nth chance version of the clock algorithm, which I'll get to right after the sketch below.
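Here's a minimal sketch of the basic clock hand just described — a circular scan over frames with a hardware-set, software-cleared use bit. The structure and names (frames[], nframes, clock_pick_victim) are invented for illustration, not any real kernel's interface:

```c
struct frame {
    int use;      /* set to 1 by hardware on any read or write to the page  */
    int dirty;    /* set to 1 by hardware on a write (we'll use this later) */
    /* ... mapping information for the page held in this frame ... */
};

extern struct frame frames[];   /* all physical frames, arranged in a loop */
extern int nframes;
static int hand = 0;            /* the single clock hand                   */

/* Advance the clock hand (only ever called from the page-fault path) and
 * return an "old" frame: one whose use bit is still zero when the hand
 * reaches it. Guaranteed to terminate, since each pass clears use bits
 * and nothing else is running to set them again. */
struct frame *clock_pick_victim(void) {
    for (;;) {
        struct frame *f = &frames[hand];
        hand = (hand + 1) % nframes;
        if (f->use)
            f->use = 0;   /* used since last sweep: clear and keep looking */
        else
            return f;     /* not used since the last sweep: replace it     */
    }
}
```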
So, the Nth chance algorithm: the idea is to give a page n chances before we throw it out. The OS keeps a counter on each page, which is the number of sweeps the page has gone without being used. On a page fault, you check the use bit: if it's a one, you clear it, and you also clear the counter, because this page was used in the last sweep — it's a young page, and we totally discount it. On the other hand, if the use bit is still zero, that means the page hasn't been touched in a whole trip around the loop. But rather than replacing it immediately, like the vanilla clock algorithm, we instead say, let's give it another chance, and we increment the counter on that page. Only if the counter hits n do we replace it. So what Nth chance is doing is saying that before we replace a page, it has to go unused for n sweeps of the clock: the clock hand has to sweep by n times without the page being used before it's replaced. How do you pick n? What's interesting is that if you pick a really large n, you effectively get a better approximation to LRU, because now we're dividing pages not just into two groups, young and old, but into groups that vary with the counter value, and as n gets larger we're dividing them into more and more categories. If n is really large, you get quite a good approximation to LRU — but it's expensive, because you may have to go around many times before you find something to throw out. Why pick a small n? It's much more efficient. So you might imagine a small number, like n equals two or three, not n equals a thousand. And here's a particularly useful way of using this, keeping n very small. One thing we haven't talked about at all until now is that when we throw a page out, we need to make sure it doesn't hold data we can't afford to lose, and therefore we may need to write it back to disk — that's the case where the modified, or dirty, bit is set. In that situation, if we go around the clock and land on a page we'd like to replace, but it's dirty, what we can do is start it being written back to disk and then wait to come around again; if it's clean at that point, we can just throw it out. So one idea is: for clean pages, use n equal one — replace them immediately if the use bit is zero when you reach them. If it's a dirty page, start writing it out to disk and wait for n equal two — that is, go all the way around once more before replacing it, and hopefully by the time you come back it's a clean page. All right, so that's called Nth chance. Now let me bring the Intel PTE into this and talk through the page table entry bits — I've shown you this before. We've really got four bits of interest to clock-type algorithms. P, the present bit, is the valid bit; it's called different things on different architectures. The writable bit, W, that you see here basically says the page can be written when it's a one. Some architectures have the opposite sense, a read-only bit, so that when it's a one the page can only be read, not written; in the Intel architectures and a lot of others, there's a W bit which has to be one before you can write. The accessed bit is the use bit we've already explained: it's zero if the page hasn't been accessed since the last time the software set it to zero.
Back to the access bit: it's set to one by the hardware if the page has been accessed since the last time it was set to zero. And then there's the D bit, or dirty bit; it's also sometimes called modified. If D is zero, then the page hasn't been modified since the page table entry was loaded, that is, since you pulled the page itself off the disk. And if it's a one, then the page has been written to since then. And so these four bits, P, W, A, and D, make for a much more complete set of bits for paging (there's a quick rendering of them as C bit masks below). We clearly need the present bit to know whether a page is in memory or not. The writable bit is basically how we allow some pages to be read-only and others writable; I'm going to show you another interesting way to use that in a moment. The access bit and the dirty bit we've already talked about to some extent. But let's look at some variations. So one variation might be: do we really need a hardware-supported modified bit, or dirty bit? And if you think about it for a moment, once I've paged something in and I've marked the page table entry as valid and writable, for instance, then I'm going to turn the processor loose and it's going to be allowed to do reads and writes to that page at will. So hopefully you can see why the question I asked on the following slide is a good one, because you can imagine that if I didn't have this done in hardware, how would I know that the page has been written to? I'm just letting reads and writes, loads and stores, go against that page, and the operating system isn't involved. So unless I have that modified or dirty bit in hardware, I won't know that information. And so that's why this question comes up: do I need it? It seems like the simple answer would be yes, but the real answer is no, and it's because we can be clever in how we use those four bits. We can emulate it using the read-only idea, or the W bit. So what we're going to do is keep a software database of which pages are allowed to be written. We kind of needed that anyway: for every process, we know which pages are marked read-only and which ones are writable, and we need that as we page things in and out from disk and so on. So we assume we already know that. And we're going to let the MMU help us, so that the operating system gets to take over when we need to record information. Okay, now there's a question on the chat here: does the CPU set the dirty bit if we overwrite data with itself? Yes. The CPU doesn't try to distinguish the case where you wrote the number three over the number three; all it knows is that you wrote it. And by the way, in most cases that's a perfectly good simplification, because it's very rare that you completely overwrite everything in a page with exactly the same values. So the dirty bit just means that I executed a write instruction against that page. Now, what we're going to do is tell the MMU that pages have more restrictive permissions than they really need. And so what do we do? Well, we're going to mark pages that could be written as read-only. And if we do that, then we know that the moment somebody tries to write, we're going to get a page fault, and now the operating system can record the fact that the page has been written. Okay, so this is a new algorithm. I'm going to call it the clock algorithm with an emulated modified bit, or emulated M. Initially, we're going to mark every page as read-only, with W equal to zero, even the writable ones.
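For reference, here is how those four bits look as masks in C. The bit positions shown (0, 1, 5, and 6) match the x86 page table entry layout; everything else in the little demo, including the variable names, is just for illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* The four PTE bits discussed above, at their x86 bit positions. */
#define PTE_P  (1u << 0)   /* Present/valid: the page is in physical memory */
#define PTE_W  (1u << 1)   /* Writable: stores are allowed when this is 1   */
#define PTE_A  (1u << 5)   /* Accessed/use: set by hardware on any access   */
#define PTE_D  (1u << 6)   /* Dirty/modified: set by hardware on any write  */

int main(void)
{
    uint32_t pte = PTE_P | PTE_W;   /* a resident, writable page               */
    pte |= PTE_A | PTE_D;           /* hardware would set these on a store     */

    printf("present=%d writable=%d used=%d dirty=%d\n",
           !!(pte & PTE_P), !!(pte & PTE_W), !!(pte & PTE_A), !!(pte & PTE_D));
    return 0;
}
```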
Continuing with that setup, we're also going to clear all the software versions of the modified bit that we're keeping in the operating system somewhere. And notice why I say the software versions: because we're assuming for the moment that the hardware doesn't support a modified bit. Okay, so the moment we cause a write, what happens is we get a page fault. And if the write is allowed, which we check in our database, then the operating system is going to go ahead and set the modified bit in software in the operating system, and then mark the page as writable. From that point on, we let the writes go at full speed without any page faults, but we've already recorded the fact that somebody has written. And whenever the page then gets written back to disk, we'll clear that modified bit back in software and mark things as read-only again to catch future writes. Okay, so this is hopefully pretty clever, as you can see; there's a sketch of the fault handler below. What we've done is decided to play with the permission bits in the page table entry to give us page faults, which are events that we can then use to track whether the page is dirty or not. Now, could this cause twice as many page faults? Yes. You get a page fault when you pull the thing in and then another page fault when you write it. So that's twice as many page faults. Notice, though, that the second page fault, the one on the write, doesn't require going to disk. It's a simple trap into the kernel and back out again. So that's a fast page fault, as opposed to the page fault that pages things in off of disk, which is a million instructions. So basically, trapping into the kernel, if you set it up properly, can be reasonably fast for things like this. But as you have identified, we are page faulting twice as frequently as we would otherwise. Here's another question: do we really need the hardware-supported use bit? Okay, so once again... oh, what happens on the first write here? Well, that's a good question; I'm glad you mentioned it. So the first write caused a page fault, correct. We set the modified bit and mark the page as writable, and then we return back to the program, or the process, excuse me, which is going to retry the write. Because the page fault is a synchronous operation which occurred on that write, that write did not make any forward progress at all. And so when we return, we're going to retry the write and it'll succeed the second time through. So you actually have two write attempts, one of which caused the page fault and the other of which actually performs the write, and then the process gets to go forward. So, the question about whether we need a use bit: no, we can emulate it in the same way as above. In fact, what I'm going to show you is how we could get by without a use or a modified bit, so that effectively the only things we've got are the valid bit and the writable bit. Okay. And how do we do that? Well, here's the clock algorithm with emulated use and M bits. We're going to mark every page as invalid, regardless of whether it's actually valid or not. And notice that we're going to mark all the pages as invalid even if they're in memory, and we're going to clear all the emulated use bits and modified bits to zero. Nikki, I'll answer your question in just a second. So now what happens? Well, it doesn't matter whether we do a read or a write; we're going to cause a page fault, because we marked things as invalid. Okay.
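Here is a minimal sketch of the fault-handler side of the emulated modified bit. The per-page record stands in for the software "database" mentioned above, and `pte_set_writable` and `deliver_protection_fault` are hypothetical helpers, not any real OS's API.

```c
#include <stdbool.h>

/* Per-page software state kept by the OS; names are illustrative only. */
struct vpage {
    bool writable_in_db;    /* is this page allowed to be written at all? */
    bool soft_modified;     /* the software-emulated modified (dirty) bit */
};

static void pte_set_writable(int vpn, bool w)
{
    (void)vpn; (void)w;     /* stub: would flip the W bit in the real PTE */
}

static void deliver_protection_fault(int vpn)
{
    (void)vpn;              /* stub: would signal the offending process   */
}

/* Called on a protection fault for a store to virtual page vpn.
 * Remember: every writable page starts out mapped with W = 0. */
void handle_write_fault(struct vpage *pg, int vpn)
{
    if (!pg->writable_in_db) {
        deliver_protection_fault(vpn);   /* a genuinely illegal write            */
        return;
    }
    pg->soft_modified = true;            /* record the write in software         */
    pte_set_writable(vpn, true);         /* future writes now run at full speed  */
    /* returning from the fault retries the store, which now succeeds */
}

/* When the page is later written back to disk, do the reverse. */
void page_cleaned(struct vpage *pg, int vpn)
{
    pg->soft_modified = false;
    pte_set_writable(vpn, false);        /* re-protect to catch the next write   */
}
```

The design point is that the only added cost is one fast trap on the first write to a page between cleanings; after that, stores run without OS involvement.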
So, with everything marked invalid, we'll trap to the OS on any access. And at that point, we're definitely going to set the use bit equal to one, because there was some access; it doesn't matter whether it was a read or a write, the use bit gets set to one. And then we can take a look at what it was, because we know the address it was at, and we can tell whether it was a read or a write that was attempted. If it was a read, we're going to mark the page as read-only at this point, meaning W equal to zero, and that means we'll catch future writes. On the other hand, if it was a write, we're going to set the modified bit to one and mark the page as writable. So in the read case, because we set W to zero, meaning it's read-only, if we happen to write later, we'll catch it and be able to set the modified bit. And then when the clock hand passes by, just as in the clock algorithm earlier, we're going to reset the use bit to zero and mark the page as invalid again. The modified bit gets left alone until the page gets written back to disk; there's a sketch of this below. So the question that was asked in the chat here, which is a good one, is: well, this doesn't seem useful, I'm saving one or two bits, so why are you going to all this trouble? The answer is, I'm talking about architectures, processors, that don't have a use bit or a modified bit implemented in hardware. The Intel ones that you guys are dealing with have the advantage that they have both a use and a modified bit, or an access and a dirty bit; those are handled by the hardware. If you had an architecture that didn't have them, like the VAX, which I'll mention in a second, then you've got to do something else; otherwise you're going to get incorrect behavior. And this is showing you how you can get by with just the valid bit and the read/write bit to emulate modified and use. But, as was identified, we have a lot of page faults going on here just to simulate use and modified. And remember that the clock algorithm is just an approximation of LRU, so maybe we could do better. If we don't have a use bit, or we don't have use and modified bits, maybe we could do better by doing something slightly different from the clock algorithm. And the answer is yes: we can do something called a second chance list. So the second chance list divides the pages of a process into two categories; I'm going to call them green and yellow, or directly mapped pages and second chance list pages. Things that are green are pages that are mapped valid and writable, and therefore whatever the processor does, it won't cause a page fault. The second chance list pages are ones that are in DRAM but are still marked invalid. That means that if I get a page fault on one of them, I'll be able to mark it as valid in a moment without going to disk; but for the moment, they're yellow here because they're marked invalid. And let's look at this a little bit. Accesses to pages on the active list happen at full speed; otherwise, we page fault and deal with things on the yellow page list. And we're going to manage the yellow page list as an LRU list, for real. The green pages, since they're directly mapped, don't give us any events, and we get to access them at full speed. Okay, so let's look at an example. Suppose we get a page fault because we access some page that's either in the yellow group or on disk. Let's assume for a moment that it's in the yellow group. What's going to happen is this: these pages in green are my current directly mapped pages.
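Here is the same idea extended to emulate the use bit as well, with pages mapped invalid between sweeps. Again, `pte_map` and the struct fields are hypothetical stand-ins for whatever the OS really keeps, sketched only to show the state transitions described above.

```c
#include <stdbool.h>

/* Emulating both the use bit and the modified bit, assuming the hardware
 * only gives us a valid bit and a writable bit. Illustrative names only. */
struct epage {
    bool soft_use;
    bool soft_modified;
};

static void pte_map(int vpn, bool valid, bool writable)
{
    (void)vpn; (void)valid; (void)writable;   /* stub: would edit the real PTE */
}

/* Every page starts out mapped invalid, so any access traps here. */
void handle_access_fault(struct epage *pg, int vpn, bool was_write)
{
    pg->soft_use = true;              /* some access happened, read or write      */
    if (was_write) {
        pg->soft_modified = true;     /* record the write                         */
        pte_map(vpn, true, true);     /* valid and writable: full speed from now  */
    } else {
        pte_map(vpn, true, false);    /* valid but read-only: catch a later write */
    }
}

/* When the clock hand sweeps past, reset the emulated use bit and re-arm. */
void clock_hand_passes(struct epage *pg, int vpn)
{
    pg->soft_use = false;
    pte_map(vpn, false, false);       /* invalid again: catch the next access     */
    /* soft_modified is left alone until the page actually gets written back */
}
```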
So here's what happens: I'm going to get rid of one of the green pages and put it at the end of the LRU list in yellow. I'm going to take the page that I was looking for, put it in the green list at the new end, and mark it as valid. So that page fault happened because I wanted to access one of these pages in yellow; I wanted to access it, but it's marked invalid in the page table, and so instead I got a page fault. And what I'm going to do is a swap. I'm going to take the green page that's at the back end of the FIFO green list and put it on the LRU list in yellow, and the page I actually page faulted on is going to go into green. So what you notice here is that the green list is managed FIFO, but the yellow list is managed LRU. And if the yellow list is big enough, then I'm effectively going to get something very close to LRU without having to emulate the clock algorithm and have page faults all over the place. Okay. Now the other interesting thing here is that if the reason for this page fault was something on disk, then rather than a yellow page moving over to green, what I'm going to do instead is take the least recently used item off the end of the yellow list, throw that out, bring the page in off of disk, and put it at the new end of the green list. Okay. So this particular algorithm, called the second chance list, is basically keeping the pages that are really actively accessed in the green list. And when it throws something off the end, it puts it in the yellow list, where the pages are ordered LRU but are given a second chance to be brought back; that's why it's called the second chance list. And you don't have to scan through the second chance list, because I'm going to manage it as an actual LRU list. So this will just be a linked list that keeps track of the old end and the new end. And so you notice that all I really have to do is: whenever I take something out, I have to be able to close up the list, and whenever I put something in, I have to be able to put it at the new end of the LRU. And the way you know that an item is in the second chance list is because, A, you got a page fault, so it's not in the green list, and B, remember that database I mentioned earlier where you keep track of everything? Well, from that you know it's in memory as opposed to on disk. So this has some data structures that keep track of where everything is; there's a sketch below. Okay. Now, how many pages do I put in the yellow list versus the green? If I put nothing in the yellow list, then this goes back to FIFO, because the green list is FIFO. If I put all of the pages in the yellow list, I get LRU, but I get a page fault on every reference. So the expense of managing this as 100% LRU is that I get a page fault everywhere. And I can decide how much of the green list to have in order to avoid those page faults, so I pick an intermediate value. The pros of this are that there are fewer disk accesses than the emulated clock might have: a page only goes to disk if it's unused for a very long time. The cons are that there is a bit of increased overhead from trapping to the OS. And with page translation, I can basically adapt to any kind of pattern the program makes; later, we'll show you how to use page translation and protection to share memory between threads.
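Here is a rough sketch of that swap logic. The array-based lists, their sizes, and the stubbed mapping and disk helpers are all illustrative assumptions; a real implementation would use linked lists hanging off the per-page descriptors, as described above.

```c
#include <assert.h>
#include <string.h>

/* Second chance list sketch: green pages are mapped valid and managed FIFO,
 * yellow pages are resident but mapped invalid and managed LRU. */

enum { GREEN_MAX = 8, YELLOW_MAX = 32 };     /* hypothetical sizes */

static int green[GREEN_MAX],   green_len;    /* green[0] is oldest (FIFO)        */
static int yellow[YELLOW_MAX], yellow_len;   /* yellow[0] is least recently used */

static void map_valid(int page)       { (void)page; /* stub: set P = 1 in the PTE */ }
static void map_invalid(int page)     { (void)page; /* stub: set P = 0; page stays in DRAM */ }
static void fetch_from_disk(int page) { (void)page; /* stub: real disk I/O */ }

static void remove_at(int *a, int *len, int i)
{
    memmove(&a[i], &a[i + 1], (size_t)(*len - i - 1) * sizeof a[0]);
    (*len)--;
}

/* Handle a fault on `page`, which is either on the yellow list or on disk.
 * Assumes both lists have already been populated. */
void second_chance_fault(int page)
{
    assert(green_len > 0 && yellow_len > 0);

    int on_yellow = -1;
    for (int i = 0; i < yellow_len; i++)
        if (yellow[i] == page) { on_yellow = i; break; }

    if (on_yellow >= 0) {
        remove_at(yellow, &yellow_len, on_yellow);   /* second chance: no disk I/O */
    } else {
        remove_at(yellow, &yellow_len, 0);           /* evict the LRU yellow page  */
        fetch_from_disk(page);                       /* and read the new page in   */
    }

    /* Demote the oldest green page to the new (most recent) end of yellow. */
    int demoted = green[0];
    remove_at(green, &green_len, 0);
    map_invalid(demoted);
    yellow[yellow_len++] = demoted;

    /* The faulted page joins the new end of the green FIFO, mapped valid. */
    map_valid(page);
    green[green_len++] = page;
}
```

Note how the sizes of the two lists stay constant across a fault, which is exactly the knob described next: the split between green and yellow decides how close you get to LRU versus how many of these fast faults you take.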
And that's something we'll talk about a little later. The interesting thing I want to point out here is that this second chance list was used in the original VAX operating system, and there's some funny history there. Strecker, who was the architect of the VAX (look it up on Google; you'll find it's a very famous architecture from Digital Equipment Corporation), asked the OS people, do you need a use bit? And they said no. And then when they got around to trying to implement replacement policies, it was like, oops, yeah, we really did need a use bit. At that point, Strecker got blamed for screwing up the architecture by forgetting to put a use bit in, but in fact he was told he didn't need it. The VAX operating systems folks came up with the second chance list algorithm as a way of getting around the missing use bit. So even though you can't do the clock algorithm, you can do the second chance list algorithm, which is still pretty good. All right. Now, bear with me for just a moment. The clock algorithm as I've been describing it (which, by the way, we can use on an x86 processor, because we have use and modified, or accessed and dirty, bits) has a single clock hand, and you advance it only on page faults. So that means that at the moment I have a page fault, I've got to go run the clock algorithm, find a page, maybe push it out to disk because it's dirty, and work my way through that clock hand to find a free page so that I can start my access to the disk to pull the data into DRAM. That sounds like a dumb idea, and in fact it's not the way people do it. What happens is there's a free list, and that free list is filled up by the clock algorithm. There's a daemon in the background that looks for free pages to keep the free list full. And things that are put on the back of the free list, if they're dirty, get written out to disk. So as long as they've been written to disk by the time they work their way up to the front of the free list, any time I get a page fault and need a new page, I just pull one off the head of the free list, because it's clean. This idea of a background clock algorithm is what's really done in modern operating systems; they often call it the pageout daemon. The dirty pages get paged out by the time they reach the head. And just like the second chance list, if it turns out I have a page fault that needs one of these pages, then I just pull it back off the free list and put it back into the clock, as if nothing had happened. So all of these things in the free list are second chance pages; I could probably color them yellow if I wanted to be consistent with the second chance list algorithm. The advantage here is that it's much faster on a page fault: you can always grab a page, or several pages, immediately. There's a sketch of this below. Now, the last thing I wanted to say here, and then we'll finish for tonight and pick up on Monday, is about what happens when you evict. Oh, and is the free list separate from pages in memory? That's the question in the chat. No, these are still in memory. They're in DRAM, but they're marked as invalid. They're not in the clock; they've been taken out of the clock, put into this free list, and marked as invalid. And the reason it's a second chance list idea is that we can pull them back in if we need them.
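Here is a sketch of that pageout-daemon idea: a background loop that runs the clock to keep a free list topped up, starting writebacks for dirty victims, so the page-fault path just pops a clean frame. All the names, the threshold, and the stubbed helpers are assumptions for illustration, not any real kernel's interfaces.

```c
#include <stdbool.h>
#include <stdlib.h>

struct free_frame {
    int frame;                       /* physical frame number                     */
    struct free_frame *next;
};

static struct free_frame *free_head, *free_tail;   /* head = oldest, hence cleanest */
static int free_count;
#define FREE_TARGET 32                              /* hypothetical low-water mark   */

static int  clock_find_next_victim(void) { return 0; }               /* stub: the clock hand  */
static bool frame_is_dirty(int frame)    { (void)frame; return false; } /* stub: PTE D bit    */
static void start_writeback(int frame)   { (void)frame; }             /* stub: async disk write */

/* Background daemon: refill the free list whenever it runs low. Dirty victims
 * have their writeback started here, so they should be clean by the time they
 * reach the head of the list. */
void pageout_daemon_tick(void)
{
    while (free_count < FREE_TARGET) {
        int victim = clock_find_next_victim();
        if (frame_is_dirty(victim))
            start_writeback(victim);
        struct free_frame *n = malloc(sizeof *n);
        if (!n)
            return;                              /* try again on the next tick */
        n->frame = victim;
        n->next = NULL;
        if (free_tail) free_tail->next = n; else free_head = n;
        free_tail = n;
        free_count++;
    }
}

/* Page-fault path: just pop a clean frame off the head of the free list. */
int grab_free_frame(void)
{
    struct free_frame *n = free_head;   /* assumes the daemon has kept this non-empty */
    free_head = n->next;
    if (!free_head) free_tail = NULL;
    free_count--;
    int frame = n->frame;
    free(n);
    return frame;
}
```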
So when you evict a page frame, one of the things that you may not have thought about is that you actually have to figure out all of the processes that point to that page frame. And that gets hard in the presence of shared pages, because when we fork processes, we have shared memory: there are multiple processes whose page tables all point to the same page. And so there's something called a reverse mapping mechanism, which has to be very fast and basically lets us go from a physical page back to all of the virtual addresses and page table entries that map it. There are many ways to do this. You could have a linked list of page table entries for every page descriptor, but that can be expensive. Linux groups objects together to provide a much faster way of going from physical to virtual and finding all the processes that own a page. Okay. All right. So we'll end there for now. We talked a lot about different replacement policies. We talked about FIFO, MIN, and LRU as kind of idealized policies: FIFO being simple to think about but just a bad policy all around; MIN being "replace the page that won't be used for the longest time in the future"; and LRU being kind of an ideal prediction based on the past that we can't quite implement. We talked about the clock algorithm, an approximation to LRU where we arrange pages in a circle and use the hand to find an old page, not the oldest page. We talked about the nth chance algorithm, a variation that lets us divide the pages into multiple groups instead of just two. We talked about the second chance list algorithm, another approximation of LRU that was used on the VAX when you don't have a use bit. And next time we'll start talking about the working set a little bit more, to understand better how to figure out how much memory to give each process. All right. I think that's good; our time has expired here. I hope you guys all have a great weekend. Good luck studying. And I guess we will see you on Monday. Ciao, everybody.