All right. OK, everybody. Let's get started. So we're doing a format change in the lectures from now on. We're introducing some mini quizzes so that you can test your understanding, and we can get a sense of what people are understanding. They're simple true/false questions, not graded, and they'll give you immediate feedback. And they're completely separate from the quizzes in section. So let's look at the material from last time.

All right, so first question. Associative caches have fewer compulsory misses than direct-mapped caches. True or false? False? OK, why? Because for the compulsory ones, the organization doesn't matter, right? They're forced on you by the fact that you're bringing the content in. All right, two-way set-associative caches can cache two addresses with the same cache index. True or false? What's that? True. All right, yes. OK. Yep, that's the point. They give you two options for getting something with the same index into the cache. Write-through caches: a read miss can result in a write. All right, OK, for which type of cache is that true? Write-back. OK, very good. OK, TLB, read this one carefully. The TLB caches translations to virtual addresses. Good trick question. OK.

All right, so let's quickly review multi-level tables again and look at their limitations, and we're going to look at the last fairly widely used method for doing address translation. So you've seen these before. The idea is that when you have a virtual address like this with a lot of page bits, so there's 20 bits of page address there, doing that in a single table is going to be impractical. You'd need several megabytes of page table, and you normally don't need it, because you're probably using addresses that are significantly shorter most of the time. So by using the two-level table, you can make it sparse. You don't need to use all of the possible values of the first index, just the ones you're actually using.

All right, so that's familiar, and people have taken that idea pretty far in practice. So on the x86-64, which has a 48-bit basic virtual address, you still keep the page size the same, a 4 KB page with 12 bits of offset, but you actually build a four-level table for the rest of the page bits. So there it is; it's what you'd expect, just four levels. Now in this case, because the addresses are not 32-bit anymore, you need a longer page table entry, so these are 8 bytes. And, well, what else changed between this and the previous slide in terms of the address? What do you notice up here? Yeah, we lost a bit. And why did we lose a bit? Apart from the fact that it does fit in 48, there's another reason. Well, OK, how big are these guys then? How big are those page tables? How many bytes? Yeah, they're one page, they're 4 KB. OK, so that makes them easy to page in and out as well. All right, and the actual number of physical address bits is processor dependent; it can be more or less than the virtual address width.

All right, so yeah, question, go ahead. Right, because the point of this is we're dealing with a 48-bit virtual address and presumably a 40- to 50-bit physical address, and these entries have to store physical addresses, right? These are the page tables really in memory. So you need at least six bytes, and there's usually other stuff in the entries. So if they're 8 bytes, we've basically doubled the size of the page table entries compared to the 32-bit processors, so each table holds half as many entries and we just lose one of these index bits. It's convenient with 48-bit addresses.
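To make the four-level walk concrete, here's a minimal sketch of how a 48-bit address splits into four 9-bit table indices plus a 12-bit offset, and how the walk would proceed. The table representation and function names are made up for illustration, not a real MMU interface.

```python
# Sketch: splitting a 48-bit x86-64-style virtual address into four 9-bit
# table indices plus a 12-bit page offset. 4 KB pages, 8-byte entries,
# so 512 entries per table; names here are illustrative only.

def split_va(va):
    offset = va & 0xFFF                          # low 12 bits: byte within the 4 KB page
    idx = [(va >> (12 + 9 * level)) & 0x1FF      # four 9-bit indices
           for level in (3, 2, 1, 0)]            # top-level index first
    return idx, offset

def translate(va, top_table):
    """Walk a 4-level table represented here as nested dicts (hypothetical)."""
    idx, offset = split_va(va)
    table = top_table
    for i in idx:
        entry = table.get(i)
        if entry is None:
            raise KeyError("page fault: no mapping")   # would trap to the OS
        table = entry                  # the last level yields the physical frame number
    return (table << 12) | offset      # concatenate frame number and offset
```

With 8-byte entries, 512 of them fit exactly in one 4 KB page, which is why each index field is 9 bits rather than 10.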
Works out nicely. Well, the logical extension would be, if we had a true 64-bit address, which they have on the Itanium architecture, it seems to imply a six-level page table. What's bad about that? What's that? Slow, yeah. Yeah, slow and kind of wasteful too. Down here with these first two blocks of page address, we've got about a gigabyte of addressable memory, and not many processes would use more than that. So basically you'd have a chain of page tables with one entry each and everything else null. So there's something more economical: that approach is too slow and wastes a lot of table space. And the idea that's used instead in that architecture is an inverted page table, an IPT.

Here the idea is to base the table on the physical page addresses instead of the virtual ones. So it's a table going backwards from physical memory to virtual memory. So if you imagine a process has some virtual memory pages there, they're mapped somehow into physical memory like this. Let's say VM page 0 is actually at address 0 in physical memory. Then the entry at address 0 in the IPT is going to point to, or rather contain, the virtual address of that page. So it's similar to a forward table. This is an address, a page address actually; these are all page addresses. They're not necessarily in the same order, but they're indexed now by the physical page addresses. So you can see in hex, we're stepping down by page boundaries, and so the page address is just the first digit of that hex number. OK, so far so good.

So the issue, though, obviously, is that we don't need to go backwards from physical memory addresses to virtual memory. The processor is going to have virtual memory addresses that we want to map to physical addresses. So to do that, it's typically done with a hash table, hashed on the virtual page number here. That gives you a hash, and it's arranged so that it hashes to one of the physical page table slots here. So for instance, for virtual memory page 2, by definition it was here because it hashed to that location. And this page in physical memory is here exactly because its address hashes to this location, OK? So that's the idea. The address here is just the table address. So this value here must have hashed to three, and that tells us that we have to use physical page three, and the virtual memory page is actually at address 0x3000, OK? So there's obviously a few problems with this, but there are solutions for them. But that's the basic translation process, yeah?

Well, this is probably the first step, because the "somehow" of how they got there is based on the hash. So in reality, I'm starting with something like this. I'm starting with a virtual address; I'm going to hash it into this table, and initially that would be empty. Once it hashes there, if there's nothing currently in that physical memory location, then basically that page is swapped in to address number three, or 0x3000 here, and it occupies that frame. So the hashing is how you figure out where to put things, and the location in physical memory is just defined by that hash function.

So among the problems with this, the limitations, are that because there's a unique virtual page associated with each physical page, you can't share pages, so you've got to do something else for that. You can see here the physical page addresses. You also have to include the process ID to specify which of the processes currently running actually owns that physical memory page.
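Here's a minimal sketch of that lookup, hashed on the virtual page number together with the process ID (which comes up in a moment), with linear probing standing in for whatever chaining or rehashing scheme is actually used. The names and table size are just for illustration.

```python
# Sketch of an inverted page table: one entry per physical frame, hashed on
# (process id, virtual page number). Linear probing is an assumption standing
# in for the real chaining/rehash scheme; NUM_FRAMES is a toy value.

NUM_FRAMES = 16

ipt = [None] * NUM_FRAMES        # ipt[frame] = (pid, vpn), or None if the frame is free

def frame_for(pid, vpn):
    start = hash((pid, vpn)) % NUM_FRAMES
    for step in range(NUM_FRAMES):               # probe at most once per frame
        frame = (start + step) % NUM_FRAMES
        if ipt[frame] == (pid, vpn):
            return frame                         # hit: this frame holds the page
        if ipt[frame] is None:
            return None                          # miss: not resident, so page fault
    return None                                  # table full, also a fault

def install(pid, vpn):
    """Place a page in the frame its hash points at, probing past collisions."""
    start = hash((pid, vpn)) % NUM_FRAMES
    for step in range(NUM_FRAMES):
        frame = (start + step) % NUM_FRAMES
        if ipt[frame] is None:
            ipt[frame] = (pid, vpn)
            return frame
    raise MemoryError("no free frame; a replacement policy would run here")
```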
OK, yeah. Yeah, so that's a very good point. So in fact, typically the process ID is used in addition as part of the hash, so you get a different hash per process. But that's a very good observation. Yeah, so on the bright side, though, we have a table here whose size is linked directly to the size of physical memory. So this will always fit, and in fact it always consumes a small fraction, about a thousandth, of physical memory. All right, so the good thing is it scales well with memory size. Most of the time, unless the hash table fills up, it's going to be fast: it just requires two memory accesses, one into the IPT and then one to the address the IPT gives you. You shouldn't need to page it out because, again, it's just not consuming a lot of resources; in fact, it would break if you did have to page it out, because you wouldn't be able to tell what was in physical memory. And the hash function can be fast if it's implemented in hardware, as it is.

But there are some difficulties. We saw one of them: you can't share pages. The other one is that collisions can happen, and so you've got to manage those, say, by chaining, which means if something is already using the slot that you want to go to, you have to either step by a fixed amount or rehash. And because of that, by the way, that's the reason that as the hash table fills up it's going to get slower: there's a higher chance of trying to put something into a location that's already occupied and having to step down a few times. OK.

All right, so let's review some of the concepts from address translation. Going back to the beginning: paging does not suffer from external fragmentation, true or false? All right, that's kind of the point. Yeah, it's a solution of using fixed blocks so that you don't have fragmentation. What about: the segment offset can be larger than the segment size? False? Yeah, I mean, false unless you want a fault, unless you want to throw a trap. All right, and to compute a physical address, a physical page address, you add the physical page number and the offset. All right, good. No, you're concatenating them; addition is the method used for segmentation. OK, uniprogramming doesn't provide address protection. We've more or less defined it that way, yeah. The virtual address space is always larger than the physical address space. Good, yeah, they can go either way. All right, what about: an inverted page table keeps fewer entries than multi-level page tables? I heard one. It's actually a bit of a mind cruncher. Yeah, and unfortunately, it's not generally true. The inverted page table always keeps a fixed number of entries. The multi-level page table is usually smaller, because its entries normally map to physical pages, and each process normally has a subset of the physical pages, so it's generally less.

OK, all right, so we're going to get into caching policies pretty soon. To quickly review the whole translation process: we start out with a virtual page number and an offset. We can go through the usual hierarchy of page tables that are grayed out there. But from last time we also know that we can use a cache of the page number mapping, which is the TLB, and that will get us much more quickly to a physical address. That's very important with these increasingly hierarchical page tables. OK, and in addition, sometimes we go to physical memory; ideally, though, we'd like a lot of the time to go to cache instead.
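As a small sketch of that fast path, here's the TLB consulted before the full walk; it reuses the hypothetical `translate()` from the earlier sketch, and a real TLB is of course a small hardware associative memory, not a Python dict.

```python
# Sketch of the translation fast path: check the TLB first, fall back to the
# multi-level walk on a miss. translate() is the hypothetical walker above.

tlb = {}     # virtual page number -> physical frame number (toy stand-in)

def access(va, top_table):
    vpn, offset = va >> 12, va & 0xFFF
    if vpn in tlb:                                  # TLB hit: no table walk needed
        return (tlb[vpn] << 12) | offset
    pa = translate(va, top_table)                   # TLB miss: walk the tables
    tlb[vpn] = pa >> 12                             # cache the translation for next time
    return pa
```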
And that process involves extracting from the address an index into the cache, an offset into the block of the cache entry, and then a tag to verify that you've got the right element in the cache. So that should be familiar. The index is the part of the address that you use to access the cache; in fully associative caching there isn't one, the tag is essentially the whole thing, but in n-way associative caches you need it, and then you verify that you've got what you think you've got. The byte offset is just the address into the block, or cache line, since the cache always caches things bigger than a byte. And the tag is the information that's stored in the cache to verify which entry it is.

OK, so now it's an appropriate time to consider how we decide what to throw out when we have to throw something out. So we'll look at three different algorithms: FIFO, LRU, and the clock algorithm. So far we looked at caching as a means of giving faster access to memory. The trouble is, though, that processes use a lot of data. Fortunately, code spends a large amount of time in a small fraction of the code and also in a small fraction of the data, so we can accelerate things by using caching. We already saw that in the context of main memory. But in addition, we can use an expanded virtual memory that's mapped onto secondary storage and do caching at that level, which is demand paging. So memory caching is a way to accelerate accesses to main memory; demand paging is a way to accelerate access to virtual memory that's mapped onto the file system. There's even more reason for doing that, because the gaps in access time between main memory and disk or SSD are much larger. So demand paging is a kind of caching, and we should ask about the design choices that need to be made.

What block size? It's going to be a page. What about the organization of the data, what kind of mapping should we use? I'll just give you that one, since we already brought it up: fully associative, because we like flexibility, and it turns out the algorithms we're going to use for deciding where to map things are general enough that there are no hardware or associativity constraints. Fully associative is the most flexible mapping, so that's the one we're going to use. All right, how do we find a page in the cache? With a paging algorithm? All right, sorry. We have a TLB, and we still have the page table. So we're using the same infrastructure that we already developed for mapping virtual memory to physical memory; you can also use it to map to the disk image of virtual memory. All right, replacement policy. We're going to explain this: it's going to be based on least recently used. That's the best you can do deterministically, the best you can do without being clairvoyant, as far as replacing pages. All right, so what should happen on a miss? Yeah, you've got to read the appropriate page into the cache. Normally it would just come from secondary storage. In principle, you could have two levels if you wanted to, SSD before disk, because SSD is actually a couple of orders of magnitude faster. Should we use a write-through or a write-back cache, do you think? What's that? Yeah, write-back, okay, good. Why? Because writing to disk is very slow, right? So we don't want to be doing it too often. Which requires us to keep track of when pages have been written to, so that we can remember to write them out. Okay.
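Going back to the index/offset/tag split at the top of this passage, here's a tiny sketch with made-up sizes (64-byte lines, 128 sets); a fully associative cache would skip the index entirely and compare tags only.

```python
# Sketch of the tag / index / byte-offset split, for a toy cache with
# 64-byte lines and 128 sets (both numbers are assumptions for illustration).

LINE_BYTES = 64          # bytes per cache line  -> 6 offset bits
NUM_SETS   = 128         # sets in the cache     -> 7 index bits

def decompose(pa):
    byte  = pa % LINE_BYTES                  # which byte inside the line
    index = (pa // LINE_BYTES) % NUM_SETS    # which set to look in
    tag   = pa // (LINE_BYTES * NUM_SETS)    # compared to confirm a hit
    return tag, index, byte
```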
Okay, so for demand paging, we need to keep track of which pages are actively mapped in memory currently, and we'll be using a valid bit to do that. When the valid bit is not set, that means the page is not currently in physical memory, and we need to fetch it from disk if we want to use it. So there's an address there that may be the same or may be a different address, depending on how the mapping occurs to the virtual memory image on disk. Okay.

All right, so when the user references a page whose invalid bit means it's not in memory, a sequence of things happens. The memory management unit is normally checking for that, and so that causes a trap. You want that to happen fast, so it's supported in hardware. This question? Yeah, I mean RAM, yeah. So when the user references something with an invalid bit in the page table entry, you get a trap, it's a page fault, and the operating system tries to fetch the appropriate page. Before it can bring something in, it's got to throw something out, assuming the available physical memory is full, so we'll assume that it is. So you've got to find an old page to replace. And since we're using write-back, you've got to check whether the old page to be replaced is dirty; if it's been written to since it was read, it needs to go back to disk first. We need to change the page table entry and the TLB entry that was pointing to that page to be invalid. Then load the new page into memory from disk, update the page table entry, and, let's see, invalidate the TLB for the new entry, and basically go back to the original faulting location, so we continue the process.

All right, so that's a software implementation of the cache; it's doing very similar things to what the hardware was doing for memory. So the TLB for the new page is going to be updated when you actually access the new page, when the thread continues. And because the fetching of pages is inevitably going to take a long time, when this thing traps the operating system will normally pass execution off to another process that's ready to run, so that it can keep using the CPU.

Yep, all right, well, let's see. So, update page table entry. So we've loaded it into memory. We haven't yet, well, let's see. So the hardware is responsible for putting a valid address in the TLB, which means you have to access it, I think, through the table. I mean, you basically have to read the page table entry, I think, before the TLB will be able to map it. And then it says to invalidate the TLB for the new entry... All right, I'm sorry, I should have checked that more carefully. I'll have to get back to you on that. Yeah, so what's the method there?

All right, so to review those steps: there's a reference from a user process into the page table. When the fault occurs, it implies that the entry had an invalid bit, meaning the page wasn't currently in memory. That generates a hardware trap back into the OS. It resolves somehow from the page table entry where the appropriate page is on backing store, and pulls it into a free frame of physical memory; we've left out the steps of ejecting something. It pulls it into a free frame and updates the page table entry, and the TLB updating happens in the background as described on that previous slide.
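To make the fault-handling sequence above concrete, here's a minimal sketch. Every helper in it (pick_victim, write_to_disk, read_from_disk, page_table, tlb) is a hypothetical stand-in, not a real kernel interface.

```python
# Sketch of the page-fault sequence just described; all names are stand-ins.

def handle_page_fault(vpn):
    frame, old_vpn = pick_victim()            # chosen by the replacement policy
    old_pte = page_table[old_vpn]
    if old_pte.dirty:                         # write-back: flush dirty victims first
        write_to_disk(old_vpn, frame)
    old_pte.valid = False                     # old mapping no longer points at RAM
    tlb.pop(old_vpn, None)                    # drop any cached translation for it

    read_from_disk(vpn, frame)                # load the faulting page into the freed frame
    pte = page_table[vpn]
    pte.frame, pte.valid, pte.dirty = frame, True, False
    # return from the trap and re-run the faulting instruction;
    # the TLB gets filled when the access is retried
```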
All right, so we can compute effective access times for page accesses, just as we could for cache accesses, and it's going to be basically hit rate times hit time plus miss rate times miss time. The difference now from before is that there are orders of magnitude of difference between those two, so it's a lot more sensitive to the probability of misses. Memory access is a couple of hundred nanoseconds typically; disk accesses are five to ten milliseconds. So p is the probability of a miss, hopefully small, one minus p is the probability of a hit, and we compute the expected value: hit probability times hit time plus miss probability times miss time. And basically one minus p is close enough to one, so we've got the basic memory access time plus a probability times eight million nanoseconds. So even if only one out of a thousand accesses causes a page fault, we get an effective access time that's a factor of 40 or so higher than the basic memory access time. So we would really like to avoid that, but it's pretty hard. If you imagine instead that we wanted to make the misses effectively invisible, i.e. we wanted the slowdown to be less than 10%, that requires the probability of a fault to be below one in a hundred thousand, or around one in 400,000 in fact. It's pretty hard to get those kinds of numbers, but getting close is important. So the page replacement strategy is very important, because if you throw the wrong thing out, you're going to get a fault again.

All right, so which things lead to misses? Compulsory misses are the ones caused by pages being read in for the first time. Usually there's not too much you can do about them; some advanced systems try to learn what pages are needed and prefetch them, but otherwise it's very hard. Capacity misses are where there's just not enough capacity in physical memory to store everything. There are overall capacity misses that require overall increases to memory, but at the same time we also have some control over how we allocate memory to processes, and we have the option of allocating more memory to greedy processes that are trying to use more, and that will give us a better, more balanced allocation overall. So page allocation is quite different from scheduling, where we're trying to be fair to everybody; in page allocation there are advantages to basically rewarding greedy processes. All right, conflict misses are caused by address conflicts, and because we're assuming an associative mapping between virtual and physical addresses, we don't have this issue. The ones that we have some control over are the policy misses. Those are based on our decisions of what to throw out, and so we design the best policy we can to minimize the number of them.

All right, so we've already argued that misses are extremely expensive, and the policy misses will be controlled by the policy we use for ejecting pages. So let's start looking at some of the strategies for doing that. Now, if we were doing scheduling, we might think of something like first in, first out, which means throwing out the oldest page. That's essentially being fair to each page, giving it an equal amount of time in main memory, but it's a terrible strategy in practice, because the use of pages is very non-uniform. If pages are being heavily used, they're going to be used heavily again and cause you page faults if you throw them away. So in fact the successful strategies do try to measure and reward processes that are being greedy, or pages that are being used a lot.
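Going back to the access-time numbers above, the arithmetic works out like this; the 200 ns and 8 ms figures are the ones used in the lecture.

```python
# Effective access time: EAT = hit_time + p * miss_penalty (since 1 - p is ~1).
hit_time     = 200           # ns, main-memory access
miss_penalty = 8_000_000     # ns, i.e. 8 ms to service a page fault from disk

p = 1 / 1000                                   # one fault per thousand accesses
eat = hit_time + p * miss_penalty              # = 8,200 ns, roughly 40x slower than 200 ns

# To keep the slowdown under 10% (EAT < 220 ns), we need
# p * 8_000_000 < 20, i.e. p < 1 / 400_000.
```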
So an optimal strategy is MIN, which involves replacing the page that is being used the least in a specific sense: it's the one that won't be used again for the longest time. It's intuitive that that would be optimal, but unfortunately it's not practical. On the other hand, it does get used in theoretical analyses of caching protocols, because it provides the best possible outcome, and you can easily specify what it would do on a particular sequence and compare theoretically what it does with some heuristic scheme. Another option is random replacement, which is very easy to implement, and it has the virtue that it's not going to suffer from pathological cases: it's going to behave in more or less similar ways regardless of how the inputs are arranged. So it's not repeatable, but it's simple and very fast, and in fact it's used in TLB replacement. Okay.

All right, so let's look at how we implement FIFO. We want to keep track of the order in which pages were read in, and we can do that with a queue, which you can implement with a linked list. The head of the queue is the oldest page, the tail is the newest page, and when you add something, you add it to the tail. Over time the oldest pages will make their way to the front, and you'll have a handle, from the head of the queue, on the page that's been in the longest, in other words the first page to be added. So if the list gets longer than the capacity of your memory, then you can eject from the head. All right, and we'll see in a minute, through an example, how well that does. Generally not very well.

LRU is an approximation of MIN: since you can't look into the future, you look into the past. Intuitively, pages that are being used a lot, which are going to occur soon in the future, will probably have occurred recently in the past. And the page that you'd want to eject, the one whose next use is furthest in the future, is often also the page that was accessed least recently in the past, the one furthest away from you in time in the past. And that you can measure. So we're assuming that the behavior of the program is consistent over time in terms of frequency of accesses, so the past predicts the future. Okay, and it does turn out to be a reasonable approximation to MIN.

So the implementation seems simple, but there are some significant performance issues. Again we could try maintaining a queue. The most recently used page is just the current page; you want to put that at the tail of the queue. The least recently used page, if all the pages are distinct, is just the oldest page. So far it just looks like the FIFO queue: you can pull from the head, add to the tail, and you'll get FIFO behavior. But what we haven't factored in so far is pages that are already in the cache. So here we have page two. It was used some time ago, so it's near the head, almost the least recently used page, but we've just accessed it again. So if we're going to maintain the structure in least-recently-used order, one of these entries has to go away. Which one should go away? Yeah, I think I heard that one. So yes, the reality is it's actually currently the most recently used page, and this entry in the middle, which suggests that it's almost the least recently used page, is wrong. So we've got to remove it from the middle of the list and add it to the tail instead. So you can implement LRU using a linked list like this; it has to be a doubly linked list so you can pull from the middle of it.
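A minimal LRU sketch: Python's OrderedDict is a hash table plus a doubly linked list under the hood, which is essentially the structure just described, so it stands in for the explicit pointer surgery. The class and names are illustrative, not how a kernel actually stores this.

```python
from collections import OrderedDict

# Minimal LRU sketch: move_to_end() is the "unlink from the middle, relink at
# the tail" step, and it has to happen on every access, not just on faults.

class LRU:
    def __init__(self, nframes):
        self.nframes = nframes
        self.frames = OrderedDict()          # vpn -> frame contents, oldest first

    def access(self, vpn):
        if vpn in self.frames:
            self.frames.move_to_end(vpn)     # hit: now the most recently used
            return "hit"
        if len(self.frames) >= self.nframes:
            self.frames.popitem(last=False)  # evict the least recently used (head)
        self.frames[vpn] = None              # fault: bring the page in at the tail
        return "fault"
```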
Yeah, I've heard of that. I know people have tried n-gram analysis, which is basically looking at sequences of pages. Yeah, it works sometimes, but it's usually not the kind of thing that you want to build into an operating system. But yeah, you could certainly try to be a lot smarter. All right, so let's continue with this description though.

Okay, the big difference between FIFO and LRU, apart from the fact that you've got a doubly linked list with a few more pointers to update, is that we made an edit on this list based on page use, not just page entry time into RAM. So FIFO updates when there are page faults, right? That's the only time it has to update. LRU has to update on page accesses, which is all the time, or at the very least it has to be updated on new page accesses. And that's the big problem: all of this overhead somehow has to be built fairly directly into the compiled code so that these checks are happening. So we also want to avoid doing these pointer updates. In practice people use simplified versions of LRU, and we'll talk about those next.

Okay, so first let's look at how FIFO behaves on some example data. Here are some incoming page requests, and we're maintaining a simple physical memory with, say, three frames in it, and we've got to map these pages into the three frames. So let's look at FIFO replacement. All right, let's get the lasers ready so we can do this simulation. I'll start it off; let's bring in the initial pages. All right, so there are the page requests. A is already in cache, so let's just put it there. B is in cache. All right, what about D? Where is it going to go? Why are these lasers warming up? Go ahead, shoot. Yeah, all right. Good, okay. All right, first thing, good, we have a consensus. So yeah, we're going to eject the page in frame number one. So A is going out and we've got D there. All right, so A now, what's going to happen? We just kicked out A, so where is it going to go? Good, the first page that's actually in: we've got three pages in the cache, and the one that went in first was B. Great, okay. Next, D, all right, it's already in. B we kicked out, so where is it going to go now? All right, great. So that's FIFO. It's not very smart, though. I mean, we ended up doing how many, four page faults basically after the initial load, seven faults in all. Okay. And because we couldn't look into the future, we ended up replacing an item, A, which we're going to use right away. So MIN obviously would do a lot better in this case.

All right, so let's look at MIN now. In this case you've got to look at the sequence that's coming. So let's take a minute; all right, somebody's already got it. So we're at D, okay, and looking at the requests coming in the future: we've got A, B and C in the cache, so which one is furthest in the future? C, good, all right, so we've got C. Okay, now let's see, A is there, D is there, B, C, all right, well, are we actually going to run out of future right now, or are we? Let's see, C, C, C, B. Yeah, well, okay, it's not B, so yeah, A, excellent. All right, we actually can solve that. All right, thank you. Interestingly enough, LRU, we said, is typically a good approximation to MIN, and in this case it would have actually given us exactly the same answer, but in general it won't, and we can make LRU perform very badly. All right, so what's going to happen here? Yep, so I'll just put a few more up here. All right, what's B going to kick out?
C? Yeah. Does anyone see what's happening here? How did they come up with this sequence? I think you can probably see it after a while: you'll notice we're actually getting a page fault on every access. So what does that mean? Yeah, out of the four pages they're basically picking the one that's not currently in the cache, so that's going to generate a page fault every time, and in fact LRU basically reduces to FIFO here. By contrast, if we were able to see into the future, this is the result here: instead of 12 page faults we've got five, no, six, so it's only making half as many mistakes. So clearly the gap between LRU and MIN can be arbitrarily large; luckily, most of the time it's quite good.

Here's a representative graph as we change the size of the cache and look at the number of page faults, and you can see they're decreasing quite sharply for this particular workload. So increasing memory size, cache size, is normally going to help in practice, and we would hope in addition that that is true of specific policies as well. In other words, it would be desirable if, for any of these algorithms, you could show that when you add more memory the miss rate actually goes down. Interestingly enough, it's true for LRU and for MIN, but not for FIFO. So here's a counterexample. Again, the anomaly here is that we're going to add a cache location, so we're giving the algorithm more places to put pages, but nevertheless we're actually going to get more misses. So here's the same sequence in both cases; here we've got three entries in the cache, here we've got four of them, and it's easy to count the hits and the misses: we've got three hits here, but actually only two hits here. So there's an extra miss in the second cache, which has one extra entry.
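A small FIFO simulator reproduces the effect. The reference string below is the classic textbook one (an assumption, not necessarily the exact sequence on the slide): with 3 frames FIFO takes 9 faults, and with 4 frames it takes 10.

```python
# FIFO simulator. On the classic Belady reference string, 3 frames give
# 9 faults and 4 frames give 10, i.e. more memory but more misses.

def fifo_faults(refs, nframes):
    frames, faults = [], 0
    for page in refs:
        if page not in frames:
            faults += 1
            if len(frames) >= nframes:
                frames.pop(0)            # evict the page that has been in longest
            frames.append(page)
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(fifo_faults(refs, 3), fifo_faults(refs, 4))   # -> 9 10
```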
So can anyone explain why that happens? The trick is that we force a miss here, basically, on this A, by kicking it out here; excuse me, we somewhat artificially choose to place this new element here, which is going to force something out, and the element we force out is the one that's coming right up on the next access. So it's somewhat artificial, but you can see how this could be done. This is called Belady's anomaly, and ideally we want to avoid it. It doesn't happen with LRU; however, it does actually happen in most of the approximations to LRU that people use in practice. So it's something to keep an eye on: you're never guaranteed that adding space to the cache is going to improve the miss rate in all cases.

All right, so quick reminders about project status. Project one: the project was due yesterday, the final design docs are due tonight at midnight, and we do want to get your feedback on performance within your group, so please be frank about who's doing what. The midterm is coming up on October 21st, in this room in the LSB. Closed book, one double-sided handwritten page of notes, no calculators, et cetera, and it's covering material right up to next Wednesday, plus projects one and two. Okay, any questions? All right, and there's a survey coming up pretty soon, maybe in a couple of lectures, so we'll appreciate your feedback. Okay, let's take a five minute break and wrap up after that.

Okay, let's continue. So we're going to look at some ways of implementing LRU efficiently, specifically some approximations called second chance and the clock algorithm. Intuitively, the essential information you want for LRU is a timestamp: whenever you have a reference to a page, you want to keep track of what the time was, and somehow keep perhaps an ordered list, some kind of easily searchable list. In reality, as page tables have become very big, this has become quite difficult to do, really impractical; very few systems attempt it. So there are a variety of approximation algorithms, and usually they use heuristics to avoid having to change pointers or move data around, and very often they use just a single bit per page to save enough information to have a reasonable policy of replacing pages that aren't heavily used. So second chance is an approximation to LRU, really a very coarse approximation, that uses a FIFO-like protocol but tries to avoid replacing pages that have been used recently.

Go ahead. I mean, you could; the issue, though, is that you'd need enough hardware to basically contain all of the page table entries, and we're talking about page tables now, for a gigabyte of memory, which are megabytes large. So there isn't a reasonable solution there, and in fact the hardware isn't handling the page tables anymore; I mean, it's handling manipulation of them, but they live in memory, they can't live in some kind of associative memory that would allow you to do that. Well, when it says timestamp the page, it doesn't literally have to live in the page; it can be in a page table entry.

All right, so our goal is to approximate LRU without a full timestamp, using just a single bit per page table entry. So there's a use bit per page, stored somewhere, and the idea is to set that bit whenever you access the page. And then when you have a page fault, and assuming you're maintaining a FIFO list of page additions, you check the page at the head of the FIFO list, which is normally the page that was first swapped in, and you look at it. If its use bit is set, which means it has been used since
it was brought in at least once, you give it a second chance: it's at least been looked at once, so heuristically you keep it around and give it another try. It goes back to the tail of the queue, which means it's not ejected yet, and you try the next page at the head of the queue, and you keep going until one of them has a clear use bit. That page has been read into the cache but hasn't been accessed since it was read in, so heuristically it's at least n steps ago, if there are n elements in the cache, that it was actually brought in. So it is a not-recently-used page, and we eject that one. Well, it's accessed the first time it's brought in, but the first time it's brought in we don't set this bit; when the bit is set, it means it's been accessed twice, initially on being brought in and then at least once again. So this is not too bad: we're only doing the move-to-tail and pointer updates when there's a page fault. The bit setting happens on the accesses, but that's a lot more lightweight.

Yep. Well, the issue is, all right, let me think. Well, the issue is that you'd need to know what was, if you're going to, so I guess I'm trying to think how you could do that, because you'd need to have a series of queued requests to bring things in, and the issue would be how you would know what to bring in. You'd either have to wait until you'd had a bunch of requests, which means you stopped everything, or you'd have to somehow look into the future. I mean, in certain circumstances you might prefetch if you see that pages are being accessed sequentially or something, but beyond that I don't think there's a reasonable way to do that.

All right, so let's look at second chance on a small page table. Let's assume that we've had this sequence of arrivals. Page B arrived, there it is. Page A arrived. This is the head of the queue here, so that's normally the oldest page. Here's the next page added, next page, next page, and in the middle there we had an access to page A, so its use bit was set at that point. So we've got four locations, the cache is now full, and another page, F, arrives. So what should happen? Well, we're going to check this one; it's an unused page, so that one is going to go away. We'll read in page F, it goes into the tail of the queue, and its bit is clear when it comes in. We just accessed it, but we have that bit ready to check for when it's accessed a second time.

So now we access page D; that's fine. Yeah, so those accesses, let's see, where do we go: A, B, C, D, page F, oh yeah, I'm sorry, access page D, actually let me go back one step. So page F arrived, okay, so we put it in. Access page D: all we did was set the bit of page D on that access. Now page E arrives, so we have a page fault, which means we pull out the page that was previously there, page A. Its bit is set, it's been used, so we push it to the tail of the queue; we're not going to throw it out yet, so the search for a page to eject hasn't finished. We try D this time; D has its bit set, so instead of ejecting it we just push it to the back, and we cleared the bit of A when we put it to the back. We should be clearing this one too, really; it just happens on the next frame. All right, so A and D, because they'd been used since they came in, were saved for later. Now we have this page C, which hasn't been used, so that's the one that's actually going to go away, and E goes to the back of the queue, ready to be checked. All right, so is that clear enough? Yeah. Yeah, so you're jumping ahead a bit, yeah, but this does have a really bad worst case: if
everything is being, say, sequentially accessed. I mean, if you have a series of accesses to everything that is in the cache and then you page fault, then you've got to go basically all the way through the queue. An access beyond the second one doesn't do anything; the bit just stays set, so it's a no-op.

Okay, so the second chance algorithm was pretty good, and we can do a slightly cleverer version of it by observing that once we have a full cache, the number of elements in this queue is exactly the size of the cache. So when we start pushing elements to the back of the queue, we're in effect just rotating this fixed-size list, and if we make the list circular, then rotating it doesn't mean anything: all we need to do is keep a pointer to the head and keep moving that around. That's the idea of the clock algorithm, and that's where the clock metaphor comes from. So again, it's just a version of second chance where we have a list of pages that fully occupies the cache. The only operations we're going to do are either removing a page and putting something at the end, and the head and the end are actually the same place if it's a circular list, or rotating an element to the back of the queue, which again is basically an in-place operation. So by making this thing circular, we avoid the need to do any editing of the links.

Right, so the clock algorithm is that implementation strategy. We arrange these pages now in a circle whose size is the number of frames in the cache, the number of frames in physical memory, I guess. On a page fault we again check the use bit, and again we're sort of at the head and the tail at the same time. If the page has been recently used, we just clear the bit and leave the page alone; intuitively, what we've done is rotate it from the head to the tail, but that's the same place. If the page that we're looking at has a zero use bit, it hasn't been used, then we really are going to eject it, and we're going to bring in a replacement candidate, which we put in the same place, and move the hand forward, which then makes the tail be the head.

All right, so let's look at this; hopefully the intuition is already there. We'll take a page table size of four, and our pointer is going to point at the oldest page. So we get a page in, B; it hasn't been accessed after its initial arrival. Page A arrives, and the pages, we're going to arrange them this time in a circle, like a clock. So we access page A, so what's going to happen? Accesses just mean setting the bits, like we did before. All right, page D arrives, so we want to add that to the circular queue. Page C arrives; we add that to the queue with a clear bit. All right, now there's a new page that arrives and the cache is full, so what are we going to do? This is the head of our queue now, and it's also the oldest page, so that one's going to go away. Now, notice that we replace this one and we move the pointer; we don't have any links, but what we've done is basically make this one be the head of the queue, and now this becomes the tail of the queue. Does that make sense? So we've done it kind of cleverly; I mean, we could have actually had our linked structure and then it would explicitly be the tail. All right, now if we access page D, that's fine; we just set D's bit.
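Before finishing the example, here's a minimal sketch of the clock policy as just described; the class and its fields are made up for illustration (a real kernel keeps the use bits in page table entries, not a Python list).

```python
# Minimal clock sketch: a fixed circular array of frames, a use bit per frame,
# and a hand. No links get edited; "moving a page to the tail" is just
# clearing its bit and advancing the hand.

class Clock:
    def __init__(self, nframes):
        self.pages = [None] * nframes    # page held in each frame (None = empty)
        self.used  = [False] * nframes   # use bit per frame
        self.hand  = 0

    def access(self, page):
        if page in self.pages:                       # hit: just set the use bit
            self.used[self.pages.index(page)] = True
            return "hit"
        while self.used[self.hand]:                  # sweep: recently used pages get another pass
            self.used[self.hand] = False
            self.hand = (self.hand + 1) % len(self.pages)
        self.pages[self.hand] = page                 # evict the unused page, install the new one
        self.used[self.hand] = False                 # bit only gets set on its *second* access
        self.hand = (self.hand + 1) % len(self.pages)
        return "fault"
```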
All right, now we have a page E that's not in the cache, so what's going to happen? Quickly, a quick summary of what's going to happen? Yeah, good, right: we're just going to skip over these and clear their bits, because as the hand goes around it's clearing bits, so the bits are representing recent activity. All right, so we clear that one, clear that one. Oops, page E arrives, uh oh. Well, sorry, it looks like we skipped a step here; the new page E should have actually gone in and replaced that one. Okay.

So, discussion of that. Is it good or bad if the hand moves slowly? Yeah, it's good. The hand moves either with page faults or with failed attempts to eject pages, so yeah, it's good if it's going slowly. What if it's moving quickly? Yeah, we're either doing a lot of replacement or we're failing to replace things. Okay. Are we doing, oh yeah, we've got time, okay.

So, all right, this seems like a nice approach; let's try to generalize it a little bit. It seems sensible to give pages one chance; it seems even more sensible to give them a few chances, and to somehow arrange the bit updating so that pages that are being more frequently accessed get their counts reset more often. That's the idea of the Nth chance algorithm. To do this we need to maintain a counter on each page, a counter of how recently it was checked. So we still have a use bit, and we again check the page at the head of the queue: if its use bit is set, we clear the use bit, and we also clear the counter. The counter is sort of counting which sweep we're in. If the use bit is zero, then we increment the counter, and if we get all the way to N, that's when we replace the page. So we're now replacing pages that have been through this process many times.

All right, so because we're incrementing the counter on each pass over a page, it takes N passes over a page to get a count of N, so something that has a count of N has been in there, unused, for a long time. So large N actually gives you an approximation to LRU, because the counter is a kind of quantized or approximate time since the page was accessed; an actual access resets the counter, so as the counter gets big, it means the page wasn't used recently. And if we use enough different values for N, we get a good approximation to LRU; in fact, N of about a thousand is almost the same. On the other hand, it's more efficient in terms of bits to use small values, and the other problem is that you might have to look a long way to find a page to evict. Another heuristic you can build into this is that, because it costs more time to repair page faults with dirty pages, since they have to be written out as well as a new page read in, it's rational to give the dirty pages extra chances: give them a tag, or use the tag that says dirty, and give them extra count before replacing them. That heuristic isn't commonly used, but for instance you can assign clean pages a count threshold of 1 and dirty pages a count threshold of 2. Okay.

So, all right, let's try to review some of the stuff that we looked at today on paging and do our final questionnaire. All right, so, demand paging incurs conflict misses, yes or no? Yeah, question? No, I don't, I think there might be a way to implement it; I think it was conceptually simpler to have the use bit. I mean, so your assertion would be that you'd somehow, yeah, can you reset it to zero, yeah, it might work. I'm not going to try to figure that out in real time; it seems reasonable, but I couldn't tell you for sure if that's going to work.
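Here's a sketch of the Nth-chance tweak on the clock sweep: a per-frame counter of sweeps since last use, reset on access, with eviction only once the counter reaches the page's threshold. The frame representation and names are assumptions, and the dirty-page doubling is just one way to encode the clean-gets-1, dirty-gets-2 example.

```python
# Nth-chance sketch: like clock, but a frame is only evicted after its counter
# of unused sweeps reaches a threshold (e.g. 1 for clean pages, 2 for dirty).

def find_victim(frames, hand, threshold=1):
    """frames[i] = {'used': bool, 'count': int, 'dirty': bool}; returns (victim, new_hand)."""
    while True:
        f = frames[hand]
        if f['used']:
            f['used'], f['count'] = False, 0             # recently used: reset and skip
        else:
            f['count'] += 1                               # one more sweep without a use
            limit = threshold * (2 if f['dirty'] else 1)  # dirty pages get extra chances
            if f['count'] >= limit:
                return hand, (hand + 1) % len(frames)     # evict this frame
        hand = (hand + 1) % len(frames)
```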
All right, let's try to answer some of these questions; we've got just enough time. So, demand paging, does that have a problem with conflict misses? No. Okay, why not? Yeah, great: it's fully associative. All right, LRU can never achieve a higher hit rate than MIN, is that true? Yeah, MIN is perfect, basically, so yeah. Let's see: the LRU miss rate may increase as the cache size increases. Okay, no. Actually, yeah, maybe I didn't clarify this enough: LRU, because it's a stack algorithm, doesn't have that paradox. I think it was written on the slide, but I don't think I said it. FIFO has the problem that its miss rate can increase as the cache size increases, but neither MIN nor LRU can have a Belady anomaly. So yeah, false. The clock algorithm is a simpler implementation of second chance. Okay. And apropos of the question that you asked earlier: if we have a cache of 100 pages, what's the maximum number of pages that second chance would look at before finding something to eject? Oh my, no, not quite; it's approximately 101, yeah, actually, because if all 100 pages have their use bits set, it won't eject any of them until it's cleared all of them and gone around to the first one again. So it's actually 101. Okay.

All right, so let's summarize. Demand paging is the approach of treating main memory as a cache for secondary storage, and it's very similar to memory caching. We described today the policies for deciding what page to get rid of, which for disk caching is extremely important because of the massive disparity between disk access time and main memory access time. And it's a transparent operation to the user: all of this is happening without awareness by the user, the application software, or the process. We talked about three policies. FIFO is a fair but very inefficient policy of simply ejecting the first page that was brought in; it's easy to implement and lightweight, though, because you're only doing updates when there are page faults. MIN is the reference algorithm that ejects the page whose next use is furthest in the future, but it's impractical. And LRU is an approximation of that which tries to eject pages that were accessed furthest in the past; the trouble is that whenever you're measuring access recency, you're making updates on accesses, not just on faults, so it's a much more frequent type of update. So we talked about two approximations to it. One of them was second chance, or Nth chance, and the other was the clock algorithm, which uses the fact that second chance is basically traversing a fixed-size queue, and if we simply wrap the queue, we don't have to edit the pointers anymore. It again involves just a use bit being updated on each page access, so it's a lot more efficient and doesn't require data structure edits. All right, that's it.