Well, at least today or Friday, we're going to finish talking about virtual memory, which is kind of wistful for me, because I like virtual memory. But assuming we get through today, then on Friday we'll start talking about disks, which are the third big unit of this class and another interesting device that the operating system has to multiplex and try to improve the performance of. The idea today is that we've come on a journey similar to the one we took with the CPU. So we started our CPU discussion — is the video recorder working? Is it on? Okay, just making sure. Posterity would hate us if we forgot to turn on the video. So with the CPU, we started with low-level mechanisms, features that the hardware provided. Then we talked about the hardware's interaction with the kernel, about some abstractions the kernel created, and some of the kernel's responsibilities. And then we got up to the very top and started talking about policies, right? Today we've built ourselves up to that same point: we've talked about the important abstractions and the pieces of hardware that help the kernel perform memory management, and now we're at the place where we have some interesting policy choices to make. So today we'll talk about some policies, particularly related to choosing pages to move to the swap file and how we do that, right? And that can have a big effect on performance. And I don't know if I announced this Monday, but it's on Piazza: we've chosen the one big final due date, which will be May 10th, a week before grades are due.
It's the Friday after the exam — the exam's on Monday, I think, or Tuesday, I can't remember, but it's all on the calendar. So if you forget, it's on the calendar. That's a little bit more time than I thought you guys would have, so that's nice. You can keep procrastinating, or do better than you would otherwise — it's up to you. All right, so any questions about swapping? On Wednesday we talked specifically about the process of moving pages back and forth to disk, the different things I have to do in order to keep my data structures up to date, some of the overhead in the process, and some ways to try to make it faster. Any questions about this stuff? All right, so who can fill in the blanks here? What's the goal of swapping? I take a portion of my disk, and if I'm good at swapping, the amount of memory on my system feels like — Greg? Yeah, it feels as big as the disk, and ideally as fast as memory, right? It feels like memory, but you feel like you have a lot more of it, and I usually have a lot of disk I can use for this purpose, so it's potentially as large as the disk. Of course, I probably want to do other things with the disk, like store all of your important photographs on it or whatever, so I usually use a piece of the disk for this, not the whole thing. But the idea is: whatever piece of the disk I use, I want that part of the disk to feel as fast as RAM. And if I don't do this well? We talked about some corner cases on — what's today? Today is Wednesday — on Monday. What happens? Actually, it feels worse than that: my memory feels potentially as small as RAM, and potentially even smaller. Essentially, it feels like maybe I don't have any RAM at all, and all I have instead of RAM is a really big, slow disk, right?
So if I do this badly, it can make my memory feel like the slowest disk, right? And that's kind of terrible. We'll talk today about a particular case where this happens. The algorithms and policies we talk about today for choosing pages to move back and forth to the swap file are essentially what move us between these two categories: if we choose the right pages, we're at the top; if we choose the wrong pages, we're at the bottom, right? So what do I need to do in order to move a page to disk? Ciroc, give me the first crack at this. There are like three things up there. So the page is in memory — let's say it's not only in memory, it's in the MMU. What are some of the things I need to do? Jenny, you want to help him? Yeah, so I have to stop the process from using this page, because of something I'm about to do. What else do I need to do, Tom? Yeah, copy the contents somewhere. The disk is where we put them when we talk about swapping, but really I could put them anywhere — I could put them in outer space if I could get them back easily, right? But I need to preserve the contents somewhere, and the place I use to preserve them is the disk. So I need to remove the translation, copy the contents, and what's the last thing I need to do here? Yeah, I'd better update my own data structures, or I'm not going to be able to find the contents later, right? So I update the page table entry, and then I'm done. So what's slow about this, first of all? What's the slow part of that process? There were three things I had to do, and one of them is not like the others. Simon? Yeah, the actual copying of the contents. So what can I do to make that faster, Nick? Okay, yeah, sweet — we talked about this on Monday, right? So remember, we pointed out that, and we'll come back to this maybe today.
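The three swap-out steps just listed can be sketched in C. This is a toy simulation, not OS/161 code — the `pte` fields, the `ram` and `swapfile` arrays, and their sizes are all made up for illustration:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096

/* Hypothetical PTE layout, for illustration only. */
struct pte {
    uint32_t paddr;     /* physical frame number, valid only if in_memory */
    uint32_t swap_slot; /* where the contents live in the swap file */
    int in_memory;
    int in_tlb;
};

char ram[4][PAGE_SIZE];       /* toy physical memory: 4 frames */
char swapfile[16][PAGE_SIZE]; /* toy swap file: 16 slots */

/* The three steps from the lecture, in order. */
void swap_out(struct pte *p, uint32_t slot)
{
    /* 1. Remove the translation so the process stops using the page
          (on real hardware, the TLB invalidate / shootdown). */
    p->in_tlb = 0;

    /* 2. Copy the contents out to the swap file. */
    memcpy(swapfile[slot], ram[p->paddr], PAGE_SIZE);

    /* 3. Update our own bookkeeping so we can find the contents later. */
    p->swap_slot = slot;
    p->in_memory = 0;
}
```

Step 2 is the slow one, which is exactly why the copy gets all the attention in what follows.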
I don't remember if we do or not, but pages in the TLB might be very bad pages to evict, right? Because they've potentially been used recently. On multi-processor systems, this is even worse, because it turns out that each processor — each core — has its own TLB. So suppose I want to remove a page that's mapped in some TLB. I'm the kernel, and this is something that, if you choose to do this for assignment three, you will have to get right. I've got these four cores, they're essentially running independently, and one of them has decided to evict a page that's in the TLB on another core. It turns out this gets really ugly, because I actually have to be able to stop that core from running. I need to send it what's called an inter-processor interrupt. We've talked about how hardware devices communicate with the processor using interrupts; well, I also have a way of letting cores on a multi-core machine communicate with each other, right? So let's say I'm core one, and I know that core two over there has an entry for the page I'm about to evict mapped in its TLB. How would I know that? Yeah, I've checked the page table entry, and the page table entry says it's in a TLB somewhere. But remember, I'm running in the kernel on core one, while core two is out there running some user program. So what I have to do is actually send core two an interrupt, which will cause it to enter its own interrupt handlers, and essentially core one has to tell core two: you need to yank that entry out of your TLB for me, because I'm about to evict that page, right? In OS 161 — and I think this is probably a BSD term or maybe a Linux term; I wish I knew where the name came from, because it's a cool name — this is called a TLB shootdown. So imagine — okay, I'm very pro gun control, right?
But I'm gonna use a gun metaphor here just for a minute. You know that you've got the TLB over there with an entry in it, and I've got to shoot the entry out of it — it's like shooting clay pigeons or something, right? So that's the process, and it's implemented using these IPIs, and IPIs can be really difficult to get right, and they also don't scale very well. Suppose I have a 64-core machine, and maybe I don't know which cores have this entry mapped. Suddenly I've got to stop everybody — the whole machine. I have to send everybody an IPI, they all have to yank the entry out, and then we can go on. Not very good, right? So in general, I might want to avoid entries that are mapped in a TLB when I go looking for pages to evict, as Nick pointed out. Okay, did I answer your question? No — I mean, it's in general, right? The idea is that if the page is in the TLB, it's a sign that it's been used recently. And as we'll talk about today, one of the goals of page replacement algorithms is to find a page that is not going to be used again for a long time, right? So that's different. But let's talk about this. We talked about something that we could do in the background — when the operating system is just hanging out looking for something to do — to improve the swap-out process. Simon, what was it? Yeah, so your system is just hanging out there; you've got your laptop open, but you're in my class and so you're fascinated — you're not checking your email or Facebook or posting on Twitter or whatever.
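To make the shootdown sequence from a moment ago concrete, here's a toy sketch in C. A real shootdown involves actual inter-processor interrupts and acknowledgement handshakes; here a plain function call stands in for the IPI, and each TLB is just an array of virtual page numbers:

```c
#include <assert.h>
#include <stdint.h>

#define NCORES      4
#define TLB_ENTRIES 8

/* Toy per-core TLBs: each entry is a virtual page number, 0 = invalid. */
uint32_t tlb[NCORES][TLB_ENTRIES];

/* What each core's interrupt handler does when the IPI arrives. */
void ipi_invalidate(int core, uint32_t vpage)
{
    for (int i = 0; i < TLB_ENTRIES; i++)
        if (tlb[core][i] == vpage)
            tlb[core][i] = 0;
}

/* Core `me` is about to evict `vpage`. If we don't know which cores
   have the entry mapped, every other core has to be interrupted —
   which is why shootdowns scale poorly as the core count grows.
   The direct call stands in for sending an IPI and waiting for the
   acknowledgement. */
void tlb_shootdown(int me, uint32_t vpage)
{
    ipi_invalidate(me, vpage); /* clear our own TLB directly */
    for (int core = 0; core < NCORES; core++)
        if (core != me)
            ipi_invalidate(core, vpage);
}
```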
And the system is just sitting there, and it's kind of like, oh, nothing's happened for a while, maybe it's a good time to do some housekeeping. One of the things it might do is sit around and write some dirty pages into the swap file, right? So that means I take the contents of the page, copy those contents into the swap file, and then make a note of the fact that the page's contents now match the contents in the swap file. I remember actually — this is one of those silly stories — when I was in college I had my computer right next to my desk and my bed, because I loved it so much... no, just because I had a small room and that's where it fit. I was learning about Linux and I was pretty excited, so I had Linux running on my computer, and this was back when it was still kind of hard to do that — one of my roommates spent about a week getting Linux to work on his laptop, like with the monitor and stuff, which was important. Not very many people like a text-only laptop. And I remember I'd lie there at night with the thing on, and I could just hear the disk going, like all the time, and I was like, what is this? It was really irritating — it kept me from falling asleep. And it turned out, I think, that it was the Linux swap process repeatedly doing this: the machine was idle, there wasn't much going on, but there was some process dirtying some pages, and so the kernel kept writing those pages out to disk, and every time it did, my disk made an irritating noise. So I found some way to disable that, and then I slept much better. So yeah: in the background, when nothing's happening, I copy the pages out. And why does this make things faster? Why does this help me? Richard?
Yeah, because if I know the contents in the swap file already match the contents in memory, then I don't have to copy the contents again. So this whole part of the process I can avoid, and now I just have a little bookkeeping to do, right? This is very, very nice. As part of assignment three, you are — I can't remember whether asked or just encouraged — to write a page-cleaning daemon that will do this. What's that? Yeah — so keep in mind, this is not swapping out a page. I am not removing its contents from memory. What I'm doing is bringing the contents in the swap file up to date with the contents in memory. We'll talk today about how we choose a page to swap out when I need to, but this is not swapping out: I'm not removing the page contents. All I'm doing is making sure that the swap file is in sync with memory, because then, when I have to evict a page, I don't have to do the copy, right? All right, and then we talked a little bit about this at the end of class — who remembers this? We'll review it again in a couple of minutes. The question is: when a process sets up its address space during exec, it tells the kernel, here's all this stuff on disk in a binary file, and here's where I want it to go in my address space. When does the kernel actually put stuff there? When it needs to? Yeah — the first time the page is accessed, right? So I do not copy the entire thing — Microsoft Word is always the dead horse I like to beat here — into memory right away; in fact, I usually don't copy any of it. What I do is make some notes about where that content is, and then, when the process starts generating TLB faults or page faults, I start to copy that content into memory, right?
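Going back to the page cleaner for a second: one idle-time pass might look roughly like the sketch below. It's a simulation — the `dirty` flag and the `writebacks` counter stand in for a real PTE dirty bit and real disk I/O:

```c
#include <assert.h>
#include <stddef.h>

#define NPAGES 8

/* Toy page state. */
struct page {
    int dirty;      /* memory contents differ from the swap copy */
    int writebacks; /* how many times we've written this page out */
};

struct page pages[NPAGES];

/* One pass of a page-cleaning daemon, run when the system is idle:
   write each dirty page to the swap file and mark it clean. The page
   stays in memory — this is cleaning, not eviction. */
int clean_pass(void)
{
    int cleaned = 0;
    for (size_t i = 0; i < NPAGES; i++) {
        if (pages[i].dirty) {
            pages[i].writebacks++; /* stands in for the disk write */
            pages[i].dirty = 0;    /* swap copy now matches memory */
            cleaned++;
        }
    }
    return cleaned;
}
```

A page written into after a pass becomes dirty again, which is why the daemon keeps running in the background rather than cleaning once and stopping.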
So essentially what I do is some bookkeeping: I say, okay, the process is allowed to use this virtual address, and here's where the content is — it's in some file — and then, when the process faults on the address, I'll go get it, right? I'm making a trade-off here. The first time a process uses any piece of code, I'm going to have to go get it from disk, so I'm making things a little bit slower at the beginning, potentially. But over time, what happens is the process ends up not using large pieces of its code base, and so I never have to allocate memory for them. A lot of code and data pages may never be used, right? All right, so now we've talked about getting the page out of memory — it's on disk somewhere — and when a process faults on the address, we need to go get it again. What do I need to do to do this? There are like seven things up there, so this is an easy question. Yeah, so I'm going to have to put the content somewhere, so I need to find a page. Well — I need to stop the instruction first, since I can't let it complete. Then I need to allocate a page of memory to hold those contents. Tam, what do I do now? Yeah, so let's say I've got the PTE. Now what do I do? I've got a page of memory, I've got a PTE — but what's in this page? Yeah, I'd better go get the contents from disk, right? So, okay: I locate the contents using the page table entry, and then I've got to go get them, because before I can let the process start using the memory, it has to have the contents it had before. Then I update the page table entry to indicate that the contents are in memory. Then I load the entry into the TLB so the process can actually use it, and then, finally, I can restart the instruction, right?
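Those steps, minus trapping and restarting the instruction (the exception machinery handles those), can be sketched like this — again a toy model with made-up structures, not real OS/161 code:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096
#define NFRAMES   4

/* Hypothetical PTE layout, for illustration only. */
struct pte {
    uint32_t frame;     /* physical frame, valid when in_memory is set */
    uint32_t swap_slot; /* where the contents live in the swap file */
    int in_memory;
};

char ram[NFRAMES][PAGE_SIZE];
char swapfile[16][PAGE_SIZE];
uint32_t next_free_frame; /* toy allocator: hands out frames in order */

/* Handle a page fault for a page whose contents live in the swap file. */
void swap_in(struct pte *p)
{
    uint32_t f = next_free_frame++;        /* 1. allocate a page of RAM   */
    memcpy(ram[f], swapfile[p->swap_slot], /* 2. fetch contents from disk */
           PAGE_SIZE);
    p->frame = f;                          /* 3. update the PTE           */
    p->in_memory = 1;
    /* 4. ...then load the translation into the TLB and restart the
          faulting instruction. */
}
```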
So you'll come back to these slides when you're implementing assignment three, right? Then we talked about two types of memory-related faults, and these are particularly important when we talked about hardware-managed TLBs. So there's this idea of a TLB fault, which means there's no mapping for the page in the TLB, right? This can happen for two reasons. One case is that the contents are in memory, but there's just no cached entry in the TLB — that's one type of TLB fault. And then a specific type of TLB fault that we distinguish is what's called a page fault, which means that the contents the process is trying to use aren't even in memory. So if the contents are in memory, I can handle a TLB fault by just pointing the virtual address at the physical address where those contents are located. If those contents are not in memory for some reason, I potentially have to do more work, and that's one of the reasons we distinguish between these two things: TLB faults can usually be handled very fast. And by "not in memory" I mean not in RAM — the contents could be in the swap file, right? But there are certain cases we haven't really discussed. So let's say, for example — this is a good question — the process asks to increase its heap size by like four megabytes, right? So it asks for a thousand new pages in the heap. First question: am I going to immediately allocate a thousand new pages of physical memory? Who thinks I am? Who thinks I'm not? Why not? Because I'm an on-demand pager, right? I'm a procrastinator. The process hasn't used this memory yet, so I'm going to say, okay, I'll give you permission, right?
I need to make some notes in my page table entries to say, hey, you can use this portion of your address space. But now let's say the process faults on an address within that region. What do I give it? Where are the contents of that memory? This is a good question: the contents of that virtual address are not in memory, but where are they? First of all, what should the contents be? I allocate some heap, I fault on an address in the heap for the first time, I'm doing a read — what should that return? Probably just zero, right? I haven't written anything into the memory yet. So this is kind of a different type of page fault. What it means is that the contents of the page aren't on disk; this is called a zero-fill page. So what I do is, when the process increases the heap, I say, okay, I give you permission to use four megabytes more heap, and when you start faulting on those addresses, I find a page of memory, I zero out the contents, and I give you that page. Does that make sense? There are no contents for this page — they're not on disk anywhere, they're just zeros. Once you start writing into it, the page has contents, but the first time you use it, it doesn't; it's just zeros. So do we do that for whatever page is in memory? What do you mean? I mean, how do we know whether the page is in memory, if there are no contents? Yeah — so how would I know? I get a TLB fault, right? The MMU tells me, I don't know how to translate this address, and it gives me a virtual address. And what do I do? It asks the kernel — and what does the kernel do? Yeah. I have a page table, which is going to help me map this. But what am I looking for? A page table entry, right?
So the page table entry needs to reflect the fact that the page is in memory, right? What should the page table entry have in it — this is good review — if the page is in memory? Sean? Yeah, a physical address. If the page is in memory, the page table entry should have a physical address pointing to where the contents are. So that's how I handle the TLB fault: I look up the entry in the page table, and if I find a valid page table entry, it will say the page is in memory and here's the physical address, and that's what I tell the MMU, right? All right — oh, my laptop just woke up, okay. And we talked a little bit — I won't go through this in detail, because I'm going kind of slowly today, which is nice — about the differences with hardware-managed TLBs, right? The hardware will actually search the page tables to find an entry itself, and it will load the entry itself if that page is in memory. The nice thing about this is that it's faster; the difficulty is that the page table structures are determined by the hardware. So if you implement a kernel for an x86 architecture, you have to implement your page tables in a way the hardware understands; if you don't, all these lookups will fail. And with a hardware-managed TLB, your kernel never sees TLB faults. All right — I'm taking a long time for review today, which is fine, but let's go on. So we talked about on-demand paging, and the nice thing about it was that potentially there were cases where I didn't have to actually end up loading things that processes requested, right? A process might have this big chunk of code it never needs, right?
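The zero-fill case from a minute ago fits into the fault handler as a simple dispatch on where the page's backing contents live. A sketch, with invented names:

```c
#include <assert.h>
#include <string.h>

#define PAGE_SIZE 4096

char swapfile[16][PAGE_SIZE];

/* Where do this page's contents come from on first fault? */
enum backing { BACK_ZERO, BACK_SWAP };

struct pte {
    enum backing backing;
    unsigned swap_slot; /* meaningful only for BACK_SWAP */
};

/* Fill a freshly allocated frame for a faulting page. A zero-fill page
   has no contents on disk at all — we just hand back zeros. */
void fill_frame(const struct pte *p, char *frame)
{
    if (p->backing == BACK_ZERO)
        memset(frame, 0, PAGE_SIZE);                      /* new heap/stack page */
    else
        memcpy(frame, swapfile[p->swap_slot], PAGE_SIZE); /* swapped-out page    */
}
```

A fuller version would also have a case for pages backed by the executable file, which is how the on-demand code loading described earlier plugs into the same handler.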
So we went over this — I don't load these until they're used; we covered this pretty well in the review. And this is what I just talked about: the first time I do a load or store to an uninitialized heap, stack, or data page — something that doesn't have contents, right? So my code comes from disk. Initialized variables that my program uses, those are also in the ELF file. But the ELF file doesn't contain entries for pages that are just supposed to be blank. If you allocate a big empty data structure in your program, the ELF file will say you need a portion of the address space for this big empty data structure, but there are no contents for it in the ELF file — why would you fill the file with zeros, right? The ELF file just says: fill these big empty pages with zeros, and that's what the kernel will do. All right, so now let's talk about how we do page eviction. When we swap out a page, we need to choose which page to evict, and like I said, this decision — this is now the policy part of virtual memory, or a pretty important policy part. The way I like to think about this is that there's a cost-benefit calculation the kernel has to perform. So what is the cost of swapping out a page, potentially, in one of the ugly cases? Navier? Yeah — but why does it take time in the worst case? Yeah, it means using the disk, which is slow, right? Potentially. If I have clean pages, then I'm in good shape — well, my voice just went up like three octaves when I said that. If I have clean pages, then I'm in good shape; if I have dirty pages, then I have to write out the contents, and that's an issue. So the cost is essentially the time and the disk bandwidth, because, believe it or not, we use the disk for other things, right? We'll start talking about disks on Friday or Monday.
But the idea is that there's other stuff going on with the disk — you're playing games or browsing, or doing some big search over all of your really important Twitter posts or whatever. So swapping consumes time, because the disk is slow, and it also makes the disk active, competing with other disk traffic, right? What's the benefit? It seems like a terrible idea, right? What do I get out of this? Gabino — I'm going to swap out a page; the cost is moving it to disk, but what do I get? Yeah, I get 4K of RAM, right? And here's the interesting thing: if you think about it, I get 4K of RAM for some unknown period of time. I get 4K of RAM until when? When does this whole process come around to bite me, Tom? Yeah — so remember that this cost has two components, potentially: the component to move the page out, and the component to go get it again when I need it. And in the meantime, I've got this benefit. So given this cost-benefit calculation, what's the goal here? What am I trying to optimize for? Dan — okay, but in this very specific decision. I'm going to argue that the cost is relatively fixed. Again, there are some components here that could change in certain cases, but let's say I have a page that's going to be used again, so I'm going to have to move it back and forth. So what variable here can actually change? Josh — the cost is, I'm going to argue, fixed; the benefit is 4K of memory. But what could change? What's the thing here that I don't know, that I'd like to try to do my best with, AJ? Yeah: the amount of time this page is not going to be used, right? So this goes back to Nick's question before, right?
And we talked about this on Monday — sort of the worst possible case I could get into, where I start swapping a page out, and while I'm in the process of swapping it out, I get a fault on it, and it's got to come right back in. In that case, the benefit is what? The ability to use 4K of memory for how long? Zip, right? And I just paid this huge cost. So that's the worst possible boomerang effect. And what's the best case — the best case, which actually even affects the cost? Yeah: imagine this page will never be used again. Never, right? For whatever reason — you clicked on some menu in Chrome, or whatever you're using today, and you loaded the Chrome help page. How many times do you do that? It found whatever parts of that address space needed to be moved in, it moved in that code, the code ran, you saw the help screen, that was nice. Now you've answered whatever question you had, and you may shut the program down before you ever use that functionality again. So if I find that page, two things happen. The benefit is huge, because the page never comes back. And the cost is actually halved, because I never have to swap that page back in. So that's my dream page to find, right? And let's point this out: there are tricks we try to play to minimize the cost. We talked about one of them — this idea of trying to push pages, copy pages, to disk before I actually need to remove them. So the page cleaning that we talked about: what does that do to the cost, potentially? What can I reduce the cost by? Yeah — so my cost roughly goes to what, ballpark?
About half, for a page, right? Let's say I can clean pages; then the cost for doing page eviction — no, there's still another component here. It's about half, because if the page is used again, I still need to go get it, so I'm still going to have to do that I/O; that I/O I can't avoid. But I can avoid the I/O of moving the page out to disk. So that's one trick I can play, and it reduces my cost by about half. It's also nice because it takes that cost off the critical path, which is sort of nice, right? But we're going to focus today on ways to maximize the benefit. That means finding pages that are unlikely to be used in the future. Another way of describing this — and this is how kernels sometimes talk about it — is minimizing the page fault rate. So what do we call — and you guys have probably experienced a system like this before, you just may not have known what to call it, and this isn't a technical definition, I looked around for one and there really isn't one — thrashing, right? How many people think they've used a computer that's been thrashing? I definitely have. So thrashing is a state your system can get into where, essentially, for whatever reason — and sometimes this happens just because the system is completely out of memory; maybe the kernel has a huge amount of memory allocated and processes are trying to allocate massive amounts of memory — almost every page access produces a disk I/O, right? And one of the ways you could identify thrashing, back when you had disks that actually moved, was you'd hear the disk going ch-ch-ch-ch, you know?
The computer's completely unresponsive and the disk is just going ch-ch-ch. At that point, frequently, you just have to hard reset the machine — you might never even get a menu back. It's over; the computer has essentially collapsed, right? And if you do page replacement wrong, you can easily get into this situation: if you pick the wrong pages, you're constantly having to move pages back and forth to disk, and things get really slow. That's when you realize how slow the disk really is. So, again, the goal of maximizing the benefit is to try to pick the page that's going to remain out the longest, right? And we just talked about this: if we can find a page that's never used again, fantastic — that's the dream, that's our dream page. But what do we need to be able to do in order to do this? Again, we're back to policies, so you can go back to thinking about scheduling. When we scheduled threads, we talked about things we'd like to know about what a thread is going to do in the future, in order to choose which thread to run next. What would we like to know about a page? This is, again, a little simpler. No, no — you've got to think oracle here. You have a crystal ball. What do we want to know? That's something we could know, but what would we want to know if we could predict the future? When the page will next be used, right? How long is it going to be before this page is used again? If I knew that, I could, for example, identify the pages that will never be used again and get rid of those immediately. I wouldn't even have to keep them in memory at all: as soon as I know they're never going to be used — gone, right?
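With a crystal ball, that oracle policy (this is usually called Belady's optimal algorithm, though the lecture doesn't name it) evicts the resident page whose next use lies farthest in the future — or that is never used again. It can only be computed offline, given the full future reference string, but it makes a useful baseline. A sketch:

```c
#include <assert.h>

/* Given the future reference string and the set of resident pages,
   return the index (into resident[]) of the page to evict: the one
   whose next use is farthest away, or that is never used again. */
int pick_optimal(const int *future, int nfuture,
                 const int *resident, int nres)
{
    int victim = 0, farthest = -1;
    for (int i = 0; i < nres; i++) {
        int next = nfuture; /* "never used again" sorts after everything */
        for (int t = 0; t < nfuture; t++) {
            if (future[t] == resident[i]) { next = t; break; }
        }
        if (next > farthest) { farthest = next; victim = i; }
    }
    return victim;
}
```

For example, with future references 1, 2, 3 and resident pages 3, 9, 1, page 9 is never used again, so it's the one to evict.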
So, again, the optimal policy on some level evicts the page that remains unused the longest, and this clearly maximizes our calculation: the cost is essentially fixed, and if that page is going to remain unused the longest, this is the best we can do. And just like our oracle schedulers for threads, this policy is hard to implement, right? So we do something similar to what we did for threads. What was our strategy? Paul already did this, without even thinking about it — he's totally absorbed the principles of this class. I asked him about the future, and what did he try to do? Use the past to predict the future, right? Okay, so I think you guys have got the hang of this by now. So now, essentially, that's what we're going to do. And there are a couple of things we need to think about. First of all, what information are we going to try to track? We can have these idealized conversations about how to do this that end up not really being implementable, because of the need to keep paging fast that we've already talked about. So we're going to have to talk about this in the context of the information we actually have: what visibility do we have into how pages are used? And then, how do we store that information? Why do we care about how we store it? Who cares — just store it somewhere, you know? Well, say I'm going to gather a lot of information about how pages are used and store it all over the place. Why does this start to become a problem? Wherever you store it, what is it consuming? Memory, right? Unless you're going to put it in outer space, it's on the computer, consuming memory.
So, okay, and again, this is another case where there's this really interesting trade-off between wanting to collect a lot of information and where we have visibility to collect that information, right? So for example, we just talked about the fact that if we store a lot of data about this, that can potentially be very expensive. But what about this? Let's say what I'm going to do is keep a count of the number of times that every page on the system is accessed. I'll keep that count in the kernel somewhere and, you know, that'll be awesome. What would this require? I'm just going to keep a count in my PTE. This is really easy, right? I'm going to have a counter in my PTE, and every time the page is accessed, I'm going to bump the counter, and then I'll have a really good understanding of what pages are really in use. Well, okay, that's one thing, Frank. So you're thinking about storage, but what else is the problem here? Yeah, now we're back to this terrible universe that the MMU helped us escape from, where the kernel has to know about every time a page is used, right? So every time a page is used, I'm going to trap into the kernel so I can look up the page address and bump a counter? No, not going to happen. Way too slow, right? The whole point of the MMU is to do this for us, so we don't have to get involved. But it's a starting point, right? This is a straw man, and you guys can implement this for assignment three. What's the simplest possible, easiest-to-implement page replacement algorithm? Thor. You made a motion that indicated... I don't know if that's like your... Oh, okay, that's what I thought it was. I thought that was like Thor sign language for random, you know? Yeah, so a random page, right? Or whichever one you want.
So yeah, we just pick a page at random and we evict it, right? What's nice about this? Yeah, I mean, Thor can implement it with his hands, right? So that's pretty easy. Simple. And again, this is also a good baseline, right? I don't know, I wish I had a Greek mythology metaphor for this, but there are probably thousands of smart people who have broken themselves on the rock of trying to beat this, right? So somebody comes in and they're like, I have this awesome new algorithm, and I've been tuning it for a couple of weeks and it's really great. And then you're like, dude, I just ran random and it beats your algorithm, right? So welcome to the real world. It's still cool, though; I'm glad you had fun implementing it. And then, what's wrong with this? I mean, it's not that smart, right? We could probably do a little bit better than random. Random doesn't use any of this nice information about the past that we wanted, right? So here's an algorithm that uses the page's past to predict the future. I'm not going to talk as much about page replacement algorithms as we did about scheduling. So least recently used is kind of the inverse of the optimal "most distantly used again", which doesn't produce a nice acronym either, right? And which can't be implemented. So the idea here is that we try to use the fact that a page has been cold for a while, right? It's been sitting in memory, hasn't been touched. And we try to make this the basis for a guess that that page will not be used again for a while. So it's kind of like: if a process has been ignoring this page for a while, it's probably going to keep ignoring it for a while. And the nice thing about this is it might be as good as we can do without a crystal ball.
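If you want to see the random baseline and LRU side by side, here's a toy Python simulation (the names and data structures are mine, purely for illustration) that counts page faults for a reference string over a fixed number of frames.

```python
import random
from collections import OrderedDict

def count_faults(refs, nframes, policy):
    """Run a reference string through a small frame pool and count faults.

    policy is "random" or "lru"; both are toy models, not kernel code.
    The OrderedDict tracks recency for LRU: least recent is at the front.
    """
    frames = OrderedDict()  # page -> None, ordered by recency
    faults = 0
    for page in refs:
        if page in frames:
            frames.move_to_end(page)  # refresh recency on a hit
            continue
        faults += 1
        if len(frames) >= nframes:
            if policy == "lru":
                frames.popitem(last=False)  # evict least recently used
            else:
                victim = random.choice(list(frames))  # evict at random
                del frames[victim]
        frames[page] = None
    return faults
```

With a reference string that cycles between two hot pages, like `[1, 2, 1, 2, 1, 2]` in two frames, LRU faults only twice; random can do worse, which is the "little bit better than random" the past buys us.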
The cons with LRU really come down to: how can you implement this, right? How do you store the state? And what kind of visibility do you have into page accesses? So we're going to talk about both of these things. How do we tell how long it's been since a page was accessed? When do we see accesses? And how do we store how long it's been since the page was accessed? And we want to do this pretty quickly, right? Okay, so again, another little bit of review. When do I know for certain that a process... let's say you have a virtual page that has never been used, right? It's allocated as part of a heap, or it's part of the code that the process hasn't run yet. When do I know that it's used? When does the kernel know for certain, Andrew? Yeah, okay, and what do we call that? Well, maybe it's a swap-in, right? But what does the MMU generate? Yeah, a TLB fault or a page fault, right? Depending. Now, even if I have a hardware-managed TLB, I still get a fault here, because the page is not in memory, right? The whole point is that this is how I know that it's being used for the first time. So the first time that I load an entry into the TLB, or make the page table entry valid for a hardware-managed TLB, that's when I know, right? As soon as I load a translation, I know that the page is being used, because the whole reason I'm loading the translation is that the process stopped because the MMU couldn't translate, right? The second question is: does this reflect every page access? Can I use this to count the number of accesses to a page? Robert? No. No, why not? Yeah, so this mechanism is nice, and to some degree it's all we have in the kernel, right? But let me point out, this completely fails to distinguish between a page that has been accessed once and a page that has been accessed potentially thousands of times, right? The same block of code.
Because once the translation's in the TLB, I don't see those accesses anymore, right? So it's a clue: it at least allows me to say that the page has been used, but it doesn't say anything about the frequency of use, right? And we just talked about this. There's no way to record every page access, either. I guess you could try to use hardware to do this, and I wonder if people have tried that, but it doesn't seem like it would be useful. So the other question with LRU, right, is that the idea was I was going to store how long it's been since the page was used. How do I measure time? How am I going to store this information? Who has a nice naive approach? Sean. Well, yeah, I could use a timer to measure time, but then where do I put that value? How much timing information am I going to store? More on this when we get to performance, but essentially I need to store something per page table entry, right? So let's say I decide to store 2^32 ticks, whatever a tick is. A tick could be a very fast timer or a very slow timer, right? But that's what I'm going to use to measure time. But now remember, I had my page table entries all jammed down into 32 bits, right? Or maybe 64, if my bit-packing hand wasn't too steady. But I've jammed these things down really tight, and now I've doubled the page table entry size, right? And this is a non-trivial issue. Real systems have lots of page table entries, right? It might not seem like that big of a deal to you, but this is a factor of two, right? Now you've doubled the amount of memory that the kernel needs for storing page tables, and that doesn't really seem worth it. If I do eight bits, then I can store 256 ticks, and there's a clear trade-off here between the width of the counter and the amount of information I can store, right? And then the next question is: how do I search?
So let's say I'm going to store eight bits or three bits or something, some number of bits, and I'm just going to increment those periodically. I'm going to try to use those to reflect how long it's been since the page was accessed. So now the question is: I've got all these page table entries, and let's say I'm trying to evict a page, so I know which pages are in memory. How do I find the one that's been used the least? Or, LRU is really tripping me up today, how do I find the one that has not been used for the longest? What do I potentially have to do? There are a couple of approaches here. Yeah, Sean. No, no, my point here is I'm storing how long each page has gone unused, but I'm trying to find the page that's gone unused the longest, right? I've got these page table entries. What do I need to do, potentially? Search through all of them, right? And if I do this linearly, it could take a long time. I might decide to build some fancy data structure to do it, and then that would also be complicated and difficult to get right. So I need some sort of efficient data structure to allow me to do this search. So a solution to this problem, which we'll just start on at the end of class and pick up on Friday, is a canonical page replacement algorithm called clock, clock LRU, okay? And what clock LRU does is it uses one bit of information in the page table entry. When I know that a page has been used, I set the bit, okay? That's when I set the bit. And when I'm looking for a page to evict, what I do is cycle through all of the pages on the system in a fixed order. Whatever the order is, who cares? We'll be arbitrary. I stop when I find a page that doesn't have its bit set, okay?
However, every time I go past a page, I clear its bit, right? So as I'm going around the clock, I'm clearing all the bits. So let's say I start the clock and all the bits are set. Will I ever find a page to evict? Who thinks no? Who thinks yes? Alyssa, tell me why. Yeah, so I'm clearing the bits as I go, and I'm going to go until I find a clear bit. So if the hand whips all the way around, then maybe there are some bits that are now set because the pages have been used while I've been doing this, but eventually, because I'm clearing bits, I should find a page, right? All right, so here's a quick example of this. So these bits are clear; green means clear, right? So I start here. I clear the bit on this guy. Now I go to the next page, and this is the page I evict, right? So the next time the clock runs, I'm going to clear this bit, and now I'm going to use this guy. The next time the clock runs, I go on to the next one, right? So does this make sense to people? This is a fairly simple approach to LRU, and we'll talk a little bit more on Friday about how this implements a form of LRU. But before Friday, think about: what does the speed at which the clock hand is spinning tell me? With this algorithm, you can think about the speed of the hand as being indicative of something about the state of the system, right? So we'll start on Friday with that, and then we'll talk about disks, which are fun. So I'll see you guys on Friday.
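The sweep in that example can be written out as a small Python sketch (the names are mine, and a real implementation would walk page table or core map entries, not Python lists). The key property from the discussion above: because the hand clears bits as it passes, the loop always terminates, even if every bit starts out set.

```python
def clock_evict(pages, ref_bits, hand):
    """Sweep the clock hand until a page with a clear reference bit is found.

    pages: resident page ids in a fixed, arbitrary order (the clock face).
    ref_bits: page -> bool, set by the MMU/fault path when the page is used.
    hand: index where the sweep starts.
    Returns (victim, new hand position).
    """
    while True:
        page = pages[hand]
        hand = (hand + 1) % len(pages)  # advance the hand
        if not ref_bits[page]:
            return page, hand  # clear bit: this is our victim
        ref_bits[page] = False  # second chance: clear the bit, keep sweeping
```

Starting with every bit set, the hand clears its way around and, on wrapping, evicts the first page it cleared, unless that page was referenced again in the meantime.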