All righty, welcome back to 353. Sorry about yesterday — I was sick, caused by a combination of probably a virus, time zone changes, and lack of sleep. So I got that surprise when I got home. So yeah, that's fun. So there are five puppies there. While I was at the conference, we had an emergency foster, so we are taking care of these puppies and mom, and I got a nice new picture too. So that's a nice plus. That's probably one of the reasons I wasn't able to sleep — puppies are loud and don't sleep. All right, content, yay. Good to start off with puppies. Hopefully I can remember how to lecture. So where'd we leave off? New topic today, gonna be fairly easy, considering we came from virtual memory and threads and all that fun stuff. So in a computer system, we have a trade-off in memory. Our CPU has registers in it. They're not very big, but they're really, really fast. Then we have caches on our CPU, like level one, level two; they are much bigger, but slower. Below that we have random access memory. Below that we have non-volatile memory, our storage devices: solid state drives, which are cheaper and bigger but slower; then hard disk drives; and below that, actual tape drives, which many of you probably have not used, but I swear to you, they are still a thing. Google uses them for deep storage. They're actual magnetic tapes, kind of like VHS tapes, if any of you have any idea what the hell those are. They can store something like 500 terabytes, but they're really, really slow. They're inexpensive, though, so if you need to store a lot of data and don't care much about speed, you might resort to that.
So pretty much, the lower down the pyramid you go, the more capacity you have and the cheaper it is, but the slower it is. The illusion we want to create is that each level has the speed of the level above it and the capacity of the level below it, so you get the best of both worlds, and that's what we use caching for. We want to hide this from the user: pretend you have the speed of the layer above and the capacity of the layer below. In terms of memory, if you overload your system with a bunch of processes, you might notice that their combined memory can exceed the physical memory of your machine, because that memory is not all in use at the same time. What that means is the kernel can use something like your SSD to back your memory, so if you run out of physical memory, it just moves some of it to your drive and you keep on going. It might be slower, but your program keeps running. The idea is that since our processes aren't using all their memory all the time, it doesn't all have to be in physical memory. We keep the pages they are using in memory, put others on disk, and swap them back into memory from disk when they're actually needed. That's what we get to talk about today.
So there's a term called demand paging, which basically means we use memory as a cache for the file system. Whenever we try to access a file, our kernel maps memory pages to file system blocks — maybe one to one if they're the same size, maybe two I/O blocks to one page, whatever lines up. For example, if we mmap a file, the kernel just records that mapping and only loads those pages into memory from your drive when you actually use them, through the page fault handler. You get a page fault whenever you try to access that memory, and the kernel can record, oh, this page is supposed to reference the first I/O block of file x. If the page doesn't represent a file and I've run out of memory, we can map it to swap space. There's a special file called the swap file, and that's where pages go that only exist in memory and aren't backed by actual files on disk. If you go into Windows and unhide files on C:, there's a file called something like pagefile.sys — it's like an eight gig file, and that's Windows' swap space. If you run out of physical memory, Windows starts putting entries there, and you can configure it all you like. On Linux there'll be a swap partition or a swap file, and macOS has the same thing under some name. So that moves pages to disk when we actually need more physical memory. Now, a process's working set should fit in physical memory. What is a working set? Over a given window of time, the set of pages your process actually uses is called its working set.
So my browser needs, I don't know, 20 pages while I'm using it to access my favorite website — that's its working set. If you can't fit your working set into physical memory, your process does something called thrashing, where it's constantly moving pages in and out, so it's hitting your drive instead of accessing memory, your computer gets really, really slow, and your solution might just be to reboot it or buy more memory. The process of evicting pages from memory onto disk is called page replacement, and of course there are different algorithms for it. The first algorithm is called optimal: you replace the page that won't be used for the longest time. This is only for evaluation purposes, because we cannot tell the future — maybe you can make a guess, but you're probably going to be wrong. The second algorithm ain't really an algorithm; it's just called random: you replace a random page. Next is our favorite, first in, first out: you replace the oldest page first. It's just a queue — everyone likes linked lists, so that's something we could implement with one. The last one we'll go over today is least recently used, or LRU: you replace the page that hasn't been used for the longest time. Whenever we evaluate these algorithms — and yeah, these typically get thrown on the final, but like scheduling, they're pretty easy to do — we usually assume our physical memory can only hold a set number of pages. In this case, maybe it can only hold four pages. Your real system is going to hold a lot more pages; this is just so we can analyze it. Then we say that we access the following pages, we assume they all start on disk so we have to read them into memory, and our memory can only hold four pages at a time.
We'll use this exact setup for all of our examples during the lecture. Unlike scheduling, where we keep track of the average wait time, the average response time, the number of context switches, and so on, here all we care about is the total number of page faults, because that's our slowest operation — we have to read the page in from disk into memory. So for every single example, all we'll do is find the number of page faults. How I typically write them out is I draw a box with four slots, which just represents what is in physical memory at the time, and above it I put what page we're accessing, and if it causes a page fault, I put it in red. Initially we assume that our memory has nothing in it and can hold up to four pages. If I access page one, that's a page fault. I have to read it from disk into memory, and after that, page one is in memory, so we're all good. All right, the first four steps are really boring. If I access page two, that's not in memory, so I have to read it from disk — that causes a page fault. I wouldn't kick out page one, because I have room for four pages; it makes no sense to kick out one since I might use it later. So I just load two into memory in a new slot. Now I have pages one and two in memory. Joy. All right, are we having fun yet? Sweet. What's gonna happen if we access page three? We're gonna do the same thing: load it into memory, another page fault. Same thing for page four — load it into memory. Pretty much all the algorithms start like this; it's fairly boring. So let's get into the exciting part. Next we access page one. Does that cause a page fault, or is it already in memory? Yeah, it's in memory. How I write it is the box just doesn't change. So we access one.
It's in memory. It doesn't cause a page fault. We just read it, everything's nice and fast, and we are nice and happy. Now we access page two — same idea, it's already in memory, we're all good. Now we get to the fun part. We access page five, and we are out of space in memory, so we have to kick out a page: one, two, three, or four. If I'm doing the optimal algorithm, I want to kick out the page that won't be used for the longest time into the future. So if I'm at five right now and I look into the future, I probably shouldn't kick out one, because I'm going to use it the very next instant. I probably shouldn't kick out two, because I'm going to use it right after one. I probably shouldn't kick out three, because I'm going to use it after that. So my only choice left is kicking out four. That's the best I can do in terms of reducing the number of page faults. Everyone agree with me? Access page five, kick out four — fairly straightforward. How I draw it is I just replace the four with the five, and five goes in red. Boom, that's the page fault. Again, this is not something you can implement; it's pretty much just for evaluation, because we're looking into the future. And yeah, someone asked: can you ever do this in real life? The answer is no. Typically you'll record the page accesses and then analyze the trace after the fact to see how close your algorithm got to optimal. We pretty much use this as a benchmark for how well we could possibly do. All right, now we access page one — already in memory, because we were smart. Access page two — already in memory. Access page three — already in memory. Now we access page four, and we need to kick something out. So what should I kick out: one, two, three, or five?
Yeah, one, two, or three — anything but five. As long as I kick out anything but five, I'm all good. Most people just hate one for whatever reason, so we'll kick out one. That's another page fault: four goes into memory. Then we access page five, and that's a hit — it's already there. Again, you wouldn't actually implement this; you'd analyze a trace after the fact to see how well you could have done, because this is the best you can do. We'll use this to compare the other algorithms. The main thing we're comparing is the number of page faults. So how many page faults do I have here — a.k.a. how many red numbers? Six. That's all we want at the end: six page faults. Keep that in mind while we try the other algorithms. I won't work through the random algorithm, because it's random — that'd be a weird exam question. I guess I could for fun, but it would be a pain to mark. All right, a real one we can do is FIFO, first in, first out. This we could implement just using a linked list, all that fun stuff. Again, memory only holds four pages, same accesses. Can I skip the first four, because they're gonna be boring? Yes. One, two, three, four, boom — we load all those into memory, because they're the first accesses. Then we access page one — that's a hit, we don't have to replace anything. Access page two — that's a hit. Now we access page five, and we have to kick something out. With first in, first out, the first page in was one, so I'm just going to blindly kick out one: first one in, first one out. I replace page one with page five. Was that a good decision? Probably not. Now I access page one. What do I kick out? Two. Was that a good decision? Probably not.
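Since we have the whole reference string written down, we can sanity-check that count of six with a few lines of code. This is just a sketch of the optimal policy as a trace simulator — the function name and setup are mine, not anything from a real kernel:

```python
# Simulate Belady's optimal replacement over a recorded reference string.
# On a fault with full memory, evict the resident page whose next use is
# farthest in the future (never used again counts as infinitely far).
def optimal_faults(refs, frames):
    memory = []
    faults = 0
    for i, page in enumerate(refs):
        if page in memory:
            continue                      # hit: nothing to do
        faults += 1
        if len(memory) < frames:
            memory.append(page)           # free slot available
            continue
        future = refs[i + 1:]
        victim = max(memory,
                     key=lambda p: future.index(p) if p in future
                                   else len(future) + 1)
        memory[memory.index(victim)] = page
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(optimal_faults(refs, 4))  # 6 faults, matching the walkthrough
```

Note how the simulator only works because it gets to peek at `refs[i + 1:]` — exactly the "looking into the future" a real system can't do.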
So I replace page two with page one. Now I access page two, and what do I kick out? Wow, this is real bad, ain't it? I replace page three with page two. This seems really bad — this is real terrible. Now I access page three. What do I kick out? Four. Oh shit. I'm not doing well here. Now I access page four, and I have to kick out page five. Wow, I'm great. Now I access page five — what do I kick out? Page one. And then I have this. Another way to look at it, if you want to do this really fast: all your page faults form a diagonal line like this, because it's first in, first out. So just think of the weird diagonal waterfall line. How many page faults do we have here? Ten. Pretty terrible, right? This is the worst case — I can't do worse than this. So I have a thought experiment for you. Let's say my memory held five pages. How many page faults would I have? Five, right? I load them all into memory and everything's a hit after that. So more memory, fewer page faults — that makes sense, right? Then the opposite should be true: if I have less memory, I should have more page faults, right? It certainly makes sense in the extreme case: with one page frame, I have to swap on every single access. All right, well, humor me. It should be fewer page faults than 10 — 10 is awful, right? So let's just say we have FIFO and memory can only fit three pages. We have less memory, so this should be worse. Access page one — have to load it in. Two — have to load it in. Three — have to load it in. That's no different. Access page four — well, that's a stupid thing to do. What do I replace with page four?
Page one, right? So I started off worse than before. Now I access page one — that's also a page fault. At least before, I got hits for one and two. Crap, and I kicked out two. Now I access two. Crap, I just kicked it out. What do I kick out for two? Three, right? All right, I kick out three. Hmm, still not doing great. Now five comes in — another page fault. Doing the waterfall, I have to kick out page four, so I replace four with five, and now it looks like this. Now I access page one — that's a hit, finally, mercifully. Then I access page two — that's a hit, we're good. Then I access page three. What do I kick out here? One. So I kick out one for three, and it looks like this. Then I access page four. What do I kick out? Two. Kick out two, and then I access page five — it's in memory. How many page faults have we got? Nine. That's better. Why'd you lie to me? What the hell? We got 10 before, and now we got nine. So we have less memory and fewer page faults. Take this as a lesson: if your computer gets slow and you run out of memory, just take out a stick of RAM and it'll go faster. So this silliness gets a name: it's called Bélády's anomaly, and it says that more page frames can cause more page faults. This problem affects any first-in-first-out style algorithm. It doesn't exist with LRU or the other stack-based algorithms we'll see later. In fact, if some of you are interested in math and Greek symbols, there's a paper from around 2010 showing that this anomaly is actually unbounded: they can construct an access sequence to get any page fault ratio they would like. For all the other algorithms, though, this anomaly doesn't happen, so they don't get special names.
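You can reproduce the anomaly yourself with a tiny FIFO simulator. A sketch, with names of my own choosing, run on the lecture's reference string with four frames and then three:

```python
from collections import deque

# FIFO page replacement: on a fault with full memory, evict the page that
# has been resident the longest. Hits do NOT reorder anything -- that is
# exactly why FIFO is vulnerable to Belady's anomaly.
def fifo_faults(refs, frames):
    queue = deque()              # oldest resident page at the left
    faults = 0
    for page in refs:
        if page in queue:
            continue             # hit
        faults += 1
        if len(queue) == frames:
            queue.popleft()      # evict the first one in
        queue.append(page)
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(fifo_faults(refs, 4))  # 10 faults
print(fifo_faults(refs, 3))  # 9 faults -- fewer frames, fewer faults!
```

Same trace, one fewer frame, one fewer fault — that's the anomaly in two function calls.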
For any other algorithm, your intuition is correct: increasing the number of page frames — the number of pages you can hold in memory — decreases the number of page faults. It's only in this one strange case that the opposite can be true, and it gets a fun name. So that's only the case for FIFO. Let's go back and do our final example of the day. Wow, we are going through this fast. Sweet. We're gonna do least recently used. This is also something we can implement. We can use FIFO to break ties; in this case we won't have any ties, because we're assuming a single core, but on a real machine you might have the possibility of ties. We'll assume physical memory again holds only four pages, and we have the same accesses as before. For FIFO we had 10 page faults; for optimal we had six. Let's see what we do with least recently used. The first four page accesses are all going to be faults. Then we access page one — already in memory, don't have to do anything. Then page two — already in memory. Then we access page five. Least recently used tries to approximate the optimal algorithm, but instead of looking into the future, it looks to the past. The idea behind it is that if you just used some memory, you're likely to use it again soon — maybe you're running a loop or something like that. So if we're at five and deciding which page to replace, we look backwards in time for the least recently used: we shouldn't replace page two, we shouldn't replace page one, we shouldn't replace page four. So our victim is three. Fairly straightforward — I just replace three with five. Any questions about that one? All right, now when we access page one that's a hit, and page two, that's a hit. Then we access page three, so we need to replace something, and we'll do the exact same thing.
We look back into the past and see what we used recently. We just used two, so we won't get rid of it. Just used one — won't get rid of it. Just used five — won't get rid of it. The only thing left is four, so we replace four with three, which turns out to be a bad choice. Then we access four — oh crap, okay. Same thing: looking back, I can't get rid of three, can't get rid of two, can't get rid of one, so I have to get rid of five, which again proves to be a mistake. Kick out five, load four, and then I access page five and do the same thing again: can't get rid of four, can't get rid of three, can't get rid of two, so my victim is one, and I replace one with five. So how many page faults have we got there? Eight. Not optimal, but closer to optimal than FIFO — and this one we could actually implement. One way to implement it, which searches all pages, is to keep a counter or a timestamp for each page. Each time you use a page, you update its timestamp — save the system clock to it, or whatever you want — and whenever you need to replace a page, you scan through all the pages, find the one with the oldest timestamp, and get rid of it. That would be pretty slow: every eviction means scanning every single page, and if you have gigabytes of memory, that's a lot of pages. So what would this class's solution be if I wanted it to go much faster? Yeah, I could approximate it-ish, randomize it, and not keep track that precisely — I could do that. But here, let's see if this spurs anything: what about a doubly linked list?
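The timestamp-scan version of LRU described above is easy to sketch. A minimal version — the names are mine — that reproduces the eight faults:

```python
# LRU via the timestamp scheme: stamp each page on every access, and on a
# fault scan all resident pages for the oldest stamp.
def lru_faults(refs, frames):
    last_used = {}               # page -> time of most recent access
    faults = 0
    for time, page in enumerate(refs):
        if page not in last_used:
            faults += 1
            if len(last_used) == frames:
                # Linear scan for the least recently used page -- this is
                # the part that would be far too slow on real hardware.
                victim = min(last_used, key=last_used.get)
                del last_used[victim]
        last_used[page] = time   # stamp the page on every access
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(lru_faults(refs, 4))  # 8 faults: between optimal's 6 and FIFO's 10
```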
For a doubly linked list, removing an element is constant time if you already have the node: you have its previous and next pointers, so unlinking it from wherever it is is constant time. So we could create a doubly linked list, and each time we access a page, we remove its node from wherever it is and put it at the front. The front is my most recently used, and when I need to get rid of something, I just remove whatever the hell's at the back, because that's the oldest. Every access moves a node to the front, and whoever is at the back is the one I can evict. Someone suggested a priority queue — close, but a doubly linked list is all we really need. You can implement that in software, but the constant factor will kill you. It's too expensive, because think of what a page reference is: a page reference can be as small as accessing a single byte. So if every time you access a single byte you do three pointer updates to unlink the node and three more to put it back at the front, that's about six pointer updates for every single byte you touch. Your computer suddenly gets at least six times slower, probably more, depending on how much memory you're accessing at a time. Even worse, if you have multiple CPUs sharing that one linked list of pages, then as you now know from hopefully doing lab five, you'd need a mutex to protect it or there would be data races. So it gets worse and worse with multiple cores, and we want what we do on a page reference to be as quick as possible. Implementing this isn't going to work at all.
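If you want to play with the doubly-linked-list idea without hand-writing the pointer juggling, Python's OrderedDict happens to be backed by a doubly linked list, so moving an entry to the back and popping from the front are both constant time. A sketch — the class name is mine — that gives the same fault count as the timestamp scan, with each bookkeeping step O(1):

```python
from collections import OrderedDict

# LRU tracking with a doubly linked list (via OrderedDict):
# back = most recently used, front = least recently used.
class LRUTracker:
    def __init__(self, frames):
        self.frames = frames
        self.pages = OrderedDict()

    def access(self, page):
        """Record one page reference; return True on a fault, False on a hit."""
        if page in self.pages:
            self.pages.move_to_end(page)    # unlink + relink at the back: O(1)
            return False
        if len(self.pages) == self.frames:
            self.pages.popitem(last=False)  # evict from the front: O(1)
        self.pages[page] = True
        return True

tracker = LRUTracker(4)
refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
faults = sum(tracker.access(p) for p in refs)
print(faults)  # 8 -- same answer, constant work per access
```

Cheap per operation, but as the lecture points out, "cheap" still means several pointer updates plus locking on every memory reference, which is why real kernels approximate instead.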
So in practice we can't implement true LRU — it's way too slow. What we'll see next lecture is how to implement an approximation of LRU. The idea is, well, LRU is already an approximation of the optimal case anyway, so an approximation of an approximation? Sweet, we can do that. That's pretty much what operating systems people love to do. There are lots of different tweaks, just like with scheduling, to make it run more efficiently and a bit smarter. Specifically, in this course we'll be looking at something called the clock algorithm, which I've shifted to the next lecture, because there's a big rotating clock and lots of fun, and it takes up more time than I have left today. So this lecture gets to be shorter. There are also other policies, like least frequently used, which tracks roughly how often a page is used rather than exactly when it was last used. There's something called 2Q, which uses that queue idea but in a cheaper way. And then there's adaptive replacement cache — there's all sorts of crazy stuff. But what we saw today is page replacement: all the algorithms try to reduce the number of page faults. We saw optimal — good for comparison, but not realistic, because we'd have to look into the future. The random page replacement algorithm actually works surprisingly well; it's better than FIFO because it avoids the worst case, and it doesn't have that anomaly. FIFO you could implement — easy to implement — but it has that weird anomaly where less memory can mean fewer page faults. Kind of weird. And then we saw LRU, which gets close to optimal but is expensive to implement exactly.
So yeah, first lecture back after being sick. We can all just stick around after this if you wanna work on the lab or ask me questions, and I can put the puppy pictures back up if not. Oh yeah, one more thing about the anomaly: at least based on that paper, you can construct a sequence to get any ratio you want — it doesn't follow any particular pattern, and in practice, when it shows up, it's more or less random. All right, so just remember: I'm pulling for you, we're all in this together. Doot, doot.