The few, the proud, this Monday morning. Energy is low in this room, so maybe it's a good day to stand up and stretch a little. Everybody stand up, all ten people who decided to come to class today, stretch a little, get some blood moving. You too, Jason. All right, good morning. I hope you all had good weekends. It's a good day to be in class. Maybe, as a reward for the people who did decide to show up today, I'll just have the midterm be entirely on material covered in class today. Entirely. Why not? I might as well tell a story. Do you guys know who Al Franken is? Former Saturday Night Live writer, and, is he still a US senator from Minnesota? Anyway, he graduated from Harvard, and he spoke at my commencement, and he told a story about a class he had taken. I think the class was in the morning, and he was doing some sort of show at night, so he would come to class and frequently fall asleep. None of you seem capable of doing that, which is good, because I have special things planned for anybody who dozes off in this class.
Anyway, toward the end of the semester he was worried about his grade, and he had also heard that the professor apparently thought he might be on drugs, so he decided to address the situation by meeting with the professor. He went to see the guy, sat down in some sort of anteroom, and it was really warm in there, and you can imagine what happened next: he fell asleep. The professor finally came out, woke him up, and asked what was going on, so Franken explained the situation. The professor was very nice, as we like to be, and said, well, the exam's coming up, and the exam in this class was worth something like 80% of the grade. He told Franken the exam would be entirely on the readings, nothing from lecture, just the readings, so it was still okay, he could still catch up. Franken said thank you, thank you, went to the library, and apparently spent about two weeks reading all the readings for the class. I don't know how it works where you are, but at Harvard no one ever did the readings for classes; maybe you'd do five percent of them if you were lucky, because they just assigned way too much to read. Anyway, he did have this thought in the back of his head the whole time: maybe this guy is messing with me. He thinks I'm a drug addict; maybe he's telling me the totally wrong thing, and the exam is going to be entirely on what was covered in lecture. But it turned out he showed up at the exam, knew everything, and did fantastically. And apparently the professor was actually angry, and said, you got the best grade in the class, and you're the person who comes to class and falls asleep all the time. So anyway, hopefully that won't happen to any of you
guys, but you're welcome to come to me and ask questions about the exam, and I will not make things up. Today we're going to try to finish talking about swapping. Last class we introduced the idea of swapping, and we talked a little about what makes swapping work well. One of the biggest challenges is choosing the page that we're going to move to disk: if we choose the wrong page, we might do a lot of work and get very little reward out of it, but if we choose the right page, the cost-to-benefit ratio can be very good. So today we're going to talk specifically about the components of the cost and the benefit, and then we'll introduce some algorithms for choosing pages that will probably feel familiar. When we looked at scheduling algorithms, we talked about some tricks and general principles we apply when we're trying to predict how resources are going to be used, and those come back and apply again today when we talk about page replacement. All right, a couple of announcements. First, the midterm is Friday. Some of you will take it Thursday because you're going to be out of town; I'm going to write a separate exam for you that will be so much harder. Just kidding, it'll be pretty similar. On Wednesday we're going to review in class. We've been getting questions to the course list about the midterm, and I haven't been answering them. People have been asking things like, can you send me a sample question? There's a simple reason I haven't been answering: I haven't written the midterm yet, so I don't know what it's going to look like. I've been thinking about it, and at some point very soon I will have to write it, so we don't have a repeat of the pre-term kind of disaster. I'm leaning towards not giving a
multiple-choice exam. It may be three or four short-to-long answer questions. What I'd like to see you do is take some of the design principles we've learned in class and apply them. I don't really care whether you remember the nitty-gritty details of, say, the rotating staircase deadline scheduler; I just don't think that's that interesting. What I'm trying to figure out is whether you can take the principles we've applied in class and apply them to new problems. So what you might see on the exam is a couple of design-type questions that introduce a new situation and ask you to discuss how you might apply some of the design principles we've learned in this class to those problems. I just don't care whether you remember what LRU stands for; that stuff is really easy to look up. The test will be less about picayune details of various things and more about the things we have emphasized over and over and over again in this class: how to do good design and how to write good systems software. Any questions about the general theme of the exam? Yes, was there a question? Somebody made a noise. I'm not going to box myself in here; I'm not going to take any options off the table. But right now, no, I'm thinking it will be a better exam if there aren't fill-in-the-blank and multiple-choice questions. This will put more stress on me and the TAs, but they haven't had enough to do recently, so we'll give them some more work. And no, it probably won't be that involved; it'll probably be more, here's a problem, can you identify some general principles to follow in solving it? There are things that have popped up over and over again in class
right, and we'll talk about some of them on Wednesday, but you've heard these things over and over. You've heard, for example, about separating policy from mechanism, a classic systems design approach that lets us write better software. So you may see a question that asks, in this particular context, what would be a way to do that? Or, does this design successfully obey this design principle? It's going to be more things like that. We'll cover this more on Wednesday; I don't want to spend too much time on it today. Also: I assigned everybody a partner over the weekend. At some point I just broke down, I got tired of waiting for you to find partners, so for the people who didn't have one, I assigned you a random partner. Hopefully that works. Please contact the person you were assigned, just to make sure they're a living, breathing human being and not some sort of cyborg student that's been sent to test you or something. Yes, we need to talk, we'll figure this out; I'm hoping there's at least one more person in that situation so I can hook you guys up, but if we have an odd number, we'll figure out what to do about it. This is a classic problem, and we have approaches to it. The Assignment 2 design documents are due Wednesday night. There's no set template for the document; we just want something in PDF format, under two pages, and reasonably formatted. I think I can say that. If we get stuff in six-point font, we're not going to read it; if it requires a microscope to read, or if you have quarter-inch margins or something, then we're just not going to. You guys know what reasonable is; to get to non-reasonable you'd have to break a bunch of things in Word and
click OK to a bunch of warnings, so just don't do that. And one last bit of Assignment 2: we're going to release one more piece this morning, about setting up code sharing between you and your partner, so it shouldn't have slowed you down to this point. Any other questions? Maybe it'll slow down Isaac, because he's way ahead of everybody. All right, let's talk about the stuff from Friday. Friday feels like such a long time ago. Questions about swapping? Was anybody thinking about this all weekend, burning with the desire to ask some really key question about swapping? Maybe nobody remembers what swapping is after the weekend. So what is the goal of swapping? I guess the first question, which is not up on the slide, is: why do we swap? Let me start in the back. Why even bother? The disk is really slow, so why would we move data to the disk at all? Right: we're trying to create the illusion that there's more memory on the system than is actually present. That's the goal; that's why we swap. Swapping is not required, it's optional, but if we weren't able to swap, then when we ran out of core we would just have to start failing allocations. We'd have to stop letting programs execute, or fail malloc allocations within a program, and usually programs don't deal with that very well. So we'd like to relax the requirement that we can only allocate as much memory as the machine actually has. Now, when we do swapping well, the goal is that the system feels like it has memory that is, what? Back here? No, not slow; the memory is fast. We're going to use both the memory and the disk, but we want the system to feel
like it has memory that is as big as the disk but, at the same time, as fast as memory. That's the goal. Memory is fast; disk is slow. On the other hand, if we don't do swapping well, what could the system feel like, Jason? Right: like it has memory that is as small as RAM and, at the same time, as slow as the disk, because the disk is slow. (What did I do with my water bottle? I shouldn't walk around.) All right, so that's our goal. What do we need to do in order to swap a page of memory out? Remember, a page is 4K of data, typically, maybe 8K. I've decided I have a page I want to swap out to disk; what do I need to do to make that happen? Let's start over here. Right: the first thing I have to do is remove the translation from the TLB, from all the TLBs on the system. If I don't do that, it's possible the contents of the page will change while I'm in the process of swapping it out. What's the next thing? Copy the contents of the page to disk. I have to store the contents, because if I don't, the next time the process tries to use the page it's going to be disappointed that all the hard work it dumped into this 4K of memory is now lost forever. What's the next thing I need to do? Yep: update the page table entry. I could tell where you were going; you said update, and that's right. The operating system has to keep track of where the contents of this page are, so when the page moves to disk, the page table entry has to be updated to reflect that. Awesome. Wait, hold on, sorry, these slides are out of order. So what can the system potentially do to increase the speed of this? Swapping out requires this really slow disk I/O, or does it? What can I do to increase the speed of this operation, anybody? Right, do work while the system is idle. And what am I going to do then? It's not
really swapping, because I'm not removing the page, but I can copy the contents of the page to the swap disk, so that when I'm on the swap-out path I don't have to actually write the data; I can just remove the page and update the PTE. That's the goal: if I can do this while the system is idle, then hopefully, when it comes time to allocate memory and I need to swap out pages, I have a lot of clean pages around, meaning pages whose contents match the contents on disk. All right, another grab-bag question about operating systems: when do we typically load content? When I start up a process, it calls exec, and it tells the operating system how it wants everything laid out in its address space: I've got this big blob of code and I want it right there. When does the operating system usually actually load that data into memory? When it's touched, when I really need it, meaning when the process actually uses that page. The operating system will make notes. It's kind of like if you were designing a house, and your architect wrote down exactly how you wanted the house to look, but until you walked up to where a wall was supposed to be, there was no actual wall there. He says, I'll put that wall in when you're going to notice it; when you're about to bump into it, he says, okay, freeze, builds the wall, and then you can continue. That's what we're going to do: we're going to make notes about where things are supposed to go and where they're supposed to come from, but we're not actually going to move any data into memory until absolutely necessary. And why would we want to avoid loading data until the pages are used? Because there's a lot of stuff that is never used, especially code paths that your executable may never actually go down.
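The lazy-loading idea just described can be sketched as a toy simulator. Everything here (the `LazyAddressSpace` class, the `touch` method) is a hypothetical illustration, not OS/161 code: at "exec" time we only make notes about where page contents come from, and we pay to load a page only on its first use.

```python
# Toy demand-paging sketch: pages are noted at "exec" time but only
# copied into memory on first touch. Names are illustrative.

class LazyAddressSpace:
    def __init__(self, segments):
        # segments: {vpn: contents on disk}, noted at exec; nothing loaded yet
        self.backing = dict(segments)
        self.resident = {}          # vpn -> contents actually in memory
        self.loads = 0              # how many pages we really paid to load

    def touch(self, vpn):
        """Called on the first use of a page (the page-fault path)."""
        if vpn not in self.resident:
            if vpn not in self.backing:
                raise MemoryError("bogus address: page %d" % vpn)
            self.resident[vpn] = self.backing[vpn]   # the slow copy, paid only now
            self.loads += 1
        return self.resident[vpn]

# A program with 4 pages of code, but execution only ever touches 2 of them.
aspace = LazyAddressSpace({0: "main", 1: "helper", 2: "error path", 3: "debug code"})
aspace.touch(0)
aspace.touch(1)
aspace.touch(0)                     # already resident: no extra load
assert aspace.loads == 2            # pages 2 and 3 were never paid for
```

The point of the example is the counter: pages that are never touched cost nothing, which is exactly the memory the lazy strategy saves.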
We can avoid loading that code until it's absolutely necessary. On the other hand, I want to explore the fact that there's a trade-off here. What problem does this cause? The benefit is that I can potentially save a lot of memory by not loading it until it's needed, but what's the downside of on-demand paging? There's a delay now: when the process actually uses the page for the first time, I've got to stop it and go get the page. So the trade-off is that I avoid allocating memory until it's absolutely necessary, but when it is necessary, I take a bit of a penalty in extra latency. And we have to swap in a page when its virtual address is used; we'll get to that today. But here's a great question: what makes a page clean? According to this slide, what do I do that makes a page clean? When the page data is copied to the swap disk, at that point the page is clean. And when does the page get dirty? When something writes to it. Now, how do I tell? Remember, I have virtual addresses, I have the TLB, and I have access permissions on my virtual addresses. How would I tell that a process has written to a page? How can I use hardware to help? Well, let's say I have a bit in my page table entry that I'm trying to update; how do I get hardware to tell me when the page is written to, using the mechanisms we've already defined? You're thinking about how to store this data; I'm asking how I get a trigger when the page is written to. Essentially, yes: what I can do, and what you can do on your OS/161 system with your MMU, is that I can load an entry
into the TLB read-only. When I load a page into the TLB, if I want to track when it's being written to, I might load it read-only if the instruction that caused the entry to be loaded is a load. If it's a store, I know the page is dirty immediately, because I'm immediately writing something to it. If it's a load, on the other hand, what I want to know is whether the process ever actually writes any data, because it's possible this is a code page that's never written to, or a data page that just happens to be read-only, that the process never actually writes. So the first time I load the entry into the TLB, I load it read-only. What happens after that, if the process tries to write to the page, is that I get a separate exception: the TLB on your system has a read-only exception, which means there is an entry in the TLB for that virtual page number for that process, but the process is trying to do a store when you've only allowed it to do a load. That mechanism lets us see when pages are being written. If the TLB didn't have this feature, I would just have to mark the page dirty whenever I loaded the entry into the TLB, because after that point I have no idea what the process does to it; it could do stores, it could do loads, but I don't see the mixture. But if I have the ability to load things into the TLB read-only, then I can distinguish between pages that are only read from and pages that are written. Does that make sense? Okay, I don't have a slide for that. All right, so when do I need to swap in a page? Again, when the virtual address is used, the operating system had better make sure that virtual address looks like memory again, regardless of where the data actually is.
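The read-only-first TLB trick described a moment ago can be sketched as a toy fault handler. The structures and names here are hypothetical, not the OS/161 API: the first store to a read-only entry traps (the "TLB modify" exception), the handler marks the PTE dirty, and then reloads the entry writable so later stores run at full speed.

```python
# Sketch of dirty-page tracking via read-only TLB loads (illustrative names).

page_table = {7: {"dirty": False}}   # one PTE, for virtual page 7
tlb = {}                             # vpn -> {"writable": bool}

def tlb_load(vpn, writable):
    tlb[vpn] = {"writable": writable}

def load(vpn):
    if vpn not in tlb:
        tlb_load(vpn, writable=False)        # reads never dirty the page

def store(vpn):
    if vpn not in tlb:
        tlb_load(vpn, writable=False)        # TLB fault path: load read-only
    if not tlb[vpn]["writable"]:
        # "TLB modify" exception: now the OS knows a write happened
        page_table[vpn]["dirty"] = True
        tlb_load(vpn, writable=True)         # further stores proceed silently

load(7)
assert page_table[7]["dirty"] is False       # read-only so far: page stays clean
store(7)
assert page_table[7]["dirty"] is True        # first store trapped and marked it
```

Note the design choice: only the *first* store pays for an exception; after the PTE is marked, the entry is writable and hardware never traps for that page again.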
If the page is on disk, what are the steps? There are more steps than there were before. We'll start in the back. First step: I need to stop the instruction from executing. The instruction can't execute until I've restored the virtual address to something that looks like memory; right now it doesn't point anywhere, it just points to some part of the address space that isn't actually valid. What's next? A couple of things happen sort of in parallel here. First, I need to allocate space for the page: I need to find a page in memory to hold the incoming contents. And what might that require? Swapping out: if I'm low on memory and don't have any free pages sitting around, I might actually have to remove a page to make room. Next, I need to locate the page on disk, using the information I stored in the page table entry when I swapped it out. Then, once I've found the page on disk and I have somewhere to put it: copy it in, copy the contents of the page from disk. Three more steps: update the page table entry, because if I fault on this page again I need to know that it's in memory now instead of on disk; load the TLB; and finally, restart the instruction, which can now proceed. Any questions about this? Sure, yes: hopefully, when we get to step two, there are either unallocated pages or clean pages around that I can use. As we said before, operating systems will often try to keep some reserve of completely unallocated pages around at all times, so even when the system has only a little memory left, it's constantly going around to processes and pushing some of their pages out to disk.
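The swap-in steps above, plus the eviction they may trigger, can be sketched end to end. This is a one-frame toy with hypothetical structures (a dict for swap, the victim's VPN doubling as its swap slot), not real kernel code; the comments map each line back to the steps from the lecture.

```python
# Toy swap-in path: stop the instruction (we're in the fault handler),
# make room (evicting if needed), copy in from swap, update the PTE,
# load the TLB, restart the instruction.

swap = {10: "page ten bytes"}                # swap_slot -> contents on disk
memory = {0: "page eleven bytes"}            # frame -> contents in core
frames = {0: 11}                             # frame -> vpn currently held
page_table = {10: {"present": False, "swap_slot": 10},
              11: {"present": True, "frame": 0}}
tlb = {11: 0}

def evict(frame):
    victim = frames[frame]
    tlb.pop(victim, None)                    # 1. remove the translation
    swap[victim] = memory[frame]             # 2. write contents to swap
    page_table[victim] = {"present": False, "swap_slot": victim}  # 3. update PTE

def page_fault(vpn):
    pte = page_table[vpn]
    frame = 0                                # only one frame in this toy
    if frames.get(frame) is not None:
        evict(frame)                         # allocate space: may need eviction
    memory[frame] = swap[pte["swap_slot"]]   # locate on disk and copy in
    page_table[vpn] = {"present": True, "frame": frame}   # update the PTE
    frames[frame] = vpn
    tlb[vpn] = frame                         # load the TLB
    # on return, hardware restarts the faulting instruction

page_fault(10)
assert page_table[10]["present"] and memory[0] == "page ten bytes"
assert page_table[11]["present"] is False    # page 11 was evicted to swap
assert swap[11] == "page eleven bytes"
```

Note how expensive the worst case is: one fault here costs two disk transfers (a write for the victim plus a read for the incoming page), which is exactly why keeping clean or unallocated pages in reserve pays off.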
It's constantly asking, okay, what's a page you haven't used for a while? I'll put that on disk for you. It's constantly doing this background bookkeeping, because when you launch a process there's a massive spike in the number of pages the system might need to use. If I don't have much memory left on the system, if I've been letting everything expand into all the available space, then when I launch something I have to do all this work to make enough room for the new thing. So as far as I'm concerned, it's really the fact that launching processes creates this huge discontinuity in the number of pages in use that's the reason for keeping some sort of buffer, and we do that by gently trimming processes over time. Okay, any other questions about swapping? We reviewed swapping; today we're going to talk about how we pick the page to swap. Any other questions about the mechanism? This is another mechanism-versus-policy split in operating systems. Really, everything we've talked about up until now has been mainly mechanism: how do we translate virtual addresses, how do we move content between the disk and memory? Today, like we did with threads, we're finally popping all the way up to the point where we can talk a little bit about policy. But before we go on, let me introduce a useful piece of terminology that's important for you to understand; there are two little bits of grab-bag stuff here, and then we'll get into page replacement. First: normally we use two different terms to distinguish between two different kinds of virtual-memory-related faults or exceptions. The first is called a TLB fault, and a TLB fault is what it sounds like: it means that the
TLB doesn't have an entry for this virtual address, and the way I handle a TLB fault is by loading an entry into the TLB. A page fault, on the other hand, is a little more serious. A page fault means either that the contents of the page are not in memory; or that the page is uninitialized, say a new heap page the process is using for the first time; or, in certain cases, that the process is trying to access an address that is just totally, as they say on Car Talk, bogus, some sort of bizarre faulting address, and the result is that I'm going to kill the process. So three things can trigger a page fault: bogosity, swapping, and new pages that haven't been initialized. As you'll see, on some level every page fault is preceded by a TLB fault; page faults are essentially a subset of TLB faults. On the other hand, not every TLB fault creates a page fault: if the page contents are in memory, I can satisfy the TLB fault without creating a page fault. The reason is that if the contents of the virtual page are not in memory, I can't have a translation that points to memory, so if I have a page fault there's definitely not an entry in the TLB. But if the page is in memory and I have a TLB fault, it's possible I can satisfy it without raising a page fault. The reason this is important to understand is that there are architectures that allow hardware to assist the operating system in managing the TLB. When we've been talking about TLB management in this class, about loading entries into the TLB, we've been talking about the operating system doing it: an exception occurs, the operating system takes control and puts an entry into the TLB.
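The containment between the two fault types can be made concrete with a toy translation function (hypothetical structures, illustrative only): every access that misses the TLB is a TLB fault, but only the misses whose page is absent from memory escalate into a page fault.

```python
# Toy lookup showing TLB faults vs page faults: page faults are a
# subset of TLB faults, never the other way around.

tlb = {}                                     # vpn -> frame
page_table = {1: {"present": True,  "frame": 5},
              2: {"present": False, "swap_slot": 9}}
stats = {"tlb_faults": 0, "page_faults": 0}

def access(vpn):
    if vpn in tlb:
        return tlb[vpn]                      # TLB hit: no OS involvement
    stats["tlb_faults"] += 1                 # TLB fault: no entry for this vpn
    pte = page_table[vpn]
    if not pte["present"]:
        stats["page_faults"] += 1            # page fault: contents not in memory
        pte.update(present=True, frame=6)    # pretend we did the slow swap-in
    tlb[vpn] = pte["frame"]                  # satisfy the TLB fault
    return tlb[vpn]

access(1)            # TLB fault only: the page was already in memory
access(1)            # TLB hit
access(2)            # TLB fault that escalates into a page fault
assert stats == {"tlb_faults": 2, "page_faults": 1}
```

The counters capture the lecture's claim: the page-fault count can never exceed the TLB-fault count, because a page fault is only ever discovered while servicing a TLB fault.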
On some systems there's a further optimization, which essentially says that hardware can walk the page tables: hardware can look up entries in the page table, and if there's a valid entry pointing to a physical page, hardware may be able to load it into the TLB without ever trapping into the operating system. So what do you think the pros of this approach are? We've been talking about every TLB fault triggering a kernel exception, with the kernel taking control and handling it, and what I'm telling you is that on some systems the only time the kernel gets control is when there's a page fault; TLB faults are handled transparently by hardware. So what's a pro of this approach? It's fast. We avoid the overhead of trapping into the kernel: saving all the exception state, waking up the kernel, all the instructions that have to be executed every time there's a TLB fault. Hardware is clearly faster here. What are some of the cons? What does this mean about the hardware? Okay, so we can't change the algorithms. What do hardware and software have to agree on to make this work? Specifically, what data structure does the hardware have to understand? The page tables. Remember, we had a whole lecture on all these different ways to implement page tables. Once you tell the hardware, this is how page tables are set up, you're stuck. Hardware is dumb, or rather, hardware is like a certain kind of smart person who only knows how to do a few things really, really well, but is completely inflexible. So the page tables have to be set up in a way the hardware can understand, because the hardware is going to walk them without ever involving the
operating system. The x86 architecture, for example, is one architecture where the page tables are accessed directly by hardware, meaning the operating system doesn't get control unless there's a page fault. And again, as I said before, with a hardware-managed TLB the operating system does not see TLB faults; that's the whole point. I let the hardware understand the page tables, and if the page is in memory and there's a mapping in the page table, the hardware can find it without involving the operating system at all; the hardware only alerts the operating system if it looks in the page tables and doesn't find an entry. Yes, good question. I think, if I remember correctly, and my former advisor Matt has a great lecture he used to do on the x86 memory management architecture, which we may or may not do after spring break, I haven't decided; the x86 is really disgusting and gooey, so it's not a fun thing to talk about, and it ends up being really confusing, though he does a really good job of simplifying it as much as possible. If I remember correctly: exactly, I have a separate page table per process, so on a context switch what the operating system has to do is at least tell the MMU, here's the address of the first-level page table for this process. I think the x86 uses a two- or three-level multi-level page table, so the idea is that you tell the hardware the location of the first page table, and after that the hardware can walk it itself. Exactly: every time I have a context switch I need to update the hardware with that little bit of information, just so it can get started walking, and once I do that, the rest works. Great question.
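The hardware walk just described can be sketched in miniature. This is a toy with made-up field widths (10-bit indices), not the real x86 format: the OS's only job is to hand the "MMU" the root of the current process's table, the way x86 loads CR3 on a context switch, and the two-level walk itself involves no OS code.

```python
# Sketch of a hardware-style two-level page table walk (toy structures).

def walk(root, vpn):
    """Split the VPN into outer and inner indices, then walk both levels."""
    outer_idx, inner_idx = vpn >> 10, vpn & 0x3FF
    inner = root.get(outer_idx)
    if inner is None:
        return None                  # no inner table: raise a page fault to the OS
    return inner.get(inner_idx)      # frame number, or None -> page fault

# Two processes, two separate page tables; context switch = swap the root.
table_a = {0: {1: 42}}               # process A: vpn 1 -> frame 42
table_b = {0: {1: 99}}               # process B: same vpn, different frame

current_root = table_a               # like loading CR3 for process A
assert walk(current_root, 1) == 42
current_root = table_b               # context switch: point hardware at B's table
assert walk(current_root, 1) == 99
assert walk(current_root, 5) is None # unmapped: this is when the OS gets control
```

The rigidity the lecture mentions is visible here: the split `vpn >> 10` / `vpn & 0x3FF` and the dict-of-dicts layout are baked into `walk`, so once "hardware" implements this, the OS cannot choose a different page-table organization.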
Don't worry: as I point out at the bottom of the slide, one of the reasons we've been talking about this is that your system, OS/161, has what's called a software-managed TLB. You get to set up the page tables however you want, and you will see every TLB exception; the hardware is not going to help you here at all. For your purposes this is a good thing, because it gives you more flexibility in designing your page tables. Well, maybe it's not a good thing, but actually it probably is: if you had to debug hardware walking your page tables, you'd probably go insane, because the hardware wouldn't give you much visibility into what you were doing wrong. Okay. So again, in order to swap out a page, we need to find a page to remove. Sometimes we call this process swapping out, sometimes we call it page eviction: a page is being evicted from memory, thrown out on the street. In order to swap out a page we definitely have to choose which page to move, and, like I said, swapping in, creating space so that I can swap in, also frequently requires choosing a page to move out, especially once I get to the point of what's called memory pressure. Memory pressure means that the combined application usage of memory is more than what will fit in core. If you put 16 gigabytes of RAM in your machine, you may never have memory pressure; if you have one or two gigabytes, you might be under memory pressure all the time. And on some level, making a choice about which page to swap is the operating system trying to optimize a cost-benefit calculation. So what are the costs of swapping a page out? There are a couple of different kinds of costs involved here. John? Right, this is slow, exactly, I've got time,
so it's essentially time and bandwidth. I've got to stall some process for long enough to move the data to disk, and I'm also doing I/O. Have you ever had a system that's just really unresponsive, where you can hear the disk going constantly? That's what's called thrashing, and we'll come to it in a few slides, but it means there's so much memory pressure, and the system is doing such a bad job of allocating memory, that the disk is going constantly; I may actually have saturated my disk bandwidth trying to keep up with all the paging. So: time and bandwidth. Now, what's the benefit? What do I get from evicting a page and moving it to disk? As specifically as I can put it, the benefit is that I get to use one page, 4K of memory, for as long as the other page stays unused. I throw a page out, and as long as the process doesn't use the virtual address that points to that page, I've got some extra memory I can play with: as long as the page on disk remains unused, I've got an extra 4K. So for a specific page, this is the cost-benefit calculation: the cost is the I/O and the time; the benefit is some extra memory, and the benefit is time-dependent, because I get to use that memory only until the page is used again. On some level, optimizing this cost-benefit trade-off means we can do two things. One is to minimize the cost, and there are a lot of tricks operating systems play to do that; for example, I might set up my swap partition in a way that lets me do the I/O required to move pages to disk more efficiently. That would reduce the cost a little bit, and I would
reduce my cost-benefit ratio. But mainly what we're going to talk about today are algorithms that focus on maximizing the benefit. So again, I can improve this by minimizing the cost or maximizing the benefit; most of what I'll try to do is maximize the benefit. Minimizing the cost is mainly clever engineering tricks that reduce the overhead of the I/O required to swap. All right, so a complementary description of this goal, which is way at the bottom of the slide, sorry, is this: remember I talked about page faults? If a page is on disk, accessing it will definitely produce a page fault. And on some level another goal here, a complementary way of describing this that you'll hear when people talk about these algorithms, is minimizing the page fault rate. If I'm generating lots of page faults, it means I'm spending lots of time moving stuff back and forth to disk, and I'm slowing the system down. If I can minimize, or reduce, the page fault rate, it means I'm doing less I/O, and under the same conditions the processes are actually able to use the memory that is in core and operate faster. All right, thrashing. Let me just talk about what happens when things get really, really bad. Thrashing, as far as I can tell, and according to Wikipedia, doesn't really have a scientific definition; it's one of those things, like the famous Supreme Court Justice said about pornography, where you know it when you experience it. Okay, so thrashing, and you guys have probably experienced thrashing, is a colloquialism, and it describes when the computer's virtual memory system is in a constant state of paging (paging is a synonymous term for swapping), rapidly exchanging data in memory for data on disk, to the exclusion of most application-level processing. This causes the performance of the computer to degrade or collapse.
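To make the page-fault-rate framing concrete, here is a toy simulator. This is illustrative Python, not OS/161 code; the reference string, the frame counts, and the use of FIFO as a placeholder eviction policy are all assumptions made up for this sketch. It replays a string of page references against a fixed number of physical frames and counts the faults:

```python
from collections import deque

def count_faults(refs, nframes):
    """Replay a page reference string against nframes physical frames,
    evicting the oldest resident page (FIFO) when memory is full.
    Returns the number of page faults."""
    resident = set()
    order = deque()          # arrival order of resident pages
    faults = 0
    for page in refs:
        if page not in resident:
            faults += 1                            # this access faults
            if len(resident) == nframes:
                resident.discard(order.popleft())  # evict oldest page
            resident.add(page)
            order.append(page)
    return faults

# A made-up workload touching four distinct pages over and over.
refs = [0, 1, 2, 0, 1, 3, 0, 1, 2, 3] * 10
print(count_faults(refs, 4))   # 4: working set fits, only compulsory faults
print(count_faults(refs, 2))   # 100: under pressure, every reference faults
```

Shrinking the frame count models memory pressure: with four frames the working set fits and only the four compulsory faults occur, while with two frames this particular access pattern faults on every single reference, which is a small-scale picture of thrashing.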
So some of you have probably had this sort of thing happen. I remember once I somehow launched a fork bomb on my machine at work, accidentally, not on purpose. We talked about fork bombs before, right? A fork bomb is something that causes an exponential increase in the number of processes on the system, and within 10 seconds the thing was just sitting there, and eventually I think I had to unplug it and turn off the power to most of eastern Massachusetts to get the thing to come back up. Okay, so anyway, this is when things really, really get bad: if we do a bad job of choosing pages to evict, the system will thrash hard, and you will essentially be forced to shut it down and start over. Okay, so again, how do we maximize the benefit? We get the use of 4K for as long as the page stays on disk. So how do we maximize this benefit? What's the one thing that we can control here, the one thing that we can maximize? Maximizing the amount of time before that page is used again, exactly right. We want to pick the page to evict that's going to remain unused the longest. This is the goal of these algorithms. Okay, so first let me point out one little corner case here; we talked about this last class. What is, like, the dream page of a page replacement, or page eviction, algorithm? The page that will never be used again. That's the thing these guys dream about at night: if you could only figure out which pages the process was never going to use again, that would be great. Now, I was thinking about something when I was preparing these slides: why doesn't the process tell the kernel, hey, I'm done with this page? First of all, an interesting question for people who will be working on assignment 2: are there mechanisms to do this? Is there any way for a
process to surrender a portion of its virtual address space? Yeah, no, but I'm saying: what's the interface I would use to tell the kernel that this page will never be used again? There is one that I know about. Does anybody know what it is? Did anybody implement it for assignment 2, if you've gotten way ahead? So I think sbrk: it can extend the break, but it can also reduce the break, right? So sbrk, I think, allows the process to return memory. I can ask for more heap, or, if my allocator is clever, I might actually tell the kernel that I'm not using some of that heap. But in general these mechanisms are few and far between, and I would argue, well, what do you think the reason for that is? Why aren't there these great interfaces allowing processes to tell the kernel, oh, I'm done with this page? Okay, so that's one problem: there doesn't really seem to be a benefit to the process in using that interface. But what's a more fundamental problem? What's that? No, let's say I can only free pages in my virtual address space, so I could just tell the kernel, this page of my virtual address space I'm not going to use again. Well, okay, so the kernel can't trust a process, though I can certainly use it as a hint; I can say, well, if it told me that, then maybe it's right. I would argue the more fundamental problem is that processes don't know. They just don't know; they don't think about this. They just say, oh, give me more memory, more memory. There are no mechanisms in system design for even expressing this; processes just don't have a clue. So it would be interesting to think about: could you do this well? Would there be a way to build this into the compiler at some level, so that a process had hints it could give to the system about things it was done with? But I think that would be so hard that it wouldn't
even be worth it; we would do better by guessing, which is kind of what we're going to do. All right, so you guys remember when we did scheduling algorithms, one of the things we talked about is what we want to know about a page, or, well, back then it wasn't pages, it was threads, right? When we talked about scheduling algorithms, we talked about threads: we talked about priority, their usage of resources, how long they're going to run before they yield or block, etc., etc. If I could predict the future, paging is a little easier, right? What would I want to know about a page? Isaac? No, I've got my crystal ball, right, so what would I like to know about a page? When it's going to be accessed next. If I could predict the future, and I knew that this page isn't going to be accessed for another two hours, while this other page that I'm considering is going to be accessed two minutes from now, then I'm done; this is easy. I order pages by how long they're going to stay unused, and I remove them in that order. And it's really, really easy to see that this maximizes our cost-benefit calculation: the cost is the same across all the pages I'm considering, and the benefit is greatest the longer the page stays out of memory, because for all that time I get an extra 4K. So, like we said with thread scheduling, this is the optimal schedule. We think about this as an optimal schedule; it's not realizable, but it's a good metric to use when you're comparing different approaches. A lot of times in systems research, what we do is construct an optimal approach and then compare our approach against the optimal: if you can show that the optimal provides a hard bound, and your approach gets, you know, 95% of the
optimal performance, you're doing pretty good. Okay, saying this schedule is difficult to implement is an understatement: it's impossible. We don't have crystal balls, right? Okay, back to one of those classic system design principles, one that I think you'll see I drill into your heads a little more insistently than some of the others. When we can't predict the future, what do we do? Use the past to predict the future. And again, when we can't predict the future, what do we do, everybody together? Use the past to predict the future. So on some level, this is all I have: all I have is the past. I've just lived it, with my memories of what pages did; I don't know what's going to happen in the future. So doing intelligent page replacement requires solving three problems. First, figuring out what about the past I care about; that's problem number one. The second problem is figuring out how to collect statistics, or track information, about that characteristic. And the third is: how do I store that information and access it appropriately? Okay, so now, what could go wrong here? Essentially what I've said is that I can't predict the future, so I'm going to use the past to predict the future, but I need to know something about the past, so I need to keep track of some things that are happening. If I don't do this well, what could happen? I'm trying to make an intelligent decision about which page to evict, but what's the trade-off here? What could start to overwhelm my system if I'm not careful? I want somebody on that side of the room to guess. No? Anybody in this area? Well, let's say I can do a better job of picking pages, but let me give you guys a hint. If I'm not careful, just the process of collecting this information
has its own overhead, and it can slow things down. Let's say collecting the information requires that the kernel translate every virtual address every time it's accessed. Well, now I might as well not have a TLB or an MMU; I'm in big trouble, and the system will be so slow that your users are going to be begging you not to be so, quote-unquote, smart. They're going to say, just do something dumb; you're killing me, man. All right, what about storing statistics? What's the other problem with storing statistics? Memory, right. This might consume a lot of space, and if I consume so much memory storing statistics that I don't have enough memory left for the things the processes are trying to do, for the data that's actually visible to the processes, then this thing falls down. So these are my trade-offs: I have to be careful, I have to make sure statistics collection is lightweight and stored statistics are compact. Okay, back to our scheduling algorithms: who remembers the simplest possible page replacement algorithm? I need a page to evict, I've got some pages that are in memory; what's the simplest way to choose one? Choose one at random: I just pick a random page. Okay, so what are the pros of this approach? Very similar to thread scheduling: it's simple, no overhead, no state to store. It's great, right? And here's the other thing I want to impress on you, even if we don't get any farther today: random algorithms are frequently also great tools for evaluating your clever approach. Because you may think, man, I've got this great idea for a page replacement algorithm, I'm going to write out this code and collect these statistics, and I've got this really clever, I don't know, artificial-intelligence online learning algorithm, etc., etc. And if it doesn't outperform random, or if it just barely
outperforms random, then forget it. Random is usually a good baseline; it's about the simplest thing you can possibly do, but it gets the job done. And what's the con here, which I basically just said? Maybe we can do better. Maybe; it's potential. This approach doesn't use any information about the past; it just makes a decision completely at random. All right, so who can get ahead of me a little bit? Again, my random approach doesn't use any information, but I want to use the past to predict the future. That's my mantra; that's what I do. So what information about this page's past could potentially be helpful to keep track of? When it's accessed, right: how recently the page has been accessed. And so least recently used, or LRU, chooses to evict the page that has not been accessed for the longest period of time. If I knew this about the pages on the system, I'd order them and say, okay, this page hasn't been touched for an hour. That page may never be touched again, but in any case it's probably not super important, so I'm going to move it out to disk. Again, there are no guarantees here. The process may be about to use that page; it may be thinking, oh wow, I haven't touched that page for an hour, it's time to get back to it, right as you're moving it to disk. So there are no guarantees, but this is maybe the best we can do. What we're hoping is that pages that are quote-unquote cold, that are not being actively accessed, may stay cold: that they may not be accessed again for a while. All right, so what are the pros of this approach? This might be as good as we can do, at some level. There are more clever things we can potentially think about doing, and there are more sophisticated approaches, but on some
level, on a page-by-page basis, without incorporating more information, this is probably close to the best we can do. And what are the cons? What does this require the system to do? There are two problems here: first, how do we tell how long it's been, that is, how do we tell when a page is accessed? And second, how do we store how long it's been? So we need to do two things to implement this: we need to somehow figure out when a page is actually being used, and we also need to store that information. So let's talk about how we collect the statistics; we talked about this a little earlier in class. At what point does the operating system know for sure that a page is being accessed? A TLB fault, right. A TLB fault is the TLB telling me: hey, kernel, this process tried to use this virtual address. At that point I know for certain that there's an instruction that tried to execute that referenced something on that page; at that point I know for sure. Now, on architectures where the operating system doesn't see TLB faults, what I usually need to do is rely on the hardware to help me store this information, because otherwise, when pages are in memory, I have no way to know what the process is actually using. But when I get a TLB fault, I know that the process accessed an address on that page. Okay, now what does that not tell me? Let's say a process starts to run and generates a bunch of TLB faults at first, because it's paging entries into the TLB, and then 10 milliseconds go by and another process starts to run. Which two pages can I not distinguish in the set of pages that the process used and loaded into its TLB? Can I distinguish between a page that was used once and a page that was used a million times? No. All I see is the first page access; after that, the accesses are completely transparent
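That first-access-only visibility can be sketched with a toy model of a software-managed TLB. This is hypothetical Python with invented names like ToyTLB, not code from any real kernel: the kernel runs only on a miss, sets a referenced bit when it loads the translation, and is blind to every hit that follows.

```python
class ToyTLB:
    """Toy model of a software-managed TLB: the kernel only runs on a
    miss, so it can set a per-page 'referenced' bit when it loads a
    translation, but it never sees the hits that follow."""
    def __init__(self):
        self.entries = set()       # pages with a loaded translation
        self.referenced = set()    # all the usage info the kernel has
        self.kernel_traps = 0

    def access(self, page):
        if page not in self.entries:   # TLB miss: trap to the kernel
            self.kernel_traps += 1
            self.referenced.add(page)  # the one bit the kernel learns
            self.entries.add(page)
        # On a hit the hardware translates silently; the kernel sees nothing.

tlb = ToyTLB()
tlb.access('A')                  # one use of page A
for _ in range(1_000_000):
    tlb.access('B')              # a million uses of page B
print(tlb.kernel_traps)          # 2: one trap per page, nothing more
print(sorted(tlb.referenced))    # both pages look identical to the kernel
```

The point of the sketch is exactly the lecture's: a page touched once and a page touched a million times each cost the kernel one trap and one referenced bit, so the two are indistinguishable.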
, and that's the way I want it to be, because I don't want to have to stop the kernel every time a process accesses a page, and even hardware data collection at that granularity would potentially be expensive. I want those translations to complete immediately and let the instructions execute; I don't want to keep track. And again, I just told you why: too slow. I cannot record every page access; that would be infeasible, and it's also not clear that the benefit would be that high. Now let's talk about storing data about page accesses. How do I represent time in the system? What's that? A timestamp, right. And what's a timestamp? It's a number. And what are numbers? Series of bits, and the more bits I have, the more time information I can store. But imagine I try to store some 32-bit-wide timestamp for every page on the system when I load it into the TLB. Remember, our page table entries used to be 32 bits; now they're 64 bits, so in one fell swoop I've doubled the size of my page table entries. It's not clear that this is going to be a win. What I could try to do instead is take some subset of the unused bits in the page table entry and jam a timestamp into them, but now I've gone from 2^32 ticks of granularity, where a tick could be any unit of time as long as it's consistent, down to, with, say, eight spare bits, only 256 ticks. So that's potentially much too coarse. There's a trade-off here between the width I use to store this information and the granularity I get. Final question before we break for today: how do I find the least recently used page? So now I'm in the algorithm, and I've got physical pages on the system. And actually, one of the data structures you'll have to design for assignment 3 is a way to map from physical pages to page table entries, but let's say I can access all the page table
entries for the physical pages that are in memory, in core. How do I find the least recently used page? I've got to design some sort of gnarly data structure, like a heap or something, that allows me to store these so that I can really, really quickly find the one with the longest time since it was last accessed. So this is another challenge: I need some kind of efficient data structure holding all the physical pages, and I have to search it on every eviction. This is potentially fairly expensive, and it could be hard to get right. All right, let's see here, what time is it? All right, let me just get through clock LRU and then we'll be done, I promise. Okay, so clock LRU is a fairly simple, stripped-down, what I would call LRU-like algorithm. And clock LRU is nice for two reasons: first, it only stores one bit of usage information, no timestamps, just one bit; and second, it stores pages in a very efficient, circular-list sort of data structure that it sweeps on each eviction. Here's what clock does, and I have a picture to show it on the next slide. I arrange all the pages in physical memory in a fixed order; maybe I just go by physical address. When I'm looking for a page to evict, what I do is this: if the access bit is clear, if the access bit has not been set, I evict that page; that's the page I choose. If the access bit is set, I clear that bit and move on. When do access bits get set? Can anyone remind me? When I load the entry into the TLB, right: when I load a translation into the TLB, I set the access bit on the page. And this is how the algorithm works, so let me show you. The reason we call it clock is that we think of the set of pages in physical memory arranged in a circle. The green ones I'm using here show the ones where the bit is clear; these are pages that have not
been accessed since the clock hand last came around. So my clock hand starts here. What do I do with this page? This one has been used, so what am I going to do to it? I clear its bit and I go on. Okay, what do I do to this guy? Evict him; in the first round of my algorithm, that's the page I choose to evict. Okay, what happens the next time I run the algorithm? Which page will I evict? The next one here. And what do I do to these two? Clear their bits, right. So I walk through here, I mark these guys as clear, then this guy is the next one I mark as clear, and that one I throw out. Okay, so I will come back to this; maybe I'll run through it very quickly again on Wednesday before we review, because I want you guys to see it. But this is essentially the end of the material for today. I will see you guys on Wednesday for the midterm review.
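As a postscript, the clock sweep from that walkthrough might be sketched like this. This is a toy Python model with invented names, not OS/161 code; each frame carries just the single access bit the lecture describes, and the example frames mirror the slide's picture (bit 1 means used since the last sweep, bit 0 means clear).

```python
def clock_evict(frames, hand):
    """frames: list of [page_name, access_bit] in a fixed circular order.
    Sweep from position `hand`: clear any set bits along the way, and
    evict the first page whose bit is already clear. Returns the victim's
    name and the new hand position. The caller would reuse that frame and
    set the bit again when the page's translation is loaded into the TLB."""
    while True:
        name, bit = frames[hand]
        if bit:                               # used recently: second chance
            frames[hand][1] = 0               # clear the bit, keep sweeping
        else:                                 # bit clear: this is the victim
            return name, (hand + 1) % len(frames)
        hand = (hand + 1) % len(frames)

# Bit 1 = used since the hand last passed, bit 0 = clear (green on the slide).
frames = [['A', 1], ['B', 0], ['C', 1], ['D', 1], ['E', 0]]
victim, hand = clock_evict(frames, 0)     # clears A's bit, evicts B
print(victim)                              # B
victim, hand = clock_evict(frames, hand)  # clears C and D, evicts E
print(victim)                              # E
```

Note how cheap each step is compared to true LRU: one bit per frame instead of a timestamp, and a linear sweep of a fixed circular order instead of a search structure maintained on every access.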