Picking up where we left off last time, we're going to move on from TLBs into demand paging. Just to keep you apprised of where we're going: we talked about caching as applied to address translation. A virtual address comes out of the CPU, and the TLB is a type of cache — in this case it asks, "do I know about that virtual address?" If the answer is yes, we very quickly get a physical address that we can take to physical memory to fetch the data. On the other hand, if the translation isn't cached, we have to go through a translation process with the MMU, which often involves a table walk through multiple levels of the page table. When we finally get that result, we bring it back to the TLB, which at that point has a physical address and will cache it for future accesses to that particular page. And then there's often an untranslated path for the kernel.

The question is really one of page locality: does it actually exist? Because if we don't have locality, then it doesn't matter that the TLB is a type of cache and is much faster when things are cached — without locality we get essentially no benefit. We talked about the fact that instruction accesses have definite locality, as do stack accesses and some data accesses, so this can work well. The TLB can also be a multi-level cache. There was an interesting question that came up on Piazza after last time: why do we have both a TLB and a cache? The answer is that the TLB is caching address translations, whereas the actual cache is caching the data from physical memory. Okay, so I just wanted to point out — here's a modern chip, and the TLB is very important: there's a TLB next to the cache for data, there's a TLB next to the cache for instructions, and there's actually a second-level set of translation caching to provide additional performance, to avoid having to walk the page table unless absolutely necessary.
Okay, so now, very quickly walking through how this all works: last time we showed that if we have a virtual address and a two-level page table, we grab some bits from the virtual address and look them up in the first level of the page table, which gives us the second-level page table. We use the next 10 bits — in this 10/10/12 example — to look up an entry there, and that finally gives us a physical page. So we started with this red block, which might be 20 bits in one instance — that's a virtual page number — and it gives us a physical page number, also 20 bits. We copy the offset over, and suddenly we have our physical address. That physical page points to a 4K chunk in memory — this is DRAM over here — and the offset points to an actual place within that page.

Then, just to pull it all together: we potentially walk the page table, but our TLB might actually have this translation cached, and the way that works is we take this 20-bit virtual page number and look it up in the TLB, which gives us our physical page directly. That's a much faster path, and the TLB is on-chip and very close. Finally, to pull the data cache back into play: we can treat this physical address as a tag plus index plus byte offset. The index looks up a line in the cache, the tag gets compared, and if it matches, the byte offset picks a byte out of the regular cache rather than having to go to DRAM. All right, so that's everything in one little animated slide. Any questions on that, or should we keep going?

A question: can the physical page number allow the physical address to be larger than 32 bits? It could, depending on how you set up your architecture. If it's an architecture that can deal with more than, say, 20 bits' worth of 4K pages, then potentially the physical address could be bigger. There were some intermediate x86 architectures, halfway between 32-bit and 64-bit, where there was the potential to have a somewhat larger physical address than virtual address. That's a bit of an anomaly now — these days the physical address pretty much has the same number of potential bits as the virtual address — but as you say, it's possible. And if that were to come up on an exam for some reason, just use the bit widths we tell you you have.
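To make the 10/10/12 arithmetic from that walkthrough concrete, here's a minimal sketch of the two-level walk in C. The function and the flat `level1` pointer are hypothetical — real hardware does this walk inside the MMU, and `page_fault` is just a stand-in hook — but the bit manipulation is exactly the split described above.

```c
#include <stdint.h>

#define PAGE_SHIFT  12                /* 4 KiB pages: 12 offset bits */
#define LEVEL_BITS  10                /* 10 index bits per level     */
#define PTE_PRESENT 0x1u

typedef uint32_t pte_t;

extern uint32_t page_fault(uint32_t vaddr);   /* hypothetical handler hook */

/* Walk a two-level 10/10/12 page table to translate vaddr.
 * 'level1' is a hypothetical flat pointer to the top-level table. */
uint32_t translate(uint32_t vaddr, const pte_t *level1) {
    uint32_t l1  = (vaddr >> (PAGE_SHIFT + LEVEL_BITS)) & 0x3FFu; /* top 10  */
    uint32_t l2  = (vaddr >> PAGE_SHIFT) & 0x3FFu;                /* next 10 */
    uint32_t off =  vaddr & 0xFFFu;                               /* low 12  */

    pte_t e1 = level1[l1];
    if (!(e1 & PTE_PRESENT))
        return page_fault(vaddr);       /* no second-level table present */

    const pte_t *level2 = (const pte_t *)(uintptr_t)(e1 & ~0xFFFu);
    pte_t e2 = level2[l2];
    if (!(e2 & PTE_PRESENT))
        return page_fault(vaddr);       /* page not resident             */

    return (e2 & ~0xFFFu) | off;        /* 20-bit PPN + copied offset    */
}
```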
Then we kind of ended with this last time. In the good case, say an instruction address comes out as a virtual address, it's in the TLB, and we immediately get a quick translation. We look in the page table — or rather, the cached entry from the page table — and under that circumstance the page is marked present. What's good about that is the page number picks a place in physical memory, and we can go ahead and complete the access. Now, if the page table entry marks the page as not present, we get a different result. The instruction's address goes to the TLB, and the entry might or might not be cached there; if it's cached, we know immediately what the page table entry is, otherwise we have to go to the page table in physical memory. In either case the entry is marked not present, we get a page fault, and that causes an exception, so the process is stalled.

Because we're potentially going to have to resolve this by going to disk, under those circumstances we put the process to sleep and pick another thing off the run queue. The page fault handler, prior to that, schedules the disk to load the missing page — and remember our rule of thumb that a disk access costs about a million instructions' worth of time. Later the data comes off of disk into physical DRAM, the page fault handler starts running again, puts a valid entry into the page table, and puts our process back on the scheduler. Eventually it wakes up, the instruction gets retried, the page table entry is now valid, and we go forward. So what I've just shown you is a page fault leading to pulling data off of disk, putting it in memory, cleaning up the page table, and then doing the access afterwards — a good example of demand paging.
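Here's that sequence as a hedged, kernel-style sketch in C. All of the helper names (`allocate_frame`, `schedule_disk_read`, and so on) are hypothetical stand-ins for real kernel machinery; the point is the ordering: start the I/O, sleep the faulting process, and only mark the PTE valid once the data has landed in DRAM.

```c
#include <stdint.h>

/* Hypothetical types and helpers -- stand-ins for real kernel machinery. */
struct process;
typedef uint32_t disk_block_t;
#define PTE_PRESENT 0x1u

extern uint32_t     allocate_frame(void);       /* may evict another page */
extern disk_block_t lookup_backing_store(struct process *p, uint32_t vaddr);
extern void         schedule_disk_read(disk_block_t blk, uint32_t frame);
extern void         block_process(struct process *p);
extern void         set_pte(struct process *p, uint32_t vaddr,
                            uint32_t frame, uint32_t flags);
extern void         invalidate_tlb_entry(uint32_t vaddr);
extern void         wake_process(struct process *p);

/* Runs in the kernel when a reference hits a not-present PTE. */
void handle_page_fault(struct process *p, uint32_t vaddr) {
    uint32_t     frame = allocate_frame();
    disk_block_t blk   = lookup_backing_store(p, vaddr);

    schedule_disk_read(blk, frame);   /* ~a million instructions of I/O */
    block_process(p);                 /* sleep; run someone else        */
}

/* Runs later, from the disk interrupt, once the read has completed. */
void disk_read_done(struct process *p, uint32_t vaddr, uint32_t frame) {
    set_pte(p, vaddr, frame, PTE_PRESENT); /* mapping is now valid      */
    invalidate_tlb_entry(vaddr);           /* drop the cached "invalid" */
    wake_process(p);                       /* retry the instruction     */
}
```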
So in some sense you can view DRAM here as a cache on the disk, which is potentially much bigger than the DRAM. This is a new kind of cache — there are caches everywhere. If you look at our typical caching diagram, with fast things at the left and big things at the right, the whole point of caching is to appear as fast as the small things and as big as the big things. So the disk is going to appear faster because we're caching things in DRAM. Potentially, if you need more memory than fits in DRAM, we put it all on disk and hope that our caching mechanism properly keeps the things in active use in DRAM. If we can do that — namely, get locality out of it — then we can run processes with a much larger address space than will fit in physical DRAM but will fit on disk. This is really the 90-10 rule: programs spend about 90 percent of their time in 10 percent of their code, and it's pretty wasteful to spend DRAM holding things that aren't used very often. A great example, if you're wondering how that happens: you've linked against a huge library and only some of its code is actually in use most of the time, so most of the binary is left on disk. A second good example: your program goes through phases, and while it's in one phase, the code and data of the other phases aren't in use. So this is really caching just like we talked about before — where the on-chip SRAM, or the off-chip cache, acts as a cache on DRAM — and in that instance we try to keep only the things we're actively using in the cache and get performance out of that.

Okay, so we're now talking about caching from disk to DRAM: the solution here uses DRAM as a cache on the disk. Just to show how the ambiguities of terminology show up: when we talk about "caching" with no other context, we're often talking about DRAM being cached in SRAM. If you're being more specific and talking about paging, that's the type of caching where we cache the disk image — which is big — in DRAM. It's still a type of caching, but you usually use the word "paging" for it. Toward the end of today's lecture we're going to start talking about a bunch of different mechanisms for doing a good job of picking which blocks on disk actually go into DRAM. One thing to know: the clock algorithm, which we'll talk about, is the very common way of doing this. Any questions on that?

So now, since demand paging is caching, you can really ask all of the cache questions we talked about — and we started this last time. What's the block size of this cache? One page, because the thing we bring from the disk into the DRAM is a page at a time. When we were talking about caching from DRAM into the fast on-chip cache, we were talking about a cache line, typically 32 to 128 bytes; here the block size is 4 kilobytes. What's the organization — direct-mapped, set-associative, fully associative? It turns out to be fully associative, because through the page table we can map essentially any virtual address to any physical address, so we end up with a fully associative mapping. How do you locate a page? You first check the TLB, and if you can't find it, you walk the page table to find the page in physical DRAM. What's the page replacement policy? When our DRAM is full and we need to pull a new page off the disk, which page that was already there do we throw out? What are some options? We talked about LRU and random last time for caches, but this gets much more interesting here. There are many page replacement policies we could come up with, and it's really going to matter, because when you take a "cache miss" — and I say that in quotes, a page miss — you have to go to disk, and that's our good old million instructions' worth of time. So we have to be really careful not to throw out the wrong block. What happens on a miss? We go to the disk. And finally, what happens on a write? Remember, with hardware caches we could write through or write back, where write-through means the write goes not only into the cache but into the underlying backing store. If you think about it, we absolutely don't want write-through in this instance, because every processor write would go not only into DRAM but then have to be written back to disk, and suddenly our processor writes would be roughly a million times slower than they should be. So this is a write-back cache. Can anybody tell me what a write-back cache forces us to deal with here? Yes — dirty data in the cache. That's exactly right: if there's dirty data in the cache, then before we replace something we really have to write it back to disk, because otherwise we lose data. Great answer.

So this is an illusion of infinite memory. In the case of a 32-bit processor, remember that 2^32 is four gigabytes; we'd like the illusion of having four gigabytes without necessarily having four gigabytes of DRAM. The way we do that is that this virtual address space of four gigabytes is squashed down through the page table onto physical memory, which might be a lot less. I'm showing you here what would be a small machine by today's standards: suppose it has a 32-bit address space, so we have four gigabytes of virtual memory,
and we have only 512 megabytes of physical DRAM. What happens is we use the page table to map parts of our virtual address space to things that are actually in memory, and the parts that aren't actually in memory get mapped to disk. So to have a four-gigabyte virtual memory space that's all in use, we need four gigabytes on the disk. That may not be a big deal — today we've got terabytes without problem — but we couldn't fit that whole four gigabytes into our physical memory, and that's what our page table handles. The disk is larger than the physical memory; this has pretty much always been the case. So by using the disk as the actual memory we're referencing, but caching it in physical memory, we get the illusion of having much more physical memory than we actually have. I'll also point out that this is only one virtual address space: if we have, say, 20 processes, each of them might have a four-gigabyte virtual memory space, and we can multiplex them through their page tables so that each might have four gigabytes on disk, but together they don't fill more than the 512 megabytes that's physically available. That's how we get this multiplying effect — an apparent increase of our physical memory by a significant factor.

The principle here is really a transparent level of indirection. When we're using this virtual memory, the processor doesn't have to know that it's virtual: transparently, all of its accesses get translated through the TLB. If the page is actually in physical memory, we get very fast access; if it's not, we take a page fault and pull things into physical memory, possibly throwing something else out and adjusting all the page tables — but we do that transparently to the program. This is the operating system's job in the case of paging, and the page table lets us place portions of the address space anywhere in physical memory, so it's very flexible how we decide to divide this physical memory among the set of processes. Are there any questions on that? Does that make sense, everybody? And feel free, by the way, to use the chat — I am actually watching it, so if you have any questions I will restate them.

So now let's remember for a moment: we talked about a page table entry — this is in fact the x86 page table entry for 32-bit machines. It has this 20-bit physical page frame number — remember, we've talked about 20 bits in that example I reminded you of at the beginning of the lecture — and the remaining 12 bits are consumed by a bunch of different status bits and so on. Three of them are free for the OS to use in any way it wants; several of them may be forced to zero, being reserved for later. The most important ones I want to talk about for now are the present bit and the dirty bit. The present bit — which is the same as "valid" in pretty much every other architecture — says whether the virtual address going through this page table entry actually has a mapping or not. If P, the lowest bit, is zero, then the rest of the 31 bits, including the page frame number, are basically meaningless and are free for the operating system to use however it wants. On the other hand, if the present bit is one, then the rest of the bits are meaningful, including the
page frame number, and we know that at that point we have a translation. The other bit we haven't talked about yet is the dirty bit, and it's exactly what you think it is: for valid (present) pages, it keeps track of whether that page has been written recently. This is how the operating system knows that if it's going to free up a physical page and reuse it for somebody else, it can't just throw the page out — it has to write it back to disk first in order to preserve that data. And that dirty bit is hardware-managed: the D bit gets set automatically by the hardware when you do a write or store operation to a page that's already mapped into DRAM.

Now let's look at some demand paging mechanisms. First of all, I talked a little bit about the fact that the page table entry is what makes demand paging implementable. Hold on — I'll answer that question in a second. Actually, maybe I'll answer it now. The question was: does the implementation of having a disk page file itself introduce overhead — the overhead of this overall arrangement where some of the data is on disk and some of it is in physical memory — in the case where we don't actually have to go out to disk? The answer is no: the TLB translation and so on is put into the pipeline in a way that doesn't impede fast access. If you didn't need the illusion of infinite memory, you could perhaps design a processor without virtual memory; but if a processor has been designed for virtual memory, part of designing the pipeline is figuring out how to make the TLB access work at the full speed of the pipeline — that's part of what a computer architect has to figure out. Now, if you're thrashing in the TLB because it's too small, constantly missing and going back to the page table, that can have some overhead. All right, good question.

So basically, the difference between valid and not valid is whether that page table entry represents a page that's actually mapped into DRAM. But suppose the user references a page with an invalid page table entry — let's actually start to think about that. If you remember that brief animation I gave you a few slides ago, we're obviously going to get a page fault at that point: the memory management unit traps to the operating system, causing a page fault, and now it's up to the operating system to decide what to do. I'll tell you off the bat that once you've trapped into the operating system, it's not always the case that there's a valid page out on disk. It's possible this is really a part of the address space the user wasn't supposed to use, in which case you're going to get a segmentation fault and a core dump out of it. It could be that the reason we marked this entry not present is as a flag to catch some condition, in which case maybe we handle it and return immediately without changing anything — so you could actually have parts of the address space whose sole purpose is that when you read or write an address there, you trap into the operating system to do something. There are many interesting things you can do.
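To make those bits concrete, here's a small C sketch of the 32-bit x86 PTE layout just described. The P, W, A (accessed), and D (dirty) bit positions match the real x86 format; the helper functions are purely illustrative.

```c
#include <stdint.h>
#include <stdbool.h>

typedef uint32_t pte_t;

/* Bit positions from the 32-bit x86 page table entry format. */
#define PTE_P       (1u << 0)     /* present ("valid" elsewhere)        */
#define PTE_W       (1u << 1)     /* writable                           */
#define PTE_U       (1u << 2)     /* user-accessible                    */
#define PTE_A       (1u << 5)     /* accessed -- set by hardware        */
#define PTE_D       (1u << 6)     /* dirty -- set by hardware on write  */
#define PTE_OS_BITS 0x00000E00u   /* bits 9-11: free for the OS         */
#define PTE_FRAME   0xFFFFF000u   /* 20-bit physical page frame number  */

static inline bool     pte_present(pte_t e) { return (e & PTE_P) != 0; }
static inline bool     pte_dirty(pte_t e)   { return (e & PTE_D) != 0; }
static inline uint32_t pte_frame(pte_t e)   { return e & PTE_FRAME;    }

/* When P == 0, the hardware ignores the other 31 bits entirely, so the
 * OS is free to store anything there -- e.g., where the page is on disk. */
```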
Another one, by the way, that I'll just toss out for you is the W bit. If we mark something as not writable, we might do that, as we mentioned before, right after a fork: we've copied the page tables and pointed both the parent and the child at the same physical pages, but marked them all read-only. That page-fault mechanism is what lets you do copy-on-write — we mentioned that before. Now, memory-mapped I/O — the question is whether this is the way memory-mapped I/O works. In many cases, memory-mapped I/O doesn't even go through the regular page table: there are certain addresses that are known to go directly out to the device, not through the page table. In other cases, if the device is in a part of the address space covered by the page table, those entries can be marked to bypass the cache and write through. So these are a little different depending on circumstances; we'll talk more about memory-mapped I/O in a couple of lectures.

So we get a page fault — what does the OS do? Assuming for a moment that we're actually doing demand paging, we have to pick an old page to throw out. Let's assume all of the DRAM is in use; the fact that we have to pull something in off the disk means we have to get rid of something else. So we pick an old page to replace. If the old page is modified — D is one — then we have to write its contents back to disk so we don't lose any data. Then, and listen carefully on this: we mark its page table entry as invalid, because the page we're removing is no longer accessible once we've put it out on disk — and any cached TLB entry has to be invalidated as well, because if it isn't, the processor could try to access the old page, the TLB would say "oh, here it is," and we would get a bad translation.

The question was how virtual addressing would work without demand paging. If you're trying to do demand paging, then you're pulling things off of the disk as needed; but like I said, you can also use virtual addresses to do copy-on-write for fork even if you're not demand paging things off the disk — so maybe that's one answer. The other question was whether I can briefly explain how the correct page on disk is identified. Hold that question — I'll tell you in a second. It's totally up to the operating system where it stores that information: it could be in the page table somewhere, for instance using those free 31 bits. We'll get to that in a second.

At that point — back to the page fault — we load the new page into memory from disk, and we update the page table entry for that new page to be valid, because we've pulled it in and updated its mapping. And — this is usually a little confusing to people — we also have to invalidate the TLB entry for that page, which probably still says the page is invalid. Remember how we got into this whole process: we referenced a page with an invalid PTE, and that invalid PTE probably got loaded into the TLB. So we have to get rid of it out of the TLB, so that the moment we restart, the processor won't find the stale entry — it will have to go down to the page table and come back with an updated TLB entry. At that point we just continue where we left off.
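Here's the eviction half of that sequence as a hedged C sketch. Every name here (`pick_victim`, `write_page_to_disk`, and so on) is an illustrative stand-in, not a real kernel API; the crucial point from the lecture is the ordering — write back dirty data, then kill both the PTE and the cached TLB entry before the frame is reused.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical structures and helpers -- illustrative names only. */
struct page { uint32_t vaddr; uint32_t frame; uint32_t pte; };

extern struct page *pick_victim(void);               /* replacement policy */
extern bool pte_is_dirty(uint32_t pte);              /* test the D bit     */
extern void write_page_to_disk(struct page *victim);
extern void clear_pte(struct page *victim);          /* mark not present   */
extern void invalidate_tlb_entry(uint32_t vaddr);

/* Free one frame by evicting the page currently in it. */
uint32_t evict_one_page(void) {
    struct page *victim = pick_victim();

    if (pte_is_dirty(victim->pte))       /* D bit was set by hardware */
        write_page_to_disk(victim);      /* otherwise we'd lose data  */

    clear_pte(victim);                   /* PTE now says not present  */
    invalidate_tlb_entry(victim->vaddr); /* a stale TLB entry would   */
                                         /* still translate here      */
    return victim->frame;                /* frame is now free         */
}
```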
And this is the caching step: if you say demand paging is caching, this box that I've highlighted is what makes it like a cache. What's interesting is that once we restart, the TLB entry for the new page has to be reloaded: we'll take a TLB miss, go down to the page table, reload the page table entry — which is now valid — and get to continue where we left off. And of course, as I mentioned before, that "load new page into memory from disk" line is potentially a million instructions' worth of time, so while we're waiting we've hopefully put this process properly on a wait queue so that some other process can run until the data comes back. All right, questions?

Okay, good, so back here. The question is: why do I do this invalidation for the new entry? The answer is to think about the TLB as a cache on the page table. What happened was, up here at the beginning, we referenced the page and got a page fault because we looked at the page table entry and it said invalid. But if you think about what happened in that process, we took the invalid entry from the page table and put it into the TLB — so now we have a cached page table entry that says invalid. If we don't invalidate that cached entry in the TLB, then when we restart the process down here and go to look at that page, it's still going to appear invalid, because the TLB entry still says so. So what we do is invalidate that entry — we kill it out of the TLB — so that when we restart the instruction, we have to go all the way down to the page table and reload the TLB with the page table entry that now says valid. Did that help? Great. And I understand it's confusing — it always is the first time you see it — but the key thing is to think of the TLB as a cache on the page table: it can hold old page table entries that have become invalid or incorrect because we're messing with the page table itself; we're changing it. All right, is everybody else good on that? Good.

The original origins of paging were pretty much what I said before: you had way too little memory for all of the processes you were trying to run. You had many clients on dumb terminals — this might be a mainframe — with lots of processes, but they're all out on disk because memory is very small, so paging was really necessary to do anything. Today, when you go to buy a machine — I know when I buy machines, if I don't buy them with 16 gigabytes I wonder what I'm thinking — programs require a lot of memory. So it's less the case that we have huge processes that need to be multiplexed through a tiny memory, but there are still pretty big parts of libraries that aren't in use. So there are still uses for paging, but it's less necessary than it was originally. Back then, disks provided most of the storage with a little tiny memory in front, and that's kind of where things started — and by the way, processes were actively swapping back and forth a lot, so paging happened a lot. Of course, if a page fault is a million instructions' worth of time and you're paging a lot, you can imagine that nobody makes any good progress: all of our processes are hurting for performance, and this is just bad. If you look at the modern recommendations for Linux and Windows and macOS and so on, they recommend that you have enough DRAM so that paging very rarely happens;
it's something that mostly happens occasionally to readjust what's on disk and what's in memory. Most of the time we have tremendous locality, and as a cache this needs something like a 99 percent hit rate — the moment you go out to disk regularly, you're in trouble. So today is a very different situation: we have these huge computers, and even our local machines are huge. If you look at a single machine — and you should all occasionally look at your task manager, or ps aux, or whatever you like — what you see is that memory is about 75 percent used, not 100 percent; 25 percent is kept free for dynamics. What's interesting to me here is that 1.9 gigabytes of our 16 gigabytes — roughly 15.5 usable by processes — is shared. That means there's a whole bunch of sharing going on between processes through memory, and that's very different: first of all, gigabytes were unheard of back in the original paging days, but secondly, we've got a lot of sharing, both from libraries and from common communication and so on. So yes, we have demand paging going on in the traditional sense, but we have a lot more interesting uses of sharing going on as well.

There are many uses of virtual memory and demand paging today, and we've told you several of them. One, for instance: the stack, which grows downward, has an invalid page at the point where the stack is going to hit, and the moment the stack grows down into it, you get a page fault, allocate a new page, and zero it. This is an instance where the page fault isn't going to disk — what it's doing is allocating a brand-new zero-filled DRAM page and adjusting the page table entry to point to that new page. Extending the heap: same idea. Forking: we talked a lot about using page table entries marked read-only as a way to avoid copying — when the child is first constructed, you don't copy the actual data, you copy the page tables and mark them all read-only. Exec: you only bring in the parts of the binary that are in active use, and you do this on demand. And memory mapping: to get explicit shared regions or to access files. So virtual memory mechanisms, including interesting uses of the page table entry, are used for lots of things these days.

Now, some administration. I realize it's very stressful for everybody, and I'm hoping everybody is remaining safe — obviously washing your hands and good social distancing to the extent possible is important — but it's also important to stay in touch with people. Use your devices, talk to other people, come up with Google Hangouts or whatever and talk, because it would be very easy to get isolated. I know I'm getting cabin fever staring at my screen, so I imagine you guys are all kind of doing so as well. Yes — party over Zoom, that's a good idea, except for class time. We're intending to keep teaching CS162 virtually, and we want to keep feeding interesting information to you, so lecture is going to stay live, and discussion sections and office hours are live. We've made some adaptations that I outlined in a recent post — for instance, we're only going to have one Friday section per signup slot for now — but we're going to make sure that you can all attend office hours, and we may adjust this as we see necessary. I apologize for disruptions in office hours,
including mine — hopefully this will stabilize. One change to sections: we're going to start recording walkthroughs of the section material, independent of the sections themselves, and posting those videos so you can watch them. We've relaxed some deadlines and added some slip days — the Piazza post describes what we did. We're making homeworks eight and nine optional, moving the homework deadlines out, and giving you some more slip days to help; I realize it's extremely stressful as you transition from the dorms out to home, etc. Just let us know how it's going, and make sure to stay in contact with your TAs and so on. Now, this turned out to be a little more controversial than I thought, but we moved midterm two to the week after spring break, really hoping to give people a little more time to get adjusted. I understand this conflicts with Physics 7B, so we may need to tune the date a little more, but I'm intending to do it that week. The question is really — and maybe I'll post this on Piazza — whether to make it, say, Wednesday instead of Tuesday; given that things are virtual, we have a little more flexibility, so I'm going to run a poll to figure out which is better. Part of this is us figuring out exactly the right way to do the midterms; we're just trying to help you get through this crazy transition period, and I think the next two weeks are going to be very stressful. I don't know that I have any more to say on this, but we want to keep CS162 going — and I see a lot of thank-yous in the group chat, so you're welcome. I love this material, so I want to make sure you still get a chance to learn it, and I know you're working very hard. Ah — the question is about the midterm now falling after the drop deadline. That's an interesting question; I think it always was, though, wasn't it? Okay — I gather the former date was before the deadline. Hmm. I guess a poll is in order; let's figure this out. Okay, let's move on from this.

So now, moving on — before we get to some of the mechanisms for replacement policies, I wanted to give you a bit more of a graphical picture. In the old days, when you loaded an executable into memory, the executable lived on disk, containing code and data segments; the OS loaded it into memory pretty much all at once, the program set up its stack and heap, and you ran. But let's look at this in more detail when we add virtual address spaces into the picture. The virtual address space for a process has the kernel up at the top of the address space — for now we're going to keep it there and ignore the Meltdown questions — and then the stack growing down, the heap growing up, and data and code below. That's the virtual address space. On disk, we still have the executable with its code and data and so on.
What happens is that the pages we end up using in the virtual address space are going to be backed on disk, in what's called the backing store. Typically it's an optimized block store, but you could think of it as a file — and in many operating systems it is just a file. So this, on disk, is the backing store for the virtual image. What does that really mean? It means that this virtual address space and all of its information lives on the disk somewhere; it's not necessarily physically in memory. The user page table maps the entire virtual address space, in one way or another, swapped into and out of memory as needed. To give you an example, here's the page table, and notice how it maps some of the entries of the virtual address space into physical parts of memory. Those pages are actually in memory — as is the page table itself, though remember that the page table can itself be paged out. But what else? The parts of the virtual address space that are not in memory have to be somewhere else: the resident pages map to memory frames, and for everything else the OS records where to find it. If you think about this abstractly for a moment, we have all of these page table entries that are marked invalid because they don't map to physical memory, and they point, somehow, at the stack and the heap and the data that's not in use — these are all back pointers to the disk.

Now, there was a good question earlier: how does that happen? How do you do this mapping? There are many ways to do it, and anything you can think of is probably being done somewhere. Some simple ways, of course: there are 31 free bits in the page table entry once it's invalid, so you could use those 31 bits to indicate where things are in the swap drive or the swap file. The other thing you could do is keep a mirror of your virtual address space in a data structure like a hash table in the kernel, which also maps each virtual address to where it is on disk. You're all very clever and could come up with any number of ways of doing that mapping, but this is up to the operating system and is entirely done in software — the hardware doesn't have to worry about where invalid pages are. For instance, typical Linux has a find-block operation where you give it a process ID and a page number and it tells you which disk block that page goes to. Some operating systems use the spare space in the PTE; some use purely software hash tables.
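As one concrete (and entirely hypothetical) version of the "use the free bits" idea, here's a C sketch where a not-present PTE stores a swap-slot number in the 31 bits the hardware ignores. Real kernels each do this differently; the encoding below is purely illustrative.

```c
#include <stdint.h>

typedef uint32_t pte_t;
#define PTE_P 0x1u    /* present bit: when 0, hardware ignores the rest */

/* Hypothetical encoding: store a swap-slot number, shifted up one bit,
 * in the 31 bits that are free whenever P == 0. */
static inline pte_t make_swapped_pte(uint32_t swap_slot) {
    return swap_slot << 1;              /* low bit 0: not present        */
}

static inline uint32_t swap_slot_of(pte_t e) {
    /* Only meaningful when the present bit is clear. */
    return e >> 1;
}

static inline pte_t make_resident_pte(uint32_t frame) {
    return (frame << 12) | PTE_P;       /* 20-bit frame number + present */
}
```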
You usually want a backing store for the resident pages too. What do I mean by that? Even though those pages are in memory, you probably have a space for them on disk. Why? Because if they're read-only, you can just throw them out without having to worry about putting them somewhere, and if they're dirty, there's a well-defined place to write them back. So usually this whole address space of active pages has a place on disk. You may also map the code segment directly to the on-disk image, which saves a copy to the swap file — and lets you share it. Let me show you: here's an interesting case where process one has some code — notice the code frames are a cyan color here — and the code is on disk. So basically, things that aren't in memory are just mapped back to the disk itself; we don't have to somehow map them to a separate copy, separate from the binary. And here's process number two: the fact that it shares the same code means we really only have to find space on disk for the things that are unique to process two — its stack, heap, and data — but not the code. In this instance, if you notice, the code maps back to the same code segment, so in the page tables the code can actually map to the same physical page frames in both processes. This is a pretty good use of shared mapping, and it's used for binaries that are launched in more than one process — if you're running the same program multiple times — and it's also very good for shared libraries, etc. One more thing to point out, by the way: the page table itself is in memory as well.

One last, active and interesting aspect here: let's look at what happens with a page fault now that we've got this all laid out. (Guys, if you can, please hold the comments until after lecture — we'll talk about that other option later.) If you look here, process one is busy running; notice what happens if we try to do a data access and get a page fault. What we've got here is a representation of who's actually using the processor at each point. We try to reference something, the page table entry is invalid — now what? Well, this process can't run anymore, so what we're going to do is start pulling that page off of the disk, and at the same time switch over to this other process, which can now run. Once we've started the fetch — we've gone out to the device driver and it has started the disk fetch — we move over to that process, and it's busy running. Later, when the data comes back and is put into memory, we can fix up the page table entry and restart the original process; it reruns the same instruction that ran before, this time it succeeds, and the process goes forward. Good questions, all right.

So the summary here is basically: when we go to do a load, for instance, we try the reference, get an invalid page table entry, and that traps into the kernel. The operating system decides what to do — for instance, it finds that there's a page to pull in, and that page gets pulled into a free frame in memory. If there weren't a free frame, we'd have to write one back out to disk or otherwise find one. Once the page is finally in, we reset the page table entry to valid and rerun that load instruction; the second time, it succeeds. That's our basic page fault handling.

All right, so some questions we need to answer. During a page fault, where does the operating system get a free frame from? Well, it keeps a free list. Fine, but where does the free list come from? Unix variants typically run some form of reaper: if memory gets too full, it schedules dirty pages to be written back to disk, and zeroes out clean pages that haven't been accessed in a while, to help keep the free list full — so that at the point we actually get a page fault, we can just go to the free list and grab a free page. As a last resort, we may have to evict a dirty page first.
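Here's a hedged sketch of that reaper idea — a background loop that keeps the free list stocked so the fault path rarely has to evict synchronously. Every name and threshold here is hypothetical; real Unix variants (Linux's kswapd, for example) are far more elaborate.

```c
/* Hypothetical kernel helpers -- illustrative names only. */
struct page;
extern int          free_list_length(void);
extern struct page *find_old_page(void);        /* not recently accessed */
extern int          page_is_dirty(struct page *p);
extern void         schedule_writeback(struct page *p);
extern void         move_to_free_list(struct page *p);
extern void         sleep_until_memory_pressure(void);

#define LOW_WATERMARK 128   /* illustrative threshold, in frames */

/* Background reaper: keeps the free list stocked so the page-fault
 * path can usually just grab a frame instead of evicting synchronously. */
void reaper_loop(void) {
    for (;;) {
        while (free_list_length() < LOW_WATERMARK) {
            struct page *p = find_old_page();
            if (page_is_dirty(p))
                schedule_writeback(p);   /* clean it in the background   */
            else
                move_to_free_list(p);    /* clean pages can be reclaimed */
        }
        sleep_until_memory_pressure();   /* wake when memory fills up    */
    }
}
```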
And that gets a little tricky. The question is really how we build all these mechanisms, how we organize them, and what the replacement policy is — I keep saying we run a reaper to get rid of things, but what exactly do we do? And then: how many page frames per process? As we balance among a lot of different processes — process A gets so much DRAM, process B gets so much DRAM — what's our organization? For the rest of the lecture I'm going to be talking about some of these mechanisms.

Oh yes — the question is whether thrashing is when a process gets put to sleep to pull a page from disk, and then the page gets evicted before the process gets to run again. That's correct, and we'll talk more about thrashing in a little bit, but basically, if you're in a situation where processes are constantly going to disk and not making any progress, that's called thrashing, and it's a very bad scenario. It's clearly a situation where you have too many processes running for the amount of memory — the working sets of all the processes, when you add them up, exceed the physical DRAM. We'll talk about that in a moment. The other question was whether an invalid page is the same as an unmapped page, and the answer is yes. And what's the difference between a page and a frame? In this instance they're the same thing — different names. Typically the page is 4K, and that's the size we're moving through the system and registering with page table entries. This is a great place to break: let's take about a four-minute break and come back and continue where we left off. I'm going to pause the recording, and I'll see you all in just a moment.

Okay, since we have a couple of questions here, I'll talk before we restart. There's a bit more of a question about page versus frame. When you look at a disk or another block device, there's typically something called a sector, which is the physical minimum unit you can read off of a disk. Sectors are small — on a lot of disks they're 512 bytes — and that sector size is really not particularly useful, because the overhead of doing things in 512-byte chunks is just too high. So sectors are combined together into 4K pages. Oftentimes the word "page" is what you pull on and off the disk — it's not the physical minimum, but the unit the operating system has decided to transfer. Pages are also what the page table is optimized for: if you notice, all of our discussion of that 32-bit page table entry has assumed 4K pages. Now, when people talk about frames moving in and out and pages moving in and out, those terms are oftentimes used interchangeably — I realize it gets a little confusing — but for most of the things we're talking about here, a frame and a page are pretty much the same, and the physical disk unit you're looking for is a sector. I don't know if that answered the question, but let's continue with the lecture.

So, moving forward: let's talk a little bit about using memory as a cache, and to do that I want to say a little about our working set model.
If you look over time at the addresses that are in use — address on the y-axis and time on the x-axis — what you see is that as the program executes, it transitions through a sequence of working sets, which are chunks of addresses that are in use over a window of time. As we transition through, you'll see that at any given time you could take a slice and see that this range of addresses and that range of addresses are in active use; then maybe during this other segment, only these parts of the address space are active, and so on. This one segment is particularly bad — you'll see why in a moment — because most of the addresses are in use. In cases where a small number of addresses are in use, the working set — the set of things in active use — is small; when a lot of addresses are in use, the working set is large. What we want is for the working sets of the processes currently co-resident in DRAM to add up to something less than the total size of the DRAM; otherwise we're going to thrash.

What's interesting here is the cache behavior under a working set model. Typically, as you increase the cache size, the hit rate goes up, because you can store more things in the cache. But because of this working set structure, as you grow the cache you're able to absorb larger and larger working sets, and it happens in chunks: a small cache is good here, a slightly bigger cache suddenly captures that working set there, a really big cache captures this one — and you get a stair-step behavior in the growth of hit rate as a function of cache size. It's also true that as we give more or less of the DRAM to different processes as cache, we get different hit rates out of the different processes depending on what their working sets look like. And transitioning from one working set to the next will actually cause some misses — though with a big enough cache we could have two different working sets co-resident. So we can start worrying about capacity, conflict, and compulsory misses — all the same categories you've thought about before come into play, applicable to pretty much every cache.

What's interesting, though, is that there's another model of locality, which is the Zipf model. This says that the likelihood of accessing the item of rank r is proportional to 1/r^a — where "rank" means we sort by popularity: rank 1 is the most popular item, rank 2 the second most popular, and so on. If we plot popularity — the fraction of accesses going to items of a given rank — together with the hit rate as a function of cache size, then as the cache grows to encompass more and more of the ranks, we can see what the hit rate does. Zipf is a very common access pattern for database accesses or website accesses over the internet. What it really says is that it's very rare to access items below the top few, but there's a very long tail. You get substantial value from even a tiny cache, because a tiny cache suddenly gives you a huge gain in hits — and yet even with a fairly large cache you keep gaining, because there are still a lot of items out in that tail.
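In symbols — this is my notation, not the slide's — the Zipf model and the hit rate of a cache holding the C most popular of N items can be written as:

```latex
P(\text{access item of rank } r) \;=\; \frac{1/r^{a}}{\sum_{k=1}^{N} 1/k^{a}},
\qquad
\mathrm{HitRate}(C) \;=\; \sum_{r=1}^{C} P(r)
\;\underset{a=1}{\approx}\; \frac{\ln C}{\ln N}.
```

For a near 1 the hit rate grows roughly logarithmically in cache size, which is exactly the shape just described: a tiny cache captures the few very hot items for a big initial gain, yet the long tail keeps paying off slowly as the cache grows.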
So depending on your model of access, you can get completely different hit-rate behavior as a function of cache size.

That much being said, let's think about demand paging from a cost-model standpoint. Again we have: effective access time (EAT) = hit rate × hit time + miss rate × miss time — or equivalently, hit time + miss rate × miss penalty; these are the same idea. So we could say our memory access time is 200 nanoseconds and our average page fault service time is 8 milliseconds — that's the disk, and notice the difference between DRAM and disk is enormous. The probability of a miss is p, so the probability of a hit is 1 − p. Then the expected access time is 200 nanoseconds — our hit time — plus the probability of a miss times 8 milliseconds. And here's something to watch out for: you can't add nanoseconds and milliseconds directly; you have to scale to one unit or the other. Bringing it all out to nanoseconds: if one access out of a thousand causes a page fault, you plug p = 1/1000 in and get 8.2 microseconds for your average access time — a slowdown by a factor of 40. Look at that for a second: just by saying one out of a thousand misses — that's a 99.9 percent hit rate — you get a slowdown of 40. This is why today's machines can't afford to page fault on a regular basis and pull things off of disk: you just kill performance, so if you're page faulting a lot, your processes are running very poorly and you need more DRAM. If you want a slowdown of less than 10 percent, here's a good exercise for you: require that the effective access time be less than 200 nanoseconds × 1.1, work it out, and you find that at most one page fault in 400,000 accesses is allowed. So hopefully what you get out of this slide is: page faults bad; no page faults and a big DRAM, good. Again, this is just because disks are slow — they're physical devices.
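Written out with consistent units (everything converted to nanoseconds, as the lecture warns):

```latex
\mathrm{EAT} \;=\; 200\,\mathrm{ns} \;+\; p \cdot 8\,\mathrm{ms}
         \;=\; 200\,\mathrm{ns} \;+\; p \cdot 8{,}000{,}000\,\mathrm{ns}.
```

With p = 1/1000, EAT = 200 + 8,000 = 8,200 ns = 8.2 µs, about 40× the 200 ns hit time. For at most a 10 percent slowdown we need 200 + 8,000,000·p ≤ 220, which gives p ≤ 2.5 × 10⁻⁶ — one fault per 400,000 accesses.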
Now, what factors lead to misses in a page cache? Just like in a regular cache, we have compulsory misses — the first time we've ever seen something. How do you get rid of those? You could prefetch: if you notice you're starting to read an item off disk because of a fault, maybe you read the next couple of them too, hoping there's some spatial locality. So rather than pulling in just a single page on a miss, you prefetch a few pages, or do something more complicated — this is predicting the future in some way, like we talked about a while back with schedulers. Capacity misses mean not enough memory: some of the working set doesn't fit in memory, so you've got to get more DRAM — not a quick fix. Another option, surprisingly, is that if you've got too many processes in memory, maybe you ought to just stop a couple of them, page them out to disk, run the remaining ones to completion, and then pull the other ones back off disk. You're likely to complete much faster than if you try to run them all at the same time. Think that through — it might seem a little counterintuitive — but thrashing is just such a bad thing that it's much better to send things out to disk and bring them back than to try to run them all simultaneously. Conflict misses? There aren't any, because we have a fully associative cache — that's good. And then there's a new category called policy misses, which you don't really see with hardware caches. A policy miss is one where the page was in memory before — so it sounds a little like a conflict miss — but it got kicked out prematurely because of a bad replacement policy. If your policy is very poor at figuring out which pages are important and just kicks them out, what happens is you load a page into memory off of disk, you go back to look for it, the policy has kicked it out, and you're going slowly again. So the replacement policy is extremely important. How do you fix policy misses? A better replacement policy.

So what are some page replacement policies, and why do we care? Replacement is an issue with any cache, but it's particularly important with pages: the cost of being wrong is high — you have to go to disk — and we want to keep the important pages in memory. So let's talk about a few. FIFO, first-in first-out: you throw out the oldest page. You're fair — you let every page live in memory for the same amount of time — and it seems good because it's simple, but in fact it's just a bad idea. If you have a page that's used over and over and over again, look what happens with FIFO: even though it's still constantly in use, it eventually becomes old and gets thrown out. FIFO has no adaptation to the way pages are used, and as a result, you can imagine it's just a bad replacement policy for demand paging. Another might be random: pick a random page and throw it out. This is the typical solution for the TLBs themselves, because at worst a TLB miss just means going down to the page table to fetch the entry — and while you could try to do LRU with TLBs, it's easier to do random. But random is pretty unpredictable, and that unpredictability means it's still a bad idea here — not as bad as FIFO, but it has a tendency to throw out, just by accident, things that are still heavily used. Well, that doesn't seem great, so what else could we do? How about MIN? This one is great: replace the page that won't be used for the longest time in the future. Perfect — this is exactly what we need — except, unfortunately, you have to know the future. So MIN is going to be a guaranteed-not-to-exceed policy for us: it's provably optimal. But perhaps a way to get toward MIN is the fact that the past is a good predictor of the future. As for why random is the typical solution for TLBs: it's hard to do LRU in hardware for something that has to run within a fraction of a pipeline cycle, and if your page table is properly cached, the page walk might not be too bad — so using random TLB replacement when you only have 128 or 256 entries can
be an okay policy and a way to keep the TLB fast. The other thing I'll point out, as I showed you earlier in the lecture, is that typical TLBs that are really fast, close to the processor pipeline, are backed up by a second-level TLB, which can have a better replacement policy.

So, FIFO is bad. The question is: can I explain again why FIFO is bad? The point is that heavily used pages still get evicted. Suppose you have a page that's used over and over again — in fact, it's used every other access: the heavily used page, then something else, then the heavily used page, then something else. What happens is that you walk your way through all the pages with the "something else" accesses, but that heavy page keeps getting used — and eventually it becomes the oldest page and FIFO throws it out, even though it's absolutely the last page you could possibly want to throw out, because it's the one you're using every other time. Hopefully that answers your question.

Another policy we might imagine is least recently used (LRU): replace the page that hasn't been used for the longest time. Programs have locality, so if something hasn't been used in a while, it's unlikely to be used in the near future. So maybe LRU is a good approximation to MIN: the page that hasn't been used for the longest time might be a good guess at the page least likely to be used in the future. So how do you build LRU? You build a list: you put all the pages in a list with a head and a tail, so you can figure out which page is oldest, and on each use you remove the page from the list and place it at the head. The most recently used page is at the head, and the LRU page is at the tail. This sounds great — is it great? Anybody have any thoughts? Yes — the comment in the chat is that it seems slower, and the answer is yes. The reason this is slow is that we're doing a bunch of list manipulations for every load and store, just to figure out which page is the most or least recently used. That means we took something we had made fast — loads and stores, by building a TLB and building a cache — and made it slow again, because every access has to rearrange this list of all pages, with multiple pointer manipulations. You'd need to know immediately when each page is used so you can change its position in the list, and that's many instructions per load or store — just not going to be what we want. So we'd clearly like to figure out how to implement something like LRU, but this is not the way to do it; in practice, we approximate.
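Just to make that objection concrete, here's a minimal sketch — my own illustration, not course code — of exact LRU with a doubly linked list. Note that `touch`, with all its pointer surgery, would have to run on every single load and store, which is exactly why exact LRU is impractical and real systems approximate it (for instance, with the clock algorithm's hardware-set accessed bit).

```c
#include <stddef.h>

/* Exact-LRU bookkeeping: a doubly linked list of all resident pages. */
struct page {
    struct page *prev, *next;
    /* ... frame number, PTE pointer, etc. ... */
};

struct lru_list { struct page *head, *tail; };  /* head = MRU, tail = LRU */

/* Would be called on EVERY memory access -- the fatal flaw. */
void touch(struct lru_list *l, struct page *p) {
    if (l->head == p) return;              /* already most recent        */
    if (p->prev) p->prev->next = p->next;  /* unlink p from where it is  */
    if (p->next) p->next->prev = p->prev;
    if (l->tail == p) l->tail = p->prev;
    p->prev = NULL;                        /* push p onto the front      */
    p->next = l->head;
    if (l->head) l->head->prev = p;
    l->head = p;
    if (!l->tail) l->tail = p;
}

/* The replacement victim is simply the tail of the list. */
struct page *lru_victim(struct lru_list *l) { return l->tail; }
```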
So in practice, we approximate. Let's look at a FIFO strawman first. Suppose we have three page frames and four virtual pages. Notice that since there are four pages in the virtual address space but only three page frames, it's not possible for all four pages to be in DRAM simultaneously, so I've set up a situation that's guaranteed to force some replacement. Now suppose our access pattern is A B C A B D A D B C B, and let's run FIFO page replacement. Here are frames one, two, and three in physical memory. The first reference is A, and frame one is the first free frame, so that's no problem: A goes in. Then we reference B; A stays where it is and B goes into frame two. C goes into frame three, and at this point FIFO has to start replacing pages as soon as we need a new one, and by the FIFO policy the next victim will be A. Fortunately, the next reference is A: A is resident, so we don't replace anything. B is resident too. Then D comes up, and we don't have D, so we have to replace something; because we're doing FIFO, it's A that gets replaced, since A is the oldest page. Then we come back to A, and oh, A is no longer in memory, so it has to be brought back in; frame two now holds the oldest page, B, so we put A there. Then we come back to D, and that's good, it's resident. Then B, and ah, B is missing; what do we replace now? Frame three. C comes back and I've got to replace frame one. B comes back, and that's resident, so we're good to go. So FIFO on this access pattern gives us seven faults. And look right here: we reference D and then A immediately after, so clearly replacing A was the wrong thing to do, because A was the very next thing we wanted. That's a policy problem, a policy miss, when we come looking for A.

All right, now let's apply the same idea with MIN. Suppose we have the same reference stream, but now we use an oracle and replace the page that won't be needed for the longest time in the future. We start out exactly the same way, because frames one, two, and three are empty, so A, B, and C fault their way in, and now everything's full. We come back to A, A is good; we come back to B, B is good. Now when we get to D, the question is what to do, and if we look into the future, C is clearly the page whose next use is farthest away, so when we replace for D, we replace C. Why C? Because our magic oracle replaces the page we won't need for the longest. Now A and D are fully cached, B is fully cached, and when we get to C we have to do a replacement. At that point we can replace almost anything, because there's not much future left; the only wrong choice, given this reference stream, would be to replace B, which is still needed. So MIN takes only five faults, whereas FIFO took seven.

And what would LRU do? It turns out LRU makes the same decision as MIN here: if we had applied LRU to this pattern, we would have gotten the same result, because at the point we go to replace for D, A and B have been recently used, so C is the least recently used page. That's not always the case, though. In fact, here's a question: suppose the reference stream is A B C D, A B C D, A B C D. I hope you can see this is going to be just bad for LRU. This is the worst possible case for LRU: a working set of N+1 pages cycling over N frames gives us a miss on every single access. So when will LRU perform badly? You can see it in this example. And look at how much better MIN does here: if I work it out, LRU takes twelve faults, one for every reference, while MIN takes only six.
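If you want to check those counts yourself, here's a small, self-contained C sketch that replays the first reference string, A B C A B D A D B C B, on three frames and counts page faults under FIFO, LRU, and MIN. It's purely a checking aid; the structure and names are made up for illustration, not how a kernel does anything:

```c
/* Count page faults for FIFO, LRU, and MIN on a reference string.
 * Checking aid only; names and structure are invented for illustration. */
#include <stdio.h>
#include <string.h>

#define NFRAMES 3

static int simulate(const char *refs, int policy)   /* 0=FIFO 1=LRU 2=MIN */
{
    char frame[NFRAMES]; int loaded[NFRAMES], used[NFRAMES];
    int n = (int)strlen(refs), faults = 0, filled = 0;

    for (int t = 0; t < n; t++) {
        char p = refs[t];
        int hit = -1;
        for (int i = 0; i < filled; i++)
            if (frame[i] == p) hit = i;
        if (hit >= 0) { used[hit] = t; continue; }   /* resident: no fault */

        faults++;
        int victim = 0;
        if (filled < NFRAMES) {
            victim = filled++;                       /* free frame left */
        } else if (policy == 0) {                    /* FIFO: oldest load */
            for (int i = 1; i < NFRAMES; i++)
                if (loaded[i] < loaded[victim]) victim = i;
        } else if (policy == 1) {                    /* LRU: oldest last use */
            for (int i = 1; i < NFRAMES; i++)
                if (used[i] < used[victim]) victim = i;
        } else {                                     /* MIN: next use farthest */
            int best = -1;
            for (int i = 0; i < NFRAMES; i++) {
                int next = n;                        /* n = never used again */
                for (int k = t + 1; k < n; k++)
                    if (refs[k] == frame[i]) { next = k; break; }
                if (next > best) { best = next; victim = i; }
            }
        }
        frame[victim] = p; loaded[victim] = t; used[victim] = t;
    }
    return faults;
}

int main(void)
{
    const char *refs = "ABCABDADBCB";
    printf("FIFO: %d  LRU: %d  MIN: %d\n",
           simulate(refs, 0), simulate(refs, 1), simulate(refs, 2));
    /* prints  FIFO: 7  LRU: 5  MIN: 5  */
    return 0;
}
```

Swapping in "ABCDABCDABCD" reproduces the worst case above: LRU faults on all twelve references while MIN takes six.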
So while LRU seems like a decent approximation to MIN, LRU is not MIN, and there are some weird patterns where MIN does much better. Of course that example is contrived, but if you really are walking through all of your memory and touching page after page, you're pretty much in trouble.

Now, what we'd like is this: if a process is in trouble because it doesn't quite have enough memory and it's page faulting a lot, wouldn't it be nice if we could just add some more page frames to that process and lower its miss rate? In the example up here, if we just added a fourth frame to this process, then after the first four compulsory misses it would have no misses at all; we'd be good to go. In fact, with a prefetcher, maybe fetching A would also fetch B, C, and D and we'd take only one miss, but that's a different conversation for a little later. So the question is: this is a desirable property, you add memory and the miss rate drops, but is it always the case? It seems like it ought to be, right? It seems like I ought to be able to add more frames and be good to go. The answer is no. There is a famous anomaly called Belady's anomaly: certain replacement algorithms, FIFO among them, don't have the property that adding memory lowers the miss rate. Adding memory does reliably make LRU better, but not FIFO. There are some good examples you can show; here's one that goes from three frames with FIFO (it's a little different from the string I showed you earlier, because it has an E in it) to four frames with FIFO, and if you count it out, there are actually more misses with more frames in the bottom case. So FIFO is not only bad because it throws out heavily used pages merely because they were first referenced far in the past; it also suffers from Belady's anomaly. The bottom line of this slide: just say no to using FIFO as a paging algorithm.
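As a concrete illustration, here's a quick sketch using the classic textbook string for Belady's anomaly, 1 2 3 4 1 2 5 1 2 3 4 5 (which may not be the exact string on the slide): FIFO takes nine faults with three frames but ten faults with four:

```c
/* Demonstrate Belady's anomaly: FIFO can fault MORE with MORE frames. */
#include <stdio.h>

static int fifo_faults(const int *refs, int n, int nframes)
{
    int frame[16], next = 0, filled = 0, faults = 0;
    for (int t = 0; t < n; t++) {
        int hit = 0;
        for (int i = 0; i < filled; i++)
            if (frame[i] == refs[t]) { hit = 1; break; }
        if (hit) continue;
        faults++;
        if (filled < nframes)
            frame[filled++] = refs[t];     /* still filling the frames */
        else {
            frame[next] = refs[t];         /* evict the oldest resident */
            next = (next + 1) % nframes;
        }
    }
    return faults;
}

int main(void)
{
    int refs[] = {1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5};
    int n = sizeof refs / sizeof refs[0];
    printf("3 frames: %d faults\n", fifo_faults(refs, n, 3));  /* 9  */
    printf("4 frames: %d faults\n", fifo_faults(refs, n, 4));  /* 10 */
    return 0;
}
```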
Okay, so let's talk a little more about LRU before we finish up for the night. A perfect implementation of LRU might timestamp every page on every reference and keep a list of pages ordered by time, but that is also too expensive to implement, for many reasons, so neither that nor the linked-list option I showed you earlier is any good. What we can do instead is the clock algorithm. I guess we're not going to have enough time to go into it in depth today, but I want to give you a little flavor of it. We arrange the physical pages in a circle, a circular linked list, with a single clock hand, and this approximates LRU. We just walk the hand around the ring and replace an old page, not the oldest page. That's the reason the clock works and doesn't have the problem we had with the linked list: we're no longer interested in finding the absolute oldest page, just an old one.

Here are the details. Each page table entry has a use bit, sometimes called the access bit, and the hardware sets that bit every time a mapped page is used: when you have a page that's mapped properly and you do a load or store to it, the use bit gets set. If the bit isn't set, the page hasn't been referenced in a while. So on a page fault, all we do is look at the next page in our ring (ring order, not time order) and check the use bit. If it's a one, the page was used recently, so it's at least somewhat popular: we clear the bit and leave the page alone. If it's a zero, we say, hey, this page isn't that popular, it hasn't been used recently, and it could be replaced. Now, what do I mean by "used recently"? Because we clock through every page in the circle and set the bit to zero as we go around, a bit that's still zero when we look at it really says the page hasn't been used in the last sweep through all the pages. You might ask whether we'll always find a page or whether we'll loop forever, and the answer is that even if all of the use bits are set, we'll set them all to zero in one full loop, and then we'll find a usable page on the next step.

So here's the graphical version. There's a single clock hand, and as we go around, we first look at the use bit. If the bit is off, we've got a page to replace; otherwise we turn it off and keep walking until we find a page with the bit off, and a bit that's off really says that nobody has used that page in the last loop. Now, if the hand is moving really slowly, it means we're not looking for a lot of pages to replace, and that's a good sign: not many page faults, we're good to go. If the hand is moving quickly, there are lots of page faults, or we're having to go around many times to find a page, and we're basically thrashing. So one way to view the clock algorithm is as a crude partitioning of pages into two groups, young pages and old pages. Are there any questions on that?

Okay, so you might ask: this effectively partitions pages into two groups, old pages and recently touched pages, so why not more than two groups? We'll come back to that in a moment with the Nth chance algorithm. There was also a question about whether this introduces a lot of overhead. Good question. When do we run the clock algorithm? The way I've described it to you, we only run it on a page fault, where we already know we're going to have to go to disk, so it's going to be slow anyway, and the overhead of this algorithm is probably not particularly high relative to a million instructions' worth of time. That's one answer. The second answer, though, is that in reality this isn't quite what happens. What happens is that we use the clock algorithm to fill up a free list when we're not busy page faulting, and as a result, on a page fault we typically go to the free list first; we only run the clock algorithm in the background to keep the free list populated.
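Here's a minimal sketch of the clock hand's victim search, assuming a simple array of per-frame use bits. In a real system the use (accessed) bit lives in the hardware page table entry; the array and names here are hypothetical simplifications:

```c
/* Minimal sketch of the clock algorithm's victim search.
 * Hypothetical: real kernels read the accessed bit out of the PTE. */
#include <stdbool.h>

#define NFRAMES 1024

static bool use_bit[NFRAMES];   /* set by "hardware" on each page access */
static int  hand;               /* the single clock hand */

/* Find a frame to replace: an old page, not necessarily the oldest. */
static int clock_victim(void)
{
    for (;;) {
        if (!use_bit[hand]) {               /* not used in the last sweep */
            int victim = hand;
            hand = (hand + 1) % NFRAMES;
            return victim;
        }
        use_bit[hand] = false;              /* recently used: second chance */
        hand = (hand + 1) % NFRAMES;
    }
}
```

Note the loop always terminates: even if every bit is set, one full lap clears them all, and the hand stops when it comes back around.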
The last thing, since we're running virtual and I'm going to go for a few more minutes here, is a version of the clock algorithm called the Nth chance algorithm, which basically gives each page N chances. The operating system keeps a counter per page: how many times have we gone around and looked at this page? On a page fault, the OS checks the use bit. If the use bit is one, we know the page was recently used, so we clear both the use bit and the counter, because that's a frequently, or at least recently, used page. If it's zero, we know the page hasn't been used in the last sweep, but maybe it was used in a previous one, and we don't want to just divide our pages into old and new; we want something a little closer to LRU. So what we do in that case is increment the counter associated with that page, and if the count hits N, we replace the page. This means the clock hand has to sweep past a page N times without the page being used before it gets replaced. How do you pick N? If you pick a really large N, you get a good approximation to LRU, but you have to go around many times before you decide a page is really old; the reason to pick a small N is that it's more efficient. And what about dirty pages? In this instance, because a dirty page can't be immediately reused (it has to be written back first), what people often do is give dirty pages a slightly larger N: you start writing the dirty page out to disk when you first notice it, and then let the hand go around another time or two. So a common approach might be to set N to one for clean pages and N to two for dirty pages. That's the Nth chance version of the clock algorithm.

I want to free you guys up, so, in conclusion: we started talking about replacement policies, which are how we deal with those policy misses. The policies we covered were, for instance, FIFO, which places pages on a queue and replaces the page at the end of the queue; that evicts old pages, not poorly used pages. We talked about MIN, which replaces the page that will be used farthest in the future; that's the optimal oracle we'd like to approximate. And our best approximation so far is LRU, which replaces the page that was used farthest in the past. The problem is that LRU can't be implemented cheaply, so we started talking about the clock algorithm, which is an approximation to LRU. Just to repeat: you take all the pages, put them in a circular list, sweep through them marking them as not used as you go around, and if a page stays unused for one or more passes, you know it's an old page and maybe it's replaceable. And we started talking about the Nth chance algorithm, where we let a page sit unused for a few more loops, letting it age a little longer before we throw it out. Next time, we're actually going to talk about some interesting variants, like the second chance