All right, welcome back everybody to 162. We're going to pick up where we left off last time, discussing various demand-paging techniques. As I've said before, make sure to type in the chat if you have any questions, and not to worry, it's a fake background, so yes, I'm sheltering inside.

One thing I wanted to remind you of: we've been talking a lot about caching, and I half-joke that pretty much everything in operating systems is a cache. If you remember, one way to evaluate the effectiveness of a cache is to compute the average memory access time (AMAT), and we talked about that last time. The average memory access time is hit rate times hit time plus miss rate times miss time, and of course the hit rate plus the miss rate is equal to one; that's just a probability statement. The hit time is the time to get the value from the cache if it's there, so that depends on the speed of the cache itself. The miss time is the time you incur when you miss, and that involves both grabbing the value from DRAM and then getting it to the processor through the L1 cache, so typically that's hit time plus miss penalty, the two together. Now, the thing we didn't talk too much about is the miss penalty, which is the average time to get a value from the lower level, in this case from DRAM. So miss penalty (L1) is the total time, on average, to pull something from DRAM. If you put all these things together, you get another, equally valid version of average memory access time: the hit time of the L1 cache plus the miss rate times the miss penalty. If you look at these carefully, the two versions of AMAT are the same.

The reason I wanted to briefly bring this up (and again, this is 61C material) is that if we have more levels, like an L2 cache, or perhaps a TLB in front, we start out exactly the same way: the average memory access time is the time on average to pull from the L1 cache, exactly as it was above. But now we have something interesting, because the miss penalty for grabbing something from below is actually the average time to pull something out of the L2 cache, which is (not surprisingly) the hit time of L2 plus the miss rate of L2 times the miss penalty of L2. Notice that this is now the exact L2 version of the same formula. And what is miss penalty (L2)? Well, that's now the time to get to the DRAM. Put all these together and you get the average memory access time for this second version with two levels of cache: the hit time of L1, plus the miss rate of L1 times (the hit time of L2 plus the miss rate of L2 times the miss penalty of L2). If you wanted to put three, four, five, however many levels of cache in there, this would just keep expanding recursively. And this, by the way, is one of the reasons I really like the second version of the AMAT formula: it's very cleanly hierarchical, so you can apply it recursively for many more levels. All right.
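To make the recursion concrete, here is a minimal sketch in C of how the hierarchical AMAT formula composes. The hit times and miss rates are made-up example numbers, not values from the lecture; the point is only that the miss penalty of each level is just the AMAT of the level below it.

    /* AMAT = HitTime_L1 + MissRate_L1 * (HitTime_L2 + MissRate_L2 * MissPenalty_L2) */
    #include <stdio.h>

    /* One level of the hierarchy: time to hit here, and probability of missing. */
    struct level {
        double hit_time_ns;
        double miss_rate;
    };

    /* Recursively compose AMAT: the miss penalty of level i is the AMAT of level i+1. */
    double amat(const struct level *levels, int n, double final_miss_penalty_ns) {
        if (n == 0)
            return final_miss_penalty_ns;        /* bottom of the hierarchy, e.g. DRAM */
        return levels[0].hit_time_ns +
               levels[0].miss_rate * amat(levels + 1, n - 1, final_miss_penalty_ns);
    }

    int main(void) {
        /* Hypothetical numbers: 1 ns L1 hit, 10 ns L2 hit, 100 ns DRAM access. */
        struct level caches[] = { {1.0, 0.05}, {10.0, 0.02} };
        printf("AMAT = %.3f ns\n", amat(caches, 2, 100.0));   /* 1.6 ns here */
        return 0;
    }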
So, one of the ways we actually use something like average memory access time, which we're going to call the effective access time when we've got the disk involved, is really the hit time plus the miss rate times the miss penalty. That's exactly what I just showed you, except that in this case the hit time is the time to get the value out of DRAM, and the miss time is the time to go to disk. And notice the difference here: 200 nanoseconds, on the order of 10 to the minus 9 seconds, versus 8 milliseconds, on the order of 10 to the minus 3. Those are very different times. We did this last time: the average time is really that 200 nanoseconds, the hit time, plus the miss rate (the probability that we failed to find it in DRAM) times the time to go to disk; we normalize the units, and then we gave a couple of examples. If, for instance, one out of a thousand page references causes a page fault, we've already slowed everything down by a factor of about 40. So this is much worse than the hardware case where we're just talking about DRAM backing the cache; when we're talking about disk backing DRAM, that disk is really slow. You can also do a different computation if you want the slowdown to be less than 10 percent, which means my average time is no more than 1.1 times 200 nanoseconds. It turns out that I can't afford more than one page fault in 400,000 references, or else I take more than a 10 percent hit. All right, I just wanted to repeat that.

And of course, what's interesting, given what I just showed you, is that you could put a cache in front of this. This hit time would then be used in the previous equation (hit time plus miss rate times miss penalty): you could use this effective access time in place of the L2 term if you wanted, so make that level the DRAM and the level below it the disk, or whatever. This hierarchical measurement is useful in general. Okay, I'm going to leave that, but I just wanted to make sure there weren't any questions before I move on. Are we good? Okay. I would expect that everyone here can do this calculation, by the way, so it wouldn't hurt to look through these slides and ask your TAs if you have questions.
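Here is that arithmetic written out as a quick sketch, using the 200 ns DRAM and 8 ms disk figures from the slide; the code structure itself is just for illustration.

    /* EAT = hit_time + p * miss_penalty, with the slide's example numbers. */
    #include <stdio.h>

    int main(void) {
        double hit_ns  = 200.0;          /* DRAM access time              */
        double miss_ns = 8.0e6;          /* 8 ms disk access, in ns       */

        /* One page fault per 1,000 references. */
        double p = 1.0 / 1000.0;
        double eat = hit_ns + p * miss_ns;
        printf("EAT = %.0f ns (slowdown of %.1fx)\n", eat, eat / hit_ns);   /* ~8200 ns, ~41x */

        /* How rare must faults be to keep the slowdown under 10%?
         * hit + p * miss < 1.1 * hit   =>   p < 0.1 * hit / miss          */
        double p_max = 0.1 * hit_ns / miss_ns;
        printf("Need p < %g, i.e. fewer than 1 fault per %.0f references\n",
               p_max, 1.0 / p_max);     /* 1 in 400,000 */
        return 0;
    }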
So now, where were we last time? The conclusion from that slide was that we really can't afford to page fault much; in fact, if we take any significant number of faults, we're going to burn all of our performance. What that says is that it's really important to get your replacement policy right, so that you don't have to page fault very often, and so that you keep all of the pages you really care about in memory. So this is one way to look at it. We talked about the clock algorithm as a version of this: it's LRU-like, and there's a single clock hand that advances only on page faults. Every time we advance the hand, we're assuming we're looking for a new page, and when we do that advancement, we first check to see whether that page has been used recently. If it hasn't, we go ahead and use it; otherwise, we clear the use bit and go on to the next one. The key thing to remember is that the clock algorithm, as we described it last time, is about finding an old page, not the oldest page.

The bits in the page table entry that are useful to us (I didn't list these at the end of the lecture last time) are the use bit, the modified bit, the valid bit, and the writable bit. We'll talk more in a moment about other approximations of LRU, but again, what I'm describing here is that each time you move the hand, you check the use bit, and if the hardware has set the use bit to one, then you know that at some point since the hand last came all the way around, somebody has touched the page, so it's not an old page; it's a relatively new page. So you clear the use bit to zero and keep advancing. And if the use bit stays zero all the way around, then you know that the page hasn't been used in one full loop. The modified bit is also typically supported in hardware, and it tells us whether somebody has modified the page since it was paged in.

Okay, are there any questions? So, to state the clock algorithm another way: the pages are all in a ring, so basically every page is linked together; we move the clock hand along that ring, and on a page fault we advance the hand. We check the use bit: a one says it's been used recently, so we clear it; a zero says it's a candidate for replacement. It's really a crude partitioning of pages into old and new. Okay, I want to pause there to see if there are any further questions.

Yes? "Does keeping track of these bits happen in the operating system, or is that more of a hardware thing?" Well, that's a good question, and the answer is yes, it can be done in many ways. The way I've described it to you so far, the use and modified bits are entirely handled in hardware. That's not going to be true on the next couple of slides, but for now, the way I've described it, the hardware, for a page that's mapped and valid, will set the use bit every time you do a load or store to that page, and the modified bit will be set in hardware every time you do a store to that page. Okay.

All right. And what is the advantage of clock over LRU, that's the next question. The answer is that the clock algorithm can be done really inexpensively: all we're doing is linking all the pages together and traversing them one at a time to check their use and modified bits. The problem with LRU, as I mentioned last time, is that in principle, to get the actual least-recently-used page, you would have to track every load and store and keep rearranging all the pages so that you could figure out which page was the oldest. That's what led us to an approximation, because true LRU would be way too expensive to implement in the operating system. So that answers that question.
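To make that concrete, here is a minimal sketch of the clock hand's victim-selection loop. The frame structure, array size, and field names are made up for illustration; they aren't from any particular kernel.

    #include <stdbool.h>
    #include <stdio.h>

    #define NFRAMES 8

    struct frame {
        bool use;      /* set by hardware (or software emulation) on each access */
        bool dirty;    /* set on writes; a dirty victim must be written back     */
    };

    static struct frame frames[NFRAMES];   /* all physical frames, conceptually a ring */
    static int clock_hand;

    /* Advance the hand until we find a page whose use bit has stayed 0 since
     * the hand last passed it, clearing use bits as we go. Returns the victim. */
    static int clock_select_victim(void) {
        for (;;) {
            int f = clock_hand;
            clock_hand = (clock_hand + 1) % NFRAMES;   /* the hand always advances */
            if (!frames[f].use)
                return f;                  /* an "old" page: unused since the last pass  */
            frames[f].use = false;         /* recently used: give it one more revolution */
        }
    }

    int main(void) {
        /* Pretend frames 0..5 were touched since the last pass; 6 and 7 were not. */
        for (int i = 0; i < 6; i++) frames[i].use = true;
        printf("victim = frame %d\n", clock_select_victim());   /* prints 6 */
        return 0;
    }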
All right, so here is our PTE, and notice we've been talking about a couple of its bits. Here's the dirty bit; here's the accessed bit, which is our use bit; here's the writable bit; and the valid bit, which is the P (present) bit. So, do we really need the use and dirty bits (the accessed and dirty bits) in hardware? The answer is no: you can emulate them in software. If you keep a software structure in your operating system, you have your own versions of use, dirty, writable, and present. And by the way, when I put this on the slide I thought it would help, but I apologize: the accessed bit A is the same thing as the use bit, so my apologies there; I'll clean up those slides later.

Anyway, you can emulate these bits. The earlier question was whether they have to be in hardware. Ideally they are in hardware, but if your hardware doesn't have them, then you emulate them in software. The way you do that is to keep a data structure that, for every mapped page, tracks what the real use, dirty, writable, and present bits are, and you mark all the pages that are actually in physical memory as invalid in the hardware PTE, that is, you mark present as zero.

What does that mean? It means that even though the page is in memory, every time we go to reference it we get a page fault, because the page table says it's not valid. But because we're keeping the real present bits in our own data structure, as soon as the page fault happens we take a look and say, wait a minute, this thing is actually supposed to be present, and we know where it is in physical memory because of that same data structure. So what's happening is that I'm using the invalidity of this page to detect the use bit, for instance. If we read an "invalid" page, we trap to the operating system; if the page is actually in memory, we set the software use bit and mark the page read-only; otherwise we handle a real page fault and go to disk. On a write to one of these invalid (or potentially read-only) pages, we set the software use bit and the software dirty bit. As a result, we are now emulating the use and dirty bits directly via page faults, and that gives us the ability to use hardware that may not support use and dirty bits directly. When the clock hand advances, of course, we check the software use and dirty bits, the ones we keep in software, to decide what to do, and if we're not reclaiming the page, we mark the page table entry invalid again and reset the software use and dirty bits for the next loop.

A question here: "How would you know when to clear the present bit? Would the software not know which pages the MMU is removing?" That's a great question, and let me repeat what I just said. We actually have both the hardware page table entry and the software operating-system structure, and the software structure keeps track of which pages are actually in memory and where. All we're doing is emulating the D and A bits by playing with the P and W bits. Pretty much every virtual memory system out there has the ability to mark a page table entry as invalid or read-only, and with those two hardware bits we can simulate the software ones. But to answer the question of whether the software knows which pages the MMU is keeping track of: the software is the king and knows what's actually in memory. We're playing with the page table entries to generate page faults at exactly those places where we use a page for the first time in a pass around the clock, or write to the page, and that lets us simulate the use and dirty bits. Did that answer the question?
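Here is a sketch of that emulation, assuming hardware that only gives us "present" and "writable" PTE bits. All of the names (sw_page, pte_set_*) are hypothetical helpers invented for illustration, and the pte_set_* functions are stubs standing in for code that would actually edit the hardware page table entry.

    #include <stdbool.h>
    #include <stdio.h>

    /* The OS's own record of the "real" state of a resident page. */
    struct sw_page {
        bool use;        /* software-emulated use (accessed) bit   */
        bool dirty;      /* software-emulated dirty (modified) bit */
    };

    /* Hypothetical stubs standing in for code that edits the hardware PTE. */
    static void pte_set_invalid(void)   { /* would clear the present (P) bit    */ }
    static void pte_set_readonly(void)  { /* would set P and clear writable (W) */ }
    static void pte_set_readwrite(void) { /* would set both P and W             */ }

    /* Called from the page-fault handler for a page that is actually resident:
     * the fault itself is the signal that the page was used (and maybe written). */
    static void emulate_bits_on_fault(struct sw_page *p, bool is_write) {
        p->use = true;                 /* any fault on a resident page means "used"   */
        if (is_write) {
            p->dirty = true;           /* a write fault also means "modified"         */
            pte_set_readwrite();       /* stop faulting on further writes             */
        } else {
            pte_set_readonly();        /* reads now succeed; the first write refaults */
        }
    }

    /* When the clock hand passes a page we are keeping: mark the PTE invalid
     * again and clear the soft use bit, so the next access refaults and re-marks it. */
    static void emulate_bits_reset(struct sw_page *p) {
        p->use = false;
        pte_set_invalid();
    }

    int main(void) {
        struct sw_page page = {0};
        emulate_bits_on_fault(&page, false);   /* a read fault: use = 1          */
        emulate_bits_on_fault(&page, true);    /* a later write fault: dirty = 1 */
        printf("use=%d dirty=%d\n", page.use, page.dirty);
        emulate_bits_reset(&page);             /* clock hand passes by           */
        printf("use=%d dirty=%d\n", page.use, page.dirty);
        return 0;
    }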
So remember, of course, that the clock algorithm is just an approximation to LRU. Can we do better? Because if we don't have the accessed and dirty bits (the use and dirty bits) in hardware, we're page faulting a lot; we're essentially page faulting on every page once per trip around the clock. So the question is, can we do something better, and the answer is yes: we can use something called a second chance list.

All right, so here's a different algorithm from the clock algorithm, and it showed up on the original VAX hardware. I'll tell you in a moment how it showed up there as a mistake: the hardware was designed and they forgot to put a use bit in, and it really wasn't the hardware designer's fault; the operating system people told them they didn't need it, the hardware got designed, and it turned out they really wanted it. But the way this goes is: instead of putting all the pages in a ring like we did with the clock algorithm, we divide them into two pieces, and they're not necessarily equal chunks; we'll talk in a moment about how many pages go on each side. The green side here contains pages that we order in a FIFO list, and these are marked as usable in their page table entries: they're read-write and ready to go. That's why they're green; any time the program tries to use them, they just work. Now, the yellow pages are pages that are also in DRAM: they're not on disk, they've been pulled in from disk, they're sitting in DRAM, but their page table entries are marked as invalid. So if the program tries to use one of these pages, even though it's in DRAM, we get a page fault the first time around. We're going to call the green side the active list and the yellow side the second chance list.

Now, why is that useful? Well, as I said, we can access all the pages on the active list at full speed, because they're marked as writable and present. Otherwise, we get a page fault, and that's either because we touched a page on the second chance list, or because we touched a page that isn't even in DRAM; it's out on disk. What happens then? Whenever we get a page fault, we're going to rearrange things a bit. The first thing we do is take the oldest page in the FIFO list and move it to the second chance list. Oh, the other thing I forgot to mention: the green list is FIFO, and the second chance list is an actual LRU list like you would build in software. So we take this page off the old end of the FIFO list and pull it down to the newest position in the LRU list (excuse me, the newest end, not the oldest), and we mark it as invalid, because it's now on the second chance list. Then, assuming for a moment that the page fault happened because the page was on the second chance list, we'll give it a second chance by pulling it off that list, putting it at the new end of the FIFO list, and now it'll be green and easily accessible.

The way to think about this, ignoring for a moment pulling pages off of disk, is that all we're trying to do here is a good job of simulating use and dirty bits. Assume for a moment that all of these pages are already in DRAM: we just have some that are in DRAM but marked invalid, and some that are in DRAM and marked valid. This trick is an approximation to LRU. Why? Well, if the green list is some small subset of the total pages, then the yellow list is giving us actual LRU, because every time we touch a page on it, we keep that list in order: the newly touched pages are put at the new end of the LRU list, and the old ones that haven't been touched drift toward the old end of the LRU list. Now let's look at one more type of page fault.
So the first type of page fault we looked at was the one where the page is actually already in memory rather than on disk, and we just have to change its status. The second one is when it's not on either list, so we have to page it in from disk. In that case, we page it in from disk and put it at the new end of the FIFO list, and we throw out an LRU victim at the old end of the LRU list and put it back on disk. As a result of what we just did, we keep the total sizes of the two lists the same, and basically the sum of things on the green list and things on the yellow list represents the total number of DRAM pages.

Okay, pause. Somebody ask me a question. And you're welcome to talk if you wish. So we're all good, huh? This is impressive. Okay, well, let's see, a question. Can you read it? Yeah, go ahead.

"What's the significance of having the green list be a FIFO list?" So the significance of the green list being FIFO is the following: we can't do any better. The reason is that, since the pages on the green list are all mapped as valid and read-write, loads and stores just go to those pages automatically and work, so there's no way to rearrange these pages based on anything about their access patterns. The best we can do is put them in a FIFO list; there's just nothing else we could do, because we're not getting any more information we could use to make decisions about this list. Did that make sense? Great. And the ones on the yellow list we can manage as LRU, because the only time we ever touch the yellow list is on a page fault, so we're actually running software at that point.

Now, there was a question about how much this really saves us from the LRU slowdown problem, and it's a good one: "It seems like either you'd have enough green pages that you're losing the LRU benefits, or so few green pages that you'd end up stuck with a mostly-LRU system." The answer is that clearly there's a trade-off. In fact, you can imagine that if there are no yellow pages, you've got pure FIFO, and we all know that FIFO is a bad idea because you get Belady's anomaly, and as a result pages that are really popular are going to get pushed out to disk. On the other hand, if all of the pages are on the second chance list, that's going to be unfortunate, because every access, every load and store, is going to cause a page fault. Why? Because if the green list is empty, every access to a yellow page causes a page fault. So the answer to the question is that you've got to pick somewhere in the middle, to get enough of the LRU benefit that really popular pages don't get pushed out to disk, but not so many page faults that performance is bad. Okay, and going back to this again.

"Oh, in the first case, is the swapping atomic?" So the question is whether this overflow modification is an atomic operation, is that the question being asked? Let me take a stab at what I think is being asked.
So when we the only time we ever do any of these arrows Okay, is when we're coming in from a page fault So that means that the operating system has control the the user program isn't running So it's not generating any page faults And so none of what we do here can be affected by a page fault happening at the same time So it really is atomic And um, the other thing I wanted to point out is what is the lru benefit here? So imagine that there is a page that's very popular Okay, and we put it on the green side, but eventually it becomes the old page Now if we had FIFO only we would throw it out at that point But instead what we do is we put it on to the yellow list And the yellow list has caused us to call the second chance list because it really was popular All that we have to do is take a quick page fault Pull it back to the green list at the front Adjust the page table entry and now it's got it's going to be in the green list for a whole pass all the way through again And so basically the yellow ones are the second chance List of things that we might want to grab back immediately because they're popular And it's only if they sit around long enough to get all the way to the lru and then we potentially throw them out to disk Okay, are we good now So we're going to pick an intermediate value And the result of this second chance algorithm is we sort of Eliminates a lot of the disk's accesses because page only goes to disk if unused for very long time and Not very popular The con is we haven't increased overhead of trapping to the operating system But if we don't have a use and And dirty bit in Soft or in hardware then this is kind of what we got to do Okay And with page translation we can we can adapt to any kind of access program makes And so this is a good example of using the page table entry to emulate something So those page faults that we're getting in the middle here like the one that causes an overflow notice That's not actually pulling something in from disk what it's doing Is it's performing some operation which is some rearrangement in the operating system and then Making the page table entry valid so that when we retry it just works And so this is Another example just like when we did copy on write and some of these others where we're using the page table entry for something other than necessarily demand pages Um, now the funny little historical significance of this is why didn't fax include the use bit? Well, you had a very conscientious computer architect who talked to the os people and said, you know, okay What what do you need? Let's talk it through they basically told them they didn't need the use bit and And so then they designed the vax and then You know the vax got built and they came back. They said, oh wait, but we need to use it and uh poor striker actually got blamed for uh forgetting to include it, but um Really wasn't his fault and by the way the vax did okay. 
Okay. So, with that being said, let's go back for a moment to hardware that does have the use and dirty bits, and see something else we could do. The way we've been talking about the clock algorithm, whenever you get a page fault you start moving the clock hand to try to find a page that can be reused. There are several problems with this. The first is that you're running a bunch of complex stuff on the page fault path, when what you'd really like to do is find a physical page quickly so that you can start the disk access with DMA enabled; you really need a page right away, even though the disk access itself is going to take a long time. The second is that some fraction of these pages are dirty, so you can't use them as soon as you discover they're old; you've got to start writing them back to disk first.

You can take care of both of these problems with one solution, which is to build a free list, and this free list acts a lot like a second chance list. In the background, you run the clock hand; this is done by something often called the page daemon. What the page daemon does is advance the clock hand whenever the free list gets below a certain threshold, and it takes pages that are old and puts them on the free list. Notice that I have some of them marked in red with a D: that means they have dirty data that needs to be written back to disk. So we pull them off the clock and put them on this list, which is a FIFO list, and when we have a page fault and need a new page, we pull something off the free list. The good thing about doing it this way is that the write-back of a dirty page gets started early: as soon as it finishes, the page becomes clean, so the pages at the head of the free list are typically clean pages with no dirty data in them, and we can just overwrite them. So this is a different version of a second chance list, and it's much more common in modern operating systems on hardware that has a use bit. The thread that does this is typically called the page-out daemon, and this scheme gives dirty pages time to get written back to disk.

And why did I also call this a second chance list? Well, if one of these pages needs to be reused because somebody references it while it's sitting there, then all you have to do is pull it back into the ring, so it gets a second chance. That page has all the time it spends sitting on the free list to get pulled back into the ring without having to come back off disk. The advantage here is that it's faster for the page fault path to get a page into use, because most of the time we have free pages ready to go. All right.
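Here is a sketch of the page-out daemon idea: a background pass that advances the clock hand whenever the free list drops below a low-water mark, starting write-back of dirty victims as it goes. The thresholds, frame count, and names are illustrative, not from any particular kernel, and the "write-back" here is just a print statement.

    #include <stdbool.h>
    #include <stdio.h>

    #define NFRAMES    16
    #define LOW_WATER   4      /* wake the daemon below this many free frames */
    #define HIGH_WATER  8      /* stop reclaiming once we reach this level    */

    struct frame { bool use, dirty, free; };
    static struct frame frames[NFRAMES];
    static int clock_hand, free_count;

    /* Clock pass: clear use bits until we find an old (unused) frame. */
    static int next_old_frame(void) {
        for (;;) {
            int f = clock_hand;
            clock_hand = (clock_hand + 1) % NFRAMES;
            if (frames[f].free) continue;            /* skip frames already reclaimed */
            if (!frames[f].use) return f;
            frames[f].use = false;
        }
    }

    /* The page-out daemon: runs in the background, not on the fault path. */
    static void pageout_daemon_tick(void) {
        while (free_count < HIGH_WATER) {
            int f = next_old_frame();
            if (frames[f].dirty) {
                printf("start writeback of frame %d\n", f);   /* gets time to finish */
                frames[f].dirty = false;                      /* pretend it completes */
            }
            frames[f].free = true;                            /* onto the free list   */
            free_count++;
        }
    }

    int main(void) {
        for (int i = 0; i < NFRAMES; i++)
            frames[i] = (struct frame){ .use = (i % 2 == 0), .dirty = (i % 3 == 0) };
        if (free_count < LOW_WATER)            /* so the fault path rarely has to wait */
            pageout_daemon_tick();
        printf("%d frames now free\n", free_count);
        return 0;
    }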
Now, another interesting thing you can imagine is the need for something we're going to call a reverse page mapping, or a core map. The virtual memory mechanism we've been talking about maps from virtual address to physical address, but as soon as we want to take a physical page and reassign it to some new use, we have to hunt down every page table that is mapping that physical page. If you remember, we talked about sharing a single page among multiple page tables; that's exactly the situation I'm talking about. If we want to free up that physical page, we have to go hunt down all of the page tables it appears in, so we can mark those entries as invalid. So how do we do that? This is what's often called a core map, or reverse page mapping: a way to hunt down all the page table entries pointing at a given page frame and see whether they're active. There are many implementation options. You could, for instance, keep, for every physical page, a linked list of the page table entries that point to it; that's pretty expensive. Linux actually has a slightly different mechanism, a similar idea, but it tracks mapped memory regions in chunks of physical memory and frees them up together, and by handling chunks of pages at a time, Linux reduces the overhead of maintaining this core map.

Okay, so now let's talk a little bit about allocating page frames. Who gets what? Suppose you've got a machine that's running and it's got, you know, 100 processes, and one of them gets a page fault; then what? How do we allocate memory? You can imagine many options. You could say every process gets the same fraction of memory, or maybe they get different fractions depending on priority. Maybe, when we start needing a lot of new pages, we completely swap a process out to disk so that it's not in memory at all, thereby allowing the other processes more memory.

Well, the starting point for this allocation question is that every process needs some minimum number of pages, basically so that the process can make forward progress without hitting a page fault. For instance, for any given hardware processor, you could figure out the minimum number of pages that need to be mapped so that the next instruction can always run. On the IBM 370, you needed six pages, because the instruction, being six bytes, might span two pages, and there might be two pages to handle the "from" operand and two to handle the "to" operand. As a result, you could go through every one of your instructions and ask: if this instruction happened to be the one I stopped at, because I was rescheduled or took a page fault, how many pages would I absolutely need, at minimum, before that instruction could be guaranteed to run? This is a small number of pages, but if you violate it, if you have so many processes in memory that you can't even give, in this instance, six pages to every process, then you're really in trouble; you're really thrashing. So what are some scopes of replacement?
One of them is global replacement, and in some sense the clock algorithm as I've been describing it is global replacement: all pages from all processes are put into the same clock, and when a process needs a new page, it goes to the global free list and selects a page. In that case it's basically competing with all processes, potentially taking a page away from another process that's running. You can also do local replacement, where each process selects only from its own set of allocated frames. Suppose we say this process gets a hundred pages in memory, period; then when it needs a new page, we just replace one of its own pages with the new one we're paging in. Both global and local replacement are options, and they're used in different circumstances. A lot of OSes use global replacement because it's simple, but if you really want something more real-time, where you've carefully controlled what memory is in use, then you might use a local replacement policy, so that one process cannot impact another's real-time guarantees.

So let's look at some possibilities for fixed allocation. You can do equal allocation: every process gets the same amount of memory. Imagine a hundred frames and five processes total; then each process gets 20 page frames. You could do proportional allocation, where the bigger processes get more memory: we take the size of process p, we sum up the sizes of everybody, and we use that as a fraction. So the size of a process, over the total size, times the number of frames we've got, gives us that process's share of page frames (see the sketch below). Now, this seems to say give the bigger process more memory, and it seems almost like a good idea, but can anybody guess why this could be a bad plan? Any thoughts? Yeah, great, I have a couple of folks basically saying that processes can falsely claim to be large and, as a result, get a lot of page frames even though they don't need them. That's correct. The other thing to keep in mind is that the size of a process may depend on a bunch of libraries you've linked in that have nothing to do with the parts you're actually using.
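Here is that proportional formula written out as a tiny sketch; the process sizes and frame count are made-up example values, and integer division means a frame or two can be lost to rounding.

    /* frames_i = (size_i / sum of all size_j) * total_frames */
    #include <stdio.h>

    int main(void) {
        int size[] = {10, 127, 3};      /* hypothetical virtual sizes, in pages */
        int nproc = 3, total_frames = 64;
        int sum = 0;
        for (int i = 0; i < nproc; i++) sum += size[i];
        for (int i = 0; i < nproc; i++)
            printf("process %d gets %d frames\n", i, size[i] * total_frames / sum);
        return 0;
    }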
The other thing we could do is priority allocation, which is a proportional scheme using priorities, so that higher-priority processes get more memory. These are all fixed schemes in some sense, and they're a little unsatisfying, because either they're gameable (that's the proportional one), or they don't really reflect what the process actually needs to make forward progress. So maybe we want an adaptive scheme of some sort: what if some application just needs more memory? We'll get to that better scheme in a moment.

I don't have a lot of new administrative stuff for you, other than to repeat that this new normal is a little weird. As you see, I have changed my background to be nice and soothing, and no, I'm not outside. Make sure to wash your hands and practice good social distancing, but on the other hand, make sure to stay in touch with people: use your tools, use Zoom, use Google Hangouts, phones, whatever you can, and talk to people. I'm going to keep teaching 162; I love this class, and it's always better to teach in front of a live audience, so we will keep going on this and see how it goes. We can have live lectures, discussion sections, and office hours. I had somebody ask me earlier about my office hours, and I very much apologize for not having any over the last couple of weeks; I've been trying to figure out what to do in this new normal, but I'm going to start having office hours again, so we'll see how that goes. I'll announce them, and I may have a few a week or something, because it's pretty easy to do virtual office hours for, you know, inspiration or conceptual ideas. And hopefully starting tomorrow, you'll see in section a recorded walkthrough of the section as well, so that should be interesting.

Some of the deadlines got relaxed; you can see the Piazza post. Where I said "from this afternoon," I meant from Tuesday, sorry. And we did move Midterm 2 to April 7th, which is the week after the week after spring break. I know we have some conflicts out there, and we are continuing to engage with you about those. The material covered is really something we're still figuring out for this midterm, but as for the questions you have now: the midterm is clearly going to be online, probably with Zoom proctoring, but for now keep your conceptual view of it as an online version of what we've already done, and we will give you more details once we figure them out; we're learning best practices from the midterms other courses have been running. And I do see another suggestion here on the chat, which is a good one: if you have access to sanitizer, your phones and your keyboards might be a good thing to clean as well. I realized last week that I almost never cleaned my phone; I have remedied that.

Okay, so the new discussion times are so far the same. There'll be one discussion section per time slot, exactly as before, from nine to four or whenever it was; you can be sure that every hour that had a section will still have a section for now. There will just be only one TA, and I think they will be posting those links for you, hopefully on Piazza if they haven't yet, so watch for that. For now we're still doing one per time slot. After spring break, if people are not necessarily back at Berkeley, one thing we may do is add an earlier time slot, like eight o'clock in the morning here, for people who are on the East Coast or elsewhere; we may adjust our time slots later. Okay, good, I see a suggestion about how to get good questions answered in office hours; let's talk about some of that offline, but those are good suggestions, and you could actually make a Piazza post about that.

Okay, so going back to our material. One thing we could do to try to dynamically figure out how many pages a process needs is to track the page fault frequency per process. Could we reduce capacity misses by dynamically changing the number of pages each process gets?
If you imagine that this process is stuck at 100 pages, we're going to start getting capacity misses if it needs more than 100 pages. So if you look at this curve, which I showed you earlier in a slightly different context, you can imagine that the page fault rate for a given process is a function of the number of page frames we choose to give it, and assuming we don't have Belady's anomaly (because we're doing something better than FIFO), then as we add frames, the page fault rate decreases. We can set an upper and a lower bound and decide that, as long as a process's page fault rate is between the lower and upper bound, we're good. If it's above the upper bound, that process is thrashing, so we need to give it a few more frames; if it's below the lower bound, we can imagine the process has more frames than it actually needs, and we might want to give some of them to a different process. So this is a pretty simple idea. Of course, we might ask: what if we just don't have enough memory for everybody? But how would we know?

A question: "Does assigning a certain number of pages, and changing the number of pages given to a program, only apply to the local replacement case?" Yeah, that's a good question. So the question was whether what we're talking about here applies to local page allocation or global page allocation. This is almost by definition a local allocation concept, because global allocation just shares the frames among everybody. You could modify the global clock to try to skip over pages belonging to processes that were underfunded in terms of number of pages, so you could use a global clock for something like this if you wanted, but for now, let's think of this as the local allocation version.

So, thrashing is when we don't have enough memory, period, no matter what we do. Oh, sorry, go ahead. Good, so the question is: if all processes are above the upper bound, do we know we don't have enough memory? That's correct; actually, I would say that if there's no way to get all processes below the upper bound, then we know we don't have enough memory. That would be another way to look at it, and that's the notion of thrashing. What I want to show you here is that if we add more processes or threads to get better performance out of the system, trying to overlap I/O and so on (in the multithreading case, trying to get better use of the pipelines and so on), we can add more and more tasks, but at some point we stop getting better performance and we fall off a cliff: we add just a few more processes, and all of a sudden we're page faulting all the time, thrashing back and forth to the disk. If we're past this cliff of no return, we know we're in trouble. Really, thrashing is a situation where at least one, and maybe more, processes are busy swapping pages in and out with little or no forward progress. And how do we detect thrashing? Well, we start seeing that our utilization is low, or that we're spending all of our time in the overhead of talking to the disk. And what's our best response to thrashing? Anybody have any thoughts? Yep, kill some processes; that would be one option, but I think killing processes is pretty drastic. We have several interesting ideas here; one is adding more RAM.
Well, that's a good long-term solution. I think the one brought up here, making some processes inactive, is a good one. What we do in that instance is page out all of the pages of a given process, finish the other ones that are running, and then bring it back. As a result, we reduce the thrashing. Notice this is a supremely steep cliff, so by deactivating some processes we can get much better performance: put some of them to sleep, run the others to completion, and then run the ones we put out to disk. Okay. Another suggestion was to reduce the amount of memory that some processes get, and that's kind of what we were trying to do with the tracking here: at some point, if we have processes below our lower bound, we can try giving their frames to ones that are above the upper bound, and when we can't do that anymore, we're just hosed and we've got to do something else.

Okay, so let me remind you a little bit about memory access patterns. We've done this a couple of times; we looked at a version of this last time. Here's an example of an actual memory trace over time from a process, and if you look at which memory addresses are in use, you see that there are these phases. If we scan along and look at a time slice, we see, for instance in this one, that the addresses in this region are in use, but the ones elsewhere are not. So if we could somehow figure out the number of pages that this process is actually using during this phase, that might tell us how many pages it really needs. This is called the working set, and the working set defines, to some extent, the minimum number of pages needed for the process to perform well. If there isn't enough memory for the working sets of all the processes to fit together, then we've got thrashing, and that's the point where it might be better to swap out a process entirely.

The working set model is pretty simple: you look at the references from a given process, you take the last delta of them, and you count the number of unique pages in that past window. That will tell you, for instance, that the working set at time t1 is pages one, two, five, six, and seven, but a little later, the working set at time t2 is pages three and four. So if we identify the pages that are in active use in any given time frame, then we know both how many of them there are and which ones they are, and we can respond accordingly. The working set window delta is a fixed number of page references; it might be a thousand of them.
So this chunk here is the working set of process P_i, for instance: the total set of unique pages referenced in the last delta references. If delta is too small, you don't capture the whole locality: if I only look back a little bit, I might miss pages that are actually in active use. If delta is too long, I might span from one phase to the next, and now I'd encompass both three and four and one, two, five, six, seven, and I'm not really capturing the changing access pattern of the process. And of course, if delta goes to infinity, you're basically counting every page that process has ever accessed. If we sum up the working set sizes of all of the running processes, that gives us how many physical page frames we need in memory, and if that total is greater than the amount of memory we've got, then we're thrashing. So you could imagine a very simple policy: when the total demand D is greater than the memory m, suspend and swap out some of the processes, and that can really improve your performance. And notice that this is something you can actually keep track of, so if you're willing to do some per-process bookkeeping, you can start making these kinds of decisions.
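Here is a small sketch of that working-set computation: the working set at time t with window delta is just the set of distinct pages referenced in the last delta references. The reference string below is a made-up example in the spirit of the slide (pages 1, 2, 5, 6, 7 early, then 3 and 4 later).

    #include <stdio.h>
    #include <stdbool.h>

    #define MAX_PAGES 64

    /* Count distinct pages among refs[t-delta+1 .. t]. */
    static int working_set_size(const int *refs, int t, int delta) {
        bool seen[MAX_PAGES] = { false };
        int count = 0;
        int start = (t - delta + 1 > 0) ? t - delta + 1 : 0;
        for (int i = start; i <= t; i++)
            if (!seen[refs[i]]) { seen[refs[i]] = true; count++; }
        return count;
    }

    int main(void) {
        int refs[] = {1, 2, 5, 6, 7, 7, 7, 5, 1, 6, 2, 3, 4, 4, 4, 3, 4, 3, 4, 4};
        int n = 20, delta = 10;
        /* The working set shrinks as the process shifts into its second phase. */
        for (int t = delta - 1; t < n; t += 5)
            printf("t=%d: |working set| = %d\n", t, working_set_size(refs, t, delta));
        return 0;
    }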
Now, what about compulsory misses? If you remember, compulsory misses are misses that happen the first time a process pulls a page into memory: these are pages being touched for the first time, or pages touched again after the process was swapped out and brought back. The question is: can we save ourselves from those? We did talk about prefetching in one sense, but one option, which is actually used by various operating systems, is clustering: on a page fault, you bring in multiple pages around the faulting page. That's a type of prefetching, and since the efficiency of disk reads increases with sequential reads, this can be much more efficient than a series of separate accesses to disk. Our very next topic, next time, I guess after we get back from spring break, is really how disks work, physical disks and also SSDs, and we'll talk about a few other storage devices; but basically, with clustering we can do a really good job of pulling things in once we have that information.

Now, there was a question here: what sort of replacement policy comes with clustering? You could think of clustering as a kind of meta policy, a form of prefetching that we use when we pull something in that hasn't been used for a while. So when you pull a process in off of disk for the first time in a while, we might use the information stored with it about which pages to pull in; it's also potentially something we use when we miss, where it tells us which things to pull in. In both cases it's not actually about replacement: it's about which pages you're pulling in, and the replacement question, which pages we're kicking out, can be answered by whatever replacement policy you have. So the simple answer is that clustering is a way to decide which pages to pull in, not which pages to throw out. I hope that helps. And we can do working set tracking, using some algorithm to track the working set of the application, so that when swapping a process back in, you swap in its working set.

I also want to mention briefly that there's a type of clustering used by pretty much every operating system I know of, maybe not Pintos: when you read a page, the simplest type of clustering isn't really clustering at all; it just says sequentially pull in the next n pages, where n is typically small. This is used quite frequently, and it's basically a bow to the fact that most accesses to pages coming off of disk are sequential: if you access one page, you're likely to access the next couple of pages, so it's a very good policy in general.

Okay, so I want to switch gears for a bit, unless there are any more questions on this. Alrighty. Let's talk a little bit about Linux now, because that's sort of the Unix-style operating system you're most likely to run into and get the chance to modify in your careers as software engineers. Memory management in Linux is considerably more complex than the examples we've given. For instance, Linux running on an x86 processor has to bow to the historical details of x86: physical memory below 16 megabytes (ZONE_DMA) was used for DMA-able memory for devices that happened to be on the ISA bus, which is an old legacy bus that was used (and still is, to some extent) for things like keyboards and mice. There was ZONE_NORMAL, which ran from 16 megabytes up to 896 megabytes and was mapped at kernel addresses starting at 0xC0000000, and then high memory (ZONE_HIGHMEM) is everything else. These three zones of physical DRAM are each managed with their own clock-like LRU algorithm: each zone has one free list and two LRU lists (the active and inactive pages), so it's actually doing something that's like a cross between the clock algorithm and the second chance list. There are many different types of allocators in Linux, slab allocators, per-page allocators, mapped and unmapped allocators, that you will encounter, and many different types of allocated memory. For instance, anonymous memory is memory that's not backed by a file; a good example is the heap or the stack, which is usually not backed by a file (it might be backed by the swap device, but not by a file), and anonymous memory is also used for shared memory between two processes. We'll get into the memory-mapping (mmap) system call a couple of lectures from now.
We'll take a look at memory that's anonymous as well as mapped memory. It's possible in Linux, for instance, to call mmap and map a file, and then you can read and write the file by reading and writing memory, as if the file were entirely loaded into memory; demand paging is then used to pull things out of the file and into DRAM on demand. And then there are allocation priorities in the way the allocator works, like: is this allocation allowed to block? Can you put the process to sleep when it allocates out of certain zones of memory?

Now, what's kind of interesting is to look at the difference between a 32-bit address space and a 64-bit address space; by the way, before the Meltdown bug was discovered a couple of years ago, they typically looked like this. 32 bits is no longer really a lot; that's only four gigabytes. What happened was that the first three gigabytes of the virtual address space were granted to the user, so from 0 up to 0xC0000000, and then from 0xC0000000 up to all F's was kernel. The kernel, among other things, would map the first 896 megabytes of DRAM directly into those kernel addresses, so when you popped into the kernel, through a system call or an interrupt or whatever, it had a direct mapping for pretty much every physical page in the system, unless you had more than 896 megabytes of memory, in which case the kernel had to map pieces in and out of its address space.

When you get to a 64-bit virtual address space, there's just a lot of it: two to the 64 is big. It's so big that most processors don't even bother mapping all 64 bits of the virtual address space through a page table. One of the examples we gave a couple of lectures ago mapped only 48 bits of the virtual address space through page tables, and here's a good example of what happens with that. What's funny, and what I wanted to point out, is that you get what's typically called the canonical hole: a chunk of address space that's not available for mapping at all, so neither the kernel's nor the users' page tables can use it. Virtual addresses from 0 up through 0x00007FFFFFFFFFFF (that's 47 bits of ones) are mappable, the chunk at the top from 0xFFFF800000000000 all the way up is also mappable, and nothing in between. You can think of it this way: that 7 is 0111 as a nibble of four bits, and once bit 47 gets set, all of the bits above it must also be set. So these are the only parts of the virtual address space that make sense: the user is given access to 2^47 bytes of address space at the bottom, and the kernel gets 2^47 bytes of address space at the top. All right, questions on that?
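Here is a small check of that "canonical hole" rule: with 48-bit virtual addresses, bits 63..47 must all equal bit 47, so sign-extending from bit 47 must reproduce the address. The shift-and-cast trick below relies on the usual arithmetic-right-shift behavior of mainstream compilers; the sample addresses are just the boundary cases of the hole.

    #include <stdio.h>
    #include <stdint.h>
    #include <stdbool.h>

    static bool is_canonical(uint64_t va) {
        /* Sign-extend from bit 47 and see whether that reproduces the address. */
        int64_t sext = ((int64_t)(va << 16)) >> 16;
        return (uint64_t)sext == va;
    }

    int main(void) {
        uint64_t samples[] = {
            0x00007FFFFFFFFFFFULL,   /* top of the user half: canonical      */
            0x0000800000000000ULL,   /* first address in the hole: not legal */
            0xFFFF800000000000ULL,   /* bottom of the kernel half: canonical */
            0xFFFF7FFFFFFFFFFFULL,   /* still in the hole: not legal         */
        };
        for (int i = 0; i < 4; i++)
            printf("%#018llx -> %s\n", (unsigned long long)samples[i],
                   is_canonical(samples[i]) ? "canonical" : "non-canonical");
        return 0;
    }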
All right, so continuing with this for a moment: pre-Meltdown. What did Meltdown do? Well, Meltdown meant we couldn't actually keep kernel addresses mapped up there, and that's a problem. The pre-Meltdown virtual memory map (I'll tell you about Meltdown itself in a couple of slides) looked like this. The kernel memory was not generally visible to the user: all of these addresses in red had page table entries that said kernel use only. The exception was some special dynamically linked shared objects that were placed up in that upper space so that the user could do things like gettimeofday quickly, without actually having to trap into the kernel; you can Google "VDSO" to see the details. It was an optimization to eliminate system calls for things people might need to do rapidly.

Every physical page is described by a page structure in Linux; these are collected together in lower physical memory, can be accessed in kernel virtual space, and are linked together in the various LRU lists of the three zones I mentioned before. For a 32-bit virtual memory architecture, as I said, when physical memory is less than 896 megabytes, all of it is mapped up top; when physical memory is greater than 896 megabytes, you have to map pieces in and out. For 64-bit virtual address spaces, basically all of physical memory is mapped up in that upper region. What that really means is that the kernel has a chunk of its address space that is literally mapped directly onto physical DRAM, just to make it really easy to access all of physical memory, in a sense without translation.

So then what happened? Well, 2017 happened, and 2017 was a very bad year for computer architects. A number of bugs, security violations, were discovered that were all related directly to the fact that computer architects had been doing prediction for a long time: branch prediction, data value prediction, and so on, in order to let processors run ahead of slow operations like branches and get much better performance, and they were doing out-of-order execution as well. So I want to show you some code that gives you an idea of what happens. What actually happens in a physical processor these days (and you should take 152 or 252 if you want to learn more about this) is speculative execution: instructions get to run ahead of the decision-making parts of the pipeline, like branches, or, in the case I'm going to show you here, ahead of the page faults that are raised for protection violations.
And so this interesting little bug, which was demonstrated to the manufacturers of processors and to Microsoft and other operating system vendors back in 2017, and not announced to the world until 2018, goes something like this. You have an array with 256 entries, spaced out by 4096 bytes each (I'll tell you why that spacing in a second), and you flush it all out of the cache, so none of it is in the cache. Then what you do is say: the value I want, and I'm running as a user now, is at a kernel address. Now, a kernel address is up here in red; as a user, I try to access that memory. That shouldn't work: I should get a page fault and a segmentation violation, and the process should be killed. But in the world of speculative, out-of-order execution, what actually happens is that the value at that address gets loaded temporarily into a processor register while the permission violation is still being checked. So it gets loaded into the register, and then I go ahead and access my array (which is in my user space) at index result times 4096; I touch that array entry. Eventually the processor says, wait a minute, you're not supposed to see that, so it raises a page fault and cleans up the registers, and the result value never actually lands anywhere I could look at it directly. And notice that I've done a try/catch here, so that even though I got a bad access, I go on without the program being killed.

Now, what's cool about this, I guess, if you're a processor architect before 2017, is that we were doing this out of order on the assumption that it was correct, because we squashed the speculative results properly, so there's no bad value left behind: it's not like my program, after this, can look at result and learn anything about that kernel data. However, during the time before the violation is caught, I've actually caused one cache line to be loaded. So if the secret byte came back as, say, 129, then the 129th entry of my array gets loaded into the cache. All I have to do as a user is scan through my 256 slots, and the one that doesn't take a cache miss, the one that comes back very rapidly, tells me what the value was. So I'm finding out indirectly what the secret kernel value is.

And the patch, unfortunately, was that you need different page tables for the user and the kernel. So all of the optimizations that had been done over the years as a result of that direct kernel mapping suddenly couldn't be done: you had to have two page tables, one for the kernel and one for the user. Only versions of Linux after 4.14, for instance, were even able to use the ID tags (PCIDs) in the TLB, so before that, just by switching page tables between user and kernel, for instance on a system call, the best the kernel could do was flush out the whole TLB, go to the kernel's page table, do whatever it needed, flush out the whole TLB again, and go back to the user. There were multiple TLB flushes, and it turned out to be a significant overhead just from repairing the Meltdown bug. A little later, with version 4.14 of Linux, the kernel was able to use the ID tags in the TLB, so you could have separate user and kernel entries loaded into the same TLB at the same time without interference. The real fix is better hardware, and that's still kind of on its way.
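Here is illustrative pseudocode, in C syntax, for the probe pattern just described; it is not a working exploit. The helper names (flush_from_cache, access_time, report_secret_byte) and the threshold are hypothetical; real proofs of concept use clflush and rdtsc/rdtscp, and survive the fault via a signal handler or TSX, while the speculative forwarding itself is done by the hardware, not by anything you can write in plain C.

    #include <stdint.h>
    #include <stddef.h>

    #define STRIDE 4096                    /* one probe slot per possible byte value */
    #define CACHE_HIT_THRESHOLD 80         /* cycles; a machine-dependent guess      */

    static uint8_t probe[256 * STRIDE];

    /* Hypothetical helpers; they do not exist as-is. */
    void     flush_from_cache(void *buf, size_t len);
    uint64_t access_time(volatile uint8_t *addr);
    void     report_secret_byte(int value);

    void meltdown_sketch(volatile uint8_t *kernel_addr) {
        flush_from_cache(probe, sizeof probe);

        /* Architecturally this load faults: a user process may not read a kernel
         * address.  But while the permission check is in flight, the value can be
         * forwarded speculatively into the dependent load below, pulling exactly
         * one probe line into the cache before the fault is delivered. */
        uint8_t secret = *kernel_addr;             /* faults; caught elsewhere */
        (void)probe[secret * STRIDE];

        /* After recovering from the fault (the try/catch in the lecture's framing),
         * time each slot: the one that hits in the cache reveals the byte. */
        for (int v = 0; v < 256; v++)
            if (access_time(&probe[v * STRIDE]) < CACHE_HIT_THRESHOLD)
                report_secret_byte(v);
    }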
"How is it possible to get root access to a system just by viewing kernel memory?" That's a good question. But consider for a moment what's in the kernel: it's got all sorts of stuff the user is not supposed to have access to — keys, passwords — and all of that stuff is inside the kernel because the kernel is the one controlling access on behalf of processes. So the fact that you can essentially drain out all of kernel memory using this hack means you can now see all of the information the kernel is trying to protect from processes, and you can violate all of the privacy constraints that processes depend on between each other. You can get passwords, you can get cryptographic keys, you can get data that's been decrypted — all of these things — because, as I mentioned, this red area has a mapping of every physical DRAM page, and if you can look at that, you can drain the contents of every process's memory just by running this hack. So this was pretty devastating, and once it was discovered there was really no way to get by without fixing it. All righty. So now I want to talk a little bit about I/O before we go off for spring break. So far in this course we've been talking about managing the CPU and memory — but what about I/O? Really, without I/O, computers are kind of useless. They're sort of like disembodied brains: maybe they're computing the last digit of pi, but they can't even tell anybody about it, because they don't have any I/O. And really, we have thousands and millions of different devices, each one slightly different, so the question comes up pretty quickly: how can we standardize the interfaces to these devices? Devices are unreliable — we have media failures and transmission errors — and they're unpredictable and slow. How do we deal with that unpredictability in a way that still gives a good virtual view of the world to the processes that are running? Now, we're going to talk about virtual machines in more detail in another lecture, but as we started this class we talked about how the process abstraction is a type of virtualization: we clean up the details of the actual hardware. That's been fine while talking about virtualizing the CPU or virtual memory and so on, but we haven't talked about devices. If you remember from way back, we showed you this figure where everything below the red line was the hardware we were virtualizing — things like disk drives and network cards that move data into memory and that the operating system has to touch. There's really a whole bunch of buses down there, and who deals with that complexity? Maybe you don't, because you're not a device person, but somebody has to, and the OS is what has to, right? And remember these time scales — these are numbers everyone should know — from a cache miss that's under a nanosecond
— where a nanosecond, again, is 10 to the minus 9 — up to milliseconds; 100 milliseconds is a tenth of a second. So we see a wide array of different times in the system, and somehow the I/O subsystem has got to deal with it. In a picture, we have this memory hierarchy: disks, maybe SSDs in front of the disks, DRAM in front of the SSDs, and an L3 cache, an L2 cache, an L1 cache, and registers. And by the way, if you look at the first slide of this lecture, you can figure out how to build an average memory access time for something that's getting pulled from disk up to the processor through many layers. Off of the processor, we have I/O controllers that let you read and write your disks; the I/O controllers have wires that talk to the devices of some sort, and they have an interface that talks to the processor. And then devices may support direct memory access, which is a way to pull things from the device into memory without the processor having to transfer every byte. So all of the I/O devices you recognize are supported somehow by I/O controllers, and the processor accesses them by going through the controllers, reading and writing commands and arguments; we're going to talk about that in a few moments. Modern systems are basically all about the I/O. If you look, where is the processor? Oh, that's this one thing right here, and it's got bridges to different buses — many different types of buses and different types of devices — and really, from an interest-and-complexity standpoint, all of this other stuff is what's interesting and complicated; the processor is just a processor. As a computer architect I don't really want to minimize all the cool stuff in the processor, but I want you to see all the other stuff. So, for instance, what is a bus? A bus is a set of wires — a common set of wires, either on chip or off chip — with a set of devices talking to it, running a common protocol for carrying out data transfer operations like read and write. There are control lines, address lines, and data lines; typically there are many devices on some buses, while other buses are point to point. A protocol is something each bus has to have; it allows an initiator device to start a read or write across the bus to some other device. These buses, when they're very close to the processor — namely, on chip — are very high bandwidth, and they get lower and lower bandwidth as you get farther away from the processor. Here's an example of a PCI bus, which is a more modern kind of architecture. You have the CPU here with a memory bus that is extraordinarily fast to its RAM, and a host bridge that might go to a couple of PCI buses. One of those PCI buses might go to an ISA bridge — these are the really old legacy devices like keyboards and mice — but there might be another PCI bridge, and on that second PCI bus you might have a USB controller, and then all of these USB devices you're used to — I see we have a little bit of a bug here with text wrapping on the slide, sorry about that — a webcam, the keyboard, and so on. So you can have a USB controller.
You could have a SATA controller — that's Serial ATA, for hard disks and so on — and this kind of bus hierarchy is how we plug the system together. The PCI buses are often the ones where you open up a machine and plug cards into it, and then out the back of the machine you've got a separate bus. USB — the Universal Serial Bus — is the kind of bus you use for things like keyboards, webcams, and scanners; SATA is a bus for disks; ISA is a bus for legacy devices. I'm going to pull up a somewhat older processor now just to give you an idea, because inside the processor these get pretty cool, if you're a computer architect. You've got four out-of-order cores doing that out-of-order execution; they might have memory protection extensions and software guard extensions — that's SGX, to build enclaves. You can issue, say, up to six micro-ops per cycle; these are operations where the Intel instructions you see in your debuggers get translated, in hardware, into RISC-like micro-operations, all running through very high-speed pipelines. We have large L3 caches with an on-chip ring bus; this ring bus connects the cores with the caches — the LLCs here — giving high-bandwidth access to them over that bus. We have integrated I/O: these are high-speed DRAM (DDR) interfaces, and displays — so we might have a GPU agent that talks to the displays, et cetera. But these are only a few I/O devices. What's possibly more interesting is what's called the platform controller hub. Here's the processor I just showed you, with some DRAM and some PCI lanes for very high-speed PCIe and displays, but we've also got this platform controller hub, and that platform controller hub, through a direct media interface, talks to pretty much everything else: USB and LAN — so that's the Ethernet — and audio and so on, plus the low-performance controllers for some of the old legacy devices. Many types of I/O actually hang off the platform controller hub — USB, Ethernet, audio, BIOS, and disks all come off that chip. So really, I'd say the chip that's got all the action is kind of this guy. Yeah, there's a processor — it's very cool — but this guy has all the I/O attached to it. Now, when we start talking about I/O, depending on what the device is, we have to start asking things like: what's our data granularity? Is it byte versus block? Some devices give us a byte at a time — you can imagine the keyboard is like that — while others provide whole blocks: disks, networks, et cetera. We talked last time about what the native block for a disk is — that's actually called a sector, which is something like 512 bytes — but oftentimes we'll read multiple sectors at a time to get what's typically the operating system's version of a page; either way, things come off in larger chunks. We might also ask about access patterns: are we forced to pull everything in sequentially, or can we grab it randomly? Some devices are sequential in nature — like networks, where you can only pull things off in order — while others, like disks, you can read randomly. Some of them require you to continuously poll; others generate interrupts. Some have transfer mechanisms where you read one byte at a time from them; others have direct memory access. So these are all kinds of interesting things that are handled by the Linux kernel.
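As a small illustration of the byte-versus-block distinction just mentioned, here is a hedged C sketch that reads one sector from a block device and one character from a character-style stream. The device path `/dev/sda` is a hypothetical example (and would normally need root permission to open); the point is only that the same read() system call is used at very different natural granularities.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    /* Block device: data comes in sectors (512 bytes here), and is often read
     * several sectors at a time to fill an OS page. The path is illustrative. */
    char sector[512];
    int blk = open("/dev/sda", O_RDONLY);
    if (blk >= 0) {
        ssize_t n = read(blk, sector, sizeof sector);   /* one sector */
        printf("read %zd bytes from the block device\n", n);
        close(blk);
    }

    /* Character device: data arrives a byte (or a few) at a time.
     * Standard input stands in here for a keyboard-like device. */
    char c;
    if (read(STDIN_FILENO, &c, 1) == 1)
        printf("got one character: %c\n", c);
    return 0;
}
```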
So, you know, we've been talking all this term about different parts of the kernel. We talked about process management, we've been talking about memory management, and now we're moving over to talk about some of the device control; we'll talk about networking, and then we're going to move into file systems as well. Really, the system call interface has a machine-independent portion at the top, and then as we get closer and closer to the devices, the pieces become more and more like device drivers, closer to the details of the devices themselves. So the goal of the I/O system is to provide uniform interfaces despite a very wide range of different devices. For instance, this code — fopen on /dev/something, giving you a file handle, and then reading from that device, or in this case printing out to it — works for all sorts of different devices: fprintf doesn't care whether it's a file, or a network, or a screen, because we've got this standard view where everything looks like a file, and we talked about that earlier in the term. Why does this work? Because the code that controls the device — the device driver — implements the standard interface, and we're going to try to get a flavor of what's involved in actually controlling devices as we go forward, probably mostly next time; we can only scratch the surface here. Okay, so we want standard interfaces to devices. Block devices are things like disk drives, tape drives, and DVD-ROMs; they access blocks of data, and the commands include things like open, read, write, and seek. Raw I/O or file-system access is an option, and memory-mapped file access is possible, but the essential thing about block devices is that we're always pulling in blocks, or chunks, of data. Character devices — keyboards, mice, et cetera — deliver a single character at a time, and in that case we have things like get and put, with libraries layered on top. Network devices are actually considered a third class of device — things like Ethernet, wireless, Bluetooth. They're different enough from both the block devices and the character devices that they are treated separately, and they have the socket interface, which we talked about last month, I believe; Unix and Windows both include socket interfaces, and so do Apple's OSes. Separating the network protocol from network operations gives you things like select, and the usage here is pipes and FIFOs and streams and queues and mailboxes and so on. So those are the three types of devices. And you might ask, well, how does the user deal with timing? Here we also have some options. Mostly what you've been doing up till now is a blocking interface, or wait interface: when you request data — like a read system call that has to grab data from the device — you go to sleep. That's a blocking interface; we block until the data is ready. Or, when we write, mostly we just issue the write system call, and the process gets put to sleep only if there's not enough buffering or the device isn't ready for data yet. Non-blocking interfaces, which you can use with a lot of devices, say: don't wait, tell me what you've got now. So when you do a read system call on a non-blocking interface, you get back immediately whatever it can give you right now.
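Circling back for a second to the uniform-interface point from a moment ago, here's a minimal sketch of that idea: the same stdio calls work whether the FILE underneath is a regular file, a terminal, or some other device. The device path `/dev/ttyS0` (a serial port) is just an illustrative stand-in — what you can actually open this way depends on your system and your permissions.

```c
#include <stdio.h>

int main(void) {
    /* "/dev/ttyS0" is only an example device node; any file or device you
     * have permission to open behaves the same way from this code's view. */
    FILE *out = fopen("/dev/ttyS0", "w");
    if (out == NULL)
        out = stdout;            /* fall back to the terminal, also "just a device" */

    /* The code below neither knows nor cares what kind of device is underneath. */
    fprintf(out, "hello from the uniform I/O interface\n");
    fflush(out);

    if (out != stdout)
        fclose(out);
    return 0;
}
```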
Okay, and how do we take a file that we've opened and turn it into a non-blocking interface? We can actually use the ioctl system call we talked about before, and in this case we return quickly from read or write, and what it tells us is how many bytes were transferred. Finally, there's the notion of an asynchronous interface, which is the "tell me later" option. Here, when we request data, the operating system takes a pointer to the user's buffer and then returns later with some sort of signal that says: your I/O is ready, go for it. Notice the difference between non-blocking and asynchronous: non-blocking returns with whatever it's got, while asynchronous returns after having gotten what you asked for. So in the asynchronous case you may wait a little longer, but you are notified later that a complete read or a complete write has happened. Typically you can open something in the blocking version and then, using the ioctl interface with the right calls, turn it into non-blocking or asynchronous. So how does the processor talk to the device? If you can bear with me for a couple of moments, I want to make sure we get a little further. The processor, as we've been discussing, talks to memory over the processor-memory bus. Typically there are bus adapters, and an interrupt controller talking to the CPU, and off of those bus adapters we can hang our device controllers. A device controller is a piece of hardware, plugged into the bus, that manages some device — here I'm showing you a screen, for instance. Inside the controller is a set of control registers, which lets you say, for example, what the resolution is, or what features you're asking for, plus addressable memory — potentially you can read and write data directly on the device controller as well. So it contains a set of registers that can be read and written, and it may contain memory for requests, such as queues or bitmap images. Regardless of the complexity of the connections and buses — which is usually hidden from people by the hardware — there are two ways a typical processor can get access to this hardware. One is an I/O instruction. There aren't too many processors that directly have I/O instructions, but x86 is one of them; a good example is `out 0x21, al`, which sends the contents of a register to I/O port 0x21. The other is memory-mapped I/O. This is the case where load and store instructions actually go over the bus and directly access the hardware, just as if it were an ordinary memory read or write, except that it's talking to the device.
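To give a feel for what memory-mapped I/O looks like from the software side, here is a hedged C sketch. The register addresses, status bits, and command codes are all invented for illustration — a real driver would take them from the device's documentation, and user-space code would first need the region mapped into its address space — but the key point is that ordinary volatile loads and stores become device-register reads and writes.

```c
#include <stdint.h>

/* Hypothetical device register layout -- addresses and bits invented for
 * illustration. 'volatile' tells the compiler every access must really
 * go out on the bus rather than being cached or optimized away. */
#define DEVICE_BASE   ((uintptr_t)0x8000F000u)
#define REG_STATUS    (*(volatile uint32_t *)(DEVICE_BASE + 0x0))
#define REG_DATA      (*(volatile uint32_t *)(DEVICE_BASE + 0x4))
#define REG_COMMAND   (*(volatile uint32_t *)(DEVICE_BASE + 0x8))

#define STATUS_READY  0x1u   /* invented bit: "device can accept a command" */
#define CMD_DRAW      0x2u   /* invented command code */

void device_draw(uint32_t word) {
    /* Spin until the (hypothetical) device says it is ready. A real driver
     * might sleep or use interrupts instead of polling like this. */
    while ((REG_STATUS & STATUS_READY) == 0)
        ;

    REG_DATA = word;         /* a plain store that lands in a device register */
    REG_COMMAND = CMD_DRAW;  /* writing the command register kicks off the operation */
}
```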
So this is the last thing I want to show before we end today: here's an example of the physical address space. Normally we've talked about the physical address space as just the place where the DRAM is, but it's also where the controllers are. If you look here, in a memory-mapped case the hardware maps control registers and display memory into the physical address space, and then the processor just writes to display memory directly. In many cases these addresses are actually covered by the PTEs in the virtual address space. What's interesting about this is that just by having the processor write to this range of addresses — for instance from 0x8000F000 up to 0x8000FFFF — we write into the display memory, and in many cases those writes appear directly as dots on the screen. Or we might write to this command queue — for example, writing a graphics description, like a set of triangles to draw — and just by writing to those addresses we've set up a set of figures to draw; then maybe we write to a command register down here, simply by writing to that address, and as a result your image is suddenly drawn. So that's the idea of memory-mapped I/O. In many cases you can protect this with address translation, and for the regions you can protect with address translation, you can actually give the user direct access to do this memory-mapped I/O under circumstances where that's allowed. Okay, now before I end on this, are there any questions on the memory mapping? Okay, we'll pick up more discussion of direct memory access when we get back after the holiday, but I'm going to give you a conclusion. We've been talking about I/O device types here: many different speeds, different data access patterns — block devices, character devices, network devices — and different access timing — blocking, non-blocking, asynchronous. We've been talking about I/O controllers, which are hardware that controls the devices; the processor accesses them through I/O instructions or through loads and stores to physical memory. Next we're going to talk about notification mechanisms, so we're going to bring back interrupts — remember, the only interrupt we really talked about as interesting up till now was the timer interrupt — but now we're going to talk about how devices can use interrupts to notify the operating system that they need service. And we're going to bring our discussion back to device drivers and look at how the device driver provides a clean interface from the higher levels of the operating system through to the specifics of the devices. All righty. Thank you, everybody. I hope you have a safe and relaxing holiday. We will pick this up when we get back, and we will tell you the details of the midterm — but as I told you, pretty much everything up through the next lecture is going to be fair game. I'm going to try to get these lectures up through today posted with closed captioning; we already have closed captioning on lectures 13 and 14. I hope you have a great holiday, and talk to you later. Bye now.