All right, welcome back to operating systems. Yikes, not a lot of people here, so either the heat scared you off, or virtual memory did, or I did. Hopefully not me. I don't blame you, this room is terrible.

All right, page table implementation. This is historically either the hardest or the second hardest thing in the course, so we'll go over it again. After that: threads and synchronization, data races. It'll be lots of fun.

So remember, this is what our page tables look like on a normal system, your ARM processor, whatever. Our page size is 4 kilobytes, so we have 12 bits for the offset, and we have multi-level page tables. Each of our page tables fills up an entire page, and a page table entry is 8 bytes, therefore we can have 2^9 entries per page table. We have three levels of page tables because we support up to a 39-bit virtual address. The entry in L2 points us to L1, L1 points us to L0, and L0 is how we finally look up the address we're trying to translate. Any questions about this? Did that make sense last time? All right, good, we're like ten times better than last year, so that's great. Turns out changing the slides helps.

Just in case people are confused: alignment is a thing. Computers like it when things line up in multiples of two, and everything is a multiple of something that eventually lines up with index zero. So all of the pages will be aligned to this 4096-byte boundary, which means their starting addresses are multiples of that. That's all "aligned" really means: one page starts at 0, the next page starts at 4096, the page after that starts at 8192, and so on (I can't keep doing that very long in my head). It means all the lower bits of the starting address are zero. That's the first byte on the page, and then it runs to the last one.

Why do we do that? Because otherwise it would be a complete pain, and we couldn't take advantage of a whole page with the offsets just working out. For example, if a page could start at address 0x7C00, it would run for 4096 bytes and the last byte on that page would be at address 0x8BFF, which is kind of a pain. But because everything is aligned, the starting address is always a multiple of the page size, so we don't have to keep track of an offset in the page table entry or anything like that. All of our page tables fit exactly on a page, and every page starts at a multiple of 4096. If something is on physical page 7, it starts at 0x7000 and goes all the way to 0x7FFF.

So here's a question for you: if I just give you the address 0xEC, is that 8-byte aligned? No. You can spot this pretty easily in hex: if it were 8-byte aligned, the last hex digit would have to be a multiple of 8, so it would end in 0 or 8. It ends in C, which is not a multiple of 8, so it is not 8-byte aligned. That's the quick way to check.
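If you'd rather check alignment in code than by eyeballing hex digits, the usual trick is a bitmask: for a power-of-two N, an address is N-byte aligned exactly when its low bits are all zero. A minimal sketch (the helper name is just mine):

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* n must be a power of two; the address is n-byte aligned
 * exactly when its low log2(n) bits are all zero. */
static bool is_aligned(uintptr_t addr, uintptr_t n) {
    return (addr & (n - 1)) == 0;
}

int main(void) {
    printf("%d\n", is_aligned(0xEC, 8));      /* 0: 0xEC ends in C, not 0 or 8 */
    printf("%d\n", is_aligned(0xEC, 4));      /* 1: C is a multiple of 4 */
    printf("%d\n", is_aligned(0x7000, 4096)); /* 1: page-aligned, low 12 bits are zero */
    return 0;
}
```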
Similarly, if it were 4-byte aligned, the last hex digit should be 0, 4, 8, or C (C is 12). So alignment is essentially just a way of saying an address is an exact multiple of that number. Computers want things to start at zero so they can easily compute where things are. You might see that your stack has to be 16-byte aligned on some architectures, which means your stack pointer has to end in 0 in hex. It's just a rule: things start at multiples of something, and it makes your life a lot easier.

And yes, 0xEC is a multiple of 4, so it would also be 2-byte aligned, and 1-byte aligned, although 1-byte alignment isn't really a thing. Alignment is just multiples. If you get an address that ends in a 3 or something, well, 3 isn't a power of two, and computers only like multiples of powers of two, so spoiler alert: your computer is probably going to be way slower on that access, if it can even handle it without crashing. That's just a thing that comes up with computers.

We did this before: we simulated the MMU to translate some addresses. Does anyone want to go back to that, or should we play around with it later? All right, let's play around with it later and get through what we need to get through. Just remember that each process has its own unique page tables, its own unique L2 page table, because those are the things that implement virtual memory. Every process gets its own page tables.

So here's a fun question: how many page tables do we need? Let's assume our process uses 512 pages. There's a best case and a worst case here. Anyone want to guess the minimum number of page tables we'd need? Just one? Well, we have a three-level page table here, so we need at least three, unless we recursively point one at itself, which I guess would be weird and not quite work.

How many pages are in a page table? Careful: a page table doesn't actually contain the pages, it just points to pages. There are 9 index bits, so each page table has 2^9 = 512 entries. Even to translate a single address we need one L2, one L1, and one L0 table, so even for a single address we need three. So if my program uses 512 pages, how many do I need at minimum? Three. It could be the situation where I have one L2 page table, it points to a single L1 page table, which points to a single L0 page table, and that L0 page table has all 512 entries pointing to the different pages the program uses.
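To make the split concrete, here's a rough sketch of how a 39-bit virtual address like ours gets carved into the three 9-bit indices and the 12-bit offset; this is the arithmetic the MMU does on every walk (the variable names are just mine, not from any particular kernel):

```c
#include <stdint.h>
#include <stdio.h>

#define OFFSET_BITS 12
#define INDEX_BITS  9
#define INDEX_MASK  ((1u << INDEX_BITS) - 1)   /* 0x1FF: 512 entries per table */

int main(void) {
    uint64_t va = 0x12345678;   /* some virtual address to decode */

    uint64_t offset = va & ((1u << OFFSET_BITS) - 1);
    uint64_t vpn0 = (va >> OFFSET_BITS) & INDEX_MASK;                    /* index into L0 */
    uint64_t vpn1 = (va >> (OFFSET_BITS + INDEX_BITS)) & INDEX_MASK;     /* index into L1 */
    uint64_t vpn2 = (va >> (OFFSET_BITS + 2 * INDEX_BITS)) & INDEX_MASK; /* index into L2 */

    printf("L2 index %llu, L1 index %llu, L0 index %llu, offset 0x%llx\n",
           (unsigned long long)vpn2, (unsigned long long)vpn1,
           (unsigned long long)vpn0, (unsigned long long)offset);
    return 0;
}
```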
Yeah, so the reason there are 2^9 entries, remember, is that we're making a page table fit exactly on a page. The number of entries we can fit in a page table depends on our page size and how big a page table entry is. Another reason we do this is that a page table entry doesn't have to store an offset or anything weird like that, because everything is aligned. If the entry in L2 points to physical page 8, that page contains the entire L1 page table and spans that whole page. We don't have to keep an offset; we know where it starts and where it ends because everything is aligned. You could come up with a system where that's not the case, but it would be real ugly. It's just nice if page tables live exactly on pages. Everything's on a page.

Question about the SATP register: yes, SATP is a register on your CPU that points to the L2 page table. It's something your process isn't allowed to change; it's controlled by the kernel. And no, it's not a different register per process, it's the same register, just the value in it changes. When your process gets context switched in, or set up for the first time, the kernel has to create the L2 page table (and all the other page tables) and set that register to point to the L2 page table, and whenever it context switches processes in and out it changes that register so it points to the right address space. Cool.

Okay, what about the worst case? We said three page tables are sufficient in the best case, where all the mappings fit under the same L0 page table. What's the worst case if a program uses 512 pages? Three times 512? Not quite, because you can only ever have one L2 page table. In the worst case, where we got super unlucky or someone hates us, our L2 page table is completely full: all 512 entries pointing to different, unique L1 page tables. So how many L1 page tables would I have? 512. Each of those L1 page tables only has to translate one address, so each one just points to a single L0 page table, and each L0 page table has a single entry pointing to the actual physical page I care about. So how many L0 page tables do I have? 512 again. In total that's 1025: 512 L1s plus 512 L0s plus the one L2 page table.

So best case I only use three page tables, worst case I can use 1025, which is pretty bad. But you might have noticed how this works out: if all your addresses are contiguous, all in a line, that corresponds to the L0 index, the lowest part of the index. So if memory is contiguous, all the page table entries are also in a line, and they'll pretty much all land on the same L0 page table until you cross a 512-entry boundary and have to grab a new one. Ideally they're always beside each other. So that's another reason why computers like contiguous memory.
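If you want to sanity-check the 3-versus-1025 numbers, or try other page counts, here's a quick back-of-the-envelope sketch of that counting, assuming the three-level, 512-entries-per-table layout from the lecture:

```c
#include <stdio.h>

/* Best case: the pages are contiguous, so they share L0/L1 tables.
 * Worst case: every page lands under a different L1 and L0 table,
 * capped by how many tables one level can point to. */
int main(void) {
    int entries = 512;
    int pages = 512;   /* how many pages the process uses */

    int best_l0 = (pages + entries - 1) / entries;   /* ceiling division */
    int best_l1 = (best_l0 + entries - 1) / entries;
    int best = 1 + best_l1 + best_l0;                /* one L2 + L1s + L0s */

    int worst_l0 = pages < entries * entries ? pages : entries * entries;
    int worst_l1 = pages < entries ? pages : entries;
    int worst = 1 + worst_l1 + worst_l0;

    printf("best %d, worst %d\n", best, worst);      /* best 3, worst 1025 */
    return 0;
}
```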
So physically the machine likes things beside each other, but being close together is also an advantage for the page tables, which is cool. They could have flipped it around so the L0 index came first, then L1, then L2; there's no reason they couldn't, it would just be a really dumb decision, because most of the time your addresses are contiguous. All right, any questions about that? Sounds good.

Other types of questions you might get: hey, given this system, how many levels of page tables do I need? If you get something like that, you'll be given the size of the virtual address, how big a page is, and the page table entry size. You might instead be given the number of page table entries that fit on a page and have to figure out the page table entry size, which would be a bit weird, but you might get something like that. So: you get a description of a system, and for multi-level page tables your goal is always to fit each page table within a single page. Find the number of entries that fit in a page, take log base 2 of that (or just figure out which power of two it is), and that's the number of bits you need as an index into a single page table. Then, to calculate the number of levels of page tables you need, it's the number of virtual bits in your address, minus the offset bits (we don't have to translate those), divided by the number of bits for a single index, and you take the ceiling of that, i.e. round up if it's not a whole number.

So what do we have in this system? Funnily enough, this is the system that was used back when we had 32-bit CPUs. I guess that's before you guys were born... actually, I guess before I was born too. No, that's not true. Wow, that's sad. Okay. So we should be able to figure this out: a 32-bit system, the same 4096-byte page size, and a page table entry size of only 4 bytes. Why is it 4 bytes? Well, on a 32-bit system I can only address up to 2^32 bytes, so I don't need as much space in my page table entry.

Given this system, how many bits do I need for an offset? The offset corresponds to where we are in a page, and my page size is our good old trusty 4096, which is 2 to the power of 12, so 12 bits for the offset. Okay, knowing that, how many page table entries can I fit on a single page? 1024. Or, in powers of two: 2^12 divided by 2^2 is 2^(12 − 2), which is 2^10, or 1024 if you prefer. That's a lot of twos; computers like twos. If you have a fear of twos you should probably drop out of computer engineering.

So that's how many entries we can fit in a page table. How many bits do we need for an index into a page table? 10, because we have to be able to pick which of the 1024 entries in the page table we want to select.
So we have a 10-bit index for L1, L2, whatever levels we need. Our formula was: the number of levels equals the ceiling of (virtual bits minus offset bits) divided by index bits. For this system, how big is my virtual address? 32 bits. How many offset bits? 12. How many index bits? 10. So it's the ceiling of 20 divided by 10, which is just 2 because it works out to a nice whole number.

If for some reason this system said, hey, I want a 33-bit virtual address, the only thing that changes in our calculation is that one number: it becomes 21 divided by 10, which is 2.1, and the ceiling of 2.1 is 3, so we'd need three levels. Yeah, so for this question, this was a real system back in the day: 32-bit virtual addresses, and your physical addresses were also 32 bits. That's what they had back then. All right, sorry about that, had to change the microphone. So there's the answer for that.
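Here's that same calculation as a tiny sketch you can plug other systems into; the 32-bit example, the 33-bit variant, and our 39-bit RISC-V setup all fall out of it (the function names are mine):

```c
#include <stdio.h>

/* Integer log2 for powers of two (page sizes and entry counts always are). */
static int log2_int(int x) {
    int bits = 0;
    while (x > 1) { x >>= 1; bits++; }
    return bits;
}

/* levels = ceil((virtual_bits - offset_bits) / index_bits), where
 * offset_bits = log2(page size) and index_bits = log2(entries per table). */
static int levels(int virtual_bits, int page_size, int pte_size) {
    int offset_bits = log2_int(page_size);
    int index_bits = log2_int(page_size / pte_size);
    return (virtual_bits - offset_bits + index_bits - 1) / index_bits;
}

int main(void) {
    printf("%d\n", levels(32, 4096, 4)); /* 2: the old 32-bit system */
    printf("%d\n", levels(33, 4096, 4)); /* 3: ceiling of 2.1 */
    printf("%d\n", levels(39, 4096, 8)); /* 3: our RISC-V setup */
    return 0;
}
```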
Now, you might have imagined, and we kind of said this at the beginning of last lecture, that using multi-level page tables is really slow: we have to follow pointers for every translation. But we're likely going to access the same translations every time, or at least close to it; sometimes your process doesn't use that many pages at a given time anyway. So what's the solution? Throw a cache at it. That's the favourite computer science solution, and it's the solution here. The cache for your MMU is called the TLB, which stands for translation lookaside buffer. Why they named it that, I have no idea, but that's what they named it.

What the TLB does is keep track of the virtual page number and whatever the final page table entry was, so it doesn't have to go through all the levels of the page table as long as there's a hit. On a lookup it takes all the bits of the virtual page number (27 on our system) and checks its cache. If there's a hit, it follows the green line: it just hands back the translation, no walking the tables, nothing, because it cached the final page table entry. If there's a miss, it's the red line: we do the translation we did before, starting at the L2 page table, going to L1, then L0, and then the TLB gets populated with the result so it doesn't waste its time doing that again, and the translation completes. Make sense? The TLB is a cache for final page table entries. And yes, it could be fully associative or whatever, that's up to the architecture people; they're smart, they know how to do caches, so I'll leave that to your architecture course.

Other things you might be asked: because there is a TLB, and because we want performance with virtual memory to be as close to physical memory as possible, we can compute an effective access time, which is the access time you actually observe. Let's assume there's only a single-level page table, so a miss only costs one additional lookup. If there's a TLB hit, the time is the time it takes to search the cache plus the original physical memory access, which we can't get rid of anyway. If there's a miss, we still searched the cache and failed, then we have to walk the page table (one access to the L0 table in this single-level case), and then do the original memory access. If we had three levels of page tables, a miss would cost four memory accesses instead of two.

So the effective access time is alpha times the hit time plus (1 minus alpha) times the miss time, where alpha is just the proportion of cache hits. For example, with an 80% hit ratio, 10 nanoseconds to search the TLB, and 100 nanoseconds for a normal memory access: 80% of the time it takes 110 nanoseconds (the original memory access plus the 10 to search the TLB), and 20% of the time, when we have to walk the page table, it's 10 nanoseconds for the search plus two memory accesses, so 210 nanoseconds. Throw those numbers together and the effective access time is 130 nanoseconds. Normally, without virtual memory, it would always be 100; here it's 130, so it's 30% slower, which isn't that bad given all the benefits we get from virtual memory. Any questions about that? Simple calculation, not too bad.
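And the effective access time example as a sketch you can play with; alpha is the hit ratio, and on a miss you pay one extra memory access per page table level (the numbers are the ones from the example above):

```c
#include <stdio.h>

int main(void) {
    double alpha = 0.80;      /* TLB hit ratio */
    double tlb_ns = 10.0;     /* time to search the TLB */
    double mem_ns = 100.0;    /* one physical memory access */
    int levels = 1;           /* page table levels walked on a miss */

    double hit_time = tlb_ns + mem_ns;                  /* 110 ns */
    double miss_time = tlb_ns + (levels + 1) * mem_ns;  /* 210 ns with one level */
    double eat = alpha * hit_time + (1.0 - alpha) * miss_time;

    printf("effective access time: %.0f ns\n", eat);    /* 130 ns */
    return 0;
}
```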
Now for some fun consequences, and then we can see it in action. Think about context switches: switching processes switches that root page table register, and the new process's page tables have different mappings. Depending on how your CPU is implemented, the TLB might be really dumb: its entries are only valid for the page tables you're currently using. So you'd have to flush the TLB if you context switch to another process that has a different address space and a different L2 page table. Either you flush the cache, or the cache has to be aware of processes: some architectures tag each TLB entry with a process (address space) ID, so entries are only valid for a certain process. Architectures without that feature have to flush the TLB on a context switch, and most implementations just do that. On RISC-V you execute the sfence.vma instruction (horrible name), and essentially all it does is flush the TLB. On x86, Intel or AMD chips, changing the page table base register (their version of SATP) automatically flushes the cache for you. And yes, "flush" just means clear: get rid of all the entries, because the new process might use the same virtual addresses mapped to different physical locations, so the old entries aren't valid anymore.

All right, TLB testing: you can actually see the effects of this. There's this little program, test_tlb, written by good old Linus Torvalds, who wrote, or at least started, the Linux kernel. What this program does is let you see the effect of your TLB. It takes two arguments: the first is how much memory to allocate, and the second is the distance in bytes between every access. So for one page with a stride of 4, it accesses byte 0, byte 4, byte 8, byte 12, byte 16, and so on, and records how long those accesses take on average.

If I run that, I get an average of about 1.6 nanoseconds per memory access. Why is it so fast? Everything is on the same page. For this process to access all of that memory, it only has to translate once: it walks the page tables for the very first access to that page, and after that the entry is in the TLB, so it can just use the TLB over and over again. Essentially the first access is a miss and everything after that is a hit, so we get really fast performance.

The other end of the spectrum is something like this: allocate a large amount of memory, 2^29 bytes, which is 512 megabytes, and access every 4096 bytes, so one access per page. How many TLB misses do I get? I'd expect a miss every single time, because I'm going from page to page to page. How many TLB hits? Zero. So how slow is it? Yikes: every memory access in this case is about 43 nanoseconds. Compared to 1.6, that's about 27 times slower, just from the TLB. Very cool, huh?

Middle of the road: if you access 16 megabytes with a stride of 128 bytes, you get a mix, and you could work out the hit and miss ratio. It's still pretty bad, about 12 nanoseconds per access, not as bad as missing every single time, but still much slower than when almost everything hits.

So it turns out that things being contiguous mostly matters to the virtual memory system: it really likes accesses that land on the same page. And your kernel allocates memory to you in pages; pages are all it cares about.
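If you want to poke at this yourself without hunting down Linus's program, a rough equivalent is easy to write: allocate a buffer, touch it at some stride, and time it. This is just a sketch in the same spirit, not the actual test_tlb source, and the numbers you get will depend on your machine:

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(int argc, char *argv[]) {
    size_t size = argc > 1 ? strtoul(argv[1], NULL, 0) : (1ul << 29); /* bytes to allocate */
    size_t stride = argc > 2 ? strtoul(argv[2], NULL, 0) : 4096;      /* bytes between accesses */

    volatile char *buf = malloc(size);
    if (!buf) { perror("malloc"); return 1; }

    /* Touch every page once first, so we time TLB misses rather than page faults. */
    for (size_t i = 0; i < size; i += 4096) buf[i] = 1;

    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    size_t touches = 0;
    for (size_t i = 0; i < size; i += stride, touches++) buf[i];  /* volatile read */
    clock_gettime(CLOCK_MONOTONIC, &end);

    double ns = (end.tv_sec - start.tv_sec) * 1e9 + (end.tv_nsec - start.tv_nsec);
    printf("%.2f ns per access\n", ns / touches);
    free((void *)buf);
    return 0;
}
```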
All right, any questions about that? Yes, you have to flush when you change between processes. You might think the old entries just get replaced over time, but when you context switch in a new process, it might use the same virtual addresses, which point to different physical memory, so the entries can't be reused. That's why we flush.

Okay, so we can go back. The other thing we didn't explain: when we straced libc, we saw this sbrk call. That's a system call that grows and shrinks your heap. Your stack, by the way, just has a set limit; the default stack size limit here is two megabytes, so whenever the kernel creates your process it maps two megabytes of virtual memory for the stack, however many pages that is. The heap is just one big contiguous block of memory, and whenever you do the sbrk system call to grow it, all the kernel does is grab pages from the free list, fulfill the request, and make the new mappings in the page table point to valid memory. Now the process can use those addresses without getting a page fault or a segfault or anything like that. It just kind of kicks the can down the road.

That giant block of memory is managed by malloc, and malloc should be the only thing that manages it. If you continue on with this course and do systems programming (454, I think), you write malloc, and writing a malloc that deals with arbitrary allocation sizes is a lot harder than just dealing with pages. We'll see how to do some memory allocation, but it's not really a focus at this point; the point is the kernel doesn't do it, malloc does. Real memory allocators also sometimes use this mmap system call that we'll use later. It's short for memory map, and it lets processes play with their own virtual memory. We'll see some fun things we can do with that; it has some very nice benefits.
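You normally never call sbrk yourself, malloc does it for you, but here's a tiny sketch of what "grow the heap" looks like at the system call level (ordinary Linux/POSIX user space, nothing to do with the toy kernel):

```c
#define _DEFAULT_SOURCE
#include <stdio.h>
#include <unistd.h>

int main(void) {
    void *old_break = sbrk(0);        /* current end of the heap */
    printf("heap ends at %p\n", old_break);

    if (sbrk(4096) == (void *)-1) {   /* ask the kernel for one more page */
        perror("sbrk");
        return 1;
    }
    printf("heap now ends at %p\n", sbrk(0));

    /* The page in between is now valid memory for this process. */
    char *p = old_break;
    p[0] = 'x';
    return 0;
}
```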
This is the address space of a process in that little toy kernel that you can explore if you really want. When it initializes a program's address space it sets up the page tables. It makes page zero invalid, so the null pointer points at address zero and any attempt to read or write through it page faults; it deliberately creates an invalid mapping for address zero so you get a nice segfault. Then it's responsible for mapping the executable: the instructions go to some pages, the data goes to some pages, and then it creates the stack, however many pages the stack is.

One thing it does to detect stack overflow: your stack pointer starts at the top of the stack and grows downwards, and since the stack is a set size, the kernel knows the address just past the end of it. It creates something called a guard page there: an invalid mapping at that known address. If the kernel sees a page fault generated by that guard page, it pretty much means you just ran off the end of your stack, and that's how your operating system detects a stack overflow. You can get around that warning if you jump right over the guard page: overflow your stack so badly that you skip past it, in which case you just get a normal segfault instead of the nice stack overflow message, because you missed it. But the kernel puts the guard page there just to give you some notification. The heap, in this layout, could grow downward.

If you read the source code of that kernel, it has to do some tricks. There's this thing called a trampoline, which sounds like a weird term. All it means is: when you make a system call, the kernel has to have a handler for it, but when the interrupt comes in, the CPU does not change the base page table register, so you're still using the process's virtual memory. So the kernel maps a fixed page, which has the interrupt handler entry code on it, into every process's address space, with protections so the process can't change it. That's where the code actually lives for the kernel's interrupt handler: execution goes to the trampoline, and that's the code that saves all your registers and starts initiating the switch into kernel mode; the kernel itself might use virtual memory and then has to play with those page tables. That's stuff we won't really get into in this course, but you can imagine how it works. The trapframe is where the registers get saved; same idea, it has to be mapped in the user's address space because the hardware won't switch page tables automatically on the interrupt. They implement it this way just because it's a lot easier.

Question: if the heap grows downwards, can it also overflow into the stack? The answer is yes, in principle. You could put a guard page in between, but usually the heap is near the top of the virtual address range, so if it grows down, by the time it gets anywhere near the stack your machine is going to be out of memory anyway. You'd need to allocate something like 512 gigabytes to crash into the stack, and your computer will run out of memory well before then. But if you wanted to, you could put a guard page between the heap and the stack.
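You can see the guard-page idea from user space too: map a region, then mark one page of it PROT_NONE, and any access to that page turns into a fault. This is roughly the same trick the kernel plays below your stack; it's just a sketch, not the kernel's actual code, and yes, the last line is supposed to crash:

```c
#define _DEFAULT_SOURCE
#include <stdio.h>
#include <sys/mman.h>

int main(void) {
    size_t page = 4096;

    /* Reserve three pages of anonymous memory. */
    char *region = mmap(NULL, 3 * page, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED) { perror("mmap"); return 1; }

    /* Make the middle page a guard page: no read, no write. */
    if (mprotect(region + page, page, PROT_NONE) != 0) { perror("mprotect"); return 1; }

    region[0] = 'a';            /* fine */
    region[2 * page] = 'b';     /* fine */
    region[page] = 'c';         /* page fault -> SIGSEGV: we hit the guard page */
    return 0;
}
```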
All right, other fun things the kernel can do. System calls are slow, so the kernel can provide some fixed virtual addresses where you can read kernel data without making a system call. For example, some calls need to be really fast, like getting the system time, and all you're doing is reading information. So to make it fast, and this is about the only exception where something that looks like a system call doesn't actually make one, the kernel maps a page into your address space that aliases the kernel's page holding the time information. That way you don't have to do a system call to get the current time: you just read a known page that's mapped for you. The libc developers know where that page is, so you call the libc function, and under the hood it just reads a known location. The kernel maps it into every process. That's a fun thing you can do.

Another fun thing: page faults. The operating system handles page faults; they're just a type of exception saying the mapping didn't work out. One is generated if the valid bit is zero, if the translation can't be found, if a permission check fails, something like that. And this lets the operating system be lazy: it can set up the page tables and know that an address range should be valid, but not actually allocate a page behind it until your process actually uses that memory. Only then, when the page fault comes in, does the kernel allocate a page, make the mapping point to valid memory, and redo the translation. Page faults can also be used to implement something like copy-on-write, which, guess what, you do for lab 3. You'll have to think a little bit about it; the code for lab 3 is like 30 lines if you write it nicely. For lab 3, start with the default copy, then do copy-on-write. This mechanism also lets the operating system map memory to disk and do a whole bunch of other fun stuff we'll see a bit of next lecture.

So here's the summary. Page tables translate virtual to physical addresses, and the MMU is the hardware that does it. It could be a single large page table, but that's wasteful and takes up a lot of space. The kernel allocates memory in pages, which is much simpler; it just uses a free list. We use multi-level page tables to save space, because most processes don't use a lot of memory, and we use the TLB, a cache, to speed up the multi-level lookup.

And yes, lab 3: no, the fact that it's about 30 lines of code does not mean it's too early to start; it means you should be thinking about it right now. You need to do copy-on-write, and this is the last lecture that will really help you with it, so start thinking. Start by doing the copying: you're essentially mimicking what happens when you fork a process, so you have to create new page tables, copy all the memory the parent uses, and make sure all the page tables line up and the two processes are completely separate. Just remember, I'm pulling for you. We're all in this together.
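For lab 3, the behaviour you're preserving, whether you copy eagerly or do copy-on-write, is just normal fork semantics: after the fork, a write in one process doesn't show up in the other. A little user-space check of that (plain POSIX, nothing lab-specific):

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    int *value = malloc(sizeof(int));
    if (!value) { perror("malloc"); return 1; }
    *value = 42;

    pid_t pid = fork();
    if (pid == 0) {
        *value = 99;                       /* the child writes to its own copy */
        printf("child sees %d\n", *value); /* 99 */
        exit(0);
    }

    waitpid(pid, NULL, 0);
    printf("parent still sees %d\n", *value); /* 42: the parent's memory is untouched */
    free(value);
    return 0;
}
```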