All right, welcome back to Operating Systems. Thanks for coming and keeping me company, otherwise I would be alone for Thanksgiving. I guess a lot of people started Thanksgiving early, so that's good. So, where we left off, which is good because it's relevant to Lab 3: our multi-level page tables. This is what we had before: three levels of page tables, and this is what our 39-bit virtual address looked like. We separated it out so that each page table fit exactly on a page, which was 4,096 bytes, and our page table entry was eight bytes, so that's why we had nine bits per index, because that's how many entries fit in a single page table. To translate an address, we have a register pointing at the root page table, the L2 one, and then given a virtual address we take the index bits to know where to go in the L2 page table, to get to the L1 page table, to get to the L0 page table, and then finally do our translation. Any questions about this multi-level page table thing? All right, we get to talk about it the rest of the day then.

First thing we need to talk about is alignment, because some of you have never heard this term before. Alignment means that addresses are multiples of some power of two, so everything lines up starting from byte zero. For example, if I say pages are 4,096-byte aligned, it means they always start at an address where the lower 12 bits are all zero. So all your pages will start at address zero, 4,096, 8,192, and so on; they all start on a multiple of 4,096. And why is this? Because it makes your life a lot easier. Computers really like alignment.
If we didn't have alignment, we could have a weird situation where a page started at an address like 0x7C00, in which case if we added the size of a page to figure out the address of the last byte, the last byte would be at address 0x8BFF, and that is a pain. If everything is aligned, we don't have to store exactly where a page starts, because we know every page starts where the lower 12 bits are all zero. In that case I just tell you what page it is: if I tell you it's page seven, you implicitly already know, because everything is aligned, that it starts at 0x7000, and it ends where those 12 bits are all ones, at 0x7FFF. Computers really like things to be aligned because it saves them a lot of trouble. You don't have to mess with doing hard addition; I don't like doing addition, no one likes doing addition, addition's hard. You just take the bits and keep them as they are.

So here's the question: is this address 8-byte aligned? 8-byte aligned means the address is a multiple of eight: 0, 8, 16, 24, 32, 40, whatever. Would this address be 8-byte aligned? No. Why? Because it ends in a C. C is 12, so this is not 8-byte aligned. Since hex is in powers of two, you can easily check whether something is aligned by looking at the last hex character: if an address is 8-byte aligned, the last hex character will be either 0 or 8. And no other digit matters, because every higher hex digit position is already a multiple of eight, so we don't have to keep track of it; it's really easy to check if something is aligned by just looking at the lower bits. Would this address be 4-byte aligned? Yeah, same idea, and it's just as easy to check.
Its lowest hex character would be 0, 4, 8, or C. If it's one of those four, it's 4-byte aligned. Computers really like alignment. In fact, if you look at the documentation for your stack and stuff like that, your stack pointer sometimes has to be 16-byte aligned, all that fun stuff, which just means the lowest hex digit has to be zero. So that is why everything works, and why we don't have to store the alignment as part of the page table entry or anything like that. Everything's aligned, don't have to worry about it.

So we can simulate the MMU. We did that last lecture; we can do it again later if you want, but first let's get through the rest. Just to make sure we understand things, let's assume our program uses 512 pages, meaning it has touched 512 different pages and the page tables need to accommodate that. So: what is the minimum number of page tables we need if our program uses 512 pages, which means we have to be able to translate 512 different virtual page numbers? And the other question is, what is the maximum number of page tables? Any idea for the minimum? Two? Just one? We're assuming three levels of page tables, so this scheme. To translate even a single address, we need at least three page tables: one L2, one L1, and one L0. So is that all I need to be able to translate up to 512 different virtual page numbers? If I tried it with one table, it would mean I only have one of these tables and it would need to point to itself, which would be really weird. And yes, we're assuming this is the real scheme your hardware actually uses, so unless otherwise described, assume this; it's also what you'll be dealing with in Lab 3.
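The alignment checks described above all come down to one AND with a bitmask: the low bits must be zero. Here's a tiny Python sketch of that check; the addresses are the ones from the examples, and the function name is made up for illustration:

```python
def is_aligned(addr: int, alignment: int) -> bool:
    # Only works for power-of-two alignments, which is the case computers like:
    # then "multiple of alignment" is the same as "low bits are all zero".
    assert alignment & (alignment - 1) == 0, "alignment must be a power of two"
    return addr & (alignment - 1) == 0

print(is_aligned(0x7000, 4096))  # True:  page 7 starts at 0x7000
print(is_aligned(0x7C00, 4096))  # False: lower 12 bits are not all zero
print(is_aligned(0x123C, 8))     # False: last hex digit C is not 0 or 8
print(is_aligned(0x123C, 4))     # True:  C is a multiple of 4
```

No addition needed anywhere; that's exactly why hardware likes alignment.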
But yeah, this is how virtual memory works on your phone, on your desktop, on your watch, on your whatever; they all use this to do translation. So if our program's using 512 pages and we got lucky, we would have our one L2 page table with an entry pointing to a single L1 page table, which has one of its entries pointing to an L0 page table. Each page table has 512 entries, so that L0 could just be full, with all 512 entries pointing to the different pages: this entry points to page one, that one points to page 512. So at minimum, if our program uses 512 pages, we need three page tables, because all the mappings fit. If it used 513, that L0 page table is full, so I'd need at least one more L0 page table.

All right, so if three is our minimum, what's my maximum number of page tables if things go very poorly? 514? Why 514? I could have one L2, one L1, and then 512 different L0 page tables where each one only has one entry: this one maps page one, dot dot dot, that one maps page 512. So 512 L0 page tables plus the L2 and L1 gives 514. Is that the worst case? Note you only ever have one L2 page table, because that's your root: there's only one register to pick it, so you can only have one. But I can make it even worse: fill the L2 page table with entries pointing to 512 different L1 page tables, and have each L1 page table point to a single L0 page table with one entry. Then I need 512 L1s and 512 L0s, so my max is 1 + 512 + 512 = 1025. That's the worst case. You can also see that, given how we picked our indexes, I could have picked them in any arbitrary positions, because this is just implemented in hardware.
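The best-case and worst-case counting above can be written as a couple of arithmetic helpers. This assumes the three-level, 512-entries-per-table scheme from the lecture; the function names are mine, for illustration only:

```python
import math

ENTRIES = 512  # entries per table: 4096-byte page / 8-byte PTE

def min_page_tables(pages):
    # Best case (contiguous pages): L0 tables fill completely, then the
    # L1 tables that point at them, plus the single root (L2) table.
    l0 = math.ceil(pages / ENTRIES)
    l1 = math.ceil(l0 / ENTRIES)
    return 1 + l1 + l0

def max_page_tables(pages):
    # Worst case (scattered pages): every page sits under its own L1/L0
    # pair, until a level runs out of distinct entries.
    l1 = min(pages, ENTRIES)            # the one root only has 512 slots
    l0 = min(pages, ENTRIES * ENTRIES)  # each page gets a private L0 if it can
    return 1 + l1 + l0

print(min_page_tables(512))  # 3
print(max_page_tables(512))  # 1025
print(min_page_tables(513))  # 4 -- one more page forces another L0
```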
You guys know how to do that; you could switch the index bits around if you really wanted to. But this is why computers like contiguous memory: if everything is contiguous, all the L0 entries are going to be beside each other, which corresponds to our minimum case. If all of our L0 entries are beside each other, we need fewer page tables, which is great. That is actually one of the primary reasons why, in first year, they told you computers like things contiguous. And we'll do a little demo later to really show you why.

All right, any questions about that? Other types of questions we can have: I could describe a system, on something like a midterm or a final, cough cough, hack hack, with a certain virtual address size, a certain page size, and a certain page table entry size, and you get to figure out one of the variables. In this case, this describes a 32-bit system, which I was informed last lecture you guys were born after that was a thing. Even when you were born there were 64-bit systems, which doesn't make me feel old at all. But back in ye olden days, this was how computers worked. Your CPU could only support up to four gigabytes of memory, and no one had four gigabytes of memory, so it was fine. Everyone had a 32-bit virtual address and a 32-bit physical address. The page size was unchanged back in the day, still 4,096 bytes, but the page table entry size was only four bytes. Why is that? Because the physical address was only 32 bits as well, so the entry didn't need to be as big and could save some space. So in that case, we should be able to figure out how many levels of page tables we need: a 32-bit virtual address, a 4,096-byte page size, and a four-byte PTE.
Well, first we need to figure out how many entries we can fit on a page, because that's always the scheme with multi-level page tables: we make each page table fit exactly on a page. In this case, our page size is 2^12, and four in powers of two is 2^2. If I divide those out, I hopefully can do math where I get 2^10. That's the number of page table entries per page. So for each level of page tables, how many bits do I need to select which entry I want? If I have 2^10 entries in my page table, how many index bits do I need? 10; not a trick question. So I need 10 index bits, which you might realize also makes sense: before, when I had the same page size but an eight-byte page table entry, the entry was twice as big, so I could fit half as many, 512 instead of 1024. With the smaller entry I can fit more.

To figure out the number of levels I need, the handy-dandy formula, which isn't terribly surprising, is the ceiling of the number of virtual address bits minus the offset bits, divided by the number of index bits. The offset bits aren't part of the translation for multi-level page tables; I never translate offset bits, so they're excluded. And we take the ceiling because we have to round up. So in this case, how many virtual bits do I have? 32. How many offset bits? Still 12, that's where I am in a page. So it's 32 minus 12, divided by our number of index bits, which we figured out was 10, and we take the ceiling. Luckily, this is a nice number.
32 minus 12 is 20, divided by 10 is 2, and we don't really have to take the ceiling because the ceiling of two is just two. So we need two levels of page tables, which makes sense if we write out what our L1 and L0 would be: L1 would be 10 bits, L0 would be 10 bits, and our offset would be 12 bits, and those sum to 32. And no, the 12-bit offset has to do with the page size; it's where you are within a page. Log base two of the page size is how many bits you need for the offset, so the offset is always tied to the page size.

Also, if I did something silly, like saying there are now 33 bits in my virtual address, guess what? It makes things really ugly, because I have to take the ceiling of 2.1, which is three. If I add one more bit, I need a whole other level of page tables, and that new top-level page table, if I only have 33 bits, will contain just two entries and be a complete waste. That's why virtual address sizes come in increments that fill whole levels: if you have to add another level anyway, you may as well fill it up. That is why, in our case, we have a 39-bit virtual address, and if you look at the spec for your hardware, it can actually support a larger virtual address range by adding another level on. If this is 39 bits and I have nine index bits, the next size up is 48 bits, and so on from there. And essentially every time I add a level, translation gets slower, because to translate anything I have to go through another level, which is another memory access, which is slow.

All right, so that was fun. Like I said: slow. We have to follow pointers across multiple levels of page tables.
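The handy-dandy formula can be checked with a few lines of Python. This is just the arithmetic from the lecture with the page-size-to-offset-bits and PTE-size-to-index-bits steps spelled out; the function name is mine, not from any real API:

```python
import math

def page_table_levels(virtual_bits, page_size, pte_size):
    offset_bits = int(math.log2(page_size))  # where you are within a page
    entries = page_size // pte_size          # PTEs that fit on one page
    index_bits = int(math.log2(entries))     # bits to pick one entry
    return math.ceil((virtual_bits - offset_bits) / index_bits)

print(page_table_levels(32, 4096, 4))  # 2: the old 32-bit scheme
print(page_table_levels(39, 4096, 8))  # 3: the scheme from lecture
print(page_table_levels(33, 4096, 4))  # 3: one extra bit, a whole new level
print(page_table_levels(48, 4096, 8))  # 4: the next size up
```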
Most of the time you access information on the same page multiple times. In your program, if you have a bunch of local variables, they probably all fit within a page; if you're going to use one, you're going to use the others. They're different virtual addresses, but the translation at the heart of it is the same: one virtual page to one physical page. Your process probably only uses a few pages at a time, so you only need a few virtual-page-to-physical-page mappings at any given moment. In the literature this is sometimes called your working set: how many pages you're actually using at a given time.

So if we want to make this faster, we do what we always do in computer science: add a cache. Add a cache, boom, go faster, can't explain that. The cache for your page tables, for the MMU, is called a TLB, a translation lookaside buffer. It caches page table entries so you don't have to walk the page tables every time you access a virtual address, because that would be slow. It takes your virtual address, takes the bits that represent the virtual page number, and looks it up in the TLB. If it's there, the TLB holds the cached page table entry, which has the physical page number; in that case the hardware doesn't have to touch main memory for translation at all, it can just translate and then do the memory access you wanted. Otherwise, it's a miss, and we have to walk all the page tables, do all that slow stuff, and when we finally reach L0 and find the page table entry, we add it to the TLB and translate the virtual address like we saw before.

Remember, one of our goals was performance as close as possible to physical memory while using virtual memory, and we can calculate our average access time: the effective access time, EAT. We don't have a sense of humor.
The time it takes for a TLB hit: well, a TLB is a cache, so it takes some time to search it to see if that virtual page number is there, but it's on your CPU, much closer to register speed than memory, so it's a lot faster. The TLB search time plus the memory access is how long a hit takes. On a miss, you still have to search the TLB, and then you have to walk the page table. Careful: this formula is for one level of page tables, so a miss needs two memory accesses, one to read the page table and one for the original access. If the question said three levels of page tables, that two would be a four, because on a miss I'd need to access L2, L1, L0, and then finally do the original memory access.

The effective access time, which hopefully you could come up with yourself: you have an alpha representing the proportion of hits, and the effective access time is the proportion of hits times however long a hit takes, plus the proportion of misses times however long a miss takes. So if we had an 80% hit ratio, a 10-nanosecond TLB search, and 100-nanosecond physical memory accesses, our effective access time would be 0.8 times 110 nanoseconds (10 for the search plus 100 for the memory operation), plus, for the 20% of misses, 0.2 times 210 nanoseconds (10 to search the TLB, then 200 to walk the page table and do the memory access). That works out to 130 nanoseconds, as opposed to plain physical memory at 100, so 30% slower, which isn't too bad: 30% slower instead of twice as slow (or four times as slow with a three-level table) with no TLB at all. So caches are big good, big good.
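The effective access time calculation can be sketched in a few lines. Percentages are kept as integers so the arithmetic stays exact, and the `levels` parameter (my addition, for illustration) generalizes the two-versus-four memory access point above:

```python
def effective_access_time(hit_pct, tlb_ns, mem_ns, levels=1):
    hit_time = tlb_ns + mem_ns                     # search TLB, then one access
    miss_time = tlb_ns + levels * mem_ns + mem_ns  # search, walk every level, access
    return (hit_pct * hit_time + (100 - hit_pct) * miss_time) / 100

# The lecture's numbers: 80% hits, 10 ns TLB search, 100 ns memory access.
print(effective_access_time(80, 10, 100))            # 130.0
# Same numbers with a three-level page table: misses pay two more accesses.
print(effective_access_time(80, 10, 100, levels=3))  # 170.0
```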
All right, because we have this cache, we also have to deal with it when we context switch. Every process has its own page tables, and the same virtual addresses map to different physical addresses, so when you context switch in a process and start using a different page table, guess what? Your TLB is no longer valid, because its entries were for whatever process was accessing memory before. So you either flush the cache, a term that just means clear it, or your cache has to be aware of which page table produced each translation. Some fancier architectures, to avoid invalidating the whole cache, store a process ID as one of the fields of each TLB entry, so the hardware knows, hey, this virtual address translation is only valid for this process. But without that, most implementations just completely flush the TLB. On RISC-V you use the sfence.vma instruction; if you look at that teaching kernel, you'll see it uses that instruction whenever it does context switches, and that flushes the TLB. On x86, whenever you change the register that holds the root page table, the TLB is flushed for you automatically, so you don't even have to remember to do it.

All right, so now we can test and see the effect of the TLB. This is a little program written by none other than Linus Torvalds, who started the Linux kernel. He wrote it to see the effect of your TLB. It's called test_tlb and what it does is fairly simple: it allocates memory of the size given by the first argument, in this case 4,096 bytes, and the second argument is the stride, how often to access memory. So this run will access every fourth byte of that allocation.
It accesses byte zero, byte four, byte eight, byte 12, byte 16, and so on until it gets to the end of the allocation. So how many memory accesses is that? 1,024, right? Yes, I screwed that up last lecture; good, we got the right answer this time. So we do 1,024 memory accesses. When I access byte zero, is that going to be a TLB hit or a miss? A miss, right? Assuming this is all on the same page, the first access is a miss because I've never touched that page before. Then the other 1,023 accesses, all on the same page, should all be hits. If I run this, it counts the memory accesses, times them, and reports the time per access: each one takes about 1.6 nanoseconds, which is quite fast.

Now the opposite extreme: this large number allocates 512 megabytes of memory, and then I access every 4,096th byte. How many misses do I have in this case? All of them. Because I'm striding by 4,096 bytes, the same size as my page, every single access lands on a new page, so the hardware has to walk the page tables every time. If I run this, it should be a lot slower, and indeed it is: 40.14 nanoseconds per access, as opposed to 1.62 when we had mostly hits, so about 25 times slower. Guess what: your TLB is the primary reason they said to access things contiguously, because contiguous accesses land on the same page, and you get the added benefit of probably using fewer page tables too, because everything's right next to each other.
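We can estimate the hit/miss behavior of these strided sweeps ahead of time. This toy model assumes a cold TLB big enough to hold every page touched, so only the first access to each page misses; real TLBs have limited capacity, so the measured numbers can be worse:

```python
import math

def tlb_stats(size, stride, page_size=4096):
    # Accesses and compulsory TLB misses for one strided sweep of `size` bytes.
    accesses = size // stride
    if stride <= page_size:
        misses = math.ceil(size / page_size)  # first touch of each page misses
    else:
        misses = accesses                     # every access lands on a new page
    return accesses, misses

for size, stride in [(4096, 4), (512 << 20, 4096), (512 << 20, 128)]:
    accesses, misses = tlb_stats(size, stride)
    print(f"stride {stride:5}: {accesses:8} accesses, {misses/accesses:.1%} misses")
```

For the 4,096-byte run with stride 4 that's one miss in 1,024 accesses; for the 512 MB run with stride 4,096 it's 100% misses, matching the two extremes measured above.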
You can also go in between, accessing every 128 bytes; you can calculate the hit and miss ratio in this case yourself. It sits between the two extremes: we get a run of hits until we cross into the next page. Our time is not as bad, but still fairly bad: about 12.6 nanoseconds per memory access, as opposed to the 1.62 we'd want. And yes, the number of cycles would be clock cycles on the machine; my machine's pretty fast, I guess.

All right, any other questions? So: TLB good, cache good, put things beside each other. This is the reason they told you in 105 to keep addresses contiguous. Ideally you want everything on the same page, using as few pages as possible, and that will really affect your performance. And yes, if you didn't have a TLB at all, everything would be the 40-nanosecond case. Every single variable access would cost that much; if you didn't have a TLB, people would just immediately throw your computer in the garbage. You're missing one little cache and suddenly your computer is 25 times slower; that's not good. So yeah, caches are good.

All right, cool. We saw this system call before when we were going over what libc does: sbrk, which looked like a weird system call. What it does is grow or shrink your heap. Your stack, by contrast, has a set limit; the default is something like two megabytes, and that's it. The heap can either grow or shrink, and it all has to be contiguous memory. For growing, the kernel has an easy job, because it only cares about pages.
When you request more space on your heap, the kernel just creates new page table entries for you, grabs some pages from the free list, points those entries at the pages it grabbed, and marks them valid. It just grabs you some memory; the kernel only allocates memory in terms of pages. Because of this, your memory allocator is the thing that actually manages the heap: the kernel kicks the can down the road and doesn't deal with small allocations, so you either write your own memory allocator or use something like malloc, and malloc uses the heap. If you take the systems programming course, you get to write your own memory allocator, and we'll see some memory allocators as we go, because the kernel has to allocate memory for itself too. But for user processes, the kernel just deals in pages; it doesn't care.

And you rarely shrink the heap. This is also part of why you sometimes run out of memory quicker than you'd expect from what you're actually using: it's really hard to shrink the heap, because it's all contiguous addresses. To shrink it, there would have to be an entirely free page at the end to give back, which typically doesn't happen. It's really hard to free pages. So some memory allocators use a system call called mmap, which essentially lets you play with the virtual address space and get new pages at any address you want. We'll see how to use that next lecture; it's basically a way for your processes to control virtual memory a bit better.

Now, if you look at the code of a real teaching operating system like xv6, this is what a process's address space might look like. Each of these boxes represents a page, and there's some fun mapping going on.
For instance, the page at the bottom would represent virtual addresses 0x0 to 0xFFF. Why is that page always invalid? Because the null pointer is zero, and if you try to dereference it, we want a guarantee that you will not access memory: you get a page fault, and the kernel crashes your program, tells it it segfaulted, whatever. So the kernel makes sure that zeroth page is never mapped to any real memory, and you crash immediately. Past that, there are only a few virtual pages you're not allowed to touch, and that's one of them. Otherwise, whenever you exec your program, the kernel is responsible for reading the file and putting all the instructions on some pages, as many as it needs, plus some pages for your program's data, meaning your global variables, and then it allocates some pages for the stack.

Some of us have seen a stack overflow message, right? We've all seen stack overflows. The way your kernel knows there was a stack overflow is that it puts something like a guard page at the limit of your stack, at a known address. Your stack grows down, so if you run into this guard page and try to access memory there, the kernel gets a page fault, sees that the faulting address is inside the guard page, and knows you likely overflowed your stack. There's no guarantee, because you can just make up virtual addresses, but that is how you get a stack overflow message. Typically there's only one guard page there, so if you overflow your stack by more than 4,096 bytes, you might not get a stack overflow message at all, because you overshot the guard page and landed at some random memory.
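The guard-page check can be sketched as a trivial classifier. All the addresses and sizes here are made up for illustration; a real kernel tracks this metadata per process:

```python
PAGE = 4096
STACK_TOP = 0x8000_0000                    # hypothetical top of the stack region
STACK_LIMIT = STACK_TOP - 2 * 1024 * 1024  # a 2 MiB stack limit
GUARD = STACK_LIMIT - PAGE                 # one unmapped guard page below it

def classify_fault(addr):
    # On a page fault, check whether the faulting address sits in the guard
    # page just below the stack; if so, it was very likely a stack overflow.
    if GUARD <= addr < GUARD + PAGE:
        return "stack overflow"
    return "segfault"

print(classify_fault(GUARD + 8))           # just past the stack: stack overflow
print(classify_fault(GUARD - 100 * PAGE))  # overshot the guard page: plain segfault
```

The second case is exactly the "overshoot by more than a page" scenario: the fault misses the guard page, so the kernel can only report a generic fault.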
And if you overshot it a lot, if you really badly screwed up your program, you could be lucky enough to overshoot your stack such that you land in some global variables and start messing with those instead of your stack, and then you'll never ever debug anything in your life. Luckily, most stack accesses don't go more than a page past the end, so the guard page works. And your heap would be somewhere near the top.

You might notice some weird terms here, like a trampoline or a trap frame. What the hell are those? Good question. You don't really have to know this; these are the silly little details you need to get right if you have to implement your own operating system. Whenever a system call happens in a process, it generates an interrupt that the kernel handles, right? But when the kernel starts handling it, the page table doesn't automatically switch just because the kernel has control. So if the kernel tried to access any virtual address, it would be going through your process's page tables, because it hasn't switched the root page table yet. The simplest way to solve this is for the kernel to map the instructions it needs for the interrupt handler into every user program's address space. It puts a page there that's read-only and executable, so you can't modify the instructions the kernel is going to run. Then whenever a system call happens, execution starts at that known virtual address, which is valid in your address space. And just to make things easy, the trap frame contains all the registers for context switching and all of that; the kernel saves those at another known address that it also maps into your virtual address space.
Then it can switch over to its own virtual address space, which gets even more complicated, or it can use physical addresses or something like that. That's a fun implementation detail if you have to implement your own kernel. There are different choices: your kernel can use straight-up physical memory if you want, but most kernels use virtual memory. So when handling an interrupt, typically you do some setup, switch to the kernel's page tables, and then you can access your own memory; and because you're the kernel, you can reach all of physical memory anyway, since you can arbitrarily change the page tables. If I really want to access some particular page of physical memory, I just change what a virtual page points to and access it through that virtual page. If you get into implementation, there are lots of options for how a kernel deals with physical memory, but that's beyond this course; that's grad-level stuff, or for when you need to implement your own kernel. But if you read the code of xv6 for whatever reason, this is what it does, because it's simple.

All right. Other stuff we should actually know: the kernel can use that same idea and rely on some fixed memory addresses. We know system calls are pretty slow, so the kernel can do some things to make certain operations very fast. This is the one exception where you can do system-call-like things without doing a real system call. For operations you want to be really fast, like reading the current system time in nanoseconds, the system call overhead would take way longer than just checking the timer a few times. So for something that looks like a system call, like clock_gettime, the kernel does the following.
Instead of doing a real system call, the kernel maps the physical memory that contains the system time into each process, read-only, so every process can just read the clock. libc knows that virtual address, so you just read a known virtual address and you have the time. You don't do a system call, you don't do anything; you read that address knowing it holds the current time. Nice and fast, avoids a system call, and it's basically only for read-only data, because you don't really want processes writing to the kernel. This is the only exception to having to use system calls to interact with the kernel. So say the time lives at, I don't know, someone's favorite number, virtual address 4,000 or something. The data could be at any physical address; the kernel knows where, and it sets up the page tables so that in every process, address 4,000 points to that same physical page.

All right, any other questions? Cool. Other little thing: page faults allow the operating system to handle virtual memory lazily. A page fault, again, is what happens when the hardware follows the page tables and can't resolve a virtual address to a physical address: either the entry isn't valid, or a permission check fails, like trying to write memory when the write bit isn't set. Because the kernel gets these page faults, it can hide them from the user. The kernel can lazily allocate: a process asks for some virtual address range to be valid, and the kernel can say, yeah, no problem, without actually getting any physical pages.
It just leaves the page table entry marked invalid, with some extra bookkeeping saying this address is actually supposed to be valid if the process ever uses it. When the process does try to use it, the kernel gets a page fault, only then grabs a page and backs the address with physical memory, and flips the entry to valid; the process never realizes that its original request was, at the time, completely ignored. So you only use as many physical pages as your program actually touches. This is why, if you write a C program, you can allocate 10 gigabytes of memory and get an instant response: no problem, I got you. It works because the kernel just sets up the bookkeeping without grabbing any physical memory; each page only gets backed when you use it. You only run out of memory if you try to access every single page. That's why you can sometimes allocate a huge amount of memory, and also why in htop or any process manager you might see memory usage way higher than what's actually in use: those programs requested more virtual memory than they've touched.

Another thing you can do is implement copy-on-write, which is Lab 3. By default, whenever you fork, you could share physical memory between the two processes as long as they only read it, because then sharing is safe: no one's modifying it, so there's no point copying something that's identical for both. You wait until one process tries to modify that memory, and only then do you copy the page, do the modification on the new copy, and fix up the page tables so one process points to its modified page and the other points to the original. All right.
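The lazy-allocation half of this can be sketched with a dictionary standing in for the page table. This is a toy model, not how any real kernel is written, but the shape of the logic is the same: promise now, back with a physical frame only on the first fault.

```python
free_frames = list(range(100))  # toy free list of physical frames
page_table = {}                 # virtual page number -> physical frame
promised = set()                # pages we said yes to but never backed

def allocate(vpns):
    # "No problem, I got you": record the promise, hand out no memory.
    promised.update(vpns)

def access(vpn):
    if vpn in page_table:          # already backed: normal access, no fault
        return page_table[vpn]
    if vpn in promised:            # page fault on a lazily allocated page:
        frame = free_frames.pop()  # only now grab a physical frame
        page_table[vpn] = frame
        promised.discard(vpn)
        return frame
    raise MemoryError("segfault: this page was never allocated")

allocate(range(2560))   # "allocate" 10 MiB worth of pages, instantly
print(len(page_table))  # 0 -- no physical memory actually used yet
access(3)
print(len(page_table))  # 1 -- one page faulted in on first use
```

This is also why the process-manager numbers disagree: `promised` counts toward virtual size, but only `page_table` entries consume real memory.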
So, TL;DL, too long, didn't listen: page tables are used for translating virtual addresses to physical addresses. The MMU is the hardware for that, and it uses page tables. You could use a single large page table, but that's really wasteful, because programs typically don't use that much memory; even on 32-bit machines it's big and wasteful at only a few megabytes. And the kernel doesn't do any complicated memory allocation: it only allocates in units of pages, which can be done with a free list of pages. All of our systems now use multi-level page tables, primarily to save space for typical programs; like we saw with the 64-bit example, we don't want a gigabyte page table per process. That's just not going to happen. And to combat multi-level page tables being slow, we use the TLB as a cache. So with that, have a good Thanksgiving.