All right, good morning. Welcome back after a restful reading week, hopefully. I know I slept most of it, so that was good. Hopefully everyone's done with midterms and ready to go. We'll be easing back into it by talking about basic memory allocation: how malloc works and how you actually get memory from the kernel. Remember, the kernel doesn't really care; it manages memory very simply. It allocates in pages and gives you a whole page, which is four kilobytes, and then you're responsible for any other allocations past that. So first, let's review the different allocation strategies and make sure we understand what's going on. Static allocation is the simplest strategy. For example, your program might have a global variable like char buffer[4096], and we all know what that magic number is: it's the size of a page. How this works is that whenever your program loads, the kernel reserves space in virtual memory for that buffer and makes sure it's valid memory, actually backed by physical memory, that you can go ahead and access. This memory exists as long as your process does, so there's no need to free it; it lives for the entire lifetime of your process. There's no such thing as a free here because the allocation happens when the kernel loads your program into memory. So that's the simplest strategy: a big old global variable. But often you don't know how much memory your program is going to use, and static allocation is, well, static. Once it's allocated, you can't free it and you can't ask for more, because the program is already running. So dynamic allocation is required for any real program.
So for example, you might only conditionally require memory based on what the user's doing, or based on something you're reading from the network. Static allocations can be wasteful: if you allocate a bunch of memory that you never use, there's no point in doing that allocation. Maybe you want to allocate only on a per-use basis, and then you need dynamic allocation. Another reason is that you might not know the size of the allocation ahead of time. Static allocations need to account for the maximum size, because they're never going to grow; like our buffer before, it's allocated when the program loads and that's it, you can't grow it after the fact. So if that buffer represents, I don't know, how big a file can be, then you have a hard maximum file size. And while all computer systems have limits, generally you want to be flexible and able to increase the limits when you know you need to. So then the question is: where do you put this dynamic allocation? The two main strategies are the stack or the heap. Stack allocation is what's done for normal variables. If I just declare an int x in a function, that declares four bytes on the stack. And you may not know this, but you can actually allocate stack space yourself if you really want: there's a function called alloca that you might not have seen before, but essentially that's what the compiler is doing for you. It works like malloc in that it takes how many bytes you need; it just increases the size of the stack and gives you a pointer back. The compiler inserts the equivalent of this for you so you don't have to allocate stack space yourself.
It's one of the reasons people actually use C as opposed to just assembly: it's nicer for using the stack. Now, the deal with alloca: you'll notice there's no free for alloca, and we've never freed our local variables in C at all. Whenever the function where the alloca happened returns, all that memory is freed. The stack pointer just gets reset, so all that memory can be reused again; it restores the previous stack pointer and those addresses are no longer valid. It's essentially freed for you, and the nice thing is that no matter how many times you alloca in a function, you only have to reset the stack pointer once to get rid of all the allocations, so it's also nice and fast. But this won't work if you try to use the memory after returning. Say we had a function returning an int pointer, call it foo. A silly thing we could do is declare space for an integer x and then return the address of x. This is a very bad thing to do. It will compile fine, but the space for x is allocated on entry to the function, and when it returns and leaves the function, that space is freed, and you've handed back a pointer to it. That pointer now points at invalid memory, and if you use the pointer returned by foo, you have no guarantees what happens; if you write through it, you're clobbering some random stack variable. This is pretty much the equivalent of something we already know not to do: malloc a pointer to x, free it, and then keep using it. With the heap version you'd probably get a segfault or something bad would happen; that's a use-after-free. You can do the same thing with the stack, it's just harder to spot, but it's something you know you never want to do.
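Sketching the foo example from above; the variable names follow the lecture, and foo_fixed is a hypothetical corrected version added for contrast:

```c
#include <assert.h>
#include <stdlib.h>

/* BAD: returns the address of a stack variable. The space for x is
 * reclaimed the moment foo returns, so the caller gets a dangling
 * pointer into dead stack memory. */
int *foo(void) {
    int x = 42;
    return &x;          /* compiles (usually with a warning), but invalid */
}

/* The fix: put x on the heap so its lifetime outlives the call.
 * The caller now owns the pointer and must free it exactly once. */
int *foo_fixed(void) {
    int *x = malloc(sizeof *x);
    if (x) *x = 42;
    return x;
}
```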
Okay, so any questions about that? Hopefully this is all review. We've all used dynamic allocation before too: these are the malloc family of functions. This is the more flexible way to use memory, and also the more difficult to get right, as all our segfaults have shown. You have to manage the lifetimes of your pointers: whenever you malloc something, you have to free it at some point, and you have to call free on it exactly once. If you call free on it twice, that's invalid; if you never free it, you're just wasting memory. On top of this, today we'll be talking about the actual implementation of malloc, and there's a new concern: fragmentation, which we've touched on before. Fragmentation is essentially just wasted memory, and it's a unique issue for dynamic allocation because we're allocating memory in contiguous blocks of different sizes. Whatever size you give to malloc, you get that much contiguous memory, and you can ask malloc for any size you want, so the blocks are all different sizes, and malloc has no idea when the user is ever going to free any of them. And compaction isn't possible: if you had a bunch of fragmentation, say you malloc'd a bunch of memory and then freed every other pointer, you'd have holes and wasted memory. What you'd like to do is compact everything, move it all together to get rid of the holes, but you can't, because you already gave pointers to the program. The program accesses that memory through those pointers, and it has no way to update them unless you go through a layer of indirection.
Something like Java can do compaction, because Java doesn't let you deal with raw pointers; but of course another layer of indirection is slow, so Java pays for it. So, what a fragment is: a small contiguous block of memory that cannot handle an allocation. You can think of it as a hole of wasted space. If I have one allocated block of a hundred bytes, then a hole of, say, two bytes, and then another 200 allocated bytes, well, if I never make any allocations of two bytes or smaller, that hole is never going to get filled, because every allocation is bigger than what fits in it. To have a fragmentation problem, there are three requirements. Different allocation lifetimes: that's a recipe for fragmentation. Different allocation sizes: that's another condition. And the inability to relocate previous allocations is the third. You have to have all three to get fragmentation. If I don't have number one, so everything has the same lifetime, then I don't really have fragmentation, because it's exactly like dealing with a stack: the lifetime of any stack-allocated variable is however long that function lasts, so I allocate everything and get rid of it all at once. That's the stack example. For the second one, if every allocation is the exact same size, I also won't have this problem: if all my allocations are, say, 128 bytes, then whenever I have to deallocate something randomly, I open up another slot that can fit any future allocation, since everything is the same size. I just pick a slot and fill it.
So I'm not gonna have any fragmentation there. And the third one is, of course, the inability to relocate previous allocations: if I could relocate previous allocations, I could just compact everything, move it all into one big contiguous block. Okay, any questions about that? So there are two types of fragmentation: internal and external. Generally when you talk about fragmentation, you argue in terms of the blocks used to fulfill the allocation. External fragmentation occurs when you allocate different sized blocks and there's no room for an allocation between the blocks. In this example, the big block is the whole region of memory you're managing, and the red blocks are allocated memory. There's a gap between the two red blocks, and if all my allocations are bigger than that gap, I'll never be able to fill it; it's essentially just wasted space. The flip side is internal fragmentation, which occurs when an allocator hands out fixed-size blocks that may be bigger than the actual request; the wasted space is inside the block the allocator is managing. For example, the allocator might say, hey, I'm not gonna deal with external fragmentation, I'll manage fixed blocks, like the two blocks at the bottom there. Say they were, I don't know, 64 bytes each, and the user makes a 40-byte allocation and a 10-byte allocation. The allocator just hands out two 64-byte chunks, and internally the program doesn't use either entire block; it uses 40 bytes of one and 10 bytes of the other. That wasted space within the blocks the allocator cares about is internal fragmentation. Okay, so one of our goals for memory allocators is to minimize fragmentation.
It's just wasted space; we should prevent it and let programs use as close as possible to the full amount of memory in the system. We want to reduce the holes between allocated blocks, and if we do have a hole, one strategy is to keep it as large as possible, because a large hole isn't really wasted: it can still serve allocations. We want to keep allocating memory without wasting space. Allocator implementations usually use a free list. Operating systems tend to like linked lists, so we keep track of free blocks of memory by chaining them together into a big linked list. Then we need to handle a request of any size: for an allocation, you generally choose a block big enough for the request, remove it from the free list, and hand it to the user. For a deallocation, you add that block back to the free list so it can be reused; and if it's adjacent to another free block, you can merge them, giving more contiguous free memory. There are three general strategies for picking a block with this kind of heap allocation. Best fit: choose the smallest block that can satisfy the request. The idea is that when a request comes in, I pick the hole that most closely matches it. Of course, that means on every allocation I'm searching through the whole free list, which is kind of slow; I have to look at every element to find the best fit, unless I find an exact match, because then I know I'm not going to do any better, so I may as well allocate there. So there is an early exit condition, but you might not hit it that often. Then there's worst fit, which seems like a worse idea, but we'll explore that a little bit.
The idea behind worst fit is: just choose the largest block, the one with the most leftover space. This also has to search through the whole list. If I don't want to do all that searching, the strategy is first fit: choose the first block that satisfies the request and just use that. So let's go through an example allocating with best fit. At the top there, I have a red allocation, a blue allocation, and then the blocks with a blank background and a number are free blocks; the number is how many bytes the block is. So in the big block of memory my allocator is managing, there's a red allocation, a blue allocation, a free 100-byte block, and a free 60-byte block. Now a user requests an allocation of 40 bytes, and I have to pick where to put it. If I'm doing best fit, where should this allocation go? In what? 60, yeah. I have two choices: the 100 or the 60. Best fit means the closest size that can actually fit my allocation, and 60 is a closer fit than 100, so I allocate there and split the block off: 40 bytes used for the request, 20 bytes left over. Now a purple allocation of 60 bytes. Where does that go? The 100; I can't put it in the 20 because it's too big. So now I have a free 40-byte block and a free 20-byte block. Now a pink allocation of 60 bytes comes in, and I'm screwed at this point. I have 60 bytes of free memory, but it's not contiguous, so I can't actually do the allocation. I have enough bytes to satisfy the request, but because of fragmentation, the block doesn't fit anywhere. And, okay, is the helicopter landing on the building or something? Wow, that's really annoying. Is that jackhammering or a helicopter?
Jackhammering, yeah, jackhammering downstairs. Perfect time to do it. Okay, let's do the same allocations with worst fit instead and see what happens. Worst fit generally sounds worse; let's see if that's actually the case. If I'm doing worst fit and the 40-byte allocation comes in, where do I put it? In the 100, right? I'm looking for the biggest block possible: that is the worst fit. So I allocate in the 100-byte block, leaving 60 bytes. Now the purple 60-byte allocation: at this point there are two 60s, so I don't really care, I'd probably pick the first one. That leaves 60 bytes, and when the final 60-byte allocation comes in, it fits exactly in the remaining space. So in this case, worst fit was actually better, because I could use all my space. Just because it has the name worst fit doesn't mean it's always worse; in some cases, though not all, it actually works out better. But best fit and worst fit are both really slow. And if you simulate them with different sized allocations: best fit tends to leave very large holes and very tiny holes, and the tiny holes are pure fragmentation that may never get used and end up completely useless. Worst fit, if you run the simulations, turns out to be the worst in terms of storage utilization after all. And both are really slow because they have to scan the free list in its entirety. First fit, in addition to being much easier to implement and much faster because it doesn't scan everything, also tends to leave average-sized holes and actually work pretty well, which works out in our favor. Okay, since I cut last Thursday's lecture short because no one was here, we'll go over page tables.
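Here's a toy sketch of the first-fit search over a free list. The struct and function names are mine, and real allocators also coalesce adjacent blocks on free, which this skips:

```c
#include <assert.h>
#include <stddef.h>

/* Each node records one free block by its start offset and size. */
struct free_block {
    size_t start, size;
    struct free_block *next;
};

/* First fit: walk the list and take the first block big enough, splitting
 * off the remainder. Returns the start of the allocation, or (size_t)-1
 * if no block fits (like the pink allocation in the best-fit example). */
size_t first_fit(struct free_block **list, size_t request) {
    for (struct free_block **p = list; *p != NULL; p = &(*p)->next) {
        struct free_block *b = *p;
        if (b->size >= request) {
            size_t start = b->start;
            b->start += request;    /* shrink the block from the front */
            b->size  -= request;
            if (b->size == 0)
                *p = b->next;       /* exact fit: unlink the empty block */
            return start;
        }
    }
    return (size_t)-1;              /* fragmentation: nothing fits */
}
```

Notice there's no full scan like best or worst fit: we stop at the first block that works.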
And if you have any more questions on page tables, well, get them out of your system today, because isn't there a quiz this week? I think so. No, it's next week. Okay, whew. But page tables will be on it, and hopefully going over them again helps. So: if you're the kernel, or you don't have a C standard library, you have to implement your own memory allocation. We'll see more allocation strategies today, the ones the kernel actually uses, because it doesn't use any of the strategies above; those are more for malloc and friends. But the concepts are the same for user-space allocation like malloc. Remember, the kernel doesn't really care about user-space programs: it just allocates pages and backs them with actual memory, which is done through page tables, which we'll explore a little more. So, to recap memory allocation: there's static and dynamic allocation. For dynamic allocation, fragmentation is our big concern, because dynamic allocation returns blocks of memory of different sizes. Fragmentation between the blocks the memory allocator is managing is called external fragmentation, and fragmentation within those blocks is called internal fragmentation. And there are three general allocation strategies for different sized allocations: best fit, worst fit, and first fit. So, let's go over page tables. Oops. All right, since no one was really here last Thursday, the hardest thing is probably the multi-level page table. Oh my God, that jackhammering is great. So we have a multi-level page table here, three levels: level two, level one, level zero. This uses SV39, so there's a 39-bit virtual address that translates to a 56-bit physical address, but that part doesn't really matter.
We have three levels of page tables because our page table entry size is eight bytes and our page size is four kilobytes. For this, I essentially wrote a simulator for the MMU that uses the page tables. We allocate a level 2 page table; the allocate-page-table function just returns a page, nothing special, and we use it as our level 2 page table. The only thing the MMU actually uses is the root page table, which is just a level 2 page table, and that's what you would switch whenever you switch processes and want to change what the virtual addresses map to. So my root page table is a level 2 page table. I create a level 1 page table and make entry zero of the level 2 page table point to it: we create a page table entry from the level 1 page table, which essentially just drops the offset, takes the physical page number of that page table, and stores it at index zero. Then we do the same thing again: we allocate a level 0 page table, and at index five of the level 1 page table, we point to the level 0 page table. The level 0 page table then translates directly, from whatever virtual address led us there to a physical address: we create a page table entry from a physical page number, and we want it to map to physical page number cafe. So if we use this virtual address, whoops, because of the way we set up this page table, we should see that virtual address ABCDEF translates to physical address CAFEDEF. Any questions about that? Okay, let's see how well we know this stuff then. What if I make the address like that, changing the F to an E? Is this going to map to a physical address, or is it going to page fault? It's good?
All right, what physical address am I going to get? That's my virtual address. Yeah, if I run it, it actually maps, because remember the last three hex characters are the offset within the page. If I change them, I don't change the virtual page number, which is the ABC part here. Whatever entry that is, it still maps all the way through; I could even do 000 for the offset, whoops, and that still maps. But if I go change this C to a D, then I'm going to get a page fault, because there's no entry for that. Other questions about page tables, or about why I used the magic numbers I did? 188? Yeah, why did I use 188? That's a good question, so let's review it again. Here's the address I had: ABCDEF. The offset was the DEF part, and ABC was my virtual page number. So let me write out ABC in decimal, or sorry, in binary, which is easier here because each level of page table uses nine bits. C is 1100, B is 1011, A is 1010, and everything beyond that is zeros. Then I group it into nine-bit chunks starting from the least significant end. The last nine bits, 010111100, are my index into L0. Adding those up, because we all know our binary arithmetic: 4 plus 8 plus 16 plus 32 plus 128 is 188. That's why we index 188 into L0. Similarly, for the index into L1, the bits left over there are 1 plus 4, so we index 5 there. And the index into L2 is 0, because it's all padded with zeros. So this is what our page table looks like: we just allocated a level 2 page table.
Its entry at zero pointed to an L1 page table, that table's entry at five pointed to an L0 page table, and that table's entry at 188 held the physical page number cafe, and that's where we did our translation. Okay, so that kind of makes sense, ish. Well, let's see how sharp we are today. Let's translate 0x7FFFFFFFFF, just to make it easy. If I run the MMU on 7FFFFFFFFF, what do I expect that to translate to, given my page tables? Whoops, we're blind. Page fault. If I run it: page fault, okay. Now say I want 7FFFFFFFFF (well, I can't do anything about the offset bits) to map to, I don't know, 1FFF. How would I do that? I would need a new page table. Which page table would I need? So here, I'll give you one: all I would need is a new page table, and then I'd need to index into it using the index bits of 7FFFFFFFFF. Okay, let's step back: to translate 7FFFFFFFFF, how would I actually do it? List the steps, given a 39-bit virtual address, the normal page size, and 8-byte page table entries, because that's essentially what SV39 is. How would I go about translating this, even if I didn't know how many levels of page tables there are? Yeah, the last three hex characters are going to be the offset, because the page size is two to the 12, and each hex character is four bits, so the last three characters are 12 bits. I don't have to translate those; that's the offset. Okay, so how many levels of page tables do I have? Yeah, three levels, and why is that? Yeah, it's the number of entries: with a multi-level table, remember, I fill an entire page with page table entries, so I can figure out how many of those I actually need.
So the number of page table entries I can fit in a page is two to the 12 divided by two to the three, because entries are eight bytes; that's two to the nine entries in a page. So within a page table that fits on one page, I need nine index bits to select an entry. Now, to calculate the number of levels of page tables I need, remember it's the ceiling of (virtual bits minus offset bits) divided by index bits. Even if we forgot what SV39 was, we can derive it. The number of virtual bits is 39; that was from the question. The offset bits come from the page size: 12, because each page is four kilobytes. So that's (39 minus 12) divided by the nine index bits we found, and take the ceiling. In this case it's actually kind of nice, 27 divided by 9, so you don't really have to take the ceiling: it comes out to exactly three levels, and you can see why they picked the number 39 bits. We have nine bits for each index and three levels of page tables. So our address looks like L2, L1, L0, and then the offset: nine bits, nine bits, nine bits, and then 12 bits. We can double check: 12 plus 9 plus 9 plus 9 is our 39-bit virtual address. Now, 7FFFFFFFFF is 39 ones in a row. So that's nine ones for the L2 index, nine ones for the L1 index, and nine ones for the L0 index; I'm not writing them all out. So what are nine ones in decimal, so we can actually read it? 511. So in this case, for my big virtual address 7FFFFFFFFF, the level two page table is going to look at its 511th entry.
Then level one is going to look at its 511th entry, and level zero is going to look at its 511th entry. So what's happening, and why we generate a page fault, is that right now we have our level 2 page table, and when the walk goes to look for slot 511, there's nothing there: no valid entry, so it's a page fault. We need to make a valid entry there. That's where the new page table comes in, and it should be a new L1 page table, because we only ever have one L2 page table; it's already there, it's our root page table, and we wouldn't make another one unless we were switching processes or wanted a new address space. So we can give it some creative name like l1_page_table_2, and then copy-paste: we make the 511th entry of the L2 page table a page table entry created from l1_page_table_2, which essentially rips out its physical page number and puts it in the entry. So now we have an entry for that slot. If I compile and run it, do I get a page fault, or am I good? We need more levels, yeah. There's an entry in the level 2 page table that points to the level 1, but the level 1 page table we just made, we didn't do anything with it; we didn't give it any entries, it's full of invalid entries. So the walk would first look up 511 in the L2 page table, get the L1 page table, look up 511 again in that, and there's no entry: page fault. So we need another page table: a new level 0 page table, because the new L1 page table needs to point to an L0 page table. Let's do that.
And then we have to make sure to actually create the entry at index 511 in our new L1 page table, and that should point to our new L0 page table. Cool, all right. So if I compile and run it again: page fault, or am I all good? We're good? For anyone who said we're all good, what address were you expecting? You're expecting something starting with cafe? Well, we wouldn't get cafe anything, because the page table entry we have translates the ABC virtual page number into that physical page number, and this is a completely different virtual page number we're trying to translate now; it looks at completely different index bits than translating ABC would. So what do I need to do to finish my journey? Okay, in my L2 page table, I didn't need to make another one, because remember what we said: the walk first looks at the L2 page table, entry 511, uses that as the L1 page table, looks at entry 511 there, finds the L0 page table, looks at entry 511 there, and does the actual translation: it grabs the physical page number and translates. Where we're at in our code: we have our L2 page table, and its entry 511 points to our new L1 page table, which is good. Our new L1 page table, l1_page_table_2, has an entry at 511, which is good, pointing to an L0 page table called l0_page_table_2. And now the walk goes to that L0 page table, looks at entry 511, and what's there right now? Nothing. So it would page fault. How do I make it something? I do essentially what I did before and make an entry for it: in l0_page_table_2, at what spot? Yeah, 511, that's where the final lookup actually happens. So I create a page table entry from a physical page number, and I said I wanted it to map to 1FFF, so that would be physical page number one.
So if I do that and compile it, am I all good, or page fault? Everyone's scared now, everyone's saying page fault. So why did that page fault? That's a good question, because it shouldn't have. Do I have a typo? Right, this is page table, oh no, page table one, page table two, page table two, okay, that should have worked. Can anyone point out the stupid thing I did? Because I probably did something stupid. No one's found anything weird yet? I don't make another L2 page table, because there's only ever one. So how did I screw this one up? Did I screw up the address, did I miss an F? Oh, there, I missed an F. Woo, so I messed up an F. There, it actually works, as long as you type the virtual address correctly. You can't see from here? So, whoo, saved that at the last minute. All right, any questions about that? What if there's only eight Fs, like what I had before? If there's only eight Fs, it changes the index bits it uses, so I'd just have to make those map correctly. I would choose different indexes in that case, like how I worked out the indexes for ABC. As long as you can compute them, you can populate them however you want. Yeah, and you can also do fun things like, hey, what if I want both of these virtual addresses to go to the same physical address? If ABCFFF and 7FFFFFFFFF both had the same physical page number in their L0 entries, then they'd both translate to the same physical address, and you have overlap, and it's a lot of fun. So just remember, pulling for you, we're all in this together.