All righty, welcome back to Operating Systems. This is the last lecture of content that's testable on the final. I don't remember whether I actually asked a question on this, because I wrote the exam like a week ago and have already forgotten. So, where we left off last time: we were talking about how the kernel allocates memory for itself. It doesn't solve the most general version of the problem; instead it uses something like a buddy allocator or a slab allocator, which are the two most common approaches. Last time we looked at the general version of the problem, and that's where we ran into difficulties. The buddy allocator restricts things: it doesn't allow allocations of arbitrary size, at least internally. On computers, allocations are typically powers of two anyway — two bytes, four bytes, eight, sixteen, thirty-two, and so on; the inodes in lab six are 128 bytes, for instance. So a buddy allocator requires that all allocations be powers of two, at least internally. It works by starting with one big block of memory and recursively splitting it in two until it has a block just big enough for the allocation. It's called a buddy allocator because when it splits memory in two, you get one half and the other half, and they're buddies: they're contiguous, and you always know exactly where a block's buddy is. That lets us do fast merging, which is a real pain if you try to implement it with generic free-list merging. And no, this is not why the inode code thinks blocks are 512 bytes — I don't know where that hard-coded value came from.
Also, today's lecture will probably be fairly short. I'll stick around afterwards and you can work on lab six — some smart students have just camped outside my office and worked on it, asking me questions as they go, so I recommend you do the same. Alright, so the point of the buddy allocator: if you implement a general allocator, coalescing free blocks that sit next to each other is typically kind of hard. The buddy scheme makes it a lot easier, because any two buddies can be merged together to create a bigger block. You can implement this with multiple linked lists, one per level, where every block at a given level is the same size — you don't have to iterate through memory, you just keep multiple free lists. You restrict every request to be 2^k bytes, where k is some number between zero and whatever covers the total memory you're managing. If a request comes in that isn't a power of two, you round up to the next power of two, and you get some internal fragmentation. Our implementation would need n + 1 free lists, one list for each block size. When a request comes in, rounded to 2^k, you search free list k — blocks of exactly that size — and you don't have to search the whole list. If list k is empty, so nothing exactly matches, you go up to the next biggest and search k + 1 for a bigger free block. Nothing free there? Go to k + 2, then k + 3, until you find one or you're out of memory. And each time it goes up, once it finds a block of any sufficient size, it recursively divides the block in two until it's back down to the size it actually needs.
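That search-and-split loop can be sketched in C. This is a minimal illustration, not a real kernel's code — the `MAX_ORDER` constant, the free-list layout, and the helper names are all made up for this example:

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_ORDER 8               /* managing 2^8 = 256 bytes total, as in the example */

struct block { struct block *next; };
static struct block *free_list[MAX_ORDER + 1];  /* one free list per power of two */

/* Round a request up to the next power-of-two exponent k. */
static int order_for(size_t size) {
    int k = 0;
    while (((size_t)1 << k) < size) k++;
    return k;
}

/* Find the smallest free block of order >= k, splitting it down to size. */
static void *buddy_alloc(size_t size) {
    int k = order_for(size);
    int j = k;
    while (j <= MAX_ORDER && free_list[j] == NULL)  /* search upward: k, k+1, ... */
        j++;
    if (j > MAX_ORDER) return NULL;                 /* out of memory */

    struct block *b = free_list[j];
    free_list[j] = b->next;
    while (j > k) {                                 /* recursively split in two */
        j--;
        struct block *buddy = (struct block *)((char *)b + ((size_t)1 << j));
        buddy->next = free_list[j];                 /* unused half goes back on */
        free_list[j] = buddy;                       /* the free list at level j */
    }
    return b;
}
```

A 28-byte request through this sketch rounds up to 32 (k = 5); if only the whole 256-byte block is free, it gets split into 128, 64, and 32-byte remainders, each pushed onto its level's free list.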
And all the buddy blocks created by splitting — the unused halves — go back onto the free lists. For deallocations, you coalesce buddy blocks back together. You can do it recursively, but it can't go on forever: it's bounded by the number of levels, so roughly the log of the memory size, and you're always coalescing buddies. So what does that look like? Assume my buddy allocator is managing 256 bytes of memory, and each level has its own free list. A dark background on a box indicates that block has been split in two. So here, the full 256 bytes is not on the free list — it's in use because we had to split it into two 128-byte blocks, and there are no entries on the 128-byte free list either. Then we split each of those into 64-byte blocks. I have one unused 64-byte block; one 64 got split into two 32s, with one 32 holding the red allocation and its buddy sitting on the free list as a free 32-byte block. Of the remaining 64-byte blocks, two are free, and there's a blue — teal? cyan? — allocation of 64 bytes. So if someone requests 28 bytes from the buddy allocator, what happens? Yeah, it fills the free 32-byte block: 28 isn't a power of two, so we round up to the next power of two, 32, and look for a free 32-byte block. There is one, so the allocation goes there. Now what if I get yet another allocation, this time for 32 bytes? Yeah — use the next 64-byte block, the smallest free block available. I split it in two, add one 32-byte buddy to the free list, and use the other 32-byte buddy for the allocation itself.
So that 32-byte block goes in purple, I think I have it, and we'd have a free 32-byte entry. Now what happens if we free the 64-byte block here? Yeah, it turns white. Then we ask: it's free, but is its buddy also free? If the buddy is also free, we coalesce them together and create a free 128-byte block. In this case it is, so we end up with a free 128-byte block and we're done. Any questions about the buddy allocator? Yeah — if I had an allocation of size three or something, it gets rounded up to four; it would find the free 32-byte block, split it into two 16s, split one of those 16s into two 8s, split one of the 8s into two 4s, use one 4 for the allocation, and leave the other free. So there'd be one free block on each level — a 4 free, an 8 free, a 16 free — all split off along the way. Yeah — and when freeing, you keep coalescing until you run out of buddies, that is, until your buddy is allocated. It's like real life, you keep being... okay, I don't know where I was going with that analogy, just ignore it. Yep — we only fill in free blocks, the ones with the blank backgrounds. At the first step there were two free 32-byte blocks; I just picked the first one, but internally it would be a linked list of 32-byte blocks, and they're all treated the same. Yeah — the free lists themselves would be stored by the kernel somewhere, probably just in global variables or something like that. It would initialize itself, figure out how many free lists it needs, and allocate them. Each free list just needs one head pointer, so I figure out how many bytes I need to manage,
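Finding a block's buddy, as described above, is one bit flip: since every block of size 2^k is aligned to 2^k, the two buddies' offsets differ only in bit k. A hedged one-function sketch (the helper name is invented; real allocators do the same XOR on addresses or page-frame numbers):

```c
#include <stddef.h>

/* Given a block's byte offset from the start of the managed region and its
 * order k (block size 2^k), the buddy is the block whose offset differs
 * only in bit k. XOR flips exactly that bit. */
static size_t buddy_offset(size_t offset, unsigned k) {
    return offset ^ ((size_t)1 << k);
}
```

This is why coalescing is cheap: on free, compute the buddy's offset, check whether it's free, and if so merge and repeat one level up.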
and from the powers of two I know how many levels I need, so I just allocate that many pointers. We won't really go into the implementation details, but that's all it would allocate. So somewhere in the code you'd have a linked list for each level. In the example here, my 32-byte free list would be empty, my 64-byte free list would have these two entries in it, and my 128 and 256 lists would be empty. Yeah, each level has its own linked list, like here. Our implementation uses one list per level, that is, per power of two. Each linked list only holds blocks of one fixed size. And you can check whether two blocks are buddies quickly: everything is laid out in powers of two and aligned, so it's just some low-level bit manipulation. Sorry? N — n is just the largest exponent of two I need. If I'm managing 256 bytes, n would be eight. So in this case, if I have a 32-byte allocation, my k is five, and there's a linked list at level five; if that's empty, I'd have to search up to level six, which holds all the 64-byte entries. How many levels you have depends on how much memory you're managing in total — what the highest level is. You could have it manage a page, in which case n would be 12, or more memory if you want; every power of two you go up, you just add a linked list for that level. All right, any other questions about the buddy allocator?
All right, so the advantage is that it's fast and simple compared to general dynamic memory allocation. If you actually implement a general allocator that allows allocations of any size and tries to fit them perfectly, coalescing eats up a large majority of your time. The buddy allocator also mostly avoids external fragmentation — you could still hit it if no free block is big enough — but it's much better, because at each level everything is a fixed size and has a buddy, so my memory is more likely to stay contiguous: I can always go ahead and coalesce with my buddy, so I'm a lot less likely to have fragmentation issues that bad. The disadvantage: if an allocation isn't a power of two in size, there will be some internal fragmentation, because we force blocks to be powers of two. If your request isn't a power of two, you're going to waste some space. The worst case is an allocation of a power of two plus one: I have to round up to the next power of two, and I waste almost half of a block to internal fragmentation. Yeah — if we had a 128-byte request in that example, the two free 64-byte blocks would be external fragmentation: it would say out of memory because they're not contiguous. Yep. malloc could use this scheme if you want; it works better when requests are powers of two, but with malloc they don't have to be — internally it might round up, and then you might waste space. It depends on the malloc implementation. I forget what the standard one you'd likely use does — it's either this or a free list; it probably uses a free list of some type.
And different programs — I think I mentioned Chrome before — implement their own memory allocators, because the more general an allocator is, the worse it may perform for your specific application. If you're doing high-performance work and actually care about performance, you might implement your own memory allocator. You might implement a buddy allocator yourself if malloc doesn't use one and you know all your allocations are powers of two, and you could optimize it a bit, get rid of some levels, things like that. So the other type of allocator is called a slab allocator, and it takes advantage of having fixed-size allocations: everything is the same size. It works similarly to lab six, if you've started it. It allocates objects of the same size from a dedicated pool. In lab six, all the inodes are the same size, 128 bytes, so essentially you treat them as an array, and each slot is as good as any other. To keep track of the allocations, you just have an array of bits, one bit per inode, and it's quite simple: zero means unallocated, one means allocated. So I use arrays instead — no linked lists at all. And since everything is the same size and all your slots match that size, you don't have any internal fragmentation. A slab is kind of like a cache of slots, where one slot holds one allocation. Like I said, instead of a linked list, you use a bitmap: one bit tracks whether each slot is used. To allocate, you search the bitmap for a slot with a zero bit — just some bit manipulation. Scan across the bytes: if a byte is 0xFF, all its bits are ones, so skip it.
Keep going until you find a byte with at least one zero bit. To allocate, flip that bit to a one, marking it allocated, and hand out the memory for whatever slot it corresponds to. Deallocation is super simple: just flip the bit back from one to zero. You may or may not want to overwrite or clear the memory associated with the slot, but if you want to be fast, just leave the old values there — they'll be overwritten anyway. And you can implement a slab on top of a buddy allocator if you want: use the buddy allocator for general allocations, grab one fairly big allocation from it, put a bitmap at the beginning, and carve the rest into a bunch of fixed-size slots. So you can build these on top of each other. The slab is typically what you use when you're optimizing memory allocation for objects you know are all the same size; it might be called an arena or other things in other languages. So say we have two object sizes, A and B, with the slabs built on top of a buddy allocator. I could have a slab of A objects, sized to some power of two, with a bitmap at the front, holding four elements. In this case I've filled two of those elements, and for some reason my program had to go ask the buddy allocator for more memory because it needed another slab for more A objects; the new slab fits another four. And if the objects don't nicely fit a power of two, you might have this dark background indicating some internal fragmentation within the block returned from the buddy allocator. I could also have nicely sized objects, the B objects, where I get a block back from the buddy allocator,
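The bitmap-based slab described above can be sketched in C. This is a toy version under stated assumptions — the structure layout, `NSLOTS`, and the function names are invented for illustration, not from any real kernel:

```c
#include <stddef.h>
#include <stdint.h>

#define NSLOTS    32                    /* slots per slab (illustrative) */
#define SLOT_SIZE 128                   /* e.g. one inode, as in lab six */

struct slab {
    uint8_t bitmap[NSLOTS / 8];         /* one bit per slot: 1 = allocated */
    char    slots[NSLOTS][SLOT_SIZE];
};

/* Scan the bitmap for a zero bit, flip it to one, return the matching slot. */
static void *slab_alloc(struct slab *s) {
    for (int i = 0; i < NSLOTS; i++) {
        if (!(s->bitmap[i / 8] & (1 << (i % 8)))) {
            s->bitmap[i / 8] |= (1 << (i % 8));
            return s->slots[i];
        }
    }
    return NULL;                        /* slab is full */
}

/* Deallocation just flips the bit back; old bytes are left in place. */
static void slab_free(struct slab *s, void *p) {
    int i = (int)(((char *)p - (char *)s->slots) / SLOT_SIZE);
    s->bitmap[i / 8] &= ~(1 << (i % 8));
}
```

A real implementation would skip whole 0xFF bytes before scanning individual bits, as mentioned above, but the bookkeeping is the same: allocate flips a bit to one, free flips it back.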
allocate two B objects, and then create another slab, leaving two free B slots here. We could reduce the amount of internal fragmentation if the slabs were located adjacently: instead of having separate slabs A1 and A2, I could have made one larger, sequential slab, and maybe squeezed out a ninth element by treating it as one contiguous thing. So, any questions? Yeah — the dark box in A1 is there for the same reason: the A objects don't quite fill up that block. Say the chunks from the buddy allocator were 128 bytes and each A object was 30 bytes — just making up numbers — then four objects use 120 bytes, leaving 8 bytes of internal fragmentation, and I can't fit another one. All right, any other questions on that? So, nice and simple — this is essentially what you do for lab six anyway. Yeah, you could implement it in hardware, I guess, but you'd have to tell the hardware the size of the objects ahead of time, and no one's really going to build hardware specifically for that, so you just do it in software. Computers are good at checking bits anyway — your CPU essentially checks bits really fast; you don't really need special hardware. So, here's a slide left over from the file system lecture: we did not talk about journaling file systems. Rewinding back to the inode lecture: remember, if we were to actually physically delete a file on Linux, there's a three-step process. First, I remove its entry from the directory — that's the unlink system call that removes the name-inode pair. Then, if there are no more hard links to this inode, I can go ahead and delete the inode.
If you've started lab six, you know there's a bitmap that keeps track of whether each inode is used. So to release the inode back to the pool of free inodes, I flip that bit to say I'm not using it anymore. And finally, that inode may have used some data blocks; since it's deleted, I can take those blocks and indicate that they're free as well. So there are three steps to actually deleting a file on Linux. And since this is actual physical hardware — are we under attack? No? All right, that's just a really noisy fan — your disk has persistence, so at any point in this three-step process you could yank out the power cable and your file system would be inconsistent. Again, if you've started lab six, you've probably gotten some inconsistency warnings just from working on the lab, where your file system isn't quite consistent; the same will happen naturally if you yank the power cable at a bad time. If you disconnect the power between any of these steps, your file system will be inconsistent. If it crashes right after step one, that inode will appear to be used, but no name actually points to it. If it crashes after the second step, I'd be wasting some blocks: they'd still be marked as used even though no inode actually points to them anymore, and so on and so forth. Since it's pretty much impossible to keep a crash from landing between these steps, there's something called a journaling file system. What it does is, before it starts doing these three steps, it writes somewhere on the disk — a log book, if you will — a note saying, I'm going to delete this file.
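The write-ahead idea just described can be shown with a toy simulation in C. Everything here is invented for illustration — an in-memory stand-in for the disk, one journal slot, simplified flags — real journaling file systems do this with on-disk journal blocks:

```c
#include <stdbool.h>
#include <string.h>

/* Toy on-"disk" state: one file, one inode, one data block, one journal slot. */
struct disk {
    bool dirent_present;   /* step 1 target: the directory entry        */
    bool inode_used;       /* step 2 target: the inode allocation bit   */
    bool block_used;       /* step 3 target: the data-block bitmap bit  */
    char journal[32];      /* "" means the journal is empty             */
};

/* Journaled delete: log the intent first, do the three steps, then clear. */
static void delete_file(struct disk *d) {
    strcpy(d->journal, "delete file");   /* write-ahead log entry */
    d->dirent_present = false;           /* 1. remove the directory entry */
    d->inode_used = false;               /* 2. free the inode */
    d->block_used = false;               /* 3. free its blocks */
    d->journal[0] = '\0';                /* all done: clear the log entry */
}

/* On boot: a surviving journal entry means the steps may be half-done,
 * so redo them. Each step is idempotent, so redoing a finished one is safe. */
static void recover(struct disk *d) {
    if (strcmp(d->journal, "delete file") == 0)
        delete_file(d);
}
```

If the "power cable" is yanked anywhere between the journal write and the final clear, `recover` sees the entry on the next boot and simply re-runs the delete, which is exactly the recovery behavior described next.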
And if that entry is still in the log when the system boots up again, it means it did not successfully finish all three steps. If it did finish all three steps, it would have removed the entry, and then you know you're completely done. But if the entry is still in the log, you know exactly where to go to fix your inconsistency. You'd start by checking: did I remove its directory entry? Okay, maybe I did. All right, are all of its blocks free? No — so let me free those blocks, because I know someone yanked the power cable at that point in time. That way you can actually recover and make your file system nice and consistent again. Any questions about that? All right, finally winding down the course. So we saw even more memory allocation, and this concludes our testable material for the final. Tomorrow or Thursday we'll talk about virtual machines, and then we'll just have study sessions. To recap: the kernel restricts the problem to get a better memory-allocation implementation. It uses a buddy allocator as its primary allocator, which restricts allocations to powers of two, with one linked list per power of two. And you might implement a slab allocator on top, which takes advantage of fixed-size objects to reduce fragmentation — no external fragmentation, since everything's the same size — and keeps track of it all with what's essentially an array of bits instead of a linked list. So with that, remember, I'm pulling for you — we're all in this together. Oh, yeah — if you ask a slab for a size it's not handling, it just won't work; it has to be the same size. A smaller size would work too if you want, but typically you're just allocating one thing — in the case of your lab, inodes, which are all the same size — so you wouldn't waste any space. Yeah, you might combine two slabs if you're controlling all the memory, but if you move something, you have to update all the pointers and everything.
And if it has multiple threads — hey, data races are fun; everyone loves data races, right? All right, everyone loves data. Yeah — no, before it starts doing all three steps, it writes to the disk what it's about to do: I'm going to delete this file. Then it starts doing the three steps. If it crashes, or you yank out the power cable partway through, on reboot it checks: what was I doing? I was deleting this file, and I obviously didn't finish, so it knows to check those structures for consistency. As part of putting the entry into the log, it implicitly knows the three steps it needs to do. Most of the time it would be too slow to log every single step before you do it, so it just writes one entry before the whole operation, and the steps are implicit. After all three steps are done, it removes that entry — the operation is complete, so it doesn't have to keep it written down anymore. Analogously, it's like crossing something off your to-do list: did it, done. All right, for the rest of the time I'll be here — work on lab six and get things done. I'll be here until the end of the lecture time. And just remember: I'm pulling for you — we're all in this together.