Welcome back to 353. So today, another short lecture. So if you haven't already, you can get your laptop out. Make sure you have a copy of Lab 6 so you can work on it after this, because you'll probably have enough time and I will be around to answer questions. So we are wrapping up memory allocation and what the kernel actually uses to allocate memory for itself because, of course, it would not have malloc. So one common memory allocator is called the buddy allocator, and it restricts the problem a little bit in order to implement something that's a bit more efficient and a bit easier to implement, with most of the nice benefits of the general technique using a free list. So typically, on a computer anyways, allocations are powers of two anyways. So like 2, 4, 8, 16, 32, 64, 128, and so on. Page size may be even larger than that. So what the buddy allocator does is it restricts every single allocation to be a power of two in order to have a more efficient implementation, which we will get into. But basically, what it will do is split blocks into two recursively until it can handle the request. So if it has a big block of memory like 4,096 and you're requesting eight bytes, well, it would split that 4,096 into two, then split one of those 2,048s into two, then one of the 1,024s into two, until you can actually fulfill the allocation. And the reason we mostly do this is for fast merging. So every time we split a node into two, we say that those two are buddies. So they're beside each other in memory, they're contiguous. So if you free both of them, since they're buddies, you know that you can merge them back into one bigger block and they become a super buddy, or whatever you want to call it. So how you actually implement this is, well, it's everyone's favorite linked list, except you have multiple linked lists. So you would restrict the request to be some power of two, like 2^k.
So at most I can fulfill an allocation of, I don't know, pick your favorite number like 30, so maybe I can fulfill anything up to a gigabyte, maybe a megabyte, maybe a kilobyte, whatever you want. And for your implementation, whatever that 2^k maximum request is, you would actually implement it using k plus one free lists. So you have a free list for each block size. So for instance, I might have a free list that keeps track of all the free blocks that are size 4096, then a free list that keeps track of all the free blocks that are 2048, then a separate list that keeps track of all the free blocks that are 1024. And that way, well, in each free list, all of the free blocks are the same size. One is as good as any other, so it doesn't really matter. Then if a request comes in for some memory, which would be a power of two, you just search the free lists until you find a big enough block. So if an allocation comes in for, I don't know, for example, 128, well, there's a free list that corresponds to every free block that's 128. If you can find a free block that's exactly 128, great, you are done. If you can't, well, then you have to move up. You have to be like, okay, well, there's nothing that's exactly the size that fits. So I'll check the 256 free list, see if there is a free element there. If there's not, then I have to go up and check the 512, so on and so forth until we actually find something that is free. And then as soon as you find something that is free, you would recursively break it off into two. So you recursively break it off into one half and the other buddy half, and then you would just go ahead, insert all the broken halves back into the free lists for the other sizes, and then fulfill the allocation. And then for deallocations, you would coalesce or merge the buddies back together.
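The allocation path just described, searching the free lists from the requested size upward and then splitting back down, can be sketched in a few lines. This is a toy model, not the kernel's code: blocks are tracked as byte offsets into a hypothetical 4096-byte region, with one free list per power-of-two order, and the order bounds are made-up example values.

```python
MIN_ORDER = 5    # smallest block: 32 bytes (an assumption for this sketch)
MAX_ORDER = 12   # largest block: 4096 bytes, the whole region

# free_lists[order] holds the start offsets of free blocks of size 2**order
free_lists = {order: [] for order in range(MIN_ORDER, MAX_ORDER + 1)}
free_lists[MAX_ORDER].append(0)  # initially: one big free block at offset 0

def allocate(order):
    """Return the offset of a free block of size 2**order, or None."""
    # Search upward for the first order that has a free block.
    for o in range(order, MAX_ORDER + 1):
        if free_lists[o]:
            offset = free_lists[o].pop()
            # Split repeatedly until the block is the requested size,
            # putting the unused buddy half back on its free list.
            while o > order:
                o -= 1
                free_lists[o].append(offset + 2**o)  # second half is a free buddy
            return offset
    return None  # nothing big enough is free

a = allocate(5)   # a 32-byte request splits 4096 all the way down
print(a)          # 0
print(free_lists[5])  # [32]: the 32-byte buddy left over from the last split
```

Note that one 32-byte request leaves a free buddy on every intermediate free list, exactly like the splitting walkthrough above.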
So if you free a node and the buddy is also free, then they just come together and they, well, reform that bigger block of memory, and you can do this recursively as well. So if you and your buddy are free, well, then you coalesce into one bigger block. And then if that bigger block also has a buddy that's free, you coalesce or merge them. And if that even bigger block also has a buddy, then you merge those as well. So what would that look like visually? So here's what we could imagine if we were using a buddy allocator. At the top here, I'm just putting the total amount of memory the buddy allocator is using, and anything that's just kind of housekeeping is drawn with a black background. So this buddy allocator is managing 256 bytes of memory. And then at some point, it had to break this into two buddies here that are 128. And then it had to break off both 128s into two buddies that are 64. And in this case, there's two allocations: there's a red one that's 32 and then a blue one that's 64. So this 64 byte block got split into two 32s. One of the buddies was used for the red allocation here, and this 32 byte entry is free. And for the 64 bytes, there's two 64 byte entries that are free. And then here we have the blue one that has the 64 byte allocation. So in this case, we would have four free lists. So we would have a free list for 256, that's empty, there's no free 256 byte blocks; a free list for 128, also empty, nothing is free; a free list for 64, so we would have two free entries here, this one and this one; and then we would have a free list for 32 byte entries that has one free entry. So now we can see what happens. What if you were to use this and you requested 28 bytes of memory? What would probably happen? So is 28 a power of two? All right, so if I have to do powers of two, what power of two should I probably use for a 28 byte allocation? Probably 32, right? You just round up to the nearest power of two.
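That rounding up can be done with a quick bit trick rather than a loop; a sketch (C code would typically use a bit-scan or count-leading-zeros instruction instead):

```python
def round_up_pow2(n):
    """Round n up to the nearest power of two (for n >= 1)."""
    # (n - 1).bit_length() is the number of bits needed to represent n - 1,
    # so shifting 1 by that amount gives the smallest power of two >= n.
    return 1 << (n - 1).bit_length()

print(round_up_pow2(28))       # 32: a 28-byte request gets a 32-byte block
print(round_up_pow2(32))       # 32: exact powers of two are unchanged
print(round_up_pow2(28) - 28)  # 4: bytes lost to internal fragmentation
```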
Can't round down, because if I use 16 for 28, that's probably not gonna work. So how it works is, well, I would just round up to the nearest power of two, in this case, 32 bytes. And then technically I am losing some space due to internal fragmentation, because I'm going to allocate 32 bytes and it only needs 28. So I'm essentially wasting four bytes due to, technically, internal fragmentation. So what would happen? We round up to 32 and then we would go ahead and we would just use that free 32 block. That would be our new allocation. We can put it in green. All right, now what happens if we get another request for 32 bytes? Yeah, I'd just split up a 64 byte block into two 32s. And then since that's the size I need, I would have two entries on my free list, so I can go ahead and use one of them for the allocation. So let's just say we break up the first 64 byte block, break it off into two 32 byte buddies, and then now it looks like this. We put the allocation in purple, and now for the free lists, there's one entry in the 32 byte one and one in the 64 byte one. Good, great, grand. All right, so what happens if I free the... oh, yeah, sorry. So what exactly is in the free list? So the free list would just be a linked list of pointers to the free blocks of that size. So there'd be a free list for all free 32 byte blocks. So there'd just be one pointer to here, stored in a linked list. Yeah, and then the linked list for 64 byte blocks has one entry that just points right here. Okay, yeah. Yeah, so the free lists for the smaller sizes, it depends on the implementation. Usually you just create them all ahead of time; they would just be empty initially and then you just add to them and remove from them as requests come in. So you probably just initialize them, yeah. Yeah, so the question is, does the free list kind of know what the buddies are?
And the answer to that is, because it's set up in powers of two, you can actually look at the memory address to see whose buddy is which, because the lower bits would match depending on the level. So you can kind of play around with the bits and mess around with memory addresses to know exactly who your buddy is, right? But yeah, so it turns out that's another reason why they did this in powers of two, because computers like powers of two and you can just kind of figure it out from the memory address. And playing around with memory addresses is good old fun, everyone loves doing that, right? And doing bit manipulation and all that stuff. So I won't go into the details, just know that it is possible. Okay, so what happens if I free the 64 byte block here in the blue? So I just free it, and then do I have two 64 byte entries? No. Why? Because it's got a buddy, right? So they can get together and do buddy things, I guess, and become one, doesn't that sound great? All right, I should just stop talking. All right, so I would free this, its buddy's free, so they get to merge into one, and then I would have one big 128 byte block in the free list. All right, that's basically how it works. Any questions about the buddy allocator? Yeah, yeah, so if I allocated one byte, then yeah, technically I would have a free list for 16, eight, four, two, one. Or maybe I just say I'm not doing one byte allocations, I make my smallest one four, and then if someone requests one byte, maybe I just give them four and say three bytes are lost to internal fragmentation, I don't care. So it depends, yeah. Yeah, the free list would only keep pointers directly to the blocks. So the buddy blocks are figured out just from where they are in memory. So like for a 32 byte block, the last five bits would be the same, and the only difference would be a one bit difference in the sixth bit, the bit at index five, and then you know they're buddies.
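That bit trick, together with the recursive merging from earlier, can be sketched like this. Again a toy model, not the kernel's code: blocks are byte offsets into a hypothetical 256-byte region like the diagram, free lists are sets indexed by order (block size 2**order), and a block's buddy is its offset with the size bit flipped.

```python
MIN_ORDER = 5   # smallest block: 32 bytes (an assumption for this sketch)
MAX_ORDER = 8   # whole region: 256 bytes

# free_lists[order] holds the offsets of free blocks of size 2**order
free_lists = {o: set() for o in range(MIN_ORDER, MAX_ORDER + 1)}

def buddy_of(offset, order):
    """A block's buddy differs only in the bit equal to the block size."""
    return offset ^ (1 << order)

def free_block(offset, order):
    """Free a block, recursively merging with its buddy while possible."""
    while order < MAX_ORDER and buddy_of(offset, order) in free_lists[order]:
        free_lists[order].remove(buddy_of(offset, order))
        offset = min(offset, buddy_of(offset, order))  # merged block starts lower
        order += 1
    free_lists[order].add(offset)

# Free two 32-byte buddies: they merge into one free 64-byte block at offset 0.
free_block(0, 5)
free_block(32, 5)
print(free_lists[6])  # {0}
print(free_lists[5])  # set(): both 32-byte entries got consumed by the merge
```

Freeing the other 64-byte buddy at offset 64 would then cascade the merge up another level, which is exactly the recursive coalescing from the walkthrough.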
So you can just kind of figure it out from the memory address. Yeah, so what's the purpose of merging blocks? Because you want to merge them together, because you need to know that your memory is contiguous. So if a bigger allocation comes in, you can actually go ahead and fulfill it. Like if I just had the two 64 byte ones there and someone requested 128 bytes, you might look at them and be like, oh, I don't know for sure that they're contiguous, because they might not be if there were a lot of them. I don't know if they're contiguous, so I don't know if I can actually do that allocation. So for all the memory allocator stuff, usually if you were to implement that free list thing we saw yesterday, figuring out if you can merge two nodes is really, really slow. For this, it's logarithmic, right? It's fast. You can just do some bit manipulation. Oh, I pulled my shoulder. All right. So the buddy allocator is actually used in Linux. So they're fast and simple compared to general dynamic memory allocation. If you are weird like me and for some reason you look at the Linux kernel sources, you will see buddy allocators used everywhere because, well, they're great, and it avoids external fragmentation for the most part by keeping all the free physical pages fairly contiguous. The disadvantage of this is there will always be internal fragmentation if you do not allocate in powers of two. Luckily for the Linux kernel developers, they're sane, so they know what they're doing. It's usually always in powers of two. So when you give it to a user, you cannot be sure, but if you need to make this work for something that is not a power of two, you just round up, and then you go ahead and have some internal fragmentation. So the other topic is slab allocators, and they are for things that use fixed size allocations.
So I allocate objects of the same size from a dedicated pool; like, all the structures are of the same type, they're the same size. If you have started Lab 6, this looks familiar to you, because this is basically that pool of inodes, or that inode table. It's basically a giant array. Everything's the same size. One is as good as any other. You just need to keep track of whether or not it is used. Since they're all the same size, it doesn't matter: there's not going to be any internal fragmentation, not gonna be any external fragmentation. I can just use it as long as there is space in that array. And people will also use this in programs if they allocate a bunch of things that are all the same size. Well, if you use malloc, it's gonna use some variation of the free list for everything and it's going to be really, really, really slow. So if you know that your program is going to use millions of things that are all going to be the same size, you can go ahead and implement the same idea as in Lab 6 for your own needs, and it will be a lot faster because, well, it's basically just one giant array. So each allocation has a size, and they're all the same size. And then there's a slab of slots, basically an array, and a slot is just an element in the array. So instead of keeping track of the allocations in a linked list, we use a bitmap. So all you need is one bit of information to know whether or not something is in use. And there's just going to be a mapping between the bit index and the slot. So for allocations, it's really, really simple. You would go ahead, search the bitmap, and as soon as you find a free slot, so a bit that's a zero, well, figure out what the index of that zero is, and then that's the slot, like the address of the slot, you should give back. And then you just set that bit from a zero to a one to indicate that, hey, it's now in use for something. Please do not allocate this again.
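That bitmap search can be sketched like this. The slot count and size are made-up example values, and the bitmap is a list of booleans for clarity; a real implementation packs it into machine words and scans a word at a time.

```python
SLOT_SIZE = 64   # every object in this slab is 64 bytes (an assumption)
NUM_SLOTS = 8
bitmap = [False] * NUM_SLOTS   # False = free, True = in use

def slab_alloc():
    """Return the byte offset of a free slot, marking it in use."""
    for i, used in enumerate(bitmap):
        if not used:
            bitmap[i] = True        # zero -> one: slot is now in use
            return i * SLOT_SIZE    # bit index maps directly to a slot
    return None                     # slab is full

def slab_free(offset):
    """Deallocation is lazy: just clear the bit for that slot."""
    bitmap[offset // SLOT_SIZE] = False

a = slab_alloc()   # 0: first free slot
b = slab_alloc()   # 64: next free slot
slab_free(a)
c = slab_alloc()   # 0 again: the freed slot gets reused
print(a, b, c)
```

Note how both directions of the mapping get used: the bit index times the slot size gives the slot's offset on allocation, and the offset divided by the slot size gives the bit back on free.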
And then for deallocations, whoops, you can be really, really, really lazy. You just set the bit from one to zero. You don't have to free the memory if you don't really want to, because that's slow. You just clear the bit. It's going to be really, really fast. And if you wanted to, you can mix and match memory allocators. So you could use a slab on top of a buddy allocator. So that would kind of look like this. So this picture is a bit weird, so bear with me. So consider we have two object sizes, like an A and a B. Well, in here, I have four different slabs: two for A objects, two for B objects. And then maybe this slab that I call A1, maybe that's a big hunk of memory that I got from a buddy allocator. So it's some power of two. Let's just assume it's, I don't know, 256 or something like that. And then on top of that, I have some room for a bitmap. And then I can fit, in this case, four A objects on it. And then I have some internal fragmentation, represented with the dark region here, because it doesn't quite line up exactly. And then maybe I have another slab called A2 that's also holding A objects. And I got that from the buddy allocator, so maybe it's 256. And then maybe I have a slab for a bigger object called B, and I get another 256 back from the buddy allocator. And then maybe I can fit exactly two B's on that block, and I don't have any internal fragmentation at all. Then maybe I have another slab that I got from the buddy allocator. So you could do something like this. So you're using that general buddy allocator for everything and then having little slabs on top of it. And then this may or may not work as well, because you might look at this picture and be like, oh, well, instead of having two smaller slabs for A objects, what if I just had a bigger slab for A objects, and then maybe instead of being able to fit eight, I can squeeze in nine? So maybe you could do that.
It kind of depends on the implementation and how flexible you want this to be. So any questions about that? Yeah. So the question is, how are slabs different than just static allocations? So at least in this picture, each of those slabs could be created dynamically. So I could just get one slab that holds four, and then if I need a fifth, I allocate some more memory to hold some more A objects. So I can do it dynamically. And you could do it on top of malloc. So usually what some programs will do to implement this, if you know you're going to use tons of objects that don't live very long and they're all going to be the same size, you'll just do malloc and say, give me a megabyte, malloc a megabyte. And then you start managing that yourself, and you use it all for those objects. And you essentially write your own bitmap and write your own little memory allocator. And you just say, malloc, give me a bunch of memory, I'll do something better with it. Yeah. Yeah, so in this case, the question is, why would I even put a slab on top of a buddy allocator, isn't it fine by itself? Maybe the objects are weird sizes. Maybe they're not a power of two size. And it's also going to be a lot faster to do the allocations and deallocations. Just accessing a bitmap is a lot easier than a linked list. And they could also just be weird sizes, and that could be why, too. Yeah, if you're doing it dynamically and they're weird sizes and you use a buddy allocator, then it'll just round up to the next power of two, right? You have no choice. Yeah, so the question is why slab A2 also has internal fragmentation. So I'm assuming that A has a weird size. So say this is 256, and each A object is, I don't know, what's a weird size? 60 bytes? Is that a weird size that works? Yeah, that works. So each of them is 60 bytes. So this would be 240, and then this would be 16 bytes wasted, right?
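The arithmetic from that example, using the lecture's assumed numbers (a 256-byte slab from the buddy allocator holding 60-byte objects, and ignoring the space the bitmap itself takes):

```python
slab_size, obj_size = 256, 60            # example values from the lecture
slots = slab_size // obj_size            # how many whole objects fit
wasted = slab_size - slots * obj_size    # leftover bytes: internal fragmentation
print(slots, wasted)                     # 4 16
```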
And since there are two different slabs, they could be not contiguous, but, well, I can only fit four of them on each anyways, right? All right, any other questions with this? But yeah, usually, I mean, you could just malloc and do that. OK, yeah. Yeah, so the bitmap would probably be within the slab itself, or you're managing it somewhere else in memory. So yeah, it's kind of up to you where you want to put it. You just have it somewhere. Maybe it's somewhere else in memory. Maybe you put it at the beginning of the slab or block. Maybe you put it at the end. Up to you. OK, the other last thing that I forgot to mention about file systems is there are things called journaling file systems. Wait, did I mention this? I did. OK, screw it. I did mention it. All right, I fixed it. OK, that was, I guess, there from before when I forgot. All right, so wow, I think this is a new record today. Sweet. So we got even more allocators. The kernel restricts the problem. So instead of doing general memory allocation where users can just ask for any old size they want, the kernel restricts it so that you can only allocate in powers of two. And because of that, it can actually implement something called the buddy allocator because, well, that's a restricted problem: it only allocates in powers of two. So it needs a free list for every power of two size, but then within each list they're all the same size. So it works out really well, and one is as good as any other. And then, most importantly, merging becomes really easy once you play with memory addresses. And then there's the slab allocator, which is basically a big old array and a bitmap. So that just takes advantage of everything being a fixed size to reduce fragmentation. And yeah, I will be around otherwise. So just remember, we're all in this together.