All righty, welcome back to Operating Systems Lite. So what's going on today? Well, this is the last lecture of actual content. Past that, the next one will be virtual machines and then just review sessions. This one is valid for the final; I wrote the exam a while ago, so I forget whether there's actually a question on it, but this is the last lecture covered by the final. And guess what? The final is next week. So that's fun.

All right, last lecture we left off talking about how the kernel allocates memory, and how it wouldn't just use a free list of arbitrary block sizes; it would do something different. Today is that different thing, and it relates a bit to Lab 6. The allocators the kernel uses are called buddy and slab allocators. Why would it be called the buddy allocator? Because it uses the idea that a piece of memory can be buddies with another piece of memory. How it actually works is it restricts the problem a little. In the general case, we have to deal with allocations of any size, and we make our blocks the same size as the request. The buddy allocator does not do that. It restricts things a bit and only does allocations, internally at least, in powers of 2. So every allocation it deals with internally will be some 2^n: 2, 4, 8, 16, 32, maybe 128 (that's the size of an inode, for example), all the way up to the page size, or maybe more than that if it manages a bunch of memory. Why restrict to powers of 2? To get a more efficient implementation. The idea is that it will just split blocks in two, and you know the two halves are contiguous because one block is a buddy of the other: they came from the same parent. So if you need to merge them back together, merging is really, really quick, because everything is aligned: you check whether you and your buddy are both free, and if you are, you merge back into the next power of 2.
With most memory allocators, once you start implementing them, the problem becomes searching to find a free block of the right size, and merging is also difficult if you implement the most general strategy. Because we restricted allocations to powers of 2 only, an implementation of a buddy allocator can use multiple free lists. We restrict all allocations to 2^k, where k can range from 0 up to n, and we round up to a power of 2 if someone requests an odd size. The implementation keeps a free list for every power of 2: one free list that tracks all the free blocks of size 2, a separate free list for all the blocks of size 4, a separate one for size 8, and so on. If I go up through n powers of 2, then I need n free lists.

So if you get a request for size 2^k, say 32 bytes, you search the free lists to find a block big enough. First you search the free list for exactly 32. If that is empty, you search the next size up, 64. If there are no free blocks there either, you search the next power of 2 up, 128, and so on, until you actually find one. Once you find one, you keep splitting it in two until you create a block that is exactly the size of the request. Every time you split a block in two, you create a pair of buddies, so for every power of 2 you had to split through, you add a new entry to that free list. For deallocations, you do the coalescing: if you free an allocation and its buddy is also free, you coalesce, or merge, them into a bigger free block, and you can do this recursively if needed to create the maximum-size free block. So how would that look graphically?
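The split-on-allocate path described above can be sketched in C. This is a toy version, assuming a fixed 256-byte region with 32-byte minimum blocks; all names (`buddy_init`, `buddy_alloc`, and so on) are illustrative, not a real kernel API, and the free/coalesce side is omitted:

```c
#include <stddef.h>
#include <string.h>

#define MIN_ORDER 5   /* 2^5 = 32-byte minimum block */
#define MAX_ORDER 8   /* 2^8 = 256-byte managed region */
#define NLISTS (MAX_ORDER - MIN_ORDER + 1)

/* One singly linked free list per power of two; the list nodes live
 * inside the free blocks themselves (each is big enough for a pointer). */
struct block { struct block *next; };
static unsigned char heap[1u << MAX_ORDER];
static struct block *free_list[NLISTS];

static void push(int order, void *p) {
    struct block *b = p;
    b->next = free_list[order - MIN_ORDER];
    free_list[order - MIN_ORDER] = b;
}

static void *pop(int order) {
    struct block *b = free_list[order - MIN_ORDER];
    if (b) free_list[order - MIN_ORDER] = b->next;
    return b;
}

void buddy_init(void) {
    memset(free_list, 0, sizeof free_list);
    push(MAX_ORDER, heap);  /* one free block covering everything */
}

void *buddy_alloc(size_t size) {
    /* Round the request up to a power of two. */
    int order = MIN_ORDER;
    while ((1u << order) < size) order++;
    if (order > MAX_ORDER) return NULL;

    /* Search upward through the free lists for a big-enough block. */
    int o = order;
    while (o <= MAX_ORDER && !free_list[o - MIN_ORDER]) o++;
    if (o > MAX_ORDER) return NULL;  /* nothing big enough is free */

    /* Split down: each split puts the upper half (the buddy) back
     * on the next-smaller free list. */
    unsigned char *p = pop(o);
    while (o > order) {
        o--;
        push(o, p + (1u << o));
    }
    return p;
}
```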
Say I have a buddy allocator managing 256 bytes of memory. Anything with the dark, filled-in background means that block has been split in two and isn't actually being used for allocations. In total I'm managing 256 bytes, and I split that into two entries of 128. Then for each of the 128-byte entries, I split them each into two 64-byte entries. And for this 64-byte entry, I split it into two 32-byte entries: this entry and its buddy. Anything with a colored background means it's being used for an allocation. So there's a red allocation of 32 bytes, and then there's a free block of 32 bytes; that one would be stored in its own linked list that tracks all the free blocks of size 32. For the 64-byte blocks, there are two free entries here, so there would be two entries in that free list; one is as good as any other if you have to search for one, so it doesn't really matter which you pick. And then I have this blue, or teal, allocation of 64 bytes.

So what would the buddy allocator do if a request comes in for 28 bytes? Yeah, check the 32-byte list. 28 is not a power of two, so we round up to the next highest power of two, which is 32. It checks the 32-byte free list: hey, there's a free 32-byte block, so it uses that and does the allocation right there. We fill in the green allocation, and we have some internal fragmentation, right? Four bytes of internal fragmentation, because we're only using 28 bytes of that 32-byte block. So what happens now if we get a new request for a 32-byte block of memory? Yep, we search for exactly 32. Right now there are no free 32-byte entries, so we go to the next power of two up and search for 64. It doesn't matter which 64-byte block you pick; one is as good as any other.
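That rounding step from the example (28 rounds up to 32) is just "next power of two." A small sketch, with an illustrative function name:

```c
#include <stddef.h>

/* Round a request up to the next power of two, e.g. 28 -> 32.
 * This is what the buddy allocator does to every incoming size. */
size_t round_up_pow2(size_t n) {
    size_t p = 1;
    while (p < n) p <<= 1;
    return p;
}
```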
So we take that 64-byte block, split it into two 32s, use one 32 for the allocation, and put the other 32 back onto the free list. It should look like this: now we have our purple allocation, one 32-byte block free, and one 64-byte block free. What happens now if I free that 64-byte allocation? This one becomes free. Is its buddy also free? Yeah, its buddy, the block beside it that came from the same parent, is also free. So we can create an even bigger free block: a 128-byte free block out of those two. We can't go up another level, because we're still using part of the other buddy. So at the end of the day, after all those operations, we'd be left with this. Any questions about that? That is the buddy allocator in a nutshell.

All right, we'll probably be quick today. The buddy allocator is one of the primary allocators actually used internally by the Linux kernel. Advantages: it's fast and simple compared to general dynamic memory allocation, where there's just one free list of arbitrary-sized blocks, searching it is really slow, and coalescing is really hard. So this is a much better approach. It also largely avoids external fragmentation, because the buddy structure makes it a lot easier to keep addresses contiguous; you can still end up with fragmentation, but it's at least a lot better with the buddy algorithm. The disadvantage is that it only deals with powers of two internally and always rounds up, so for requests that aren't powers of two, there's going to be some internal fragmentation you just can't avoid.
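One reason that merge check is so quick: for aligned power-of-two blocks, a block's buddy can be found with a single XOR. A minimal sketch, where offsets are relative to the start of the managed region and `buddy_of` is an illustrative name:

```c
#include <stddef.h>

/* Two buddies of size 2^k came from splitting one 2^(k+1) block,
 * so their offsets differ only in bit k. XOR with the size flips
 * exactly that bit, turning one buddy's offset into the other's. */
size_t buddy_of(size_t offset, size_t size) {
    return offset ^ size;
}
```

This is why the allocator insists on alignment: with aligned blocks, finding your buddy (to test whether you can coalesce) costs one instruction instead of a search.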
So the last one, which is related to Lab 6. This lecture is built to be really short, so I suggest you work on Lab 6 afterward if you haven't done it already, and I'll be here for any questions. The last option is a slab allocator, and that restricts the problem even further: for a slab allocator, all your allocations are the same size. How that works is you allocate essentially a giant array, a giant pool of objects, and since everything is the same type and the same size, one slot is as good as any other. You keep track of whether each slot is allocated with essentially a giant array of bits, one bit per slot: zero if it's unallocated, one if it's allocated, or used. This has no internal fragmentation, because everything is assumed to be the same size, so you don't have any of those fragmentation issues. You'll see this in Lab 6, because you essentially have a giant array of inodes, and your inode bitmap keeps track of which inodes you're actually using; it's an example of a slab allocator inside your file system.

You can think of the slab as just a cache of slots, or elements in an array, if you want. Each allocation has a corresponding slot, or index, however you want to think of it, and one slot is just one element of the array: one allocation. Instead of a linked list, we're essentially just using arrays, and that's what a bitmap is: for each slot, one bit keeping track of whether it's allocated or not. Allocation is really quick: you just search the bitmap until you find a bit that is zero, and a zero bit means that slot is free.
You go ahead and use that slot, then flip the bit from zero to one to indicate you're now using it. Deallocation is also super fast: you just figure out which bit to change from one to zero to indicate the slot is now free. You may or may not actually clear the memory and set it all to zero; typically kernels won't, because that's slow. It's faster to just mark it as unused, since next time the slot is handed out you'll probably overwrite the data anyway, so what's the point of wasting time zeroing it?

A slab can be allocated on top of a buddy allocator or on top of any general allocator. Typically, if you really care about performance and you're creating thousands and thousands of objects of the same type, what people will do is create a slab allocator themselves: just ask malloc for a bunch of memory, implement a slab on top of that, and treat it as a giant array while tracking whether each slot is allocated. So it could sit on top of a buddy allocator, or on top of malloc. Sometimes this is called an arena; people will call it an arena when implementing their own memory allocators. Once you start caring about performance, you will probably implement or use something like this. So if I have two object sizes, A and B, what I might do is get slab A1 from the buddy allocator and put, say, four objects of type A there. Then I run out, so I ask the buddy allocator for more memory; maybe I get another block, call it slab A2, and put four more A's in it, wasting some space within the memory I got from the buddy allocator.
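The bitmap-based slab described above can be sketched like this. The pool and slot sizes are made up for illustration, and a real kernel slab cache is considerably more involved; the pool here is a static array, but as mentioned, it could just as well be memory handed back by the buddy allocator or malloc:

```c
#include <stddef.h>

#define NSLOTS   64
#define OBJ_SIZE 128

/* The pool of same-sized slots, plus one bit per slot: 1 = in use. */
static unsigned char pool[NSLOTS * OBJ_SIZE];
static unsigned char bitmap[NSLOTS / 8];

void *slab_alloc(void) {
    for (int i = 0; i < NSLOTS; i++) {
        if (!(bitmap[i / 8] & (1u << (i % 8)))) {  /* zero bit: slot is free */
            bitmap[i / 8] |= 1u << (i % 8);        /* flip it to "used"      */
            return pool + (size_t)i * OBJ_SIZE;
        }
    }
    return NULL;  /* slab is full */
}

void slab_free(void *p) {
    /* Recover the slot index from the address, then just clear the bit;
     * the memory itself is not zeroed, it will be overwritten on reuse. */
    size_t i = (size_t)((unsigned char *)p - pool) / OBJ_SIZE;
    bitmap[i / 8] &= (unsigned char)~(1u << (i % 8));
}
```

This mirrors the inode bitmap in Lab 6: the inode table is the pool, and the inode bitmap is `bitmap`.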
Say each slab I got from the buddy allocator is 128 bytes, and each A object is some awkward size like 30 or 28 bytes, some number that doesn't quite fit evenly. Then maybe I have bigger objects of size B that fit exactly within a block: slab B1 holds two B objects, and slab B2 also holds two. In this case, you might notice that instead of having slabs A1 and A2, I could have used that entire span of memory as one giant slab of A objects and reduced my internal fragmentation; instead of fitting eight elements, maybe I could fit nine. You might want to do something like that if your sizes don't match up, but typically you wouldn't, because that again means moving memory: you might make some addresses invalid and would need to make sure every pointer gets updated to the new addresses, and we already have enough memory problems. And if you're doing that in a multi-threaded application, well, guess what, then you have to think about data races and all the fun stuff we know and love. Any questions about this memory allocator stuff? Yeah, so it's a bit weird. Typically you'd talk about external fragmentation at the level of the buddy allocator that handed out these slabs, and it wouldn't have any external fragmentation, because it only deals with powers of two and just has internal fragmentation. So that's probably what you would consider in this case.

All right, rewinding to lecture 28: we ran out of time for this slide, and now we have ample time, so we can talk about journaling file systems. Remember back when we were trying to delete files: there isn't actually a delete. rm is just an unlink, so it just gets rid of the name-to-inode association. If there are no more hard links to an inode, then you can actually delete it.
If you've started Lab 6, you'll see the steps the kernel takes to actually delete and get rid of a file. It's a three-step process. First, it removes the last directory entry, which gets rid of the name-to-inode mapping, so no one is using this inode anymore. Second, it frees the inode back to the pool of free inodes, which means changing the bitmap that tracks whether an inode is allocated: it flips that inode's bit from one to zero, since it's not being used anymore. Third, it frees any disk blocks that file used, since we're not using them anymore; if the file used physical blocks 6, 8, and 10, for instance, those get marked as free as well. But because this is all on a disk, which is persistent, I could yank out the power cord at any time, assuming it's a desktop, or lose power or run out of battery. If I lose power between steps one and two, my file system is inconsistent: nothing points to this inode, but it's still marked as used, so it's kind of lost forever. You want to be able to recover from that situation, but if you don't do anything, recovery is really, really difficult, unless you check the entire file system: walk every name, discover that nothing actually points to that inode, realize the inode is actually free, and fix it up, or find some other inconsistency.
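The three-step delete, plus the journaling fix the lecture turns to next, can be sketched with a toy in-memory model. Everything here is hypothetical (`struct toy_fs`, the field names, the journal codes are all made up for illustration, and a real journal is an on-disk log with ordered, durable writes), but it shows the key idea: record the intent first, so recovery can simply redo the steps.

```c
/* Journal states: nothing pending, or a delete that was promised. */
enum { J_NONE, J_DELETE_PENDING };

/* Toy stand-in for on-disk state touched by deleting one file. */
struct toy_fs {
    int journal;        /* what the journal says we intend to do      */
    int dirent_present; /* step 1 target: the last directory entry    */
    int inode_used;     /* step 2 target: the inode bitmap bit        */
    int blocks_used;    /* step 3 target: the data block bitmap bits  */
};

/* Write the intent to the journal first, then do the three steps,
 * then clear the journal entry to say the operation completed. */
void journaled_delete(struct toy_fs *fs) {
    fs->journal = J_DELETE_PENDING;  /* durable "to-do" record first */
    fs->dirent_present = 0;          /* 1. remove the directory entry */
    fs->inode_used = 0;              /* 2. free the inode             */
    fs->blocks_used = 0;             /* 3. free the data blocks       */
    fs->journal = J_NONE;            /* done, drop the journal entry  */
}

/* Recovery after a crash: if the journal still says a delete was
 * pending, redo all three steps. Each step is idempotent, so it is
 * safe to redo steps that already happened before the crash. */
void recover(struct toy_fs *fs) {
    if (fs->journal == J_DELETE_PENDING) {
        fs->dirent_present = 0;
        fs->inode_used = 0;
        fs->blocks_used = 0;
        fs->journal = J_NONE;
    }
}
```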
That is where journaling file systems come in. What a journaling file system does is record, in a log on the physical drive itself, what it's about to do before it starts those three steps. So it will write something like "I'm about to delete this inode," and after making sure that record is on the disk, it starts doing the steps. If it crashes somewhere in between, recovery is a lot easier: you check the journal, see that you started deleting this file and didn't complete it, and you only need to check and clean up the parts touched by that operation instead of walking the entire file system. If the three steps all happen successfully, you just remove that entry from the journal; you've completed it, so you don't need to check it anymore. Any questions about journaling file systems? Yep, so if it crashes in the process of writing the journal entry itself, then it's as if you never started deleting the file; nothing has happened yet, so the file system is in the consistent state from before anything happened. The journal records the whole operation: it just says "I'm going to delete this file," which implicitly means following those three steps. So before it even does step one, it makes sure "I'm going to delete this file" is in the journal. In the middle of which step? So, in the middle of doing this?
Yeah, so the journal is kind of like a to-do list. The file system writes "I'm going to delete this file" into the journal, and once that write is done, it starts trying to remove the directory entry. Maybe it removes the entry and then power gets lost. Or maybe power is lost right after it journals; then the file system is still consistent, but it has a pending operation, so it would just carry it out. If the journal says "I need to delete this file" and it's not done yet, that's essentially a to-do item: as soon as you have power back, it checks the journal, sees "I need to delete this file, and I didn't even do step one," so it starts at step one, goes all the way through step three, and actually deletes it. You might notice when you format drives, especially on macOS, the options are a file system and its journaled variant, and the default is the journaled one. This is what they're talking about: it means your file system will stay consistent, but it's also a little slower, because you have to write to the journal before you do anything. Typically it's fairly fast, though, and it doesn't really matter. All right, any other questions before we work on Lab 6?

So today we saw even more memory allocators. The buddy allocator is what most kernels use for most memory allocations; of course it can get complicated for other things, but the buddy allocator is one of the most commonly used ones. It's a real-world restricted implementation: it only deals with powers of two, and its implementation uses multiple linked lists. In the other case, if we know everything is the same size, you use the slab allocator. Slab allocators are used all over the place; in fact, one is used in Lab 6, if you haven't started it yet, because everything is a fixed size, so you don't have any internal fragmentation. So with that, just remember: I'm pulling for you.