Uh-oh, why didn't we work? All right, there we go. All right, welcome back to operating systems. So today is the last content that is actually coverable on the final. Yeah, your final is next week — that sucks. So this is the last lecture that is valid for the final, just to make it easier for you, and not much of it will be covered. At most it's a short answer question; I wrote it a while ago, so I already forget what it was about, but it's something related to this lecture or the last one. All right, so where we left off last time, we were talking about memory allocators, and we left off with: well, the general one kind of sucks. It is slow — I have to search the entire free list. So what would the kernel actually do? Most kernels implement something called a buddy allocator and a slab allocator, so we will look at them both today. And after that, I suggest you work on lab six, because this lecture should be short, and I will be here for your lab six questions, for whatever you need. And there's a special dog bonus at the end. So the buddy allocator: it restricts the problem. Instead of accepting allocations of any size and making the block size match whatever the requested allocation is, it restricts allocations, at least internally, to be powers of two. So it only deals with two-byte requests, four-byte requests, eight-byte, 16, 32, 64, 128 — that would match your inodes from lab six — maybe 4096, so on and so forth. So it restricts allocations to be a power of two, and this way we can have a more efficient implementation that we will get into very shortly. But the gist of it is, well, it's called a buddy allocator because everything's a power of two: I can split a block in two and create two smaller blocks of the next size down, and then I can do that as much as I need to in order to actually handle the request.
Because the goal for this memory allocator — if you actually go to implement one, which you will do in 454 — is that you want searching to be really fast. Searching for a free block should be fast; in the general case I have to search the entire free list. And you also want merging to be fast, which we kind of touched on: if two free blocks are contiguous, you want to merge them together into one bigger free block. This design makes that a lot easier because, as the name implies, if you split a block in two, it was contiguous and you split it into two halves — one allocation and, essentially, its buddy. So if both you and your buddy are free, you can come back together and be friends again, or something — I don't know where I'm really going with that analogy, but that's the gist of it. So if you were to implement this, instead of having one free list that you have to search, since you restrict requests to be powers of two, you have multiple free lists, each keeping track of a single size. You restrict requests to be powers of two — two to the k, where k can go from zero all the way up to n — and in that case you have n + 1 free lists, one for each size. So if n was two, I would have a free list for block sizes of one, two, and four — one free list for each of those sizes. The general way it works: a request comes in and gets rounded to a power of two, and I search the free list for that size. Say I have a request for 32 bytes: I search the 32-byte list, and all I need is a single entry from that list for the allocation. If I search the 32-byte block free list and there's nothing in it, well, then I search the next size up.
So I would search and see if there are any 64-byte blocks free. If there are no 64-byte blocks free, then I look and see if there are any 128-byte blocks free, and so on until you find one — or, if you don't find one, it means you're out of memory and you return an error. If you do find one, say a 128-byte one, then I divide it in two repeatedly until I get the exact size allocation I want. So I take the 128, break it into two 64s, take one of the 64s and break it into two 32-byte ones, and then use one of the 32-byte blocks for the allocation, yep. And if no blocks of size two to the k plus one, two to the k plus two, or anything larger exist, it means you're out of memory, so I can't do the allocation. Oh yeah, true, thank you. Yep, so that's allocations; the other part is deallocations. If you deallocate a block, you just mark it as free, and then essentially you check its buddy, and if its buddy is also free, you create a bigger free block of the next size up. And you can do this recursively: if at the next size up you also have a buddy that's free, you merge those two together, and you keep merging until you can't anymore. So it can do that really fast — it's just log n — and that's much better than, in the worst case, searching every element of the free list. So here's how it would look. Assume I have a buddy allocator that's managing 256 bytes, so I have that at the very top, and some allocations happen: a red allocation of 32 bytes happened, and a blue — or teal, or cyan, I guess it's not cyan — allocation of 64 bytes happened. So our memory allocator could be in this state. The filled-in background is just there to illustrate that that was a big block which I'm currently not using directly, because I broke it into two smaller blocks.
So I'm managing 256 bytes, and I had to break that into two 128-byte blocks; then each of those 128-byte blocks I broke into two 64-byte blocks, and then this 64-byte block I broke into two 32-byte blocks, where I have a red allocation of 32 bytes here and another allocation of 64 bytes here. In this case I would have a free list for each of these levels: my 32-byte free list would have one element, one 32-byte block free, and my 64-byte free list would have two elements. So now, if I'm using a buddy allocator and an allocation request of size 28 comes in, what would happen? Yeah, it would just fill in the 32-byte block. 28 is not a power of two, so I have to round up, and it would check the 32-byte block list first. Oh hey, there's one free there, so I would just put it there, and essentially I have a little bit of internal fragmentation — four bytes wasted. So now what happens if I have a 32-byte allocation after this? Yeah, I'd pick one of the 64s. Doing the same steps: I search my 32-byte block free list — nothing is free — go to the next size up — oh, there are two 64-byte blocks free — one's as good as any other, so just pick the first one, divide it into two 32-byte blocks, use one for the allocation, and the other goes on the 32-byte block free list, so it should look like this. Any questions about that? Oh, I almost tripped and killed myself, all right, cool. Your final would still happen because they already printed it, so yeah, if you off me right now, it's already too late. Okay, don't try to murder your professors, all right. I don't know, okay, I should just stop talking. So now what happens if we actually free the 64-byte block here? I'm done with this blue allocation, so what should happen?
Yeah, so if I free this one, well, if I look up, its buddy is also free, so that means I should merge them together and create a free 128-byte block. So now I would have one 32-byte block free and one 128-byte block free. Any questions about that? All right, we're all buddies now. So that's a buddy allocator. They're actually used in the Linux kernel, because they're fast and simple compared to a general free list holding whatever size blocks the user requests, and it avoids some external fragmentation by keeping all the free pages nice and contiguous. It's all powers of two, so it trades some internal fragmentation for getting rid of external fragmentation. The disadvantage is, well, if my allocations aren't powers of two, I'll have internal fragmentation — but computers like powers of two anyway, and if you're a kernel developer you're probably only using powers of two, because you're closer to the hardware. So this is something the kernel would actually use. Any questions about that? All right, cool. The next one is a slab allocator, and it takes advantage of all the allocations being exactly the same size. So essentially what happens is the objects it allocates are all the same size, so it just creates a pool of objects. Instead of the phrase "a pool of objects," you could also just think of it as a big array: a big array of objects, where everything is the same type and the same size. That's what you do in lab six — you essentially just have a giant array of inodes, because they're all the same size, and you store them all contiguously. Every object type has its own pool, so if you had multiple objects that were all different sizes, you would have a pool for inodes, a pool for some other object.
Well, you'd have a pool for I/O blocks, but in general you could do this for anything you want in your program, and this prevents internal fragmentation — and external, because everything's the same size. So a slab is basically just a cache of slots. If you want to think of it as a giant array, the slots are just the elements of that array: each allocation corresponds to an element in the array — a slot, if you want the more general term — and one slot holds exactly one allocation. To keep track of which array indexes are actually used, I don't need a linked list; I can use a bitmap, just an array of bits, with a mapping between each bit and an index. So bit zero keeps track of index zero, bit one keeps track of index one, and so on, and the bit says: one, the slot is being used for an allocation; zero, it's free. If you do an allocation here, finding a free slot is pretty simple: you search the bitmap until you find a part that isn't all ones, figure out which slot that zero bit corresponds to, then flip the zero to a one to indicate that you are now using that slot for an allocation. And for deallocations, I just flip the bit from a one to a zero to say it's not used anymore, and then it's up to you whether or not you want to clear the memory or zero it out. Typically kernels are lazy and will just mark it as unused without actually writing all zeros to it, because typically it will just get overwritten anyway — so why bother wasting time writing zeros?
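Here's what that bitmap bookkeeping might look like as a toy in C. This is a sketch under assumptions, not kernel code: a single hypothetical slab of 64 slots tracked by one 64-bit word, with invented names `slab_alloc` and `slab_free`.

```c
#include <stdint.h>

#define NSLOTS 64       /* hypothetical pool of 64 fixed-size slots */

static uint64_t used;   /* bit i == 1 means slot i holds an allocation */

/* Find a free slot, mark it used, return its index (-1 if full). */
static int slab_alloc(void) {
    for (int i = 0; i < NSLOTS; i++) {
        if (!(used & (1ULL << i))) {
            used |= 1ULL << i;   /* flip the zero to a one */
            return i;
        }
    }
    return -1;   /* every bit is one: the slab is full */
}

/* Freeing is just clearing the bit; like a lazy kernel, we don't
 * bother zeroing the slot's memory. */
static void slab_free(int slot) {
    used &= ~(1ULL << slot);
}
```

A quick usage check: two allocations hand out slots 0 and 1 in order; after freeing slot 0, the next allocation reuses it, because the bitmap scan finds the first zero bit.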
And you could implement a slab on top of a buddy allocator if you want — or, in more general terms, you could do this in your own program. This is a common optimization if you really care about performance: if you see malloc being slow and you're allocating thousands or millions of objects that are all the same size, you might just say, okay, screw you malloc, give me several hundred megabytes of memory, and I will turn it into a giant array of these things and keep track of them myself. So if malloc was a buddy allocator, the slab would be on top of a buddy allocator, but it could be on top of anything. What that could look like is, well, I could have two object types here, say A and B, and different slabs, where each of these blocks of memory comes from the buddy allocator. Say I get slab A1 from the buddy allocator and I put four objects in it; at this point I'm only using two, and there's a bit of filled-in space here that represents some internal fragmentation. Say I need more space for these A objects: I just ask the buddy allocator for more memory, I get slab A2, and then I have four more objects there. And then I could have slabs for B objects that are slightly bigger and actually fill the block completely, without any internal fragmentation. You might look at this and say, oh, there's some internal fragmentation — so instead of having these two separate slabs, why don't I merge them together into one giant slab of A objects, and maybe instead of eight objects plus internal fragmentation I can squeeze in a ninth? But if you do this, you'd be moving memory around, so you have to make sure all the addresses still line up, and if you have a multi-threaded application you have to deal with data races and all that fun stuff — so it's probably not worth the effort. Any questions about that?
All right, so you get practice with it in Lab 6.2, and this lecture is fairly intentionally short so that you can work on Lab 6 while I'm around, and that's what we will do very shortly. But to wrap up: we skipped this slide in lecture 28 — it's the second-to-last slide — so I will just go over it quickly. There are file systems called journaling file systems, and this is kind of review. Remember how, as you'll see if you get into Lab 6, actually deleting a file on Unix typically involves three steps. I unlink it — so if I rm it, I get rid of that name-to-inode link — and then, if the inode is not being used anymore, so there are zero things pointing to it, the kernel can go ahead and free all the space associated with that inode. So the first step is to remove the last directory entry, that name-to-inode link. After that, the inode is no longer used, so I should deallocate it: I have to write to the file system that I'm no longer using this inode, so it would change a bit from one to zero to indicate it's unused. And then, assuming it's a regular file, it was actually using some blocks to store its contents, so now I should also mark those blocks as unused — this file doesn't exist anymore, so I can use them for other files. In general you write all these things to the disk and then it's permanent, but your power could go out, or you could run out of battery, between any of these steps, and then your file system would suddenly be inconsistent. If you've started Lab Six, you have probably made your file system inconsistent by accident, but some of these inconsistencies can happen just from yanking the power cord out or running out of battery in the middle of some operation.
So for instance, if I remove the directory entry — so now nothing is using that inode — and then power is cut before I write the rest to disk, well, that inode would still be marked as used, and all its blocks would still be marked as used, even though nothing is actually using them. To recover from this, I would have to scan the entire file system to figure out that, oh, no names actually refer to this inode anymore, so I should mark it as unused — and that would take forever. So what a journaling file system does is keep a journal that you can think of as a to-do list. Before it starts those three steps, it writes to storage that it's about to delete this file; then it starts doing the three steps, and once it has completed all three, it adds a little check box: I've done that operation, all three steps, successfully — it was all consistent and written out to disk. If you pull the power between any of these steps, that entry is still sitting in the journal, and the next time you power on, the file system can make itself consistent much faster, because it can say: oh, I was in the middle of deleting this particular file, so I only need to check that everything associated with deleting that file is consistent. It would just re-run those steps: oh, okay, that one was okay; oh, I crashed here, so I need to free its inode and then free all its disk blocks. The journal gives it a hint, so it can recover faster.
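To make the recovery idea concrete, here's a toy simulation in C. This is not how a real journaling file system is implemented — it's just the to-do-list logic from the lecture, with every struct and function name invented for the sketch; `crash_after` fakes losing power partway through the three steps.

```c
#include <stdbool.h>

/* Toy journal: one entry recording an in-progress delete. */
struct journal_entry {
    bool pending;   /* "to-do" written before the steps start    */
    bool done;      /* "check box" written after all three steps */
};

struct fs_state {
    bool dirent_present;   /* step 1: name -> inode link  */
    bool inode_used;       /* step 2: inode allocation bit */
    bool blocks_used;      /* step 3: data block bits      */
};

/* Delete with journaling; crash_after simulates losing power
 * after that many steps (3 = no crash). */
static void delete_file(struct fs_state *fs, struct journal_entry *j,
                        int crash_after) {
    j->pending = true;              /* write the to-do entry first */
    if (crash_after < 1) return;
    fs->dirent_present = false;     /* step 1: remove directory entry */
    if (crash_after < 2) return;
    fs->inode_used = false;         /* step 2: deallocate the inode   */
    if (crash_after < 3) return;
    fs->blocks_used = false;        /* step 3: free the data blocks   */
    j->done = true;                 /* all consistent: check the box  */
}

/* On reboot: any entry that is pending but not done gets replayed.
 * The steps are safe to redo, so we just run them all again. */
static void recover(struct fs_state *fs, struct journal_entry *j) {
    if (j->pending && !j->done) {
        fs->dirent_present = false;
        fs->inode_used = false;
        fs->blocks_used = false;
        j->done = true;
    }
}
```

For example, crashing right after step 1 leaves the inode marked used with no name pointing at it — exactly the inconsistency above — and `recover` finishes the delete without scanning the whole file system, because the journal entry says which file was mid-delete.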
You might notice this as an option whenever you reformat, say, a macOS hard drive: for the file system there's a choice between a plain one and a journaled one, and journaled essentially means this. It's a little bit slower, but if you run out of battery or something like that, your file system is going to be consistent — or at least a lot easier to fix. All right, any questions about journaling? So, we saw even more memory allocators today. The kernel restricts the problem: it only uses power-of-two sized blocks internally for memory allocation, and with that restriction it implements something called the buddy allocator — the real-world, restricted implementation. It just needs one free list for each level, and that makes searching and coalescing a lot faster. Then we saw the slab allocator, which is basically a giant array plus a bit per element to keep track of whether that element is in use; it's for fixed-size allocations, and then we have essentially no internal or external fragmentation. So we will work on lab six in the last half hour, and I will be here for any questions, or anything — or just for the dog. So just remember, Paul and Toya: we're all in this together.