Okay, good afternoon. That's kind of loud. All right, good afternoon. So we're going to do a bit of review. This isn't new content, but we will solve a mystery, and everyone loves solving mysteries. Then we'll get to time where you can work on your lab, I can help you, and we can wrap that up.

So, memory mapping. We touched on this a little bit, and we know the principles behind it, but we didn't actually explore the useful things we can do with it. Everyone loves large language models now, right? ChatGPT and everything like that. So there was this post on GitHub that got circulated around. There's a 30-billion-parameter model that takes about 20 gigabytes, and there's one implementation where people said: whenever I try to use the model and get some output out of it, my laptop doesn't actually use 20 gigabytes of RAM, which is the size of the model. It only uses 6.8 gigabytes. How is that possible? This is LLaMA, Meta's large language model (or Facebook; Meta's the new name). It's supposed to be an efficient implementation, and it's so efficient that people were ultimately really, really confused. Here's the post: how is this model only using this much RAM? How is this even possible? So we'll explore how this is possible, and you'll be able to answer the question better than anyone who posted in the replies.

So, really quick: there's a system call that has been used in the labs that we haven't actually explored, and it lets us control our own process's virtual memory. We can't modify our own page tables directly, but we can ask the kernel nicely, through a system call, to change the page tables for us. That's what you do with the mmap system call. It's short for "memory map," and you can use it to map files into your own process's virtual address space. The pointer you get back, which is a virtual address of your own, lets you access that file directly through memory addresses instead of doing read and write system calls. You don't have to do any system calls after the mmap; you just read and write memory, and it actually reads and writes the file.

So let's see an example. It's surprisingly small; here's the whole thing. In main, we know how to open a file, I hope: there's an open system call that says we're going to open this file. It's just a C file that opens itself, so it's essentially going to cat itself. It opens the file as read-only, and of course we get a file descriptor back. We can probably guess which file descriptor number we'll get, because 0, 1, and 2 are all used, so we can just make sure that we get file descriptor 3 back; we're just flexing our OS knowledge here. Then there's this struct stat and an fstat system call, and you know what this does if you've been good and you've done Lab 6: it gets all the information about the file that file descriptor points to. If you've done Lab 6, you know this actually reads the inode and gives you the fields from it. One of the things we care about is the size of the file, because you have to tell mmap how many bytes to map. After that, you can do the mmap.
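Just to make those first two steps concrete, here's a sketch; the filename is a stand-in of mine, since the real demo opens its own source file:

```c
/* Sketch of the setup: open a file read-only, then fstat it to
   learn its size. "mmap-example.c" is a stand-in filename. */
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    /* 0, 1, and 2 are stdin, stdout, and stderr, so the first file
       we open should come back as file descriptor 3. */
    int fd = open("mmap-example.c", O_RDONLY);
    assert(fd == 3);

    /* fstat reads the file's inode into a struct stat; st_size is
       the byte count we'll hand to mmap next. */
    struct stat s;
    assert(fstat(fd, &s) == 0);

    close(fd);
    return 0;
}
```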
So mmap takes six arguments, but a lot of them don't have to be used, or they're for permissions. The first argument lets you suggest which virtual address you want, if you're super picky about it; if you want the default, you just pass NULL. The next argument is the number of bytes you want to map, and we make it match the size of the file. That stat structure essentially has all the fields that would be in an inode; it also works for file descriptors that aren't regular files, but in this case this is a regular file, so it actually does use an inode. The next argument is the protection: for this we'll say our memory mapping should just be readable, since we're not going to modify the file. Then there are the flags. There's a private flag, MAP_PRIVATE, which means you want the memory mapping to be private to you; this is how one of your test cases worked. Another option is MAP_SHARED, which means that whenever you fork, the mapping is available in your child process as well, so you can actually share memory through this if you want. The next argument is the file descriptor you want to map, which is the file we just opened, but in general it can be any file you open. And then the offset, which is how many bytes into the file to start the mapping; if you pass zero, it means start reading at the beginning of the file.

All right. Any questions about that? We'll explore it as we go. After the mmap we're done with our file descriptor, so we can close it; we don't need it anymore. The kernel has set things up so that if we access memory through the pointer we got back, data, it's valid for length bytes, which is our size, and we can just read from it.

Yep. Yeah, normally you'd be using read and write system calls, right? If you wanted to read from a file, you'd use read: you have to set up a buffer, it transfers data back and forth, and it's kind of a pain. Normally, if you open a file, you have to read from it: read copies the contents of the file into your buffer, and if you call read again, it copies the next bytes, and so on. This doesn't copy anything. It says: take this file and map it into my virtual memory right there, so if I access that address, it actually accesses the file. There's no copy and no buffer involved. The address data points to is the start of the file, and you can just read it directly. No system calls, no mess, nothing like that. It's actually quite useful.

So the data pointer we get back points to the beginning of the file, and we can just write a for loop: to print every character in the file, loop from zero to however many bytes we have and print out each character, data[i]. That's all we do. Then we unmap it, which means we're done; munmap is essentially mmap's version of free. If we run this now, it should do pretty much what cat did before, except I don't even make a read system call. And if I run it, it just spits out the contents of the file.
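Here's the whole demo stitched together as I'd reconstruct it; again, "mmap-example.c" stands in for the file mapping itself, and the lecture's exact code may differ slightly:

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    int fd = open("mmap-example.c", O_RDONLY);
    assert(fd == 3);

    struct stat s;
    assert(fstat(fd, &s) == 0);

    /* NULL: no preferred virtual address. PROT_READ: read-only.
       MAP_PRIVATE: the mapping is private to this process.
       Offset 0: start mapping at the beginning of the file. */
    char *data = mmap(NULL, s.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    assert(data != MAP_FAILED);

    /* The mapping stays valid after the descriptor is closed. */
    close(fd);

    /* No read() calls: dereferencing the pointer reads the file. */
    for (off_t i = 0; i < s.st_size; ++i) {
        putchar(data[i]);
    }

    /* munmap is essentially mmap's version of free. */
    munmap(data, s.st_size);
    return 0;
}
```

Run it in the same directory as its source and it behaves like cat, with no read system calls after setup.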
Yep, yep, yep. Yeah, you can kind of tell what happens under the hood if you've done Lab 6. In Lab 6, all the contents of a file live in blocks that the inode points to. So what the kernel does is take a virtual page and say: this virtual page should actually map to this block of the file. Then if you try to access that memory, you get a page fault, and the kernel fills in the memory contents with the block so they match. It loads it into memory and does a whole bunch of steps for you. It's actually quite awesome; it saves you a lot of work.

Okay, so, any questions about that and how it works? Let's go over it real quick. Six arguments, like I said; I probably don't have to go over this again, but: a suggested address, where you generally just say NULL, don't care; however many bytes to map; protection flags; mapping flags; and the file descriptor and offset. By default this was originally set up so that you map a file, but if you want to use it just for allocating memory, because it's actually quite good at that, or for sharing memory, you can: one of the flags is MAP_ANONYMOUS. With the anonymous flag, the file descriptor doesn't have to be valid, and the mapping isn't backed by a file at all; it's just shared memory, or just plain memory you get. So you could ask for some pages and build your own slab allocator on top, or do whatever you want with it. It's a very sweet thing to have.

It's also really lazy. All mmap does is set up page tables, like I alluded to; it doesn't actually read from the file at the time of the mmap. Here's how it works. We've done Lab 5, so: the kernel creates an invalid PTE at whatever virtual address it decides to use, and since the PTE is invalid, it can use the rest of the bits in it for whatever the hell it wants. One of the things it can store there, if we're mapping a file, is the information that says this page maps to this block of the file. Then if the process tries to access that memory, you of course get a page fault, and, just like your copy-on-write handler in Lab 5, the handler reads the page table entry, except here it says: oh, this is supposed to be this block of the file. To resolve the page fault, it reads that block into memory, marks the PTE valid, since the page is actually in use now, and returns back to the program, which can then use the memory. So the file is read on demand: only the first access to each page faults, the kernel loads that part of the file into memory, and after that it just works.

The nice thing is that for a gigantic file, like that large language model, the kernel only reads the parts of the file that actually get used. If I mmap a 20-gigabyte file, the kernel has to set up the page tables for all of it, but if I only use parts of it, it only loads the parts I actually access.

So, going back to the question: how does a 20-gigabyte file end up using only 6.5 gigabytes of memory? Well, these large language models are gigantic, it's a 20-gigabyte model, but for any one question you ask it, the access pattern is really sparse; you're not actually going to use the whole model. Some parts of the model are dense and do get used, but a large portion is sparse and never gets touched. So if you mmap the file and then use it, only what's actually being used gets read into memory, and it turns out that works out quite well for this implementation.
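You can even watch this laziness from user space. This sketch is my own addition, not from the lecture: on Linux, mincore() reports whether pages of a mapping are resident, so you can check the first page before and after touching it (if the file is already in the page cache, the "before" check may already report it resident):

```c
#define _DEFAULT_SOURCE /* exposes mincore() on glibc */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    int fd = open("mmap-example.c", O_RDONLY); /* stand-in filename */
    assert(fd >= 0);
    struct stat s;
    assert(fstat(fd, &s) == 0);

    char *data = mmap(NULL, s.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    assert(data != MAP_FAILED);
    close(fd);

    /* mincore fills one byte per page; bit 0 means "resident". */
    unsigned char vec;
    assert(mincore(data, 1, &vec) == 0);
    printf("first page resident before touch: %d\n", vec & 1);

    volatile char c = data[0]; /* first access: page fault, and the
                                  kernel reads the block on demand */
    (void)c;

    assert(mincore(data, 1, &vec) == 0);
    printf("first page resident after touch:  %d\n", vec & 1);

    munmap(data, s.st_size);
    return 0;
}
```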
Go back; so that was actually the solution that caused this whole thing, which you should be able to understand now. If you go to the thread, someone says it doesn't do the impossible; so someone there has taken an operating systems course and knows it hasn't done the impossible, but it is using only 6 gigs of RAM. Then, let's see, someone says there's some voodoo going on. And here's the person who made the improvement; I'm glad you're happy. The thing that makes this possible: they're now using mmap to load the large models. That was the only thing they changed, and it brought the RAM usage down by about 15 gigabytes. So we can actually read this. It lets us read the weights without having to read() them, or even copy them, because it does that mmap thing: if you just read the memory addresses, you let the kernel handle it under the hood. It says the operating system, so clearly this person knows at least the stuff in this course, creates page table entries which reserve 20 gigabytes of virtual address space; crudely speaking, mapping 20 gigabytes requires 40 megabytes of page tables, ((20 * 1024 * 1024 * 1024 / 4096) * 8) / (1024 * 1024); the individual pages aren't actually loaded into memory or resident, blah, blah, blah.

So, this calculation: does it actually look correct? Which parts of it should be correct, since this person kind of knows what they're talking about? Does that make sense? I'll give you a hint: the whole file is 20 gigabytes. So there's the 20, and then 2^10 times 2^10 times 2^10, three times, which just means gigabytes; so that's 20 gigabytes. Why would they divide by 4096? Page size, yeah. That's our page size, so that's how many pages 20 gigabytes takes. And why do we multiply by 8? Yeah, that's the size of a page table entry. So: 20 gigabytes is how much memory we're mapping, it's divided up into pages, each page is 4096 bytes, and each page table entry takes 8 bytes. That's the minimum number of bytes of page table entries we'd need, and the last division, by 2^10 times 2^10, which is a megabyte, just converts it to megabytes. So yes, this is actually 40 megabytes.

So can we improve on this? Instead of "crudely speaking," can someone say exactly how much this takes? Because it's not quite right. Yeah? No? Yeah. Yeah, in a real system this would be a multi-level page table, right? So this number is how many L0 (leaf) entries you would need, which is the best-case scenario. So we need to split it up: figure out how many L0 page tables we need, then from the L0 count we can figure out how many L1s we need, and once we know how many L1s we need we're pretty much done, because we'll only have one L2 (root) page table.

So, for a bit of fun, because we should be exact, or I guess a bit sadistic: here are the calculations. Someone threw those numbers up later in the thread, and of course someone else couldn't read them, but since we've taken this course, we can. They clarified: it's the 20 gigabytes divided by the 4K page size, times 8 bytes per page table entry, divided by 1 KB, which has to be a typo, because it's supposed to be a megabyte. And then we kind of argued: that's just the leaf entries. So how much space do we actually need? That part is correct, though.

So if we take our 20 gigabytes, that's 20 times 2^30 bytes, and divide by the page size, 2^12, that means we do need 20 times 2^18 page table entries. From that we can figure out how many L0 page tables we need: take that number and divide it by 2^9. Why 2^9? Because that's how many entries fit in one page table, since a page table itself fits on a page: the page size divided by the page table entry size, 4096 / 8 = 512. This also assumes the best-case scenario, where all of your entries are located beside each other, so they don't get spread across extra tables; worst case it's super spread out and you have separate tables all over the place, but we'll assume everything is located together. In that case you need 10,240 full L0 page tables, which matches their 40 megabytes; so far so good. Then we know each L1 page table can point to 512 L0 page tables, so we can figure out how many of those we need: take the 10,240 L0 tables, divide by 512, and that's how many full L1 page tables we need, which is 20. So in total we need 10,240 + 20 + 1 = 10,261 page tables, not just the 10,240, and if you multiply by the page size and divide by 2^20, this is the really nerdy part, we actually need about 40.08 megabytes of memory in the best case. Why did I do that? I have no idea; I wanted to, and you should be able to do it too.
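To put the arithmetic in one place (my own notation, but these are exactly the numbers we just derived):

    Crude version (leaf entries only):
      pages to map   = 20 * 2^30 B / 2^12 B per page = 20 * 2^18 = 5,242,880
      leaf PTE bytes = 5,242,880 * 8 B = 20 * 2^21 B = 40 MB

    Exact best case (multi-level, mapping fully contiguous):
      L0 (leaf) tables = 5,242,880 entries / 512 per table = 10,240
      L1 tables        = 10,240 / 512                      = 20
      L2 (root) table                                      = 1
      total            = 10,261 tables * 4096 B / 2^20    ~= 40.08 MB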
Yeah, this 8 here? Yeah, sorry, the brackets are off there: it's supposed to be 20 gigabytes divided by the page size first, and then times 8 after that answer. If you just throw it into Python with the brackets, it calculates correctly; BEDMAS and all that.

It kind of makes sense too: if an L1 page table is full, it can map up to a gigabyte, since each of its 512 entries points to an L0 table, and each L0 table covers 2 megabytes of pages. So if we need 20 gigabytes, it makes sense that we end up with 20 L1 page tables, because each of them can translate a gigabyte of memory. And it's kind of neat to know: a full L1 table maps up to a gigabyte, my whole address space is 512 gigabytes, and my L2 (root) page table has 512 entries, so that all hangs together.

All right, cool. We can finish Lab 6 with the remaining time. Also, there's still chocolate; eat it so I don't have to bring it home. Yeah, that's it for me talking. You can work on Lab 6, ask me questions, whatever. I can shut this off and go around too. All right, just remember: we're pulling for you, we're all in this together.