OK. Sorry for being a little late. Today is our eighth recitation, and we're going to start with assignment 3. We'll go through an overview of assignment 3 today so you understand the big picture and can start working toward the three deadlines you have. Then I'll explain the virtual memory subsystem and the virtual memory interface. After that we'll go through dumbvm, what it provides and what limitations it has. And then I'll explain what you need for the design document. So, an overview. In assignment 3 you need to implement a virtual memory subsystem. The implementation, as you know, is divided into three incremental parts. First you implement physical memory management, which is the coremap. Then you move on to address space and TLB management, which is user paging. And then you implement swapping. We have three deadlines set for these three parts, so you'll have two weeks to finish each one. The next deadline is not this Friday but the one after, for a working coremap. So, the virtual memory subsystem. The virtual memory subsystem is a memory management technique that maps the virtual addresses used by a user process to physical addresses in the computer's memory. What the user process sees is a contiguous address space; that's the first point. The operating system manages that address space and assigns it to real memory, the physical memory in the computer. The hardware unit inside the CPU that is responsible for translating addresses is the memory management unit (MMU), and it translates virtual addresses into physical addresses. That is what we'll call, once you start implementing, TLB management.
So you're going to implement the TLB management that the MMU uses to translate virtual addresses to physical addresses. What are the benefits of virtual memory? It provides a contiguous address space for the user process, and it gives the process the illusion that it has the whole memory to itself, with smooth, contiguous allocation. It also provides memory isolation, which improves security. And it lets the system conceptually use more memory than is physically available, which is what we're going to do with swapping: whenever we run out of available pages, we swap some pages out and give those pages to whoever is requesting them. So this is the whole picture. You're going to implement a virtual memory subsystem, a technique that translates virtual addresses to physical addresses. The virtual addresses are the ones used by user processes, so each process gets the illusion that the full memory belongs to it alone, with no other process using it, and allocation looks smooth and contiguous. That's basically what the virtual memory subsystem is. Any questions? OK. So we have several interfaces. You already have a virtual memory subsystem implemented for you, which is dumbvm. It's called dumb because it doesn't do much, and you shouldn't even look at it when you implement your own virtual memory subsystem. Let me give you an idea of the three parts you need to implement, starting with the first step, physical memory management.
Then you need to implement address space and TLB management, and then swapping. We'll go through each one of them, and I'll give you a general idea of how they should work; next week we can go into the details of each one. There is a header file already there that we can look at, which is vm.h. In vm.h you have basically three interfaces. You have vm_bootstrap, which is called when the kernel boots up, and where you do your various initialization. For example, whatever data structures you implement for your virtual memory, you need to initialize them through vm_bootstrap. There is a collection of bootstrap calls already in main.c, I believe; let's look there. In main.c, in the boot function, you'll see all these bootstrap calls already in place. vm_bootstrap is already there, but it doesn't do anything; we'll come back to this when I explain dumbvm. This is the place where you do all your initializations: whatever data structures your VM uses, you initialize them here, whether in that function or somewhere else. Sometimes, as you may know, the coremap needs to be initialized before the other data structures are; there are two or three posts by the professor on the course forum explaining why that is. So, back to vm.h. Besides vm_bootstrap, we have the vm_fault function, which handles whatever fault happens, whether it's a page fault or a TLB fault. We also have alloc_kpages and free_kpages.
So basically, alloc_kpages allocates a chunk of one or more pages, and free_kpages does the opposite: it frees them rather than allocating. If we look at the signatures, alloc_kpages receives the number of pages, which makes sense: I say I need 10 pages allocated, and it allocates them for me. But free_kpages receives only an address. This is one of the things you need to think about: I'm only given an address and told to free the pages starting there. So how many pages should I free? That's something you need to figure out and implement; your data structure should be able to retrieve that information. [Student] Say you try to allocate 10 pages. What should alloc_kpages do? Right now I just have a panic; I'm not really sure what else to add. [Instructor] You mean when you allocate? [Student] Yes, you try to allocate 10 pages. What should happen? [Instructor] alloc_kpages is the kernel allocator, and the allocation has to be contiguous. Do you mean there are 10 contiguous pages available, or there aren't? [Student] There are not. [Instructor] If there aren't 10 free contiguous pages, then, as I recall, when you don't have enough free pages you should swap. [Student] You mean before the third deadline? [Instructor] Yes, that's something I need to double-check, because in general, for assignment 3, that is what swapping will do. This is one of the discussions already on the course forum, because previously you were able to use the dumbvm way of allocating pages for yourself before getting to swapping.
Or before you get swapping done. But right now, because we are providing tests for memory leaks, we cannot allow dumbvm to do that allocation. The problem is that up to the point where vm_bootstrap is called, you're using the dumbvm way of allocating, and whatever you allocate that way is never freed, so you leak memory. If you go through the recent post on the course forum, there's a discussion on this that tells you exactly what you need to do about it. This is one of the differences we have this year from last year. The problem is that before vm_bootstrap you cannot use kmalloc, because the coremap isn't initialized yet; but if you don't want to leak memory, you need kmalloc to be ready, and that means the coremap must be initialized. That's why I said you need to make sure the coremap is initialized at a very early stage of kernel boot. For your case, I need to double-check what should happen before these deadlines, before this part is done. Sorry, I also need to edit some of the slides, so I'll post an updated version later today or tomorrow. The TLB shootdown functions, vm_tlbshootdown and vm_tlbshootdown_all, basically flush TLB entries; you don't need to worry about those yet. What you need to worry about for your first deadline is alloc_kpages, free_kpages, and vm_bootstrap. vm_fault and the TLB shootdown functions are for the second and third deadlines. Let's see if there's anything else to go through here. OK. So this is physical memory management, the first part you need to implement, for the first deadline. You should have a coremap implemented here.
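To make the free_kpages problem concrete, here is a toy user-space sketch, not OS/161 code: every name and address is made up. It shows one way your data structure can remember how many pages each allocation spans, so that freeing by address alone is possible.

```c
#include <assert.h>

#define PAGE_SIZE 4096UL
#define NPAGES    32          /* toy physical memory: 32 pages */
#define TOY_BASE  0x10000UL   /* fake base address so 0 can mean "failed" */

/* Per-page table: 0 = free; n > 0 = start of an n-page chunk; -1 = in use. */
static int alloc_len[NPAGES];

static unsigned long toy_alloc_kpages(unsigned npages)
{
    /* First-fit search for a contiguous run of free pages. */
    for (unsigned start = 0; start + npages <= NPAGES; start++) {
        unsigned run = 0;
        while (run < npages && alloc_len[start + run] == 0)
            run++;
        if (run == npages) {
            alloc_len[start] = (int)npages;         /* remember the length */
            for (unsigned i = 1; i < npages; i++)
                alloc_len[start + i] = -1;
            return TOY_BASE + (unsigned long)start * PAGE_SIZE;
        }
        start += run;    /* skip past the pages we already scanned */
    }
    return 0;            /* no contiguous run found */
}

static void toy_free_kpages(unsigned long addr)
{
    unsigned start = (unsigned)((addr - TOY_BASE) / PAGE_SIZE);
    int n = alloc_len[start];        /* the length lives in the data structure */
    for (int i = 0; i < n; i++)
        alloc_len[start + i] = 0;
}
```

The point is only the bookkeeping: free_kpages gets an address, looks up the chunk length recorded at allocation time, and clears exactly that many pages. Your coremap can store this per-page information however you like.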
So the second part is address space management. You should already be familiar with the address space; we went through the addrspace header file before, for the process syscalls. The interfaces we have are as_create, as_destroy, and as_copy. as_copy is the one you should have used with fork. We also have as_activate and as_deactivate. as_activate makes the current process's address space the one seen by the processor, so its TLB entries are the ones in effect; as_deactivate unloads it when the process is switched out. So whenever a process wants its mappings active in the TLB, as_activate is called, and I think you should also have used that in one of your syscalls. as_define_stack defines a stack region in the address space and returns a pointer to the top of the user stack. The new ones you're going to deal with are as_define_region, as_prepare_load, and as_complete_load. Let's go through addrspace.h, where we can explain things better. OK. This is the addrspace header file, and this is the addrspace structure, which we'll come back to later. Let's go through the functions. As we said, as_create creates an empty address space. as_copy creates an exact copy of the address space passed to it. as_activate makes the current process's address space the one currently seen by the processor, which means activating its entries in the TLB. as_deactivate unloads the current process's address space, so it isn't currently seen by the processor. as_destroy destroys the address space. Now, as_define_region: if you remember the process model diagram I showed you, the address space contains different regions. We have a stack region, a data region, a code region, and a heap region.
So whenever you want to define a region, you use as_define_region, and we'll go through an example showing how it's used. as_prepare_load: when you want to load an executable file, you call this function before you load data into the data segment, or whatever content you're loading from the executable into the regions of the address space. You call as_prepare_load before loading data into the address space, and once you're done loading, code, data, whatever it is, you call as_complete_load. One of the main examples of using these functions is loading an ELF file, and that's what we'll look at next; it's the load_elf function, in loadelf.c. load_elf is called whenever we want to load an executable into the current address space; as the comment says, it loads an ELF executable user program into the current address space. There is a for loop in that function. This loop goes through the segments defined in the executable and finds out how many regions this user process needs. As I said, there are different regions: a data region, a code region, a heap region, and there might be others too. For each region it goes through, as you can see, it calls as_define_region, passing an address, a size, and some flags, which are the permissions on that region. For example, a data segment you can read and write, but a code segment you cannot write to: it should be read-only. Otherwise you would overwrite the user's code by writing to it. So this is how we define regions.
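Here is a toy harness that compiles as a normal user program, mimicking the as_define_region signature from OS/161's addrspace.h. The region list and page-rounding inside are my own illustration of what an implementation might record; your real addrspace layout is up to you.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t vaddr_t;     /* stand-in for the kernel's vaddr_t */

#define PAGE_SIZE   4096
#define MAX_REGIONS 8         /* illustrative cap; yours should be variable */

struct region {
    vaddr_t vbase;
    size_t  npages;
    int readable, writeable, executable;
};

struct addrspace {
    struct region regions[MAX_REGIONS];
    int nregions;
};

/* Record a region with its permissions, rounding it out to whole pages. */
static int as_define_region(struct addrspace *as, vaddr_t vaddr, size_t sz,
                            int readable, int writeable, int executable)
{
    if (as->nregions == MAX_REGIONS)
        return 1;                           /* toy stand-in for an errno */

    sz += vaddr & (PAGE_SIZE - 1);          /* include the sub-page offset */
    vaddr &= ~(vaddr_t)(PAGE_SIZE - 1);     /* align base down to a page  */
    sz = (sz + PAGE_SIZE - 1) & ~(size_t)(PAGE_SIZE - 1);   /* round size up */

    struct region *r = &as->regions[as->nregions++];
    r->vbase = vaddr;
    r->npages = sz / PAGE_SIZE;
    r->readable = readable;
    r->writeable = writeable;
    r->executable = executable;
    return 0;
}
```

This mirrors what load_elf does per segment: one as_define_region call with the segment's virtual address, size, and read/write/execute flags, so a code segment ends up read-only and a data segment read-write.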
We give the address, the size, and the permissions for that region. Once we've defined the regions, then, as you can see, we call as_prepare_load, which is called before loading content into the regions. At this point I've defined the regions and I know how many there are; now I need to load their contents from the executable, and before that I call as_prepare_load. Then I start loading whatever data or content I have from the executable, which is done through load_segment, as you can see here. Once I'm done loading all the content into the regions, I call as_complete_load, which means I'm done loading content from the executable into the regions I defined in the user address space. So that's part of the second part: you have address space management, the whole as_* set of functions you need to implement, and you also have the TLB management. Any questions on the address space management part? Yes? Yes, address space and TLB management is part two. We also have one more syscall you need to implement, which is sbrk, and this is for the heap. sbrk means "set process break", in other words, allocate memory. If I define a region for the heap, its initial size is 0, and as sbrk is called, memory gets allocated on the heap through that syscall. You can go through the details in the man pages; let's look. This is the sbrk man page. It says "set process break (allocate memory)". It receives an amount. The break is the end address of the process's heap region, and the sbrk call adjusts the break by that amount, which basically means allocating memory on the heap. You can go through the details here.
Here it tells you what I told you: the heap region is initially empty, so at process startup the beginning of the heap region is the same as its end, and that end may be retrieved using sbrk(0). As you call sbrk, passing the amount of memory you need on the heap, it starts allocating that memory for you. OK. The third part is swapping. We don't have anything for it yet in OS/161; there's no slide for it either. I've presented the VM interfaces that already exist in OS/161; you need to add to those, and for swapping, nothing currently exists. That's what you need to implement. So much for the VM interface. This is what virtual memory is used for: we provide an illusion to the user through the address space, as if it were the size of the physical memory we have, and we handle the translation from the user's virtual address space to physical memory, which is what the MMU does for us using the TLB management. So this is basically the VM, the VM interfaces that currently exist, and what you need to implement. Any questions on the VM interfaces? OK. Let's go through dumbvm. As I said, we have a virtual memory system already implemented for us, but it's dumb: it doesn't do anything, or rather, not much is done, which is the more accurate phrase. Let's start going through it. We'll go through all three parts in dumbvm, physical memory management, the user address space, and swapping, and see what we have, what should be done, and what is currently happening. So let's open dumbvm. For dumbvm, we again have the same interfaces we had for physical memory management, alloc_kpages and free_kpages. But what is implemented there?
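The sbrk semantics described above, sbrk(0) returning the current break and sbrk(n) growing the heap by n while returning the old break, can be tried in an ordinary POSIX program. This is host-side POSIX sbrk, not the OS/161 syscall you will write, but the user-visible behavior you need to implement is the same at this level.

```c
#include <assert.h>
#include <unistd.h>

/* Grow the heap by `amount` bytes via sbrk and return how much the
 * break actually moved.  sbrk(0) queries the break without changing it;
 * sbrk(amount) moves it and returns the OLD break, i.e. the start of
 * the newly allocated chunk. */
static long grow_heap(long amount)
{
    void *old_brk = sbrk(0);            /* current end of the heap */
    void *chunk   = sbrk(amount);       /* grow; returns the old break */
    void *new_brk = sbrk(0);
    assert(chunk == old_brk);           /* new memory starts at old break */
    return (long)((char *)new_brk - (char *)old_brk);
}
```

So a heap that "starts empty" just means the initial break equals the start of the heap region, and every sbrk call since then accounts for all allocated heap memory.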
We're going to see that. For vm_bootstrap, nothing is implemented, so nothing is initialized. If we look at alloc_kpages, it basically calls getppages to get physical pages of that size, and getppages steals memory. So what's happening in dumbvm is this. Here is the physical memory we have. When we boot the kernel, the kernel takes its space from physical memory, and what's available for us to use starts right after the kernel's memory; that's where the first physical address pointer points. The last physical address points at the last page in physical memory. Whatever lies between the first address and the last address is what we have available. What dumbvm does is: when you call alloc_kpages, it allocates the pages for you, and when you call free_kpages, it does nothing. So if I allocate 10 pages, it allocates 10; if I then allocate 5 more, it allocates them, and 15 pages are now gone. If I call free_kpages with dumbvm, as you can see here, it does nothing, which basically leaks memory. That's what's currently happening: it keeps advancing the first-physical-address pointer until it reaches the last physical address, and once it reaches that, it panics that we are out of memory. If we look at getppages, as you can see here, it calls ram_stealmem. What happens in ram_stealmem? It takes the number of pages, computes the size as the number of pages times the page size, and keeps advancing. Once the first address plus the size is greater than the last address, it returns zero, which leads to the panic. So this is what's currently happening.
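The bump-pointer stealing just described can be re-created as a toy; the real code is ram_stealmem in the OS/161 source, and the addresses below are made up for illustration. Note there is no free path at all: the pointer only ever moves forward.

```c
#include <assert.h>

#define PAGE_SIZE 4096UL

/* Toy physical memory: 16 pages between these two made-up addresses. */
static unsigned long firstpaddr = 0x00080000UL;  /* first free byte after the kernel */
static unsigned long lastpaddr  = 0x00090000UL;  /* one past the last usable byte    */

static unsigned long toy_stealmem(unsigned long npages)
{
    unsigned long size = npages * PAGE_SIZE;

    if (firstpaddr + size > lastpaddr)
        return 0;                       /* out of memory: the caller panics */

    unsigned long paddr = firstpaddr;
    firstpaddr += size;                 /* bump the pointer; never undone */
    return paddr;
}
```

Every allocation permanently consumes pages, so once the pointer reaches the end, the system is out of memory for good. That is exactly the limitation your coremap removes.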
You might wonder: we were allocating memory using kmalloc and kfree, and it was freeing memory. As long as the allocation is smaller than a page, that is handled for you; but once you exceed a page, it is not. So allocating a large chunk of pages is currently a problem: if it's more than what's available, you get nothing, and if it's less, you get it, but as you keep allocating, once you exhaust the available size, the space between the first and last address, dumbvm basically panics. That's what's happening in dumbvm for physical memory management. One more thing dumbvm does, and that you should also take care of: in getppages, you can see that it acquires a lock. You need a synchronization primitive around the physical memory. Why? Because physical memory is shared system-wide; it's a shared resource, so you need to protect it using one of the synchronization primitives you've implemented. So this is how dumbvm handles physical memory management; it's very basic. Please don't copy from it. Many students used to go through or copy the dumbvm file and then modify it based on what dumbvm had. You shouldn't do this; you shouldn't even look at dumbvm when you implement your physical memory management or your virtual memory subsystem. Don't follow what is there. I'm explaining it here only so you get an idea of how things currently work in OS/161, so you can go back and see how to improve it.
So as we said, the limitation of dumbvm's physical memory management is that it doesn't free what is allocated; there is no reclaiming or recycling of pages. It keeps stealing until it runs out. How do we improve this? By implementing the coremap. How should we think about the coremap? The coremap is a data structure that keeps track of the physical pages you have: all the information you need to keep about a page, you keep in the coremap. It's similar to, say, the process table: the process table tracks all the information about a process, and the coremap is a data structure that tracks whatever information you need about every physical page. What kind of information? For example, whether the page is available or not. I don't recall the full list off the top of my head; that's what we'll go through in detail next week. But in general, the coremap keeps whatever per-page information you need. Also keep in mind that the coremap has a fixed size. When your kernel boots, you figure out how much memory, and therefore how many pages, you have, and you allocate the coremap at that size, fixed from then on. It's not like the process table or the file table, which weren't of fixed size. You have that number of pages, so you allocate that many entries for the coremap. This is one of the steps you need to do. As we said, after the kernel comes the first address; when you initialize your coremap, you need to compute how much space the coremap itself needs, and then shift that first-address pointer to point past the end of the coremap's allocation.
So your usable physical memory will be everything between the end of the coremap and the last address; the additional piece here is the coremap itself. This is one of the things you need to do: figure out how much memory the coremap needs, allocate it, and shift the first-address pointer so it points past where the coremap ends. Yes? [Student question.] So, in the kernel, the kernel doesn't go through the TLB. The TLB is used for user addresses; that's why, once we have the TLB handling implemented, we can allocate user pages in different places and the TLB will handle finding them. But the kernel has no TLB translation, so the kernel has to allocate contiguous pages. Say you want four pages: if you can't find four contiguous pages, that's where swapping comes in, and then you need to decide which pages to swap out in order to make those four pages available. So for alloc_kpages and free_kpages, the allocations have to be contiguous, because these are kernel functions and kernel addresses aren't translated through the TLB. For the user, as I said, you can allocate pages in different places and the TLB translation will let the user find them; but in the kernel, whatever you allocate has to be contiguous for the number of pages requested. If you don't have that many, you need to swap out, and before you swap out, you need to figure out what to swap out. These are all things you need to think about.
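Here is a sketch of the boot-time sizing step just described: how big the coremap itself is, and therefore how far to bump the first free address past it. The entry layout and every name here are assumptions for illustration; what you actually track per page is your design decision.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096UL

/* Hypothetical per-page record; real fields are up to your design. */
struct coremap_entry {
    int      allocated;      /* is this physical page in use?          */
    int      kernel;         /* kernel page (so not evictable)?        */
    unsigned chunk_pages;    /* at a chunk start: allocation length    */
};

/* Given usable RAM [firstpaddr, lastpaddr), report how many pages it
 * holds and how many bytes the coremap needs, rounded to whole pages. */
static unsigned long coremap_space(unsigned long firstpaddr,
                                   unsigned long lastpaddr,
                                   unsigned long *npages_out)
{
    unsigned long npages  = (lastpaddr - firstpaddr) / PAGE_SIZE;
    unsigned long bytes   = npages * sizeof(struct coremap_entry);
    unsigned long rounded = (bytes + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1);
    *npages_out = npages;
    return rounded;
}
```

At boot you would then do, conceptually, `firstpaddr += coremap_space(...)`, so the coremap sits right after the kernel and the pages it describes begin after it.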
So as I said, the coremap comes right after the kernel, and that's where its initialization happens. Any questions on physical memory management for dumbvm? OK. Next, the user address space. If we look at the address space structure under dumbvm, this is what we have; I've brought it up here. What does it mean? We have a virtual base 1, physical base 1, and number of pages: the virtual base, physical base, and size of the code segment. We have another vbase, pbase, and npages set for the data segment. And we have the stack pbase, the pointer to the base of the stack. As you can see, dumbvm has no heap. It defines exactly two regions and that's it: one code region and one data region, and no more than those two. So if an executable has more regions, dumbvm cannot handle it; dumbvm supports two regions, by count, no more. For each region it has the physical base, the virtual base, and the size of that region. That's dumbvm; as you can see, what you need to support is a variable number of regions instead of a fixed one, plus a heap, which you'll implement together with sbrk, and a variable-size stack. One of dumbvm's limitations is that when it allocates a stack, it allocates, say, a fixed number of pages; if the user process only needs one page, dumbvm still allocates all of them, which wastes memory. So we have a fixed-size stack, and if you go beyond it, the process just faults: the user stack can't grow past the defined size.
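For reference, here is the dumbvm addrspace structure as declared in OS/161's addrspace.h when dumbvm is enabled. I'm reproducing it from memory, so treat the exact field names as approximate; the typedefs stand in for the kernel's vaddr_t/paddr_t so this compiles as a normal program.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t vaddr_t;
typedef uint32_t paddr_t;

/* dumbvm's entire notion of an address space: exactly two regions
 * (code and data) plus a fixed-size stack, each physically contiguous.
 * No heap, no region list, nothing else. */
struct addrspace {
    vaddr_t as_vbase1;       /* virtual base of region 1 (code)   */
    paddr_t as_pbase1;       /* physical base of region 1         */
    size_t  as_npages1;      /* size of region 1, in pages        */
    vaddr_t as_vbase2;       /* virtual base of region 2 (data)   */
    paddr_t as_pbase2;       /* physical base of region 2         */
    size_t  as_npages2;      /* size of region 2, in pages        */
    paddr_t as_stackpbase;   /* physical base of the fixed stack  */
};
```

Three hard-wired regions' worth of fields and nothing more: that is the whole structure, which is exactly why a third ELF segment, a heap, or a growable stack cannot be represented.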
Where do you find the defined size? Here it is: DUMBVM_STACKPAGES, which is 18 pages, that is, 72K. This is the maximum stack size you can have. OK, so how do we improve it? You need the ability to define a variable number of regions, so if you have more than two regions you can handle them, and a variable stack size: the stack size shouldn't be fixed. OK, let's go through address translation in dumbvm. We know the stack starts at address 0x80000000 and grows down, and the heap grows up. Address translation in dumbvm goes like this. We have the vbase, the pbase, and the size of each region. Say we get a VM fault: vm_fault is passed a fault address, which you can see here. The first thing to do is figure out which region this address belongs to. How? Using the vbase plus the size of the region, you can check whether the faulting address falls inside it. Once you've identified which region the faulting address belongs to, you can compute the physical address. How does dumbvm do it? Like this: it takes the address, subtracts the vbase, and adds the physical base. So if the address is somewhere inside the region, taking out the vbase and adding the pbase gives the physical address. This way of translating only works for dumbvm; you shouldn't use it, or even look at it, when you implement your own address translation. How should we improve it? By implementing a page table.
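The dumbvm-style translation just described can be written out as a toy; the real code lives in dumbvm's vm_fault, and the addresses below are made up. It only works because dumbvm backs each region with physically contiguous pages, which is precisely what a real page table lets you avoid.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t vaddr_t;
typedef uint32_t paddr_t;

#define PAGE_SIZE 4096u

struct region {
    vaddr_t  vbase;      /* virtual base of the region   */
    paddr_t  pbase;      /* physical base of the region  */
    unsigned npages;     /* region size, in pages        */
};

/* Returns the physical address, or 0 if the fault address is outside
 * the region (0 stands in for "fault" purely in this toy). */
static paddr_t dumbvm_translate(const struct region *r, vaddr_t faultaddress)
{
    vaddr_t vtop = r->vbase + r->npages * PAGE_SIZE;

    /* Step 1: does the faulting address fall inside this region? */
    if (faultaddress < r->vbase || faultaddress >= vtop)
        return 0;

    /* Step 2: subtract the virtual base, add the physical base. */
    return (faultaddress - r->vbase) + r->pbase;
}
```

The fatal assumption is the single `+ r->pbase`: the whole region must be one contiguous physical chunk. With a page table, each virtual page gets its own physical frame, so this arithmetic shortcut disappears.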
The page table is basically the data structure that translates virtual page addresses into physical page addresses for you. Swapping: we have no swapping in dumbvm; it simply panics once it runs out of memory. How do we improve it? By swapping pages out once we don't have enough free pages available, or rather, enough free contiguous pages. That's swapping. And finally, we have the design document. Please, this time, do the design document. Today we went over an overview. I could have started with the coremap today, but that wouldn't have been right, because you should first have an idea of what the whole of assignment 3 is about; once you have that, you can go through all these files and understand what should be implemented and how. Assignment 3 has a lot of dependencies, so make sure to write your design document before you start implementing. This is why, in assignment 2, up until the last two days we were still getting questions like "what is a file table?" and "how should I allocate a file table?". If you don't have a design document, you won't really be able to finish assignment 3 in its entirety. So make sure your design document covers the data structures and how you're going to implement them, the coremap, the address space, and the page table; the functions, vm_fault and the whole as_* set of functions; and also the synchronization that needs to be used. Thank you.