All right, welcome back to one week before your operating systems final. Isn't that scary? So, lab six, I told Satau to start marking them ahead of time so that if you're done, you get your mark back, you know you're done, and you don't have anything to worry about for the week leading up to it. So, ask away, otherwise, you know, watch the YouTube recording. Anything we want to particularly go over? Otherwise we can go over final questions, or, per an earlier request, do our midterm virtual memory question. Any preferences for what I do? Someone spoke up first, so let's quickly go over our virtual memory question. So, system page size of 64 kilobytes. Here I told you what the size of one kilobyte is in powers of two, just in case you forgot. 64 is two to the six, so two to the six times two to the ten is two to the sixteen. So that is our page size. PTE size is sixteen bytes, which is two to the four. The system supports a giant physical address, and that relates to how big our page table entry size needs to be; that's why it's sixteen bytes. Here are some truncated page tables, which just means I don't show you all the entries, just the first few. First question: how many page table entries can you fit in a single page? Well, page size divided by PTE size: two to the sixteen divided by two to the four, which is two to the twelve. Good on that. All right, so if we only have a single-level page table that fits on a page, what is the maximum size in bits of the virtual address we can support? Well, it would be VPN, it's always VPN and then offset. If we only have a single level of page table, well then guess what? We just have a whole L0 here.
And the number of bits we need for our index: two to the what power gives how many entries I can fit on a page, since that's how many things I have to index in my page table. So, the index would be twelve bits, and the offset would be sixteen bits because of the page size. If I add them together, that is twenty-eight. All right, so then for the system with a single-level page table, assume the root is at PPN A. Remember, the physical address always looks like this as well: PPN, then offset. So in this case, if it starts at PPN A, that means this is A, and it fills the entire page, so the offset can be anything because the whole page table fits on that page. If we note the starting address of it, we start at offset zero, just like arrays start at index zero. In this case we have a sixteen-bit offset, so that's four hex zeros, meaning it starts at address A0000. So, if we try to translate this address, where this is the virtual page number and this is the offset, well, we don't have to translate the offset, and this is where we would look up our L0 index. We would look it up at index one, so the page table at A0000 is this one, and index one is this entry. We see that it is valid, so we can use it for translation, and the PPN is E, so that is our final translation. We just take the VPN, replace it with the PPN, which is E, and keep the offset the same, 7777. Good so far? All right. So, same setup as above, except I'm using this virtual address. All that changes is the L0 index; otherwise it's exactly the same. So, if I try to translate that address, same thing: I go to the same page table and I look at index two. Oh, guess what, its valid bit is zero, so I can't use it for translation. I'd get a page fault, and in my program I'd probably seg fault or something like that.
So, I would seg fault there because the valid bit is zero. Still good, still with me? All right, next one. So, we do this with two levels of page table, same thing, each one fits on a page: maximum size of the virtual address. Now we have an L1 index, an L0 index, and then our offset. The offset is still 16 bits, and each index is now 12 bits. So we just add them all up, and we get 40 bits. All right, so now for the two-level page table, assume the root page table of process 100 is PPN B. That means it starts at address B0000. So I'll just draw it here: this is the L1 page table for process 100. So, if we translate this address, 17777, what physical address do we get? Well, if we draw it all out and fill in all the padding, everything has leading zeros. So 000 would be our L1 index, then 001 our L0 index, then 7777 our offset. My writing is getting bad. All right, so in this case my L1 index is 0, so I go look at the L1 page table at index 0. That says the PPN is D and it is valid. So that means the L0 page table for process 100 starts over here. And if I go back and look, well, the L0 index was 1, and that corresponds to this entry. Since it's in my L0, that's the final PPN I use for that virtual address: it is 9 and it is valid. So I would just substitute the whole VPN for the PPN, and I would get 9 and then the offset 7777, so 97777. Okay, all good so far? All right, same system for question 24. Assume the root page table for process 101 is C, so it would start over here; the L1 for process 101 is at C. If we translate this virtual address, what physical address do we get? Our L1 index is 1, our L0 index is 2, and our offset is 4789.
So, in this case, if we look at the L1 index, we get this. At index one, it says it's valid and says the PPN is D. So it turns out that the L0 for process 101 is also here. Then we have to do the rest of the translation. The L0 index is 2, so we look at this entry. It is currently not valid, so we would get a page fault. If it were valid, then our final address would be 84789, but it's invalid, so we get a page fault at our L0 lookup. All right, good, any questions about that? So, now it asks: given our processes with those root page tables, are there any issues with making sure they have completely independent memory? Explain why or why not. Well, there is an issue, because they're both sharing an L0 page table. So if you allocated memory in one process, it'd be allocated in the other process too: if it got a physical page, you could see it in the other process. They're sharing pages, and since they're sharing the page table, those translations will be the same. So, in process 100, anything with an L1 index of 000, and in process 101, anything with an L1 index of 001, maps to the same L0 page table. That means any address with the same L0 index in both of them will map to the same physical page, because they use the same page table. So it'll always map to the same physical page, as long as their indexes are the same. And if I have this one and that one, those also map to the same physical page because they're using the same L0 page table. Good, any questions about that? Yep, so the question is: why did we say the system can support a 128-bit physical address? It wasn't even to trip you up, it was to justify the PTE size.
The PTE size is 16 bytes, which is gigantic. So remember, what two things does a PTE have to have in it? Yeah: the physical page number and permissions, and the permissions should at least include a valid bit. So in that case, well, basically I wanted them all to be nice hex numbers, and I wanted to justify why the PTE size was so big. 16 bytes is equal to 128 bits, and my PPN size would be the physical address size minus the offset bits: 128 minus 16, so I'd need 112 bits for my PPN. Then I have 16 bits left over for permissions and stuff like that, so it's actually justified. If I said it supported 128-bit physical addresses with only an 8-byte PTE size, then it'd be like, okay, that's too small, that's impossible; which was a question before. All right, anything else we should do? If not, yeah: locks and data races, all right, that's a good subject. Everyone's favorite. So, a good lock question. That doesn't look like a lock. That doesn't look like a lock. Oh, that looks like a lock. All right, want to do this one? Here, I will give you some time to read it, and then we can see if we want to do it or skip it. We want to do it? All right. So, code snippet: we have three threads. Thread one locks mutexes one and two, thread two locks mutexes two and three, and thread three locks mutex three then one. So we have three threads, three independent mutexes, and each thread executes a different function. First question: identify and explain a deadlock condition that can occur in this program. Can this deadlock? Sure, yeah, we have a circular wait. We could have the case that thread one locks M1, I'll just write it shorthand, then we execute thread two and it locks M2, then we execute thread three and it locks M3. Now, in this case, no thread can proceed further. Thread one cannot proceed because it's trying to acquire mutex two, but thread two has mutex two, so it can't make any progress.
Thread two is trying to acquire mutex three, but it can't get mutex three because thread three has it, so it can't make any progress. And finally, thread three is trying to get lock one, but thread one has lock one, so it can't make any progress. No threads can make progress, therefore deadlock. Or if you want to give this a name, this is a circular wait. All right, good, we definitely have a deadlock. And no, if you didn't write that it was a circular wait, you did not lose marks, as long as you explained what the deadlock was. Generally, how I tell the TAs to grade is: if they understood the concept, full points; if they didn't, well, use their judgment. So you didn't have to say "circular wait" as long as I knew what you were getting at. All right, the next question says: propose a solution to avoid the deadlock in this code. Modify the code, describe your proposed solution, and briefly explain how it addresses the identified deadlock. All right, how would we fix this? Yeah, so you try, and if you fail, you unlock and then you wait. So you unlock whatever you already got, so M1, wait, and then lock M1 again, and you do that for each of them. So you eliminate the whole hold-and-wait thing: if I don't get both locks, I give up the first lock I got. What's another way to fix this that I can do in one line, just swapping around two lines? Solution two: switch them both? Yeah, if I only switch around thread three's two lock calls, then I also do not have a deadlock. Why don't I have a deadlock in that case? Yeah, well, they won't be fighting each other for a lock. I essentially eliminated the circular wait because I always acquire locks in the exact same order. If I always acquire locks in the same order, I won't have the circular wait. The condition for the deadlock was that lock three was held while trying to acquire lock one. So if I swap the order so that I always get lock one first, I don't have a deadlock. No, just the order between all the mutexes.
So I always acquire mutex one first, then mutex two second, and then mutex three third, and if a thread doesn't need one of them, it still acquires the ones it does need in that order. In that case, if I just swapped around those two lines: in this one I would acquire M1 then M2, in this one I would acquire M2 then M3, and in this one I would acquire M1 and then M3. So always in the same order: one has the highest precedence, followed by two, and then three. Yeah, all the threads; it has to be consistent across all the threads, because otherwise you could have a circular wait between them. All right, here's a bonus question. If I only have thread one and thread two, is that a deadlock? No, right? I don't have a hold and wait. The only lock they share is mutex two; the others aren't shared, so whoever gets mutex two can get its other lock with no contention, since no one else is waiting on it. So there'd be no case for a deadlock. All right, cool, bonus question. Yeah? If M3 was M1? Yes, that would be a deadlock. Yeah, if you swap the order: if we just have thread one and thread two, and they're both ordered like that, essentially whoever gets lock one is guaranteed to get lock two, so we don't have a deadlock. But if we swap it around just like this, wow, that was really bad writing, sorry; well, thread one could get M1, thread two could get M2, and now they're waiting on each other. All right, holy crap, just swapping two lines, that was six points. Sweet. All right, is that it for this one? Yeah? All right. So here, let me start deleting stuff. So pretend you didn't see it. All right, you didn't see any of that. Still can't see it. OK, I should probably use something better. Oh, no, that's not it. Yeah, I should have.
This one I screwed up: I made it in such a way that I finished it, ran through it, printed it, and then did the solutions on it, and had no good way of going back. So yeah. So, question: will the final exam have a lot of short answer like the midterm? No. I think the midterm had five short answers at 10 marks each? No. The short answers here are worth, I believe, between three to five, if I remember. Yeah, so there's going to be several topics, and they'll say at the top of the page what it is, exactly like this. The semaphore one is 15 points for all of it, and it has multiple questions throughout. They usually don't carry over. There's one mega question that starts over fresh: halfway through, it just starts over, but it's the same kind of concept. Other than that, it's the same idea: tell me different things about it. Usually they try to be independent, but for anything that's not, if you made a mistake before, you don't get penalized for it twice. All right, so, wow, that looks terrible. All right, so you were given this code. We have eight threads. We've got a semaphore that we have no idea what it's initialized to. We have a run function with a call to initialize everything and then a call to initialize thread, and we have this condition: we want all cores to execute this only after initialize everything finishes, and we only want thread zero to execute initialize everything. So initialize thread should run only after initialize everything happens, and only one thread should call initialize everything. So we have a little bit of an ordering problem here. It says: assume we have N threads, all set up to start executing the run function. Each thread receives a unique thread ID. In addition, initialize semaphore executes before any threads get created, so it's safe to initialize the semaphore. Then it says: using a single semaphore, enforce the synchronization constraints specified in the comments.
You may add code directly, C or pseudocode, and be sure to give the semaphore an initial value. Try to make a solution that supports any number of threads, and assume that thread zero always exists. For a maximum of seven points, you may use total threads and assume that it's the number of threads executing the run function. For that version, you could have multiple semaphores, as many as you want; that was just another solution you could come up with, but let's see if we can find a solution that uses just a single semaphore. So if we want only thread zero to execute this, it should probably be in an if: if thread ID equals zero, then we just let it call initialize everything. So how would we stop the other threads from executing? Yeah, a decrement: what's the function call for it? Do we remember what it's called? To decrement: wait. All right, semaphores get to wait on stuff. So if we're not thread zero and we shouldn't execute that initialize thread function yet, well, in here we should probably just do a sem wait on the semaphore. So if we want all these threads to actually wait and not execute, what does the initial value have to be? First, does everyone remember what a semaphore is? Yeah, a semaphore is not just a one or a zero, it's any integer. It's just a number with two operations that are atomic: I can post, which will increment it by one, and I can wait, which will decrement it by one, and if it's currently zero, it'll just sleep that thread until it can successfully decrement it. All right, so if I want these threads to actually wait, what should the initial value be? Zero. So if it just looks like this, well, if threads one through seven start executing, they'll be stuck here until eventually thread zero runs and it runs this. So how would it free all of the other threads? Yeah, increment it after thread zero's initialization.
So I could post: I could just post that semaphore. Oh, I made a mess of things. All right, I will write it in this color: sem post. Now, if I only have one post, that only lets one thread run, right? Hmm, what else should I do? Yeah, I could post right here as well, after the wait. So now if I have eight threads, each one, after it decrements the semaphore, increments it again to let another thread run. So I could have as many threads as I want do this. Yeah, posting seven times in thread zero was this other answer, and that answer was capped at minus two. If we want to use total threads, we could just post total threads minus one, or even total threads, it doesn't really matter at that point, and just have a wait. So both solutions are acceptable, one slightly more acceptable. It wouldn't cause a thundering herd problem, because seven things can execute, so if you wake up seven things, that's what you want. With a mutex, the only reason we have a thundering herd problem is that we know only one thing can run, so why wake eight if we can only execute one? Here, if we wake seven, we can execute seven. Yep, yeah, if it's zero. Sorry? Yeah, okay, yeah. No, we don't care if the other ones run in parallel; this part can run in parallel, so we don't really care. After thread zero runs initialize everything, the rest of them can run in parallel if we want. So this post sets off the chain: it changes that zero to a one, and now a waiting thread will successfully decrement it from one to zero, and then, once it gets past the wait, it increments it back from zero to one, and that essentially wakes them all up, one after another, which is what we want. Yeah, the wait is just to prevent any thread other than thread zero from executing first.
So they all get stuck on that wait, and then eventually, when thread zero is done, one of them passes the wait and posts, which causes another one to wake up and post, which causes another one to wake up and post, and so on. Sorry? Oh, this one? Yeah, not relevant for us because I told you it wouldn't be, but that's the pshared argument: whether or not the semaphore is in shared memory. If I set that to one and I forked, that semaphore would be the same in both processes. I said I wouldn't do that because it's already confusing enough. All right, are we done with this question? So we did that; is there anything else? Consider the following solution without semaphores. Oh, okay, so this is an example of a question that's the same idea but doesn't carry on: if you got the first one wrong, it doesn't matter. So here is a solution without semaphores: we check if we're thread zero, then we initialize everything; otherwise, we just yield once. Assume that we have cooperative user threads: explain why or why not the solution is correct. Wow, that's worded terribly. So, does this work if I have cooperative user threads, a.k.a. your lab four? Yeah, you can assume it's your lab four implementation. So, assuming you have an implementation similar to lab four, where scheduling is FIFO and yielding sends you to the back of the queue, every other thread would take its turn and get skipped past once, so no matter where thread zero was in the queue, it would eventually run. It would run initialize everything before anything else got past its yield, and then they would all continue. So that would work. All right, next part: assume we have cooperative kernel threads running on multiple cores; does it work now? No, right? Even if they yield, if I actually had enough cores, they could all be running in parallel, so that yield might not even do anything.
So depending on the scheduler, you might have thread two or whatever make it here before any other thread, and whether or not it yielded might not even make a difference. Because they're kernel threads, the kernel can go ahead and schedule them in parallel; you don't get to tell the kernel what to do, for the most part. Does that make sense? Yep. Yeah, if they're kernel threads and you have way more CPU cores, if you yield, it could just be like: okay, you yielded, it's your turn immediately again, go for it; and then it just continues. Yep. No, join was blocking the thread, that was it. Yeah, so in lab four, if I had you implement more, you could have implemented thread sleep, which could have blocked a thread until some other thread called wake up and unblocked it. You probably could have implemented that; it just would have taken some more time. Yep. So even with a single core, cooperative threads might be good if you want to make progress on a bunch of tasks at different times: you want to be able to switch, make some progress, switch to something else, make some progress, and so on. Sorry? Sure, a banking example or a web server or anything like that. Yeah, but we could have done it with user threads if we wanted; anything you can do with kernel threads, you can do with user threads. You just won't get all the benefits: if they're user threads and they're doing I/O or something, well, if one thread blocks, the whole process gets blocked, right? That's the main drawback. So if one thread makes a read or write system call that needs to physically read or write a file from, for example, your file system, that's slow. That system call is blocking, so you won't return from it until the operation is done, until you've read something from the disk or written something to the disk, and in that case your process is just blocked. It can't run; it's in the middle of a system call.
Yeah, if the kernel only knows about the one user thread that made that system call, it just knows that whole process is blocked. If you have user threads, the kernel doesn't know that the process has other threads that could execute. All it knows is: hey, there's one thread I know about, it's doing a write system call, I will tell it when it's done. Okay, so we've got 10 minutes left. Any other pressing things, or should we go to the next one? Condition variables. Condition variables, file systems. Wait, which one is this? Okay, my screen died, cool. So, page replacement. We shouldn't have to do a page replacement question; everyone knows page replacement, or at least knows to look at it. All right, clocks are good. So, a threads question; oh, there's a bank question, does it have deadlocks? Here's a semaphore question. Memory allocation, disks, virtual machines. Yeah, condition variables, I think it was just a short answer on this one. There: short answer. Briefly explain the purpose of condition variables and how they are used in conjunction with mutexes to synchronize threads in a multi-threaded program. You want to do that one? So, anyone want to tell me what the purpose of condition variables is and why you need a mutex with them? Yeah, and why do I need a mutex with it? Oh, you just... God damn it, someone just read my next line. That was a good one. All right, so it also needs a mutex, well, because a condition variable is basically a queue. It needs to be able to add itself safely to the queue without a data race; that's one reason to use a mutex. The other is that it needs to put this thread to sleep, and it needs to give up the mutex so that another thread can go ahead and grab that mutex and update the state the condition depends on without a data race. And then when you wake up, you go ahead and try to lock it again. Okay, that screen's dead.
All right, so that's just a short answer question, but for you, you might want to know how condition variables are used in a bit more depth: know the function calls and know that it's essentially a queue. I remember I did something with one. Long answer? Long answer, but like half a question of a long answer; that's the mega question. And the mega question is like 25 points instead of 20, or maybe it's worth 30, I don't remember. Yep, there's one short answer off the top of my head. Topics: processes, threads, virtual memory, file systems, semaphores slash condition variables, locking; sorry? Page replacement, yes, of course, page replacement. I think that's it. I remember you said that. Nope. Oh, I shouldn't have said that. Maybe, maybe, but no. Yeah, it's more heavily weighted towards material after the midterm, aside from the important earlier things. So processes: well, everything involves a process; as long as I show you code, that's a process, can't get away from that. And virtual memory is always fun. These are fun. These are okay, nothing major though. This one had this: briefly explain the concept of RAID 5. And hey, I guess I took inspiration from this one, because it only has five short answers and then goes right into questions, so yours looks kind of similar. Yep. Nope, okay. All right, anything else for the last five minutes? File systems speedrun, yeah? This exam is good; realize that I wrote it, so yours won't be exactly the same as this, and anything that this doesn't quite test, you're still responsible for. The topics are the same, but obviously the questions aren't going to be the same, because exams that are the same year over year don't really test anything except who got last year's final. So we got a request for file systems real quick. No virtual machines? Nope. All right, file systems, let's speedrun this. For the following questions, assume your file system has a block size of 4096 bytes, each pointer to a block is four bytes, and inodes are 128 bytes each.
What is the minimum amount of space in bytes you need for a file system, like your lab's or ext4, to store 1,000 files? Assume that each file is 96 bytes in size. Include the size of the inodes in your calculation. You may skip the final calculation if you don't have a calculator. So, how much space on the drive would a single one of these files occupy? The file itself is 96 bytes, but even if a file is smaller than an I/O block, a regular file still consumes a whole I/O block no matter what. So each of them would need a single block, which is 4,096 bytes, and each of them would also need an inode, 128 bytes. Add those: 4,224 bytes per file. Then times it by 1,000; if you don't have a calculator, leave it like that, otherwise you'd get 4,224,000 bytes, a fair number of bytes. Next question: from the previous question, how many bytes are lost due to internal fragmentation? You may say zero bytes if that's the case. Yeah, 4,000 times 1,000. For each file, while it gets an I/O block of 4,096 bytes, it's only 96 bytes, so we have 4,000 bytes lost to internal fragmentation per file. Therefore, with 1,000 files, we have 4,000,000 bytes lost to internal fragmentation, which is most of that total. All right, so this is a question that I thought of when I was copying a bunch of small files in Windows and it was super slow, and then I was like, oh yeah, this course. So, why might copying a directory containing a single large file be faster than copying a directory with many small files, even if the total size of the small files is less than the size of the large file? Not quite, yeah. So this is copying a file fresh, without any caches involved or anything. So, for instance, what's 4,000 times 1,000? About four megabytes. So let's compare a four megabyte file versus a thousand 96-byte files. Yeah, well, in the case of a single large file, there's just one inode, and it points to blocks.
Those blocks are all going to be full except probably the last one, so it has essentially zero internal fragmentation: it only wastes some space in the last block, and aside from that, it's just one inode. If it's a bunch of small files, you need to copy an inode for each of them, which in this case is bigger than the file itself, and to read the contents of each of those, you might actually have to access more blocks than for the single large file, because there's so much internal fragmentation. Reading blocks is what the file system is doing, and if the data consumes more blocks, it's going to be slow. And in Windows, once you get to a whole bunch of small files, you see the copying speed fall off a cliff. That's because the copying speed only measures the file contents being copied; it doesn't count the metadata I/O. So it's doing a bunch of extra disk accesses that don't show up as bytes copied: it has to read a bunch of blocks. All right, and we're over time. So with that, just remember: pulling for you, we're all in this together.