Today's the last lecture of our midterm week. So what we'll do today is go over the solutions to Wednesday's exam. It gives us a chance to talk about some of these things together. We haven't started grading the exams yet, so if we uncover problems today, they can be incorporated into the grading. I got the sense that people thought the exam was a little challenging. Sorry about that. It's hard to tell exactly how hard an exam is. But I hope you thought the exam was fair in terms of what it covered and what we asked of you. So don't worry about it being hard. It's not going to matter; that affects everybody equally. OK, so I might finish a little bit early today. I'm not going to go over the multiple choice unless people have specific questions. Everybody will get credit for the first question. I always try to give a fun question on the exam, and this one already failed in some silly way, so I'll just accept any possible response for it. Hopefully it didn't throw you off and ruin the rest of your test. Any questions about the multiple choice? I didn't think so. I will post the solution set after class. I wanted to have it up before, but I ran out of time this morning writing the answers. And it would also be nice to incorporate some feedback from you if you have any questions about the exam. So, any questions on the multiple choice? As my old advisor always used to call them, the multiple-guess questions. All right, I thought these were pretty straightforward.

OK, so the short answer questions. Let's start from the most straightforward and work backwards. So, the process page tables question. You were provided with three page tables. Remember, these are 10-byte pages, and the happy OS here uses two-level page tables over a 1,000-byte virtual address space. It divides the virtual address into a virtual page number and an offset, and splits the virtual page number into two indices, one into each level of the table. This is the one time of year when I'm going to be happy that I have a laser pointer. So the first question was: given the virtual address 273, identify the virtual page number, the offset, and the first- and second-level page table indices. Who thinks they can do this one? What is the virtual page number? Anybody want to guess? 27. What's the offset? 3. What are the first- and second-level page table indices? 2 and 7. Is this a valid address for this process? We didn't ask you to translate it, but it's not, right? It doesn't have an entry in the top-level page table.

OK, so the rest of this question asks you to do four more of these translations. I don't know if I can get them all on the screen at the same time, but I'll try. So the first one is 165, and I'll just walk through it. How do you tell how many digits are for the offset? What was the critical piece of information you needed to have absorbed? The page size: the 10-byte pages. Offsets run from 0 through 9, so the last digit is the offset. It's the 10-byte page size that allows you to divide the virtual address into a page number and an offset, and the page number always identifies units of the page size. Does that make sense? OK. So the first address I asked you to translate was 165. The way to do this: you really didn't need to strip off the virtual page number; you could just go digit by digit. The first digit is 1. What the top-level table tells you is the location of the second-level table, and there were three second-level tables. Potentially, the only wrinkle in this question was that the three tables are not in order.
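If it helps to see the whole mechanism in one place, here is a minimal sketch in C covering both this decimal decomposition and the two-level table walk that's worked through next. The table contents are hypothetical, chosen to reproduce the mappings recounted in this walkthrough:

```c
#include <stdio.h>

/* A sketch of the decimal setup: 10-byte pages in a 1,000-byte virtual
 * address space, so the ones digit is the offset, the tens digit is the
 * second-level index, and the hundreds digit is the first-level index. */
#define NONE -1

int l1[10];       /* first-level table: index -> which second-level table */
int l2[3][10];    /* three second-level tables: index -> physical page    */

int translate(int vaddr) {
    int offset = vaddr % 10;                 /* ones digit        */
    int vpn    = vaddr / 10;                 /* top two digits    */
    int i1     = vpn / 10, i2 = vpn % 10;    /* the two indices   */

    if (l1[i1] == NONE) return NONE;         /* no second-level table */
    int ppn = l2[l1[i1]][i2];
    if (ppn == NONE) return NONE;            /* no mapping            */
    return ppn * 10 + offset;                /* slap the offset back on */
}

int main(void) {
    for (int i = 0; i < 10; i++) l1[i] = NONE;
    for (int t = 0; t < 3; t++)
        for (int i = 0; i < 10; i++) l2[t][i] = NONE;

    /* Hypothetical contents mirroring the walkthrough. */
    l1[1] = 0;  l2[0][6] = 14;   /* 165 -> physical page 14 -> 145 */
    l1[5] = 1;  l2[1][6] = 22;   /* an entry for 6, but none for 7 */
    l1[9] = 2;  l2[2][0] = 31;   /* 900 -> physical page 31 -> 310 */

    int addrs[] = { 273, 165, 577, 42, 900 };
    for (int i = 0; i < 5; i++)
        printf("%3d -> %d\n", addrs[i], translate(addrs[i]));
    return 0;
}
```

Running it prints -1 for 273, 577, and 042 (the invalid addresses) and 145 and 310 for the two that translate.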
This one points here, five points over here, and nine points there. So I'm such a bad, tricky person for not putting them in order. I almost did put them in order, because that's the natural way to lay them out, but I thought I'd see if people were paying attention. OK. So I start with index one. I follow that, and I get the second-level table located here. Now I use the second index, which was six, and that identifies a physical page number of 14. I started with 165, so my physical address is what? 145. I have mapped my virtual page number to a physical page number, and I just slap on the offset. Any questions about this? OK. The rest of them work the same way. For the second one, there was no entry in the second-level table. There was an entry for five in the first-level table, but the second-level table, which again is over here, has no entry for seven; that's the entry for six. The third one, potentially a little wrinkle here too: what index in the first-level page table do I need to use? Zero. So I use zero, and zero doesn't map anywhere, so this is also an invalid address. And the final virtual address was 900. That one worked. It took you to the second-level table at 320. That one mapped zero to 31, and I got 310, using the offset zero from the original virtual address. So hopefully, if you didn't get confused by where the second-level tables were, this question was pretty straightforward. Any questions about this before we go on? OK. Let's see. That one's a little bit interesting. OK.

So let's do the second question that was somewhat mechanical. We gave you a piece of code, and I've updated the exam to reflect the clarification that was provided during the exam, which is that we're asking about 4K pages. So given this piece of code, the question was: what's the minimum number of pages, minimum, that this process would have to have allocated at this point in the code? At some point during the exam, somebody asked about struct fewer, but it doesn't matter, because struct fewer hasn't been pushed onto the stack yet at this point. So the size of struct fewer is irrelevant; you didn't need to know it, because at this point in the code it has not been allocated yet.

So how many pages? OK, well, two code pages. Why two? Why at least two? Here's one 4K code page right here. This is a static variable; it's going to be in the code section somewhere. We only asked you about code, heap, and stack; it's not in the stack, it's not in the heap, so it's in the code section. There's one 4K page: these are uint32s, 1,024 of them, and that's 4K. One page. And then at least one more page, because there's actually code in this program; there's the code that's located inside the main loop. So there has to be at least one more code page.

For the stack, I have a similar situation. At this point in the code, I've pushed the bar array onto my stack. It's very similar to foo: a 1,024-element array of uint32s, so that's 4K on my stack. And then another stack page for something else, because there has to be at least one more thing on my stack at this point. That one more thing could be buffer, which is a locally allocated variable, or it could be the const char * arguments that main gets. Remember, when you've called execv, those get allocated on your stack. So there has to be at least one more stack page. So, at least two stack pages.
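The arithmetic behind all of these tallies is just rounding a byte count up to a whole number of 4K pages. A tiny sketch of that rounding (the 10K heap allocation coming up next rounds the same way):

```c
#include <stdio.h>

/* Minimum pages needed to hold `bytes` bytes with `pagesize`-byte pages:
 * round up, since a partially used page still costs a whole page. */
unsigned pages_needed(unsigned bytes, unsigned pagesize) {
    return (bytes + pagesize - 1) / pagesize;
}

int main(void) {
    printf("%u\n", pages_needed(4096, 4096));   /* the 4K array: 1 page  */
    printf("%u\n", pages_needed(10240, 4096));  /* a 10K malloc: 3 pages */
    return 0;
}
```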
And then for the heap, at least three. (Can I silence the constant stream of whispering from that side of the room? Sorry, it's been distracting.) Three heap pages for the 10K heap allocation: that's 10K bytes, and I'm using 4K pages, so I need at least three pages of heap just to store that data. Yeah, it doesn't matter; it doesn't affect the answer. Buffer, that's true. I probably should have put an allocation definition for buffer, but whether buffer is in the code or on the stack doesn't matter, because the code and the stack each have to have at least one more page for something, and where you put buffer doesn't affect the answer. Good question, though. Yeah. What's that? Foo? So there's no data segment that we asked about; it's just code, stack, and heap. I still can't really understand the question, but potentially, if you wrote that in the answer, I think we'll accept it. I mean, the code segment that we were considering would have included a statically allocated variable. So if you wrote down specifically that we did not ask about the data segment, but that this would be in the data segment, that's fine; if you didn't mention that, it's wrong. It's a reasonable clarification. This is why we do this. OK, any other questions? All right, let's move on.

Let's see, which one do I want to do next? Yeah, OK, this one. All right, so we asked you what you would have to change about the OS if I said that you now have to support 48-bit virtual and physical addresses. We've been talking this semester about 32-bit addresses, which were traditional on systems for a long time. But at some point, someone decided that 4GB of RAM wasn't quite enough, and addresses had to start getting bigger. 64-bit addresses are huge, right? I mean, 64 bits can address 16 exabytes, which is more memory than anyone is going to put in a machine. So that's maybe too big, and 64 bits cause a lot of things to get quite a bit bigger. 48 bits might be a good compromise.

So what do we need to change? Clearly, page table entries. We've done some work to get the page table entries to include the necessary information for mapping pages, but that was under the assumption that I'm using 32-bit addresses. So I clearly need to do some work on my page table entries. The address space size itself: this is a design decision. Having 48 bits of addressing does not mean that I have to support larger address spaces; it could mean that I simply support more of them. We've been intermingling these two concepts this year, and that's pretty traditional, because 32-bit addresses came with 32-bit address spaces. But you could have decided to increase the address space to 48 bits, and that would be reasonable to do. It's not required. Larger pages: now that I'm increasing the width of my virtual addresses to 48 bits, it may not be appropriate to have only 11 or 12 bits of offset anymore; I might want more. The move to 48-bit addresses is driven by the larger amounts of memory available on today's systems, and that's the same thing that might cause you to say, you know what, 4K pages are too small, let's make them bigger. Different page table structures: the two-level page table structure that we discussed in class is going to start to struggle when you get to addresses this wide, because the parts of the page table itself are going to have to get quite big.
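Those sizes can be worked out directly. A back-of-envelope sketch, assuming 4K pages (12 offset bits) and 4-byte page table entries (both assumptions, and the same numbers worked through in the prose that follows):

```c
#include <stdio.h>

/* With 48-bit virtual addresses and 12 offset bits, 36 bits of virtual
 * page number remain. Split evenly across N levels, each level's table
 * has 2^(36/N) entries; at 4 bytes per entry: */
int main(void) {
    int vpn_bits = 48 - 12;   /* 36 bits of virtual page number */
    for (int levels = 2; levels <= 4; levels++) {
        int bits = vpn_bits / levels;
        long entries = 1L << bits;
        printf("%d levels: %2d index bits, %7ld entries, %8ld bytes/table\n",
               levels, bits, entries, entries * 4);
    }
    return 0;
}
```

With two levels, each table is a megabyte, far bigger than a page; only at four levels does each table fit comfortably inside a single 4K page, which is where the deeper trees discussed next come from.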
So for example, if I take 12 bits off of my 48-bit address, that leaves me with 36. And if I want a two-level table that's balanced, each level has to support 18 bits of index: 2^18 entries times 4 bytes each. So those tables get bigger, and actually they're bigger than a page as well, so they would be multi-page data structures. So you might want to go to a three-level table at this point, or something even deeper, a four-level table. And finally, clearly, if you said anything about the MMU or the TLB, I think we're going to give you a point for that, because, duh, the 32-bit MMU is not going to properly map 48-bit addresses. We asked you for two, so any two of these would work. You don't have to use larger pages. No, no, I mean, you could continue to use 4K pages. I think it would have been hard to justify saying that we should use smaller pages; I would be interested to see if anybody managed to pull that off. That's a hard sell. But no, you didn't have to change the page size, or the address space size, or anything else. You just had to talk about a couple of things that would change.

All right. Oh, and actually I didn't complete the second part of this question in the solution, and it's a good point. So let's talk about the second part for each one of these. For the page table entry: regardless of what I do, 48 bits are going to cause me to have to store more state. That's almost inevitable. So my paging-related data structures are going to get larger. Page table entries are potentially going to get larger, and the page tables themselves might get larger; if you're still using two-level tables, those levels get quite a bit bigger. Address space size might not affect the memory or computational overhead in a meaningful way. For the MMU change, it's not clear that I would be able to support the same number of entries in the TLB if I have to go to a larger matching width. But those are some of the implications of some of these changes. Questions about this question? Yeah. I mean, you could argue that I can still have an MMU that should be able to do this in constant time. Let's put it this way: I haven't taught you, or expected you to know, enough about how that hardware actually works to be able to make the argument. Without that information, I would say, at least it can be done in constant time. That constant might be a bit bigger for wider addresses, but I don't even know. All right, good.

Okay, onward and upward. Okay, so eliminating deadlocks. We talked about eliminating cycles in class; that's the sort of canonical way to eliminate deadlocks when you have the ability to lock multiple resources. But there are some other ideas. The first thing that we mentioned, when you look at the four requirements for deadlock, is to just not have protected access to shared resources. But that's hard to get rid of: if I have a shared resource, I probably need to be able to use it in a safe way. So that one's sort of off the table. That left you with two others. One remaining requirement for deadlock is that resources cannot be preempted once they're held; essentially, I can't take a lock away from a thread once it's acquired one. And the other requirement is that threads are allowed to make multiple requests while holding resources. There are a couple of ways to work around both of these, and there are systems that actually do the first one.
So in the first case, it's possible that if your kernel is clever, it can actually identify a deadlock by seeing a cycle in the wait queue. If the kernel has some notion of what threads are waiting for, it's possible to see in the wait queue that there's a circular dependency. And again, there are tools and systems that can do this at runtime; they can detect when a deadlock occurs. That's the first thing you have to be able to do. Unfortunately, once you get there, it's still not clear what to do. How do I fix this? I could shoot one of the dining philosophers, for example; that is, I could kill a thread. We will probably give you credit for that if you put it down, because, let's put it this way, it's better for some of the system to continue to make forward progress, even if it means tossing one of the threads out of the boat, than for the entire system or a portion of the system to just be stuck forever. Hopefully you haven't done this to your VirtualBox, but if you use Linux systems, you'll find that Linux has what's called the out-of-memory killer, and essentially this is how the out-of-memory killer solves problems. When the system really runs out of memory, and I mean both physical and swap space, right, you're totally exhausted, you can't allocate pages anymore, what do you do? I think we'll talk about this next week. Well, what Linux does is it has this very complicated procedure for choosing a process that just gets terminated. There's nothing else to do at that point. You have a series of bad decisions available to you, and so you try to make the best bad decision possible. So you could say: I'm just going to terminate one of the threads, and that should allow the others to make progress.

If you want to do a better job of this, maybe you could provide a way, when you lock a resource, for the thread to register sort of an escape hatch, so that if I need to take this resource away from you, here's some code to execute when that happens. You can imagine passing a function pointer or a callable or whatever you want that will get run when the lock is taken away from the thread. So it's a signal to the thread: I grabbed a lock, then I went and did some other work, I tried to grab lock two, that had the potential to cause a deadlock, and so the system actually backed me all the way up again to before I had grabbed lock one and said, you don't even have lock one anymore, right? This is a little bit tricky to do. The simplest solution would have just been to kill one of the threads. Why not?

So in the second case, allowing multiple requests: this is actually a little bit of a simpler solution, which is to just make it impossible for a thread to hold two locks simultaneously. I go to acquire the second lock and either it just fails, which is probably not the right thing to do, or I assert that I don't hold another lock, and the thread dies if it tries to acquire two locks at once. Now, putting this in place means you're going to have to redesign a lot of code, because there's a lot of code that safely acquires multiple locks. You're going to have to go back to that code and fix it, because now all that code is blowing up all over your system. But it's not clear that you always need multiple locks; in fact, I would probably believe that you can write the same code so that, instead of acquiring two locks, you do things differently. So this would force you to change how you design parts of the system.
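As a sketch of that enforce-one-lock idea, here's what the assertion might look like. The lock API and the per-thread counter are hypothetical, invented for illustration:

```c
#include <assert.h>

/* A sketch of forbidding hold-and-wait: refuse to let a thread hold two
 * locks at once. All names here are invented for illustration. */
struct lock { volatile int held; };

static __thread int locks_held = 0;   /* per-thread count (GCC/Clang TLS) */

void lock_acquire(struct lock *lk) {
    /* Die loudly if this thread already holds a lock. With hold-and-wait
     * gone, a circular wait (and hence deadlock) cannot form. */
    assert(locks_held == 0 && "thread tried to hold two locks at once");
    /* ... actually acquire lk here (spin, sleep, etc.) ... */
    lk->held = 1;
    locks_held++;
}

void lock_release(struct lock *lk) {
    lk->held = 0;
    locks_held--;
}
```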
Alternatively, and this is something we mentioned in class, you could provide a function that allows a thread to request two locks at once and that does not succeed unless it can return both locks simultaneously. So those were your options for this question. Any questions about this question? Yeah. No, this isn't really a resource problem; it's a correctness problem. Remember, the threads that are deadlocked aren't necessarily deadlocked because they're waiting for a resource, right? They're deadlocked because they're trying to use the resource correctly. So for example, you might have a table as part of assignment two where you allocate pits. Fixing deadlocks on that table, it doesn't work to just allocate another table, right? Because I need one table; I need that for correctness, and I need to lock it for that to happen. That's a good question. Any other questions about this one before we go on? Okay. So that was the first one. We've done this. Okay, pipes. Here we go.

So for this question, the thing we were looking for you to mention was simply that by opening a pipe and passing data between processes, the OS has more information about when a process can make forward progress. Remember, in general I had this problem: when a process was blocked, I didn't know how long it was going to remain blocked, and it was unclear in many cases what would cause it to become unblocked. But in the case of a pipe, it's very clear. So here's my example, a shell pipeline where foo is piping data to bar. I know that if the buffer is empty, bar cannot make progress. And I know what is going to cause the buffer to become non-empty, which is foo running. So if you identified that relationship between these two processes, and the fact that that's information the OS knows, then you're going to get some credit for this question.

How do you use that information, and what's the resource allocation tradeoff? So for example, let's say I want to reduce the number of context switches between foo and bar on a single-core system. The way to do this is pretty simple: run bar until it's finished, just run it to completion. That way bar has generated all, sorry, run foo first. Run foo to completion. That way foo has generated all the data it's going to generate, and I never have to worry about bar blocking because the buffer is empty. Then I run bar, and bar should be able to run to completion because foo is done, right? When foo completes, the buffer has all the data it's ever going to have in it, and when bar is done, the output contains all the data it's ever going to have. What's the problem with this from a resource allocation perspective? Okay, so that's fair, yeah. I mean, I might get more mileage out of running foo and bar together. But what resource do I know is going to be overutilized if I let foo run to completion before I start bar? Where does the data that's piped between these processes go? It's buffered in memory, actually, but it doesn't really matter; it has to be stored somewhere. So the output from foo is going to be stored before I give it to bar. If I run foo to completion, that means I have to store all of foo's output, which is by definition the maximum amount of output I could have to store. We did cover in class that pipes are buffered in memory, but regardless of where I buffer the pipe, I have to store all that data. Then I have to feed it all the way into bar.
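For reference, the pipeline being described, foo writing into a pipe that bar reads, is what the shell sets up with pipe, fork, and dup2. A minimal sketch with standard POSIX calls; foo and bar are just the placeholder program names from the question, and error handling is mostly elided:

```c
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

/* A sketch of `foo | bar`: one child's stdout is the pipe's write end,
 * the other child's stdin is its read end. */
int main(void) {
    int fds[2];
    if (pipe(fds) < 0) exit(1);      /* fds[0] = read end, fds[1] = write end */

    if (fork() == 0) {               /* child 1: foo */
        dup2(fds[1], STDOUT_FILENO); /* stdout -> pipe write end */
        close(fds[0]); close(fds[1]);
        execlp("foo", "foo", (char *)NULL);
        exit(1);                     /* only reached if exec fails */
    }
    if (fork() == 0) {               /* child 2: bar */
        dup2(fds[0], STDIN_FILENO);  /* stdin <- pipe read end */
        close(fds[0]); close(fds[1]);
        execlp("bar", "bar", (char *)NULL);
        exit(1);
    }
    close(fds[0]); close(fds[1]);    /* parent drops its copies */
    wait(NULL); wait(NULL);          /* reap both children */
    return 0;
}
```

The scheduling insight in the question falls out of this structure: when bar blocks on an empty buffer, the kernel knows exactly which event, foo writing, will unblock it.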
So whatever resource I'm using to store the pipe contents, running foo to completion first will use up as much of it as possible. The tradeoff is that if I run bar more often, the buffer can stay smaller. I let foo write some data into the buffer, I stop foo, I run bar, bar clears the buffer out, and then I start foo again. If I go back and forth doing this, I can keep the buffer at a smaller size, so I don't have to use as much memory for the pipe buffer. The tradeoff is that I have to stop and restart bar and foo more often. So this is the tradeoff I had in mind when I wrote this question. If you identified another one, like the one that Isaac pointed out, you'll get credit. Questions about this? Okay.

Let's look at, oh, we did this one. Ah, okay. So, separate exception-handling paths for TLB exceptions versus other exceptions and interrupts. The question was, why would the hardware provide this, and how would the OS take advantage of the differentiation? And to some degree, the second question sort of answers the first, right? The answer to the first question is, sort of, because it allows the operating system to do this. If I bundle all the faults together and push them down one path, then there's no way for the operating system to really optimize that; all the faults come in at one particular place. So there are a variety of different ways to think about doing this, but one thing to observe for the first part of the question: what's the difference in volume between TLB and memory-related exceptions and everything else? Which do you think there are more of, now that we know how virtual addresses are translated? There are a lot of memory-related exceptions. Processes are generating these all the time. Certainly, processes generate way more memory-related exceptions than they do system calls, by orders of magnitude. And even when you start thinking about interrupts generated by hardware, the TLB and memory-related exceptions are still the most common. And in a lot of cases, I can handle those exceptions pretty quickly. In certain cases, for example, the TLB might generate an exception where I just look up the entry, load it, and I'm done. And because there's so much commonality in the way I can handle some of these fast-path TLB exceptions, it can make sense to put them on a separate code path, so that I don't have to save some of the state that I might have to save to handle a general-purpose interrupt. Remember, on the general-purpose interrupt path, I'm saving every register that the thread was using, and then I'm going to restore all of them later. The other thing is that memory-related exceptions don't have to, and frequently don't, cause a context switch, whereas interrupts and system calls frequently do. If I have a context switch, there's a lot more information I have to save every time, because I need to create a trap frame for that process so that I can restart it later. If I'm on the memory-related exception path, and I can design a handler that uses fewer registers, then I don't have to save the other registers the process was using, because I don't touch them. And that's really part of the answer: if I can eliminate some of the overhead of storing all the state necessary to restart the process, then I can eliminate all of that work when I come in for a memory-related exception.
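To give a feel for how short that fast path can be, here's the refill logic written out in C. The structures and names are hypothetical, and on real hardware this is typically a handful of assembly instructions that deliberately touch almost no saved state, so treat it as a sketch:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>

/* A sketch of a fast-path TLB refill against a two-level page table:
 * 32-bit addresses, 4K pages, 10 index bits per level. Names invented. */
struct pte { uint32_t ppn; bool valid; };

struct l2_table { struct pte entries[1024]; };
struct l1_table { struct l2_table *tables[1024]; };

/* On success, the caller writes *out into the TLB and returns straight
 * to the faulting instruction; no trap frame, no context switch. */
bool tlb_refill(struct l1_table *pt, uint32_t vaddr, struct pte *out) {
    uint32_t vpn = vaddr >> 12;               /* drop the 12 offset bits    */
    struct l2_table *l2 = pt->tables[vpn >> 10];
    if (l2 == NULL) return false;             /* real fault: take slow path */
    struct pte e = l2->entries[vpn & 0x3ff];
    if (!e.valid) return false;               /* likewise: full handling    */
    *out = e;                                 /* look up, load, done        */
    return true;
}
```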
Any questions about this question? I think if you identified that this is useful because I might want to handle these exceptions differently, and you said something about how often these exceptions are triggered, you'll get full credit for this question. All right, okay, so let's move on to the long answer questions. We have, I think, a good amount of time to talk about these. Okay.

So the first long answer question is about the size of the operating system interface. Canonically, when people compare Unix and Windows, one of the things they'll talk about is the fact that the Unix system call interface is fairly thin: it provides a small number of calls, and probably an even smaller number of those calls are heavily used. Whereas Windows, for a variety of reasons, provides a much bigger interface. One of the reasons Windows provides a bigger interface is that the interface has changed over multiple iterations, and so the 3,000 calls that we put in here, that's only for one version of Windows. If you go back and look at Windows and its compatibility modes, it's actually supporting a lot more calls, because it's supporting old Windows API calls from previous versions. So there are some backwards compatibility problems here. But regardless, the idea is: I can take the same set of core features and expose them through a thin interface, which may cause the process to have to compose multiple calls to do what it wants, or I can expose them through a thick interface that packages some of these calls up in a different way.

So let's go through some examples of this. Well, first of all, what is good about a small interface? It shouldn't be hard to answer this question, because you implemented some of the system call interface for assignment two. So what's good about a small interface? Yeah, it's less code to write. It's easier to implement and support; the bigger the interface gets, the more calls there are, the more there is to do. The other thing is that when an interface provides multiple ways to do things, it can be difficult for app developers to know what the right way to do something is. I've got three different calls that open a file in subtly different ways; why would I use one versus the other? Maybe I pick the one I've been using before, but maybe they've added a new one in a new release that's a little bit different. So having one way to do something can be very helpful, because it constrains the developer to use just that one way, and it also constrains the maintainer to support just that one way of doing things.

And you weren't required to talk about this, but when there were complaints about Microsoft's monopoly with Windows in the past, one of the things that generated those complaints was this: say you were implementing a web browser called Mozilla, you were working on your Windows version, and you had maybe six different ways of doing some particular thing. You'd be scratching your head thinking, what am I supposed to use? Which is the one that's going to be supported in future versions of Windows? What's different about that person compared to the person who's building the Internet Explorer web browser?
What can the guy who's building Internet Explorer do that the guy who's building Firefox cannot? Yeah. He can call up the guy who maintains that interface and be like, dude, what's going on? Which one should I use? Or alternatively, and there were allegations about this, he could call them up and be like, by the way, you know what would be super helpful? If you had a system call that did X, where X is this useful thing that I want to do. Can you get that into the next version of the interface? Did that ever happen? I don't really know. Is it likely? Of course. I mean, this is co-designed software, so you try to design it to meet the requirements of the other people you work with. In any case. Small interfaces are potentially easier to build, but of course the trade-off is that the nice thing about a large interface is all the features you can add to it. You can start to do certain things in different ways, and you might be able to simplify things that require several different calls in the smaller interface.

So I'll scroll down to the answer. We asked you for some problems with the small interface, and you could have talked about other things, but the main thing we were interested in you observing was the fact that frequently I have to use multiple system calls to do things that could potentially be done with one system call. Fork and exec is the canonical example of this, right? In order to create a new process that's running a new program, I have to call fork and then I have to call exec. And we've talked in class about why this is bad, and why it would be nice to have a single system call that did both.

Here are some other examples. Open followed by read and write: what if I just want to read an entire file into memory? I've got my buffer already, and I want to read as much data into the buffer as possible from a file. Why do I need to open the file, mess with the file handle, allocate space in the process's file table, and do all this stuff, if I'm just going to open the file, read the whole thing, and close it again? Why can't I do that in a single call? Seems reasonable. Read and write and lseek, right? File positioning is implicit in Unix, which is kind of annoying. If I want to move to the end of the file, I have to call lseek. And if I have multiple threads using the same file handle, now I have a synchronization problem at the process level, which is really gross. So these were examples, and you could probably have come up with more, where I have to use two calls, or more than two calls, to do the work of one call. And of course there's overhead involved here: every time I perform a system call I may block and some other process may run, and I have to get into the kernel, and there's overhead associated with that. So these were the kinds of examples we were looking for you to pull up. And in most of these cases, unless you came up with something weird, there wasn't really a huge amount you had to do to implement it, other than add a system call number, add a handler, and implement it. You'd have to handle failures properly, but there's not necessarily a lot of new capability. Questions about this problem? Yeah, no, I mean, it's not just the context switches; it's all the extra transitions in and out of the kernel.
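To make the composition concrete, here's what slurping a file looks like on the thin interface: one open, some number of reads, one close, each a separate kernel crossing. This is a sketch using ordinary POSIX calls with minimal error handling; a hypothetical thick-interface call could do all of it in one crossing:

```c
#include <fcntl.h>
#include <unistd.h>

/* Read up to `cap` bytes of `path` into `buf` using the thin Unix
 * interface. Returns the byte count, or -1 on error. */
ssize_t read_whole_file(const char *path, char *buf, size_t cap) {
    int fd = open(path, O_RDONLY);                       /* crossing #1 */
    if (fd < 0) return -1;

    size_t total = 0;
    while (total < cap) {
        ssize_t n = read(fd, buf + total, cap - total);  /* one crossing each */
        if (n < 0) { close(fd); return -1; }
        if (n == 0) break;                               /* end of file */
        total += (size_t)n;
    }
    close(fd);                                           /* final crossing */
    return (ssize_t)total;
}
```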
Every time I perform a system call, I have to cross the kernel-user boundary, and there's real overhead to that. So if you think about it, the thicker interface gives me the ability to do more powerful things with one call: rather than having to compose multiple system calls to accomplish what I want, I can use a single call that does exactly what I need. Any more questions about this question? Yeah. I guess that's true; with a thick interface, the kernel would get a little bit bigger. It's not clear to me that thick interfaces don't just compose existing code, but yeah, that's a reasonable trade-off to identify. With the thick interface, your kernel gets a bit bigger, the volume of code you have to maintain gets larger, and that, I think, is the bigger problem: it's just harder to maintain. Even if there's a lot of overlapping functionality, I can still call it in different ways, and there are bunches of different ways to get onto the same code path, and that can be very difficult for people to reason about. All right.

Okay, so the last question was on concurrent system calls. The idea here is, and toward the end of the semester we'll look at some fantastic papers about this problem, that now we have these 48-, 64-, 128-core machines, shared-memory multiprocessors capable of running existing operating systems, and what people have found is that a lot of the time you don't get the scalability you want on these machines. I double the number of cores and I want my app to go twice as fast, but it doesn't; it only goes 20% faster. And what's been happening, even for programs that are themselves well written, is that increasing core counts have started to expose scalability problems within the operating system itself. In some cases, just clever engineering inside the operating system helps. For example, a lock: when I had two cores, this one particular lock was not a problem; now I have 48 cores and this lock is a huge problem, and I have to think about how to redesign my internal data structures to get rid of it. But in other cases, it's the specification of the interface itself that's the problem. There are things the calls are doing, guarantees they're making, that don't scale well. And this code is an example of that.

So we gave you this little loop where threads repeatedly open and close a file. You can imagine that I'm trying to run as many threads as I have cores on the machine, and what I want, as I increase the number of cores, is for the performance of the threads in this stupid little program to scale. I want them all to be able to run at the same speed; they should be able to run on separate cores, essentially entirely independent of each other. The problem is that they don't, and we told you that this code would not scale well on existing systems. Notice that this code really doesn't do any work; it calls open and close, just over and over again. Don't worry about why; it's just what it does. I recognize that this question was a little bit subtle, so I tried to help lead you to the correct answer: the note here points out that open is implemented to return the lowest available file descriptor. That's part of the interface specification. That's something that Linux says open will do.
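That guarantee forces an allocation loop shaped roughly like the sketch below. The structures and names are hypothetical, but the two essential features follow from the specification: the scan starts at zero, and a lock is held across the whole scan.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_FDS 1024

/* A sketch of per-process fd state under the lowest-available guarantee. */
struct fdtable {
    pthread_mutex_t lock;        /* one lock covering the whole table */
    void *files[MAX_FDS];        /* NULL means the slot is free       */
};

int fd_alloc(struct fdtable *t, void *file) {
    pthread_mutex_lock(&t->lock);           /* serializes every open()   */
    for (int fd = 0; fd < MAX_FDS; fd++) {  /* linear scan from zero     */
        if (t->files[fd] == NULL) {
            t->files[fd] = file;
            pthread_mutex_unlock(&t->lock);
            return fd;                      /* the lowest free slot      */
        }
    }
    pthread_mutex_unlock(&t->lock);
    return -1;                              /* table full */
}
```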
Now, apps don't rely on this behavior, or only a very, very small number do. The file descriptor you get back is an int, but technically it's just a handle that allows you to identify the file to the operating system. So who cares whether it's the lowest or the highest or anything in between? It doesn't matter. So scroll down to the answer again. And I will be honest: I felt better about this question after Monday's review, because we talked about a very similar, although not identical, situation there. The problem you encounter when you run this code is that open, because it has to return the lowest available file descriptor, has to perform a linear scan of the file descriptor table starting from zero. And technically, to guarantee that open returns the lowest file descriptor, I have to hold a lock over that entire loop. I can't even safely do the read-lock-read optimization we talked about, because remember, that only ensured that I found a slot in the table. Here I'm supposed to ensure that if I make three calls to open, the first one gets zero, the second one gets one, and the third one gets two; they have to return in that order, and that's in the interface specification. If we used the locking trick from Monday, I could safely ensure that they all got a file descriptor, but I couldn't ensure that they were handed out in the correct order. So now every call to open is doing a linear scan of this table with a lock held. I have a critical section: the file descriptor table is a shared resource inside the process, so every thread is going to be looking through it, and I'm locking across a potentially long critical section. There is your scalability bottleneck, even for this stupid little piece of code.

How do you get rid of it? Once you've seen it, it's not hard: just don't return the lowest file descriptor anymore. Change the interface specification to not guarantee that. We told you apps don't depend on it, so they're not going to break. And once you do that, the sky's the limit. The ideal solution here would be to just start at a random point in the file descriptor table and look for a free file descriptor from there. Making that change helps even if you still grab the lock before you look at the table, because remember, when I'm returning the lowest available file descriptor, the file descriptors in use are all packed at one end of the table. So I'm starting my search at the point in the table where I'm least likely to find an available file descriptor; if I start my search at a random location, I can do better. And once I do this, if you were there for Monday's class, you could also have pointed out that I can now do the read-lock-read optimization, meaning I don't even have to hold the lock outside the loop; I can just grab it inside the loop. (There's a quick sketch of that randomized scan below.) So again, this is an example of how a very small change to the interface specification lets me remove a scalability problem. Questions about this problem? I have a sneaking suspicion, how many people did this problem? Oh okay, maybe some of you. I was worried this would be the problem that no one answered. Okay, any overall questions on the exam? This is your last chance to influence our grading; you've already successfully earned one point back by defeating my fun question. Any other questions on the exam?
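Here is the promised sketch of the relaxed allocator: the lowest-fd guarantee is dropped, the scan starts at a random slot, and, per the read-lock-read idea from Monday, the lock is taken only to confirm a claim on a slot that looked free. Same hypothetical structures as before; a real kernel would use atomics for the unlocked read.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_FDS 1024

struct fdtable {
    pthread_mutex_t lock;
    void *files[MAX_FDS];
};

/* Relaxed allocator: any free slot will do. */
int fd_alloc_relaxed(struct fdtable *t, void *file) {
    int start = rand() % MAX_FDS;             /* begin anywhere */
    for (int i = 0; i < MAX_FDS; i++) {
        int fd = (start + i) % MAX_FDS;
        if (t->files[fd] != NULL) continue;   /* unlocked read: just a hint */
        pthread_mutex_lock(&t->lock);         /* re-check under the lock    */
        if (t->files[fd] == NULL) {
            t->files[fd] = file;
            pthread_mutex_unlock(&t->lock);
            return fd;
        }
        pthread_mutex_unlock(&t->lock);       /* lost the race; keep going  */
    }
    return -1;
}
```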
Okay, on Monday we will go back to talking about virtual memory. We will talk about page replacement policies and how to make sure we swap the right pages out and in. I will see you then.