Welcome back, everybody, to CS 162. We're going to move on to start talking about address translation and virtual memory now. But just to remind you from last time, we did talk about deadlocks. We were distinguishing between deadlock and starvation: starvation is a general situation where a thread waits indefinitely, maybe a low-priority thread waiting for resources constantly in use by high-priority threads. Deadlock is a particular type of starvation where there's a circular waiting condition that's not going to resolve by itself. In the case of generic starvation, there's a good possibility that, for instance, if all the high-priority threads went away, then the low-priority one would get to run. In the case of deadlock, because of this circular wait condition, it's never going to resolve. Here was the picture that we used, where thread A is waiting to acquire resource 2 while already owning resource 1, and thread B is waiting to acquire resource 1 while already owning resource 2. That's the circular waiting we talked about. And of course, deadlock is a type of starvation, but not vice versa. Then we gave some examples and came up with four conditions for deadlock. Now, these are necessary but not sufficient: having all four of them doesn't necessarily mean you have deadlock, but if you have all four of them, you might have a chance for deadlock. Mutual exclusion said that a resource can be held exclusively by a thread. Hold-and-wait is exactly that: a thread holding at least one resource is waiting for another one. No preemption says it's not possible to take resources away from a thread. And finally, the circular wait condition says that you have a set of threads T1, T2, and so on up through Tn that are all waiting for each other in a cycle.
If you have all four of these things, there's at least the possibility of having deadlock. We talked about an algorithm for generally figuring out whether you're stuck in a deadlock situation or not, and we talked about a number of ways to avoid deadlock and so on. One of the things we talked about was the Banker's Algorithm. The Banker's Algorithm is a way of dynamically handing out resources so that you won't get into deadlock. The assumption is that every thread pre-specifies a maximum number of each type of resource that it needs. It doesn't have to ask for them all at once; it can ask for them dynamically. The threads can request and hold their resources dynamically, and the Banker's Algorithm will make a decision whether to hand those resources over based on whether doing so could lead to deadlock. It basically uses the deadlock detection algorithm as a subroutine, as we showed you: for each request that a thread makes, we do a thought experiment. We say, well, if we gave this resource to the thread, would there still be a way to get all of the threads to complete without deadlocking? If the answer is yes, then we hand the resource out, and if the answer is no, then we put that requesting thread to sleep. So just as a summary: the Banker's Algorithm prevents deadlocks by stalling requests that would lead to inevitable deadlocks; we called those unsafe states. Now notice that the Banker's Algorithm doesn't fix all of the problems, because if a thread grabs a bunch of resources and then goes into an infinite loop, the Banker's Algorithm doesn't help you with that.
I gave one very simple example of the Banker's Algorithm here, although I didn't call what I did "using the Banker's Algorithm." Here was this example of two threads that happen to be asking for locks, but in opposite order: thread A asks for the X lock and then the Y lock, and thread B asks for the Y lock and then the X lock. As we talked about last time, it's possible for that to get stuck in deadlock. But if we actually have the Banker's Algorithm running, then A says, "Gee, Banker's Algorithm, I'd like to acquire X," and the Banker's Algorithm says, "Well, go ahead, because you're not going to deadlock anybody." As we talked about last time, if B then asks for Y and gets it, we are now stuck. We're not quite deadlocked yet, but no thread is ever going to be able to make progress, so we're in an unsafe state at that point. What happens with the Banker's Algorithm is that it does that thought experiment: it says, well, suppose I gave thread B lock Y, what would happen? And if it does that thought experiment, it'll find that A and B would then be deadlocked. So rather than handing Y to B, it actually puts B to sleep. What does that allow? Well, that allows A to go ahead and get Y, finish up, and release the two locks, and then B can be woken up and it can move forward. So that's an example of how the Banker's Algorithm could prevent this particular deadlock. We also talked about a couple of algorithmic responses to things like the dining lawyers problem, where they're each grabbing two chopsticks, and how to prevent that by analyzing it with the Banker's Algorithm. So you don't necessarily have to have the Banker's Algorithm running dynamically, live; you can actually use it to analyze an algorithm and figure out how to prevent deadlocks.
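The thought experiment just described can be sketched in a few lines of Python. This is a simplified model, not code from the lecture: `is_safe` is the safety check (can every thread still finish in some order?), and `request_ok` is the "pretend to grant it, then check" step the Banker's Algorithm performs on each request. The two-thread, two-lock setup below mirrors the X/Y example above.

```python
def is_safe(available, max_need, allocation):
    """Safety check: is there some order in which every thread can
    acquire up to its declared maximum, finish, and release?"""
    work = list(available)                 # resources free right now
    finished = [False] * len(allocation)
    progress = True
    while progress:
        progress = False
        for i in range(len(allocation)):
            if finished[i]:
                continue
            need = [m - a for m, a in zip(max_need[i], allocation[i])]
            if all(n <= w for n, w in zip(need, work)):
                # Thread i could run to completion, then release everything.
                work = [w + a for w, a in zip(work, allocation[i])]
                finished[i] = True
                progress = True
    return all(finished)

def request_ok(tid, request, available, max_need, allocation):
    """The Banker's thought experiment: pretend to grant `request`
    to thread `tid`, and approve it only if the result is still safe."""
    trial_avail = [a - r for a, r in zip(available, request)]
    if any(a < 0 for a in trial_avail):
        return False                       # not enough free resources at all
    trial_alloc = [list(row) for row in allocation]
    trial_alloc[tid] = [a + r for a, r in zip(trial_alloc[tid], request)]
    return is_safe(trial_avail, max_need, trial_alloc)

# Two threads, two locks [X, Y]; each thread declares it may need both.
max_need   = [[1, 1], [1, 1]]
allocation = [[1, 0], [0, 0]]   # A already holds X
available  = [0, 1]             # only Y is free
```

With this state, `request_ok(1, [0, 1], available, max_need, allocation)` comes back False (granting Y to B would be unsafe, so B is put to sleep), while `request_ok(0, [0, 1], available, max_need, allocation)` comes back True (A can take Y, finish, and release both locks).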
So, I'm going to stop there. I wanted to briefly see if there were any questions on deadlocks. I definitely recommend you take a look at the last lecture, because we talked about a number of different examples of deadlocks and how to avoid them. What's the simplest way, without the Banker's Algorithm, to fix this A and B deadlock? Does anybody know how we would prevent A and B from ever getting into deadlock without having to use the Banker's Algorithm? Yeah, so we basically pick an order, call it X, Y, Z, whatever canonical order, and you always request resources in that order. A would get X and then Y, and B would get X and then Y, and you can prove that there's no way to ever get a deadlock, because to get a cycle, one of the threads would have to hold Y and request X, which is going backwards in the order. So you can actually do a proof that shows there's no deadlock there. Good. All right, so we're going to move on. We've been talking a lot about virtualizing the CPU, and it's time to move on to some other resources. In general, different processes or threads share all the same hardware. You need to multiplex the CPU; that was scheduling and some of the lower-level mechanisms we talked about. You need to multiplex memory, which we're going to start today. You need to multiplex disk and devices; that's a little bit later. We'll even talk about virtual machines, where we essentially virtualize a hardware view of the whole machine; that's a little bit later as well. But today we're going to focus on memory. Why do we worry about memory? Well, the complete working state of a process is defined by its data in memory and registers. So if you were to take all of that state, put it aside somewhere, throw out the CPU, get a new one, and reload it all back up, you'd be able to pick up where you left off. So memory is pretty important.
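The lock-ordering fix just described can be sketched as a minimal Python illustration (not from the lecture itself): every thread acquires X before Y, so no thread can ever hold Y while waiting for X, and the cycle needed for deadlock can't form.

```python
import threading

# One global lock order: lock_x always comes before lock_y.
lock_x = threading.Lock()
lock_y = threading.Lock()
counter = 0

def worker():
    """Every thread, whatever it logically 'wants first', acquires X
    then Y.  No thread can hold Y while waiting for X, so the circular
    wait needed for deadlock can never form."""
    global counter
    with lock_x:
        with lock_y:
            counter += 1

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()   # both joins return: no deadlock
```

If one worker instead took Y before X, the two threads could each grab their first lock and then wait on each other forever; the fixed global order is what rules that out.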
It actually represents the running state of the system. And you can't, among other things, just let different threads use the same memory, because what you're going to end up with is either interference, or a situation where private information gets stolen, and so on. I like to sometimes think of this in terms of physics, where two pieces of data can't occupy the same location in memory. Now, if you were to get me to talk about quantum computing at the end of the term, where we have some random topics, I might modify that a little bit when we have a quantum machine. But for now, two pieces of data can't be in the same place, and therefore we have to virtualize resources somehow to get around this problem. And you really don't want different threads having access to each other's memory unless they intend to, because otherwise you can get malicious modification of the state of a process. So, does UC Berkeley have quantum computing work? Yes, we have some going on. In fact, we have a brand new big grant that just started. If you're interested, we can talk at some point soon. Now, if you remember the very first lecture or so, we talked about some fundamental OS concepts; this was lecture two. One of them was the idea of a thread, which was the execution context, a serial chain of execution with a program counter, some registers, a stack, etc. The other piece that was very important was the address space, either with or without translation, which as you recall was a set of memory addresses accessible to the program for reading and writing, and it may be distinct from the underlying physical memory space of DRAM. That's where translation comes into play, and that's when we start getting virtual address spaces, which we're going to dive into today. Then there were processes, which are what you get when you combine a set of threads and a protected address space.
And then we talked about dual-mode operation, which was basically how to protect the address spaces that the operating system is producing, and make sure that processes can't just randomly alter their address spaces and thereby violate the protection. So address spaces and dual-mode operation are things that we haven't really talked much about since the very beginning of the class, and it's time to bring them back. All right. Now, if you remember, the basics of the address space: it's the set of addresses accessible to a given thread or process. I just want to toss out a couple of things that you all know, but let's make sure we're on the same page. If the address size of a CPU is k bits, then there are two to the k things we can access. So if there were only eight bits, there would be two to the eighth, or 256, bytes in the address space. Now we've moved far beyond that, unless you're dealing with little tiny IoT devices. One thing that is pretty standard now: when we're talking about a number of things, we're usually talking about bytes, unless we say otherwise. And a byte is eight bits. Now, in the early days, before people really figured out much about computers, everybody was kind of doing something different, and the things people counted were all sorts of different lengths. Six bits was actually a standard, because with two to the sixth, or 64, things you can get pretty much all of the printable-character subset of ASCII. And 36 bits might be a word size. But today, people would stare at you a little strangely if you talked about a 36-bit word. Instead, everything is multiples of eight and powers of two. So, what's two to the 10 bytes? Well, that's going to be a kilobyte, or 1024 bytes. And notice that when we're talking about memory or sizes of things,
when we talk about a kilobyte, KB, we're going to be talking about 1024, not 1000. Now, I know in 61A and other early classes you all talked about kibibytes, KiB. The real world doesn't often deal with kibi; that's something you see when you're lucky, and you may need to interpret units when you're out in the real world. Usually, when you're talking about memory and somebody says kilobytes, it's 1024. We'll try to make that clear on an exam, but if you're out in the real world and somebody talks about kilobytes of storage, you know they mean 1024. So we can do things like: how many bits does it take to address each byte of a four-kilobyte page? Well, four kilobytes is four times one kilobyte, which is four times two to the 10th, which is two to the 12th, so 12 bits total. You're all going to be great at this kind of log-base-2 stuff by the end of the term, but you should definitely get more and more comfortable with it. For instance, 12 bits here is how many nibbles, does anybody know? A nibble is a single hex digit, or four bits, so how many nibbles are we talking about here? Three. Good. So three hex digits address a four-kilobyte page. How much memory can be addressed with 20 bits, or 32 bits, or 64 bits? Well, two to the k. Use your calculator app. And of course, two to the 32 is a very common one for us these days, which is a little more than four billion. So there are some numbers that are very useful to get to know. Okay, back to address spaces. Two to the 32 is about four billion bytes on a 32-bit machine, and we typically look at that address space as going from 0x00000000 to 0xFFFFFFFF. Those 32 bits specify a specific byte within that four billion bytes. So, how many 32-bit numbers fit in this address space? And why is this figure upside down?
Well, I'm just trying to keep you guys on your toes. We'll go forward or backwards or upside down every now and then; I apologize that it's not always consistent. But how many 32-bit numbers fit into this address space? That's a question you might ask yourself sometimes. A 32-bit integer is how many bytes? Well, it's four bytes; it's 32 bits. So there are two to the 32 over four, or about a billion, 32-bit integers in this address space. Okay, what happens when a processor reads or writes to an address? Well, this is an interesting question. You probably haven't thought this through, but now you're probably sophisticated enough to know the answer. When the processor reads from a particular byte in the address space, some of it acts like real memory: if you read from it, you get data; when you write to it, you modify the data; when you read back, you get the data you modified. However, a lot of other things can happen. You could get an I/O operation: we'll talk about memory-mapped I/O later in the term, where just the act of reading or writing some address causes data to go in and out of an I/O device. Or maybe it causes a segfault: if you try to access something in the middle here, between the stack and the heap, it's possible you'll get a page fault under some circumstances. Or it could be shared memory, which we'll talk about in a lecture or so, in which case you might actually be communicating between two processes by setting up shared memory between them. So the address space is the set of things you can access, and what happens when you do varies all over the place depending on how you've set that address space up. And that's part of what we're starting today: understanding how to set the address space up.
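The address arithmetic in the last couple of paragraphs is easy to check programmatically. Here's a small sketch, assuming (as the lecture does) that a kilobyte means 2^10 bytes; the helper name `address_bits` is just for illustration:

```python
KB = 2 ** 10   # "kilobyte" here means 1024 bytes, per the lecture's convention

def address_bits(nbytes):
    """How many bits are needed to name each byte of nbytes of memory?"""
    return (nbytes - 1).bit_length()

page_bits = address_bits(4 * KB)      # a 4 KB page needs a 12-bit offset
page_nibbles = page_bits // 4         # 12 bits = 3 nibbles = 3 hex digits
space_bits = address_bits(2 ** 32)    # the full 32-bit address space
ints_in_space = (2 ** 32) // 4        # 32-bit integers are 4 bytes each
```

So `page_bits` is 12, `page_nibbles` is 3, and `ints_in_space` is 2^30, about a billion, matching the numbers worked out above.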
So this is the typical structure of a Unix-style address space, where the lower addresses typically have code in them. Then there's a stack segment that's at high memory and grows down, which in this case is up because we're inverted, and the heap grows toward the 0xFFF... addresses. And there's a big space in the middle. The program counter, or IP, depending on what processor you're looking at, points into the code segment, and the stack pointer points into the stack segment. Those are two registers, the PC and the stack pointer, which we always deal with. This idea that there's always a big hole in the middle is something you ought to keep in mind, because we're going to talk about it shortly. When we start talking about how to do virtual memory, we're going to want a virtual memory structure that lets us have holes like this. In fact, we're going to want a lot of holes in our space; this is just the most common layout, which you learn about really early in this class. One other thing that you're going to get to learn about is the sbrk system call, which is the one used to add more physical memory to the heap. What happens is you grow this yellow segment to be larger by putting physical memory in there and mapping it. And sbrk is a system call that I believe we have you implement in one of the projects; it's either two or three. So, any questions? Why do we want the holes? Well, there are two questions here. Even today, four billion bytes is a lot for most processes, unless you're doing some sort of high-performance computing, so you never want all of that space. But the processor, if it's a 32-bit processor, has the whole addressable space.
And we know about the stack growing from high memory down and the heap growing from low memory up, which tells us right off the bat that if we don't want to fill all of this with physical memory, there are going to be holes in there, just because the processor can address far more than we want to actually have real memory for. So that's one of the good reasons for holes. And the question about sbrk: yes, indeed, sbrk is called inside malloc. malloc is a user library running in user code, and it calls sbrk when it needs more memory to put on its free lists. Okay, the other thing to recall is that we talked about the notion of single- and multi-threaded processes. You've seen this particular figure over and over this term. If you recall, the address space is the protection environment; that's like the boundary around the whole box. Then there are one or more threads inside. Each of the threads has a stack and a place to store its registers, right? That's the TCB. And then there's common code, data, file descriptors, all those other common things that everybody in the same process shares. On the left, we have a single-threaded process; on the right, a multi-threaded process. We've spent a lot of time over the last several weeks talking about how to make threads work. Now we're going to talk about how to make this address space work. You can think of threads as the active component, the thing that computes, and the address space as the protected component of a process. It's the thing that prevents threads from different processes from interfering with each other. Okay. So what are some important aspects of this memory multiplexing we're considering? The reason I say multiplexing is that there's a single chunk of DRAM that's typically shared amongst a whole bunch of processes. So the question is, what are some important things to think about? One is obviously protection.
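The malloc-calls-sbrk relationship just mentioned can be modeled with a toy allocator. This is purely illustrative, with made-up names like `TinyHeap`; the real malloc keeps free lists and only falls back on `sbrk` (or `mmap`) when those run dry:

```python
class TinyHeap:
    """Toy model of malloc on top of sbrk: the allocator carves up
    memory it already has, and only asks the OS to move the 'break'
    (grow the heap segment) when it runs out."""
    PAGE = 4096

    def __init__(self):
        self.brk = 0          # current end of the heap segment
        self.free_bytes = 0   # unused tail of what sbrk gave us
        self.sbrk_calls = 0   # how often we had to go to the OS

    def _sbrk(self, increment):
        # Stand-in for the real sbrk() system call.
        self.sbrk_calls += 1
        self.brk += increment
        self.free_bytes += increment

    def malloc(self, size):
        if size > self.free_bytes:
            self._sbrk(self.PAGE)          # ask the OS a page at a time
        self.free_bytes -= size
        return self.brk - self.free_bytes - size   # start of the new block
```

Two small allocations trigger only one trip to the "OS": after `h = TinyHeap(); h.malloc(100); h.malloc(100)`, `h.sbrk_calls` is 1, because the second request is served from the leftover of the first page-sized sbrk.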
So we want to prevent access to private memory from other processes. Different pages of memory can be given special behavior, so that if you try to write a read-only segment, you're going to get a page fault. A good example of read-only is the code segment, because the code, once it's loaded in, shouldn't be modifiable in many cases. And if you can make it read-only, then different processes can actually share the same code without interfering with each other. It's also the case that sometimes we might want memory that's invisible to user programs but available to the kernel, and we can do that with mapping. The kernel data is protected from user programs, programs are protected from themselves, etc. So protection is a big aspect of this multiplexing. The other aspect that's interesting is translation, and we're going to work our way into why translation is important. This is basically the ability to take the processor's accesses in one address space, the virtual address space it sees, and translate them into the physical address space, which is where the actual bits get stored in the DRAM. When there is translation (not all processors have it), the processor is going to be using virtual addresses, and the physical memory will be in physical addresses, so we're going to be translating somehow between the two. A side effect of this is that you can use translation to help protect and avoid overlap between different processes: each of them can think it starts at address zero, when in fact they'll be pointing at different physical places in DRAM. The other thing is a sort of controlled overlap. If we have multiple processes all running together at the same time, we want to make sure that they don't actually collide in physical memory, first of all because that's going to screw up their state.
But second of all, that's an important part of protection: only allowing communication between the parts of different processes' address spaces that we choose. And of course, the default thing we told you at the very beginning of the class, which is by and large the most common thing, is that there's no overlap: no process can write to another process's address space, or even read from it. But we're going to start allowing controlled overlap, which is when they can share with each other. Now, an alternate view that might be useful for some of you is to think about interposition instead. The OS interposes on a process's attempts to do I/O operations, for instance. Why? Because you have to do a system call to do I/O, and as a result, the OS takes all of the process's attempts at I/O, intercepts them, and decides whether to allow them. The OS interposes on a process's CPU usage: an interrupt allows the scheduler to take back the CPU and schedule somebody else, right? We've been talking about that. So really, our question today is about how the OS can interpose on a process's memory accesses, and thereby come up with a uniform protection scheme. The obvious thing to realize here is that it's not practical for the OS code to take over on every read and write of the processor, because that would just be way too slow. And that's where our memory translation mechanisms are going to come into play: we're going to provide hardware support so that the OS can do its interposition and come up with a good protection model, but it doesn't actually have to look at every read and write. Now, there's a question here in the chat: is it possible for a process to demand more virtual memory than we have space for? And the answer is yes.
And then basically what happens is the process ends up getting killed with a segmentation fault of some sort. So if you ever try to exceed the amount of physical memory, the OS knows, because it's managing the physical memory, and it would end up killing off that process. Okay. So really, we're talking about interposing a protection model on a process running on a processor. If you remember, we talked about loading at the very beginning, where a program is in storage and we pull it into memory. When we pull it into memory, at that point we make it ready to run; what's stored on disk isn't always quite ready to run. There's a loading process. What I want to tell you briefly is a couple of ways in which that loading can reflect a translation and protection model in software. So I want to remind you of a few things from 61C. Here's the process's view of memory, where we have some labels and some storage that we're setting aside; this is reserving 32 words. We have a load from that data; this is sort of "load from data1 and put it into r1." And we can do some jumps and links and so on. What's on the left is assembly code, but when we load it into memory for execution, it's got to be put into binary. So this compiling and linking that you've been getting good at is really the process of turning this compiler output into a runnable binary: the compiler compiles C and produces assembly, and then it gets linked to physical addresses. For instance, if you notice here, the load at the start of the loop is referencing data1, which is a particular address in memory. That address is actually 0x300 in hex, and that's where the data is stored. This load instruction is loading from that address.
How does that work? Well, during the linking process, we put the address 0x300 into that instruction. And it turns out that you don't need the lower two bits in the actual instruction itself, so 0x0C0 is really the same as 0x300, because 0x0C0 times four gives you 0x300. The linker is the part that figures out how to take all of these references to addresses and turn this into a binary that actually runs on the CPU. And that binary has been placed in a particular place. I said here that data1 is at address 0x300. Why is that? Well, because we're putting it at physical address 0x300. So really, this program in assembly is still in a location-independent form, and the linking that we do makes it runnable in a particular part of memory, where the address of the data is 0x300 and the start is at 0x900. We've done this linking so that that load instruction knows how to get to 0x300. Questions so far? This is all a quick 61C summary. Now, what's kind of interesting is this: call this app X. What if we want to run it again, so we have two instances of the same application running, but we want to put the second one in a different part of memory? Well, we need some sort of address translation so that we can put it down here at a different part of memory and it will still run. Now, if we don't have actual translation in hardware, what we end up having to do is, for instance, translate and link this at a different address. Notice what I did here: I put this data at 0x1300, and I've altered all of the offsets in all of the instructions so they're consistent. If I do that, then I can load two copies of this thing; they've each been translated slightly differently and linked slightly differently.
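The relinking just described, patching every absolute address for a new load point, amounts to adding a fixed delta to each address field. Here's a toy sketch (the image contents and relocation indices are invented for illustration; real object files carry relocation tables that serve the role of `reloc_indices`):

```python
def relocate(image, reloc_indices, delta):
    """Load-time relocation: patch every absolute-address field in the
    binary image by the difference between link base and load base."""
    patched = list(image)
    for i in reloc_indices:
        patched[i] += delta
    return patched

# Toy 'binary': word 0 holds the address of the data (0x300), word 1
# the address of the code start (0x900), as in the example above.
image = [0x300, 0x900]
second_copy = relocate(image, [0, 1], 0x1000)   # relink for data at 0x1300
```

The second copy ends up with its data reference at 0x1300 and its start at 0x1900, self-consistent for the new load point, while the original image is untouched.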
But now they can run in the same physical memory without needing any hardware translation. So I can link this in two different ways, and when the processor happens to be running in this green part of the code, everything works, because it's self-consistent; it's been linked self-consistently. When it's running in the yellow part, it's all self-consistent and it can run. So this is the compilation and linking process, where I'm linking to two different physical parts of memory. I want to pause just for a second to let that catch up with everybody. Notice that trying to run the same app at two different places in memory causes a different binary to be loaded, in this environment we're coming up with. I'm hoping people are starting to say, "Well, that seems inconvenient," right? Because it means you're linking things at load time differently depending on where they are in memory, and it also means you can't just move these around in memory. Everybody with me on that so far? I'm hoping everybody remembers this idea of assembly language being linked for a particular load point in the address space. So there are many possible translations, and in fact, for every different place I can put this in memory, I have to translate the physical machine code differently to make sure all of the addresses used inside that application are consistent. So where do we do this translation? Well, we can do it at compile time, we can do it at link or load time, or we can do it at execution time with the right hardware support. So far, I'm showing you link or load time, when we load this into the processor. Okay. Here's something that you've all been doing but haven't really thought about. You start with your source program, and you run the compiler, which produces some assembly, which then gets assembled into an object module.
So if you look back here, the object module is the compiled, or assembled, version of this assembly before I've actually done my absolute linking. Then I can take a bunch of those .o modules, along with some others, maybe for libraries, and link them, and now I have a load module. That's kind of what we've done here; this is now a loadable module. Then the loader can get the system library involved, and I can load all that together. I can statically link libc if I want, or I can dynamically link libc. What dynamic linking really means is that there are libraries preloaded and already running on the processor, and only when I start running do those addresses get linked. So addresses can be bound to final values pretty much anywhere along this path. What I was showing you back here was binding the final addresses at the point at which we were loading; that was a combined linking and loading process. Another thing that could be done, if we did it statically like this, is we might link in the libraries at that time, for one last little bit of linking before it starts running. But that's not what you typically do. Typically, what happens is you get all the way to the loader, the loader loads it in, and then the dynamic libraries are actually linked at the time things are running. There's a little bit of code put in there instead of your call to the libc routine; think of it as a kind of stub. As soon as that stub starts running, we jump into the dynamic linker and link to a version of libc that's already running on the machine. That's how we can have a whole bunch of dynamically linked libraries that are read-only from a code standpoint and basically shared by all of the running tasks on the system.
And thereby we have a lot less memory taken up, because dynamic libraries are essentially shared across all the programs. So DLLs are these dynamically linked libraries, and they're linked at the time the program starts running. Now let's talk about uniprogramming, which is back in the old, old days, when you could only have one thing going on at once. Uniprogramming has no translation or protection in hardware at all. The application always runs at the same place in physical memory, and there's only one application running at a time. But when you take your compilation chain, link something, and make it ready to load, you have to come up with absolute values for all of the offsets inside there, just like we did back here, where we hard-coded at load time exactly all of the addresses, and that was only good for a particular load point in memory. So this is basically what we did back in the old, old days before we had more powerful machines. This is actually a bit before my time. And by the way, I flipped this again, so now all the high addresses are at the top and the low ones are at the bottom; we'll try to do that a little more consistently. The application that's running gets the illusion of having a dedicated machine, because it's got a dedicated machine. So this is not terribly interesting from an address translation standpoint; let's quickly move on and see what else we could do. Well, if we wanted to take that idea and multiprogram it, which is kind of what we've been talking about so far this term, we could do it without translation or protection by making sure that we never overlap different applications accidentally, and we have to link them for exactly where they belong in memory. And by the way, that's like Microsoft Windows 3.1 or the original Macintosh OS.
And so what we show here is that application one is running at one place and application two is running at another, and the operating system is running up in high memory. And the loader and linker combination basically adjusts the application for a particular part of memory. Okay, and the translation is done at load time; this was very common in the early days, kind of until about Windows 95 or whatever on the Microsoft side, when they started doing something more powerful. All right. Now, there's really no protection in this. Okay, so it's quite possible that application one or two could reach out and start overwriting the operating system and crash the whole system. Now, this was considered a feature, because you could get all of these various drivers and other modifications to the operating system; you could download them from all over the network and, you know, enhance your operating system to do good things. This was an early time when people were much more naive about the dangers of doing that sort of thing, and there weren't as many people out there trying to screw you up. But we've clearly moved beyond this primitive multiprogramming to something else. Okay, now the question here is: in this environment, are all jumps relative? No, they don't have to be relative, because when we link, like we did back here, we're actually coming up with an absolute set of binary code that's been configured to be exactly right to run at this particular place. So jumps don't have to be relative; jumps can be absolute, because we've actually modified things at load time to run in this particular part of memory. Does that make sense? Now, this is not to say that we wouldn't like to have a lot of relative jumps, because then there would be far fewer things that have to actually get changed on the way into the system. But let's start adding some protection. So can we protect programs from each other without translation? Of course.
So we talked about base and bound way back when, and by the way, the idea of base and bound protection came from the Cray-1 way back. This, by the way, is the Cray-1. It's one of my favorite machines, because it had this circular configuration with seats around the outside. I like to think of this as the love-seat configuration, and it was circular because it was cooled and every wire was carefully measured to make it as fast as possible. This is back when engineering came down to actually measuring wires and everything. And notice that what we've done with base and bound now is protection: if application two is running, it can't exceed the base and bound, and therefore it wouldn't be able to write into the operating system, and it wouldn't be able to write into application one. And we already talked about this, if you remember this slide from one of our early lectures. The idea here is literally that the program is busy running in this yellow segment. Here's our original program, which we thought of as going from zero up to some limit. Once we've loaded it into memory and we've linked it for that particular part of memory, then the base and bound just prevent the program from getting outside of it. The CPU is basically just running instructions, and it might issue an address like 10100. And what happens is that address, before we allow it to go to DRAM, is just compared: is it greater than or equal to the base, and is it less than the bound? If so, we let it go forward; otherwise we fault. The base and bound in this instance is a feature of the hardware, because as the addresses come in from the processor, we are actually checking them in hardware before we allow them to happen. It requires hardware support, but this hardware support is very simple.
But the OS has to set the base and bound registers in order to make this work. Now, it requires a relocating loader to work. We talked about that already: you have to be able to take your program and relocate it so it's runnable starting at 100, et cetera. And notice there's no actual addition on the address path, so this is still fast. We're just checking off the edges to see whether we should allow that access to go forward or not. Now, we talked about this. This is fine and dandy, but it still requires this relocating loader. Wouldn't it be nice if instead we could just come up with one linked version of the original program that could run no matter where it was in memory? But to do that, we need to start doing our relocation in hardware rather than in the loader. So up till now we've been talking about doing this relocation and final linking in the compiler, but now let's see if we can do this with translation. In general, the idea of translation, which we've also brought up a couple of times this term, is that you'd have a CPU that's busy using virtual addresses, and those addresses go into something like a memory management unit, which translates from the virtual addresses the CPU is using to physical addresses. Now, suddenly there are two views of memory. There's the view from the CPU, or what the program sees, and we call that virtual memory. And then there's the view from the memory side, which is the physical memory. So if we were to ask ourselves where every bit is actually stored, well, it's stored in the DRAM somewhere, and there's a physical address, but that physical address for that bit is different from what the CPU uses. And there's a translation box in the middle, and that translation box is kind of the topic for the rest of the lecture. We're going to talk about what's in the translation box.
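The base-and-bound check described above can be sketched in a few lines; the register values here are invented examples, not ones from the slides. Note there is no addition on the address path, just a comparison:

```python
# Sketch of the hardware base-and-bound check: no translation at all,
# just a comparison on every access before it reaches DRAM.

class ProtectionFault(Exception):
    pass

def check_access(addr, base, bound):
    """Let the address through untouched if base <= addr < bound, else fault."""
    if not (base <= addr < bound):
        raise ProtectionFault(f"{addr:#06x} outside [{base:#06x}, {bound:#06x})")
    return addr   # address passes through unchanged, so this stays fast

# A program relocated to run at 0x1000 touching an address inside its sandbox:
print(f"{check_access(0x1010, base=0x1000, bound=0x2000):#06x}")
```

Any address at or above `bound` (or below `base`) faults instead of reaching memory, which is exactly the "checking off the edges" behavior.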
And as you might imagine, as the CPU produces virtual addresses and we translate them into physical addresses, something in the middle here takes a little bit of time, because it's hardware. And so the virtual address goes in, we take some number of nanoseconds, whatever it is, and what comes out is a physical address. Now, the question is, is there only one translation per system? No. Once we have a general translator, we can translate any way we want. In fact, we're going to talk a lot about what those translations look like. And so you can map addresses any way you want once you've got the flexibility, and that's the important point here. Notice, by the way, I also show these untranslated reads or writes. So typically you can go around the MMU, but only if you're in system mode. Okay. So once we have translation, now it's much easier to implement protection. Clearly, if task A can't even gain access to task B's data, there's no way for A to adversely affect B. Okay. Now, there's a question about the MMU: does the MMU traverse the page table faster than you could otherwise? I'm not sure I understand the question entirely, but I will say the following. The MMU is doing this translation, and it's very important that that translation in general be very fast. It's got to be faster than the cache, hopefully; otherwise, what's the point? We'll get into speed a little bit later. Once you have things like page tables and so on, then the MMU is going to occasionally be slow, and that's when caching is going to come into play. All right, so we'll get to speed later; you're going to have to just hold off on worrying about speed for a moment. Let's assume it's infinitely fast, and we'll come back to that later.
So once we've got translation, though, every program can be linked or loaded to the same region of user space, and so every process can pretend that it's got address zero and it's got address 50,000, because every time we have a different process that we give the CPU to, we change the translation. So now we can give every process the illusion that it's got address zero, which means that all of our linking and loading can now be done once, no matter where the thing's going to run physically, because the virtual address space always looks the same no matter where it's loaded physically. Okay, so that's a huge advantage. Now, this was the simple thing that we started with, where we said, look, rather than just checking, we're still going to have base and bound, but now what we're going to do is actually translate addresses on the fly, by taking the program addresses coming out of the processor and adding a base to them. Okay, so notice that if I have program address 00100, what I do is add the base to it with an adder, and what comes out is a higher address, because I'm adding a base to it, and that's the physical address. So what we've got here in blue are physical addresses, and what's coming out of the processor are virtual addresses, and the way those are related is with a very simple addition operation. Okay. And if you notice the base and bound: the base is related to this translation, and for the bound, typically we check the addresses coming out of the CPU to make sure they're not too big, and then we add the base to them, and then we let things go forward. Okay. The good thing about this mechanism is it's very simple, we just have an adder, and the original program, as I mentioned, can be linked once no matter where this yellow piece ends up. It always looks the same, because we do this translation. All right. Questions? Now, this is hardware relocation. Why?
Because we have hardware, that little plus, that's relocating for us. And now we can still ask the question: can the program touch the OS? Well, no, because we don't let the program addresses go below zero, so it can't get up here, and we don't let them go above the bound, so basically we've got a little sandbox here that the yellow code is forced to be in, and it can't screw up the OS. Okay. Can the base be negative? No. These are unsigned operations here. Okay. Can it touch other programs? Well, no, because no matter whether another program is in memory above or below the yellow thing, we still protect it by checking for not going below zero and staying less than the bound. All right. Okay. So this is pretty simplistic, and you might imagine that if this were all there is, there wouldn't be a whole lecture on it. So clearly something more complicated is going to be needed, and let's start looking a little bit at these ideas. So one of the problems with a simple base and bound is the following. Here I'm showing you a chunk of memory. We have the OS, and we have a couple of processes, six, five, and two, that are running. Over time, what happens is, well, process two finished, and so that left a big hole, and then process nine showed up, and then process ten showed up, but process five left, and now process eleven comes along and we can't find enough space, because even though process eleven would fit in the sum of the empty areas, there is no single area that's big enough for it to fit. And if you think about this simple base and bound that I've got here, it requires the memory to be contiguous physically. So we have to actually find a chunk of DRAM that's big enough to handle all of our data.
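This add-the-base relocation is also easy to sketch; again, the register values are invented examples. The only change from the check-only scheme is the adder on the address path:

```python
class ProtectionFault(Exception):
    pass

def translate(vaddr, base, bound):
    """Bound-check the virtual address, then relocate it by adding the base.

    Virtual addresses start at zero and are unsigned, so there is no
    'below zero' case to check; only the bound matters."""
    if vaddr >= bound:
        raise ProtectionFault(f"virtual {vaddr:#06x} exceeds bound {bound:#06x}")
    return base + vaddr          # the little plus: hardware relocation

# The same program linked once for virtual address zero can be loaded
# anywhere; here it happens to sit physically at 0x1000:
print(f"{translate(0x0100, base=0x1000, bound=0x0800):#06x}")
```

Loading the identical binary at a different physical spot just means writing a different value into the base register; nothing in the program changes.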
Okay, and that's just asking for trouble, because suddenly we've got a fragmentation issue. And really, why is there a fragmentation issue here? It's because not every process is the same size, and as a result of the different sizes, suddenly we get fragmentation. The only fix for this is going to be, well, we would have to copy processes nine and ten and push them up in memory, and then we could make space for eleven or whatever, but there's going to be a lot of memory copying going on, which is expensive, and that's just to try to coalesce things together and get out of the fragmentation issue. Okay, so we're also missing support for sparse address spaces here. If you think about what I've just told you, look at process eleven: process eleven's chunk has to be contiguous, and so, if you remember, just a few slides ago I said, oh, we want to be able to have a hole between the heap and the stack, because that's part of the way we use this. Well, we can't get a hole between the heap and the stack given this base and bound idea, because the memory is contiguous here. Okay, so that already tells us that maybe we need something different. The other thing is that it's hard, or very hard, to do any sharing in memory between two processes, because by definition process five, for instance, isn't allowed to access any memory outside of its chunk, including the OS. So the only way that five could communicate with six is maybe you could set up a pipe, where you had to do a system call into the OS and then that would call back into process six. So pretty much we've forced ourselves to do inter-process communication entirely by going through system calls, because of the way we've set up this memory sharing. Okay, so one thing we could do, which is done, is rather than one segment per process, we could have multiple of them. And you're already very familiar with this: you got to play with gdb on your very first homework, homework zero, right? And what you saw was that there are a bunch of different segments that represent things
like stack and program and so on. And what we can do is have each segment be a contiguous chunk of memory. So the user's view is that there are a bunch of individual segments that are kind of floating in space; the physical view is that they're contiguous chunks in memory. Okay, and once we do this, then we can start talking about, well, this green segment four: we'll just map that same segment four into two different processes, and now suddenly they're sharing memory and they can communicate with each other. Okay, so the mere act of having more than one chunk of memory suddenly gives us much more flexibility. Now let's talk a little bit about what's in the MMU to give us multiple segments. We already talked about base and bound as being single registers, but now we need to have multiple bases and bounds, or as I have it labeled here, base and limit. So here I'm showing eight of them; they might be loaded into the processor just like they were with the single base and bound, and so the segment map is in the processor. And now what do we do? Well, we can take the virtual address and have some segment bits at the top of the address, which we split off. How many bits are we going to need up here in order to have our eight segments, if we do it this way? Anybody figure that out? Three, very good. Why? Because two to the three is eight, right? So the top three bits we use to pick the segment; the rest of the bits we'll call the offset. And so the offset will get added to the base, and that will give us a physical address, which we'll then check against the limit to make sure we're not too big; that'll give us an error if we've gone too big. And now suddenly we've got a multi-segment base and bound, and this is a little more flexible, right? Because just by having base and limit be the same for different processes, now suddenly we have a chunk of memory that can be shared. And we have as many chunks of physical memory as there are entries here. And I don't know if you
noticed, but I've also got these valid or not-valid bits, and so some of these segments might be intentionally not set up as valid. Okay, now, how does this relate to segment registers in the x86? This is very similar, with one slight difference: notice that in this particular model we grabbed the top three bits of the address and used those to tell us which segment register, whereas in something like the x86 model those bits actually come out of part of the instruction encoding. You take a few bits off, and that tells you which segment register you're dealing with, rather than having to pull some bits out of the virtual address. It's absolutely the same idea, though. All right, and so if you were to look at what's going on inside of an x86 processor, you know, this is using the ES segment: it basically decides which segment it is based on the encoding, and then it uses that to look up in the segment table. Okay, so what's the V or N? This is whether it's valid or not valid. So typically, when we do an access, we're going to not only look up the base and limit and check the offset, giving us an error, but we'll also check the valid bit, potentially giving us a different error if we try to access a not-valid segment. Okay, so suddenly we're getting into an interesting model here, which isn't quite where we want to be, but it's starting to show you all the major interesting aspects of a translation scheme, where there are certain requirements on the addresses, namely that we can only talk about segments that are currently valid; there are also certain constraints on the offset, in this case that the offsets can't be too big, et cetera; and we're starting to look, like I said, at certain access requirements of valid or not valid. We'll get more sophisticated here in a moment. Okay, questions? Now, how large are segments? Well, suppose this is a 32-bit address and we take three bits off the top. What's the biggest segment that we could have? How
do we do that? Yep, two to the 29th, exactly. So this particular scheme could have a really large segment, right? And the maximum size of the segment has to do with the maximum size of the limit in this case. Okay, good. Now, here's the x86 model. The original 80386 introduced protection but also had these segments, and there are six of them that you're well familiar with: the code segment, stack segment, and then four other segments that are typically used. And this is a typical segment register, just like the green ones from the previous page. It's not quite the same, but it's close. The index in the segment register actually points into a table that then looks up the set of segment descriptors that you have access to. Okay, so this is just one little level of indirection, but it's pretty close. And so what's in CS, for instance, is these 16 bits; the index is used to look up in another table to get the base and limit, and then there are a couple of other things. So, for instance, the RPL level is what level you're executing at: are you executing at kernel level or user level, for instance? Remember, there are two bits here because there are four levels. Segmentation is fundamental in the x86 way of the world, okay, and so you can't just turn segmentation off; it's in every access. So, you know, if you were to look at every instruction, there is some segment portion of that instruction, and it may be implicit or it may be explicit, but there's always a segment portion of the access when you're dealing with x86. What if you just want to use paging or some other flat scheme? We'll talk about that in a moment, but typically what you do is you set the base to zero and the bound to all of memory, and now you've effectively said, I'm not going to worry about my segments anymore, because I'm effectively treating them as pointing at all of memory. Okay. All right. And by the way, when you get
into the 64-bit version of the x86 scheme, all but the top two segments, FS and GS, all of the other segments effectively have a base of zero and a limit of two to the 64th, so they're essentially non-functional. We can talk more about that a little bit later. Okay, so I want to give a very simple example here, again just to walk us through. So if you look here, there are four segments, okay, one, two, three, four. That means two bits taken out of the address, out of a 16-bit address, so this is a really small address space. Here's the virtual address space, here's the physical address space, and I want you to notice that I've divided this into things that start at 0x0000, 0x4000, 0x8000, and 0xC000. And if you think that through, if you strip off the top two bits: at 0x0000 the top two bits are all zeros, here the top two bits are 01, here the top two bits are 10, and here the top two bits are 11. Now, if what I'm saying is a mystery to you, it's time to start reviewing your hex. You should get to where you know zero through F very cleanly and you know exactly what the four bits are, so that you can strip them off easily. But we'll assume for the moment you know that. So what that means is: for the segment with ID zero, where the top two bits are zeros, I look up here and I say, oh, segment ID zero has a base of 0x4000 and a limit of 0x800. So that means that this little pink chunk here gets mapped to this little pink chunk in physical space by this scheme. Similarly, this chunk of cyan, or I should call it magenta, I suppose, this little chunk gets mapped to this chunk. How do I know that? Well, because 0x4000 has 01 in the upper two bits, and segment ID 01 has a base of 0x4800 and a limit of 0x1400, which means that it goes from 0x4800 to 0x5C00. Okay, and similarly, et cetera. And we can start talking
about, oh yeah, this yellow chunk is something we share with different apps, or whatever. There are lots of ways to decide how to put controlled overlap into your use of physical space once you have the ability to do this mapping. So mapping is pretty powerful. And the other thing to keep in mind is that this green table, which is in the processor, needs to get swapped every time I change from one process to another, because I'm changing the address translation. So when I change from one process to another, I save out the green table for one process and I load in the green table for the other. Okay, now I want to give you another example of translation. So here is some assembly, and here are some virtual addresses. Blue here is virtual, because the processor is going to be running there. Here are my segment registers. All right, and we are going to pretend to be processors. I don't do this too often, because it's time-consuming in class, but let's simulate a bit of this code to see what happens. So let's start the program counter at 0x0240, and notice that's a virtual address, because this is what's in the actual program counter of the processor. Okay, so the program counter has 0x0240, and the question is, if it wants to load the next instruction for that program counter, what happens? Well, it's going to fetch at 0x0240. What happens in the MMU is it takes this address, which is 16 bits, notice. And by the way, this is 0240, so 0 is 0000, 2 is 0010, 4 is 0100, and 0 is 0000. Again, maybe you want to write out hex to binary and put it under your pillow and sleep with it till it absorbs into your brain, if this is not something you're comfortable with. But once I translate the address into bits, I can look and say, oh look, the top two bits are zero, which means I'm in the code segment, virtual segment zero. What's the offset? Well, the offset is pretty much everything else: if I take the top two bits out of there, what's left over is still 240. And so what I
do is take my base, which is 0x4000, and add it to the offset of 0x240, and what do I get? 0x4240. Voila. So the physical address has been translated: this virtual instruction fetch goes to 0x4240, and at that point I fetch from DRAM at 0x4240, and I get that instruction, which is a load-address of var x into $a0. So now I've got the instruction loaded. Okay. All right, and what I want to do is load this var x address, which is 0x4050, into register $a0. Now notice 0x4050 is a virtual address, and I'm loading it into register $a0. Question: do I translate the 0x4050 into a physical address before I load it into register $a0? Can anybody tell me whether I do that or not? Good, no. Why? Everybody who said no is correct. That's right, good: the processor only sees virtual addresses. So this is a virtual address, 0x4050, and it gets loaded into $a0. Great answer. So I don't translate, because I'm not going to DRAM here; I'm just loading a constant address into $a0. Okay. Now, the next instruction we're going to fetch: well, we were at 0x0240, now we're at 0x0244, because we incremented the PC by four, because in this RISC-like processor the instructions are 32 bits in size. We translate the 0x0244 into 0x4244, which is exactly what we just did in the previous step. But the next instruction we bring in is the jump-and-link to strlen. Okay, and we're going to jump to where strlen is, and once again we're going to move the 0x0248, which is the return address after the jump-and-link, into the return-address register. In typical RISC processors, like the RISC-V that you guys dealt with in 61C, there's a return address register, and that return address is going to be 0x0248, which is once again a virtual address. And since we're jumping to virtual address 0x0360, we put that into the program counter. Now, I want everybody to appreciate the fact that the only time we translate is when we go to DRAM, which so far has only been when we load the instructions: we have to figure out where those instructions
are stored in DRAM. Okay, now we get to strlen, where we're going to load a value of 0 into $v0. So we translate: the physical address here is 0x4360, and we fetch it. We're going to move the constant 0 into $v0, that's what that instruction does, and we're going to increment the PC by 4. And the last thing we're going to show you here is fetching this next instruction. So far, the only time we've done any virtual-to-physical translation is when we load the instructions, but now look at this: this instruction is a load byte. So not only do we fetch the instruction from 0x4364, but now we want to load the value at the address stored in $a0. So we have to take that $a0, which is 0x4050, and translate it. Well, 0x4050 looks like this in binary: 0100 0000 0101 0000. This is virtual segment one, because the top two bits are 01. We get the entry, which tells us the base is 0x4800; therefore the physical address is 0x4850. All right, and then as a result we load a byte from 0x4850 into $t0, et cetera, and we're good to go. Okay, now if you notice, we actually did do a virtual translation, right? We figured out the top two bits, the bottom bits are this 0x50, and when we add the base plus the 0x50, we get 0x4850. So this is showing you the translation going to DRAM both when we're loading instructions and when we're doing data accesses. All right, I realize that was a long process, but I just wanted to talk through it once in class so that everybody has seen it once. Okay, do we have any questions on this before I move on? Okay. Now, does the OS have special instructions that access physical addresses directly? Yes, in most cases there's a way to go around the MMU. The other thing that the OS has access to is this green set of registers. Only the OS ought to be able to modify these, which means only when you're in system mode, not in user mode, are you allowed to change the green registers. Okay, all right. So what are some observations about what we just did there?
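That whole walkthrough can be replayed with a tiny simulator. The two segment entries below are the ones used in the example (code: base 0x4000, limit 0x800; data: base 0x4800, limit 0x1400); the other two segments of the table are omitted here:

```python
# 16-bit virtual addresses; the top two bits pick the segment.
SEGS = {
    0b00: (0x4000, 0x0800),   # code segment: base, limit
    0b01: (0x4800, 0x1400),   # data segment: base, limit
}

def translate(vaddr):
    seg    = vaddr >> 14          # top two bits select the segment entry
    offset = vaddr & 0x3FFF       # remaining 14 bits are the offset
    base, limit = SEGS[seg]
    assert offset < limit, "offset exceeds segment limit"
    return base + offset          # physical address

print(hex(translate(0x0240)))     # the instruction fetch
print(hex(translate(0x4050)))     # the load byte through $a0
```

Running it reproduces both translations from the walkthrough: the fetch at virtual 0x0240 goes to physical 0x4240, and the data access at virtual 0x4050 goes to physical 0x4850.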
So we're translating on every instruction fetch, load, or store. Okay, that's fine. The virtual address space has holes in it; that's good, right? If we look here, we've got some holes in the virtual address space, and that may be getting us part of the way to where we wanted to be. When is it okay to address outside a valid range? Well, this is how the stack grows. If we look at our stack back here, and I'm going to go back to this previous figure, we might have the stack with base zero and limit 0x3000, so that's this green chunk. We could figure out that if the process tries to go outside of that, it effectively wants to grow the stack, because it gets a fault (a segmentation fault in this case) by trying to access an illegal address that's outside of the limit. The OS could take that as an indication that it needs to put more physical memory in there, and it could grow the segment in that case. Okay, but you can see that there are some limits to what you can do, because you can't run into another segment. All right, the other thing is, clearly we need protection modes in the segment table: for example, the code segment might be read-only, data and stack might be read-write, et cetera. So we want to start putting protection bits on the different segments. What do you have to save and restore on a context switch? Well, in this particular scheme that we came up with, the segment table is stored in the CPU, not in memory, because it's small, and therefore every time we switch from one process to another, we need to store the green segment table out to memory and pull in the green segment table from memory for the next process. And if we want to put a whole process on disk, we have to swap it all out; we'll talk about that in a second. Okay, all right. Now, what if not all the segments fit in memory? So if I take the set of all processes that I want to run and they need more physical memory than fits, one option is to just kill them off if they don't fit. A less drastic option is in fact to do swapping. This
is an extreme form of context switch where you take whole segments and send them out to disk, so that other processes can use the physical memory. Okay, now of course the cost of context switching is a lot worse in that case, because you've got to go to disk, and you remember what number I told you guys: a disk access is like losing a million instructions' worth of time. Okay. And notice that because of the way we set up our segments, this is extremely inconvenient: they always have to be kept together as a whole. So if you look at that green chunk of memory, it doesn't matter how big it is, the whole thing has to be swapped out to disk; we don't have any option of putting just part of the segment out there. So you might imagine that we need something better here, because this is not quite what we need so far. A desirable alternative to swapping everything might be some way to keep only the active portions of segments in memory at any given time and swap out the ones that are idle, and that needs something better than this whole-segment-at-a-time thing that we've gotten ourselves into; we need something of finer granularity. Okay, so the problems with segmentation are these. One, you must fit variable-size chunks into physical memory, leading to fragmentation. Two, you have to move processes around multiple times just to deal with fragmentation; remember, I showed you that you had a set of processes, some left, you added some new ones, and pretty soon your memory is all fragmented, and your only option is to move stuff around, and that seems inconvenient. Three, you have to swap the whole thing to disk. And so really there are multiple levels of fragmentation that are bad here. And just to remind you guys of the different types of fragmentation, there's external and internal fragmentation. External fragmentation says there are gaps between allocated chunks of memory that need to be coalesced together, and that's what we're really talking about here. Internal fragmentation says
you've allocated a chunk of memory and you don't need all the memory within the chunk; it's possible that we allocate our segments larger than we needed, and now we've got fragmentation inside of them. But this external fragmentation is clearly a major problem with our model so far. So that leads to this picture, which I've shown you several times this term, and the idea there is that we want a smaller quantum of stuff, right? So we want to go through this translation, but rather than having whole segments' worth of chunks, we divide the data into lots of little pieces, which we're going to call pages, and translate each one of them separately, and now we have a lot more control over placement. And so that's going to be general address translation, not just this simple base-and-bound segmentation that we've been talking about. Okay, so we're going to do fixed-size chunks, and this is a solution to fragmentation first and foremost. Physical memory is now going to be in page-sized chunks, and I'll tell you right off the bat, a page is typically somewhere between 4K and 16K bytes; let's think 4K for a moment, which is, you know, four times 1024. Every chunk of physical memory is now equivalent, and so therefore you can just use a vector of bits to handle allocations. There's no longer this weird business of keeping track of all the free segments and their sizes and then figuring out whether you have to coalesce them together, et cetera. Now pretty much any chunk of memory is the same size as any other chunk of memory, and so really we only need one big bitmap that tells us which ones are free and which ones are in use. So that seems advantageous, right? Should the pages be as big as our previous segments? Well, no, because that clearly led us into some problems with fragmentation. So what we want is to have smaller pages. Now, the original pages in original UNIX were kind of in the 1K size range; you can get up to 16K; we're going to think about 4K, which is kind of
in the middle here okay and so that means that what we're calling segments like the stack segment or the code segment or whatever is really comprised of a bunch of individual pages that we're then going to put together into a virtual memory space for the processor to access okay and so our MMU memory management unit is going to do something more than just this base and bound translation it's actually going to translate from one page set of pages to a different set of pages from virtual to physical so how do we get simple paging okay so this is our first chunk first try at this right so rather than having a register set of registers inside the processor which gives us the base and bound we're going to change gears for a moment and actually have a single register called the page table pointer and it's going to point at a chunk of memory now that's going to have a set of pages to translate okay and the page table for a moment we're going to have one page table per process and it's going to have a single page translation in it called a page table and that's going to be stored in memory okay and for those of you that are thinking ahead this is not quite what we want yet we're going to get to what we want next time but we're going to get closer this time okay so these this green portion now resides in physical memory not in the registers of the processor it contains physical page and permissions for each virtual page so if you notice page zero here is valid and read only page two is valid and can be both read and written et cetera okay page four is not valid and how does our virtual address mapping work well here's our address we're going to take the top set of bits and that's going to be our virtual page number and the bottom set of bits are going to be the offset and this offset is going to have enough bits for our page size so we've decided on a four kilobyte page which means the offset is 12 bits and the virtual page number is pretty much anything else it's all the 
And now the offset part of our translation is really easy, because all we do is take it out of the virtual address and copy it over to the physical address: the low 12 bits of the physical address are exactly equal to the low 12 bits of the virtual address. Then the virtual page number is used as an index into the page table. So if those remaining bits happen to be 00001, that represents page one: we take the virtual page number, look it up in the page table, and it gives us the physical page number, in this case page one, which we copy into the physical page frame field, and now we've got our physical address. Take the virtual page number, look it up in the page table, copy the physical page number into the physical address, and we're good to go. And by the way, if you look at this with 1K pages for a moment: if the offset is 10 bits, then you have 1024-byte pages, and it's those 10 bits that get copied. The remaining bits, well, if it's a 32-bit address, then 32 minus 10 is 22 bits, so there are 4 million possible entries, and those 22 bits index into the page table, one of 4 million options for which page it is, and we look it up. And of course we've got to check bounds: this page table is only so big, and in this case there are only six entries, so the page table size check says that if the virtual page number is too big we get an error. And if I try to do something the permission bits don't allow, like writing when the entry only allows reading, then I get an access error. Now, the OS gives every process a page table pointer, that's exactly correct, and the way we've set this up so far, every process also gets a page table size as well. So they get a pointer and a size, kind of like base and bound, but now with a level of indirection at page granularity. And by the way, by the time we get to next time, we're no longer going to have a page table size, because this is not going to be quite what we want.
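Here's the lookup just described as a short Python sketch. The three-entry table, the permission strings, and the exception names are all invented for illustration; only the bit manipulation follows the lecture:

```python
OFFSET_BITS = 12                 # 4 KB pages
PAGE_SIZE = 1 << OFFSET_BITS

# Each entry: (physical frame number, permission string), or None if invalid.
page_table = [(3, "r"), None, (7, "rw")]

def translate(vaddr, access="r"):
    vpn = vaddr >> OFFSET_BITS           # top bits: virtual page number
    offset = vaddr & (PAGE_SIZE - 1)     # low 12 bits copied straight through
    if vpn >= len(page_table):           # bounds check against page table size
        raise LookupError("virtual page number out of range")
    entry = page_table[vpn]
    if entry is None:                    # invalid entry
        raise LookupError("page fault")
    frame, perms = entry
    if access not in perms:              # e.g. writing a read-only page
        raise PermissionError("access error")
    return (frame << OFFSET_BITS) | offset
```

For example, reading virtual address 0x2ABC picks virtual page 2, which maps to frame 7, giving physical address 0x7ABC; the offset 0xABC is copied unchanged.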
Now, when the process wants something more than a page, that's not a problem, because we just use page zero and page one contiguously: we find frames for them in physical memory, and all of a sudden the process has two pages' worth of virtual addresses that are physically backed. That's the nice thing about paging: you can place physical data any way you desire to give yourself whatever set of contiguous virtual addresses you like. This is ultimately flexible. I hope that answered the question. So let me show you a very simple page table example. This is a silly one, but it gives you the idea: we have four-byte pages. A four-byte page means we only have two bits of offset, and the rest of the address is the page number. So if we have eight-bit addresses, and two bits are the offset, then the top six bits are the page number. Page zero's entry tells us the base is 000100, which is the number four, so we copy 00 into the offset, and that tells us that this pink set of virtual addresses turns into this pink set of physical addresses. We can do the same with the blue and the green, where, notice, the blue one, the cyan one, is page 00001: if you take 0x04 and split it out, this is page one, which turns into physical page three, and that's why this chunk of cyan maps to that chunk in the physical space; the green one is similar. Now we might ask, where's address six? Well, if we split six into the two-bit offset and the page number, we see the page number is still one and the offset is 10, or two, so we're still in this blue region; address six is between four and eight, so that makes sense. All we do is take 000011, that's our new physical page, and we copy the offset to the offset, and we're good to go, and that's over here.
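You can check the arithmetic of this toy example in a few lines of Python. The frame numbers here are my reconstruction from the walkthrough (virtual page 0 maps to frame 4, page 1 to frame 3, page 2 to frame 1):

```python
PAGE_SIZE = 4                    # 4-byte pages -> a 2-bit offset
page_table = [4, 3, 1]           # frame numbers for virtual pages 0, 1, 2

def translate(vaddr):
    vpn, offset = divmod(vaddr, PAGE_SIZE)   # split into page number + offset
    return page_table[vpn] * PAGE_SIZE + offset
```

Virtual address 6 lands in page 1 at offset 2, so it maps to 3 * 4 + 2 = 14, and virtual address 0 maps to 4 * 4 + 0 = 16, just as in the pink region of the figure.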
The same is true with nine. What's nine? Nine is 00001001: the lower two bits are copied, and the upper six bits tell us which page we want; that entry gets copied over, and we find out that we're up here. Virtual address nine turns into physical address five in this translation scheme. Okay, questions? Good question: if I fragment pages across physical memory, does it matter? It depends on what you mean by that, but let's assume we're looking at this figure. Notice the processor sees three pages in a row that it can use, and it could have a data structure that spanned all three pages if it cared; that's okay. And notice how those pages are split all over the place in physical memory. The answer is that it doesn't matter from the translation standpoint. However, there are certain cases where the DRAM might be a little faster if things are next to each other, but that's going to depend a lot on your architecture; by and large I would say it doesn't matter how scrambled they are in physical memory, because the processor gets to use them contiguously in virtual memory. And when we get more into performance, it's going to be more about which of these pages are on disk versus in memory than about how they're scrambled amongst each other. The other question is, will it ever be the case that a data structure is only partially loaded if it spans multiple pages? Yes: it's possible that this blue page is going to be out on disk while the pink one isn't, so if I start reading a data structure in memory and I get to the blue part, that may cause a page fault, which has to pull stuff off of disk, and we'll get to that next time as well. Now, what about sharing? I want to show you a little bit here. Here's an example: process A, here's its page table, and I'm going to map its page number two to some chunk of physical memory.
Process B has a different page table pointer and a different page table, and it's going to have an entry that ends up mapping to exactly the same physical page. Because we did that, processes A and B can share data by writing in their shared page, and each can see what the other wrote. I hope you all see something weird about this, though: process A sees that data at one set of addresses, namely up top here at 00010, while process B sees that same data at a different place, namely 000100. These two virtual addresses map the physical page to different places. So you would never really want to do this, probably, unless you had some sort of data you were sharing that didn't mind where it was. If this is a linked list, you want to make sure that the mapping in the two page tables is at the same virtual address in the two processes, and we'll talk about how to do that when we get to setting up shared memory segments. Now, bear with me for just a second: where do we use page sharing? All over the place. The kernel region of every process has the same page table entries. The process can't access those pages at user level, but when you do a user-to-kernel switch, the kernel can access them. We're going to talk about the Meltdown bug next time, and that will be an interesting issue we'll have to discuss, but for now, the kernel can share the same pages between different processes. Different processes running the same binary: we talked about that earlier, but whatever your favorite editor is, if we have emacs running twice on the same machine, all of that code is read-only and mapped to the same shared set of pages between the two processes. Their code segments actually end up mapping to the exact same physical DRAM, and those two processes can run away happily sharing the same code. I may have started a holy war by saying emacs versus vi; by the way, I'm a big fan of emacs, so I apologize to all of you out there.
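As a rough user-level analogue of two page tables pointing at the same physical page, POSIX shared memory lets two mappings refer to the same underlying pages. This Python sketch attaches twice to one segment; both handles live in a single script purely to keep it self-contained, where in practice the second attach would happen in another process:

```python
from multiprocessing import shared_memory

# "Process A" creates a shared region; "process B" attaches by name.
view_a = shared_memory.SharedMemory(create=True, size=4096)
view_b = shared_memory.SharedMemory(name=view_a.name)

view_a.buf[0] = 42            # A writes through its mapping...
seen_by_b = view_b.buf[0]     # ...and B sees it through a separate mapping

view_b.close()
view_a.close()
view_a.unlink()               # remove the segment when done
```

Note that the two mappings may sit at different virtual addresses, which is exactly why storing raw pointers inside such a region is dangerous unless both sides map it at the same place.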
So, user-level system libraries are another great example of sharing. We share dynamically linked libraries, which I mentioned earlier in the lecture, and the way that works is that the actual library code is shared and linked into every process automatically. And you can have shared memory segments between different processes, usually mapped at the same point in their virtual address spaces, as a way to let processes literally talk to each other through shared memory; you can share linked lists and objects and everything. Now, the memory layout for Linux is kind of like this, which is a little different from what we've been talking about. Typically the kernel space is up high: the top one gigabyte in a 32-bit machine is for the kernel, and the rest is for user code. And although we've been talking about the stack starting at the very top, in fact it starts at a random offset, and things like dynamic libraries are at a random spot, and the heap is at a random spot. The reason this randomness is introduced into the starting points is that it makes it a lot harder for an attacker who breaks into your process or the kernel to find your data, because you've moved it all over the place. And notice, just from this figure, all of these holes in the address space. So here's a thought from now until next time: what we've come up with doesn't work with holes very well, because this page table is contiguous, and if we have a whole bunch of pages we need with a bunch of holes between them, we need enough page table space to cover everything. So this is going to be a problem. And are these holes used? Well, right now they're not used for anything. In the stack, those holes are going to help signal that we need to put down more physical memory when we try to go below the currently assigned stack, so yes, the holes can be used once they've been mapped. Okay, I should let you guys go soon, but one last question:
you can have more than one page for the stack, but we'll only put down the minimum for now. Okay, so we'll talk about some of these other interesting questions later; I want to let you guys go for now. We talked a lot about segment mapping: segment registers within the processor, where the segment ID is associated with each access, maybe because it's a couple of bits in the address or because it's in the actual instruction; every segment has base and limit information; and segments in some cases can be shared. We started talking about page tables: in this case memory is divided into fixed-size chunks, the virtual page number is pulled out of the top of the address, the lower part is the offset, you just copy the offset over, and you translate from virtual page number to physical page number. Unfortunately, right now we have really large page tables because of the way we've done this; next time we'll have multi-level page tables, where we can deal with sparseness much better and the page tables are much less overhead. All right, we'll let you guys go for now. I hope you have a great night, and we'll see you on Wednesday.