Let's get started — it's a fairly busy lecture today, so let's try to stay on time. I'm going to start off by reviewing the lecture from last time. Actually, how many people were able to view the video since last week? Maybe about half. All right — please take a look if you haven't already. I'll review a little bit of it now, but you should watch the whole thing.

The main topic for today is address translation, and we'll be talking about it for two lectures: we'll cover the basic methods this time and get into more advanced topics next time. Address translation is going from each process's virtual view of memory to the physical memory. We'll cover segmentation — how the different areas of memory in each process are kept separate and maintained; paging — how virtual blocks are mapped to physical blocks in manageable-size chunks; more advanced translation from virtual to physical addresses, for higher efficiency and to deal with really large address spaces like 64-bit ones; and then paged page tables, a more advanced idea, and inverted page tables, which show up on the IA-64 architecture. Here's the link to the YouTube video — you don't have to copy it down; it's near the top of the class home page right now, and there's also a link on Piazza.

So, in the last lecture we were looking at scheduling: two different schemes, first-come-first-served — sometimes called FIFO; it's equivalent to FIFO — and round robin. First-come-first-served means simply that a process comes in, it goes into a queue, and when resources are available it runs to completion. In round robin, time slices of some size are allocated in turn to the waiting processes, and once every process in the ready queue has had a slice, if they're not finished you go around again — that's the reason for the name.

Let's quickly look at an example; even if you haven't seen the video, this should be fairly intuitive. Imagine we have ten jobs, each taking 100 seconds of CPU time, and a round-robin scheduler with a quantum of one second, so it hands out chunks of one second in round-robin fashion to each of the ready jobs. First-come-first-served doesn't have any parameters — it just runs every job to completion. If we number the jobs in queue order, P1 through P10, then with first-come-first-served you get a plot where each job takes up one massive chunk of CPU time and then stops. That contrasts with the round-robin approach, where each process gets a little slice — you context-switch each time to give each process a little bit of time — and once you've gone through all of them you start again. So you go through the ten processes; after ten seconds each one has had one slice of one second, and you start again. It's very fair, and it has some advantages, although this example is not very favorable to round robin. This was presented last time.
Question: what does the one second mean? That's just the amount of time round robin is assigning — round robin assigns a chunk of time and then interrupts the process, so it's how long each process gets to run without being interrupted.

Notice that with round robin the processes don't actually finish until the last phase: if they're all exactly 100 seconds, they won't finish until the last round, so essentially every process completes somewhere between 990 and 1000 seconds. Here we're showing the completion time for each job. With first-come-first-served it's just the point where the job finishes — 100, 200, and so on up to 1000. With round robin, because they all finish in that very last pass, it's between 990 and 1000.

So how do the average completion times compare? What's the average of the first-come-first-served numbers? 550, very good. And for round robin? 995, near enough. So in this case first-come-first-served is about twice as fast. This case is a bit atypical, though — what's unusual about this example? Right: all the jobs are the same length. If you watched the lecture video, you saw different behavior with different examples. What happens if you have, say, lots of short jobs? Round robin is generally going to do a lot better, because a job that takes five seconds is done after about 50 seconds, and one that takes ten seconds is done after about 100 seconds. With first-come-first-served it's anybody's guess when a particular job finishes — and in fact this is almost a universal result: if you have any number of jobs whose total running time is a thousand seconds, then with random ordering the expected completion time under first-come-first-served comes out around this same number, roughly half the total, regardless of the job mix. So first-come-first-served is pretty consistent, at least with random ordering, while round robin gives you a completion time that's roughly proportional to the running time of each job. Short jobs finish soon, so round robin is generally better for realistic workloads, because realistically you typically have many more short jobs than long jobs.

So, just reinforcing that: we saw a worse average time with round robin in this special case where all jobs are the same length, but it generally does a lot better when there are many short jobs, because the short jobs finish quickly. On the other hand, some disadvantages of round robin are that there's a lot more context switching, so there's the overhead associated with the context switches themselves, and there's additional overhead from cache loading: the caches have to work harder because they're effectively serving ten different processes at once, versus serving one process at a time and letting each job load only its own values into the cache. So for memory-intensive jobs, round robin may not be such a good strategy. But it is the preferred strategy these days — among other things, it gives real-time responsiveness to users for processes that involve interaction.
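If you want to play with these numbers yourself, here is a tiny simulation — my own sketch, not code from the course — that reproduces the 550-versus-roughly-995 comparison for this example. The job lengths and quantum are parameters you can change; context-switch cost is ignored, just as it is on the slide.

```c
#include <stdio.h>

#define NJOBS 10

int main(void) {
    double burst[NJOBS], remaining[NJOBS];
    double quantum = 1.0;
    double t, fcfs_sum = 0.0, rr_sum = 0.0;
    int i, unfinished;

    for (i = 0; i < NJOBS; i++) burst[i] = 100.0;   /* ten identical 100-second jobs */

    /* First-come-first-served: each job runs to completion in queue order. */
    t = 0.0;
    for (i = 0; i < NJOBS; i++) { t += burst[i]; fcfs_sum += t; }

    /* Round robin: hand out one quantum at a time until every job finishes. */
    for (i = 0; i < NJOBS; i++) remaining[i] = burst[i];
    t = 0.0;
    unfinished = NJOBS;
    while (unfinished > 0) {
        for (i = 0; i < NJOBS; i++) {
            if (remaining[i] <= 0.0) continue;
            double slice = remaining[i] < quantum ? remaining[i] : quantum;
            t += slice;
            remaining[i] -= slice;
            if (remaining[i] <= 0.0) { rr_sum += t; unfinished--; }
        }
    }

    printf("FCFS average completion time: %.1f s\n", fcfs_sum / NJOBS); /* 550.0 */
    printf("RR   average completion time: %.1f s\n", rr_sum / NJOBS);   /* 995.5 */
    return 0;
}
```

Changing the burst array to a mix of short and long jobs flips the comparison in round robin's favor, which is the point made above.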
Question: what's the percentage of time spent in the context switch, compared to the quantum? As best I can determine, the context-switch time now is around three to four microseconds, and the TLB flushing — which we'll talk about next time — is around five microseconds. So even fairly expensive context switches at the machine level are pretty fast these days. A typical machine running Windows is doing probably thousands of context switches a second; at a few microseconds each — say 2,000 switches at 4 µs is only about 8 ms per second — that's maybe a percent or a few percent of overhead for the context switches, once you include things like the TLB and cache effects.

OK. The schemes we've talked about so far are oblivious to how long the job is going to take. There's another approach, described last time, called shortest remaining time first: if you somehow have advance knowledge of how long each job is going to run, the smart way to schedule is to run the jobs that will finish soonest first. That's exactly what this heuristic is — if you can determine how long a job has left, you push the jobs with the shortest remaining time to the front of the queue.

So imagine we have three jobs. Two of them, A and B, are CPU-bound: they run a long computation and never yield the CPU. The third, C, is I/O-bound: it alternates between one millisecond of CPU and nine milliseconds of I/O. Run by themselves, A and B look like solid blocks of CPU, while C has this alternating disk/CPU pattern. In a lot of operating systems you can overlap CPU with I/O — depending on the dependencies, a job can often overlap its own CPU with its I/O, and certainly once you start scheduling other jobs, they can overlap their CPU use with job C's I/O. When C runs alone, it does these nine-millisecond slices of disk and then one millisecond of CPU, so it's using 90% of the disk, which is desirable if it's disk-intensive, and A and B use 100% of the CPU, which is fine since they're CPU-intensive.

But when we start trying to schedule them together, clearly somebody loses. With a pure first-come-first-served strategy, once A and B get in they just run until they finish, so poor job C never gets a chance to run. Let's look at the round-robin scheme — really the best one we've seen so far — and then compare it with shortest remaining time first on a timeline. One clarifying remark: we're going to assume C works in these explicit slices — it doesn't just abstractly spend 90% of its time on disk and 10% on CPU; at the end of a CPU burst it yields the CPU, gives control back to the scheduler, and other jobs get to run, because C can't really do anything until its next disk I/O completes. So here's round robin with a hundred-millisecond time slice. C starts off — let's assume it starts with its I/O — and because it yields the processor, it does its I/O and then maybe a little bit of CPU; it hardly matters, it's only a millisecond. Then A runs its CPU task, then B runs, and then we're back — it's round robin, so C might start again. Is that clear?
There are nine milliseconds of I/O and one millisecond of CPU for C in there somewhere — you can't really see it — and then a hundred milliseconds of CPU for A. So C is not doing very well: it only runs for its short burst, while A and B are each being scheduled for the full hundred milliseconds. That's only about nine milliseconds of disk I/O out of 201 milliseconds — again, assuming the disk I/O is overlapped with the CPU. The total is 201 because it's one millisecond plus two one-hundred-millisecond chunks of CPU, with just nine milliseconds of disk in there. That's about a four-and-a-half percent utilization of the disk.

Because job C works in these small chunks, it's going to do a lot better if we use round robin with a small time slice like one millisecond — which is also more realistic for a real operating system that's trying to be responsive to the user. What's happening now is that there are hard-to-see little chunks of one millisecond of CPU for C in here, and we go C, A, B, A, B, and so on. C's CPU isn't running right away because it's interleaved with its I/O: C does a millisecond of CPU, then it has to do nine milliseconds of I/O, and then it can run again — you can see those roughly nine-millisecond gaps. So the disk is actually working pretty hard, which is good, and the CPU is being used reasonably well too. This is a much better regime: C is doing fine now — it's essentially fully utilizing its disk — and jobs A and B are each running at close to 50% utilization of the CPU, which is what they had in the original scheme as well. The downside is the additional context switches: at one millisecond, that's a thousand context switches a second. Not too big a problem these days, though — in fact, not really atypical.

Finally, shortest remaining time to finish gives basically the same utilization to the jobs, but with larger time slices. C does its little remaining chunk, A gets a reasonable chunk of time — it looks like about ten milliseconds here — and at first I wasn't sure why A keeps running rather than B, but it's actually correct: once we take a slice out of A, A becomes the job with the shortest remaining time, so it should be the next one to run. If A has run for ten milliseconds, it's now shorter than B even if they started out the same, so A tends to run in a block. We still get the same utilization, but with fewer context switches. So it's more similar to the original first-come-first-served: it reduces the number of wake-ups, at the expense of favoring one of the jobs — in this case A.
One thing I should have clarified is the advantage of this scheme over round robin, which is like the previous slide: if we run A to completion first, job A gets a nice fast completion time. Job B has a slower completion time, but the average completion time is much better than for round robin — with round robin the completion times of both A and B are near the end, whereas letting A finish first gives a lower average. SRTF turns out to be provably optimal in terms of minimizing the average completion time.

Question: what about C? Well, we haven't really clarified what the remaining time to finish is for C, so it's a little tricky. From the way it's shown, C probably has a lower time to completion than A or B, but it can't run until its I/O is complete — it can only be scheduled when it's actually ready to run. So we can infer that C must have a shorter time to completion than A and B, and the schedule makes sense.

Question: how does the scheduler know C has the shortest time to finish? This is a hypothetical algorithm that was discussed last time — I'm only giving you one data point from that lecture. There's a variety of heuristics for estimating how much time a specific piece of code has left. First of all, we're looking at the operating system's view of these processes: the operating system may or may not be asking the processes how long they have to finish, and the process itself may not even know — that's really a policy decision. We're assuming here that we have some kind of oracle, not necessarily coming from the job itself, that tells us how much time each job has remaining; if we somehow knew that, this is what we should do for scheduling. In practice, to make it work, we use things like the history of that particular program, and we might look at its current running profile — a job that has already run for a long time normally has a long time left to run; that's just a statistical fact. Anyway, for now this is just dealing with the case where we somehow know what's going to happen.

So, some trade-offs with shortest remaining time to finish. If you have many of these small jobs, they jump in front of the longer-running jobs, and if there's a continuous stream of them arriving at the scheduler, the longer jobs never get a chance to run. Paradoxically, the average completion time will still look good, because every job that completes, completes fast; the trouble is that some long jobs never complete at all. That's almost the reason this desirable property turns against you — you're effectively avoiding exactly the jobs that would hurt your average. So large jobs can starve, and the other big downside is that we're assuming knowledge of the future.
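To make the policy itself concrete, here is a minimal sketch — my own illustration, not code from any real scheduler — of the SRTF decision. The `est_remaining` field is the assumption: it stands in for whatever oracle or heuristic supplies the remaining-time estimate.

```c
#include <stddef.h>

struct job {
    int ready;              /* 1 if runnable right now (e.g. not blocked on I/O) */
    double est_remaining;   /* estimated CPU time left -- the "oracle" value */
};

/* Return the runnable job with the smallest estimated remaining time,
 * or NULL if nothing is runnable. */
struct job *srtf_pick(struct job *jobs, size_t n) {
    struct job *best = NULL;
    size_t i;
    for (i = 0; i < n; i++) {
        if (!jobs[i].ready)
            continue;                       /* C can't be picked while it waits on the disk */
        if (best == NULL || jobs[i].est_remaining < best->est_remaining)
            best = &jobs[i];
    }
    return best;   /* run it until it blocks, finishes, or a shorter job shows up */
}
```

Note that the whole policy hangs on where `est_remaining` comes from, which is exactly the problem discussed next.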
Where does that knowledge come from? You can ask the job, or you can look at how long the job has been running or waiting — there are various heuristics; let's not get into that right now. SRTF does happen to be better if you have perfect knowledge — in fact it's provably optimal given perfect knowledge of how long each job will take — and it has this unfairness property, which is that it may not allocate any resources at all to long-running jobs.

OK, let's move on. The main topic for today is memory mapping. We have to map memory in most cases these days because we have a process abstraction, and processes have their own monopolistic view of the machine, which includes their own copy of its memory address space. We just finished talking about sharing the CPU, which is the scheduling problem; today we're going to talk about sharing memory, which is the multiplexing and mapping problem. There are a bunch of reasons we need to multiplex memory. The main one is that we give each process the view that it owns everything, while in reality processes are sharing memory and typically several of them are running at the same time — they can't all literally own everything — so we have to produce some kind of mapping from what they see to the physical memory in the machine, in a way that provides access to what each process needs but also protects processes from each other's bad accesses, and protects the kernel from somebody doing the wrong thing too. We can't let everybody use everything; we have to divide things up somehow.

Here are some desiderata for the memory-mapping task. We occasionally do want to allow overlap — for instance, different processes may share the same piece of code, or the same piece of constant data. The rest of the time, though, you want each process to have its own memory for anything related to its own state or its own calculations; those should be separate from the other processes if you want correctness. So: mostly separated access, but not completely. Along with separation comes protection: if something goes wrong and a buggy piece of code tries to access something outside the valid addressable range of its process, it shouldn't be able to — that's most likely somebody else's data or code. You don't want to be able to write to it, and you typically don't want to read it either, because it could be sensitive information. So, along with the mapping, there are typically some bits associated with each area of memory that say not just that it belongs to a particular process, but also whether that process may read it, whether it may write it, sometimes whether it's code versus data, and sometimes whether it should only be accessible by the kernel. Protection is intimately tied up with mapping. And because different processes have this monopolistic view, they can't be expected to use distinct, non-overlapping addresses — that would require them to know about each other. Instead you have a simpler problem: when you design and program a process, its view of memory is as though it were the only process in memory, and you allow it to access anything that isn't, say, kernel memory. But the addresses it uses are its own individual, world-view addresses.
You have to map them to physical addresses in such a way that they don't collide with some other process's. Let's look at this a couple of different ways. From the process's point of view, say it's trying to access some data — a location that's symbolic in the assembly language, holding some constant data. The instruction — load word into r1, say — references it with what is typically a word address, so to turn it into a memory address, which is usually a byte address, there's normally some shifting going on. With four-byte words, the original address in hex is 0x0C0; multiply that by four — here it's shown in binary, so multiplying by four is just shifting left — and it ends up as 0x300. So we've taken this symbolic address, data1, and translated it into an absolute memory address in the instruction. If this were the only process running — say a DOS process with direct addressing of memory — you would address virtual 0x300 directly as physical 0x300. In DOS that works because it's normally the only process running. In any other operating system, there may already be an application using those addresses, and you won't know until your process starts. So the task is to fix that address — and in fact all the addresses in the program — so that they point to some area of memory we can actually use. Say the program ends up placed in an available block of memory such that this data sits at 0x1300; then we correct the instruction's address to 0x1300 and it will address the right location. That requires making a pass through the code and relocating all of the addresses, and it's a different translation depending on which block we're targeting.

This does in fact happen, and it can happen at different times. The worst place to do it is at compile time, because that requires knowing ahead of time where every block of code will sit in memory when it's loaded — again, more of a DOS-era thing. It's a lot more useful when the piece of code is dynamically linked, or is simply being loaded as a process after the other processes are already loaded: by the time you're ready to load it into memory and run it, you know where it's going to go, and at that point you can do the translation, typically by just adding an offset to all of the addresses in the code that correspond to memory references — they're normally marked as such. Code like that is called relocatable code, and it can be moved around. Any questions?
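Here is a minimal sketch of what that load-time relocation pass amounts to, assuming a made-up object format in which the relocatable references are simply recorded as a list of byte offsets into the image. Real formats are richer, but the idea is the same: add the load base to every marked address.

```c
#include <stdint.h>
#include <stddef.h>

/* Load-time relocation over a program image already in memory.
 * "fixups" is assumed to be the list of byte offsets, recorded by the
 * compiler/linker, of words in the image that hold absolute addresses. */
void relocate_image(uint32_t *image, const size_t *fixups, size_t nfixups,
                    uint32_t load_base)
{
    size_t i;
    for (i = 0; i < nfixups; i++) {
        /* e.g. with load_base 0x1000, a reference to 0x300 becomes 0x1300 */
        image[fixups[i] / sizeof(uint32_t)] += load_base;
    }
}
```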
OK, so let's look in a bit more detail at the process of getting a program from source code through compiling, linking, and loading into memory. The first stage is the compiler, such as GCC, which produces object code with relocation annotations in it. Then, at link and load time, the different modules are glued together and their addresses are resolved. So what's the difference between linking and loading? GCC compiles each source file into its own object module. The linker takes all those pieces of object code and normally assembles them into a single block of code, and in doing so it resolves the function calls and the accesses to shared constant memory locations and shared, statically allocated variables — those all get resolved at link time. This linked piece of code, though, still has to be relocated: the linkage editor produces a program with all the modules correctly linked together, but it still doesn't know where the program is going to sit in memory. So when you load it — when you pull it into memory — there's a final stage of relocation. Usually some relocation is necessary even for position-independent code.

Finally, the loaded code gets linked with dynamically loaded libraries. The code that's been loaded into memory has to be able to reach those libraries, which have typically been loaded already — or not loaded yet. The loading process for your main binary relocates it into its block, but these dynamically linked libraries typically live in different blocks of memory. So the final stage of loading involves little pieces of stub code in your main program that run and check: is this shared library currently loaded, and if so, where is it? The stub communicates with the operating system to find out which modules are loaded and what their base addresses are; once it knows that, it typically replaces itself with a simple indirect instruction that goes directly to the function in the shared library. The effect, finally, is that you have one piece of code with correct jump and branch instructions within itself and also correct branch and jump instructions into the shared-library code, so that things go fast — there's perhaps just one extra level of indirection to call the shared libraries relative to statically linked ones. Once that's all done, the code runs more or less the same way as if you had produced one massive binary from scratch. Questions?
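You can see the same "resolve once, then jump indirectly" pattern at user level with the real dlopen/dlsym API — this is a rough sketch of the idea, not what the loader's stub code actually looks like (the real thing is generated assembly, and the library name here is the Linux one; build with something like `cc lazy.c -ldl`).

```c
#include <dlfcn.h>
#include <stdio.h>

typedef double (*cos_fn)(double);
static cos_fn cached_cos;            /* filled in on first use */

double my_cos(double x) {
    if (!cached_cos) {
        /* Slow path, taken once: ask the dynamic loader where the library
         * and the function live. Error checks omitted for brevity. */
        void *libm = dlopen("libm.so.6", RTLD_LAZY);
        cached_cos = (cos_fn)dlsym(libm, "cos");
    }
    return cached_cos(x);            /* fast path: one indirect call */
}

int main(void) {
    printf("%f\n", my_cos(0.0));     /* prints 1.000000 */
    return 0;
}
```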
So that's the offline translation process — the code-level translation — but it's still not enough. The reason is that once you load processes into memory, there's a continuing scheduling process happening: other processes are being pulled into memory, your process is occasionally being scheduled out and back in, and you don't know which other processes might try to use physical addresses that correspond to the addresses you want to use. So in addition to methods of relocating code so that it looks like a single binary in memory, we also need methods of giving each running process a solipsistic — egocentric — view of memory. That's what address translation is about. For each of the two processes here, we have code, data, heap, and stack blocks, and physical memory has to host code, data, heap, and stack for both processes — usually more — and you also have to allow some of these to grow. For right now, though, there's just some physical layout of these things. They don't have to be contiguous, and they don't have to be in the same order as the originals, but each one does have to be mapped somewhere, into a block of the same size as the process's view. Each process should only know about its own blocks of storage — it shouldn't know what's going on elsewhere. So these translation maps provide a complete view, but a distinct view, for each process.

Now, to make that translation fast and transparent to the code, you have to use hardware: it relies on a memory management unit, and those have been part of computer hardware for many decades. Once you have a memory management unit, there's a clear distinction between the CPU's addresses — the ones appearing in CPU instructions, called virtual or logical addresses — and the actual physical addresses, the bus addresses of locations in memory. On older computers this was a direct operation, but on anything from the last three or four decades it's a translation process. The address space — you can talk about it on either side — is the set of all addresses that some entity can address, and for processes these should be completely distinct, as we said. So we have these two views, one from the CPU and one from memory, and the MMU is the entity that translates between them. Translation, as we said, gives us both the distinct per-process view and protection — and among other things, it lets a program be linked and loaded as though it had its own contiguous block of physical memory, wherever it actually lands.

In traditional uniprogramming, like MS-DOS, this didn't happen, which meant code always had to be compiled for fixed physical addresses. There were typically some services running at the same time, but it was necessary to put the operating-system services in a particular area of memory, separate from user space, so that user code and data wouldn't collide with them; the application had direct access to the physical storage on the machine.

Instead of that, we want to give processes the view that they own everything — and let's try to do that in the simplest possible way first, which is without translation. If we do that, the operating system has to do the separation at load or link time, most likely load time. We talked about this a little: the translation then involves relocating instructions, using the annotations in the assembly file that say "this is a relocatable instruction" — you add the appropriate offset to get the correct relocated address. That was the method used in the early days of multiprogramming, and the downside is that there's no protection: bugs can propagate, a process can touch other processes' code and data, and probably the OS as well.

A simple extension that does support protection is to add both a base address, representing the offset of the application's memory area, and a limit: a base plus a limit defines a range — a block in memory — that a single process is allowed to run in. To do it efficiently you want base and limit registers: the kernel sets them, and then when a memory access happens it gets compared against those two registers, which can happen very fast.
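Written out as software just to show the logic — a sketch only; the whole point is that the real thing is two registers and a comparator in the MMU, not code — the check looks like this:

```c
#include <stdint.h>

/* Per-process base and limit, managed by the kernel. */
struct base_bound {
    uint32_t base;    /* where the process's block starts in physical memory */
    uint32_t limit;   /* size of the block */
};

/* Returns 1 and fills *paddr on success; 0 stands in for raising a fault. */
int bb_translate(const struct base_bound *bb, uint32_t vaddr, uint32_t *paddr) {
    if (vaddr >= bb->limit)       /* outside the allowed range: protection error */
        return 0;
    *paddr = bb->base + vaddr;    /* relocation happens on every access */
    return 1;
}
```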
These two values — the base address and the limit address — now have to be part of the description of the thread: they get swapped in and out when the thread is swapped in and out, and they have to be system-managed; user code shouldn't be allowed to set them.

Question: why are we avoiding translation here? We're not, really — we're just setting things up, introducing translation one step at a time. We'll do much more elaborate things in a minute.

Machines such as the Cray used a system like this. A virtual address came from the CPU; a base address was added to it to get an address in RAM, and it was also compared against a limit held in a register — if you exceeded that limit, it would throw an error. This is essentially implemented in hardware: a special register for the base, a special register for the limit, and in very few instructions — possibly one — all of this happens. And because we're using registers for both the base and the limit, this is a very dynamic kind of address translation: by loading a different offset into the base register, say at a context switch, we can move a process around very rapidly — we can effectively relocate it on every context switch. We also get error checking, so pretty good protection: a process can, in theory, only access its own code and data, not the code and data of some other process and not the kernel's. And notice the relocation is now happening through this register, which avoids the need to rewrite instructions the way we were doing a little earlier.

Most of the time — and you'll see this in Nachos, certainly in the next project — blocks of data and code are kept separate. You want, for instance, stack and heap storage to be separate, with the stack normally growing down and the heap growing up; other kinds of storage might be shared — shared constants, or shared code. So a more flexible scheme has segments of a few different types, not just one segment per process: typically code, data, stack, and probably heap, and sometimes different kinds of shared memory. Now we want to give each of those segments its own mapping into memory. We can use the same idea — a base and a limit per segment — and allow each segment to live independently somewhere in memory. So there are, say, four virtual segments that the CPU can see, and we can map them into physical memory independently. The process may have its resources spread out across its address space, but when we install that process in memory we can pack them more tightly, so there's more free storage for other processes.

OK, so let's look at this multi-segment model. The idea now is that the addresses the CPU manipulates are divided into two parts: the higher-order bits form a segment number, and the lower-order bits are an offset. To do the mapping from each segment into physical memory, the segment number indexes into a table, and each table entry contains the limit and the base values, and also typically a single bit: a valid/invalid bit.
What the valid/invalid bit lets you do is describe a map of memory that isn't everything: it lets you say that this particular block of memory is part of the process, or it isn't. This will be a little easier to explain when we get to page mapping, but basically not all of these entries should automatically be valid — we want to let the process say "I want to use these areas of memory and not those," and the ones it doesn't use get the not-valid flag.

So, the segment number is mapped to a base-and-limit pair, as before. The base is now the real physical address of the start of that segment in memory, and the offset specifies a location within the segment: you add the two together, and you compare the offset against the limit from the table. The limit specifies how big the segment is — if your offset is bigger than that, it flags an error; a negative offset would also flag an error, but we aren't talking about signed addresses here, so the offset is always positive.

Now, one disadvantage is that these entries map into physical memory and the segments are selected by a portion of the virtual address, so this table can get very large if you have a lot of physical memory — even worse for a large virtual address space. A lot of processors, including the x86, therefore include explicit segment-addressing registers — that's the segment register shown in red. The segment register lets you implement this as part of a single instruction, rather than a sequence of instructions that read in the bases and limits and do the checks in software; it's a hardware acceleration of these operations.

And the valid/invalid bits, as we said, get checked on each lookup, to see whether the area you want to touch is actually valid. A segment is just some bits of your address, and normally the table has an entry for every possible combination of those bits: with a three-bit segment number, you need eight entries, which is what we have here. But you may have specified that only, say, the first three of those eight segments are actually part of the addressable virtual memory — the next one is simply not supposed to be addressed. In other words, you've put a constraint on your process's memory, because you've already allocated everything it needs, and you don't want it to run off and consume more memory than you expect. So this provides a kind of segmentation-fault check.

Question: what was the x86 example? That was an example of hardware support for this kind of scheme: on the x86, the base address lives in a special segment register, and the instructions that use it essentially add an offset to it and then address the result — just a way of making those steps go faster.
All right, another quick example of this. Let's say we have four segments again: code, data, some shared memory, and a stack. The segment table contains some base addresses, which refer to locations in physical memory, and some limits, which constrain the sizes of those segments. Now here are some virtual addresses coming from the CPU. The first address is essentially zero, which means its segment ID is zero — the segment ID is the top bits of the address, and here they're all zero. The segment table, indexed by that segment ID of zero, gives a physical base address of hex 4000 and a limit — a size — of hex 800. So this logical block of virtual memory is being mapped to the physical block starting at 0x4000: the base tells you where it starts, the limit tells you how far it goes. The next address belongs to segment ID one — if you shift away the offset bits, twelve of them here, what's left is one. The base address of segment one is 0x4800, so this segment starts right after the previous one, and its limit is hex 1400; add that to 0x4800 and you get 0x5C00. So you get the idea, hopefully: by using the base and the limit you get a range, such that when addresses come in they'll be mapped, but they'll also be checked, and if they fall outside the range it's going to throw an exception.
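Pulling those last couple of slides together, here is a minimal sketch of the whole segment lookup in C. The field widths — a 3-bit segment number above a 12-bit offset — are assumptions chosen to match the example; real hardware does this with dedicated registers and comparators rather than in software.

```c
#include <stdint.h>

#define OFFSET_BITS 12                      /* assumed offset width for the example */
#define NSEGS       8                       /* 3-bit segment number */

struct seg_entry {
    uint32_t base;                          /* e.g. 0x4000, 0x4800, ... */
    uint32_t limit;                         /* e.g. 0x800, 0x1400, ... */
    int      valid;
};

/* Returns 1 and fills *paddr, or 0 where real hardware would raise a fault. */
int seg_translate(const struct seg_entry segtab[NSEGS],
                  uint32_t vaddr, uint32_t *paddr)
{
    uint32_t seg    = vaddr >> OFFSET_BITS;               /* top bits pick the segment  */
    uint32_t offset = vaddr & ((1u << OFFSET_BITS) - 1);  /* low bits are the offset    */

    if (seg >= NSEGS || !segtab[seg].valid)
        return 0;                                         /* unmapped segment           */
    if (offset >= segtab[seg].limit)
        return 0;                                         /* past the end: seg fault    */
    *paddr = segtab[seg].base + offset;                   /* e.g. 0x4800 + offset       */
    return 1;
}
```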
Question: where is this segment table actually stored? There are a few different ways of doing it. In the early days there was hardware support, and there were enough registers in the old 8086-class machines to put the table in registers, which has its advantages. These days, because segmentation is combined with paging, that's not done — it's a somewhat complicated answer, but generally the complete tables are data structures stored in memory, as we'll see in the rest of this lecture, except that there's a cache — the TLB, which we'll talk about next time — that typically has a hardware implementation in the CPU. So logically the tables live in memory, but there's a cache in the CPU accelerating the lookups; these days they're too big and complicated to live in a few registers the way these early ones could.

Question: aren't they part of the process state, in the PCB? The PCB normally wouldn't contain an entire mapping structure — it would be too big. If you needed to save the state of an entire translation table, you'd probably have to save it to disk separately. The normal mode of operation is that the page tables for running processes are already resident in memory, so that when you swap a process back in you don't have to move a large amount of data off disk — and I suppose there's nowhere else to put them anyway. The page tables, especially the parts that are being actively used, should be in memory. There are ways of moving parts of them out to disk when you're not using them — an unused part of a page table will probably not stay in memory — but it's a complicated process, and it will be much better to talk about it next time, when we get more deeply into caching the page tables.

All right, so simple segmentation has a number of issues. Fragmentation is one: as I add processes and remove them, I create holes in memory, as with any other allocation task, and sooner or later I hit the problem of not being able to fit something. The segmentation we've described allows arbitrarily large chunks of memory, and that can lead to arbitrarily bad fragmentation. This also partly answers the earlier question: moving some or all of a process to disk basically means moving its memory along with its translation-table structures. Another way to think about it is that the page-table structures are a bit like the directory data of a file on disk: if you move the file, you normally move its block directory along with it — there's not much sense in moving the memory without the directory, or the directory without the data. So swapping out a process — swapping out all of the memory associated with it — normally goes along with moving out its translation tables. Ideally you'd like to swap out only the parts you don't need, but that's part of memory allocation, which is also coming later.

I'd better move a little more quickly, because I'm going to run out of time. We've already described these problems — fragmentation, the inability to fit everything — so after the administrative stuff we'll get to paging, which provides smaller mappable chunks of data; it still supports segmentation, but makes much more efficient use of memory.

For now, reminders: Project 1 code is due Tuesday, October 8th, by midnight, and the design docs are due the following night. Here are the midterm rooms again. It's a closed-book exam: no notes in general, no calculators, smartphones, Google Glass, cognitive implants, or other prostheses — but you can bring one handwritten page of notes. One-sided or two? Two-sided, then — one handwritten page. It covers material up to, I think, the Wednesday before the midterm, and we have a review session scheduled in Hearst Mining the Friday before the exam — please try to attend.

OK, five-minute break, and we'll try to finish up after that.
All right, let's try to wrap up. The solution we're going to talk about now is paging, which means that instead of trying to allocate these huge segments of different sizes and manage them, we break memory up into fixed-size chunks — fixed-size for now, anyway — called pages. One observation you can make right away is that with fixed-size chunks, you can describe the allocated set of pages with a bitmap: make a long list, and for every page that's part of the process's allocation assign a one; pages that aren't part of the allocation get a zero. That's a very efficient kind of page table — it just says which pages you're allowed to address — and it is in fact used, never by itself, but inside more elaborate page-table representation schemes.

When we were talking about segments, we were talking about chunks big enough to hold an entire data region or an entire program — potentially many megabytes — and that causes the problems we just saw: you can't fit many of those massive chunks in memory, and they cause fragmentation. With paging, instead, we choose blocks that are small enough, and similar enough in size, that we can allocate and reallocate lots of them without real fragmentation, but also large enough that the overhead of keeping track of them doesn't dominate memory. That's the paging approach: you choose chunks typically in the range of about 1 KB to 16 KB, and you still need tables, but there are clever ways of organizing those tables so that they stay economical — basically, so the size of the paging structure is proportional to the amount of memory you're actually using.
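Going back to the bitmap idea for a second: because pages are fixed-size and interchangeable, "which pages are allocated" really can be just one bit per page. A minimal sketch, with the sizes made up for illustration — 4,096 pages of bookkeeping fits in 512 bytes:

```c
#include <stdint.h>

#define NPAGES 4096                           /* assumed number of pages tracked */

static uint8_t page_map[NPAGES / 8];          /* one bit per page: 1 = allocated */

static int  page_allocated(unsigned p) { return (page_map[p / 8] >> (p % 8)) & 1; }
static void mark_allocated(unsigned p) { page_map[p / 8] |= (uint8_t)(1u << (p % 8)); }

/* Find a free page, mark it allocated, and return its number (-1 if full). */
long alloc_page(void) {
    unsigned p;
    for (p = 0; p < NPAGES; p++) {
        if (!page_allocated(p)) {
            mark_allocated(p);
            return (long)p;
        }
    }
    return -1;
}
```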
So here's a simple page table. It's similar to the segment tables we looked at before: there's a page-table pointer, and the table addresses some number of pages, except that instead of a base address each entry holds a page number — to get a physical address you effectively multiply by the page size. The page table normally resides in physical memory, although it might be cached, and it contains permissions of the kind we had for segments, and also valid bits, to say whether that page is currently usable by the process.

A virtual address now has a page number instead of a segment number, plus an offset. The virtual page number indexes into the table, and the offset gets attached afterward. So if the virtual page number is one, you access entry one of the table; entry one contains some physical page number — 17, or 300, or whatever — and those bits get placed in front of the offset. You can see that the physical page number is effectively shifted left by the number of bits in the offset, so it's a bit different from the base-address additions we saw before. This is a simple model: we preserve all of the offset bits and replace the virtual-page bits with physical-page bits — a straightforward table-based translation of the virtual address into a physical address.

We can also include — in at least a couple of places — a page-table size bound. We probably won't bother with an offset bound here as often, because the pages are small; we're more concerned about a virtual page number straying outside the table, so it's more reasonable to check the page-table size. Does that make sense? That's commonly done: a page-table size check to make sure we're in bounds for that particular process.

Now, this approach also lets us share memory: if a couple of processes each have their own distinct page tables, the actual physical page numbers inside them can be shared, which gives the two processes a way to access shared constant memory, or shared data memory if they want to communicate.

Drilling a little deeper, here's an example of three pages of virtual memory being translated into physical memory. These are tiny pages of four bytes each — a, b, c, d and so on are alphabetic labels, not hex addresses — and in the virtual address space we have three blocks of four bytes each; that's the span the page table covers. Each virtual address implicitly selects an entry in the page table, and the page table has to contain an entry for all of the addressable virtual memory.
Here that's only three entries. Each virtual address is right-shifted by two bits — because it's a four-byte page — so virtual addresses 0, 4, and 8 become page numbers 0, 1, and 2: the page size is four, so you shift away the offset and you're left with the page number. From there you just look up a new page number — here the table says 4, 3, 1 — and append that new page number in front of the two-bit offset. For page 0 that's 100 in binary in front of the offset, so that page maps into physical memory at address hex 10. Similarly for the next one: the frame number is 3, and when you put 3 in front of the original offset you get hex C, so that block now sits below the first one in physical memory; and the third block lands at yet another spot. So you end up with independent physical locations for these three virtual pages.

Question: why did we choose two bits? Well, we actually chose four-byte pages, which means the number of bits needed to address within a page is two. The number of offset bits is the log base two of the number of bytes in a page — for a 1 KB page it would be ten bits; here the page is only four bytes, so we need two. The two offset bits of the address are shown in black, and the rest is the page number: 4 goes to page 1, 8 goes to page 2, and so on. Essentially you strip away the within-page offset, you're left with just the page number, that goes into the table — which has to be a complete table for all possible page numbers — and it takes you somewhere else in memory; then you do the same thing on the physical side, appending the frame number in front of the within-page offset.

On a context switch, the idea is to do the minimum amount of work — which to me means not switching the table itself, just switching the pointer, and potentially the limit. Process A goes out, process B comes in; process B has its own page-table pointer and limit in its process control block, we assume its table is still sitting in memory, and it just keeps going, because the state is the same as it was before.

Advantages of this: it's a simple allocation scheme, and it's easy to share. But there's a big problem with large address spaces: because we have an entry for every page, a large address space needs millions of entries — a 32-bit address space with 4 KB pages, for instance, already needs about a million entries per process.
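Before the fix, here is the single-level lookup we've been describing, as a minimal C sketch; the two-bit offset matches the toy four-byte pages from the example, and in a real system this is done by the MMU, not by software.

```c
#include <stdint.h>

#define OFFSET_BITS 2                        /* 4-byte pages, as in the example */

struct pte {
    uint32_t frame;                          /* physical page number */
    int      valid;
};

/* Returns 1 and fills *paddr, or 0 where real hardware would raise a fault. */
int page_translate(const struct pte *page_table, uint32_t table_size,
                   uint32_t vaddr, uint32_t *paddr)
{
    uint32_t vpn    = vaddr >> OFFSET_BITS;              /* strip off the in-page offset */
    uint32_t offset = vaddr & ((1u << OFFSET_BITS) - 1);

    if (vpn >= table_size || !page_table[vpn].valid)
        return 0;                                        /* page-table size / valid check */
    *paddr = (page_table[vpn].frame << OFFSET_BITS) | offset;  /* frame bits before offset */
    return 1;
}

/* With the example's mapping {0->4, 1->3, 2->1} (all valid, table_size 3):
 * vaddr 0x0 -> frame 4 -> paddr 0x10, vaddr 0x4 -> frame 3 -> paddr 0xC,
 * vaddr 0x8 -> frame 1 -> paddr 0x4. */
```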
There is a solution to this, and I think we're going to run out of time, so I'll just introduce it and we'll complete it next time. One elegant approach is to use a tree of tables, where the tree is a sparse structure that only contains entries for the actually valid pages. In fact, at the bottom of the tree — the lowest-order part of the address — we'll often use the most economical representation, the simple bitmap-style table that just says which pages are in and which are out. So here's the idea. We have the virtual address — the CPU's view of the world. Some of the bits form a virtual segment number, and the next bits form the virtual page number. Those are used with two different tables: the first serves as a segment table, so it holds segment data — a base and a limit — which then points into a separate page table for each segment, and that page table holds the bits specific to each page. The segment plus the virtual page get translated into a physical page, which together with the offset gives the physical address. With this scheme it's still possible to do the same kinds of things we were doing with segments before, but with the additional flexibility that the segments don't have to be allocated as contiguous blocks of memory: each segment can be made up of many separate pages scattered throughout physical memory, while still making use of most of it. And we'll still have the same kinds of checks as before.

OK, we're out of time. Next lecture gets further into address translation anyway, so we'll pick it up from where we leave off here.
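As a preview of where that picks up, here is a minimal sketch of the two-level lookup just described — my own illustration with made-up field widths, not the slide's exact scheme: the top bits choose a segment-table entry, which points at that segment's own page table, and only segments you actually use need page tables at all.

```c
#include <stdint.h>

#define SEG_BITS    2        /* assumed widths, just for illustration        */
#define PAGE_BITS   10       /* virtual page number within the segment       */
#define OFFSET_BITS 12       /* 4 KB pages                                   */

struct pte { uint32_t frame; int valid; };

struct seg_entry {
    struct pte *page_table;  /* this segment's own page table                */
    uint32_t    npages;      /* the "limit": how many pages it may contain   */
    int         valid;
};

int two_level_translate(const struct seg_entry segtab[1 << SEG_BITS],
                        uint32_t vaddr, uint32_t *paddr)
{
    uint32_t seg = (vaddr >> (PAGE_BITS + OFFSET_BITS)) & ((1u << SEG_BITS) - 1);
    uint32_t vpn = (vaddr >> OFFSET_BITS) & ((1u << PAGE_BITS) - 1);
    uint32_t off = vaddr & ((1u << OFFSET_BITS) - 1);

    if (!segtab[seg].valid || vpn >= segtab[seg].npages)
        return 0;                                  /* per-segment limit check */
    if (!segtab[seg].page_table[vpn].valid)
        return 0;                                  /* hole: page not mapped   */
    *paddr = (segtab[seg].page_table[vpn].frame << OFFSET_BITS) | off;
    return 1;
}
```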