OK, let's have class. So today we're going to continue talking about address translation, and we'll talk about a way to solve some of the problems that Carl described on Friday, an approach that's much more in line with what modern systems actually do, as well as a friendly piece of hardware that actually helps us solve this problem in a nice way. So assignment 2 is due on Friday. At this point, hopefully, where you'd want to be is pretty close to being done. I can see people smiling. I'm sure that means "I'm done. I feel good about that." Or maybe it means "I have no idea what you're talking about. What is assignment 2? I haven't started. I don't know." Either way. So yeah, the goal here is to get to the point where you have something working, and then you're probably going to need a few days to hunt down some of the corner cases you're missing. Make sure you're handling error codes properly, throwing the right error codes, things like that. OK. Yeah, so hopefully this has started to sink in about the assignments in this class, which is that they are difficult. Now, assignment 3 is worth twice as many points as assignment 2, and that's kind of a rough estimate of how much more difficult it is. It's probably even a little harder than that, but also more fun. Because assignment 2 is fun, and assignment 3 is actually probably more fun than assignment 2. So hopefully you guys will find that out. Yeah, so assignment 2 is just like the end of volume 1 of Kill Bill. And then you still have quite a bit of a kill list to get through. OK. I just posted some announcements on the forum, but I sort of want to address this in class. I was talking to some people last week at a conference I went to, and they'd done a study on plagiarism. And one of the things they discovered is that a lot of the time, the plagiarized code that students submit doesn't actually do very well on the assignment. And I've actually found that to be true in this class as well. I've had students fail this class for submitting code that didn't compile, much less actually run. OK. So think about that for a minute. What that leads me to believe, because I'm a scientist, is that a lot of these cheating episodes are caused by sort of last-minute freakouts. I mean, there may be some people here, maybe within the sound of my voice right now or watching from home, who have just gotten through school by cheating on stuff, by just submitting code that's plagiarized, and have come into this class with that in mind. I hope that's not you. But I suspect the common case is that it's Friday at 4:30 PM, and you waited until Thursday to start the assignment, despite the fact that I've been badgering you about it for a while. And it's like, OK, I'm going to go find something online, cut and paste it into my terminal, and hope that it works. Those are actually also nice cheating incidents, because they're really easy to detect, when you have people that are like, I, through my incredible brain power, miraculously produced this 1,000 line chunk of code, character for character, identical to somebody else's on the internet. Like, that's just not chance. I've had people tell me, well, it's just an accident. It was just coincidence. I mean, I don't know. I would love to be able to calculate what the likelihood is that that's actually the case. Maybe the likelihood that the sun won't rise tomorrow due to quantum effects or something. So it's not likely.
So anyway, this is sort of the danger zone a little bit, I think, for some people in here who don't want to cheat on the assignment, but are worried about being behind or whatever. So just please, if you get to that point, don't do it. Go talk to Carl for a little while. That's a good way to kill some time till the deadline. You won't get in trouble. You know, it's a great way. I mean, I love talking to Carl, right? I learn some really interesting stuff every time I talk to Carl. He's a fun person to talk to. Or the other TAs, they're also interesting to talk to. Like, call a friend, go for a walk. Just don't go to GitHub, find somebody else's assignment that doesn't work, cut and paste the code from it, and submit that. So please, I'm begging you. Whatever bad thing you think is gonna happen if you don't submit assignment 2, the thing that will happen if you plagiarize assignment 2 is worse. And I can give you very, very good odds. My batting average on actually managing to fail people who do this is pretty good. So don't do it. I don't like it. I mean, I really don't. I'm praying, because this is my last semester, please let there be no cheating cases. Okay. Questions about anything? Oh, no class on Friday, I'll just point that out. I will be gone, it's the last Friday before spring break, and the assignment is due. We'll have lots of office hours on Friday. Any questions about anything up to this point? Okay. So let's review a little bit of what we talked about on Friday. So we've discovered the address space abstraction. We like it. We wanna get it to work. The goal is to allow as many translations as possible to proceed without getting the kernel involved. The kernel is way too slow to put on this path. The more the kernel gets involved, the more the system slows down, to a really, really large extent. So we want almost every virtual-to-physical translation to proceed without the kernel being involved. And I just said that, right? So what's the split here? This is another case of a split between these two things that we try to arrange when we build systems. What's the kernel's role in this situation? Right, so the kernel is in charge of setting and enforcing policies about how addresses get translated. An address space is really nothing more than the collection of a set of policies that the kernel is enforcing, which say that a process is allowed to use a particular virtual address or not allowed to use that virtual address, and if the process is allowed to use the virtual address, what does it map to? The hardware is there to provide the mechanism to make this happen quickly. So whatever we do in hardware, we need to make sure that the operating system remains firmly in control of how the translation happens. Okay. So we talked a little bit about the idea of implicit translation. When your programs use virtual addresses, and all the addresses that your programs use are virtual addresses, this is happening every time. The kernel's not involved every time; like we said, that would be too slow. But every time you use a memory address, there is translation taking place, behind the process's back, and ideally as rapidly as possible. All right, so there were a few different approaches that we talked about on Friday. This was probably the simplest. So, someone describe to me how base and bounds translation works.
For each one of these translation approaches, there's some amount of state that the kernel has to store for each process. Remember, that's because process virtual addresses have no meaning outside of a process context. If you ask me what some virtual address means, I cannot answer that question without a process. And then there's also an algorithm for using that information, along with the address that I'm translating, to produce a final physical address. So, base and bounds. Did Carl go over this on Friday? I think so, I think he was supposed to review it, but everybody's giving me blank stares, okay, yeah. Okay? So right, I've got one big contiguous chunk of physical memory. What information does the kernel have to store about that? Yeah, so I need to know where the memory that's actually allocated to this process is in physical memory, and how large it is. Those are my base and my bound. Now how do I translate a virtual address from the process to a physical address? Yeah, so the first thing I need to do is check to make sure it's okay. This is something that I have to be able to do. This is part of the policy: making sure that the address is valid. The check here is just making sure that the virtual address is smaller than the size of the segment, because every address space in this case starts at zero and moves up. Every process address space. And the translation is easy. All I do is add it to the base address. What were some problems with this approach? Yeah, okay, someone wanna answer that question? What do I check here? Do I check virtual address less than bound, or virtual address plus base less than bound? Depends, that's a good question, because it depends on what I mean by bound. If the bound is the size, let's say the bound is the size of the segment, the size of the amount of physical memory that I've allocated to this process, then what do I check? Just the virtual address, because the virtual address is the offset into this particular segment. But let's say the bound is the physical address where that chunk stops, then what do I check? Then I would check the base plus the virtual address. Good question.
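Just to pin that down, here's a minimal sketch of the base and bounds check and translation in C. The struct and function names are made up for illustration; the point is that the kernel's per-process state is just two numbers, and translation is one comparison plus one add.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Per-process base and bounds state: just two numbers. Here "bound" is
 * the size of the region, so valid virtual addresses run from 0 up to,
 * but not including, bound. */
struct base_bounds {
    uintptr_t base;   /* where this process's physical memory starts */
    size_t    bound;  /* how large that region is */
};

/* Check the policy, then do the mechanism: one compare, one add. */
bool bb_translate(const struct base_bounds *bb, uintptr_t vaddr,
                  uintptr_t *paddr)
{
    if (vaddr >= bb->bound) {
        return false;          /* invalid address: outside the region */
    }
    *paddr = bb->base + vaddr; /* translation is just an add */
    return true;
}
```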
What's wrong with this? Why doesn't this work? This is not a good fit for our address space abstraction. There's one very, very big gaping problem with base and bounds, yeah. Yeah, specifically what? What's gonna happen here? If I have to give every process... well, I could actually do this today. That's what's really interesting. Back when these techniques were being proposed, nobody had four gigabytes of memory in their computer, but now everybody does. So I could actually get away with this today if I wanted to. I don't, but I could say, okay, I'm gonna make a small decrease: let's say I give each process a one gigabyte virtual address space. It's not much smaller than what I'm giving them already. And now I can put four of them in four gigabytes of memory. So that sounds great: four cores, four gigabytes of memory. I'm gonna assign one gigabyte to each process and just let it set up its whole address space right there, contiguously. Why does that not work very well? Yeah. So what about that huge gaping hole between the heap and the stack? What's in there? Like, in the actual virtual address space, what's in between the heap and the stack? This could be gigabytes of space. What's in there? Nothing. So what's gonna happen if the process tries to translate one of those addresses? Yeah, I shouldn't use so many violent metaphors in this class. Like, what's a different way of putting that, right? Like game over. Yeah, anyway. Yeah, yeah, yeah. Sorry, you've been eliminated. Voting you off the island. I don't know where the process goes at that point, but who cares? Do you know where the contestants from those shows go? Nobody knows. Into outer space. So yeah, those are invalid addresses, and the interesting thing is I'm allocating physical memory for them. So this is a terrible idea. I've got invalid virtual addresses pointing to valid physical memory. This is bad. I'm wasting all of those addresses. They're not valid, they can't be translated, but they still have actual physical memory backing them, right? And as somebody pointed out, address spaces tend to be sparse. The goal is to exploit that sparsity. So I need something that's a better fit for my address space abstraction and doesn't produce as much internal fragmentation. What's the difference between internal and external fragmentation? Internal fragmentation is when space is wasted where? Inside my allocations. My allocations aren't a good fit for what I'm doing, so there's lots of wasted space inside them. External fragmentation is when space is wasted where? Well, in between, right? External fragmentation occurs between my existing allocations. So internal fragmentation wastes space inside the allocations, and external fragmentation wastes space outside, between them. It has to do with how those allocations are placed and how big they are. Okay. And if I don't do this very well, this can also suffer from external fragmentation, although there are probably ways to get around that. Okay. So then we talked about segmentation. So how do I extend this idea slightly to make it a better fit for address spaces? There's a trade-off, a game we're playing here. There's a set of trade-offs that we're making. So: simplest possible approach, bad for fragmentation. How do I address that using as little extra information as possible? Yeah, so think about it. I've got a virtual address space, and a lot of it's empty. If I cover it with one big base and bounds, I've wasted a lot of memory inside that allocation. But what I can do is say, okay, well, I've got a code segment over here, or I've got a part of the address space that typically has code in it. A lot of this layout is due to convention, but I've got a part over here that typically has code. I cover it with one segment. I've got a part over here that has the stacks. I cover that with another segment. I've got another part in here that has the heap in it. I'll cover that with another segment. And by the time I've allocated three or four or five segments for the process, I've pretty much covered everything. And so I'm good. So how does this get a little bit more complicated compared to base and bounds? The segments are not contiguous, so what extra information do I have to keep around? Yeah, so with base and bounds, there was this implicit assumption that the virtual addresses always started at zero. When I start using segments, every segment is not gonna start at zero. They're gonna start kind of where I need them. So now I have multiple bases and bounds. And not only do I need to store where the segments are in physical memory, I need to store a little bit more information about where they are in virtual memory as well. So each segment has a start virtual address, a base physical address, and a bound.
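Here's the same idea extended to segments, again as a minimal sketch with made-up names: three pieces of information per segment, a loop to find the segment that contains the address, and a little more math to translate.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One segment: where it starts in virtual memory, where it starts in
 * physical memory, and one bound, since a segment is the same size in
 * virtual and physical memory. */
struct segment {
    uintptr_t vstart;  /* start virtual address */
    uintptr_t pstart;  /* base physical address */
    size_t    size;    /* the bound */
};

/* Check every segment; if one contains vaddr, rebase the offset into
 * physical memory. If none does, the address is invalid. */
bool seg_translate(const struct segment *segs, size_t nsegs,
                   uintptr_t vaddr, uintptr_t *paddr)
{
    for (size_t i = 0; i < nsegs; i++) {
        if (vaddr >= segs[i].vstart &&
            vaddr - segs[i].vstart < segs[i].size) {
            *paddr = segs[i].pstart + (vaddr - segs[i].vstart);
            return true;
        }
    }
    return false;
}
```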
So, three pieces of information per segment, with the addition of the start virtual address. The bound is the same. Why? Why can I just store one bound? I could also store the end virtual address and the end physical address. That would also be correct. Why can I get away with just storing the bound? Well, what's true about the size of the segment in virtual and physical memory? It's the same, right? So the size of a segment can vary. When I start, I might not have a heap at all, so there might be no heap segment, and the stack segment might start off smaller than the code segment or whatever. But whatever the sizes of the segments are, they're the same in virtual and physical memory. Now, again, everything's getting a little bit more complicated. So rather than a single check, I have to look through all of the segments that I've allocated for the process and try to figure out if a particular virtual address is valid by figuring out, is it inside any of the segments? Before, I just had one check to do, against the one single base and bounds. Now I've got to do it for each segment. And to translate, once I've found the segment that contains this virtual address, assuming it exists because the address is valid, then I do a little bit more math, all right? So segmentation is good, and it fixes some of those problems. What's bad about segmentation? So now I'm not wasting these huge chunks of space where there are invalid addresses. That's good. All those invalid addresses that I had to back with physical memory before, I don't have to. But what's the problem here? It can also potentially waste space. Where? Oh, no, this amount of information is reasonable. Most processes are not gonna have that many segments. Most have five, six, something like that. I mean, even fewer, maybe two, right? A code segment, a data segment, and then a stack, yeah. Ah, sorry, stupid thing. I mean, yeah, this can suffer from some external fragmentation because the segments are now different sizes, but where is there wasted space here? So the first kind of wasted space has to do with address space locality, address space sparsity: most valid addresses are clustered together. The next kind of wasted space has to do with runtime locality. Dave, do you want to guess? Okay, so give me an example. Yeah, so this helps with external fragmentation because I'm not wasting all those spaces in between valid segments. But think about the code segment. Think about the code segment for, like... okay, I'll stop making fun of Microsoft Word. I'm gonna shift targets: Adobe. Adobe is an easy target. They make some terrible software. All right, sorry. Hopefully none of you guys are doing internships there or something like that. But if you are, you can help them, because they need it. All right, so let's talk about Photoshop, right? How large do you think the code segment of Photoshop is? It's got all the code to run Photoshop. Does anyone use Photoshop? Okay, so give me your estimate of the code size of Photoshop. Huge, yeah, I mean there's all sorts of code in there to do crazy stuff. All sorts of functions that you never use. How much of that code segment do you even know how to use? Let me ask you that question. Anyone here a Photoshop wizard? Okay, Gus is a Photoshop wizard. How much of Photoshop do you think you can actually reliably operate? There we go, 10%, right? For me it's like 1%, like, how do I get out of here? I opened it by accident.
I'm like, click exit, you know? There's probably like five different exit dialogs somewhere, right? It's like, run script to exit, I don't know, whatever. I mean, yeah, so these tools have gotten really complicated, and a lot of times there's multiple ways to do things or whatever, but it doesn't matter. The other thing is, at any given point in time you're only using a few of those functions. So if you could look at a graph that highlighted the parts of an address space that are in use at any given point in time, they're very small. There's some little bit of code that I'm using to run some subroutine that's, like, blurring the image, and there's the part of the image that's stored in memory, and a lot of the rest of the address space is dark. It's not actually in use. Now, those sort of hot spots move around as you use the program, as it moves from one subroutine to another subroutine, whatever. This is called execution locality, or temporal locality. And so if every time I need to run one tiny little bit of Photoshop code I have to yank this huge code segment that's like a gigabyte in size into memory, I've got a considerable amount of internal fragmentation. And if the segments have to be contiguous in memory, there's also this external fragmentation problem as well. So those are my issues. All right, questions at this point? I find it useful to review this stuff. I'm sorry if Carl said everything that I just said in a more interesting and entertaining way on Friday. All right, questions. Okay, so the goal here is as follows. Because segmentation still has some problems left, the goal, the thing that would solve all of our problems, is this idea of being able to extremely efficiently map a byte to a byte. If I could map any virtual byte to any physical byte, I could use this ability to fix the problems with segmentation, because I wouldn't have to yank in the whole segment; I could just map every byte of the segment to a different spot and control those mappings independently. Now, the OS can't do this, but the question is, is there a way to get hardware to help us? And it turns out that there is a piece of hardware. And like a lot of the terminology in this class, this particular piece of terminology is very specific to a particular implementation of something, but it doesn't matter; the concept is pretty universal. So what we're gonna talk about today is sometimes referred to as a translation lookaside buffer. Yikes, what is wrong with this thing today? Is someone messing with me? Someone's got a clicker? Anyway, don't do that. I'm just giving you guys bad ideas. So, this is a really classic systems approach. Sometimes I describe an operating system as a series of caches. Registers are a cache for the L1 cache, which is a cache for the L2 cache, which is a cache for the L3 cache, which is a cache for main memory, which is a cache for your disk, which is a cache for the internet, which is a cache for all of human knowledge that we're slowly populating. Yeah, I mean, to some degree, that's how computer systems work. I just explained it in one sentence. It's still hard to get them to work. That's how computer memory works. There are other caches for other things. But what we're gonna do is say, okay, the operating system can't do this. Is there a piece of hardware I can use to help me? The piece of hardware is something that's sometimes referred to as, and this is kind of cool, content addressable memory.
Does that make sense? Has anyone ever heard about content addressable memory before? At some point they taught you about this in some other class, like a CAM. Maybe you just called it a CAM and the acronym was never explained. Yeah, so I think this is kind of cool. Normal memory, I address using an address: I give you an address in memory, and then I tell you load or store. When I address content addressable memory, I give you an identifier for something that is stored in the memory, and it returns an address. Does that make sense? So it's like if I could go to memory and say, tell me every location in memory that contains the byte 12, and it would be able to answer that question. So this is content addressable. It's addressed by what's inside the memory, rather than where the memory is, some sort of arbitrary location. So here's an example of how this works. Here's my TLB. And the way I'm gonna use TLBs here is to cache this mapping that I want to do, the mapping, right now, from virtual byte to physical byte. So here's my cache, and the cache, remember, is addressable by content. And what that means is that you can think of it as simultaneously being able to do this lookup efficiently across the entire cache. So when I ask the TLB to look up 0x800, what does it return? Yeah, so essentially it will return the location that contains this content. Does this make sense? So instead of asking it for location two or location one or whatever this would be, if I start from zero, this is location one, I query it based on the contents of that cell. This is kind of neat. And what I get back is, okay, I do have that value, and there's also this other value next to it, which is 0x306. So I can use this to do translation. I ask the TLB, can you translate 0x800? It says yes, and looks that up. Okay, well, we'll go through more examples later. Okay, so this sounds awesome. I mean, some of you guys are probably sitting there being like, wow, content addressable memory, I never knew that was a thing. Because if you think about this, this totally explodes the possibilities of the kinds of things you can do with memory. Imagine I have a hash table where I can look up every entry in memory super efficiently. That's kind of what we're doing here. There's only one problem. CAMs are really complicated to build. There's a lot of hardware circuitry, and they don't scale. So I can't make one that's four gigabytes. If I could, I could do some wacky stuff. Like, computers would be different. But I can't, so I have to make them small. So what's the problem here? Let's say I can only build a TLB that has 32 entries in it. Why does that matter? Yeah. Well, remember, I'm using this as a cache, so I'm not expecting it to store every translation. But if it can only store 32 translations and I'm translating bytes to bytes, how many bytes of virtual memory can I translate? 32 bytes. That doesn't sound like very many, right? It turns out one of the magical things about virtual memory is that you don't need a very big cache here to do the job, but 32 bytes is not large enough. You need to do better than 32, okay? Or 128, or 256, and they can probably make CAMs out there up to like 1,024 entries if you spend like a billion dollars, okay? But that's still not enough.
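To make that concrete, here's a tiny software model of a content-addressable TLB, matching the byte-to-byte example above (so tlb_lookup(0x800) would return 0x306 if that pair were loaded). Real hardware compares all entries against the key simultaneously; the loop here is just the sequential stand-in for that parallel match, and all the names are made up.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TLB_ENTRIES 32

/* Each entry pairs the "content" we search on (a virtual address)
 * with the value that comes back with a match (a physical address). */
struct tlb_entry {
    bool      valid;
    uintptr_t vaddr;
    uintptr_t paddr;
};

static struct tlb_entry tlb[TLB_ENTRIES];

/* Ask the TLB: can you translate vaddr? In hardware, this match
 * happens across all entries at once. */
bool tlb_lookup(uintptr_t vaddr, uintptr_t *paddr)
{
    for (size_t i = 0; i < TLB_ENTRIES; i++) {
        if (tlb[i].valid && tlb[i].vaddr == vaddr) {
            *paddr = tlb[i].paddr;
            return true;   /* hit */
        }
    }
    return false;          /* miss: the OS has to get involved */
}
```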
So at this point, here's where we are. Segments are too large, so they cause internal fragmentation. Mapping bytes means that the TLB can't store enough information. So what would happen here? This would work: let's say I have a 32 entry TLB that can map bytes to bytes. I can map 32 bytes of virtual memory. This will work, but what will keep happening? This is gonna be really slow. Why? Because I'm gonna keep asking the TLB, can you translate this byte? And the TLB says, I don't know how to do that. And then what do I have to do? I have to ask the operating system for help. Then the operating system has to get involved and check whether that address is valid, and if it's valid, it has to tell the TLB how to translate it. So the smaller the amount of memory that the TLB can map, the more often the operating system gets involved, and everything slows down, all right? So the modern solution is to use these TLB pieces of hardware, because they're super useful, but to pick a translation granularity that allows them to map enough memory. So there's a very clear trade-off here between internal fragmentation inside my mapping units and the amount of memory that the TLB can map. That's the trade-off I'm playing with. I'm trying to find a middle ground so that the TLB can map enough entries, but those entries don't get so big that there's a lot of internal fragmentation. The other thing that I care about here a little bit is that, remember, the kernel has to store information about these mappings. When I stored information about base and bounds, I had like two values, eight bytes, of information to store. When I stored information about segments, I had like 12 bytes times the number of segments, which is still really small. This approach is gonna require that the operating system store a lot more information. So to limit the size of that information, I also wanna choose a mapping granularity that's not too small. If the operating system had to store one mapping per byte, then the size of the mapping table would be the size of the amount of memory that the process is using, and that is too big. All right, now remember, we have a friend here, which is execution locality. I just wanna make sure you guys understand how powerful this is, because this is what allows us to get away with a much smaller cache. So here's what typically happens as a program runs. When you do something weird, something unexpected, like you ask Photoshop to take a picture and convert it to grayscale, okay, Photoshop suddenly starts running this subroutine that maybe you've never run before. And so initially when it gets there, there are a lot of misses in the cache. It's accessing data that's not in the TLB, it's accessing code that's not in the TLB. So for the first infinitesimal little amount of time, you get all these cache misses, and then the cache fills up, and then the rest of things run entirely inside the cache. And so there's a little bit of penalty every time the program changes state, but once it's in a new state, assuming the cache is big enough, there shouldn't be any more misses, and it can make a lot of progress without asking the operating system for help. Does that make sense? So again, as you're clicking on menus and stuff like that, you're kind of giving it these unpredictable inputs and forcing it to adapt.
Once it's in a more stable situation and it's working on something, the goal is for all the translations to be cached so they can happen really fast without the operating system having to get involved. We call these units pages, and a page size of 4K was the canonical page size for a gazillion years. I suspect it's still pretty common, despite the fact that as memory has grown, there are some systems that have started to use things like 8K as a default page size. There are variants of Linux that allow you to have large page regions, where the pages can be multiple megabytes. So on some level, the fact that memory has gotten really cheap and systems have gotten lots more memory has caused us to rethink some of these design decisions, but it doesn't really matter, right? We're gonna choose a page size, and pages are what get mapped, rather than bytes. And here's how you can think about this. If I have a TLB that has a certain number of entries, I can combine that with the page size and get a limit on the amount of memory that can be mapped by that TLB at one point in time. So with a 4K page and 128 entries... is this still wrong? It's still wrong, sorry. It should be 512K, whoops. Okay, free extra credit if you file a bug report against this slide, because it is wrong. 4K times 128 is 512K, just in case you were confused. So my 128 entry TLB allows me to cache 128 entries, each pointing to a 4K page of memory, which means I can cache translations covering 512K of memory at any given point in time. So remember I was talking before about the memory that's being used by a particular part of a program? That's also referred to as a working set in some terminologies: the pages I need to do a particular computation or a particular amount of work. This is not a well-defined concept, by the way. It just reflects the fact that when a program is working on a particular task, it tends to use a pretty stable set of pages. And as long as those pages fit inside my cache, I'm good. It's also helpful if those pages fit inside high-level caches, like an L2 cache, for example, yeah. Yeah, exactly, yeah. So the TLB always has to be able to look up based on a virtual address and return entries that contain a physical address as well, yeah. No, that's a great question, right? So how much information does the TLB have to store to identify a page? This is a good little preview, I think this is on the next few slides, but if the page size is 4K, how many bits does it take to identify that page uniquely? On a 32-bit system, sorry, that's important. On a 32-bit system, how many bits does it take to identify a 4K chunk of memory? 12? Close, 12 was on the right track. No, not close in that way. 12 addresses 4K. So there are 12 bits that are used to address within the page you're talking about. So if it takes 12 bits to address within 4K, how many does it take to actually address the page itself? We'll come back to this. 20, 20 bits, right? 12 bits describe the offset within the page, and the other 20 bits identify the page itself. We'll come back to this. Ashish looks confused, that's always dangerous. Yes, yes. So one way to think about pages is that pages are like fixed-size segments, right? Remember, when I talked about segments, they could be various sizes. Pages allow us to simplify things a little bit by assuming that every page is 4K. Or again, on systems that support large pages, I have one type of page that's 4K or 8K, and another that's like four megabytes.
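Here's that arithmetic as a little program you can run for yourself; the constants are just the lecture's example numbers.

```c
#include <stdio.h>

int main(void)
{
    unsigned page_size   = 4096;  /* 4K pages */
    unsigned tlb_entries = 128;

    /* 128 entries x 4K per page = 512K mapped at any one time. */
    printf("TLB reach: %uK\n", tlb_entries * page_size / 1024);

    /* On a 32-bit system: 12 bits address within a 4K page, so the
     * other 20 bits identify the page itself. */
    printf("offset bits: 12, virtual page number bits: %u\n", 32 - 12);
    return 0;
}
```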
Why the move towards these larger page sizes? I mean, 4K seems nice, it's a nice number. It's a computer science number. I see people writing code where it's like, max number of processes: 30. 30 is not a computer science number, don't use that. A computer doesn't like 30. The computer looks at that and it's like, oh man, what is that number? Just use 32 or something, a power of two, please. It'll make me so much happier. And that's important, obviously. Yeah, wait, yeah, four megabyte pages, why? Why are we seeing larger page sizes? There's a particular type of object that applications use and manipulate a lot today where these larger page sizes make a lot of sense. What would that be? When it gets close to midterm time, I'm just generating multiple choice questions in my head while I'm talking to you, by the way. Anything where there's a longer than five or ten second pause, I'm like, okay. Like, why would I... again, the page size is supposed to reflect something about what the process does. Where do I start to want large chunks of memory? What type of content is there where all the bytes in a four megabyte chunk are kind of related to each other, where they're probably gonna be used together? Well, a particular type of file, yeah. Any sort of video. I mean, videos are gigabytes long. Four megabytes of video content at a full frame rate in high def is like a second or two, maybe? I don't know, someone can do that math for me, but it's not very much. So if the program's gonna use those four megabytes, it's just gonna use all of them at once, right? I don't need to break that into smaller chunks. Either you watched that part of the movie or you didn't, right? Either you seek past it or you seek to it. I don't need to break it up into chunks that hold a microsecond worth of movie content. That doesn't make any sense. All right. All right, I just said that. Where are we at in time? Oh, we're good. Okay. So when we start translating pages, these are sort of like segments except they're fixed size. But because they're fixed size, we can break every address into two parts. There's the top part of the address, which identifies a 4K chunk of memory, known as the virtual page number, and the remainder is the offset. And every virtual page, as was pointed out, maps to a physical page. So what I'm doing is breaking virtual memory up into 4K chunks, and I also break physical memory up into 4K chunks. And every allocated piece of physical memory that's in use by a process maps to some virtual page; the kernel might also be using physical memory for other things. But every process page maps onto some contiguous 4K chunk of physical memory. And as you can see, from a fragmentation perspective, particularly external fragmentation, this is very nice. We'll come back to that in a sec. All addresses in a single virtual page map to the same physical page; that's sort of implicit. And so here's my algorithm for using these, and I have a little bit of a slide about it too. The first thing I need to do is figure out whether or not that virtual page number is valid. And this is where the data structures aspect of this problem comes in, and this is something Carl will talk about on Wednesday. So the operating system has to be able to quickly figure out whether or not a virtual page number is valid. I don't worry about individual addresses anymore.
This is all done on a page granularity. So I figure out what virtual page the program is trying to use, and I figure out if it's valid. As a corollary to this, I only provide access to memory on 4K boundaries. So this is kind of interesting. People sometimes ask, well, is it possible for the kernel to allocate like 32 bytes to a program? No, I can only give it pages. There's no way to protect anything inside of a page. So if malloc wants memory, malloc can ask for 32 bytes, but what it's gonna get back, whether it wants it or not, is a page. And then the next time, maybe it asks for 32 bytes and I just don't have to give it more pages, because it's inside the allocation I already gave it. But the point is that all the protection and allocation is now done on page boundaries. So I check to make sure it's valid. Then I translate the virtual page number to a physical page number. And this is something that the operating system has to be able to do. And it also has to be able to tell the translation lookaside buffer how to do it. Because remember, the operating system does not wanna do this every time. It wants to look it up once, and then it's gonna tell the cache: here's how to do this in the future so you don't have to bother me. So the process of translation is: pull off the offset, translate the virtual page number to a physical page number, and then plug the offset back in. And for this particular case, well, actually this is true for any power-of-two page size, but for 4K pages, you can think of it as ripping the bottom 12 bits off, translating the top 20 bits to a different 20 bits (again, this is 32-bit translation), and then stapling the bottom 12 bits back on. And you can literally do that in bit operations, and you will, for assignment three. All right, anyway, you guys can watch this later. This is my favorite little clip from The Social Network. The only funny thing I'll point out, which you wouldn't get without me here to point it out, is that when I saw this movie, I saw this slide and I was like, that is way too complicated, someone just made up a joke slide, because I was pretty sure that Matt never used a slide that was that complicated. And then when I started preparing the slides for this class and I was looking at his old slides, I found that slide. So that's a real slide. So if you think my slides are complicated, look at that. I have no idea what it's about, except that there's a lot of boxes and arrows on it. Okay, so let's walk through an example using the TLB. Let's say that we're using 4K pages, that's pretty typical. So how do I translate this address, 0x800346? So there's some helpful things to remember visually. In hex, each character is how many bits? Four. Four, right? Remember this about hex? Two to the fourth is 16. Zero through nine gets me 10 symbols, and then A through F gets me the other six. That's why I go up through F, right? So every character is four bits. So what characters correspond to the last 12 bits of this address, the bottom 12 bits? 346. So I can pull this apart. I'm gonna yank off the bottom 12 bits and translate the top 20 bits. Now, I didn't show you all 20 bits, I only showed you 12 of them, but the top two nibbles, as they call them, I love that word, are zeros. All right, so what is this? This is a virtual page number, remember. Is it valid? Right, because it's in the TLB. What does it translate to? 0x306. So what's the final answer here? 0x306346, that's it.
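Here's that rip-apart, translate, staple-back sequence written as literal bit operations, the kind of thing you'll write for assignment three. This is just a sketch for 32-bit addresses and 4K pages; vpn_to_ppn is a made-up stand-in for the TLB or page-table lookup, not a real OS/161 function.

```c
#include <stdint.h>

#define OFFSET_BITS 12
#define OFFSET_MASK 0xfffu          /* the bottom 12 bits */

/* Hypothetical helper: maps a virtual page number to a physical page
 * number, e.g. 0x00800 -> 0x00306 in the example above. */
extern uint32_t vpn_to_ppn(uint32_t vpn);

uint32_t translate(uint32_t vaddr)
{
    uint32_t offset = vaddr & OFFSET_MASK;   /* rip off the bottom 12 bits */
    uint32_t vpn    = vaddr >> OFFSET_BITS;  /* the top 20 bits */
    uint32_t ppn    = vpn_to_ppn(vpn);       /* translate 20 bits to 20 bits */
    return (ppn << OFFSET_BITS) | offset;    /* staple the offset back on */
}

/* translate(0x800346) == 0x306346 when vpn_to_ppn(0x800) == 0x306. */
```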
So once I've translated those top 20 bits, I have to plug the offset back in. The offset identifies the byte within the page; it's only the top 20 bits of each address that actually get translated each time. All right, what about this guy? Yeah? 0x050336. 0x050336, yep, got it: rip it apart, translate the top 20 bits, plug it back together. Yeah, any questions about that? So this is what would happen at runtime once these addresses have been inserted into the TLB by the operating system. And remember, the TLB is this special piece of hardware that's built to do this thing and this thing only, and so it is extremely fast. It's close to the processor; every processor has its own TLB. So the address management is actually done on a per-core basis. And I don't know what the cycle overhead of this is, that's a great question, but this happens all the time, so I suspect it's like zero. I suspect this causes no stalls in the CPU pipeline, because every code byte that I pull, every data byte that I pull, it's all being translated. All right. So where did... yeah. Ah, okay, good question. So what happens if it's not found in the TLB? Let's say I tried to translate something that was not in the TLB. What has to happen? Yeah, I mean, have you guys seen this error? Some of you guys have seen this, right? On OS/161, what does it say? TLB miss on load or store. Usually that's a bogus address that you gave it, right? But the error message that's being generated by OS/161 comes from an exception that you are going to get very familiar with in assignment three, because you're going to handle those. Not all of them; some of them are your fault, right? For those, it's just like, okay, I'm going to panic and die. But some of them, for assignment three, you're going to handle, because they're going to be valid addresses that the TLB just doesn't know how to translate yet. So where do these entries come from? The operating system puts them there. So remember, when my execution locality changes, I jump to a different part of the program. Initially I get a bunch of TLB misses, because the entries that are there are not describing the part of memory that the program is now using. What happens is I trap into the operating system, and the operating system says, okay, this is good, this is good, this is good, plugs those entries into the TLB, and off we go, right? This is called a TLB exception in this particular case, and you guys have seen some of those when you tried to access buggy memory. So an invalid address, something that's going to eventually cause a segmentation fault and run through kill_curthread and shut the process down, the first thing it'll do is generate a TLB miss. But what will happen is the operating system will look up that address and be like, no, no, no, this is a bad address problem, and then kill the program. And again, this is stuff that you guys will have to do for assignment three.
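Here's the shape of that TLB miss handling, a sketch in the spirit of what you'll write for assignment three. The helpers as_lookup and tlb_insert and the return convention are made up for illustration; this is not the actual OS/161 vm_fault code.

```c
#include <stdbool.h>
#include <stdint.h>

#define OFFSET_BITS 12
#define EFAULT      1   /* stand-in error code for a bad address */

/* Hypothetical helpers: ask the current address space whether this
 * virtual page is valid (and where it maps), and load a mapping into
 * some TLB slot. */
extern bool as_lookup(uint32_t vpn, uint32_t *ppn);
extern void tlb_insert(uint32_t vpn, uint32_t ppn);

/* Called when the hardware couldn't find faultaddress in the TLB. */
int tlb_miss(uint32_t faultaddress)
{
    uint32_t vpn = faultaddress >> OFFSET_BITS;
    uint32_t ppn;

    if (!as_lookup(vpn, &ppn)) {
        return EFAULT;     /* bad address: the caller kills the process */
    }
    tlb_insert(vpn, ppn);  /* teach the TLB; the instruction retries */
    return 0;
}
```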
All right, I think I'll stop. Am I out of time? People keep getting up too early. I'm not out of time, I've got four minutes left. Sit down. Sending me these signals. Have you guys figured it out? This is like psychology: if I start putting on my coat at 40 after, we can get out of here early. All right, so, pros to paging. What's nice about this compared to segmentation? I don't have to store the size, so that's nice. But what's really nice about all the pages being the same size? What does that eliminate? External fragmentation. Remember, sometimes it's like, I allocated 20 bytes here and I allocated 20 bytes there, and I've got 10 bytes left over here and 10 bytes over there, and I need 20 bytes, but it's split up and I can't do it. This goes away. Everything's 4K. I need 4K, I've got 4K, right? If I've got chunks between two allocations, they're 4K or a multiple of 4K, so it's not a problem. So this really helps with external fragmentation. Many of the things I liked about segmentation can be layered on top of this approach. So in a lot of cases what you see is sort of a hybrid model, where some of the operating system data structures are organized like segments, but within those segments I have per-page information. So I can still do some of the things I wanted to do with segments. I can have segments be read-only if they describe the code, or whatever. And there's even less internal fragmentation. So I've got zero external fragmentation, and much less internal fragmentation than segmentation, because rather than having to give the entire code region a contiguous chunk of memory, I can map only the parts of the code region that are in use. And when we talk about swapping, we'll see why this is even more important, because the parts of the code region that aren't used, those features from Photoshop that you never use, may never even get into memory at all. They may never come into memory; they may live forever on disk. You paid a gazillion dollars for that piece of software, and some of that program has never, ever even been in your computer's memory. How does that make you feel? All right. So, cons. One of the cons, I'm almost done, one of the cons we just talked about: the translation granularity is smaller, so I need more hardware to translate less memory. And I need per-page operating system state. You may be wondering, how does the operating system keep all this information about the process? Here's where I get to do some fun engineering inside the operating system, to make sure that the operating system itself can look up information about pages efficiently, because it has to: the physical address that ends up in the TLB, that's something that the kernel had to find. And one of the things you guys will talk about, either Wednesday or next week, is how that happens. All right, I'm gonna stop here. Good luck finishing up assignment two. I'm really sad I can't be here on Friday. I think people are doing pretty well, but keep cranking this week, and I will see you guys the Monday after spring break.