I am assuming that most of you know about these topics, at least have seen them in some form. So, the first portion of this lecture will just be a recap, and then we will get deeper into cache architectures. When I talk about virtual memory here, we will not really get into the details of virtual memory, because that really belongs to an operating systems course. Here we are talking about what a processor needs to offer to facilitate the implementation of virtual memory. So, why virtual memory? Very basic question, and most of you probably feel that it is because it gives an illusion of a bigger memory. That is not exactly true; that was not the reason why virtual memory was invented. So, here is the question put in a different way. With a 32-bit address space, you can access 4 gigabytes of physical memory. Of course, you will not get the full 4 gigabytes, because the operating system will reserve some of it. But still, even if I assume that you get 50 percent of 4 gigabytes, that is a huge amount of memory, and the day-to-day applications that you run would probably be happy with 2 gigabytes. So the question arises again: why virtual memory? Today every machine comes with an operating system which has an elaborate virtual memory unit. So, virtual memory was not invented because you wanted to give the user an illusion of a bigger memory. There is a greater reason for this. Does anybody know why? [A student answers.] So, that was the main reason. The point is that there are important applications that have a much bigger memory footprint, like databases and scientific applications operating on large matrices. However, even if your application fits entirely in physical memory, it seems wasteful to load the full image at startup, because none of these applications will require the entire memory footprint at any point in time. Each will require only a small fraction of that. That is the locality principle that we believe in.
However, if you actually try to load the entire memory footprint at startup for an application, what will happen is that it will take away memory from other processes that could run in the system, even though the application probably does not need the full image at any point during execution. So, it actually hurts multiprogramming. What virtual memory allows you to do is to take the physical memory and multiplex it among multiple processes, and that essentially requires a mapping process. You have a virtual address space, which is the application's space, and you have a physical address space, and you want multiple processes to share this physical address space. Essentially, what this means is that you take a virtual address of some process and it gets mapped to some physical address. This is the mapping that virtual memory establishes. And this allows you to bring a small amount of data from the disk to the memory for a particular process and work on that; when you are done, it eventually gets thrown away, replaced by some of the process's own data or some other process's data. So, that allows you to do more multiprogramming. If you did not have virtual memory, there would be no way of doing that: the process would start with a fixed physical address space, and you would have to load the data and the instructions exactly at those addresses. Just to give you an example, suppose you have two processes P1 and P2, and assume that the process address space is fixed in the same way for both: a certain range of memory is assigned to the instructions, a certain range to the data, a certain range to the stack, and so on. Then P1 and P2 will have the same address range for the instructions. Now, if I want to run P1 and P2 together, there is no way to do it. P1 starts up and fills up, say, my physical memory.
And let us say that the lower portion of physical memory is reserved for code. So, P1 starts up, loads its entire code, and fills up the code area of the memory. Now, when P1 gets context-switched out so that P2 can run, I have to move this entire content from memory back to disk, even though some other portion of memory may actually be free. There is no way to use that free portion, because there is no way to translate P2's fixed addresses into those addresses; they are fixed addresses. So, that is why fixed addressing hurts multiprogramming: to run P2, you have to move everything of P1 back to disk and bring the data and instructions of P2 in. P2 runs for a while, and then again there is a context switch, and you have to do the same thing for P1. So, every context switch takes a lot of time, because you remove the data and instructions of one process and bring in the data and instructions of another, even though you may have a huge amount of memory left completely unused. There is no way to use that. Virtual memory allows you to avoid all of this, and that was the main reason for inventing it. It is not that it gives an illusion of a bigger memory. Yes, it does in certain implementations, but this is not always true: you can easily design a machine which has a smaller virtual address space than the physical address space, which is fine actually; the machine still works. Any questions? Alright. So, virtual memory, since it is some form of memory, needs an address to access it, and that is called a virtual address. If you assume a 32-bit virtual address, every process will see 4 GB of virtual memory, that is, 2^32 bytes. And this is much better than a 4 GB physical memory shared between multiprogrammed processes, because now every process gets an exclusive 4 GB virtual memory. But when it runs, of course, the physical memory is ultimately what is going to be used; it will be multiplexed between multiple processes.
The size of the virtual address is really fixed by the processor datapath width. That is the only thing that decides how big the virtual address is going to be. For example, on a 32-bit machine you cannot generate a virtual address wider than 32 bits, so there is no way to go beyond 4 GB of virtual memory on such processors. You have seen how virtual addresses get generated: for example, in MIPS, when you say something like lw $2, 0($1), whatever this address computes to is a virtual address. That is what the processor is going to generate, and its width is decided by the width of the register, which is 32 bits on 32-bit processors. The 64-bit processors — like the Alpha line of processors (which are no longer around), AMD Athlon 64 onward, IBM POWER4, POWER5, 6, 7, MIPS R10000 onward, and today all Intel processors — provide a bigger virtual memory to each process, because potentially you can have 2^64 bytes of virtual address space. Although normally you will never get the full 64 bits, because certain bits are reserved for certain segments and so on, it will be somewhat smaller than that. Large virtual and physical memories are very important in the commercial server market, because there you need to run large databases. So they are important. But of course, for day-to-day applications, it doesn't matter much. If you were really building a machine only for everyday applications, you could do away with certain aspects of the virtual-to-physical translation layer; the only thing that you would need to support is the multiprogramming part. You wouldn't have to worry about the address space size and all these things. So, how do you organize the virtual memory? There are primarily three ways to organize virtual memory. These are called paging, segmentation, and segmented paging.
How many of you have heard of these terms? Paging — and I am not talking about the pagers with which you have sent messages, but paging in the context of virtual memory — segmentation, and the combination of the two, which is segmented paging. What we'll do is not focus on the latter two at all; we'll look at flat paging only. Because again, this is not really a lecture on virtual memory as such; I'm only talking about the architectural support needed to do virtual memory. So, what is paged virtual memory? The entire virtual memory is divided into small units called pages. Virtual pages are loaded into physical page frames as and when needed; this is called demand paging. Whenever a process needs a particular page, it is loaded from the next level of the memory hierarchy, which is normally the disk, into your physical memory, and that portion of memory is called a physical page frame. So, essentially we're talking about a mapping process here. This is my virtual memory and this is my physical memory. I divide the virtual memory into pages, and I divide the physical memory into equally sized physical page frames. Then, if I need this particular virtual page, I look here, find a hole, and put that data there. For example, maybe this is a hole where this page will go. So, the physical memory is also divided into equally sized page frames. The processor generates virtual addresses, but memory is physically addressed, so you need a virtual-to-physical address translation. The virtual address generated by the processor is divided into two parts: page offset and virtual page number. So, here's an example. Assume that you have a 4-kilobyte page size. That means I need 12 bits to access any byte within a page. So, within a 32-bit virtual address, the lower 12 bits are the page offset — that's the offset within a page — and the remaining 20 bits are the virtual page number.
So, that gives you 1 million virtual pages in total; 1 million pages times 4 kilobytes per page gives you the 4 gigabytes of virtual memory. The page offset remains unchanged in the translation, because whether it belongs to the virtual space or the physical space, the offset within a page cannot change. What changes is the position of the page when you move from the virtual space to the physical space. So, essentially you need to translate the virtual page number to a physical page frame number, and this translation is held in a page table resident in memory. So, whenever you want to access any data, you first need to do this translation: you go from the virtual space to the physical space, and to be able to do that, you have to access this page table which holds the translation. And how is this translation established? Who establishes it? I start a program; how does the translation get established? Who populates this table, and how? [Students suggest answers: the translations are set up when you start the program, or when memory is allocated with malloc.] When you malloc, at that time the table gets populated? But the data structures of a program will not all go through malloc; certain structures, like the stack, will not. So, when and how will this table get populated? Anybody who has taken an operating systems course? What would be a meaningful way of doing this? What is common sense? Populate an entry when you require it. [A student explains:] Every entry of this page table has a valid bit; at startup, all the entries are marked invalid.
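The address split just described — a 32-bit virtual address with 4 KB pages, giving a 12-bit page offset and a 20-bit virtual page number — can be sketched as a small computation. This is a minimal illustration of the lecture's running example, not part of the lecture itself:

```python
PAGE_SIZE = 4096      # 4 KB pages, as in the example
OFFSET_BITS = 12      # log2(4096): bits needed to address any byte within a page

def split_virtual_address(va: int):
    """Split a 32-bit virtual address into (virtual page number, page offset)."""
    offset = va & (PAGE_SIZE - 1)   # lower 12 bits: unchanged by translation
    vpn = va >> OFFSET_BITS         # upper 20 bits: the virtual page number
    return vpn, offset

vpn, off = split_virtual_address(0x12345678)
print(hex(vpn), hex(off))   # 0x12345 0x678
```

Note that the offset survives translation untouched; only the VPN part gets replaced by a physical frame number later.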
So, at startup, the page table entries are all marked invalid, and when a page is accessed and its entry is invalid, we call it a page fault; the fault handler then establishes the mapping. That is how the table gets populated. So, where and when does an entry get allocated in this page table? [Student: when the requirement arises.] When the requirement arises, yes. Whenever a process needs a page, the page is brought from the disk and allocated a physical page frame, and at that time the operating system establishes the translation and allocates an entry in the table. Okay. So, now, since we need to access this table before we can go and access the data, there has to be a way to locate this table. Where is this table located? That's the first question we have to answer, right? How do I find the page table? Think of it like this: suppose we have moved to a new residential complex; we haven't been there before. There is a central office where you can go and ask: I'm looking for a friend of mine — which flat is he in? But first I need to know the location of that central office. How do I get there? What is common sense here? [Student: the starting address of the page table.] Yes, that's exactly what we're looking for. So, if I know the starting address of the page table, am I done? Is that enough? [Student: Yes, using the offset you can get to the entry.] Which offset? The page offset from the virtual address? No, that's not it. What is that offset, then? Somebody?
All right. So, let's assume that we know the starting address of the page table. [Student: use the VPN to index into the table.] Careful — I'm not trying to locate the page itself at this point; I just want to locate the page table entry within this table. At the VPN-th entry? Well, not exactly the VPN: the VPN multiplied by the size of each entry. So, the question was: how do I find the page table entry corresponding to this virtual page, so that I know the translation? You index into the page table using the VPN. Here I am assuming that there is only one page table per process, which is true for now; multi-level page tables are way too complicated and we're not even talking about them. You're almost correct, but you're assuming something implicit. You're saying: starting from the base, go down by a VPN amount, and that is the entry I'm looking for. That has an implicit assumption, which we'll get to. First, what does this entry contain? [Student: a physical address.] Is it the physical address or the physical page frame number? I said that the offset doesn't change in the translation. So what does this entry contain? The physical page frame number. Anything else? The offset? No — the offset doesn't change, right? It cannot change. I have a page in the virtual address space, and I have the same page in the physical address space.
See, if I'm looking for a byte within the page, the offset in the virtual address will be exactly the same as the offset in the physical address. That cannot change, right? What changes is the location of the page, and that is what the physical page frame number captures. I need that. What else is required? You mentioned a valid bit: if it is set, the translation is valid. What else? What is common sense? It's all about common sense; there is nothing great about it. What else do we need here? [Student: some bits that influence the replacement policy.] Can you elaborate on that? Yes, exactly — there is some replacement state. What else is needed in the page table entry? There's the valid bit: the translation is valid, the page is in memory; that's the interpretation of the valid bit. Do you need any permission bits? Have you heard of those? There is a set of permission bits: read, write, and execute permissions. And there is one more bit that's needed. What is it? A bit to signify whether the page has been modified. Why is that needed? Can anybody think of a use for that? [Student: if the page has been updated, then on replacement it should be written back.] Exactly, got it. So the dirty bit tells you that when the page gets replaced, its contents must be written back to the disk. So there are several things in this entry. Now, coming back to the previous question: is it enough to go down by VPN entries here? VPN what — VPN number of entries, or VPN bytes? VPN entries, right? So if I want the address as a byte address, how do I get that? Starting address plus VPN, multiplied by?
The size of an entry, okay. Thank you. So, the page table base register, PTBR, contains the starting physical address of the page table. The PTBR is normally accessible in kernel mode only, and it is set up by the operating system when a particular process is loaded, when you're starting a process. You need to know this register; there's no other way to get to the correct entry. So the PTBR has to be known, and the operating system knows it. Now, assume that each entry in the page table is 32 bits; we have just discussed what these bits might be. This gives you the following formula for computing the page table entry address: PTBR plus VPN shifted left by 2 — that is, VPN multiplied by 4, the same thing. This gives you the byte address at which I should read the page table entry corresponding to this particular virtual page number. So you access memory at this address to get the 32 bits of the page table entry. These 32 bits contain many things: a valid bit; the much-needed physical page frame number, which could be 20 bits for a 4-gigabyte physical memory with a 4-kilobyte page size, because that memory will have 1 million page frames; access permissions like read, write, and execute; and a dirty bit marking whether the page is modified. And there can be many other things, like she pointed out: you may need some replacement state, and there can be more. Now, the valid bit within the 32 bits tells you if the translation is valid. If the bit is reset, the translation is invalid, and that is interpreted as the page not being resident in memory. That results in something called a page fault, and you have to do something to handle that fault. And just to make a correspondence with what we have been discussing: a page fault has to be a precise exception. So when a particular load or store instruction takes a page fault, the pipeline has to be flushed, meaning all younger instructions have to be removed, and you have to take the exception, come back, and restart the load or store instruction.
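The entry-address formula and the decoding of a 32-bit entry can be sketched as follows. The formula PTBR + (VPN << 2) is exactly the lecture's; the bit positions of the fields in the decode, however, are my own illustrative assumption (the lecture only says the entry contains a valid bit, a 20-bit frame number, permission bits, and a dirty bit, without fixing a layout):

```python
PTE_SIZE = 4  # each page table entry is 32 bits = 4 bytes

def pte_address(ptbr: int, vpn: int) -> int:
    """Byte address of the page table entry for this VPN:
    PTBR + (VPN << 2), i.e., PTBR + VPN * 4."""
    return ptbr + (vpn << 2)

# Hypothetical PTE layout (NOT a real ISA's format): valid bit in bit 31,
# dirty bit in bit 30, read/write/execute permissions in bits 22-20,
# physical page frame number in bits 19-0 (20 bits: 4 GB memory, 4 KB pages).
def decode_pte(pte: int):
    valid = (pte >> 31) & 1
    dirty = (pte >> 30) & 1
    perms = (pte >> 20) & 0x7
    ppn = pte & 0xFFFFF
    return valid, dirty, perms, ppn
```

If the valid bit comes back 0, the remaining bits would instead be reinterpreted as a disk address, as discussed next.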
When you restart, hopefully this time there will be no page fault, because the page has been brought in. So, in case of a page fault, the kernel needs to bring the page into memory from disk; that is what really needs to be done. The disk address is normally provided by the page table entry itself. This is a different interpretation of the remaining 31 bits: normally this 32-bit entry holds the physical page frame number, permission bits, et cetera, provided the valid bit is set. If the valid bit is reset, then the remaining 31 bits actually give you a disk address — how to locate this page on the disk. The kernel also needs to allocate a new physical page frame for this virtual page, and if all frames are occupied, it invokes a page replacement policy, which essentially evicts a page to make room for the new one. Okay, so page faults take a long time, on the order of milliseconds usually. Why does it take so long? Why is the disk particularly slow? These are mechanical devices: you have to rotate the disk to bring the head to the right point, and that's a mechanical process, which is very slow. You must have heard of seek time and all these things. The point, of course, is that the page replacement policy must be intelligent, because you don't want to take a page fault soon for a page you just threw out. Which means you should replace a page which is not likely to be used in the near future. That's what you need. Again, I'm not going into the details of this, because it is not the topic of this course; I'll just assume that there's a black-box algorithm which will replace some page. That's it. Once the page fault handling finishes, the page table entry is updated with the new virtual page number to physical page frame number mapping.
Of course, if the valid bit was set to start with, you get the physical page frame number right away, without taking a page fault. Finally, the page frame number is concatenated with the page offset to get the physical address. So this is what it looks like: you just replace the virtual page number by the physical page frame number. This is the physical address that the processor can now use to issue a memory request to get the necessary data. So, in summary, you really need two memory accesses to get one piece of data: the first memory access is to get the translation — you have to access the page table — and with the physical address, you then actually go and access the data. Can I improve on this? Because it looks ridiculous, right? To get one piece of data, I have to make two memory accesses every time. [Student: cache the recent translations, the translation queries being made.] Exactly. And there's a name for this structure: T-L-B. What does this stand for? Translation Lookaside Buffer. Thank you. So again, if you apply common sense, it tells you that you should be able to cache the most recently used translations, right? Okay. And this particular cache has a special name: it's called the Translation Lookaside Buffer, or TLB. It's essentially a small set of registers which store the most recently used translations. Normally the TLB is fully associative, but it could be anything; it could be a set associative structure also. By fully associative, what I mean is that whenever you have a virtual address — or rather, a virtual page number — and you're looking for the corresponding translation, you have to search through all these registers; the translation could be anywhere among them. You may or may not find it, actually. So each TLB entry has two parts.
It has a tag, which is simply the virtual page number, and it has the corresponding page table entry. So a TLB looks like this: it has a number of entries, each of which is one translation register, and each register has two fields — one is a tag, one is a page table entry. You can probably have a valid bit also, and it may have some replacement state bits. So what happens is that you have a virtual address, VPN and offset, and what I want to know is: given that VPN, tell me the corresponding physical page frame number. That is the requirement of the translation process. The tag is the VPN, so you store the VPNs here. You take this VPN and compare it against all the tags; at most one will match. Whichever matches, you take the page table entry of that one, and that gives you the translation. If nothing matches, that's called a TLB miss, and now you really have no option but to go and access the page table. So, each entry has two parts: the tag, which is simply the virtual page number, and the corresponding page table entry. The tag may also contain a process ID. Why? [Students offer guesses.] Why would I need a process ID? The slide says the tag may contain it; that means it's not a requirement. Under what circumstances would you require a process ID? What I'm saying is that I'll extend my tag by the process ID as well: I'll have another field here, the PID, and now my TLB hit condition changes a little bit. I'll not only compare the VPN with the tag, I'll also compare the ID of the running process with this PID field. Both must match; then I have a TLB hit, otherwise not. Why do I need this? Note that the TLB is not a per-process structure. Any process can access this TLB, right? And it could then pick up a translation and access the page frame.
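The fully associative lookup just described can be sketched like this. The capacity of 8 entries and the FIFO replacement are illustrative assumptions (real TLBs use dedicated replacement state, and the hardware compares all tags in parallel, which the loop here only emulates):

```python
class TLB:
    """A tiny fully associative TLB sketch: each entry is (tag = VPN, PTE)."""
    def __init__(self, num_entries=8):
        self.capacity = num_entries
        self.entries = []          # list of (vpn, ppn) pairs

    def lookup(self, vpn):
        for tag, ppn in self.entries:   # hardware compares all tags in parallel;
            if tag == vpn:              # here we emulate it sequentially
                return ppn              # TLB hit
        return None                     # TLB miss: must walk the page table

    def insert(self, vpn, ppn):
        if len(self.entries) == self.capacity:
            self.entries.pop(0)         # evict oldest (FIFO stand-in for
                                        # real replacement state)
        self.entries.append((vpn, ppn))

tlb = TLB()
tlb.insert(0x12345, 0x00ABC)
print(tlb.lookup(0x12345))   # hit: returns the cached frame number
print(tlb.lookup(0x99999))   # miss: returns None
```

Extending each entry with a PID field, as the lecture discusses next, would just mean tagging entries with (pid, vpn) and matching on both.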
Can it use the page even if it doesn't have permission for that page? So, what he is saying is that a process that ran earlier has cached certain translations, and, given that the virtual address space of all processes is identical, another process may come in later, generate the same virtual page number, look up the TLB, pick up a stale translation, and go and access the wrong page. Right? Yes, absolutely. And to prevent that, you need a process ID. Now, can I do away with the process ID and still be correct? I'm telling you that it's not really required; I can do something else to get rid of the problem. [Student: No, you cannot; the virtual address is generated by the processor.] Is everybody following what he is talking about? Okay. So I am saying that yes, that is correct: the problem that he has mentioned can be avoided by having the PID. But I am saying that I can fix it another way also. The problem has to be between two processes, right? There is no problem with a single process. And between running these two processes, one particular event has to happen. What is that called? A context switch. Can I do some extra work at the context switch to fix this? [Student: Yes, invalidate all the TLB entries.] Exactly. That is called a TLB flush. So, on a context switch, if I flush my TLB, then I won't require the PID. And what do I gain and what do I lose by doing this? Of course, I gain something in terms of storage: I don't need to store the PID at all; I simply need a mechanism to flush the TLB. What am I losing by doing this? [Student: You have to repopulate the TLB.] Exactly. So there will be a cold-start effect whenever a process is brought in. And the bad thing is that I am doing this in a very conservative way: I am assuming every entry would conflict, which may not actually be true. So that's why, if you do not have a process ID, you lose in terms of performance.
Because every process will see a cold-start effect whenever it is switched in. Having a process ID can give you back that performance. The downside is that your TLB becomes bulky, because these process IDs are not really small; they are large numbers. So, on a TLB hit you just get the translation, usually in one cycle; it may take slightly longer depending on the design. On a TLB miss, you may need to access memory to load the page table entry into the TLB. Again, there is a "may" here. Can somebody guess what else I can do? I'm saying I missed the TLB, which means I did not find this virtual page number in any of the entries. The obvious solution is to go and access the page table and get the translation. But I can do something else also, which is why there is a "may". Before going all the way to the page table, can I look up somewhere else? Can you think of anything? The answer is that nothing stops me from caching the translation in the cache hierarchy — in my data cache, for example — because a page table entry is just normal data: it has an address, it has a value. So I can put it in the cache also. So I miss the TLB, and then I look up my cache; if it is not there either, then of course I go to the page table. We will talk about this more later. Today, almost all processors actually do this: they put the translations in some level of the cache. And normally there are two TLBs: instruction and data. Why is that? [Student: to avoid structural hazards.] Exactly, we avoid structural hazards. Because in a pipelined processor, in a single cycle I may have to access the TLB for an instruction as well as for data: whenever the fetch stage of one instruction and the memory stage of another align in the same cycle, a structural hazard would arise with a single TLB.
Another way of resolving it would be to keep a single TLB with enough ports to serve both accesses, but today all processors actually have separate instruction and data TLBs. So, once you have completed the virtual-to-physical address translation, you have the physical address. The question is: what's next? You need to access memory with that physical address. But you actually do not directly access memory; you first access your caches. Instruction and data caches are small memory structures inside your processor that hold the most recently used — that is, temporally close — and nearby — that is, spatially close — data. What does this mean? It means that if I am accessing a particular data item now, I assume that I will require this data item again in the near future, so it makes sense to cache it. That is the temporal locality being exploited by the caches. And also, whenever I access a particular data item, I usually bring in the next few data items together, because since I am accessing this data item, I will probably require the nearby data items also. That is called spatial locality. So this is what the cache is exploiting, and we will talk more about how we actually achieve this. So: use the physical address to access the cache first. Caches are organized as arrays of cache lines, or cache blocks. Each cache line holds several contiguous bytes: 32, 64, or 128 bytes. How do you address a cache? The physical address is divided into several parts, usually tag, index, and block offset. The block offset determines the starting byte address within a cache line. The index tells you which cache line to access. Remember that the cache is an array of cache lines; the index tells you which cache line I should access, and the block offset tells you, within that cache line, which bytes I should access. And in that cache line, you compare the tag to determine a hit. So this is what it looks like: you take the physical address and divide it into three parts.
And the index tells you which cache line to access. So this is my cache, an array of cache lines. Each cache line has two things: a tag and the data. The index tells me which line to access. I take the tag from the address and compare it with the stored tag; only if the comparison passes do I pick up the data, not otherwise. Now, if I pick up the data, the question arises: this is a large chunk of data — I just said it is 32, 64, or 128 bytes — but if we look at the instructions that we have been discussing, these access at most 4 bytes. The question is which 4 bytes I should access if it is a load-word instruction. The block offset tells me where I should start within these, say, 64 bytes, and my instruction tells me the access size, that is, how many bytes: if it is load-word, it will be 4 bytes; if it is load-byte, it will be 1 byte. Starting from the offset, I access that many bytes, and that is the final data returned to the processor. There are some state bits as well, which tell me, for example, valid or invalid, and many other things which we will discuss later. So this is roughly how the cache works. Let us suppose that we have a 32-bit physical address. The cache line is, let us say, 64 bytes, which means the block offset is 6 bits: with 6 bits I can represent any starting byte within the 64 bytes. And let us assume that the number of cache lines is 512, which means I need 9 bits of index to decide which cache line to access. So the remaining 17 bits are the tag. How do I get that? I have 6 bits of block offset and 9 bits of index, so I am left with 17 bits out of 32 bits; those are my tag bits. So what is the size of this cache? I have 512 lines, and each line is 64 bytes, so the cache size is 512 times 64 bytes, which is 32 kilobytes. Each cache line contains the 64 bytes of data, 17 bits of tag, 1 valid bit, and several state bits such as shared, dirty, etc.
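The running example — a 32-bit physical address, 64-byte lines, 512 lines — gives the following decomposition; this is a minimal sketch of the arithmetic just worked out:

```python
BLOCK_OFFSET_BITS = 6   # 64-byte lines -> 6 offset bits
INDEX_BITS = 9          # 512 lines -> 9 index bits
TAG_BITS = 32 - INDEX_BITS - BLOCK_OFFSET_BITS   # 17 tag bits remain

def split_physical_address(pa: int):
    """Split a 32-bit physical address into (tag, index, block offset)."""
    offset = pa & ((1 << BLOCK_OFFSET_BITS) - 1)          # lowest 6 bits
    index = (pa >> BLOCK_OFFSET_BITS) & ((1 << INDEX_BITS) - 1)  # next 9 bits
    tag = pa >> (BLOCK_OFFSET_BITS + INDEX_BITS)          # top 17 bits
    return tag, index, offset

# Sanity checks matching the lecture's numbers.
assert TAG_BITS == 17
assert 512 * 64 == 32 * 1024   # total data capacity: 32 KB
```

The same three-way split works for any line count and line size that are powers of two; only the bit widths change.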
So we will probably not talk about the shared bit at all in this course, or maybe a little bit. The dirty bit tells you if this cache block has been modified. Since both the tag and the index are derived from the physical address, this is called a physically indexed, physically tagged cache. We will see other variants very soon. So essentially, what we do is take the physical address, take these 17 bits of tag, go to whichever index you need to access, and compare the tags. If the comparison passes, it is a cache hit and you access the data; otherwise, you have to go and fetch the data from the next level of the memory hierarchy. The example assumes that you have one cache line per index, but it does not have to be so. This is called a direct-mapped cache, which maps a particular index to one unique cache block. The problem here is that an access to a different line with the same index evicts the resident cache line, because it may happen that there are two physical addresses with the same index bits, which can happen. And then what will be the problem? The problem is that both of these cache lines will map to the same index in the cache; there is a collision now. That causes a cache miss, which is not very good. So how do you solve this? This is called a capacity or a conflict miss: you have two addresses colliding on the same index, which means at any point in time you can cache only one of them. Whenever you access the other block, you have a cache miss. You can see the miss as caused by the conflict between these two addresses, or you can also see it as a capacity miss, because with a bigger cache these two blocks would probably map to different indices. Look at this example: if I double the number of lines in the cache to 1024, there will be 10 bits now in the cache index.
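To see the conflict concretely, here is a toy direct-mapped cache in Python that tracks tags only (no data). The two addresses below share their index bits and differ only in the tag, so they keep evicting each other; the address values themselves are arbitrary examples:

```python
OFFSET_BITS, INDEX_BITS = 6, 9   # 64-byte lines, 512 lines (lecture's example)

def split(paddr):
    index = (paddr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = paddr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index

resident = {}   # index -> tag of the line currently cached there
misses = 0

def access(paddr):
    global misses
    tag, index = split(paddr)
    if resident.get(index) != tag:
        misses += 1
        resident[index] = tag   # evict whatever was resident at this index

a = 0x00001040
b = a + (1 << (OFFSET_BITS + INDEX_BITS))   # same index, different tag

for _ in range(3):
    access(a)
    access(b)

print(misses)   # 6: every single access is a miss, a and b evict each other
```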
So suppose I have two physical addresses whose 9 index bits are exactly the same; then these two addresses will map to the same cache line. But suppose the 10th bit is 0 in one address and 1 in the other. Now if I double my cache, so that I have 1024 lines, the index will be 10 bits, these two addresses will actually map to different cache lines, and the conflict will be gone. So it is very difficult to say whether a given miss is a conflict miss or a capacity miss. There is a way to categorize them correctly, and we will talk about that soon, but for now we just say that the problem is either a capacity miss or a conflict miss. Conflict misses can be reduced by providing multiple lines per index. I have already talked about one solution, which is doubling the number of cache lines to increase the cache size. The other one is to keep the cache size unchanged but allow multiple lines per index, so an access to an index now returns a set of cache lines instead of just one. For an n-way set associative cache there are n lines per set, and now you have to carry out n tag comparisons in parallel to see if any one line of the set hits. So here is an example of a two-way set associative cache: a particular index corresponds to two different cache lines, which means we read both of these tags and compare both against the tag from the address. At most one will match; if neither matches, you have a cache miss, and whichever matches gives you the data. So now the earlier problem is gone: the two conflicting lines can coexist in the cache, one in each way. Now, when do you need to evict a line? It may happen that there are three addresses that map to the same index, so when the third address shows up, you have to make room for it by replacing one of the other two. And now you have a choice of which one to replace, which you did not have in the direct-mapped cache. So you need some algorithm here to decide which one to replace.
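With two ways, two addresses that collide in a direct-mapped cache can coexist. A minimal sketch (the eviction here is simple FIFO within a set, just to keep the code short; the parameters are again the lecture's example, halved to 256 sets of 2 ways):

```python
OFFSET_BITS = 6     # 64-byte lines
INDEX_BITS = 8      # 256 sets = 512 lines / 2 ways
NUM_SETS = 1 << INDEX_BITS
WAYS = 2

sets = {}   # set index -> list of resident tags, at most WAYS long

def access(paddr):
    """Return True on a hit, False on a miss."""
    index = (paddr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = paddr >> (OFFSET_BITS + INDEX_BITS)
    ways = sets.setdefault(index, [])
    if tag in ways:               # compare against every way of the set
        return True
    if len(ways) == WAYS:         # set is full: replacement policy kicks in
        ways.pop(0)               # evict the oldest line (FIFO, for brevity)
    ways.append(tag)
    return False

a = 0x00001040
b = a + (1 << (OFFSET_BITS + INDEX_BITS))   # same set, different tag

print(access(a), access(b))   # False False (cold misses)
print(access(a), access(b))   # True True   (both lines now coexist)
```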
That is the cache replacement policy. You run a replacement policy; LRU, for example, least recently used, is a good choice: it keeps the most recently used lines around and favors temporal locality. That is how you reduce the number of conflicts. If time permits, we will look at some of the replacement policies, but again I will not go into the details. So there are two extremes of set size. One is direct mapped, which is a one-way set associative cache. The other is fully associative, where all the lines are in a single set, like the TLB example we talked about: it is a single set, and you have to compare all the tags. Here is an example: suppose you have a 32-kilobyte cache which is two-way set associative, and the line size is 64 bytes. What is the number of indices, or number of sets? How does this cache look? There are two arrays of cache lines: this is way 0 and this is way 1. The line size is 64 bytes, so each way is 16 kilobytes: 16 kilobytes here, 16 kilobytes here. How many lines do we have per way? 16 kilobytes divided by 64 bytes, which is 256. So here you get 256 indices, and whichever index you get gives you two lines; that is your set. That is a two-way set associative 32-kilobyte cache. Another example: suppose you have the same 32-kilobyte capacity and 64-byte line size, but it is a fully associative cache. What does that mean? All your cache blocks are in a single set. So within that set there are 512 lines, and you need 512 tag comparisons for each access, whereas in the two-way cache we needed just two. That is why fully associative caches are expensive: you need to make a lot more comparisons. But what is the advantage? Does anybody see an advantage of a fully associative cache over a set associative cache? Otherwise, why would we even be talking about this? So, in general, what happens if I increase the associativity of the cache? Can you find out?
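LRU for a single set can be sketched with Python's `OrderedDict`, where the insertion order encodes recency. The `LRUSet` class is a hypothetical helper for illustration, not something from the lecture:

```python
from collections import OrderedDict

class LRUSet:
    """One cache set with LRU replacement; tracks tags only, no data."""

    def __init__(self, ways):
        self.ways = ways
        self.tags = OrderedDict()   # front = least recent, back = most recent

    def access(self, tag):
        """Return True on a hit, False on a miss (with LRU eviction)."""
        if tag in self.tags:
            self.tags.move_to_end(tag)       # refresh recency on a hit
            return True
        if len(self.tags) == self.ways:
            self.tags.popitem(last=False)    # evict the least recently used
        self.tags[tag] = None
        return False

s = LRUSet(2)
print(s.access('A'), s.access('B'))   # False False (cold misses)
print(s.access('A'))                  # True: A becomes most recent
print(s.access('C'))                  # False: evicts B, the LRU line
print(s.access('B'))                  # False: B was indeed evicted
```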
My goal in designing the cache is to maximize the number of hits; that is what I want, because misses are slow. But do I get that by increasing the associativity? You might think so, because the number of conflicts should normally go down. That is actually not entirely true; it helps only up to a point. We will get to the details of that very soon, but if you plot associativity against miss rate, the curve looks like this: there is an optimum point beyond which the number of misses stops going down and may even begin to rise. We will explain that very soon.