Hi friends. In this session of ARM-based development we are going to cover cache memory. I hope you have gone through all the previous lectures and are now ready to take on this new topic. Let us see what will be covered here. I will explain what cache memory is, how it is organized, what the performance benefits of having it are, and then the different design elements. By design elements I mean the various parameters that go into designing something; you have a set of requirements, or new requirements, and limitations around which a system has to be designed. For example, if you were to design a car, what are its design elements? You can decide whether you want a diesel version or a petrol version, and then there are so many features inside the car you can choose; we can decide many parameters before designing a car. Similarly, if we need to design a cache memory, there are many parameters we can tweak. We will study those, and the mapping function is one of them; we will go into it in detail and understand how it is implemented. Now, before we go into designing the cache and what cache memory is, let me give an analogy so that you understand why a cache memory is required in the first place. Let me change the colour. Assume we have a library. This is a library rack, and it has various addresses, in the sense that in a library there is coding to identify which book is kept in which section. There will be many racks, and in the racks lots of books are kept, with some coding to identify which book lies where: this may be the physics rack, that may be the chemistry one, and so on. Now suppose you, as a person, go into the library to access certain books; say you are sitting in the library and you want to prepare a report. You need to refer to lots of books, and you do not want to walk to the racks every time. Because the librarian is so nice, they have kept a small rack for you; you cannot keep standing, so there is a chair too, and you keep the books you are interested in there: you go and pick them up from the main racks and keep them here. This space is not huge: maybe you can keep some 10 books here, while hundreds of books are kept in the main racks. So you access maybe one book from this rack, maybe two books from that rack, and carry them over, because different books cover different things. Now, after you have finished, these books go back to their relevant places. Who does that? The librarian, because if you do it yourself you may not keep them in the right place.
So the librarian does it. Now, do the books change their address when they are moved from the main rack to your temporary rack? Look at the coding written on a book, say 100.1.2.13b. What does it mean? Maybe 100.1.2 is the subject code, each rack is divided into sections a, b, c, and 13 means it is the 13th book in that section. It is coded that way, and even if the book is moved to your temporary rack close to the table, that coding does not change; the book keeps the same address, this is only a temporary place. Now you are writing the first chapter, which is based on a particular topic, and you are running out of space here. So what do you do? You keep one book back in its place, or maybe keep it under the table so that the librarian takes care of putting it back properly; you do not want to put it back in a wrong place yourself. Then you bring some other book from the main rack and keep it here. Now, how do you decide which books go back? The ones you are not likely to refer to again: maybe for the introduction chapter you wanted a topic which is well written in one particular book, and now you are done with that; you may need it later, but right now you are getting into a different topic. Based on that you go and pick up whatever you need, and then you carry on writing the report. Why keep the books in the rack provided to you? Because you can just pull one out easily and progress with your report without wasting time walking up and down to the main racks, of which there are so many; you would waste a lot of time doing that. So effectively, you have a limited space: maybe you can keep 10 books, and 1 or 2 on the table, because beyond that you will run out of space. This arrangement helps you keep the books you are going to refer to quite frequently very close to you. The advantage of keeping a book here is that if you want to refer to one chapter, one page in that book, it is much easier than going to the main rack and getting it back. The same logic is applied in the computer world. So who is this person? This is the CPU, or more precisely the processor. Let me change the colour so that whatever I am referring to looks different. This is the processor. And who are these books? They are the data, or maybe the instructions: you may refer to one book that tells you how to write a report; that is like an instruction, telling you which topics to cover. You refer to it continuously, and it says write an introduction chapter, then a first chapter covering an overview of the architecture, a second chapter on the instruction set, a third chapter on the different exceptions and modes, and so on.
So the book that structures the report is like an instruction: you refer to it very often, and if it is not sufficient you may refer to something else based on it. The data portion is like the reference books: where do I get a well written book on the instruction set? You refer to that book for one chapter; when you are writing about exceptions, there is a separate book that only talks about interrupts and exceptions, so you bring that book and refer to it while writing. So basically the processor does the same: it accesses instructions and data whenever it needs them; say the instructions are stored here and the data is stored there, and it fetches from those places. And what is this whole thing? I can call it the main memory. There may be further rooms with other racks; I would call those the secondary memory. Maybe your topic of interest is elsewhere: you need to talk about some mechanical aspects and the temperature sensitivity of components, so you go to the racks on electronics or thermodynamics and fetch from there; sometimes I can call that a secondary storage, holding books that are very rarely used, and you choose a place close to what you need. So basically the CPU, the processor, is you writing the report, which is the program; writing the report is your intent, and you are trying to achieve it by bringing different things from different sources. If I call this whole thing an SoC, then the processor is you, the cache is the rack here close to you, and the table with pen, paper and all that is the register set. You are trying to execute your goal, your program, by referring to different things. The CPU does exactly the same. So remember: the temporary space where you store these books does not have any specific addresses of its own. The books have their own addresses and retain them; the coding I mentioned stays with each book, and once you are done they all go back to their old places. The temporary storage is only for you. Maybe you will order it such that all the instruction-like books are in one place and all the data books in another: that gives an instruction cache and a data cache; or, if you are given one rack and keep both together, that is a unified cache. So this is what happens in the processor when it uses a cache; I hope this example has made it clear. Now let us change the colour, not the topic, only the colour. Let me show you the work of the person who really thought through this innovation: Maurice Wilkes's paper, Slave Memories and Dynamic Storage Allocation.
An e-copy of this paper is available on the net; if you search for the transactions and the article name you will be able to read it. It is hardly a two and a half page write-up, and it clearly describes the cache and how it can be implemented. It is an early thought: you can see the year of this publication, 1965, and it has resulted in a huge innovation in processor architecture. I told you that memories have different characteristics, and if you want to take advantage of that you need a cache, or different kinds of memories, to compensate for the problems of large memories, which we talked about in the last class. This paper is the basis on which cache memories are designed. It is a very good paper and very easy to understand, because it is an early thought from 1965; you will enjoy reading it. Now, what is a cache? A smaller, faster storage device that acts as a staging area for a subset of the data in a larger storage device. The rack close to the person near the table is the cache, and it holds a subset because it has only a few of the books that are in the main racks. The person has not brought some book from outside and kept it there; he is using books already available in the library, only keeping them close for easy access from the larger and slower device, because getting a book from the far-off racks is slower, even though those racks are large. The same is true for memories. If you name the different memory levels, say level k and the one below it, level k+1, then the faster and smaller device is level k, and the larger and slower device is level k+1, the next level below. Now, what is the advantage of a memory hierarchy? Programs tend to access the data at level k more often than at level k+1, which we know; and the storage at level k+1 can be slower, and thus larger and cheaper per bit. It can be larger, no problem, because the program uses the level k memory more often; and it will be cheaper anyway. So the net effect, and read this a few times: a large pool of memory that costs about as much as the cheap storage near the bottom, yet serves data to programs at the rate of the fast storage near the top. We are comparing: you have a huge memory spread over level 1, level 2, level 3, like one large memory. The total cost of the whole memory, taking into consideration all the levels in the hierarchy, comes out close to the cost of the bottom-most, cheapest level; that means it is cheap overall. And it serves data to programs at a rate close to that of the fastest storage near the top.
So we design it such that the cost is close to the bottom of the hierarchy and the rate of memory access is close to the top, near register speed. We get the best of both worlds; that is our ultimate aim, and in a nutshell that is the goal of going in for such a memory hierarchy. Now, how is a cache arranged in, or connected to, a processor? This is a typical system: a processor core is there, and it may do a word access, byte access, or half-word access, and then there is the main memory, which we call the slower device. We are not talking about the other levels now: below main memory there is secondary memory, the hard disk, the tape, and so on; we are not going further down. Our interest is cache design: a cache that sits between these two. It is not that there is no caching logic between the other memory levels. For example, in the Unix world a buffer cache is maintained; we call it a buffer cache, and it has to do with disk access. When the OS, Unix or Linux, is accessing the hard disk, whatever is to be written to the disk is first written into this buffer cache, and later it is flushed to the hard disk by the OS. So that is a cache arrangement between the main memory and the hard disk. This philosophy of caching is everywhere. One more example, further out: a web server is here, and there will be a proxy server. A proxy does the job of a web server on the network, but it is actually a mirror image of the content, of whatever web pages are on the main server. Say the main server is in the USA and we have our own proxy server in India: people in India, PCs connected to the network in different cities, access pages which are actually maintained by the main server, and the proxy server brings that data over and keeps it for serving these people. So the proxy is a cache between the PCs on the network and the web server keeping the main pages. What is the benefit? Because it is placed in Bombay or Bangalore or Delhi, it can easily serve requests from local connections. So a caching mechanism exists between every memory level, and between every level there will be a different kind of caching. That is why I am explaining all this: so you do not think a cache is only what is inside the processor; it could be between any two levels. But what we are interested in is between the core and the main memory. So we restrict our discussion to these two: the processor core, whose registers are the fastest storage, and the main memory, where the programs and data are maintained; and we are trying to see how to design a cache between the two that serves data or instructions to the processor at a much faster rate. So this is the design.
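Before moving on, to make that "costs like the bottom, serves like the top" claim concrete, here is a small sketch in C; every number in it, the sizes and per-kilobyte costs, is an assumption I have made up purely for illustration, not a real device figure.

```c
#include <stdio.h>

/* A minimal sketch of the "costs like the bottom" half of the claim.
   All figures below are assumed, illustrative values only.          */
int main(void)
{
    double size_fast = 32.0e3;   /* 32 KB of fast storage (level k)   */
    double cost_fast = 50.0;     /* assumed cost units per KB         */
    double size_slow = 64.0e6;   /* 64 MB of slow storage (level k+1) */
    double cost_slow = 0.05;     /* assumed cost units per KB         */

    double total_kb = (size_fast + size_slow) / 1e3;
    double avg_cost = (size_fast / 1e3 * cost_fast +
                       size_slow / 1e3 * cost_slow) / total_kb;

    /* Because the cheap level dwarfs the fast one, the average lands
       very near the cheap level's cost.                              */
    printf("average cost per KB = %.3f (cheap level costs %.3f)\n",
           avg_cost, cost_slow);
    return 0;
}
```

The access-time side of the same argument comes later in this lecture, with the hit-ratio formula.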
See here: the cache access is fast, and any access to the main memory is slow. It is relative: this "slow" is compared to the cache, but compared to a hard disk access the main memory is fast. So please remember we are talking only about the top levels: maybe an L1 cache here, an L2 cache next, processor registers above, and the main memory below. We are talking about the caching mechanism between the registers and the main memory; suppose only one cache is there, then it sits between these two. Good. Now, what is this other block? I will talk about it in later lectures, but keep it in mind: it is some buffering, and it is actually inside the processor, so accessing it or writing into it is fast. It works like a FIFO. What is a FIFO? First In, First Out: this is the order in which things are pushed, the first data, then the second, then the third, and they come out in the same order. What is being pushed is the data together with the address at which it has to be written. Suppose there is a write: say an STM, storing some register values into memory. The ultimate aim is to store them into main memory, but if the data is already in the cache, as I told you, the actual write happens to the cache, and the cache controller then decides to put it into the write buffer; from there it is later pushed out to the memory. What do I mean by later? When the processor is executing instructions that do not touch memory, no LDM or STM, and no other bus master such as a DMA controller is transferring anything with the memory, then the write buffer takes control of the bus and, in its own sweet time, pushes the data into main memory. That is the purpose of the write buffer. The write buffer has a limited depth: maybe 16 words or 32 words, it could be anything, but it is not in the range of kilobytes. The cache will be in kilobytes; the write buffer will be a few words, where a word means 4 bytes. It is implemented with a set of fast registers, and in fact the processor does not write into it directly; the cache controller does, because it knows it is a write cycle, and the data is written into it. I will explain clearly later on what the cache and the write buffer each do, but at least be clear about this flow: the direct access to memory is slow, so we use this buffered path.
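To picture the write buffer, here is a minimal software sketch of the FIFO just described; the depth, the type and function names, and the memory_write stand-in are my own assumptions, since the real thing is hardware logic, not code.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

#define WB_DEPTH 16                 /* a few words deep, as described above */

typedef struct {
    uint32_t addr[WB_DEPTH];        /* where each pending word must land   */
    uint32_t data[WB_DEPTH];        /* the word itself                     */
    int head, tail, count;
} write_buffer_t;

/* Cache-controller side: record a write and return immediately.           */
bool wb_push(write_buffer_t *wb, uint32_t addr, uint32_t data)
{
    if (wb->count == WB_DEPTH) return false;   /* full: the CPU must stall  */
    wb->addr[wb->tail] = addr;
    wb->data[wb->tail] = data;
    wb->tail = (wb->tail + 1) % WB_DEPTH;
    wb->count++;
    return true;
}

/* Stand-in for the slow main-memory write (done by hardware in reality).  */
static void memory_write(uint32_t addr, uint32_t data)
{
    printf("mem[0x%08x] <= 0x%08x\n", (unsigned)addr, (unsigned)data);
}

/* Bus-idle side: drain one entry, oldest first, when nobody needs the bus. */
void wb_drain_one(write_buffer_t *wb)
{
    if (wb->count == 0) return;
    memory_write(wb->addr[wb->head], wb->data[wb->head]);
    wb->head = (wb->head + 1) % WB_DEPTH;
    wb->count--;
}

int main(void)
{
    write_buffer_t wb = {0};
    wb_push(&wb, 0x8000, 0x12345678);   /* the store "completes" instantly */
    wb_push(&wb, 0x8004, 0x9abcdef0);
    wb_drain_one(&wb);                  /* later, when the bus is idle...  */
    wb_drain_one(&wb);                  /* ...the writes reach memory      */
    return 0;
}
```

The key point the sketch captures is that wb_push returns immediately, while wb_drain_one runs only when the bus would otherwise be idle.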
Now, what is this other possibility? Some part of the memory may be marked as not to be cached: non-cached. You may wonder: we have a cache, so why say some memory should not be cached? One typical example is peripherals. You may wonder why peripherals come up here when we are talking about memory, but this is the right time to introduce them. See, this is the CPU: if you are familiar with the x86 processor, there are instructions called IN and OUT. They were meant for reading and writing peripheral registers. So x86 had a memory address space and an I/O address space; I/O means any peripheral, and it had a separate address space. What do I mean by a space? A set of locations that can only be accessed using separate instructions. If you execute IN with, say, address 54, a decimal number, then location 54 in the I/O space is accessed. Once you use these instructions, the address maps onto a different space from memory: the two spaces are totally different, so memory address 54 and I/O address 54 (via IN or OUT) access completely different locations. Normally this mapping is done with address decoders in the system. When an address is put out during an I/O instruction, a chip select goes to the I/O decoder, which decodes the address and enables a particular peripheral, or a set of peripherals, based on the address. Whereas if it is a memory cycle, not an I/O cycle, another address decoder is enabled, the one connected to the memory chip selects (CE or whatever the pin is called). So depending on whether the instruction is an IN or OUT versus a MOV or load, different decoders act: that is how processors that treat peripherals as a separate I/O subspace work. Now, what about our friend ARM? In ARM, I/O is memory mapped. What does memory mapped mean? The I/O registers of the peripherals and the memory addresses share the same address space. Suppose you have 4 GB of space, the full 32-bit address space; a few kilobytes of it may be reserved for I/O. Reserved for I/O means: the same 32-bit address goes into the same decoder, but if the address falls within a particular reserved range, the decoder puts out a chip select that enables the I/O space. The peripherals in that range get selected, and whatever data is on the bus is taken by them, because they are selected and reading the data bus, when the processor is writing into an I/O register. So memory mapped means the I/O space shares the very same memory addresses, and there is no separate instruction. We have seen all the ARM instructions by now, right? Did we see anything like IN or OUT, anything specific to peripherals? No: there is no instruction specifically meant for accessing peripherals. You simply use the same LDR or STR (or LDM/STM) that we saw earlier to access the peripherals as well; they are treated as memory addresses, and based on the address value and the organization of the system, a part of the address space is mapped to I/O. I will talk more about this when we discuss I/O and peripherals, but you should have this idea now.
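Since ARM peripherals live at ordinary addresses, accessing them in C looks like the sketch below. The address 0x40001000 and the register name are made-up placeholders, not any real device's memory map; the volatile qualifier tells the compiler that every access must really reach that address, which connects directly to the caching question coming next.

```c
#include <stdint.h>

/* Hypothetical peripheral register at a made-up address in the shared
   4 GB address space; no special IN/OUT instruction exists on ARM.    */
#define UART_DATA  (*(volatile uint32_t *)0x40001000u)

void send_byte(uint8_t b)
{
    UART_DATA = b;              /* compiles to an ordinary STR to that address */
}

uint8_t read_byte(void)
{
    return (uint8_t)UART_DATA;  /* an ordinary LDR from the same address       */
}
```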
So why do we need "no caching" for the peripherals? Because peripheral registers must see every access exactly as issued: if you write into an I/O register twice in a row, the two writes may be interpreted differently by the device, and as soon as something is written it should be reflected in the peripheral. That is the reason it cannot be cached, and it cannot be kept in the write buffer to be written into the I/O space later. I/O registers, the peripherals, need to be written without any delay, but with a cache there is a delay: a gap between the time a write appears to happen and the time it actually gets reflected in the main memory. That delay cannot be tolerated in the I/O space, so those addresses are accessed directly, without coming through the caches. That facility is there: you can declare that some part of the memory is not to be cached. So now you understand what the cache is: a temporary space through which memory data passes on its way in and out. Even if you are not fully clear, no problem; it will become clear as we proceed. I have mentioned many times that the cache is high speed: if t_cache is its access time, then t_cache is less than t_memory; that is the relationship. And the cache is smaller in size than the memory. Now, what happens the first time an address is sent out? It is first checked in the cache; we will see how that checking works later, but whenever an address is given, whether for an instruction fetch or a data access, it is verified whether that content is already available in the cache. Assume this is the first time the address is coming out, so it is not there. The access then goes to the memory, and the memory gives the data. When it gives the data, that data also gets written into the cache; it is similar to picking a book from the main rack and putting it in the rack close to your table. Or the person may keep it on the table itself, that does not matter: as far as it is in the near rack it is equivalent to the cache, and copying it onto the table is like copying it into a register, in my library example. So the data that was read, whether instruction or data, goes into the cache as well: a place in the cache is reserved, taken up for keeping this data, and it also goes to the register, because the CPU is executing a load; say LDR R1, [R2], where one value is copied.
What is happening? The address in R2 is put on the address bus, R2 holds the address, and whatever data comes from the memory is written into R1; I hope you know this instruction: we are loading into a register from the memory location pointed to by R2. Now, the address R2 points to is somewhere in main memory, and the value there gets written into some temporary location in the cache, and into R1 as well. Got it? Now suppose the processor goes on executing and later encounters one more LDR. It does not matter which register holds the address, R3 or R2, as long as the content of R3 now is the same as the content of R2 earlier. I am intentionally changing the register so that you do not tie this to a particular register: we are interested in the address value. Say the same address that was used earlier is now in R3, and after some time this second instruction executes, and assume the value still sits in the cache: no other loads have caused that cache entry to be replaced. When this instruction executes, the address is given out, but the data is picked up from the cache itself, because the value is there. You may copy it into, say, R4; the registers can differ, but as long as the address is the same and the location has not been modified in between, the same value is copied into R4. This time, though the instruction says LDR, load from memory, it has not even accessed the memory: it got the value from the cache. Has the address of that value changed? No, the address of that memory location has not changed; the only difference is that the data was picked from the cache instead of from memory. That is what happens in a cache. So the cache as such does not have addresses of its own: we access the cache using the same address that would have been used with the memory, because it is only a temporary location. But at time t1 a particular cache location may hold the value of one address, and at another time t2 the same cache location may hold a value belonging to a different address; it is like a scratch pad. Now suppose something is changed: in the previous example we did two LDRs; suppose instead I do an STR, say STR R3, [R2], and suppose this address is the same one we had cached. The value in R3 is written into that cache location; it becomes the new value there. But has main memory been modified? It may not be modified immediately. That depends on how the cache is designed and whether a write buffer sits in between; but when this instruction executes, the processor assumes the write is as good as done and goes forward with the next instruction: it does not wait for the memory cycle to complete.
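Here is a software picture of what the cache does on those two LDRs, shrunk to a single cache line so the idea stays visible; the structure, the names, and the single-line simplification are all mine, purely for illustration.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

/* One cache line standing in for the whole cache, to keep the idea visible. */
typedef struct {
    bool     valid;     /* does this line hold anything yet?          */
    uint32_t addr;      /* which memory address the line is caching   */
    uint32_t data;      /* the cached copy                            */
} line_t;

static line_t line;     /* cleared at reset: valid == false           */

static uint32_t slow_memory_read(uint32_t addr)   /* stand-in for MM  */
{
    printf("  (miss: going to main memory for 0x%08x)\n", (unsigned)addr);
    return addr ^ 0xdeadbeef;       /* dummy content                  */
}

/* What an LDR effectively triggers. */
uint32_t cache_read(uint32_t addr)
{
    if (line.valid && line.addr == addr)
        return line.data;           /* hit: no memory cycle at all    */
    line.valid = true;              /* miss: fill the line...         */
    line.addr  = addr;
    line.data  = slow_memory_read(addr);
    return line.data;               /* ...then serve the CPU          */
}

int main(void)
{
    uint32_t a = cache_read(0x8000);  /* first LDR: miss, fills line  */
    uint32_t b = cache_read(0x8000);  /* second LDR: served by cache  */
    return (a == b) ? 0 : 1;          /* same address, same value     */
}
```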
So, if you recall, we had worked out timings earlier, cycle counts for each instruction, the N-cycles and sequential S-cycles and all that; those numbers may not hold water when a cache is there. These instructions do not actually take that long writing into memory; they complete much faster, in something equivalent to a single internal clock cycle, because we are not even trying to access the main memory: we are writing into the cache, which is SRAM, with a fast access time, small in size, a temporary location, and there is logic, which I will explain, that knows which cache location to use based on the address. So the STR just does this; it does not even go to the main memory, and in the background the write to main memory happens. Please do not think it will never be written: that is not true; it needs to be written to main memory later, otherwise the programmer thinks it is written, and if it lies forgotten somewhere we are not doing the job the programmer wanted. It will be done, but in the background: later, when the CPU is busy with something else and needs no memory cycle, the bus can be used to transfer that data to the main memory. So correctness is achieved as per the program, and at the same time speed is achieved, because the STR did not take the many memory cycles we mentioned earlier, just one internal cycle, with no memory read or write happening at that moment. That is the advantage of a cache, and that is what happens when we do this. Now there are names for these outcomes: if an access does not find its location in the cache and has to go to the main memory, it is called a miss; if the location is available and the data is provided by the cache, it is called a hit. What is the typical percentage of hits? Normally more than 90 percent. Do not try to memorize that number: it is not always 90 or 95 percent; it depends on the cache size, the algorithms used, the program running, and the sequence of memory accesses; many factors decide a particular hit ratio. But we want it to be around that level. Now let us move forward; I am just giving you an overview of virtual memory, which will be covered later. It is supported by the processor, and remember, it allows a program to treat its memory space as a single continuous block, which may be considerably larger than the main memory; the MMU takes care of mapping virtual addresses to physical addresses. This will be covered later; I am giving the introduction because in the next slide I am going to talk about logical and physical caches. So what does the processor give out? And I should say processor, not CPU: the ARM7TDMI or any higher family, ARM9, ARM10, whatever.
So, going forward, do not associate this only with the ARM7TDMI; it could be any higher family processor. Now, the processor gives out a virtual address, which gets converted to a physical address by the MMU; but before that, the cache comes into the picture. If you look, the processor is here and the main memory is there, with the MMU in between. The cache can be in two places: after the MMU or before the MMU. I am just making you aware that it could be in either place, and based on where it sits it is called either a logical cache or a physical cache. It is very simple to understand: the MMU's output side faces the physical memory, so a cache that is closer to the physical memory, after the MMU, is called a physical cache, and the one before the MMU is called a logical cache. ARM7 through ARM10 use a logical cache; starting from ARM11, the physical cache was introduced. We are not going into the details of the differences and the reasons; I will give a high-level difference between the two so that you have a clear basic understanding, since our intention here is to understand caches. The logic of designing a cache does not change much whether it sits on this side or that, but whether it interprets physical addresses or logical addresses has different implications for the processor, which we will not cover in detail here. So understand this much: the cache comes between the processor and the memory, and if an MMU is there it could be on either side of it. Now, a higher-level view. A logical cache uses virtual addresses rather than physical addresses to locate content within the cache. I told you the cache is a temporary location: it uses the same address that goes to the main memory. The problem now is that, because of the MMU, the address given by the processor is different from the address that actually reaches the memory; that is why we ask which address the cache is using. If it uses the processor-side address, it is a logical cache, because that is the logical, virtual address given by the processor. If it uses the physical address going to the memory, and sits on that side, we call it a physical cache. That is the distinction between the two. The processor accesses a logical cache directly, without going through the MMU: as you saw in the animation in the previous picture, the processor first checks whether the data is available in the cache, and if it is there it takes it from there, before the MMU even translates the address and goes to the main memory. So effectively, with a logical cache, a hit happens without involving the MMU. Advantage: the address need not be translated by the MMU; that additional work is avoided. If there is a miss, the address gets translated by the MMU and the content is accessed at the physical address in the main memory; very natural, right?
On a miss it has to go to the main memory, so the address is translated first and then the access actually happens. What is the disadvantage of a logical cache? Processes have overlapping virtual address spaces, so managing logical caches is more complex in a multiprocessing system. Let me give a little background. Suppose multiple processes are there; note, processes, not processors, in case my pronunciation confuses you: different processes running on the same CPU. Processes have their own memory space: I told you about virtual memory, which spans the full 32-bit range, address 0 to all Fs. Assume process P1 uses some addresses, say 40000 hex to 45000 hex. P2 is also free to use the same virtual addresses; please remember these are virtual addresses. Person A writes one process and person B writes another; both will run on the CPU, with the OS also running to handle this. A typical example from our PCs: Microsoft Word, MS Word, is written by Microsoft, and it runs on your PC; Acrobat Reader, for opening PDF files, is written by another company, Adobe, and that also runs on your PC. Did Microsoft and Adobe talk to each other about which locations they are going to use for their own code, data, and stack? No. So there is every possibility that two programs written by two different developers assume their own address spaces in the same region. When they run on the PC, the MMU comes into play, and the OS, Microsoft Windows or Linux or any such OS, comes into play, and they make sure that if you open both the PDF and the Word document, both run at the same time. In the physical memory, the DRAM, 2 GB or whatever you have, they occupy two different locations: assume MS Word ends up at some address 20000 and the PDF reader at some 30000; both programs are now running. But the addresses have been changed; who changed them? The MMU in the middle translates the virtual address, the 40000-range addresses: whenever MS Word is running its addresses map here, and the PDF reader's map there. Whenever one process or the other gives out an address, it is changed to a different physical address, on the go. If it is a single CPU, either one or the other is running at any instant, multiplexed by time slicing; and every address generated by a process's code is converted into the equivalent physical address where that process is mapped. While running today it may be mapped to this address; tomorrow, when you run it again, it may be mapped somewhere else. You cannot assume that whenever you open Microsoft Word it will always run at a particular address in memory: it depends on what other programs are running on the processor and which regions are occupied; accordingly, different space gets allocated.
So basically the physical space is chosen by the MMU and the OS together. That is what I mean by complexity in a multiprocessing system. The reason I explained all this: when the processes switch, say between the MS Word process and the PDF process, only one of them can be running at a time. Assume the logical cache in between was holding data for the 40000-range addresses; both processes emit that same virtual address, and the cache sees only that address, while the MMU maps it to different physical regions, the 20000 and 30000 regions I mentioned. So while the first process runs, all the MS Word related code and data sits in the cache. Once that process goes out and the other, P2, is brought in, the cache needs to be flushed: flushed in the sense that modified contents must be written back, not to the MMU but through it, to the main memory, so that later, when the first process comes back to the CPU, everything can be brought in again. Then, from wherever the MMU has physically mapped P2, the new data is brought in, and the cache fills with everything related to the Acrobat Reader. That is what happens when managing logical caches, and it is complex because the cache has to be flushed on every such change; that is what I am trying to say. I hope this is clear; even if only partly, do not worry, it will become much clearer when we talk about virtual memory. This changing from one process to another is what a context switch is, and flushing the cache contents is part of its cost; identifying which entries are currently in the cache is another complexity. Let us now move forward and look at the performance of a cache. Here t_c is the cache access time and t_m is the memory access time; as rough figures, say t_c is 1 nanosecond and t_m is some 10 nanoseconds. The hit ratio h is, as the name says, a ratio, so it lies between 0 and 1. The average access time is computed as follows; let us try to understand it. Say the hit ratio is 0.9: 90 percent of the accesses are served from the cache and 10 percent have to go to the main memory. Whenever there is a hit, the access time is t_c, and that happens a fraction h = 0.9 of the time, so that term is 0.9 times t_c: 1 nanosecond, 90 percent of the time. Whereas whenever a miss happens, the cache is checked first and then the main memory is accessed; and as I told you, the content is first written into the cache and then goes from the cache to the register, so a miss costs a cache access plus a memory access, t_c + t_m, not one or the other but both. How often? (1 - h) = 0.1 of the time. So the average access time is t_avg = h × t_c + (1 - h) × (t_c + t_m).
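As a quick check, here is that expression in code, evaluated with the rough figures just mentioned (t_c = 1 ns, t_m = 10 ns, hit ratio 0.9); the function name is my own.

```c
#include <stdio.h>

/* Average access time: t_avg = h*tc + (1 - h)*(tc + tm).
   On a miss the cache is still probed first, hence the (tc + tm) term. */
double avg_access_time(double h, double tc, double tm)
{
    return h * tc + (1.0 - h) * (tc + tm);
}

int main(void)
{
    /* The rough figures used above: tc = 1 ns, tm = 10 ns, h = 0.9. */
    printf("t_avg = %.2f ns\n", avg_access_time(0.9, 1.0, 10.0));  /* 2.00 */
    return 0;
}
```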
So, if you want to know the performance of a cache, you have to know the hit ratio; it could be 90 percent, 80 percent, or, if the program is badly written, hardly 50 percent. Given the hit ratio and these access-time values, we can find out how much time is saved by having a cache. Now let us see one example. Take a 5 minute break and try to work out the values yourself. Welcome back. It is very simple: just apply the values to the equation. What is the access time without a cache? All accesses happen with the memory only, at 30 nanoseconds each, over all 50 accesses (the total number of accesses is 50), so 30 × 50 = 1500 nanoseconds. Now, what happens with the cache? 90 percent of the accesses, that is 0.9 × 50 of them, happen from the cache at 1 nanosecond each, because that is the cache access time. The remaining ones cost both, cache plus memory, that is (1 + 30) nanoseconds, for 0.1 × 50 accesses; the total is the summation: 0.9 × 50 × 1 + 0.1 × 50 × 31 = 45 + 155 = 200 nanoseconds. If you got these answers, well and good; even if not, you may have made only a minor mistake, but please make sure you identify what the misunderstanding was and clarify it from this. This example shows you clearly the advantage of having a cache. Good. Now, the design elements. What are the properties? First, the block size of the cache; and here is something I have not told you so far. This is the processor and this is the main memory. When data is brought from main memory, the processor gives an address and may say: I am interested in reading a byte, or a half word, or at most a word; if it is an instruction fetch, it asks for one word of instruction. But remember the shopping analogy: if you are travelling far to a store, you try to bring back more in one trip, because fetching one small item from far away has a high transport cost; so you get more and keep it nearby for later use, and because of locality of reference the same logic holds here. So whenever any read happens, whether for a byte, half word, or word, the cache always brings in a whole block of data; it could be 16 bytes, 32 bytes, or 128 bytes, anything, but it is never a single byte or word; it is normally several words. That is one of the design decisions: how big a block to keep. Then the cache size is the total size of the cache. Once you decide the block, say 16 bytes, how many blocks to have in the cache is another decision: it could be a 1 kilobyte cache, a 4 kilobyte cache, 16 kilobytes, anything. The number of blocks times the block size gives the cache size; I hope that is clear. So naturally the cache size will be a multiple of the block size; it cannot be in between, because we store only whole blocks inside the cache.
So it has to be a multiple of the block size; that is a very important point. Next, the mapping function; we are going to talk about this in more detail. Here is the main memory, and I told you the cache is a small subset of the main memory, with the processor here. Suppose the cache is 1 kilobyte and the main memory is 4 kilobytes: at most one fourth of the data can be kept in the cache at a time. And it need not be one contiguous 1 KB: it will be one block from here, maybe one from there, and one from somewhere else; we copy selected blocks from the main memory. So I will keep using the word blocks going forward, and you should know what a block actually means. We may get a block from here or from there and store it in the cache. Now, how do we map which memory block goes where in the cache? With a 1 kilobyte cache and 16-byte blocks, there are many block positions available. You could place a block randomly, or have some logic deciding where it goes; both have advantages and disadvantages. But remember, the mapping is not only about the convenience of writing data into the cache; we have to see it from the processor's perspective too. When the hit ratio is 90 percent, 90 percent of the addresses issued towards the main memory will actually be served by this cache, so the processor must be able to find out which address is located in which part of the cache, which block is holding the data; maybe it is interested in one particular byte, which it will read from there. So what came from the main memory and is temporarily stored here must be resolvable by the processor, and that is what the mapping function decides. In main memory, blocks are tied to particular addresses; but once a block comes into the cache, it carries a tag along with it, saying "my address is this one", so that we know this cache entry currently corresponds to that memory address; later it may be removed and another block may sit in the same place. So we need to know which memory block is in which block of the cache, and there are different mapping options, which we will see. Then the replacement algorithm: the cache space is limited, so if it is already full and one more block needs to be brought from main memory, something has to be written back, if of course it was modified, into the main memory, and that location freed for the new data. Which block to choose among all the candidates, which themselves depend on the mapping function, is the replacement algorithm's job; I will talk about that. Then the write policy: whenever anything is modified in the cache, must it immediately be written into the main memory, or can it be delayed and written later, through the write buffer, or even when the cache is flushed? That is a separate decision. Whether to have a unified cache for both instructions and data, or separate caches for each, is another decision; and the number of caches, that is, how many levels of cache you want, is yet another.
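One way to hold this checklist in mind is as a record of parameters; the struct below is purely a mnemonic of the design space just listed, with names I have invented, not any tool's actual configuration.

```c
#include <stddef.h>

/* The knobs a cache designer gets to turn, as listed above.            */
typedef enum { MAP_DIRECT, MAP_FULLY_ASSOC, MAP_SET_ASSOC } mapping_t;
typedef enum { REPL_LRU, REPL_FIFO, REPL_RANDOM }           replace_t;
typedef enum { WRITE_THROUGH, WRITE_BACK }                  wpolicy_t;

typedef struct {
    size_t    block_size;     /* bytes moved per fill, e.g. 16 or 32    */
    size_t    cache_size;     /* total bytes: a multiple of block_size  */
    mapping_t mapping;        /* how memory blocks land in the cache    */
    replace_t replacement;    /* who gets evicted when space runs out   */
    wpolicy_t write_policy;   /* when a modified line reaches memory    */
    int       split_i_d;      /* 1: separate I and D caches, 0: unified */
    int       levels;         /* how many cache levels (L1, L2, ...)    */
} cache_design_t;
```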
So these are the different properties, or design elements: for each of them you can choose different values or algorithms when designing the cache. Let us see them one by one. What is the block size? The size of an element in a cache is called a block, or a line; it could be, say, 32 bytes for a 32-bit processor. Whenever a location in main memory is to be accessed, the entire block, the cache line, is filled from main memory: even if the processor wants to read only one byte in that line, the whole line is brought into the cache. Please remember it may be called a block or a line; I introduce both terms so that you do not get confused when reading different books. So whenever a byte is to be read or written, the whole block is brought from memory, and the cache maintains it for future reads and writes to those locations; it is never just the one byte that is copied. What about the cache size? The cache is kept small enough that the average cost of the whole system stays close to that of the main memory. You cannot keep a huge cache: power consumption and chip-area limitations become a problem, and the cost goes up; the more chip area, the higher the cost. At the same time we must make sure the average access time stays close to t_c, not t_m. Cache sizes normally range from 1 kilobyte to 512 kilobytes. High-performance servers and high-performance computing may have caches bigger than this, but in the typical embedded world the cache sizes fall in this range; and the range keeps widening because of advances on the VLSI side. Next, the number of blocks. Consider the main memory and the cache: if you have more blocks in the cache, the overall cache size increases. It lowers the miss rate, because with more blocks, more of what the program needs is likely to be in the cache. What is the disadvantage? A larger number of blocks is slower and more expensive. Why slower will become clear when I explain how cache locations are accessed; for now take it from me: with more blocks, the access is slower; not slower than t_m, but the t_c value itself becomes a bit higher. And of course it is more expensive, since you need more area. Now there is another option: keep the same main memory size but use huge blocks, so that the cache holds only a few large blocks. Whenever any byte is required from a particular block, the whole block is copied into the cache and then the byte is given to the processor. You can imagine: if a block is very large, then for reading one byte we will be doing a memory transfer of that whole large block into the cache before the CPU gets its byte.
So more data becomes available within the block; more code or data is at hand, no issue there. But it is wasted traffic if you access only a few bytes while fetching the whole block from memory; and when you modify even a single byte, the whole block has to be written back to main memory before it can be replaced with another block, so it takes more time. This is the design dilemma I am talking about. Fewer blocks in the cache also increases the miss rate: if only a few large blocks fit, say two blocks of data because the cache size is limited, and you become interested in some other block, you have to write back one of these and bring in the new one, so the miss rate goes up. And larger blocks mean more time to swap: naturally, even to bring in a new block you must first write back the old one, so replacing them is time consuming. So how many blocks to have in the cache, and what block size to use, is a design decision the processor designers have to take. Now we come to an interesting part of the discussion: mapping functions. Caches are categorized based on how main memory addresses map to cache locations. There are three high-level ways. Direct mapping: a block of main memory is always mapped onto one particular place in the cache; I will explain shortly, just take it from me now that this is a many-to-one mapping. Fully associative: any block of main memory can be mapped to any place in the cache. Let me explain with the library example: this is the rack, this is the table. Suppose the person decides that a few books of each subject go in fixed places: he keeps all the physics-related books in this slot, and the maths-related books, from which he is preparing for some exam, in that slot; assume our person is studying all of them together, with a smiling face. With this picture I can explain both fully associative and direct mapping. If it is many-to-one, direct mapping, then any physics book can be put in the physics slot, but a chemistry book cannot come there; say three books here, three books there, three books elsewhere. Now what happens if he wants to refer to another physics book and three physics books are already in the slot? He has to put one back and get the new book from the rack to keep here. That is direct mapping: all the physics books are mapped to one location, or a few fixed locations, in the cache. Say, for the sake of discussion, only one place per subject: any physics book has to come to that place, and if one more physics book is needed he has to throw this one out; not out of the window, the library will not allow that.
So, he keeps it back in the rack and then brings another physics book here. What happens is that many physics books are mapped to one location here; similarly, many chemistry books are mapped to one location in another slot. The same thing holds for every category.

Now what is fully associative? He thought this scheme was not good, so he decided he can keep anything anywhere. If he is fresh in the morning and wants to concentrate on maths, he can use all three places for maths books; and if he suddenly wants a physics book, one maths book goes back to the rack and the physics book takes its place. He has arranged it so that a book is not tied to any location. That is fully associative: any book to any place.

Set associative is in between. It is similar to what I told you about having more slots: say it is two-way set associative, so two slots per category. Any physics book can come into either of its two slots, any maths book into either of its two. That is what is called set associative.

Now, given an address, how do we find where it goes in the cache? In direct mapping the lookup is done by indexing: for any physics book, he can go straight to its designated location and search only there. In fully associative mapping, since he has kept any book anywhere, he has to scan the whole rack to see whether the particular physics or chemistry book is present; he cannot go to one particular location, so the search time increases. That full search is the price of fully associative mapping. And if he has said two books of physics will be here and two books of maths there, then he only has to search within those two slots: a limited search per category. That is set associative. To summarize: direct mapping is indexing; fully associative is a full search, because no limitation is placed on where a book goes; set associative is a limited search within each set.

With this idea in mind, let us go into the actual implementation for memory. Assume the main memory address is 16 bits, and let me write the split of those bits as 5, 7 and 4, starting from the low end. The lowest 4 bits address a byte within a block: 0000 to 1111, that is 16 values, so each block holds 16 bytes, both in main memory and in the cache. Now, I am saying there are 4096 blocks in main memory. If I give you 4096, how many bits does it occupy? 12 bits. Plus the 4 bits for the 16 bytes within each block: in total, 16 bits of address.
So, 16 bits of address correspond to how much memory? 64 K, we say; 65536 bytes, to be exact. And what is the cache size? Look at the cache: it has 128 blocks. Multiply 128 by 16 and you get the cache size: 2 kilobytes.

Now that you understand these numbers, let us look at the mapping function. View the main memory as a sequence of regions, each exactly the size of the cache: 2 KB each. With 64 KB of main memory and a 2 KB cache, how many such regions are there? 64 divided by 2: 32 regions, numbered 0 to 31.

Now I use a simple rule: each 2 KB region is mapped directly onto the cache. If I access the first block of region 0, it goes into the first cache line. If I access the first block of the next 2 KB region, it goes into the same cache line. If I access the first block of the last 2 KB region, it also maps there. What does this mean for the address bits? The 7 middle bits select one of the 128 cache lines. Check the arithmetic: to address the whole 2 KB cache you need 11 bits (128 blocks times 16 bytes is 2 KB, and 2 KB corresponds to 11 bits of address, which is 7 plus 4); since 4 bits select the byte within a block, selecting a particular line needs the remaining 7 bits. Very natural.

If the main memory were also only 2 KB, there would be an exact match: each block has exactly one cache line and we would not have to worry. But the moment I introduce a second 2 KB region, I have to record whether a line holds the block from region 0 or from region 1: one extra address bit. If it is 0, the block belongs to region 0; if it is 1, to region 1. Introduce more regions and I need more bits to record which region's block is sitting in the line. In our case the remaining 5 bits of the address serve exactly this purpose: they are the tag. If I store these upper 5 address bits alongside each block in the cache, then I can tell whether the block currently in a line came from, say, the region whose upper bits are all 1s or the region whose upper bits are all 0s. Whatever I explain will be hard to believe until you think it through with this picture in mind: how the address is split and how it is mapped; then you will understand easily. So: these 5 tag bits, kept with each line, tell us which region's block is stored there. That is what the tag bits do. Maybe I will change the colour; I do not like this black always.
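To make the 5 | 7 | 4 split concrete, here is a small C sketch, my own illustration using the lecture's numbers, that slices a 16-bit address into tag, line and byte-offset fields:

```c
#include <stdio.h>
#include <stdint.h>

/* Field split for the lecture's example:
   16-bit address = 5-bit tag | 7-bit line index | 4-bit byte offset. */
static void split(uint16_t addr) {
    unsigned offset =  addr        & 0xF;   /* bits 3..0   */
    unsigned line   = (addr >> 4)  & 0x7F;  /* bits 10..4  */
    unsigned tag    =  addr >> 11;          /* bits 15..11 */
    printf("addr 0x%04X -> tag %2u, line %3u, offset %2u\n",
           addr, tag, line, offset);
}

int main(void) {
    split(0x0000);  /* region 0: tag 0, line 0              */
    split(0x0800);  /* 2 KB higher: same line 0, but tag 1  */
    split(0xF800);  /* last 2 KB region: line 0, tag 31     */
    return 0;
}
```

All three addresses land on line 0; only the tag distinguishes which region's block is meant, which is exactly why the tag must be stored with the line.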
So, the 7 bits tell you which cache line to check. Once I go to that particular line or block, I get all 16 bytes, so I need not worry about the byte offset for now. I only take the 7 line bits, index into one of the 128 lines, and then look at the tag bits to know which region's block is currently copied there.

Let me take a simple example. Suppose the address is all zeros: it maps to line 0, and the tag value will be all five 0s, which I store with the line. Now take the next address that maps to line 0: it is one region (2 KB, or 128 blocks) higher, so the tag field changes while the line index and offset remain all 0s; this block also maps to line 0. What happens? I take the old block out and write the new one in its place. Assuming I have not modified the old block, I can simply overwrite it, bring the new block here, and keep it. What will the stored tag be now? It changes to the new region's tag, to indicate that this new block is in the cache. See, if we did not have the tag, there would be no way to tell which block is in the line. That is why the upper 5 bits must be saved somewhere: for each line a separate tag location is maintained.

So when an address is presented, what does the processor do? It uses the line index to reach a particular line, and then compares the tag field of the address against the tag stored at that line; not against all the tags, only the one line it is interested in. If the tags do not match, it goes to main memory: it writes back the old block if it has been modified, fetches the new block, keeps it there, and changes the tag value. That is the whole operation.

Now, I have not yet explained the V flag. V is the valid bit. Initially the cache has no meaningful content; please do not think some blocks of memory are always copied there. The cache is temporary storage, as I told you. When the person comes into the library, the rack is empty; nobody knows what this person is interested in today, so nobody does any prefetching. The person decides, based on his interest, what to go and bring. Similarly, only the running program knows which blocks it needs: in the typical case the cache is not preloaded with any values (there may be some special cases, but in a typical programming sense the cache is filled as and when needed, when the program accesses some location). It is at that time that the valid flag is set, to indicate that the content of the line is actually valid; you can then look at the tag to know exactly which memory block is copied there. I hope this is clear. A small software model of this lookup follows.
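This is a minimal software model of the direct-mapped lookup, assuming the same illustrative 128-line, 16-byte-block organization; the struct mirrors the valid bit, tag and data of one line, and the zero-initialized array models the cold (all-invalid) cache:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_LINES 128

/* One cache line: valid bit + 5-bit tag + 16 data bytes. */
typedef struct {
    bool    valid;
    uint8_t tag;          /* only 5 bits significant */
    uint8_t data[16];
} line_t;

static line_t cache[NUM_LINES];   /* all-zero: every valid bit clear */

/* Returns true on a hit; on a miss the hardware would write back the
   old block if dirty, fetch the new one, set valid, and update tag. */
static bool lookup(uint16_t addr) {
    unsigned line = (addr >> 4) & 0x7F;
    uint8_t  tag  =  addr >> 11;
    return cache[line].valid && cache[line].tag == tag;
}

int main(void) {
    printf("%d\n", lookup(0x0000));  /* 0: cold cache, nothing valid */
    cache[0].valid = true;           /* pretend block 0 was fetched  */
    cache[0].tag   = 0;
    printf("%d\n", lookup(0x0000));  /* 1: hit                       */
    printf("%d\n", lookup(0x0800));  /* 0: same line, tag differs    */
    return 0;
}
```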
Now let us see how this is implemented in hardware, with another pictorial view; the word field selects a byte within a block, this is the cache, and this is the main memory. The address is s + w bits wide: w bits select the word within a block, r bits select the line (r depends on the number of lines in the cache), and the tag is the remaining s − r bits. In our example w = 4, r = 7 and s − r = 5. I think I have already explained the flow: the hardware first uses the line field to reach a particular line, then compares the tag field of the address with the tag stored at that line. If it is a hit, the particular word is accessed using the word field and given to the processor. If it is a miss, the old block is written back into main memory if it was changed, the new block is brought into the cache from main memory, and then the word is given to the processor. That is the order in which it is done.

With all that introduction, the implementation should be easy to understand. Does the tag field have to be compared against the tags of all the cache lines? No: in direct mapping we compare against only one line, the one chosen by the index. So take that line's tag, compare it with the tag in the address; if it is a hit, deliver the requested word using the word index; otherwise bring the block from main memory. The word index then selects one of the words in the line to produce the data. This is the simple direct-mapped organization; I hope it is very clear.

To summarize direct mapping: every block in main memory is mapped to one particular line in the cache, a many-to-one mapping. Now consider what happens if two such blocks are referred to alternately by a program. I told you that block 0 of one region and block 0 of another region both go to the same line in the cache, so when one is there, the other cannot be. Suppose a program is in a loop, with a data array i in one of these blocks and an array j in the other: i is accessed in the loop, and immediately afterwards j is accessed. When j has to be brought in, i has to be written back; then the loop accesses i again, so j is written back and i is brought back. Within this loop the cache line is flushed on every access. This is called thrashing (a small numeric sketch follows at the end of this discussion), and it is an inherent limitation of direct mapping: a whole set of main memory blocks all map to the same cache line, so if the program needs several of them one after the other, thrashing will occur.

But direct mapping is easy to implement: only one comparator is needed, because you look at the tag of only one particular line. Once the cache line is chosen, only that tag is compared; it is simple logic. As for overhead bits: apart from the actual data, a tag is stored with every line, plus one valid bit. These are the additional bits required for implementing the cache.
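To see the thrashing numerically, here is a small sketch with two hypothetical addresses exactly 2 KB apart, so they share a cache line; tracking just that one line is enough, and every one of the alternating accesses misses:

```c
#include <stdint.h>
#include <stdio.h>
#include <stdbool.h>

int main(void) {
    uint16_t i_addr = 0x0040;            /* maps to line 4, tag 0 */
    uint16_t j_addr = 0x0040 + 0x0800;   /* same line 4, tag 1    */
    bool valid = false;
    unsigned tag = 0, misses = 0;

    for (int n = 0; n < 10; n++) {
        uint16_t a = (n % 2) ? j_addr : i_addr;  /* alternate i, j */
        unsigned t = a >> 11;
        if (!valid || tag != t) {        /* line empty or wrong tag */
            misses++;                    /* evict and refill        */
            tag = t;
            valid = true;
        }
    }
    printf("10 alternating accesses -> %u misses\n", misses);  /* 10 */
    return 0;
}
```

Every access evicts exactly the block the next access needs: a 100% miss rate inside the loop, which is the pathology set associative mapping will fix.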
Now, let us see fully associative mapping; I hope direct mapping is clear, because this one is actually simpler. We implement the same system, 64 KB of main memory and a 2 KB cache, but with a different mapping: any to any. Any block of main memory can sit in any line of the cache. What does that do to the address? Apart from the byte offset, which is still 4 bits, the remaining 12 bits together identify which block is in a line; there is no line-index field any more. That means we must keep all 12 bits in the tag field in this example. Earlier we saw the tag was 5 bits; now it has grown to 12 bits, and with 128 lines, that is 128 × 12 tag bits to store.

We also have another problem. When an address is presented, the 12-bit tag coming from the address must be compared against the tags of all the lines. You cannot go to one particular cache line, because we said any block can be anywhere; we never said a given block will sit in a given line. So all 128 stored tags must be compared against this 12-bit value to decide whether the requested block is in the cache or not, and then a decision is taken on whether to bring in the one that is missing. So a full comparison across the tag field is done; the tag field is the additional storage kept with each block.

There are advantages, though. Earlier, if I was using block 0 and block 128, they were mapped to the same location and one had to be forcibly evicted; here I can find another location and keep both, since any block can go anywhere. So the thrashing problem I described does not happen. One more advantage: the number of lines in the cache is not tied to any part of the address. "Not restricted" may be the wrong phrase; "not related" is better: once the block size is decided (4 bits of address for 16 bytes), all the remaining bits go to the tag, so the number of lines is not dictated by an index field and you can choose any number of lines.

The cost is that you have to compare many more entries. The tag from the address is taken and compared against every stored tag: that many comparisons. Any hardware engineer will tell you that this comparator logic is complex; it takes more time to resolve if not implemented carefully, and it consumes considerable chip area. So it really comes down to cost: you cannot arbitrarily increase the tag size.
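A software model of the fully associative lookup, under the same illustrative numbers, makes that cost visible: the hardware performs all 128 tag comparisons in parallel, one comparator per line, which a loop can only approximate sequentially.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_LINES 128

typedef struct {
    bool     valid;
    uint16_t tag;        /* full 12-bit block number */
    uint8_t  data[16];
} fa_line_t;

static fa_line_t cache[NUM_LINES];

/* Returns the line index on a hit, -1 on a miss.  In hardware all
   NUM_LINES comparisons happen at once; here we scan. */
static int lookup(uint16_t addr) {
    uint16_t tag = addr >> 4;            /* 12-bit block number */
    for (int i = 0; i < NUM_LINES; i++)
        if (cache[i].valid && cache[i].tag == tag)
            return i;
    return -1;
}

int main(void) {
    cache[5].valid = true;
    cache[5].tag   = 0x123;          /* pretend block 0x123 sits in line 5 */
    printf("%d\n", lookup(0x1230));  /* 5: hit, any line can hold it */
    printf("%d\n", lookup(0x4560));  /* -1: miss                     */
    return 0;
}
```

That loop stands in for 128 parallel comparators, which is exactly the hardware expense being discussed.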
So you also have to limit the number of lines, because that many comparisons need to be done. The advantage you get in return is that any block can go anywhere.

This style of access is called content addressable memory. Why? Normally we give an address and get back the content; here we give the content, the tag, to get back the location: only a tag match tells us which line holds the block for that address. Please remember the actual data is in the cache itself; only the address part is stored in the tag structure, and whenever a new address arrives we search through it to know whether it is a hit or a miss, and then the content is taken from the cache or fetched from main memory. And when a block has to be copied into the cache, it can be kept anywhere; choosing where is a separate replacement algorithm, which I will talk about in the next class.

To summarize fully associative mapping: any block of main memory can be mapped to any line; the tag value decides which block of main memory is in a particular line of the cache. It is content addressable memory, which is very expensive to implement and high in complexity: the number of comparators needed equals the total number of lines in the cache. In practice it works the other way round too: the number of lines you can afford is decided by how many comparators you can implement in the system. And since the mapping does not tie a block to any location, the thrashing of direct mapping is not a problem here. That is all on fully associative.

Now, the last topic for this class: set associative mapping. Here too the total cache size is the same 2 KB, but it is organized as two ways of 1 KB each. How do I know each way is 1 KB? Look at the blocks in one way: they run from 0 to 63, that is 64 blocks, which corresponds to 6 bits (32 needs 5 bits, 64 needs 6). So 6 bits plus the 4 bits for the 16-byte block: 64 blocks times 16 bytes gives 1 kilobyte per way, and the two ways together give the total of 2 KB.

Let me come back to what this means; if you understand this, we are done. Block 0 and block 64 are mapped to the same index, exactly as in direct mapping. Earlier, in the 2 KB direct-mapped cache, the conflicting block was block 128; now, with 1 KB ways, it is block 64, but the principle is the same. In a 1 KB direct-mapped cache, the problem I told you about was that either block 0 or block 64 could be in the cache, never both.
So there would be a possibility of thrashing if those two blocks were accessed alternately. To avoid that, suppose I introduce one more location and say the block can be either here or there: then the thrashing problem is gone. If I want block 0 and block 64 at the same time, block 0 can go in one way and block 64 in the other. Each way numbers its lines 0 to 63, so the numbering repeats: block 64 can sit in either way's line 0, and similarly block 65 can sit in either way's line 1. Now when two conflicting blocks are accessed simultaneously or one after the other, they can occupy the two places of the same set; there is one more place kept as an alternative, and the thrashing problem is resolved. That is the only difference here.

So set associative mapping does direct mapping to identify the particular set, and then fully associative mapping within the set: to know which of the set's candidate blocks is actually present, you look at the stored tags and compare, just as in fully associative. But the comparison is now limited to two locations: only two comparators are required, not one per cache line. In fully associative we needed as many comparators as there were cache lines; here we need only two. And if instead of two-way I make it four-way, there is yet more room, so four conflicting blocks can all be in the cache simultaneously.

So this is a combination of the two schemes: direct mapping, because we use the set index from the address to choose the set; and within the set, an associative search to resolve the final location. Once you identify a particular set, you check whether one of that set's tags matches. It does not matter which of the two ways a block ends up in: this block may come here and that block may go there; choosing between the two ways is independent of the address. That is two-way set associative.

What do we actually compare? Once the set is identified, only the two tags of that set are compared, and it may be implemented like this: the stored tags of the set come out, the tag from the address goes to both comparators, and at most one of them says "I have it"; then the corresponding data is delivered. The cache line is decided by the set index, and the byte within it by the word field. I hope this is clear: after identifying a particular set, we compare against its tags, similar to fully associative. (A small sketch of this two-way lookup follows.)

So set associative combines the benefits of direct and fully associative mapping. It is not as complex as fully associative: you need to choose only among two, four or eight ways; multiple directly-mapped blocks can be placed in one of the sets; and the thrashing problem goes away. It is less complex than fully associative: for an m-way set associative cache, the number of comparators needed is only m.
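Here is a minimal sketch of the two-way lookup with the lecture's numbers (64 sets, 6-bit set index, 6-bit tag); note how block 0 and block 64 now coexist in the same set:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_SETS 64
#define WAYS      2

typedef struct {
    bool    valid;
    uint8_t tag;          /* only 6 bits significant */
    uint8_t data[16];
} way_t;

static way_t cache[NUM_SETS][WAYS];

/* Direct mapping picks the set; an associative search over just
   WAYS entries (WAYS comparators in hardware) picks the way. */
static int lookup(uint16_t addr) {
    unsigned set = (addr >> 4) & 0x3F;   /* 6-bit set index */
    uint8_t  tag =  addr >> 10;          /* 6-bit tag       */
    for (int w = 0; w < WAYS; w++)
        if (cache[set][w].valid && cache[set][w].tag == tag)
            return w;                    /* hit in way w */
    return -1;                           /* miss */
}

int main(void) {
    /* Blocks 0 and 64 both index set 0, but can now coexist. */
    cache[0][0] = (way_t){ .valid = true, .tag = 0 };  /* block 0  */
    cache[0][1] = (way_t){ .valid = true, .tag = 1 };  /* block 64 */
    printf("%d %d\n", lookup(0x0000), lookup(0x0400)); /* 0 1: both hit */
    return 0;
}
```

The alternating-access loop that thrashed under direct mapping now hits on every iteration, because each of the two conflicting blocks keeps its own way.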
So, instead of describing the comparator by how many bits wide it is, we describe the cost by the number of comparators: m comparators are required for an m-way cache, so a two-way cache needs two comparators, as the diagram in the previous slide showed. With this we have come to the end of the class; it has run a little over time. I hope this is clear to you. Make sure you also read the books to understand this: every book on computer organization and architecture has an explanation of these mappings. Understand it, then take an example and work out how the address is split into block index, tag and word fields, how the tags are stored, and how many comparators are needed; do some exercises to make sure you understand this fully. Thank you very much for your attention and time; I enjoyed sharing this with you. See you in the next class. Have a nice day, bye bye.