So, in this discussion we will cover what is left of cache memory organization, the design elements we have not touched upon in detail so far, and then we will see how the cache is controlled inside an ARM processor. I mentioned that some design elements need to be covered. What are they? Suppose we are designing a cache: what are all the parameters we have to decide to arrive at the final design? There are many possible input configurations; based on them we design the cache and get a final design, which we then integrate with the processor, in our case the ARM processor, and with the bus. Later this will also be connected to the memory controller, and the memory itself may be inside or outside the chip. So there are many ways to design a cache; we will cover the design elements in this class, and I will also tell you how we control the cache, which I will return to when I come to the co-processor. That will show you how the cache is controlled inside a processor. With this we will complete the cache-related discussion, and from the next lecture onwards we will continue with virtual memory management.

So, what is block replacement? If you all remember the basic cache design, let me give you a quick reminder. Here is the main memory, here is the cache, and the processor, our ARM processor, is here. Assume the cache holds 16 words, just for discussion's sake, where one word is 4 bytes. You know that a cache stores data in blocks; assume each block is of size 4 bytes, which I write as capital B so you know it is bytes and not bits. That is, we consider one 4-byte word as one block. It is not practical to have a cache of only 16 words, but it keeps the discussion simple. So the cache has blocks 0 to 15. Now assume the main memory consists of 4 such groups of 16 words: 16, 32, 48, 64, so 64 words in total, that is 64 x 4 bytes of main memory; only this much is there. Each group is accessed in terms of words, and each group, let me shade it in a colour, is exactly the size of the cache; the whole group is equivalent to the cache size. So we have a main memory which is 4 times bigger than the cache; naturally a real main memory will be far bigger. Now we want to see how main memory blocks can be mapped to the cache. Just to refresh what we did in the last session, let me tell you how this is done. First, direct mapping. In our example a word is a block, and the first block
of each memory section is mapped to the first block of the cache; similarly, the first block of every other section is mapped to that same location. See, once you have the main memory and the cache, you divide the main memory into sections of the same size as the cache: if the cache is 16 words, you divide the memory into 16-word sections. So first of all, the main memory size should be a multiple of the cache size. Once you do that, each section has the same number of blocks as the cache. Direct mapping then simply maps every block to the same relative location: the first block of any section goes to the first block of the cache, the second block of a section goes to the second block of the cache, and so on.

Now, about the replacement policy: why do we need replacement at all? Because the cache memory is limited, and when we want to bring a new block from memory, we have to find a place for it. In direct mapping, the location in the cache where each main-memory block must go is already fixed: if it lies in the first position of its section, it comes to the first cache block; if it is in the second position, it comes to the second block. So it is a one-to-one mapping within a section, but multiple sections map onto the same cache. How do you decide the replacement, then? Suppose you want to bring in a block which happens to be the third block of its section: do you have any freedom to choose any cache block other than the third one? No, because the mapping is done in hardware. Please remember, it is not something software-programmable at run time; once the mapping is decided, it is fixed for that execution. So you have no choice other than evicting whatever block is already sitting in that location. Of course, if the very same block were already in the cache, you would count it as a hit and never come to main memory; but when there is a miss, it means some other block, from one of the other sections, is occupying that cache location. We have to evict it, write it back to main memory if necessary, and bring the new block into that location. So the location is predetermined by which block you want to bring into the cache. That is the case for direct mapping.
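To make the arithmetic concrete, here is a minimal C sketch, my own illustration rather than anything from the lecture, of how a direct-mapped cache derives the fixed index and the tag from a block number, using the 16-block cache and 64-block (4-section) memory of the example; all names are mine:

    #include <stdio.h>

    #define CACHE_BLOCKS 16   /* cache size in blocks (from the example) */
    #define MEM_BLOCKS   64   /* main memory in blocks (4 sections)      */

    /* In direct mapping the cache index is fixed by the block number:
     * index = block % CACHE_BLOCKS, and the tag records which of the
     * 4 sections the block came from: tag = block / CACHE_BLOCKS.      */
    int main(void) {
        for (unsigned block = 0; block < MEM_BLOCKS; block += 17) {
            unsigned index = block % CACHE_BLOCKS;  /* only possible slot */
            unsigned tag   = block / CACHE_BLOCKS;  /* section number     */
            printf("memory block %2u -> cache index %2u, tag %u\n",
                   block, index, tag);
        }
        return 0;
    }

The point to notice is that the index is a pure function of the block number, which is exactly why there is no freedom at replacement time.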
Now, in the same picture, let me explain the other mappings. Suppose instead the cache is fully associative. That means any block in the main memory can occupy any place in the cache; assume the design is otherwise the same, 16 blocks in the cache and 64 blocks in the main memory. The cache is organized in such a way that we can bring any block of main memory to any location in the cache: you can pick any cache block, evict it, and bring the new block there; a memory block is not tied to a particular location. That is fully associative. In this case we maintain the remaining address bits of the main-memory block as a tag, and to locate which main-memory block is sitting in which cache location, we search through the tags and find out.

What about set associative? It is another method of mapping the cache. Assume I add one more array of the same 16 blocks, 0 to 15; that makes it 2-way set associative. If it were 4-way, I would have added 2 more arrays, making 4 in total, which in this toy example would be equal to the whole main memory; so there is no point in 4-way associativity for a main memory this small. With 2-way, the cache is effectively half the size of this main memory. Now, take the first block of some section: it can sit in either of the 2 first-block locations, one in each way. Suppose we are trying to bring in such a block, but two other blocks are already occupying those two locations. Then, after arriving at that particular set, we have to decide which of the two to evict; choosing between the 2 blocks is the replacement policy. So in set-associative mapping we have a restricted set of blocks to consider, but it is better than direct mapping, where exactly one block is identified and must be evicted; here, whichever of the two is being used more often we can retain, and the other one we evict. So the replacement algorithm changes with the mapping; that is what I am trying to explain. I hope this is clear; it is a recap of what we discussed in the last class. Let us go forward.

So: no alternatives in the case of direct mapping. I told you, the block being brought in goes to exactly one location in the cache, so whatever already occupies that location must be thrown out, in other words written back to main memory if needed. In 2-way set associative, the option is to choose one of the two entries: the incoming block can go into either way of its set, so we have an option.
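To show exactly where that option appears, here is a hedged C sketch, with illustrative names of my own, of a 2-way set-associative lookup: the set index is still forced by the block number, but the way within the set is searched, and on a miss either way may be chosen as the victim:

    #include <stdbool.h>

    #define NUM_SETS 16   /* sets, matching the 16-block example */
    #define NUM_WAYS 2    /* 2-way set associative               */

    struct line { bool valid; unsigned tag; };
    struct line cache[NUM_SETS][NUM_WAYS];

    /* Look up a memory block: the set is forced, the way is searched. */
    int lookup(unsigned block) {
        unsigned set = block % NUM_SETS;   /* direct-mapped part      */
        unsigned tag = block / NUM_SETS;   /* identifies the section  */
        for (int w = 0; w < NUM_WAYS; w++)
            if (cache[set][w].valid && cache[set][w].tag == tag)
                return w;                  /* hit in this way         */
        return -1;                         /* miss: pick a victim among
                                              the NUM_WAYS lines of set */
    }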
So what is the advantage of having an option? Let me explain. When you have an option, you can place the block coming from main memory into one of the ways of the set; in 2-way set associative I can choose either this one or that one. The advantage is this: we can add some attributes to each block in the cache to keep track of the accesses. How do you decide whether a block deserves to stay in the cache or not? It is based on the usage pattern. Who decides the usage pattern? The program being executed by the processor. Whether it is data or instructions does not matter; whether there is a loop, it depends on the control flow of the program, and based on what is being executed, some blocks in the cache will get more hits than others.

What do I mean by a block in the cache here? When you have a 2-way set-associative cache, two different parts of main memory can reside in the same set. Assume one block maps here, so it sits in one way, and another block, let me choose a different colour, sits in the other way. It could be data or instructions; for simplicity, for our discussion's sake, let us assume these are data blocks d1 and d2. Based on the program sequence, either d1 may be accessed more often or d2; we do not know in advance. Say d2 is used more often: in a program for-loop there is a variable i which has been mapped to some register r2, and the memory copy of i happens to be in this block, which has come into the cache. The value of i is loaded into register r2, and then we do an increment of r2, maybe a move of r1 to r2, an add to r2; many instructions all use the same r2, which holds the variable i of the for-loop. So the ARM processor keeps touching that value: whenever i is written, it is written into the cache; whenever it is to be read, it is read from the cache; so many reads and writes hit that block. Then at some point the program moves on and stops using it.
Now another variable comes along which happens to map to the same set. Let me change the colour: this value has to map to this set, because up to the set index the mapping is direct; after that, it can go into either way. Now suppose the cache controller maintains a counter along with every block, a hardware counter: every time that particular block is accessed, the counter is incremented. Similarly there is a counter for the other entry, which is also incremented on its accesses. The counter is reset to 0 whenever a new block comes into the cache, and it starts incrementing from there until a new block arrives to occupy the location. So now one block may have a count of, say, 10, while the other has only 3. When the new block has to be brought in, it is very easy to choose: the heavily used block should not be removed, so we pick the other one; we decide that this block is to be evicted. Evicting it may involve some more work: if it is data, it may have been modified, in which case the value must be written back to its location in main memory; then the new value comes into this place, and the frequently used block is not disturbed. The count of the new entry becomes 0, because a new entry has come, and whenever it is accessed it starts incrementing. This is what happens inside a processor. So when you read about a policy like this, you should go back to the program level, see how it relates to the register level, and how it impacts the accesses to the cache; if you have all this at the back of your mind, you will have a much better understanding.

So in 2-way set associative, the option is to choose one of the two entries. In fully associative, I told you, any block in the cache can be picked; a main-memory block is not mapped to one location, it can go anywhere. In that case you can do the same thing, but with a wider choice: in 2-way set associative you have a very narrow option, this way or that way, whereas in fully associative, if every block maintains such a counter, you can choose the block with the least count, throw that block back to main memory, and bring the new block into that location. You have a wider selection, and we always want a wider selection, in life as well as in our hardware. That is one advantage of associative mapping.
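Here is a minimal C sketch of the per-block hardware counter just described, effectively a least-used choice between the two ways of a set; the structure and names are my assumptions, and a real controller keeps all of this in dedicated logic, not in software:

    struct line {
        unsigned tag;
        unsigned count;   /* access counter; reset when the line is refilled */
        _Bool    dirty;   /* set on a write; forces a write-back on eviction */
    };

    /* On a hit in either way of the set, bump that way's counter. */
    void on_hit(struct line *set, int way) {
        set[way].count++;
    }

    /* On a miss, pick the way with the smaller count (the less-used one).
     * A real controller would also write the victim back if dirty;
     * write_back() would be that hypothetical step, so it stays a comment. */
    int choose_victim(struct line *set) {
        int victim = (set[0].count <= set[1].count) ? 0 : 1;
        /* if (set[victim].dirty) write_back(&set[victim]); -- hypothetical */
        set[victim].count = 0;   /* the incoming block starts counting at 0 */
        return victim;
    }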
Now, what are the different replacement algorithms? There are several algorithms which can be implemented in hardware. Please remember: it is not that an exception occurs, a software handler comes in, and the handler executes the algorithm as a program. It is not software. We always associate the word algorithm with something tied to software, but here the algorithm is implemented in hardware; we cannot have a cache replacement policy in software. We are bringing in the cache only to make things faster; if we said we want to execute the algorithm in software, come back, and then decide "ok, I am choosing this block, please evict it", by that time so many instructions would have gone by, and executing that handler would itself involve the cache. Please remember, even executing such a handler needs the cache. So these things cannot be implemented in software; it is all in hardware. I am stressing this because knowing why it is not possible is very important.

So, what are the algorithms? A simple one is FIFO. Suppose, for discussion's sake, the cache is fully associative. Please remember, the algorithm has to work differently for different mappings: a cache is designed with multiple parameters, one of them is how it is mapped, and only after choosing the mapping do we look at a replacement algorithm within it. The set of entries we examine to find the most suitable block to remove depends on the mapping; with fully associative, every block in the cache is a candidate for replacement. In that case, how do we decide? We can choose FIFO, first in, first out: the block which came first in terms of time, among the blocks currently in the cache, is evicted. That means we have to keep some count of how many clock cycles have elapsed, or some coarser measure; it depends on how many bits we maintain for each block. So some hardware is maintained to keep track of when a particular block came in; it need not be absolute time, it can be a relative measurement. The block that arrived earliest is the one evicted.

What is the logic behind it? Temporal locality. Suppose some data entry was used at time 0, and 500 nanoseconds have elapsed; many instructions have been executed since. Based on temporal locality, we tend to believe that data or instructions used most recently will be used in the near future too, and conversely, something used long ago may not be used again soon. So if we evict the block which was brought into the cache earliest on the time scale, we are not likely to encounter a miss looking for that block again immediately. This is only an assumption; no agreement has been signed that this block will never be used again in the near future, nobody gives you any assurance. It is all probability: most probably it will not be used, compared to a block used very recently. If I follow this policy, I am likely to gain. Of course, you may choose a block by this logic and it may be needed immediately after; then you have to bring that block back into the cache again, and that is a waste of time. So we are trying to optimize based on the typical behaviour of the software.
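As a hedged illustration of FIFO replacement for a small fully associative cache, here is a C sketch in which each line records a relative arrival stamp and the oldest arrival is the victim; the names and the stamp mechanism are assumptions of mine:

    #define BLOCKS 16

    struct line {
        unsigned tag;
        unsigned arrived;   /* relative fill time from a running counter */
    };
    static struct line cache[BLOCKS];
    static unsigned now;    /* bumped on every fill; wraps ignored here  */

    /* FIFO: evict the line that was *filled* earliest, regardless of
     * how recently it was accessed afterwards. */
    int fifo_victim(void) {
        int victim = 0;
        for (int i = 1; i < BLOCKS; i++)
            if (cache[i].arrived < cache[victim].arrived)
                victim = i;
        return victim;
    }

    void fill(unsigned tag) {
        int v = fifo_victim();
        cache[v].tag = tag;
        cache[v].arrived = ++now;   /* stamp the arrival, not the accesses */
    }

Note the stamp is written only on a fill, never on a hit, which is precisely what makes this FIFO rather than LRU.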
That is one approach. Now, what is random? Do not assume any pattern; just pick one of the blocks at random. You may have seen software libraries that generate random numbers; random numbers are used in so many algorithms in the software world, for encryption, for key generation, to decide where to store things. In hardware, randomness is also derived from some physical source: maybe temperature variation in the chip, or fluctuation in the clock; various hardware mechanisms are used to capture a random phenomenon and generate a value, maybe 32 bits, or more bits to be safer, maybe 128 bits maintained inside the hardware, generating a random pattern. If you have such a source, you can use it to pick which block to evict; I am still talking about fully associative, so you have the freedom to choose any of the blocks, and we choose one of them at random and evict it. That is one more way of deciding the block to evict.

Another one is least recently used. You maintain something similar to the counter I described, but it is time-sensitive: instead of how many times a block was used, we track when it was last used. Suppose I maintain a timestamp of when each block in the cache was last accessed, and keep updating it; then the blocks are ordered by usage recency, and we choose the one which was not used for the longest time. So the cache controller tracks references to all the blocks and updates these counters or timestamps in turn. And note: if there is a hit, we do not even run the replacement algorithm; it is required only when there is a miss, when a particular piece of data is not in the cache and a block needs to be brought in from main memory, and we must choose one of the resident blocks to be evicted. I hope the whole sequence is understood by all of you.
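For contrast with the FIFO sketch above, a minimal LRU variant, again with illustrative names of mine: the stamp is updated on every access, not only on a fill, so the victim is the line touched longest ago:

    #define BLOCKS 16

    struct line { unsigned tag; unsigned last_used; };
    static struct line cache[BLOCKS];
    static unsigned now;

    void on_access(int i) { cache[i].last_used = ++now; }  /* hit path */

    /* LRU: evict the line whose *last access* is oldest. */
    int lru_victim(void) {
        int victim = 0;
        for (int i = 1; i < BLOCKS; i++)
            if (cache[i].last_used < cache[victim].last_used)
                victim = i;
        return victim;
    }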
Now we are done with replacement; next is the write policy. I intentionally did not talk about this earlier, because I knew I would be covering it in this session. See, we were maintaining a valid bit and a dirty bit for each block. Please remember, all these attributes, the counter, the tag, anything maintained for the cache entries, are kept at the block level; a block could be 32 bytes or 64 bytes, whatever the size, these are per block, because we bring in a whole block whenever we fetch into the cache, and we replace a whole block whenever we write back.

So, what is the write policy? For our understanding, let us assume we are talking about a data cache. The processor accesses a cache element which has already come into the cache: it may do two things, read an entry from the cache, or write a particular element into the cache. Now, when we are writing into the cache, what happens? The data element, which could be a byte or a word, some part of the block, is written; and the corresponding location exists in the main memory, agreed? Inclusion, right? Because whatever is here was brought from main memory into the cache, and now the processor is accessing it from the cache and writing into this cached copy of that location.

Suppose, before the data element was brought into the cache, the value in that memory location was 10; let me change the colour again, the value was 10 here. The 10 was read by the processor; how did it come? The processor executed a load, say LDR R1, [R2]; the address that R2 was pointing to was here, so we brought the whole block from main memory into the cache, like Hanuman carrying the whole mountain, though it wanted only a word. So the 10 also came into the cache, and then the 10 moved into R1; now 10 is sitting in R1. Next, assume the processor executes an add, say ADD R1, R1, R4; assume R4 inside the processor holds the value 5. We add R4 to R1 and write the result into R1: 10 plus 5, so after this execution R1 holds 15. Now assume there is another instruction, STR R1, [R2]; assume R2 has not been modified, so it still points to the same location, and we are storing the value in R1 to the location pointed to by R2. Do you think this write will go to main memory? It will not, because the location is already in the cache. What actually happens is that the cached copy of that location gets written with the value 15; it does not do a memory write, and indeed, otherwise what would be the use of having a cache? So 15 is written into the cache as the result of the STR. As far as the programmer is concerned, we have saved it to memory; but it has not actually gone to memory yet, it is in the cache.

Now we have an option; who has the option? The processor, that is, the cache design: whether to write this to main memory immediately or not. There are several write policies; I am going to explain three, the popularly implemented ones, though more recent designs may have others. One choice is to say: I will not write this to memory until this block is going to be replaced, because some other block is about to drop into the cache; meanwhile the dirty bit will be set. Please remember what the dirty bit means: this cached copy has become dirty, that is, what we brought from main memory has been modified and is no longer the same. Now, when we are evicting this block, evicting meaning we have chosen it to make room for a new block, and its dirty bit is set, then to maintain the consistency the programmer intended, the store was supposed to reach main memory, so we must honour the programmer's intent: the processor should write this value back into that memory location before bringing in new contents from another location. The cache is only temporary
storage, so it has to be written back, so that the 15 reaches main memory. Delaying the write like this is one policy; but I may have another policy that says: I will not delay, I will write immediately. You may wonder, what then is the use of the cache? For reads I still get the benefit, because I read from the cache, and normally programs read more often than they write; that is the typical control flow. So we get the benefit of the cache on reads, but whenever we write into the cache, it is also written into the main memory; that is one way of deciding. Another way: I wait until the block needs to be evicted, keep writing into the cache, do not bother to update main memory, and write it back only when I need to remove the block from the cache; that is delayed writing. And a third way is the write buffer, which I will talk about shortly. These are the design choices; I hope you understood the background, so the write policy discussion will be clear, and we can proceed.

So: when new data is written into the cache, there are 3 options to choose from regarding writing the same into main memory. Now, when we say data is written into the cache, is it written into the cache from main memory, or from the processor? Good question, right, which write are we talking about? It is the latter: the processor writing into the cache. So the block has become dirty, the dirty bit is set, and we have a choice of write through, write back, or write buffer. These are just names for now; do not worry about what they mean, I will explain each of them.

Before that, one more question: does an instruction cache need to implement this feature? Will there ever be a need for the program to write into instruction memory? What I am asking is: you have a cache which happens to be the instruction cache; you bring instructions from main memory, fine, but does the program ever write into the instruction space? That would mean code writing into a code area. There are some possibilities. One is a memory-to-memory transfer, or the program generating new content to be written into program memory: there are self-configurable processors, where the processor configures itself by changing its own program. Another example I can give you is the interrupt vector table. You remember the vector table occupies locations starting from address 0; what sits there are branches to handlers: the reset handler, and handlers for data abort, prefetch abort, and so on. Somebody has to write into those locations for the processor to fetch from them when an exception happens, and they happen to be in the instruction space of memory, so that part of the address range maps to the instruction cache. So we may have to write into those locations too, typically during board bring-up, for the initial
reset vectors. So yes, very occasionally; one example is writing the interrupt vector table. That was the question I was posing, whether the instruction cache also needs to implement a write policy; but mostly write policies are meant for the data cache.

Please remember, the policies we decide need not be the same across the caches of one processor. I may have an instruction cache and a data cache, and decide that the instruction cache has no write policy, or follows write through, while the data cache follows write back; we can decide different policies. I may have a block size of 32 bytes in one and 64 bytes in the data cache. So we can treat the instruction and data caches differently if it happens to be a split cache, that is, separate caches for instructions and data; but if it is a unified cache, the policies will be uniform across the cache. Keep these things in mind; I mention them whenever I get the opportunity.

Now let us see what the three policies mean. Write through means: always write to memory and cache simultaneously. When the processor writes into the cache, it also writes into main memory at the same time. Assuming writes are infrequent, we still gain something from having the cache; but with write through, every write is immediately reflected in main memory too. Writing into memory is many times slower than writing into the cache, so if there are frequent writes, performance suffers because of the increased bus cycles; that is very natural. If writes are infrequent, there is a net gain. I hope the write-through policy is clear to you.

The next one is write back, or copy back; some books refer to it as copy back, both are the same. Write only into the cache, and set the dirty bit for the block where the write was performed. We write only into the cache, not into the main memory, so for a while the value here will differ from the location it actually corresponds to, the place it was brought from in main memory. The processor does not write into main memory until a block whose dirty bit is set has to be replaced: when we are bringing in another block that maps to the same location in the cache, and the victim's dirty bit is set, meaning it was modified, then we have to write it back into main memory. So it is a delayed write. This is efficient when there are frequent writes. Frequent writes means the program running on the processor is storing into data memory often: please remember, this is not about modifying registers, it is an STR or STM being executed often. With write back, those stores write into the cache only; they do not go to main memory, so we save a lot of time, the memory is not accessed at all for them. The cached data will differ from what is in main memory, but when the block is to be evicted, at that time we take care of updating the memory.
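To make the two basic choices concrete, here is a small hedged C sketch of a store that hits in the cache under write through versus write back; the write-buffer case is sketched separately after it is introduced below, and the function names here are illustrative placeholders:

    #include <stdio.h>

    enum policy { WRITE_THROUGH, WRITE_BACK };

    struct line { unsigned tag; _Bool valid, dirty; unsigned data[8]; };

    /* Stub standing in for a real bus write (a memory cycle). */
    void mem_write(unsigned addr, unsigned value) {
        printf("bus: mem[%#x] <- %u\n", addr, value);
    }

    /* A store that hits in the cache, under the two basic policies. */
    void store_hit(struct line *ln, unsigned addr, unsigned off,
                   unsigned value, enum policy p) {
        ln->data[off] = value;        /* cache copy is always updated    */
        if (p == WRITE_THROUGH)
            mem_write(addr, value);   /* memory updated on every store   */
        else
            ln->dirty = 1;            /* write back: memory updated only
                                         when this line is later evicted */
    }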
But do you see any issue here? Let me give you an example where we will have a problem; it arises only when multiple processors exist in the system. This is one processor; assume there is another processor in the same system, in the same SoC. When we have multiple processors, whether they share a cache or have separate caches is again a design decision; even with separate caches the setup is common, so assume in our case there is P1 with cache C1, and P2 with cache C2, two processors, but they are working on the same main memory, the same address space. Now there is a possibility that a data element from this main memory is residing in cache C1 and has been updated there by P1, while P2 also wants the same location, the same memory content, and wants to read it. P1 has already updated its cached copy, so the value in main memory no longer reflects the current state of the data, but P2 wants it now. So there is a coherency problem; this is what is called the cache coherency problem: two different caches for different processors working on the same memory space, a memory content has been taken into one cache and updated locally, and when the other processor wants the same content, it gets stale data; this is called stale data, the data in the main memory is no longer valid. This is a separate topic, I would need another hour to explain it; there are cache coherency protocols, MSI and others, but that is not part of the discussion in our class, so we will not talk about it.

What I am saying is: whenever multiple processors access shared data like this, the write-back policy is a bad choice on its own; if we choose it, we need to implement some cache coherency protocol to handle the problem. Otherwise, write through is the better option, in the sense that the write is immediately reflected in main memory: if the other processor wants the data, it is already updated, so it can get the latest data from main memory. So write through by itself does not need a cache coherency policy; with write back we need a coherency protocol. We will not bother further with that; just understand that there is a problem, and that cache coherence must be taken care of when multiple processors share data. Please remember, if they do not share, there is no issue: if P1 and P2 work on different locations in the main memory and share no data at all, each restricting its accesses to its own region of physical memory, then the situation of P1 updating something P2 uses will not arise. I hope this is clear to you.

Now, what is the write buffer? I mentioned it when we discussed logical versus physical caches; let us see. Writing into the cache also writes into the write buffer. Let me draw a diagram and colour it: this is the processor, this is the cache, this is the main memory, labelled so that you do not get confused about what is what; take the cache to be a data cache. And here, in a different colour, is the write buffer. The processor and cache are inside
the chip; all of this is inside the SoC, and that boundary matters. Now, whenever data is written into a block, please remember a block contains N words, whereas the write buffer can hold just a few words; its entries need not be block-sized. Let us assume the width of each write-buffer entry is one word, and that the buffer is a FIFO: whatever is written enters at one end, moves along, and goes out to main memory at the other end. Say up to this point it is filled and the rest is empty; maybe 4 entries are occupied and 12 are free.

So the write buffer is an intermediate hardware buffer: whenever the processor does a write to a particular location, maybe a word, it writes into both places, the cache as well as the write buffer, and then the processor continues with its execution. Now, for what was written into the write buffer to be usable, we must maintain both the data and the address of the write; both have to be there. Because we might write to a location at address A1, then some data to A2, then to A3: the corresponding data values D1, D2, D3 must eventually go to those places in memory, and meanwhile they are reflected in 3 locations in the cache. Unless we maintain A1, A2, A3 along with the data in the write buffer, the buffer cannot go and write into the proper locations; it has to know which location each entry belongs to, because the write by the processor happens at one time, t1, and the flush to main memory may happen later, at t2; the write buffer drains at a different point in time. So it must keep the address of what was written as well.

Once an entry is written, the write buffer can steal time whenever a memory cycle is free: in the background it keeps copying these values out to main memory. That is what happens in a real processor, and our ARM processor also supports this. The depth, how many entries the write buffer holds, is design dependent, implementation dependent, but the processor supports the mechanism. I hope this is clear; I am giving all these examples so that you know exactly what is happening. So: the data comes from the processor, is written into the data cache, and the dirty bit may also be set, because it has to indicate the block was modified; and since the write policy chosen is the write-buffer policy, the data is also written into the write buffer.

Now let me describe one scenario where the write buffer gets no memory cycle at all to drain these values into main memory. When will that happen? When instruction fetches are coming over the same bus, or other memory traffic is going on, or a co-processor in the system is accessing main memory often, or the DMA is doing something; there are many external factors which affect how the memory is shared among the different agents in the system (and please remember the co-processor is inside the SoC). So the write buffer gets no time to write to main memory, and the buffer fills up completely. The picture looks clumsy, so let me redraw it: the write
buffer is filled with many values, each a data value paired with an address value, up to D15 and A15, in FIFO order; entries are supposed to go out to main memory from one end while the processor writes in at the other. Now it is completely full. The write policy for the processor says: whenever you do any write into the cache, make sure you write into the write buffer also. So the processor does a write; it gets written into the data cache, comes to the write buffer, and finds the buffer full. This write would be due to some STR operation being executed by the processor, and that STR is sitting in the execute stage of the pipeline; please remember, our old friend the pipeline is still there. Now the processor must decide whether it can retire this instruction from the execute stage and move the instruction in the decode stage forward. Since the write buffer is full, it cannot: the pipeline stalls; if this policy is followed, that is what must happen. The instruction waits, and the processor no longer issues anything behind it.

But there is a side effect. When the pipeline stalls, there is no prefetching of instructions, correct? If prefetching does not happen, instruction accesses to main memory are not happening. In that case the write buffer will get some cycles to write into main memory. These are the chain reactions you have to keep at the back of your mind: as a system designer, a programmer, or a hardware designer, you should know what follows from what. If the pipeline stalls, the side effect is that no prefetch happens; if no prefetch happens, no instruction reads go to main memory; memory is not busy, so the write buffer gets memory cycles and flushes an entry out. That creates one free slot, so the value waiting at the input moves into the last location, the FIFO shifts along, and the stalled instruction can finally move out of the execute stage: once its write is in the write buffer, the processor treats it as written to main memory and takes up the next instruction. So this is what happens: write into the cache, write into the write buffer, and the memory controller takes care of moving entries from the buffer into main memory later in time, whenever it gets a free cycle.

Write-buffer efficiency depends on the ratio of memory writes to the total number of instructions executed, that is, on how often STR-type instructions occur. If it is, say, one out of 10 or 20 instructions, there are not too many stores in the stream; in that case you get the benefit, because the processor does not write into the write buffer so frequently, and the buffer is likely to find free memory cycles to flush itself. So with this policy the processor gains as long as the store traffic stays moderate.
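Here is a hedged C sketch of the address-plus-data FIFO just described: the processor side refuses the entry when the buffer is full, which models the pipeline stall, and the drain side empties one entry towards memory whenever a free bus cycle is available. The depth and the names are illustrative:

    #include <stdbool.h>

    #define WB_DEPTH 16

    struct wb_entry { unsigned addr; unsigned data; };

    static struct wb_entry wb[WB_DEPTH];
    static int head, tail, count;          /* classic ring-buffer FIFO */

    /* Processor side: returns false when full -> models a pipeline stall. */
    bool wb_push(unsigned addr, unsigned data) {
        if (count == WB_DEPTH)
            return false;                  /* full: the STR must wait   */
        wb[tail] = (struct wb_entry){ addr, data };
        tail = (tail + 1) % WB_DEPTH;
        count++;
        return true;
    }

    /* Drain side: called when a free memory cycle is available. */
    bool wb_drain(void (*mem_write)(unsigned, unsigned)) {
        if (count == 0)
            return false;                  /* nothing pending           */
        mem_write(wb[head].addr, wb[head].data);
        head = (head + 1) % WB_DEPTH;
        count--;
        return true;
    }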
Next point: the data written into the write buffer is not available for reading until it has left the write buffer. What do I mean by that? See, there is a possibility that while a particular data item is sitting in the write buffer, its cache line has already been recycled. You have to keep in mind that the replacement policy is a background process, another thing happening independently. So it is possible that a particular cache entry, after its value was written into the write buffer, was evicted because of the replacement policy: some other data was supposed to come, and this line was chosen. Because the modified value is already safely in the write buffer, we do not even have to worry about the dirty bit at eviction; we can evict the line and bring new data into that location, the processor reasoning "it will anyway reach main memory within a few cycles, so I need not keep it in the cache; I can flush it out and put a new value here."

Now, what if the processor issues a read to that same location before the buffered value has reached main memory? The most recent value is lying in the write buffer, waiting to be written; it is no longer in the cache, and main memory still holds the stale value. We cannot simply fetch the block from main memory, because the latest data is in the buffer, not in memory; the read must wait until the buffer entry has been flushed out, and only then can the fresh value be brought from main memory into the cache, maybe into some other location, and the load proceed, with the data going into the register. So the data is not available for read until it exits the write buffer: the processor keeps track, and if there is a pending entry for that address in the write buffer, it does not allow the read to happen; the instruction is blocked till the flush, and then execution proceeds. I hope I am explaining at this length so that you understand the logic behind it.

One more thing to remember: multiple writes to the same location will leave the last data written in the location, and the other writes are lost. What do I mean? Suppose there is an entry in the write buffer, say data D3 with address A3, which has gone into the buffer but has still not been written into the main memory, and one more write to that same location happens; the processor is writing to the same address again. Now, for the same location, new data has arrived, so what does the processor do? It writes the new data into that same write-buffer entry; the entry is anyway in the queue waiting to be written to main memory, so it just updates it. Any intermediate writes are reflected in the write buffer, but they are never individually seen in main memory; only the last write is. That is what the statement means: in the main-memory location, only the final value will ever appear, while the intermediate writes happen only inside the write buffer. As far as correctness is concerned, the last data is what gets written, so there is no issue; it would matter if multiple processors were there, but I am talking about a single processor. This is called write collapsing, or write combining, or write merging; if you encounter these words, remember what is happening: the data keeps landing in the same write-buffer entry, and when the entry is due for flushing, only the last data written goes out to main memory. I hope this is clear to you.
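Extending the FIFO sketch above, a hedged illustration of write merging: before enqueuing, the pending entries are scanned for the same address, and a match is simply overwritten, so only the last value ever drains to memory. This reuses wb[], head, tail, count, WB_DEPTH and wb_push() from the previous sketch, and is again only an illustration for the single-processor case:

    /* Merge-aware push: collapses repeated writes to one address. */
    bool wb_push_merging(unsigned addr, unsigned data) {
        /* Scan pending entries for a write to the same address. */
        for (int i = 0, idx = head; i < count; i++, idx = (idx + 1) % WB_DEPTH) {
            if (wb[idx].addr == addr) {
                wb[idx].data = data;   /* collapse: intermediate value is  */
                return true;           /* lost, only the last write drains */
            }
        }
        return wb_push(addr, data);    /* no match: ordinary FIFO enqueue  */
    }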
And if the buffer is full, the processor has to wait; I expanded that scenario just now.

Now let us go to the unified cache. I told you that instructions and data can occupy the same cache. In this case, a particular block in the cache can be competed for: an instruction wants to sit in that location, and a data item wants to sit in the same location; both are competing for it. It is not enough to look at the diagram and know that this happens; you should know the background of what it actually means. When a prefetch happens, an instruction comes and sits in some block of the cache; similarly, when an LDM happens, a load from memory, that data also comes and sits in the cache; and both can map to the same location, in which case they compete for it. So when the caches for data and instructions are combined, instruction and data accesses can affect each other. Take code containing an LDM or STM: it is itself an instruction, so when it is fetched in the prefetch stage, it causes an instruction hit or miss in the cache; and some three cycles later, when it enters the execute stage, it causes a data access, a memory read or write, which disturbs the cache again. So an instruction that is three clocks behind can come back and disturb the same cache; they are dependent. So load and store instructions can cause a structural hazard, because the cache is common for both.
What do I mean by the structural hazard? A blockage happens, maybe the pipeline stalls, because the one shared resource, the particular cache block, is not available, or there is a miss and the block has to be brought from main memory; that is the structural hazard. So load and store instructions can cause a structural hazard, simply because instruction fetch and data access contend for the same cache. A split cache does not have this issue, because there are separate physical caches, and I told you they can be of different sizes, different block sizes, different write policies; you can design each one independently of the other, based on the requirements of the programs running on the processor. That is a split cache. So: the caches for data and instructions are separate, both caches can be configured differently, and, very importantly, they can be accessed in parallel. One entry in the instruction cache is not, or rather will not be, tied to an entry in the data cache. There may be an indirect dependency: something happening on one side can have a side effect on the other, for example when an LDM instruction is fetched there will later be a data entry coming into the data cache; that much coupling exists. But the accesses themselves, a miss happening here and a miss happening there, are independent, so they can go on in parallel: a prefetch keeps the instruction cache busy while an executing load or store keeps the data cache busy at the same time. That is the advantage of having a split cache.

Now, what about multiple levels of caches? There can be a number of caches in a system, say in an SoC; I showed you only one cache so far because we wanted to explain the policies and design elements associated with a cache. Now that you are clear with one cache, I can introduce one more: this is the level 1, L1, cache, and this is the L2 cache; I am showing both inside the chip. Size-wise, the L2 cache is bigger than L1 and will be a multiple of it: if this is an 8-kilobyte L1, the L2 can be 16 or 32 kilobytes, and the main memory can be a few hundreds of kilobytes or more. What happens is that L2 acts as a cache for L1: for the processor, L1 is the cache; similarly, for L1, L2 is the cache. If the processor tries to find something in L1, it may encounter a miss; then it comes and checks whether L2 has it, because L2 is the bigger one. An entry evicted from L1 may well still be lying in L2, since replacement happens more often in the small L1 than in L2; replacement meaning a block going out and being replaced.
So if you do not find it in L1, you are likely to find it in L2; then it is brought into L1 immediately and used from there. There will be a separate policy for L1 and a separate policy for L2; they can be different, but the sizes we decide in the system must keep L2 a multiple of L1. So what is the advantage? If there is a miss in L1, we can check whether the data is available in L2 instead of immediately going to the main memory; only if it is not in L2 either is it brought from main memory. And when a new block is brought from main memory, actually a bigger block arrives at L2: the block sizes may differ, say 32 bytes in L1 and 64 bytes in L2, because as I told you, as we move away from the processor towards memory, the number of bytes we transport per transfer gets bigger. Suppose the processor needs one particular word, and it misses in L1, and the corresponding block is not in L2 either. Now it is no longer L1 that talks to the main memory; L2 does, because it is closer to the main memory. L2 says: I want 64 bytes of data around this location, suitably aligned. That data lands in the L2 cache, and based on which part of the 64 bytes the processor requires, maybe one word or one byte, the relevant 32-byte portion is copied into L1 and handed to the processor. Later, when the processor accesses a nearby location in the other half of those 64 bytes, it misses in L1, because L1 holds only the first 32 bytes; but it is likely to hit in L2, because we brought all 64 bytes there. So L2 supplies the next 32 bytes to L1, maybe replacing the previous ones so L1 does not have to search another block, and we gain a hit without going to main memory. That is the scenario of how an L2 cache saves us from going to the main memory so often.

About split caches across the levels: the L1 cache is on-chip, while the L2 cache may be on-chip or off-chip. Normally L1 is a split cache, whereas L2 is a unified cache. What I mean is that the instruction and data parts of L1 may be separate caches, whereas the bigger L2 sits behind both the L1 instruction and L1 data caches as one unified cache, and it is the one communicating with the main memory; the processor, of course, talks to the split L1. Good, if you understand this, then when you see the internal block diagrams of any processor architecture, you will be able to relate to them. The L1 cache contents always match, that is, are contained in, the L2 contents; this is very important. And L2 is normally a multiple of L1 in size, I told you that.
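A hedged C sketch of the two-level lookup path just walked through, as a toy model with made-up names: a miss in L1 falls through to L2, only an L2 miss goes to main memory, and a refill copies the needed 32-byte portion of the larger 64-byte L2 line into L1:

    #include <stdbool.h>
    #include <stdio.h>

    /* Toy model: each level just remembers the one line it holds.
     * All names and sizes are illustrative, not a real controller. */
    static unsigned l1_line = ~0u, l2_line = ~0u;   /* held line addresses */

    static bool l1_lookup(unsigned addr) { return (addr & ~31u) == l1_line; }
    static bool l2_lookup(unsigned addr) { return (addr & ~63u) == l2_line; }

    static void read_word(unsigned addr) {
        if (l1_lookup(addr)) { puts("L1 hit"); return; }
        if (!l2_lookup(addr)) {              /* L2 miss: memory supplies   */
            l2_line = addr & ~63u;           /* the whole 64-byte line     */
            puts("L2 miss: refill 64B from main memory");
        } else {
            puts("L1 miss, L2 hit");
        }
        l1_line = addr & ~31u;               /* L1 gets its 32-byte part   */
    }

    int main(void) {
        read_word(0x1000);   /* cold: memory -> L2 -> L1             */
        read_word(0x1004);   /* same 32B line: L1 hit                */
        read_word(0x1020);   /* adjacent 32B: L1 miss, but L2 hit    */
        return 0;
    }

Running it prints a cold miss, then an L1 hit in the same 32-byte line, then the interesting case from the lecture: an L1 miss on the adjacent 32 bytes that is satisfied by L2 without touching the main memory.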
So, to recap the table: content-based addressing for the fully associative case; the replacement policies, round robin, random or LRU; unified or separate caches; and the physical cache versus the logical cache. Just to refresh your memory on that last one: the processor is here, the MMU, which I am going to talk about in the next class, is here, and the main memory is here. Now the cache can be either here or here. On one side of the MMU the addresses are virtual, or logical; on the other side they are physical. The cache sits in only one of those two places, not both: if it is before the MMU it is a logical cache, and if it is after the MMU it is a physical cache. What does that mean? A logical cache maps logical addresses to entries in the cache, whereas a physical cache maps the physical addresses coming out of the MMU to locations in the cache, that is all. So you should understand these two as well, in fact all the entries in the table; if not, please go back, either within this lecture or one more lecture.
Now, ARM cache features: I am not sure whether you will be able to read them all here, but you can refer to any manual. You see different cache sizes, 8 or 32 kilobytes; the cache can be at the logical or the physical location; and there is the associativity. I told you about two-way, four-way and eight-way set associative caches, but there are processors with 32-way and 64-way caches. What does that mean? The block is still mapped, as in direct mapping, to one particular set, but inside that set it could be in any one of the ways: in a 64-way cache it could be found in one of the 64 locations, 0 to 63, of that particular set (see the lookup sketch after this paragraph). So there are different families of processors, ARM720, ARM920, ARM11, with data or instruction caches of different sizes, 8 kilobytes to 32 kilobytes, and different cache line sizes, that is, block sizes of 4 or 8 words; 4 words means 4 x 4 = 16 bytes, so with 8 words you can see the maximum here is 32 bytes, though later processors have more. So you can see the different values given to each of the design elements. We have addressed each of them in detail, so you should be fairly expert in caches by now.
So far we have talked about the cache inside the processor; give me some ten more minutes on this. The cache is here; let us for a moment forget about the MMU, it is not in our purview now. If there is no MMU, what is the type of this cache? It is physical, because there is no MMU: whatever address is given by the processor is a physical address into physical memory. In this case, I said the cache can be configured. Configured in the sense that some things are decided in the hardware, but sometimes the hardware itself is made configurable, because when the IP of a cache controller is given, they cannot design one IP for one particular policy only; then it would not really be an IP, right? So the cache controller has configurability in it. That means you can choose the write-through policy or the write-back policy, one of the write policies, or you can choose one of the memory-mapping policies. Once you choose a particular configuration, it may remain the same throughout the execution of the program. You can even fix the configuration when you integrate the IP and have only that limited facility in the system.
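To pin down what "one of the 64 ways within a set" means, here is a minimal C sketch of a set-associative lookup. The geometry here (64 sets, 64 ways, 32-byte lines) is a made-up example for illustration, not taken from any of the ARM parts above:

```c
#include <stdbool.h>
#include <stdint.h>

#define WAYS      64u   /* 64-way set associative           */
#define LINE_SIZE 32u   /* bytes per cache line             */
#define SETS      64u   /* number of sets (hypothetical)    */

struct line { bool valid; uint32_t tag; };
static struct line cache[SETS][WAYS];

/* The index bits fix the set, exactly as in direct mapping, but within
   that one set the block may sit in any of the 64 ways, so all ways of
   the selected set are searched (in hardware, in parallel). */
bool lookup(uint32_t addr) {
    uint32_t index = (addr / LINE_SIZE) % SETS;  /* which set            */
    uint32_t tag   = addr / (LINE_SIZE * SETS);  /* identifies the block */
    for (uint32_t w = 0; w < WAYS; w++)
        if (cache[index][w].valid && cache[index][w].tag == tag)
            return true;                          /* hit in way w        */
    return false;                                 /* miss in this set    */
}
```

Note that direct mapping is the special case WAYS = 1, and fully associative is the special case SETS = 1, where the index disappears and only the tag comparison, the content-based scan, remains.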
If the configuration is fixed like that, the flexibility of your SoC is limited, and that may be fine; but sometimes you may want to give the configurability to the system programmer, and in that case the configurability is exposed to the programmer. So now we will see how this is done through the co-processor, because configuring the cache is done via a co-processor. Why a co-processor? See, the cache controller itself is treated as a co-processor, the reason being that accessing a co-processor is not a memory cycle. Please remember, it has nothing to do with a memory access. Whenever you do a memory access, the cache controller and the MMU come into play; so if the configuration registers of the cache were memory-mapped, similar to peripherals, we would be in trouble, because the very accesses used to configure the cache would themselves pass through the cache being configured. So these registers cannot be in the memory map; that is, they cannot be among the 2^32 locations addressable by the ARM processor (I am talking only about that processor here). The ARM designers have therefore nicely designed it such that the controller of the cache, and the controller of the MMU, are mapped to the co-processor interface. That means we can use those co-processor instructions we have seen, MRC and MCR, only these two: co-processor register to ARM register, or ARM register to co-processor register. So some setting can be placed in an ARM register and then written into the cache controller. You have to see it this way: whatever we have seen so far is the cache, and the cache controller is a separate element associated with the cache. This cache controller is mapped to the co-processor world, specifically to co-processor 15. So if the processor issues an instruction with co-processor ID 15, addressing some particular register, then that co-processor register gets written, which, depending on what you choose, configures the cache controller; the cache gets configured, and thereafter any memory access will be looked up in the cache. Now, you may wonder where these MRC and MCR instructions come from. They come from the instruction memory, so this mechanism is not entirely independent of memory, because instructions always live in memory. Those instructions have to be fetched without the cache: the part of the initial memory that holds the code for configuring the cache controller and the MMU is non-cached. So these instructions are non-cached, they are accessed directly, the cache does not come into play at all, and the processor first configures the cache and only then enables it. That is the flow of things, and you will have a better understanding of it when I talk about the MMU. For now, remember that caches are configured using co-processor instructions, and there are specific registers that the programmer can write into or read from: the overall system control configuration can be done using them, the cache configuration can be done, and likewise tightly coupled memory, the MMU and the MPU.
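As a concrete picture of that configure-then-enable flow, here is a minimal sketch in C with GCC-style inline assembly. It assumes an ARMv4/ARMv5-class core such as the ARM920T, where the instruction and data cache enable bits sit in the CP15 c1 Control Register; always confirm the bit positions in the technical reference manual of your actual part:

```c
/* CP15 c1 Control Register bits on ARMv4/v5-class cores (assumed layout) */
#define CR_M (1u << 0)    /* MMU enable      */
#define CR_C (1u << 2)    /* D-cache enable  */
#define CR_I (1u << 12)   /* I-cache enable  */

static inline void enable_caches(void) {
    unsigned long cr;
    /* MRC p15, 0, Rd, c1, c0, 0 : read the control register into an
       ARM register; note this is a co-processor transfer, not a load. */
    __asm__ volatile ("mrc p15, 0, %0, c1, c0, 0" : "=r" (cr));
    cr |= CR_I | CR_C;    /* set the cache enable bits */
    /* MCR p15, 0, Rd, c1, c0, 0 : write it back to CP15; only after
       this do memory accesses start going through the caches. */
    __asm__ volatile ("mcr p15, 0, %0, c1, c0, 0" : : "r" (cr));
}
```

This function itself would run from the non-cached configuration code discussed above, and the caches take effect only once the MCR completes.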
Don't worry about those, I will talk about them later in subsequent lectures. Apart from cache management, the co-processor mechanism is also used for configuring other controllers, but that is not our topic of discussion in this section. What you should know is that cache controllers are configured using these ARM co-processor instructions, and that the cache is enabled only when its configuration is done; until then, the instructions are fetched from a non-cached part of the memory. That part of the memory, the one holding the instructions for configuring the cache, is accessed without the cache being involved: ARM directly accesses main memory and does the job. If you understand this particular background, I am happy; we will cover the details later.
Now, the co-processor registers: please remember these are the primary registers. If you recall the instruction format, say MRC, there is a co-processor ID field somewhere in it, bits 8 to 11. So the co-processor ID sits in the instruction, and then there are some operand fields we talked about, which actually select the co-processor registers: the primary register field plus opcode 1 and opcode 2. The register field is 4 bits, so 0 to 15, meaning 16 co-processor registers can be accessed. These numbers vary from 0 to 15, so you select a register using these fields, mention the co-processor ID as 15, and then, through co-processor 15, we can do the different jobs. It is all designed and specified by ARM: the co-processor controllers are ARM IP, IP coming from ARM, and that is why ARM has defined which registers are used to configure them. So if you have a cache in your SoC, you will have a cache controller along with it; you write these registers and configure it according to the documentation, what needs to be written where, and then the cache will behave that way. That is how the configuration of a cache happens.
Now, what are flush, clean and cache lockdown? These are some operations the controller supports; let me explain flush and clean first, I think that is easier to understand. Flushing the cache: suppose there is a lot of data in the cache and the processor is restarting; then you can decide to do a flush. When you flush the cache, all the entries in the cache are made invalid, and even if there are dirty entries, they are not written into main memory: every block is invalidated without writing back the dirty blocks. With a write-back policy, writes are delayed, so some entries would still need to be written to main memory; but when the processor is restarting there is no need to write them at all. That is one scenario, and there may be many, where you do not want to write this data back into main memory. So flush is one command you can give to the cache controller; clean is another command you can give to the cache.
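To show what these commands look like at the instruction level, here is a hedged sketch for an ARM920T/ARM926-class core, continuing the inline-assembly style above. The CP15 c7 opcodes vary across ARM families, so treat these encodings as illustrative and confirm them in the part's reference manual:

```c
/* 'Flush' here means invalidate everything WITHOUT writing dirty data
   back; 'clean' means write the dirty lines back to main memory. */
static inline void flush_caches(void) {
    unsigned long zero = 0;
    /* MCR p15, 0, Rd, c7, c7, 0 : invalidate both I- and D-caches;
       any dirty data is lost, which is acceptable e.g. at restart. */
    __asm__ volatile ("mcr p15, 0, %0, c7, c7, 0" : : "r" (zero));
}

static inline void clean_dcache(void) {
    /* ARM920T-style "test and clean": loop until the whole D-cache is
       clean, writing every dirty line back to main memory; this is the
       kind of command issued around a process context switch. */
    __asm__ volatile (
        "1: mrc p15, 0, r15, c7, c10, 3\n"
        "   bne 1b" ::: "cc");
}
```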
These commands matter when process context switches happen: P1 was executing on the processor, and now P2 is coming onto the ARM processor. Whatever was used by P1 needs to be written back into the main memory, so that P2 can start using the processor and start using the cache as well. For that you need to clean the cache. Cleaning the cache means that if there are dirty entries in the cache, they are written back to the main memory: one command can be given to the cache controller so that it writes everything dirty back into main memory. That is the scenario for clean.
Lockdown in the cache means not allowing selected blocks to be replaced. See, there may be a need for some part of the cache, say a couple of blocks, to be locked down, and it could be in the instruction cache. Take the FIQ vector: the FIQ vector points to some location in the main memory which holds the FIQ handler. Assume the handler's instructions are all mapped to certain locations in the cache, and you want to say that they should be locked down. What I am telling you is this: whenever the ARM processor is executing, it uses the instruction cache for its own program, and when an FIQ occurs, maybe an external interrupt coming from the outside world, it has to go to the interrupt vector and then fetch the handler. If all the instructions meant for the FIQ handler are already sitting in the instruction cache, the processor will not go to main memory; it will get the handler entirely from the cache, so the FIQ handler can be executed without any delay. But this holds only if replacement never happens for those blocks. So you lock them down, telling the controller: do not even consider these blocks for replacement. Replacement then happens only among the other locations, the locked locations are never replaced, and the handler runs faster. That is the purpose of lockdown. I hope you understood all this.
So, we covered quite a lot about caches, and we may not be talking about caches anymore; by now I think you have a much better understanding of them. Please do not stop with this: read more literature, some books, to get better understanding and clarity. You have internalized whatever you heard here, and reading more will help you streamline your thoughts and understand things better. I really enjoyed sharing this with you and I hope it was useful. We will start the next class with the MMU. Thank you very much for your attention, see you in the next class, have a nice day, bye bye.