Welcome to today's lecture on hierarchical memory organization. We have already devoted two lectures to this topic, and today we shall concentrate on the mapping function. As we have seen, the mapping function answers two important questions related to hierarchical memory organization. The first question is: where can a block be placed in the cache memory? That is the block placement question, and we have already discussed two approaches, direct mapping and fully associative mapping; today we shall continue our discussion with set associative mapping. The second question the mapping function answers is: how is a block found if it is in the cache? That is, whenever a particular block is present in the cache, how can it be located? This second question is answered with the help of the tag field stored alongside each block in the cache, and we will see how the lookup is done. So let us quickly recapitulate direct mapping. We have seen that in direct mapping the address coming from the processor is divided into three fields. The first field is the block offset, or byte offset; it essentially selects one of the bytes present in a block. The second field is the index field, and as we have seen, in direct mapping the index is used to identify a particular cache line: it points to one line, and we then check whether that line corresponds to the address generated by the processor. The index field acts as a kind of address into the cache lines: it goes to a decoder, and the decoder identifies one particular line of the cache. As we know, the function of a decoder is that with n inputs it generates 2^n outputs, exactly one of which is active. So when the index field is applied, exactly one decoder output is active, and the corresponding line is the only one that may correspond to the address. Why only "may"? Because direct mapping is a many-to-one mapping: many memory blocks map to the same cache line. So we must compare the tag field of the address with the tag field stored in that line of the cache to see whether the address really corresponds to it. The two tag fields, one coming from the address and one stored in the cache, are compared; if they are the same, and the valid bit is one (indicating that a memory block has actually been transferred into this line at some point), then it is a hit; otherwise it is a miss. So this is direct mapping, and note that the memory used for direct mapping is conventional memory.
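To make the field decomposition concrete, here is a minimal sketch in Python of a direct-mapped lookup. The bit widths, the cache_lines list, and the helper names are illustrative assumptions of mine, not taken from the lecture's slides: with 4-byte blocks (2 offset bits) and 1024 lines (10 index bits), the remaining address bits form the tag.

```python
# Minimal direct-mapped lookup sketch (illustrative parameters, not from the slides):
# 32-bit addresses, 4-byte blocks (2 offset bits), 1024 lines (10 index bits).
OFFSET_BITS = 2
INDEX_BITS = 10
NUM_LINES = 1 << INDEX_BITS

# Each line holds (valid, tag, data); all lines start invalid.
cache_lines = [{"valid": False, "tag": 0, "data": None} for _ in range(NUM_LINES)]

def split_address(addr):
    """Split a 32-bit address into (tag, index, byte offset)."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

def lookup(addr):
    """Return the cached data on a hit, or None on a miss."""
    tag, index, _ = split_address(addr)
    line = cache_lines[index]                   # the decoder selects exactly one line
    if line["valid"] and line["tag"] == tag:    # tag match AND valid bit set
        return line["data"]                     # hit
    return None                                 # miss
```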
As we know, in a conventional memory an address is applied and the memory produces data; if the number of address lines is n and the number of data lines is m, we say it is a 2^n x m memory. This is the standard organization, and it holds for RAM as well as ROM. So for direct mapping we need not use any special memory; a standard memory suffices. The only requirement is that each line of the memory stores several components: the valid bit, the tag field, and the data, where the data can be an instruction or data depending on whether it is an instruction cache or a data cache, as we shall see. Then we have seen fully associative mapping. In fully associative mapping we have to perform a parallel search over all the lines, so the memory cannot be conventional. There is no index field in the address, so no particular cache line is pointed to; instead, the tag field of the address is compared with the tag field stored in every line, and a parallel comparison is done. That is why such a memory is called a CAM, a content addressable memory: it is a very special type of RAM that compares against part of the stored content, and here the content in question is the tag field. The tag field of the address generated by the processor is compared against the stored tag of every line of the cache in parallel; this is also called associative search. Whenever the address tag matches one of the stored tags and the corresponding valid bit is one, it is a hit; otherwise it is a miss. So with this mapping the memory has to be of a special type, and it is quite complex: not only must it store information in the form of instructions or data, it must also provide a mechanism for parallel comparison in each and every line. It is complex and costly, so in practice it is not used. On the other hand, direct mapping, which we discussed, is very simple, but it does not give very good performance. So we go for a mechanism that gives us the best of both worlds, as I mentioned in my last lecture, one that combines the good features of both mapping functions, direct mapping and fully associative mapping. That is set associative mapping, which performs a limited search. It is a compromise that exhibits the strengths of both direct and associative mapping and overcomes some of the disadvantages of both: its complexity is much lower than that of fully associative mapping, and it improves on the performance achieved by direct mapping. In this scheme, if m is the number of cache lines, then m = v x k, where v is the number of sets and k is the number of lines in each set. If j is the main memory block number, then i, the cache set number, is given by i = j mod v, since v is the number of sets present.
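A quick sketch of the set-mapping formula; the numbers in the check are illustrative choices of mine (though they anticipate the eight-line example used later in the lecture):

```python
# Set associative placement: m = v * k lines, block j maps to set i = j mod v.
def set_number(j, v):
    """Cache set for main-memory block number j, given v sets."""
    return j % v

# Illustrative check: with m = 8 lines organized 2-way (k = 2), v = m // k = 4 sets,
# so memory blocks 0, 4, 8, 12, ... all compete for the two lines of set 0.
m, k = 8, 2
v = m // k
print([set_number(j, v) for j in range(10)])  # [0, 1, 2, 3, 0, 1, 2, 3, 0, 1]
```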
So i can be found by taking j mod v, and that gives the cache set number. Here again the address is divided into three fields. The first part is the byte offset, or block offset if you prefer; when more than one byte is present in a block, it selects the byte within the block. In this example a block holds four bytes, so two bits are required for the byte offset. Then some bits are required to select the set. The cache size here is assumed to be 4 kilobytes, and each set holds four lines, which is why it is called a four-way set associative cache. With a 4-kilobyte cache, 4-byte blocks, and four lines per set, the total number of sets is 256, so eight bits are required for the set index, and the remaining bits, bit 10 to bit 31, that is 22 bits, are stored in the tag fields of the different sets. The comparison is done within a set: a particular set has four lines, and for each of those lines a parallel tag comparison is performed. The number of comparators required here is therefore four, because it is a four-way set associative cache, rather than one comparator for every line as in fully associative mapping. The comparator outputs then drive a multiplexer, which selects the data from the line whose tag matched, and that data goes out to the processor. So this is set associative mapping, and let me illustrate it with a very simple example where we keep the cache size the same but go from direct mapping to two-way set associative, four-way set associative, and fully associative mapping. In this example, with direct mapping we have eight sets of one line each. With two-way set associative mapping the number of sets is m/2; here m = 8, so we have four sets, each with two lines. For direct mapping, v = m and k = 1: the number of lines in a set is one, and the scheme reduces to direct mapping. So set associative mapping is the general case, and direct mapping is the special case of set associative mapping with k = 1. On the other hand, when v = 1 there is only one set, and that single set contains all m lines of the cache.
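Here is a small sketch that derives those field widths; the arithmetic mirrors the example just given (4 KB cache, 4-byte blocks, 4 ways, 32-bit addresses), and the function name is my own:

```python
import math

def field_widths(cache_bytes, block_bytes, ways, addr_bits=32):
    """Return (tag, index, offset) bit widths for a set associative cache."""
    offset_bits = int(math.log2(block_bytes))
    num_sets = cache_bytes // (block_bytes * ways)
    index_bits = int(math.log2(num_sets))
    tag_bits = addr_bits - index_bits - offset_bits
    return tag_bits, index_bits, offset_bits

# The lecture's example: 4 KB cache, 4-byte blocks, 4-way set associative.
print(field_widths(4 * 1024, 4, 4))   # (22, 8, 2): 22-bit tag, 256 sets, 2-bit offset
```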
So fully associative mapping is again a special case of set associative mapping, with v = 1: the number of sets is one, and the number of lines in that single set equals the total number of lines in the cache. So we have direct mapping, two-way set associative mapping, four-way set associative mapping, and fully associative mapping, and this simple example shows how the mapping function varies across these situations. Now, it has been found that as we increase the associativity the performance improves, but this does not come free of cost; you have to pay a price for it. What kind of price? First, increasing associativity requires more comparators, as we have seen, as well as more tag bits per cache block. As you go from direct mapping towards fully associative mapping, the number of tag bits keeps increasing and the number of comparators keeps increasing: in direct mapping only one comparator is required and the tag field is of minimum size, while with two-way set associative mapping the number of comparators doubles and the tag field grows. The choice among direct mapped, set associative, and fully associative mapping in any memory hierarchy therefore depends on the cost of a miss versus the cost of implementing associativity; that is, the miss penalty versus the implementation cost. Ultimately you have to look at the cost-performance of the memory system: if you can afford a larger cost, fully associative is very good; if you have to optimize cost, you will go for two-way, four-way, or sometimes eight-way set associative mapping. Now let us consider a simple example just to illustrate how the cost increases. Assume the address size is 32 bits, the word size is 32 bits, and the block size equals the word size. For the sake of simplicity we assume a block holds exactly one word; normally a block comprises several words, but in this particular case there is only one. In other words, we are essentially considering a 32-bit processor, and 32-bit processors usually have a 32-bit word size and 32-bit address size. The cache size is taken to be 16 kilobytes. Now let us see, for the different situations, how the number of comparators and the number of tag bits change. First, direct mapping: two bits are required for the byte offset, because a 32-bit word contains four bytes and two bits suffice to address them. Since the cache holds 16 kilobytes, 14 bits are needed to address a byte within the cache; two of them go to the byte offset, so the remaining 12 bits form the index, with which you can point to a particular cache line. The tag field is then 18 bits.
Let us now compute the total number of tag bits. In direct mapping the cache has three fields per line, as we know: the tag, the valid bit, and the data. The tag field here is 18 bits, and the number of lines is decided by the number of index bits: with 12 bits in the index field, there are 2^12 lines. Each line holds an 18-bit tag, so the total tag storage in direct mapping is 2^12 x 18 bits, and the number of comparators required is only one. Now let us take another case, four-way set associative mapping. Since the cache size is the same 16 kilobytes, we now have four lines in each set: line 1, line 2, line 3, and line 4; that is what four-way set associative means. The 2^12 lines are divided into groups of four, so the number of sets is 2^10 and the index field shrinks to 10 bits. The tag field correspondingly grows to 20 bits, because the index field is shrinking. The total tag storage is therefore 2^10 sets x 4 lines x 20 bits, which can be rewritten as 2^12 x 20 bits. The 2^12 factor stays the same, but the per-line tag has grown from 18 to 20 bits, so the total tag storage increases from 2^12 x 18 bits to 2^12 x 20 bits, and the number of comparators increases to four. So the number of comparators is increasing and the number of tag bits is increasing. Now let us consider the fully associative cache.
In fully associative mapping there is no index field, because the index field identifies a particular set and here there is only one set. Apart from the 2-bit byte offset, the entire remaining part of the address, 30 bits, is the tag field. The total number of lines is still 2^12, so the total tag storage is 2^12 x 30 bits, and the number of comparators is now 2^12, not one or four. In other words, you require a content addressable memory, a special type of memory in which the comparison takes place in parallel for each and every line present. So you can see how the tag storage and the number of comparators grow across the three situations: direct mapped, four-way set associative, and fully associative. Of course you can also have two-way or eight-way set associative caches, and for them these figures will be different; you can easily work out that for eight-way set associative the number of comparators doubles again to eight and the per-line tag grows by one more bit, while the maximum tag storage and the maximum number of comparators (2^12) are required in the fully associative case. This shows how the implementation cost grows as you increase the associativity. Now comes the question of replacement. As we know, only a small part of main memory is present in the cache, so the mapping is many-to-one. Whenever there is a miss, a new block has to be transferred from main memory to the cache, which means one of the blocks already present has to be replaced. How will you do that, and what algorithm will you use? That is what a replacement algorithm deals with, and the question here is which block to replace on a cache miss: obviously one of the existing blocks is to be replaced when a new block is brought in.
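As a check on this arithmetic, here is a small sketch that reproduces the lecture's three cases; the helper encodes the same reasoning (16 KB cache, 4-byte blocks, 32-bit addresses) and its name is my own:

```python
import math

def tag_cost(cache_bytes, block_bytes, ways, addr_bits=32):
    """Total tag storage (bits) and comparator count for one organization."""
    lines = cache_bytes // block_bytes
    sets = lines // ways
    index_bits = int(math.log2(sets))
    offset_bits = int(math.log2(block_bytes))
    tag_bits = addr_bits - index_bits - offset_bits
    return lines * tag_bits, ways   # one comparator per way

lines = (16 * 1024) // 4                 # 2**12 lines in every case
print(tag_cost(16 * 1024, 4, 1))         # direct mapped: (2**12 * 18, 1)
print(tag_cost(16 * 1024, 4, 4))         # 4-way:         (2**12 * 20, 4)
print(tag_cost(16 * 1024, 4, lines))     # fully assoc.:  (2**12 * 30, 2**12)
```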
Now if we focus on direct mapping, we have seen that each memory block corresponds to exactly one cache line: for the same index field, every address maps to the same line regardless of its tag. So there is no alternative and no choice of replacement; it is fixed. In associative and set associative mapping, however, there are several possible approaches, because we do have a choice. Let us consider the simplest case, two-way set associative mapping. For the same index there are two lines: the block is either in the first line of the set or in the second, and when a new block comes in, one of the two has to be replaced. Which one do you replace? There are several algorithms or approaches. The simplest and most common is known as least recently used (LRU): you replace the line that has not been used in recent times. How do you keep track of that? You can add one extra bit per line, used solely for the purpose of replacement. Whenever a line is accessed at a particular instant, its bit is set to one and the other line's bit is set to zero; so at any instant, the line with bit one is the more recently used of the two. If the other line is accessed at the next instant, the bits flip. When a replacement has to be made, you check the bits and replace the line whose bit is zero. That is how, with one additional bit per line, LRU replacement can be done. But the matter gets complicated as you go for four-way or eight-way set associative or fully associative caches: there you cannot keep track of the least recently used line with a single bit; you have to use multiple bits and modify them in each of the lines on every access, so it becomes quite complicated. There are other approaches. One is first in first out (FIFO): the set is organized in the form of a shift register, and the block that was brought in first is the one replaced. Another is least frequently used (LFU): here you keep track of the number of accesses to the different lines of a set, which essentially requires maintaining a counter for each line, initialized to zero at power-on (or resynchronized at regular intervals) and incremented on each access; the least frequently used line is then replaced. And the simplest of all is random replacement: you simply replace any one of the lines, with no extra overhead and no housekeeping
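Here is a minimal sketch of the single-bit LRU scheme for one two-way set; the class and field names are illustrative assumptions of mine:

```python
# Single-bit LRU for one two-way set (illustrative sketch).
# Each line carries an "mru" bit: 1 = most recently used of the pair.
class TwoWaySet:
    def __init__(self):
        self.lines = [{"valid": False, "tag": 0, "mru": 0},
                      {"valid": False, "tag": 0, "mru": 0}]

    def touch(self, way):
        """Mark one line as most recently used, the other as LRU."""
        self.lines[way]["mru"] = 1
        self.lines[1 - way]["mru"] = 0

    def victim(self):
        """Replace the line whose bit is 0, i.e. the least recently used."""
        return 0 if self.lines[0]["mru"] == 0 else 1

    def access(self, tag):
        """Return True on a hit; on a miss, evict the LRU line and fill it."""
        for way, line in enumerate(self.lines):
            if line["valid"] and line["tag"] == tag:
                self.touch(way)
                return True
        way = self.victim()
        self.lines[way] = {"valid": True, "tag": tag, "mru": 0}
        self.touch(way)
        return False
```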
information to maintain; no extra bit is required. But random replacement may not give very good results, so LRU, as I mentioned, is the most common choice; the other options are available, and I have discussed them for the sake of completeness. Now another important issue is what happens on a write; this question arises for memory write requests. As we know, there are instructions like load and store, and a store performs a write to a particular memory location. But where do you write? You have brought a particular block into the cache, and obviously the CPU will modify it in the cache; but then the information present in main memory becomes different from what is present in the cache. As we discussed in our first lecture on this topic, the inclusion property demands that whatever is present in the cache must also be present in main memory; the two copies cannot be allowed to differ. So what happens in that case? It has been found that memory writes account for about 15 percent of all instructions, and there are two different approaches used for writing. One technique is known as write through: the information is written to both the block in the cache and the block in main memory. So you have your cache memory and your main memory; normally you access the cache, but when you perform a write you modify the data in the cache and, since it corresponds to a particular main memory location that can easily be found, you modify main memory as well. However, the time required to access the cache is much lower than the time required to access main memory. So whenever a write operation is performed, both the cache and main memory must be accessed, and as a result the write takes much longer: had you written only to the cache, the write operation could have finished much earlier, but since you are also writing to main memory it takes longer. Moreover, write through generates substantial memory traffic: normally the CPU accesses the cache and the traffic to main memory is small, but with write through, not only does each write take longer, it also puts more traffic on main memory. You may say: so what? Let there be more traffic on main memory. For a single-processor system this is indeed acceptable; it does not change the situation much. However, in a multiprocessor with shared memory, the traffic on main memory becomes very important, because different processors all access the main memory, and it matters that the traffic between each processor and the memory be kept low. That is why this issue is particularly important in the context of shared-memory multiprocessor systems.
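A minimal sketch of the write-through policy, with illustrative names of my own (cache and main memory modeled as plain dictionaries):

```python
# Write-through sketch: every write updates both the cache and main memory.
# The dictionaries stand in for real storage; names are illustrative.
cache = {}        # address -> value (the fast copy)
main_memory = {}  # address -> value (always kept consistent)

def write_through(addr, value):
    cache[addr] = value        # fast cache update
    main_memory[addr] = value  # slow main-memory update on EVERY write
```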
Now, to overcome the disadvantage of write through, another technique can be used, known as write back. In this case you write only to the cache: the information is written only to the block in the cache, and an update flag bit is set. So we are adding an additional bit to each line: the update flag, which was initially 0, is set to 1 because this particular block has been modified, while main memory has not yet been modified. The reason for keeping main memory unchanged for now is that a block may contain multiple words: when a particular word is modified, the other words present in the block may remain the same. So each line has the tag field and the valid bit as before, and the update bit is added. We keep using this block, and it has been found that until the block is replaced there is no problem: the main memory copy may differ from the cache copy, but as long as accesses are served from the cache, the data seen is the most recent, so the stale main memory content does not affect us. Whenever the block has to be replaced, however, main memory must be updated at that time, so that the next time this block is brought from main memory into the cache you get the updated contents. That is exactly why the additional update bit is added: a block may be accessed many times before it is replaced, and when you replace it you check the value of the update bit; if it is one, you update main memory; otherwise there is no need to change anything. So main memory is modified only when the block is discarded and the update flag is set. This creates the cache coherency problem in multiprocessor-based systems. Suppose you have multiple processors, CPU 1 and CPU 2, each with its own private cache, and main memory shared through a bus. With the write-back policy, suppose one processor has modified a block in its own cache, but main memory has not been modified and the other processor's cache has not been modified either. Then the cache contents corresponding to the same block of main memory have become different in the two caches. This is a special situation that arises in multiprocessor systems: two cache lines correspond to the same block of main memory, yet they have become different. This particular problem is known as the cache coherency problem, and we shall see that special algorithms have to be used to overcome it in multiprocessor systems.
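And a matching sketch of write back, again with illustrative names of my own; the dirty flag plays the role of the lecture's update bit:

```python
# Write-back sketch: writes touch only the cache; main memory is updated
# lazily, when an updated (dirty) block is evicted. Names are illustrative.
main_memory = {}
cache = {}   # block address -> {"value": ..., "dirty": bool}

def write_back(addr, value):
    cache[addr] = {"value": value, "dirty": True}   # set the update bit

def evict(addr):
    """On replacement, write the block to memory only if its update bit is set."""
    line = cache.pop(addr)
    if line["dirty"]:
        main_memory[addr] = line["value"]
```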
Another important issue is block size. So far we have assumed that a block holds only one word; what happens when multiple words are stored in a block? It has been observed that as the block size is increased, the performance improves, because of the locality of reference. Consider a cache in which each block holds two words: the tag bit and valid bit are common to the block, and the same index field selects the block, but now some additional bits, the block offset field, are needed to select one of the two words present. Earlier we had three fields: the byte offset (two bits, since a 32-bit word holds four bytes), the index field, and the tag field, into which the address generated by the processor was divided. Now, with multiple words per block, some of the bits that used to belong to the index become the block offset: the number of index bits reduces as you increase the block size, and the block offset bits select the particular word required by the processor. This is what is shown in the diagram: the block offset is applied to a multiplexer, which selects one of the two words present in the block and provides it to the processor, because the processor requires only one word at a time; transfers between the processor and the cache are in terms of words, so only one word goes out. Of course, the hit mechanism is identical: the tag comparison ANDed with the valid bit, exactly as before. In the situation where four words are present in a cache block, there are two bits of byte offset and two bits of block offset; the two block offset bits select the particular word from the line, and the tag field and valid bit are common, as I have already mentioned. So a multi-word cache block is used for better performance, and it takes advantage of spatial locality. What is spatial locality? It says that if you access a particular memory location, the adjacent memory locations are likely to be used in the near future. Since the four words of a block occupy adjacent memory locations, they are likely to be used soon, so fetching them together takes advantage of spatial locality and improves performance. A cache block is therefore made larger than one word of main memory, and in case of a miss, multiple adjacent words that are likely to be needed shortly are fetched together.
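A sketch of the four-field address decomposition for multi-word blocks; the widths are illustrative choices of mine (32-bit address, 4-byte words, 4 words per block, 256 sets):

```python
# Address split for a multi-word block (illustrative widths):
# | tag | index | block offset | byte offset |
BYTE_OFFSET_BITS = 2    # 4 bytes per word
BLOCK_OFFSET_BITS = 2   # 4 words per block
INDEX_BITS = 8          # 256 sets

def split(addr):
    byte_off = addr & 0b11
    word = (addr >> BYTE_OFFSET_BITS) & 0b11        # drives the word-select mux
    index = (addr >> (BYTE_OFFSET_BITS + BLOCK_OFFSET_BITS)) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (BYTE_OFFSET_BITS + BLOCK_OFFSET_BITS + INDEX_BITS)
    return tag, index, word, byte_off

print(split(0x1234))  # prints (1, 35, 1, 0): tag 1, set 35, word 1, byte 0
```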
The advantage is improved performance: the hit rate increases and the miss rate decreases. But unfortunately another problem arises. Multiple adjacent words are fetched because they are likely to be needed shortly, which means that whenever there is a miss, multiple words have to be transferred from main memory to the cache, and that takes longer: the miss time increases. Earlier, when only one word was present in a block, a single word was transferred from main memory to the cache and the processor could resume execution of the next instruction; this does not happen with multiple words per cache line, since all the words of the block must be transferred, and that increases the miss time. So as the block size increases, the hit ratio initially increases, as I have explained, because of the principle of locality, but the miss rate may go up again once the block size grows beyond some limit and becomes a significant fraction of the cache size. The diagram shows how the miss rate changes as you increase the block size: initially the miss rate decreases, because of the principle of locality, but as the block size keeps increasing a point is reached where the miss rate starts increasing again; that happens when the block size becomes a significant fraction of the total cache size. However, as you increase the size of the cache memory this problem diminishes: the curve shown corresponds to only 1 kilobyte of cache, and with 2 kilobytes of cache, although you are not getting much additional benefit, the miss rate at least does not increase with block size, because the block remains an insignificant portion of the total cache. The miss rate also reduces as you increase the cache size, which is quite natural. The underlying reasons are that a larger block size reduces the number of blocks that can fit into the cache, and that as blocks become larger, each additional word is further away from the requested word; because of these two effects you get this type of curve. Another important issue is the number of caches: how many cache memories will you have, a single level or two levels? The advancement of VLSI technology has allowed on-chip caches, which provide the fastest possible cache access. When cache memory was first introduced it was not on chip; it sat outside the CPU as an off-chip cache. With the advancement of technology you are able to put more and more transistors on a single chip, so in addition to the CPU, a cache is added as part of the CPU on the same chip; this is known as an on-chip cache. It is definitely faster than an off-chip cache, and all present-day processors have one. Now the question arises: given an on-chip cache, should there be another cache off chip, a second-level cache? That is why we speak of two-level organizations: it leads to two or more
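The miss-time trade-off can be sketched with a toy model; the latency numbers and the function are my own illustrative assumptions, not measurements from the lecture:

```python
# Toy miss-penalty model (illustrative assumption, not from the lecture's data):
# on a miss, every word of the block must be transferred from main memory,
# so the penalty grows with the number of words per block.
def miss_penalty(words_per_block, first_word_latency=100, cycles_per_word=10):
    return first_word_latency + words_per_block * cycles_per_word

for w in (1, 4, 16):
    print(w, "words/block ->", miss_penalty(w), "cycles")  # 110, 140, 260
```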
levels of cache, an on-chip cache L1 and an off-chip cache L2, and this provides still better performance; normally you will have two levels of cache memory. Another important issue is unified versus split caches. As we know, instructions have to be fetched from memory as execution proceeds, and not only that, data, arrays, and so on also have to be fetched from memory. So will you have a single unified cache for both instructions and data, or separate caches, one for instructions and one for data? Actually, to support pipelining, and in particular to overcome structural hazards, two separate cache memories are preferable, and this is widely used: separate caches for instructions and data, as shown in the diagram. When the Princeton architecture was introduced, as I mentioned earlier, it proposed a single memory for both instructions and data, while the Harvard architecture proposed two separate memories, one for data and another for the program. When these proposals were first evaluated, the cost of memory in those days was very high, so only a single memory could be afforded: the Harvard architecture was discarded and the Princeton architecture was accepted. But that is history now; all modern processors follow the Harvard organization at the cache level. A load or store instruction requires two memory accesses, one for the instruction and one for the data, so a unified cache causes a structural hazard, as I have already discussed, and modern processors use separate data and instruction caches as opposed to a unified or mixed cache. The CPU can then simultaneously send the instruction address and the data address to the two ports, and both caches can be configured differently. That is another aspect: if you have two separate caches, an instruction cache and a data cache, they can be organized differently; say, the instruction cache can use direct mapping while the data cache uses set associative mapping. That flexibility is provided in modern processors, and when two separate cache memories are used, the sizes can differ, the associativity can differ, and so on. The diagram contrasts unified and split caches: in one case a processor has unified cache one and unified cache two, each serving both instruction and data accesses; the split example shows a first-level cache split into separate instruction and data caches, while the second-level cache is unified, essentially for cost considerations. So it is a combination of unified and split: the first-level cache is separate and the second-level cache is unified. Separate instruction and data caches avoid structural hazards, and each cache can be tailored to its specific need, as I have already mentioned. So let us quickly consider the caches in Intel processors. Cache memory started being used from the 80386 onward: the 80386 had a 32-kilobyte off-chip cache, direct mapped, with a block size of 16 bytes, using write through. Then the 80486 had an 8-kilobyte on-chip cache, so it used both on-chip and off-chip caches: for the 80386 there was only an off-chip cache and no on-chip cache, whereas here there is an on-chip cache, and it uses four-way set associativity, where the 80386's cache was direct mapped with 16-byte blocks and write through. Then in the Pentium 4 there are two on-chip caches, a split arrangement with separate data and instruction caches, each
of 8 kilobytes with a block size of 64 bytes, four-way set associative; the off-chip cache is 256 kilobytes with a block size of 128 bytes, and it uses eight-way set associative mapping. With this, let us come to the end of today's lecture. In the next class we shall start our discussion on how the performance of this hierarchical memory organization can be improved. Thank you.