This is a lecture on Hierarchical Memory Organization. In my last lecture, I gave enough background for hierarchical memory organization: we discussed the basic characteristics of the memory devices used in a computer system, and now we shall see how the performance of the memory system can be improved by using a hierarchical memory organization. We shall start with the first level of the memory hierarchy, which is realized with the help of cache memory. So what is a cache memory? Cache memory is a small, fast storage device introduced between the CPU and the slow main memory to improve the average access time. This is the basic idea. Normally, as you know, you have got your CPU, the central processing unit. I have already drawn this diagram: before the cache memory was invented, the CPU used to communicate directly with the memory and the I/O devices. This memory is known as main memory or primary memory. The reason for calling it main or primary memory is that, to execute a program, it is essential that the program is present in the main memory; for the execution of a program and for fetching data, it is this memory which is accessed. That is why it is called main memory or primary memory. But as we have seen, main memory, which is usually realized using dynamic RAM, is slow, and while the speed of processors is increasing at a high rate, the speed of dynamic RAM is not increasing at that rate. So the gap is widening, we have to bridge the gap, and that can be done by using cache memory. What will be done is that the CPU will not communicate with main memory directly; first it will communicate with the cache memory, which is much faster and is usually realized using static RAM. So it may be one order of magnitude faster compared to main memory realized using dynamic RAM.
But one point you must understand: the cache memory may be faster, but it is much smaller in size. That means only a small portion of the main memory is present in the cache memory at any time. Then the question arises: how do you improve performance? Performance can be improved by exploiting the spatial and temporal locality that is inherent in program execution, and this will help us improve performance even with a small cache size; we shall see how that is achieved. Here I have shown the main memory, which is byte addressable, and let us assume the total size of main memory is 2 to the power n bytes, where n is the number of address bits generated by the CPU. That means the number of address lines that comes out of the processor is n; this is the address bus, so you can have a main memory of size 2 to the power n bytes, as shown in this diagram. The main memory organization is shown here, starting with byte 0, 1, 2, and going up to byte 2 to the power n minus 1. However, when you organize it in this way, you can see that a word will comprise several bytes. Suppose you are considering a 32-bit CPU; in such a case you will be accessing 4 bytes at a time. So these 4 bytes, 1, 2, 3, 4, form a word and will be accessed simultaneously. That means although the unit of reference is the byte, access is done by the CPU in terms of words. This must be clear to you: for a 32-bit CPU, 4 bytes are accessed; for a 64-bit CPU, 8 bytes are accessed simultaneously, and the memory has to be organized accordingly. Now let us see what is done in the case of cache memory. Side by side we are implementing a cache memory, but the cache will have a much smaller size. The capacity of a cache is usually specified in terms of lines; let us assume the total number of lines is m, numbered 0 to m minus 1.
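The byte-versus-word addressing just described can be written down in a few lines. This is only an illustrative sketch (the function names are mine, not from the lecture), assuming a 32-bit CPU with 4-byte words:

```python
# Sketch: how a byte address maps to a word for a 32-bit CPU.
# The CPU references individual bytes, but fetches a whole 4-byte word.
BYTES_PER_WORD = 4

def word_of(byte_address):
    """Return the word number that contains this byte address."""
    return byte_address // BYTES_PER_WORD

def bytes_in_word(word_number):
    """Return the four byte addresses fetched together as one word."""
    base = word_number * BYTES_PER_WORD
    return list(range(base, base + BYTES_PER_WORD))

# Byte addresses 4, 5, 6 and 7 all belong to word 1 and are fetched together.
```

For a 64-bit CPU the same sketch applies with BYTES_PER_WORD set to 8.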
Assume that each line holds k words, which together are called a block; that means you have got one block in one line. So this is a block; I shall discuss the other bits later. A block comprises several words, let us assume four words. It is not mandatory that a block must have more than one word; only one word may be present in a block, but as we shall see, to exploit the locality of references, more than one word is usually present in a block. So we shall assume a few words are present in a block, that is, k words in each line. As I have already mentioned, the transfer between the CPU and the cache memory takes place in terms of words, because the CPU wants one word at a time. On the other hand, since only a small portion of the main memory is present in the cache, you cannot expect to always find your instruction or data in the cache memory. Whenever the data or instruction is not present in the cache memory, it is called a cache miss. On the other hand, if the corresponding block is present in the cache, it is called a cache hit. The question arises: how do you identify a cache miss and how do you identify a cache hit? That we shall discuss in detail. But one point you have to remember: whenever a cache miss occurs, you have to get the word that is required by the CPU from the main memory, and from the main memory, blocks are transferred. Notice that the transfer is not in terms of words; whenever there is a miss, not just a word is read from the main memory, but a whole block is transferred. So here the transfer takes place in terms of blocks.
If several words are present in a block, all the words will be transferred one after the other, although the CPU will require only one word at that moment. That means once the block is transferred to the cache memory, only one word will be used by the processor at that moment, and the other words transferred from main memory to the cache will be used subsequently. The basic operation can be explained with the help of this flow chart; everything actually happens in hardware, but for the purpose of explanation, the way the cache memory works can be described by this flow chart. The first thing that is done is to receive the address from the CPU: the CPU generates an address, denoted RA, and that address is applied to the memory system. Then the memory system has to find out whether the block containing RA is in the cache; that is, as I told you, it has to find out whether the particular word being asked for by the CPU is present in the cache or not. If the answer is yes, it is very simple: the corresponding instruction or data present in the cache memory is delivered to the CPU from the cache memory. Since the cache memory is fast, whenever it goes through this path the operation completes quickly. Now, what happens when the word requested by the CPU is not present in the cache? How that can be identified we shall discuss a little later, but let us assume it is not present; then we say a cache miss has occurred. In that case the memory system will automatically access the main memory for the block containing RA, the received address. Then one more thing has to be done: the block which has been read from the main memory has to be stored in the cache memory.
So you have to allocate a cache slot for the main memory block. Once a block is read, it has to be stored in one of the m lines. The question arises which line it will be stored in; that is called allocation of a particular line, and we shall discuss later how that line is identified. That is another important thing you should understand: you have got m lines, and one of them has to be allocated for the block being transferred from the main memory to the cache memory. Then deliver the RA word to the CPU. So once you have transferred the block to the cache memory and stored it in the allocated cache slot, the requested word can be delivered to the CPU, and the main memory block is loaded into the cache slot; these can be done together. Whether you load it into the cache slot first or deliver it to the CPU first is immaterial, because once you have read it from the main memory it can be immediately transferred to the CPU and then loaded into the cache. Why have these two steps been shown separately? The reason is that the width of the bus may be the same as the size of a word, so if you have to transfer multiple words it will require multiple memory cycles, which can be deferred, if necessary, to a later time. So this, in a nutshell, is the way the cache memory works. Now let us consider several issues. One important issue is the size of the cache with respect to the main memory. It should be small enough that the average cost per bit of the whole memory system stays close to that of the main memory; if the cache size is very large, the cost will be high, and not only the cost.
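The flow chart just described can be sketched in a few lines of Python. This is a minimal model, not the lecture's hardware: the cache is a plain dict keyed by block number, a block is assumed to hold 4 words, and valid bits and tags (discussed later) are abstracted away by dict membership.

```python
# Sketch of the cache-read flow chart: hit -> deliver word from cache;
# miss -> fetch the whole block from main memory, allocate a slot, deliver.
BLOCK_SIZE = 4  # words per block (an assumption for this sketch)

def cache_read(cache, main_memory, ra):
    """Service a CPU read of word address ra; return (word, was_hit)."""
    block_number, offset = divmod(ra, BLOCK_SIZE)
    if block_number in cache:                  # block containing RA is in cache
        return cache[block_number][offset], True
    # Cache miss: access main memory for the block containing RA,
    # load the block into a cache slot, then deliver the RA word.
    base = block_number * BLOCK_SIZE
    block = main_memory[base:base + BLOCK_SIZE]
    cache[block_number] = block
    return block[offset], False
```

Note that a second read of any word in the same block now hits, which is exactly the spatial locality being exploited.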
Whenever you make a memory larger, it becomes slower as well, because if you look into the details of the implementation of cache memory, which is realized using static RAM, there is an address decoder, and that address decoder becomes more and more complex as the size of the memory grows, which makes it slower. So the cache has to be small from two viewpoints: number one, the cost should be small, because cache memory is costlier; second, it has to be fast. From those considerations, the size of the cache memory should be small. But another requirement is that it should be large enough that the average access speed of the memory system is close to that of the cache. So one angle says the cache memory should be smaller; another angle says it should be larger. Why should it be larger? Because then most of the time the processor will find the instruction or data in the cache memory, the access time will reduce, and the speed will be close to that of the cache. These two are conflicting requirements, so you have to optimize: it should not be too large, and it should not be too small. Typically the cache size is between 1 K and 512 K. You may be asking why such a wide range. The reason is that in the early years, when memory was very costly and the cache was first introduced, the size of the cache was pretty small. With the advancement of VLSI technology, the cost as well as the performance of cache memory has improved: cost has reduced, performance has improved, and as a result, with the passage of time, the cache memory size has also increased. So you will find that in a modern processor the size of the cache memory is not 1 K. Here 1 K means 1 kilobyte; when the unit is not mentioned, you should always read it in terms of bytes. Rather, 64 K, 128 K, 256 K, or 512 K are the typical cache memory sizes present in modern, contemporary processors.
It has been found that this gives the optimum result. So, satisfying these two contradictory requirements, you arrive at some compromise which gives you the optimum: the cache is not very small, not very large, and it gives you cost closer to the main memory and speed closer to the cache memory. That is issue number one. The second issue is: where can a block be placed? I have already mentioned this question. You will be reading a block from the main memory, and you have to allocate one of the cache lines to store that block. That question is answered by what is known as mapping. As there are far fewer lines in the cache than there are main memory blocks, a suitable technique for mapping main memory blocks into cache lines is necessary. So you have to use an appropriate mapping technique, and it has been found that the mapping functions can be divided into three basic categories. The first one allows only one place in the cache for a given block of main memory. That means if you read a block from the main memory, say from here, it will always go to one fixed line. It is not really a one-to-one mapping, because many blocks of main memory will map to that same line; it is a many-to-one mapping. You have got many blocks of main memory, and for each of them the line is fixed; for example, this other block may also map to this same line. But the placement is fixed and does not change. So this is the one-fixed-place category, and it is known as direct mapping; I shall elaborate on this technique in detail a little later.
The second possibility is that you can have a few places; that means there are a few alternatives, so you are introducing some flexibility. Instead of always storing in one particular line, you can have more than one alternative: 2 or 4 or 8, a limited number. So you can store the block either in this line or in another alternative line, in one of a few places. This is a little more flexibility that is being added, and later on we shall see how this improves the performance and at what cost. The third alternative is any place. Actually, these two labels have changed places on the slide: few places corresponds to set associative mapping, where one of the lines of a set can be used for placing the block, and any place corresponds to associative mapping, that is, fully associative mapping. In fully associative mapping, a particular block from the main memory can be stored anywhere. So these are the three alternatives which have been explored, and we shall see their advantages and disadvantages. These three techniques have led to three possible ways by which you can identify where a block has to go. The first technique is known as indexing: there is one fixed place for a particular block of main memory where it can be stored, and it is found by a technique called indexing. The second is done by a limited search; that is set associative, meaning some search will be performed within a set of the cache to identify where the block can be placed. The third is a full search: the line where the block has to be stored is found by a full search, and it can be stored in an appropriate place based on some policy which I shall discuss a little later. This is done by first breaking down an address into three parts. What are the three parts?
This is the address which is generated by the CPU, and it can be divided into three parts. The first part is the block offset. What do you mean by block offset? It specifies which byte within a particular block is being referenced. Suppose a particular block has got four words and each word has four bytes. Then how many bits do you require in the block offset? For four words you will require two bits to identify which one of the words, and two more bits to identify which byte of the four bytes. So you require four bits as the block offset whenever you have got four words in a block and each word is of four bytes. If only one word is present in a block, which, as I said, is also a possibility, then the word-selection bits will not be required, and you will require only two bits as the block offset. Essentially, the block offset bits identify the different bytes present within a block. The remaining, main part of the address is used for identifying the block; this is known as the block address. The block address is again divided into two parts. One part is known as the index. The question arises: what do you mean by index, and how many bits are present in the index? Say you have got m lines present in the cache, numbered 0 to m minus 1. How many bits do you require to identify one of the lines? You will require log base 2 of m bits, and that is what the index specifies. So the index is the part of the address which identifies one of the lines of the cache; it points to the line where there is a possibility of storing that particular block. That is indexing: the index field points into the cache memory. Now, why do you need the third part?
You have already identified a particular place in the cache where the block can be stored, but as I said, it is a many-to-one mapping: depending on the number of bits present in the remaining part of the address, there will be many alternatives. If the number of remaining high-order bits is some number l, then 2 to the power l different blocks of main memory can be mapped to the cache line having the same index. One of those 2 to the power l blocks will be present in that line, and the question arises: which block of main memory is it? That is found out by storing a tag as part of the cache memory. So here you are storing the tag: not only the data but also the tag field, that is, the higher-order bits of the block address, is stored in the cache memory. What will be done is that the tag field of the address will be compared with the stored tag, and if they match, then you know that the particular block referenced by the processor is present; that means a cache hit occurs if the address tag matches the stored tag. On the other hand, if it does not match the stored tag field, it is a miss: if this matches, it is a hit; if not, it is a miss. So you can see the role of the three different fields; I have discussed them in detail, and later, when I discuss the different types of mapping, we shall see how they are used. Now let us consider the following system: addresses are of 32 bits, generated by the CPU, and the block frame size is 2 to the power 2, that is, 4 bytes; that means the size of the block has been assumed to be one word, assuming a 32-bit processor.
The cache is 64 kilobytes; that means 2 to the power 16 bytes are present in the cache, and since 4 bytes are present in each block, the total number of lines, which are also called block frames, is 2 to the power 14. So for each cache block brought in from the memory, there is a single possible frame among the 2 to the power 14 available; that is the number of frames or lines present, and we have to compare the tag to identify whether the particular address generated by the CPU is present at that moment or not. This is elaborated here: as you can see, the address is 32 bits, and bits 0 and 1 are used as the byte offset, which is here the block offset. Then r is the number of bits used for indexing; since we have already seen that you require 14 bits for indexing, bits 2 to 15, that is 14 bits, are used for indexing, and the remaining 16 bits are used as the tag. The number of blocks in main memory is 2 to the power s; in this particular case s is 16 plus 14, that is 30, so 2 to the power 30 is the number of blocks in main memory. The number of lines in the cache memory is m, which is equal to 2 to the power r. The address length is s plus w: s is the block address and w is the block offset, and together they form the address. So you can see what will be stored in the cache memory: the data will be present, which is 32 bits, assuming a 32-bit processor; then you will require 16 bits as the tag field. You see there is another bit present, v; what do you really mean by v? Your cache line comprises three fields: the first part is the data, where the useful information that will be required by the CPU is stored; then, as we have seen, you need to store the tag corresponding to each frame or line, one tag per line.
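The field widths in this example can be checked with a short sketch. This assumes exactly the parameters above (32-bit addresses, 4-byte blocks, a 64 KB direct-mapped cache); the helper name split_address is mine:

```python
# Checking the field widths: 32-bit address, 4-byte block, 64 KB cache.
ADDRESS_BITS = 32
BLOCK_BYTES  = 4
CACHE_BYTES  = 64 * 1024

offset_bits = BLOCK_BYTES.bit_length() - 1             # log2(4)     = 2
lines       = CACHE_BYTES // BLOCK_BYTES               # 2**14 block frames
index_bits  = lines.bit_length() - 1                   # log2(2**14) = 14
tag_bits    = ADDRESS_BITS - index_bits - offset_bits  # 32 - 14 - 2 = 16

def split_address(addr):
    """Split a 32-bit byte address into (tag, index, block offset)."""
    offset = addr & (BLOCK_BYTES - 1)
    index  = (addr >> offset_bits) & (lines - 1)
    tag    = addr >> (offset_bits + index_bits)
    return tag, index, offset
```

So the 2-bit offset, 14-bit index and 16-bit tag fall out directly from the block size, the number of lines, and the address width.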
Another bit has been added, v, which is called the valid bit. Why do you need this additional bit? The need for the valid bit arises because when you turn the computer on, your cache is blank: no information, no useful data is present in the cache; it is full of garbage. You have to take that situation into account as well. So when the computer is turned on, the valid bit of every line is set to 0; in the beginning, the valid bits of all the lines are 0. Now suppose the CPU generates an address. Using the index field, it will point to a particular line; let us assume it points to this line. If the valid bit is 0, that immediately indicates that the corresponding data or instruction is not present in the cache memory: if v is equal to 0, it is a cache miss. What will be done then? At that time you will fetch the block from the main memory, as we have seen; you store the data here and store the tag field here, and then the valid bit is set to 1. So there will be a tag field present here and data present here, and once they are stored, the valid bit is set to 1. The next time, if the index field is the same, not only will this valid bit be checked, but this stored tag will be compared with the higher-order bits of the address; that means the tag field has to match, which I shall elaborate with the help of this diagram. As you can see here, you have got a 32-bit address; these two bits are the block offset, or byte offset, which are identical at this moment because one block comprises only one word; that is why the byte offset and block offset are the same in this case. The index field is used to point to a particular frame in the cache, as has been shown. So first it checks whether the valid bit is set or not.
If the valid bit is 0, then the hit output will be 0; it is an AND gate, so if any input is 0, the output is 0, which means a miss; only when all inputs are 1 is it a hit. In addition, it will compare the higher-order 16 bits of the address with the stored tag field, the tag that was stored at the time of bringing the block into the cache memory. This stored tag will be compared with the higher-order address lines, which are known as the tag field of the address, and if they are the same, this comparator output will be 1. So the tag has to match and the valid bit also has to be 1 to get a hit; if the tag does not match, then even when the valid bit is 1 it will be a miss, because it is a many-to-one mapping and there is a possibility that the particular cache line does not correspond to this address. In such a case, in spite of the valid bit being 1, you will not get a hit, because the tag field is not matching. Now, you can see the relationship given here: i = j mod m, where i is the cache line number, that is, the line number within the cache memory, j is the main memory block number, and m is the number of lines present in the cache. It is very easy to compute by breaking the address up in this manner. We have seen that in our case the number of blocks is 2 to the power s, and m is equal to 2 to the power r. Since m is a power of 2, j mod m is simply the lower-order r bits of the block address; those r bits form the index. The quotient, j divided by m, can take 2 to the power s minus r different values, and that quotient is exactly the tag. So the index i = j mod m is found directly from the r index bits of the address.
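The relationship i = j mod m can be demonstrated with a tiny sketch. This assumes, for illustration, a cache with 2 to the power 3 lines; the function name direct_map is mine:

```python
# Direct mapping: cache line i = j mod m, where j is the main-memory
# block number and m = 2**r is the number of cache lines.  Because m is
# a power of two, the mod is just the low-order r bits of j, and the
# tag is the remaining high-order bits (the quotient j // m).
r = 3            # assume 2**3 = 8 cache lines for this small example
m = 2 ** r

def direct_map(j):
    """Return (line, tag) for main-memory block number j."""
    line = j % m           # equivalently: j & (m - 1)
    tag  = j // m          # equivalently: j >> r
    return line, tag

# Blocks 0, m, 2m, ... all land on line 0, distinguished only by tag.
```

This is why no division hardware is needed: the index and tag are read straight off the address bits.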
So that is how you get this value, and the relationship i = j mod m is what is used for direct mapping. Now you can see here how the mapping takes place in direct mapping; as I have told you, it is a many-to-one mapping, so a single cache line is shared by many main memory blocks. Main memory blocks 0, m, 2m, 3m, and so on, up to block 2 to the power s minus m, are all assigned to cache line 0. Similarly, cache line 1 will be mapped by blocks 1, m plus 1, 2m plus 1, and so on up to 2 to the power s minus m plus 1. And the last line, line m minus 1, will be mapped by block m minus 1 of the main memory, or block 2m minus 1, and in this way up to block 2 to the power s minus 1. So you can see this is how it takes place in the case of direct mapping. Now let me elaborate; this particular thing has been shown in this example, where you can see the different main memory blocks which map to a single cache line. In this particular case you have got 64 kilobytes of main memory and 4 kilobytes of cache memory. Accordingly, the byte offset is 2 bits. The cache is 4 kilobytes, which means you have 1 K lines, and for addressing 1 K lines you require 10 bits as the index. So 10 plus 2 is 12 bits, and out of the 16 address bits you have got only 4 bits left for the tag field. What does that mean? Four bits in the tag field means 2 to the power 4 different blocks of the main memory can map to a single cache line. So 16 different blocks of the main memory can map to this line, and which one is present can be found out from the tag field.
The tag field identifies which one of these blocks is present. Let me explain this with a much simpler example. Suppose your main memory has got 2 to the power 5 blocks, and your cache memory has got only 2 to the power 3 lines; that means you have got 8 lines. So 8 lines are present here; this is your cache, and your main memory has blocks 0 through 2 to the power 5 minus 1. I have considered blocks only; I have not stated anything in terms of words or bytes. So one of the 2 to the power 5 blocks can be stored in a line. Each line is divided into fields: the valid bit, the tag, and then the data. The index values of the 8 lines run from 0 0 0 to 1 1 1, and the 32 block addresses of main memory run from 0 0 0 0 0 to 1 1 1 1 1. Now suppose the block address generated by the processor is 0 1 0 1 0. Where will it search? The lower three bits will be used for indexing. The lines are indexed 0 0 0, 0 0 1, 0 1 0, 0 1 1, 1 0 0, 1 0 1, 1 1 0, and 1 1 1, so index 0 1 0 will point to this line; indexing is done by this field. So once this address is generated, it will select this line. The valid bit, which was initially 0, will now be set to 1; you will write 0 1, the tag field, into the line, and the data present at location 0 1 0 1 0 will be stored here. Now suppose subsequently the CPU generates another address, a equals 0 0 1 0 1. In such a case the index bits 1 0 1 will point to that line, its valid bit will become 1, the tag 0 0 will be written, and the corresponding data will be stored there. Now suppose the third address that is generated is 1 0 0 1 0.
So the address 1 0 0 1 0 has been generated; its index 0 1 0 matches this line, so it will point here. Now you can see the valid bit is 1; however, the stored tag is 0 1 and the address tag is 1 0. So at this moment there will be a cache miss, in spite of the fact that some data is present here, because it does not correspond to the third address that has been generated. This will be a miss, the stored tag will be replaced by 1 0, and correspondingly the data stored here will be replaced. This is how it works. I have illustrated it with a very small main memory and cache memory; in real-life processors, obviously, the number of bits used for indexing and the number of bits in the tag will be much larger. Now, this is how it is done in direct mapping, and as you can see it is quite simple: directly from the address bits you can separate out the index field, which is applied to the cache memory; then a comparison can be done with the tag field, and by checking the valid bit it can be identified whether it is a hit or a miss. But this direct mapping has a number of disadvantages. Number one is the fixed cache location for a given main memory block: as I have already mentioned, direct mapping gives you a fixed line for a given main memory block. So two words with the same index but different tag values cannot reside in the cache simultaneously. This is a serious restriction, because only one of them can be stored. What does it mean? Take, for example, these two addresses, 0 1 0 1 0 and 1 0 0 1 0, generated one after the other.
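The walkthrough above, with the three block addresses 01010, 00101 and 10010, can be replayed with a small sketch of this 8-line direct-mapped cache. Only the valid bit and tag are modelled (the data field is omitted), and the function name access is mine:

```python
# Replaying the example: 2**5 main-memory blocks, a direct-mapped cache
# with 2**3 = 8 lines; each line holds a valid bit and a 2-bit tag.
LINES = 8

def access(cache, block_address):
    """Access a 5-bit block address; return True on a hit, False on a miss."""
    index = block_address % LINES      # low 3 bits select the line
    tag   = block_address // LINES     # high 2 bits are the tag
    valid, stored_tag = cache[index]
    if valid and stored_tag == tag:
        return True                    # hit: valid bit 1 and tag matches
    cache[index] = (True, tag)         # miss: load the block, update the tag
    return False

cache = [(False, 0)] * LINES           # all valid bits are 0 at power-on
# 0b01010 -> line 010, tag 01: miss (valid bit 0), line loaded
# 0b00101 -> line 101, tag 00: miss, line loaded
# 0b10010 -> line 010, tag 10: miss (stored tag 01 != 10), line replaced
```

Running the three accesses shows exactly the behaviour described: the third address lands on an occupied line but misses because the tags differ.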
If it is direct mapping and, say, these two addresses are generated alternately, each time there will be a miss. Since two main memory blocks having the same index field cannot reside in the cache simultaneously, this will always lead to a miss. This particular restriction can be overcome by using another type of mapping, as we shall see. As I have explained with the help of this example, direct mapping is vulnerable to continuous swapping: continuous swapping will take place in such a case, and this can be overcome by using another type of mapping, like associative mapping, where a full search is done. This is the other extreme. In the earlier case the restriction was that only one frame or line could be used to store a given block; in this case, the extreme is that you can store any block in any line. That is associative mapping. But in such a case, what do you have to do? You have to compare the tags of all the lines simultaneously with the tag that is coming from the CPU; you have to do what is known as a parallel search, meaning you search all the lines in parallel. Although I have shown only one comparator, the number of comparators required will be equal to the number of lines or frames present in your cache memory. So associative mapping allows any main memory block to be mapped into any cache line; this is the flexibility you are getting, but it makes the cache very costly. This type of cache is known as a content addressable memory, or CAM, because a part of the address is stored in the memory and used for matching. Obviously this will give better performance, but it is extremely expensive to implement. So we find that we have got two extremes, direct mapping and fully associative mapping, and both of them have their own advantages and limitations.
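The fully associative lookup can be sketched as follows. Keep in mind the crucial difference from the code: hardware (the CAM) compares the tag against every line simultaneously with one comparator per line, whereas software can only model that with a loop. Here, since a block can go anywhere, the entire block address serves as the tag; the function name is mine:

```python
# Sketch of fully associative lookup: each line is (valid, tag), where the
# tag is the whole block address, and any line may hold any block.  Real
# hardware does all these comparisons in parallel (one comparator per line).
def associative_lookup(cache, block_address):
    """Return the matching line number on a hit, or None on a miss."""
    for line_number, (valid, tag) in enumerate(cache):
        if valid and tag == block_address:
            return line_number
    return None
```

On a miss, any line may be chosen to receive the incoming block, which is why a replacement policy is needed; that choice is the policy question deferred to later.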
Now, in my next lecture I shall discuss another technique, known as set associative mapping, which tries to achieve the best of both worlds, that is, the good features of direct mapping and the good features of fully associative mapping. It tries to incorporate both in that particular technique, set associative mapping, which I shall discuss in my next lecture. Thank you.