In this video, we are going to explain the concept of the set associative cache. In order to understand this concept, we first have to remember the other two possibilities that we have studied so far. The first one was direct mapping, and perhaps the best way to illustrate it is to look at the memory address emitted by the microprocessor. Let's assume it has d bits, numbered 0 to d minus 1, and we divided that address into three fields. The lowest field is what we call the offset; in direct mapping, the middle field gives the cache block in which the memory block will be stored; and the third field, the most significant bits, is the tag. So in direct mapping, every entry of the cache memory contains three columns: the tag, which is obtained directly from the address, the data, and a validity bit. The important feature here is that the cache block bits directly give us the only possible entry in which this address, or rather the block containing this address, can be stored in the cache memory. So the restriction was that one memory block can go to only one single possible cache location. That is direct mapping. In contrast, we also studied associative mapping, which basically lifts this restriction: rather than forcing us to put a memory block in one single position of the cache, we take the memory address as before, we still remove the offset bits, and the remaining bits are what we call the tag. The cache memory still has a structure similar to the one presented before, with three columns: the tag, the data, and the valid bit. But now the block can be stored in any of these entries, so we have to compare the address tag against all possible tags, which means we require a lookup of the tag in all blocks.
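The direct-mapped address split described above can be sketched in a few lines of Python. All of the parameters here (a 32-bit address, 64-byte blocks, 1024 cache lines) are hypothetical choices for illustration, not values given in the video.

```python
# Hypothetical direct-mapped geometry: 32-bit address, 64-byte
# blocks (6 offset bits), 1024 cache lines (10 index bits).
BLOCK_SIZE = 64
NUM_LINES = 1024

OFFSET_BITS = BLOCK_SIZE.bit_length() - 1   # log2(64)  = 6
INDEX_BITS = NUM_LINES.bit_length() - 1     # log2(1024) = 10

def split_direct(addr):
    """Split an address into the three fields of direct mapping:
    (tag, cache block index, offset within the block)."""
    offset = addr & (BLOCK_SIZE - 1)
    index = (addr >> OFFSET_BITS) & (NUM_LINES - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

tag, index, offset = split_direct(0x1234ABCD)
```

Here `index` points at the one and only cache line where the block can live, which is exactly the restriction the video describes.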
This was both an advantage and a disadvantage: an advantage because it gives me more flexibility to store a block in any available slot of my cache, but a disadvantage because it makes the implementation of this memory much more complex, since the read and write operations require a tag lookup in all the blocks. The set associative cache is the policy derived by mixing both of these schemes, and the best way to illustrate it is to consider again the memory address emitted by the microprocessor, with d bits numbered d minus 1 down to 0. Memory is always divided into blocks, and the lower bits of the address give me the position within the block; but now we have an additional field that determines the set I am going to consider, and the higher-order bits still give me the tag. This set means the following. The implementation of the cache memory still has three columns and a number of entries, but now all the slots of the cache memory are divided into what we call sets: set zero, set one, set two, and set three. Each one of these sets has exactly the same structure that we have seen so far: all of them are identical, all of them have the same size, and every entry has three fields. The first column is the tag, the second column, as before, is the data (let's not forget that this memory is storing a copy of the data that is stored in main memory), and the third is the validity bit of each entry.
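The three-field split for the set associative case can be sketched in the same style. The geometry (64-byte blocks, 4 sets) is again a hypothetical example chosen only to make the bit counts concrete.

```python
# Hypothetical set associative geometry: 64-byte blocks, 4 sets.
BLOCK_SIZE = 64
NUM_SETS = 4

OFFSET_BITS = BLOCK_SIZE.bit_length() - 1   # log2(64) = 6
SET_BITS = NUM_SETS.bit_length() - 1        # log2(4)  = 2

def split_set_assoc(addr):
    """Split an address into (tag, set index, offset): the offset
    selects the byte in the block, the set bits select one set,
    and the high-order bits form the tag."""
    offset = addr & (BLOCK_SIZE - 1)
    set_index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + SET_BITS)
    return tag, set_index, offset
```

Compared with direct mapping, the index field has shrunk to a set index: it no longer names one line, only one group of lines.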
So now we have these four sets, numbered zero to three, and here we can see why the set associative cache is a mixture of the two previous schemes: the set bits of the address uniquely determine one of these sets, the only set in which the memory block referred to by this address can be stored; but within that set we still follow an associative mapping, and therefore we still need to do a lookup. So in this case, one memory block can go to only one single set, but to any slot within that set. Again, this is a mixture of the two previous ideas, because even though we do have a set of bits that restricts the portion of the cache that needs to be looked up, inside that set we apply an associative mapping. This is also why this heuristic is called set associative: we divide the cache into sets, but within each one of these sets we apply an associative cache. The other interesting observation here is that the size of the data in the cache is still the number of sets multiplied by the number of blocks per set multiplied by the size of each block. From that point of view, these memories are all fairly similar: they all store a tag, the data, and the validity bit; the difference is in how the data is stored and retrieved. So here what we have, in principle, is the best of both worlds: a simpler lookup operation, simpler mainly because instead of checking all the tags in all possible entries of the cache memory we only check the tags that are members of the set specified by the address, and at the same time a still flexible placement of the blocks, because within that set we can use any of the available slots.
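Both observations above, the capacity formula and the per-set lookup, can be sketched together. This is a minimal illustration of the organization, with a hypothetical geometry of 4 sets, 2 blocks per set, and 64-byte blocks; a real cache would of course be hardware, not a list of dictionaries.

```python
NUM_SETS = 4
WAYS = 2           # blocks per set
BLOCK_SIZE = 64    # bytes per block

# Data capacity = number of sets * blocks per set * block size.
CAPACITY = NUM_SETS * WAYS * BLOCK_SIZE   # 4 * 2 * 64 = 512 bytes

# Each entry holds the three columns: validity bit, tag, data.
cache = [[{"valid": False, "tag": None, "data": None} for _ in range(WAYS)]
         for _ in range(NUM_SETS)]

def lookup(tag, set_index):
    """Compare the tag only against the entries of one set,
    not against every entry in the cache."""
    for entry in cache[set_index]:
        if entry["valid"] and entry["tag"] == tag:
            return entry["data"]   # hit
    return None                    # miss

# Place one block in any free slot of set 2, then find it again.
cache[2][0] = {"valid": True, "tag": 7, "data": "block-7"}
```

The loop in `lookup` runs over only `WAYS` entries, which is the "simpler lookup" the video mentions, while the block could have gone into either slot of its set, which is the remaining flexibility.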
So as you can see, these three types of heuristics will give different performance, that is, different hit and miss ratios. Typically what happens is that designers carry out a very large number of simulations with memory traces obtained from real executing programs, and with those traces they can anticipate the performance of each one of these heuristics. Most importantly, for the set associative cache they can adjust the number of slots we have per set and the number of sets. So you can perform countless simulations of memory accesses and tune the size of these memories so that they give you the best performance for the execution of your programs.
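A toy version of such a trace-driven simulation might look like the sketch below. It measures the hit ratio of a set associative cache with LRU replacement inside each set; everything here is purely illustrative, and real cache simulators model far more detail (timing, write policies, hierarchies).

```python
def simulate(trace, num_sets, ways, block_size):
    """Return the hit ratio for a list of byte addresses on a
    set associative cache with LRU replacement per set."""
    offset_bits = block_size.bit_length() - 1
    sets = [[] for _ in range(num_sets)]   # tags per set, LRU order
    hits = 0
    for addr in trace:
        block = addr >> offset_bits
        s = block % num_sets               # set selected by the address
        tag = block // num_sets
        if tag in sets[s]:
            hits += 1
            sets[s].remove(tag)            # hit: move tag to MRU position
        elif len(sets[s]) >= ways:
            sets[s].pop(0)                 # miss in a full set: evict LRU
        sets[s].append(tag)
    return hits / len(trace)
```

By rerunning `simulate` with different `num_sets` and `ways` over the same trace, one can compare configurations the way the video describes: for example, two addresses that conflict in a 1-way (direct-mapped) set may coexist once the set has 2 ways.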