Hello and welcome to today's lecture on main memory optimizations. In the last lecture we discussed the organization of main memory, the different types of main memory that are used and how they are organized. Just as we discussed various cache optimization techniques, today we shall focus on techniques for optimizing the main memory. As we know, one of the most critical bottlenecks in high-performance processing arises when we try to interface a high-performance processor to the main memory. Dynamic RAM is used as the main memory, and that situation has not changed over the years; although static RAM is faster, its cost is much higher, so as main memory dynamic RAM remains the choice of the day. The main reason is cost, and the second reason is the high packaging density possible with dynamic RAM: a single chip can hold 256 kilobits or more. This has led to the use of dynamic RAM as the main memory in virtually all systems. So whenever we say main memory optimization, essentially we shall see how we can improve the performance of dynamic RAM memory systems. There is a performance gap, and this gap is increasing over the years, because processor performance has been improving at the rate of about 50 percent per year while dynamic RAM performance improves at only about 9 percent per year. How to bridge this growing gap is the main question, and we shall try to address it using different techniques. One technique is to increase the bandwidth, that is, the rate at which data transfer takes place between the processor and the main memory. Increasing that transfer rate is what we mean by higher bandwidth, and it can be achieved in three different ways, which we shall discuss.
One is the use of wider memory, the second is the use of interleaved memory, and the third is the use of independent memory banks. These are the three commonly used techniques for increasing the bandwidth of main memory systems, and after that we shall discuss advanced dynamic RAM organizations. Many innovations have taken place in the organization of dynamic RAM with the objective of making it faster, so that the gap between the processor speed and the rate at which data can be provided by the dynamic RAM is reduced. So one obvious way to improve main memory performance is to have higher memory bandwidth; increasing the memory bandwidth brings more bytes per unit time from the memory up the hierarchy. First we shall focus on wider memory. By wider memory we mean making the memory bus wider. Normally the width of the bus is the same as the word size of the processor: if it is a 32-bit processor the width of the bus is 32 bits, and if it is a 64-bit processor the width of the bus is 64 bits. Now we would like to make the bus wider, so that it transfers more than one word at a time, and that can improve performance. Since the CPU needs one word at a time, there needs to be a multiplexer between the CPU and the cache. That is, as we increase the width of the main memory bus, we require a multiplexer, because the processor will in any case fetch word by word: the CPU-side bus works only in terms of words. We can explain it this way. In the normal situation we have the CPU connected to the cache memory by a bus whose width equals the word size, and the cache connected to the main memory by another bus of the same width. So both are word-size buses.
Now what we are trying to do is to have a wider bus between the cache and main memory. The CPU still connects over a word-size bus, but between that word-size side and the wider bus coming from the memory you need a multiplexer; the multiplexer can be 2-to-1 or 4-to-1 depending on the width of the main memory bus. The main memory now has a wider bus, so more bits are transferred at a time: if the word is 64 bits, the memory bus can be 128 bits or 256 bits, depending on the width we choose. Let us now consider how the miss penalty is reduced by the use of a wider bus. For that we assume that one memory clock cycle is needed to send the address (the CPU sends the address along with the other control signals, which takes one cycle), 20 memory clock cycles are needed for each DRAM access (after receiving the address, the dynamic RAM takes 20 cycles, not one, to access the data), and one memory clock cycle is needed to send each word of data read from the main memory to the processor side. We also assume a cache block size of 4 words, which means that even in the one-word-wide case you have to transfer 4 words before control can be returned to the CPU. Let us first consider the situation where a one-word-wide DRAM bank transfers data between the cache memory and main memory.
Let us see what the miss penalty is for the standard memory. First, one clock cycle is needed to send the address; then, since the bus is one word wide, four DRAM accesses of 20 cycles each are required to read the 4 words of the block from the main memory, and four more cycles of one clock each are required to transfer the words into the cache. So the total is 1 + 4 × 20 + 4 × 1 = 85; 85 memory bus clock cycles are needed in the standard situation. Now let us assume that we have made the main memory bus two words wide; if the word is 64 bits the bus is 128 bits, which means we can transfer 2 words simultaneously. Let us see how the miss penalty is reduced. In this case you require 1 + 2 × 20 + 2 × 1, because you are sending 2 words at a time, so only two accesses of 20 cycles and two single-cycle transfers to the cache are needed: 43 memory bus clock cycles. So by having wider memory the transfer becomes almost twice as fast: 85 / 43 is very close to a 2× speedup. If you make it four words wide, the miss penalty becomes 1 + 1 × 20 + 1 × 1 = 22 cycles, so still more improvement is possible. But the wider the bus, the more costly it is to implement; that is why the width of the main memory bus is not made very wide, maybe 2 words or at most 4 words. Now let us consider another situation where we can use interleaved memory.
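The three cases above can be checked with a small function. This is just a sketch of the lecture's timing model (1 cycle for the address, 20 cycles per DRAM access, 1 cycle per bus transfer); the function name is mine, introduced only for illustration.

```python
def miss_penalty(block_words, bus_words,
                 addr_cycles=1, access_cycles=20, transfer_cycles=1):
    """Memory-bus clock cycles to fill one cache block.

    Each bus transfer moves bus_words words, so a block needs
    ceil(block_words / bus_words) sequential DRAM accesses,
    each followed by one bus transfer.
    """
    transfers = -(-block_words // bus_words)  # ceiling division
    return addr_cycles + transfers * access_cycles + transfers * transfer_cycles

# 4-word cache block, as assumed in the lecture:
print(miss_penalty(4, 1))  # one-word bus:  1 + 4*20 + 4*1 = 85
print(miss_penalty(4, 2))  # two-word bus:  1 + 2*20 + 2*1 = 43
print(miss_penalty(4, 4))  # four-word bus: 1 + 1*20 + 1*1 = 22
```

The 85 → 43 → 22 progression shows why widening helps but with diminishing returns against rising bus cost.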
What do we really mean by interleaved memory? As we know, main memory consists of multiple memory chips, so each chip can serve part of a request at any time. What are we trying to exploit here? Although we are considering a single memory system, that system comprises a large number of memory chips, and each chip may provide only 1 bit, or at most 8 bits, at a time. So why not exploit that? Several memory chips are there, and that parallelism can be exploited; we shall see how. For example, consider a typical memory bank where each memory chip is organized as 2^16 × 1, that is, the chip is bit-organized. What do you do? You transfer the address, and the chip select signal, simultaneously to all the memory chips, which means all of them get ready at the same time to provide data on the data bus. So this forms a 64K memory bank of 8-bit width. Now this idea can be extended to realize multiple memory banks in an interleaved memory system. What has been done here? Instead of having a single bank, the CPU interacts with the cache memory, and between the cache memory and main memory we have several banks: bank 0, bank 1, bank 2, bank 3; assume we have got 4 banks. The cache memory applies the address and other control signals simultaneously to all of them. Since the address is supplied simultaneously to all of them, each takes 20 clock cycles to get ready to provide data; that is the access time, which as we have seen is 20 clock cycles. So after 20 clock cycles all the memory banks are ready, and after that you can transfer the words one after the other.
You read from memory bank 0, then memory bank 1, then memory bank 2, and then memory bank 3. So, after they get ready together, you transfer the words one after the other; that is what is done in an interleaved memory. So how is the miss penalty reduced? In this case you require one clock cycle to send the address, then 20 clock cycles for the memory banks to access their data in parallel, and then you read the 4 words one after the other, which takes 4 clock cycles. So a total of 1 + 20 + 4 = 25 memory bus clock cycles is needed to transfer the 4 words from the main memory to the cache. You see the reduction: 25 memory bus clock cycles instead of the 85 required in the normal situation. So there is a significant improvement in performance; the speedup is 85 / 25, roughly 3.4 times. This idea can be extended to have independent memory banks, which is a generalization of the concept of interleaving. Multiple memory controllers can be used, multiple banks can be used, multiple buses can be used; it is as if you have got multiple memory systems, and each of them works independently and simultaneously, in parallel, to provide the response to the CPU. Each memory system can itself be composed of interleaved memory banks. So what we are considering here are independent memory banks, each of which can be interleaved inside, and each such memory can have a distinct use: for example, input/output devices can read from one bank while another bank responds to requests from the processor, with the banks serving requests simultaneously. That is the concept of independent memory banks. Now let us see how dynamic RAMs are physically organized.
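The interleaved case, and the low-order address mapping that makes consecutive words fall in consecutive banks, can be sketched the same way. The timing constants are the lecture's assumptions; `bank_of` is an illustrative name of my own.

```python
def interleaved_miss_penalty(block_words, banks,
                             addr_cycles=1, access_cycles=20, transfer_cycles=1):
    """All banks start their access in parallel after the address is sent;
    the words then stream out, one bus transfer per cycle."""
    assert banks >= block_words  # enough banks to overlap the whole block
    return addr_cycles + access_cycles + block_words * transfer_cycles

def bank_of(word_address, banks=4):
    """Low-order interleaving: consecutive word addresses hit consecutive banks."""
    return word_address % banks

print(interleaved_miss_penalty(4, 4))        # 1 + 20 + 4*1 = 25 cycles
print([bank_of(a) for a in range(8)])        # [0, 1, 2, 3, 0, 1, 2, 3]
```

Compared with 85 cycles for the one-word-wide bank, 25 cycles is the roughly 3.4× speedup noted above.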
They are organized in the form of dual inline memory modules, known as DIMMs. A DIMM usually contains 4 to 16 DRAM chips. For example, with ×4 DRAM chips, each chip providing 4 bits of the data bus, eight chips side by side, 8 × 4, give a 32-bit wide bus, and more chips give the wider buses used in practice. These DIMMs are used nowadays to realize the main memory system in desktops, servers and workstations: each DRAM provides 4 bits, 4 to 16 DRAMs sit on a single printed circuit board providing an 8-byte-wide bus for the desktop, and you can have several such DIMMs in your system. Now we have seen how the data transfer rate can be enhanced; next we shall consider advanced DRAM organizations. Dynamic RAM has gone through a number of innovations to provide smaller and smaller access time, or in other words to provide data at a faster rate, and we shall consider them one after the other. The first technique is the SRAM cache: a static RAM cache built in as part of the dynamic RAM chip. The SRAM cache was the traditional way to improve the performance of DRAM; the basic DRAM core is unchanged, and only the interface around it is enhanced. Let me explain by drawing a diagram how the static RAM is incorporated as part of the dynamic RAM. I am not drawing the entire DRAM memory chip, only the part highlighting the incorporation of the static RAM. There is the column decoder, and next to it a 512 × 4-bit SRAM, holding the contents of a single row; then you have the other typical functionality required in a dynamic RAM, the sense amplifiers and column write select signals, and then the DRAM array itself, organized as 2048 × 512 × 4.
So this is how the DRAM is organized as a two-dimensional array, from which data goes through the sense amplifiers and column write select; from there a single row, one out of the 2048 (2K) rows, is transferred and made available, in the form of a cache, in the static RAM. From here it goes to the external world through the I/O control circuitry. Of course, you also require the row decoder: the address A0 to A10, an 11-bit address, comes here, and similarly there is a column address latch. The main idea is this: instead of reading every access from the dynamic RAM array, you latch one entire row here, and then by changing only the column address you read from it. That makes it faster, because once the row has been transferred to the static RAM, the access time is much lower than reading from the dynamic RAM array. So a small SRAM acts as a cache holding the last row read. This works because of the principle of locality: you tend to read from sequential address locations, which are all available within a single row, so from that single row you read one word after another. Now, an extension of that idea is the cache DRAM. Instead of a small SRAM, you have a large static RAM: here we added only 512 × 4 bits, that is 2 kilobits, but instead you can have, say, 64 kilobits. So a larger cache can be used, which really acts as a cache: first you transfer from the dynamic RAM to this large static RAM, and then from the large static RAM you read one word after another, as long as the data is available there.
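The benefit of holding the last row in an on-chip SRAM can be illustrated with a toy simulation. The 60 ns and 15 ns access times here are illustrative assumptions of mine, not figures from the lecture; only the 2048-row × 512 × 4 array shape comes from the discussion above.

```python
class RowBufferedDRAM:
    """Sketch of a DRAM with an on-chip SRAM row cache: a read that hits the
    currently latched row skips the slow array access entirely."""
    ROWS, COLS = 2048, 512          # array shape from the lecture's diagram
    SLOW_NS, FAST_NS = 60, 15       # illustrative array vs SRAM access times

    def __init__(self):
        self.latched_row = None     # no row cached yet

    def read_ns(self, row, col):
        # col is ignored in this sketch: any column of the latched row is fast.
        if row == self.latched_row:
            return self.FAST_NS     # served from the SRAM row cache
        self.latched_row = row      # latch the whole row, then read from it
        return self.SLOW_NS

d = RowBufferedDRAM()
print(d.read_ns(7, 0))   # 60: first touch latches row 7
print(d.read_ns(7, 1))   # 15: same-row access hits the SRAM
print(d.read_ns(8, 0))   # 60: a different row pays the full array access
```

Sequential accesses, which by locality mostly stay within one row, thus see the fast SRAM time.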
So, just as the cache memory works, where you transfer from the main memory to the cache memory and from the cache memory to the processor, in the same way it works here; the only difference is that data goes from the dynamic RAM array to the static RAM, and from the static RAM it is read by the external world. That is the cache DRAM concept, which is used in many situations. Next, the enhanced DRAM, which is very popular, operates on the idea of fast page mode: allow a row to remain available for multiple column address accesses, as I mentioned, holding the row data in the sense amplifiers for a longer period. The sense amplifiers in this case act as a kind of cache. We are not using a static RAM here; instead, one row of data is transferred, and at the output of the sense amplifiers that row is available, acting as a kind of latch. So in this case you are able to read without using a static RAM, but you are restricted to reading from the output of the sense amplifiers, by holding the row address strobe signal asserted while changing the column address strobe signal. The sense amplifiers function as a cache for DRAM rows: multiple column address strobes can access multiple words in the same row. Again this exploits spatial locality via successive accesses to the same row. The basic idea is the same as before, but no static RAM is used. So fast page mode shortens the cycle time by allowing the processor to use the same row address with different column addresses, as we have discussed.
This removes one step of the addressing sequence: normally a row address is provided, then a column address is provided, and only after both addresses are available is data read from the dynamic RAM; but here only one row address is provided, and then by changing only the column address you read one word after another. The data of a single row is referred to as a page. Extended Data Out (EDO) additionally allows the processor to overlap the data read cycle with the write of the next column address; that is, we overlap the data read with the presentation of the next column address. EDO results in a saving of approximately 10 nanoseconds for each read within a single page (a page, again, meaning one row). If we look at the timing diagram this becomes clear. The row address strobe signal (it is actually a strobe, not a select) is kept stable, while the column address strobe signal is changed. The row address strobe latches the row address into the row address buffer, the column address strobe latches the column address into the column address buffer, and the data becomes available after some time. You read this data out, and at the same time another column address can be latched by activating the next column address strobe signal. So the data read and the generation of the next column address are overlapped, which essentially allows the next data to be read in the next cycle. That is how EDO, extended data out, allows overlapping of the data read cycle with the write of the next column address: the write of the next column address and the data out both take place simultaneously.
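A rough model of page-mode reads, with and without the roughly 10 ns EDO overlap saving mentioned above. The 60 ns row access and 25 ns column access are assumed, illustrative constants, not from any specific datasheet.

```python
def page_reads_ns(n_reads, row_access_ns=60, column_access_ns=25,
                  edo_saving_ns=10, edo=False):
    """Time to read n words that all fall within one open row ('page').

    The row is opened once; each read then pays only the column access.
    With EDO, overlapping data-out with the next column address saves
    about edo_saving_ns per read.
    """
    per_read = column_access_ns - (edo_saving_ns if edo else 0)
    return row_access_ns + n_reads * per_read

print(page_reads_ns(4))            # fast page mode: 60 + 4*25 = 160 ns
print(page_reads_ns(4, edo=True))  # EDO:            60 + 4*15 = 120 ns
```

The per-read saving compounds over a burst, which is why EDO noticeably helps sequential access.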
You can also use burst mode in the EDO DRAM timing. The basic idea is that you generate a row address, the row address strobe signal latches it so that it remains in the buffer, and then you keep changing the column address one after the other. You apply one column address, subsequent column address strobe signals are generated, and a burst of data comes out, one word per cycle, from the same row; that data gets transferred. This is the basic idea of the burst mode of EDO dynamic RAM timing, and it also makes the dynamic RAM faster. Now let us consider synchronous dynamic RAM, SDRAM, which nowadays is used in almost all workstations and desktops. As we know, traditional dynamic RAMs are asynchronous. What do we really mean by asynchronous? Whenever we access a traditional dynamic RAM, the address along with the various control signals, the row address strobe, the column address strobe and so on, is made available to the DRAM chip, and after that various operations take place within the chip. As you have seen, there are large capacitances present inside the dynamic RAM chip, on the bit lines, the row lines and elsewhere, and those capacitances must be charged; then the sense amplifiers sense the data, which is transferred through the I/O circuitry to the outside world. These operations take some time, and during this period the CPU must insert multiple wait cycles before it can read the data. For example, if 20 clock cycles is the access time of a traditional dynamic RAM, as I have said, the CPU has to wait 20 clock cycles to read one word of data.
So that means if you are reading words successively, one after the other, you have to spend 20 clock cycles, then another 20 clock cycles, and so on, because everything takes place in this asynchronous manner. In synchronous DRAM this is overcome by providing an external clock, so access is synchronized with the help of this clock. The processor, or you may call it the master, issues the command and address information to the dynamic RAM, and the DRAM responds after a set number of clock cycles. So here the access is synchronized: the processor generates the address, and after a fixed number of clock cycles, counted off by the external clock, the RAM will be ready. In this case the processor, the master, need not wait; it can do other tasks while the SDRAM is processing the request. The difference from the previous case is that earlier the processor was continuously checking the status of the dynamic RAM, whether it was ready or not, with wait cycles being generated; but now the processor no longer waits for the data. After providing the address and control signals, it simply gets busy with other things, performing other operations or tasks. In the meantime the SDRAM gets ready. The SDRAM employs a burst mode, using a mode register to set up a stream of data to be fed synchronously onto the bus: after the set number of cycles the SDRAM provides data in a burst, meaning one word in each clock cycle, one word after another, again synchronized by the clock. So here we make use of a register known as the mode register. What is the role of this mode register? The mode register is set up to configure the stream of data to be fed synchronously.
The mode register contains two pieces of information, which the user can set. One is the latency: how many clock cycles the dynamic RAM requires before it can provide data. The other is the burst length: how many words are to be transferred in a single burst. These two can be set with the help of the mode register, and then a stream of data can be fed synchronously onto the bus, as shown in the timing diagram. Let us consider the synchronous DRAM read timing. The processor sends the READ command along with the address, and after that it simply skips clock cycles; NOPs (no-operations) are issued on the command lines after the address is provided. Then the data becomes available after a fixed latency. This is the CAS (column address strobe) latency: in this case the latency is two, so two clock cycles after the column address is presented the data becomes available. In this example the burst length has been set to four, so you get four words of data, A0, A1, A2 and A3, obviously from consecutive addresses, one in each successive clock cycle. That is how data is transferred one word after the other.
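The burst read above can be summarized in one line: with a CAS latency of 2 and a burst length of 4, the four words A0 to A3 cost far fewer cycles per word than an asynchronous access. A minimal sketch, with the function name my own:

```python
def sdram_read_cycles(cas_latency=2, burst_length=4):
    """Clock cycles from the READ command to the last word of the burst:
    data appears cas_latency cycles after the column address, then one
    word arrives per clock for burst_length clocks."""
    return cas_latency + burst_length

total = sdram_read_cycles(2, 4)   # the lecture's example settings
print(total)                      # 6 cycles for 4 words
print(total / 4)                  # 1.5 cycles per word on average
```

Compare 1.5 cycles per word with the 20 cycles per word of the asynchronous DRAM discussed earlier.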
So instead of the asynchronous DRAM timing, where for each word you have to spend 20 cycles, here there is only an initial latency; that latency depends on the access time of the dynamic RAM device you are using, and accordingly you program the mode register. The SDRAM also has a multiple-bank internal architecture providing on-chip parallelism. You may ask how it is able to provide data one word after another in consecutive cycles: that is possible because of the multiple banks provided inside the chip. So with four banks, each bank supplies one word in the four consecutive cycles. That is how it has been made faster, and obviously this leads to a faster average memory access time. Now there is another innovation, known as DDR SDRAM, where DDR stands for double data rate. Double data rate synchronous dynamic RAM allows data to be sent twice per clock cycle. Normally, as you know, data is transferred on only one edge of the clock, either the positive edge or the negative edge; one edge is used, so one data item is transferred per cycle, and that is what happens in normal single-data-rate SDRAM after the initial latency. In DDR SDRAM, both the leading edge and the trailing edge are used to transfer a word of data: where the normal SDRAM transfers on one edge, the DDR SDRAM transfers on the rising edge and also on the falling edge, and so on. That means the data rate becomes double, because you are transferring on both edges.
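The doubling can be expressed directly: at the same clock frequency, DDR delivers twice as many transfers per second as single data rate. A small sketch (the function name is mine):

```python
def transfers_per_second(clock_mhz, ddr=False):
    """Words delivered per second on the data bus: single data rate
    transfers on one clock edge, DDR on both edges, doubling the rate
    at the same clock frequency."""
    return clock_mhz * 1_000_000 * (2 if ddr else 1)

print(transfers_per_second(100))            # SDR at 100 MHz: 100,000,000 transfers/s
print(transfers_per_second(100, ddr=True))  # DDR at 100 MHz: 200,000,000 transfers/s
```

This is why DDR module ratings quote a data rate equal to twice the bus clock.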
So this is another innovation incorporated in dynamic RAM that has helped provide data at a faster rate. A typical SDRAM organization has a memory controller and, as I mentioned, multiple banks; here four banks are shown, each receiving the control signals coming from the controller. This is a very simplified diagram; a more realistic one, for a real-life memory developed by IBM, the 64-Mbit SDRAM, is shown here. It has a 14-bit address buffer and four banks of cell array: each cell array is 2M × 8, and four such banks together give you 64 megabits of dynamic RAM storage, along with the various control signal generators, the data and memory control circuitry, and the data I/O buffer. Here you read data 8 bits at a time, which means there is an 8-bit bus for external transfer. So using this IBM 64-Mbit SDRAM you get 8 bits at a time, and you can see the various pin assignments of the SDRAM: A0 to A13 address inputs, the clock input, clock enable signal, chip select signal, row address strobe, column address strobe, write enable, the data input/output pins DQ0 to DQ7, and DQM, the data mask. These are the various signals provided for external interfacing. This SDRAM is very popular, and dual inline memory modules can be made using such chips. The first synchronous DRAM DIMMs used the same bus frequency for data, address and control; that is, they did not use double data rate but a single data rate, transferring on a single edge.
So PC66 gives you data at the rate of 66 MHz, PC100 at 100 MHz (meaning you get 8 bytes at a time at 100 MHz from the DIMM), and PC133 at 133 MHz. These DIMMs can be used in parallel to realize the memory system. Then came DDR1 SDRAM DIMMs, which use double data rate, achieved by clocking on both the rising and falling edges of the data strobes, as I have already explained. Again there are different modules: PC1600 gives you 200 MHz data and strobe but a 100 MHz clock for address and control, so the data rate is double that of the address and control clock; similarly PC2100 gives 266 MHz data and strobe with a 133 MHz clock for address and control, PC2700 gives 333 MHz data and strobe with a 166 MHz clock for address and control, and PC3200 gives 400 MHz data and strobe with a 200 MHz clock for address and control. In addition you have DDR2 SDRAM: going from one generation to the next, technology improves and speed improves. As you can see, these DIMMs are also double data rate, starting with the PC2 series at 400 MHz data and strobe but a 200 MHz clock for address and control, and going up to PC2-6400, where you get 800 MHz data and strobe and a 400 MHz clock for address and control. So you have a variety of commercially available DIMMs which you can procure, and depending on the speed requirement of your computer system you can use them in your computer. Another innovation in dynamic RAM was the Rambus DRAM, developed by Rambus. It takes a standard DRAM as the core; inside, you have the standard DRAM as the core.
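Before moving on to Rambus, the DDR "PC" module numbers listed above can be checked with a one-liner: the name is essentially the peak bandwidth in MB/s, the data rate in millions of transfers per second times the 8-byte module bus (rounded names like PC2700 are approximate).

```python
def pc_rating_mb_s(data_rate_mt_s, bus_bytes=8):
    """Peak DIMM bandwidth in MB/s: data rate (millions of transfers/s)
    times the 8-byte-wide module data bus. DDR module names encode this."""
    return data_rate_mt_s * bus_bytes

print(pc_rating_mb_s(200))  # DDR  PC1600:   200 MT/s * 8 B = 1600 MB/s
print(pc_rating_mb_s(400))  # DDR  PC3200:   400 MT/s * 8 B = 3200 MB/s
print(pc_rating_mb_s(800))  # DDR2 PC2-6400: 800 MT/s * 8 B = 6400 MB/s
```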
So you have the typical DRAM array, the row decoder, column decoder, sense amplifiers and the various other things required in a standard dynamic RAM, but access is provided through a special bus interface known as a packet-switched bus. A single RDRAM chip acts like a memory system, not merely a memory chip: a lot of control and other functionality has been built in as part of the chip. Through this 28-pin bus it interacts with the processor, and the various pins are as shown here: the data bus is 18 bits, then 8 bits for RC, 2 bits for the RC clock, 2 bits for the T clock, and so on. You can have a large number of RDRAMs, up to 320 RDRAM chips, in a single system, interfaced to your computer. Between sending the address of a request and the return of the data, the bus allows other accesses; that is another feature provided in RDRAM. It also does the refreshing internally: since it uses dynamic RAM, refreshing of the memory is required, and writing in any case refreshes the cells, while reading, as it senses the data, also refreshes them; so the read operation is used for the purpose of refreshing. These are the various components: a controller and RDRAM modules, with 16 bits of data and 2 bits of parity cycling at twice the clock rate, and 8 lines for address and control. This is the RDRAM memory system that is available; it was adopted by Intel for Pentium and Itanium processors, this Rambus RAM has been the main competitor of SDRAM, and it is available in a vertical package.
That means all the pins are available on one side; data exchange takes place over those wires, the width of the package is less than 12 centimeters, and the bus can address up to 320 RDRAM chips at a rate of 1.6 GB/s. It also works with an asynchronous block-oriented protocol, where the initial access time is about 400 nanoseconds; after this access time it delivers data at the rate of 1.6 GB/s. The typical RDRAM module has its various pins and an integrated heat sink. It has been found that, compared with other contemporary standards, Rambus has significantly increased latency and heat output: because of the various control circuits built in as part of the Rambus RAM, it generates a lot of heat. So there is long latency, heat output, manufacturing complexity and cost. Rambus RAM is much costlier than SDRAM; the RDRAM requires a larger die to house the added interface, which results in a 10 to 20 percent price premium. As I mentioned, it is much costlier than SDRAM, and these are some other issues related to RDRAM. Few DRAM manufacturers ever obtained the license to produce RDRAM, and those who did license the technology failed to make enough chips to satisfy PC market demand. This means it is less popular than SDRAM. During RDRAM's decline, DDR SDRAM continued to advance in speed while remaining cheaper than RDRAM. While RDRAM is still produced commercially today, few motherboards support it, and between 2002 and 2007 only about 5 percent of the market was captured by RDRAM. So, essentially, all we are trying to say is that SDRAM is more popular than RDRAM, although RDRAM is still being manufactured.
So this is a typical main memory organization: you can see the I/O bus, the processor, the processor bus, the memory and I/O bridges, the read queue, write queue, response queue and buffers, and the various banks, bank 0, bank 1 and so on; DIMMs are connected to realize the memory and I/O banks. And this is the hierarchy we have already discussed: disk drive, main memory, on-chip L2, on-chip L1, register file. Various technologies are used: the disk is magnetic, the main memory is DRAM (SDRAM), and the bandwidth, as you can see, is about 2 GB/s for main memory nowadays (we have seen 1.6 GB/s earlier), with a latency of about 50 nanoseconds for main memory compared to 2 nanoseconds for the on-chip L2 cache, and the cost per byte is significantly smaller than that of cache memory. Later, in the next class, I shall discuss another level of the hierarchy, the one between the main memory and the secondary memory, which is known as the virtual memory system. So in the next lecture we shall discuss that. To summarize, we have discussed main memory optimization techniques using wider memory and interleaved memory, and we have also discussed various DRAM-specific optimizations like the use of SDRAM and the Rambus RAM. With this we have come to the end of today's lecture. Thank you.