So, friends, welcome to the 21st session of this course on embedded development. Today we are changing topics — not that we are leaving architecture, but we are going to see something different from what we have seen so far. Let me summarize where we are. This is a typical system on a chip: we have an ARM IP inside it, and these are the connections. We saw what the ARM core does, what its pipeline structure is, all the ARM instructions including some special instructions, and then how interrupts are handled. The interrupts can come from the outside world or from peripherals inside the chip; it does not matter. After looking at interrupt handling and acceptance, we started looking at coprocessors: in general how a coprocessor is interfaced with the ARM processor, what the coprocessor instructions are, and then we took an example of how the vector floating point unit is implemented as a coprocessor to the ARM core. Having seen all this, we are now going to explore a different world which is also part of the system — one second, this is the memory controller; maybe I should mark it like that. We are going to come out of the SoC and see what the different kinds of memory are and how they help in achieving our goals of using the processor. This whole thing we can call the CPU, and this is the memory in the system; together you can call it an embedded board or embedded system, which has an SoC with all its pins, plus memory. So this is the memory controller, which has both the address and data buses coming out of it. Typically, the DRAM — what we call main memory — resides outside the SoC.
So, this is our peripheral — let me draw it using a different colour. This is our chip; inside it the ARM IP is there, and then the memory controller. As system designers we buy memories: there are different companies manufacturing DRAMs, and the DRAM companies need not be in the same business as the processor companies — they can be the same, but typically these are two different worlds, memory and processor. So this memory we buy, the memory controller is integrated into the SoC, and the programs are accessed from the memory. Maybe we have another memory, or a common memory where the code and data reside, depending on what kind of architecture we design, and then we execute the whole program. Going forward, for at least the next three sessions — maybe almost six — we will talk about memory. In the first three we will talk about cache and memory, which sit between the processor and main memory. There is a data cache and an instruction cache, which we will start discussing from the next session, and then we will talk about how memory is organized — virtual memory, physical memory, all those things. So we have completed this part. Before going into the buses — the AMBA buses, AHB and so on, which we will see in subsequent classes — there are also the peripherals and how they are connected: an APB bridge, and on the APB bus you may find a lot of peripherals sitting. We will cover the buses later, then touch upon peripherals and see how different peripherals are connected to the chip — some may be inside the chip, some outside — and how they are all interfaced. We will talk about those things at the end of the course.
So, today we will start with the memory hierarchy and the different kinds of memory. The cache sits inside the chip, there are registers inside, the DRAM sits outside, and then this may be connected to a hard disk — this is the secondary storage. It could be a hard disk or a huge flash memory; with advancing technology we may shift away from hard disks towards flash. So the secondary storage can be flash, and a faster flash can even be part of the main memory itself. Different levels of memory exist in the system, and we will talk about them in the subsequent classes. This is an overview of what we are going to be seeing. We will not talk about anything specific to a memory controller, because that is very specific to a particular implementation; we will only address the issues from the programmer's perspective — how the memory performs, how it is interfaced with the processor, what the different kinds of memories and technologies in the system are, how they differ from each other, and how they help us achieve our system goals. That is going to be our focus. Let me change my pen colour — let me take a brown. In today's class we will talk about the memory hierarchy, and cache memory will be the topic of discussion next. Good.
If you are from a hardware background you will know more than what is going to be covered here, and all of you would have heard about this in your earlier classes or courses on digital design. We will see how memory technologies and CPU technologies differ in terms of performance, why we need different kinds of memory in the system, and how we use them to achieve our performance goals. Effectively, everything boils down to performance. What do we mean by performance? Suppose I have built an embedded system, provided it with a power supply, and I am using it to control something — the embedded system could be inside a car, inside a rocket, inside a satellite, anywhere. How do we define performance? We should be able to make this processor function as long as possible with limited power — that means it should be power efficient — and it should be reliable. I talked about this in the initial classes. To achieve all this, knowledge of the processor alone is not enough. The processor is one part of it — maybe 60 to 80 percent of your knowledge, and what you choose as a processor and what software you write for it play a major role — but a processor cannot live alone in a system. To realize a high-performing embedded system we need other technologies integrated with the processor: memories, peripherals (a peripheral can itself be a kind of processor, but with a different purpose), and sensors — temperature sensors, gyroscopes, light sensors, motion sensors. The typical example is a mobile phone.
A mobile does not have only a processor; it has so many things, including the touch screen, which is a separate system altogether. So knowledge of all these things will make you a complete embedded engineer. We have now learnt quite a lot about the ARM architecture in particular; now we will see how the ARM architecture is interfaced with memory, cache and other technologies, and that will give you an overview of what an embedded system basically contains. Fine. So let us see why we need a memory hierarchy. We see hierarchy everywhere: in any company you will have a CEO, then vice presidents, directors, senior leads, and then engineers — there is always some hierarchy in any system, not only in the social world but also in the processor world, the embedded-system world. Memory likewise has a hierarchy, similar to a pyramid structure, and we will see why it is needed and how it helps in realizing our goals. Just a brief introduction to memory technologies, because we could talk for hours together on this; our intent is not to go into every detail — this is just the overview introduction that concerns us. Random access memory — all of you are aware of it. What is a random access memory? Data is stored at an address. Say it is a 4 kilobyte memory: it should immediately strike you how many address pins are required. A very simple rule, you do not have to mug up a lot: 1 kilobyte corresponds to 10 address lines, and 4 KB means you have to add 2 more.
So 12 address lines are required, and the memory could have a data bus 8 bits wide, 16 bits wide, or 32 bits wide — it does not matter. Based on the width of the data bus, one access may fetch 1 byte, or 2 bytes at sequential addresses put on the bus, or 4 bytes in sequence. So the address width decides the size of the memory, and the data bus width decides how much data comes out of the memory in one access — all of you are aware of this. I am not going into the details of how the address is decoded inside — RAS and CAS and all those things you might have already seen in a memory technology course — my intent is only to explain why we call it random access. Before that, let me go to another kind of storage, a secondary storage: the tape. All of you are aware of audio tapes — we may not be using them any more now. Songs are stored on an audio tape; if you want to hear the second song you press the forward button and wind to it, and if you want the fourth or fifth song you have to forward past everything before it. That is the typical behaviour of an audio tape; I do not want to draw it. Tapes are also used for long-term, secondary storage, and the hard disk is another such device — the one in your systems. These are sequential accesses, because you cannot get to a particular piece of data without skimming through everything before it.
Similarly, on a disk, if you want to seek a particular sector and cylinder, the head has to travel to that particular place before accessing the data — so a tape is an example of sequential access. What do we call random access, then? It means the time taken to access any part of the memory is the same. Take our 4 kilobyte memory: suppose — let me change the colour — I want to access some 10 bytes at the 1 kilobyte location, or some 30 bytes at the 2 kilobyte location. The address can vary from 0 to 4 KB, and I may be interested in accessing any part of that memory — it need not be 10 bytes, it could be a single byte at that location. In a random access memory, the time taken to access a particular location is independent of the address: getting a data value from any location takes an equal amount of time. I am not saying it is very fast; but compare it with sequential access — it is not the case that an address at the 4 kilobyte boundary takes more time than the first location in the memory. The access time is independent of the address you are trying to access. That is the difference from sequential access: whatever address you give, it provides you the data with a fixed delay, independent of the address. That is what I am calling RAM. The basic storage unit in a RAM is a cell, typically holding 1 bit; DRAMs are built that way.
Multiple cells form a memory — this is all at a high level, which you may already be aware of. Static RAM is one technology, built purely with transistor circuits — typically a 6-transistor cell; later technologies may use fewer transistors, but 6 is the typical example — and it has a larger footprint. That means if you design a 4 kilobyte SRAM — let me take the same size — it may take this much space, whereas another memory of the same size, the DRAM (dynamic random access memory), which I am just about to talk about, will be much smaller than the SRAM design. Why? That explanation is coming. With 6 transistors per cell, each cell takes its own sweet space, and if a cell needs that many transistors, the area occupied will be larger. That is one feature of SRAM — I am not calling it a drawback, it is a characteristic. And with that many transistors it will be power hungry too, because the transistors are always in one of two states — cutoff or saturation, since 0s and 1s are typically defined that way — so it consumes power from the VCC continuously. Also, when the power goes off, the SRAM loses its value; it is as good as having nothing. On the other hand, SRAM is relatively insensitive to disturbances — electrical noise, electromagnetic disturbances do not impact it much — and it is very fast in terms of access time compared to DRAM.
And it is expensive: because of the larger footprint, building a huge memory in SRAM is an expensive proposition compared to DRAM — I am comparing it only with DRAM technology, not with anything else. Now, what is dynamic RAM? Both are random access memories, but the static RAM is one technology, and this is another, built with a transistor and a capacitor. It holds the charge in the capacitor, and as you know, any capacitor, even if you charge it to some charge Q, will slowly discharge through leakage current if left alone, and will finally discharge fully after some time. A memory cell is used to hold either a 1 or a 0; if you want to hold one of these values, the charge in the capacitor must be retained. So you need a refresh cycle to refresh the DRAM cells frequently so that the values are retained: if a cell is holding a 0, the same value is retained; if a cell is holding a 1, it continues to hold a 1. Do not think that when you refresh, everything becomes 1 or everything becomes 0 — each cell continues to hold its own value — but the refresh is necessary. This means a DRAM, depending on the number of bits and the size, needs refresh circuitry as well. Even with that additional circuitry, the footprint of DRAM is much, much better than SRAM, so the extra circuitry does not matter. Actually, the problem with DRAM — sorry, it is not power consumption — the problem is having to do this frequent refresh, regularly.
Another problem is sensitivity to disturbance: because DRAM holds the value as charge, any electromagnetic radiation may induce charge and disturb the bits stored in it. So, comparing SRAM and DRAM, what you need to remember is this: for a given size of memory — say our standard 4 KB — the SRAM takes a larger area than the DRAM; but the DRAM has the problem that it has to be refreshed frequently, and it is much slower than SRAM. What I mean by slow and fast is access time: when the processor — not me, the processor — gives an address to the memory, how long does the memory take to give back the data? That is the access time. The access time is a critical feature, a parameter, and it decides which memory should be closer to the processor. If the access time is long, the processor has to wait longer, because processors run much faster than these access-time ranges — from these numbers you can make that out (sorry, I am not showing the access times here, only the relative figures). Access times are much bigger values compared to processor cycle times. So if the processor is to benefit and not wait long for data from the memory, the memory with the lower access time should be closer to the processor. Then naturally, given the choice between the two, which one will you choose? I told you SRAM is faster than DRAM. So when the CPU is there, the natural tendency is to keep the faster memory close to it. I will tell you more about why.
So that as soon as the address is given — not immediately, but at least after a short time — the data is back in the CPU. The slower memory I can keep away from the processor, and maybe access it in advance, so that its slower access time is absorbed: if I will need some data down the line, I give the request early, so that I get it in time, when the CPU wants it. So who is the king here, who is the CEO of this organization? The CPU. Everything below it — SRAM, DRAM — are like slaves; they just do what the CPU asks: "give me the data," and they give the data. They are all dumb devices, I would say, whereas the CPU is the brain. But that is not true of a real organization — I am not calling the CEO the only brainy person and the rest of the organization dumb, so please do not read that into the analogy and say that is what I am conveying here. I am only saying the memory organization looks like a hierarchy; once we come to the processor world, please do not associate the other features of a social hierarchy with it. I am not saying the CEO is faster and the VPs are slower and the others slower still — do not ever think like that; if you do, you cannot go up the ladder! So let us talk only about processors now, no more social hierarchy. Very good. I have given you a flavour of memory; let us go into some more details. Here is the comparison — these numbers may not be exact, but they are very close. For access time, do not say that 10 is bigger than 1 so it is better: if SRAM takes 1 nanosecond to access a data value and give it to the processor, and DRAM takes 10 nanoseconds, you cannot say DRAM is better than SRAM.
Here the one with the lower number is better, because it gives me the data much faster. Refresh is not needed for SRAM but is needed for DRAM; SRAM is not noise sensitive but DRAM is; and cost-wise, if a DRAM of some size costs 1 dollar, a similar size of SRAM may cost you a hundred dollars. These are only relative numbers — do not go and look up the actual cost of a DRAM and an SRAM and assume they are in this exact ratio; I am only conveying relative magnitudes. Now, cache memories and main memories — about these we should be clear. I told you about the SoC: caches are mostly inside — L1 and L2 caches; an L3 cache or so may nowadays be outside — and the main memory we always keep outside. We call it main memory, and it is normally realized with DRAM. You have the freedom to use any kind of memory to store values: flash, DRAM, ROM; there are so many others I am not talking about — EEPROM, PROM (programmable ROM) — so many kinds of memories that we would need ten hours or more to talk about memories alone, and I am trying to cover this in a one-hour session. We cannot talk about everything, so I have taken only what interests us at this moment. Cache memories are implemented with SRAM technology, and main memories are implemented with DRAM technology — that is the typical choice; flash and other memories may exist alongside the main memory, but typically these are the two technologies used. Now, I said memory access is slower; let me talk about that. A DRAM access typically takes this many nanoseconds, and the rest of the system may need 3x longer or more to get the data.
The raw access time may be this much, but as I told you, the processor is deep inside: there are caches in between — maybe level 1 and level 2 — and the DRAM sits outside. So even though the DRAM has this access time, some more delay comes in. It is there for a benefit — it is like meeting a company CEO: you may have to go through security first, then maybe the divisional secretary, then the personal secretary, and only then do you get your appointment. The data flowing from the DRAM has to go through the different levels of cache before it reaches the CPU. There are advantages, though — you may wonder, if all this causes delay, why not remove the caches and connect the CPU directly? No — the caches are there for a purpose, and I will talk about that in the subsequent lectures. What I am saying here is that some extra delay exists in accessing the data. Meanwhile, processors are becoming faster and faster — even our PCs run above a gigahertz, 2.2 GHz and so on; processors are clocked at such rates. Now, 1 GHz: if you work it out, 1 gigahertz is 1 × 10^9 cycles per second, so one clock cycle is 1 nanosecond. And as I told you, processors are designed with a pipeline, so effectively one instruction completes per cycle — the CPI, cycles per instruction, is on average 1; we claim one instruction comes out, gets executed, every clock cycle. That means the processors are running at this speed.
The CPU has the capability to execute 1 instruction in 1 nanosecond. But does the memory have that kind of speed? Unfortunately not: memories take on the order of 100 times longer — it is coming down, but it is still in the range of a few tens of nanoseconds to deliver the data. You may wonder why this matters. I told you the pipeline is there, instructions flow in and out. You may say: I have an ADD instruction, a MOV instruction operating register to register; you have seen lots of arithmetic instructions, and none of them have anything to do with memory — they all operate on the registers inside the processor — so why make so much fuss about memory? Well, for any processor — I told you this in the VFP case as well — it is not sufficient to perform jobs only within the registers; the results have to come out. It is like a restaurant: the cook makes lots of food, but if the food stays inside the kitchen there is no use of it — it should come to the tables, the customers should get fed, and only then does the person at the cash counter make money. Likewise, if processors only execute instructions within themselves and never interface with memory or peripherals, never control any other system outside, what is the use of a CPU running at one gigahertz and doing something within its registers? No use. Finally it has to come out into the world: it needs to get some data from the outside world, and it should throw some data out. That is when memory comes into play. And the number of instructions related to memory access is significant.
I think it is around 30 to 40 percent, if I am right, but it depends on the application — you cannot give a fixed number, that out of 100 instructions exactly this many access memory. I cannot confidently say that around 60 instructions will be arithmetic instructions operating on registers and the remaining 40 will be memory related; it is very application specific. What we are trying to say is that a useful application will need memory accesses too. So we have to worry about how fast the memory gives us data — and when I say data, please do not relate it only to what you get from the load and store instructions; our problem is not only with those. There is one more thing you should keep in mind: instructions themselves. I told you that ADD, OR, whatever arithmetic operations, can work with only the registers inside, so they do not have to go out to memory — but how did the CPU get these instructions in the first place? It got them from memory! So instruction access — that is a very, very important point; I should not have missed it, and I do not want to. Instruction access is also a very important part of this. You may have a vector processor where one small instruction does a bigger job inside — that is fine — but even then the data comes from the outside world. So whenever we say memory access, we have to keep both in mind: for the processor to run, it is not enough to operate on registers; the instructions telling it what to do also come from memory. It is very important that the memory interface with the CPU, and its problems, are well understood and then resolved using some technology or mix of technologies, to achieve the performance goals of embedded systems — or of any system, I would say.
Even your PC has to run faster; we need to have all this in mind. So please remember: memory access is not only for data, it is for instructions also — that is a very important point. Now, about load and store: what I said — around 30 to 40 percent — turns out to be correct; it is about 33 percent per this particular example, approximately one third of instructions being loads and stores, apart from the instruction fetches themselves. We are talking about RISC here, so only the load and store classes of instructions access memory — there is no doubt about that; maybe swap is another one, but do not question me on that. Fine, these are the important points I wanted to highlight. Let us go into some interesting facts. I got this from a paper published in 1997 — not too new, but that is why we have the data, and it is still fine. This graph has not changed much; now in 2014 you can extrapolate it and see it going the same way. Memory technology might have improved, but the CPU is also improving, so effectively the gap remains. Why is this gap building up? The reason: CPU performance increases about 60 percent per year. This comes from the advanced processor architectures we talk about — parallelism; pipelining is one kind of parallelism, and there are superscalar processors, vector processors, and many more advanced architectures. A lot of interesting research is going on in that field, and it has made CPU performance increase by 60 percent per year. I am not saying memory technology is not improving, but for DRAM in particular, the average increase in performance is about 10 percent per year.
It is nothing to do with a lack of research; it is the limitation of the materials used and of the system — I am not saying the research effort on memory is any less. So effectively, CPUs are becoming much more powerful. And it is not only processor architecture; there is also process technology — you might have heard about 32 nanometre, 28 nanometre technology. When you have a wafer, the technology used to etch the circuitry onto the wafer decides how much you can pack into a particular chip. As we move to smaller geometries, we are able to put millions, billions of transistors inside the chip, so complex circuitry — not only a single core but even multi-core processors — can be built into a single chip. Because of this, CPU performance keeps going up: processor architecture advancements as well as process technology both aid in improving CPU performance. So something must be done to make sure that all the time and money being spent on improving CPU performance pays off, and that the CPUs are not made to wait because the memory is slow — that is very important. We have to keep the CPU engaged with useful work continuously: parallelize the instructions, or keep some temporary storage inside the CPU, so that the memory speed is offset — the gap will still exist, but we will still achieve better performance from the CPU. We are going to see the different ways we can do that; that is the goal of this particular lecture. Now, though we are trying to address the problem that the CPU-memory interface has and reduce this gap, it also depends on the program that is running on the CPU.
If the program did not help us absorb this memory latency, we could not do much. But luckily, to our advantage, the programs that run on a CPU have a property called the principle of locality, which helps us with our goals. Let me tell you what that is: programs tend to reuse data and instructions that are near those they have used recently, or that were recently referenced themselves. What I am saying is: the CPU is here, the memory is here — let me draw it to scale, because the memory occupies more space compared to the chip. The program is accessing a particular part of the instructions, and it is also accessing some part of the data — and let me not leave out our friend the stack. Now let me explain with a simple C function, which I hope you are familiar with. You are writing a C function with some parameters — an int a and a float b. Inside, you declare int i, and maybe I declare an array of a hundred — sorry about the handwriting; I should apologise for it in every lecture, but by now you are used to it, so thanks for your patience. Do not ask me why I am declaring a huge array inside, or what I am going to do with it — I will do something, do not worry about that. Some code runs, and then it returns some local variable k, so the function returns an integer. And here is some main function calling it with the values 2 and 10.5 — instead of passing variables I am passing literal values to the function; the compiler converts the 2 to integer format and the 10.5 to floating-point format. Something is being done.
Now, I am claiming this program exhibits the principle of locality; let me prove it. Assume the function has a small for loop operating on this array and doing some computation, maybe some floating point comes in, some mathematical computation, a sine or cosine or whatever you want. Effectively, assume this is some 5 lines of C; converted into assembly, the whole for loop might map onto roughly 100 assembly instructions. Now, what happens in memory? Let me go to a different colour, I am getting bored with this one, and erase this; what follows will justify why I say locality of reference is so important. Memory is here; let me use different colours for different kinds of data. First, the stack, our friend: i is located here, and assume the 100 integers of the local array are stored here too. Then assume the function is also using a global array, say int g[100]. A strict programmer will say that accessing a global array inside a function is not good programming practice; please ignore those comments for a moment. So it is accessing a global array for some purpose. Where does that live? In the data section; imagine both are in the DRAM, the data is here, in some other part of memory. Now let me pick up green for a moment and say there is a flash: you are fetching the instructions directly from flash, so the instructions are here. Now I have told you all the different kinds of data and where each of them sits.
Now, is the principle of locality true for all of these, whether we are accessing this, or this, or this? Take this example: all 100 elements of the array are accessed within this loop. If each element is 4 bytes, then 4 bytes times 100 is 400 bytes, a small percentage of, say, a 4 KB region, let alone the whole memory. So only that much memory is accessed. This program is concentrating on a small portion of the memory; do you agree? It is also accessing another array of data in a different part of memory, but that too is only 400 bytes. And what about the instructions? I said the whole for loop is about 100 instructions. Compared to the size of the flash, which could be 2 KB or whatever, and relative to the DRAM, the instruction footprint is small. So only a small portion of the code is accessed, because the C code we wrote is converted into assembly and stored there in the instruction memory. So instructions also have the principle of locality. That means the programs we write tend to reuse data and instructions; they reuse them because of the for loop, which goes back to the start of the loop on every iteration, so it accesses the same set of instructions 100 times. So suppose I tell you I have a method of keeping this set of 100 instructions close to the processor, and maybe I can accommodate all 400 bytes of data in a data cache.
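Since the board example keeps changing colours, let me also put it down as code. This is only a minimal C sketch of the kind of function being described; the names (compute, arr, g) and the trivial loop body are made up for illustration, standing in for whatever computation was on the board:

```c
#include <assert.h>

#define N 100

int g[N]; /* global array: lives in the data section, in DRAM */

/* Roughly the function from the board: a local array on the stack,
 * a global array in the data section, and a for loop whose ~100
 * machine instructions are fetched again and again from the same
 * small region of instruction memory. */
int compute(int a, float b)
{
    int i;
    int arr[N]; /* 100 * 4 bytes = 400 bytes on the stack */
    int k = 0;

    for (i = 0; i < N; i++) {
        arr[i] = a + i;    /* spatial locality: consecutive addresses    */
        g[i] = arr[i] * 2; /* spatial locality in the data section too   */
        k += arr[i];       /* temporal locality: k is reused every pass  */
    }
    (void)b;
    return k;
}
```

However long the loop runs, it touches only about 800 bytes of data (two 400-byte arrays) and roughly 100 instructions, which is exactly why a small, fast memory near the CPU can hold the entire working set.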
Assume there is a temporary memory available within the CPU; the CPU here means the processor with some cache, a memory controller, and some small peripherals. If I keep all these instructions and this data inside the CPU, don't you think I can do the job much faster? Because I am going to use SRAM here; I will build this cache with SRAM technology, as I told you in the previous slide, not the DRAM I used to build the main memory. So the cache gives me data at a much faster rate, and I have enough space to keep the recently used instructions and data close to the CPU, so I can work much faster. Because locality of reference is there in data accesses, in local variables, in the stack, and in instructions, everywhere, because the program is structured like this, I can keep the things I need very close and perform my job much faster. So the CPU is able to accomplish its goals; even though it runs at sub-nanosecond speeds, it can keep doing useful work. There are two kinds of locality. Temporal locality: temporal means time; recently referenced items are referenced again in the near future. A for loop comes back to the same instructions after some time, right? Spatial locality: data that are close together in memory tend to be accessed together. Suppose I have an array of 100 elements; most probably a computation involving this array will use the elements one after the other, within that space. That means the data kept close together in memory are used by the program more frequently than some other data scattered somewhere else.
I am not claiming that a program uses only data that is close by; it may pick up something else, otherwise you could not write a meaningful program. But the percentage of accesses to nearby data is much, much higher; please understand that. So spatial is in terms of location in memory, and temporal is in terms of time: when I have a for loop, those instructions are executed one after another in time sequence, so in terms of time they are accessed very frequently. Now it is quiz time; take two minutes, I do not think you need more. For each of these cases, say whether it corresponds to spatial locality or temporal locality. Just write it on a piece of paper, and do not look at your neighbour's paper; it does not matter even if you get it wrong. Take a break... welcome back. Array elements: I think I mostly gave this answer in my discussion itself. Correct, array elements are close together in memory, so that is spatial locality. Access of sum in each iteration: which sum? Inside this for loop there is a sum; it is one single variable sitting somewhere, not close to the array a. That is why I told you the data elements you access need not all be close together; you also access some variable sitting elsewhere, maybe a local variable that ends up a little after the array in memory. So is sum accessed because it is close to a, or for some other reason? It is because it is inside the for loop, correct.
So naturally, anything to do with the for loop is about time, because after some time the for loop repeats, n times over. Sorry, I gave away the answer; by now you have the result anyway. Instructions are accessed in sequence; please remember, the instruction pipelining architecture itself basically depends on this property. Instructions are fetched from consecutive addresses, so that is spatial locality. Normally a sequence of instructions is executed, then a jump is taken, then another sequence, then another jump; that is how it goes, and the jumps are not that frequent. They say roughly one jump in every 4 to 5 instructions: maybe 5 sequential instructions are executed before one jump instruction is encountered. If you did not have this ratio, there would be no point in having a pipeline structure at all. So there is an inherent benefit in executing instructions in sequence until you encounter a branch, and then starting another sequence. Instructions in sequence is spatial locality; that is very obvious. Cycling through the loop is temporal locality. I hope these are all clear to you. Now, before we come to the memory hierarchy, I think you are convinced that it is required, so we can go a bit faster for the remaining part. Let us see some fundamental, enduring properties of hardware and software. Faster storage technologies cost more. It is like our Indian IPL: in the auction, the player who scores at the highest rate in cricket matches is the one who costs the most, correct? Cost follows speed. So memories built with faster technologies, meaning much, much lower access times, cost more to build.
Moreover, I told you this gap is also widening. And well written programs tend to exhibit good locality of reference; this is one benefit we got for free. We did not do anything special for it, but programs are such that they mostly need data that are close together and accessed very frequently. So we have this advantage, and we have the disadvantage of the cost-speed trade-off; we are going to marry these two to our advantage, because these fundamental properties complement each other. That is the beauty: put the two together and we get better performance. This suggests an approach to organizing memory systems, known as the memory hierarchy, to achieve higher performance. So it gives us the reason for doing all this. Now, this is the hierarchy I have shown you: the CPU registers, inside the processor, at the head of all. Below are the memories; the level at the top is closest to the CPU, then we have cache a little further away, the DRAM is outside the chip, and the disk may be outside the system. As the memory gets farther from the CPU, as you can see, the width of the pyramid increases, and the storage space increases with it. This is the size of the memory: you might have heard that nowadays we have terabytes of disk, gigabytes of DRAM, and megabytes of cache; in the embedded world caches may still be in kilobytes, so maybe I will keep those two together here. So: terabytes of disk, gigabytes of DRAM, megabytes or kilobytes of cache, and a few registers, few meaning maybe 32 to 128. That is the size of memory you have.
That is why the pyramid goes this way. Let me explain; I am not striking it out, I am underlining it: smaller memories are faster but expensive; larger memories are slower but cheaper. That is just the way it is. SRAMs are costlier, and you cannot keep too much SRAM inside the chip; each SRAM cell needs six transistors, as I told you, and they consume more power. The L2 cache is also of this type; the DRAM here is the main memory. So the more expensive memories are the ones closer to the CPU. Exploit locality to get the best of both worlds. What I am saying is: since our programs have this beautiful property called locality of reference, why don't we make use of it? Because of it, the program running on the CPU accesses the data and instructions in the cache far more frequently than those in the main memory, so you get the best of both worlds: most accesses go to the faster memory, while the larger memory is available at a cheaper cost. But can you do away with the rest? Can I say: anyway, most accesses are to the local memory, so I do not need the DRAM or the disk at all, I will build my system with only the cache? It is not possible. What I told you is that a smaller part of the memory is used more frequently; that does not mean your program is small. Your program is large; nowadays a kernel alone runs to a few megabytes, never mind your application program. So your program is large, but at any one time you may need only a small part of it, and then you move to another part based on the input or the control flow of the program. So you need the larger memory, and you also need the smaller, faster memory to improve performance. Based on this, what are we trying to achieve?
We achieve that most of the CPU's memory accesses go only to the local memory, that is, the faster memory; so most memory accesses are fast, which means that from the CPU's perspective we effectively have a fast memory, yet I am able to keep a large program and run that large program with this hierarchy. The money that I, as a program developer or system builder, spend on storing the program, on a disk or a flash, is cheaper. So large programs are saved, preserved, and maintained in the system using the cheaper memory, and execution happens out of the faster, costlier, but small memory. This is how we get the best of both worlds, that is all. I hope I do not have to reiterate this and you are all convinced about it. Now, this slide shows the detailed structure; if you want the details of each level, please read it, but most of it I have already told you. L1 and L2 are different layers of cache; do not worry if you do not understand them, we will talk about them in a subsequent class. Now, another quiz for you; take maybe 5 minutes, I do not want to do one more mouse click yet. For each comparison you need to fill in either 'larger' or 'smaller'; do not put 'equal to', I am not talking about that condition at all. Take a 5 minute break and come back... welcome back. What is access time? I have explained it multiple times. Which is faster, SRAM or DRAM? SRAM is faster in terms of access. But please remember, access time is an absolute value: if I call the SRAM's access time TS and the DRAM's access time TD, which has the smaller value? TS, correct. So you have to be careful: 'SRAM is much faster' means its access time is the smaller one.
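As a quick back-of-the-envelope check of why 'most accesses are fast' translates into an effectively fast memory, here is a sketch of the standard average-access-time calculation; the hit rate and the two latencies used below are made-up illustrative numbers, not figures from the lecture:

```c
#include <assert.h>

/* Average memory access time: if a fraction `hit_rate` of accesses
 * is served by the fast memory (cache) and the rest fall through to
 * the slow memory (DRAM), the effective latency the CPU sees is the
 * weighted average of the two. */
double avg_access_ns(double hit_rate, double t_fast_ns, double t_slow_ns)
{
    return hit_rate * t_fast_ns + (1.0 - hit_rate) * t_slow_ns;
}
```

With, say, a 1 ns cache and a 100 ns DRAM, a 95% hit rate gives 0.95 * 1 + 0.05 * 100, which is about 5.95 ns: far closer to the cache's speed than to the DRAM's, even though the DRAM holds almost all of the program.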
Sorry, one more thing I need to tell you: level i is closer to the processor than level i+1. I hope you noticed this notation while taking the break; if not, please note it now. This level is closer to the CPU, and that one is farther; all the elements further down are farther from the CPU. So i+1 is below i, and maybe i+2 below that. If you understood it the reverse way you might have filled in all the answers the opposite way; that does not matter as long as your understanding is right. Cost: you must know it by now, I already told you that the memory closer to the CPU is more expensive. Memory size: you cannot keep too much memory close to the CPU, because your SoC cannot be room-sized, right? It has to fit on a board, in a small chip. Transfer bandwidth: that is not a single speed, it is how much data moves in a given time. You might have heard of LAN speeds: 100 Mbps in the old days, and now 1 Gbps, 2 Gbps; what does that mean? If I take a stopwatch and watch a LAN cable or a fibre cable between time t0 and time t1, the number of bytes that get transferred in that time is the bandwidth. So the CPU is sitting here, the cache is here, the disk is there; how much data you can move between each pair of layers in a given time is the transfer bandwidth, and that tells you how fast each link is.
Since you access the cache most often, its transfer bandwidth is higher than the others'; the farther memories deliver data at a much lower rate. But this next one is very important: the unit of transfer, how much data you move at a time, from CPU to cache and so on down the hierarchy. Let me explain with an analogy. You might have heard of Walmart or Food World or similar stores: they have the shop here, with a store room at the back, and multiple layers of storage. In Bangalore, or any other city, they may have a warehouse, and at a higher level, maybe in Delhi or near a port, a bigger warehouse. When items come into the shop, they pull a few packets or a couple of bunches of a particular item from the store room; but when they bring items from another city, or even across the city, they do not move one packet at a time, they move cartons, multiple cartons of items. So as you go farther from the shop, the amount moved per trip becomes bigger and bigger, because the transport cost is really the cost of the trip, not of how much you load: adding a bit more to the load costs almost nothing, but adding more trips costs a lot. So as you go farther away, you tend to carry more items in one transfer; that is the normal logic. Likewise, the unit of transfer very close to the CPU is in bytes, whereas between the disk and the DRAM it may be in kilobytes or megabytes. As you go farther out, the unit of transfer increases; that is very important.
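The Walmart logic can be put into numbers. Here is a hypothetical sketch, with purely illustrative cost units: if each trip to a farther level has a large fixed overhead and only a tiny per-byte cost, then moving data in bigger units drives the total cost down, which is exactly why the far levels of the hierarchy use big transfer units:

```c
#include <assert.h>

/* Cost of moving `total` bytes in chunks of `unit` bytes, when every
 * trip has a fixed overhead plus a small per-byte cost.  All costs
 * are in made-up illustrative units, not real latencies. */
double transfer_cost(long total, long unit, double per_trip, double per_byte)
{
    long trips = (total + unit - 1) / unit; /* round up to whole trips */
    return trips * per_trip + total * per_byte;
}
```

Moving 1 MB with a trip overhead of 100 units costs far less in 4 KB chunks (256 trips) than in 32-byte chunks (32768 trips); the per-byte term is identical either way, so the big unit wins.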
If you apply natural logical thinking, you will be able to relate to this and be convinced of why it happens. Then the frequency of access: how often do you fetch? You get items from another city maybe once a month, from within the city maybe once a week, but from the local store room every day; items run out during the day, so the store room replenishes the shelves daily. So the frequency of access is highest for the storage closest to the point of consumption, and it drops as the distance increases. In our world the CPU consumes the data, the way the shop staff consume the items; so the CPU accesses the closer memory far more frequently than the farther one. It gels well with reality, very good. So there are three important properties. Inclusion: any data present at a lower level is also present at the level below it. By lower level I mean the one closer to the CPU; in the figure it is drawn higher up, but since level i is closer than level i+1, it is lower in terms of index at least. So inclusion means: any data which is here, close to the CPU, will also be one level further out, and that in turn will also be present another level out. I will take an example: suppose you are accessing a file which is on the disk, correct.
The file is accessed by the OS code running against the DRAM, the main memory: maybe one page of the file is brought into main memory, so it is here; then a particular block of that data goes into the cache, so it is here; and then one data element, maybe an integer value, goes into a register. Now, is this data element, which we put into the register, also there in the file? Of course, if it is a data file: your program is accessing the data, maybe a JPEG image you are converting into some other format. Every pixel stored in the file has to reach a processor, you agree; it could be the CPU, or it could be a coprocessor, maybe our friend the VFP if it is doing the image arithmetic, it does not matter. When I say CPU, please remember it does not matter whether it is the VFP, a floating point processor, or somebody else; for this discussion they are all the same. So a small piece of data in the file has to come to a register here to get operated on, and on the way it comes into the DRAM, then into the cache, and then into the CPU. That means the data you see here in the CPU is there at every level; you agree. So I have convinced you that any file data reaching the CPU exists at every level on the way. But you may ask me a question: suppose I have an instruction ADD R1, R2, R3; I am doing an intermediate addition and putting the result in R1, and that value is not in the file. Intermediate results that live only in registers need not be in the file at all. So that is one caveat to keep in mind; but the data from the file itself has to be at all the levels before it comes into the CPU.
So that is true, with one refinement: the inclusion property holds for every memory level except the registers, since registers can hold intermediate values that exist nowhere else; every other level must include the contents of the level above it. Coherence: what is coherence? The multiple copies of the same data kept at the different levels must be consistent with each other. See, when I copied a value up to the CPU, I kept a copy at every level on the way; but suppose I modify it, and I need the other copies to reflect that modification too. Suppose this file is being converted in real time, say an MPEG movie into another video format, and sent over the network to some player. What happens? The MPEG data comes into the CPU, gets converted, goes back down through the different stages into a file, and gets sent over the network. That means any data modified here must eventually be the same everywhere else; the copies need to be coherent. If you have one value here and something else on the other side, the program would not be correct. So coherence is another property you should all keep in mind; if you relate it to a real application, it makes sense. And locality of reference I have talked about so many times that by now you are all familiar with it, so I will not repeat it here. So these are the three properties of a memory hierarchy. Let me show it pictorially. Virtual memory I have not talked about yet, that will come soon; for now, assume there is a huge memory compared to the physical memory, this here is our main memory, and we copy what is required from the file, which sits in a file system.
We copy what is required, split it into pages, and each page is split into blocks going into the cache. The cache is a faster SRAM memory inside the CPU; the CPU is here. From the cache, multiple words are copied; the word size depends on the CPU, and for our friend the ARM processor it is a 4 byte word (byte, not bit, let me complete it). Then whatever bytes are required are copied into a register. Let me explain again. Suppose a particular value, some pixel of that JPEG or MPEG, which happens to be a floating point value or an integer, is required in a register. The file may be big; one movie may be many megabytes. So you copy a few kilobytes into physical memory, then you copy maybe 32 or 64 or 128 bytes into the cache, and finally out of that block you copy the one word you want. If the value i is here, we do not take only i: I told you that when fetching from farther away, we fetch a larger block. It is like the store again: when Food World or Walmart need one small item, they do not bring one item, they bring a carton. So a few kilobytes come here, out of which I am interested in only one packet; I take one small portion of it. So i is lying here; then it comes here and sits somewhere in the cache, and finally it comes here, into a register. That is how memory is accessed: this innermost transfer is very, very frequent, this one a little less frequent, and this one relatively rare. So you get the benefit of the memory hierarchy here. Now, this is a big data table, but I do not want you to mug it up.
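As a small sketch of how a level picks out 'the block containing i', here is the usual way a byte address is split into a block number and an offset within the block. The 32-byte block size is just the example figure from the table on the slide, not a specific ARM detail:

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK_SIZE 32u /* example: 32-byte cache blocks, as in the slide */

/* Which block does this byte address fall into? */
uint32_t block_number(uint32_t addr) { return addr / BLOCK_SIZE; }

/* And where inside that block does the byte sit? */
uint32_t block_offset(uint32_t addr) { return addr % BLOCK_SIZE; }
```

When the CPU asks for the 4-byte word at address 0x1004, the cache fetches the whole 32-byte block number 0x80 (covering addresses 0x1000 to 0x101F) from DRAM, and the requested word sits at offset 4 inside it; bringing the neighbours along for free is what makes spatial locality pay off.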
Just have an idea of how latency scales across the different memories and where each of them physically sits in the system. Let me start from the registers: they are in the CPU, each maybe a 32-bit word. TLB, I will talk about later; it is something to do with virtual memory management, so ignore it for now. The cache here is, say, 2 KB in size, but one block is 32 bytes; the cache size itself is in kilobytes, while one block is 32 bytes, so a 2 KB cache holds 2048 / 32 = 64 blocks. A cache can be either inside the chip or outside; normally L1, and nowadays L2 as well, the first two levels, are inside the chip. As you go further down the table, the sizes keep increasing: the virtual memory pages, and then everything is in megabytes and beyond. The next rows are to do with networking and web technology, if you are aware of it. When you access web pages, they may come from a web cache, which is like a proxy server. If you access Cricinfo, or anything that feeds live cricket scores, they will not have one single web server feeding the whole world; you know people access Cricinfo from everywhere while the match is going on. So they keep multiple proxy servers that serve you based on your location. From India there will be some in the big cities; they know the traffic is heavier from the IT world, because people working in IT companies watch Cricinfo more often during office hours than someone working on the road or a civil engineer doing construction. So naturally India's Silicon Valley will generate more Cricinfo accesses from office networks during working hours than other places, right?
So Cricinfo watches the traffic on the network and places proxy servers closer to where access is most frequent; the web cache is of that kind. From there the data comes to your web browser's cache, and as you switch between pages the browser caches different pages locally. Then it comes into our machine, and when it has to be displayed on the screen it has to go to the GPU, the graphics processing unit. On the way it passes through the buffer cache, the cache used for buffering data accessed from the disk, then through virtual memory, then maybe the CPU cache, and finally the processor puts it on the screen. So your Cricinfo data reaches your screen by passing through all these stages; that is the way things are done. And who manages each of these levels? The web cache is managed by the web infrastructure, the browser cache by the browser, the buffer cache by the operating system, and the hardware manages the caches, while the registers are managed by the compiler: the programs are developed by you and me, but when you write in a high-level language you are not deciding whether your variable i goes into register r0 or r1, right? Only if you write assembly code do you, as the programmer, decide that a particular value goes into a particular register; otherwise the tool does it. So this is the comprehensive comparison of the different memories. I thought I would give you a flavour of what the virtual memory system is all about, but do not worry if you do not understand it; I will be talking about it in a subsequent lecture.
So the processor is here, then the TLBs, then the cache, then the main memory, and then the disk. DMA is direct memory access: it transfers data from the disk to the main memory without involving the processor. So this is a typical system, and the cache is accessed by the processor directly, for both data and instructions; that is very important. With this we have come to the end of the class. I hope this was useful, just to give you an overview of how memory technology plays a role in our processor architecture. When you write your code, even assembly code, you need to keep in mind the cache block size and the page size. You may not believe it, but if you design your programs and your data structures keeping these in mind, you will actually improve the performance of the system. To be a successful programmer, a successful system engineer, you should not restrict yourself to only one part of the world; you should know everything around this memory and how it relates to your program. That holistic view will give you a better understanding of how to optimize your program and how to get better performance from your system. It does not stop at the processor; it even involves networking. I am not saying you should know everything in depth, but you should have an idea of all of it to design a better system and achieve its performance goals. I really enjoyed sharing this with you, and I will see you in the next session for a journey through the different caches. Cache, not cash: although if you understand this cache well, you will earn better cash! Have a nice day, enjoy your day. Thank you very much. Bye bye.