So, we are discussing instruction set architecture. Last time we talked about basic definitions, and we stopped at classifying instructions based on the number of operands. We looked at three register operands with zero memory operands; one memory, one register; two memory, zero register; and three memory, zero register. The main point was that if you want a memory operand, you obviously have to encode at least one memory address in the instruction. A memory address is typically large, depending on the size of the address space; it could be 32 bits, it could be 64 bits. That takes space, and that inflates your instruction size. Whereas the number of registers in a machine is usually small, normally 8, 16 or 32, in that range, that is, the number of registers exposed to the program. So a register address would be just 4 or 5 bits, and if you have register operands, your instructions are going to be smaller. On the other hand, an instruction with memory operands has an advantage. Suppose one instruction is doing an addition; let us take this one, adding a memory operand to a register operand and putting the result in a memory operand. So the instruction has two memory operands and one register operand. What it is actually doing is three operations together: loading a value from a memory address into some internal register of the processor not exposed to the program; adding that internal value to the register value and putting the result somewhere inside the processor, again in some register not exposed to the program; and then storing that value back to the memory location. If you want to do that on a machine which does not have memory operands, you will actually need three instructions. So that would affect your code density.
So, there is a clear tradeoff between instruction size and code density in this classification. And we will see certain other aspects further on. For example, if you have memory operands it is usually very difficult to pipeline; we will see why that is so. It will of course have an impact on execution time, because execution time is a combination of three parameters: cycles per instruction, number of instructions, and cycle time. We have already seen that the number of instructions, the code density, is going to be affected depending on what encoding we use, as is the number of bits for register encoding. As we go along, we will look at these tradeoffs in more detail. Any question? So, let us start with memory addressing. This is a very important component of every instruction set architecture. By the way, somebody asked last time, so I will just reiterate it once more: what comes first, the instruction set or the architecture? The first thing you do when designing a machine is the instruction set architecture, that is, deciding which instructions my processor is going to have. Your processor is essentially an implementation of the instruction set: you implement the instruction set, and what you get is the processor. So this is what decides what your processor will do and what your processor will look like. As I said last time, if you make a gross mistake in defining an instruction set, there is no way you are going to have a good processor, however smart the designers in your team may be. So, memory addressing is an integral part of your instruction set; it decides how you address memory. Accessing a value in memory requires two things: the starting address of the value you want to access, and how many bytes you want to access, that is, the length of the value. So, load-store register instruction set architectures. Just to remind you what this is, in case you have forgotten.
This one allows memory operands only in load and store instructions; it does not allow memory operands in any other instructions. So, a load instruction will have one memory operand and one register operand; it takes the value from the memory operand and puts it in the register operand. A store instruction similarly takes the value from a register operand and stores it back to the memory operand. These architectures normally encode the length in the opcode. What this means is that they offer different instructions for different lengths. For example, if you want to access a byte in memory, there will be a separate instruction for that. If you want to access a half-word (a half-word is two bytes; these particular types are taken directly from MIPS and may vary from processor to processor, so here I define a half-word to be two bytes), you will have another instruction. If you want to access a word, which is four bytes, you will have one more instruction. If you want to access a double word, eight bytes, then you will have yet another instruction. So the type becomes implicit in the opcode itself; I am not separately encoding how many bytes to access. And an access of x bytes normally needs to be aligned, that is, if you are trying to access x bytes starting at some address, then address modulo x must be 0. Why? Any idea? This is called an aligned access, and if this condition does not hold, it is called an unaligned access. In most processors, or at least in some processors, you will find that this is a requirement: address modulo x must be 0. Why is that? It means that if I want to do a load word, for example, I can only start at an address which is a multiple of four; I cannot start at an arbitrary place. Any clue why this should be? I am not saying that this is a requirement for every processor.
In fact, one of your most popular processors does not require this, but it simplifies matters a lot. Why is that? Any idea? Reduces the size of the memory address. If we choose x as 4, then we need not encode the last two bits, which decreases the instruction size. I see, I see. So the suggestion is that if x is a power of 2, then I can decrease the size of the address in the instruction. That is a good observation, but we will see that we can solve that problem some other way. There are other reasons for this. Memory is in multiples of 2. Multiples of 2? Suppose I take 10; 6 is also a multiple of 2. Powers of 2. Powers of 2. Suppose the memory is 1024 bytes, and you are accessing address 1021 and trying to access 4 bytes; in that case it may fail. So what is this, a restriction that only multiples of 4 should be allowed when you are accessing? No, I will allow all of these. A load byte instruction can start anywhere. A load half-word instruction has to start at an even address. A load word instruction will start at addresses that are multiples of 4, and a load double word similarly. So, if we take the case of 4, then we should access starting from 1020 up to 1023, like that. If we start from 1021, then it may fail; it will access only 3 bytes. So what you are suggesting is that there is something that is a power of 2 in memory, which is why, if we adopt this, we will guarantee that we never straddle a boundary: all accesses will finish by that boundary. What is that power of 2? What is it actually? You said 1024 in your answer. What is it? Size of memory. Size of memory. Why is that important? I give you 4 gigabytes of memory; that is also a power of 2. Yes, that is a power of 2, but then I would have a problem only at the very end.
Just one access at the very end may fail, and that alone is not why I would restrict this. Anybody else? So, he is close; he has brought up an issue with powers of 2, and indeed it has something to do with powers of 2. Yes? The thing is that if we read, say, 8 bytes at a time and only 2 or 3 remain, we would be reading unnecessarily. Instead, it would help to break the access down: read in chunks of 8 bytes most of the time, and for the last few bytes drop down to 4, 2 and 1. No, what is this 8-byte thing you are talking about? So, if we have a last piece of data to read. Last means? So, instead of doing it last, we could do it first. If we have something like 31 bytes to read, first it would do 1 byte, then 30 bytes remain, and then it would go on. Well, 31 can be decomposed in various other ways. That is what I am saying: it would help us decompose it, because for the majority of the time it would read in chunks of 8 bytes, and when it cannot read 8 bytes, it would reduce the chunk. That is fine, but what does it have to do with this? Because if we do not do that, then we would unnecessarily be doing 8-byte reads even when we do not have 8 bytes left. Can I give an example? I am not following what you are saying, I am sorry. Can you give concrete values, such that not enforcing this would actually break whatever you are suggesting? So, suppose only 3 bytes are left. No, what do you mean by left? That is what I am asking. The end of the world, or what? What is meant by left? Nothing beyond that? There is something beyond that, but we would be unnecessarily reading it. No, left relative to what? After these 3 bytes, what is going to happen? So, the 5 bytes that are left would be... 5 bytes now, okay, alright. No, but you have to define what is meant by left. That is a garbage value for us. Beyond this point? Beyond the point, yes. What is this point?
So, that is the load instruction that we are given. You read this much data. How much data? Give me a number. How much data do you have to read? I am just trying to get an example from you. You load 4 bytes. 8 bytes. Which is it? Now, what I am not able to understand is what you mean by saying 5 bytes are left. Left in what? Do you see what I am asking? Can you just articulate what you are thinking in your head? There must be something, right? Why are you saying this? I had thought that we would have only these instructions. Okay. Anybody else? Yes? So, the boundary is a memory block. The boundary is a memory block. Can you define what a memory block is? So, the memory controller at the memory level will read block by block. Typically, what is a block? Can you give a size? In kilobytes? When you request some data from memory, how much data do you get back, usually? The boundary that we are talking about is a memory block; I am just trying to ask you, what is a typical size of a block? What determines a block? I know structures. No, no, that is the file system. Sorry? So, what size? The word size of the CPU; the memory will return that much per block. Is that right? Is that so? Normally, where do you look up this data? When the CPU tries to access data, where does it go first? Yes? Cache. Cache. Is there something called a cache block size? That is what the memory will return, right? And the cache block size is typically 32 bytes, 64 bytes, 128 bytes, 256 bytes; probably not more than that. So, that is the boundary that we are talking about.
If I do not enforce this, the chances are that I may end up accessing a piece of data in an instruction part of which is in one cache block and part of which is in another cache block, which actually requires doing two memory accesses for one instruction, and that is not a very good thing. So, keep in mind that since we are allowing x to take only these values, there is an assumption about the block size here, namely that it is a power of 2, and that is why we need to enforce this. If you design a new machine where x could take other values, then this particular condition might change. Is it clear to everybody? So, these are called aligned accesses. x86 does not actually enforce aligned accesses; your Intel processors can handle unaligned accesses as well. But we will talk about one processor, MIPS, which does not allow unaligned accesses; there you have to do only aligned accesses. And it is actually the compiler's job to produce addresses which are aligned. However, in any case, an alignment network is needed for loads. Can somebody decode this particular statement? I have just said that this condition holds. So if you have a load instruction of 4 bytes, we know that the address will be at a 4-byte boundary; it is a multiple of 4. We know that. But then I say that even so, I need an alignment network. What is this network actually doing? What is the meaning of an alignment network? Can anybody guess? Think about the load byte instruction. And remember that the load instruction will first look up the cache, and there is a cache block size, which is usually bigger than any of these access sizes. And your load byte instruction, what is its destination? It is a register. You will read those 8 bits out (a byte is 8 bits) and put those 8 bits in the register. And what is the typical register size in a machine? 32 bits? 64 bits. 64 bits. So, when you talk about a 32-bit machine, the register size is 32 bits.
When you talk about a 64-bit machine, 64 bits is the register size. So, a load byte instruction at the end will go to a register, and that byte will sit in the least significant byte of that 4-byte register. Now, can somebody guess what this alignment network is doing? You have a load byte instruction; you have to read the byte from the cache and put the byte in the least significant byte position within the register. I have already told you what it does. Can somebody summarize? What is being aligned, actually? It should not be too difficult. It is like a byte is broken up into two parts, so it is aligned in that... A byte is broken up into two parts? Because we have a 32-bit register size. So, a byte is broken up into two parts. Two parts? What are these? We have 8 bytes to read. Do not bring up the word size; forget about it for now. There is nothing like a word size here. I have a load instruction which has to bring bytes from the cache into a register. And if you really want to think about the word size, in this particular case I do not know what exactly is meant by the word size; can you explain? Then I can tell you. At the moment, we can read 8 bytes together in this particular machine, because that is the maximum. So, we can put only 4 bytes at a time. 8 bytes is the word size in this case. So, from these 8 bytes, we will split into two parts: 4 bytes in one register and 4 bytes in another register. No, it is a load byte instruction; I am just loading a byte. So, if you think about the data path size. I do not know what you mean by word size; if you are talking about the bus that leads from the cache to the register file, that will be 4 bytes, because the register is 32 bits. Any data that comes from the cache to a register would have to be 32 bits.
But the maximum that you can read out from the cache is 64 bits. It has to be, because otherwise you cannot execute double word instructions. So, we are supposed to align that. Align what? As we are reading the 8 bytes from the cache, but we can only put 4 bytes in a register, we are supposed to align it. Align what, with respect to what? These 8 bytes. No, but in this case I am only reading a byte; that is all. It is a load byte instruction. I will give you the answer. You are close; you have almost got it actually. So, these are the data pathways. In one go, the cache will provide me 4 bytes. Now, from these 4 bytes, I have to get the designated byte, and it can be anywhere in these 4 bytes, because for a byte load the condition is only address modulo 1, which is satisfied by any address. So, suppose that out of these 4 bytes, the required byte sits in the most significant position. It has to go to the least significant position of the register. So, you need an alignment network. That is exactly what it is doing: it will shift the bytes, align the required byte to the right position, and then copy it to the register. Is it clear to you? To take a concrete example, suppose I specify a load byte instruction by LB, and I want to do a load byte from address 0x1 into register R. What will the cache controller do? It will take this address and figure out the 4 bytes containing this address; they will be at addresses 0, 1, 2, 3. I need this byte at address 1, and it will go to register R, and the register should have it in the least significant position. So what the alignment network will do is shift it into the right slot and then move it onto the bus; essentially, it will do a shift operation on this path. Is it clear to everybody? On some computers, for example x86, it is possible to access the least significant part of a register, leaving the upper portion unaffected. So, is the alignment network needed for stores? Yes, yes, of course.
Yes, it is needed for stores also. This was just an example that I wanted to give you; for all memory operations, you will require an alignment network. Any question? So, intimately related to memory addressing is a concept called endianness. This refers to byte ordering within a word and word ordering within a double word, both things. Here I will talk about byte ordering within a word; the same rules apply for word ordering within a double word. So, little-endian machines place the byte whose address ends in 00 in the least significant position, that is, at the little end of the word. Take a word, which is 4 bytes. What will be the byte addresses within a word which is aligned? The least significant bits will be 00, 01, 10 and 11, for an aligned word which starts at an address that is a multiple of 4. So what this says is that the little-endian machine will place the byte with address ending 00 in the least significant position; this byte will actually hold the least significant byte within the word on a little-endian machine. On a big-endian machine, it is exactly the opposite: the byte sitting here is actually the most significant byte in the word. So, here are some examples. On the little-endian side: Alpha, not there anymore today; VAX, an old machine; and x86, the one you have today, is a little-endian machine. On the big-endian side: MIPS, very much there in the embedded market; Sun UltraSPARC, although they are not designing any new processors, of course you can still use them; and Motorola, again there in the mobile market. So, this ordering remains transparent to the programmer as long as he or she does not try to access both the byte and the word starting at the same address. So, here is an example. Suppose I define an integer x to be this; this is a 32-bit number. And then what I do is try to extract a byte from it by saying char *c = (char *)&x, and then I print *c.
What do you get on a little-endian machine? What will be the outcome? Yes, somebody. On little-endian it will be 78. 78. And big-endian, any other answer? How many of you do not agree with this? How many of you agree with this? What happened to the rest? Undecided, is it? Undecided folks, understand that it cannot be something in the middle; it has to be here or there. We are talking about the ends, the big end or the little end. So, what you have said is correct; this is what happens on a little-endian machine. First of all, we need to understand what the code is trying to do. When you say char *c = (char *)&x, it is trying to extract the byte of x at the lowest address; that is the C semantics. Is that clear to everybody? The question is, which byte is that? On the little-endian machine, we have just said that the byte with this address is the least significant byte; so on the little-endian machine it is this one. On the other hand, on a big-endian machine, this address holds the most significant byte, because the least significant byte sits on the big end. And since the code is trying to extract the byte with this particular address, the one ending 00, what we will get on a big-endian machine is this byte, and on a little-endian machine this other byte. Is it clear to everybody? This is very important. And in a double word we have two words; you can apply the same reasoning to the word ordering there. So, within an aligned double word you are going to have two word addresses; look at the least significant bits in the double word. First we were looking at the bytes within a word; here we are talking about 8 bytes.
This is one word, and this is another word. The byte addresses run 000, 001, 010, 011 in the first word, and then 100, 101, 110, 111 in the second. So, which side of the double word the word with this particular address sits on can be determined by the same rules. I suggest that you go back and execute this program. This is not quite a program, by the way, but you can convert it into one. Of course, nothing surprising will happen if you execute it on an Intel machine; you are going to get 78. If you can somehow find a SPARC machine (we have SPARC machines in the department) and execute it there, you will get the other answer. Is there any advantage of one over the other? No, there is no advantage; it is just two schools of design philosophy. But whenever you try to make a SPARC machine communicate with an Intel machine, you have to be careful to make sure that the byte order is handled correctly. So, over time the ways of addressing memory have evolved tremendously. What I have here is a list of ways to address memory that have come up in various processors over time. These are called addressing modes. How do you specify an address in an instruction? It could be specified in a register, it could be specified in memory, or it could be an immediate value inside an instruction. We will see examples of all three as we go along the list. So, there are 10 major addressing modes that have been proposed over time, and this machine called VAX had all of them. The first is called the register addressing mode, where all the operands are registers. In register addressing mode, you are actually not addressing memory at all; you are just adding two registers and putting the value in one of them, most probably R4. So this is called register addressing mode: you are addressing the registers, not memory, keep in mind.
Immediate addressing mode concerns how you encode a particular constant value in an instruction. That is called an immediate. Again, do not get confused: it is not a memory address, it is a constant in the instruction. I want to add 3 to the value in R4; that is all it says. Displacement addressing mode actually addresses memory. What it does is take the value in R1, add 100 to it, and whatever you get is the memory address. R1 is often called the base register, and 100 is the displacement. Register indirect is essentially the same as displacement with the displacement being 0; it is a special case of displacement addressing. Indexed addressing is where you specify your memory address as the sum of two register values: you add R1 and R2, and what you get is your memory address. The name comes from the fact that you can think of R1 as the base address of an array and R2 as the index into the array; you are accessing R1 indexed by R2. You start at R1, add R2, and wherever you land is the memory address. Direct, or absolute, addressing specifies the memory address directly in the instruction. Notice the difference between this and the immediate mode: notationally, we put parentheses here to say that this specifies an address. There it is a constant, but here it is an address. Memory indirect is probably the most complicated one. You take the value in R3, go to the memory location R3 is pointing to, and the content of that location is your final address. So this is your R3; it has some value; use that value as an address, and it points to some memory location. That location in turn has some value, and if you use that as an address, it will point somewhere else.
And that is the location the access finally refers to; that is called memory indirect. Auto-increment actually encapsulates two operations: it first does the register indirect memory access, and once that operation is done, it automatically increments the value of R3. What is the application of this instruction? Can anybody think of anything? Exactly right. If you want to access an array, you would initialize R3 to the base of the array, and that is it; you just put an auto-increment instruction in a loop, and R3 will automatically move along. Similarly, you have auto-decrement: if you want to start at the end of the array, you can gradually proceed towards the head. And finally you have scaled, which is essentially a combination of indexed and displacement. I do not know if you can think of anything else outside this list for addressing memory. Of course, you can add one more level of indirection to get memory indirect indirect, but fundamentally everything is pretty much here. So, what is the use of the indexed addressing mode? Well, without it you would require a separate instruction for incrementing R2 if you wanted to walk along and access everything; as such, it is giving you an indexed location into an array. So, why did the VAX have all these addressing modes? It should be obvious that we do not actually need all of them. If you analyze programs, what you will find is that these addressing modes appear with different frequencies: some are very rare, some are highly frequent. So, you have to choose the effective modes only, because this has implications for complexity, CPI and instruction count. On one hand, you would like to choose very complicated addressing modes.
Because that would reduce the instruction count: a complicated mode encapsulates a very complicated memory address computation in a single instruction, like this one. On the other hand, implementing these instructions would be a nightmare; it would probably increase your CPI and also the complexity. So, typically what you do is take programs, analyze them, and find out which addressing modes are most frequent. Those you definitely support; the rest depend on how much complexity you can afford, or how much the CPI would be affected if you implemented the remaining modes. So, a large number of addressing modes normally increases complexity and decreases instruction count, assuming the target applications and the compiler can exploit them; however, they may increase CPI. Designers normally simulate the target benchmarks to see the relative usage of addressing modes. For example, here is a very old example taken from a text: on the VAX machine, immediate and displacement modes are most heavily used by the TeX, SPICE and GCC benchmarks. Memory indirect, scaled, and register indirect, in addition to the above two, cover almost all memory accesses; so you could actually do away with the remaining ones without much impact. We had 10 modes in that list, of which 5 pretty much cover everything. Register indirect could really be folded into displacement; as I said, it is a special case of displacement. So, this is the typical analysis that an architect does before deciding on the addressing modes. The VAX designers argued that the frequency of usage depends on the programming language and the compiler, so they did not take any risk; they said, well, we will put in all 10. And in doing so, what they probably did is sacrifice CPI for commonly occurring programs, to speed up some obscure program that may run maybe once in 10 years. So, before going further, you have to decide a few things.
So, the displacement mode; just to remind you once more what it was, we have to somehow encode this displacement. At design time you have to decide how big a displacement you can support, because that has implications for your instruction size: the displacement has to go inside the instruction. The instruction has to say that R1, the displacement 100, and R4 go in the proper places in the instruction. When you design an instruction, it will have a slot for the source register, a slot for the destination register, and in this case some slot for the displacement as well, and how big the displacement can be decides how big the instruction is. Displacements vary a lot. It turns out that a displacement of 0 is the most frequent. And again, here you do the same thing: you take benchmark applications, study them, and collect statistics that give you the histogram of displacements. Over an entire benchmark suite I may see hundreds of billions of load and store instructions; give me the displacement histogram for them. From that, it turns out that 0 displacement is the most frequent, and most displacements are usually positive; you seldom have negative displacements. But again, you have to keep in mind that these statistics are all conditioned on what the compiler is doing, because remember, the compiler is actually generating these instructions. Large displacements requiring 14 or more bits are normally negative, so they need sign extension; we will talk soon about what that means. Those who remember a little bit about binary addition might recall what sign extension means and why it is needed, but anyway, we will talk about that in the next lecture. Storage allocation, and hence the programming language, may also influence the displacements; that is something you have to keep in mind. But anyway, the take-home point here is that 0 displacements are frequent.
Most displacements are positive, and positive displacements are usually small. What this means is that you are fine if you devote a small number of bits to the displacement. We will see in the next lecture exactly how many bits MIPS devotes to the displacement. Any question on this; is it clear? Similarly, for the immediate mode the same question arises: how big a constant can I put in my instruction? Here, of course, there is a very clear tradeoff. If I find that my program has mostly large constants and I cannot put them in the instruction, that means I have to store them in memory, which requires extra instructions to bring them into the processor. Whereas if I can put all of them inside the instruction, then when the instruction is fetched, the value comes along with it. So, the immediate mode is used for moving constants, and also in arithmetic operations and comparisons. So, two major questions. First, which instructions should support the immediate mode? Clearly, arithmetic instructions should support it, because I will definitely require manipulating constants: adding a constant, multiplying by a constant, perhaps dividing by a constant, and logical operations on a constant, like logical AND, OR, XOR and so on. And I should probably also require it in the memory instructions, because if I want to specify an absolute address, I should be using the immediate mode. Second, how many bits should be devoted to the immediate? The same question as before: how big a constant can I put in an instruction? Load immediate and ALU immediate instructions are the most frequent. These load immediates are not really loads; they just move a constant to a register and do not access memory. Again, if you look at the benchmark statistics, you will find that small immediate values are heavily used in arithmetic.
You can also think about the programs that you have written in your life and ask: how many times have I used a very large constant? Probably you will not find too many cases. In most cases you will be using small constants; in fact, 0 is a heavily used constant. Large immediate values are normally used for address constants, like some global offset and so on, and those are usually kept in registers anyway. We will talk later about which registers are used for keeping these global addresses.