Last time, we talked about these addressing modes. So, any questions on these? I want to mention a couple of things before moving on. We also talked about how big the displacement should be, how big the immediate value should be and so on and so forth. So, there are a couple of things. The first thing is that, as we said, these are essentially register operations. They have nothing to do with memory, whereas the rest would go and access memory in some way or the other. And the special case was this one, register indirect, which accesses memory through a pointer held in a register. Now, as we go along, we will find that these two are usually the most frequent addressing modes that you observe in programs. So, a natural question that arises is, under what circumstances does the compiler generate such instructions? Because the compiler always has the ability to generate the register indirect instruction: you can always put the address in a register and generate this instruction to access that particular memory location. So, the question is, why should the compiler need the displacement instruction? That is a very valid question, and one we will try to answer. The second question, which I want to address first, is this: if you look at these addressing modes, the way you address memory, particularly if you take these two, you find that we have a very compact way of representing the address in the instruction. Essentially, we are putting the address in a register, so we do not have to put the address in the instruction anymore. As long as I put register index one and register index four, my instruction is complete, along with the opcode of course. So, if I backtrack a little bit, we were talking about this classification here, the classification based on the number of memory operands, the number of register operands and so on and so forth.
And we said that if we have memory operands, your instruction size might increase because you have to put a big address in the instruction. But of course, that is not exactly correct. Now you can easily see why, because we have other, compact ways of representing addresses. We can put the address in a register and just put the register index in the instruction, and that is enough. Of course, if you are using absolute addresses in your instructions, then you will have to put the full address. But actually, nobody does that. That is a very infrequent way of accessing memory, and the obvious reason should now be clear: it would increase your instruction size. You would always put the address in a register and then generate the corresponding instruction, like one of these. So, only here did we talk about putting the absolute address, but you will seldom find it used in practice. So, the second question was, why should the compiler generate this instruction, the displacement-based address, when the compiler can always generate this one? You can put the address in a register and generate this instruction. So, this one you will find very often when accessing the stack. Usually you have a stack pointer, which is a register, and if you want to access something in the stack, you generate a displacement here from the stack pointer, negative or positive depending on which way your memory grows. Similarly, if you have some pieces of global data, usually global data are stored in a specific region of memory, and the base of that region is pointed to by another register called the global pointer. So, if you want to access some data in the global region, you essentially generate a displacement with respect to the global pointer. These are the cases where you will find displacement-based instructions getting generated by the compiler. And of course, as I said, register indirect is a special case of that with a displacement of 0.
For example, if you want to access the top of the stack, you generate this instruction with R1 being the stack pointer. Sir, is this a register-register type of ISA? No, here, for example, we are allowing you to do ALU operations directly on a memory operand. So, this would be like a memory-register ISA. Sir, is it simulating memory-register with register-register, or is it a memory-register? No, it is a memory-register actually. One operand is a memory address. It is only that we have a compact representation of the memory address. So, you are using a register to store the memory address. That is it. There is no other difference. Whereas, just to remind the others, if you have forgotten, if you only allow load and store instructions to have memory operands, then that would essentially be a register-register ISA, where your ALU operations operate only on registers. Sir, as you mentioned, we put the memory address in a register. Yeah. So, following that, the displacement addressing mode should also be done like the index addressing mode, because you can store the displacement in a register and then use that. Sorry, say that again, what is the question? So, for displacement, you put the displacement in the instruction itself. Right, yes. So, you are saying, why not use this one? Yeah, that is a very good point actually. I was exactly about to mention that. So, why would I need this one? If I have this one, I should be able to exactly synthesize that. The point is that your displacement would be limited to a certain size in the instruction, because, see, the displacement will actually appear as a constant in your instruction. So, there will be a limited number of bits given to you for encoding displacements. Suppose you want a very large displacement. How would you do that? Well, then you have to use this one. You have to use the index addressing mode.
You put the displacement in R1, put the base in R2, and generate this instruction. Does that answer the question? Anything else on addressing modes? Any other questions? So, consider these last three cases: auto increment, auto decrement and scaled. First of all, let me remind you what this is. Essentially here, we are taking the value at the location pointed to by R3, adding it to R4, putting the result in R4, and also incrementing R3. That is what we said. Similarly, with auto decrement, you would decrement R3. And in the scaled case, we would essentially be adding R2, R3 and 100 and generating a memory location. In certain processors, you will find that there is one more extra parameter that is usually programmed into the processor before you invoke an auto increment instruction, and that is how much you should increment by. Here, we are assuming that by default you increment by one. That need not be the case. So, usually such processors would have an instruction, executed before the auto increment, which programs a special register that is implicit here, and you would increment by that amount. The reason is that you might have an application where your array elements are not exactly one byte. So, here we said that you could initialize R3 to the array base, and you just want to add up all the elements of the array cumulatively. In that case, you would just execute this instruction in a loop and R3 will automatically get incremented. Now, if your array elements are not one byte, you would want to increment by a larger amount, and that is exactly where this becomes handy: the flexibility of specifying how much to increment by. In digital signal processors you see two special addressing modes very frequently. One is usually called the modulo or circular addressing mode. Here you have auto increment or auto decrement with an automatic reset when the end of the buffer is reached. This is very useful for streaming data.
You are streaming through data and when you reach the end, the pointer is automatically reset to the beginning so that you are prepared to stream again next time. There is another addressing mode found in digital signal processors called the bit-reversed addressing mode, specially designed for the fast Fourier transform. I will not go into the details of this. If you know about the FFT, you will be able to understand why it is needed. If you do not know about the FFT, I would suggest that you go home and read about it; you will understand why this is actually needed. Bit-reversed addressing mode. Now, the important point is that compilers may never be able to generate these addressing modes, because identifying the situations where they apply is very hard. As I told you last time, the compiler will have a hard time figuring out that an FFT is going on. You are given a large piece of code and it is very difficult for the compiler to figure out that, oh, this is an FFT, so I should be generating bit-reversed addressing. That is almost impossible. So, essentially what happens is that you have hand-assembled libraries which actually hard-code these addressing modes in the instructions. You do not rely on the compiler. That said, as we go along, DSP applications get larger and compilers become more and more important, because we are of course gradually improving DSP compilers as well. So, in summary, addressing modes should be simple. Displacement and immediate are the popular ones. Again, I will try to remind you that the immediate addressing mode does not refer to accessing memory; it refers to operating on a constant. So, it is not actually accessing memory. Remember that. And you can always emulate register indirect with displacement, because register indirect is just a special case of displacement, with displacement 0. Your addressing modes must match the ability of the compiler to use them.
This is very important in the desktop and server market, because here hand-coded assembly libraries are just out of the question, given the wide variety of programs we run. Displacement should be 12 to 16 bits. This has pretty much become the de facto standard today, based on statistics collected from applications. We discussed this last time also: displacements are usually small, so this turns out to be enough in most cases. Immediate should be 8 to 16 bits. That again is based on statistics collected from applications. In reality these use the same space in the ISA encoding; the immediate and the displacement actually use the same bits in the instruction encoding. We will look at concrete examples when we talk about MIPS. Yes. At least one register operand, you mean? Yes. The reason is that when you are designing the ALU, for example, usually the ALU generates its result in a register. That is the reason why you want to have at least one register in the input. So, you could hide that as well. See, it is all about what you expose to the outside world. Underneath, what is really happening is that whatever memory operands you have, ultimately the memory is accessed and the value is brought into a temporary register inside the CPU. Otherwise you would not be able to invoke the adder, because the adder takes inputs from two registers and puts the result in another register. You could hide all these three temporary registers from the programmer. The programmer would see the instruction accessing two memory locations and getting the result back in another memory location. That is possible. So, the next question is how big my operands should be in my instructions. This is usually specified in the instruction opcode. Here again we are talking about the memory instructions, because when you have register instructions, of course, your register size fixes the operand size automatically.
So, usually these are specified in the instruction opcode, like we talked about last time: bytes, half words, words or double words. In this class we will follow the convention that a double word is 8 bytes, 64 bits; a word is 32 bits; a half word is 16 bits; and a byte is 8 bits. Characters are usually bytes, but 16-bit Java Unicode is also popular. Integers. So, again, what I am trying to say is that a high-level language program will have data types: characters, integers, reals, floats, depending on what language you are programming in. Now, the compiler will have to compile each data type and map it to some operand type in the machine, actually in the instructions. So, there has to be a mapping between your high-level language data types and your instruction data types. For example, when you are operating on a character, your instruction should actually be a byte instruction; the compiler should generate some instruction that manipulates bytes. Integers can get mapped to half word, word or double word depending on what data types the programming language actually offers, because that varies a lot. For example, if you are programming in C, you have short ints, you have ints, you have long ints, and there are various other flavors. So, it all depends on how each is exactly interpreted. Essentially, the compiler will know the meaning of these things. When you say short int, the compiler will know what it means, and it automatically generates instructions of the corresponding data size. Floating point numbers are usually expressed in the IEEE 754 format. These are standards actually: 32 bits or one word for single precision, 64 bits or a double word for double precision, and there is another 80-bit extended precision. So, these are again standards. Whenever the compiler sees that you have a double variable, it will probably try to generate a double precision 64-bit instruction, if the machine supports it.
Otherwise, that value will be broken down into two single precision 32-bit numbers, and the corresponding instructions will interpret these two together. When we look at MIPS, 32-bit MIPS, we will see how it actually manages 64-bit data types. For business transactions, accuracy of decimal arithmetic may be important. In fact, it is very important. So, usually there you would use packed decimal or binary coded decimal. These are supported by x86; you can actually have BCD operand types in your x86 instructions. So, what is binary coded decimal? Does anybody know? What is BCD? It is just a number system, like your binary or hexadecimal. What is BCD? For each digit we explicitly write its decimal encoding. Decimal, exactly right. For each digit. So, for example, if I write the number 23, its BCD encoding would be 00100011. You need 4 bits per digit, right? To cover 0 to 9. So, that is the BCD encoding of 23. So, why would you do that? Here is an example. Suppose you want to express the decimal number 0.1 in binary. How do you do that? What is the procedure? Multiply by 2. So, what is it going to generate? 0.1, then 0.2, then 0.4, then 0.8, then 1.6. Then 1.2. So, you are back here, and it recurs from this point, right? So, it is going to be a recurring binary representation. It is not going to terminate. But you have to terminate it somewhere, because your machine has only finite precision; it can only store a finite number of bits. The problem is that suppose you make a transaction of 20 rupees 10 paisa. You convert it to binary inside your machine. You keep losing money slowly. Over time, it will build up to a huge amount actually. So, that is the problem, and that is why business transaction machines, like the ones used in banks and ATMs, would probably never use the plain binary representation; they would routinely use BCD or packed decimal. Is it clear to everybody?
The problem with using binary is that your transactions will be inaccurate and you will start losing money, or maybe gaining money, depending on which way you round off. Of course, I showed you how to represent an integer; you can represent a fraction also in BCD. You will have to put the decimal point somewhere, and there is an encoding for that. Fixed point representations are common in GPUs and digital signal processors. GPUs, by the way, stand for graphics processing units, and digital signal processors are also very common. Here essentially what you do is you have a separate mantissa and exponent in your representation. So, what are my operations? That is the final question, right? The simplest operations are used most frequently. For example, the top 10 x86 instructions were found to be sufficient to cover 96 percent of the entire SPEC Integer 92 suite. It is an old statistic, but it has not changed much. Commonly supported operations are arithmetic and logic, load store operations, control transfer like the branch operations, system calls for talking to the operating system, to talk to your devices like keyboards, displays and all these things, and of course the floating point arithmetic operations. Decimal arithmetic is supported by x86, and string operations are again supported by x86. Graphics operations are often offloaded to a co-processor; for example, graphics processors would support special graphics instructions. Media and signal processors normally operate on narrow-width data. Here, by media, I mean multimedia extensions; those instructions also come into the same category. So, it is possible to execute multiple such operations in parallel. This is often called single instruction multiple data operation, SIMD. For example, a 4-wide SIMD unit in 32-bit Intel processors would actually have 128-bit wide registers, so that it can operate on four 32-bit operands in parallel.
Digital signal processors normally support saturating arithmetic. By saturating arithmetic, I mean that suppose you have a 4-bit register and the current value is, let us say, 14, and you add 2 to it. It will saturate at 15, not actually overflow and make a mess. The reason is that taking an exception on overflow is out of the question, especially in real time applications. In real time applications you just cannot even think about an exception, because you will miss all the deadlines and that will be a big problem. So, you saturate at the maximum value. The DSPs also use something called multiply accumulate operations, MACs. These are essentially fused multiply-adds, and they are very important when you are, for example, calculating the inner product of two vectors. Essentially, what you are doing is summing x_i times y_i. So, the instruction multiplies and adds together; it fuses the whole thing into one instruction. Let me show you how it actually operates. Initially you do x_0 times y_0 and put it into some register z. After that, every iteration does a multiply and an accumulate together in just one instruction, not two instructions. That is the MAC. So, essentially you can complete an inner product of two N-dimensional vectors using essentially N instructions; you will not require 2N operations. MACs per second is probably the most important metric used for digital signal processors. A MAC takes less time than doing a separate multiply and then an add. Its latency may not be the same as a single multiply or a single add, probably a bit bigger than each of these, but taken together it is smaller than doing the two separately. We will spend some time talking about arithmetic and logic operations, although I am sure there is not much to talk about; you all know about these. Add, subtract, multiply, divide, and, or, nor, xor: all these are arithmetic and logic operations. Load store, again not much to talk about: load from memory to a register, store from a register to memory, right?
So, let us talk a little bit about control transfer. There are four major types of instructions. Conditional branches are the most frequent, about 24% of all instructions, which basically means that roughly every fourth instruction is a branch. So, they are fairly frequent. Unconditional jumps can be direct or indirect. Direct unconditional jumps actually carry the jump target address in the instruction, so from the instruction you know where to go. Indirect unconditional jumps take the target from a register, so you have to read a register to know where to go. Procedure calls: these are also unconditional jumps, because there is no condition on them; you call a procedure, you go there. The only difference is that you have to do some extra work to make sure that you return to the right point when you return. Then there is procedure return. This is also an unconditional jump, but it again does some extra work to make sure that you restart your execution from the point where you called the procedure. So, the natural question that arises in all these cases is how to specify the target, because that is the most important thing when you talk about a branch instruction. Otherwise there is nothing very interesting, because your opcode will tell you that, oh, this is a branch instruction. Of course, conditional branches have one extra thing, namely the condition. So, let us first see how to specify the target. If the compiler can figure out the target address, it includes it in the instruction, as simple as that. By the way, procedure calls may be of two types; maybe I will talk about this when we talk about the target, so let us hold on to that for some time. The most frequent choice is PC-relative targets.
These are essentially position independent, and that reduces the linker's burden, because what the instruction says is: if you are here now, jump, say, 16 more instructions down. That is a PC-relative target. So, finally it does not matter at what address this code ultimately sits, because the offset remains unchanged. The compiler does not have to worry about where this code will finally sit, and it does not have to generate absolute addresses. That is why you often find that compilers generate this kind of address, PC-relative, because it is the easiest one to do, and it decouples the compiler from the linker, because the linker finally decides where the code sits. Otherwise, any addressing mode can be used. For PC-relative addressing, we actually use the immediate addressing mode, because we put the offset as an immediate value. Internally, the machine takes the immediate value and adds it to the PC to generate the target. And of course, the immediate can be positive or negative: you can go backward or you can go forward, either way. For procedure calls and direct jumps, the target is normally included in the instruction as a large constant. However, there is an exception. Procedure calls can be indirect, meaning that the compiler may not know the target when compiling the program. Can you think of any scenario? Sorry, say that again? Function pointers. Yeah, function pointers, exactly. Anything else? Any other situations? Dynamically linked libraries. Yeah; for now, though, let us stick to static binaries. Any other situations, other than function pointers, that give indirect procedure calls? Another example is virtual methods; internally, your virtual methods get compiled into function pointers. Another example is switch case statements.
Think about how a switch case statement would get compiled. How would it get compiled? You have a switch, you have a bunch of cases. Would it be conditional jumps? What is a clean way of compiling switch case? Notice that there is a restriction on your case argument: it has to be an integer. Do you know why? Does everybody know that case arguments have to be integers? It cannot be a floating point number, for example. Why is that? It loses precision? That is fine, but that is an implementation detail. What would be a clean way to compile switch case? Would it be a series of conditional branches? I will tell you what a compiler normally does; those who are taking a compiler course this semester will probably learn this. Essentially, the compiler builds a table. Each case target becomes a table entry: the case 0 target goes in the first row of the table, case 1 in the second row, and so on. When you execute the switch, at runtime it resolves what the switch value is and jumps to the target stored in that particular row of the table. So, essentially it is an indirect jump. Well, it is not exactly a procedure call; it is an indirect unconditional jump. You can think of it that way. At runtime it resolves the value and essentially changes the PC to point to the content of that row of the table. Anyway, so for indirect procedure calls, of course you cannot put the target in the instruction, because you do not know it. What you would usually do is have an instruction first to load the target into a register, and that register will be used in your procedure call. So, at runtime the value will be picked up from the register and the call will be made. Ah, yes, sorry. The case values may not be sequential. Right, yes, exactly. So, it will actually have a mapping for that also. Yes, of course, a mapping from the actual case values to the rows of the table. Okay.
So, indirect jumps and procedure returns. Consider procedure return. That is also not known at compile time. Why is that? Where should I return? I have a procedure which I am trying to compile, and at the end I have a return statement. Do I know where to return? Why not? Exactly. There can be multiple places from where this procedure is called. So, depending on your execution, it might have to return to several different places. Hence, it is not known at compile time. You can use any addressing mode other than absolute, because absolute is used only when you know the target at compile time. The simplest one is register addressing, that is, place the target in a register; that is what is used. You place the target in a register and use that register in your instruction for jumping. So, where do indirect jumps appear? Switch statements, virtual methods, function pointers, dynamically shared libraries, as somebody pointed out. Normally, the target is loaded from memory into a register at run time. Okay. How big is the PC-relative offset? 10 bits seems sufficient in most cases. So, again, the question is: I am here now, I have a PC-relative branch instruction, what is the range? How far can I go, positive or negative? A range of 1024 instructions is enough in most cases: 1024 in this direction or in that direction. So, the next component of conditional branches is the condition. How do we specify branch conditions? There are several ways of doing that. One possibility is to use condition codes, that is, to test special bits set by an ALU operation. This is exactly what is used by x86. You would first do the ALU operation; suppose you want to check for greater than or equal, you would first do the comparison.
And depending on the comparison outcome, the ALU sets some flag somewhere. Then you have a branch instruction which, depending on the status of that flag, will either branch to some location or just continue. So, it creates an implicit dependence for the branch, and that makes instruction reordering hard. The simple reason is that your branch instruction now depends on the ALU instruction: unless the ALU instruction completes, the branch cannot go. And the more problematic thing is that this particular flag that the ALU instruction sets usually does not appear as part of the instruction. It is an implicit target: the instruction implicitly changes that flag. That is the most problematic part, because if a compiler is looking at these instructions and trying to reorder them, it can easily make a mistake, thinking that, oh, these two instructions look independent, when actually the ALU instruction implicitly changes a flag on which the branch instruction depends. All these things have to be kept in mind when writing the compiler. This approach is found in x86, ARM, PowerPC and SPARC. The second option is to use the general purpose registers instead of special flags. The comparison result is put into one of the general purpose registers, and the branch checks that register before branching. Here it is explicit in the instruction: the ALU operation has this particular register as its destination, and the branch has this register as a source. So, there is a very clear, explicit dependence between these two instructions. This is found in MIPS and Alpha. And the third option is compare and branch, that is, you fuse these two things together in a single instruction. Some flavors of comparison are fused with the branch instruction. So, one instruction per branch instead of two; you save an instruction. You do not have to split it into two instructions.
You can do it in one. But of course, what flavors of comparison you can fuse with the branch will depend on the complexity of the comparison operation, because you have to remember that it is now going into a single instruction, so it had better be simple. Complicated comparisons may affect your CPI. This is found in VAX and in PA-RISC, Hewlett-Packard's RISC processor, and also in MIPS. Notice that MIPS has both of these flavors, and we will actually see both of them when we talk about the MIPS ISA. In terms of statistics, a large number of comparisons are actually against zero. This is probably the most frequent kind of comparison: you want to know whether something is less than zero, greater than zero, equal to zero and so on. Also, less than and less than or equal to are very frequent because of loop control. The loop control test is usually less than or less than or equal to: your loop says, tell me if i is less than, or less than or equal to, my upper bound. So, procedure calls. What must happen on a call? Some of you probably already know what has to happen. The most important thing is that you need to save the return address somewhere so that you can come back to the right place. Normally, it goes in a dedicated link register or some arbitrary general purpose register. This particular term, link register, is used in the MIPS world, so I want to introduce you to it; we will use it later in the course as well. But anyway, the point is that there is usually a dedicated register, or you pick one of the general purpose registers, and you put your return address there. Also, you may need to set up the parameters that are going to be passed to the procedure. Some architectures implicitly do this as part of the call, while most generate explicit code for it. Some architectures, for example MIPS, would actually generate explicit code for passing the parameters.
And exactly how you pass the parameters will depend on the architecture; we will talk about that later. You may want to pass a parameter through some register, or you may want to pass it through memory; it can be done in many ways. The caller, meaning the code from wherever you are calling the procedure, may want to save some registers so that the callee cannot destroy their contents. By callee, I mean the procedure itself. For example, I might be doing some computation and now I need to call a procedure, but I will require this computation after the procedure returns. So, I want to save the registers holding the results of this computation. This is known as the caller saving convention: the caller, before calling the procedure, saves whatever is important. Symmetrically, you can think of a situation where the callee saves any register that it wants to use at a later point in the procedure. What may happen is that the caller says, well, I do not care; let the callee save whatever it needs to modify. So, the callee, before starting, figures out: these are the registers I will probably need to modify, so let me save them first, so that I can restore them at the end of the procedure. This is known as the callee saving convention. Both of these are fine, and you can use a mix of them: the caller may save certain registers, the callee may save certain others, and so on. Of course, it may be hard for the compiler to decide what to use, because, as you often learn in a compiler course, interprocedural analysis is hard. When you are calling a function, it may be very difficult to figure out which registers the procedure you are calling will actually modify. So, that may not be easy to figure out. Most architectures today offer both, with clearly specified caller-saved and callee-saved register sets. What happens ultimately is that, to make the compiler's job easier, architectures actually specify these sets explicitly.
So, I am talking very much in terms of MIPS here; x86 actually does not have any of these notions. In the MIPS architecture, what they say is: I have these four registers which are caller-saved, meaning that before you call a function, the caller has to save these four registers. In other words, the callee is free to modify these registers without worrying about anything — that is what it really means. Similarly, there is a set of three or four registers which are callee-saved, meaning that the callee, at the start of the procedure, will save these registers, so the caller need not worry about them. In other words, the contents of the callee-saved registers will be restored at the end of the procedure, so if the caller wants to keep something, it can use these registers. (A student asks: so the general-purpose register set combines both of these, caller-saved plus callee-saved?) Yes, exactly — and that is the most flexible arrangement, giving the compiler full freedom to pick whatever register it wants. So, combining all these ideas, what you finally want is an instruction encoding. You have defined certain instructions — you know what instructions will have what operands, how much displacement you require, what addressing modes you require — and finally what you want is the encoding of the instruction: how many bits the instruction should have, what fields it should have, and so on. This has implications for code size, memory requirement, and power. All instructions have an opcode field specifying the operation — this is mandatory. When you have an instruction, you should have an opcode field which says what this instruction is: is it an arithmetic operation, a branch operation, a load operation — what is it, actually? An important decision is how many bits for the addressing modes.
The opcode part is easy: you figure out how many operations you want to support, and log base 2 of that is how many bits you require for the opcode — as simple as that. How many bits for the addressing modes, the number of registers, the number of addressing modes, and the total number of operands in an instruction all have a significant impact on code size. The number of registers determines how many bits you require to address a register, because log base 2 of that is the number of bits needed. The number of addressing modes decides how many bits you require for a displacement or any other encoding of the addressing mode. In the worst case, you may have so many addressing modes that you have to reserve special bits just to specify which mode this is: if you have 32 addressing modes, you have to reserve 5 bits in your instruction to say, oh, this one is using that mumbo-jumbo addressing mode — those 5 bits will have to be there. And the total number of operands also has to go into the instruction. For example, suppose you decide that your instructions are going to be very long, with 6 operands. If you have 32 registers, the operands alone will require 30 bits: 5 bits each, for 6 operands. So there is a limit on how many operands you can really encode. Of course, it is always tempting to have a large number of operands, because that means you can execute complicated computations in a single instruction. So overall, you have to weigh all these things against the CPI. And as I told you, the usual method is that you finally filter out 2, 3, or 4 candidate instruction sets, build a compiler for each of them, compile your benchmarks for all of them, and actually simulate all the resulting binaries.
You see the final performance and decide which one finally goes in. Too many operands and addressing modes may necessitate a separate address-specifier field for each operand — a field that says what mode is used for this operand. Oh yes, this is another thing: you can actually have the cross product. You have 6 operands and 32 addressing modes, and you can say operand 1 uses addressing mode 30, operand 2 uses addressing mode 25, and so on. Once you allow this cross product, your instruction size explodes very quickly. x86 has all these things, which is why x86 actually has variable-length instructions — the instruction length is not fixed. With few addressing modes, the mode can be encoded in the opcode itself, and usually this is what you really want. You can say, well, I will have 2 addressing modes; then, if I have 4 load operations, I will just double the number — I will have 8 load operations, with separate opcodes for the separate addressing modes. That is easy, in fact. One more thing you have to decide at this point is whether you are going to have a fixed instruction length for all instructions, or vary the instruction size across instructions. Fixed-length instructions have a fixed number of operands — like your classical 3-operand instructions — and a few addressing modes, 2 or 3, that can be encoded in the opcode. Essentially, the point is that all my instructions are going to have a constant length. That helps the instruction fetch unit a lot, because it knows that if it fetches 100 bytes, it can calculate how many instructions it has fetched, since it knows the length of each instruction. Fixed-length instructions sacrifice code size to gain in terms of simplicity and performance, because essentially you might be wasting bits to make all instructions of equal size.
Because what may happen is that some instructions could actually be encoded with fewer bits, but you pad them out to this boundary to make sure that all your instructions are of equal length. Of course, you gain in terms of simplicity and performance, because the instructions are now easier to decode: you know that, oh, I have fetched 4 bytes, which means I have exactly one instruction, and I can start decoding. On the other hand, if you have variable-length instructions, you have to keep on fetching until you have the full instruction, and it is quite difficult to know when you have a full instruction. As an example of variable-length instructions, x86 has instructions of 1 to 17 bytes — you can have all the possibilities, actually. Variable-length encoding requires complicated decoders, but it offers a lot of addressing modes and produces very compact code — that is its advantage. x86 essentially says: well, if I can encode an instruction in a byte, I am going to do that; I am not going to pad it to match some constant length for everybody. It is also possible to do a hybrid encoding: ARM Thumb and MIPS16 offer both 16-bit and 32-bit encodings. The 16-bit encoding is used for instructions with small immediates, a subset of the registers, and a 2-operand format, whereas if you want to support more operands or the full register set, you switch to the 32-bit one.
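The code-size payoff of a Thumb/MIPS16-style hybrid is easy to estimate. The 70% fraction below is an assumed illustrative number, not a measured figure for any real workload:

```python
# Rough code-size comparison between a pure 32-bit encoding and a
# hybrid where some fraction of instructions fit in 16 bits.

def code_size_bytes(n_instructions, frac_16bit):
    n16 = int(n_instructions * frac_16bit)  # instructions in short form
    n32 = n_instructions - n16              # instructions in full form
    return n16 * 2 + n32 * 4

pure_32bit = code_size_bytes(1000, 0.0)  # every instruction is 4 bytes
hybrid     = code_size_bytes(1000, 0.7)  # assume 70% fit in 16 bits
```

Under this assumption the hybrid spends 2600 bytes where the fixed 32-bit encoding spends 4000 — the kind of saving that matters in embedded systems, which is exactly where Thumb and MIPS16 were aimed.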