So, I received a few emails and a few questions at the end of last class asking whether I could spend a little more time on what a function pointer value actually is, and what I really meant by compiling switch-case statements to function pointers or indirect jumps. How many of you don't know about C pointers? Raise your hands — everybody knows about C pointers. So, C pointers are essentially a data type, and as you can guess, a variable of every data type can have a value. Pointer values are slightly special in the sense that they are treated as addresses. If a pointer variable has the value 10, that means we are talking about address 10, and if you put a star in front of the pointer, that means the value at address 10. Function pointers are just pointers; they can have values. The only special thing about a function pointer is that its value is, of course, an address — the address of an instruction. So, if you have a function pointer f and you assign it some value, say 0xabcd, and then you call f following whatever syntax is required to call through a function pointer, it will take you to that particular address, whatever address that may be, and whatever instruction is at that address, control will start executing from that instruction. Essentially, the program counter gets changed to this particular value — that is what a function pointer means. Now, why would you use function pointers? Because you might have programs like this, which say: if x is greater than 2, you want to call one function; else you want to call some other function. So, what you may do is declare a function pointer f, assign f one value in the if-branch and another value in the else-branch — say f = X in one and f = Y in the other — and then call f.
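The conditional dispatch just described can be sketched in C like this (the handler names doA and doB are illustrative stand-ins, not anything from the lecture's slides):

```c
/* Two hypothetical handlers the pointer can select between. */
int doA(int v) { return v * 2; }
int doB(int v) { return v + 100; }

/* Dispatch through a function pointer: the call target is decided
 * at run time, so the compiler must emit an indirect call. */
int dispatch(int x) {
    int (*f)(int);      /* f holds the address of a function's first instruction */
    if (x > 2)
        f = doA;        /* f now holds doA's entry address */
    else
        f = doB;
    return f(x);        /* control transfers to whatever address f holds */
}
```

Calling `dispatch(5)` runs `doA`, and `dispatch(1)` runs `doB`; the compiler cannot know which at compile time.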
So, then, depending on what the value of x is, you will either go to the location of X or of Y and start executing from there. Is it clear what a function pointer is? It is also clear that when the compiler is compiling this program, it is impossible for it to know the target of this call. It is a procedure call, that is pretty clear, but you do not know the target of the call; the target could be either of these two. Which is why it has to be compiled in a slightly different way: essentially, depending on the outcome of this branch, the value of f will be loaded into a register, and this procedure call will be an indirect call which takes its target from the register. So, that is your indirect procedure call; here I actually club them all under indirect jumps. These are unconditional jumps, by the way — this is an unconditional jump, although its target depends on some condition. Now, we talked about switch-case statements last time. Who does not know about switch-case statements? Typically a switch-case statement looks like this: you switch on some variable and you have a bunch of cases — case a, case b, case c — and maybe a default at the end. What I mentioned last time is that the compiler prepares a table that is often called a jump table. For example, let us first assume the case values are contiguous — say 0, 1, 2, 3. Then the compiler will prepare a table — index 0, index 1, index 2 — essentially an array. And what will be the contents of the array? Location 0 will contain an instruction address: the starting address of case a; the next entry, the starting address of case b; then the starting address of case c; and the default. Now, when the value comes up — let us suppose you are doing switch (x) —
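The jump table for contiguous case values can be mimicked in C with an array of function pointers indexed by the switch value (the handlers and the name JT are illustrative; a real compiler builds the table out of code-label addresses rather than whole functions):

```c
typedef int (*handler_t)(void);

int caseA(void)       { return 'a'; }
int caseB(void)       { return 'b'; }
int caseC(void)       { return 'c'; }
int caseDefault(void) { return '?'; }

/* Indices 0..2 hold the cases and index 3 the default, matching the
 * table layout in the lecture: one entry per contiguous case value. */
static handler_t JT[4] = { caseA, caseB, caseC, caseDefault };

int do_switch(unsigned x) {
    if (x > 2) x = 3;   /* out-of-range values fall through to the default */
    return JT[x]();     /* one indirect jump instead of a compare chain */
}
```

Whatever the value of x, dispatch costs one bounds check and one indirect call.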
So, what it will do is use x as an index into this array. The only problem is the default part: if x is not 0, 1, or 2, then the index should be 3, right? Internally the compiler will arrange that, and then you can just jump through the pointer. Let us name this array — let us call it JT, for jump table: essentially you use JT[x] as a function pointer and jump to that label. Now, a problem arises when the case values are not contiguous. Suppose I say case 0, case 5, case 9 — how do we compile that? Here we had contiguous values — case 0, case 1, case 2; now they are not contiguous, so how do we compile it? If I allow you to waste space, what is the simplest thing to do? I still want a jump table, but I tell you that you can waste space. Exactly: you find the range — the minimum and the maximum — and you map the minimum to index 0 and the maximum to index (max − min), so the table has (max − min + 1) entries. The only thing is that there will be a lot of default entries in the middle if the range is very large. We will see — what other option is there? Let us take concrete values: suppose the cases are −1, 5, 9. One suggestion from the class was a table of tables — a first-level mapping from the values present to a second, contiguous table. But why do we need the second table? If I allow the first table to cover the entries from −1 to 9 directly, I am already done. Another suggestion: look at how many cases there are and index by the value modulo the number of cases — suppose the table size here is 4; then we have another table of 4 entries.
So, we map values modulo 4: if 1 comes, it goes to entry 1; if 8 comes, it goes to entry 0. But 5 and 9 will map to the same entry — a collision. So what do you do? You put 5 in its place, entry 1, and you put 9 in whatever substitute slot is free. And how do you look it up? Suppose my value is 9 — how do I know that I have to go to the next slot? There are two phases here: at compile time we set up the table, and at run time we look it up. At run time, if my value of x is 9, I look up the table and land on the entry where 5 is currently allocated, but I actually have to pick up the target from the next slot. How do I know that I have to move on? Store the case value too: let each table entry carry a field holding the case value itself, so the lookup can compare and probe further on a mismatch. That is fine. But how many collisions am I going to have — how many entries do I look at in the table, in the worst case? As many as the number of cases. So, that is the hash-table approach. Can I do anything else to reduce the worst case? Of course, on average a hash table does well, but think about what this problem actually is: I have a value of x and a set of case values — it is a search problem. So, what do you do in a search problem — what is a good algorithm?
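The hash-table idea being discussed — each slot storing the case value so a collision can be detected, with probing into the next free slot — can be sketched in C. The case values 0, 5, 9 follow the lecture's example; the handlers and table layout are illustrative assumptions:

```c
typedef int (*handler_t)(void);

int h_0(void)   { return 10; }
int h_5(void)   { return 15; }
int h_9(void)   { return 19; }
int h_def(void) { return -1; }

/* Each slot stores the case value alongside its target, so a lookup
 * can detect a collision; table size 4, index = value mod 4. */
#define TSIZE 4
static const struct { int key; handler_t fn; } tab[TSIZE] = {
    /* 0 -> slot 0, 5 -> slot 1; 9 collides with 5 at slot 1 and is
     * placed in the next free slot (linear probing). */
    { 0, h_0 }, { 5, h_5 }, { 9, h_9 }, { -1, 0 /* empty */ },
};

int hash_switch(int x) {
    int i, slot = ((x % TSIZE) + TSIZE) % TSIZE;  /* C's % can be negative */
    for (i = 0; i < TSIZE; i++) {                 /* probe at most TSIZE slots */
        int s = (slot + i) % TSIZE;
        if (tab[s].fn && tab[s].key == x) return tab[s].fn();
    }
    return h_def();                               /* no key matched: default */
}
```

Looking up 9 starts at slot 1, sees the stored key 5, and probes on to slot 2 — exactly the worst case the discussion is about.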
Binary search: I can sort these case values, put them in a table — which, as you mentioned, stores the case value in a field of each entry — and do a binary search on it. All right. So, that was the digression; I just wanted to give you a little more information about what really goes on. The point here is that it is going to be an indirect jump: at compile time I do not know where to go, because I do not know the value I will index with, so ultimately it will be an indirect jump. We also talked about branch conditions; this one we covered too. I also had a few questions about caller-saved registers and callee-saved registers. This is fairly simple, actually. Essentially, the point is that we are talking about two functions, f1 and f2: this is the body of f1, at some point it calls f2, and we have the function f2 somewhere as well. So, f1 is the caller of f2, and f2 is the callee. What the first convention says is that if f1 wants certain register contents to be preserved after f2 returns, then f1 has to save those registers itself: these are called caller-saved registers. Before it calls f2, it saves those registers, because it cannot expect their contents to be preserved when it comes back. Symmetrically, f2 knows the status of the caller-saved registers: it knows it can go and overwrite or use those registers without saving them. So, that is a two-way contract: f1 saves what it cares about, and f2 can use those registers without saving. Callee-saved registers, on the other hand, are essentially those registers which the caller does expect to be preserved. Of course, the caller could save them itself, but in a different convention — the callee-save convention — you say: I will put this responsibility on the callee, because it is the callee that wants to modify these registers —
so it should save them on the caller's behalf. These are the two conventions: the callee-save convention and the caller-save convention. Most architectures I have seen — MIPS, for example — support both, so the compiler can choose whichever it wants. We also talked about instruction encoding and code size, fixed-length and variable-length instructions. IBM CodePack is an optimization for the PowerPC instruction set which does a run-length compression of PowerPC instructions on the fly. Does anybody know what run-length compression — run-length encoding — is? If I have a continuous stream of n 1s, it gets replaced by n followed by a 1; essentially, instead of n bits you store the run length plus 1 bit. That is the style of compression of PowerPC instructions implemented by IBM CodePack, and it decompresses while filling the instruction cache, because the instruction cache does not know what a compressed instruction is: before you bring something into the instruction cache, you have to decompress it. You can see the trade-off now: it takes time to decompress — it is not a zero-penalty scheme. That means when you are bringing in an instruction, until it is decompressed you cannot start using it, so you lose some cycles there. What do you gain in return? Memory footprint: you can accommodate bigger code in a small amount of memory. And how do you handle branches? That is a big problem here, because suppose you encounter a branch instruction which says you should go to some target. How do you figure out where this target is? The target will be somewhere else in the compressed code, because the compiler does not know about this compression — it is on-the-fly compression; the program gets compressed as it runs.
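As a toy illustration of run-length encoding on a byte stream — this shows only the general idea; CodePack's actual format is considerably more involved:

```c
#include <stddef.h>

/* Encode src[0..n) as (count, value) pairs into out; returns the
 * number of pairs written. A run of k equal bytes becomes just
 * 2 bytes: its length and its value. */
size_t rle_encode(const unsigned char *src, size_t n, unsigned char *out) {
    size_t pairs = 0, i = 0;
    while (i < n) {
        unsigned char v = src[i];
        size_t run = 1;
        while (i + run < n && src[i + run] == v && run < 255)
            run++;
        out[2 * pairs]     = (unsigned char)run;  /* run length */
        out[2 * pairs + 1] = v;                   /* repeated value */
        pairs++;
        i += run;
    }
    return pairs;
}
```

A stream of five 1s followed by two 2s (7 bytes) compresses to the two pairs (5,1) and (2,2) — 4 bytes.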
So, the branch instruction actually holds the uncompressed target, but the program has been compressed on the fly — how do you figure out where the target now is? What they do is use a dynamic table that grows gradually, mapping branch targets to compressed addresses. Is the problem clear? The branch targets that go inside the instructions refer to the uncompressed code, whereas after compression each target has moved somewhere else; you have to figure out where the compressed target is, and that is done with the table, which you prepare on the fly as you encounter branches. This also takes time, that is pretty clear: whenever you encounter a branch, you lose some time looking up the table to figure out the target. However, this is essentially a first-time penalty, because from the next time onward you look up the table and know immediately where to go. Roughly 10 percent performance loss for 35 to 40 percent code compression — that is what they reported. So, in summary: for code size, memory, and power, pick a variable-length, hybrid, or narrow encoding, because variable length gets you very compact code, as we discussed last time. If an instruction can be encoded in one byte, it takes just that one byte; it is not expanded to the maximum instruction length, whereas with a fixed-length instruction encoding you would expand the smallest instructions to be equal to the largest one. So, if you are really worried about your code size, your memory footprint, your power consumption — and these are linked: if you use more memory, you burn more power, that is obvious —
you should pick a variable-length or hybrid encoding. We also talked about having two different encodings depending on the instruction type; the last two approaches are somewhat popular in the embedded market — we discussed the 16-bit and 32-bit encodings last time. x86 is the last standing example of a variable-size ISA. However — and I put Pentium here because we have all heard of Pentium, whereas you might not have heard the code names of the more recent Intel processors — the point is that every Intel processor today internally converts these variable-length instructions into fixed-size micro-operations; you may have heard this term, micro-operations. That is what the internal architecture actually sees; the internal architecture does not see an x86 instruction. They all get translated internally, and that is done purely for ease of implementation. So the point is that if you take any processor today, the internal architecture handles fixed-length instructions. Whatever the instructions look like outside, to the compiler, what happens under the hood may be very different. If you are really worried about performance, pick a fixed-length encoding: it saves a lot of cycles, it makes your hardware simpler, and simpler is faster. In fact, almost from the first day of the history of computing, there has been a debate between RISC and CISC. RISC stands for reduced instruction set computers, whereas CISC stands for complex instruction set computers. There is as such no formal definition of these two things, but as you can guess from the names, RISC computers take a very minimalist view of computing: they pick the minimal set of instructions that are enough to carry out all possible computations.
So, for example, RISC will not try to offer you compound instructions that do many things in a single instruction; it will never do that. The attitude is: we will implement only what we can implement cleanly — we will support only a minimal set of instructions. CISC, on the other hand, goes further and implements very complicated operations as single instructions — for example, as we discussed last time, copying a string from one location to another, which can equally be emulated by a sequence of simpler operations. It is a topic of long-standing debate, especially since the most commercially successful ISA, x86, is in fact a CISC. People will often tell you: yes, RISC is good, it is minimal, it has beautiful properties — but who won? CISC, right? Anyway, here are some characteristics of RISC. I put them here because, as I just told you, internally the architectures of today all handle RISC-style instructions: the ISA exposed to the compiler may be CISC, but internally the instructions get translated into RISC micro-operations. I put them here also because we will use MIPS, one of the most popular RISC instruction set architectures, and we will base our micro-architecture design — the pipeline design — on a RISC architecture, because that is what ultimately goes into every processor today. So, what are the characteristics? A small number of fixed-length instructions: here the philosophy is to provide primitives to the compiler, not solutions — let the compiler implement the solution. A small number of addressing modes and many fast on-chip registers, to make it easy for the compiler to implement the solution. If the compiler has too many options, it has difficulty picking the right one, and there is a higher chance that it makes a mistake in the choice.
So, let the designer figure out what the good options are and offer only those; then the compiler will more or less always pick a good option. And of course, one cannot overemphasize having many fast on-chip registers — they genuinely improve performance. Equal time to execute instructions of the same type: this is again very important for pipelining and other hardware optimizations, and the same philosophy applies here. If instructions of the same type take the same amount of time, the compiler can make its choices quickly and easily: when there are two possible implementations of the same computation, it should be easy for the compiler to select one. The best standing example of CISC is x86; however, today's Intel processors convert the x86 ISA into RISC-like micro-ops for aggressive pipelining. So anyway, the point is that when you are designing an instruction set, this is the bottom line: help the compiler writer. Because if you cannot help the compiler designer, you are going to have a shabby compiler; if you have a shabby compiler, your processor is going to run shabby code; and if your processor runs shabby code, it is going to show bad performance, and that looks bad. If people say, well, look, your processor is so bad — the problem may actually be that you chose a bad instruction set, which is why a good compiler could not be designed. So the compiler is very important today: your great architecture may dramatically lose in the market if a good compiler cannot be designed for it easily. Simplify trade-offs between alternatives, and offer instructions that bind compile-time constants — that is, avoid interpreting them at run time. What this means is that in your instructions there should be fields to carry constants, like the immediate values that we talked about.
If you do not have those, what ends up happening is that a constant in your program gets repeatedly loaded from memory into a register; that is not good — it hurts performance. So have some way of carrying compile-time constants, meaning you should have space for them in the instruction itself. So: less is more in the design of an ISA. That is the moral of the day. The instruction set architecture is the first step in designing a processor. Decide what instructions are important; this may change from time to time during the early phase of the project, but it should be finalized as soon as possible so that everyone has a fair idea of what to build. Late additions or deletions in the ISA may still be needed due to complexity or performance, because it may happen that — as I told you, architects are dreamers — the architect dreams up certain nice things, and at a later phase of the design it goes to the circuit designers, who say: I cannot implement this; it is impossible, it is absurd. At that point, of course, the architect will push as hard as he or she can, and it may turn out that it really is impossible; then it comes back to the architect: go and modify the instruction — either delete it or break it up into smaller ones, something like that. That may happen, but it should be avoided as much as you can, which is why an architect should have some idea of how complex the circuit for an operation is going to be. Any questions on instruction sets? Next comes a case study: we will walk through a subset of the 32-bit MIPS ISA, just to substantiate some of the ideas we discussed, and we will also get to see one concrete instruction set — what it has and what it does not. But before that, any questions — anything you want to ask, anything you want to know?
So, there are four families of the MIPS instruction set — MIPS 1, 2, 3, 4 — such that MIPS 4 is a superset of MIPS 3, MIPS 3 is a superset of MIPS 2, and so on and so forth. We will focus on MIPS 1 only, which is a 32-bit ISA. So what do I mean when I say an ISA is 32-bit? First, what does ISA stand for? Instruction set architecture. Now, when I say it is a 32-bit ISA, what is 32 bits wide? Instruction size? Any other ideas? Register size? Anything else? Memory size — no, the memory size being 32 bits makes no sense. Address size? Physical or virtual? She says physical, you say virtual, so I put down both. Anything else? The data bus? The memory bus — the bus that carries bits between memory and the processor? So we have collected: instruction size, register size, address size — physical or virtual, and let me add one more dimension, instruction address or data address — and the memory bus width. Let me give you the correct answer: it is the register size. A 32-bit ISA means the registers are 32 bits wide, and that automatically decides the width of the data path within the processor, because the data paths originate at the registers and go through your caches; how wide the registers are decides how wide the data path needs to be. On a 32-bit machine it is meaningless to have a 64-bit data path, that is pretty clear. It can have a narrower data path — that is possible, but it comes with a performance cost, because to load one register you then have to switch that bus twice. It has nothing to do with the physical address size: I can run a 32-bit processor with an arbitrary amount of physical memory, I don't care. However, the register size does decide your virtual address size: if your registers are
32 bits wide, you cannot generate larger virtual addresses, because virtual addresses are generated by program code, and program code takes its data from registers. Those virtual addresses then get translated into physical addresses depending on how much physical memory is installed on the computer — and the physical address may well be more than 32 bits; that depends. The memory bus has nothing to do with this either: how much data you carry from memory to the processor in one go is completely orthogonal to how many bits your registers or your addresses have. I can have a very narrow bus and decide to switch it multiple times to fill up one cache line — that is okay; I can have a very wide bus that fetches four cache lines together — that is also fine; it has nothing to do with the ISA width. And the instruction size has nothing to do with it: x86, for example — and you may not have known this — has instruction sizes ranging from 1 byte to 17 bytes, yet it implements a 32-bit ISA, and it also implements a 64-bit ISA, which is a different case. So do not make a mistake on this; it is basic — as a computer scientist you should know what a 32-bit ISA is. All right, so MIPS 3 onward are fully 64-bit ISAs. The textbook has an overview of that, and I have also posted the MIPS 4 instruction set manual online; if you are really curious to know what is there, you can go through it. It is a thick manual, but it covers the entire set of instructions, because MIPS 4 is the superset. MIPS is a load-store register ISA. What does that mean, this phrase load-store register ISA? It says certain things: the ALU instructions operate only on registers; memory can be accessed only through loads and stores; and remember that it is big-endian — I hope everybody remembers what big-endian means. And there is no partial register access — well, almost true, not exactly: there are 4 instructions
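Since byte order came up, here is a quick way to check a host's endianness in C; on a big-endian machine such as classic MIPS, the most-significant byte of a word sits at the lowest address (this sketch just inspects whatever host it runs on):

```c
#include <stdint.h>

/* Returns 1 on a big-endian host and 0 on a little-endian one, by
 * looking at which byte of a 32-bit word lands at the lowest address. */
int is_big_endian(void) {
    uint32_t word = 0x01020304;
    const unsigned char *p = (const unsigned char *)&word;
    return p[0] == 0x01;   /* big-endian stores the MSB first */
}
```

On an x86 machine this returns 0; on a classic big-endian MIPS it would return 1.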
that allow you to do a partial register access in MIPS; we will talk about them later. So it follows the RISC philosophy, and the emphasis is on efficient implementation. Make the common case fast: it takes Amdahl's law as it is and tries to follow it at every step — we will find it doing this again and again — and it does not spend hardware on things that are rare. Simplicity: provide primitives, not solutions. We will come across this philosophy over and over in this course. Here is a quotation from Tony Hoare: a system can be made so simple that it obviously has no bugs, or so complex that it has no obvious bugs. I just wanted to put it here, although it has nothing to do with MIPS as such. The point is that your design philosophy should be to keep things simple; do not deliberately make them complex — there is no point in doing that, and essentially you are putting the burden on the verification engineers of figuring out whether the implementation is buggy or not. So, just as when you start learning a programming language, you start learning the MIPS ISA with the data types. Given that it is a language for a processor, we have nothing but bit strings. What are the bit strings that are allowed? A byte is 8 bits, as you know; a halfword is 16 bits; a word is 32 bits; and a doubleword is 64 bits. These are the only 4 data types we have. Integers are represented in two's complement — if you have forgotten this, you should brush up on it, although we will not refer to it much here. Floating point supports single and double precision, compliant with the IEEE 754 standard. So what is the storage model? It has 32-bit addresses. When I talk about addresses in the context of an instruction set architecture, I mean virtual addresses; it has nothing to do with physical memory, because ultimately this MIPS processor will be installed in billions of computers with a billion different memory capacities, and that will decide the physical addresses. Here we can only talk about the virtual address size, which is
determined by your register size — by the ISA width. So it supports 32-bit virtual addresses, which gives each process 4 gigabytes of addressable virtual memory. There are 31 freely usable 32-bit general-purpose registers — actually there are 32, but one of them is hardwired to the constant 0 and you cannot change it. They did this because they took Amdahl's law to heart: the constant 0 is used so much that it deserves a dedicated hardware register. It does not make sense to keep generating the constant 0; it is so popular that there is a register that is always 0 — whenever you need it, use it; you never have to generate the value. On the other hand, in x86, does anybody know the popular way of generating 0? Those who have browsed x86 code should have seen it. That is an interesting question: what operations can generate 0? Subtracting a register from itself — anything else? XOR. That is usually the popular way of doing it: XOR a register with itself, because XOR is usually much faster than a subtraction. Subtraction is basically an addition, and if you remember ripple-carry addition, the carry — for subtraction, the borrow — ripples through, and it may take time to get the final answer; XOR is much faster. Anyway, MIPS does not have to worry about this, because it gets the value 0 for free — well, not quite for free: it costs a register. The floating-point side has 32 32-bit registers, and we will see how it emulates double precision with them. Writing to the integer register $0 does not change it; such instructions are essentially no-ops — you can generate instructions whose destination is $0, and when they execute, the hardware will simply discard the result. The program counter is incremented by 4, except for branch and jump instructions, which means instructions are 32 bits in size; in fact, all 4 families have 32-bit instructions. So what does that tell you?
The instruction size has nothing to do with your ISA width: MIPS 4, the 64-bit ISA, for example, has 32-bit instructions. There are 2 special registers, HI and LO, for storing multiply and divide results. If you multiply two 32-bit numbers, you can get up to a 64-bit number — that is the largest you can get — so the result gets split across these 2 registers; together they hold the 64-bit multiplication result. When you do a division, you get a quotient and a remainder, and the remainder and the quotient also go into these 2 registers. Floating-point registers are paired for double precision: the pair f2n and f2n+1 is accessed by the name $f2n. That is, $f2 specifies 64 bits held in f2 and f3, with the least-significant word in f2 — remember that: f2 holds the least-significant word of the 64 bits. Any question on this? Are the HI and LO registers separate? Yes, these are extra, special-purpose registers, hardwired so that they receive results only from the multiplier and the divider. So, essentially you have 32 general-purpose registers, one of which is always 0; you have the program counter; you have HI and LO; and you have 32 floating-point registers, which you pair up when you want double precision. Any question? So, how do you do computation on these data types?
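What the HI/LO pair holds after a multiply can be mimicked in C with 64-bit arithmetic (mult_hi_lo is an illustrative name, not a MIPS API):

```c
#include <stdint.h>

/* Emulates what the MIPS 'mult' instruction leaves in HI and LO:
 * the full 64-bit product of two signed 32-bit operands, split
 * into two 32-bit halves. */
void mult_hi_lo(int32_t a, int32_t b, uint32_t *hi, uint32_t *lo) {
    int64_t prod = (int64_t)a * (int64_t)b;    /* full 64-bit product */
    *lo = (uint32_t)((uint64_t)prod & 0xFFFFFFFFu);  /* LO: lower 32 bits */
    *hi = (uint32_t)((uint64_t)prod >> 32);          /* HI: upper 32 bits */
}
```

For example, 0x10000 × 0x10000 = 2^32 leaves 1 in HI and 0 in LO, which is exactly why a single 32-bit destination register would not suffice.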
There are the ALU instructions, in the classic 3-operand format: two sources and one destination. And what are the operands of these ALU instructions? The general-purpose registers, or a 16-bit immediate operand. Both signed and unsigned arithmetic are supported; the basic difference between signed and unsigned arithmetic is that overflow is not flagged in unsigned arithmetic. Immediates are sign-extended for both signed and unsigned arithmetic, and zero-extended for logical instructions. What does that mean? When you have an immediate constant inside an instruction — you have a 32-bit instruction, so the constant has to be smaller in size — but your data path is 32 bits, so the constant has to be expanded to 32 bits. There are two ways of doing it: you can expand it by putting 0s at the top, or by extending the sign bit. For arithmetic, MIPS always extends the sign bit. If you remember your two's complement: say the immediates are represented using 16 bits inside the instruction; then you are talking about a 16-bit two's-complement constant, and its most-significant bit signifies the sign. If it is a negative number, there is a 1 there, and that 1 gets replicated upward: you can go back and check, or you may already know, that if you take a 16-bit negative number and put 1s in all the upper places, you get the 32-bit negative number of the same value; and if the upper bit is 0, meaning a positive number, you extend with 0s. So the point is: you always sign-extend the immediate for arithmetic operations. For logical operations, the immediates are always treated as positive bit patterns and are always zero-extended — you cannot do a logical operation on negatives; it does not make
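The two extension rules can be spelled out in C (the helper names are illustrative; in hardware this is just wire replication):

```c
#include <stdint.h>

/* Sign extension: replicate bit 15 into the upper 16 bits.
 * This is what MIPS does to the immediates of arithmetic instructions. */
int32_t sign_extend16(uint16_t imm) {
    return (int32_t)(int16_t)imm;   /* 0xFFFE -> -2, 0x0002 -> 2 */
}

/* Zero extension: fill the upper 16 bits with zeros.
 * This is what MIPS does to the immediates of logical instructions. */
uint32_t zero_extend16(uint16_t imm) {
    return (uint32_t)imm;           /* 0xFFFE -> 0x0000FFFE */
}
```

The same 16-bit pattern 0xFFFE thus becomes −2 for an add-immediate but 65534 for an OR-immediate.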
any sense: when you apply a logical operation to an operand you are operating on a bit pattern, not a signed quantity. Signed comparison is completely different from unsigned comparison, which is actually not true for signed and unsigned add. Unsigned comparison treats the values as unsigned, while signed comparison treats them as negative or positive depending on the sign bit. Integer multiply and divide take only two operands, that is, two sources, and have implicit targets: the HI and LO registers. The ISA offers instructions to move from or to the HI/LO registers to a GPR, because your result sits in HI and LO, so you have to move the HI and LO values into general purpose registers to be able to use them. There are instructions for doing that: you can move from HI to a general purpose register, you can move from LO to a general purpose register, and you can even move the other way, a general purpose register into HI; that is also allowed, although I do not know for what purpose you would use that.

So here is the list of ALU instructions. You can see that the list is pretty small; it actually fits in one slide. Let us go through each of these just to understand, because we will probably refer to some of these mnemonics in the design. The MIPS registers are always written as a dollar sign followed by an integer; these are the integer registers, also called the general purpose registers. So the first one is add: it adds $1 to $2 and puts the result in $3. The second one is a subtract operation: $2 minus $1, put in $3. Then you have add immediate: a constant 100 is added to $2 and the result goes to $3. Notice that this constant is actually encoded in the instruction; it does not have to come from a register. Then add unsigned: $1 plus $2, put in $3; sub unsigned; add immediate unsigned. Okay, so remember that whether
you have add immediate unsigned or add immediate, the only difference is the overflow behavior; there is no other difference, and in both cases this constant will be sign extended. Then set less than: if $2 is less than $1 you put 1 in $3, otherwise you put 0 in $3. Set less than immediate: if $2 is less than 100 you put 1 in $3. So what would be a good application of this SLTI? Yes, thank you: a loop upper bound check. If $2 is your loop index and your upper bound is 100, you would be checking this on every iteration: did $2 cross 100?

What is the difference in the right shifts? You maintain the sign of the operand; exactly. When you are shifting right you are creating space on the most significant side, and you have to figure out what you fill in there. Shift right logical will fill in zeros; shift right arithmetic will fill in the sign bit, essentially replicating the most significant bit. So this one shifts by ten bits and this one also shifts by ten bits, but in the logical case the final result will be $2 shifted right by ten bits with zero filling in the upper ten bits, while here it is $2 shifted right by ten bits with the upper ten bits filled with the sign bit of $2. Then you have shift left logical variable, where you want to shift by a variable amount, so the shift amount you can put in a register; shift right logical variable; and shift right arithmetic variable. And LUI: this is load upper immediate, a very interesting instruction. It loads $3 with the upper sixteen bits filled with the immediate constant and the lower sixteen bits set to zero. This is a very useful instruction and we will see its application. The lower sixteen bits are set to zero; yes, not unchanged, they are set to zero. Remember that this is not a memory operation; it is just loading a constant, and it actually executes inside your ALU; it is part of your logical operation.