Hello friends, welcome to session 16. In this lecture we will cover the few instructions I have not covered yet, and with this we will complete the ARM instruction set, so you will all be equipped to write more and more programs in ARM assembly. Welcome to the world of assembly language programming; I hope you will enjoy it. We will first look at interrupt latency, especially the worst-case interrupt latency for a typical scenario. I will explain the scenario and then tell you how many cycles it takes. Then we will complete the remaining instructions and look at a few examples. With this, both sessions and all these units will be completed today. That is the focus of today's discussion. So, what is interrupt latency? I have told you earlier, but let us get clarity on it again. Here is the processor. From an external unit, an interrupt signal (it could be an active-low signal) comes in. It happens at some particular time t, say t = 0. Because it comes from external hardware, such as a UART (a serial port controller), a timer, or some other peripheral running independently of the CPU, the peripheral has no idea what the processor is doing at that moment. Whenever it completes its programmed job, it generates an interrupt. Let us assume this signal is connected to FIQ, the fast interrupt request, because the CPU is supposed to respond to this interrupt earliest among all the interrupts. Reset is different anyway, because once a reset is accepted, the processor starts up fresh.
So, we are not concerned about reset. We are concerned about an FIQ that is connected to the system and happens to be very critical, and in this example we will see how long, in a typical worst-case scenario, an ARM processor takes to respond when this interrupt is raised. So, at some time t the peripheral generates the interrupt. The processor may be executing an instruction, and the worst case is the longest instruction, which is an LDM; I will explain that in detail. Please remember: whatever instruction is being executed, ARM responds to the interrupt only after that instruction completes. If it responded in the middle of the execution, it could not know how much of the instruction had completed and how much had not, so it could not restart the instruction without affecting the correctness of the program. To make sure the instruction is not left in a half-finished state, the processor completes the instruction it has started and then responds to the interrupt. By responding to the interrupt I mean it goes to the interrupt vector table, to the FIQ entry, which tells it to branch to the interrupt service routine (ISR), the FIQ handler.
So, the time from the interrupt being raised until the processor has completed the current instruction, accessed the vector table, and executed the first instruction of the handler is called the interrupt latency. It is not enough for the processor to recognize the interrupt; it also has to access the vector table. Remember that the FIQ entry is the last entry in the vector table (at address 0x1C), so the FIQ handler can start right there, with no need for a branch. Let us assume this best arrangement, where instead of branching to a subroutine we write the FIQ handler code directly at the vector. The time taken to get there and execute the first instruction of the handler is the interrupt latency, and it is a few clock cycles. We want to find the worst-case scenario: what situation would make the ARM processor take the maximum time to respond to the interrupt, that is, to execute the first instruction of the handler? The time at which the FIQ happens is independent of the processor. It could happen while the current instruction is still being executed. The best case is when the current instruction has just completed as the interrupt arrives and is recognized; the processor then responds immediately and starts executing the handler within about 2 cycles. But we are here to find the worst case. Why are we interested in the worst case? Because we are embedded system engineers.
We care about the maximum time the processor can take to respond to an interrupt, and we have to cater for that scenario. It may not happen frequently, or it may never happen in the lifetime of the system, but the system must always be designed with it in mind. You cannot have a catastrophe just because, when the interrupt happened, the processor was executing a long instruction and the system failed. We want a system that always works correctly; that is the role of embedded system engineers and designers. So we need to cater for the worst case so that our system design is sound. Let us see how long it is, and then we will talk about the multiplication instructions and a few examples. Before we start, recall the key things we learned in the previous discussions. This is the relative priority order of the exceptions (from highest to lowest: reset, data abort, FIQ, IRQ, prefetch abort, and finally undefined instruction and SWI), which all of us should know by heart. Why is reset given the topmost priority? Because something has happened and the system has to restart, and we do not want the restart to take more time. Whatever is being executed, even one of the other exception handlers, when reset is asserted the processor should stop what it is doing and start executing the reset handler. That is why reset has the highest priority. Now you may wonder: we keep claiming that FIQ is the most important interrupt and should get top priority, yet data abort sits above it. Why? Suppose an FIQ arrives from an external device at t = 0, and the ARM processor is executing a load or store instruction, say an LDM or STM, something that accesses data memory.
It is doing a memory access at that time with a load, store, or swap instruction. Only an instruction that accesses memory can cause a data abort. So assume that at the same moment the processor is doing a data access and that access causes a data abort. A data abort happens when the memory system signals an abort. The processor may perform multiple memory cycles, for example when executing an LDM or STM, and at the end of every memory cycle it checks whether the memory is signaling an abort. So the worst-case scenario is this: at the same absolute time t = 0, a memory cycle completes, the memory has already raised the abort signal, and the processor notices the abort. Why, then, is abort given higher priority than FIQ? Think it over; I will tell you the reason. When an abort happens, it is tied to the exact instruction being executed, which could be an LDM, LDR, STR, or similar. That instruction causes the abort, and by then the PC has moved ahead, pointing at the instruction address plus 8. Now the OS or the abort handler needs to remove the condition that caused the data abort and then execute the same instruction again to make sure it completes properly. To do that, it has to know which instruction, at which PC, caused the data abort.
So that address must be captured and kept, either on the handler's stack or in R14 of abort mode (R14_abt), which should hold the PC value. Then the handler can remove the condition that caused the abort and come back to re-execute the instruction. So the PC value must be saved, that is, copied into R14_abt, at that very moment. In our scenario the FIQ happened at the same time as the data abort. If we decided to service the FIQ and ignore the abort, we would lose that PC, because the PC would now point into the FIQ handler and the processor would start executing FIQ handler code. The address of the instruction that caused the data abort would be lost. To make sure it is saved, the data abort is responded to first; that is why data abort has higher priority than FIQ. But please remember, it is not serviced fully; the processor only enters it. When a data abort and an FIQ happen simultaneously, the data abort handler is just entered, meaning the address of the instruction that caused the abort is copied into R14 of abort mode, and then the processor immediately goes to the FIQ handler. The processor knows internally that both exceptions are active, because both happened simultaneously. It just makes sure it has noted which address caused the data abort, so that it can come back and deal with it later. You may wonder whether it is acceptable to delay the data abort. Keep in mind that the data abort was caused by executing some user code, or some handler of lower priority than FIQ. For the sake of explanation, let us assume user code was executing when the data abort happened and, simultaneously, the FIQ arrived.
The cause of the data abort lies in the user's code and has nothing to do with the FIQ or the FIQ handler. The FIQ handler can execute without any problem, even if it contains loads or stores. The problem is only in the user code: perhaps the user put an address in some base register, say R2, that does not exist in memory, so the access aborts. So it is always acceptable to defer fixing this problem and execute the FIQ handler first. You need the complete picture to understand why data abort gets higher priority and yet the processor does not complete its handler, but goes to the FIQ handler, completes it, and then comes back. Please remember this happens only when a data abort and an FIQ occur together. So, because data abort is above FIQ, and for a valid reason, the processor recognizes the abort and enters the abort handler, but does not execute the handler fully. When I say it enters the handler, I mean the processor has already copied the PC into R14 of abort mode and the CPSR into the abort-mode SPSR. Please remember these two registers are copied automatically by the ARM processor. Since those values are now preserved in the abort-mode registers, the processor can defer the abort handling and execute the FIQ handler. I hope this is clear; it is a very important concept and not an easy one. Let me summarize what happens when both exceptions occur together: the ARM processor enters the data abort handler but does not execute it, goes to the FIQ handler and completes it, and then returns to the abort handler and continues from where it left off.
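The address bookkeeping in this summary can be traced with a minimal Python sketch. It assumes the ARMv4 conventions that on data abort entry R14_abt holds the aborting instruction's address plus 8 (the handler retries with SUBS PC, R14, #8), and that on FIQ entry R14_fiq holds the return address plus 4 (the handler returns with SUBS PC, R14, #4). The function name and the trace format are hypothetical; only the address arithmetic is modeled.

```python
DATA_ABORT_VECTOR = 0x10  # ARM exception vector for data abort
FIQ_VECTOR = 0x1C         # ARM exception vector for FIQ (last entry in the table)


def abort_and_fiq_together(faulting_pc):
    """Hypothetical trace of a data abort and an FIQ raised on the same cycle."""
    trace = []

    # 1. Enter abort mode only: the banked R14_abt captures the faulting
    #    instruction's address + 8, so the abort handler can retry it later.
    r14_abt = faulting_pc + 8
    trace.append(("enter data abort", DATA_ABORT_VECTOR))

    # 2. FIQ is still enabled in abort mode, so it is taken before the abort
    #    handler's first instruction runs. R14_fiq = return address + 4.
    r14_fiq = DATA_ABORT_VECTOR + 4
    trace.append(("run FIQ handler", FIQ_VECTOR))

    # 3. The FIQ handler returns (SUBS PC, R14, #4) into the abort handler.
    trace.append(("resume abort handler", r14_fiq - 4))

    # 4. The abort handler fixes the fault, then retries (SUBS PC, R14, #8).
    #    R14_abt is a banked register, so the FIQ did not disturb it.
    trace.append(("retry faulting instruction", r14_abt - 8))
    return trace
```

For a faulting LDM at 0x8000, `abort_and_fiq_together(0x8000)` ends with `("retry faulting instruction", 0x8000)`: the address captured on abort entry survives the FIQ because each mode has its own R14.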
You may wonder how it gets back. Because the FIQ was taken right after entering abort mode, before the abort handler's first instruction, the FIQ handler's return address is the start of the data abort handler. So once the FIQ has been serviced and is no longer pending, the FIQ handler returns, execution continues in the data abort handler, which completes, and then it returns to the user code that was executing the LDM or STM. This is the sequence when both happen together. It is an unusual situation, and not one that is under our control. So that is why data abort is given higher priority than FIQ. Naturally, IRQ has lower priority than FIQ, because the system designer must connect the most critical interrupt source to the FIQ input and the less critical sources to IRQ. Based on that, you prioritize the order in which the interrupts are serviced. Below them come prefetch abort and then undefined instruction and SWI. As I told you, SWI is used to run operating system code, for example for file access, memory management, or any other OS-specific activity. These are programmed control-flow changes made deliberately by the program, so they can wait until the higher-priority exceptions are handled. Now, I have shown you this table earlier. Remember that setting a bit here means the interrupt is disabled: if the I bit is set, IRQ is disabled; if the F bit is set, FIQ is disabled. The table shows the default state of the I and F flags in the CPSR on entry to each handler. The F and I flags are bits 6 and 7 of the CPSR, respectively.
So, as the processor enters the reset handler, it automatically sets both bits. What does that mean? When a reset happens, we do not care whether an FIQ or an IRQ is coming in; when the system itself is rebooting, there is no point in servicing interrupts. So both are ignored, and the processor does whatever the reset handler is supposed to do. That is why reset has the highest priority and both FIQ and IRQ are disabled. The other entries are straightforward. When an undefined instruction, software interrupt, prefetch abort, or data abort occurs, the F bit is left unchanged, so FIQ keeps its higher priority. You may wonder: even while a data abort is being handled, FIQ remains enabled? Recall that in the priority order data abort was above FIQ. Note that this table simply lists the exceptions in vector-table address order; it is not the priority order. So even though data abort has higher priority, the F bit is unchanged while the data abort handler runs: if FIQ was enabled before, it remains enabled, which means an FIQ can be serviced during data abort handling. Similarly, while a prefetch abort is being serviced, if FIQ was enabled earlier, that is, the F bit in the CPSR was 0, then an FIQ from a peripheral will be serviced. During reset, however, it will not be serviced. And if the processor is already servicing an FIQ and another FIQ arrives on the same input, it will not be serviced, because the previous one has not finished and F is set. These are the conditions of the flags.
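The flag table can be sketched as a small Python function, assuming the ARMv4 rule that entering any exception sets the I bit (bit 7), while only reset and FIQ also set the F bit (bit 6); the other exceptions leave F as it was. Mode bits and the SPSR copy are deliberately not modeled, and the exception names are hypothetical labels.

```python
I_BIT = 1 << 7  # CPSR bit 7: 1 = IRQ disabled
F_BIT = 1 << 6  # CPSR bit 6: 1 = FIQ disabled

# Exceptions that also set F on entry; every exception sets I.
SETS_F_ON_ENTRY = {"reset", "fiq"}


def cpsr_on_entry(exception, cpsr):
    """Return the CPSR after entering `exception` (only I and F are modeled)."""
    new_cpsr = cpsr | I_BIT
    if exception in SETS_F_ON_ENTRY:
        new_cpsr |= F_BIT
    return new_cpsr


def irq_enabled(cpsr):
    return not (cpsr & I_BIT)


def fiq_enabled(cpsr):
    return not (cpsr & F_BIT)
```

Starting from user mode with both interrupts enabled (CPSR = 0x10), entering the data abort handler gives 0x90, so FIQ is still enabled, while entering the FIQ handler gives 0xD0, so both are masked.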
Now notice the I bit: it is set on entry to every exception, including IRQ itself, so IRQ is disabled when any handler starts. Yet a single IRQ handler can deal with many devices, and may even re-enable IRQ inside itself. Why? Remember that the processor has only two interrupt inputs. Do not think that only two peripherals, say P1 and P2, can interrupt the ARM processor. Neither ARM nor any other microcontroller or processor is designed to handle only two interrupts or two peripherals. Then you may wonder: if my system has an ARM processor with many peripherals, all of which can interrupt, how can an ARM with two inputs service them all? That is where the interrupt controller comes in, which I have not explained yet. Let me give you a short introduction on a new page. Here is the ARM processor, and here is an interrupt controller, which I will call IC. Many peripherals that can generate interrupts are connected to it, not just 4 but 16, 32, or any number. The processor itself accesses this controller, over the APB bus as memory-mapped I/O, and configures it: these are the peripherals connected to the controller, and this one should have higher priority than that one. So you connect the peripherals according to the priorities of the interrupt inputs the controller is programmed for. You can also selectively enable some inputs and disable others; even if a disabled source generates an interrupt, the processor will not see it, because the interrupt controller does not forward it.
The interrupt controller then generates one single interrupt to the IRQ input. Please remember, the processor does not keep polling the controller asking whether it has any interrupts; it is busy executing its own instructions. So, with only two interrupt inputs to the processor, one FIQ and one IRQ, the normal convention is to connect FIQ to a single peripheral, the most important one: a peripheral so critical that its interrupt must be serviced immediately, so you connect it directly to FIQ. IRQ, on the other hand, is used to connect multiple devices through the controller. When an IRQ comes, the processor goes to the vector table and then to the handler. The handler has to find out which peripherals have generated interrupts and in what order they need to be serviced. A priority is maintained, and based on it the processor executes the ISR specific to each connected device. So the job of the interrupt controller is to monitor the multiple peripherals connected to it and generate the interrupt. The processor then reads the controller's status registers, such as which interrupts are pending and which are enabled, and decides based on those values: both P1 and P2 have generated interrupts, but P1 matters most to me, so let me serve P1 first. It may then re-enable IRQ and return; if another interrupt is still pending, the controller will raise IRQ again. Alternatively, the handler can finish servicing all the pending interrupts before returning.
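A minimal Python model of such a controller makes the idea concrete. Everything here is hypothetical and not tied to any real controller's register map: one pending bit and one enable bit per source, an IRQ line that is asserted while any enabled source is pending, and the assumption that a lower source number means higher priority. The handler follows the second policy described above and services every pending source before returning.

```python
class InterruptController:
    """Hypothetical interrupt controller: many sources in, one IRQ line out.

    Assumption for this sketch: a lower source number means higher priority.
    """

    def __init__(self):
        self.pending = 0  # bit n set: source n has raised an interrupt
        self.enabled = 0  # bit n set: source n is allowed to interrupt

    def enable(self, source):
        self.enabled |= 1 << source

    def raise_interrupt(self, source):
        self.pending |= 1 << source

    def irq_line(self):
        # IRQ to the core is asserted while any enabled source is pending.
        return (self.pending & self.enabled) != 0

    def highest_pending(self):
        active = self.pending & self.enabled
        if active == 0:
            return None
        # Index of the lowest set bit = highest-priority active source.
        return (active & -active).bit_length() - 1

    def acknowledge(self, source):
        self.pending &= ~(1 << source)


def irq_handler(ic):
    """Service every pending, enabled source in priority order; return the order."""
    serviced = []
    while ic.irq_line():
        source = ic.highest_pending()
        serviced.append(source)  # a real handler would call this device's ISR here
        ic.acknowledge(source)
    return serviced
```

If sources 1 and 3 are enabled and sources 3, 1, and 2 raise interrupts, `irq_handler` services 1 and then 3; source 2 stays pending but never reaches the processor because it is disabled.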
How this is done is left to the individual implementation of the handler and to how the controller is programmed. Please remember that the interrupt controller is an extra, intelligent level, programmed by the ARM processor or whichever microprocessor controls it. The peripherals talk only to the controller, and through the controller the ARM processor is notified of interrupts. After that, the ARM still services each interrupt source directly: if it is a serial port controller, for example, the processor reads the data from that device itself. The controller only comes into play to monitor the sources and to prioritize the interrupts they generate; that is its purpose. So, let us come back. This is why an IRQ handler may choose to re-enable IRQ inside itself. Let us now go forward. The worst-case, most complex exception scenario is an FIQ, an IRQ, and a third exception all happening simultaneously. Please remember: an FIQ is raised, an IRQ is raised, and the current instruction, say an LDM, which is the worst case, generates a data abort. All of them happen together, and this can happen. What does the processor do? Based on the explanation I gave you earlier, it should recognize the abort first.
So it responds to the abort and enters the abort handler, but does not execute it fully. It just enters the abort handler and then goes on to execute the FIQ handler. That is how the processor is designed: when IRQ, FIQ, and data abort are all active at the same time, it enters the data abort handler and, before executing that handler, goes to the FIQ handler and completes it. Now you may wonder: after the FIQ completes, IRQ is also pending, so which one is next? Recall the priority order: data abort first, FIQ next, and then IRQ. So it enters the data abort handler, goes to the FIQ handler, and once the FIQ is done it comes back and completes the data abort handler; only then does it service the IRQ. That is the order in which they are handled, and you have to comprehend all of this for the worst-case scenario. FIQ has a higher priority than IRQ, and IRQ is also masked out: the I bit is set on entry to the FIQ handler (and to the abort handler), so even though an IRQ has been raised, it is ignored for now. IRQ is re-enabled only when the handlers return and the saved CPSR is restored on the way back to the user code; until then it is not recognized. FIQ, however, stays enabled in abort mode. So when a data abort occurs at the same time as an FIQ, the processor enters the data abort handler, as I mentioned, and proceeds immediately to the FIQ vector without executing the abort handler. Entering it is enough to record the address of the instruction that caused the data abort. It then executes the FIQ handler, and on return it resumes execution in the data abort handler.
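The three-way ordering can be checked with a small Python simulation. It rests on simplifying assumptions: entering a handler is instantaneous, a handler runs to completion unless an unmasked exception is pending, every entry sets the I mask and FIQ entry also sets the F mask, and returning restores the masks saved on entry (the role the SPSR plays). Only data abort, FIQ, and IRQ are modeled, and the function and exception names are hypothetical.

```python
# Priority, highest first (the subset of the ARM ordering used here).
PRIORITY = ["data_abort", "fiq", "irq"]
SETS_F = {"fiq"}  # among these three, only FIQ entry also masks FIQ


def service_order(pending):
    """Simulate the order in which handlers are entered and finished."""
    pending = set(pending)
    unknown = pending - set(PRIORITY)
    if unknown:
        raise ValueError(f"unsupported exceptions: {sorted(unknown)}")

    trace = []
    stack = []  # active handlers plus the masks to restore on return
    i_masked = f_masked = False  # user code: IRQ and FIQ both enabled

    while pending or stack:
        # Highest-priority pending exception that is not currently masked.
        candidate = None
        for exc in PRIORITY:
            if exc not in pending:
                continue
            if (exc == "irq" and i_masked) or (exc == "fiq" and f_masked):
                continue
            candidate = exc
            break

        if candidate is not None:
            pending.remove(candidate)
            stack.append((candidate, i_masked, f_masked))
            i_masked = True  # every exception entry sets I
            if candidate in SETS_F:
                f_masked = True
            trace.append("enter " + candidate)
        else:
            # Nothing can preempt: the innermost handler runs to completion
            # and returns, restoring the masks that were saved on entry.
            exc, i_masked, f_masked = stack.pop()
            trace.append("finish " + exc)
    return trace
```

With all three pending, the simulation reproduces the order described above: enter data abort, enter and finish FIQ, finish data abort, and only then enter and finish IRQ, because IRQ stays masked until the abort handler returns to user code.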
Data abort must therefore have a higher priority than FIQ, to ensure that a transfer error does not escape detection; that is the reason it is given higher priority. Because of this overhead, we must add the time taken to enter the data abort handler when we calculate the worst case for FIQ. Now let me show you the timing. The FIQ input passes through a synchronizer. I told you FIQ is connected to the processor directly, but there is hardware in between, a synchronizer, that takes up to 4 cycles from the time the FIQ signal is asserted to the time it reaches the processor core. Its job is to synchronize the asynchronous FIQ signal with the processor clock. I will not go into the details of the synchronizer; you just have to account for up to 4 cycles elapsing before the signal gets into the ARM processor. So for the worst case you have to include these 4 cycles, and you also have to assume the longest instruction is being executed. Moreover, that longest instruction has just started executing when the interrupt arrives after those 4 cycles. The processor has to wait for it to complete, and the longest instruction, as I told you, is LDM.
An LDM can load 2 registers, 3 registers, or, in the worst case, all 16 registers, R0 to R15, from memory using the base register R2. R2 itself is overwritten in this scenario, but that does not matter; the instruction copies 16 words from memory into the processor. Until it completes, the interrupt cannot be serviced, so you have to account for this too. The execution takes 2 cycles before the memory access even starts: the address calculation takes 1 cycle, and putting the address onto the address bus (the address register) takes another. So 2 cycles are gone, and then up to 16 cycles, one for every word transferred, because the LDM loads 16 registers. Now consider the worst case of a data abort happening during this execution while the FIQ is already waiting. When does the data abort happen in the worst case? When the last value, R15, is being loaded from memory. The instruction has not completed, but the data abort is generated. That does not matter: the data abort only has to be recognized, and the processor can come back and execute the instruction later. Remember I explained that after a data abort, once the handler has run, the processor comes back and executes the same instruction again. To allow that, the base register must be restored, because when R0 to R15 are loaded, R2 is certainly overwritten. So it has to be restored.
Restoring it takes 2 more cycles to write back the correct value of R2: the value R2 would have if the instruction had executed fully, not the value copied from memory, so that the instruction can be restarted after the handler has done its job. If you add all of these up, 20 cycles are gone. Please remember the sequence: the interrupt takes 4 cycles to arrive, and just before it arrives an LDM starts executing; the LDM does not complete fully because a data abort happens, and because of the data abort the processor has to restore the base register, which takes 2 more cycles. So the LDM accounts for 2 + 16 + 2 = 20 cycles. Then data abort entry has to happen. Entry means the processor must reach the abort handler, and that takes 3 cycles, because it has to save the PC (the faulting instruction's address plus 8) in R14 of abort mode and copy the current CPSR into the abort-mode SPSR. So 3 more cycles elapse. After that the processor goes to the FIQ handler, and that is not immediate either: because of the pipeline, the first instruction of the FIQ handler executes after 2 more cycles. You have to account for that too. Now add everything up: 2 + 3 = 5, plus 20 for the LDM (16 + 2 + 2) gives 25, plus 4 for the synchronizer gives 29. So it is not 1 or 2 cycles: up to 29 processor clock cycles can elapse before the FIQ handler executes. This is the worst-case scenario.
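The tally above can be written as a small Python function whose default arguments are the per-stage figures quoted in this lecture: a 4-cycle synchronizer, 2 + 16 + 2 cycles for the aborted LDM, 3 cycles for data abort entry, and 2 cycles for FIQ entry. The parameter names are hypothetical, and other ARM cores or documents may quote different per-stage figures, so treat the defaults as this lecture's assumptions. A second helper converts a cycle count into absolute time at a given clock frequency.

```python
def worst_case_fiq_latency(sync=4, ldm_setup=2, ldm_words=16,
                           base_restore=2, abort_entry=3, fiq_entry=2):
    """Worst-case FIQ latency in processor cycles, using the lecture's figures."""
    t_ldm = ldm_setup + ldm_words + base_restore  # 2 + 16 + 2 = 20 by default
    return sync + t_ldm + abort_entry + fiq_entry


def cycles_to_microseconds(cycles, clock_hz):
    """Convert a cycle count into microseconds at clock frequency clock_hz."""
    return cycles * 1_000_000 / clock_hz
```

With the defaults the function returns 29 cycles, which is 29 microseconds at 1 MHz and 14.5 microseconds at 2 MHz. Making each stage a parameter lets you see how the bound shrinks if, say, the worst LDM in your code loads only 4 words.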
I hope you understood this. It is the most important part: if you understand this scenario, you will be able to visualize any worst-case interrupt handling when you design a system. When you write an interrupt handler, keep this worst-case scenario in the back of your mind, decide what the cycle time of your processor should be, and convert the cycle count into absolute time. That is simple: if the processor clock runs at 1 MHz, one clock takes 1 microsecond, so 29 processor cycles means 29 microseconds before this interrupt can be serviced. At 2 MHz it is half of that, 14.5 microseconds. This is how you compute the time before the handler executes, because that is the worst-case delay, and real-time systems must be designed so that this absolute time is as small as possible, so that any catastrophe can be responded to immediately. That was the most difficult part of today's discussion, so I spent more time on it. Now let us move on to multiply and multiply-accumulate. You are already familiar with performing multiplication or division using adds and subtracts, but let us now see the MUL instruction supported by ARM: its syntax and how it works. I will go a little faster, since you are now very familiar with the instruction formats. The syntax is MUL{cond}{S} Rd, Rm, Rs, where Rd receives the result of multiplying Rm by Rs.
It should strike you immediately that two 32-bit values are being multiplied: Rm and Rs are 32-bit registers, and the result is written into one more 32-bit register. Is that correct? Multiplying two 32-bit values into a 32-bit result works when the values are small, but with bigger numbers the result cannot fit in 32 bits; it can need up to 64 bits, that is, one more 32-bit register. So what is the use of this instruction? It is used when you are only concerned with the lower part of the result, not the higher part. You may ask why that would be common. In a typical scenario you may only need the lower value, or you may know that the values you are multiplying will never be large. You are forced to use 32-bit registers, because all ARM registers are 32 bits wide, but you may be doing a multiplication whose operands never exceed 100 in decimal. Then 100 times 100 is at most 10,000, which easily fits in 32 bits. More generally, whenever you know the upper 32 bits of the result are insignificant (all 0s, or all 1s for a negative signed value) and the significant part lies only in the lower 32 bits, you can use this instruction. But you have to be very careful; otherwise you will get an erroneous result if you happen to multiply large numbers.
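The truncation can be seen in a minimal Python model that treats register contents as unsigned 32-bit integers. The function names are hypothetical; because Python integers are unbounded, the mask stands in for the 32-bit destination register, and the overflow check shows when the discarded upper half was not zero.

```python
MASK32 = 0xFFFF_FFFF


def mul(rm, rs):
    """Model of ARM MUL: Rd = Rm * Rs, keeping only the low 32 bits."""
    return (rm * rs) & MASK32


def mul_overflowed(rm, rs):
    """True if the full (unsigned) product does not fit in 32 bits."""
    return (rm * rs) > MASK32
```

`mul(100, 100)` gives 10000 as expected, but `mul(0x10000, 0x10000)` gives 0, because the true product 2^32 needs 33 bits and its only set bit is discarded.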
So, the rest of the execution is very simple. Then there is one more instruction, called multiply and accumulate (MLA): we multiply two values, add one more register to the product, and get the result. These are useful for DSP operations. In DSP filtering, FIR, IIR and so on, if you have done a DSP course you will know they are built from MAC operations, multiply and accumulate. A multiplication followed by an addition is what you most often see in DSP processing. So why provide one instruction doing both? Because you save execution time. If you had MUL and ADD as separate instructions, you would spend cycles executing the MUL (I will show you how many), then the ADD takes one more cycle to execute, and one more instruction has to go through the pipeline, which costs power. Code size is affected too: one MLA occupies one 32-bit word, whereas MUL plus ADD takes two words, that is, 100 percent more instructions for every MLA. So the processor optimizes this by giving you one single instruction for both operations. The addition still takes an internal cycle, but you save on code size, time, power, everything. What you have to remember is that MUL gives you only the least significant 32 bits of the result, and the same is true of MLA. And MLA is nothing else; it is multiply and accumulate. Do not use R15 for any of the operands. There is no reason to use R15, the program counter, in a multiplication or addition; it does not make sense, so the processor does not allow R15 as one of the operands. Also, Rm and Rd cannot be the same register, because Rm holds one of the operands being multiplied.
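A minimal Python sketch of the MAC pattern: mla models MLA Rd, Rm, Rs, Rn (low 32 bits of Rm times Rs plus Rn), and fir_output is a hypothetical helper showing how one FIR output is just a chain of multiply-accumulates, one per filter tap, instead of a MUL followed by an ADD each time.

```python
MASK32 = 0xFFFF_FFFF


def mla(rm: int, rs: int, rn: int) -> int:
    """Model of MLA Rd, Rm, Rs, Rn: Rd = Rm * Rs + Rn, low 32 bits only."""
    return (rm * rs + rn) & MASK32


def fir_output(samples: list[int], coeffs: list[int]) -> int:
    """One FIR output sample: accumulate sample * coefficient over all taps."""
    acc = 0
    for x, h in zip(samples, coeffs):
        acc = mla(x, h, acc)  # one multiply-accumulate per tap
    return acc


print(mla(3, 4, 5))                            # 17
print(fir_output([1, 2, 3, 4], [4, 3, 2, 1]))  # 4 + 6 + 6 + 4 = 20
```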
So you cannot overwrite that value with the result of the multiply. The processor does not allow this because it keeps intermediate results internally: the multiplication does not happen all at once, it proceeds 8 bits at a time, as I will explain on the next slide. For that reason Rd and Rm cannot be the same register. Good. Now, what are the other points? Both signed and unsigned values can be operated on: the values in these registers can be signed or unsigned, and the result is interpreted accordingly. But since you only get the lower 32 bits, and only the upper portion carries the sign-related information, signed versus unsigned changes only the upper portion; the lower portion is the same in both the unsigned and the signed operation. That is why there are no two separate instructions for the variants that give the lower 32-bit result. In other words, the results of a signed multiply and an unsigned multiply of 32-bit operands differ only in the upper 32 bits. The lower 32 bits are identical, so the same instruction can perform both unsigned and signed multiplication, because it only produces the lower 32 bits. Compare this with subtract: there I told you that the same instruction serves both, and it is left to the user to interpret the flags as signed or unsigned, based on the operands the user has put in. Multiply is not like that, because a multiplier producing the full result must handle the signs itself to produce the correct result. The multiplier hardware has to be aware of whether it is multiplying a signed value or an unsigned value.
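The claim that signed and unsigned products share the same low 32 bits is easy to check numerically. This Python sketch multiplies -3 by 5 both as signed integers and as the corresponding unsigned 32-bit bit patterns, then compares the low and high words.

```python
MASK32 = 0xFFFF_FFFF


def to_unsigned(x: int) -> int:
    """Reinterpret a Python int as a 32-bit two's complement bit pattern."""
    return x & MASK32


a, b = -3, 5
signed_product = a * b                              # -15
unsigned_product = to_unsigned(a) * to_unsigned(b)  # 0xFFFFFFFD * 5

low_signed = to_unsigned(signed_product)
low_unsigned = unsigned_product & MASK32
high_signed = (signed_product >> 32) & MASK32
high_unsigned = unsigned_product >> 32

print(hex(low_signed), hex(low_unsigned))    # 0xfffffff1 0xfffffff1: identical
print(hex(high_signed), hex(high_unsigned))  # 0xffffffff 0x4: different
```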
Please remember this difference between add/subtract and multiply. A multiplier that produces the full result needs to know whether it is operating on signed or unsigned values; that is why there are two separate instructions for the long multiplies, which I will show you on the next slide. But in this case the instruction does not care, because the least significant part of the result is always the same whichever type of multiplication it does. So it can assume one of the types and perform the calculation. Now, what is the cycle time? It takes 1S, one sequential cycle, which elapses in the pipeline's execute stage anyway while the instruction is executed. Then how many internal cycles are added? m internal cycles, written mI. What do I mean by m? m is 1, 2, 3 or 4, depending on the value of the multiplier operand. Let me explain: we have multiplier times multiplicand equals product. The multiplier can be split into bytes, and m depends on whether its significant value lies only in the lowest byte, in the lowest halfword, in the lowest three bytes, or in all four bytes. The time taken increases accordingly. Why? Internally the ARM processor has an 8-bit multiplier: it takes 8 bits of the multiplier at a time, multiplies, and accumulates the partial result. A 32-bit value can be split into 4 bytes. Suppose the upper bytes are all 0s and only the lowest byte holds a value: as an unsigned number, that is at most 255. If instead the upper bytes are all 1s, the value is a small negative number, from -256 to -1 in two's complement.
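The "8 bits at a time" idea can be illustrated with a deliberately simplified Python model: it consumes the multiplier one byte per step, accumulates shifted partial products, and stops as soon as the remaining multiplier bits are zero. This is an unsigned-only sketch under that assumption; real ARM hardware uses a Booth-encoded multiplier and also stops early when the remaining bits are all 1s.

```python
MASK32 = 0xFFFF_FFFF


def bytewise_multiply(multiplicand: int, multiplier: int) -> tuple[int, int]:
    """Multiply using 8 bits of the multiplier per step, stopping early.

    Returns (low 32 bits of the product, number of 8-bit steps used).
    Simplified unsigned-only model, not the exact ARM hardware algorithm.
    """
    multiplicand &= MASK32
    multiplier &= MASK32
    acc, steps, shift = 0, 0, 0
    while True:
        chunk = multiplier & 0xFF                # next 8 bits of the multiplier
        acc += (multiplicand * chunk) << shift   # shifted partial product, accumulated
        steps += 1
        multiplier >>= 8
        shift += 8
        if multiplier == 0:                      # nothing significant left: stop early
            break
    return acc & MASK32, steps


print(bytewise_multiply(1000, 200))     # (200000, 1): multiplier fits in one byte
print(bytewise_multiply(3, 0x1234))     # (13980, 2): needs two bytes
print(bytewise_multiply(7, 0x12345678)) # all four bytes significant: 4 steps
```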
So, based on the significant value: if it is only 1 byte, with the rest all 0s (or all 1s for a negative value), only one internal 8-bit multiplication step is needed, so m is 1. If the significant part extends into the second byte and only the top portion is all 0s or all 1s, two steps are needed. Please note these numbers. So the internal cycle count is 1, 2, 3 or 4 depending on the value. You cannot say that a particular instruction always takes a fixed amount of time; it depends on the value you put in the register. Every time you put in a different value, the instruction may take a different number of clocks, decided by the number in the multiplier register. That is what I am trying to say here. Apart from the multiplication, the MLA instruction has one addition. The accumulation is carried along with the multiplication steps, so it goes through the same byte-by-byte process, but there is one additional internal cycle anyway, because the processor has to read the accumulate register Rn and perform the addition even if its value is all 0s. So that one cycle is taken, and the rest is decided by the multiplier value. I hope this is clear to you. Now, one more set of instructions: multiply long. What does it mean? Let me explain. There is unsigned long and signed long; whether it is unsigned or signed is specified by the instruction and corresponds to a bit in the instruction encoding. The condition field is there and the S flag is there; I will not explain those again, all of you are familiar with them.
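The timing rule can be written down directly. This Python sketch computes m from the multiplier operand using the "upper bits all 0s or all 1s" rule described in this lecture (assumed here to match an ARM7TDMI-style core; other cores time multiplies differently) and prints the MUL and MLA cycle counts, 1S + mI and 1S + (m + 1)I. The function names are illustrative.

```python
MASK32 = 0xFFFF_FFFF


def multiplier_steps(rs: int) -> int:
    """m: 8-bit steps needed before the remaining upper bits of rs are all 0s or all 1s."""
    rs &= MASK32
    for m, shift in ((1, 8), (2, 16), (3, 24)):
        upper = rs >> shift
        if upper in (0, MASK32 >> shift):
            return m
    return 4


def mul_timing(rs: int) -> str:
    """MUL: one sequential (S) cycle plus m internal (I) cycles."""
    return f"1S + {multiplier_steps(rs)}I"


def mla_timing(rs: int) -> str:
    """MLA: one S cycle plus (m + 1) I cycles; the extra I cycle covers the accumulate."""
    return f"1S + {multiplier_steps(rs) + 1}I"


for rs in (10, 0xFFFF_FF80, 0x1234, 0x12_3456, 0x1234_5678):
    print(hex(rs), mul_timing(rs), mla_timing(rs))
```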
Now you see, though they are called RdLo and RdHi, they are nothing but two of the registers R0 to R14; please remember we cannot have R15 in a multiply instruction. So you can use any of them, along with Rm and Rs. How is the result computed? Rm and Rs are multiplied, the lower word of the result is put in RdLo, and the higher word is put in RdHi. For example, you could write the instruction with R1, R10, R11, R4: R11 and R4 are multiplied, the lower word of the result goes into R1, and the higher word goes into R10. The choice of these registers need not follow any order; it could be anything. The only thing is that you had better make sure they are all different, so that you do not get a result which is totally absurd. If you are doing signed arithmetic, it is the same: the numbers are interpreted as signed, and the processor internally does the multiplication accordingly. Then there are the accumulate versions, unsigned and signed. What is the difference? The 64-bit value already held in RdHi and RdLo is added to the product. Please remember, the accumulate input and the destination are the same pair of registers. Why did they make it that way? First, this is the usual usage in typical DSP applications. Moreover, if these were all different, you would have six registers in total, and each register needs 4 bits to encode, so 24 bits would go just in encoding the registers. Remember the condition field takes another 4 bits and the S bit one more. Then where is the space for the rest of the instruction in a 32-bit word? So it is not practical to have all these registers separate.
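Here is a minimal Python model of where the two halves of a long multiply land, for UMULL RdLo, RdHi, Rm, Rs and SMULL RdLo, RdHi, Rm, Rs. The functions return (RdLo, RdHi) as 32-bit bit patterns; the names umull, smull and to_signed32 are illustrative helpers, not an ARM API.

```python
MASK32 = 0xFFFF_FFFF


def to_signed32(x: int) -> int:
    """Read a 32-bit register pattern as a two's complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x8000_0000 else x


def umull(rm: int, rs: int) -> tuple[int, int]:
    """Model of UMULL RdLo, RdHi, Rm, Rs: unsigned 64-bit product split in two."""
    product = (rm & MASK32) * (rs & MASK32)
    return product & MASK32, (product >> 32) & MASK32   # (RdLo, RdHi)


def smull(rm: int, rs: int) -> tuple[int, int]:
    """Model of SMULL RdLo, RdHi, Rm, Rs: operands read as two's complement."""
    product = to_signed32(rm) * to_signed32(rs)
    return product & MASK32, (product >> 32) & MASK32   # (RdLo, RdHi)


lo, hi = umull(0xFFFF_FFFF, 2)   # 4294967295 * 2
print(hex(hi), hex(lo))          # 0x1 0xfffffffe
lo, hi = smull(0xFFFF_FFFF, 2)   # -1 * 2 = -2
print(hex(hi), hex(lo))          # 0xffffffff 0xfffffffe
```

Note how the same bit patterns give different high words depending on whether the operands are treated as signed or unsigned, which is why the long multiplies need separate U and S forms.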
So the same registers are used implicitly: whatever register numbers you mention for RdLo and RdHi also hold the value that gets added when you use the accumulate instruction. The same is done for the signed version. That is all. Again, R15 should not be used as an operand, and RdHi, RdLo and Rm must all be different registers; please remember that Rm must also be different, for correctness and for the instruction to execute properly. UMULL and SMULL are the unsigned and signed versions, and the result is a 64-bit value, lower 32 bits and higher 32 bits. UMULL and UMLAL treat all their values as unsigned numbers; SMULL and SMLAL treat them as signed two's complement numbers. In multiply accumulate, the product is added to the 64-bit value and the result is put back. The lower 32 bits of the 64-bit number to be added are read from RdLo and the higher 32 bits from RdHi; the multiplied value is added to this, and the 64-bit result is written back into the same registers. So we have come to the end of multiplication. I hope you understood; it is very simple. Try out some simple numbers with the simulator and you will see where the result lands. The instruction cycles are similar to what we saw earlier; the only difference is that there is m + 1 for the multiply long and m + 2 for the multiply-accumulate long. I am not going into the details; m is again decided by the multiplier value, as I explained earlier. The additional 1 is there in both places because the result is 64 bits, so there is a delay of one more internal cycle.
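A Python sketch of the accumulate forms, modelling UMLAL/SMLAL RdLo, RdHi, Rm, Rs as RdHi:RdLo = RdHi:RdLo + Rm * Rs with 64-bit wrap-around, plus the internal-cycle counts quoted above (m + 1 for the long multiply, m + 2 for the long multiply-accumulate). The function names and the signed flag are illustrative assumptions, not ARM syntax.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def to_signed32(x: int) -> int:
    """Read a 32-bit register pattern as a two's complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x8000_0000 else x


def mlal(rdlo: int, rdhi: int, rm: int, rs: int, signed: bool) -> tuple[int, int]:
    """Model of UMLAL/SMLAL RdLo, RdHi, Rm, Rs: RdHi:RdLo += Rm * Rs."""
    acc = ((rdhi & MASK32) << 32) | (rdlo & MASK32)   # current 64-bit accumulator
    if signed:
        product = to_signed32(rm) * to_signed32(rs)
    else:
        product = (rm & MASK32) * (rs & MASK32)
    result = (acc + product) & MASK64                 # wraps like a 64-bit register pair
    return result & MASK32, result >> 32              # new (RdLo, RdHi)


def long_multiply_internal_cycles(m: int, accumulate: bool) -> int:
    """I cycles from the lecture: m + 1 for UMULL/SMULL, m + 2 for UMLAL/SMLAL."""
    return m + 2 if accumulate else m + 1


print(mlal(0xFFFF_FFFF, 0, 1, 1, signed=False))   # (0, 1): carry ripples into RdHi
print(mlal(5, 0, 0xFFFF_FFFF, 10, signed=True))   # 5 + (-1 * 10) = -5 as a 64-bit pair
```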
It has to spend one more cycle copying the result into the registers. Please remember that in the register file the results have to go from the multiplier unit to each register, and the register file has only two read ports and one write port, apart from the special access for R15. With so many operands to read and a 64-bit result to write into two registers, you need one extra internal cycle to put the result back into the register file. And, as before, the time taken depends on whether the upper bytes of the multiplier are significant or not; it depends only on the multiplier, not on the multiplicand or on the value being added. Very good. Now, I am not going to show too many examples, because we have seen a lot of examples while studying the instructions. I want you to try out all the instructions yourself, because we need time to cover other portions of ARM. So I am restricting this to a few small examples. They do not save a lot of execution time, but they show you some optimizations: how the same code can be written in a different way. They give you a clue; the rest depends on your imagination and how well you understand the instructions. I want you to use your innovative skills to write optimum code. By optimum code, I mean performing the job with the minimum number of instructions. Why is that good? Because if you can perform the operation in the minimum number of cycles, you take less execution time and less code size. So suppose the condition is: if register Rn is equal to some immediate value p, or register Rm is equal to some immediate value q, then go to label.
If you understand the C convention, what does this typical C code do? In an OR condition, if the first part is true, C does not even evaluate the second part. Why? Once the first condition is 1, the result is 1 whether the second is 0 or 1; it is a don't-care. So once you know the first condition is true, you do not have to evaluate the second. You may wonder what we gain by not evaluating it. Here the second part is a simple comparison, but, as you are aware, C allows you to put a complex expression there, which could involve a multiplication, an addition, a memory access, anything. Evaluating it could take many assembly instructions, all of which take cycles. So no language wants to waste time on work that is not relevant and does not affect the correctness of the code. But remember the flip side: in a language like this, if the first part is true, the second part may not be executed at all, so do not write code that depends on it being executed. I am telling you this as a caution for when you write C code. Now let us come back to assembly: compare Rn with p, that is the first job. Rn is not disturbed, please remember. Also, this is not an assignment (I took it from the book); it is just a comparison, as in ==. It is pseudocode, not exact C code. So it does not assign the value p to Rn; the compare instruction only compares and does not write anything into Rn.
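Python's or operator follows the same left-to-right short-circuit rule as C's ||, so the effect is easy to observe. In this sketch, expensive_check is a hypothetical stand-in for a costly second condition that records whenever it actually runs.

```python
calls: list[str] = []


def expensive_check() -> bool:
    """Stands in for a costly second condition; records that it was evaluated."""
    calls.append("evaluated")
    return True


first_condition = True
if first_condition or expensive_check():
    pass
print(calls)   # []: the right-hand side was never evaluated

first_condition = False
if first_condition or expensive_check():
    pass
print(calls)   # ['evaluated']: now the second condition had to be checked
```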
So please remember, the intent is not to assign the value, only to compare it, so CMP is the right choice of instruction. Then comes BEQ, branch if equal: if the comparison has set the Z flag to 1, you jump to label. The label is not shown here; it is somewhere else. In that case the rest is not executed, which is what is required: in C too, if the first condition is true, it just goes to the label without evaluating the second. So it is in sync with what the language intends. Now suppose the first comparison turns out to be false. Then it has to do the second comparison. Does that happen here? Let us check. Yes: if they are not equal, the branch does not happen, because BEQ is a conditional branch taken only when equal; otherwise execution continues with the next instruction. That instruction compares Rm with q. If they are equal, it jumps again to the same label; otherwise it falls through to the instruction after this sequence. So it does the job, you agree? But there is an optimization possible. I have given you the code here, so if you want, take a 2-minute break and check whether these two are equivalent: the 4 instructions compressed into 3. Take 2 minutes and we will come back. You should have convinced yourself by now; it is taken from the book, and I am saying they are the same, so you had better agree! Let us see whether your explanation matches mine. The first CMP is executed with no condition, always. Does it affect the flags? Of course: our favourite flags N, Z, C and V are all updated. I am not saying whether they are 0 or 1; they reflect the result of the comparison.
Now, CMPNE. What does it mean? NE means not equal, so it only looks at the Z flag. If the previous comparison found the values not equal, the Z flag would have been 0, NE is true, and CMPNE executes. So CMPNE means: only if the previous comparison was false is this comparison executed, and by executed I mean the processor executes it. That is exactly what we want: if the first condition is false, we want the second one evaluated. Is it doing that? Yes. So you may wonder what the advantage is. One branch instruction is avoided: after the two comparisons there is only one BEQ. You may say that in the original version too, when the first comparison succeeded, it went to the label directly. So what is the advantage? First of all, code size is saved: you have saved one word of instruction, because one BEQ is gone. How about time? When the first comparison is true, you actually spend one extra cycle, because the CMPNE comes into the pipeline and is not executed before the BEQ jumps to label. So in this case you may not see much benefit in execution time, but you can see that code size is saved. You can gain more in a different way: if, instead of jumping to a label, the target code itself is placed right here, then a taken branch can be avoided altogether. That is very good in terms of pipeline optimization, because a taken branch means the pipeline is flushed and has to restart, and that could be avoided.
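To convince yourself that the 3-instruction version is equivalent, here is a small Python model that tracks only the Z flag. Each function returns True when the sequence ends up branching to label. It is a behavioural sketch of the two sequences discussed here, not an instruction-set simulator; the function names are illustrative.

```python
import itertools


def original_sequence(rn: int, rm: int, p: int, q: int) -> bool:
    """CMP Rn, #p ; BEQ label ; CMP Rm, #q ; BEQ label  (4 instructions)."""
    z = rn == p          # CMP Rn, #p sets Z when equal
    if z:
        return True      # first BEQ taken
    z = rm == q          # CMP Rm, #q
    return z             # second BEQ taken only if Z is set


def conditional_sequence(rn: int, rm: int, p: int, q: int) -> bool:
    """CMP Rn, #p ; CMPNE Rm, #q ; BEQ label  (3 instructions)."""
    z = rn == p          # CMP Rn, #p
    if not z:            # CMPNE executes only when Z is clear (NE)
        z = rm == q      # ...and then sets the flags from its own comparison
    return z             # one BEQ covers both cases


# Exhaustively compare the two sequences over a small range of register values.
for rn, rm in itertools.product(range(4), repeat=2):
    assert original_sequence(rn, rm, 1, 2) == conditional_sequence(rn, rm, 1, 2)
print("both sequences branch in exactly the same cases")
```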
So that would have been very beneficial in terms of execution time, but in this case what you save is code size. I hope you understand the advantage. This is very tricky, but a very intelligent way of coding, and that is where ARM scores: ARM gives you a lot of ability to use conditional instructions effectively. One more example; again you can take a 2-minute break. I want you to tell me what is happening here. Assume there is some value in Rn. After the RSBMI, what will the value of Rn be? What is the relationship between the value in Rn before and after the execution of these instructions? That is the question; take 2 minutes. Let us see what happens. TEQ: it checks whether Rn is the same as 0, by computing Rn EOR 0 without storing the result, and sets the flags. Now RSB, reverse subtract. By now you should have gone back to the manual and looked up reverse subtract; I do not know whether 2 minutes would have been sufficient if you did not remember it. A normal subtract, SUB Rd, Rm, Rn, computes Rd = Rm - Rn; very simple, I told you how to write this equation without having to recall it from memory. Now suppose the same thing is written as RSB. It swaps the two operands; not swapping the contents, but swapping the order of the subtraction. So RSB gives Rd = Rn - Rm. You should know this by now, but if you do not remember, please look it up. Now, what is happening here? The subtraction involves the immediate value 0 and Rn. A normal subtract would have computed Rn - 0, but here it is 0 - Rn, and the result goes to the destination register, which here is Rn itself. And on what condition? MI. MI means minus.
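The operand order of SUB versus RSB can be pinned down with two one-line models on 32-bit values. Here first and second are the two source operands in the order they are written in the instruction; the names are just for illustration.

```python
MASK32 = 0xFFFF_FFFF


def sub(first: int, second: int) -> int:
    """SUB Rd, first, second: Rd = first - second, wrapped to 32 bits."""
    return (first - second) & MASK32


def rsb(first: int, second: int) -> int:
    """RSB Rd, first, second: reverse subtract, Rd = second - first."""
    return (second - first) & MASK32


print(sub(10, 3))        # 7
print(hex(rsb(10, 3)))   # 0xfffffff9, i.e. 3 - 10 = -7 as a 32-bit pattern
print(hex(rsb(5, 0)))    # 0xfffffffb: RSB Rd, Rn, #0 computes 0 - Rn = -5
```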
So what do I mean by minus? When the TEQ was done, the flags were set from Rn. If Rn was negative, N, the sign flag, is set, so MI is true and the RSBMI is executed. We are concerned with the sign bit of the value. Now when we compute 0 - Rn for a negative Rn, what do we get? The positive value, that is, the magnitude. In other words, subtracting from 0 just takes the two's complement of the number, which gives a positive value equal to the magnitude of the negative number. Now, suppose Rn was positive all along. Will RSBMI change Rn? No, because MI will not be true, so the subtraction is not done and Rn stays as it was. So irrespective of whether it was positive or negative, you end up with a positive value in Rn: the magnitude if it was negative, the same value if it was positive. That is why I say it takes the absolute value. In C, if i is a signed integer, you would write: if (i < 0) i = -i; That is all you write in C, and the compiler can generate just these two assembly instructions for it. That is very optimal, almost one assembly instruction per high-level language statement.
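A minimal Python model of the TEQ Rn, #0 followed by RSBMI Rn, Rn, #0 pair on a 32-bit register. Only the N flag matters here, and TEQ with 0 copies bit 31 of Rn into N. One property of two's complement worth noticing in the last line: the most negative value, 0x80000000, has no positive counterpart, so it comes out unchanged.

```python
MASK32 = 0xFFFF_FFFF


def abs_teq_rsbmi(rn: int) -> int:
    """Model of TEQ Rn, #0 then RSBMI Rn, Rn, #0 on a 32-bit register."""
    rn &= MASK32
    n_flag = bool(rn & 0x8000_0000)   # TEQ: N is bit 31 of (Rn EOR 0)
    if n_flag:                        # MI: execute only when N is set
        rn = (0 - rn) & MASK32        # RSB Rn, Rn, #0 computes 0 - Rn
    return rn


print(abs_teq_rsbmi(-5))            # 5
print(abs_teq_rsbmi(7))             # 7, unchanged
print(hex(abs_teq_rsbmi(-2**31)))   # 0x80000000: no positive counterpart exists
```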
That is pretty much unheard of, and that is the power of the ARM processor. Very good. Now you may take a 5-minute break. This is the last problem of today's class, so you can take a long break after this. Please spend some time understanding these instructions; this time you should get it right, if you did not get it right the last time. Take a 5-minute break and come back. Welcome back. So, what did you learn? I am not going to explain the slide; it explains itself. In this example, Rb holds 4, 5 or 6. Please remember, I have written Rb, Rc and Ra; that does not mean you can write the same code in the assembler in the simulator and expect it to work. It will not work, because Rb, Rc and Ra are not register names the assembler understands. You have to replace them with specific registers, R1, R2, R3 or whatever you like (please do not use R15), then execute the instructions and see for yourself what they mean. I wrote them this way to show you which registers are common across the instructions. CS means carry set and HI means unsigned higher: each conditional instruction executes only if its condition is true after the comparison. So, based on the value in Rb, Rc is multiplied by 4, 5 or 6, but you will not see any multiplication instruction at all. Multiplication is a costly operation, whereas this uses just a MOV with LSL, logical shift left. Shifting left by 2 bits is equivalent to multiplying by 4, agreed? After that, if you want to multiply by 5 or 6, it can be done with one or two extra additions. Suppose you want 5 times 6. I can do it like this: I take the 5 and shift it left by 2 bits, which is equivalent to 5 times 4.
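The slide itself is not reproduced in this transcript. Assuming a sequence along the lines of CMP Rb, #5 / MOV Ra, Rc, LSL #2 / ADDCS Ra, Ra, Rc / ADDHI Ra, Ra, Rc (an assumption consistent with the CS and HI conditions and the results described), its behaviour can be modelled in Python as below. The function name scale_by_rb is hypothetical.

```python
MASK32 = 0xFFFF_FFFF


def scale_by_rb(rb: int, rc: int) -> int:
    """Hypothetical model of:
        CMP   Rb, #5
        MOV   Ra, Rc, LSL #2   ; Ra = Rc * 4 (MOV without S leaves the flags alone)
        ADDCS Ra, Ra, Rc       ; C set, Rb >= 5 unsigned: Ra = Rc * 5
        ADDHI Ra, Ra, Rc       ; HI, Rb > 5 unsigned: Ra = Rc * 6
    """
    rb &= MASK32
    rc &= MASK32
    carry = rb >= 5              # CMP sets C when there is no borrow: Rb >= 5 (unsigned)
    higher = carry and rb != 5   # HI means C set and Z clear
    ra = (rc << 2) & MASK32      # shift left by 2 = multiply by 4
    if carry:
        ra = (ra + rc) & MASK32  # one extra Rc: times 5
    if higher:
        ra = (ra + rc) & MASK32  # another Rc: times 6
    return ra


for rb in (2, 4, 5, 6, 9):
    print(rb, scale_by_rb(rb, 5))   # 20, 20, 25, 30, 30
```

Values of Rb below 5 fall back to multiplying by 4, and values above 6 behave like 6, which matches the remark that Rb is not restricted to 4, 5 or 6.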
So 5 times 4 is already calculated, but my interest is to multiply by 6. Instead of multiplying by 6, I multiply by 4 by shifting left by 2 bits, which is a very easy operation that takes less time, and then I add one more 5 and another 5. If it were a multiplication by 5, I would add 5 only once; for 6, I add one more 5. Two additions are just 2 cycles, whereas a multiplication takes more cycles; you have seen the mI term, which depends on the multiplier value. So a multiply instruction is much costlier than an addition. And we perform these additions conditionally, based on the data value, so very few cycles are wasted. This is a very optimal implementation of a multiplication. I am giving it to you so that you can write such code and test yourself. Very good. If Rb is less than 4, Rc will be multiplied by 4, and if it is more than 6, it will be multiplied by 6. You should remember that too: Rb need not be only 4, 5 or 6; it could hold a value lower or higher than these, and even then the same operations happen. And we are coming to the end of the instruction set. Next class we will cover Thumb state, which I have not talked about yet, though we have seen a few examples using it. That will be an interesting topic. With this, we have completed multiplication and a few examples, and interrupt latency, which we covered in detail. Please understand interrupt latency fully, so that you have clarity on how interrupts and exceptions are handled. I hope this was useful to you. Go and try it out on the simulator, and have a nice day. Thank you very much for listening patiently. See you in the next class. Bye bye.