Welcome, friends. Today we will cover two important topics. The first is instruction timing. I will give you an example of how to calculate how much time a particular instruction takes. We have discussed cycle counts in terms of non-sequential, sequential, and internal cycles, but how do we arrive at those numbers? Once I show you the method for one instruction, you should be able to work it out for any other type of instruction.

The second topic, a very interesting one, is addressing mode 4, which covers the load and store multiple instructions. This is a continuation of what we discussed for LDR and STR, each of which transfers only one register: STR stores one register into memory, and LDR loads one value from memory into a register. We discussed multiple ways of using those instructions. Addressing mode 4 deals with how multiple registers can be exchanged with memory, either from memory to registers or from registers to memory.

So these are today's topics. The first part of the discussion is about how much time a particular instruction takes and how we calculate it. I will show you a pipeline example and what happens inside the pipeline so that this becomes very clear. Then we will talk about addressing mode 4, using the LDM and STM instructions.

I am taking LDR and STR as examples to show how the timing is calculated. First, let us see the definition of instruction timing: it is the measure of the time spent by an instruction in the execute stage. Please remember, this is the time spent in the execute stage. You may wonder why we do not consider the time an instruction spends in the fetch or decode stage. That is because those are parallelized operations in the pipeline, and every instruction goes through those two stages.
While an instruction is sitting in the fetch or decode stage, some other instruction is active in the execute stage. So the time an instruction spends in the execute stage is what matters. Let us see how to calculate it.

Take an example we have seen earlier; I used this sequence of instructions to explain what happens inside the datapath. Let us use the same sequence to see how we calculate the time taken by the store register instruction. We are interested in this STR instruction; it is followed by a few ADD instructions and preceded by one ADD instruction. Now I want you to think about how much time each stage takes when multiple instructions are in different stages.

Can we guess the time taken by this stage? You might say it is a sequential access of an instruction from memory, the ADD instruction, so it takes an S cycle because instruction fetch is sequential. But what I am showing here is the first instruction; I have not shown what comes before it. We do not know whether this instruction is fetched right after, say, a data transfer, in which case this cycle would be non-sequential. So it could be either a sequential or a non-sequential cycle, because we do not know what happened before this cycle; that is not shown here. We cannot make any guess about how much time it took.

You may wonder why we differentiate between a non-sequential and a sequential cycle. A non-sequential cycle is one where you give the memory a new address and it starts accessing from that address.
So the first access takes an N cycle, capital N, a non-sequential access. If a subsequent memory access follows on from the earlier address, it may take less time than N, and we call that an S cycle, a sequential cycle. In this case we cannot predict which it is: if this instruction access followed another instruction fetch it would be a sequential access, but if it followed a data transfer it would be a non-sequential access.

Now let us see what cycle this particular stage is going to take. The pipeline moves in lockstep: in each cycle, the stage that needs the most time, depending on which instruction is in it, decides how long the cycle is, and that is how the pipeline moves. Here we know there is a decode, and there may be an execute of a previous instruction. Which of these operations takes more time? Relatively speaking, anything to do with memory consumes more time, because in ARM, especially in ARM mode, the instructions are structured in such a way that decode takes minimal time. So fetching an instruction from memory is more time-consuming than the other operations. Since there is a fetch in the previous cycle, this fetch is clearly from the address sequential to that one. So the time taken by this stage will be S, because it is a sequential address from the previous one, and the memory access takes more time than any execute or decode stage in the pipeline.
Now come to this cycle. What is happening here? The ADD instruction is being executed in the execute stage, and the STR we are interested in is being decoded to generate the signals required for calculating the address in the datapath. And the next ADD instruction is being fetched. Which is going to take more time? Logically this is an S, because a memory access is more time-consuming than the other operations inside the processor. A multiplication may be more complex, but in this case the ADD executes quickly: the two operands are already known, and as you saw when we looked at what happens inside a data processing instruction, it all happens within a cycle. That is why we say the time taken for this stage is S.

Here too the following instruction is being fetched. You may wonder how. The ADD has completed, so it is exiting the execute stage, and our STR is now entering the execute stage. The decode logic is still busy generating the transfer signals for the data transfer, so decode is occupied with the STR itself. The fetch brings in a new instruction, because there is room for one more instruction to come in. This is also going to take S.

Now what is happening here? The fetched instruction has moved to decode, because the decode stage has finished its job for the STR. The bus is now busy with the transfer of data, so an instruction cannot be fetched. You know that ARM7 has a common address and data bus and a common memory, so an instruction access cannot happen along with the data transfer. So the fetch is delayed.
Now, is the data transfer going to take a non-sequential or a sequential cycle? The data transfer is not linked to the previous address, which was an instruction address. The data is read or written at a totally different address. This is an STR instruction, so the data is written into memory, and the address is not sequential to the previous memory access, which was an instruction fetch. So this is going to be an N. Please understand this sequence and how we decide whether a particular stage takes an N cycle or an S cycle, because the memory access takes more time than any decode operation happening alongside it.

Now you should be able to guess the cycle time here. The most costly operation in terms of time is the fetch, but is it sequential to the previous access? The previous access was a data transfer, and now an instruction is accessed, which is not at the address consecutive to the data. So the first instruction fetch after a data access is a non-sequential access; that is why we see N here. Remember that not every fetch gives you S. Only when the previous access was also an instruction fetch is the sequence maintained, and then the memory can provide the data much faster than in a non-sequential cycle. We are not going further down, because we do not know what comes next. Most probably it is another instruction fetch, so it could be S, but our interest is in the STR.

So how much time has the STR instruction spent in the execute stage? It stayed there for 2 cycles: one of type S and one of type N, that is, a non-sequential access. So an STR takes 1 S and 1 N cycle to execute; this is how we arrive at this timing. And the STR instruction takes fewer cycles than LDR.
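The N-versus-S rule used in this walkthrough can be tried out with a small sketch. It is a simplified model, not the processor's actual bus logic: an access counts as sequential only when its address is exactly one word after the previous memory access, and all addresses below are made up for illustration.

```python
# Simplified model: classify each bus cycle as N (non-sequential),
# S (sequential) or I (internal, no memory access).
# All addresses are hypothetical and chosen only for illustration.

def classify(cycles):
    """cycles: list of byte addresses, or None for a cycle without memory access."""
    kinds = []
    prev = None  # address of the most recent memory access
    for addr in cycles:
        if addr is None:
            kinds.append("I")
        elif prev is not None and addr == prev + 4:
            kinds.append("S")
        else:
            # New address, or no known history: treat as non-sequential.
            kinds.append("N")
        if addr is not None:
            prev = addr
    return kinds

# Bus activity around an STR (instructions assumed to sit at 0x8000 onward):
bus = [
    0x8008,  # fetch while the ADD before the STR executes (history unknown)
    0x800C,  # fetch while the STR computes its address (STR execute, cycle 1)
    0x2000,  # STR writes its data to memory            (STR execute, cycle 2)
    0x8010,  # first instruction fetch after the data transfer
]

kinds = classify(bus)
str_cycles = kinds[1:3]  # the two cycles the STR spends in execute
print(kinds)
print("STR execute cycles:", str_cycles)
```

The first cycle comes out as N only because the model has no history before it; in the walkthrough that cycle is left undetermined. The two STR cycles come out as one S and one N, and the fetch after the data transfer is N.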
Why? Because LDR takes one more cycle to move the data it has read from memory into the register. We will see that in the next example. Here I am showing you an STR: the offset computation was done in this cycle to calculate the address, and the actual transfer of R4 into memory happened here. Because of this, the next instruction access is non-sequential, and the data transfer itself is a non-sequential access. I hope that now, given any instruction, if you know the sequence in which the other instructions are placed in the pipeline, you can calculate the cycles taken by that instruction. This will be a good exercise.

I told you that LDR has slightly different timing; let us see why. First let us understand what the LDR instruction goes through in the pipeline. First the decode is done. Then, in this stage, the offset calculation is done by the datapath and the address is generated. Loading means reading from memory and putting the value into a register, so the data transfer happens here. Is it a read cycle or a write cycle? It is a read from memory. Is it going to be non-sequential or sequential? It is a non-sequential address, because there is no link between the address from which the instruction was fetched and the address of the data fetched in the next cycle. So it is a non-sequential cycle.

What is this one extra cycle for? It moves the data coming in from memory out of the data-in register. If you recall, in the datapath there is a data-in register at the bottom, where the memory loads the value. From there it has to go through the ALU and then be written into the register. I have already explained this, but let me go over what happens.
See, this is the destination register, Rd. The memory cycle has happened, and the data-in register has received the data from memory. Now this has to be moved into Rd. If you remember the path, it goes through the ALU and barrel shifter and then into the register file, where it is written into Rd. So this whole operation happens within the datapath, and it is purely an internal cycle. Is any memory access happening while this goes on? No. You cannot fetch an instruction, because the pipeline is already full: this instruction has not yet left the execute stage. And no data access is needed either, because the data from the previous memory access has not yet been copied into the register. So it is going to be an internal cycle.

Let us check whether what I am saying is what happens. Here again, for the first access, we do not know whether it is S or N. Here we are very clear it is an S cycle, because the instruction is fetched from a sequential address. This one is also S, because calculating the address is a much simpler internal operation that takes less time than the sequential access. During this data transfer it is a non-sequential access; you are clear on that now. And this is what I was saying: in this cycle there is an internal decode as well as an internal transfer, with nothing to do with memory, neither an instruction access nor a data access. So there is no memory cycle; internally, the processor copies the data it has read from memory into the register, because it was an LDR instruction. So it is going to be an I.

Next comes this fetch, and this is very, very important to remember: this fetch has nothing to do with the previous data transfer.
So that first instruction access after the data transfer will also be a non-sequential access. Now, how much time has our instruction of interest spent in the execute stage? This much. After that the next instruction took over the execute stage, and before it the ADD was in the execute stage. So our timing is this much, which corresponds to 1 S, 1 N, and 1 internal cycle. I hope this is very clear to you; I thought it important that you understand from the pipeline perspective how an instruction flows through it. You can see the difference between LDR and STR now: LDR takes one more cycle than STR because the data read from memory has to be copied into a register, which involves one more internal cycle.

Now I will ask you another question. Suppose the LDR involves R15; what is the impact on the timing of that instruction? That is what we are going to see next. You may notice that there is a big gap in the middle of the diagram. Let me explain what is happening. In the instruction we are looking at, some operation is done on a register and then a new address is read from memory. Let me explain with a diagram. Say R15, the PC, is here. In this LDR instruction the PC is Rd, the destination register. Let me give you an example, which I should have shown earlier: LDR R15, [R1, #0x4]. The offset is 4 because instructions are all 4 bytes in length. What does this instruction say? Take the content of R1 and add 4 to it; there is no minus sign here, and hopefully my handwriting is clear enough that you do not read one. Now assume R1 holds 100; for a change I am taking 100 rather than the value 1000.
Adding 4 gives 104, and this location has to be accessed; 104 is in memory. We take the content of location 104, which is some address, and copy it into R15. So this location holds the address from which the next instruction has to be fetched; suppose that address is x. At x there is code; it is supposed to be instruction memory, and R15 should start executing from that address. To do this we need one memory access to read location 104, get the address, and load it into R15, and then execution continues by fetching from there. Sorry, not from 104; from x, the address that was read from 104.

Let us see how this happens. First this instruction is decoded. Then the address is calculated by adding the operand in R1 to the immediate constant 4, and that address is put on the address bus. Then comes the memory access. It is really a data access, although what is stored there relates to instructions; it does not matter whether it is instruction or data memory, because we have a unified memory here. So this transfer takes place, and then the address that has come in is copied into R15, which consumes one internal cycle. After R15 is written, the processor starts fetching the instruction.

Now, will this newly fetched instruction be executed immediately? Assume it is an ADD; normally we would not choose an ADD here, but just assume. No, because it first sits in the fetch stage of the pipeline, then goes to the decode stage, and only then gets executed.
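The data movement in this example can be sketched in a few lines. This is only an illustrative model of the load itself, not of the pipeline; the target address `X` and the initial PC value are made-up values, and the helper `ldr` is a hypothetical name.

```python
# Illustrative model of LDR R15, [R1, #4] with R1 = 100.
# X is a hypothetical address standing in for "x" in the example.
X = 0x3000

regs = {"R1": 100, "R15": 0x8000}  # initial PC value is arbitrary
memory = {104: X}                  # location 104 holds the target address

def ldr(regs, memory, rd, rn, offset):
    """Load the word at regs[rn] + offset into regs[rd]; return the address used."""
    address = regs[rn] + offset
    regs[rd] = memory[address]
    return address

accessed = ldr(regs, memory, "R15", "R1", 4)
print("memory location read:", accessed)
print("new PC:", hex(regs["R15"]))
```

The memory access goes to 104, but the next instruction fetch goes to the value stored there, which is why the two must not be confused.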
That means our instruction, the one changing the PC, is going to spend a lot of time in the execute stage. That is implied; it is not actually doing anything beyond this point, and the execute stage is doing nothing during this time. Why? The new instruction being fetched here takes two more cycles to pass through the fetch stage and then the decode stage before it reaches execute. So two cycles are wasted in the execute stage, and that is why you see a gap: the execute stage has nothing to do. I hope this is clear to you; it is very important.

Let us see how much time each cycle takes; that will show you how the computation is done for LDR with the PC. Whenever the PC is involved, especially as the destination, the instruction takes more time, because we are changing the flow of instructions.

Can you guess the time taken by this cycle? We do not know; let us not worry about it. For this one you can tell. Some other instruction is in the execute stage, and you may ask: there is another instruction in the execute stage, so could it not take more time than this fetch? How can you say this takes S? Well, if that execute were a data transfer, it could take more time. But if the execute stage were doing a data transfer, the fetch could not happen, because two memory accesses cannot happen at the same time. So it must be some other execute inside the processor itself, and that naturally takes less time than a memory access. Whether sequential or non-sequential, anything that goes through memory is much more time-consuming than anything happening inside the processor.
So this will still be S, because we are guaranteed not to have any other memory access here; if the execute were a memory access, the fetch would not have happened. How much time does this next cycle take? Our instruction has now entered the execute stage, and the address, the 104 from our example, is being computed here. In parallel one more fetch can happen, so it is a sequential access. Next is the data transfer, which reads the address stored at 104 that I showed you. That will be N, because it is non-sequential with respect to the previous instruction address. Then that value is copied into R15, which takes one internal cycle, just as in the LDR we saw earlier.

Now the new instruction, at the address read from 104, is fetched. The execute stage is empty, with nothing to do, while the new instruction is being fetched. This has to be a non-sequential access. The fetch after it is S, because it is the next instruction following this one, and the one after that will also be S.

So how much time has our LDR with R15 taken? From here to here, because the next useful instruction gets control of the execute stage only at this point. Until then you have to count the time against this instruction. So how many N and how many S? Two S, two N, and one internal cycle; this is what you will see in the manual.

This is the sample instruction. The offset could be 4 or 5; I have put some number, but sorry, it should have been 4, because we do not want an odd address to be given to R15. You can do it, but it may involve Thumb mode, and you are not yet familiar with Thumb mode. So please assume that this is 4 or some number divisible by 4.
Our topic of interest is how many cycles this takes. When you see the figure in the manual, you cannot directly tell why it takes so much time. With this explanation of exactly what happens in the pipeline, you are ready to validate any kind of instruction. If the manual states a timing, you can verify it yourself by drawing the pipeline, going through each stage, and working out which stage consumes the most time in each cycle, as I did. This will be a good exercise to do in the lab session or as a home assignment.

So whenever an instruction changes the PC, please remember that it incurs two additional cycles for the new instruction to reach the execute stage through the two earlier stages, fetch and decode. In the walkthrough these are the N cycle that fetches the new instruction and the S cycle that fetches the one after it; now you can see where the extra cycles in the manual's figure come from. That is the complete explanation, and with this we are done with instruction timing.

Let us get into the load and store multiple instructions. This is very interesting, so please pay attention: accessing multiple memory locations involves a few things beyond accessing one, namely multiple sequential address accesses. This is mode 4, and with it we complete the addressing mode discussion; mode 5 will come later, when we talk about coprocessors.

When do you want to access a sequence of words in memory? Only when you want to transfer a set of registers inside the processor to memory, or when you want to load a set of registers with data from memory, either values stored earlier or new values. It is like initializing a set of registers together in one single memory cycle.
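To compare the three timings derived so far, the execute-stage cycle sequences can be tallied side by side. The sequences below are simply the ones from the pipeline walkthroughs (STR, LDR, and LDR with R15 as destination); the helper names `summarize` and `extra_cycles` are made up for this sketch.

```python
from collections import Counter

# Cycle types spent in the execute stage, in order, as derived in the
# pipeline walkthroughs. This is a bookkeeping aid, not a simulator.
EXECUTE_CYCLES = {
    "STR": ["S", "N"],
    "LDR": ["S", "N", "I"],
    "LDR PC": ["S", "N", "I", "N", "S"],  # includes the two refill cycles
}

def summarize(cycles):
    """Format a cycle sequence in the manual's style, e.g. '1S + 1N + 1I'."""
    counts = Counter(cycles)
    return " + ".join(f"{counts[k]}{k}" for k in "SNI" if counts[k])

def extra_cycles(base, longer):
    """Cycle types present in `longer` beyond those in `base`."""
    return sorted((Counter(longer) - Counter(base)).elements())

for name, cycles in EXECUTE_CYCLES.items():
    print(f"{name}: {summarize(cycles)} ({len(cycles)} cycles)")

print("extra for PC write:",
      extra_cycles(EXECUTE_CYCLES["LDR"], EXECUTE_CYCLES["LDR PC"]))
```

Writing the sequences out this way makes it easy to see that loading the PC costs two cycles more than a plain LDR, one for each pipeline stage the new instruction has to pass through before it can execute.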
When I say one single memory cycle, it actually involves multiple cycles, but they are all bundled together, one after the other. The processor supports that kind of operation with load and store multiple. You may wonder why it is required. It is required whenever you make a function call, or when you enter an exception. I have not talked about exceptions yet, but for now take it that the control flow changes completely from the normal flow. Inside the exception you will be using the same set of registers that the normal flow was using. So before the exception handler or ISR, the interrupt service routine, uses those registers for its own purposes, it should save that set of registers onto the stack. The stack is just memory; whether we treat it as a stack or not depends on what kind of instruction you use.

So this is the exception handler, and this is the normal code executing here. The code is executing and suddenly an exception happens; it could be an interrupt or some data-related event, whatever it is, and because of it the processor jumps to a new location to execute some set of instructions. The exception handler is also assembly code, and it does some operations using the registers, but it has to make sure that it does not overwrite any register used by the code that was running before the exception. We do not know which registers were in use. There are some banked registers, which I mentioned, but apart from those there are registers that are common. If you use any of the common registers, you have to save them. To save them you will use, not LDM, in this case STM.
You use STM to store the set of registers, listed in curly braces, and once the handler is complete you restore them and return to the code that was running. That code will not even know that there was an exception, that someone came in, saved the register contents, did something else, and then returned. It does not know because it sees the same register contents it had before the exception happened. How is that managed? By executing an STM to save, and then an LDM to load that set of registers back from memory before returning to the normal code. That is why LDM and STM are useful: you can save a set of registers and restore a set of registers with one single instruction each, which is much more time-efficient than saving them one by one.

If you are familiar with the x86 instruction set, there were PUSHA and POPA instructions, if you remember; LDM and STM are similar. On x86 those instructions were specific to the stack pointer and stack memory: one pushed a set of registers onto the stack, and the other popped them back from the stack into the registers. ARM supports the same idea through the LDM and STM instructions.

Load and store multiple instructions can increase interrupt latency. That is their side effect. Normally the ARM processor does not get interrupted while these instructions are executing; it waits for them to complete. Abort is different, but let me not confuse you now, because you have not heard about aborts yet; we will talk about that when we get there. Normally there are two interrupts, IRQ and FIQ, and when they occur they need to be processed: the processor has to go to the interrupt vector and run the ISR. I will cover that later as well.
The processor recognizes an interrupt only after the load or store multiple has completed. If the instruction were interrupted in the middle of its memory cycles, only half of the registers would have been loaded by the LDM or saved by the STM, and there would be confusion. So it completes the LDM or STM first. As a result, when you execute these instructions, the interrupt latency is higher. What do I mean by interrupt latency? As an embedded software programmer you are aware of it: it is how much time the processor takes to respond to an interrupt. At some point on the timescale an interrupt occurs, and it takes a certain time before the processor responds, that is, starts executing the ISR for that interrupt. That response is delayed when you are saving or restoring a large number of registers. If an interrupt is raised, it has no effect until the load or store multiple instruction is complete; the interrupt is not recognized during the load or store multiple.

Now let us see the format. There is LDM or STM, then the condition code, then the addressing mode you are using for the access, then Rn, which is the base register the addressing mode works on, and then something new: a set of registers, the register list. I will give an example and then you will understand. When you store or load multiple registers, you have to say which registers you mean, whether R0, R1, R2, R3, or others. You may pick any register that is accessible in the current mode and put it in the list, and the instruction will do the job. Basically, load multiple copies the contents of consecutive memory addresses into the set of registers you have given. Similarly, store multiple saves the set of registers into consecutive addresses in memory.
Here n is the number of registers: that many registers are loaded or stored, based on the number of registers in the list. Any subset of the current bank of registers can be transferred. You can give anything from 1 to all 16 of them, but at least one register must be mentioned in the list; you cannot give an empty list. The base register is Rn, which says from which memory address onward you want to load or store the registers. Rn can optionally be updated as well. If you recall, LDR and STR have an exclamation mark to say that a write-back happens; in the same way, if there is an exclamation mark here, the base address is modified after the complete transfer of the multiple data words.

One more thing to remember: Rn, the base register, can also be mentioned in the register list. There is no verification of this by the assembler or the processor. I will tell you the effect when I show an example. You would be using one of the registers as the base address to access memory and at the same time modifying it as part of the save or load. Especially when you modify it with a load instruction, undefined behaviour is possible, depending on the order in which things are done. So be cautious about putting the base register in the register list; it is better to avoid it.

We do not have to spend too much time on the encoding. The condition code is here as usual, there is a special bit pattern meant for this instruction, and you have seen most of the other bits: pre-indexing or post-indexing, up or down, that is, whether the memory address goes up or down. Then there is one special bit here, which I will talk about later: the PSR bit, related to the status register, which is unique to LDM and STM and is used only in exception modes, the privileged modes.
So let us not worry about it right now. The remaining thing is the register list. How many bits are allotted? 16 bits. Why? Each bit corresponds to one register, and a 1 means you are interested in loading or storing that register. If bit 0 is set, it is register R0; if bit 1 is set, you are interested in loading or storing R1. So you can mention any of the 16 registers, R0 to R15, by giving it in the list.

Now, how do you use this instruction? The register list is there, but there is also a qualifier given here. It says how to change the base register after every access of the consecutive addresses. Please hold on; I will talk about it on the next slide. And what is this character? It is very specific to privileged modes and exception handling, so we need to wait until we come to that stage. Please ignore it for the moment.

I want to give you an example first, before I tell you what this instruction does. You remember our convention: these are the registers, set to these values, and this is the content of memory before our instruction executes. LDM you understand: it is loading. Loading what? It loads into the registers given in the list. How many registers are in the list? If you put a hyphen between two register numbers, it includes all the registers between them. So R1 to R5: R1, R2, R3, R4, R5, so 5 registers are given in the list. In the instruction encoding there will be five 1s in the relevant positions: bits 1 to 5 will be 1.
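The mapping from a register list to the 16-bit field can be sketched as follows. The function name `register_list_mask` is made up for this illustration; it only models the bit-per-register idea described here, not the rest of the instruction encoding.

```python
def register_list_mask(registers):
    """Build the 16-bit register-list field: bit i set means Ri is in the list."""
    mask = 0
    for r in registers:
        if not 0 <= r <= 15:
            raise ValueError(f"no such register: R{r}")
        mask |= 1 << r
    if mask == 0:
        # At least one register must be named in the list.
        raise ValueError("register list must not be empty")
    return mask

# {R1-R5}: the hyphen means every register from R1 through R5.
mask = register_list_mask(range(1, 6))
print(f"{mask:016b}")          # bits 1 to 5 set
print(bin(mask).count("1"))    # number of registers transferred
```

Because the list is just a set of bits, the order in which registers are written in the assembly source does not carry into the encoding.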
Now, from where is it loaded? You are loading from the address R0 is pointing at. But along with the base register there is a command to the processor: increment after. That means the processor is supposed to increment the base address after each transfer is done. So the processor accesses the first word, the one pointed to by R0, and copies it into the lowest register given in the list, R1 here. When I say lowest, I mean lowest in terms of register number. And the address it takes it from is the lowest address; I will tell you why this restriction has come in. So R0 is given as the base address, and with increment after it is going to change from 0x1000 to some address that depends on the number of registers in the list. Whatever value is at 0x1000 comes into R1, then the address is incremented to access the next element, and R2 is loaded, then R3, R4 and R5. I have put the values in memory in such a way that they match up, so that you understand what is happening. Because it says increment after, the increment is done after each load. So after the fifth value is copied into R5 the address is incremented once more, and the R0 value will be 0x1014 after this execution is done. Please remember: when you have multiple registers to transfer, the address has to be modified; otherwise you will be accessing the same address every time, and you will be overwriting the same location. So if you are accessing multiple values, you have to increase or decrease the address so that every access touches a different location. That is implicit.
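The increment-after load walked through above can be modelled in a few lines. This is only a sketch in Python, not ARM code: `ldmia` is a made-up helper, memory is a plain dictionary from word address to value, and the data values 1 to 5 are illustrative.

```python
def ldmia(regs, mem, base, reg_list, writeback=True):
    """Model of LDMIA: load words from ascending addresses starting at regs[base]."""
    addr = regs[base]
    for r in sorted(set(reg_list)):   # lowest register number first
        regs[r] = mem[addr]           # lowest address goes to the lowest register
        addr += 4                     # increment after each word
    if writeback:                     # the '!' in the assembly syntax
        regs[base] = addr

regs = [0] * 16
regs[0] = 0x1000
mem = {0x1000 + 4 * i: i + 1 for i in range(5)}   # illustrative data: 1..5
ldmia(regs, mem, base=0, reg_list=[1, 2, 3, 4, 5])
print(regs[1:6], hex(regs[0]))   # [1, 2, 3, 4, 5] 0x1014
```

The model assumes the base register is not itself in the list, which is the case the lecture recommends sticking to.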
Now, how do you increment or decrement it? Do you increment first and then load or store the value, or do you first load or store and then increment? If you recall, in the second session or so I said that a stack can be changed in different ways. That is what is implemented here. Please remember there are 4 possibilities for this instruction. I hope the example is clear to you. With this, let us go and see all the different ways the base register can be modified. The base register can be incremented before or incremented after. The example we saw was increment after: it accessed the location Rn was pointing at first, and then incremented the address to access the next element, if there was one in the register list. That is why you see the start address is Rn here. Whereas if you say increment before, whatever the base address is, it is incremented first and then the memory is accessed. That is why the start address is Rn plus 4 when increment before is given with the instruction. Similarly for decrement after. Now, why is there a subtraction in the start address here and the actual value of Rn as the end address? Remember, the processor always accesses from the lower address. Let me give an example. Suppose you have memory at our favorite address 0x1000, and then 0x1004 and 0x1008. Let us take only two registers. Assume, for a change, that R0 is our base register and it is pointing at 0x1008, and you have given a register list of R5, R6. Sorry, I do not have space here, so please forgive me; that is a 6. Now, what am I trying to say? I have locations in memory, R0 is pointing at 0x1008, and I am saying decrement after. Please remember, I am saying decrement after. What am I trying to do? We can do LDM or STM.
Let us say LDM. I want to load from two memory locations: 10 is stored here and 11 is stored here. Now you have to combine all these things: LDM, decrement after. Load means you are loading from memory to register. Which registers? Our friends here, R5 and R6, get the values at 0x1008 and 0x1004, which are assumed to hold 10 and 11. Currently the base register R0 is pointing at 0x1008, and I am telling the processor decrement after. That means the location R0 points at is part of the transfer, the address is decremented after that, the next location is accessed, and the values are copied into R5 and R6. Let me ask you one question: which value will go into which register? I told you the registers are accessed starting from the lower number, and the addresses are also accessed starting from the lower address. I will tell you why later; for the moment assume that the lower address is accessed first and that registers are filled in order of register number, lower first. So in this case, which is the lower address? 0x1004. Of these two registers, which has the lower number? R5. This is the way it is implemented, believe me: R5 will get 11, because the processor accesses from the lower address. Now it is very trivial to say what R6 gets: it gets 10, from here. That was decrement after. Now let me give you another scenario. R0 holds the same value, your list is the same, and it is still LDM, but I change the mode to decrement before. Everything else being the same, what will decrement before do? Think it over and tell me. The execution is going to start, but I am saying decrement before. So R0 will first be decremented to 0x1004.
So this is 0x1008, this is 0x1004 and this is 0x1000. Oh, I am sorry, I should have said there is one more value here. R0 was holding 0x1008, correct? For decrement after it accessed these two locations, no problem. And 0x1000 holds 12. So far so good. Now I am saying decrement before. That means whatever the content of R0 is, you decrement it before you access the first value. So 0x1004 will be one of the accesses, and 0x1000 the other. The range is now from 0x1004 down to 0x1000, because you have given only two registers, R5 and R6. Now tell me what the values in R5 and R6 will be. Do not make a guess; you should be sure about it. Again, remember the processor starts from the lower address and starts filling from the lower register. That means 12 will go into R5 and 11 will go into R6. That is the way it is implemented. Why? Let us wait for the next slide. I hope this is clear. So you can do decrement or increment, after or before; there are four possible ways in which Rn can change, and it is straightforward to work out how the values map. I do not want to explain every case, because it follows the same pattern; I want you to do it as homework. Now let us look at this slide. Here n is the number of registers in the register list. For this particular slide, please set aside n meaning a non-sequential access. Depending on whether the location Rn points at is itself part of the transfer or not, the start and end addresses of the access are determined.
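The two decrement examples can be checked with a short model. This is a sketch in Python under the rules stated in the lecture (lowest address first, lowest register first, 4 bytes per word); the helper names `transfer_addresses` and `ldm` are invented for the illustration, and write-back is left out of the load to keep it short.

```python
def transfer_addresses(base, n, mode):
    """Return (word addresses in ascending order, base value after write-back)."""
    size = 4 * n
    if mode == "IA":
        start, final = base, base + size
    elif mode == "IB":
        start, final = base + 4, base + size
    elif mode == "DA":
        start, final = base - size + 4, base - size
    elif mode == "DB":
        start, final = base - size, base - size
    else:
        raise ValueError("mode must be IA, IB, DA or DB")
    return [start + 4 * i for i in range(n)], final

def ldm(regs, mem, base, reg_list, mode):
    """Load without write-back: the lowest register gets the lowest address."""
    order = sorted(set(reg_list))
    addrs, _ = transfer_addresses(regs[base], len(order), mode)
    for r, a in zip(order, addrs):
        regs[r] = mem[a]

mem = {0x1000: 12, 0x1004: 11, 0x1008: 10}
for mode in ("DA", "DB"):
    regs = [0] * 16
    regs[0] = 0x1008
    ldm(regs, mem, base=0, reg_list=[5, 6], mode=mode)
    print(mode, regs[5], regs[6])
# DA 11 10
# DB 12 11
```

The same `transfer_addresses` helper can be used to work through the increment cases left as homework.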
So, based on whether increment or decrement is given, what the processor does first is work out the range of addresses over which the transfer has to happen, and then it decides which word goes with which register based on the list. Registers are transferred from the lowest to the highest register number, so R15 will always be transferred last. That is why they have put lowest to highest. The reason is this. Suppose you have mentioned a set of registers, R1 and so on, and R15 as well. R15 is a special guy: it is the PC. R15 should be changed only at the end of the execution, because the flow changes once you modify it, and if there is a data abort, that has a crucial impact on restarting the instruction. This will become clear when I explain the data abort. For the moment, assume and believe that the transfer has to go from lowest to highest because R15 is the PC, and the instruction should not modify R15 before all the other registers are done. So the conventions are like this. Let me move this so that you can read the slide. The lowest register gets transferred first, and to or from the lowest address. This is very, very important; we should remember it. Now one more point: the order of register transfers is not based on the order in the list. This is another fine point. Suppose I mention R2, then R0, then R5, in that order, with commas between them of course. The assembler will not give you an error because you have given them in some random order. If you recall the instruction format, it keeps only 1 bit for each register, bit 0 to bit 15, to say which registers are being transferred. So for R0 it sets a 1 here, R2 will be here, and R5 here.
So the instruction does not keep the order in which you have given the registers. It only knows that you have given 3 registers in the list, and then, by this rule, the lower registers are always transferred before the higher ones. So whatever order you give here, it is going to transfer the contents starting from R0, then R2, then R5, from the lower address to the higher address of the range pointed to by Rn, according to the type of increment or decrement. Because the encoding does not hold any order, it is not possible for the processor to follow your order, and it would not make sense either. It always goes from the lower register number to the higher. So you are free to write the list in any order, but no order is maintained, and there is no need for it, because this instruction is executed as one chunk: no other instruction can come in between. Those were the important points. I think now we are ready to see some examples. First I will ask you one question. Think it over, pause the presentation, and then choose one of the options. I want the students to discuss among themselves and then decide on one option. Suppose students A and B are sitting together; A says the answer is option C, and the other guy says B, maybe because he is B. Now both of you should discuss, convince each other, and finally come up with one answer. It could be correct or wrong, it does not matter; at least you would have spent some time discussing the topic. Then see what the actual answer is. The teaching assistants need to take care of this. Now, tell me: why is it that LDM and STM always start loading or storing from the lower to the higher address, and also pick registers from the list from the lower to the higher number?
I have already mentioned that R15 comes at the end, and I have mentioned it here, so you could always say that this is the right answer. But I want you to be unbiased in choosing the option. If you know memory pretty well, sequential accesses always go from a lower address to an increasing address. So the option is B: to enable sequential accesses to memory, the transfer always starts from the lower address. That is the first reason. And it starts from the lower register numbers because they want R15 to change at the end, so that aborts can be recovered from. On sequential memory accesses: yes, the memory expects the address to be increasing, so LDM and STM need to start from the lower address. You cannot suddenly tell the memory, "I am going to give you an address, but now you should start decrementing your addresses and giving me the data." Maybe memory controllers are advanced enough to do that, but it would involve lots of other changes to the circuitry and to the way the processor is connected to the memory. So, to bring uniformity to the implementation, the ARM7TDMI follows this convention: it always accesses from the lower address to the higher address, and copies from the lower register number to the higher register number. I hope this is clear to you. Now let us see a few examples so that you understand this instruction fully. This one you have already seen with 5 registers in the list. Now I am giving you 4. It does not matter; it does the same thing, except that it copies only 4 words and the end address will be this. I am showing you the registers in this order because R1 will be written first, then R2, R3 and R4, from the lower address to the higher address.
Now I am giving you another instruction. I want you to pause, look at it, and tell me what happens. By now you should have realized what IB means: it is increment before. That means we want R0 to be incremented before any transfer is done. So R0, which is pointing at 0x1000, is incremented first. Please remember that any increment or decrement is always by 4. What transfer are you doing? It is always a word transfer; there is no byte or halfword here. LDM supports only word transfers. Please do not confuse this B with the B in STRB and LDRB; they are different. This B does not say it is a byte transfer; it stands for before, as in increment before. LDM always does word transfers. So R0 is incremented, and then R2 to R5 are loaded from the lower address to the higher address. It is very straightforward; I do not need to spend more time here. On the next one let me spend a little more time, because you need to absorb it. First there is an instruction that adds 0x10 to R0. What are you doing? Incrementing the content of R0, so you are making it point to 0x1010. So you are starting from a higher address, and then you are telling the processor: I want to load multiple values into R2 to R5 from memory, but I want you to take R0 as the base register and do a decrement after. What does that mean? You are supposed to decrement the base register, but only after the word it points at has been transferred. Now tell me, will the access start from 0x1010 or from 0x1000? Because the processor has to give the memory the lower address first, to go from lower to higher, it has to compute where the first access should be. How many registers are there? R2, R3, R4, R5: 4 registers. And though R0 is pointing at 0x1010, the decrement option is given.
So the memory access has to start from 0x1004. Let me explain this a little, though you should know why by now. I told you that the memory can always step to the next address and give you the data. But the processor has to know the range: it could be from 0x1010 down to 0x1004, or it could be one word lower, from 0x100C down to 0x1000. How does it know which of these sets of 4 words to use? Based on whether it has to decrement after or before. Here it says decrement after, which means the data at the address R0 currently points to is wanted as well. So the processor has to make sure that this location is included in the 4 words. From 0x1010 down to 0x1004, what are the addresses? 0x1004, 0x1008, 0x100C and 0x1010, that is all. These are the addresses it is going to access. The word at 0x1004 comes to R2, and then the following words go to R3, R4 and R5. And it is decrement after, so after the last access the address is decremented once more; that is why you see R0 pointing to 0x1000 now. If you understand this, you should be able to take any combination and work out the transfers. I hope it is clear to you. I have put a lot of effort into explaining this; if you do not follow, please go back, listen to it again, and make sure you understand. Now a small change: decrement before. I am showing you examples of both incrementing and decrementing. Here the accesses run from 0x100C down to 0x1000. How many registers? Again 4 registers in the list. This set of 4 words will be accessed because it says to decrement before the first transfer. So the processor knows it needs to transfer between this address and this one. It then gives the start address to the address register, which is part of the datapath of the processor.
So the access starts from 0x1000, and the processor informs the memory that the addresses are sequential; every cycle it keeps asking for one more word, and one more. So after the first access there are three more, each at plus 4 from the previous address, and 4 words are accessed in all. For every transfer it knows which register to use, because the registers you have given, R5 to R8 or whatever they are, have already been put into the register-list bits. So it knows these are the registers to be loaded, and those registers are enabled in turn to accept the values coming from the memory. It is very simple; that is what happens inside the processor: a sequential access is started and the transfer goes on. Then these registers get loaded as I have shown you here. And where does R0 point afterwards? Once again, it is decrement before. Sorry, this should have been 0x1000 on the slide; I missed it. After all the decrementing is done the last address is not 0x1010; it should be 0x1000, and because the exclamation mark is there it is written back to R0. So this is 0x1000. If this kind of mistake happens, you should be able to find it; let us say I do it intentionally, as a way of teaching. Now the very interesting part: how are we going to find the timing required for this? Is this cycle going to be S or N? You should be able to tell by now. Yes, this is going to be S. What about this one? This will also be S. Now, what will the first data transfer be, N or S? It is a new address coming from the instruction, so it has to be N. So it starts with N, followed by how many S cycles? If n registers are to be transferred, the first word has already been accessed in that first N cycle, is it not? So you will have n minus 1 S cycles; with 4 registers, as in the last example, that is 3.
So n minus 1 S cycles are generated to load from memory or to store into memory, whichever it is. That is the time taken for the data. Now the interesting thing: how much more does it take? The registers have to be written. The earlier registers are written while the data transfer is still going on. Can that happen in parallel? Of course: the data transfer goes through the memory interface, over the address and data buses, while moving a value into the register file is an internal operation. So those can happen in parallel. I am not showing it here, but all of the registers are written that way except the last one, whose value arrives with the final transfer. That needs one more internal cycle. And then be careful with the cycle that follows: do not look at the fetch and say it will be an S cycle. It will be an N cycle, because the previous access was a data address in memory and this one is an instruction address, which is not sequential to it. Now tell me how much time our LDM with n registers has taken: from this cycle up to this point. You see one internal cycle, one N cycle and n minus 1 S cycles. Now why does it come to n S cycles? Because there is one more S cycle here as well, so it works out to n S cycles, plus one N cycle and one I cycle. When you see that figure in the ARM manual, it is not very obvious why the manual says so, but this explains it. I hope this is clear to you. Now a slightly different topic. I told you that there is a set of store instructions and a set of load instructions, and that the intent was to save a set of registers and later restore them back into the registers. If you need to do that, there is a particular way in which you should pair STM and LDM; you cannot combine them at random.
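Going back to the timing count for a moment, the tally for an LDM of n registers can be written down as a tiny helper. This is a sketch in Python; `ldm_cycles` is a made-up name, the breakdown follows the count given in the lecture (one N cycle for the first data word, n minus 1 S cycles for the rest, one I cycle for the last register, plus the one extra S cycle pointed out on the pipeline diagram), and it assumes the PC is not in the register list.

```python
def ldm_cycles(n):
    """Cycle tally for an LDM of n registers, PC not in the list."""
    if not 1 <= n <= 16:
        raise ValueError("an LDM transfers between 1 and 16 registers")
    first_data_n = 1        # first data word: new address, non-sequential
    remaining_data_s = n - 1  # remaining words: sequential accesses
    internal = 1            # last register written in an internal cycle
    extra_s = 1             # the additional S cycle noted in the lecture
    return {"S": remaining_data_s + extra_s, "N": first_data_n, "I": internal}

print(ldm_cycles(4))   # {'S': 4, 'N': 1, 'I': 1}
```

For the 4-register examples above this gives 4 S cycles, 1 N cycle and 1 I cycle.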
Let me show the example, then I will tell you why one matches the other. Assume that STM with increment before was used, with R0 at 0x1000. Increment before means the access starts from 0x1004, and since it is a store, you are storing the values that were in R2, R3, R4 and R5 into memory. See, earlier the memory was like this; after this execution it has changed, because you saved R2 onwards. Please remember that saving also goes from the lower register to the higher register and from the lower address to the higher address. So R2 goes and sits here, and this is the order in which the memory is updated. You used increment before. Now I will ask you another question. R0 is now pointing at 0x1010, and you want to load those values back into exactly the same registers, say after some exception handling, or after you went into a function, did some operation, and now want to restore all the registers. You are giving the same list, R2 to R5. Of course, if you give a different list the values will be copied into other registers, but assume you give the same list. Which LDM instruction would you use to get the same values back into those registers? Look at it intuitively. It was increment before, so the address was incremented before each access, and R0 has ended up standing here, at the last word stored. You want this word to be copied back into R5. So you should not use an incrementing LDM; you have to decrement. But you cannot use decrement before, because if you decrement before you will skip this value and never get it afterwards. So it should be decrement after. Then, with the same list R2 to R5, the lower register gets the data at the lower address and the higher register gets the data at the higher address.
So R5 gets this value and R2 gets this value, but you have to use decrement after. I am using the same R0 here. Assume that R0 has not changed between the two instructions and that you give the same list; then exactly the values that went out come back, and you can go on with the processing. That is why these two are mapped to each other. If you are intentionally saving a set of registers and then restoring them, you should make sure that the base register is undisturbed and that you are using the paired STM and LDM, so that the intended operation is performed. The processor does not stop you from using anything else; please remember it is left to you how you do it. Similarly, do not think that the assembler or somebody will come and check whether you have given the same registers; that is also left to you, and you have to take care of it. So writing assembly code is always time consuming, and you have to be very, very careful about what you are doing. You can intentionally give some other registers, and then the values will land somewhere else; that is perfectly allowed. I leave it to you as a home exercise to see how these two are mapped and how the other pairs are mapped. I want you to write programs showing that each pair is mapped and behaves the way it is supposed to, using the simulator. I cannot explain everything here; I think working one out will make you think. Now we are going into another small topic: stack operations. The ARM architecture uses load and store multiple to carry out stack operations. That is stack operations, not "stock" operations. You know the two operations, pop and push; pop is removing data from the stack.
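The save-and-restore pairing described above (store with increment before, load back with decrement after) can be tried out with a small model before moving to the real simulator. This is a sketch in Python; the helpers `transfer_addresses`, `stm` and `ldm` are invented for the illustration, both always write the base back (as with the `!` form), and the register values 22 to 55 are arbitrary.

```python
def transfer_addresses(base, n, mode):
    """Return (word addresses in ascending order, base value after write-back)."""
    size = 4 * n
    start, final = {
        "IA": (base, base + size),
        "IB": (base + 4, base + size),
        "DA": (base - size + 4, base - size),
        "DB": (base - size, base - size),
    }[mode]
    return [start + 4 * i for i in range(n)], final

def stm(regs, mem, base, reg_list, mode):
    """Store multiple with write-back: lowest register to lowest address."""
    order = sorted(set(reg_list))
    addrs, final = transfer_addresses(regs[base], len(order), mode)
    for r, a in zip(order, addrs):
        mem[a] = regs[r]
    regs[base] = final

def ldm(regs, mem, base, reg_list, mode):
    """Load multiple with write-back: lowest address to lowest register."""
    order = sorted(set(reg_list))
    addrs, final = transfer_addresses(regs[base], len(order), mode)
    for r, a in zip(order, addrs):
        regs[r] = mem[a]
    regs[base] = final

regs = [0] * 16
regs[0] = 0x1000
regs[2:6] = [22, 33, 44, 55]          # arbitrary values in R2..R5
mem = {}
stm(regs, mem, 0, [2, 3, 4, 5], "IB")  # like STMIB R0!, {R2-R5}; R0 becomes 0x1010
saved = regs[2:6]
regs[2:6] = [0, 0, 0, 0]               # pretend the registers were reused
ldm(regs, mem, 0, [2, 3, 4, 5], "DA")  # like LDMDA R0!, {R2-R5}
print(regs[2:6] == saved, hex(regs[0]))   # True 0x1000
```

Changing the load mode to `"DB"` in this model reads the words one position lower and misses the last word stored, which is the mismatch the lecture warns about.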
So pop takes data out of the stack, which means it is a load multiple, because it goes from memory to registers. Push goes from registers to memory, so you are storing; push is a store multiple operation. ARM does not have any pop or push instructions in ARM mode. They are there in Thumb mode, which we will talk about later. So you have to use LDM and STM to achieve these operations. I explained to you that a stack can grow in different ways: towards ascending addresses or descending addresses, and the stack pointer can point to an empty location or a full location. So whether you are going up or down, you use the A or D suffix, ascending or descending. As I told you, a stack can be operated in 4 different ways. If you recall, the SP can point at a full location or at an empty location. You can read it on the slide: when you use a full stack, the stack pointer points to the address that was last used, the full location. I will show you one example. The stack is here, and you have the freedom to choose which way it grows. This is always low memory and this is high memory. You are free to move the stack this way or that way, and based on that you choose the proper instruction: decrement after, decrement before, increment, whatever. Similarly, you have another choice: the stack pointer is R13 by default, but you could use any register to do the same thing. So if SP is pointing at filled data and the stack is growing upward, you increment before pushing any data. It is already pointing at a full location, which means valid data is there and you do not want to overwrite it. So you should increment the stack pointer before pushing the next element.
Similarly, if you want to pop an element, you have to say decrement after. There is an easy way to see which ones map: if the store has increment, the LDM has decrement, and if STM has before, LDM has after. It is very easy to map them, but I do not want you to memorize it like this; you should know the reason behind it. No mugging up will help you anywhere. So please pay attention and reason out whatever you are learning. In an ascending stack, STM goes up and LDM goes down. Why? The stack grows only when you push something. So if it is an ascending stack, growing towards increasing addresses, then on STM the pointer goes up. On LDM you are popping, loading registers from memory, so you are reducing the stack content and the pointer goes down. I hope this is clear to you. So instead of writing increment before, increment after and so on, you can use the stack names: FA, FD, EA and ED. These are the same operations I mentioned to you: full ascending, full descending, empty ascending and empty descending. They say whether the stack pointer is pointing at an empty or a full location, and in which direction the stack is growing. These are the mapped instructions. So you could use LDMFA. I will explain only one, and then you should be able to reason out the others. What is LDMFA? You are loading, and loading here means popping; this is the pop operation, as the slide says. The slide says they are all mapped, so let us check whether that is correct. First let us do a push, since you cannot take any value from a stack without putting something into it.
So let us do a push, which means STMIB or STMFA; the slide says they are equivalent. Take increment before. You are pushing, and with increment the stack is growing upward as you push, so it is an ascending stack. Very good, that is why ascending is mapped here. And increment before means the pointer is at a full location; otherwise they would not have incremented before pushing. So it is full. That is why full ascending, for a store, is equivalent to IB. Similarly, for LDM the IB is replaced with DA. Why have they given a common name? So that you do not have to worry about whether to use IB or DA. It becomes simple: if your stack is implemented as full ascending, you always use LDMFA and STMFA together. Or you could use any other matched combination of stack operations. These instructions actually perform the same thing; it is only from the usage perspective that they have given you options. You can use any one of them, provided you pair them up properly; otherwise you are likely to land in trouble, and what you put in is not what you will get back from the stack. Let me show you one example. I am taking this first example, where I have used STMFA; this line is the same instruction. What am I doing? A store operation. Before doing anything else with the stack I do a store. You see that R1, R2 and R3 had these values. The memory was like this before, and now it has changed to 11, 12, 13. Why? Because it is full ascending, which means the location the pointer is at already holds a valid value that somebody else has pushed onto the stack. So you should not disturb it.
So the pointer is incremented before, and then all three values are stored: R1 to R3 come and sit from the lower address to the higher address. Now what is the mapped instruction? LDMFA. If SP has not been disturbed in between, so that it still has the value left by the store, then this instruction does the reverse and gets the same values back. Please remember you can use SP here, which makes it very clear that you are doing a stack operation. But you are not restricted to SP; you could use any of the registers. If you are compiling a program and linking it with your assembly code, it is better to follow a convention: R13 is what is usually used as the stack pointer, if you want your code to work with code generated by third-party tools. That is the example. I hope it is clear to you; this is how you store onto the stack and get the values back from it. One more example. I want you to pause and see what is happening here. The list is R1 to R4, and it says empty descending. What does that mean? The pointer is at an empty location, so you decrement it after you fill in the value, because that stack location is free. So a valid value is put onto the stack. But which register goes where? This is the higher address, and the words are saved between here and here. R4 sits here at the higher address and R1 at the lower one. The actual memory transfer still runs from the lower address to the higher address, so R1 is transferred first, but the values end up laid out like this. And the mapping: STMED is the same as STMDA, decrement after. The matching load, LDMED, is LDMIB: the pointer is at an empty location, so you increment before picking up any value.
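The stack names and the increment/decrement forms can be put side by side in a lookup table. This is a sketch in Python; the table name `STACK_TO_BLOCK` and the function `resolve` are made up for the illustration. The FA and ED rows are the ones worked through in the lecture; the FD and EA rows follow from the same reasoning.

```python
# (operation, stack suffix) -> equivalent increment/decrement suffix
STACK_TO_BLOCK = {
    ("STM", "FA"): "IB", ("LDM", "FA"): "DA",   # full ascending
    ("STM", "FD"): "DB", ("LDM", "FD"): "IA",   # full descending
    ("STM", "EA"): "IA", ("LDM", "EA"): "DB",   # empty ascending
    ("STM", "ED"): "DA", ("LDM", "ED"): "IB",   # empty descending
}

def resolve(mnemonic):
    """Translate e.g. 'STMFA' into the equivalent 'STMIB'."""
    op, suffix = mnemonic[:3].upper(), mnemonic[3:].upper()
    return op + STACK_TO_BLOCK[(op, suffix)]

print(resolve("STMFA"), resolve("LDMFA"))   # STMIB LDMDA
print(resolve("STMED"), resolve("LDMED"))   # STMDA LDMIB
```

In every row the push and the pop flip both parts of the suffix: increment becomes decrement, and before becomes after.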
So, with LDMED you increment the stack pointer before accessing the element in the stack. We are now coming to the end. There is a possibility of stack overflow, so how do we handle it? I am giving you a sample just to bring the stack operations to a logical conclusion. Basically there are three things: the stack base, the stack pointer (the base register Rn in these instructions, which is R13), and the stack limit, beyond which the stack should not grow. The stack base is the starting address of the stack in memory, and the pointer is where exactly the stack is currently pointing. The stack could be descending or ascending; that does not matter here, it only affects which operation you use in the check. The stack limit is normally kept in a stack limit register, which is R10 conventionally. Assume you know the limit beyond which the stack should not go, and you keep it in R10.

Then how do you make sure the stack pointer stays within it? You have to check the stack limit only when you are pushing something onto the stack. For example, your stack is here; this is the total stack size allocated, the stack base is here, and assume the stack is growing this way. The stack pointer is currently somewhere here, and assume there are 4 words available before the limit is crossed. Now you execute a push, which is some STM operation; that means you are going to grow the stack further. At that time you check whether it will cross the limit. This is the size by which you are moving the stack pointer, and the stack pointer is currently pointing at the current top.
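The limit check being set up here can be sketched as follows. This is a minimal model in Python, not the code from the slide: the function name, the addresses and the `descending` flag are assumptions for illustration, and the limit is taken to be the last address the stack pointer is allowed to reach.

```python
WORD = 4  # bytes per 32-bit word


def push_overflows(sp, size_bytes, limit, descending=True):
    """Return True if moving SP by size_bytes would cross the stack limit.

    Descending stack: models subtracting the size from SP and comparing the
    result with the limit (held in R10 by convention); a result lower than
    the limit is an overflow.
    Ascending stack: the same idea with an add and a 'higher than' test.
    """
    if descending:
        return sp - size_bytes < limit
    return sp + size_bytes > limit


# Hypothetical layout: a descending stack with 4 free words above the limit.
LIMIT = 0x4000
SP = LIMIT + 4 * WORD

fits = not push_overflows(SP, 4 * WORD, LIMIT)      # pushing 4 words
too_big = push_overflows(SP, 5 * WORD, LIMIT)       # pushing 5 words
```

With 4 words free, a 4-word push just fits and a 5-word push is reported as an overflow. The check only needs to run before a push, never before a pop.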
So, what you do is apply the size to the stack pointer and then see where the result lies relative to the limit; R10 is here. Please remember, whether you subtract or add depends on whether the stack is going down or up. In this example you have to assume it is going down, so you subtract the size; that is why SUB is there. If the stack is going up, you have to use ADD instead. So you take the current stack pointer, subtract the size from it, and check whether the result crosses the limit. It is compared against R10, which holds the lowest address beyond which the stack should not grow; if the result is lower than that, it is a stack overflow, and control goes to an exception handler. So, this is the way a special piece of code can be put in to check the limit.

With this we have come to the end. I hope this session was useful to you. You have learned lots of new concepts; try everything out, and I am sure you will become an expert in assembly programming and, in the bargain, learn everything about the ARM processor. Thank you very much for your time and attention. I wish you all the very best, and see you in the next session. Bye bye.