Welcome back. Today's topic is the multi-cycle implementation of our scaled-down version of the MIPS processor, which we call M-MIPS — think of it as micro-MIPS or mini-MIPS. So, why multi-cycle? Previously we saw the single-cycle datapath, and although it was quite simple to evolve, understand and analyze, the single-cycle implementation of MIPS has obvious drawbacks. The most striking one is that even low-latency instructions, which finish computing their results very early in the clock cycle, must wait until the end of that long clock cycle to commit the results into the state elements — memory elements such as registers or memory blocks. For example, a low-latency instruction such as an unconditional jump would have decoded the instruction and found the target address very early in the clock cycle; it would then be ready to load that target address into the program counter. But since we are using an edge-triggered synchronous sequential circuit, it has to wait until the end of the clock cycle — quite a long wait — before it commits the result, and the next instruction cannot start before that. This is quite a waste of time. And of course, how long the clock cycle was designed to be depended on the longest-latency instruction.
In the case of M-MIPS we have seen that the longest instruction is load word, which has to go through several phases: fetching the instruction, decoding it, computing the memory address to load from (using the ALU for that), accessing the memory block, and then, after the memory returns the data, committing the result into the appropriate register in the register file. This is apparently the longest-running instruction, and the clock cycle would have been designed to be long enough for it. But for low-latency instructions like jumps or conditional branches, most of this clock cycle goes to waste — and even for instructions only a little faster than load, say store, or register ALU-type instructions like add or add-immediate, the work is done quite early, well before the end of the clock cycle, yet there is a wasteful wait until the end of the cycle to commit the results — by which I mean putting the results back into the chosen registers or perhaps the memory block. For example, a store instruction commits its result into a specified location in the data memory. So this is the drawback: there is a wastage of time, and the reason is that the duration of the clock cycle was chosen to accommodate the processing of the longest-running instruction, load word; other instructions had plenty of time and just had to wait. The way to think about it is that continuous time has been discretized, or quantized, into very coarse, very big chunks — each chunk being a clock cycle of a specific duration — and that duration is a bit too long for many instructions.
So, this discretization of time into clock cycles has been a bit too coarse, a bit too wasteful, in the case of the single-cycle implementation. That is what we want to overcome, and the natural way to improve on this is to discretize — quantize — time into smaller-duration clock cycles; that means a faster clock, a clock signal of higher frequency. Now, if we use a clock that is, say, 4 or 5 times faster, a long-running instruction like load word will not be able to finish in a single clock cycle; different kinds of instructions are going to require different amounts of time. Load word, let us say, would require 5 such short clock cycles, while an unconditional jump can finish execution in 2 of them, and so on. We will see shortly how load word requires 5 clock cycles, and we will describe exactly how we get the unconditional jump to finish in 2. So obviously the number of clock cycles goes up, but the duration of each clock cycle is proportionately shorter — 4 times, 5 times, whatever, with a little bit of overhead — and what we save is all the waste that would have occurred on low-latency instructions like jump or branch. Let us see more details. First, as a side note, recall that we have been using an edge-triggered synchronous single-clock sequential circuit to implement M-MIPS.
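To make the saving concrete, here is a small back-of-the-envelope comparison. This is only a sketch with assumed numbers — the clock periods, the 5/4/3/2 cycle counts and the instruction mix are illustrative, not taken from any specific design or benchmark:

```python
# Hedged sketch: compare single-cycle vs multi-cycle execution time.
# Assumption: the single-cycle clock must fit load word (say 5 ns),
# while the multi-cycle clock only needs to fit one sub-operation (say 1 ns).

single_cycle_period = 5.0   # ns, sized for the longest instruction (lw)
multi_cycle_period = 1.0    # ns, sized for one sub-operation

# Illustrative cycle counts per instruction class in the multi-cycle design.
cycles = {"lw": 5, "sw": 4, "alu": 4, "branch": 3, "jump": 2}

# A hypothetical instruction mix (fractions summing to 1).
mix = {"lw": 0.25, "sw": 0.10, "alu": 0.45, "branch": 0.15, "jump": 0.05}

avg_cpi = sum(mix[i] * cycles[i] for i in cycles)
t_single = single_cycle_period            # every instruction takes one long cycle
t_multi = avg_cpi * multi_cycle_period    # average time per instruction

print(f"average CPI (multi-cycle): {avg_cpi:.2f}")
print(f"single-cycle time/instr: {t_single:.2f} ns")
print(f"multi-cycle  time/instr: {t_multi:.2f} ns")
```

With these assumed numbers the multi-cycle design averages 4 short cycles per instruction, coming out ahead of the 5 ns single-cycle clock; the point is that the short instructions pull the average down.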
This clocking methodology — using an edge-triggered synchronous sequential circuit — is what makes the design and implementation process highly algorithmic, the synthesis process efficient, and the result easily analyzable and verifiable. There are a lot of advantages over the asynchronous methodology, so without further question we are going to assume we are using an edge-triggered synchronous sequential circuit; that is why we talk in terms of clock cycles, with time discretized into them. The so-called state of computation — I am using this notion loosely — which is stored in the registers and memory blocks, gets updated only at the end of each clock cycle, because that is where the triggering edge of the synchronizing clock is: at the falling edge if we are using the negative edge-triggered discipline, or at the rising edge if we are using the positive edge-triggered discipline. As I remarked before, this state of the computation that MIPS carries out is held in registers, memory blocks, the register file, and so on. Next: one of the things we want to highlight through this case study of the multi-cycle MIPS implementation is that it is a very important example of the concept called FSM plus datapath — FSM with datapath, or FSMD. Our multi-cycle implementation is going to be a case study of FSMD.
We will not highlight the FSM aspect very rigorously or formally, but we will see very clearly that there is an FSM controlling the so-called datapath. As we have studied, a datapath is a collection of datapath components which process or route data, and how the data is processed, routed, stored or latched is controlled by a controller — and that controller is a good, interesting example of an FSM. Of course, the whole computer itself is an FSM, but then the notion of state would mean everything stored in memory that might be relevant for future computation, which is a much bulkier notion of state. So one typically likes to think of a core notion of state — a core state machine, the core FSM — with the data-processing part left to the datapath portion of the sequential circuit. The whole FSM-plus-datapath can be regarded as one huge FSM, but that is impractical to study or analyze; this is the typical partition of a sequential circuit into a core controller and a core datapath, and that is what we will be emphasizing through this case study. I will also be using the terminology instruction cycle and instruction sub-cycle. The instruction cycle is the sequence of clock cycles during which one instruction gets processed. In the multi-cycle implementation the instruction cycle will consist of a number of clock cycles, and each one of those clock cycles I will refer to as an instruction sub-cycle. In each individual sub-cycle there will be something specific happening.
That is what we have to clearly plan for and clearly define — and then, what are the results of those sub-operations, where are they archived, who is going to use them: that is the architecture we will be designing in this lecture. So what is happening inside the instruction sub-cycles? Let us use our intuitive understanding of CPU architecture from prior courses — something you might have seen before, but let us recall it. What happens in the sub-cycles are well-defined sub-operations, and the natural guess would be: first, the fetch sub-operation; then decoding of the instruction together with reading the operands from the register file — that will be one sub-operation with some concurrency within it. Next is typically the arithmetic or logic execution: if it is an add instruction, then in this stage the operands read in the previous stage are added and the result is prepared and archived somewhere; in the case of memory (load/store) instructions, this stage calculates the memory address. Following this, if needed, there is a memory-access sub-operation — here memory is accessed for data, whereas in the fetch stage memory was also accessed, but for the instruction. Following this, optionally — in some instruction cycles — there is a sub-operation called write-back. This is the stage where certain kinds of results are stored back into registers; it is a kind of commit stage. Not that this is the only stage in which results get committed — results might get committed in earlier stages too, for different operations — but this is the most typical commit stage.
Clearly, not every instruction cycle goes through all the sub-operations. For example, an unconditional jump instruction goes through the fetch sub-operation and the decode sub-operation, but it makes no sense to read anything out of the register file for that instruction. It does have to break up the instruction, extract the target address, do a little processing of it, and then arrange to load it into the program counter — that is the commit part of the unconditional jump, which can happen in the second clock cycle, in the second sub-operation itself. The first clock cycle of the unconditional jump does the fetch sub-operation; in the second clock cycle of its instruction cycle the decoding happens — which essentially means understanding that it is a jump instruction and extracting the target address — and at the end of that same clock cycle the target address is registered into the program counter. That is the commit, for which it need not wait for the special write-back stage: in the second clock cycle itself, while decoding and extracting the target address, the program counter can be updated at the end of that clock cycle. So this is our intuitive view of the sub-operations, and each of them is going to require a roughly similar, uniform amount of time, so we will allocate one clock cycle for each sub-operation. As I said, in the case of the unconditional jump the instruction cycle will consist of 2 clock cycles — 2 sub-cycles for the 2 sub-operations, fetch and decode — with the commit happening at the end of decode, in the same clock cycle.
In the case of the longest-latency instruction, load word, the instruction cycle goes through all 5 sub-operations, and 5 clock cycles are required for it. That is why we remarked earlier that in the multi-cycle design we aim at doing some instructions in 5 clock cycles and some in 2, making more efficient use of time discretized into smaller quanta. So that was the intuitive picture of how we plan to break up the instruction cycle into sub-cycles and what, approximately, each sub-cycle does. We can now think in terms of an FSM diagram; let us see how to capture what we have been analyzing. We can regard these sub-operations, happening in different clock cycles, as being controlled by different states of the controller. There is a state called fetch, which I will abbreviate F — this is the state the CPU goes into when it starts execution, because it has to start by fetching an instruction, then decoding it, executing it, optionally doing memory access and write-back, and so on. After doing its work in the fetch state, the controller goes into a state in which it controls the decode-related operations: decode, reading the operands, or in some cases — immediately after decoding — putting the target address into the program counter. All such control signals will be generated in this state, depending on the situation. We are assuming you already have some exposure to FSMs, so you have an idea of what we are designing this FSM for: not just to abstractly capture the behavior of the controller, but also to demarcate which control signals will be asserted and which deasserted in each state of this controller.
So during decode, some control signals to the datapath will be asserted and some deasserted, and based on those, the data will flow appropriately or be archived in the appropriate place. After decode: if the operation were a jump, then the controller would go back to the fetch state to get the next instruction. That is, if the input condition — defined by the values of the signals generated during the decode process — indicates that it was a jump instruction, then after the decode state the instruction cycle is over and we go back into the fetch state for the next instruction cycle to start. If the instruction were not a jump, we go into the state called EX. The controller now thinks of itself as being in the execute stage, where it controls how the ALU is used: for arithmetic operations on behalf of, say, an add or add-immediate instruction, or for generating memory addresses from the base register contents and the immediate offset. So the controller is in the EX state, in the third clock cycle, for most instructions except jump — that is what I am depicting here. In EX, certain kinds of instructions finish their instruction cycle — for example, conditional branch. For a conditional branch we can arrange the datapath and control signals in such a way that everything the branch needs to do is done; we can commit the result and go on to the next instruction. So if it is a conditional branch (jump being the unconditional one), then at the end of the EX (execute) stage we arrange to commit the result.
What kind of result? In the case of a conditional branch instruction, the result we have computed is just the target address to jump to — the address of the next instruction to fetch — and it needs to be committed into the program counter. No other register and no other memory location gets updated when the operation is a conditional branch. In the execute stage the controller checks whether the condition is true — whether the operands are equal in the case of branch-equal, or not equal in the case of branch-not-equal: it compares the operand values. The tentative target address would already have been prepared in the prior clock cycle, in the decode phase itself. So at the end of these 3 clock cycles of work it has all the information to decide precisely where to jump: either to the next sequential instruction, or to the instruction specified by the target address. That is how the state transition takes place in this situation. Now, what if the opcode in the execute stage were neither a branch nor, let us say, a load — where do we go after the EX stage? Suppose the opcode were an ALU-type instruction, like add or add-immediate.
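The execute-stage decision for a conditional branch can be sketched as follows. This is a hedged illustration: the beq/bne pair are the obvious MIPS cases, and the function name and arguments are ours for illustration — the actual control encoding is something we define later:

```python
# Sketch of branch resolution in the EX sub-cycle.
# Inputs: A and B hold the operand values read during decode;
# branch_target was computed during decode; pc_plus_4 during fetch.

def resolve_branch(op: str, A: int, B: int, pc_plus_4: int, branch_target: int) -> int:
    """Return the value to commit into the program counter at the end of EX."""
    if op == "beq":
        taken = (A == B)
    elif op == "bne":
        taken = (A != B)
    else:
        raise ValueError("not a conditional branch")
    return branch_target if taken else pc_plus_4

# Example: beq with equal operands commits the target; bne does not.
print(resolve_branch("beq", 7, 7, pc_plus_4=104, branch_target=200))  # -> 200
print(resolve_branch("bne", 7, 7, pc_plus_4=104, branch_target=200))  # -> 104
```

Note that both candidate addresses are already available when EX starts — the comparison only selects between them, which is why the branch can commit at the end of its third cycle.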
If that is the situation the controller found itself in — having completed the execution of an ALU-type instruction, which is done inside the ALU, with the ALU result out — then it goes to the state called write-back (WB). In this state the ALU result is picked up and put into the appropriate register, chosen by some bits of the instruction, and after write-back the controller goes on to start the next instruction. So this happens if, at the end of the EX state, the controller finds that it was not a branch but an ALU kind of instruction — add, or set-less-than, or any instruction that has completed its result and is now ready to write it back into the appropriate register of the register file; then the next state is WB. This is how the state diagram is going to evolve. I am not giving the complete state diagram — it will be slightly more complicated than this; we still have to consider other situations out of the EX state. We have looked at two possibilities so far: first of all, we are in the EX state only if the opcode was not an unconditional jump; and once in the EX state, if it is a branch instruction, then at the end of EX we commit the program counter with the address of the next instruction to execute and the instruction cycle is over; if the opcode were not a branch but ALU-type, then we have just one more sub-operation to take care of — write-back — and no memory access is required in this situation.
So we have to modify this state machine further to handle the other possibilities out of the EX state: a load or a store, the memory kind of instruction. Let us see, as an exercise, how this FSM evolves. We had the fetch state; unconditionally, when the next clock cycle starts, we — meaning the controller — go into the decode state. If it were a jump instruction, then the instruction cycle is over and we go back to the beginning of the next instruction cycle, starting with the instruction-fetch sub-operation. Otherwise — not jump — we go to the EX state; I am using fairly loose notation here, assuming familiarity with designing finite state machines from an earlier course or from whatever computer architecture you might have seen. Then, out of EX: if it were a conditional branch, the instruction cycle is over; if it were not a branch but an ALU-type instruction, then all that remains to be done is write-back, and then the instruction cycle is over. But now we have the situation where it is neither a branch nor ALU-type, but a load or a store. If it is a load or a store instruction, the next sub-operation is one that accesses memory. The controller goes into a state in which it generates control signals that make the appropriate operation happen inside the memory: in the case of a load instruction the memory is to be read; in the case of a store instruction the memory is to be written with the appropriate data.
Now, if it were a store instruction, the memory access is itself the committing sub-operation: once memory is written, the processing of the store instruction can be regarded as over — everything the instruction had to do is done. But if it were a load instruction — opcode load — then we have to go on to the write-back stage, the write-back sub-operation, because what we have read out of memory has to be committed into the chosen, specified register in the register file. For that to happen the controller goes into the write-back state, from which it generates the appropriate control signals to instruct the register file to latch, or register, the data coming from the data memory into the specified register: that register is enabled, the other registers are disabled, and the incoming data is put into the enabled register. So now I believe we have a rough picture of what is happening inside this FSM. To complete the picture, we need to really specify how all the control signals are asserted or deasserted, and when — whether they are Moore-type outputs, which depend purely on the state, or Mealy-type outputs, which also depend on the current inputs. Based on that, further annotation of this FSM diagram will be done, and we will get the complete picture when we work something out in detail in a later class — a synthesis-oriented, implementation-oriented description which we can implement and test on an FPGA. At that time we will do the exercise completely and describe this FSM in its full glory. Here, this is only for getting a picture of it, because we now need to revisit the datapath aspects of this design.
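The rough state diagram developed so far can be summarized as a next-state function. This is only a sketch of the transitions discussed in this lecture — state names F/D/EX/MEM/WB and the opcode classes are the ones used above, and the control outputs (the Moore/Mealy annotation) are deliberately left out:

```python
# Sketch of the controller's next-state logic for the multi-cycle FSM.
# States: "F" fetch, "D" decode, "EX" execute, "MEM" memory access, "WB" write-back.

def next_state(state: str, opclass: str) -> str:
    """opclass is one of: 'jump', 'branch', 'alu', 'load', 'store'."""
    if state == "F":
        return "D"                                   # fetch always leads to decode
    if state == "D":
        return "F" if opclass == "jump" else "EX"    # jump commits PC at end of decode
    if state == "EX":
        if opclass == "branch":
            return "F"                               # branch commits PC at end of EX
        if opclass == "alu":
            return "WB"                              # ALU result still to be written back
        return "MEM"                                 # load/store go on to memory access
    if state == "MEM":
        return "WB" if opclass == "load" else "F"    # store commits during MEM
    if state == "WB":
        return "F"                                   # write-back ends every instruction cycle
    raise ValueError(f"unknown state {state}")

# Walk a load word through its instruction cycle: F -> D -> EX -> MEM -> WB
s, trace = "F", []
for _ in range(5):
    trace.append(s)
    s = next_state(s, "load")
print(trace)   # -> ['F', 'D', 'EX', 'MEM', 'WB']
```

Tracing other opcode classes through the same function reproduces the 2-cycle jump, 3-cycle branch, and 4-cycle ALU/store sequences sketched above.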
Anyway, recall that while talking about the single-cycle design we did not have to talk about such an FSM — which, though not too complex, still has quite a few states. Of course, even in a single-cycle implementation there were sub-operations during every instruction's processing — fetch, decode, execute and so on — but they did not happen on clock boundaries; all of them happened together inside a single clock cycle. There was a price to pay for that: the long clock cycle, which means a lower clock rate. And there was one more price we paid, which we did not realize at the time — we will now see that with the multi-cycle approach we can also save on resources; we will see how. Now, to understand the corresponding datapath for the multi-cycle design and implementation of micro-MIPS (this is slide number 8), let us take stock of what we need in our inventory to prepare the datapath — which kinds of components are required. We definitely require a register for the program counter; as before, the ALU is absolutely indispensable; and we require memory to store instructions and data. Note that in the single-cycle case we required a separate instruction memory and a separate data memory, because we assumed the memory has only a single port, and in the single-cycle datapath we had to access memory for both the instruction and the data in the same clock cycle — which is not possible with single-ported memory, so we had to have one block for instructions and another for data. Here we will be able to show that we can time-multiplex a single block of memory for fetching instructions as well as for reading and writing data.
So a single memory for instructions and data is going to be one of the resource optimizations made feasible by the multi-cycle approach. You probably already have an intuition about why, but we will go through it anyway. What about the other components — instruction/data memory, ALU, PC, register file? The register file, of course, hosts the 32 registers, each 32 bits, of our instruction set architecture; this block of registers, the so-called register file, is a standard datapath component. What more do we need? In the single-cycle datapath we required several ALU-like units to do different things: to compute PC + 4, to compute the target address of branch instructions, and — typically the main ALU — to compute memory addresses or the results of ALU-type instructions. So multiple arithmetic units were used; some of them were just adders, some simplified or customized adders, but you can regard them all as datapath components of the ALU type. Here we will be able to show that, again, we can optimize resources: instead of three different arithmetic components, a single ALU will suffice in the multi-cycle implementation. That is again a big plus for the multi-cycle approach — resource optimization. Other than that, we need something more — there is, typically, a cost to pay — something that we did not require in the single-cycle datapath: registers for storing the results of sub-operations.
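The reason a single ALU suffices is that its three jobs fall in different sub-cycles, so they can be time-multiplexed. Here is a sketch of that idea — the cycle-by-cycle assignment shown is the usual one for a multi-cycle MIPS, and the concrete numbers are purely illustrative:

```python
# Sketch: the one ALU does a different job in each sub-cycle, never two at once.
# Fetch:  ALU computes PC + 4.
# Decode: ALU can compute the tentative branch target, PC+4 + (offset << 2).
# EX:     ALU does the instruction's own arithmetic (or the memory address).

def alu(a: int, b: int, op: str) -> int:
    if op == "add":
        return a + b
    if op == "sub":
        return a - b
    raise ValueError(op)

pc = 100
pc_plus_4 = alu(pc, 4, "add")                        # fetch sub-cycle
offset = 3                                           # sign-extended immediate (made up)
branch_target = alu(pc_plus_4, offset << 2, "add")   # decode sub-cycle
result = alu(20, 22, "add")                          # EX sub-cycle: e.g. an add

print(pc_plus_4, branch_target, result)              # -> 104 116 42
```

In hardware the same sharing is achieved with multiplexers on the ALU inputs, selected by the controller state, rather than by sequential function calls — but the timing argument is the same: no two of these additions are ever needed in the same clock cycle.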
Let us understand what we mean by that. We require something more in the datapath for storing the results of sub-operations. The registers and memory we used in the single-cycle implementation were used for archiving results only at the end of the instruction cycle, when the instruction was fully executed. Now we are going to execute instructions in sub-cycles — typically 2, 3, 4 or 5 clock cycles — with a sub-operation happening in each clock cycle, so the results of these sub-operations also have to be archived in registers for later use. It is fairly common-sense intuition to see why we need to store even these intermediate results in registers: if we were to leave them on the signal wires without registering them, they might get overwritten by later changes during the instruction cycle. So it is always good practice: whenever the result of a sub-operation is ready, latch it — register it — into some register. For that we will require some more registers beyond the PC and the registers in the register file. Hopefully that cost will not be too much; we will see exactly how much, but we will go ahead with it. So our datapath will have a program counter, ready to supply an address to the memory for fetching an instruction, and I will convince you that this one memory can hold both data and instructions — we do not need a separate instruction memory and a separate data memory.
There is a register file holding 32 registers, each of 32 bits, on the datapath, and there will be an ALU on the datapath — whereas in the single-cycle case we required one ALU and a couple of other smaller adders. Anything else? These seem to be the things we need from our experience with the single-cycle datapath, but as I said, we now need something more. What is that? This memory is being used for both instructions and data. In certain clock cycles the address input to the memory is the contents of the program counter (there is another possibility for supplying an address to the memory, but this is one). When the address comes from the program counter, what comes out of the memory is interpreted as an instruction, and we would like to store it into a register called IR. So in the fetch sub-cycle, when the program counter contents are used as the address, the data output of the memory is loaded into IR at the end of that fetch sub-cycle. We are going to use a new register called IR, the instruction register. Similarly, in some other sub-operations — for example, the memory-access sub-operation of the load instruction — we will need to read out of the memory some data which subsequently has to be latched, or registered, into one of the chosen registers of the register file. For that we will have a register called the memory data register, in addition to the other datapath components.
Now you see that a common memory is used, but at one time the output of the memory is stored in the instruction register, and at another appropriate time it is stored in the memory data register, which we know is later fed into the register file. I am not drawing a final diagram; as the ideas evolve I am just sketching the connections — that is the normal way of designing such architectures. What else? We have seen the need for these two extra registers. Similarly, you might perceive that we need a place for the pair of 32-bit values that come out of the register file — the two operands for, say, an add instruction. They are the results of the decode sub-cycle, in which the register file is read for operands; to store the contents being read out of the register file we will use a pair of registers called A and B, so that the results of the decode sub-operation are archived there. Similarly, to store the result of the ALU — of the execute sub-operation — we take the output of the ALU and put it in a register called ALU-result. That is another one we will use. Let us summarize again: more registers — why, and which ones? We have actually saved on something: we saved a block of memory and a couple of adders (that we are going to prove, or rather convince you of), but we are going to pay a bit of extra cost in terms of registers. Which ones? IR — why do we require IR? — for storing, registering, the result of the fetch sub-cycle. We also talked about MDR; MDR stands for memory data register.
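The role of these inter-stage registers can be summarized as register transfers, one per sub-cycle. The sketch below traces a load word through them; the memory contents, register values and the unified memory are all made up for illustration, with a small Python dictionary standing in for the hardware state:

```python
# Sketch: which inter-stage register is written at the end of which sub-cycle,
# traced for a hypothetical lw $t0, 8($t1). All values here are made up.

mem = {100: "lw $t0, 8($t1)", 5008: 42}   # one unified instruction+data memory
regs = {"$t1": 5000, "$t0": 0}
state = {"PC": 100, "IR": None, "A": None, "B": None, "ALUOut": None, "MDR": None}

# Fetch:   IR <- Mem[PC];  PC <- PC + 4
state["IR"] = mem[state["PC"]]
state["PC"] += 4
# Decode:  A <- Reg[rs]  (B would get Reg[rt]; unused by lw's address computation)
state["A"] = regs["$t1"]
# EX:      ALUOut <- A + sign-extended offset
state["ALUOut"] = state["A"] + 8
# MEM:     MDR <- Mem[ALUOut]  (the same memory, now addressed by ALUOut)
state["MDR"] = mem[state["ALUOut"]]
# WB:      Reg[rt] <- MDR
regs["$t0"] = state["MDR"]

print(regs["$t0"])   # -> 42
```

Notice how the same memory block is addressed by PC in the fetch sub-cycle and by ALUOut in the MEM sub-cycle — the time-multiplexing that lets us drop the separate instruction memory.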
Why do we need MDR? To store the result of some other sub-operation: the memory-access sub-operation of the instruction cycle. Note that not every register is required for every kind of instruction; MDR is going to be required only for the load word kind of instruction. Similar to IR and MDR, we can argue the purpose of A, B and ALU result, which are the three more registers that we have decided to use. Registers A and B get their data from the register file, and their main purpose is to feed those values to the ALU inputs. Not always; the ALU is going to be used for something else at other times, but whenever A and B hold intentional values on behalf of an instruction like add, which is an ALU-type instruction, then during the execute sub-operation the contents of A and B become the inputs to the ALU for doing the arithmetic. So that is how the connections will be; we are just figuring out how the signals might flow, the paths of the signals. Eventually, in the final implementation, there may not be a single wire for this; the path might be broken by multiplexers or routers along the way. But the arrows I am drawing indicate that data can flow, or would be required to flow in some situations, from register A to the first input of the ALU, and in some other situations from register B to the second input of the ALU.
In some other situations there would be a data path from somewhere else to the first input of the ALU, and in yet other situations some other source could feed the second input of the ALU. All of that can happen, and for that we will naturally be using multiplexers to do this routing, the selection of the data sources, and so on. The last register that we plan to use is kept at the output of the ALU to archive the result of the execute sub-operation. All these registers are clocked; in fact I could draw the triangle indicating that they are clocked on the rising edge of the clock, provided we are using the positive-edge or rising-edge trigger discipline. So we have just made an inventory of the extra registers that we require because we are now taking the multi-cycle approach: IR for instruction register, MDR for memory data register, the A and B registers, and ALU result. We have justified why we need them and at what time they will be used to store results, and we have given a rough picture. And this was based on the rough finite state machine that we had designed, the rough picture of the finite state machine that is the heart of the controller. By the way, I realize that I did not claim it to be complete, and one of the things I missed was the memory access for the store instruction. If it is a store instruction, then after the memory access, after writing into the memory, we are done; the store instruction is finished. So if the instruction were a store, then after the memory access, that is the memory write, the instruction cycle is over, and the controller would go to the fetch sub-operation of the next instruction cycle.
Whereas if, after the memory access, it were a load instruction, then the memory access must have been doing the job of reading something from memory, and you would then have to go to a state where what has been read out of the memory is written into the appropriately chosen register of the register file; that is the write-back state. Only then is the load instruction over and the instruction cycle finished, and the controller goes to the fetch state of the next instruction cycle. So, looking at the load instruction cycle: it starts in the fetch state; then it does the decoding work, during which it also reads the operands from the register file; then, in the execute state, the load instruction computes the memory address; then it goes to the memory-access state, in which it gives the data path time to read out the contents of the memory; and then the contents read out from the memory are registered in the appropriate place in the register file, which happens in the WB state. Then the instruction cycle is over, and the controller resumes with the fetch state of the next instruction cycle. In the case of jump, the sequence of state transitions is: from fetch you go to decode, independent of the instruction, and for an unconditional jump instruction the program counter is modified in the decode state itself; at the end of the decode state the program counter has been updated with the target address extracted from the instruction which was decoded. So the jump instruction's cycle is over once these two states, two sub-cycles, two clock cycles, are finished. For branch there will be F, D, EX; for store there will be F, D, EX, MA; and so on.
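The state sequences we just traced can be written down as a small table; here is a sketch in Python, using the abbreviations F, D, EX, MA, WB for the sub-operations named above. The row for add assumes that the R-type write-back follows the execute state directly, which the lecture implies but does not spell out.

```python
# Per-instruction-class state sequences of the multi-cycle controller FSM
# (F = fetch, D = decode, EX = execute, MA = memory access, WB = write back).
STATE_SEQUENCE = {
    "lw":  ["F", "D", "EX", "MA", "WB"],  # load: all five sub-operations
    "sw":  ["F", "D", "EX", "MA"],        # store: done after the memory write
    "add": ["F", "D", "EX", "WB"],        # R-type: ALU result written back
    "beq": ["F", "D", "EX"],              # branch: resolved in execute
    "j":   ["F", "D"],                    # jump: PC updated in decode itself
}

def cycles(instr_class):
    """Clock cycles an instruction class takes in the multi-cycle design."""
    return len(STATE_SEQUENCE[instr_class])
```

So a jump finishes in two clock cycles while a load takes five, which is exactly the variation the single-cycle design could not exploit.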
So we can trace the behavior of the instruction cycles of different kinds of instructions using this FSM; I believe it is complete. Anyway, we will see a very detailed picture when we arrive at the stage where we implement this, writing Verilog code and targeting an FPGA for the implementation. So let us try to get more details of the data path. We just had a relook at it: the program counter can provide the address to the memory, and the contents of the memory can be used to fill up this IR register. In which sub-operation, or which state, will IR be updated? That would be in the fetch state. Whereas in some other state, for some other instruction, MDR, the memory data register, is going to be used for registering the contents of memory. Which state would that be? The memory-access state of a load instruction. For all other instructions which are not of that kind, load or its variants, MDR will have no meaningful purpose, but it has to be there because in some situations it is required; we cannot help that. Then there is the register file; the sizes in the sketch are not proportionate, so do not worry about that. What is coming as input to the register file, RF as I call it? As we saw in the single-cycle data path, we require some bits of the instruction to arrive here, and this pair of 5-bit fields, if you recall, were used as indices of the registers whose contents are to be read out on these 32-bit wires. So these 5 bits and these 5 bits specify a pair of 5-bit indices indicating which registers are to be read out on this output and on that output.
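Those 5-bit index fields sit at fixed positions in the 32-bit instruction word. A small sketch of extracting them, using the standard MIPS field positions (rs in bits 25:21, rt in bits 20:16, rd in bits 15:11):

```python
# Extracting the 5-bit register-index fields from a 32-bit MIPS instruction word.
def rs(instr):
    """Index of the first register to read out (bits 25:21)."""
    return (instr >> 21) & 0x1F

def rt(instr):
    """Index of the second register to read out, or the destination for loads (bits 20:16)."""
    return (instr >> 16) & 0x1F

def rd(instr):
    """Destination register index for R-type instructions (bits 15:11)."""
    return (instr >> 11) & 0x1F
```

For example, for add $3, $1, $2, encoded as 0x00221820, these give rs = 1, rt = 2 and rd = 3, which is exactly the pair of read indices and the write index the register file needs.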
And in some situations some of these bits are to be interpreted as the index of the destination register where, as in the case of the load instruction, the data which has been read out from memory has to be archived. So one more set of bits will indicate the destination register index. I am not giving a detailed picture here, since it would be too clumsy and would clutter the sketch; just the overall concept, and the details will be clear later anyway. What is the purpose of this MDR, where will it be used? It would be used to provide data input to the write data port. I am going to leave it to you to guess where the signals are going: it is clear that this signal is coming from MDR, but to which port of the register file does it go? Here I have given the answer explicitly: it goes to the write data port, and these are 32-bit lines. When will this transfer happen? It will be arranged to happen in the write-back state of the load word instruction cycle. In some other cases this write data port of the register file could get data from some other source, but this is one possible source of data for it. Then we have the registers A and B, and since I am falling short of space I will just draw a very narrow ALU. This is one possibility of data flow into the two inputs of the ALU; the ALU's output goes to the ALU result register. And where does the output of this register go? We will come to that; as a hint, it has to go somewhere else, to more than one place maybe. Also, the PC has to get its input from somewhere, right? What is the input to PC? The updated program counter value, and it can come from a couple of sources: for example, it can come from the ALU's output, or it can come out of the instruction register itself.
So can we try and work out a picture of where this can come from, at least one possibility? Typically, what gets loaded into the PC is PC + 4, and where is that PC + 4 available? At the output of the ALU. And how is that PC + 4 generated? To generate PC + 4 through the ALU, we have to arrange to feed the program counter's output as one possible source of the ALU's first input, and to feed the number 4 at the second input. So under that situation the PC becomes the input to the first port of the ALU, the constant 4 becomes the input to the second port, and PC + 4 is computed and available at the output; in the same clock cycle we want to arrange to take this output and keep it ready to be loaded, at the end of the clock cycle, into the program counter, which is edge triggered. That means that in the fetch sub-operation, and I am talking about any typical fetch sub-operation, the contents of the PC are provided as the address to the memory block, and the output of the memory block is fed to the instruction register; at the end of the fetch clock cycle the instruction register gets updated with the contents of memory. So the instruction register will henceforth contain that particular instruction which was at this address. Furthermore, during the fetch clock cycle the program counter is also fed to the ALU as the first input, and the constant 4 is fed as the second input. So PC + 4 is computed, with the ALU configured as an adder.
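The two register transfers of the fetch sub-operation can be sketched as one clock cycle's worth of work. This is a behavioural Python model under the assumption of a memory modelled as a dictionary from byte addresses to 32-bit words, not a hardware description:

```python
def fetch(state, mem):
    """One fetch clock cycle: both results commit together at the clock edge.

    state: dict holding at least "PC" and "IR";
    mem:   dict mapping byte addresses to 32-bit words (an assumption of this sketch).
    """
    pc = state["PC"]
    next_ir = mem[pc]    # memory read, addressed by the PC
    next_pc = pc + 4     # ALU configured as an adder: inputs are PC and the constant 4
    # at the rising clock edge both registers are updated together:
    state["IR"] = next_ir
    state["PC"] = next_pc
    return state
```

Note that next_pc comes from the combinational ALU output within the same cycle, which is the point made below about not taking it from the ALU result register.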
So PC + 4 will be available here, and that is brought back here. Note that I am not bringing this back from the ALU result register; I am taking it directly, before it gets latched into that register. So this is all happening in a single clock cycle: from the PC register, through the combinational ALU, back to the PC register; and also from the PC register to the memory, an asynchronous read, and a write into the instruction register. There are these two flip-flop-to-flip-flop, state-element-to-state-element, paths: this one and this one. This kind of data flow happens during the fetch sub-cycle of any instruction cycle. So please make a note that PC + 4 is coming from the ALU output, not from the ALU result register; from the register it would be available only in the next clock cycle, whereas the PC has to be updated in the fetch clock cycle itself. So now we see that we need some multiplexers. We have already seen the need: the first input to the ALU would typically come from the A register, or could come from the PC. In fact, early in the instruction cycle the first input to the ALU always comes from the PC, because we have to compute PC + 4 and keep it ready to be latched into the PC at the end of that clock cycle. And the second input to the ALU at that time is required to be the constant 4; on the other hand, most typically the second input to the ALU comes from the B register. Do not worry about the clutter; we will redraw it for different situations, and you have these diagrams very easily available in texts and so on. What I am emphasizing here is how we evolve this from scratch. So the point I was making is that we will need multiplexers, something like a multiplexer feeding into the first input of the ALU.
And typically the contents of A will be routed through the multiplexer into the first input of the ALU, or in some cases it is the contents of the PC that will be routed to the first input of the ALU. What I have drawn here is a blown-up version of that picture; stare at it for a while. So the need for multiplexers on the data path is now clear, and we will need more of them. We need a multiplexer here because there are two possibilities: the second input will get either the data from the B register or, in some other situation, in some other clock cycle, that constant 4. So the provision has to be there, and in the appropriate clock cycles the multiplexer will route the appropriate input to its output; it is a router. What else can we add to this diagram? Let us look at some other kinds of signals. This is that ALU result register, and we have tentatively drawn that its output is going to be required somewhere else. Where would the ALU result be required? In the case of an instruction like add: this is the register file, and this is that write port to which sometimes MDR will be feeding, but in other situations it is the result of the ALU that is to be written into the appropriate register of this register file. So this data flow is required for the write-back of which instruction? The write-back of, say, the add instruction cycle. Whereas that other signal flow would be used in the WB state of the load word instruction cycle. So again a MUX is required, which shows the need for something like this. So we have been trying to evolve the data path architecture of the multi-cycle CPU, and we have made an inventory of what data path components we require.
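The multiplexers we just argued for can be captured as three small select functions. The control-signal names ALUSrcA, ALUSrcB and MemtoReg are the conventional textbook names; they are my assumption, not names fixed in this lecture.

```python
# Sketch of the three multiplexers argued for above; each select signal
# would be driven by the controller FSM in the appropriate state.
def alu_in1(PC, A, ALUSrcA):
    """First ALU input: PC (during fetch, for PC+4) or the A register."""
    return PC if ALUSrcA == 0 else A

def alu_in2(B, ALUSrcB):
    """Second ALU input: the B register, or the constant 4 for PC+4."""
    return B if ALUSrcB == 0 else 4

def rf_write_data(ALUOut, MDR, MemtoReg):
    """Register-file write-data port: ALU result (add) or MDR (load WB)."""
    return MDR if MemtoReg == 1 else ALUOut
```

In the fetch state the controller would set ALUSrcA = 0 and ALUSrcB = 1 so the adder sees PC and 4; in the execute state of add it would set ALUSrcA = 1 and ALUSrcB = 0 so the ALU sees A and B.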
We realized that the work is going to be done in terms of sub-operations in different clock cycles; we are going to require some more registers like IR, MDR, A and B; and there will be signals flowing from certain sources to certain destinations, which we have also tried to capture. Most of the data path we have figured out in this figure, but there are still a couple of details, a couple of data-flow wires, left. You can think it over on your own as an exercise; we will anyway complete it when I resume. Fortunately, this multi-cycle data path is in fact slightly tidier than the single-cycle data path; it is just not fitting in too well here. We will get a well-designed figure and show you all of it, but what we have done so far is to show you the evolution of it, the thought process behind designing such a multi-cycle data path. I hope it has been ok.