So, now we are ready to discuss the details of the multi-cycle implementation of micro MIPS, that is, the details of the data path and the details of the controller. By details of the data path I mean the following. We have already identified the main data path components: registers such as the program counter, the instruction register and the memory data register, the register file, the memory block, the ALU, plus a few more registers identified so far. But the data path also contains routing components such as multiplexers, enable signals and some glue logic, and those are the details we will work out now. So far we have only a crude picture of the finite state machine that controls this data path: which data path component processes the data, in which direction the data flows, how it is routed, and which source is selected at each multiplexer. So the topic now is the details of multi-cycle MIPS: the main details (not all of them) of the FSM, plus the details of the data path. Recall the FSM we drew; it evolved in a very natural way. There was a fetch state, performing the sub-operation of fetching the instruction from memory and putting it into the instruction register. That is the activity that happens in the fetch state. So the controller has a fetch state, and in it the controller generates control signals so that the data path performs the necessary operations on the memory, the instruction coming out of memory is latched into the instruction register, and so on; there is more to it, as we will see.
After fetch, in the next clock cycle every instruction goes into the decode state. If it is a jump instruction, we can arrange things so that the jump's instruction cycle is over after decode, and the machine goes to the fetch state of the next instruction cycle. That is, if during decode we find that the instruction is a jump, we finish its work there, and in the next clock cycle we are in the fetch state for the next instruction. If it is not a jump, every instruction cycle goes into the EX (execute) state. That is where the ALU is involved: address computations and arithmetic computations take place there. What comes after the EX state? If it is a branch instruction, say branch-if-equal (BEQ), we can arrange for the EX state itself to finish executing the branch. In this state the ALU checks whether the two operands are equal, and in the previous clock cycle, the decode state, the target address would already have been computed, again with the help of the ALU. So now the machine can commit: it decides whether or not the target address is to be loaded into the program counter. Thus the BEQ instruction cycle can be arranged to finish in this clock cycle, after this state.
So we would go to the fetch state of the next instruction. Now for the other instructions. Suppose the instruction is arithmetic, add or add immediate. Memory plays no role there. There are two more states, and not every instruction cycle passes through all five of them. There is a write-back state, which an instruction like add or add immediate enters after execution. This is where the data path must be configured so that the ALU result is written into the appropriate register of the register file. Any instruction that reaches the write-back state is at the end of its instruction cycle, and the next state is the fetch state of the next instruction cycle. That leaves the remaining case: if the instruction is a load or a store, the EX state computes the address of the memory location to load from or store to. In the next clock cycle the instruction is in the memory access state, using the address computed in the previous (EX) state. For a store instruction there is no further work after the memory access: the value to be stored, which comes from one of the registers, has been written into the memory location whose address was computed. So the store is at the end of its instruction cycle, and the next state is the fetch state of the next instruction cycle.
If it is a load instruction, then after the memory access the load has the data that is to be written back into the register file, so it goes to the write-back state. This is what the state diagram looks like if we support only a few instructions: J (jump), one conditional branch (BEQ), add, add immediate, load and store. This limited choice of instructions is purely for simplicity of presentation; the data path can support many more instructions, and the FSM can easily be extended for them using the same templates. In fact, in the later discussion we will even remove jump support from our toy CPU; adding it back can be considered an exercise for you. So, for a while, you will not see jump in the next part of the discussion. You will see slightly different versions of this FSM in different textbooks, and I will shortly describe a small variation of it myself, to simplify the logic implementation; but the essential FSM is this one. So far these are only the states; what more does an FSM need? From the perspective of the outside world, the main purpose of the FSM is to generate control signals based on the inputs it receives. What are the inputs to this FSM? The clock, for one. Many transitions are made unconditionally at the clock edge: F to D, for example, always happens at the triggering edge of the clock. But D to EX may or may not happen, depending on whether the instruction is a jump, and so on.
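As a sketch of the state diagram just described, the next-state logic of the coarse FSM can be written as a small Python function. The opcode labels ("j", "beq", "add", "addi", "lw", "sw") are illustrative names we chose, not real MIPS encodings, and the clock is modeled implicitly: each call to `next_state` stands for one triggering edge.

```python
def next_state(state: str, op: str) -> str:
    """Next state of the coarse multi-cycle FSM at a triggering clock edge."""
    if state == "F":
        return "D"                           # fetch always goes to decode
    if state == "D":
        return "F" if op == "j" else "EX"    # jump finishes in decode
    if state == "EX":
        if op == "beq":
            return "F"                       # branch is resolved in EX
        if op in ("add", "addi"):
            return "WB"                      # arithmetic skips memory
        return "MEM"                         # lw and sw access memory
    if state == "MEM":
        return "WB" if op == "lw" else "F"   # store ends after memory access
    if state == "WB":
        return "F"                           # write-back always ends the cycle
    raise ValueError(f"unknown state {state!r}")


def trace(op: str) -> list:
    """States visited by one instruction cycle, starting from fetch."""
    states = ["F"]
    s = next_state("F", op)
    while s != "F":
        states.append(s)
        s = next_state(s, op)
    return states


for op in ("j", "beq", "add", "sw", "lw"):
    print(op, trace(op))
```

Only the D, EX and MEM states consult `op`; F and WB move on unconditionally, which matches the input dependence described in the lecture.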
Another example is WB to F, which happens at every triggering edge of the clock without depending on any other input. But many other transitions, as you see, depend on the type of the instruction, so the instruction type can be regarded as an input to this FSM. What about the outputs? The outputs of the FSM are the main thing from the data path and CPU implementation point of view: the control signals generated by the FSM steer the data path differently in different clock cycles, that is, in different instruction sub-cycles. Those are the details we need to work out. For that, we may choose to simplify this state machine, not by reducing the number of states, which would complicate the combinational logic to be implemented, but by refining states into substates. The combinational logic is the heart of an FSM implementation: the logic that generates the next state and the logic that generates the outputs. With the refinement, that logic becomes simpler and easier to analyze or make efficient. That is what we will see now; I call it the refined FSM. It has the F state as before. As I said, we drop jump from the picture. The F to D transition always happens at the triggering edge: any instruction cycle that is in the fetch state goes into the decode state at the next triggering edge of the clock. Since we are not considering jump, after D we always go to EX. But here we make a separation: we go to different EX states, because the behavior inside the execute state (how the ALU and the rest of the data path are to be configured) differs from instruction to instruction.
So we make the bifurcation here. If the instruction is BEQ, I create a refined version of EX called EX_BEQ; the name is arbitrary and unimportant, but it makes clear that this is the EX state of the BEQ instruction. So the BEQ instruction cycle goes through the F state, then D, then EX_BEQ, and then back to F. The reason we refine EX into several states is that it makes the description of the control signal outputs more streamlined, and easier to implement and analyze. Next, if from the decode state the instruction is add, we go to the EX substate corresponding to add. Much of the behavior in these states is the same, but there are differences specific to each instruction: how the ALU does its arithmetic, how the ALU is controlled, and so on. There was nothing wrong with the previous FSM; it was probably the most compact in terms of number of states. But when you minimize the number of states, there is a trade-off: the combinational logic that generates the next state and the control outputs becomes more complicated. That is why we have chosen this approach here, and several textbooks suggest the same, though some use something intermediate between this extreme and the other. Similarly, if it is add immediate, I have a state called EX_ADDI; if it is load, EX_LW; if it is store, EX_SW. I am creating these states routinely: these five substates are the refinement of the original EX state.
Now, if we are in the EX state of the add instruction cycle, the next state is the write-back state, which I call WB_ADD. Similarly, from EX_ADDI we go to WB_ADDI. If it is a load or a store instruction, the next states are the memory access states, MEM_LW and MEM_SW. In fact, it turns out that EX_LW and EX_SW, which we have shown as separate states, are identical. What happens in both is simply that the ALU is configured to generate the memory address, which is the sum of the contents of the first source register, that is A, and the immediate field of the instruction, sign-extended and multiplied by 4 (shifted left by 2 bits). So these two states could have been merged into one. That kind of FSM minimization can always be attempted, but it is not our main focus here, so I am taking the easy way out. That gives 11 states so far. The remaining transitions leave the terminal states of the instruction cycles: after WB_ADD or WB_ADDI we go to the F state of the next instruction cycle. After MEM_SW, where something is stored in memory, the store instruction is finished, so we are at the end of its instruction cycle and go back to F. For the load case one more state is required, WB_LW, after which we go back to F. So there are 12 states in this refinement of the FSM. It means more details, but each state is slightly simpler to read and analyze.
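The refined FSM can be sketched the same way. The state names (`EX_BEQ`, `MEM_LW`, `WB_ADDI`, and so on) are our own spellings of the lecture's substates, and the opcode labels are again illustrative rather than real encodings. The sketch checks the count of 12 states.

```python
OPS = ("beq", "add", "addi", "lw", "sw")   # jump dropped, as in the lecture

# After decode, every state has exactly one successor.
SUCCESSOR = {
    "EX_BEQ": "F",
    "EX_ADD": "WB_ADD",
    "EX_ADDI": "WB_ADDI",
    "EX_LW": "MEM_LW",
    "EX_SW": "MEM_SW",
    "MEM_LW": "WB_LW",
    "MEM_SW": "F",
    "WB_ADD": "F",
    "WB_ADDI": "F",
    "WB_LW": "F",
}


def refined_next(state: str, op: str) -> str:
    """Next state of the refined FSM; only D looks at the instruction type."""
    if state == "F":
        return "D"
    if state == "D":
        return "EX_" + op.upper()          # pick the EX substate for this op
    return SUCCESSOR[state]


def cycle(op: str) -> list:
    """States visited by one instruction cycle of the refined FSM."""
    states = ["F"]
    s = refined_next("F", op)
    while s != "F":
        states.append(s)
        s = refined_next(s, op)
    return states


all_states = {s for op in OPS for s in cycle(op)}
print(len(all_states))   # 12
```

The design trade-off from the lecture shows up directly: the instruction type is consulted in a single place (state D), and every other transition is a fixed table lookup.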
We still have to work out what the control signals are in these individual states, but we now have a clearer picture of what should happen in each of them. In the older FSM, what happens in, say, the EX state would naturally depend on whether the instruction is BEQ, add immediate, add, load, store and so on. Now those cases have been separated out. Let us go back to the data path. So far we have noticed that instead of two memory blocks we can have just one, and instead of three ALUs or adders we can do with a single ALU. So we get some resource optimization, but there is a price to pay, in the form of some extra registers. These registers archive the results of sub-operations like fetch, decode and execute: the A register, the B register, the instruction register IR, the memory data register MDR, and the ALU result register. So a few registers had to be brought into the picture, and we had more or less evolved a data path. Now let us see a complete picture of it, still a little abstract and without details such as signal names. Do not worry if you cannot see exactly where each wire leads or where each signal comes from; I will read them out. Otherwise there would be too much clutter, with many details that are not essential for understanding.
So let us recognize what is here. This must be our program counter. What is shown at the input of the program counter? There is a signal coming in, with two possible sources: either the output of the ALU, ready to be registered into the program counter, or a signal from here, which is also a result of the ALU, but a deferred one: the ALU result that was stored in a register. So what comes out of here is the ALU result of the previous clock cycle, and what comes out of there is the ALU result of the current clock cycle. That is a subtle difference, and we will see why we need it; these are the interesting subtleties we come across when designing such multi-cycle data paths and FSM controllers. So this is our program counter; I can start writing on this. This clearly is the memory block. This must be its address input, and this must be its data output, which can go either into the instruction register or into the so-called memory data register (MDR); the label is not clear here, but it is the memory data register. So this is the data read from the memory. This is the address port, and this must be the data input of the memory. Where can that data come from? I am showing that it can come only from the B register; there is no other source of data to be stored into the memory.
So this must be for the memory access state of the store: the contents of the B register are brought to the data input port of the memory and stored into the appropriate memory location. Recall what is in A and B: this is A, and this is B. In A we have the data read from the register of the register file whose index is specified by the rs field of the instruction register. In B, in the decode state, we read the contents of the register whose index is given on these 5-bit lines, that is, the rt field. So we know what A and B are for. We will come to the details of the individual sub-cycles, a few typical ones, and leave a few of them as an exercise for you. There are just 12 states, so this exercise of understanding each sub-cycle separately has to be done at most 12 times; we will do it a few times and leave the rest to you. I have not started that yet; I am just helping you recognize what is on this data path. So: the PC, the memory block, the instruction register, and the memory data register. Who fills the MDR, and when? It is filled at the end of the memory access state of the load instruction. Then this is the register file, and the A and B registers; this is the ALU; and this is the register called ALU result. The name is not too important; maybe I should really call it "ALU result deferred", because it stores the result not of the current ALU operation but of the ALU operation that happened in the previous clock cycle.
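To make the "deferred" behaviour concrete, here is a minimal sketch, assuming the ALU result register is enabled in every cycle and resets to 0 (both assumptions for illustration). Following the lecture's naming, the register is "ALU result" and the combinational ALU output is "ALU out"; the class name `ALUResultRegister` is hypothetical.

```python
class ALUResultRegister:
    """Clocked register holding the ALU output of the previous cycle (always enabled here)."""

    def __init__(self) -> None:
        self.q = 0                       # assumed reset value

    def clock_edge(self, alu_out: int) -> None:
        self.q = alu_out                 # capture this cycle's combinational ALU output


reg = ALUResultRegister()
observed = []
for alu_out in (100, 200, 300):          # combinational ALU output in three successive cycles
    observed.append((alu_out, reg.q))    # (current ALU out, registered ALU result)
    reg.clock_edge(alu_out)              # triggering edge at the end of the cycle
print(observed)                          # [(100, 0), (200, 100), (300, 200)]
```

In every cycle the register lags the ALU output by exactly one clock, which is why the PC input multiplexer needs both sources.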
So this is a registered result of the ALU. Note that I am not showing clock inputs to all these registers, but there are clock inputs to A, B, this register, IR and MDR: all of them are clocked registers. Data is loaded into a register at a triggering edge of the clock, provided the register is enabled by its enable input. So more details have to be seen. We also identified the role of multiplexers. For example, there are two possible signals that can be loaded into the program counter, and depending on the state of the instruction cycle, that is, the sub-cycle, either this one or that one goes into it. So there must be a routing multiplexer here. Similarly, there must be a multiplexer at the address input of the memory: the ALU result is one possible memory address (you can work out why and when), and the contents of the PC are used as the address when an instruction has to be fetched. There are some more multiplexers. There is a small multiplexer here that decides the index of the destination register that data is written into, in the write-back state of a few instruction cycles: should the destination register address come from the rd field or from the rt field? These lines carry the 5-bit rd field of the instruction register, and these 5 bits are its rt field. So the destination is sometimes given by rd and sometimes by rt; again, think about when each case happens, and we will come back to those examples. There is one more multiplexer here.
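A minimal sketch of the destination-register multiplexer, assuming standard MIPS field positions (rt in bits 20..16, rd in bits 15..11) and assuming port 0 carries rt and port 1 carries rd; the lecture does not fix the port order or name this select signal, so `reg_dst` is a hypothetical name. The two instruction words are standard MIPS encodings of `add $8, $9, $10` and `addi $8, $9, 5`.

```python
def rt_field(instr: int) -> int:
    return (instr >> 16) & 0x1F          # bits 20..16


def rd_field(instr: int) -> int:
    return (instr >> 11) & 0x1F          # bits 15..11


def dest_register(instr: int, reg_dst: int) -> int:
    """2:1 destination-register mux; assumed port 0 = rt, port 1 = rd."""
    return (rt_field(instr), rd_field(instr))[reg_dst]


ADD_8_9_10 = 0x012A4020   # add  $8, $9, $10 (R-type: destination in rd)
ADDI_8_9_5 = 0x21280005   # addi $8, $9, 5   (I-type: destination in rt)

print(dest_register(ADD_8_9_10, 1))   # 8, taken from rd
print(dest_register(ADDI_8_9_5, 0))   # 8, taken from rt
```

R-type instructions such as add name their destination in rd, while I-type instructions such as add immediate and load name it in rt, which is why this multiplexer is needed.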
We had already evolved most of this earlier in this lecture module, when we developed a crude picture of the FSM and the data path. Now let us work out some more details of this data path; they will let us visualize where the control signals arrive in the data path. For that we need to enhance the picture a bit and show clearly where the multiplexers are. This diagram is the same as the previous one, except that there we had not shown the multiplexers explicitly, for example this big one here. There are four possible sources for the second port of the ALU, the B port: the B register, the constant 4, or one of a pair of signals derived from the immediate field. With four options, a four-to-one multiplexer is required here, which I did not draw in the abstract diagram. The next diagram makes this clearer, but with a bit more clutter. You recognize the same components: PC, memory, instruction register, register file RF, A, B, ALU, and ALU result (do not worry too much about the names; you will get used to them). Now you see the four-way multiplexer here. One input comes directly from the B register, another is fed the constant 4, and the other two come from the instruction register IR: they are derived from the 16-bit immediate field, with a bit of processing that we will explain later.
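A sketch of the four-to-one multiplexer at the ALU B port, driven by a 2-bit select. Ports 0 (B register) and 1 (constant 4) follow the lecture; the order of ports 2 and 3 (plain sign-extended immediate, then shifted) is our assumption. Values are modeled as signed Python integers rather than 32-bit bit patterns.

```python
def sign_extend16(imm16: int) -> int:
    """Interpret a 16-bit field as a signed integer."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def alu_src_b_mux(sel: int, b_reg: int, imm16: int) -> int:
    """4:1 multiplexer at the ALU B port, selected by the 2-bit ALUSrcB signal."""
    inputs = (
        b_reg,                       # port 0: B register
        4,                           # port 1: constant 4
        sign_extend16(imm16),        # port 2: sign-extended immediate (assumed order)
        sign_extend16(imm16) << 2,   # port 3: sign-extended immediate shifted left by 2
    )
    return inputs[sel]


print(alu_src_b_mux(0b01, 123, 0xFFFF))   # 4
print(alu_src_b_mux(0b11, 123, 0xFFFF))   # -4
```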
It would not be easy to explain in one go why these four inputs are needed. The picture results from superimposing all the different instruction sub-cycles, the different states, taken together; looked at individually, in pieces, we get a much cleaner view of each part of this data path, and we will work with such pictures next. In my notation, these ovals, including the narrow ones, are multiplexers. I have not drawn directions, but from the context of the figure it is fairly clear which are inputs and which are outputs. I am also not showing the select signals on these multiplexers or the other control signals; but now it will become clear which control signals are needed and which are to be asserted or deasserted in which clock cycle. So let us take this outline again as a background (some small, relatively insignificant details are missing; do not worry about them), and show how the data path is configured for different sub-cycles. Take the first sub-cycle, fetch, which is common to all instructions. What happens in the fetch sub-cycle? Which registers are involved, which signals have meaning, and which parts of the data path are involved? Obviously the program counter is involved, and obviously the memory, because that is where the instruction is stored and from where it is to be brought out.
So the program counter value is fed to the address input of the memory. Memory is involved in the fetch sub-cycle because it contains both instructions and data, and we want the instruction at the location pointed to by the PC. The data output of the memory arrives here, and the instruction register IR is enabled to register the memory contents coming out here, which is the instruction. In the next state the instruction is then stable and available here, and it can be decoded and analyzed for the rest of the sub-cycles of that instruction cycle. What else happens in the fetch sub-cycle? There is the instruction fetch, but a little more: in this same clock cycle we use the ALU to generate PC + 4, the tentative, typical next address to be loaded into the program counter. For that we configure the only available ALU to add two values. The first is the program counter value, routed to the ALU's first port through this multiplexer. The second is the constant 4, because the next 32-bit instruction lies 4 bytes after the current instruction address; this 4 is routed here. The output of the ALU in the same clock cycle (not its registered value) is arranged to be brought here. So this multiplexer, this one, this one and this one all play a role in the fetch sub-cycle, and their select inputs have to be set appropriately so that the data is routed correctly.
We need the ALU to compute PC + 4 because in the fetch sub-cycle we also update the program counter with the typical address of the next instruction, PC + 4. Of course, for a jump or a branch-if-equal instruction this value may be overwritten with the appropriate target, but that happens later. So this is exactly what happens in the fetch sub-cycle. To disambiguate between these two signals, we will call the registered one ALU result and the direct ALU output ALU out; you might instead use suffixes like "ALU result registered" and "ALU result unregistered". Now let us identify these multiplexers and give them names, so that we can refer to them more conveniently. This multiplexer selects the source for the program counter, so I name it PCSrc. This multiplexer I call IorD, as many textbooks do; that is also the name of its select signal, "instruction or data", because its job is to send either the address of the instruction or the address of the data to the memory (we will see later when a data address arrives over this line). These names are used as the names of the multiplexer select signals; here there is a one-bit select signal, and here another. Another convention I adopt for all these multiplexers: the upper input is port 0 and the lower is port 1. More interestingly, here the ports are numbered 0, 1, 2, 3.
This multiplexer, having 4 inputs, needs a 2-bit select signal, while a 2-input multiplexer has ports 0 and 1. To avoid clutter I am not writing port numbers inside the symbols, but we follow a uniform convention: the upper port is 0, and the lower ones are 1, 2, 3. Some more multiplexers: this big multiplexer I call ALUSrcB. I regard this as the B port of the ALU, and the source for that B port is determined by the 4-input multiplexer, whose select signal is a 2-bit signal: [1:0] in Verilog notation, (1 downto 0) in VHDL. I am not showing these select signals; they are implicit and will be referred to by the names I chose here. Similarly, this one I call ALUSrcA: read it as the multiplexer select signal that decides the source for the A port of the ALU, either the PC or the A register. Do not confuse the A port with the A register, though there is a correlation. We will come to these two a bit later. There is something more here: the left shift by 2 bits. We need it in address arithmetic, because the offset is given as a word offset. The 16-bit contents of the IR immediate field are sign-extended to 32 bits by this block and then shifted left by 2, so that they become a 32-bit byte offset, whenever they are to be interpreted as a byte offset in an address calculation. We will come to that later.
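The sign-extend-then-shift step can be checked on 32-bit bit patterns. This is a minimal sketch; the function names are hypothetical, and the final mask models truncation to a 32-bit bus.

```python
MASK32 = 0xFFFFFFFF


def sign_extend_16_to_32(imm16: int) -> int:
    """Copy bit 15 into bits 31..16, giving a 32-bit two's-complement pattern."""
    imm16 &= 0xFFFF
    return (imm16 | 0xFFFF0000) if imm16 & 0x8000 else imm16


def word_to_byte_offset(imm16: int) -> int:
    """Sign-extend, then shift left by 2 (multiply by 4), truncated to 32 bits."""
    return (sign_extend_16_to_32(imm16) << 2) & MASK32


print(hex(word_to_byte_offset(0x0003)))   # 0xc
print(hex(word_to_byte_offset(0xFFFF)))   # 0xfffffffc, i.e. -1 word -> -4 bytes
```

Sign extension must come before the shift: a negative word offset such as 0xFFFF (-1) has to stay negative once it is widened to 32 bits.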
Please make a note of the names we have chosen for these multiplexers. Let us now finish showing the fetch sub-cycle: which portion of the data path is active and what is happening. In it, PCSrc will have value 0, because it should let the upper input through to the PC. IorD should also be 0, because the upper input goes through that multiplexer to its output. What about ALUSrcB? The constant 4 is on port number 1 of this multiplexer, and it should go through to the output and hence to the B port of the ALU. So ALUSrcB should be 01 in binary, 2'b01 in Verilog notation, which is just 1 in decimal. And ALUSrcA should be 0 again, because the 0th input, coming from the program counter, is routed through. That is what we mean by the details of the control signals inside the fetch state F of the controller FSM. So, for the fetch sub-cycle we note that PCSrc is set to 0, with the comment that ALU out, which holds PC + 4, is brought to the input of the program counter to be loaded into it. And we assert the enable signal called PC enable. Here is the program counter, slightly expanded: this is the clock, and this is the enable input. The enable input decides whether the 32-bit data at the D input (these are typically D flip-flops) is loaded. That data comes out of the multiplexer controlled by PCSrc, and this signal is PC enable.
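A minimal sketch of a register with an enable input, such as the program counter with PC enable. The class name and the starting address 0x00400000 are assumptions for illustration only.

```python
class EnabledRegister:
    """Register with enable: D is loaded at a clock edge only when en is 1."""

    def __init__(self, reset_value: int = 0) -> None:
        self.q = reset_value

    def clock_edge(self, d: int, en: int) -> None:
        if en:
            self.q = d            # load the value at the D input
        # otherwise the register holds its current value


pc = EnabledRegister(0x00400000)  # assumed starting address
pc.clock_edge(0x00400004, en=1)   # PC enable asserted: PC + 4 is loaded
pc.clock_edge(0x12345678, en=0)   # PC enable deasserted: PC holds
print(hex(pc.q))                  # 0x400004
```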
PC enable also comes from the controller FSM. It says that ALU out, where PC + 4 has been computed and has stabilized, should be let through. So this is the select input, and this is the enable input of the program counter register, which decides whether the value is loaded into it or not; here it is loaded. But that is not all of the fetch sub-cycle; some more control signals play a role. What we have seen so far relates to updating the program counter, which is an auxiliary task of fetch: important, but auxiliary. The main task of fetch is to get the instruction from the memory. For that there is the memory and, as we have seen, the PC; now the multiplexer we call IorD plays a role. Its job in this cycle is to route the contents of the PC to the address input of the memory block. This is the address input, and this is the data input, which we need not bother about, since we are not writing to memory. Instead, we are reading from the memory, and the data can go to two places, either IR or MDR. We enable the IR using the control signal called IR enable; we assert it to 1, so that the contents of memory are latched, or registered, into IR. We set PCSrc to 0, so that ALU out, which contains PC + 4, is loaded into the PC, controlled by setting PC enable to 1. IorD is set to 0, to let the PC through as the address, and IR enable is asserted. What else? The ALU: exactly what happens at the ALU, and how the ALU's environment in the data path is to be controlled. So recall the ALU; we are still talking about the fetch sub-cycle. What is happening at the ALU?
This is the A port of the ALU and this is the B port; we could call them port 1 and port 2, but let us call them the A port and the B port. There is a multiplexer in front of each. The A-port multiplexer has two possible sources: the PC and the A register. The B-port multiplexer has four different sources: for example, the B register, the constant 4, and some others that are not important right now. We have to let the PC and the constant 4 pass. So ALUSrcA has to be set to 0 and ALUSrcB to 01, that is, 1, as we already remarked, so that the PC and 4 reach the ALU. We also have to configure the ALU control to select the add function; that is an auxiliary detail you can work out. As a result, PC + 4 is available at ALU out, and ALU out is routed to the program counter so that it gets loaded. So this is what happens in the different parts of the data path during the fetch sub-cycle. It is not complicated, but one has to get a clear picture of what happens in every part and not miss anything; if you miss any one of them, the controller will not control the data path correctly. A careful analysis is quite easy, and an HDL description lets us simulate, test and debug things, so there is no black magic or black art involved. That completes our discussion of the fetch sub-cycle. Similarly, let us do a couple more cases. After the fetch sub-cycle, we will naturally study the next one, the decode sub-cycle. There are 10 more states to be considered, but we will do 3 or 4 of them and leave the rest to you. So again, here is the outline of the data path that we will start filling in. One small thing was missing from it, the shift left by 2; hopefully it is now in the picture.
So, what is happening in the decode sub-cycle? The major thing is that the instruction is being decoded, so the instruction register (IR) plays a role. The register operands are also being read out, so the register file is involved as well. In the decode stage, the appropriate registers of the register file are read out and their contents are stored in the A and B registers. Which register is chosen for output to A is decided by the 5-bit rs field coming from the IR; we will give the details later, and it is the same as in the single-cycle data path. Similarly, the register for B is chosen by the rt field. So these two fields determine which register's contents are loaded into A and which into B. That is the register read-out part of the decode sub-cycle. What else must be happening? The instruction register was updated at the end of the previous clock cycle, so now the whole instruction is available in the IR. The opcode and other relevant control information are sent from here to the controller. These are the inputs of the FSM: note that the non-trivial inputs to the FSM are now available in the IR. The clock is always an input to the FSM, because it triggers the transitions, but the non-trivial inputs are now in the instruction register, and they will be used in this state and in future states (sub-cycles).
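The register read-out of decode amounts to slicing fixed bit fields out of the IR. A minimal sketch, assuming the standard MIPS32 field positions (opcode in bits 31:26, rs in 25:21, rt in 20:16, rd in 15:11, immediate in 15:0) and a register file modelled as a Python list of 32 values; `fields` and `decode_read` are hypothetical helpers.

```python
def fields(instr):
    """Split a 32-bit MIPS instruction word into its main fields."""
    return {
        "opcode": (instr >> 26) & 0x3F,  # sent to the controller FSM
        "rs": (instr >> 21) & 0x1F,      # selects the register read into A
        "rt": (instr >> 16) & 0x1F,      # selects the register read into B
        "rd": (instr >> 11) & 0x1F,      # destination for R-type instructions
        "imm": instr & 0xFFFF,           # 16-bit immediate field
    }


def decode_read(ir, regfile):
    """Register read-out of decode: return (opcode, value for A, value for B)."""
    f = fields(ir)
    return f["opcode"], regfile[f["rs"]], regfile[f["rt"]]


regs = [0] * 32
regs[9], regs[10] = 7, 5                 # $t1 = 7, $t2 = 5
print(decode_read(0x012A4020, regs))     # (0, 7, 5) for add $t0, $t1, $t2
```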
There is something more that we do in the decode sub-cycle. We will see that the ALU is not being used right now: it is not doing the arithmetic that is part of an add instruction, and a load or store instruction would use the ALU only in the EX state, which is the next clock cycle. So the ALU is free, and it is a good opportunity to use it for something that can be done right now, namely calculating a tentative branch target address, in case this is a branch instruction. Please note that the PC has already been updated to PC + 4. This is a peculiarity of MIPS: the target address is the original program counter plus 4, plus the immediate field of the instruction register, sign-extended and shifted left by 2 bits. The shift turns the word offset into a byte offset, so the result is the byte address of the target instruction word. So we have to configure the B-port multiplexer to let the shifted immediate through, and configure the A-port multiplexer as in the fetch cycle to let the PC through.
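The branch target arithmetic, BTA = (PC + 4) + (SignExt(imm) << 2), can be worked through directly. This is a minimal sketch, assuming 32-bit wrap-around addition; `sign_extend16` and `branch_target` are hypothetical names.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm):
    """Sign-extend a 16-bit immediate to a Python int (may be negative)."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def branch_target(pc_plus_4, imm):
    """BTA = (PC + 4) + (SignExt(imm) << 2), wrapped to 32 bits."""
    return (pc_plus_4 + (sign_extend16(imm) << 2)) & MASK32


# A branch at 0x00400010: after fetch the PC already holds 0x00400014.
print(hex(branch_target(0x00400014, 0x0003)))  # 0x400020 (3 words forward)
print(hex(branch_target(0x00400014, 0xFFFE)))  # 0x40000c (2 words back)
```

The negative case shows why sign extension matters: an immediate of 0xFFFE means "two instructions back", not a large forward jump.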
So ALUSrcA will be 0 and ALUSrcB will be 2, that is, binary 10. Then do we feed the ALU output back to the program counter immediately? No, we do not. We are in the decode cycle, and the program counter contains the current instruction's address plus 4, since we updated it in the fetch cycle. We have now calculated the tentative branch target address at the output of the ALU, but we do not bring it back to the PC, because it is too early in decode to decide whether to update the program counter. For a jump instruction we could update it right now, because the jump is unconditional, but we have left jumps out of consideration; we are not supporting them in the current exercise. So the only instructions that might require the branch target address to be loaded into the PC are the conditional branches, and whether to load it is decided only in the EX state, when the operands are compared: whether they are found to be equal, or whether something is negative, depending on the condition being tested on the operands. Since we want to defer that decision to the next clock cycle, we put the target address we have calculated into a register: the ALU result register, which we load now.
So the enable signal of the ALU result register is asserted, so that the branch target address is stored there. That is part of the decode sub-cycle: the ALU is used. Decode is not just reading out the pair of operands from the register file and sending the instruction bits to the controller; it also calculates the tentative branch target address and stores it in the ALU result register, so that it can possibly be used in the next clock cycle; otherwise we simply forget about it. The register is a safe place to keep it, because it cannot be used immediately, unlike PC + 4 in the fetch sub-cycle, where the program counter was enabled to be updated in the same clock cycle. Here there is no such need; had we handled the jump instruction, we would have seen that kind of update, but now we do not need to enable the PC. Similarly, we can list which multiplexers are involved and what values the other control signals take. So, the control signals for the decode sub-cycle: the two ALU source multiplexers are involved. The multiplexer that decides which value gets written into the register file is not involved, since nothing is written right now. Enabling the ALU result register is involved. The A and B registers are perpetually enabled, that is, enabled all the time: we do not really care what is inside them except when we need them, and at that point we make sure they hold the correct value. So we let these registers be enabled all the time; you can verify that this is safe.
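The register loads of one decode cycle can be sketched as follows: A and B load unconditionally (hard-wired enables), the ALU result register loads because its enable is asserted, and the PC does not. This is a minimal sketch, assuming the state is a dict of register values and that input 3 of the B-port multiplexer (not discussed here) can be ignored; `DECODE`, `decode_cycle` and the key names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

# Decode-state settings described in the lecture (dict form is illustrative).
DECODE = {
    "ALUSrcA": 0,      # port A <- PC, which already holds PC + 4
    "ALUSrcB": 0b10,   # port B <- SignExt(imm) << 2
    "ALUResultEn": 1,  # store the tentative branch target
    "PCEn": 0,         # the PC is not updated in decode
}


def sign_extend16(value):
    """Sign-extend the low 16 bits of `value`."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def decode_cycle(state, regfile, sig=DECODE):
    """Return the register contents after one decode clock edge."""
    ir, pc = state["IR"], state["PC"]
    rs, rt = (ir >> 21) & 0x1F, (ir >> 16) & 0x1F
    a_in = {0: pc, 1: state["A"]}[sig["ALUSrcA"]]
    # Port 3 of the B multiplexer is not discussed here, so it is not modelled.
    b_in = {0b00: state["B"], 0b01: 4, 0b10: sign_extend16(ir) << 2}[sig["ALUSrcB"]]
    alu_out = (a_in + b_in) & MASK32      # ALU control = add
    new = dict(state)
    new["A"], new["B"] = regfile[rs], regfile[rt]  # enables hard-wired to 1
    if sig["ALUResultEn"]:
        new["ALUResult"] = alu_out
    if sig["PCEn"]:
        new["PC"] = alu_out
    return new


regs = [0] * 32
regs[9] = regs[10] = 7                    # $t1 = $t2 = 7
before = {"PC": 0x00400014, "IR": 0x112A0003,  # beq $t1, $t2, 3
          "A": 0, "B": 0, "ALUResult": 0}
after = decode_cycle(before, regs)
print(hex(after["ALUResult"]), hex(after["PC"]))  # 0x400020 0x400014
```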
So we do not need additional control logic: we just hard-wire the enable signals of the A and B registers to 1, but for the other registers we have to be more selective. Let us list the control signals. ALUSrcA must be 0, to let the PC, which now holds PC + 4, go through. For ALUSrcB, I first said the constant 4, but I am sorry, that was in the fetch sub-cycle. Here it is the input carrying the 16-bit immediate field of the instruction, sign-extended to a 32-bit number and shifted left by 2 (that is, multiplied by 4), which is fed to input 2 of this multiplexer. So ALUSrcB is 2'b10, which is 2 in decimal. So ALUSrcA is 0, ALUSrcB is 2, and the enable signal of the ALU result register's flip-flops is asserted high (1). Anything else? I think that is more or less all; the other signals take their default, deasserted values. Only when we start writing the Verilog or VHDL code do we have to be exhaustive about all these values; this is more about getting an idea of how to go about it, and I think the picture should be fairly clear. Next, let us take a look at the execute sub-cycle of the BEQ instruction, because something interesting happens there: it is the end of the instruction cycle for BEQ (branch if equal). So let us again take an outline of the data path and see what happens in the execute sub-cycle of BEQ. The ALU is involved, of course.
What happens in branch-if-equal? A and B now hold the values of the registers selected by the instruction, so A and B are involved: their contents are compared for equality. For that, we have to arrange the multiplexers to route A and B to the ALU. The ALU may subtract them, or check for equality with a comparator, and put out a status flag, say the zero flag. The zero flag is a status output of the ALU, and it goes to the controller or to some other control circuitry. More interestingly, we also need the branch target address that was computed during the decode sub-cycle, the previous clock cycle. Where is it available? In the ALU result register. We just remarked that we used the ALU in the decode clock cycle for something that could possibly be used now, and this is that case. The ALU result register's contents could conditionally be loaded into the program counter, but not necessarily: it depends on whether the zero flag says yes or not. Depending on the zero flag, the enable input of the program counter is asserted or not. The point is that the multiplexers have to be configured to allow the comparison of the two operands A and B, and we have to be ready to load the branch target address stored in the ALU result register into the PC in case the condition is found to hold. So the multiplexer in front of the PC also has to be configured appropriately. Thus the two ALU source multiplexers and the PC multiplexer are all involved. Note that ALU out is not of interest here; the status flag of the ALU is.
So this is the portion of the data path relevant to this sub-cycle. Let us exercise our memory of which multiplexers are involved and what their names are. ALUSrcA is involved, and so is ALUSrcB. The ALU control should say compare or subtract: subtract the two operands, and if the subtraction gives 0, the zero flag is asserted, which is what we use. ALUSrcA should let A through: in this diagram the PC is on input 0 of that multiplexer and A is on input 1, so its select signal gets 1. For ALUSrcB, think of its select as 2 bits: 00 selects the input port carrying the contents of B. So I write ALUSrcB as 00 and ALUSrcA as 1. What else? PC enable, and also the multiplexer before the PC: PCSrc should be 1, because input 0 comes from ALU out and input 1 comes from the ALU result register, and the branch target address is in the ALU result register. That is why we select input 1. What about PC enable? It depends on the zero flag: if the zero flag is asserted, we load the branch target address from the ALU result register into the PC. So the logic driving PC enable here is simply whether the ALU status says the zero flag is true or not; in general this means some combinational logic in front of the PC enable. Are we missing anything in the execute sub-cycle of BEQ? If something is missing, treat it as an exercise to get more practice. I think you have got most of the idea. Let us do one more: the memory access of store.
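The BEQ execute settings can be sketched as one function that compares A and B by subtraction and lets the zero flag drive PC enable. This is a minimal sketch, assuming only the BEQ case of the PC enable logic (a full controller would combine it with other signals); `BEQ_EX` and `beq_execute` are hypothetical names.

```python
MASK32 = 0xFFFFFFFF

# BEQ execute-state settings from the lecture (dict form is illustrative).
BEQ_EX = {"ALUSrcA": 1, "ALUSrcB": 0b00, "PCSrc": 1}


def beq_execute(pc, reg_a, reg_b, alu_result, sig=BEQ_EX):
    """One BEQ execute cycle; returns (new_pc, zero_flag)."""
    src_a = {0: pc, 1: reg_a}[sig["ALUSrcA"]]
    src_b = {0b00: reg_b, 0b01: 4}[sig["ALUSrcB"]]  # other ports not modelled
    alu_out = (src_a - src_b) & MASK32               # ALU control = subtract
    zero = int(alu_out == 0)                         # ALU status output
    pc_next = {0: alu_out, 1: alu_result}[sig["PCSrc"]]
    # In this state only, PC enable is driven by the zero flag. A complete
    # controller would combine it with other signals (not modelled here).
    pc_en = zero
    return (pc_next if pc_en else pc), zero


target = 0x00400020  # branch target left in the ALU result register by decode
print(beq_execute(0x00400014, 7, 7, target) == (target, 1))      # True: taken
print(beq_execute(0x00400014, 7, 5, target) == (0x00400014, 0))  # True: not taken
```

Note that ALU out only feeds the zero flag here; the value that reaches the PC comes from the ALU result register.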
Let us just draw the data path portion and leave the exact control-signal definitions as an exercise for you. The memory access of store is in fact the terminal state, the last state at the end of the store instruction cycle. What must be happening? Of course memory is involved. Where should the memory address come from? Think about what happened before this sub-cycle in the store instruction cycle: it was the execute sub-cycle of the store instruction, in which the effective address of the memory location was computed by the ALU and stored in the ALU result register for later use. So the ALU result register contains the memory address computed in the execute sub-cycle of store, and that is now to be used here. The multiplexer called IorD plays a role, with select value 1, so that the ALU result goes through to the address input of memory. Since this is a store instruction, what else should happen in this committing stage of the store instruction cycle? At the data input port of memory, where should the information come from? The data to be stored into memory is available in the B register, so B plays a role. Verify this from the semantics of the store instruction: the rt field of the IR indicates the register from which the data is taken and stored into the memory location whose address is given by the rs (index) register and the immediate field. That address has been computed previously, the B register has been loaded previously, and B is used as the data input here.
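The store memory-access cycle reduces to one memory write. A minimal sketch, assuming memory is a Python dict from byte address to 32-bit word; IorD = 1 follows the lecture, while `MemWrite` is an assumed name for the memory write enable (the lecture does not name it), and `store_mem_cycle` is hypothetical.

```python
# Store (sw) memory-access state. IorD = 1 follows the lecture; "MemWrite"
# is an assumed name for the memory write enable, which the lecture does not name.
SW_MEM = {"IorD": 1, "MemWrite": 1}


def store_mem_cycle(pc, alu_result, reg_b, memory, sig=SW_MEM):
    """Write B to the address held in the ALU result register.

    The ALU, the register file and the PC are idle in this cycle.
    """
    addr = {0: pc, 1: alu_result}[sig["IorD"]]  # address multiplexer
    if sig["MemWrite"]:
        memory[addr] = reg_b & 0xFFFFFFFF         # write data port <- B register
    return memory


mem = {}  # hypothetical memory: byte address -> 32-bit word
store_mem_cycle(pc=0x00400010, alu_result=0x10010004, reg_b=42, memory=mem)
print(mem[0x10010004])  # 42
```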
So this is B. Remember that in the previous clock cycles, the rt field of the IR indicated which register's contents had to be loaded into B. That has already been done, and now those contents are put into the appropriate location in the data portion of the memory. Note that the register-file write multiplexer plays no role, the ALU plays no role, and neither do the register file or the program counter. So this is the terminal state (sub-cycle) of the store instruction. There are still more states, and some interesting cases remain that we will leave as exercises, but to get an idea of what happens with the register file, let us consider the write-back stage of, say, the add instruction. Here we assume that the result of the addition was stored in the ALU result register at the end of the previous clock cycle. Now the ALU is not involved; it is just a matter of putting the result in the appropriate place. Where should it go? It should finally arrive at the write data port of the register file. So the register file is involved, and so is the multiplexer that decides what arrives at the write data port: it should let the ALU result through, which sits on its input 0. Let us give this multiplexer a name: we call it MemtoReg, and it selects input 0, so that the ALU result is stored into the register file. Which register is it stored into? That is decided by the write-register input of the register file, which is why another multiplexer plays a role there, passing the rd field of the instruction register. So the instruction register is also involved in this picture.
Note that rs plays no role here, and neither does rt, but the rd label got cluttered in the drawing, so let us do more justice to it by drawing part of this separately. The IR has several fields: rs, rt, rd, the immediate and some others; this sketch is not to scale. The rd field is an input to this multiplexer, and rt is also an input to it. rs and rt also go separately to the register file: this is the index of source 1, this is the index of source 2, and this is the index of the destination. Data read from the register specified by rs goes to A, and data read from the register specified by rt goes to B; we are not concerned with those in this case, but it helps to have the complete picture. This is the write data input to the register file, and which register is written is decided by the destination index. Normally the write data could also come from the MDR, but we need not worry about that here; the green lines indicate our active data path. Where is the write data coming from? From the ALU result register, so technically I should draw that in green too, indicating that the register file is in the picture in this sub-cycle: it is being updated with this information. We do not care about A and B, or about the ALU itself, but we do care about the ALU result obtained in the previous clock cycle and stored in the ALU result register. The instruction register is involved; the MDR is not; A and B are not. So what piece is this? It is the write-back of an R-type instruction such as add. Before it, the execute stage of that instruction took place, and the ALU result register now holds the result of the addition. The destination field of the IR holds the destination register, and this multiplexer lets it through, while the multiplexer on the write data port is the one I call MemtoReg.
So MemtoReg should be set to 0, indicating that the ALU result register, not the MDR, is what goes through. In another case, namely the write-back stage of the load instruction cycle, the data flow will be different; while doing that exercise you will see the use of the MDR and the other value of MemtoReg. The multiplexer on the write-register input I call RegDst (register destination); give it any name you like. It should be set to 1 so that rd goes through. Together, these examples give, I guess, a good enough picture of the different sub-cycles and of how the finite state machine outputs control signals in each of those individual clock cycles.
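The two write-back multiplexers can be sketched together, with the add settings from the lecture and the load case as a contrast. This is a minimal sketch: MemtoReg = 0 and RegDst = 1 for add follow the lecture, while the lw values (MemtoReg = 1, RegDst = 0) and the `RegWrite` enable name are assumptions based on the usual multicycle MIPS design; `write_back` is a hypothetical helper.

```python
# Register write-back settings. The add values (MemtoReg = 0, RegDst = 1)
# follow the lecture; the lw values and the "RegWrite" enable name are
# assumptions based on the usual multicycle MIPS design.
ADD_WB = {"MemtoReg": 0, "RegDst": 1, "RegWrite": 1}
LW_WB = {"MemtoReg": 1, "RegDst": 0, "RegWrite": 1}


def write_back(ir, alu_result, mdr, regfile, sig):
    """Write the selected value into the selected register; return regfile."""
    rt, rd = (ir >> 16) & 0x1F, (ir >> 11) & 0x1F
    write_data = {0: alu_result, 1: mdr}[sig["MemtoReg"]]  # MemtoReg multiplexer
    write_reg = {0: rt, 1: rd}[sig["RegDst"]]              # RegDst multiplexer
    if sig["RegWrite"] and write_reg != 0:                # $0 always reads as 0
        regfile[write_reg] = write_data & 0xFFFFFFFF
    return regfile


regs = [0] * 32
write_back(0x012A4020, alu_result=12, mdr=99, regfile=regs, sig=ADD_WB)  # add $t0, $t1, $t2
write_back(0x8D2B0004, alu_result=0, mdr=99, regfile=regs, sig=LW_WB)    # lw $t3, 4($t1)
print(regs[8], regs[11])  # 12 99
```

Only the two select values change between the cases; the register file and its write port are shared.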