Welcome again. In this lecture, I will introduce a toy version of a very popular, pioneering CPU called MIPS. I am going to refer to this toy version as M-MIPS, the M standing for mini MIPS or micro MIPS. MIPS is a pioneering RISC CPU. RISC stands for reduced (complexity) instruction set computer; the word complexity is in brackets, so it is typically read out as reduced instruction set computer. The reduction is not really in the number of instructions; it is in the complexity. The instructions are quite simple, and the simplicity of the instruction set leads to a simple and elegant microarchitecture. By microarchitecture we mean the data path and the controller, which together define the CPU. In a couple of subsequent lectures, we will describe a multi-cycle data path implementation of micro MIPS as an illustration of the concept of an FSM driving a data path, which we have already illustrated with the simpler examples of GCD and shift-add based multiplication. A CPU microarchitecture has always been an interesting and wholesome example of a digital system, and that is what makes it relevant in the context of this course. The VLSI or FPGA implementation aspects of a CPU microarchitecture (I will write it as μ-architecture) form an important component of the pedagogy of VLSI design. So, in these couple of lectures we will be able to address, or at least comment on, some implementation issues, and you will get some pointers to go further towards more specialized courses on processor design or VLSI architecture design. So, there is good reason to pick this example, which is a very standard one in courses on computer architecture and digital system design.
So, in the micro MIPS microarchitecture we will focus on a very small subset of instructions. Although it is a small subset, it will still illustrate most of the important fundamental ideas in the microarchitecture and its implementation. Micro MIPS, or mini MIPS, is a 32-bit architecture; by that I mean the instructions are 32 bits wide and data words are also 32 bits wide. There is a good amount of uniformity in RISC processors, which makes things simpler to design, simpler to analyze, and simpler to implement. At the top level, at a behavioral level, a processor is described by its instruction set architecture: we understand what kind of instructions the processor is capable of executing, and then the data path and the controller are designed to facilitate execution of those particular instructions with the help of components like an ALU, register files, multiplexers, shifters and so on. Even the small subset of MIPS which we call M-MIPS is capable of the standard arithmetic, logic, data movement, branch and jump kinds of instructions. I am just stating some standard things; there is nothing yet to focus on specifically. It is a toy example of a standard CPU that we are going to consider here, and although it is small it will still illustrate most of the concepts. For example, any standard CPU should be capable of executing an arithmetic instruction, and the typical example of an arithmetic instruction is add. MIPS in particular has the instruction add $4, $3, $6, whose arguments are an index which specifies the destination (I will tell you in a moment what exactly that $4 means) and a pair of source indices.
The microarchitecture of micro MIPS has a so-called register file, which is a collection of registers. Every register is 32 bits wide and there are 32 such registers. These registers are going to be referred to as R0 (or reg 0), R1, and so on up to R31. In particular, R0 is not really a set of memory cells; it is hardwired to ground, all zeros. So it looks as if register number 0 always contains 0; the trick is to hardwire it to ground. The other 31 registers are general purpose, or at least most of them are, and every one of them can be written to. R0 you cannot write to; it always contains the constant 0. Now, coming back to the instruction, the 4, 3 and 6 refer to the indices of registers in the register file. The semantics of this particular instruction is that two of the indices specify the source registers from which the source operands will be read out, and then they will be added. The result of the addition is going to be kept in the register with index 4. So R4 is going to be loaded with the result of the addition of two operands, namely the contents of the registers at index 3 and index 6, that is R3 and R6. So, in this instruction the contents of R3 and R6 are read out and the result is somehow routed back to R4. The data path is definitely going to have this register file with 32 registers, and because the instruction set includes an add instruction, the data path must also provide support for addition.
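To make the register file idea concrete, here is a minimal Python sketch. It is not part of the lecture; the class name RegisterFile and its read and write methods are hypothetical names chosen for illustration, and the hardwired R0 is modelled simply by ignoring writes to index 0.

```python
class RegisterFile:
    """32 registers of 32 bits each; register 0 always reads as zero."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, index):
        return self.regs[index]

    def write(self, index, value):
        if index == 0:
            return  # R0 is hardwired to ground, so a write has no effect
        self.regs[index] = value & 0xFFFFFFFF  # keep only the low 32 bits


rf = RegisterFile()
rf.write(3, 10)
rf.write(6, 32)
# Effect of add $4, $3, $6 on the register file: R4 <- R3 + R6
rf.write(4, rf.read(3) + rf.read(6))
rf.write(0, 99)  # silently ignored
print(rf.read(4), rf.read(0))  # prints: 42 0
```

In hardware the read side would be a pair of multiplexers selected by the source indices; the sketch only captures the behavior seen by the instructions.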
So, there must be an adder/subtractor kind of ALU, and there should be some set of routers like multiplexers which will route the data from the appropriate registers towards the ALU and the result of the ALU back to the register file. There will be more things of this kind required. We can gradually evolve the complete microarchitecture by understanding the needs of each and every instruction of this kind, but that is quite a routine exercise, so I will cover it at a fast pace. Now, specifically, what we need to understand is how the instructions are represented. Let us get back to the same example, the instruction add $4, $3, $6. Recall what this means: 4 is the index of the destination register, and 3 and 6 are the indices of the source registers. The encoding is as follows. As I remarked, every instruction in this CPU is 32 bits wide, with bits numbered from 0 to 31, and the 6 MSBs, the most significant bits, are the opcode of add. Then we have 5 bits representing 3, the first source index: that is 00011, which is equivalent to 3, one of the source operands we specified. The second set of 5 bits encodes the number 6, 00110, which is to be interpreted as the second source index. Then comes the number 4, the index of the destination register for the addition, and that is 00100. We use 5 bits for every such index, and there are three indices in an instruction like add, which operates on the contents of a pair of registers and puts the result back in one of the registers in the register file. This is called an R-type instruction.
It works purely on the information in the register file; there is no role for data memory or anything else here. Because there are 32 registers, we require 5 bits to encode each index, and this triple of 5-bit fields contains the index of one source, the index of the other source, and the index of the destination. Up to the two source fields we have used 16 bits: 6 bits for the opcode and 10 bits for the source indices. Then there are 5 more bits for the destination, and in the remaining 11 bits, bit number 10 down to bit number 0, we have some extra information. In the context of add, the operation is not completely described by the first 6 bits. In fact, those 6 bits coupled with, I think, the last 6 bits together indicate that this particular instruction is an add instruction, with the pair of source indices describing the source operands and the third index describing the destination operand. We will ignore the remaining 5-bit field. For shift instructions the shift amount is described there; this again requires 5 bits, and 5 bits can specify a number up to 31. We will not be bothered about that. In fact, we are not bothered about such details at all; we just need to get an idea of how, in a simple manner, these instructions are encoded. So, we have seen add, and we will look at a couple of other examples; that will suffice to get an idea of what an instruction looks like and how parts of the instruction are used for further processing in the microarchitecture.
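The R-type layout described above (6 + 5 + 5 + 5 + 5 + 6 bits) can be sketched in Python as follows. The function name encode_r_type is hypothetical. The lecture does not give the numeric values of the opcode and the last 6 bits for add; the values used here (opcode 000000, function code 100000) are those of standard MIPS and are an assumption as far as M-MIPS is concerned.

```python
def encode_r_type(opcode, src1, src2, dst, shamt, funct):
    """Pack the six R-type fields into one 32-bit instruction word."""
    for index in (src1, src2, dst, shamt):
        assert 0 <= index < 32, "5-bit fields hold values 0..31"
    return ((opcode << 26) | (src1 << 21) | (src2 << 16)
            | (dst << 11) | (shamt << 6) | funct)


# Assumed values, taken from standard MIPS (not stated in the lecture).
ADD_OPCODE = 0b000000
ADD_FUNCT = 0b100000

# add $4, $3, $6: destination 4, sources 3 and 6, shift amount unused (0)
word = encode_r_type(ADD_OPCODE, src1=3, src2=6, dst=4, shamt=0, funct=ADD_FUNCT)
print(f"{word:032b}")
print(f"0x{word:08X}")  # prints: 0x00662020
```

Reading the binary output in groups of 6, 5, 5, 5, 5 and 6 bits gives back 000000, 00011, 00110, 00100, 00000 and 100000, which are the fields discussed in the lecture.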
In general, add has the structure add dst, src1, src2: we specify the index of the destination and the indices of a pair of sources, and the semantics is R[dst] = R[src1] + R[src2]. Similar to the add instruction, we have a subtract instruction with the same structure, where dst, src1 and src2 are again numbers from 0 to 31 representing the indices of registers. Other than these two arithmetic instructions, we have a pair of logical instructions for doing bitwise OR and bitwise AND, again with the same triple of indices. And there is one interesting instruction called set less than, SLT, again with the same format. It has the semantics that the destination register is loaded with 1 provided the contents of the register indicated by the first source index are less than the contents of the register indicated by the second source index. That is, we are setting the destination register, and setting typically means setting to 1; otherwise we set it to 0, that is, we clear it. It looks a bit funny, or too specialized, but it is very useful in comparison-based branching. We are going to restrict our attention to a very small subset of this already reduced-complexity instruction set, and that subset should still be sufficient to do any kind of computation.
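As a rough illustration of these five R-type operations, here is a small Python sketch of what the ALU would compute from the two source values. The function names are hypothetical, and the sketch assumes that SLT compares the operands as signed two's complement numbers, as standard MIPS does; the lecture does not say which comparison M-MIPS uses.

```python
MASK32 = 0xFFFFFFFF


def to_signed(value):
    """Interpret a 32-bit pattern as a two's complement signed integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def r_type_result(op, a, b):
    """Value written to R[dst] for 'op dst, src1, src2', given a = R[src1], b = R[src2]."""
    if op == "add":
        return (a + b) & MASK32
    if op == "sub":
        return (a - b) & MASK32
    if op == "and":
        return a & b
    if op == "or":
        return a | b
    if op == "slt":  # set to 1 if a < b (signed comparison assumed), else clear to 0
        return 1 if to_signed(a) < to_signed(b) else 0
    raise ValueError(f"unknown operation: {op}")


print(r_type_result("slt", 3, 6))           # prints: 1
print(r_type_result("slt", 6, 3))           # prints: 0
print(r_type_result("slt", 0xFFFFFFFF, 0))  # prints: 1, since the pattern means -1
```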
SLT is used in conjunction with a simple conditional branch instruction called BEQ, which has a similar but slightly different, mostly register-based format. Branch-if-equal takes a pair of register indices, source 1 and source 2, and compares the contents of the two registers specified by these indices. If they are found to be equal, it makes a relative jump to an address specified by a constant in the immediate field; IMM stands for immediate, and I will clarify it soon. I brought up BEQ quickly here because SLT in conjunction with BEQ is a very powerful combination. There is more about that in any standard text on computer architecture, especially the many textbooks which use MIPS as a vehicle to describe the concepts of computer architecture and organization; one of the best known is the book by Patterson and Hennessy. Many of you might already be familiar with it, so I am not going to spend time on this. So, we have seen some R-type instructions like add, sub, or, and, and SLT. Their format is a destination register specifier and a pair of source register specifiers, and there are more of these in standard MIPS. Other than these, we have the so-called I-type instructions, immediate-type instructions, in which not everything comes from registers and goes to a register. An example is the immediate version of the add instruction, addi, which is to be read as add immediate. In its format there is a destination specifier, because finally the result has to go somewhere, but what are the source operands?
The source operands are not a pair of source register indices. Just one of them comes from a register specified by a source index, and the other source operand comes from the immediate field. Now, where is this immediate field? Recall the 32-bit instruction format for add: 6 bits were for the opcode. Similarly, the most significant 6 bits are used for the opcode of add immediate. Then there are 5 bits for SRC1, the first source specifier, and the next 5 bits hold the destination index DST. That covers 16 bits, and the remaining 16 bits store a constant that is to be added to the contents of the register specified by SRC1; the result is stored into the register specified by DST. Let us look at an example: addi $4, $6, 173, with 173 in decimal. The encoding is the 6-bit opcode of addi, followed by the source index 00110, followed by the index of the destination register, 00100, and then 16 bits for the constant. 173 is an 8-bit number, since it is less than 255: 128 plus 32 is 160, plus 13 is 173, so it is 10101101, padded with zeros to 16 bits as 0000 0000 1010 1101. The semantics is that register number 4 is loaded with the sum of the contents of register number 6 and the binary number stored in the 16-bit immediate field. Compare this with the encoding of the add instruction, which required 3 register indices: 2 for the sources and 1 for the destination.
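The addi $4, $6, 173 example can be checked with a short Python sketch of the I-type layout (6 + 5 + 5 + 16 bits). The function name encode_i_type is hypothetical, and the opcode value 001000 for addi is the one used by standard MIPS; the lecture does not state it, so treat it as an assumption for M-MIPS.

```python
def encode_i_type(opcode, src, dst, imm):
    """Pack opcode, source index, destination index and a 16-bit immediate."""
    assert 0 <= src < 32 and 0 <= dst < 32, "5-bit fields hold values 0..31"
    assert -(1 << 15) <= imm < (1 << 15), "immediate must fit in 16 signed bits"
    return (opcode << 26) | (src << 21) | (dst << 16) | (imm & 0xFFFF)


ADDI_OPCODE = 0b001000  # assumed, taken from standard MIPS

# addi $4, $6, 173: source register 6, destination register 4
word = encode_i_type(ADDI_OPCODE, src=6, dst=4, imm=173)
print(f"{word & 0xFFFF:016b}")  # prints: 0000000010101101
print(f"0x{word:08X}")          # prints: 0x20C400AD
```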
In add immediate we require only one register index for a source, because the other source operand comes from the 16 immediate bits, and the destination, which in add was in the third 5-bit field, now sits in the second one. So the low 16 bits are free for holding a constant, which is treated as an immediate operand. The register index is the specifier of a direct operand, and the 16-bit field is the immediate operand itself. Similar to add immediate, there is a subtract immediate with the same format, and similarly OR immediate and AND immediate; these are examples of I-type, immediate-format instructions. Knowing the opcode, and so figuring out that the instruction is, say, add immediate, we know that the bits of the instruction are to be interpreted as a source index, a destination index, and 16 bits of immediate. This is unlike the R-type instructions, where we look at the triple of 5-bit fields for two source indices and one destination index and some of the remaining bits can be ignored; there is no role for an immediate operand in R-type instructions like add, subtract or SLT. I think an SLT immediate is also available in the MIPS instruction set architecture. Now, what we have seen so far are arithmetic and logic instructions, and I also mentioned the branch-if-equal instruction. There is no role for a destination as such; there is a pair of source indices, SRC1 and SRC2, and there is an immediate field. So here the opcode of BEQ is in the top 6 bits, SRC1 is in the next 5 bits, SRC2 is in the 5 bits after that, and the remaining 16 bits are used for specifying a constant. What are the semantics of this instruction?
The program counter, which stores the address of the next instruction, is updated with the current program counter plus 4 plus an offset. We will come to this plus 4; ignore it for a while. The role of the immediate is that it is treated as a relative offset, and this 16-bit number is treated as a word offset. Remember that this CPU is 32 bits wide, but we are using addresses at the byte level. The next 32-bit word is 4 bytes away, and so the immediate field is interpreted as a relative offset in terms of number of words. This is the offset which is added to the program counter to get the next value of the program counter, which is where the next instruction is supposed to be. So this is a conditional jump of the relative kind, and the main thing is that the word offset has to be multiplied by 4. Shifting it left by 2 has the effect of multiplying it by 4, and this multiplication converts the word offset into a byte offset, an offset in terms of byte addresses. We assume that the memory can refer to individual bytes, so addresses refer to bytes, and if you want to go to the next word you have to change the address by 4. As for the plus 4, it is just one of the subtle features, not too important. As part of the architecture, MIPS decided that the relative offset would be added to PC plus 4. PC plus 4 is the default value of the next program counter, and so it is that default, rather than the current PC, which is updated by the offset.
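The PC update rule for BEQ can be written down as a small Python sketch. The function names are hypothetical. The sketch treats the 16-bit immediate as a signed number (sign extension), so that backward branches are possible; this matches standard MIPS and is assumed here for M-MIPS.

```python
def sign_extend16(imm16):
    """Interpret a 16-bit field as a signed two's complement number."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def next_pc_beq(pc, imm16, src1_value, src2_value):
    """Next program counter after 'beq src1, src2, imm' located at byte address pc."""
    pc_plus_4 = pc + 4  # default next PC: instructions are 4 bytes apart
    if src1_value == src2_value:
        byte_offset = sign_extend16(imm16) << 2  # word offset times 4
        return (pc_plus_4 + byte_offset) & 0xFFFFFFFF
    return pc_plus_4


print(next_pc_beq(100, 3, 7, 7))       # prints: 116 (taken: 104 + 3*4)
print(next_pc_beq(100, 3, 7, 8))       # prints: 104 (not taken)
print(next_pc_beq(100, 0xFFFD, 1, 1))  # prints: 92  (taken: 104 - 3*4)
```

Note that the offset is applied to pc + 4, not to pc, which is exactly the point about the default next program counter.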
So, the compiler or the assembler should make sure that when it encodes a jump address, the offset is not the difference between the address of the place to jump to and the address of the current instruction, but rather the difference between the address to jump to and the address of the default next instruction, which is PC plus 4. That is why there is this funny thing. Very quickly, we will wind up with a couple of other instructions; in fact, the important instructions that are left are load and store. The instructions that we have seen so far have been of the arithmetic, logic or control flow type. Besides branch there is also jump, the unconditional jump, but let me ignore it for now; once you understand the architecture for these instructions you can easily extrapolate what happens for jump. So let us look at the instructions which work with the memory. Load and store are memory-type instructions; they involve a memory access. They are absolutely necessary. Just by providing the ability to do arithmetic, branch and jump, you are not going to be able to fetch data from or store data to anywhere outside the registers. There are 32 registers, which is plenty for a lot of applications, but not in general. Where you require a lot of data, it has to be stored in arrays, and arrays can be much bigger than the number of registers that you have.
So, you need to make use of data memory, and to access the data memory you need a couple of instructions. One such is load, which loads something from the memory. The syntax is load followed by the target index, that is, the index of the register into which you are going to load something, and then a specification of what you are going to load, as an immediate and a base register index. I am just using names which suggest the purpose. The semantics is that the register whose index is given by target, a number between 0 and 31 specifying a register in the register file, is loaded with the contents of data memory at the appropriate address. That address is calculated by reading out the contents of the register specified by base and adding the immediate, and again we have the shifting left by 2. The encoding of the load instruction is a 6-bit opcode for load, then a 5-bit base register specifier, then 5 bits specifying the target, and that leaves us with 16 bits. This immediate is regarded as an offset, because this is the calculation of the address of some location in the data memory. Since we have 32-bit registers, a register that is loaded has to be loaded with 32-bit content, and that has to be a word from the memory, a 4-byte word. So this 16-bit immediate field is treated as a word offset with respect to the base, which is held in the register with the base index. As you can easily imagine, this facility is provided for array kind of indexing.
So, this is also an I-type instruction, because there is a role for the immediate field. Complementing load, we have a store instruction whose syntax is similar: an index specifying a register, an index specifying another register, and an immediate field. But the semantics is that of a store, that is, storing something from the register file into data memory. Data memory at a certain location is updated with the contents of the register with index RT. Which location in data memory? The address calculation is the same as for the load instruction: the contents of the base register, which is why we call it the base register, plus the immediate. By the way, I was just prompted that I missed one point in describing the branch instruction. This is branch if equal. The syntax is that it takes a pair of source indices describing the registers which are to be used as source operands, and an immediate field as an offset for a relative branch. The meaning is that PC is updated with PC plus 4 plus the shifted immediate, and this is conditional: it happens if the contents of the registers at indices source 1 and source 2 are equal. Recall that the immediate is a word offset, so to convert it into a byte offset we left shift it by 2, and PC plus 4 is the default next program counter that is updated with this offset. This is done if the equality holds, that is, if the contents of the two registers are equal; otherwise PC is updated with just the default, PC plus 4. Why plus 4? Because instructions are 32 bits wide and addresses refer to bytes, so to refer to the next instruction word we have to add 4. An instruction is one word long, 32 bits or 4 bytes; it is a 32-bit architecture.
We could study variations of this, but right now, for simplicity of presentation, we are assuming both data and instructions to be 32 bits wide, and we have to explicitly convert word addresses to byte addresses because addresses are specified at the byte level. So, that was the point I missed while describing the BEQ instruction in a hurry. Let us quickly look at the use of the memory access instructions, where there is an immediate field and the source index is treated as the base. On this slide you see a typical C statement, g = h + a[8], where you have an array a; the contents of location 8 of array a are to be added to the variable h, and the result is to be put in the variable g. Let us say the compiler has associated g with register number 1 and h with register number 2, and the base address of a is stored in register number 3. Now this C statement is going to be compiled into MIPS code. The index 8 refers to the 8th word from the start of the array a, and that word is at an offset of 32 bytes because there are 4 bytes per word. So the statement is compiled into a pair of instructions. The first is for loading something from the data memory, because the array a is stored in memory; only the base address of the array, the pointer to a, is stored in one of the registers, specifically register number 3. Here LW stands for the load instruction. Sometimes you might have a variation that loads a single byte, but the default load instruction is load word, and LW stands for load word. We are going to focus only on the load word instruction in this lecture.
So, look at the syntax, lw $4, 32($3). It says that $4 is specified as the target, that is, the index of the register into which the data memory word is to be loaded. Why number 4? It is just some temporary location. So you load something from data memory which is at an offset of 32 bytes from the base address stored in register number 3; that is the 32($3). The 32 is going to be stored in the immediate field of the instruction, and $3 refers to the base register, which means the number 3 is going to be stored in the SRC1 field, the first group of 5 bits in the load instruction encoding. And the number 4 refers to a register which we use as temporary storage for the 8th word of array a, which we have read from the memory. For the next instruction, we just need to add to h the memory contents that we have read out. We are assuming h is bound to register number 2, so we need to add the contents of register number 2 and register number 4, because that is where the previous instruction loaded the data. So the add instruction, add $1, $2, $4, specifies that the destination register, register number 1, is updated with the result of adding register number 2 and register number 4. This pair of instructions together is equivalent to the C statement g = h + a[8]. This is a simple illustration of how the immediate field is used: the byte offset is obtained by shifting the word index left by 2, the base address is stored in one of the registers, and the offset is specified in the immediate field. And it shows that such a simple, small instruction set can also take care of your need to work with arrays, which are the most elementary data structure but powerful enough to mimic any kind of advanced data structure.
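The effect of this pair of instructions can be mimicked with a tiny Python model of the registers and a byte-addressed data memory. Everything concrete here is made up for illustration: the base address 0x1000, the array contents and the value of h are hypothetical, and the helper functions lw and add are not a real simulator API. The offset is used as a byte offset exactly as it is written in the assembly text 32($3).

```python
regs = [0] * 32          # register file, indices 0..31
data_memory = {}         # byte address -> 32-bit word (word-aligned addresses only)

A_BASE = 0x1000          # hypothetical base address of array a
for i in range(16):
    data_memory[A_BASE + 4 * i] = 100 + i   # hypothetical contents: a[i] = 100 + i

regs[2] = 5              # h is bound to register 2 (hypothetical value)
regs[3] = A_BASE         # register 3 holds the base address of a


def lw(target, offset, base):
    """lw $target, offset($base): load the word at R[base] + offset."""
    regs[target] = data_memory[regs[base] + offset]


def add(dst, src1, src2):
    """add $dst, $src1, $src2"""
    regs[dst] = (regs[src1] + regs[src2]) & 0xFFFFFFFF


# g = h + a[8]; index 8 is 8 * 4 = 32 bytes past the base
lw(4, 32, 3)             # $4 <- a[8]
add(1, 2, 4)             # $1 (g) <- $2 (h) + $4
print(regs[1])           # prints: 113
```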
So, in principle you can write any kind of program with this simple instruction set; it is in that sense Turing complete. We do not need to go into that, but it suffices not only for our explanation of the fundamentals; it is also a complete CPU by itself, although tedious to program, since we have to execute a lot of instructions of the simple kind to do something routine. So, that was the base and 32 is the offset. We could similarly look at a couple of other examples, but I will leave it to you to study them on your own. Registers are used in preference to data memory because they are much faster to access: typically you can access a register in a single clock cycle, but accessing SRAM or DRAM requires a longer time. We will not go into that, but a small point is that if you want to operate on data that is stored in memory, then first you have to load it into registers, operate on it using an arithmetic or logic instruction, put the result back in a register, and then store it into memory using a store instruction. So load and store are required in addition to the arithmetic or logical instructions. If you want to operate on data in memory locations, you have to use a complementary pair of load and store as well, so more instructions need to be executed. It is therefore quite important that the compiler makes sure that most of the data that you require repeatedly and frequently is bound to registers rather than stored in arbitrary locations in the memory.
Arrays obviously have to be stored in memory, because arrays are typically large and you do not have a large enough set of registers to store big arrays. But local variables are used frequently, so it makes better sense to bind them to registers. Similarly, there is a role for immediate operands and so on, which we have already discussed. Now, what are the steps in instruction execution during a single clock cycle in which one instruction executes? We will be assuming that our CPU is simple enough that in one single clock cycle a single instruction executes completely. In the next clock cycle the next instruction executes, which is fetched using the address in the program counter, and so on. At the beginning of the instruction execution, the program counter supplies its contents as an address to the instruction memory. Now, there is something called instruction memory and there is something called data memory; why these two have to be separate, we will remark on a bit later. Once the program counter supplies an address to the instruction memory, sometime during the same clock cycle, after a bit of delay, the memory supplies its contents at that location, and that is the instruction which is now to be processed. By this time we can say that we have fetched the instruction. Next, looking at the 32-bit instruction and depending on the instruction format, we identify which parts of the instruction refer to register indices, source or destination. We supply the source indices to the register file, and some mechanism there locates the appropriate registers and brings their contents to the output of the register file.
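The step of picking the fields out of a fetched 32-bit instruction can be sketched in Python with shifts and masks. The function name decode and the field names are hypothetical. The two test words are the standard MIPS encodings of add $4, $3, $6 and addi $4, $6, 173; whether M-MIPS uses the same opcode values is an assumption.

```python
def decode(word):
    """Split a 32-bit instruction word into all fields any format might need.

    Which of these fields are meaningful depends on the instruction format:
    R-type uses src1, src2, dst (and shamt, funct); I-type uses src1, the
    second 5-bit field as destination or second source, and imm.
    """
    return {
        "opcode": (word >> 26) & 0x3F,  # bits 31..26
        "src1":   (word >> 21) & 0x1F,  # bits 25..21
        "src2":   (word >> 16) & 0x1F,  # bits 20..16
        "dst":    (word >> 11) & 0x1F,  # bits 15..11
        "shamt":  (word >> 6) & 0x1F,   # bits 10..6
        "funct":  word & 0x3F,          # bits 5..0
        "imm":    word & 0xFFFF,        # bits 15..0 (overlaps dst, shamt, funct)
    }


r = decode(0x00662020)   # add $4, $3, $6 in standard MIPS encoding
print(r["src1"], r["src2"], r["dst"])   # prints: 3 6 4

i = decode(0x20C400AD)   # addi $4, $6, 173 in standard MIPS encoding
print(i["src1"], i["src2"], i["imm"])   # prints: 6 4 173
```

In hardware no work is involved here at all: the fields are just different bundles of wires coming out of the instruction memory, and the controller decides which bundles to use.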
Now, depending on the instruction class, we use the ALU either to calculate the arithmetic result of what we have just read from the register file, or to combine what we have read from the register file with part of the instruction itself, namely the immediate field, to calculate the memory address required for a load or store. We have seen that the immediate field and the source index together specify the address of the location in the data memory. One part of the address, the base address, has to be read from the register file, and the offset is obtained from the instruction itself: the low 16 bits of the instruction, left shifted by 2 bits, and of course sign extended so that the sign of the offset stays the same. So the ALU is used for an arithmetic result, or for calculating the address of a memory location for load and store, or, as we have seen in the case of the branch-if-equal instruction, the target address of the branch has to be calculated, again by this same ALU. Fortunately, one main ALU is used in a clock cycle for only one of these purposes, depending on whether the instruction is arithmetic, load/store, or branch. So we do not need three separate ALUs for this; in any given clock cycle only one of these activities is happening. Of course, we require a couple of other ALUs. I am sorry, what I just said is not right for what we are going to do. In fact, we are going to begin with a single-cycle CPU, and we will require multiple ALUs, just as, as I hinted, there is something called instruction memory and something called data memory, two separate blocks of memory.
The reason is that for any given instruction, even an arithmetic one, we need one ALU to do the arithmetic and, at the same time, another ALU to calculate the address of the next instruction, PC plus 4. If it is a branch instruction, then in the same clock cycle we need one more ALU to add the offset, left shifted by 2, to PC plus 4. So we require multiple ALUs; we will soon get a clearer picture of that. After the address has been calculated, in the case of memory instructions such as load and store, we actually access the data memory by supplying that address, and either read data from the data memory for a load or write data into the data memory for a store. In the meanwhile, during the same clock cycle, one of the ALUs has computed PC plus 4, the incremented PC. This is a simple ALU that essentially adds a constant, so it can be an optimized adder. Its result is kept ready to be loaded into the program counter at the end of the clock cycle. So the PC is updated with either the target address or PC plus 4. PC plus 4 is the default; in the case of a branch, if the branch condition is satisfied, another ALU has added the offset to PC plus 4, and that result is ready to be loaded into the PC.
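The choice of the next program counter value described above can be sketched as a small function. This is an illustrative sketch only; `next_pc` and its parameter names are hypothetical, and both candidate values are computed unconditionally to mirror the two separate adders working in the same clock cycle.

```python
MASK32 = 0xFFFFFFFF


def next_pc(pc: int, imm16: int, is_branch: bool, condition_holds: bool) -> int:
    """Value to be loaded into the PC at the end of the clock cycle."""
    # Adder 1: the specialized adder that always adds 4.
    pc_plus_4 = (pc + 4) & MASK32

    # Adder 2: PC + 4 plus the sign-extended immediate shifted left by 2.
    imm16 &= 0xFFFF
    offset = imm16 - 0x10000 if imm16 & 0x8000 else imm16
    branch_target = (pc_plus_4 + (offset << 2)) & MASK32

    # Multiplexer: PC + 4 is the default; the branch target is chosen only
    # for a branch instruction whose condition holds.
    return branch_target if (is_branch and condition_holds) else pc_plus_4
```

A non-branch instruction at `0x100` yields `0x104`; a taken branch with immediate 3 yields `0x110`, and with immediate `0xFFFF` (that is, -1) it yields `0x100`, branching back to itself.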
So we will soon see what kind of components we require in the data path. We require a program counter, a register that is updated at the end of every clock cycle with PC plus 4, with a branch target address, or with the destination address in the case of an unconditional jump. We definitely require memory, but here we see that we require instruction memory and data memory as separate blocks. The reason for that is more or less obvious; I will mention it in a minute or so. Then we require a collection of registers organized as an array, which in standard terminology we call the register file. Then an ALU; we will require more ALUs, as we will see soon. We can say more about the individual components a bit later. This is what an abstract but simple picture of the microarchitecture looks like. Note that this is only the data path; the controller we will describe a bit later. Again you see the roles of the program counter, instruction memory, register file, ALU and data memory. In addition you see two more ALUs which are more specialized. One of them, in the top left portion of this slide, is a very specialized adder that adds 4; evidently it computes PC plus 4, the default next program counter value. But if the instruction is a branch, there should be a facility for updating this PC plus 4 by adding to it a part of the instruction, the immediate field shifted left by 2, using another adder, and that result should be routed back to the PC. This is just a picture; we are not yet looking at the data flow, which we will do over the next couple of slides. But look at the wiring here: the contents of the PC go to the specialized adder that adds 4.
At the input of the program counter we have either the output of the specialized adder that adds 4 or the output of the next ALU, which adds the shifted immediate offset. At the output of the instruction memory there are 32 lines, because we read out an instruction that is 32 bits long, and a few of those bits go to the register file. Specifically, the three sets of wires that you see are bunches of 5 wires, that is, 5 bits each: two specify the source indices, source 1 and source 2, and the third bunch of 5 bits is used in certain cases, such as a register-type instruction like add, where we have seen there is a role for it. The green ovals are multiplexers, drawn in this funny way; it is a convention used in the book by Patterson and Hennessy, and we will have to get used to it, since it keeps the diagram less cluttered and more abstract. We will come to those anyway. The blue blocks are left shift by 2, because we know that we have to take a portion of the instruction, namely the lowest 16 bits, and shift it left in certain situations: when we use it as a word offset in the memory address calculation for load and store, or in the branch target calculation for an instruction like branch if equal. Then, at the output of the register file, the two registers corresponding to the two source indices are read out, and they are typically used as inputs to this ALU.
So this one is used as the first input and this one is typically used as the second input, but there is a multiplexer here, which tells us that the second input to the ALU can also come from elsewhere: from the left-shifted version of part of the instruction, that is, the immediate field shifted by 2 bits. This happens, as you can see, for load and store instructions. Similarly, this other left shifter is used for calculating the branch target address, adding the immediate field shifted by 2 to the PC plus 4 that has already been computed by this particular adder. And you see that this multiplexer passes either PC plus 4 or the branch target address calculated with the help of the immediate offset. This multiplexer is controlled by whether the instruction is a branch and whether the branch condition has been found to hold; depending on that, it chooses whether this one or that one is sent through. Coming back to the multiplexer at the second input of the ALU: I have not drawn the complete picture, and there are a couple of other sources. In fact the picture is a bit incomplete, because it shows the immediate field being left shifted by 2, which is right for address calculation in load and store instructions, but for the add immediate instruction the 16 bits of the instruction have to be sent directly to the second port of the ALU; they should not be shifted left by 2. So I should show another alternative path to the second input of the ALU. In this manner we can describe the wiring of the data path components.
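The multiplexer at the second ALU input, with the extra unshifted path that the slide omits, can be sketched as below. This is a hedged sketch: the enum `AluSrc`, its member names and the function `alu_second_input` are hypothetical, and the unshifted immediate is assumed to be sign extended to 32 bits.

```python
from enum import Enum

MASK32 = 0xFFFFFFFF


class AluSrc(Enum):
    REGISTER = 0      # second output of the register file (R-type)
    IMM_SHIFTED = 1   # sign-extended immediate << 2 (load/store in this lecture)
    IMM = 2           # sign-extended immediate, unshifted (add immediate)


def sign_extend16(imm16: int) -> int:
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def alu_second_input(select: AluSrc, reg_data2: int, imm16: int) -> int:
    """3-to-1 multiplexer feeding the second port of the main ALU."""
    if select is AluSrc.REGISTER:
        return reg_data2 & MASK32
    if select is AluSrc.IMM_SHIFTED:
        return (sign_extend16(imm16) << 2) & MASK32
    return sign_extend16(imm16) & MASK32
```

With an immediate of 5, the shifted path delivers 20 and the unshifted path delivers 5, which is the difference between the load/store and add immediate cases.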
For example, the ALU output can be routed back as data input to the register file, so that the data which is the result of the ALU computation, say on behalf of an add, subtract or OR instruction, is brought in here and stored in the register specified by these 5 bits, and so on. Anyway, instead of describing this vaguely, we can look at more specific pictures for different instructions. In this slide I have marked in thick red the flow of data on behalf of some instruction, and you should be able to guess for which kind of instruction this flow occurs. See what is being depicted: from the program counter the address goes to the instruction memory. The program counter contents also go to this adder, the constant 4 goes to this adder, and the result of this adder is the one routed by this multiplexer back to the program counter. That means at the end of this clock cycle the program counter is updated with PC plus 4. So in this particular scenario there is no role for this other adder; what it does is of no interest to the program counter. Now look at the other part. The program counter goes as an address to the instruction memory, the contents of the instruction memory come out here, and 5 bits of them are fed here, 5 bits here, and 5 bits here. There is an interesting multiplexer here, which we will talk about a bit later. Corresponding to this pair of 5-bit indices, the register file determines which pair of registers are to be read out, one on these 32 lines and one on these 32 lines. This pair of 32-bit outputs acts as the source operands to this ALU.
The ALU works on the contents of the registers that have been read out on these lines, and the result of the ALU is sent through this multiplexer back to the register file. Again you see that there is no role for anything coming out of the data memory; it is not routed anywhere by this multiplexer. So data is being processed in the ALU, data flows through these multiplexers from the appropriate sources, and many of the data lines are inactive in the sense that they do not matter. What do you guess, then: which instruction could be causing this data movement and data processing? There could be multiple options. Clearly it is not a branch instruction, because otherwise there would have been a role for this adder. It is a register-type instruction, because you see that all three sets of 5 bits are in use: this pair of 5-bit fields indicates the pair of source registers, whose contents are available at this pair of 32-bit outputs, and the result comes back into the register file. So it is an R-type instruction, where the pair of source operands and the result are all specified with respect to the register file: the destination is in the register file and the sources are in the register file. It could be add, subtract, OR, and so on, depending on how the ALU is configured. There are control signals to the ALU which set it in addition mode, subtraction mode or a logical operation mode. So it could be an add instruction, a subtract instruction, a logical OR, a logical AND, or set less than, for example. Next we can look at this example, where the data flow is slightly different.
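The R-type data flow traced above (two registers read, ALU configured by a control signal, result written back) can be sketched as follows. This is a minimal sketch with hypothetical names (`alu`, `execute_r_type`); the ALU mode is passed as a string purely for readability, whereas real hardware would use a few encoded control bits.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x: int) -> int:
    """Interpret a 32-bit pattern as a signed two's-complement value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def alu(op: str, a: int, b: int) -> int:
    """Main ALU; `op` plays the role of the ALU control signals."""
    if op == "add":
        return (a + b) & MASK32
    if op == "sub":
        return (a - b) & MASK32
    if op == "and":
        return (a & b) & MASK32
    if op == "or":
        return (a | b) & MASK32
    if op == "slt":  # set less than, signed comparison
        return 1 if to_signed(a) < to_signed(b) else 0
    raise ValueError(f"unknown ALU operation: {op}")


def execute_r_type(regs: list, op: str, rd: int, rs: int, rt: int) -> None:
    """Read regs[rs] and regs[rt], run the ALU, write the result to regs[rd]."""
    regs[rd] = alu(op, regs[rs], regs[rt])


regs = [0] * 32
regs[5], regs[6] = 7, 3
execute_r_type(regs, "add", 4, 5, 6)   # regs[4] becomes 7 + 3
```

Only the `op` argument changes between add, subtract, OR, AND and set less than; the routing through the data path is identical, which is why the slide alone cannot tell these instructions apart.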
So again you can guess. The picture may not be complete, but this thick red line is carried over to here, which means PC plus 4 is being computed, routed back and kept ready at the input of the program counter. The program counter is a synchronous, clocked register, so at the end of the clock cycle it is updated, and at the beginning of the next clock cycle this new address is used as the address to the instruction memory; the next instruction is fetched and processed. Again, since there is no role for this adder, this cannot be a branch instruction. But you see that, of the instruction that has been fetched, 5 bits go over here and, although I have not shown it too clearly, the next 5 bits go over here; this is the destination. I have deliberately not shown the labels, because I am treating this as an exercise for you to do some guesswork and get more familiar with the data path; it is basically quite simple and easy to work out from scratch. So there is a 5-bit index specifying one source operand, and the next 5 bits are used as the destination index. Now look at the ALU. As one of its source operands, the ALU takes the output of the register file for the first source, the register specified by this 5-bit index. The second 32-bit output of the register file is of no interest; the other input of the ALU comes through this multiplexer, which is passing an input that is clearly derived from the low 16 bits of the instruction that has been read out.
So it is the immediate field, shifted left by 2 bits, because it passes through this left-shift combinational block (it could be a barrel shifter). The result of the ALU in this case must clearly be an address: it is sent out to the data memory, the data memory is accessed at that address, and the contents of the data memory are routed by this multiplexer. Unlike the previous case, this multiplexer is taking the contents of the data memory, whereas for the add or subtract instruction it was taking the result of the ALU and sending it to the data input port of the register file. So this is the 32-bit data that will be latched into the register specified by this destination field. Over here, this multiplexer is the one that blocks the 32-bit register data, which is irrelevant, and instead lets the appropriately shifted immediate field into the second port, so that the ALU computes a memory address. Since the data memory is being read, it must be a load instruction; if the data memory were being written, it would be a store instruction. So this configuration of the data path must be for the load word instruction. Now, what about this one? As you can guess, there is a role for the data memory: something is being written to it. The address is provided here and the data is provided here. Where is the data coming from? It comes from the second output of the register file, which is the contents of the register specified by the second set of 5 bits over here.
There is no role for a destination index here, because the register file is not being written. Instead, two registers are read out from the register file. The first is used as the base address, which is added to the word offset to form the address for the data memory, and the data to be stored into the data memory comes from the second register, specified by this set of 5 bits. Again there is no role for the branch-related area. So this is the data path for the store instruction. In this way one can analyze the data path and see that it is more or less adequate. Of course, one or two minor things are missing. For example, as I mentioned, if we were to show how the data flows for the add immediate instruction, we would require the low 16 of the 32 bits coming from the instruction memory to be routed, without shifting, into the second port of the ALU. So this multiplexer has to be big enough to send through either the 32-bit register output, or the 16-bit immediate field shifted (with sign extension, which I did not remark on), or the immediate field without shifting. These three possibilities have to be supported, so it has to be at least a 3-to-1 multiplexer; this one has to be a 2-to-1 multiplexer, this is another 2-to-1 multiplexer (2 inputs, 1 output), and so on. As an exercise, you can trace out the configuration of the data path, that is, the data movement, for the branch instruction, branch if equal; it is quite simple. With this I will stop, with just one last comment on performance: the longest delay is the one that determines the clock period.
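The load and store data flows traced above differ only in the direction of the data memory access and in whether the register file is written. A minimal sketch, assuming the lecture's word-offset convention and modelling data memory as a Python dictionary keyed by address (all names here are hypothetical):

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm16: int) -> int:
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def effective_address(regs: list, rs: int, imm16: int) -> int:
    """ALU step shared by load and store: base register + (immediate << 2)."""
    return (regs[rs] + (sign_extend16(imm16) << 2)) & MASK32


def load_word(regs: list, dmem: dict, rt: int, rs: int, imm16: int) -> None:
    """Read data memory and write the value into register rt."""
    regs[rt] = dmem.get(effective_address(regs, rs, imm16), 0)


def store_word(regs: list, dmem: dict, rt: int, rs: int, imm16: int) -> None:
    """Write register rt into data memory; the register file is not written."""
    dmem[effective_address(regs, rs, imm16)] = regs[rt] & MASK32


regs = [0] * 32
dmem = {}
regs[8] = 0x1000                     # base address
regs[9] = 1234                       # value to store
store_word(regs, dmem, 9, 8, 2)      # writes to address 0x1000 + (2 << 2)
load_word(regs, dmem, 10, 8, 2)      # reads the same location back into regs[10]
```

Both functions share `effective_address`, just as both instructions share the path through the sign extension, the shifter and the main ALU.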
The longest delay is along the critical path, the path that causes the longest delay through the combinational logic. Here it is intuitively clear that the load instruction has the longest delay, because that is where the most data processing and data movement happens. For the load instruction, the instruction memory is read; parts of the instruction that has been read out go to the register file; the register file takes its time and supplies one operand to the ALU, while the other operand comes from the instruction itself; the ALU computes the address of the memory location from which we have to read the data; that address is sent to the data memory, which takes its own time to read out the data; and that data has to be routed to the register file. So there are some 5 sub-stages in a load instruction. This seems to be the longest instruction, longer than store and the other instructions. Evidently the critical path is decided by the way data moves for the load instruction, and you realize that for many other instructions things are much simpler; branch if equal, for instance, completes its processing much more quickly. So a clock cycle that is long enough for the longest instruction is somewhat wasteful for instructions whose results are ready much earlier; the branch target address, for example, would have been computed much earlier. By the way, there is a role for the other ALU in the branch instruction, which you will see when you do the exercise yourself. So there is a performance issue here: the clock period has to be long enough to accommodate the longest instruction.
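The argument that the load instruction sets the clock period can be made concrete with a small calculation. The component delays below are purely hypothetical numbers chosen for illustration, not figures from the lecture; only the structure of the paths follows the discussion above.

```python
# Hypothetical component delays in picoseconds (illustrative values only).
DELAY_PS = {
    "instruction_memory": 200,
    "register_read": 100,
    "alu": 200,
    "data_memory": 200,
    "register_write": 100,
}

# Sub-stages each instruction class passes through in the single-cycle data path.
PATHS = {
    "lw": ["instruction_memory", "register_read", "alu", "data_memory", "register_write"],
    "sw": ["instruction_memory", "register_read", "alu", "data_memory"],
    "r-type": ["instruction_memory", "register_read", "alu", "register_write"],
    "beq": ["instruction_memory", "register_read", "alu"],
}

# Total combinational delay per instruction class.
path_delay = {name: sum(DELAY_PS[stage] for stage in stages)
              for name, stages in PATHS.items()}

# The slowest instruction defines the critical path and hence the clock period.
critical_instruction = max(path_delay, key=path_delay.get)
clock_period_ps = path_delay[critical_instruction]

# Time left idle in each cycle by the faster instructions.
slack_ps = {name: clock_period_ps - d for name, d in path_delay.items()}
```

With these assumed delays the load path is the longest, and the idle time computed in `slack_ps` is the waste that multi-cycle execution or pipelining aims to recover.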
It is not feasible to vary the clock period for different instructions (sorry for the spelling mistake on the slide), but we can improve the performance by multi-cycle execution or pipelining. In the next couple of lectures we will talk about a multi-cycle version of the CPU, which is where we see the role of a finite state machine, and that is what we mainly wanted to discuss. This lecture was just setting up the background of CPU microarchitecture: there is a data path, and a similar data path, with a few changes, will be adapted to multi-cycle execution with the help of a finite state machine acting as the controller of the data path. Here, if you take a closer look, control of the data path means control of the multiplexers, control of the ALU and control of the memory, and all of it is combinational: in a given cycle, knowing the instruction, we know completely how the multiplexers, the ALU and the memory have to be controlled. So there is no need for any state information inside the controller itself; the controller is purely combinational. I will stop here.
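Because the single-cycle controller is purely combinational, it can be modelled as a plain lookup from instruction class to control signals, with no stored state. The sketch below is an assumption-laden illustration: the signal names (`reg_write`, `alu_src`, `mem_read`, `mem_write`, `mem_to_reg`, `branch`) are hypothetical labels for the multiplexer, register file and memory controls discussed in this lecture, not names taken from the slides.

```python
# Control signals per instruction class (hypothetical signal names).
CONTROL_TABLE = {
    "r-type": {"reg_write": 1, "alu_src": "register",    "mem_read": 0,
               "mem_write": 0, "mem_to_reg": 0, "branch": 0},
    "lw":     {"reg_write": 1, "alu_src": "imm_shifted", "mem_read": 1,
               "mem_write": 0, "mem_to_reg": 1, "branch": 0},
    "sw":     {"reg_write": 0, "alu_src": "imm_shifted", "mem_read": 0,
               "mem_write": 1, "mem_to_reg": 0, "branch": 0},
    "beq":    {"reg_write": 0, "alu_src": "register",    "mem_read": 0,
               "mem_write": 0, "mem_to_reg": 0, "branch": 1},
}


def control_signals(instruction_class: str) -> dict:
    """Pure function of the current instruction: same input, same output."""
    return dict(CONTROL_TABLE[instruction_class])
```

The function depends only on its argument, so calling it twice with the same instruction class always gives the same signals; a multi-cycle controller, by contrast, would also need to know which step of the instruction it is in, which is where the finite state machine comes in.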