Welcome to today's lecture on instruction pipelining. In the last lecture I discussed the basic concepts of pipelining in general terms: what pipelining is, when it can be implemented, and how it can be implemented. If you remember, I told you that pipelining can be implemented for a task when that task is repeated a large number of times. And if you look at the way a computer's processor works, you will find that as soon as you turn the power on it starts executing instructions, because the basic job of a computer is to execute programs, and a program is nothing but an ordered sequence of instructions. Whether it is an application program or a systems program, a program is an ordered sequence of instructions. Later on you will understand the significance of the word "ordered": you cannot arbitrarily write a sequence of instructions; there must be some ordering depending on the application you are implementing. So instruction pipelining is very important, and we will see that it has been implemented in processors since the 1980s. Here is the outline of today's lecture on instruction pipelining. After a brief introduction I shall talk about the ideal conditions for instruction pipelining, then how the instruction pipeline can be implemented. I shall discuss the CPI of a multi-cycle implementation, then the need for pipeline registers, then the speedup achieved by using instruction pipelining. And of course there are some limits on instruction pipelining, which I shall also discuss in this lecture. As I have already mentioned, computers execute billions of instructions.
So instruction throughput is what matters in improving the performance of a processor; the throughput of a processor in executing instructions is very important. That is the reason why instruction pipelining has been used for a long time: it is the first kind of parallelism that was incorporated in processors, since the early 1980s, to enhance processor speed. I have already mentioned that pipelining is nothing but an implementation technique. An instruction set architecture (ISA) represents a specification, and that specification can be implemented in many ways: a processor with a given ISA can be non-pipelined, and a processor with the same ISA can be pipelined. The instruction set does not change, so the user's or programmer's view of the machine does not change, but there will be a difference in execution time; there will be a speedup, and so on. So the implementation is done without affecting the ISA. In the last lecture I discussed the ISA of the MIPS processor and its non-pipelined implementation, and today I shall extend it to a pipelined implementation. How can that be done? To implement an instruction pipeline, the following steps have to be performed. Number one: divide instruction execution across several stages. Here the task is the execution of an instruction, and that task has to be divided into a number of subtasks; as I mentioned, pipelining can be implemented only if a task can be divided into more than one subtask. That is one very important requirement. Second: each stage accesses only a subset of the CPU's resources.
A central processing unit has various resources such as the arithmetic logic unit, adders, registers, multiplexers, buses, and so on. A subset of these resources will be used for executing a particular subtask of an instruction, and different instructions will be in different stages simultaneously. As you saw in our earlier example, whenever several tasks are executed in a pipelined manner, different subtasks of different tasks are in different stages of execution at the same time; similarly, here the subtasks of different instructions will be in different stages of execution when we implement the instruction pipeline. Ideally, a new instruction can be issued every cycle, just as in the earlier case one task entered the system per cycle. But note that I used the word "ideally": this is the possibility in principle, but in real life we may have to deviate from this ideal condition. And the cycle time is determined by the longest stage, as I discussed at length in the last lecture: different stages may take different amounts of time to perform their subtasks, and the stage that takes the longest time decides the clock frequency of the pipelined system. These are the basic implementation issues. Now, coming to the simple RISC data path, let us consider the non-pipelined data path, which we have already discussed at length. This is the MIPS data path: the first stage performs instruction fetch, the second stage performs instruction decode and register fetch, the third stage performs execution, the fourth stage performs memory access, and the fifth stage performs write back.
So you have five different operations that are performed while executing an instruction, and it is natural for us to implement the pipeline such that each of these operations is performed in one stage. So we shall implement a pipeline with five stages. We have taken up a RISC-like processor for the pipeline implementation because of the various advantages of RISC processors; I have already highlighted in detail the differences between RISC and CISC processors, and RISC processors are simpler in terms of implementation. The key features, highlighted again: all ALU operations are performed on register operands, and there are separate instruction and data memories. We shall use two separate memories, an instruction memory where programs are stored and a data memory, and later we shall see that using these two memories in a single system facilitates easy implementation of pipelining. The only instructions that access memory are load and store; ALU operations involve only registers, that is, operands are taken from registers and results are stored back in registers for all arithmetic and logical operations. As I have already mentioned, instruction execution can be broken into the following parts: instruction fetch from instruction memory, instruction decode and operand read, instruction execution, load/store of operands, and write back of results into registers. To highlight the operation of the different cycles, you can see on the left-hand side that the instruction fetch part is shown shaded.
The operation performed in the instruction fetch cycle is loading the instruction register: the program counter supplies the address, that address is used to fetch the instruction from memory, and the instruction is loaded into the instruction register. In addition, this stage also calculates the next program counter value by adding four to the present value of the program counter; that is, NPC = PC + 4 is also computed in the instruction fetch stage. Then the next stage is instruction decode. It takes its input from the instruction register, which provides the inputs to the register file. Operand A comes from the register addressed by the field in bits 6 to 10, and operand B comes from the register addressed by the field in bits 11 to 15. These two operand addresses, for A and B, are applied to the register file. Also, in this stage the immediate value is generated: the 16-bit immediate data available as part of the instruction is sign-extended to 32 bits. Of course, what is needed depends on the kind of instruction being executed: for ALU operations on registers the immediate is not required, but for load/store instructions it is. However, the hardware is there for both reading the registers and sign-extending the immediate data. So this is the instruction decode stage.
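To make the decode-stage operations concrete, here is a small Python sketch of the field extraction and sign extension. The bit positions follow the DLX-style encoding described above (opcode in bits 0-5, operand addresses in bits 6-10 and 11-15, counting from the most significant bit); this layout is an assumption for illustration, not the exact hardware.

```python
def sign_extend(imm16):
    """Sign-extend a 16-bit immediate to 32 bits, as the decode stage does."""
    if imm16 & 0x8000:            # sign bit of the 16-bit field is set
        return imm16 | 0xFFFF0000
    return imm16

def decode(instr):
    """Extract operand register addresses and the immediate from a 32-bit word.
    Bit numbering counts from the MSB (bit 0), DLX-style."""
    rs1 = (instr >> 21) & 0x1F    # bits 6..10  -> address of operand A
    rs2 = (instr >> 16) & 0x1F    # bits 11..15 -> address of operand B
    imm = sign_extend(instr & 0xFFFF)
    return rs1, rs2, imm
```

Note that the sign extension simply replicates the sign bit into the upper 16 bits, which is what the dedicated sign-extension hardware in this stage does in parallel with the register file read.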
So this can be the second stage of our pipelined implementation of the processor; the third stage performs execution. In the execution cycle, depending on the instruction being executed, the ALU does different things. One possibility is ALUOutput = A + Imm: the value of A and the immediate data are added to generate an address. This is needed for load and store instructions, where the address is generated in this manner. The stage can also perform arithmetic and logical operations: depending on the type of instruction, ALUOutput = A func B, where the function can be addition, subtraction, multiplication, AND, OR, and so on; the two operands are applied and the operation produces the result. Then ALUOutput = A op Imm: whenever the immediate addressing mode is used, the value of A is applied to one arm of the ALU, the immediate data to the other, and the operation selected by the ALU control signals is performed; it need not be an addition, it can be any operation the ALU provides. The next case is ALUOutput = NPC + Imm: sometimes this is used to generate a target address, adding the immediate data to NPC to produce an address that is subsequently loaded into the program counter. This is required for branch and jump instructions, and of course it may be dependent on some condition.
The condition is also decided here: the zero test, that is, whether the result is zero, is computed, and depending on that, the multiplexer selects whether the next PC is taken from the branch address or from NPC (PC + 4). So this is also done in stage 3. Then, for load and store instructions, you require memory access, which is performed in stage 4. For a load, there is a load memory data (LMD) register which is loaded with the value coming from memory: the ALU output gives the memory address, that address is applied to the memory, and the data read out is stored in the LMD register. For a store, Mem[ALUOutput] = B: the register file provides the data, the value of B, the ALU provides the address, and the data is written into that memory location. There are also the conditional cases: if the condition holds, the ALU output is loaded into the program counter; else PC = NPC. So you can see that stage 4 performs all the memory operations needed for the different types of instructions. Lastly, the final stage is the write back stage. In the write back cycle a particular register has to be loaded; write back means you are essentially writing the result back into the register file. The multiplexer output provides the value to be stored, and the destination register address is supplied by the instruction itself.
The field supplied by the instruction, bits 16 to 20, gives the destination register address, and the data comes from the output of the multiplexer; that value is loaded into the proper register. For some instructions the ALU output is loaded directly, or the data can come from the load memory data register: in the case of a load, the memory data has to be written into the register. Remember the distinction: storing means moving a value from a register into memory (store), and loading means moving it from memory into a register (load). In all the relevant cases this write back operation is required. So you can see that we have divided instruction execution into five stages, identified the functions to be performed by each stage, and also identified the hardware resources required by each stage. In this way you can form the different stages and implement the pipelined system. Now, before we go for this pipelined implementation, let us make a comparison with the existing multi-cycle implementation: let us see the value of CPI in the multi-cycle implementation, and later we shall see the value of CPI in the pipelined implementation. As you know, we have different types of operations; the data manipulation operations are essentially the ALU operations. If we go for a multi-cycle implementation, meaning one cycle for each of the five steps, how many cycles are needed to perform the different arithmetic and logical operations? All arithmetic and logical operations involve only registers, so you do not require data to be read from memory.
So an ALU operation requires a first cycle for instruction fetch, a second cycle for instruction decode, and a third cycle for execution; since no memory read is involved, the memory cycle can be skipped, but you require another cycle to write the result back into the register. So instead of five cycles you require four cycles for all data manipulation operations. Now, what about data transfer? The data transfer operations involve transferring data from memory to a register or from a register to memory. How many cycles do they involve? When you perform a store, you require only four cycles, because the data comes from the register, the address is generated in the third cycle, and in the fourth cycle you perform the write operation; that means a store requires four cycles. However, a load requires five cycles. Why? Because the address is calculated in the third cycle, only in the fourth cycle do you read the data, and in the fifth cycle you write the result into the register. So five cycles are required. We find that with a multi-cycle implementation, four or five cycles are needed. Now come the conditional instructions: there also you require either four or five cycles, depending on whether the branch is taken and the target address has to be computed before jumping to that memory location. So either four or five cycles are required for the different types of instructions, and we can do some computation. Suppose branches and stores take four cycles and all other instructions five cycles.
If this assumption is made, then CPI = 0.8 × 5 + 0.2 × 4 = 4.8, because 80 percent of the instructions require five cycles and 20 percent only four. However, as I have already told you, ALU operations can be allowed to complete in four cycles. In that case the breakup is: 40 percent of the instructions are ALU operations, 20 percent are branches and stores, and you are left with 40 percent that require five cycles. So CPI = 0.4 × 5 + 0.6 × 4 = 4.4; you are getting a cycles-per-instruction of 4.4. Now, what is the objective of pipelining? The pipelined implementation helps to reduce the CPI; the objective is to reduce the value of cycles per instruction. When we go for pipelining, you will see that the CPI is reduced from 4.4 to 1. Now, one very important requirement for pipelining is pipeline registers. We have already discussed the need for pipeline registers; they are an essential part of all pipelines, and there are four groups of pipeline registers in the five-stage pipeline. For the data path we are interested in pipelining, we require four groups of registers. Each group saves the output from one stage and passes it as input to the next stage. One register group sits between instruction fetch and instruction decode; that is why its name is IF/ID. The second group is ID/EX, the third is EX/MEM, and the fourth is MEM/WB. So you require four such blocks of registers for your pipelined implementation. Each time something is computed (the effective address, the immediate value, the register contents, and so on), it is stored in these registers, so it is saved safely in the context of the instruction that needs it.
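The two CPI figures above can be checked with a quick calculation. This is just a sketch; the instruction mixes are the ones assumed in the lecture.

```python
def cpi(mix):
    """Average cycles per instruction; mix maps cycle count -> fraction of instructions."""
    return sum(cycles * frac for cycles, frac in mix.items())

# First assumption: 80% of instructions take 5 cycles, 20% take 4.
print(round(cpi({5: 0.8, 4: 0.2}), 2))   # 4.8

# Refined: ALU ops finish in 4 cycles, so 40% (loads etc.) take 5 cycles
# and the remaining 60% (ALU ops plus branches/stores) take 4.
print(round(cpi({5: 0.4, 4: 0.6}), 2))   # 4.4
```

The same helper can be reused for any instruction mix, which is handy when comparing against the pipelined case where the effective mix collapses to one cycle per instruction.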
Now, you may be wondering why only four such groups of pipeline registers are required. Why not five? Let us look at the pipeline: at each red line you require one register group, one here, one here, one here, and one here. Why not one at the beginning? The reason is that the program counter is actually serving as the register for that stage: the instruction fetch stage gets its input from the program counter, so the program counter provides the necessary information for stage one, and you do not require a separate pipeline register for it. However, for the remaining stages you require separate registers. Now let us see how the pipeline registers are used as different instructions are executed. This is the instruction fetch stage, and these are the pipeline registers, the additional registers you require apart from the registers already present in the data path. The IF/ID register sits at the interface of the instruction fetch and instruction decode stages: the instruction fetch stage fetches the instruction from memory and stores the result into the IF/ID register. As we go to the next cycle, the output of the IF/ID register provides the necessary input to the instruction decode stage, corresponding to instruction one, and at that time the first stage is performing instruction fetch for the second instruction. So the first stage is performing instruction fetch, while the result produced by the previous instruction fetch, now available in the pipeline register, is applied to the instruction decoder. In the next cycle, the instruction decode stage performs the necessary decoding and puts its result into the ID/EX register.
Similarly, the instruction fetched for the second instruction goes into the IF/ID register. If you go to the third cycle, the output of the instruction fetch goes to instruction decode for the second instruction; on the other hand, the output of the ID/EX pipeline register is applied to the execution stage for the first instruction. So you can see that information about the different instructions is stored in these pipeline registers and is used properly for performing parallel operations. In the third cycle, instruction fetch is going on for instruction three, instruction decode is going on with inputs coming from the IF/ID register, and execution of the first instruction is going on with inputs coming from the ID/EX pipeline register. At the end of the third cycle, the results produced by the three stages are again stored in the three registers, and in the fourth cycle the EX/MEM pipeline register provides the output for the memory access of the first instruction. Similarly, for the second instruction the decode was done and its output, stored in the ID/EX pipeline register, provides the input to the execution stage; and for the third instruction, instruction fetch was completed and its result, now available in the IF/ID register, is applied to instruction decode. In this way the execution continues. Typically we will not think too much about the pipeline registers; one just assumes that values are passed magically down the stages of the pipeline.
All I am trying to tell you at this point is that pipeline registers are present. I have explained in detail how they are used to save the intermediate results produced by different stages in different cycles, but subsequently we shall not bother about them: we shall assume that the information magically passes from one stage to another, results are generated, and different instructions get executed in parallel in their different stages. So these are the pipeline registers, and you can depict the pipeline in this way. Here the top row corresponds to instruction one, the next row to instruction two, the next to instruction three, and the next to instruction four; for each instruction, the resources being used are depicted. As you can see, if you consider a particular instant of time, different instructions are using different resources. For example, consider the fourth cycle: the memory operation is going on for instruction one, so the memory resource is being used in addition to the EX/MEM pipeline register; for instruction two the ALU resource is being used along with the ID/EX pipeline register; the third instruction is using the IF/ID pipeline register and the register file; and the fourth instruction is performing instruction fetch, so the instruction memory is being used.
So here you can see that the instruction memory and the data memory are both used simultaneously by different instructions. Now, you may be wondering why pipelining RISC processors is easy. I have already explained: all operands are in registers. If they were not, implementing the pipeline would be difficult, because while executing instructions you would have to fetch the operands from memory; that incorporates more complication, and that is the reason why implementing a pipeline for CISC processors is rather difficult. But it has to be done; later on we shall consider the pipelined implementation of, say, the Pentium, and see how it can be done. Then, the only operations that affect memory are loads and stores, as I have already mentioned. Although pipelining could conceivably be implemented for any architecture, it would be inefficient for CISC processors. The Pentium actually belongs to the CISC category, and its CISC instructions are internally converted to RISC-like instructions. This is just a hint: you will see that to implement pipelining, complex instructions are internally converted into RISC-like micro-operations, and then pipelining is applied. So the complex instructions cannot be pipelined directly, but you require some hardware that converts a complex CISC instruction into several simple RISC-like operations, which can then be pipelined. We shall discuss that later on; for the time being, be satisfied with this observation. And I have already discussed in detail the operation of the different stages.
Now, after incorporating the pipeline registers, this is how the data path looks. This is stage one, the instruction fetch stage, and this is the instruction decode and register fetch stage, and in between them the IF/ID pipeline registers have been incorporated. Similarly, between the instruction decode and register fetch stage and the execution stage we have put another pipeline register, ID/EX. Between the execute stage and the memory access stage another pipeline register has been incorporated, EX/MEM, and finally, between memory access and write back we have MEM/WB. Now, notice that we have not only added registers, we have also removed some. If you go back to the non-pipelined implementation, you will find that we had registers such as the instruction register and the load memory data register; those registers are no longer required, because their function is implemented by the pipeline registers. For example, the instruction register, which was required in the non-pipelined implementation, is no longer needed because the IF/ID pipeline register actually holds the instruction for the next stage.
Similarly, at the output of the data memory there was a load memory data register in the non-pipelined implementation; that is not required either, because the MEM/WB pipeline register holds the data coming out of the data memory and provides it to the multiplexer for storing into the register file in the subsequent cycle. So we can remove a few registers, but obviously we have to incorporate a larger number of more complex registers to implement pipelining. Whenever we implement this pipelining, the basic idea is that each instruction spends one clock cycle in each of the five execution stages; based on our ideal conditions we have assumed that all stages take the same time, and as a consequence each stage requires one clock cycle. So each instruction spends one clock cycle in each of the five execution stages, and during one clock cycle the pipeline can process five different instructions, which can be depicted in this manner. We have seen different ways of depicting it; this is one visualization of the pipelined execution, and both forms are used in different situations. This row corresponds to instruction one, the next row to instruction two, the third to instruction three, or, to generalize, to instructions i, i+1, i+2, and so on. In this depiction the clock number is written at the top (1, 2, 3, 4, and so on) and the names of the different stages are written below. Alternatively, you can use the other depiction, where the different instructions i, i+1, i+2 are listed in order and the different stage blocks (instruction fetch, instruction decode, ALU, memory, write back) are shown.
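The table-style visualization just described can be generated mechanically. Here is a small Python sketch that prints which stage each instruction occupies in each clock cycle; the stage abbreviations follow the lecture, and the layout details are my own choice.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_diagram(n_instructions):
    """Instruction i (0-based) occupies stage s in clock cycle i + s + 1 (1-based)."""
    n_cycles = n_instructions + len(STAGES) - 1
    header = "instr " + " ".join(f"c{c + 1:>3}" for c in range(n_cycles))
    rows = [header]
    for i in range(n_instructions):
        cells = []
        for c in range(n_cycles):
            s = c - i                      # which stage instruction i is in at cycle c
            cells.append(f"{STAGES[s]:>4}" if 0 <= s < len(STAGES) else "    ")
        rows.append(f"i+{i:<3} " + " ".join(cells))
    return "\n".join(rows)

print(pipeline_diagram(4))
```

Running this for four instructions produces the familiar staircase: each row is one instruction, shifted one cycle to the right relative to the previous row, and from cycle five onwards all five stages are busy simultaneously.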
Clock cycles are also mentioned at the top: clock cycle one, two, three, four, and so on. So this is the alternative visualization. Now coming to speedup. Assume that a multi-cycle RISC implementation has a 10 ns clock cycle, loads take five clock cycles and account for 40 percent of the instructions, and all other instructions take four cycles. I have already explained this in detail, and that is how we got a CPI of 4.4; the only thing added here is the cycle time, 10 ns. So the average instruction execution time for the non-pipelined, multi-cycle implementation is 10 ns × CPI, where CPI = 0.6 × 4 + 0.4 × 5 = 2.4 + 2.0 = 4.4, because 60 percent of the instructions require four cycles and 40 percent require five. So the average execution time is 4.4 × 10 = 44 ns. And how much time does the pipelined implementation take? Here one assumption has been made: add one nanosecond to the clock cycle to take into account the delay of the pipeline registers, since the pipeline registers involve some delay. So the cycle time becomes 10 + 1 = 11 ns, and since one instruction completes every cycle, the average execution time in the pipelined implementation is 11 ns. Therefore, the speedup is 44 / 11 = 4, when we compare the pipelined implementation with the non-pipelined multi-cycle one.
Now, instead of the multi-cycle implementation, suppose we compare against a single-cycle implementation — what is the speed-up then? For the single-cycle case, the clock period has to be 10 × 5, that is, 50 nanoseconds. Why 10 × 5? In a multi-cycle implementation the average execution time can be shorter, but in a single-cycle implementation it is necessarily longer, because the total delay of all the stages must fit within one clock period, and that total delay decides the clock frequency. So the period is 50 nanoseconds, and the speed-up is 50 / 11, which is more than 4 — roughly 4.5. Now, the above expression assumes a CPI of 1; we have assumed that the pipelined processor always completes one instruction per cycle. Do we expect this in practice? Are there any complications? We assumed some ideal conditions, and only under those ideal conditions is CPI = 1 possible. What were those ideal conditions? First, we assumed that all instructions can be divided into independent parts, each taking nearly equal time. Now, can instructions really be divided into independent parts of nearly equal duration? That is not true. Reading from a register takes much less time than reading data from memory. Similarly, an add operation takes much less time than a multiply, so even within ALU operations the execution time differs from instruction to instruction. So in practice this assumption cannot hold. The second ideal condition was related to this question: can instructions always be executed in sequence, one after the other, in the order in which they are written?
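The single-cycle comparison follows the same pattern: the clock period must cover the sum of all stage delays, as a quick sketch confirms (stage count and delays are those of the lecture example):

```python
# Single-cycle vs pipelined comparison from the lecture example.
# A single-cycle design must fit ALL five stages into one clock period,
# so with five 10 ns stages the period is 50 ns.
stage_delay_ns = 10.0
num_stages = 5
single_cycle_period = stage_delay_ns * num_stages    # 50 ns

# Pipelined: 10 ns stage + 1 ns pipeline-register overhead = 11 ns/instruction.
pipelined_time = stage_delay_ns + 1.0

speedup = single_cycle_period / pipelined_time       # 50 / 11
print(f"Single-cycle period = {single_cycle_period:.0f} ns")
print(f"Speed-up            = {speedup:.2f}")
```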
So, all I am trying to say here is in-order execution: as I said, a program is an ordered sequence of instructions, and the assumption is that they are executed in the same order in which they appear in the program. That assumption will also not always be true. Later on we shall see some specialized processors that use prediction or speculation, where, to improve the performance of the processor, you have to allow out-of-order execution; in such situations this condition is not valid. The third question is: are successive instructions independent of one another? This is a very common question. We assumed that the instructions are independent, so they can be executed in an overlapped manner without any problem. In reality that is not so: there are various kinds of dependences — data dependence, control dependence, and so on — so the instructions are not really fully independent, and this assumption is not valid either. Last but not least: is there no resource constraint? We assumed that there is no resource constraint, that unlimited resources are available. But in practice it is necessary to impose some restriction on resources to reduce the cost of implementation, so the resources have to be used optimally. Whenever these ideal conditions are not satisfied, we shall not get a CPI of 1: we have to insert bubbles, some additional clock cycles are wasted, and we deviate from CPI = 1. Our objective is to get a CPI of 1, but because of these problems we shall not get it. The last point I would like to mention is the limits on pipelining. We have seen that the speed-up is related to the number of stages K: in an ideal situation, when you are executing a large number of instructions, the speed-up approaches K.
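The effect of these bubbles on CPI can be illustrated with a small back-of-the-envelope calculation; the stall frequencies below are illustrative assumptions of mine, not numbers from the lecture, but the model (CPI = ideal CPI + average stall cycles per instruction) is the standard one:

```python
# Effective CPI when stalls (bubbles) are inserted into the pipeline.
# Model: effective CPI = ideal CPI + average stall cycles per instruction.
# The stall frequencies below are made-up illustrative numbers.

ideal_cpi = 1.0
data_hazard_stalls  = 0.20 * 1   # assume 20% of instructions stall 1 cycle (data dependence)
control_hazard_stalls = 0.15 * 2 # assume 15% of instructions stall 2 cycles (control dependence)

effective_cpi = ideal_cpi + data_hazard_stalls + control_hazard_stalls
print(f"Effective CPI = {effective_cpi:.2f}")   # 1.0 + 0.2 + 0.3
```

Even these modest assumed stall rates push the CPI well above 1, which is exactly why the ideal speed-up is not achieved in practice.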
Obviously, you will be tempted to increase the number of stages: instead of 5, why not go for 10 stages and get a speed-up of 10, or 100 stages and get a speed-up of 100? In practice that cannot be done, because, as we have seen, the primary requirement is that you have to divide instruction execution into different parts and implement each of them in hardware. Increasing the number of pipeline stages in a given logic block by a factor of K generally allows increasing the clock speed, and hence the throughput, by a factor of almost K, as I have already mentioned — usually somewhat less than K, because of overheads such as the latch (pipeline register) delays and the need to balance the delay across the stages. So we do not get exactly K; as we have seen, we got about 4 or 4.5 rather than 5. But pipelining has a natural limit: you need at least one layer of logic gates per pipeline stage, since each stage is implemented in hardware. In a typical design a stage contains several levels of logic, so the practical limit is usually a few gate delays per stage, say 2 to 10, and that puts a limit on the maximum number of stages you can have. Commercial designs are rapidly nearing this point: you will find that designers try to increase the number of pipeline stages as much as possible so as to get higher speed-up. So with this, let us come to the end of this lecture; in my next lecture we shall discuss those non-ideal conditions and their impact. Thank you.
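The diminishing returns from ever-deeper pipelines can be seen in a simple model: split a fixed total logic delay across K stages and add a fixed latch overhead per stage. The total logic delay and the 1 ns latch overhead below are illustrative assumptions chosen to match the lecture's example:

```python
# Diminishing returns from deeper pipelines: a simple first-order model.
# Clock period = (total logic delay / K) + latch overhead per stage.
# Speed-up is relative to the unpipelined (K = 1) design.
# The delay values are illustrative assumptions.

T_logic_ns = 50.0      # total logic delay of the datapath
latch_ns = 1.0         # pipeline-register (latch) overhead per stage

for k in (1, 5, 10, 25, 50):
    period = T_logic_ns / k + latch_ns
    speedup = (T_logic_ns + latch_ns) / period
    print(f"K = {k:3d}: clock period = {period:5.2f} ns, speed-up = {speedup:5.2f}")
```

Notice that the speed-up is always less than K, and once the per-stage logic delay shrinks toward the latch overhead, adding more stages buys almost nothing — which is the natural limit described above.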