lecture on pipelining. Pipelining is one of the most important and popular techniques used to enhance the performance of a processor. As you will see, it is an implementation technique realized in hardware, and it exploits a particular kind of parallelism, namely instruction-level parallelism. Before I go into the details of how pipelining is implemented in modern processors, today I shall introduce the basic concepts of pipelining. As we shall see, pipelining is a concept used not only in processors but also in our day-to-day life, in various situations and applications. So it is necessary to understand the basic concepts, and in this lecture I shall cover the following topics after a brief introduction. I shall define pipelining, then discuss how it is implemented, and examine how performance changes when pipelining is applied. Performance is usually represented by two parameters, speedup and throughput, and we shall see how both change as pipelining is introduced. We shall also see that pipelining is implemented by dividing an operation into a number of stages, and that performance depends on the number of stages; you therefore have to identify an optimal number of stages that gives good throughput and speedup based on cost and performance. Then I shall discuss two important pipelines used in implementing pipelined arithmetic units: the fixed-point multiplier pipeline and the floating-point adder pipeline. As you know, multiplication of fixed-point numbers takes a long time, so implementing it as a pipeline improves performance.
Similarly, floating-point addition is another very important operation where pipelining can be implemented to improve the speedup. Oh my God! A house is on fire. As it appears, the house is in a village where there is no fire brigade. Fortunately, as you can see, there is a pond nearby. Let us see how this fire is doused by this person. Obviously, what will he do? He will take water from the pond, put it on the fire, go back to the pond for more water, and so on, running between the burning house and the pond to put the fire out. This is how he can try to do it, but unfortunately, putting the fire out this way will take a very long time. The poor fellow may lose his house; it may be completely burnt by the time he succeeds. So let us see how the fire can be put out in an alternative manner, which was taught in our school. We were taught that whenever a house is on fire, you get hold of as many buckets and as many people as possible, and let them stand one after another between the pond and the burning house. Then buckets full of water are passed from hand to hand, from the pond towards the fire, as I show in this diagram. You can see a bucket of water being passed from the pond to the next person while, in the meantime, the first person, the one nearest the pond, fills another bucket. In this way buckets keep moving from the pond to the fire, and you can see several buckets, 1, 2, 3, 4, maybe 5, in this diagram. So five buckets of water are moving from the pond to the fire in parallel, simultaneously. Later on, when pipelining was taught to me, I realized that this is nothing but pipelining: what was taught in our village school for putting out a fire is a pipelining technique.
So this is one common example of pipelining in our day-to-day life. Now let us see another example. Here you see two alternative ways in which an engineering college can run. Consider the first approach, in which admission takes place only when a batch of students passes out of the college. A batch of students is taken; they go through the first-year course, then the second year, the third year, and the fourth year, and only when they pass out at the end of the fourth year is another batch admitted. In this case, admission takes place once every 4 years, which is obviously not a very good way of doing it: the throughput, that is, the number of batches passing out per year, is only 1/4, because one batch passes out every 4 years. Obviously, this is not what is practised in our colleges. In our colleges, students are admitted every year. As the first batch moves to the second year, another batch is admitted; as the first batch moves to the third year and the second batch to the second year, a third batch is taken; and as the first batch moves to the fourth year, the second to the third year, and the third to the second year, a fourth batch is admitted. There is some initial delay: only at the end of the fourth year does the first batch come out. But subsequently, the second batch comes out in the fifth year, the third batch in the sixth year, and so on.
So when a college has been running for a large number of years, you can ignore the first few years, and you will see that the college produces one batch per year: the throughput has increased from one-fourth to one. This is also pipelining, so we have seen two applications of pipelining in our common day-to-day life. Now let us see the basic concept behind pipelining. Whenever you are asked to define pipelining, you will say it is an implementation technique whereby multiple tasks are performed in an overlapped manner. Going back to our previous diagram, the first batch, in the fourth year, is being taught simultaneously with the second batch in the third year, the third batch in the second year, and the fourth batch in the first year. All these batches are taught simultaneously, although they are in different years, and this is essentially the basic idea of pipelining: multiple tasks are performed in an overlapped manner. In this particular case the task is teaching a batch of students, but it can be any kind of task. Now, how can pipelining be implemented? This question arises because, as we have seen, throughput increases whenever pipelining is implemented, so one naturally asks whether it can be implemented in all cases. The answer is that it can be implemented when a task can be divided into two or more subtasks which can be performed independently; this is the key idea behind implementing pipelining.
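The two admission policies can be put in numbers with a small sketch. This is purely illustrative (the function and parameter names are my own, not from the lecture), assuming a 4-year course:

```python
def years_to_graduate(batches, overlapped, course_years=4):
    """Years until the last batch graduates, for the college example.

    Sequential: a new batch is admitted only after the previous one
    passes out -> batches * course_years years in all.
    Overlapped (pipelined): one batch is admitted per year ->
    course_years + (batches - 1) years in all.
    """
    if overlapped:
        return course_years + (batches - 1)
    return batches * course_years

def throughput(batches, overlapped):
    """Long-run throughput: batches graduating per year."""
    return batches / years_to_graduate(batches, overlapped)
```

For a large number of batches the overlapped throughput approaches one batch per year, while the sequential scheme stays at one-fourth, exactly as argued above.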
So the first requirement is that you should be able to divide a task into more than one subtask, 2, 3, 4, whatever it may be; it is also necessary that these subtasks can be performed independently. In the case of the college, the teaching of batches in different years of study can be carried out independently provided we have enough infrastructure: buildings, classrooms, and teachers. That is how pipelining is implemented in a college, and that is the requirement for implementing pipelining. Now consider a single task taking time t. Whenever it is divided into k subtasks, each of them will take time t/k. So in terms of implementation, the time required for the whole task is t, and the time required to perform each subtask is t/k. We shall see, when we implement this in the context of processors, how the clock frequency and other parameters are affected.
Now, a pipeline can be implemented in two ways. The first is known as a synchronous pipeline, which is the most popular one. Here the different subtasks are performed by different hardware blocks known as stages. As we have seen, a task can be divided into k subtasks, so you will require k stages, each performing a particular subtask; obviously these stages will perform different operations, because the k subtasks are different, not the same. The result produced by each stage is temporarily buffered in a latch and then passed on to the next stage. What is important here is that not only do you divide the task into k subtasks, each performed by a particular stage, but you also have to insert a buffer between each pair of stages. Why is this necessary? Because while a stage is performing its operation, it takes its input from the latch before it, and when its result is produced it has to be temporarily stored in the latch after it; when the next task arrives, that latch provides the input to the following stage. Similarly, the output of the second stage is buffered in a latch that provides the input to the third stage. In this way the different stages get their inputs from latches; only the first stage gets its input from the primary input that is applied. And, as you can see, you will also require a clock.
This clock latches the available input into the first latch, and simultaneously the output produced by stage S1 is latched into the next buffer by the same clock. In this way the outputs of the different stages are latched into buffers simultaneously with the help of a common clock. As inputs keep coming, outputs are generated and latched, and we shall explain how data flows from input to output with the help of another diagram. Transfers between stages are simultaneous, as I have mentioned, and one task or operation enters the pipeline per cycle. This is very important: per clock cycle a new input comes and, you may say, gets admitted into the pipeline, and the clock is applied to the latches between each pair of stages. Now let us see how execution takes place. In the first clock cycle, task T1 enters the pipeline, that is, the first stage. In the second clock cycle, another task enters the first stage; in the meantime, the output of the first stage has been latched into a buffer and is applied to the second stage. So both the first and the second stage are performing some operation, but on the data of different tasks: stage one is operating on inputs coming from task T2, while stage two is operating on the intermediate result generated by the first stage, buffered in the latch in between, which is essentially the data of task T1.
In this way you can see that by the fourth clock cycle all the stages are busy: stage four is processing data corresponding to task T1, stage three is operating on data coming from task T2, stage two is working on task T3, and stage one on task T4. This is how processing takes place in an overlapped manner. Here it has been assumed that the pipeline has four stages, so four clock cycles are required before the first result comes out; thereafter a new output is generated in every clock cycle. The question naturally arises: since we have defined the synchronous pipeline, there must be an alternative, and that is known as the asynchronous pipeline. In an asynchronous pipeline, transfers are performed when the individual stages are ready. Here, you see, we do not provide a latch in between; instead there is a kind of handshaking between two stages: you can see a ready signal and an acknowledgement signal. A ready signal comes as input to stage S1, and whenever it is ready to accept new data it generates an acknowledgement, after which the new input is provided to stage S1. Similarly, stage S2 receives the input from stage S1 only after the acknowledgement has been exchanged between the two stages. In this way data is passed from one stage to another with the help of handshaking signals, in an asynchronous manner; there is no clock.
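Returning to the synchronous case: the overlapped execution just described can be sketched as a small space-time table showing which task occupies each stage in each clock cycle. This is an illustrative sketch under ideal assumptions (no hazards; all names are my own):

```python
def space_time(num_tasks, num_stages):
    """Space-time table of an ideal synchronous pipeline: entry
    [cycle][stage] is the task index occupying that stage in that
    clock cycle, or None if the stage is idle."""
    total_cycles = num_stages + num_tasks - 1
    table = []
    for cycle in range(total_cycles):
        row = []
        for stage in range(num_stages):
            task = cycle - stage  # task in this stage at this cycle
            row.append(task if 0 <= task < num_tasks else None)
        table.append(row)
    return table
```

For four tasks and four stages, the table shows only stage one busy in the first cycle, all four stages busy in the fourth cycle, and one result emerging per cycle thereafter, exactly as in the diagram.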
Obviously, in this case the time required to perform the operations of the different stages can differ; in other words, different amounts of delay may be experienced at different stages. Stage S1 may take, say, 5 nanoseconds, stage S2 may take 10 nanoseconds, and so on, a variable time, and as a consequence an asynchronous pipeline can exhibit a variable throughput rate. In our previous case, the synchronous pipeline, that is not so: a common clock is used to move data from one stage to another, and as a consequence the throughput is fixed, determined by the clock rate, the rate at which output is produced. We also assumed that each stage takes the same time, that is, that the delays of the different stages are identical; however, this cannot be achieved in practical situations. So consider stages S1, S2, S3, and so on up to k stages, with a latch between each pair and a common clock applied to all the latches, including the one at the input. How do you decide the clock, or rather the clock frequency? It depends on the cycle time, the time required to perform the operations of the different stages. Suppose the times are different: stage S1 takes time t1, stage S2 takes t2, stage S3 takes t3, and stage Sk takes tk. In such a case, how do you find the cycle time tau? The clock cycle time has to accommodate the worst-case delay of any stage, so you take the maximum, tau_m; let us assume stage m has the maximum delay. One of the stages will have the maximum delay, and that time has to be taken, and
also you have to add the small delay d introduced by the latches. This is how the cycle time of a pipelined implementation is decided: the cycle time is tau = max(t1, ..., tk) + d, and the clock frequency is derived from it as f = 1/tau. So if this is the time period, it equals tau, the clock frequency is 1/tau, and on this basis the clock frequency is decided: the worst-case stage delay decides the clock frequency. Now let us see how the speedup and throughput change. As you have seen, throughput is the number of outputs produced per clock cycle, and it will be equal to one in the ideal situation, that is, when the pipeline produces one output per clock cycle. Later we shall see that there are situations where an output cannot be produced in every cycle, because of problems known as hazards; we shall discuss pipeline hazards later, and for the time being let us assume we are dealing with an ideal pipeline. Now, how do you compute the speedup? Speedup is the ratio of the time taken by the non-pipelined implementation to the time taken by the pipelined implementation. How do you find these times? You have divided a task into k subtasks, 1 to k. Let us assume each subtask takes the same time tau, and let n be the number of tasks; we are not considering one task but the time to perform n tasks. Then one task takes k tau time in a non-pipelined implementation, and n tasks take n k tau time; the unit may be nanoseconds, milliseconds, whatever it may be.
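Before computing the speedup, the clocking rule just stated, cycle time equal to the worst-case stage delay plus the latch delay, can be written as a short sketch (illustrative; names are my own):

```python
def cycle_time(stage_delays, latch_delay):
    """Clock period tau of a synchronous pipeline: the worst-case
    stage delay plus the latch delay d."""
    return max(stage_delays) + latch_delay

def clock_frequency(stage_delays, latch_delay):
    """Clock frequency f = 1 / tau."""
    return 1.0 / cycle_time(stage_delays, latch_delay)
```

For example, with stage delays of 5, 10, 7, and 9 nanoseconds and a 1 ns latch delay, the cycle time is 11 ns, and the slowest stage alone fixes the clock rate.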
Now, the non-pipelined implementation takes time n k tau. How much time is required by the pipelined implementation? First of all, the first task requires k tau time to come out, and thereafter the remaining n minus 1 tasks produce their results at the rate of one per tau. So the pipelined implementation takes k tau plus (n minus 1) tau, that is, (k + n minus 1) tau, where n may be written capital or small, whatever it may be. What, then, is the speedup? Speedup equals n k tau divided by (k + n minus 1) tau, and tau cancels out, so we get n k / (k + n minus 1). This is a very simplified expression. Now let n tend to infinity, that is, suppose you are performing a very large number of tasks. In that case you can ignore k minus 1 with respect to n in the denominator, so the denominator becomes n, and the speedup becomes n k / n, which equals k. So in the ideal situation the speedup equals k, the number of stages. You will therefore be tempted to use as many stages as possible, but unfortunately, as we shall see later, this is not true: as the number of stages increases, the overhead keeps increasing, since you have to put in more buffers, and a point is reached where you do not get any more speedup. The speedup initially keeps increasing, but if we plot speedup against the value of k, the curve first rises and then falls.
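The speedup expression derived above, S_k = n k / (k + n minus 1), can be checked with a short sketch (illustrative):

```python
def speedup(n, k):
    """Ideal pipeline speedup for n tasks and k equal stages of
    delay tau: non-pipelined time n*k*tau divided by pipelined
    time (k + n - 1)*tau; tau cancels."""
    return (n * k) / (k + n - 1)
```

A single task gains nothing, while for a very large n the speedup approaches k but never reaches it, matching the limit argument above.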
So a point is reached where you get an optimum speedup; there is always some optimum value of k. Another point you have to remember is the memory bandwidth. In a non-pipelined processor, the rate at which data or instructions are transferred is one per k tau, the time required to perform a single operation: if the processor is executing instructions, one instruction is fetched from memory every k tau, because the CPU fetches its instructions from memory, where k is the number of stages and tau is the time per stage. That is the non-pipelined case. In a pipelined processor, on the other hand, you have to feed data at the rate of one per tau: a new input has to be provided to the processor in every clock cycle of the pipeline. So the bandwidth of the memory also has to be increased by a factor of S_k, where S_k is the speedup factor. What I am trying to say is that not only is the clock frequency of the pipelined processor about k times that of a non-pipelined processor, but the required memory bandwidth is also S_k times that of the non-pipelined processor; in other words, you will require a faster memory when it is used with a pipelined processor. Now, what are the different types of pipelines? Historically there are two: one is known as the arithmetic pipeline, the other the instruction pipeline.
In an arithmetic pipeline you perform different types of arithmetic operations, such as addition, subtraction, multiplication, and division, whether on fixed-point or floating-point numbers. These operations can be performed by arithmetic pipelines. In an instruction pipeline, on the other hand, instructions are fetched from memory and executed in the different stages of the pipeline. So ALU operations can be performed by an arithmetic pipeline, while instruction processing, that is, fetching and execution, can be performed by an instruction pipeline. Arithmetic pipelines such as floating-point multipliers were popular in general-purpose computers, and the question naturally arises: when can pipelining be used? We have seen that pipelining is useful when you are receiving continuous input and producing continuous output; for a single task it gives no benefit, because for a single task the time required is actually more than in a non-pipelined implementation. That means pipelining is suitable only when a large number of tasks are to be performed. For arithmetic operations this is not very common: we do not keep performing additions, subtractions, or multiplications continuously; an addition is performed, and perhaps another addition only after many clock cycles. However, there are situations where you do have to perform a large number of operations, for example in vector processors: when an array has to be processed and the number of array elements is large, that kind of work, usually performed by vector processors, can be done by arithmetic pipelines.
So far as the instruction pipeline is concerned, instruction pipelines are used in almost every modern processor. The reason is that whenever a computer is turned on, all its processor does is fetch an instruction from memory and execute it, fetch, execute, fetch, execute, continuously, one after the other. You have a constant stream of instructions stored in the computer's memory, fetched one after another as long as the power is on. So the instruction pipeline is a very good case for pipelining, and that is why instruction pipelining is implemented in all modern processors. But before we take up instruction pipelining, for the sake of completeness we should discuss arithmetic pipelines with two examples: first a pipelined fixed-point multiplier, and later a pipelined floating-point adder. Let us first focus on how pipelining can be implemented for a fixed-point multiplier. Suppose you are multiplying two numbers A and B, say A = 1011 and B = 11010. When you multiply by hand, you take the bits of B one at a time, starting from the least significant bit: for a 0 bit the partial product is 0000, and for a 1 bit it is a copy of A, 1011, shifted left to that bit position. So you essentially do shift and add: generate the shifted partial products and add them all up.
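The shift-and-add procedure just described can be written out as a short sketch (illustrative Python, not part of the lecture):

```python
def shift_and_add_multiply(a, b):
    """Fixed-point multiplication by shift-and-add: for each bit of
    the multiplier b (least significant first), add a correspondingly
    left-shifted copy of the multiplicand a when the bit is 1."""
    product = 0
    shift = 0
    while b:
        if b & 1:                  # this multiplier bit is 1
            product += a << shift  # add the shifted partial product
        b >>= 1
        shift += 1
    return product
```

Running it on the example, 1011 (eleven) times 11010 (twenty-six) gives 286, the same as ordinary multiplication.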
These partial products are then added. When we perform this addition by hand we may do it simultaneously, adding all the bit columns together to produce the sum, as shown in this diagram: A and B are two 8-bit numbers, and P0 is the partial product obtained by multiplying bit 0 of B with all the bits of A. In this way we get partial products P0, P1, P2, P3, P4, P5, P6, and P7, corresponding to the multiplication of the 8 bits of B with the bits of A. These are the partial products to be added, and you can see that zeros have been inserted on the right side for the shifts. By hand we add each entire column at once, but with a practical processor you cannot: an adder, in its usual form, takes two numbers and produces a sum and a carry, so you cannot really perform all these additions simultaneously as shown in the diagram. However, we can try to do it in a pipelined manner instead of doing it serially, that is, adding P0 and P1, then adding the result to P2, then that result to P3, and so on, which would take a long time. For a pipelined implementation we can use two types of adders. You are familiar with the full adder: it takes two input bits, say Ai and Bi, and a carry Cin, and produces a carry-out Cout and a sum bit Si. So the full adder adds three bits, the carry coming from the previous stage and two operand bits, and produces a sum and a carry. From full adders you can realize two types of adder: one is the ripple-carry adder, also known as the carry-propagate adder (CPA). How is it implemented? You connect several full adders in a chain; for 8-bit numbers you will have eight full adders.
So you feed A0 and B0 to the first full adder, A1 and B1 to the next, and so on up to A7 and B7, with the initial carry-in set to 0; the carry coming out of each full adder goes into the next, and you get S0, S1, up to S7, plus the final carry-out Cout. The other carries pass from one stage to the next: the carry ripples through, which is why it is called a ripple-carry adder. Two n-bit numbers are input, and an (n+1)-bit output is produced. That is the carry-propagate adder. Now there is another type of adder, which you will require for the pipelined implementation, known as the carry-save adder (CSA). A carry-save adder has three inputs: it can take three n-bit numbers A, B, and C simultaneously, and it produces two outputs, a sum which is (n+1) bits and a carry which is also (n+1) bits. Let me illustrate this with three 4-bit numbers: A = 1010, B = 0110, and C = 1111. The sum bit of each column is the exclusive-OR of the three bits in that column; working from the least significant column, the sum bits come out as 1, 1, 0, 0, giving the sum 0011, and since the result may need an extra bit position, the sum is taken as 5 bits, that is, n + 1 bits.
Similarly for the carry: a column produces a carry of 1 when at least two of its three bits are 1, and the carry from each column belongs to the next higher position, so the carry word is shifted left by one; there is no carry into the least significant position, so its bit is 0. For our example, the columns from the least significant produce carries 0, 1, 1, 1, so the carry word is 11100. So sum and carry both require five bits, that is, n + 1 bits, and you can check that sum plus carry equals A + B + C. This is how two (n+1)-bit numbers are produced by the carry-save adder. We can now combine carry-save adders and a carry-propagate adder to realize a pipelined fixed-point multiplier. As you can see, the first stage is the multiplier recoding logic: it produces the partial-product bits, essentially with a large number of AND gates. For two 8-bit numbers you get the corresponding eight partial products; with the shifts included, the first is 8 bits and the last is 15 bits, as shown here: 8 bits, then 9 bits, 10 bits, up to 15 bits. These go to the second stage, where three inputs are applied to each carry-save adder, which in turn produce their outputs; you can see that the inputs of the first carry-save adder are up to 10 bits each.
So it produces 10-bit outputs; similarly, the next three partial products, of 11, 12, and 13 bits, are applied to another carry-save adder, producing two 13-bit outputs, while the remaining two partial products go directly to the next stage. The outputs of these carry-save adders are then applied to further carry-save adders. So here there are 8 inputs in all, and after this stage they have been reduced to four values. These four values go to the third stage, where two more carry-save adders transform them into two 16-bit outputs: the first three values go to one carry-save adder, and its two outputs together with the fourth value go to the other. Finally, these two 16-bit outputs go to a carry-propagate adder, which is essentially a ripple-carry adder, to produce the 16-bit sum; a carry output would also be there, which is not shown, because here we are interested in the product, and we shall assume there is no overflow. So we get a 16-bit product from the two 8-bit numbers. In this way, with a number of carry-save adders and one carry-propagate adder, the computation is performed in four clock cycles; moreover, if you have to perform multiplications continuously, then after the first multiplication, which takes four clock cycles, one output is generated in every subsequent clock cycle. So when multiplications are performed continuously, this type of pipelined implementation can be used. Now let us focus on the pipelined floating-point adder, which is also implemented in four stages.
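Before turning to the floating-point adder, the carry-save reduction described above can be sketched at word level. This is an illustrative sketch (function names are my own); the hardware works bit-parallel on all columns at once, but the arithmetic identity a + b + c = sum + carry is the same:

```python
def carry_save_add(a, b, c):
    """One carry-save adder: the sum bits are the XOR of the three
    inputs; the carry bits are the column majority, shifted left one
    position. Invariant: a + b + c == s + carry."""
    s = a ^ b ^ c
    carry = ((a & b) | (b & c) | (a & c)) << 1
    return s, carry

def csa_multiply(a, b, bits=8):
    """Multiply by reducing the shifted partial products with CSAs
    (three values in, two out, as in the pipeline stages), then
    finish with a single carry-propagate (ripple) addition."""
    pps = [(a << i) if (b >> i) & 1 else 0 for i in range(bits)]
    while len(pps) > 2:
        x, y, z = pps.pop(), pps.pop(), pps.pop()
        s, carry = carry_save_add(x, y, z)
        pps += [s, carry]
    return pps[0] + pps[1]  # final carry-propagate add
```

On the lecture's 4-bit example, carry_save_add(1010, 0110, 1111) yields sum 0011 and carry 11100, whose ordinary sum is 11111, that is, 10 + 6 + 15 = 31.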
Before we discuss the implementation of the pipelined floating-point adder, let us see what operations we normally perform in an ordinary floating-point addition. Here the example uses decimal floating-point numbers, but it could be binary and the operations would be the same. So you see we have two floating-point numbers: one is 8.96 x 10^1, the other is 48.6 x 10^-1, and these two numbers are to be added. The first operation we do is to adjust the number having the smaller exponent, converting it into a number with the same exponent value as the larger exponent. That means 48.6 x 10^-1 is converted into 0.486 x 10^1; only when this is done can we perform the addition. This shifting can be carried out with the help of a shifter. Once the significands are adjusted, we can add them: 8.96 plus 0.486 gives 9.446, so the result is 9.446 x 10^1. Then we do a kind of normalization, in which the exponent is adjusted if there are leading zeros in the result, and finally we round off.

The same thing is done in hardware. You can see exponent subtraction being performed to find out the difference between the exponents; accordingly, the significand of the number with the smaller exponent is shifted with the help of a right shifter so that both numbers have the same exponent value. That significand and the significand of the number with the larger exponent are then added with the help of the fraction adder, and the exponent of the result, which is the maximum of the two exponents, is passed on to the third stage.
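The steps of the decimal example above can be modelled end to end in a short Python sketch. This is only an illustration in decimal, matching the example; the function name and the rounding precision are my own choices, and a real adder works on binary significands with dedicated shifters:

```python
def fp_add(a_frac, a_exp, b_frac, b_exp):
    """Decimal model of the floating-point addition steps:
    compare exponents, align the smaller operand, add, normalize, round."""
    # Step 1: exponent subtraction -- make operand a the one with the larger exponent
    if a_exp < b_exp:
        a_frac, a_exp, b_frac, b_exp = b_frac, b_exp, a_frac, a_exp
    # Step 2: right-shift the smaller operand's significand, then add
    s = a_frac + b_frac / (10 ** (a_exp - b_exp))
    exp = a_exp
    # Step 3: normalize -- shift so the significand lies in [1, 10)
    while s != 0 and abs(s) < 1:
        s, exp = s * 10, exp - 1
    while abs(s) >= 10:
        s, exp = s / 10, exp + 1
    # Step 4: round off (to four significant decimal places here)
    return round(s, 4), exp

# The lecture's example: 8.96 x 10^1 + 48.6 x 10^-1 = 9.446 x 10^1
assert fp_add(8.96, 1, 48.6, -1) == (9.446, 1)
```

In the pipelined unit, each of these four steps becomes one pipeline stage, so a new pair of operands can enter every clock cycle.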
And here, after the addition is done, if there are leading zeros in the fraction of the result, it is left-shifted. Earlier we were right shifting; now we are doing left shifting, to remove the leading zeros, and the number of shifts performed is counted. We then get a normalized result, which is passed on to the last stage. However, the exponent value has to be adjusted to take care of the number of leading zeros removed, so the exponent is adjusted there and we get the final exponent, and here we get the sum. So we get d, the fraction of the sum, and s, the exponent of the result, that is, the result d x 2^s. That means, starting with two numbers a x 2^p and b x 2^q, the adder produces d x 2^s. This is done in a pipelined manner, implemented with the help of a four-stage pipeline.

Now it is time to conclude. In this lecture we have introduced the basic concept of pipelining. We have seen what pipelining means: it is essentially an implementation technique in which multiple tasks are performed in an overlapped manner after being divided into a number of subtasks. And when can it be implemented? It can be implemented when a task can be divided into two or more subtasks which can be performed independently, as we have seen. And we have observed that the time required to perform an individual task does not decrease, but the throughput increases. So whenever we go from a non-pipelined to a pipelined implementation, the time required to perform a single task, considered individually, does not decrease.
However, the throughput, that is, the number of outputs that can be generated per unit time, increases. In this lecture we have also discussed the pipelined implementation of a fixed-point multiplier and a floating-point adder; these are essentially arithmetic pipeline implementations. But as I have told you, the most common application of pipelining is the instruction pipeline. So in the next lecture we shall discuss the pipelined implementation of instructions. Thank you.