lecture on dynamic instruction scheduling. Earlier we discussed static instruction scheduling in detail, which is done with the help of the compiler, and we discussed its limitations. Today we shall start our discussion of dynamic instruction scheduling and see its advantages and disadvantages.
Here is the outline of today's lecture. First I shall discuss the need for dynamic instruction scheduling, that is, why it is necessary. We shall see that it is a kind of data flow execution, which allows out-of-order execution. In particular, in this lecture I shall discuss a technique known as scoreboarding, which was developed for the CDC 6600. Of course, the CDC 6600 scoreboard would be quite complicated to discuss in a classroom, so a simplified version, the scoreboard for the MIPS processor that we introduced earlier, has been considered. It is simplified, but it will definitely highlight the important characteristics of scoreboarding, because both MIPS and the CDC 6600 are based on a load-store architecture. We shall see the four stages of scoreboard control and the three components of the scoreboard, and illustrate the scoreboard operation with the help of an example.
Again, as a kind of recap: we have seen that primitive pipelined processors try to overcome data dependencies through interlocking. That means whenever there is a hazard, the processor stalls, which brings down pipeline efficiency. We have also discussed forwarding, a hardware-based approach in which operands are read not from the actual register but from the pipeline registers; with its help, the stalls due to data dependencies are minimized or overcome. And we have also discussed software scheduling, that is, ordering the execution of instructions in a program.
To improve performance, we have seen how data dependencies can be overcome or reduced by instruction scheduling. In particular, we have already discussed software-based instruction scheduling, which can be done with the help of a compiler. I have mentioned that software-based instruction scheduling is handicapped by its inability to detect many dependencies at compile time: it tries to detect dependencies at compile time, but some dependencies become known only at run time. As a consequence, its usefulness is limited.
We shall discuss dynamic instruction scheduling particularly in the context of superscalar architecture. The need arises because of the various stages of a superscalar processor: it has a fetch stage, decode stage, dispatch stage, execute stage, complete stage and retire stage. As you can see, the first part, fetch, decode and dispatch, is in order. By in order I mean that instructions are read and dispatched in the same order in which they appear in the program. They are then stored in a buffer known as the issuing buffer, which has multiple entries, and here there is a possibility of out-of-order issue; that is, the order in which instructions appear in the program can be changed. An instruction that appears later in program order than another instruction may be issued for execution earlier. So this leads to out-of-order issue, and the execution is done with the help of a number of different functional units. Since it is a superscalar processor, we shall have multiple functional units, and most of them will have different latencies.
The latencies of different functional units will not be the same: the time needed for fixed-point addition cannot be the same as that for floating-point multiplication or floating-point addition. As a consequence, the execution results will be generated out of order. So out-of-order issue leads to out-of-order execution, and the results are stored in a buffer called the completion buffer. The completion buffer allows you to produce the results in order again, so that the state of the processor is changed in such a way that it appears as if instruction execution has taken place in order. After instructions complete, their results are held in a store buffer, and then they are retired, that is, written into the registers in the order in which the instructions appear in the program. This is how a superscalar pipeline is designed, and the various functionalities I have mentioned are implemented in hardware. So dynamic instruction scheduling is very important in the context of multiple instruction issue, which is done in superscalar processors; the issue stage can be replicated, pipelined, or both.
Here, how it can be done for a CISC processor is illustrated. As I have already told you, the techniques we are discussing are applicable to RISC processors having a load-store architecture. However, this approach can also be applied to a CISC processor with some modification of the hardware. How it is done is shown in this diagram: here you have x86 instructions, then a superscalar decode unit and a superscalar translate unit.
So the instruction decode stage has been divided into two components, and the complex instructions are decomposed into simple RISC-like micro-operations. That means a single x86 instruction can be decomposed into more than one simple RISC-like micro-operation. Those micro-operations are sent to the dispatch unit. So you can see that we are not executing the complex instructions exactly as they appear in x86: the dispatch unit receives micro-operations, or RISC-like operations, and issues them to the multiple functional units. The functional units then execute them, and obviously here the order can be different. Finally, the in-order retire unit produces the results and stores them in the registers in the order in which the instructions appear in the program. So you can see how processors with complex instructions can be adapted to this approach and use RISC-like dynamic instruction scheduling.
Now, dynamic instruction scheduling is based on a very simple idea known as data flow computation. What is the basic concept of data flow computation? Execute an instruction as soon as its operands are available. Suppose you have a functional unit that requires two operands. Since we are considering a RISC processor with a load-store architecture, the sources of the operands are two registers, Ri and Rj. The basic idea in dynamic instruction scheduling is that as soon as the operands are available in these registers, they are provided to the functional unit or ALU and the instruction is executed. This is the basic idea of data flow computation, and you may have heard of the data flow machine proposed by Professor Arvind.
He also used a somewhat similar concept, but here it is applied to dynamic instruction scheduling; that is, it is based on executing an instruction as soon as its operands are available. This is very easy to say, but it is hard to implement. The hardware has to identify when the operands are available in the registers. As instruction execution progresses, it has to closely monitor the various buses and the operations being performed by the different functional units, to keep track of when an operation is complete and is going to write into a register. As soon as the result is written into a register, and if that is a source register, the hardware knows that the operand is available; if both operands are available, execution can be started. Done this way, an instruction behind a stalled instruction can proceed, provided it is not itself stalled by a dependency.
We have already discussed data dependency, particularly true data dependency. In a true data dependency, an instruction produces an output that is consumed by a subsequent instruction. In such a case there is obviously no alternative but to stall. However, there are situations where instructions are not data dependent, that is, there is no true data dependency. In this example you have three instructions: DIV.D F0, F2, F4, then ADD.D F10, F0, F8, and then a SUB.D. Obviously ADD.D has a true data dependency on DIV.D, because DIV.D produces a result in register F0 that is used by ADD.D. However, if you look at it, SUB.D has no true data dependency on the previous two instructions.
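The firing rule described here, execute an instruction as soon as its operands are available, can be sketched in a few lines of Python. This is only an illustration and not how the hardware is built: it ignores latencies, functional units and the WAR/WAW hazards, the function name `dataflow_schedule` is made up for this sketch, and the operands of SUB.D are assumed because the lecture does not state them.

```python
def dataflow_schedule(instrs, available):
    """Group instructions into steps; each step fires every instruction
    whose source registers are all available at the start of that step."""
    available = set(available)
    pending = list(instrs)
    steps = []
    while pending:
        ready = [i for i in pending if all(s in available for s in i[2])]
        if not ready:
            raise ValueError("no instruction can fire: an operand is never produced")
        steps.append([i[0] for i in ready])
        for i in ready:
            available.add(i[1])   # the result becomes visible after firing
            pending.remove(i)
    return steps


# (name, destination, sources); the SUB.D operands are assumed for illustration.
program = [
    ("DIV.D", "F0", ("F2", "F4")),
    ("ADD.D", "F10", ("F0", "F8")),
    ("SUB.D", "F12", ("F8", "F14")),
]
steps = dataflow_schedule(program, available={"F2", "F4", "F8", "F14"})
print(steps)
```

Under these assumptions DIV.D and SUB.D fire in the first step and ADD.D in the second, because ADD.D has to wait for F0; SUB.D overtakes ADD.D even though it comes later in program order.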
That means the third instruction is not data dependent on the first two. As you can see, the second instruction reads from a register and the third instruction also reads from a register, so there is no dependency between them. What can be done in such a situation? After DIV.D it is possible to execute SUB.D; in place of ADD.D you can execute SUB.D. That leads to out-of-order execution, and obviously to out-of-order completion. But out-of-order execution and completion bring in the other types of hazards. The read-after-write (RAW) hazard is due to true data dependency, but the other types, write after read (WAR) and write after write (WAW), may arise even when there is no true data dependency. So you have to overcome these other two types of hazards whenever you allow out-of-order execution or out-of-order completion.
Now, here is a convention: an instruction is considered to be in execution between the time it begins execution and the time it completes execution. In a dynamically scheduled pipeline, all instructions pass through the issue stage in order, as I have already mentioned. So it is in-order issue, but it may lead to out-of-order execution.
The advantage of dynamic scheduling is that it can handle dependencies unknown at compile time. I have already mentioned dependencies involving memory references, which cannot be detected at compile time; only at run time is it known which memory location a particular value is loaded from or stored to. The effective addresses generated by two instructions can be the same even though the instructions look different.
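The three hazard types distinguished in this part of the lecture (RAW, WAR and WAW) can be told apart purely from the destination and source registers of two instructions. The following is a minimal sketch; `classify_hazards` is a hypothetical helper, and the SUB.D operands are an assumed variant in which SUB.D writes a register that ADD.D reads.

```python
def classify_hazards(earlier, later):
    """Return the hazards `later` has with respect to `earlier`.
    Each instruction is a tuple (name, destination, sources)."""
    _, e_dest, e_srcs = earlier
    _, l_dest, l_srcs = later
    hazards = []
    if e_dest in l_srcs:
        hazards.append("RAW")   # later reads what earlier writes (true dependency)
    if l_dest in e_srcs:
        hazards.append("WAR")   # later overwrites what earlier still has to read
    if l_dest == e_dest:
        hazards.append("WAW")   # both write the same register
    return hazards


div = ("DIV.D", "F0", ("F2", "F4"))
add = ("ADD.D", "F10", ("F0", "F8"))
sub = ("SUB.D", "F8", ("F8", "F14"))   # assumed operands, chosen to show a WAR case

print(classify_hazards(div, add))   # ADD.D reads F0 produced by DIV.D
print(classify_hazards(add, sub))   # SUB.D writes F8, which ADD.D reads
print(classify_hazards(div, sub))   # no register in common
```

Only the first pair is a true data dependency; the second pair becomes a problem only once SUB.D is allowed to run ahead of ADD.D.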
As a consequence, this kind of dependency can be handled by dynamic scheduling. Another very important consequence is that the compiler is simplified, and code compiled for one pipeline can run efficiently on a different pipeline. Static instruction scheduling has to schedule instructions with a particular pipeline in mind. If the pipeline is changed, that benefit is lost: a program scheduled for one pipeline cannot be executed efficiently on another processor having a different pipeline. With dynamic instruction scheduling that problem does not arise, because the scheduling is done at run time; if the pipeline is changed, the hardware automatically takes care of it. Another approach, which I shall discuss later, is hardware speculation, which is used in modern processors. It is used to improve the performance of the processor and can lead to further performance advantages, and it builds on dynamic scheduling: when I discuss hardware speculation later, we shall see that it cannot be based on static instruction scheduling; it builds on dynamic instruction scheduling.
There are two popular schemes for dynamic instruction scheduling, which were developed for two very well-known processors. The first is known as scoreboarding, which was first used back in 1964. In those days the concept of software pipelining was not known, and obviously instruction-level parallelism was restricted to the basic block.
In those days cache memory and such things were also not present. Even then they developed the technique known as scoreboarding, and that was done for the CDC 6600 computer, which had a large number of functional units, 11 functional units. The name scoreboard itself comes from this processor: the CDC 6600 designers gave the name scoreboard to this hardware-based dynamic scheduling approach. Later, in 1966, another approach was developed for the IBM 360/91 by Tomasulo, a scientist working at IBM. The IBM 360/91 was a very popular machine, and this approach was developed particularly to improve the performance of its floating-point units. I shall discuss both approaches one after the other, but to start with let me focus on scoreboarding, because it is a little closer to in-order execution; later I shall discuss Tomasulo's approach, which allows out-of-order execution and out-of-order completion.
Scoreboarding is a technique that allows instructions to execute out of order when there are sufficient resources and no data dependences. That means the scoreboard checks two things. The first is structural hazards, which arise because of the limited resources available in the processor; if resources are not available, an instruction obviously cannot be scheduled. So structural hazards are handled by looking at the resources available in the processor. Second, it also checks for true data dependency; whenever there is a true data dependency, the instruction is also stalled.
In both cases, that is, if enough resources are not available or if an instruction is waiting for a result generated by an already scheduled instruction, the instruction is stalled. So the way the scoreboard resolves hazards is by stalling. WAR and WAW hazards, which did not exist in the in-order pipeline, can arise in dynamically scheduled processors, as I have already mentioned. The goal is to maintain an execution rate of one instruction per cycle; that was the basic objective of scoreboarding, while overcoming the WAR and WAW hazards that can arise in this approach.
Every instruction goes through a special piece of hardware known as the scoreboard. The scoreboard constructs the data dependencies of the instruction; that is, it maintains a kind of database, and with its help it keeps track of the data dependencies and allows an instruction to execute only when there are none, meaning its operands are available in the registers.
Another thing I should tell you is that scoreboarding does not use forwarding. Earlier we discussed forwarding, where intermediate results are taken from the pipeline registers before they are written to the registers. Here the operands are taken from the registers themselves, not from the pipeline registers; the bypassing or forwarding technique is not used in scoreboarding. The scoreboard also controls when an instruction can write its result into the destination register: it allows the write only when no WAR hazard remains, WAW hazards having already been prevented at issue, since both can arise whenever you go for dynamic instruction scheduling.
Out-of-order execution requires multiple instructions to be in the execution stage simultaneously, which is achieved with multiple functional units, pipelined functional units, or both. Here there is really no distinction between the two: both multiple functional units and pipelined functional units allow more than one instruction to be in execution simultaneously, and logically they give the same results, as we have already seen. So in this context it does not matter whether you use multiple functional units or pipelined functional units.
All instructions go through the scoreboard, which provides centralized control of issue, operand reading, execution and write-back; all four operations are controlled by the hardware known as the scoreboard. All hazard detection is also centralized in the scoreboard.
And this is the hardware we have been eagerly waiting for: the simplified MIPS scoreboard. The CDC 6600, for which the scoreboard was developed, had eleven functional units, but the MIPS scoreboard that will be explained has only five: two floating-point multipliers, one floating-point divider, one floating-point adder and one integer unit. Here is the register bank, and you have the various buses. And this is the scoreboard, which controls the functional units and also the registers; reading of status, control of the registers and control of the functional units are all done by this centralized hardware.
To allow out-of-order execution, the ID (instruction decode) stage has been divided into two parts.
The first part is known as issue. The issue stage decodes instructions, checks for structural hazards, and issues in order if the functional unit is free and there is no write-after-write (WAW) hazard. The second part is read operands: wait until there are no data hazards, then read the operands. That is, the reading is delayed until all the hazards are overcome. In the three-instruction example you have already seen, ADD.D would stall at read operands because of its data dependency, but SUB.D could proceed with no stalls because it has no data dependency. So the scoreboard allows an instruction to execute whenever these first two conditions hold, without waiting for prior instructions; it allows out-of-order execution.
So you can see here that instruction fetch is followed by issue, and the different functional units take different amounts of time. After the operands are read, execution is done, and then write-back is performed at different instants of time. That is controlled by the scoreboard, which avoids write-after-read (WAR) hazards whenever write operations are performed. So out-of-order completion may lead to WAR and WAW hazards, and these are overcome in the CDC 6600 by stalling. For WAR: stall the write to allow the reads to take place, and read registers only during the read operands stage. That means the write is done only after the pending reads are complete. Later we shall discuss Tomasulo's algorithm, where register renaming is done; register renaming is not done in this scoreboard.
For WAW hazards: detect the hazard and stall the issue stage until the other instruction completes. We need multiple instructions in the execution phase, with multiple execution units or pipelined execution units; they are the same as far as functionality is concerned. So the ID stage is replaced by two stages, as I have already mentioned, and the scoreboard keeps track of dependences and the state of operations. That means it monitors every change in the hardware, such as whether an execution is complete, and determines when an instruction can read its operands, when it can execute and when it can write back. Hazard detection and resolution are centralized, as I have mentioned.
There are four stages of scoreboard control. Number one is issue: decode instructions and check for structural hazards, as I have already told you. The instruction is issued if a functional unit for the instruction is free and no other active instruction has the same destination register. That means the scoreboard keeps track of the already issued instructions, and if one of them has the same destination register as the new instruction, the new instruction is stalled. Otherwise the scoreboard issues the instruction to the functional unit and updates its internal data structure. If a structural or WAW hazard exists, instruction issue stalls, since, as I told you, the scoreboard's solution is essentially stalling, and no further instructions issue until these hazards are cleared.
Then read operands: wait until there are no data hazards, then read the operands. So reading of operands is done in the second stage of instruction decode, and it resolves read-after-write (RAW) hazards dynamically. A source operand is available if no earlier issued active instruction is going to write it, or if the register containing the operand is being written by a currently active functional unit. When the source operands are available, the scoreboard tells the functional unit to proceed to read the operands from the registers and begin execution.
The scoreboard resolves RAW hazards dynamically in this step, as I mentioned, and instructions may be sent into execution out of order. So this is how it allows out-of-order execution when operands are available.
The third stage is execution: operate on the operands. The functional unit begins execution upon receiving the operands, and when the result is ready it notifies the scoreboard that it has completed execution.
Finally comes the write-back stage, which finishes the execution by writing the result into the appropriate register. Once the scoreboard is aware that the functional unit has completed execution, it checks for WAR hazards. If there is none, it writes the result into the register; if there is a WAR hazard, it stalls the instruction. So you can see that it overcomes WAR hazards dynamically: writing is done only when this type of hazard is not present. In this particular example, the CDC 6600 scoreboard stalls SUB.D until ADD.D has read its operands. There is no true data dependency here: ADD.D reads an operand from a register and SUB.D writes that same register, so it is a write-after-read situation. Although SUB.D has no true data dependency, a WAR hazard can occur, and it is overcome by stalling the SUB.D instruction until the read is completed by the second instruction. This is done dynamically at the write-back stage.
So these are the four stages. In addition to these stages, the scoreboard has three parts, which together maintain the database. First of all, instruction status: which of the four steps the instruction is in.
Instructions that have been issued can be in different stages: issue, read operands, execution or write-back. So the scoreboard maintains a record of which stage each instruction is in.
Then functional unit status: it indicates the state of the functional unit, with nine fields for each functional unit, so you can see the database is quite complicated. The nine fields are: Busy; Op, the operation to perform, which can be addition, subtraction, multiplication, division and so on; Fi, the destination register; Fj and Fk, the source register numbers; Qj and Qk, the functional units producing the source registers; and Rj and Rk, flags indicating when Fj and Fk are ready and not yet read. So the scoreboard keeps track not only of the source register numbers but also of which functional units will produce those values and write them into the registers. The Rj and Rk flags record that an operand is ready but has not yet been read. So there are nine fields, Busy, Op, Fi, Fj, Fk, Qj, Qk, Rj and Rk, and we shall see how they are used when instructions are in flight.
And then you have register result status. There are 32 registers, and this indicates which functional unit will write each register, if one exists; the entry is blank when no pending instruction will write that register. So it keeps track of which functional unit will write into which register.
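The functional unit status and register result status tables described above can be pictured as two small data structures. The sketch below is an assumption-laden illustration, not the CDC 6600 design: the class name `FUStatus`, the unit names and the `issue` helper are invented for this example, and `None` stands for a blank table entry. The `issue` function applies the issue-stage rule (stall on a busy unit or on a pending write to the same destination) and then fills in the nine fields.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FUStatus:
    busy: bool = False
    op: Optional[str] = None
    fi: Optional[str] = None   # destination register
    fj: Optional[str] = None   # first source register
    fk: Optional[str] = None   # second source register
    qj: Optional[str] = None   # unit that will produce fj (None: no pending writer)
    qk: Optional[str] = None   # unit that will produce fk
    rj: bool = False           # fj is ready and not yet read
    rk: bool = False           # fk is ready and not yet read


# One entry per functional unit of the simplified MIPS scoreboard.
fu_status: Dict[str, FUStatus] = {
    name: FUStatus() for name in ("Integer", "Mult1", "Mult2", "Add", "Divide")
}
# Register result status: register -> unit that will write it.
register_result: Dict[str, Optional[str]] = {}


def issue(unit, op, fi, fj, fk):
    """Try to issue; return False (stall) on a structural or WAW hazard."""
    if fu_status[unit].busy or register_result.get(fi) is not None:
        return False
    qj = register_result.get(fj) if fj else None
    qk = register_result.get(fk) if fk else None
    fu_status[unit] = FUStatus(True, op, fi, fj, fk, qj, qk,
                               rj=fj is not None and qj is None,
                               rk=fk is not None and qk is None)
    register_result[fi] = unit
    return True


first = issue("Integer", "Load", "F6", None, "R2")    # first load of the lecture's example
second = issue("Integer", "Load", "F2", None, "R3")   # stalls: the only integer unit is busy
print(first, second, fu_status["Integer"], register_result)
```

After the first call the integer unit is busy with a load into F6, Rk is set because R2 has no pending writer, and the register result entry for F6 names the integer unit; the second load is refused, which is the structural hazard seen in the example.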
So this is the database that is maintained, and here you can see the detailed scoreboard pipeline control for each instruction status. At issue, the scoreboard waits until the required functional unit is not busy and no pending instruction is going to write the destination register; only then is the instruction issued. The bookkeeping needed to maintain the nine fields, Busy, Op, Fi, Fj, Fk, Qj, Qk, Rj and Rk, is shown here. At read operands, it waits until Rj and Rk are both yes, that is, until the operands are available. At execution complete, the condition is simply that the functional unit has finished. Then here is write result: the conditions under which writing has to be delayed are mentioned here; the scoreboard waits until the result can be written into the register, does the bookkeeping, and the functional unit is released, so that it is no longer busy.
As a consequence, WAW hazards are overcome in the issue stage and WAR hazards in the write-back stage. And of course the most common type, read-after-write hazards, which represent true data dependencies, are overcome by stalling: if operands are not available, the instruction stalls, and if functional units are not available, which is a structural hazard, it also stalls.
For the CDC 6600 there was an improvement by a factor of 1.7 for Fortran programs and 2.5 for hand-coded assembly. Of course, as I mentioned, this was done before cache memory was available. Surprisingly, the hardware for the CDC 6600 scoreboard was not very complex: it was only about equivalent to a single functional unit.
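The wait conditions for the read operands and write result steps can be written as small predicates over the functional unit status fields. This is a sketch under stated assumptions: the status is kept in plain dictionaries, the helper names are invented, and the register operands follow the common textbook version of this example (a divide computing F10 from F0 and F6, and a later add writing F6), which the lecture itself does not spell out.

```python
def unit(fi, fj, fk, qj=None, qk=None, rj=False, rk=False):
    """Status record of a busy functional unit (Op omitted for brevity)."""
    return dict(busy=True, fi=fi, fj=fj, fk=fk, qj=qj, qk=qk, rj=rj, rk=rk)


def can_read_operands(fu):
    """Read operands: both sources must be ready (resolves RAW hazards)."""
    return fu["rj"] and fu["rk"]


def can_write_result(name, units):
    """Write result: stall while another busy unit still has to read the
    old value of our destination register (WAR check)."""
    fi = units[name]["fi"]
    for other, fu in units.items():
        if other == name or not fu["busy"]:
            continue
        if (fu["fj"] == fi and fu["rj"]) or (fu["fk"] == fi and fu["rk"]):
            return False
    return True


def write_result(name, units, register_result):
    """Bookkeeping after a write: wake up waiting units, release this one."""
    for fu in units.values():
        if fu["qj"] == name:
            fu["qj"], fu["rj"] = None, True
        if fu["qk"] == name:
            fu["qk"], fu["rk"] = None, True
    register_result[units[name]["fi"]] = None
    units[name]["busy"] = False


units = {
    "Mult1": unit("F0", "F2", "F4"),                        # operands already read
    "Divide": unit("F10", "F0", "F6", qj="Mult1", rk=True),  # waiting for F0
    "Add": unit("F6", "F8", "F2"),                          # finished, wants to write F6
}
register_result = {"F0": "Mult1", "F10": "Divide", "F6": "Add"}

add_blocked = not can_write_result("Add", units)   # divide has not read F6 yet
write_result("Mult1", units, register_result)      # multiply writes F0
divide_ready = can_read_operands(units["Divide"])  # F0 and F6 are now both ready
print(add_blocked, divide_ready)
```

In this state the add cannot write F6 until the divide has read its operands, which is the WAR stall handled at write-back; once the multiply writes F0, the divide's Rj flag is set and it can proceed to read.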
However, one disadvantage is that a large number of buses is needed. We have seen that even for five functional units you have a large number of buses, because you have to do parallel reading and writing. However, if we want to issue multiple instructions per clock, more wires are needed in any case. So there is centralized hardware for hazard resolution. Another point is that the scoreboard effectively handles true data dependencies and minimizes the number of stalls due to them; anti-dependencies and output dependencies are also handled using stalls, which, as we have seen, is done in the write-back and issue stages respectively.
Now let us illustrate the operation with the help of an example. For this we shall assume that a load has a one-cycle latency, addition and subtraction have a two-cycle latency, multiplication has a ten-cycle latency and division has a forty-cycle latency. Some realistic numbers have been taken just to illustrate the example. Here is the scoreboard, where you can see its three parts: instruction status, functional unit status and register result status. And these are the instructions to be executed: two loads followed by a multiplication, a subtraction, a division and an addition. This is essentially straight-line code, a basic block; the scoreboard cannot operate beyond a basic block. Right now the instruction status, functional unit status and register result status are all empty.
Now let us start execution with the help of the scoreboard. After the first cycle, the first instruction is issued. So here it shows that it has been issued, and the functional unit status shows that the integer unit has become busy: Busy is yes, the operation to be performed is load, and the destination register is F6.
So F6 is written here, and the source register is R2. So you can see how the database is updated in cycle number 1. In the register result status, the entry for F6 shows the integer unit, that is, the functional unit that will produce the result. That is after the first cycle.
In the second cycle, the read operands step of the first load takes place, but I2 cannot be issued because the integer unit is busy. There is only one integer unit, so unfortunately the second load instruction cannot be issued; this is a structural hazard. And because issue is in order, no later instruction can be issued either. So nothing much has changed except this particular status.
In the third cycle, execution of the first load is completed, and you can see how the status changes; not much has changed compared to the previous cycle. Then it goes to the write-back stage and writes the result into F6. After the result is written, you can see that the functional unit is no longer listed here, because the functional unit is released once the result has been written into the appropriate register, F6.
Now we shall go to the fifth cycle. In the fifth cycle the second load instruction is issued, and you can see here that F2 is the register into which the result will be written.
So, the functional unit is the integer functional unit, and the F2 register will be written by this functional unit. The integer unit is again busy: it is performing this load operation, the destination register is F2, the source register is R3, and the Rk flag is yes, as shown here. Now let us go to the sixth cycle. Here you see that the third instruction has been issued, because issue does not wait for true data dependencies and the functional unit required is different. It requires a multiplier. So, the multiplier has become busy now, the various fields like destination register, source registers and so on are filled in appropriately in this database, and the register result status is also properly maintained. So, that was the sixth cycle; now we go to the seventh cycle. In the seventh cycle the second load, having read its operands, completes execution. The fourth instruction, the subtraction, is also issued now, because the adder it needs is available. So, this particular functional unit now becomes busy, and the corresponding fields are filled in appropriately: operation to perform, destination register, source registers and so on. So, you can see that the registers which will be written by the different functional units are maintained in the register result status. This is clock cycle seven; now we go to clock cycle eight. As we go to clock cycle eight, the integer unit, having completed its execution, writes the result into register F2. So, you can see that there is no longer any mention of the integer unit here, and the integer unit has become free. So, here it is no longer busy.
So, the integer unit is now free. By now the next three instructions have also been issued, and they are in different stages of completion, because we have seen that multiplication takes a longer time. Now, let us see: although the multiply double instruction was issued, it needs F2 and F4, and it was waiting for F2 to be written by the second instruction. So, it did not read its operands until this writing was complete. In the next cycle, the ninth, you will see that it reads its operands. I3 and I4 both read their operands now, because F2 is now available. F2 was not available until the eighth cycle, but in the eighth cycle the write back operation was completed and the operands are now available. And you can see that both these instructions, for which F2 is a source operand, are now reading their operands. So, both the multiply and the SUBD go through the second stage of the pipeline, that is, read operands. Accordingly, you can see here that the multiplier, the adder and the divide unit are busy, and the add instruction has not yet been issued. So, these three instructions are in progress, and the registers into which their results will be written are recorded: the multiplier will write into F0, the adder will write into F8 and the divide unit will write into F10. Now we go to cycle 11. In cycle 11 the subtraction completes execution, and the writing of its result into F8 follows; the multiply and the divide will take many more cycles. So, the multiplication will continue. We go to the 12th cycle; we have skipped a few cycles. Only in the 12th cycle is this particular instruction complete. It read its operands in the 9th cycle and execution takes 2 cycles.
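The waiting of the multiply on F2 can be sketched as follows. This is a minimal illustration, not the actual hardware: it assumes that at issue each functional unit entry records which unit, if any, will produce each source operand (the Qj and Qk fields in the usual textbook description of the scoreboard), and the function and unit names are made up for the sketch.

```python
def issue_entry(sources, register_result):
    """At issue, record for each source register the unit that will produce it,
    or None if the value is already in the register file."""
    return {src: register_result.get(src) for src in sources}


def on_write_result(entry, unit):
    """When `unit` writes its result, operands that were waiting on it become ready."""
    return {src: (None if producer == unit else producer) for src, producer in entry.items()}


def can_read_operands(entry):
    """Read operands only when no source is still waiting on a functional unit."""
    return all(producer is None for producer in entry.values())


# Cycle 6: the multiply is issued while the integer unit is still due to write F2.
multiply = issue_entry(("F2", "F4"), {"F2": "Integer"})
ready_before_write = can_read_operands(multiply)

# Cycle 8: the integer unit writes F2, so the multiply can read in the next cycle.
multiply = on_write_result(multiply, "Integer")
ready_after_write = can_read_operands(multiply)

print(ready_before_write, ready_after_write)  # False True
```

Recording the producer at issue time, rather than looking it up later, matters because a later instruction may be issued that writes the same register.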
So, in the 12th cycle the result is written into register F8, and then you will see that the add instruction can be issued in the next cycle. So, this instruction has been issued in the 13th cycle, because the adder is now free. So, all the instructions have been issued now and they are in different stages of execution; the first two instructions are complete, and the subtraction has also completed, ahead of the multiplication. So, you see that out of order execution has taken place and out of order completion has also taken place, but the writing of results has been done carefully, such that the hazards are overcome. This is the 14th cycle: the operands are available, so the add reads its operands. The add takes 2 cycles, so in the 15th cycle there is no change, and in the 16th cycle the add completes execution. But the multiplication and the divide go on; they take several cycles. Multiplication completes 10 cycles after reading its operands. So, in the 20th cycle the multiplication is complete and its result is written, but the division continues, and the corresponding databases are updated accordingly. In the scoreboard after cycle 21, you can see that all executions except the divide and the add are complete; the add also completes, in cycle 22, and only the divide is left. The add finished executing earlier, but it had to hold back its write until the divide had read its operands, because of an anti-dependency. We have skipped a large number of cycles because 40 cycles are needed by the divide. So, from the 21st cycle we go to the 61st cycle, where the divide completes execution, and then it writes the result into the register. So, everything has been done; execution is now finished. So, we have seen how the execution has been completed, and now we shall briefly mention the limitations of the scoreboard. We have seen that the amount of parallelism available among the instructions is very restricted.
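The whole example can be replayed with a small cycle-by-cycle simulation. This is a simplified sketch under stated assumptions, not the CDC 6600 or MIPS hardware: the source operands that the lecture does not name are taken from the usual textbook version of this example, the unit counts (one integer, two multipliers, one adder, one divider) are assumed, and a unit released or a register written in one cycle only becomes visible in the next cycle.

```python
# Latencies in cycles, as assumed in the lecture's example.
LATENCY = {"LD": 1, "ADDD": 2, "SUBD": 2, "MULTD": 10, "DIVD": 40}
# Functional unit class per operation and units per class (assumed).
UNIT = {"LD": "Integer", "ADDD": "Add", "SUBD": "Add", "MULTD": "Mult", "DIVD": "Divide"}
UNITS_AVAILABLE = {"Integer": 1, "Add": 1, "Mult": 2, "Divide": 1}

# (operation, destination, sources); operands not named in the lecture are assumed.
PROGRAM = [
    ("LD", "F6", ("R2",)),
    ("LD", "F2", ("R3",)),
    ("MULTD", "F0", ("F2", "F4")),
    ("SUBD", "F8", ("F6", "F2")),
    ("DIVD", "F10", ("F0", "F6")),
    ("ADDD", "F6", ("F8", "F2")),
]


def simulate(program):
    """Return (issue, read operands, execution complete, write result) cycles."""
    n = len(program)
    issue, read, done, write = ([None] * n for _ in range(4))

    def active(j, cycle):
        # Issued, and its result was not written before this cycle.
        return issue[j] is not None and (write[j] is None or write[j] >= cycle)

    cycle = 0
    while any(w is None for w in write):
        cycle += 1
        # Each issued instruction advances by at most one stage per cycle.
        for i, (op, dest, srcs) in enumerate(program):
            if issue[i] is None or write[i] is not None:
                continue
            if done[i] is not None:
                # Write result: stall while an earlier instruction has not yet
                # read a source equal to our destination (anti-dependency).
                war = any(dest in program[j][2] and (read[j] is None or read[j] >= cycle)
                          for j in range(i))
                if not war:
                    write[i] = cycle
            elif read[i] is not None:
                # Execution completes `latency` cycles after reading operands.
                if cycle == read[i] + LATENCY[op]:
                    done[i] = cycle
            else:
                # Read operands: wait for earlier producers (true dependency).
                raw = any(program[j][1] in srcs and active(j, cycle) for j in range(i))
                if not raw:
                    read[i] = cycle
        # Issue in program order, at most one instruction per cycle.
        nxt = next((k for k in range(n) if issue[k] is None), None)
        if nxt is not None:
            op, dest, _ = program[nxt]
            busy = sum(1 for j in range(n)
                       if active(j, cycle) and UNIT[program[j][0]] == UNIT[op])
            waw = any(active(j, cycle) and program[j][1] == dest for j in range(n))
            if busy < UNITS_AVAILABLE[UNIT[op]] and not waw:
                issue[nxt] = cycle
    return list(zip(issue, read, done, write))


for (op, dest, srcs), stages in zip(PROGRAM, simulate(PROGRAM)):
    print(op, dest, srcs, stages)
```

Under these assumptions the simulation reproduces the cycles mentioned above: the second load is issued in cycle 5, the multiply and the subtraction read their operands in cycle 9, the add is issued in cycle 13 and writes in cycle 22, and the divide completes execution in cycle 61.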
The reason is that the scoreboard restricts its window to a basic block of the program, and we have seen that within a basic block the instruction level parallelism is very limited; as a consequence, the performance cannot be improved much. The second limitation is the number of scoreboard entries, that is, the window size: the window does not extend beyond a branch. If execution could proceed beyond a branch, the window of instructions considered by the scoreboard could be larger, but unfortunately the number of entries in the scoreboard is very limited. The third limitation is the number and types of functional units. The number and types of functional units have to be matched to the instruction level parallelism available and to the window size. There is no point in having a very large number of functional units: that may overcome the structural hazards, but because of the other three types of hazards there may still be stalls, and the performance improvement cannot be much. So, the number and types of functional units are to be chosen carefully. The last limitation is the presence of anti-dependencies and output dependencies, which we have already seen. Anti-dependencies and output dependencies arise because of out of order execution, and the scoreboard tackles them by introducing stall cycles. That is how the scoreboard performs. So, in the next class or maybe subsequently we shall discuss another dynamic instruction scheduling approach, Tomasulo's approach, which was developed for the IBM 360/91; it is more sophisticated, and we shall see how it overcomes some of the limitations of the scoreboard. Thank you.