Hardware-based speculation. We have discussed various techniques to improve instruction-level parallelism, and we have seen that dynamic instruction scheduling can boost pipeline performance, particularly by effectively handling data hazards. We then discussed the various control hazards, and we saw that overcoming control hazards is very important for higher performance. We introduced the concept of branch prediction to overcome control hazards, and we saw how a suitable branch prediction technique can improve performance. In the last lecture we discussed dynamic instruction scheduling combined with efficient branch prediction, and we saw how it can significantly improve performance. Now, hardware-based speculation represents a subtle but very important extension of the idea of dynamic instruction scheduling with branch prediction. So it is an extension of the technique we discussed in the last lecture, branch prediction with dynamic scheduling. What it does is follow the predicted flow of data values to choose when to execute instructions. In other words, it is a kind of data-flow execution: operations execute as soon as their operands are available. So it is not very different from the technique I discussed in the last lecture, dynamic instruction scheduling along with efficient branch prediction, but there are some important differences, and we shall see what those differences are and how hardware-based speculation is introduced. The need for hardware-based speculation comes particularly from wide-issue processors. We have seen superscalar processors, which are considered wide-issue processors, meaning that more than one instruction is issued per cycle.
In such cases we find that, depending on the branch frequency present in the program, a wide-issue processor may need to execute a branch in every cycle to maintain performance, and just predicting branches may not be enough to get the desired amount of instruction-level parallelism. That means the instruction-level parallelism achievable by branch prediction alone may not be enough to keep busy the various functional units present in a superscalar processor. And we have seen that a superscalar processor has multiple functional units; it is essential to keep them busy to get higher throughput. So the solution is speculating on the outcome of branches and executing the program as if the guesses were correct. Earlier, execution did not cross the boundaries of control dependences: until a control dependence was resolved, subsequent instructions were not executed. But here we go a step further: execution continues in the speculated direction, and the processor performs fetch, issue, and execute as if the branch prediction were always correct. So we go ahead with the execution of instructions assuming that the speculation is correct. However, whenever we do that, we must provide a mechanism to handle the situation when the speculation is wrong. Speculation is a kind of guess: it may turn out to be correct or it may turn out to be wrong. Whenever it turns out to be wrong, we must take appropriate steps so that we do not get incorrect results; that means we have to undo the execution that has been performed, and how that can be done we shall discuss in detail. You may recollect Tomasulo's approach, which we discussed earlier.
There, until the controlling branches had executed, instructions were only allowed to be fetched and issued, but not actually executed. That means the controlling branch had to be executed before a dependent instruction could execute; before that, the instruction could be fetched and issued, even with multiple issue, but execution would not take place. As I mentioned, speculation takes this approach a step further: it actually executes an instruction based on the branch prediction. So here we are also doing branch prediction, but beyond the predicted branch, where earlier execution did not continue, now it does. Essentially, hardware-based speculation combines three ideas. Number one is dynamic branch prediction to choose which instructions to execute; we may use a sophisticated prediction technique such as a correlating predictor or a tournament predictor. Number two is speculation, to allow instructions to execute before control dependences are resolved. Earlier, with dynamic scheduling plus branch prediction, we did not allow this; instructions did not execute until the control dependences were resolved. This is the step we take in hardware-based speculation. Obviously, in such a situation, as I have told you, this can be done only with the ability to undo the effects of an incorrectly speculated sequence. Earlier, what was the meaning of execution? Executing an instruction meant writing the result into memory or a register. But if you do that, you cannot undo it; a permanent, irrevocable change takes place. So that has to be prevented somehow. Number three is dynamic scheduling, to deal with the scheduling of different combinations of basic blocks.
So it combines branch prediction and speculation along with dynamic scheduling to deal with the scheduling of different combinations of basic blocks. At a branch point the program can go in different directions, and you will encounter different basic blocks; in the predicted direction it will continue to execute the basic blocks. This approach has been used in almost all modern processors, such as the PowerPC, Pentium, Alpha, AMD Athlon, etc. Now what shall we try to do? We have already discussed the Tomasulo scheme in detail. But the Tomasulo scheme cannot be followed exactly; we have to make some modifications. What kind of modifications? We can extend Tomasulo's algorithm to support speculation by doing two things. Number one is to separate the bypassing of results among instructions from the actual completion of an instruction. That is, we can execute an instruction and produce a result, but the result will not be written into memory or into registers; it will still be passed to other instructions as and when they need it. The other step is to allow an instruction to execute and to bypass its results to other instructions, as I have told you, without allowing the instruction to perform any update that cannot be undone. "Cannot be undone" means that writing into a register or a memory location is a permanent change, so that is not allowed yet. Instructions that use speculative results become speculative themselves. An instruction remains speculative until the outcomes of the controlling branches are decided; then it is no longer speculative, and we know that the instruction was supposed to execute. Only when an instruction is no longer speculative do we allow it to update the register file or memory.
So we have to reach the point where an instruction is no longer speculative: the branch condition has been evaluated and the target address is known. Only when these two are known do we know whether an instruction is speculative or not; only then can we make the permanent changes, and this additional step is called instruction commit. That means we are breaking up execution into two parts: we perform the computation "in the shadow", but the permanent change takes place in an additional stage known as the commit stage. So the key idea is to allow instructions to execute out of order, but to force them to commit in order. This is a very important statement: we allow the instructions to execute out of order, but we force them to commit in order, in the order in which they appear in the program. Irrevocable actions, such as updating state or taking an exception, are performed only at commit; this prevents any irrevocable action until an instruction commits. This is the basic idea behind hardware-based speculation when it is implemented on top of the basic Tomasulo approach. We have to separate the process of completing execution from instruction commit. Completion of an instruction means the instruction has executed and produced a result; commit means that although results have been produced, they are actually written into the registers or memory locations only in the commit stage. Obviously, this will require an additional set of buffers to hold the results of instructions that have finished execution but have not committed; without an additional buffer you cannot do it.
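The key idea above, out-of-order completion but in-order commit from a FIFO buffer, can be sketched in a few lines of Python. This is a minimal illustrative model, not real hardware: the instruction labels and values are hypothetical, and the buffer only tracks readiness and a result value.

```python
from collections import deque

# A minimal sketch of in-order commit: entries sit in a FIFO buffer in
# program order, may finish execution in any order, but results become
# permanent only from the head of the buffer.
class ROBEntry:
    def __init__(self, name):
        self.name = name      # instruction label (hypothetical)
        self.ready = False    # set when execution completes
        self.value = None     # result, held here until commit

rob = deque(ROBEntry(n) for n in ["I1", "I2", "I3"])

# I3 and I1 finish execution first; I2 (say, a long multiply) still runs.
for name, val in [("I3", 30), ("I1", 10)]:
    for e in rob:
        if e.name == name:
            e.ready, e.value = True, val

committed = []
# Commit only from the head, and only if the head entry is ready.
while rob and rob[0].ready:
    committed.append(rob.popleft().name)
print(committed)   # ['I1'] -- I3 is ready but must wait behind I2

# When I2 finally completes, the rest drain in program order.
rob[0].ready, rob[0].value = True, 20
while rob and rob[0].ready:
    committed.append(rob.popleft().name)
print(committed)   # ['I1', 'I2', 'I3']
```

Notice that I3 completed execution first but committed last, exactly the separation of completion from commit described above.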
We have to keep the result values generated by instructions so that they can be passed on to other instructions, and unless we store them in some buffer we cannot do that. So we require an additional buffer, from which values are passed on to other instructions, and that buffer is known as the reorder buffer (ROB). The reorder buffer is a source of operands for instructions: it supplies operands in the interval between the completion of instruction execution and instruction commit. That means even before the memory or register update takes place, the reorder buffer provides the operands for dependent instructions, and for that duration the reorder buffer is the source of operands. Of course, we have to take exceptions and interrupts into account as well. Speculation means guess and check, so it is important that we take our best shot at predicting branch directions. This is very important: performance depends heavily on the correctness of the prediction, so we have to use a sophisticated branch prediction technique. And if we speculate and are wrong, we need to back up and restart execution at the point at which we predicted incorrectly, and this is exactly the same mechanism required for precise exceptions. Precise exceptions essentially demand this: if something goes wrong, we have to back up and restart execution at the point where the incorrect prediction was made.
So this is how precise exceptions are supported: the techniques for precise interrupts and exceptions and for speculation are essentially the same. By using in-order commit we are able to implement precise interrupts and exceptions. Exceptions are handled by not recognizing the exception until the instruction that caused it is ready to commit in the ROB. So whenever an exception is detected, it is kept pending until the instruction reaches the commit stage in the ROB. If, before committing, that particular instruction needs to be discarded, then the pending exception is also discarded; both are discarded, not only the instruction but also the exception condition. So if a speculated instruction raises an exception, the exception is recorded in the ROB, and only when the instruction commits is the exception allowed to happen. That is why a reorder buffer is present in all new processors. When we introduce reorder buffers, there are two major changes compared to the Tomasulo scheme: number one, the addition of the reorder buffer, and number two, the elimination of the store buffer. You may recall that in the basic Tomasulo approach we had separate store and load buffers, but now the store buffer is no longer required, because its function is taken care of by the entries in the reorder buffer. So this leads to the elimination of the store buffer, whose function is integrated into the reorder buffer. We thus require an additional set of registers, called the reorder buffer, to store the results of instructions that have completed "in the shadow" but not yet committed. Why are we calling it a reorder buffer?
The reason for the name is that even though instructions may complete in any order, they are reordered in the reorder buffer so that they can commit in order. Instructions are executed and their results stored, but the order in which they commit is restored with the help of this buffer, and that is why it is called a reorder buffer. It puts instructions back in order: instructions enter the ROB in program order at issue, may complete out of order, and leave the ROB in order. By "in order" or "out of order" we mean with respect to the program order, the sequence in which the instructions appear in the program. The results of an instruction become visible externally when it leaves the ROB. Until an instruction leaves the ROB, it is not visible to the outside world, meaning to the programmer executing the program. For example, if a programmer single-steps, executing instructions one after the other, then until the register or memory update is performed it will not be visible to the programmer. So the results of an instruction become visible externally when it leaves the ROB, which is when registers and memory are updated. In Tomasulo's algorithm, once an instruction writes its result, any subsequently issued instructions will find the result in the register file. With speculation, the register file is not updated until the instruction commits (I have repeated this several times), when we know definitely that the instruction should execute. Thus the ROB supplies operands in the interval between the completion of instruction execution and instruction commit. The ROB is a source of operands for instructions, just as the reservation stations were earlier; now the operands are also available from the reorder buffer.
Now, essentially, the ROB extends the architectural registers, just as the reservation stations do. As I mentioned earlier, reservation stations play the role of additional registers not present in the processor; similarly, reorder buffers effectively extend the register set, so registers not available in the processor architecture are virtually available in the reorder buffer. This schematic diagram shows how Tomasulo's hardware has been extended. You may recall that we had a store buffer here; that store buffer has been removed, and in its place we have put the reorder buffer at the top. You can see that the reorder buffer is connected to the common data bus. That means whenever any functional unit, the memory unit, the adder, or the multiplier, produces a result, it goes straight to the reorder buffer. Earlier, results went to the various buffers present in the reservation stations, but now that interconnection has been removed. Results go only to the reorder buffer, and from the reorder buffer they go to the registers and to the reservation stations. In the case of a store, the value to be written into memory also comes from the reorder buffer; that is why the store buffer is not present here. The load buffers, however, are still necessary, because they hold the load addresses computed by the address unit.
In the case of a load, the address is computed, the data comes from memory to the reorder buffer, and from there it goes to the register; this is how the load operation is performed. We cannot really get rid of the reservation stations: although we have reorder buffers, the reservation stations cannot be removed, because they provide the operands to the functional units. However, earlier we kept track of which functional unit would produce the output destined for a reservation station; here we instead keep a tag indicating which reorder buffer entry the operand will come from: reorder buffer entry 1, 2, 3, and so on. Instructions are entered in the reorder buffer in first-in first-out (FIFO) order, and the reorder buffer number is used as the tag; the operand comes from the corresponding reorder buffer entry, is stored in the reservation station, and from the reservation station it goes to the functional units. That part is unchanged, and that is the reason we cannot get rid of the reservation stations: the functions earlier performed by the reservation stations are partially, but not fully, taken over by the reorder buffer. With this basic structure in mind, let us see the different fields present in the reorder buffer. Each entry in the reorder buffer contains four fields. The first field is the instruction type. Instructions can be broadly divided into three categories: first, a branch, which has no destination result; second, a store, which has a memory address destination, meaning the value has to be stored in a memory location; a branch simply jumps to a particular address.
So a branch has no destination result. The third category is register operations, that is, ALU operations and loads. An ALU operation or a load writes its data to a register, with the value coming from the reorder buffer, as we have seen in the diagram; so ALU operations and loads are treated similarly to each other, and differently from stores, because they have register destinations. These are the three instruction types, and the type is kept in one of the fields of the reorder buffer entry. The second field is the destination: the register number for loads and ALU operations, or the memory address for stores, where the instruction result should be written. So the destination field keeps the register number in the case of a register operation and the memory address in the case of a store. The third field is the value: it holds the value of the instruction result until the instruction commits. As I have mentioned repeatedly, as a functional unit completes a computation the result comes over the common data bus to the reorder buffer, and this is the field where the value is stored. Then there is a fourth field, ready, which indicates that the instruction has completed execution and the value is ready. Whenever an instruction is issued it gets an entry in the reorder buffer, and as it goes through the different stages, execution and write (meaning write on the common data bus), it remains in that reorder buffer entry until it commits, and that is why we have to keep track of its status. "Completed execution and value is ready" means the value can now be written to the destination, either a register or a memory location.
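The four fields just described can be written down as a small data structure. The following is an illustrative sketch only; the field names are mine, not taken from any real processor's documentation.

```python
from dataclasses import dataclass
from typing import Optional, Union

# A minimal sketch of one reorder-buffer entry with the four fields
# described above: instruction type, destination, value, and ready.
@dataclass
class ROBEntry:
    itype: str                              # 'branch', 'store', or 'register' (ALU/load)
    destination: Union[str, int, None]      # register name, memory address, or None for a branch
    value: Optional[float] = None           # result, held here until commit
    ready: bool = False                     # True once execution has completed

# Example: an ALU operation destined for register F0, not yet executed.
e = ROBEntry(itype="register", destination="F0")
assert not e.ready                 # still executing, value not available

# When the functional unit broadcasts the result on the common data bus,
# the value lands in the entry and the ready flag is set.
e.value, e.ready = 42.0, True
```

A branch would be created with `destination=None` (no destination result), and a store with a memory address in `destination`, matching the three instruction categories above.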
So the destination is known, and when the ready bit becomes one, the value is written to the destination register or memory location. This, in a nutshell, is the data structure maintained inside the reorder buffer, in four fields. Now let us see. The ROB holds instructions in FIFO order, exactly the order in which they were issued; that is, the data structure is first-in first-out with respect to issue order. When an instruction completes, its result is placed into the ROB, and the ROB supplies operands to other instructions between the time execution completes and the time the instruction commits, as I have already told you. So, like the reservation stations, the ROB in effect provides more registers than the architectural registers present in the processor, and results are tagged with the ROB entry number instead of the reservation station number. That is, to keep track of where the data will be available, results are tagged with the ROB number. An instruction commits when its value, at the head of the ROB, is placed in the registers; as a result it is easy to undo speculated instructions on mispredicted branches or on exceptions, which we shall discuss in a little more detail later. Now let us look at the four steps performed in speculative execution, or hardware-based speculation. The first step is issue: issue an instruction if there is an empty reservation station and an empty slot in the ROB.
Here, a structural hazard results if either there is no empty reservation station or there is no free slot in the ROB; both have to be available, and an instruction cannot be issued until they are. The operands, if available, and the reorder buffer number for the destination are sent to the reservation station; this stage is sometimes called dispatch, so you may have heard of that term. The second step is execute: once an instruction has been issued, it can execute when both operands are ready. If one or more operands are not yet available, what has to be done? An unavailable operand will be produced by some functional unit, and that functional unit will write it on the common data bus; so we keep monitoring the common data bus and wait for the value to be computed. This avoids read-after-write (RAW) hazards; this particular step is sometimes called issue. Executing an instruction may take multiple cycles depending on the instruction type: a multiply takes more cycles than an addition, and a division takes still more cycles than a multiply. Then comes write result. Earlier, "write result" essentially meant writing into a memory location or register, but here it does not: writing means writing on the common data bus. Whenever execution is complete, the result is written on the common data bus and received by the waiting reservation stations and the ROB.
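The write-result step above, a broadcast on the common data bus picked up by every reservation station waiting on the producer's ROB tag, can be sketched as follows. This is a simplified model with made-up names; a real CDB is a hardware bus, not a function call.

```python
# A minimal sketch of the write-result step: when a functional unit
# finishes, its result is broadcast together with its ROB entry number
# (the tag), and every reservation station waiting on that tag captures
# the value. The value also lands in the ROB entry, where it is held
# until commit.

class ReservationStation:
    def __init__(self, src_tag):
        self.src_tag = src_tag   # ROB entry number this operand comes from
        self.operand = None      # filled in when the matching tag is broadcast

def broadcast_on_cdb(tag, value, stations, rob):
    """Simulate one CDB broadcast: fill waiting stations and the ROB entry."""
    for rs in stations:
        if rs.src_tag == tag:
            rs.operand = value   # RAW dependence resolved by forwarding
            rs.src_tag = None
    rob[tag]["value"] = value    # held in the ROB until commit
    rob[tag]["ready"] = True

rob = {3: {"value": None, "ready": False}}
stations = [ReservationStation(src_tag=3), ReservationStation(src_tag=7)]
broadcast_on_cdb(3, 1.5, stations, rob)
# The station waiting on ROB entry 3 received the value; the station
# waiting on entry 7 keeps monitoring the bus.
```

This also shows why results are tagged with the ROB entry number rather than a functional unit, as the lecture notes: the consumers only need to know which ROB entry will produce their operand.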
Obviously, the result goes through the ROB, and for a store it is written into the value field of the ROB entry. Since the store buffer is no longer present, a store's value is written into the value field of the ROB, and subsequently, in the commit step, the write into memory takes place: the data is stored at the corresponding memory address once the address is known. Then comes the last step, commit. Commit can be broadly divided into three cases. First, a normal commit: when an instruction is at the head of the reorder buffer and its result is present (it is a first-in first-out buffer, so every instruction reaches the head at some point as the clock progresses), the register is updated with the result and the instruction is removed from the reorder buffer. So an instruction enters the reorder buffer at the issue step, and at the end of the commit step it is removed from the reorder buffer after performing the permanent change in the processor, that is, writing into the register or memory location. Second, a store, which is similar to a normal commit except that memory is updated rather than a register. Third, a branch with an incorrect prediction: here we have to take the necessary steps because a branch was speculated past, computation was performed, and the prediction has now turned out to be incorrect. The ROB is flushed and execution is restarted at the correct successor of the branch. The commit step is sometimes called graduation.
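The third commit case, a mispredicted branch, can be sketched as below. This is an illustrative model under simple assumptions (entries after the branch in the FIFO are exactly the speculative ones; the names and addresses are hypothetical).

```python
from collections import deque

# A minimal sketch of committing a mispredicted branch: every ROB entry
# behind the branch was fetched speculatively, so the whole tail is
# discarded and fetch restarts at the correct successor.

def commit_branch(rob, correct_target, predicted_target):
    """Commit the branch at the head of the ROB; flush on misprediction."""
    branch = rob.popleft()
    assert branch["type"] == "branch"
    if predicted_target != correct_target:
        flushed = list(rob)
        rob.clear()                      # discard all speculative entries
        return correct_target, flushed   # restart fetch at the right place
    return predicted_target, []          # prediction was right: keep going

rob = deque([
    {"type": "branch"},
    {"type": "register", "dest": "F2"},  # speculative, not yet committed
    {"type": "store", "dest": 0x100},    # speculative, memory untouched
])
pc, flushed = commit_branch(rob, correct_target=0x40, predicted_target=0x80)
# The prediction was wrong: both speculative entries are flushed and
# no register or memory was ever modified, so there is nothing to undo.
```

The important point the sketch makes concrete: because the speculative instructions never updated registers or memory, "undoing" them is just discarding ROB entries.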
So, once an instruction commits, its entry in the ROB is reclaimed and the register or memory destination is updated; as I have told you, this is performed in the commit step. This figure shows how it takes place over the cycles. The first load instruction has been entered into the ROB as reorder buffer entry number one, and the destination register status is updated accordingly. You can see it is marked ROB entry number one: earlier, the tag named the execution unit that would produce the result, but here it is tagged with the ROB entry. That ROB entry will write into register F0, and the load operation being performed is shown: the destination is F0, and the value will be required by the second instruction. The load address is 10 plus the contents of R2; that is the address from which the load will take place, so the value has to be read from memory. Then the third instruction, a division: the corresponding updates are made in the reorder buffer. The addition is performed by the floating-point adder, with the corresponding entry made in its reservation station, and the division is handled by its own reservation station. The reservation station entries show where each result will go and from which reorder buffer entry each operand will be obtained.
So you can see that we are using both the reservation stations and the reorder buffer; in this way the other instructions enter the reorder buffer, the computation is performed, and the values are stored in the value field, for example the value produced after the addition. This is how the computation is done. Now, what about memory hazards? We have to overcome memory hazards, so let us see how they are avoided. WAW and WAR hazards through memory are eliminated with speculation, because the actual updating of memory occurs in order, when a store is at the head of the ROB, and hence no earlier loads or stores can still be pending. Entries are made in the reorder buffer in program order, and the commit operation is performed from the ROB, so these two classes of hazards through memory are completely eliminated. RAW hazards through memory are eliminated by two restrictions. What are the two restrictions? First, not allowing a load to initiate the second step of its execution if any active ROB entry occupied by a store has a Destination field that matches the value of the A field of the load. Second, maintaining the program order for the computation of the effective address of a load with respect to all earlier stores. In this way the RAW hazards are also overcome. These restrictions ensure that any load that accesses a memory location written to by an earlier store cannot perform the memory access until the store has written the data. So, by committing in order, we are able to overcome the different types of hazards that may occur while executing programs. Let us take up a few examples.
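The first restriction above, holding back a load while an earlier store to the same address is still pending in the ROB, can be sketched directly. This is a simplified check with illustrative names; real hardware compares address fields in parallel, and the second restriction (in-order effective-address computation) is assumed to hold so that all earlier store addresses are already known.

```python
# A minimal sketch of the first RAW-through-memory restriction: a load
# may not access memory while any earlier, still-active store entry in
# the ROB has a destination address matching the load's address.
# rob_entries lists the earlier instructions in program order.

def load_may_proceed(rob_entries, load_addr):
    """Return True only if no pending earlier store targets load_addr."""
    for entry in rob_entries:
        if entry["type"] == "store" and entry["dest"] == load_addr:
            return False          # wait until the store commits its data
    return True

rob = [
    {"type": "register", "dest": "F4"},  # ALU op, irrelevant to memory
    {"type": "store", "dest": 0x200},    # pending store to address 0x200
]
print(load_may_proceed(rob, 0x200))  # False: the load must wait
print(load_may_proceed(rob, 0x300))  # True: no address conflict
```

A refinement used in real designs (not shown here) is to forward the store's value field directly to the load instead of stalling, since the data is already sitting in the ROB.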
This is a simple example, a straight-line code: two loads, followed by a multiply, then a subtraction, a division, and an addition. In this snapshot, as you can see, only the first two instructions, the two loads, have completed and committed; their results have been written into registers F6 and F2. The first value was obtained from the address formed by adding 34 to the contents of register R2, and that value was written into register F6. Similarly, the second value was obtained from the memory location whose address was formed by adding the contents of R3 to 45, and after it was obtained it was written into its register. Although several other instructions have completed execution, the multiply is at the head of the ROB, and it will take some time to complete. The SUB.D, the fourth instruction, and the sixth instruction take less time, so they have already completed execution, but they are not allowed to commit because of the latency of the MUL.D, since committing is done in order. The multiply is at the head of the ROB, and unless it commits, those two, although they have produced results and completed execution, will not perform the commit step. The Value column indicates the value being held: you can see the values being held for those instructions.
So, the values are available and can be passed on to other instructions if they need them, but they will not be committed into the registers until MUL.D, which is at the head of the ROB at this point, commits. As shown in the figure, we do not show the entries for the load-store queue, but those entries are kept in order. So, this is a snapshot of a particular step where the multiply and divide instructions have not yet completed execution, the others have already completed, and the state of the reorder buffer and the status of the floating point registers are shown in this particular diagram. Now, let us consider another example involving a loop. Here we can proceed beyond the loop boundary in the predicted direction; this is the reorder buffer and this is the FP register status. We can see that the first two instructions, the load and the MUL.D, have committed, meaning they have written into the registers F0 and F4. All the other instructions have already completed execution, hence no reservation stations are busy and none are shown here; the remaining instructions will be committed as fast as possible. So, these instructions have completed execution but have not yet been committed; they will commit as soon as possible. The first two reorder buffer entries are empty but are shown for completeness; they are no longer busy, so you can fill in two more instructions there and they can be pushed out. Instruction three is now at the head of the buffer.
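The in-order commit behaviour we have just seen can be sketched as a small loop. This is an illustrative sketch under an assumed, simplified entry format (not from the lecture's figures): completed instructions behind the head cannot commit until the head itself finishes, which is exactly why the finished SUB.D and ADD.D must wait for the long-latency MUL.D.

```python
from collections import deque

def commit_ready(rob: deque) -> list:
    """Commit entries from the head of the ROB, stopping at the first entry
    that has not yet finished execution. Returns the committed ops in order."""
    committed = []
    while rob and rob[0]["done"]:
        entry = rob.popleft()           # only the head of the ROB may commit
        committed.append(entry["op"])   # here a real machine writes register/memory state
    return committed

rob = deque([
    {"op": "MUL.D", "done": False},  # long latency, still executing: blocks commit
    {"op": "SUB.D", "done": True},   # finished, but must wait behind the head
    {"op": "ADD.D", "done": True},   # finished, but must wait behind the head
])
```

Calling `commit_ready(rob)` in this state commits nothing; once the MUL.D entry is marked done, all three commit in program order in one sweep.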
So, this commit will be performed, then the next commit, and gradually, one after the other, the commit operations are performed. Now this is the third example; here we shall consider a superscalar issuing two instructions per cycle, and we shall show how issuing is done without speculation and with speculation. This is multiple issue without speculation. You can see we are issuing two instructions at a time across the loop: the first two instructions are issued, then the next two, and then the BNE instruction is issued. Unfortunately, in that cycle you cannot also issue the following instruction, because the LD following the BNE cannot start execution; it must wait until the branch outcome is determined. In this type of program with data-dependent branches, data produced by one iteration is used by the next, and because of that dependency you cannot issue this instruction in the same cycle. Later we shall see how this is overcome with multiple issue with speculation. So it proceeds in this way: in the different clock cycles the various operations are performed, and results are written to the common data bus (CDB). This is without speculation, so no reorder buffer is present; you are performing everything with the help of reservation stations and writing into the common data bus. Then, in the fifth clock cycle it gets the operand, in the sixth clock cycle it performs the execution, and these two instructions are dependent on that.
So, the results will be passed on to these two instructions, and in the seventh cycle they will perform their execution and memory access, and in this way it proceeds. However, as you can see, it has to wait for one cycle because of the data-dependent branch; one cycle is wasted here. Later we shall see that with speculation this cycle is not wasted, because the load does not wait for the control dependence to be resolved. Without speculation, execution proceeds only when the control dependence is resolved, and that is why this additional cycle is required: the load instruction cannot be issued unless the control dependence is resolved. It proceeds in this way; by cycle 12 this completes and provides data to the store and to the BNE, which depend on it. Then again it has to wait until the control dependence is resolved. Continuing this way, in cycle 14 the next load initiates; it requires two cycles to complete, and then it provides data to the dependent instruction, which completes execution in cycle 18 and passes its data onward. So, multiple issue without speculation requires 19 cycles: three loop iterations are executed in 19 cycles. Now let us see how long it takes when you perform multiple issue with speculation. Here the different cycles are shown in which an instruction is issued, executed, performs its read access, and commits. The first part is identical, but here you can see that this load instruction has already been issued; it did not wait for the control dependence to be resolved, because we are doing it with speculation.
So, although the execution completes in cycle 7, the instruction was issued in cycle 5; here it is issued in cycle 4 and executes in cycle 5. The advantage of speculation is demonstrated here. Proceeding this way, the computation is performed in 14 cycles, compared to the 19 cycles required earlier; you gain 5 cycles over the execution of three loop iterations by using speculation. Now, very quickly, let us look at the advantages of speculation. First, a processor with a ROB can dynamically execute code while maintaining a precise interrupt model, by flushing any pending instructions in the ROB. Second, early recovery from branch misprediction: if there is a misprediction, the processor can easily undo the speculated actions when a branch is found to be mispredicted. How does it do this? By clearing the ROB of all entries that appear after the mispredicted branch: all the instructions after the mispredicted branch are flushed out, but all the instructions before it are retained and allowed to continue. As a consequence it can recover early from branch misprediction, and the performance is heavily dependent on the branch prediction technique, which I have already discussed. Third, load and store hazards are eliminated, as I have already discussed in detail: a store updates memory only when it reaches the head of the ROB; WAW and WAR hazards are eliminated with speculation, since the actual updating of memory occurs in order, as I have told you; and RAW hazards through memory are handled by not allowing a load to initiate the second step of its execution without checking whether any active store has a destination field that matches the address of the load, as shown with the help of this instruction pair.
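The misprediction recovery described above can be sketched in a few lines. This is a hedged illustration with an assumed, hypothetical ROB representation (a plain list of entry labels in program order): everything younger than the mispredicted branch is discarded, everything at or before it is kept.

```python
def recover_from_misprediction(rob: list, branch_index: int) -> list:
    """Flush every ROB entry after the mispredicted branch at branch_index.
    Entries at or before the branch are retained and continue to commit.
    Returns the flushed (wrong-path) entries, e.g. for statistics."""
    flushed = rob[branch_index + 1:]   # speculated, wrong-path instructions
    del rob[branch_index + 1:]         # clear them from the ROB in one step
    return flushed
```

For example, if the ROB holds the first iteration plus two speculated instructions from the wrong path after the BNE, recovery drops exactly those two and leaves the pre-branch work untouched.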
So, you can see the destination address is the same for this load and store pair; this checking is performed once the effective address calculation is done. Next, exception processing: if a speculated instruction raises an exception, the exception is recorded in the ROB, as I told earlier, and is not recognized until the instruction is ready to commit. Until it reaches the head of the ROB it does not commit, and the exception is flushed along with the instruction when the ROB is cleared; if an instruction reaches the head of the ROB, it is no longer speculative. However, all of this adds significant complexity to the control unit of the processor, and as a result it leads to larger chip area and higher power dissipation. As is evident from this particular table, the Pentium processors, which are based on the superscalar architecture, require significantly larger chip area, 130 square millimetres, 180 square millimetres, 106 square millimetres, in spite of the reducing device dimensions. On the other hand, the processors which are not superscalar but VLIW, which I have already mentioned, require a significantly smaller chip area. And because of the larger chip area of the superscalar processors, their power dissipation is also higher. A comparison of heat dissipation is shown here: this corresponds to a Pentium III processor playing a DVD, and the temperature profile shown reaches 105 degrees centigrade; on the other hand, in the case of the Crusoe processor, which is based on the VLIW architecture and not a superscalar architecture, running the same program, the core temperature reaches only 48 degrees centigrade.
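Returning to the exception handling described above, it can be sketched as follows. This is a minimal illustration with an assumed entry layout (a dictionary per ROB entry; the `PreciseException` class is my own name, not from the lecture): an exception raised by a speculated instruction is only recorded in its entry, and is acted upon only when that entry reaches the head of the ROB, i.e. when it is no longer speculative.

```python
class PreciseException(Exception):
    """Raised only at commit time, so the interrupt model stays precise."""
    pass

def commit_head(rob: list):
    """Commit the head ROB entry; recognize its recorded exception only now.
    If the entry had been flushed earlier (misprediction), its recorded
    exception would simply have disappeared with it."""
    entry = rob.pop(0)                 # head of the ROB
    if entry.get("exception"):
        raise PreciseException(entry["exception"])
    return entry["op"]
```

A DIV.D that recorded "divide by zero" while speculative raises only when it commits; an exception-free instruction commits normally.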
So, we have discussed in detail the various techniques used in implementing superscalar processors. In spite of the fact that they consume larger chip area and have high power dissipation, they are very popular, but we are gradually reaching a point of diminishing returns, a point beyond which we cannot proceed, because power dissipation is reaching very high levels. So, what is the other alternative available? We shall see that the alternative is not increasing the number of pipeline stages or the clock frequency, but going for multi-core, which I shall discuss in my subsequent lectures. Before that, in my next lecture we shall start with the memory hierarchy, because the performance of a processor depends not only on the processor but also on the memory. We shall start that discussion in the next lecture. Thank you.