So, today hopefully we will wrap this up with a little bit of research direction. So, if you are interested and want to contribute, what should be your focus? That is the whole point of discussing this. As we have already mentioned, deeper pipelines need much better predictors, because your misprediction penalty increases: the distance from the time you make a prediction to the time you execute the branch grows, and those are essentially the cycles lost, because during this time you will be fetching along the wrong path. Just to give you some examples, although these are old examples: a classic RISC design has a 5-stage integer pipeline, just like what we are doing; the Alpha 21264 has a 7-stage integer pipeline; the Intel Pentium 4 Extreme Edition has a 31-plus-stage integer pipeline. This was actually when Intel peaked in terms of frequency; it was around 4 GHz. To get that frequency they were making the pipeline deeper and deeper. What this means is that in this 31-plus-stage pipeline you make a branch prediction quite early in the pipeline, and the branch executes after 31 stages. During these many pipeline stages, which means these many cycles, if you made a misprediction you will be fetching from the wrong path; essentially it is all wasted cycles. This is why, as you make your pipeline deeper and deeper, you need better predictors, more accuracy. Today's Intel processors do not have such a deep pipeline; it is much shorter, but definitely bigger than 10 stages, so you still need good predictors. So, what is the branch misprediction penalty? It is the number of cycles between prediction and verification, that is, the time at which you know whether the prediction was correct or not.
So, until verification you will believe the prediction and fetch along that path. On verification you may find that you made a mistake, in which case you have to throw out all the work that you have done and start afresh from the correct path. So, essentially on a misprediction this is the number of cycles that you lose. And remember that this is the minimum misprediction penalty; it can be larger. For example, there may be a branch which requires some data to execute, and that data may get delayed because of a cache miss or something. If you have a load instruction that loads into register R2, and then there is a branch instruction which consumes R2, then even if the branch has reached its designated pipeline stage for execution, it may not be able to execute, because R2 is not yet available: the load instruction before it is still pending because it took a cache miss and went to memory to fetch the data. So, this is the minimum; it may be larger for many branches. The deeper the pipeline, the more work is lost due to a misprediction. So, what are the challenges? These are the basic research problems in this particular area. The first one is: how do you remove destructive aliasing in the branch history tables? Remember that in the GAg example we discussed yesterday, we had severe aliasing in one of the BHT entries. The question is how you really get rid of this problem. In a sense this is really a hashing problem: you have to come up with smart hash functions. I will discuss one of these today. The second one is that there is a need for larger history, because larger history often helps: if you can look over a large window of history, it normally gives you a much better idea about what is happening.
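The impact of the misprediction penalty described above can be sketched with a tiny CPI model. This is my own simplification for illustration: the branch fraction, misprediction rate, and penalty values below are illustrative assumptions, not measurements of any real machine.

```python
# Rough CPI model: assume 1 base cycle per instruction, a fraction of
# instructions are conditional branches, and each misprediction costs a
# fixed penalty (the MINIMUM penalty; real penalties can be larger).
def effective_cpi(base_cpi, branch_frac, mispred_rate, penalty):
    return base_cpi + branch_frac * mispred_rate * penalty

# 20% branches, 5% mispredicted, 5-stage pipeline (penalty ~ 3 cycles)
shallow = effective_cpi(1.0, 0.20, 0.05, 3)
# Same workload on a ~30-stage pipeline (penalty ~ 20 cycles)
deep = effective_cpi(1.0, 0.20, 0.05, 20)
print(shallow, deep)  # roughly 1.03 vs 1.2
```

The same misprediction rate hurts the deep pipeline far more, which is why deeper pipelines demand more accurate predictors.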
However, here the problem is that there is an exponential relationship between the history length and the storage that you require. Just to remind you: if the history length is x, then the table has 2^x entries. That is the exponential relationship. I want the history to be bigger so that I can capture more context, but that makes the table taller, and this growth is exponential. That is a big problem, and that is what makes this very challenging. The question is what other kinds of branch predictors I can come up with where this exponential relationship goes away: I can have a bigger history, but a predictor which does not have this problem. Sir, a bigger history will also need more time to reach steady state. Yes, that is true, there is a learning time, but hopefully that is one-time: once you train, you are done. Every time the program goes through a phase change this will happen again, but once you learn, you can make better predictions. The third problem is better prediction for data-dependent branches. This is a very big problem. For example, you might remember from your sorting algorithms a typical comparison, something like comparing A[i] against A[j]. It is extremely difficult to predict this branch because it depends on the pattern of the data that you have in the array. Or think about comparing A[i] against some constant value. These kinds of branches are very frequent in programs. These are data-dependent branches, and these are the branches that lower your prediction accuracy. Essentially what you are asking is: how predictable is this value? And that is extremely difficult. The fourth problem is not really about direction predictors; it is about efficient handling of indirect calls. If you have a direct call, that is fairly easy to predict with the help of the branch target buffer.
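The exponential storage growth can be made concrete with a quick back-of-the-envelope calculation, assuming one 2-bit saturating counter per history pattern as in the two-level predictors discussed earlier:

```python
# Storage (in bits) for a global two-level predictor: one counter per
# history pattern, so the table has 2**h entries for an h-bit GHR.
def bht_bits(history_len, counter_bits=2):
    return (2 ** history_len) * counter_bits

for h in (10, 20, 30):
    print(h, bht_bits(h))
# 10 bits of history -> a 2 Kbit table, fine.
# 30 bits of history -> a 2 Gbit table, clearly infeasible in hardware.
```

Each additional history bit doubles the table, which is exactly why a predictor whose storage is linear in the history length (like the perceptron discussed below) is so attractive.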
So, the BTB will be correct every time, because the target is constant. For indirect calls, the target will vary depending on the behavior of the program. Because these are essentially function pointers or virtual methods, the target will depend on what you are really doing, and your BTB will be very inaccurate in this case. And remember that for calls, procedure calls, we really cannot take any help from the direction predictor. The only machinery that we have for predicting call targets is the BTB; there is nothing else. So, this is a big problem, especially in object-oriented languages, because of virtual methods in languages like C++ and Java. Programs written in these languages have a lot of indirect calls, so this has to be addressed. So, pretty much these are the four problems that are challenging today. Let us go through each of them quickly, with at least one solution each; it is impossible to cover everything that has happened in all four of these directions. So, there is a family of predictors known as gskew predictors. They try to address the first problem of reducing BHT aliasing. The idea is that you have an odd number of branch history tables, and these are all global-history predictors; I will be a little more general here. Suppose this is your global history register, and this is my branch PC. I have three different hash functions H1, H2 and H3 that index into my BHTs: BHT1, BHT2, BHT3. So, finally, how do you combine these three predictions? Any suggestion? Majority vote; that is why I wanted an odd number of BHTs. I take a majority vote, and that is my final prediction. Now, how does this get rid of the aliasing problem? There is a requirement on these three hash functions: if there are two addresses that conflict in BHT1 because of H1, you should try to design H2 and H3 such that those addresses do not conflict in the other two BHTs.
So, even if BHT1 gives a very poor prediction for those two addresses, the other two tables will actually be correct, and the correct answer will win the majority vote. Is this clear? So, that is essentially a problem of designing hash functions. Let me show one family of hash functions that people often use. I will simplify a little bit and assume that the input to the hash function is some value x, where x is essentially a combination of the GHR and the PC. Let us fix the table size to be some power of two, say 2^n entries. I write x in base 2^n, so x = x_0 + x_1 * 2^n + x_2 * 2^(2n) + ... Now, if h1 is just a modulo-2^n hash, what is h1(x)? It is x_0, the lowest n bits. Now suppose I have x and x' and both of them have the same x_0. In that case they will alias in the first table, and I want them not to alias under h2 and h3. So, for h2, what people often use is based on the perfect shuffle. How many of you know about the perfect shuffle? If you are given a bit string of, say, n bits, you divide it into two halves of n/2 bits each, and then you interleave the bits, taking them alternately from the two halves. There are two types of perfect shuffle, depending on which half you start with. For example, take the string 10100011: divide it into 1010 and 0011, then interleave, one bit from the left half, one from the right, and so on, giving 10001101. So, I take bits alternately from the two halves; that is the perfect shuffle of this particular string. I could also do it the other way, starting with the other half.
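The perfect shuffle just described can be sketched in a few lines; representing the bit string as a Python string is my choice here for readability:

```python
# Perfect shuffle of an even-length bit string: split it into two halves
# and interleave their bits. This is one of the two possible
# interleavings (starting with the left half); the other starts with the
# right half.
def perfect_shuffle(bits):
    n = len(bits)
    left, right = bits[:n // 2], bits[n // 2:]
    out = []
    for a, b in zip(left, right):
        out.append(a)
        out.append(b)
    return "".join(out)

print(perfect_shuffle("10100011"))  # -> 10001101
```

As mentioned in the lecture, repeated shuffling eventually returns the original string; for this 8-bit example, three applications cycle back to the starting value.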
So, instead of starting with one half I could have started with the other; I can alternate in two different ways. The hash function h2 that I apply to x is then something like h1 applied to the shuffled value, h2(x) = h1(sigma(x)), where sigma denotes the perfect shuffle. If two addresses map to the same slot under h1, there is a very low probability of them also colliding under this hash function; I am not going to go into the proof. So, the perfect shuffle gives hash functions with this particular property: if two addresses alias in one of the BHTs, they do not alias in the other two. For h3 you can choose h3(x) = h1(sigma^2(x)), that is, apply the shuffle twice. And there are nice properties of the shuffle: if you keep on shuffling, ultimately you get back the same string, so you cannot construct too many distinct hash functions this way, only a small number. The basic idea is this: when h1 has a conflict, I make sure that h2 and h3 do not have a conflict for the same pair, and the majority vote masks the bad prediction. And why is it called a skewed predictor? Because people later found out that if you make the tables of different sizes, you get even better prediction accuracy. Essentially, the BHTs then take different lengths of history as inputs. There are papers that look at the relationship between these history lengths: for example, if one table uses n bits of history, how much history should I put into BHT2 and BHT3 to get better accuracy? There are papers which look at geometric history lengths. Anyway, that is the gskew family of predictors; they are extremely good and have very high accuracy. If you want to read up on that, send me an email and I will give you some references. Now, the second problem is the history length relationship.
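A gskew-style predictor with the majority vote described above can be sketched as follows. This is an illustrative toy, not the exact design from the published papers: the table size, the way the GHR and PC are combined into x, and the shuffle-based index functions are all assumptions I am making for the sketch.

```python
# Minimal gskew-style sketch: three BHTs of 2-bit saturating counters,
# indexed by three different hashes of (GHR, PC); the final prediction
# is a majority vote over the three tables.
N = 10                       # log2 of each table's size
MASK = (1 << N) - 1

def shuffle_bits(x):         # perfect shuffle of the low 2N bits of x
    lo, hi = x & MASK, (x >> N) & MASK
    out = 0
    for i in range(N):
        out |= ((hi >> i) & 1) << (2 * i)
        out |= ((lo >> i) & 1) << (2 * i + 1)
    return out

tables = [[2] * (1 << N) for _ in range(3)]    # init: weakly taken

def indices(ghr, pc):
    x = ((ghr & MASK) << N) | (pc & MASK)      # combine GHR and PC
    return [x & MASK,                          # h1: modulo hash
            shuffle_bits(x) & MASK,            # h2: h1 of shuffle(x)
            shuffle_bits(shuffle_bits(x)) & MASK]  # h3: shuffle twice

def predict(ghr, pc):
    votes = sum(tables[t][i] >= 2 for t, i in enumerate(indices(ghr, pc)))
    return votes >= 2        # majority of the three tables says "taken"

def update(ghr, pc, taken):
    for t, i in enumerate(indices(ghr, pc)):
        c = tables[t][i]
        tables[t][i] = min(c + 1, 3) if taken else max(c - 1, 0)
```

The point of the three index functions is exactly the aliasing argument from the lecture: two (GHR, PC) pairs that collide in one table are unlikely to collide in the other two, so the majority vote outvotes the polluted entry.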
So, here essentially we have the same problem: if I have an n-bit history, I will have an exponentially large table with 2^n entries. How do you get rid of that? Neural networks have some beautiful properties that actually come into play here. How many of you have heard of the perceptron? It is a simple thing; let me give you an overview of what a perceptron does. A perceptron is a kind of neural network. It has a bunch of inputs; for our purpose here the inputs are the history bits. Suppose I have n bits of history. What I do is weigh them with some weights, and there is an extra input called the bias. What comes out is just the weighted sum of these, and on this you apply a function. The sum is a real number, because the weights are real numbers, and the function you apply gives you a binary output. Often the function applied here is the sign function: you take the sign of the sum, positive or negative. If it is negative, you output 0; if it is positive, you output 1. So, what have we got? The perceptron output is sign(sum over i of w_i * h_i + bias). So, why is this relevant at all, can somebody guess? I am saying this is my branch predictor: I give it some history, I get a binary output. It is relevant because such a predictor is essentially a linear function. Your history can be seen as a vector in n-dimensional space, or (n+1)-dimensional if you want to include the bias input as well. It is a vector, and what you are trying to decide is: given this vector, does it belong to the taken side of the space or the not-taken side? The decision boundary here is a hyperplane in this space. So, essentially what you are trying to do is learn this particular plane.
So, in two-dimensional space, if I have just two history bits, call them h0 and h1, I may have a bunch of history points. Say after certain histories I always see that the branch is taken; I mark those histories with crosses. And I may have history points where, whenever they show up, the branch is actually not taken. What we are trying to come up with is a plane that separates these two sets. That is what the perceptron is trying to learn: the weights will be adjusted so that you get this particular plane, and then when you apply the sign function you will cleanly separate the two sides, saying this point belongs to the taken half-space and that one to the not-taken half-space. The weights get adjusted by a training algorithm, which I will not discuss in detail here. So, the thing to observe is that I do not have a BHT anymore: I can make my history as large as I want; the only thing I remember is the weights. So, the storage is now linear in the history size: if I have an n-bit history, I need n weights (plus the bias), that is it. Now, of course, a single perceptron cannot predict all branches. For example, suppose I have a pattern like this: whenever I see a history of 00 the branch is not taken, whenever I see 11 the branch is not taken, whenever I see 01 the branch is taken, and whenever I see 10 the branch is taken. This is the XOR function, and there is no plane that can separate these points; that is impossible. Whatever you try, you will always mix something up: if you draw one line, two points of the same class end up on opposite sides; try another line, same problem. So, a single perceptron cannot predict this branch; for such branches you need something more, for example multiple perceptrons.
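The perceptron predictor just described can be sketched as follows. This loosely follows the well-known perceptron branch predictor design; the threshold formula below is the one commonly cited for it, and the history length is an illustrative choice.

```python
# Perceptron branch predictor sketch: history bits are mapped to +1/-1,
# the prediction is the sign of a weighted sum, and weights are trained
# on a misprediction or when the sum's magnitude is below a threshold.
# Storage is LINEAR in the history length: HIST+1 small integer weights.
HIST = 16
THETA = int(1.93 * HIST + 14)      # commonly used training threshold

class Perceptron:
    def __init__(self):
        self.w = [0] * (HIST + 1)  # w[0] is the bias weight

    def output(self, hist):        # hist: list of +1/-1, length HIST
        return self.w[0] + sum(w * h for w, h in zip(self.w[1:], hist))

    def predict(self, hist):
        return self.output(hist) >= 0      # True = predict taken

    def train(self, hist, taken):
        y, t = self.output(hist), (1 if taken else -1)
        if (y >= 0) != taken or abs(y) <= THETA:
            self.w[0] += t
            for i, h in enumerate(hist):
                self.w[i + 1] += t * h
```

A perceptron trained this way quickly learns any linearly separable pattern, e.g. a branch that alternates taken/not-taken; but, as noted above, no single perceptron can learn the XOR of two history bits.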
So, the point here is that this problem is actually solved: I can now have a large history, and my storage overhead is linear in the history length; it is no longer exponential. Next, there is a predictor called the prophet/critic predictor. This is a very general kind of prediction technique; such designs are often called overriding predictors. Let me give you an analogy. Suppose you are in a city, navigating through it in your car with a friend, trying to get to a place where you have been before, but you do not remember clearly how to get there. What happens is that you try out different lanes, and whenever you get into a lane you proceed a little, and then your friend tells you: it does not seem like we have been here, this looks wrong. So, you quickly backtrack and start some other way; you go along that path a while, and again you judge: this does not look familiar, I do not think we have seen this. The point here is that after a branch prediction is made, you start going along some path. You predict a branch, you start fetching from the predicted path, and very soon you come to one more branch on the predicted path; you predict this branch too and go along the predicted path, and again you come to another branch. Now, suppose I give you these two extra branch outcomes after you have made the prediction of the first branch: can you not tell me whether that prediction was likely correct or not? So, it is actually incorporating a little bit of future information. Notice that this is not the same as giving you two extra history bits at prediction time.
What I am saying is: suppose you make a prediction with n bits of history, and then you get to see two more branch outcomes. This is not the same as giving you n+2 bits of history up front. I could attach two more history bits there, but it is not going to be as good, because here the question being asked is: after making this prediction, have you ever seen these two subsequent outcomes follow or not? If, when you went along this path in the past, these two outcomes never followed, you must have made a wrong prediction. So, this is essentially an overriding predictor: you make a prediction first, then observe for a while, and then you may correct that prediction. The first predictor is called the prophet; the second one is the critic of that prediction: it judges whether the prediction was correct or not. So, the second predictor is actually predicting the correctness of the first one. This is often helpful in predicting data-dependent branches correctly, but nonetheless, predicting data-dependent branches remains a big problem; there is no good solution, and the gap is huge. The last problem is how you handle indirect calls. Here one solution that is often used is to use the path history to index into the BTB, instead of using just the PC. Essentially, before you reach the call you must have seen a few branches; those define the path through which you actually reach the call, and that path has a very good correlation with where the call is going to go. Suppose you have a function f, and inside it there is a call through a function pointer which can take you to different places. The path that leads to this particular call site has a very good correlation with whether you call some function x through this pointer or some function y through it.
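The prophet/critic idea above can be sketched as a toy. To be clear about assumptions: the construction here, a correctness counter indexed by the prediction-time history, the two lookahead outcomes, and the prophet's call, is my illustrative simplification of the overriding scheme described in the lecture, not the actual published design; table organization and counter widths are arbitrary.

```python
# Toy critic: after the prophet predicts and two more branch outcomes
# are observed, look up how often the prophet's call in this exact
# situation turned out correct, and override it if it is usually wrong.
from collections import defaultdict

critic = defaultdict(lambda: 3)    # 2-bit counter, init "usually right"

def critic_check(ghr, lookahead2, prophet_pred):
    key = (ghr, lookahead2, prophet_pred)
    # Agree with the prophet if history says it is usually right here;
    # otherwise override (flip) its prediction.
    return prophet_pred if critic[key] >= 2 else not prophet_pred

def critic_update(ghr, lookahead2, prophet_pred, actual):
    key = (ghr, lookahead2, prophet_pred)
    ok = (prophet_pred == actual)
    critic[key] = min(critic[key] + 1, 3) if ok else max(critic[key] - 1, 0)
```

After the critic has repeatedly seen the prophet go wrong for a given (history, lookahead) situation, it starts overriding the prophet in that situation, which is exactly the "your friend says this looks wrong, so backtrack" behavior from the analogy.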
So, instead of using the PC of this call to index into the BTB, you use a hash of the PCs that you have seen along the path to this particular call. That is also seen to improve accuracy. Of course, this is a very high-level description; if you want details, let me know and I will give you papers you can read. Now, one problem that cuts across all this prediction research is the criticality of branches. The main point here is that not all branches are equally important. You might end up investing a lot of effort trying to predict a branch which is not very important, and that is not really a good use of resources. Predicting some branches correctly is critical to performance, while others have very little impact. It is known that all branches are not equally critical; the question is why. Here is a list of factors that often influence branch criticality. One obvious factor is the misprediction penalty: when you mispredict a branch, how many cycles do you lose before you resume along the correct path? Every branch does not have an equal misprediction penalty; one reason we have already discussed is that a data-dependent branch whose data is not ready may have to wait longer. The minimum penalty is the number of pipe stages between the last direction prediction and the branch execution. Remember that we can have multiple direction predictors in multiple pipe stages; the last one to make a prediction is hopefully the best one, and from that point onward to the time you execute the branch is the minimum misprediction penalty. Certain branches see extra delay due to data dependences, and predicting these correctly is important for performance, because if you mispredict one of them you are going to lose a large number of cycles. So, predicting data-dependent branches correctly is very important. The second criticality factor is cache pollution due to wrong-path execution.
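The path-history-based BTB indexing described above can be sketched as follows. The fold/XOR scheme for combining the path PCs into an index is an assumption I am making for illustration; published designs differ in exactly how the path is hashed.

```python
# Sketch of path-based indirect-target prediction: instead of indexing
# the BTB with the call's PC alone, fold the PCs of the last few
# branches (the path) into the index, so the same call site can hold
# different predicted targets for different paths.
BTB_BITS = 12

def btb_index(call_pc, path_pcs):
    idx = call_pc
    for i, pc in enumerate(path_pcs):       # most recent branch first
        idx ^= pc << (i + 1)                # shift so the same PCs in a
    return idx & ((1 << BTB_BITS) - 1)      # different order hash apart

btb = {}

def predict_target(call_pc, path_pcs):
    return btb.get(btb_index(call_pc, path_pcs))   # None on a BTB miss

def update_target(call_pc, path_pcs, target):
    btb[btb_index(call_pc, path_pcs)] = target
```

With a plain PC-indexed BTB, a call site that alternates between two targets mispredicts constantly; here the two paths map to different BTB entries, so each can remember its own target.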
So, when you are going along the wrong path, you are bringing data and instructions into the data and instruction caches. Conflicting instruction and data working sets along the two branch paths can cause a big problem. For example, suppose along the correct path you have certain instructions in the cache. When you go down the wrong path, the wrong-path instructions may conflict with these in the cache and actually evict them. So, the next time you come along the correct path, you will miss in the instruction cache. The same may happen with the data cache. Another factor is the criticality of the correct path itself: for example, if the correct path starts with an instruction or data cache miss, then you really want this branch to be predicted correctly. Otherwise, what happens is that you go down the wrong path, find out that you made a mistake, cancel all the instructions, start fetching from the correct path, and at the very beginning you take an instruction cache miss. That entire latency will show up in your execution time, because you cannot do anything during it: you do not have the instructions to proceed, the pipeline is empty, and you are waiting for the instruction miss to complete. That also adds to the criticality of branches. So, the branches that have these properties become critical, and those are the branches you really want to predict correctly. How do you go about doing this? Critical branches need high prediction accuracy. Identifying branches that mispredict frequently is actually easy; it is not very difficult. The reason is that you can just maintain a small cache of recently mispredicted branch PCs, which at any point tells you which branches have been mispredicting frequently. However, all of these may not be critical, and that is what is most important.
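The small cache of recently mispredicted branch PCs mentioned above can be sketched in a few lines; the cache size and the frequency threshold are illustrative assumptions.

```python
# Sketch of a small FIFO cache of recently mispredicted branch PCs.
# A PC that appears here repeatedly is a frequent mispredictor, though,
# as the lecture stresses, not necessarily a CRITICAL one.
from collections import deque

class MispredictCache:
    def __init__(self, size=16):
        self.entries = deque(maxlen=size)   # oldest entry evicted first

    def record(self, pc):                   # call on every misprediction
        self.entries.append(pc)

    def is_frequent(self, pc, min_hits=2):
        return list(self.entries).count(pc) >= min_hits
```

This identifies the frequent mispredictors cheaply; the open research problem is the next step, deciding which of them actually matter for performance.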
A branch may have a very high misprediction rate, but even if you improve its accuracy by a large amount, it may not affect overall performance, because it is not critical. Discovering good features that correlate well with the behavior of critical branches is difficult; this is a very hot research topic. If you want to invest time in branch predictors, you can look into that. By good features I mean: what program properties tell you that this branch is actually critical? Most of these branches turn out to be data-dependent, ultimately, and the prediction accuracy depends on the entropy of the data the branch depends on. As you know, if some piece of data has high entropy, it is usually not very predictable; you want low entropy in the data for the predictor to do well. Today's best direction predictors have very high prediction accuracy, which means a small number of branches cause most of the mispredictions, and these are the highly critical branches. The performance gap between such a predictor and the oracle is still large: even with 97 percent prediction accuracy, a major chunk of the potential performance is still left on the table, and this is the hard part. So, there is big room for improvement, and one of the big problems is identifying critical branches. Now, a quick summary of control hazards: we redirect fetch from various stages of the pipeline with increasingly better predictions. Some of the machinery that we have discussed: the branch target buffer, the return address stack, and the direction predictors. The fetcher selects the most appropriate next PC every cycle from among the different indications coming from different stages. Research problems: focus effort on critical, data-dependent branches; what features correlate well with the behavior of these branches; can the compiler offer help in any way? One very interesting idea is: can I pre-execute branches in a separate thread?
Today we have a lot of threads and cores in our processors. So, what I can do is run the program on one core and execute only the branches (and the instructions they depend on) on some other core. That way I can pre-execute a certain number of branches and actually know their outcomes beforehand, before the main core gets to that point. Of course, it is not as easy as it sounds; you have to figure out the slice of instructions that leads to each branch, which you have to execute as well, but there are proposals that try to do this. All right, any questions before I move on to the other types of hazards? So, the other type of hazard that is important in a pipeline is the data hazard, and this arises because pipelining disturbs sequential execution, so data dependencies among the instructions start to matter. Here is an example. The first instruction adds R2 and R3 and puts the result in R1. The next instruction subtracts R5 from R1 and puts the result in R4. The third one ANDs R1 and R7 into R6, the next one ORs R1 and R9 into R8, and the last one XORs R1 and R11 into R10. So, here you can see that the result of the ADD is needed by all the following instructions. This is shown on the pipeline diagram: the ADD flows through the pipeline, executes in the EX stage, but writes back to the register file only in the last stage, while the following instructions read R1 in the decode stage, which is where the register file is read. As you can see, the three instructions marked in red will actually read wrong values. The XOR instruction will read the correct value, because its register read happens after the value has been written back. So, how do you solve this problem? You can stall. By how many cycles does this SUB instruction need to be delayed? I can hold its decode back by three cycles; then everything comes out right, and after that everything flows through correctly.
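The RAW hazards in the example sequence above can be found mechanically. This is a static sketch under the lecture's pipeline assumptions: write-back in stage 5, register read in stage 2, so without any fix, only the fourth following instruction reads the fresh value; the tuple encoding of the instructions is my own.

```python
# Static RAW-hazard detection for the example sequence: flag any
# instruction whose source register is written by an instruction fewer
# than 4 slots earlier (window=3 means the 3 preceding instructions
# can conflict; the 4th one back is safe).
prog = [
    ("add", "R1",  "R2", "R3"),   # (op, dest, src1, src2)
    ("sub", "R4",  "R1", "R5"),
    ("and", "R6",  "R1", "R7"),
    ("or",  "R8",  "R1", "R9"),
    ("xor", "R10", "R1", "R11"),
]

def raw_hazards(prog, window=3):
    hazards = []
    for i, (_, _, s1, s2) in enumerate(prog):
        for j in range(max(0, i - window), i):
            dest = prog[j][1]
            if dest in (s1, s2):
                hazards.append((j, i, dest))   # producer, consumer, reg
    return hazards

print(raw_hazards(prog))   # SUB, AND, OR conflict with ADD; XOR is safe
```

With the phased register file discussed next, the three-cycle-apart hazard disappears, which corresponds to shrinking `window` from 3 to 2.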
So, stall by three cycles and the problem is solved; but, as we will see, something better can be done. Where is the value of R1 produced? In which pipe stage? EX: the value is known there. So, can I do something? Bypass the value. Exactly. Since the value is available in the pipeline somewhere, I can pass it directly to this instruction, and to this one, and this one, while the last one reads the value from the register file. So, how do we avoid increasing the CPI? Stalling is clearly not acceptable. First, there is something called a phased register file, which solves the three-cycle-apart problem. What is that? Just like the phased branch execution we did earlier, we arrange that the register write-back completes in the first half of the cycle, and the register read happens only in the second half of the cycle. In that case, the instruction three slots behind will actually get the correct value from the register file. So, this solves RAW dependencies that are three cycles apart. By the way, these are called RAW dependencies, read-after-write: you are reading after a write. So, the three-cycle-apart RAW hazard in this instruction sequence can be resolved by having a phased register file: complete the register write in the first half of the cycle. That still does not solve the problem for the two closer instructions. So, as somebody suggested, why do we not forward the correct value just in time? The value is produced in the EX stage; I can directly forward it to the input of the ALU, and that is exactly when it is needed. What will happen is that the consumer reads a wrong value in the decode stage, which is then overwritten by the correct bypassed value before the ALU uses it.
Similarly, if you remember the pipeline, this value will be carried forward in the pipeline latches, so it will also be available a stage later, and that too can be forwarded to the ALU: the same value of R1. So, we read a wrong value in the ID/RF stage, but the bypassed value overwrites it. The question is how you really implement this. You always feed the bypass value to the ALU input, but how many sources are there in the bypass network, and do we need a bypass to the memory stage also? Here we are showing a bypass only to the execution stage. Let us take the questions one at a time. How do you implement this bypass? Let us focus on one bypass path. I read a value from the register file which is wrong. There are two questions: before I feed the value to the ALU, how do I know that I read a wrong value? That is the first question. And if I can figure out that I read a wrong value, how do I overwrite it; what kind of logic circuitry do I require? So, how do I know that I read a wrong value? If you have forgotten the instruction stream, here it is again. We are talking about these two instructions: a bypass from here to here. The ADD writes to R1, and now the SUB is reading from R1, so it is reading a wrong value. So, what kind of operation am I doing to detect that? Comparing. Comparing what? The destination register of the previous instruction with the source registers of this one. Exactly. I have to compare both sources of this instruction with the destination of the previous one; if there is any match, I know that a RAW hazard has happened. So, what does that mean for the datapath? Let me take you back to our pipeline. I have two inputs coming into the ALU, and the register file produces these values. I have a multiplexer here which, depending on the opcode, selects either the immediate or the register value, and another which selects either the next PC (PC plus 4) or the register file value. So, how do I modify this?
Suppose I ask you to write a piece of program that describes this logic; that might be easier to think about. If I am writing a C program, what should I write? What am I comparing, and how should the datapath change? I have to somehow change the inputs to the ALU based on some indication that a wrong value was read. Where is the destination register of the producer? It is carried forward; remember that there is a latch. So, we have the ADD instruction, which is currently executing at the ALU, and the SUB instruction is one cycle behind, currently being decoded and reading the register file. That means the ADD instruction's source and destination register identifiers are stored in this pipeline latch; you can assume that. So, where should I put the comparison; in which stage? One suggestion is the decode stage, so that the decode stage can tell whether the source is stale. The problem with putting it in the decode stage is that you get to know the SUB instruction's source registers only after it has been decoded. Can I put it in the execution stage? Is that possible? Yes, we can put it in the execution stage; we will then use these two latches. Which ones? The ADD instruction's destination would be in this latch, and it is compared with the sources of the instruction now entering execution. All right. So, I need two comparators: I need to compare the destination held in that latch with the two sources here. Two comparators.
So, where does the comparison output go? What do we do with the comparator outputs? I have two comparators, and what does each compare? This is my EX/MEM latch. Each comparator takes the register identifier RD and compares it with a source register RS. So, it tells me yes, they match. So what? How do I use it? I need to overwrite the inputs to the ALU: this one is wrong, this one could be wrong, both could be wrong, or both could be correct. How do you decide? Feed the ALU output to the input muxes and use the comparator outputs in the selection. So, what you are suggesting is that I have one more input: both these muxes get an extra input which is the output of the ALU. So, the ALU output goes back into the two muxes. These muxes already have two other inputs, and the selection logic will be the opcode in combination with this comparator. Here, for example, if the comparator says yes, they actually match, I should pick up the ALU output instead of the value coming from the register file. The other input will remain controlled by the opcode, for instance when I am selecting the immediate; in that case it does not matter whether there is a match or not, and hopefully there will not be one. So, this path is known as the bypass path, coming from the output of the ALU back to the input of the same ALU. Now, how do I do the bypass for the next instruction after that? That also has to happen at the input of the ALU, but the value is coming from a different pipe stage. When this instruction is executing, that earlier instruction is now in writeback; that means the value it is writing to the register file is currently in this latch, which I need to bypass.
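The mux-selection logic just described can be sketched as follows (again Python, with hypothetical encodings for the mux inputs): the comparator result overrides the opcode-based selection, but only when the opcode would have picked the register-file value, since immediates and PC+4 are never stale.

```python
def alu_input_select(opcode_sel, bypass_match):
    """Select the source for one ALU input mux.

    opcode_sel  : what the opcode alone would pick
                  ('reg', 'imm', or 'pc4')
    bypass_match: True if the comparator found a RAW hazard
                  on this input
    """
    # The bypass only overrides the register-file value; an
    # immediate or PC+4 input passes through unchanged.
    if bypass_match and opcode_sel == 'reg':
        return 'alu_bypass'
    return opcode_sel

print(alu_input_select('reg', True))   # 'alu_bypass'
print(alu_input_select('imm', False))  # 'imm'
```

In hardware this is just the mux select lines being a function of the opcode and the comparator output, exactly as stated in the lecture.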
So, what that means is you have to add one more input to this particular multiplexer, and your selection will now also be based on comparing the RD of this latch with the RT of this instruction, and the RD of this latch with the RS of this instruction. We need two more comparators. Now, there is a small problem; this particular example does not exhibit it, but it arises in other instruction sequences. So, now how do you answer this particular question: how many sources are there in the bypass network? So, we know how to implement the bypass; we have some idea. We need bigger muxes in front of the ALU. In this case I have shown two sources: one coming from the EX/MEM latch and one coming from the MEM/WB latch. Are there any other sources in this particular pipeline from where I need to bypass? And is the destination always the execute stage, or could there be other destinations as well? Do I need to bypass from the fetch stage? Could there be such a situation? That cannot be right: I know what is produced in the fetch stage. I produce an instruction, and an instruction cannot be an input to another instruction. In the decode stage I produce some internal decoded signals; they also cannot be used as operands. Here, in execute, I produce values that may require a bypass; here, in memory, I also produce values that may require a bypass. Do I need a bypass from the writeback stage? No. Am I sure? If I want to bypass from here, what are the possible consumers? If I bypass from here to here, the next instruction will be in the decode stage, reading the register file, and it will get the current value, because the write has already happened. So, I should never need a bypass from writeback to decode; that is not needed. What about this to this, and this to this: are they needed?
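Putting the two bypass sources together gives a small forwarding-unit sketch (Python, hypothetical names). Presumably the "small problem" alluded to above is priority: when both latches hold the same destination register, the EX/MEM latch carries the younger result and must win.

```python
def forward_select(rs, exmem_rd, exmem_writes, memwb_rd, memwb_writes):
    """Pick the bypass source for one ALU source register rs.

    Returns 'EX/MEM', 'MEM/WB', or 'regfile'. The EX/MEM latch
    holds the most recent result, so it takes priority when both
    destinations match rs.
    """
    if exmem_writes and exmem_rd == rs:
        return 'EX/MEM'
    if memwb_writes and memwb_rd == rs:
        return 'MEM/WB'
    return 'regfile'

# add R1, R2, R3   now in MEM -> result in EX/MEM latch
# add R1, R1, R4   now in WB  -> older result in MEM/WB latch
# sub R5, R1, R6   now in EX  -> must take the younger EX/MEM value
print(forward_select(1, exmem_rd=1, exmem_writes=True,
                     memwb_rd=1, memwb_writes=True))  # 'EX/MEM'
```

If the priority were reversed, the sub above would silently consume a stale value of R1, which is why the ordering of the two checks matters.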
If I want to bypass from here to here, that means I am bypassing the result of this instruction to that instruction for its writeback; that does not make any sense, why should I do that? What about this to this? What does the memory stage need? It needs an address for sure, and it needs a value for store instructions. What if this instruction was a store instruction which stores R1? Suppose, let me write down the instructions: the store uses R1 to compute the address and also stores the value R1. So, this particular bypass to the execute stage takes care of the forwarding; the value then travels forward with the instruction, so why should I bypass R1 to the memory stage separately, why delay the bypass to this point? There is no reason. So, we need just these two sources, EX/MEM and MEM/WB, which will bypass two kinds of values: values produced by ALU instructions, which are bypassed starting one cycle later, and values produced by load instructions. So, that takes care of this particular question. We will come back to this question next time.
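The two kinds of bypassed values mentioned at the end can be summarized in a tiny sketch (Python, names hypothetical): an ALU result exists at the end of EX and can be taken from the EX/MEM latch, while a load's value only exists after the MEM stage and so can only come from the MEM/WB latch.

```python
def earliest_bypass_source(producer_kind):
    """Earliest pipeline latch from which a dependent instruction
    can pick up the value, for each kind of producing instruction."""
    if producer_kind == 'alu':
        return 'EX/MEM'   # result computed in EX, latched into EX/MEM
    if producer_kind == 'load':
        return 'MEM/WB'   # value arrives only after the MEM stage
    raise ValueError(f'unknown producer kind: {producer_kind}')

print(earliest_bypass_source('alu'))   # 'EX/MEM'
print(earliest_bypass_source('load'))  # 'MEM/WB'
```

This also shows why an instruction immediately dependent on a load cannot be fully served by bypassing alone: the load's value is simply not available one cycle after EX, which is presumably part of what the lecture will return to next time.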