So, it turns out that the controlled experiment is actually something we are more comfortable with. In fact, most of the scientific knowledge we have gained so far as a human race is probably due to the controlled experiment. In this experiment, what happens is that we first state a hypothesis: a statement that something we think is related to something else. In fact, we might even go one step further and say that something causes something else. Smoking causes cancer; that could be a hypothesis. Once you make such a hypothesis, you use deduction to work out, if that hypothesis is true, what kinds of measurements you must now make, which variables you must look into, and what relationships between the variables you should see. In particular, you ask what these measurements should turn out to be when some of the variables are deliberately controlled. In other words, you choose precise values in your experiment for some of these variables, and given that you have proposed a model, if you are choosing deliberate values, then you should be able to predict according to your model what should happen with the other variables. So, if I am looking at a model which says the voltage v is related to the current i through a proportionality factor r, that is, v = r i, then in a controlled experiment I would set up different values of current through a resistor and try to figure out what happens to the voltage. In fact, you will realize immediately that almost every undergraduate lab experiment in engineering is done this way. We choose certain fixed values for some variables, which we call independent variables, and then we make measurements of the remaining variables, which we now call dependent variables; they depend on the other variables through the values that you deliberately chose. Now, one can take this one level further: if you think there are other latent or hidden variables which might mess you up, then you can try to deliberately set values for them too, if you know what they are, and thereby control the influence of these third-party variables in your experiment. So, in a nutshell, in the controlled experiment you choose a set of variables, deliberately assign values to them, and, if you know your model hypothesis properly, you should be in a position to predict what will happen given the choices you have made. The final step is that you carry out the experiment as you planned it, collect the data, and ask whether the results match the expectation you had at the start. So, this now becomes almost an algorithm for carrying out an experiment: choose some values for your variables (of course, I am not getting into how you choose them), make predictions of what should happen to your system given those values, do the experiment, collect your data, and go back and ask, did what I see turn out to be what I expected?
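As a minimal sketch of this recipe, here is a hypothetical version of the resistor experiment in Python. The true resistance, the noise level, and the chosen currents are all made-up numbers for illustration, not part of the lecture:

```python
import random

# Hypothetical controlled experiment for the model V = R * I.
# The "true" resistance and the noise level are invented values.
R_TRUE = 100.0  # ohms; the underlying physics, unknown to the experimenter

def measure_voltage(current):
    """Pretend lab measurement: the true relationship plus a little noise."""
    return R_TRUE * current + random.gauss(0, 0.05)

# Step 1: deliberately choose values for the independent variable (current).
currents = [0.01, 0.02, 0.05, 0.10, 0.20]  # amperes

# Step 2: predict the dependent variable from the hypothesised model.
R_HYPOTHESIS = 100.0  # the model's proposed proportionality factor
predictions = [R_HYPOTHESIS * i for i in currents]

# Step 3: carry out the experiment and collect the data.
measurements = [measure_voltage(i) for i in currents]

# Step 4: ask whether what we saw matches what we expected.
for i, pred, meas in zip(currents, predictions, measurements):
    print(f"I = {i:.2f} A  predicted V = {pred:6.2f}  measured V = {meas:6.2f}")
```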
Now, it turns out that, invariably, if you are trying to figure out which way causality goes, whether x is causing z or z is causing x, you need a controlled experiment that involves intervention. I will explain this first using a simple pair of equations and then get into a more practical research example. So, the simple mathematical version of this controlled experiment first. I am going to presume that I have three variables in my analysis, x, y and z, and ultimately I need to figure out: is x causing z, or is z causing x? Which way does this work? All I know is that I have made several measurements of x, y and z. When I look at x and y, for instance, I realize that there is a relationship between them, a correlation, and that precise relationship turns out to be y = 2x. This is something I have learnt from the experiments I have already done. Similarly, I have learnt that whenever I look at pairs of (y, z) values, y and z are related by z = y + 1. But remember what I said about the equality symbol: if I write equations like these, I can always rearrange them. If y = 2x, I can rewrite it as x = y/2, and similarly I can write y = z - 1. When I look at these two sets of equations, there is actually no difference between them; they are equivalent. So, if I really want to figure out whether x is causing z or z is causing x, these equations do not help me. The first set might suggest that if I choose x, then I will find y = 2x, which gives me z = y + 1; x would seem to imply z. But then you will immediately realize that the second set lets me take a z, find y = z - 1, and, given y, work out x = y/2. So, it is not clear which way x, y and z are related, given these two sets of equations. In fact, you can go one step further: given these two sets of equations, I can rearrange them into all kinds of forms, and again these turn out to be equivalent. If I combine y = 2x and z = y + 1, I can get 2x - 2y + z - 1 = 0, which is yet another equality statement. In arithmetic we have learnt how to take equality statements, keep adding terms to each side, and end up with new forms of the same equality; that is basically what I have done here, coming up with different rearrangements of the equations that remain equivalent statements. So, things remain tough: we still do not understand whether x causes z or z causes x. There are actually two models you are contrasting. Is x causing y? If x is causing y, then y should turn out to be 2 times x, and you can hopefully make out that the "multiply by 2" sits on top of the arrow. Notice that I am now using an arrow: I am not talking of equalities; I am using an arrow to say that something causes something else, which is the best I can do in terms of notation. So, x causes y, which means that given an x, I multiply it by 2 to get y, and then, given y, I add 1 to get z. That is one possibility: going from x to z, take x, multiply by 2, add 1, and that is z. But then we said that z could also possibly cause x. How does that work?
Given a value of z, I subtract 1 from it to get y, and given y, I divide it by 2 to get x. So, I have two models to evaluate: given the x, y, z data I have collected, both of these models are equally likely, and unless I now do an additional experiment there is no way for me to differentiate between the two possibilities, x causing z or z causing x. So, which additional experiment can we now introduce to distinguish between the two possibilities? This is the intervention. Again, I am showing you the same pairs of equations, the first model on the left and the second possible model on the right, and as we ask what to do to intervene, we say: let us figure out what happens if we deliberately set y = 0. Notice this: if x causes y, then the moment I set x, y will change its value. But what I am saying here is that, regardless of whether x caused y or, for that matter, z caused y, I will go in there and force the value of y to be 0. This is now a different experiment from what you have done before: you go in and deliberately set y = 0. If you now set y = 0, what should happen in the first model, where x goes to y and then to z? The value of x no longer matters, and it should turn out that the value of z is always equal to 1. The value of z remains 1 independent of what you do with x, and this happens because you have deliberately intervened and fixed a variable to a value of your choice, not allowing it to vary as part of your experiment design. Of course, when you set y = 0, you have to ask what the other model would have done. With the other competing model, it turns out that if you set y = 0, then x is 0 regardless of which value you choose for z, because the connectivity between x and z has been disturbed by force-fitting y = 0. So, the observation is this: there were two models which could have explained the relationship between x, y and z, and the only way to work out which model was correct was to do something extra. That something extra was, in this case, to set y to a fixed value, and here I chose y = 0. The moment I did so, it became clear that the two models would predict different things. The first model would predict that z is a constant equal to +1 regardless of what value of x you see, and the second model would predict that x is 0 regardless of which value of z you see. So, when I look at my x, z data after having set y = 0, I immediately figure out which way the arrows were pointing and therefore which model was correct. This, in terms of a simple mathematical example, is the idea behind an intervention experiment: we control a particular variable in order to find out which way the relationship between x and z runs. Appreciate also one important thing, which is that we intervene with the intention of breaking the connection between x and z. The connection between x and z existed through the variable y, and had that connection been allowed to exist, it would have been hard for us to figure out which of the two models was true.
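Here is a minimal sketch of this intervention in Python. The forced assignment is often written do(y = 0) in the causal-inference literature; the simulation itself is purely illustrative:

```python
import random

# Two candidate causal models that produce identical observational data.
# Model A: x -> y -> z   (y = 2x, then z = y + 1)
# Model B: z -> y -> x   (y = z - 1, then x = y / 2)

def model_a(x, y_forced=None):
    y = 2 * x if y_forced is None else y_forced   # intervention overrides the mechanism
    return x, y, y + 1

def model_b(z, y_forced=None):
    y = z - 1 if y_forced is None else y_forced
    return y / 2, y, z

# Observation alone: both models yield triples satisfying y = 2x and z = y + 1,
# so passively collected data cannot tell them apart.
x = random.uniform(0, 10)
print("Model A observes:", model_a(x))
print("Model B observes:", model_b(2 * x + 1))   # same triple as above

# Intervention: force y = 0 regardless of its usual cause.
print("Model A with do(y=0):", model_a(random.uniform(0, 10), y_forced=0))  # z is always 1
print("Model B with do(y=0):", model_b(random.uniform(0, 10), y_forced=0))  # x is always 0
```

Running this a few times shows the point of the lecture: under intervention the two models finally predict different things, even though their observational data are indistinguishable.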
So, the way we go about it is by forcing that intermediate variable, something in the middle of a model, to essentially disconnect the left-hand side of the model from the right-hand side. You are actually performing an analysis which separates out two sectors of a model. The x is now no longer related to z, regardless of which way the arrows point, and the net result is that, because of this broken connectivity, you can quickly figure out which way the arrows truly point when you look at the x, z data. So, the whole philosophy with intervention is: figure out how to set some interesting variables to fixed values, which will then make sure that the remaining variables become separate and independent, and consequently, because they are independent, you have a much better chance of figuring out which way the arrows point. Now, let me extend this to a more practical scenario. Instead of an arithmetic example, let us talk about genetic defects in cancer. Everything we read in the public literature tells us that if you have a genetic defect, that is possibly the cause of the cancer you are seeing. But let me put it to you this way: the data we have is that there are people with genetic defects. That is a measurement, and it might require the genes of these particular individuals to be analyzed to find the defects. So, there are genetic defects per individual, and separately there is the observation by the clinician that such people have cancer. In other words, there are two variables in my analysis at the moment: the genetic defect for an individual and the cancer state of that individual. From observing that a genetic defect and cancer occur together in a patient, it is not immediately clear that one is causing the other. I am deliberately taking things back a step: I am trying not to be biased by what we already know from the public literature. Technically, all that we learn when we collect data about genetic defects and the clinical state of a tumor is that there is a relationship between somebody having a tumor and somebody having a genetic defect. In other words, we have established a correlation between genetic defects and the presence of a tumor. But how do we go one step further and claim that it is the genetic defect which causes the cancer? There are always two more possibilities. What are they? The second possibility is that the cancer causes the genetic defect. How do we know that is not true? It is possible that your biology, your biochemistry, is disturbed by the presence of a tumor, and that in turn causes a genetic defect in your DNA. So, technically, that possibility cannot be directly ruled out; it needs an experiment to figure out whether the genetic defect causes the cancer or the cancer causes the genetic defect. And by the way, there is that third scenario, the one we have been spending a lot of time on, which is that genetic defects and cancer are not themselves cause and effect but are both effects of something else. The possibility remains that x and y are caused by a third variable z: genetic defects and cancer, as an alternate hypothesis, could for example both be caused by radiation.
So, fundamentally, there are multiple scenarios which come into play the moment we realize that there are patients who simultaneously seem to have genetic defects and tumors. If I am now going to perform experiments to work out whether one thing is causing another, I cannot do it simply by looking at correlations between genetic defects and cancer. I am going to have to do a controlled experiment where I intervene: I come into the picture, deliberately change something about that patient, and then look to see whether, because of that change, the state of the patient has changed. So, what kinds of things could we deliberately do? One obvious experiment, if the hypothesis is that a genetic defect causes cancer, is to take somebody who does not have a genetic defect, deliberately create a genetic defect, and then ask whether cancer results as a consequence of that deliberate change. Now, of course, most of our ethical laws prohibit us from taking humans and causing genetic defects in them, but these are the kinds of experiments that are normally done with microorganisms or small animals. It turns out that in most cancer biology labs, people will actually go in and deliberately change the state of a gene from normal to defective, and you then ask: if that defective gene exists in this new mouse, for example, will the mouse develop a tumor after some time? That gives you the ability to work out whether it is the genetic defect causing the cancer, or whether it is the other way around. How would you do it the other way around? The other way around implies that cancer causes a genetic defect, which means you now have to somehow deliberately introduce a tumor. This typically is done by grafting tissue: you deliberately implant or graft a tumorous sample of tissue into a mouse, typically into the back of the mouse, and some time later you ask: now that I have introduced a tumor into the mouse, does it seem to cause a genetic defect? So, you make measurements of genetic defects. Both are hard experiments to do, but notice that, fundamentally, simply looking at sets of individuals with genetic defects and with cancer does not settle the question of what is causing what. You are forced into doing additional experiments where you deliberately change the state of a gene, or the presence or absence of a tumor, and then ask what is going on with the other variable or variables. In the controlled experiment, therefore, we are typically interested in manipulating or controlling the values of certain variables, and these have to be chosen carefully. I am not spending time explaining how I arrived at the insight that y should be set equal to 0, nor how one figures out how to change the state of a gene; I am not going to tell you that these are easy experiments to think about and do. The fact remains that if you are interested in working out cause and effect, you probably need a controlled experiment, and you need to figure out how to intervene and change the states of your variables in that controlled experiment. Now, it turns out that in our controlled experiments there are several possible problems, and I will list the three most common problems which hurt you as you start doing your research.
So, the most common problem is that you probably do not even know which variables are relevant to your entire analysis. In other words, you go into your experiments knowing a subset of the variables; some of these you identify, and some of these you even go ahead and control. But the fact remains that you have probably not captured the relevance of each and every variable in your model. If you are ignorant of the importance of a variable, how do you even model it, and how do you even control for it? This is a typical problem. It was a problem with that cricket example: the fact that there were alternate explanations was not even acknowledged in the original model. It was a problem with that temples-versus-crimes example: the fact that population might be a relevant variable was not acknowledged. A second, severe problem goes back to this business of consistency. We said that when you do these experiments, you really do not want to influence what goes on during the act of measurement, or for that matter during the act of controlling something. Therefore, when you manipulate or control a variable, ideally you do not want to influence the values taken on by other variables, including other hidden variables. They are all supposed to be independent variables which possibly control what you think is the final dependent variable. And so, if you are going to go about manipulating some of them, how do you know you have not also manipulated many other things which you do not know about? In the case I was discussing, genetic defects and tumors, the experiment is to go in there and create a genetic defect, so that you can then hope to see whether a tumor develops. But the possibility exists that the very act of going in and changing the state of a gene in a mouse might itself change the health of the mouse. There may now be other variables influencing the health of the mouse and, therefore, whether it develops a tumor or not. So, the act of carrying out the experiment might create unknown influences on other variables. These cannot be fully controlled; indeed, they cannot be controlled at all, because you do not even know what these other variables are. Where we are going with this is that we need to figure out a way to minimize the impact of these other variables. But at this point you have to acknowledge that many experiments have flaws in their design simply because the act of experimentation possibly influences the values of other variables in our analysis. A third typical problem involves feedback in a model. What we mean by feedback is this: we have been talking about x causing y, which in turn causes z, as a possible model. But what if z in turn also influences x? Then you have x causing y, y causing z, and z causing x: a feedback loop. The moment you have scenarios like this, where fundamentally everything is affecting everything else, there is no clear way to figure out what is cause and what is effect. And if you simply write this graphical model down, x affecting y affecting z, and z in turn affecting x, you will realize there is no easy way to untangle the feedback.
You probably need extremely precise measurements, and measurements as a function of time at that, to say that the moment you introduce a variation in x, some little time later you end up observing a small change in z, and that is what will allow you to claim that x goes to z. In other words, if you are going to claim that x goes to z given the possibility of feedback, you are probably looking at a situation where you need to make measurements as a function of time and then come up with the insight that some things happen just before other things. So, the influence of time becomes critical if you are going to deal with systems where feedback matters, but fundamentally it is very difficult to do controlled experiments where feedback is involved. Let me now wrap up my discussion of the controlled experiment with these general thoughts. As I said earlier, almost every major scientific advance has come from the controlled experiment; but it also turns out that, invariably, the first hypothesis proposed in a given field was wrong. In other words, there is no magic bullet here: you cannot come up with the precise hypothesis statement in one shot, do the perfect experiment, and conclude perfectly which way things work in terms of cause and effect. The net result is an iterative process. You make a proposition, you come up with a hypothesis, and it is OK if it is wrong: you test that hypothesis, and when it is wrong you reject it. But then you go back and revise the hypothesis, because remember, in the meantime you have done an experiment and you now have new evidence, and that new evidence starts influencing your notion of the hypothesis in the first place. So, you start modifying the hypothesis statement, and if that turns out to be wrong again, you reject it again, until you end up with a final hypothesis statement which seems consistent with the available data from all the experiments you have done so far. In other words, science progresses more or less in this kind of iterative fashion: come up with a model, test it, reject it if not satisfactory, modify the hypothesis statement where needed to better fit the observations so far, and then do new experiments, because you have to test the new, altered model, and look to see whether the new experiments contradict it, in which case you need to go back and refine the model once again, and this goes on and on. The net result is that you are probably looking at an iterative process where you do not necessarily come to a 100 percent conclusion regarding the validity of a model. This, by the way, is how most people learn. If I ask you how you learn: you do not learn everything in one shot; you cannot expect, for example, to become a top-notch researcher just because you have sat through one workshop like this. You learn your methods, you improve, you go back and get feedback on how you have done, and then you learn some more and improve some more. Learning is essentially the same approach in all domains, particularly where experimentation is involved, in terms of the controlled experiment.
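Going back to the feedback point for a moment, here is a minimal illustration of why time-resolved measurements help: in a made-up two-variable feedback system, the lagged correlations, not the instantaneous ones, carry the directional information. The coefficients below are invented for illustration:

```python
import random

# A made-up feedback system: x influences z one time step later,
# and z feeds back into x one step later still.
T = 500
x = [random.gauss(0, 1)]
z = [random.gauss(0, 1)]
for t in range(1, T):
    x.append(0.5 * z[t - 1] + random.gauss(0, 1))  # z -> x with a lag
    z.append(0.8 * x[t - 1] + random.gauss(0, 1))  # x -> z with a lag

def lagged_corr(a, b, lag):
    """Correlation between a[t] and b[t + lag]."""
    a_, b_ = a[:-lag], b[lag:]
    ma, mb = sum(a_) / len(a_), sum(b_) / len(b_)
    num = sum((u - ma) * (v - mb) for u, v in zip(a_, b_))
    da = sum((u - ma) ** 2 for u in a_) ** 0.5
    db = sum((v - mb) ** 2 for v in b_) ** 0.5
    return num / (da * db)

# With feedback, both lagged correlations are substantial; only the
# time ordering hints at which influence acts at which step.
print("corr(x[t], z[t+1]) =", round(lagged_corr(x, z, 1), 3))
print("corr(z[t], x[t+1]) =", round(lagged_corr(z, x, 1), 3))
```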
Now, we said that the controlled experiment was important to us particularly because there was a possibility that there were variables we never knew about which could influence what was going on. These we called hidden variables, and we said that if you are going to try to prove cause and effect, you must do something about the hidden variables: try to set values for as many of them as you can think of, regardless of whether they are directly involved in your model or not, and then try to figure out the connectivity between your variables. Is x causing z, or vice versa? But what happens if there are variables which you do not know about, which are hidden from you, and which you fear might influence your analysis? Clearly, you cannot directly control them; you cannot assign values to them, because you do not even know them in the first place. Given this need for a better experiment, about 100 years back Ronald Fisher developed a theory, published in what is now known as The Design of Experiments, of something called the randomized experiment. Now, the randomized experiment is not the same as the random experiment I talked about a couple of days ago. When I talked about a random experiment, I was talking about a coin toss. We called a coin toss a random experiment because we knew what values could be expected when a coin is tossed, heads or tails; we knew the range of outcomes in advance, even though we did not know the outcome of an individual trial in advance; and, more importantly, the experiment was repeatable. If I toss a coin again and again, I expect the same set of phenomena to underlie the process of tossing a coin; no sudden new result should emerge as we perform our reproducible experiment. That was the random experiment. But now we are talking of a randomized experiment, and the objective in the randomized experiment is this: how do we randomize our choice of variable values to make sure that the hidden variables which might mess up our experiment do not mess up our analysis and our interpretation? Go back to the example I gave you on the first day of my talk: I talked about the need for a brewery to figure out how to improve the production of grain, whether hops or wheat, by adding fertilizer. So, basically, the hypothesis being proposed here is that adding fertilizer will increase the yield of wheat. We take that for granted now; we know, almost to a certainty, that adding fertilizer will increase the yield of wheat. But 100 years back, how do you prove that? What is important is that, if you are testing for the influence of the addition of fertilizer on the yield of wheat that you see in a plot of land, you want to make sure that no other variable in your analysis influences the yield of wheat; nothing else must turn out to be the cause of an increase in the yield of wheat. So, how do you go about this? The brainwave that Fisher had was that you first have to do a controlled experiment, and that controlled experiment implies that you take a set of plots, which we will call sub-plots, of land.
So, this huge farm is broken up into sub-plots; in some of these sub-plots you add fertilizer, and in the other sub-plots you do not. You control how much fertilizer you add to each sub-plot and where you add it, and you then ask, systematically: will the sub-plots with fertilizer end up giving you more wheat than the sub-plots without fertilizer? Up to this point, what is being done is a controlled experiment, because you are controlling the addition of a certain amount of fertilizer, choosing how much fertilizer to add, and hoping to see that it results in an increase in the yield of wheat. Now, the problem is that there could be other reasons why there is an increase in the yield of wheat in the sub-plots to which fertilizer was added. What kinds of reasons? It could turn out that, when you divided your farm into sub-plots, you ended up adding fertilizer to sub-plots of land which were already fertile, while the remaining sub-plots, where you did not add fertilizer, had relatively rocky soil. Then you would inherently get a higher yield of wheat as a function of the fertility of the soil. That is the third variable: the fertility of the soil. Another variable could be the amount of moisture or water available to the soil: maybe the plots where the fertilizer was added also happened to have more water available to them than the other plots. How do we know that the water levels in the sub-plots were not influencing the yield of wheat? Up to this point, as I said, we have a controlled experiment: some plots will get fertilizer, some will not. But the brainwave that Fisher had was that the sub-plots which are to receive fertilizer must be randomly chosen, literally using a random number generator. Why is that important to us? Because the moment a sub-plot is randomly chosen to receive the fertilizer, there cannot be a deliberate bias in how the set of sub-plots is chosen to receive fertilizer. There is no deliberate bias in the sense of choosing sub-plots which are already fertile, or sub-plots which already have plenty of water, and consequently, because you have randomized the choice of sub-plots, you end up in a situation where the influence of all the other hidden variables, in this case availability of water and fertility of soil, if there is an influence, gets minimized. So, in the randomized experiment, the idea is to make random choices of the values of those variables you are interested in, and the reason you resort to randomness is that it minimizes the impact that might come from hidden variables in your model. The fear is that if you do not randomize, if instead you go with deliberate choices of, for example, sub-plots, then you could end up being strongly affected by a bias in how the experimenter has set up the experiment. So, what is now commonly accepted is that, in addition to doing a controlled experiment, as you choose the set-points, the values for the variables in your model, you have to pay attention to how you randomize the choice of the set-points you operate at. You have to randomize to minimize the influence of any hidden variables which might cause systematic error.
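As a toy sketch of Fisher's idea (the fertility values and effect sizes below are invented), random assignment of sub-plots balances the hidden fertility variable across the treated and control groups:

```python
import random

# A toy version of Fisher's field trial. Fertility values and the
# fertilizer effect are made-up numbers for illustration.
N_PLOTS = 20
fertility = [random.uniform(0.5, 1.5) for _ in range(N_PLOTS)]  # hidden variable

# Randomly choose which sub-plots receive fertilizer: no deliberate bias.
treated = set(random.sample(range(N_PLOTS), N_PLOTS // 2))

def yield_of(plot):
    """Hypothetical yield: driven by hidden fertility plus a fertilizer effect."""
    base = 10 * fertility[plot]
    bonus = 3.0 if plot in treated else 0.0   # the effect we want to detect
    return base + bonus + random.gauss(0, 1)

yields = [yield_of(p) for p in range(N_PLOTS)]
mean_treated = sum(yields[p] for p in treated) / len(treated)
mean_control = sum(yields[p] for p in range(N_PLOTS)
                   if p not in treated) / (N_PLOTS - len(treated))

# Because assignment was random, fertility is (on average) balanced across
# the two groups, so the difference mostly reflects the fertilizer.
print(f"treated mean yield: {mean_treated:.2f}")
print(f"control mean yield: {mean_control:.2f}")
```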
So, look at the interpretation of Fisher's randomized experiment if I write it in terms of the variables x and y: let x be the amount of fertilizer added and y the amount of seed produced. Now, if I look back at the three scenarios, x causing y, y causing x, or both x and y being caused by something else, things are actually a little clearer, given that we are performing a controlled, randomized experiment. We kind of know the answer: we know that x is going to cause y; that is what our common sense tells us now. But how do we know that y is not causing x? We know that x was fixed by the experimenter, and the fertilizer was added before the seed was formed. We have been in control of the directionality from x to y, because we did something deliberately and then saw the output from the experiment. Therefore, y could not have been the cause, because we fixed the amount of fertilizer to be added. What, by the way, would y causing x even mean? That the amount of seed produced is the cause of the amount of fertilizer you see per plot. That is a meaningless statement, because this is an experiment where somebody is deliberately adding the fertilizer. Seed by itself does not imply that fertilizer has been added; fertilizer being added is an intervention by the experimenter, who is deliberately adding it. So, that cannot be the explanation. But the third explanation remains, which is that x and y are caused, or influenced, by other variables. The randomized experiment minimizes the possibility of this third scenario, because you have randomly chosen your variable settings, trying to maximize independence from these other variables, such as soil type and soil moisture. Now, in the hypothesis testing lecture, I had talked about the variable alpha, which was interpreted as an error probability. We always set up a hypothesis and try to state whether we agree or disagree with it. Typically, we said, we want to set up a hypothesis that we want to prove wrong, in which case alpha is interpreted as the probability that the hypothesis we wish to prove wrong has been wrongly rejected: in other words, that we have accepted the alternate scenario when we should have been in agreement with the null hypothesis. That was the interpretation of alpha. If you go back to the slides, you will realize that this is not a type of error you can push to 0. We try to minimize the impact of such errors; that is why we choose alpha small, say 0.05 or lower. So, it turns out that in the randomized experiment you will randomize, all right, but you cannot randomize to the extent that you have 100 percent belief that you have found the true connection between x, y and z. There will still remain some small uncertainty as to which way the relationship runs, x causing z as opposed to z causing x. A further problem comes about if there is a large number of hidden variables. In this fertilizer experiment, for example, I listed only two variables which could possibly mess up my analysis. I said that the fertility of the soil could be a problem unless I figure out what to do with it, and that the amount of water present or added to a plot of land could be a possible explanation for the yield. Those are only two variables I can think of. What if there turn out to be 20 variables out there which could possibly influence your experiment? How many of these can you control?
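To connect this back to the hypothesis-testing recap, here is a minimal sketch of how the randomized fertilizer comparison would typically be judged at alpha = 0.05, using a standard two-sample t-test; the yield numbers are invented:

```python
from scipy import stats

# Invented yields from treated and control sub-plots of a randomized trial.
treated = [14.2, 13.8, 15.1, 14.7, 13.9, 14.4, 15.0, 14.1, 13.6, 14.8]
control = [11.9, 12.4, 11.7, 12.8, 12.1, 11.5, 12.6, 12.0, 12.3, 11.8]

# Two-sample t-test: the null hypothesis is "fertilizer makes no difference".
t_stat, p_value = stats.ttest_ind(treated, control)

ALPHA = 0.05  # the error probability we are willing to tolerate
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value < ALPHA:
    print("Reject the null: the yield difference is unlikely to be chance.")
else:
    print("Cannot reject the null at this alpha.")
# Even when we reject, alpha reminds us the conclusion is never 100% certain.
```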
How many of these can you minimize, in terms of their influence, by randomization? If there are only one or two things to control or worry about as you do your experiments, there is a reasonable chance that the randomized experiment will get you to the right result. But again, I will leave it to your intuition: if there is a large number of other variables which might explain what is going on, say, out of 20 variables you are looking at only x and y, and there are 18 other variables which could possibly influence the interpretation of x versus y, then the fact of the matter is that there is a very high probability that what you are seeing and what you are concluding is not a significant conclusion. What you are seeing between x and y is probably more easily explained by the other 18 variables than by any model you have come up with between x and y. So, you cannot minimize the impact of other variables when there is a very large number of them. It is therefore important, in good experiment design, to sit and think through and list all the possible variables that worry you, that could potentially influence you. Consider the influence of each measurement in your experimentation, ask what each of those measurements in turn depends upon, ask whether you are in control of those variables or not, and then get down to finally deciding how to do one controlled, targeted experiment, with randomization built in if possible, to work out which way the relationship goes: from x to z, or from z to x. This is such a critical issue, for example, with clinical trials, an example I have gone back to several times. In a clinical trial, where a pharma company wants to figure out whether a drug does its job or not, you are doing a controlled experiment: there is a group of people receiving the drug and a group of people not receiving the drug. So, a randomization is required. Why is the randomization required? It goes back to what I said to you on the first day: the very act of somebody putting on a white lab coat and handing over tablets to a patient could possibly influence the health of the patient. You feel better the moment you go to a doctor. Think of it: any time you get sick and you go to a doctor, you feel better, because you know that you are now going to be taken care of. So, this is a different variable; it has nothing to do with the drug improving your health; it is a psychological effect that you are seeing. These are typical hidden variables which mess up clinical trials, where the narrow question is: what is the influence of the drug on the patient? So, if I want to figure out the effect of a drug on a patient, I have to randomize, to minimize the influence of other variables.
So, what kinds of randomizations are involved? Typically there is a block randomization, where one group of patients gets the medicine itself and the other group gets a placebo. Everything else about the patients is supposed to be the same: the way they are treated, the way they are handled, who gives them the tablet which is the drug and who gives them the tablet which is not. There cannot be a difference between the shape or color of the tablet which is the drug and the shape or color of the tablet which is, for example, a sugar pill and not a drug. They have to be the same, because the moment they differ, psychology kicks in: somebody who receives a smaller-looking tablet might think they are not getting the true medicine. So, you have to give identical-looking tablets to the two sets of people, where the people with the sugar pill serve as a control for any other variable which might potentially impact this analysis. But then it gets more complicated than that. It is important for the patient not to know what they are receiving, because, of course, if you are told that you are getting a sugar pill and not a drug to cure your disease, that is going to influence you. So, these are blind trials: patients must not know what they get. And it goes one step further: in some cases, where the medicine is for a life-threatening disease, even the doctor must not know what they are giving the patient. Why? Because there is now some evidence in the literature that if a doctor knows he is handing over a sugar pill to a patient who is practically on his deathbed, then the doctor knows the patient is not going to survive, and a psychological connection is made during the act of handing over those sugar pills: something is transmitted to the patient through the body language of the doctor, and the patient realizes, subconsciously, that they have been given a sugar pill. That is known. So, the net result is that, ideally speaking, even the doctor must not be allowed to influence the clinical trial, and so the doctor does not know who is getting the drug and who is getting the sugar pill. The patient, of course, does not know either. The only person who knows what is going on in a clinical trial, it turns out, is a statistician who sits in the background, does not interact with the patients, and who has come up with a coding of patients and pills, so that patient A receives pill Z, and that mapping is known only to the statistician. So, the test is done, the clinical trial is done, data is collected, and all the data is sent back to the statistician, who, never having seen an individual, is not influenced by psychology in working out the true merit of the drug in improving somebody's health. So, one has to go to very elaborate lengths to try to prove that some variable possibly causes something else, in this case that the act of taking a drug causes the effect of improved health. If I want to prove that, I have to make sure that nothing else possibly influences this experiment. The clinical trial, if you want to read further on how to go about performing good controlled and randomized experiments, is something you want to appreciate: a lot of what is done there in statistics, in terms of experiment design, also applies to other domains, whether engineering or the humanities.
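Here is a toy sketch of that coding-key idea in a double-blind trial; the patient IDs and pill labels are invented for illustration:

```python
import random

# A toy version of the statistician's coding key in a double-blind trial.
patients = ["P01", "P02", "P03", "P04", "P05", "P06"]
random.shuffle(patients)

# Half the (shuffled) patients get the drug, half the placebo; only this
# dictionary, held by the statistician, records who got what.
coding_key = {}
for i, patient in enumerate(patients):
    coding_key[patient] = "drug" if i < len(patients) // 2 else "placebo"

# Doctors and patients only ever see identical-looking pills labelled by
# code, never the assignment itself.
blinded_labels = {patient: f"pill-{i:03d}" for i, patient in enumerate(patients)}

# After the trial, outcome data labelled by pill code comes back, and only
# the statistician can join it to the treatment assignment.
print(blinded_labels)   # what the clinic sees
print(coding_key)       # what only the statistician sees
```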
In closing, I want to give you a couple of links. There is an article by Judea Pearl, which you can find on the internet as a downloadable presentation, that goes into the art and science of cause and effect. I request you to go and look it up: a lot of the history of this subject is presented there, and there are a couple of cartoons which explain some of the concepts I have talked about. So, thank you for your attention, and now I would like to look at some of the feedback and possibly interact. This is Ufakumja college, Hyderabad. Hyderabad, do you have a question? I am Manizha. I would want to ask one question. Your sessions were informative, but we, being computer science research scholars, do our experiments based on multiple parameters. So, when we want to make a comparison, which graph representation would be best? So, the question was about a situation in computer science where there are several variables involved and where there is a need for graphical presentation of relationships between pairs of variables. It turns out there is no easy way to represent interactions between sets of variables simultaneously. I have not had time to discuss this, but there are multivariate graphical tools which indicate relationships between variables. For example, you might want to look up something called correspondence analysis; there are tools which work out correlations between pairs of variables and, like correspondence analysis, try to depict them all at one shot on a single biplot, as it is called. So, look up correspondence analysis, and look up something called a biplot. Thank you for your question. This is MES Pillai, New Panvel. Do you have a question? I have a parameter which I am considering as a random variable. How should I decide which type of distribution fits that particular random variable or parameter well? So, the question, for the rest of you, is that there is a random variable that he is working with, there is data available for that random variable, and the question is which distribution describes it. This goes systematically as follows. First, you have to ask yourself: is it a discrete variable or a continuous variable? Are you looking at a discrete set of values as outcomes, or a continuous range on the real number scale? If it is discrete, you likely have one of the binomial, Poisson or related discrete distributions. The next thing you ask, if it is discrete, is: is there an upper bound on the number of outcomes that can happen? For example, with a coin toss: if we toss 100 times, you cannot get more than 100 heads. So, if it is discrete with an upper bound on the number of outcomes, you possibly have a binomial distribution, and it is then for you to figure out the proportion of the favored event in the binomial. If, on the other hand, the number of outcomes can go up to infinity, for example the number of radioactive emissions from a radioactive object, which ranges from 0 to infinity in a given interval of time, then the variable is more likely to be a Poisson random variable: discrete, with no upper bound. If it is continuous, the first thing you ask is: is it a symmetric distribution, and do you have a large number of samples? In that case you probably have a Gaussian distribution.
Alternatively, if you have a distribution which ranges from 0 up to infinity, with some kind of skew to it, you are probably looking at something called a gamma distribution, which is a family of distributions that can take on specific shapes as a function of the values of its parameters. Now, regardless of which model precisely fits your random variable, the fact of the matter is that, if you are taking this into some kind of further analysis, you may be able to get away with one approximation of a distribution to your data set, to your histogram, as opposed to another. In other words, it is possible for the same discrete data set to be fit reasonably well by both a binomial distribution and a Poisson distribution, in which case it comes down to practical convenience as to which one you use for working out the probabilities of individual events. A lot of this is done by inspection, and a lot of it is also done by having some kind of a gut feeling, coming from your domain, as to what kinds of distributions are more likely to apply to your random variable. So, thank you for your question. Thank you.
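As a rough sketch of this decision tree in code (the thresholds and the helper function are my own illustrative choices, not a standard recipe):

```python
import numpy as np
from scipy import stats

def suggest_distribution(data, n_trials=None):
    """Heuristic sketch of the decision tree described above."""
    data = np.asarray(data)
    if np.allclose(data, data.astype(int)):            # discrete outcomes?
        if n_trials is not None:                        # known upper bound
            return "binomial", data.mean() / n_trials   # estimate p
        return "Poisson", data.mean()                   # unbounded counts
    skew = stats.skew(data)
    if data.min() >= 0 and abs(skew) > 0.5:             # non-negative and skewed
        return "gamma", stats.gamma.fit(data, floc=0)
    return "Gaussian (large symmetric samples)", (data.mean(), data.std())

# Examples with simulated data:
print(suggest_distribution(np.random.binomial(100, 0.3, 500), n_trials=100))
print(suggest_distribution(np.random.poisson(4.0, 500)))
print(suggest_distribution(np.random.gamma(2.0, 3.0, 500)))
print(suggest_distribution(np.random.normal(0, 1, 500)))
```

As the lecture notes, such a suggestion is only a starting point: inspection of the histogram and domain knowledge should drive the final choice.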