Welcome back to the course on statistics for experimentalists. Today we will be talking about regression analysis. So far we have been looking at design of experiments: factorial designs and fractional factorial designs. We will take a small break from design of experiments and look at regression analysis. You will also find a lot of similarities between design of experiments and regression analysis; for example, the analysis of variance concept will be used extensively, and the t-test will also be used in regression analysis. Simply put, regression analysis means developing empirical correlations to experimental data. We are not modelling from first principles; we are trying to find the relationship between the factors that influence the experiment and the response recorded from it. This has a lot of significance. It is not just giving the data to a spreadsheet or a curve-fitting program and getting a high value of R-squared. You all know what R-squared is, and we might aim for R-squared values of 0.99 by adding as many terms as possible to the model equation. The important thing is that models should be simple and should not be unwieldy. The more terms you have in the model equation, even though it may look impressive on paper, the more difficult it will be to apply, and the predictive capability of the model may also decline: it may work well for a given set of data but not for another set of data from some other source.
So with this brief introduction, let us get started. Rather than looking at the simple least squares method, we will be applying linear algebra concepts to do multiple regression analysis. The references for the subject are Draper and Smith, Applied Regression Analysis, third edition, Wiley India, New Delhi, and the book by Montgomery and Runger, Applied Statistics and Probability for Engineers, fifth edition, Wiley, New Delhi, which is the prescribed textbook for this course on statistics for experimentalists. You also have the book by Montgomery, Design and Analysis of Experiments, and another good book by Kutner, Nachtsheim and Neter, Applied Linear Regression Models, fourth edition, McGraw-Hill, New Delhi. So now let us come to the multiple regression model. We will take a simple model first, y = beta_0 + beta_1 x_1 + beta_2 x_2 + epsilon, where y is the response, beta_0, beta_1 and beta_2 are the model parameters, x_1 and x_2 are the factors or independent variables, and epsilon is the error term. This is a very interesting model: we say that the response y is governed by a combination of the 2 factors x_1 and x_2. The important question is whether this error term is due to random effects alone. We have started cautiously with the simple model, and the unaccounted extra effects are ascribed to this error term. The error may be purely random, or it may be a combination of random effects and the portion left unexplained by the model. If the model is inadequate, the error term will absorb the unaccounted part of the response, and in such situations the epsilon term cannot be thought of as random error. Now, we are talking about multiple regression models; why is it called multiple regression?
x_1 and x_2 are called regressor variables, and when there is more than one regressor variable we use the term multiple. The term linear is associated with these kinds of regression models because the parameters beta_0, beta_1 and beta_2 enter linearly. If you want to make the model nonlinear, you can write y = sin(beta_1 x_1) or y = beta_0 + e^(beta_1 x_1); then you cannot call the regression model linear, because the parameters enter nonlinearly. Before we proceed further into the course, I would like to recommend something: please refresh your concepts of linear algebra. You do not have to go very deep into the subject to understand what is covered in this course. If you are new to it, or you did your maths a long time back, there is nothing to worry about. You can take an elementary book on linear algebra and look up how to express numbers or arrays in suitable matrix form, what the dimensions of a matrix are (how many rows and how many columns), and what is meant by the inverse of a matrix. You do not have to go into the techniques of finding the inverse; different software tools are available for that. You should also know about matrix addition, matrix multiplication, and multiplying a matrix with a column matrix: when such a multiplication is possible and what the dimensions of the two matrices must be. These are the basic concepts you should become familiar with, and I would imagine it would take you a couple of days at most to refresh them.
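If you have Python available, the handful of matrix operations the lecture asks you to refresh can be tried out in a few lines of numpy. This is only an illustrative sketch, not part of the lecture material; the numbers are arbitrary.

```python
import numpy as np

# A 2 x 2 square matrix and a 2 x 1 column vector.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([[5.0],
              [10.0]])

# Dimensions: rows first, then columns.
print(A.shape)  # (2, 2)
print(b.shape)  # (2, 1)

# Multiplication is allowed because A has 2 columns and b has 2 rows;
# the result is again a 2 x 1 column vector.
c = A @ b

# The inverse of A "undoes" the multiplication: A_inv @ c recovers b.
A_inv = np.linalg.inv(A)
print(np.allclose(A_inv @ c, b))  # True
```

The same shape rules carry over directly to the regression matrices used later in the lecture.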
If you can do that, the matrix manipulations will become very straightforward and we can carry out the regression analysis more efficiently. Coming back to the multiple regression model, there is more than one independent variable. The independent variables x_1 and x_2 are called regressor variables, and beta_1 and beta_2 are called regression coefficients. beta_0 is the intercept; beta_1 is called partial regression coefficient 1 and beta_2 is partial regression coefficient 2. Terminology is very important in statistics and statistical analysis, so it is important to define it upfront. The term partial is used because beta_1 refers to the expected change in the response due to a change in x_1 with x_2 kept constant: what is the expected change in y when x_2 is held fixed and x_1 is varied? Similarly, beta_2 is the expected change in the response due to a change in x_2 with x_1 kept constant. There is a very famous diagram for linear regression; for simplicity I have shown just 2 points, though there are obviously more, and I am taking only a section of the diagram. With only 2 experimental observations I could have fitted a straight line passing exactly through those 2 data points. However, imagine that there are more data points, only a few of which are shown. My objective in linear regression is to make a line pass through such experimental data points in a way that satisfies certain mathematical criteria, which I will explain in more detail later; essentially, the concept of least squares applies here. For each point there is a deviation between the experimental data point and the model prediction line.
So we are trying to balance the deviations between the data and the prediction. What we do is find the deviation between each experimental data point and the model prediction, and square it. We square the deviations of all the experimental data points from the model prediction, and the sum of these squared deviations is minimised to find the parameters beta_0 and beta_1. You might have come across this already as the least squares method; we are using the same concept. The only difference is that we will be doing it with matrix manipulations, so that large amounts of data can be handled efficiently. Let us assume that the line given here is the true line; in other words, it accurately represents the relationship between y and x, but the experimental data points deviate from it because of random effects. We talk about the error term being normally distributed with constant variance, so the experimental data points may lie anywhere in the spread around the true line. The mean of this distribution is beta_0 + beta_1 x, which is also the expected value of the response. The data are scattered around it because of random effects, and the variance of this probability distribution is sigma squared. When you move to the next value of x, say from x_A to x_B, again there is an expected value of y, and the actual data lie around it because of random fluctuations. Again you describe a normal distribution around this mean value and expect the actual experimental data to lie somewhere within it. What this really shows is that the probability of an experimental data point lying further and further away from the true line becomes smaller; we expect the experimental data points to lie close to the straight line. So this is the basic concept.
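As a quick numerical illustration of this idea, here is a small numpy sketch (with toy numbers of my own, not from the lecture) that computes beta_0 and beta_1 from the closed-form expressions that minimise the sum of squared deviations for a straight line:

```python
import numpy as np

# Toy data scattered around the true line y = 2 + 3x.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 4.9, 8.2, 10.8, 14.1])

# Least squares estimates for a straight line, from the usual closed form:
#   beta1 = S_xy / S_xx,  beta0 = ybar - beta1 * xbar,
# which minimise the sum of squared deviations sum_i (y_i - b0 - b1 x_i)^2.
xbar, ybar = x.mean(), y.mean()
beta1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
beta0 = ybar - beta1 * xbar

residuals = y - (beta0 + beta1 * x)
print(beta0, beta1)             # close to 2 and 3
print(np.sum(residuals ** 2))   # the minimised sum of squared deviations
```

The matrix formulation developed later in the lecture gives exactly these values for the straight-line case, but scales to many regressors.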
It is important that these distributions describing the scatter of the experimental data around the true line are normal and have constant variance. To repeat: the distribution describing the scatter of the experimental data about the mean, or true line, is normal with variance sigma squared, and all the experimental data points are described by a normal distribution with mean beta_0 + beta_1 x and variance sigma squared. All these distributions have the same variance. So each response is assumed to belong to a normal distribution centred vertically at the coordinate given by the regression line, and the variances of all these normal distributions are assumed to be identical. Now let us be a bit more ambitious: instead of one or two independent variables, let us talk about many independent variables, or many regressor variables. We have a mathematical model explaining the relationship between the experimental response and the independent variables, and the unaccounted portion is due to experimental or random error. We hope that the model we propose is adequate to describe the systematic dependence of y on x_1, x_2, and so on up to x_k. So we have k regressor variables, x_1, x_2, ..., x_k. Each regressor variable is associated with a coefficient: beta_1, beta_2, and so on up to beta_k. beta_0 is not associated with any regressor variable. The coefficients beta_1 through beta_k associated with the k regressor variables are called partial regression coefficients. What is the significance of beta_0? beta_0 is the intercept.
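The constant-variance assumption can be checked by simulation. Here is a short sketch (the true line and sigma below are my own illustrative values) that draws every observation from a normal distribution centred on the true line with the same sigma at every x:

```python
import numpy as np

rng = np.random.default_rng(0)

# True line y = 5 + 2x; every observation is drawn from a normal
# distribution centred on the line, with the SAME variance sigma^2
# at every x (the constant-variance assumption).
beta0_true, beta1_true, sigma = 5.0, 2.0, 0.5
x = np.linspace(0.0, 10.0, 200)
y = beta0_true + beta1_true * x + rng.normal(0.0, sigma, size=x.size)

# The scatter of the data about the true line estimates sigma.
residuals = y - (beta0_true + beta1_true * x)
print(residuals.mean())  # near 0
print(residuals.std())   # near sigma = 0.5
```

Points far from the line are rare, exactly as the lecture's picture of normal distributions stacked along the regression line suggests.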
So let us imagine you have a model y = beta_0 + beta_1 x_1 + epsilon; the prediction equation would be y-hat = beta_0-hat + beta_1-hat x_1, and beta_0 in that case represents the intercept: the response predicted when x_1 goes to 0. In the multi-regressor sense, when x_1, x_2, ..., x_k all go to 0, beta_0 is the predicted value of y. Please note that this is not a law; it is only an empirical model attempting to explain the dependence of y on the different variables x_1 through x_k. For real-life experiments we may not know the actual functional relationship between the response and the influential factors. Sometimes the process may be very complicated, and the equations describing it may not admit an analytical solution. In such cases, rather than working with a very difficult mathematical model, we try to understand the process behaviour through a simple empirical model. So the regression equation represents an approximate relationship between the response and the experimental factors over a narrow range. When we do experiments, we do the runs only over a certain range, and that range may be defined by practical constraints: you may not be able to achieve a relative humidity greater than 100%, and you may not have a relative humidity less than 20% when you are doing the experiments; the temperature range in which you carry out the experiments may be between 20 and 70 degrees centigrade. These are the defined ranges for your experimental variables, and when you use the experimental observations to develop an empirical correlation, please note that you cannot extrapolate the correlation to values higher or lower than what you considered. This is very important, because when you change the range of your experimental observations, the model parameters may also change.
You are assuming a certain relationship, and that relationship may change. In fact, when you calibrate instruments you may find that the same calibration line does not apply over the entire concentration range: when the concentration crosses a certain value, you may have to come up with a different calibration line. So the important thing is to note the range of the variables being considered in the present phase of experimental work and to develop the correlation to account for variations within that range. A multiple regression model involving interaction between variables may be written as y = beta_0 + beta_1 x_1 + beta_2 x_2 + beta_12 x_1 x_2 + epsilon. In the design of experiments we saw that interaction effects can play a very major role, sometimes even dominating the main factors, and here even interactions can be accounted for in the regression model: we simply put a regression coefficient beta_12 on the x_1 x_2 term. If you choose, you can also put a coefficient beta_112 on an x_1 squared x_2 term. The choice of the model is yours, but you can keep extending it only up to a point; you cannot extend the model indefinitely. The simple reason is that you have a finite set of observations, and to solve any set of equations you have to make sure the number of parameters is less than the number of experimental observations. When the number of parameters equals the number of observations you get a perfect fit, but usually with n experimental observations the number of parameters we estimate from the regression model will be smaller. I will just explain this portion once again: I was telling you that you can keep adding more and more terms to this model.
But you cannot do so beyond a certain point. The important reason is that you have to look at the number of experimental observations: if the number of model parameters is greater than the number of observations, you cannot find them; if the number of parameters equals the number of observations, you get an exact fit. Usually the number of experimental data points is quite high, say 40 or 50, while the number of parameters you estimate in the model may be 5 or 6. So the number of model parameters should be smaller than the number of experimental observations, and it certainly cannot exceed it. Within this constraint we can try the effect of adding more terms to the regression model. Adding a term like x_1 x_2, or x_1 squared, or x_1 squared x_2 does not make the regression model nonlinear. As I said before, the regression coefficients are still linear in nature. Please note that x_1, x_2 and x_1 x_2 are not the unknowns; the unknowns in this equation are beta_0, beta_1, beta_2 and beta_12, and they all enter linearly. The product terms can even go up to higher orders, but as long as we have simple beta_0, beta_1, beta_2, beta_12, the estimation is still termed a linear regression procedure. If you are confused by the notation beta_12, you can simply define x_3 = x_1 x_2 and call beta_12 beta_3. That is what I have done here: y = beta_0 + beta_1 x_1 + beta_2 x_2 + beta_3 x_3 plus the error term. The error term, as I said earlier, may be only random error, or it may also include the unaccounted part of the response. When you give the model to the software, once the regression model is estimated we can get a three-dimensional graph for this particular case, a response surface, and that response surface need not be planar; especially if the x_1 x_2 term is significant, the response surface may be curved.
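To see that a model with an interaction term is still linear in the parameters, here is a short numpy sketch (simulated data with coefficients I chose purely for illustration) that treats x_1 x_2 as just another column x_3 and solves by ordinary linear least squares:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40  # number of experimental observations, well above the 4 parameters

# Two regressors plus their interaction; the model
#   y = b0 + b1*x1 + b2*x2 + b12*x1*x2 + eps
# is still LINEAR in the parameters, even though x1*x2 is a product.
x1 = rng.uniform(0.0, 1.0, n)
x2 = rng.uniform(0.0, 1.0, n)
y = 1.0 + 2.0 * x1 - 1.5 * x2 + 4.0 * x1 * x2 + rng.normal(0.0, 0.05, n)

# Treat the interaction as just another column x3 = x1*x2, so the
# ordinary linear least squares machinery applies unchanged.
X = np.column_stack([np.ones(n), x1, x2, x1 * x2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # close to [1, 2, -1.5, 4]
```

Quadratic terms like x_1 squared would be handled the same way, as extra columns of the X matrix.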
It may even have a peak: if you have terms like x_1 squared or x_2 squared there may be a maximum, but that does not mean the linear regression concepts are being violated. The model parameters are still linear, and the estimation procedure is still called the linear regression technique. A more complicated model adds the quadratic terms in addition to the interaction term; you simply call them x_3, x_4, x_5 and use the same procedure to estimate the parameters. What that procedure is, I will come to in a moment. Now we are going into the matrix approach. A matrix is a two-dimensional representation of numbers: data are presented in an array form comprising both rows and columns. In this array you define 2 indices, i and j, where i is the row index and j is the column index. So if I say x_ij, I am talking about the number present in the ith row and jth column of the matrix. You can also have x_ijk, which would be a three-dimensional array, but we are not going to use such arrays in our analysis; we will only use 2 indices. If you have x_23, for example, it refers to the number in the second row and third column of the matrix. Please note that x_23 need not always equal x_32; only in certain special matrices is x_23 equal to x_32, otherwise the entries may be distinct. That is a brief background on matrices; you will find a lot more information in standard textbooks, but as I said earlier, please do not go too deep into the subject, just learn what is required for our present analysis. So we have k regressor variables, x_1, x_2, ..., x_k, and n observations.
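The row-column indexing convention looks like this in numpy (which indexes from 0 rather than 1); the entries below are chosen so each number displays its own row-column position:

```python
import numpy as np

# A 3 x 3 matrix; entry "rc" sits in row r, column c (1-based).
x = np.array([[11, 12, 13],
              [21, 22, 23],
              [31, 32, 33]])

# numpy indexes from 0, so x_23 (second row, third column) is x[1, 2].
print(x[1, 2])  # 23
print(x[2, 1])  # 32; in general x_23 need not equal x_32
```

Only for symmetric matrices is x_23 always equal to x_32.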
So you can represent this as x_i1, x_i2, x_i3, ..., x_ik and y_i. This may look a bit confusing, so let us look at y_i first: y_i, with i running from 1 to n, represents the n observations of the experiment. The values x_i1, x_i2, ..., x_ik are required because each experimental setting gives one equation. x_i1 represents the value taken by the first independent variable for the ith run; x_31, for example, is the value taken by the first independent variable for the third experimental run, and x_32 is the value taken by the second independent variable for the third run. So we write x_ij with i running from 1 to n and j running over the k regressor variables. The model for the ith run or ith experiment may then be written with beta_0, which is only the intercept, carrying no additional subscript; it is universal to all the experimental runs. beta_1, beta_2, ..., beta_k are also universal to all the runs. We are not estimating a separate set of regression parameters beta_1 through beta_k for each experimental run; we have a group of experimental data, and for the entire group we find the parameters beta_1, beta_2, ..., beta_k. But the experimental conditions will vary: for n experiments you may have n different combinations of experimental conditions, given by x_i1, x_i2, ..., x_ik. The independent variables or factors influencing the experiment run from 1 to k, but they may take different values at different experimental settings, indexed by i running from 1 to n. Also please note that the number of experimental observations n is usually greater than k, the number of regressor variables.
So now we can represent it in matrix notation as y = X beta + epsilon. Here y is y_1, y_2, ..., y_n, a column vector: it is called a vector because it has only one column, and since it has n entries you can call it an n-dimensional vector. These entries represent the different observations from your experiment; you have done n such experiments. Then you have X, the main matrix, which is not a square matrix. The first column of the X matrix is always 1. Why do you need the 1s? In order to account for multiplication with beta_0: remember, beta_0 is not associated with any regressor variable. It is the constant term in the model equation without any regressor variable attached, the intercept if you want to interpret it physically, and so we have a 1 in the first column. Then you have x_11, x_12, x_13, ..., x_1k. Here x_11 is the value taken by the first regressor variable for the first experiment, x_12 is the value taken by the second regressor variable for the first experiment, x_13 the value taken by the third, and so on up to x_1k, the value taken by the kth regressor variable for the first experimental condition. It is not necessary that all the values in a row be different: in some cases two independent variables may be kept constant while the other variables are varied. That is fine, although there are some precautions to take, which I will come to a bit later. What I am trying to say is that for a given experimental condition, it is not absolutely essential that the values taken by the different regressor variables be different. So how many such rows will you have?
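Constructing the X matrix with its leading column of ones takes one line in numpy. A minimal sketch, with made-up run values (three runs of two regressor variables):

```python
import numpy as np

# Three experimental runs (n = 3) of two regressor variables (k = 2).
# Each row of the raw data holds x_i1, x_i2 for run i.
runs = np.array([[1.2, 30.0],
                 [0.8, 45.0],
                 [1.5, 60.0]])

n, k = runs.shape
p = k + 1  # number of parameters, including the intercept beta_0

# Prepend a column of ones so that the first column multiplies beta_0.
X = np.column_stack([np.ones(n), runs])
print(X.shape)  # (3, 3): n rows, p = k + 1 columns
print(X[:, 0])  # all ones, the intercept column
```

Note that the second run repeats no value of the first, which is allowed; what matters is the overall structure, one row per experimental condition.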
The number of rows corresponds to the number of experiments run, so the last row is x_nj, with j running from 1 to k. Beta is again a column vector, running from beta_0, beta_1, ..., beta_k. You may think: we earlier defined the response vector with n entries, an n-dimensional vector, so in order to make the matrix representation consistent, should k not be equal to n? The simple answer is that it is not strictly necessary: you can have k less than n. How the matrices align themselves so that there is no inconsistency we will see very shortly. You also have the error terms epsilon_1, epsilon_2, ..., epsilon_n; so far the error term has been doggedly, persistently accompanying the regression equation. Now, y is an n by 1 vector of the observations: normally, when you give matrix dimensions, the row number comes first, so n is the number of rows and 1 the column index. X, whose terms I explained in the previous slide, has n rows and k + 1 columns: k columns for the k regressor variables, plus 1. We call p = k + 1, so X is an n by p matrix of the levels of the independent variables. Beta is a p by 1 vector of the regression coefficients, and epsilon is an n by 1 vector of the random errors. Now you have y = X beta + epsilon, and we want to estimate the parameters beta_0, beta_1, beta_2, ..., beta_k. When we talk about estimated or predicted parameters, to distinguish them from the true parameters given by the beta column vector, we put a hat on them to show that they are predicted values.
The beta column vector contains the true values, but we do not know the true values from experiments; we can only estimate the parameters, and to show that they are estimates we put the hat symbol: beta_0-hat, beta_1-hat, ..., beta_k-hat. Once you have estimated these parameters, you put forth a prediction equation, y-hat = beta_0-hat + beta_1-hat x_1 + beta_2-hat x_2 + ... + beta_k-hat x_k. Notice that the error term has vanished. We are unable to account for the error term with this equation: it gives only the systematic variation of y-hat due to changes in the controlled factors x_1 through x_k, and does not explain the unaccounted or random phenomena. That is why the predictive equation is written with y-hat and has no error term. Now we have to find the least squares estimator, but before we do, I just want to take another look at the system of equations. Look at y = X beta + epsilon: y is n by 1, X is n by p, beta is p by 1, and epsilon is n by 1. When you do the matrix multiplication, (n by p) times (p by 1) gives n by 1: the inner dimension p cancels, and you have n by 1 on both sides. What I am trying to say is that the dimensions of the column vector on the left must equal the dimensions of the resulting vector on the right. X is not a column vector; it is an array of n rows and p columns, while beta is a column vector of p rows and one column. So when I multiply n by p with p by 1, the p cancels out and I get n by 1. The given equation is therefore consistent, and even though beta has terms beta_0, beta_1, ..., beta_k, k need not be equal to n. Can k be less than n? Yes. Can k be equal to n? That is a bit dicey.
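The dimension bookkeeping can be checked directly in code. In this sketch the hat values are illustrative placeholders, not estimates from any real data set:

```python
import numpy as np

# Suppose the parameters have already been estimated (the hat values
# below are made up for illustration); p = k + 1 = 3 here.
beta_hat = np.array([[0.5],    # beta_0 hat
                     [2.0],    # beta_1 hat
                     [-1.0]])  # beta_2 hat  -> p x 1 column vector

# Four experimental settings (n = 4), first column of ones for beta_0.
X = np.array([[1.0, 1.0, 2.0],
              [1.0, 2.0, 1.0],
              [1.0, 0.0, 3.0],
              [1.0, 3.0, 0.0]])  # n x p

# (n x p) @ (p x 1) -> (n x 1): the inner dimension p cancels,
# exactly as the lecture's dimension argument says.
y_hat = X @ beta_hat
print(y_hat.shape)  # (4, 1)
```

Note there is no error term in this computation: y_hat carries only the systematic part of the model.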
We can say k = n - 1, yes, because you also have beta_0, and k greater than n is definitely no. This is what you have to keep in mind: you have n observations or experimental runs, you have k + 1 = p parameters, and you have n equations. The equations are y_1 = beta_0 + beta_1 x_11 + beta_2 x_12 + ... + beta_k x_1k + epsilon_1; the second equation, y_2 = beta_0 + beta_1 x_21 + beta_2 x_22 + ... + beta_k x_2k + epsilon_2; and so on up to the nth data point, y_n = beta_0 + beta_1 x_n1 + beta_2 x_n2 + ... + beta_k x_nk + epsilon_n. These represent the n equations, and you know that with n equations you can solve for at most n unknowns. Let us now come to the least squares estimators of beta; we will discuss this in the next class.