Hello, and welcome to this session. This is Professor Farhat, and in this session we will look at regression analysis for estimating cost. In the prior sessions, we looked at the high-low method, and we also looked at account analysis and the engineering estimate.

Before we start, I would like to remind you to connect with me on LinkedIn if you haven't done so. YouTube is where you need to subscribe. I have 1,800-plus accounting, auditing, tax, finance, and Excel tutorials. If you like my lectures, please like them and share them. If they benefit you, they might benefit other people, so why not share the wealth? Also, check out my website, farhatlectures.com, for additional resources for this course as well as your other accounting and finance courses, especially if you are studying for your CPA exam.

So let's go ahead and get started. In the prior session, we looked at the high-low method. If you don't remember the high-low method, we basically end up with a formula for a straight line, Y = A + BX, where A is the fixed cost, BX is the variable cost, and Y is the total cost. We use only two data points to define this line, the highest point and the lowest point, and that's why it's called the high-low method. Well, guess what? This is not a good measure. When we got the straight-line formula, Y = A + BX, I plugged in another number from the data that we have, and I showed you that the line does not properly measure all the data. It works perfectly only for the highest point and the lowest point.

So what's the solution? The solution is to go through a regression analysis, where we analyze all the data points to estimate this line. Rather than just two data points, the regression uses all of them. And nowadays it's easy to run a regression. You can do so in an Excel sheet.
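To make the contrast between the two methods concrete, here is a minimal Python sketch that fits Y = A + BX both ways. The six data points are hypothetical, made up for illustration, and are not the data set used in this lecture; the function names `high_low` and `least_squares` are also just illustrative.

```python
# Hypothetical monthly data (NOT the lecture's data set).
hours = [100, 150, 200, 250, 300, 350]                  # X: labor hours
overhead = [25000, 24000, 29000, 27500, 32000, 31000]   # Y: overhead cost

def high_low(x, y):
    """Fit Y = a + bX using only the highest- and lowest-activity points."""
    lo = min(range(len(x)), key=lambda i: x[i])
    hi = max(range(len(x)), key=lambda i: x[i])
    b = (y[hi] - y[lo]) / (x[hi] - x[lo])   # variable cost per hour
    a = y[hi] - b * x[hi]                   # fixed cost
    return a, b

def least_squares(x, y):
    """Fit Y = a + bX by ordinary least squares, using every data point."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

hl_a, hl_b = high_low(hours, overhead)
ls_a, ls_b = least_squares(hours, overhead)
print("high-low:   fixed", round(hl_a, 2), "variable", round(hl_b, 2))
print("regression: fixed", round(ls_a, 2), "variable", round(ls_b, 2))
```

With this made-up data the two methods give different fixed and variable cost estimates, because the high-low line ignores the four months in the middle.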
If you have Excel, you will find my instructions on how to run the data analysis and regression in the description below. The regression output also shows us how good the model's fit is. In this session, I am not going to run the analysis; if you want to know how to run a regression analysis, please look in the description.

In this session, I'm going to use the same data that we used earlier for JK Renovation, and we're assuming that direct labor hours affect our overhead cost. Simply put, labor hours are X and overhead cost is Y, so X predicts Y: more or fewer labor hours will predict the overhead cost. We ran the analysis, and this is the output. What I'm going to do here is read the output for you.

We're going to look at a few important numbers that you need to be aware of, starting with multiple R, which is 0.8561. Multiple R is the correlation coefficient. It tells us whether there is a relationship between the dependent and the independent variable, and it ranges between negative one and one. As multiple R approaches one or negative one, it means the model is a good predictor; here, it would mean direct labor hours and overhead cost are strongly related. As this number approaches zero, there is no relationship between the two factors that you are trying to study. Here, the number is pretty high, 0.8561, or approximately 0.86, so the two variables, overhead cost and labor hours, are related. So that's the first one.
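For readers who want to see where a correlation coefficient comes from, the following is a small sketch of the standard Pearson formula in plain Python. The data is hypothetical and is not the JK Renovation data, so the result will not match the 0.8561 from the output.

```python
import math

def correlation(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, for illustration only.
hours = [100, 150, 200, 250, 300, 350]
overhead = [25000, 24000, 29000, 27500, 32000, 31000]

r = correlation(hours, overhead)
print("correlation:", round(r, 4))

# A perfectly straight upward line gives a correlation of 1.
perfect = correlation([1, 2, 3, 4], [10, 20, 30, 40])
print("perfect line:", round(perfect, 4))
```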
The second number we're going to look at is R square. R square is the coefficient of determination, and it shows how well X explains Y. So how well do labor hours explain the overhead cost? In other words, what proportion of the variability in Y is explained by knowing X? Here the number is 0.733, or 73.3%. R square can go from zero to one. As it approaches one, it means X does a strong job of explaining Y. As R square goes down toward zero, there's less explanatory power; it means the independent variable explains little of the dependent variable. If it's exactly zero, X explains none of the variation in Y. Here we say that labor hours explain 73.3% of the variation in overhead cost. This is what the 73 means.

Next is the standard error, which is 4,963. It shows us how tightly the actual data points fit the line. You remember we have the line, so how close are the data points to it? The smaller, the better. But again, it's not as important as multiple R and R square.

Then we have observations, which is 15. That is how many data points we used to run this analysis: we used 15 months' worth of data.

The other important numbers we need to be aware of from this analysis are the coefficients, starting with the intercept. This number is basically the fixed cost. So based on our formula, the fixed cost is 20,378. It's where the line crosses the Y-axis. Remember, we have the X-axis and the Y-axis, and the line crosses the Y-axis at 20,378. It means that at an activity level of zero, we still have a fixed cost of 20,378. Then there is the labor hour coefficient, which is another important number: 34.63, or 34.64 with rounding.
This tells us the variable cost. Now we have the fixed cost, and we also have the variable cost. Those are very important figures. Why? Because those two figures are going to help us set up the formula for the line: total cost equals the fixed cost plus the variable cost per unit times the independent variable. So overhead cost equals the fixed cost of $20,378 plus the variable cost of about $34.64 (I rounded it up a little) times the labor hours. This is the equation of our line. And this line was estimated using all the data points, which is why it's better than the high-low method. Under the high-low method, how many points do we use? Just two. Here we're using the whole data set.

So let's assume B equals zero. What does that mean? It means the labor hour coefficient is zero. If B equals zero, we have no variable cost, so all the cost is fixed: if you multiply X by zero, you still get zero, and what remains is the fixed cost.

Let's take a look at the t-stat. That's also an important number. The t-stat is used to determine whether a coefficient, such as the slope of the line, is significantly different from zero. How do we compute the t-stat? We take the coefficient and divide it by its standard error. Here we get 4.59. Is this good enough? As a rule of thumb, if the t-stat is greater than two, we say that the coefficient is significant. Notice that the labor hour coefficient and the intercept are both significant, because both t-stats are greater than two.

Also, we can look at the p-value. The p-value tells us whether the relationship between X and Y could be a coincidence or not.
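The cost equation and the t-stat rule of thumb described above can be sketched in a few lines of Python. The intercept and the rounded labor-hour rate come from the output discussed in this lecture; the 500 labor hours and the coefficient and standard error passed to `t_stat` are hypothetical inputs chosen only for illustration.

```python
FIXED_COST = 20378       # intercept from the regression output
VARIABLE_RATE = 34.64    # labor-hour coefficient, rounded as in the lecture

def estimated_overhead(labor_hours):
    """Total overhead = fixed cost + variable rate x labor hours."""
    return FIXED_COST + VARIABLE_RATE * labor_hours

def t_stat(coefficient, standard_error):
    """t-stat = coefficient divided by its standard error."""
    return coefficient / standard_error

def is_significant(t):
    """Rule of thumb from the lecture: a t-stat beyond 2 is significant."""
    return abs(t) > 2

# At zero activity, only the fixed cost remains.
print(estimated_overhead(0))

# Hypothetical month with 500 labor hours.
print(round(estimated_overhead(500), 2))

# Hypothetical coefficient and standard error.
example_t = t_stat(10.0, 2.0)
print(example_t, is_significant(example_t))
```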
Well, the chance that this relationship is a coincidence is 0.0005044, which is a low number. It means there's a low chance that this relationship is by coincidence. It could be, but the chance is very small. So you want the p-value to be as close to zero as possible: the more zeros you have after the decimal point, the better off you are. Notice the output says E negative five, which is scientific notation for a very small number. So the relationship seems to be significant based on the p-value as well as the t-stat; they basically tell us the same thing. We want the p-value to be small because that tells us the relationship is not by coincidence.

Then we have the confidence interval, the lower 95% and the upper 95%. It's computed from the coefficient and its standard error: you take the coefficient minus a multiple of the standard error (the critical t-value, roughly two) to get the lower bound, and the coefficient plus that same amount to get the upper bound. What does it mean when we have a lower 95% and an upper 95%? Simply put, we are 95% confident that the true fixed cost and the true variable cost fall between their lower and upper bounds. It means that for what we computed here, the fixed cost and the variable cost, we are 95% sure that the true numbers are within those ranges. We are taking a 5% chance that we could be wrong, but 95% is a good confidence level. You could increase it to 99%, or you could reduce it to 90%, but 95% is the standard in the industry. So practically, we can say we are 95% sure that the fixed cost and the variable cost are within the ranges the output shows. That's basically what we are saying.

Now, the R square for predicting overhead is 73.3%, and that's a high R square.
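As a rough illustration of how the lower and upper 95% bounds are obtained, spreadsheet regression output generally computes them as the coefficient plus or minus a critical t-value times the standard error, not plus or minus one standard error. The sketch below assumes a simple regression with 15 observations, which leaves 13 degrees of freedom and a critical t-value of about 2.160 from a t-table. The coefficient and standard error in the example call are hypothetical.

```python
# Critical t-value for a 95% two-sided interval with 13 degrees of freedom
# (15 observations minus 2 estimated coefficients), taken from a t-table.
T_CRITICAL_95_DF13 = 2.160

def confidence_interval(coefficient, standard_error,
                        t_critical=T_CRITICAL_95_DF13):
    """Return (lower, upper) bounds: coefficient +/- t_critical * std error."""
    margin = t_critical * standard_error
    return coefficient - margin, coefficient + margin

# Hypothetical coefficient and standard error, for illustration only.
lower, upper = confidence_interval(100.0, 10.0)
print(round(lower, 2), round(upper, 2))
```

A narrower interval means the coefficient is estimated more precisely; moving to 99% confidence would use a larger critical value and widen the interval.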
Still, management may sometimes wish to see whether a better estimate can be obtained by using an additional predictor variable. What we did so far is use only one variable. Let me go back to the data. We used only labor hours; remember, that is the X used to predict Y. So what can we do? Well, guess what? Let's run another analysis where we also include material cost in the equation. This becomes a multiple regression. We no longer have only one independent variable; we have two, X1 and X2, and the second variable is material cost. Adding a relevant variable can help, because you're trying to explain what is changing your dependent variable.

Assume JK Renovation has determined that material cost, as well as labor hours, can affect overhead. So now we're adding material cost alongside labor hours (hours, not labor cost). The results of using both labor hours, X1, and material cost, X2, as predictors were obtained using the spreadsheet regression. This is what the formula looks like now: overhead cost equals the intercept, plus the labor hour coefficient times labor hours, plus the material cost coefficient times material cost. Once we run the regression, we find that the variable cost is $3.32 for material cost and $17.84 for labor hours.

Once we run this regression, notice that the correlation coefficient is now 0.951, which is higher than the 0.8561 we had earlier. And R square, which is the important one, used to be 73%. Now we can say that labor hours and material cost explain 90% of the variation in overhead cost. Is this a better explanation? Sure it is. There's still some 10% of overhead cost that is not explained by those two variables. And don't worry about the adjusted R square.
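The two-variable cost formula can be sketched the same way. The two rates below are the coefficients quoted in this lecture; the intercept is not stated in this part of the lecture, so it is left as a required argument rather than guessed, and the inputs in the example call are hypothetical. The last lines simply square the multiple R of 0.951 to show where the roughly 90% figure comes from.

```python
LABOR_HOUR_RATE = 17.84     # coefficient for labor hours (X1)
MATERIAL_COST_RATE = 3.32   # coefficient for material cost (X2)

def estimated_overhead(intercept, labor_hours, material_cost):
    """Overhead = intercept + b1 * labor hours + b2 * material cost.

    The intercept must be supplied by the caller; it is not given here.
    """
    return (intercept
            + LABOR_HOUR_RATE * labor_hours
            + MATERIAL_COST_RATE * material_cost)

# Hypothetical inputs with a placeholder intercept of zero,
# used only to show the variable portion of the estimate.
variable_part = estimated_overhead(0.0, 100, 1000)
print(round(variable_part, 2))

# R square is the square of multiple R.
multiple_r = 0.951
r_squared = multiple_r ** 2
print(round(r_squared, 3))  # 0.904
```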
It's similar to R square, but adjusted for the number of predictors. The correlation coefficient for this equation is 0.95, and again, don't worry about the adjusted R square. This is an improvement over the result obtained when the regression included only labor hours, a single predictor. Now we have two predictors, which is called a multiple regression.

Now, what are some problems with using regression to come up with your estimated cost? One is that you could be using inappropriate data. And even if you are using appropriate data and the data is good, you could draw a bogus relationship between two factors. For example, you would use direct labor to predict material cost when no relationship exists. So when you're using regression, you have to make sure that what you are doing makes sense in the first place.

Another issue with regression is that you could have missing data: data that is not part of the analysis but should be.

You could also have the problem of failing to omit outliers. What's an outlier? Look at this sample data. This is the true regression line, and this one, which I'll draw in a different color, is the computed regression line. What's the difference between the true regression line and the computed one? In the computed line, there was an outlier: at some point, we used a lot of machine hours. We don't know whether this point is an outlier because we mistyped it, or because during that month we had a breakdown and had to redo everything, which is why we used more machine hours. We don't know. But the point is, once you notice there's an outlier, you want to decide whether to eliminate it because it's an unusual event that may not recur in the future.
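To see how much a single unusual month can move the line, here is a small sketch that fits the regression with and without one outlier. All of the numbers are hypothetical: five months that sit exactly on the line Y = 50,000 + 20X, plus one month with unusually high machine hours. They are not the figures from the chart in this lecture.

```python
def least_squares(x, y):
    """Fit Y = a + bX by ordinary least squares; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    return mean_y - b * mean_x, b

# Hypothetical normal months: exactly on Y = 50,000 + 20X.
machine_hours = [100, 200, 300, 400, 500]
cost = [52000, 54000, 56000, 58000, 60000]

# Hypothetical outlier month: far more machine hours than usual.
outlier_hours, outlier_cost = 900, 60000

clean_a, clean_b = least_squares(machine_hours, cost)
skewed_a, skewed_b = least_squares(machine_hours + [outlier_hours],
                                   cost + [outlier_cost])

print("without outlier: fixed", round(clean_a, 2), "variable", round(clean_b, 2))
print("with outlier:    fixed", round(skewed_a, 2), "variable", round(skewed_b, 2))
```

In this made-up case the outlier flattens the slope and pushes the estimated fixed cost above 50,000, which is the kind of distortion described here.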
Once you remove the outlier, you will have the true regression line. You want to know the true regression line because you're using it to estimate your costs in the future. So you eliminate the outlier, then run your analysis. But note that in the real world, you have to explain why you eliminated the outlier. There must be a good reason, because the outlier skews your numbers. For example, here the fixed cost is around $50,000 without the outlier; with the outlier, the fixed cost goes up to around $63,000 or $65,000. So it makes a difference. If you're going to eliminate an outlier, you'd better have a good reason why.

Another problem could be inflation, if you're using dollar amounts. If the economy is going through inflation, you have to adjust the numbers for inflation. You could also have mismatched periods: using data from one period and matching it with data from another period, which will not make any sense.

As always, I'm going to remind you to like this recording if you like it, share it with other students, and don't forget to visit my website, farhatlectures.com, for additional resources for this course, as well as your other accounting, finance, and CPA preparation. Good luck, study hard, and most importantly, stay safe.