We're going to start in two minutes. Okay, I hope you are able to see the slides. (Yes, we are.) Okay, so welcome to the last content session that we're going to have this year. I note that your exam is on the 15th of October as a preliminary date. It might change, or it might be the date that you write, so we're working towards that date. Since this is the last content session, we're going to do the activities for regression, study unit 11, on Saturday. Then on Wednesday next week we're going to go through the system and look at the study unit 10 and 11 questions online on myUNISA. I'm not going to create notes for it; I'm going to create an online assignment on our tutor site, but I will share myUNISA when we go through it. That will be one of the practice activities. On Saturday we're also going to continue with regression and your chi-square test. After that I'm going to leave you to do your assignments on your own. On when the assignment is due: I have it as the 20th, but you must check the exact date on the assessment itself, not on the calendar or anywhere else. (Student: It says August.) Yes, that's the date I also remember. So on the other days you can work through it on your own so that you can complete it. For the next couple of sessions, we're going to do activities relating to study units 10 and 11. Okay, so today we're going to start looking at regression, correlation and the coefficient of determination. I know this session is titled regression, but it includes regression, correlation and the coefficient of determination.
By the end of today's session, you should be able to make inferences about the coefficient of correlation, meaning you should be able to interpret it, and you should be able to interpret the coefficient of determination. You should be able to develop a regression model, that is, find the equation of the regression line, and use that equation to predict a new value. You should also be able to interpret the coefficients that come from the regression line, which are the intercept and the slope. Okay, so what are regression and correlation? We're going to start with correlation. There are a lot of calculations that you need to do, so just bear with me. If you don't understand something today, do not panic; on Saturday we can spend more time on it. For today I want to concentrate on the content, and on Saturday we'll look more specifically at the questions and how they are asked. I will show you how to use your calculator to do most of the calculations, and I will also show you how to do them in Excel. I have sent an Excel template to you, and I will show you how to use it in this session as well. So, regression and correlation. When we talk about correlation, we're talking about the relationship between two variables. When we talk about regression, we're talking about predicting a value. We use regression to predict a new value, but for that prediction to make sense, a relationship needs to exist between the two variables. Correlation is what we use to determine, or to test, whether there is a relationship between the two variables.
We can also visualize the relationship between two variables. When we have two numerical variables, the visualization tool we use is the scatter plot, which shows the relationship between two numerical variables. So what is a scatter plot? It's a visualization tool used to show the relationship between two variables. Correlation, on the other hand, is a numerical value that tells you whether there is a relationship between the two variables, and it tells you two things about that relationship: its magnitude and its direction. By magnitude I mean the strength of the relationship, whether it's strong or weak; by direction I mean whether it's positive or negative. So correlation gives you a measure of the strength and direction of the relationship, or association. Keep in mind that correlation does not equal causation. When two variables are correlated, that does not mean one causes the other. It just means they move together; it does not mean X causes Y to happen. There are different types of relationships you can see when you put your data on a scatter plot. On the left of the slide you will see two relationships, and we call them linear relationships. They are called linear because if I draw the regression line, the black line, across those dots, I can see a straight-line pattern emerge, and that tells me whether the relationship is positive or negative.
If the line goes up, so that as the values of X increase the values of Y also increase, then we say X and Y are positively related. If the values of X are increasing but the values of Y are decreasing, we say the relationship is negative. Apart from a linear relationship, we can have a curvilinear relationship. The first one is what we call a quadratic relationship. It's like when you bounce a ball: it goes up and then it comes down. That curved shape is what we call a quadratic relationship. We also have what is called an exponential relationship. If you remember, early on when COVID hit and there were daily reports, at some point they were showing graphs of exponential growth in the number of COVID cases. That is what an exponential relationship looks like: as the number of days increased, the number of COVID-19 infections increased exponentially. We can also have no relationship. When the data is scattered everywhere with no pattern, we cannot say that as the values of X increase the values of Y increase, so we say there is no relationship. Likewise, when one variable is static and doesn't change, there is no variation, and we also say there is no relationship (strictly, r cannot be calculated in that case, because the formula would divide by zero). For scattered data, we can use the coefficient of correlation to measure the magnitude of the relationship and show that r is zero, or close to it, because when r equals zero there is no linear relationship. We're going to look at an example later on. So what is the coefficient of correlation?
Like I said, correlation deals with strength. The coefficient of correlation is the measure we calculate to tell us the strength and the direction of the relationship. Because it tells us both, when you do the calculation, the value of r (the coefficient of correlation is denoted by r) must lie between negative one and one. Remember that with probabilities we said a probability must lie between zero and one; in the same way, the correlation coefficient r must lie between negative one and one. If you get a value outside that range, you made a mistake somewhere in the calculation. If r is positive, which means greater than zero, the relationship is positive: as the values of X increase, the values of Y also increase. If r is negative, which means less than zero, there is a negative relationship, or negative correlation: as the values of X increase, the values of Y decrease. If r equals zero, there is no linear relationship, so X and Y are not linearly related. Sometimes you don't only want to say whether the relationship is positive or negative; you also want to describe its magnitude, its strength: weak, moderate, strong or perfect. To do that, there is a scale you can use. If r equals negative one, the relationship is a perfect negative relationship, a perfect negative correlation. The same happens on the positive side: if r equals positive one, it is a perfect positive relationship. If r falls between negative one and negative 0.79, we say there is a strong negative relationship.
So between 0.79 and one (ignoring the sign), you can say it is a strong relationship, or call values very close to one a perfect relationship, depending on which definition you use. Between 0.39 and 0.79, we say it's a moderate relationship. (Student: What is that in? I just want to know about those numbers. Is it a fixed number or just an example?) It's just an example. It's not a fixed scale; it's a guide. For example, if you are given 0.69, are you going to say it's a strong relationship? It's not that strong, so it is a moderate relationship. Around 0.5 is a moderate relationship, a 50-50. But if it's 0.15 or 0.10, which is close to zero, you have leeway to say either that it's a weak relationship or that there is no relationship, because 0.10 is so small. You can choose how you define that. When r is exactly zero, we say there is no relationship at all. Some people also start the weak range at 0.35, not 0.39; I've put my scale a bit higher. Different books define it differently. Some don't even have that many categories; they have only two or three: perfect, strong, and weak or no relationship. So look at how the question is phrased, and use this scale as a guide. The same applies when the relationship is positive: look at the magnitude and the direction, then make your decision. Okay, so far that is just the theory. We'll do the calculation of r later. First, let's look at some examples of scatter plots.
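The guide scale above can be written as a small lookup, which may help when practising interpretations. This is only a sketch: the function name `describe_r` is hypothetical, and the cut-offs (0.79 and 0.39) are this session's guide values, not a fixed standard, so adjust them if your textbook uses different ones.

```python
def describe_r(r: float) -> str:
    """Describe the strength and direction of a correlation coefficient r.

    Cut-offs follow this session's guide scale (an assumption, not a standard):
    |r| = 1 perfect, 0.79 to 1 strong, 0.39 to 0.79 moderate, below 0.39 weak, 0 none.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1; check the calculation")
    if r == 0:
        return "no linear relationship"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)  # strength ignores the sign
    if size == 1:
        strength = "perfect"
    elif size >= 0.79:
        strength = "strong"
    elif size >= 0.39:
        strength = "moderate"
    else:
        strength = "weak"
    return f"{strength} {direction} relationship"


# The r values from this session's scatter-plot examples
for value in (1.0, 0.18, 0.85, -0.92):
    print(value, "->", describe_r(value))
```

The sign only sets the direction; the strength comes from the size of r, which is why the function works on `abs(r)`.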
If we look at this scatter plot, calculate r and find that r = 1, we interpret it as a perfect positive relationship, because r is positive and equal to one. It also shows that as the values of X increase, the values of Y increase. In the next one the dots are spread out, and the coefficient of correlation is r = 0.18, which is a weak relationship. When you look at the dots, as the values of X increase the values of Y also increase a little. What's most important when you look at the graph is the sign of r, because that helps a lot with the interpretation of the relationship. On this one, r is 0.85. It clearly shows that as the values of X increase, the values of Y also increase, so this is a strong positive relationship. On this one, as the values of X increase the values of Y decline, and r is negative 0.92: a strong negative relationship. Always remember that r lies between negative one and one. If r equals 1 or negative 1, the correlation is perfect, and the closer r is to 1 or to negative 1, the stronger the relationship between your variables. You will be expected to know how to calculate the coefficient of correlation using the sums of squares: they might give you the sums of squares and ask you to calculate r, or they might give you a table of data, and you can use your calculator or Excel to calculate it.
To calculate the coefficient of correlation from the sums of squares, we use these measures; SS stands for sum of squares. The sum of squares of X is the sum of each X value minus the mean, squared: for every observation of X you subtract the mean, square the result, and add them all together. You can also calculate it with the shortcut formula, $SS_X = \sum (X - \bar{X})^2 = \sum X^2 - \frac{(\sum X)^2}{n}$. Note that $\sum X^2$ (square each value, then add) and $(\sum X)^2$ (add, then square) are totally different. We will look at this in more detail when we do the activities and exercises. The sum of squares for Y is $SS_Y = \sum (Y - \bar{Y})^2 = \sum Y^2 - \frac{(\sum Y)^2}{n}$. Once you have those sums of squares, they go in the denominator of the coefficient of correlation. You also need the sum of squares for the product of X and Y, $SS_{XY} = \sum (X - \bar{X})(Y - \bar{Y}) = \sum XY - \frac{(\sum X)(\sum Y)}{n}$, and that goes at the top; it is your numerator. So $r = \frac{SS_{XY}}{\sqrt{SS_X \, SS_Y}}$. In the formula for r that we will use later on, you will see the sums written out in full instead of the SS symbols: $r = \dfrac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sqrt{\left[\sum X^2 - \frac{(\sum X)^2}{n}\right]\left[\sum Y^2 - \frac{(\sum Y)^2}{n}\right]}}$. The formula can also be written in other ways, so don't get alarmed when you see different formulas. They look different, but they calculate the same thing.
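To check that the definitional and shortcut forms of the sums of squares really agree, here is a short sketch using the aptitude-test data from the example later in this session (x = test grade, y = supervisor grade). The fifth supervisor grade is not read out in the transcript; 13 is the value consistent with the totals quoted there ($\sum Y = 45$, $\sum XY = 145$), so treat it as an inference.

```python
from math import sqrt

# Aptitude-test data (x = test grade, y = supervisor grade); the last y is inferred
x = [1, 3, 5, 5, 1]
y = [4, 6, 10, 12, 13]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Definitional form: subtract the mean, square (or multiply), then add
ss_x = sum((xi - x_bar) ** 2 for xi in x)
ss_y = sum((yi - y_bar) ** 2 for yi in y)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Shortcut form: sum of squares (square first) minus square of the sum over n
ss_x_short = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
ss_y_short = sum(yi ** 2 for yi in y) - sum(y) ** 2 / n
ss_xy_short = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n

# Coefficient of correlation: SS_XY over the square root of SS_X times SS_Y
r = ss_xy / sqrt(ss_x * ss_y)

print(ss_x, ss_y, ss_xy)                    # 16.0 60.0 10.0
print(ss_x_short, ss_y_short, ss_xy_short)  # 16.0 60.0 10.0
print(round(r, 4))                          # 0.3227
```

Both routes give the same sums of squares, so you can use whichever is quicker for the numbers you are given.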
Another form multiplies through by n: $r = \dfrac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}}$. All these formulas calculate the coefficient of correlation, and they all give the same answer, including the version written directly with the sums of squares: $SS_{XY}$ divided by the square root of $SS_X$ times $SS_Y$. Later, when we do the regression line, there is a coefficient called the slope, and we can also use the sums of squares to calculate it, because the slope is just $b_1 = \frac{SS_{XY}}{SS_X}$. You will see this when we do the activities and exercises later on. Lizzie? (Lizzie: Sorry, are you done? I just wanted to make a comment.) Yes, you can. (Lizzie: No wonder relationships are so difficult. Look at these formulas.) Yes, yes, so complex. That's a good one. We can also calculate the measures of variation using sums of squares. The total variation, SST (the total sum of squares), is the regression sum of squares, SSR, plus the error sum of squares, SSE. So, in short, $SST = SSR + SSE$, where $SST = \sum (Y - \bar{Y})^2$, $SSR = \sum (\hat{Y} - \bar{Y})^2$ and $SSE = \sum (Y - \hat{Y})^2$. When we do some examples in Excel, I will show you how to calculate those as well.
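As a sketch (again using the aptitude-test data from later in this session, with the fifth supervisor grade, 13, inferred from the quoted totals), the code below checks that the "divide by n" and "multiply through by n" forms of r agree, and that SST splits into SSR plus SSE once the regression line is fitted.

```python
from math import sqrt

# Aptitude-test data (x = test grade, y = supervisor grade); the last y is inferred
x = [1, 3, 5, 5, 1]
y = [4, 6, 10, 12, 13]
n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)

# Form 1: each piece divided by n (the sums-of-squares form)
r1 = (sum_xy - sum_x * sum_y / n) / sqrt(
    (sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
# Form 2: numerator and both brackets multiplied through by n
r2 = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))

# Slope b1 = SS_XY / SS_X, intercept b0 = y_bar - b1 * x_bar
x_bar, y_bar = sum_x / n, sum_y / n
b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * a for a in x]  # fitted values on the regression line

# Measures of variation: SST = SSR + SSE
sst = sum((b - y_bar) ** 2 for b in y)
ssr = sum((p - y_bar) ** 2 for p in y_hat)
sse = sum((b - p) ** 2 for b, p in zip(y, y_hat))

print(round(r1, 4), round(r2, 4))  # 0.3227 0.3227
print(sst, ssr, sse)               # 60.0 6.25 53.75
print(round(ssr / sst, 4))         # 0.1042
```

The last line previews the coefficient of determination, SSR divided by SST.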
Usually in the exam they will give you these measures, so that you don't have to scratch your head over them. For example, they might give you SSR and SST and ask you to calculate the coefficient of determination, and you can use SSR and SST to calculate other measures from there as well. I'm not going to go into detail on this slide; it's just for your information and describes what SST, SSR and SSE represent. Once you have the coefficient of correlation, r, you can use it to calculate r squared: on your calculator you just press the x² button, or multiply r by itself, and that gives you the coefficient of determination. If you're not given r but you are given the regression sum of squares and the total sum of squares, you can calculate r squared as $r^2 = \frac{SSR}{SST}$. What is the coefficient of determination? Because it is based on the total variation, it gives you the proportion of the total variation in the dependent variable that is explained by the variation in the independent variable. In other words, it tells us what percentage of the variation in the dependent variable is accounted for by the independent variable. You need to be able to explain the coefficient of determination in those terms: take that sentence, replace the proportion with the actual value, and replace "dependent variable" and "independent variable" with the actual variables you are using. So let's say my dependent variable is the price of bread and my independent variable is the sales of bread.
I'm calling it sales of bread because it is the number of loaves people are buying. Say I've calculated the coefficient of determination and found that it is 12%. How do I interpret this? I say: 12% of the total variation in the price of bread is explained by the variation in the number of loaves of bread sold. That's how you interpret it, as straightforward as that. Like I said, the coefficient of determination is r squared, which you get with the x² button if you have the coefficient of correlation. The coefficient of determination lies between 0 and 1 because you are squaring r. Even if you get an r of negative 0.86, when you square that negative number to get r squared it becomes positive; that is why r squared can never be less than 0 or more than 1. So for this one, r squared is 0.7396 to four decimals, about 0.74, and that is your coefficient of determination. If they give you SSR and SST instead, you just substitute those values and calculate. In terms of the scatter plots: for the perfect negative relationship, r = −1, so r squared equals 1. For the perfect positive relationship, r = 1, so r squared is also 1, which is 100%, and you say 100% of the variation in Y is explained by the variation in X. The same pattern applies to all the other scatter plots. So let's look at what happens when r squared is not equal to 1.
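The two routes to the coefficient of determination described above, squaring r or dividing SSR by SST, can be sketched as two tiny functions (the names are hypothetical). The SSR = 6.25 and SST = 60 used in the last line are computed from the aptitude-test data in this session; they are not values read out in the lecture.

```python
def r_squared_from_r(r: float) -> float:
    """Coefficient of determination from r (the calculator's x-squared button)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    return r * r


def r_squared_from_ss(ssr: float, sst: float) -> float:
    """Coefficient of determination as SSR / SST."""
    return ssr / sst


print(round(r_squared_from_r(-0.86), 4))              # 0.7396: the minus sign disappears
print(r_squared_from_r(-1.0), r_squared_from_r(1.0))  # 1.0 1.0: perfect either way
print(round(r_squared_from_ss(6.25, 60.0), 4))        # 0.1042
```

Because squaring removes the sign, r squared alone cannot tell you the direction of the relationship; you still need r or the slope for that.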
For this one, you can see from the scatter plot that it's a weak relationship. If you are not given the actual value of r squared, only the scatter plot, you don't know the proportion of the total variation, and you cannot just make up a number. So you say: some, but not all, of the variation in the values of Y is explained by the variation in X. If we did know r squared, whether r was positive or negative would not matter for the interpretation. So let's say r squared is 0.36: you would say 36% of the variation in Y is explained by the variation in X, because now you have the number, so you use the actual value. When r equals 0 there is no relationship: r squared also equals 0, so there is no relationship between X and Y. To interpret that r squared, you say the value of Y does not depend on the value of X, because none of the variation in the values of Y can be explained by the variation in the values of X. Let's look at an example; actually, let me make it your exercise. Suppose that the correlation coefficient between a person's salary and his or her educational attainment is 0.6395. Calculate the coefficient of determination. What is r squared? (Student: I get 0.40896.) Yes, 0.40896, which we can round off to 0.409. That is the coefficient of determination, and we interpret it as follows: approximately 41% of the variation in a person's salary can be explained by the variation in his or her educational attainment. That's how we interpret r squared. How would we interpret r?
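The interpretation sentence is a fixed template with the numbers and variable names swapped in, so it can be sketched as a small helper (`interpret_r_squared` is a hypothetical name). It is applied here to the salary exercise, rounding to a whole percentage as in the answer above.

```python
def interpret_r_squared(r: float, y_name: str, x_name: str) -> str:
    """Build the standard interpretation sentence for r squared from r."""
    r2 = r ** 2
    # :.0% multiplies by 100 and rounds to a whole percentage
    return (f"Approximately {r2:.0%} of the variation in {y_name} "
            f"is explained by the variation in {x_name}.")


r = 0.6395
print(round(r ** 2, 5))  # 0.40896
print(interpret_r_squared(r, "a person's salary", "his or her educational attainment"))
```

The dependent variable (salary) always goes first in the sentence and the independent variable (educational attainment) second.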
Since r = 0.6395 lies between 0.39 and 0.79, on our scale it is a moderate relationship: we say it is a moderate positive correlation, or a moderate positive relationship; both mean the same thing. Okay, any questions before we move on to regression? No questions. So, regression. We said correlation tells us about the relationship; regression helps us predict a value, for example to forecast a new value. Regression is used to predict the value of the dependent variable based on the value of at least one independent variable; here we're using only one independent variable. Regression analysis also helps to explain the impact of changes in the independent variable on the dependent variable, and we see that impact through the slope. If you did maths in high school, you will remember $y = mx + c$ as the equation of a straight line. The regression line is the same kind of equation: it is the equation of a straight line.
The slope, m, tells us that impact. If we take two points on the line, $x_1$ corresponds to $y_1$ and $x_2$ corresponds to $y_2$, and the slope measures the change in y per unit change in x: $m = \frac{y_2 - y_1}{x_2 - x_1}$. The sign of the slope also tells us the direction of the line, whether it's positive or negative. In your module, instead of $y = mx + c$, we write $y = b_0 + b_1 x$; we simply rearranged the terms. You will notice that throughout the module and the slides we use this form. Now I'm going to confuse it even more: on your calculator, the equation is written as $y = A + Bx$, which means the same thing. So the slope m is the same as $b_1$ (b with subscript one) and the same as B on the calculator, and the intercept c is $b_0$, which is A on the calculator. For the rest of the module we're going to use $b_1$, because your study guide and, I think, your textbook use $b_1$; on your calculator it's B. Don't get confused: write all this down, because when we start doing activities and exercises with the calculator, I don't want you to mix them up. So in regression, what we calculate is that change: the change in y divided by the change in x, the triangle on the slide, is the slope. Okay, let me erase all that. So our dependent variable is the
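The "change in y over change in x" idea can be checked on any two points of a line. This sketch uses a hypothetical `slope` helper and two points on each of the example lines $\hat{y} = 3 + 4x$ and $\hat{y} = 120 - 15x$ from this session.

```python
def slope(x1: float, y1: float, x2: float, y2: float) -> float:
    """Change in y divided by change in x between two points on a line (m = b1 = B)."""
    if x1 == x2:
        raise ValueError("the two x values must differ")
    return (y2 - y1) / (x2 - x1)


# Two points on y = 3 + 4x: (0, 3) and (2, 11)
print(slope(0, 3, 2, 11))    # 4.0: positive slope, the line goes up
# Two points on y = 120 - 15x: (0, 120) and (2, 90)
print(slope(0, 120, 2, 90))  # -15.0: negative slope, the line goes down
```

For a straight line, any two points give the same slope, which is why a single number, $b_1$, describes the whole line.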
variable that we want to predict, whereas the independent variable is the variable we use to predict the value of y. So always remember: x is your independent variable and y is your dependent variable. As somebody already joked, just remember your ex: the ex wanted to become independent. Okay, now the regression line I was talking about, which we write as $\hat{y} = b_0 + b_1 x$, where $\hat{y}$ is the value we are predicting, $b_0$ is the intercept, $b_1$ is the slope, and x is the x value. When you complete the regression line, only $b_0$ and $b_1$ become numbers; the others stay as they are: $\hat{y}$ stays $\hat{y}$ and x stays x in the equation. So your final equation of the regression line will look like $\hat{y} = 3 + 4x$, or it could be $\hat{y} = 120 - 15x$; that is still the equation of a straight line. You don't replace $\hat{y}$ and x; what you need to know are the b values. In the second line, the slope $b_1 = -15$ tells me the regression line is negative, so the graph goes down; in the first, $+4$ tells me it goes up. As for the intercept $b_0$: it is the value of $\hat{y}$ when x equals 0. In the first line, if x is 0 then 4x is 0, so the estimated value is 3. That is the intercept, and we will learn how to interpret it shortly. So this is the equation we use to describe the relationship between x and y as a linear function, and it helps us interpret the slope, because we are able to
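Using a completed regression line is just substitution: plug in x and add the two terms. Here is a minimal sketch with a hypothetical `predict` helper, applied to the two example lines above.

```python
def predict(b0: float, b1: float, x: float) -> float:
    """Predicted value y_hat = b0 + b1 * x from a completed regression line."""
    return b0 + b1 * x


# The two example lines from this session
print(predict(3, 4, 0))      # 3: at x = 0 the prediction equals the intercept b0
print(predict(120, -15, 2))  # 90: with a negative slope, y_hat falls as x rises
# Moving x up by one unit changes y_hat by exactly the slope b1
print(predict(3, 4, 1) - predict(3, 4, 0))  # 4
```

The last line illustrates the interpretation of $b_1$: a one-unit increase in x changes the prediction by $b_1$.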
use it to see how changes in y are related to changes in x: if the slope is positive, y tends to increase as x increases, and if it is negative, y tends to decrease as x increases. The value of $b_0$ is the estimated average value of y when x equals 0, which I've just explained. The value of $b_1$ is the estimated change in the average value of y as a result of a one-unit increase in x. So just remember: when x increases by one unit, $b_1$ tells you by how much the predicted value of y changes. Let's look at the next example. A firm's personnel department hired employees for a given job primarily on the basis of the results of an aptitude test that was administered to all job applicants. The performance of those hired was rated on the same scale by their supervisors a year after they were hired. A sample of test grades and supervisor-assigned grades is as follows: the x values are the aptitude test grades and the supervisor grade is y. Since both are numerical, we can visualize them on a scatter plot to look at the relationship between the aptitude test and the supervisor's grade. If we draw a regression line through the points, it shows a positive relationship, with at least one outlier sitting away from the rest. Remember, an outlier is an extreme value, a value far away from the others. So now we have a scatter plot. Let's calculate the coefficient of correlation, and find the regression line and the coefficient of determination. We can do this easily in Excel: putting our x and y into Excel, or any statistical tool, produces a table that looks like this. On this table there are several things that you can
take note of: the regression statistics block, which has the Multiple R and the R Square, and the coefficients table, which has the intercept and the test grade. Multiple R is our coefficient of correlation (Excel always reports Multiple R as a positive number, so take the sign of r from the sign of the slope), and R Square is our coefficient of determination. For the two coefficients at the bottom, remember our equation is $\hat{y} = b_0 + b_1 x$. The intercept is $b_0$, which is 7.125, and the slope is the coefficient next to test grade, 0.625. That completes the regression line: supervisor grade $= 7.125 + 0.625 \times$ test grade. Test grade is x, and we don't substitute anything for it yet. Now let's say we want to predict a new value: if the aptitude test grade was 4, what would the supervisor grade be? To find it, we calculate $7.125 + 0.625 \times 4$. Can somebody calculate it? (Student: 7.125 plus 0.625 times 4 equals 9.625.) So the predicted supervisor grade is 9.625. Please keep that number somewhere, because when we do the activity on the calculator or on the Excel spreadsheet we can calculate it there as well. So that is what the regression output in Excel looks like. Don't worry about the other values, because they are not part of your syllabus; they're for other modules. You only need to know the four things I've ticked. Okay, so now let's calculate manually. Say we have a question that, using the same information, asks us to calculate the least squares regression line, $\hat{y} = b_0 + b_1 x$, and also to find the coefficient of correlation and
the coefficient of determination, if those were the questions asked. Remember, our x and y data were given to us. Okay, my slides are out of order, because we already discussed this when we were looking at the sum of squares measures: remember we looked at the sum of xy minus the sum of x times the sum of y divided by n, something that looks like this. We also had the sum of x squared and the sum of y squared, and some other summations. When we calculate these manually, the sum of x is just 1 + 3 + 5 + 5 + 1, which gives a total of 15, so Σx = 15. The sum of y is 45, because we're just adding all of those values. The summation of xy is the sum of the products of x and y, not 15 times 45: it is 1 × 4 = 4, 3 × 6 = 18, 5 × 10 = 50, 5 × 12 = 60 and 1 × 13 = 13, and adding all of these gives Σxy = 145. The summation of x squared is not 15 squared: it is 1 squared, which is 1, plus 3 squared, which is 9, and so on; when you do all of them you get Σx² = 61. You do the same for the summation of y squared, which gives 465. You can also calculate the mean: x̄ is the sum of x divided by n, and n is how many observations there are (one, two, three, four, five), so n = 5 and x̄ = 15 / 5 = 3. Once you have calculated your summations, you can find your regression line, you can calculate the coefficient of correlation, and using the coefficient of correlation you can calculate the coefficient of determination. So let's look at the first one: finding the regression line.
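The route just described (summations first, then the regression line, r and r²) can be sketched in a few lines of Python as a way to check your hand working. This is a minimal illustration using the aptitude-test data from this example; the helper name `summary_from_data` is hypothetical, and the formulas are the summation formulas used in this session.

```python
import math


def summary_from_data(x, y):
    """Hypothetical helper: summations, least squares line, r and r^2."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))  # sum of products, NOT sum_x * sum_y
    sum_x2 = sum(xi ** 2 for xi in x)              # sum of squares, NOT sum_x ** 2
    sum_y2 = sum(yi ** 2 for yi in y)

    # Slope first, because the intercept needs it
    b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    b0 = sum_y / n - b1 * (sum_x / n)

    # Coefficient of correlation from the same summations
    r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return {
        "n": n, "sum_x": sum_x, "sum_y": sum_y, "sum_xy": sum_xy,
        "sum_x2": sum_x2, "sum_y2": sum_y2,
        "b0": b0, "b1": b1, "r": r, "r2": r ** 2,  # round only at the final answer
    }


test_grade = [1, 3, 5, 5, 1]           # x
supervisor_grade = [4, 6, 10, 12, 13]  # y
s = summary_from_data(test_grade, supervisor_grade)

print(s["sum_x"], s["sum_y"], s["sum_xy"], s["sum_x2"], s["sum_y2"])  # 15 45 145 61 465
print(s["b0"], s["b1"])                                             # 7.125 0.625
print(round(s["r"], 4), round(s["r2"], 4))                          # 0.3227 0.1042
print(s["b0"] + s["b1"] * 4)                                        # 9.625 (estimate at x = 4)
```

Changing the two lists repeats the hand calculation for another data set. Treat it as a check on your working, not a replacement for the formulas, since some questions give you only the summations.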
Remember our table: I've calculated separately, on the side, all the values that I need. So how do we calculate b0 and b1? The first thing we need to calculate is b1, because in order to calculate b0 we need b1. This is the formula for b1: the sum of xy minus the sum of x times the sum of y divided by n, all divided by the sum of x squared minus the square of the sum of x divided by n, that is, b1 = [Σxy - (Σx)(Σy)/n] / [Σx² - (Σx)²/n]. To calculate b0 we need the mean of y, the mean of x and b1. (On the slide I forgot to change the b in the b0 formula to b1, because I adapted these slides from another session; I've fixed it now, sorry about that.) So b1, the slope, uses the formula above, and b0, the intercept, is the mean of y minus the slope times the mean of x: b0 = ȳ - b1x̄. We've already calculated the means, but I'll show the steps anyway. To calculate the slope, we substitute the values: Σxy = 145, minus Σx = 15 times Σy = 45 divided by n = 5, all divided by Σx² = 61 minus the sum of x, 15, squared, divided by n. So b1 = (145 - 15 × 45 / 5) / (61 - 15² / 5) = 10 / 16 = 0.625. Remember that's what we got when we looked at
the Excel output as well. Then the mean of y is just the sum of y divided by n, which gives us 9, and the mean of x is the sum of x, 15, divided by 5, which gives us 3. Now we have our slope, the mean of y and the mean of x, so we just substitute them into the formula: b0 = 9 - 0.625 × 3 = 7.125. So b0 = 7.125 and b1 = 0.625, and as you can see, we have the same regression line: the supervisor grade is ŷ = 7.125 + 0.625x. If we want to estimate a new value of y, whether for x = 8 or for x = 2, we just substitute that value into the equation, like we did with 4. That is the regression. How do we calculate the coefficient of correlation? We use the same summations. This is the formula: r = [nΣxy - (Σx)(Σy)] / √{[nΣx² - (Σx)²][nΣy² - (Σy)²]}. Substitute every value into the formula: n is 5, the summation of xy is 145, the summation of x is 15, the summation of y is 45, the summation of x squared is 61 and the summation of y squared is 465. So r = (5 × 145 - 15 × 45) / √[(5 × 61 - 15²)(5 × 465 - 45²)] = 50 / √(80 × 300) ≈ 0.3227, which is exactly what we got from the Excel output. To interpret it: is it weak or strong? It's a weak positive relationship, and this is
mainly because of that one outlier. And if I look at the slope, remember the slope tells me whether the relationship is positive; here the slope is positive, so there is a positive relationship between the supervisor grade and the aptitude test grade. How do we interpret the regression line in terms of b0? Yes, can we just go a step back? Somewhere I got lost: how did you calculate your n? Your n is just counting how many observations you have. We have one, two, three, four, five rows in the original data set. So let's go back to the original data set: how many employees did they hire? One, two, three, four, five employees. n is always the total number in your sample, so n = 5. Before I go to the interpretation, remember the question also asked us to calculate the coefficient of determination. The coefficient of determination is just r squared. You could square 0.32, but rather use 0.3227 or more decimals, because we only round off at the final answer: r² = (0.32275)² ≈ 0.1042, and that's the coefficient of determination. Now, how do we interpret the regression line? b0, the intercept, we normally do not interpret. The intercept tells us the estimated average supervisor grade when x, the test grade, is zero, and sometimes it does not make sense to interpret that. For example, let's say we are modelling the weight of a baby, the intercept we got was 0.34, and we want to know what the estimated average value of y is when x is
equal to zero. Then we'd be saying the weight of a baby, on average, is 0.34, which does not make sense: how can a baby weigh that little? That baby wouldn't exist. So we don't interpret b0; we can only acknowledge that it is the estimated average value of y when x, the independent variable, is equal to zero. The coefficient we can really interpret is the slope, because for every one-unit increase in x there will be an increase or a decrease in the estimated value of y. If y increases, the slope is positive and we have a positive relationship; if y decreases, the slope is negative. So how do we interpret it? Our slope here, 0.625, is positive, so we say: the mean supervisor grade is estimated to increase by 0.625 for each one-unit increase in the test grade. So if the test grade goes up by one, we add 0.625 to the estimated supervisor grade. That is what the slope interpretation looks like, and it is very important. If this value were negative, we would say the estimated supervisor grade decreases by 0.625 for every one-unit increase in the test grade. We are not done, we still have a long way to go, but let me summarize what we have learned so far. We've learned how to use regression analysis to predict a new value of the dependent variable from a new value of the independent variable, like we did with 4. We've learned how to interpret the regression coefficients, the intercept b0 and the slope b1. We also looked at how to calculate and interpret the coefficient of determination and the coefficient of correlation. Now I want to show you how to do this on your calculator. For the demonstration I'll use the same exercise we did, not the one on this slide, because we don't have the answers for these values. Those who are using the Casio will find the steps on the slides. You press MODE first, and you get a menu with 1 for COMP and 2 for STAT, so you select 2 for STAT. Then we go to number 2, which says A+BX. Remember our equation is ŷ = b0 + b1x, and the calculator writes it as y = A + BX, which is the one you see there. So we press 2, the calculator shows STAT, and now you're ready to capture your data. When capturing your data you need the equals sign and the number keys. Once you've selected A+BX, the calculator shows a table that looks like this, and you press the first number you have, which is 4, and then press equals. You capture one column first and then the next one, so we first do the x values and then move to the right to do the y values. So you just press 4, equals, 2, equals, 6, equals, 4,
equals, and so on until you've done the whole column. Then you use your arrows to go up to line number 1 and then right, and you start capturing the values of y. When you capture the values of y, make sure that you capture them as they appear: 4 corresponds with 5, so on the first line of the y column you put 5; for 2 you put 3; for 6 you put 7; and so on until you complete the whole table. Do not mismatch the values, otherwise you're not going to get the right answer; you need to make sure that they align. Once you have captured all your information, you press the AC button, then press SHIFT and then button number 1. There is a STAT function on button number 1: because it's orange, SHIFT calls up that function, and a menu like this will appear. Let me use a pointer. Type means the type of data, so you don't have to worry about that. Data takes you back to the table that you created. Sum has your summations: the summation of xy, the summation of x squared and so on; I'll show you just now. Var gives you n, your means and your standard deviations. Reg is where you calculate your regression values, like your A, your B and your r, and the estimate ŷ. For the coefficient of determination, r², once you have calculated r you press the x² button. MinMax you don't have to worry about. So once we have pressed SHIFT 1 and we get this menu, you press 5 for Reg to get A, B, r and ŷ. When we estimated the value for 4, that is where we use ŷ to predict a new value. That is the menu for... oh gosh, the label is wrong here; that's the other error that I need to fix: this is
the Casio menu, not the Sharp one. For the Sharp calculator, all the values are printed on the keys in front of you; you will see them everywhere. You need to press the MODE button and then press 1, and your calculator will be in STAT mode. Then you'll have SD and you'll have LINE: you select LINE. Once you do that, the calculator shows STAT, so you're ready to capture the data. On the Sharp, your data won't be visible as a table when you capture it. The two important buttons are STO, which is the second button there where I am, and M+, the last button on top of the opening and closing brackets. So if we capture this data, you press 4, then STO, then 5, then M+. So it's 4 STO 5 M+, and it will say DATA SET = 1, and you continue: 2 STO 3 M+, DATA SET = 2, until you've captured all the values. On the Sharp it's easy to get the statistics, because they are all printed on the calculator in blue; everywhere you see blue, you first press the ALPHA button. Let's say you want the mean: the mean of x is on button number 4 and the mean of y is on button number 7. You will also see the summation of xy and the summation of x squared, and you can calculate them from there. For r, a and b: a is on the open bracket, b is on the close bracket, and r, I think, is on the division sign. If we want to estimate the value of y, we also use the close bracket, with the 2ndF key. Okay, so the same
applies to the financial calculator: the values are printed in front, and you follow the same steps. Those using the financial calculator don't have STO, but you have a key marked (x,y), and that is your store. So you type 4, press that (x,y) key, type the y value and press ENT. You also don't have M+, but you have ENT, so those are the two buttons you use. Your values a, b and r are also printed in front, and for estimating there is an estimate function on one of the keys; I think it's on the delete key. No, it's on the open bracket. The summations and the means are on the number buttons, and you also press ALPHA to get to them. That is one way of doing the activities. The other way: okay, let me open my calculators and the Excel spreadsheet that I sent you. Give me a second. The first thing I want to show you is the Excel sheet, so let me share my entire screen. On the Excel sheet you'll notice that I already did my Excel output, but I will redo it so that I can show you how to get there in case you want to start from the beginning. You'll also notice that the values I have here are different from the ones in the presentation. I did that on purpose, because when I did this Excel output I had already added a line; I'll show you how to get the same values that we had on this spreadsheet. These are the values that you will be given, and I calculated the sums, the same totals we had on the PowerPoint. If I go to our table (I'm going to use this one to demonstrate), you'll notice that the summations are the same as the ones I have there.
Let's make this bigger so that everybody can see properly. So I went and calculated the summation of those values there. I calculated xy by multiplying each x value by its y value, and x squared, which is just x times x (one times one gives one, for example), and y squared as well, and then the sheet calculates the summations. You can also read the instructions on the sheet, because there I'm telling you what you need to do. Let's say, for example, you have additional information, or there are 10 rows: here I only have six, because the sheet counts the number of records that I have. So that you don't mess up the whole spreadsheet, if you need to delete a row, start clicking from column B and highlight those cells, then right-click, go to Delete and choose to shift the cells up, only for those cells. As you can see, everything stays the same as before except for this one, because it has now shifted, and that is literally what you need to be doing. If you're adding more values, you highlight from B, choose Insert and shift the cells down; don't drag it all the way to the end, because then you mess up all the calculations. Then you'll have a new line, and you can add your new values there, and so forth. I'm going to go back, because I want to have the same information as before. So what I did was calculate the intercept like we had in our presentation, the slope, the mean of y, all of them, the way they are. I calculated each and every one of them, and I'm showing you the formulas in case you don't understand how we calculated them; you can see the formula in the cell. So if you look at that, that will be
for b1; it just shows what I am doing. It takes the summation of xy, multiplies it by n, minus all those other terms, so I'm just doing the whole calculation there, and it gets the value 0.644. Then I do the same with the mean of x: the mean of x is just the sum of the x values divided by how many there are, and the mean of y is the sum of the y values divided by how many there are. For b0, remember b0 is the mean of y minus the slope times the mean of x, which are all those values, so it's 8.883 minus 0.64 times 2.83. That's all I'm doing there. For the regression line, I'm just taking the answer for b0 plus the answer for b1 multiplied by x. Now, here you'll have a challenge: if an answer here is negative, you will still see this plus sign, because this part is literally typed in manually. So you'll have a plus followed by a minus, and in your mind you should know that a plus and a minus give a minus, so that when you look at the regression line you don't get confused by the plus. Okay, so that's what I did. For the coefficient of correlation, I used the same formula and calculated it manually like that, and R squared here is just that value squared, which gives you your coefficient of determination. Now, in terms of the other values, SSR, SSE and SST: because we want to calculate SSR and SST using the sum of squares measures, I used those formulas. You can see that I calculated the estimated value ŷ, so I'm replacing x with the
actual x values that we have in order to calculate ŷ: I take b0 plus b1 times the x value, for all the values, and you can see I've done it for all of them; it's the same thing, replace, replace, replace. If you want to estimate a new value, say 4, you can do the same: put 4 here, copy the formula and paste it next to it, and it will give you the estimate for 4 as well. Then I calculated all these summations. I'm not going to go into the formulas, but you can see that it's y minus ŷ, squared, for each observation, and then the summation, because SSE is the sum of those. I did the same with SSR: it's ŷ minus the mean of y, squared (the mean is highlighted), for all of them. And SST is each observed y minus the mean, squared, again with the mean highlighted, for all of them, and then the summations. Remember SST = SSE + SSR, so I also validated that by adding those two values to check that they give the same as SST. Then to calculate R squared I use SSR divided by SST, the two column totals, and as you can see it's the same as the other value. This one shows five decimals; if I change it to three decimals you'll see they look exactly the same. Depending on the options in the exam, check how you round off. The reason I kept the same information is that the Excel output uses the same data, so I want to show you that all the values we calculated manually correspond to the values in the output. Our r, which we calculated as 0.339, corresponds; and our R squared, we find that it is
0.115, which is the same as the Excel output; our b0 is 7.10, which is the same; and our b1 values are the same. How do I get there? Remember you need the Analysis ToolPak, so that Data Analysis is available. To run it, we go to Data Analysis, look for Regression and click OK. Then we select our ranges. I always include the labels, because they help label the output table, and then I must also tick the Labels box. Oops, I selected the wrong rows: this is the y range, so I need to select only the y values, without the summations, and only the x values, without the summations. I leave everything else, including the confidence level, as it is, because I'm not interested in those things. For the output, if you want it on the same sheet, you select Output Range; I want it here, so I select Output Range, scroll, and click on the cell where I want it. Now Excel complains that the y and x ranges don't have the same number of values, so let's see: there is our y; let's select our x again and see what went wrong. Okay, there is our x. Done. As you can see, the values look exactly the same as the ones in the table, and that's how you can use Excel to do your calculations. Let's go to the exercises. There's one where the table is too long; don't we have one with fewer values? This one has one, two, three, four, five, six, seven, eight values. It might take me forever, but nothing beats a good challenge. So we have eight rows, and I just need to add two lines. Okay, now the challenge is capturing all the information that I need.
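To cross-check a spreadsheet like this one, the same quantities can be computed in a short Python sketch. The data used here is an assumption for illustration (the aptitude-test example, since the spreadsheet's own rows aren't listed), and the function and key names are hypothetical; this is not how Excel computes its output internally. The sketch computes the estimated values ŷ, then SSE, SSR and SST, checks SST = SSE + SSR, and gets r² = SSR/SST. Because Excel's Multiple R is never shown as negative, it attaches the sign of the slope to get r. It uses the deviation form of SSxy and SSxx, which is algebraically the same as the summation form.

```python
import math


def regression_table(x, y):
    """Hypothetical helper: b0, b1, SSE, SSR, SST, r^2 and signed r."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar

    y_hat = [b0 + b1 * xi for xi in x]                     # estimated values
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # error sum of squares
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # regression sum of squares
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares
    r2 = ssr / sst
    multiple_r = math.sqrt(r2)         # like Excel's "Multiple R": no sign
    r = math.copysign(multiple_r, b1)  # direction comes from the slope
    return {"b0": b0, "b1": b1, "sse": sse, "ssr": ssr, "sst": sst,
            "r2": r2, "multiple_r": multiple_r, "r": r}


out = regression_table([1, 3, 5, 5, 1], [4, 6, 10, 12, 13])
print(out["sse"], out["ssr"], out["sst"])               # 53.75 6.25 60.0
print(math.isclose(out["sse"] + out["ssr"], out["sst"]))  # True
print(round(out["r2"], 4), round(out["r"], 4))          # 0.1042 0.3227
```

With a negative slope the same function returns a negative r while `multiple_r` stays positive, which is exactly the thing to watch for when reading r off an Excel output.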
Okay, I'll start with the x values: 5, 2, 9, 15, 6, 5, 16 and 25, so my n is 8. Then I do the same for the y values. I don't have to adjust anything, because all the calculations are working. Let me make it bigger. For this one, the challenge is checking against Excel, because I don't have another data set to validate this, but I can use Excel the same way to make sure that the calculations in my table are correct. So I reselect my y and my x ranges, and for the output range I'm going to replace the output that we had there. It asks if I want to overwrite; yes, I want to overwrite. I can hide all these ones because I don't need them, or just delete them, as long as I don't delete information I need. Okay, so we can validate our answers to make sure that everything is hunky-dory. Let's see what I did here: I'm not getting the same answers. Oopsie daisy: some things here are manually populated, like the x here, which is just a picture that I brought in, so it doesn't update. Okay, so we know what we have, and we can go and answer the questions. Just to validate: on an Excel regression output you won't be able to see whether your r is positive or negative. The important thing is to look at the slope. Here the slope has a negative value, so your r is also negative. Pay attention to that: on Excel the Multiple R is not automatically shown as negative, so you need to look at the slope to know the direction of r. My r here is 0.7877 in size, which is the same; my R squared values are the same; my intercept, about 766, is the same; and the slope is -15.478, and they are
the same, so it worked, and we can go and answer the questions given to us. I'm going to make this very small, because the bigger things are, the more space they take up. Something didn't work out well down here: I think some of the calculations didn't update when I moved things around, because I can see the values at the bottom don't agree. Anyway, don't worry about that bottom part, because all the information you need is up here. Okay, let's go answer the questions we were given. We can use either the Excel output or the manually calculated values, which we have right here: the regression line, the means, the slope, the intercept, the coefficient of correlation and R squared. Let me make this fit so that everything is visible. Which one of the following statements is incorrect? Statement one: the insurance premium depends on the driving experience; that is, experience is related to the premium. Statement two gives the y-intercept, b0, and that's correct, because that's what we got. I can't write on it because I'm in view mode; okay, let me switch out of view mode. I hope you are able to see. Statement three: the slope is -15.478, and our slope is -15.478, so that is correct. Let me write with white again: that's correct, and that's correct so far. Statement four says there is a negative relationship between the variables. We can see by looking at the slope and by looking at r that there is a negative relationship, because the slope is
negative and r is negative, so that is correct. Statement five says 77.77 percent of the total variation in the dependent variable is explained by the independent variable. Is it 77 percent? To check, we convert R squared to a percentage: R squared is 0.7877 squared, about 0.62, so about 62 percent, not 77.77 percent, and that is the incorrect statement. Sorry, I forgot to mark that one: the others are correct, and that is the incorrect one. That's how you use Excel to do this. On Saturday I will show you how to use your calculators, but I've given you the steps, so try them, and if you're still struggling I'll show you then. In the last four minutes I have left, I can quickly show you the Casio or the Sharp; let's use the Sharp, with the exercise we have the answers to. You put your calculator in MODE, press 1, and press 1 again, which is LINE. Then put in the values: 1 STO 4 M+, and it will say DATA SET = 1. If you make a mistake, press 2ndF CA: it clears your calculator and you start from the beginning. 3 STO 6 M+, 5 STO 10 M+, 5 STO 12 M+, 1 STO 13 M+, and I'm done; there are five. To get the summations, clear the display, press ALPHA and the Σxy key, then equals: there's 145. For the sum of x squared, clear again, press ALPHA and the Σx² key (it's on the plus-or-minus key) and equals: that's 61. The other sums, like the sum of y squared, the sum of x and the sum of y, work the same way with ALPHA (the sum of x is ALPHA 2, for example). For the means, ALPHA 4 gives you the mean of x and ALPHA 7 gives you the mean of y, so everything is right here. Now let's find the regression values from the next slide. ALPHA A (remember A is the intercept) should give 7.125, and pressing equals gives 7.125. The slope, which is B: ALPHA
close bracket, which is B, equals 0.625. Then the coefficient of correlation: remember we found it was 0.3227. You do the same, ALPHA and the division key, which gives you r = 0.3227. Remember when we found the estimate for 4? Let's do that. To get the estimate, we first press the value we want to estimate with, so we press 4 first. Then you press... sorry, not ALPHA, my bad: because it's orange, you press 2ndF and then the close bracket, the y′ key, which represents ŷ, and then equals. What did I do? 4, 2ndF, y′, and it's 9.625. Remember we found that by substituting 4 into the formula: if you replace x by 4, you get 9.625. Let's go back: ALPHA r, equals, and then for the coefficient of determination, square it and press equals: r squared is 0.104166..., so about 0.1042. That's the beauty: you can use your calculator to do all of that. Bear with me for five minutes so that I can show you the last calculator as well. As you can see, it goes quickly: within a minute you're done. I used four minutes to answer a whole lot of questions. So let's see the Casio. Here you press MODE, then 2 for STAT, then 2 again for A+BX. Remember we do the x values first: 1 =, 3 =, 5 =, 5 =, 1 =, and there are five. Then go up until you get to line 1, go right, and enter the y values: 4 =, 6 =, 10 =, 12 =, 13 =, and that's all. Then I press AC, then SHIFT STAT, which is on button number 1, and I look for Sum, which is on button number 3, and you can
There the sum of x squared is on button number 1: I just press 1 and =, and we get 61. For the sum of xy: SHIFT STAT, 3 for Sum, and the sum of xy is on button number 5, which gives me 145. Finding the mean: SHIFT STAT, then 4 for Var. There you have n, if you want to know how many records you have, the mean of x, the mean of y and the standard deviations. The mean of x is option 2, and when we press = we should get 3. Remember to always press the = button; if you didn't, clear your calculator and go back. SHIFT STAT again, and now let's go to Reg, which is 5. We want the intercept, which is A, on button 1: press 1 and =, and we get 7.125. To get the slope: SHIFT STAT, 5 for Reg, and B is 2; press =, and it gives 0.625. If I want r, which is my coefficient of correlation (sorry, not determination): SHIFT STAT, 5 for Reg, r is on button number 3, and = gives 0.3227. If I want to estimate y by making x equal to 4, always press the number 4 first, before SHIFT. Then SHIFT STAT, 5 for Reg, and y-hat is number 5; the screen shows 4 y-hat, and = gives 9.625. Within three minutes I am done with all the questions they were asking, and that concludes today's session. I know that we didn't do a lot of activities or exercises; don't worry, we will do that on Saturday. Please go and practise so that on Saturday, when we go through the questions, you are able to answer them. As you can see, there will be questions like this one that only give you the summations, so you need to make sure that you know how to use the formulas with the summations as well, and not rely only on the calculator. You will also get theoretical questions, and questions where you have to estimate a new value.
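For questions that only give you the summations, here is the same five-pair example from the calculator demonstration worked by hand. It is a sketch using the standard least-squares and Pearson formulas in summation form, which may be laid out slightly differently in the study guide; the sum of y (45) and the sum of y squared (465) were computed from the five pairs rather than read off the slides.

```latex
\begin{aligned}
n &= 5,\quad \sum x = 15,\quad \sum y = 45,\quad \sum x^2 = 61,\quad \sum y^2 = 465,\quad \sum xy = 145\\
\bar{x} &= \tfrac{15}{5} = 3, \qquad \bar{y} = \tfrac{45}{5} = 9\\
b &= \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}
   = \frac{5(145) - 15(45)}{5(61) - 15^2} = \frac{50}{80} = 0.625\\
a &= \bar{y} - b\bar{x} = 9 - 0.625(3) = 7.125\\
r &= \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}
   = \frac{50}{\sqrt{80 \times 300}} \approx 0.3227\\
r^2 &= \frac{50^2}{80 \times 300} = \frac{2500}{24000} \approx 0.1042\\
\hat{y} &= a + bx = 7.125 + 0.625(4) = 9.625
\end{aligned}
```

Every value matches what the Sharp and the Casio display, which is a good way to confirm that you are using the formulas correctly.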
In the question on the slide they say 10 years, so we can go and estimate the value for 10 years. What else? You should be able to answer questions where you are given the x and y values; interpreting your slope and your intercept is also part of that discussion, and so is interpreting the coefficient of correlation. But we'll look at more activities and exercises on Saturday. Okay, and that's it for today. Any questions before we wrap up and go home, or go to sleep? I'm good, thanks Lizzie. No, thank you. No worries. Okay, thank you, I'm going to take your formulas and go work out the relationship issues now. Okay, so if there are no questions, thank you for coming. See you on Saturday, and enjoy the rest of your evening. Bye. Thank you, bye everyone. Good night. Thank you, Mrs Lizzie.