So, welcome to our last content session, where we're going to discuss study unit 11, which is your final study unit. After this you can start preparing for your exams. Today's session concentrates on regression. It's going to be a very long session, but I will try to keep to our one hour 30 minutes. I will rush through some of the content because I've already shared it with you through the recordings that are uploaded on myUnisa, so that we can concentrate more on the calculations. I want to show you four different ways to calculate regression questions. One, by using the formulas, because you need to know how to answer questions with the manual process regardless of the shortcuts that I'm going to give you. Two, how to use your calculator to calculate the regression. Three, how to use Excel with the template that I have already shared with you. And the last one is doing the regression calculation using the functionality built into Excel. So, without wasting any more time, do you have any questions or comments before we start? Let me see if there is something in the chat function. In the absence of comments or questions, we can continue. So, today we're going to learn how to make inferences about the coefficient of correlation, how to interpret the coefficient of correlation and the coefficient of determination, how to use the regression line to predict a value, and how to interpret the regression coefficients, which are the slope and the intercept. But more especially, how to calculate all of them: the coefficient of correlation, the coefficient of determination, and the regression line, which includes calculating the slope and the intercept.
So, when we talk about correlation and regression, we're talking about two numerical variables. We are testing the relationship that exists between them, and the graphical representation of two numerical variables is a scatterplot. You put one variable on the horizontal axis and the other variable on the vertical axis. So, this is one example of a scatterplot. In fact, we discussed this in study unit two or study unit three, when we were looking at how to generate graphs for numerical variables and used this as an example in our session. Like I said, we can use a scatterplot for representation, but a correlation itself is a calculation that you do to measure the strength and the direction of the relationship between those two variables. And these are the types of relationship that can happen. You can get a linear relationship when you look at the scatterplot, which can be positive or negative, and I will explain more in the later slides. Or you can get a curvilinear relationship, which looks like a quadratic or an exponential relationship. Or you can have no relationship between the two numerical variables. Since we said correlation is a measure, it means you need to calculate it, and the measure that you calculate is called the correlation coefficient. It is represented by r. When you calculate it, it gives you the strength, whether the relationship is strong, moderate or weak, and the direction, whether it is negative or positive.
In the relationship between those two variables, your X variable will always represent your independent variable, which is your input variable, and your Y will represent your dependent variable, which is your outcome or output variable. At a later stage, when we talk about regression, that is the variable whose value we are going to predict. So the coefficient of correlation, this is the formula to calculate it. I'm going to show you how to calculate it, so you don't have to worry that much for now. When you calculate the coefficient of correlation, you will get a value, and that value needs to be between minus one and one. So if you get a value of two, then it means something went very wrong with your calculation. Your answer should be between minus one and one, so it can be minus one, 0.98, minus 0.68, 0.39, 0.5 or one. If the answer you get for r is bigger than 0, then we say the correlation is positive. It means when the value of X increases, the value of Y also increases. When your r is less than 0, we say it has a negative relationship. It means when one increases, the other one decreases. And if, once you have calculated it, the value of r is 0, it means there is no linear relationship between the two variables. So how do we define the direction and the strength of that relationship? This is how you interpret your r value. Based on the answer that you got, let's assume that your answer is minus one. Then you will say there is a perfect negative correlation between X and Y, since our variables here are X and Y. If the answer is minus 0.89, it falls between minus one and minus 0.79, so we say it has a strong negative relationship or correlation. If it's between minus 0.79 and minus 0.39, let's say you got an answer of minus 0.58, we say it has a moderate negative relationship.
And if it's between minus 0.39 and 0, let's assume that you got minus 0.15, then you will say it is a weak negative relationship. The same happens when your direction is positive. You state the direction based on the sign that is in front of the r value. So if you get an r of 0.98, you will say it is a strong positive correlation. If you get 0.38, it is between 0 and 0.39, which is a weak positive correlation. And if your r is equal to zero, we know that there is no correlation between X and Y. So, these are some examples of how you can use the scatterplot and the r value together to state the type of relationship there is between X and Y. And these are just more examples that you can use. Okay, so apart from the coefficient of correlation, which measures the relationship or association between two numerical variables, we also have what we call a coefficient of determination. Later on, we are going to calculate a regression line. So what do we mean by a regression line? If I come back to one of the examples that we used here, this line that we draw is what we call the regression line. It's a straight line that represents the direction of your relationship. We always draw that line, and we are going to calculate it. When you draw this line, there are certain things that you calculate, and there are certain things that influence the direction of that line or the output that you will get. When we talk about the regression line, we are talking about prediction. We want to predict, to come up with a new value.
We want to come up with a new value that represents the outcome when we give the input in terms of our independent variable. For that, we calculate what we call the coefficient of determination, which is the portion of the total variation in your dependent variable, the variable that we want to predict, that is explained by the variation in your independent variable, the variable that you give to the model. Your R-squared is the square of your coefficient of correlation. So if I have calculated my r, I can just square the answer and that will be my coefficient of determination. And if I have my coefficient of determination, R-squared, I can take the square root of R-squared and it will give me the size of my r. The square root does not give you the sign, so you take the sign from the direction of the relationship. So you can work vice versa, whether you move from the coefficient of determination to r or from r to the coefficient of determination. However, if they haven't given you the coefficient of correlation, you need to calculate the coefficient of determination using the sum of squares measures. R-squared is the regression sum of squares divided by the total sum of squares. And your R-squared is always between 0 and 1. It is never negative, because you are squaring a negative or a positive value, right? So, for the measures of variation that we have mentioned, the regression sum of squares and the total sum of squares, we can use these formulas. Don't worry too much about the formulas, because I'm going to show you how to calculate each and every one of them. But I need to explain the symbols. Remember, the summation sign means adding up. It means the sum of something.
So this says that your total sum of squares is the sum of your observed Y, which is your dependent value, minus the mean of Y, squared. You take every observed value of Y, subtract the mean, square the answer, and add all of them up. That gives you the total sum of squares, SST, which is your total variation. The regression sum of squares, which we call SSR, is the sum of your estimated Y value minus the mean of Y, squared. This estimated Y value is what we're going to calculate when we look at the regression, and later on we're going to talk about what the regression line looks like. With the regression, you estimate the value of Y, subtract the mean of Y from that estimated value, square the answer, and add all of them up to get the regression sum of squares. Then there is the error sum of squares, SSE, which is the unexplained variation of your model, because when you calculate the estimated values there will always be some errors. You calculate how much error exists by taking your observed value minus your estimated value and squaring the answer, and adding those up gives you the total of your unexplained errors. And this is just another explanation of SST, SSR and SSE. Your SST is a measure of the variation of your Y values around their mean. Your SSR is the variation attributable to the relationship between X and Y. And your SSE is the variation in Y attributable to factors other than your independent variable X, so it is your unexplained variation. We're going to look at the calculations later on. You just need to know this theory because you might be asked about it in the exam as well.
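The three measures described above can be sketched in a few lines of Python. This is only an illustration and not part of the session material: the function names are made up here, and the data and the fitted line (7.125 + 0.625x) are borrowed from the aptitude-test example that is worked through later in this session.

```python
# Sketch of the three sum-of-squares measures (function names are illustrative).

def sst(y):
    """Total sum of squares: sum of (observed y - mean of y) squared."""
    mean_y = sum(y) / len(y)
    return sum((yi - mean_y) ** 2 for yi in y)

def ssr(y, y_hat):
    """Regression sum of squares: sum of (estimated y - mean of y) squared."""
    mean_y = sum(y) / len(y)
    return sum((fi - mean_y) ** 2 for fi in y_hat)

def sse(y, y_hat):
    """Error sum of squares: sum of (observed y - estimated y) squared."""
    return sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))

# Test grades (x) and supervisor grades (y) from the example used later on.
x = [1, 3, 5, 5, 1]
y = [4, 6, 10, 12, 13]

# Unrounded estimates from the line fitted later in the session.
y_hat = [7.125 + 0.625 * xi for xi in x]

total = sst(y)               # 60.0
explained = ssr(y, y_hat)    # 6.25
unexplained = sse(y, y_hat)  # 53.75
r_squared = explained / total

print(total, explained, unexplained, round(r_squared, 4))
```

With unrounded estimates, SSR and SSE add up to SST, and SSR divided by SST gives the same R-squared of roughly 10% that is reported for this example.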
So these are some of the sum of squares measures that we use in the calculations. For example, this is how you calculate your SSxx, the sum of squares of X: it is the sum of X squared, minus the square of the sum of X divided by n. You must pay attention, because those two are totally different. In the first one you square every X and then add them up, and in the second one you add up all the X values and then square the total. The same applies for SSyy, and for SSxy, where X is multiplied by Y. You can also calculate your r, your coefficient of correlation, using the sum of squares measures if you want. It's not a must. Later on, we're going to use the slope equation, but if you want to use this formula, you can. Your SSxy is the sum of X times Y, minus the sum of X times the sum of Y divided by n. And your SSxx is the sum of X squared, minus the square of the sum of X divided by n. We're going to talk about this when we do some activities just now. So how do we interpret your R-squared? We state it in relation to the percentage that you get. Let's assume that our R-squared is 0.98, or 98% as a percentage. We would say 98% of the variation in Y is explained by the variation in X. That's the straightforward interpretation of R-squared. For example, in this one that we have here, we have a perfect linear relationship where R-squared is 1, so 100% of the total variation in Y is explained by the variation in X. If our R-squared is equal to 0.68, or 68%, we can either refer to the value and say 68% of the variation in Y is explained by the variation in X, or we can say some, but not all, of the variation in Y is explained by the variation in X. You can use the number or you can say "some", depending on how you want to state it.
If it is not 100%, you can state it in this manner as well. If your R-squared is 0, it means our r was 0, so there was no relationship between our X and Y values. We can state that none of the variation in Y is explained by the variation in X. Or we can state it this way: the value of Y does not depend on X. Or we can just say there is no relationship, so they do not influence one another. Some of the questions that come up in the exam or in your assignment might look totally different, so you need to know how to tackle those questions. For example, they can give you a coefficient of correlation. In this instance, they say: suppose that the coefficient of correlation between a person's salary and his or her educational attainment is r = 0.6395. Then they ask what the coefficient of determination is for a person's salary and educational attainment. They expect you to take your r and calculate your coefficient of determination. Of course, it means you just take your 0.6395 and square it with the square button. That gives you 0.40896, or 0.4090 if we keep it to four decimals. That is your coefficient of determination. So how do we interpret this? First, if we interpret the coefficient of correlation, 0.6395 is roughly 0.64. Is it strong, moderate or weak? It is a moderate relationship because it is between 0.39 and 0.79. So there is a moderate positive relationship between a person's salary and educational attainment. But how do we interpret the coefficient of determination? We state it in terms of the total variation.
We will say approximately 41% of the variation in the person's salary can be explained by the variation in his or her educational attainment, as you can see. So you need to pay attention to how you define what is your y and what is your x. Most of the time they will tell you which is y and which is x when you work out the question. Otherwise, you will have to make an assumption. So are there any questions before we move on to the next part of the content? No questions in the chat, so I can continue. So far we've looked at how we find the relationship, how we display it graphically, and how much of the total variation in one variable is explained by another. Now let's look at how we build a regression model. Regression is used to predict the value of your dependent variable, which is your y variable, based on the value of at least one independent variable. In your module you only work with one independent variable, so you don't do multiple regression. Regression explains the impact of a change in the independent variable on the dependent variable. Your dependent variable is the variable that we wish to explain or predict. Your independent variable, which is our input variable, is the variable that we use to predict or explain the dependent variable. We use this formula, which is called the simple linear regression equation, to estimate the value of your dependent variable. The formula looks like this: y hat is equal to b0 plus b1x. Y hat is the value that we want to estimate. b0 is our intercept. It is where the line is when x is 0: if x is 0, our estimate will be the same as our intercept.
b1 is the slope, which tells us the change in the values of y as influenced by the change in the values of x. And your x is your original observation of your independent variable. With this regression line, we are able to interpret the value of b0 and the value of b1. The equation also describes the relationship that exists between x and y as a linear function, and it tells us that changes in y are assumed to be related to changes in x. Your b0 is the estimated average value of y when the value of x is equal to 0. Like I said, if we replace x with 0, the slope term, b1 times x, will be equal to 0, so your estimated value will just be equal to the intercept. And b1 is the estimated change in the average value of y as a result of a one-unit increase in the value of x. As x increases, what happens to your dependent variable? Does it increase or does it decrease? We will look at examples of how we interpret the actual values. So here is an example. A firm has a department that hires employees for a given job on the basis of the results of their aptitude test. The performance of those hired was rated on the same scale by their supervisor a year after they were hired, and a sample of the test grades and the supervisor-assigned grades is as follows. So here we have the test grades of the aptitude test and the supervisor grades. The test grade will be our x, which is our independent variable, and the supervisor grade will be our y, which is our dependent variable. We can draw a scatterplot. I don't have to tell you again how to draw it. Each x value corresponds with a y value. So 1 and 4, that will be this dot, and you do all of them. 1 and 13, you will do that. 5 and 10, that will be that dot. 5 and 12 will be that dot that is there. And 3 and 6 will be that dot.
And that is how you construct a scatterplot. If you want to find the regression line, which is this dotted line, we need to calculate our regression equation, y hat is equal to b0 plus b1x. So let's go and calculate that line. Like I said, you can use several methods to do that. We are going to do the example in Excel. Don't worry, I'll show you how to get there. Using an Excel sheet, you get output that looks like this. The only measures that we're going to concentrate on in this output are these values. Multiple R is our coefficient of correlation. R Square is our coefficient of determination. There is a weak positive relationship between the test grade and the supervisor grade, because r is 0.32. Our R-squared tells us that only 10% of the variation in the supervisor grade is explained by the variation in the aptitude test grade. At the bottom, the intercept is your b0, and the value next to the test grade is your b1. So the only values that we want to use are the coefficients. You can ignore the rest of the table for the purpose of this session. We take the coefficients and substitute them into our equation. Remember, y hat, what we want to estimate, is the supervisor grade. It equals our b0, which is the intercept value, plus our b1, which is the test grade coefficient, times x, where x is the test grade. Based on this, we can take the equation and estimate another value if we want. For example, let's say we want to know what happens when a test grade is 4. When a test grade is 4, we need to go back to our equation.
When a test grade is 4, we substitute 4 into this equation. So we'll say 7.125 plus 0.625 times 4. How much does it give us? I'm going to do it this way: 7.125 plus, open bracket, 0.625 times 4, close bracket, equals 9.625. The grades on the other side are whole numbers, so we can say it is the same as 10, because if we round 9.625 to a whole number we get 10. And that will be our estimate. Now remember, we also spoke about SSR, where the formula had y hat minus the mean, squared, and SSE, where you take your observation minus your y hat. So how do we do that? We calculate y hat from our equation. Y hat is equal to 7.125 plus 0.625 times x, that is our estimate. Wherever we see x, we substitute the value of x so that we can find the value of y hat. So let's do that. This is how we calculate the estimated values. I shouldn't have removed that. 7.125 plus 0.625 times, and here x is 1. The answer there is 7.75, which is 8 if I round it to a whole number. The answer will be 8, and for the other observation where x is 1 it will also be 8. For 3, we just change that to 3, and the estimate is 9. And where x is 5, the estimate is 10.25, which rounds to 10. So it will be 10 for both observations where x is 5. As you can see, you should now be able to calculate SSR, which is the sum of your y estimate minus the mean of y, squared. Or you can calculate SSE, which is the sum of your y observation minus your y estimate, squared. For the SSR, your y estimate is the new value that you calculated. Your mean, remember, is the sum of the values divided by how many there are. You add all of them and you divide by how many there are. The sum is 45, and there are only 5 of them.
And therefore, the mean of y will be equal to 9. Then you can say, with the estimate: 8 minus 9, squared, plus 9 minus 9, squared, plus 10 minus 9, squared, and so on for all five estimates. That will give you the SSR. To calculate your SSE, you say 4 minus 8, squared, plus 6 minus 9, squared, plus 10 minus 10, squared, plus 12 minus 10, squared, plus 13 minus 8, squared. And that will give you the two of them. For the SST, you just add SSR plus SSE. You don't even have to calculate it using that formula. You can just add the two answers. Just keep in mind that the two only add up exactly to SST when you use the unrounded estimates. And that's how you answer using the sum of squares formulas. That is one way of answering questions. So now let's look at another way. Let's assume that we are given the X and Y. We continue with our test score and the supervisor score. The question says determine the least squares line, and they give us the least squares formulas. And there could be several options. Let's say option one was to calculate b0. Option two was to calculate b1. Option three was to find out whether you know what the equation looks like: y hat is equal to b0 plus b1x. Option four will be, what is the coefficient of correlation? And option five will be, what is the coefficient of determination? You need to know how to calculate all of them. So we are given our X and Y table, where X is the aptitude test score and Y is the supervisor score. All we do is calculate the totals, which are not given in the table: the total of X, the total of Y, and X times Y. Remember, the formulas use things like the sum of X, the sum of Y and the sum of XY, and somewhere you need the means as well. So you need to know how to calculate them.
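As a rough cross-check on the totals that this table needs, the columns can be built in Python. This is only a sketch and not part of the session material; the variable names are illustrative, and the rows are the five test-grade and supervisor-grade pairs from the example.

```python
# Sketch: column totals for the x-y table (variable names are illustrative).
x = [1, 3, 5, 5, 1]     # aptitude test grades
y = [4, 6, 10, 12, 13]  # supervisor grades

n = len(x)                                      # 5 observations
sum_x = sum(x)                                  # total of the x column
sum_y = sum(y)                                  # total of the y column
sum_xy = sum(xi * yi for xi, yi in zip(x, y))   # total of the x*y column
sum_x2 = sum(xi ** 2 for xi in x)               # total of the x-squared column
sum_y2 = sum(yi ** 2 for yi in y)               # total of the y-squared column

mean_x = sum_x / n
mean_y = sum_y / n

print(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2)  # 5 15 45 145 61 465
print(mean_x, mean_y)                           # 3.0 9.0
```

Note that `sum_x2` (square each value, then add) is 61, while the square of `sum_x` (add, then square) is 225, which is the distinction the formulas depend on.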
So under total, this is the sum of X, this is the sum of Y, and this is the sum of XY. To calculate XY, you multiply X times Y in each row and you get the value: three times six is 18, five times 10 is 50, and when you add all of them, you get the sum of XY. To get X squared, you square each X: one times one is one, three times three is nine, five times five is 25 and so on. That gives you the sum of X squared, which is something that I didn't mention earlier, along with the sum of Y squared. To calculate the sum of Y squared, you do the same on Y: four times four is 16, six times six is 36, and you continue and add all of them. The mean of X is the sum of X divided by n, which gives you a mean of three. In the same way, the mean of Y is the sum of Y divided by n: you have 45 divided by how many there are, and your n here is equal to five because there are one, two, three, four, five values. Once you have all this, remember that the formula for the coefficient of correlation uses these sums. But for now let's answer option A. Option A says we need to determine the equation of the straight line. I've repeated the table here. Bear in mind that the only two things that need to be replaced with numbers are b0 and b1. X and Y stay as variables, as you see them here. The only numerical values are b0, which is your intercept, and b1, which is your slope. You need to always remember that b0 represents the intercept and b1 is the slope, always, always. The slope always multiplies with X. Get it? Remember that. If they ask you what the slope of this equation is, the slope is the value that multiplies with X. Okay, so how do we calculate this equation?
First, to find the regression line, y hat is equal to b0 plus b1x, we need to find the slope of the equation. To calculate the slope, we can either use the sum of squares measures or we can use this formula: the sum of XY minus the sum of X times the sum of Y divided by n, all divided by your SSxx, which is the sum of X squared minus the square of the sum of X divided by n. And as I said, those two are very different: the sum of X is 15, so the square of the sum of X is 225, while the sum of X squared is 61. Next, once you have your slope, you can calculate your intercept. For the intercept you need your slope, the mean of X and the mean of Y: b0 is the mean of Y minus b1 times the mean of X. Pay attention, because this formula and that formula are totally different if you look at them, right? The means you know how to calculate. So now let's calculate the slope. We need to start there. We substitute the values: the sum of XY is 145, minus the sum of X, which is 15, times the sum of Y, which is 45, divided by n, which is 5, all divided by the sum of X squared, which is 61, minus 15 squared divided by 5. The answer we get is 0.625. You can take your calculator and double-check, but you will see that it is the same as the value we got previously. If I can refresh your memory, the equation we got previously was 7.125 plus 0.625x, so 0.625 was one of the values, and that is our b1. The means are already there, but just for interest's sake, 45 divided by 5 is 9 and 15 divided by 5 is 3. So you calculate the means, and then you substitute into the intercept formula. Our intercept is the mean of Y, 9, minus 0.625 times 3, which gives us 7.125. And if we put the equation together, we get y hat is equal to 7.125 plus 0.625 times x.
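The manual steps above can be retraced in Python as a quick check. This is a minimal sketch rather than part of the session material: the variable names and the `predict` helper are made up for illustration, and the totals are the ones substituted in the calculation above.

```python
# Sketch: slope, intercept and a prediction from the table totals.
n = 5
sum_x, sum_y = 15, 45
sum_xy, sum_x2 = 145, 61

# Slope: (sum of xy - sum of x * sum of y / n) / (sum of x squared - (sum of x)^2 / n)
ss_xy = sum_xy - sum_x * sum_y / n   # 145 - 135 = 10.0
ss_xx = sum_x2 - sum_x ** 2 / n      # 61 - 45 = 16.0
b1 = ss_xy / ss_xx                   # 0.625

# Intercept: mean of y - slope * mean of x
mean_x = sum_x / n                   # 3.0
mean_y = sum_y / n                   # 9.0
b0 = mean_y - b1 * mean_x            # 7.125

def predict(x):
    """Estimated supervisor grade for a given test grade (hypothetical helper)."""
    return b0 + b1 * x

print(b1, b0)       # 0.625 7.125
print(predict(4))   # 9.625
```

The slope has to be computed first because the intercept formula uses it; the prediction for a test grade of 4 is 9.625 before any rounding.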
You can then estimate the value of Y by substituting a value of X. And that is how you do the calculations manually. To calculate r, you do the same, because the formula is this: n times the sum of XY, minus the sum of X times the sum of Y, divided by the square root of n times the sum of X squared minus the square of the sum of X, multiplied by n times the sum of Y squared minus the square of the sum of Y. So we just substitute the values from the table. It will be 5 times 145, minus 15 times 45, divided by the square root of 5 times 61 minus 15 squared, times 5 times 465 minus 45 squared. And the answer will be 0.3227. So you can use the manual calculations. Are there any questions before I get to how we interpret the values that we just got? From here you can also calculate your R-squared, because your R-squared will just be 0.3227 squared. Use the longer number with all its decimals, don't use the shorter rounded version, otherwise you won't get the correct answer. This should give you 0.10 as well, the same as what you got from the table. So we know this is a weak positive relationship, and 10% of the variation in the supervisor grade is explained by the variation in the aptitude test grade. That's how you interpret the two values. Now for interpreting the regression line, remember the definitions that we had. Our b0 is not always easy to interpret, because it is the estimate of Y when X is 0. So the value of 7.125 is the average estimated supervisor grade when the test grade is equal to 0. The one that is most interesting to interpret is your b1.
To interpret 0.625, we say: since our slope is 0.625, the mean supervisor grade will increase, which is very important, by 0.625 on average for each additional unit increase in the test grade. So what does that mean? The sign in front of the slope can be negative or positive. Negative tells us that there is a decrease and positive tells us that there is an increase. So when you do your interpretation, look at the sign of the slope, as it tells you whether the dependent variable increases or decreases as a result of one additional unit in the value of your independent variable. Just make sure that you know how to interpret that. So far, what you have learned is how to use regression analysis to predict a new value, how to find and interpret the coefficient of correlation, how to calculate and interpret the coefficient of determination, how to interpret the slope and the intercept, and how to calculate your regression line. So now let's dig deeper. Let's look at your calculator. I hope most of you have your calculators with you so that you can follow. Otherwise, you can always follow through the videos that I posted that relate to regression, including the one from yesterday's session. There is also a video on the other calculator, which I'm not going to touch on today, because we don't have two hours. We have one hour 30 minutes, so I'm only going to demonstrate on one calculator. So we're going to look at the Casio calculator.
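Before moving to the calculator, the manual results recapped above can be cross-checked with a short Python sketch. It is not part of the session material. The `describe` function is a hypothetical helper that encodes the strength bands used in this session (0.39 and 0.79); how values that fall exactly on a boundary are classified is an assumption made here.

```python
import math

# Totals from the aptitude-test table.
n = 5
sum_x, sum_y = 15, 45
sum_xy, sum_x2, sum_y2 = 145, 61, 465

# r = (n*Sxy - Sx*Sy) / sqrt((n*Sx2 - Sx^2) * (n*Sy2 - Sy^2))
numerator = n * sum_xy - sum_x * sum_y                  # 725 - 675 = 50
denominator = math.sqrt((n * sum_x2 - sum_x ** 2)       # 80
                        * (n * sum_y2 - sum_y ** 2))    # 300
r = numerator / denominator
r_squared = r ** 2   # square the unrounded r

def describe(r):
    """Strength and direction of r, using the bands from this session.

    Assumption: a value exactly on 0.79 counts as moderate and a value
    exactly on 0.39 counts as weak.
    """
    if r == 0:
        return "no correlation"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size == 1:
        strength = "perfect"
    elif size > 0.79:
        strength = "strong"
    elif size > 0.39:
        strength = "moderate"
    else:
        strength = "weak"
    return f"{strength} {direction}"

print(round(r, 4), round(r_squared, 4))  # 0.3227 0.1042
print(describe(r))                       # weak positive
print(describe(0.6395))                  # moderate positive
```

Squaring the unrounded value of r is what keeps R-squared in agreement with the roughly 10% reported for this example.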
And I hope you are going to be able to follow along with me as I explain what's going to happen on your calculator. I'll explain each step with the demo. If you have a Casio calculator, I hope you have it in hand and can follow. You will notice that our data has changed. I'm using new data, but there are still five observations on this slide. So the first step is to take our calculator and put it in stat mode one. In study unit three, in the early stages, we were using stat mode zero. So now we need stat mode one. You're going to press Mode on your calculator, this menu will come up, and you're going to press two. And this next menu will come up. I will explain. Let me get to my calculator as well. I'll put my calculator on the side so that it doesn't disturb anyone. So you're going to press Mode, as I've said, and then press the button that relates to Stat, which is two. And this menu will pop up, which has options one up to eight. We used option one when we were doing the measures of central location and measures of variation, because we only had one variable. Now we have two variables and we're doing a linear regression, as you can see there. Quickly, on a piece of paper somewhere, write down the formula that we used. I'm going to write it down here for myself. So we said Y-hat is equal to b0 plus b1 X. Why am I writing this? Because I want to make sure that when I use my calculator, I don't get things wrong. Now on your piece of paper as well, mark which one is the intercept and which one is the slope so that you don't forget. Now the next step: we need to select two for the regression line. As you can see on the calculator, the equation is not written with b0 and b1; it uses A and B. So it means our b0 is going to be referred to as A and our b1 is going to be referred to as B.
And then you can write your equation like that: Y-hat equals A plus B X. Now I know, when I punch functions on my calculator, what I am punching: is it the slope or is it the intercept? When I answer questions, I should know whether I'm answering for the slope or the intercept. Okay, so we're going to press button number two, and this table will pop up. Now we are ready to capture the data, and capturing the data is easy. We first capture the X values and then we go and capture the Y values. You need to be very careful. When you capture the values, don't mix them around; capture them in order, so that you know that four corresponds with five and two corresponds with three. If you mix them around, you won't get the same answer. You need to pay attention as you work through your calculator. So now let's capture the X values and then the Y values, and I will explain the steps as we capture. To capture the X values, we press four and equal, two and equal, six equal, four equal, three equal, and we will be on row six, which is now the last row on the calculator and doesn't have any value. Now I want to go and capture the Y values. You use the arrow to go up, up, up until you get to row one, where it says four, and then you use the right arrow to get to the Y column. Now we can capture the Y values: five equal, three equal, seven equal, six equal and five equal. I can double check that I've captured all my values correctly, reading from the bottom: three and five, four and six, six and seven, two and three, and four and five. I've got all my values captured, and then you can press the AC button. Your data is not lost from your calculator. Nothing will go wrong; it will still be there.
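The warning about keeping each X with its own Y can be illustrated with a short Python sketch. It uses the five pairs captured above; the "mixed" list is a made-up capture mistake (the first two Y values swapped), and `pearson_r` is an illustrative helper, not something from the session.

```python
import math


def pearson_r(xs, ys):
    """Coefficient of correlation from the computational formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    sum_x2 = sum(a * a for a in xs)
    sum_y2 = sum(b * b for b in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x**2) * (n * sum_y2 - sum_y**2))
    return numerator / denominator


x = [4, 2, 6, 4, 3]
y = [5, 3, 7, 6, 5]        # captured in the same order as x
y_mixed = [3, 5, 7, 6, 5]  # hypothetical mistake: first two Y values swapped

r_correct = pearson_r(x, y)
r_mixed = pearson_r(x, y_mixed)

print(f"paired correctly: r = {r_correct:.2f}")  # 0.93
print(f"mixed up:         r = {r_mixed:.2f}")    # 0.48
```

The totals of X and Y are identical in both cases; only the sum of X times Y changes, and that alone is enough to change the answer.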
So once we have captured our data, we are ready to answer the question, whether we want to calculate our r or our r squared, or find the equation of a straight line, or we are asked to find the sum of X or the sum of X times Y. To get to the stat functions we need the Stat function on button number one, which is written in orange. So you're going to press Shift first and then press button number one, and the menu appears. On this menu you can ignore one, where it says Type. Two is where your data sits. Press three, and three is where you will find all the sum measures, the summations. If in the question they ask you to find the sum of X squared, you can see it's button number one, and you can press that; if they ask you to find the sum of X times Y, you know where to find it, right? But we're not interested in those for now. So I'm going to go out with AC, and you go back with Shift and one; Shift and one will always take you back to your main stat menu. If you press four, you will get options one up to seven, and of those you can use one, two, four, five and seven if they ask you: one is the sample size, two and five are the mean of X and the mean of Y, and four and seven are the standard deviation of X and the standard deviation of Y.
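If you want to check what the calculator menus return, the same quantities can be computed in Python from the five captured pairs. This is a rough sketch: the variable names are illustrative, and it assumes that options four and seven on the calculator are the sample standard deviations (dividing by n - 1).

```python
import math

x = [4, 2, 6, 4, 3]
y = [5, 3, 7, 6, 5]
n = len(x)  # sample size (option one in the menu under four)

# Summations (the menu under three).
sum_x, sum_y = sum(x), sum(y)
sum_x2 = sum(v * v for v in x)
sum_y2 = sum(v * v for v in y)
sum_xy = sum(a * b for a, b in zip(x, y))

# Means and sample standard deviations (the menu under four).
mean_x, mean_y = sum_x / n, sum_y / n
s_x = math.sqrt((sum_x2 - sum_x**2 / n) / (n - 1))  # assumed: sample (n - 1) version
s_y = math.sqrt((sum_y2 - sum_y**2 / n) / (n - 1))

# Intercept A, slope B and r, the values needed for the regression questions.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x**2)
a = mean_y - b * mean_x
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x**2) * (n * sum_y2 - sum_y**2)
)

print(sum_x, sum_x2, sum_y, sum_y2, sum_xy)      # 19 81 26 144 107
print(f"A = {a:.2f}, B = {b:.2f}, r = {r:.2f}")  # A = 1.66, B = 0.93, r = 0.93
print(f"r squared = {r**2:.2f}")                 # 0.87
```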
We are also not looking for those; I'm just showing you where you can find all those values. So we press AC again, then Shift and one, and we go back to the menu. The one that we are interested in is five, which is Reg. When you press Reg, you will get a menu that looks like this: A, B, r, X with a cap and Y with a cap. Remember, A is your intercept, B is where you will find your slope, and r is where you will find your coefficient of correlation. If we want to estimate a value of X, we use the X estimate, and if we want to estimate a value of Y, we use the Y estimate. Okay, so now let's create a regression line. We press one for A and then we press equal; always remember to press the equal sign. We have 1.659, and I'm only going to keep two decimals, so that would be 1.66. So Y-hat is equal to 1.66. Now I need the slope: we press AC, then Shift, one, five for Reg, then two for B and equal, and the answer is 0.93. So plus 0.93, and then I must put my X, and there is my regression line: Y-hat equals 1.66 plus 0.93 X. The next step: what will be the estimated value of Y when X is 5? For a new estimated value of Y, you first key in the value that you want to estimate for, which is 5, then you press Shift, one, go back to Reg, and press five. You will see that it says 5 followed by the Y estimate; this says I want to estimate Y for the value 5. When I press equal, it gives about 6. So when X is 5, Y will be 6. You can do it the same way if I want to estimate for the value 2, for example: 2 is the value that I want to estimate for, then Shift, one, five for Reg, five for the estimated value, and I press equal, and the value is 3.5. Because all my values are whole numbers, I'm going to say it is 4. So the new estimate for 2 will be 4, and that will be
that. But what if sometimes they give you the value of Y? Let's say they say Y is 2, estimate the value of X. You do the same thing. Because they've given you the value of Y, you just say 2, Shift, Stat, five, and remember four: four is what you use when you want to estimate the value of X. So you press four and you press equal, and the estimate for X, rounded to a whole number, will be 0. That makes sense: when X is 0, the value of Y will be about 2, because the intercept, 1.66, rounds to 2. The next one: let's say we want to estimate when Y is 4. We say 4, Shift, Stat, Reg, four, equal, and it gives about 3. So when X is 3, Y will be about 4. And that's how you use the estimates. Now let's go and find the value of r: Shift, Stat, five, and r is on button number three, then equal, and r is equal to 0.93. This tells me that there is a strong positive relationship. So now, what is the variation that is explained? Let's find r squared: you just press the x-squared button and press equal, and that gives us 0.87, which is 87%. So 87% of the variation in Y is explained by the variation in X. And that's how you answer the questions if you get them in the assignment or exam. Next week we will do more activities, but I just wanted to show you how to use your calculator. Those who have a Sharp calculator that looks like that, or a financial calculator, can also follow the steps from the notes, and the steps are clearly marked. On your calculator you will use Alpha, because all the values visible on the front are written in blue, and you use Alpha to reach those functions. For your estimated values you use the open and close bracket keys; those values are written in orange. If you watch the previous videos, you will see how to use your Sharp calculator as well. Some of the questions that you will get, either in the exam or in the assignment, might be content-related questions, for example how to interpret a
coefficient of correlation based on the value that they gave you, and you need to choose which one of the statements is incorrect or correct. You can do that, and we can have a discussion on WhatsApp when you are not sure about your answer as well. I want to get to where we do calculations. The other one, you can see that here they give you the slope, which is b1, and you need to know how to interpret it in terms of everything that we just spoke about today. We can also discuss this question later on; otherwise next week we can add them to the pool of questions that we need to do. Okay, I want to do one of the exercises using your calculator. Here is the question that we have: a random sample of 8 drivers insured with the business and having similar insurance policies was selected. The following table consists of their driving experience and their monthly car insurance premium. And they are telling us which one is your x and which one is your y, which makes it easier for us when we capture the data on the calculator or on the Excel sheet. I forgot to mention the Excel sheet: right now we will use the Excel sheet to do this exercise. So I'm going to open a template. You also have this template with you; I've shared it, it's under Templates. One, two, three, four, five, six, seven, eight: there are eight records. Let me just double check here; I've got eight. There are also some instructions, and in the previous videos these things are explained clearly; you can watch them and check how we do that. There are instructions on how you add new values and make changes to the Excel sheet itself. Because there are a lot of calculations, I'm going to show you by scrolling down: there are calculations at the bottom, and there are also calculations done adjacent, on the right. These are calculations for your slope, for the mean of x and the mean of y, for your intercept, your y estimate and your r. All of these are calculated automatically, and I show you which formulas we
are using. They are calculated automatically by inputting only your x and y values. You will see that the calculations that I've added on here disappear, because there is no data; the minute I give it data, they will be populated. So I'm going to populate this Excel sheet with the data that we have here, and since I have no space, I'm going to hide my ribbon. The challenge with hiding your ribbon is that you cannot minimize. Yes? What happens if you substitute x and y the opposite way? If you capture x as the monthly insurance premium and y as the driving experience, then you will get your answers wrong. Okay? Yes, you need to capture them correctly: x as x, y as y. Okay, so I just want to show you: when you download the template and you need to change the number of rows, you click in column B, just by the total, and you select across up to column K. From B up to K, those are the only cells that you select. You don't select the total, only the values. Then you right-click and you can delete or insert. I've counted how many records I need; I need only 8 lines. If you only need 5 lines, you select the 3 other lines that you don't need and you delete them. It will ask whether you want to shift the cells up or left, and you just say shift cells up, and it will not affect anything: the calculations below and the calculations on the side won't be affected. If I want to add new rows, I highlight from column B. Say I want to add 3, so I choose Insert and I say shift cells down, because I want to insert the rows below. I'm short of one row, so I must just go and select one row, choose Insert and say down, and there I've got 8 of them. The other thing you need to be aware of is to select the calculations in the row above where you inserted and drag them down, because you want to make sure that the same calculations are applied everywhere. Once you're done, then you can capture your
data. I'm going to start by capturing the X values first. You stand in the X column and capture the X values: 5, 2, 12, 9, 15, and you will see, as soon as you start capturing, that the fields with calculations related to that start populating as well, then 6, 25 and 16. Then we go to the Y column: 640, 870, 500, 710, 440, 560, 420, and the last one is 600. I just want to make this bigger. So the data that I have now includes the columns that we also used in the presentation: X times Y, X squared and Y squared, with their sums. I have them there, along with all the calculations relating to them. And this will happen, so you just need to expand the column, because this is done manually. Okay, and there is your b1, there is your b0, and there are your means, and here it calculated your X and Y estimates. I'm just going to delete that. Your equation of a straight line is b0 plus b1 X, and because the sign of b1 is minus, there is a minus there, so it will be b0 minus 15.48 X, where 15.48 is b1. Your coefficient of correlation r and your coefficient of determination are also calculated there. Now, the other thing which we didn't mention is this part here at the bottom. You will see that it's not automated, because if you want to go and calculate the SSE and SSR, you will need to copy the data that you have there. If the question is asking for those, you just copy that and paste it here. They will change your estimated values, which are based on the regression line: it takes the regression line, and because I didn't have any X value in those rows, X was 0 and the estimate was the same as the intercept, 766. So I need to delete those two, because they affect the calculations that we do right here at the bottom. You just delete the fields that you don't need and it calculates your values and your equations. These are your equations, and you can then take them to do your calculations to see if you get the answers. But anyway, now let's go and
answer the question. I just want to move this up; this is just the driving experience, so don't worry too much about that. Then I must scroll, because we want to answer all these questions. So, which one of the following statements is incorrect? One: the insurance premium depends on the driving experience. Does it depend on it? Is there a correlation? There is. What is the answer for our correlation? It's 0.77 in size, so you can argue whether it is a strong or a moderate relationship, but it is a negative relationship. The other thing is our slope. Remember, the slope is b1, and you need to check whether the slope given is correct or not. I can see they are using one decimal; you can change the decimals using the number format, so I can leave everything at two decimals. So let's answer the question. We can skip one if you are unsure about it. Let's go to number two: the y-intercept is 766.6. The y-intercept is the same as b0. Is that true or false? It is correct, because the y-intercept is 766.60. Number three, the slope: you go and look at the slope, and the slope is minus 15.48, so that is correct. Number four says there is a negative relationship; there is a negative relationship. Insurance depends on driving experience: yes, it does. Number five says 76% of the total variation in the dependent variable is explained by the independent variable. Is that true? Look at what the 76% is: if I convert r to a percentage and make it two decimals, and I convert r squared to a percentage and make it two decimals, the explanation given here is the one for the coefficient of determination. Is the number correct? It should state 58.97%, because that is the square, not the square root, of the correlation coefficient. So it should be 58.97%, not 76%; the 76% is the coefficient of correlation, which tells us the measure of
strength and direction. So number five is the incorrect one. So you can use the template that I just gave you access to, or you can use the actual Excel functionality. I'm going to copy this data, open a new sheet and paste it. In your Excel there is a Data Analysis panel. If you don't have it, you go to File, Options, Add-ins, and there is an Analysis ToolPak. You click on Analysis ToolPak, you click on Go at the bottom, you tick Analysis ToolPak and you click OK. Once you do that, you close your Excel and open it again. When you open it the second time, you go to the Data menu, and under Data Analysis you open it, scroll down to where it says Regression, and click OK. Then you select the values. On yours it will not have anything filled in; I used this yesterday for the other session, that's why I have information in there. You click inside the box where it says Input Y Range and you go and select the Y input, including the label, selecting only the values that are relevant. For the Input X Range you also go and select your X values. You tick Labels. By default your output setting will be on New Worksheet Ply. You can also output on the same sheet: you select Output Range, click inside the white block, and select where you want the output to start. You ignore the rest of the options and then press OK, and there are your answers. If you look at this, it is the same as what we just answered: 0.77 and 0.58 are the same as the r and r squared we got. If you scroll down to the bottom, to the coefficients, there is 766.6 and minus 15.48. So you should be able to answer some of these questions. The only downfall of this is that your R value here will never be negative. The value in this Excel output is not negative, but you can look at the value you get for your slope: if the slope is negative, it means r is also negative. That's the only disadvantage of using the Data Analysis panel. If you
want to calculate the correlation coefficient, also in the Data Analysis panel there is Correlation. I just want to show you what it will give you. You select X and Y; you need to select both of them. You say the data is grouped by columns, because the values are in columns, and you also state that the labels are in the first row. I want to output it in a new worksheet ply, so it will open a new sheet. If I open it, there it is. That is the only way you will get your r as a negative number in Excel, as you can see there. If you're using the Regression tool in Data Analysis, you're going to get R as a positive number; this one gives it to you as a negative. Otherwise, remember you can use your calculator: you just input the data and do the calculations. So you've got work to do: go through this and find the method that you feel comfortable with. If you want to use the formulas and calculate manually, it means that on this table you will create the X squared, Y squared and X times Y columns, complete them, create a total row at the end, and calculate all the totals. Then you can use the totals in your formulas. So find the method you feel comfortable with and practice; there is no harm in playing around and practicing. Remember, if the question looks like this and they tell you 10 years, it means they have given you an X and they want you to estimate the value of Y, which is the premium amount in rand. Once you have your regression line, you can estimate. Here they didn't give you the regression line, so you need to write it out, which will be b0 plus b1 X. You will use the values that you have here: 766.6 minus 15.48 X. You use this equation and substitute 10 there. If you used your calculator, remember you will just press 10 and then go and do the estimate. If you used Excel, and I keep on opening the wrong Excel, which is this Excel, if you used this
Excel, you will have the equation right there. Once you know what the equation is, it's easy: you can just type it in as the intercept plus the slope times whatever value you are estimating for, let's say 10, and then you press Enter and it should give you the answer. Oh sorry, I used the wrong values. You need to use the ones at the top, not these ones here, because these are calculations. You will use that one, and you need to type the multiply sign before the value, because this is Excel and it doesn't treat a bracket as multiplication. And that will be your answer. I already gave you the answer for that question: it is 611.84. Okay, so that's how you will answer such questions. I've also given you some questions that you can use to practice, and here is another one. Here you do not have the x and y values, so it means you need to use formulas. You need to know how to calculate SSxy and the others, which are your sum of squares measures, so you need to go and look up those formulas. For SSx I can give you the formula: it is the sum of x squared minus the square of the sum of x divided by n. For SSxy, and I might be giving it to you wrongly here so also look it up, it is the sum of xy minus the sum of x times the sum of y divided by n; or you could work with n times the sum of xy minus the sum of x times the sum of y, which is the same quantity multiplied by n. You can calculate that, and that should give you the answer there. The mean, you know how to calculate. SST is also one of the sum of squares measures, so you just need to look at those formulas, but you can calculate SST as SSE plus SSR. And if you don't have the SSE and SSR, you can use your coefficient of correlation; no, not the coefficient of correlation, you can actually use your estimates from the regression line to calculate them and get your SST. Yeah, so you will
just use those formulas to calculate your b0 and b1. Remember, for b1 you use the equation: b1 is SSxy over SSx, which will be the value that you calculate there. And b0, remember, is the mean of y minus b1 times the mean of x. So if you have your b1, you can calculate b0, and that gives you the equation of a straight line, the regression line. Then you can calculate an estimate by substituting x into that formula. For the coefficient of correlation, you use the formula; remember, it is that very long one, and you will use the sum of squares measures. That is how you will answer questions like this. So you can't always rely on using the templates and Excel and all that; you need to know how to use the formulas and do the calculations. And that concludes today's session. Are there any questions? If there are no questions... I think we need to practice more. Yes, definitely, you need to practice more and look at more exercises and activities as well, especially for this unit, because the templates are not as straightforward; they might be a little bit tricky, and you need to make sure that you know how to use them properly in order for them to support you. You also need to know how to use your calculator. Are they allowing us to use the templates? Yes, they do allow you, depending on the type of exam you are writing, right? If it's online, and you're not using only your phone, and the system doesn't want you to keep on taking pictures and all that, then it should be fine. But you need to be prepared to go into the exam knowing that if they are not allowing you to use templates, then you need to know how to use your calculator or how to calculate things manually. Lizzy, do you know when assignment 5 is going to be due? Is it the 30th, or has the date changed? I have no idea. I know that it's going to be open on the 16th. When is the 16th? Next week on Friday. Yes, it will be open on Friday, and I don't know
when it is due, but it has to be due before you guys start writing exams, because your exams are in October, right? They need time to submit your year marks and see who qualifies, and to send out your timetables and all that, because timetables need to be sent. So yeah. As far as I know, the subject is going to be online. Yeah, you're writing online. Yeah, they haven't communicated that we need to be in a... No, regardless, you need to receive a timetable that tells you when you're writing and whether you met the requirements to write the exam or not. Yeah, but I was referring to Ntumbi; I think she was asking about templates and stuff. As far as I know this is going to be an online examination, so I know we need to kind of memorise and know how to work these formulas, but the templates really help a lot; they keep you sane. Yeah, and also, even if you're writing online, it depends on your proctoring system. If they say this is not an open book exam and your proctoring system checks for you moving from one application to another, it will disadvantage you. So you always need to be aware of the type of exam and of the implications of closing a screen and opening it again and all those things, because they are taken into consideration when you are writing exams. So be prepared: read everything you need to read about the requirements for your exam and prepare accordingly. Okay, thank you. Yep. Okay, bye. Enjoy your test.
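For revision, the insurance exercise from this session can be reproduced with the sum of squares formulas (SSx, SSxy, b1 = SSxy / SSx, b0 = mean of y minus b1 times mean of x). The Python below is only a sketch under one stated assumption: the spoken list of driving-experience values contains seven numbers, so the third value (12) is inferred here because it reproduces the intercept, slope, r squared and estimate quoted in the session. Variable and function names are illustrative.

```python
import math

# x: driving experience in years, y: monthly premium in rand, for the 8 drivers.
# ASSUMPTION: the third x value (12) is inferred; it was not audible in the session.
x = [5, 2, 12, 9, 15, 6, 25, 16]
y = [640, 870, 500, 710, 440, 560, 420, 600]
n = len(x)

sum_x, sum_y = sum(x), sum(y)

# Sum of squares measures.
ss_x = sum(v * v for v in x) - sum_x**2 / n
ss_y = sum(v * v for v in y) - sum_y**2 / n  # this is also SST
ss_xy = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n

# Regression line and correlation.
b1 = ss_xy / ss_x
b0 = sum_y / n - b1 * sum_x / n
r = ss_xy / math.sqrt(ss_x * ss_y)  # keeps the sign of the slope
r_squared = r**2

# Split of the total variation: SST = SSR + SSE.
ssr = b1 * ss_xy
sse = ss_y - ssr


def predict_premium(years):
    """Estimated monthly premium for a given number of years of experience."""
    return b0 + b1 * years


print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")              # b0 = 766.60, b1 = -15.48
print(f"r = {r:.2f}, r squared = {r_squared:.4f}")  # r = -0.77, r squared = 0.5897
print(f"estimate at 10 years = {predict_premium(10):.2f}")  # 611.84
```

Note that r comes out negative here, in line with the negative slope, which is the sign the Excel regression output does not show.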