That's how, guys, there's a lot of energy. I need that energy from you. Okay, so today we're going to continue with the activities, because on Wednesday we went through the content. But before we start today's session I'm also going to do a recap so that we refresh our minds on what we have learned. I know that it was a lot; there were so many things that I said on Wednesday. I hope that today, when we do the activities, things will start to unpack and become easier, and also when you start doing the exercises yourselves. Are you able to see my screen? Yes? Okay. I want to share my entire screen, because I'm also going to demonstrate other things, like your calculator and how to use the Excel spreadsheet that I shared with you.

So let's start with today's session. Welcome to session 26. Today we're looking at activities relating to regression. Remember that on Wednesday we will look at both regression and the chi-square test, because your assignment five includes both of them, but today we're concentrating only on the regression activities. Let me also just double check. I know that myUnisa is down, and your assignment is due on the 30th, which is the following week Monday. Therefore on Wednesday the 25th we will do the activities relating to both regression and chi-square, and on the 28th we will continue with those two. I was hoping that on Wednesday we could do the activities live on myUnisa, but since myUnisa is down I won't have time to do that, so I will send you the activities as a PDF and we can go through them as well.

Okay, so let's recap what we did the previous time we were together, on Wednesday. We looked at how to make inferences in terms of the coefficient of correlation, the coefficient of determination and the regression line. We also learned how to calculate all of those and how to interpret them.
In terms of correlation, we said that correlation shows the relationship between two variables, the independent and the dependent variable. Here we're talking about numerical variables, not categorical ones. When we test the relationship that exists between two numerical variables, we either use a scatter plot to visualize the relationship, or we calculate the correlation coefficient, which tells us the strength and the direction of that relationship.

These are the types of relationship that we spoke about: it can be linear or non-linear, or there can be no relationship. When it's linear, it can be either a negative or a positive relationship. When it's non-linear, it can take a quadratic form or an exponential form. And when there is no relationship, it means one variable stays constant, so there won't be any relationship.

With the scatter plot we can also have a coefficient of correlation, which is a value that we calculate and that tells us, for this type of visualization, the direction and the strength. We said the value of r lies between negative one and one. If r is greater than zero, we say it is a positive relationship; when r is less than zero, it's a negative relationship; and when r is equal to zero, we say there is no relationship. We also looked at how we can define the strength based on the range of the values that you get. When r is equal to one or negative one, we say it's a perfect relationship: when it's positive you will say it's a perfect positive relationship, and when it's negative it will be a perfect
negative relationship. In terms of the strength, you can also say that the value of r is strong, weak or moderate, depending on the range of values that you are working with. We also looked at how to interpret each one of them if they are given to us. I'm not going to go into detail on interpreting all these graphs; we dealt with that on Wednesday.

We also spoke about using the sum of squares measures to calculate the coefficient of correlation, the slope, the total variation and so forth. We need to know how to use the sum of squares measures as well, because sometimes you can be given them and be asked to calculate the coefficient of correlation, so you need to know how the formula works and how to complete it.

We also learned about the measures of total variation, where we said that the total variation is made up of two parts: the regression sum of squares plus the error sum of squares. And we learned that we can use the total sum of squares to calculate the coefficient of determination. The coefficient of determination gives you the percentage, or the proportion, of the total variation in the dependent variable that is explained by the variation in x, the independent variable. Your r squared, which is the coefficient of determination, lies between zero and one, and it's just the square of your coefficient of correlation: the coefficient of correlation is r, and if you have your r you can just square that value and it will give you the coefficient of determination. We can also calculate the coefficient of determination not from r but from the sum of squares measures.

Then we also learned how to interpret the coefficient of determination. If r squared is equal to one, we say 100 percent of the variation in y is explained by the
variation in x. When it's any value between zero and one, we say some, but not all, of the variation in y is explained by the variation in x. And if r squared is zero, we say none of the variation in y can be explained by the variation in x, or y does not depend on x. That is r squared. We also learned how to interpret it using this slide, where we were given the r value and found r squared by just taking the square of that r. We got 41 percent, and we interpreted it to say that 41 percent of the variation in a person's salary can be explained by the variation in her educational attainment.

Then we went on and dealt with regression analysis, the least squares method. We explained what regression is: a method that we use to predict the value of your dependent variable based on the value of your independent variable. We are only using one independent variable in this instance; we're not doing multiple regression. In terms of the least squares regression line, we defined the dependent variable as the y variable, which is the outcome variable, the variable that we want to predict. The independent variable is our input variable, the variable that we use to predict or explain the dependent variable.

We learned that this is the regression line: y hat, your estimated value, is equal to b0, your intercept, plus b1, your slope, times your x observation. So it's y hat equals b0 plus b1 x. In terms of the regression line we also said that we normally do not interpret b0, but you are expected to interpret what the value of b1 means in relation to the regression: b1 is the estimated change in the average value of y as a result of a one-unit increase in the value of x. Based on the slope as
well, it can also tell you the direction of your regression line. If the value of your slope is negative, it means your relationship is negative. When the relationship is negative, we say it is a decreasing relationship: when the values of x are increasing, the values of y are decreasing. So when you interpret it, you will say it shows a decrease in the average value of y as a result of a one-unit increase in the value of x. When the value of your slope is positive, it's an increase in the average value of y: the average value of y increases by that amount as a result of a one-unit increase in the value of x, because when it's positive, as the values of x increase the values of y also increase. That is the slope.

So when you look at the slope and at the correlation coefficient, they both need to tell you the same thing in terms of direction. The correlation coefficient will also tell you the strength of the relationship, but the direction it gives is the same one you find by looking at the sign of your slope. I hope that makes it easy for you to understand.

We looked at an example using the test grades from the aptitude test and the supervisor grades of the applicants once they had been in employment. We plotted the x and y values to see the relationship between our x and our y, and we saw that it was a positive relationship, because as the test grades were increasing the supervisor grades were also increasing, with the exception of that one outlier. An outlier is an extreme value, a value out of the norm, far away from the other values. Okay, so
we also took this and answered a question where we were asked for the regression line, the coefficient of correlation and the coefficient of determination. We said we can take the x and y values, put them in Excel and run the regression under data analysis, and it gives us this output, on which we were able to find all the key measures that we need to complete the regression model. Multiple R is your coefficient of correlation, r, and R Square is your coefficient of determination. In terms of the regression line, y hat equals b0 plus b1 x: when we come to the coefficients in this regression output, b0 is the intercept and b1 is the coefficient of the test grade. We just take them and substitute them into the equation to create the regression line, where y hat and the test grade are not substituted; we leave them as variables.

This is the question that we needed to answer. In that instance we answered it using Excel, and here we're going to answer it using manual calculation. For the manual calculation, remember that you need to start by getting your regression line. To formulate the regression line we need b0, and in order to find b0 we need the mean of y, the slope b1, and the mean of x. So we need to find b1 first. We can find b1 using the sum of squares measures, or the summation measures: the sum of squares measure of x and y at the top and the sum of squares measure of x at the bottom. Substitute the values and calculate b1. Then you calculate the mean of y and the mean of x, substitute them into the formula for b0, and find b0, which is our intercept, and
then substitute into our regression line to formulate it. When you calculate this manually from your x and y, you have to work out, for each pair, x times y, x squared and y squared, and then sum each column. Once you have the summations, you can substitute into the formula to calculate your slope, calculate your two means, and calculate your intercept from those three values. Then you substitute the values of b1 and b0 into the formula and create your regression line. That's what we have done.

We also looked at how to calculate the coefficient of correlation with its formula (I don't know why I have it twice), using the same values. I don't know why it replaced my table, I will fix that, but we just use this very same table to substitute the values of our summations into the formula and calculate the coefficient of correlation. To calculate r squared, we just take the value of 0.32 and square it, and we got 0.1040. I think it was like that, or I might be lying.

We also looked at how to interpret the regression line, because we already have it. We said we don't interpret b0, but b1 we do interpret: 0.625 means that the mean value of the supervisor grade will increase by 0.625 on average for each additional unit of test grade. We say an increase because of the positive sign on the slope. When we do the activities we will get used to interpreting b1 as well. That's what we learned on Wednesday. It was a lot.

Then we also learned how to use a calculator. We looked at the steps on how we use our calculator to
calculate the regression line by following the steps on our calculator, where we can find our intercept, which is a, our slope, which is b, and our coefficient of correlation, and we can also use y hat to estimate a new value of y. We also looked at all the steps on the Sharp calculator. On the Sharp calculator all the values for your regression line are on the keys, written in blue. Your r is on the division key, and your a and your b are there too. If you are using a Sharp calculator you just need to press your alpha button, which is the green button. Your mean of x is on button four and your mean of y is on button seven. Then we also looked at the financial calculator, which looks exactly the same as the Sharp calculator, except that everything is written in orange. You use your alpha; your r is on the open bracket, your a and b and your means are on four and seven, and when you want to estimate a value there is a y hat estimate on the close bracket as well.

Okay, so now let's start with today's session. It took me 30 minutes to recap, but no worries. Let's begin. This one we can work out together; whoever wants to answer can answer. If the coefficient of correlation is equal to 0.98, which of the following statements is incorrect? Number one: there is a strong positive relationship between the dependent and independent variable. Number two: when the independent variable decreases, the dependent variable decreases as well. Number three: the slope will be positive. Number four: SSxy, which is your sum of squares measure of x and y, will be positive. So which one of these statements is incorrect? Based on what we know so far about regression, what do we know? We know that our correlation coefficient is positive, and when it's positive it means the regression is
positive. It means that when the values of x are increasing the values of y are also increasing, and it also tells me that the change in the values of y relative to the change in the values of x, which is my slope, is also positive. Therefore my sum of squares measure will be positive as well if my regression is positive. So which one is incorrect? Sorry, my sum of squares measure... wait, sorry, it's not necessarily going to be positive. We know that the sum of squares measure of x and y is the sum of xy minus the sum of x times the sum of y divided by n, or, written the other way, n times the sum of xy minus the sum of x times the sum of y. This can be either negative or positive, depending on what you have there, so there's no guarantee on that.

So let's look at number one. It says there is a strong positive relationship between the dependent and the independent variable. Is the statement correct? Yes, because it's a positive relationship, and based on the strength this is a strong positive relationship. Number two says when the independent variable decreases, the dependent variable decreases as well; that is, when the values of x are going down, the values of y also go down. It's a confusing statement, because we always describe this in terms of increase: when one is increasing, the other is also increasing. But if I'm moving from here to here, I am declining. This is the highest value of x and this is the highest value of y; if I move downwards, this is the value of x and this is the value of y, so when x is smaller, y is smaller. That is how I would interpret it: if my independent variable is getting smaller, my dependent variable is also getting smaller, because they both move in the same direction. Number three: my slope is positive. Knowing the sign in front of my
regression line, if the sign is positive then my slope will also be positive. Number four: my SSxy is positive. I won't know whether my SSxy is positive, because I would need to calculate it to know whether I get a negative or a positive answer here. So I would have chosen number four as the incorrect one.

Now I want you guys to start talking to me; I cannot be the only one talking. Consider a simple linear regression equation. The slope b1 represents... Remember we're talking about the slope, where y hat is equal to b0 plus b1 x; this is our intercept and this is our slope. Number one says the predicted value of y when x is equal to zero; so it is saying that this predicted value of y is our slope when x equals zero. That's what we need to make out. Number two says the estimated average change in y per unit change in x; does that describe what the slope is? Then, the predicted value of y: does the slope predict the value of y? The variation around the regression line: is the slope the variation around the regression line? And the predicted value of x when y is zero. Are you guys answering the question in the chat? Let me just check, or am I talking to myself? Which one is the correct answer? I see some answers: it's option number two. Yes, it's option number two. Number one is our intercept, so that is not correct. The variation is what r squared deals with, and the slope on its own does not predict the value of y, because you need both the intercept and the slope to predict y. So only number two is correct: the slope is the estimated average change in y per unit change in x.

Next one. A random sample of eight car drivers insured with a business and having similar car insurance policies was selected. The following table lists their driving experience in years and the monthly car insurance premium paid,
and we are given our x and our y values, and we are asked to find the incorrect statement. Number one says the premium depends on the experience. I don't know that yet; I need to find out whether they are related, whether y depends on x. Then: the y intercept is equal to 766.60, and the slope is minus 15.48, which means I need to calculate these. There is a negative relationship between x and y: I don't know that, I need to calculate it. And the last one is about the total variation in the dependent variable that is explained by the variation in the independent variable, which is my r squared, so I need to calculate my r squared to know that.

There are two, three or even four ways you can do that. You can use the formulas as given, or you can use a calculator. I'll use the Casio in this instance to answer some of the questions. I need to pay attention to what they defined as x and what they defined as y: that is my x, and that is my y. First step: mode, then I press 2, and I start putting the values into the calculator. Because I'm using a Casio I'll do the x values first: 5 equal, 2 equal, 12 equal, 9 equal, 15 equal, 6 equal, 25 equal, 16 equal. Then I go up, up, up until I get to the first one, go to the right, and start capturing the y values: 640 equal, 870 equal, 500 equal, 710 equal, 440 equal, 560 equal, 420 equal, 600 equal. You need to make sure that the pairs relate to one another and that you have captured the values correctly, because if not you will not find the correct answer. So I'll just double check by browsing through: 5 and 640, 2 and 870, 12 and 500, 9 and 710, 15 and 440, 6 and 560, 25 and 420, and 16 and 600. I have all of my values. What I need to do now is press AC to go out, and I'm ready to answer the question.

It asks for my y intercept. In terms of the formula, y hat is equal to b0 plus b1 x; on the calculator it
will show as y equals a plus bx. So my intercept is a, and that is what I must find; my slope is b. For r squared I will have to find r and then press the x squared button. So I already know what I need to do; let's go and do it. Let's first find our y intercept: shift, stat, then I go to Reg, which is 5, and I'm looking for the y intercept, which is 1, and I press equal. Therefore my y intercept is correct. Done. I'm looking for the incorrect one, so let's calculate the slope: shift, 1, Reg is 5, the slope is 2, which is b, equal: minus 15.48, which means my slope is also correct. So both of them are correct. 'There is a negative relationship between x and y': yes, there is, because the slope is negative, so the relationship is negative and that statement is not the incorrect one. But I can also calculate my r, because r lies between negative one and one and it will also tell me whether the relationship is negative. So let's calculate r: shift, stat, Reg, r is on button number 3, and equal. There is my r: minus 0.76. So I have r, not r squared. To get r squared I just press the x squared button, and that is my coefficient of determination. It should state that it's 0.859 and that is incorrect. That's correct, that's correct, that's correct. 'The premium depends on the experience': the relationship here is a negative relationship, so y does depend on x, because there is a relationship. If r were zero there would be no relationship, and they would not depend on one another. So that's how you will answer the question on one calculator.

Now let's assume that you went and bought a Sharp calculator, not a Casio, which looks like this. As you can see, this calculator does not have any values printed on top of the keys; this is the latest Sharp calculator that people get from the shop. I'll do the same. On mode I need stat, which is 1, and I need option 1, a plus bx, which is the
same as what I have there, so I'll press 1. Then I have a table similar to what we had with the Casio; as long as there is a table, we press the equal sign to enter the data. I wanted to use my computer's enter key as the equal sign, but it needs to be pressed on the calculator itself. Sorry, my bad, this doesn't allow me to use my keyboard, so it's going to be the long way on this one. I'll also do the x values first: 5 equal, 2 equal, 12 equal, 9 equal, 15... I made a mistake, I just need to delete that... 15 equal, 6 equal, 25 equal, 16 equal. Then I need to go back to the top and across to the y column: 640 equal, 870 equal, 500 equal, 710 equal, 440 equal, 560 equal, 420 equal and 600 equal. If I made any mistake I don't know, because it's time-consuming to capture all this. All right, now I have all the values captured. I'll just go out, and since I'm doing regression I'll press alpha and then go to stat; the stat menu is there. Then I press 8, and there I have my different menus. I need the regression, so I press 1 for regression, and there are all my values. With this calculator I'm able to see all the answers at once: my a is 766, my b is minus 15.48, my coefficient of correlation is minus 0.76 and my coefficient of determination is 58 percent. So whether you use a Sharp calculator or a Casio calculator, you will get the same answer.

Now what about when I use Excel? On Excel I also need to capture all this information, our x and y. So let's go: 5, 2, I just press enter as I go, 12 enter, 9 enter, to put in all the values of x. I'm at the end of the rows for x, so if I click here... let me just show you when it's bigger, so
I don't have to use... don't do it like that. You need to click on column B of the last row and drag across to G, making sure that all of them are highlighted. When you click in the cell you hold control, or hold your mouse button, and highlight, then click insert and choose to shift the cells down, and continue like that. I need two more rows inserted, and that will allow me to capture the other values. So I'm on 15, enter, then 6, enter, 25, enter, and the last one is 16. I've entered all the values of x, and as you enter your values you will see that the total adjusts and all the other calculations happen. I'll do the same for y: 640 enter, 870 enter, 500 enter, 710 enter, 440 enter, 560 enter, 420 and 600. Excel will do all the calculations. You must bear with me: some of the things are just pictures that I placed there, because it's not easy to write them in Excel; this one is a picture, for example. Looking at the sheet, I can answer my question. My intercept, b0, is 766.60; my slope is minus 15.48; my r is minus 0.76; and my r squared is 58 percent. So all of them are there. Let me make it bigger so that you can see all the values. Whichever way you use, you should be able to calculate and answer the question. Okay, so that is me talking for the last time.

Now we also need to be able to estimate a value. The question is to estimate the value of y for x equal to 10. So we go back to our question, where we have the slope and the intercept, and we can write our equation: y hat equals the intercept, 766.60 (on this one it says point six zero), and we know that the slope was negative, so minus 15.48 x. With the regression line written down, all we need to do is calculate 766.60 minus 15.48 times 10, because that is what we are estimating.
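As a cross-check on the calculator and Excel steps, here is a minimal Python sketch of the same manual calculation, using the eight pairs of driving experience and premium that were captured above. The function name `simple_regression` and the variable names are only illustrative; they are not from the session or from any calculator manual.

```python
from math import sqrt

# Driving experience in years (x) and monthly premium (y), as captured in the session.
x = [5, 2, 12, 9, 15, 6, 25, 16]
y = [640, 870, 500, 710, 440, 560, 420, 600]


def simple_regression(xs, ys):
    """Return (b0, b1, r) for the least squares line y_hat = b0 + b1 * x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    sum_x2 = sum(a * a for a in xs)
    sum_y2 = sum(b * b for b in ys)

    # Sum of squares measures.
    ss_xy = sum_xy - sum_x * sum_y / n
    ss_xx = sum_x2 - sum_x ** 2 / n
    ss_yy = sum_y2 - sum_y ** 2 / n

    b1 = ss_xy / ss_xx                   # slope
    b0 = sum_y / n - b1 * sum_x / n      # intercept: mean of y minus b1 times mean of x
    r = ss_xy / sqrt(ss_xx * ss_yy)      # coefficient of correlation
    return b0, b1, r


b0, b1, r = simple_regression(x, y)
r_squared = r ** 2                       # coefficient of determination

# Estimate y at x = 10, with unrounded and with rounded coefficients.
y_hat_full = b0 + b1 * 10
y_hat_rounded = 766.60 - 15.48 * 10

print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, r = {r:.4f}, r squared = {r_squared:.4f}")
print(f"estimate at x = 10: {y_hat_full:.2f} (unrounded), {y_hat_rounded:.2f} (rounded)")
```

Run as written, this should give an intercept near 766.60, a slope near minus 15.48, r near minus 0.768 and r squared near 0.590, which truncate to the minus 0.76 and 58 percent read off in the session. The two estimates at x equal to 10 differ slightly only because one uses rounded coefficients.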
Let's calculate. Um, I'll have to choose another calculator that I'm not using at this point. So we have 766.60 plus... no, not plus, delete... minus 15.48, open bracket, 10, close bracket, equals 611.80. That is the estimate. We can also estimate the value directly on the calculators that we used, so let's do that. On this calculator I first press 10, because I'm estimating at 10, then shift, stat, Reg, which is 5, and now I use the y hat for the estimation. It will show 10 y hat, and when I press equal I should get the same answer. Because I rounded off the values on the calculator I was using before, I'm not getting exactly the same answer, but on your calculator you do get exactly the same as what we have there. On this one, the Sharp, I would be lying if I told you that I know how to estimate; I've just learned it today, so let's see. Alpha... no, I don't think we will be able to estimate on this one. It doesn't give the option to estimate. Let's see again... no. So on this one you'll have to take the values and calculate the estimate manually. I just want to see the value of a; it says point six zero on this one as well.

On Excel, since we already have these values, we can use Excel to estimate as well. Easy: we take that, plus that times 10, equals, and there is my value. You can also calculate it here and it will still give you the same answer, by adding the intercept to the slope multiplied by your x, or we could have just used our b0 and b1. It still gives you the same answer. That's regression.

Which one of the following statements is incorrect? I'm going to give you two minutes to read it through and then give me an answer. Number one: correlation analysis determines the strength and the direction of a relationship between
variables. Number two: the independent variable always influences the dependent variable. Number three: a negative slope in a simple linear regression shows that there is a negative relationship between the independent and dependent variable. Number four: if the slope is equal to zero, there is no relationship between the two variables. Number five: when the coefficient of correlation is negative, there is a weak relationship between the two variables regardless of the magnitude of r. Which one of these is incorrect? Remember you can also answer in the chat, but I would like it if you could unmute and talk to me, so that the whole recording is not me alone talking.

Yes? 'I think it's option five.' Why do you think it's option five? Okay, let's look at all the options and see if option five is the incorrect one. Option one says correlation determines the strength and the direction of the relationship. Is that correct? Correct. The independent variable always influences the dependent variable: yes, that's correct, because your dependent variable depends on your independent variable. A negative slope in a linear regression shows that there is a negative relationship: if the slope is negative, it means the relationship is negative, because with the slope we look at the change in the values. So that is correct. Number four says if the slope is equal to zero then there is no relationship. y hat is equal to b0 plus b1 x. If the slope here is zero, does that mean there is no relationship? If the slope is zero, y hat won't have any relation to x, so that statement is also correct: zero multiplied by x is zero. Yes, because if the slope is zero
then the x term is not there, so there is no relation to x and there is no relationship. So that is also correct. Number five says that when the coefficient of correlation is negative, the relationship is weak regardless of the magnitude. We know that r can be, say, negative 0.90, and that does not mean this is a weak relationship; it is a strong negative relationship, not a weak one. So that is incorrect. And that's how you will look at the options.

Next: research on the relationship between x and y reveals the following information. Here they give you the summations: n is 14, the sum of x is 52.9, then the sum of y, the sum of xy, the sum of x squared and the sum of y squared. Which one of the following statements is incorrect? The first one gives a value for SSxy; is it incorrect? If you forgot the formulas, they are like this. SSxy is the sum of xy, minus the sum of x times the sum of y divided by n. Let's put it at the top. To calculate SSx, we use the sum of x squared, minus the square of the sum of x divided by n. That's SSx. Then we need to calculate the regression coefficients, your b0 and your b1, so let's write the formulas down. We need to calculate b1 first. b1 is the sum of xy minus the sum of x times the sum of y divided by n, all divided by the sum of x squared minus the square of the sum of x divided by n, which is the same as these two. Let me not complicate your life: if those values are correct, you can just substitute them, and SSxy divided by SSx will give you b1. To calculate b0 you need the mean of y minus b1 times the mean of x, where the mean of x is the sum of x divided by n and the mean of y is the sum of y divided by n, and you substitute into those
formulas. Once you have that and it is correct, you need to go and estimate the value of y, so make sure that you substitute it into the regression equation and find out whether that option is correct. For the coefficient of correlation, you again need the sum of squares formulas. r uses the same SSxy, so if you got the answer there you can use it, divided by the square root of SSx times SSy. You will have to go and calculate SSy, which is the sum of y squared minus the square of the sum of y, divided by n. I'm going to write SSx times SSy at the bottom, so that you can just take the two answers you got and substitute them into the formula to calculate your coefficient of correlation. Oh gosh, now they also ask you to calculate SST. We know SST is given by SSR plus SSE. So many calculations. Pardon? I'm saying, so many calculations. Yes, but they are easy to do, don't worry.
Okay, so for the first question, SSxy is 382.85 minus the sum of x, which is 52.9, times 92.8, divided by n of 14, so you just need to go and calculate that. On this side, for SSx, we have the sum of x squared, which is 215.41, minus the sum of x, which is 52.9, squared, divided by 14, and you go and find the answer for that. For the means you just need 52.9 divided by 14 and 92.8 divided by 14. For number one, SSxy, the answer is correct: it's about 32.1986. Let's see if SSx is correct as well. We have 215.41 (yes, it's 41, thank you) minus 52.9 squared divided by 14, which is equal to 15.52, so that is also correct. Now we come to the second one: we need to calculate b1. So b1 is this
answer that we have here, so if we can find it to be equal to that, we are on the right track. Substituting into the formula we have here, I'm not going to write all the decimals, so I'll just say 32.1 divided by 15.5, but on the calculator I'm going to use all of them. Take 32.198558 (am I missing a number? nope), divide by 15.52358, and press equals. You need to write the answer for b1 correctly: we found 2.074, so it is that value there, 2.07417. That is b1, sorry, not b0.
Now b0. We know b0 is the mean of y minus b1 times the mean of x. Did you calculate the means? I'll start with the mean of y, which is 92.8 divided by 14. I'm going to do the whole thing on the calculator, so I won't write down the answer for the mean. Minus our b1, which we did get, 2.07417, times the mean of x, which is 52.9 divided by 14, close the bracket, equals. This answer is minus 1.20883. So it means this option is incorrect: it's not the correct equation, and we cannot use it to estimate the value of y. To estimate y we need the correct equation, which is y hat equals minus 1.20883 plus 2.07417 x, and we substitute 1.5 for x. Did you calculate it already? I get 0.26093. Let me use this calculator: minus 1.20883 plus 2.07417 times 1.5, close bracket, and it's 1.9, which means this one is correct. The coefficient of correlation and the others are therefore all correct. I'm not going to go through all of them now; the only incorrect answer, as we've proved, is the equation, because I've already
proved that that one is the incorrect one. You can also do your coefficient of correlation by substituting the values; when you get to the answer you multiply by 100 and it will give you the percentage. Remember also to calculate the sums of squares first, so that you can substitute them in. The SST you can also calculate. Let me just double check. It's a long formula and I don't have it with me now, but you can check the notes for the formula to calculate SST as well; it uses the sum of squares measures.
Okay, moving on. These are the type of questions I would like to see in your exam, because they give you an opportunity either to use the formulas or to use your calculator, which is quick and easy. Try and see if you can answer this question on your own, and ask if you want to ask. You can also use the Excel sheet that I've shared with you, because then it will be easier: you don't have to use the sum of squares measures, you can use the values. Pay attention to where your x is and where your y is, and capture the values correctly on your calculator. Let me know when you have an answer. Easy? Okay, let's say option four. Okay. I'm not sure how far the others are. Do you want extra time? No response.
What I did was complete all of this on the Excel sheet: I captured my x on the x side and my y on the y side, and I have my totals. Let me scroll a little bit. If you look at my totals, my sum of y is 320, my sum of x is 175 and my sum of xy is 10 000, so it matches exactly; the sum of x squared is 575 and the sum of y squared is that. To answer the questions we just require the values here on the side, so I'm going to scroll to the side. I just need that view. Sorry about that; trying to do so many things
at once sometimes doesn't work the way you want it to. Okay, so in order to answer these questions we can look at the output. The first one says the mean of x is 29.17. Is that correct? That is correct, yes: on the sheet it is 29.1667. The mean of y is 53.33; that's correct, it's exactly the same. The slope is 1.4160, and the sheet gives 1.4160. The intercept is 12.0354, and the sheet gives 12.0354. Then it gives the equation, so you need to write the equation y hat equals b0 plus b1 x. Look at what you got for the slope, which is b1, and the intercept, which is b0: if you substitute them into that formula, does it give you that? On the Excel spreadsheet it is easy to see that this is incorrect; it should be like that. So that is the incorrect option. The relationship is strong: you can check that, because our r is 0.98. And if you had used your calculator, which I didn't do, you would have gotten the same answers. Yes, I used a calculator. You used the calculator? Yes, and it is actually faster with the calculator than using the formula. I thought it would take time, but it's faster. Yes, with the calculator it's faster. If they give you a table of values I don't have a problem; if they give you only the summations, then I have a problem, because it is time-consuming to do the calculation using the formulas.
Okay, what will be the coefficient of determination? On the Excel spreadsheet we already calculated that; r squared is 0.959. Is it r squared or r?
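For anyone following along in code rather than on a calculator, the link between r and r squared can be checked with a minimal Python sketch. It is not part of the session: the variable and function names are illustrative, and the only inputs are the values read off the Excel sheet above (r squared of 0.959, intercept 12.0354, slope 1.4160).

```python
import math

# Values read off the spreadsheet output discussed above.
r_squared = 0.959    # coefficient of determination
intercept = 12.0354  # b0
slope = 1.4160       # b1

# The coefficient of correlation is the square root of r squared,
# carrying the same sign as the slope.
r = math.copysign(math.sqrt(r_squared), slope)


def predict(x):
    """Estimated y from the fitted line y_hat = b0 + b1 * x."""
    return intercept + slope * x


print(f"r = {r:.2f}")
print(f"r squared as a percentage = {r_squared * 100:.1f}%")
```

Squaring r on the calculator and taking the square root here are the same relationship read in opposite directions.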
R squared. The coefficient of determination is r squared, so take the coefficient of correlation that you calculated, if you did calculate it, or go and find r on your calculator, and then press the x squared button. On Excel, if we round off the value I think we will get 0.96, because the answer here is 0.959952, so it would be that one. The SST: I don't have the summations for the SST on this one, but I tried to work it out as well. If you are given the x and y values then you can calculate the SST from them, but there are also the summation formulas; we covered them earlier in the session, so you can use those. Is it the one with y minus y bar, all squared? I'm not sure now, but usually the formulas are given to you in the exam, so you don't have to scratch your head to remember which formula to use. You just identify the one that calculates SST and then you use that.
Okay, now take your regression line and estimate the value of y when x is 23. In the previous question we were already estimating, so instead of 10 I can just use 23 and press enter, and the answer is 44.60. That will be the new value. On your calculator, since you still have the values captured, what you do is press 23 and then go and look for y hat; I don't know which calculator you are using. On the Sharp calculator it is written in orange, so you press second function and that key to get the estimated value. If you are using a Casio, you go shift, stat, and look for y hat under the Reg menu, and that will give you the same answer. And if you are using the Excel spreadsheet, remember you can take this formula that you have and just
write it out: you just take the equation and multiply the value of 23 by the slope, then add the intercept. Any questions before we move to the next question? For the next question I'm also going to do it on the calculator this time, and I will do it on Excel as well. Okay, are we done?
Okay, if we're using Excel to answer the questions, let's see if we can get all the answers correct. It says the mean of x is four; that's correct. The mean of y is six; that's correct. b1 is correct, yes. What you can do here as well is increase or decrease the decimals: to get the same answer as what they have, you go to the number format, increase the decimals, and you will get the same number of decimals. Okay, and that's b1. Then it gives the equation. Now you must be very careful about how they wrote this. As you can see, the value is multiplying x, so we need to rewrite the equation as we know it. Whether they write it as y hat equals b1 x plus b0 or as y hat equals b0 plus b1 x, we always need to know that b1 multiplies x: the slope always multiplies the x. They can rearrange the equation, but it will always be the same. This same equation can be written as b1 x plus b0, which is how they wrote this one. So is that correct? We need to go to the intercept. What is our intercept? The value they use as the intercept is our b1, and the value multiplying x is our b0, which is our intercept. Therefore this is an incorrect formula. I can also double check all those values on the calculator. We didn't use this calculator, so I must find the right one. This is the calculator we used: 1.78 and 1.28 are my a and my b. So this is incorrect. Then it says the slope is positive, and my slope is positive. The answer for r is 0.988; on this calculator it is 0.986, which is 0.99 rounded. If I look at Excel as well, do I find the same? What
is the slope? It is the same. Sorry, it actually says the slope, not r: the slope is positive, and we know that the slope is positive in that instance. The next question is to estimate the value of y where x is 8. So we can come here, change our 23 to 8 and press equals, and that is the answer that we get. On the calculator, I have used a different calculator to capture the data again, and I can go and estimate the value at 8, which is 8, shift, stat, 5 for Reg, and then 5 for y hat. I should be getting the same answer, which means that one is the incorrect option.
This one: they gave a lot of values, 11 values. It's going to take us forever. Calculate the coefficient of correlation. It is the same thing: you can capture the data on your calculator, or you can use your formulas, because they gave you the summations. If you are lazy to capture those values you can rely on the formula, which I'm going to rewrite: the sum of xy minus the sum of x times the sum of y divided by n, all divided by the square root of, and I'm going to do them separately, the sum of x squared minus the square of the sum of x divided by n, multiplied by the sum of y squared minus the square of the sum of y divided by n. Then you just substitute the values: the sum of xy, 8 4 2 6, minus the sum of x, 936 4, times the sum of y, 990, divided by n, and there were 11 students; divided by the square root of the sum of x squared, 7 9 8 2, minus the sum of x squared divided by 11, multiplied by the sum of y squared, 89 9 4 0, minus 990 squared divided by 11.
If you don't want to use the formulas, you can use your Excel sheet to capture the data. Here we have seven rows, so I need to add one, two, three, four; I need to add four rows. Since I know how many rows I want to add, I can just come here and select four rows, right click and say insert, and then shift down, to insert four rows. I have captured all of the x values; coming to the
y values: 96, 9, 90, 92, 90 and 91. When you capture the data you need to be very careful as well. Why are my values different to their values? Because the summation formulas didn't continue into the new rows, so you just drag the formulas down and they will populate the whole row. We got our coefficient of correlation, r. Number four; somebody shouted number four. So that question was really quick to answer. It is number four because, on our Excel sheet (you can also do this on your calculator, but I'm relying on Excel), the coefficient of correlation is minus 0.6246, which is the same answer.
In a regression problem, if the coefficient of determination is 0.95, this means that... Remember, r squared is about the variation in y explained by the variation in x; more or less, that is how you interpret r squared. So you have r squared of 0.95: how do you interpret it? Is it one, two, three, four or five? Option two. I think they made a mistake there with the percentage, but it will be option two, because option two says 95 percent of the variation in y is explained by the variation in x.
What is the percentage of the total variation in candy bar sales that is not explained by the regression model? Not explained: that's the key word. Not explained by the regression model means we're looking for the errors. So how many values do we have? One, two, three, four, five, six: only six rows. So it means I must delete the other rows, so that the sheet ends at six. To delete, you start at column B, highlight all of the rows that you don't want, right click, say delete and shift the cells up, so that everything moves up and it doesn't change any of the calculations. We need to read the question, because we need to know which one is
our x and which one is our y in this instance. A candy bar manufacturer is interested in trying to estimate how sales are influenced by the price of their product. How sales are influenced by the price is how x influences y, so price is our x and sales is our y. We capture the data points. Point 30 (this is tricky because of the decimals), 1.80, 2 (I'm not going to type 2.0, just because of time), 2.4, what did I skip, 2.9; then 190, 40, 38, 32. We're looking for the total variation not explained by the model. The part that is explained is my r squared, and the part that is not explained is 1 minus r squared, as a percentage, and the answer is 21.61. So, option number one. The r squared tells you the share of the total variation that is explained by the regression model, remember that. The part that is not explained by the model is the errors, and that is 21.61 percent, so option number one is your correct answer. This other option would have been correct if we were only looking for r squared, but we're not; we're looking for the variation that is not explained by the model.
Using the same information, what is the prediction for candy bar sales given that the price is 1.50? Okay, this option is not right: the prediction of the number of sales, given that the price is 1.50, cannot be a price; it has to be a number. I don't know why they have it as a price there. Anyhow, let's see. Going back: we are estimating at a price of 1.50, and it's 89. So it shouldn't have been 89 as a rand value; it should just say there will be 89 sales, 89.10. There shouldn't be an R in front, because we are estimating the sales, so it
should just be 89.10. Wherever I got this from, I think it was from a tutorial letter; most of the tutorial letters have errors, so we just bear with those. Because it is sales, it should be an actual number: sales are counts, not prices, and here we are predicting the sales given the price. So we've predicted that it's 89.
In the last two minutes that are left, let's finish off. It says calculate the SSE. Do we know that SST is equal to SSR plus SSE? Yes, we do. Lucky for me, I did all of those on Excel, so here are the calculations of SSE and SSR. Let me just double check. No, it's not based on this data, so we need to copy and paste. Sorry about the flickering; I'm working on two different sheets. We need to copy the values of x and y only, not the totals, because the totals will calculate themselves. Copy, paste, and I hope these are the six values. It's working all right: it calculated the SSE, SSR and SST. Let me double check that I have the right formula for SSE. SST is SSR plus SSE, so SSE is SST minus SSR, and I've already calculated it: it's 1062.58. Now there is a discrepancy here, so let's double check that my formula is working correctly. I see it's using the right formulas. If we don't use SST minus SSR, we can use the summation: SSE is the sum of the squared differences between each y and its estimate, y hat. So here it is the actual value, 100, minus the estimated y hat, squared, and that's what we did for all of them; it does the same thing for every row. The answer is the summation of all of them, so it should be 1062.58, which is
none of the answers on here. However, this one is the closest, because even if I use the other route, SST minus SSR, it gives the same value. So I will assume that there was an erratum on this question as well, because these come from a tutorial letter of a previous year, but that would have been the closest one.
With that, we conclude today's session. I will see you on Wednesday, when we continue with the other activities. Nothing stops you from going through activities 16 to 23: 16, 17, 18, 19, 20, 21, 22 and 23. We will go through them on Wednesday, including the chi-square tests. Remember, the first hour will be for chi-square and the second hour for regression. I'm not going to ask as many questions, because for chi-square we also didn't complete all the activities, so we're going to combine both, and I might not create new activities. Any questions or comments? Where can I get these Excel templates for these questions? Which template, sorry? The Excel sheet for this regression. Okay, are you on the WhatsApp group? There is a WhatsApp group. Wait, let me do this; just give me a sec. I'm going to stop recording.
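
For anyone revisiting the recording, the first summation question from this session can be rechecked with a short Python sketch. This is an illustration and was not shown in the session: the function name `regression_from_sums` is made up here, and the inputs are the sums quoted during the session (n = 14, sum of x = 52.9, sum of y = 92.8, sum of xy = 382.85, sum of x squared = 215.41).

```python
def regression_from_sums(n, sum_x, sum_y, sum_xy, sum_x2):
    """Slope and intercept of the least-squares line from summary sums.

    Hypothetical helper for illustration; it follows the sum of squares
    formulas used in the session.
    """
    ss_xy = sum_xy - sum_x * sum_y / n   # SSxy
    ss_x = sum_x2 - sum_x ** 2 / n       # SSx
    b1 = ss_xy / ss_x                    # slope
    b0 = sum_y / n - b1 * sum_x / n      # intercept: mean of y - b1 * mean of x
    return ss_xy, ss_x, b0, b1


# Sums quoted in the session for the n = 14 question.
ss_xy, ss_x, b0, b1 = regression_from_sums(14, 52.9, 92.8, 382.85, 215.41)

# Estimate y at x = 1.5, as in the worked question.
y_hat = b0 + b1 * 1.5

print(f"SSxy = {ss_xy:.4f}, SSx = {ss_x:.4f}")
print(f"b1 = {b1:.5f}, b0 = {b0:.5f}")
print(f"y hat at x = 1.5: {y_hat:.2f}")
```

The slope, intercept and estimate agree with the calculator values from the session to the decimals quoted there; small differences in the last digit come from rounding intermediate results.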