Afternoon, Lizzie.

Afternoon.

How are you doing with load shedding? I have load shedding at 4 o'clock, so I hope I will be able to connect via my modem or something like that. If it kicks me out at 4, just be patient; I will be back if I'm able to connect. Otherwise, I don't know.

Okay, so Assignment 5, which is your last assignment, was opened on the 16th, so you should already have made your first submission to see what it looks like and what type of questions they are asking, and you should at least have made a first attempt to see whether you understand the content. In today's session we're going to do activities relating to study units 10 and 11, and next week will be a question-and-answer session. Then we are done with the content and done with the assignment; we take a break and then we start working on the exam preparations. Okay, are there any questions before we start with today's session?

Are any of the activities we're doing today going to be exercises from our study notes?

Which study notes are you referring to?

The ones that we didn't do in the previous sessions, the ones you said we can go through on our own.

I don't think so. I'm not sure. I've got new activities, which might be different from the ones I have already shared with you. Every Sunday, after every session, I always leave three or four questions for you to play around with, to see if you understand, because for the majority of the time in the session I'm the one doing a whole lot of talking, doing the exercises and showing you how to do them. So when I leave those three to five questions, it's for me to say: go and do them, and if you are struggling, let's have a discussion outside of the online session so that I can help you. But normally you guys don't do that.
And when we do the activities I come with totally new activities, because there are so many permutations of questions that you can get. You must always be mindful that every session we do is to prepare you to know the content and get you ready to write the exam, right? It's not about answering this assignment and getting it over and done with, because your exam might ask you a totally different question, and if you haven't been exposed to it you won't be able to answer it. That is why we do the different types of questions that we can find. So in today's session as well you will have a whole lot of activities that you might not have seen before. Some of them you might already have seen, probably because I want to emphasize one key issue with that question, but we will see. I don't keep track of what I used the previous time and what I'm using now, because I created today's activities today, and I didn't open last week's session to see what we didn't cover. I just think: this is a nice question to use, maybe we can use it this week, right?

Okay, are there any other questions? If there are no questions, remember you need to have your statistical tables ready; if you're going to be using the templates, make sure that you have the templates ready; and get your calculator ready.

You are speaking of templates. Where can we get them?
The templates I've shared with you. If you have joined the Teams group, go to Teams and look under Files; the templates are there. I can see that two people have worked through the template on here. Please download them; don't work through them online, because any changes you make will affect other people who want to use the template. If you are on the myUnisa site, the templates are under Templates, under Additional Resources. Today's notes you will find under the weekly session; it is probably called the study unit 10 and 11 activities. The templates you will find under Templates, and with all the other videos that you see online you will find the summary notes, but for every Saturday and Sunday session the notes are on here. Okay, so these are the two places where you can find the things that I'm talking about.

Okay, all right, thanks.

All right, let's get on to it without wasting time. The first few activities that we're going to do will be on the chi-square test, and after we've done a couple of them we will move on to the regression activities. I expect you to take part and do the activities with me, and then we move on. For the first one you are given a table: consider the following table; which one of the following statements is incorrect? You need to choose the statement that is incorrect. I'm going to give you a couple of minutes. Those who are using the template can go ahead and capture the information in the template. Those who do not rely on the template can go through the options and complete things manually; we can do that together while I'm giving those who are using the template the chance to type the data onto the template,
and you also need to be able to select the correct template by checking what this contingency table is in terms of rows and columns, because that's what you need in order to select the template. The question is asking you to find the incorrect statement. For the expected frequencies, remember that you calculate an expected frequency by taking the row total, multiplying by the column total and dividing by the grand total, which is n. Your observed frequencies are always the values that are given to you. For the degrees of freedom, remember these are the degrees of freedom for a chi-square test, so it's the number of rows minus 1 times the number of columns minus 1. To find the critical value we use the table of critical values of chi-square, using the value of alpha and your degrees of freedom, and they have told you what your alpha value is. The last one is about making a decision. You just need to remember that the chi-square test has an upper-tail region of rejection: for anything that falls on that side of the critical value, you reject the null hypothesis. So once you have your critical value, you check the test statistic. You don't have to calculate it; they are telling you to suppose the test statistic is 18.35. Given the critical value, where does it fall? Does it fall in the rejection area, and how do you then make a decision? That's all.

The template that you need to use is the chi-square test template, and by looking at the rows and columns you need to be able to choose the right contingency table. Is this a three-by-three or a two-by-three contingency table, or is it a two-by-two, or a four-by-two? You need to select the correct one. I will say we already have the data captured, so you can go ahead and answer the question. Let me know when
you have an answer.

I have an answer for number one.

Okay. If you are done with exercise 1, where it asks which one of the following statements is incorrect, tell me which of statements 1, 2, 3, 4 and 5 is the incorrect one; then I will know that you have gone through all five statements.

Five is the incorrect one.

So you are done with the question?

Yes.

Others?

Yes, five is the incorrect answer.

So you are also done. Okay, let's look at each and every statement, since I think two people... how many are we online? Unless someone is still busy. Statement 1 says the expected frequency for buyers above 45 who bought a medium car is 39.6. You take the row total of above 45 and the column total of medium: 120 times 99, divided by 300, equals 39.6. Those who are using the template also get 39.6, because that's where the expected value for above 45 and medium is, which means option 1 is correct. Statement 2: the observed frequency for buyers under 30 who bought a large car is 34, which is correct. Statement 3, the degrees of freedom: how many rows do we have? Three. How many columns? Three as well. Three minus one is two, three minus one is two, and two times two is four, so our degrees of freedom is four. Statement 4, the critical value at an alpha of 0.05: we need to find the critical value in the table for 0.05 and 4. In the table of critical values, go to the upper-tail area and look for where 0.05 and 4 meet. That is the critical value, 9.488, which is the same as in the statement, so all of those are correct. Statement 5: suppose that our test statistic is 18.35; H0 cannot be rejected. We know that our critical value is 9.488 and they say the test statistic is 18.35, therefore it
falls in the rejection area, and the statement is incorrect because it says we cannot reject the null hypothesis. We are rejecting the null hypothesis, so this is the incorrect statement, and that's how you will answer this type of question if they give it to you in the exam or in your assignment.

Yes? I thought there was a question. Somebody's mic is unmuted, I don't know whose.

Let's look at number 2. Which one of the following statements is incorrect about the chi-square test of independence between two variables? We are looking for the incorrect one; that's what the question is asking. Number 1: the test statistic has (r − 1)(c − 1) degrees of freedom, where r is the number of rows and c is the number of columns. Number 2: the observed frequency for each cell is equal to the row total times the column total divided by n, where n is the sample size. Number 3: the two variables are qualitative. Number 4: the null hypothesis is that the two variables are independent of each other. Number 5: if the observed and the expected frequencies for each cell are equal, then the test statistic will be equal to 0. Which one of these five statements is incorrect? Is it 1, 2, 3, 4 or 5?

I think 5 is.

You think 5? Somebody's phone is ringing. Is this correct? Remember what statement 5 says: the observed and the expected frequencies are equal. Say this was a 2-by-2 table, with row 1, row 2, column 1, column 2, and the observed values were 0, 0, 0, 0, and the expected frequencies were also equal to 0, so I'm going to write my expected frequencies here as 0. You would say 0 minus 0, squared, divided by 0, plus 0 minus 0, squared, divided by 0, and so on for all four cells, and the answer will be equal to 0. So it is correct, because they say they are all equal. I'm just making an example, right?

I think number 2 is incorrect because it's meant to be divided by the total, not the size, the
sample size.

No, no, the formula is correct. The only thing that is not correct is the word "observed". The observed values are the values that you are given. Something that is not given, you need to calculate, using your row total times your column total divided by n, which is your sample size, your grand total. What is that? That is your expected frequency. Hence number 2, you had it right: it is the incorrect one. It was supposed to say that the expected frequency of each cell is equal to that.

Okay, and also, I've just used a lame example with the zeroes. Let's not use zeroes, because then it looks as if I'm saying the values should be equal to zero. As another example for statement 5, in case you get something like this in the exam: say the observed values were 1, 2, 3 and 4, and the expected values, the ones at the bottom, were also 1, 2, 3 and 4, because they say the expected and the observed frequencies are equal to one another. We're still going to say 1 minus 1, squared, divided by 1, plus 2 minus 2, squared, divided by 2, plus 3 minus 3, squared, divided by 3, plus 4 minus 4, squared, divided by 4. One minus one is zero, and zero divided by any number is zero; two minus two is zero, and zero divided by any number stays zero; three minus three is zero; and so on, so the answer here will be equal to zero. So if your expected values and your observed values are the same, your test statistic will be equal to zero, and that statement is correct. Number 3 is correct: we're using qualitative data. Number 4: for the null hypothesis we always state independence. And number 1 we dealt with in the previous question: your degrees of freedom is the number of rows minus one times the number of columns minus one.

Moving on, unless you have another question. Let's move to the next question. Many companies use
well-known celebrities as spokespersons in their TV advertisements. A study was conducted to determine whether the brand awareness of female TV viewers and the gender of the spokesperson are independent. Each viewer in a sample of 300 TV viewers was asked to identify a product advertised by a celebrity spokesperson. The gender of the spokesperson and whether or not the viewer could identify the product were recorded. The numbers in each category are given below, for the male celebrity and the female celebrity, and for identified the product and could not identify it. Referring to the table, at the 5% level of significance, the critical value of the test statistic is...? Here they just want you to find the critical value. You need to find your degrees of freedom, which is the number of rows minus one times the number of columns minus one. How many rows and how many columns? You also need to identify your alpha value. There are two columns and two rows, so what is your degrees of freedom? Two minus one is one, two minus one is one, and one times one is one. So we look up the critical value for 0.05, which is our alpha value, and 1 degree of freedom. Go to the table: what is the critical value for 1 and 0.05? It's 3.841. The table that they used had four decimals rather than three, but number 1 is the correct answer.

Next: in a contingency table with nine rows and two columns, how many degrees of freedom are there for the chi-square test?

There are eight degrees of freedom.

Eight degrees of freedom, because we are told that there are nine rows and two columns. Nine minus one is eight, two minus one is one, and eight times one is eight, which is option C.

Next: to perform a chi-square test of independence you require... what do you require?

The degrees of freedom?

You might be right. What else do you require?

A test of contingency table?

Yes. All the statements, they
are... and the level of significance, yes. A test of independence is a test on a contingency table, because it's a cross-tabulation test. You need the level of significance, and for making a decision we use the degrees of freedom. The distribution of the chi-square... probably this question wanted to ask what is not required to do a test for independence, because the chi-square distribution is positively skewed, right? You can see from the distribution that it is positively skewed. So something is wrong with this question, because statements 1, 3, 4 and 5 are all related to how you perform your test; only number 2 is not. I'm going to change the question to say: to perform a chi-square test you do not require the distribution to be negatively skewed. That's the only one that is not relevant. For all the other statements: we use two or more nominal-scale variables, because if I have two variables such as gender and type of car, both of them are nominal; you need the degrees of freedom and the level of significance to find the critical value, to make a decision or to set up a region of rejection; and this is a test on a contingency table, so you also need to be able to put your data into a contingency table in order to calculate your expected values. So the only thing that is not correct here is number 2.

Let's go to number 6. To test whether absence of workers from their jobs occurs at a higher rate on rainy days than on non-rainy days, a company took a sample of 400 days, and the results are as follows. They recorded the weather and the absenteeism, and there were 117 employees or workers... A statistician wants to test the independence, or to infer whether the incidence is higher on rainy days. So this is a contingency table test, and you should know how to state the null hypothesis and the
alternative hypothesis, but that is step one. You need to know how to find the critical value, how to calculate the expected values and how to calculate the test statistic, although here they didn't ask you about the test statistic. Those who are using the template, choose the right template. What type of contingency table is this?

It's a 3 by 2.

It's a 2-by-2 table. You have rainy and non-rainy, and yes and no; don't count the totals. So it's a 2-by-2 contingency table, and those who are using the template just need to fill in the observed values 5, 10, 55 and 100, change the names to rainy and non-rainy, yes and no, and then read off the answers. There is your expected value and there is your test statistic, but the question was not asking about the test statistic. Those who are calculating manually, you are with me: the degrees of freedom is r minus 1 times c minus 1. What is the observed value, and what is the expected frequency? You find the expected frequency by taking the row total times the column total and dividing by the grand total, which is n. So you take the row total of rainy, which is 60, and the column total of absence yes, which is 15, so it's 60 times 15 divided by 170. For the critical value you just need to find your chi-square critical value using your alpha and the degrees of freedom that you calculated. You also need to know how to state your null hypothesis correctly. We are looking for the correct statement. Are we winning?
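The manual steps for this rainy-day table can be sketched in a few lines of Python. This is only an illustration and not part of the session's template: it assumes the observed counts are laid out with rows rainy and non-rainy and columns absent yes and absent no, which matches the row total of 60, the column total of 15 and the grand total of 170 quoted in the session. The variable names are made up for the example.

```python
# Observed counts from the rainy-day exercise.
# Assumed layout: rows = rainy, non-rainy; columns = absent yes, absent no.
observed = [
    [5, 55],
    [10, 100],
]

row_totals = [sum(row) for row in observed]        # [60, 110]
col_totals = [sum(col) for col in zip(*observed)]  # [15, 155]
n = sum(row_totals)                                # 170

# Expected frequency of a cell = row total * column total / grand total
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Degrees of freedom = (rows - 1) * (columns - 1)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Chi-square test statistic = sum over cells of (observed - expected)^2 / expected
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

print(round(expected[0][0], 2))  # 5.29, the rainy-and-absent cell
print(df)                        # 1
print(round(chi_square, 4))
```

The critical value still has to be read from the chi-square table for the given alpha and these degrees of freedom; the null hypothesis is rejected only when the test statistic is larger than that table value.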
Yes.

Let's deal with what we can see in front of us, and then we will go to the table later on. We are looking for the correct statement. First, the degrees of freedom: how many rows and columns? We already established that there are two rows and two columns, therefore the degrees of freedom is one, so that statement is incorrect. Next, what is the observed value for non-rainy and no absence? It's 100, not 100.3, so that is incorrect. What is the expected frequency of rainy and absence? It's the row total, 60, multiplied by the column total, 15, divided by 170. What do you get? 60 times 15 divided by 170 is about 5.3, which is not equal to 5. And number 5: how do we state the null hypothesis, do we say independent or dependent? Independent; the null hypothesis always states that the variables are independent, so that statement is incorrect. Then the critical value: they told us alpha is 0.01, and the degrees of freedom is 1, so let's check the table where 1 and 0.01 meet. Our critical value is 6.635, or 6.63. That is the correct statement, the one we are looking for.

Can you explain number 2 again for me? I think I got lost on number 2.

At number 2 they talk about the observed value. The observed value is the value you see in the table.

Oh, is it the 100?

It's 100, yes. They told you that the observed value is 100.3, but we know that it's just 100. The same applies to the expected value: the answer for the expected value is 5.29, or 5.2941... It's not 5, it's bigger than 5. 5 is the observed value that you see in the table.

Let's look at one more question. But this is the same as the one... no, it's not the same... it is the same. Did we answer this type of question before? Oh, we were looking at the degrees of freedom and the critical value, yes. So we can look at this; it's almost the same as the previous one. I'm not going to calculate the degrees of freedom again, we did establish that, so you just need to go and look
for 0.01 and 8 degrees of freedom. Let's go to the table and ignore all the other values: for 0.01 and 8 the answer is 20.09. That's how you find the critical values.

Okay, let me see. I have additional questions that you can go and do on your own. This is one of them; I think it is similar to what we did in class, we probably did touch on something like this. There are no values here, so if you are using the template you need to make sure that you populate the values first and then answer the question.

Here is another question. On this one, pay attention: the numbers in the brackets, they told you what they are; they are your expected values, and if you are using the template you should get the same numbers. You are calculating a chi-square test statistic, which is very long to do by hand. This is a 2-by-3 table, so you go to the 2-by-3 contingency table template, change the labels to P and Q and to A, B and C, and capture 55, 40 and 79 in the first row of the table and then 34, 56 and 100 in the second row. All they are asking for is the last part, the test statistic; the value you find here should be your answer. If you are going to calculate manually, they are giving you the expected values, so they don't expect you to calculate them; you just substitute into the formula. We know that the formula is the observed minus the expected, squared, divided by the expected. So you take 55 minus 42.54, square the answer and divide by 42.54; plus 40 minus 45.89, squared, divided by 45.89; plus 79 minus 85.57, squared, divided by 85.57; and then we go to the bottom row: plus 34 minus 46.46, squared, divided by 46.46; plus 56 minus 50.11, squared, divided by 50.11; plus 100 minus 93.43, squared, divided by 93.43. Once you have worked out all of them and added them together, you have the test statistic. That's all they want you to do.

This is another one you can go through. I think we did one of these in class; I'm not sure, it
might be; some of them look familiar. Here you just answer the question. Calculate the totals; in the template you just capture these values and the totals will be calculated for you. You just need to make sure that you know how to state the null hypothesis and the alternative, how to calculate the expected values and how to calculate the degrees of freedom.

In the next one they are asking you for the test statistic and the critical value. This is a three-by-two table, and you just find the critical value. Once you have your critical value and your test statistic you can make your decision: you check where the test statistic that you calculated falls, whatever value it is, whether on this side or that side of the critical value that you selected from the table, and then you decide whether you reject or do not reject. The last option is automatically excluded, because we don't state a decision by saying "accept" or anything like that.

On this one as well, you just need to find the critical value and select the matching option. Or you can look at the critical values and ask: when the level of significance decreases, does the critical value decrease or increase? That is something you need to figure out there. And you need to be able to calculate the test statistic and make a decision, which is the same as before.

Let's move to regression. Remember, if my line cuts off, wait for me; I might be able to reconnect. If I'm unable to connect because of low connectivity on my side, I will send a WhatsApp. Other than that, let's continue and look at regression and
correlation exercises. You also have a template that you can use. If I open the template, it comes with some information on how to use it correctly without affecting the calculations, because most of these calculations are automated.

So let's look at the exercises. The first exercise: a sample of 8 observations of variables X and Y is shown below, with the values of X and the values of Y, and they gave us the summation values; we can use them or not. Which of the following statements is incorrect? Number 1: the coefficient of correlation is −0.99. Number 2: the coefficient of determination is positive. Number 3: the best-fit line is ŷ = −2.119 + 30.155x. Number 4: there is a strong negative relationship. Number 5: the results in connection with the above variables are reliable.

First of all, the equation is ŷ = b0 + b1x, so we need to use this information to calculate the slope and the intercept. We need to find out whether r is equal to that value and whether r squared is positive; some of these are straightforward. If we are using the template, go to the template. Let me make it smaller so that I can capture the data. There are 1, 2, 3, 4, 5, 6, 7, 8 observations, and the template has two additional entries, so I go to B, highlight everything that I don't need up to y squared, and delete. I clean it up and everything else is intact; I just keep the values. I'm going to start by putting in my x values: 5, go down, 3, go down, 7, go down, 9, then 2, 4, 6 and 8. Then I go to the next column, which is my y: 20, 23, 15, 11, 27, 21, 17 and 14. You can only get this right if you practice, because if you don't practice you're going to struggle when you answer the question in the exam or in your assignment. I'm just going to minimize it so I can go to the left and double-check the values. They say the sum of x squared is 284, and I have the same. They say the sum of y squared is 2930; in the template the sum
of y squared is 2930 as well, and the sum of x times y is 725. There. Looking at the statements, let's start with number 1; let me make it bigger. Number 1 says the coefficient of correlation is −0.99. There is my coefficient of correlation; I can show it with fewer decimals so that it reflects the same, and it's the same: r is −0.99, so that is correct. Your r, which is the coefficient of correlation, is correct. Your coefficient of determination, which is r squared, will be positive; if you look at the answer as well, to two decimals it's 0.98, which is positive. If you take −0.99 and multiply it by itself you get about 0.98, which is positive. So number 2 is also correct. Number 3 gives the best-fit line. I just need to remove this plus sign here: the template gives 30.155 − 2.119x, so it means this statement is incorrect. The only incorrect statement here is number 3, because our slope b1 is −2.119, and in the statement it's written as the intercept. Remember, the slope is always the value that multiplies the x, right? So that is the incorrect one. Number 4: is there a strong negative relationship? Yes, because your r is −0.99. And number 5: based on that, we can say the results are reliable. I think we did do this question in class; I'm not sure, it sounds familiar.

Anyway, the next question: which one of the following statements is incorrect about some of the concepts of linear regression? I just want to make sure that my modem is on, so that the connection is all right when it switches over. Okay. Number 1: a correlation coefficient of 0.1 indicates a weak positive relationship between two variables. Number 2: the least squares method estimates the regression line, or regression equation, by minimizing the sum of squared errors, SSE. Number 3: the coefficient of correlation always takes a value between 0
and 1. Number 4: the coefficient of determination can be interpreted as the percentage of the total sum of squares that can be explained using the estimated regression equation. Remember, the coefficient of determination, r squared, can be calculated as SSR divided by SST. Number 5: the slope and the intercept of the estimated regression equation are... Let me know if I'm back.

Yes, you're back.

I was busy... you must let me know if I cut off. The signal on my phone is two bars, so I'm connected, but it might be a low connection. Okay, so number 5: the slope and the intercept of the estimated regression equation are estimated using the least squares method. Which one of these statements is incorrect? Let's go through each statement. Number 1: is that statement correct? Think of strength and direction, if you still remember. In terms of the direction we refer to negative or positive, so we look at whether the value is negative or positive. In terms of the strength we look at the number: if it's 1 it's perfect; if it's between 0.99 and 0.79 we say it's strong; if it is between 0.39 and 0...
if it's between 0.79 and 0.5 we say it is... or is it 0.39? ... we say it is moderate. Oh yes, between 0.79 and 0.5 we say it is moderate; when it is between 0.39 and 0 we say it is weak; and when it's 0 we say there is no relationship. So based on that, is number 1 correct? Yes, because 0.1 refers to a weak relationship between the two variables. Number 2: the least squares method is used to minimize the errors. Remember, we also say that your total variation is your regression plus the errors that you cannot explain. So the equation behind the least squares regression line is actually y equals b0 plus b1 x plus error, and by using this equation we try to minimize those unexplained errors. Then number 3 is about the coefficient of correlation, and we just spoke about it: we know that the coefficient of correlation is between −1 and 1; only the coefficient of determination is between 0 and 1, which makes this an incorrect statement. Number 4 is correct, because I gave you the answer to that. And number 5, the slope and the intercept: we use the least squares method to estimate them; remember the sum-of-squares summations, the values estimated from them are also called the least squares estimates. Okay, so the only statement that is incorrect is option 3.

Let's see if you know how to read a scatter plot when given the data. Consider the sample data below and develop a scatter plot. Which one of these scatter plots, from A to F, describes the data that we have? The first thing you can do is a process of elimination based on the x-axis and the y-axis. For example, if I look at scatter plot A: A starts at 20 on the x-axis and ends at 35. If I go to my x values, they start at 18 and end at 40, so there are limitations on this x-axis already, and I can do a process of elimination from there. That is how you can start off. The other approach is picking one or two points.
So let's say, for example, I come to this scatter plot. I need to go to the point where x is 34, which should be somewhere here, and I need my y value to be 15. If you look at the y-axis on this plot, it starts at 30, therefore it cannot be that one. That is one option that I'm giving you. Now you tell me which one of these options represents the data that we have. Is it C, F, E or D? Do you also agree that it's D? Yes.
We can look at the other ones. Let's look at C. C has a point with an x of 50, and I can't see an x of 50 in the data, therefore C is eliminated. Then F: all the dots end at 35. I know that I've got a point that should correspond with 40 on the x-axis, and there is nothing there, so that one can be eliminated. E: there is a point at 40, right, but my minimum starts at 18. If this is 20, then 18 should be somewhere just before it, and the points here look like they are on 15, and there is no 15 in the data. Therefore, process of elimination on that one as well. If you look at this one, there is 40, the minimum was 18, and there is 30, therefore it matches, so D is the correct one.
So that's how you will find your scatter plot by using the method of elimination. You can also look at each point and try to map each one of them: pick one or two points, the first two points, and see if, when you map them, you are able to identify which graph they correspond with.
Okay. Lindsey, is it possible to go to the previous question? Not this one, the one before it, on the correlation. The first point. I think I read in the textbook that if r is close to 0 it indicates little linear relationship, and if it ranges from 0.3 to 0.5 it's a weak linear relationship, so I got a little bit confused with this one. But this is 0.1, it's not 0, right? Okay, I also have the same confusion, because it's still close to 0, and
under weak linear relationship it starts from 0.3 to 0.5. Yeah, close to 0.
Yes, that's the other thing to be mindful of when you look at this, because different tables and different textbooks explain the relationship differently. When it's equal to 0, we say there is no relationship. If it's anything between 0 and 0.39, we say it's a weak relationship. When we say between 0 and 0.39, we're referring to anything greater than 0 and less than or equal to 0.39, something like that. So there is no such thing as close to 0 here. Close to 0 can be something like 0.05, 0.01 or 0.08; those are close to 0, right? Sometimes it comes down to how you define that. But if you go through each and every one of the statements here, you will see that your lecturer intentionally says that 0.1 is a weak relationship. There is still some sort of linear relationship there, but it's very, very weak.
Take this 0.1 and square it. Because I lost my connectivity I can't use my calculator, but what is 0.1 squared? It is 0.01. So if I take this r and convert it to r squared, it will be 0.01, which tells me that only 0.01, that is 1%, of the total variation in the dependent variable y is attributed to the x variable. You can see that it is so small that we could even just say there is no influence between the two, the x and the y: the total variation in y is hardly attributed to the variation in x. Some books might refer to this as no relationship in terms of your r, your coefficient of correlation, as compared to when it is 0.5 and things like that. So you just need to pay
attention to the statements given and make your assumptions based on that. Alright?
Okay, so let's look at the next question. You are tasked with investigating the relationship between the reading speed (y) and the age (x) of primary and high school learners using a simple linear regression. Which one of the following statements about the investigation is incorrect? Number one: the age is dependent. Number two: the estimated regression equation will take the form reading speed equals b1 times age plus b0, where b1 is the slope and b0 is the intercept. Number three: the reading speed is a quantitative discrete variable. Number four: if the correlation coefficient is negative, there is a negative relationship between reading speed and age. Which one of the statements is incorrect: one, two, three or four?
Let's hear. Is number one correct or incorrect? Do you know what an independent variable and a dependent variable are in terms of your linear regression? What do we put on the independent side? We put the x variable. What do we put on the dependent side? We put the y variable. Your independent variable is your input variable; it is the variable that you use to predict what your output, your estimate, will be. So is number one correct or incorrect? It's incorrect. Incorrect.
Number one is incorrect. For number two, we can always estimate the reading speed, which is our y, from our slope and our intercept, and they wrote it correctly there, right? They could have written it this way as well, b0 plus b1 times age, because age is our x; they could have written it the other way round. They didn't say it before, but reading speed is a quantitative variable, so if they say it's a discrete variable we can just accept that it's a discrete quantitative variable. And if the correlation coefficient is negative, then for sure the direction will also state that it is a negative linear relationship between
the variables. Some of the questions are just straightforward, but some might look tricky. You just need to remember all these other things that you have learnt as well.
We have eight minutes, so we can look at more questions. I'm not going to go through this one with the calculator, because I think we used it as a practice with our calculator. Let's use the template to answer this question. In terms of the template, I need to go and clean up. I know that we had eight rows here. How many do we have now? One, two, three, four, five; we have five. So, in order for the template to read only five, I need to delete all the other rows. You highlight, from B up to y squared, the values that you want to take out, and you delete and shift up. Then we can just capture the values. For x we have 4, 2, 6, 4 and 3. I don't have a calculator online now, so unfortunately I cannot demonstrate using a calculator, but you can watch the previous videos, because we used the same question in a previous video to demonstrate how to use your calculator. For y we have 5, 3, 7, 6 and 5. Let's just double-check that I have the values correctly: 4 and 5, 2 and 3, 6 and 7, 4 and 6, and 3 and 5.
Now we can go and get the other values. We are looking for which one of the following statements is incorrect. If I go to the model that they gave us: because the slope is positive, I just need to put a plus right here in the middle. Sorry. So it is the same as that: y hat is equal to 1.66 plus 0.93 x. It's two decimals there, so we can also reduce ours to two decimals: 1.66 plus 0.93 x, which is the same as what we have. So it means our data is correct and we can answer any of these questions. Which one of the following statements is incorrect? Before we answer that question, I also want to advise you to write the equation down, so that if in any way they refer to
things in the options, you will know what they are talking about: the intercept and the slope.
Okay, so let's answer the question. Which one of the following statements is incorrect? Number one: x is dependent. Is that correct or incorrect? It's correct. Ah, we dealt with this just now! No, no, no. Number one is incorrect, because it says x is dependent. From today you will always remember this: x is always independent, and y is always dependent. So number one is incorrect, and as long as x is labelled incorrectly, it means y will also be labelled incorrectly.
For x equal to 8, the estimated value is 9.1. All you need to do is take your calculator and substitute the value of 8 into the formula: 1.66 plus 0.93 times 8. Do you get 9.1? Using the formula from the template, it is equal to 9.11. Yes, true, 9.1, so that is correct.
And here it says the slope is 1.66. If I go to the template, what is my slope? My slope is 0.93. Also, if I look at this formula, even though you can't see the entire Excel sheet now since I've minimized it, you should be able to see that the slope is 0.93, yet they say it's 1.66. You didn't even have to use the template, because all the information you need is here in the equation. So that one is incorrect. The intercept statement will therefore also be incorrect, because they swapped the values around. And that's how you will answer some of these questions.
Let's look at the next question. It looks almost exactly the same, but here they didn't give you the equation; they gave you the sum of squares measures. You can use the sums to calculate the slope of the regression, because the slope is given by the sum of xy minus the sum of x times the sum of y divided by n, all divided by the sum of x squared minus the square of the sum of x divided by n; that is, b1 = (Σxy − (Σx)(Σy)/n) / (Σx² − (Σx)²/n). You can just take these values and substitute them into this formula. The intercept is given by
your mean of y minus b1 times the mean of x; that will give you your intercept. So after you have calculated the slope, you can substitute the means: take 122 divided by how many values there are, minus the answer for the slope times 39 divided by how many there are. Then, for the regression line, you just need to make sure that you substitute the values of your intercept and your slope correctly, because this is b0 and this is b1 x. The answers you get from number one and number two should substitute correctly into that. As you can see here, they say this value is the slope, but in the formula it's written in the place of the intercept.
The coefficient of correlation you can also calculate by using the summation formulas. It is your sum of xy minus the sum of x times the sum of y divided by n (sometimes they don't divide by n there; they multiply by n instead), divided by the square root of something like the sum of x squared minus the square of the sum of x over n, times the same thing for y. I can't even remember the formula exactly, and I don't want to give you the wrong formula, but it's the summation formula. Otherwise you can use your template; those who are using a template would have been done a long time ago.
I substitute in the values. For x we have 6, 8, 9 and 12. I didn't capture them correctly; 6, 8, 9 and 12. And then for the y values we have 12, 16, 25, 30 and 39. And already you have the answers that you are looking for, so you can answer the question. Your intercept, which is b0, is minus 3.24; if I look at the option, it says it's 3.5, which is incorrect. Your slope is 3.54; the option says the slope is minus 3.21, which is incorrect. We are looking for the correct one. The regression line: I can see that they have written these values the other way round, so my equation should look like this; it should say y hat
is equal to minus 3.24 (here it is .24 and they have .21, but it's more or less the same thing) plus 3.54 x. So this equation might be correct, because they would have calculated it manually, not using the template as we have.
The coefficient of correlation is 0.99, and you can see the answer 0.99 on the template, which is that. Then it gives the coefficient of determination. We can double-check that on the template, which says it's 0.98, so you can see that the option is not correct; even if you take your calculator and square 0.99 you won't find 0.9950. So that is incorrect.
And it says that if you substitute x equal to 10 into this formula you will get 35. So let's do that: substitute 10, and the answer is 32, which is not 35, so that is not correct. Even if you take the equation by hand, minus 3.24 plus 3.54 times 10: 3.54 times 10 is 35.4, and subtracting 3.24 should give you 32 point something. It cannot be 35; it would only be 35.4 if the intercept was 0. So that will be incorrect. The only correct statement there would have been option number 3.
Okay, some of the questions that you might get in the exam or in your assignment look like this, so you need to know how to do the summations. Remember, the summation of the x i's means you are adding all these values; it's your total, right? This will give you the summation of the x i's, and that will give you the summation of the y i's. If you want to calculate the summation of x i minus the mean, squared, let me go back to the template. On the template we do have some calculations like this, but these are not the calculations that you can use to answer that question. Let's say, for example, you want to answer this question with the y values; it works the same way. Here I have the summation of y i minus the mean of y, so you will need to be able to go and calculate your mean of y, the same way as you
would be able to calculate the mean of x. Let's assume you would have calculated the mean of x here on your data, and then you substitute all these values with your x and y values. Actually, I'm lying; the template has already calculated them. These are the values that I'm talking about. Sorry, my bad, I forgot about them. So let's go back here.
This column is x minus the mean of x, times y minus the mean of y, and its total will give you the answer for that summation. The sum of x i minus the mean, squared, is this second column, on E: it is your x minus the mean of x, squared. You can see the sum of it there, 36, which would give you the answer for that.
Or let me do it for you, and then you can go and practice. I need to know how many values there are: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. There are 10, and at the moment I have 5, so I need to add 5 more. On B, I'm going to add 3, insert and shift down, and then I must just add 2 more rows, insert and shift down. Remember, I also need to take all the calculations that were done previously and take them down as well.
Let's capture the data, starting with the x values first: 10, 15, 9, 10, 11, 11, 13, 12, 12 and 10. And we'll capture the y values: 178, 274, 155, 207, 208 (okay, I've got enough battery), 184, 224, 196, 246 and 168. I hope I'm capturing everything correctly. Let's double-check our values: 10 and 178, 15 and 274, 9 and 155, 10 and 207, 11 and 208, 11 and 184, 13 and 224, 12 and 196, 12 and 246, and 10 and 168. There we go, I have all the values, and I can minimize this. I'm not looking for the sum of x, just for these squares. Because these are summations, remember that summations are totals.
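As a cross-check outside the template, the same summations can be computed directly. The following is only a minimal sketch in Python, assuming the ten (x, y) pairs are the ones read back in the double-check above; the function name `least_squares_summary` and the dictionary keys are illustrative choices, not part of the template.

```python
# (x, y) pairs as read back during the double-check.
# Assumption: the pairing is reconstructed from the spoken values.
pairs = [
    (10, 178), (15, 274), (9, 155), (10, 207), (11, 208),
    (11, 184), (13, 224), (12, 196), (12, 246), (10, 168),
]


def least_squares_summary(data):
    """Return the summations and least squares estimates for (x, y) pairs."""
    n = len(data)
    sum_x = sum(x for x, _ in data)
    sum_y = sum(y for _, y in data)
    mean_x = sum_x / n
    mean_y = sum_y / n
    # Sum of squared deviations of x from its mean.
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    # Sum of cross-products of the deviations of x and y.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    b1 = sxy / sxx               # slope
    b0 = mean_y - b1 * mean_x    # intercept
    return {"sum_x": sum_x, "sxx": sxx, "sxy": sxy, "b1": b1, "b0": b0}


summary = least_squares_summary(pairs)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

With these pairs the sketch gives a sum of x of 113, a sum of squared x deviations of about 28.1, a cross-product sum of about 511, a slope of about 18.19 and an intercept of about minus 1.491, so each option in the question can be compared against a computed value.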
Let's see the first one. It says the summation of the x values; we know that the sum of all the x values is 113. Number 2 is the summation of your x observation minus the mean, squared, which is that column, x observation minus the mean; as you can see there, I'm squaring it. That total is the answer for the summation, which is correct. The next one is the summation of your x minus the mean of x times your y minus the mean of y, which is that part; it says it should be 511, and that is correct. Number 4 says b0 is 6.91. We can go to b0 (let me remove the ink): b0 is minus 1.491, so that is not correct. And b1 is 18.19, which is correct; if I round the decimals, it's the same. So the only incorrect answer here is number 4. Now, answering the question of which one of the calculated quantities from 1 to 5 is incorrect: it's only number 4, which is option 3. And that's how you will answer some of these questions.
I know that we are 50 minutes over time. You should be able to interpret your r and to calculate b1, and you should be able to make your estimates: they gave you that the age is 15, so your x is 15; substitute it and find out what the value of y is. You should be able to interpret your r squared as well, and you should be able to interpret the slope, which is this one, and here it is negative, so you should be able to state it in relation to the values. Which one is the incorrect one or the correct one? How do you interpret the slope? You should also be able to interpret the coefficient of correlation and the coefficient of determination. And you should be able to take the equation, substitute where you see x, and see if that is the correct value, because they say that when age is 17 the estimated answer is 298. You should be able to do that. You should be able to look at this and make your conclusion about the correlation, and remember, you'll be looking for the
incorrect value. You should be able to interpret the slope, which is that value. What does it mean if it's positive? Remember: negative means decrease, positive means increase. You should be able to interpret the value of your r squared. The same thing applies on this one, and on this one as well: you are given the score of 15, and you just need to substitute it into the equation. So they are almost similar questions asked in different ways, right? I can't even include this one; here you need to calculate the coefficient of correlation: you can use the sum of squares formulas to calculate r, or you can use your template; your r value on the template is just that value that you will see there. So there are many other practice activities that you can follow before you write your assignment and in preparation for your exam. Are there any questions or queries before we exit?
Just for exam purposes, and I'm not sure if you've answered this question already: will they allow us to use these templates that you provided for chi-square and regression?
You should be able to use them, depending on the type of system you're using for proctoring. If the proctoring system doesn't allow you to move from one view to another, then you can't use the template and you have to use the formulas. But if it does, you can continue and use the template, because if you are able to toggle between two screens, or minimize the platform where you are writing, you can use them. If you are using your phone, you must be very careful as well. Remember when they say take a picture: I don't know whether you take the picture of yourself or of the screen where you are, or whether your proctoring system automatically takes pictures of whatever you are using. You just need to make sure that it doesn't look as if you are cheating the system, right? Because if you open a template which is not part of your exam, and your exam is not an open book exam, then it will reflect as if
you are cheating, okay? And you don't want those kinds of things. But closer to the exam we can discuss how you are going to take it, and by then I will also have found out more from your lecturer about the type of exam you are writing, what is allowed and what is not allowed, because I cannot make that decision myself. We get direction from your lecturer on what we need to relay to you. Otherwise there will be a tutorial letter for the exam which explains everything you need to be aware of: what is allowed, whether they will give you tables and formulas, and all that. When you were writing a venue-based exam, the question paper came with certain tables and certain formulas, and we had to make you aware of which formulas you needed to remember and memorize. Whereas now you might not be given formulas, because you are allowed to use whatever formulas you have at your own disposal. Those kinds of things we can discuss closer to the time, when we do the exam preparations. We can always advise you on things to look out for and things to be aware of.
Okay, thank you.
Alright then, enjoy the rest of your day, and I will see you on Sunday. At least do your first attempt, and then on Sunday we can look at whatever concept you are still struggling with, so that you can submit your final assignment. Remember that your assignment 5 is also very important: you need to perform well on it so that you get a good year mark. And those who have not submitted at least 4 assignments, try to make sure that you don't skip this one, so that you also get a good exam entrance mark, because if you get less than 40 they might not allow you to write the exam. So it's very, very important to submit all your assignments and make sure that you get more than 40% for your year mark to be
considered. The other thing you must also remember is that in your exam, if you get a year mark of... I'm not sure, I can't even remember now. When we do the exam preparation we will look at that: we will talk about what you need to be aware of, when your exam mark will be used and when it will not be used, things like that. It's very, very important to have a very good year mark so that you can pass with a very good final mark for your stats as well. But anyway, enjoy the rest of your day, and happy learning. Bye.