Now I get an answer. Okay. All right, so we are recording, and again, if you have any problem with it, if it's not working for you, let me know. I don't think I have to worry about anybody not wanting to appear on camera, because I'm the only one. All right, what do we have here? Annotate. The null hypothesis. If anybody wants to speak up, now is when you can do that; no one will know who you are on the recording, but as usual I'm not pressing the point anyway. So, let's call group one married men and group two single men. We can say mu one is equal to mu two; that's the null hypothesis. Mu one is not equal to mu two; that's the alternate hypothesis. That's really all you need. You don't need anything finer than that. We're working with a Z. What's alpha? Alpha is 0.10. And as we're always going to be doing for this test, it'll always be a two-tail test: 0.05 here, 0.05 here. You look up the critical values in the Z table; somebody tell me in the chat what you get. And then we're going to get the calculated value of Z. If someone's asking me to repeat, I'm not sure what. Can anyone tell me what the critical value is for this test? Plus and minus 1.645. Yes, absolutely, that's correct. So it's 1.645 on this side and negative 1.645 on that side. And in order to know what Z to use, I need to find the right formula, right? So let's go to the formula sheet. We want something that has two X bars in it. So here we are: Z is equal to X bar one minus X bar two... Well, actually, we're going to be using S1 and S2 as point estimators for sigma one and sigma two. So we've got the formula right there. And what did you get? What's the numerator? 8.7 minus 7.9. Well, let's go back to the problem. Okay. All right, so Z is equal to the difference between the two X bars, divided by the square root of... let me take a look at the formula again.
And this is why you really should have the formula sheet printed out, so that you don't have to keep going back and forth. S1 squared over n1 plus S2 squared over n2, and that's under a square root. Now, if you've done the computation, can somebody tell me what you got for the calculated value of Z? I see someone got 4.2; people got 4.13. Good, I'll take it. So the calculated value of Z ends up being 4.13. It's definitely past 1.645; it's in the region of rejection. And so the conclusion is reject H0. All right. Let's clear and go to another one. One thing you can see, if you didn't see it already, is that these two-sample tests are really, really easy; you just need to do a lot of them. Once you've gotten one, they're all basically the same or very, very similar. Let's take a look at this one under the two-sample t. Okay, this one's looking at the differences between men and women. We have an average wage for men of $18.95, I guess that's an hour, and for women $15.09. And the question is: are they different just because any two samples will be different, or are the population means really different? I mean, anytime somebody wants to take a company to court, they have to do a test like this. But this is going to be a t. Why is it a t? Well, first of all, we don't know sigma, right? And the sample size is very small. So when there are two samples, you want to make sure you are at most at 32 in order to be able to use the t distribution. Otherwise there is really nothing you can do in this class. Thank you. Yes, that's absolutely correct, the people who answered in the chat. We know that we're using t and we know why we're using t. Okay, now what are the null and alternate hypotheses? It's going to look very familiar. Mu one is equal to mu two. Ha, it's exactly the same, right?
Okay, the reason this is the same is that we have scaled down the complexity a lot. We're not doing one-tail tests. We're looking at the differences between two means. You don't have to give a particular value. You'd say, well, I'm hypothesizing that mu one minus mu two is zero, which is the same thing as saying mu one is equal to mu two. So this looks exactly the same as what happened before. But we're using a t distribution now, not a Z. And it's a t with how many degrees of freedom? 29. Thank you, I love hearing a voice. Okay, it's a t with 29 degrees of freedom: that's n1 minus one plus n2 minus one. And it's a two-tail test. What's alpha? 0.05. So the tail probability is 0.025, which means that you're looking up on the t table where the row is 29 and the column is the 0.025 column. And what did you get for your critical values? I see 2.0452. Yes. And of course, on the other side, it's minus 2.0452. Okay, let's go back. And as you know by now, looking things up in the t table is much easier than in the Z table. It's set up exactly the way we want to use it for inference. Okay. And I need the formula to get the calculated value of t from the formula sheet. It is going to look very much like the formula for Z, but it's going to be a lot harder, right? Because what do you have in the formula for t that you didn't have before for Z? You've got S pooled, exactly. And that's the reason that when you do your homework, you'll see that I ask you to use Excel for two-sample t tests. Very few people will actually do it by hand, although I do also want you to be able, at least once, to show me that you know how to do it by hand. Okay, so the question is whether on the test you'll be expected to solve the t ones by hand, right? You're expected to know how to solve these by hand, but you're also expected to know how to read Excel printouts. I won't ask you to do Excel on the test. All right.
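For reference, the two statistics discussed so far can be written out as follows. The instructor's formula sheet is not shown in the recording, so this is the standard textbook form and the notation on the sheet may differ. Here s_p^2 is the pooled variance (the "S pooled" term), and the hypothesized difference mu one minus mu two is taken to be zero.

```latex
% Two-sample Z for means, with s_1 and s_2 standing in for sigma_1 and sigma_2
Z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}

% Pooled two-sample t with n_1 + n_2 - 2 degrees of freedom
s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2},
\qquad
t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}
```

The extra pooling step is what makes the t version more tedious by hand than the Z version.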
But I may give you a printout and, you know, sure, you might be able to figure it out, but if you've done it, you'll be much more comfortable. Thank you. Let's see. What did you get for the calculated value of t? I'm not going to compute the whole thing now because I kind of want to save time, but it really shouldn't be that difficult. A lot of what we're doing today, including correlation and regression, is just plugging numbers into formulas. What did you get for the calculated value of t from the two X bars? 2.63, I see. Good. And that's the final result. So that has the two X bars in the numerator and the pooled variance in the denominator. So yeah, it's kind of a hairy formula, and nobody likes to do it. Okay, so that's actually part three. And then for part four, we have to have a conclusion. The critical value was 2.0452; 2.63 is in the region of rejection. And so the conclusion is reject the null hypothesis: the two groups produce different statistics because they really are two different populations. It's not reasonable to believe that these are just two random samples from the same population. Basically, in any two-group test, that's always what you're looking at. Okay, let's take a look at another one. I want to do one example of each and then we'll take questions. Okay. Here we are. Here's a two-sample Z for proportions. It's going to look exactly the same. A researcher wants to compare foreclosure rates, okay, so right away we're thinking proportion here, rates on mortgages issued to men and women. For men, out of 500 mortgages, 65 resulted in foreclosure. For women, out of 200 mortgages, 18 resulted in foreclosure. So you're going to have two sample proportions, right? Sample proportion for group one, sample proportion for group two.
And the null and alternate hypotheses are going to look very similar, but not exactly the same this time, because the null hypothesis, instead of being mu one equals mu two, will be P1 equals P2. And the alternate hypothesis is that P1 is not equal to P2. What's the test statistic we're using? We must use the Z; we don't even have a choice here, that's the only one we know how to use for proportions. What's alpha? 0.05. Is it a two-tail test? Yes, because in this class, all two-sample tests are two-tail tests. What are the critical values from the Z distribution for 0.025 on one side and 0.025 on the other side? Yes, plus and minus 1.96. Next semester, I dare you to try to forget two-tail, alpha 0.05, from the Z: plus and minus 1.96. You're having trouble remembering it this semester; next semester you're going to have trouble forgetting it, mark my words. You should get back to me and let me know if that happened. Okay. So we've got the first part, the null and alternate hypotheses. We've got the second part: we set up the decision rule, the critical values. We need the calculated value of Z from the data. Whatever it calculates to, if it's greater than 1.96, it's in the region of rejection on this side. If it's less than negative 1.96, it's in the region of rejection on this side. And how do we do that? What formula do we use? Well, we're going to need the formula that has two proportions in it, right? Let's just take a quick look at the formula sheet. Oops. It's going to be a Z equals... I know it's here somewhere. There we are. Sample proportion one minus sample proportion two, divided by a square root involving P bar. So again, just like with t, we have this idea of a pooled statistic. This is P bar, where we take the two numerators and add them together, and the two denominators and add them together. Take a minute or two, not more than that, and get the calculated value of Z for this problem. I'm going to let you do that on your own for a minute.
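The two-tail critical values quoted so far (plus and minus 1.645 for alpha 0.10, plus and minus 1.96 for alpha 0.05) can be checked without a Z table. This is only a sketch using Python's standard library; the helper name `z_critical` is made up for illustration and is not something from the course materials.

```python
from statistics import NormalDist

def z_critical(alpha):
    """Positive critical value for a two-tail Z test: alpha / 2 sits in each tail."""
    return NormalDist().inv_cdf(1 - alpha / 2)

z_10 = z_critical(0.10)  # married vs. single men example
z_05 = z_critical(0.05)  # foreclosure example

print(round(z_10, 3), round(z_05, 2))
```

The t critical value of 2.0452 has no standard-library equivalent, so the t table (or Excel) is still the place to look that one up.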
Just get the value. I see somebody is answering. This is a question. This is a two-sample test for the difference between two proportions. You know it's a Z. Look on your formula sheet, and look for something that starts Z equals and has two different proportions in it. It'll be this PS1 minus PS2, sample proportion one and sample proportion two; one formula looks like that. All right. So this is the solutions page. And it starts out with exactly what we had: the null hypothesis, P1 is equal to P2; the alternate hypothesis, P1 is not equal to P2. And it's a Z distribution with the critical values plus and minus 1.96. The formula Z equals: PS1 is 65 over 500, or 0.13. There's that 0.13. PS2 is 18 over 200, or 0.09. There's the 0.09. That's the numerator. P bar is 65 plus 18, 83, in the numerator and 500 plus 200 in the denominator; that's 700. So with P bar, we're kind of telling ourselves: what if this really was just two samples from the same population? Well, why don't I just combine them and make believe that I got one big sample of 700 with 83? So if they really are from the same population, my overall pooled P should be 0.119, 83 over 700. And that's what I put in for the measure of variation: the square root of 0.119 times one minus 0.119 times the sum of one over n1 and one over n2. That's the formula. 1.48. Now, where is 1.48 on your picture of your null hypothesis with the test statistic? It's not beyond 1.96, it's not beyond negative 1.96; it's in the white area in the middle, the accept H0 area in the middle. And we don't say accept H0, because you can never know for sure if they're exactly the same, but certainly you can't reject H0. And that's the way we put it. Okay, so you've just seen one each of every type of two-sample test that we learned how to do this semester.
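The foreclosure computation above can be reproduced in a few lines. This is a minimal sketch of the pooled two-proportion Z as described on the solutions page; the function and variable names are made up for illustration.

```python
from math import sqrt

def two_proportion_z(x1, n1, x2, n2):
    """Return (p1, p2, p_bar, z) for the pooled two-proportion Z statistic."""
    p1 = x1 / n1                       # sample proportion, group one
    p2 = x2 / n2                       # sample proportion, group two
    p_bar = (x1 + x2) / (n1 + n2)      # pooled proportion: add numerators, add denominators
    std_err = sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
    return p1, p2, p_bar, (p1 - p2) / std_err

# Men: 65 foreclosures out of 500 mortgages. Women: 18 out of 200.
p1, p2, p_bar, z = two_proportion_z(65, 500, 18, 200)

critical = 1.96                        # two-tail, alpha = 0.05
reject = abs(z) > critical

print(round(p1, 2), round(p2, 2), round(p_bar, 3), round(z, 2), reject)
```

With these inputs the statistic rounds to 1.48, which is inside plus and minus 1.96, so the decision is "do not reject H0".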
And once you've learned the stuff that you had for exam three, this was nothing. This was very, very easy, because it was way reduced from what you had to know for exam three. If you had trouble with exam three, then, you know, you may have more studying to do with two-sample tests, because you have a lot of the same concepts here that we had before. We have a null and alternate hypothesis. We have the critical values. We have a formula that we have to compute from the data, from the sample evidence. We've got to know whether to reject the null hypothesis or not. But there's a lot we don't have, and I went over that at the beginning of the class: things we don't have to know. The rest of this, as they say, is just practice. And where are you going to find things to practice for two-sample tests? All of these practice problems, as long as they're about inference, are going to have two-sample tests in them. As a matter of fact, there was a review here, this first review session, the review session for inference. You did this for the last test, but you skipped the material in here that was related to two-sample tests, so now you can go back and do the rest of it. This is interesting, I have this right here; maybe we should just jump to it. Well, let's do one more thing. In addition to two-sample tests, this exam is going to cover correlation and regression. And you already know from preparing for class that correlation and regression, here we are, is just basically a bunch of formulas. You already understand the concept of a p-value, so I'll show you where that might be relevant. But I'm not going to ask you to do an entire hypothesis test in order to test the correlation coefficient for significance, or in order to test the slope of the regression for significance. Those are things that really are done sometimes, even in an intro course, but it's not the way we do regression and correlation.
The way we do regression and correlation is really mostly as a descriptive technique. If you remember, way back when we did descriptives with two variables, we had a scatter plot. And you compute the correlation coefficient and/or draw a straight line on it. That's correlation and regression. So let's just take a look at the steps here for computing these formulas. Now, somebody asked me if you're going to get raw data on a test. Sure, you should be able to do problems using raw data. But in my opinion, it's a very bad use of test time to have you adding up a long list of numbers. So I'll almost always give you the summations in addition to the data, or maybe in some problems you'll only get the summations. But one thing you'll have to know, because I will very often not give it to you: you'll have to know which variable is X and which variable is Y. Remember, I'm always more interested in that you understand why things work, not how to plug numbers into a formula. So if I give you two variables, paired data, you'll need to know which one you should call X and which one you should call Y. When we do a problem, pay attention and keep it in mind, and if you don't understand, then ask. Okay, so the next step is to get the correlation coefficient. So you see, correlation and regression, even though I'm treating them like two separate topics, really are the same topic. And here, we take some of the summations that we got from step one and use them to compute a correlation coefficient. The correlation coefficient tells us how strongly related these two variables are. It goes from negative one to positive one, and an r of zero means no linear relationship. We're not going to test this for significance, but it definitely can be tested. And also, we can sometimes give a good guess. Say you have a correlation coefficient of 0.01 or negative 0.01.
It's a pretty good guess that that's not too different from zero. Now, in step three, you calculate the coefficient of determination, r square; it's just r squared. It's an important measure, because not only does it give you a measure of correlation that's not related to sign, you know, you're getting rid of the effects of the sign by squaring, but it actually is more meaningful for the regression. So r square, which is called the coefficient of determination, gives you the proportion of the variation in Y that's explained by X, if you have only one X in your regression. Y, the response variable, is always what we're studying. The X is what we're using to try to understand it. So for example, if Y is exam scores, for X we might be choosing things like how much money you have in the bank, or how many hours you spent studying, or how long your big toe is. We're trying to find things that might explain why everyone doesn't get the same score on their exam. So r square is the proportion of the total variation in Y that's explained by the regression. Then we get the regression coefficient b1; you compute that too. Now, yeah, these are big formulas, but you have the formula sheet; I don't want you to memorize anything. If you have the formula sheet, it's just a matter of figuring out which are the summations. The only thing you'll be thinking about, really, is what's X and what's Y, because like I said, I'll be giving you the summations. You have to compute b1 using this formula, then you compute b0 using b1. And then you can finally write out the regression equation. If this is the scatter plot, the regression equation is a line that's drawn through the scatter plot in the best possible way. Okay. And if your scatter plot looks like this, the line that was drawn doesn't look that bad. Y hat is always your regression line.
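The steps just described (r from the summations, then r square, then b1, then b0 from b1) can be sketched in code. The formula sheet itself is not shown in the recording, so this uses the standard computational formulas, which may be arranged differently on the sheet. The five data points are invented purely for illustration and do not come from any problem in the course.

```python
from math import sqrt

def regression_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Return (r, r_squared, b1, b0) computed from the usual summations."""
    sxy = n * sum_xy - sum_x * sum_y      # shared numerator for r and b1
    sxx = n * sum_x2 - sum_x ** 2
    syy = n * sum_y2 - sum_y ** 2
    r = sxy / sqrt(sxx * syy)             # correlation coefficient
    b1 = sxy / sxx                        # slope
    b0 = sum_y / n - b1 * (sum_x / n)     # intercept: y bar minus b1 times x bar
    return r, r ** 2, b1, b0

# Hypothetical paired data (not from the lecture).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r, r_squared, b1, b0 = regression_from_sums(
    n=len(x),
    sum_x=sum(x),
    sum_y=sum(y),
    sum_xy=sum(a * b for a, b in zip(x, y)),
    sum_x2=sum(a * a for a in x),
    sum_y2=sum(b * b for b in y),
)

print(f"r = {r:.3f}, r^2 = {r_squared:.3f}, y_hat = {b0:.2f} + {b1:.2f} x")
```

For this made-up data the slope is 0.6 and the intercept is 2.2, so the line is Y hat = 2.2 + 0.6 X, and r square is 0.6. On a test the summations would normally be given, so only the function body corresponds to hand work.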
And b0 is the Y intercept; b1 is your slope. If b1 is positive, that means you have a positive slope: as X goes up, Y goes up. If b1 is negative, that means you have a negative slope: as X goes up, Y goes down. What can the intercept tell us? Well, it could tell us the value of Y when X is zero. If there's no X, if there's nothing, I don't have a big toe, my big toe's absent, X is zero, what will my grade be? If we were to use that ridiculous example. Okay, we're going to see other examples, but this is just one. Let's see what would be a good one to look at. You know, I kind of like the pretty one here. This review is on the handouts page under review session, regression problems using Excel. Okay, you already can follow. Great. In order to get the formulas down for correlation and regression, if you don't have Excel, it's just a matter of plugging numbers into formulas. We're starting to go over it now, and I don't mind doing it, but let's first go over things where we can actually interpret the results. Okay. Here's a problem. There's a reason that this is in here: I want to show you something where we should not have done a regression. That's basically what this problem is. The X is a math score, probably on a math test, and Y is a job performance score. And we want to see what the relationship is; we want to see whether the score that an individual gets on the math test has anything to do with predicting their job performance in this particular job. Take a look at the scatter plot. The scatter plot looks pretty random, right? This is a really bad candidate for regression. All right, and that's why we're looking at this first, because Significance F is a p-value. And you know that a p-value is the probability of getting the sample evidence that I got, given that X and Y are not related. Okay.
And basically what this is asking is: is this a significant regression? Is the regression something that I can write up? Oh, this regression was a good idea, and everyone should do it; this is a formula that should be used in the future for predicting. Well, just looking at the scatter plot, you know that's probably not so. But the significance of the regression, we didn't learn the test, but that's what Significance F is. It's the p-value of the F test. And if we're using 0.05 for alpha, as we usually do, it's certainly greater than 0.05; even if we were going to use 0.10, it's certainly greater than 0.10. What about if we were going to look at some of the others, like the slope term, which is related to r? Okay, the p-value for that is exactly the same, because they are related. When we're testing the slope and when we're testing the regression in a simple regression with one X, we're basically looking at the same thing. So, if you have a problem like this, you don't immediately go and start asking or answering questions about the correlation or about the regression. There's no point. It shouldn't have been done. But let's take a look at a problem where it works. Here we've got X, years of education, and Y, hourly wage. And remember, when you're deciding on X and Y, and when the problem is asking you to decide on X and Y: Y in some way depends on X, not the other way around. X is the independent variable, Y is the dependent variable; we're using X to explain Y. All right. In this case, we don't have the scatter plot here, but all right. In this case, we're saying, yes, there is a significant relationship. We're not doing a test, but at least it's meaningful to go ahead and look at the regression. Let's look at these questions.
I like this because these are some questions that can be asked in a Blackboard type of test. Is the regression significant? Yes or no. I'm probably not going to ask true-and-false, but all right, if I do ask that question, it's something you should be able to figure out. What's the value of b0? Well, you read it right off the table. I mean, if this is a problem where I asked you to compute the values on your own, here we are, well, then you computed it, so you should be able to answer the question what is b0, what is b1; you should be able to write that in for sure. But on the regression output, you should be able to do it too. Under coefficients, the intercept is b0. That's negative 17.9. So basically what that means is that, supposedly, if I have zero years of education, then my hourly wage should be a negative value. Now, the intercept, as we know, doesn't always make sense. All it is is a way of positioning the line on the two-dimensional plane. So it doesn't always really mean: when you have zero X, here's what Y is going to be. b1 is the coefficient for what Excel calls X Variable 1. That's the slope term; there's a positive slope, 3.349. Okay, so that's b0 and b1. Question three: what's the regression equation? Oh, I know how to do that. Let's just write it out. I look at the output and I say the regression equation is Y hat equals, and what's b0, negative 17.94, plus 3.349 times X. That's the regression equation. If that's my question, I know how to answer it. What's the correlation coefficient? Oops. Okay, that's what Excel calls Multiple R. So r, the correlation, is 0.868. It's a fairly strong positive correlation we've got here. We're not testing it, but it's a pretty high number, almost 0.9, which would be significant. What's the next question?
It doesn't ask about r square, but if a question is asked, you can pull r square right off here: 0.754. If the question asks what's the proportion of the variation in wages that's explained by years of education, 0.754; that's the definition of r square. What does it ask? What's the proportion... oh, that is it. Okay, what's the proportion of variation in hourly wage explained by years of education? That's just the definition of r square. So I just read r square off of the output. I think there's another question, but we didn't ask it in the first problem; let's ask it here. Okay, here's another one. We have X is age, Y is how long it takes to complete a task. If I look at r first, there's a positive correlation, a fairly strong positive correlation, although I won't ask you to test it for significance. Actually, we can see right here that the regression is significant. b0 and b1 you pull off of the output: b0 is 11.525, b1 is 1.032. The regression equation is Y hat equals b0 plus b1 times X. What's the correlation coefficient? We just saw that. What's r squared? 0.753. Now the additional question here: how long would it take somebody who was of age 50 to complete the task? Well, to answer that, all we do is take 50 and plug it in for X in the regression equation. So we'd have 1.032 times 50 added to 11.525. But I want you to notice something. Are these ages in order? Yeah, so the ages go from 23 to 70, which is good, because regression is not really valid if you're extrapolating. If you're interpolating, if you have a value that's within the range of the data that you already used to build the regression, that's okay. But if, for example, you wanted to know how long it would take somebody who was of age 16 to do the task, this regression wouldn't really help you; it wouldn't be valid. Somebody asked the question: how do we know if the regression is significant?
That's just this thing called Significance F, right over here, smack in the middle. And so the answer to this question, how long would it take somebody who was age 50 to complete the task, is 63.125 minutes. All we're doing here is plugging a value into the regression formula. So what you see here is a bird's-eye view of correlation and regression. You see that you can compute the correlation coefficient, you can compute r square, you can compute the regression coefficients. You can do that all without Excel, using the formulas, and you should know how to do that. But you could also answer all these questions with the Excel printout. And maybe it'll be half and half on the test: for half I might ask you to do the computation, and for half I might ask you to just read off a printout and either answer a multiple-choice question or write in the answer. You should be able to handle it either way. Okay. Let's take a look at one more. In this problem, we've got a little bit of shameless self-promotion. X is the number of absences in class, which I just gave up on, so I wouldn't even know. Y is the score that you get on the statistics final. Clearly, we're looking to see if the differences in score can be explained, at least partially, by the number of absences that a student has. Is the regression significant? How do we know? You look at that middle where it says Significance F, and it's 0.003. So this p-value is quite small, and it's certainly less than 0.05 or 0.01. So the answer is yes, it's significant. What is b0, what is b1? I need this a little larger. Right. Oh, we have it right here: 88.75 is b0, negative 4.4 is b1. So what does a negative slope mean? Anybody, just explain it in words, either verbally, if you unmute yourself, or in the chat. What does that negative 4.40 mean in terms of the relationship between absences and the score on the final?
Okay, somebody didn't want to wait. Okay. First of all, mainly what it tells me is that I have a negative slope. There's a negative relationship, or an inverse relationship, between number of absences and the score on the stat final. So as the number of absences goes up, as it gets larger and larger, the score on the stat final goes down; it gets smaller and smaller. That's why I said shameless self-promotion. Okay. So that's a negative, an inverse, relationship. Now, here's a good time to point out r. Is that one of the questions, what's the correlation coefficient? Let's skip ahead. What's the correlation coefficient? Well, if it's an inverse relationship, the correlation coefficient is supposed to be negative, right? Negative 0.674. What do you have over here? You just have plus 0.674. Excel doesn't give you the minus, because Excel computes r square and then just takes the square root in order to get r. You have to figure out if this r is supposed to have a minus sign on it. How do you know? You know that from looking at the slope. The slope is negative, there's an inverse relationship, and so r has to be negative, and you have to know that from working with the Excel output. Okay, how do we express the regression equation? Y hat is equal to 88.75 minus 4.4 times X. If I know someone who's absent eight times, what would I predict that person would get on the final? Well, I have to plug eight into my equation. So Y hat would be 88.75 minus 4.40 times eight, or 53.55. How good is number of absences as a measure for explaining the grade that a student gets on the stat final? Yeah, it's okay, but it could be better. The proportion of variation in Y that's explained by X is 0.454. About 45% of the variation in Y can be explained by X; the rest of it is due to other factors that we didn't consider in this problem. All right, I'll leave the rest for you.
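The last two printout problems come down to plugging a value into Y hat = b0 + b1 X and, for the absences problem, putting the slope's sign on r. Here is a small sketch using the coefficients read off the printouts above. The function names are made up, and raising an error on extrapolation is just one way to encode the "don't predict outside the data range" advice; it is not something Excel or the course does.

```python
from math import copysign, sqrt

def predict(b0, b1, x, x_min=None, x_max=None):
    """Plug x into y_hat = b0 + b1 * x; refuse to extrapolate if a range is given."""
    if x_min is not None and x_max is not None and not (x_min <= x <= x_max):
        raise ValueError(f"x={x} is outside the observed range [{x_min}, {x_max}]")
    return b0 + b1 * x

def signed_r(r_squared, slope):
    """Square root of r square, carrying the sign of the slope."""
    return copysign(sqrt(r_squared), slope)

# Age vs. task time: b0 = 11.525, b1 = 1.032, ages observed from 23 to 70.
task_minutes = predict(11.525, 1.032, 50, x_min=23, x_max=70)

# Absences vs. final score: b0 = 88.75, b1 = -4.40, r square = 0.454.
final_score = predict(88.75, -4.40, 8)
r_absences = signed_r(0.454, -4.40)

print(round(task_minutes, 3), round(final_score, 2), round(r_absences, 3))
```

Asking for age 16 with the same range, `predict(11.525, 1.032, 16, x_min=23, x_max=70)`, raises the error instead of returning a number, which mirrors the point that the regression is not valid there.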
Now, let's go to you, because I wanted to at least illustrate, in order to help your understanding, every different type of problem you'd get, I mean, what you're required to study for the test. But now let me throw this over to you: what questions do you have, or do you have a particular problem that you were doing that gave you trouble? As a matter of fact, what I'll do is stop the recording, if I can figure out where that is, to make sure that people aren't shy about bringing up their questions. Thank you.