Welcome back everyone. Today we're going to talk about hypothesis testing with two samples. If you want to test a claim that involves two groups, you can use a slightly different technique when conducting a hypothesis test. Previously, we were doing hypothesis testing with one sample: we collected data from a population, we knew (or at least had good reason to believe) that the data followed a normal distribution, and we were testing a single claim. We stated a null hypothesis and an alternative hypothesis and then used the data to decide whether we could support or reject the null hypothesis. Now suppose the claim involves two groups, for example the types of breakfasts eaten east and west of the Mississippi River. Notice that these are completely independent groups: the breakfast choices of the population east of the Mississippi don't affect the population west of the Mississippi. So we take a random sample from each of the individual populations, and then we want to compare the two populations in some way. Another classic setting is a clinical trial, where, typically, one group is given aspirin and the other group is given a placebo.
In clinical studies, when we're trying to test whether a medicine actually works, we give the real medicine to one group and a placebo, which is essentially fake medicine that doesn't do anything on its own, to another group. Usually there's also a third group that receives nothing at all, no placebo and no medicine. We then ask: is the placebo group any different from the aspirin group? If the medicine works better than the placebo, we say the results are significant; if it doesn't work better than the placebo, we say there are no significant results. This is one way to test whether a medicine is actually doing anything or whether it's all in our minds. If you don't know about the placebo effect, it's a very powerful effect in humans: sometimes even when people are given what they know is just a sugar pill but are told it has medical powers, their health can still improve simply because they believe it will. Just receiving the pill is enough to make people feel better, even if they're not actually better, which is exactly why we include a placebo group when testing medicine. If you haven't read about placebos, I highly recommend looking them up; it's quite interesting. Now let's say we want to study heart attack rates over several years.
For heart attack rates studied over several years, we could group the data by participants or by years; there are lots of ways to form the two or more groups needed to test a claim like this. Another comparison of two groups: various diet and exercise programs. Much like with the medicine, we give one group a particular diet, another group a different diet, and maybe a third group no diet at all (just keep doing what you're doing), and then see how each group responds. There have been studies where the people on no diet at all actually performed better than the people on a particular diet plan, something similar to the Atkins diet, which is a low-carbohydrate plan, basically meat but no bread. So here we're simply comparing two different diet programs; again, these are like clinical trials, and we want to know which one produces better results, or whether any of them produces a better result compared to nothing at all. Another example: politicians compare the proportion of individuals from different income brackets who might vote for them. You can usually trace voters not only by region but also by income, and in the US, at least, voting tendencies differ noticeably between high-income, low-income, and middle-income groups. So politicians really want to know who these groups are and how they will react, and vote, when election time comes.
Students are interested in whether SAT or GRE preparatory courses really help raise their scores. We talked about something similar with single-sample testing; the example there was whether a hagwon (a private cram school) was actually improving student scores or not. In this case, we're interested in which course will really help raise scores, so we'd compare a couple of different courses, say an SAT course against a GRE course. So far we've looked at single means and single proportions: the mean or proportion of one sample taken from one population. Now we're comparing the means of two different groups, and for that we need the idea of independent groups, that is, whether the groups affect each other. Independent groups consist of two samples that are completely independent: the sample from one population does not affect the sample from the other in any way. Again, think about people eating breakfast east and west of the Mississippi. As long as we sample both regions at the same time, the samples won't affect each other; everyone in both regions eats whatever breakfast they choose, and their decisions don't affect people in the other region. So, for independent groups, sample values selected from one population are not related in any way to sample values selected from the other population. They are completely independent if they do not affect each other in any way.
We can also do hypothesis testing over matched pairs, which consist of two samples that are dependent; we'll see an example of that shortly. For two independent samples, we need simple random samples from two distinct populations. So again, asking people what they're going to eat for breakfast, we go east and west of the Mississippi River and take a random sample of the people eating breakfast in both places. Because the samples are completely random, they don't affect each other, and we get a random sample from each population. For two distinct populations, if the sample sizes are small, the distributions are important: with small samples we need to know the distributions, and they should be normal if we want to compare them, because we're really comparing means here. If the sample sizes are large, the distributions are not as important, because we have enough data (we'll talk about why later). Here's an example: the average amount of time boys and girls aged 7 to 11 spend playing sports each day is believed to be the same. That's a claim we can now test: that boys and girls aged 7 to 11 play the same amount of sports every day. A study is done and data are collected, and each population has a normal distribution. In this case it's plausible that the data are fairly normal: some kids don't play at all, some play several hours a day, and most cluster around some mean.
Of course, kids now probably don't go outside quite as much as they used to, so the distribution might be shifted or skewed a bit, but it should still look relatively normal. Here we have two groups, boys and girls, and they are independent, because what the boys do doesn't affect the girls and vice versa. So we have two distinct, independent groups, and we want to make a claim about them. For the girls, the sample size is 9, the average number of hours playing sports per day is 2, and the sample standard deviation is 0.866. For the boys, the sample size is 16, the average number of hours playing sports per day is 3.2, and the sample standard deviation is 1.0. The sample size for the girls is quite a bit smaller; it would be better to have a larger sample, but since we know both populations are normal, we can still make the comparison with the data we have. There will just be more error in the comparison than we'd like, because the girls' sample is so much smaller than the boys'. The averages here are just the sample means, and with them we can start to make a claim about the distributions. We can calculate, given the sample sizes we have, how confident we are that each average is correct, and for the girls, we're not actually that confident.
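As a quick check on that intuition, we can compute the standard error of each sample mean (s divided by the square root of n) from the summary statistics above. The numbers come from the example; the code itself is just an illustrative sketch.

```python
import math

# Summary statistics from the example
n_girls, s_girls = 9, 0.866   # sample size, sample standard deviation
n_boys, s_boys = 16, 1.0

# Standard error of the sample mean: s / sqrt(n)
se_girls = s_girls / math.sqrt(n_girls)
se_boys = s_boys / math.sqrt(n_boys)

print(f"SE girls: {se_girls:.3f}, SE boys: {se_boys:.3f}")
```

Even though the girls' standard deviation is the smaller of the two, their smaller sample size gives their mean the larger standard error, which is why we're less confident about the girls' average.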
There's going to be a relatively wide interval in which we'd be confident the girls' true average falls; for the boys, that interval will be a bit narrower. You can see where I'm going with this: using the confidence intervals, significance levels, and p-values we talked about before, we can calculate how confident we are that the girls' true number of hours playing sports per day falls within some range of 2, and that range will be wider for the girls and narrower for the boys. That's how we start to make a comparison between the independent groups of boys and girls. Comparing two population means with unknown standard deviations is very common. Very different sample means can occur by chance if there is great variation among the individual samples; with huge variation in the samples, our error rate goes up, or equivalently our confidence level goes down. We account for that variation by taking the difference of the sample means and dividing by the standard error: under the null hypothesis of equal means, the test statistic is t = (x̄1 − x̄2) / sqrt(s1²/n1 + s2²/n2). If our samples have high variation, it becomes very difficult to predict what the next value will be, and predicting what the normal values are is essentially what we're trying to do, so we want to account for, and minimize, that variation as much as possible. One more term: degrees of freedom, meaning each of a number of independently variable factors affecting the range of states in which a system may exist.
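Here's a sketch of that test statistic computed from the summary data above (girls: n = 9, mean 2, s = 0.866; boys: n = 16, mean 3.2, s = 1.0), together with the Welch approximation for the degrees of freedom. The function name is my own.

```python
import math

def welch_t(x1, s1, n1, x2, s2, n2):
    """Two-sample t statistic and Welch degrees of freedom for
    independent samples with unknown, unequal variances."""
    a, b = s1**2 / n1, s2**2 / n2
    se = math.sqrt(a + b)                     # standard error of the difference
    t = (x1 - x2) / se
    df = (a + b)**2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))
    return t, df

t, df = welch_t(2.0, 0.866, 9, 3.2, 1.0, 16)
print(f"t = {t:.2f}, df = {df:.1f}")   # t ≈ -3.14, df ≈ 18.8
```

An absolute t value around 3.14 is well into the tail of a t distribution with roughly 18.8 degrees of freedom, so with this data we would reject the claim that boys and girls play the same amount of sports per day.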
We won't talk about degrees of freedom too much; I don't think you need to know much about it yet, and we'll cover it in detail later. For now, think of degrees of freedom as independent variables: depending on how many variables your data or your test involves, the degrees of freedom are the things that can change independently. As we introduce constraints, we remove degrees of freedom. In studies we want to constrain things as much as possible, but there are often many variables affecting the outcome, and we count those variables as the sample's degrees of freedom. Next, Cohen's d is a measure of effect size based on the difference between two means: it measures the relative strength of the difference between the means of two populations based on sample data. It's a way to compare two samples, in this case two independent samples: we take their means and say something about them in relation to each other, even though the samples are completely independent, and Cohen's d measures the strength of that difference. Now, when conducting a hypothesis test that compares two independent population proportions, the following characteristics should be present. First, the two samples are simple random samples that are independent. Notice that every time we've collected samples, especially from independent populations, we've focused on random sampling: random sampling is one of the best types of sampling, and for independent groups in particular we want simple random samples from populations that really are independent. Second, the number of successes is at least five and the number of failures is at least five for each of the samples.
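A minimal sketch of Cohen's d using the sports data from earlier, in its common pooled-standard-deviation form (the function name is mine):

```python
import math

def cohens_d(x1, s1, n1, x2, s2, n2):
    """Cohen's d: difference of the means divided by the pooled
    sample standard deviation."""
    s_pooled = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2)
                         / (n1 + n2 - 2))
    return (x1 - x2) / s_pooled

d = cohens_d(2.0, 0.866, 9, 3.2, 1.0, 16)
print(f"d = {d:.2f}")   # d ≈ -1.26, conventionally a large effect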
Growing literature states that the population must be at least 10 or 20 times the size of the sample; this keeps each population from being oversampled, which would cause incorrect results. For example, two types of medication for hives (hives are a kind of rash) are being tested to determine if there is a difference in the proportions of adult patient reactions. 20 out of a random sample of 200 adults given medication A still had hives 30 minutes after taking the medication, and 12 out of another random sample of 200 adults given medication B still had hives 30 minutes after taking the medication.
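A sketch of the pooled two-proportion z-test for this medication example, where the null hypothesis is that the two proportions are equal. Only the standard library is used; the normal CDF is built from math.erf, and the function name is my own.

```python
import math

def two_prop_z(x1, n1, x2, n2):
    """Pooled two-proportion z-test; H0 is p1 == p2."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)        # pooled proportion under H0
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value via the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

z, p = two_prop_z(20, 200, 12, 200)   # medication A: 20/200, B: 12/200
print(f"z = {z:.2f}, p = {p:.3f}")    # z ≈ 1.47, p ≈ 0.140
```

At a 5% significance level, a p-value around 0.14 means we fail to reject the null hypothesis: even though medication B looked better in this sample, the difference could plausibly be chance.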
Now, just looking at this, we would probably say that medication B appears to be working better. But can we confidently say that's the case? No. We still have to calculate how confident we are, that is, what the significance of these findings is. We can't just say: we tested medication B once, compared it to one test of medication A, it worked better, so medication B is better. That's not good enough; by itself it doesn't actually tell us anything. Remember last time we talked about type 1 and type 2 errors; those errors can still happen when our hypothesis uses multiple samples. So not only should this experiment be done multiple times, we also have to calculate the actual significance of these findings: how confident are we that medication B is actually performing better than medication A? What we might find is that they work about the same. Even though only 12 out of 200 people on medication B still had hives, versus 20 out of 200 on medication A, the findings may not be significant; medication B may simply have gotten lucky this time and not actually work better over time. If two estimated proportions are different, it may be a difference in the populations: in the random sampling for medication A and medication B, maybe there was a difference in the populations that actually accounts for fewer people having hives on medication B. Maybe it was the people and not the actual
medicine. Or it could be random chance. If you remember back to probability, it is possible to flip heads on a coin ten times in a row; it's very, very unlikely, but it is possible. So maybe it was just random chance that medication B outperformed medication A this time, even though overall it performs no better. It could be due to the populations, or it could be due to chance. This is why we do multiple rounds of sampling, usually from the same populations, and multiple experiments: replication removes issues caused by differences in the populations and by chance, and lowers our error rates. Reproducing your studies over and over again is the best way to remove these kinds of errors. A hypothesis test can help determine whether a difference in the estimated proportions reflects a difference in the population proportions. The difference of two proportions follows an approximately normal distribution, and the null hypothesis usually states that the two proportions are the same. Recall from last time that for a single sample, the null hypothesis is essentially the opposite of the claim you're trying to support. For example, if the claim is that a car gets more than 30 miles per gallon, the null hypothesis is that it gets 30 miles per gallon or less. In the case of two independent populations, the null hypothesis states that the two proportions are exactly the same. So for the case of the medicine and the placebo, we say that the two
proportions, or the two means we're measuring, are exactly the same: the medicine is performing at the same rate as the placebo, which shouldn't really have any effect at all. In that case the medicine cannot be differentiated from the performance of the placebo, which means they are the same. So the null hypothesis is basically that both of your populations are the same, and whenever you reject the null hypothesis, you're stating that the two populations are actually different: the medicine does outperform the placebo by a certain amount, or rather, we are confident that there is a significant difference. Next topic: subjects are matched in pairs and differences are calculated. Now we're talking about matched or paired samples. Rather than two independent populations or independent samples, we have matched pairs, which you can think of as usually coming from the same subjects sampled multiple times. For example, if you take a sample from a person at one time and a sample from the same person at a different time, you may get different values, but they come from the same person. When using a hypothesis test for matched or paired samples, the following characteristics should be present. First, simple random sampling is used to collect the samples. Second, sample sizes are often small. Third, two measurements are drawn from the same pair of individuals or objects. Fourth, differences are calculated from the matched or paired samples, and the differences form
the sample that's used for the hypothesis test. Finally, either the matched pairs have differences that come from a population that is normal, or the number of differences is sufficiently large that the distribution of the sample mean of differences is approximately normal. Again, it's ideal to be dealing with normal distributions. Here's an example. A college football coach was interested in whether the college's strength development class increased his players' maximum lift, in pounds, on the bench press exercise. So these football players are doing the bench press, and the coach wants to know whether the course is actually improving their maximum lift, their strength, or not. He asked four of his players to participate in a study. The amount of weight they could lift was recorded before they took the strength development class, and after completing the class the amount they could lift was measured again. The data are as follows. Notice that we look at player one, player two, player three, and player four individually, and for each player we have two measurements: the amount of weight lifted prior to the class and the amount lifted after the class. Both relate to a single individual, but there are two samples per person. Player one lifted 205 pounds prior to the class and 295 pounds after. I don't know about anyone in the class, but to me that is a lot of weight, and it's also a huge improvement. Just looking at player one, we'd say that this class, or whatever player one did, did improve strength and increase his maximum lift. Then we look at player two, who is in the same class, with the amount of weight lifted prior to the
class: 241 pounds, which is already quite a lot, and the amount lifted after the class: 252 pounds. There's an improvement, but it's not a huge one; nowhere near as big as player one's. Player three started with 338 pounds and ended with 330 pounds, so player three actually lost eight pounds of lift. Player four went from 368 pounds to 360 pounds, also losing eight pounds, though being able to lift 360 pounds is already a huge amount of weight. So now we look at these results. For player one, the class definitely seems to have benefits. For player two, there's some benefit, but we're not really sure whether it's the class or whether player two could have improved on his own. Players three and four both lost strength. Just looking at this data doesn't really tell us whether the strength development class itself helped. We want to know how much strength increase this particular class gives, and from these four players it looks like if your lift is over a certain amount, say 300 pounds, the class doesn't appear to help you at all, while if you're under 300 pounds it appears that it does. So we could try to figure out, first, whether the class helps generally speaking, and then in what ranges it helps. Again, we're using matched or paired samples: the important thing is that we have two groups formed by sampling one individual twice in time, before the class and after the class. Subjects are matched in pairs and differences are
calculated. So, for matched or paired samples, just remember: we have one individual (or one matched pair), the sample sizes are relatively small, and we take at least two measurements drawn from the same pair of individuals or objects. That covers independent samples, multiple independent samples, and also matched or paired samples. That's it for today. Thank you very much.
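As a quick follow-up to the bench press example, here's a sketch of the matched-pairs calculation: a paired t-test on the after-minus-before differences. The 2.353 cutoff mentioned in the comment is the standard one-tailed t critical value for 3 degrees of freedom at the 5% level.

```python
import statistics

before = [205, 241, 338, 368]   # max lift (lb) before the class
after = [295, 252, 330, 360]    # max lift (lb) after the class

# Matched pairs: work with the differences, one per player
diffs = [a - b for a, b in zip(after, before)]   # [90, 11, -8, -8]

mean_d = statistics.mean(diffs)                  # 21.25
sd_d = statistics.stdev(diffs)                   # sample std dev of the differences
n = len(diffs)

# t statistic for H0: mean difference = 0
t = mean_d / (sd_d / n**0.5)
print(f"mean diff = {mean_d}, t = {t:.2f}")      # t ≈ 0.91

# The one-tailed critical value for df = 3 at the 5% level is about 2.353,
# so t ≈ 0.91 gives no evidence that the class raised the players' lifts.
```

Notice how one huge gain (player one) is washed out by the large spread of the differences: the mean difference is positive, but the variation is so big that the result is nowhere near significant with only four players.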