Welcome back, everyone. Today we're going to be talking about hypothesis testing. We want to be able to make claims, and we want to be able to differentiate a true claim from an untrue claim. One of the strongest tools we have for doing that is hypothesis testing, and most science uses hypothesis tests to say something about a claim: whether it's supported or whether it's not. There are a couple of different concepts we need to think about whenever we're talking about hypothesis testing, and one of those is confidence intervals. Like we talked about before, confidence intervals are one way to estimate a population parameter, and another way to make a statistical inference is to make a decision about a particular parameter. We look at confidence intervals a lot whenever we look at distributions, or at measurements we've taken for some hypothesis we're making, because they measure how confident we are about a claim. So we use confidence intervals quite a bit whenever we're doing things like hypothesis testing. Now, some examples. Imagine a car dealer advertises that its new small truck gets 35 miles per gallon on average. I'm afraid I can't convert that to metric in my head, but it's quite good gas mileage. So a car dealership says this truck gets much better than average gas mileage. How do we actually test that claim? It seems reasonable enough; a small truck could get 35 miles per gallon, but that seems like really good mileage, especially for a truck. So how can we test this claim? Or: a tutoring service claims that its method of tutoring helps 90% of its students get an A or a B.
Again, how can we test this claim? If we had the data, would we be able to verify whether it's actually true? Can we be sure that it's actually the tutoring service that helps 90% of its students get a very good grade, or is it something else? Or: a company says that women managers in their company earn an average of $60,000 per year. Again, how can we test this? A statistician will make a decision about these claims; whenever we're doing statistics, we really want to make some decision about a particular claim. Hypothesis testing involves collecting data from a sample and evaluating that data. Like we saw before, all of those claims are either made up or they come from data. So if we're doing statistics, we're gathering data, just like we did when we collected enough measurements to build a distribution, and we're trying to evaluate what that data actually means, what it's telling us. The statistician makes a decision as to whether or not there is sufficient evidence, based on analysis of the data, to reject what's called the null hypothesis. The null hypothesis, which we'll define in a second, is what we're actually trying to reject. We're trying to show that the null hypothesis is not likely to be true. That doesn't necessarily mean it's false, but it's not likely to be true, and we're trying to reject it. Now, this is a very important point, not only for statistics but for basically all of science. Most people think that scientists are trying to prove something is true. Listen to what I'm saying here: most people think that scientists are trying to prove that something is true, but that's not correct.
The way you have to think about it is that scientists are trying to show that the null hypothesis is not true. We'll define the null hypothesis in a moment, but basically scientists are trying to reject the other possibilities in support of the possibility they are hypothesizing. They're trying to reject everything else except what they think is true, and in this case that "everything else" is the null hypothesis. So whenever scientists are doing tests, they're usually testing the null hypothesis and attempting to reject it. Now let's see how they do that. A hypothesis test consists of two contradictory hypotheses or statements, a decision based on the data, and a conclusion. A hypothesis is essentially an educated guess about the answer to some question. So say my question is: does this truck really get 35 miles per gallon? What is my hypothesis? My hypothesis here is that this truck gets 35 miles per gallon or better. That's something I can now test. How? We can take this truck, drive it around for a while, fill up the gas tank several times, and eventually we would have enough data to calculate whether we actually do get 35 miles per gallon with this truck or not. But what is the contradictory hypothesis here? My hypothesis is that yes, this truck does get at least 35 miles per gallon. There's always a contradictory hypothesis, and the contradictory hypothesis, the null hypothesis, is that no, the truck does not get 35 miles per gallon.
So as soon as we make a guess, if we've formulated the hypothesis correctly, there's always another explanation that is the opposite: yes, it gets at least 35 miles per gallon, or no, it gets less than 35 miles per gallon. We're looking for this opposite, contradictory hypothesis. You have to think not only about what your hypothesis is, but about what the contradictory hypothesis is, and that contradictory hypothesis is what we call the null hypothesis. So first, set up two contradictory hypotheses: basically what you want to show, and then the opposite of that. And we are trying to disprove the opposite. It seems counterintuitive that we would try to disprove the opposite of what we're trying to prove, but believe me, it makes sense once you start doing it. Once we've set up these two contradictory hypotheses, we collect sample data and determine the correct distribution to perform the hypothesis test with. Here we come back to the distribution of our data. If our data is normally distributed, it becomes quite easy to analyze. Other distributions are useful as well, but the normal distribution is quite straightforward. Then we analyze the sample data by performing the calculations that will ultimately allow us to reject, or decline to reject, the null hypothesis. Either we reject the null hypothesis, meaning we do not believe the null hypothesis is supported, or our data shows that we can't reject it, meaning we can't say the null hypothesis is definitely not true. So we have rejection of the null hypothesis, or non-rejection, where we haven't shown that the null hypothesis is false.
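The truck example can be sketched in code. This is a minimal illustration, not a real study: the fill-up readings below are made-up numbers, and the p-value uses the standard normal CDF as a rough stand-in for the t distribution.

```python
import math
import statistics

# Hypothetical fill-up readings in miles per gallon (made-up data for illustration).
mpg = [33.8, 35.2, 34.1, 36.0, 34.7, 33.5, 35.6, 34.9, 34.3, 35.1]

mu0 = 35.0                       # dealer's claim under H0: mean mpg >= 35
n = len(mpg)
xbar = statistics.mean(mpg)      # sample mean
s = statistics.stdev(mpg)        # sample standard deviation

# Test statistic: how many standard errors the sample mean sits below the claim.
t = (xbar - mu0) / (s / math.sqrt(n))

# One-sided p-value: probability of seeing a sample mean this low or lower
# if H0 were true, approximated with the standard normal CDF.
p_value = 0.5 * math.erfc(-t / math.sqrt(2))

alpha = 0.05
print(f"mean = {xbar:.2f}, t = {t:.2f}, p = {p_value:.3f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

With this particular made-up sample the mean comes out just under 35, but the p-value is well above 0.05, so we fail to reject the dealer's claim: the data is consistent with 35 miles per gallon.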
If we can't reject the null hypothesis, we therefore cannot say that the hypothesis we actually want, the alternative, is true, because the null hypothesis could still be true. Based on our data, if we can't reject it, we don't really know what is true; we can't find enough evidence to support either case. Then we make a decision, and we write some sort of meaningful conclusion. This is one of the most important parts. Once we make a decision, once we say that no, we cannot reject the null hypothesis, or yes, I'm confident we can reject the null hypothesis, so our hypothesis is supported, or significant, as we'll say, then we need to decide what our data is actually telling us, and we'll talk about that in a second. And then we need to write a meaningful conclusion. This meaningful conclusion is essentially the knowledge that we've found. In the study we did, did we find out that the car dealer was actually lying when they said the truck gets 35 miles per gallon? What kind of meaningful conclusion can we pull out of this? And we need to write it down or represent it, because if we don't, no one else will get access to that information, and we will be the only ones with that knowledge. And why should anyone believe us? If we don't actually show them the studies we've done, if we don't show them that our conclusions are correct, why should anyone else believe our conclusions? So getting the knowledge out there is also a very important part of hypothesis testing, of statistics, and of science in general. Next: null and alternative hypotheses. These are what contain the opposing views. The null hypothesis goes against my proposed hypothesis; it's the opposite.
And the alternative hypothesis is the hypothesis that I am proposing. More formally, the null hypothesis is a statement about the population that either is believed to be true or is used to put forth an argument, unless it can be shown to be incorrect beyond a reasonable doubt. That's really what we're trying to establish here: can it be shown to be incorrect beyond a reasonable doubt? And that is something we can measure. If our data does not support the null hypothesis, we can say that it is incorrect beyond a reasonable doubt, and we can measure essentially how much confidence we have in that claim. Because we've talked about confidence intervals, we can measure how confident we are that we are far away from our null hypothesis and closer to our alternative hypothesis. The alternative hypothesis is a claim about the population that is contradictory to the null hypothesis, and it's what we conclude whenever we reject the null. If we have enough data to reject the null hypothesis, the statement believed to be true, the opposite of the hypothesis we want to support, then we can say that we accept our alternative hypothesis, and by rejecting the null we can show how confident we are that our alternative hypothesis is actually significant, that it's correct, essentially. This brings us to error testing, and there are two different types of errors, Type I and Type II. The best way to think about these is to walk through the four possible outcomes; here, H0 denotes the null hypothesis. First: the decision not to reject H0 when H0 is in fact true. That is a correct decision.
So we do not reject the null hypothesis when the null hypothesis is true, and that's the correct outcome; that's not an error at all. In the language of testing, that's a true negative. Second: the decision to reject H0 when H0 is true. That's an incorrect decision known as a Type I error, a false positive. We might reject the null when the null is actually true; that can happen sometimes, and that's a Type I error. Third: the decision not to reject H0 when in fact H0 is false. That's an incorrect decision known as a Type II error, a false negative: we keep the null hypothesis even though it's false. Fourth: the decision to reject H0 when H0 is false. That's a correct decision, a true positive, and its probability is called the power of the test. So we have two correct outcomes and two types of errors. Using those errors, we can calculate the error rate of our overall experiment or test. And these types of errors are one reason why we replicate science so much. If we just ran an experiment one time, collected a bunch of data, and analyzed it once, we might make one of these errors about our hypotheses without knowing it. If I only ran the experiment once, I might get a Type I error, but I wouldn't know that I got a Type I error, because I only ran the experiment once. This is why experiments tend to be run multiple times.
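One way to see what a Type I error rate means is a small simulation, sketched below under assumed numbers (a known population standard deviation and a two-sided z-test at the 5% level). We repeatedly draw samples in a world where H0 really is true and count how often the test wrongly rejects it; the false-positive rate should come out near the significance level.

```python
import math
import random
import statistics

random.seed(0)

alpha = 0.05
mu0, sigma, n = 50.0, 10.0, 30    # assumed true mean, known sd, sample size
z_crit = 1.96                     # two-sided critical value at alpha = 0.05

trials = 2000
false_positives = 0
for _ in range(trials):
    # Draw a sample from a population where H0 is actually true.
    sample = [random.gauss(mu0, sigma) for _ in range(n)]
    z = (statistics.mean(sample) - mu0) / (sigma / math.sqrt(n))
    if abs(z) > z_crit:           # rejecting a true H0 is a Type I error
        false_positives += 1

rate = false_positives / trials
print(f"observed Type I error rate: {rate:.3f} (target {alpha})")
```

The observed rate hovers around 0.05, which is exactly the point: even when the null is true, a test at the 5% level will reject it about one time in twenty, and that's why a single unreplicated result is weak evidence.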
Maybe we collect data multiple times, or we separate the data into multiple parts and analyze the different parts, to see whether we're getting different kinds of errors. So again, more data and more testing is usually better, especially when we're trying to determine whether we have a Type I or Type II error. After multiple outcomes, like tests, observations, or surveys, we can determine whether our null hypothesis is supported and what its error rate is. Out of these multiple outcomes we can figure out: can we reject the null hypothesis or not? If we can't reject the null hypothesis, then the alternative hypothesis we have proposed is essentially not supported, or at least we can't statistically say whether it's supported. If we can reject the null hypothesis, then our hypothesis is more likely to be supported. Instead of just measuring how much of something there is, we're trying to explain the cause of something. A systematic way to decide whether to reject or not reject the null hypothesis is to compare the p-value to a preset, or preconceived, significance level. Basically, we are trying to determine whether our alternative hypothesis is what we call significant. Let me read the definition first. The p-value is the probability that, if the null hypothesis is true, the results from another randomly selected sample will be as extreme or more extreme than the results obtained from the given sample. A large p-value calculated from the data indicates that we should not reject the null hypothesis; the smaller the p-value, the more unlikely the outcome under the null, and the stronger the evidence against the null hypothesis.
So essentially you can think of the significance level as a threshold of acceptance. As the definition says, if we have a large p-value calculated from the data, then we can't say much about the null hypothesis with very much confidence, so we should not reject the null hypothesis: our p-value is too big. The smaller our p-value gets, the more unlikely the outcome would be if the null were true, and the stronger the evidence against the null hypothesis. So just remember: the larger the p-value, the more likely we should not reject the null hypothesis; the smaller the p-value, the more likely we should reject the null hypothesis. And this also depends on our significance level, on how significant our data is. Essentially we set a decision threshold, where we say that our data has this much error above and below it. If the points we are observing are beyond that level of error, then our hypothesis is supported; if they're within that level of error, then we can't differentiate. A preset significance level is the probability of a Type I error: rejecting the null hypothesis when the null hypothesis is true. So the significance level is the threshold I was describing. We have this level of error, and we want to calculate how big it is. If the p-value is less than the significance level, that is strong evidence against the null hypothesis. So when do you make a decision to reject or not reject the null hypothesis? Basically, do as follows.
If the significance level is greater than the p-value, reject H0; basically, the p-value falls within the error rate. So if our p-value falls below the significance level, reject H0: the results of the sample data are significant. There is sufficient evidence to conclude that H0 is an incorrect belief and that the alternative hypothesis, which we write Ha, may be correct. If the significance level is less than or equal to the p-value, do not reject H0: the results of the sample data are not significant. There is not sufficient evidence to conclude that the alternative hypothesis Ha may be correct. And when you do not reject H0, it does not mean that you should believe that H0 is true; it simply means that the sample data have failed to provide sufficient evidence to cast serious doubt on the truthfulness of H0. This is one thing that a lot of people get wrong, and it's actually very similar to court. Think about court: whenever we find somebody not guilty, does that mean they are innocent? No, it just means they are not guilty; we can't say anything about their innocence. The hypothesis we're testing is that they're actually guilty. If we say not guilty, then basically what we're saying is that we don't have enough evidence to say that they are guilty, but we also don't have any evidence to say whether they are innocent or not. So we make a finding of not guilty. And this is the same thing: when you do not reject H0, it does not mean that you should believe H0 is true.
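The decision rule just described fits in a few lines. This is only a sketch of the comparison itself; the function name and wording of the messages are my own.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Compare a p-value to a preset significance level alpha."""
    if p_value < alpha:
        # Significance level greater than p-value: reject the null.
        return "reject H0: the sample results are significant"
    # Otherwise we lack evidence; this is NOT the same as H0 being true.
    return "fail to reject H0: insufficient evidence against H0"

print(decide(0.01))   # small p-value, strong evidence against H0
print(decide(0.20))   # large p-value, no grounds to reject H0
```

Note that the second branch deliberately says "fail to reject" rather than "accept": just like a not-guilty verdict, it makes no claim that H0 is true.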
We're not saying we support H0; we're just saying that we don't have enough evidence to prove that it's false, not enough evidence to reject it. So failing to reject the null doesn't say anything about whether the null hypothesis is true, just like a court verdict doesn't say anything about the innocence of the person. All we're doing is testing a particular claim: a claim that somebody is guilty, a claim that an alternative hypothesis may be true. This is an important point that's not very clear to most people, but just think of it like court. We are testing for guilt; we don't say anything about the innocence of the person, but we can ask, do we have sufficient evidence to say that they are guilty? If we don't, then we say they are not guilty, but that's not the same thing as saying they are innocent. So again, when you do not reject H0, it does not mean that you should believe that H0 is true; we're just saying we don't have enough evidence to reject it. Now, here is what we're looking at whenever we're looking at these thresholds, especially the significance levels and error rates. What we have here looks like a normal distribution; in fact, I'm positive it's a normal distribution. There are marks at what looks like two standard deviations from the mean, and between them is where we fail to reject H0, given the significance level, or error rate, in the data. We have the mean of our data in the middle, and the area around it is essentially the probability that we fail to reject H0. Then out at the edges, in the tails, we have data that is significant, that shows enough significance to reject H0.
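Those two-standard-deviation boundaries can be checked numerically. The sketch below assumes a standard normal distribution and uses the error function from the standard library to compute how much probability mass sits inside and outside the two-sigma marks.

```python
import math

def phi(z: float) -> float:
    """Standard normal CDF, computed from the error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

z = 2.0                           # roughly two standard deviations from the mean

# Central mass between -2 sigma and +2 sigma: the "fail to reject H0" region.
inside = phi(z) - phi(-z)
# Mass out in the two tails: the rejection region.
tails = 1.0 - inside

print(f"inside +/-2 sigma: {inside:.4f}")
print(f"in the tails:      {tails:.4f}")
```

About 95.45% of the mass lies within two standard deviations and about 4.55% in the tails, which is why "two standard deviations" and "the 5% significance level" are so often mentioned in the same breath (the exact 5% cutoff is at about 1.96 sigma).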
Now notice, we're not making any direct claim here about the probability of accepting H1. Rejecting H0 and accepting H1 are connected, because they are, or should be, opposites of each other, but here we are only testing whether we fail to reject H0 or reject H0. If we reject H0, then we can conclude that H1 is supported, because it's the opposite. So within this central range, we fail to reject the null hypothesis, and anything beyond, in this case, two standard deviations, given the data that we have, means we reject H0 and therefore accept, or support, the alternative hypothesis. That's what the distribution looks like. So consider what the p-value means, and make sure you form your conclusions carefully: what does it mean for the null hypothesis to be rejected or not rejected? One of the biggest problems people have is actually forming a proper hypothesis. If you don't form a proper hypothesis, or you don't know what it means for your hypothesis to be supported or not, or you form a hypothesis but don't understand what it means for the null hypothesis to be rejected or not rejected, then you could form incorrect conclusions even though the rest of your study is correct. So be very careful about how you form conclusions, and make sure you understand exactly what you mean by your hypotheses. Okay, that's it for hypothesis testing with single samples. Next we'll talk about hypothesis testing with multiple samples. Thank you very much.