There are some controversies related to the use of p-values. P-values are the most common tool we use for statistical inference, but there are issues with the technique. Some of the issues are fundamental: some people argue that null hypothesis significance testing is an illogical approach, that it doesn't answer the question we actually want answered, and that it focuses attention on the wrong thing. There is also evidence that if we base publication decisions, which studies get published and which do not, on p-values, we distort the body of knowledge: only studies that support their hypotheses get accepted, so there is a bias toward confirmation. Some of these problems are not specific to null hypothesis significance testing, but it is still useful to understand the limitations of the technique and the most common misunderstandings about it.

This slide lists six different statements about null hypothesis significance testing. Assume we have found that p is less than 0.01 in our study. Does that mean we have disproven the null hypothesis, that is, shown that there is no difference in the population? Have we found that the probability of the null hypothesis being true is one percent? Have we proven the experimental hypothesis that there is a difference? Can we deduce the probability of the experimental hypothesis being true? Do we know that, if we reject the null hypothesis, the probability that we are making a wrong decision is small? Or do we have a reliable finding, in the sense that if the experiment were repeated, the replications would arrive at the same result? All of these statements are false. They are commonly held beliefs, documented in the management research literature, and you can see them all over published articles. But they are not true, because the p-value does not tell us any of these things, at least not directly.

So we have to understand the criticisms of the p-value, and there are three main points. The first is that the p-value doesn't tell us what we want to know. We have our sample, and we have calculated a statistic from it; what we want to know is how certain we can be that there is an effect in the population. The p-value doesn't tell us that. It tells us the probability of getting the effect we just observed in the hypothetical scenario where there is no effect in the population. The probability of the null hypothesis being true given the data is not the same thing as the probability of the data given that the null hypothesis is true. So the p-value doesn't tell us what we would like to know. But we want it to tell us that so badly that we nevertheless often treat the p-value as evidence for the existence of an effect, when in reality it is only indirect evidence at best.
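To make the distinction concrete, here is a minimal simulation sketch in Python. The numbers are made up (a mean difference of 0.5 between two groups of 30); the point is what the p-value actually computes: the probability of data at least this extreme in a world where the null hypothesis is true, not the probability that the null hypothesis is true.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-group experiment: suppose we observed a mean
# difference of 0.5 between two groups of n = 30 (invented numbers,
# purely for illustration).
observed_diff = 0.5
n, sd = 30, 1.0

# Simulate the hypothetical world where the null hypothesis is true:
# both groups come from the same population, so any difference
# between their means is pure sampling noise.
sims = 100_000
a = rng.normal(0.0, sd, size=(sims, n))
b = rng.normal(0.0, sd, size=(sims, n))
null_diffs = a.mean(axis=1) - b.mean(axis=1)

# The p-value answers: "If the null were true, how often would we see
# a difference at least this large?"  That is P(data | H0) ...
p_value = np.mean(np.abs(null_diffs) >= observed_diff)
print(f"P(|diff| >= {observed_diff} | H0 true) ~= {p_value:.4f}")

# ... which is NOT the same quantity as P(H0 true | data).  Nothing in
# this simulation says how probable the null hypothesis itself is;
# that would require prior information, as discussed later.
```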
There is also another angle to the criticism: that p-values are illogical. The idea of the p-value can be thought of as deductive reasoning. If the null hypothesis is true, then we should not observe this kind of data; we do observe this kind of data; therefore the null hypothesis cannot be true. If these are statements with absolute truth values, the deduction works: if the null hypothesis implies not-data, and we observe the data, then the null hypothesis is false. The problem is that the null hypothesis statement is probabilistic. We say that if the null hypothesis is true, then observing this kind of data is very unlikely, and the p-value quantifies that likelihood. We merely say that it is unlikely to get these observations by chance alone. When we then get such an observation, we cannot conclude that the null hypothesis is very unlikely; that inference is not logically valid.

One way to understand why it is not valid is to substitute meaningful statements for the null hypothesis and the data. A classic example: if a person is hanged, then the person is dead; the person is not dead; therefore the person was not hanged. Here the null hypothesis is that the person was hanged, the observed consequence is that he is dead, and the deduction works. When the statements are probabilistic, it breaks apart. Another classic example: if a person is American, then the person is very unlikely to be a member of Congress, because Congress has a few hundred members and there are hundreds of millions of people in America. Now we observe that a person is a member of Congress. We cannot infer that the person is very unlikely to be American; on the contrary, you have to be American to be a member of Congress. So when we move from statements with absolute truth values to probabilistic statements, the reasoning breaks apart.

The final criticism, which I think is the most important one and the most commonly misunderstood, is that a small p-value does not tell us whether there is an important effect. It only tells us something about the plausibility of the effect being exactly zero in the population. Many effects are not zero but are so small that they make no practical difference, and the p-value does not tell us whether an effect is meaningfully large; you have to interpret other statistics to understand that. This is a big problem, because quite often you see articles, for example ones applying regression analysis, that conclude that an effect is statistically significant and after that never interpret whether the effect is large or not. They just say that it is not zero, and therefore they have an interesting result. It doesn't work that way: you need a meaningfully large effect to have an interesting result, and the p-value, unfortunately, does not tell you that.
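A small simulation sketch makes this last point; the numbers are invented. With a large enough sample, even a trivially small effect produces a highly "significant" p-value.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Invented example: a real but practically meaningless effect.
# Group B scores higher than group A by 0.02 standard deviations.
n = 1_000_000
a = rng.normal(0.00, 1.0, size=n)
b = rng.normal(0.02, 1.0, size=n)

t, p = stats.ttest_ind(a, b)
print(f"p-value: {p:.2e}")                            # tiny: "significant"
print(f"mean difference: {b.mean() - a.mean():.3f}")  # ~0.02: negligible

# The p-value says the effect is probably not exactly zero in the
# population; it says nothing about whether the effect is large enough
# to matter.  Effect sizes must be interpreted separately.
```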
So p-values have these issues: they are misunderstood, they have some fundamental problems, and the way we use them to judge which papers are publishable is also problematic. There have been efforts to address these issues. One extreme is that some journals are banning null hypothesis significance testing outright. If your article includes any p-values, don't try to submit it to Basic and Applied Social Psychology, for example; they will send every article that contains p-values back to the authors and tell them to remove the p-values. The Strategic Management Journal is de-emphasizing p-values too, so there is a trend of de-emphasizing or even outright banning p-values. That is one way of addressing the issue.

Another way is related to the choice of which studies get published. If we only publish studies that have small p-values, then we inflate the false positive rate and we also bias the reported results; why that is the case, I will explain in another video. A way to address this problem is to register studies. Before you analyze your data, before you even collect your sample, you write a study plan and submit it to an online repository that tells readers what you are planning to do. That plan is then reviewed instead of the finished paper with its results. When your research plan is what gets reviewed, your study is not judged on whether the p-values are small but on the strength of the design, which is a much more meaningful measure of quality than the p-value. So registered reports are another upcoming trend that you should be aware of.

There are lots of readings about these controversies, and I really like the paper by Nuzzo in Nature about statistical errors. It is a three- or four-page paper that explains the issues I went through in this presentation, and it is well worth the time to read.

Then there is the question: can we do better? Null hypothesis significance testing, the p-value, and the confidence interval all suffer from the same problem. So what alternatives do we have? If we ban null hypothesis significance testing and we ban confidence intervals, what remains? Not reporting anything about the precision of our estimates would not be a good idea. Ultimately we want to say something about the truth value of the effect in the population. Instead of merely rejecting the null hypothesis, we would like to say how confident we are that the null hypothesis is true or false in the population. We cannot know that based on the p-value alone; we can know it if we also know the prior distribution of true hypotheses.

Let's take clairvoyance as an example. In this hypothetical example we throw two dice and ask a person to predict the outcome. They can either guess, or, if they can see the future, they will know what the dice will show and answer correctly. If the person predicts both dice correctly, say a one and a six, then we reject the null hypothesis of guessing, because getting both dice right by chance is one out of 36, which is less than 0.05. So either we have a false positive, meaning the person was guessing and got lucky, or a true positive, meaning the person actually knew what the dice were going to be and is clairvoyant. What can we conclude? We know the test has a one-in-36 false positive rate, because guessing both dice correctly happens about one time in 36, and let's say the test has a 100% success rate for clairvoyant people: if a person can foresee the dice, they will certainly answer correctly. Now suppose one in a million people is clairvoyant; it exists, but it is fantastically rare. Then, even after rejecting the null hypothesis, the probability that the person is clairvoyant is still only about one in 28,000. The reason is that out of the 999,999 people who are not clairvoyant, about one in 36, roughly 27,778 people, will pass the test by luck. There is one clairvoyant person who gets it right, and there are no false negatives. So we compare that one person against roughly 27,778 lucky guessers, which gives a probability of about one in 27,779. This is the idea of Bayesian statistics.
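Here is the same arithmetic as a minimal Bayes' rule sketch. The one-in-a-million base rate and the perfect sensitivity are the assumptions from the example above, not empirical facts.

```python
# Bayes' rule for the clairvoyance example.
prior_clairvoyant = 1 / 1_000_000   # assumed base rate: one in a million
sensitivity = 1.0                   # clairvoyants always name both dice
false_positive_rate = 1 / 36        # guessers get both dice right by luck

# P(correct) = P(correct | clairvoyant) P(clairvoyant)
#            + P(correct | guessing)    P(guessing)
p_correct = (sensitivity * prior_clairvoyant
             + false_positive_rate * (1 - prior_clairvoyant))

# P(clairvoyant | correct), via Bayes' rule
posterior = sensitivity * prior_clairvoyant / p_correct
print(f"P(clairvoyant | correct) = {posterior:.8f}"
      f" (~ 1 in {1 / posterior:,.0f})")
# ~1 in 27,779: among a million test-takers, ~27,778 lucky guessers
# pass the test for every single true clairvoyant.
```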
So in Bayesian statistics we include prior information, our beliefs about the prior distribution of the phenomenon, in the analysis, and based on that prior information we can say something about the phenomenon we are studying that goes beyond the p-value. The problem, of course, is: how do we know these priors? How do we know that our test has a 100% success rate? How do we know that one in a million people is clairvoyant? That is the problem with Bayesian analysis. In Bayesian analysis we add information based on what we knew before the study, and that allows us to make inferences that are slightly different from those based on p-values; but the problem remains how we can know those priors.

Bayesian analysis sounds attractive, but it has been available for a long, long time; this is not a new thing. And Bayesian analysis has been "coming" for a long time too. For example, there is an article published recently in the Journal of Management saying that the time for Bayesian analysis is now, and there have been articles like that for decades. This is something you should keep up with: it could be that Bayesian analysis, where we include prior information in our statistical analysis, becomes more popular. But nowadays it is not that commonly used. I can't name many papers that apply Bayesian logic; I have applied it in one paper myself.

The problem with Bayesian analysis is not only the priors, which you would have to know. It is also that, because these techniques are not commonly used in business studies, reviewers who get these studies on their desks don't know what to do with them. We don't know how to evaluate them properly, and perhaps we are skeptical of those studies for that reason. So there is a kind of chicken-and-egg problem: this could in some ways be a better approach to statistical analysis, but because we have not been doing it in the past, we are not doing it now. A more pragmatic way of thinking about this is that you will have to know p-values anyway, because at least 99% of published studies use them. Once you know p-value-based statistics, you can start publishing yourself, or you could spend an additional year learning Bayesian analysis. Most people will choose to go publishing, because your PhD and your tenure depend on that.
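To illustrate what "adding prior information to the analysis" means in practice, here is a minimal, hypothetical sketch: a conjugate beta-binomial update of a success rate under two different priors. The data and the priors are invented for illustration; the point is that the posterior, and therefore the inference, depends on the prior you can justify.

```python
from scipy import stats

# Hypothetical data: 7 successes in 10 trials (invented for illustration).
successes, trials = 7, 10

# Two analysts with different prior beliefs about the success rate.
priors = {
    "flat prior Beta(1, 1)":      (1, 1),
    "skeptical prior Beta(2, 8)": (2, 8),  # prior belief: rate is low
}

for name, (a, b) in priors.items():
    # Conjugate update: Beta(a, b) prior + binomial data
    # -> Beta(a + successes, b + failures) posterior.
    posterior = stats.beta(a + successes, b + (trials - successes))
    lo, hi = posterior.interval(0.95)
    print(f"{name}: posterior mean = {posterior.mean():.2f}, "
          f"95% credible interval = ({lo:.2f}, {hi:.2f})")

# The two posteriors differ because the priors differ.  That is both the
# strength of the Bayesian approach (it uses what we already know) and
# its practical weakness (we have to justify where the prior came from).
```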