We could discuss the holiday season, but I think for right now, we should just talk about statistics. You know, this'll be a normal episode. Humans are, unfortunately, pretty good at seeing whatever it is that they want to see in a set of data, regardless of what that data actually says. The observer-expectancy bias is something that we've all experienced at one time or another, a sort of wishful interpretation of evidence. Whether it's convincing ourselves that the gas gauge isn't quite at empty yet or that the person that we're dating isn't flawed in any way, it's incredible how far we can bend our perceptions to suit whatever we want to be true. One of the reasons that science is so good at discovering facts about the universe is that rather than depending on highly variable human intuition to determine whether a certain trend in data is significant or not, scientists use math instead, which is less prone to wishful thinking. Generally speaking, to be taken seriously, scientists have to demonstrate that the patterns they see in their results pass a certain threshold of probability: that it would be really, really weird to see these numbers if that pattern didn't exist. That threshold varies by field, but the idea is more or less the same everywhere: set a bar for how unlikely your results would have to be under pure chance before you can say a pattern is real instead of just some random thing that happened while you were testing. Because of the objectivity and power of this method, scientists are able to discover some incredibly subtle patterns and be mostly certain that they exist. This is how we know about stuff like dark energy despite the fact that it's incredibly hard to measure. However, this sensitivity to patterns can be a bit of a problem in certain contexts, when we're trying to translate scientific results into behaviors and policies.
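That "set a bar" idea can be made concrete with a quick simulation, a minimal Python sketch that isn't from the episode itself: run a bunch of experiments where there is genuinely no effect at all, and count how often random noise alone sneaks past the conventional p < 0.05 bar. The function names here are my own invention for illustration.

```python
import math
import random

def null_experiment(n, rng):
    """Run one 'experiment' on pure noise: n draws from a standard
    normal distribution, where the true effect is exactly zero.
    Returns the z-statistic for the sample mean."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    mean = sum(xs) / n
    # |z| > 1.96 corresponds to a two-sided p-value below 0.05
    return mean * math.sqrt(n)

def false_positive_rate(experiments=2000, n=100, seed=42):
    """Fraction of no-effect experiments that still look 'significant'."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(experiments)
               if abs(null_experiment(n, rng)) > 1.96)
    return hits / experiments

# Even with no real pattern anywhere, roughly 5% of experiments
# cross the bar by luck -- which is exactly the error rate the
# p < 0.05 threshold is calibrated to allow.
print(false_positive_rate())
```

In other words, the threshold doesn't eliminate flukes; it just caps how often they happen, which is why one result crossing the bar is weaker evidence than many independent ones.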
Let's say that we're medical researchers. We've done a proper double-blind study, we've crunched the numbers, and we are 99.999% certain that the new drug we've developed lowers blood pressure. Here's a graph of what that math might look like. On the right-hand side, we have a group of people who didn't get our drug and got a placebo instead. There are a few who have higher blood pressure and a few who have lower blood pressure, but most of them are clustered around this central value here. On the left-hand side are people who took the drug. Their distribution has moved further down the scale, meaning that, generally speaking, they have lower blood pressure. Some are lower than others, but there is a statistically significant shift in this curve. So, great. Obviously, we should start mass-marketing this drug for people with high blood pressure. But you'll notice that I haven't labeled the x-axis yet. What if it looks like, say, this? There is still a statistically significant shift in this distribution, but it's not very big when you're talking about blood pressure, only one or two millimeters of mercury. That's not going to affect anybody's health appreciably. The effect of that drug, as I've described it, is statistically significant, but not clinically significant. It has a definite effect, but that effect isn't big enough to warrant changing our behaviors or policies. This idea of clinical significance is prevalent in medicine, but it's applicable to all sorts of different things. For example, this 1978 paper has been widely cited as justification for the idea that men have better spatial reasoning than women, a factoid which has been used to justify everything from differences in enrollment rates in engineering programs to why dad should get to read the trail map. Here are some of the results from that paper.
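The drug example can be put in numbers. Here's a hedged Python sketch, with made-up trial figures chosen purely for illustration: with enough participants, even a 1 mmHg shift produces an overwhelming p-value, while the standardized effect size stays tiny.

```python
import math

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def two_sample_z(mean_a, mean_b, sd, n_per_group):
    """z-statistic and two-sided p-value for the difference between
    two group means with a common standard deviation (the large-n
    normal approximation to a two-sample t-test)."""
    se = sd * math.sqrt(2.0 / n_per_group)
    z = (mean_a - mean_b) / se
    p = 2.0 * (1.0 - normal_cdf(abs(z)))
    return z, p

# Hypothetical trial: placebo arm at 120 mmHg, drug arm at 119 mmHg,
# both with a spread (SD) of 10 mmHg, and 50,000 people per arm.
z, p = two_sample_z(120.0, 119.0, sd=10.0, n_per_group=50_000)
cohens_d = (120.0 - 119.0) / 10.0   # standardized effect size

print(f"z = {z:.1f}, p = {p:.3g}")  # wildly 'significant' p-value
print(f"d = {cohens_d:.2f}")        # but a tiny effect size
```

The p-value mostly measures how sure we are that the shift exists; the effect size measures whether anyone should care. A huge sample can make the first number impressive without moving the second at all.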
As you can see, the authors have done their due diligence to demonstrate that there is a statistically significant difference between these two groups. The smaller the p-value, the better, and some of these p-values are as small as .001, in a psych paper, show-offs. But take a look at this column, the standard deviations. In every single one of these tests, the difference between the mean values was smaller than the standard deviation within each distribution. To get a better idea of what this means, look at a graph of the results. Not as impressive as the blood pressure thing, is it? A slightly better than average woman here would be better than most men. A slightly worse than average man here would be outperformed by most women. Statistically, yes, there is a pattern in this data that shows that there's some difference, but it's not really an "only one in five mechanical engineers is a woman" or "dad is just naturally better at reading the trail map" sort of difference. Here's another paper with another graph of two statistically different groups, introverts and extroverts. The math shows that there's definitely a difference, but that difference is very small compared to the range of behaviors exhibited by each group. Both groups more or less partake in the whole range of introverted to extroverted behavior some of the time, preferring stuff near the middle and tailing off at the extremes. The differences only really appear in the frequency of moderate activities: an extrovert will smile at a stranger a little bit more often than an introvert will. They're both likely to get overwhelmed at Coachella, and they're both likely to go stir-crazy if they're snowed in for a week. In this light, the ruckus a lot of media raise about the practical importance of the vast gulf between introverts and extroverts is a little bit weird.
Yes, there is a difference, but is occasionally doing something that's a little bit more one way or the other really that big of a deal? Now, this distinction between statistical and clinical significance is important, but we are talking about the interpretation of complicated sets of data, which is a notoriously fickle process, often prone to those biases that I was talking about when we started. Unless you're a statistician or an expert in the field, you probably shouldn't go around casually dismissing scientific results based on how practically meaningful they seem to you. Even experts get it wrong sometimes. But this does highlight the value of dialogue and consensus between scientists for a decent understanding of these subjects. There will always be a few researchers who think that a certain set of results is totally bankrupt and a few who think they're the most significant findings in the field to date, but the majority opinion will probably congregate around something that's close to the truth. Huh, where have I seen that before? How much do you trust your statistical intuition? Please leave a comment below and let me know what you think. Thank you very much for watching. Don't forget to blah, blah, subscribe, blah, share, and don't stop thinking.