I know that we've talked about Occam's razor before, but I really want to get into some of the details. You know how I like to split hairs. Occam's razor, or the principle of parsimony, is sometimes taken to be a fundamental rule of reasoning, a member of the set of laws that we must accept in order to reason properly. It's often invoked to dismiss propositions and ideas that have some sort of obvious frilliness or extra moving parts, and rightly so. If multiple explanations account for the evidence equally well, it seems totally reasonable to prefer the ones that require us to take fewer leaps to cover the same ground. But the nuances of Occam's razor require careful handling, and flailing it about willy-nilly is likely to cause some havoc.

In his book Ockham's Razors: A User's Manual, philosopher Elliott Sober notes some sobering details about how parsimony works and why we might have good reasons to doubt razor-like arguments. The title says razors, plural, because Sober distinguishes two logically distinct versions of the principle of parsimony, which he terms the razor of silence and the razor of denial, both great names for a philosophy metal band.

The razor of silence is the more conservative of the two: if you don't need something to explain some phenomenon, just don't mention it in your explanation. If the fact that it's cold outside fully explains why your car won't start, there's no reason to postulate that your battery must be dying too. The razor of denial is more aggressive, claiming that when choosing between logically incompatible explanations, all things being equal, we should rule against the existence of extra entities. We know that Lee Harvey Oswald shot JFK. The presence of a second gunman is unnecessary to explain that event, so we should conclude that there was no second gunman. The difference between not assuming something and assuming not-something is subtle but important.
The razor of silence is fairly easy to justify on probabilistic grounds, because two things both being true can never be more likely than one of them being true on its own. But the razor of denial makes positive assertions about what we should disbelieve, and as it turns out, it's much harder to support. Although many philosophers and scientists have justified their use of Occam's razor by appealing to intuitions about how the laws of nature ought to be simple, this attitude gave way to more probabilistic approaches in the 20th century, possibly because the success of quantum physics was a remarkable demonstration of just how buck wild those laws can get. Sober performs a rigorous dissection of the razor using both frequentist and Bayesian probability, and arrives at an interesting conclusion: although parsimony can be a good rule of thumb when you're evaluating hypotheses, it seems to be standing in as a proxy for more fundamental principles, and in situations where those principles actually conflict with parsimony, ditching Occam's razor is the best move.

For example, let's take a look at one of the classic uses of the razor of denial: assuming common cause. One of Isaac Newton's rules for practicing natural philosophy was that "the causes assigned to natural effects of the same kind must be, as far as possible, the same." This is partly what motivated Newton's brilliant assertion that apples fall for the same reason that planets orbit. Similar statements can be found in the works of Aristotle and other famous philosophical bigwigs, essentially saying that if you see similar phenomena, you should lean toward explanations in which the same thing caused both of them, rather than a different cause for each. We can restate this idea in terms of probability by linking it to something called the law of likelihood, which says that evidence favors one hypothesis over another if we'd be more likely to see that evidence if that hypothesis were true.
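The probabilistic point behind the razor of silence comes down to one line of arithmetic: a conjunction of claims can never be more probable than either claim alone. A minimal sketch, using made-up probabilities for the cold-car example:

```python
# Why the razor of silence is easy to justify on probabilistic grounds:
# P(A and B) can never exceed P(A). The numbers below are invented
# purely for illustration.

p_cold = 0.3              # P(it's cold enough to stop the car)
p_battery_dying = 0.1     # P(the battery is also dying)

# Even assuming the two are independent (the most generous case),
# the two-part explanation is less probable than the one-part one:
p_both = p_cold * p_battery_dying

print(p_both <= p_cold)   # True, for any choice of probabilities
```

Silently dropping the battery from the explanation never costs you probability, which is why this razor is the easy one to defend.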
This is a little confusing, but stick with me. We're not talking about which of our hypotheses is most probable or anything like that; we're just ranking them based on which ones would most reliably produce a certain bit of evidence. If I flip a coin and it comes up tails, the law of likelihood says that the evidence of the flip favors the hypothesis that the coin has tails on both sides. That's obviously not the most probable explanation, given everything that we know; all sorts of background knowledge should make us expect a normal coin that happened to come up tails. But looking at just the one flip, we'd only see a result of tails 50% of the time with a normal coin, while with a double-tails coin we'd see that result 100% of the time. 100 is more than 50, so according to the law of likelihood, double tails gets the higher ranking.

Now, often, when we see two things that look similar, that evidence favors a common origin, because all things being equal, the law of likelihood favors that explanation. Two people who look exactly the same? You'd see that a lot of the time if they were twins, and only rarely if they weren't, so the law of likelihood favors the twin explanation. Two different objects both fall toward the earth? The law of likelihood ranks a common gravitational force higher than two different forces, which might well accelerate them at different rates.

But what about two different guests showing up to a wedding with the same item off the registry? In this case, the similarity between the phenomena actually makes it less likely that they have the same cause: if one guest bought it off the registry, then short of some weird bug in the software, the other one couldn't have. This actually favors an explanation with two different causes. Maybe one of the guests overheard the groom talking about this cool toaster and circumvented the registry to get it for him.
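The coin-flip ranking above is simple enough to compute directly. A minimal sketch (the function and setup are mine, not Sober's):

```python
# The law of likelihood in miniature: rank hypotheses by how probable
# they make the observed evidence, P(evidence | hypothesis), NOT by
# how probable the hypotheses themselves are.

def likelihood(evidence, p_tails):
    """P(evidence | a coin whose chance of landing tails is p_tails)."""
    return p_tails if evidence == "tails" else 1.0 - p_tails

observation = "tails"                        # one flip, came up tails

fair = likelihood(observation, 0.5)          # 0.5
double_tails = likelihood(observation, 1.0)  # 1.0

# The evidence favors double-tails by a factor of 2, even though our
# background knowledge says fair coins are vastly more common.
print(double_tails / fair)                   # 2.0
```

Note that nothing here says the double-tails hypothesis is probable; the ranking only reflects how reliably each hypothesis would have produced this one observation.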
In the wedding registry case, Occam's razor is at odds with the law of likelihood, and the most parsimonious answer, a single cause for these similar events, is actually improbable. Sober suggests that while Occam's razor is often aligned with likelihood, when it isn't, likelihood is the more fundamental of the two and should win.

Another way to think about parsimony is as a mechanism for choosing better models of the world, although as we'll see, it's worth asking: better for what, exactly? If you've gathered enough data points to try and guess the laws that govern them, like guessing at the equation for the gravitational force between two objects, there are arguments to be made for starting out with simpler equations with a few adjustable parameters and working your way up. After all, equations with lots of high-power terms might be massaged to fit a set of measurements better than less sophisticated ones, but they run the risk of overfitting, exploding into whack-a-doodle values when you tweak their inputs slightly. It seems wise to avoid using volatile terms like 13.2x^8 - 100x^5 when a simple x^2 will do.

While overfitting is a good thing to be wary of, you don't get to congratulate yourself on a job well done just for being parsimonious with your variables. The whole point of building a model in the first place is predictive power: figuring out what will happen. Depending on how much weight you put on Occam's-razoring your way down to a few low-power terms, you may miss out on some useful predictive accuracy in the process. For example, a model that says the coin is going to land either heads or tails has one fewer term than one that says it will land heads 50% of the time and tails 50% of the time. It's not overfit, sure. It's not sensitive to small variations in flip frequency; a run of heads won't send it spiraling out of control.
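The curve-fitting worry above can be sketched numerically. The data, polynomial degrees, and noise level here are all made up for illustration:

```python
# Sketch of overfitting: fit noisy x^2 data with a modest quadratic,
# and with a degree-8 polynomial that has enough terms to chase noise.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 9)
y = x**2 + rng.normal(scale=0.05, size=x.size)   # noisy quadratic data

simple = np.polyfit(x, y, deg=2)   # few adjustable parameters
wiggly = np.polyfit(x, y, deg=8)   # one parameter per data point

# The degree-8 fit passes (almost) exactly through every noisy point...
max_residual = np.max(np.abs(np.polyval(wiggly, x) - y))

# ...but evaluate both models slightly outside the measured range,
# where the true value is just x^2:
x_new = 1.3
pred_simple = np.polyval(simple, x_new)
pred_wiggly = np.polyval(wiggly, x_new)
```

Extrapolating like this typically shows the high-degree fit veering far from x^2 while the quadratic stays sensible, which is the "exploding into whack-a-doodle values" behavior in question: the extra terms earn a perfect score on the measurements by memorizing the noise.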
And it's very accurate, but it's not exactly useful.

In both the model selection and likelihood treatments of Occam's razor, there are some important things to note. First, in both cases, parsimony is just a proxy for something else: either the relative likelihood of a theory or the relative predictive accuracy of a model. It's not an epistemic virtue in and of itself, and it can actually conflict with those goals in weird scenarios, leading dedicated shavers astray. Second, even looking directly at those more fundamental properties, we're just ranking and comparing theories that we've come up with. There's no indication of how likely they are to be true, just which of the ones we've thought of are more or less likely given the evidence, or more or less predictive.

Given these facts, it might make sense to be diligent about wielding the razor of silence wherever possible, and when more discerning cognitive tools are needed, to use likelihood and predictive power directly rather than invoking the potentially unreliable razor of denial as shorthand for them. After all, if it's not strictly necessary to get good results, why mention it?

Can you think of a place where a more nuanced understanding of how the razor of denial works might lead you to a different result than just straight-up parsimony? Please leave a comment below and let me know what you think. Thank you very much for watching. Don't forget to blah blah subscribe, blah share, and don't stop clunking.