I'd probably love to go, but you know that I can't do Thursdays. That's me and my bae's date night. I'll ask, but I don't think that I can pry her away from that new Star Trek.

In this bag, I have a number of black and white marbles. I don't know how many of each there are in there. They might all be one color for all I know. But if I reach in and pick one out, it's black. How likely is it that the next time I pick out a marble, it's going to be black?

It's a pretty simple question, and it has a pretty simple answer. Obviously, if we follow the trend of what's happened already, I've only ever drawn black marbles. 100% of my experimental data shows nothing but black. So I should assume that they're all black, and that the next one will also be black, right? I mean, I guess so? But that feels wrong somehow. It's not unthinkable that they're a 50-50 mix, and I just happened to grab a black one first. Or even that they're all white, except for this one.

There are many possible scenarios which could conceivably have given me this result. If they were all black, sure, I'd expect to see this 100% of the time. But if they were half and half, I'd still expect to see it 50% of the time. Even if there were only one black marble and 99 white ones, I'd still expect to see it come out of the bag first 1% of the time, despite the odds being against it. That's 100 times less likely than if the bag were filled with black marbles, but it's not nothing, and we probably shouldn't ignore it.

That implies a slightly more complicated answer than we started with, one which takes those possibilities into account. Essentially, we're working backwards. To predict what's going to happen in the future, we're figuring out what sets of circumstances might have led to what we've seen so far, and how likely each set is. The cool thing about that approach is that it's very easy to update our guess about the likelihood of those possibilities as we collect more information.
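That "working backwards" step can be sketched in a few lines of Python. This is a toy with invented numbers, not anything from the bag itself: it pretends the bag holds exactly 10 marbles and starts out treating every possible count of black marbles as equally likely, then re-weights those possibilities after each draw.

```python
# Toy sketch of "working backwards": assume a bag of exactly 10 marbles
# (a made-up number for illustration) and a belief over how many of
# them are black, from 0 through 10.

def update(belief, drew_black):
    # Re-weight each possibility by how well it explains the draw.
    posterior = {}
    for k, p in belief.items():
        likelihood = k / 10 if drew_black else (10 - k) / 10
        posterior[k] = p * likelihood
    total = sum(posterior.values())
    return {k: v / total for k, v in posterior.items()}

# Start indifferent, then observe one black marble.
belief = {k: 1 / 11 for k in range(11)}
belief = update(belief, drew_black=True)

# "All white" is now ruled out, "all black" is the single best guess,
# and everything in between keeps some weight. The chance the NEXT
# draw is black averages over all the surviving possibilities.
p_next = sum(p * k / 10 for k, p in belief.items())
```

With these made-up numbers, `p_next` lands near (but not exactly at) the two-thirds answer coming up next, because the bag here is finite rather than the infinite-possibilities version the calculus handles.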
If I drew a white marble next, it's obviously impossible that they were all black to start with, and the odds that they're split 50-50 get much higher.

If you can do a little calculus, you can actually find a simple equation that balances the infinite number of possibilities for the marble question: the number of black marbles you've drawn, plus one, divided by the total number of marbles you've pulled out of the bag, plus two. With just one black marble, that puts the probability that I'll draw another one at two-thirds. That sounds much more reasonable than a hundred percent. If I drew two black marbles, it would be three-fourths. If I drew a black one and then a white one, it would be half, or 50-50, exactly like we'd expect.

Both the marble equation and the general method for updating a guess about the conditions that might have led to a specific observation were discovered by the famous mathematician Pierre-Simon Laplace, building on a paper that Richard Price had published posthumously from the notes of a Presbyterian minister, Thomas Bayes. So, obviously, it's called Bayes' theorem, and it's become a cornerstone for all sorts of important stuff: science, economics, artificial intelligence, cryptography, engineering. In any field which uses mathematical models to predict the future, someone's using Bayesian statistics to do it.

But there's one place you might not expect it to show up: the philosophy of knowledge, or epistemology. In episode 58, we talked a little about the problem of induction, an uncomfortable philosophical objection to the main method humans use to form ideas about the world. I see a raven, it's black. I see a raven, it's black. I see another raven, it's black. I generalize from what I've seen and I say, ravens are black. That's induction, and it's just how we learn things. But if you think about it, all I really know for sure is that every raven I've seen so far has happened to be black.
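The marble equation above is known as Laplace's rule of succession, and it's simple enough to check directly against the numbers from the transcript:

```python
# Laplace's rule of succession, exactly as stated above:
# (black draws so far + 1) / (total draws so far + 2).

from fractions import Fraction

def rule_of_succession(black_drawn, total_drawn):
    return Fraction(black_drawn + 1, total_drawn + 2)

rule_of_succession(1, 1)  # one black draw        -> 2/3
rule_of_succession(2, 2)  # two black draws       -> 3/4
rule_of_succession(1, 2)  # one black, one white  -> 1/2
```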
How do I get from there to the idea that all ravens everywhere are black? I can't actually prove that that's true for a whole universe's worth of ravens unless I observe every single one of them. Which, I mean, I'm a pretty busy guy.

Compare that to deductive reasoning. Something like: all ravens are birds, all birds are animals, therefore all ravens are animals. I can say with 100% certainty that that's true without ever seeing a single raven. But inductive reasoning, which is the foundation of all scientific knowledge, is... well, it certainly seems effective, but I could never actually prove that it is.

Bayes' theorem is a very useful and intuitive response to the problem of induction, because it gives us a formal way to justify some amount of belief about a whole class of things given only a few examples. Black marbles, black ravens: I might not be able to prove you'll never find a white one, but I can still get close to absolute certainty that the next one you see will be black, even if I haven't observed all of them. Bayesian epistemology uses Bayes' theorem as a template for forming various degrees of certainty about beliefs, which gives us a decent way to use induction responsibly and justify being fairly certain that the sun will rise tomorrow, which is helpful.

In fact, it's so good at homing in on intuitively correct answers quickly that it's often used as the gold standard for rational thought. Some of its proponents argue that if you're reasoning properly, free from irrational stubbornness or errors, your conviction in your beliefs should update in a Bayesian fashion when encountering new information. So, you know, every time you're faced with a tough question about the world, you open Excel, start inputting your priors... hey, where are you going? What do you mean Excel is boring? Excel is not boring. Excel is beautiful.

Okay, so it's hard to find a straight line between Bayesian inference and practical everyday reasoning.
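Updating "in a Bayesian fashion" just means applying Bayes' theorem, P(H|E) = P(E|H)·P(H) / P(E), to your degree of belief. Here's a toy illustration; the starting belief and likelihoods are numbers invented for the example, not anything from the video:

```python
# Bayes' theorem applied to a degree of belief in a hypothesis H,
# given one new piece of evidence E.

def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    # P(E) via the law of total probability, then P(H|E).
    p_evidence = (p_evidence_if_true * prior
                  + p_evidence_if_false * (1 - prior))
    return p_evidence_if_true * prior / p_evidence

# Hypothesis: "all ravens are black." Start at 50-50 (an arbitrary
# prior). Seeing a black raven is certain if H is true, and, say,
# 80% likely even if it isn't.
belief = 0.5
belief = bayes_update(belief, p_evidence_if_true=1.0,
                      p_evidence_if_false=0.8)
# belief nudges upward; each additional black raven pushes it
# a little closer to certainty, without ever quite reaching it.
```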
There are certainly some interesting insights we might glean about how we should approach thinking if we're being good Bayesians, but it's not like knowing about Bayesian epistemology suddenly makes you Spock. However, it's possible that your brain is already using Bayes' theorem in a very important way.

The predictive processing (or predictive coding) model of cognition represents the brain as a number of layers of interpretation of the world, each using essentially Bayesian rules to achieve the same general goal: guess what's going to happen next, pretty much the same way we did with the marbles, and minimize surprise. According to the theory, our brains are a set of prediction engines in a hierarchy of abstraction.

At the bottom-most level is the raw information we're receiving from our senses: flashes of light, vibrations of our eardrums, that sort of stuff. The layer that receives all that information, we'll call it layer one, isn't concerned with high-level concepts like ravens or pizza. Its only job is to try to filter out unimportant information and errors from that signal, using guesses supplied by layer two. If you wear a watch or a ring all the time and don't even feel it anymore, it's because that layer has learned to screen out that particular sensation as noise. "Don't even worry about it," it tells layer two. "Too predictable, no need to send this up the chain, I've got it under control."

Layer two, on the other hand, has some very basic abstractions about things like shapes and colors. It can feed layer one predictions about what a blob of red is likely to do: keep being a blob of red. If layer one receives sensory information that the blob of red has suddenly changed to a blob of green, it can do one of two things. Usually, it treats it as a fluke and smooths things over, saying, "Nope, it's still mostly red, no problems here, boss."
But if it's way too weird compared to what layer two is guessing should happen, too surprising to hide, it kicks it up the chain: "Nope, it's definitely green now. Your problem." Similarly, layer two is receiving guesses about what's likely to keep happening from layer three, which maybe has ideas about objects or persistence, which is getting information from layer four, and so on, all the way up to layer n, the top dog, making all the predictions about the world from a huge tower of abstractions, who can say, "No, layer n minus one, it's not actually magic. Magic doesn't exist. He probably just switched the red apple for the green one when we weren't looking."

According to proponents of predictive processing, this model very closely matches what we observe about how brains process incoming information from the senses: a sort of iterated chain of prediction and feedback. There are all sorts of interesting examples of optical illusions and perceptual blind spots that do seem to fit the idea fairly well. And for what it's worth, it more or less matches the subjective experience of surprise, too. We try to explain our surprise from the bottom up, first making sure our senses aren't deceiving us, and only updating our highest-level abstractions when we're confronted with something that we can't explain away.

Of course, as anyone who's familiar with statistics has been screaming incoherently at the screen for the past 10 minutes, Bayesian approaches to interpreting data have their limits and criticisms. Generally, to use them, you have to supply some sort of initial guess about how likely particular scenarios are before you can start homing in on the right one, which can skew the results drastically. That inherent subjectivity makes a lot of people, especially mathematicians, uncomfortable. Still, it's often a very intuitive and effective method for predicting the future, and it has been used to amazing effect. You might even be using it right now.
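The layered "smooth it over or kick it up the chain" idea can be caricatured in a few lines. This is a very loose sketch; the layer predictions, the signal values, and the surprise threshold are all invented for illustration, and real predictive-processing models are far richer than this:

```python
# Caricature of the prediction hierarchy: each layer compares the
# signal from below against its own prediction. Small mismatches get
# smoothed over ("no problems here, boss"); big ones are passed up.

def propagate(signal, predictions, threshold=0.3):
    """Return (layer that absorbed the surprise, what it reports)."""
    for level, predicted in enumerate(predictions):
        if abs(signal - predicted) <= threshold:
            # Close enough: treat the mismatch as noise and
            # report the prediction instead of the raw signal.
            return level, predicted
        # Too surprising to hide: kick the raw signal up the chain.
    # No layer could explain it away; the top must update its model.
    return len(predictions), signal

# The apple trick, with made-up numbers: layer one expects "red" (0.0),
# a middle layer expects colors to mostly persist (0.5), and the top
# layer's sleight-of-hand story predicts "green" (1.0).
propagate(1.0, [0.0, 0.5, 1.0])  # surprise climbs to the top layer
propagate(0.1, [0.0, 0.5, 1.0])  # a small flicker is smoothed over at the bottom
```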
What do you think of Bayesian inference, Bayesian epistemology, and predictive processing as a model for cognition? Please leave a comment below and let me know. Thank you very much for watching. Don't forget to blah blah subscribe, blah share, and don't stop thunking.