Man, do you guys remember in the 80s when this squiggle was on absolutely everything? It really was a sort of trend line, wasn't it? Surprise! Oh, um, hi. I guess you weren't surprised by that. But have you considered why you weren't surprised by that? Why some things are surprising and others aren't? Surprise is the feeling you get when you discover that your mental representation of the world is inaccurate in some way, when some conscious or unconscious prediction about how things will turn out is found to be faulty.

It's obvious why that's the case with something like a jump scare in a horror movie. You expect the scene to continue on as usual, and then suddenly... Ah! Orchestra hit! Scary monster! Ah! Surprise! But there are also more subtle versions of surprise, which we encounter daily: the check for dinner was bigger than we expected, the meeting ran too long, stuff like that. Sometimes that just happens because we lack important information. But other times, it's because the way we built the model in the first place is screwy in some fashion. If the meeting has run over time every single week you've had it, and you're still somehow surprised when it does, you probably shouldn't be.

There are all sorts of ways to screw up a prediction engine, but let's look at one in particular: what happens if you have enough data, but you overinterpret what it means? Let's say that this graph represents a mental model of some phenomenon, like how delicious something is versus how much chocolate is in it. The points on the graph represent data that your senses have reported to you at various times in your life: a fun-sized chocolate bar that you had for Halloween one year, a friend's birthday cake, that one time you had real Belgian hot chocolate.

There are various ways you can draw a curve through these points to predict what future instances of this phenomenon might be like, things you haven't experienced yet. If you just drew a straight line that got as close as possible to all of the existing points, it might tell you that some ice cream with chocolate syrup on top would be pretty good, but with chunks of decadent fudge instead, it would be even better. Or you could try a slightly more complex curve, something more like this. This tells you that up to a point, more chocolate is better, but there are diminishing returns after a while. Yes, fudgy ice cream is probably better than ice cream with chocolate syrup on top, but quadruple chocolate fudge on chocolate ice cream with chocolate syrup and chocolate sprinkles on top? I mean, it's good, but it's not all the way up here.

You'll notice that the second curve also gets much closer to hitting those existing data points than the first curve does. One might imagine that the best possible mental model absolutely nails every single one of them exactly. After all, we want that model to make accurate predictions, and if it doesn't match up closely with our experience of, say, chocolate cake, then we might be nervous about how accurate it really is. To fix that, we might try to bump up the complexity of the curve again, and again, and again, until we have a monstrous function that predicts every single experience we've had with 100% accuracy. But watch what happens to that function when I feed it one more data point, a chocolate eclair, and then adjust it to fit. This says that if I eat something in between hot chocolate and cake, it's going to be the most disgusting thing I've ever put in my mouth. What?
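If you want to watch that happen with actual numbers, here's a minimal sketch in Python. Everything in it is made up for illustration: the chocolate-versus-tastiness data, the "eclair" point, and the polynomial degrees standing in for mental-model complexity.

```python
# A toy version of the chocolate example: fit models of increasing
# complexity, then add one new data point (the "eclair") and see how
# much each model's prediction moves.
import numpy as np

# Hypothetical observations: chocolate content vs. deliciousness.
chocolate = np.array([1.0, 2.5, 4.0, 5.5, 7.0, 8.5])
tastiness = np.array([3.0, 5.0, 6.5, 7.0, 6.8, 6.0])

line  = np.polynomial.Polynomial.fit(chocolate, tastiness, deg=1)
curve = np.polynomial.Polynomial.fit(chocolate, tastiness, deg=2)
exact = np.polynomial.Polynomial.fit(chocolate, tastiness, deg=5)  # nails all 6 points

# One new observation: the eclair.
chocolate2 = np.append(chocolate, 3.2)
tastiness2 = np.append(tastiness, 5.2)
exact2 = np.polynomial.Polynomial.fit(chocolate2, tastiness2, deg=6)  # nails all 7

x = 3.6  # something "in between hot chocolate and cake"
print(f"line:                {line(x):.2f}")
print(f"curve:               {curve(x):.2f}")
print(f"exact before eclair: {exact(x):.2f}")
print(f"exact after eclair:  {exact2(x):.2f}")  # liable to swing wildly
```

The straight line and the gentle curve barely budge when the eclair shows up; the exact-fit polynomial is the one liable to suddenly declare something delicious disgusting.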
The problem here is that the more complex function, although it's very good at fitting the data points we've already gathered, is very sensitive to new bits of information. It's liable to fly out of control if we feed it anything that isn't exactly in the right place. It's not just fitting the available data; it's what data analysts call overfitting. The things we experience are subject to a certain amount of background noise, random variations that can't really be accounted for, and that's a problem for highly sensitive models of the world. Maybe when I had that birthday cake, I got an especially dry slice, or I was feeling congested that day and couldn't enjoy it fully, or any number of other factors colored how my senses reported that particular data point.

With respect to how we think about the world, overfitting manifests as an over-reliance on rare events as representative of how things usually work. Because of that background noise, really weird, one-in-a-million events will happen, and if we try too hard to crowbar them into our models of the world, we're going to end up with some really kooky predictions. This is exactly how superstitions and weird phobias get started. Imagine someone who experiences an engine failure on their plane, a roughly one-in-1.5-million event that usually doesn't even result in a crash, and yet still becomes paranoid about air travel. Even if they fly a lot, that failure will be the single strangest data point in their experience; trying to fit it too closely, instead of treating it as the bizarre fluke it is, might lead them to that irrational conclusion.

Unfortunately, we do live in an age of information, and a lot of the information that makes it past our well-trained attention filters isn't information about the normal, boring, or pedestrian. It's usually about those one-in-a-million events that we've never seen before and likely never will again. I mean, we call it "the news," not "the same old, same old." Just think of the effect of hundreds of news stories inflating the psychological weight of a few rare and unrepresentative events, and how that might totally screw up someone's ideas about what's likely to happen.

Overfitting also crops up in many other contexts, where a person or organization gets obsessed with explaining or optimizing one particular data point. It's a notorious problem in business, where naively incentivizing certain metrics leads to weird system-gaming behavior. Christmas creep, for example, is the result of overfitting the metric "better sales numbers than this time last year," despite the fact that improving that number doesn't improve total sales at all.

It might be a little counterintuitive, but sometimes, to make mental models better at predicting the future, they have to be less sensitive to changes in certain variables. That can mean allowing for more background noise when considering some variables, or it can mean just straight-up ignoring some of them. Reference class forecasting is pretty much exactly that. Instead of getting bogged down in all of the details of a problem, accounting for every data point and all of the bias that comes with them, it asks us to figure out what kind of prediction problem we're looking at, what class it belongs to, and then simply assume that it's going to turn out more or less like all of the other ones.
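Here's a minimal sketch of that in Python, with entirely invented numbers for how long past projects in some reference class took. Notice there's no model of the new case's specifics at all; the forecast is read straight off the distribution of past outcomes.

```python
# A minimal sketch of reference class forecasting. The past durations
# are invented; the method is just "look at how things in this class
# usually turn out" and ignore the specifics of the new case.
import numpy as np

# Hypothetical: hours each past project in the reference class took.
past_hours = np.array([9, 12, 8, 15, 11, 22, 10, 13, 9, 14])

typical = np.median(past_hours)                  # the usual outcome
low, high = np.percentile(past_hours, [25, 75])  # a rough likely range

print(f"Forecast: about {typical:.0f} hours, likely {low:.0f}-{high:.0f}")

# The one intuitive judgment left: where in the distribution does this
# case sit? For instance, if it feels like one of the slow ones:
print(f"Pessimistic forecast: ~{np.percentile(past_hours, 90):.0f} hours")
```

Using the median and percentiles rather than a plain mean keeps one FOREVER-outlier from dragging the forecast around, which is the whole point of making the model less sensitive.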
For example, if I wanted to predict how long it would take me to write and produce this episode of THUNK, my first impulse is to consider a boatload of specifics. How complicated is this topic? How much research do I think I'll have to do for it? How many images can I reuse to illustrate my points instead of having to find new ones? There are probably at least 20 or 30 criteria I might use to estimate what time I'll be able to hit upload tonight, each one of which is subject to bias and, potentially, overfitting. But in reference class forecasting, I dump all those variables in the trash and just look at two numbers: the average length of time it takes me to make an episode (any episode), and the distribution of those times. That is, how many take me FOREVER to make, and how many take a little less than that. I only have to make an intuitive judgment about where in that distribution this episode might sit, and there's my prediction. The end.

That might seem a little drastic or haphazard at first. After all, I've got all these tasty tidbits of information that absolutely have a measurable effect on how long it takes me to make an episode, and I'm just going to ignore them? But it turns out that reference class forecasting is often incredibly accurate, often much more accurate than even experts are able to manage. In a 2010 study, a reference class forecast of the projected cost for several road-building projects was found to be 25% more accurate on average than the experts who originally estimated how much those projects were going to cost. Even with all their experience and detailed information about how big the projects were or how complex construction would be, the "dumb" reference class forecast estimate was reliably better.

Of course, you do have to make sure that you're using the right reference class, which can be a challenge unto itself. For my estimate, I could use a sample of all THUNK videos ever, but the later ones are significantly different from the earlier ones (thank goodness). Maybe instead I should be using something more like "THUNK videos in the past year" or "THUNK videos since I got a new camera." When I select a more specific reference class, I'm potentially gaining accuracy by homing in on exactly what kind of thing I'm looking at, but I'm also potentially losing accuracy, because now I have fewer examples to draw from. And unfortunately, picking a reference class is a largely intuitive process. If you think that I'm using the wrong reference class for my forecasts, then we might get vastly different answers with no real way to tell who's right.

To address that, we might use another tool for dealing with overfitting and other problems with predictive functions: cross-validation. Cross-validation is used to check how well an existing model works when applied to new information. Rather than feeding every available data point into it to really fine-tune it, we instead hold a few of them back to see if the model can predict them accurately.
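As a minimal sketch of that idea (again with invented data), here's a hold-out comparison in Python: a simple model and a very flexible one are both fit on part of the data, then judged only on the points they never saw.

```python
# A minimal sketch of hold-out cross-validation with invented data:
# the flexible model fits its training points better, but the held-out
# points usually expose it as overfit.
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 20)
y = 2.0 * x + 1.0 + rng.normal(0, 1.0, size=x.size)  # noisy linear trend

held_out = np.zeros(x.size, dtype=bool)
held_out[::4] = True   # keep every 4th point out of the fit
train = ~held_out

for degree in (1, 10):
    model = np.polynomial.Polynomial.fit(x[train], y[train], deg=degree)
    train_err = np.mean((model(x[train]) - y[train]) ** 2)
    test_err = np.mean((model(x[held_out]) - y[held_out]) ** 2)
    print(f"degree {degree:2d}: train error {train_err:.2f}, "
          f"held-out error {test_err:.2f}")
```

Whichever model predicts the held-out points better is the one to trust, which is exactly the move in the restaurant example below.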
For example, let's say that you and I were trying to figure out where we should go for dinner, and we disagreed about the most appropriate reference class for a particular restaurant. I think we should be using the reference class of Italian restaurants, which predicts a high likelihood that it'll be delicious, whereas you think we should be using the reference class of restaurants in this area, which unfortunately predicts a low one. After some thinking, we realize that we know someone in this area who has probably eaten at that restaurant. But instead of calling her up immediately to get her opinion, we write down what we think she's likely to say based on our respective models. I think she'll say something like, "Oh my God, be sure to try the Bolognese," whereas you think she'll say something more like, "It's alright if you want a basic slice of pizza, but nothing special." When we call her up, she gives an answer that's more or less what you wrote down, so we know that your model is more likely to be accurate, and we're probably better off driving to a different area to find someplace to eat. Maybe some really good Italian.

Flooded as we are with information about the world, with a clear bias in both the media and our own attention toward noteworthy events, meaning weird or novel ones, it's important to remember that unlikely things happen all the time, and trying too hard to fit them into our conception of how the world usually works can leave us believing some silly things. But using reference class forecasting and cross-validation, we might be able to rescue those mental models from getting totally wrecked by one data point that was a little bit off.

Are you also prone to vastly overthinking things? Please leave a comment below and let me know what you think. Thank you very much for watching. Don't forget to subscribe, and don't stop thunking.