Check. All right, audio looks good. Who are we missing? Natalia. Where is Natalia? All right, I'll get started, and when she comes in you should all stare at her intensely. Okay. Welcome back to Statistical Rethinking. Look at all these late people. I'm into public shaming; you can do it to me too, reciprocally. This is the last lecture in the 2017 to 2018 series. What I want to do today is finish chapter 14, about measurement error and missing data, and then spend the remainder of the time trying to give you an unsatisfying overview of the course content, and maybe some pointers to directions you could go from here, if you so choose.

Before I get right back to the slide we ended on on Wednesday, let me take just a minute to draw back a bit and remind you that this whole business of Bayesian inference is nothing more than counting up all the ways things can happen according to your assumptions; things that can happen more ways are more plausible. And that's it. It's totally non-magical. It's just computational, and the only thing that makes it seem impressive is that it's really hard to do the counting yourself. So we use machines, and then it somehow seems sciency, right? But if you did it with marbles in the sand, it wouldn't seem sciency, and you could do it all yourself. You'd just need a lot of marbles, and you'd start counting them. That's the way we did it in chapter 2. Every statistical example in the course after that has been the same procedure. There's nothing about the procedure that is anything more than logical discovery of the implications of your assumptions. But that's a lot, because it's really hard to do with natural language. That's why we need the mathematics to do it.

That's also why there's no automatic access to truth through statistical inference. It's just not possible. There's no way to process data that automatically makes your conclusions true. It just isn't the case. And now I say this and you're all like, well, yeah, who thought it was, but it's worth saying, I find. It really is worth saying over and over again, just to remind ourselves, because debates about statistical evidence in the sciences or in the public often trade on the idea: well, there's statistical evidence that this thing is true; it's statistical evidence, you can't deny it. Yes, you can. You can deny it, because it depends upon the assumptions. But it's still powerful and useful, of course, because it seems to be really hard for people, in the absence of a mathematical prosthetic, to go from assumptions to conclusions without error. I'll say that again. It seems to be really, really hard for people, in the absence of some mathematical prosthetic, to go from their assumptions to their conclusions without error. How do we know that? Because when we take verbal arguments about the implications of assumptions and subject them to formalism, we often find we were mistaken before, and that the necessary and sufficient conditions for things are not the ones we thought they were through verbal reasoning alone. And again, I'm not saying anything new here. This is like philosophy 101, basic philosophy of science. But it's worth reminding you that that's the context in which we do all our work in this course.
And so we come to these interesting things like missing data computation, which are a little bit baroque at first glance. I'm trying to give you the impression that this is all part of a unified whole. These models are just as natural as any other model, because they are implications of assumptions. The measurement error tricks that we did on Wednesday and the missing data tricks we're going to do today are not really tricks. They arise naturally from the assumptions we've already made about models, and now we just have to pull those things out in these contexts.

There's this thing I tried to say on Wednesday: in most pedestrian data analysis contexts, it doesn't much matter if you use a Bayesian approach or any of the, you know, thousand non-Bayesian approaches. Non-Bayesian is not a unified field; it's just things that aren't Bayesian, and they're not necessarily alike one another. But there are a bunch of other approaches, and they can work. For a linear regression, for example, you can justify it many ways. Many, many different sets of assumptions will lead you inexorably to the same linear regression. Which leads to this comment I've often made, that the funny thing about the field of statistics is that what's controversial in it is inverted relative to the sciences. In the sciences, like evolutionary biology, the things that are controversial are on the cutting edge of knowledge, but we all share this deep core of theory and facts that forms evolutionary theory, and it's largely uncontroversial. The things that are supposedly debates about evolutionary theory are actually debates about little ways to describe things, like the niche construction controversies. They're really not controversies about evolutionary processes; they're controversies about how to describe them. And that's not useless, but it's really different from a controversy in statistics, where what's controversial are the foundations, and what's not controversial is how to analyze data. So it's really weird. In statistics, everybody uses linear regression sometimes, but no one can agree upon why it works. It's just such a bizarre thing about statistics.

So it's very important, when I teach you a course like this, that I'm not trying to tell you there's only one way to do it. Rather, I want you to understand how the Bayesian formalism works, and to come back to this bedrock: the foundation of Bayesian inference is very modest, even humble. We just count up all the ways the data can arise according to our assumptions, and the states of the world that have more ways to make the data are more plausible. That's all it is. And from there, you can do a lot, it turns out. So now to missing data. That's sermon 75, or whatever number I'm up to. When we come to missing data, we get to apply this. There are assumptions already in your models that can emerge in cases where you have failed to observe variables, because in Bayesian inference, when you fail to observe a variable, that just means it's a parameter. An unobserved variable is a parameter; that's all a parameter is, an unobserved variable.
And an observed variable is data, or a datum, if you will. So another problem that's very, very common in data analysis, at least in the sort of data analysis I work with, because I work with observational systems a lot, is that for some cases in the data, for some variables, you have missing observations. There can be all kinds of reasons for this, but it's really common. The usual way to deal with it is so usual that the base regression tools in R do it silently without asking you, which I think is a terrible, terrible design decision. Whoever did this should be publicly shamed for it. Why is it bad? Because it leads to mistakes without any obvious way to notice them. What R does is this thing called complete case analysis: for any case in the data, any row, the way we've been setting up data, where there is a missing value, it just drops the whole thing. So all of the observed data for that case is gone from your analysis, and your sample size goes down. Then, and this is what's dangerous, people will start comparing models that have different numbers of predictors in them, and silently R will drop cases, and you end up comparing the fit of models with different predictors that actually contain different cases. And that is bad; you can't do that, by the way. You'll get invalid differences really, really fast.

So complete case analysis isn't necessarily wrong. It's a very conservative thing to do, though. For it to make sense, to not bias your inference by dropping the incomplete cases, you need to believe that the missingness is scattered randomly among your predictors. I gave you some pointers, some links in the chapter, to exactly what I mean by randomly here; it turns out there are at least three interesting things you can mean by that. But for the sake of the lecture, I'll just say that what you need to be true is that the missingness, the NAs, got put in there by some process that has nothing to do with the values of the variables or their relationships with the outcome. It's more like your research assistant accidentally burned some of the data forms and they're gone, but the burning was random with respect to the values. In that kind of missing data situation, complete case analysis won't bias your inference. It is inefficient, though, because you're losing data: you're throwing away all the information in the non-missing variables for those cases. Does that make sense?

So we don't want to do the complete case analysis, actually. We'd like some way, using the same assumptions about missingness at random, to use all the information we have observed, while honestly treating the missing cases according to the implications of our model. And that turns out to be doable; that's what I want to show you today. There are lots of alternatives, and there are some that you should never do. What you should do is some form of imputation, and what we're going to do today is what's called Bayesian imputation. Something that is often done, and I want to emphasize you should never do this, is to replace the missing values with the mean of that column, or something like that. This is very bad, because now your model doesn't know that you're just pretending; it thinks you observed those values. And you get really weird results.
There's no telling exactly how it will bias things, but it will definitely create strange distortions. Mean imputation is super common in machine learning, by the way. And not just to pick on them, because of course statisticians committed all the same sins before machine learning was even a phrase. But you shouldn't do it. There's a common technique, which is incredibly useful, called multiple imputation, usually attributed to a statistician named Don Rubin. Rubin is actually a Bayesian, and the multiple imputation procedure he derived is an approximation to a Bayesian procedure. So now that we can just do the Bayesian procedure on the desktop, we're going to do the Bayesian procedure that multiple imputation approximates. You'll find packages for doing multiple imputation; you just want to understand that what they're aiming at is an approximation of what you're going to learn today. So just do the full thing rather than the approximation, is my advice. But multiple imputation is great. It works really well. There's nothing wrong with it at all. And then there's always other ad hockery that comes up. So, yeah: impute. The word impute, you should look it up in the dictionary if you like. In my dictionary, it actually has an example that contains my name for some reason: "the crimes imputed to Richard." They are many.

All right, let's go back to a previous data context so that I don't have to spend time introducing a new data set. Remember the primate milk energy data set from chapter 5? Long time ago. Last year, literally. In these data, we were interested in the relationships among milk energy, adult body mass, and the proportion of the brain that is neocortex. And if you recall, we ended up doing a complete case analysis back then, because there are actually a lot of missing values in the neocortex column. There are a number of primates, most of them prosimians, for which nobody has ever bothered to measure the proportion of the brain that is neocortex. Actually, I need to update this; I think I have data now to fill in the NAs, so in the second edition maybe I'll do better. But anyway, let's carry through with the historical analysis, just for the point of the lesson.

So what do we do with this? If you do the complete case analysis, which, like I said, is not necessarily wrong, the problem is that it's inefficient, because you're dropping all these extra body mass and milk energy cases. There may be information in those that tells you something about the relationship between body mass and milk energy, but you throw all that information away when you do the complete case analysis. So how can we get that information in there and deal with the NAs? That's what we're going to do: the Bayesian imputation analysis. So here's the idea. Well, first, here's the MCAR thing I talked about, missing completely at random. You have to suppose your undergrad assistant lost those neocortex values after they had been collected, and that they were lost at random. Now, they're not actually lost at random here, because they're predominantly in primates that aren't as glamorous to study. Let's face it, that's how it goes with primatology. I like lemurs, but they're not, you know, as career boosting as certain other primates. So here's the way imputation works in a rhetorical sense, and then we'll do it computationally.
Consider the neocortex variable and ask: what is your best guess of a missing value? The answer is the posterior distribution implied by the remaining data. For each case, if there's an observed body mass, and you know the relationships among body mass, milk energy, and neocortex percent from all the other cases, then you can impute a value for what should go there. The relationships among the other variables, estimated from the cases where you have information, imply something about the missing value. It won't just be the mean of the column, because the body mass has a particular value and the milk energy has a particular value, and so the values that are consistent with what you do know about that case narrow down the plausible neocortex values. Does that make sense? That's the idea. There's information here, and it arises from the same old Bayesian thing: we count up all the ways things could happen according to our assumptions. We just want to let the model do this, though, because there are a lot of dimensions in these things.

So how do we set this up? Mechanically, here's how it works, and I'll show you the model description on the next slide. Imagine now we've got this variable, the column neocortex percent. That is a variable. What's weird about it, in a way that hasn't been true before, is that previously variables were either observed or unobserved in total. Either you had the observations, in which case we called it data, or you didn't, in which case we called it a parameter. Now you've got a single variable in which it depends upon the case whether it's observed or not. But that's okay; it works the same way. Every place it's observed, it's data. Every place it's not observed, it's a parameter. Bayes don't care. It's perfectly fine with that; it's the same kind of problem for Bayesian inference. So what we're going to do is, every place there's a missing value, we insert a parameter and we estimate it. And you might say, well, what's the prior? Well, the prior is the likelihood, because they're the same thing. Remember, a likelihood is for when a variable is observed; a prior is for when it's unobserved. It's just a distributional assumption; it's the same idea. So the observed cases inform the prior: you learn the prior from the data. And then the unobserved cases are constrained by that prior, which was learned from the observed cases. You having fun yet?

Let's look at the models and see how this goes. Again, the theme from Wednesday was: what I love about Bayesian inference is that you don't have to be clever. You don't have to realize in advance how this is going to work at all. You just set up the assumptions of the model. So here's what the model looks like, and I'll try to explain it in a way that helps you understand how it arises naturally from the assumptions. I think this is one of the slides where I've got a lot of animation. Here's the whole model; let me step through it. Up top, we've got our outcome, kilocalories of milk energy, normally distributed with some mean for each case i and a standard deviation. Then we've got our linear model, and inside it is this variable N sub i. But the variable N, as I've indicated at the top, is a mix: it's a vector that is a mix of observed fixed values and parameters. And those parameters are distributions, in the way you should think about the calculus of this.
But as the Markov chain proceeds, they're determined at each step, at each jump, and particular values get plugged in. The chain keeps moving, and it moves across the posterior distribution, and that's what lets it integrate over the uncertainty. Does that make sense? The linear model looks exactly the same as it did before, because that's the assumption we're making: there's some relationship between milk energy and these two variables, neocortex percent N and log body mass log M. We just write the linear model as it was before. The fact that some of these values are missing is fine. It doesn't change the model; it just changes our calculations. Then the only thing we add, which we haven't usually bothered to add before but has always been implied in our modeling, is that each variable, each of the things we called data before, has some distribution as well, and we could estimate it. It has its own likelihood. The fact that it's a predictor just meant we didn't worry about that before; we weren't interested in its mean and standard deviation or its correlations with other predictors. But now we have a reason to worry about it. And so we put in this second likelihood: N sub i is distributed normally, with some mean nu and some standard deviation sigma_N. The world's simplest likelihood function. When neocortex is observed, this is a likelihood. When it's not observed, it's a prior. The parameters nu and sigma_N are learned from the observed cases; their posterior distribution gets updated from the observed cases. And then the unobserved cases are imputed based upon, well, actually, all the parameters in the model. We'll get there in a second. There's information flowing faster than the speed of light across all the dimensions, remember my metaphor? The only thing faster than the speed of light in the universe is information, because it doesn't move; it's already there. It's just our realization of the implications that moves. Sorry, I sound like a Buddhist. Those of you in my department know it doesn't take much.

So: when observed, a likelihood; when imputed, a prior. That's just nomenclature to help us distinguish between observed and unobserved variables. The distribution assignment is the same logic: we're assigning plausibility to different values through these distributions. The mean of neocortex has to be estimated, and the standard deviation has to be estimated. And then there are a bunch of priors. When you look at it in the code, it doesn't really look any different. It's the same thing. The neocortex variable appears in the linear model, and then it appears again with its own distribution assignment. And nu and sigma_N have to have priors. Neocortex is scaled as a proportion from zero to one, so I set the prior for nu to be normal with mean 0.5 and standard deviation 1. That's not exactly right, because a proportion can't be negative; there's a homework problem at the end of chapter 14 where you actually constrain it to zero and one. If you want to have some fun tonight, use a beta distribution. I see Natalia is eager to do this, right? Or Tamakami, one of the two, or both. You can do both. So this is it, and computationally you hand this to map2stan; the call looks like the sketch below.
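(Here, roughly, is what that call looks like. This is a sketch in the spirit of the chapter 14 milk example; the variable names, model name, prior values, and sampler settings here are illustrative choices, not a verbatim copy of the book's code.)

```r
library(rethinking)
data(milk)
d <- milk

# rescale: neocortex as a proportion, body mass on the log scale
data_list <- list(
    kcal      = d$kcal.per.g,
    neocortex = d$neocortex.perc / 100,   # this column contains NAs
    logmass   = log(d$mass)
)

m_impute <- map2stan(
    alist(
        kcal ~ dnorm(mu, sigma),              # likelihood for the outcome
        mu <- a + bN*neocortex + bM*logmass,  # the same old linear model
        neocortex ~ dnorm(nu, sigma_N),       # likelihood when observed, prior when missing
        a ~ dnorm(0, 100),
        c(bN, bM) ~ dnorm(0, 10),
        nu ~ dnorm(0.5, 1),                   # neocortex is a proportion, so centered near 0.5
        sigma_N ~ dcauchy(0, 1),
        sigma ~ dcauchy(0, 1)
    ),
    data = data_list, iter = 1e4, chains = 2 )

precis(m_impute, depth = 2)   # depth=2 shows the imputed neocortex values as parameters
```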
It's kind of a teaching tool that got away from itself, but I wrote it as a teaching tool because I wanted some way for students to be able to specify the model that forced them, inconveniently, you're welcome, to put all the assumptions in, while abstracting away from the computation. It looks as much like the mathematical definition of the statistical model as I could mercifully make it. It doesn't have all the detail; you don't have to do the indexing and all that stuff. In particular, there's nothing in here about having to take neocortex, split out all the values that are missing, and make a vector of parameters of that length. That's what goes on under the hood. When you invoke a model like this in map2stan, it checks all of the data you feed in for missing values. If it finds missing values, it checks whether there's a distribution assigned to that variable. If there isn't, it stops and tells you, with an angry scolding message, that you have missing values in your model and you haven't assigned a distribution to that variable. So you put in a distribution for neocortex, and then it's happy, because now it has an assumption that lets it do imputation. The imputation arises spontaneously from the nature of the assumption. If you have no missing values, you can put this in too, and it'll run fine; you'll get estimates for nu and sigma_N. It's just that in the presence of at least one missing value, those estimates for nu and sigma_N get to do some extra work. I'm going to say this a hundred different ways, because I think the only hard thing about this is the conceptual part: all of this potential has been lurking in these models since the first linear regression ran. There's some annoying programming stuff that has to get done when you make a Markov chain out of this, but it's just what I call index fiddling. I think Brendan is a professional at index fiddling, right Brendan? Yes, and there are nightmares about index fiddling, but it's part of the stuff that I want to abstract away from you guys.

Okay, so what happens when you run this model? The first thing to note is that you get a vector in your posterior distribution of 12 missing values that have been imputed, and that's what I'm showing you on the right in this caterpillar plot, forest plot, whatever it's called. Mean neocortex here is a little over 60%; that's just the average in these primates. Primates have a lot of neocortex; for mammals this is a lot. And then there are imputed values for each of the species that have the value missing. You'll notice they're pretty imprecise: they cover a whole range, from about 0.55 to 0.8. They're imputed, but there's not a lot of information in this analysis with which to pin down the missing values. Still, this was not a useless exercise, because you got to use all the data on the other variables to inform the coefficient between body mass and milk energy. The second thing to note, a second-order thing, is that the imputed values are not all the same, even though their prior is the same. And why? Because these parameters are in the linear model, and they're being multiplied by a slope: the relationship between neocortex percent and milk energy. And that slope is definitely not zero.
And so for any particular species, that means some values of neocortex percent are more plausible than others. Some are effectively ruled out by the size of the slope. Does that make sense? You can think about the limiting case: if the slope were zero, the imputed values would all be the same. If there were no relationship, in the other cases present in the data, between neocortex percent and milk energy, then all you'd have here is the prior, and there'd be no way to do better than that. But since there is a relationship, the body mass you observe for a species tells you something about its unknown percent neocortex. You with me? Exciting? Good. Now, you don't get much here, but sometimes you get a lot more than this. It all depends upon the details. And the slopes have changed a bit here. That's the kind of thing you want to tell your colleagues: whether including all the cases affects the analysis. And if it doesn't affect the analysis, you still want to tell people that. It depends, as I always say; you have to check.

So here's what's going on. As I said before, the imputed values are tracking the regression. They're doing so very weakly; they're getting nudged around. The neocortex, where it's been observed in these data, is associated with milk energy. That was the whole point back in chapter 5, remember: you need both these variables. Neocortex is associated with milk energy in one direction, body mass in the other, so you've got to put them both in the model, because there's a masking effect, remember? That was the whole point of this data set; that's why I searched for this data set and put it in the book. So there is an association, and that means that when you do the imputation of the missing values, they get nudged, pulled towards the regression line a bit. That's what I'm trying to show you in this, I'm sorry, horrendously ugly scatter plot on this slide. Let me talk you through it real quick. On the horizontal is neocortex proportion, so there's a pretty restricted range in this data set, and on the vertical is kilocalories per gram of milk. The regression line and, I think, its 89% confidence bow tie (or is fusilli the pasta that looks like a bow tie? anyway, this one is more of a fusilli) are shown there in gray. The closed points are the observed values of neocortex, and the open ones are the imputed values, with their 89% intervals shown as line segments. And you can see that they tilt towards the line. Not much, but they tilt. There's tremendous uncertainty about them, but they're tilting, and that slight tilt is induced by the direction the line goes. There's a lot of scatter; the relationship is not super strong here. It has to do with the particular body masses as well.

Okay, so this is what I said before. Now here's the thing. We can do better than this, because in this imputation we've ignored the fact that the predictors are associated with one another. They're correlated: body mass is correlated with neocortex. And we can use that to do better imputation of the neocortex percent. So let me show you that model. It's the last model of the course, and it's what we're going to do now. You can see the consequence of ignoring that correlation in the results from the model we just ran.
When we plot log body mass against neocortex proportion, the imputed values miss the correlation between these two things. You can see that they're positively associated in the filled points, which are the observed cases. But among the imputed cases there's no correlation, because your model assumed there was no correlation; it didn't have a parameter with which to measure that correlation. So it's missing something, and we can make a better model by giving it a parameter to estimate this. You might ask, well, how would you do that? You already know how to do that: with a linear regression. That's all it is. Now we're going to have two simultaneous linear regressions in the same model. You're welcome. And there's no problem; this is easy for your computer to do. You can really punish Stan, by the way. You can get a lot fancier than this, and Stan will be fine with it. As I keep saying, I finished a project a couple of months ago with 27,000 parameters. Stan can handle it. I ran that on our computing cluster downstairs, not on my desktop, but still, Stan can handle it. You can be aggressive. And if you have problems, come see me, you know, in my office.

So in the naive imputation model we just did, the prior is just: N sub i is normal with mean nu and some standard deviation. Slightly less naive, still naive, it's always naive in stats, but slightly less naive is to make nu a linear model. So now it's nu sub i. There's another intercept, the neocortex intercept, and then there's a slope, which measures the association between log body mass and neocortex. Just linear regression; you're running another linear regression. Makes sense? So we run two linear regressions at the same time. The code to do this is in the book, and there's a sketch below. You should definitely run this yourself, get a feel for it, change the assumptions and play around with it. As before, we get 12 imputed values in the posterior distribution, different ones this time, and we've got a bunch more slopes and stuff. Now there's this gamma body mass slope, which I put in there, which measures the association between log body mass and neocortex percent. It's positive; we knew that, because we looked at the scatter plot. But now we've got the posterior distribution for how positive, for what the plausible range is. And this affects the imputation. What does it do? The first thing to look at is the plot in the upper right: log mass against neocortex proportion. Again, this is the plot we looked at a few slides ago. The filled points are the observed cases, species in which we've observed both the body mass and the neocortex proportion. The open circles with the line segments are the cases where we had to impute the neocortex. And now you'll notice that they lie in the path of the total scatter. That's because of that gamma parameter, which measures the association. Now the imputation is drawn in there; it uses that gamma to say, well, if you haven't observed the neocortex for this species, but you've told me its body mass, I can make a better guess about its neocortex, because I know gamma. Well, "know" it; you know more than nothing. That's all we ever do: know a little more. Does this make sense? And again, this isn't magic. It doesn't make up information. You're really just exploiting the assumptions that were already present.
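(Here's a sketch of that slightly-less-naive model, again patterned on the chapter 14 code; the parameter names a_N and gM and the specific priors are illustrative choices rather than necessarily the book's exact values.)

```r
# Same data_list as before. Now nu gets its own linear model in log body mass,
# so the imputation can exploit the correlation between the two predictors.
m_impute2 <- map2stan(
    alist(
        kcal ~ dnorm(mu, sigma),
        mu <- a + bN*neocortex + bM*logmass,
        neocortex ~ dnorm(nu, sigma_N),
        nu <- a_N + gM*logmass,              # the second linear regression, inside the same model
        a ~ dnorm(0, 100),
        c(bN, bM, gM) ~ dnorm(0, 10),
        a_N ~ dnorm(0.5, 1),
        sigma_N ~ dcauchy(0, 1),
        sigma ~ dcauchy(0, 1)
    ),
    data = data_list, iter = 1e4, chains = 2 )

precis(m_impute2, depth = 2)   # the imputed values now track each species' body mass
```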
The imputation just exploits a perfectly reasonable regression assumption that was already present in your model. It also has consequences for the other relationships as well, but that's not necessarily the exciting part. The exciting part is to understand structurally what's going on under the hood. Now, I'll make a final philosophical point here: information is flowing in all directions. The imputed values are being informed by the cases where things are observed, but information is also flowing from them into the regression coefficients. Information goes both ways. There's nothing about Bayesian inference that means information only flows in one direction. And so it's not too unusual that the imputed values can change what you believe about the relationship, relative to what you believed before you put them in. That's not super weird, but it is counterintuitive. And when something feels counterintuitive, that just means it's a violation of our intuition, and that's an opportunity to learn something.

So, the very last homework problem in the book. This is what I want you guys to do for your homework, the very last problem at the end of chapter 14. Just one problem. Do it, write it up, and think about it. You might think this is super easy: it's a regression problem where there's one missing value, out of, I think, 11 data points. And when you impute it, the sign of the slope between the X and Y variables changes. It flips. Understanding why is your satori moment. You know the word satori? Japanese thing. It'll be a moment of enlightenment; you will understand. Now, most data analysis situations aren't like that. I rigged it. I totally rigged it, to make you see what I want you to see. But that's what teaching is about. It's about rigging the universe so you can learn from it. Because usually the universe is hostile to human life and blocks learning, and the whole point of pedagogy is to make the universe more benign so you can actually learn something from it. So go look at this benign final homework problem at the end of chapter 14 and enjoy. You'll find, I think, if I'm right, that it helps you understand the more typical sorts of situations, like this one from the book.

Okay. Let me use the last half dozen or so slides to try and wrap up the course and give you some sense of what you've learned and where you can go next. The central metaphor in this course is that statistical models are powerful and dangerous constructs. I use the golem metaphor because I like it; Prague is a nice city. The golem is a sort of romantic image of how good intentions can go awry, and this is how I feel about all of science, actually. I get this metaphor from a book about the history of science called The Golem: What You Should Know About Science, which is a great book by Collins and Pinch. Applied to statistical models, the metaphor is a bit stronger, of course, because there's an explicit act of construction, and the construction is often done by individuals when you build your statistical models. Science is a big cybernetic process; it's not clear who's constructing any particular scientific conclusion. But with statistical models, you can see the metaphor a bit more clearly. And the thing to do is not to treat them as if they have any natural insight.
They're just constructs, and they're slaves to their programming. Nevertheless, it's very difficult for the designers to understand the implications of the programming. That's why I've emphasized so much in this course ways to understand the model after you've run it, by plotting posterior predictions: not focusing on tables of coefficients, which often just lead to misunderstanding about what the model is doing, but focusing on posterior distributions of the implications of the model on the scale of observation, the thing that you care about, the business end of the science.

So now let's pull back to the wider science context again for a second. The title of this slide is Stats Is Not a Substitute for Science, which of course no one would explicitly argue, that statistics is a substitute for science. So what do I mean? Sometimes there's an implied, not entirely explicit, style of argument in the sciences that someone has generated statistical evidence for something, and that means it's true. Now, of course, if you just say that, people will say, no, we don't think that. But there's this kind of folk tradition in the sciences, in all the sciences, to occasionally act like this. What do I mean? Well, I think the psychologists in the room are the most advanced in thinking about this, because their field is currently undergoing a very healthy process of regeneration, like a Doctor Who regeneration or something. Many textbook results turn out not to be replicable. The one I'll use, and I'm inviting hate mail by actually mentioning results, is from a psychologist named John Bargh, who has this famous result where, if you give people a word search problem that contains words like Florida, they walk more slowly afterwards. Yeah, I know. But setting the result itself aside, this thing is in a bunch of textbooks, and it's a famous study. It's been cited hundreds, if not thousands, of times. It's really famous. No one ever thought to replicate it. And then when somebody did decide to replicate it, it turned out to be really hard; in fact, maybe no one can. In large, high-powered replications, there's nothing. There's no effect. But whole fields, the whole field of priming, were built on results like this, where people said, well, there was a study done and the p-value was less than 5%, so it's true. There you go; let's base a field on it. When you say it like that, of course it sounds ridiculous, and no one ever said it that way. But lots of science builds up empirical results this way. And that's what I mean by stats is not a substitute for science.

We haven't chased significant results at all in this course, and I don't see any need to engage with arbitrary thresholds determined by the accident of how many digits some ancient fish had in some long-ago geological era. That's why it's 5%: because you have five fingers on each hand. It's just an arbitrary thing. Thresholds are arbitrary. There's nothing about the strength of evidence crossing that boundary which suddenly makes it good evidence. That's obviously true, but there's this convention. And the convention aside, it's often wrong, because there are lots of things that scientists can do to turn any data set into a significant result. In fact, there are labs that teach their students to do this with great, great power. Drop some outliers, try something else, whatever.
I don't know, I'm not going to teach you how to do it. But you know, right? It's called p-hacking now, which is a great phrase. So let me draw this out a bit and give you some sense that even in the absence of p-hacking, if everybody were honest, it would still be the case that you need replication, and you need triangulation across different kinds of studies and evidence. And again, everybody knows that. But we often behave in the sciences as if stats were a substitute. So let's set up a simple thought experiment which does great violence to the actual process of science, but nevertheless violates our intuitions and therefore teaches us something.

Let's assume there's a false positive rate. Studies are done, and for each study you get some result, and there's a convention about whether it counts as good evidence for a finding, whether you've made a discovery. We won't say what statistical procedure it is; it doesn't much matter for the story. It's a signal detection problem. The probability of a false positive finding is 5%, which is, you know, what people supposedly set their procedures to. And the probability of a true positive finding: if there's actually something in the system you're probing, you have an 80% chance of finding it, whereas if there's not actually anything going on, only 5% of the time do you get a positive indication. Sometimes we call these the false positive rate and the power, but this is signal detection, not null hypothesis testing. It would work the same way for detecting planes on radar, or finding submarines in the Bay of Biscay. All these things work this way. And now I ask you the basic question: conditional on a positive finding, what's the probability the finding is true? This is like when you're reading a journal, you're reading PNAS, and someone has found a correlation between sitting in first class and getting angry. I think that was an actual study, actually. Now, what's the probability that's true? We can't necessarily answer that question, but this thought experiment can show us why it's difficult, and perhaps, in my opinion, why we should be skeptical of a lot of these things. It requires some replication.

Unsurprisingly, the way I think you should answer this is using Bayes' theorem. You saw that coming. To remind you, in this context Bayes' theorem tells us that the probability the finding is correct, that it's actually true, that there is a phenomenon there, conditional on a positive indicator, is the probability of a positive indicator conditional on it being true, which we know. Well, we think we know; at least in the thought experiment we know. What is it? 80%, right? 0.8. That's the probability of a positive indicator when the finding is actually true. You with me? You don't seem convinced. Yes? Okay, thank you. Then that gets multiplied by the probability it's actually true, the base rate, which we're going to talk about in a moment. What is that? That's hard; that's the hardest part of this. And all of that is divided by the probability of a positive finding. The probability of a positive finding means the average probability across all the states of the world, which is why I've expanded it here in the denominator: it's the probability of a positive given true, that's the power, times the probability that things are actually true.
Plus the probability of a positive given it's false, that's the 0.05, the 5% up top, times the probability that it's false instead. That's the denominator: the marginal likelihood, as we called it back in chapters 2 and 3. So you can make graphs from this, plugging in arbitrary values for the thing on the bottom that we have no idea about. The thing we have no idea about is: in a given field, in a given period of its history, what is the base rate of correct hypotheses being proposed? This is a weird thing to think about; it's just a thought experiment. But you have to imagine there's some process of theory construction, following up on previous results, or people just having thoughts like, huh, I wonder whether people who read the word Florida walk slower. I'm going to keep picking on that result, because it's just, like, what? People, in my experience, will give you crazy numbers over the whole range for this base rate. My guess is something like 1%, to be perfectly honest. Where do I get that? Because I've read a lot of history of science, and in the history of science, the first-order induction is that everything is wrong. Why? Because everything has been wrong in the past. Almost nothing turned out to be correct. Everything gets embarrassed and replaced, and it just never stops. And that's so hopeful, right? It just fills me with hope. No, the base rate is kind of low. We do learn things from failures; that's how we learn stuff about the world. But our innate theory-generating apparatus as human beings is, if you will, not so impressive, and the history of science tells us that. But other people have very different ideas. I've had people tell me that, no, in my field, something like 50% of the hypotheses are probably true. And I'm like, okay, well, I can't disprove it. This is really hard; I'm not sure how to process evidence to settle it.

What I can tell you, though, is what happens if the base rate is low. Think about this graph. What I'm plotting on the vertical is the thing we'd like to know: the probability something's true, given you have a positive indicator. This is you reading PNAS about air rage, and there's a significant value in there; the paper is just littered with asterisks. So now you're wondering, is this true or not? To make that inference, you need to develop some belief about the base rate at which research of that kind generates legitimate hypotheses. That's relevant to the inference, even if we can never really know it. And so I put some plots up here for different values of power, the probability of a positive indicator given something is true. You'll see the result is not very sensitive to power, but it's extremely sensitive to the base rate. For low base rates, even if your power is really high, like one, most of your positive indicators will still be false, because you're testing a big bag of false stuff. Even if your false positive rate is low, most of your positive indicators will be nonsense. People in my department are tired of this sermon, but this is why theory development is crucial. You need methods for developing and evaluating theory. It isn't enough just to use statistical procedures to figure out what's true or not, because you'll waste your time testing nonsense, and then you'll be drowned in false positives. This is supposed to be uplifting, sorry. That got dark real fast. But yes, it's uplifting. So focus on theory.
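(If you want to play with this yourself, the arithmetic is just Bayes' theorem. Here is a minimal sketch; the function name is made up for illustration, and the default power and false positive values are simply the ones assumed in the thought experiment.)

```r
# P(true | +) = P(+ | true) P(true) / [ P(+ | true) P(true) + P(+ | false) P(false) ]
prob_true_given_positive <- function(base_rate, power = 0.8, false_pos = 0.05) {
    (power * base_rate) /
        (power * base_rate + false_pos * (1 - base_rate))
}

prob_true_given_positive(0.01)   # 1% base rate  -> about 0.14
prob_true_given_positive(0.50)   # 50% base rate -> about 0.94

# the kind of curve on the slide: P(true | +) as a function of the base rate
curve(prob_true_given_positive(x), from = 0, to = 1,
      xlab = "base rate of true hypotheses", ylab = "P(true | positive)")
```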
Yeah, we need methods for theory development. It's not enough to just say, Florida makes you slow, maybe that'll work, and then you get a positive indicator. You can get positive indicators all over the place if you test enough stuff, and then there's publication bias on top of that. It's not enough. And again, I think there's a latent wisdom about all this, but it's worth saying something about it at the end of a stats class, because I feel personally betrayed by all the stats classes I took that never talked about the place of statistics in science like this. There was no context in which it was deployed. It was just, you get your output and then you can publish, that sort of stuff. It was horrible. And so I am wracked by guilt when I give stats lectures and don't mention this broader context. That said, I don't think science actually works the way the thought experiment does. Science is much more interesting than this; it's more complicated, and it's also not a pure signal detection problem. But the violation of intuition here is really, really useful.

Same thing over here. This graph is, again, the probability a finding is true given a positive indicator, plotted against the base rate, but now the lines vary the false positive rate, so it isn't always 5%. The result is very, very sensitive to this, because it changes the rate at which your bucket of false hypotheses gets elevated to positive findings. And, again just my opinion, I think many of the procedures that people learn in their apprenticeship as scientists help those people in their careers precisely because they elevate the false positive rate. This is what p-hacking is. P-hacking is a range of behaviors, many of them normative in many fields: things like dropping outliers, trying everything. Those procedures are normative in many fields; people will publicly encourage you to do them. There was a famous case of a certain Cornell psychology professor who made a blog post in which he told a victory story of encouraging a student to p-hack a data set, and the internet was horrified. But he presented it as if it were totally normative. It was a celebratory post about this thing they had p-hacked; he didn't call it p-hacking, of course. So how does that happen? It happens because the norms embody lots of stuff that, well, gives you positives. The industry of science, sorry, this is going to get dark too fast, but the industry of science selects for positive results. It doesn't necessarily select for true positive results, because we don't know the truth. And so p-hacking has become normative in many fields. The field of statistics is very upset about this now, because of course they never encouraged people to do this. But the field of statistics doesn't have control; it's largely autonomous from the sciences, and that's part of the problem, I guess. Anyway, I have no solutions here, except you folks, who will inherit this and fix it.

Just to give you a little backup for why I think the base rate is low: I mentioned the history of science, but you don't have to rely on history of science alone. There are a lot of ongoing replication studies now, in medicine, in social psychology, which is the one I put up on the slide here, but also in economics and other places, where fields have taken on board the sort of idea I had on the previous slide and thought, okay, it's not good enough that some famous person did a study in 1965 and now it's in all the textbooks. Does it replicate? And we're all worried about publication bias now, right?
So you get these high-powered studies, with multiple labs involved, where they try to replicate famous results. And often a large number, a majority in this case, don't replicate. By don't replicate, I don't mean there's necessarily no effect at all, but the effects are really tiny when there is one, relative to the original published results. So publication bias exaggerates the importance of effects, I think, quite often, even when there is something really there. But for a lot of this stuff, there's basically nothing. What you're looking at here is the Many Labs study, I think the second one, Many Labs 2, and it's all up on the Open Science Framework. This is the best graph from their report, even though its graphic design principles cause me deep pain. Sorry, but it's an amazing study. What you're looking at on each row is a different famous effect, a style of experiment that has been done before. These were chosen because they were things people talked about a lot. These are the kinds of effects that were in Daniel Kahneman's book, Thinking, Fast and Slow, which talks about these incredible results that you have to believe because the evidence is so compelling; he literally says that in the book. And then, okay, let's try to replicate these compelling results. It turns out most of them end up near zero; this line is an effect size of zero. The original studies are the green triangles, the small marks are the replication studies across labs, and the black points are the weighted means across the studies, weighted because they have different sample sizes. And so a lot of these famous things, the warmth perception, weight embodiment, power and perspective taking, persistence effects, the availability heuristic, for some of these there's nothing to write home about. And if there is anything going on, it's way, way weaker than in the original studies. There's some kind of publication filter which favors unusually strong effects, because it's easier to get them published, I think, is what goes on.

What's at the top is Stroop. I put that there because some stuff is real. The Stroop effect is real; it replicates reliably. This is a great thing to do with your relatives: next time you're home for Christmas break or something, do the Stroop task with them. If you don't know what it is, Google it: Stroop, S-T-R-O-O-P. It's awesome. It's one of the triumphs of psychology, and you can do it with your relatives. It works every time. So Stroop was in this study just to show that replication can work. If Stroop hadn't replicated, I wouldn't have believed any of the results, because Stroop is real. A world without Stroop is not a world worth living in. So things do replicate, and that's important, because people who defended the original studies have sometimes said, oh, it's just that the replications weren't done right. Well, the replicators can do some of them right: for things that are big and consistent across cultures, like Stroop, it's no problem to replicate. So what's wrong with this other stuff? If it's true, it's true in a very subtle way.

So, just to summarize: replications are always necessary, and the communication process that we use in science is always suspect. Everybody knows that editorial boards have their own particular biases, and those biases aren't always in the same direction; they vary across fields and across journals.
But there's often a taste for surprising things, and I don't know about you, but I tend to think that surprising things are often false. That's why they're surprising. That's not a scientific principle, but it's a reason to be suspicious. And we know there's publication bias: positive and negative findings are not equally likely to make it into a journal. This book I put up on the slide is called The Lost Elements. It's one of these history of science books that I enjoy, about the history of the periodic table of elements. What's great about it is that it's something like 800 pages of stories of elements that were discovered but turned out not to be real, and it turns out there are more of those than there are actual elements in the current table. This is important to realize if you're feeling bad about your field, like I often do as an anthropologist: you shouldn't, because the so-called hard sciences were just as messy. It's just that the textbooks erase all the mess. They don't put it in there; they don't mention the crazy fraudsters and the accidents. And most of this stuff is not fraud, by the way; it's diligent people just making mistakes. Among them was Enrico Fermi, whose name you might have heard before, one of the most famous physicists of the 20th century. He discovered false elements, and it was part of the reason he got a Nobel Prize. Luckily, there were other parts. These were, I think, neutron bombardment experiments, on the road to the discovery of nuclear fission. He thought he had discovered new transuranic elements, but really he had just produced fission; he had broken things apart. I think that's the story; I'm a little hazy on the details. So it can happen to Fermi. This is just part of what goes on: we make mistakes, and we just have to recover from them.

At the beginning of the course, I complained about these flowcharts, like the one on the right, that are used to choose statistical models. One of the big problems with these flowcharts is that they don't give us any set of principles to base modeling decisions on. We're at the mercy of the flowchart, and it embeds a kind of anxiety. We get procedures in statistics, as a result, where people feel like if they do anything other than what's in the flowchart, or what's in their textbook, then all hell will break loose. It's this kind of fear, and it leads to ritual hand-washing. It's compulsive. So it's like, well, p less than 5%, that's the convention, you have to do it, otherwise, boo, science explodes. It doesn't. But you know what I mean. And I'm very sympathetic to the psychological struggles that go on here; all joking aside, I completely get it, and why it happens. Part of the reason I present the material as I do in this course, with the golem metaphor, for example, is that these models are really miserable things. It's amazing we learn anything from them. But we do learn things from them, because we construct them to do a bit of processing that we can't do ourselves. And we give them all their power through the assumptions, and we have to remember that. They have no automatic access to the truth. So we should be concerned about their assumptions and their structure, but we shouldn't feel like making small adjustments is going to make everything explode, because these things aren't precision instruments in the first place. That just isn't how they're set up.
I made a comment about this a little earlier: a lot of this is made worse by the fact that statistics is now an autonomous field. If you're a faculty member in the statistics department at most big research universities, you're not rewarded for making the lives of scientists better. That's not how you get tenure. You get tenure for proving theorems about theoretical statistics and publishing in the Annals of Statistics, which is a journal no scientist should ever read. And I'm not saying the results are useless; they're definitely useful, eventually, to future alien civilizations. Sorry, I can't help but dig at this a little. But their professional incentives aren't to help the sciences, and I'm not saying they should be. It's just a fact that when statistics became autonomous, with the prestige ratchet of how work happens in that field, applied statistics ended up with much lower prestige than theoretical statistics. This will resonate, I think, with a lot of you. And that's a sociological problem that I have absolutely nothing smart to say about; it's just a fact, I think, that we wrestle with. Part of my job, as I see it in anthropology, is that I read that awful stuff for you and try to do some translation, because that's part of the division of labor in the sciences: the sciences need people who can step across into that field, read that stuff, and come back. But it's not a job for everybody, because you have better things to do.

Anyway, I think what happens through this compulsion is that we get the idea that objectivity is good in science, but all objectivity means in this context is that everybody does it the same way. And that makes it safe. You're immunized from criticism if you do it like everybody else does, even if, like in the case of p less than 5%, everybody knows it's crazy. The emperor has no clothes. Nobody can justify 5%, right? Well, maybe in an individual study you could, I should say; that's fair. If, in the case of an individual study, you make a particular cost-benefit argument and say, we're going to set the false positive rate at 5%, I can see that as a particular case. But as a convention across wide ranges of empirical topics and a bunch of different sciences, it's madness. Everybody knows this, and yet it's safe, and you're immune from criticism when you do it. Again, this is supposed to be uplifting, but it doesn't feel that way. So, subjectivity. Subjectivity means expertise matters. We can disagree, but that doesn't mean it's bad. Subjectivity is a necessary part of doing the business of science.

So let me close up now. I realize I'm a minute over, but I'll give you your money's worth and give you some pop wisdom at the end here. Defaults: Andrew Gelman sometimes says that statistics is the science of defaults. Because it's an incredibly deep field and scientists have better things to do, applied statisticians can do a lot of good by providing useful defaults for scientists to start with. They're not obliged to use them, but it's nice to have defaults. So let me try to give you a few. Here, I think, is the recipe for doing Bayesian data analysis, or any other kind of data analysis, really. The first step is to define your model, ideally using some domain-specific theory. That would be nice, rather than Florida-makes-you-slow. Sorry, I'm going to keep picking on that. Then you fit the model.
Then you check the fit, because sometimes the golem misbehaves; it doesn't function right. So you need to diagnose whether your machine worked correctly. Then you should critique the model. You look at posterior predictions. You think about where the model is doing badly and whether there's some theoretical reason for that. What can you learn? Models always do something badly, and that's an opportunity to learn. You make mistakes; it's an opportunity to learn about why the mistakes arise. And then you repeat. And if you do all these steps transparently, under the full gaze of your peers, they will help you improve these steps as well. And then you're not cheating, right? It doesn't have to be some pristine sort of thing. But of course I don't want to fall into the horoscope thing and tell you that there's any one particular way to do all of this. In some cases you don't need to check the fit, because the model is super simple and you can be sure it worked. But it's still a good habit to get into. Think of this as your flight checklist for building a machine.

And I've gone through these steps before, so just very quickly, to give you an idea about recipes. For choosing likelihood functions, there are bad ways to do it, like what I call histomancy in the book. You should not plot your outcome variable and then ask what distribution it looks like. Never do that. And I know a lot of people are taught to do that. You should never do it, because even if you're not doing Bayesian inference, if you're doing regular old frequentist statistics, the model only assumes that the residuals have some distribution. It doesn't assume that the whole variable does. It's the residuals, after you fit the model, that need to be Gaussian distributed. And you can't see that by plotting the variable before you fit the model; there's a short sketch of this point below. So just don't do it. It doesn't make any sense at all. I know there are textbooks that tell you to do this, but that's the world we live in. So what should you do instead? Think about the constraints on the variable before you see the values. Even if I just told you I was going to measure height, you already know a lot about the possible values of that variable before you've seen the data. That's the kind of information you use to choose a likelihood function. There are examples in the book, particularly Chapter 9. You also need to think about which aspects of the data you care about, because not all of them are worth modeling. Maybe it's just the mean you care about, in which case the whole distribution isn't even relevant to inference, and then a Gaussian is almost always okay, because it's just a distribution for estimating the mean. You need to understand your model as well. And of course you can try different models and do robustness testing.

For choosing priors, it's actually quite similar, but now we've got to worry about overfitting, as our dog friend reminds us. I hope you appreciate my memes. My memes are the dankest you will find from a Max Planck director. We're having a competition. So you want to guard against overfitting. This is the most important thing. No matter what statistical framework someone is using, it's perfectly reasonable to ask them, say, excuse me, that was a fascinating talk, thank you for all of that, but what have you done about overfitting? This is a perfectly reasonable professional demand. Just because they're not Bayesian does not excuse them. The dog is going to come through their door and ask them about overfitting.
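Before moving on, here is a minimal sketch, not from the lecture, of that earlier point about residuals versus the raw outcome. All the names and numbers below are made up for illustration: a predictor with two clusters produces a strongly bimodal outcome, yet a Gaussian likelihood is exactly right, and it's the residuals after fitting that show it.

```python
import numpy as np

rng = np.random.default_rng(1)

# Predictor with two clusters, e.g. two groups measured at different values.
x = np.concatenate([rng.normal(-3, 0.5, 500), rng.normal(3, 0.5, 500)])
# Outcome: a plain linear model with Gaussian noise around the line.
y = 2.0 + 1.5 * x + rng.normal(0, 1, size=x.size)

# A histogram of y alone is strongly bimodal, so "histomancy" would
# wrongly reject a Gaussian likelihood here.
print("outcome percentiles (5/50/95):", np.round(np.percentile(y, [5, 50, 95]), 1))

# Fit the simple linear model and look at the residuals instead.
slope, intercept = np.polyfit(x, y, 1)
resid = y - (intercept + slope * x)
print("residual mean and sd:", round(resid.mean(), 2), round(resid.std(), 2))
# It is the residuals that the model assumes are Gaussian, and they are,
# even though the raw outcome is not.
```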
Overfitting is a universal phenomenon that you should always worry about. In Bayes there's a straightforward way to address it that is easy to communicate, and that is regularizing priors; there's a small sketch of this below. Other statistical frameworks also have ways to handle it, many, many ways, and that's nice. In fact, I think part of the increasing acceptance of Bayes in the 20th century was that non-Bayesian statisticians started worrying about overfitting a lot and developed procedures that were sometimes mathematically identical to using priors. And then people were like, well, we can't really complain about priors anymore, can we? We're kind of using them already. No, seriously, that sort of happened. And I think that's great. So flat is never best. You can always do better than a flat prior, but choosing the best prior is really hard. You don't have to have the best prior. You just have to be better than flat. That's the idea. And then there's the advice I spent a lot of time on earlier in the course as well. Sometimes you actually know things about a parameter. You know things about its constraints. So you can use principles like maximum entropy, or the actual information you have about it.

Okay, some seemingly useless results here at the end. I'm going to skip over these in the interest of time. I think it is worth having some emotional orientation towards this material, and to think about what your purpose is, your motivation, so you don't get stuck in the "I'm trying to find an asterisk" problem. And again, I'm sympathetic. So what should you do, I think? Assume there's an effect and try to estimate it. Don't think about testing; think about estimation. You get the testing for free from the estimation calculations you need to do. I'm not saying science isn't about testing things, of course, but the estimate gives you a lot more information with which to do a test than trying to think within some signal detection framework. So think about estimation, and try to get the most precise estimate of the theoretical quantity you possibly can. That will also give you the best tests. You should embrace and propagate uncertainty. Accept it. The whole point of statistics is to quantify uncertainty, not to remove it. It's to characterize for your community what the uncertainty is, where it lies, and therefore where future work should focus. Propagate uncertainty means don't drop it halfway through your analysis, which is a very common problem in statistics, where you just throw measurement error out, for example, and then keep on trucking. Then you get overconfident results, and this is a big problem. It happens in a large number of fields, for example, when people use only one phylogenetic tree to do a comparative analysis. We don't know the tree, but we have a giant posterior distribution of trees, and we should use them all. That's what we should do. Fitting is easy; prediction is hard. You've learned that from your homeworks. It's easy to fit a model, it really is. Predicting the future, that's tough. There is no right, there's only less wrong. Your models are always wrong. They're just golems. None of them are correct. You can only get less wrong. And "math is not real, only then can it be real." I don't even know what that one means. The intent of it is that math has no direct access to truth. It has no direct access to reality.
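Going back to the regularizing-priors point for a moment, here is a minimal sketch, not from the lecture, of why flat is never best. It uses ordinary ridge regression as a stand-in for a zero-centred Gaussian prior on the coefficients (the ridge penalty is what such a prior becomes at the MAP estimate, up to the noise variance). Everything in it, the degree-6 polynomial, the sample sizes, the penalty value, is a made-up illustration; the typical result is that the regularized fit does slightly worse on the training data and noticeably better on new data, which is the overfitting trade-off in miniature.

```python
import numpy as np

rng = np.random.default_rng(2)

def poly_features(x, degree=6):
    # Columns 1, x, x^2, ..., x^degree: deliberately flexible enough to overfit.
    return np.vstack([x**k for k in range(degree + 1)]).T

# A small, noisy training sample from a gentle curve, plus a fresh test sample.
truth = np.sin
x_tr = rng.uniform(-2, 2, 15);  y_tr = truth(x_tr) + rng.normal(0, 0.3, 15)
x_te = rng.uniform(-2, 2, 200); y_te = truth(x_te) + rng.normal(0, 0.3, 200)
X_tr, X_te = poly_features(x_tr), poly_features(x_te)

def fit(X, y, lam):
    # lam = 0 is the flat-prior answer (ordinary least squares).
    # lam > 0 is the ridge answer, i.e. the MAP estimate under a zero-centred
    # Gaussian prior on each coefficient.
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

for lam in (0.0, 1.0):
    w = fit(X_tr, y_tr, lam)
    train_mse = np.mean((X_tr @ w - y_tr) ** 2)
    test_mse = np.mean((X_te @ w - y_te) ** 2)
    print(f"lambda = {lam}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```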
Math is a prosthetic that the human species has invented. Aside from the integers, which are from God. Is that Gauss? It's an actual quote, something like: the integers are from God, and everything else is the work of man. Or from the devil or something. Real numbers are from the devil. But math has no direct access to reality. We can use it to process our beliefs about reality in very powerful ways, and that's useful, but it's still garbage in, garbage out. Other than pure number theory, we're just modeling all the time.

Okay. Let me give you some pointers to places to go in the future, because this is the most common question I get at the end of courses like this: what should I read next? And I don't have a precise answer, so I'm going to give you a buffet of things that I think are worth your time. Twenty years from now, you might read a couple of these, because you have busy lives. The next Bayes book you should read is Bayesian Data Analysis. This is tops. This is the book. In fact, this is the book you should cite in your papers when you say you're doing Bayesian analysis. You should cite this. This is the authoritative applied Bayesian analysis book. It's frequently updated. It has a range of opinionated authors who are experts in particular areas, so different chapters rely on different expertise. It's a great book. It's also very readable. Fantastic book. If you want to learn more about causal inference, and be terrified about whether science can ever learn anything, you should read this book by Judea Pearl and his colleagues. This is the junior version of Judea Pearl's famous book, Causality, which is also a great book, but it's very hard. It's extremely difficult to read, I think it is. Is it killing you? Yeah, sorry. But it's a very rewarding book. It took me something like a year and a half to read it, but your mind burns as you read it. This is the easier book. It won't make your head hurt. On the practical side, with lots of code, like my book, at least if you're a cognitive psychologist, this book by Lee and Wagenmakers is very good. It's got a lot of code, Stan code, for doing cognitive modeling, dynamic models, lots of different kinds of models. And one of the ways you exercise your modeling skills is by reading other people's models, so it's worth doing. It's a great, super clear book, really worth your time.

On the background theory side, statistical models are a special case of model building in general, so it's worth having some idea about how models are made in general. I keep saying that's how I started out. Stats for me was this thing that happened after I already had a model and then needed to somehow make data speak to it, and no one ever taught me how to do that. They just taught me how to run a t-test. That didn't really help me with a dynamical model. So for thinking about modeling broadly, with minimal mathematics, these are three books that I'm very fond of. The one on the left is both the easiest and the broadest book I know. It's an introduction to modeling. Even though it has the words "field biologists" in the title, the subtitle is "other interesting people". And you're all interesting people, even if you're not a field biologist. But some of you are field biologists, like me. At least I used to be. It's a fantastic book.
It tours through different kinds of modeling, agent-based models, optimality models, dynamical models, and gives an example of each. There's code. It's a fantastic book. It's thin, it's inexpensive, it's worth your time. The second is a history of science book about the history of probability thinking, called The Empire of Chance. It's not a book that's going to teach you how to do statistics, but it might give you some wisdom about how attitudes toward chance and statistics have changed historically. It's a great book, a really fantastic book. And here's another book. Sorry about the slide; you'll see this at home when you look at the slides. This book is called Objectivity, by Daston and Galison. This is Lorraine Daston, who's a Max Planck director in Berlin, at the Institute for the History of Science. This is one of my favorite books ever, which is why I put it down here. It's a history of scientific imaging. And you might think, what does that have to do with this course? The answer is everything, because it's about how scientists represent nature. As you get through the book, it turns out, of course, that they're imaging things which are really just the output of statistical procedures, but we think of them as images. Like an image from a scanning electron microscope. What is that? It's not a photograph, right? What is it actually? It's a heavily processed thing that is drawn to represent something about nature. It's a fantastic book. It's philosophical, it's historical, and it's also got beautiful photos and drawings and flowers in it. It's worth your time. Anyway, on that note, I will stop, and I will thank you for your indulgence. And I hope you found something useful in the last 10 weeks. Thanks.