Okay. Hello. Sorry I was a little late; I didn't know where this place was. And sorry that some people are on the floor. You deserve it. No, welcome, everybody. Some of you will decide to stay home and watch the lectures at some point, but as long as people are happy on the floor, I'd prefer that you're all here, because then you can ask me questions. I'm Richard; some of you know that. Before I get started with the body of the lecture, are there any questions for me about the course? Anything? You all know where to find the syllabus and my email and things? We have a lot to do this quarter, so I want to get right into it.

Let me start by giving you a little of the history of this course for me, so you know where I'm coming from and what I intend to do. The simple version is that I'm trying to teach the course that I always wanted when I was in grad school and never got. What I got in grad school is, I think, what most people get: you're just trying to do science, and there's no good way to learn statistics, but you must do statistics to have a career. And it was awful and painful, and I don't want anybody else to have to go through that, because it's a terrible purgatory. Right? And it's the reason there's a lot of mess out there, I think. So what I'm going to try to do is give you a functional, whole style of model-based statistical inference, a style that is, I think, becoming dominant in biology and some parts of the social sciences, though there are other styles as well. The sense in which this style is model-based is that you approach your problem with some existing theoretical lens already; you don't need the statistical framework to give you hypotheses.

That's why this picture is up here to prompt me. I study humans in the context of environmental systems, and I'm a big fan of irrigation and terracing and things like this, as the sort of systems that are compelling to me. This is a Chinese terrace system; you're seeing the reflection of the sky in the ponds here. These systems are incredibly complicated, they have spatial and temporal dynamics, and part of the scientific goal of studying them is to understand the dynamical systems, the equations that describe them. So in the sort of problems I work on, you come to the statistics with equations in mind. You have models, and then there's data, and you ask: how do I get these things to talk to one another? If you take an ordinary statistics course, you will learn nothing about that. In this course, you will learn very little about that.

And the reason is that when I arrived as an assistant professor at UC Davis, people said, you do math, Richard, why don't you teach the stats course too? And I said, sure, I want to teach. But I started out as a modeler who did some ecological stuff; I didn't have a big data science background. Since I do applied math, it's not so hard to pick this stuff up, so that's an advantage. But I came at it with the idea that I already had theories, which is different, and I'll get into some of this today, from the idea that you just look at the data, looking for differences between treatment groups. So the first time I taught a statistics course to graduate students, it was from that perspective.
Here are population growth equations in ecology or economics, various dynamical-systems things; we've got some data from the system, and we want to contrast the different explanatory models given the data. That I still think is the eventual goal. But what I found is that most people, well, those people were like me: they'd had such a bad statistics preparation that they weren't ready for that. Many of you may be, and that's great, but hopefully you will enjoy the pedestrian introduction in any event. So over time, I've slowly dialed the course back a bit. It's still hard work, as you'll see, but you'll appreciate it. I dialed it back to basically reteach regression, because most people have never had a good course in regression. Maybe you had a course in regression and it was just painful, right? It's like post-traumatic stress: there were matrices, maybe you did a Cholesky decomposition, and then you black out. And I get that, I do completely. So we're going to work towards something more satisfying, where we can really match our theoretical models of the systems we study to the data, thinking of statistical inference as the interface between the two. Not a framework for providing hypotheses or simple null hypotheses, but something more substantive: developing predictive models of natural systems. But to get there, we're going to sort of start over.

Originally this course was called Statistical Thinking, because I wanted to contrast different styles of statistical analysis. There were three major contemporary philosophies of statistical inference, and I taught all three, and it was a disaster, because it was too much. So now there's only one, and it's Bayesian, because that's my internal narrative: it's Bayesian, which is frightening, but true. That's what this course has become, dialed back, and I think it's a much better course. You'll let me know. But I think it is. So that's the history, where we're coming from.

Now the topical coverage over the 10 weeks. It's fairly compressed, because 10 weeks is not a lot of time, as you guys know. Those of you who come from semester systems are like, what's going on here, right? Ten weeks, and you're out the door. That's just how it is, I'm afraid. In week one, starting today, I want to give you the philosophy of Bayesian inference and some of the mechanics. We'll move pretty fast here, and it'll be pretty easy going, because there's not a lot of computational work initially. But you'll start to do computations, and you will have homework this week, because chapter three has a lot of R code in it. Next week, you're going to relearn regression; I call them linear models. It will be recognizable, but you're going to work more with the nuts and bolts of things, so you'll really get it, I hope. This will cover chapters three and four. Then we're going to look at multivariate regression. By week three, things will start to seem pretty easy, because you'll have gotten your practice in week two; it's the same kind of thing, you just add some more parameters. It all makes sense as you get there. Then in week four, we're going to do model comparison and information criteria, because you'll have learned a bunch of modeling, and now there will be multiple models and you'll need to do something about that. I'll say more today about why we want to do that.
And then I'm going to spend a whole week on interactions, interaction effects, because nature is full of interaction effects. An interaction effect is when the effect of one manipulation depends upon the value of something else in the system, and natural systems are full of conditions like that. I think the status quo is to underplay interactions, because they're mathematically easy but conceptually very difficult. So we'll spend an entire week on them. You will feel really awesome at the end of that week, because you'll be so good at the coding by that point. Basically, the coding tools you learn next week carry all the way through to week five, and you're going to feel awesome. Those of you taking this for the second time can nod in a knowing way, and I'll quiz you later.

Then week six happens, and there's a transition. We've got to learn Markov chain Monte Carlo, because we're going to need it to fit the more complicated models. That's where we transition to what many people think of as Bayesian inference, which is how you fit the model; I'll try to convince you today that that's not what Bayesian inference is. It's an interpretation of probability instead. Then we'll learn generalized linear models, which are incredibly useful; we can extend these to nonlinear regressions, counts, mixture models, and multi-level models, or mixed models. And by week 10, you guys will be so awesome, we'll do things like incorporate measurement error and all kinds of uncertainty into our models, so you never have to average over your data again. We can even impute missing values; I'll show you how to do that, which is not naughty. It is more naughty to drop cases than it is to impute missing values; I'll try to convince you. That's our objective. I'm leaving a bunch of stuff out here, but along the way I'll put in little stubs linking to other things, like phylogenetic regression, which is a special case of something we'll get to in week nine called Gaussian processes. We'll talk about it there, and I'll try to link things up as we go. Alright, that's the goal.

Mechanics, before I get into the material. I try to put everything on the website; the notes are there. I'm recording this lecture right now; it seems to be functioning, because there's an audio meter flipping around as I talk. I will export this as a slidecast, so if you decide you prefer to stay at home, or you just like to hear me over and over again, you can do that. I have a soothing radio voice, I'm told. The homework every week is meant to be a group project. That allows me to assign realistically difficult homework, so you feel the burn, but you work as a group with other people. These are homework problems which involve analysis and inference, so they're not toy problems, but you work on them in groups, because that's how you're going to do science. The great thing about teaching PhD students like yourselves is that you're all wonderful people and you really want to learn this stuff, so I don't have to worry about cheating. If you slack off, you're only cheating yourself, so I don't do any policing. You're going to build team skills too by working on these homeworks together. Everybody has a different problem with the homework, so when you work together, you all benefit. You'll feel good about it. You'll submit them to your Dropbox, and Paul, who's around here someplace, Paul is on the floor, he's the reader; he will be grading these.
So if you have a problem with the grading, talk to him; mainly he will just check them off. Please submit them, for Paul's sake, in PDF form, because with Microsoft Word, even the same version on different computers, images don't always show up correctly. That's just an issue, so export to PDF and it'll work out great. Or, if you do everything in a plain-text script with your R code and comments, that's okay too; we can handle that. The final exam is going to be take-home; you get a week to work on it. This is the only part of the course you do by yourself, although you can ask me for help, and it'll be due one week later. Is anybody here going to the Grand Canyon in the last week of the course? Yeah, okay, two people, alright, that's cool. We can figure out a schedule for you to do the final exam, so it doesn't matter that you're doing the Grand Canyon thing, okay? As we get closer, let me know. There's always somebody going to the Grand Canyon at the end of this course. I wish I could go. It would be fun, but there's no way I could get out of the mechanics of my life. My life is a big machine with grinding gears, and if I leave, the whole thing explodes. So: final exam and homework, weighted half and half. Everybody gets an A, because everybody works awesomely hard in this course; very rarely are there any grade problems.

I just want to say, this course is a lot of work, but you're going to learn a lot, and I think the several hundred people who have taken this course before either have Stockholm syndrome or it's a good course. But I do want to caution you: there's this experience early in the course. This is my cartoon of the gain in knowledge, level of knowledge against week, for an average student. Initially there's this incline, you're walking uphill, and the burn is the slope of the line tangent to the curve at any particular point, right? That's the difficulty; your sensation of difficulty is the tangent line at any particular point. Initially you feel it a little, but it's not so bad. And then, typically in week two, it feels like you're climbing El Capitan, and it's going to feel bad, but someone's going to push you up over this hill, and you'll be all right. It usually gets steep there because you have to learn R at the same time you're learning statistics; some of you have good R backgrounds and some of you don't, but you get past it, you just struggle through, and it'll be great. The evidence of that is all the people who have taken the course before; that's why I put this up. So plan on it: it's going to burn right there. If it doesn't, you're wonderful, you're fantastic, you're a superhero; just keep going, or rather, push your colleagues up over the hill, is what I'd ask you to do. You get past that, and then it's clear sailing until information theory, and then you're going to feel some burn again, but it'll be great when you come out of it, because you'll know information theory. And it's never quite as difficult as it was before. At the end of this, your altitude will be a lot higher than it was at the beginning, and you will be rock stars, or whatever you want to be. Country stars, if you prefer. Sorry, I don't mean to privilege rock music; there's nothing special about it. Anyway, sorry, I'm an anthropologist; as soon as I privilege one thing, I have to stop myself. It's a disciplinary thing.
Okay, so let me begin the course with a metaphor. There's a lot of metaphor in this course, because I'm trying to help you learn by establishing emotional connections to the material through metaphor. The one I want to start with is designed to get past this sort of terror about mathematical topics, which I think comes from the way math is taught in elementary schools: that there's one way to solve every mathematical problem, and if you do it any other way, you will fail the test, right? Your teacher will penalize you, sometimes even if you have a correct approach, just because it's different from the one you were taught. This happens in elementary school over and over again. The truth is, applied math ain't like that. Statistics is like engineering. There's not one way to build a bridge. There are lots of ways to build bridges, and some of them are better or worse for particular cases, but in many cases there are aspects of the design that we don't comprehend when we do the planning. So engineering is mainly a history of bridges falling down, figuring out why they fell down, and making better bridges. Statistics is very similar to that. Math is involved, but that doesn't mean there's only one solution. For each experiment, there are many legitimate statistical analyses, and there are even more bad ones. That's much like engineering, right? Many ways to build an airplane; many more bad ways to build an airplane.

So let me give you an alternative metaphor, one that's more linked to engineering but has the kind of monstrous nature that I want, and that is the legend of the golem. Some of you have heard about golems before from folklore, but in case you haven't: the golem is an old legend from the Abrahamic communities, from what we now call Judaism, where it's a clay construct. I think of them as clay robots; it's the first kind of folklore robot. The word comes from the Hebrew for "shapeless mass," and in the legends, rabbis who were steeped in Kabbalah could, through rituals, give life to inanimate objects. This was through simple recipes: you get a lump of clay, for example, and form it into humanoid shape. You put words on it, because, as in many cultures, words are endowed with magical effects, right? And maybe they are: I can say "snake," and you picture a snake, even if you really don't want to. So words are magic. In the classic golem legend, the Hebrew word for truth, emet, is inscribed on its brow, because truth is what animates it.

Now, the thing about legends of the golem is that the golem is inherently dangerous. In the most famous legend, which comes from Prague, and I start the book with this story as well, Rabbi Loew constructed the golem to defend the Jews of Prague from persecution, from the blood libel, and it was able to do that. But golems are not really smart. They follow instructions, but they also break stuff, like Prague. So it's a very powerful thing, and if you don't treat it with extreme caution, it can also break things. This is the metaphor I want you to have in mind for statistics, which may seem like I'm trying to undermine what I do for a living, and that is correct. It's exactly what I'm trying to do. There's this casualness about statistical inference. We need it to learn about the world, because we study complex systems.
But at the same time, there's this casualness, as if we can just not invest a lot of effort in learning how we're doing it, how we're processing information, and everything will be okay. And I don't think it will. I love this quote, which seems really to be from Rabbi Loew. There really was a Rabbi Loew; I don't think he actually made a golem, but as far as I can tell, he did actually say this: that even the most perfect of golems, risen to life to protect us, can easily change into a destructive force. Therefore let us treat carefully that which is strong, just as we bow kindly and patiently to that which is weak. So there's power in these things, and that's why we're attracted to constructs like statistical models. But that also means we have to regard them as something other than magical. They don't have direct access to the truth. They're automatons, and they're bumbling, and they don't understand our intent, what we wanted them to do; they're just slaves to their instructions. So there's a lot of burden on us to be careful, and we can't offload responsibility onto these tools.

Just to draw out the parallelism, to understand why I want you to think about models as golems, as some sort of clunky robot. There's the silly part: golems are made of clay; models are made of, well, we write them on computers now, so they're made of silicon. That's the best I could do. Both are animated by truth, in the sense that we are motivated by trying to figure out what's true, the true state of the world, and that's why we create constructions like this. The golem is powerful in legend. Models are hopefully powerful; they're not all very powerful. Some of them are terrible. We'll build some terrible ones. Both are blind to the creator's intent, in the sense that when you deploy a statistical model, it's not smart. It doesn't pay attention. It's vastly dumber than you are, even though it has powers of analysis that you don't have, that none of us have. It's a weird cooperation that goes on between you and it. Both are easy to misuse. There's a famous saying about models from George Box, that all models are wrong but some are useful, and that's a good saying, but a better way to think about it is that models are false in the same way that bridges are false. It just doesn't make sense; it's a category error to even label them as true or false. A model is a thing that has behavior in the world, and we need to understand its behavior to use it properly. It isn't that it's true or false; it's a tool. No one speaks of a hammer being false for making a table. Statistical models are hammers, appropriate or not for your particular job, and that's the way we're going to approach these problems.

One of the reasons to create this metaphorical, cultural wrapping for our introduction to statistical modeling is that I want to argue against the classical view of what statistics is, which is that it's a way to test hypotheses; that the crisis point of every scientific study is a hypothesis test, and it results in an asterisk or not, and then you can publish it or not. Right, and I don't think that statistics is actually very good at testing hypotheses, even though quantitative rituals have risen up to do that, mainly outside of statistics, I should say. Statisticians have been primarily cautious about this, but once statistical methods get loose in the sciences, they evolve.
They're selected to sustain your career, and as a consequence they're very good at producing p-values that are small; I do think it's a selection effect. So one way to think about this is that at some point you've got to leave behind teaching a bunch of these classical methods that have been eclipsed by more modern things, and at the beginning here, that's what I'm going to do. You can take a bunch of really good stats courses that teach these classical methods, and these classical methods are useful in fairly narrow contexts. Most of these methods are thought of as procedures or tests, things like the Wilcoxon rank-sum test and t-tests and all those things. Each of those is useful. Each is a little golem: it embodies a model, it has particular assumptions, and it's only useful in certain situations. Most of them are not terribly powerful, because they were developed back when nobody had computers, for the most part. That's not a slight against them; they're just golems from a previous era. But at some point it's too much to ask you to learn all of these things. When your elderly colleagues use a Wilcoxon rank-sum test, you can just ask: okay, what are the assumptions of that test? At some point you have to forgive yourself for not knowing that stuff. You don't need it anymore. There's an archival reason to know about those things, but we can do a lot better now. So we're going to shuffle past that burden and instead try to replace it with another way of thinking.

I think a lot of the motivation for treating statistical inference as a collection of tests is the idea that science is supposed to advance through falsification of hypotheses. So I want to spend a little time now talking about why falsification is a criterion that demarcates scientific hypotheses, but not a criterion that advances them. That is the consensus in philosophy of science. This will be news to scientists, because usually all you ever hear is that Popper proved that science advances through falsification. And that's not what philosophers of science think, and it's not what Popper thought. Falsificationism was about demarcation: what is science? It's not about how science works. Let me spend some time talking about why falsification, or what I'll call Popperism, is not a realistic objective for our golems to satisfy. I define Popperism, with the -ism to distinguish it from what Popper actually wrote (I'll have some quotes later), as the idea that science progresses by logical falsification of hypotheses, in particular null hypotheses, and that therefore statistics should aim to falsify, otherwise it's not scientific. There's a consensus in philosophy of science and philosophy of statistics that this is not correct. It is a folk philosophy developed within the sciences, and I think it should be abandoned. So if any of you hold this view, it's not your fault, right? You didn't invent it. You were taught it at some point. One of the psychological things about this that I think is kind of pernicious is that it puts the burden on individuals and individual procedures. If you don't get the rejection of a hypothesis that you wanted, you did something wrong, somehow, right? Because the method is impeccable, it's doing the right thing, so the burden falls on you and your individual study. Instead, science is a process of cultural evolution that takes generations to figure out how systems work.
The individual burden on you in your career should not be seen as some gigantic logical problem that you alone have to solve. Sometimes we just can't measure things precisely enough to decide these issues, so we need better tools. Let me focus on a couple of the logical reasons that falsification is, in practice, impossible; it's not the kind of thing we can really do here. Let me give you an example from population biology. Apologies to the social scientists in the room; I love you guys, and I'm half the time a social scientist myself, or a quarter of the time, I guess, increasingly less. But I think you'll still get what's going on here, because there are analogous problems in the social sciences.

A generation ago in evolutionary biology and population biology, there was this big debate over neutral evolution: how much selection mattered in explaining genetic diversity at the molecular level. This is most often identified with Motoo Kimura, a population geneticist from Japan who did an amazing amount of work actually understanding the mathematics of neutral evolution and its consequences for molecular variation. So think about this case. What Kimura and his students did was go out testing whether selection mattered or not by taking the expectations of the neutral model, looking at frequency distributions of alleles in populations, mainly human populations, and asking whether they could reject the neutral model or not. If they couldn't, they concluded that evolution was neutral, and they did that over and over again. Meanwhile, lots of people here at Davis, like Gillespie, screamed, no, you can't do that. And I want to show you why that logical inference doesn't work.

So take H0 to be the hypothesis that evolution is neutral. I want to make a tripartite distinction between hypotheses, process models, and statistical models. Hypotheses are nearly always verbal and kind of murky; they involve lots of unstated assumptions and vagueness, like the statement "evolution is neutral." Then there's the process model, which is typically mathematical, though sometimes people skip this step entirely, as you'll see. In Kimura's case, much to his credit, he didn't: he made a process model with assumptions about the structure of the genome and how mutations work, and from that you can generate the logical consequences of the model. But notice there are many process models that may correspond to the statement that evolution is neutral. As soon as you get to the process model, it's already different from your hypothesis; this will make sense when I give you an example in a second. A process model can then be made into a statistical model: usually there's some distribution of frequencies of observations implied by the process model, and you look for those as evidence that something is consistent with it.

Now here's the problem, and a lot of this played out here at Davis, through the now-retired Gillespie, who was in EVE. The alternative that people had in mind was that "selection matters," which is even more vague than "evolution is neutral." Why? Because selection takes many different forms, and all population biologists know this. A crazy number of different forms. People who do artificial selection know it even better, because they create forms of selection which are never found in nature, like truncation selection.
Like: none of the cows with less than a certain percentage of rump fat get to breed. Right? I don't think that ever happens in a natural system, but it happens on farms. So selection can matter in a bunch of different ways, which means there are a bunch of different process models that correspond to the verbal hypothesis, and they can differ in quite subtle ways, and they correspond to their own statistical models. Now here's the neat thing. What Gillespie showed is that if you have selection in a fluctuating environment, it produces the same frequency distribution as Kimura's neutral model. And this became something of an industry for Gillespie: every time Kimura would create a neutral model, Gillespie would find a selection model that mimicked it. And this could be done in lots of systems. As the course goes on, I'll give you some idea of why this happens, as a consequence of what's called information entropy, or maximum entropy. The same statistical model will always be consistent with a bunch of different process models. In the physical sciences they often call this the inverse problem: if you've got some phenomenon in nature and you're trying to figure out what caused it, there are going to be a bunch of candidates consistent with the data. It's just a necessary problem, and it creates lots of logical difficulties.

And let me fill in the final part of the figure here. "Evolution is neutral" is also vague. Kimura's version assumed an equilibrium population size, and that's a very special assumption. If the population is growing or shrinking, if the population size varies through time, which it does in natural populations, at least the ones I study, then you get yet another distribution that would also be neutral. So even if they rejected the expectations of Kimura's model, that doesn't mean selection matters. The world could still be neutral. So nobody wins from this; everybody's in pain. How do you avoid it? Well, you need multiple models, meaningful non-null models, and you contrast them. And if the form of data you're intending to collect can't distinguish them, as in the case of Gillespie's fluctuating-selection model and Kimura's equilibrium model, which make the same kind of frequency distribution, then you need a different description of the data. That's how this empirical literature evolved. As soon as people realized this, they looked at the data in different ways, across space and time, where these models don't look the same. When you look at the fluctuations in frequencies through time, or through space, Russ Lande has done some interesting work on this in butterflies in the Amazon, really nice work, then it's clear Kimura was wrong, or we should say Kimura's model was wrong. Kimura did great work, and I'm not trying to diminish his impact; it was a very productive career. It's just that there was a lot of logical tying in knots because of this idea that you can do good science, figure out how nature works, just by rejecting models rather than accepting them. And I don't think that's usually true for any interestingly complex system. Ecology has recently gone through a rehearsal of this with the neutral theory of ecological communities, right, the Hubbell stuff; the ecologists in here know what I'm saying.
Again, very productive, but it's a shame they couldn't have just borrowed the lesson from population genetics and figured out already that the niche and non-niche models make much the same predictions, so maybe we need another kind of model; that's where the literature has gone now. I do think science works, by the way, so don't be depressed by this. Science still works; it just doesn't work the way you maybe thought it did. Science does work, sometimes. Sorry, that was supposed to be a pep talk. I do think science works. It's not always efficient, but maybe it can't be.

Okay, the other thing about the logical appeal of Popperism, the folk version of Karl Popper's philosophy of science: I think it arises from how simple and compelling the argument is. It comes from the syllogism called modus tollens, which is just Latin for "the method of destruction," I think, right? Tollens is like destruction, and modus is method. The idea is that we have some hypothesis, which we'll just call H. That hypothesis generates a logical implication, D. I'll fill this out with an example in a moment. Then, if we observe something other than D, we can deduce that something other than H is true. But if we observe D, we can't infer anything, right? Because lots of things could produce D. That's what's called modus tollens: there's a logical implication of H for D, but other things could also imply D, so you can't know that H is uniquely true. This filters into the traditional statistical hypothesis-testing framework as the idea, which is rarely observed in practice, I should say, that you can only reject the null, never accept it. However, people accept the null a whole lot, right? I'll have some examples later in the course. Whether that's naughty or not is actually a difficult thing to talk about, but from the modus tollens argument, it's definitely naughty.

So let me give you the example that philosophers of science have used routinely in talking about modus tollens. Suppose the hypothesis is that all swans are white. Some of you know this story. Europeans used to think all swans are white, because in Europe they are, like lots of things in Europe. I'm an anthropologist, so whenever I can make fun of white people, I will; it's like an obligation of my career. But then we go to Australia, and we find out that there are swans in Australia that are black. It only takes the observation of one black swan to falsify the hypothesis, right? Whereas no matter how many white swans you observe, you can't say anything about the truth value of the hypothesis. That's the power of modus tollens, and it's completely appropriate in this case.
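Since the course leans on this syllogism, here it is written out formally; a minimal sketch, using the H and D from the slides, nothing more.

```latex
% Modus tollens, with H the hypothesis and D its logical implication.
\[
  \frac{H \implies D \qquad \lnot D}{\lnot H}
  \quad\text{(valid: modus tollens)}
\]
% Observing D, by contrast, licenses nothing:
\[
  \frac{H \implies D \qquad D}{H}
  \quad\text{(invalid: affirming the consequent; many other hypotheses also imply } D\text{)}
\]
```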
But let me reveal to you why this is not, in general, what we're dealing with in science. There are two problems. The first is that measurement matters. Usually we're not sure what color of swan we've seen, quite often, at least for the interesting problems. You guys are at an R1 university, in labs with famous people, and you're going to do awesome super-science, right? Because of that, you're working on stuff that's hard to do; no one's quite figured out how to measure what you're studying. Maybe there's a seed bank, sorry to the botanists in here, and there really are seeds, and you don't know how long they've been there, 200 years sometimes, right? And that screws up your whole system; you're not even sure how to think about it. It might take generations for you and your students to figure all this out, and that's okay, because that's the only way to make progress. In physics it's the same thing. How do you measure the mass of some subatomic particle that may not exist? That's a hard problem, and you can spend decades building detectors, and then more decades debugging them, and then figure out that French trains passing nearby mess up the detector. That actually happened at CERN. So measurement is difficult.

Let me give you an example that I think is easy to think about; some of you will know this story. Lots of the indigenous fauna of North America have been driven extinct since the arrival of Europeans, or really since the arrival of people in the first place; there have been multiple waves of extinctions. One of the sad extinctions, and I think it probably is extinct, is the ivory-billed woodpecker, which went extinct in historical times and yet was possibly rediscovered in Arkansas in 2004. There's a badge for the people who believe; these are the believers, and this is the best photographic evidence, circled. At this point Sasquatch is more likely to exist than the ivory-billed woodpecker. They now have these swamps littered with camera traps, trying to find more evidence of this bird, and people are going to keep believing, because it is possible that a small number of them are out there and they're hard to detect. I've worked with camera traps before too, and there are lots of things out there that never trigger them. It's tough; it's a sampling problem. So we don't know, and this is hard to deal with, because the evidence is quite ambiguous. Modus tollens cannot be applied here. You would say we only need one example of the ivory-billed woodpecker to disprove the hypothesis that it's extinct, and yet we can never be sure, with one observation, that that's what you actually saw, because there are other woodpeckers that kind of look like it, and that's a bad photo. This stuff happens quite a lot.

As a consequence, sociologists of science talk about something called the experimenter's regress, a term coined by Collins and Pinch, two sociologists of science best known for the book I have on the slide here. The experimenter's regress can be seen in this two-by-two table. If you believe the hypothesis is true and you observe D, you say, I win, right? But your colleague who doesn't believe it is going to say that was a measurement error. And if you believe in H and you don't observe D, you're going to be sure it's a measurement error, and you're going to check all your instruments and try again, or put more camera traps out, right, until you find the thing. And you know what? That's how it should be, because sometimes it was a mistake. The history of science is full of examples of false negation of hypotheses because of measurement mistakes. But sometimes it really isn't an error. This experimenter's regress is part of the Sturm und Drang of real science: it's argument, argumentation with rules, and that, by the way, is Karl Popper's actual, explicit philosophy of how science functions. So keep this in mind. Nevertheless, we do reach consensus, eventually.

The last thing to say about measurement mattering, or rather the other class of problems aside from observation error, is that there are lots of interesting hypotheses which are continuous, and discrete logic like modus tollens just doesn't apply to them. Imagine a whole class of hypotheses like "most swans are white," or suppose we'd like to estimate, as precisely as possible, what proportion of swans are white. Now modus tollens isn't going to help you. Observing one more white swan doesn't falsify anything; how do you use it to update your estimate of the proportion of swans that are white? That's what you're going to learn in this course, because the continuous version of discrete logic is Bayesian inference. I'll say that again: the continuous version of discrete logic is Bayesian inference. That's all it is.
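To make that concrete, here's a minimal sketch in R, in the spirit of the grid-approximation code coming in the first chapters; the swan counts (23 white out of 30 observed) are invented purely for illustration.

```r
# Grid approximation of the posterior for p, the proportion of
# white swans. The counts (23 white out of 30) are made up.
p_grid <- seq(from = 0, to = 1, length.out = 1000)  # candidate values of p
prior  <- rep(1, 1000)                              # flat prior over p
likelihood <- dbinom(23, size = 30, prob = p_grid)  # binomial likelihood
posterior  <- likelihood * prior
posterior  <- posterior / sum(posterior)            # normalize to sum to 1

# One more white swan doesn't falsify anything; it just shifts
# plausibility. Updating with a single additional white swan:
posterior2 <- posterior * dbinom(1, size = 1, prob = p_grid)
posterior2 <- posterior2 / sum(posterior2)

plot(p_grid, posterior, type = "l",
     xlab = "proportion of white swans", ylab = "plausibility")
lines(p_grid, posterior2, lty = 2)  # dashed: after one more white swan
```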
Now, logic is great, because logic is disciplined and objective; everybody has to do it the same way. But logic is also terrible, because the assumptions have a huge effect on how good it is, and if the assumptions don't match the real world, it's no good to you. Logic is garbage in, garbage out, right? But it's a really powerful tool, and so I'm going to present Bayesian inference to you in the same way: it's logic for continuous conjectures, and it gives you a uniform and optimal way to process information. But as a consequence, it doesn't admit simple falsifications, and that's frustrating, and I understand that frustration.

So let me try to summarize, and then we'll do some more statistics. Strict falsification is not possible in realistic scientific settings, for two reasons. The first thing I tried to convince you of is that hypotheses are not models. A hypothesis like "evolution is neutral" is not a model; to compare things to data, we need a model, and models are more special than the hypotheses that inspire them. Typically, more than one process model will correspond to the same observations, so we need comparisons; we can't isolate individual quantitative models, compare them to data one at a time, and make very good progress on most interesting systems. And second, there are always debates about measurement, and there should be, because everybody makes mistakes. One of the examples in that Collins and Pinch book, which I like quite a lot, is the story of Louis Pasteur proving that only life begets life, that rotting meat by itself does not produce maggots. It turned out to be really experimentally difficult to prove that, because it's hard to purify the air and get all the mold out of it. If you have sugar water, there are spores everywhere, right? You're just dusted in spores, and yeast, and microorganisms, everywhere in the room. And it turns out Pasteur won illegitimately: he hadn't actually purified his apparatus, he had just gotten lucky with a sterile solution. He won by a vote of the French Academy, and actually he should have lost, because he couldn't reliably purify a solution; it was not a clean experiment. Now, he was right: I don't believe that maggots or mold spontaneously generate from rotting meat. But it turns out to be hard to demonstrate these things, so we need to argue about measurement. It's just part of the process.

Falsification does happen, but what I want to say is that it's consensual, not logical. Communities of scientists argue their way to consensus about the meaning of evidence in light of hypotheses. And in most of the big successes in the history of science, null hypothesis significance testing has had nothing to do with the advance of important theories, right? How did we figure out how the solar system works? I don't believe there were t-tests anywhere involved. In fact, if you subjected Kepler's laws of motion to a t-test, you would have rejected them, because they're wrong, as are Newton's laws of motion, right, as are Einstein's equations.
But they're good; they're very good for getting a probe to Mars, right? And I want to say that Karl Popper's emphasis on falsifiability was nearly always, not always, but nearly always, about demarcation, that is, about trying to draw the line between what is science and what isn't. This was a big mission in the 20th century; it's mainly a snooze-fest for most of us now, right? But you have to realize that it was about demarcation, not method. I want to give you a quote, on the right-hand side of the slide, about what Popper did think about method. This quote is from his last book, called The Myth of the Framework, where I highlight the relevant part: the method of science is that of critical discussion. There's lots of stuff that goes on in science, but really there's a set of norms that the community prescribes that keep us in check, and there's a lot of policing involved; but it admits a lot, and confirmation is just as important as refutation in Popper's general view of it. Okay. And of course, a lot has happened in philosophy of science since Popper. A lot.

Okay, we want to do real engineering. I want to give you guys a rigorous introduction to making statistical models that you can contrast in light of data, to help you understand systems. We need some kind of framework, and there are a bunch of different frameworks. Applied math is not like pure math, where, as in number theory, there's only one thing that's true; addition has one unique definition, and things like that. Applied math is not like that. In applied math you typically work by stating some principles of inference that seem rational, and then you apply them, and hopefully the consequences are good. If they're not, you stop doing that and look for different principles. Statistical inference is like that: there are different schools of thought, described by different axioms, different principles of inference, and the implications of those principles form coherent ways of making decisions. But the different frameworks don't always agree, and the different traditions of statistical inference are like this. They're all useful; they all work. Non-Bayesian statistics works great, and Bayesian statistics works great, and they each have strengths in different areas. I'm going to teach you the Bayesian view, because I think it's the most general view: you can often understand non-Bayesian procedures as special cases of, or approximations to, Bayesian ones. You can do it the other way too; it's just more awkward. Why? Because, as I'll try to argue, Bayesian probability is very permissive; it's a big umbrella that takes in a lot of concepts, whereas the other views of probability are more restrictive. So I'm going to try to teach you one coherent way of doing mathematical modeling, but I want to emphasize from the start that this is not the only way to do it, because there is not one true way to learn about the world. There's probably no true way to learn about the world; the universe is hostile to life, and it's a miracle that we crawled out of the ooze and built the university, right?

So, three parts: Bayesian data analysis; multi-level models, which are all the rage, I'm told, because people kept asking me to teach them, and which are really useful, a gateway drug to building really fancy engineering from pieces of models; and model comparison with information criteria, which gives you a formal apparatus for contrasting
different non-null models with one another in light of a common data set, which is why they've become so popular in biology. So let me give you a quick introduction to this. In the notes I give you a lot more history and citations for the background. This is the kind of stuff you'll want to glaze over right now; after the course is over, come back in a relaxed mode, and after you actually know how to execute it, the philosophy will make more sense. I'm afraid that's just often how it is for most students, but let me do the best job I can with the historical introduction. Bayesian data analysis can be done a bunch of different ways, and I'm going to teach you a particular school of it, the logical version, which is, as I stated earlier: Bayesian inference is just the continuous version of discrete logic. There are other interpretations, which use the word "belief" a lot, about the beliefs of people and rational agents, and that's not the version I'm going to push on you, because I don't know what you believe, and I don't even know what I believe, but I do know what my model believes, right? Because I programmed it, and that's the only thing I can talk about. And you know what, my models typically believe some crazy stuff, and that's when I go change them. So I don't like the belief framing, because it makes it seem normative, as if you should adopt the inference of your model. I think you are actually the supervisor of your crazy golem that is potentially wrecking Prague, and that's the way I want you to think about it instead.

Bayesian inference uses probability to describe uncertainty. In that sense it's the logical extension of discrete logic to continuous conjectures, and I'm going to use the term "plausibility" to refer to the probability assignments it makes. It has a unique way of working, which we'll derive. I tend to identify the origins of this with Pierre-Simon Laplace, although it's usually credited to Bayes. Bayes did have some prominence, but Bayes didn't really develop it as a data-analysis strategy; Laplace did. I think Laplace didn't even know about Bayes when he started working on it, so there's a primacy fight here, right, which goes on; one was British and one was French, and they were at war, so there were fights. I don't care about that. It's clear that Laplace contributed far more to the modern practice. And what I want you to notice is that Laplace lived in the 1700s and 1800s; this is long before what we think of as classical statistics was developed, based upon sampling theory, which is what most of you probably learned and which I'm not going to talk much about in this course. Bayesian inference is older, but it went through a hiatus, because the British didn't like it. The British developed sampling theory, and I think it really was basically that, and they excluded it.

One of the reasons Bayesian inference went into hiatus, though, is that it's computationally difficult relative to sampling theory. Until modern microcomputers became extremely powerful and we had fancy algorithms, like the Markov chain Monte Carlo algorithms you'll learn later in the course, it wasn't practical. Now it is. Now your phone can do really fancy Bayesian analysis quite easily, if you have a smartphone, right? And your smartphone is more powerful than all the computers that sent people to the moon; not that it could send people to the moon, but you know what I mean. So there's been a resurgence since I was a graduate student
in the 1990s. In the second half of the 1990s, I remember when Markov chain Monte Carlo hit the stats community, and people were like, WTF, we can fit these models now, oh my god. And suddenly everybody was writing Markov chains, and it was a revolution; it really was. In biology, especially in phylogenetics, it had a massive impact: it reformatted the way everybody did data analysis, really rapidly, also in genomics, and all that stuff is very Bayesian now. This is the reason people like to take this course. The social sciences have been slower to get on board, because they don't typically have theoretically inspired models as often, right? Like, what's the model for how people decide what brand of soap to buy? People just do tests for that stuff, and maybe that's fine. But biologists went crazy for this, because they could now fit their actual theoretical models to the data, and we're going to be working towards that goal.

And I want to say that this history used to be controversial. Ronald Fisher did a tremendous amount in the development of the neo-Darwinian synthesis and is a real intellectual hero, fine, but his statistical stuff, I think, is mainly a wash. He thought Bayesian inference was, well, just mistaken; his most influential saying is that it "must be wholly rejected." Well, he was wrong about that. He was right a lot about genetics, but his statistical impact, I think, is going to fade as time goes on. Instead, shown at the bottom right here is his contemporary Harold Jeffreys, who was a geophysicist. He's credited with discovering the internal structure of the Earth by studying seismic waves, discovering the solid core and all that. And his wife, Bertha Swirles, who was called Lady Jeffreys, was an early quantum physicist, vastly smart, who did quantum mechanics; she was a big deal in quantum mechanics, actually. Together they did a lot to keep the Bayesian thread going from Laplace forward. Jeffreys argued a great deal with Fisher; Fisher was never persuaded, and Jeffreys was never persuaded, and they both died happy, I think, is how it goes. I put this here not to say that statistics is a mess; I mean, it is, but it's not uniquely a mess. Just because there's math involved doesn't mean everybody agrees on how to do things, and there are fights. And I figure, in the long run, hopefully what I'm going to teach you will be eclipsed by some larger framework that includes it as a special case. That's what I would like to be true; I would like to think that we've only just begun to organize our thinking about how to process information and learn about nature.

So let me give you the quick contrast with the frequentist view of probability. As I said, Bayesian probability means we use probability to describe the plausibility of different states, different possibilities about what is happening in the world. This is in contrast to the frequentist view of probability, where probability is a limiting frequency of events in the world: if we had an infinite amount of data, a probability describes the limit of the frequency of some event in a collection. Nearly all the uncertainty in frequentist statistical inference arises through what we call sampling variation: if you have a bunch of different samples and you construct some statistic from each of them, then the distribution of that statistic across samples is a measure of the uncertainty in your estimate.
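Here's a tiny simulated illustration of that idea, with all the numbers invented: the spread of a statistic across many repeated samples is the frequentist measure of uncertainty.

```r
# Uncertainty as sampling variation: draw many samples from the
# same population, compute the same statistic from each, and look
# at how the statistic varies across samples. Numbers are made up.
set.seed(1)
sample_means <- replicate(1e4, mean(rnorm(20, mean = 5, sd = 2)))
sd(sample_means)  # simulated standard error of the mean
2 / sqrt(20)      # analytic standard error, sigma / sqrt(n), about 0.447
```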
And this works really well. There are a lot of statistical procedures, which we're not going to use in this course but which many of you have already seen, that are very powerful and make use of this. The problem is that it's far less general than Bayesian inference. Let me give you a case where it works great, from Fisher's own work: agricultural field trials, where you can run a bunch of them and you get sampling variation among them, for, say, the productivity of wheat under different fertilizers. But there are lots of things, in physics for instance, where the sampling-variation idea doesn't make any sense at all. Let me give you my favorite case. This is Saturn as Galileo saw it. You know the story: Galileo, using some primitive telescope, seems to have been the first person to gaze upon Saturn, and he sketched it in a notebook, and the sketches look vaguely like this. I got this image by taking a modern picture of Saturn and blurring it, but this is what his notebook sketches look like. So: are there rings? It looks like a Mickey Mouse, a celestial Mickey Mouse or something. You can't quite see what's going on. If you want to deconvolve, to deblur, this image, what do you do? The statistical question is: what's the true image? There's uncertainty about what the true image is, but that uncertainty is not comprehensible as sampling variation. There's one image, and every time you look, it's going to look the same, as long as the relative positions are the same, right? So sampling variation does not extend to these sorts of problems, and in general it often doesn't. I have this joke in the notes about the diversification of songbirds in the Andes. Well, it happened once. What, are we going to rerun history and get a new sample? What does that mean? Now we're talking about time travel, and pretty soon we're going to be watching Doctor Who reruns, and we're still not going to have figured out how to do the statistics of re-sampling the diversification of songbirds in the Andes.
The Bayesian approach to probability proceeds just fine here; there's no obstacle, because we can talk about the uncertainty using relative plausibilities, and the logic works fine. Again, this isn't to say that frequentist statistics doesn't work. It's very powerful; it's unreasonably powerful even in cases where it doesn't logically apply, like the diversification of songbirds in the Andes. Often the machinery works really well despite the fact that it isn't coherent to say that we're going to re-sample songbirds in the Andes, right? It still can work, which I think is interesting and worthy of study in itself. But I thought this was the clearest case I could give you.

So one way to think about Bayesian probability is that the uncertainty, when we talk about probabilities in a Bayesian framework, is always a product of the incompleteness of our information. If we had total information about the world, there would be no probabilities, because we'd know what was going to happen, right? So coins are not inherently uncertain; they're governed by physics, mainly Newtonian mechanics. It's just that it's a chaotic system: to predict which side a flipped coin will land on, you would need incredibly precise measurements of everything about the initial conditions, the initial angle and velocity. There's nothing about the coin that is inherently random. The randomness is in us, in our state of knowledge: we don't have precise enough measurements to make the prediction. Randomness is a property of us, of our machines, of our models, not of the world. Now, at this point someone brings up quantum mechanics, and, maybe. But I just want to remind you that the interpretation of quantum mechanics has never been settled, and there's not yet a single experimental result which distinguishes between the fundamentally random and the fundamentally deterministic interpretations of quantum mechanics. Just keep that in mind. If it turns out to be fundamentally random, no one will be happier than me, right? I'm an anthropologist; if things turn out to be fundamentally uninterpretable, my field wins. I say that for the other anthropologists in the room, because only they will get the joke; the rest will laugh nervously, like, what? I'll come back to examples of this as we go. Just think about probability as a measure of what the model, the golem you've made, to continue the metaphor, doesn't know. If we could measure more and more about the world, we could, in principle, get down to deterministic models.

Okay, so that's Bayesian inference; you're going to learn it in embodied ways as you develop your model fits. The main tool we're going to work towards is multi-level models. These are models with multiple levels of uncertainty in them. There are lots of different ways to describe these models, so when we get to this part of the course I'll go through several of them to help you understand them, because, again, they're all metaphorical. For now, I want you to think about it like this: we can take a parameter in any model and replace it with a model. As I joke in the book, it's turtles all the way down. What's a parameter? It's a placeholder for something we don't know. Why don't we get a model of that thing we don't know? What do we do? We just replace the parameter with a model of it.
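As a preview, here's one generic way to write that down, a varying-intercepts sketch; the specific distributions are illustrative choices, not anything prescribed by the course at this point.

```latex
% A parameter replaced by a model: varying intercepts.
\begin{align*}
  y_i &\sim \mathrm{Normal}(\mu_i, \sigma) \\
  \mu_i &= \alpha_{\mathrm{group}[i]}
      && \text{each group gets its own intercept} \\
  \alpha_j &\sim \mathrm{Normal}(\bar\alpha, \sigma_\alpha)
      && \text{the intercepts themselves get a model} \\
  \bar\alpha &\sim \mathrm{Normal}(0, 10), \quad
  \sigma_\alpha \sim \mathrm{Exponential}(1)
      && \text{priors for the parameters of that model}
\end{align*}
```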
We just replace the parameter with that model. And Bayesian inference does this really well, because it's very good at propagating uncertainty up the chain of inference; information moves at the speed of light in all directions in Bayesian models. Sorry, that may sound intimidating, but your calculations will just do it. So this is the premise of multi-level models: we have some parameters, and we want to know where they come from, and lots of useful models can be thought of as multi-level models. From a scientist's perspective, we usually want to know what they're going to do for us, so let me give you that list now. They're often used, especially in biology and the social sciences, when you have repeat and imbalanced sampling. So if I sample what each of you does during the day a number of times, but I have more data for some of you than for others, multi-level models can handle that imbalance through a logical estimation of what's going on, whereas traditional models don't. You can study variation: sometimes variation in a population is the thing of interest, and lots of classical models have a hard time with that, because they don't represent the variation. And you avoid averaging. Very often people get multiple samples from, say, a species, like body weights, and they construct an average, and they plug the average into a regression. That's naughty (a statistical term, I know), because it throws away uncertainty; there's uncertainty in the average body weight of the adult female of the species. With multi-level models you don't have to do the averaging: you can put all the measurements in there, and then the higher-level model uses the inferred average with all of its uncertainty. All this does is make your statistics honest, and it makes them more conservative, and it's not hard to do. We'll do it in the last week of the course. Phylogenetic models, factor analysis, path analysis, network models, spatial models: all of these can be thought of as examples of this sort of approach, where you're stacking together other, simpler models. And I think for a lot of students, in my experience, once you get this basic trope of mixing together different pieces of models, like a LEGO set, there's a change in your psychology about it: you become good engineers about these things. So I want to work towards that.

Okay, the last frontier, as I call it, is model comparison. I hope to convince you that most of the time, not always, but most of the time, simple falsification approaches aren't very productive in science. We need meaningful non-null models, and we want to contrast them with one another. In order to do that, we need some mechanism to conduct the contest, and this is going to be all of chapter six, and it's going to take all of that week. This gets us towards what are called information criteria, which are kind of the rage in biology right now, I think. As you see in a lot of journals, editors are like, okay, you've got information criteria, you don't need p-values, kind of thing. But I want to tell you what they actually are, and they're just little models too. They're models of prediction, of forecasting. And because they're models, they depend upon assumptions, and they're not perfect; they don't do the impossible; they don't tell you the truth. That's how it goes. But I want you to walk out of here with a real clear understanding of what they're for. They're for solving a problem called overfitting.
That is, models get really excited by samples, by the data you feed them, and they think the whole world is like that, because that's all they've ever seen. So you've got to deal with this somehow, and there are different ways to do it, and I'm going to teach you one: information criteria. The goal of information criteria is to measure the overfitting risk, that is, how excitable a model is, how much it's going to over-generalize from a sample. You can also use conservative priors, called regularization in non-Bayesian statistics, to do this as well. I'm going to teach you both, and they work well together, which is why I want to teach you to use them both: one measures the overfitting, the other reduces it, so that's why you want them both. Overfitting is bad because it leads to bad predictions. So this leads to criteria like AIC, the Akaike Information Criterion; DIC, the Deviance Information Criterion; and WAIC, the Watanabe-Akaike Information Criterion, or, as Watanabe himself (who is still alive) calls it, the widely applicable information criterion. It's the new hotness, and I'm going to be teaching you it. It's the most recent, robust, Bayesian form of AIC, is what it is, and I'll show you how to compute it. And I would say, yeah, probability and information theory are inherently Bayesian, so these things all fit together: information criteria.

Okay, you're also going to learn a lot of R. Even though this is not a class in R, you're going to be doing simple scripting in R, and you will be hot shots in R by the end. Some of you already are, I know. You will be resources to your colleagues; yes, you will. R is a giant calculator that can compute anything that's computable, and you can usually interact with it that way, but I want to teach you a little bit of disciplined scripting through your homeworks, where you write your solutions in a strict form and you keep them. This will be good for you later: years from now, when you really need the information from this course, when you're doing your dissertation analysis, you can come back to those scripts, and it'll all be there with your comments, and it'll make some sense. And also, there's a certain ethical obligation in science to be able to share our results in a way that is replicable by our colleagues. If you publish an analysis and you don't have a script that you can email someone so they can replicate your work, that's unethical. It really is. And I know lots of journals and fields don't enforce that, but I think we have to say that it's an ethical issue. And I will say that I have not always done that. I'm a bad person, but I want you guys to be better than me; don't imitate me, I'm a pirate, be better. But no, I mean, now I try to do this. The nice thing about these text-based, command-line stats processing tools, inconvenient at first, is that they force you to write it all out in a script, so that the replication gets taken care of as you're doing the analysis. Whereas if you're using SPSS, God forbid, and you're clicking around in menus, there's no trail of breadcrumbs there. You won't even remember the next week what you did, and then the next version won't even have that command, and that way lies madness. So this is something that becomes a professional skill. It helps you collaborate, and it accelerates discovery, because other people can pick up where you left off. And it's quite common now: the reason R is such a big deal in the scientific community is partly because of that. It becomes a common coding tool to share results and algorithms, so it's part of doing team science.
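To make that concrete, here is a hypothetical sketch of what a homework script in that style might look like; the file name and the toy analysis are my own illustration, not actual course material:

```r
## homework_week1.R
## A hypothetical example of the scripted workflow described above.
## Header comments record what the script is, so it still makes sense years later.

set.seed(1234)  # fix the random seed, so anyone rerunning this gets the same result

# A stand-in for a real analysis: simulate ten coin flips and count the heads.
flips <- rbinom(n = 10, size = 1, prob = 0.5)
sum(flips)

# Every step lives in this file; rerunning it top to bottom replicates the
# result. There is no menu-clicking to forget by next week.
```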
Okay, I've got a little bit of time, right? We end at three, is that right? Yeah, okay, let's start into chapter two, where we build up Bayesian inference from humble origins. That'll be the goal. So let me start with another historical metaphor, because that's how I am. Christopher Colombo was a bastard, right? He's Italian, and I look at my Italian reader here, sorry, sorry Paul, but he sailed for Spain, so Italy's off the hook. But he's a right bastard, because he gets to the Americas and he starts a genocide. Horrible, horrible history. He's credited with the discovery of the New World, and I'm going to leave that aside for the moment and instead focus on his navigation choices. So leave the fact that he was a bastard behind for a second, and then we have a holiday about it for some reason. What's interesting is that this is the globe that Colombo, or Columbus, used to plan his journey, and it was wrong. This is a very small world. He got this from a German geographer, Behaim, who actually made this into a globe. If you google this, you can find the page, and they have photographs of the original globe; it's in a German museum still. This is Asia, this is Japan right here, I don't know what some of this is, this is Africa, and there's Spain. And he plotted his journey to get over here, which is where he was going, because there was spice there. It turns out, however, that the world was a lot bigger than he thought it was, and there was a lot more ocean there, and he was lucky there was another continent in the way; otherwise he would have died of thirst and starvation and never made it to Asia, because the Pacific Ocean is big. Has anybody sailed across it? It's a pretty big ocean; it's like half the planet. So he was kind of lucky in that sense. But he had this view of the world, what I call a small world, which he used to plot his journey, and this was his model. Now of course, back then we didn't have satellites, so this is what he had to use. And the reality was that if you superimpose the modern continents, it's something like this. It's actually hard to do, you've got to stretch the globe to make space and such, but vaguely like that. And so he ended up making landfall in the Bahamas, we think; I don't know if people have ever figured out exactly where he made landfall. The world was a lot bigger than he thought.

So I want to use this historical story of a colossal mistake of measurement, which is sad in historical light, because scholars in Egypt, around the time of Ptolemy, had made a much better estimate of the size of the earth and knew that the earth was bigger than that. But then the Europeans forgot all about that work that happened in Egypt way back when. Sort of sad. But this contrast between the small world and the large world is part of the standard vernacular of Bayesian inference, thanks to L. J. Savage and the book he published in 1954 on Bayesian rationality. The small world is where probability lives. It's our representation of what we're trying to learn about, and it depends upon assumptions, just as Colombo's little globe was a small-world view of what the world was. And you have to use small-world representations, make models, to make predictions; you have to, there's just no other option for learning about the world. But what we really want to learn about is the large world, and we only learn about it through iteration: comparing predictions made with small-world devices to measurements made in the large world. The mismatch between them is an iterative way to make our models better and improve science.
So we've always got this problem. We're going to spend a lot of time in the small world in this course, and it's enough work just to understand what the models mean in and of themselves. Then we've got this secondary issue of checking the inferences of the models against what we think is going on in the real world. So we're going to alternate between small-world and large-world contrasts. And I emphasize this; it seems like a silly thing, and you're nodding like, well yeah, of course, why are you bringing this up, everybody knows this. But very often people will say, well look, the confidence interval of my model is really narrow, so the parameter value must be in that range. But that's conditional on the model being true. It's a small-world inference. Everything your model says is only true in the tiny logical world, the false world, of the model's own logic. You still have to test it against reality. It doesn't matter how certain your model is; it could still be wrong about the large world. So we're going to come back to this. And I'm very sympathetic to the idea that it's easy to forget, because the small world is so comfortable and relaxing. It's also perfect, and everything works, at least sometimes, when your software works. The large world is hard, and it may be that in our lifetimes we never quite get to the destination we want. But we're going to keep coming back to this, and the good news is that we do make a lot of progress.

So, I've got about 15 minutes; I think I can get through the garden of forking data. Now we're going to build up Bayesian inference, and I'm going to do it in a narrative fashion. Probability theory is just counting. That's really all it is. It's not glamorous. It's just a convenient method of counting an infinite number of possible things, because otherwise it would be hard to count. So let me build that up in a simple toy example that will be nothing like your later data analysis problems, and then I think you'll see the logic of what Bayesian inference is. I want to do it this way because it's supposed to show you that this is an extremely powerful approach. It's perfectly logical. It is the optimal way to construct inference in the small-world context; there's no other mechanism of updating information that could beat it. Nevertheless, it's not glamorous, because it's a garbage-in, garbage-out logical process. It always depends upon assumptions. The things we're going to count in probability theory are events nominated by models, so they depend upon our assumptions, and there's no way out of that loop except to check against the large world. That's why precision in the small world doesn't matter by itself, but we want to do as good a job in the small world as possible, so we're going to try to learn Bayesian inference as best we can.

I'm going to use as a launching point a short story from Borges. A number of people know this story, "The Garden of Forking Paths". No? You should read it. Google it, read it tonight, you'll thank me, hopefully. And if you don't like it, you're a bad person. Great story. You don't have to read it in Spanish, by the way; there are lots of great English translations. Anyway, I'm not going to use the story in particular, but the way the narrative in the story unfolds: it's a short story about a book that has a bunch of different alternative plots, like a choose-your-own-adventure thing, that branch out through time. It's a great short story.
It's also about spies and stuff, which always makes it sexy, right? So Bayesian inference is like this, in the sense that when we do Bayesian inference, we imagine all the possible things that could happen, then we look at what did happen, and we eliminate all the paths that are inconsistent with what actually happened. But it is just that, and I'll say it again: it's a process of nominating, according to our assumptions, all the things that could happen, all the possible plots, all the possible data sets that could arise. Then we look at what did happen, and we see which of the paths are consistent with it; typically there's more than one. And that's all there is. We do that for each of the possible conjectures, the models that are meant to explain the events in the world, and we ask which has more possible paths remaining that are consistent with what we've observed. Those counts of possible paths, and we're going to walk through this in the next few slides, so bear with me, those counts of possible paths become probabilities once they're standardized. That's all they are. We don't actually want to count stuff directly, because there would be billions of paths; as you'll see, the combinatorics of these things explode. So we standardize to probability, so that everything sums to one, and then it's easy to do math with it. And that's really all it is. It's completely unglamorous, and it's awesome at the same time. It really is.

So let's think about this in, as I said, a simple toy example. This won't resemble the substantive data analysis problems you're going to do later in the course, but again, I think you'll see the symmetry. Let's imagine I've got a shopping bag, and it's got four marbles in it. Forget the backstory; there isn't one. Later there'll be real data. So there's the mystery bag, and it's got four marbles, and marbles come in two colors, blue and white. Why? Because it's a toy probability story. What you do know is that there are five possible contents of the bag, because there are only four marbles and there are only two colors of marbles: they could all four be white, one of them could be blue, two of them could be blue, three could be blue, or they could all be blue. Agree? Those are all the possibilities. We'd like to know the contents of the bag after observing three draws with replacement. So I have one of you reach into the bag, pull out a marble, look at it, someone writes it down, then you put it back, we shake the bag, and pull out another one; we do that three times. Say we observe blue, white, blue. What are the relative plausibilities of these different possibilities? How can we estimate the contents of the bag? This is a representation of Bayesian inference and of how probability theory works in these things. It's also consistent with other, non-Bayesian ways as well, but we won't worry about that as we go.

Okay, so one way to think about this is that we plant the garden of forking data: we figure out all the things that could happen. Let's take a single conjecture about the contents of the bag, one blue marble and three white marbles, and let's consider only the first draw from the bag. What could have happened? Well, there are four paths in the garden of data: there's one way to get a blue, and there are three ways to get a white, because there are three white marbles, and they look the same to us from the way we write down the data, but they're actually different marbles. Agree?
Then on the second draw, each of those paths has four paths of its own, because the draws are independent. That's an assumption; maybe they're not, maybe there's marble magic going on and they stick together. It's an assumption. So there's one path that gives us a blue marble the first time, and from it there's one way to get a blue marble the second time and three ways to get a white marble. And there are three ways to get a white marble the first time, and for each of those there's again one way to get a blue and three ways to get a white. So now we've got four times four, sixteen paths, sixteen possible data sets, some of which look the same to us; this will be important in a second. And you can see the combinatorics here, and why we don't like to count things, because for the third data point there are going to be four times four times four, sixty-four paths. It's the same idea. This is the garden of forking data: all the possible data sets you could get in three draws, and all the numbers of ways to get them.

Now let's take the actual observations and eliminate paths. Conditional on the bag containing one blue marble and three white marbles, and having observed blue, white, blue, how many ways are there to get that under this assumption? We just eliminate stuff, and you can see the first thing that has to happen is we get the blue on the first draw, so all the white-first paths are eliminated. Then there are three ways to get a white marble on the second go, and those three ways stay alive. For each of those, only one way stays alive on the third draw. So there are three paths consistent with the data. With me so far? This will look more useful as soon as we look at the other possible contents of the bag; then we can do comparisons.

So let's do that; we're going to contrast these different conjectures. We've only done the second possible contents so far, where there are three ways to produce the observed data. For the others, the first thing I want to assert to you, and I don't think I need to draw the trees, is that the first and last are impossible, because we've observed at least one blue and one white. So there are going to be zeros in there; there are no paths for those that are consistent. Makes sense? Sometimes you're lucky, and that's true: there are some theories which are completely incompatible with your data. Usually, when we do science, it's harder: there are a bunch of trajectories which are consistent with your data, but to different extents, because different numbers of ways are consistent with the observations. So I'm showing you again our previous garden, but now I've made it a wedge in the upper left; this is the thing I showed you before. Then let's consider the next one: suppose the contents of the bag were two blue and two white. Now two paths survive on the first draw, because the first draw was a blue marble; then two paths each can give you a white marble, and then two paths each of those give you a blue marble. There are eight ways to observe those data. So it's more plausible than the first one, actually. Now, it's not a slam dunk, eight ways versus three ways; I wouldn't bet my house on it, and I'd bet your house before mine. We don't have a lot of evidence. And for the final one, three blue marbles and one white, I won't go through the exercise, because I think you get it: there are nine ways consistent with the data, which is slightly more, not a big difference. These will become plausibilities in a second, when we standardize them. They are the relative numbers of ways that each of these hypotheses, if you will, each of these conjectures about what's in the bag, could produce the observed data.
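If you want to check those counts without drawing the whole garden, here is a minimal sketch in R; the function and variable names are my own, not from the lecture:

```r
# Count the paths in the garden of forking data for each conjecture.
# A bag holds 4 marbles; a conjecture is the number of blue marbles in it.
conjectures <- 0:4
observed <- c("blue", "white", "blue")  # the three draws, with replacement

# With b blue marbles, a single draw has b ways to be blue and (4 - b) ways to be white.
ways_one_draw <- function(b, color) if (color == "blue") b else 4 - b

# Draws are independent (that's the assumption), so multiply across the draws.
ways <- sapply(conjectures, function(b) {
  prod(sapply(observed, function(color) ways_one_draw(b, color)))
})
ways
## [1] 0 3 8 9 0   (the counts from the lecture: zero, three, eight, nine, zero)
```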
And this is all Bayesian inference really does, but it's going to be abbreviated into a mathematical formalism which saves you all the effort of doing this counting. It's actually way easier than this. But the risk there is that it becomes superstitious sometimes, right? It seems magical, magically rational, and it's producing all these optimal inferences in the small world, and it does do all that stuff. But you have to remember that it has this unglamorous beginning: make some assumptions, count up all the ways you could observe what actually did happen, conditional on your assumptions, and compare the different conjectures. That's all it is. It's amazing that it's so useful, given that it's just done by counting like this. But this is an amazing form of inference, and I don't have to convince people taking this course that Bayesian inference does a lot of powerful things. That power, though, depends upon the assumptions being useful; not upon them being true, but upon them being useful in particular contexts.

So let me do some summary. One thing that follows from this is that if events are independent, you can multiply to get the total number of ways, because multiplication is just a shortcut for repeated addition, and that's all it's doing here. So we've got zero ways for that one to happen, zero ways for this one, and then three, eight, and nine, so those are more plausible by some relative amount. You're like, well, it's not a big difference, and you're right. But that's one of the nice things about Bayesian inference: an estimate and its plausibility, the strength of evidence, are the same thing. It's all in the relative values of these numbers, and typically we standardize them.

Before we do that, and I've only got a few minutes here: you can update as new data arrive, and you don't have to repeat the whole calculation. This is called Bayesian updating, and you'll get a more formal introduction to it when you come back on Thursday, where we'll do it the textbook way; again, we're just in narrative form right now. Same conjectures; say we take one more draw from the bag and get another blue marble. You know the ways to observe one blue marble under these conjectures: zero, one, two, three, and four. I don't have to draw gardens for that. To get from our previous counts of numbers of ways to the new counts, we just multiply the two together, and now we start to get some divergence among them. Every additional data point changes those relative counts in the same predictable way. That's Bayesian updating, and it's an optimal way of learning in the small world. In the large world, there is no optimal way of learning; I'll try to unpack that later in the course, but I think it's often true.
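As a minimal sketch of that multiplication, with variable names of my own:

```r
# Bayesian updating: no need to recount the whole garden for the new draw.
prior_ways <- c(0, 3, 8, 9, 0)  # ways to produce blue, white, blue (from before)
new_ways   <- 0:4               # ways each conjecture yields one more blue marble
updated    <- prior_ways * new_ways
updated
## [1]  0  3 16 27  0   (the counts start to diverge)
```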
One way we think about this is as the use of prior information. Sometimes we receive different kinds of data, and we've already got a previous analysis which has given us relative plausibilities of the different conjectures, and we'd like to update those. Bayesian inference is really good at this; it's one of its major selling points. There are things about it that are not good selling points, like that it's computationally difficult, but the ability to mix and match different kinds of information, and to use prior knowledge of the system in our statistical analysis, is a huge advantage, I think. So imagine in this case, for example, that we've done the previous analysis, and then somebody tells us: you know, at the factory, the blue marbles are rare, so in the production process we make sure that every bag contains at least one. This changes the relative ways to get the different bags. And as long as these numbers have the correct relative values, the relative numbers of ways for these conjectures to arise in sampling, we can use them as prior information and update them with our data analysis. In this case it doesn't matter which comes first logically, prior or after: take the ways we had before, multiply those by the factory counts, and now we get a switch, where this one seems more plausible, because those kinds of bags are produced by the factory more often. Now, this is all absent any real scientific context, so it might not seem completely relevant to you, but that's by design, so that you get the basic logic. Once it's in the middle of a scientific context, you have to worry about measurement error and all that other fun stuff, which is essential, but the purely logical exposition here is just meant to be clear. Does this make sense, a little bit? Okay.

Last thing, and then I'll let you guys go; we'll pick up on this slide when you come back on Thursday. The punchline for now is that the unglamorous basis of all applied probability, not just Bayesian, but all applied probability, is this: you imagine a small-world system where you can count events, and there are different possible processes that could be producing those events. If you want to make continuous plausibility inferences about which of these processes is consistent with the data, and you want to be logically consistent with your assumptions, there's only one way to do it, and that is to count up all the ways you could see the data according to each process and compare those relative weights. That's all there is. Probability theory is just standardizing these counts so that, across different systems, the relative counts always sum to one. So how do we do that? Here I'm just going to relabel each of these conjectures with a number, which is the proportion of blue marbles in the bag; this will later become a parameter, and we'll be excited by that. Take the ways to produce the data: if we sum up this column and divide each value in it by the sum, we get plausibilities, which are in fact probabilities. Probabilities are any non-negative real numbers where the set of them sums to one, and then all the axioms of probability apply. That's all probability theory is. Some of you already knew that, but this is the superstition-free way. All right, I've held you guys over by 30 seconds, I apologize. Come back on Thursday and we will cruise into some computation.
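For completeness, here is a minimal R sketch of those two last moves; the factory counts follow the book's version of this example, so treat them as an assumption here, and the names are mine:

```r
# First, fold in the factory's production counts as prior information.
prior_ways <- c(0, 3, 16, 27, 0)  # counts after the extra blue draw, from before
factory    <- c(0, 3, 2, 1, 0)    # assumed production counts: blue marbles are rare
prior_ways * factory
## [1]  0  9 32 27  0   (the switch: two blue is now most plausible)

# Second, standardize counts into probabilities: relabel each conjecture by the
# proportion of blue marbles, p, then divide the ways by their sum.
p    <- (0:4) / 4
ways <- c(0, 3, 8, 9, 0)          # ways to produce blue, white, blue
ways / sum(ways)
## [1] 0.00 0.15 0.40 0.45 0.00   (non-negative, and the set sums to one)
```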