This is a statistics course, but it's really a course in subordinating statistics to science. The reason is that the audience I want to speak to got into science not because of data analysis, but because of some fascination with natural phenomena: like this dinosaur, or the seeds it's eating, or the person who's feeding it for some reason. And our interest in these things very quickly turns into data, because that's how we systematically record the variation and the interactions and dependencies among these organisms. To make sense of those numbers we need mathematics, and then all of that mathematics we've ignored since secondary school, throughout college and our basic training in science, becomes relevant again, and even necessary. We know this is true of very large-scale phenomena like the climate and weather, where there are massive amounts of data. But it's also true at the small scale, in the humble sciences, the social sciences. To make sense of people's relationships and their needs and their conflicts, we also need lots of data, highly detailed data at all scales, so we can make sense of how societies evolve.

So you get all that data, and then what? Well, in introductory statistics courses, what you do is consult a flowchart and pick a procedure, which gives you a p-value. The purpose of all the procedures that populate a chart like this is to test a null hypothesis, to reject a null hypothesis, hopefully. The research hypothesis itself, the target of your scientific investigation, is anonymous in these charts. And this has led to a lot of dissatisfaction when you move on to do more fundamental research in your field, because the procedures in these introductory courses don't really address fundamental causation in any satisfying way, and they're highly inflexible compared to the very large domain of hypotheses that scientists consider at the cutting edge of their fields. So what do you do instead?
There are lots of options, but I've prepared 20 lectures that present the one that I like. This is the course that I wish I had had in graduate school. In these 20 lectures, I try to rethink the role of statistics in research, whether that research is scientific, or takes place in industry, or is just your hobby. The idea is that we want to fully integrate statistical modeling with scientific modeling and scientific thinking, so that they go together and are woven together throughout the course of a study.

The rethinking is not the Bayesian part. This is a Bayesian course, and you will learn to do Bayes, and we will do everything with Bayes. But Bayes is no longer some radical alternative to mainstream statistics. Bayes is mainstream. It's mainstream in every major science; every major scientific publication features Bayes now. Everybody's heard of Bayes. Everybody has software and a computer to do Bayes. It's not unusual anymore. Bayes is mainstream.

In this first lecture I want to talk about the content of the course and its framing by talking about three different kinds of creatures. The first are golems, the second are owls, and the third are dogs. Or, excuse me, DAGs. Let's talk about golems first. Our tale of statistical modeling is going to start in sixteenth-century Prague. Prague in the sixteenth century was the place to be in Europe.
It was a very fancy center of an empire. Emperor Rudolf II, I think his name was, was a big fan of the arts and the sciences, and tried to lure all sorts of intellectuals and performers to the city, often successfully. Like many important cities in Europe at the time, there was a substantial Jewish minority, and that Jewish minority was persecuted. As legend has it, Rabbi Loew, the rabbi in charge at the time, constructed, using the Kabbalah, a defender of the Jewish people of Prague: a golem, which is a mythological clay robot that would defend the people. This legend has an unhappy ending, unfortunately, because the point of the mythology is that toying with the power of creation is risky. The golem, while quite powerful, had no intellect or intent of its own, and ultimately became too dangerous to maintain and was decommissioned, as it were, by the rabbi who created it. Prague today, and if you have a chance to visit it you should, uses the golem legend as a tourist lure, and there are golem statues and pastries and cookies all over the city. It's a very nice city.

Our interest in the golem legend is that it is a metaphor for statistical modeling. The golem is powerful, much more powerful than its creators, and it can do things for them that they cannot do for themselves. But it has no intent of its own, and unless you're extremely careful with the instructions you give it, it can injure you or the people you care for. So let me review the things you want to know about golems. They're mythological clay robots. They're extremely powerful. They have no wisdom or foresight; they merely carry out their instructions. And so your instructions must be extremely precise, otherwise you will encounter great danger and peril. The same is true of statistical modeling, unfortunately. Each of the little procedures in that flowchart I showed earlier is a little robot, a little clay robot.
I use clay here playfully; I mean silicon, really. Most statistical models these days are instantiated only in computers. We don't do them with pen and paper anymore, or with slide rules. And these little clay robots are very powerful. They're very good at stuff that people are not good at; this is of course why we like mathematics and computers. These logical procedures can very quickly and accurately do things that people have a lot of trouble doing. So they're powerful. But they have no wisdom and foresight. Their purpose and their results are entirely up to our wisdom in choosing to apply them and interpret them correctly, and therefore they can be quite dangerous when misused. In the first chapter of the book that goes along with these lectures, I discuss this example, the chart of little golems in Figure 1.1, and the flaws with it. I'm not totally against using heuristic charts of this sort; I just think we have to be very careful. I don't think they're very useful in advanced research. They're useful particularly in highly controlled and even industrialized contexts. The procedures in these charts are incredibly limiting.
They're good in, and only in, very limited circumstances. And each of them focuses on rejecting a null hypothesis instead of a research hypothesis. If you think that falsification and the rejection of hypotheses is a good philosophy of science, then I agree with you. But unfortunately these procedures get it backwards. Karl Popper and the philosophy of falsificationism were not focused on falsifying null hypotheses, but on falsifying research hypotheses, by understanding their implications and comparing those implications to data. And we'd like a way of doing statistical work that helps us do that. In these procedures and in these charts, the relationship between a scientific hypothesis and the test is really not clear, because the research hypothesis is not represented in any transparent way, only the null. These procedures are not always bad; they're good in industrial frameworks or highly controlled experiments.

So what we want to do instead is be very careful about the relationships between our scientific hypotheses, the scientific process models that emerge from those hypotheses, and the statistical models we use to connect those process models to evidence. So here's Figure 1.2 from chapter 1 of the book. I'm going to step through this figure one piece at a time, so you can understand the point I make, through an example from population genetics. And don't worry if you're not a population geneticist or have no interest in population genetics; I'm sure you can think of some analogous situation in your own field, because the situation I want to present here is actually quite commonplace. So let's start with one hypothesis and walk through the connections to process models and statistical models.
On the left here we have a column for hypotheses, and the hypothesis I have shown is that evolution is neutral. This has been an ongoing debate for a long time in evolutionary science: the extent to which the sequence of base pairs in DNA is to be explained largely by selection or by other evolutionary forces. Neutrality is a very hard position on that: that basically just mutation is what explains most patterns of variation. I have a squiggly outline on this because hypotheses like this are quite vague. To do any work with them, you have to map them, in the middle column, to some kind of process model. What I mean by that is a real scientific model that has logical causation in it: there are entities that you instantiate, and you say which entities affect other entities, and what the consequences of those things are. I have one example on this slide, and that is a process model that is neutral, meaning there's no selection, and at equilibrium, meaning that the population size stays the same. This is a very particular kind of null model, or neutral model, that was used to try to understand evolutionary dynamics in the twentieth century. And then from this process model you can construct one or more statistical models; I just have one example here on the right. The reason that process models and statistical models are not the same is that statistical models examine associations.
They don't really have causal forces in them. Instead, what you have to do is study process models, so that you know which statistical implications to look for: implications that will tell you about causal forces in the process model, help you estimate their strength, and so on. But you need both. Statistical procedures, for example, might look at distributions. And in this case, that's the way the story unfolded in the twentieth century: the neutral equilibrium model implies a particular distribution of the frequencies of different alleles in the population, a power-law distribution if I remember correctly.

But you can have other neutral models. Here's another process model that also maps onto the idea that evolution is neutral: the neutral non-equilibrium model. You can have no selection, making it neutral, but the population size can still fluctuate. Almost all wild populations of animals and plants have fluctuating population sizes over time, sometimes radically so. This gives you a different statistical expectation than the neutral equilibrium model does, so in that sense you could actually test these two against one another, because they imply different statistical distributions.

Now let's consider a different hypothesis altogether: that selection matters. We are all prepared to accept that selection matters, but the question is to what extent it matters for the frequency distribution of alleles in wild populations. Everybody agrees that selection matters for design. And again, we find that there are multiple process models attached to this hypothesis. The first would be constant selection, and another would be fluctuating selection. The distinction here is that in constant selection, some trait is good and it's always good; in fluctuating selection, a trait may be good in one season or one year and then not in the next. Biologists argue about the extent to which selection fluctuates, but everybody agrees it fluctuates to some extent.
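To make the contrast between the two neutral process models concrete, here is a minimal sketch of my own (not from the lecture; the population sizes and parameters are invented for illustration): a Wright-Fisher-style simulation of pure drift, run once with a constant population size and once with a fluctuating one. Smaller populations drift faster, which is one reason the two neutral models carry different statistical expectations.

```python
# Minimal Wright-Fisher-style sketch of neutral drift (illustration only).
# One biallelic locus, haploid population, no selection: each generation,
# the allele count is a binomial draw from the current frequency. We compare
# a constant population size (the "equilibrium" neutral model) with a
# fluctuating one (the "non-equilibrium" neutral model).
import random

def drift(generations, sizes, p0=0.5, seed=1):
    """Simulate an allele frequency trajectory under pure drift.
    sizes: population size per generation (constant or fluctuating)."""
    rng = random.Random(seed)
    p = p0
    trajectory = [p]
    for t in range(generations):
        n = sizes[t]
        # Binomial sampling of the next generation's allele count.
        count = sum(1 for _ in range(n) if rng.random() < p)
        p = count / n
        trajectory.append(p)
    return trajectory

G = 200
constant = [500] * G                        # equilibrium: size never changes
fluctuating = [500 if t % 20 < 10 else 50   # non-equilibrium: boom and bust
               for t in range(G)]

traj_eq = drift(G, constant)
traj_ne = drift(G, fluctuating)
# Drift is stronger in small populations, so the fluctuating-size trajectory
# typically wanders farther from p0: a different statistical signature.
```

This is only a caricature of the real models, but it shows the structure of the argument: the same hypothesis ("evolution is neutral") yields different forward simulations, and therefore different statistical expectations.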
Well, now here's the moral of the story. Many fluctuating selection models generate statistical distributions that look like neutral models, and so you can't easily tell them apart just by looking at the frequency distribution of alleles. The saga that this slide represents is over; it happened in the twentieth century. But it's a general lesson that has reemerged in a number of fields. Ecology went through the same drama with neutral versus niche models, over the extent to which species are really different from one another. Again, a model with niches and a model without niches can produce the same frequency distributions of species, and so on.

The point I want to get across is that in situations like this, at the cutting edge of research with complicated natural phenomena, there is no unique null hypothesis. Instead we have to be very careful about what process model we're talking about, what its connections to hypotheses are, and which aspect of its statistical implications we're going to use to test it and to contrast it with other process models. And that's not always easy. There are lots of things in my own work at my institute that have this inconvenient feature, but I've seen examples in all kinds of fields. Take the ones I'm most familiar with. In phylogenetics, there is no such thing as a null phylogeny. What does it mean to randomly permute species on trees, or traits? So much of the structure of evolution is already baked into the data we have, and so you have to have a non-null process model to make any sense of phylogenetics. The same goes for ecological communities: there is no uniquely null ecological community.
One does not simply permute a community of plants and animals. And, structurally similar, social networks in people and other animals: again, there's no unique null to randomize to. Instead we need process models, and to study the implications of those process models. So I hope that I've at least for the moment persuaded you that we need more than tiny null robots, tiny null golems. We also need precise process models, and then we need some set of statistical models, hopefully justified from those process models and their implications, so that we can get at some particular scientific question: what I'm going to call, repeatedly in this course, the estimand. What do we want to know? What are we trying to estimate in the first place? So that's golems. Golems are going to keep coming up in the course over and over again, and I hope you don't get tired of them.

The second creature I want to talk about today is owls. Owls have an outsized role to play in this course, because we're going to draw a bunch of owls, metaphorically. There's this joke on the internet, and I know some of you already know it, that goes like this. You have a how-to guide, a step-by-step guide on how to draw an owl. A very useful thing. Step one is to draw some circles. So we have here our first step: two circles sketched out, one circle for the head and an ellipse for the body. And then step two is: draw the rest of the owl. And that's the joke. The point of the joke is that this is not very useful. There's obviously a bunch of steps missing in the middle here. We need to draw some more circles, and draw a branch, and draw some features of the face, and then do some shading, and finally put in the detail.
There are lots of steps to drawing an owl. Technological skills are often like this, because the how-to guides leave out a huge number of steps, and often the experts in those skills aren't aware of all the steps either. This is why it's so nice to learn in close proximity to experts: they are not always consciously aware of their expertise and all the steps that go into it. So in this course, I want to really draw the owl for you, as much as possible. That means coding, and it means forcing you to express statistical models and scientific models in code, in detail, so that the assumptions are not hidden and we can talk about the connections among their pieces. And that means slides like this: we're going to take what would normally be just a command in a stats program, and we're going to program it by hand. This is not a particularly complicated calculation; it's just the calculation of a Bayesian posterior distribution. But we break it down into five steps and walk through them: setting up the calculation, calculating the prior, calculating the likelihood, and finally the posterior, the normalized posterior distribution. That's how we draw the owl. We'll also have detailed expressions of statistical models. So if you're accustomed to expressing statistical models like regressions with a single formula, one line of code, that's not going to happen in this course. But it's for your own good. We're going to express everything that goes on inside, and that will tell you how to draw the owl.

Why do I want you to draw the owl? Because I realize drawing the owl every time can be a little annoying. There are three motivations for drawing the owl in this course. The first is to help you understand what you're doing, so that you're not trusting in somebody else's canned procedure, the black box of the little golem. And I think if you're taking this course, you want to understand what you're doing. Second, there are selfish incentives beyond understanding: if you document your work, future you will thank you. I'll say that again: if you document your work, then future you will thank you, because errors will have been avoided. Carefully documenting your work is like reviewing your work as you go, and this removes some kinds of error, not all kinds, but some kinds. You also get to reuse your work in careful ways in the future when you document it. Point-and-click interfaces do none of these things; they leave no trail of breadcrumbs for you to review later. And let's face it, we all have complicated lives; we can't remember exactly what we did. But if you document your work, by drawing the Bayesian owl as I say, it'll all be there for you to return to. And then finally, we want a respectable scientific workflow. What I mean is that it's not enough for you to trust your own work; you have to work in a way such that others can trust it. And that means having a documented, orderly, justifiable scientific workflow that involves setting up your scientific hypotheses, connecting them to scientific models, and then finally to statistical results.

Okay, so the outline of the scientific workflow that I'm going to reuse in examples through the next 19 lectures has, I think, five steps. Let's count them out and see if I remember correctly. The first is to have a clear idea of the theoretical estimand, that is, what you are trying to do in the first place. We'll have a number of different kinds of examples. It may seem silly to have this as step one, but it's often quite hard to tell what a scientific study was trying to do in the first place. All too often you have a vague, metaphorical connection between research buzzwords and some data set, and then some figures are drawn, and that's not good enough.
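To give a taste of the five-step posterior calculation described a moment ago, here is a minimal grid-approximation sketch in Python (my own illustration; the course itself works in R, and the data here, 6 "water" observations in 9 tosses of a globe, are assumed purely for the example):

```python
# Drawing the Bayesian owl by hand: a grid approximation of a posterior,
# broken into the five steps named above. Assumed example data:
# W = 6 successes in N = 9 trials, with a binomial likelihood.
from math import comb

# Step 1: set up the calculation with a grid of candidate parameter values.
grid = [i / 100 for i in range(101)]   # p = 0.00, 0.01, ..., 1.00

# Step 2: compute the prior at each grid point (a flat prior here).
prior = [1.0 for p in grid]

# Step 3: compute the likelihood of the data at each grid point.
W, N = 6, 9                            # assumed data: 6 "water" in 9 tosses
likelihood = [comb(N, W) * p**W * (1 - p)**(N - W) for p in grid]

# Step 4: multiply prior by likelihood (unnormalized posterior).
unnormalized = [lk * pr for lk, pr in zip(likelihood, prior)]

# Step 5: normalize so the posterior sums to one over the grid.
total = sum(unnormalized)
posterior = [u / total for u in unnormalized]

# With a flat prior, the posterior peaks near the observed proportion W/N.
best = grid[posterior.index(max(posterior))]
```

Nothing here is a black box: every assumption, the grid, the prior, the likelihood, is written out where you can see and change it.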
We have to do better. Second, you need some sort of scientific causal model, and that'll be step two of drawing the owl. The theoretical estimand will be precisely defined in the context of scientific causal models. And again, these are models that can produce data; they're forward-simulating, logical models that generate synthetic observations and let us design statistical procedures. That's step three: we use the theoretical estimand and the precisely defined scientific models to build statistical procedures that can get at the estimand, or tell us that it's not possible to get at the estimand. Sometimes that's true as well, but that's good to know, because it means we need to find a different way of investigating the phenomenon. Step four: we simulate from the scientific models of step two to validate that the statistical procedure of step three works. This is a way of checking; again, it justifies our workflow, so that our colleagues can believe that our software works. It's also good for your own peace of mind to know that it works. Most of the models in this course are not so complicated that step four is strictly necessary, but nevertheless I'll show you examples, because I think it's something I can teach you to do that you'll be glad you learned. And then finally, step five: we analyze the real data. We've passed through step four.
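Here is what step four can look like in miniature (a sketch of my own, not from the lecture, assuming a toy linear generative model and an ordinary least-squares estimator as the procedure under test): simulate synthetic data with a known effect, run the statistical procedure, and check that it recovers the truth before any real data are touched.

```python
# Step four in miniature (illustration only): validate a statistical
# procedure on synthetic data simulated from the scientific model,
# where the true effect is known, before analyzing any real data.
import random

def simulate(n, beta, seed):
    """Forward-simulate the assumed scientific model: y = 2 + beta*x + noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    ys = [2.0 + beta * x + rng.gauss(0, 0.5) for x in xs]
    return xs, ys

def estimate_beta(xs, ys):
    """Ordinary least-squares slope: the statistical procedure under test."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var

# Validate: simulate with a KNOWN effect and check the procedure recovers it.
true_beta = 1.5
estimates = [estimate_beta(*simulate(500, true_beta, seed)) for seed in range(20)]
mean_estimate = sum(estimates) / len(estimates)
# If mean_estimate sits close to true_beta, the golem works as intended,
# and only then do we feed it the real data.
```

The same pattern scales up: whatever the scientific model and estimator are, you simulate from the model, run the procedure, and confirm the known answer comes back out.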
So we feel that our statistical procedure works: it gives us, in theory, the theoretical estimand that we want. And now we're prepared to put the data in. Notice that we have not designed the scientific models or the statistical procedure conditioned on the data; the data are entered at the last step. But when we get to that point of analyzing the real data, we may realize we forgot something, and that's okay. We can back up and do things again, as long as we document how all those decisions are made.

Okay, before we leave the owls, I want to say a little bit more about being a Bayesian owl and what all that is about. The reason we're going to do Bayesian owls in this course is that it's a very flexible approach. What you see on the screen here is, believe it or not, Saturn as Galileo Galilei would have seen it in 1610. You see the scan of his personal notebook in the lower right here. The telescope he was using, one of the first, was not very good, and it made very blurry images. When you look at Saturn through a bad telescope, or highly out of focus, the rings look like little ears on the side of it, and that's how Galileo drew it. Now, the inference problem here is: what's generating this blurry image? What does Saturn really look like? You know what Saturn really looks like, because you've seen pictures of it since you were a child, but Galileo did not. This is an interesting kind of inference problem, because there's no sampling variation involved. No matter how many times Galileo looked through his telescope, he got the same blurry image. Nevertheless, there's uncertainty about what the generating image is, what the planet actually looks like. So the question I put to you is: is this a statistical problem or not? And you know I'm going to say it is. Of course it's a statistical problem.
It's a Bayesian statistical problem. The Bayesian approach is permissive and flexible. If you've got a data-generating scientific hypothesis, you can analyze it with Bayes. It doesn't matter whether the uncertainty is due to sampling variation or to some other process, like light scattering in a bad telescope. You can express uncertainty at all levels, whether it's measurement or observation or sampling biases or missing data; all those different sources of uncertainty live together in the same analysis. Near the end of the course, I'll show you how to use the Bayesian approach to get rather direct and immediate solutions to measurement error and missing data, expressing them in the kinds of regression models you'll be using throughout the earlier parts of the course. And as I said a little earlier, what Bayes lets you do is really focus on scientific models, because you can go straight from the scientific models to statistical procedures with minimum fuss. You don't have to worry about which kind of statistical estimator you're going to use, or which kind of standard errors, and all those other sorts of decisions. You always worry about the estimand, but we have only one estimator in Bayes, and it's the posterior distribution, which you'll learn about in the next lecture.

Okay, the third critter I want to talk about before I finish this first introductory lecture is DAGs. So what are DAGs? I just said a little bit about Bayes, and most of you know that throughout much of the 20th century, especially the early 20th century, there was this
There was this Competition between the Bayesian approach just to statistical inference and the frequentist approach to statistical inference These giants fighting it out over aircraft carriers and so on I'm not very interested in that fight and the reason is because Both are very capable frameworks and the problem scientists really have is not which of those to choose But that they have no training in how to connect Their causal models to their statistical procedures. And so That's what I want to talk about now. Now that said, of course Bayes is better. This is a Bayesian course But you could teach this course using frequentist tools and it would be Not so different in most places near the end. It would be quite different But in most places wouldn't be so different The important part is the part about the causal inference part connecting our causal models to the statistical procedures So how are we going to do that? Well, the slogan that I often give is to put the science before the statistics and no one's going to complain about that slogan What do I mean for statistical models to produce scientific insights? You really need something outside the statistical model And I've already mentioned this in the in the justifiable workflow when we when I talked about drawing the owl What we need are scientific models or sometimes called causal models models that contain in them Entities that influence other entities and not the reverse The reasons we do a statistical analysis a certain way are not in the data themselves It's not enough to simply have a big data table and then look at it and Ask how many groups there are and so on and get to some meaningful statistical analysis And the reason is because Data tables only have associations And so we believe in causes because we believe in them And that leads us to interpret associations in causal ways And there's no way out of that. 
So you have to have the scientific model, and I'll have endless examples of this as the course progresses. The point of all this is that the causes of the data cannot be extracted from the data alone. There's a philosopher of science, Nancy Cartwright, who has this great slogan: no causes in, no causes out. And that'll be another motto for us in the course.

The other reason to think about causation explicitly here, our big looming problem, is that even description and research design are aspects of this. The models and the information you need to put into an analysis to successfully conduct causal inference are the same information you need to accurately describe a population, or to design a study that gets at the desired estimand. These are all really the same sort of task. And I want to emphasize this issue about description here. This may come up in a few later lectures, but if not, I really want to emphasize it now, because I'm an anthropologist, and a lot of what anthropologists do is describe things. And that's not some sort of low-class occupation; description is fundamental to scholarship. But to do description right, you need causal information. You need causal information about how the sample differs from the population. I'll say that again: to do description right, you need causal information about how the sample differs from the population. And when you're studying humans, at least, the sample always differs from the population, sometimes in very systematic and important ways, and you need to account for those things. The right way to adjust for those sampling biases is to understand what caused them. We'll have examples of these sorts of things in later lectures.

Okay, so what is causal inference? We're approaching DAGs rapidly; we'll get to what a DAG is in a moment.
Causal inference is the attempt to understand a scientific causal model, to understand the pieces of it, using data that may have been produced by it. Everybody's heard that correlation doesn't imply causation. Well, unfortunately, causation doesn't even imply correlation, as we'll talk about later. But there's good news: it is possible to learn about causes from data. Learning those causes just requires more than the association between variables. There are two ways to think about what causal inference is, and this helps us think about how we would find it in data. The first way of thinking about it is that causal inference is prediction, but a very special kind of prediction, and I'll say more about that on the next slide. The second is that causal inference is a kind of imputation, which means a counterfactual imagining of something that could have happened, and I'll say a little more about this later as well. These are the same thing, actually; there's one kind of causal inference we do in statistics, and if you do it right, you can do both these things.

Okay, first let's talk about causal prediction. Prediction and causal inference are really different tasks, but it's nice to think about causal inference as a special kind of prediction: a kind of prediction where you predict the consequences of intervening in a system. Knowing a cause means being able to predict the consequences of an intervention. That is, you do something to change the system and you observe what happens. If you understand the causes operating in the system, you'll be able to predict the consequences of that intervention. So the key question is: what if I do this? Causal inference is getting at that question. The trees are here as an example, a simple kids' example to think through this. If you're inside your house and you look outside and you see the wind blowing the trees, it's not immediately obvious from the observation alone
which is causing which. You believe the wind blows the trees because you believe the wind causes the trees to move, but the movement of the trees and the wind are always associated, so from the association alone it's not clear which causes which. Now, if you were to do an intervention, like getting you and a few hundred of your friends to climb up all the local trees and shake them, it would not cause much wind. That's the prediction, at least. Because you understand which is causing which: it's the wind that makes the trees move; it's not the movement of the trees that generates the wind.

The other way you can think about cause is through causal imputation, which is about counterfactual outcomes. Knowing a cause means being able to construct unobserved counterfactuals, things that did not happen, but such that if they had happened, you could predict the consequences. These are alternative histories: imagine some other country, like China, had gotten to the moon before the United States. How would history have changed? If you understood the causes of history, you could say what the effect of that change would be. So that is the "what if I had done something else," or "what if things had turned out differently." This is the part of causal inference that we often think of as explanation, although explanation is a philosophically difficult term, so I'm going to try to avoid it for the most part.

Okay, finally, DAGs. This course has a lot of DAGs in it. DAGs are heuristic causal models. They're the simplest sort of causal models you can work with, and they're really good for onboarding scientists into thinking about scientific causal models that are distinct from statistical models, models that contain extra information, causal information. DAG stands for directed acyclic graph, and you don't have to worry about what that means right now; we'll talk about it later. But there's an example in the upper right. As I said, these are heuristic models.
You can analyze them with your eyeballs, and that's what I'll teach you how to do. They help you clarify your scientific thinking, and you can actually apply logical rules to them to design statistical procedures as well. That's what we'll start doing in a couple of weeks. For now, just let me try to be a little bit provocative by talking about what you can represent in these and why they're needed. Each letter represents a variable, something you can measure. Well, you don't always have to be able to measure it, but it's a thing that exists and is caused by, or causes, other measurable things. And the arrows represent causes; the arrows point in the direction of causation. So typically in a scientific study, we have some theoretical estimand, like the effect of X on Y, represented by the arrow at the bottom here: the X with the arrow pointing to Y. But to do that, we unfortunately have to consider other kinds of relationships in the system. It's not enough, as everybody knows, just to focus on the two variables of interest; there are other kinds of things going on. First of all, there are other variables influencing the outcome, like B in this particular DAG. And there are other variables influencing the cause of interest, like A pointing at X in this DAG. So here's a question for you, and I assume most people listening to this have had one or more statistics courses: which of these, A or B, should you add to a model when you try to learn the effect of X on Y? Are they the same? Are they different? Should you add them both? Should you add neither?
We'll get to the answer later in the course, but if you're like me, you were never taught to think about it in these terms at all. I'll give you a hint: they're really different from one another, and you should do different things with them.

There are other variables, like C, which are common causes of the two variables of interest. Here C is pointing into both X and Y. C is the classical confound, and we'll talk a lot about confounds in this course. Obviously we need to deal with confounds, because X and Y can end up having a strong association even if X doesn't influence Y at all. And then, finally, the other variables can have relationships with one another, which create complicated problems as well. In this particular graph, the fact that A and B also influence C creates a complicated problem, which I'll talk about, as I said, in a couple of weeks when we focus on DAGs more.

So the practical reason to learn this stuff, these heuristic causal models, is that different scientific queries require different statistical procedures. Even with one process model, one DAG like the one in the upper right of this slide, if you have questions about X influencing Y, or instead about B influencing Y, you would need different statistical procedures to estimate those things correctly. You can't necessarily do it in a single model. So questions like "which control variables should you use?" are really not so innocent once you learn this framework. And just to caution you, and I'll demonstrate this later.
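As another illustrative aside (a toy model of my own, not from the lecture), the classic confound just described can be shown in a few lines of simulation: C causes both X and Y, X has no effect on Y whatsoever, and yet X and Y come out strongly correlated.

```python
import random

random.seed(2)

n = 10_000
# Fork: C is a common cause of X and Y; X has no causal effect on Y at all
c = [random.gauss(0, 1) for _ in range(n)]
x = [ci + random.gauss(0, 1) for ci in c]
y = [ci + random.gauss(0, 1) for ci in c]

def corr(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / (va * vb) ** 0.5

# X and Y are associated purely through the common cause C.
# In this toy model the theoretical correlation is 0.5.
print(round(corr(x, y), 2))
```

Naively regressing Y on X here would report a substantial "effect" of X, which is exactly why confounds have to be dealt with.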
It's absolutely not safe to just add every potential confound to the model. There are things called bad controls, controls that can actually make things worse.

The other thing you want to do with these heuristic causal models is test them, and we can talk about that as well. There are testable implications of any particular scientific process model, and of course scientists have been testing such implications for centuries. Now, DAGs are extremely useful in research, and they're extremely useful in teaching. But eventually, of course, every scientific field aspires to move beyond purely heuristic causal models. If you've got more scientific information, you can represent your scientific models with that information and go beyond DAGs to other things. In the second-to-last lecture of the course, I give some examples of more elaborate process models and how we can do stats with them.

Okay, let me just try to summarize, because these are the three critters that we're going to see over and over again in the course. First, golems. This is my metaphor for the brainless but powerful statistical models that we rely upon. We need golems, we absolutely do, but you've got to design the right golem, and you have to deploy it in a very constrained set of circumstances so that it doesn't do damage. Second are owls.
We're going to draw the owls, and what I mean by this is documented, objective procedures: working in a way that gives us confidence in our own work and also justifies others' confidence in it. And third, DAGs, directed acyclic graphs: a way to make our scientific assumptions transparent, so that we can justify the scientific effort, expose it to critique, and directly connect scientific theories to the powerful golems we're going to use to extract associations from data.

So this is a ten-week course the way I've planned it, although if you're watching the lectures online, you can take as long or as short as you like. I plan two lectures for each week, and in this first week, at the top of the table here, the focus is just learning the foundations of Bayesian inference. If you've got the book and you're following along in the reading, you should read chapters one, two, and three to accompany the first two lectures. In the next lecture, we will focus on really doing Bayesian inference. So I'll see you there.
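One last aside on the "testable implications" idea mentioned earlier. Any particular DAG implies claims you can check in data; for example, in a simple fork X ← C → Y, the model implies that X and Y should be independent once you condition on C. A toy sketch (my own illustration, with a binary C so that conditioning is just stratifying):

```python
import random

random.seed(3)

n = 20_000
# Fork with a binary common cause: the DAG X <- C -> Y implies the
# testable claim that X and Y are independent within each level of C.
c = [random.random() < 0.5 for _ in range(n)]
x = [(1.0 if ci else -1.0) + random.gauss(0, 1) for ci in c]
y = [(1.0 if ci else -1.0) + random.gauss(0, 1) for ci in c]

def corr(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / (va * vb) ** 0.5

marginal = corr(x, y)          # strong association overall
within = [corr([xi for xi, ci in zip(x, c) if ci == level],
               [yi for yi, ci in zip(y, c) if ci == level])
          for level in (True, False)]  # near zero within each stratum
print(round(marginal, 2), [round(w, 2) for w in within])
```

If the within-stratum association did not vanish, the fork-only DAG would be refuted by the data, which is exactly the sense in which these heuristic models are testable.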