Dear sea cucumbers, welcome back everybody, let's get into it. Before I get back into the material, I want to show you how to draw an owl. Surely somebody knows this, but I'll pause... okay, they're all nodding like yes, we've heard this joke before. Okay, why is this here? I assumed you'd all seen this, but I want to show it to you because this course is like that, and I'm sympathetic to that. What I'm showing you how to do in the lectures is to draw some circles, that's what I'm doing, and then I send you off to your homework, where you draw the rest of the owl and you're supposed to submit some owls. I know it's like that, and it just kind of has to be that way. There are things that are hard, and you can show people the outline of the steps, but there's a lot you just have to practice yourself to get right. And we're entering a part of the course in particular where I've drawn a lot of circles for you, geocentric circles if you will, and you can go quite far with those things, and now we're getting into the finer techniques of drawing feathers and such. It is necessary that I skip over a lot of that detail in lecture, but the notes have a lot more detail, and the book has more detail than I can possibly give you in lecture, because of all the little moving bits of these things. There's a lot of code that makes the things I'll show you in lecture happen, and that code is in the book. It's there; I had to write the code to draw the figures, and it's there in the book, but I won't show it to you here. So if you're feeling like, oh my god, how does that work? That's okay, you've got to go read the book, and then you can draw the rest of the owl. That's just how it is. So, as I keep saying, if you're feeling confused, that's only because you're paying attention, and you don't have to understand
everything at once, and your owl doesn't have to be perfect either. Think about this metaphor. I love this joke, I use it constantly, but this is life, right? It's like science. But what it doesn't get quite right about the scientific endeavor is that we don't have a particular target; it doesn't have to look like an owl. We don't know what the owl is, so there's actually a different exploration metaphor here: you're drawing some circles, and it's fine, there's an exploration that goes on. There are many, many right models for an analysis, is the point I want to make. There are many, many owls that would be satisfactory. There's not some perfect platonic owl that you must draw, and there's not some perfect platonic Bayesian model that we're aiming for; we're trying to extract evidence. Okay, so I remind you, when we ended last week I was giving you an introductory example of doing a Poisson regression. Remember, a Poisson distribution is a count distribution. What's a count? Zero, one, two, three, four, five, all the way up to infinity. The Poisson distribution arises when there's some unknown maximum count, we're never going to see it, but the rate of any one item, any one trial, is very, very low, and then we just need one parameter to describe the distribution: the expected count. This is a very handy way to model counts. I had built up the first Poisson regression for you, and we had just gotten to the point where we're going to do posterior predictions, and that's what I'm showing you on the slide. I'm plotting the model predictions against the raw data. This is a very small data set. I like teaching data sets like this, but also anthropology is often like this, right? We're not going to get more data; this is it. This is all the oceanic societies we've got good data sets on, so we go to war with the data we have. And I'm showing you the same posterior predictions on two different scales. On the left you've got the way the model
sees it, so the predictions are done on the log scale. We're interested in how log population is associated with total tools, because there's an underlying theory which says that you get more total tools in proportion to the magnitude of the population size. You get more innovation, is the idea, with larger populations, and so you get more complex cultures, more complicated toolkits. There are two trend lines here because one is high-contact islands and one is low-contact islands. In particular, the dashed line is the trend for islands that have low contact with their neighbors, and the solid line is for high contact with neighbors. That was the interaction effect we wanted in the model: if you're a small society but you're next to a big society like Tonga, then it doesn't hurt you so much that you're small, because you get to steal all the tools. Tonga was big, one of the biggest; you can see Tonga in there. The only thing bigger is Hawaii. And so you'll see that there's definitely, for both, a very strong relationship with log population. The open circles are the low-contact islands in the data set. Hawaii had low contact; you can imagine why if you know your Pacific geography. Hawaii is way out there, it's a miracle anyone found it, it's in the middle of nowhere, there's nothing near Hawaii, and all the other Polynesian islands are near one another for the most part and easy to get from one to the other. Hawaii also has the biggest population historically. So there's a strong relationship with log population, and there's some difference with the contact rates: low-contact islands do have lower expected total tools at any particular size, until you get out to this weird effect at the very high end. There's massive uncertainty, you see the compatibility intervals, there's massive uncertainty out there for high-contact islands, because there are no high-contact islands with large population sizes. So you see that
the compatibility interval just explodes. We're looking at the upper right of the left-hand figure, and the model just has no idea where it's going up there, but it lets the black line cross back over. So the model is essentially saying, yeah, it's possible, given all the assumptions you've made, that at very, very large population sizes, high-contact islands actually have fewer tools than low-contact islands. I want to assert that this is weird. It's a very weird feature of this model, and it seems wrong. It seems like the model should be constrained somehow not to produce that prediction, but it does. Now I'm showing you exactly the same predictions on the right, but on the natural scale if you will, where the horizontal axis is population by head count, just counting people. Same vertical axis, same curves, but now they're squished and transformed, and you see the same things: you see that switch, you see the massive uncertainty in the compatibility intervals on the high end, and Hawaii is still out there. Last thing I want to say about this: when you look at the text at the bottom of the slide, I say point size is proportional to this thing called the Pareto k diagnostic value. What is this? This is where you draw the owl; all the code is in the text to do this. This comes from the Pareto-smoothed leave-one-out cross-validation metric. If you calculate the expected out-of-sample predictive accuracy of this model using the Pareto-smoothed leave-one-out cross-validation metric, it gives you, for each point in the data set, this Pareto k diagnostic value, which is like a measure of leverage. It's a measure of how much force each point is exerting on the posterior fit. This is a really handy thing to look at; you get an idea about which points are making the model sensitive in prediction. So what it's saying here: I've scaled the points so the big points are the ones exerting high leverage, and you can see Hawaii is the biggest. You can probably look at this
picture and see why Hawaii has high leverage: it has an order of magnitude larger population than all of the other historical oceanic societies, and so it's the only thing informing what happens on the really high end. There's not much else to do about this, because again, we can't invent another island society, but now you know. So what might you be motivated to do? Well, I don't think you want to drop Hawaii from the analysis, but you might drop it playfully to see what happens, and maybe I'll give that to you as a homework problem at some point. You can probably see that there's a trend in the lower ones despite Hawaii; Hawaii is just doing a lot of work on exactly positioning the high end. Okay, questions about this? Just draw the owl, right? There's a lot of text, there's a lot of code in the book, in particular for drawing both of these figures; get an idea of what's going on. Before we leave this example though, let me try to summarize a couple of the criticisms I have of this model, and these are criticisms that extend in general to generalized linear models. Now, I like generalized linear models. They're unreasonably effective, like all geocentric models are unreasonably effective, in a wide range of circumstances. You give me some random variables and tell me nothing about them, and I can fit a GLM and describe associations. Yay, major paper, right? You can do a lot of work with generalized linear models, but they generate a bunch of anomalies. If you know things about the variables that are external to the data set you're given, generalized linear models often produce ridiculous effects. I think there are a few things in this case that are like that, symptoms of something I like to call generalized linear madness, which is: if the first time you have drawn up a quantitative model is when you start doing the statistics, chances are something ridiculous is going to
happen, because there's scientific knowledge that is missing. This is what I call generalized linear madness. So this model is terrible, even though it's geocentric: it's describing the relationship, sure, but it's got some things in it that don't make sense. The first thing, and the most egregious I think, is that the intercepts don't pass through the origin. What do I mean? When these lines come down, they cross zero population anywhere they like, and this is impossible given the nature of the data. It's got to be true that, for any real relationship between total tools and total people, the zero and the zero go together: you've got zero people, you've got zero tools. I assert this based upon the physics of how monkeys work: zero monkeys on an island, zero tools. If you add one person, you might get a tool, and then it goes up from there monotonically. It's got to work that way; that's the way the theory has got to work. And this generalized linear model does not assert that, because it just has a free intercept, and the intercepts go anywhere the data tell them to go. It doesn't have the physics anchoring it to the point where it has to cross, which is the origin of zero and zero. Does that make sense? Now, this isn't a total disaster. We are learning that there's a general relationship between population size and total tools here, but it itches my brain immensely that this is going on. There's also this weird thing where the lines cross at the top; that bothers me a lot too. It seems like the model is capable of doing things that really don't make sense. What if we stopped for a second and thought scientifically instead of statistically? So let me write out the simplest model I could think of for this system. What we want, if we're going to have a scientific model of data like this, is a dynamical systems model: a model that says from one time point to the next, individuals are inventing tools
and they accumulate in the population, and there are also processes that lose tools. At some point these processes balance, and then we get an expected number of tools, a technological sophistication of a society. That's a basic cultural evolution model, the simplest we can do. We start with delta T, which is the change in the number of tools in some generation, a time step, and there are innovation inputs. P is population size, and then there's a rate of innovation, alpha, that is per person: how many tools can a person invent? This is inventions, right, new tools people come up with. You've got to slice an avocado, and it's really frustrating to do it with a knife and a spoon, right, you ever done this? And so you invent some new fancy hoop or something that can just slice through the avocado and take the seed out. These things exist; they're terrible, just use a spoon, but people invent these things. That's very clever, that's your alpha, your contribution to posterity: you've invented the avocado slicer. And so the alpha-P part is just the basic assumption, the verbal model itself: each person has some chance of inventing something, so as you get more people, you get more tools, you get more innovation. This is how economies work: if you add more inventors, you get more inventions, and there's a proportionality there we have to estimate. And then what's the beta for? If you're an economist, you call this an elasticity. This governs the diminishing returns on how numbers of people create innovations. Think of it this way: there are saturation effects; you can just get lazy, because someone else will invent the avocado slicer for you. So beta controls the diminishing returns: each additional person contributes fewer innovations than the previous one, because there's some diminishing returns effect to innovativeness. If you're the
only person on the island and you've got to invent everything yourself, you're really motivated, and then you add some more inventors and you start getting lazy. It's like a lab. So we need to measure alpha and beta, they're parameters, but P is data. And then there's the loss: people forget stuff, and tools are lost, either because people forget them and the technique is lost and can't be reinvented, or because you don't need to use them anymore; the reason you invented that tool in the first place is no longer worthwhile, because there's just some rate at which ecological challenges come and go where you live. So this is the world's simplest cultural evolution model. I wanted to show it, even though I know most of you don't work on cultural evolution, although I also know some of you do. I want to show it because now we're going to fit this today, and I want to show you at least one model in this course which is not a generalized linear model. Okay, so we've got three parameters now: alpha, beta, and gamma, which is just some loss rate per tool. Notice that the loss term is per tool, not per person: the more tools you have, the more you're going to lose. The more you know, the more you forget every day, let me tell you, that's true; I've forgotten more in the last decade than I learned in the previous twenty. You've got to make room for new bureaucracy rules and stuff like that. So, scientific model. Our goal here now, if we're going to match this to data: we've got a cross-section, we don't have a time series. This model implies a time series, and if we had a time series it would be gold; we could use that. I learned statistics studying ecology sorts of problems where you have time series, like numbers of wolves in Yellowstone and stuff like that. But we can also use the same model in a cross-sectional case; we need an expectation. You can solve these dynamical systems for steady states; that is, after a while, the innovation and loss
processes balance. And where will they balance? They balance where the change is zero, when delta T equals zero; that means the two processes are balanced. So you set delta T to zero and you solve for T. Boom, science. On the left I've put a hat over the T; those of you taking modeling with me, you know about the hats, it's like a party, everybody gets a hat. The hat on the T means that's the steady state you go to, and it's alpha P to the beta over gamma: that's the expected number of tools. This will still be stochastic; it's not actually a fixed point, but it's like the mean of the stationary distribution, and then we can just stick it inside the Poisson. There's no link function, there's no ad hockery; it's just there. Now surely this is an inadequate model in many ways, but I would assert it's a way better start than the generalized linear madness we had, because the intercept is fixed here. I guarantee you this passes through the origin, and we didn't have to force it to do that or anything. It's still going to do silly things a little bit, but at least now the violations mean something, and they're going to push you in the direction of some actual science. So you can just write this into a Markov chain, no problem, just like all the other stuff you've learned; just draw some circles. Same idea: this is just like the Poisson regression that we did at the end of Friday, but now for lambda there's no link function, because we've just got this expected number of tools. The only trick is that all these parameters have to be positive, and you've got an array of tools at your disposal to ensure the parameters are positive. I'm using two here because I wanted to show you two different ways to do it. One is you can exponentiate the parameter; that's what I've done with alpha. If you exponentiate any real number, it's positive. So this is log-normal; think
about a log-normal variable: a normal distribution can be any real number, negative or positive, and if you exponentiate it, it's guaranteed to be positive. That's why log-normal variables are always positive. Exponentiating is just a trick for making alpha positive, but then you have to be careful to understand what the prior means: you're talking about a log-normal now, and that's what this dnorm is once exponentiated, a log-normal with mean one and standard deviation one on the log scale. For the other two I just give them exponential distributions, and exponential random variables are positive, so it makes sense. The only thing you have to do is know what the parameters mean: they're rates, and so they're positive, and that's what we need. And then it runs, chains happen, and now we can compare the two models. On the left is what I'm calling the scientific model, on the natural population scale on the horizontal axis, and then, repeated so you can compare, the Poisson model on the right. Now, I'm not going to argue that the scientific model doesn't have flaws, but at least it passes through the origin; you see that it's aiming for zero-zero, as it's got to, and you get real separation now between the solid line and the dashed line. It doesn't let them cross at high values, and you get all this stuff for free, because there's actually some science in this model; it's not just pure generalized linear madness. Certainly there are violations, but the violations mean something scientifically now; they're not just weird features of the tide prediction engine, staring at the weird gears at the bottom. And the parameters have biological meaning: you could actually use alternative data sets, experimental data sets, to get information about those parameters in addition to this kind of cross-sectional data, unlike, say, alpha in the Poisson regression, which means nothing; it's just the thing that bends the line around. Sorry, this is my sermon. Those of you who live in my
department, you know these things; I'm really animated about trying to have real theory models married to the statistics. But you can't really teach a class like that, because for every particular scientific example there are all these details that are peculiar to it, and that's the problem. So why do I teach a class on generalized linear models? Because it's useful to everybody. It's just got this flaw that you need to be aware of: you want to see the generalized linear model as an entree to a research tradition you will build, where you will have more meaningful, scientifically based models that can make predictions at a bunch of different scales. Okay, the last thing I want to mention about Poisson regressions, which is super important and useful, is that the different counts you've got in the data set, each row, may have different observation windows, so-called exposures, that apply to each. So, for example, and this doesn't arise for the oceanic islands, but say you're counting fish. Someone goes fishing in a lake and they're pulling out fish, and the fish come out Poisson; that's approximately correct, because there are a lot of fish in there and you won't catch most of them today. But if somebody spends twice as much time fishing as you did, you can't just directly compare those counts; you've got to adjust for the so-called exposure difference. Does that make sense? And so how do you do that? Well, there is a very principled way to do this in a Poisson regression: you just use what's called an offset, which is the log of the amount of time spent fishing, and you just add it to the linear model. And so this is a point where I say just draw the rest of the owl, because I'm going to skip over the rest of this slide and say that in the book there's a section where I slowly derive this for you, and why it's exactly the right thing to do. There's no ad hockery here; it's not magic that it's the log of the number of hours.
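The offset logic can be sketched numerically. This is a Python sketch rather than the book's R code, and the fishing rate and exposure windows here are made up for illustration: the expected count is rate times exposure, so on the log scale the log exposure enters the linear model with a coefficient fixed at one, and the intercept-only maximum-likelihood rate reduces to total counts divided by total exposure.

```python
import numpy as np

# Hypothetical fishing example: a single true rate per hour,
# but each fisher is observed for a different number of hours.
rng = np.random.default_rng(7)

true_rate = 1.5                           # fish per hour (made up)
hours = rng.uniform(0.5, 8.0, size=5000)  # exposure window per fisher
counts = rng.poisson(true_rate * hours)   # counts scale with exposure

# Naive comparison of raw counts confounds rate with exposure:
# the mean count depends on how long each person fished.
naive_mean = counts.mean()

# Offset logic: log(lambda_i) = log(hours_i) + log(rate), so the
# intercept-only MLE of the rate is total counts / total exposure.
rate_hat = counts.sum() / hours.sum()

print(round(naive_mean, 2), round(rate_hat, 2))
```

In a full Poisson GLM you would add the log exposure as an offset term, a predictor whose coefficient is not estimated but fixed to one; the calculation above is just the no-predictor special case of that.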
It's required; it's the only reasonable thing to do if you believe the Poisson rate assumption. So draw the rest of the owl at home. There are a number of other count distributions, and I say more about these in the text. Multinomial and categorical models are extrapolations of the binomial and logistic regressions to more than two unordered outcomes. It's like a coin with more than two sides: you flip it and it's going to land on one of them. Or one of those fancy Dungeons & Dragons dice with twenty sides; you can model that as a multinomial. And geometric distributions are count distributions, but they're the number of time steps until some event. Then there are mixture distributions, and there are examples at the start of chapter 12 in the book: these things called the beta-binomial and the gamma-Poisson, also called the negative binomial. These are binomial and Poisson regressions, but they allow the rates to vary by some unobserved heterogeneity in each case, and so they have wider variance. They're a lot like multilevel models, but they're like multilevel models in which you don't estimate the random effects. I'm not going to teach these in class, but you may want to look at the notes, and then I am going to spend two weeks doing multilevel models, and you'll see that those behave similarly to these but are actually much more flexible, so I'd rather spend the class time on the multilevel models. But I think these are very useful types of models. At the bottom I mentioned this Dirichlet-multinomial thing; we'll talk about Dirichlet distributions, I think, on Friday, I hope to get to them, and I'll tell you what that is. Okay, I promised some cats. So I want to motivate a very useful class of models for you called survival analysis, and I say just motivate because this is the kind of thing where you've got to go home and run the code and look at the details. Draw the owl, right: I can draw some circles, and then you've got to go home and draw
the owl. But let me motivate it. There are two things to really understand about survival analysis. First, survival models are like count models, because there are discrete events happening: the thing of interest is that stuff is happening, and we're going to count it. Those events could be mortality events, they could be births; the example in the text is cat adoptions, how many cats are being adopted in Austin, Texas. This is an actual data set. But to count those events properly, what you need to do is estimate the rate of those events. This is just like the Poisson and binomial models: the parameters are about rates, and the outcome is a count. Survival models have that feature as well, but because the exposure window changes, other kinds of events create this effect called censoring: in some cases you don't get to see whether the event would have happened or not. So imagine some lonely cat in the Austin Animal Care facility, and it's waiting to be adopted, and then it escapes, because the door's open. This happens in the data set, by the way; there's an escape code, escapees, they're loose and they're free. And we don't know if that cat ever would have been adopted or not; now it's free. So what do you do with that data? The wrong thing to do is throw it away, because how long it waited without being adopted is information about the rate. So you've got to count the stuff that wasn't counted; it's really weird, this is a weird thing about survival analysis. The thing you want to model, get the probability of, is the waiting time, and it's not just the waiting time until adoption. It's also the waiting time during which the cat wasn't adopted until something else happened that meant the adoption couldn't happen, or we didn't get to see it. And those things could be: escaped, died of natural causes, all kinds of things that can happen to cats; they ascend, right, whatever happens to the cat. This is a phenomenon called censoring, and there are
two kinds of censoring to worry about: left censoring and right censoring. Left censoring is when we don't know when the risk period started; we don't know when the cat was brought into the animal care facility and could have been adopted. And then there's right censoring, which is typically more common in data sets: the data set gets collected at some point, and there are still cats in the facility that haven't been adopted yet. We need to use their data even though they're right-censored, because the total time until their adoption, we don't know it yet. These are the two kinds of censoring to worry about, and you can handle them in the same sort of way. If you ignore censored cases, you get biased estimates. For this audience, I tried to come up with an example you might understand: imagine you want to estimate time to PhD in, say, a Max Planck graduate school program, but you ignore all the people who drop out of the program, which never happens here, right? You will get a biased estimate. The amount of time they spent in the program before they dropped out is information about how long it takes, and if you drop that out, you're going to get the wrong estimate; you're going to bias it in a particular direction, and I would like you to think about which direction the bias will go. Okay, cats. This data set is in the rethinking package now: 20,000 cats from the Austin Animal Care facilities. It's all up on their website; this is how I get these things. And there are lots of things known about these cats. They've got a whole computerized system where cats come in, cats go out, they track them; they've all got chips in them, so they get their IDs, and you scan the cat, the computer pops up and tells you if the cat's been there before. So: 20,000 cats, time to event, and the event is adoption. We're interested in adoption rates, and in particular we want to compare black cats to non-black cats, because I have a hypothesis that people are bigoted
against black cats. And I love black cats; I like all cats, actually, but particularly black cats. So we're going to be interested in comparing rates of adoption for black and non-black cats. The thing is, there are adoption events, and for those you just need to predict, given some assumption about the rate, the probability that you would wait exactly, say, three weeks to be adopted. But then there are other events that censor your ability to see an adoption, and for those we want to predict the probability of waiting that long and not being adopted. Does that make sense? That's how you deal with censoring. So what are the other events? Well, you can load up the data and see there's lots of stuff. The something else: some of them die, because some of these cats are brought in very old, and so it happens, but they've got eight more lives, don't worry. And some of them escaped; this is a code in the data, the cat just went missing, escaped. And then pure censoring is that the cat was still there when I downloaded the data, so that's a case where the observation window has stopped but the cat's life goes on, and we need to use that data. Does this make sense? Epidemiological studies are often like this: you're trying to figure out the causes of death, and you're following a population along, and things are happening to them, lots of things. But it's not just that; basically any kind of time series you can think of has censoring happening in it. I have a paper in press now about sage grouse, and we use a survival analysis there because you only do a focal follow on any particular grouse for an open window, and then we want to estimate the probabilities of them doing particular cool things, and you stop watching them when the beep happens, but they could still do it even though you stopped watching. And sage grouse, if you don't know, sage grouse is the coolest living dinosaur, just a really remarkable North American dinosaur; you should Google it. Okay,
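The direction of that bias from ignoring censored cases can be seen in a quick simulation. This is a Python sketch, not the course's R code, and all the numbers are invented: adoption times are exponential with a mean wait of 50 days, observation stops at day 60, and simply averaging the waits of the adopted cats makes adoption look much faster than it is, while the censored-data estimator recovers the truth.

```python
import numpy as np

rng = np.random.default_rng(11)

true_mean = 50.0                      # hypothetical mean days to adoption
n = 20000
wait = rng.exponential(true_mean, size=n)  # true adoption times

window = 60.0                         # observation stops at day 60
adopted = wait <= window              # the rest are right-censored
observed = np.minimum(wait, window)   # what we actually record

# Wrong: drop the censored cats and average the waits we saw.
# Long waits are systematically missing, so this is biased low.
biased_mean = observed[adopted].mean()

# Right: exponential MLE using the censored cats too --
# total observed waiting time divided by number of observed events.
mle_mean = observed.sum() / adopted.sum()

print(round(biased_mean, 1), round(mle_mean, 1))
```

So dropping the dropouts biases the estimated waiting time downward (the rate upward), which is exactly the point of the PhD example: the people still in the program carry information even though you never see their event.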
so again, this is the part where I say go home and draw the owl; I just want to give you an idea. The simplest kind of probability distribution for a survival analysis is the exponential. This means the rate is constant, and then it's like nuclear decay. Imagine a cohort of a hundred cats: if there's a constant chance every day that each gets adopted, then there's a half-life; after a certain amount of time, half the cats will have been adopted, and after that same amount of time, another half. So you go from a half to a quarter to an eighth to a sixteenth to a thirty-second and so on, until you're at a fraction of a cat, as it goes down. That's the constant rate. And so for observed adoptions, the probability of being adopted on that day comes from the exponential distribution, easy enough. Then we've got a lambda, and we can put everything we know about the cat into the generalized linear model with a log link, and we're gold, no problem. With censored cats, you've got to think about what it means to calculate the probability that it hasn't happened after a certain amount of time, and this is the kind of thing you're going to sit down with the notes and think about; this is drawing the owl. But you can use the exponential distribution; you just need the cumulative version of it, because that's the proportion of cats who will have been adopted up to that day, and if you take the complement of that, you've got the proportion who won't have been adopted up to that day, and that gives you your probability. Again, this won't be totally clear right now in lecture; when you sit down and think it through with the notes in hand, you can figure it out. And this is where we get our censoring probability, the probability the cat was not yet adopted on day D, and it comes from this thing called the complementary cumulative distribution. Whatever
distribution you use for the rates, you always get the right-censored probabilities from this, the so-called CCDF, the complementary cumulative distribution function. Okay, the code's in the text. The only exciting thing to say about this is that this is the first model I'm going to show you where you've got this multiple-choice likelihood function in here. ulam is not feature-complete yet, but it is a fully operational battle station, and you can destroy planets with it; you're really just coding the log posterior here, this is just raw stuff. So you can write: days_to_event | adopted == 1, that is, if adopted equals one, this is the probability to use, and then it's just an exponential. If adopted equals zero, that means there was some other kind of event, now we're censored, and now we need this CCDF, and there's this custom tag. When ulam sees a custom tag, it just dumps whatever you put in there into Stan; that's Stan code in there. This is why I say this battle station is fully operational: you can do lots of dangerous things, and you can also do lots of really wonderful things this way. There's an overthinking box in the notes where I walk through this very slowly with you so you understand what's going on. Okay, what do we get? Some results. My hypothesis is correct: black cats are discriminated against, and they're adopted at lower rates than non-black cats. By the way, there's a column for cat color, which is why I can do this, and somebody there in Austin just makes up cat colors; there are a lot of unique cat colors, they get really, really exciting. I just took the ones that were called black and put them in that category, but there's also, like, black smoke and smoke black, you know, and so there are other ways you might code the data; there might be more in here, and the difference might be bigger. So there's a difference in rate, and please run this for yourself, take a look at how it works, and draw the
plot. Okay, let me spend the second half of this lecture transitioning to chapter 12. That was all chapter 11; chapter 11 is an intro to count models and survival analysis, to give you an idea about how to work with link functions. In chapter 12 we get into more elaborate types of generalized linear models, where you start mixing components together to handle very important data-modeling problems. And the central metaphor I have for this is monsters, because the thing about monsters in mythology is that it's not just that they're bigger, right? You don't get monsters that are just a giant house cat or something like that; it's always bits of different animals stuck together for some reason. There's something about human cognition, cross-culturally, which loves this idea. You make it monstrous: you've got to make it like bits of a dog and a cat, now it's a monster. If it's just a big dog, that's not really that monstrous. Maybe give it multiple heads, okay, now it's a monster, but you've got to do something extra to it, not just make it big. And so, you know, whether it's minotaurs, or is that a griffin, and on the lower left, I forget the name of this, but this is this Australian aboriginal serpent thing, and then from Hawaiian legend, one of my favorites is a man who's the offspring of a shark and a human, and he turns into a shark at night; he's got like a big shark mouth in his back. Good times. And if you're going to think about statistical models and making a monster, well, they're devices, right? They're like robots. It's kind of like the junkyard challenge, has anybody ever watched this show? No? It's monstrous, but it's people going to a junkyard to try to put together working devices that may kill them during any particular episode; you have to go to a junkyard and then make a rocket-propelled go-kart. So don't try this at home, or anywhere near me, but the stuff we're going to do is in some sense like a
junkyard challenge, but it's safe in the sense that there are lots of principles to guide you, and the way you make sure the things work is you use simulation. So from this, what we get are more complicated GLMs, things that I call monsters, for dealing with ordered categories and ranks, which are outcomes that look like counts but are not counts: they're discrete, they're ordered, and the gaps between them are totally unknown. It's not like the integers in a count, where the distance between one and two and between two and three is the same, always one. An ordered category, like a Likert scale, is not like that at all, and this makes them a bit monstrous, but we can handle it, and I think I'll spend all of Friday just working on ordered categories. Other kinds of things are mixtures; this is what the chapter starts with, and there are different kinds of mixtures, with varying means and probabilities and rates, like the beta-binomial and negative binomial models that I mentioned just a few slides back. That's the start of the chapter, but in lecture I want to spend time talking about zero inflation and hurdle models. This is a really common issue where you've got a count, it's a good count, but it arises from more than one process, and so there are lots of ways, for example, that you can get a zero. In science, since you're all scientists, you're already imagining ways this can happen; this is the zero-inflation effect, where there are different ways to get a zero. The simplest would be that your detection is just not good enough, so when counts are low you can record a zero that's not a true zero, but you record it as a zero; that's zero inflation. I'm going to give you an example. The example I use in the book is a simulation example, so you can see how the processes are mixed; we're going to generate fake data here, and I do it in the context of thinking about a monastery mystery. In the history of this course, this actually came from a case: one year I taught this
at the University of California, Davis, and there was somebody in the class who studied monasteries, and we would chat about data problems with monasteries after class, and that's where this idea came from. So it's a little bit silly, but there's a real inferential problem to be figured out here. Now, what I want you to imagine is that you're some sort of medieval investor-entrepreneur, and you go around buying up monasteries. Why might you want to do that? Because monasteries bring in money: they copy manuscripts and they produce wine, and both of these things produce income, so it's worth owning monasteries. Also, the labor is kind of free, right? You just pay them in wine and parchment, maybe a pea garden. And so your issue, when you're trying to evaluate the economic value of any particular monastery, is its output rate: what is the working capacity, how many manuscripts can they make per day? The problem with just looking at the average rate per day is that it comes from multiple processes; you want to have a causal inference here. So monks copy manuscripts, and they can finish a certain number in a given day, but they also drink, right? This is a thing about the Christian tradition, that it's compatible with alcohol, and sorry, I'm of Scottish descent, so yeah, definitely, the Presbyterian tradition. But the data we're interested in is the number of manuscripts completed in a given day, and we want to infer the number of days they get drunk. I'm trying to identify the drunk monasteries, and how would you analyze this? So let's build up the problem scientifically again, to get us to a generalized linear model. This is something that's going to be called a zero-inflated Poisson. There's a hidden state that we can't observe; in principle we could set up listening devices and cameras in the monastery, but that technology doesn't exist back then, right, and you want to hang out in Prague, not visit your monastery, so
you're just getting the data, and so you can't see this hidden state. You don't know, on any particular day, whether they're drunk or sober, and you observe a zero. Now the question is, were they drinking that day? Well, they could have been, but they could have just been working slowly. Maybe they finished a bunch of manuscripts the previous day and started a bunch on that day and didn't finish them, but they were working really hard; you'd observe a zero both ways. If you observe a non-zero, they weren't all drunk, maybe only some of them were, but if you observe a zero you can't tell what happened; the hidden state is hidden from you. But what I want to show you is that even though you can't necessarily say on any particular day whether they were working or drinking, you can say, on average, how many days they drink, if you've got enough data. That's what zero-inflated models do: you get information about the two mixed processes even though you can't see the hidden state. This is a super important problem. Most of you don't have monastery data, but you do study problems where zeros arise from more than one process. This happens all the time; any kind of detection problem will be like this. I used to work with a lot of ecologists, so: you walk transects through the woods and you count birds, and you get zero that day. Does that mean there were no birds? No, maybe you were very distracted, or visibility wasn't good; there's any number of reasons you could get false zeros. False zeros are incredibly common in observational studies. In laboratory studies it's usually not called zero inflation; what you get is called a hurdle model, or zero augmentation. Anytime your chemical assay has a minimum sensitivity before it will say anything is there, and those of you who work at benches know what I'm talking about, you'll get false zeros, and this is called zero augmentation: above a certain concentration you measure the concentration quite
accurately; below that, everything shows up as a zero. This is zero augmentation, and the models work very similarly to this. So let me build up this flow diagram on the right. At the top, nature begins, and there are two processes that could happen: p of the time the monks decide to drink. It's like they've got a biased coin and they flip it every morning in the breakfast hall, and if it comes up steins, they drink; and one minus p of the time, they work. So if they drink, then you observe a zero, but if they work, you can also observe a zero; there's another way to get a zero, or a count greater than zero. So if we simulate counts from this process, you get data that looks like this histogram on the left. The black part is a pure Poisson process; that's what a Poisson distribution with a mean of, I think, one, yeah, with a mean of one looks like. And the extra blue bit on the zeros, well, those are the drunk days: you get extra zeros. This is where the zero-inflation terminology comes from. So these are your extra zeros when you're walking transects in the woods, or when you're doing chemical assays, or whatever it is you're doing; you get some inflation of zeros, and so the aggregated data is not Poisson distributed, it's a mixture of a Poisson distribution and something else, in this case a Bernoulli distribution, but it's a mixture of Poisson and something else. This happens a lot; this is an incredibly practical type of model. So let me walk you through the process graph again, to help you understand how we write down the likelihood. We need a function for the probability of any given observation, whether it's a zero or a non-zero, right, the number of manuscripts finished. Every Bayesian model moves by counting the number of ways you could observe that thing, conditional on your assumptions, so we need to get that now. So we're
going to walk through the garden again. At the beginning we've got this Bernoulli process: p of the time they're going to drink, one minus p of the time they're going to work. Say you observe a zero; there are two ways that could happen. One is going down the p path there, and the other is going down the one minus p path and then exp(−λ), which is just the Poisson probability of getting a zero; it's a really simple, nice likelihood function, the Poisson, super simple. So we need both of those terms, and they're alternatives, and you may remember from when you studied probability theory that if there are two ways for something to happen, you add them together, because it's an or; every time you say the English word "or", you need an addition in your probability expression. So it's either p, or one minus p times exp(−λ), and that's your probability of a zero. That makes sense? And the other thing that can happen is you observe a count greater than zero, and there's only one way for that to happen, in which case you just go down the path on the right: one minus p times the Poisson probability of the value greater than zero, which is that thing, the Poisson likelihood, that you're looking at there. You don't have to memorize that, your computer already knows it, but that's all it is. So let me try to summarize. There are two ways to get a zero. You make a graph, and this is a DAG, by the way, a directed acyclic graph; they're back, used for a different thing, because this is not a causal DAG, this is a statistical DAG. We can go p, or one minus p times exp(−λ); that's the probability of a zero conditional on p and lambda. You're drawing the owl now; you'll sit down and get this later. But this general strategy works for anything, no matter how complicated the model is: if you can write down the DAG process for it, you can get the likelihood out of it, and
then the other path, going down the right: one minus p times the Poisson probability. So it's just one minus p times the Poisson, that's all it is. Why is there a one minus p there? Because they had to be working for you to observe something greater than zero; the only time you'll observe a count greater than zero is when they were working, that's what the model assumes. Good, exciting, yeah. So that's all the hard work. Usually when you run these models you never see all that stuff, because there are these little helper functions, like dzipois, ZI for zero-inflated, which bundle all that together; they take those two probability expressions and an if-then statement and just use them to return the right probability for any observation. And now here's the trick that is going to look a little funny: we've got two parameters. We've got a p, which is the probability they drink or work, right, the probability of which process is selected, and you can make that a generalized linear model of anything you like. You can have predictor variables that predict whether they drink or not that are different from the predictor variables that give you the rate at which they work; they could be completely different. For example, the weather may determine whether they drink or not, but the weather may have no effect on how rapidly they work; it depends on the scientific process. These are just up to your scientific situation; they should be bespoke, remember this annoying word that no one but me uses, they should be bespoke to your application. These linear models are different models: they have their own parameters, they have their own predictors, whatever you think is needed to make the right kind of model. But you need link functions on them: lambda is like a Poisson regression, p is like a logistic regression, and you already know how to work with both of those. This makes some sense? This is why I said these things
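Putting the two paths together, the mixture likelihood just described can be written out directly. This is a Python sketch of what a helper like dzipois computes internally, not the helper itself:

```python
import math

def zip_prob(y, p, lam):
    """Zero-inflated Poisson probability of a count y.
    p   = probability of the zero-producing process (a drinking day),
    lam = Poisson rate on working days."""
    pois = math.exp(-lam) * lam**y / math.factorial(y)
    if y == 0:
        # drink, OR work and finish nothing: p + (1 - p) * exp(-lam)
        return p + (1 - p) * pois
    # a positive count can only come from a working day
    return (1 - p) * pois
```

A quick sanity check on the mixture: summing zip_prob over all possible counts gives one, which confirms that the two branches really partition the outcomes.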
are like monsters, right? There are little bits. We have taken the cat and the dog and stuck them together to make the cat-dog, and it's terrifying, but it's also really powerful, and a model like this can do a lot of really good things, because natural processes have these mixed features: natural observable processes are mixtures of component processes, so we need models like this. In the book we simulate the data, so let me walk you through that now, to help you understand what's going on. When I'm developing bespoke statistical applications, I always do this: I want to know that my code works, and I want to develop intuitions for myself about how the model behaves, so I simulate data from a known process and then I feed it into my statistical model and make sure the thing functions. We can do this so-called dummy-data process quite easily in R; I've already shown you several examples. The goal is to recover the estimates and understand the model, and also to test the limits of what the model can do; there are going to be combinations of parameters where the model doesn't do very well, because that's how nature is. Here's the whole simulation. It's really easy, and I'd like to walk through this code; I don't do a lot of that in this course, but I'd like to walk through it so you see how dead easy it is to do this kind of simulation. This is the same statistical model, but this is going forward. Remember, at some point in the beginning of the course I asserted that all Bayesian models are generative. What that means is you can run them in either direction. If you don't have data, you plug parameters into them and they produce data; that's the direction we're going to go on this slide. If you do have data and you don't have parameters, you run them in the opposite direction, and then they spit out not parameters but distributions of parameters, because it's not sure what the true parameter values are, so it just rates up their
relative plausibilities. That makes sense? But you can always run the model in both directions, and this is a huge aid. There are lots of statistical models in other frameworks which are not generative, and you cannot run them in both directions; that doesn't mean they're bad models, it just means they're harder to understand, and they don't make predictions in that case. Okay, so in the first line here we define some parameters; we pick them, we're playing God. The probability the monks drink on any given day is 20 percent; that's pretty industrious for the Middle Ages, probably. When they do work, they finish on average one manuscript a day, and these are beautiful illuminated manuscripts, right, you know what I'm talking about, these great things like the Lindisfarne Gospels. We're going to have a whole year of samples, since you're deciding whether you're going to invest in this monastery, so we get 365 days of production, and then we just simulate. I set a seed so you can replicate this; there's nothing special about the number inside set.seed, it just means that all the random-number functions that come after it will produce the same results when you use the same seed; it sets the machine state. And then we simulate a binomial thing first, whether they drink or not; this is the drink indicator variable, 365 drink-or-do-not-drink decisions. And then we simulate the manuscripts, and this is conditional on drinking. Why is there a one minus drink in there? Drink is a zero/one indicator, so when drink is one, y is always zero, makes sense? If drink is zero, meaning they worked, then it's just a random Poisson number, which could be zero, because Poisson distributions do produce zeros, and in this case the mean is one, so it actually produces a lot of zeros; you'll get a lot of zeros just naturally, you know, from how long it takes to make an illuminated manuscript. Okay, this makes sense? Is this helpful? When I was learning statistics, exercises like this were the only reason I ever
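The simulation on the slide is R code; here is a Python sketch of the same forward process (fixed seed, 20 percent drinking days, rate one on working days). The Poisson sampler is Knuth's classic method, which is fine for small rates; everything else, including the seed value, is just illustrative:

```python
import math
import random

rng = random.Random(365)             # fixed seed, like set.seed() in R
p_drink, lam, n_days = 0.2, 1.0, 365

def rpois(lam, rng):
    """Knuth's Poisson sampler: count uniform draws until their
    running product falls below exp(-lam)."""
    L, k, prod = math.exp(-lam), 0, 1.0
    while prod > L:
        k += 1
        prod *= rng.random()
    return k - 1

# 1 = drinking day, 0 = working day
drink = [1 if rng.random() < p_drink else 0 for _ in range(n_days)]
# drinking days force a zero; working days draw a Poisson count
manuscripts = [0 if d else rpois(lam, rng) for d in drink]
```

Note the structure mirrors the model exactly: a Bernoulli draw selects the process, and the (1 − drink) masking is what guarantees every drinking day contributes a zero, on top of the zeros the Poisson produces by itself.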
understood anything: running the model forward. Otherwise it's just a black-box machine that spits out probability distributions, right? Okay, so here's the code, the ulam code to do this; it just repeats the mathematical version of the model at the top. I've written this dzipois helper function, and when ulam sees that, it's going to interpret it to mean you want to do this multiple-choice thing where you ask if it's a zero or not. If it is, then you have that p plus one minus p times e to the minus lambda, and if it's greater than zero, then you use one minus p times the Poisson probability of that count; it just does that internally, that's all it does. And then the rest is a standard model, all the other features you're used to. There are no predictors in this, but ap and al, the intercepts of the two processes, are different parameters, because there's an average rate of production you need to estimate and there's an average probability of drinking you need to estimate. Makes sense, yeah? And just to show you at the bottom that the machine works; you knew it was going to work, otherwise I wouldn't be showing you the example, so there's a selection effect, but this is the way you verify the machinery is working. The posterior mean rate of drinking that you estimate is a little over 20 percent, and the rate of completion, the rate of production on any given day they do work, is right around one. Something to note is that your simulations are finite, so you shouldn't be surprised that you don't recover exactly the data-generating parameters, because the sample doesn't represent them exactly. Even if the machine works correctly, it won't get exactly the right true data-generating parameter values, but it should cover them with high probability, if the machine is doing right and the sample is of any reasonable size. So you should not be surprised that it is not exactly point two: they drank more than point two of the days in the year, just because of the stochasticity, and
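The lecture recovers the parameters with ulam and a full posterior. As a rough stand-in you can run without Stan, here is a hypothetical maximum-likelihood grid search over the same zero-inflated Poisson likelihood, on freshly simulated data, just to illustrate the recovery idea: the estimates land near, but not exactly on, the true values of 0.2 and 1.0, for exactly the finite-sample reason described above. Everything here (seed, grid resolution) is illustrative:

```python
import math
import random

rng = random.Random(7)
p_true, lam_true, n_days = 0.2, 1.0, 365

def rpois(lam):
    # Knuth's Poisson sampler, adequate for small rates
    L, k, prod = math.exp(-lam), 0, 1.0
    while prod > L:
        k += 1
        prod *= rng.random()
    return k - 1

# forward-simulate one year of monastery output
y = [0 if rng.random() < p_true else rpois(lam_true) for _ in range(n_days)]

def log_lik(p, lam):
    """Zero-inflated Poisson log-likelihood of the whole sample."""
    total = 0.0
    for yi in y:
        pois = math.exp(-lam) * lam**yi / math.factorial(yi)
        total += math.log(p + (1 - p) * pois if yi == 0 else (1 - p) * pois)
    return total

# a crude grid search in place of a real fit
grid = [(p / 100, l / 100) for p in range(1, 60) for l in range(40, 201, 2)]
p_hat, lam_hat = max(grid, key=lambda t: log_lik(*t))
```

This gives point estimates only; the Bayesian fit gives whole distributions, which is the point of doing it in ulam, but the recover-your-own-simulation workflow is the same.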
that's why the posterior distribution says it's more like point two one, because that's what it actually was in the sample. Does that make sense? Okay, there's an overthinking box at the end of this section where I talk about how you could code this without the helper function. If you're interested in understanding these kinds of mixtures, and mixture models are really useful, this is a good box to attend to. This is the same ulam model; it'll give you exactly the same posterior as the previous one, the one with dzipois in it. All dzipois does, when ulam sees it, is get replaced with the two lines at the top here. Again there's this y | y > 0, which means y conditional on y being greater than zero is distributed as this thing, and then there's this arbitrary Stan code in there, and I know it looks weird, but there's a box that explains why we do things this way: we do everything on the log scale in statistics, because otherwise things explode, and what that means is explained in the box. It's perfectly fine, though, to use dzipois, I do, that's why I wrote it, but it's good to have some understanding of what's going on, and if you ever try to do this on your own, you are licensed, you are a licensed professional now, you can draw the owl. Yes you can. Okay, there's no reason you can't also do zero-inflated things other ways; a zero-inflated binomial would look very similar. There's a Bernoulli process that generates extra zeros, and then there's a binomial count; you replace the Poisson with a binomial and the model stays basically the same, and you could write that mixture using the custom code if you like. I have not yet written a zero-inflated binomial helper for ulam, but I could do that, I guess, and you could just do it with the custom code too. There are also things called hurdle models, I mentioned these; they're much more common with continuous distributions, where you've got, like I said, some chemical assay, and it'll be some number greater than zero
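The log-scale trick that the overthinking box describes, mixing on the log scale so small probabilities don't underflow, boils down to a log-sum-exp on the zero branch. A Python sketch of the idea (the box itself writes this as Stan code; the function names and the logit/log parameterization here are mine, chosen to mirror the link functions):

```python
import math

def log_sum_exp(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def zip_log_prob_zero(logit_p, log_lam):
    """log P(y = 0) for the zero-inflated Poisson, computed entirely on
    the log scale: log( p + (1 - p) * exp(-lam) )."""
    log_p   = -math.log1p(math.exp(-logit_p))  # log of inv_logit(logit_p)
    log_1mp = -math.log1p(math.exp(logit_p))   # log of 1 - inv_logit(logit_p)
    lam = math.exp(log_lam)
    return log_sum_exp(log_p, log_1mp - lam)
```

On the raw probability scale, terms like (1 − p) times a tiny Poisson probability can underflow to exactly zero, and then the log of the sum explodes; on the log scale everything stays finite, which is why the hand-written Stan version does it this way.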
but there's an excess of zeros, because there's some thresholding effect where you can't detect anything, and you get lots of this in bench work all the time. Another case where this shows up, for the anthropologists in the audience, is in human production data; this happens all the time. Say you're a human forager, something my department studies a lot: you spend eight hours in the forest and you come back with nothing. That happens most of the time; hunting's hard, right? The animals don't want to be eaten, surprisingly. They're not like goats and sheep, which practically beg to be caught; if you're hunting a howler monkey, in some of the data that my department analyzes, it's much harder to catch them, and half of trips will be zeros. And then when you do manage to get one, what's recorded is the kilograms of meat you bring back. So if you want a statistical model for the kilograms of meat produced per hour of foraging, it's massively zero augmented, massively. It's not that there's some threshold effect; it's that hunting's hard, and 50 to 60 to 70 percent of all trips can be zeros, but when you do get something, it's way more than you're going to eat today, and that's why human societies exist, because of that basic economic reality, that we balance the risk against the high mean return of hunting. So there are lots of cases where you get mixtures of processes: in order to have some kilograms of meat to eat, two things have to happen, you have to catch something, and then there's how big it was, and in both of those there are two distributions that mix together to produce the observation that comes out. The last thing I want to say, at the bottom: multilevel models are also a mixture, and a very flexible type of mixture; we'll spend, like I said, probably two whole weeks on multilevel models at the end of the course. Okay, when you come back on Friday we're going to
spend all day Friday on one of my favorite kinds of outcome distributions, incredibly common in psychology, and also in anthropology, called the ordered category. So we're just going to stop right here, and when you come back I will have a well-worked-out large-data-set example for ordered categories on both sides of the regression, meaning as an outcome and as a predictor, and both cases require special treatment. It's incredibly useful for sort of basic regression tasks in the behavioral sciences. Okay, thank you for your indulgence, I'll see you on Friday.