hierarchical base. So what I want to do here is build on what Shannon talked about yesterday afternoon, which I think really laid the foundation for the idea that ecological data are complex. They have a lot of idiosyncrasies, and it's important for us to bring appropriate statistical tools to deal with that complexity, often constructing models individually for specific analyses based on the characteristics of those data and their challenges, rather than taking data and trying to twist them, sometimes inappropriately, to fit some long-standing canned test. The classic tests we learned in intro stats were derived quite a while ago, at a time when we couldn't solve problems computationally and so had to solve things analytically. Think about why regression has so many assumptions: it has so many assumptions because they needed to be able to solve it analytically. If you didn't have to solve it analytically, you wouldn't have made those assumptions, and we no longer have to solve it analytically, so we should not feel bound to those assumptions. In terms of hierarchical models, I think they deal with one of the things I was trying to highlight in my lecture yesterday morning, which is that, to me, one of the real characteristics of ecological systems is their heterogeneity and their variability, and the fact that we can often account for that variability even if we can't always explain exactly what's causing it. It's important to be able to account for that unexplained variability, or, a better way of thinking of it, the as-yet-unexplained variability, because sometimes we can use hierarchical models to characterize variability, and then that helps point us to the scales in space and time and organization and phylogeny and whatever that are most important in driving our system, to help us understand where there might be additional process understanding to be gained.
That said, we deal with systems of incredible complexity, and we can never measure everything, so hierarchical models are a way of helping us account for the fact that we can't ever measure everything. There's always going to be unexplained variability. So I'm going to start with a very simple case to hopefully give you the idea behind it. Imagine I have data and I'm sampling over different observational units. These may be different individuals, different plots, different watersheds, different lakes, streams, islands; however we are doing the sampling, ecologists are frequently sampling the world. So imagine I've got data sets coming from different sampling units, and these data sets themselves might not just be individual numbers but whole vectors and matrices of observations, whole tables of data, but they're the data I collected on a particular observational unit. It could be a plot, it could be a year, it could be a species. And when I'm making the same measurements over different observational units, I'm often confronted with a conundrum of how to analyze that data in terms of how I think about its independence. One thing I can do is take data from a bunch of different plots, lump all of that data together, and fit one model to that aggregated data. As the simplest case, imagine I've got observations from a bunch of different plots and I'm just going to fit a mean: what's the mean abundance of a specific species? Well, I'm going to aggregate the data from all the different plots and calculate an overall mean. At the other extreme, I might say the observations within each plot are not equivalent to each other; things going on in one plot might be slightly different than things going on in another plot, and so I might want to calculate independent means for each of these plots separately.
So these represent the extremes in what I think of as a continuum, from all the observations being completely identical to all the observations being completely independent. And often what we have in reality is something in between, where things are different from plot to plot, unit to unit, river to river, but they're not completely independent. What's going on here is not utterly, wildly different from what's going on over there. There's shared information about the process, and this is where hierarchical models are handy. They provide this intermediate case to represent this continuum. So in a simple hierarchical model for, say, calculating a mean, I would have a mean that I'm calculating at each site, which I'm calling here theta one, theta two, theta three, telling me what the mean of this data is, but then I'm also going to fit a hierarchical across-site mean. It's called hierarchical because, from the perspective of the parameters in the model, I've added another layer: what's going on here is described by a model at a higher level. This is a fairly simple model, because all it's saying is that at this level I just have an overall mean and some site-to-site variability. That's actually a pretty powerful concept, though a simple one, because once I've done this, I've actually partitioned my variability. Before, the variance here was just the variance over the whole aggregated data set. Here I have variability within each data set, but I'm fitting them independently. And here I've taken the overall variance and partitioned it into the variability within each of these sites and the variability across sites. So I've taken the overall variability and partitioned it into an across-site component versus a within-site component. And I said this represents a continuum between these two extremes because that continuum is driven by what you see in the data.
If the data tell you that your sites are very similar, then your across-site variance is going to be very small and this model is going to behave more like this one. If the data tell you that there's a lot of variability from site to site, then this across-site mean isn't providing much constraint and it's going to behave more like these independent means. So the actual amount of variability informs to what extent the system behaves like this end or that end, and the model represents that whole continuum in between. From a practical perspective, when we fit hierarchical models we're not fitting these means, taking them as knowns, and then fitting that mean; what we're actually doing is fitting them all at the same time. So from the perspective of doing this in the context of a Bayesian model, when I fit this model, I have a mean and I have to have a prior on it, and it's just a prior with some parameters I give. Here I have priors on each of these independent means. Here, when I'm fitting this mean, it's essentially acting as a prior on what's going on in this particular site, but it itself has unknowns, so I have to have priors on those. So I've pushed the priors up to a higher level in the model. It's kind of like, from the perspective of each site, you're fitting it with an informative prior, but you're calculating that informative prior from what you're seeing at all the other sites simultaneously. And that leads to something I'll reinforce as I go through this some more: this idea of borrowing strength. The inference you make at each of these sites borrows information from the other sites, because the cross-site variability tells us what the overall mean and variability are, and that then acts as the prior for what you're seeing site by site. So I'm going to go from the high-level concept and dive into some of the math of how you would do this for a simple case.
And then from there I'm going to dive into the JAGS code of how you might implement these things, explore some of the complexities as we go along, and again highlight some of the strengths of thinking about things hierarchically. So what I've done here is take what we looked at graphically earlier and write it down in terms of probability distributions. Here I'm saying y sub k, where k could be any one of those data sets from each of those sites, is distributed normally with some site-specific mean and some variance. I'm starting with a model where I'm saying this is data, these two are unknown parameters, and since these two are unknown parameters, I'm just going to put priors on them. So these are the priors on those. This is a model where I'm fitting a mean independently to every single data set, but just to keep things simpler, I've fit one variance for all of them. So I'm assuming the variability within the sites is similar; I could easily have fit a site-specific variance as well. So this is just a basic Bayesian model for fitting a mean: likelihood of the data given a mean and variance, priors on the mean and variance. Okay, so here I'm fitting each data set independently, but assuming they each have the same prior. And when I'm writing down this model, these mu's and tau's, the s1's and s2's, are actual specific numbers that would need to be plugged in for your choice of prior. I'll just say explicitly here that the IG is an inverse gamma. It's not the only possible choice of a prior on a variance; it's one that I use frequently because it's conjugate. Okay, so if we want to extend this, instead of independent means with a common variance, hierarchical models instead assume the prior contains unknown model parameters.
So if we want to move this to a hierarchical model, we want this not to be a specific number we put in; we want it to be an unknown that we're fitting based on the data. And if that becomes an unknown that we're fitting based on the data, then we need priors on these two things: what's our prior on the mean and what's our prior on the tau? That tau, instead of being the variance on mu k, becomes the across-site variance. For a hierarchical mean, assuming a common variance, we're just going to put priors on each of these. That's an unknown, so it needs a prior; that's an unknown, so it needs a prior. Those of you who've done Bayes a lot know this. For those of you who are relatively new to it: in Bayesian inference, everything that is an unknown needs a prior. Anything you're trying to estimate has a prior. So once the cross-site mean and variance become unknowns, we need to put priors on them. I put this slide in orange because it highlights some of the key take-home messages about hierarchical models and why I find them particularly useful. First of all, what they're allowing us to do is model the variability in the parameters of a model. In the simple case, our model has two parameters, a mean and a variance, and we're modeling the variability in the mean. Any model of any complexity, from a simple mean to a linear model to a nonlinear model to some complex process-based model, can be fit in a hierarchical context, and in that context we use this additional hierarchical layer to model the variability in the parameters. So in some of the work one of my postdocs is doing right now, she's fitting process-based threshold carbon cycle models in a hierarchical framework, asking what the site-to-site variability is in these mechanistic models with mechanistic parameters.
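To make that concrete, here is a minimal sketch of this hierarchical mean model in JAGS. This is my own illustrative code, not the course's; the variable names and the hyperparameter values in the priors are placeholders, and note that JAGS parameterizes the normal by precision (1/variance), so a gamma prior on a precision plays the role of the inverse gamma on a variance mentioned above:

```jags
model {
  ## hyperpriors: the across-site mean and precision are now unknowns,
  ## so they get priors of their own
  mu.global ~ dnorm(0, 0.001)     # prior on the across-site mean
  tau.site  ~ dgamma(0.1, 0.1)    # prior on the across-site precision
  tau.obs   ~ dgamma(0.1, 0.1)    # prior on the common within-site precision

  ## hierarchical layer: each site's mean is drawn from the
  ## across-site distribution
  for (k in 1:nsite) {
    theta[k] ~ dnorm(mu.global, tau.site)
  }

  ## likelihood: each observation comes from its own site's mean
  for (i in 1:n) {
    y[i] ~ dnorm(theta[site[i]], tau.obs)
  }
}
```

One way to see the continuum: fixing tau.site at a huge value forces every theta[k] to the common mean (the pooled model), while dropping mu.global and giving each theta[k] its own fixed prior gives the independent-means model; the hierarchical version sits in between, with the data deciding where.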
But the concept is completely identical. You have some model of what's going on at each site, and then you have a statistical model above that describes the site-to-site variability in the parameters. And there are lots of reasons to expect that in real ecological systems the parameters of our models are not identical for every site, but at the same time there's reason to expect they're not completely independent of each other. If we think there are common processes, we don't think that the variability from one site to another is unbounded; there's some range of variability we would expect to find, and nature tells us about what that range is. And then, again, we can try to understand what might be causing it. So it's allowing us to take the overall unexplained variability in a system and partition it out into multiple terms. In the simple example, we partitioned the variability in the mean across sites, but we can partition variability in hierarchical models at multiple layers and in multiple ways. Often the hierarchical terms represent your sampling strategies, the things that would be different if you repeated the experiment again. If I were to repeat some study a second time, I would put out different plots, I would make the measurements in different years, I would measure different individuals. Those are the sorts of things you would put in a hierarchical level. If I repeated the experiment again, I might use the same experimental treatment levels, so the treatment levels are not random effects; those are your covariates. Things that you think explain the variability mechanistically, or that are tied to hypotheses, are the things you would put in your process model. But the things that are random, things that are variable, things like plots and individuals, are often things having to do with your sampling scheme.
And in a lot of the work we do, they often point to specific spatial scales and temporal scales that help us partition the variability explicitly. And then again, this idea of borrowing strength across data sets. In a perfect world, which may happen if you're an agronomist, where everything is laid out on plots that are as perfectly homogenized as possible and everything is perfectly balanced, you don't need to borrow strength. But it's often true that you have, for example, unbalanced data sets, where you have more information at these sites and less information at these other sites, and you can actually borrow constraint through the hierarchical level from the data-rich parts of your analysis to help better constrain some of the more data-poor parts. In that sense, the well-constrained parts are constraining the hierarchical level, and that then acts as an informative prior on the poorly constrained parts. Taken to the extreme, you could borrow strength to a site that has zero data. That would be the limit; you can't get less than zero data. If someone knows how to make negative data, I don't want to hear about it. Actually, that's not true; I've had a few undergraduates generate negative data, because it turns out they broke the data we already had. Anyway, one strength of hierarchical models is that they also let you formally distinguish between within-sample and out-of-sample inference. So if I am trying to predict what I think is going to happen, and this brings us back to forecasting, if I'm trying to predict what's going to happen at this site, I use this model. If I'm trying to predict what's happening at that site, I use that model. But say I have site number four, theta four, and theta four has no observations. It's out of sample.
I've never made measurements there before. How do I predict that? Well, I have this hierarchical layer that provides an informative prior on what I expect to see and on the degree of site-to-site variability in this system. So I can formally say I'm using this model for in-sample inference, and I'm essentially using the hierarchical layer to provide a constraint on the out-of-sample inference. And because you have to integrate over the site-to-site variability, you get a very natural result, which is that out-of-sample predictions are more uncertain than in-sample predictions. Which should happen; that makes perfect sense, and this gives you a formal way of doing it. By contrast, when you fit models independently, well, statisticians mean something very specific when they say independent. They really mean it. If I'm fitting models independently and I have site four, then from a statistical perspective, the fact that I fit models at these three sites gives me zero basis for making a prediction at site four. I have no idea. If I assume all these things are completely independent, that means I'm assuming that a new site is independent of everything I've seen before, so I'm not learning anything useful for making a prediction at that new site. The pooled model gives me a way of predicting it, but it says I predict the same thing everywhere, which might not make sense. And even if it does make sense, the hierarchical model would tell you that, because it would tell you it's not seeing any site-to-site variability. Okay. So hierarchical models give us this idea of partitioning variability and of borrowing strength. Next: the details of hierarchical models are usually hidden in the subscripts. This is the thing that I think kills people when they're trying to read hierarchical models and haven't seen them before, which is that usually we just blow over the equations. Let's be honest.
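In JAGS-like terms, that out-of-sample prediction is just one extra draw from the hierarchical layer. This is a sketch under my own placeholder names (mu.global and tau.site for the across-site mean and precision, tau.obs for the within-site precision, theta[k] for a fitted site mean), not the course's actual code:

```jags
  ## in-sample prediction at an observed site k: uses that site's fitted mean
  y.pred.in ~ dnorm(theta[k], tau.obs)

  ## out-of-sample prediction at an unobserved site: first draw the new
  ## site's mean from the across-site distribution, then the observation.
  ## Integrating over theta.new is what makes this interval wider.
  theta.new  ~ dnorm(mu.global, tau.site)
  y.pred.out ~ dnorm(theta.new, tau.obs)
```

The extra dnorm layer in the out-of-sample case is exactly the integration over site-to-site variability described above, which is why the out-of-sample predictive interval comes out wider than the in-sample one.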
You're reading through some stats textbook or some really mathy paper, you're following the words, suddenly there's an equation, and you just jump to the next words. In hierarchical models, not only are the equations important, but everything that's important is hidden in the subscripts. Coming back here, the detail of what's random, what's the hierarchical effect, is hidden in that subscript k. That really matters: it says I'm fitting this for each site, and then I have this layer above it. If I have a site effect and a year effect, then I have a k and a t, and I have to pay attention, because all the layers of your hierarchy usually end up expressed in terms of your subscripts. So pay attention to them. Also remember that, from a statistical perspective, hierarchical models are hierarchical with respect to parameters. You're making a model where there are multiple layers of parameters. It's like with linear models: linear models can have polynomial terms in them, but they're linear with respect to the parameters. Whenever statisticians classify anything, they're thinking about the parameters, not the equations, not the data. So you can have hierarchical models on data sets that are not themselves hierarchical. That said, you can also write models that are hierarchical in the data. A very simple hierarchical model might just be an across-site mean, and that across-site mean tells me about the variability from site to site. But there's nothing that says you can't add process at that hierarchical level. There's nothing that says I can't put explanatory variables at that hierarchical level. So I have site-to-site variability, I'm capturing it, and I can write a model at that hierarchical level that says here's what I think explains that site-to-site variability. I can add process.
I can add a mechanistic model at the hierarchical level to say this is what explains it, but whatever you're writing at the hierarchical level is trying to explain why the parameters in the lower-level models are different. So you're always writing models about the parameters and why the parameters would change. In fact, one of the things you'll see discussed at the end of the hands-on activity, though you don't have to implement it, is a not-uncommon thing to do with, for example, a random time effect or a random spatial effect: moving from a simple independent random effect to one with autocorrelation. So you might have a random year effect, but you might assume those random year effects are themselves autocorrelated. So you've built an autocorrelation model at the hierarchical level, or you might build a spatial model at the hierarchical level saying the parameters are varying from site to site, but they're not varying independently. I think I'll also give an example later of where I've put covariates at the hierarchical level. It's the modeling of the parameters that makes it hierarchical. Okay, so one of the most common special cases of hierarchical models is what are called random effects models. If we take what we had before, a model that says each site has a mean, and there's an across-site mean and variance, then I have a within-site variance, an across-site variance, and an across-site mean as the unknown parameters I'm fitting. I can rewrite this model of the mean at each site in terms of the global mean across all sites plus alpha k, which is how this particular site differs from the global mean. I'm just refactoring this term in this way. When I do that, instead of a model for the mean at each site, I have a model at the hierarchical level for how each site differs from the global mean.
Now I'm going to assume that the differences from the global mean are unbiased, so these should be zero-centered. How every site differs from the global mean should be centered around zero. It would be kind of a weird model to write one where the differences from the global mean are not centered around zero; you probably wouldn't want to make that assumption. So this mean is always zero, and again this tau stays as the description of site-to-site variability, or plot-to-plot variability, or unit-to-unit, whatever. I have the same prior on the within-site variability, the same prior on the global across-site mean, and the same prior on the site-to-site variance. The only real difference is that I've re-expressed this in terms of having the global mean at this level and the deviation from it. That's how we rewrite hierarchical models in terms of random effects models. For a simple model of the mean, that gives you the exact same result. It can be handy, though, when you write more complex models that have multiple sources of randomness to them. So again, highlighting that random effects always have a mean of zero, and that random effects are again partitioning: a random effect variance attributes a portion of the uncertainty to a specific source, so we're partitioning the overall variability into the within-site and across-site components in this case. The random effects framework lets us, I think, write models more easily when they have multiple sources of uncertainty. I alluded to this earlier, but I just wanted to come back to it more formally. What things can be random effects? What do we put in that hierarchical level? Random effects are things that would be different if the experiment were replicated.
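Written out as a random effects model in JAGS-like code, the site mean becomes the global mean plus a zero-centered deviation. Again, this is a sketch with my own placeholder names and prior values, not the course's code:

```jags
model {
  mu.global ~ dnorm(0, 0.001)      # global across-site mean
  tau.site  ~ dgamma(0.1, 0.1)     # site-to-site precision
  tau.obs   ~ dgamma(0.1, 0.1)     # within-site precision

  ## random site effects: deviations from the global mean, centered on zero
  for (k in 1:nsite) {
    alpha[k] ~ dnorm(0, tau.site)
  }

  ## likelihood: site mean = global mean + that site's deviation
  for (i in 1:n) {
    mu[i] <- mu.global + alpha[site[i]]
    y[i]  ~ dnorm(mu[i], tau.obs)
  }
}
```

For a single mean this is just a refactoring of the earlier specification, but the additive form makes it easy to stack multiple random effects, which is where it pays off.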
Like I said before: the plots, the blocks, the years, the individuals, the study sites. They would be different if you repeated the experiment, and we're often using random effects as one way of accounting for the lack of independence between observations: observations within a block, observations within a year, or observations made on the same individual are not independent of each other. Our treatments, covariates, et cetera, the things we have hypotheses about, are our fixed effects. And so this gives us, in terms of linear models or generalized linear models, a class of models once you add random effects in. The things that were always in those models are fixed effects; you add random effects and you end up with what's called a mixed effects model. Okay, so, things to watch for when writing down random effects models. They're not magic. They can't estimate variability across plots, units, individuals, lakes, rivers, islands, watersheds, whatever, if you don't have replication. They can do things that sometimes seem like magic. I've written random effects models with random effects on individual plants where technically I'm fitting more parameters in the model than there are individuals in the study, which seems like magic, but it only works because I've been following those individuals over time, and therefore I have some basis for saying how this individual's behavior differs from other individuals' behavior. If I had measured each individual only once, I would have no basis for saying how this one is different; I'd have no within-individual replication. Likewise, if I have a study with one plot, I can't put a plot random effect on it, because my estimate of plot-to-plot variability is, well, literally just my prior. I mean, technically you can do it, but what you get back is what you put in. I have an assumption about my prior understanding of site-to-site variability, and I see one plot.
I have not updated my understanding of plot-to-plot variability. If I fit a model with two plots... I mean, Bayes lets you fit a variance with n equals one, but you just get the prior back. You can fit it with n equals two, and you get a little bit of constraint, but not much. The degree to which this can blow up on you has a lot to do with what your priors are. If you're starting with really uninformative priors on, say, your site-to-site variability, and then you throw a really small number of sites at it, you will get very little constraint. The models can converge, but they're going to be all over the place. I've written down models where it can be very difficult to distinguish the residual noise term from the random effect if you don't have enough replication. That said, there are a few places, particularly if you're updating on previous work, where you might actually have an informed prior on, say, what your site-to-site, individual-to-individual, or watershed-to-watershed variability actually is. If you do have informative priors, you can get by with much lower replication. But I find it not uncommon for folks to say: I've got big data sets with lots and lots of rows in them, why is nothing converging? Well, because those big data sets represent three sites, and you're fitting a site effect, and you don't actually have much constraint on what your site-to-site variability is. I can set up a data logger to measure data every second at a site, but if I only set up two data loggers, I don't know much about their variability, and an infinite number of observations at those sites doesn't tell me about the variability across them. I had this slide yesterday to reinforce the idea that the partitioning of things into random effects actually affects the inferences and the predictions we make. So now you can actually think about what we were doing when I showed this slide yesterday.
I was fitting a simple model that just has a random year effect and a random site effect. Here, when I'm making a prediction for a new year at a known site, I'm doing an in-sample prediction where the site effect is known, but the year effect is not, so I have to integrate over the uncertainty in the year effect. But in this case there's not much observed year-to-year variability, so that's where that prediction comes from. By contrast, if I'm making a prediction in year 10 for a new site, I have a known year effect but an unknown site effect; I have to integrate over that site-to-site variability, which happens to be large. And if I'm making a prediction for a new site in a new year, I have to integrate over both of those sources of uncertainty, and likewise over here. This just highlights, again, some of those points I made earlier about how hierarchical models affect prediction, because they allow us to make predictions about unobserved sites, species, years, whatever we've made our random effects. When we make out-of-sample predictions, we're integrating over that uncertainty, versus in-sample where we're not. But it also points out something I didn't say before: because the hierarchical model provides these informative priors when making out-of-sample predictions, you can often improve your predictive ability at new sites with a relatively small number of observations, because they have the benefit of the hierarchical constraint. I'll give an example of that in a little bit with some work we did with allometries, where if I have just a few observations on a rare species, I can get much more constraint in a hierarchical model, because I'm borrowing strength, than if I was trying to fit that rare species by itself without learning anything from, say, similar species.
And this is an example that goes back to my own graduate work with forest models. This unfortunately ended up low resolution, but these are time steps in years, so this is a thousand-year simulation of adult tree density for two species that were put into competition with each other. The parameters describing the means were identical between these two simulations; what was different was whether you accounted for the random effects on those parameters. And we're showing that in this simulation we qualitatively changed the coexistence criteria depending on whether we just stuck in every parameter at its mean or accounted for the fact that there was structure to the variability. Here, when we don't include the random effects, the variance is the same, because that variability gets pushed into the residual; we're including it, but we're not partitioning it. And here we're showing that by partitioning it, that structure allowed species to coexist and actually changed which one was dominant. Okay, another general rule of thumb with any statistical modeling, but definitely with hierarchical modeling: start simple, progressively add complexity. That'll be a good rule of thumb as you dive into your own projects this week. Write down the absolutely simplest model you can for something and then add random effects one at a time. Add some of the stuff Shannon talked about in terms of dealing with latent variables, errors in variables, and observation error a little bit at a time. If you add everything at once, if you write down the uber-model that you think is going to account for everything and it doesn't work, you don't know why. If you add things one at a time and it stops working, you're like, aha, that last thing I added, that was the thing that didn't work.
If you add two factors at a time and it doesn't work, you have to go take them out individually and figure out which one caused the problem. So, to take a simple example, imagine a spaghetti plot of biomass through time, but structured: the color here represents observations coming from the same observational block. So there's some spatial structure, say they're from the same location, and then obviously there's time. What I want to do here is dive into what this would look like in terms of some JAGS code. Some of you use other languages like Stan, but the code looks very similar; we're using JAGS this week. First model: fitting a global mean. I have a prior on the mean, a prior on the variance, and a likelihood for the data, a normal likelihood with some mean and variance. And here the data happens to be stored in an array that loops over time, block, and individual, so I'm just looking up each individual. If it weren't for the fact that the data are structured in an array, this would just be fitting a mean, which is about the simplest Bayesian model you can write down. The confidence interval is very tight; the predictive interval is very wide. It doesn't make a particularly interesting prediction, because it's fitting a mean: it's the same in every year and for every individual. Let's try to explain some of that year-to-year variability. So we're going to write down a model that says the expected value of x at some specific time depends on that global mean and my alpha t, my random effect for time at that specific time. That then goes into my likelihood. And because I have a random effect alpha, I have to write down its model: I have a loop over all of my years where each alpha t is sampled from a normal, again with mean zero, and tau t being my year-to-year variability. Tau t is an unknown, so I need to put a prior on it.
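A sketch of what that random-year-effect model might look like in JAGS. This is my own reconstruction under assumed names (x is the time by block by individual data array, nt, nb, and ni are its dimensions, and the prior values are placeholders), not the actual course code:

```jags
model {
  ## priors (JAGS uses precision = 1/variance; values are placeholders)
  mu      ~ dnorm(0, 0.001)        # global mean
  tau.obs ~ dgamma(0.1, 0.1)       # residual precision
  tau.t   ~ dgamma(0.1, 0.1)       # year-to-year precision, which needs a prior

  ## random year effects, centered on zero
  for (t in 1:nt) {
    alpha.t[t] ~ dnorm(0, tau.t)
    Ex[t] <- mu + alpha.t[t]       # expected value in year t
  }

  ## likelihood over the time x block x individual array
  for (t in 1:nt) {
    for (b in 1:nb) {
      for (i in 1:ni) {
        x[t, b, i] ~ dnorm(Ex[t], tau.obs)
      }
    }
  }
}
```

The three additions relative to fitting a single global mean are exactly the ones described in the lecture: the alpha term in the expected value, the loop that estimates each alpha, and the prior on the year-to-year variance.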
So I've tried to highlight in red the things that I added in order to add this hierarchical random effect: I added the alpha, I needed to add the estimation of the alphas in a loop, and then that year-to-year variance needed a prior. And so now I get a slightly wider confidence interval, because I'm now estimating more parameters, but my predictive intervals are much more interesting because I'm now capturing some of the year-to-year variability. I'm not explaining why there's year-to-year variability, but I'm capturing that it exists. I could then take the output for all of those random year effects and ask, in kind of an exploratory mode, what explains those year effects. That's actually something I would very often do: take the posterior estimates of a random effect and then quickly do a bunch of exploratory analyses. Plot the random year effects against temperature, against precipitation, against wind or humidity or nutrient deposition or whatever it is that I think is explaining that year-to-year variability. Get a feel for what is likely to be important, and then go back and add those effects into the model without refitting a bajillion models. We can likewise write, very similarly, the model that accounts for a random block effect. Instead of a random effect on time we have a random effect of block: I loop over the blocks, and I have a variance on the block effects, which needs a prior. It's essentially the exact same code except I'm looping over a different factor, just following how the data are structured. And then I didn't put the confidence intervals on this because it makes it messy, but we can see that the posteriors don't vary from year to year, because I didn't put a year effect in, but I am seeing that there are consistent differences: the black and the cyan tend to be high, the blue and the green tend to be low.
If you're colorblind you have no idea what's going on anymore. And then if you wanted to put a random effect on both, we just have that the expected value of x depends on both time and block, with an overall mean, a random effect on block, and a random effect on time. This is actually a good example of why writing hierarchical models in terms of random effects is handy, because it's easy to then write down a model like this where the different random effects are just combined additively. If you move on to more complex models, you can likewise put random effects on parameters. So if I have, say, a linear regression and I just have an additive random effect, that essentially is a random effect on the intercept. But I could likewise write a model where I put a random effect on the slope, to say the slope might be different from site to site, not just the intercept. And so I would write down a model that says beta 2 is the beta 2 mean plus a beta 2 random effect for whatever is causing that random effect, and then I just plug that into the regression. I don't actually plot this because it would end up really noisy; what I show here is just an overall summary table of the models that were fit. Every model has a mean. Every model has this residual uncertainty. We can see that as we added random effects we actually started explaining more and more of that unexplained residual uncertainty. So the overall variability is here, but we've partitioned that out into the random effects, which don't have process attached to them, but they do still explain over two-thirds of the variability. We can see our DIC score, which is our model selection criterion, does support including those random effects. The other thing we see, which is a classic thing that goes on in model selection: when we increase the number of parameters, the uncertainty in our parameters does increase.
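Backing up to that random-slope idea: a hedged sketch of what such a regression might look like in JAGS, assuming a site index vector site[i] mapping each observation to its site (names are illustrative, not the slide's exact code):

```jags
model {
  beta1 ~ dnorm(0, 0.001)       ## overall intercept
  beta2.mu ~ dnorm(0, 0.001)    ## mean slope across sites
  S ~ dgamma(0.1, 0.1)          ## residual precision
  tau.b2 ~ dgamma(0.1, 0.1)     ## precision of site-to-site slope variability

  for (s in 1:NS) {
    eps2[s] ~ dnorm(0, tau.b2)       ## random slope deviation for site s
    beta2[s] <- beta2.mu + eps2[s]   ## site-specific slope
  }

  for (i in 1:N) {
    Ey[i] <- beta1 + beta2[site[i]] * x[i]  ## regression with site-level slope
    y[i] ~ dnorm(Ey[i], S)                  ## likelihood
  }
}
```

Moving eps2 into the intercept instead, beta1 + eps[site[i]], gives the additive random intercept version; the refactoring is the same either way.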
So when we're only fitting the mean, we're very confident about the mean. When we're fitting this model, we're considerably less confident about the mean, because we're fitting the mean, and then we're also fitting a variance, but then we're fitting the cross-site variance, the temporal variance, the cross-block variance, and then we're fitting ten year effects and five block effects. So this is where you can get hierarchical models that suddenly seem like they have huge numbers of parameters, but an important thing to remember about those alphas is that they're not independent. So this is one of the things that's challenging about hierarchical models: what is the number of parameters in the model? Well, it's not the number of alphas, because those alphas are not independent, so the effective number of parameters is often much smaller, and it may be ill-defined just by the structure of the model, because it really depends on what the estimates of those random-effect variances actually are. If the random-effect variances are very tight, the effective number of parameters is much smaller; if the random-effect variances are very large, you effectively are fitting an alpha for every single site, because you're effectively fitting every site independently. So it's something that emerges from the dataset, which makes things a little unintuitive: the effective number of parameters in the model depends on what data it's fit to.
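This is exactly what the penalty term in DIC formalizes: the effective number of parameters is estimated from the fit itself rather than counted from the model.

```latex
\mathrm{DIC} = \bar{D} + p_D, \qquad p_D = \bar{D} - D(\bar{\theta})
```

Here $\bar{D}$ is the posterior mean deviance and $D(\bar{\theta})$ is the deviance evaluated at the posterior means. When a random-effect variance shrinks toward zero, the alphas are pulled toward zero and contribute almost nothing to $p_D$; when it is very large, each alpha counts as nearly one full parameter, one per site.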
Okay, I showed this earlier but I wanted to come back to it again, just showing that, going beyond fitting a mean, we can fit models that have both fixed and random effects. I kind of went through this earlier, but some high-level points: we're using random effects to capture the unexplained variability, we can use that to point to the things we still need to explain, and then we can add covariates to try to explain them. This is exactly what we were talking about earlier: start with random effects, figure out the scales that need additional explanation, and try to put process understanding on the things that aren't understood, to explain the unexplained variability, while noting that sometimes those additional fixed effects are not justified. That's where model selection is helpful: they might explain some variability, but they might not be parsimonious. So, a simple year effect: consider the number of young produced per adult female in a population of birds. Let's say we fit a year effect and it shows that there is coherent year-to-year variability in reproductive output through the whole population. Based on that, you could use those year effects to look at additional covariates, and like I said, you can do that exploratory analysis without having to go back and refit a bajillion models. If you have simple data, you can go ahead and fit a bajillion models, but if you have large data and complex models, cases where it takes a week for the MCMC to run, you don't want to try a bajillion candidate models; you want to have some idea of which ones are likely, and then you can refine the model to add those additional covariates. So, some take-home messages: it is just as important to account for the variability as it is to account for the covariates, and this is again something I tried to emphasize yesterday. When we're making
forecasts, how we characterize the uncertainties and how we partition them into different sources (the driver uncertainty, the initial condition uncertainty, the parameter uncertainty, the random effects), understanding the variability can be just as important to our ability to make predictions as what the mean model is predicting. And then we use these random effects to account for these unmeasured, often unmeasurable, covariates, these things we can't yet explain. The slides, like I said, are all online, so you can come back to them. So this is an example from my own dissertation work where we were trying to fit relationships between tree diameter and tree height, and one of the take-homes is this idea of borrowing strength. Here are common species: I have measurements of diameter and height across the full range, and lots of them, and so the black line is the Bayesian estimate, the red line is just a linear regression, that line is the maximum tree height from the literature, and they're the same. I have lots of data, so the hierarchical model gives me the same estimate as though I had just fit these things independently, because the data at that scale dominates. Here I have tree species that are predominantly just in the canopy; there's not recruitment for these species, and so some of my linear regression fits say, yeah, they're right at their maximum height, and if the diameter were smaller they would have been taller, because that's the maximum likelihood fit. By contrast, here's what we're getting from the hierarchical model: because we see consistent across-species variability in the relationship between size and height, it says, no, trees have to start small and get big, this is the pattern by which trees tend to start small and get big, and I make sure it goes through the data. So I have an intercept that's consistent with the data, I have a slope that's largely coming from the hierarchical model, and likewise
over here we have rare species that tend to be more in the understory, where sometimes I have no relationship to height, sometimes I'm growing super fast. The hierarchical model gives much more biologically plausible fits, because I am borrowing strength from the common species to constrain the rare species, or species that aren't necessarily always rare but weren't observed over a range of variability that would let me constrain the parameters fully. And then these other examples just go on to say that you can also apply random effects to nonlinear models. So here's a simple Beverton-Holt density dependence model: recruits are a nonlinear function of spawners. This alpha parameter and this R parameter were treated as random effects, treated hierarchically, so they were fit with an overall mean on each and then variability on each. So you can fit them hierarchically, and I think the hierarchical fits are the dashed lines and the site-to-site fits are the solid lines, and the site-level fits sometimes do these somewhat nonsensible things. And then here's another example of a nonlinear model, coming out of one of my grad students' dissertations, fitting the Farquhar, von Caemmerer, and Berry photosynthesis model, which is nonlinear, with a change-point transition between its two parts. Here we decided that we wanted to explain the variability in two of the parameters: Vcmax, which controls the maximum photosynthetic capacity, and this alpha, the quantum yield, which kind of controls photosynthesis at low light. We actually wrote down models for those parameters, within these nonlinear process-based models, that were a function of fixed effects and random effects: we had a random leaf effect, a fixed effect for month, a fixed effect for leaf nitrogen, and then here a fixed effect for leaf chlorophyll and specific leaf area. So we ended up with a model that's got a lot going on. Quickly,
how it behaved: predicted versus observed, before and then after we accounted for the leaf-to-leaf variability.
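Coming back to the stock-recruitment example, here is a hedged sketch of what a hierarchical nonlinear fit could look like in JAGS. I'm assuming one common Beverton-Holt parameterization (alpha as the slope at the origin, R as the asymptotic recruitment), lognormal observation error as is typical for stock-recruit data, and log-scale random effects to keep both parameters positive; all names are illustrative, not the actual fitted model:

```jags
model {
  mu.la ~ dnorm(0, 0.001)     ## across-site mean of log(alpha)
  mu.lR ~ dnorm(0, 0.001)     ## across-site mean of log(R)
  tau.la ~ dgamma(0.1, 0.1)   ## site-to-site precision for log(alpha)
  tau.lR ~ dgamma(0.1, 0.1)   ## site-to-site precision for log(R)
  S ~ dgamma(0.1, 0.1)        ## observation precision (log scale)

  for (j in 1:NS) {
    la[j] ~ dnorm(mu.la, tau.la)   ## site-level log(alpha), fit hierarchically
    lR[j] ~ dnorm(mu.lR, tau.lR)   ## site-level log(R)
    alpha[j] <- exp(la[j])
    R[j] <- exp(lR[j])
  }

  for (i in 1:N) {
    ## Beverton-Holt: recruitment saturates toward R as spawners increase
    Erec[i] <- alpha[site[i]] * spawn[i] /
                 (1 + alpha[site[i]] * spawn[i] / R[site[i]])
    rec[i] ~ dlnorm(log(Erec[i]), S)
  }
}
```

Because each site's la[j] and lR[j] are drawn from shared across-site distributions, data-poor sites are shrunk toward the overall means, which is the same borrowing-strength behavior as in the tree allometry example, and is why the hierarchical (dashed) curves avoid the nonsensible site-by-site fits.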