I'm going to talk about characterizing uncertainty in a broad sense, but really only broad within the context of why you want to use Bayesian statistics. We have some classic assumptions whenever we're learning about statistics throughout much of our training, and for many people probably still whenever you're doing statistical inference. Homoscedasticity, so our variance is constant across time. The X variables, X being the predictor variables, are in fact not variables, right? There's no error in them. All your error is in Y, and it's generally all lumped into the measurement error, right? So your residuals are things you just didn't count right. We generally assume that error is symmetric about the mean, that it's normally distributed. Observations are independent and you don't have missing data. If you do have missing data, you generally lose a whole row, right? These are assumptions for many linear models, and they have been associated with classic or frequentist statistics, which are really rigorous if you can meet these assumptions. But they were generally designed for agricultural systems and well-controlled systems. I have a lot of colleagues that do things in jars. They put dirt in a jar and put it in the incubator with really tight control. But most of ecology really doesn't fit those assumptions. So if you had this cloud of points, yeah, you can fit a linear model through those data and you would often get something like an R-squared back out, right? So your R-squared might be 0.6 and your slope is really strong, p less than 0.05, but you're still not explaining 40% of the variation in those data. If your goal is to get a statistically significant slope, that's great. If your goal is to make a prediction or forecast based on that slope, it's probably not great. And I'm going to, again, kind of ad nauseam go over some of the terminology and the formatting to make sure we're all on the same page, right? So this line here is a model.
It's a linear model, and in this case it's a linear model that has an intercept and a slope. Those are our beta terms, plus some error. We can think about this as both the data model, where the y's, our observations, are normally distributed, so that's the N with some mean mu and an error term, right? That's our observation model. And that's describing, generally, how big the spread of our observations is around that line. If we measured our y's perfectly, they would fall on that line, if this model, our process model, was correct, right? So your process model is that linear model, okay? As you get more and more data, your beta naught and beta 1 parameter estimates get tighter, but not necessarily your observation error. Because we're working in the Bayesian framework, we also then have the parameter model. So the parameters in our data and process model, and we're going to be working with precision, so one over the variance, and the betas, are themselves treated as random variables. That means that they are distributions that have their own parameters, some of which we will fix. For instance, we have a mean of zero and some standard deviation, which is generally how we think of it, right? You've got your normal error, and then here we would have, again, a mean for the beta naught or the beta 1 with some variance about that mean. So they each have their own variance. So in the Bayesian framework, even our parameters are treated as random variables. And we're going to assume that they themselves have a distribution that we're going to describe. And we're usually trying to describe the variance of that distribution. And when I drew these lines, again, it was to emphasize that most of this spread is considered to be observation error. Given the process model, if you could measure things or count them better, then they should fall on that process model, that line. All right, another way to write this is in graph notation. Again, we have our linear model up top.
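As a quick aside, the point that more data tightens the parameter estimates but not the observation error can be seen in a small simulation. This is a minimal sketch with made-up values for beta naught, beta 1, and sigma, and it uses an ordinary least-squares fit rather than a full Bayesian one; the behavior it illustrates is the same.

```python
import numpy as np

rng = np.random.default_rng(42)
beta0, beta1, sigma = 1.0, 2.0, 1.5  # illustrative "true" values

def fit(n):
    """Simulate y ~ N(beta0 + beta1*x, sigma^2) and fit a line to it."""
    x = rng.uniform(0, 10, n)
    y = beta0 + beta1 * x + rng.normal(0, sigma, n)  # data model
    X = np.column_stack([np.ones(n), x])
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ coef
    # standard error of the slope, and the residual (observation) spread
    se_slope = np.sqrt(resid.var(ddof=2) / ((x - x.mean()) ** 2).sum())
    return se_slope, resid.std(ddof=2)

se_small, sd_small = fit(50)
se_big, sd_big = fit(5000)
print(se_small > se_big)   # slope uncertainty shrinks with more data
print(sd_small, sd_big)    # residual spread stays near sigma, not zero
```

The slope's standard error shrinks roughly like one over the square root of n, but the residual spread hovers around sigma no matter how much data you collect; that spread is the observation error the process model can never absorb.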
And what we're really saying is that we have a data model for how the y's are assumed to be sampled. We have a process model that has parameters, and then we have parameter models. Again, because we're working in a Bayesian framework, our parameters themselves have models, or are assumed to come from distributions. And we'll use this kind of terminology again and again. So if it doesn't look right, stop me. All right, and I kind of fell off the side a little bit. But again, this is reemphasizing that each of these lines has its own equation or series of equations that go with it. So beyond the classic assumptions, what are you actually going to do with real ecological data, which is why we're all here? First of all, I think it's useful for some people to back up and think about what we are actually doing when we're thinking about a data model or a distribution at all. Where are we using a distribution? So we assume that the data, these bars, this histogram, are random samples from a true population, and we describe that population by a probability distribution, right? So this one happened to be a normal distribution. And we describe it by a distribution because we can't sample everyone. So if we wanted to know the height of people in the room, we could measure that. But if we wanted to know the height of everybody who's interested in ecological forecasting, we would just use this group to estimate it; we're not going to measure everybody. The distribution allows us to make some assumptions about how this group represents the entire population. So we all know that. We generally pick distributions that have known parameters, mu, standard deviation, things that describe the distribution, and that allows us more flexibility as we move further on to make comparisons. And it also helps us estimate what we didn't measure, right? So that tail end down there.
We didn't get any data for it, but if we believe that the population is normally distributed, then we can estimate what it should be, okay? So, distributions. But not all data are normal. And even for data that are not normal, classic statistics has many very rigorous ways to deal with them, and I'm sure everyone in this room has used them. So we have a distribution here that's clearly not normal. It goes from zero upwards, and it's discrete, not continuous. If you fit a normal distribution to these data, if you do a statistical analysis assuming that these data are drawn from a normally distributed population, you will get a mu and you'll get a sigma. But your mean here of three is not representing the data. It's not representing what was actually measured. I mean, it's wrong. But you don't necessarily know that, right, unless you plot the data out. And so this is when you pick a different distribution that better describes that error, the difference between the observations you collect and your assumptions about the true population. In this case, this would be a Poisson distribution, which is a general distribution for non-negative discrete data. And instead of a mu and a sigma, you describe a single rate parameter, the mean, that now you can see is a little more representative of the data, and you make your assumptions about that. Incidentally, this is also a distribution that doesn't require homoscedasticity, right? The variance increases with the mean. So sometimes you can deal with that assumption just by choosing a better distribution. So we're going to practice and talk a lot about the distributions that we're working with for the different data sets as well as for the parameters. You know, are they realistic? Are you just choosing normal distributions because they multiply by each other really well?
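To make the Poisson point concrete, here is a small sketch, using an arbitrary rate of 3, showing the two properties just mentioned: a normal fit to count data puts probability on impossible negative counts, while the Poisson's variance tracks its mean.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)
counts = rng.poisson(lam=3.0, size=10_000)   # non-negative, discrete data

mu, sd = counts.mean(), counts.std()         # moment-matched normal "fit"
# mass a Normal(mu, sd) places below zero, impossible for real counts
p_negative = 0.5 * (1 + erf((0 - mu) / (sd * sqrt(2))))

print(round(mu, 2), round(counts.var(), 2))  # Poisson: variance ~= mean
print(round(p_negative, 3))                  # normal fit leaks mass below 0
```

The normal fit isn't just aesthetically wrong: a few percent of its probability mass sits on counts that can never occur.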
You know, are they actually a biologically relevant representation of the data? So, another common example of data that don't look normal: binary data. And in this case, you can use a binomial or Bernoulli distribution to look at the probability of a success, for instance, seed germination. The probability in this case is treated with a logit, which is a transformation that maps probability space, which is 0 to 1, onto the whole real line, so you can model it as a linear model, right? The important thing to think about as we move forward is that we've got our data model, and we're assuming that it's not a normal distribution, that we are sampling out there from some true population of 0s and 1s, not a true population of 0.5s, right? But then we have this link that makes it linear, and the process itself. And both of those together are part of your process model, right? So as you collect more data, you're getting a better and better estimate of that process model, and because we're Bayesian we also have our parameter model. Another assumption that I touched on: homoscedasticity, that the variance doesn't increase over time. One of these data sets gets a check, it meets that assumption. One of them does not, right? So these data, I mean, we would test them to make sure, but I'm going to tell you they don't meet it. These data look like the variance increases over time. That's a problem because it violates the assumption of a classic linear model. And there are a couple of different ways to deal with it; there are many different ways to deal with it. One would be to try to find a distribution that does not assume that the variance is constant over time, but another way would be to model it. You could model the variance. And so if you remember, the last time I showed you the graphic notation, y is distributed normally with mean beta 1 plus beta 2 x, right? So that's our linear model. And before, we just had a single variance term, right?
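Here is a minimal sketch of that binary data model, with made-up germination coefficients: the latent probability lives on the logit scale, where it is linear in x, but the data themselves are only ever 0s and 1s.

```python
import numpy as np

rng = np.random.default_rng(1)
b0, b1 = -2.0, 0.8                     # illustrative logit-scale coefficients
x = rng.uniform(0, 6, 2000)            # some environmental covariate
p = 1 / (1 + np.exp(-(b0 + b1 * x)))   # inverse logit: real line -> (0, 1)
z = rng.binomial(1, p)                 # Bernoulli draws: germinated or not

print(sorted(set(z.tolist())))         # the observations are only 0s and 1s
print(round(z.mean(), 2), round(p.mean(), 2))  # overall frequency tracks p
```

The link function is what lets a linear process model sit underneath a decidedly non-normal data model.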
And now you actually have a model for the variance that's also based on x. So you're modeling the change in variance over that variable x. And the way it looks in the graphic notation is you just add in parameter models for the variance. These are a couple of different examples of how, once you put this in a probabilistic or Bayesian framework, you can pull apart the data model from the process model. And it gives you more flexibility to tweak either one to better represent your data and your question. In this case, the red is the linear model run as if the variance were constant, and the green is that second model run with the variance changing. So x is here, these are our observations, and the true line is in the middle, dark black; red and green fall on top of each other here. And here, for the green, the one that actually models the variance, the credible interval actually changes over time, right? Over x, not time. And the red doesn't really. So the mean of the posterior also better reflects the true mean once you actually take the changing variance into account. But there's no real difference if you do this on data that are not heteroscedastic. And then in this case, this is a model that Mike ran, but the DIC, the Deviance Information Criterion, is one way you can evaluate a Bayesian model, similar to an AIC, right? So what have we done here? We've added another parameter to the green model. Okay, so it's got one more parameter. In a traditional information criterion, right, that's a penalty. It's the same data, but now you have one more parameter. And smaller numbers mean a better score. So even though you've added another parameter, you fit the data better and the information criterion improved; it got smaller. And so again, we're talking this week about uncertainty. Doing a better job at capturing uncertainty does not necessarily mean you're going to get more certain.
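One way to see the payoff of modeling the variance is a likelihood comparison on simulated heteroscedastic data. This sketch uses arbitrary values and a crude moment-based fit of the standard deviation as a linear function of x, standing in as a cartoon of the variance model just described, not the actual model Mike ran.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 4000
x = rng.uniform(0, 10, n)
y = 1.0 + 2.0 * x + rng.normal(0, 0.5 + 0.4 * x)  # sd itself grows with x

X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta

def gauss_loglik(sd):
    """Gaussian log-likelihood of the residuals under per-point sd."""
    return np.sum(-0.5 * np.log(2 * np.pi * sd**2) - resid**2 / (2 * sd**2))

ll_const = gauss_loglik(np.full(n, resid.std()))   # one variance for all x
# E|resid| = sd * sqrt(2/pi), so this regression estimates sd as a line in x
a = np.linalg.lstsq(X, np.abs(resid) * np.sqrt(np.pi / 2), rcond=None)[0]
ll_hetero = gauss_loglik(np.clip(X @ a, 0.05, None))
print(ll_hetero > ll_const)   # the variance model fits these data better
```

Even though the variance model spends an extra parameter, its log-likelihood is higher on heteroscedastic data, which is the same trade-off the DIC comparison is scoring.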
You're actually more certain out here if you use the model that doesn't fit as well, right? But that doesn't make you right, and that becomes much more problematic when you're trying to forecast. All right, so another kind of class of assumptions: the observation error. The regression model assumes all the error is in the y's. Again, you figure out what the best fit line is and you calculate residuals by looking at the difference between your observation and the line. There's always some uncertainty associated with the parameters that you estimate. You can make that uncertainty smaller by collecting more observations. But the noise that's left is always assumed to be in y, in the observation itself. If you had the perfect model, if your process model were perfect, then that's fine. It's important to know what your observation error is. Maybe everything real that describes that line really is captured by the model; the model explains 60% of the variance, and maybe the other 40% really is just that you can't work a DBH tape. But I don't think most people get to the end of their linear regression and feel that way. Usually there are a few things that you think are probably biologically relevant, and it's not just that you couldn't count right. But that's what the assumption is, okay? So generally we assume that the observation error is symmetric about that mean or predicted value. But sometimes you actually also don't measure x very well, right? So sometimes there's actually error in our predictor variables. And in fact, a lot of the time there's probably error in our predictor variables. And there are a few frequentist approaches for capturing that. Sometimes it's not a big deal and sometimes it can be. You can imagine that if there's error in x and you're trying to use x to make a prediction, then that error is going to propagate out and also be in your prediction.
If you haven't somehow accounted for it or described it, then it makes your prediction overconfident. All right, so errors in variables is a way to deal with the fact that we can often have errors, or uncertainty, in our predictor variables as well as in our response variables, okay? The classic assumption is that it's all in the response variables. But in ecology, the reality is that it's often also in our predictor variables. So how do you deal with that? The Bayesian framework, again, because we have this probabilistic structure, gives us the flexibility to build that into a model. So in this case, what we've done is we've got the same kind of linear model we've been working with, where our parameters describe the variance around the slope and the intercept, for instance, and the observation error. But also, we've got a model for the predictor variable x that's described by its own set of parameters and also informs y. And I've written it up here a little bit differently. So we're actually modeling x as a random variable. Now we're going to move into latent variables, so anything that's not directly observed. Sometimes that's just error. Sometimes variables are measured with error, whether that error is biased, meaning you've got an instrument that's always a little bit high, or it's random error, as is generally assumed with measurement error, and you want to account for that explicitly. If you have missing data, you want to estimate data that you didn't actually collect. And most often, we hear about latent variables with proxy measures. So you've got something that you measure, and you've got something else that you actually want to interpret. And I think we heard a lot of people in the project descriptions talk about, I've got these data, and this is what I want to do with these data. And generally, I want to summarize those data exactly because I think they're the true measurement of the population.
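The overconfidence from ignoring error in x can be seen in the classic attenuation result: with noisy predictors, the naive regression slope is biased toward zero by a known factor. Here is a sketch with invented numbers.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20_000
x_true = rng.normal(5, 2, n)              # latent "true" predictor
x_obs = x_true + rng.normal(0, 1.5, n)    # what we actually measure
y = 1.0 + 2.0 * x_true + rng.normal(0, 1, n)

# naive slope, regressing y on the error-laden x_obs
slope_naive = np.cov(x_obs, y, bias=True)[0, 1] / x_obs.var()
# theory: true slope times var(x) / (var(x) + var(measurement error))
slope_expected = 2.0 * 2**2 / (2**2 + 1.5**2)
print(round(slope_naive, 2), round(slope_expected, 2))  # both below 2.0
```

An errors-in-variables model, like the one in the graph notation here, is what recovers the relationship with the true x instead of this attenuated one.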
That is, you've got data that represent some bigger population. And sometimes they're actually proxies for something that you can't count. I think the book talks about GPP being a proxy, made up of component things that you go out and measure and then put together. NEP would be a clear example: you put stuff together and try to make some summaries about net ecosystem productivity. So ignoring the fact that there are latent variables can have a whole bunch of bad outcomes: modeling a derived response or a flawed observation can lead to incorrect or falsely overconfident conclusions. I think everybody knows that. But if you go back and look through a lot of analyses, it does not stop most people from doing it. So we'll talk more explicitly about what a latent variable is and how you might treat it. So first, missing data. You've got a data set. This is something that has a response on the y-axis and some variable, in this case I wrote time, on the x-axis. And you've got missing data. If you've got a good model, then you could predict what's missing here. You could predict y given some known x and the error that you've already estimated. You can make good predictions if your model is a good representation of those data. That's probably what most people have experience doing as far as missing data: you use regression to fill in a gap. That's this example. I guess sometimes people fill them in other ways, but generally people are most familiar with, you've got a missing y, I fill it in given my model. In a Bayesian framework, again, because it's all probability and everything that's not known is being treated as a random variable, if you have missing data, when you run the model, it will estimate those missing y's.
Likewise, if you have missing x's, missing predictor variables, and you have a model for those predictor variables, as long as you define a model for them, a distribution that they can be drawn from, then you can estimate the predictor variables. And not only can you estimate what the missing ones are, but you can estimate how they influence the y's. It does it all at once. So that's pretty powerful, especially if you have been out there collecting data, and you have lots of data, but then you've got weird sensor things, or somebody got shin splints and couldn't help you that day, so you missed counts on one day. You've got random things that are missing throughout your data set. Most often in a linear regression, in the classic sense, you would have to just get rid of that whole row, or you would have to do some kind of gap filling that just fills in a point value. This allows you to actually estimate what it should be, as well as look at the influence of that missing data on the rest of the response variables. So by the time you get down here, everything's been defined as to what distribution it was drawn from. But you're going to update the regression model, that was the mean equals beta naught plus beta 1 times x, based on all the rows of data given the current values of the missing data. You start the model with an initial value. Then you update it. And then you update the missing data based on that regression model, which is similar to regression gap filling, except that you're doing it in this iterative fashion. So you're not just gap filling a point. You're actually filling it in with all of the uncertainty, given how much information you have to inform that point. In order to do that, you assume that your data are missing at random.
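The update-the-model, update-the-missing-data loop can be sketched as a toy iterative imputer. This is a Gibbs-flavored cartoon with made-up data, not a full sampler (the regression step uses least squares rather than drawing the betas), but it shows missing y's being re-drawn from the current model with noise rather than filled in as a single point value.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 500
x = rng.uniform(0, 10, n)
y = 1.0 + 2.0 * x + rng.normal(0, 1.0, n)
miss = rng.random(n) < 0.2              # ~20% of y missing at random
y_fill = y.copy()
y_fill[miss] = y[~miss].mean()          # crude initial values

X = np.column_stack([np.ones(n), x])
draws = []
for it in range(200):
    beta = np.linalg.lstsq(X, y_fill, rcond=None)[0]  # update regression
    sd = (y_fill - X @ beta).std()
    # re-draw the missing y's from the current model, noise included
    y_fill[miss] = X[miss] @ beta + rng.normal(0, sd, miss.sum())
    if it >= 100:                       # keep draws after a burn-in
        draws.append(y_fill[miss].copy())

imputed = np.mean(draws, axis=0)        # posterior-style mean for each gap
print(round(np.abs(imputed - y[miss]).mean(), 2))
```

The spread of the retained draws at each gap is the point of the exercise: every imputed value carries the uncertainty of the model that produced it.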
If your data are not missing at random, if there is some reason that you're missing chunks of data that happens over and over, like we never sampled at night, or this one person, every time they went out, things went really badly, and they were out every Tuesday. I mean, if there are reasons that you have missing data, then you have to build a better missing-data model than what I just showed, which is based on the idea that you just have data missing at random. If it's systematic, then you can't estimate the missing data if you don't have other data that should be replicates of it. This is to say, again, there's a lot of power in Bayesian inference and in using a probabilistic approach to statistics. But if you don't have information in the data, you're still not going to do magic with the statistics. Latent variables: when you observe y, but interpret it as z. And we already talked about the fact that sometimes we already do that with our data model. We're saying that we're observing data and expecting it to come with observation error around a true distribution. But missing data, and then these proxies, where you have one or multiple things informing your response variable, are more common. And then I was going to end with an example of the breadth of one latent-variable approach. This is actually going back more than a decade for me, when I was a forest ecologist and I worked at the Duke Forest FACE site, which was one of the coolest experiments that I've worked on because it was just so big. It was DOE, right? So they had these big, big towers that extended above the trees and just blew carbon dioxide onto them, at the time at what was a pretty futuristic CO2 level. And it was a pine plantation, so the trees are all the same age. We know the year that the gas was turned on. We know how old all the pine trees were. They're all the same species.
It's kind of like a jar of dirt that you put in an incubator, but out at big scale. And I got to spend September and October climbing up those towers above the trees with binoculars and counting things. It was a beautiful PhD. Perfect time of the year. Everyone else in the lab is out in August measuring trees, all poison ivy infested, and I only go out in the fall. And so they'd been collecting seeds before I got there. They put out these laundry baskets and collected seeds in the baskets. And they could see that you got more seeds in the baskets in the elevated rings, which are the red or the bigger ones here, and fewer in the ambient rings. And there were three ambient and three elevated. And so the question here is whether this is a CO2 fumigation response. If you're not a forest ecologist, what you need to know is that the fecundity of trees is often related to tree size. So you make the assumption that the bigger the tree is, the more resources it has to put into fecundity. They have to grow big first, and then they start putting out seeds. And then people do things like fecundity equals the number of seeds, which is a function of diameter and treatment. Tree size isn't really diameter, but it's a good proxy. Seed number is really hard to estimate. If you want to count seeds on a tree, it turns out that's hard. And so you count them in these seed baskets, and then you have to try to figure out how many of them came from which tree. So right away, we've got error in y and error in x, and then I'm trying to ask about this treatment response. But also, there's the question of, is this more trees putting out seeds? So are there more trees contributing seeds, or are the trees just bigger and contributing more seeds? So there's lots of different layers to this, which made it a good PhD. And when we went out and looked, so remember, the assumption is that if you know the diameter, you can predict something about fecundity. And we went out and looked, and this is total seed cones.
So I counted cones, because counting seeds is insane, and you could actually count the cones on a tree with binoculars. And then the diameter of the tree is here. And if I pull out one chunk here, you can see that there's a ton of variance in fecundity at that diameter, especially in the elevated. But if I moved it over here, then you would see that there's a ton of variance in the ambient as well. So there's a lot of individual variability in seed production. If you just wanted a mean, then at that diameter class the mean ambient would be about seven cones per tree, and the mean elevated is about 52. But you can see that that mean isn't really representative of these data, especially for the elevated. And I'm not a perfect counter, but I'm not off by 50 either. So the variance there is more than just my inability to count. And so we built a model that could take a bunch of different information into that process of fecundity, where fecundity is still a function of tree size, but it's now conditioned on whether or not the tree is mature. There are a couple of different ways you can get a zero. You can get zero because the tree just didn't produce anything that year, which happens in some years. Or you can get zero because the tree is not mature, and it shouldn't be counted as mature. There's a different trigger that seems to happen that turns a tree reproductive. And we used both the cones and the seeds to inform the fecundity estimate. And part of the reason we were able to do that is because we had older trees. Remember, all these trees are the same age. And so even though they have a diameter range that goes from probably around 10 up to about 30 centimeters, that's not really a big diameter range in terms of trees, or even loblolly pine. And we actually had information about bigger trees that could help.
Remember, if you don't have any information out here about 40-centimeter trees, then your line here, when you're trying to estimate a change in fecundity with size, can't really be informed very well. But if you have information to constrain what that should look like, which we did with these other seed traps in other stands, then we could actually use that to help separate what's observation error from what's actually something within the realm of natural variability. So you can use different data sources to start to describe a latent variable. And in this case, we've got the latent variables of maturation status and fecundity, where fecundity is informed both by the numbers of seeds in the laundry baskets, which come in via this dispersal model, and by the number of cones that were counted on a tree.