Like I said, we spent a good bit of the last day and a half after the general introduction really focused on statistical tools that solve what is ultimately the calibration problem. We have some model, we have some data, and we need to estimate the parameters of that model. We covered some fairly sophisticated ways of doing that: methods for dealing with the complexity of data, observation errors, errors in variables, non-constant variance, non-Gaussian distributions, and so on. We talked about using hierarchical models to partition that variability. We talked about using state space models to fit these as dynamic time series. But they were all ultimately about calibration, about fitting that model. So what we want to do today is think about how we take that calibrated model and integrate it and the other sources of uncertainty we have into making projections, but then also spend time thinking about how this creates a feedback loop, where we can analyze the uncertainties themselves to better understand what data we need to collect and how we might do that data collection more efficiently.

So the four things I want to emphasize in this lecture, the four key concepts, are: sensitivity analysis, basically asking how a change in x, one of our inputs, translates into a change in y, the thing we're trying to predict; building on that, uncertainty propagation, how the uncertainty in x affects the uncertainty in y, and likewise how we forecast y with uncertainty; uncertainty analysis, which sources of uncertainty are most important; and then optimal design, thinking about how we can most effectively reduce the uncertainties in our forecasts. I'm going to spend a disproportionate amount of time on uncertainty propagation because this is very much a course on forecasting and, like I said, this is a key concept in forecasting. But I think there is a deep connection between sensitivity analysis, uncertainty propagation, and uncertainty analysis, and one of the values of uncertainty analysis is to think explicitly about that feedback loop, how we can use our forecasts to actually improve the way we make measurements and the way we do monitoring.

Okay, I'm going to take for granted that most of you have probably had some prior exposure to the idea of a sensitivity analysis. Is that pretty fair? So I'm just going to give a quick review. There are multiple ways to assess sensitivity, to understand how our outputs change given our inputs. Probably the most straightforward is just to do it analytically. If we want to know how our outputs change as a function of our inputs, that is essentially what a derivative is: change in output given change in input. So that first derivative, for a simple model, is a very simple way of understanding sensitivity, and if you have a simple model, go for it, you're basically done. But all these other numerical methods exist because this is often non-trivial, or, for things that exist only as computer code, not really even something you can do. So we can classify sensitivity methods into those that are local, which assess sensitivity at a specific point in parameter space, usually centered around the mean value, versus those that try to understand the sensitivity more globally, looking at how the outputs change as a function of the inputs across a larger part of that input space. I'm not going to go through all of the global methods today.
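Just to put that analytical, local version in symbols: the sensitivity of an output y to an input x_i is the partial derivative, evaluated at some reference point such as the mean of the inputs.

```latex
S_i \;=\; \left.\frac{\partial y}{\partial x_i}\right|_{x=\bar{x}}
```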
More of the global methods are covered in the book, and even more are covered in the Saltelli book, which is a great reference if you're interested in sensitivity analysis; it really gets into the details. But I think the important thing to know is that global sensitivity analyses are pretty much always more computationally expensive, which makes sense: if you're exploring more of parameter space, it's going to cost more. You can arrange many of these methods along a gradient from those that explore parameter space more extensively, at greater computational cost, to those that give you a more approximate answer with a smaller number of evaluations. So there's a set of trade-offs. I'm going to particularly focus on these two right here.

The simplest univariate numerical sensitivity analyses are the one-at-a-time methods. You hold every parameter except the one you're interested in at its mean or its default, and then you vary that one parameter. It essentially allows you to numerically construct that derivative while holding everything else constant. In some of the work we've done with vegetation models, the y-axis here is NPP, and we were asking, as we varied a bunch of different parameters, how did NPP change? In this particular instance, the gray line was when we did this with the prior distribution and the black line was when we did it with the posterior distribution.

And actually this brings up one thing that I like to emphasize about sensitivity analysis relative to how it was often done in the earlier literature. At least when I was trained, and I am older than most of you, you'd see a lot of one-at-a-time sensitivity analyses that were plus or minus 20%, plus or minus 10%, which are completely arbitrary amounts. One of the things that I find particularly valuable is that, if you have an estimate of the uncertainty, you can do the sensitivity analysis at plus or minus a certain number of standard deviations, or, for non-Gaussian distributions, the quantile equivalents of those standard deviations. So that's what you're seeing here: the prior mean and the prior at one, two, and three standard deviations in each direction. This was not normally distributed, so these are the quantile equivalents of those standard deviations, and the same thing again with the posterior. I find that useful because it gives me a better frame of reference for what range of variability is actually meaningful for a particular parameter, because some parameters vary by plus or minus 2% and some vary by plus or minus 200%, so why would I vary them both by the same amount? In this case, we find that plants are really sensitive to their specific leaf area. For those that don't think about terrestrial NPP, I didn't even give the units; I think it was megagrams per hectare. It goes from ridiculously huge to dead.

At the other extreme are global methods that explore parameter space very broadly. One of the simpler global sensitivity methods is just Monte Carlo sensitivity analysis, where you sample completely from that joint distribution to understand what the outputs look like over a wide range of different parameter values. So in the one-at-a-time analysis, I was holding every other parameter constant. Here, I'm letting every other parameter vary simultaneously.
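Before moving to the global example, here's a minimal sketch of that one-at-a-time approach, assuming a hypothetical npp_model function and made-up parameter distributions; the perturbation points are the quantiles equivalent to plus or minus 1, 2, and 3 standard deviations, so the same code works for non-Gaussian priors and posteriors.

```python
import numpy as np
from scipy import stats

# Hypothetical example: two parameter distributions (could be priors or posteriors)
param_dists = {
    "SLA":       stats.lognorm(s=0.3, scale=15.0),   # specific leaf area
    "leaf_turn": stats.gamma(a=4.0, scale=0.25),      # leaf turnover rate
}

def npp_model(params):
    """Stand-in for the real model; returns a single output (e.g., NPP)."""
    return 10.0 * np.log(params["SLA"]) / (1.0 + params["leaf_turn"])

# Quantiles equivalent to the mean and +/- 1, 2, 3 standard deviations of a Gaussian
z = np.array([-3, -2, -1, 0, 1, 2, 3])
quantiles = stats.norm.cdf(z)

defaults = {name: d.median() for name, d in param_dists.items()}

# One-at-a-time: vary one parameter across its quantiles, hold the rest at defaults
oat = {}
for name, dist in param_dists.items():
    values = dist.ppf(quantiles)
    outputs = []
    for v in values:
        params = dict(defaults)
        params[name] = v
        outputs.append(npp_model(params))
    oat[name] = (values, np.array(outputs))
    print(name, np.round(outputs, 2))
```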
These are actually predictions from a simple Gaussian-plume pollutant transport model, looking in this case at the sensitivity of pollutant concentration at a specific location downwind to wind speed and atmospheric pressure. So you can see the fuller shape and variability in that sensitivity. There are two metrics that I find useful when you do this global sensitivity analysis but still want to distill it down to a specific number that's handy for assessing the relative importance of different parameters. The first is just fitting a regression slope through that cloud of points. Again, slope is dy/dx, so that regression slope is a one-number summary of average sensitivity. The other number that I find useful is the partial R squared associated with that input, because if you wanted to, say, partition out the relative importance of different variables, those partial R squareds tell you how much each parameter contributes to the overall variability. Because you're often approximating nonlinear models with linear models, the partial R squareds are not going to sum to 100%, because of interactions. But since you're throwing a regression through there anyway, it's also fairly easy to put in interaction terms and actually assess those interactions. So here we can see wind speed has some sensitivity. Obviously the units of sensitivity depend on the units of the y and the units of the x, but it's a decently important parameter. By contrast, for atmospheric pressure the slope is essentially zero and the R squared is essentially zero: pollutant transport was not at all sensitive to it. The other reason that I emphasize Monte Carlo sensitivity analyses is that I'm going to talk about uncertainty propagation later, and one of the approaches to uncertainty propagation is Monte Carlo-based. So if you're doing a Monte Carlo-based approach to uncertainty propagation, you essentially get the Monte Carlo sensitivity analysis for free by doing some post-hoc analyses of those posterior samples.
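Here's a minimal sketch of that kind of Monte Carlo sensitivity analysis, again with made-up inputs and a stand-in plume model: sample the inputs jointly, run the model, then summarize the sensitivity with regression slopes and partial R squared values for each input.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5000

# Hypothetical joint input sample (independent here; could come from an MCMC posterior)
wind_speed = rng.gamma(shape=4.0, scale=1.5, size=n)     # m/s
pressure   = rng.normal(101.3, 1.0, size=n)               # kPa

def plume_conc(u, p):
    """Stand-in for a Gaussian-plume concentration at a fixed downwind location."""
    return 50.0 / u * np.exp(-0.5 / u) + 0.001 * (p - 101.3)

y = plume_conc(wind_speed, pressure)
X = np.column_stack([np.ones(n), wind_speed, pressure])

# Regression slopes = average sensitivities dy/dx
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print("slopes:", dict(zip(["wind_speed", "pressure"], beta[1:])))

# Partial R^2: drop each input in turn and compare residual sums of squares
sse_full = np.sum((y - X @ beta) ** 2)
for j, name in enumerate(["wind_speed", "pressure"], start=1):
    X_red = np.delete(X, j, axis=1)
    b_red, *_ = np.linalg.lstsq(X_red, y, rcond=None)
    sse_red = np.sum((y - X_red @ b_red) ** 2)
    print(f"partial R^2 {name}: {(sse_red - sse_full) / sse_red:.3f}")
```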
Okay, so that's mostly what I wanted to say about sensitivity analysis in order to set up being able to think about uncertainty propagation. For forecasting, what we want to do is say we have some set of uncertainties in our current state of the system and in our parameters or other inputs, and we want to project that into the future with uncertainty. How do we do that? Well, how we make predictions with uncertainty is just a special case of a more general statistical problem, which is how we transform distributions through functions, how we transform variables. The idea of a variable transform is that I have some probability distribution and I have some function, and I want to transform that probability distribution through the function to convert the distribution from one space to another. In forecasting, the function you're transforming through is your model, projecting into the future or into some novel condition or novel location. All the methods for propagating uncertainty into the future are just special cases of transforming uncertainties more generally.

I'm going to go over five different ways that we can think about propagating uncertainty into projections, and I'm going to organize them in this matrix. I'm just going to point out that in the textbook, there is a typo that didn't get caught. For whatever reason, this is what I sent in, but this got moved down here, so your book lists the Taylor series as a numerical uncertainty approach when it is in fact very much analytical.

What I've done here is take the approaches to propagating uncertainty and classify them in two directions, depending on the approach we want to take and what we want to get out. First, I'm classifying these into analytical versus numerical methods. Analytical methods are ones that involve doing math, but they give us an equation. That equation is handy because it gives us a more general understanding of what's going on, and once you have it you can keep reusing it, and it's often very computationally efficient: I plug a number in, I get an answer, and I'm done. Numerical methods are ones that rely on some form of computer simulation. They are more computationally demanding, but they often require you to do less math. So that's the trade-off: computers doing hard work versus your brain doing hard work. Hell, it's not uncommon for ecologists to decide that they'd rather have a computer churn on a problem for a few weeks than remember how to do derivatives. And then there are a few models we work with where doing the derivatives takes more than a few weeks; if I'm working with a global earth system model and I wanted to solve for the full matrix of all pairwise derivatives, it would take years to code that up and verify that there are no bugs in it.

The other direction is what we actually want to come out of this. One thing we might want is the full probability distribution; in fact, ideally that's what we would like to get out. Alternatively, we might be okay with just getting the statistical moments: I make a prediction in the future, and I'm fine with just getting the mean and standard deviation, I can live with that. Why would I be fine with that? Because, not surprisingly, it's harder to get the full distribution than it is to just get the mean and standard deviation. So again, there's more work in this column, and this column is often easier to achieve. You've probably all zeroed in on this and said, this one's easier and that one's easier, so I'm going to totally end up here, right? Can we just cut to that one? No. I'm going to start with the hard one, in the other corner, which is what I'm going to call the gold standard: analytical methods for variable transformation. You get a closed-form solution, you get the full distribution, and this is the sort of thing you can find in a graduate-level stats textbook. And I will say, I've done this sort of transformation in graduate-level statistics classes. I've never done it after a graduate-level statistics course, and I've only done it for fairly simple problems, because it's actually pretty non-trivial.

So think about it: I have some probability distribution for my input, I have some function, and I want to know the probability distribution of the output. In an ideal world you'd think, won't the output distribution just be what I get if I take my function and stick the probability distribution of my input into it? No, that doesn't work. You can't just take a Gaussian, plop it into your function, and expect to get the right thing out. What you have to do instead is plop the inverse of your function into the probability distribution.
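Written out for the univariate case, this is the standard change-of-variables formula: if y = f(x) with f invertible, the output density is the input density evaluated at the inverse of the function, times the absolute derivative (the Jacobian) of that inverse.

```latex
p_y(y) \;=\; p_x\!\left(f^{-1}(y)\right)\,\left|\frac{d\,f^{-1}(y)}{dy}\right|
```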
So if you have a model, you need to analytically solve for the inverse of the model and then plug that into the equation for your probability distribution. And that's the easy part, because you also need to take the derivative of that inverse of the function, which isn't too bad if you have a univariate model. But you never have a univariate model. I mean, the only univariate model we actually have is fitting a constant mean; even a regression is a bivariate problem. And then if you acknowledge that you have uncertainties about the x's as well, you now have three terms. So you now have a trivariate model, and in a trivariate model this is not one derivative, it's a 3x3 matrix of derivatives. And that's just for your simplest straight-line case. If you have a 12-parameter model, this is a 12x12 matrix of all pairwise derivatives. And then you're multiplying that by the probability distribution, and that gives you the probability distribution of the output.

Now, just having the probability distribution by itself isn't particularly useful, because it's unlikely to be a named probability distribution. You're not going to do this and suddenly it's going to be like, oh, normal, great. It's going to be some weird obscure thing that no one's ever seen before. So then you have to go, okay, well what if I want to know things like the mean of that distribution? Then I have to take this thing and integrate: if I want to know the mean, that's the integral of y times p(y); if I want to know the variance, it's the integral of (y minus y-bar) squared times p(y). So now you have this incredibly nasty multivariate thing, and you have to multiply it by this and then take the integral of the whole thing analytically just to find out what the standard deviation is. Like I said, you basically never do it. The place you do it is when f(x) equals x squared, and then you go through a whole lot of work and you find out that if x has a standard normal distribution, the distribution of x squared is the chi-squared. Which, if you ever wondered why the chi-squared is called a chi-squared, is because it's just the distribution of x squared. You would do that as a homework problem in a grad-level course, and yeah, you don't do it in practice. So, the gold standard: absolutely wonderful, absolutely useless.

So let's move on to the next corner, which is to say: if we can't get a full solution analytically, can we get an approximate solution analytically by focusing just on the mean and the standard deviation? If you're going to take that approach, what you're going to do is lean on the fact that there are a lot of existing proofs that explain how random variables interact with each other and how random variables interact with constants. So here's a set of rules, a kind of algebra of variances, where capital letters are random variables, things coming out of probability distributions, and lowercase letters are just constants. If you look at the table of rules for how means behave, it's actually relatively intuitive: as long as you're not multiplying random variables together, it largely works out the way you'd expect, the mean of a sum is the sum of the means, nothing crazy. Variances, on the other hand, are a bit unintuitive. In the algebra of variances, the variance of a constant times a random variable is the constant squared times the variance. The variance of a random variable plus a constant is just the variance of the random variable.
The constant goes away because constants don't vary, by definition. Then, if you add two random variables together, yes, you can add their variances, but you also need a term for their covariance. And then you can start saying, well, if I combine constants with random variables I start getting these a squareds, b squareds, and two-a-b covariance terms. The algebra is not actually hard, it's just not necessarily intuitive until you've done it a few times, but it is something where you can take a model, apply these rules, and work out analytically what the moments of the predictions would be. There's an important caveat, though: all the rules I show here correspond to linear combinations of random variables. This method of dealing with the moments analytically only works out exactly if you're using linear combinations. So it works very elegantly, but it only works very elegantly for linear models; any time there's any nonlinearity, you don't get an exact solution through the analytical moments.

So let's look at this graphically. I wanted to do this to emphasize again what's going on, but also to graphically show the relationship between uncertainty propagation and sensitivity analysis, because, like I said, these are not unrelated. Let's say I have a linear model, my simplest univariate linear model: y equals m theta plus b. Theta is my x, and m is just the slope, which is our derivative; a reminder that the slope here is the sensitivity. If I have some distribution here, what I want to do is transform the uncertainty in my input into the uncertainty in my output. So if I want to know the variance in my output, that's just the variance of the full model. When I apply these analytical rules for dealing with uncertainties, you essentially apply them from the outside inward. So I would say I have the variance of something plus a constant; that becomes the variance of the something, because the constant doesn't matter. I can then apply the rule for the variance of a constant times a random variable and get out that the variance of y is m squared times the variance of theta. And again, because that slope is the sensitivity, we get back the property that, for a linear transformation, the variance in y is the sensitivity squared times the variance in theta, which is something I actually mentioned on day one: if we want to understand the uncertainty in our predictions, it is determined by two things, the uncertainty in our inputs and the sensitivity of our outputs to those inputs. And if you remember that standard deviations are the square roots of variances, you take the square root of this and you essentially get sensitivity times standard deviation.
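To collect those rules and that linear result in one place (capital letters are random variables, lowercase letters are constants):

```latex
\begin{aligned}
E[aX + b] &= a\,E[X] + b \\
\mathrm{Var}(X + b) &= \mathrm{Var}(X) \\
\mathrm{Var}(aX) &= a^{2}\,\mathrm{Var}(X) \\
\mathrm{Var}(X + Y) &= \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{Cov}(X, Y) \\
\mathrm{Var}(aX + bY) &= a^{2}\,\mathrm{Var}(X) + b^{2}\,\mathrm{Var}(Y) + 2ab\,\mathrm{Cov}(X, Y) \\[4pt]
y = m\theta + b \;\Rightarrow\; \mathrm{Var}(y) &= m^{2}\,\mathrm{Var}(\theta)
\end{aligned}
```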
So what if you have a nonlinear model? Well, you can't apply these rules for variances exactly, but there are a couple of alternatives. One that I'm not going to cover is that there are approximations that can be applied using equations for the variances of other mathematical operations, and again you would apply those from the outside inward: you want the variance of your function, you've written out your full function, and you break it down, working your way from the outside inward, back to the random variables. The other way you can think about this is to say, well, it's nonlinear and I have some uncertainty in the input, but the thing I can do is approximate that nonlinear function with a linear approximation. So I can take a linear tangent approximation of my nonlinear function. If I take a linear tangent approximation of my function, I'm back to having a linear model and I can apply these analytical rules. Those two approaches are actually going to be the same, because all of the equations in the appendix for dealing with the variances of products, quotients, and other mathematical transforms are themselves linear approximations; they're just ones that potentially let you get away with not having to solve derivatives. Okay, so if I have a linear approximation, I'm estimating the uncertainty in my output relative to that linear approximation, not relative to the full model, but it gives me an approximate answer. It's not computationally demanding; it does require knowing these derivatives, but it gives you an analytical solution that again has some generality to it. And we just saw that if the model is linear, the variance in y is exactly dy/d-theta squared times the variance in theta; if we're taking a linear approximation, it's going to be approximately the sensitivity squared times the variance of the input.

To get at the same idea in a slightly different way: if I have some function and I want to know the variance of that function, the variance of what we're predicting with our model, that's going to be approximately the variance of the Taylor series approximation of that function, where we're truncating the Taylor series after the first two terms to make a linear approximation. So this is what I'm doing: I'm taking a Taylor series linear approximation of my model, and I can again apply those same rules for how variances work. The variance of the function evaluated at its mean? Well, the function evaluated at the means of the parameters is a constant; I take constants, I plug them into a model, I get a constant back, and the variance of a constant is zero, so that term goes away. df/d-theta, our derivative evaluated at the mean parameter values, is also a constant. Theta is a random variable, so df/d-theta times theta is a constant times a random variable, which contributes the constant squared times the variance of the random variable. And the last term is a constant times a constant, so no variance there. So again we get the same answer: the variance of a function is approximately its sensitivity squared times the variance of the input. Again, that works when we have one parameter, and you never have just one parameter. The more general solution involves summing over multiple parameters: I have a direct contribution from each parameter, summed up over all the parameters, and then, if you remember the property of the variance from before, the variance of ax plus by was a squared times the variance of x, plus b squared times the variance of y, plus 2ab times the covariance of x and y, which is essentially what we're seeing here. These terms are like the a squared variance of x and the b squared variance of y, and these are like the 2ab covariance. In fact, you can generalize all of this to just the sum over i, sum over j, of df/dx_i times df/dx_j times the covariance between x_i and x_j. If you write it like that, the variance terms are just the special case where i and j are the same, and the 2 comes up because the ij and ji terms are identical, so each shows up twice. This is essentially the general solution.
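Written out, with the derivatives evaluated at the parameter means, that general first-order (Taylor series / delta method) result is:

```latex
\mathrm{Var}\!\left[f(x_1,\dots,x_n)\right] \;\approx\;
\sum_{i}\left(\frac{\partial f}{\partial x_i}\right)^{2}\mathrm{Var}(x_i)
\;+\; 2\sum_{i<j}\frac{\partial f}{\partial x_i}\,\frac{\partial f}{\partial x_j}\,\mathrm{Cov}(x_i, x_j)
\;=\; \sum_{i}\sum_{j}\frac{\partial f}{\partial x_i}\,\frac{\partial f}{\partial x_j}\,\mathrm{Cov}(x_i, x_j)
```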
But I think most people like seeing it spelled out the first way, because it emphasizes the direct effects and the interactions a little more explicitly. So in the book I go through an example of applying this to a fairly simple nonlinear model, the Michaelis-Menten model, where f(x) = v x / (k + x). The Michaelis-Menten model has the behavior that it asymptotes to v, because as x goes to infinity, x gets a lot bigger than k and this is essentially v times 1. And k has the special case that when x equals k, this ends up being one half times v, so k is your half-saturation constant. Those are the two parameters. If I apply this analytical approach, taking the derivatives of this model, I find that v has a linear sensitivity, so the transformation between the uncertainty in v and the uncertainty in y is actually exact. By contrast, the relationship between k and the predictions is nonlinear, and so if I have some uncertainty about k, we see here that the mean value of k translates to a specific prediction, but the mean of the overall distribution of predictions is offset from it. That property, that the mean of a function is not equal to the function evaluated at its mean parameters, is Jensen's inequality. So this is just a graphical example that the mean of a function and the function at the mean are not the same thing, and that's a really important thing to remember when we're transforming uncertainties through nonlinear functions: if you run your model under your mean parameter set, what comes out is not your mean prediction if your model is nonlinear, because what we actually have is a distribution of predictions and the transformation itself is nonlinear. I included that just to give me a chance to emphasize the importance of remembering Jensen's inequality.

And so here we get a mean and confidence interval for the relationship between x and y, and here the black line was showing the model evaluated with the mean parameter set; so again, this difference between the middle line and the black line is the result of Jensen's inequality. And this is a confidence interval that is derived completely analytically, just from closed-form math. If you go back to pretty much every model you've seen in classical stats that draws a confidence interval, it was derived analytically: someone went through the math and derived how to calculate that confidence interval. For things like linear models, they're linear, so the math works out perfectly.

And actually, that is another thing worth refreshing and reminding people about. If I have some linear regression between x and y, it gives me a straight line. If I were to propagate the uncertainty in just the intercept, that would give me an interval that just perfectly parallels the line. If I were to propagate the uncertainty in just the slope, it looks like that. If I added those two things together, it would look like that, because I would just have this slope interval with the additional uncertainty from the intercept. Is that what a regression confidence interval looks like? No. A regression confidence interval will not just look like a big V. We have uncertainty in the intercept, we have uncertainty in the slope, we add those things together, so why doesn't it look like that? Because of the covariance. So this is really important: when you propagate uncertainty through, those covariances are critical; the correlations between your parameters matter. And this happens a lot, and it's a really easy mistake to make: you get some posterior estimate of your parameters,
you plot up the marginal distributions, and you think those marginal distributions are your posterior. No, your posterior is the full joint distribution of all of your parameters, with all the complexity of their covariance structure, and for many of our parameters those covariance structures can themselves be very nonlinear. So when you sample from parameters, you have to sample from that joint distribution; you have to account for that correlation. It's that correlation that gives us our classic hourglass shape. The classic hourglass shape comes about because when I choose an intercept that's bigger than average, the slope is flatter than average, because otherwise the line doesn't go through the data. So even in something as simple as a regression, there is always a strong negative correlation between the slope and intercept, and it just gets more important with more complex models. So that covariance term can't be dropped.

So we've covered our analytical approaches: one that's impossible, one that's pretty straightforward, and one that isn't bad but has the limitation that the more nonlinear the model is, the worse the approximation it gives. If I'm looking at this and looking at that linear approximation, I might not be particularly comfortable; that's a pretty bad approximation. That said, if I have a model with only a slight curvature to it, it might be a perfectly adequate approximation.

So let's think about alternative, numerical ways to propagate uncertainty. We just saw, with Jensen's inequality, that I can't just take the moments of my inputs and transform those moments through the function: I can't take the mean, transform it through a nonlinear function, and get the mean prediction out, and I definitely can't do that with the variance either. By contrast, one of the things that I can do is take samples from the distribution. If I have a distribution here and I take random samples, I draw random numbers from this input distribution, then once I take a random number from this distribution I have a number, and I can transform that number through the function no problem, because I'm not transforming a moment, I'm not transforming a distribution, I'm transforming an actual number. And I can draw from this distribution to my heart's content. The key idea in all the numerical approaches to uncertainty propagation is this idea of taking samples from the distribution, transforming those samples, and essentially approximating the distribution on the back side using those samples. The idea of approximating a distribution with samples from that distribution should be familiar; that's exactly what we're doing in the Bayesian MCMC methods, where we don't get an analytical solution, we get samples from a distribution, and with frequentist bootstrapping methods you're essentially doing the same thing as well, approximating a distribution with samples from the distribution. So the idea behind most of these Monte Carlo methods is a conceptually straightforward one: if I have samples from a distribution, I can transform those samples. And it's actually a powerful approach, because I can transform into a distribution here, then take that distribution and transform it through something else; I can repeat this process through increasing layers of complexity and analysis, and I'm always propagating the uncertainty properly because I'm just propagating individual instances, individual numbers. This approach of propagating through samples is the Monte Carlo approach.
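As a minimal sketch of that sampling idea, here's a toy example in code, with a made-up input distribution and the Michaelis-Menten function from earlier standing in for the model; notice that the mean of the transformed samples differs from the function evaluated at the mean input, which is Jensen's inequality showing up in the Monte Carlo output.

```python
import numpy as np

rng = np.random.default_rng(1)

def michaelis_menten(x, v=10.0, k=2.0):
    """Nonlinear transform: f(x) = v*x / (k + x)."""
    return v * x / (k + x)

# Uncertain input: hypothetical distribution for x
x_samples = rng.lognormal(mean=0.7, sigma=0.4, size=10_000)

# Transform each sample through the function (numbers, not moments)
y_samples = michaelis_menten(x_samples)

print("mean of f(x) samples:   ", y_samples.mean())
print("f evaluated at mean x:  ", michaelis_menten(x_samples.mean()))  # differs: Jensen's inequality
print("predictive SD:          ", y_samples.std(ddof=1))
print("95% interval:           ", np.quantile(y_samples, [0.025, 0.975]))
```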
But for the rest of this chunk I'm going to make a distinction between your standard, run-of-the-mill Monte Carlo approach and an ensemble-based approach. I will say that this distinction is not one that I think is found throughout the literature, but it's one that I make and I think it's a useful one: I think about using what I'm going to call Monte Carlo approaches when we're trying to approximate that full distribution from samples. By contrast, we also have lots of cases where we use ensemble-based approaches, and the only real difference is that when most people talk about ensembles, they're usually talking about a much smaller number of samples. There are some trade-offs there, and the key one is going to be that with a much smaller number of samples, you can no longer rely on those samples alone to approximate the full distribution.

There's no math in the Monte Carlo approach, there's just an algorithm, and it's a pretty straightforward algorithm: you set up a loop, you draw random values of your inputs from their distributions, you run the model under those inputs, and you save your predictions. You can then summarize those predictions in terms of probability distributions: you can make histograms of them, you can calculate sample means and variances, or sample quantiles to get confidence intervals, but you're working with those samples. One thing that makes this particularly straightforward if you're Bayesian is that the input to this is random samples of your inputs, and if you've done MCMC to fit your model, that's exactly what you have; the only thing you have is random samples. So it's very straightforward to take those samples from the MCMC and run them through the model as a way of propagating uncertainty. And in fact the key part here, as I was saying before about capturing the covariances, is that when you sample from this you're essentially sampling rows. The trick is to sample row numbers from your posterior distribution, and when you take the whole row of parameters, you're capturing the covariance across the parameters.

This just shows a simple numerical simulation of doing this for a linear model, where each of these lines is me drawing a mean and a variance from their joint distribution and drawing a line under that specific choice of parameters. When the number of draws is small, I don't get a very good approximation of the uncertainty; when the number of draws gets large, you can see that there are outliers in the tail, but the bulk of this gives me the classic hourglass shape that I would get from analytical methods. Sorry, you're right, I misspoke: for each of those lines I'm drawing a slope and an intercept from their joint posterior distribution and drawing the line using that slope and intercept. Here I've essentially made 10,000 predictions with a linear model, each with different parameters. At any particular x I can go in and I essentially have a histogram at that specific value, and if I have a histogram at any particular x, I can summarize it in terms of its sample mean, its sample standard deviation, its sample quantiles; I can draw the distribution. Once I have all those samples, anything else I want to calculate can be summarized from those samples. So making a confidence interval is simply taking these predictions and applying the quantile function across a range of x's.
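Here's a minimal sketch of that posterior-sampling workflow, assuming a hypothetical matrix of MCMC samples with one row per sample and one column per parameter; the key step is sampling whole rows so the slope-intercept covariance is preserved, then taking quantiles at each x for the interval.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical MCMC output: rows are samples, columns are (intercept, slope).
# Here we fake a correlated posterior; in practice this comes from your model fit.
n_mcmc = 10_000
cov = np.array([[0.25, -0.04],
                [-0.04, 0.01]])          # note the negative intercept-slope covariance
posterior = rng.multivariate_normal(mean=[2.0, 0.5], cov=cov, size=n_mcmc)

x = np.linspace(0, 10, 50)

# Sample row numbers, then use the whole row so parameter covariances are retained
rows = rng.integers(0, n_mcmc, size=5000)
b0 = posterior[rows, 0]
b1 = posterior[rows, 1]
pred = b0[:, None] + b1[:, None] * x[None, :]    # 5000 predictions at each x

ci_lower, ci_upper = np.quantile(pred, [0.025, 0.975], axis=0)
mean_pred = pred.mean(axis=0)
print(mean_pred[:3], ci_lower[:3], ci_upper[:3])
```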
So now, the distinction I'm going to make between Monte Carlo approaches and ensemble approaches is a pragmatic one. Usually when someone's talking about an ensemble, they're talking about a much smaller number of runs. If someone's running a model 100,000, 10,000, or even a thousand times, they're not usually referring to that as an ensemble. You usually get ensembles in cases like NOAA's ensemble weather forecast, which for today has 21 ensemble members representing different perturbations of initial conditions and different model structures. In some sense the core of the algorithm is the same: you're drawing samples from the input distributions, you're running the models, you're saving the results. What I think is the key distinction between an ensemble and a Monte Carlo method is that with an ensemble, having a smaller sample size, you can't trust the histogram of your predictions to really have characterized the uncertainty. If I have 10,000 draws from a distribution and I draw a histogram, I'm fairly confident in saying that histogram is a pretty good approximation of the distribution itself. If I draw 10 numbers from a normal and I make a histogram, are you going to tell me that's a good approximation of what the whole distribution looks like? No, it's going to be lumpy, with maybe one draw over here and one over there. Is that sample a good approximation of the whole distribution of values? No. So you can't use that sample to represent the distribution itself, but what you can do is make distributional assumptions.

So here's where you're going to see a trade-off between making an assumption and gaining something from that assumption. If I have these samples and I assume they are normally distributed, I can calculate the sample mean and the sample variance, assume these came from a normal distribution, and then go out 1.96 standard deviations and get a confidence interval from the distribution. My interval estimates are not coming from the samples, they're coming from the distribution. It doesn't have to be a normal distribution, but the thing you're essentially doing is gaining the ability to estimate that shape by assuming a shape. That assumption might not be perfect, but that's the trade-off. Basically it hinges on the idea that you need far fewer samples to estimate the moments of a distribution than to estimate the full distribution. If I want to estimate a full distribution, the rule of thumb I usually use when I teach my grad course is that I want to see an effective sample size from an MCMC of around 5,000. By contrast, you don't need 5,000 data points to estimate a sample mean confidently; that's stats 101. How many sample points do you need to get a stable estimate of a mean? Tens, hundreds if you're feeling ambitious. People go out and do field experiments with tens of samples all the time and get stable estimates of means. Likewise, if I want a stable estimate of an ensemble mean, I can do it with tens of ensemble members, I can get a stable estimate of the standard deviation with tens of ensemble members, and I can get a confidence interval once I've made an assumption about the distribution.

The choice between these two, I hope, is fairly straightforward. If you have a computationally efficient model, if you can afford to run your model 1,000 to 10,000 times, go ahead and do the full Monte Carlo: you don't have to make any assumptions, you get the distribution, and the distributions are going to be whatever they're going to be, they don't have to conform to named distributions. If you have a computationally challenging model, you can get by with a smaller ensemble if you're willing to make a distributional assumption.
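And here's a minimal sketch of that ensemble trade-off, with made-up numbers: with only 20 ensemble members the raw quantiles are noisy, but if you're willing to assume normality you can build the interval from the ensemble mean and standard deviation instead.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

def expensive_model(theta):
    """Stand-in for a model too costly to run thousands of times."""
    return 100.0 * np.exp(-0.3 * theta) + theta**2

# Small ensemble: 20 draws from the input distribution
ensemble_inputs = rng.normal(loc=5.0, scale=1.0, size=20)
ensemble_outputs = np.array([expensive_model(t) for t in ensemble_inputs])

mu = ensemble_outputs.mean()
sd = ensemble_outputs.std(ddof=1)

# Interval from a distributional (Gaussian) assumption vs. raw sample quantiles
gaussian_ci = stats.norm.interval(0.95, loc=mu, scale=sd)
raw_ci = np.quantile(ensemble_outputs, [0.025, 0.975])

print("ensemble mean, sd:", mu, sd)
print("95% CI assuming normality:", gaussian_ci)
print("95% CI from raw quantiles (noisy with n=20):", raw_ci)
```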
And if you have a really computationally challenging model that you cannot afford to run multiple times, you have to revert to analytical methods and doing math. There you can get by with making a prediction by running the model once, if you've solved all those derivatives, because you can take that one model run and figure out how to propagate the uncertainties around it analytically.

So this is just a graphic highlighting some of the differences between these high-level classes. Here is an example of an input probability distribution, its mean, its covariance; these dots are all samples from the input distribution, transformed through a nonlinear function. This crazy manifold shape here is the true shape given all those samples, and here's the mean and covariance of those samples, which we take as a summary of that output distribution, realizing that it is just a summary; even if this looked Gaussian, the output distribution definitely is not. And that points to an important issue: even if your input distributions are Gaussian, once you've applied a nonlinear function to them, the output distributions are not going to be Gaussian. So that assumption in the ensemble approach is always worth thinking about: if you're putting a Gaussian assumption on your output, you know that by definition a lot of these things can't be Gaussian. This is the Taylor series approach: I have the mean and covariance of my inputs, I have an ensemble size of one, I only have to run the model once, and I analytically transform that. In this case, because the model was so crazily nonlinear, the approximation put me in the right part of parameter space but it wasn't perfect; it didn't really capture the covariance structure, and there was this Jensen's inequality bias between the mean of the function and the function of the mean.

Another point worth thinking about, and an area where I think there could be some interesting research, is that when we do ensemble methods the way I spelled out, it says take a random sample from your inputs and transform them. Well, if I'm working with a fairly small sample, you might say, maybe I could do better if I sampled non-randomly, if I actually chose the points systematically. An example of that is the unscented transform, which, instead of sampling from the input space randomly, has an algorithm that chooses these sigma points and then analytically back-transforms from those sigma points to the mean and covariance structure in the transformed space. By using a clever bit of math on how to relate specific design points in one space to design points in the transformation, you can see that it actually did a pretty good job of approximation. Interestingly, the name of the unscented transform: it didn't stink. It didn't stink; this is on video, are you sure you want me to tell this? Hey, I didn't invent it; the graduate student who invented it a few decades ago named it for the fact that the transformation didn't stink, so it was the unscented transform.
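For reference, here's a minimal sketch of a simple (Julier-style) unscented transform; the model f, the input mean and covariance, and the kappa tuning parameter are all made up for illustration.

```python
import numpy as np

def unscented_transform(mean, cov, f, kappa=1.0):
    """Simple (Julier-style) unscented transform of a mean/covariance through f."""
    n = len(mean)
    # Sigma points: the mean plus/minus scaled columns of a matrix square root of cov
    L = np.linalg.cholesky((n + kappa) * cov)
    sigma_pts = np.vstack([mean, mean + L.T, mean - L.T])       # (2n+1, n)
    weights = np.full(2 * n + 1, 1.0 / (2 * (n + kappa)))
    weights[0] = kappa / (n + kappa)

    # Transform each sigma point, then recombine with the weights
    y = np.array([f(p) for p in sigma_pts])
    y_mean = weights @ y
    diff = y - y_mean
    y_cov = (weights[:, None] * diff).T @ diff
    return y_mean, y_cov

# Hypothetical nonlinear transform of a 2-D input
def f(p):
    x1, x2 = p
    return np.array([x1 * np.exp(0.2 * x2), x1**2 + x2])

mean = np.array([1.0, 2.0])
cov = np.array([[0.1, 0.02],
                [0.02, 0.2]])
print(unscented_transform(mean, cov, f))
```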
Okay, so that kind of wraps up the highlights of the different trade-offs for uncertainty propagation. Now I want to quickly go through how we can analyze what comes out of that. First I'm going to come back to where I started on day one, saying that we have a general approach to thinking about how we can take the uncertainty in our predictions and partition it out into different sources. Hopefully you can now see where this actually came from: it came from that Taylor series, with the transformation applied to the general form of a dynamic ecological model. By spelling it out this way, rather than writing the sums, it also makes it clear that if I want to understand the overall variance, I can actually calculate these terms, and if I calculate each of these terms individually, I can say what percentage of the overall variance is due to this term divided by the sum. So I can actually partition out that variability, and it tells me something about the relative importance of these different terms. I can also dive into each of them: if I have a whole vector of parameters, I can decompose that into the contribution of each parameter in the model to the uncertainty. And again, I mentioned this on Monday when thinking about covariance and scaling, but this covariance term came from the same place; it's literally what I wrote, they're the same thing.

Okay, so a couple of things I want to highlight about uncertainty analysis. The idea of an uncertainty analysis is to estimate the contribution of each input to the overall output uncertainty. Unlike a sensitivity analysis, where a change in x gives you a change in y, we're now saying the uncertainty in x contributes a partial uncertainty in y, and in some sense we're summing up all those partial uncertainties into the overall uncertainty. And remember that this was related to two things: the uncertainty of the input and the sensitivity. That highlights the fact that if I want to understand what is important to the uncertainty of my predictions, there are two ways things can be important: things can be important because they're highly sensitive, or because they're highly uncertain. Things that are highly uncertain and highly sensitive make large contributions to the predictive uncertainty. Things that are insensitive and well constrained don't make much contribution to the predictive uncertainty; those are the things we can ignore, and thankfully there are usually a lot of them. And then there are things that can be important either by being sensitive or by being uncertain. So one of the take-homes is that sensitivity analysis by itself doesn't tell you what is important in your model, because you could have something that's highly sensitive, but the thing itself is very well constrained, so it doesn't actually contribute to your uncertainties; likewise, you could have something that's fairly insensitive but really poorly constrained, and it does matter. It's the combination of the two that tells you what's important.
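Here's a minimal sketch of that kind of partitioning, with hypothetical parameter uncertainties and a stand-in model: each parameter's contribution is approximated as its sensitivity squared times its variance (first-order, one-at-a-time terms, so they won't sum exactly to the full Monte Carlo variance when the model is nonlinear or the parameters covary).

```python
import numpy as np

# Hypothetical parameter means and standard deviations (e.g., from a posterior)
param_mean = {"SLA": 15.0, "leaf_turn": 1.0, "stomatal_slope": 9.0}
param_sd   = {"SLA": 3.0,  "leaf_turn": 0.5, "stomatal_slope": 0.4}

def model(p):
    """Stand-in process model returning a single prediction (e.g., NPP)."""
    return 2.0 * np.log(p["SLA"]) / (1.0 + p["leaf_turn"]) + 0.3 * p["stomatal_slope"]

# Finite-difference sensitivity of the output to each parameter at the mean
def sensitivity(name, eps=1e-4):
    lo, hi = dict(param_mean), dict(param_mean)
    lo[name] -= eps
    hi[name] += eps
    return (model(hi) - model(lo)) / (2 * eps)

contrib = {name: sensitivity(name) ** 2 * param_sd[name] ** 2 for name in param_mean}
total = sum(contrib.values())

for name, c in sorted(contrib.items(), key=lambda kv: -kv[1]):
    print(f"{name:15s} variance contribution: {c:.4f} ({100 * c / total:.1f}%)")
```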
We've been doing these sorts of uncertainty analyses with process-based carbon cycle models for quite a while now. This comes from one of the first times we did it, which was some biofuel work we did at the University of Illinois. What we're seeing here is a whole bunch of parameters in a model, and here are their partial contributions to the overall predictive variance; black is the posterior, gray was the prior, and on some of these we got some good constraint. I rank-ordered them by their contribution, so you can see the things that matter to the model and the things that don't. Then here, for CV, all I've done is take the uncertainty of the parameters and normalize it as a coefficient of variation, so that they're all in the same units. And here, elasticity: likewise, I've taken the sensitivities and normalized them into the same units, so an elasticity of 1 means a proportional change in x produces the same proportional change in y, but it has a sign because the slopes can be negative. So I can now go back and say, well, this thing's really important because it's sensitive and not well constrained, but I can jump down and see other examples: the turnover rate is important because it's poorly constrained, even though it's not particularly sensitive, whereas specific leaf area is well constrained but has a particularly high sensitivity, and so they contribute almost equally.

Another thing that I think is worth noting: if you have a model that has 100 factors going into it, the average factor contributes 1% to your predictive uncertainty, which means the average thing doesn't matter. And most uncertainty analyses kind of look like this; they tend to look a bit exponential. This is why I'll make the assertion that, most of the time, it's a small number of factors that are really driving your predictive uncertainty, and there's a long tail of things that average out in the wash, which is useful.

This figure also shows a great example of using this information to actually constrain models. The grays were the priors, the blacks were the posteriors. All of these parameters up here were ones that were just calibrated using literature data; the ones that didn't change are ones where there just isn't literature data. This one is a different story: this is the stomatal slope, which in the prior was the fifth or sixth most important parameter. When I did this at the University of Illinois, Andy Leakey, the guy who had an office next to me, is really, really obsessive about plant stomata. For all the animal people: stomata are those little holes on the leaves where the CO2 comes in and the water goes out, and the stomatal slope is a parameter in the model that essentially controls stomatal closure, mostly as a function of humidity; so this is the sensitivity of stomata to environmental conditions. Andy is great at measuring that. He sent a graduate student into the field for two days with a LI-COR to measure stomatal sensitivity and essentially eliminated that as a parameter worth worrying about; he knocked it back, he nixed the uncertainty. So this was an example where we identified an uncertainty and, with two days of field work, done.

I have another example, that I don't show here, of a master's student from our work in Alaska, where I had him spend the first year of his master's going through the literature to understand the system. He was taking an existing vegetation model we were working with and moving it up to apply it in the Alaskan tundra, a system that model had never been applied in before. So he went through iterative rounds of reading about things, but also pulling quantitative information from the literature synthesis he was doing and using it to constrain individual parameters, which is the sort of thing you can do in a process-based model: you can actually say this observation from the field maps to this specific parameter. And he did this as a feedback loop, because he would do these sorts of analyses and say, oh, I need to read more papers about roots, because roots matter. And then around April I stopped him and said, okay, design your summer field campaign around your uncertainty analysis.
Then he could go out in the field and measure the things that the model was telling him were driving his forecast uncertainty. His master's thesis has a series of these figures that show the uncertainties marching down as he does literature synthesis, field study, and Bayesian inverse calibration, and at the end we end up with a well-calibrated model. I really like this idea of a feedback loop.

So that leads me to a few final thoughts about using these uncertainty analyses to constrain what we do in the field, this idea of model-data feedback. I want to talk a little bit about power analysis and then observational design. Power analysis is the simple version of this: questions like what sample size is needed to detect a certain effect size, or what's the minimum detectable effect size. Observational design is the more complicated version of that. Just as a quick reminder, in classical hypothesis testing you have your null model, some threshold, and the tail probabilities that give you your p-value; but then, if you have an alternative hypothesis, you have some rate of false positives and false negatives, and one minus beta, all of that integrated area, is your power: how well you distinguish the alternative hypothesis. So power is a function of your effect size and the uncertainties. You can do this sort of power analysis for traditional statistical models fairly easily. This is the one I like to show in my undergrad class, looking at the power of regression expressed in terms of R squared: essentially, if you want to be able to detect an R squared of 10% or bigger with 90% probability or greater, you need a sample size of approximately 100, which is a good rule of thumb. I use that rule of thumb a lot when reading papers; we often under-sample a lot, and when you get non-significant effects I encourage folks to think about whether it's just because you didn't have the power to detect them.

More generally, we know from classic sampling theory that as we increase the number of samples, the uncertainty in our parameter estimates typically goes down as one over the square root of n. It doesn't perfectly follow that same rule with some nonlinear models and Bayesian calibration, but it's a good first approximation: the uncertainties are going to go down with sampling. But that points to another useful thing to think about when trying to use model-data feedbacks, which is that if I am in this part of the curve, a few samples can reduce my uncertainty a lot, while if I'm in that part of the curve, I can do a bunch of samples and get very little return on investment. So, jumping back, if I had these two cases, a parameter that's very sensitive and already fairly well constrained versus a parameter that's not particularly sensitive but is under-constrained, and they both contribute about the same to the predictive uncertainty, which should I go out and measure? Leaf turnover. I'm going to get the most bang for the buck measuring the leaf turnover rate, because I have the most potential to reduce its uncertainty with further measurement; by contrast, I can't really change the inherent sensitivity of the model to the other process, and its parameter is already well constrained, so there's not much uncertainty left there to buy down. I'm not necessarily going to measure these things in the rank order they're given here; I'm going to tackle the things that are going to give me the greatest return on investment.
Approaches to estimating power with these nonlinear processes and Bayesian models are most often done with some sort of pseudo-data simulation: simulate data of some specific sample size, fit the model, and save the parameter or parameters of interest. These are essentially extensions of bootstrapping methods, where we're either resampling the data or assuming the model parameters and simulating data. The idea is that you can generate random data of a specific size. In a bootstrap analysis you always generate a sample size that's the same as your original, because you want to understand the uncertainty about the original; but for things like experimental design, you may be simulating, what if I doubled, tripled, or quadrupled my current sample size, and asking how the uncertainty in the parameter is going to go down as I increase it. So here you might end up embedding this in a loop over multiple sample sizes.
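Here's a minimal sketch of that pseudo-data simulation idea, using a simple regression as a stand-in model: simulate data at a range of candidate sample sizes, refit each time, and track how the uncertainty in the parameter of interest declines (roughly as one over the square root of n).

```python
import numpy as np

rng = np.random.default_rng(11)

true_slope, true_intercept, sigma = 0.8, 2.0, 1.5

def simulate_and_fit(n, n_reps=500):
    """Simulate pseudo-data of size n, refit, and return the SD of the slope estimate."""
    slopes = np.empty(n_reps)
    for r in range(n_reps):
        x = rng.uniform(0, 10, size=n)
        y = true_intercept + true_slope * x + rng.normal(0, sigma, size=n)
        X = np.column_stack([np.ones(n), x])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        slopes[r] = beta[1]
    return slopes.std(ddof=1)

# Loop over candidate sample sizes (e.g., current n, doubled, quadrupled, ...)
for n in [10, 20, 40, 80, 160]:
    print(f"n = {n:4d}   slope SE ~ {simulate_and_fit(n):.3f}")
```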
So here's a simple conceptual example of putting these things together. I might have an understanding of how the error in a specific parameter changes as a function of sample size, either analytically or through some sort of bootstrap approach. Then, using everything we talked about earlier, we can think about how the forecast or model error changes as a function of sample size: I'm transforming the uncertainty in the parameter into the uncertainty in the predictions. One of the reasons that's handy is that when I transform into the uncertainty in the predictions, I can now make apples-to-apples comparisons across different sources of uncertainty, because they're all expressed in the same units, the predictive uncertainty. I can't really compare the uncertainty reduction in the slope versus the uncertainty reduction in the intercept; I need to be able to put them on a common scale. So here I'm imagining a case where variable 1 might actually contribute more to the uncertainty, but its uncertainty declines more slowly than variable 2, which starts out contributing a bit less but gives me a better return on investment. I can then use basic economic theory to say, at a given sample size, what should I measure? It turns out I'm going to measure this one for a while, because it gives me the best return on investment, but at some point it starts to level out and I should switch to sampling the other one. You can take that a step further and say, well, not all types of measurements cost me the same in terms of person-hours, dollars, whatever. So if you could actually put a cost on these, you might then say maybe variable 2 gives me a better return on investment per measurement, but it's more expensive, and so I might end up with a sampling strategy where I make a whole lot of measurements on this one, because even though I get less return on investment per measurement, I get a better return on investment because it's cheap. And you can extend this further to include not just the marginal costs of each sample but also the fixed costs involved in field research, and in lab research as well; your fixed costs are the instruments themselves, and the time involved in setting up new sites is a fixed cost. Typically for me, the cost of getting the permitting on a new site is the same whether I'm putting in one plot or fifty plots; it doesn't really scale. Which leads to an interesting conclusion: the results of that uncertainty analysis, of what I should measure first, could be very different for different labs depending on which toys they already own. So if I already own an eddy covariance tower, which costs $100,000 to put up but gives me measurements 20 times a second that are virtually free, I might make a lot of eddy covariance measurements. If I don't own one already, it sure as heck isn't going to be the first thing I buy; I'm going to go for specific leaf area, because all I need is a hole punch and a drying oven.

Then the thing I wanted to end on is this idea of what are called observing system simulation experiments, which take what we've been talking about to the next level: thinking about designing not just a measurement, but often a campaign or a network. These sorts of experiments are often done by folks like NASA and NOAA and other agencies. Say you're going to deploy a new satellite and you want to optimize its orbit, or optimize which wavelengths it's measuring. What you can do is simulate your true system: you use your model to predict what you think is happening in the world, you simulate pseudo-observations from that system, adding your observation error in, and then you try to assimilate those pseudo-observations back into your model to assess their impact. And you can do this with whole networks: I might ask, what if I deployed buoys in this configuration, what if I set up towers in that other configuration, what if I added this instrument? You see these on larger-scale research networks; in other parts of the physical environmental sciences we see this being done fairly often, but we don't see it a lot in the ecological realm. I think NEON wanted to do this, but they only had the capacity to do it years after they had already locked in their design, because it took them years to get to the point where they could. So, I ran a bit longer than I intended. Any questions?