Well, great. Good morning, everyone. I'm glad you could join the webinar. My name is Jeff Garrison, I'm a professor of geological sciences at Stanford, and my research interest is, in general, earth resources, all the way from exploration to appraisal and then exploitation. So the interest for me is in water, minerals, storage, oil and gas, and things like that. Recently I published a book about this, on quantifying uncertainty in earth and subsurface systems. My work focuses largely on the subsurface, but a lot of the material you will hear applies to basically all kinds of earth systems.

So the challenge we're all facing is water, energy, materials: where will it all come from? In the appraisal and exploitation of such resources, there's always a question of risk versus return, which here is the resource versus the environment. The exploitation of groundwater is a good example, where extraction may damage the environment due to contaminants entering the groundwater system.

That's also why I want to start with a leading example. I've been very much involved with the management of the groundwater system in Denmark. There, the idea is for the Danish government to protect the Danish groundwater system and essentially manage the aquifer extraction. Denmark has a very complicated subsurface, a glaciated valley system. All these black lines that you see on the left are essentially buried valleys in which the groundwater resides. To manage that, the Danish government has mandated very large exploration campaigns by means of what are called SkyTEM and tTEM geophysical data, and these are examples of that. SkyTEM is a helicopter-borne system with a transmitter loop that measures the subsurface electrical resistivity. To then get more detailed information at the agricultural scale, they drag a similar system on the ground. Here you actually see my student, Lee-Jing Rang, doing that last summer in collaboration with the University of Aarhus.

The major challenge, of course, is how do you apply that at such a large scale? How do you manage these groundwater systems? Are we going to use science, in particular all the fields that are involved, from hydrogeology to geophysics to geostatistics and decision science, and do that in a scientific fashion? What's needed here is decision science, which is a way of turning opaque decision questions into actual quantitative science. What's also needed is a way of reasoning about uncertainty, because the decision is going to be made not based on a deterministic outcome but based on uncertainty. So how do we reason about that? And, based on that reasoning, what are the methods for solving such problems of decision making under uncertainty? And of course there's the whole data part: so much data is acquired, whether by stream flow gauges, by geophysical measurements, or in drill holes. So what I want to present to you is a scientific view of this problem, something that has some logic to it. What we want to avoid as much as possible is doing ad hoc things like tuning or fudging the models to make them match observations. We want to avoid those kinds of things.
My experience is that the tuning works at first, but down the road it often bites you, in the sense that your models are no longer predicting your observations, your future observations. Over the last 25 years I've developed expertise in that area of earth resources and uncertainty quantification, and I've come up with a protocol which I call Bayesian Evidential Learning. We'll talk about this protocol during this presentation.

So what is Bayesian Evidential Learning? As you notice, there are three words in there: Bayesian, Evidential, and Learning. The Bayesian part is based on Bayesianism. I'm sure many of you heard about Bayes' rule in school or in courses, but Bayesianism is slightly broader. Bayes' rule was invented about 250 years ago; Bayesianism is a way of reasoning about data and uncertainty that is generally used in science in that sense, and it's only really been around for about 50 years. What is really key about Bayesianism is the fundamental notion of prior knowledge: we already know something about what is going on prior to even acquiring data. That initial knowledge, that initial uncertainty, is going to be fundamental in what we're doing, and it's going to be fundamental in coming up with a systematic way of quantifying uncertainty. Obviously, we're going to require evidence, which is data provided by field observation, but also information provided through modeling. The learning part is the statistical part, and I will not talk too much about the statistical details; I'll just point out some code at the end of the presentation. In general, we will be doing Monte Carlo, which is the only way to get to uncertainty quantification in the most general sense, and we're going to learn from that Monte Carlo. So that's the idea. As I mentioned, this is not really new in the sense that it has been developed over many, many cases in all these kinds of resource fields.

So let's go through this protocol. Bayesian Evidential Learning is not a methodology in that sense; it doesn't tell you exactly what to do. It just tells you the framework within which you need to do things, and within that framework you can make decisions about which methods to use, for example for sensitivity analysis or inverse modeling. What's really key is that it's a goal-oriented framework: you need to start by formulating the decision question and a statement of what you actually want to predict in the future, because that's what your decisions are going to be based on. The next step, the Bayesian step, is that we look at this question and start to think about what kind of models we need to build. What is our model complexity, and what are our initial, prior uncertainties? Here we don't need to be very accurate. The whole point is not to be accurate; the whole point is to be broad and have a large range of uncertainty already present. What comes later is that we'll reduce these uncertainties with data. The next step is probably one of the most important steps in the earth sciences: what we often find is that the models we build often cannot predict future data. We want to avoid that because, first of all, it's not effective.
And secondly, it's not efficient, because you would have to redo everything. So we want to go through a process called falsification. Falsification is a concept in science that says you can only prove things wrong; you cannot prove things right. I'd like to make a distinction here between falsification and other nomenclature that's often used, namely validation and verification. Falsification is basically not trying to make anything valid; it is trying to make things invalid. I think that makes a clear distinction with what is commonly done. Once our model has gone through falsification, we're more confident that we can go on and make meaningful uncertainty quantifications.

The next step is really a learning step. Once we've done Monte Carlo, there's a lot we can do with it. First of all, there's global sensitivity analysis, and global sensitivity analysis is very different from local or base-case sensitivity analysis. We'd like to understand what is important for making predictions, because we're going to base our decisions on predictions of the future. So, what in the model really impacts the predictions? The second part is what impacts the data, because if there's some overlap between the data sensitivities and the prediction sensitivities, then logically our data has value and we can start using it. Then comes the uncertainty reduction step, which is probably something you're also familiar with as inverse modeling. Notice that inverse modeling only comes in fifth place; we don't start with inverse modeling. If you start with inverse modeling, you often have the issue that either the inverse problem is too difficult to solve, or, when you do solve it, you really don't know whether you have the right solution. Here we arrive at a stage where we can be comfortable applying inverse modeling. And then finally, obviously, we make our decision.

So coming back to the Danish case, let's see how this can work. What we want to do first, of course, is formulate the decision question. Here we go to a smaller area in Denmark, near Aarhus, and in that small area we'd like to decide where to drill the next drinking-water well. Let's say four alternatives have been given to us: location A, B, C, or D. So that's basically the question: which of these four alternatives would you drill, and how would you now solve this problem? Here we have a number of observations, which is typical in groundwater: pumping wells or observation wells where we measure the groundwater level. At the bottom you see the SkyTEM resistivity data, which indicates, as you notice, this very heterogeneous subsurface system of buried valleys. You see significant variation in resistivity in the subsurface, which indicates significant changes in water content and grain size. So this is really what we're talking about: what is the problem you want to address, and how would you like to solve it? In decision science, we first formulate objectives. At the beginning of the presentation, I talked about risk versus return, and typically you would have risk objectives and return objectives. There's a risk, of course, when we drill and pump, that we contaminate the groundwater system because of inflow of pollutants from agriculture or industry in Denmark. The second risk is that by pumping you risk draining wetlands and also draining streams. So that's another risk.
The return, of course, is that we do get the groundwater extraction, and we'd like to maximize that to obtain groundwater for drinking. Each of these objectives will have associated with it some variable, the prediction variable, that we'd like to predict. The exact nature of this variable and how it's computed is not really important here, but you can imagine, for example, for industrial pollution it would be the amount of contamination that results from the overlap of the well's catchment zone with industrial land cover. In the end, what we would like to do is evaluate these objectives for each alternative. Some alternatives may score well on certain objectives and poorly on others; it really depends, and that's what you want to estimate. Once we have that, we can start summing it all up and come up with our best location to drill. But we don't know these values; that will be part of the modeling. So we need to predict these values at each location, and all these values are subject to uncertainty.

Okay, so now we move to the second point. We've got an understanding of what we want to do, and we now want to conceptualize our earth in all of its aspects and look at what the community already knows about it. It's not that we start from zero here. Obviously, people have studied glaciated valleys before, people have acquired geophysical data in glaciated valleys before, people have ideas about resistivity and how it varies, and people have ideas about how much rainfall there is in Denmark. So there's a lot we already know, and that knowledge, together with what we want to predict, determines the conceptualization in terms of model complexity and uncertainty. So what I mean here is that we want to define the model complexity; we'll call it M, a big vector of all of the model variables and components that we have, and this is huge. And we want to define all of the uncertainties a priori, so with no observations and no tuning, just broad statements about things. This needs to be done for all of the fields involved, so you can imagine that in a real-scale application you have to do this for many, many fields of science.

Just to give you an idea of how this can proceed, I'm going to talk a little bit about another type of case study; during the webinar, I'll intermingle a couple of other case studies just so you get an understanding of the generality of what we're talking about. Here we are doing geothermal exploration in the Basin and Range province in the United States, in a valley called Dixie Valley, which, as you notice, is in Nevada. In these valleys, what we often have is that over time meteoric water has infiltrated the subsurface, creating a cycle of heating and cooling that has led to the creation of hot springs, which are potential indications of sources of geothermal energy. The problem, however, is that these types of observations cannot be made directly, simply because these valleys are largely covered by sediment. So we need to do exploration drilling, and exploration drilling is very expensive, so we want to limit it.
The other problem, of course, is that exploration drilling is only shallow, probably only one or two kilometers at best, or potentially even less, while the actual geothermal production wells are much deeper. So in this case, what is the prior information? Is it just whatever we learn when we drill? Well, the answer is no, because these kinds of valleys have already been well studied. We know they are extensional systems, which means they set up normal faulting. We know that the basin has been filled in, and we have extensive information from elsewhere in the United States about the fill of such basins. We also have a lot of information about the permeability of the basement rock, which is of course very low; still, people around the world have drilled into very deep geothermal systems, so there's a lot of information about that. Then we have geophysical information that may inform us about density variations and what can be expected.

All of this was compiled into this table. This table is now our statement of prior model uncertainty. It's a big table, and I'm not going to go into details, but typically, when you build subsurface models, there are structural components, rock components, and flow or fluid components, structural meaning faults and layers. Here you see that there are several uncertainties, with distributions established from the kind of data that we have. Of course, getting to this table is tedious and laborious work, but it's very important work. What's also important, however, is not to worry too much about these distributions. We're not trying to estimate in detail what the basin depth is; if you look at the first row, we're simply saying it's between three and four kilometers. We're not going to say it's 3.258 kilometers plus or minus some standard deviation; we're looking at really broad ranges. You shouldn't worry about that, because later on these ranges will be significantly reduced once we go into inverse modeling.

What we then do is define what we call the data variable, which is not to be confused with the observations. The data variable is obtained by taking a model and applying the forward model to it, which in this case means modeling the temperature in the system. To model the temperature, we use some physical model, in this case a coupled heat-flow and permeability model. We take the model at the bottom there, run it through a simulator, get the temperature distribution in the basin, and we can then drill and acquire the temperature in those wells. The picture on the right-hand side is obviously a very favorable picture for geothermal exploration, because you notice this upwelling, let me see if I can get a pointer going here, see the upwelling here. That's very favorable. Of course, that is only true if this particular model were to be the real one; we don't know the model, the only thing we know is a few observations. So then we are ready to do Monte Carlo: we take that big table and draw one sample from each variable, which gives one model, and we repeat that. We then have a number of realizations, shown here, and you see that you get a large variety of realizations.
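As a minimal sketch, assuming Python with NumPy, of what drawing from such a prior table can look like; the variable names and ranges below are hypothetical placeholders, not the actual table:

```python
import numpy as np

rng = np.random.default_rng(seed=42)
n_real = 1000  # number of Monte Carlo realizations

# Hypothetical prior table: each model variable gets a broad, a-priori distribution.
# A real table is much larger and is assembled with domain experts.
prior = {
    "basin_depth_km":       lambda n: rng.uniform(3.0, 4.0, n),
    "log10_basement_perm":  lambda n: rng.uniform(-18.0, -15.0, n),
    "fault_dip_deg":        lambda n: rng.uniform(50.0, 70.0, n),
    "recharge_mm_per_year": lambda n: rng.uniform(500.0, 900.0, n),
}

# One row per realization of the model vector m, one column per variable.
M = np.column_stack([draw(n_real) for draw in prior.values()])
print(M.shape)  # (1000, 4): each row is one sample of the prior model
```

Each row would then be pushed through the forward simulator to produce the corresponding data variable and prediction variable.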
So you can start to appreciate that inverting a model like that against observations is going to become rather complex. In the Danish case, summarized in a similar way here, we have lithological uncertainty: we don't know exactly where the gravel is or where the buried valleys are; we have some information from geophysics, but not perfect information. We don't know the permeabilities, neither within the area we're modeling nor on its boundary. We have river flows and conductances, we have drains built into the system, and we have aquifer recharge, which itself can be a complex variable in the sense that you have to model rainfall over time; in Denmark it rains a lot, so that's not so hard, and the groundwater system is pretty much at the surface. But again, you see that we need a lot of expertise in doing this, so you would typically have to collaborate with a number of people from different fields of expertise.

This is a summary of what Monte Carlo then entails: making that table, which is the box here in the middle; creating multiple realizations of the model; forward simulating the model to get the data variables, which are these realizations here; and also simulating the future, which we'll call H, which creates this here. And then we also have the actual observations. Good.

One little technical note here: how do we visualize Monte Carlo runs? We're creating a lot of information, a lot of complex models, data, and predictions, so how do we visualize that? Some of you have probably heard about dimension reduction. The basic idea of dimension reduction is that we can take any object, a vector, which could be a model, a data variable, or a prediction variable, and write it as a linear combination of basis functions: fixed shapes multiplied by scalars. One way of doing that is principal component analysis, and in principal component analysis these basis functions are called eigenvectors. We can then write, say, a model, or in this particular case an image, as a linear combination of these basis vectors. These basis vectors, as you notice, increase in frequency; it's basically like a Fourier decomposition. What's nice about this is that instead of focusing on x, we can now focus on the alphas. The alphas are essentially the contributions of each eigenvector, and they have certain variances that are interesting: alpha one has the largest variance, alpha two the second largest, and so on. What we can then do is plot the alphas against each other. For example, here I plot alpha one versus alpha two for, let's say, 1000 of these x's, so up to 1000 models. This two-dimensional plot is now a simple way of visualizing the Monte Carlo. This is very useful later on; we'll see that we can start color coding these dots with certain properties, which allows us to get a better visual understanding of what we have achieved in the Monte Carlo.
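A minimal sketch of that dimension-reduction step, assuming Python with scikit-learn; the array X below is a placeholder for the actual stacked realizations:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# X holds the Monte Carlo realizations, one flattened vector per row,
# e.g. 1000 realizations of a 100 x 100 gridded property (10,000 pixels each).
X = np.random.rand(1000, 10_000)  # placeholder for real model/data/prediction vectors

pca = PCA(n_components=100)   # keep the first 100 eigenvectors (basis vectors)
alpha = pca.fit_transform(X)  # alpha scores: 1000 x 100 instead of 1000 x 10,000

# Two-dimensional view of the Monte Carlo: one dot per realization.
plt.scatter(alpha[:, 0], alpha[:, 1], s=8)
plt.xlabel("alpha 1")
plt.ylabel("alpha 2")
plt.show()

# A reconstruction from the 100 scores stays close to the original realization.
X_approx = pca.inverse_transform(alpha)
```

The same projection can later be applied to an observed data vector, which is how the check of whether the observation falls inside or outside the cloud of prior realizations is made.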
Here's an example of how Monte Carlo can be used, in a case that I've discussed in the book about reactive transport models. Here we're in Colorado, where we have the Colorado River. One of the problems in Colorado is uranium contamination, which is due to the dumping of waste nuclear material from mining right next to the river and the floodplains. People of course want to clean that up, so what they do is inject acetate, a chemical, into the subsurface to precipitate uranium. Of course that precipitation process is uncertain; we don't know whether it works. So we run models to evaluate how effective it is. On the right-hand side you see reactive transport models, which are CrunchFlow models. What you notice here is that we have injected a fluid; these are the injection sites you see here. By injecting a fluid, you start to precipitate the uranium as the fluid moves through from left to right. The concentration shown here is the concentration of immobilized uranium. You notice that this one is really nice; it's a good system, it seems to have worked, but this one doesn't seem to have worked. So we want to understand what factors are involved. Is it permeability? Is it the reaction rates? Is it the microbial activity? There are many uncertainties. So we've done this Monte Carlo and created many of these maps. These are the various uncertainties that went into it: in this case we have not only geological uncertainties but also chemical uncertainties, which have to do with the reaction rates, the concentrations, the mineral surface area, and so on.

For example, here you see one particular realization of the saturation of immobilized uranium. When we do principal component analysis of that, we can take this very complex picture and, as you notice, if we use 100 eigenvectors, so 100 alpha scores, we get pretty much something that looks exactly like our input picture. That means we have reduced a problem of tens of thousands of pixels in this image into a problem of only 100 values. So our problem is no longer a problem of 10,000 variables but a problem of 100 variables, and that, for a mathematician or statistician or data scientist, suddenly becomes much more attractive.

You can also do this with the data variables. For example, in Denmark, we have in that area 300 observation wells with head data, which is basically the water level. We also have stream flow data; these green dots you see here are essentially where stream flow is measured. What we can do is also put that into the principal component analysis, and then we get a plot like this. Each dot comes from a model that I created with my big table: remember, I started from my big table, created a model, ran my simulator, in this case MODFLOW, created a response, and then did dimension reduction. So each dot in here is essentially a response that I get from my model. What's really interesting, when you do this in the data world, is that you can also do this with the observations, because you can apply the same kind of transformation to the actual pumping or observation data. You notice here that the star is the actual observations. An important observation here is that the star is in the middle of this cloud of blue points, not outside it. This is very important, because when that star is outside the cloud of blue points, what does that really mean?
It means that the model I created, my table, my uncertainties, is never going to be able to predict the observations. That's a critical observation, and it's the observation that leads us to falsification. If that star is way outside that cloud, your model is falsified, and then we have to think about what to do next. As I said, falsification doesn't mean proving anything correct, or validated, calibrated, or verified. It's simply a process that says you have proven something incorrect and you need to revise. The revision is very important, because what we don't want to do now is take the model and just multiply some variable by five, or fix something ad hoc; we want to change things in a meaningful fashion. What do we really mean when we say the model is falsified, and why does that happen? There are really only three reasons why that can happen, as explained here. Either your model is not complex enough and you need to add complexity, because there's something lacking in your model; or your model is complex enough but the variables are not uncertain enough, so you cannot really capture the ranges that are there. So instead of trying to tune and invert and calibrate, we need to increase complexity, or increase uncertainty, or do both. That means you have to go back to your table and revise. This is not a statistical question; it's a question of understanding why this is happening. And to do that, we can start using sensitivity analysis.

So that's where we've landed now. If our model needs revision, we can do sensitivity analysis; or, if our model has been revised and has passed through the falsification, we do sensitivity analysis to understand the problem better. Here we're talking about Monte Carlo-based, or global, sensitivity analysis. That is, we're just going to use the Monte Carlo results we already created and run a sensitivity analysis on them. What we're not going to do is take a single model and make small perturbations to it, like a one-at-a-time analysis; that is called local sensitivity analysis and should not be used for uncertainty quantification. In the sensitivity analysis, what do we want to learn? As I mentioned before, we want to learn which model variables impact my prediction the most, because if a model variable, no matter how it varies, for example no matter how I vary the recharge variable, doesn't have any impact, then I don't have to worry about that variable or bother with it. That is going to allow us to simplify the whole problem; a lot of sensitivity analysis is about simplification. The second part of sensitivity analysis is about what I call value of information. If we now do a sensitivity analysis and understand which model variables impact the data response, then we can check whether there's an overlap with the model variables that impact the prediction. If there is, then logically the data is informative about the prediction, and we can actually tell what in the data is informative about the prediction. That is extremely useful information when we start doing the inversion, and it lets us do it very efficiently.

If you do this for the Danish groundwater system, you look at the data variables and run a global sensitivity analysis, and what typically comes out of these global sensitivity analyses are these types of Pareto plots, in which we rank the variables from most sensitive to least sensitive.
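The specific global sensitivity method is not the point here; purely as a simple stand-in to show what such a ranking looks like, here is a minimal sketch, assuming scikit-learn, that ranks hypothetical model variables by a regression-based importance measure computed from the Monte Carlo results:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
var_names = ["recharge", "budget", "hydraulic_conductivity",
             "geology", "river_conductance"]

# Placeholder Monte Carlo draws of the model variables, and a toy response
# standing in for a summary of the simulated data or prediction variable.
M = rng.random((1000, len(var_names)))
response = 3.0 * M[:, 1] + 2.0 * M[:, 0] + 0.2 * rng.random(1000)

# Fit a regression from model variables to the response and rank the variables,
# Pareto-plot style, from most to least sensitive.
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(M, response)
for i in np.argsort(rf.feature_importances_)[::-1]:
    print(f"{var_names[i]:24s} {rf.feature_importances_[i]:.3f}")
```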
So if you look at what impacts the data variable, in other words what the data informs, we see that at the top are the recharge and the budget; the budget means essentially the budget of water coming into that region, so it has to do with the boundary conditions of that region. What is surprisingly not important is the permeability, the hydraulic conductivity, and the geological heterogeneity. In other words, the stream flow and head data are mostly sensitive to things that are about flow, not to things that are geological. That also means I can only use the data to reduce uncertainty on those variables; I cannot use the stream flow data to reduce uncertainty on hydraulic conductivity. If I look, however, at the prediction side, the right-hand side here, we see that in terms of predicting industrial pollution, which is essentially calculating the overlap of the drainage area of a well with industrial land cover, it's more complex: there are more sensitivities, related to boundary conditions as well as to geological heterogeneity and permeability. So what can we conclude from this? One, the data is informative about the prediction, because we have overlapping sensitivities in both. But the data is not fully informative about the prediction, because there are sensitivities that the data does not inform. It means that I have to model geological heterogeneity, and potentially I could use other data to constrain it. In Denmark, for example, we would use geophysical data to do that, to get a better handle on the buried valleys. That's why in Denmark the geophysical data is extremely valuable, potentially even more valuable than the head and stream flow data.

Good, we've now arrived at the point where we want to do uncertainty reduction. We've created many models; they don't match the observations, but we've used those models to better understand what's going on. We went through falsification first, and we've calculated sensitivities. So now we're ready to do this uncertainty reduction. Often this is very challenging, because uncertainty reduction requires inverse modeling, meaning using the observations to reduce uncertainty on the variables, and there are many sources of uncertainty here: we talked about boundary conditions, SkyTEM data uncertainties, rock physics model uncertainties, reactive biogeochemical model uncertainties, and so on. So inverse modeling can often be challenging. That's why there are two ways of looking at this problem: one is called model inversion, and the second is called direct forecasting. In model inversion, we reduce uncertainty on the model variables with the data, as you notice in this slide, this part here; we do that first, and then, second, we use the models that have been matched to the observations to make predictions. In many cases in practice this doesn't work very well, simply because there are too many uncertainties. The second reason is that the forward functions, the simulators and so on, take way too long to evaluate; I've been involved with simulators that take a week for just one evaluation. So this idea of inversion is not going to work very well, and an alternative is to establish something different.
We have to recognize what we already did: we did the Monte Carlo, we created multiple realizations of the model, and from that we created multiple realizations of the data variables and the prediction variables. A more modern approach is to do machine learning between those two, learning the relationship between the data variable and the prediction variable. Once we have that regression relationship, we can use the actual observations to directly reduce uncertainty on the prediction variables. There are two important points to that. First, no models are being inverted. Second, you directly get the answer you want, on which the decisions are going to be based, which is the prediction variable.

So in terms of inversion and direct forecasting, there are these two ways. Inversion is still possible in many situations, but we have to be really careful in applying it. What we don't want to do is optimization or stochastic optimization, genetic algorithms, or any of those kinds of things. The reason is that they do not apply Bayes' rule. If you want to do proper scientific uncertainty quantification, at some point you have to use Bayes' rule; there's no way around it. Otherwise, you're not using the prior information in the proper sense. The only really rigorous way of doing Bayes' rule and inversion is called Markov chain Monte Carlo. These are beautiful ideas and techniques; however, in practice they become very difficult to apply, simply because they take an enormous number of forward evaluations. That's why I'm more a proponent of what is called approximate Bayesian computation. We're not going to do this rigorous Markov chain Monte Carlo inversion; we're going to do it in an approximate sense. We also want to focus on making the whole process faster, and one way of doing that is with statistical surrogate models, which allow much faster evaluations than running the physical model every time. In direct forecasting, we do the same: we build that relationship and then use Bayes' rule, but now between data and prediction rather than between model and data, and again we can use statistical surrogate models.

So why Bayes' rule? Well, with Bayes' rule, if you're doing inversion, you want to get the posterior uncertainty of your model given the observations, and we know this is a function of the prior and the likelihood, as you see here on that side. The problem with the full Bayesian approach is that in order to do it rigorously, you have to account for the relationship between the model and the data, and for that you have to come up with the full likelihood function, which is very difficult to obtain and also requires running a lot of physical model evaluations. So the full Bayesian approach is often not done. What is done instead is a large family of methods called likelihood-free methods, or approximate Bayesian computation. It's a very, very simple idea. You have your prior; we have done the Monte Carlo, we have tried to falsify the prior, and we have possibly simplified the prior using sensitivity analysis. So what we're going to do now is draw realizations from the prior, evaluate those models, and, if a model is close to my observations, accept it; otherwise, reject it.
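A minimal sketch of that rejection idea, assuming Python; the forward model and observed values below are toy stand-ins for the real simulator and field data:

```python
import numpy as np

rng = np.random.default_rng(1)

def forward(m):
    """Toy forward model g(m); in practice this is the simulator or a fast surrogate."""
    return np.array([m[0] + m[1], m[0] - m[1]])

d_obs = np.array([1.2, 0.3])   # made-up "observations"
tol = 0.05                     # acceptance tolerance on the data mismatch

accepted = []
for _ in range(200_000):
    m = rng.uniform(0.0, 1.0, size=2)              # draw a model from the prior
    if np.linalg.norm(forward(m) - d_obs) < tol:   # keep models close to the observations
        accepted.append(m)

posterior = np.array(accepted)                     # approximate posterior samples of m
print(posterior.shape[0], "accepted out of 200,000 prior draws")
```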
So that, as you notice, is very simple, but it may be very difficult to apply, because what you're counting on here is that, by chance, I create a model that matches the observations. That chance is obviously small, maybe one in 1,000 or one in 10,000. So we want this forward function to be very fast, and that's why we'd like to use surrogate models. There's also a proof, described in more detail in the book, of why approximate Bayesian computation is actually a formulation of Bayes' rule.

Surrogate models are essentially an attempt to come up with something that replaces the function evaluations, and they require some form of training. That means we're going to generate a bunch of models, forward simulate those models, and then calculate the mismatch, the distance to the observations. We'd like to understand the relationship between this mismatch and the model parameters, and that's where we build some kind of machine learning surrogate model. A common surrogate model is called CART, which stands for classification and regression trees. There are variations on CART that involve bootstrapping, called random forests. You've probably heard these terms if you've taken a machine learning course. Why are these such interesting methods? Because they require hardly any tuning parameters, unlike neural networks, where you have to train, validate, and cross-validate, and you have all these problems with overfitting; CART is more prone to underfitting, so it's a very interesting model in that sense, and there's more to CART than I can talk about here. The other really nice thing about CART is that it provides you with a sensitivity analysis. What you see here on the right, for the Danish case, is the sensitivity of the mismatch with the data. This is different from before: previously, we showed a sensitivity analysis of the data variable itself; here, we see a sensitivity analysis of the mismatch with the data. Again, of course, we see that the budget and the recharge are important variables. So what I have now is a very simple statistical model that I can evaluate in a matter of milliseconds: given any set of these model parameters, and there are many, it will tell me directly, without running the physical model, what mismatch I have with my observations. This means I can run this model millions and millions of times, which means I can plug it into approximate Bayesian computation and start evaluating things. Here you see that we get a good fit, which is the mismatch predicted with CART versus the mismatch that I actually have, and you would do some kind of out-of-sample validation for that.
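Here is a minimal sketch of that surrogate idea, assuming scikit-learn; the arrays below are placeholders for the actual Monte Carlo draws and simulator mismatches:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)

# M: prior Monte Carlo draws of the model variables (one row per realization).
# mismatch: data misfit computed once per realization with the full simulator.
M = rng.random((1000, 8))
mismatch = np.abs(M @ np.arange(1, 9) - 15.0) + 0.1 * rng.random(1000)  # toy misfit

# Train on part of the realizations, keep the rest for out-of-sample validation.
M_train, M_test, y_train, y_test = train_test_split(M, mismatch, random_state=0)
surrogate = RandomForestRegressor(n_estimators=300, random_state=0).fit(M_train, y_train)
print("R^2 on held-out realizations:", round(surrogate.score(M_test, y_test), 3))

# The fitted surrogate predicts the mismatch in milliseconds, so it can stand in
# for the physical model inside the approximate Bayesian computation loop.
candidate_models = rng.random((5, 8))
print(surrogate.predict(candidate_models))
```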
Once you have that surrogate, you run approximate Bayesian computation, and what approximate Bayesian computation outputs is the posterior uncertainty of all the model variables. What we expect is that the model variables that were sensitive to the data, which we already know from our global sensitivity analysis, will be reduced in uncertainty. That's what we see here: the red curve is the posterior and the black curve is the prior. Variables that were not informed by the data will not have a reduction in uncertainty. This is critical; this is the application of Bayes' rule. If you use other methods, such as optimization, genetic algorithms, and all the things that are out there, you will get an uncertainty reduction in variables that are not informed by the data, because in many cases that violates Bayes' rule.

Once we have that, we evaluate the future; we again use CART for that regression, and we get an updated uncertainty on all of the prediction variables. For example, on this side here, you see that I get an updated uncertainty on what the industrial pollution would be if I drill at location C, or what the drainage of the stream flow would be if I drill at location A, and you see that you get a significant update in the uncertainty of that prediction variable. Once we have that uncertainty, we can start making this decision table. I'm not going to go into the details of this at all; there's a lot to talk about, and many of you are not involved in decision science. But basically, what we're doing here is using all of the results we just talked about, from the uncertainty reduction, and coming up with a way of evaluating what the best decision is. Here you notice that location D scores the highest. We can also look at trade-offs, and evaluate risk versus return. For example, in the book we describe how location D has a better return but also a higher risk, so if you want to balance that, then maybe location A is better.

Okay, let me skip over that; I think I'm coming towards the end. There's a lot of reference material. First of all, there's the book, published by Wiley last year. I also have a YouTube channel, which you can find by searching my name on YouTube, with many presentations and talks about details of the book as well as other material. We also have a website with a Git repository; there are many repositories dealing with things like global sensitivity analysis, falsification, dimension reduction, and direct forecasting, so feel free to browse through those. There's one specifically related to the book that was published. Okay, that's where I'll stop; I'll end here with our slide about Bayesian Evidential Learning, and I'll be happy to take any questions.