All right, and thank you for inviting me. This title is actually the title of a book, and this book is not yet published. So this talk really gives a slice of the research; it is a 400-page book, and I hope you get some ideas of what this is really about. In doing so, of course, I want to acknowledge my co-author, who was my postdoc five years ago, Grégoire Mariethoz, and who is now a professor at ETH in Switzerland. I also want to acknowledge Wiley Blackwell for publishing the book, and my students who are here, who have contributed a lot to this talk as well as to a poster: Pejman Tahmasebi, who is a postdoc, and Lewis Li, who is a first-year PhD student. I think this work can be framed within a bigger challenge in science and engineering that I've seen in many areas of research. Really, we're dealing with two big areas of science: data science and physical science. In data science we talk about big data, but often, you know, big data for me is, well, if you're forecasting, still really a sparse problem. It's still a problem about uncertainty, and certainly in the area that I work in, this is very important. On the other side, we've seen the growth of very big simulation models. In the subsurface, we've been doing that for a very long time: simulation of geology, simulation of flow, simulation of geomechanics, structural geology, et cetera. So maybe the challenge is how to merge these two fields. Really, how to come up with physically realistic models that are constrained to data and hopefully provide better forecasts, and not just forecasts that are accurate, but forecasts that are also realistic about uncertainty. Traditionally, I've worked a lot in geostatistics; it is where I started my career, and it is a field that has been developed tremendously at Stanford. It started actually in the mines in South Africa with Danie Krige, was then formalized by Georges Matheron in the 1960s, and one of his students, André Journel, came to Stanford and built this really practical application of geostatistics in many areas of the earth sciences. André retired five years ago, and, you know, you could say that the bread and butter of traditional geostatistics is this kind of problem: there is a physical truth, we've got some limited sample data, and then people do either interpolation and estimation, which is called universal kriging, or they do simulation, which is often done in a multi-Gaussian framework, and you create multiple realizations of the truth. So if you look at these Gaussian models and compare them to this physical truth, which is actually the DEM near Walker Lake, an area in the Sierra Nevada, you see that it doesn't look at all like the truth. And that is one of the aspects I try to address: how to put physical realism into stochastic models that are constrained to data. Why is this happening? Maybe we should start by looking at this problem really fundamentally. Well, fundamentally this is happening because people express relationships using covariances, or spatio-temporal covariances. If you just look at these three images, you don't have to be an expert in geology to say, first, that they're very different. You don't have to be a reservoir engineer to say that if you're going to produce from these systems, you're going to get something very different. Yet the spatial covariance is the same for all these images.
Here we see, actually, not the spatial covariance but the variogram, which is a measure of spatial dependency that is traditionally used in geostatistics. So neither the data nor the statistics that are traditionally calculated can differentiate any of these models at all. And typically what we end up with is something like this, which is geologically unrealistic. So in geostatistics people have worked a lot, theoretically, with what is called the random function model. It's a part of probability theory where you need to rely on a lot of difficult concepts such as stationarity, ergodicity, infinite domains and other things in order to formulate and really say that the truth is a realization of a stochastic process. And in the end this realization is often decomposed into two parts: a mean, some kind of average that fluctuates up and down, and a residual variation around that mean. So if I show you this picture on the left, which is a physical simulation of geology, and you ask geologists to say what is the mean and what is the residual, obviously we are running into a problem here; we can't really apply this. So that is really why multipoint geostatistics was born. It was born only 10, 15 years ago, it is truly a Stanford product, and it has been taken up by many other researchers in the world. It is of course an extension to higher-order statistics, but what we recognized immediately is that to extend to higher-order statistics you can't go incrementally. It's not the third order, then the fourth order, then the fifth order; that didn't improve anything. Actually it made things more complicated: the models got more complicated, the estimation became more complicated, and nobody was using it. So it failed. The second thing we had to realize is that if we want to communicate physics with statistics we have to have a bridge, and that bridge is the training image, and I will talk about that extensively. The word "image" is a little deceiving because it's not 2D, it's 4D; it's spatio-temporal, and it's an example of what you believe is physically realistic. We also notice in this field an increase in the use of computer science over statistical science, and particularly computer graphics, and we'll talk about that. What is computer graphics? Well, just ask your kids: computer games, movies, things like that, where people use textures and texture synthesis. If we want to create a wall in a game or movie, we take just a small piece of the wall pattern, paste it in, and the game will then produce the whole wall. The game is still 2D, unfortunately. So there are some interesting connections there. However, in geostatistics we're dealing with a much more complicated problem. First of all it's not 2D, it's 4D, space-time, and if you add scale it's 5D. Secondly, our earth textures are very organized, very non-stationary, very specific, and they have to follow certain physical rules. If you think about channel deposition, something like that, it needs to follow certain rules. So we would like to design algorithms that can take patterns, earth patterns in 4D, and make replicates almost on the fly, but, and this is where it becomes difficult, we need to be constrained to data. Data can be well samples, seismic, geophysics, remote sensing, soil samples; everything that's out there is data. How do I create a pattern constrained to data? Texture synthesis people don't have to worry about that.
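As a concrete aside on what that variogram is, here is a minimal sketch of an experimental semivariogram along one grid axis. This is my own illustrative code, not from the book; it makes clear that the variogram only compares pairs of values at a given separation, which is exactly why very different geological images can return essentially the same curve.

```python
import numpy as np

def experimental_variogram(z, max_lag):
    """Experimental semivariogram along the horizontal axis of a 2-D grid:
    gamma(h) = 1 / (2 * N(h)) * sum over pairs at lag h of (z(x) - z(x + h))**2
    """
    gammas = []
    for h in range(1, max_lag + 1):
        diffs = z[:, h:] - z[:, :-h]     # all pairs separated by lag h along rows
        gammas.append(0.5 * np.mean(diffs ** 2))
    return np.array(gammas)

# Stand-in field for illustration; with the three images from the slide you would
# observe nearly identical curves despite the very different geology.
field = np.random.default_rng(0).normal(size=(100, 100))
print(experimental_variogram(field, max_lag=10))
```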
I'm going to show you two kinds of basic ideas, basic principles, and please don't read too much into it, it's a whole lot more complicated, but this sort of shows the design of these algorithms to create these kinds of realizations. The first one is beautiful in its simplicity, and it relies on a unique idea that was proposed by Shannon in 1948, and I don't have to explain this person to an audience of computer scientists: to sample from a PDF you don't actually need the PDF, you just need a sample of it. So there is no need for random function theory. So imagine this situation, which would be a typical situation in the earth sciences: you have three samples and you would like to know what's in the middle, right? That's a geostatistical problem; you can calculate covariances and solve kriging systems and matrices and things like that. Well, if we have an image like that, what we do is simply look in this image for a place where I have a match with this data configuration; if I don't have a match I move to the next location, and the next, until I have a match, and then I take that value and paste it. That's it. That's the algorithm, and you go on and generate complete realizations of the image. This is called the direct sampling algorithm, which has been published. Another beautiful idea that we have developed is what I call stochastic puzzling. You have some kind of pattern, like here on the right-hand side, and in order to create images you do what your kids do, right? You take the whole thing, cut it into pieces, throw them up in the air, and then you start puzzling. And the way you puzzle is you start at the border and then try to go along a line; that's called a raster path in simulation. So we put down a puzzle piece; for the next puzzle piece we have to worry about whether it is consistent with the previous one. So we look up in this image where that occurs; if we find one we put it there, we move on, and we can complete the puzzle like this. Both algorithms, direct sampling, which is more of a pixel-based technique, and image quilting, which is more of a pattern-based technique, are extraordinarily fast. You can imagine that you can create 4D images very fast, and for a non-expert statistician it is very easy to use: there is no positive definite covariance and all the other kinds of things you have to worry about. So that leads to the fast generation of complex spatial variability from existing spatial variability that you may have obtained. Here we see three examples, realizations generated using these two algorithms, and I can generate a million more of those. You may have a delta, here we actually see a flume experiment, we'll talk about that in a bit, or a fracture system, which of course today is very important, and I can generate many realizations. Take for example this one: a six-million-cell model generated in four seconds. Okay, let's talk a little bit about applications; the area I work in is subsurface reservoir forecasting. So forecasting is really the problem here: forecasting, first of all, reserves and recovery factors, where to drill wells, and if you drill the wells there, what you can expect in terms of production. This can be applied to oil, gas, water, CO2 sequestration, anything that basically has to do with the subsurface. A typical issue with the subsurface, very different from what I call the above-surface sciences, is the large uncertainty in the nature of the depositional system.
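To make the direct sampling idea above concrete, here is a minimal illustrative sketch in Python. It is my own simplification under stated assumptions (a 2-D grid, a categorical variable, a simple mismatch fraction as the pattern distance), not the published implementation; the function and parameter names are mine.

```python
import numpy as np

def direct_sampling(ti, sim_shape, conditioning=None, n_neighbors=20,
                    threshold=0.1, max_scan_fraction=0.25, seed=0):
    """Illustrative direct-sampling-style loop for a 2-D categorical grid.

    ti           : 2-D numpy array, the training image
    sim_shape    : shape of the simulation grid
    conditioning : dict {(i, j): value} of hard data to honor
    """
    rng = np.random.default_rng(seed)
    sim = np.full(sim_shape, np.nan)
    if conditioning:
        for (i, j), v in conditioning.items():
            sim[i, j] = v                                  # hard data, never overwritten

    # Random simulation path over the not-yet-informed nodes
    path = [(i, j) for i in range(sim_shape[0]) for j in range(sim_shape[1])
            if np.isnan(sim[i, j])]
    rng.shuffle(path)

    ti_nodes = np.array([(i, j) for i in range(ti.shape[0])
                         for j in range(ti.shape[1])])
    n_scan = max(1, int(max_scan_fraction * len(ti_nodes)))

    for (i, j) in path:
        known = np.argwhere(~np.isnan(sim))
        if len(known) == 0:                                # nothing informed yet: draw at random
            sim[i, j] = ti[rng.integers(ti.shape[0]), rng.integers(ti.shape[1])]
            continue
        # Data event: the closest already-informed nodes and their offsets
        d = np.hypot(known[:, 0] - i, known[:, 1] - j)
        nbrs = known[np.argsort(d)[:n_neighbors]]
        lags = nbrs - np.array([i, j])
        vals = sim[nbrs[:, 0], nbrs[:, 1]]

        # Scan random training-image locations for a matching pattern
        best_val, best_dist = None, np.inf
        for idx in rng.permutation(len(ti_nodes))[:n_scan]:
            ti_i, ti_j = ti_nodes[idx]
            pos = lags + np.array([ti_i, ti_j])
            if (pos < 0).any() or (pos[:, 0] >= ti.shape[0]).any() \
                    or (pos[:, 1] >= ti.shape[1]).any():
                continue
            mismatch = np.mean(ti[pos[:, 0], pos[:, 1]] != vals)
            if mismatch < best_dist:
                best_dist, best_val = mismatch, ti[ti_i, ti_j]
            if mismatch <= threshold:                      # good enough: paste and move on
                break
        sim[i, j] = best_val if best_val is not None else \
            ti[rng.integers(ti.shape[0]), rng.integers(ti.shape[1])]
    return sim
```

In practice one would use a real training image and tune the number of neighbors, the acceptance threshold, and the scan fraction; conditioning is honored simply because hard-data nodes are never overwritten.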
Despite all the great data that Bihanna showed, we still don't know a whole lot about the subsurface, particularly at the very small scale, and this small scale matters. A fracture is very thin, but it matters. A shale barrier is very thin, but it matters when you start producing, and a lot of these things cannot be imaged, or clearly imaged, by seismic; seismic is an important contribution, but it doesn't resolve everything. The second problem we have in the subsurface is that we have very large data and very large models. We talk about billion-cell grids now, with multiple properties and very large simulations. The third big problem, which I'm not going to talk about, is gridding. We have very complicated structures in the subsurface, faults, layers, things that require extremely complex gridding; that is one of the areas Margot is working on. And we have the availability of a large variety of data. So the current challenge lies in these two fields: we have big data, which was just shown in the previous talk, and we have big simulations. And we'll talk about what that is exactly. The subsurface is geology. This is what is so different from the above-surface part: there is a medium, and this medium is created through some geological genesis and process. So we have to talk a little about what geologists do, because it's very important work. There are roughly three areas where geologists work. First, they work in the lab. We're working with Chris Paola, who is a sedimentary geologist at the University of Minnesota and who has a tank. This tank is basically five by five meters, where sediments come in and get distributed. And then you can do what remote sensing people do and take snapshots, either pictures or LiDAR scans, basically every second. So think about data creation just like that. Here we see, for example, a zoom, and here is the source of the sediments. The patterns that you see in this tank are just incredibly beautiful, and they look just like the patterns you see when you fly over the Mississippi Delta, or over Utah, or areas like that, where you see these various depositional systems. The other thing we can do with the tank, once the deposit is created, is take slices through it, vertical slices, kind of shave it off and take pictures of that. And then we see, guess what, very beautiful geological patterns being created; we have very large data sets here, and we can create these models. We have hundreds of these tank experiments. So this is one area where I think geologists try to build our understanding. A second area is computer modeling. People try to actually generate the geology at the time of deposition. They try to recreate the earth as it was deposited, and that requires PDEs, rules of erosion and deposition, and a lot of genesis that needs to go in. That's the physics of the problem. Here is a model created by ExxonMobil of a turbidite deposit, and it takes one month of computing time just to create one single physically realistic model of the subsurface. So because it takes such a long time, even though it creates something extremely realistic, people now also do what are called process-mimicking models, where you try to mimic the physics. It's not really accurate, but it gives you something that at least looks very realistic, say these meandering shapes.
And of course the other thing that geologists do is go out in the field, look at outcrops, and take measurements there, and these days there is a lot of interest in using photogrammetry and all the new devices to very rapidly capture outcrops, and then come back and maybe do some basic computer modeling. So this is all great, right? It allows us to understand geology and genesis, but it doesn't do anything for forecasting. These are not numerical models that I can put into some forecasting system, a flow simulator or something else, and get actual numbers: oil rates, gas rates, water rates, whatever you want. That is why we need, beyond this, the big data. So here we have a seismic data set; this is the kind of data that is not passive, this is active, right, it's shot, and these are very large data sets. Here we see a look in depth, and the target reservoir is maybe somewhere just down here. I'm actually not interested in that; I'm interested in this here, okay? If I take a small slice through here and look from the top, I see this occurring, and if I look at it a little closer I see, guess what, channels. And then people say, great, now I've got the answer. No, you don't have the answer yet. This doesn't tell me how much oil I'm going to produce in the next 10 years. Secondly, it's velocity; it doesn't tell me porosity and permeability, it doesn't tell me where the fluids are. It is an indication of those things, but it doesn't tell me exactly, okay? Then, of course, geologists come and say, yeah, this is all great, but there's more detail to it. There is sub-seismic variability that is very important to the forecasting I'm going to do, and that will need to be generated. So Annie created, with these process models, a 3D turbidite model; a turbidite is basically an underwater deposition of channels of sand and shale. So basically now we have this big simulation and we have this big data, and we need to merge them and try to forecast. So what we did with this is say, well, this is realistic geologically but doesn't match the data; this is the data, but it is not necessarily what I want. So let's use this as a training image and image-quilt our way through, including some wells, and in those wells I have recorded geology, and I can generate many hundreds of realizations of the subsurface that reflect both the data and the geology at the same time. So here we see, for example, if you look a little closer, you don't see the well here, and that's good, because that means the model is matching that information exactly at that location. So you see, it's 4D puzzling, right? Actually it's 3D puzzling here: imagine having a box of puzzle pieces, but you need to put the puzzle together while at the same time constraining it to this point data. That is the strength of these algorithms. Then we can do multi-phase flow, and unfortunately the projector doesn't want to show the oil. So here is the oil rate over time, which is uncertain, very uncertain in fact in these kinds of systems, and then companies can make decisions about risk, deciding where they want to drill and where to place their platforms. So this was really an area developed uniquely for the subsurface, and in the last 10 years it has seen a rapid expansion into a lot of areas in the earth sciences, which I think is a healthy sign, and I'm going to talk a little bit about those things.
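The forecasting loop just described, many geologic realizations conditioned to wells, each run through a flow simulator, can be summarized as a simple Monte Carlo skeleton. The sketch below is my own illustration of that workflow; `image_quilting_realization` and `run_flow_simulation` are hypothetical stand-ins, not a real geostatistics or reservoir-simulation API.

```python
import numpy as np

def image_quilting_realization(training_image, well_data, seed):
    """Stand-in: a real implementation would quilt patterns from the training
    image while honoring the well observations; here we only place the wells."""
    model = training_image.copy()
    for (i, j), facies in well_data.items():
        model[i, j] = facies                     # hard well data are honored exactly
    return model

def run_flow_simulation(model, n_steps=120, seed=0):
    """Stand-in for a multi-phase flow simulator; returns a synthetic oil-rate
    series. A real simulator would of course depend on the geologic model."""
    rng = np.random.default_rng(seed)
    decline = 0.015 + 0.01 * rng.random()
    return 1000.0 * np.exp(-decline * np.arange(n_steps))

def forecast_uncertainty(training_image, well_data, n_real=200):
    """Monte Carlo loop: many geologic realizations -> ensemble of rate forecasts."""
    rates = np.array([
        run_flow_simulation(image_quilting_realization(training_image, well_data, s),
                            seed=s)
        for s in range(n_real)])
    # P10 / P50 / P90 envelopes summarize forecast uncertainty for decision making
    return np.percentile(rates, [10, 50, 90], axis=0)
```

The point of this structure is the decoupling the talk emphasizes: geological realism comes from the training image, data conditioning from the wells, and uncertainty from repeating the whole chain many times.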
And this is not my research; these are publications by others, and there are three areas here that I'm going to talk about, showing somewhat different complexities. Some problems are temporal, some spatio-temporal, some multivariate temporal. All of them have to deal with large data sets. The first one is remote sensing and gap-filling, which is a problem when you have remote sensing data and you need to deal with orbit characteristics that leave gaps in the data. People in the traditional statistical field have looked at this and solved it through covariances, but that requires the inversion of million-size covariance matrices, which we don't have to do. And on top of that, if you use covariances you will still only capture the second-order statistics of the data. So here these authors apply the direct sampling technique and create the fill from the existing data itself, right, because the data does have the characteristics that are there. This is multivariate: look at the soil moisture, latent heat, and temperature. So it's a multivariate temporal problem where you can immediately fill in the gap, and this is done at the bottom here, right, without much intervention by the user. Another example is when your satellite data is covered with clouds, and those clouds are moving; you can directly generate this fill-in or gap-fill. Another area, again not very visible here, is the stochastic generation of rainfall. This is now a time series, so it is not just a spatial problem; you can also work in time. Here we see the rainfall in Darwin over 10 years and over 100 days, and this is the simulated rainfall in Darwin, again over 100 days and 10 years. Here we see actually the same thing; it is non-stationary. The black line is the actual rainfall in Darwin, again over 7 years. You see these very complicated dry and wet periods and this non-stationary effect; this is a direct sampling simulation again, and we can do many simulations and then forecast percentiles and things like that. Other authors have published in IEEE how you can do this in space. Here we have an image of ground-based weather radar, which we can combine with cloud-top temperatures to simulate replicates of radar rainfall on the east coast; this is used in data assimilation and ensemble modeling for meteorological applications. The last one, which I personally find very interesting because we have a similar problem, is downscaling. Often when you do physical simulations and you create large simulations you can only run them on a really coarse grid, and we need to create something on a finer grid to make actual forecasts. In reservoir modeling this is a very hot topic as well, because we can't really do flow simulations on a very fine grid; we can only do them on a coarse grid, yet we need that fine-scale detail, and the same problem occurs of course in climate modeling. So here we see three climate variables for Australia, and these authors wanted to create fine-scale models for watershed modeling; the coarse-scale climate models don't give you the necessary information, they are only at the 50-kilometer scale, and you need it at the 10-kilometer scale.
So what they did is run the physical downscaling of the climate model, which takes a large amount of computing time, only for a shorter sequence, and then use the statistical relationships learned there to directly downscale that information for a very long, extended period of time, and that was quite successful in this case. You see that you can create very fine-scale features from these large-scale models. Here you see some more data, for example: here is the large-scale climate model, and you see it almost averages out the entire coast, yet you can see that the downscaling produces these very fine-scale features that will then be used for further modeling. All right, I'll leave it there and maybe formulate a few take-aways for you. I hope you get some appetite for purchasing the book when it comes out in a few months; the advertisement here is purely academic advertising. So what is multiple-point geostatistics? It is modeling without the random function, which has become, I think, a hybrid between computer graphics and traditional geostatistics. It relies on training images that come, for example, from physical experiments; we are going to put the physics into the statistics. And I think what is really driving this, and what has made it successful, is that you can use it as a non-expert, and I think that is very important. I love statistics, and I did my PhD in that area, but the models that are being created are very, very difficult to understand for the non-expert user, and this has been my personal experience. When I go teach courses in industry for geologists and you start talking about positive definite covariances, it doesn't go very far and people just ignore it. So we provide them with a framework, an image, where you can communicate statistical variation as something explicit, and then have automated algorithms that do this very fast, without parameter estimation, complicated maximum likelihood, or Markov chain Monte Carlo; we avoid all of that statistical machinery because in practice it doesn't work very well. So with that, I have one more somewhat morbid ending, in the spirit of the slide that Margot also showed. Do you trust your covariance relationships? Here we see the relationship between US spending on science and suicides: it's a 99% correlation. Thereby I propose we cut all funding to all universities in order to help those poor people. Thank you. Thank you, Jef. Any questions for Jef? Thank you, Jef. My name is Bob Entriken, from the Electric Power Research Institute. I'm interested in the work you're doing on stochastic modeling for downscaling climate and other weather phenomena. Have you tried combinations of weather phenomena, like rainfall, wind, insolation, etc., combined, and characterizing the correlations? So, as I said, this is not my research; the last three applications I showed are by other authors, and it's actually the research of my postdoc, now my co-author. One of the challenges here is what you describe: it's multivariate. It's not just about forecasting one thing, it's about forecasting a set of phenomena. I think there's great potential, but there's still really interesting research to be done in solving this multivariate problem. What we notice with multivariate problems is that the correlation is often complex and complicated, and relying on the standard covariance and cross-covariance models often leads to overconfidence in the correlation. I think the last example is a very simplistic example, but it shows it.
And so what we try to do with this kind of work is discover pattern relationships in space-time. That is really what it is about: pattern relationships, not just point-to-point relationships. As for how to discover that and how to use that information, maybe I'm not necessarily an expert at that, but I can imagine again that you can rely on some physical modeling that can explain what the relationships are, and then you can use that information in some forecasting models. So what we're doing here is basically decoupling the learning from the actual forecasting: we learn from physical models in order to make the forecast much more realistic. We have time for one more question, right here. I'm Leo Taborowski from Big Confuse. You mentioned uncertainties in the first slide and the second, and I'm wondering: when you separate data and the images, there is a lot of uncertainty in correlating images to data, and there can be multiple images which you can supply to obtain the same data. We are usually very interested in this multitude of images or models which can be consistent with the data. So what do you do about this uncertainty? Do you provide multiple images, or how does it work? A very, very excellent question. I think exactly what you describe is the problem in the industry. People use one image, one model, one concept, one idea. That is the understatement of uncertainty, because basically people look at the seismic and say, beautiful, that's it. And then they walk away, go drill a well, and they drill the well in the middle of the channel. I've seen it happen twice this year with companies: it's all shale. All of it. So what we are advocating, and we have two chapters on what you describe, is multiple training images and the discovery of scenario uncertainty: you can discover that there is not just one training image, of course, but a discrete set of images that explain your data. The second problem you describe is that of consistency between physical modeling and data science. We have an entire chapter that talks about validation and consistency, because that is the most difficult part: to come up with a series of images that not only describe the uncertainty but are also consistent with your data. That is the most important ongoing part of research in my group. So yes, thank you. Please join me in thanking Jef once again. Thank you.