Okay, we're going to start with two more talks. Before I introduce the next speaker, I want to point out that at 4:15 we're going to have the poster session, and there will be beer involved, which means we cannot leave this room: the doors will close, and as long as you have beer in your hands, please stay in this room or the next room; otherwise we lose our liquor license for these meetings. That's about it, I think. Okay, so let me introduce the next speaker. This is Júlio. Júlio won the Syvitski Student Modeler Award; there will be a little more about this during the banquet tomorrow, so I'm just going to introduce him. Júlio Hoffimann Mendes is from Stanford University, and he's going to give a presentation about image quilting. Júlio.

Can you all hear me? I'll turn it on. Hello. Thank you for having me, first of all. About this award, I would like to say that I've been following the CSDMS community online for a while; it is a model of open science research, and I really appreciate the award, so thank you very much.

In this talk, for this award, I would like to discuss the idea of trying to better understand the systems that we study, and to do it in a systematic way, so I think it ties very well with some of the talks in this session. I remember Greg mentioning the theme of modeling stream events and the process of actually trying to get at the data and extract some sense, some insight, out of it. In Paul's talk this morning, he also mentioned that we run these models and make predictions of floods, but his last statement, which is what stood out to me, was that you also have to have an understanding of the uncertainty that goes with those predictions: stochastic simulation must be there, and we need to somehow quantify this uncertainty. Even though the title of this talk has a bunch of terms that are not well defined, I'll try to clarify some of them during the presentation.

To start, I have to give this example, which by the way is also on that poster over there, just to show you how connected things are. This is the big event that happened in Japan, the Tōhoku earthquake. The idea is that we have these models, so what happens when the models fail? The first thing I'd like to point out is this quote, which I got from Wikipedia: the Tōhoku earthquake came as a surprise to seismologists; while the Japan Trench was known for creating large quakes, it had not been expected to generate quakes above an 8.0 magnitude. It really tells us that, even though we have these models, what matters is understanding all the possible outcomes that these systems can generate. It is not about one specific calibrated model; when these models are off and we are not aware of that fact, things like this can happen. Fundamentally, I think this research is mainly about trying to understand to what extent the measurements and observations that we make with physical experiments can be used to improve our understanding of systems statistically, and how we can actually reproduce, in a sense to be defined, these systems in terms of some statistics that we care about.
In this slide, what I'm illustrating on the left is an actual measurement taken from the experiments that we are studying, and on the right is a completely made-up synthetic model that I'm going to try to describe during the presentation. The idea is: how do we actually know whether this model makes sense at all? Is it just another model? How do you actually falsify these things?

There are many practical challenges that I would like to point out here, and I think many of you will identify with them. First, there is a huge amount of data being generated by the experiments, at an unprecedented rate; how do you actually digest all of it to draw some insights, some conclusions? Second, I think everyone who develops and plays with these models knows how frustrating and time-consuming it can be to calibrate them to observations. These models have a lot of unobserved parameters that go into them, and coming up with those numbers is really difficult, so there is always this challenge of saying: I don't know these values, so I'll just set a specific number and hope, or believe, or guess, that things are going to be okay. And when I say the model is off, it is not only the physical model or the numerical code, but also the mental model of constraining yourself to pick a specific set of parameters. Lastly, the predictive power that these models give you is usually tied to the specific physics you are modeling, to the specificity of the problem you calibrated the model for, and it does not generalize very well: you slightly change the boundary conditions, or something of that kind, and all the effort you spent over weeks tuning parameters goes away, in a sense. So the predictive power is often not very good. The key word for me, then, is uncertainty, and that is what our research group does: modeling these uncertainties explicitly.

In this research specifically, I am going to share with you a case study from experiments, and I will try to stay on time. We have these videos from the experiments, like the one playing in the previous slide. The simple observation we make about the experiment, in order to model it, is this: we look at successive frames of the experiment in black and white, where I simply threshold where the water is, and by computing an appropriate difference, a distance definition that captures morphology, I can produce this time series. It tells us that, even though it looks very random, there is a systematic evolution (of course, it comes from a physical system), but the most important feature I would like to highlight is that there are big spikes interleaved with small variability. That is what we are targeting in this modeling; that is the main observation I had in the beginning, and in developing the model I want to replicate it.

So what should this model entail? We take this initial observation and convert it into a mathematical model. The mathematical model consists of regimes of small variability interleaved with something that we do not understand very well. The small variability we can model with any kind of shallow-water equations, simplified PDEs, or something of that nature, anything you can think of.
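(A minimal sketch of how the distance time series just described could be computed from thresholded frames. The exact morphological distance used in the study is not specified in the talk, so the symmetric-difference distance below is only a stand-in, and the random masks are placeholders for the real thresholded video.)

```python
# Build a "distance between successive frames" time series from a stack of
# thresholded (binary) wet/dry masks. The symmetric-difference distance here
# is a stand-in for whatever morphological distance the study actually used.
import numpy as np

def frame_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Fraction of pixels where two binary wet/dry masks disagree."""
    return float(np.mean(a != b))

def distance_series(frames: np.ndarray) -> np.ndarray:
    """frames: array of shape (T, H, W) with 0/1 wet/dry masks."""
    return np.array([frame_distance(frames[t], frames[t + 1])
                     for t in range(len(frames) - 1)])

# Placeholder data; in practice these come from thresholded experiment video.
rng = np.random.default_rng(0)
frames = (rng.random((10, 64, 64)) > 0.5).astype(np.uint8)
series = distance_series(frames)  # large spikes mark big reorganizations
```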
The issue is that, for some reason, or because of our limited understanding, we are not able to get all of the physics into the model, so we introduce a stochastic transition to some other state, which lasts for some duration Δt. This whole model is known as a Poisson random process, and it has one parameter of big importance: the rate of events per unit time. It turns out that if you want to simulate a process of this kind, which is a continuous-time process, you instead simulate the duration of the intervals by drawing from an exponential distribution. You draw a time interval, you simulate with whatever model of the small variability you like, you perform a transition, which I will explain next, and then you keep doing this over and over again, with the hope that you will capture those spikes we have seen.

The first thing we do is parameterize these stochastic transitions, and the way I do this is with the following trick. Everything you are seeing here is the set of images from one experiment: I have a video, one of the seven we are analyzing, and this is one of them. The color bar just represents time, from purple to red. By doing some statistical manipulation, by defining a distance between shapes, which are these black-and-white images, each dot here is an image, and the proximity of these dots in this 2D plane represents a proximity in shape. In that sense you can see that this process, under this distance definition, starts in this cluster of purple, jumps to something else, then jumps again; there is this counterclockwise sequence of clusters that you are seeing. So the question is: we have all these images, and there are a lot of them; we are not going to use all of them in our modeling. We are going to discard most of them and use just a few representative images from each of these clusters to represent a state. So I have a set of states, and each state has a likelihood, which is the size of the cluster it belongs to: if I pick an image from a large cluster, I am likely to sample it more often than an image from a small cluster. There are also transition probabilities that you can model based on these distances. So the idea is that we reduce the dimension of, or parameterize, the system with a reduced set of images.

Now the question is: how do we reintroduce the variability? If you look at this time series again, basically I am saying that the transitions we have modeled are jumps between these clusters, and now I am going to reintroduce the variability in between. That is where the image quilting part comes in. We wrote a paper in 2017 describing this algorithm, which is purely data-driven; there is no physics going on, it is a completely synthetic, fake process, but it is a process that tries to reproduce the types of patterns you observe in these images. What the algorithm does is this: you give it one of those representative images, and you optionally give it additional constraints, any type of remote sensing data, or an actual observation at a particular point, and the algorithm is capable of reproducing that type of pattern. Given just a single image, it produces random images which have no physical sense, but which reproduce, in a sense, the morphology of what the system is doing in that state.
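(A minimal sketch of the jump process just described: exponentially distributed waiting times with a given event rate, i.e. a Poisson random process, and a transition to a representative state at each event. The state set and transition matrix below are hypothetical placeholders; in the talk they come from clustering the images, and the in-between variability would be filled in by image quilting, which this sketch omits.)

```python
# Continuous-time jump process: waiting times are exponential with rate `rate`
# (a Poisson process of events), and at each event the system jumps to a new
# representative state according to a row-stochastic transition matrix.
import numpy as np

def simulate_jump_process(rate, transition, horizon, state0=0, seed=0):
    """rate: events per unit time; transition: (n, n) row-stochastic matrix;
    horizon: total simulated time. Returns (event_times, states)."""
    rng = np.random.default_rng(seed)
    t, state = 0.0, state0
    times, states = [0.0], [state0]
    while True:
        dt = rng.exponential(1.0 / rate)   # waiting time until the next event
        t += dt
        if t > horizon:
            break
        state = rng.choice(len(transition), p=transition[state])
        times.append(t)
        states.append(state)
    return np.array(times), np.array(states)

# Hypothetical 3-state example; in the actual model the states are
# representative images and the probabilities come from the image clusters.
P = np.array([[0.1, 0.6, 0.3],
              [0.4, 0.2, 0.4],
              [0.5, 0.3, 0.2]])
times, states = simulate_jump_process(rate=2.0, transition=P, horizon=10.0)
```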
So what are the main advantages? First, as I mentioned, it is able to condition on data, and it is also very fast to run if you want to do Monte Carlo studies. What we did then was to generate fake videos with this model, the Poisson process plus image quilting. This black line is the original measurement of wet area; say I pick one property I am interested in, wet area in the real experiment, and then I generate fake movies that produce the same kind of time series as well. What we can do then is start evaluating whether this model, which is a completely made-up thing, is useful at all. We can define statistics that we care about. Here is one example where I look at the variogram, which is basically the autocorrelation of the series, to see whether the observation I had falls inside the envelope of all the curves generated from these synthetic videos. If it is inside, I cannot do anything; I just move forward and try to falsify the model further. Then recall the initial observation: initially we saw those spikes. That is what I am doing here: generating fake videos and calculating the frame-to-frame difference again, to see whether I get a similar pattern of spikes. What is important here is that, because of extreme value theory, extreme value statistics, you can also start falsifying this model based on important statistics such as return levels: how much time you have to wait, on average, to see an event of magnitude x. And because we now have the ability to do the Monte Carlo in a somewhat efficient way (it is a purely data-driven algorithm; there is no PDE solving or anything that is super expensive), we can also get a confidence band of return levels, which is very important.

Finally, what I am also aiming for with this study, as next steps, is understanding how we move from this flow process, which we have just defined based on these image techniques, to actually reconstructing stratigraphy. There is a set of additional parameters that I am currently studying, building on top of this model, that produces the type of stratigraphic models generated here and illustrated in the first slide. With this we can also start trying to falsify, or calculate the likelihood of, these outputs: each dot here is one 3D stratigraphic model generated by the model, alongside actual measurements in the tank. Even though all of this is 3D (I am just illustrating slices), I can do some simple operations to project it into this space, and the colors represent the likelihood of the observation actually coming out of the process I have just created. So what we are ultimately interested in is quantifying this uncertainty: how often do our models generate things that encompass what we observe in nature? We are not constraining ourselves to a specific set of parameters; we are trying to randomize what we do not know, and that is what Monte Carlo brings us.

To end, I would like to thank you again for the award, and I think this quote is very important; it guides my research: "Seek simplicity and distrust it." Alfred North Whitehead is the creator of the branch of philosophy known as process philosophy, and I think that is what I am doing in my research: I start with a very simplistic model, in a way, and I try to falsify it; if it is not falsified, I continue with it, and if it is falsified, I try to add complexity. That's it, thank you very much.
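(A minimal sketch of the return-level check mentioned above, assuming the spike series from many synthetic videos is available as an array. Empirical block maxima and quantiles are used here as a crude stand-in; an actual extreme value analysis would typically fit a GEV or similar distribution. The Gumbel-distributed replicates are purely illustrative placeholders.)

```python
# Return-level check: for each synthetic replicate, take block maxima of the
# spike series and read off empirical return levels; the spread across
# replicates gives a Monte Carlo confidence band to compare against the
# single observed series.
import numpy as np

def empirical_return_levels(series, block, return_periods):
    """Block maxima + empirical quantiles as a crude return-level estimate."""
    n_blocks = len(series) // block
    maxima = series[:n_blocks * block].reshape(n_blocks, block).max(axis=1)
    probs = 1.0 - 1.0 / np.asarray(return_periods, dtype=float)
    return np.quantile(maxima, probs)

rng = np.random.default_rng(1)
periods = [2, 5, 10, 20]
# 200 hypothetical synthetic replicates of the spike time series:
replicates = rng.gumbel(size=(200, 1000))
levels = np.array([empirical_return_levels(r, block=10, return_periods=periods)
                   for r in replicates])
band = np.percentile(levels, [5, 95], axis=0)  # 90% Monte Carlo band per period
```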
Questions?

[Audience member] Thank you, very clear talk. [Question.]

I think there are three main advantages. The first is that we are trying to exploit the data that we collect, so everything here is made out of the data. In that sense it is not necessarily the best thing in the world, because of course we would like to introduce more physics, but you are able to condition the outputs of this model on whatever you observe: if you know a priori that there is a stream passing through a given location (x, y), you can honor that type of information. The second is speed, as you pointed out: if I want to quantify uncertainty, I can do the Monte Carlo on my laptop, instead of requiring a computing cluster and waiting a week to get a hundred models out. And the third is the idea that this model is driven by that first observation I showed you: if I want to make a decision at the end of the day, or I want to study those return levels, I am aiming with this model to reproduce those statistics. Even though the physics is completely garbage, if the model reproduces the statistics I care about, that is what I gain; I do not care about the intermediate results, I care about the return levels I get, so the model is developed for that specific purpose.

If there are no further questions, we can move on to the next speaker. Thank you.