all will hear your lecture. Thank you. Thank you, Alec, for this very kind presentation. Welcome everyone. Good day, good morning, good afternoon, good evening, according to your time zone. It's my pleasure to be here; I was looking forward to this workshop. We have quite a couple of hours to spend together, and I have prepared several materials. I recommend that you ask any question you want. As in the other lectures, you may write your questions in the chat and I will refer to them in the discussion. So let me start sharing my screen. Okay. And in presentation mode. Okay. Is everyone seeing the presentation okay? Can anyone give me feedback? Yes, I can see it. Okay. From my side, it's fine. Okay, thank you very much. So today we will talk about data assimilation in hydrological sciences. Here is an outline of this lecture. I will start with some introduction on general concepts: what hydrology is, and mainly what the peculiarities are for data assimilation that are characteristic of the hydrological sciences. At the core of the lecture I will address four main examples, four types of data assimilation problems, in, I would say, an increasing level of complexity. I will start with a very simple example, referring to Kalman filtering of a linear system, used mainly to improve remote sensing of the land surface. Then I will address nonlinear Kalman filtering in another state estimation problem, referring to the problem of sparse observations. As we will see, it is a very common problem in hydrology to have very sparse observations, especially at the ground. Then I will switch from state estimation to geophysical inversion, and in geophysical inversion I will also address two main examples, one referring to a groundwater problem where the main task is a sort of causal identification and quantification problem.
And then the more complex one, geophysical inversion and state estimation together, which is also quite a common characteristic of many hydrological problems; I will leave this example for the last part. And hopefully I will also have some time, within the one and a half hours of the lecture, to start some discussion, or at least to raise some trending topics. So, as you know, hydrology is the science, or the sciences, that deal with the water cycle. And the water cycle on the planet is a very complex issue, I would say. It includes many, many different processes above the surface, on the surface, and below the surface. Luckily enough, there are other disciplines that deal more deeply with parts of the water cycle. For example, as you know, all the atmospheric part of the water cycle is addressed by the community of the atmospheric sciences, while the water storage in the oceans is addressed by oceanographers. So we may say, simplifying, that what hydrologists mainly deal with is the land part of the hydrologic cycle. But of course there are interactions with the atmosphere and with the sea that need to be taken into account. And there is the added complexity that the hydrological sciences, as I said, deal with both surface and below-surface processes, which sometimes are hardly observable. Within this very broad realm of physical objects that hydrology deals with, there is also quite a long list of applications of the science. Here is just a list of the most traditionally widespread ones, which go from traditional engineering design and water resource management, to natural hazards such as extreme events like floods and droughts, to land-atmosphere interaction (hydrology is very important to provide the lower flux boundary condition for weather and climate modeling), to water-related environmental quality, and also, in the last decades, quite a growth of cross-disciplinary themes.
Some of them gave birth to new terminologies like ecohydrology or sociohydrology, or to terms that relate to other disciplines like hydrogeochemistry, or even to disciplines outside of the hard sciences like water security, with a lot of interaction with the social sciences and economics. Of course, many of these themes now receive more attention under the general theme of climate change physics and impacts. But coming now to the peculiarities of data assimilation in hydrology, what distinguishes data assimilation in hydrology from, for example, atmospheric science, let's review at least a few important issues. First of all, most of the complexity of the problem resides not in the flow itself, but in the environment that contains and constrains the flow, and eventually in the forcing. Here are two examples. Take groundwater: groundwater flow is very slow from a fluid dynamics point of view; it is a laminar flow. So nothing complex from the point of view of the flow, but the characteristics of the medium that contains the flow are largely unknown, because they are below the surface, and we usually have only very sparse point observations or samples of the characteristics of the aquifer that contains the flow. Or even for surface flows, like a river flow in practical applications: the flow is turbulent, but that is not the issue. The issue is that the flow is controlled by characteristics of the riverbed that may show very peculiar, strong, and localized changes, like interactions with bridges, sudden changes in the morphology of the river, and so forth. So there is no butterfly effect that is relevant in hydrology, but a lot of unknowns in the boundary conditions, or in the parameters, from a mathematical, from a modeling point of view.
Strictly connected with this issue is that most of the important governing equations in hydrology are convergent in nature, with a quick loss of memory of the initial conditions, the opposite of what we call the butterfly effect. Here on the right, for example, is a simple cartoon of what we may mean by a divergent system, like the atmosphere, versus a convergent system, like soil moisture. In a divergent system, even small changes in the initial conditions may lead to very divergent trajectories of the system in time. In a convergent system the opposite occurs: even very distant initial conditions may bring the system to evolve toward the same final state. Of course, convergence is a desirable characteristic in terms of general predictability, but it has a strong drawback in terms of data assimilation. Even if with data assimilation you capture the right initial condition for the system, if the model is wrong, and it may be wrong for any reason, by structural inadequacy or by wrong parameter estimates, after a while it will converge rapidly to the wrong solution regardless of the data assimilation. So, as we will see clearly in some examples, there is a low persistence of the benefit of data assimilation in most of these hydrologic problems. Now, data assimilation is of course putting together models and observations. Hydrologic observations need to be available for the many components of the water cycle that I showed you in the first slides. But some of them, I would say even the most important ones, are very poorly and sparsely measured systematically at the ground. Here we have three sketches of some of the most important hydrologic variables: on the left, soil moisture; evapotranspiration; and on the right, river discharge. This, for example, is the international network for systematic ground measurements of soil moisture and evapotranspiration.
Systematic, not something that you do for a field experiment: something that may be available systematically for large-scale studies, for example. Or even river discharge: here you have a map that shows quite a large density of points in many parts of the world. But I will draw your attention to this small legend here, where different colors are shown for the different periods of availability of these data. River discharge gauging, for example, is suffering from the decommissioning of many stations. The data that are available in the last decade are almost half of the data that were available two decades ago. So there is a problem of growing sparsity of ground data. Of course, there is a lot of information coming from remote sensing, and I will focus on that. But again, remote sensing, as I'm sure you know, is not a direct measurement of geophysical variables. It is always an indirect measurement. We may say that remote sensing of the land surface is itself a data assimilation, or better said, a geophysical inversion problem. Here, for example, the principle of how you can infer soil moisture information from satellite is based on a relationship between the brightness temperature that can be measured in the microwave spectrum and either the emitted or the backscattered (if you use a radar) radiation in the microwave spectrum from the surface. And there are a lot of uncertainty problems related, for example, to the different paths that the radiation may take, reflecting from vegetation to the ground and vice versa, and so forth. So there are a lot of uncertainties and errors that depend on local characteristics. The main problem of using remote sensing information is that the data error structure is usually very complex and unstable, varying from time to time and from place to place.
A last peculiarity: contrary to, for example, the atmospheric sciences, there is not yet what we may call a holistic hydrologic model. Atmospheric scientists for weather prediction use, it's true, different models, but models that are quite similar to one another. They may differ in certain parameterizations or certain numerical schemes, but the guiding principles upon which these models are built are all the same. Hydrologic modeling instead is a jungle of problem-specific models. There may be a specific model like MODFLOW for groundwater, a model specific for river flows, one for the hydrologic response at the watershed scale, one for the land surface response at the global scale, and so forth. So, moving from one model to another, from one application to another, the model characteristics and then the model and error structures may also change dramatically. So let's start with some examples. As I mentioned in the introduction, I will deal with a few examples that we may frame in two main families, I would say, of data assimilation problems: state (or Bayesian) estimation, and geophysical inversion. In Bayesian or state estimation, you assume that even if you have some source of uncertainty, maybe in the forcing data, or in the model structure, or, more importantly, in the non-observable parameters of the model, you cannot do much to reduce that error, or at least to correct the biases due to it; you use state observations and data to directly improve the state estimate or the state prediction that the model will provide. In geophysical inversion, instead, the focus is mainly on the original uncertainty, especially in the model parameters. So even if the final goal is to improve state estimation, the specific goal of data assimilation in geophysical inversion is to reduce and correct the uncertainty in the model parameters, so that improved state estimation arrives as a sort of by-product.
Okay, so let me start with a few examples, as I said, starting from state estimation. Given what I briefly mentioned about the usefulness and necessity of using remote sensing to acquire more information on land surface hydrologic processes, and given that remote sensing is very indirect, there are a lot of data assimilation applications that deal, as their main problem, with the improvement of these earth observations from space. I will provide two examples: one a very simple linear one, which deals with the problem of filtering clouds from remote sensing of land surface temperature in the thermal infrared band, and then a more advanced nonlinear example that deals with the remote sensing of soil moisture in the microwave region. Okay, the cloud filtering example with a linear model. What is the problem here? If you measure land surface temperature from a satellite, for example from the SEVIRI sensor on board the Meteosat satellite, a geostationary satellite, you intend to measure the amount of radiation that is emitted in the thermal infrared from the surface. But if clouds come into the field of view of the satellite, what the satellite will measure is not the temperature of the land surface, but the temperature of the cloud. As you may know, clouds are usually much colder than the land surface, so the sensor will detect quite a colder temperature, which is not the temperature of the land surface that you want information about, but the temperature of the cloud. So there is the problem of filtering these bad, or I would say wrong, land surface temperature estimates. Most algorithms use a sort of static processing of single images: for example, they compare the temperature of a pixel with nearby ones, or they also use visible imagery to detect the presence of clouds.
Here instead we use dynamic filtering, based on the idea that if a region is cloud-free and then clouds come into the field of view of the satellite, the measured temperature will suddenly drop to a much lower value. So we can detect the presence of clouds in a dynamic sense, comparing temperature estimates with previous estimates: a typical dynamic filtering problem that can be addressed with a Kalman filter. I assume that you have already seen in previous lectures, or know from your own background, how a Kalman filter works, so I will not go into much detail. As you know, the linear Kalman filter, the simple one, is based on having a simple forecast equation. In this case we use a minimal linear forecast equation, where the temperature at any time k is related to the temperature at time k-1 via an empirical coefficient, and is forced by the incoming shortwave radiation, which can also be measured from the satellite, plus, as you know, some noise. Then you have an observation operator, which in this case is very simple: you assume that the temperature that you can detect from the satellite is identical, up to some error, to the state that you want to estimate. Now, how do we use these simple equations in the Kalman filter algorithm to filter out the presence of clouds? By relaxing one of the main hypotheses of the linear Kalman filter, namely that the measurement error is Gaussian. This would be the standard hypothesis in the Kalman filter. We relax this hypothesis and assume instead that the measurement error has a bimodal distribution, as shown here in this graph on the right.
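The forecast and observation equations just described, T_k = a*T_{k-1} + b*S_k + noise and y_k = T_k + noise, can be sketched as a single scalar Kalman predict/update cycle. This is a minimal sketch only: the coefficients a, b and the noise variances q, r below are placeholder values, not the ones calibrated in the actual algorithm.

```python
def kalman_step(t_prev, p_prev, s_k, y_k, a=0.98, b=0.01, q=0.5, r=1.0):
    """One cycle of the scalar linear Kalman filter for LST:
    forecast  T_k = a*T_{k-1} + b*S_k  (+ model noise, variance q),
    observe   y_k = T_k               (+ measurement noise, variance r)."""
    # Forecast (prior): propagate the state and its error variance.
    t_prior = a * t_prev + b * s_k
    p_prior = a * a * p_prev + q
    # Update (analysis): blend the prior with the observation.
    k_gain = p_prior / (p_prior + r)          # Kalman gain, in [0, 1]
    t_post = t_prior + k_gain * (y_k - t_prior)
    p_post = (1.0 - k_gain) * p_prior
    return t_post, p_post

# One assimilation cycle: prior state 295 K with variance 2 K^2,
# shortwave forcing 600 W/m^2, satellite observation 301 K.
t_post, p_post = kalman_step(295.0, 2.0, 600.0, 301.0)
print(t_post, p_post)  # the analysis lies between prediction and observation
```

Note that the analysis variance p_post is always smaller than the prior variance, which is what lets the filter keep issuing estimates even when an observation is later rejected.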
You have a quite narrow Gaussian bell, I would say, that represents the error of measurement in case the satellite is actually measuring the land surface; but you may have the presence of clouds, which may bring a much more substantial error, which, if you look at its sign, will be a positive error. It may reach values of the order of tens of degrees. So the algorithm is mainly based on trying to detect whether the error of the measurement, that is, the difference between the predicted and the measured land surface temperature, is more likely to fall in this range, so to be an instrumental error, or to fall in this range, to be a cloud contamination problem. And of course you can do this by checking the difference between the prior prediction and the measurement: whether this difference falls within a given range that depends on the assigned model and instrumental errors. Here is an example. In this graph you see different things marked with different symbols. The continuous dark line is the prediction of that very simple model I showed you before, which of course has this behavior because, first of all, it follows the measurements and is also forced by solar radiation. So it picks out the diurnal cycle of land surface temperature forced by solar radiation. The original measurements are the ones marked with circles. The black circles are the ones recognized as valid measurements, because their difference from the prior prediction falls within this range. The others, the white ones, which have a larger distance from the prediction, are marked as cloud-contaminated. What is marked with the diamonds is the final analysis of the Kalman filter, which combines the model prediction with the valid measurements in the optimal sense that you have seen for the Kalman filter.
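The cloud check itself can be sketched as a simple innovation gate: flag the observation when it is implausibly colder than the prior prediction. A minimal sketch under assumed values; the gate width and the one-sided test are illustrative choices (clouds can only lower the measured temperature, whatever sign convention is used for the error).

```python
import math

def is_cloud_contaminated(y_obs, t_prior, p_prior, r_instr, n_sigma=3.0):
    """Innovation test for the bimodal-error cloud filter.

    y_obs   : observed land surface temperature (K)
    t_prior : model-predicted LST (K)
    p_prior : prior state error variance (K^2)
    r_instr : instrumental (clear-sky) error variance (K^2)
    n_sigma : gate width in standard deviations (assumed value)
    """
    innovation = y_obs - t_prior
    sigma = math.sqrt(p_prior + r_instr)   # spread of a clear-sky innovation
    # Clouds only make the scene colder, so gate the negative side:
    # a measurement far below the prediction is flagged, not assimilated.
    return innovation < -n_sigma * sigma

# A cloud top tens of degrees colder than the predicted surface is rejected;
# a small clear-sky residual is accepted and assimilated.
print(is_cloud_contaminated(270.0, 300.0, 1.0, 1.0))  # True  (rejected)
print(is_cloud_contaminated(299.0, 300.0, 1.0, 1.0))  # False (accepted)
```

When the test fires, the filter simply skips the update and carries the prior forward, which is how the analysis shown by the diamonds can continue through cloudy periods.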
And of course, you may have good, reliable analyses (reliable meaning with an estimate of the error that is not too big) even at times where the cloud contamination check marks the measurement as not good. This is essentially using the prediction capability of the Kalman filter. So this algorithm has two advantages: being able to filter out bad, cloud-contaminated measurements, but also providing some estimate of land surface temperature even in cases where there is cloud contamination. Here is a diagnostic of the algorithm done over a given region, the island of Sicily, using a different remote sensing instrument, with higher resolution than the geostationary one, to validate these measurements. To be quick, I will concentrate on this graph here, where we show, as a function of the percentage cloud cover detected by the MODIS verification instrument (the higher-resolution one, which has the drawback of passing over only once a day), the amount of valid LST estimates that you can get as a function of this ground-truth cloud cover. And of course, the higher the cloud cover, the lower the percentage of valid LST estimates you can get; but using this dynamic filtering you almost double the amount of information that you can get with respect to a static cloud filtering algorithm. Okay, let me quickly move to a more complex, nonlinear example, I would say, of how dynamic filtering based on the Kalman principle can be used to improve remote sensing of the land surface for hydrological applications. There is a lot of attention on soil moisture as a state variable in hydrology, not only because it controls many hydrologic processes.
For example, runoff formation depends a lot on the amount of moisture in the soil when rainfall starts; but soil moisture also provides important information to atmospheric scientists and climatologists, because it strongly controls (and we will talk about this later on) the fluxes of heat and vapor from the land surface toward the atmosphere. So soil moisture is a key variable for many applications, but, as I showed before, it is very poorly measured at the ground, not in terms of quality of measurements, but in terms of coverage. So there is a lot of reliance on remote sensing observations, especially in the microwave region, to address this problem, to measure soil moisture. What I'm addressing here is not a piece of research I have taken part in, but it is so important that I am happy to mention it. It is an algorithm that uses a nonlinear ensemble Kalman filter to improve the root-zone soil moisture estimation in the microwave region from the SMAP mission, a mission that was launched in 2015. This algorithm uses as information from the satellite the emitted microwave radiation from the surface; so it uses what is called a Level 1 satellite observation from the mission. It then uses a distributed ensemble Kalman filter with the Catchment land surface model. Here is a sketch of the details of the land surface model. It is a model that covers the whole globe. Here you have maps, for example, of the vegetation types that are considered in this land surface model, which resolves the main processes at the soil (infiltration, evaporation, runoff) and performs a water balance on all these units. And of course, soil moisture is a key state variable in this type of hydrologic model.
Okay, so this algorithm assimilates the radiometric microwave measurements from the satellite into a water-balance-resolving distributed model over the whole globe, with a distributed ensemble Kalman filter. The advantage is not just the predictive capability: by using a hydrologic model, other important information is brought into the estimation system. For example, rainfall. We know that we expect the soil to have more moisture after a rainfall event. If we did not have a hydrologic model to use, we would have no means to bring this very important piece of information into the system. So in this case the model, which needs to be nonlinear to be reliable, has the advantage, again, not only of filtering the satellite measurements, but of bringing into the estimation system much other information, like vegetation and how vegetation responds to precipitation, and precipitation itself. The result is that the improved product, what is called the Level 4 product, has quite a number of advantages with respect to the standard product that uses the brightness temperature only, without a hydrological model to assimilate it with. For example, the observation of the brightness temperature from the satellite has a nominal resolution of 36 kilometers. Initially the mission was designed to provide the product at nine-kilometer resolution, because the satellite, when it was launched, had on board a dual sensor, a passive one and an active radar one, whose combination was designed to provide a higher resolution of nine kilometers. Now, the system of assimilation into the hydrologic model allows providing a nine-kilometer resolution even with the passive sensor alone. Why? Because the active radar sensor had a major failure a few months after the launch and is not working. So the mission is able to provide the designed resolution even with half of its sensing capability shut down.
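The ensemble update at the core of such a scheme can be sketched for a single grid cell. This is a toy stochastic ensemble Kalman filter for a scalar soil moisture state with a made-up brightness temperature operator, not the SMAP radiative transfer model; every number here is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def tb_operator(sm):
    """Toy nonlinear observation operator: brightness temperature (K)
    decreasing with soil moisture. Purely illustrative."""
    return 300.0 - 120.0 * np.sqrt(sm)

def enkf_update(ensemble, y_obs, obs_err_std):
    """Stochastic EnKF analysis step for a scalar soil moisture state."""
    n = ensemble.size
    hx = tb_operator(ensemble)                 # member-wise predicted obs
    # Sample statistics replace the linear-Gaussian covariances.
    cov_xy = np.cov(ensemble, hx)[0, 1]
    var_y = hx.var(ddof=1) + obs_err_std ** 2
    gain = cov_xy / var_y
    # Perturb the observation for each member (stochastic EnKF).
    y_pert = y_obs + rng.normal(0.0, obs_err_std, n)
    return ensemble + gain * (y_pert - hx)

# Prior ensemble around dry soil; the synthetic satellite sees the colder
# brightness temperature of wetter soil and pulls the ensemble upward.
prior = np.clip(rng.normal(0.15, 0.03, 100), 0.01, 0.5)
posterior = enkf_update(prior, y_obs=tb_operator(0.30), obs_err_std=2.0)
print(prior.mean(), posterior.mean())
```

The point of the ensemble formulation is visible here: the nonlinear operator is only ever evaluated forward, member by member, so no linearization of the radiative transfer is needed.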
Most important for applications: microwave remote sensing detects moisture only in the very top five centimeters of the soil. But most of the hydrologic processes, for example water that is consumed and transpired by plants, reside in much deeper layers, down to what is called the root zone of the soil, which may be of the order of half a meter, one meter, or even more. Assimilating the five-centimeter measurements into a hydrologic model that deals with the whole soil column provides an estimate of soil moisture even in the deeper layers of the soil: a product that is much more readily usable by hydrologic applications. Also, the algorithm for direct retrieval of soil moisture in the microwave region works well only where the surface has roughness and coverage characteristics of a certain kind, not everywhere. Here on the left you have the mask, where in dark are the regions where the retrievals of soil moisture from the microwaves are considered reliable. Using a hydrologic model, the assimilation produces estimates over the whole globe, even where the microwave observation is unreliable because the vegetation is too dense or the topography is too varied. Even there you get some estimate. Of course (and this is also information that can be derived from the ensemble Kalman filtering algorithm) the error of the estimate will be very different from time to time and from region to region. Okay, before switching to a different type of problem: do we have any questions so far? You can ask some questions now if you want, or otherwise we will of course leave other questions for the remainder of the talk. Fabio, I have no question, but it is now around five minutes before the first break. Would you like to make a break right now, or will you speak another five, ten minutes and then we make the break? I would prefer to speak another ten minutes. Okay, no problem at all, just asking.
Okay, let's move again to state estimation with a Kalman-based algorithm, but now with the problem of using ground observations, sparse observations. And the problem here, one of the many hydrologic problems, is, for example, detecting in near real time the extent of a flood. You can monitor the extent of a flood either via water levels on the rivers, what are called river gauge data, or also from satellites. As we will see, one is sparse in space: as I mentioned before, there are very few river gauging stations around. The other is sparse in time: to clearly detect the flood extent you need quite high-resolution remote sensing, which is not available every day. And there is also the problem of cloud contamination here too. Now I will try to enter into some more technical details (I will interrupt in the middle and we will recover this argument after the break) to show how a general ensemble Kalman filter scheme needs to be somehow customized for different types of applications. Let's start, for example, by trying to estimate the extent of the flood during a flood event using river gauging. Here you have the problem that the river gauging is sparse, and you want to use that information to update your estimate of a field which is instead spread in space: a flood will inundate quite wide areas. So you need to treat this problem specifically, with what is called localization: how to use a point sample to update a nearby field. The other important thing is how you consider the general uncertainty in the models. Here, for example, we are considering mainly uncertainty in the main hydrologic input, the rainfall amounts that will produce the runoff, and also, as I mentioned in the beginning, the uncertainty in the environment that contains the flow, for example the channel friction.
What I'm showing here is a case study of a 2012 flood event in the Tiber Valley upstream of Rome. Here you have a map of the area, with in yellow the extent of the flood as detected by one Landsat image that was available at that time. On this map you have the locations of five different gauging stations that it was possible to assimilate into the 2D hydraulic model, to constrain, or to have a better estimate of, the flood extent. Okay, a few more minutes. To give you some hint of what you may call a customization of the Kalman filter when you enter a specific problem, I mentioned the problem of localization. How do you treat the fact that you have information from a river gauge in one computational cell, and you want to use it to update the river stage along the channels, but also at some distance on the floodplain, though not too far away? If you try to update too far away, at an unreasonable distance, what you get is not a better state estimate but error growth. In doing this we exploit the advantage that in surface hydrology, especially where the river is, we more or less know the preferred path of the flow. So we can efficiently constrain how the information at a measurement point will propagate, first of all along the channel network. Then we use this sort of kernel (here you see its shape) to update the computational cells that are closer to the measurement station. So at any computational cell, once you have the measurement from the ground station, you update the state in the Kalman filter sense depending only on the distance, weighting the contributions of the two nearby ground stations. Here are some results, and then I will stop for the break. At three different stations you see the comparison of the flood wave; in dashed black are the observations.
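The distance-based weighting just described can be sketched as follows. An exponential taper is assumed here purely for illustration (the lecture only shows the kernel's shape), and the distances are intended as along-network distances, not straight-line ones.

```python
import numpy as np

def taper(d, length_scale):
    """Distance-based localization taper (an exponential kernel is
    assumed here as a stand-in for the kernel shown in the lecture)."""
    return np.exp(-np.asarray(d, dtype=float) / length_scale)

def localized_increment(cell_dists, station_increments, length_scale):
    """Update one computational cell from its two nearest gauge stations,
    weighting each station's Kalman analysis increment by distance.

    cell_dists         : along-network distances to the two nearest stations
    station_increments : analysis increments computed at those stations
    """
    w = taper(cell_dists, length_scale)
    if w.sum() == 0.0:
        return 0.0  # cell effectively out of reach: leave it untouched
    return float((w * np.asarray(station_increments)).sum() / w.sum())

# A cell 2 km from one station and 8 km from the other mostly follows
# the nearer station's correction (length scale 5 km, assumed value).
print(localized_increment([2.0, 8.0], [0.5, -0.1], length_scale=5.0))
```

Restricting the weights to the channel network is what prevents a gauge from "correcting" hydraulically unconnected cells, which is the error-growth failure mode mentioned above.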
Okay. The open-loop mean is the blue line, where there is, probably due to wrong model error assumptions, or a wrong model structure, or insufficient data on the rainfall input, an underestimate of the flood peak at all stations. If you assimilate only one gauge station, it is not enough to get a sensible correction, while you get quite a significant improvement, with an almost perfect match of the peak flow prediction, if you assimilate all four gauging stations. Of course, this improvement can be measured and detected on the flow along the channel; but just assimilating the river data along the channel does not provide enough information to also improve the estimation on the floodplain. In order to improve this we need to use satellite estimation as well, and I will show this after the break. Okay.