Okay, so welcome back everyone. Let's continue with the second part. We will now focus on the different data assimilation strategies that have been developed over the last decade, I would say. As we will see, there are essentially three: data insertion, inversion of the source term, and ensemble-based sequential data assimilation.

Just to recap a little what we said before: the use of data assimilation in our field is quite recent, about one decade. It was essentially motivated in the aftermath of the 2010 Eyjafjallajökull eruption in Iceland, which revealed some flaws in the operational forecasting strategies and clearly showed the need for more quantitatively based forecasts. The roadmap of the scientific community was set up during two very important meetings that took place in Geneva in 2010 and 2011, which gathered part of the scientific community and, let's say, decided on the steps and strategies the community should follow in order to achieve this change of paradigm in the forecasting of volcanic clouds.

Remember that the problem essentially consists of finding an optimal initial condition for the models, and that we have a lot of uncertainties, particularly in the source term, which have a clear influence on the results of the forecast. The different data assimilation strategies that have been tested during this decade, mostly at the scientific level, try to circumvent these uncertainties, normally by using observation data from satellite detection and retrievals.

So, different strategies of increasing complexity have been proposed in many scientific studies and publications, starting from the simplest one, which is data insertion, and then moving towards more sophisticated data assimilation strategies such as the inversion of the source term or the ensemble-based methods. I have to say that, although there has been a lot of progress at the scientific level, the transfer of all these strategies into operational forecast setups is quite slow, and we are still on it, for several reasons that we will discuss.

So let's start with data insertion, which is the first data assimilation strategy and the simplest one. The idea is very simple: it just consists of providing the initial condition by defining a virtual source away from the volcano and deriving this virtual source from one or several satellite retrievals. So it requires deriving concentrations from the two-dimensional column mass load retrievals. And then, of course, we also need to interpolate the observations from the satellite grid to the model grid, and we have to do it properly, in the sense that we have to ensure, for example, conservation of mass; it cannot be just a simple interpolation, so we need some proper interpolation mechanism (see the sketch below).

Nonetheless, this is quite straightforward, let's say. You can see here some examples. The image on the left is a full 3D model simulation without data insertion; in this case it is not ash, it is the SO2 cloud of the 2019 Raikoke eruption. This is what you get, for example, without data insertion. And this is the result of a simulation that used data insertion before this snapshot: the insertion was done 12 hours earlier, so this is what you have 12 hours after the insertion.
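To make the interpolation step concrete, here is a minimal sketch of a mass-conserving regridding of a column mass load retrieval onto the model grid. This is not taken from any specific operational code; the function name and the simple bilinear-interpolation-plus-rescaling approach are illustrative assumptions.

    import numpy as np
    from scipy.interpolate import RegularGridInterpolator

    def regrid_column_load(sat_lat, sat_lon, sat_load, sat_cell_area,
                           mod_lat, mod_lon, mod_cell_area):
        """Interpolate a satellite column mass load (e.g. g/m^2) onto the
        model grid, then rescale so that the total cloud mass is conserved.
        Coordinates must be ascending 1-D arrays; cell areas are in m^2."""
        interp = RegularGridInterpolator((sat_lat, sat_lon), sat_load,
                                         bounds_error=False, fill_value=0.0)
        LAT, LON = np.meshgrid(mod_lat, mod_lon, indexing="ij")
        pts = np.stack([LAT.ravel(), LON.ravel()], axis=-1)
        mod_load = interp(pts).reshape(LAT.shape)
        # total mass = load (g/m^2) * cell area (m^2), summed over each grid
        sat_mass = np.sum(sat_load * sat_cell_area)
        mod_mass = np.sum(mod_load * mod_cell_area)
        if mod_mass > 0.0:
            mod_load *= sat_mass / mod_mass  # global mass-conservation correction
        return mod_load

A plain bilinear interpolation alone would not conserve mass when the satellite and model resolutions differ, hence the final rescaling step.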
And this is a comparison with one observation, one retrieval from the Himawari satellite. As you may see, when you use data insertion, you improve the forecast notably with respect to the simulation without data insertion.

We can put some numbers on this and look at some forecast skill scores. When we do that, we see that in general data insertion does improve the skill, but it is far from perfect. What we can see here is the evolution of different metrics. Look at the red line, which is the SAL metric; it is a composite metric made of three components, what we call the structure, amplitude and location of the cloud. The important point here is that a value of zero is the optimal one, and the larger the value, the worse the metric. The black line is another metric, the figure of merit in space, which essentially is the ratio between the intersection and the union of the simulated and observed clouds; this one has an optimal value of one, and the worst case is zero. (A minimal sketch of this second metric appears after this exchange.)

What we have here are the time series of these metrics for one particular simulation of this Raikoke cloud. At t = 0 we have the insertion time, and then we have a two-day, that is, 48-hour forecast. We can compare what happens when we do data insertion, on the top, and without data insertion, at the bottom. For example, if we look at t = 0, of course both metrics are perfect, because the observed and simulated clouds are the same; we are doing an insertion here, so we have a SAL of zero and a figure of merit in space of one. Of course, as the forecast evolves in time, these metrics start to degrade and take worse values. And if we look at one snapshot after one day, at hour 24, this is what you have in the images here on the right. So in the data insertion case you can compare the model, which is the red contour, with the observations, which are the blue one, and this is the difference when you have data insertion, here on the top, and without data insertion below.

Audience: No, no, I am very sorry, may I interrupt you, because it is a very important issue. This data insertion: where do you insert? Do you insert only within some specific known areas of measurements? How is it done? What does insertion mean exactly?

Lecturer: Insertion means you have a satellite retrieval such as, for example, this one; you can have several retrievals. This is, for example, another eruption, in Chile: at t = 0, at t + 24 hours (one day after), two days after, and so on. Take this one, for example: we have this observation and we give it as the initial condition. So we don't put any condition here at the volcano; we just impose this initial condition and let this cloud evolve with time. So essentially it is an initial condition.

Audience: Okay, thank you very much, now I got the message, because I didn't know where exactly you inserted.

Lecturer: Yes, so for example, this is what we did in this particular case. We have this retrieval; this is the initial condition. These are the results of the simulation at different time instants of the model with data insertion. As you can see, at t = 0 the model and the observation are the same, because we initialize the model with the observation. And then we let this evolve, and we get this after one day, and this after two days.
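As referenced above, here is a minimal sketch of the figure of merit in space. The column-load threshold used to define the cloud masks is an assumed value for illustration:

    import numpy as np

    def figure_of_merit_in_space(model_load, obs_load, threshold=0.2):
        """FMS: intersection over union of the simulated and observed cloud
        masks, both defined by a column-load threshold (assumed, in g/m^2).
        FMS = 1 is perfect overlap; FMS = 0 means no overlap at all."""
        model_mask = model_load > threshold
        obs_mask = obs_load > threshold
        intersection = np.logical_and(model_mask, obs_mask).sum()
        union = np.logical_or(model_mask, obs_mask).sum()
        return intersection / union if union > 0 else np.nan

At the insertion time the two masks coincide, so the FMS is one by construction, and it degrades as the forecast evolves, exactly as in the time series shown.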
Okay, and we can then later on compare what we get after one day with another independent observation after one day, or after two days. And you can see that this is much, much better. So we can actually compare here the model with data insertion against the model with no data insertion, where the model is always the red contour and the blue contours are the observations. This one is a simulation in which we give the source term at the volcano, and this one is a simulation where we give the initial condition as the retrieval.

As we can see, as both simulations, with and without data insertion, evolve with time, we have some differences, particularly here: we can better capture this part of the cloud with data insertion. Let me also draw your attention to this part of the cloud. This is all the mass that has been erupted after the insertion, and here, of course, the simulations with and without insertion are the same, because the insertion was applied only to this part of the cloud; the eruption was still ongoing when we did the insertion. It means that after two days, for example, both simulations, with and without insertion, are actually the same in this region, because the mass that we inserted at the time of the insertion has already left the computational domain, and we still have the eruption going on in both simulations; after 48 hours they are the same.

So let's look a little at the pros and the cons of this strategy. The main advantage is that, let's say, it removes the uncertainty in the eruption source parameters. This is particularly true when the cloud is detached from the vent at the time we do the insertion. What does it mean? It means that if, when we do the insertion, the eruption has already ended and we have a cloud that is detached from the vent, then we don't need any information on the eruption source parameters. In contrast, if at the insertion time the eruption is still going on, we will have one part of the cloud in which we have assimilated data, and one part of the cloud in which we have not assimilated data yet. So there we cannot, let's say, eliminate this dependency on the uncertain eruption source parameters if the eruption has not finished at the time of the insertion. Another advantage is that this strategy is in principle quite easy to implement.

However, we have several cons. The first one is what I already mentioned: if the eruption has not finished, if the eruption is ongoing at the insertion time, we still cannot eliminate 100% of the dependency on the eruption source parameters, and we need them anyway to forecast into the future, because we need to know the mass or the properties of the emission in the future. We can only assimilate the past and the present, but we cannot assimilate the future; so if the eruption is going on, we cannot eliminate this dependency. The other limitation is that the satellites can only see, let's say, part of the picture. For example, part of the cloud might be obscured at the insertion time: if we have a scene with a lot of overlying meteorological clouds, or we have ice coating the particles, we can have a lot of false negatives.
It means that the ash may be there, but the satellite is not able to retrieve it, and we would be doing an insertion with, let's say, a lot of errors in the observations. One important thing in our field is that the observations are not error-free; we can have a lot of false negatives under certain meteorological scenarios.

The second drawback, let's say, is that passive sensing, as I mentioned before, is not vertically resolved. We have a zenithal view of the cloud: we see the cloud from above, and we know only the column mass load. Unless we have additional co-located observations, we have no information about the vertical structure of the cloud. It means that when we do data insertion, we need to assume some cloud thickness and, assuming a thickness, we can compute some kind of vertically averaged concentration, and this is what we insert. Because we are inserting into the model a 3D cloud from a 2D observation, we need additional hypotheses on how the mass is distributed in the vertical; we do not see that from the satellite in the case of passive sensing observations.

Another thing is that the satellite actually sees only part of the cloud: it sees only the fine material. Typically, satellites can observe particles up to a few tens of microns in size, but we do not see larger particles. So this data insertion technique works well for clouds that are more distal. In proximal clouds, where a substantial fraction of the mass is in particles that are, let's say, in the millimetre size range, we are losing a lot of the mass in the observations, so we cannot assimilate it.

So let's see how we can address these limitations of data insertion. The first limitation I mentioned is: what happens if parts of the cloud are obscured at insertion time? We can somehow circumvent this with a multiple-retrieval strategy. For example, let's imagine that we do an ensemble of forecasts, and we have a number of observation times: one observation at time one, another at time two, another at time three, and so on. This would be the analysis. Now let's imagine that we do an ensemble forecast in which the first ensemble member is initialized with the first observation and we just run the model forward in time. We do the same with the second member, initialized at another time instant with another observation, then a third ensemble member, and so on. In this way we construct an ensemble of runs, each initialized with a different observation, and then we do the forecast as usual. For the output, we can take some combination of ensemble members: for example, we could take the ensemble mean, or, if we want to be conservative, the maximum among all the ensemble members, and so on (see the sketch below).

The advantage of doing this is that if we consider just one particular insertion time, maybe there are a lot of meteorological clouds obscuring part of the ash cloud, and we do not see it. Probably the scene is different at another time, and the parts of the cloud that were hidden before are visible there. So we, let's say, become less dependent on meteorological clouds obscuring the ash cloud, and this is a strategy that works quite well.
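As a concrete illustration of the two ingredients just described, here is a sketch that (a) turns a 2D column load into a 3D concentration under an assumed cloud-top height and thickness, and (b) builds a conservative composite from an ensemble of insertions at different times. The forward-model call run_model, the vertical grid and the numeric values are hypothetical placeholders:

    import numpy as np

    def load_to_concentration(column_load, z_edges, z_top, thickness):
        """Distribute a 2-D column mass load (g/m^2) uniformly over an
        assumed layer [z_top - thickness, z_top], returning a 3-D
        concentration (g/m^3). z_top and thickness must be assumed,
        since passive sensors do not resolve the vertical structure."""
        nz = len(z_edges) - 1
        conc = np.zeros((nz,) + column_load.shape)
        z_bot = z_top - thickness
        for k in range(nz):
            lo, hi = z_edges[k], z_edges[k + 1]
            overlap = max(0.0, min(hi, z_top) - max(lo, z_bot))
            if overlap > 0.0:
                # fraction of the column mass in this layer / layer depth
                conc[k] = column_load * (overlap / thickness) / (hi - lo)
        return conc

    def composite_forecast(retrievals, insertion_times, valid_time, run_model):
        """One insertion per retrieval time, then a conservative pixel-wise
        maximum over all members valid at the same time. run_model is a
        hypothetical forward-model call."""
        members = []
        for load, t0 in zip(retrievals, insertion_times):
            ic = load_to_concentration(load, z_edges=np.arange(0.0, 16e3, 1e3),
                                       z_top=12e3, thickness=2e3)
            members.append(run_model(ic, t_start=t0, t_end=valid_time))
        return np.max(members, axis=0)  # keeps ash seen by any member

Taking the pixel-wise maximum is the conservative choice discussed next.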
The second limitation is about the vertical resolution. This, as I mentioned before, can be addressed by co-locating these observations with polar-orbiting satellite observations.

This is one example to illustrate what I was mentioning. These are UK Met Office NAME model simulations that combine six different retrievals in the analysis. It means that we have an ensemble of six members, each one initialized with a different retrieval; the analysis spans several hours, in this particular case a window of five hours. Then we run the six ensemble members, each initialized with a different observation. In the forecast phase we can consider, for example, the median of the ensemble, or the mean, or take the maximum value. This is a kind of composite image in which, at every pixel, we take the maximum among the six ensemble members, so here we have a conservative approach. This normally scores worse, but it has the advantage of being more conservative: if there was ash that was obscured in some of the ensemble members, when we take the maximum we will see it. And actually, when we compare with the retrieval at the same time instant, we can see that this avoids the problem of false negatives, which, as you can imagine, is something that we really do not want in the case of aviation: we want to be conservative, because what we cannot afford is a situation in which we do not detect ash while an airplane is flying there.

The second mechanism is the source inversion. This one essentially consists of finding the optimal eruption source parameters by best fitting one or a series of observations. This is similar in spirit to data insertion, but it has some advantages with respect to the data insertion mechanism. The first one is that when we do the inversion, we also start from an observation, but we try to invert, integrating backwards in time, to see which emission profiles, let's say, best fit the observation. When we do this, we are explicitly solving for the vertical structure of the cloud. So, as opposed to what happened with data insertion, when we do the inversion we do have a 3D reconstruction of the cloud: it is not that we have a 2D observation and then have to infer the three-dimensional structure; in the case of source inversion we explicitly solve for that.

But source inversion has a second additional advantage: because we invert for the emission profiles, we already have them, and they can then be used to forecast into the future. So if, when we do this inversion, the eruption has not finished yet, we already have the emission profiles and, if we assume, for example, that they will stay the same in the future, we can integrate the model forward in time using these emission profiles at the volcano. This is something that we do not have with data insertion, because with data insertion we just impose an initial condition far away from the vent, but we do not retrieve any information about the volcano itself; with source inversion, instead, we try to find the emission profiles that best fit the observations. Okay, so this is how it works.
Many of these strategies, let's say, are based on the so-called analytical Bayesian inversion, which was originally proposed by Seibert almost 20 years ago now. The idea is simple: it is an algorithm that minimizes a cost function. First, we need to know an a priori solution for the sources. Then we apply the Bayes theorem in a formulation involving uncertainties in these a priori sources and in the observations.

So imagine that we have a volcano and we release from this volcano a series of point sources. I just draw here six sources, but actually we have to use hundreds of them, with this vertical profile; this is a guessed profile that we have to assume, this is the a priori source. We run the model up to the analysis time. Here we have one observation of this cloud and we have the simulated cloud. Then we do a correction: knowing this a priori solution and knowing the errors of the observation, we can apply the Bayes theorem, do a correction, and get a corrected profile. So the idea is simple: we decompose the emission as a set of N point sources above the volcano, we run this up to the analysis time, and then we apply a correction, essentially to the mass that we assign to each of these sources, to retrieve the profile, because each source has a different mass. So we solve a linear system of equations to get the individual mass that is released by each of these point sources.

In the case of volcanic clouds this was first tested in 2008 by Eckhardt and co-workers, who in this particular case used the FLEXPART Lagrangian model with SO2 observations. They defined a cost function with essentially three contributions: a functional with three components, J1, J2 and J3, and we want to minimize this functional, so that the corrections that we apply to the a priori sources give a minimum value of this function. J1 considers the misfit between model and observations; the corrections we look for, the unknowns of the resulting system, are the corrections applied to these sources. J2, knowing the a priori error, penalizes the deviation from the a priori value. And in this case they also imposed, by construction, that the resulting distribution has a minimal deviation from a smooth one. The point here is that you end up with a linear system of equations to solve, minimizing this functional to find the corrections, in order to get the vertical profile that best matches the observations (a plausible reconstruction of this functional in standard notation is given below). As I said, at the end you have to run one model simulation and solve one linear system of equations that gives you the increment with respect to the a priori solution that you have assumed.

These are, for example, results for a particular case: they applied this to one eruption in Africa, for an SO2 cloud. Just to give you an idea, this plot shows the distribution of mass, in tonnes per second, versus altitude, so essentially the vertical emission profile. That is the a priori solution that they assumed.
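In standard data assimilation notation, the functional can be written as follows. This is my reconstruction for clarity, assuming the usual symbols, not necessarily the exact notation of the original paper:

    J(m) = (G m − y)ᵀ R⁻¹ (G m − y)  +  (m − m_a)ᵀ B⁻¹ (m − m_a)  +  ε ‖D m‖²
           [J1: misfit to observations]  [J2: deviation from a priori]  [J3: smoothness]

Here m is the vector of masses assigned to the N point sources, G is the source-receptor matrix (the simulated contribution of a unit emission from each point source to each observed pixel), y are the satellite observations, R and B are the observation and a priori error covariances, D is a discrete second-derivative (smoothness) operator, and ε is a weight. Setting the gradient of J to zero yields the linear system of equations mentioned above.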
Then, with this a priori solution, you integrate the model forward in time up to the analysis time, compare with the satellite, minimize this functional, and you get the corrections. Depending on the type of observation that you use, these are the different results for different satellite platforms: you get a correction of this initial profile and you obtain something like this; for example, with one instrument you get the yellow profile, and so on with the different satellites, as you can see here. This would be quite difficult to get without doing the inversion, because the emission profile is quite particular: you have injection at different layers; this is the most prominent one, but you also have injection at two or three other heights. And this gives a very particular dynamics. It is important to take this into account because normally in the atmosphere we have wind shear: depending on the injection height, the flow can go to the north, to the south, to the east, and so on, because the direction of the wind strongly depends on height. So it is very important to get these injection peaks right, because in case of wind shear they will give a very different dispersal pattern.

And this is the result when you apply these emission profiles and run the model forward in time; you can get, for example, something like this. This compares the observations, which are the colored contours, with the model, which is the contour delineated by the black line, at different time instants. As you can see, when you do the source inversion, you get much, much better results. Applying the Bayes theorem and doing this analytical Bayesian inversion has been done by many, many authors, and it is quite successful. Here is just another example, with the FLEXPART model, by Kristiansen and co-authors: you can see here the profiles obtained after the inversion and then, when you integrate these profiles forward in time again, you get something like this, with a comparison against the observations.

In these works they assumed that the profiles were steady, meaning that, once obtained, the profiles do not evolve with time. But later on some authors did time-dependent inversions. For example, in this work they did an inversion with a time-dependent source. This is the a priori that they assume: on this axis you have time, here you have height, and these are the a priori emissions. So they did this time-dependent inversion considering several observation time instants, and this is the correction, the a posteriori emission profiles that you get. So you get a time-dependent source, which is much closer to reality, and once you have that you can integrate forward in time again.

Just to give you some numbers: in this particular study, this data assimilation strategy reduced the root mean square error of the FLEXPART model by around 30%. These are typically the numbers that we get when we apply this. Of course, it depends a lot on the specific case, in some cases 10%, in others 40%, but just to give you an order of magnitude of what you can get when you do source inversion compared to assuming the a priori emissions, that is, whether you run just with the a priori emissions or you run a forecast with these, let's say, improved vertical profiles.
In this case, typically, you can expect a reduction of the root mean square error by around 30%. Just another, more recent example, in which the authors did a joint inversion: before, we saw inversions for SO2 clouds or for ash clouds, but here Moxnes and co-authors did a first joint inversion, a simultaneous inversion of SO2 and ash observations. Again, they have to assume an a priori: this is time, height, and the total mass; the vertical profiles for SO2 and for ash. And this is what you get after the inversion: these are the a posteriori profiles for SO2 on the left and for volcanic ash on the right. As you can see, there is a substantial difference after this optimization procedure; you are able to obtain vertical profiles with a much, much higher resolution, and this improves the forecast substantially.

Just a comment: this analytical Bayesian inversion, in which you need to release a number of point sources and integrate them in time, is very well suited for Lagrangian models, because you just need to integrate one model run in time up to the analysis time. But the question is what happens with Eulerian models, because remember that in the analysis we need to identify, at each pixel of the simulated cloud, the original source contribution. This is quite easy in the case of Lagrangian formulations, because we can track the Lagrangian particles or trajectories, and at the analysis time we know exactly the contribution from each of them. But this is not true in Eulerian models, because there we do not have trajectories, we have grid points, and, given the concentration at a point, we do not know how to separate the different contributions of the elementary point sources. In other words, to make it simple: in Lagrangian models we just need to run one single forward simulation, but in Eulerian models the situation is much worse, and if we have 100 sources, we have to run 100 model simulations, each with a single point source, and then combine them. So this is, of course, far from optimal.

Because of these limitations in applying the Bayesian formulation to Eulerian formulations, some authors introduced a very different source inversion approach that is valid also for Eulerian models and does not rely on the Bayesian formulation. The idea is: we can characterize the emission profile by means of some known functions that depend on a few parameters, do an ensemble of runs, and then try to find pattern correlations as a measure of the model-observation agreement. So in this approach the inversion essentially consists of finding which combination of these source parameters has the maximum pattern correlation with the observed images. This has several advantages: first of all, we do not need any assumption about the uncertainties of the model and of the observations. Remember that in the Bayesian approach we need to bound the errors and prescribe the a priori uncertainties of both model and observations; here we do not need that.
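A minimal sketch of this pattern-correlation ranking follows. The choice of parameters (a column height and a mass eruption rate) and the run_model call are hypothetical placeholders; the real parametrizations depend on the study:

    import numpy as np
    from itertools import product

    def pattern_correlation(model_load, obs_load):
        """Pearson correlation between the simulated and the observed
        column-load patterns, used as the agreement measure."""
        return np.corrcoef(model_load.ravel(), obs_load.ravel())[0, 1]

    def rank_members(obs_load, heights, mers, run_model):
        """One forward run per parameter combination (ensemble member),
        ranked by pattern correlation with the observation; no assumption
        on model or observation uncertainties is needed."""
        scores = []
        for h, mer in product(heights, mers):
            sim_load = run_model(height=h, mer=mer)  # hypothetical call
            scores.append(((h, mer), pattern_correlation(sim_load, obs_load)))
        scores.sort(key=lambda s: s[1], reverse=True)
        return scores  # best-fitting parameter combinations first

The top-ranked members can then be kept as the reduced sub-ensemble used for the forecast, as described next.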
Here we are doing an ensemble that explores the range of values of the different parameters, and then, for each ensemble member, we can compute a kind of pattern correlation, how correlated the simulation is with the observations; we can rank the different ensemble members and find the combination of ensemble members that gives the optimal pattern. But not only that: apart from finding this optimal combination of ensemble members, we can then do the forecast with a subset of the ensemble members. For example, during the analysis we can run with, I don't know, 100 or 1000 ensemble members, do these forward runs, do the assimilation, find which are the 10 or 20 best ensemble members, or the single best one, and then do the forecast with this limited subset of ensemble members, which is more optimal in computational terms.

These are, for example, some results that these authors obtained after applying this pattern-correlation inversion strategy for the source term, using the HYSPLIT model driven by the ACCESS regional meteorological model, the Australian model. On the left you see retrievals at different time instants, and here you can see the optimal combination of ensemble members that gives the best fit to the observations.

Okay, so let's move now to the last data assimilation strategy, which is sequential data assimilation; this is a more standard one, and probably you are more familiar with it. This data assimilation problem, as you know already at this point of the course, is characterized by a sequence of steps that involve forecast and analysis, in which the a posteriori estimate is obtained from the a priori forecast. The sequential data assimilation techniques that are applied to volcanic clouds are mostly based on Kalman filters, which, as you know, represent the optimal sequential technique for linear dynamics and Gaussian errors (the standard update equations are recalled below). The problem with the Kalman filter as originally proposed back in the 1960s is that it is, as you know, not feasible for real geophysical systems, which have a very high number of dimensions. This is why the ensemble-based Kalman methods appeared and became very popular: in this case the probability distributions are approximated by an ensemble of system states, and the covariance matrix is approximated from the ensemble.

Fu and co-authors were the first to apply these more classical sequential data assimilation strategies to volcanic clouds. These are, for example, some results when they applied the ensemble transform Kalman filter to the LOTOS-EUROS transport model: three different time instants, and a comparison of the forecast without assimilation, on the top, and with assimilation, applying the ensemble transform Kalman filter. As you can see, there are some differences and, in general, the forecast also improves.

And finally, we have implemented in FALL3D an ensemble-based data assimilation system based on PDAF, the Parallel Data Assimilation Framework, which allows doing all this assimilation in parallel. What we do here is generate the ensemble by perturbing the eruption source parameters and also some meteorological variables, in particular the components of the horizontal wind.
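For reference, these are the textbook Kalman analysis equations that the ensemble methods approximate; this is generic notation, not tied to any of the specific implementations mentioned here:

    x_a = x_f + K (y − H x_f)
    K   = P_f Hᵀ (H P_f Hᵀ + R)⁻¹
    P_f ≈ (1 / (N − 1)) Σᵢ (xᵢ − x̄)(xᵢ − x̄)ᵀ

where x_f and x_a are the forecast and analysis states, H is the (here linear) observation operator, R is the observation error covariance, y is the observation vector, and the forecast covariance P_f is approximated from the N ensemble members xᵢ with mean x̄.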
These are, for example, results for a synthetic test. Here you have a synthetic truth: this is a cloud that we know, because this is not a real case, it is a twin experiment, in which you have a truth and from this truth you generate synthetic observations by adding Gaussian random noise. So we generate the observations, then we do the assimilation and the analysis with the synthetic observations, we run different assimilation cycles, and you can see how the root mean square error of this data assimilation strategy decreases with the cycles. We can also look at the spread of the ensemble.

I have to say that here we applied a local filter, so we do not do the assimilation everywhere, because the field is zero almost everywhere. One of the problems that we have with the assimilation of volcanic clouds is that what we assimilate, which is concentration, is not continuously distributed in space, as happens with the assimilation of other aerosols, for example in atmospheric chemistry; we have a very localized field, and so it is worth applying local filters (a minimal global sketch of the analysis step, without this localization, is given below).

Another example, for another volcanic cloud, where we have satellite retrieval observations at three different time instants; here we have the free run and the results of the analysis, just to illustrate a little how the simulations change when we do these data assimilation cycles. This is a comparison between two clouds, for the Raikoke eruption again, and you can compare here, at different time instants, the observations with the different data assimilation cycles, just to illustrate the order of magnitude of what you can expect when you apply these assimilation cycles compared with a free run. And here are the results of applying this local ensemble transform Kalman filter with different localization radii; the radius is essentially a parameter that controls the size of the local filter. In general, what we find is that the analysis errors decrease by around 50% relative to the free run, that is, without assimilation, when we do this type of ensemble-based sequential assimilation.

Nonetheless, this also has some issues. One of the problems is that in the case of volcanic aerosols we have a non-Gaussian distribution of errors: we have very skewed distributions, and this is an issue because Kalman-like filters assume a Gaussian distribution of errors and also linearity, and in our case this is not always satisfied. So in general we find a non-optimal, or sub-optimal, performance of all the ensemble Kalman filters that we have tried. It means that when we do the analysis we can get, for example, artificial negative concentrations that we have to remove, and the origin of this is again the non-Gaussianity.

Another issue that is maybe worth commenting on is that we assimilate concentration, but it would probably be better to assimilate other variables, such as radiances or aerosol optical depth. However, if we do this, the advantage is not clear, in the sense that it would introduce nonlinear observation operators. In the case of assimilating concentration, linearity is guaranteed.
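To make the analysis step tangible, here is a minimal stochastic (perturbed-observations) ensemble Kalman analysis in the spirit of a twin experiment. It is a global sketch under a linear observation operator, without the localization just discussed, and it is not the PDAF/LETKF implementation itself:

    import numpy as np

    def enkf_analysis(X, y, H, obs_err_std, rng):
        """One perturbed-observations EnKF analysis step.
        X: (n, N) ensemble of state vectors (e.g. flattened concentrations)
        y: (p,) observations; H: (p, n) linear observation operator."""
        n, N = X.shape
        A = X - X.mean(axis=1, keepdims=True)      # state anomalies
        HX = H @ X
        HA = HX - HX.mean(axis=1, keepdims=True)   # observed anomalies
        R = (obs_err_std ** 2) * np.eye(len(y))
        # ensemble Kalman gain: K = P Ht (H P Ht + R)^-1 with P from the ensemble
        K = (A @ HA.T) @ np.linalg.inv(HA @ HA.T + (N - 1) * R)
        # one noisy observation realization per member (twin-experiment style)
        Y = y[:, None] + obs_err_std * rng.standard_normal((len(y), N))
        Xa = X + K @ (Y - HX)
        return np.maximum(Xa, 0.0)  # crude removal of artificial negative values

    rng = np.random.default_rng(0)  # usage: Xa = enkf_analysis(X, y, H, 0.1, rng)

The final clipping line is exactly the kind of ad hoc fix mentioned above: with skewed, non-Gaussian errors, the Gaussian analysis can produce negative concentrations that have to be removed.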
But if we assimilate other variables, this is not the case and, even if in theory it is potentially better, it may lead to some problems. This is very much work in progress, I would say; this is the state of the art of what we have now.

So, just to conclude and recap a little. Remember that in volcanic cloud forecasting the key point is the source term, and this is very uncertain. In the aftermath of the 2010 Eyjafjallajökull eruption there was a change of paradigm: before that we had qualitative forecasts, and after that there was the need for more quantitatively based forecasts. So we faced this issue of how to better constrain and better quantify the source term, and for that, data assimilation has been, let's say, strongly used, at least at the research level, as a solution to this problem. This has brought a lot of work, a lot of contributions, with substantial scientific progress on the different data assimilation strategies that have been proposed.

Data insertion is the simplest one, and it works quite well in some cases. Source inversion is also very popular, in particular for Lagrangian models, because the analytical Bayesian strategy is quite optimal there; it does not carry over to Eulerian models, although alternatives have also been proposed for doing the source inversion with Eulerian models. And then several authors have also started to explore more standard sequential data assimilation strategies based on different types of Kalman filters. This is a very promising alternative for the assimilation of volcanic aerosols but, as I mentioned before, it has limitations regarding the Gaussian hypothesis behind these filters, which actually leads to a sub-optimal filter performance, and you need to assimilate with a high frequency if you do not want the filter to collapse.

All this scientific progress is very, very promising, but we are still in the process of transferring it into real operational model setups. And this is slow, for several reasons, including the complexity of the workflows that are required to combine all of this, but this is expected; in my opinion, we will see this implemented operationally in the coming years. And that's it from my side. Thank you very much for your attention. You can email me here with questions, and I'm happy to answer.