And we will record the session. Cool, sounds good. OK, so to pick up where we left off: what I want to point out again about this diagram is that what we are viewing here is the classical ensemble Kalman smoother (EnKS), which is a very cheap algorithm to run. In effect, it runs at the cost of the filtering estimates alone: you run the filter, and then you apply the ensemble transform backwards in time to ensembles stored in memory. So it comes with an extra memory storage cost, but essentially no extra simulation cost. The issue is that the information flow is only unidirectional, moving backwards in time. In particular, you may want to reinitialize the DA cycle in the perfect nonlinear model that we are implicitly considering with the smoothed ensemble. That is, if you have an ensemble representing the uncertain initial data, a sample of different prior values, but smoothed by conditioning on observations up to a much later time, then initializing with such an ensemble can dramatically improve the performance of the subsequent forecast and filtering statistics. If we denote a composition of the model forecasts with the slice notation, either a composition of the nonlinear maps or a product of the corresponding linear transformations, then initializing with the smoothed prior exploits a mismatch in the perfect nonlinear dynamics: if we take this initial ensemble, smoothed with all the additional information, and propagate it forward in time, generically this does not equal the filtered ensemble at the same time. In the linear Gaussian model the two would be identical in theory, but in reality they are not, because of the mismatch between the approximate linear Gaussian dynamics and the real model under consideration. The effectiveness of this linear Gaussian approximation, which is widely used, depends strongly on the length of the forecast window, the delta t on the schematic we saw at the beginning, i.e. the length of time before we receive the next observation. For a small forecast window, the densities are well approximated by Gaussians; there are deformations induced by the nonlinearity of the forecast, but if the evolution of the densities is only weakly nonlinear, the deformation is not strong. Re-initializing the model forecast with the posterior estimates under the linear Gaussian approximation we have been developing can therefore bring new information into the forecast states in the next data assimilation window as we move forward in time, producing fixed-lag smoothing estimates.
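To make the mismatch concrete, here is a minimal sketch in LaTeX, in my own notation rather than the slide's, with a generic slice notation for the composition of the forecast maps over the window:

```latex
\mathcal{M}_{k:1} \;:=\; \mathcal{M}_k \circ \cdots \circ \mathcal{M}_1
% Linear Gaussian case: propagating the smoothed mean recovers the filtered mean,
\mathbf{M}_{k:1}\,\overline{x}^{\,\mathrm{smth}}_{0} \;=\; \overline{x}^{\,\mathrm{filt}}_{k},
\qquad
% Nonlinear case: generically the identity fails,
\mathcal{M}_{k:1}\!\left(\overline{x}^{\,\mathrm{smth}}_{0}\right) \;\neq\; \overline{x}^{\,\mathrm{filt}}_{k}.
```

The gap between the two sides is what re-initializing the cycle with the smoothed prior exploits.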
This has been exploited to a great extent by utilizing the 4D cost function, as in many of the traditional variational formulations, 4D-Var and the EnVar formulations, where the filtering MAP cost function is extended over multiple observations simultaneously and in terms of a lagged state directly. In a sequential forecast system, motivated again by short-range weather prediction, this type of cost function leads to what is known as fixed-lag smoothing. Suppose now that we have the joint smoothing posterior for the entire time series of the model states, given the entire time series of observations for the current data assimilation window, and we want to write it as a recursive update to the last smoothing posterior. Schematically, we imagine that in time we have a data assimilation window covering a number of past lagged states. We initialize with some value and create an ensemble forecast, or maybe just a single forecast, over these states; after some length of time we start encountering new observations, how many depending on what we call the shift, by which we push the data assimilation window forward in time. So we go through one smoothing estimate, assimilate these observations, and then at the next time we shift the window forward, make a forecast, assimilate the newly incoming observations, shift the window forward again, and so on. This is what is meant by the fixed lag: a fixed length of window that we move across the lagged and current states. If we follow a very similar Bayesian analysis as before, we can write this joint smoothing posterior, up to proportionality, as the product of three terms. The first is the marginal for the initial condition of the window given the last joint smoothing posterior, where S is the shift size, so this corresponds to the state time series and the observation time series one shift back in time. The second is the joint likelihood of the incoming observations in the current data assimilation window, the S new observations entering the window, given the background forecast; under the assumption of independent observation errors this joint likelihood becomes the product of the individual likelihoods of each observation given the model state at the same time. And finally we have the chain of Markov transition densities, which represent a free forecast with the perfect model. So: a forecast of the initial condition, the likelihood of the data over the window, and initialization with respect to our last posterior as we average out the initial condition. Formally, in the same way we saw the filtering recursion, this gives the fixed-lag smoothing recursion, which is used quite widely across many styles of schemes. Using this recursion, we can chain together a recursive fixed-lag smoother sequentially across data assimilation windows shifting in time.
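As a rough LaTeX sketch of the three-term recursion, with my own simplified indexing (a window of L states, shift S) rather than the exact notation of the manuscript:

```latex
p\!\left(x_{1:L} \mid y_{1:L}\right)
\;\propto\;
\underbrace{p\!\left(x_{1} \mid y_{1:L-S}\right)}_{\text{(1) marginal from the last smoothing posterior}}
\;\underbrace{\prod_{k=L-S+1}^{L} p\!\left(y_{k} \mid x_{k}\right)}_{\text{(2) likelihood of the new observations}}
\;\underbrace{\prod_{k=2}^{L} p\!\left(x_{k} \mid x_{k-1}\right)}_{\text{(3) Markov transitions (free forecast)}}
```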
So again, if we apply the linear Gaussian assumption, the resulting cost function takes the following form. There is a whole mess of terms, but the colors try to highlight what corresponds to what: we have our last smoothing posterior as the background term, as in the filtering cost function. What differs in this 4D cost function from the filtering version is that we now range over all the incoming observations, and we have the norm-squared discrepancy between our model evolution, pushed forward by these linear transformations and then mapped into the observation variables, and the observed data. We follow the same formalism of writing this in weight space rather than state space. If we go through the same analysis and make some substitutions, we get something that looks a lot like the extended form of the filtering cost function, now with multiple terms for the discrepancies from the observations. In the linear Gaussian case the solution can again be found by a single iteration of Newton's descent for the cost function above; I am just flashing the equations here, the derivation is in the manuscript. It is just computing the gradient, computing the Hessian, and taking a single Newton step. We get our smoothed estimate given the last smoothed estimate, which only carries information up to the last shift, by applying the optimal weights to the matrix factor and adjusting accordingly; and we get the right transform as the inverse square root of the Hessian, applied as a transform on the right of the matrix factor, thus conditioning the full smoothing estimate up to time L. The question, then, is what happens when the state and observation models are actually nonlinear, that is, if we replace these terms with the original motivating equations, something realistic of geophysical dynamics. If we put nonlinear terms into this cost function, it must be solved iteratively to find a local minimum, because it is no longer guaranteed to be quadratic. The difficulty, of course, is that the gradient with respect to the weights requires differentiating the equations of motion themselves: we have to differentiate the composition of the model forecasts and the forward operator into observation space. In traditional 4D-Var this is performed by incremental linearization and back-propagation of the sensitivities with what is known as the adjoint model. Suppose that the equations of motion are generated by a nonlinear function, taken independent of time for simplicity, so that the derivative of the physical model state in time is given by some function f. Then we can formally write the nonlinear map propagating the last model state to the current model state as the integral of the equations of motion with respect to time between the two time points, plus the last initial condition. Is there any question? I was just hearing some things. No? Okay, very well.
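A compact way to write what is on the slide, in my own notation (ensemble mean and anomaly matrix, linear forecast and observation operators, observation error covariances) and assuming the standard ETKF-style weight-space formulation rather than quoting the slide verbatim:

```latex
J(w) \;=\; \tfrac{1}{2}\,\lVert w \rVert^{2}
\;+\; \tfrac{1}{2} \sum_{k=1}^{L}
\big\lVert\, y_k - H_k \mathbf{M}_{k:1}\,(\bar{x}_0 + X_0 w) \,\big\rVert^{2}_{R_k^{-1}}
% quadratic in w, so a single Newton step from w = 0 solves it:
\qquad
w^{\star} \;=\; -\big(\nabla^2_w J\big)^{-1} \nabla_w J\big|_{w=0},
\qquad
T \;=\; \big(\nabla^2_w J\big)^{-1/2}
```

with the smoothed mean given by applying the optimal weights to the matrix factor, and the smoothed perturbations by applying T on the right of the matrix factor (up to the usual ensemble-size scaling convention on the anomalies).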
Well, we can extend the linear Gaussian approximation for the forecast density if we view the model state at some time as a perturbation of the mean state: the state is Gaussian distributed with this mean and background covariance, and is perturbed by some delta. If we de-mean this, subtracting off the mean, it is equivalent to say that delta is a mean-zero Gaussian with the same background covariance. The evolution of this perturbation can be approximated via Taylor's theorem: the time derivative of delta is the time derivative of the difference, which is governed by the equations of motion with f, and at first order we can write it as the Jacobian of f, evaluated at the mean, applied to the perturbation delta, plus terms of order delta-norm squared. This is the Jacobian equation, with a dependence on the underlying mean state. In particular, what is denoted here is the tangent linear evolution: we truncate the Taylor expansion, keeping only first-order terms, so the tangent linear model is defined as the evolution on the space of perturbations from the mean, approximated in time by the Jacobian equations at the mean acting as a linear transformation of the perturbation. Making this tangent linear approximation, the derivative of x is approximately the derivative at the mean plus the Jacobian equation applied to the perturbation delta. Following this through and integrating over time, we get a first-order approximation in which the model state at the current time is approximately the fully nonlinear evolution of the last mean, plus the tangent linear evolution of the perturbation, where the latter is given by the resolvent, or fundamental matrix solution, of the tangent linear model, i.e. of these linear dynamics in the space of perturbations. Gaussians are closed under affine transformations, which is what we are viewing here and above: a linear transformation plus a constant value. So we can approximate the evolution under the tangent linear model by saying that, if the forecast horizon is not so long that the dynamics become too nonlinear and the first-order approximation is good enough, the state at some time is approximately Gaussian with mean at the nonlinear evolution of the last mean and covariance given by the tangent linear propagation of the last background covariance. And therefore the quadratic cost function actually used in incremental 4D-Var is the following.
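Written out, the tangent linear approximation just described is, in my own notation:

```latex
\dot{x} = f(x), \qquad x = \bar{x} + \delta, \qquad \delta \sim \mathcal{N}(0, B),
\\[4pt]
\dot{\delta} \;=\; f(\bar{x}+\delta) - f(\bar{x})
\;=\; \nabla_x f(\bar{x})\,\delta \;+\; \mathcal{O}\!\left(\lVert \delta \rVert^{2}\right),
\\[4pt]
% truncating at first order and integrating gives the resolvent M_k of the tangent linear model
x_k \;\approx\; \mathcal{M}_k(\bar{x}_{k-1}) + \mathbf{M}_k\,\delta_{k-1},
\qquad
x_k \,\sim\, \mathcal{N}\!\left(\mathcal{M}_k(\bar{x}_{k-1}),\; \mathbf{M}_k B\, \mathbf{M}_k^{\top}\right).
```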
We have our a posteriori cost in weight space: the weight norm squared, and now, in this term, the difference is that we take the nonlinear evolution of the last smoothed mean through the fully nonlinear equations of motion and look at its discrepancy from the observed data, while the perturbation from the mean, the adjustment factor in the weights, is applied through the tangent linear evolution of the matrix factor and the weights. This is the approximate linear Gaussian cost function actually used directly in incremental 4D-Var, where the idea is that if you are not too far from the local minimum, the optimal solution, and the forecast horizon is not too long, you can use this approximate and actually quadratic cost function to incrementally and iteratively solve for the best weights w describing the initial condition. From the last slide, a very efficient approximation of the gradient can be performed using the adjoint of the tangent linear model. That is the other side of the traditional 4D-Var approach: the adjoint approximation of the gradient is very powerful and extremely efficient. In particular, it is defined by a backwards-in-time solution of a linear equation, where we define delta-tilde as the adjoint variables: to compute the adjoint approximation of the gradient, one takes minus the Jacobian transpose to define the backwards-in-time linear evolution of the adjoint model, which has an underlying dependence on the nonlinear solution over the interval. Therefore, in traditional incremental 4D-Var, one constructs the gradient of the objective function by differentiating the nonlinear model as follows: a forward pass of the nonlinear solution, computing the tangent linear evolution in the space of perturbations, and then a second, backwards pass in the adjoint equations, back-propagating the sensitivities along the solution to find the gradient. This is a very effective and very efficient solution, and of course it has been the cornerstone of ECMWF for many, many years. The one challenge is that it relies on the construction of the tangent linear and adjoint models for the dynamics, which for full-scale geophysical models can be extremely challenging. It should be noted, though, that the tangent linear and adjoint models can increasingly be constructed using automatic differentiation techniques, by formally deriving them from the computer program alone, without an explicit hand-built construction, which again tends to be very challenging. But having seen the incremental 4D-Var approach, one alternative to it, and to constructing the tangent linear and adjoint models, is to perform what we would call a hybrid ensemble-variational, or EnVar, analysis based on the ETKF that we saw earlier.
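As a minimal sketch of the forward-then-backward structure just described, here is a toy Python example, entirely my own (a pendulum-like model and hypothetical helper names, not the operational formulation): the gradient of a simple 4D observation-misfit cost with respect to the initial state is obtained by one forward nonlinear pass and one backward adjoint pass.

```python
# Hypothetical toy sketch: gradient of J(x0) = 0.5 * sum_k ||H x_k - y_k||^2_{R^{-1}},
# with x_k = M(x_{k-1}), via a forward nonlinear pass and a backward adjoint pass.
import numpy as np

dt = 0.01

def model_step(x):
    """One forward-Euler step of a toy nonlinear system dx/dt = f(x)."""
    f = np.array([x[1], -np.sin(x[0])])          # pendulum-like toy dynamics
    return x + dt * f

def step_jacobian(x):
    """Jacobian of the Euler step, A = I + dt * grad f(x): the tangent linear model."""
    Jf = np.array([[0.0, 1.0], [-np.cos(x[0]), 0.0]])
    return np.eye(2) + dt * Jf

def adjoint_gradient(x0, obs, H, R_inv):
    """Forward pass stores the trajectory; backward pass propagates sensitivities."""
    L = len(obs)
    traj = [x0]
    for _ in range(L):                            # forward nonlinear pass
        traj.append(model_step(traj[-1]))
    # innovations mapped back into state space at each observation time
    d = [H.T @ R_inv @ (H @ traj[k + 1] - obs[k]) for k in range(L)]
    lam = d[-1]                                   # adjoint variable at the final time
    for k in range(L - 2, -1, -1):                # backward adjoint pass
        lam = d[k] + step_jacobian(traj[k + 1]).T @ lam
    return step_jacobian(traj[0]).T @ lam         # gradient with respect to x0

# toy usage: two noisy position observations of the pendulum-like state
H = np.array([[1.0, 0.0]])
R_inv = np.array([[1.0]])
obs = [np.array([0.1]), np.array([0.12])]
grad = adjoint_gradient(np.array([0.0, 0.5]), obs, H, R_inv)
```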
This approach is at the basis of the iterative ensemble Kalman filter of Sakov and others and the iterative ensemble Kalman smoother (IEnKS) of Bocquet and Sakov, where the technique seeks to perform an ensemble analysis like the square root ETKF by defining the ensemble estimates and the weight vector in the ensemble span. So again, a bit of a mess of equations, but hopefully the colors help clarify the parts. We have an ensemble mean, an ensemble perturbation matrix, and an ensemble covariance, and in most respects this is fairly identical, but we now look at the discrepancy between the observed data and the fully nonlinear evolution of a perturbation of the ensemble mean, in the norm square with respect to the observation errors. And if we reduce this down to the weight-space analysis, we get the ensemble size minus one as a result of using the ensemble-based covariance, with this term remaining the same. So one measures this a posteriori cost as the discrepancy from the observations of the fully nonlinear evolution of the perturbation to the ensemble mean, combined with the size of the perturbation relative to the ensemble spread, where the weight vector defines the combination of the ensemble perturbations away from the ensemble mean. The key question with this approach is how we actually compute the gradient of the above cost function, because sure, we can compute the cost itself, but if we are doing a gradient-based optimization, this is the major challenge. The gradient of the ensemble-based cost function follows roughly the same analysis, but the one major piece that differs is this Y-tilde, which represents a directional derivative of the observation and state models: we define it as the gradient, with respect to the ensemble mean, of the composition of the observation and state models, applied to the ensemble perturbations. So this is a directional derivative in the directions of the ensemble perturbations, evaluated at the ensemble mean, again using the Gaussian, perturbative, linear approximation. In order to avoid the construction of the tangent linear and adjoint models, one particular version that is fairly easy to understand, the bundle version, makes an explicit finite-difference approximation with the ensemble: take the ensemble mean and the ensemble perturbations, but rescale the perturbations by a very small constant epsilon. This represents a cloud of ensemble states very finely perturbed about the ensemble mean, which is pushed through the fully nonlinear model and then rescaled on the outside by one over epsilon, again in observation space. And this term over here is what is known in statistics as the centering matrix; all it does is write the differences of the ensemble members from the ensemble mean in observation space. So we are explicitly writing this as a finite-difference approximation. It's not so bad.
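Here is a minimal Python sketch of the bundle finite-difference approximation of Y-tilde, in my own notation; forecast and obs_model are hypothetical stand-ins for the nonlinear model and observation operator:

```python
# Hypothetical sketch of the "bundle" finite-difference approximation of Y-tilde.
import numpy as np

def bundle_Y_tilde(x_mean, X_pert, forecast, obs_model, eps=1e-4):
    """Directional derivatives of obs_model(forecast(.)) along the ensemble
    perturbations, approximated by finite differences with step eps."""
    n, N = X_pert.shape
    # cloud of states finely perturbed about the ensemble mean
    E = x_mean[:, None] + eps * X_pert                        # shape (n, N)
    # push each member through the fully nonlinear state and observation models
    Z = np.column_stack([obs_model(forecast(E[:, j])) for j in range(N)])
    # centering matrix removes the common mean term; dividing by eps undoes the rescaling
    C = np.eye(N) - np.ones((N, N)) / N
    return (Z @ C) / eps                                      # approximation of Y-tilde
```

In the linear case this collapses to the usual product of the observation operator, the forward model, and the perturbation matrix; here it only costs N extra nonlinear model runs per evaluation.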
It's just a matter of knowing some of this matrix algebra and some of the notations that we use, but you can think about it exactly as a finite-difference approximation. The scheme then produces an iterative estimate using, for example, Gauss-Newton or Levenberg-Marquardt based optimizations, both of which have been used in a variety of circumstances. And it should be noted that there is a very similar approach, a sort of parallel twin of this type of estimator, that is more commonly used in reservoir modeling than in short-range weather prediction: the ensemble randomized maximum likelihood (EnRML) estimator. It is a very similar analysis; it follows most of the same conventions with just a little twist here and there, but otherwise similarly uses a Gauss-Newton or Levenberg-Marquardt implementation; both have been done. Now, to get into the new results. The accuracy increases with additional iterations of the 4D MAP estimate, but the big cost is that every iteration comes at the cost of a model forecast, and a model forecast over the entire data assimilation window. With large geophysical models, the ensemble forecast is definitely the leading-order cost, because it is so expensive to run these equations of motion. The flip side is that in certain applications, such as synoptic meteorology, the linear Gaussian approximation of the evolution of the densities is quite often adequate; it is not badly approximated by the tangent linear model. So iterating over the fully nonlinear dynamics may not always be justified by the improvement in the forecast statistics. Instead, we might consider performing an iterative optimization only over, for instance, a nonlinear observation operator or hyperparameters, in the filtering step alone. As in the classical EnKS that we saw at the end of the first part, this means running the filtering step and then producing a smoothing analysis retrospectively, and this iterative optimization of a nonlinear filtering cost function can be run without the additional cost of model forecasts. Optimizing over hyperparameters or a nonlinear observation operator can be performed very similarly to what we just saw with the IEnKS approach, using, for instance, the technique of the maximum likelihood ensemble filter of Zupanski and others, which is a related technique but only in the filtering step. And then, subsequently, the retrospective analysis of the EnKS, in terms of the filtering right transform, can be applied to condition the initial ensemble. That is, once you construct your filtering transform from the iterative optimization of the nonlinear filtering cost function, you produce the filtering analysis, find your transform, and condition your initial ensemble just the same.
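To make the Gauss-Newton idea concrete, here is a minimal, self-contained Python sketch for a single observation time, entirely in my own simplified notation (hm is a hypothetical map of a state through the nonlinear forecast and observation operator); it is a sketch of the kind of iteration being described, not the published implementation:

```python
# Hypothetical Gauss-Newton loop in weight space for an IEnKS/MLEF-style analysis.
import numpy as np

def gauss_newton_weights(x_mean, X_pert, y, R_inv, hm, eps=1e-4, iters=10):
    """hm(x) = H(M(x)): state mapped to observation space through the nonlinear
    forecast and observation operator. Returns the optimal weight vector w."""
    n, N = X_pert.shape
    w = np.zeros(N)
    C = np.eye(N) - np.ones((N, N)) / N                 # centering matrix
    for _ in range(iters):
        x0 = x_mean + X_pert @ w                        # current estimate of the initial state
        # bundle finite-difference directional derivatives about x0
        E = x0[:, None] + eps * X_pert
        Z = np.column_stack([hm(E[:, j]) for j in range(N)])
        Y = (Z @ C) / eps
        innov = y - hm(x0)
        grad = (N - 1) * w - Y.T @ R_inv @ innov        # gradient of the weight-space cost
        hess = (N - 1) * np.eye(N) + Y.T @ R_inv @ Y    # Gauss-Newton Hessian
        w = w - np.linalg.solve(hess, grad)
    return w
```

Each pass of this loop costs N + 1 evaluations of hm, which is exactly the "every iteration costs a model forecast" point made above when hm contains the forecast over the whole window.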
And then, as with the 4D cost function, one can likewise initialize the next DA cycle with the retrospective analysis, as we did to introduce the improvement in the forecast statistics by initializing with a smoothed prior, and gain the benefit of the improved initial estimate. So the scheme is exactly what I was discussing: this is the scheme that I developed with Bocquet, currently in open review, which we describe as the single-iteration ensemble Kalman smoother (SIEnKS). It is denoted as such because it requires only a single iteration of the ensemble forecast over the data assimilation window; compared to the classical EnKS, it just adds an outer loop to the filtering cycle to produce the posterior analysis. Compared with the diagram we saw earlier, imagine initializing our ensemble forecast at some lagged time and running a free forecast over states until we encounter an observation. When we encounter an observation, some newly introduced information, we run a filtering step and assimilate the data on the fly, as in the traditional LETKF or ETKF analysis, and then produce a retrospective analysis of the prior. As opposed to the 4D approach, we sequentially assimilate these observations on the fly as we encounter them and produce the retrospective analysis on the fly, whereas the 4D approach runs an entire free forecast, globally analyzes all observations, and then produces the analysis at once. The way this can work is that you run it within a filtering configuration for as long as there is a benefit from having initialized with a smoothed prior; when that benefit runs out, one takes all these retrospective analyses, initializes again with a smoothed prior, and starts a new cycle. So, as compared with the classical EnKS, the information flows from the retrospective analysis in reverse time, but then the reanalyzed state becomes the initialization for the next cycle over the shifted data assimilation window, carrying this information forward into the forecast. And the iterative cost function is only performed in the filtering estimate, so that we do not need, as in the 4D approach, to make an ensemble simulation over the entire window in order to analyze these observations; the filtering approach simply optimizes in the filtering step alone, without the ensemble forecast. So, just what I said. In particular, when the tangent linear approximation is adequate, this is shown to be an accurate and highly effective approach to sequential DA, at least in a variety of toy model test cases; let's make that qualification. This has not currently been implemented in a full-scale model; our results are with respect to toy models at this point. There.
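As a schematic of the cycle structure just described, in my own pseudocode-level Python (forecast, filter_update, and right_transform are hypothetical placeholders, not the package's functions):

```python
# Schematic of a single pass over a data assimilation window: filter on the fly,
# smooth retrospectively, and re-initialize the next (shifted) window with the result.
def single_iteration_smoother_cycle(ensemble_0, observations,
                                    forecast, filter_update, right_transform):
    ensemble = ensemble_0            # running (filtered) ensemble
    smoothed_init = ensemble_0       # stored initial ensemble, reanalyzed in memory
    for y in observations:           # sweep forward over the window
        ensemble = forecast(ensemble)                       # free forecast to the next obs time
        ensemble, T = filter_update(ensemble, y)            # ETKF-style analysis, T = right transform
        smoothed_init = right_transform(smoothed_init, T)   # retrospective reanalysis of the prior
    # the smoothed initial ensemble starts the next, shifted cycle
    return smoothed_init, ensemble
```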
One thing to add is that our key result is a very efficient form of multiple data assimilation, or MDA, within the EnKS cycle. MDA is a technique based on statistical tempering, designed to relax the nonlinearity of Bayesian MAP estimation. In a single data assimilation smoother, or what we'll refer to as SDA, each observation is only assimilated once, so that if we have a long data assimilation window, new observations are only distantly connected to the initial condition: if we initialize from somewhere far back in the past, the observations can be very distantly connected to the optimization of that initial condition. In particular, this can introduce many local minima to the 4D MAP cost function and can strongly affect the performance of the optimization. MDA is designed to artificially inflate the observation errors and weakly assimilate the same observation over multiple data assimilation windows. That is, looking back at this diagram, if we initialize right here, we would in fact assimilate the observations at all of these times, even the ones we have already assimilated, but we inflate the observation errors artificially so that they only act as a weak constraint, to prevent overfitting from assimilating them multiple times. With a bit of formalism, one can show that, at least in the linear Gaussian model, this is a fully consistent Bayesian approach, though it is on weaker theoretical ground for a nonlinear system. What this does is inherently weaken the effect of local minima, and as in tempering, it slowly warms up the estimator and brings it gradually closer to an optimal solution. Statistical tempering has been used for a long time, basically to prevent the numerical divergence of Markov chain recursions when the target is multimodal or simply very difficult, with many local minima. The big difference in our scheme is in how it treats MDA versus a 4D EnVar estimator: it uses a classical EnKS cycle to weakly assimilate the observations over multiple passes. That is, the filtering step in this analysis is used as a boundary condition for the interpolation of the posterior over the lag window. Looking at this diagram again: the 4D approach makes a free ensemble forecast over all these states, makes a global analysis, and then optimizes the initial condition. In the SIEnKS approach, one instead treats all of these steps as filter analyses, so that you re-assimilate all of these observations on the fly with a filtering analysis in which the observation errors are inflated, so that each acts only as a weak boundary condition. As we make the assimilation, versus the 4D approach of simply running a free forecast, this weak boundary condition applies as we interpolate the estimate over all the lagged states, and it has the ability to control the accumulation of numerical forecast errors as you run this interpolation over the lag states. So, where are we?
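The consistency statement for the linear Gaussian case can be sketched as follows, in my own notation and following the standard tempering condition used in MDA-type schemes (not quoted from the manuscript):

```latex
% Each observation is re-used over m weak passes with inflated error covariances \alpha_i R,
% i.e. tempered likelihoods, subject to the consistency condition on the coefficients:
p(y \mid x) \;=\; \prod_{i=1}^{m} \big[\,p(y \mid x)\,\big]^{1/\alpha_i}
\quad\text{with}\quad
\sum_{i=1}^{m} \frac{1}{\alpha_i} \;=\; 1,
\qquad
R \;\longrightarrow\; \alpha_i R \ \text{ on pass } i,
```

so that, for a Gaussian likelihood, the product of the weak updates recovers the single full-strength Bayesian update.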
Yeah, in fact, what we demonstrate in a number of test cases is that this is a more accurate, more stable, and more cost-effective approach than similar EnKF-based 4D EnVar schemes in the short-range forecast setting. We particularly want to emphasize that this is really for short-range forecasts: the SIEnKS is not as robust as the 4D MAP estimates when you have highly nonlinear forecast error dynamics. It really relies on the notion that the tangent linear approximation is fairly good for the evolution between observation times. Now, to go on to some actual numerical results; I think we're doing well on time. What we are looking at here is a number of simulation results for the 40-dimensional Lorenz-96 system, where we view the forecast, filter, and smoother statistics for each of: the traditional ensemble Kalman smoother using the right-transform analysis; the SIEnKS; the Lin-IEnKS, which is formally equivalent to the 4D-LETKF of Hunt and others and represents the same 4D IEnKS but utilizing only a single iteration of the cost function; and the fully iterative 4D IEnKS, which is allowed to make multiple passes of the ensemble simulation over the data assimilation window. On the vertical axis we have the number of lagged time points in the past that we smooth over, and on the horizontal axis the ensemble size used for the estimator, with the smoother, filter, and forecast RMSE and spread side by side. We judge estimator performance by having low RMSE and a spread similar to the RMSE, which is a diagnostic that the linear Gaussian approximation is fairly accurate in this setting. Note that this panel is the single data assimilation (SDA) setting. The first thing to note off the bat is that the classical EnKS, which only uses the retrospective analysis and does not iteratively solve the cost function, is stable over much longer lag windows, because it is effectively running in a filtering configuration at all times and only applying a retrospective analysis to smooth past states. But it performs much worse in terms of accuracy than any of the iterative approaches, which gain the benefit of re-initializing with a smoothed prior. And respectively, the iterative approaches all perform more or less the same in the SDA configuration. But if we turn on multiple data assimilation, the differences are interesting; just to note, the classical EnKS is not a multiple data assimilation scheme, it is placed here only as a reference to show the baseline, very cheap performance. When we turn on multiple data assimilation, the differences are not so much in the smoothing statistics, where the schemes are fairly similar, but most especially in the filter and forecast statistics, where the SIEnKS, by using this weak boundary condition on the observations, is able to control the error growth over the interpolation of the lagged states much, much better than the 4D approaches.
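For reference, the RMSE-versus-spread diagnostic mentioned here is typically computed along the lines of this minimal Python sketch (my own version of the standard definitions, not the benchmark package's code):

```python
# Minimal sketch of the RMSE / spread diagnostic used to judge the estimators.
import numpy as np

def rmse_and_spread(ensemble, truth):
    """ensemble: (state_dim, N) array of members; truth: (state_dim,) verifying state."""
    mean = ensemble.mean(axis=1)
    rmse = np.sqrt(np.mean((mean - truth) ** 2))
    # spread: root of the mean ensemble variance over state components
    spread = np.sqrt(np.mean(ensemble.var(axis=1, ddof=1)))
    return rmse, spread  # a well-tuned, near-Gaussian estimator has spread close to rmse
```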
I mean, all of them actually have better performance with MDA, but it's particularly that we get a much wider band of stability and accuracy with the SIEnKS. As a cross-section of the last plot, let's take one of these lines and vary the lag for a fixed ensemble size. In this case we have fixed the ensemble size at 21, so that the effective ensemble-based gain has rank 20, half the dimension of the system. We have, again, the smoother, filter, and forecast statistics for the classical EnKS, the SIEnKS, the Lin-IEnKS, and the fully iterative IEnKS. Again, the estimators in the SDA configuration are largely similar, except that all of the iterative approaches in fact achieve a better forecast RMSE than the filter RMSE of the classic approach, again indicating the strength of re-initializing with a smoothed prior. When we turn on MDA, we do see a somewhat wider band of stability for the SIEnKS versus the other schemes, but it is particularly in the forecast and filter statistics that this weak boundary condition, as we interpolate over the data assimilation window, remains much more stable. In fact, it is able to continue to increase accuracy over very long lags, whereas for the 4D approaches, with their free ensemble forecast, the forecast accumulates errors over this interpolation and begins to degrade much earlier than with the SIEnKS. So this weak boundary condition is shown to improve the forecast statistics by controlling the accumulated forecast errors over the lagged states, unlike traditional 4D EnVar approaches. And similarly, the interpolation of the posterior estimate does remain more stable over the data assimilation window when the forecast error dynamics are not highly nonlinear; this again relies on the adequacy of the tangent linear approximation. These results are demonstrated in a wide variety of other test cases for short-range forecast systems, presented in our manuscript, "A fast, single-iteration ensemble Kalman smoother for sequential data assimilation", which is currently in open review in Geoscientific Model Development. It is a very nice process, actually: the preprint is available, and the community can comment during the review process if they wish, so it is completely open to the community to be a part of this review. I'd be delighted if you take a look at it later, as you wish. In particular, we study estimation with nonlinear observation operators, simultaneous optimization of hyperparameters, and long shifts of the data assimilation window, all of which are challenging in the 4D MAP approach. In a variety of test cases relevant to short-range operational prediction cycles, we demonstrate improved accuracy, and in fact a lower leading-order cost, using the SIEnKS versus the older IEnKS, though the latter remains better for highly nonlinear dynamics. There are two qualifications we should mention: these theoretical results are based on the perfect model assumption, for simplicity in this initial analysis, and considering the case of significant model errors is still ongoing work.
And we have not introduced localization or covariance hybridization in this initial study, for simplicity; this is part of our ongoing work to make it realistic for a full-scale geophysical model. However, with respect to these simple test cases, the results are supported by extensive numerical demonstration with the Julia package DataAssimilationBenchmarks that I've written, along with some support from my student researchers. The ETKS, the Lin-IEnKS, the IEnKS, and the SIEnKS all have pseudocode provided in the manuscript, and of course the implementations are available in the open-source Julia package. So in this work we introduce our novel scheme, validating its performance advantages for short-range forecast cycles, and, in surveying many of the topics you've seen in this lecture, we also try to provide a theoretical and computational framework for EnVar schemes in the ETKS-style analysis, to give a general reference for the development of these ideas and a fairly comprehensive demonstration of their numerical performance. In conclusion, in surveying a variety of schemes, our key point is that the Bayesian MAP analysis can be decomposed in a variety of ways to exploit the operational problem. Traditional 4D MAP approaches have largely exploited this analysis for moderately to highly nonlinear forecast error dynamics, where the 4D MAP approach is very effective at optimizing a very nonlinear forecast for the initial value problem; this uses the 4D MAP cost function to estimate the full sensitivity of the initial condition over the whole data assimilation window. But with a change of perspective, the same analysis can be exploited for short-range prediction cycles, as in the SIEnKS, where we exploit the fact that, in systems where the perfect model assumption is sufficient and the forecast evolution is weakly nonlinear, corresponding to short-range forecast cycles, a retrospective, filter-based analysis solves the exact same MAP estimation for the smoothed prior as the perfect-model 4D MAP estimation. You can view it either way: you can do the retrospective analysis, or equivalently a full 4D analysis, and in either case you end up with the same Bayesian MAP formalism and the same solution; it is just a question of how you want to address the marginal versus the joint smoothing problem. This leads to our simple outer-loop optimization of the DA cycle itself. And what I like about this Bayesian analysis in particular is that it gives a very fresh perspective on the tools of nonlinear optimization and how to produce efficient statistical estimates in online settings. And that's the end of it; I think I just managed to finish on time again. So thank you very much for your attention. Thank you very much, Colin, you were perfect in every sense, especially in perfectly managing your time. Thank you very much for what was, besides, a nice overview of the many methodologies used today in the field of data assimilation within the Bayesian framework. Now we have a number of questions already in the chat, but I also invite everybody to participate; Aleksandro, please allow everybody to open their microphones.
And then you can also raise your hand with the reaction button, and we will see who would like to ask or comment on something. First, I would like to know if the presentation can be made available. Also, could you provide an accessible, open library source for data assimilation schemes, especially the ensemble filters and smoothers, that uses Python? Indeed, actually, I'm just looking — some old friends back at the Nansen Center in Bergen, where I was formerly working as a postdoc. Here we go. I'll just send a link to the GitHub page for the DAPPER Python package, which I can highly recommend. It's a framework; it's not comprehensive in terms of what you can do operationally, but this Python package, let me share it again just to show you, is designed to handle mostly the same sorts of topics: EnKF-based filters and smoothers, with a very extensive set of models it is already designed to handle, as well as results reproduced from the literature, also using some variational techniques and the like as validation, and to intercompare the various methods. So this is one highly recommended Python implementation. Let me also mention that within the docs there is a variety of other packages you might consider, for instance Python, MATLAB, and R-based libraries that could be used for data assimilation. This one here is just my own, and it's in a much more in-progress state; it's really just my own research code that I use to develop and intercompare things, but it's all available, and it's also available through the Julia package registries. If you want to use any of it, absolutely feel free; it's meant to provide the basis for all the numerical test cases that I did, and this is the manuscript that is currently in open review. I'm more than happy to provide any of these resources. I can highly recommend the DAPPER package for general small- to mid-scale data assimilation research; it's not really meant for a full-scale operational model, but if you're interested in studying algorithm design and having a very robust basis for comparison of different schemes, it's a great package. The people who work on it are good friends, and I contribute a little to it myself, but in my own research life I've transitioned more to Julia, just because it's blazingly fast and a fantastic language. So, as for the other part of the question, sorry? There is another question, related to sparse matrices: which methodology, specifically, would help with that, because of course many methodologies were mentioned. In principle, I think most of these methods could be adjusted, and often are adjusted, in operational settings. If you have features such as sparse matrices, various types of analyses, in terms of sequential conditioning on observations or parallel conditioning on observations, are used in practice to speed up performance.
And so, if you have something as nice as sparse matrices, sparse arrays, that you need to handle, this can absolutely be integrated within one of these frameworks; it's just a matter of getting into the nuts and bolts of the implementation operationally. There is a question related to applications, written here about the forecasting of volcanic eruptions, but also many other applications in the atmospheric sciences or meteorology; what can you tell about it? That's probably with respect to the methodology we've developed recently. I don't know the application to volcanism well myself, but I presume most of it would follow largely the same sort of analysis. I think these things are done in standard 4D-Var implementations; I believe volcanism is included in, say, the ECMWF-style models, things like this. In principle, again, the trade-offs come down to this: with ensemble-based methods you can sometimes avoid constructing tangent linear and adjoint models, but tangent linear and adjoint models are extremely powerful. Developing them requires a lot of expertise and a lot of development time, but if you are willing to put in the legwork and you have them available, adjoint methods are fantastic. This is why backpropagation is used in machine learning to train neural network weights; it is the same adjoint-based approximation of the gradients. If you have it, I'd say definitely use it. If you don't have access to it, consider using automatic differentiation to build the adjoints, or try an ensemble approach. With ensemble approaches, the big qualifier is that the curse of dimensionality is a real factor for the stability and reliability of the inferences, and so in practice you need a lot of covariance regularization: either you hybridize with a 3D-Var static covariance, using some interpolation of an ensemble covariance with a static one, or you use localization, in its various forms, to condition and regularize the ensemble-based covariance and the ensemble-based gain. Okay, there is a question, but I think it needs some more clarification, because it's written: could you give us an example applied in a hot springs geodynamic model? What does that exactly mean? Hot springs in a geodynamic model; anyway, it should be clarified. I would just like to comment on the applications. I think what you are doing is really great when we have a lot of observations, when we need, at each time step, to look at the observations and compare them to the forecast and so on. In some models, particularly in geodynamics, we do not have so much data to assimilate. We have present-day data, we have data from, let's say, geodesy, from mantle processes, but it is not as reliable as atmospheric data, which we collect continuously. In that case, the application of var, or some other approach dealing with so-called traditional inversions, sometimes makes sense; but as the number of observations increases, it definitely becomes time-consuming, computationally demanding, sometimes intractable, to solve the same problem. So that's just regarding the applications. Absolutely, absolutely.
Definitely, this is mostly motivated by the sequential forecast cycle. In reality, when you're doing any of this analysis, so much depends on the problem at heart that you're studying. Everything is nice and wonderful theoretically in a linear Gaussian model, but as soon as you start to deviate from this idealistic setting, you need to address the real problem at hand, and the methods you choose will depend on the trade-offs of that reality. And for the other questions, okay, well. Maybe we can go back to Sheikham and ask what he meant by his question about hot springs. Yes, exactly, that's what I already mentioned in the comments. Do you mean hot springs, or is it something else? What exactly do you mean? Most probably it is hot spots. Anyway, it seems to be a terminology issue, but if it is hot spots, then hot spot information comes from seismic tomography, and that is itself a model, a model from the interpretation of seismic waves. We have one model at one time, and then another model comes from some other time, and this is not so much assimilation; it's inversion, in the most simple, straightforward sense, the classical kind of inversion. Of course, it's about hot spots. What do you think, Colin, about that? Oh, my goodness, I have only a small, tangential knowledge of waveform inversion; I've seen the topic, but I certainly wouldn't consider myself an expert on it. What I am aware of in the literature, at least from what I've heard from friends within petroleum and seismic engineering analysis, is that the perfect model assumption is quite often used, because the accuracy of the model is not as much of an issue; the issue, from where I stand, is an extremely high sensitivity to the model parameterization. It is the need to handle this extreme sensitivity, and the reason 4D approaches are often used here, is that you can run an optimization over the initial condition itself, which gives you a fully consistent model evolution, versus doing something like a Kalman-style estimator where you perturb the simulation in place, applying a correction online. That can destabilize the simulation: if you lose your balances or conservation properties, anything like this, it could easily destabilize the simulation, at least from what I've heard from friends, as it's not my own area. So I hope I don't say anything incorrect. Very good. Alik, if I can jump in? Yes, please; I missed the first part. Just getting back to it: I think that full waveform inversion, when it comes to seismology, is a case of a static inverse method, a static inverse problem. So I wonder whether you can comment in general, because I've seen applications of this in full waveform inversion, but people have later given up.
The argument is that full waveform inversion is more of a static inverse problem, which does not relate directly to the dynamic, evolving problem that is specific to data assimilation. I definitely agree. There is a fundamental difference: in relation to this notion of a data assimilation window, when the window is fixed in time you are dealing with a very different kind of problem and a different way you would want to approach the analysis. Precisely because you don't need to perform the analysis sequentially, using a 4D approach is very natural in that setting, since you don't need to keep shifting the data assimilation window incrementally. If you have what, in my own nomenclature, is just a static smoothing problem, you can handle parameter estimation and state estimation in a fundamentally different way than when you have to shift the window in time. Yeah, thanks. Okay, would somebody else like to comment or ask? If not, then again, Colin, thank you very much for a wonderful lecture; many people really appreciate it, as you can see from the chat. Thank you very much. I would also like to mention that Colin will contribute to the book that I mentioned on the first day, and there you can also use this information; Colin has made a very nice overview of the models and some new techniques. Now I would like to draw your attention to the fact that tomorrow we will have a morning lecture by our colleague from Seoul, Sung Moon Lee. He is a distinguished scientist working in mathematical geophysics and will give a special presentation on the topic of inverse methods in the era of machine learning and deep learning. Please attend the lecture; I hope you will enjoy it. Keep in mind that this is a very special person, a person who fights with health issues and who really shares himself with society, with science, and with the students. After, unfortunately, a car accident, he became almost fully paralyzed and is in a wheelchair, supported by several people, but he loves to give lectures and to have conversations with students, and I hope that tomorrow you will enjoy his lecture. Also, tomorrow afternoon we will have presentations by our workshop participants; please try to attend all the events and ask your colleagues about what will be presented during that session. Okay, thank you very much, everybody. Yes, Karim, you would like to say something? Can I just add something technical? Yes, definitely. Those of you who are speaking tomorrow, we are asking you to connect half an hour earlier, please, just to check that your PowerPoints are working, so that the timing is used in an efficient way. And the second piece of information I would like to give you: there is no need to write to me by email regarding your participation. If for any reason you're unable to attend the lectures, it's Zoom that will decide it; we give the certificate to people who attend 60%. It's not my own decision, okay? Don't send me emails; it's Mr. Zoom that will decide for you.
And do not leave your Zoom session open without attending; it does not make sense, okay? I'm not the person who decides; there is a system at ICTP that works this way now, in this virtual world, because of the pandemic. But I hope that from next year we may be able to get back to our schools in person and see you physically. I apologize that I'm not able to respond to all the emails, but this is the way the system works: if you attend at least 60%, then you'll get a certificate, and I myself cannot do much about this. Thank you very much for this. It was really, and I enjoyed the last lecture as well; it was a pleasure. And I would also like to mention something which Karim did not want to mention, and I also won't say much about, but you will get a surprise by the end of this workshop. That's why everybody should... Should, yes, exactly. You should attend, at least. Exactly, attend; otherwise you will not be a part of it. There is a little surprise for those of you who are really doing their best, okay? Yes. Thank you very much. Thank you, and good evening, good morning, good afternoon; everybody's welcome. Thank you. Best wishes to everyone. See you tomorrow. Exactly. Bye-bye. Bye-bye.