Good morning and welcome to week two of the ASP summer colloquium on S2S science and application. This week our first speaker will be Magdalena Alonso Balmaseda. Magdalena is a principal scientist at ECMWF and also the head of the Earth System Predictability Section. Her interests lie in predictability, initialized seamless predictions, and coupled ocean reanalysis. She has made numerous important contributions in the fields of ensemble generation, the analysis and evaluation of coupled reanalyses, air-sea interactions and, in particular, ocean initialization. She is involved in a number of committees and scientific steering groups, too many to list them all. Magdalena is also known to throw great parties where she serves specialties from Spain like Manchego and Rioja. Welcome, Magdalena.

Thank you, Judith, good that you remember. I'm glad, and thank you Judith and Anish and the organizers for inviting me to give this presentation, which is going to be about initialization and forecast strategies for seamless forecasting. This is roughly the outline of my talk. I will start by setting the scene with some introductory concepts; then I will talk about initialization shock, forecast drift and calibration; and then most of the talk will be about the initialization of the ocean, as an example of a slow component of the Earth system. I would also like to compare two initialization and forecast strategies, those used, if you want, in decadal and in more seamless approaches, and if I have time I would like to talk about ensemble generation strategies for seamless prediction.

By now you will probably have seen, from last week, that the basis for forecasting beyond weather time scales resides in the properties of a system with two, or multiple, time scales. Consider first the point of view of the fast time scale, which would be the atmosphere.
Prediction beyond weather is basically a boundary-condition problem, the predictability of the second kind, or "loaded dice": the forcing exerted by what the atmosphere considers boundary conditions changes the atmospheric circulation, modifying the large-scale patterns of temperature and rainfall, so the probability of occurrence of certain events deviates significantly from climatology. But what are those boundary conditions for the atmosphere? They would be sea surface temperature, soil moisture, snow, radiative forcing, maybe the stratosphere. But in most Earth system models these components are not boundaries; they are really prognostic variables. So from the point of view of the slow components, predictability beyond weather is also predictability of the first kind, that is, an initial value problem. So we need to initialize the slow components.

Before I go into full detail, I would like to remind you that these seamless forecasts have an additional component. It is not only about predicting; this is an example of an ensemble prediction for today, where we run several realizations to estimate the PDF of the possible outcomes. But this is not enough. We need to compare it with what is called the climatological PDF, which would be the black curve, and which we estimate by conducting a series of forecasts, or retrospective forecasts, back in time to estimate this climatological PDF of the model. The information content is really in the difference between the model climatology and today's forecast. For this climatological PDF we need initial conditions for the forecasts, and the quality of these initial conditions, which usually come from reanalysis, depends very much on the quality of the reanalysis. That is why reanalyses are an integral part of a seamless forecast system. Why do we need these reforecasts?
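The comparison just described, today's forecast PDF against a climatological PDF built from reforecasts, can be sketched numerically. This is a minimal toy with made-up Gaussian numbers (the ensemble sizes, means and the tercile threshold are my assumptions, not values from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy numbers (not from the talk): 20 reforecast years x 51 members give
# the model's climatological PDF; a 51-member ensemble gives today's
# forecast PDF for the same lead time and calendar date.
reforecasts = rng.normal(loc=0.0, scale=1.0, size=(20, 51))   # model climatology sample
todays_ensemble = rng.normal(loc=0.8, scale=1.0, size=51)     # today's shifted forecast

clim = reforecasts.ravel()

# Probability of exceeding the climatological upper tercile, under each PDF.
upper_tercile = np.quantile(clim, 2.0 / 3.0)
p_clim = float(np.mean(clim > upper_tercile))          # ~1/3 by construction
p_today = float(np.mean(todays_ensemble > upper_tercile))

# The information content lies in the difference between the two PDFs.
shift = float(todays_ensemble.mean() - clim.mean())
print(f"P(upper tercile) climatology: {p_clim:.2f}, today: {p_today:.2f}")
print(f"ensemble-mean anomaly vs model climatology: {shift:+.2f}")
```

In a real system this comparison would be done per lead time and per start date, against the model's own climatology rather than the observed one, which is exactly why the reforecasts are needed.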
I mentioned mostly the calibration, or simply contrasting today's forecast with the model's climatology. So it is for calibration, but it is also for the detection of extreme events, and here we can see the prediction at two weeks of the Canadian heat wave, for instance, or a seasonal forecast of the seasonal mean. This is also very important for skill assessment: it is not enough to issue a forecast, you need some knowledge of how good that forecast is, and the reforecasts are used for that as well. Then the reanalyses are used for monitoring climate, and the reforecasts are a very good data set for predictability and evaluation studies, to see whether the model has errors and what the future directions for improvement are.

So what are the requirements arising from these reforecasts? What we need is consistency: the reforecasts and the real-time forecasts need to be consistent, so that the calibration makes sense. We also need temporal consistency in the past and a faithful representation of a wide range of time scales, so that the skill assessment makes sense and the understanding of what is going on makes sense; this is quite challenging, especially in the presence of model error and a changing observing system. It sounds obvious, but I would like to illustrate it. We also need accurate and physically balanced estimates, with associated uncertainty, so that the observational information can be propagated into the forecast and the relevant processes can be reliably quantified. And how far back? As far back as possible; the limitation there is usually computing resources.

So, the initialization. I like this picture, because what we are doing in the forecast is basically a two-way translation between the observations and the model: that is the initialization stage.
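Since the talk stresses that reforecasts are what make skill assessment possible, here is a toy anomaly-correlation computation over a synthetic reforecast set (all numbers invented; the 0.8 factor simply makes the "forecasts" skilful but imperfect):

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy skill assessment over a synthetic reforecast set: 30 start years,
# with forecast anomalies partially correlated with the observed ones.
years = 30
obs_anom = rng.normal(0.0, 1.0, size=years)
fcst_anom = 0.8 * obs_anom + rng.normal(0.0, 0.6, size=years)  # skilful but noisy

def anomaly_correlation(f, o):
    """Centred correlation between forecast and observed anomalies."""
    f, o = f - f.mean(), o - o.mean()
    return float(f @ o / np.sqrt((f @ f) * (o @ o)))

acc = anomaly_correlation(fcst_anom, obs_anom)
rmse = float(np.sqrt(np.mean((fcst_anom - obs_anom) ** 2)))
print(f"anomaly correlation: {acc:.2f}, RMSE: {rmse:.2f}")
```

In practice both scores would be computed per lead time and per region, from exactly the kind of reforecast set the talk describes.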
Then we propagate the information into the future, the forecast stage, which we do in model space, and then we do the calibration stage, which is the translation from the model information back into observation space. The techniques for both initialization and calibration are very similar: they are based on Bayesian statistics. On the initialization side we call it data assimilation, and this is going to be a topic of the talk: the observations collapse the uncertainty of the possible realizations, but they also correct the state we are in. This is just a very idealized cartoon of an ensemble data assimilation, but this initialization step is really what distinguishes a forecast from a projection or a model simulation. It is essential for the forecast, and indeed a forecast carries more information than a projection or a simulation, because it has that connection with real time. But this initialization may not be perfect: models are not perfect, and that can create problems; observations are insufficient; and the data assimilation, the translator itself, has deficiencies. You can imagine somebody translating, or the machine translations of the early times, where the results were sometimes far from meaningful even if they were very literal. These errors in the translation can give rise to problems of initialization shock, and this cartoon shows a little of the problems that we may have with initialization shock and drift, and how these two things can affect the skill.
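The idealized assimilation cartoon, observations collapsing the ensemble uncertainty while correcting the state, can be reproduced with a one-variable stochastic ensemble update. This is a textbook scalar Kalman/EnKF step of my own, not anything operational:

```python
import numpy as np

rng = np.random.default_rng(4)

# One scalar variable: a prior ensemble represents the possible states;
# a single observation shifts the ensemble mean toward the observed value
# and collapses its spread, in the Bayesian / Kalman sense.
n = 500
prior = rng.normal(loc=2.0, scale=1.0, size=n)     # background ensemble
y, r = 0.0, 0.5 ** 2                               # observation and its error variance

b = prior.var()                                    # background error variance
k = b / (b + r)                                    # Kalman gain (scalar case)

# Stochastic EnKF update: each member assimilates a perturbed observation.
perturbed_obs = y + rng.normal(0.0, np.sqrt(r), size=n)
posterior = prior + k * (perturbed_obs - prior)

print(f"mean: {prior.mean():.2f} -> {posterior.mean():.2f}")
print(f"std : {prior.std():.2f} -> {posterior.std():.2f}")
```

The posterior mean sits between the background and the observation, weighted by their error variances, and the posterior spread is smaller than the prior spread: exactly the "collapse and correct" behaviour of the cartoon.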
Imagine that we have the real world, this blue line, in some schematic of the phase space, and we have the attractor of the model climate, which is different from the real world, because models will always be deficient representations. We launch a forecast, and the forecast will slowly go from the real world toward the model climate. This sort of behavior would be ideal, because during the time before it reaches the model attractor the forecast is going to carry information, so the forecast will have skill. But this is not always the case: sometimes this drift may not be monotonic, and in fact we can have what we call initialization shock. The evolution is not monotonic in some direction of phase space, and during this period of time the error is worse than if we had used, say, the model climatology. This can create problems, especially if we have nonlinearities: imagine that the system is nonlinear, which it is, and that this initialization shock is large enough to kick the nonlinearities; we may end up with the solution converging to a state different from what we would have had without the shock. So we need to be careful. Initialization shock is difficult to define, but I would describe its consequences like this: the data assimilation process has created imbalances in the initial conditions that are not supported by the model physics, and the observational information is rapidly lost in an adjustment that deteriorates the skill.
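A scalar caricature of the drift and shock just described, under an assumed relaxation model (all constants are mine, purely for illustration):

```python
import numpy as np

# A forecast state x drifts from the real-world initial value toward the
# model attractor x_model with e-folding time tau; an initialization shock
# adds a fast, non-monotonic transient excited by imbalanced initial data.
def forecast(x0, x_model, tau, shock_amp=0.0, tau_shock=2.0, days=120):
    t = np.arange(days, dtype=float)
    drift = x_model + (x0 - x_model) * np.exp(-t / tau)      # monotonic drift
    shock = shock_amp * np.exp(-t / tau_shock) * np.sin(t)   # fast transient
    return drift + shock

truth = 0.0          # real-world state, held constant for illustration
x_model = -1.0       # biased model attractor
clean = forecast(x0=truth, x_model=x_model, tau=60.0)
shocked = forecast(x0=truth, x_model=x_model, tau=60.0, shock_amp=1.5)

err_clean = np.abs(clean - truth)
err_shock = np.abs(shocked - truth)

# The shock wastes the initial-condition advantage in the first days.
print("max error, first 10 days, clean  :", round(float(err_clean[:10].max()), 3))
print("max error, first 10 days, shocked:", round(float(err_shock[:10].max()), 3))
```

The clean forecast loses skill slowly as it drifts toward the attractor; the shocked one suffers a large non-monotonic excursion early on, which is the adjustment phase in which the observational information is rapidly lost.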
What causes this initialization shock? It could be deficient data assimilation, as I mentioned, simply a bad translation; it could be that we do not have sufficient physical constraints in the data assimilation, or that the data assimilation is trying to impose scales that the model cannot support, or that we give too much weight to the observations, or that the quality control fails and wrong observations end up being assimilated. So these aspects are quite important: the weight that we give to the observations, the quality of the observations, and the balances in the data assimilation. There are also other reasons why the initial conditions can lead to initialization shock, for instance if they have been produced with a different model cycle than the one used by the forecast. This can happen if we use separate initialization, or uncoupled data assimilation, where the analyses of the different components are produced separately but then used together in coupled mode, or if we use different model cycles that are not related to each other. We did some exercises to quantify this, in the medium range, and what we could see is that the largest contribution to the initialization shock appears when we use different models from the ones used in the initialization, but uncoupled data assimilation also contributes. This is the RMS error: this one is with different model cycles, this one is the same model cycle but separate initialization, and the blue line is with coupled initialization. So we know that there is a path for improvement, and that is where we are.

So much about initialization shock; what about drift? This is an example from seasonal forecasts where we see Niño 3.4, an index of ENSO sea surface temperature, with different models, some of the models that we use operationally, and you can see that we have bias, a drift in the first moment of the distribution, which
is called the bias. We see how it depends on lead time, and it grows with lead time; it also depends on the model cycle and the model resolution, and on the phase of the seasonal cycle, so it is not always the same: if we initialize in November the drift we get is different from if we initialize in May. We also have errors in the second moment of the distribution, which can be characterized by the amplitude ratio, the ratio between the interannual variability of the forecast and the interannual variability of the observations. You can see here (these two color scales are the same) that the model with the colder drift is the one with the larger amplitude of the seasonal cycle and then the larger error in the estimation of the amplitude, and we know that this is because there are nonlinearities linking the mean state and the variability, basically the cold tongue and the strong thermocline feedback. Why do I say this? First, because there are different sorts of errors, in this case in the first and second moments of the distribution, and if we have errors in the nonlinearities then the basic calibration, the removal of the bias, will not be enough, because the variability is affected; an a posteriori bias correction of the mean alone is insufficient if the system is nonlinear. The other point is a common perception that, while initialization shock depends on the initialization, the drift depends only on the model. That is not true: the two are intrinsically linked, and we can see that the same model with different initializations can go to different states, with different drifts. That is important to keep in mind when we want to diagnose model errors. Okay, I know there are many considerations, but initialization is a multifaceted problem, so I wanted to give the full picture first.

So now we come to this issue: we want to initialize the slow components of the Earth system, so how do we do it optimally? In the
atmosphere, the atmospheric scientists are very good with optimality: they have the adjoint and tangent linear models, and they are able to optimize the initial conditions so that they produce the best forecast with respect to an objective metric, for a given lead time, variable and region; they tend to use geopotential height or the energy of the system, and you can evaluate the initialization very quickly. For the slow components it is more difficult, because the tangent linear and the adjoint usually are not enough, or we do not have them, so we decide to use common sense, if you want, and these practical requirements: the initial conditions should represent accurately the state of the real world while projecting onto the model attractor. Those are two requirements, and this is difficult in the presence of model error, as we have seen, because of initialization shock and drift; but the idea is to minimize the initialization shock. There are other practical requirements arising from calibration: the errors should be as stationary as possible, we would like consistency between the reforecasts and the real-time forecasts, so we make the reanalyses as consistent as possible, and we need a representation of uncertainty.

I am going to illustrate these aspects with the ocean, where the challenge is that we want to predict ENSO. So, which part of the ocean? Here we see a Hovmöller diagram of daily sea surface temperature anomalies on the right, and on the left the anomalies of the depth of the 20-degree isotherm, a proxy for the thermocline. You can see this big La Niña that happened around 2010-11. Can we predict this? If you look at the depth of the 20-degree isotherm you can see the Kelvin waves propagating, in this case positive, in this case negative, from west to east. The point of this slide is just to say that it is not enough to say that the ocean has large thermal inertia and we only need to initialize the mixed layer; we also need to
initialize the dynamics, because the dynamics is predictable, and in fact the wave dynamics at the equator is the main source of predictability at this time scale. So how do we go about it? For most of these slow components, not only the ocean but also the land and, you can think, the sea ice, we need the model and the atmospheric fluxes from atmospheric reanalyses; those are the best first guess, if you want, and with that you already get quite a lot, but it is not enough. We also need ocean observations, which are put together with the ocean model using data assimilation methods. In the case of the ocean, the main observations are sea surface temperature, but also in situ observations (XBTs, Argo floats, moorings) and altimetry. Why do I say that we need data assimilation? Well, we have large uncertainty in the fluxes and also in the models, which can lead to large uncertainty in the subsurface. Here, I don't know if you can see this plot: these are the equatorial Atlantic wind stress anomalies from two different products, and the equivalent uncertainty, if you want, in the time series of the upper-ocean heat content, which is quite large; the uncertainty is comparable to the variability that we want to predict. So the questions are: given the ocean observations we have, can we constrain the ocean state? The answer is yes: if we use data assimilation, these two curves get closer. Is that enough? It is not enough. We have another two requirements: we want to improve the ocean estimate, so that not only is there less uncertainty but the estimate is better, and we want this improved estimate to improve the seasonal forecast skill. If you look at an ocean analysis and its assimilation increments, this is an example, we know that these assimilation increments have a nonzero mean; the data assimilation methods are
not designed to target model error, usually, but we see however that the mean assimilation increment corrects the total ocean heat content and the slope of the thermocline in this case. This is important because it has implications for the temporal variability, as I am going to show. The observing system is changing continuously: here you see the observation distribution in June 1982, which I think was a World Meteorological year or something like that, and we had a then unprecedented coverage of ocean observations; in 2005, with the advent of Argo, we have observations almost everywhere, in the southern hemisphere and the subtropics. And these are time series of how the different components of the observing system have been changing. You can guess where I am going: if the model has errors and the observations are changing, what will happen to the mean state and the estimation of the interannual variability from reanalyses? Here is what we see. This is one example, from one of the first early works that I did when we were doing the first ocean reanalyses at ECMWF. We always run one control experiment, here the one with no data assimilation, an ocean model forced with observed SSTs and fluxes, and we can see the depth of the thermocline in the equatorial Atlantic, which would be this red or black line. However, in the experiment with assimilation, when the PIRATA moorings were introduced they produced a huge jump, correcting the diffuse thermocline and its depth in the Atlantic. This jump is huge, and if you do not know where it comes from it may be mistaken for spurious variability. You want to avoid that, and the way we devised to avoid it was to say: let us take the information from the observations now and try to extrapolate it into the past, basically by adding a bias correction to the tendencies of
the model. This bias correction has two terms: one estimated a priori from the recent observations, and another that is estimated online. We apply these terms retrospectively, and if we do that we avoid, or at least ameliorate, these problems with the jumpiness; that is what you see here in the blue curve. Just to give you an idea, from the Argo period we can derive these correction terms that we apply to the model, and since the observing system will always be changing, for reanalysis, not only of the ocean but also of the atmosphere, we really need these corrections on the tendencies, extrapolating the observational information into the past. This is polemic, and it is by no means without flaws, but it is very important. Once we do that, and if you remember, one of my questions was whether this data assimilation improves the interannual variability, we check against records of essential climate variables to see whether we improve the temporal correlation. That is what you see here: this is the control, when we do not assimilate observations; when we assimilate observations we get better correlation with the altimeter sea level. Of course, if we assimilate the altimeter sea level this correlation improves further, which is not so surprising, but the challenge there is to project the altimeter information, which is only at the sea surface, into the depths; it has a small effect, not that much, but it improves the estimate near the surface. So those are the criteria we apply, if you ever go into this game of initializing a seasonal forecast model. And the final requirement is: does the data assimilation improve the forecast skill? Again, we always have one run initialized without observations, which would be the blue, and the red is when we assimilate observations; it improves, and then we are very happy. And this shows, for our current system, it would
be System 5. This shows how the skill for ENSO has been improving over the years; this is System 5, and if we did not assimilate observations we would have the skill of years back. So assimilating the ocean observations is equivalent to about 15 years of progress in model development. I say that because it is quite important: I think it is hard, but it is the way to go. There have been other attempts, saying maybe we only need SST; for decadal forecasting it is sometimes said that if you only have sea surface temperature and a coupled model in which you nudge the sea surface temperature, that should be enough to initialize the decadal or seasonal forecast. We tried to quantify this and see whether it was true. Of course the answer could be system dependent, but what we see in our system is that, first, the atmospheric observations are essential, so the atmospheric analyses are essential, and that would be the contribution of what is called the winds here; the ocean observations are also very important, and the two combined increase the skill by 25 percent in several areas. The only area where it was problematic at that time was the equatorial Atlantic, and I think this is because there is something wrong with our methodology, so this is an area to pay attention to and try to improve the methodology, not to remove the observations.

So which other approaches exist for initializing Earth system predictions? Some time ago, and perhaps now as well, we had two different paradigms, which arise because we have the real world and the model world. For the medium range we tend to initialize everything in the real world, and we call this full initialization; being close to the real world is perceived as an advantage, even though the model slowly drifts to its own mean state, as in the figure that I showed at the beginning.
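The tendency bias correction described a few slides back, an a priori term estimated from a recent well-observed period plus an online term updated from the assimilation increments, can be sketched with a scalar toy model. Everything here (the nudging-style analysis, the learning rate, the constants) is my own assumption, not the operational scheme:

```python
import numpy as np

rng = np.random.default_rng(2)

# The applied correction is the sum of an a priori term, estimated offline
# from a well-observed period (e.g. the Argo era), and an online term
# updated from the assimilation increments.
true_state = 10.0
model_drift = -0.05          # systematic tendency error (unknown to the scheme)
b_apriori = 0.03             # offline estimate from recent observations
alpha = 0.1                  # learning rate of the online term

x, b_online = true_state, 0.0
increments = []
for step in range(300):
    # Model step with its systematic tendency error plus the correction.
    x = x + model_drift + b_apriori + b_online
    obs = true_state + rng.normal(0.0, 0.05)
    increment = 0.5 * (obs - x)              # simple nudging-style analysis
    x = x + increment
    b_online += alpha * increment            # online term learns the residual
    increments.append(increment)

residual = model_drift + b_apriori + b_online
print(f"online term: {b_online:+.4f}, residual tendency error: {residual:+.4f}")
```

With the correction active, the residual systematic tendency error tends toward zero, so the analysis no longer needs large mean increments and is less prone to jumps when the observation coverage changes.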
This process of drifting means that we have skill at the early times. In decadal prediction, some years ago (things have been changing, but at the time we did this work it was quite controversial), what was proposed was an anomaly initialization to avoid forecast drift: the idea would be to initialize about the model attractor, or mean state, and that was called anomaly initialization. But then, given this dichotomy, what happens with seamless forecasting? It would not really be possible: is seasonal closer to the real world or to the model world? The other comment is that anomaly initialization does not necessarily mean initializing on the model attractor; in fact sometimes it does not at all, and conversely you can achieve initialization close to the model attractor with full initialization. But we have seen so far, or I hope I have conveyed, that the full initialization, the one that I presented, has two main caveats or problems: the initialization shock resulting from unbalanced states, and the nonlinearities and non-stationarities. So we did some work trying to compare this full initialization with anomaly initialization. The full initialization, as I said, is what we do in the ocean, as in the medium-range forecast of the atmosphere, except that the model bias is taken into account during the data assimilation. The a posteriori calibration of the forecast is needed, and this calibration depends on lead time and on initial date, so it is quite expensive, because you have to do quite a lot of reforecasts. And finally, if this initialization is uncoupled, whatever we learn about the model drift during the data assimilation cannot be applied in the forecast, which is a pity; with coupled data assimilation maybe we can estimate this bias during the initialization process and propagate it into the forecast. What about the anomaly initialization? I represent it in this diagram, and the idea is to conduct a long coupled integration to
estimate the model climate, one single realization for many years, and then you initialize with an observed anomaly, superimposing the observed anomaly on the model climate, and then you run the forecast. So what are the pros and cons? The original purpose was to avoid the expensive reforecasts: if you look at these forecasting systems, the full initialization is much more expensive than the anomaly initialization because of the number of reforecasts; in the anomaly case the model climatology does not depend so much on the lead time. But this is not enough, because the long integration does not give you any idea of the skill of the system, and you need to provide an indication of the skill, so you still need reforecasts for skill assessment. The other thing is that calibration is still needed: even if you insert the observed anomaly, these models still drift, because the anomaly initialization is not always perfect and there are initialization shocks. The other problem is that you always need to define the anomaly: what happens when you have observations for the first time? Unless you have a reanalysis, you do not know what the anomaly is, so this anomaly initialization always depends, indirectly, on some full reanalysis being there. Something interesting regarding the bias: this anomaly initialization acknowledges that there is model error during the initialization, but the algorithm for data assimilation is what we call bias blind, because it removes the bias instead of correcting it. If you compare these two equations, here we are removing the bias and here we are correcting it, just to give you an idea of the pros and cons of the algorithms.

I don't want to rush you, but if you could wrap up it would be great, so that there is time for questions.

Okay, yes. Just to say, I have some slides on ensemble
generation, but maybe I don't need to say much about it; I will leave the slides with you. Basically, what I want to say here is that for seamless prediction, to represent the uncertainty in the ocean, instead of using optimal sampling, which could be possible but would need the linear propagators, or generalized propagators, to compute the singular vectors, what we do is more a sort of common sense: we know that we have uncertainty, so let us take our knowledge of the uncertainty and apply it. What is the wind uncertainty, what is the uncertainty introduced by the spin-up, by the SST; if possible we also try to perturb the absorbed solar radiation, the sea ice, all these things, and we introduce these perturbations during the production of the initial conditions. We also sample uncertainty by perturbing the observation locations, or by doing random thinning instead of superobbing; this is very good for ensembles because it allows using the observational information differently. And finally, coupled data assimilation allows us, even with this simple scheme of perturbations, to sample flow-dependent errors. I have some examples of how the uncertainty in solar radiation looks in an uncoupled analysis versus a coupled one, and also, for ENSO, how with coupled data assimilation we can obtain uncertainty in precipitation associated with the displacements of the convection, and uncertainty in the wind stress associated with the westerly wind bursts in El Niño. That is all I wanted to say: the coupled model is very good, and that is one of the advantages of coupled data assimilation, a natural way of representing flow-dependent uncertainty.

So I want to finish now. We have seen some criteria to design a good initialization for the Earth system, trying to reduce the initialization shock; we have seen the need for drift correction and calibration, and the
need for historical, stable records of initial conditions consistent with the real time; and we have also seen the importance of treating the model error during the data assimilation process, and of exploiting the observational information and extrapolating it into the past. I tried to illustrate with some examples the initialization of the ocean for seasonal forecasts: it is important to initialize the dynamical and thermodynamical processes, being aware that the data assimilation changes the mean state, so we need to treat the bias; and even if it is challenging, the assimilation is worth it. I tried to survey the alternative approaches and to compare them with the full initialization, and at the end, very quickly, I went through how to represent uncertainty. That's it, thank you.

Thank you so much. I would love to ask you to go through all the steps in the last section of your talk, but I think maybe we do this in a full-blown seminar, because it is super interesting, the whole issue about ocean initialization and uncertainties.