introduce the next speaker. Our next speaker is Sergei Frolov. He is a data assimilation and coupled-model forecasting expert with a strong track record of formulating and implementing advanced computing algorithms that drive Earth system science modeling and observation workflows. When I read his CV, I thought it was really interesting that his past contributions included scientific support for negotiations of the US-Canada water-sharing treaty and other advisory tasks. Now Sergei is in Boulder, working here at NOAA on the coupled reanalysis system with the UFS. Sergei, welcome.

Thank you, Judas. It's pretty strange when people read your describe-yourself resume blurbs aloud. Who is that they're talking about? I thought I might need to rewrite mine after this. Thank you, Judas and Anish, for inviting me to the summer school. A summer school was a highlight of my graduate career, and it's nice to contribute. Also, putting this presentation together, I started to appreciate the teaching that all of our university colleagues do. It's definitely a different ball game from doing research in the lab.

Okay, so with that, I'll share my screen. I will shift the focus from the MJO and talk about something much, much simpler: coupled data assimilation. First I'll give a brief introduction to what data assimilation is, taking a historical perspective, and then I'll talk about the relevance of data assimilation to S2S forecasts.

The basis of data assimilation is the ability to communicate observations quickly. Before the invention of the telegraph, you couldn't actually observe the weather, carry the report ahead of the weather, and deliver it to a different place where it matters, because you could only ride as fast as a horse, about 75 miles per day, and we know weather travels faster than that. So in 1840 the telegraph was invented, you could actually telegraph your weather report across the continent, and the first weather maps were created. This is an example of a weather map created in 1887, and you can see that there are really very few stations. There is a station in Kiev and one in Petersburg, there's one in Prague, and there are none in Italy or Spain. The weather observing network was very sparse back then.

Can I just check with you guys? I see everybody else freezing. Are you hearing me?

Yeah, we hear you well. Everything works well.

Because everybody else's image froze, I just wanted to check. Okay. So this analysis map is really the first form of data assimilation, where you take sparse measurements and try to make a map out of them. In 1904, weather prediction was formulated as a two-part program: first you have to make an initial condition, and then you take the prognostic equations and integrate them forward. Making initial conditions is a data assimilation problem. In 1922 they tried to do it by hand, using human computers. It took six weeks to produce a six-hour forecast for two locations in Europe, and they were off by two orders of magnitude. And that was the essence of numerical weather prediction for the next 100 years: just making these things better and faster. The first practical weather forecast came in 1950, when the first computer became available. In the 1980s we see the first global atmospheric and ocean models; they were two separate things. The 1990s brought the first ensemble forecasts, which you heard about on Monday. And in 2004 the first coupled global model forecast was transitioned into operations.
It was NOAA's CFS system, and that was the birth of coupled data assimilation, because for a coupled model you have to produce initial conditions for both the ocean and the atmosphere, and for the first time coupled data assimilation became a requirement. As a more recent milestone, in 2018 all models in the European Centre's suite became fully coupled, so they no longer run an uncoupled model unless it's for testing or something. So the last 20 years are really when coupled data assimilation became a necessity, and I'll talk about how we're going about it.

A little bit more about data assimilation as an interpolation problem. Here is a hand-drawn map; it looks like it came from some undergraduate course. The instructions are very simple: connect the points on the map with equal temperatures; separate regimes with warmer temperatures from colder temperatures. And it's pretty simple: you take all the area where the temperatures are in the 30s and color it purple, you take the areas in the 40s and color them blue, and so on.

But for model initialization it's much more complex, and here is why. You have to assimilate about 100 million measurements a day, so you cannot do it by hand. You need to initialize 3-D fields with billions of grid points, so you have to produce a gridded analysis. You also need to convert from a measured quantity, such as brightness temperature at the top of the atmosphere (that's what the satellite sees), to model prognostic variables, such as potential temperature through the entire depth of the atmosphere. And finally, you need to weight the accuracy of the different measurement systems against the forecast. To do all of that, you have to solve an optimization problem, or a linear algebra problem, depending on how you look at it.

But before we talk about all the words on the slide, let's look at this figure first. This is a simplified weather forecast, a scalar prediction problem. On the x-axis is time; this is our assimilation window. We collected our observations, the green dots, and we're making a forecast from a previous time, the blue line. When we made the forecast, it didn't go through the observations; it has errors. So what we want to do is correct the initial condition of the forecast in such a way that it fits the observations as closely as possible. And "as closely as possible" is defined by our prior guess of what the observation errors are and what the accuracy of the forecast is.

To go about it, there are two equivalent formulations. First, you can look at data assimilation as a minimization problem. You write out a quadratic cost function. We have a forecast from an initial condition, and we have our analyzed initial condition; you take the difference between them and weight it by the inverse of the forecast error covariance (I forgot to put the inverse on the slide). That specifies how far away from the initial condition you can move to generate the analysis, based on how accurate you think your forecast is. On the other hand, you have the misfit between the observations y and the forecast from your analysis initial condition, and you weight it by the inverse of R (I forgot the inverse here as well). R is your observation error covariance matrix, and it specifies how much you trust your observations.
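In standard notation, the two terms just described add up to the quadratic cost function below; here the forecast error covariance is written P (often denoted B in the literature), H is the observation operator, x_b is the background (forecast) state, and x_a the analysis:

    J(x) = (x - x_b)^T P^{-1} (x - x_b) + (y - H(x))^T R^{-1} (y - H(x))

For a linear H, minimizing J gives the closed-form analysis update, which is the "linear algebra" formulation mentioned next:

    x_a = x_b + P H^T (H P H^T + R)^{-1} (y - H x_b)

And here is a minimal sketch of the weighting idea in the scalar case, with made-up error standard deviations (the numbers are illustrative, not from the talk):

    # Toy scalar analysis in Python: one state variable, one observation.
    sigma_b = 1.0    # assumed forecast (background) error std. dev.
    sigma_o = 0.5    # assumed observation error std. dev.
    x_b = 288.0      # background temperature [K]
    y = 287.0        # observed temperature [K]

    # Gain: how far to move from the background toward the observation,
    # set by the relative trust in the forecast vs. the observation.
    K = sigma_b**2 / (sigma_b**2 + sigma_o**2)
    x_a = x_b + K * (y - x_b)
    print(x_a)       # 287.2 K: pulled mostly toward the more trusted observation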
So it's a quadratic minimization problem; there are methods to solve it, and there's a whole class of algorithms for doing so. Alternatively, you can rewrite it as a linear algebra problem: you still have your y minus H of x_a, and you have this big matrix-inverse multiplication in front of it, and you solve for it one way or the other. So there are two ways to go about it, and there are practical implications for why you would go one way or the other.

But the crucial part is that you have to specify this R, the observation error covariance, and P, the forecast error covariance matrix. You would think that specifying R would be easy: you just look up the specification for an instrument, and it tells you that it measures temperature with an accuracy of 0.2 Kelvin. It turns out that R also incorporates representation error: how well can your gridded model represent a point temperature measurement? So in practice, R is always tuned through trial and error. For P, we can usually specify it either in a parametric form, say a Gaussian correlation scale plus some physical balances such as geostrophy and vertical balances, or through ensemble simulations: you run a bunch of ensemble members and compute empirical correlations from them. And that's probably as much as we need to know about the algorithms themselves.

So how does the data assimilation problem connect to the S2S forecast? You've seen this slide a couple of times already. You have your weather forecast, which starts out very, very good, and by day 20 it's really not very useful. Then there is the subseasonal time frame, which in this figure is between week five and week eight, the second month of the forecast, and then it leads into the seasonal forecast. We know the weather forecast problem is a data assimilation problem: the quality of a weather forecast is in large part determined by its initial condition. If you read through the details, it says that subseasonal forecasts are determined by the ability to monitor the Madden-Julian Oscillation, land surface data, and other sources. So really, subseasonal forecasting is our ability to initialize the land, the ocean, and some properties of the MJO. And for seasonal, you really want to initialize ENSO correctly. So all of these forecast problems have a large data assimilation component, our ability to initialize a model state.

Now, everybody looks at this figure and says, oh, the MJO has predictability into month two. But let's actually look at our ability to predict the MJO as a function of lead time. Here we're looking at the anomaly correlation of, I think, RMM1, and we have three models: the skill of the ECMWF model, the skill of the CFSv2 model in blue, and the skill of the Navy ESPC, a new subseasonal system that the Navy developed and transitioned a couple of years ago, in black. You can see a couple of things here. First, I would like you to look at the skill at the initial time. You can see that none of the models actually has a perfect ability to initialize the MJO as an atmospheric state: none of them gets the initialization of convection, clouds, and winds correct. So there's still quite a bit of work to be done to do a better job initializing the atmosphere in the tropics. Secondly, there is a big gap between the best model, which is ECMWF, and the runners-up. So there's a lot of work that can be done on better initialization of the atmosphere.
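As an aside, the MJO skill metric conventionally used in plots like this is the bivariate anomaly correlation over both RMM components (the speaker mentions RMM1, so this particular figure may differ in detail). A minimal NumPy sketch, assuming 1-D arrays of forecast and observed index values at a fixed lead time:

    import numpy as np

    def bivariate_acc(rmm1_f, rmm2_f, rmm1_o, rmm2_o):
        """Bivariate anomaly correlation of the RMM indices across forecast cases."""
        num = np.sum(rmm1_f * rmm1_o + rmm2_f * rmm2_o)
        den = (np.sqrt(np.sum(rmm1_f**2 + rmm2_f**2)) *
               np.sqrt(np.sum(rmm1_o**2 + rmm2_o**2)))
        return num / den

A value of 0.5 is often taken as the threshold for useful MJO prediction skill.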
If you look at the extended range, week three and beyond, you can see that the biggest difference is between the first generation of coupled models, which is CFS, and the modern generation. All the modern models get about the same MJO skill, roughly right, but the previous generation really struggled to do it right. So my take is that data assimilation has a role to play on two time scales. I already talked about the initial time; that's very obvious. The second part is that you can really use data assimilation to inform how you specify model error and parameterizations, how you improve models. And this iteration between data assimilation and model development is something that will improve weeks two, three, and four.

And finally, to make a last connection between data assimilation and the MJO: if you look at this map, it's the correlation between RMM1 of the MJO and the OLR index, and you can see that a lot of the action is happening over the Maritime Continent; we talked about that today in other lectures. If you think about it, to initialize the state of the MJO over the Maritime Continent, you have to get a lot of things right. You have to get SSTs right, and if the MJO is active, you have to get SST right under the clouds, which is a whole different problem. You also have to get the land right, because you're transiting over very mountainous terrain with a lot of moisture in it. And you also have to get the whole atmospheric column right: what the clouds are doing, what the active convective state is. And honestly, we don't know how to do any of this very well; it's almost magic that we're doing as well as we are right now. So to improve MJO skill, working on our ability to initialize over the Maritime Continent is key.

What I'm going to do in the next part of the talk is go through the atmosphere, ocean, ice, and land perspectives on data assimilation. I want to talk about what the actual observations available for each of these components are, and how well we do at assimilating them, because our skill really comes from our ability to take information from measurements and turn it into the initial state. If there is no measurement of something, there's nothing you can do.

Okay, so this is an overview slide of why coupled data assimilation is difficult; it's a slide made for the Navy model. We have an atmospheric model at 19 kilometers and 80 levels. The European Centre model goes to nine kilometers, so it's about twice as refined. But the initialization is done at 100 kilometers. Well, I'll say that more carefully: you initialize a 19-kilometer model, but you compute your increment at 100-kilometer resolution. For ice and ocean, you might be resolving the ocean and ice at 1/25th of a degree, but you're only incorporating observation information at 1/8th of a degree. And the details of how you do it are very different for the atmosphere, for land, and for ice and ocean.

The other difference between the different media is that the scales differ. If you take the Gulf Stream and the jet stream as comparison points, they differ by one to two orders of magnitude in their representative scales. Observation data delays: in the atmosphere you might get data within one or two hours, while in the ocean you might be waiting 24 hours or more for the observation data to arrive. Observation coverage: you have almost complete observation coverage from polar orbiters in 12 hours.
And for specific regions, from geostationary satellites, you can get a complete picture every five minutes. But for the ocean, you have to wait 10 days for all the Argo floats to surface. The model resolutions are very different as well. So how do we go about it in each medium?

Let's look at our atmospheric observing system. There are three workhorses. First is what we can call conventional data. Conventional data usually means a thermometer or a humidity probe that we physically stick into the atmosphere. The ways for us to do that are aircraft, ships, buoys, weather balloons, sometimes dropsondes, and a lot of ground stations. Those are our basic ways of sticking a probe into the atmosphere. From a remote sensing perspective, we have radar, which is pretty sparse. But the workhorse of weather prediction from a satellite perspective is polar-orbiting satellites. What does polar-orbiting mean? The satellite goes around the globe, pole to pole, under some inclination, and every so often it circles the entire globe, viewing it through a really narrow swath; we'll see an example of that. The other kind is the geostationary satellite. If a polar orbiter flies a few hundred kilometers above the Earth, a geostationary satellite has to sit in a much higher orbit, roughly 36,000 kilometers up, so that it rotates together with the Earth and always looks at the same spot. It observes a large disk, and the benefit is that you can take a picture every five minutes. The problem is that there's only a certain set of sensors you can deploy on geostationary satellites, and because they're so much farther from the Earth, that affects their resolution.

The other type is GPS satellites. I think this is a pretty old picture, so the satellite in it is pretty big, but there is now a very large number of smallsats, about shoebox size, orbiting the Earth. They measure the time it takes a signal to travel from different GPS satellites to the shoebox, and that travel time is affected by the temperature and humidity along the path of the GPS transmission. That's kind of a new thing; it's an extremely accurate measurement, and smallsats are very affordable. So that's a different way of observing the Earth.

So let's talk about what sensors we can deploy on the satellites. Infrared is one type. The downside of infrared is that you can only see above the cloud; you cannot see through the cloud. Infrared was the first type of sensor, and it mostly shows you where the clouds are. Well, that's not quite fair: from geostationary satellites, that's what's available right now, and you can take a picture every five minutes. You can also have a lot of channels, a thousand channels, so if there are no clouds you can profile the atmosphere in very fine increments. Microwave is a more powerful tool: it can see through the cloud, but right now we can only deploy microwave sensors on polar orbiters, so you only get a complete picture every 12 hours. I already talked about conventional data and GPS-RO. So the question is how much each of these observing systems contributes to our ability to sense the Earth. In this figure, the different colors mean: tropics is green, southern hemisphere is red, northern hemisphere is blue.
Microwave is the workhorse of the observing system: even though we don't get those pictures very often, they're the most informative ones. Infrared is very informative, especially in the southern hemisphere. AMVs are atmospheric motion vectors: by looking at the five-minute snapshots of the clouds, you can see how fast a cloud is moving and convert that to a vector wind at cloud height. Conventional observations are extremely powerful in the northern hemisphere, but not so much elsewhere, and the same goes for aircraft: aircraft are extremely useful observations in the northern hemisphere.

Okay, let's switch to the ocean. Here you can see the polar orbiters, and you can see how they only observe a swath of the Earth at a time as they go north to south or south to north. For the oceans, the picture is different: the observing system is surface-dominated. We observe sea surface height with an altimeter, which is a pencil-beam radar: it shoots an electromagnetic pulse, sees how long it takes to come back, and that measures how high the sea surface is. Or we can look at the SST, using either microwave or infrared. And we have a lot of profilers, the Argo profilers, and surface drifters, and you can see the TAO array light up once in a while when it reports back. You can also see that the number of ocean observations really took off after the 2000s, once we started having the Argo array. Before 2000, most observations were surface-based, which means ships. So before about 1990, we knew very little about the subsurface ocean.

Ice. The workhorse of the ice observing system is microwave sensors; this is a picture of ice extent from a microwave sensor. The alternative for determining the ice edge is ice charts. The way people obtain ice charts is that the United States buys a whole bunch of SAR imagery from Canada, and there is a person sitting there every day with a mouse: they look at the SAR imagery, see where the ice edge is, and draw it on the map. That becomes the ice chart. You can see that in some places the ice chart is much more accurate than what we get from microwave, and the reason is that if you have water sitting on the surface of the ice, to the microwave it looks like water, but really it's ice with a water puddle on top. So the two of them work together. It's a nice thing that we have microwave imagery going back to 1979, thanks to the US Navy, who needed to sail submarines and wanted to know where the ice is.

A new type of measurement that we're trying to incorporate in our models is ice drift. It's measured much the same way as cloud drift: you look at the surface of the ice and track features, how the cracks in the ice move. Because ice moves very slowly, you can only get one of those estimates every two days. The other one is ice height, because what you really want to measure is ice thickness. You can estimate thickness from the color of the ice, which works better when the ice is thin, and sometimes from SAR as well, but the best way we have is to measure ice height: you convert the height into freeboard, how high the ice sits above the seawater, and then, from the density of ice, you know how much of the ice is underwater, and you convert the freeboard into ice thickness.
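To make that freeboard-to-thickness conversion concrete: with hydrostatic balance, and ignoring any snow load, floating ice of density rho_i in seawater of density rho_w with freeboard f has thickness

    h = f * rho_w / (rho_w - rho_i)

With typical values of rho_w around 1025 kg/m^3 and rho_i around 917 kg/m^3 (representative numbers, not from the talk), the thickness comes out to roughly nine to ten times the measured freeboard, which is why small errors in the freeboard measurement matter so much.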
Finally, the last observing system is land. We can observe the temperature of the land very well if there are no clouds; that's the first picture. We can also observe the snow extent, which is very important for forecast skill, especially in the fall when there's a lot of snow accumulating and changing the albedo. We have an extensive network of ground stations in developed countries, less so on the Maritime Continent. And finally, we can sense soil moisture. You probably hear a lot now about the impact of soil moisture on our ability to predict on subseasonal scales. Indeed, if you assimilate soil moisture, you improve your forecast, but so far this improvement has only been seen out to about day six. So we still have to figure out how to take that information about soil moisture and project it onto the subseasonal scale. NWP systems usually do not assimilate biomass properties, because biomass is really not part of the land models in NWP, but there is a different class of land models, more oriented toward climate change, where biomass is a big part.

Okay, so that was an overview of the kinds of things we can observe with a modern observing system. Now that we have fully coupled models, and now that we're talking about fully coupled data assimilation, let's see how we can exploit this system to bring in more information to initialize our coupled models. You already saw this slide on Monday; I've been using it to describe the challenges and opportunities of coupled data assimilation from an algorithmic perspective for quite a while now. What we have are systems, developed since the 1980s, that can do the atmosphere and the ocean pretty well, not perfectly, but pretty well. What we're trying to do in coupled data assimilation is use the cross-fluid correlations: if I observe the ocean, I want to improve my atmosphere right away. The other opportunity is that a lot of satellite measurements are actually sensitive to the atmosphere, the ocean, and the land at the same time, and we're not using that information very effectively right now. But in this talk I want to ask...

Excuse me, Sergei. You have about two more minutes. I don't want to rush you, but just to let you know, so that you have time for questions. Don't rush it, but get the message across so we can open it up for questions.

Oh, sorry. I mistimed my talk; I thought I had more. Apologies. I'm only about halfway through.

No, just take five more minutes and go to the initialization problem. That's really important for S2S.

Okay, so let's look at this slide. We're looking at the correlation between SST and wind speed here; that's one way to judge the strength of the coupling between the ocean and the atmosphere. And it really falls into three regimes in a global model. First, the boundary currents, where the ocean is driving the atmosphere: small perturbations in the ocean will drive an atmospheric response. That's the Gulf Stream and the Kuroshio, but really the ACC is the biggest one, plus the eastern tropical Pacific, where you have these tropical instability waves. The second regime is the tropics and subtropics, specifically where the mixed layer depth is very shallow: when the mixed layer is very shallow, a small perturbation in your convective state can translate into the ocean state very rapidly. And finally, you have the mid-latitudes.
In the mid-latitudes, you have a large seasonal change in the depth of the mixed layer, and depending on this change, you're either sensitive to the atmosphere or you're not. And you can look for the mechanisms behind these three regimes.

Over ice, we have a great opportunity to assimilate things like ice temperature and ice velocity that are currently not assimilated. Ice temperature has a strong correlation with atmospheric temperature, and ice velocity has a very strong correlation with both atmospheric and ocean velocities. By developing these new assimilation methods, we can really close the observation gap in the high latitudes. In the atmosphere, the key problem is that we're not assimilating a lot of channels that are sensitive to the land surface and the ocean surface, because we really don't know how to specify the surface very well, and the challenge is to do that better. The way Alan Geer puts it: if you look at SSMIS imagery of the surface, you can see a lot of features in it, but we're only exploiting the things circled in purple; the rest of it we're not exploiting, and that's an opportunity.

My final slide is about the importance of reanalyses and reforecasts. I think Frederic Vitart already mentioned that reanalyses and reforecasts are really the workhorse of S2S prediction. Here I'm showing precipitation scores, which happen to be for the first week. You can see that if you don't post-process precipitation forecasts, the skill is not very good, but if you use statistical post-processing techniques to enhance the skill of the forecast, you can get pretty good skill scores even in week one. Our ability to post-process really depends on how many years of statistics we have: you need between 10 and 20 years to do it for week one, and you can imagine you need even more to do it for weeks three, four, five, and six. You get this information by running a long reanalysis, on the order of 20 to 40 years, and then running a reforecast from it. And that's the importance of reanalyses and reforecasts.

And with that I'll wrap up: data assimilation is an essential part of our Earth system prediction enterprise; the migration to coupled models opens new opportunities for research in tightly coupled data assimilation; and reanalyses and reforecasts are the workhorse of S2S prediction. So thank you, and I'm sorry for running late.

Thank you so much, Sergei. And apologies: we weren't clear that the 45 minutes includes 15 minutes of discussion, because we wanted the students to ask as many questions as possible. Thank you so much. It was really interesting, and I can see you have a very vast and deep knowledge of how difficult it really is to do coupled data assimilation.