I'm delighted to introduce our next speaker, Sue Ellen Haupt. I introduced Sue before for one of the debates, but today she's going to give a talk about using machine learning for post-processing forecasts. Sue Ellen is a senior scientist at NCAR and the deputy director of RAL, the Research Applications Laboratory. She has worked a lot in boundary layer meteorology but also in large-scale atmospheric dynamics, and she is very well known for applications of artificial intelligence in the environmental sciences. Indeed, she co-authored a book about genetic algorithms in 1998, so she was really one of the first people in the geosciences, and one of the first women, to use genetic algorithms and machine learning as we think of it today. Sue, I am delighted to listen to your talk.

Thank you, Judith. I just want to confirm you're seeing the right presentation screen.

Yes, it looks right.

Great, and thanks so much for inviting me. To be honest, I've really been enjoying returning over the last few weeks to the large-scale dynamics where I started my career, and I've been able to catch a fair number of the talks. Today I'm going to talk about machine learning, and Anish specified that he'd like to know a little about what the private sector is doing. That made it a fun project, because I had a chance to poll a lot of colleagues in the private sector, who don't publish their results very often, and really do a bit of a survey of what they do. But the first part of my talk is going to cover some of the sorts of things we can do with machine learning that impact forecasts on the sub-seasonal to seasonal scales.

First, I want to return to the 1950s and 60s, when we were first deciding how best to advance weather forecasting, and two approaches were generally considered. The first was the equation-based approach: numerical integration of the equations of motion, with pre- and post-processing. But people also talked about using statistical methods to forecast. As we know, for quite some time the NWP approach won out. Yes, we do assimilation, and we do post-processing, but it's really been more recently, with the growth of computing and the emergence and popularization of artificial intelligence, that we're beginning to see the second way used more.

Now, there are some of us, as Judith said, who have been doing this for some time; it really isn't new. Working with people in the American Meteorological Society, back in 2009 we codified some of the training we had been giving into a book, Artificial Intelligence Methods in the Environmental Sciences. But I agree with Libby and most of the folks in the field that blending the physical approaches with the statistical and AI approaches really could optimize prediction.

We've seen a lot of changes over the roughly 25 years that people have been using machine learning. We've found new ways to observe: faster observations, satellite data coming in at higher resolution, and very specialized instruments in the field, for instance sky imagers for solar power that look directly at clouds. All this data comes in in real time; it really is an internet-of-things approach, and again, this is not new; meteorologists have been leading it for a long time. We can also leverage the gridded model output: that NWP model data certainly is very useful.
We've had huge increases in computer power over that time period. And then we bring in the AI and machine learning methods, which can both leverage and offer an alternative to the traditional methods. It really is a big data problem.

Now, I'm going to overview a few types of problems where machine learning is used. On Monday, Matt talked about linear inverse models, and I was glad to see how much that has advanced as computer power has advanced; we're now coupling oceans to atmospheres, doing much better than back when I built linear inverse models. But after building some linear inverse models of the climate system, I said, well, the atmosphere isn't linear. What if we could add a nonlinear term, in this case a quadratic term? After all, advection is a quadratic term. The problem is that we have ways to solve linear inverse models; we don't have ways to solve this messy tensor equation. But what if we set it up as an optimization problem, where we basically subtract the right side from the left and see how close to zero we can make it? Turn it into a minimization and solve it with a genetic algorithm. Again, I was doing this in the late 1990s and early 2000s, so you had to pick a small problem at that time. And what's one of the most interesting, really nonlinear problems? The Lorenz attractor, of course. So we pose it as this nonlinear inverse problem. If we apply a linear inverse model to it, we find, of course, that we get a decaying structure, as those types of models are built to produce. It's a slow decay, and it's in the right area of phase space, but it is decaying. If we instead use the genetic algorithm and optimize, we're able to get something that at least has a shape a little more like the butterfly attractor. It's not perfect, but it's in the right region of phase space and it's non-decaying. So that was one interesting approach: using machine learning to solve nonlinear problems.

Next, model parameterization. Libby mentioned using machine learning to speed up radiation. I want to give one more example and get at this question: if we have observations, can we build parameterizations based on those observations? One parameterization we saw as low-hanging fruit is the surface layer parameterization. These all use Monin-Obukhov similarity theory: general relationships between variables that then have to be fit empirically. Most of the fits were done using data from flat, prairie-type terrain, so why should we expect them to apply over the ocean, in complex terrain, et cetera? We have these stability functions, and when we actually plot them against data, we find the data don't fit the relationships. So we gathered data from two data sets where we had flux measurements, Cabauw in the Netherlands and the Scofield Tower in Idaho. The idea here is to fit a random forest (we also tried a neural net) to predict the friction velocity, sensible heat flux, and latent heat flux that could be used in models. I don't have time to go through the details, but let's look at some results. The Idaho test set is the top table and Cabauw the second. On the left of each table you're seeing the R-squared, which we want as close to one as possible, and the mean absolute error, which we want as close to zero as possible. So, looking at friction velocity, temperature scale, and moisture scale, the three variables we care about, we compare Monin-Obukhov similarity theory to the random forest trained on the observations.
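To make that concrete, here is a minimal, hypothetical sketch of such a fit with scikit-learn. The file name, the feature columns, and the chronological split are placeholders invented for illustration, not the study's actual predictor set or evaluation protocol.

```python
# Minimal sketch: train a random forest to predict surface-layer flux scales
# from tower observations, as an alternative to Monin-Obukhov similarity fits.
# Feature and target names here are hypothetical placeholders.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score, mean_absolute_error

# Tower data with near-surface predictors and eddy-covariance targets
df = pd.read_csv("flux_tower.csv")

features = ["wind_speed_10m", "delta_T", "delta_q", "pressure", "net_radiation"]
targets = ["u_star", "theta_star", "q_star"]  # friction velocity and scales

# Chronological split so the test period is independent of training
n_train = int(0.8 * len(df))
X_train, X_test = df[features][:n_train], df[features][n_train:]
y_train, y_test = df[targets][:n_train], df[targets][n_train:]

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

pred = model.predict(X_test)
for i, name in enumerate(targets):
    print(name,
          "R2:", r2_score(y_test[name], pred[:, i]),
          "MAE:", mean_absolute_error(y_test[name], pred[:, i]))
```

The cross-site test described next corresponds to fitting this model on one tower's data and scoring it on the other tower's.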
And indeed, we find the R-squared is higher and the MAE lower for those, as one would hope, and for Cabauw it's very similar. The interesting thing is, what if we take the model that we trained on the Cabauw site and apply it to Idaho? Indeed, we still get an improvement. So the conclusion here, and this is early work, we just submitted the first paper on it, is that random forests and neural nets can significantly outperform Monin-Obukhov similarity theory, and that's true even when we apply the model to a site different from the one it was trained on. We have also been testing putting it into the Weather Research and Forecasting model, finding that we can use it as a surface layer parameterization. There are a lot of complications in doing that, because Monin-Obukhov theory is used in several different places, so it's complex to change your model, but it's possible.

Downscaling is another really great application, and I'm going to highlight some work done by Ryan King's team at NREL that I was really quite taken with. The question here is: if we have climate model data at 100 kilometers, and we want to be able to look at the wind and solar resource at very fine scale, two kilometers, can we use machine learning to do that type of downscaling in a realistic way? They used NCAR's CCSM data for their training data and some of their in-house toolkits for the fine-scale data at two kilometers. That's a 50-times downscaling. Again, just showing some results: the climate model data at a particular time has coarse resolution, and when we blow up a particular place, it is of course very pixelated. But doing the super-resolution by 50 times, we get a realistic-looking flow that is conditioned on what's happening at that particular place and time. So how long does it take? With about 40,000 training images from CCSM, it took them about three days to train on one GPU. But once trained, you can generate about 400 images in less than five minutes, really an improvement over running high-resolution models. Both their research group and ours are now going to the next step: if we have mesoscale data, can we get large-eddy-simulation-scale data? First results are looking very promising on that as well.
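To illustrate the super-resolution idea itself, here is a toy sketch of a sub-pixel-convolution upsampler in PyTorch. It learns only a 4x refinement on synthetic arrays; the NREL system, its architecture, and its training setup are considerably more sophisticated, so treat this purely as a sketch of the concept.

```python
# Minimal sketch of CNN super-resolution for a gridded field (e.g., wind speed).
# This toy learns a 4x upsampling; the NREL work does far more aggressive (50x)
# downscaling with a more sophisticated architecture and training procedure.
import torch
import torch.nn as nn

class SuperResNet(nn.Module):
    def __init__(self, upscale=4, channels=1, width=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(),
            # Sub-pixel convolution: emit upscale**2 values per coarse pixel,
            # then rearrange them into a finer grid.
            nn.Conv2d(width, channels * upscale**2, 3, padding=1),
            nn.PixelShuffle(upscale),
        )

    def forward(self, x):
        return self.net(x)

model = SuperResNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Placeholder batch: coarse fields and their matching high-resolution "truth"
coarse = torch.randn(8, 1, 25, 25)   # e.g., a coarse-grid patch
fine = torch.randn(8, 1, 100, 100)   # e.g., the matching fine-grid patch

for step in range(10):               # a real run trains for many epochs
    opt.zero_grad()
    loss = loss_fn(model(coarse), fine)
    loss.backward()
    opt.step()
```

The sub-pixel (PixelShuffle) trick lets the network do most of its work on the cheap coarse grid and only expand to the fine grid at the last layer, which is part of why generation after training is so fast.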
Now, let's move into real research-to-operations problems, moving toward what is provided to clients. First, let's think about what it takes to do these R2O problems. We start with an end user that has a need, while the basic research community has been doing research on a certain type of weather, S2S, or climate prediction. They're far apart, and we're not meeting the end user's needs. How do we get there? Applied research is the first step, and in our case, as in most cases, you need somebody willing to fund the process. You have to have data monitoring in the field and some real-time computing and prediction capabilities. But then there's this important extra piece: translating the results and really connecting with the end user.

An example is a project we had from the Department of Energy a few years back, under their SunShot program, for solar power forecasting. It was really a public-private-academic partnership, where we worked with utilities and independent system operators who needed information to blend solar power into the grid. DOE provided the funding and NCAR was the lead, but we brought in a host of other national labs and universities. NOAA was part of the project; they were going to handle part of the tech transfer piece, and so were many of the companies in the private sector that participated. It really took this whole group working together to make an impact. We found that a lot of what we were doing was post-processing model output, as well as real-time data, for the application. The flowchart of the final system looks something like this: we had lots of real-time observations coming in, data in the field, standard met observations, satellite data, and specialized instruments like total sky imagers, some of them located directly at the solar plants. We had the NWP models, such as all the NOAA models that are run on a daily basis, but we also used a lot of AI methods; the components highlighted in red are all AI methods that were important, including a power module. And we were able to output probabilistic power forecasts. The private sector has taken some of the results of our research and now uses some of those techniques in their real-time predictions. At NCAR, we have gone on to continue research on systems like this, working most recently with the Kuwait Institute for Scientific Research to forecast both wind and solar power, again combining physical models, like the ones highlighted in blue here, with artificial intelligence models to make real-time predictions that help them integrate renewables into the grid.

Now, what about deep learning? We've heard about all this deep learning; can we use it for post-processing as well? We certainly can. I want to talk about a project led by David John Gagne back when he was a postdoc here at NCAR, using a convolutional neural network to predict the probability of hail greater than 25 millimeters, a real, actionable problem. The interesting part about his work is that not only did he do this forward hail prediction, he also backpropagated the error to update the input image toward producing hail. This is what made it interpretable deep learning: he could identify the patterns of storms that produced that large hail. Looking at three different levels, 500, 700, and 850 hPa, one identified pattern is rather common and what one would expect: confluent moisture at the lower levels, veering as we go up, giving a little bit of tilt to the pattern; it makes sense. But that wasn't the only type of pattern he identified. He also got this really interesting dipole pattern that persisted throughout all the levels, one associated with graupel and hail seeding. And doggone, if you look in the literature, Andy Heymsfield identified this as a real hail-producing pattern back in 1980. And in answer to David, who asked whether this was observations: yes, this was trained on model data, so it's perhaps not surprising that we found structures we know are in the models. The impact of using these convolutional nets is that they can produce more skillful hail forecasts than other types of models, they encode realistic storm information, and their internal representations can enable more sophisticated analysis of weather and climate data.
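That step of backpropagating to update the input image can be sketched generically as gradient ascent on the input of a trained network. This is not Gagne's actual code or architecture; the little CNN below is a runnable stand-in so the mechanics are visible.

```python
# Sketch of interpretation by input optimization: starting from an input
# field, nudge it via backpropagated gradients so the trained network's
# predicted hail probability increases, then inspect the resulting pattern.
# The network here is a placeholder, not the architecture from the study.
import torch
import torch.nn as nn

model = nn.Sequential(              # stand-in for a trained hail CNN
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, 1),               # logit for P(hail > 25 mm)
)
model.eval()
for p in model.parameters():        # freeze weights; only the input changes
    p.requires_grad_(False)

# Input: 3 "levels" (e.g., 500/700/850 hPa fields) on a 32x32 patch
x = torch.zeros(1, 3, 32, 32, requires_grad=True)
opt = torch.optim.SGD([x], lr=0.1)

for step in range(200):
    opt.zero_grad()
    loss = -model(x).mean()         # ascend the hail logit
    loss.backward()
    opt.step()

# x now holds a synthetic pattern the network associates with large hail
hail_pattern = x.detach().squeeze(0)
```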
And Will, I hope you don't mind, I love highlighting your example on atmospheric rivers. He used integrated vapor transport, a tracer of atmospheric rivers, with the MERRA-2 reanalysis as his truth dataset and forecasts from the Global Forecast System, and applied a convolutional net. The interesting thing, and I like looking at his difference plots, is this: start with the middle-right panel, GFS minus MERRA-2. We find that a primary source of error is a phase shift in the data, and that's something we're not able to capture when we post-process at individual points. But doggone if his convolutional net didn't capture that phase shift. It's not perfect, but it's the first time it really struck me that we can correct a phase shift using deep learning, and it gives me a lot of hope for what we can do in the future.

As Anish pointed out last week, a group of us who went to a workshop in Oxford in late 2019 archived a series of benchmark datasets that you can pull down and use to try your own post-processing methods. We archived Anish's MJO and PNA ensemble data that he talked about last week, Will's IVT data, as well as a couple of datasets on temperature and road conditions from Europe. So I encourage you to try those datasets. We also wrote a paper about our thoughts on post-processing.

Now to the question Anish first asked: what is the private sector doing for post-processing? I found, by polling several different companies, that there are some commonalities in their approaches. First, what they care about is whether the variable of interest is above average or below average, often with a tercile for near-average as part of the forecast. And as you see in Jan Dutton's plot from World Climate Service here, they do not expect a perfectly behaved PDF; we know we have shifts. This one shows just a shift of the mean, but the terciles are not necessarily equal. Most companies start with some sort of model output, and they very much prefer ECMWF output. They downscale or interpolate to the points of interest, whatever is useful, train some sort of machine learning method (most are pretty vague about exactly what they do), and then issue forecasts framed in terms of client needs. That's the interesting part: they have to understand what the client needs. Again from Jan: they judge the value of their forecasts with some function that relates to the type of decision the client is making.

UL used to be called AWS Truepower. John Zack told me that their wind and solar clients do want forecasts one month out, but they're not willing to pay for them, so it's a bit of a frustration for them, hard to justify the research they want to do. So they basically extract point and area values from CFS and ECMWF output, but they are doing more research on the extended range, using the indices we've heard about over the last few weeks. They particularly like ENSO, PDO, NAO, and AO. They use NWP to derive the indices; sometimes they can get them directly, sometimes they have to derive them. They use observed correlations to translate the indices to wind and solar, often time-lag correlations. And they find, as we would expect from what we've heard in the last two weeks, that skill is episodic: if there's not a strong ENSO signal, they may not have as good skill.
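That index-and-lag-correlation workflow is easy to sketch. Here is a hypothetical illustration with pandas; the file, the column names, and the crude regression-style translation at the end are all stand-ins, not UL's method.

```python
# Sketch of time-lag correlation between a climate index (e.g., an ENSO
# index such as Nino 3.4) and a site's monthly wind-speed anomaly.
# File and column names are hypothetical placeholders.
import pandas as pd

df = pd.read_csv("monthly_series.csv", parse_dates=["month"])
# expected columns: month, nino34, wind_anom

best_lag, best_r = None, 0.0
for lag in range(1, 13):  # index leading the site anomaly by 1..12 months
    r = df["nino34"].shift(lag).corr(df["wind_anom"])
    print(f"lag {lag:2d} months: r = {r:+.2f}")
    if best_lag is None or abs(r) > abs(best_r):
        best_lag, best_r = lag, r

# A crude "translation": a regression slope implied by the best correlation
lagged = df["nino34"].shift(best_lag)
slope = best_r * df["wind_anom"].std() / lagged.std()
forecast_anom = slope * (df["nino34"].iloc[-1] - df["nino34"].mean())
print(f"best lag {best_lag} months, outlook anomaly {forecast_anom:+.2f}")
```

Note how naturally the episodic-skill point falls out of this: when the index sits near its mean, the implied anomaly is near zero and the outlook carries little information.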
Andrew Anunzio, of Second City Weather Consulting, forecasts a lot of one- and two-week temperature anomalies for energy trading companies. He thinks in terms of heating degree days, looks at the patterns coming out of the ECMWF models, and comes up with probabilities that the temperature will be greater than normal, the inverse being less than normal. Here is a particularly interesting case he did early this year. He uses 50 different signals, some of which he has derived himself, learning from his own mistakes. He uses EOFs, random forests, and recurrent neural networks, and he built his model on 40 years' worth of reanalysis data. And I love his quote: machine learning is a great tool to avoid missing the easy forecasts and to help make the hard ones easier. Again, that recognizes episodic skill.

John Williams is at the Weather Company, bought by IBM, I guess, several years ago now. They do both deterministic and probabilistic forecasts, and the first step is to calibrate the model output. They do S2S forecasts for temperature and precipitation. He showed a particular use case where an energy company in Australia wanted to know the number of hot days, defined as temperatures exceeding 35 degrees C, which helps them decide how to plan for peak demand. They plot the maximum temperature at a couple of different locations. In this particular case, the normal at location X was 7.4 days for both the 30-year and 10-year normals; location Y was more variable. Their forecast showed a much higher probability of hot days, and yes, that did verify. They use the ECMWF ensemble, bring in data from all 50 members, and apply a technique they call heteroscedastic censored logistic regression.

At Vaisala, Eric Grimit pointed me to a talk he gave at AMS in 2017. Vaisala does some pretty robust seasonal forecasting. They bring in climate data, looking at 15 teleconnection indices with time lags between 2 and 12 months. They actively use NOAA's SST linear inverse model forecasts, which provide six of their features, as well as ECMWF's seasonal forecasts, four features. The first thing they do is look at the correlations between what they're bringing in; they want to be careful not to overtrain. Again, they have a renewable energy motivation, and they put a lot of effort into producing useful averaged datasets. They used 432 months of data, divided into 312 training months and 120 testing months. The benchmark they compare against is ENSO probabilities and a weighted resampling of monthly climatology. They considered a lot of things and tried various machine learning approaches, settling on random forests.

So how are the results? For monthly results, one month ahead, the blue is their forecast and the yellow is their benchmark, and you see they consistently beat the benchmark; at all but one location they beat 50%. The interesting thing is that monthly they can get as high as about 70% for particularly predictable cases. At three months they do better: the random forest consistently shows skill between 62 and 77%. Including more climate indices improves the forecast, and the solar anomalies, which I'm not showing here, were even better than the wind anomalies.
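The overall shape of that experiment, lagged teleconnection features, a chronological 312/120-month split, a random forest, and a climatology-style benchmark, can be sketched as follows. Every name here is a hypothetical placeholder rather than Vaisala's proprietary setup, and the benchmark is a deliberately simple stand-in for their weighted resampling.

```python
# Sketch of a seasonal-forecast experiment in this style: lagged
# teleconnection indices as features, a chronological 312/120-month split,
# a random forest, and a climatology baseline. All names are placeholders.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

df = pd.read_csv("monthly_indices.csv")   # one row per month
indices = ["nino34", "pdo", "nao", "ao"]  # stand-ins for 15 indices

# Build lagged features: index values 2..12 months before the target month
for name in indices:
    for lag in range(2, 13):
        df[f"{name}_lag{lag}"] = df[name].shift(lag)
df = df.dropna()

features = [c for c in df.columns if "_lag" in c]
target = "wind_above_normal"  # e.g., 1 if the monthly wind anomaly > 0

train, test = df.iloc[:312], df.iloc[312:432]  # chronological split

model = RandomForestClassifier(n_estimators=300, random_state=0)
model.fit(train[features], train[target])

accuracy = model.score(test[features], test[target])
baseline = max(train[target].mean(), 1 - train[target].mean())  # climatology
print(f"random forest: {accuracy:.0%}, climatology baseline: {baseline:.0%}")
```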
Now, Jan Dutton and World Climate Service. Their short-range forecasts are done by Prescient Weather, but they are well known for seasonal forecasts. I worked with Jan at a summer school a couple of years ago in Shanghai, talking about what they do, and he gave me some information to update it. They have a variety of clients and focus on prediction of seasonal temperatures. The top-left plot shows how they blend the dynamical models with historical forecast data using intelligent post-processing. A key component in translating the results is the web tools and APIs they provide, configured specifically for each user's needs.

The top middle diagram shows the workflow for producing the forecasts. They use historical observations and reanalysis data to train a process; when current observations become available, those are of course assimilated into a model, so they get the model data as well; then they apply what he calls the secret sauce, and it is machine learning, to produce their S2S predictability products. The method includes clustering the predictand, in this case two-meter temperature, to reduce dimensionality and capture regional variability on weekly time scales. They search for antecedent global variables from one week to two months prior, and they condition on climate indices, such as ENSO, which really does enhance predictability. Jan notes that working carefully with the data is very important: they take a lot of care to include analysis of temperature trends, and the variability in the trend is of the same order of magnitude as the variability they're trying to predict. So that is taking climate change into account.

Now, I want to finish with some work being done in Europe under the Horizon 2020 programme. The SECLI-FIRM project is a collaboration between universities and companies that examines the added value of seasonal climate forecasts for integrated risk management, and I was able to get hold of at least part of a report that was submitted just in the last couple of weeks. They've done multiple projects in this report and tested various post-processing methods for seasonal forecasting, using both multi-model and ensemble combinations. Starting with the tree-based regression system built by the company UL: they use lots of indices. You can see in the table on the right that some are standard indices they get from the Hadley Centre, some are from NOAA, some are SSTs, and there are the AMO and PDO; you can look through all the interesting indices they used. They applied a random forest. Unfortunately, their results were not what they had hoped for, but they did note that the period they tested did not have a strong ENSO signal, so it was a low-predictability case.

At the same time, the University of East Anglia and KNMI in the Netherlands were comparing the S2S predictions of a dynamical model suite, multiple linear regression, and random forests. They were looking at two-meter temperature and precipitation, divided between February-March-April and June-July-August. The predictor suite they used for the statistical and machine learning models included persistence, trend, cumulative precipitation, and large-scale indices, as we see below. Note that they used CO2-equivalent forcing to capture the long-term trend: they detrended, and then at the end of the forecast they added the trend back in.
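That detrend-then-add-back step is worth making concrete. Here is a minimal sketch under the assumption that the trend is modeled as a linear function of a CO2-equivalent forcing series; the arrays are fabricated placeholders, and the study's own trend model may differ.

```python
# Sketch of detrending with a CO2-equivalent forcing regressor: remove the
# forced trend, let the statistical model predict the residual anomaly, then
# add the trend back into the final forecast. Data here are placeholders.
import numpy as np

years = np.arange(1980, 2020)
co2eq = 350.0 + 2.0 * (years - 1980)                     # fake forcing series
t2m = 15.0 + 0.02 * co2eq + np.random.randn(years.size)  # fake temperatures

# 1. Fit the trend as a linear function of CO2-equivalent forcing
slope, intercept = np.polyfit(co2eq, t2m, 1)
trend = slope * co2eq + intercept

# 2. Detrend: the statistical/ML model is trained on these residuals
residual = t2m - trend
predicted_residual = residual.mean()   # trivial stand-in for the real model

# 3. Add the trend back in for the target year's forecast
co2eq_target = 350.0 + 2.0 * (2020 - 1980)
forecast = (slope * co2eq_target + intercept) + predicted_residual
print(f"forecast for 2020: {forecast:.2f} degC")
```

This matters for exactly the reason Jan Dutton noted: when the trend's variability is the same order of magnitude as the signal being predicted, leaving it in (or out) changes the answer.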
Now, looking at results in the top-right figure: yellow shows where the dynamical model is best, pink the random forest, and dark blue the multiple linear regression. And it's a little disappointing, because the dynamical models do provide the best individual forecasts, and this was computed by testing at points all over the globe. In general, a multi-member combination of the dynamical models was best. But that's not the end of the story. They then asked: if we do an optimal weighting of the dynamical and statistical models, five models at a time, how many of those models are statistical or machine learning? On the bottom they're plotting that count, where no such models is the dark yellow-orange color, going up through brighter orange to pink and purple as we get more. And we see there are areas, including the Eastern US and Canada and a lot of the populated areas of Europe, where it really does provide a lot of value to include statistical and machine learning models in the forecast.

So, in summary, machine learning really does advance applications in weather, climate, and S2S. It's becoming a necessary component of operational private-sector forecasts at the S2S scale as well as the weather scale. We see that interpretable deep learning is poised to make some real inroads. The private sector does use machine learning successfully. And, reiterating the point that Libby started with, combining AI with our physical knowledge and blending in NWP really is the best approach we've been finding to advance applications in environmental prediction. So thank you.

Thanks very much, Sue Ellen, a very comprehensive talk. It was nice to have the private sector applications in there and an overview of those. Thank you so much for gathering all of that, because often, yeah, you need to know the people and talk to them.