Our next speaker is Sue Ellen Haupt. Sue is an NCAR Senior Scientist and the Deputy Director of the Research Applications Laboratory, RAL, at NCAR. Sue also gave a great talk at the colloquium a couple of weeks back. Thanks, Sue, we look forward to your talk.

Okay, I want to confirm that you're seeing my slides.

Yeah, it's full screen, but it's cutting off at the side edges for me.

Okay, I clicked, did that do anything?

Yeah, this works.

Great, thanks, Anish, and thanks for inviting me. We've heard lots of talks this week and during the summer school about advances in physics and all the different components that impact S2S prediction. Today I'm going to emphasize some of the tools that can make our lives easier as we try to advance that physics, specifically machine learning and verification systems.

We'll start with AI and machine learning as a way of thinking about post-processing. This part of the talk will be very synergistic with what David John presented earlier this morning. I want to talk about some work that emerged from a workshop I attended in October 2019, when we were still attending things in person: the Oxford Conference on Machine Learning for Weather and Climate. As part of that, we had several breakout groups, and I had been asked to lead group four, which emphasized post-processing. The idea was to think about how we can use post-processing effectively and what we can do to push it into more operational use.

The premise is that if models have systematic biases, we can identify them; and if we can identify and quantify them, of course we should be able to correct for them as well. Some of the earliest methods were developed back in the 70s, model output statistics, Glahn and Lowry, but those methods have evolved, and we're now using machine learning approaches. There was a question about that in David John's talk, and really it's become ubiquitous: machine learning is now part of most forecasting systems, at least for applications. I talked a bit at the summer school about what industry is doing with it. We can do ensemble calibration with machine learning. We can do regime-dependent prediction by separating into EOFs, using K-means clustering, self-organizing maps, et cetera. And we can directly quantify the uncertainty with machine learning. Some folks, like Vladimir Krasnopolsky, have built artificial neural network ensembles directly, and folks like Luca Delle Monache have popularized an analog ensemble approach, where you build the ensemble from past predictions of a high-resolution model.

Here I'm showing a diagram of a typical method where we bring in a multi-model ensemble, correct each member with a MOS-like approach using either machine learning or multivariate statistics, blend them optimally, and bring in additional predictors. There are lots of things you can do to improve prediction within this framework. And applications often drive the post-processing. A couple of weeks ago, when I talked about what industry does, I noted that they have clients who want a better forecast than they can get from the national centers, and of course they bring in machine learning, though there are obviously researchers working on that too.
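Just to make the MOS idea concrete before I go on: a correction in its simplest form is only a few lines. Here's a minimal sketch, with made-up numbers standing in for a real training set of past forecasts and matching observations; this is illustrative, not anyone's operational code.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training set: each row is one past forecast case.
# Columns: raw model 2 m temperature (K), 10 m wind speed (m/s), lead (h).
X_train = np.array([[271.2, 4.1, 24.0],
                    [268.9, 6.3, 48.0],
                    [274.5, 2.0, 24.0],
                    [270.1, 5.5, 72.0]])
y_train = np.array([272.0, 270.3, 274.9, 271.5])  # matching observations

# MOS in its simplest form: regress the observations on the model output,
# then apply that fitted relationship to correct new forecasts.
mos = LinearRegression().fit(X_train, y_train)

X_new = np.array([[269.8, 5.0, 48.0]])  # a new raw model forecast
print(mos.predict(X_new))               # bias-corrected 2 m temperature
```

In practice you'd fit on years of forecast-observation pairs, often by station and season, and you can swap the linear model for a random forest or a neural network without changing the workflow; that's essentially the machine learning version of MOS.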
A few years back, we were asked to do some machine learning as well as model improvement for solar power forecasting, and through a project with DOE we really did blend the machine learning with the physics improvements to come up with better predictions for a particular variable, one that can't easily be assimilated directly. When we merge the information that way, we end up with user-relevant information, which is what we often strive for: an end user can make decisions, and we can support them with these machine learning post-processing approaches. David John talked about using physics constraints in writing the loss functions; I'm not going to go into that in more detail, since he did a great job of explaining it.

We always get asked what happens with long-term forecasts when you have trends. You can basically work in terms of anomalies on a detrended database, then add the trend back in, as we did for a project a few years ago looking at climate impacts.

Deep learning is really showing advances in post-processing, and I love showing this example from Will Chapman and his team of advisors. They trained a convolutional neural network on top of the GFS to mirror reanalysis data and found that the big correction was a phase shift. By implementing that phase shift with the convolutional network, which is really hard to do with other methods, they significantly improved prediction of atmospheric rivers, as measured by integrated vapor transport.

Additionally, these post-processing approaches can provide substantial cost savings. For example, with the analog ensemble I mentioned, Luca Delle Monache wrote a paper showing the huge computational savings you can get by building an analog ensemble that verifies in a statistically reliable way, which lets you put your computational budget into a higher-resolution base model so you don't have to run as many ensemble members.

So the team talked about what is needed to really move machine learning post-processing forward for weather and climate. First is trustworthiness, and again, David John talked about this: you need proper benchmarking, and also interpretability, going beyond the black box to explainable methods, things like input permutation for feature importance (I'll sketch that one in a moment), saliency maps, and backward optimization. I love showing this example of David John's on predicting large hail, where saliency maps and backward optimization identified this dipole structure as one of the structures that produces large hail. Well, is that realistic? And gee whiz, Andy Heymsfield back in 1980 identified this dipole structure as a feeder-seeder mechanism that does produce large hail.

We also need data to be usable, bringing in the FAIR principles we heard about earlier in the workshop: findable, accessible, interoperable, reusable. We really need to push that. Applying the right methods to the right problems is very important: we can't just bring in our favorite method every time, and we can't just try something new that doesn't apply. And it's important to quantify the uncertainty in addition. I thought Marika's talk was really great in looking at predictability in terms of whether our spread is less than our signal. These are very important aspects.

So the next thing the group addressed was a roadmap: what are the actionable things we can do to achieve this vision? And the first thing we decided is that we really do need data repositories.
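Coming back to input permutation for a moment, here's the sketch I promised. It's a toy example with synthetic data and a generic scikit-learn model standing in for whatever you've trained: shuffle one input at a time and see how much the error grows, and the features the model actually relies on will degrade the score the most.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

# Synthetic data: three predictors; the target depends mostly on column 0.
X = rng.normal(size=(500, 3))
y = 2.0 * X[:, 0] + 0.2 * X[:, 1] + rng.normal(scale=0.1, size=500)

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
base_error = mean_squared_error(y, model.predict(X))

# Permutation importance: shuffle one feature at a time and measure the
# increase in error. (Ideally do this on held-out data, not training data.)
for j in range(X.shape[1]):
    X_perm = X.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])
    err = mean_squared_error(y, model.predict(X_perm))
    print(f"feature {j}: error increase = {err - base_error:.3f}")
```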
There are lots of people out there applying machine learning to different problems. Some of them do a great job, most of them do, but some folks are trying it for the first time: they pull a Python library off the shelf and just start training without really knowing what they're doing. They really need benchmarks to compare against. So having a data repository will facilitate improving post-processing techniques and standardizing data, again getting back to the FAIR principles, bringing in these interpretability methods, and enabling broad research.

David John mentioned the AI Institute for Research on Trustworthy AI in Weather, Climate, and Coastal Oceanography. It's led by Amy McGovern at the University of Oklahoma, with lots of partners: agencies like NOAA, universities, and industry, including some of the big players like IBM, Google, NVIDIA, et cetera. Having a group like this making advances can really change how we use machine learning for post-processing and other applications.

Metadata on our data is also part of the FAIR standardization: standardized metadata for understanding what we have, including quality-controlled data and labeled training data, is very important for advancing machine learning post-processing. And finally, building a database of failures of AI. So many of us are reinventing the wheel, trying something that somebody else already found didn't work. What a waste of time. We can do better if we communicate our failures as well as our successes.

So the group decided to put our effort where our advocacy was, and we built the repository we were advocating for. You can go to this GitHub at NCAR, with the data stored at UCSD. Anish contributed his MJO and PNA examples. Will contributed the integrated vapor transport for the atmospheric river examples. We have ECMWF two-meter temperature ensemble data over Germany and UK surface road conditions. So for those of you who want to try this out, go to this data repository. There are Python assessment tools, and we tried to make it really easy to access the data, try your own methods, and compare against the benchmarks on the site.

Now, for the second part of my talk, I want to highlight some tools for evaluation, specifically the Model Evaluation Tools, MET. This is part of the Developmental Testbed Center, which is a collaboration between NCAR, NOAA, some DOD agencies, and multiple universities. I've boldly stolen some slides from Tara Jensen and her METplus team, from a recent talk she gave on how their applications are now going into the S2S realm.

MET, the basis of all this, is really a composite of over a hundred traditional statistics and diagnostic methods for both point and gridded datasets. There are multiple interpolation methods, it can be applied across spatial and temporal scales, and it's built for easy sharing of configuration files, with lots of users both in the US and internationally. It works with GRIB, GRIB2, and NetCDF formats, for gridded or point data. METplus is a suite of Python wrappers around that core; there are viewing systems and Python plotting scripts, and you can merge as much of your own code with as much of METplus as you wish. It has traditionally focused on short-range weather, medium-range weather, and ensemble prediction, with lots of variables, and, as I say, it recently added capabilities for S2S. It allows spatial analysis, and there are neighborhood methods for upscaling and downscaling.
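To give a feel for what a neighborhood method does, here's a toy numpy sketch, not MET's actual code, of upscaling a field by block-averaging, which is the kind of operation those methods are built on.

```python
import numpy as np

def upscale(field, n):
    """Block-average a 2-D field over non-overlapping n x n neighborhoods."""
    ny, nx = field.shape
    field = field[:ny - ny % n, :nx - nx % n]  # trim to a multiple of n
    return field.reshape(field.shape[0] // n, n,
                         field.shape[1] // n, n).mean(axis=(1, 3))

# Hypothetical high-resolution precipitation field upscaled to 4x coarser
# neighborhoods, so forecast and observed coverage can be compared at a
# scale where small placement errors are forgiven.
precip = np.random.default_rng(1).gamma(0.5, size=(100, 100))
print(upscale(precip, 4).shape)  # (25, 25)
```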
There's masking, so you can separate out night and day, or pull out the specific latitudes you're interested in for your particular problem. There are ensemble methods where you look at scores: CRPS, ranked probability score, probability integral transform, all kinds of spread-skill measures and confidence intervals, as well as the more traditional contingency-table statistics. There's a big MODE application here; MODE is the Method for Object-based Diagnostic Evaluation, which lets you identify objects in your model data and then look at how they change over time, how they morph, stretch, move, et cetera. There's also a large set of capabilities for studying tropical cyclones, and for interpolating to a common grid so you can compare models and observations very easily.

Now, some of their diagnostics, if I can get this to advance, are generally useful. For S2S they're now bringing out multivariate distributions, multivariate objects, spatial distributions of errors, space-time coherence spectra, and analyses of systematic errors using EOFs, all as part of these tools. Again, it's very easy to apply and to merge your own Python code with this downloadable, open-source METplus.

I'm going to highlight two basic examples. The first is related to the MJO, using the outgoing-longwave-radiation-based MJO index, or OMI, where you can use either models or observations of OLR. In the pre-processing steps you may choose to look at a particular section of your grid; you could cut the domain to the equatorial band, for instance. Then the calculation will automatically filter for 26 to 90 days, regress the OLR onto the EOF patterns, retaining the frequencies of the MJO, and normalize the principal components. A possible output is a phase diagram such as this one, where you can look at the evolution in time of the different phases of the MJO in terms of OMI.

That's not the only index; they also have the real-time multivariate MJO index, or RMM, which in addition to OLR includes both the 850 and 200 hPa wind fields in the basis for your EOFs. Again, you can cut out the part of the grid you're most interested in. The calculation removes the previous 120-day mean so that you're looking at anomalies, normalizes by the square root of the variance, regresses onto the EOFs, and normalizes the principal components by their standard deviations. Again, you get nice phase plots of the evolution of the MJO, and you can compare predictions made at different times, shown here in different colors. It's a great way to easily put in your own data. They do have example data in there, based on NOAA's UFS along with some observations, so you can go in, try it, and build your own diagnostics on top of what's there; you can download the code and edit the Python to do whatever you want.

The other example I'll mention is blocking. They've looked at various quantities you might want to analyze about blocks, specifically the central blocking latitude, defined here from the high-pass-filtered field, smoothed over nine points and weighted by the cosine of latitude. Then there are also instantaneous blocking latitudes, grouped instantaneous blocking latitudes, frequency of blocking, et cetera. Again, you can input models or observations, or both, and it's best to have lots of years if you're going to analyze blocking statistics. You can regrid to whatever you like; here's an example with one-degree data. And you can look at running means, the default is five days, where the idea is to compute anomalies.
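Schematically, that pre-processing amounts to something like the following toy numpy sketch. This is not the METplus code, and a real calculation would use a proper day-of-year climatology rather than the simple long-term mean I'm using as a stand-in.

```python
import numpy as np

def running_mean(z, window=5):
    """Centered running mean along the time axis (default five days)."""
    kernel = np.ones(window) / window
    return np.apply_along_axis(
        lambda series: np.convolve(series, kernel, mode="same"), 0, z)

# Hypothetical daily 500 hPa height field with toy (time, lat, lon) sizes.
rng = np.random.default_rng(2)
z500 = rng.normal(5500.0, 100.0, size=(365, 90, 180))

climatology = z500.mean(axis=0)             # stand-in for a real climatology
anomaly = running_mean(z500) - climatology  # smoothed anomalies for blocking
print(anomaly.shape)                        # (365, 90, 180)
```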
And it automatically computes each of those four quantities and plots the CBLs, the IBLs, or the actual blocked events. As an example, for the instantaneous blocking latitudes, they've implemented identification criteria from the Pelly-Hoskins method, which looks for reversals in the meridional geopotential height gradient, allows an offset from the central blocking latitude, and looks for easterly equatorward flow. They actually pulled the algorithm from Barnes et al., and I've copied in the equation that's used: a point is defined as blocked if that function is greater than zero. You can plot the 500 hPa height average, as here, and you can see the important blocking latitudes over Europe, as we know. When you look at the frequency distribution, again it's not surprising that the highest frequency is just west of Greenwich, and the second highest, if you look at this, is just off the west coast of the US. So it's very useful for identifying blocks in your data. You can also plot and compare the grouped instantaneous blocks as well as the blocked events, and it writes out MET files so you can analyze further with your own tools. I encourage you to go look at this; you can download it from GitHub. I'm not the best person to answer questions about it, but Tara loves to respond to user questions, so I've put her email on the slide.

Just in summary, these tools are really making it easier to make progress in many areas: weather, climate, S2S. On the machine learning side, we're finding that interpretable deep learning is making inroads, and data archives can really help advance the science. And these rigorous, easy-to-use validation tools mean you don't have to reinvent the wheel and code the same validation routines; they have been highly vetted, debugged multiple times, and are used by 3,500 users, so I think it's easy to count on them being accurate. Thank you.

Thanks very much. So we heard about this really interesting work, both the machine learning side and also the METplus tools that can be used. Chidong, you have a question for Sue?

Yes, I'm very glad you brought up METplus. I was very enthusiastic about these tools until this summer. I had three interns, one a computer science major, and the graduate students, who all worked together and tried to learn METplus in a month, and we could not figure out how to use it. We feel like we must have missed some critical steps, but we're clueless. So I just wonder what kind of advice you can offer so that maybe next summer we don't repeat this failure.

Wow, that's too bad, and I'm sorry to hear that. Now, they do have METplus workshops on an annual basis, and if you could send somebody on your team to a workshop, and I'll be glad to get you information on how to sign up, maybe that will help in the future. The other thing is to directly engage the METplus team, and maybe they can help you jumpstart and get going a little more quickly. That's what they do: they engage with users. So instead of letting your interns spin their wheels, I really encourage you to have them reach out to the METplus team and get some good hints on how to get started, how to download the data, the tools, et cetera, so that they can be off and running much more quickly.

Thanks. Thanks, Chidong. Judith, you have your hand raised. I'm putting a note to connect you with Tara.
Thank you very much, Sue Ellen. I have a machine learning question. It's a very fundamental one, and I would be interested in your perspective. I think we have made great progress in estimating model errors, or errors in post-processing, and in improving forecasts in a post-processing step. Where I'm struggling is with how we can use machine learning to find unrepresented processes, or erroneously represented processes, in our models. I was wondering if you could comment on what the first steps are to do more along these lines. Libby has been talking about these relevance maps, and I understand those, but I wanted to get your perspective on it.

Yes, and I think you're really getting at the edge of research in collaboration between the modelers who are writing the physics routines and the machine learning folks. That's why a lot of progress is being made by people who were trained in physical science and then learned the machine learning in parallel or after the fact, like I did, because when I was in school there weren't machine learning courses. I think David John's paper was a great example of how saliency maps and backward optimization can identify features in the models. But it was based on model data, and we often do our work on model data because it's easy to get nice gridded data from the models. What we often don't have is observational data, and to recognize what's wrong with our models we probably want to do it in terms of observations. But even to grid our observations we often use reanalyses, which automatically put things in terms of the model physics. I think we need to get to the place where we trust our observations well enough that we can work directly from observational data. Now that we have GOES-16 data, and equivalent satellite data from across the world, if we can find variables that we can composite over the globe from the satellite data, perhaps we can get at working really with observations and then comparing to our models. Not that there aren't problems with observations as well, including the satellite data, but to really work with all three at once, the observational data, the model data, and the machine learning saliency methods after we've trained for whatever type of prediction we want, identifying features just like these METplus tools do, is really the way forward.

Thanks, Judith. Thanks, Sue. Jacqueline, do you have a question? I saw your hand raised and then lowered.

Oh, my question was very similar, but I would like to add something related to that collaboration. I know the machine learning community is growing, and there is also the dynamical modeling community. I was wondering whether there is an ongoing effort at collaboration between these two groups, in a way that machine learning can help improve the dynamical models, with feedback between the two communities. Because when some of the students were talking about this, we were afraid that people are just going to use machine learning only and forget about the dynamical models, and we have to be aware of that issue. So I was just wondering if there is an ongoing effort.

No, you bring up an excellent point. We learned a lot about dynamics over the years.
I mean, back in the 60s, in the days of Lorenz and Leith, they were trying to figure out the best ways forward using dynamical models or what they then thought of as statistical learning, really before we were talking much about machine learning. The numerical methods won out because that's where the computer power was applied; we had more computer power than data at the time. But a lot of the people who got into machine learning, like myself, were originally modelers. I still run modeling projects in addition to machine learning projects, and there are a lot of people out there like that. NOAA in particular has folks working closely between the modelers and the machine learning people. At NCAR we do that too; at CGD there are examples like Katie Dagon, who applied machine learning to facilitate interpreting her climate model data. David John has shown examples, and that hail example was based on using model data to interpret the model. So you're absolutely right: getting the model people together with the machine learning people is the best way forward to make progress.

Craig, I think in the interest of time we'll move on. Janak, if you could post your question in the chat, and Sue, if you could reply in the chat, that would be really great. I think we are 15 minutes over. So thanks a lot again, Sue, for the great talk and for the discussion.