So we'll move to our next talk, which is the student presentations, two projects. The first group will give the machine learning presentation, led by Will Chapman, who did a lot of preparatory work before the summer school as well. So thanks. Yeah, whenever you're ready.

Sounds good. Are you all seeing my screen now? Yeah, it's not full screen. Okay, perfect. Wonderful. All right, cool. So our group was exploring machine learning for tercile two-meter temperature S2S prediction, particularly using CESM-LENS (the CESM Large Ensemble) for training and testing data. The group was Fernando, David, myself, Tim, and Jaclyn, with a lot of guidance from our facilitators, including Judith.

The outline for what we'll talk about is really the what and the why; then we'll establish a baseline metric for comparison to our machine learning methods, look at the data and specifically the machine learning methods, transition to some of our model results, look at some explainable artificial intelligence results, and then point toward future work.

So really what we want to do is test various statistical models for forecasting two-meter temperature at S2S lead times. Statistical models are nice because they're fast, they're simple, and they're easily interpretable. We want to see if we can establish some sort of theoretical ceiling for skill, so we use a perfect-model diagnostic framework in which we train on CESM-LENS and predict CESM-LENS. We do this because it is an extremely large, very long-running data set; in particular we use the pre-industrial control member, which has 1,000 years' worth of data and doesn't have a nasty trend to deal with. We want to establish some sort of ceiling for how well these statistical models can do, and we want to see if we can use interpretable methods to identify forecast windows of opportunity. So, a lot of work here.

The first thing we want to do is establish a baseline method that these modern machine learning methods have to compete with. For that we use the Johnson et al. (2014) method, which produced skillful wintertime North American temperature forecasts out to four weeks using just the state of ENSO and the MJO. The way they do that is they establish a climatological PDF and define tercile bounds at the 33rd and 66th percentiles for cold, neutral, and warm states. Then they look at the state of ENSO and the state of the MJO, see how this probability distribution shifts, and whichever tercile the shifted distribution most often falls in becomes their probabilistic forecast: okay, it's going to be hot if it shifted like this. They evaluate their forecasts with the Heidke skill score, which we will be doing also. The Heidke skill score is a really simple metric that's been introduced a couple of times during this colloquium, but it's really saying how much better we are than a random guess: zero means we're pretty much climatology, 100 is a perfect forecast, and negative infinity means you're doing really, really badly. So anything positive means we're better than climatology. We don't have a ton of time to go into the methods, but I wanted to say we have two really flexible statistical models that we'll be testing. The first one is a random forest, specifically a gradient-boosted classifier.
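As a minimal sketch of the categorical Heidke skill score computation just described (illustrative Python, not the group's code; the variable names are assumptions), where a random tercile guess is right one third of the time:

    import numpy as np

    def heidke_skill_score(forecast_cat, observed_cat, n_categories=3):
        """Categorical Heidke skill score, scaled so that 0 = no better than
        a random (climatological) tercile guess, 100 = a perfect forecast,
        and negative values = worse than chance."""
        n = len(observed_cat)
        hits = np.sum(forecast_cat == observed_cat)   # correct tercile calls
        expected = n / n_categories                   # hits expected by chance
        return 100.0 * (hits - expected) / (n - expected)

    # Categories coded 0 = cold, 1 = neutral, 2 = warm
    obs = np.random.default_rng(0).integers(0, 3, size=1000)
    print(heidke_skill_score(obs, obs))   # 100.0 for a perfect forecast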
So the predictors: we feed it the same predictors as the Johnson et al. model. We give it the Niño 3.4 index and the RMM indices, and then we one-hot encode the MJO phase, which means we're just letting it know that this is a categorical variable: we give it zero for when the MJO is off, and one through eight for the remaining phases. We're trying to predict a seven-day-average two-meter temperature, and we're constructing a new model at every lat-lon location over land.

The random forest, again, is not super complicated: it's basically an ensemble of decision trees that vote on a forecast, and the output is three nodes, the probability that it's a cold state, the probability that it's a neutral state, and the probability that it's a hot state. For the neural network we're doing a very similar thing: we output the same three probabilities from the same predictor variables, probability of cold, probability of neutral, probability of hot, and whichever is highest is our forecast.

We want to point out that we trained on 100 years of model data, model years 400 to 500, validated our models on a separate segment, and tested on model years 500 to 600. Everything you'll see moving forward is on just the testing data, to show that our model is generalizing and not simply overfitting to our training data.

Looking at some results: here's the week-three forecast over North America. The stippling shows where the forecast is not statistically different from zero after controlling for false discovery. You can see pretty apparently that the two machine learning methods are more skillful over a larger domain, and the neural network in particular is the most skillful. I want to point out that the color bar here only goes up to about 14, so this is not a highly skillful forecast, but there is definitely forecast skill here.

If you average this over the entire spatial domain and look across weeks one through six, the neural network remains the most skillful forecast, with a pretty stark drop-off in skill at week five. The random forest is in second place, and the Johnson method gives our lowest forecast skill. Looking at this spatially tells much the same story: we're seeing again that this neural-net forecast, out to the first four weeks at least, is very skillful compared to the other two methods, and then we see the skill drop off. I want to point out this region in northern Mexico, because it will be important for some of our interpretable AI methods as we walk forward.

We did look at other regions; we didn't just focus on North America. One group member ran all of this data, terabytes' worth, through all of these methods for South America too, and we see a very similar phenomenon there: especially where you would imagine the strong ENSO connections to be, we see really high forecast skill. Something interesting that we observed: the skill metrics we showed before were computed over the entire data set, but if you look at conditionally stratified forecasts, particularly when ENSO is off but the MJO is in a particularly active phase eight, we see a strong peaking of skill for certain regions in the model.
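As a rough sketch of the kind of three-class probabilistic network just described (a softmax output over cold/neutral/warm, with Niño 3.4, the RMM indices, and a one-hot MJO phase as inputs; the layer sizes, library choice, training details, and synthetic data are illustrative assumptions, not the group's actual setup):

    import numpy as np
    import tensorflow as tf

    rng = np.random.default_rng(0)
    n = 5000
    nino34 = rng.standard_normal((n, 1))        # Nino 3.4 index
    rmm = rng.standard_normal((n, 2))           # RMM1, RMM2
    phase = rng.integers(0, 9, size=n)          # 0 = MJO off, 1-8 = active phases
    X = np.hstack([nino34, rmm, np.eye(9)[phase]]).astype("float32")
    y = rng.integers(0, 3, size=n)              # 0 = cold, 1 = neutral, 2 = warm

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation="relu", input_shape=(X.shape[1],)),
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(3, activation="softmax"),  # P(cold), P(neutral), P(warm)
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    model.fit(X, y, epochs=5, batch_size=64, verbose=0)

    probs = model.predict(X[:5])
    forecast = probs.argmax(axis=1)   # highest-probability tercile is the forecast

One such model would be fit independently at each land grid point, as described above.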
And I want to point out that the Heidke skill score is now going up to somewhere in the 30-to-40 range, whereas before we were looking at 14. So there are definitely skillful windows of opportunity that we can look at here.

With that said, we wanted to examine what the model was learning. So, moving on from inputting indices, we decided to input full spatial maps of OLR and SST in the tropics and to use layer-wise relevance propagation (LRP), the method introduced by Libby Barnes and colleagues in the talks throughout the week. It's pretty much the gold standard in interpretable, or explainable, AI in our field right now. Essentially what it does is take the output probability and propagate it backwards through the network, asking: what were the relevant input regions that gave me this forecast, and why did I forecast what I did?

So, for that highly skillful point in northern Mexico, we're looking at the neutral state, the cold state, and the warm state, and we took the highest-confidence forecasts: anything where the network forecast over (I think) 80% probability of a neutral, cold, or warm state. Essentially what we got back is what we expected, because we sort of baked the answer in here: what we're seeing is ENSO. In the cold state, looking at the LRP for the SST and for the OLR, we have an El Niño condition along with the really familiar OLR signature that would go along with a strong El Niño; for the warm state it looks to be a La Niña in the SSTs, and again suppressed convection here in the OLR. Again, these are OLR composites and these are SST composites for the corresponding regions the network is looking at.

So where do we want to go with this for future work? We know the forecast skill is good, but we don't actually know whether the probabilities coming out of the model are particularly good: are they reliable, are they calibrated? There are nice methods, introduced by Scheuerer et al. (2020), for calibrating these probabilistic predictions really well, so that we have a little more confidence in what we're doing. And then we want to conduct some transfer learning: moving from CESM-LENS, seeing if we can freeze parts of the models developed on CESM-LENS, move them over to observations, train a little bit more, and see if we can transfer some of this skill into real life, leveraging CESM-LENS to improve observational forecasts.

For conclusions: we used the Heidke skill score to assess these forecasts, and the random forests and neural nets were more skillful than the baseline Johnson et al. method. At forecast lead times from one to four weeks the neural net was particularly skillful, but the skill dropped off after that. Using interpretable AI, we were able to verify that the networks were actually learning the ENSO and MJO teleconnections and highlighting the known convective regions. We're starting to push into ways to shut off ENSO in our predictors and then look at what the network focuses on instead; we've done some preliminary testing there, but we won't show it here, just because we're still wrapping our brains around it. So I'll highlight that the neural network showed the ability to improve these S2S two-meter temperature predictions over the US, particularly in CESM-LENS.
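The LRP step sketched above can be done with an explainability library; for example, with the innvestigate package for Keras models (a hedged sketch: the group's actual tooling isn't stated, and innvestigate targets the older TF1-style Keras API):

    import innvestigate
    import innvestigate.utils as iutils

    # `model` is a trained Keras classifier; `x` holds the inputs for the
    # high-confidence (over ~80%) forecasts being composited
    model_no_sm = iutils.model_wo_softmax(model)     # LRP is applied pre-softmax
    analyzer = innvestigate.create_analyzer("lrp.z", model_no_sm)
    relevance = analyzer.analyze(x)                  # same shape as x

    # For spatial OLR/SST predictors, reshape `relevance` back to (lat, lon)
    # and composite over cases to get relevance maps like those shown.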
What this means for real observations remains to be seen. With that, thank you.

Thanks. We'll move to the next group's presentation now and then come back to questions. Yeah, Sam. Okay, I'm going to share. Okay, does everyone see it? Yep. Awesome.

So our group over the last couple of weeks has focused on S2S model verification of two-meter temperature over CONUS. Our group is Jordan, Tyler, Sam, Sadikshya, and myself, Melanie, and our facilitators were ABJ and Judith Werner.

I think over the past couple of weeks we've all seen that predictability at the S2S timescale is lower compared to typical weather forecasts or seasonal forecasts. This matters specifically for heat: heat waves are becoming more frequent and intense, and we've seen a lot of record-breaking temperatures in the United States over the past couple of years, for example on the West Coast a couple of months ago, or even maybe about a month ago.

So what are some of the sources of predictability over the North American continent? The one we'll focus on today is the PNA, the Pacific-North American pattern. In its positive phase this pattern is associated with higher heights over the western United States and lower heights over the eastern United States, and vice versa for the negative PNA. What do PNA teleconnections do for temperature? In a positive PNA pattern we typically see warmer temperatures over a majority of Canada and along the West Coast, and lower temperatures on the East Coast; this is reversed in the negative PNA pattern. We also see that there's not much impact on temperature during the summer, so we wanted to split things up by season to see how the differences play out.

To analyze the data we worked with the package climpred, developed by Aaron Spring and Riley Brady. Using climpred we calculated anomaly correlation coefficients (ACC) between data sets, which basically gives us skill relative to climatology, with positive one being a perfect correlation and zero indicating no correlation. We worked with three sets of model output gathered from the Subseasonal-to-Seasonal (S2S) Prediction Project, namely the European Centre for Medium-Range Weather Forecasts (ECMWF) model, the model from NCEP, and CESM version 2, and we compared those to observational data from NOAA's Climate Prediction Center. Our work focused on two-meter temperature averages.

We looked at the skill of the different models as compared to the observational data, and then also at the state dependence of skill for positive and negative PNA phases. More specifically, we divided CONUS, the continental US, into five sub-regions in order to determine the skill and the effects of different teleconnections in different parts of the country: a West Coast region, a Mountain West region, a Great Plains region, and the East Coast divided into Northeast and Southeast regions.

Just to get us acquainted with what these figures will show: we're now going to compare the three models Melanie and Sam just described, subsetted across all five subdomains as well as CONUS overall, using this anomaly correlation coefficient. First, wintertime. What we can see overall is that, by and large, ECMWF is outperforming the other two models, but with CESM shortly behind.
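A rough sketch of the climpred workflow just described (the file names and dimension layout are illustrative assumptions; `HindcastEnsemble` and `verify` are the package's actual entry points):

    import xarray as xr
    from climpred import HindcastEnsemble

    # hindcast with dims like (init, lead, member, lat, lon), plus verification data
    hind = xr.open_dataset("ecmwf_t2m_hindcast.nc")   # hypothetical file names
    obs = xr.open_dataset("cpc_t2m_obs.nc")

    hindcast = HindcastEnsemble(hind).add_observations(obs)
    acc = hindcast.verify(
        metric="acc",             # anomaly correlation coefficient
        comparison="e2o",         # ensemble mean vs. observations
        dim="init",               # correlate across initialization dates
        alignment="same_verifs",
    )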
So we thought that was interesting and, you know, kind of good that the in-house model fared so well, at least in this metric. The additional thing we noticed, especially in the Southeast, is slightly higher skill at the week-five-and-six lead time, with a slightly higher anomaly correlation coefficient compared to some of the other regions like the Mountain West and the West Coast.

Now let's look at summertime. A pretty similar pattern shows up: ECMWF doing the best, CESM somewhere in the middle, and NCEP more or less performing the worst of the three. But again we see this pop out, especially for ECMWF, of the eastern regions in JJA having higher ACC and a little more forecast skill at the week-five-and-six lead time compared to some of the other regions. That further motivated us to look at the spatial patterns of the anomaly correlation coefficient.

Here we're looking at the state dependence of the ACC skill within the three models associated with the PNA pattern. You'll observe right away that there is a lot of regional variation in the skill. Next slide, please. You'll see that the PNA is actually associated with increased skill at the 15-day lead, which is weeks three to four, and at the 31-day lead, which is weeks five to six. A note on reading these state-dependence maps: red shows an increase in ACC, so the PNA is associated with increased skill in the red regions and with reduced skill in the blue regions. You'll notice that ECMWF, CESM2, and also NCEP show the sort of dipole temperature pattern over CONUS that is associated with the PNA, and that at the 15-day lead there is increased skill associated particularly with western and central US temperatures, while at the 31-day lead there is more increased skill over the East Coast for all three models. Next slide, please.

When we look at the negative PNA, you see, I think, more variation in the inter-model comparison. For one, CESM2 stands out a lot in that you get increased skill over the entire US, which might explain a little of why you see CESM2 outstripping ECMWF and NCEP at longer lead times. For both the positive and the negative PNA, you'll see very little change in ACC at lead 1, weeks one to two. So the main takeaways from this state-dependence part are that the PNA is associated with sub-regional variations in skill, and that it contributes more skill at the longer lead times, weeks three to four and weeks five to six.

Concluding everything we've done so far for this ASP colloquium: we find that the ECMWF model has the highest prediction skill; looking across CONUS, the East Coast has the highest prediction skill for two-meter temperature; the East Coast also shows more prediction skill when we initialize forecasts in both positive and negative Pacific-North American patterns; and each model shows slightly different signals when we take the positive and negative PNA patterns into consideration. In the future we also want to do significance testing of the results we've shown, because we haven't done that yet.
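The state-dependent verification described above amounts to stratifying the initializations by PNA phase before computing the ACC; a hedged sketch, reusing the `hind` and `obs` objects from the earlier climpred example (the PNA index source and its alignment to the hindcast are illustrative assumptions):

    import xarray as xr
    from climpred import HindcastEnsemble

    # PNA index carried on the same "init" coordinate as the hindcast
    pna = xr.open_dataarray("pna_index.nc")

    hind_pos = hind.where(pna > 0, drop=True)   # positive-PNA initializations
    hind_neg = hind.where(pna < 0, drop=True)

    acc_pos = (HindcastEnsemble(hind_pos).add_observations(obs)
               .verify(metric="acc", comparison="e2o", dim="init",
                       alignment="same_verifs"))

    # Repeat for hind_neg; maps like those shown are then differences such as
    # acc_pos minus the all-initializations ACC, in red/blue shading.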
We also want to analyze the seasonality of the state-dependent verification, and to use metrics other than ACC to get a better picture of the season dependence of the models we've used. We also want to scrutinize the role of MJO phases in the state-dependent verification, as we want to explore more of the sources of S2S predictability for the CONUS region, and we want to apply a similar analysis to other variables, like geopotential height and precipitation, to get a better, more comprehensive understanding of the prediction skill of these models. So with that, we're ready to take any questions, concerns, or feedback. Thank you.

Thanks, all. Very impressive work in such a short time, so thanks for a great presentation. We now have about 10 minutes for questions and discussion for both of the groups that presented now, and if there are questions for the previous groups as well, feel free to post them. I see one question from Zane to the machine learning group. Zane, would you like to unmute and ask?

I really liked that talk, Will, and I thought some of the LRP results in particular were cool. I wondered if you guys, especially based on the LRP maps, looked at what happens if you remove the MJO and just use ENSO, because it looked like that was a big source of skill, or looked at just the MJO, to try to get at which one of those two pieces is really giving you more skill.

Yeah, great question. The short answer is no; it's on the long list of things we want to do. My intuition would say that all the skill was coming from ENSO, but then we saw some of those figures where we're in an ENSO-neutral state and the MJO, in a particular phase, was giving us a lot of skill. So there's a lot of testing that needs to be done in that realm, and we're looking forward to doing it.

Thanks, Zane; thanks, Will. Other questions? I had one for the temperature group. When you looked at the positive and negative PNA and its relationship to two-meter temperature: Will has done some recent work on ENSO forcing of the PNA itself, with there being an internal-variability part of the PNA versus a forced component. Have you considered, when you do the state-dependent prediction skill, doing some kind of decomposition of how much of the skill is coming from ENSO forcing the PNA and then leading to skill in two-meter temperature, versus the PNA alone? You might have to do a regression and remove the ENSO signal from it, or use some other method.

Yeah. In our methodology, at the very least for the neutral states, we tried to make sure they did not fall in either a PNA or an NAO phase, and that could easily be applied to other teleconnections to get neutral states for additional fields, just so we know that interconnection isn't there. And we're also aware that there are relationships between all of these; PNA, NAO, MJO, and ENSO are not isolated at all. We noticed this too, because we didn't only do the PNA; we did look at patterns for the NAO and the MJO, but focused more on the PNA. Yeah, that's a good observation. Great, thanks.

I'd also like to point out that one of our students, Jordan, defended her PhD thesis, so congratulations, Dr. Jordan! Great. So, in terms of future work, have all of you discussed AGU or AMS presentations? I know we'll have a longer discussion this afternoon.
But are you planning to put in abstracts for these conferences? Okay, great. I guess that's still under discussion. We're good. I think we're going to as well. Okay.
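A minimal sketch of the regression idea raised in the Q&A above, removing the linear ENSO signal from a PNA index before stratifying (purely illustrative, with synthetic data; neither group showed this):

    import numpy as np

    rng = np.random.default_rng(0)
    nino34 = rng.standard_normal(500)                # ENSO index
    pna = 0.5 * nino34 + rng.standard_normal(500)    # PNA partly forced by ENSO here

    slope = np.polyfit(nino34, pna, 1)[0]            # linear ENSO-to-PNA fit
    pna_internal = pna - slope * nino34              # residual "internal" PNA

    # Stratifying skill by pna_internal instead of pna would isolate the part
    # of the state dependence not linearly attributable to ENSO.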