Annalisa Braak with Professor Georgia Tech and Sliva Kocher. Annalisa was another XICPP member and she went on to a very brilliant career in the U.S. So Annalisa is going to talk about tropical teleconnections in CSM and reanalyses using the network approach. This is oceanography. So I kept my climate interest on the side. And last week they were working, collaborating with Annalisa Braak in the College of Computing and trying to develop new tools to look at model output. So here I'm going to present some... Some momentum, momentum between the level of three years ago. And then we find and at some point, throwing away what we had and kind of a visit is easy. So we try to look at climate models and really try to focus on teleconnections. So it tries to use network analysis to figure out how the connections are reproduced. And it's work that a student of mine, Abitio, is doing. And Hanos Leine is interested in the cloud side of things. So I'm going to try to explain to you something about delta maps, which is the name that we have given to this method, and discuss a little bit the robustness and the tests that we've done to show that this more or less works. I will not go too much into the details of that, because otherwise I'll spend all my 40 minutes just going into the robustness analysis. And then I'll show some preliminary applications to CSM and the large ensemble. The reason for that is to try to figure out how some of the tropical teleconnections are reproduced in that model and what the problems are, and to see if we can use the network analysis to identify the problems and possibly diagnose what's going on in that model. So why did we want to look at climate networks? Because we have so much climate data coming out of models that trying to develop some data mining algorithms may be useful. And talking to the computer science community is kind of interesting.
So our first issue when I started talking with Constantine was like, well, can we do something better or slightly different than EOFs, essentially? I mean, EOFs are the traditional tool, and they give you a lot of information, but they also have problems, including the fact that they are orthogonal. CMIP5 was coming out when we started this collaboration, and so we wanted to develop a method that allowed for intercomparison of models. And in fact, we did publish a couple of papers looking at pretty much the whole suite of CMIP5 and how the models simulate tropical teleconnections all the way to 2300, which is kind of interesting because when you pass 2100 some of the models, the few that have one member continuing, really start to look different. And then the second step, and we just started on that one, so I'm not really talking much about it, is trying to use it for attribution studies. We really want to look at the aerosol problem, so whether aerosols start changing teleconnection patterns if you have them in, using these kinds of tools. So DeltaMaps, which is this network method, works on two... There are essentially two steps in order to get the network. First of all, you start with a domain, a gridded domain, and you need to reduce the dimensionality. So we try to identify functional components that we call domains of the system, which are contiguous regions. They may overlap, and they have to be homogeneous with respect to the underlying variable. So on the side, you can see one map that comes from SST anomalies over 35 years. And those are the domains that the network that I will define next is going to identify for that system. And once you have identified the domains, the domains are going to form a network, and you can establish whichever links exist between the various domains of that network.
And those links have a weight, and you can calculate what the weight is. It is essentially linked to the magnitude of their interactions, so it's essentially the covariance between the various domains. And you can also identify interactions that are lagged. The identification of the domains happens on essentially one dataset. So you take the monthly anomalies of SST over 35 years or over 20 years, whichever you want. But then you can also identify interactions between the domains that have a lag. And the other thing that you can do is define a strength for each domain, which is essentially what is colored in that map at the bottom. That depends on the sum of all the links, so all the interactions that that domain has with everybody else. And so obviously if you take the SST and you define the strength for all the domains you identify, the one that corresponds to ENSO is going to be the strongest, because it's the domain that is teleconnected the most with everything else. But you can also see what the other domains give you. And then you can also plot the way those domains interact, and this is the last plot on the side, which is really the network. What you see when it comes to the SST, for example, is that ENSO is the one that has the strongest links with a lot of other areas. There I'm plotting essentially the seven strongest areas. But an interesting thing that appears is that the tropical Atlantic to the south of the equator, which is this area here, is actually preceding the ENSO area. So it's connected. There is a link, but it is a link that goes from the tropical Atlantic into the ENSO area with a lag of 8 to 10 months. That's something that Fred has studied quite a bit. There are several papers explaining what's going on there, and it's a Gill-Matsuno kind of teleconnection.
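As a rough illustration of the domain strength just described (the sum, in absolute value, of all the links a domain has with every other domain), here is a minimal sketch. The function name `domain_strength` and the three-domain weight matrix are hypothetical and made up for illustration; they are not taken from the talk or from the actual DeltaMaps code.

```python
import numpy as np

def domain_strength(weights):
    """Strength of each domain: sum of the absolute link weights to all other domains."""
    w = np.abs(np.asarray(weights, dtype=float))  # np.abs returns a new array
    np.fill_diagonal(w, 0.0)                      # a domain has no link with itself
    return w.sum(axis=1)

# Hypothetical signed link weights between three domains (0 means no link).
W = np.array([[ 0.0, 0.6, -0.4],
              [ 0.6, 0.0,  0.0],
              [-0.4, 0.0,  0.0]])

strength = domain_strength(W)
strongest = int(np.argmax(strength))  # index of the most teleconnected domain
```

In this toy case the first domain plays the role that ENSO plays in the SST network: it is linked to everything else, so it has the largest strength.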
So if you have an SST anomaly in this area in the spring, you modify the convection in this area, you modify the Walker circulation, and you tend to modify the winds. I'll show that later over the Pacific. And so if you have positive anomalies here, you tend to weaken the winds along the equator and the divergence of the winds along the equator. So you have less upwelling, and therefore you generate... Sorry, if this is cold, this becomes warm. So if this is a cold anomaly, you get less upwelling, and therefore you get warmer SST, and vice versa. And this is one of the teleconnections that models have the hardest time capturing, for various reasons, including the fact that they usually have a large error and a large SST bias here. So I will focus on that a little later, which is why I'm telling you a little bit more about it. Going back to the network side, we want to identify the domains first. They have to be contiguous and they have to be functionally homogeneous, so the grid cells that are inside the domain must participate in the same dynamic effect. They can overlap, so you can have overlapping domains, and they can have interactions that are both weighted and lagged, and you also keep a sign on the interactions that they can have. So how do we do that? We define a neighborhood of a grid cell. That neighborhood can contain four cells, which is this case, where the cell that you are considering is the one identified with the i and the neighborhood is given by four cells, or it can have six, eight, twelve, whichever you want. We did test the robustness of this, and it varies with the data set you are considering, so you have to play a little bit at the beginning to figure out which is the ideal neighborhood for that data set.
You use that, you change K, and you figure out how much your network varies, and we have a metric to quantify that, until you find that around a certain range of K the variations are not large anymore, or are very, very small, and therefore you know that that K is robust. Then you calculate the homogeneity of a grid cell i by calculating the Pearson correlation over the neighborhood of that cell, and here it's the Pearson correlation coefficient for that cell. Then you establish the homogeneity of the domain by calculating the Pearson correlation of all the cells that belong to the domain, and you establish what is the minimum Pearson correlation that you want. Usually we require that everything is 99% statistically significant, and that gives you a parameter delta that you can use as a test, so that all the grid cells belonging to that domain are correlated with each other at least up to that level. And we find domain cores, the epicenters, which are grid cells at which the local homogeneity is maximum, and obviously greater than delta. Then we go around the epicenter and we set the threshold for the homogeneity, which is usually that everything needs to be 99% statistically significant, so there is essentially just the alpha, and we find the grid cells around that epicenter that satisfy the criterion that we set for the homogeneity. From the beginning, you don't know what the boundaries of the domain are, so you have to identify them, and that's what is called a greedy problem in computer science, because there isn't an optimal choice that you can make a priori. So we created an algorithm that makes a locally optimal choice at each stage, and then the hope is that you find the global optimum. We identify the seeds first, and then we expand and merge until we have identified all the domains, and I'll give you an example next. So this is an example.
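To make the local homogeneity step concrete, here is a minimal sketch under stated assumptions: the homogeneity of a cell is taken as the mean pairwise Pearson correlation among the cell and its four nearest neighbors on a regular grid, and cells outside the grid are simply skipped. The function name `local_homogeneity`, the synthetic field, and the value of `delta` are all illustrative and not the actual DeltaMaps implementation.

```python
import numpy as np

def local_homogeneity(field, i, j):
    """Mean pairwise Pearson correlation among cell (i, j) and its 4 neighbors.

    field has shape (time, ny, nx); neighbors outside the grid are skipped.
    """
    _, ny, nx = field.shape
    cells = [(i, j), (i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
    series = np.array([field[:, a, b] for a, b in cells
                       if 0 <= a < ny and 0 <= b < nx])
    corr = np.corrcoef(series)                      # rows are grid-cell time series
    return corr[np.triu_indices(len(series), k=1)].mean()

# Synthetic anomalies: the left half of the grid shares one signal, the right half is noise.
rng = np.random.default_rng(0)
nt, ny, nx = 400, 6, 8
common = rng.standard_normal(nt)
field = rng.standard_normal((nt, ny, nx))
field[:, :, :4] += 3.0 * common[:, None, None]

delta = 0.5                                         # assumed homogeneity threshold
h_inside = local_homogeneity(field, 3, 1)           # cell inside the coherent region
h_outside = local_homogeneity(field, 3, 6)          # cell in the noise region
is_seed_candidate = h_inside > delta
```

A cell would then be a candidate core when its local homogeneity exceeds `delta` and is a local maximum; only the threshold part of that check is shown here.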
We have found the seeds, or the cores, so the regions that have the strongest Pearson correlation with all the neighboring cells within the range defined by the parameter that we set, which here is 4, just to make it simple. Then we start looking for candidates around them where we could expand each of those domains, and when possible we merge them. So those are the correlations between the seed and all the cells that are colored in the same way. For example, for all the blue cells the correlation is 0.8, for all the red ones it is 0.8, and now we're going to test whether these ones and those ones are also correlated enough, that is, higher than delta. If so, this would become one single domain, and then we can also expand to the other group. So essentially this is how it goes. First we merge those two, and once we merge, this whole domain has a correlation lower than 0.8, but everything is still significant. Here we set the initial delta to 0.5, to simplify things with an easy number. This is just an example. Then we start to expand around that orange area, over this one, and we figure out what the domain becomes after the first expansion. We continue doing that multiple times until we essentially arrive at the point where there is no further merging and no further expansion. The grid points that are white are not participating in any clear dynamical interaction with anything else, so they don't belong to a domain. A grid point doesn't necessarily belong to a domain. And those are the two largest domains that we could find that are interacting. Then we define a domain signal, and of course we take into account the latitudinal changes, so that we essentially determine what the signal of the domain is doing. We look for correlations between the various domains, which can be with a lag, and we define how large that lag can be.
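The expansion from a seed can be sketched as a greedy region-growing loop. This is a deliberately simplified stand-in, not the algorithm from the talk: it grows a single domain, does no merging, and uses an assumed acceptance rule (the best neighboring cell is added only if its correlation with every cell already in the domain is at least `delta`). The name `grow_domain` and the synthetic data are hypothetical.

```python
import numpy as np

def grow_domain(field, seed, delta):
    """Greedily expand one domain from a seed cell on a (time, ny, nx) field."""
    nt, ny, nx = field.shape
    corr = np.corrcoef(field.reshape(nt, ny * nx).T)   # cell-by-cell correlations
    domain = [seed[0] * nx + seed[1]]
    while True:
        # Frontier: 4-connected neighbors of the domain that are not yet in it.
        frontier = set()
        for c in domain:
            a, b = divmod(c, nx)
            for da, db in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                p, q = a + da, b + db
                if 0 <= p < ny and 0 <= q < nx and (p * nx + q) not in domain:
                    frontier.add(p * nx + q)
        if not frontier:
            break
        # Locally optimal choice: the candidate most correlated with the domain.
        best = max(frontier, key=lambda c: corr[c, domain].mean())
        if corr[best, domain].min() < delta:            # assumed homogeneity rule
            break
        domain.append(best)
    return sorted(divmod(c, nx) for c in domain)

# Synthetic anomalies: columns 0-3 share one signal, columns 4-7 are pure noise.
rng = np.random.default_rng(0)
nt, ny, nx = 400, 6, 8
common = rng.standard_normal(nt)
field = rng.standard_normal((nt, ny, nx))
field[:, :, :4] += 3.0 * common[:, None, None]

cells = grow_domain(field, seed=(3, 1), delta=0.5)
```

With this toy field the loop stops at the edge of the coherent region, and the noise cells are left out of any domain, like the white grid points in the example.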
In this case we usually look at plus and minus 12 months, and we also test for the statistical significance of those lag correlations using the Bartlett formula, so that we can exclude uncorrelated signals that are producing spurious correlations because of their autocorrelation. So we get rid of some of the autocorrelation problem, which in climate fields is pretty strong, because you can have domains that are close by that are essentially autocorrelated, and then you're getting a spurious correlation out of it. Then we say that domains are connected if there is at least one significant correlation, and we define the lag as the range over which the significant correlation lies within one standard deviation of the maximum absolute correlation. Then we catalog all the edges, so all the links between one domain and another, as undirected if there is essentially a lag zero in between, or directed depending on the lags. They can be positively lagged or negatively lagged. Finally, we weight the links using the covariance matrix between the various domains, and here we take the correlation in an absolute sense, so the edge weight captures the magnitude of the signal between the two domains. We also know the sign, so we can have weights that are positive and weights that are negative, so positive and negative correlations, and at this point we have a final network which is weighted and lagged. There has been quite a bit of work on climate networks in the last 10 years. One big difference of this system is that we don't prune the grid cells, and this is something that we had to work on quite a bit. What most network methodologies developed so far do is take each grid cell and check for correlations between that grid cell and all the other grid cells, and if a grid cell doesn't have a correlation above a significance threshold that they set a priori, that grid cell comes completely out of the network a priori.
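The lag-correlation test described above can be illustrated with a small sketch. It assumes the standard form of Bartlett's formula for two otherwise uncorrelated series, where the variance of the cross-correlation estimate is approximately the sum over lags of the product of the two autocorrelations, divided by the record length. The function names, the 2.58 cutoff for 99% significance, and the synthetic AR(1) series are illustrative assumptions, not details given in the talk.

```python
import numpy as np

def autocorr(x, max_lag):
    """Sample autocorrelation of x at lags 0..max_lag."""
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / denom
                     for k in range(max_lag + 1)])

def lagged_corr(x, y, lag):
    """Pearson correlation between x(t) and y(t + lag)."""
    if lag >= 0:
        a, b = x[:len(x) - lag], y[lag:]
    else:
        a, b = x[-lag:], y[:len(y) + lag]
    return np.corrcoef(a, b)[0, 1]

def bartlett_sigma(x, y, max_lag):
    """Standard deviation of the cross-correlation under Bartlett's formula."""
    rx, ry = autocorr(x, max_lag), autocorr(y, max_lag)
    # Sum over lags -max_lag..max_lag, using the symmetry of the autocorrelation.
    var = (1.0 + 2.0 * np.sum(rx[1:] * ry[1:])) / len(x)
    return np.sqrt(var)

# Synthetic monthly signals: y follows x with a delay of 3 steps, plus noise.
rng = np.random.default_rng(1)
n, shift = 420, 3
full = np.zeros(n + shift)
for t in range(1, n + shift):
    full[t] = 0.6 * full[t - 1] + rng.standard_normal()
x = full[shift:]
y = full[:n] + 0.5 * rng.standard_normal(n)

lags = np.arange(-12, 13)
r = np.array([lagged_corr(x, y, int(lag)) for lag in lags])
best_lag = int(lags[np.argmax(np.abs(r))])      # positive: x leads y
sigma = bartlett_sigma(x, y, max_lag=24)
significant = bool(np.abs(r).max() > 2.58 * sigma)
```

A positive `best_lag` here means the first signal precedes the second, which is the sense in which a link would be cataloged as directed.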
It's not considered anymore. If you actually look at the correlation matrix for any kind of climate system, it is extremely sensitive to the threshold that you set. So essentially, if you take one model output or one reanalysis, and you take another reanalysis and you do the same job with the same threshold, you are going to get two very different networks at the end. So it's not robust to the pruning. In this case we essentially don't prune. We keep all the grid cells, and we eventually have grid cells that don't belong to domains. So this is an example of the output of this network that I showed you before. This is the HadISST for the period 1971 to 2015. This is the strength map for the domains that this network finds, and that's the network with the connections and the links between the strongest domains. Where you see numbers, it is because there is a lag, or a predominant lag at least, so I'm putting them there. So you can see that, for example, the tropical Atlantic is linked to ENSO with a lag that is always positive, so it's always preceding the activity of the ENSO area. The same is true for ENSO versus the North tropical Atlantic, but with ENSO preceding what the tropical Atlantic is doing, while the South tropical Atlantic is preceding what ENSO is doing. And here we are considering essentially the South Pacific, ENSO, the horseshoe pattern, the Indian Ocean, and the two tropical Atlantic areas. One thing that is nice is that if you take the signal, which is again defined there, for the ENSO domain identified by the network on its own, and you plot it, and you do the same for two other analyses, they look the same, which is good. I mean, it's over a period where we had satellites. We would hope that. But it's also behaving like the scaled first principal component of the EOF of the same system. So what you are capturing is eventually more than what the EOF can give you, but it's not different from what the EOF can give you, in a way.
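The comparison between a domain signal and the first principal component can be sketched on synthetic data. This is only an illustration: the domain signal is assumed to be a cos(latitude)-weighted mean over the domain's cells, the EOF analysis is an unweighted SVD of the anomalies, and all names and numbers are made up rather than taken from the talk.

```python
import numpy as np

def domain_signal(field, cells, lat):
    """cos(latitude)-weighted mean anomaly over the cells of one domain."""
    w = np.cos(np.deg2rad(lat[cells]))
    return np.average(field[:, cells], axis=1, weights=w)

def first_pc(field):
    """First principal component of a (time, cell) field via SVD of the anomalies."""
    anom = field - field.mean(axis=0)
    u, s, _ = np.linalg.svd(anom, full_matrices=False)
    return u[:, 0] * s[0]

# Synthetic field: 60 cells, of which cells 20..39 share one dominant signal.
rng = np.random.default_rng(2)
nt, ncell = 420, 60
lat = np.linspace(-30.0, 30.0, ncell)
cells = np.arange(20, 40)
common = rng.standard_normal(nt)
field = rng.standard_normal((nt, ncell))
field[:, cells] += 3.0 * common[:, None]

sig = domain_signal(field, cells, lat)
pc1 = first_pc(field)
# The sign of a principal component is arbitrary, so compare absolute correlation.
r = abs(np.corrcoef(sig, pc1)[0, 1])
```

When one domain dominates the variance, as ENSO does for tropical SST, its signal and the leading principal component are expected to track each other closely, which is what the absolute correlation measures here.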
Those are two different analyses. We're just comparing strength maps to give you an idea. So if you just look at the principal component for the ENSO area, this one and that one, as you saw before, are almost identical. The correlation is 0.99. However, if you start looking at the strength maps, some differences show up. For example, HadISST tends to have all the tropical teleconnections from ENSO slightly stronger than COBE in the way they are reproduced. This is true for the Indian Ocean. It's true for both Atlantic areas, and it's true especially for the south of the horseshoe pattern. So even if the ENSO principal components are identical for those two data sets, there are some differences in the way the teleconnections in the SST are reproduced by those two data sets. Those are the networks for HadISST, COBE, and ERSST, which has a different resolution compared to the other two. What you see is pretty much what I was mentioning: this tends to be stronger. So for example, the connections to the South Pacific or the Indian Ocean are stronger here. The color is slightly lighter. It's close to or higher than 0.7, and it's slightly less there. It's red even if it doesn't look red. All of them have the Atlantic being essentially the one that leads ENSO. All of them. Yeah, it's true. And the reason for that, and the reason we didn't plot it here, is that essentially it's a cluster that appears exclusively in the HadISST analysis. COBE and ERSST essentially do not distinguish the warm pool as its own domain. It simply does not come out. So I do have the network for this one, which includes the West Pacific and the connections from the warm pool area. I didn't put it on because it complicates things: of course it's an area that is strongly connected with everything, and then you don't see all the other ones. Interestingly enough, CSM doesn't have it either. It simply does not come out as an independent domain in any of those networks.
So here I'm starting to look at what CSM does, and I'm just plotting three of the 30 members that we analyzed in the large ensemble, also because they kind of look similar. What you see in all of them is of course that the ENSO area is the strongest one. The Indian Ocean is usually very strongly, tightly connected. The South Atlantic essentially does not appear as a connected area almost at all. Sometimes it's white, which means that it's essentially not responding or not linked to anything on its own, or it's dark compared to what is in the analysis, so it's much weaker. Most of the areas are pretty well captured. And what we can do here is also build a correlation matrix and compare the domains in one network with the domains in the other, both for their strength and for their geographical location and extension, and therefore measure the similarity between those networks. And CSM is not doing too badly, especially compared to what some of the CMIP models did, which was worse. When we look at the actual networks, this one is again HadISST, and the other ones are three of the members of CSM. In all 30 of them, none has the South Tropical Atlantic preceding ENSO activity. None of them has a correlation that goes from the South Tropical Atlantic into the Pacific area. Some of them, about a third, actually have ENSO preceding the activity of the Tropical Atlantic, or the Tropical Atlantic responding to what the Indian Ocean is doing. But it goes in the other direction, essentially. All of them tend to be very strong in the ENSO connections. So the links from ENSO into the other tropical areas tend to be extremely strong, which is known, because the ENSO strength in this model is very strong. It's essentially stronger than it should be. The variance is very large, and the El Niño events are really strong. And so this is seen also in the teleconnections.
But the structure of the network is generally quite well identified, with at least some of the relationships properly captured. They're all anti-correlated in relation to the horseshoe pattern. There is a very strong correlation with the Indian Ocean. There is a correlation, often with some kind of lag, with the Pacific. So it's not doing too badly. Here is where we start looking at this net correlation, which is essentially the correlation between different network structures, calculated between two adjacency matrices. So we are really comparing the matrix that comes out from one network with the matrix that comes out from the other network, and therefore we are measuring how different the structure, the topology, of the two networks is. And despite the flaws in the tropical Atlantic areas, most of the runs do a pretty decent job. We are using HadISST as ground truth, so we are comparing to the network that we got from HadISST. This is where ERSST compares to it, so close to one. One, of course, would be a perfect match. COBE is a little bit lower, and indeed it's the one that has lower strength in all the links from ENSO, so it's not surprising. Several of the runs from the large ensemble do a very decent job in reproducing the structure of the network, the topology of the network. A few of the 30 do not. And the reason for that is that they tend to overestimate the links between ENSO and the South Pacific, and so the South Pacific area goes together with ENSO. Essentially, if you go back to this map, what happens is that this area is completely attached to this one. It appears as a single one. It moves together with this one. And therefore the net correlation is close to zero, because it's a strong enough area, and it's a strong enough difference that one has a separate domain there that the other one cannot identify, that the net correlation, the comparison of the matrices, goes to zero.
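One simple way to read "correlation between two adjacency matrices" is the Pearson correlation between the corresponding link weights of the two networks. The sketch below assumes exactly that, and also assumes that the two networks have already been matched domain by domain (same ordering) and are symmetric. The talk does not spell out how the net correlation is computed, so the function name `network_correlation` and the toy matrices are purely illustrative.

```python
import numpy as np

def network_correlation(a, b):
    """Pearson correlation between the off-diagonal link weights of two networks.

    Assumes both adjacency matrices are symmetric and use the same domain ordering.
    """
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    iu = np.triu_indices_from(a, k=1)       # each undirected link counted once
    return np.corrcoef(a[iu], b[iu])[0, 1]

# Hypothetical reference network with four domains (signed weights, 0 = no link).
ref = np.array([[ 0.0, 0.8, -0.5, 0.3],
                [ 0.8, 0.0,  0.2, 0.0],
                [-0.5, 0.2,  0.0, 0.4],
                [ 0.3, 0.0,  0.4, 0.0]])

# A second network that is identical except that one link is missing.
other = ref.copy()
other[0, 2] = other[2, 0] = 0.0

same = network_correlation(ref, ref)        # identical structure
diff = network_correlation(ref, other)      # one teleconnection not reproduced
```

Note that a Pearson correlation is insensitive to a uniform rescaling of all weights, so under this assumed definition a run that overestimates every link by the same factor would still score as a perfect match in structure.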
You can correct for that by essentially putting a boundary between the regions of the domains, and you do it artificially. But if you don't, the net correlation is simply very low. This is instead a plot of the strength of the ENSO domain. So we calculate how strong the teleconnections from that area are in absolute value, and we sum them all. So we sum all the connections that the ENSO area has with any other area, and we compare that across all the members. And here also you can see that the majority of the runs overestimate those connections. But it's not doing too badly. This is HadISST. This is COBE, which was lower. We already saw that from the network maps. And this is the mean of all the CSM runs. So it's actually not doing too badly, despite the fact that the strength for HadISST includes the South Tropical Atlantic contribution, which does not go into the sum in CSM because the runs don't have it. This is another plot. Here we run the networks for 10 of the ensemble members. We ran it for 30, but if I show 30 lines it's almost impossible to see what's going on. And we run it on 45-year periods from 1920 all the way to today, 2015. And this is the evolution of the ENSO strength in COBE and in HadISST. It's kind of interesting that the difference between the two analyses is actually higher in the last period than it was before. And this is for all the runs that we analyzed for CSM. There is a trend towards larger strength in recent times compared to what it was at the beginning of the 20th century. This is true for the analyses. It's true for most of the runs. Not for all. There are a few runs that start strong in CSM and continue strong. We also did this exercise, I don't think I have it here, going all the way to 2100 for the models.
One thing that we are trying to explore is why the runs that have a large variance and high strength, and kind of keep it constant through time, continue to keep it constant all the way to 2100, while the other ones continue growing. It's about one-fifth of the 30 runs that have this characteristic. It's pretty much constant through the 30 runs. So we found six out of 30 that start with ENSO very strong at the beginning of the 20th century and continue that way all the way to the end of the 21st century, while most of the runs start lower, closer to the analyses, and then grow in time, often too much, because on average this has grown more than what the analyses have done. And then I just want to show some preliminary results of the investigation we have done of the problem of the Atlantic. So this is from a paper by Fred Carter Paul et al. that appeared in 2015, and it kind of summarizes how the tropical Atlantic SST anomalies, pretty much at the equator and immediately south of the equator, which is essentially the Atlantic 3 region, the Atlantic Niño area, affect the circulation. You get warm anomalies in the Atlantic, convergence over the Atlantic, you modify the Walker circulation, you get divergence over the Pacific, more upwelling, colder waters, and you tend to go into a La Niña kind of preconditioning, and vice versa for cold anomalies. You also get a response in the ocean, and you get Rossby waves in the atmosphere essentially doing the job. What you see when you look at the lag correlation between the Atlantic SST and the ENSO SST, calculated using the domains identified by the network, is that indeed ENSO leads, but with a very weak correlation over the whole domain, and the Atlantic leads with a much stronger anti-correlation. That's why it's essentially coming out as the Atlantic leading overall in this signal. But when you actually look through the whole plus and minus 12 months of lags that we calculate, there is also some leading of ENSO in this,
so following ENSO, essentially, you do have a response in the Atlantic, but the response in the Atlantic is essentially weaker than what the Atlantic is able to do to the ENSO domain. So what we did with this was to split the domain into two regions. I'll tell you why we chose those two regions in a second. What happens is that if you take the upper half of that domain, what you really see is that the leading of the Pacific is very weak and the leading of the Atlantic becomes stronger. That is the blue line. While if you take the southern region, which is the red part, there is actually more of a leading from the ENSO side and less of a leading from the Atlantic one. And the reason for that is that if you build a network on the clouds, and here we use the MERRA-2 data, those two areas show up very nicely. So in the clouds the signal is really separated. Those two areas, which in SST are strongly correlated because there are ocean currents, and therefore you do have a dynamic that connects what's going on in the Gulf of Guinea with what's going on along the coast of Africa to the south, are essentially separated in the cloud response in the atmosphere. In fact it's essentially the opposite. In the northern component you have an opposite signal: relative to the clouds on the ENSO area, the clouds on the northern component of the south tropical and equatorial Atlantic behave in completely the opposite way to the other one. And it's really the ice cloud component. You don't see it on the water side. And this is the total, and you can see it also in the total, and you can see that it's separated by the ice fraction. So what happens in CSM, and we haven't finished the analysis on the clouds but it seems to be quite consistent, is that essentially you don't have the separation. What you really get in CSM is that this area in the clouds is the dominant one and the only one that counts, and the
reason for that is in part that you have a strong bias, therefore you're not doing the dynamics properly and you don't have the obvious response from the atmospheric side, but it's also that your response to ENSO is strongly dominated by the thermodynamic effect. So this is an example. We split the area also in some of the runs. We did it for all the runs, I'm just showing a couple here. When you start splitting and really just look at the northern part, the Atlantic 3 area, some of the runs have the signal, but it doesn't show overall as a global change in SST dynamics. What really dominates is the signal in the mean, the model response in the SST anomaly of the whole area. What really counts is what the southern part is doing. It's essentially dominating the overall response of the SST. It's the south, which is dominated by the thermodynamic effect from ENSO. And here we've done it for more than that. Some runs, when you split it and really just look at the Atlantic 3 area, do have the right sign of the teleconnection, with the Atlantic preceding, but most of them don't. It's like 20% of them, really. And what they all have is a very strong impact from the Pacific into the Atlantic, which is also seen here. This is the response of the large ensemble in precipitation. This is work that E and Sobel and Klara Desert did and just published this year. They took the precipitation in various basins and they looked at how CSM in the large ensemble reproduced this precipitation signal when you have a coupled model, when you have just a slab, so the same atmospheric model, the CSM atmospheric component, but coupled to a slab ocean, or when you have climatological SSTs. What you can see is that for the Atlantic especially, the ocean is essentially not doing anything dynamically. All the precipitation signal comes from the slab, essentially. So it's completely teleconnected to what the Pacific is doing. It's essentially responding to the anomalies that are on the
Pacific side. And in fact you don't get the signal if you have climatological SST in the Pacific, with climatology only on the Pacific, but you do get almost completely the signal when you have just a slab, the green line. So the blue is the model, the green is just a slab, and the black are the observations. And this is essentially what this model is doing: it is overestimating the thermodynamic response to ENSO. It's too strong in this. This is data, but this is the process that the model is really overdoing, and therefore it has a response in precipitation that is essentially only linked to this. This is the cloud investigation, but right now we have only two members analyzed, because we have to download the clouds for all the 30 levels and the pre-processing is a little bit heavy. And this is essentially what we're seeing. So it's a very strong response that follows this but doesn't have any of the dynamic feedbacks. So you have essentially none of the wave response, or it is too weak, so this is dominant. So this is what you're essentially seeing in the natural city. So just a partial conclusion that we have is that the teleconnection from the Atlantic into the Indian Ocean is quite fragile. There are two Niños going on in different basins, and they are both influencing each other. They can have destructive or constructive interference, and in CSM they are essentially cancelling each other. The key ingredient to simulate the Atlantic variability is to be able to reproduce the ocean-atmosphere wave dynamics. That's essentially not there, because it's completely killed by the thermodynamic ENSO connection, and so the thermodynamic response to an El Niño event dominates. We have seen this in two members. We still have to confirm it for all the others. And essentially what happens is that when you look at your cloud network, the clouds are really the ones that show the problem, because you don't have those two regions. You essentially only have the dark blue extending in the clouds on the top one, so it's the
same domain in the network that comes from CSM, versus the one in the observations, where the two domains are separate and behave separately. And this is everything I have.