And let me also welcome our first speaker in this series, who is Tom Anderson. In 2019, Tom obtained a degree in Information Engineering from Cambridge, so he is very well suited for this kind of adventure. Today, he works as a data scientist at the Artificial Intelligence Lab of the British Antarctic Survey, where he focuses on devising deep learning based methods that help to extract the maximum amount of information from the large climate datasets that people have collected. One very important aspect of meteorology and climatology in general is, of course, forecasting, and Tom is going to tell us today about forecasting one parameter that's very important in that context, namely the evolution of sea ice in the Arctic. And with that, Tom, please take it away.

Great, thanks so much, Philip. You know, I've never been a huge fan of introducing myself, so it's a nice luxury to have someone do it for you. Hopefully everyone can hear and see me okay; please, Philip, or someone, shout at me if not and we'll see what we can do. I'm going to be presenting work that I primarily carried out over that fateful year of 2020, plus or minus a couple of months. What I was doing was developing a seasonal Arctic sea ice forecasting AI system based on probabilistic deep learning. Now that's a bit of a mouthful, so my colleagues and I decided to call this system IceNet for short, as you can see in the title there. As you might imagine, this is a very interdisciplinary piece of work which would only have been possible with the help of a huge number of collaborators and co-authors, to whom I extend my thanks, listed there on the left. Because of the interdisciplinary nature of the work, my colleagues hail from a wide range of institutions, including the Alan Turing Institute, who funded this work, and a range of other research centers. And while I'm presenting a specific study today, forecasting sea ice with AI, I'm hoping that I'll be able to convince you that the marriage of machine learning, or computer science, with the earth sciences is actually a fruitful one that members of the audience might like to explore more themselves.

Great, so in terms of the talk structure today, I'm going to start off, rather unsurprisingly, with a bit of background: introducing sea ice to those who haven't come across it before, and thinking about forecasting in general and existing methods. I'll move on to look at the data that we use for IceNet and the design of the model. Then we'll look at the benchmarks that we use for assessing IceNet's performance. Finally, we'll get our teeth into some results and have a look at how IceNet's performance compares against those benchmarks. I'll then move on to what will hopefully be an interesting section for you guys, which is interpreting IceNet: trying to understand how IceNet is using its input variables in order to make its predictions. I'll start wrapping up by showing some preliminary results from IceNet 2. So, yes, there's already an IceNet 2. The idea behind IceNet 2 is that it predicts on a daily timescale, so the focus is much more on short-term forecasts rather than seasonal forecasts as with IceNet. The ultimate aim with IceNet 2 is actually to have it running in real time and have it be the first operational public sea ice forecasting AI, and we've already got some funding to lay the groundwork for that, which is super exciting.
And I'll conclude with a discussion on potential impacts of this work, or extensions of it, for Arctic conservation.

So, yeah, kicking off with a bit of sea ice background. I'm sure it'll come as a huge shock, but sea ice is a layer of frozen seawater. It floats on the surface of the oceans around the north and south poles, where the frigid temperatures mean that the seawater itself can freeze. It's a crucial habitat for Arctic mammals like polar bears and walrus, for example, and a range of other animals in the Arctic. As well as that, humans also depend on it for things like hunting and travel. I believe there are about four million human inhabitants in the Arctic, and last time I Googled, it seemed like about one million of those inhabitants are actually indigenous people. Indigenous communities have existed in the Arctic for, I believe, at least 40,000 years by the latest estimates. So these communities have really rich histories, cultures and traditions dating back millennia, and for many of the coastal communities, the sea ice is a really integral part of life: not only for hunting and travel, as listed there, but also as a way to get out of town, away from whatever domestic issues there may be where you live, and out onto the open sea ice. So it's a source of not just culture but spirituality as well, and a way of life.

Now, before global warming started to kick in, the Arctic sea ice pack would typically expand to about 15 million square kilometers in March and retreat to around eight million square kilometers in September. The actual value in each of those months would vary year on year due to natural inter-annual variability, but on average it would be around those values. Unfortunately, since 1979, when the satellite era really kicked off, the September sea ice extent has lost about four million square kilometers of ice, which is equivalent to about 20 times the size of Great Britain, so truly unprecedented levels. Looking at the time series on the right, we see a roughly linear decline in sea ice extent, about a 13% decline per decade. This is driven largely by the fact that the Arctic has warmed at two to three times the global average rate due to positive feedback loops, a phenomenon known as Arctic amplification. There have also been massive declines in sea ice thickness, sea ice volume and the age of the sea ice itself, but I'll be primarily thinking about the area covered by sea ice in this presentation.

So, with that context, I'd like to now just think about forecasting in general and compare and contrast two possible paradigms. On the left, we have the physics-driven approach, otherwise known as dynamical models. These models are based directly on the laws of physics; you can think of them as bottom-up approaches, based on causality. We discretize the atmosphere and ocean into cubes and solve hundreds of partial differential equations across them through space and time. They're fantastic tools for understanding the Earth system, but unfortunately they can be very computationally expensive to run because of the sheer number of equations that have to be solved; typically they can require hours on supercomputers. In contrast to that is the class of data-driven models, so statistical and machine learning models, although in this talk I'll be thinking primarily about deep learning models.
So we can think of these methods as a kind of top-down approach, where we're automatically learning the relationships between variables from the raw data itself. They're based on non-linear correlations between variables rather than causality itself. Once they're trained, they can be very computationally cheap to run, which is a nice plus: compare that with the supercomputers required for dynamical models. Deep learning models can, depending on size, run on your laptop once they're trained, for example.

Now, just a whistle-stop tour of deep learning. It was inspired by the way the brain processes vision. It's a bit of a cliche, but I like it. Our brains receive about four gigabytes per second of information from photons landing on our retina, which get processed to increasingly high levels of abstraction as we go down the visual processing system, until we reach the visual cortex at the back of our brains, where we represent visual information in abstract terms like grandmother, chair, coffee mug. Not massively inspired examples there, but you get the picture. Deep learning systems are far more crude than the brain, by the way, but the inspiration is there at least. A nice example that I've chosen here is an image-captioning AI system: a deep learning model that takes in raw images as input, processes them and spits out a caption as output. If you haven't seen this type of example before, it can be almost spooky how human-like it is if you just read the captions there. Another famous example comes from DeepMind. They developed the AlphaGo AI that learned how to play the game of Go, and was so good that it beat the human world champion, in a game which is actually far more complex than chess in terms of the number of moves you can make and the resulting search tree. People thought this was decades away from being solved. So yeah, there have been rapid advances in deep learning in recent years that we're hoping to leverage in this study.

Now, I've presented this as a binary, but like all good things, it's actually a spectrum. There's some really interesting stuff that sits more in the middle between these two extremes, for example embedding deep learning within dynamical models in some way. But for the purpose of this talk, it's gonna be primarily a binary.

So, looking at deep learning in the earth sciences, the number of papers using the term deep learning has shot up in recent years, and if we project the amount sitting there in 2021 so far ahead, it's gonna continue to go off the charts. Hopefully what I'm presenting today will be one of these 2,200-ish papers this year. It's currently under review at Nature Communications, but hopefully in the next few months I'll have a paper to share with you all. Now, one of those 1,750 papers from 2020 was actually the inspiration for this study, and I'll show the main result here. It was a Nature paper called Deep Learning for Multi-Year ENSO Forecasts, and what they found is that a deep learning model based on a convolutional neural network was able to outperform a suite of dynamical models in multi-year forecasts of the El Niño-Southern Oscillation index. I've drawn an arrow to their model here. For anyone who doesn't know, El Niño is a very large scale, semi-periodic ocean phenomenon in the Pacific, stretching from Australia to South America, which can be super important for global climate and weather; it can drive weather conditions even in Europe.
So yeah, this was a super interesting result, and I decided it would be cool to apply deep learning to sea ice. And I'm gonna give you a spoiler alert here, because I'm gonna show you our main result, basically our equivalent plot, jumping roughly one year ahead from reading that paper. If you're not particularly interested in the methods that we're about to go through, at least this might pique your interest in the results: IceNet, in blue here, outperforms a state-of-the-art dynamical model, in red, as well as a statistical benchmark, shown as the gray line, in seasonal forecasts of sea ice, which is quite a nice result. But that's just to pique your interest; putting that aside for now, let's have a look at the types of datasets that we have and the design of IceNet.

So say we have the sea ice pack in a certain location looking like it does in the image on the left there. How can we measure this? Well, it turns out that ice emits microwaves, and it does so to a greater extent than the sea, and satellites can monitor these microwave emissions. The good thing about microwaves is that we don't have to depend on optical information, which would be a problem, A, because there might be clouds in the way and, B, because much of the Arctic is actually in a state of almost constant nighttime during the winter. Microwaves will pass through clouds roughly unscathed, and continue through the night as well. Unfortunately, because of the low energy of these electromagnetic waves, the spatial resolution is quite coarse, so we can't pick out the really fine details. But we can use these observations to estimate sea ice concentration at a 25 kilometer resolution: we have algorithms that convert the raw passive microwave information into the fraction of a grid cell covered by sea ice, which is the sea ice concentration. I've done a kind of toy gridding procedure on this image to drive that point home, and we get this map across the whole Arctic, as you see on the right. We have daily sea ice concentration maps like this going back to 1979, with only a few gaps of missing observations, so it's a great source of data. Here are three months of these monthly averaged maps, with time going from top to bottom.

However, there's more to the earth than sea ice, isn't there? So we need to look for other sources of information, and we also have reanalysis data. Here I'm showing some example variables in the different columns: the two meter air temperature, the sea surface temperature, sea level pressure and some wind variables. Now, we can't measure these with satellites, and unfortunately we don't have observation stations conveniently gridded every 25 kilometers measuring these things back to 1979. But the way reanalysis works is that it basically interpolates surface or atmospheric observations, using the laws of physics, across space and time, in order to build up gridded and temporally consistent datasets which give us a good picture of what the earth is doing at any one time. Unfortunately, this only covers a roughly 500 month period, which in deep learning terms isn't a massive dataset. However, we also have climate simulations, which come to our rescue. Here's an example of some climate model data from MRI-ESM2.0. A typical simulation will run from 1850 to 2100, so that's over 3,000 months.
And there are multiple such simulations from a single model, and multiple models as well, so there's really an abundance of climate model data. I've shown an example from 2050 here, where unfortunately the climate simulations are predicting the Arctic to be ice free in the summers, which, unsurprisingly, is going to have a significant impact on life in the Arctic.

Okay, so there's our data. Now, how can we frame this as a machine learning problem? Well, we need to think about what our desired outputs are, and in this study our desired outputs are the sea ice concentration maps from one through to six months ahead into the future, with time drawn going from left to right here. We input a bunch of past information from the datasets that I've shown you: we choose to input the past 12 months of sea ice concentration, as well as the past three months of various reanalysis fields, and I'll let you just read some of those names there. Any oceanographer out there might notice a distinct lack of ocean variables. We did choose early on not to include ocean reanalysis data, because there are very sparse observations of the ocean under the ice pack, which can leave the reanalysis fields quite poorly constrained. However, the ocean is half of the stuff that sea ice touches, the other half being the top, where it touches the atmosphere. So there's some argument that it might be worth trying to input ocean reanalysis data and seeing if that lends us any better predictability.

As you can see on the left, there are also a few other inputs. We choose to input some other forecasts as inputs into IceNet; in this case, we use some linear trend forecasts, and I'll explain those a few slides down the line. But the idea is that IceNet has some sample predictions to use as a basis for its learning procedure, and I think this was actually quite a nice design choice; I'll explain more about that later. We also input some metadata, like a land mask, so IceNet knows where is land and where isn't: if the sea ice concentration is zero, it knows whether that's because it's land or because it's ocean and there's no sea ice. And we input a time variable that tells the model where in the calendar year it's been initialized, so spring or autumn or whatnot. We can concatenate all these inputs, just like the red, green and blue channels of an image, chuck them into IceNet, and it spits out these forecasts from one to six months ahead.

But now let's just quickly dive into the design of IceNet. We use a U-Net architecture for the model, and I won't spend too long on this given limited time, but the way it works is that the U-Net takes in images as inputs and produces images as outputs. The model was initially conceived for biomedical image segmentation, like detecting tumors in medical imaging (well, not x-rays, but ultrasound or MRI scans, for example). We basically downloaded the code for this and adapted it to our use case. The way the U-Net works is that it processes this cube of data through a series of sequential non-linear convolutional layers, which produce intermediate cubes of data, and the resolution is increasingly compressed down the downsampling path. Then we start going up an upsampling path, where the resolution increases back up from 27 by 27 to 432 by 432, and we have these concatenation paths from the equivalent layers in the downsampling path, the idea being that they help you not lose spatial information during the downsampling procedure.
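To make that concrete, here's a minimal U-Net sketch in TensorFlow/Keras. This is an illustrative assumption of the structure rather than the actual IceNet code: the filter counts, depth and activations are my own choices, and only the overall shape follows the talk, with a downsampling path from 432 by 432 to 27 by 27, an upsampling path back up, concatenation paths between the equivalent layers, and (as explained just below) three class probabilities per grid cell for each of the six lead times.

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_block(x, filters):
    # Two successive 3x3 convolutions: the standard U-Net building block.
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return x

def build_unet(n_channels=50, n_classes=3, n_leadtimes=6):
    inputs = tf.keras.Input(shape=(432, 432, n_channels))
    skips, x = [], inputs
    # Downsampling path: 432 -> 216 -> 108 -> 54 -> 27.
    for filters in (64, 128, 256, 512):
        x = conv_block(x, filters)
        skips.append(x)                  # saved for the concatenation paths
        x = layers.MaxPooling2D(2)(x)
    x = conv_block(x, 1024)              # bottleneck at 27 x 27
    # Upsampling path, concatenating the equivalent downsampling layer
    # so spatial detail isn't lost in the compression.
    for filters, skip in zip((512, 256, 128, 64), reversed(skips)):
        x = layers.UpSampling2D(2)(x)
        x = layers.Concatenate()([x, skip])
        x = conv_block(x, filters)
    # One set of class probabilities per lead time, summing to one
    # over the class dimension.
    x = layers.Conv2D(n_classes * n_leadtimes, 1)(x)
    x = layers.Reshape((432, 432, n_classes, n_leadtimes))(x)
    outputs = layers.Softmax(axis=-2)(x)
    return tf.keras.Model(inputs, outputs)
```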
And our outputs are 432 by 432 by three by six. The six corresponds to the six forecast months, and the three will become imminently clear. So we choose not to have IceNet predict sea ice concentration directly. The reasoning there was that the sea ice concentration values are inherently uncertain, because we're not measuring them directly; they're derived from passive microwave observations. So we thought maybe we can reduce the noise to some extent by binning the ice concentration values into three classes. We define a no-ice class for ice concentrations below 15%, a marginal-ice class for ice concentrations between 15 and 80%, and a full-ice class for ice concentrations above 80%. Then, for each grid cell and lead time, IceNet outputs the probability that it thinks the ice concentration will fall in each of those classes, and the three probabilities sum to one, of course.

IceNet has over 40 million trainable parameters, all trained through a big gradient descent optimization procedure using backpropagation. We pre-trained the model on over 2,000 years of climate model data from the climate models MRI-ESM2.0 and EC-Earth3. We then fine-tuned the model on observational data from 1979 to 2011. The reasoning for separating these two stages is that by fine-tuning on observational data, we might unlearn any potentially limited or incorrect physics that exists within the climate model data. IceNet is actually an ensemble of 25 individual networks that were initialized with different random initializations, each of which leads to a different learned input-output mapping. This is basically a known trick in machine learning to significantly improve performance; such ensembles have even outperformed Bayesian neural networks in some studies. So it's quite a simple way to get a nice boost. We leave the six years from 2012 to 2017 as our validation set, which we use for halting the training before we start overfitting to the training data, and also for a Bayesian hyperparameter search to determine good choices for the learning rate and the number of filters in each layer. If that doesn't make sense, don't worry too much. We then have the test set data from 2018 to 2020, which weren't used at all until the model was frozen, until it was finalized. So those three years represent our true ability to generalize to unseen data, although IceNet wasn't trained on the validation set either.

Now, quickly looking at the two benchmarks that we use. In terms of a dynamical model benchmark, we used this figure to inform us. It's a bit of a chaotic figure, but I'll walk you through it. The x-axis is lead time in weeks, and I've drawn a dashed line at the six month mark. It's basically plotting forecast skill on the y-axis, where higher is worse, and it's comparing a suite of dynamical models. Now, one of these models stands out, and that's a model called SEAS5 from ECMWF. It's the most consistently accurate model over the one to six month period, so we chose to use that as the kind of state-of-the-art model that we're trying to beat. We then use, as a statistical benchmark, the linear trend extrapolation model. The way this works, as illustrated in the figure, is that it fits a line of best fit to the sea ice concentration values at a given grid cell for a given calendar month, and projects that one year ahead as its prediction.
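As a rough sketch of that linear trend benchmark (with illustrative names, not the study's actual code), the per-grid-cell, per-calendar-month fit might look like this:

```python
import numpy as np

def linear_trend_forecast(past_sic, years):
    """Fit a line through one grid cell's concentration values for one
    calendar month (one value per year) and extrapolate one year ahead."""
    slope, intercept = np.polyfit(years, past_sic, deg=1)
    prediction = slope * (years[-1] + 1) + intercept
    return np.clip(prediction, 0.0, 1.0)  # keep within valid concentrations

# e.g. fit the Septembers of 1979-2012 to predict September 2013:
# forecast = linear_trend_forecast(sic_septembers, np.arange(1979, 2013))
```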
So say we're trying to predict September 2013: the linear trend model would fit the line of best fit at a given grid cell through all of the previous Septembers and extrapolate one year ahead. This model doesn't depend on lead time, so it makes the same prediction whenever you initialize it, which means it's gonna look like a flat line on a plot against lead time.

Okay, so now, finally getting our teeth into some results. The metric that we use to measure performance is a binary accuracy metric. First we compute IceNet's output probability of the sea ice concentration exceeding 15%, which is also known as the sea ice probability. We then convert all of the forecasts into binary predictions for open sea or ice, so like zeros or ones, and then we measure the binary accuracy. The way that works is illustrated here for these three different forecast months. The stuff plotted in blue to white is IceNet's sea ice probability. IceNet's predicted ice edge is shown as the white dashed line, and the observed ice edge for that month is shown as the black dashed line. The space between those two contours is where you've got it wrong, so I've overlaid those grid cells in orange to indicate binary errors. We compute the binary accuracy in this classification task over an active grid cell mask, which is shown as the thick black line. This mask basically masks out land and ocean that's too far south, where you never get sea ice, and there's a different mask for each calendar month. We're showing the binary accuracy in the bottom right of each panel here.

Okay, so looking at the mean performance over 2012 to 2020, IceNet's held-out data, we get the nice figure that I showed you earlier. IceNet is outperformed by SEAS5 at a one month lead time, but that's because IceNet only sees monthly averaged data as inputs, which can smear out the initial conditions and the weather phenomena that dominate predictability at short lead times, like on a weekly timescale. In contrast, SEAS5 would be initialized using daily or perhaps even hourly inputs on the first of the month, which means it gets much more fine grained initial conditions. But we see IceNet outperforming the two benchmarks by quite a wide margin in the seasonal forecasts from two months and beyond.

This figure masks the fact that the performance varies a lot with the calendar month, and that's what we're trying to show with this heat map here, where the rows show the different calendar months and the columns show the different lead times, matching up with the figure above. You can see there's a big dip for these long range forecasts in the late spring and summer months. That's a known phenomenon in sea ice forecasting; it's called the spring predictability barrier. Basically, the theory goes that you need to have observed conditions at the onset of melt, around springtime, before you can start to make good predictions of summer, because those conditions can be a determining driver of sea ice conditions in summer. That means that if you initialize before that point, it's quite unlikely that you're gonna get much predictive skill. We can subtract the equivalent heat map for SEAS5 and we get this plot. We see purple colors where IceNet's binary accuracy is lower than SEAS5's, most significantly in the one month forecast of September, but we get this nice highlighted area where IceNet does particularly well in these seasonal summer forecasts.
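Circling back to the binary accuracy metric for a moment, here's a minimal sketch of how it could be computed. The 50% threshold for converting the sea ice probability into a binary ice/no-ice prediction is my assumption, as is the exact array layout:

```python
import numpy as np

def binary_accuracy(sip, observed_sic, active_mask):
    """sip: forecast sea ice probability map, P(concentration > 15%).
    observed_sic: observed concentration map, values in [0, 1].
    active_mask: boolean map excluding land and far-south ocean."""
    predicted_ice = sip > 0.5           # assumed decision threshold
    observed_ice = observed_sic > 0.15  # the 15% ice-edge definition
    correct = predicted_ice[active_mask] == observed_ice[active_mask]
    return 100.0 * correct.mean()       # percent of active cells correct
```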
And those seasonal summer forecasts are notoriously difficult to make, by the way. We can do the same for the linear trend model, and unsurprisingly, we most outperform the linear trend model at short lead times, when we're most able to make use of the initial conditions, and then that improvement drops off with lead time. Another way of looking at this bottom right panel: if you recall, we're feeding the linear trend forecast into IceNet's inputs, so it's kind of unsurprising that IceNet outperforms that model. We would hope that it would at least match its performance, because it can hypothetically copy the forecast from input to output, right? But the amount by which we outperform the linear trend gives us an indication of how much IceNet can leverage the other inputs to improve upon those forecasts and capture the non-linear variations of sea ice beyond the linear trend. So it's quite informative about where IceNet can leverage those initial conditions variables, and it seems like that's most substantial in the short-range forecasts of September, but we're also getting this kind of tongue of higher predictability out into the seasonal predictions of summer. So yeah, for anyone doing machine learning research, I think it's quite a nice trick to input some sample predictions into your model and then really dig into where it can go beyond the performance of that comparison model.

Okay, so looking at the effect of CMIP6 pre-training and model ensembling on IceNet's performance. This heat map here shows the improvement of an ensemble that includes pre-training over an ensemble without any pre-training. So, given that we've ensembled, what's the improvement from pre-training? That isolates the effect of pre-training. And we get a nice small boost in pretty much all conditions, except for the seasonal forecasts of September, which actually seemed to be slightly harmed by the pre-training. I'm wondering if this is partly because the melt physics in the climate models isn't particularly accurate, which gives the kind of double-edged sword behavior that we see here. Now let's look at the improvement from ensembling, isolating out the effect of pre-training. This is looking at the mean performance of the individual ensemble members and then seeing how much better the ensemble-average prediction is. Here we get an improvement across the board, particularly in these long range forecasts of summer. And when we look at the effect of both together, it's quite clear that this has a substantial benefit to IceNet's performance. It's actually for this reason that we're able to achieve, well, actually exceed, state-of-the-art performance. So it was those modeling choices that did the trick for us.

Okay, so another nice aspect of IceNet is that it turns out its forecasts for pan-Arctic extreme events in September are much better than its competitors'. The first row shows September 2012, which was the lowest extent on record. The observed ice edge is shown in black and IceNet's predicted ice edge is shown in green, and we have the lead time going down from four months ahead to one month ahead. You can see how, as the initialization date approaches the target date, IceNet is able to update its forecast and improve upon it, fitting closer and closer to the observed ice edge. The second row shows September 2013, which was an unexpected bounce-back in the sea ice extent.
I believe this took the sea ice prediction community a bit by surprise. Everyone thought, you know, 2012 was the new normal, but then 2013 came around and the ice extent bounced back by quite a large amount. 2020 was another low extreme, the second lowest extent on record, and you can see the forecasts in the bottom row there. I find it quite fun to look at one column at a time and scan up and down to see how IceNet's predictions change with different initial conditions. I'll just let you stare at that for a couple more seconds.

Cool, so now moving on to what will hopefully be fairly interesting: looking into interpreting IceNet. We've got those results that I've shown you; now one open question that remains is how IceNet is actually using its input data to make its predictions, and we go some way towards answering that question in the analysis to come. The method we use is a form of feature importance method, but "feature" is a machine learning way of talking about an input variable, so I'm gonna call it a variable importance method. It's called permute-and-predict. The way it works is that we replace a given input variable with values from another random month, so a permutation, and we then measure the accuracy drop at each lead time. So we're seeing how much cutting the kind of causal ties between that input and the output affects our ability to forecast sea ice. An example is shown in this time series in the bottom left, which shows the one month lead time accuracy drop from permuting the initialization sea ice concentration map. You'll see there's a spread around each value; that's because it's prudent to repeat this permutation multiple times using different random seeds, to average out the random nature of the permutation. And yeah, it's quite an interesting plot: there's quite a clear seasonal cycle going on here. We can aggregate over calendar months and get a plot like the one on the right, where we see quite clearly that the accuracy drop is greatest when we're trying to forecast September and October. So it seems like IceNet is depending more on the initial sea ice conditions for forecasting those months.

Looking at another example, here I'm showing the accuracy drop corresponding to permuting the 500 hPa geopotential height anomaly map. For anyone who doesn't speak meteorological jargon, that basically relates to the large scale atmospheric circulation conditions in the middle of the troposphere, the troposphere being the lowest layer of the atmosphere. It's a bit more chaotic compared with the previous plots, but there's still something going on. If we look at the calendar month plot, it seems like the drop is quite consistently low from January through to May, and then we start to get higher accuracy drops when we're forecasting from June through to December. One hypothesis for why the drops are low from January to May is that at that point the sea ice pack has expanded substantially: it's thicker, it's locked in with the coastlines, so there's less room for it to move around, and as a result it's less buffeted by the atmosphere, less dynamically driven by the atmospheric conditions. So potentially that's one explanation for why the variable importance has this pattern.

Another fun one is the two meter air temperature variable. Looking at the time series, the accuracy drop is mostly actually quite close to zero, which makes it seem like this variable isn't being used very much.
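Before dwelling on that air temperature example, here's a minimal sketch of the permute-and-predict procedure itself. The `accuracy_fn` callable is a hypothetical stand-in for the full forecast-and-score pipeline, and the array layout (months along the first axis, input channels along the last) is an assumption:

```python
import numpy as np

def permute_and_predict(accuracy_fn, inputs, channel, rng, n_repeats=5):
    """Mean and spread of the accuracy drop (in percentage points) from
    permuting one input channel across the time dimension."""
    baseline = accuracy_fn(inputs)
    drops = []
    for _ in range(n_repeats):  # several random seeds, averaging out chance
        permuted = inputs.copy()
        # Swap this variable's maps between random months: this cuts the
        # tie between the input and the target while preserving the
        # variable's overall distribution.
        permuted[..., channel] = rng.permutation(permuted[..., channel], axis=0)
        drops.append(baseline - accuracy_fn(permuted))
    return float(np.mean(drops)), float(np.std(drops))

# rng = np.random.default_rng(0)
# mean_drop, spread = permute_and_predict(score_forecasts, X, channel=0, rng=rng)
```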
Coming back to that two meter air temperature time series: we get these occasional positive spikes, which are a bit funny. That means permuting the variable actually gave us an accuracy increase, which is a bit bizarre, but that's bound to happen sometimes: randomly permuting a variable can perturb the forecast towards the observation just by chance. But what I'd like to dwell on momentarily is this sample here in 2012. If you look at each of the September time points, the accuracy drop is almost always zero, but for 2012 we have a minus 2% accuracy change from permuting this air temperature variable. And if you recall from the previous slide, September 2012 was the lowest extent on record. So I'm quite intrigued by this sample. I'm wondering if it's telling us that IceNet is doing something fancy here: maybe it saw that the air temperatures were anomalously high and therefore leaned on that variable more as it tried to forecast this extreme low event. A more skeptical interpretation is that if the temperatures are higher for a given month, then permuting that month is a larger change to the variable, and therefore you get a larger change to the output. But in my opinion that's a slightly overly pessimistic interpretation; I like to lean a bit more optimistically and think that this tells us something about IceNet doing something fancy with this variable. And again, the calendar month plot on the right looks kind of interesting, with not much importance from January to May and then potentially higher accuracy drops later in the year.

So, it's all well and good looking at all these time series, but there are 50 input variables and six lead times, so it's not gonna be easy to digest if we keep working with time series. What we can do instead is take the average across the whole time series, compressing each one into a single value, and then plot a big heat map showing the mean variable importance for each lead time and each variable. I've shown that on the left there; however, you won't be able to read it unless you're an ant crawling on the screen, so I'm gonna zoom in to the top section first. The variable naming scheme here is such that SIC(1) means the sea ice concentration map one month before initialization, so the most recent map, while linear trend SIC forecast(1) corresponds to the one month linear trend forecast. So the number in brackets for the forecast inputs corresponds to their lead time, but otherwise it corresponds to the lag of the initial conditions variables. We can see that the initialization sea ice concentration is super important at a one month lead time, and the importance seems to drop off with lead time, as you might imagine. The inputs at greater lags of sea ice concentration are less important. However, look at, for example, a six month lead time: SIC(6) corresponds to the sea ice map six months before initialization, so if we're trying to forecast six months into the future, that input corresponds to the same calendar month from the previous year. So it's quite fun to see the importance rise back up, and that corresponds to this diagonal line here. With the linear trend, this is also nice to see, because we have this kind of diagonal line showing that each linear trend input is generally most important for the lead time that it corresponds to. Or, in other words, if we permute the one month linear trend forecast, we don't see a massive drop in performance at a six month lead time.
So IceNet has learned to use those inputs in a reasonable manner. And the importance of these variables goes up with lead time: you see at a six month lead time, we've got quite a big drop in accuracy when we permute that variable.

Okay, looking at the middle section, and not dwelling on this for too long. Casting our eyes to the top left value, that's the two meter air temperature anomaly. Remember, we saw the time series for that one, where the 2012 accuracy drop seemed quite significant but the average accuracy drop is quite low. So this kind of hammers home the point that a variable might not seem important on average, but it might come into play and become important in specific prediction scenarios. Moving on, what I'd like to say about these bottom variables is that most of them aren't important, apart from the one month lag sea level pressure and the 500 hPa geopotential height. Something that I find very curious here, and I'd be happy to discuss this with anyone in the audience, is that if we look at the 250 hPa geopotential height inputs, those values are almost unused. However, at least to me, if you were to compare the 500 hPa and 250 hPa maps, they're very similar; there's a lot of correlation between them, because we're talking about a large vertical structure in the atmosphere, which can mean the low pressure systems overlap. The 250 hPa input corresponds to roughly the top of the troposphere, by the way, before we start to move into the stratosphere. So I'm curious about whether maybe IceNet has learned that the lower level atmospheric information has a stronger statistical association with sea ice, and has just "thought": I'm not even gonna bother using the 250 hPa input. But yeah, quite curious. Then we have the land mask, which is the same every time, so permuting it does nothing. And it looks like the cosine and sine of the initialization month, which are our kind of time inputs, are being used by IceNet, so that's nice.

Okay, so some caveats of this method. Permuting can inflate the importance of inputs that correlate with others. Think of this toy scenario: we have two scalar input variables, and say that's our training data. If we were to permute one of them and keep the other the same, then the model would get shot out into the ether, where it has no training data and the input-output mapping can be complete garbage, which can end up inflating the importance of correlated variables because you'd get a potentially greater drop in accuracy. Now, there's a lot of correlation between many of IceNet's input variables, and interestingly, we don't see bizarre, inflated importance values that totally blow our minds and seem unreasonable; I think the results I just showed you seem fairly reasonable. So yeah, that's interesting. Also, and this goes without saying, we're measuring the importance of the data as given to the model, which is 25 by 25 kilometer spatial averages, monthly averaged across time, and those are also approximations of the real values, right? They're not the real values. And it's the importance for the model. So I think this is a nice method for probing the model itself, but I don't think it's a particularly sharp analytical knife that we can use to dissect nature and try to uncover new phenomena we didn't know about. There are some machine learning methods that lend themselves better to that type of task. And this measures importance at the pan-Arctic scale.
We haven't looked at a regional decomposition, but perhaps wind, for example, is really important around the edge of the ice pack but nowhere else. So we have to keep that in mind as well.

Okay, so now moving on to share some preliminary results from IceNet 2. IceNet 2 is similar to IceNet but operates on a daily timescale, so the inputs are now daily input data. We've also had the idea of training on data from Antarctica, to leverage the Southern Hemisphere. The architecture is the same as IceNet's, a U-Net ensemble. The outputs have had a few changes: we now have daily output data going out from one through to 93 days ahead, and we've decided to predict continuous sea ice concentration values rather than using our classification approach. The reason is that the raw sea ice concentration value is probably more important for end users, even if it has some uncertainty attached to it. Also, the goal is to output probability distributions rather than point values, so we're gonna use a truncated Gaussian distribution, which is cut off below 0% and above 100%.

So, looking at some results: we've already trained an initial version, an ensemble of five of these models, and looking at the mean performance from 2012 to 2017, we get a heat map like this, which is just like the previous heat maps I was showing you, but at a much higher temporal resolution, about 30 times higher. It's quite fun to look at. We're plotting the mean absolute error now, so higher is worse, and again we see this kind of spring predictability barrier, but in much more fine grained detail. We've also got some preliminary results comparing with some benchmarks. IceNet 2 is in blue, and we're comparing against SEAS5 again; they're roughly comparable at this point. And we're continuing to outperform a statistical benchmark that we definitely want to be doing better than. This is plotting the root mean squared error here. But as I said, these are provisional results, so take this figure with a large pinch of salt.

What we can do with IceNet 2 is zoom into a specific grid cell. Zooming in here to a specific 25 by 25 kilometer grid cell, we can look at the forecast just like a weather forecast, just like if you pull out your phone and try to see what the weather's doing in the days to come (which, if you're in Cambridge, is looking quite pleasant). We can have a fixed initialization day; in this case, say we initialize on the 15th of May, for a specific grid cell on the coast of Hudson Bay in Canada, I believe close to a town called Churchill. We've got the prediction for the sea ice concentration in blue and the observed in black. We can't forecast this quite chaotic variation, which could be due to just noise in the raw data, or could be due to weather phenomena that we're not able to predict that far ahead, but you see that even out to three months ahead we're getting the overall trend quite well. Or we can fix a forecast target day, say in this case the 4th of December, and ask for our prediction every day as we get closer and closer to that day. And you see, within the last three weeks we start to correct our initial under-prediction.

Okay, so one final interesting thing we can do is look at regional predictability using IceNet 2. The title of this figure is a bit confusing, and I've got a bit carried away with projection methods in Python.
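As a quick aside before unpacking that figure: the truncated Gaussian output mentioned a moment ago can be represented directly with scipy; the parameter values here are purely illustrative, not from the actual model:

```python
from scipy.stats import truncnorm

def sic_distribution(mu, sigma, lo=0.0, hi=100.0):
    # scipy expresses the truncation bounds in standard-normal units.
    a, b = (lo - mu) / sigma, (hi - mu) / sigma
    return truncnorm(a, b, loc=mu, scale=sigma)

# An illustrative forecast: mean 85% concentration, standard deviation 10%,
# cut off below 0% and above 100%.
dist = sic_distribution(mu=85.0, sigma=10.0)
print(dist.mean())         # point forecast (pulled below 85 by truncation)
print(dist.interval(0.9))  # a 90% predictive interval
```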
What this figure is showing is the lead time at which we can predict the sea ice concentration to within a given error, and stay within that error until we reach the target day. A longer lead time means we're predicting to within 10% for a very long time, two months and beyond, and the dark colors correspond to shorter lead times. The lower predictability is mainly around the ice edge, where it looks like we have to wait until just a few weeks, or even a few days, ahead of the target day before we can predict the concentration to within the given threshold.

Okay, so just finishing off now with a discussion of potential impacts for Arctic conservation. You know, sea ice forecasts are useful for shipping and things like that, but I'm also interested in what we can do to help the quite dire situation in the Arctic. For example, polar bears are having to go hungrier for longer on land because of sea ice decline. So I'm wondering if sea ice forecasts, in conjunction with tracked polar bears, could be used to some extent to inform whether there's a hard season ahead, or whether there's a risk of human-polar bear conflict, for example. Another example is predicting walrus mega-haulouts. Walrus mega-haulouts occur when the walrus have no ice platforms to rest on, so they're forced onto land in their tens of thousands, and they're easily startled by humans, which can cause stampedes and lead to really high mortality rates. If we combine sea ice forecasts with known haulout locations, potentially we could limit access to those areas and try to prevent stampedes. A final option is dynamic marine protected areas. This is fresh off the press from the WWF: they've highlighted some priority areas for conservation, and on this map some shipping traffic is also shown. The idea of dynamic MPAs is that, since sea ice is a dynamic habitat, if we can predict in advance where the sea ice will be, perhaps we can give advance warning to industry and tourism to say: this is a protected area with limited access and new regulations.

Cool, so yeah, we've got a preprint on EarthArXiv; feel free to check it out. Thanks to everyone for listening, and thanks to all my collaborators for their time, advice and supervision. Feel free to ask questions or contact me by email or Twitter; otherwise, I'm happy to take questions in this now quite limited Q&A. Cheers.

Well, fantastic. Thanks a lot, Tom. That was an excellent talk. I really learned a lot, and I wrote down some keywords to look up afterwards, so I really appreciate it. Okay, so let's see if people have any questions. If you want to ask one, please raise your hand, or if you prefer, type your question in the chat window and I'll read it out. I might share my screen again, actually, in case there's... yeah, maybe that's helpful. Okay, Nick, please go ahead.

Thanks. Hey, that was a very nice talk. Thank you. I was wondering, can you tell us a bit more about the ensemble that you're using? Is this just different initializations of the net, or are other things varied as well?

Yeah, I'll actually scroll up through the slides. So, each ensemble member is initialized with a different random initialization: all of the weights in this model start random, and we initialize them from different points. It turns out that, through this very high dimensional training procedure, you then end up following a different path in the weight values, which leads to quite dramatically different input-output mappings.
So you can get quite high diversity in your output predictions. And we average the predictions by just taking the average of the individual probability predictions of all the ensemble members.

Thanks. So the basic structure of the net is the same for all these ensemble members?

Exactly, and that doesn't have to be the case. You can have different models as well, but in this case we use the same structure.

So have you looked at individual members? Do they on average perform similarly, or are there some that perform better or much worse?

Yeah, good question. I did, and there are some high performers and some low performers. Not a huge spread, but enough to take note of. And this actually did affect the study in some way, because for the permute-and-predict method, which is quite computationally expensive, I decided to use a pruned ensemble of only the five best ensemble members, so that I was interpreting our high performers in terms of what variables they chose to use for forecasts.

So any idea what makes one a high performer or a low performer, and can you use that to improve this ensemble technique?

Why might some be better than others? Just the randomness of training. That random initialization seed has quite a big determining effect on the resulting final model. You can, for example, get more stuck in saddle points in the high dimensional optimization space.

So it's entirely random; there is nothing structural happening there that could help you further?

I don't think so, but potentially you could improve the ensemble average by upweighting the higher performers. There are more sophisticated ways of ensembling than just taking the average, but taking the average also works quite well.

Thank you. Yeah, thanks a lot. Maybe I could also ask on that side: you mentioned this U-Net architecture was also used for medical image analysis and so on. Can you give us a bit of intuition as to why this is a particularly well-suited architecture for these kinds of prediction tasks?

Yeah, so maybe I can elucidate that with an alternative. Alternatively, you could try to make a prediction at each individual grid cell: say we look at the past time series of the sea ice concentration values and have a kind of scalar regression model that's trying to predict the output on an individual grid cell-wise basis. The downside of that is that you lose the spatial context. I actually did a calculation on this architecture which showed that, with these sequential convolutions, your receptive field increases down the model, which means each point depends on more and more inputs, spatially speaking. At the output, if I did my calculations correctly, each prediction actually depends on 1,500 kilometers of information in the x and y directions, which can help with large-scale predictions.

I see. Now, how would that contrast with, let's say, a similar type of network without this reduction in the spatial extent?

By forcing it to compress the information, it's like you're forcing the model to come up with these kind of abstract summaries of the raw data, in a similar way to how the brain compresses the raw photon information landing on our retina. So it's a similar line of reasoning.

I see, understood. Thanks a lot.

No worries.

Okay, anybody else have any questions? Okay, it doesn't sound like it.
Then let me thank you very much again, Tom, for this excellent talk and for the nice discussion afterwards. And we'll see some of you next week, hopefully. Cheers.