Our speaker this week is Laura Zanna. Laura is a professor of mathematics and atmosphere and ocean science at the Courant Institute at NYU. Before moving to New York in 2020, she held positions at Princeton University as well as here in Oxford, where she spent an entire decade. In her work, Laura studies the physics of the Earth's oceans and how they impact our climate. That brings with it the necessity of studying physical processes at a multitude of different length scales, which, as you can imagine, is not always easy. In her talk today, she will outline how machine learning can be useful as a tool, not only to understand these processes better, but also how to use that knowledge to improve our climate simulations. And with that, Laura, please take it away. Okay, great. Yeah, thanks a lot for the invite. And it's nice to see many names in the audience that used to be down the corridor from me, so definitely excited to be virtually back in Oxford physics. So I'm gonna talk about some work that we've been up to in my group, mostly on how we can use machine learning to represent processes in climate models that are not resolved. And I'm gonna concentrate on the ocean because I'm a physical oceanographer by training. And so I'm the one speaking, obviously, but a lot of the hard work and the exciting work has been done and led by two people in my group. One is a former PhD student, Tom Bolton, who actually got his DPhil from Oxford physics back when I was still in Oxford, and the other is a postdoc, Arthur Guillaumin, who is at NYU Courant in the group. So I'm gonna give you a little bit of an outline. I'm gonna focus on machine learning for a very specific application in climate modeling, which is: how can we represent those subgrid-scale processes, those small-scale processes that climate simulations don't resolve? So what I'm gonna do is motivate the talk a little bit, right? I'm gonna tell you what a climate simulation is in case you don't know.
I'm gonna talk about what we call model uncertainty, meaning that the models we have, the climate simulators we have, are imperfect for many reasons, but one of them, again, is because we don't resolve many processes. And I'll show you that this impacts projections over the 21st century of things like precipitation or temperature or other quantities that we care about when we actually try to predict the climate system. And so the main question is — everybody is doing machine learning one way or another these days, for good or bad reasons, I don't know — but what I'm gonna try to argue is that it is possible to actually use machine learning to represent those small-scale processes that we don't resolve, by using data, either observations or very high-resolution simulations of the ocean or the climate system in general, to try to capture those small-scale processes and represent them in climate models in a way that is a little bit better than what we've done so far, which in general was basically using, yes, the basic physics that we love, but which can sometimes be a little bit empirical and a little bit ad hoc. So can we do better than that? Or can we try to find the best way to merge physics with machine learning, basically capturing enough information from data and physics to come up with a better representation of those processes? As I said, I'm a physical oceanographer, so I'm gonna concentrate on a process that I'll call mesoscale eddies. They're basically 10 to 100 kilometers in horizontal scale. That's what we're gonna focus on to actually try to make that argument. Now, of course, even if machine learning can capture those processes, or can capture a good representation of them, that does not necessarily mean they will translate into better climate models, because we need to take those representations and then plug them into existing climate simulations. That's not a simple task either.
So I'll discuss both the opportunities and the challenges that come with these ideas of merging machine learning, physics and climate modeling together. Is it possible to actually think along the lines of bringing physics, data and new algorithms together to help us move forward in modeling the climate? I'm pretty excited because we get support from Schmidt Futures, so we have a lot of projects that kicked off just a few months ago to really bring together climate modelers, domain scientists from the ocean, atmosphere and sea ice, and machine learning experts, to really try to tackle this problem in climate models. So can we represent processes from the sea ice, atmosphere and ocean using machine learning, and then plug them back into the existing community climate models that we use for making predictions? And so this is pretty exciting. It's a large international collaboration with many people, and we're hiring as well. So if anybody's interested in a job, there's a lot of exciting work to do in bringing physics and machine learning together. Okay, so I'm gonna start with a little bit of motivation. By the way, I don't know if you ask questions in the middle or at the end, but I'm happy to take questions as we move forward. So I'm gonna start with something pretty basic: how do we model the climate system? In the best-case scenario, we have equations. For the fluids of the ocean and atmosphere, we have the Navier-Stokes equations on a rotating planet, with forcing, dissipation and all the terms that you would include. For other parts of the climate system, we don't necessarily have equations: for parts of the thermodynamics, or say the land, we have approximations of those equations. But nonetheless, we still write them down as partial differential equations. These are highly nonlinear, so obviously we can't do much with them as is if we actually want to make projections or do anything else.
So what we do is we plug them into a climate model. And to do that, we take those equations, we break them down into pieces that we put in a piece of code, and then we solve them basically on a grid. So every single grid box of a climate simulator is basically a discretized version of the equations of motion that we have, or the thermodynamic equations, or the land equations, and so on and so forth. So then you can imagine, right? The more of those grid boxes you have, the more degrees of freedom, and the higher the computational cost. I'll come back to that in a bit. That's the way we actually simulate the climate: we start with the equations, break them down into pieces, solve them on a computer, and the size of the grid basically sets the number of degrees of freedom that we have when we consider all the equations. And so, as I said, the more grid boxes you have, the larger the number of degrees of freedom, so your computational costs go up. But of course your simulations are also more realistic, right? Not just prettier-looking. So this is a coupled climate model showing sea surface temperature — temperature at the surface of the ocean — in the same simulation, but at three different horizontal resolutions, so basically three different grid sizes. Over here, it's a horizontal grid of about one-tenth of a degree, so about 10 kilometers, if you want, along Antarctica. This is about a quarter of a degree, so about 25 kilometers by 25 kilometers. And this is a coarse-resolution run, so about 100 kilometers by 100 kilometers. In general, the climate simulators that we use for projections are run at those resolutions, which is relatively coarse compared to some high-resolution simulations. And the reason for that is that it's very difficult to run those simulations at even higher resolution, or to do ensembles, for hundreds of years. Again, the computational cost is a little too high and too steep.
So usually we rely on those models, and the idea is: can we represent all those processes — which you can see here as filaments and stirring and mixing — can we represent all those missing processes in those coarse-resolution climate models? That's a big question, obviously, and that's the way the parameterization world works. So now we have simulators at relatively coarse resolution, and again, they don't resolve all the processes that we want. And what does that mean for our projections? We're gonna look at a range of climate models, and here we're looking at precipitation as a function of time, so the x-axis is time and this is precipitation over here. And each one of the thin lines is a prediction, basically a run with a climate simulator. The black curve here is the observations. And the colors are for different emission scenarios. Obviously we don't know how much we're gonna emit in the future. I very much hope that we emit as little as possible, or a lot less than what we're emitting right now, but this is part of the uncertainty. That's something that we call scenario uncertainty. If we go to this plot on the right here, this is basically showing us the spread — the fact that the climate simulators are diverging in their predictions by 2100 — as a function of time. And so we have three different types of uncertainty. One of them is due to the initial conditions; you can think about it as the internal variability, or the chaos, of the climate system. That uncertainty will go away in time. Now, the two other types of uncertainty: one, as I said, is scenario uncertainty, this kind of green line over here. And that one increases as a function of time, obviously, because it's harder to predict what we're gonna emit. And that tells us that the uncertainty in the projections will increase depending on the scenario that we have.
And so if I go back to the plot on the left, the different colors were for different scenarios. And again, as you can see, the amount of greenhouse gas emissions that we put in will give us a different projection of precipitation in the mean, but also a different spread. So the spread is the quantification of uncertainty. Now, the most important one for us is the model spread, right? It's due to the fact that we have different climate simulators: even if you give them exactly the same greenhouse gas emissions over the next hundred years, they still give you a different answer, and that is independent of their initial conditions or the scenario. That's what we call model uncertainty. And you can see it's basically the dominant source of uncertainty, especially on multidecadal time scales. And that comes from the fact that, again, we're not solving the exact equations of the climate system; the numerics and the lack of processes that are important are actually introducing some uncertainty and some spread. So this is a plot for the previous generation of climate models, what we call CMIP5. If we look at the same exact quantities in the current generation of climate models, CMIP6, this is what you get. Overall, the model spread, or the model uncertainty, is actually now larger than it was before. So we didn't necessarily reduce the uncertainty associated with the models; it has actually increased. And there are a few reasons for that, right? Some of the models have more complexity in them, so they have different feedbacks that are included. So it's not a reflection of the models being bad; we just have more spread because the complexity of the system has also increased. But it also tells us that a part of that uncertainty is associated with all those processes that we don't resolve, which give us model spread overall. So if poor representations of some key processes, say mixing or clouds, are actually affecting our simulations and our projections, can we do better?
Before I tell you how we can do better, let's just try to think a little bit about what we've been doing so far, which has been extremely successful, just to be clear. What we've been doing so far is trying to represent mathematically what those processes do, in a kind of simple, conceptual or idealized mathematical representation. So in every single grid box, whether it's the ocean or atmosphere, there is everything that is below the scale of the model, below the grid that you're resolving. Say in the ocean we have grid boxes that are 100 kilometers by 100 kilometers, or 25 kilometers by 25 kilometers; then things like clouds or ocean turbulence and ocean mixing are not resolved. So the way we've been tackling this — in terms of, again, what we call a parametrization — is to try to come up with a simple mathematical formula that will represent this mixing or these cloud processes, and we put that on the right-hand side of our numerical simulators. And what we do there, basically, taking the example of ocean mixing: you can think that if I put some tracer, some dye, in the ocean, it's gonna get stirred and it's gonna get mixed overall. So I could write that as the Laplacian of the tracer. It's gonna be basically a diffusion operator, or quasi-diffusion operator, with a parameter in front of it, and the diffusion operator, the Laplacian, is gonna act on the resolved scales. So it's only gonna move around what I have at the size of the grid box. So the closure problem, or the parametrization problem, is just to come up with a mathematical representation of those processes, whether it's clouds or mixing, that only depends on the resolved scales, because I have no information about what's not resolved. That's the way it's been typically done for decades, and it's extremely successful, right?
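As a concrete illustration of this classic closure, here is a minimal sketch of a diffusive parameterization acting only on the resolved tracer field. This is not the code of any actual climate model; the grid size, diffusivity kappa and time step are made-up illustrative values.

```python
import numpy as np

def laplacian(field, dx):
    """5-point finite-difference Laplacian on a doubly periodic grid."""
    return (np.roll(field, 1, axis=0) + np.roll(field, -1, axis=0) +
            np.roll(field, 1, axis=1) + np.roll(field, -1, axis=1) -
            4.0 * field) / dx**2

def diffusive_parameterization(tracer, kappa, dx):
    """Classic closure: unresolved mixing modeled as kappa * Laplacian(tracer),
    depending only on the resolved field."""
    return kappa * laplacian(tracer, dx)

# toy example: a blob of dye on a 64x64 grid, stepped forward in time
T = np.zeros((64, 64))
T[28:36, 28:36] = 1.0
dx, kappa, dt = 1.0e5, 1.0e3, 3.6e3   # hypothetical grid spacing, diffusivity, step
for _ in range(100):
    T = T + dt * diffusive_parameterization(T, kappa, dx)
```

On a periodic domain the Laplacian redistributes tracer without creating or destroying it, which is exactly the property this kind of closure is built on: the blob spreads out and its peak decays, but the total amount of dye is unchanged.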
I mean, we've been able to represent many processes and how they interact with the large-scale flow, but nonetheless, those parameterizations are uncertain and some of them are a little bit empirical. And so the question is, can we do better? Or at least, can we do as well, and hopefully a little bit better, by using the data that we have? So here what we argue is: can we actually use the wealth of data sets that we have now? We have limited high-resolution simulations, even global ones — we just don't have a lot of them. We don't have hundreds of them, but we still have many. So, for example, this is again a picture from a high-resolution coupled climate model run at GFDL. Can we actually extract information about those small-scale processes from existing observations or high-resolution models, together with machine learning algorithms? How can we do that? Well, we have a lot of images, right? We have basically features. Can we learn the relationship between those small-scale processes and the large scale, and extract enough information from that very complex and turbulent data, to help us represent things like mixing and clouds and so on and so forth? So rather than going at it by saying, oh, well, I know what the large-scale impact of those processes will be, we're just letting the data and the algorithm pick out a relationship for us. Machine learning — there's a lot of hype around it, don't get me wrong, we all know that. But nonetheless, we have a lot of data, we have new algorithms. Is it possible to actually extract new information we didn't have before? People have been working on this in the climate arena, or the turbulence arena if you want, for quite some time now, so I'm just putting up a few examples here. In turbulence, mostly large-eddy simulation: there's a very nice paper by Ling et al. in 2016 that already started using deep neural networks.
So, kind of complex algorithms to extract information about turbulence, by embedding some symmetry properties of the tensors within them — so kind of blending physics and machine learning together. And the atmosphere community started about a decade ago already, with a lot of papers thinking about how we can represent radiation or clouds by extracting information with machine learning. And now there is actually a large number of papers; it's almost impossible to keep up, especially on the atmospheric side. Lots of papers looking at different processes, from gravity waves to clouds. I know some people in Oxford are actually working on this. So a lot of work is being done in extracting information from data directly. In my group, we've been focusing on the ocean. The ocean is actually less populated when it comes to machine learning and parameterization. We've been working on this topic of ocean mesoscale eddies, as I mentioned, and I'll define that in a second. And another group a couple of years ago started working on vertical mixing — turbulent vertical mixing in the ocean and how it stirs tracers in general. So we're seeing more and more people working in this specific sphere of using machine learning for parameterization. I think there's a lot of potential. We don't know exactly yet if we're improving climate simulations. But what I'm gonna do now is use a couple of examples from our work and tell you the good and the bad. I'm not gonna hide the things that don't work or the things we don't understand, but I hope to give you a little bit of a flavor of what's possible and where the challenges lie. Okay, so we're gonna work on parameterizations of ocean mesoscale eddies. As I said before, they're at horizontal scales of 10 to 100 kilometers roughly, and they're pretty important because they actually set what we call the stratification in the ocean.
So, the temperature profile as a function of depth. And that means they're also critical in stirring and mixing tracers and taking up heat and carbon. The ocean is a massive reservoir of heat and carbon, and eddies play a pretty big role in stirring and taking up those different tracers. So if we look at this nice little picture over here, we're looking at surface velocities from a climate model run at three different resolutions, a little bit like before. And that's actually one of the UK models. On the right, we have a climate model run at one degree, right? What you can see is that the surface velocity field is kind of smooth; it's a little bit boring. There is no turbulence there. You can still see jets — for example, over here, the Kuroshio, which technically is supposed to be quite turbulent, but you don't see that in the velocity features. You increase the resolution, and of course you start resolving those mesoscale eddies and those features, so you see a lot more filaments and a lot more turbulence in regions like the Southern Ocean, for example, or the equator. You increase the resolution even further, to the left, one-twelfth of a degree — so roughly eight-kilometer horizontal resolution — and there are a lot more filaments, a lot more turbulence. And it's not just that this picture looks a lot better than that one; you're actually resolving scale interactions, right? Those small-scale processes interact with the large-scale flow; we don't have any scale separation. So resolving them and resolving their impact actually has an effect on the strength of, say, the Gulf Stream, or the ACC in the Southern Ocean, and on how much heat and carbon we're gonna take up. There's no scale separation, and that's why it's quite important to represent them well.
So now, how can we come up with a parametrization of ocean mesoscale eddies using data and using machine learning? The first thing we're gonna do is take a high-resolution simulation — so we have the equations and we have the data. And what we're gonna do is diagnose the missing forcing for a coarse-resolution simulation from the high-resolution simulation. We basically take our equations and low-pass filter them, coarse-grain them if you want. And when we do that, we end up with an equation that has an additional term here, S, which is the turbulent forcing that we need to put on the right-hand side of a coarse-resolution model to make it behave like a high-resolution simulation, but without the computational cost. So, first step: we take data from high-resolution simulations and diagnose this missing forcing term, which is basically the nonlinear advection that is impacting the flow. Then we ask a machine learning algorithm to learn that term as a function of the large-scale velocities. We let the algorithm pick whatever function it wants, based on the input data and this function S that we're looking for. And I'm gonna give you two examples. One uses neural networks — basically a bit of a black box, right? Many, many parameters; it can capture many complex features, but it's very hard to interpret, and I'll show you a little bit of that. The second example is interpretable equation discovery: we try to learn equations from the data for this missing process. Now, even if we're successful at all of those things — and I'll show you, we're doing a pretty decent job — it doesn't mean that when I take this S and plug it back into a coarse-resolution climate model, or in our case into very idealized flow simulations, it will remain numerically stable or that it will actually improve the flow.
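The diagnosis step described here can be sketched numerically: filter the advection term of the high-resolution flow, compute the same term from the filtered fields, and take the difference. This is a toy version assuming periodic boundaries and a crude block-mean filter; the actual work uses proper spatial filters on real model output, so treat every detail as illustrative.

```python
import numpy as np

def coarsen(f, r):
    """Block-mean over r-by-r patches: a crude stand-in for the low-pass
    filter / coarse-graining applied to the high-resolution fields."""
    n = f.shape[0] // r
    return f.reshape(n, r, -1, r).mean(axis=(1, 3))

def ddx(f, dx):
    """Centred difference in x, periodic boundaries."""
    return (np.roll(f, -1, axis=1) - np.roll(f, 1, axis=1)) / (2 * dx)

def ddy(f, dy):
    """Centred difference in y, periodic boundaries."""
    return (np.roll(f, -1, axis=0) - np.roll(f, 1, axis=0)) / (2 * dy)

def subgrid_forcing(u, v, q, dx, r):
    """S = coarsen(u . grad q) - ubar . grad qbar: the eddy forcing the
    coarse-resolution equations are missing after filtering."""
    adv_hi = u * ddx(q, dx) + v * ddy(q, dx)
    ub, vb, qb = coarsen(u, r), coarsen(v, r), coarsen(q, r)
    adv_lo = ub * ddx(qb, r * dx) + vb * ddy(qb, r * dx)
    return coarsen(adv_hi, r) - adv_lo

# sanity check: a flow with no subgrid variability has zero diagnosed forcing
u = np.full((64, 64), 0.5)
v = np.full((64, 64), -0.2)
q = np.full((64, 64), 1.0)
S = subgrid_forcing(u, v, q, dx=1.0e4, r=4)
```

The 64x64 high-resolution fields produce a 16x16 map of S, one value per coarse grid box — exactly the target that the machine learning algorithm is then asked to predict from the coarse velocities alone.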
And so there's a little bit of an art of tuning involved there. And all the problems that we have in general in climate simulation — numerical stability, skill and all those things — don't go away just because you learn a better function; they remain. So those problems are still there, and I'll show you a few examples of that. Okay, first example: we're gonna use neural networks to learn that missing forcing term. We're gonna take idealized simulations, because we can run them pretty easily at fairly high resolution and generate a lot of data. It's what we call a quasi-geostrophic model — very simple, for those of you who know it; if you don't, don't worry. Basically, we blow wind at the surface of the ocean and generate what we call a wind-driven circulation. So we have a subtropical circulation over here, a subpolar circulation over here. But the most important aspect is that we have a jet here in the middle that meanders, where there's a lot of turbulent interaction and where mesoscale eddies are being developed. And so we diagnose this missing forcing, this missing tendency, that represents the impact of the mesoscale eddies on the large-scale flow and that is missing from a coarse-resolution model. And we ask a convolutional neural network, by just giving it a lot of 2D images, a lot of pictures — without any more information except that it knows the x and y coordinates, and there are many images of that type of forcing — to learn it. So it comes up with a prediction, S-hat, for that S term. Basically, it gives us back an image in x and y that only depends on the input, which is the surface velocity field, and that best mimics this missing tendency. And at the end of the day, we're just coming up with a new parameterization of ocean turbulence — of momentum, basically — that would be missing from a coarse-resolution model.
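To make the input/output structure concrete, here is a dependency-free sketch of the kind of mapping involved: a tiny two-layer convolutional network that takes the two surface-velocity components as input channels and returns an S-hat map. The weights are random and the loops are deliberately naive; this only illustrates the shape of the problem, not the trained architecture from the talk.

```python
import numpy as np

def conv_layer(x, w, b):
    """Naive 'valid' 2-D convolution: x is (C_in, H, W), w is
    (C_out, C_in, k, k), b is (C_out,). Slow but dependency-free."""
    c_out, c_in, k, _ = w.shape
    H, W = x.shape[1] - k + 1, x.shape[2] - k + 1
    y = np.zeros((c_out, H, W))
    for o in range(c_out):
        for i in range(c_in):
            for a in range(k):
                for c in range(k):
                    y[o] += w[o, i, a, c] * x[i, a:a + H, c:c + W]
    return y + b[:, None, None]

rng = np.random.default_rng(1)
u = rng.standard_normal((32, 32))         # coarse surface velocity, x-component
v = rng.standard_normal((32, 32))         # coarse surface velocity, y-component
x = np.stack([u, v])                      # two input channels

w1 = 0.1 * rng.standard_normal((8, 2, 3, 3)); b1 = np.zeros(8)
w2 = 0.1 * rng.standard_normal((1, 8, 3, 3)); b2 = np.zeros(1)

h = np.maximum(conv_layer(x, w1, b1), 0.0)   # hidden layer with ReLU
S_hat = conv_layer(h, w2, b2)[0]             # predicted eddy-forcing map
```

Note the output is itself a 2D field: the network maps velocity images to a forcing image, and each 'valid' 3x3 convolution trims one grid point from each edge, so a 32x32 input gives a 28x28 prediction here.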
So we're making no other assumptions than just giving it snapshots of missing tendencies and asking it to come up with a new parameterization that only depends on the large-scale velocities. Okay, so the first thing we did — and that was work done by Tom again, when he was a PhD student — was just to try it out. We had absolutely no idea what to expect, and so that was quite interesting. We started with a simulation at a given Reynolds number. So again, a flow simulation: this is gonna be x over here, this is y — longitude, latitude — and we're looking at this missing eddy forcing. What you can see is that where the jet is, that's where the missing forcing is most prominent, which kind of makes sense, right? When we run a simulation at coarse resolution, as we saw, the jets are usually more viscous and dissipative, so that's where you're missing the turbulence and that's what you would want to represent. So that's the true missing forcing — that's its standard deviation — that's the prediction made by the neural net, and that's the correlation between the two. What you can see is that where there is forcing, we're doing a really good job; in regions where there is no forcing, the correlation drops, but again, there is no forcing there, so it doesn't really matter all that much. So overall, the black-box kind of magic gave us a really good answer for predicting this missing forcing. Now, if we don't retrain the neural net — we're not giving it new data and we're not asking it to relearn anything new — we just take the function that came out and ask it to predict the missing forcing from a simulation at a higher Reynolds number, so a flow that is more turbulent.
My expectation was that the correlation would actually drop — that it would do a worse job at predicting the missing forcing in a more turbulent regime than it did in the regime it was trained on. Surprisingly, that's not true. It actually did a better job at predicting the more turbulent regime. So that's what we see here: higher Reynolds number, exactly the same neural-net parameterization. We give it the input velocity again and ask it to predict the truth; that's the prediction, and that's the correlation. The correlation actually went up, especially in the turbulent region. The truth is, I have no idea why it does that. It does a better job at generalizing to a higher Reynolds number. We have a few ideas why, but it's actually very difficult to figure out — and it does better for the standard deviation and for the higher-order moments as well, so skewness and kurtosis. So it's not just an artifact of shifting the probability distribution; it truly does a better job, but I don't know why. That's one of the issues with getting a great prediction: sometimes you're actually not sure why it does a better job and why it generalizes better. But it still tells us that apparently there is more information in the data compared to the basic parameterizations, which I haven't shown you here, and we're actually getting higher skill just by learning it. And by the way, at this stage I hadn't implemented it in a model yet. Now, one other caveat, in addition to not quite understanding what it has done: it doesn't respect conservation. We didn't impose any conservation laws. And you can see the problem already, right? If I now take this new parameterization and plug it into a climate model, and let's say it doesn't conserve momentum, then the parameterization might input momentum forever. So your flow might accelerate without anything to damp it, or might remove momentum overall.
So one thing that we did as follow-up work is we actually changed the architecture. Rather than learning the subgrid forcing by itself, we learned the different components of a stress tensor, and then at the end we took the divergence of that stress tensor. So then, given the boundary conditions, this entire term will integrate to zero if you have no normal flow and no flux at the boundaries. That's one way you can actually change the architecture to embed physics, to embed conservation laws. We did it for momentum; you can do it for energy and mass and so on and so forth, depending on the problem that you have. So it's not completely out of the question to embed physical constraints within the architecture, but you need to work at it. It's not something the network will know from the get-go: even if you give it data that technically conserves these properties, the algorithm is not gonna learn that by itself — it has to be imposed. That's something to think about when you actually learn those functions. As I said, it didn't conserve momentum, but we could fix that. It was also based on very idealized data, right? Those models are quite simplistic: there's only one forcing, to some extent, that doesn't change as a function of time, and there's only one scale interaction. There's also no uncertainty quantification. I learned a mapping between coarse-resolution velocities and a missing turbulent forcing, but we know those relationships are imperfect and uncertain, and I didn't include any uncertainty quantification there. So that was some follow-up work we've been doing: using more complex data and embedding some uncertainty quantification in it. So now we're gonna take data from a coupled climate model that is run at one-tenth of a degree.
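The conservation trick can be sketched numerically: instead of predicting the forcing directly, predict stress-tensor components and take their divergence, so the forcing integrates to zero over a suitably bounded domain. Here the tensor fields are random placeholders standing in for network outputs, and the boundaries are periodic for simplicity.

```python
import numpy as np

def forcing_from_stress(Txx, Txy, Tyy, dx):
    """Momentum forcing as the divergence of a (learned) stress tensor:
    (Sx, Sy) = (dTxx/dx + dTxy/dy, dTxy/dx + dTyy/dy).
    Centred differences on a periodic grid."""
    ddx = lambda f: (np.roll(f, -1, axis=1) - np.roll(f, 1, axis=1)) / (2 * dx)
    ddy = lambda f: (np.roll(f, -1, axis=0) - np.roll(f, 1, axis=0)) / (2 * dx)
    return ddx(Txx) + ddy(Txy), ddx(Txy) + ddy(Tyy)

rng = np.random.default_rng(2)
# random placeholders for the stress components a network would predict
Txx, Txy, Tyy = (rng.standard_normal((64, 64)) for _ in range(3))
Sx, Sy = forcing_from_stress(Txx, Txy, Tyy, dx=1.0)
# by construction, Sx and Sy sum to zero over the domain: no net momentum input
```

Whatever values the network puts in the tensor, the resulting forcing cannot inject net momentum — the conservation property is built into the architecture rather than learned from data, which is exactly the point made above.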
So, much higher resolution, complex data. And we retrain the neural net. Now we're kind of forgetting about the idealized experiments. We're just gonna pick a few regions in the ocean, mostly to see: if you have limited data, can you generalize to the full globe, and can you generalize to a different climate as well? So we picked a few regions in this coupled climate model. The data is there — it was given to us by our colleagues at GFDL and it's hosted on the Google Cloud, by the way, via Pangeo, so anybody has access to it — and we try to train a neural network based on just those few regions. We play exactly the same game as before: diagnose the missing tendency, and learn a parameterization for the missing turbulent forcing given the input velocities. But now, rather than learning just the mean — just a one-to-one mapping where I give you the input velocity and you give me one prediction — we ask the algorithm to learn the mean and the standard deviation of that missing forcing. So now we're asking the neural net to learn two parameters, if you want, of a distribution, by assuming that there is some uncertainty and fluctuation in this highly turbulent forcing. All we have to do is change the loss function. When we make the prediction, we need to optimize some kind of loss function to give us the best prediction; so now we change this loss function so that it includes both the standard deviation and the mean. And what the neural net does is predict, at every single grid box, the mean of the missing forcing and the standard deviation around it. Again, you can think of the standard deviation as the uncertainty associated with that mapping.
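Changing the loss function to learn a mean and a standard deviation usually means minimizing a Gaussian negative log-likelihood. A minimal version — my sketch of the idea, not necessarily the exact loss used in this work — looks like:

```python
import numpy as np

def gaussian_nll(mean, log_std, target):
    """Mean negative log-likelihood of the target under N(mean, std^2).
    Predicting log_std keeps the standard deviation positive. Minimizing this
    trains the net to output both the forcing and its uncertainty."""
    std = np.exp(log_std)
    return np.mean(0.5 * ((target - mean) / std) ** 2
                   + log_std + 0.5 * np.log(2.0 * np.pi))

# a biased mean is penalized relative to a perfect one (at the same std)
perfect = gaussian_nll(np.zeros(4), np.zeros(4), np.zeros(4))
biased = gaussian_nll(np.ones(4), np.zeros(4), np.zeros(4))
```

Note the trade-off built into this loss: the network could shrink the squared-error term by inflating the predicted standard deviation, but the log_std term penalizes that, so the predicted uncertainty is forced to be honest.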
We do that at every single grid box, and we learn on only four small regions of the ocean with, if you want, different degrees of turbulence. So here are the results. Again, we learn on four regions, and we learn the missing forcing from a high-resolution model at one-tenth of a degree, coarse-grained to 0.4 degrees in this case. Here's what we call the R-squared — basically a measure of skill, right? How well does it do? When you're over here, that means the neural net is performing very well; down here means we're doing a very bad job — but actually, even the bad job is 50%, which is still pretty good. Overall, what you can see is that in the majority of the ocean, the neural net can predict the missing forcing, at the surface at least, with a high degree of accuracy — more than 70%. The regions where we do the worst, in particular, are usually the high latitudes that are covered in ice. And that kind of makes sense: we didn't train on any of that. We didn't include anything about the interaction between the sea ice, the ocean and the atmosphere, right? All we had were a few regions in the interior of the ocean. But nonetheless, it does a really good job even in very turbulent regions that it hasn't seen before; it can actually make predictions over the majority of the globe. And — I forgot to say this — we only trained on data that comes from a pre-industrial simulation, where the CO2 levels are 280 parts per million. Of course, as the climate warms, surface velocities and turbulence change as well; they can change by up to 50% in many regions of the ocean. So once again, we take exactly the same trained neural network and validate it against data from a simulation in which the CO2 is doubled. And that's what we have here: the same R-squared, the same skill metric, if you want, for our neural net.
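The skill metric used in these maps is the usual coefficient of determination. As a quick sketch (assuming the variance is taken over the same points as the error):

```python
import numpy as np

def r_squared(truth, pred):
    """R^2 skill score: 1 means a perfect prediction, 0 means no better than
    predicting the mean of the truth everywhere; negative means worse."""
    return 1.0 - np.mean((truth - pred) ** 2) / np.var(truth)

truth = np.array([1.0, 2.0, 3.0, 4.0])
```

So the "more than 70% accuracy" quoted above corresponds to R-squared above 0.7: the network explains over 70% of the variance of the diagnosed forcing at those grid points.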
And what you can see basically overall is that there is very little degradation in the prediction of the missing forcing, whether I'm on a planet that has less CO2 or a planet where there is more CO2, without having seen any new information at all. So there's no degradation when we look offline at that skill, which is good news, right? That means it can generalize to different regions, so different turbulence regimes as we said, but also to a different climate. So the parameterization that we learn has some generalization properties that are not totally constrained by the data you give it, which is good news. But of course, all of that so far has been run completely offline, right? All I'm doing is training on some amount of data and testing on bits of data the neural net hasn't seen, but I haven't implemented it yet into a forward simulation. And now I need to take this term and plug it into the right-hand side of the coarse-resolution model to see if it works. Doing that in a real climate model is no easy task. That's something we're doing now, but so far all we've done is use an idealized simulation and try it there. So that's what I'm gonna show you now. So we took again an idealized model, and I don't know if Milan is in the audience, but Milan is in AOPP and has kindly written a Python code to run a very simple kind of flow model. And we've been using that because it's great, because it's in Python, so we can interface really well between the neural net that is being learned and the forward simulations. So what we've done is, on the right-hand side of this coarse-resolution model that is run at 30 kilometers, we put the learned parameterization that we just created, that we learned from, again, much more complicated data. And so we put that on the right-hand side.
Okay, so now the question of course is what happens, right? So the first thing we need to do is decide how we're gonna implement the parameterization. It's stochastic in our case. So what we're gonna do is take the surface velocity from our coarse-resolution model and feed it through our learned parameterization, and it will give us a mean and a standard deviation. The standard deviation is just gonna be multiplied by some white noise, right? And at every single grid box, we're actually picking a mean and a standard deviation that we plug into the right-hand side of the equation. So how does the simulation do compared to the coarse resolution, compared to the high resolution? So that's what we're seeing here. This is global kinetic energy as a function of time. The blue curve is a coarse-resolution simulation run at 30 kilometers without any parameterization. The violet curve is the same simulation but run at higher resolution. And so you can see that the amount of kinetic energy, so basically the total energy in the flow, is actually increased; the flow is more turbulent, so you expect to have more kinetic energy overall. And each one of the three curves that you can barely see, because they're almost indistinguishable from one another, are three simulations that are run with this stochastic parameterization. So we have three ensemble members, right? Because it's stochastic, each simulation will give you a slightly different result, but they all kind of merge into one. So overall, what you see here is that we can recover the total kinetic energy in a coarse-resolution simulation using our learned parameterization, trained, by the way, on a completely different data set, not even from the same model as the one we're using here, without the computational cost of running a high-resolution simulation.
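As a hedged sketch of how such a stochastic term can enter the time stepping: the toy network, the drag term, and all coefficients below are invented for illustration, not the model from the talk; only the structure (sample mean plus sigma times white noise, add to the right-hand side) mirrors the description:

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_net(u):
    """Stand-in for the trained neural net: returns a (mean, std)
    pair at every grid box, given the coarse surface velocity.
    Purely hypothetical coefficients, for illustration only."""
    return 0.1 * u, 0.05 * np.abs(u) + 1e-3

def stochastic_forcing(u):
    """Sample the subgrid forcing: predicted mean plus white noise
    scaled by the predicted standard deviation, per grid box."""
    mu, sigma = toy_net(u)
    return mu + sigma * rng.standard_normal(u.shape)

def step(u, dt=0.01):
    """One explicit step of a toy momentum equation with the learned
    term added on the right-hand side (a real model would add it to
    the full coarse-resolution dynamics, not a simple linear drag)."""
    return u + dt * (-0.5 * u + stochastic_forcing(u))
```

Running several trajectories from the same initial condition gives the ensemble members described above: each realization samples different noise, so the curves differ slightly but cluster together.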
And even though learning and training a neural net comes at some expense, it's still a lot less expensive than actually running the high-resolution simulation. So it can improve the simulation in this very, very idealized setup. It's stable, which was a surprise, by the way. If it's not stochastic, so if the parameterization is purely deterministic, the model blows up. The simulations are not stable, and we need to tune down the learned parameterization. And the blow-up can take different forms. In many cases, actually, it's not necessarily a numerical instability; the flow becomes completely unrealistic, with extremely high velocities, and almost kind of forgets what the forcing is, and so on and so forth. So there are very subtle ways the scale interactions can happen that we don't necessarily understand. The stochasticity actually helps stabilize the simulation, and that's not new. Many of us have seen that before in a range of simulations, whether it's machine learning or not. So that's the good news, right? You can learn something on a given set of data, generalize well offline, and you can implement it at least in an idealized model, and it gives you a pretty good solution and a pretty good answer in this idealized setup. So those were the pros, which I just mentioned: high skill, generalizable. When stochastic, it captures something of the physics, not necessarily a proper representation in the way you would usually think about it, just a very simple kind of mathematical operator, but nonetheless it's pretty decent and, again, stable. We could include some quantification of uncertainty within the loss function by learning a mean and a standard deviation. The cons: hard, or almost impossible, to interpret.
You can look at feature maps, heat maps and the local gradients, and that's something we're doing right now, but half the time, if you don't know the answer, it's already hard to actually get there. But there are options, so I'm not gonna say it's completely impossible, but it's definitely non-trivial. We need to ensure that the conservation properties are actually embedded within the architecture, since they're not learned directly. And the implementation, as I said, is not always stable. Not necessarily easy to implement either. I mean, there are now different software packages popping up everywhere to try to blend Fortran and Python; there are many of them, and if you're interested, I'm happy to talk later on. Now, the question, of course, is would it work in a coupled climate model, right? So far we've only implemented it in an idealized model that we could easily handle ourselves, and a coupled model is a very different situation. So that remains to be tested and to be seen. And this is something we're doing, and I'll come back to it at the end. But then there's the second approach that we've been trying, in parallel, although right now we're actually merging the two approaches. In parallel, what we've done was: okay, the black box is great, high skill, but I have no idea what's going on in there. So can we try to learn, from data, a parameterization that is a bit more interpretable, and can I learn new physics from it? And that was pretty exciting. So the idea came up from reading this paper by Rudy et al., where what they've done is actually run a simulation using the Burgers equation, a kind of idealized Navier-Stokes. They use the full data field and come up with a sparse regression algorithm by which you give your algorithm a set of functions, a library of functions with derivatives of the velocities, the curl, and so on and so forth. Again, based on data, right?
So the algorithm does not know the equation, but you can give it data onto which you map given operators. So you take your velocity, you take its curl, and that gives you the vorticity, and so on and so forth. And you give your algorithm a huge set of basis functions that are evaluated on the data, and you ask the algorithm to actually prune that set of basis functions and come up with the equation that will best mimic the evolution of your flow. And of course, what was beautiful is that they could recover the equations of the simulation that they created, basically capturing the right terms, if you want, with the right parameters in front of them, which here are reflected by the Reynolds number. So that's exciting, but of course the catch was that here, you know the answer. You know what you're supposed to get. So the idea we had was, okay, let's try to do that for a parameterization where we don't necessarily know the answer, and try to see if we can actually come up with something new that we didn't know before. So here's what we've done: we basically again use data to try to discover unknown equations for ocean mesoscales. Here I'm gonna show you only the momentum part; we've done it for buoyancy and energy as well. Of course, this idea can be applied to any part of the climate system; it doesn't have to be applied to just ocean mesoscale parameterizations. And again, this is something we're writing up. So I'll show you just a couple of slides of results on what we've done. So again, we took an idealized simulation and diagnosed the missing forcing. In this case we went from seven and a half kilometers to 30 kilometers.
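A minimal sketch of what such a library of candidate functions might look like, built from resolved velocities and their derivatives on a periodic grid. The actual library in this work has roughly 200 terms up to second-order derivatives; the handful below is illustrative only:

```python
import numpy as np

def build_library(u, v, dx=1.0):
    """Build a tiny library of candidate basis functions from the
    resolved velocities: the fields themselves, first derivatives,
    the vorticity, and one nonlinear product. Each column of the
    returned matrix is one candidate function, flattened over the
    (periodic) grid, ready for sparse regression."""
    def ddx(f):  # centered difference, periodic in x
        return (np.roll(f, -1, axis=1) - np.roll(f, 1, axis=1)) / (2 * dx)
    def ddy(f):  # centered difference, periodic in y
        return (np.roll(f, -1, axis=0) - np.roll(f, 1, axis=0)) / (2 * dx)
    terms = {
        "u": u, "v": v,
        "du/dx": ddx(u), "du/dy": ddy(u),
        "dv/dx": ddx(v), "dv/dy": ddy(v),
        "vorticity": ddx(v) - ddy(u),
        "u*du/dx": u * ddx(u),
    }
    names = list(terms)
    return names, np.column_stack([terms[k].ravel() for k in names])
```

The regression then looks for a sparse combination of these columns that best matches the diagnosed missing forcing.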
So we diagnosed that missing forcing, and now what we're looking for is a parameterization that's gonna be the sum of some weights multiplying a set of basis functions that we give the algorithm. And then we ask the algorithm to predict the missing forcing, the best possible missing forcing, given the set of basis functions. So as I said, we take the high-resolution model inputs, and we give the sparse regression algorithm a library of roughly 200 functions that depend on the resolved velocities, the temperature gradients and so on and so forth, and we go up to second-order derivatives. There are a couple of reasons why. One is because you need to keep everything in memory, and that's not easy, so we were limited a little bit by that. And second is that at the end of the day you need to implement this in the climate model, and to do that, the order of your derivatives is a little bit limited, right? Because you can't have an infinitely large stencil to implement. But that's something we're kind of revisiting right now; it was one of the decisions we started with. And then we actually go through this iterative sparse Bayesian regression, which is different from the Rudy et al. paper, actually, because we find this method to be a lot more stable. And so the algorithm basically prunes through that library of functions. We give it a threshold. We say, okay, if this function contributes to the prediction, keep it, or drop it if you're below a certain threshold, and basically prune through that set of functions and keep just a certain number that explain a certain amount of the variance.
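The pruning idea can be illustrated with sequentially thresholded least squares, a simpler cousin of the iterative sparse Bayesian regression (relevance vector machine) used in this work; this sketch keeps only the basis functions whose weights survive a fixed threshold:

```python
import numpy as np

def sparse_regression(library, target, threshold=0.1, n_iter=10):
    """Sequentially thresholded least squares: fit a weight for every
    basis function (column of `library`), zero out the weights below
    the threshold, and refit on the surviving columns. Repeating this
    prunes the library down to a few dominant terms."""
    w = np.linalg.lstsq(library, target, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(w) < threshold
        w[small] = 0.0
        active = ~small
        if active.any():
            w[active] = np.linalg.lstsq(library[:, active], target,
                                        rcond=None)[0]
    return w
```

Unlike this sketch, the Bayesian version also returns an uncertainty on each surviving weight, which is what lets the surviving coefficients be compared and collapsed into a single one as described below.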
So there's a little bit of a choice, of course, of how much of the variance you wanna keep, but at the end of the day, all you end up with is weights, which have some uncertainty associated with them because it's a sparse Bayesian regression, right? (The algorithm is called the relevance vector machine, by the way, if you're interested.) And those weights multiply a set of basis functions. Okay, so here's what we get. I'm not gonna get into the detail; I just wanna show it and raise a quick question at the end. So we kept only a certain number of basis functions that explained between 50 and 70% of the variance, meaning that it can recover about 70% of the missing forcing. I can add a few more and go up by a few percent, but not much more than that. And what we found is that, first, the algorithm learns something that is actually a symmetric stress tensor, which is something we would expect when we're looking at those properties. Also, we made sure that all the basis functions could be written as the divergence of a flux, so that we conserve momentum. So that was a nice property. And also, the parameters multiplying each one of the basis functions were all pretty close to one another, so we encapsulated all of them in one parameter that has no spatial dependence. So it's one coefficient that multiplies this stress tensor over here, and that stress tensor depends on the vorticity and the shearing and stretching of the fluid. That was pretty exciting, because it resembles a parameterization we came up with a few years ago. So we could actually explain and understand it pretty well. And again, I'm not gonna go into the detail of that. So in this case, we came up with something that we actually kind of discovered ourselves a few years ago, but there are some differences between what the algorithm has learned and what we proposed a few years back.
But here, the beauty of it is that now you have a mathematical expression that you can go and analyze, and understand its properties and how the scale interactions between the eddies and the large scale are happening. So I'm just gonna show you one more thing, which is what happens when we implement those parameterizations. So the equation discovery one that we came up with is gonna be the red curve, in the same idealized model that Milan kindly created and made available online to all of us, alongside one based on a neural network that has conservation properties, so conserving momentum, embedded in it. So again, spatially averaged kinetic energy as a function of time. The gray curve is the coarse-resolution model. The high-resolution simulation is the cyan curve over here. So again, more kinetic energy in the high-resolution simulation. When we implement the neural network in which we've embedded conservation, this is the violet line that you see here. But again, we had to tune it down, because otherwise the simulation was a little bit unphysical. But at this point it does a pretty decent job. And over here is the equation discovery parameterization. It gets us roughly halfway there. And this one fell short because we ended up with a little bit of numerical instability along the way, but that's something we're working on, and we think we've actually solved it. So, you know, that was a little bit of a show and tell, trying to pinpoint what the opportunities are and what the challenges are. I think parameterizations will remain in the models for decades to come in climate simulation. They're not going away anytime soon. So why not use data and machine learning algorithms to see if we can actually improve on the current parameterizations, but also on our understanding of those processes? So that's the way we see it.
And equation discovery is one example there, right? But deep learning is another one, if you actually manage to interpret the models, or find a way to interpret them. But if all you need is skill, then they're still doing a really good job. So I feel like there's a way to actually merge both approaches, and that's something we're doing right now. So can we get the best of equation discovery and deep learning, to both improve our understanding of the physics and get high skill that we can implement in climate models? Some thoughts and a bit of a way forward. I think there are a lot of exciting possibilities. There's also a lot of hype, right? Everybody's working on machine learning right now to do something. I think one has to be careful. There are many challenges ahead, and we can't just assume they're not there. But I think bringing physics and data together is definitely a way forward. There's a lot of information to extract from the data. Machine learning algorithms, for me, are just an additional tool in my bag, the same way I use theory or climate models; machine learning is just another one. And I think, in general, it's always good to have diverse approaches, so you can try things out and combine them when necessary. And there's no doubt that it can help us accelerate, again, both the field of climate science in general, and also modeling. Will they be as transformational to climate modeling as they've been to robotics and NLP? It remains to be seen, when we think about machine learning and AI; they really transformed those other fields. We hope that they can transform climate physics too. But again, I think there are a lot of opportunities, and a lot of challenges that I'm sure many people in the audience are keen to work on.
Just a couple of things if you're interested in those ideas. We have a program coming up at the Kavli Institute for Theoretical Physics on machine learning for the physics of climate. And that encompasses everything, not just parameterization, by the way, from low-order reduction to understanding complexity within the data. I believe applications are still open if you're interested in joining. There's a conference at the beginning for one week, and then a long program for people who want to stay. And of course, we have a lot of things coming down the pipeline with M²LInES, which is a project that really tackles the parameterization problem in a way that is interpretable, physics-aware, and something we can implement in climate models. So I'll leave it at that, and happy to take questions. Thanks a lot. Well, thanks a lot indeed for this very inspirational talk. Really exciting. Okay, questions. If people have questions, then please, I would ask you to raise your hand, and then we can go through them. Okay, I don't see any right now. Let me go ahead myself. You mentioned a couple of times that you had to tune down the additional term that you learned with your model, to make sure that the model doesn't go haywire and that it stays physical and so on. Is there any danger in running something that is subtly wrong, that you wouldn't necessarily notice, and then fixing it by dialing down such an additional term? Yeah, absolutely. So we made the assumption, a very strong assumption, that what we were learning was the truth. And that's an interesting problem to some extent, because it's not the truth. We still diagnose it from a simulation that is incorrect to some extent. So it is possible that we're also learning something that isn't perfect. And that's one of the issues.
So what we're trying to do right now is actually bring in observations through what we call transfer learning. So you learn this missing term from your climate simulation, which you know is not 100% correct but is nonetheless the best you've got, and then we recalibrate with observations, which are technically a little bit closer to the truth. And that's really the way we're trying to approach that. That's a very good question. I see, thanks a lot. Okay, Jonathan, you also have a question. Please go ahead. Yeah, hi there. Thanks for the talk. I had two questions, if that's okay. So the first one was: when you were looking at how well it was "performing", in inverted commas, against the high-resolution model for kinetic energy, you were taking the spatially integrated kinetic energy over the whole ocean. And really what you want is the kinetic energy in the right places, both at the surface. And, I mean, you've muted yourself. Yeah, I'm here. I'm here. So you're absolutely right. And the thing is, if I look at the map, then I will actually see the kinetic energy in the right places, in the regions where we need the kinetic energy. So the parameterization definitely does a better job even when we look at spatial maps. The one thing that we don't do very well, I forgot to say it, is the mean state itself. So, say, the mean velocities are not as good as what we would want. There is an improvement, but not as good as what we would want. We're trying it out actually in more complicated models, and now we're seeing much better improvements, so we're looking at somewhat more turbulent regimes. Well, that was sort of my second question.
You also said that you don't conserve momentum. How difficult or possible would it be to have a constraint in the CNN, or whatever deep learning approach you use, to have conservation of momentum, conservation of mass, conservation of salinity, whatever it is you want to solve for? Because I think you don't want some horrible drift or bias, like the ocean getting saltier or whatever it is. How possible is it to impose those sorts of constraints in the learning algorithm? Yeah, so that's something we've done, right. We kind of self-corrected: initially we didn't conserve momentum, then we built it within the architecture itself. You can build conservation within the layers, and there are different ways you can do that. Either you learn a set of fluxes rather than the divergence of the flux, and then you implement the divergence of those learned fluxes directly, so the tendency integrates to zero; that's one approach, building it within the architecture, and usually that's the way we think about it. Or there are a couple of papers that came out putting soft constraints within the loss function, where you have an additional term to actually impose conservation. So there are definitely multiple methods now. But the one thing I would say as a caveat is that the climate system is not a closed system. So for me, the part I'm thinking about right now, and thinking ahead, is: yes, there are some conservation properties, but we also have exchange between media, and that's the one I'm still scratching my head at, because for the different parts I don't know how to impose that. So that was not your question; I know how to answer your question, but the next one is more complicated to me. Yeah, yeah, okay, thanks. Okay, thanks a lot. Actually, I have a second question, which is sort of a bit vague and a bit forward-looking. So you've mostly focused on correcting your simulation in one specific way.
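The first approach mentioned, learning fluxes and implementing their divergence, conserves the budget by construction, because on a periodic grid the divergence of any flux sums to zero; a minimal sketch:

```python
import numpy as np

def divergence(fx, fy, dx=1.0):
    """Tendency obtained as the divergence of a learned flux (fx, fy)
    on a doubly periodic grid, using centered differences. Whatever
    the network predicts for the fluxes, this tendency sums to zero
    over the domain, so the budget is conserved by construction."""
    dfx = (np.roll(fx, -1, axis=1) - np.roll(fx, 1, axis=1)) / (2 * dx)
    dfy = (np.roll(fy, -1, axis=0) - np.roll(fy, 1, axis=0)) / (2 * dx)
    return dfx + dfy
```

With open boundaries or exchange between ocean and atmosphere the domain integral is no longer zero, which is exactly the harder case raised in the answer above.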
You've identified some small-scale physics in which your simulation is incorrect, and then you've tried to correct that. And I'm now trying to think about generalizations of this very nice technique throughout the field and throughout problems, where a simulation might be wrong in multiple independent but non-equivalent ways, so to speak. Do you have any intuition of how simple or difficult it would be to generalize that, and to have sort of multiple terms that are learned, either simultaneously or one after the other, and then everything added on top of one another? Would that make sense? Yeah, absolutely. So I mean, in this case we are learning a vector, technically, right? So we are learning two components of the vector at the same time. Now, the question is always going to be: do you have the right data set for that? You need a data set that will actually capture the interaction between the different pieces. And so yes, this is something we're thinking about as well, absolutely. Okay, nice. But non-trivial nonetheless, right? Because again, the phase space now is changing quite drastically, and you're putting a very strong constraint on your phase space, which can be good and bad at the same time. It's good for the physics; it's bad for the algorithms sometimes, because it's hard for them to converge. Yeah, I see, I see, nice. Okay, anybody else have any questions? If that's not the case, then let me thank you very much again, Laura, for this great talk and for taking the time to answer the questions, and I hope to see you all again next week at the same time. Thanks everyone.