Okay, thank you, Alec. I will continue with part two. Let me share the screen again. Hello everyone again. Let's try that, can you see it? Yes. Good. I'll just turn off my video as requested. You can still hear me? Yes. Okay, so hopefully the internet will be smoother.

For the second half, I wanted to talk about two areas, this being the first of two very exciting aspects, I think, that have already taken hold in our field, and in geophysical inversion and inference more generally, and that are undoubtedly set to continue. The first of these is data science and machine learning. Now, there are many talks and lectures in this workshop that cover aspects of machine learning, and let me say I am no expert in machine learning; I'm sure others here are much more expert than I am. I come at it as an outsider, eager to learn and to be amazed. So, in this context, what I'd like to do is... I am hearing myself. Yes, correct, maybe you could switch off your microphone and switch it on again. I did that, yes. Oh, now it's fine. Can you hear me now? Yes, fine. Okay, sorry about that.

Okay, so here's a Harvard Business Review article from October 2012, and it says that data scientist is the sexiest job of the 21st century. A friend of mine, a colleague, a computer scientist and data scientist, once said he was at a conference and two of his students got hired while he was in the pool at the hotel. It really is a boom area for data scientists. But what is a data scientist? They deal with data. Data science, I think, is really defined by three pillars. One is data management: discovery, stewardship and curation of data. As data sets get large, and we've already talked about that, there's a whole area of how you manage that data and make it accessible, findable and interoperable. Data engineering is largely about delivering data to computation, and there's a software engineering aspect to that. What I'm going to focus on is the one at the bottom, often called data analytics, which is methods and tools for collecting and learning from data. And learning from data I call inference. That's really the meat and potatoes; that's where a lot of us are working. Data analytics is really a superset of things we've been doing for many years without it being recognized as such; geoscientists have been in this game for decades, and it has received a new lease of life with the advent of data science as a discipline. Geoscientists have been doing data management, especially in seismology and right across the geosciences, for many years, and particularly trying to do new things with data. It's a very creative area and, I think, an exciting area for the future.

Now, machine learning itself, here's a figure I stole from the internet, has many little bubbles hanging off it: supervised and unsupervised learning, reinforcement learning, classification and regression, which hark back to the sort of statistics that is very much part of this field. The definition of machine learning is a branch of artificial intelligence in which a computer progressively improves its performance on a specific task by learning from data without being explicitly programmed.
By and large that boils down to things like regression and classification, most commonly. Deep learning, which we're hearing a lot about now, is an extension of machine learning that uses the concept of neural networks to loosely simulate the information processing and adaptation patterns seen in biological nervous systems. So it's an analogy, a way of representing complex relationships in data. But is machine learning really a new field? I'm indebted to Andrew Valentine, who pointed this paper out to me. Let's read the first part: in the first phase of Project PREP, a multifactor classification technique, a so-called learning machine of the kind that had been studied as pattern recognition automata, was experimentally applied, in this case to the forecasting of solar flares. That comes from a report written for NASA in 1965. Perceptrons are really another word for the neurons in neural networks, and this is before human beings had been to the Moon. The basic concepts of perceptrons and learning machines were around in 1965, and there's at least one paper on them. So perhaps it's not such a new field.

But on to the modern era. This figure is from Valentine and Trampert (2012). I give you four data sets here. One of these is a seismogram. One of them is the temperature in Birmingham over 50 years, I think that's Birmingham, UK, rather than Birmingham, Alabama. One of them is white noise, and one is stock prices. I would guess that most of you could work out which is which, some by recognition and some by process of elimination. I would hope most seismologists would recognize the bottom one as a seismogram. Anyone who's seen white noise would probably recognize the second from bottom as white noise, the annual oscillation in the second from top as the temperature, and the top one, with its drifting up-and-down behaviour, as the stock prices. And indeed, that's because we can all do pattern recognition. We can classify these things because we've seen such things before, or we have a mental picture of them. Machine learning knows nothing about this, but tries to classify them by looking at many examples of the same thing.

Classification and regression are the two staple problems looked at in supervised learning. On the top is an example of telling the difference between apples and strawberries. Many examples have a similar colour and have greenery associated with them, but by seeing many examples and training a neural network, one can try and classify, for any new example, whether it is an apple or a strawberry. So, by seeing many labelled data in supervised learning, we try and estimate the same thing for unseen, unlabelled data. At the bottom is my simplest possible regression problem: distance against gravel size for a data set, putting a function through the data, here a simple straight line. The key idea is that if we show a machine enough examples of such things, it can try and learn them. That really comes down to solving regression problems, the sort statisticians have been doing for many years, but done in a probably more sophisticated and, in some sense, more black-box manner. The game is that we can then make predictions of outputs for future inputs, or generate new outputs in a similar style. We've all seen the many cats and dogs, and the celebrity photos generated from photos of other celebrities, on the internet.
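As a minimal sketch of the supervised regression idea just described, fitting a function to labelled examples and then predicting for unseen inputs, here is an illustrative Python fragment. The straight-line model and the synthetic data are assumptions chosen purely for illustration; they are not the distance-versus-gravel-size data set shown on the slide.

```python
import numpy as np

# Minimal sketch of supervised regression:
# fit a function to labelled (x, y) examples, then predict for unseen x.
rng = np.random.default_rng(0)
x_train = rng.uniform(0, 10, 50)                        # "labelled" inputs
y_train = 2.5 * x_train + 1.0 + rng.normal(0, 1.0, 50)  # noisy outputs

# least-squares fit of a straight line y = a*x + b
a, b = np.polyfit(x_train, y_train, deg=1)

x_new = np.array([2.0, 7.5])    # unseen inputs
y_pred = a * x_new + b          # predictions for new, unlabelled data
print(y_pred)
```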
But these are the basic ideas behind supervised learning. Now, how has this entered seismology? I'm indebted to a colleague who went through the literature for me here. The first applications in our own field were in the 1990s, and typically the problems were problems of data analysis: picking of arrival times, deconvolution, discrimination between earthquakes and artificial sources. Problems that need to be solved, but ones that we have other ways of doing. These were early demonstrations that it is possible to do these sorts of things with machine learning algorithms. In the early 1990s, getting on for 30 years ago now, the tools were less sophisticated than in more modern cases, where the same style of problems has been addressed with more modern machine learning algorithms. One of the differences between those two eras is that we have much bigger computers these days, so improved computational capacity has led to vastly increased sophistication of neural networks, particularly in seismic applications. Here are two figures from papers. On the left-hand side is a two-layer neural network operating on a binary pixel image, from 1993. On the right is a 2020 paper by Mousavi et al., where they have a 70-layer model using different types of layers, convolutional layers and other types, all mixed in. What we're getting into here is a sort of heuristic decision about how you put all these different layers together and how you build a neural network. There's a lot of understanding and expert knowledge that goes into these things, which is possibly a weakness, in that the next person who comes along and tries to reproduce it may not have the same level of expert knowledge. But you can see the vastly increased sophistication from the early applications on the left to the more recent applications on the right.

Okay, now I will move on. A more exciting application would be to detect new signals in seismic data, rather than just doing jobs that I could do in other ways, like picking arrival times or deconvolution. So we're extending this idea of finding signals in data to finding new signals that we didn't know were there. Here are exciting papers by Rouet-Leduc et al. (2019) and Hulbert et al. (2020) from the same group, looking at the Cascadia subduction zone, and particularly looking at signals in seismic noise that correlate with GPS signals: time-dependent, repetitive signals showing the build-up of seismic energy and suggesting months-long nucleation of slow slip in Cascadia. This is now moving into what I think will be an exciting growth area over the next 10 years, where we start using these tools to suggest and detect signals that can then be focused on and scrutinized in other ways, and find signals in data that we had not discovered before. That is very exciting, because I always think there's nothing more exciting than finding new signals in data. This example is about correlations between seismic noise features and GPS signals, but there could be many others. Now, again, because it's based on machine learning, it's a bit of a tortuous path: if my machine learning algorithm is better than yours, maybe I see the signal and you don't.
So the details of how we do it are very important and really need to be developed in this field, but we can begin to see exciting new applications here.

Okay, a related area, an obvious one that will grow, is more on the regression side. It goes under the name of surrogate modelling, surrogate meaning a substitute or a placeholder, something which mimics something else. Now, you could argue that most of mathematical physics is a surrogate: it's an approximation to the real world, typically using continuum descriptions, differential equations and classical physics to describe phenomena, and that itself is a surrogate. This is a step removed from that, trying to mimic behaviours by showing training examples, again where there is no physics. We've all been sold things because of what we've looked at on the internet. People train on commercial data and mimic behaviours with no physical understanding, so they can predict your behaviour and recommend things to sell you, using large data sets of past actions. Some of the purposes these can be put to are not ones we would necessarily want: maybe I don't want people to sell me new types of pizza, or maybe I do want someone to recommend a pizza to me because I don't know which one to choose. But in the physical sciences, as below here, we have physics; we have ways of solving the problem. So what would be the benefit of a surrogate model here? Well, it may be that you could use a surrogate model to solve these problems approximately, but very fast and computationally efficiently. There's a trade-off between the benefit of doing things quickly and the approximations involved, and in quantifying the accuracy.

I will try and click on this great video here, from the reference below. This is a neural network surrogate model, coupled with some excellent graphics, modelling the Navier-Stokes equations and flow around an object. So we have flow around a rabbit, you can see the flow lines, and that's actually quite a complex nonlinear equation, and the result of the neural network is compared with an actual finite element calculation solving the same equations. To first order they are very similar, capturing many of the properties of the Navier-Stokes flow. I'll just let that play out here. When I first saw this video, I was quite amazed that a neural network approach had got to that level of sophistication. One of the reasons they can do that is because they're not just looking at blind examples; they're beginning to include the physical laws in building surrogate models. So there's a sort of hybrid between simply looking at examples of initial conditions and final outputs, or examples of solutions, to try and mimic them, and combining that with physical laws.

So here's an example from Matthias Scheiter, a PhD student of Andrew's and mine. There's a simple ODE at the top left, a differential equation, an easily solved one whose analytical solution you have there. The way one might do this, and this is a very simple example, is to train a neural network to find the function f(x) which minimizes both the misfit to the differential equation and the misfit to the boundary condition.
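As a minimal sketch of that combined objective, here is an illustrative physics-informed training loop in Python. The toy ODE (df/dx = cos x with f(0) = 0), the network size and the optimiser settings are assumptions for illustration, not the example on the slide; PyTorch is used purely for the automatic differentiation.

```python
import math
import torch

# Sketch of a physics-informed network for an assumed toy ODE:
#   df/dx = cos(x),  f(0) = 0   (analytic solution f(x) = sin(x)).
net = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 1),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(5000):
    x = 2 * math.pi * torch.rand(128, 1)   # random collocation points
    x.requires_grad_(True)
    f = net(x)
    dfdx = torch.autograd.grad(f, x, torch.ones_like(f), create_graph=True)[0]
    ode_loss = ((dfdx - torch.cos(x)) ** 2).mean()   # misfit to the ODE
    bc_loss = net(torch.zeros(1, 1)).pow(2).mean()   # misfit to f(0) = 0
    loss = ode_loss + bc_loss                        # combined objective
    opt.zero_grad()
    loss.backward()
    opt.step()
```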
That combined objective is the term at the bottom left here, and used in this way, the example on the right is only approximate. The star on the horizontal axis marks the boundary condition, and where the boundary condition is met it's very accurate; because the network is informed by the physics, it can use that in mimicking the solution. We see the analytical solution as a solid line and the approximation as a dashed line. Within the neighbourhood of the correct boundary condition it's reasonable, but it also provides a first-order solution elsewhere. This is a very simple example.

Here's a much more complicated example, from a paper by Moseley et al. (2020); there are a number of references above it to similar work, trying to combine physical laws and neural networks, and I think this is quite exciting. This is looking at a wave propagating, north-south and up-down, as a function of time across the page. In the top set of panels you see the ground truth, which is the correct answer, solved with a wave equation solver: different time snapshots, and you see the waves moving and reflecting. I don't actually have an animation of this. The middle panel is the prediction using a neural network which is not informed by the physics and can only reproduce what it sees. It can't extrapolate particularly well, because it can only do what it has seen, so it needs to see everything to be able to mimic it. But networks that have the physics involved in their optimization phase, as I described earlier, ones which combine the physics, are in the bottom panel. The same data go into the second and the third, but the third one captures all of the reflections you can see there and is much closer to the truth, because it's using not just example solutions of ODEs or PDEs, but the solutions together with knowledge of the equations being solved. We call these physics-informed surrogate models, and that's really where much of this field is going. It seems like a good compromise, somewhere between solving the equations the hard way and mimicking them from simple examples alone. The bottom then becomes a surrogate model of the top, which avoids having to solve the differential equations explicitly and is typically much more efficient to use. And in doing that, of course, we could use them in inversion, because we might need to solve differential equations as our forward model. That's the underlying idea here.

Now, more generally, a class of machine learning I think is interesting is called generative models. It's a growing trend in machine learning to use these ideas. The four dot points there are all examples in this class: variational autoencoders, generative adversarial networks (GANs), diffusion models and flow-based models. This is a trend of the last five years which is now appearing in our own field. Such things are responsible for deep fakes, which you might have seen on the internet, but a typical use is training a neural network to mimic the features in training data and then generate new outputs, such as images in the same style, or models in the same style. There are already applications of these ideas, again to earthquake and data-based problems like arrival picking, earthquake and noise discrimination, and seismic data interpolation, augmentation and reconstruction. So the initial applications of any new idea seem to be to the same set of problems, and then they get expanded.
But there are a number of recent papers, which I think are very interesting, applying these things to direct inversion applications. They really fall into two classes. The first is dimensionality reduction, by Laloy, Mosser and Lopez-Alvis and colleagues, trying to reduce the size of the problem; I have a slide on that in a moment. The aim is essentially to make inverse problems work in a smaller-dimensional space. We might argue that this is a benefit because the problem is simpler to solve, or it may actually be harder to solve: we have fewer unknowns if we use dimensionality reduction, but there's nothing to say the problem will be simpler with a smaller number of unknowns. I think it's really an open question whether the benefit of dimensionality reduction is going to pay off in geophysical inversion, but it's an exciting idea. The other class would be model space sampling, which is closer to Bayesian inference. There are several papers there, all published this year, one including my friend Felix Herrmann and another by another friend of mine, Andrew Curtis, and his group, trying to use generative neural networks to sample model spaces and address the Bayesian class of problems.

Now, I'm going to present some work along these lines. Here's my latent variable inversion slide. What we mean by this is that we take high-dimensional models, as in the top left here, and use, let's call it an autoencoder in this case, which takes those models, passes them through fewer and fewer layers and then back out into larger layers. The idea is that, even after going through fewer layers, you reproduce the model you put in. We go around in a circle, but if you can train a network to reproduce the models that go in, then where they pass through this little gate they have been compressed into a latent variable space. That's an interesting way of finding the small number of unknowns that represent the large-scale structures of the typical models you want to invert for. Of course that depends on the quality of the examples the network sees, but you can control the compression. The idea would then be to do inversion using the second half of the network, where the unknowns are the two green dots in the middle: you decompress through the neural network to produce the model, solve for the few unknowns in the middle, and fit the data using the predictions of the larger model on the right-hand side. But as I said, there's a price to be paid, in that it's not really clear whether the inverse problem will be easier or more difficult in the condensed space. I would say it could potentially be more difficult, because as you scrunch the problem up, the price you pay is that the optimization problem may become more complex. There's some evidence for this in the paper by Laloy et al. (2019). So this may be a way for the future, or we may actually be making problems smaller and harder. We don't know.
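A minimal sketch of the latent-variable inversion idea just described is below, assuming a decoder network has already been trained as the second half of an autoencoder. The decoder, the linear forward operator and all dimensions are placeholders for illustration; they are not the networks or problems from the papers mentioned above.

```python
import torch

latent_dim, model_dim, n_data = 2, 1000, 50

decoder = torch.nn.Sequential(        # stand-in for a trained decoder
    torch.nn.Linear(latent_dim, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, model_dim),
)
for p in decoder.parameters():
    p.requires_grad_(False)           # keep the (assumed trained) decoder fixed

G = torch.randn(n_data, model_dim)    # stand-in linear forward operator
d_obs = torch.randn(n_data)           # stand-in observed data

z = torch.zeros(latent_dim, requires_grad=True)   # the few latent unknowns
opt = torch.optim.Adam([z], lr=1e-2)

for step in range(500):
    m = decoder(z)                    # decompress latent variables into a full model
    d_pred = G @ m                    # predict data from the decoded model
    misfit = ((d_pred - d_obs) ** 2).sum()
    opt.zero_grad()
    misfit.backward()
    opt.step()
```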
Now, I wanted to end this part of my talk, before I get into the next part, with an example from Matthias Scheiter's work, a paper in preparation. The idea is to use a generative model to represent the output of a large inversion, a large Bayesian transdimensional inversion; the ensemble comes from the work of colleagues at ANU. There's a large Bayesian model of the core-mantle boundary region here: at the top left you see the mean model, produced by a large number of hours on a supercomputer doing transdimensional inversion, looking for ensembles of models that fit the data, shear wave speeds of shear waves sampling the core-mantle boundary. They've produced a very large ensemble, and then the question is: what do we do with this ensemble? It's enormous. Other people would like to interrogate it, but typically all we distribute is the mean and the standard deviation, on the right. Can you do more? What Matthias has shown is that you can: you can use a generative model, with the ensemble output of an inversion as training data, to represent the same structures and classes of structures in a neural network. In doing so, the size, in terms of digital volume, is reduced by somewhere between 95 and 99%. More importantly, it also allows us to reproduce the ensemble, so you can use these things to generate more models from the ensemble. One could then make this much more open science: it may be difficult to distribute the entire ensemble, but third parties can take the trained neural network and generate their own models, or any properties of them, and to all intents and purposes they are very similar, with the same statistical distribution.

Now I'm going to try and convince you of that. The top line shows the results from the MCMC ensemble and the bottom line the reproductions from the GAN. Going across the top we have the mean model, under Australia, my favourite place, the standard deviation of the ensemble, the skewness and the kurtosis, so these are higher-order moments, and then the covariance function in that geographical region. To first order I think the reproductions look very accurate: we recover the mean and the standard deviation, but also the non-Gaussian parts, the skewness and the kurtosis, and reasonably well the covariance function. It has not previously really been possible to distribute such inversion results or make them accessible, but we believe that if you use a GAN in this way, a WGAN as it's called, it's able to capture the higher-order moments, not just the mean and standard deviation. It's small in volume, so it can be distributed and used for related purposes, and it can then generate an almost unlimited number of samples pretty much for free. So it's a way of mimicking the output, a surrogate not for the inversion but for the output of an inversion.

Okay. So, as I come to the final part of my second lecture, I wanted to touch on a different topic, which is optimal transport. I just want to give you a bit of a taste of this idea; I will get to references in a minute, but let's look at this question of how you fit data. On the top here I have an observed waveform and a predicted waveform, and I ask the question: how close is one to the other? That's a central question in all inverse problems, how do you measure fit to data? In the bottom I'm going to show a movie where one waveform, the blue, moves across the orange, and what is plotted below is the least-squares misfit, the sum of the squares of the differences between the two waveforms.
This is the classic and most common way. You can see that when the waveforms line up, these are double Ricker wavelets, the misfit drops into a minimum, as you see around about offset zero when they're perfectly aligned, but there are many local minima elsewhere as different parts of the two waveforms line up. That's a classic example of cycle skipping, and this type of problem has been known for many years. This example is a slightly modified version of one by Engquist and Froese. Fitting data by simply looking at differences of waveforms, which is by far the most common approach, is fraught with these types of cycle-skipping problems.

Now, this is where I'll take a departure, and the topic of optimal transport comes in. Optimal transport dates back essentially to the time of Napoleon and his scientist Gaspard Monge. Imagine you had a pile of dirt, as on the left-hand side, the orange pile, some arbitrary pile of dirt, and on the right-hand side, in blue, some holes, some complex-shaped holes, where the volume of the holes is equal to the volume of the dirt. I ask the question: how do I, with the least amount of work, in this case physical work, move the pile of sand into the holes? Generalized, that's the problem first studied by Gaspard Monge, or at least he is one of the first people who wrote about it, way back in the 18th century. The classic way of posing the problem is to write it as an integral. If I define a transport map T(x), so that the parcel of sand at position x moves to position y = T(x), then I'm looking for a way of mapping all my x's into my y's, my T(x), which I'll call the transport map, such that the work I do is minimal. By work I mean the integral, over all parcels, of the distance each parcel travels times the mass it carries, and that's represented by the integral below. To generalize this slightly, we can define the work in different ways. I'll define it through a norm again, remember my p of one or two: the distance itself or the square of the distance. Notice that if I take the square of the distance and multiply it by mass, that's proportional to energy. So with the particular choice of the squared distance between x and y, I get something proportional to energy, which is the work, but it's a general formalism where I could choose another power. This builds on the work of Kantorovich in 1942, and particularly the work of Villani in 2003 and 2008. Villani won the Fields Medal for mathematics on this problem, and Kantorovich won the Nobel Prize for Economics in 1975 for essentially reformulating the same problem. Here I'll briefly go through how we can consider this problem as finding a mapping from one distribution, which I'm calling f(x), to another, g(y). You'll see that as I map from one to the other, I actually get a measure of the distance between the two distributions, which is, in some sense, the work in moving one to the other. I'll call that W_p, and that's what's known as the Wasserstein distance.
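For reference, here is a sketch of the Monge formulation and the Wasserstein distance in standard notation; the exact symbols on the slide are not in the transcript, so this is written from the verbal description above.

```latex
% Monge's problem: find the transport map T sending the density f onto g
% that minimises the total work (distance to the power p, weighted by mass):
\[
  \min_{T \,:\, T_{\#} f = g} \;\int \lVert x - T(x) \rVert^{p} \, f(x)\, \mathrm{d}x ,
\]
% and the corresponding p-Wasserstein distance between f and g:
\[
  W_{p}(f, g) \;=\; \left( \min_{T \,:\, T_{\#} f = g} \int \lVert x - T(x) \rVert^{p} \, f(x)\, \mathrm{d}x \right)^{1/p} .
\]
```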
So, for a choice of p, as I mentioned, it's the distance to the power p times the mass, which is the area under the curve, and I can form the 1-Wasserstein distance or the 2-Wasserstein distance. I can seek the transformation of one function, the top f(x), into g(y), which maps one exactly onto the other and gives minimal work in some sense. Solving these problems was really the breakthrough of Kantorovich; he managed to solve it, and it turns out to be just a linear programming problem, which is very familiar. Malcolm, we don't hear you. Sorry. Now it's fine.

So I'll try and play my movie here. The top will take a simple linear average of the two and move one into the other; by linear average I mean the amplitude of one fading into the other. This is the path of transformation between the two end members if we're using a difference-in-amplitudes measure, which is the classic way we fit seismograms these days. In the bottom one, I'll show you the path formed by optimal transport for the same problem. As you can see, the optimal path devised by this style of idea, and it's not that new, it has a long history at least, is a path where it essentially knows where the other one is: it moves both in amplitude and in time. This different style of path is in some sense more sympathetic to the structures; it's not simply looking at a difference in amplitude. And the way we transfer one object into another gives us a distance between them.

Here's my little example using what we call sliced Wasserstein. The dots on the left are a torus, I've coloured them in, and on the right is a kookaburra, which is a famous Australian bird, and we map one to the other using the same Wasserstein approach. You can see which dots move to where: the green ones move coherently, coherent on the left and coherent on the right. You can transform one object into another. From our perspective the transport is interesting, but so is the measure of distance it gives us: a way of measuring how far the torus is from the kookaburra. Here I'm showing the same idea of transporting shapes in 3D, from a red cow to a blue duck to a green torus. Anyway, fun and games. Now, to give you a feel, I'm going to transport a famous symbol, the one on the back of my computer, although here it's green: an apple, into an orange. On the left-hand side is a simple weighted average between the two, and you can see, as with the Gaussian example, that one amplitude fades into the other; that's the linear map. Optimal transport does something different, and I'll play it now: the green will move to the target and you'll see the different character. Sorry, there's the linear one again, just to show you that it really is just one fading into the other, a linear sum. But if we use optimal transport, it's consistent with the colour, the amplitude and the geometric position, and it moves from one to the other. So as we use these more sophisticated ways of transforming one object into another, we also get a distance. Just as the least-squares measure corresponds to the linear case, optimal transport gives a new distance, and if we've got a new distance we can use it to fit data and therefore use it in inversion. That's where we're going with this idea. It was introduced into the exploration industry back in 2014 by Engquist and co-workers.
There are many important papers there, and also by Ludovic Métivier and his group, a whole series of papers. There are different ways of doing this, and open questions about how you solve these problems and, not least, how you transform a seismic trace into a density function. The thing I didn't mention is that this idea only works when the functions are positive, so you have to transform your problem somehow into something positive, and how you do that is one of the open areas of this problem. Some of the work by Métivier is very innovative work on how you might do that. More recent papers on gravity inversion and on receiver functions use this as a misfit function for inversion, or propose doing so.

Okay, here I've shown a simple example, very similar to the Gaussian example. On the top is the linear, least-squares-style path again, a simple weighted average of how you move the blue to the black. On the bottom is the optimal transport, and you can see it moves both in amplitude, converting the blue to the black curve, and in time. So it has actually moved in two dimensions rather than one, and in doing so it provides a smooth mapping from one function to the other to solve for, or to optimize, in an inversion. And here again is a simple experiment, a variant of the earlier one, where I'm playing that same thing: the middle is the least-squares misfit function, the bottom is the Wasserstein distance between the two, and you can see it converts the multi-minimum function in the middle into a smooth quadratic function at the bottom, which is very nice and easy to invert. We're using our own version of the Wasserstein misfit here, the details of which are in a paper under review, so I won't go into them.

Here's a simple example I'm going to finish with of how it could work in, again, a simple three-parameter problem. I have a double Ricker wavelet, and a three-parameter problem where I'm simply going to move the blue by shifting its origin time, changing its amplitude and changing its frequency. On the left-hand side, I'm going to try and fit the orange with the blue by minimizing either the Wasserstein distance or the L2 norm between the two. Rather than do that for you, I'm actually going to show you the misfit functions themselves. On the top left and top right are the Wasserstein distances: the Wasserstein-1 measure on the left and the Wasserstein-2 measure on the right. You can see these are simple functions with unique global minima: on the left it's like a folded piece of paper, and on the right it's actually a quadratic surface with a beautiful minimum to solve for. The corresponding least-squares misfit, which is the standard comparison here, has multiple folds, as you can see, and becomes a more difficult optimization problem. So we convert multiple minima into a single minimum in this simple problem, and that is actually exactly the same problem as before.
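A minimal numerical sketch of that contrast between a least-squares misfit and a Wasserstein misfit, as one wavelet is shifted past another, is given below. The single Ricker wavelet, the squaring-and-normalising positivity transform and scipy's one-dimensional Wasserstein-1 routine are illustrative assumptions; they are not the double Ricker wavelets or the Wasserstein variant used in the paper under review.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def ricker(t, f0=1.0):
    a = (np.pi * f0 * t) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)

t = np.linspace(-10, 10, 2001)
obs = ricker(t)                          # "observed" waveform

shifts = np.linspace(-5, 5, 201)
l2_misfit, w1_misfit = [], []
for s in shifts:
    pred = ricker(t - s)                 # "predicted" waveform, time-shifted
    l2_misfit.append(np.sum((pred - obs) ** 2))
    # make both traces positive and unit-mass before transporting them
    p = pred ** 2; p /= p.sum()
    q = obs ** 2;  q /= q.sum()
    w1_misfit.append(wasserstein_distance(t, t, u_weights=p, v_weights=q))

# l2_misfit oscillates with cycle-skipping side lobes as a function of shift;
# w1_misfit grows steadily with |shift| and has a single minimum at zero.
```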
And I'll just finish off, as I come close to the end of my time, by showing you this, again from the paper under review. It's a misfit function for an earthquake, where we're using seismograms, essentially high-frequency GPS waveforms including the static offset; essentially it's seismogram fitting for earthquake location, shown in three geographical slices. The top two are for exactly the same problem, but looking at the misfit function defined by differences in waveforms, the standard L2 measure. You can see that the light colours, the pink colours, are low misfit, and the middle is where the solution is; the global minimum is close to the true solution, because it's a noisy seismogram. But you see multiple minima in the top left and a few more in the top right; these are two different slices through the misfit function. In the bottom you see the marginal Wasserstein misfit, which is a variant of the previous approach and is in this paper here. The real message is that it looks much more quadratic-like and simple to optimize, and if you do actual tests, optimizing the top, finding the earthquake from many starting points, many more converge with the bottom than with the top, simply because it is a simpler misfit function. So we're building up some evidence that the Wasserstein idea can be extended to earthquake location and also to moment tensor inversion.

Now, as I come to the end of my time, my final slide really sums up most of the things I've talked about. There are many new developments in the inversion of geophysical data, and essentially all the ideas I've talked about are translated from other areas of the sciences: sparsity has come from the signal processing and computational maths communities, machine learning from the computer science community, and optimal transport really from the pure maths community. That's what we do in the geosciences: we learn from others, we adapt, and we make use of ideas. I think that's really an exciting way of working for inversion, bringing these different ideas together from other fields into the geosciences. So we can expect new types of signals to be found in geophysical data, particularly from machine learning, and we can expect new ways of performing inversion, because we'll have new types of data. The principles of inversion haven't changed: largely we're interested in estimating models, or classifying, or doing Bayesian sampling. I think it's an exciting time for new mathematical and computational tools, and I urge all the students out there to experiment with these things and try something new, because they will be required. The future demands a multi-skill set, because the areas we're drawing ideas from come from different fields, and we need people who understand them. So we'll learn new things by doing things in new ways and asking new types of questions. Okay, I'll leave it there. Thank you.

Malcolm, thank you very much. This was a wonderful lecture, and I think all participants will understand why we placed your lecture as the first lecture. Yes, these things sometimes occur.