We're in the fourth and final panel here, which is Lessons Learned and Next Steps to Advance the Science. And we've asked Greg Beroza to take on the challenging task of summarizing this meeting in the context of what's happening overall in the field. Yeah. And that will lead to a stimulating discussion, hopefully, at the end of that. And at 4:30 we need to end sharply because Greg has to depart. OK, thanks, Bill. Yeah, so I've been dynamically modifying my talk to conform to the topics that have come up that I'd like to revisit. And I have to apologize in advance that I'm going to revisit them entirely with work from my own research group. But you can do the mapping easily enough to your research group or to things that are not seismology. So how do I navigate this thing? No. Thanks. Oh, you click. OK. So one thing I'd like to emphasize is that we're going to have to deal in the very near future with very large data volumes. This is from Kong et al.'s paper from a year ago showing that there's something like half a petabyte of data in the IRIS archives. But the data volumes are going to grow very rapidly. A single DAS experiment, for example, a fiber-optic experiment running for several months, just a single fiber, can generate as much data again. And we have to ask the question, how are we going to process this data? We certainly need scalable algorithms that can work with and extract information from large data sets. Can we move this much data around? Does it make sense to try? These are challenges that we're going to have to face basically right now. Just in the last talk, the notion of legacy data came up. Computer scientists sometimes call data that's hard to use, or that isn't used because it's inaccessible, dark data. Seismology has a lot of it. Instrumental seismology started in the late 19th century, so most of our data, measured by time span, is analog; digital didn't become standard until, say, the 1990s. So we have a lot of old, important data that's perishable, and we have to figure out how to deal with it. And I gather that this committee convened a meeting, a workshop, a month or two ago that I couldn't go to. But one of our students has been working on using not the vector seismograms, but images of seismograms. This is film data that has been scanned. And she has figured out how to use that as the data and extract, for example, arrival time information. It avoids the need to do the difficult vectorization; you can imagine trying to follow that trace, and it's not easy to do. And you can extract the same sort of information that we use for precision seismology. And this is looking back now 50 years at the Rangely earthquake control experiment. It's an experiment we can't do right now for various reasons having to do with liability, but it's a very important experiment in the history of earthquake science. And we can go back to it with modern techniques and get a clearer look. We're detecting more earthquakes, and it's going to get much better than this. This project is kind of a weird pastiche of analog data and machine learning, which is what we're using for this. So there are some real opportunities in applying machine learning to legacy data sets. One of the issues that's come up is how to broaden the impact of AI, machine learning, or data science throughout the geosciences. I'm a seismologist and I study earthquakes.
So my approach to trying to do this is to get other people in sort of the earthquake-science near field, looking at lidar data or GNSS data or InSAR or simulation output, interested by seeing how much of a difference it makes in seismology. But there are no doubt other approaches to broadening the reach of data science. So one of the issues that came up was how to choose the right approach. We have a lot of experience with this, and we're slow enough at doing our science that the right approach is kind of a moving target. So you have to do what you can to stay abreast of the field, and that tends to motivate one to be closely connected to real data scientists who do this sort of thing full time. But just as examples: we use a data mining algorithm that uses locality-sensitive hashing. Clara Yoon, who was on the call at least earlier, that was the core of her thesis, and aspects of Karianne's thesis as well. That's an unsupervised method; it exploits the similarity in unlabeled data sets. So it's a data mining technique. For other applications, like earthquake detection, denoising, and phase picking, we use labeled data, so those are supervised methods. And for some classification, reducing dimensionality using deep autoencoders, as we saw earlier today, can be quite helpful. So Zach talked about this deep denoising algorithm. It's a very deep neural network that learns to separate signal from noise by learning the signatures of both the signals and the noise. We work in the frequency domain. Small local earthquakes are initially broadband, and they decay in a frequency-dependent way with time. This deep denoiser learns the characteristics of the signal and the noise and applies a simple mask to try to separate the two. And for data that's like the data we've trained it with, it works quite well. So this just shows an example. We have the clean signal at the top. The second panel shows noise; these are recorded independently. We do a superposition, so in this case we know ground truth, we know the signal we want to get back, and we see how well we recover it. And here we recover it pretty well. The inset panel shows in detail the waveform at the initial part, which is what we usually use to measure the arrival time. And the recovered noise is at the bottom. If you look closely, you can see that the noise sort of necks down to lower amplitude; there's clearly some crosstalk between the two. But this is a much better method than simple filtering or something like that, which is a very crude instrument: it doesn't use the time-dependent and frequency-dependent character of these signals to do the denoising. And this has a lot of potential applications, because all of our data are noisy, and it's probably most helpful for the data that are noisiest. So urban seismic monitoring, seafloor seismic monitoring, near volcanoes, those are all applications where I think this has a lot of potential. And sometimes we want to get rid of the earthquake signals, so we can do de-signaling if we want to do ambient-field correlations, for example. Okay, Zach showed this histogram of the neural-net-based P and S picks versus the sort of standard approach. And you can see that it doesn't do that much better for the P wave, but it does dramatically better with the S wave. But the one thing I wanted to emphasize here is that this is compared with reviewed picks, which are somehow taken as correct.
Well, they're not necessarily correct. And this brings up the question, which has come up in a couple of the talks, of how we navigate ground truth. Ground truth is kind of squishy if we don't know what the answer is supposed to be. So just to illustrate that with some examples, here's an example of a bad P pick in the catalog. This is the PhaseNet output showing the peak of probability, and hence the P-wave pick, right around here, but in the catalog it's removed from the waveform; and here's a really bad S pick. These sorts of errors are in there, and they contribute, unfairly I would say, to the dispersion in these measures of performance. But we don't really know what the ground truth is. So how might we get there? One thing we could do is test the self-consistency of the data. So here's a small earthquake cluster from an earthquake sequence in Italy, so very much independent data. We make 50,000 picks with PhaseNet; with the sort of standard approach we make about half as many. And PhaseNet has larger residuals, but because it can pick so many S waves, we get what we think are better results. So this is a rotated view of the relocations; these are hypoDD relocations. The standard picks are on the left, the PhaseNet picks are on the right. Our prejudice or bias is that earthquakes happen due to slip on faults, and so the fact that the one on the right looks more piecewise planar satisfies our intuition. Maybe it's not proof, but it suggests that we're actually getting better picks; we're closer to ground truth. So that's one way we can get at it. Another way is if we have independent information. So I'm going to show a few slides from the Guy-Greenbrier sequence. This happened in central Arkansas, shown here in the inset. The gray are the eventual earthquakes in this sequence as they appear in the USGS ComCat catalog. Here we see 75 earthquakes in the summer of 2010. They're color coded by depth; they're the circles, 75 of them. So we use this data mining method, FAST, that does similarity search. Nearby earthquakes have similar signals. And without assuming what the form of those signals is, we can fingerprint, or come up with a compact representation of, all those signals that's diagnostic, search that efficiently using a set-based similarity measure, and come up with lots of small repeating signals, almost all of which are earthquakes. So we go from 75 earthquakes to 14,000 earthquakes. Just like Zach was talking about, we reduce the detection threshold substantially; you get many more detections, and that illuminates new processes. So I'm going to focus in on this cluster here, what we call cluster one. And that was not what we were looking for. We were expecting these earthquakes to be due to deep injection, but in fact these were due to hydraulic fracturing. So what's shown here are two production wells: they go down and then they go laterally like this, and they were stimulated hydraulically in individual stages that went from the toe of the lateral back to the heel, back towards the well. The rectangles indicate when the stages were active, and the circles indicate the earthquakes. And you can see that the earthquakes follow the hydraulic stimulation, the individual stages; they follow them around. So this suggests strongly that we're seeing real processes and that our detections and locations are reliable. So it's a form of ground truth, if you like. And it keeps going and going.
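To make the set-based similarity idea concrete, here is a minimal sketch, not the actual FAST implementation: waveforms are reduced to binary fingerprints, pairs are scored with Jaccard similarity, and a crude locality-sensitive hash groups candidate matches so that not every pair has to be compared. The fingerprinting and hashing choices below are illustrative assumptions only.

```python
# Minimal sketch of set-based similarity search (not the actual FAST code):
# binary fingerprints, Jaccard similarity, and a crude locality-sensitive
# hash that buckets similar fingerprints to avoid all-pairs comparison.
import numpy as np
from collections import defaultdict

def jaccard(a, b):
    """Jaccard similarity between two binary fingerprints."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

def lsh_buckets(fingerprints, n_bands=8, bits_per_band=16, seed=0):
    """Group fingerprints by hashing random subsets of bits (crude LSH)."""
    rng = np.random.default_rng(seed)
    n_bits = fingerprints.shape[1]
    buckets = defaultdict(list)
    for band in range(n_bands):
        bits = rng.choice(n_bits, size=bits_per_band, replace=False)
        for i, fp in enumerate(fingerprints):
            key = (band, fp[bits].tobytes())   # identical bit patterns collide
            buckets[key].append(i)
    return buckets

# Usage: random binary fingerprints stand in for spectrogram-derived ones.
rng = np.random.default_rng(1)
fps = rng.integers(0, 2, size=(100, 256)).astype(bool)
candidates = {tuple(sorted(v)) for v in lsh_buckets(fps).values() if len(v) > 1}
print(len(candidates), "candidate groups; example similarity:", jaccard(fps[0], fps[1]))
```

In FAST itself, as I understand it, the fingerprints come from spectrogram-derived features and the hashing uses min-hash signatures, but the overall flow is the same: hash to collect candidate pairs, then verify candidates with the similarity measure.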
Is each of those stimulations the same size? Pardon me? Is each stage of the stimulation the same size? I doubt it, but I don't know how they decide how long to, yeah. Was there structure in the earthquakes that were generated? There was structure in the earthquakes that were generated. So the question is, is that structure because of the environment, is it the rock structure, or is it due to the stimulation? That's what I was trying to get at. Yeah, so it's a good question. I don't know the answer, but I will say that there are some things in here that are not desirable. So when they're stimulating over here, they're triggering earthquakes over here. That means there's connectivity between those two regions, and they usually try to avoid that; they kind of manage that. So if they were monitoring in real time, they might adjust the volumes or the pressures to avoid that happening. But I don't know the answer to your question. Okay, another question that came up several times was the notion of generalization. So we're going to look at the same Guy-Greenbrier sequence, but rather than using template matching, where you know what your signal is a priori and you look for it in continuous data, or this uninformed data mining method where you just look for repeating signals, now we're going to try to generalize to something that's more permissive: just finding signals with similar characteristics to previously cataloged earthquakes. And we use machine learning for that. Okay, so this is a machine-learning-based catalog using this PhaseNet neural network. It's not a particularly deep neural network, but it's deep enough, and it's trained to pick P and S waves, outputting the probability of the P-wave and S-wave arrival times. The data that we use to train that network are all from Northern California. It's mostly short-period sensors; the earthquakes range in distance up to about 100 or 150 kilometers, they're mostly five kilometers or deeper, and they're mostly small but not really tiny, like magnitude one to three. So that's what it's trained on. We apply it to induced seismicity here in Guy-Greenbrier, Arkansas. The earthquakes are smaller, mostly magnitude one or smaller, they're shallower, the geologic structure is different, and it works quite well. So this supports Zach's assertion this morning that these phase pickers do quite well at picking arrival times even when you use them in different areas. By the way, we didn't retrain it when we applied it to Italy either. So we find something like 90,000 events. Most of them are shown in gray here; the ones that are associated with hydraulic stimulation are shown in color. And we see evidence for both of those processes being important for triggering earthquakes. So this is a comparison of our results, on the right, from Park et al.; he's a third-year graduate student. On the left is Steve Horton, a seismologist who looked at this sequence and plotted the larger events. He had a little under 1,300 events; we've got about 90,000. And he had a grad student, Paul Aguirre, who worked through most of the sequence and found some of the same structures that we're finding, but he didn't go all the way through the sequence, and so he didn't get as full a view as we get. He had something like 17,000 earthquakes. So a monumental amount of work. He was picking those phases by hand, but with machine learning we can do that automatically; we really didn't even have to look at the data.
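To make that generalization workflow concrete, here is a minimal sketch of running a pretrained PhaseNet-style picker over continuous three-component data from a new region without retraining. The `pretrained_model` callable, window length, and thresholds are illustrative assumptions, not the actual PhaseNet code; a real workflow would load published weights instead.

```python
# Minimal sketch of applying a pretrained PhaseNet-style picker to continuous
# data from a new region without retraining. `pretrained_model` is a
# hypothetical callable returning per-sample P/S/noise probabilities.
import numpy as np
from scipy.signal import find_peaks

def pick_phases(continuous, pretrained_model, fs=100.0, win=3000, step=1500,
                threshold=0.5):
    """Slide a window over 3-component data and collect P/S picks (seconds)."""
    picks = {"P": [], "S": []}
    for start in range(0, continuous.shape[1] - win, step):
        window = continuous[:, start:start + win]
        # Normalize each component; pickers are typically amplitude-agnostic.
        window = window / (np.std(window, axis=1, keepdims=True) + 1e-12)
        probs = pretrained_model(window)            # shape (3, win): P, S, noise
        for row, phase in zip(probs[:2], ("P", "S")):
            peaks, _ = find_peaks(row, height=threshold, distance=int(fs))
            picks[phase].extend((start + p) / fs for p in peaks)
    return picks

# Usage with a dummy model that outputs random probabilities, just to run the loop.
dummy = lambda w: np.random.default_rng(0).random((3, w.shape[1]))
data = np.random.default_rng(1).normal(size=(3, 36000))   # 6 minutes at 100 Hz
print(len(pick_phases(data, dummy)["P"]), "P picks")
```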
Okay, and the interesting thing is that it reveals some processes that were not previously appreciated. So these are three sequential episodes in time, going from left to right, from July 2010 at the top to October 2011 in the lower right. The sequence started in the summer of 2010 and got quite active again in early 2011. And there was a magnitude 4.6 earthquake, which led them to shut the deep disposal wells, and the sequence gradually died out. But you can see within there all those arcuate structures. Those are basically aftershock sequences that look like they're showing some kind of diffusion away from where they occur. Whether it's fluid or stress that's diffusing, it's illuminating a process that was otherwise invisible before we had all these earthquakes. And we can look at this in cross-section. So this is a cross-section that goes from this end to that end, sort of north to south. And here are the initial earthquakes, shown until, I think, February, the initial part of that cluster; they're near well one and well five. And then this is something that was not appreciated before: the ones in February nucleated sort of independently, so the sequence jumped ahead of itself, or nucleated independently and then ruptured back towards where it had been rupturing previously. That perhaps implicates well two, and there's actually a cross-fault, the Enders fault, that goes from this injection well towards where that earthquake sequence eventually reinitiated in the southern part. So this is something new that we didn't know about beforehand, and it was illuminated by all these locations. Okay, another important thing that came up, although only briefly, was how we quantify uncertainty. For PhaseNet output, where we're classifying probabilistically whether something is a P wave, an S wave, or noise in a time series, we get some width to the distribution, and it's tempting to map that into uncertainty, but it's also wrong. That's not a measure of uncertainty in your observation. As Karianne and others mentioned, there are approaches like dropout that are being used to try to characterize, in a sort of bootstrappy way, the uncertainty in the output of these deep neural nets. So, data sets and data challenges. A couple of people have talked about the need for curated data sets and benchmarks. We've put together an earthquake data set that we call STEAD, the Stanford Earthquake Dataset. This takes data from the IRIS archives from around the world, from small local earthquakes, 500,000 of them, plus samples of noise. And this shows the distribution of seismometers, 2,500 of them, and of epicentral distances. And we've done a whole lot of quality control; we spent basically a year on that. It's both signals and noise, and we added lots of additional labels that weren't in the original data set. And we published it in a journal that I had never heard of, called IEEE Access, and we did that deliberately, because we figure the seismologists are going to find it anyway; the people we really want to find it are the data scientists. So we published it in a data-science-friendly format. The paper's kind of funny because it explains what a magnitude scale is; it has a lot of introductory seismology in it. That's not for you, that's for the data scientists who might be reading it. And the whole idea is to recruit the interest of those people in our problems, which I think would be great for all of us.
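On the dropout idea mentioned a moment ago, here is a minimal sketch of Monte Carlo dropout for uncertainty, a generic illustration rather than anything from the picker papers: keep dropout active at inference, run repeated stochastic forward passes, and treat the spread of the outputs as a rough uncertainty. The tiny network and the 400-sample input window are illustrative assumptions.

```python
# Minimal sketch of Monte Carlo dropout: dropout stays "on" at test time,
# repeated forward passes give a distribution of predictions, and the spread
# serves as a rough (bootstrappy) uncertainty estimate.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(400, 64), nn.ReLU(),
    nn.Dropout(p=0.2),              # kept stochastic at inference below
    nn.Linear(64, 1),
)

def mc_dropout_predict(model, x, n_samples=100):
    model.train()                   # train mode keeps dropout active
    with torch.no_grad():
        preds = torch.stack([model(x) for _ in range(n_samples)])
    return preds.mean(dim=0), preds.std(dim=0)   # mean prediction, spread

# Usage: one 400-sample input window.
x = torch.randn(1, 400)
mean, spread = mc_dropout_predict(model, x)
print(float(mean), float(spread))
```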
Another way to recruit interest is through these data science challenges. So you may have heard of the SeismOlympics. This was to develop algorithms for phase picking for the 2008 Wenchuan aftershock sequence. There was $50,000 in prize money. I found out about this competition late; I told my grad students, don't bother participating in this, and they ignored me. So this is highly motivating, apparently. And it was actually a good thing for them to get involved. But look at that: there were 1,100 teams involved and over 4,000 participants. There probably aren't that many seismologists in the world. A lot of the people doing this were electrical engineers and computer scientists and so forth. It was great to get that interest. Good for us, good for our visibility, and also hopefully good for the field. Now, the ground truth is based on CEA analysts. And it would be interesting to come up with a different measure of ground truth that more accurately portrays what we think is really the truth. That's a good challenge for us. Another question? Who won? I don't know, but it was a group at, no, it was not at Georgia Tech; Georgia Tech was deeply involved. It was a group at, not Beijing, but USTC. There are a lot of seismologists there, and a lot of seismologists working with electrical engineers and data scientists there. And I think it was a group of computer scientists that won. And I think they were using some STA/LTA kind of algorithm. But again, remember what the ground truth was, right? So I think we can do better. I should also say that to collect the prize money you had to explain your algorithm in Chinese. So there we go. So that motivated having Chinese participants on every team, as we did. So Greg, what was the outcome of that? I mean, that's great, but what was the follow-up? Yeah, well, what did we learn? What was the benefit of this? I don't know, but let me show another data challenge. Well, for me, the benefit was that my students got highly motivated to do more machine learning, and so I derived a great benefit from that. I didn't win any money, but then they did all the work, so I wouldn't have deserved it anyway. I think that was a useful exercise, but I think it could be more useful, and let me illustrate that with another example. So this is the LANL earthquake prediction challenge. And by the way, it's surprising to me that there aren't hundreds of people around the world claiming they can predict earthquakes using machine learning. The combination of the low entry threshold and the black-box nature of it would seem to be inviting, and yet it hasn't really happened. Anyway, the notion here: these are laboratory earthquakes, not real earthquakes, and they're in a very clean system, I think a glass-bead failure system. This was a project that ran last year, ended last June or July, and four and a half thousand teams participated. It was run on Kaggle, and Los Alamos sort of set up the experiment. Penn State was involved; that's where the laboratory work was done. Laura Pyrak-Nolte, who's at Purdue, convened a meeting under DOE sponsorship to think about how to do this sort of thing, and the Department of Energy sponsored the data science challenge. And the results are in: a team named Zoo, Z-O-O, won.
And there's an article online about what they did and why they did it and how it worked. So, Richard, to your question, this is a better way to do this kind of a challenge, with real feedback in the system. Okay, so I'll just stop with these recommendations. This is from Karianne's paper from earlier this year. A lot of these things are important, and I guess a concern I have is that we make sure that the kind of work that gets done under the solid Earth geoscience rubric in machine learning is credible and that we follow best practices. So I'll stop with that and take questions, or maybe we have the panel discussion. Thank you very much, Greg. I think, yeah, let's go right to the panel discussion. That includes questions on specific things you presented, and we do have to end, I guess at 4:30, so that Greg can depart. So, I'll open it up. Yeah, Greg, you're welcome to sit down; you don't have to stand. So I guess I'll ask a question about your denoising work, which is really nice and stimulating. And I guess it raises a question that hasn't come up yet as an application, which is quality control. A big part of machine learning depends upon putting together a good, clean data set to train things on. It seems like the first thing you want to do is use machine learning to build your data set and to get rid of things that are not real. And you know, in geophysics we don't need people to do attacks on it; our data attacks itself. We have dropouts, we have leap seconds, we have nonlinear response, we have countless problems in the data. And it seems like the first thing you want to do is clean that up. I'm kind of surprised nobody really talked about that, but that seems like low-hanging fruit for machine learning, and I wonder if you want to comment on that. Yeah, so the denoising algorithm, you may have noticed, is a very deep net; there are many trainable parameters. And it was trained on data from Northern California. And that data tends to be recorded in quiet places that are picked because they're quiet; they're on bedrock and whatnot. So I would not necessarily expect it to generalize well to the seafloor or to the urban environment, because you'll have a rich set of noise sources that may not be separately recognized by that algorithm in its current implementation. So the generalization has yet to be demonstrated for that instance. I mean, I'm sure it would work well for the Northern California data it was trained on, or going forward with that kind of data, but beyond that, it's uncertain. Did that answer your question or not? Sort of, but, right, it doesn't necessarily have to generalize. I guess it's sort of maybe like a set of rules that you would want to apply to your data, to your training set, before you started. And maybe you have to do that for each data set separately, if you're trying to basically screen out things that are outliers. Yeah, so, okay. So into that training, and I didn't get into any of this, go both signals and noise. And for the signals, earthquakes, we know that for, say, phase picking, that generalizes quite well. And so I think you could do something like, if you had local noise issues like irrigation systems or whatever, you could retrain it with that as sort of an overlay so it would work better. Okay, yeah, so I agree with you.
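To make the denoiser's masking idea concrete, here is a minimal sketch of time-frequency masking, not the actual DeepDenoiser network: a model predicts a soft mask over the short-time Fourier transform of a noisy record, and the mask is applied before inverting back to the time domain. The `predict_mask` stand-in and its thresholding rule are illustrative assumptions only.

```python
# Minimal sketch of time-frequency masking for denoising (not the actual
# DeepDenoiser network): estimate a mask over the STFT of a noisy record,
# apply it, and invert back to the time domain.
import numpy as np
from scipy.signal import stft, istft

def predict_mask(spec_amplitude):
    """Hypothetical stand-in for a trained network: mask values in [0, 1]."""
    # Crude placeholder: keep time-frequency bins above the median amplitude.
    return (spec_amplitude > np.median(spec_amplitude)).astype(float)

def denoise(waveform, fs=100.0, nperseg=64):
    f, t, Z = stft(waveform, fs=fs, nperseg=nperseg)   # noisy spectrogram
    mask = predict_mask(np.abs(Z))                      # estimated signal mask
    _, clean = istft(Z * mask, fs=fs, nperseg=nperseg)  # back to time domain
    return clean[: len(waveform)]

# Usage: denoise a synthetic noisy trace.
rng = np.random.default_rng(0)
noisy = np.sin(2 * np.pi * 5 * np.arange(0, 10, 0.01)) + rng.normal(0, 0.5, 1000)
recovered = denoise(noisy)
```

In the trained version, the mask comes from a deep network that has seen many signal and noise examples; the median threshold here is only a placeholder so the sketch runs.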
And thanks. And let me remind everyone, too, that this is now the general discussion, so people are free to ask questions of any of the previous speakers, and anyone is invited to step in and answer if they feel they have a good answer or a good discussion point. So let's open this up. I just want to add to your question. I haven't looked at the papers recently, but in our review paper there were a few examples of seismic data where they were using machine learning for data cleaning. I haven't looked at the papers recently, so I can't tell you a lot of details, but if you look in there, there are a few references. So it is something that people have used machine learning to try to approach. And I think those papers were some of the ones that were a little bit earlier, maybe the early 2010s, so even before everyone jumped on the machine learning bandwagon there was some work on that. If I could add to that: for some reason, those are almost all in exploration seismology; that field was sort of into it before we were. That's also been a driver for a lot of outlier detection methods in machine learning: finding examples in a data set that should not be in that data set or should be removed from the training. That makes sense, given the volume of data you're talking about dealing with; there's no way to humanly review that. Other questions or discussion points? So Greg, do you want to comment a little bit more on this earthquake prediction thing? What did we learn from that? I was just reading the Zoo write-up, and they're like, yay, we threw everything and the kitchen sink at it, some neural networks, some this, some that, and then they added noise to it. So I didn't participate in that challenge at all. I mean, I was at the meeting and we talked about it, but we didn't do anything, so I'm not the one to comment on how useful that is. I just noted that there was follow-up where the broader community got feedback about what worked, and some speculation, if I recall correctly, about why it helped. But beyond that, I wouldn't be able to give you guidance. I think it would be useful to think about setting up another challenge and thinking up front about how to make it most useful. I think that sort of approach of throwing a lot of methods together has worked for a lot of other challenges; for the Netflix Prize, for instance, the final solution was sort of an ensemble of methods. So I think that's kind of common, throwing everything at it all at once to get good solutions. These ensemble approaches tend to work well, I think, for a lot of problems. I can't resist pointing out that $50K is an incredibly cost-effective way for NSF to spend some money. I'm curious about this, right? So it's $50K and there were how many, over 1,100 entries? Your chances of getting NSF money are much higher. I'm curious about that as a motivator. Yeah, so there were over 5,000 entrants and $50K, so the expected value of your winnings is... But I think that includes people who just made accounts and never submitted anything, right? That's not necessarily submissions, over 5,000. So there were 5,000 teams, and there were many more submissions than teams. Interesting.
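Going back to the data-cleaning and outlier-screening point from earlier in this exchange, here is a minimal sketch of machine-learning QC for a waveform training set, assuming simple hand-made per-trace features and an isolation forest; it is not any specific published method, and the feature choices are illustrative.

```python
# Minimal sketch of ML-based QC / outlier screening for a waveform training
# set: compute crude per-trace features and flag anomalous traces.
import numpy as np
from sklearn.ensemble import IsolationForest

def trace_features(traces):
    """Crude per-trace features: peak amplitude, std, tail heaviness, zero fraction."""
    peak = np.max(np.abs(traces), axis=1)
    std = np.std(traces, axis=1)
    heavy_tail = np.mean(traces**4, axis=1) / (std**4 + 1e-12)
    zero_frac = np.mean(traces == 0.0, axis=1)        # dropouts show up as zeros
    return np.column_stack([peak, std, heavy_tail, zero_frac])

# Usage on synthetic traces, one of which contains a simulated dropout.
rng = np.random.default_rng(0)
traces = rng.normal(size=(500, 3000))
traces[7, 1000:2500] = 0.0
flags = IsolationForest(contamination=0.02, random_state=0).fit_predict(
    trace_features(traces))
print("flagged trace indices:", np.where(flags == -1)[0])
```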
So obviously people were good at multiple entries, and I don't know who all these people are, but at one point, I didn't check it at the end, but before the end, some dude in his garage was in first place. Interesting sociology. But I mean, all of this, right? You have a combination of methods and then you add noise; with the randomness, it sort of seems like some method is going to win, sure, right? But how do you generalize from that? I mean, the difference between the winning entry and the second-place entry was very small; it was a small margin. The list is on there. And all the prizes didn't go to the winner, it was... No, no, what I mean is: what is the difference in methods between the first and the second approach, right? Are those similar methods? Do they have a similar mix between looking at quantiles and a neural network? If there's some reality to that, they should be similar mixes, right? Otherwise, it's pure luck, right? Possibly, yeah. So speaking of luck, when I was six, I won a bicycle; I went with a friend to a fair, and it was her father's, whatever. Anyway, that's the only time I ever won a prize. So instead, what I thought I'd do is change the subject, leave prizes, and go back to talking about a couple of the examples. And I'm going to come back and ask Zach, too, or Greg, or anyone. You've shown some examples of fluid-induced earthquakes and then tectonic faults. We have a mix of sources and, potentially, a mix of signals within these data sets. Since you're talking about locations, you're showing locations as if the location is the prize, but the end goal is to also look at the time-space sequence and understand process. And so I guess what I wanted to get back to is examples of fluid-induced earthquakes or induced volcanic earthquakes and others, and the massive problem of detecting S waves, or whether or not there are secondary waves at all, and how we can improve our understanding of fluid-involved processes through this machine learning. Do you have some insights? So you want to know about S-wave picking performance? No, well, S-wave picking performance in a bit more nitty-gritty detail, and success across the range of types of earthquake sets, because we've seen success in terms of a mapped fault or improved clustering, but there are also changes in waveforms and changes in types, and there are additional waves that we could use, beyond the spatial locations or time-space, to understand different sources. How far away are we from that? I mean, I've spent most of the last year and a half or two focused on just getting to the point where we can build these seismicity catalogs. And so I think going forward is really when all the analysis is going to get done. So I don't have too much insight into all these specific science applications just yet. So we're close to it being doable? Yeah, absolutely. Okay, and because we're looking forward, we're taking where we are now and moving; I'm just trying to move the conversation in that direction as well. Sure, yeah, I mean, we have capabilities now that we just didn't have before, with this kind of level of convenience, I would say. And the performance is just outstanding. I think the applications going forward are going to be very exciting, yeah. Yeah, so it's actually kind of a follow-up to this.
So I was struck by one thing you said in your talk, which was about the large number of S-wave picks. The analysts were not confident about the S-wave picks; the machine obviously was. So, I mean, were the analysts underconfident? Were the machines overconfident? And, it may be too early to fully answer this, but do you have a sense, in these cases, of whether the machines are getting their confidence level right, in particular on these more difficult-to-pick and difficult-to-identify signals? I mean, it's hard to assess, because, as we discussed before, we don't have a really good way of evaluating confidence. I guess you could look at residuals in location to see whether they are systematically better or worse or something like that. Right, yeah. But then again, this is just a model, and you can always kind of map that into unknown structure and things like that too. So it's not exactly clear. I mean, as you get to the really low-SNR conditions, it starts getting really hard to talk about what the errors are, obviously. But I think the fact that we're even discussing this in the first place is really kind of a different take on this altogether, right? Whereas with the STA/LTA picks from before, you're looking at not picking anything below an SNR of five, right? Now we're talking about whether or not these picks are good enough at an SNR of one. So that's a totally different discussion, and I mean, I'm happy to have it, yeah. Other questions or comments? Oh, just a comment. So I think what the machine is good at is seeing information that's distributed, that's diluted over a number of scales. As humans, we can only see information when it's sharp. But if for some reason you distribute this information over many different scales, we don't see it anymore, but the machine has no problem detecting it. Can I follow up on that? One of the aspects of this that I think is so exciting is that humans can't deal well with the dimensionality, right? We can't look at even four dimensions easily. So what do we do? We plot seismograms north, east, and vertical, side by side, and then try to jump between them and make connections, because we can't visualize the full 4-D particle motion, right? But these types of systems live naturally in high-dimensional spaces. So that kind of offers a lot of promise, I think. So I wanted to get back to the question of discovery. Qingkai had a cool plot in his talk that showed basically solving for the parameters in something like a dashpot-and-spring model. And what I was wondering was: what if you didn't know that you had a dashpot and a spring in parallel, and instead you had another one in series, or you had some different configuration? Could you use these algorithms to actually figure out what that configuration was? Or do you have to guess your model a priori and then try to fit the parameters? Yeah, I think right now that's still really hard. A lot of the time we see the observations and we see the inputs and outputs of the model, so basically you try to encode the whole process, but to really condense that into the underlying concept, like which parameters are really useful, that still needs a lot of effort. The example I showed you in that paper, I think, is a spring-dashpot-like system.
It's basically a toy problem where they wanted to see how you can actually use machine learning to get the equation, or at least get some insights. And they also kind of know how many of the parameters are really useful, so they are looking for that particular set of parameters. I think there's a lot of work needed to do that more generally. For example, we may not have the capability to visualize a one-spring system directly, but you have a lot of neurons in the neural network, and after you train, if you analyze the neurons and cluster them, you may find that they cluster into maybe three core groups, or certain groups, that may map onto the true system. So that's my take on the paper. And also, can I make a quick comment back on the competitions? I think recently a lot of these competitions have started in our field as well. Hannah gave a really nice talk about remote sensing; there's actually one going on right now called xView. Basically this year's theme is to use remote sensing data for disaster detection, building-damage detection, and so on. But if you look at a lot of the winners from Kaggle, and also, as Karianne mentioned, the Netflix million-dollar prize: the winner got the highest accuracy, but you know that model cannot be used in other situations. It's so tuned, so manipulated, for this task and this data set. I think Hannah also mentioned that when we build benchmark data sets for people to build models against and to score, we need to be careful about the usefulness of a model built against the competition and about its practical usage. That's one thing we need to add to the evaluation metrics, instead of just accuracy or correctness. Yeah, that makes a lot of sense. I mean, it seems like if we were running a competition, or a funding agency was going to fund one, why not make the goal physical understanding instead of accuracy, or something that is potentially more useful or more generalizable coming out of it. Yeah, and to that point, it is important, when we are using machine learning for scientific applications, to think about what evaluation metrics make the most sense for a given application. It might not be accuracy; it might be false positives, or it might be minimizing a false-negative rate, or something like that. But to add to the point about benchmark data sets: this is, as I mentioned, a huge problem in machine learning too, where people have shown that after one model comes out, there's a huge wave of modifications to that architecture that usually claim marginally better performance, and then there are often later papers that show that those gains were not due to any architectural reason or some intrinsic property of the model, but to the settings of the hyperparameters. And so there's a big emphasis in machine learning, too, on doing ablation studies when you present a method, to actually tease out what it is about the model, or the way you trained it, or something like that, that actually got you that performance gain.
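To make the metrics point concrete, here is a small worked example of why accuracy alone misleads for an imbalanced detection problem like phase picking; the counts are made up purely for illustration.

```python
# Why accuracy alone misleads for imbalanced detection: a detector that never
# fires scores 99% accuracy if only 1% of windows contain an arrival.
# Precision, recall, and F1 expose the failure.
def metrics(tp, fp, fn, tn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}

# 10,000 windows, 100 true arrivals: "always negative" detector vs. a real one.
print(metrics(tp=0, fp=0, fn=100, tn=9900))      # accuracy 0.99, F1 0.0
print(metrics(tp=80, fp=40, fn=20, tn=9860))     # accuracy ~0.994, F1 ~0.73
```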
So, can I speak to your point? And is it the same one you have? Okay, so to Hannah's comment: coming back to these examples of the two competitions, were the various approaches made openly available? Could somebody theoretically go and look at these different approaches, understand what the differences were, and try to actually extract some of this understanding if they wanted to? I don't know how these are set up. Does anybody know? I don't know, but I think some of them were open and some of them were not; I think it was mixed. Oh, just one more thing I was going to say about benchmarks, which is, we were talking about their limitations. I think we need to have benchmarks, but, as you were saying, you have to think about the metrics, because since we don't actually know the ground truth, when we create a benchmark we say, this is our ground truth now, and we don't want to over-optimize to it when it may not be correct, right? It represents what we know now, but this over-reliance on benchmarks really gets to a point where we might be hurting ourselves in terms of discovery, since we don't actually know the ground truth. So I think this idea of not over-optimizing for accuracy on benchmarks is important. It is important to have benchmarks, because otherwise everyone's just throwing methods out there and it's really hard to say how they actually compare if everyone's just scoring themselves. So you want to have some basic baseline that methods should meet, but I think we have to be careful about how we design the benchmarks and how we measure and score against them. Publishing executable code and publicly available code can also help with that: by having that available, if I'm presenting a method for one particular problem and data set that maybe hasn't been open already or is not a benchmark data set, I can then actually use the model that you published to test against mine, and that kind of meets both criteria without overemphasizing benchmarks. So I have a kind of related question about errors, for Greg. You showed these beautiful plots of earthquakes that were sort of cloudish and then zoomed in onto the fault, and it seems like a similar transformation to when we started using double-difference methods. Is there anything in the algorithm or the method that would somehow bias it to come in on the fault, I mean to come in on a plane, and assuming not, do you have a sense of the errors and how much this is improving? Well, we used hypoDD to do those relocations, and I showed a couple of things. With the Apennines, we were using the PhaseNet picks, and because those were piecewise planar, there's nothing I can think of in the data themselves or the way we process the data that would make them want to be planar like that. In the Guy-Greenbrier case, once we detected the events, we actually re-measured the arrival times using cross-correlation where we could and used hypoDD to relocate them. So that's why it looks so extremely sharp. But do you have a sense of how much those new picks are improving the locations? So the new picks were used to find the earthquakes, and then we re-pick after that to get the precise locations. Okay. Right, does that make sense?
Yes, I'm just trying to get a sense of what's the overall improvement now, from just using the old picks with double difference to the new picks with double difference. I can't say. I mean, it looks from the examples you saw like it really shrunk down to an even finer-scale fault structure. So I haven't compared the PhaseNet picks relocated by hypoDD with the cross-correlation-based picks. That would answer your question, I think, but I don't have that answer. So a lot of talks are focusing on interpolation, and I was just wondering about extrapolation: is that something machine learning will never be able to achieve, or is it something the community can move toward and perhaps one day be able to do? Any of the speakers is invited to respond to that question. All right, I'll bite. It's hard to generalize; I wouldn't say never, and it's going to be problem dependent for sure. Anyone want to correct me or add to that? I think we don't understand the limits of what machine learning can do, and so I don't see a reason to think that these things aren't going to be cleverer than us, right? I don't know, but there's no reason for them not to be. So I have a question. Obviously this is a committee that's focused on geophysics, but the geophysics community does seem to be ahead of the game in terms of a variety of computational applications, compared to, I would say, a lot of the other parts of the Earth sciences. So what types of activities do you think we could support that might help improve the utilization of more complex computational methods in other parts of the Earth sciences? What inroads do you see, perhaps, as fertile ground or low-hanging fruit for being able to do that? So that's a good prompt. I think something NSF could do would be to get computer scientists who are potentially interested in new areas together with geoscientists who are interested in these sorts of methods and yet are underrepresented in current applications. So just getting those two groups together, I don't know how you best do that, in a workshop perhaps, but the geosciences have important problems that are interesting to data scientists. They're much more interesting than mining Twitter or whatever they do with their time. And we just have to advertise those problems well. And my experience, at least with earthquakes, is that, sorry. They will not make money, I guess; maybe not with those competitions, but mining Twitter can, you know, allow people to earn a lot of money. That's the difference. Yeah. I'll just add to that. I think a big part of this is the data, right? This is taking off in seismology because we have tons of data, labeled data. We have lots of data-driven problems that are kind of well posed. That's not so true for other areas of the geosciences necessarily. So I think at the end of the day, that's going to control where this really gets adopted. I'll just add on this, because I've talked with some data scientists and also computer scientists about applying methods to different types of geophysics problems in seismology. But we found that in seismology our data have their own formats, and those formats are designed differently from other areas; in data science, for example, people already use JSON or HDF5 and so on. Our own formats facilitate our own analysis, and we have our own tools built up over the last few decades, but I think the data formats actually create a barrier in the first place, because when I talked with the data scientists, they usually complained that seismology data are really complicated. And I told them they're not: if you download SAC, you can actually read the SAC format, but they just feel it's too complicated.
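As a minimal sketch of what repackaging into a friendlier container might look like, assuming a hypothetical directory of SAC files, here is one way to write them into a single HDF5 file with ObsPy and h5py; the file names and layout are illustrative, not any particular group's workflow.

```python
# Minimal sketch of repackaging SAC files into one HDF5 container so the data
# are easier for data-science tooling to consume. Paths are hypothetical.
import h5py
from obspy import read

stream = read("data/*.SAC")             # hypothetical directory of SAC files
with h5py.File("waveforms.h5", "w") as f:
    for i, trace in enumerate(stream):
        # Index prefix avoids name collisions if two traces share an ID.
        dset = f.create_dataset(f"{i:04d}.{trace.id}", data=trace.data,
                                compression="gzip")
        # Store the minimal metadata a non-seismologist needs with the samples.
        dset.attrs["sampling_rate"] = trace.stats.sampling_rate
        dset.attrs["starttime"] = str(trace.stats.starttime)
```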
I guess a shameless plug here: the reference Earth model databases are being constructed in HDF to facilitate use outside of the seismic community. The reason we did that, even though it took us an extra year of work, is that for very large databases some of the seismic formats are not optimal. And so I think we as a community will move that way as we're forced to. So I think we've come to the end of our session here; it's 4:30. I want to thank all of our speakers, especially those of you who traveled here. And before we give everyone a round of applause, I want to mention that the video from this session will be online at the CUSD website, and with the permission of the speakers, slides will also be available soon on that website as well. So let's give another round of applause and thank our speakers, and thank all of you for attending a very stimulating session.