I'd like to welcome everybody to this meeting on the future of machine learning and data-intensive computing in the solid earth sciences. We'd especially like to thank our sponsors, our invited speakers, and all those who traveled to get to this meeting, as well as the many people we expect to have participating via telephone. Part of this committee's task is to provide advice on science, technical, and policy matters related to seismology, geodesy, and geodynamics, including basic and applied research activities in the solid earth sciences that contribute to federal agency missions. And it's hard to think of a more timely topic than machine learning. Obviously the increasing power of computing systems, combined with exponentially growing data holdings, is leading to exciting new results and tremendous interest in the community. Today we want to explore where we are with machine learning, and with data-intensive computing in general, and think about and discuss where we as a geoscience community should be going next. We have divided the day into four panels that focus discussion on particular aspects of machine learning and data-intensive computing as we go through the day. On the first panel we start with an overview of machine learning, with a presentation by Dr. Karianne Bergen, who is getting ready there, an early-career researcher at Harvard and lead author of an excellent review paper on this topic that appeared in Science magazine in March of this year. Then we look at work by two other early-career researchers in the field, Dr. Zachary Ross and Dr. Diego Melgar, and after their presentations we have 30 minutes for discussion with the panelists as well as everyone in the room and on the phone. In the second panel we tackle the challenging topic of the next practical steps to accelerate and broaden the use of machine learning in the geosciences, with a remote presentation by Dr. Qingkai Kong of UC Berkeley, where we know there is currently no power, but we're hoping he will be able to connect. He was also the lead author of another recent review paper, which appeared in Seismological Research Letters in January of this year. Again, after that presentation we'll have 30 minutes of discussion on that topic. At the end of the second panel we'll take a one-hour lunch break, and we'll return with a third panel, where we've invited two experts in machine learning from related disciplines to present on applications and advances in their fields. First, Dr. Hannah Kerner from the University of Maryland will present on machine learning applications in remote sensing, and then Dr. Brice Ménard from Johns Hopkins University will present on machine learning applications in physics and astrophysics. Then we again have 30 minutes for general discussion. Finally, in the fourth and last panel, we've invited Dr. Gregory Beroza of Stanford University, an expert with long-standing experience and expertise in this area, to review the current status of machine learning and data-intensive computing and to help shape discussion of next steps to advance the science. That will include both what we learned in the first three panels and what's going on in the field in general, and we hope to have a very stimulating discussion in the 30 minutes after Greg's presentation.
As we stated in the introductory text, a goal of this meeting is to review progress in machine learning and to discuss what future investments are needed for a concerted, long-term effort to organize geophysical data sets and combine them with appropriate data-intensive computing resources in the solid earth geosciences. That is the natural laboratory needed for scientists across disciplines to work together most effectively, and we want to discuss how those workflows and student training can be combined with approaches that provide insight into the physics of earth systems. The more ambitious goal is to look beyond using machine learning as a black box and ask what's needed to develop a deeper understanding. As Bergen and her co-authors note in their overview paper, it would be transformative if we could develop hybrid modeling frameworks that combine data-driven machine learning methods with explicit physical models, and I hope we get into that in the discussion sections coming up. With that, I'm going to turn over the gavel to Dr. Cindy Ebinger, the committee member who is going to lead the first panel. Thank you for a great and comprehensive overview of the meeting. I just want to remind everyone in the room that we have a large number of people calling in or listening online, so please try to use your microphones; the speakers themselves will, of course, be near a microphone the whole time. It's my pleasure to introduce Karianne Bergen from Harvard University, who is going to give an overview of the subject but also take us inside the black box. This whole first session will be motivating a discussion of where we stand and where there are potential gaps. I'm thinking as well about the different funding agencies and funding models, how they support this work, and where there are cracks or gaps between the agencies; she'll be pointing out some of these topics for further discussion as the meeting evolves. So, Karianne, thank you.
Is this on? Okay, it's working. Can everyone hear me all right? Okay, thank you everyone, and thanks to the organizers for inviting me. Today I'm going to give an overview of what I think is the state of machine learning in the solid earth sciences. Let's see... okay, I've been having some technical difficulties with the slides. All right. I want to start with a little overview of how I got here and where the things I'm going to share with you come from. By training, my background is in computational data science, but I've been working with seismologists for the last five or so years. I worked with Greg Beroza, who you're going to hear from later today, on applying data mining methods for large-scale search to earthquake detection. One of the things that came out of that collaboration was that I think both of us learned a lot about the opportunities for machine learning in the solid earth sciences, but also about the things it could be doing that it isn't yet: there's a lot of really interesting work going on, but there are many places where we could push further, and there are certain obstacles to that. To share what we thought could help advance this field, we wrote a review paper in which we talked about what people are doing, highlighted some really interesting examples of work, and gave recommendations for where we thought the field needed to go so that the solid earth geosciences can really innovate using machine learning and, hopefully, make more discoveries with the very large datasets that geoscientists have. To start, I want to give a brief summary of what machine learning is. Machine learning is a subfield of artificial intelligence. In traditional computing, you have some problem you want to solve, so you take some data, maybe a model you want to run, and you write a set of commands, an explicit program, for how you want the computer to process that data. Machine learning is a little different in that you want the computer to actually build the model itself from data; it's learning from examples, which is how people learn. People learn from experience, but computers can't experience things the way people do, so for a computer, learning from experience means learning from data. With machine learning, rather than giving the computer a list of commands, you have it partially learn the model itself by looking at a lot of data. So machine learning methods are tools for extracting patterns and building complex models from data. Many of these are predictive models, but there's a broad range of kinds of machine learning, which I'll highlight next. Machine learning sounds really sophisticated, really fancy, and some of the methods are, but many of the methods are drawn from applied statistics. Methods like linear regression, logistic regression, and principal component analysis, which a lot of scientists use regularly in their work, are actually pretty straightforward examples of simpler machine learning techniques. So a lot of the techniques people are using now are more sophisticated, complex versions of what scientists have already been doing for a long time.
This meeting also has the phrase data-intensive computing in its title, so I want to note that when I talk about machine learning in this talk, I'm using it as a bit of a catch-all. There are other techniques that the machine learning community wouldn't strictly call machine learning but that are nonetheless techniques for extracting information from data; I usually call these data mining techniques. So when I say machine learning, I'm really referring to both machine learning and these other data mining techniques designed to extract information from very large data sets. I've given you a fairly abstract definition, so I want to go a little deeper into what machine learning is and how it works. A lot of the examples of machine learning you hear about in the news, the flashy applications, are doing what's called supervised learning, which is building models from examples. What this looks like is, say we have a task where we want a computer to distinguish cats from dogs: we want to build a classifier that tells you whether an image shows a cat or a dog. It's a bit of a silly task, but it has cute pictures. You could try to write a list of commands, like "look for pointy ears, maybe that's a cat," but this is a task where it's hard to write out an explicit set of rules. So instead we get a lot of labeled examples: we find pictures of cats and dogs on the internet, we tell the computer which ones are cats and which are dogs, we feed those into the machine learning algorithm, and it uses that data to come up with an optimal set of rules or criteria for distinguishing cats from dogs. The output of the machine learning algorithm is a prediction model, and if we've done our job right, that model should be able to take a new image and answer the question: is this new image a cat or a dog? Hopefully it gets the answer right. That's supervised learning: you give the computer examples of the pattern you want it to be able to reproduce, and it learns a model that does the same thing. You can also do machine learning even when you don't have labeled data, which is especially useful in the geosciences, where we often have large unlabeled data sets. That's called unsupervised learning, and a lot of data mining techniques fall under this category as well. This is using machine learning to find patterns in data. If you have data without labels, say a large collection of images of different kinds of animals, you can feed them into a machine learning algorithm and, depending on the choice of algorithm, it will return some sort of structure in the data. That structure might be common patterns or features in your data, or groups of data points that are similar to one another; it could give you relationships between your data points or between the features of your data. There are many different kinds of structure, and the kind of structure you find will depend on the algorithm.
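To make the supervised case above concrete, here is a minimal sketch of that workflow: fit a classifier on labeled examples, then have the resulting prediction model label a new, unseen example. This is not code from the talk; the features are random placeholders standing in for image (or seismogram) measurements, and the choice of classifier is arbitrary.

```python
# Placeholder sketch of supervised learning: fit a classifier on labeled
# examples, then use the trained model to label a new, unseen example.
# The "features" here are random numbers standing in for image (or
# seismogram) features; the labels play the role of "cat" vs. "dog".
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))            # 500 labeled examples, 20 features each
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # 0 = "cat", 1 = "dog"

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))

# The prediction model can now answer the question for a new example.
new_example = rng.normal(size=(1, 20))
print("predicted class:", model.predict(new_example)[0])
```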
There are many different kinds of machine learning algorithms out there. Talking about supervised versus unsupervised learning is a simplified picture; even a more complex picture is still simplified, because there are many kinds of algorithms and many different ways they can be used. Up here we have deep learning, which you'll probably hear more about later today. Deep learning is a type of machine learning that can be used for both supervised and unsupervised learning, but there's a lot more to machine learning than deep learning; deep learning methods are powerful tools, but there are many other approaches out there. What I hope you take away from the meeting today is that machine learning can help geoscientists extract more knowledge and insight from larger data sets than ever before. I really do think machine learning is a very useful tool for processing large amounts of data, and I say "than ever before" because geoscientists have been using machine learning for a long time; this isn't an idea we discovered yesterday. In the 1990s people were using artificial neural networks, which are a simplified version of the deep neural networks that are popular today, for tasks like distinguishing earthquakes from explosions. In the 2000s there was a lot of work with graphical models like hidden Markov models, in that case for classifying seismic signals. So these techniques are not new in themselves. What is new is that developments on the seismology and geoscience side and on the computing technology side have created new opportunities that geoscientists can hopefully leverage to make new discoveries. The first is that we have much larger data sets: massive geoscience data sets. In the context of machine learning, large data sets are both a challenge and an opportunity. The challenge is that processing a lot of data requires a lot of computing power and can be difficult; the opportunity is that if you're trying to find patterns, having more data means you can extract more signal. I've been working with seismologists, so when we talk about large data sets we're often talking about the length of the records: long time series and data from many instruments. You'll hear later about remote sensing, where there are large spatio-temporal data sets. There's also the output of large simulations; people don't always think of these as data, but when you run a large-scale numerical simulation you can get a very large output back, and that's data you can try to understand with machine learning techniques as well. And people keep coming up with new sources of data, for instance crowdsourced data, so new sensing modalities are always coming online and creating new data sources. That's one area that has changed since the early uses of neural networks in the geosciences: we have a lot more data. At the same time, there are a lot of new machine learning algorithms and models out there; there's been a lot of work over the last ten years really pushing the state of the art in machine learning.
On the big data side, I was working on this project, FAST, where we were using methods meant for large-scale data, a big-data type of approach. There has also been a lot of recent work in deep learning, with new network architectures like convolutional neural networks that can now be trained effectively, and a huge number of other deep learning architectures designed for different kinds of tasks: architectures that work well for data on a grid, architectures for sequences, architectures that learn to reconstruct data, unsupervised-type methods, generative models. There are many different deep learning architectures out there now, and they give you a lot of flexibility in the kinds of problems you can solve. This has been a space with an enormous amount of work in the last ten years, and that's why you're hearing about machine learning and AI everywhere. Some of these improvements in models have been enabled by improvements in computing technology and tools. One of the reasons deep learning took off is the ability to move machine learning onto GPUs and compute on large datasets, which was previously difficult; now you can train models using GPUs, and there are new techniques on the mathematical side that make that possible as well. It's also getting cheaper and easier to compute with big data; you can now run all your machine learning models in the cloud, for instance. And something I think is really useful from the point of view of scientists is the open-source machine learning frameworks, many of them developed at companies, that make these tools easier to use. Ten years ago, if you wanted to run a deep neural network you would have had to be a machine learning PhD student in one of the top labs, but now anyone can pick these things up: you can do a tutorial, take a class online, and it's easy to get started and actually run state-of-the-art models yourself. Those are the things that I think have really made a difference in why machine learning deserves a second look in the solid earth geosciences, and why we're seeing so many new developments in other scientific fields as well.
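As an illustration of how accessible these frameworks have become, the sketch below, which is not from the talk, defines and trains a small neural network in Keras on made-up data in a dozen lines; the architecture and data are arbitrary, and the same code runs on a GPU if one is available.

```python
# Illustrative only: defining and training a small neural network with an
# open-source framework (Keras) takes just a few lines. The architecture
# and the synthetic data are arbitrary placeholders.
import numpy as np
from tensorflow import keras

x = np.random.rand(1000, 32).astype("float32")
y = (x.sum(axis=1) > 16).astype("float32")      # a made-up binary target

model = keras.Sequential([
    keras.Input(shape=(32,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x, y, epochs=5, batch_size=64, verbose=0)
print(model.evaluate(x, y, verbose=0))           # [loss, accuracy]
```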
Machine learning has been particularly successful in a few areas. One is computer vision, processing natural images, for example classifying or segmenting them. Machine learning has also been really successful in natural language processing, things like speech-to-text, where you talk to your phone and it turns into text, or when you talk to Siri or Alexa, and in combinations of the two, like image captioning, which combines image processing and natural language. These kinds of applications with images and natural language have become quite sophisticated, to the point that they're actually products companies can sell; they're at a pretty advanced stage. There have also been developments that attracted a lot of interest in game play, for instance AlphaGo, which was highly publicized, and things like robot soccer, the kind of fun problems many machine learning people like to work on. So these are some areas where machine learning has been really successful. What these applications have in common is, first, that the data tend to be on structured grids; images and sequential data like language are very structured data sets. The tasks I've described are well defined, and they're also easy for humans to perform, which means you can create large volumes of high-quality labeled data: it's easy for a human to label an image and tell you if it's a cat or a dog, and you can get a very large number of cat and dog images. This contrasts with what we see in the sciences. Data sets in the solid earth geosciences are harder to work with, as many of you know. Sometimes the signals we're interested in have low signal-to-noise ratios, so we have a harder time picking them up, and the data tend to be very noisy; these are real data sets, not the nicely cleaned and curated data sets many people use in their test problems. There's also the problem that we often don't have labels: you put out sensors and collect data, but you don't actually have labels for what's in that data, and if you can get labels they may not be as high quality or as numerous as you need, and they may not be perfect in every case; they can be hard to obtain. Part of that is because the ground truth is unavailable: we may know some of the things that are in our data set, but we can't say for sure that we've found everything, and not knowing the right answer can make it challenging to apply machine learning to some tasks. Because we have such large data sets, we also need methods that scale. One scaling challenge that's particular to scientific data sets is that you often need large data sets because you're looking at phenomena that act across multiple scales. In seismology, for instance, an earthquake happens over a very short period of time, but the time between earthquakes can be very long, and the processes driving earthquakes, like tectonics, happen over much longer time scales. So you have to think about modeling across scales in both time and space: you can have things at the fault scale and things at the global scale, and how they interact can be important to model in some cases. So there are a number of different challenges. The question is: how is machine learning actually being used in solid earth geoscience today, and how might it be used in the future? When we were thinking about writing our review paper and summarizing where things are, we came up with three different modes of machine learning: automation, modeling, and discovery. These aren't meant to be distinct, they overlap, but I think it's a useful way to think about the kinds of tasks you can use machine learning for in the sciences. The first is automation: automated prediction, decision making, or data analysis. The goal in automated prediction is to perform some sort of complex or repetitive task.
It might be a task that would be challenging for a human to perform, or one that wouldn't be challenging but would be tedious, or a task that's infeasible to have a human perform because of the size of your data sets. Think of something like a spam filter: a human could easily tell you whether an email is spam, but it would be tedious and infeasible to have someone sit there pre-filtering your email for you. Those are the kinds of tasks you want to automate. It could also be a task that people know how to do but that is difficult to express as a set of explicit commands; again, our example with cats and dogs: humans know what a cat and a dog look like, but actually writing a program to do that is challenging because we don't quite know how people do it. These are the kinds of tasks we may want to automate with machine learning. In these cases the focus is often on using machine learning as a tool for high-accuracy predictions, and here I use "prediction" in the sense of labeling the data, not necessarily predicting the future. This is where a lot of the current use of machine learning in geoscience has been to date, in part because it's often the low-hanging fruit of machine learning in the sciences: you're usually taking a process you're already doing, either one that's already automated in some way and whose accuracy you want to improve, or one you've hired a grad student to do repetitively and that you want to automate. So this is an area where we've seen a lot of work, because these are, in some ways, the obvious things to automate with machine learning. What are some examples? These examples are not my own work, so I'll do my best to represent them. Some are automated data analysis; in one of them the task is classifying volcanic ash particles. They have images of particles of volcanic ash, and they want to know which shape category each one falls into. This is something you could hire grad students to do; they probably wouldn't enjoy it very much, but that's what grad students sometimes have to do. It's a very tedious task for an analyst, but because these are nice data on a grid, they look like images, it's something we expect machine learning to do pretty well, because machine learning is good at image processing. A human could tell you which shape these are, but a machine can be trained to do it fairly easily, so it's a task well suited to automation with machine learning. Another example, where there have been many papers, is seismic signal analysis, for example phase picking of seismic data. These are processes that may already be semi-automated or automated, and the idea of using machine learning here is to improve the accuracy of the automated process or to fully automate it. A lot of times there's a human in the loop tweaking the results,
and we may want to fully automate that and improve the accuracy of the existing pipeline. So this is an area where machine learning can be used even if the process is already automated; machine learning may do better than our previous attempts, which often amounted to writing out a series of explicit commands for how we think people would do the task. It can also be automated prediction. An example is lithological mapping: they take geophysical and remote sensing data and try to map them to a lithology using only sparse examples, because they can't go out and take measurements to determine the lithology at every single spot on the grid. Using the few ground-truth measurements or observations they have, the idea is to fill in the rest of the map. That would be hard and labor-intensive for a person to do manually, so it lends itself to automation with machine learning. There are some challenges that come up when applying machine learning in this automation context in the solid earth geosciences. One is dataset shift, or covariate shift, which is when the data you use to train your model don't quite match the data you want to apply it to. This can happen in many ways: maybe your training data come from simulations but you want to apply the model to real-world data, or your data somehow change over time. Those kinds of things can create challenges for automation. When I think about these challenges, though, I think about them optimistically: they're challenges, but they're also the hooks for the data science community, areas where we need their help and can collaborate with them. There are also issues with biases in data collection and labeling. Some data are easier to label than others, but those are often not the data we're most interested in: you can easily label the frequent events, the large events, but the smaller, more infrequent phenomena that are harder to pull out of your data are the ones you care about. So your data set may over-represent the kinds of events that are not as interesting to you. And evaluating performance can be challenging because we don't always have high-quality ground truth, so it can be difficult to iterate on your machine learning solutions when you don't actually know the right answer. So that's the overview of automation; there has been a lot of work there. There has been somewhat less work in the area of modeling and inversion, but I think it's really interesting and an area where I hope we'll see a lot of work in the near future. The idea in modeling and inversion, and I say inversion because this can be a forward or an inverse model, is to use machine learning as a tool to build models: to create a representation that captures a relationship or structure in the data, and I mean that broadly. One of the things you could do with modeling is to learn a surrogate model.
Say you have some large simulation that you run just to get an estimate of one value; you do a lot of computing to get that one value, and maybe you don't care about it being a hundred percent accurate, it just has to be good enough, because it might be a subroutine in some other code you're running. One way to use machine learning is to build a model that predicts those values: to learn a surrogate model that approximates your more complex model and gives you a rough answer that requires much less computation. Another area is model reduction and coarse-graining. This is something many other sciences have been using machine learning for: taking your full system and learning a reduced representation of it, something simpler to model that still has the properties you're interested in. You see a lot of this in computational chemistry, for instance; materials scientists do a lot of it. So many of the modeling applications that I think are of interest are places where machine learning intersects with traditional computational science and numerical simulation, and I'll give a couple of examples. One is work using machine learning to simulate wave fields. The idea is that if you have a 1-D velocity model, you have a forward model that can directly compute the wave field at a given time, using finite-difference modeling, taking a number of time steps, and that can take a lot of time to compute. The challenge is that if you want to use a new velocity model you have to start from scratch and run it all over again, even if you only change the velocity model a little bit. So what they did was take examples where they had a velocity model, run the simulations, and use those as training data: the velocity model is the input, it's the image of the cat or the dog, and the label is the output of the simulation. The reason they do this is that the prediction model then allows them to take a new velocity model and predict the wave field without having to completely recompute it. So this is a case where you get a much faster model than running your full numerical calculation every time.
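To make the surrogate idea concrete, here is a generic sketch, not tied to the wave-field study just described: a regressor is trained on input/output pairs from a stand-in "expensive_simulation" function (a made-up placeholder) and then used as a fast approximate replacement for it.

```python
# Sketch of a surrogate model: learn a fast approximation to an expensive
# forward model from (input, output) pairs. "expensive_simulation" is a
# made-up stand-in for a costly numerical simulation returning one value.
import numpy as np
from sklearn.neural_network import MLPRegressor

def expensive_simulation(params):
    # placeholder physics: in reality this would be, e.g., a finite-difference run
    return np.sin(params[0]) * np.exp(-params[1] ** 2) + 0.1 * params[2]

rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(2000, 3))               # sampled model inputs
y = np.array([expensive_simulation(p) for p in X])   # outputs from the full model

surrogate = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000).fit(X, y)

# The surrogate now gives a quick, approximate answer for a new input,
# useful inside a loop or inversion where speed matters more than exactness.
new_params = np.array([[0.5, -0.3, 1.2]])
print("surrogate:", surrogate.predict(new_params)[0],
      "full model:", expensive_simulation(new_params[0]))
```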
Another example of modeling is model reduction, in this case for flow in porous media. Here they have discrete fracture networks used to model flow and transport, and computing on the full network model can be very computationally intensive. The goal is to find sub-networks, which in this case they call the backbone: a reduced network model that captures the bulk of the flow, so they can model just the sub-network in their computational simulation and get essentially the same results. That reduced model can be difficult to find, so they use machine learning to pick out the sub-network: they take the full network and use a machine learning model to predict which fractures, which pieces of the network, should be included in the sub-network, and then they run their computations on that reduced model. They could compute the sub-network before, but it was a computationally intensive process; machine learning lets them do it far more efficiently, so it's really a tool to speed up the calculations they're doing. There are challenges that come along with this modeling mode of machine learning. One is quantifying model uncertainty: with traditional computational techniques and numerical methods you often have a good idea of what your uncertainty is, that's something applied mathematicians study and model, and it can be more difficult to determine for machine learning systems. There may be physical constraints or domain knowledge you want to incorporate into your modeling, and that's trickier because most of these methods are designed to be fully data driven, so it takes more work to embed that information into your machine learning architecture and your solution. And when the output of simulations is your training data, there's the expense of actually generating the data: we say we can use machine learning to speed up simulations, but to train the model you have to run a bunch of simulations first. That's one of the downsides of this approach; it works well if you already have a lot of existing calculations. The last mode of machine learning I want to talk about is discovery: discovering patterns and insights in your data. The idea is that you want to extract new information, which might be patterns, structure, or relationships in your data set, and the key is that you want to find patterns that are not easily revealed by conventional analysis techniques; you use machine learning to get a different view of your data that you can't get from the techniques you have right now. Some of the techniques that I think are really useful for discovery, and that the machine learning community is also really interested in going forward, are unsupervised learning techniques and generative models for exploring large unlabeled data sets, because a lot of the techniques out there now, the convolutional neural networks for image processing, work really well for labeled data, but for discovery we need to be able to work with unlabeled data. That's a key aspect of discovery, and it's where there's a lot of opportunity for growth. I'll give one example of this in the geosciences, which is finding patterns among seismic signals. This is work out of Columbia, where they took 46,000 earthquakes from The Geysers geothermal field and used machine learning to essentially tease out the differences between the signals and the relationships between them. They used a couple of different machine learning techniques, representation learning and clustering, and they were able to identify spectral and temporal patterns that were too subtle for traditional methods to find; these were patterns they wouldn't have found otherwise, and it gave them insights they couldn't have gotten any other way. So this is the kind of area where I think there's a lot of potential.
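As a schematic of that kind of discovery workflow, and not the Columbia group's actual pipeline, the sketch below applies a simple form of representation learning (PCA) and clustering (k-means) to unlabeled synthetic "spectra" containing two hidden signal families; a real study would use actual waveform features and likely richer methods.

```python
# Schematic "discovery" workflow on unlabeled signals: learn a compact
# representation, then cluster it to look for families of similar signals.
# The synthetic "spectra" below contain two hidden families by construction.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
freqs = np.arange(128)
family_a = np.exp(-0.5 * ((freqs - 30) / 8.0) ** 2)    # low-frequency peak
family_b = np.exp(-0.5 * ((freqs - 80) / 15.0) ** 2)   # high-frequency peak
spectra = np.vstack([
    family_a + 0.1 * rng.normal(size=(500, 128)),
    family_b + 0.1 * rng.normal(size=(500, 128)),
])

embedding = PCA(n_components=5).fit_transform(spectra)           # representation learning
labels = KMeans(n_clusters=2, n_init=10).fit_predict(embedding)  # clustering
print("cluster sizes:", np.bincount(labels))   # two groups of roughly 500 each
```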
There are also a lot of interesting applications that I would group under discovery coming from other fields, and I want to highlight one that I think is of particular interest to the solid earth community, which is learning governing equations from data. A number of groups in the applied mathematics community are working on this; the example here is work out of the University of Washington, where they take snapshots of some system and use machine learning to determine the governing equations, the physics, that were controlling that system. Those are the kinds of things I'd like to see more of in the geoscience community going forward. With discovery there are some challenges. One is that because we're interested in outliers, infrequent events, or unexpected patterns, it can be hard to use machine learning, because machine learning is really good at finding the common patterns in the data, the overall trends, but it can be difficult to find the more unusual cases. How do you train a machine learning algorithm to find something that's surprising or unexpected? That's a much more challenging task than cats and dogs. Part of the reason is that machine learning algorithms are really good at interpolating between data points but not as good at extrapolating, and in discovery you're often trying to look outside of what you already know, which gets into the realm of extrapolation. The other challenge is that discovery often requires more than a model that does some task really well and makes a prediction; we want to understand why, what it was doing, and what it learned in order to do that, and these models can be difficult to interpret. A lot of the time the discovery piece requires interpreting what's going on inside the model, and that's an area that still needs a lot of work. Reflecting on all of this in our review paper, we talk about different ways to advance the state of machine learning research in the solid earth geosciences. One is open science practices: open-access papers, electronic preprints, open-source code. This is for two reasons. One is that sharing research, code, and data makes research move faster; the computer science community has adopted this, and it's one of the reasons they move so quickly. It also makes it easier to collaborate with computer scientists, because this is already part of the culture of how their academic field and publishing work, so it puts us more in line and makes it easier to collaborate with them,
because you don't have these fights over whether we can put our paper up on a preprint server or whether we can share our code. Another is benchmark data sets. This is a challenge because, when it comes to ground truth, we don't always know the correct answer, which means you may have a bunch of people creating algorithms who all make up their own ground truth, and then you don't know who did the best job because everyone is scoring themselves by their own criteria. That makes it really difficult to improve on what people have done, because you don't know who is actually doing a better job at a particular task. By having clear benchmarks and sharing our data and code, we can actually advance the field, rather than everyone writing independent papers that we don't know what to make of as a community. Another is geoscience data science education: we would like more education in these kinds of techniques in the geoscience community, both so the community can develop these solutions itself and to make it easier to collaborate with people in the machine learning and data science community. Finally, given all the challenges I've talked about, I think there are a lot of opportunities for new research on the data science side, and I say that as someone coming from a data science background: part of what interests me about the geosciences is that there are a lot of interesting and challenging data problems here. Data science education in the geosciences will also help us better communicate with, and get assistance from, people in the data science community, facilitating collaborations so we can develop new solutions specific to the needs of geoscientists, in collaboration with data scientists. Interpretable methods are one area we highlight, along with methods that incorporate physics and domain knowledge, combining physical models and data-driven models. And that's where I'll conclude, so thank you everyone. Thank you so much, Karianne, for so clearly articulating the scope of the meeting and presenting such an excellent structure for further discussion. We have time for just a couple of quick questions; I'll remind everyone that we have 30 minutes of discussion at the end of this session, so please ask only specific questions for Karianne right now and leave the commentary and more general questions, for example about where the join is between geoscientific research and data science research, or the open-source and open-access discussions, for later. I'll also remind everyone to please use a microphone, or put your hand up so we can get you one, so that those listening online can hear the entire discussion. Do we have a brief question? Tor, then Jeff, and then we'll move on. Great talk, Karianne. I was wondering if you could explain a little more how those ODE-guessing machine learning methods work, the example of the reaction-diffusion system where the machine learning came up with the ODE. Yeah, so that's not my work, so I'll do my best, and I can refer you to the paper. My understanding of the paper is that they're essentially trying to learn the model
by enforcing a constraint of sparsity. They want to learn a model that is sparse, and they design their network so that each term in the model corresponds to some term in the physics equation. They use a machine learning model that learns to reconstruct the data, but it is forced to pass through a layer that represents a sparse, physics-type model. By enforcing that sparsity constraint, and because most physics models, or at least a lot of governing equations, are sparse in the sense that there aren't a million terms in the equation, there are usually just a few, they can run the reconstruction of the data through a model that's looking at the physics, pull out the coefficients, see which ones are non-zero, and map those back to the equation, if that makes sense. And the input is a time series of realizations of the system, which it then temporally interpolates to arrive at the ODE? Yes, they're using the time snapshots, and I think they also directly compute the derivatives from those. In this particular example, and there are other groups doing similar things in different ways, my understanding is that the objective of the model is essentially to learn how to reconstruct the data. I don't think I can go back to the slide, but it shows two networks with a kind of skinny piece in the middle, and that's basically forcing the reconstruction through a sparse, physics-type model.
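A toy, stripped-down version of that sparsity idea, using plain sparse regression rather than the paired-network architecture just described, looks like this; the system, the candidate library, and the penalty value are all made up for illustration.

```python
# Toy version of finding a governing equation by sparse regression: for
# data generated from dx/dt = -2x, build a library of candidate terms,
# estimate the derivative from the snapshots, and let an L1 penalty keep
# only the few terms that matter.
import numpy as np
from sklearn.linear_model import Lasso

t = np.linspace(0, 5, 500)
x = np.exp(-2 * t)                 # "snapshots" of the system
dxdt = np.gradient(x, t)           # derivatives computed from the data

# Candidate library of terms: [1, x, x^2, x^3]
library = np.column_stack([np.ones_like(x), x, x ** 2, x ** 3])

fit = Lasso(alpha=1e-3, fit_intercept=False).fit(library, dxdt)
print(fit.coef_)   # sparse result, close to [0, -2, 0, 0], i.e. dx/dt = -2x
```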
Okay, and Jeff. Yeah, I wonder, looking at the discovery mode: suppose you devise a black box that finds a pattern in the data that you didn't realize was there. Could you give a concrete example of what you would then do to try to figure out what's causing the pattern? It has identified something that's there; how would you go about teasing out what's actually going on inside? Yeah, that's an open challenge. I have been reading about it, so I can give you some examples. One thing people sometimes try to do is use models that are easier to interpret. Linear regression is fairly easy to interpret: you have a few coefficients and you can tell yourself a story about what's going on. So some people will take the model they've learned and try to fit a global or a local linear model that approximates the solution, maybe just locally, and then interpret those coefficients; that's one way. For some of these neural networks there are other techniques that try to visualize the different features being learned. One of the reasons deep learning methods work really well, and I took out the slide, maybe we should have kept it, is the same thing that makes them hard to interpret: they're learning both the prediction or classification task and, at the same time, the features to use for that task. So it can be difficult to understand what features the network is teasing out. There are people who have developed techniques to poke at the network and try to figure out what those features are, what kinds of things excite or activate different parts of the network. There's some controversy over how much those techniques actually tell you and how useful they are, but you can get some information out of them in some cases. For instance, there was Google's Deep Dream, where they were taking images and making them look like they had all these strange things hidden in the background; if you google Deep Dream you'll find blog posts with all these crazy images, and those are the kinds of techniques where people are trying to tease out what's going on in the network. A lot of these techniques work well for tasks that humans are good at, like images. When I talked about unsupervised learning I had those two examples of grouping images and learned features, and there was a kind of cat face that came out of a neural network; that was interpretable, a person could look at it and say, I think that's a cat. You may get that kind of partially interpretable result for images, but it can be hard with scientific data, because they are harder for humans to interpret, so many of the techniques people are developing for interpreting, say, image classifiers don't always translate. It is an area of active research; people are really interested in whether we can understand what these models are doing, but it's hard because the models are really complex. Making linear approximations is one approach that I think has been a little more successful, but it's always a question of how much it's actually telling you. Okay, thank you so much, Karianne. Now we'll move on; our next speaker is Zach Ross from Caltech. Could you just tap the mic and be sure it's on, Zach? It's on? Okay. So, earthquakes have this really funny property where the smaller they are, the more of them you get, and I think a snapshot of data like this illustrates it clearly. This is 15 minutes by basically 24 hours, right during the heart of the recent Ridgecrest earthquake sequence, and the magnitude 7.1 mainshock is this big red waveform here. You can see all the activity going on at all these different scales, and of course the more you zoom in, the more you're able to see; this continues all the way down to the background noise level, where we're able to pick up vehicles, nearby air traffic, and remote earthquakes, things like that. A key goal in seismology is to recover as many of these events as possible so we can build catalogs of them, but typical techniques fall short of recovering most of what we're actually recording in the data. These hidden events are really important because they fill in the gaps between the larger ones and tell a much more complete story about how these sequences evolve in space and time, but they also represent the vast majority of the data that we have, so being able to identify all of this moves everything forward quite a bit. So this is a map of Southern California here.
It may be a little hard to see, but every dot is basically an earthquake. We can look at the magnitudes associated with these and count them; here's a histogram, and if you look at its envelope you can see it has roughly linear scaling down to about magnitude one and a half or so, which tells us, because we believe this scaling keeps going, that we start missing events below about magnitude one and a half. If we were able to push this down by, say, a full magnitude unit, we would expect, with a b-value of basically one, to find something like ten times more events, which means a lot more information available to use. So how do we traditionally detect earthquakes in the first place? For decades this has been dominated by the use of moving averages, and this underlies almost all real-time seismic operations worldwide. You have some seismic data like this and you run a couple of moving averages over it: a really short one designed to track the running signal level, and a longer one designed to track the running noise level. If you take the ratio of the two you get something that looks like this, which is supposed to increase when there's some kind of impulsive, transient signal and decrease otherwise. You can set a threshold above which you trigger, and you say you've detected something. At that point you don't know whether it's an earthquake; it's something that might be an earthquake and needs to be subjected to additional tests and so forth. But this is really the key algorithm behind almost all earthquake monitoring, even today.
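In code, that short-term-average over long-term-average (STA/LTA) detector is only a few lines. The sketch below runs on synthetic data; the window lengths and trigger threshold are arbitrary illustrative choices, and production systems (for example, the trigger routines in ObsPy) handle causality and edge effects more carefully.

```python
# Compact version of the moving-average (STA/LTA) detector described above:
# a short average tracks the running signal level, a long one tracks the
# running noise level, and their ratio is compared against a trigger
# threshold. Data, window lengths, and threshold are all illustrative.
import numpy as np

def sta_lta(trace, n_sta, n_lta):
    energy = trace ** 2
    sta = np.convolve(energy, np.ones(n_sta) / n_sta, mode="same")
    lta = np.convolve(energy, np.ones(n_lta) / n_lta, mode="same")
    return sta / (lta + 1e-12)

fs = 100.0                                    # samples per second
t = np.arange(0, 60, 1 / fs)                  # one minute of data
trace = np.random.normal(0, 1, t.size)        # background noise
trace[3000:3200] += 8 * np.sin(2 * np.pi * 5 * t[3000:3200])   # impulsive arrival at t = 30 s

ratio = sta_lta(trace, n_sta=int(0.5 * fs), n_lta=int(10 * fs))
triggers = np.where(ratio > 4.0)[0]           # samples exceeding the threshold
print("first trigger at t =", triggers[0] / fs, "s")
```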
So how do we go about building a catalog? We take data like this, potentially streaming in in real time, with packets flowing in every second, and we run these moving-average detectors across the whole thing; they make tentative phase picks continuously at all the different stations in the network. At the same time we have an algorithm we call an associator, which is basically a decision module that looks at combinations of all these picks across the network and checks whether some subset of them back-projects to a coherent origin somewhere within the network. If there is one, we take all the picks that back-project there, convert them into phase arrivals, and say the event is formally detected at that point. Once we've done that, we can locate the event using those picks, calculate magnitudes, derive earthquake source properties, and everything else that follows. This is important because this catalog, everything leading up to this point, is the starting point for almost all downstream analyses in seismology: seismic tomography studies, tectonic analyses, analyses of earthquake source properties, all of that depends heavily on this catalog and the measurements it contains. So you might say, well, we have this great workflow running in real time, it must be pretty useful. In reality, setting up something like this is incredibly difficult. It takes lots of people to precisely calibrate all of these algorithms for a specific dataset, and seismologists are constantly deploying instrumentation to record data in new places where you've never seen anything before and don't know where the earthquakes are coming from; setting this up there is a very non-trivial exercise. So we really lack the capability to extract information easily from these large datasets, and we would also like to move beyond these old-school algorithms based on moving averages, which don't know what earthquakes look like, to really improve on all of this. Today I'm going to talk a lot about how to do this with deep neural nets. I'll focus particularly on these parts, the phase-picking algorithms and the association step, but I'll also touch on a few other areas where this whole workflow can improve; again, improvements here, streamlining this process and gaining sensitivity, have the ability to impact everything downstream. I'm going to talk a lot about convolutional networks today, so I want to highlight the particular aspects of convolutional networks that make them very well suited to seismological purposes. Convolutional networks are systems that let you input some kind of raw data, in our case a waveform, and output a set of predictions, for example the likelihood of a seismic wave, or of a particular type of seismic wave, with some kind of nonlinear mapping in between. In particular you have a feature-extraction system combined with what is basically a fully connected neural net. The feature-extraction system is the heart of the convolutional network, and it's designed to account for the translation-invariant structure we expect in the data. Seismograms should be translation invariant in the sense that if you cut some window, the arrival could be anywhere within that window, and you need to be able to extract that information automatically, so this type of structure is very well suited to time-series seismic data. The system works because you have a stack of short filters whose values are learnable. You take these filters and convolve them with the input, pass the result through some kind of activation function, and that produces basically a heat map that activates when the filters find something they're looking for. From there you down-sample those activations and repeat the whole process, convolving again with the outputs of the previous layer, so you're again finding things you're looking for but now at a different scale, passing through an activation function, and repeating a number of times. The idea is that by the time you finish, you've extracted patterns from the data in a translation-invariant manner. From there you take those features, pass them into a standard neural net, and use that to make a class prediction about the things you care about. That's basically what makes convolutional networks state of the art for all sorts of problems where the data have translation-invariant structure, and they're excellent for this task because they're able to generalize the knowledge contained in extremely large data sets, which means you don't need some kind of specific match to things you've seen before in the past.
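As one concrete, purely illustrative example of such an architecture, not the specific network from any study mentioned today, a small 1-D convolutional classifier for three-component windows could be written in Keras as follows.

```python
# A small 1-D convolutional classifier of the general kind described:
# stacked convolution / activation / down-sampling layers extract
# translation-invariant features from a three-component window, and dense
# layers map them to class probabilities (P wave, S wave, noise).
# Layer sizes are illustrative, not those of any specific published model.
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(400, 3)),                  # e.g. a 4 s window at 100 Hz, 3 components
    keras.layers.Conv1D(32, 21, activation="relu", padding="same"),
    keras.layers.MaxPooling1D(2),
    keras.layers.Conv1D(64, 15, activation="relu", padding="same"),
    keras.layers.MaxPooling1D(2),
    keras.layers.Conv1D(128, 11, activation="relu", padding="same"),
    keras.layers.MaxPooling1D(2),
    keras.layers.Flatten(),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(3, activation="softmax"),  # P, S, noise probabilities summing to one
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
model.summary()
```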
Now, the major limitation of all this is that these models require large amounts of labeled data, meaning, as Karianne discussed earlier, ground-truth examples of the phenomenon. In seismology it turns out we are quite rich in labeled data: we've been labeling earthquakes and phases for more than a century at this point. An example is the data set we hold at Caltech, with labeled phase data back to 1932; it is associated with millions of earthquakes and tens of millions of hand-measured P- and S-wave arrival times, and we also record and save first-motion polarities, which we can use to calculate earthquake focal mechanisms. So for the first part of this, the goal is to use these large labeled data sets to train deep neural nets to do earthquake detection very well.

How might this work? Here's a snapshot of three-component data, E, N, and Z, which might have a couple of earthquakes in it; here are four events, where red marks P waves and blue marks S waves, and they have the same S-minus-P time, as you can see. Given this time series, we would like to output probability time series of equivalent length, actually several of them: one for P waves, one for S waves, and you can also add one for noise. You would want these to sum to one at any given time step so they can be interpreted naturally as probabilities. Whenever a seismic wave is present you would like the corresponding trace to go from near zero up to one and back down again, so that you can set a simple trigger threshold and just read off and log your seismic wave detections. There are a lot of different ways to get this. You could take a model like the one I showed before that outputs three numbers for a single window of data, assign those numbers to the center time point of the window, and slide the window along to generate a time series. You could also pose it as a multi-target prediction problem, where given the input time series you output the whole probability time series at once; that's been done successfully too. So there are many strategies, and it's not limited to any one particular architecture.

I'll talk about one particular example. We curated a big data set: almost three hundred thousand earthquakes recorded at seven hundred or so stations, about four and a half million seismograms in total, split evenly between P waves, S waves, and what we call pre-event noise. Here are some random draws from that data set, P waves, S waves, and noise. We can have a model that looks at these and learns to classify them based on the raw time series alone: we take the whole data set and train the model end to end to minimize the prediction error directly against the ground truth, adjusting those filter parameters accordingly.
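The sliding-window strategy described above can be sketched as follows: classify every window of continuous data, assign the three class probabilities to the window's center sample, and trigger where a phase probability crosses a threshold. The model file name, window length, and hop size are hypothetical placeholders; this is not the published code.

```python
# Sliding-window prediction: turn a window classifier into probability time series.
import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model("p_s_noise_classifier.h5")   # placeholder for a trained CNN
WIN, STEP = 400, 10                                              # samples per window and hop (assumed)

def probability_series(stream):
    """stream: (n_samples, 3) continuous 3-component data -> center indices and class probabilities."""
    starts = np.arange(0, len(stream) - WIN, STEP)
    windows = np.stack([stream[s:s + WIN] for s in starts])
    probs = model.predict(windows, verbose=0)                    # softmax over P / S / noise
    centers = starts + WIN // 2                                  # time index each prediction refers to
    return centers, probs

def trigger(centers, probs, class_index, threshold=0.95):
    """Return center-sample indices where the chosen phase probability exceeds the threshold."""
    return centers[probs[:, class_index] > threshold]
```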
I'll show some examples of how this works in practice. Here's a swarm that occurred a couple of years ago near the southern end of the San Andreas, near Bombay Beach, California. These circles are events that were part of that swarm, and we applied this to continuous data recorded during that time. Here's a chunk of data ten minutes long. Running in sliding-window mode, classifying each window and assigning the result to its center time point, we can generate a probability time series like this and trigger every time it exceeds some threshold, where the colors indicate the different types of seismic waves detected. We can zoom in and see the temporal sensitivity that comes out of it: here is about two minutes of data containing about fourteen events, and we're able to resolve events that are only a few seconds apart, for both phases, with a very quick turnaround between them. Even when they're close to overlapping we can still resolve them, and that's really not a capability we had with the moving-average techniques. We can run on even longer data: here's 24 hours, and looking at it relative to the onset of the swarm we can track how the number of detections increases over time. The swarm starts right here, the two types of seismic wave detections rise together, and we end up with something like eight times as many phases detected over this 12-hour period relative to the original regional network catalog we were comparing against. So a technique like this gives a lot more sensitivity; it's really quite powerful.

There are lots of other exciting applications of deep learning to detection-related processes. One is signal denoising. This work by Zhu et al., out of the Stanford group, takes a noisy input seismogram and, given ground truth, learns to split it into a clean, noise-free seismogram and "clean noise," whatever that really means. In between there's a model that learns a low-dimensional representation of the input and then uses that representation to reconstruct the clean signal. I think this is a really exciting future direction for applying neural nets in seismology: it will help us detect seismic waves better, measure their arrival times better, and measure other signal properties, and obviously there will be a lot more work in this area.

There are also very exciting applications to phase arrival picking. You can train a model to predict, for example, the onset time of a seismic wave within a given window. Here we took about a million phase picks and compared the predictions against the ground truth, and 75 percent of them are within about three samples of what a human could do. If you put a hundred people in a room and had them each measure the onset of these seismic waves, that would be roughly the error of what a human can do, so we're basically at that level or potentially even better. And if you know the history here, seismologists have been developing automated algorithms that have been chasing the performance of human analysts for about forty years.
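The denoising idea described above, compress the noisy record to a low-dimensional representation and reconstruct a clean signal, can be illustrated with a simple convolutional autoencoder. The published Stanford method differs in its details, so treat this only as a generic sketch of the concept; sizes are assumed.

```python
# Generic denoising autoencoder sketch trained on (noisy, clean) waveform pairs.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_denoiser(n_samples=3000):
    inp = layers.Input(shape=(n_samples, 1))                    # noisy single-channel seismogram
    x = layers.Conv1D(16, 9, strides=2, activation="relu", padding="same")(inp)
    x = layers.Conv1D(32, 9, strides=2, activation="relu", padding="same")(x)   # low-dim representation
    x = layers.Conv1DTranspose(32, 9, strides=2, activation="relu", padding="same")(x)
    x = layers.Conv1DTranspose(16, 9, strides=2, activation="relu", padding="same")(x)
    out = layers.Conv1D(1, 9, padding="same")(x)                # reconstructed clean seismogram
    model = models.Model(inp, out)
    model.compile(optimizer="adam", loss="mse")                 # fit(noisy, clean) on synthetic pairs
    return model
```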
All of these algorithms, even up until maybe two years ago, were still far behind what a human could do, and now with deep learning that has changed dramatically. Of course this propagates into everything downstream. There are other architectures you could use to do the same thing: this is PhaseNet, which uses a U-Net type architecture to predict the entire probability time series for a given input waveform, for both P waves and S waves, and when you compare it against an earlier automated picker you can see clearly that the uncertainties shrink significantly. I think this is going to keep getting better, and it has very fundamental implications for everything that depends on earthquake catalogs and their quality.

We also have a long-standing problem in seismology that I mentioned briefly before, which we call the phase association problem: given an unknown number of earthquakes, possibly none, and a set of phase detections across your whole seismic network, you want to assign each phase detection to the event that caused it. Look at a cartoon like this, with station latitude, in some normalized sense, plotted against time, where each little circle is a phase detection made somewhere within the network. If we apply one of the picking approaches I described, we can think of reducing this very high-dimensional data set to a handful of discrete trigger times distributed across the network. You end up with something like this: potential moveout patterns for different seismic waves, which may overlap, plus other stuff that you want to discard altogether. So you essentially have a supervised clustering problem, where the rules of the clustering are characterized by the physics of wave propagation in the Earth. A trained expert can look at something like this and recognize that it looks like a seismic wave because it has a moveout shape; if you look at record sections you can do the same thing. The idea here is to use deep neural nets to do the same kind of thing: learn to recognize the characteristic patterns that seismic waves make as they sweep across a network.

The algorithm I'm going to talk about is based on recurrent neural nets. These are not quite the same as the networks I discussed before, because they're built for data with sequential structure. Typical fully connected neural nets lack any mechanism for learning sequential structure; recurrent neural nets achieve it with an internal memory state of some kind. You can think of passing in a sequence one element at a time, making a prediction, and feeding part of the extracted information back in, so that when you process the next element you can use the context from the previous one to make a better prediction. You keep looping over each element, and of course the more elements you have, the more context you have.
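A toy NumPy illustration of that recurrent idea, not a production implementation: an internal memory state is updated at every step and fed back in, so later predictions can use context from earlier elements.

```python
# Minimal recurrent update: h_t = tanh(W_x x_t + W_h h_{t-1} + b)
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden = 4, 16
W_x = rng.normal(scale=0.1, size=(n_hidden, n_in))       # input weights
W_h = rng.normal(scale=0.1, size=(n_hidden, n_hidden))   # recurrent (memory) weights
b = np.zeros(n_hidden)

def run_rnn(sequence):
    """sequence: array of shape (T, n_in); returns hidden states of shape (T, n_hidden)."""
    h = np.zeros(n_hidden)                 # memory starts empty
    states = []
    for x_t in sequence:                   # one element at a time
        h = np.tanh(W_x @ x_t + W_h @ h + b)   # new memory mixes the input with the old memory
        states.append(h)
    return np.array(states)

states = run_rnn(rng.normal(size=(500, n_in)))   # e.g. a sequence of 500 elements
```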
These types of neural nets are, or have recently been, state of the art in many areas of artificial intelligence, including speech recognition and speech synthesis. There are a couple of variants: one is called a long short-term memory (LSTM) network and another is called a gated recurrent unit (GRU). They have different properties, but for the purposes of today's talk they're very similar. Again, the idea is that they take in a sequence, extract the context from that sequence, and use that information to make a better-informed prediction.

We developed an algorithm we call PhaseLink. It's a deep neural net approach to the seismic phase association problem, designed to take all of the picks shown in this cartoon and treat them as a sequence. Each pick is tagged with the latitude and longitude of the station, the time at which it occurs, and the type of phase we think it is, and if you sort them in time order this becomes your sequence. The idea is that for a given window of picks, in our case 500 of them, we make 500 predictions, one per pick: which of these 500 picks should be linked to the very first pick in the window? So it's a binary sequential prediction problem; we simply want to link together picks that we think came from the same event. Say we input this window, which has only six picks in the cartoon: the model scores the first one a one, because it's from the same event as itself; it thinks this one is from the same event as the first, but this one is not and should be discarded; and then one, one, and zero. We end up with a linking structure produced by the model that has learned how to do this. So we have a model that, given training data with ground truth, learns to recognize wavefronts and make these predictions accordingly. We do this for a sliding window, so that every pick becomes the root pick of the window at one point or another; doing this for all possible lags of the sliding window, you build a large, highly sparse matrix that automatically breaks all of the picks up into a graph, defining how each one is linked to all the others. It's essentially learning to recognize these patterns because of the shapes seismic waves make as they sweep across a network, so we can solve the phase association problem just by giving it ground truth, in a way similar to what a human can do. A human can look at a record section from a totally different region and still see a seismic wave sweeping across the network, because we know its characteristic shape; here the neural nets are learning to do the same kind of operation.
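A hedged sketch, not the published PhaseLink code, of a recurrent model of this kind: it reads a window of picks (each tagged with station latitude, longitude, arrival time, and phase type) and outputs, for every pick, the probability that it belongs to the same event as the first (root) pick in the window. The encoding, layer sizes, and use of a bidirectional GRU are assumptions.

```python
# Per-pick link / no-link prediction over a window of picks.
import tensorflow as tf
from tensorflow.keras import layers, models

N_PICKS, N_FEATURES = 500, 4          # window size; lat, lon, time, phase type (assumed encoding)

model = models.Sequential([
    layers.Input(shape=(N_PICKS, N_FEATURES)),
    layers.Bidirectional(layers.GRU(128, return_sequences=True)),
    layers.Dense(1, activation="sigmoid"),   # one link probability per pick, relative to the root pick
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(pick_windows, link_labels, ...)  # labels would come from ground-truth associations
```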
So I've talked about a number of different algorithms. We can take the phase detection part, couple it to the association part, and run the whole thing end to end on a big chunk of continuous data. Here I'm taking three years of continuous data from Southern California, for all of these stations, and we detect something like 86,000 events; these are all the locations that come out. That's roughly five times as many events over that three-year period as we had before, and I think it will continue to improve as the algorithms get better, but this starts from absolutely nothing: raw waveform data to an end-to-end detection catalog. To be clear, this has always been a non-trivial exercise. People spend years collecting data and trying to get to the point where they can build the catalog they need for the projects they're interested in, whether those are short-term deployments or something else; building catalogs from scratch with no knowledge of the area has been a barrier, and we're hoping this can streamline the process significantly and move everything forward.

Here are some quick examples. Here are some seismic waves that were detected, and it turns out we typically end up with many more S picks than P picks now, which is the opposite of the problem we had before: the P waves tend to be quite attenuated at far distances, but the S waves are still slightly above the noise level. If we zoom in here, we can see this is starting to nail the seismic wave onsets even at very low signal-to-noise conditions, where we would never have a human picking them because they wouldn't be confident doing it.

The last example I'll mention briefly is that we can also do this for focal mechanisms. We took a data set of almost five million first-motion polarities, which are used to construct these beach-ball plots that tell us how the fault moved and in what direction. Typically we do this by measuring whether the ground moves up or down at every site and then reconstructing the fault planes accordingly. We took a model that learns to measure these first-motion polarities, ran it on about 150,000 earthquakes, compared the polarities that come out with what the humans measured, and inverted both for focal mechanisms. Here we have the polarity misfit, where smaller values are better, and the number of focal mechanisms that come out of the operation. For the machine, the misfit is lower and we get many more focal mechanisms out. This is not about overfitting, because there are only three parameters and it's the exact same set of events in both cases; we simply get many more focal mechanisms. So we can conclude that the machines do a better job than the humans, and not just on the mechanisms: for the first-motion polarities themselves, we found that something like 20 percent of the polarities the humans picked are probably wrong, and the machine does better because it picks the opposite sign for those. There's a lot of promise here, because this is completely automated: you take a data set of hundreds of thousands of events and it spits out focal mechanisms as a result.
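A hedged sketch, not the actual model, of how first-motion polarity measurement can be posed for a network like the ones above: a small CNN looks at a short vertical-component window around the P arrival and outputs the probability that the first motion is up versus down. Window length and layer sizes are assumptions.

```python
# Binary up/down polarity classifier for a short window around the P pick.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_polarity_model(n_samples=400):
    model = models.Sequential([
        layers.Input(shape=(n_samples, 1)),                 # vertical component around the P arrival
        layers.Conv1D(32, 11, activation="relu", padding="same"),
        layers.MaxPooling1D(2),
        layers.Conv1D(64, 7, activation="relu", padding="same"),
        layers.MaxPooling1D(2),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(1, activation="sigmoid"),              # P(up); 1 - P(up) is the down probability
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model
```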
So, looking forward: because of these large labeled data sets and data-driven problems, seismology has very quickly become a leader in applying machine learning within its domain. What I'm seeing is that over all these decades we've had a general inability to build a catalog from scratch for arbitrary data sets on demand, and that has always been a barrier to all the downstream analysis we do. Machine learning is providing end-to-end solutions that are making these barriers disappear, and very quickly. I think this is going to transform experimental seismology: if you have a science question you're interested in, you can now deploy your instruments, collect the data, and quickly build the catalogs you need for the questions you actually care about. It has always been the case that people spend years getting to the point where they can do the real science they care about, because we didn't have techniques that work on arbitrary data sets on demand, so I think that's an important takeaway. The last point is that the automated measurements coming out of this are as good as, and probably better than, the human ground truth, which means we're looking at much more expanded catalogs, better locations, improved tomography studies, improved understanding and imaging of earthquakes and faults, better understanding of earthquake source properties, and everything else that depends on this information. I think it's a really exciting time, and I'm looking forward to the future. I'll stop there. Thank you.

We just have time for one question, and I'm actually going to go ahead and ask one as well. It seems like the obvious point is that you're working with an immense data set, so you have pre-existing labeled data sets for the machine learning, but most experiments these days go to new areas where we don't have that benefit, or where we may have only a single nearby permanent station. So it's not a solution to all kinds of problems, and it's directly relevant to thinking about how SAGE, GAGE, and new facilities are actually going to be implemented and how experiments will be run. Do you have a comment, or maybe we could talk about this later as well?

We've found from extensive testing over many regions that, for these problems, the models trained on the Southern California data sets, for example, generalize quite well to other regimes, and that's very important. There are some cases where things seem to break down a bit, potentially where the source depths are much different, so the polarization is quite different, but there are a number of great examples where these models generalize very well. We're taking models trained on Southern California data and applying them to induced-seismicity data sets at shallow depths in other countries, and they function quite well.

This is maybe a related question. You mentioned that you think the models are even better than the human ground-truth data sets, but your models are...

Well, we've seen this in computer vision too, where people have taken data sets like ImageNet, which is labeled images from humans, and the machines are at a point where they can actually do better than the humans. You can estimate the human error by putting a bunch of people in a room and having them each measure the same image and tell you what they think it is.
The variance on that is the human error rate; then you run the machine algorithm and it does better than that. So they can quantify the baseline for human error and show that these deep neural nets are now outperforming human capability. That's the error estimate.

So that would be estimating the error rate of the model compared to the human labeling error rate, but I'm asking about mistakes that humans commonly make. You mentioned that humans may not label a certain type of wave because they're not confident, so they never would have labeled it. If they're not labeling it, how is that influencing your model's predictions? Why is it labeling things that were not included in the training data?

Not for that specific example, but in general, it's because there are still generalization properties. How far you can generalize a wave from examples seen only in the past is not exactly settled, and it is problematic, but there are certainly still generalization properties there.

Yeah, it's a great question, but there's a lot there, and I imagine we can return to it in the discussion at the end. I think it's time now to move to Diego; do you want to be mic'd up? So the last speaker in this session is Diego Melgar, who will talk about rapid forecasts of earthquake hazards from crustal deformation patterns.

Okay, so I'm going to talk about a completely different problem, and thanks for the invitation, first of all, to speak here today. I'm going to discuss how we can make short-term forecasts, meaning within minutes, of earthquake hazards from onshore geophysical data, and I'm going to focus on my favorite sensor, which is GNSS, though I hope this will extend to other geophysical measurements as well. This is very much work in progress, so some of what I'm going to show is still partly aspirational. It's been a big collaboration: mostly this is Jiun-Ting Lin's PhD thesis, but there's also a lot of work from two very talented postdocs and two colleagues in the department who have taken on different bits of this. We've benefited enormously, and this goes to Karianne's point, from the fact that the supercomputer center on campus hired research staff for data science the same week I was hired, so we've benefited a lot from interactions with them as a sort of ground truth: we frequently ask them whether what we're doing in terms of data science and machine learning is stupid or not, and they're a great way of checking that for us.

I'm going to start by focusing on one problem. This extends to other earthquake hazard problems, but in many ways this is the easier one: local tsunami warning. I'll direct you to this map. This is a simulated magnitude 8.2 earthquake, and these are the arrival times of the first peak of tsunami amplitude at the local coastline; this is just to situate you in how quickly we need to respond, and the times are down here. The first arrivals, right offshore of the earthquake, are at about 17 minutes, and that's a relatively good situation; if the earthquake were down here, where the continental shelf is shorter, we're looking more at an eight-to-nine-minute time frame before the arrivals happen. So we need to characterize the earthquake and the hazard very quickly, and local tsunami warning remains a challenging problem in all tsunami-prone regions; different parts of the world deal with it in different ways.
Japan probably deals with it best, but nobody deals with it perfectly, or in many cases even well. So we're going to focus on real-time GNSS. Most of you know what a GNSS station looks like: an antenna firmly coupled to the ground via a geodetic monument, and when the earthquake starts, let me wind that back again, you get this beautiful signal of the deformation across the crust. In this case I'm showing the Tohoku-oki magnitude 9 earthquake; you don't need to be a crack seismologist to figure out that the source is mostly around there, and that if things moved about five meters it's probably a big event. We want to exploit this kind of pattern. Here are a few other examples. This is another very large event, the Maule earthquake in 2010, magnitude 8.8, also with about five meters of crustal deformation, recorded across a much sparser network. The important thing about Maule, which I want to impress upon you, is that when you think about GNSS you shouldn't just think about these arrows, these vectors pointing out to sea; what's also important about GNSS is its time history, how we get from no deformation to the final static deformation. For this station that moves five meters, here is its time series in the north, east, and up directions. There's a lot of richness and detail in the trajectory of how we get to those five meters of deformation, and that's also a pattern we want to exploit when we characterize the earthquake and the hazard itself.

There are some simple correlations between GNSS and tsunami impacts: if the GNSS signal is large, the tsunami is large, that much is quite obvious. We see that in Tohoku-oki and in Maule, where about five meters of crustal deformation produced tsunamis in the 20-to-30-meter range. In the 2015 Illapel earthquake we see somewhat more muted displacements, about two meters, because that earthquake was a little smaller, magnitude 8.3, but we still see a very large tsunami, and this pattern of all the stations lurching out to sea is always a telltale sign of a traditional megathrust earthquake. But the crustal deformation patterns also tell us when the tsunami is going to be more muted. Here are some examples: the magnitude 8.3 Tokachi-oki earthquake, where only about 50 centimeters of crustal deformation led to a tsunami in the one-to-two-meter range; the same for the 2014 Iquique earthquake, a magnitude 8.1, where we only have a two-meter tsunami; and the Ecuador earthquake a couple of years after, where we only have a one-meter tsunami from one meter of crustal displacement. Of course the correlation is not simple. Some broad-scale features, like big crustal deformation equals big tsunami, are simple, but the details are not simple at all.

What we're trying to exploit is the fact that when an earthquake breaks the megathrust, if you look at this cutaway of the oceanic slab subducting beneath the continental mantle, what gets recorded at the top, on the continental crust, is the deformation caused by heterogeneous slip. It's never a uniform 10 meters or 30 meters, and whether you get 10 meters of slip deep on the megathrust or 10 meters shallow on the megathrust will generate a very different tsunami, so we need to capture that variable behavior on the subduction zone interface. Indeed, when we do a traditional slip inversion that's what we see: heterogeneous slip all across the megathrust. This is a very well-known feature of earthquakes; there's nothing new here.
But we need to capture that complexity, because where the slip happens controls where the deformation happens, and that eventually drives what the tsunami is going to look like. You can see how complex that can be: for that same magnitude 8.3 earthquake, when we calculate the vertical deformation that starts the tsunami, we get this very complex signal. This is from traditional seismological techniques, from slip inversion. We see about three meters of uplift on the upper slope and three meters of uplift on the middle-to-lower slope, which will lead to two packets of tsunami generation, two big waves, and it also drives when the tsunami arrives. Importantly, if we look at a GNSS station onshore, while it isn't exactly measuring this uplift and subsidence, if you squint you can see these three periods of growth: growth in the GNSS, a little break, growth in the GNSS, a little break, and then the final growth to the static offset. This time variability is related to the same features of the earthquake that make this patch of uplift and this patch of uplift. So we want to use this temporal information, in addition to the size of the offset, to characterize the earthquake and the tsunami itself.

One other thing that is very important about GNSS is that these crustal deformation patterns also tell you when something is not that hazardous. The most pointed example is the magnitude 7.9 earthquake that happened offshore Alaska early last year. It was a strike-slip event, lateral faulting with almost no vertical deformation, so there was only a very small, modest tsunami, about 20 centimeters. But the Pacific Tsunami Warning Center, sorry, the National Tsunami Warning Center, who are in charge of that area, don't currently use GNSS operationally, so what the watch standers saw in real time in the first 15 minutes was a magnitude 8 close to the trench in a subduction zone, and they naturally issued a far and wide warning to all of Alaska, the Pacific, and the US west coast saying there was a likelihood of a large tsunami. GPS easily reveals the difference, because in the crustal deformation patterns, instead of the familiar lurching out to sea, you would see what seismologists will recognize as the four-lobed pattern of deformation characteristic of a strike-slip earthquake. GNSS captures that as well.

But perhaps what's most exciting about GNSS is that it also helps you identify tsunami earthquakes. This is a boogeyman if you've ever worked in tsunami science; "tsunami earthquake" is a term I'm not in love with, but it's part of our vernacular in seismology: a tsunami earthquake produces a tsunami that is much larger than you would expect for its magnitude. The most recent example we have is the Mentawai earthquake in Indonesia, only a magnitude 7.8, which produced a 15-meter tsunami with widespread devastation and a large loss of life in the Mentawai Islands. These are very difficult events. Here are results of the tsunami survey led by Emma Hill; you'll see that in some places the inundation, or the run-up, was up to 17 or 18 meters, so this was a very, very large event. In a recent paper we were finally given access to the local seismic data, and what we found is, to me, very striking. If you compare the seismic recordings from similarly sized events in other parts of the world, not tsunami earthquakes, just traditional megathrust events, you'll see what look like regular strong-motion seismograms, roughly of the same size; these stations are roughly the same distance from their causative earthquakes, so nothing too surprising there.
But when we look at the Mentawai earthquake, you see almost no ground motion associated with that event; there is almost no shaking associated with that tsunami earthquake. These events are terrifying because the shaking is like a magnitude six but the tsunami is like a magnitude nine, and that makes them incredibly challenging to identify in an operational setting. Here too GNSS can help. We know that the reason these tsunami earthquakes happen is that they break the shallowest part of the megathrust with incredibly large slip; in this slip inversion by Han Yue and Thorne Lay you'll see up to 20 meters of slip for only a magnitude 7.7 to 7.8 earthquake. Meanwhile, more traditional subduction zone earthquakes of the same magnitude are deeper on the megathrust: you see them here in Costa Rica, this one in Chile, another in Chile, and in Ecuador, with two, three, four, five meters of slip, and because they're deep they don't generate a large tsunami. GNSS sees this. If you look at the GNSS recordings for all of these events, you'll see the traditional pattern in the normal, common megathrust events, a short, abrupt rise to a large peak displacement, whereas in the Mentawai tsunami earthquake you see a very long, drawn-out growth to the final deformation. This is exactly what we want an algorithm to be able to do: to say the magnitude of the event is 7.8 or 7.9 because the displacement is of a certain size, but also to look at the trajectory so it can say something more educated than just the magnitude.

So a working hypothesis has been, for some time, that time-dependent onshore crustal GNSS can be used to forecast something that is occurring offshore, to forecast the tsunami impact, because that GNSS time series contains information that correlates strongly with what is happening offshore, with the deformation of the seafloor that eventually causes the tsunami. But the relationships are not simple, and there are many edge cases we need to worry about, like tsunami earthquakes. Traditional seismological algorithms, scaling laws, moment tensor inversions, slip inversions, work well for some events, but it's hard to build an algorithm that works well in real time for every single instance of what is possible. If I say "complex correlation," the first place your mind goes is that we need machine learning to draw the lines of demarcation in those correlations. The problem we face is that, thankfully, large earthquakes are rare, so we do not have a real training data set of large earthquakes to use. We solve that through simulation: we simulate as many large earthquakes as we can, in a variety of situations, some of them tsunami earthquakes, some of them traditional megathrust events, and use that as the training data set for the machine learning algorithm. Of course, as I just said, these need to be realistic. That's a big challenge: how do you know that the magnitude 9.2 a computer just spit out, and its associated GNSS waveforms, are what earthquakes actually do? And can we train a time-dependent machine learning algorithm that takes not just the growth of the GPS but the path it takes, to make evaluations about the earthquake and forecast the hazards?
We take a two-step approach. First, we have a fairly efficient code called FakeQuakes that creates stochastic slip distributions, based on work by Martin Mai and Greg Beroza that is now 17 years old but has stood the test of time very well. We make stochastic slip patterns and then apply a set of kinematic rules to make those patterns evolve in time. We try our best to make these kinematic models correlate well with reality; we do have a few large events to compare against, and we are always making those comparisons to make sure things make sense. Everything in this code is a tunable parameter, so if you want to argue with me about why one value and not another, you can take the code base, apply some other value, and see for yourself what the results and the changes will be. What's nice about FakeQuakes is that we've put a lot of effort into parallelizing it, so we can very efficiently generate tens of thousands of simulations across a large GNSS network. Here's an example for Cascadia, a magnitude 8.6: you see these arrows unfolding as the pulse of slip propagates through the slip distribution, and we generate the one-hertz time series of deformation. The nice thing about GPS is that it's relatively simple, so we can use a realistic Earth structure and everything is deterministic; even though the source process is stochastic, the propagation through the Earth remains deterministic, so any structure you put in will be reflected in the resulting waveforms.

The other piece of this is GeoClaw, a tsunami modeling code out of the University of Washington created by applied mathematicians there. GeoClaw has come a long way since we first started working with it; it is now GPU- and CPU-enabled, so it runs very quickly, and it allows you to simulate the tsunami across a very large domain. This is an example for the Tohoku-oki earthquake. It also allows you to inundate the shoreline, so it's not just a reflecting boundary condition: you can actually see what happens as the tsunami inundates and produces overland flow. Recently they added some adjoint-method trickery to GeoClaw so that you only refine the parts of the model you care about and can ignore parts of the domain, like down here, where there's no tsunami propagation and you don't need to know what's going on.

So we take these two pieces, FakeQuakes to make earthquakes and GeoClaw to make tsunamis, to develop the data set we're going to use for training. Even though our end goal is to work on Cascadia, at least that's my goal because I live there, what we're focused on right now is Chile, because Chile has actually had some large events recorded by GNSS: it has had exactly five, which might not sound like much, but it's more than zero. The Iquique earthquake at 8.2 and its 7.7 aftershock, the Illapel earthquake at 8.3, Maule at 8.8, and Melinka at 7.6, so it spans a pretty good magnitude range. We've generated 50,000 simulations across the megathrust using a realistic slab geometry, over a very broad magnitude range, and we synthesize the GNSS at the 121 stations of what is today the Chilean operational GNSS network, and we use these five real events as ground truth for validation.
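To make the stochastic-slip idea concrete, here is a toy, self-contained illustration, not the FakeQuakes code: draw a spatially correlated random field over the fault, keep a positive, patchy pattern, and rescale it so the total moment matches a target magnitude. The rigidity, fault discretization, and correlation length are assumed round numbers.

```python
# Toy stochastic slip distribution scaled to a target moment magnitude.
import numpy as np
from scipy.ndimage import gaussian_filter

def stochastic_slip(target_mw, nx=80, nz=30, dx_km=5.0, mu=3.0e10,
                    corr_cells=6, rng=np.random.default_rng(1)):
    field = gaussian_filter(rng.normal(size=(nz, nx)), corr_cells)  # correlated random field
    slip = np.maximum(field - field.mean(), 0.0)                    # keep a positive, patchy pattern
    area = (dx_km * 1e3) ** 2                                       # area of one subfault (m^2)
    m0_target = 10 ** (1.5 * target_mw + 9.1)                       # seismic moment from magnitude (N m)
    slip *= m0_target / (mu * area * slip.sum())                    # rescale so total moment matches
    return slip                                                     # slip in metres per subfault

slip = stochastic_slip(9.0)
print(slip.max(), slip.mean())
```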
Here is what the data look like for one of these very large events, a magnitude 9.3: a station really close to the earthquake sees about nine meters of crustal deformation in the west direction, with some corresponding uplift and a north signal. We need a temporally dynamic algorithm, because it's not just about the size of the offset, it's about how you got there and about every little detail in the GNSS, and to those who know machine learning, this screams RNN. So that's what we use: a recurrent neural network. I'm not going to dwell on this too much; Zack did a pretty nice job of explaining it, and we want exactly what he described. We want to input the GNSS at five seconds, whatever the network is doing in the first five seconds, put that into the neural network, and make a prediction about the magnitude at five seconds. I'm going to start with magnitude and work my way slowly toward forecasting hazards. We also want information to carry through, so that at the next interval you put in what the GNSS network is doing at 10 seconds, but you also use whatever happened before to make the prediction at 10 seconds, and you keep going to 15 seconds, 20 seconds, 30 seconds. We use a plain-vanilla LSTM implementation of our RNN, and it works quite well; we've barely made any modifications to it. The way to think about the problem is that you have these vertical slices, where each slice is a scenario with its 121 GPS stations, and we take cuts at 5 seconds, 10 seconds, 15 seconds, 20 seconds; that's what goes into the training and into making the predictions of the earthquake magnitude. That's what we're asking the LSTM, the RNN, to do.

There are some implementation features of the training that end up mattering a lot. We pick one rupture randomly from all the available simulations and we add realistic GNSS noise. This is very important, because we also want the algorithm to be robust and not classify any little blip in the GPS time series as a hazardous event. From analysis of what happens in real-time GNSS networks worldwide we have a pretty good model of GNSS noise, so we make random draws from that noise model and add them to the simulated data. One of the most important things is that we also have to randomly remove stations, because 121 stations are not always recording every event: stations drop out, stations are being maintained, and for some of these events some stations didn't even exist at the time. So we remove anywhere between 10 and 100 stations from the GNSS packet of data before putting it into the training, and we do that with an existence code: stations get either a zero or a one depending on whether they participate in that particular training step. Rinse, repeat, and train as many times as you can. We train with 80 percent of the data and leave out 20 percent as a first cut for validation. Importantly, the labels here are the final magnitudes of the events: we assume that a magnitude nine and a magnitude eight look different at 5 seconds, at 10 seconds, at 15 seconds, and we ask the RNN to find the patterns that distinguish an eight from a nine.
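A minimal sketch, with assumed shapes and sizes rather than the actual implementation, of the training setup just described: an LSTM ingests the GNSS network state in 5-second slices and predicts the final magnitude, with white-noise augmentation standing in for the real GNSS noise model and random station dropout encoded by an extra existence flag.

```python
# LSTM magnitude regressor over 5 s slices of a 121-station GNSS network.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

N_STATIONS, N_STEPS = 121, 60             # 5 s slices out to 5 minutes (assumed)
N_FEATURES = N_STATIONS * 4               # N, E, Z displacement + 0/1 existence flag per station

model = models.Sequential([
    layers.Input(shape=(N_STEPS, N_FEATURES)),
    layers.LSTM(128),                     # carries memory of earlier slices forward
    layers.Dense(1),                      # predicted final magnitude of the scenario
])
model.compile(optimizer="adam", loss="mse")

def augment(gnss, noise_std=0.01, rng=np.random.default_rng()):
    """gnss: (N_STEPS, N_STATIONS, 3). Add stand-in noise and zero out 10-100 stations."""
    x = gnss + rng.normal(scale=noise_std, size=gnss.shape)      # placeholder for a real noise model
    exist = np.ones((N_STEPS, N_STATIONS))
    drop = rng.choice(N_STATIONS, size=int(rng.integers(10, 101)), replace=False)
    x[:, drop, :] = 0.0                                          # stations that "did not record"
    exist[:, drop] = 0.0                                         # existence code: 0 = absent, 1 = present
    return np.concatenate([x.reshape(N_STEPS, -1), exist], axis=1)

features = augment(np.zeros((N_STEPS, N_STATIONS, 3)))           # shape (N_STEPS, N_FEATURES)
```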
So here is how the RNN does after it's fully trained. What you're looking at is the actual magnitude in the simulation, for a thousand events from the validation data set, versus the predicted magnitude, and the dashed line is the one-to-one correspondence line. You see the behavior you would perhaps expect: at 30 seconds there's a lot of scatter, because at 30 seconds a nine and an eight and a half still look very similar, so it's hard to say with much certainty that you've collapsed onto the correct magnitude. That scatter goes down as time goes on and as more information from more stations becomes available. Now, the proof is in the pudding, as they say, and even though the validation data are great and all, they're still simulations, so what we're most interested in is the performance for the five real events that actually recorded GNSS data.

Here's the first test, the Iquique earthquake, magnitude 8.1, and this is the time-varying prediction from the RNN, which, by the way, has never seen this data; we never put the real events into the training. The performance is quite good, I would say, and there are some interesting things. Here's what the actual GNSS data look like, and I'm sorry this is so small, but you can see that by 10 or 20 seconds you already have about 20 centimeters of deformation at the closest site, and yet the RNN chooses to ignore it completely. We've noticed that the RNN does not like to make single-station predictions; it seems to want to wait for confirmation from more sites, and that's certainly true for the Iquique earthquake. There are also some overshoots, because there is memory: even if this station has reached its final value and is coming down, other stations might still be growing, so we've noticed this sort of dynamic overshoot in the magnitude prediction. We see that for Iquique, and we see it for the Illapel earthquake as well, a magnitude 8.3: the same sort of slow growth, well, this is still pretty fast but not as fast as I would like, an overshoot, and then convergence to a final magnitude. Same situation here: there's a station with 30 or 40 centimeters of deformation by about 20 or 30 seconds, but the RNN waits for confirmation from more sites, or at least that's what we think is going on. The exception is the Maule earthquake, and this I think is a very encouraging result. This is a magnitude 8.8 earthquake and the convergence to a large magnitude is very fast: by about 35 seconds you hit magnitude 8.8 right on the nose, and this is mostly with one station, this orange station. In this case the RNN, which doesn't like to make single-station predictions, sees such a large and fast deformation signal, about a meter and a half by 30 seconds, that it says: I don't care that this is just one station, it must be a large event, and it goes ahead and predicts a large magnitude. That's very encouraging and important.

But in the end what we're after is forecasting the hazard. In the context of early warning and rapid response, what the earthquake itself is doing is unimportant: I don't care about its stress drop, I don't care about the slip distribution, I don't care whether it was long or wide. What I care about is whether it's going to make a large tsunami or strong shaking. So we modify the algorithm to take the GNSS at 5 and 10 and 15 seconds and, instead of forecasting just the magnitude, output a prediction of the tsunami amplitude every 10 kilometers along the coastline. We can do this because, for every one of the scenarios, we have modeled what the tsunami is going to look like with GeoClaw.
That's what you see here in these bars: the model output along the Oregon and Washington coastlines, with the model output across Vancouver Island up here. We take the same idea and reapply it in this context. This is very early days, this is what we're working on right now, and so far we've only done these tsunami models for Cascadia; we haven't yet taken this to Chile. We make these predictions over 10-kilometer bins, so this is what you see here: the slip model, in red the outcome of the simulation, and the hollowed-out bars are what's predicted in real time, or in a rapid assessment. We've also worked with our NOAA colleagues, because they actually don't care about this level of granularity; what they want to see is what they call their threat levels: zero to 30 centimeters is green, one to three meters is orange, more than three meters is red. That's what NOAA uses; it's different for JMA. We can very easily predict these threat levels, because you don't want to make one forecast for one beach and a different forecast for another beach up the road when the uncertainties are still quite high, so we down-sample the problem and make only one larger forecast. Here's a confusion matrix for how well we can predict those threat levels: lots of orange and red across the diagonal, but in general we're over-predicting a little bit, which surprises me, and I don't have a good explanation for why that is. Again, this is ongoing work; we've only tested it for Cascadia, and validation here is very challenging: even when we take this to Chile, what we have are a few tide gauges and tsunami survey measurements, so we'll validate against those, but it will still be only a few events.

The holy grail is inundation modeling. We want to move past this: saying what the tsunami will be at the coastline is well and good, but what we really want to know is what the overland flow will look like at a particular location of interest, say a nuclear power plant. This is very hard to do. We know the math and we can run the computer models to do inundation, but it's much slower than just computing the tsunami amplitude at the coastline, there's a lot more sensitivity to getting the earthquake source right, and you need to know details like the built environment across that bit of land very well. But I think in the future we'll be able to make this kind of movie, this kind of map, very quickly from onshore geophysical data. How well it will perform remains to be seen, but based on these simpler findings I'm quite hopeful we'll eventually get there.

So why does this work? I hope I've convinced you that it at least kind of works. I think it's important to recognize that the reason this approach of using simulations to train machine learning algorithms is working is that the physics we've chosen to use is already pretty well understood. Even if there's a lot of complexity in real earthquake rupture, whatever these stochastic kinematic models are doing seems good enough to replicate what GNSS really records. We know how to solve the shallow water equations, and we know bathymetry and coastlines really well, so once we propagate a tsunami, that problem is very well understood.
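Returning to the threat-level evaluation just described, here is a minimal sketch of binning coastal amplitude forecasts into discrete levels and scoring them with a confusion matrix. The thresholds below are assumed placeholders loosely matching the green/orange/red levels mentioned, not NOAA's official definitions, and the example amplitudes are made up for illustration.

```python
# Map predicted and simulated coastal amplitudes to threat levels and compare them.
import numpy as np
from sklearn.metrics import confusion_matrix

THRESHOLDS_M = [0.3, 1.0, 3.0]                  # assumed cut points between threat levels

def to_threat_level(amplitude_m):
    """Map tsunami amplitudes (metres) to integer threat levels 0..3."""
    return np.digitize(amplitude_m, THRESHOLDS_M)

# peak amplitudes per 10 km coastal bin: RNN forecast vs. GeoClaw-style simulation (illustrative numbers)
predicted = np.array([0.1, 0.8, 2.5, 4.0, 0.05])
simulated = np.array([0.2, 1.2, 2.0, 5.5, 0.10])
cm = confusion_matrix(to_threat_level(simulated), to_threat_level(predicted), labels=[0, 1, 2, 3])
print(cm)
```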
So these modeling steps are very robust and they're validated; they map well to reality, to what we think earthquakes and tsunamis actually do. Very importantly, we can run many of these simulations, so we can sample all the possible behaviors that happen during megathrust ruptures. Sorry, I clicked and nothing happened. Of course you might ask yourselves: if this works for tsunami hazards, why not take it one step further to other kinds of hazards? That's what we're starting to do. We want to predict the ShakeMap onshore, not just the tsunami but the shaking, and this is a much more challenging problem for large earthquakes, because we need to be able to forecast waveforms like this one. This is a strong-motion record section that has a lot more detail than what we would see in GNSS, and when modeling broadband seismic waveforms, the details of the source and the path, the things we know we don't know really well, become much more important than they are for GNSS. FakeQuakes can do this, we can model broadband seismograms, it's a lot slower but it still works, but because we don't know all those real-life details, we rely on approximate, semi-stochastic methods; there is no way to make fully deterministic forward calculations for 100-hertz waveforms today. So one question you might ask is: do we actually know enough about this process for this simulation-plus-machine-learning approach to be meaningful? That's a very valid question. I think, possibly, we can at least extract things like the broad shaking features across the landscape, but we probably won't be able to predict every nitty-gritty detail of a strong-motion seismogram.

So that is encouraging, but some things are still very hard and will likely remain very hard for the foreseeable future. One of them is landslides: earthquake-triggered landslides are incredibly challenging. Here's a magnitude 7.5 earthquake in Palu, in Indonesia, a large strike-slip fault with a restraining bend running right here along the landscape, a comparatively small magnitude for a very large tsunami. I'll just start this. There is very good evidence that what generated the Palu tsunami is a combination of some component of coseismic deformation and a lot of landsliding into the bay; you see a before-and-after picture with a beheaded scarp that fell into the bay, and there's good evidence that what caused the inundation down here in Palu City is mostly the landslides. Right now we have no physical way to connect a forecast of strong shaking to where and how landslides will be triggered, so we're very far from making a movie like this in real time; there's no machine learning that's going to do that right now.

So, machine learning provides a really great way to establish these complex correlations between GNSS observations and earthquake hazards, and I think, again, it works well for tsunamis because the underlying physics is fairly well understood and we can run many, many simulations to train our RNN. The physical connections between earthquakes and some of the other hazards are less well understood, but I expect our knowledge will continue to improve, especially as we see more large earthquakes over the coming decades, so we will be able to do better at this. It's likely that we won't want to use just GNSS; we started with GNSS because it's easy.
But we want to be able to pull every single geophysical sensor into the prediction: strain, ocean-bottom pressure, anything you can think of should be contributing to these forecasts. Some of the challenges I foresee are that, for earthquake hazards in particular, we still don't know a lot of things. For tsunamis, we don't know much about how the wedge deforms, and if we ever want to do inundation modeling we need to pin those details down. We don't know why strong shaking is generated preferentially in certain parts of megathrust earthquakes, or, more particularly, whether we can find those strong-motion-generating regions before the earthquakes happen. We don't know how shaking triggers subaerial and submarine landslides; there are physical models, but there is still no deterministic way of taking a ShakeMap, for example, mapping it to a landslide inventory, and finding the features that correlate one to the other. So if we're going to go down this machine learning and simulation path, we need multi-physics codes that simulate all of this behavior at a variety of time scales, and because large events are rare and will, thankfully, continue to be rare, we need these codes to be very efficient and very fast. Finally, I'd like to echo a point that Karianne made: capacity building remains a real challenge. Even though, with things like scikit-learn and TensorFlow, it's a lot easier to break into the field, for most earth scientists it's still challenging to get into massively parallel computing and to learn about machine learning. Most of us in geophysics are still steeped in classical inverse theory, and that's what is still taught today in most master's and PhD programs, so I think there's a lot of room for improvement there. I'll finish there, and I'll take any questions if I haven't gone on too long.

Thank you, Diego. Just a second to explain what we're going to do next: a couple of quick questions for Diego, and then we'll bring the panel to the front table, so Karianne and Zack, you could start moving up as well, and then we'll open the floor to questions. Let's see whose hand was up first. You've already spoken, so let's have Richard first, and, sorry, you can ask the general panel.

Thanks, Diego, that was great. I want to try and tease out a little more where the machine learning is really adding value and where it's perhaps not adding as much. Take the first example you showed, of using the geodetic stations to come up with the estimate of magnitude. In that case, do you think you're doing better with machine learning than just using standard inversions for slip on a fault?

Great question, and the answer is very much yes. We're not doing better at getting the actual magnitude right; we're getting it faster. The reason is that it's very hard for a classical algorithm to use the growth toward that peak displacement. If we just use the peak displacement we do great, but it's hard to build an algorithm, just with linear regressions and the like, that takes advantage of how fast that growth is happening. The RNN picks that up, and that's why I think the Maule convergence is even faster than what we've done before.

Okay, that's great. And then the other piece, which is sort of a different advantage as I see it, is when you come to the actual tsunami warning.
For the tsunami wave height estimate, the real advantage of the machine learning is that you're simplifying what would otherwise be a very large, complex calculation, right?

Exactly. We would have to have a moment tensor and a slip inversion, connect that to some deformation of the earth, and run the tsunami code, all of that in real time.

Just a quick question, and that was really great. Your examples are designed for subduction zones where you already have a pretty good network and a reasonable distance between sources and receivers. So, if I ask the question differently: can you use machine learning to figure out the optimum spacing between receiver stations, the station density, needed to issue a timely warning for a given magnitude in an area where you already know what the maximum earthquake magnitude could be?

That's a great question, Mohan. Yes, and that's something we're doing now: trying to understand why some of the simulations do better than others. We have so many simulations that we haven't had time to tease out why some work well. I don't know if you noticed, but we have a slightly low bias for the large-magnitude events; why is that happening? It's not something we understand yet. As Karianne said, getting at what the inside of the algorithm is doing, and why it chooses to do what it does, is not simple. But we can ask questions like: for the events with sparse networks, because we remove stations, is there a threshold at which we achieve peak performance? We can do analyses like those, but we haven't done them yet.

Great, and Tor has a very quick question.

Well, it's a follow-up to Richard's question, really, and it's a little bit to the whole panel. You illustrated some of the uncertainties in going from an earthquake of a certain magnitude to a tsunami, and there are surprises; you mentioned some other things, like the rheological behavior of the wedge, that are very important in this context. I wonder how the machine learning can ever do better than improving our estimate of the variations, because you essentially sample everything, and that will give us the range much better, a range grounded in physics rather than some sort of empirical relationship. But what you're trying to do is get the specific case right, the particular out of it, and that seems to depend very much on the specifics of the rupture, so I'm not sure how you would get out of that.

I don't think it would do better. If we ever want to make something like this operational, at NOAA, we have to be very liberal with the simulations in terms of what we think is possible: generate strike-slip events, outer-rise normal-faulting events, shallow events, events with plasticity, as many as you can, see what the output is, and stick it all into the machine learning algorithm, so that the next big event has no surprises, because some part of its behavior will have been captured in the simulations. So this relies very much on us knowing what the science has already solved and putting it in there. This is not discovering new science; perhaps if we look at why the algorithm works we'll find some insights into what is important, but it very much relies on the physics being prescribed by me or by my PhD student.

Thank you, Diego. I think it's great that we start this discussion session on this topic.
Yeah. Like Diego, I think it's great that we start this discussion session on this topic, because in a way it's not the standard pure-physics approach of blue-skies research; it's a specific task for hazard that also bridges multiple funding agencies, including the sponsors of this committee, so I think it's good to bring that up as well. I think another point common to many of these talks is where geosciences and data sciences meet: where geosciences leads and where we need to draw in resources or expertise from data sciences, and getting a better feel for what your recommendations are would be really helpful as we move forward. We talked about benchmarks and the limitations of benchmarks, how we approach some of those obstacles, and then obviously the desire to make discoveries, or allow for discoveries, as well. So I'd just like to open the floor now, and I believe that people from outside the room can pose questions as well; we'll see how that goes. So now we open the floor to general questions to any or all of the panelists. Matt? Yes, thanks for three great talks. I just want to start with Diego's last point and get feedback from all three of you on capacity building: how are you imagining educating both yourselves, since some of you came from data science backgrounds and some from geophysics, and your students going forward? What are effective ways of teaching students, as in the name of today's workshop, to go beyond the black box, to take the next step of trying to make discoveries or to dig into what the algorithms are actually telling you, to get beyond just automation or whatever else is fairly routine at this point? Yeah, I think we're going to have to educate a whole new generation of students to do this. I'm teaching a class at Caltech on machine learning in geophysics; the goal is to talk about problems that are common to the types of data that we all work with, multivariate time series, and thinking about working with them in that way, and the challenges that we're going to have going forward, like the lack of ground truth for a lot of things. There aren't necessarily answers to some of these questions yet, but I think it's important to get people thinking about them now and aware of these issues. Yeah, I guess I'm thinking beyond a single class; I'm thinking more of a curriculum: do you take students who have a double major in data science and geophysics, or do you really want someone to start in geophysics and then train them in data science as a graduate student? I'm sure there are multiple answers to this, but have you thought through what the step forward is to make progress on this? Yeah, I think that's a good point. In a lot of ways this resembles what happened with inverse theory in the 1970s, so people from different backgrounds are going to become, I think, very important here. The people who are fluent in computer science and applied math will be very helpful for translating this type of technology into geosciences; obviously it's broader than just geoscience, but, like I said before, there are a lot of challenges that are either common to or unique to geoscience, so that helps too. So we do obviously need domain experts; it's not going to be a situation where
you fit one generic model to everything and it just figures everything out. I think we're going to recognize very quickly what we can and can't do with the existing types of models that are off the shelf within these software packages, and beyond that it's going to take people who are very well educated in machine learning algorithms and who also understand the structure of the data sets that we work with, so that you can very carefully design new things. I think that's the next level, but we're nowhere near there yet. Could I just ask, since you're talking about education, could you comment on the level? I'm assuming you're recommending that we start at the undergraduate level? That's what I was going to say. Rather embarrassingly, at the University of Oregon until last year a geologist could graduate without knowing how to code. I think you'd need to overhaul the entire earth science curriculum so that web services, cloud computing, programming languages, and that kind of thing become as fundamental a tool as Microsoft Excel is to students right now, and we're very far away from that. Geophysicists are, sure, great at coding most of the time, but other colleagues of ours in the earth sciences not so much, simply because they haven't been afforded the opportunities to discover that they don't suck at coding; they just were never taught it the right way, and we need to do better at that. So I'm coming at this from a little bit of a different perspective, having come into geosciences through the data and computational sciences, and there are some challenges coming from that direction. But something I've noticed: I know a lot of other people with my background who work in other fields, with materials scientists or in biology, and in some of those other fields there's a greater recognition of this. Biology is a good example: there's a field called computational biology, there's a field called biostatistics, and there are people specifically trained in those, with specific curricula, specific journals, specific conferences, these kinds of things. In the physical sciences broadly, and in geosciences, there's not as much recognition that this needs to be its own area of expertise. You get a lot of people who are trained as geoscientists and then learn about deep learning and get really into it, and they know that one method maybe really well and get good at applying it to problems, but they don't have the broad background of, say, computational biologists, who learn the science, learn the biology, but are fundamentally trained more as computer scientists who know some biology and know how to talk to and collaborate with biologists. So that is what I would see as a direction. In terms of the level, I think that is something that would come in at the graduate level, but I do agree that, in order to prepare people to go into those kinds of areas, there needs to be more education at the undergraduate level, so that people who are majoring in, say, geology feel like that's a program they can jump into, where it won't be the first
time they're taking, you know, linear algebra, or the first time they're taking a programming course. And I think there also needs to be more of a willingness to bring people from computer science in as well, to have courses that are maybe of interest to them too and get their take, because right now there are geoscientists who are learning a little machine learning, and there are computer scientists and data scientists who have their own research interests, and I think there's a lot of space in between for new methods and development, as you see in computational biology. Biology has its own set of challenges and needs its own methods; you can't just use image processing methods to solve problems in biology, for instance. So I think there needs to be education in this specific area. Jeff? Yeah, so I had a question, partly inspired by Zachary's talk about the feature extraction process, and Diego also talked about this to some degree, that the way the network is shaped is really feature extraction. To what degree do you need to put in the filters that try to extract features? Do you just have to try a huge variety of them and see which ones essentially activate? Is it a lot of trial and error, or are we actually putting in what we think we know about the physics? So, just to clarify: the filters themselves are learned, so you don't specify what those are; for deep learning, you specify the number of filters you have, and their dimensions, and everything else about them is learned. Right, but there's an infinite variety of potential filters, right, so do you have some degree of structure in terms of what things are being tried and what things are then learned? There are no requirements on them whatsoever; they start as uninitialized parameters and become something at the end of it. And that's the whole revolution here: decades ago, with standard neural nets, you had to come up with your own parameters that you thought were going to be useful to help you make your predictions. In fact, back when deep neural nets were first introduced, the filters were hand-chosen; they weren't learned at the time. People thought that they knew better: they had studied these systems forever and thought, well, I can reduce that to some equation and put it in there. It turned out that when you let the system decide what to do, optimized end to end, it does better performance-wise. But there is a lot of trial and error, and even if the end result looks like a beautiful machine learning plot with all the fully connected layers and so on, there is a lot of trial and error in the architecture of how you go from inputs to outputs. Yes, and that's a great point. There is a lot of trial and error, but I would say you have to ask at what level you care about this; a computer scientist cares a lot more about it than a geoscientist would. From my experience, getting the type of model correct, framing the problem the way you want it to be, is much more important than tuning these numbers and other things.
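A minimal sketch of the learned-filter point, assuming a generic three-component time-series input (the shapes, labels, and layer sizes here are placeholders, not any panelist's model): only the number of filters and their kernel sizes are specified; the filter weights themselves start as random parameters and are learned end to end.

```python
# Minimal sketch: in a deep network you specify only the number and size of the
# convolutional filters; the weights start as (random) uninitialized parameters
# and are learned during training. All shapes and labels below are arbitrary.
import numpy as np
import tensorflow as tf

# Toy multivariate time series: 256 windows, 400 samples, 3 components (e.g. E, N, Z).
X = np.random.randn(256, 400, 3).astype("float32")
y = np.random.randint(0, 2, size=256)              # toy binary labels

model = tf.keras.Sequential([
    tf.keras.layers.Conv1D(16, kernel_size=15, activation="relu",
                           input_shape=(400, 3)),  # 16 filters of length 15: only counts and sizes are chosen
    tf.keras.layers.MaxPooling1D(4),
    tf.keras.layers.Conv1D(32, kernel_size=7, activation="relu"),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=2, batch_size=32, verbose=0)

# The learned filters can be inspected afterwards; nothing about their shape in
# time was hand-designed.
filters = model.layers[0].get_weights()[0]         # shape (15, 3, 16)
print(filters.shape)
```

The trial and error the panelists mention lives in choices like the filter counts, kernel sizes, and number of layers above, not in the filter values themselves.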
Once the model is in the right general structure, meaning what exactly you even want to predict in the first place and that kind of thing, it tends to get you very close to the right ballpark without extensive need for tuning all of this stuff. Of course you can always do better by searching a whole parameter space, but from my perspective that's less important. So then I guess the key is that you have a sufficiently large quantity of data, so that whatever feature you might want to identify is there with enough repetition in the data that it can be found, essentially? I would say that's true, yeah; you're not going to learn a feature if it's not in the data with these data-driven methods. It really depends. I think another big aspect of it, building on what Zach was saying, is that a big piece of this is actually putting together the data sets and getting them into a state where the data set you're using to train the models is actually going to be useful for the task that you want to do. Okay, what I'd like to do is revert back to the capacity-building question and take a couple of questions that came in remotely. Steve Whitmeyer at NSF, who is a tectonics program director right now, points out that for undergraduate curricula the issue isn't so much adding; it's what you remove to make space. I'm not asking you what we should remove, but there are challenges, and I think Artemis points out too that there are younger faculty within departments who face obstacles in trying to convince colleagues to let them teach these topics, because we pack our curriculum in interesting ways. So there are folks pointing out challenges: do we know of success stories, or particular directions forward, strategies that have worked and been effective that you are aware of? So, I don't think you need to remove that much. Most earth science degrees have some sort of statistics or data class that you could turn into modern programming, but then you need to add it into all the classes that follow, and I'll give an example that's going to annoy my fellow geologists: I know how to use a stereonet; there's no reason why I should know how to use a stereonet anymore. We could use that time to take these coding methods that we've learned and put them into stratigraphy and structural geology and problems like that, and teach students how these tools actually help them in those disciplines. So you don't have to create a whole slew of new classes; I think you can put it in strategically in other parts of the curriculum. I suggest revamping one course, a stats course that also covers machine learning, making it an early, required course in the degree, and then including follow-on exercises throughout the rest of the curriculum. Right. Another question that came in, the last question, from Leah, is related to capacity building as well: there is a limited number of institutions with both strong geosciences and strong data science; can we as a community build resources, potentially something like CIG training courses, or as a component of one of our existing infrastructure facilities? Yeah, so, to the panelists, and perhaps to provoke a more specific answer: yes. I mean, CIG is a great example of community-hosted code validation exercises, going all the way to end users and teaching how to use PyLith or SPECFEM, but it also could house
benchmarking. Yeah, so I don't know what more to say there other than yes, we should be doing that. So, from our side, this is a great time to have this conversation: there are a number of announcements that we've made about future opportunities and changes that we're making to supporting infrastructure, and in terms of the competition for an earthquake research science center and the competition for a future facility in geophysics, I hope the community is thinking carefully about what elements of workforce development you would like to see supported by those different activities, and we're here to listen and see what is of greatest interest. Can I make a related comment to that? Isn't there a difference here, and I want to see whether the panel agrees with this. I have a sense that there's a big difference between this topic and other topics about high-performance computing and computational capabilities that we've talked about a lot in the past, and that is that the tools we're talking about, as several of you pointed out, are openly available; TensorFlow is the one that most people seem to use, but these are tools that are widely available for a whole range of different approaches, which I think is different from when we've talked about computational resources in the geosciences before, where things have been much more specialized. So isn't there a difference here, that perhaps there's less need for us as geoscientists to be focused on training students to use these tools, because in fact students are learning them anyway? As an example, big data is big everywhere, and of course the biggest undergraduate class at Berkeley now is the data science class, and the point is that students are actually arriving with the skills we're talking about without us doing anything to promote that. So I just wanted to see what your sense was: is this different from the computational questions that we've had in our community in the past? If I could just add to that: I was going to ask the same question but make the point that there are a lot of similarities with the access to high-performance computing that we discussed, in the sense that we've had meetings where we explored ways of training students, in particular in departments where there is no strong data science program, in the use of leading-edge computers, and we felt in the past that we were falling behind there, partially because of workforce training issues. So I had the response, well, there are a lot of similarities, so I'd be curious; let's go. Yeah, from my perspective, I think there are a lot of resources out there already. The real question for me is, at what point is it relevant for our specific problems, the types of data structures that we encounter, and things like that? If you're learning about machine learning in a computer vision context, where everything is about images and there's nothing physical about any of it, it's very different from something that discusses multivariate time series the entire time. But I don't think that's a particularly hard leap to make. Like I said, I think this kind of thing could probably be communicated easily through a single class focused on applying machine learning to geophysics. I'm regularly working with undergrads and grad students on applying these tools to these problems, and they've never done
any of this stuff before, but they pick it up very quickly; I don't think it's really a huge investment that people have to make to get to that point. So that's my general view. Okay, before I take any other questions... well, go ahead. Yeah, so one of the things that has to happen for this sort of thing to succeed is that students, and importantly their advisors, need to know what's possible, and so Karianne designed a short course for the ICME program at Stanford that's just three weeks or something; maybe you could tell us a little bit about that and say whether you think something like that would work as an entry to machine learning methods for geoscientists. Yeah, so I designed this when I was a graduate student, together with another student who was working in computational biology, also someone coming from a data science background. The idea for it came about because there are a lot of researchers in fields outside of computer science, outside of statistics, outside of the traditional data science fields, who were seeing people start to use these methods in their field and wanted to understand: what are these techniques, what can they do, and how do we use them? So we made this course for them, for the people who weren't going to sign up and take the graduate-level introductory machine learning course, which at Stanford is a very popular course, but for a lot of people it's not the first course they want to jump into; it's really heavy in computing and in mathematics if you just want to get a sense of the field. So that's where we started. We did have students do hands-on work with coding; it wasn't specific to any field, we wanted to keep it broad, but I think you could adapt this to geosciences. Building on what Zach is saying, one of the challenges, and I see this a lot when we're reviewing papers where people are trying to use machine learning in seismology, is that they take the class, they do some class on Coursera, they do the TensorFlow training, and they often miss that the things people are doing in computer science sometimes don't quite apply in our sciences in the same way; there are differences in the data, there are different challenges. And so if you just take what you've learned in computer science and apply it directly without thinking about the differences between the problems, if you don't have a deeper understanding of how the methods work, it's easy to think you're applying them correctly when you're actually using them wrong, and I see that a lot in papers that get submitted. I think one of the challenges is that sometimes the advisors don't always know either, and so that's where these collaborations, and this willingness, come in. I think these kinds of short courses are useful for making that leap; I don't think it would need to be a huge number of courses, but they could bring up these issues. In my presentation I was talking about some of the data challenges, and a course like this could also bring up what those challenges are, how you start to address them, and what changes you need to make to what you're doing. And a class like that
could also be useful, I think, for computer science students who are interested in applying their methods to problems in these fields, so they don't jump in and, I don't know, there's an xkcd comic where someone jumps in saying, "I'm a computer scientist, I'm going to solve this problem with machine learning," and then the follow-up panel, like six months later, is "oh, this problem is hard and I don't know what to do; this is harder than I thought." So I think that's the kind of gap you have to bridge, and it could go in both directions. On the short course format: did you have faculty involved? Yes, we did. The first time we offered it was as a workshop, and the reason it turned into an actual course was that we had some faculty in the medical school who said, "I want my students to take this," because the problem they were having was that they would teach a machine learning course for computational biology, or for their own field, but if the students had had no machine learning they spent half the course just teaching them the very basics, and they wanted to be able to spend that course focusing on the machine learning techniques that are most useful for their field. In geoscience, for example, you might want to talk about how you use methods with spatio-temporal data and time series, but if people don't know the basics of machine learning that's hard to do. So they liked the idea of having this course so that students could come in with that knowledge already, and with a sense of what the different kinds of methods out there are. There are two other questions that came in, before we move on to any other topic those in the room would like to raise. Qingkai asked Diego specifically: your training with this synthetic data has a systematic underestimation of the magnitude; how do you deal with systematic problems like that within machine learning? I think it's a general problem. I don't know, Qingkai, I don't know why this is happening, but it's not as bad as it looks in that plot, because we only plotted a thousand of the events. That's part of what we're doing now, understanding why it works the way it works, so this is a real challenge. And do you need to go back and recompute the full model if you change station distributions? No, no; that's the benefit of adding and removing stations, it's robust to the pattern of crustal deformation being sampled by different stations, so that's a good thing. But for the large events I think we just don't have enough events in training; I think fifty thousand is not enough, but we don't know for sure. Yep.
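One way to start probing the underestimation question is simply to bin the residuals by true magnitude and compare them against training coverage. The sketch below is illustrative only: the predictions, magnitudes, and bin edges are synthetic stand-ins, not Diego's results, and a real check would use held-out simulated scenarios.

```python
# Minimal diagnostic sketch (illustration only): bin prediction residuals by true
# magnitude and compare with how many training scenarios each bin contains, to see
# whether a low bias at large magnitudes coincides with sparse training coverage.
import numpy as np

rng = np.random.default_rng(1)
y_true = rng.uniform(7.0, 9.5, 2000)                                   # held-out scenario magnitudes (toy)
y_pred = y_true - 0.15 * np.clip(y_true - 8.8, 0, None) + rng.normal(0, 0.1, y_true.size)
train_mags = rng.uniform(7.0, 9.5, 50000)                              # magnitudes used in training (toy)

edges = np.arange(7.0, 9.75, 0.25)
for lo, hi in zip(edges[:-1], edges[1:]):
    sel = (y_true >= lo) & (y_true < hi)
    n_train = np.count_nonzero((train_mags >= lo) & (train_mags < hi))
    bias = (y_pred[sel] - y_true[sel]).mean()
    print(f"M {lo:.2f}-{hi:.2f}: bias {bias:+.3f}, training events {n_train}")
```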
So I open it to other questions from the... pardon, only one more question from the floor. Yeah, Jeff here. Let's suppose, I mean, in the examples that were shown here, essentially each seismogram coming from a different station was treated as independent, that is, there are no correlated errors between them, and Diego, you treated the GPS the same, although technically there are correlated errors there, though they're probably small compared to the signals you're looking at. Is that a deal-breaker, if you've got spatially correlated errors with a structure we think we know, or maybe hope we know? Does that cause any fundamental issues with these sorts of approaches, or is it something that's fairly easy to handle? And I guess the same thing would be true of temporally correlated errors in the data. It does violate the basic assumptions going into all of this, which assume that the examples are not highly correlated like that. It's not so much of a problem if you intend to use the model in the same area that you trained it on, which is then okay, because if you're overfitting on these examples it doesn't matter, since it's still the same locations, generally. It does affect generalization outside of those locations. I think this is going to become a more important problem going forward, and how we deal with it is not trivial: handling spatial correlation, how you parse up the data set, what you restrict it to, and that kind of thing, in order to try to decorrelate it in some sense. There are people working on this subject within computer science, but there isn't a simple answer. Yeah.
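As a minimal sketch of one common way to "parse up the data set" against spatial correlation, as mentioned above (the coordinates, features, and model are placeholders, and this is only one of several possible mitigations): cross-validation folds are grouped by spatial cell, so correlated neighbors never appear in both the training and test folds, which gives a more honest estimate of how the model generalizes outside the locations it was trained on.

```python
# Minimal sketch: spatially blocked cross-validation. Samples are assigned to
# 1-degree cells and the cells are used as CV groups, so nearby (correlated)
# examples cannot leak between training and test folds. Everything here is toy data.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GroupKFold, cross_val_score

rng = np.random.default_rng(2)
n = 3000
lon = rng.uniform(-125, -120, n)                   # toy station/event longitudes
lat = rng.uniform(40, 46, n)                       # toy station/event latitudes
X = rng.normal(size=(n, 8))                        # toy features
y = X[:, 0] + 0.5 * np.sin(lat) + rng.normal(0, 0.2, n)   # toy target with a spatially varying term

# Integer group id per 1-degree cell (unique per lon/lat cell pair).
groups = np.floor(lon).astype(int) * 100 + np.floor(lat).astype(int)

model = GradientBoostingRegressor(random_state=0)
scores = cross_val_score(model, X, y, groups=groups, cv=GroupKFold(n_splits=5))
print("spatially blocked CV R^2 per fold:", np.round(scores, 3))
```

Comparing these blocked scores against ordinary shuffled cross-validation on the same data is one simple way to quantify how much spatially correlated errors inflate the apparent performance.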