It is my pleasure to serve as a convener for the second panel, on practical next steps to accelerate and broaden the use of machine learning in the geosciences. We are trying to discuss the opportunities and challenges in a bit more depth and touch on some of the issues raised in the first panel, in particular: what is the science? How can we use machine learning in tandem with physical modeling to get at the Earth system behavior we would like to understand? We have one presentation and then time for discussion. Our first and only presentation in this panel is by Qingkai Kong of the University of California, Berkeley, who is presenting remotely. I understand he is online, and he is going to talk to us about machine learning in seismology: turning data into insights. Thank you.

Hello everyone, can you hear me? Hello. Yes, we can hear you, Qingkai. Okay, good to know. Sorry I can't be there in person; I think this is a great workshop that generates a lot of discussion, and I wish I could be there, but due to babysitting responsibilities I have to stay home. Anyway, I'm Qingkai Kong, an assistant researcher at the Berkeley Seismological Laboratory, and today I'm going to talk about machine learning in seismology and turning data into insights.

This talk is based on a paper we published last year. It was a collective effort, and all of the authors contributed a great deal: we tried to answer the question of what the state of machine learning in seismology is and how we can motivate people to turn their data into insights. The team was myself, Daniel Trugman, Zach Ross, Michael Bianco, Brendan Meade, and Peter Gerstoft, and Peter led the effort to put this team together.

In the paper we reviewed the different types of learning. Karianne already covered this, so I won't go into the details, but many machine learning problems fall into supervised and unsupervised learning, and much of the recent progress has come in supervised learning, because the extra information in the labels gives the algorithm more to learn the problem from. In the paper we also organized the review by application area rather than by method; you can read the paper, which was listed in the agenda in the booklet sent out last week. We covered earthquake detection, where this morning Zach gave a really nice presentation on using deep learning for that task; machine learning for earthquake early warning, for example using smartphone data, along with other applications in that area, such as Men-Andrin Meier's work from Caltech distinguishing noise from real earthquake signals; machine learning for ground motion prediction, that is, training models to do a better job of predicting ground motions; and of course tomography, which is a big area in seismology as well.
There have also been a lot of applications and efforts in that area. Last in the paper, but not least, is earthquake geodesy and other applications. If you read it you will see that people in our field are really trying to apply many different types of machine learning algorithms, and related data-driven methods, to our data, which is really interesting.

Since we had a lot of discussion this morning on specific applications, I won't go into the details of the review paper; you can read it offline and follow up on some of the papers we listed there. Instead, in this talk I want to cover three questions, based on some of my own thoughts. They may not all be right, and some of them may be premature, but I want to lay them out so that we can generate more discussion in the panel session. First, some limitations of current machine learning algorithms: I know everyone is really into machine learning these days, but understanding some of these limitations gives us a way to examine the algorithms, think about how they apply to the geosciences, and figure out how to improve them; I approach this from an optimistic perspective. Second, I will show some examples of existing efforts that combine machine learning with physics-based models, because the theme of this workshop is how we get beyond the black box: as scientists, we want to combine machine learning with the great power of traditional physical modeling. Third, I will conclude with some thoughts on moving forward; Karianne already covered most of that, so I will just add a few points in the last section.

Let's start with some limitations of current machine learning algorithms. As I mentioned, this is not to say that machine learning has so many limitations that we should not use it. On the contrary, machine learning has great power, as we saw in many of the applications this morning, but these limitations mean we need to use it carefully, and we need to understand that machine learning is not magic. A lot of people in our field, some of the folks I have talked to at various places, think of machine learning as a kind of magic that can cover and solve all the existing problems in our field, and that is simply not true. Machine learning is really good at pattern recognition, which is just a small part of all the scientific problems we want to solve. So I want to set the stage by saying that machine learning is not magic; once you learn the skills and the toolset, it is one more tool in the geophysicist's or geoscientist's toolbox for attacking difficult questions that cannot easily be captured by mathematical models.

One limitation is that machine learning algorithms are narrow in their application: when we train a model, we use specific data and target a very specific problem, and we can get really nice performance on that problem.
The sketch here shows how I think about machine learning versus general-purpose algorithms. There are algorithms that work on a wide variety of problems, but their performance may hover around the average, shown by the dotted line. Machine learning, once trained with a large amount of data for one specific problem, can achieve very good performance, but that performance comes with a sacrifice: these algorithms are usually very narrow, so if you want to apply them to another area or another data set, you may need to retrain them, and they will not simply work across a wide range of problems the way we might hope.

Another limitation, which other speakers already touched on this morning, is that we need a lot of structured data. When we do machine learning, we usually have to cut our data into matrices and make sure everything is well organized into the format the algorithm expects, and we need lots of it, especially now that deep learning has become popular; deep learning is a data-hungry approach that needs a great deal of data to train these gigantic models well. We also need lots of good data: if we feed in bad data, the output will usually be bad as well. Machine learning has some ability to generalize, but it is weak generalization rather than strong generalization; I will say more about that later. Much of the recent progress in supervised learning also depends on really good labeled data, and that is not available in many of our sub-fields, where the time scales involved or the rarity of events mean we only have small data sets or very limited labels. How to use machine learning with limited data is itself a hot research topic in computer science; look at few-shot learning and other approaches that try to teach algorithms from small data sets. But sometimes, if the data set is relatively small and the problem can be addressed with a traditional approach, we should probably just try the traditional approach first instead of relying on machine learning for everything.

As a thought experiment: it may only have taken a couple of apples dropping on Newton's head to arrive at Newton's laws, but how much data would we need from a machine learning perspective? How many apples, or how many other observations, would we need to collect to train a machine learning algorithm to recognize that law, or to condense it into an equation? I think this is still a very hard problem; people are starting to attack it, but there is a long way to go, and I will come back to it in the second part on physics-based models.

In our field we also have the very famous saying, garbage in, garbage out. That still applies: when the training data are garbage, the results will be bad as well.
Another way to put it: suppose I want to detect a signal, but the signal is completely buried in noise; can machine learning find it? That is probably a very hard problem for machine learning too, because machine learning is based on patterns, and we need to be able to see those patterns in the data in order to generalize.

Another limitation is that there is often bias in the data that is very hard to identify, and sometimes we ourselves are quite prone to it. To use an example from outside our field: a machine learning algorithm trained to recognize whether a man or a woman is doing some kind of cooking ended up with training images containing mostly women, and that imbalance starts to generate errors, with the algorithm making its predictions based on that bias (sorry, I think that's because of the heavy smoke here). The same kind of bias exists in our field. Most of my own research is focused on detection, and the biases are there too: we have earthquake signals and we have noise, but the noise comes from many different sources, and some of those sources are not present in the training data, or are so rare that the model effectively ignores them and just predicts the majority. We may end up with a model that has very high overall performance but makes wrong decisions in critical situations, especially in real-time applications like earthquake early warning: if some rare type of signal shows up and gets detected as a magnitude 9 earthquake, it will cause panic.

We also talk about machine learning's capability for generalization: we train on our training data and expect the model to perform equally well, or even better, on new data. But that is an assumption; machine learning really has weak generalization. In the left figure here I have plotted two data distributions: the right curve is the training data distribution, and the blue one is the new, or test, data that you want the machine learning model to work on. When the distributions are very similar, the algorithm will work well even if there are some discrepancies between the two situations. But in what I would call strong generalization, where the training distribution is totally different from the new distribution and there is only a slight overlap, it may not work well. The other figure is from François Chollet; I took it from his well-known book Deep Learning with Python. The blue dots are the training data we have: we can extrapolate, or generalize, when new data fall near those training points. This is what is called local, or weak, generalization, and it is what current machine learning algorithms have.
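To make this local-generalization point concrete, here is a toy sketch (my own construction under simple assumptions, not an example from the talk): a model fit on inputs in one range predicts well on new points near its training distribution and much worse far outside it.

```python
# Toy illustration of weak vs. strong generalization: a model interpolates well
# near its training distribution but fails when the test distribution barely
# overlaps the training one.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Training data drawn from x in [0, 1]
x_train = rng.uniform(0.0, 1.0, size=(500, 1))
y_train = np.sin(2 * np.pi * x_train).ravel()

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(x_train, y_train)

# "Weak" generalization: test points close to the training distribution
x_near = rng.uniform(0.0, 1.0, size=(200, 1))
# "Strong" generalization: test points far outside the training distribution
x_far = rng.uniform(2.0, 3.0, size=(200, 1))

err_near = np.mean((model.predict(x_near) - np.sin(2 * np.pi * x_near).ravel()) ** 2)
err_far = np.mean((model.predict(x_far) - np.sin(2 * np.pi * x_far).ravel()) ** 2)
print(f"MSE near training distribution: {err_near:.4f}")
print(f"MSE far from training distribution: {err_far:.4f}")  # typically much larger
```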
In terms of extreme generalization, it may not work well for most applications. For example, if we train an algorithm to detect earthquake signals in one region and then move to another region with completely different kinds of signals, perhaps not from real earthquakes but from quarry blasts and so on, it may simply not work as we expect. That is why we need to be realistic about the generalization capability of these algorithms.

The most popular technique for transferring what has been learned from one data set to another is transfer learning. The basic idea is that we train a model on a very large data set; we could apply it directly to another region or another problem, but it may not work well because of the specific conditions there. What we can do instead is cut the model: keep the part that generates the features, add new layers, and retrain, or fine-tune, only part of the structure on the new data set. The pre-trained layers are frozen and do not change when we apply the model to a new, small data set from some other region, and only the new layers are retrained to accommodate the new situation (a minimal sketch of this freeze-and-fine-tune pattern appears below). This looks really promising in other fields, and people are applying it in ours as well: the assumption is that the pre-trained layers have already encoded some of the knowledge, or experience, gained from the previous large data set, which makes the generalization capability a bit stronger. Still, in one of our own trials, where a student working with me tried to detect building damage from social media images and we transferred features from a model trained on ImageNet, we found it only improved things a little; when we included many images of damaged buildings in the training, we got better results. This shows that the generalization capability really depends on the distribution of the training data, and when we apply a model to something new, the new data should have a reasonably similar distribution.

And of course, a main topic of this workshop is that a lot of people criticize machine learning as a black box: we have inputs, we get outputs, and we may be happy with the outputs, but we don't know exactly what is going on inside the box. One issue is that it is difficult to interpret how the model works after we have trained it and seen that it satisfies our task. The other issue is that the model may not be bounded by existing physical laws: for example, if we want to predict the temperature within some region, that temperature follows known laws, and we may need to bound the algorithm with them. These are the missing parts in current machine learning algorithms. Luckily, a lot of people are trying to tackle these problems with more advanced techniques.
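As an illustration of the freeze-and-fine-tune pattern just described, a minimal Keras sketch might look like the following; the layer sizes, input shape, and data are placeholders, not the waveform or building-damage models from the talk.

```python
# Minimal transfer-learning sketch in Keras (hypothetical shapes and data).
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

# Pretend this was trained on a large source data set (e.g. many labeled waveforms).
base = models.Sequential([
    layers.Input(shape=(400, 3)),           # e.g. 400 samples x 3 components
    layers.Conv1D(16, 7, activation="relu"),
    layers.MaxPooling1D(4),
    layers.Conv1D(32, 7, activation="relu"),
    layers.GlobalAveragePooling1D(),
], name="pretrained_feature_extractor")

# Freeze the pre-trained layers: they encode the "general" features.
base.trainable = False

# New head retrained on the small target data set (the "specific" features).
model = models.Sequential([
    base,
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),   # e.g. earthquake vs. noise
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Small regional data set (random placeholders here).
x_small = np.random.randn(200, 400, 3).astype("float32")
y_small = np.random.randint(0, 2, size=(200, 1)).astype("float32")
model.fit(x_small, y_small, epochs=5, batch_size=32, verbose=0)
```

The design choice is simply that `base.trainable = False` keeps the pre-trained feature extractor fixed while only the small new head is fit to the limited target data.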
There are more limitations that I don't want to spend much time on, but I think it is better to at least be aware of them when building future models. For example, machine learning algorithms are actually quite vulnerable. Many studies of image recognition have shown that changing even one pixel can confuse a deep learning model into making the wrong decision. This matters most for real-time systems, and more on the security side, but it illustrates how difficult it is to train a model that is error-proof; we need to know about this vulnerability and test for it. It is also not easy to learn in real time: there are online learning algorithms where you retrain the model as new batches of data come in, but it is very difficult to train a stable model that way. And this link points to a nice talk at this year's AAAS meeting about how misapplying machine learning in science may be contributing to a reproducibility crisis; it is worth reading as well.

So I have talked about these limitations, but as I said, the point is not to avoid machine learning. As practitioners applying machine learning in our field, we need to keep these limitations in mind and remember that machine learning is not a general-purpose tool for everything. At the same time, there are many examples of combining machine learning with physics that try to answer the question of how we get beyond the black box. I put this slide together to summarize it: as domain scientists we are very familiar with physics-based numerical modeling, which uses a small amount of data, while machine learning algorithms, and especially deep learning, have started to dominate data-rich problems, using a lot of data to automatically find the features needed to finish a task. There is a need for hybrid approaches that combine the power of these two domains to advance our science. As I mentioned, many researchers are working on this; I will just give some starting points to generate more discussion about how to combine machine learning with physics.

One thing, which Karianne also mentioned, is that different models have different interpretability depending on how the algorithm is designed. The horizontal axis here is how flexible, or how powerful, the model is. The more powerful algorithms, such as deep neural networks, have less interpretability, because they are so complicated that it is difficult to understand what is going on, while simpler algorithms give us more insight when we want to interpret the results. I think we should not stop once we have trained a machine learning model for a task and obtained a good model; that is not the end point, it is just the start. We want to understand what exactly the model learned and how to interpret the results. There is a lot of work in the computer science domain where, after training a model, people visualize the kernels and run statistical analyses on them to understand what each kernel and each layer has learned.
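One simple version of this post-training kernel analysis, assuming a trained Keras model whose first layer is a 1-D convolution over waveforms, is to look at the amplitude spectrum of each learned filter; the model, sampling rate, and weights below are hypothetical stand-ins, not the detector from the talk.

```python
# Sketch of inspecting what a first convolutional layer has learned, by looking
# at the approximate frequency response of each kernel.
import numpy as np
import matplotlib.pyplot as plt

def kernel_spectra(conv_weights, fs=100.0):
    """conv_weights: array of shape (kernel_length, n_in_channels, n_filters),
    as returned by a Keras Conv1D layer's get_weights()[0]. fs: sampling rate (Hz)."""
    k_len, _, n_filters = conv_weights.shape
    freqs = np.fft.rfftfreq(k_len, d=1.0 / fs)
    spectra = []
    for i in range(n_filters):
        w = conv_weights[:, 0, i]                 # first input channel
        spectra.append(np.abs(np.fft.rfft(w)))    # amplitude response of the kernel
    return freqs, np.array(spectra)

# Random "trained" weights standing in for model.layers[0].get_weights()[0]
weights = np.random.randn(15, 3, 8)
freqs, spectra = kernel_spectra(weights, fs=100.0)
for i, s in enumerate(spectra):
    plt.plot(freqs, s, label=f"kernel {i}")
plt.xlabel("Frequency (Hz)")
plt.ylabel("|H(f)|")
plt.legend(fontsize=6)
plt.title("Approximate frequency response of first-layer kernels")
plt.show()
```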
For example, in the earthquake detection problem, the kernels learned by a deep learning algorithm can give us insight as well. In one exercise we did, we found that different layers effectively filter the waveform into different frequency bands, and that some layers or kernels tend to suppress the noise while others amplify the signal. I think we need more of this kind of analysis after we have trained a good model and put it into production.

Another approach is to use synthetic data. Diego gave a very nice presentation on how and why we might synthesize data when we don't have enough observations to tackle a specific problem. There are also applications that use this approach to reduce the computational cost of simulations. For example, in work by Phoebe DeVries at Harvard and Brendan Meade, they trained a machine learning model on viscoelastic deformation calculations and improved the speed by roughly 50,000 percent, which is a very nice application: if a machine learning algorithm learns how to reproduce these simulations, you can run large-scale calculations much faster.

There are also ways to encode physical laws directly into the machine learning algorithm. We all know that machine learning is, at its core, an optimization: we compute the error between the model's estimate and the observations and then minimize that error by changing the weights. Diego and I were discussing that this is really a smart way to search the parameter space and reduce the error, and we can add terms to it that measure whether the machine learning estimate follows a physical law, thereby encoding the physics into the algorithm. The link here is to a paper published last year that uses this approach for lake temperature modeling. These are the different features fed into the machine learning algorithm. A plain machine learning model will estimate the temperature at different depths in a fairly unconstrained way, but temperatures in a lake actually follow known laws: given the density and the depth, there is an existing relationship. What they did is add a term to the loss function measuring whether the machine learning estimate is consistent with this physical law, so when we minimize the loss we are also training the model to follow the physics. The results are shown here, where the vertical axis is the error on the test data: with a pure physics-based approach you have no physical inconsistency but a relatively high error; with a plain artificial neural network you have relatively high physical inconsistency but lower error; and when you combine the two you get a very nice result, with lower error and predictions that are more consistent with the physics.
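A hedged sketch of this physics-guided loss idea: the usual data-misfit term plus a penalty whenever the predicted temperature profile implies water density decreasing with depth, which the physics forbids. The weighting, shapes, and the use of a standard empirical density-temperature relation are my own illustrative choices, not the exact formulation in the paper being discussed.

```python
import tensorflow as tf

def water_density(temp_c):
    """Empirical freshwater density (kg/m^3) as a function of temperature (deg C)."""
    return 1000.0 * (1.0 - (temp_c + 288.9414) * (temp_c - 3.9863) ** 2
                     / (508929.2 * (temp_c + 68.12963)))

def physics_guided_loss(y_true, y_pred, lam=0.1):
    """y_pred: predicted temperature profile ordered from shallow to deep,
    shape (batch, n_depths). Physically, density must not decrease with depth."""
    mse = tf.reduce_mean(tf.square(y_true - y_pred))
    rho = water_density(y_pred)
    # Positive part of (rho_shallower - rho_deeper) is the physical inconsistency.
    violation = tf.nn.relu(rho[:, :-1] - rho[:, 1:])
    return mse + lam * tf.reduce_mean(violation)
```

In Keras, a function like this could be passed directly as a custom loss when compiling the model, so that minimizing the loss trades off data misfit against physical inconsistency.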
There are also ways to respect the laws of physics by encoding nonlinear partial differential equations into the machine learning algorithm. I think I forgot to put the paper link on this slide; I will add it later. Let me also check the time to make sure I have enough. The example here is the Burgers equation, which is used a lot in fluid dynamics modeling, and the idea is to use machine learning to learn this partial differential equation. We have the potential field u, and we take its derivatives with respect to the input x, the position, and with respect to time. From this equation you can form a function that the machine learning algorithm can capture. If we use TensorFlow, the code essentially builds a network whose weights learn to mimic the u field from the inputs x and t; we then form another function that takes the derivatives with respect to the different variables appearing in the differential equation, and we add all of this into a loss function. That loss encodes both the error between the observations and the machine learning estimates and the differential equation itself, so when we train the algorithm it differentiates the network output with respect to the different variables, which effectively builds the governing physical equation into the training.

Here is another way to think about it, from another paper, "Discovering Physical Concepts with Neural Networks," which shows several examples of using machine learning to rediscover physics. For instance, take a very simple damped spring system, with a spring constant k and a damping coefficient b, whose governing equation is a differential equation. When they trained an artificial neural network on it, they found that, after training, individual neurons were correlated with these different parameters, so you can extract the equation from the trained network.
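The TensorFlow pattern described above can be sketched roughly as follows for Burgers' equation, u_t + u*u_x = nu*u_xx: a small network approximates u(x, t), automatic differentiation supplies the derivatives, and the PDE residual is added to the loss. This is my own minimal construction in the spirit of physics-informed neural networks, not the code from the paper.

```python
import numpy as np
import tensorflow as tf

nu = 0.01 / np.pi  # viscosity used in the classic benchmark

# A small fully connected network approximating u(x, t).
u_net = tf.keras.Sequential([
    tf.keras.layers.Dense(20, activation="tanh", input_shape=(2,)),
    tf.keras.layers.Dense(20, activation="tanh"),
    tf.keras.layers.Dense(1),
])

def pde_residual(x, t):
    """Residual of Burgers' equation, u_t + u*u_x - nu*u_xx, at points (x, t)."""
    with tf.GradientTape(persistent=True) as tape2:
        tape2.watch([x, t])
        with tf.GradientTape(persistent=True) as tape1:
            tape1.watch([x, t])
            u = u_net(tf.concat([x, t], axis=1))
        u_x = tape1.gradient(u, x)   # first derivatives, recorded by the outer tape
        u_t = tape1.gradient(u, t)
    u_xx = tape2.gradient(u_x, x)    # second derivative
    return u_t + u * u_x - nu * u_xx

# Collocation points where the PDE is enforced (random placeholders).
x_f = tf.convert_to_tensor(np.random.uniform(-1, 1, (256, 1)), dtype=tf.float32)
t_f = tf.convert_to_tensor(np.random.uniform(0, 1, (256, 1)), dtype=tf.float32)

# Total loss = data misfit on observed points + PDE residual at collocation points
# (the observed-data term is omitted here for brevity).
loss_pde = tf.reduce_mean(tf.square(pde_residual(x_f, t_f)))
```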
There are also other methods we have talked about before. Bayesian approaches, which our field has used for a long time, can incorporate a lot of prior information from physics; examples include the Bayesian finite fault inversion from Sarah Minson's work, and Bayesian detection in Stuart Russell's NET-VISA and SIG-VISA, which you can look up, where existing prior information, such as the known seismicity and where earthquakes have already occurred, is used to constrain the model. More recently there are approaches that combine Bayesian methods with deep learning to attach uncertainties to the kernels learned during automatic feature extraction. GAN models are also very popular these days: you have two models competing with each other, one generating data points from some distribution and the other deciding whether they are real or fake, more or less trustworthy. There is interesting work on adding physics into the generator, so the generator can help bring physics into the machine learning algorithm; there is a paper about this that I won't go into, but you can explore it if you want.

I will finish with a few forward-looking points. Karianne did a very nice job in her review paper laying out the different areas, so I just want to highlight or emphasize a few aspects. Benchmark data sets really drove the computer science community to build better algorithms and helped make deep learning as powerful as it is: with large-scale benchmarks like ImageNet, every year people compare how the error of new algorithms decays on the same data sets, so everyone can compare with each other. There are also a lot of new data sources, as Karianne mentioned, but we also need ways to fuse data together: good quality and bad quality, different types, and so on. It is often easy to use an individual data set, but how you combine them is a very interesting problem. For example, with MyShake I have worked on using smartphone data to detect earthquakes and do earthquake early warning; we launched a new version last week and got about half a million downloads, which gives us a lot of data, but how to combine these data with other types of data is another area we need to work on. Beyond that, there are new things we never thought of before: smart meters and smart sensors at home, in cars, in CCTV cameras, and so on, all providing new data and new challenges for the whole community. I will stop here, and I hope I have generated some heat to encourage more discussion on the limitations and on physics-plus-machine-learning approaches. Thanks.

Thanks a lot, Qingkai, this is great, and we very much appreciate you addressing these questions so well. I think there are a lot of good starting points for discussion here.
Let me ask a quick follow-up question myself while people collect their thoughts. You showed the lake temperature problem, where the machine learning approach with physics guidance produced the same adherence to the physics at lower levels of misfit, and I was wondering how that is possible, because there seems to be a missing piece in terms of different priors: did the physics-only model, say, enforce smoothness when it shouldn't have, or is something like that going on?

Yes, that is a very fair question and a good point. That work is not my own, so let me just summarize it based on my own thinking. What they used for training is a large set of observations of lake temperature together with different input features, for example the solar radiation and others. If you use a physics-based tool to model this, you capture the general behavior of the data, but there are also small-scale variations. I think the reason it works so well is that the machine learning helps capture some of those small variations that are not captured by the physics, while the physics constraint in the cost function makes the training more reliable, even with a relatively small data set, instead of needing a much larger data set or much longer observations to make correct predictions. That is why I think combining the two is powerful. Thank you.

All right, let's follow up with questions for Qingkai. First, thank you, Qingkai. My question is this: clearly we cannot do everything with machine learning, otherwise we would have to buy a lot of computers and, in general, generate a lot of greenhouse gas. Is there a simple guideline for geoscientists about what kinds of applications we should definitely pursue with machine learning, and are there examples where we probably don't need that kind of tool? Thank you.

That is a really good question. To be honest, I don't have such a list; I usually only think within my own domain. But I think Karianne's review paper, and our own review paper in Seismological Research Letters, lay out what kinds of problems have already been tackled with machine learning. Generally, if the problem has some pattern that is hard to capture with equations or with intuitive, hand-designed algorithms, it is probably a very good fit for machine learning. But if you look at most of the applications we are doing right now, they are still about automating a lot of tedious, time-consuming work. I think that is really important as a first step, but the later part, which is more useful to the science, is how you use these new automated systems to generate more insight in the science domain; for example, once we find all of these new smaller events, what do they tell us?

Other questions? Otherwise we will open it up to all of this morning's speakers. I have a question for Qingkai, and probably also for Zach and maybe others as well. You discussed transfer learning, and I guess one of the challenges we have in terms of understanding processes has to do with coupled problems; we briefly talked about landslides and cascading hazards as well.
But take a simple, or simply complicated, example like a volcanic arc above a subducting slab, and suppose you want to build a database, something like an earthquake catalog. You have two very distinct processes, shallow and deep, in the same data set; traditionally we separate one from the other and analyze the groups separately. Is there potential, with transfer learning, to actually make progress on coupled complex processes, like fluid transfer from the downgoing slab to the overriding plate in time and space, or are we still far away from these kinds of problems? I don't know how far you have experimented with more complicated data sets and transferring learning from one to another.

Maybe I will answer first, and Zach can add to it later. My experience with transfer learning is that you usually have a large data set and a well-defined problem; you train the algorithm and it captures different features from that data set, some of them general and some more specific. The specific features are tied to that one particular problem or task, while the general features may apply to different problems. That is why I think transfer learning works in this setting: when you have another problem, like the two you mentioned, with similarities but also differences, you can use one problem to learn the general features, apply those general features to the other, similar process, and let the more specific features be learned by the new layers you add. That can succeed and improve the results, but it does not always work. You see this in different kinds of applications: even in image processing, you can train an algorithm on one type of object, say dogs or cats, and then apply it to detecting cars, and it still works, because some of the features are simply finding the boundary of an object, and finding the boundary of a car is also useful before making the final decision. So in my opinion transfer learning is most useful when the problems share features that are similar to some degree.

Just to briefly follow up on this: if you are trying to do things like solid-fluid interactions or multi-hazards, we already know that at certain spatial and temporal scales some of those interactions will matter and at others they won't. So it seems like a really tough, tall order for a machine learning system to give a general answer, because we already know it is not going to work at certain spatio-temporal scales. And this brings me to a question I asked Karianne when she visited us a few weeks back: do you get good information about the misfit and the reliability of the extrapolation out of these algorithms? Could you at least use them to say, stop here, don't trust it anymore? I have a note here that Zach wants to follow up.
Getting confidence estimates out is a hard problem. It is something that is actively being pursued by the computer scientists. To some degree it depends on whether you can learn a distribution of the data, so there is some potential from what would be called generative models, where you are learning a distribution, or the parameters of a distribution, rather than discriminative modeling, which is more functional. Because it is a distribution, you can sample the space and quantify where you are, but this is still not an easy thing to do.

I would just add to that, because this is a very active area of research in the machine learning community. People are proposing solutions for that kind of problem, but I think it is still something that has not really been fully solved. Usually when you apply the normal methods, you get a point estimate out, and that is all you get; if you want more than that, it is going to take new techniques and methods, and that is an active area, but right now it is hard to get a measure of the uncertainty.

This reminds me of a question I wanted to ask Zach earlier and didn't really get a chance. One of your comments, Zach, when you were training the algorithm to detect P waves and S waves, and this comes back to the transferability-of-learning question, was that it was relatively straightforward to transfer the learning from detecting P and S waves in one region to another region, and you made the comment, or something like it, that this was easier to do than with the traditional methods, the standard seismic phase detection approaches we have been using for decades. Can you expand on what you meant by that? Obviously we can deploy a seismic network, use standard triggering-type algorithms to detect seismic waves and locate earthquakes, and then apply the kinds of approaches you were talking about this morning. Can you explain a little more where the advantages of the new approaches over the older ones are?

Well, just to be clear, when we talk about transfer learning it is a very specific thing: there is transfer learning and then there is generalization. Transfer learning specifically means taking a model that has already been trained on some data set, so its parameters have been learned, keeping part of that model, retraining the rest, and fine-tuning what was trained before to a different data set. The idea is that the original model can be trained in full to learn the core structure in the data, because you have lots of data, which might come from a model, for example simulations, and we expect the simulations at a very basic level to look very similar to the real conditions we are interested in. But the more nuanced structure of the data might be noise-dependent or things like that, and that is where the fine-tuning of transfer learning comes in.
Generalization, just to be clear, is different: you take the model as is and apply it to a completely different domain. And yes, we do see pretty significant generalization capabilities. It is not perfect, and there is still a lot we don't understand at all, but I have been able to take models trained entirely on data from Southern California, which is relatively shallow seismicity with obviously no subduction events, and they are still able to generalize to other tectonic regimes. I think the reason is, very simply, that you are looking at what is common to all of these P waves and what is common to all of these S waves: relatively crude characteristics, which we know come down to particle motion and things like that, and those are generalizable. A big part of where neural networks stop working is that they tend to be very fragile to weird conditions that change on them, and that is part of the concept of adversarial examples we touched on a little today: if you add a small amount of white noise to an image, you can push the decision completely over the boundary to the other side, and the model will interpret it with high confidence. We are seeing some of that for sure, but it is not exactly clear yet where those failures occur.

If I can push just a little more, I still don't understand the difference. Traditional network detection approaches, such as STA/LTA and picking algorithms and association algorithms like Binder, can obviously also be applied to any other region. So what is it specifically that the machine learning algorithms bring? In the Southern California example you are essentially extending the detection capability down by about one magnitude unit, and you would expect a similar effect in another region. I am just trying to understand whether there is something else these algorithms bring that you would not get by transferring the more traditional detection and association algorithms to other regions.

Okay, so basically, from my perspective: suppose you were to take an arbitrary data set, knowing nothing about it, not even whether there are earthquakes in it. There are a couple of aspects to this. The computational infrastructure you need in terms of software is a giant stack of stuff: libraries, manuals, reading through all of it, learning how to run these software platforms, and then knowing how to tune their parameters so that the resulting catalog is appropriate. That is a totally non-trivial exercise with the standard tools, STA/LTA, Binder, all of that. The advantage of the machine learning approach is that in theory you can have more end-to-end solutions, where you encode simple rules: for our associator, for example, we just give it examples of associated events and it learns to do the association. Once I generate the training data set, it is a small amount of code, it is not a complicated exercise to learn to run, and there are very few sensitive parameters, because the system largely optimizes itself to get the best performance. So that's my take on it.
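For readers less familiar with the traditional baseline being contrasted here, a minimal STA/LTA trigger might look like the sketch below; this is a simplified stand-in (real deployments typically use ObsPy's STA/LTA routines and a separate associator such as Binder), with made-up window lengths and synthetic data rather than anything from the panelists' workflows.

```python
import numpy as np

def sta_lta_trigger(trace, fs, sta_win=1.0, lta_win=30.0, threshold=4.0):
    """Return approximate sample indices where the short-term/long-term
    average energy ratio exceeds the threshold.

    trace: 1-D array of waveform samples; fs: sampling rate in Hz.
    """
    n_sta = int(sta_win * fs)
    n_lta = int(lta_win * fs)
    energy = trace.astype(float) ** 2
    csum = np.concatenate(([0.0], np.cumsum(energy)))
    sta = (csum[n_sta:] - csum[:-n_sta]) / n_sta      # short-term average
    lta = (csum[n_lta:] - csum[:-n_lta]) / n_lta      # long-term average
    sta = sta[n_lta - n_sta:]                         # align window end points
    ratio = sta / np.maximum(lta, 1e-12)
    return np.flatnonzero(ratio > threshold) + n_lta  # approximate absolute indices

# Tiny synthetic example: low-level noise with a burst of "signal" in the middle.
fs = 100.0
trace = np.random.randn(int(60 * fs)) * 0.1
trace[3000:3200] += np.random.randn(200) * 2.0
picks = sta_lta_trigger(trace, fs)
print("first trigger sample:", picks[0] if picks.size else "none")
```

The point of the contrast in the discussion above is that even this simple trigger needs hand-tuned windows and thresholds per deployment, whereas a learned end-to-end system moves that tuning into the training data.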
One quick question, which is a follow-up to that. Have you tried applying this where you don't just have shallow events? There are not that many earthquakes in Cascadia, but there are some where you get deeper earthquakes. I'm thinking in particular of the associator; that would presumably be where there would be more difference.

So, I haven't talked about it, but our associator is actually trained entirely with synthetic data. We generate the training data entirely from a model: it basically learns to recognize wave fronts from this synthetically generated data, and we apply it directly, in situ, to the real data, and it works.

What I was thinking is that the way the wave front moves through the network from a slab event will look different than from a shallow crustal event.

Sure, so the question would be how much you have to retrain, or whether it is actually picking up principles that apply to both. We train it separately for every region we apply it to, but that is a one-time thing you have to do; it only takes a few hours of compute time to get there, and obviously we don't acquire data at that pace.

And then one quick question on this transfer problem, which is a little more general. Qingkai mentioned transferring some layers and retraining other layers. How do you decide which layers to transfer and which layers to retrain? Is that a trial-and-error process, or is there something that actually comes out of the algorithm, some sense of where within these multiple layers the information is actually being extracted?

Yeah, go ahead. Okay, that is actually a very good question for the deep learning approach. When you train on a specific problem, the first few layers learn general features, and as you go into deeper and deeper layers, the features become more specific. To use the cat-and-dog example, the first few layers may just do boundary or edge detection of the cat and the dog, and the later, deeper layers may pick up the eyes, the ears, the colors: the features more specific to the different types of objects. That is why, when you do transfer learning, you want to reuse the general features that are similar between problems: you keep the first few layers and transfer them to the other problem, then add new layers at the deeper levels to learn the more specific ones, and those will capture the features oriented to the more specific task or the new objects.

Can I ask another follow-up question? I remember recently reading an article, maybe in Science or Nature, about an autopilot that was not able to recognize a stop sign because there was a sticker on it, so it read it as a 45-mile-per-hour speed limit sign instead. My question is: when you train on your seismic data, could there always be some kind of new condition that comes up, like a clock error or just something in the data set that you were not expecting, so that because of that new condition you totally misidentify something? And because you are generating millions of detections, you have a very limited chance of actually finding that afterwards.
Yeah, I think this gets back to what I was just talking about with adversarial examples. This is exposing a fragility of neural networks that is very well known at this point. It started about five years ago, when people recognized this flaw in these systems, and it led to a back-and-forth in which people came up with a way to avoid it and then someone, basically a hacker, figured out how to exploit it again. It now looks like there are even theoretical arguments for why you may never be able to fundamentally deal with these kinds of shortcomings. So yes, this is a big security issue in a number of domains; the self-driving car example is one of them, because if you cover part of a sign with a sticker in just the right way, you can trick the system into thinking the stop sign is some other sign telling it to keep going, a green light. This is very serious.

I have actually been working on adversarial examples in this domain, and I do think it is a problem. It is not that someone is necessarily going to attack these systems, but it does show you that they are very fragile. Coming from a math background, I think of a neural network as just a function, a mapping from your data space to some target space you are interested in. With neural networks these mappings are very complicated: all these different layers learning features, in high-dimensional space. You train with a lot of data, but you are in a very high-dimensional space with a huge number of parameters. In many ways these functions do well, and you can regularize them so they give you nice results, but because you have a really complex function in a high-dimensional space, they are not always well behaved. The adversarial examples are why I am interested in this: to me they raise concerns. It has been known in the community for a long time that these methods are this sensitive, and Zach mentioned that there are methods to try to work against it, but it is still very difficult to show, or to train, a network that does not have this property that you can perturb the input just a little bit and change the result dramatically. Some people are trying to build defenses that do not hurt performance, but in some cases they do, and many of these defenses can only give guarantees for very restricted kinds of perturbations; for instance, you may be able to show that adding Gaussian noise cannot change the predicted class, but nothing stronger. So it is an ongoing problem, and it is something people like me are starting to think about and work on, but it is to some extent a fundamental aspect of deep neural networks. Some of the less complicated methods can also be attacked, though it is more difficult.
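The kind of small perturbation being described can be constructed deliberately; one standard recipe is the fast gradient sign method, sketched below with a hypothetical, untrained placeholder model rather than any of the panelists' detectors.

```python
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(400,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),   # e.g. earthquake vs. noise
])
loss_fn = tf.keras.losses.BinaryCrossentropy()

x = tf.convert_to_tensor(np.random.randn(1, 400).astype("float32"))  # one "waveform"
y = tf.constant([[1.0]])                                              # its true label

with tf.GradientTape() as tape:
    tape.watch(x)
    loss = loss_fn(y, model(x))
grad = tape.gradient(loss, x)

epsilon = 0.01                               # small, barely noticeable perturbation
x_adv = x + epsilon * tf.sign(grad)          # nudge each sample against the label

print("original prediction: ", float(model(x)[0, 0]))
print("perturbed prediction:", float(model(x_adv)[0, 0]))
# With a trained model, even a tiny epsilon can push the prediction across the
# decision boundary; the untrained placeholder here only illustrates the mechanics.
```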
One more thing, going back to Richard's question about what the advantage of this is over STA/LTA: I think one of the advantages of methods like STA/LTA and other traditional methods is that we do know, or at least have estimates and ideas of, when they are going to fail, and for a lot of numerical techniques we know how accurate they are. That is still a disadvantage of machine learning methods that this community needs to be aware of, and it is also an active area of research. I think geoscientists need to be more aware of it and think about it as we apply these methods.

All right, I think this is probably a great time to end, and we can discuss a little bit more over lunch. I'd like to thank Qingkai for his great presentation, along with all of this morning's speakers; I hope they are going to stick around. We are going to break for lunch for one hour and reconvene at 1:45. Thanks, everybody.