So we're very happy today to have Bin Yu, who is a professor at Berkeley, to speak to us about interpreting deep neural networks towards trustworthiness. Bin is a very distinguished scientist, and it would be a lot of work to read out all the honours and positions she has held, but just to give you a flavour: she is a distinguished professor at Berkeley and leads a group there on statistical and computational developments in machine learning. She is a member of the National Academy of Sciences and is on the advisory board of the UK's Alan Turing Institute. She has held lots of fellowships and honorary doctorates, so we are extremely honoured to have you. Bin's talk will not be the full hour but a little bit shorter, and after that we are having a panel discussion with three academics working in related fields: Heather Harrington from Oxford, Chris Holmes from Oxford, and Simon Tavaré from Columbia University. The idea of the panel discussion is very relaxed; we hope the panelists will kick off a nice, exciting question session with Bin. Bin, go ahead.

How long do I have? Because we've lost ten minutes to the tech problem. I think it is now 8:15, so something between 30 and 45 minutes, and then we will just have to reduce the panel time. Shorter would probably be better, to allow more time for discussion, if that works. So I'll give an abridged version of the talk. Very good.

I like this quote from Bill Gates: AI is like nuclear energy, both promising and dangerous. And data science is really at the heart of AI, with a lot of data-driven algorithms. I see data science as a life cycle with many, many steps. Most of the time we concentrate on modeling and algorithms; I think now, with very advanced computing power and software, we need to pay more attention to the other parts of the data life cycle, especially the steps before and after modeling, which are pretty arbitrary to some extent, with human judgment calls. So with humans at the center, we work closely with scientist collaborators. And I think you already have the General Data Protection Regulation from 2016, or something new being discussed, as I understand, so I hope you will lead the way into interpretable machine learning.

Here are just a few examples where interpretable machine learning is needed. I will go to the cosmology problem for this group, but there are many others in medicine and biology; everyone wants to interpret machine learning, and especially deep learning, which seems to be a big, complex black box. I'd also like to take a moment to define scientific machine learning. Many people have used the term, but I haven't seen a clear definition, so here is one: use machine learning for scientific research, for knowledge discovery, and also build scientific principles into machine learning, iterate on both steps, and push both fields forward. I think the results of scientific machine learning should be subject to higher, scientific standards and have to be supported with open-source, reproducible software.

About a couple of years ago, my group put our heads together and tried to define what people mean when they talk about interpretable machine learning. We came up with a definition; I think the most important bit is really the emphasis on the human aspect when we're interpreting machine learning.
We also ask that the model authentically captures structure in the data, checked through predictive accuracy or prior scientific knowledge, that it can be explained in human narrative language, and that it is relevant to a human audience and to a problem. Often this relevance also has a time-constraint component: if you are a doctor you want simple rules, while for preventive medicine somebody can probably take a longer time to look at a model and interpret it.

This is the typical trade-off picture, which doesn't always hold; I'll show you that we can improve both predictive accuracy and interpretability in two scientific collaborations. But most people think that deep learning has good predictive accuracy but not good interpretability. Most of this talk is about post hoc interpretability: you have already fit the model, hopefully you already did a reality check, and then you try to interpret it. There are two kinds: global interpretability, feature importance across different cases; or you look at individual cases, what really drove the prediction for this patient to get a positive prediction for cancer.

I have trustworthiness in my title. Where does trust come from? There are two important aspects. One is that we have scientists, good scientists, who help us check whether the interpretation makes sense from scientific principles. The other is the data science life cycle, whether we have a quality-controlled process, and I'll end with something I've been talking about for the last couple of years called PCS (predictability, computability, stability), a framework for veridical, that is truthful, data science.

For the rest of the talk, I'm going to start from a pretty generic machine learning method we developed for sentiment analysis. Then I pushed my students to see how it works in cosmology. In the cosmology context, we were brought by the scientists to the frequency domain, and we thought, well, let's do a local Fourier transform, let's do data-driven wavelet distillation for that case. So the methodology got external validation in a scientific field: our original method worked okay, and we got something better. Then we took on another project, already in our group, on cellular biology, to show that this adaptive wavelet distillation really reduces the number of parameters tremendously and works well in another setting. And I'll end with the data science life cycle idea, through quality control.

A couple of years ago my student James Murdoch was really spearheading this project. The many, many interpretation methods out there are basically perturbation driven: you do a local perturbation, like a gradient, or some kind of ablation; all the interpretations are perturbation driven. But what was lacking was an importance measure based on more than one word's marginal contribution. Say you want the compositional importance of "not good", not just of "not" and "good" separately, which already existed. James came up with the idea that we can decompose the prediction. For example, suppose the input is "not very good", which is negative in sentiment analysis. We decompose the hidden state into two parts: the contribution of the phrase of interest, say "very good", and the contribution of everything else. Marginally you have a measure already; you trace the decomposition through the trained model, an LSTM, keeping a kind of additivity (the formulas are actually quite involved), propagate it upward to the label, and you end up with two measures, one for the relevant part and one for the irrelevant part.
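A minimal way to write down the additive bookkeeping described here, omitting the LSTM gate linearizations that make the full formulas involved: every hidden state is split into a contribution from the phrase of interest and a contribution from everything else, and the split is carried through the final linear layer to the logit. This is a sketch of the idea, not the full contextual decomposition recursion.

\[
h_t = \beta_t + \gamma_t, \qquad
\text{logit} = W h_T + b = \underbrace{W \beta_T}_{\text{relevant score}} \;+\; \underbrace{W \gamma_T + b}_{\text{irrelevant score}}.
\]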
Later we generalized this, with Chandan Singh joining, to the image domain. So this is one example: at the bottom you can see all the importance scores, and the shading tells you how positive the different pieces are; you can see where things flip and flip back, so this gives an interpretation of how the algorithm works, and we can do the same thing for images.

We ran two small-scale human experiments to show that our measures, on three pretty typical data sets, overall perform better. The first compares models: you take a good model, perturb it in some way so it gives the same labels but is a bad model, and you give the interpretations from four different methods to human subjects and ask, for this sentence, which model do you trust; that is, whether people can use the interpretation and their own knowledge to correctly tell which model is better. Across good and bad models, we came out ahead. The second experiment is just about trust: look at the interpretation and say which one you trust. There, ACD also came out ahead; it is hierarchically organized, so we take these different measures and organize them hierarchically, like a cluster tree, to make them more interpretable, as I showed you for that sentence. And with a visiting student, Laura, from Denmark, we also fed the interpretations back to reduce the reliance on noise features and on features humans know are undesirable, like gender; so you can feed back the interpretation and revise the deep learning model.

Now to cosmology, which is probably more interesting to this audience, and it was more interesting to me as well, but I have to say that starting with generic machine learning did give us a starting point for this problem. François, who was a postdoc at Berkeley and is now back in Paris, and my postdoc Wooseok joined us, and we tried to look at a model that François had already fitted, as I understand, to estimate Omega_m. For this audience I probably don't need to explain; it's a parameter about the matter content and origin of the universe, and you simulate what we can observe with this gravitational lensing technology. He had already designed a ResNet with 10 million parameters that worked the best at the time. We said, we have this cool machine learning technology, let us help you interpret it, and he said, well, we need to go to the frequency domain. So we generalized our method a little: using the Fourier transform and its inverse, we prepended the inverse Fourier transform to the ResNet model, so the input now lives in the frequency domain, and we can still interpret there. It was just a little bit of work, and we had it in the transform domain.

If you look at this frequency domain, each frequency band laid out at the bottom gets a number, because we can do compositional importance, and you can see that around 0.2 to 0.3 you get the most importance, which seems to agree with cosmology knowledge; it also shows that the high frequencies are not as important, and that could be an artifact coming in from how the images were simulated. This is just a proof of concept to show why we went for our method instead of the existing methods: in a very simple case we did better, because the others were not amenable to this compositional importance. And this is the iteration I mentioned for scientific machine learning: we went in with a statistical machine learning method and were guided to the frequency domain by cosmology.
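A hedged sketch of the move described here: prepend the inverse transform so the trained network becomes a function of the frequency-domain coefficients, then attribute the prediction to those coefficients. The attribution below is a simple gradient-times-input stand-in rather than the contextual decomposition used in the talk, and all names and shapes are illustrative.

```python
import torch

def frequency_importance(model, x):
    """Attribute a trained network's output to Fourier coefficients of its input.
    `model` maps real images of shape (batch, channels, H, W) to one scalar per image
    (e.g. an Omega_m estimate); this is an illustrative sketch, not the original code."""
    z = torch.fft.fft2(x).detach().requires_grad_(True)  # inputs now live in the frequency domain
    recon = torch.fft.ifft2(z).real                       # prepend the inverse transform to the network
    model(recon).sum().backward()                         # backprop through the network and the inverse FFT
    scores = (z.grad.conj() * z).real                     # gradient-times-input importance per coefficient
    return scores                                         # radially bin these for one score per frequency band
```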
Now, in the context of cosmology, we wanted to develop new methodology inspired by the Fourier transform: let's go for a local Fourier transform, which is a wavelet. People in cosmology were already using fixed wavelets on this same problem, and they did worse than the ResNet. I had wanted to relate wavelets to deep learning for a long time, so this was a good opportunity to do so, and Gokul joined because we had a cellular biology problem to externally validate this AWD method. So now we're using machine learning to solve the problem, and hopefully also another problem, as you'll see.

You can think of this as distillation, Hinton et al.: you distill a complex deep learning model into a smaller one. Maybe you want decision trees; whatever the context drives you to, make some interpretable model. It's not just interpretable but hopefully also fits the science. There might be specific forms; in genomics, decision trees actually match thresholding behavior in biology. And you might get more mileage out of it, because now you have an analytical match between your mathematical entity and the reality. So what we set out to do is to learn a data-driven wavelet transform, as a generalization of the Fourier transform, and to build in interpretability and efficiency.

Wavelets are hopefully pretty well known to this group; there are many good books. I worked on wavelets 20 or 30 years ago. There is elegant mathematics, harmonic analysis, very useful engineering applications, and also a connection with neuroscience. That connection is very clear from the physiology: in the late 50s people did these experiments, Hubel and Wiesel, and you get Gabor or wavelet-like filters; and you see today, even with unsupervised or dictionary learning, what AlexNet's first layer looks like, which matches the V1 area of visual cortex.

So here we try to learn the wavelet transform in the blue box. We make it orthogonal so there is also an inverse, and we use the same kind of approach as we did for the Fourier transform. But now we don't know what's in the blue box in advance; before, we knew the Fourier transform. The big neural network is the ResNet with 10 million parameters. We designed a new loss function with three parts, and Wooseok and Chandan really came up with a great idea. The first part, the reconstruction loss, makes sure the wavelet transform is more or less invertible. The second is the wavelet loss: h and g are the low-pass and high-pass filters given a mother wavelet, and the literature tells you how those filters should behave. The last part, the interpretation loss, depends on the ResNet. We didn't use our CD here; we actually ended up using just a gradient-based saliency, because the computation was too heavy for us. And then we add an L1 penalty. So we do use the fitted ResNet, the deep learning, in this loss function.

How did we get the second term? We looked at the books; there are many, many equations these low-pass and high-pass filters have to satisfy. We collected them, not all of them, and turned them into a loss function: take each equation, form the difference of its two sides, square it, and add them up into a wavelet loss. So the problem becomes a search over h and g, with a penalty so that the learned filters behave like a proper wavelet. And as I said, the last part of the loss function is the interpretation loss. As a proof of concept, you start with a ground-truth wavelet on the right, and the distilled wavelet estimated with our machine learning method looks very similar.
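Here is a minimal sketch of that second, wavelet-loss term: collect a few textbook identities a low-pass filter h and its quadrature-mirror high-pass partner g must satisfy, and sum the squared violations. The particular identities and the filter handling are illustrative, not the paper's exact collection, and the full objective would also include the reconstruction term and the attribution-weighted L1 interpretation term.

```python
import torch

def wavelet_loss(h):
    """Sketch of the 'collect the filter equations, square the violations' idea.
    h is a learnable low-pass filter of even length; g is derived as its
    quadrature-mirror high-pass partner. The identities listed are illustrative."""
    n = h.shape[0]
    signs = torch.where(torch.arange(n) % 2 == 0, 1.0, -1.0)
    g = signs * h.flip(0)                                    # g_k = (-1)^k * h_{n-1-k}
    loss = (h.sum() - 2 ** 0.5) ** 2                         # low-pass filter sums to sqrt(2)
    loss = loss + g.sum() ** 2                               # high-pass filter sums to zero
    loss = loss + ((h * h).sum() - 1.0) ** 2                 # unit norm
    for k in range(1, n // 2):                               # orthogonality of even shifts
        loss = loss + (h[2 * k:] * h[: n - 2 * k]).sum() ** 2
    return loss
```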
This is a different run: here the ground truth is db5. If you start the program randomly you get something quite close; it's not as good if you start from sym5, but the recovered wavelets are still pretty close. So this is just a very simple proof of concept.

Now back to cosmology. This is a very busy slide; let's take some time to digest it. People in cosmology already use wavelets in something called the peak counting method, so I won't talk about that in detail. We only replace one step in this peak counting: starting from the ResNet that François had already designed, we use our method to distill a data-driven wavelet and feed it into the peak counting method. Look at the bottom table, the first entry. This is prediction error, which you want to be small, in units of 10 to the minus 2; it is a root-mean-squared error, the square root is taken. The ResNet gets you 1.15; we get about a 10 to 15 percent reduction, and the other choices, fixed wavelets, don't do as well. The last one is interesting: if we use only two of the loss terms in our objective, not the ResNet part, you do worse. So if you just learn data-driven wavelets without the ResNet, you don't do as well; this really shows the contribution of the ResNet. Deep learning is great at search: through efficient search you get to a good place in the space, then you take the fitted values from the ResNet, that's the distillation idea, feed them into the loss function, and teach the student, which is the wavelet. And we get roughly a 10,000-fold reduction, from 10 million parameters to about 1,000.

If you look at the wavelet search, we have two tuning parameters chosen by cross-validation, and what we find is this symmetric mother wavelet, which really reflects what the model is trying to do: take the difference of mass in the original space, where large densities are the peaks, and contrast it with the surrounding areas. That seems to be most predictive of the Omega_m parameter, and at different scales. So this is general machine learning methodology helping to solve the original problem better: we increased interpretability tremendously and increased predictive accuracy at the same time, so we didn't follow that trade-off curve I showed you; we were able to increase both. My intuition is that for this problem, wavelets, together with the physics, are a better analytical match to the problem, but deep learning gives us the power to search, to get to a place we couldn't reach directly. So we're combining both worlds, wavelets and deep learning.

Now, I always want another problem to validate things. This has become a general methodology that solved the specific problem pretty well, actually the best we knew of, and I wanted to see whether it gets external validation on another scientific problem; this is the iteration I talked about at the beginning. So this is a problem we were already working on with Gokul from the Berkeley Advanced Bioimaging Center. It's a very central cellular process called CME, clathrin-mediated endocytosis. The line is the cell membrane; below it is inside the cell, above it is outside. This is a very central process for the cell to get food, nutrients, and other things inside.
You can tag some of the proteins inside the cell and watch: the membrane starts to curve, there is a protein, actin, involved in cutting it off so you get a cargo vesicle in, and then something called auxilin comes in to uncoat the cargo, and the CME event is completed. But this process doesn't always complete. For cell biologists this is such a fundamental process; you could imagine that maybe cancer cells do this process differently, though I'm just speculating. So you need to observe this process and know when it finishes. There is the clathrin channel, on the left; this is video observed at a very small scale. We also have auxilin in the green channel, which is more expensive to image. The problem we try to solve is the molecular partner problem: can we predict the green from the orange?

You could use deep learning directly on these very fuzzy images, but in the beginning we only had very small data sets, so we did a lot of human feature engineering and looked at things; we used a lot of understanding of the microscope, the Gaussian noise, all of that, to extract time series. When we had more data we could make an LSTM work, and it gave the best performance. This is R-squared on the test set, so you want it to be large, and the LSTM gave the best result, 0.23. But if you apply the same AWD methodology, we get a better result, about a 10 to 15 percent increase, to 0.26. If you don't use the LSTM you don't get that increase: the last number tells you that data-driven wavelets without deep learning give a result similar to the LSTM, though still better than fixed wavelets. So with this distillation you get the best of both worlds: you could have just done data-driven wavelets without the deep learning part and gotten something similar to the LSTM, but combining them you get about 15 percent more, and instead of a huge number of parameters, which is not really interpretable, we have only about 30 parameters, or 30 features, to look at.

Similarly, you can look at the trade-off between the two penalties, and here the wavelet we picked is not symmetric: it has to have a build-up, which we already knew from looking at many, many images. You need to see a big build-up of the orange signal to indicate that the cargo is more likely to be successfully transported into the cell, which is indicated by our green signal. So this also becomes interesting, again at different scales.

The last bit: the first part of the talk was really about fitting and then interpreting, right? We didn't look at the upstream data-cleaning part. For the cosmology problem the data are simulated, so you don't have that problem, and for the second problem we just used the pre-processing pipeline that already existed; we had trust in our scientific collaborators, so we didn't worry about it. But for many other problems I think we have to go much more upstream and look at the whole process. This is the paper that came out in 2020 with my former student Karl Kumbier, who is now a postdoc at UCSF. For trustworthy AI, or data science, this approach really aims at quality control of the data science process, to maximize its promise. I find it very helpful, for myself and also in teaching, to separate reality from the representation of reality, which is the data, and from the models, which I call mental constructs; they don't have to reflect reality at all.
How do you set aside the test data? That is determined by how you're going to use the algorithm. If you have data from multiple hospitals, you should set aside a different hospital as the test set instead of doing a random split, because you want to think about how the algorithm will be used: it will be used at a different hospital. And for the data cleaning, if you really want to be careful, to really take into account all the uncertainties, we should have the test, validation, and training data cleaned by different people, to reflect the uncertainty in cleaning.

So veridical data science is defined as reliable, reproducible information extraction from data, with an enriched technical language, which is very much needed in deep learning now to describe what's going on with the geometry in high dimensions, to communicate and evaluate empirical evidence. In general I think we are too much into algorithms; we should be looking at the empirical evidence in terms of domain knowledge and human decisions. We should really think of this like old-style statistical quality control: it is a process that needs to be quality controlled. PCS really tries to unify, following up on Leo Breiman's two-cultures work, to make one culture out of machine learning and statistics and push both forward. And we have this nice figure from Rebecca, my co-author on the book I'm finishing, using PCS.

So let's get on the path to PCS through stability. At the time I had the opportunity to give the Tukey Memorial Lecture, so I tried to connect data perturbation with model perturbation, basically robust statistics, unify them through stability, and call them reasonable perturbations. It's really a no-brainer to us, I hope. If you want to be highbrow, you can quote Plato, who said that true opinions are a fine thing as long as they remain, but they are not willing to remain long, so you need to tie them down, and that's knowledge. So this means stability: anything you want to talk about, to name with language, has to be stable over a certain period. Language evolves, but you still need some stability for a multidisciplinary team to work together. The same English word should mean the same thing to multiple people; "matrix" is also used in cancer research and means something very different from what we mean.

I ran a project to show that human judgment calls are just so prevalent in our work. Three teams of students at Berkeley in 2015 each had a doctor collaborator from UCSF, thanks to my collaborator Aaron Kornblith, and we looked at this traumatic brain injury data. The three teams cleaned the data in different ways; one team removed about 23 percent of the data in cleaning and declared that their cleaned data gave the best result. People made different plots: for the same one-dimensional data, a violin plot and a box plot give very different visual impressions. They used different interpretable models, logistic regression or trees, and different performance metrics; it should be sensitivity and specificity for this problem, but some teams just went for a prediction loss function anyway. They used different data perturbations and different proposed metrics, and they arrived at different conclusions. Another published paper had 29 teams analyze the same data, and they got very different results. So this is something we really have to confront, instead of pretending it isn't there.
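A hedged, generic sketch of the kind of check this motivates (not the group's VeridicalFlow package): run the same modeling pipeline over several plausibly cleaned copies of the data and see whether the headline conclusion survives the cleaning judgment calls. The data names, the model, and the summaries are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def cleaning_stability_check(cleaned_copies):
    """cleaned_copies: dict mapping a cleaning description to an (X, y) pair.
    Fits the same model on each copy and reports test accuracy and coefficient signs,
    so cleaning-sensitive conclusions become visible. Illustrative sketch only."""
    report = {}
    for name, (X, y) in cleaned_copies.items():
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
        model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
        report[name] = (model.score(X_te, y_te), np.sign(model.coef_[0]))
    return report  # disagreement across rows flags conclusions that depend on judgment calls
```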
We have to take into account the uncertainty in the data-cleaning step, even in visualization, to really formally account for all the uncertainties, not just the modeling stage with one set of clean data. I really advocate that we keep multiple cleaned versions of the data and run them through the pipeline to see whether the conclusions endure. And of course scientists are already ahead of us: they plot these nine climate models giving projected global temperature rises from 1.5 to 5.5 degrees.

The PCS paper has a section on perturbation intervals, trying to enlarge our inference to take in all these different perturbations, including over methodology. I have a paper on randomized trials for heterogeneous treatment effect estimation where we considered 17 different methods from reputable groups. There are so many choices people make, and cherry-picking means you undercount the uncertainty in your result. We did data perturbation analysis, using the initial randomization, in our scientific projects to see whether we got similar results, and documented it. Human judgment is such an integral part of our PCS framework that you document why you made your choices, why your data perturbation is reasonable, why you really believe the data could have been observed this way, because a different person could have cleaned the data; that's why one data perturbation is a different cleaned copy. It should be reproducible, with code in a Jupyter notebook or R Markdown, recording your domain knowledge and judgment calls, and put on GitHub.

Up to today my team has really worked using these principles. They are high-level conceptual principles, not a cookie cutter; they ask you to do critical thinking in the context of your problem. We now have many different projects, this framework works really well, and every time we do something a little different because of the context and domain knowledge. We are now producing two very good software packages. One is VeridicalFlow, to make this PCS data analysis easy; it hides a lot of the parallelization, building on software coming from Berkeley CS. We also have another one, simChef, for simulations in R: you use data to drive your models, maybe even multiple simulation models. You need this when you do parameter estimation; for prediction you can use a properly chosen test set, but for parameter estimation you have to have some simulation models, or theory, to justify the methodology. So I hope this becomes useful. We're now pushing forward to do some theory motivated by the way this works, and it would be nice to connect AWD with mechanistic models. We're also actively working with a hydrogen research center, with MD Anderson, and with the Joint Genome Institute, trying to bring some of these ideas into data protocols and pipelines, and also with a new climate science and technology center at Columbia; hopefully these ideas get brought into routine data science work. And I'm finishing the book with Rebecca Barter; it's with MIT Press. I'm sure some of you have heard me talking about this for two years, so it will definitely be finished this year, and we'll have a copy online when we submit it for print. Thank you.

Excellent, thank you very much. That was a very interesting lecture.
I think we should ask our panel members to very briefly introduce themselves and then ask Bin a few questions. Heather, shall we start with you?

My name is Heather Harrington. I'm a professor in mathematics here at the University of Oxford, and I'm also co-director of our Centre for Topological Data Analysis and affiliated with the Wellcome Centre for Human Genetics. Thanks for the really nice talk. At the very beginning, I think it was your second slide, you showed a plot of predictability versus interpretability; is that right, with those axes? Okay. We see this trend, and what I was interested in is: you gave a couple of examples, but where do you think there's the most opportunity to improve both interpretability and predictability? What fields do you think have the most room to grow there?

I think it's really where we can work very closely with domain scientists, to bring the science into the interpretability and also get the best prediction performance, as we did. I think that's where there's a lot of low-hanging fruit. What I'd call the first phase of machine learning is kind of done; the generic methods seem to give pretty good results everywhere. In the second phase, the most interesting thing for me is getting specific to the problem: use domain knowledge and think about matching the analytical structure. You can use deep learning as a big search engine, but don't end there; go further and bring in the domain knowledge. A lot of fields already do bring this into deep learning, but I really want us to end up with something very interpretable and simple, if you look at science. It's not really transmissible when you have these humongous models; the IT industry is going to these large models, just dumping a lot of things in. We might have amazing results, but how much computing is being used? I do see that those things have good predictive performance, and there is structure embedded there, but we have to distill it into some form that is more interpretable and becomes scientific knowledge. That's where you want good domain knowledge; that's feasible. So it's really about engaging in a significant way with human experts.

Can I ask one more question, is that all right, or do we need to go to the next panelist? Let's go to the next panelist and come back to you. Simon, I'm going from left to right on my monitor, actually. Simon, does that suit you? That's fine.

Hi, thanks very much, Bin, for an interesting talk; good to see you. I'm Simon Tavaré. I spent 15 years in purgatory in Cambridge, where I was in DAMTP and oncology, the last five years of which I was director of the Cancer Research UK Cambridge Institute, which is now a very well known cancer research place. I met Peter through a connection there, at the London Mathematical Society, so we've been friends for some time. In 2018 I moved to Columbia to start something a bit different: I started with a size of one instead of a size of 450, and that's the Irving Institute for Cancer Dynamics, where I'm now trying to build this operation. It's slightly different from other institutes because it's not in a hospital or a medical school; it's actually on the main Columbia campus.
Its aim is to attract people from physics, maths, statistics, biology and so on into problems in cancer research. We currently have about 35 people, including three endowed chairs and other researchers, and it seems to be going reasonably well. My research interests are primarily stochastic computation and cancer evolution, and some aspects of technology development, in particular single-cell whole-genome DNA sequencing, which is proving very interesting for understanding more about copy number variation. My question, as you might guess, is medically related in some sense, and maybe you've answered some of it already. How do you see, Bin, the prospects for these sorts of models or approaches moving from fundamental research to standard-of-care applications in medicine, where there is a big uptake problem for methods which are not, and that's why I say simple, easily interpretable?

Well, I think many people are working in this area trying to make it happen, but for these algorithm-driven tools to go to the bedside, I think the FDA will have to put clearer regulation in place about how we evaluate them and continue to monitor them. I would like to think there has to be some screening process for all these data-driven rules before they go into clinical decisions; that's the first thing, just like biomarkers, you need something. And then we need continuous monitoring of the process as new data come in. Ideally, I'd like a more thought-out process to ensure the safety of these decision rules or clinical decision instruments. In this area I heard about some very nice work coming out of Israel on preventive medicine; there I think we have less of a problem, because they are trying to get people to go for vaccination. That's a very different game, and there's a lot to be gained in the public health domain; there is equity, and the potential damage is very limited, just people's time and things like that. But for something crucial like chemotherapy, I would want to see that people know the risk, interpreted very clearly, and are willing to take the risk. And I think AI can also help there, because human knowledge and human energy are just limited.

Our group works with an ER doctor from UCSF. There is PECARN, a network of pediatric emergency doctors, and they have decision rules; they just put them on the internet, and some clinicians use them and some don't, so that's happening too, with the clinician taking full responsibility. The PECARN rules are more trustworthy because it's grassroots doctors getting together, and we stress-tested their clinical decision rules using our PCS framework in the past. With that, I think the FDA should take into account these existing grassroots efforts to make things reproducible. We talked with different teams of doctors, going variable by variable; we had validation data from another hospital when we did our stress testing, and they don't do things exactly the same way, so we went through variable by variable with the two teams of doctors to check whether they meant the same thing. So I think we just need a lot more engagement from domain experts for that to happen. Did I answer your question, Simon? Yeah, thank you. Thank you. Chris, you're next.

Hi Bin, great talk, thank you. For those who don't know me, I'm Chris Holmes, Professor of Biostatistics at the University of Oxford.
I have a joint appointment in the Department of Statistics and the medical school, and I'm also Programme Director for Health at the Alan Turing Institute. I was working out, Bin, that I've known you well over 20 years, probably 22 years since we first met, and it's been a joy following your work and our various discussions over the years. I have a particular question, because it's something that vexes me, something we've been thinking very carefully about, and it's around non-representative data in the context of algorithms or AI employed in medical science. To characterize that a bit more: at the population level, we know that the greatest medical burden falls on those who are most disadvantaged in society, and those sectors are the least represented in the data sets that we get. Rather than experimental regimes where we can set up very precise experiments, often when we're collecting data from medical records, the people who agree to sign up for medical trials typically come, in the UK at least, from, let's say, white middle-class populations. That means we're missing the part of the population where the algorithms have the greatest utility. So within your framework, and in particular thinking about the life cycle, how do we capture that key aspect of unrepresentative data when we're thinking about models?

Of course, I would put in my knowledge of the future data: just write a whole paragraph about how the data we have is very different from the future patients' data. Put it there. On top of that, I heard a talk recently by James Zou from Stanford, from his work with genetics, where he made it clear there's another reason minorities get excluded: a lot of the eligibility criteria for randomized trials are based on measurements from a mostly white population, and a minority group can have a normal profile that falls outside those intervals. They are very healthy people, but the intervals were not made for the minority group, so they get excluded even when they are willing to participate. That part, I would say, can be mitigated. But if they are just not in your data, then really describe it, a big paragraph about how different the future population is; I put that under my P, predictability.

Yeah, these are the people we want to reach, but our data... Or you can weight: when you do the evaluation, suppose you have a small group of minority people, you can try to upweight their results; that's another option. But if you have nobody there, you can only make a statement saying we know this is a problem. What else can you do? Or maybe you have certain genetic models to extrapolate something at the molecular level based on genomics; you have to use your model and your current data to do that transfer in some scientific way. But at least make a statement, I think, up front.

Yeah, and that kind of upweighting of course introduces additional instabilities, because your effective sample size is being dramatically curtailed. That's why I think we need this continuous monitoring, just like old statistical quality control on the factory floor. Quality control is never that we set this up and go away; it's a continuous chart, dynamic in time. I think that idea has really been lost in data science.
We need a chart of something that just runs with time, monitoring the quality continuously, but we don't have that mentality right now; we do one thing, we go away, and hopefully it lasts. I think we have to move away from that. It's really a control chart; we should start thinking of things as a process. Thanks.

If I may ask a question, Bin: the history of physics and mathematics is one of creating laws, and now suddenly in the last 10 or 15 years we see these vast, very complex problems being solved by machine learning, problems where perhaps the laws are very complicated, because there are too many particles involved or for other reasons. Do you think that the aspects of interpretability, and perhaps also the architecture of the learning circuits, will eventually form some sort of replacement for fundamental laws that gets transferred, or are we going to go for a case-by-case, data-set-by-data-set situation rather than searching for underlying truths?

That's a great question. I would think that in some cases we might end up with very well vetted algorithms in place of mathematics. Personally, I still like mathematics better than algorithms; I was trained as a mathematician. But I will concede that we might have this bionic type of thing, part mathematics and part very well vetted code, and in combination I think that's the future I will embrace, though I will still try to take those results and turn them into mathematics if I can. I will definitely use it when it's well vetted, because even with proofs now, we don't read all the proofs anyway, so we don't really have the grasp of the mathematics we used to; important results get vetted, but less important results are not vetted the same way. Still, I would like to try to distill that into something cleaner. Mathematics is pretty reproducible over the years, and code, the computing environment, might not be as reproducible, so for that reason I think we should always strive to get to mathematics; it's one language we know is very reproducible. But I definitely wouldn't shy away from using well vetted code that we understand as best we can, in combination with mathematics. I will definitely go there myself. But it has to be vetted, and what do I mean by that? That's the question.

Before returning to Heather, let me ask one question that was brought up by someone in the audience, after thanking you for the talk. The question is: is it possible to use deep neural networks to search for new kinds of wavelet filters, new families of wavelets? The motivation is that, since the classic wavelet filters were designed by mathematicians, perhaps there are some hidden ones still to be discovered that produce better results for certain problems.

Well, thank you for the question. That's what we did: what we found is actually data driven, not an exact classical wavelet. It uses deep learning, because we fitted a deep learning model to the data and then we distilled it; we basically take data from the fitted surface, and we can have an infinite number of such points, and then a data-driven wavelet comes out. So it's kind of what we're doing; I don't know whether you have something else in mind. The wavelets I showed you for these two different problems were not designed by mathematicians; they were designed by us.
So now, for every problem, you can have your own data-driven wavelets.

I think my question is: the traditional wavelets are designed to process different kinds of signals, like audio, images, or other data sets; they're quite universal. With the data-driven design of wavelets, maybe it can only handle this kind of data, and I don't know whether it can be robust enough to handle other kinds of data sets.

Well, that remains to be seen; I hope you and other people get into it and we'll find out together. The thing is, you can see that the data-driven wavelets work better; we did compare with the fixed wavelets people were already using, so there's an advantage. So the fixed ones are not as universal as you might think, because they didn't do as well as the data-driven one. The question is reproducibility, as with fixed wavelets. Maybe we can find other mathematical forms, inspired by the data-driven wavelets, that are not in the current collection of fixed wavelets. That would probably make you happy, because they can be characterized mathematically, but we need more experience. I would like to see a hundred such examples, so we can see some commonality of form and find mathematical functions that reflect it. Right now we have two cases, but they already show the power, I would say, of the data-driven approach over fixed wavelets. I always like to end up with cleaner mathematical forms if we can. I'm not there yet, but it would be really great if you tried it; our code is on the internet, we have good code, so give it a try. And it's not just deep learning: you can use a random forest too, because we just take the fitted function and distill it. The examples used deep learning, but it can be any other function you want to distill. Thank you.

Thank you. Now, Heather has at least one more question. Go ahead, Heather. Is there time? I know it's just after the hour. Yeah, we actually tend to run over quite badly in this seminar; we go as long as people are in the room and want to talk. Okay. So my first question was about interpretability; my second question is about trustworthiness and AI, which you discussed a bit. There's the perturbation approach, but I guess this relates a little to Chris's question about the unknowns, so it's not even biases but actual unknowns; how do you trust what's happening if you don't have those unknowns? I think you partially covered it in some way, saying we need to put a warning out, but if you have unknown unknowns...

Then we just have to close our eyes and trust the best we can do. I don't know that there's any guarantee in a situation of unknown unknowns, but that's why the process matters: if we follow the best process and use the best minds we have, then at that point we just have to close our eyes and march, until we have more information. I guess, yes, that makes sense. In the end, I cannot remove the human judgment call. The meta judgment call that smoking causes cancer was a human decision by a community of people at NIH, in 1963 or so, looking at 23 or 25 observational studies, and they made the decision that it's causal, and most of us believe it's causal. So we cannot escape it; it's a bit like Gödel's theorem. Eventually we just have to close the loop and hope things work out. I don't think we can get certainty, but we should vet as much as possible.
We should debate as much as time allows us, and that's why the process becomes important: at least we checked what we could check, and then we have faith. That would be my approach. Thank you. Yeah, Chris.

Maybe following up on that, talking about faith: how much faith do you have in confidence intervals, classical statistical measures of uncertainty, and p-values? I was struck, you had that plot of the nine climate models, and during the pandemic we had multiple epidemiological models, and one of the striking things was that they would give confidence intervals that often didn't overlap, which is kind of interesting in its own right. So how much faith do you have in statistical uncertainty quantification?

Well, in the classical setting, in some physics model where you have a Gaussian measurement model, I trust it, but in most of the other things I've seen, I don't. That's why in the PCS paper we have what we call PCS inference; we're still working on that. We have the epiTree paper that tries to do epistasis detection using the PCS framework, and what we did there was basically introduce a lot more variability. We didn't worry about data cleaning in that case, because it's UK Biobank data and we didn't have access to that step. Effectively we fatten the null hypothesis, and you get far fewer tiny p-values. I actually shy away from teaching p-values in my class, because I cannot keep a straight face most of the time, so I try to avoid them, and confidence intervals too. The problem is that we keep saying we do uncertainty quantification, somehow, but we have to have a way to evaluate how good we are at quantifying uncertainty, and we don't. For prediction we can do it through a properly chosen test set and get some coverage at a confidence level. That's why I want to enlarge the intervals to include data perturbations and multiple models: the idea of PCS perturbation intervals is to include all the reasonable perturbations, each with a little paragraph about why it is reasonable, put them together, and then use another test set to evaluate the coverage. It's very algorithmic, and it usually gives you much larger intervals. Or you build some good simulation model, and if you go the simulation route I would really recommend multiple simulation models on the same data, and take the more conservative one. We evaluated this in the epiTree paper on the red-hair phenotype: if you use the traditional p-value you get 10 to the minus 11, and I don't believe that; with the more robust PCS-style p-value you get 10 to the minus 4 or so, which I think is more trustworthy. But I think we have to start talking about how we evaluate uncertainty quantification. People seem to think that since you have some quantification of uncertainty you are fine; for me, if I don't trust my uncertainty quantification, I usually don't give intervals, because they can be misleading.

Talking about COVID: my group jumped in, 12 or 13 of us, for two months, two years ago. Those agent-based models, people thought they could predict months ahead; I didn't believe it, and I was right. We could only do about 7 to 10 days ahead. How did we do the intervals? We took the residuals and used them to give a magnitude, and the coverage was checked empirically, because the pandemic, for better or worse, keeps going.
Every day you have a new data point. Of course there is a data quality issue with the counts; case numbers now are just completely undercounted. So I think we need to talk about how we evaluate uncertainty quantification; that's the discussion to have now. "Oh, I have a quantification": if I don't trust it, I think I'd rather not have it.

I agree. I think it's about, again, whether your uncertainty quantification is verifiable, scientific in a way: is there an experiment to verify it?

My view of confidence intervals is that they're never going to be precise; they just give me a ballpark and should never be taken that seriously, it's the ballpark magnitude. Between 95 percent and 80 percent, I don't know how humans can really differentiate, but the magnitude I think is still important, and a lot of the time, if your standard deviation is captured, that's in there already. When uncertainty quantification gets into very precise numbers, I just think it's overkill. That's why we have PCS perturbation intervals. It's just a beginning, and we haven't worked it out completely either, but I definitely like it better, simply because it's bigger. One thing that really bothers me a lot: we have these Gaussian approximations from the central limit theorem, and the error, the Berry-Esseen bound, is of order one over the square root of n. So when you report a p-value of 10 to the minus 11, for that bound to be small enough you basically need 10 to the power of 22 samples. So even mathematically we're not kosher there, but somehow we got used to it and we just keep using it. The number you take from the normal table is beyond the precision of the mathematical result; it just doesn't make sense to me. Exactly. Yeah, even if everything holds mathematically, it's not to be done.

We're transforming things to normality as well, through logit functions or other things, and then you're even worse in the tails. And if there's dependence around, you end up with rates that aren't like one over the square root of n; they're one over the square root of log n in my world, and that really kills you. This is why you can't estimate anything in genetics, roughly speaking. Yeah, that's why we need faith, right? Well, you need faith to be a statistician at all.

Can I ask, if we have time, just a quick question at the other end of the spectrum: where do you put this sort of veridical thinking into undergraduate statistics, or don't you? You know, that's why we're writing this book. The book we're finishing uses very, very limited math and really tries to cover the whole pipeline, and we ask students first: what do you mean by a random variable, what in reality does your X map to? I taught X for 25 years without blinking, in all my statistics courses, but now I introduce random variables very, very carefully, because when you introduce a random variable you're making an assumption, a stability assumption. If you only care about the data in hand, you don't need a random variable; why would you need one? At a minimum, you need to imagine another situation, another set of data, in the future or the past, whatever can be collected, that can be viewed as a realization of the same random variable. You tie them together, and there is a stability consideration in that. Why is the next data set not called Y, but also called X?
This took me 25 years to realize, and getting into causal inference helped me think carefully about assumptions. So in the book we introduce random variables very, very late. We talk about evidence, we talk about using prediction error, about the future, and all of that is very qualitative and narrative. We just say: think. There is no magic here, no magic formula you plug everything into and everything's fine. You take the responsibility, in your documentation, for why this thing is being done. With a random variable, you had better start imagining the physical situation in which another set of data, another realization, could be observed. Can you be there? Right, like Freedman's shoe-leather work: go there, observe. I remember when I visited you many years ago you told me about this 3D assignment; I didn't go into it, but think about the cell, how dynamic it is. That's it, think about it: if you have cellular data, try to imagine yourself in the cell and see what you see. I definitely try to do that, with varying success. Some students really get it, because they tell me they used it in their job interviews; at least it has that utility.

My graduate students, I think, get a whole lot more intensive training in PCS, and one of the students working with you, I saw him at the reception, was very happy; he took my class and came to talk to me because they're doing stability analysis, and also iRF, the iterative random forest. He said it worked really well for him, and he's a very rigorous scholar, so I was very pleased to hear that he thinks our method works. Stability is such a no-brainer for us to do, and I think it's really low-hanging fruit; a lot of good groups are already doing stability analysis, but I hope the PCS framework formalizes it. Even in my own group, sometimes we do it this way and sometimes that way; we're not consistent. So now I'm making everything in my group very protocol-like: we're going to have standard documentation. I can imagine PCS documentation where authors submitting to a journal fill out this PCS documentation to answer some questions in a boring way, just fill in the blanks and say which line of your paper addresses each question. That would be really helpful for the referee; we need to push more of the work onto the author instead of the referee having to do it for them. So it's just a supplementary document you fill out. You can write your paper in the fanciest, most interesting way you want, but you fill out this form to answer these questions and cite which line of the paper addresses each one. I think that would make people think about these issues too. That's what I'm doing in my group, our project on PCS documentation, and in the MD Anderson collaboration; that's what we did with Sam Hanash's group: they did the analysis, and then they went in and filled out the documentation. Also, at the PNAS editors' meeting this week, they are organizing a community of statisticians to help the other editors with the statistical analyses in submissions. My plan, which I haven't acted on yet, is to approach them to see whether they would ask for some such documentation for PNAS submissions in terms of the statistical work. I think that would help people think more thoroughly about their process. All of these ideas came from the same place:
I found myself repeating the same things to each new batch of graduate students, and I thought I might just write it out, then eventually put it into classes, and now more and more into documentation. And I think this GitHub documentation helps even for yourself: when you write something down you are clearer in your own mind, and somebody might look at it, which makes you more rigorous; it's just human nature. So it's not fun, the PCS documentation is pretty boring, but it's useful. I definitely believe in it; I talk about it in my undergraduate class, the book definitely talks about it, and we're going to start putting some sample documentation on the internet.

So did your course replace something, or is it extra? There's always a dogfight, in my experience, about what gets to be in a curriculum. I have been teaching Stat 215 at Berkeley for 15, no, 10 or 11 years. At Berkeley you have a lot of freedom to do what you think is right, so I didn't have a dogfight. The question is whether my successor in 215 will keep it. But I have my book now, so I feel students who care can read it. And I've been talking to Haiyan, who will teach it next year; Haiyan really likes this framework, and another colleague I've talked to seems to like it too, so with those two people interested maybe there will be some continued interest in this approach, but it's up to the next generation. I feel that with the book I have put it out there; it's up to them. But I do have other people interested: I gave a talk about this book and this framework at Texas A&M, and the chair at the time was very engaged; he was talking about organizing a summer school to help people teach from it. So there definitely has been interest, and all my collaborators who are not statisticians seem to find it very accessible relative to other books. And all the editors, we talked to MIT Press, Princeton and Cambridge, loved it, because they could understand what we're doing. So I think some people will find it useful, and there will be a free copy too, you don't even have to pay; it was nice of MIT Press to allow us to do that, to lower the access barrier. So I'm hopeful. It's just common sense, I don't think there's anything fancy here, just complete common sense, but sometimes common sense is hard to find these days.

Well, maybe that's because many years ago people were filling in the same sort of form you've just alluded to, which stopped them thinking about what they should be thinking about. So you have got to be careful codifying things, I guess. You know, I'm pretty free-style, just think about it and write something, so I try to keep it as high-level critical thinking, a liberal-arts type of thing, and stay there instead of prescribing. People ask me, they want numbers, and I try to resist going there. Just write a narrative; so it's liberal-arts critical thinking at this stage.

I've been spending time at the Simons Institute for the causality program. There's a law professor in residence working on fairness, Kohler-Hausmann from Yale, and we pretty much completely agree. She has been very frustrated trying to get at what causality means, what your X means, and people don't want to answer that; they just want to go back to their X. For her it's people, right, it's lawsuits and all of that, and for many of us too, but people don't want to answer that question.
And then they start claiming that whatever they do is relevant to practice, to real life. So that's the part: what does the X mean? It's a very fundamental question which I didn't worry about for 25 years, so I'm just as guilty, but now I really start to worry about it.

Do we have further questions from the panel or from the audience? I think we're going to thank Bin very much. Or, Heather, are you still coming up with one? No? Then we're going to thank you very much, Bin; it was lovely to have you here and to listen to the discussion. Thank you very much, and we hope to see some of you next week. Thank you also to the panelists. It was great to have the discussion, good to see everyone, to meet Heather, and to see Peter again. Thank you all; have a good night, you guys. Thank you for watching. Bye.