So welcome to this new podcast of the Robustly Beneficial group. Today we're going to discuss a blog post by Jürgen Schmidhuber, who's one of the most influential researchers in deep learning and co-inventor of the LSTM. He wrote this blog post about AI versus COVID-19, asking what are the ways that deep learning, or deep learning kinds of methods, can be helpful in the fight against the COVID pandemic. He basically listed three points, and today we're going to add two more to this list. So first we'll be discussing ideas related to contact tracing, or global-scale, population-scale tracing. Then we're going to discuss single-patient diagnosis, or methods used to better understand the health condition of an individual. Number three will be drug discovery and everything related to the search for useful molecules. Number four, we'll discuss one of our favorite topics, which is recommender systems, whose role I think is extremely neglected in this crisis. And then we'll also discuss things related to epistemology and moral philosophy, where computational thinking and probabilistic thinking are probably useful, and these are fields that are heavily researched by AI researchers. Okay, so concerning tracking populations, we already discussed the use of contact tracing for individuals. The idea behind contact tracing is to detect faster who is at risk of being infected by the virus, and to get them tested or have them quarantine themselves to stop the propagation. Another important idea was that epidemiologists should have access to better data, to be able to quickly estimate the reproduction number over time. And this is especially important when we start to lift the confinement, because every decision we take to reopen schools or cinemas will have an impact on the reproduction parameter of the epidemic.
And we want to be able to have this estimation very quickly. So we said last week that a contact tracing app could be a great asset for epidemiologists to collect this kind of data and quickly estimate the reproduction factor. But if we don't have this kind of data, there are actually other sources of data that could be used. We discussed, for instance, simply cameras in metro stations or bus stations to count how many people pass by. Today, because of the global quarantine, there must be a very small number of people using these. But I expect that epidemiologists could find relations between how many people are outside and traveling in public transport, the amount of contact people have, and the propagation. So simply collecting this data and making it available to epidemiologists would allow better estimates of the rate of reproduction. Yeah. On all of these points, the blog post mentions current work. And I guess most of it is not peer-reviewed at all, because we're in crisis mode and it takes time to write and review papers. But these are natural ideas with research that can be done, though it's not guaranteed that they will be extremely efficient. What's interesting is that if you want to do AI, if you want to do machine learning, then you absolutely need a lot of data. Machine learning is only successful if you can gather quality data, and a lot of data in terms of quantity as well. What's interesting is that some companies, Facebook in particular, have been addressing this issue more and more. They've launched something called dataforgood.fb.com, where they're trying to work with academics to set up databases that can be useful for this kind of work. And typically, one thing you can do that's mentioned in the blog post is to try to predict the next contaminations with AI, like where the next outbreak is going to be.
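As a rough illustration of what could be done with such case data, here is a minimal sketch of a reproduction-number estimate. Everything here is an invented toy (the function `naive_rt`, the fixed serial interval, the case series); real R_t estimation uses far more careful statistical methods.

```python
# Hypothetical sketch: a crude reproduction-number estimate from daily
# case counts, comparing cases one serial interval apart.
# This is a toy heuristic, not a real epidemiological method.

def naive_rt(daily_cases, serial_interval=5):
    """Ratio of today's cases to cases one serial interval ago."""
    estimates = []
    for t in range(serial_interval, len(daily_cases)):
        past = daily_cases[t - serial_interval]
        if past > 0:
            estimates.append(daily_cases[t] / past)
    return estimates

# Toy growing series: every estimate should land above 1 (epidemic growing).
cases = [10, 13, 16, 20, 25, 32, 40, 50, 63, 80, 101, 127]
rts = naive_rt(cases)
```

On a shrinking series the same estimates would fall below 1, which is exactly the threshold decision-makers care about when reopening schools or cinemas.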
And if you can do this, it can be extremely useful for policy. You can accelerate the policy making to contain the outbreak. It can also be useful to organize the response. And typically, given that resources are currently limited, in terms of material resources like ventilators but also in terms of human resources like doctors, it's important to know where they could help the most. This kind of large-scale prediction can be extremely useful for that. Yeah, I was waiting for you to finish explaining point one, actually, which we address in the episode on contact tracing. I'll just note something which is very common in academia, especially these days: the bubbles of communities. I went through the ELLIS workshop, and I didn't see any mention of DP-3T or ROBERT, the initiatives from the peer-to-peer contact tracing community. So initiatives such as DP-3T, or the Inria and Fraunhofer one, ROBERT, didn't make it onto the radar of the machine learning community, at least up to the date of this workshop. And I think that's something that should be highlighted, because it could save time if we just had some channels so that people don't reinvent the wheel, each community in its corner. I don't know if the idea is clear. You see, there was this effort on developing peer-to-peer contact tracing, and then you had people within the machine learning community doing it on their side, probably losing some time by not being aware of what had already been done in the peer-to-peer designs. Yeah, I'm guessing that by now they've heard of such projects. One thing that would be interesting is how to exploit the data from contact tracing, because, at least in DP-3T, users can also agree to share their data in an anonymized way with epidemiologists. This can give additional data about how people interact: what is the number of contacts that, for instance, DP-3T users have per day, and is it increasing over time?
These are, I think, interesting and important data to analyze. Like, what is the proximity? Should we ask them to be more socially distanced, and such? Yeah, this would probably make up a nice database to be analyzed by machine learning researchers. Point number two was about focusing on individuals: given an individual, can we, for instance, predict if he has already contracted COVID-19? Ideally, you would want to diagnose this even before the person has symptoms, which sounds extremely hard and probably won't be doable, unfortunately. But there are other, perhaps easier questions you can ask. Maybe by observing some biometric data of individuals, you can give some probability estimate of whether someone is more likely to have COVID or not, and whether this probability is large enough that he should stay at home. There's also some interesting work about trying to predict if an already sick patient is going to be in critical condition within two hours, for instance. One of the things that's really terrible about COVID-19 is that the patient can feel fine for a long time, and then all of a sudden, without warning and without doctors predicting it, in just a matter of a few hours, the patient can be critically ill and even die. So if you can predict this two hours, or a few hours, ahead of time, it can be extremely useful to prepare a response for these critical conditions. Apparently there has already been work done on this, which is really interesting. You can analyze all sorts of other data collected in hospitals to do this, or to better predict if the patient is fine to be released, and things like this. There's also a more, let's say, unusual kind of approach. All of this is very nice, but it only works if the patient is already in the hospital, and as we know, hospitals being overwhelmed is already a problem.
So you don't necessarily want more people getting into the hospital to get this kind of diagnosis. Instead, what these AI researchers have been trying to propose are solutions to help people diagnose COVID-19 with very limited data, typically data that can be collected from your phone. Because if you can do this, then a lot of people could use their phones to have a better estimate of their health condition. And this allows one to bypass the medical institutions that are already overwhelmed. So that's very interesting. Now, the main idea here is quite unusual: it's about analyzing the sound recording of someone coughing. Apparently, according to the paper, COVID-19 attacks the lungs differently from other diseases that attack the lungs. And because of that, there could be biological reasons why people cough differently depending on the disease they have. Even acoustic reasons. Yeah. And so the idea of the app is to analyze this. If you have a lot of data, this is a very classical supervised learning problem. You have a lot of people coughing, and if you can get the label, if you can know whether, afterwards or even before, they tested positive for COVID-19 or for another disease, then this is classical supervised learning. And this can be helpful to determine, based on the cough, if someone is more likely to have COVID-19 or some other disease. The people I saw commenting on this from the medical and biological communities tend to be skeptical. And I believe there are reasons to be skeptical, because we have a track record of overhyped apps in AI, etc. But I also think there are reasons to be indulgent and less skeptical, when we think about how indulgent we are with non-AI diagnosis, so to say.
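To make the supervised-learning setup concrete, here is a minimal sketch with entirely synthetic "cough features" standing in for real audio descriptors. Everything below (the three fake features, the generated data, the tiny logistic regression) is an invented illustration under stated assumptions, not the paper's actual method.

```python
import math
import random

random.seed(0)

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp the logit to avoid overflow
    return 1.0 / (1.0 + math.exp(-z))

# Synthetic data: 3 fake "cough features"; positive cases shifted by +1.
def make_example(label):
    return [random.gauss(label, 0.5) for _ in range(3)], label

data = [make_example(random.randint(0, 1)) for _ in range(400)]

# Plain logistic regression trained by stochastic gradient descent.
w, b, lr = [0.0, 0.0, 0.0], 0.0, 0.1
for _ in range(50):
    for x, y in data:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
        g = p - y  # gradient of the log-loss with respect to the logit
        w = [wi - lr * g * xi for wi, xi in zip(w, x)]
        b -= lr * g

def predict(x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# On this well-separated toy data, training accuracy should be high.
accuracy = sum((predict(x) > 0.5) == (y == 1) for x, y in data) / len(data)
```

A real system would replace the fake features with spectrogram-derived descriptors and would need large, carefully labeled cough datasets, which is exactly the data-collection problem discussed here.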
So we tend to raise the standard sometimes. I'm quoting this professor at Virginia Tech, who was saying that we tend to just raise the standard with AI. So obviously this app wouldn't be bulletproof. It would have a lot of issues. And as you said, it needs a lot of data that we might not have yet from patients. It might even need data from the same patients both when they had symptoms and when they didn't, so that you don't start learning something that has nothing to do with COVID but with populations. Like, you start learning how old people cough, because old people are diagnosed more often than young people. So instead of learning how COVID makes you cough, you start learning how old people cough compared to young people. But yeah, we tend to forget that we very much have this with human diagnosis too. With pneumonia, you can lose a week if your doctor uses a stethoscope and doesn't send you to get an x-ray when you needed it, because from the stethoscope she or he thought there was nothing apparent. Yeah, I think we need to be careful about this paper. It will have to be tested more. I think they maybe need approval from the FDA; I don't know how it works exactly. But if it has a huge rate of false negatives or false positives, it can be a problem. Well, they did a lot of experiments and testing, and according to what they've published, it's really, really promising. But of course, they've only analyzed people coughing in certain conditions, maybe in given hospitals, using some specific recording devices. It's not yet clear that this will generalize to applications on everyone's phones.
But what I find interesting in this paper is that they were really careful about false negatives especially: reporting that someone is not likely to have COVID-19 when he actually has it. I think it's very important that this probability is very low, because you don't want to encourage people to be careless if they actually have COVID-19. So yeah, according to their data, this probability is very, very low. You can check the paper; I think it's around one per million, but it's probably not that reliable as a number. Still, it's indicative of the effort they've made, and that's interesting. And the app could also just reply that your test is inconclusive. So it shouldn't be thought of as a diagnosis, but it can be a help to tell some people: you sound like you should be very careful, get tested. Maybe this could also help to prioritize tests, because testing capacity is limited. We cannot test everyone, so we have to prioritize, and it makes sense to prioritize those who are more likely to be at risk. Yes, I think this kind of app has a very high potential to do good. The one detecting COVID-19 by listening to coughs, but also other apps, for instance using the heart rate, which we know smartwatches can measure, or the breathing patterns. One thing we should be very careful about is that we should be able to anticipate the reaction that people would have to a diagnosis from such an app. For example, we should necessarily expect some percentage of false positives or false negatives. And if, for example, the app has 2% false positives, then we don't want 2% of the population to go to the hospital for no reason; that could be a very negative side effect of using this kind of app.
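The point about false positives at population scale can be made concrete with a base-rate calculation. All numbers below (prevalence, sensitivity, the 2% false-positive rate) are assumptions chosen purely for illustration.

```python
# Back-of-the-envelope: why a "small" false-positive rate matters at scale.
population = 1_000_000
prevalence = 0.01           # assume 1% of people are actually infected
sensitivity = 0.95          # assumed true-positive rate of the app
false_positive_rate = 0.02  # 2% of healthy people wrongly flagged

infected = population * prevalence
healthy = population - infected

true_positives = infected * sensitivity
false_positives = healthy * false_positive_rate

# Positive predictive value: chance a flagged person is actually infected.
ppv = true_positives / (true_positives + false_positives)
```

With these assumptions, the healthy-but-flagged group (about 19,800 people) is roughly twice the size of the correctly flagged group, so only about a third of alerts are real, which is exactly why you don't want every flagged person rushing to the hospital.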
So yeah, there is a discussion to have about not only improving the app to make it as good as possible at detecting the disease, but also anticipating the reaction of people to this kind of diagnosis. Yeah. And again, these are classical problems of robust machine learning: you want the results to be robust. In particular, I think the greatest danger here is probably distributional shift. The data that people will be sending through their phones is maybe not what the app was trained on. But again, if you want to solve this problem, getting more data is critical, and as more and more people use the app, the app can improve. It's hard to predict how well it will work, but I think it's worth investigating this possibility more, while being careful. A bit of exploration here seems warranted. Yeah. And one of the aspects you mentioned is for the app to be able to simply answer that it's not able to make a diagnosis, instead of giving a positive or negative. This is a topic often discussed in AI safety: when machine learning models make predictions, they should also have a measure of how sure the model is of its own prediction. And I think it's something a human doctor would be doing. The human doctor analyzing you would either know for sure that you have no problem, know for sure that you have a problem, or know that he doesn't know, and point you towards a different test or a different doctor, a specialist. And that's something that would be desirable for this kind of app.
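This "know that you don't know" idea can be sketched as a classifier that abstains when the predicted probability is close to the middle. The thresholds below are arbitrary choices for illustration, not taken from any real app.

```python
# Instead of a hard yes/no, abstain when the model is unsure.
def verdict(p_covid, low=0.2, high=0.8):
    """Map a predicted probability to a three-way answer."""
    if p_covid >= high:
        return "likely positive"
    if p_covid <= low:
        return "likely negative"
    return "inconclusive"
```

So `verdict(0.95)` returns "likely positive" while `verdict(0.5)` abstains with "inconclusive", mirroring the doctor who sends you for an x-ray rather than guessing.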
So yeah, this relates not only to the back-end design but also to the front end. People in computing tend to forget that we need an output which is not yes or no, but a nuance, a spectrum, just like your doctor might tell you: okay, I think you have pneumonia, or I think you don't have pneumonia, or she can tell you: look, you are in that range where we don't know. You might have a bacterial pneumonia or it might just be bronchitis, so let's take an x-ray. So maybe we should think of designing apps where you always have a spectrum of likelihoods, and you just tell the user: yes, you fall in this region of yes; you fall in this region of no; or you are in this broader area of doubt, and maybe you'd better go to the doctor now. We can move on to point number three: drug discovery, trying to find molecules that are promising for curing COVID-19. There has been interesting work in this area. For instance, you can have a better understanding of the functions of different proteins; proteins are what do the job at the molecular level, so it's important to understand how they behave. There have been advances, especially from Google DeepMind and something called AlphaFold, on better predictions of protein folding. There has also been work using graph neural networks to understand the similarities and connections between different molecules. So yeah, this is all research that has been done in the background for years; now it's kind of being put to the test in an urgent manner. I don't know that much about the field, but I do know it's extremely difficult, for the same reasons that it's very difficult for a human to discover the right molecules.
And also, I think one dangerous fallacy would be to think that we're searching for the miracle molecule. I think it's unlikely that there's going to be this one molecule such that whenever people in critical condition take it, they are suddenly saved. Talking to doctors, it seems that what they hope for is more like something that reduces the probability of dying. If it reduces it by 10%, it's already good. If it's by 50%, it's a miracle. So I think this should be stressed: there's very, very probably not going to be a miracle drug. Yes, the way this kind of research works is that the virus has some sort of molecule that it uses to stick into the human body, and neural networks like AlphaFold, and I think there are others, try to correctly predict the interactions between different molecules. With this kind of neural network, we can test billions of molecules very quickly, to check whether we think they would interact with the virus or not. And one way this can be used is not to make a final decision, but to decide which molecules are the highest priority to be tested in a lab by chemists and biologists. So out of one billion, we could imagine that it selects only 1,000 molecules that are the best candidates, the ones we think will be most successful at interacting with the virus. And this kind of drastically speeds up the work happening in labs. Yeah, I've also seen work about trying to predict side effects by using the similarities between different molecules. And this can also be useful, to predict the side effects, because some of the most powerful treatments are also those with the most side effects, simply because, to be powerful, they need to change a lot of things in the human body. And yeah, this often leads to important side effects.
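The prioritization step described above can be sketched in a few lines: score everything with the fast learned model, then send only the top of the ranking to the slow, expensive lab. The random scorer below is a stand-in for a learned binding predictor, and the library and shortlist sizes are illustrative.

```python
import random

random.seed(0)

def predicted_binding_score(molecule_id):
    # Stand-in for a neural network's predicted interaction strength
    # between this candidate molecule and the viral target.
    return random.random()

library = range(100_000)                    # pretend molecule IDs
scored = [(predicted_binding_score(m), m) for m in library]
scored.sort(reverse=True)                   # best predicted binders first
shortlist = [m for _, m in scored[:1000]]   # only these go to the wet lab
```

The design point is that the model never makes the final call: it compresses a library too large for any lab into a shortlist small enough for chemists and biologists to actually test.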
So you need to find the right balance between helping to cure the disease and not having side effects that are too bad. And yeah, this kind of work can also be useful, especially as you said, to prioritize some molecules for research over others. Yeah, the good work that has been done in this area previously also includes building databases of known molecules and known interactions between molecules, and these databases are very valuable. Out of the five directions, maybe this is where the machine learning and AI community does not yet have anything ready to use. This is where we have the fewest ready-to-use things. These topics, AI for drug design, AI for molecules, etc., are still very, very premature. And I think it's worth investing in data sets. Just put effort into data sets, because that's the first thing you need; you can't do anything else before having enough data. The applications of machine learning in image recognition wouldn't have happened if the ImageNet data set wasn't there. I would say 90% of the effort that led to the current use of machine learning in image recognition was gathering ImageNet, more than the algorithmic part. Yeah, and I think this is a direction where chemists and biochemists should join the effort to increase the scale of data sets. Yeah, we talked about this a few days ago too: the way data is collected in hospitals can be improved, just like this. And I think it's very important to gather this data, because it can be extremely useful to better understand the spread of the pandemic, how urgent it is to take further measures, and the number of tests that are run, which is extremely important to estimate the number of actual cases in the population. Yeah, all of this, having better information systems, is critical, because information is critical these days.
Some people might not realize how low-tech the world's handling of COVID is. Up to the 20th of March or so, the official news channel, the national television of Switzerland, reported that COVID deaths and cases still needed to be reported via fax. Not only deaths, but even positive cases, up to March 20. I don't know if it has changed since then, but I'm afraid it's unlikely to change: some of these things depend on legal aspects that are very heavy, very established, and hard to change. So yeah, cases needed to be reported via fax. This just creates a bottleneck on knowing the number of people that are infected. Yeah, it slows down the whole process, and there's also the problem of standardization, because if the data sets are all in their own formats, it's additional work all along the way to get everything to work together. I'm thinking about the people from Our World in Data; they're probably struggling a lot with data sets in very different formats. Yeah, the fourth point is something not mentioned in the blog post, and I think it's way too neglected in my view: the problem of recommender systems. I think a lot of the most effective interventions that really have to be done, especially during de-confinement, have to do with people's behaviors. Are we going to keep social distancing every day? How careful are we going to be when we touch different objects or open doors? Can we touch our faces less? Are we going to wash our hands more frequently, and so on? All of this has to do with human behaviors. Arguably, this is the most effective treatment, especially to keep the reproduction number below one. And these behaviors are not going to appear by themselves. There need to be constant reminders and encouragements to do this, to save lives, and recommender systems have a critical role to play in all of this.
So yeah, I saw a video in French by a psychologist discussing this yesterday, insisting on the fact that it's not only about getting these behaviors implemented out there; prior to this, especially for communicators, there's a big challenge about how you communicate most effectively. Psychology has a lot to teach us about this. But then there's also the problem of how you get these messages out there. We've discussed it many times on this podcast, but one of the biggest channels today is social media, and in particular YouTube. And a lot of what's communicated through YouTube depends on the YouTube algorithm. Just to recall one figure: 70% of the views on YouTube are results of YouTube's recommender system. For instance, there's an excellent video by Mark Rober on why you should wash your hands. It's very compelling, very cool, also very nice to watch. I think the title is "How To See Germs Spread." It's very educative, with children and all. And this video has, I think, 12 million views, something like this. That's quite good, but arguably it could easily have hundreds of millions of views. And this could lead to actual large-scale behavior change that could have a huge impact on reducing the reproduction number and saving lives. Yeah. So one thing I like to consider is to think about what would be the best-case scenario if you could fully decide what the recommender system, say the one on YouTube, is doing, and count how many lives would be saved in that case. For example, imagine the recommender system is smart enough to anticipate, before the European governments decided on the confinement, that there's a problem, and to spread advice to the population that staying home might be a good idea starting now.
If it simply convinces 2% of the population, which is a very small number, to stay home two or three days before the actual quarantine is set in place, then this has repercussions in terms of thousands or tens of thousands of people who won't be infected by the disease. So it's huge. On the other hand, there's the impact the algorithm of today has when it continues to recommend videos that say, for example, that the virus is no worse than the flu. Seeing these messages motivates people to fight against the quarantine and continue to go out, even given the situation, and clearly does not motivate people to adopt the right behaviors to slow down the pandemic right now. For these propagated messages too, the number of lives negatively affected is on the order of thousands or millions. So yeah, one thing that shows that recommender systems are really important in our current crisis is the huge impact they have on the population; simply small changes of this kind can drastically change things, so influenced are we by them. Yeah, and maybe we can discuss this as well: the CEO of YouTube has announced that YouTube plans on removing videos with COVID content that departs from what the World Health Organization recommends. This is an interesting move, but I'm not sure how robustly beneficial it is. I fear there may be a backlash because of it. Also, the threshold of "not being consistent with what the WHO is saying" is quite vague, and it may lead to unhelpful debates, let's say. Whereas I think a more robustly beneficial way to move forward would be to maybe de-recommend a little bit the contents that are extremely harmful, for instance, thinking that drinking alcohol can help kill the virus. This is not going to be helpful; this can be very harmful.
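The "stay home a few days earlier" argument above can be illustrated with a toy SIR epidemic model. All parameters (contact rates, population size, lockdown days) are invented; this only shows the direction of the effect, not real magnitudes.

```python
# Toy SIR model: reducing contacts a few days earlier cuts total infections.
def sir_total_infected(beta_schedule, gamma=0.1, n=1_000_000, i0=100, days=300):
    s, i, r = n - i0, float(i0), 0.0
    for day in range(days):
        beta = beta_schedule(day)       # daily transmission rate
        new_inf = beta * s * i / n      # susceptibles newly infected
        new_rec = gamma * i             # infected who recover
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
    return r + i                        # everyone ever infected

# Lockdown cuts the transmission rate from 0.3 to 0.08 on a given day.
late = sir_total_infected(lambda d: 0.3 if d < 40 else 0.08)
early = sir_total_infected(lambda d: 0.3 if d < 37 else 0.08)  # 3 days earlier
saved = late - early  # infections avoided by acting three days sooner
```

Because cases grow exponentially before the intervention, even a three-day head start, or convincing a small fraction of people to comply early, translates into a large absolute number of avoided infections.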
But also, I think a more impactful and consensual way to move forward is to recommend a lot more the videos that are much more clearly, robustly beneficial. And again, I'm going to recommend this video by Mark Rober on the importance of washing your hands, because it's a very compelling video. I think many people who watch it can tell themselves afterwards: I should wash my hands more, and I will really try to do this to save lives. If these videos got a lot more views, this could be very impactful. Because a lot of this has to do with the mute news problem rather than the fake news problem: the fact that this important information is not communicated as much as it should be. Using this channel to recommend the videos that deserve more views, for the public good, I think is a better way to move forward, and I'm hoping this could be just another layer on top of the system. I've discussed these topics with some people, like de-recommendation versus recommendation, and it's very subtle, because sometimes you can have a backlash if you de-recommend. But I believe you can do this without censoring. As for the CEO of YouTube, she said that anything that does not fall within the guidelines of the World Health Organization would be taken down. So if someone suggests vitamin C might help you, this kind of thing might be removed. I hope she reconsiders what she said in that declaration, because actually there is quite professional medical advice on this. It's still unproven, but there are reasons for some medical professionals to suggest supplements of vitamin C, vitamin D and zinc, and those things do no harm.
And if we enter that logic, what do we start doing with presidents suggesting, for example, disinfectants, alcohol, bleach? There's a surge of the word "bleach" just as we speak. You would fall into subtle situations where you can't censor a president, obviously, without a strong backlash. And not only that: if you enter the territory of "let's de-recommend" or "let's remove," it's not very efficient. I'll give you an example. Now, whenever a video talks about COVID, you have this disclaimer from YouTube: please look at what the health authorities are saying in your country, please go to the World Health Organization. And this happens only when you have COVID in the title. So for example, Alexandre Technoprog did a video on the confinement, like how the confinement should work. You don't have the disclaimer on that video. So you can talk about COVID without mentioning COVID in the title, and you won't have the disclaimer on your video. You could have a lot of videos under the radar, not labeled as COVID videos. There's now a video reaching 1.5 million views in France suggesting that there is a conspiracy of global governance, and it is of course talking about COVID without mentioning COVID in the title. So there are a lot of problems like this: what is a COVID video? But there is something else, another strategy that is very efficient: instead of de-recommending or censoring problematic content, why not just flood YouTube with quality content? Flood YouTube with Kurzgesagt, the World Health Organization, 3Blue1Brown, et cetera. Whenever you detect high-quality content, just promote it and put it at the front.
I heard, maybe you could confirm, that science YouTube is now suffering as a side effect. Yeah, well, right now there's this COVID situation where science is quite well regarded. In France, the profession people trust the most is scientist. And in this COVID situation, I think it's more or less fine: science YouTubers do make quite a number of views, a bit more than usual, I'd say. But over the last few years, the number of views for science YouTubers has been decreasing, I think. My number of views certainly went down, and I've heard this from many science YouTubers. And it's such a shame. I mean, we could be pushing the quality content up there, and there is some amazing content that can do a huge amount of good. And that can also, yeah, as you said, flood YouTube with quality instead of trying to fight what's very, very bad and letting what's just bad go up. Yeah, I think there really needs to be much more thought about improving the overall quality of YouTube. A lot of the motivation I had when I started YouTube was about bringing quality content to people. And it turns out that these days, the bottleneck is no longer me; the quality of my content is still a bottleneck, but it's not the main one. The main bottleneck these days is whether YouTube is going to recommend the content or not. And I think it's a shame that YouTube is not promoting these quality contents as much as it could, especially in the context of a crisis like today's. Do you want to discuss the fifth point? Yeah, we might go to the point I like the most: epistemology. I personally believe this is the most overlooked part of where data science can help. Our field, the field of data science, as the name says, is the field of inferring from data. So we tend to deal with a lot of epistemological and methodological issues.
We tend, or at least we're supposed, to spend a significant amount of our thinking on reasoning under uncertainty, with partial access to information, and reasoning under constraints: what can I infer from this amount of data, given that I have to decide within this amount of time? So we're supposed to be one of the fields that practices trade-offs, computational thinking of course, but also a lot of probabilistic thinking. We mentioned the yes/no/blurry diagnosis, where you should also try to evaluate how uncertain you are about the yes, how uncertain you are about the no, and how likely it is that the person is in the in-between region, and so on. And I believe this epistemological contribution is overlooked, not only by medical and biological people, biologists and so on, but also by computer scientists themselves. They tend not to realize how much precious material we have in terms of epistemology: thinking about thinking, thinking about inferring from data, and so on. So, who would like to start on that? Well, I personally feel that probabilistic thinking is hugely lacking, and I think it's a big problem. One thing I've been doing a little bit is asking different people I know how they view the future, how they think things will unfold in the next few months. And people usually answer with one scenario. They say, well, I think it's going to go this way; or sometimes they'll add a probability: it's probably going to be this way. But this is not probabilistic thinking. Probabilistic thinking is about envisioning all sorts of scenarios and putting probabilities on all of them.
And this is critical because, given the huge uncertainty right now, the lack of data about many different things, and the fact that there's not much data about de-confinement in general (at this scale, there has never been a de-confinement before), we should be extremely uncertain about what can unfold. And this means we should consider different scenarios. Maybe in some scenarios things work out fine, and that's good. But maybe there are other scenarios, with non-negligible probability, where things are bad. Maybe there's some small but non-negligible probability that things go very, very bad. And I think it's important to prepare answers for all the possible scenarios. The trouble with thinking there's only one scenario is that you say: well, for the scenario I have in mind, this strategy is going to work fine, so I think we should go all in on this strategy. And then you start criticizing all the other strategies because they're not relevant for the scenario you had in mind. I think this is extremely dangerous. Yeah, I find what you say super interesting. The kind of thing you describe is sometimes referred to as a long-tail distribution. A long-tail distribution starts with a high spike of probability for normal, average events, but then has a very long tail of non-negligible probability for extreme events. An example of the kind of thing this can model is the number of subscribers a YouTuber can have. There's a very large number of YouTubers with fewer than 1,000 subscribers. But if you look at the long tail, at the number of YouTubers with 1 million, 10 million, 100 million subscribers, it still carries non-negligible probability, though there are fewer and fewer of them. And the problem with this kind of long tail is that what we see the most are the normal events with high probability.
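To make the long-tail idea concrete, here's a minimal sketch using only Python's standard library. It draws from a Pareto distribution as a stand-in for something like subscriber counts; the tail parameter and sample size are arbitrary assumptions for illustration, not numbers from the episode:

```python
import random
import statistics

def sample_long_tail(n: int, alpha: float = 1.2, seed: int = 0) -> list[float]:
    """Draw n samples from a heavy-tailed Pareto distribution.

    Most draws are small ("normal" events), but a few are enormous
    (the long tail of extreme events).
    """
    rng = random.Random(seed)
    return [rng.paretovariate(alpha) for _ in range(n)]

samples = sample_long_tail(10_000)
# The typical (median) value is tiny compared to the largest draw:
print(f"median = {statistics.median(samples):.2f}")
print(f"max    = {max(samples):.2f}")
```

The point of the exercise: with a tail this heavy, the biggest draw dwarfs the typical one by orders of magnitude, which is exactly why judging risk from "typical" observations alone is misleading.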
And I think we can now say there is the same kind of distribution for the number of deaths from a pandemic. Most years it's close to zero or very small, but with some probability, maybe 1% or 0.5% per year, a pandemic comes and kills more than 10,000 people, maybe more than 1 million. And the problem is that we don't see it coming, because what we observe most are the small events with high probability, and it becomes dramatic when we don't make the effort to estimate the probability of these extreme events and react accordingly. A good reaction is one that correctly estimates this probability and then adopts not the strategy that is optimal for the most common case, but the strategy that is optimal given our uncertainty over all possible events. So yeah, this is very important. In this podcast we've talked in detail about what the mathematics developed in the context of AI could provide, and actually it has already provided things; it's already used in, for example, adaptive clinical trials. But there are a lot of other aspects where the thinking developed in computer science and AI could help. Let's stay humble and not exaggerate how much it could help, but it could provide some guidance in thinking with large numbers, because we're now dealing with large numbers, and among the fields of engineering, computer science is probably the one that developed the largest toolkit for dealing with large numbers in a restricted amount of time. I don't know if you see something we could add along this avenue. Yeah, one interesting thing is the problem of trade-offs in general: you have scarce resources, and you have to optimize what you can do given those resources and given how bad the situation is.
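The single-scenario trap described above can be shown with a few lines of arithmetic. This is a hedged toy sketch (every probability and loss figure below is invented for illustration): a strategy tuned to the most likely scenario can look great, yet lose badly in expectation once extreme scenarios are weighted in.

```python
# Hypothetical scenarios with probabilities, and per-strategy losses
# (think: deaths) in each scenario. All numbers are made up.
scenarios = {"mild": 0.70, "severe": 0.25, "catastrophic": 0.05}

losses = {
    "optimized_for_mild": {"mild": 100,   "severe": 50_000, "catastrophic": 1_000_000},
    "hedged":             {"mild": 1_000, "severe": 5_000,  "catastrophic": 50_000},
}

def expected_loss(strategy: str) -> float:
    """Probability-weighted loss of a strategy over all scenarios."""
    return sum(p * losses[strategy][s] for s, p in scenarios.items())

# "optimized_for_mild" wins if you only imagine the modal scenario
# (100 vs 1,000), but "hedged" wins in expectation:
best = min(losses, key=expected_loss)
```

The design point: the comparison only becomes visible once you enumerate several futures with probabilities, which is precisely what answering with one scenario fails to do.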
And interestingly, a lot of the work on this started in World War II. During World War II there was a lot of thinking about the optimization of logistical resources. This is now known as linear programming, or optimization in general. And sometimes you can apply this framework directly to some problems. For instance, one thing that will probably be important soon is testing as many people as possible given a limited number of test kits. It sounds like you can only test one person with one test kit, but actually no: you could do group testing. You could also prioritize testing. All of this is extremely relevant. It's about having scarce resources and knowing you won't be able to test everybody, because the number of test kits doesn't seem to be growing fast enough to test everybody. Because of this, you're going to have trade-offs: you cannot test everybody, so you have to test selectively, so that in the end... well, the end goal is probably to save as many lives as possible, while saving the economy, making sure everybody has enough to live on, avoiding wars, and so on. And these great trade-offs occur at all sorts of different levels. Yeah, we've also seen this kind of trade-off thinking when talking about contact tracing, because it has a number of negative aspects, the first one being that it's not perfectly privacy-preserving. But as we discussed, the efforts out there do a lot of work to be as privacy-preserving as possible, without reaching perfection. And unfortunately, some of the pushback simply consists of saying: see, there is this particular case where the privacy preservation fails and someone can get your private data, and then concluding right away that this is not the solution we want.
But when making this kind of reasoning, we should also consider the negative cases if we don't use contact tracing. What motivates me to recommend this kind of technology is that I see the negative cases if we don't use it. Yeah, this is sometimes called counterfactual reasoning. When you're hesitating between making choice A or not-A, or call them A and B, where B can be not doing A: you should not ask whether A is good in the absolute. If you're doing counterfactual reasoning, you should at least think: if I do A, what are the likely future scenarios? And if I do B, what are the likely future scenarios? And if the former seem better than the latter overall, then you should go with A rather than B. I think this is very important for decision-making these days. So yeah, we've mentioned the safety mindset a lot. I believe the trade-off mindset is also something very overlooked in the epistemology we need to deal with this crisis. In this crisis we will have to choose among a lot of bad choices. There is no good choice; there is only bad choice one, bad choice two, and bad choice three. There is no good public reaction, only bad public reaction one, bad public reaction two, et cetera. We have to face the fact that we are in a pandemic and people will die. So we get the choice that would have 1,000 deaths, the choice that would have 10,000 deaths, and the choice that would have a million deaths. Obviously we should go for the 1,000 deaths, because we can't get zero deaths. And sometimes, by doing wishful thinking and aiming for zero deaths, we might end up with the million deaths.
So yeah, we maybe have to raise awareness and make pedagogical efforts on the trade-off mindset and the epistemology of trade-offs. It's not something trivial. I think the problem with epistemology is that a lot of scientists take it for granted. Just as most humans believe that thinking is obvious because we have a brain, that we can all think because we have a brain. Actually, reasonable thinking is sometimes thinking against your brain. I heard this from an epistemologist, I don't remember who, who said that the scientific method is not about thinking with your brain, but about using your brain to think despite your brain. We humans, and I'm talking for myself here: for the majority of my life, I thought I was able to think correctly just because I have a brain. So if I think, then I'm thinking correctly. And it's the same for science. A lot of scientists believe they are epistemologically sane, that their epistemology is correct, just because they are doing science. And then we take epistemology for granted. This is very general, nothing to do specifically with the trade-off mindset or the safety mindset: we should not take epistemology for granted, and we should not take the sanity of our methods for granted just because we are scientists using numbers. Yeah, this is the problem of overconfidence. Even in the scientific community, and it's worse elsewhere, but even in the scientific community, there's a lot of evidence that people are usually extremely overconfident, especially when dealing with extreme cases like the COVID situation. This is something that should be developed, and we should work to make everyone aware of it, in every field and every aspect of society.
And also, uncertainty awareness does not mean rejecting all certainty. Not all uncertainties are equal. You should have methodological ways to evaluate uncertainty, so that you reason within this uncertainty, rather than just saying: okay, there is uncertainty here, so I won't listen. For journalists, for example, this sometimes creates incentives to listen to experts who look certain, who look confident, and to discard experts who do not look certain. And the problem is that society may create incentives for people to look more confident than the data tells them to be. Yeah, well, just to wrap up: this is also known as calibration, being able to assess how confident you should be as opposed to how confident you are. And there's a great app about this developed by Louis, called Bayes Up, at bayes-up dot web. Provide the link. We'll provide the link. I really recommend spending ideally a few hours, but at least half an hour, just answering the quizzes. They're quick quizzes, but you have to answer them in a probabilistic fashion, and by doing this you can train your calibration. I think this is extremely important. One thing I like about this app is that it teaches you very quickly that when you think you are 100% sure, it is actually not true 100% of the time. Calibrating yourself consists in being aware, when you have one type of feeling of certainty, of how often it actually turns out to be true: is it going to be true 10 out of 10 times, or 8 out of 10 times? Being calibrated is about correctly evaluating your own ratio of correct to wrong answers. Next week, we are going to talk about trust, and about methods to make an artificial intelligence system more trustworthy. This is a topic we already mentioned here and there when discussing previous papers.
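Calibration can be measured mechanically. Here's a minimal sketch, with hypothetical quiz data rather than anything taken from the app: it buckets a list of (stated confidence, was-correct) records and compares stated confidence with observed accuracy, which is exactly the ratio the discussion above describes.

```python
from collections import defaultdict

def calibration_report(answers: list[tuple[float, bool]]) -> dict[float, float]:
    """Group answers by stated confidence and return, for each
    confidence level, the fraction that were actually correct.
    A calibrated forecaster has observed accuracy close to
    stated confidence at every level.
    """
    buckets = defaultdict(list)
    for confidence, correct in answers:
        buckets[confidence].append(correct)
    return {c: sum(v) / len(v) for c, v in buckets.items()}

# Hypothetical quiz results: four answers given at "100% sure",
# one of which turned out wrong, plus two coin-flip guesses.
answers = [(1.0, True), (1.0, True), (1.0, True), (1.0, False),
           (0.5, True), (0.5, False)]
report = calibration_report(answers)
# report[1.0] == 0.75: "certain" answers were right only 75% of the time.
```

Here the forecaster is well calibrated at 50% but overconfident at 100%, which mirrors the point above: feeling sure at 100% and being right 100% of the time are different things, and only bookkeeping like this reveals the gap.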
So for example, contact tracing relies on trust, because we want as many people as possible to use it. But in our previous discussion we also saw that an AI, that is, an algorithmic solution for taking ethical decisions, is also something that needs to be trusted, because there is this intuition that ethics is something for humans, that only humans can do it well, and it's sort of surprising that an AI would do it well. So when we build an AI system to do it, it's quite necessary that it also includes a way to convince us that it's doing a good job, so that we trust the system. Otherwise, no one will use it, or there will be so much pushback that this kind of system won't be used at all, and then we lose the possible benefits of such a system. Good. See you next time. Bye bye.