Today I'm very pleased to welcome Professor Yoshua Bengio to join us. Yoshua is one of the world's leading experts in artificial intelligence. He's a pioneer in deep learning and specifically the rebirth of neural networks. Since 1993 he has been a professor at the University of Montreal in Canada, and he is the founder and scientific director of Mila, the Quebec Artificial Intelligence Institute, which happens to be the world's largest university-based research group in deep learning. In 2018 Yoshua had more citations than any other computer scientist in the world thanks to his many high-impact papers, and that was the year he earned the prestigious Killam Prize, which is given to distinguished Canadian scholars who have shown continuous excellence and made a significant impact in their field. Of course, Yoshua is also a recipient of the Turing Award, which he received jointly with Geoffrey Hinton and Yann LeCun; the award honors conceptual and engineering breakthroughs that have made deep neural networks a critical component of computing and of modern AI. Yoshua is a fellow of both the Royal Society of London and the Royal Society of Canada. What is also very special about Yoshua is that he is deeply concerned about the future and the social impact of AI. Over the years he has actively contributed to the Montreal Declaration for the Responsible Development of Artificial Intelligence, he supports many AI ethics initiatives and guidelines, and he works to raise awareness of global issues including the environment, climate change, and diversity and inclusion.
Yoshua, as many of you know, is no stranger to IBM Research. We have enjoyed his participation in our annual AI Research Week events, and many of us have been privileged to collaborate with him and his team over the past several years as part of the AI Horizons Network. His research is inspiring and impactful, with contributions spanning technical machine learning solutions and high-level, forward-looking proposals about human and machine decision making. Today Yoshua is going to describe his work on machine learning projects against COVID-19, a topic that, as you all know, we here at IBM Research are also very passionate about. He's going to highlight several projects his team has been working on: one, for example, is the use of AI to accelerate the discovery of antiviral drugs; another is about extending cell-phone-based contact tracing methods to achieve earlier warning signals of the spread of COVID-19; and more. I'd also like to introduce Francesca Rossi, IBM Fellow and IBM's AI Ethics Global Leader. Francesca will be moderating the question-and-answer period at the end of Yoshua's seminar; please use the Q&A panel on your console to submit any questions you have throughout today's session. So Yoshua, again, welcome; we're really excited to hear about your latest work. Thank you.
So before I start I want to go back in time, to what seems an eternity ago, the end of February and beginning of March, when we started to realize that something bad was coming at us. Within a month after that, I think a lot of scientists around the world were asking themselves: what can I do with my expertise to help the fight against this pandemic? And so did we at Mila, together with many of our collaborators around the world. I must say that through all of these months I've been really thrilled, and my heart has been warmed, by the enthusiasm of researchers who were ready to let go of the usual issues they grapple with in their research that have to do with ego: who's going to be the first author or the second author or the last author, who's going to get the biggest grant, who's going to get cited more. These petty competitions we have with each other pale in front of the kind of challenges that face humanity. Right now we're talking about COVID-19, but I'm hoping that that spirit of collaboration between companies, between research centers, between scientists is going to continue in some part after this thing goes away, because we have a lot of other global challenges, including climate change, which is the bigger wave; it's not the second wave, it's the super big wave that's coming very slowly but surely at us. So that was the little intro to tell you a bit about my spirit. Now I'm going to tell you mostly about two projects I've been involved with at Mila, as Daria was saying: one regarding new drugs and the other regarding how to use phones to help fight the disease. Before I do that, let me say a few words about responsibility, since Daria nicely introduced me with a concern for that issue.
It used to be the case, not so long ago when I was a grad student and then a professor for many years, that I didn't really think about the social impact of my work. It was just not a question, because our work was not really used that much outside of university, so we could live in our ivory tower, not care too much, and just focus on our math and our algorithms and our technical questions. But the world is different now, and we really have to think about the impact of our work. That means we have a new responsibility. We can't just do our research, or work on products if we're engineers, without thinking about questions we're not used to. That means we have to educate ourselves: there's essentially zero formal education in our current universities, especially in computer science, to prepare us for what it means, in terms of ethics, society, democracy, and privacy, to have machine learning applications in the wild. And the thing I'm actually more scared about in the longer run, not in the next couple of years, is that what we're doing with machine learning is building very powerful tools, and these powerful tools could obviously be used for good, but they could also be misused. So there is this kind of wisdom race: we have to become collectively and individually wiser in the way we organize our societies, including how we use technology. We have, I think, become wiser, with some setbacks I won't talk about, but not fast enough: technology is getting more powerful every year at a rate which I think outpaces our current ability to improve our collective and individual wisdom. That's the thing I'm most afraid of. It's a little bit like if we were to build tools that could be used to make super powerful bombs, but we allowed children to play with those tools, which can become weapons and destroy each other and potentially the planet.
That's why I also care a lot about not just thinking about responsibility in a negative way, but about how we could use our expertise for goals that are not just profit-driven, goals focused on what technology can bring that's best for humanity: AI for social good, or whatever name you like for this. The projects I'm going to tell you about fit this description, whether it has to do with healthcare, education, the environment, humanitarian applications, and so on; I think these are good examples. OK, so now let me talk about COVID-19 antivirals. There are a lot of research groups around the world working to find vaccines and to find antivirals, and these are different things. Vaccines are the ultimate weapon, but they might take a lot of time, or maybe we won't even find good vaccines; think about HIV, we haven't found any vaccine for that yet. A shorter-term goal is to build treatments, most commonly through new drugs or repurposing existing drugs. The first thing you can do is take one existing drug that was meant for something else and realize that it could be used against the virus. The good thing is there are not that many existing drugs, a few thousand, so you can actually use high-throughput assays to test every single one. That's good, but with one drug at a time maybe we don't find the answer, and it doesn't look like we have. The next step up, in terms of complexity and power, is two drugs or three drugs. We can take existing drugs, so we know their toxicity, we know what harm they could do and the fact that they don't harm too much, and we look for a combination which together somehow makes a difference. We already know from HIV that whereas no single drug might cure a disease, sometimes two or three together can make a miracle. So there are a number of projects going in this direction, and I'm involved in one of them.
In terms of machine learning, what this involves is putting together a kind of knowledge graph of all the information we have about the existing drugs and how these drugs are related to proteins. Typically these small-molecule drugs target a particular protein, in the sense that the drug is going to prevent the operation of that protein, or potentially enhance it, but usually prevent it, so there's a fairly clear relationship between drugs and proteins. But those proteins can have an impact on other proteins; cells are complicated and we don't fully understand them, so there's a complicated set of interactions between different proteins. If each drug is doing something to one protein, and the proteins interact with each other, and you want to combine multiple drugs, then understanding that network of relationships, at least at a statistical level, could really help us guess good new combinations of drugs. And of course you can also collect data. The general strategy is this: you have some sort of machine learning method to propose candidates, and then you test those candidates, first in the assays I mentioned, where you can use robots, for example, to test hundreds or potentially even thousands of drugs or drug combinations and see whether they bind to the target protein or how they affect some cells. Once you start doing this, you get data that you can incorporate into your set of data points for your machine learning, and then you iterate: you use that data to come up with new candidates, the new candidates are tested by the biologists, and eventually some of those tests might actually look reasonable. And that's just the beginning of a long pipeline of evaluating the drugs, ultimately leading to clinical trials, which are of course going to be slower and more expensive.
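The drug–protein network idea above can be sketched in a few lines. This is only a toy illustration with invented drug and protein names and a crude hand-made heuristic, not the actual knowledge-graph model: it scores a candidate drug pair higher when both drugs can reach the target protein through the interaction network via different first-hop proteins.

```python
# Toy knowledge graph: drugs point to the proteins they inhibit, and
# proteins point to proteins they interact with. All names are invented.
edges = {
    "drugA": ["prot1"], "drugB": ["prot2"], "drugC": ["prot3"],
    "prot1": ["prot4"], "prot2": ["prot4"], "prot3": ["prot5"],
}

def reachable(start, max_hops=3):
    """Set of nodes reachable from `start` within max_hops edges."""
    frontier, seen = {start}, {start}
    for _ in range(max_hops):
        frontier = {n for f in frontier for n in edges.get(f, []) if n not in seen}
        seen |= frontier
    return seen

def pair_score(drug_x, drug_y, target):
    """Crude heuristic: a pair is more promising if both drugs can reach
    the target AND act through different first-hop proteins (so their
    mechanisms are complementary rather than redundant)."""
    hits = int(target in reachable(drug_x)) + int(target in reachable(drug_y))
    distinct = edges[drug_x] != edges[drug_y]
    return hits + int(distinct)

ranked = sorted(
    [("drugA", "drugB"), ("drugA", "drugC"), ("drugB", "drugC")],
    key=lambda p: -pair_score(p[0], p[1], "prot4"),
)
print(ranked[0])  # drugA and drugB both reach prot4 via different proteins
```

A real system would of course learn such scores statistically from the graph rather than hand-code them; the point is only that graph structure gives you a signal about which combinations to try first.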
So there's this whole funnel of selection, and machine learning is acting at the top of that funnel, to select from a set of candidates that we don't have time to actually test chemically or biologically. The other approach, besides combining existing drugs, is to create new drugs, new molecules. The advantage is that we're now searching in a huge space of something like 10 to the 60 potential drug-like molecules; the problem, of course, is that it's difficult to search in such a huge set. So we want to use machine learning methods. First of all, we can use physical modeling methods that are in silico; we can use machine learning methods to approximate those physical models, so that we can do, say, a hundred times more evaluations and get an approximate evaluation; and then we can use things like reinforcement learning, active learning, and so on, in order to search the space more efficiently. That's the story I'm going to focus on. First, just so you get a sense of how big the problem is, or the potential advantage: most small molecules have never even been thought of by a chemist or a human being, much less been evaluated for real. So there's a huge potential for discovering new drugs, whether for COVID-19 or for other things, if we can develop better tools for searching in that space. I think we're just seeing the tip of the iceberg right now in terms of the potential for machine learning in pharmacology and drug discovery. One of the big issues with the current approaches is that it takes a lot of time to discover a reasonably interesting candidate or a lead, anywhere from three to twenty years, and that's one of the reasons why typical drug development is so expensive. In the case of COVID-19 it's clear we can't afford to wait that long, so it's really worth it to invest in methods that have the potential to reduce that search to something like six months.
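The funnel just described, where a model proposes candidates, only a few are measured for real, and the measurements feed back into the model, can be sketched abstractly. Everything here is a stand-in: the "assay" is a toy function the learner cannot see, the surrogate is a crude nearest-neighbour predictor, and the candidates are just numbers rather than molecules.

```python
import random

random.seed(0)

def assay(x):
    """Stand-in for the expensive wet-lab measurement (unknown to the learner).
    The best possible candidate here is x = 0.7."""
    return -(x - 0.7) ** 2

data = []  # (candidate, measured score) pairs accumulated over rounds

def surrogate(x):
    """Cheap approximate oracle: predict using the nearest measured candidate."""
    if not data:
        return 0.0
    return min(data, key=lambda d: abs(d[0] - x))[1]

for _ in range(5):
    # 1) propose a large pool of candidates cheaply
    pool = [random.random() for _ in range(1000)]
    # 2) rank the pool with the surrogate; only the top few go to the "lab"
    top = sorted(pool, key=surrogate, reverse=True)[:10]
    # 3) measure them for real and add the results to the training set
    data += [(x, assay(x)) for x in top]

best = max(data, key=lambda d: d[1])
print(f"best candidate found ~ {best[0]:.2f}")  # typically lands near 0.7
```

Even this crude loop concentrates its expensive measurements near the good region after a few rounds, which is the whole argument for putting machine learning at the top of the funnel.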
Now, we're not sure we're going to be able to do that, but it's a bet that's really worth making given how short on time we are to find a cure. OK, so there is something really interesting, and here I'm going a bit more technical, about searching in that space of molecules, which has to do with the way we evaluate new candidates. Naively you could think: well, I propose a candidate, and then I have some oracle which tells me how good it is; for example, we want to estimate the binding affinity between the candidate drug and the target protein. If the world were as simple as that it would be nice, but actually the world is more complicated. If you want the actual answer, how good is this binding, you're going to need to do a chemical experiment, and that takes a lot of time compared to what you could do with in silico experiments. And among in silico experiments there are many things you could do, with different tradeoffs between precision, or fidelity, and computational time. So we enter a really exciting research area for machine learning: how do we trade off computation against the speed of solving a problem, in our case searching for good candidates? Each time I take a decision, I'm going to call a particular oracle, and that oracle is not perfect, it's an approximation. If I use FEP (free energy perturbation) calculations I can have a very good oracle, but it's super expensive; it might take hours, or minutes, to get an answer, depending on the kind of calculations you do. You could also use a docking calculation, which is much cheaper than FEP but also has less precision. Or you could use a neural net which has been trained to approximate either the docking or the FEP or a combination of both. Now, what happens if you use an oracle that has less precision is that you're going to need to look at many more candidates than if you had one with more precision.
This is what the picture is saying: if you just do random search and you evaluate candidates using different kinds of oracles, where blue is the true one, green is FEP, and orange is docking, this is what you should expect in terms of how many candidates you have to look at, on the x-axis, versus how well you're doing; you want to minimize this binding affinity. You can see that the slope depends on the precision of your oracle. So this raises a lot of interesting questions about deciding, at any point in time, which oracle I should be calling given the information I have, and how I should organize the search to make that slope better. All right, so we have developed a research program called LambdaZero, whose starting point is MuZero, the reinforcement learning approach that was developed for computer Go. It starts from running about 200 million simulations of physical docking and then using those as training examples. It's interesting that because we're using these simulations to generate data, the data is kind of low quality: it's a bit low precision, but it's a lot of data. You might think that when you do drug discovery you're going to have very little data, because the actual number of candidates you can evaluate for real, with chemists and biology, is fairly small; but then you have this huge amount of low-precision data, where the target is not perfect. So this is also an interesting challenge: we have these two kinds of data, and in fact you're going to have a whole spectrum of different types of data with different precision. How can you take advantage of that? And what's interesting is that if you use some sort of reinforcement learning, instead of just randomly guessing candidates and testing them, you can actually really improve.
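The cost-versus-precision trade-off between oracles can be made concrete with a small simulation. The three "oracles" below are mock-ups with invented cost and noise numbers, not real FEP or docking settings: the point is only that under a fixed compute budget, a cheaper oracle lets you screen far more candidates, at the price of noisier picks, which is exactly the slope difference the figure describes.

```python
import random

random.seed(1)

def true_affinity(x):
    """Stand-in ground truth; lower (more negative) is better, like a binding energy."""
    return -abs(x - 0.5)

# Three mock oracles with different unit costs and noise levels.
# The numbers are purely illustrative.
ORACLES = {
    "fep":     {"cost": 100, "noise": 0.01},
    "docking": {"cost": 10,  "noise": 0.10},
    "neural":  {"cost": 1,   "noise": 0.30},
}

def screen(oracle_name, budget=1000):
    """Random search: evaluate as many candidates as the budget allows with
    the chosen oracle, and return the candidate that oracle believes is best."""
    spec = ORACLES[oracle_name]
    n = budget // spec["cost"]
    candidates = [random.random() for _ in range(n)]
    scored = [(x, true_affinity(x) + random.gauss(0, spec["noise"]))
              for x in candidates]
    best_x, _ = min(scored, key=lambda s: s[1])  # minimise predicted energy
    return best_x, n

for name in ORACLES:
    x, n = screen(name)
    print(f"{name}: {n} evaluations, true affinity of pick = {true_affinity(x):.3f}")
```

Deciding, at each step, which of these oracles to call given the remaining budget is the open question the talk points at; this sketch only shows why the question matters.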
Not only can you get an advantage by using a more accurate prediction of what the success of a particular molecule would be, but if you search in that space using better machine learning methods, in particular reinforcement learning and active learning methods, you can also, for the same amount of computation, discover molecules that are substantially better in terms of energy. This shows where we currently are with one of the elements in this system, the approximate reward function for the reinforcement learning: a neural net that predicts the outcome of the docking. What's interesting is that we can get a fairly good approximation. On the x-axis here is the real docking output, and on the y-axis the predicted output from the neural net; it tracks the identity line, with uncertainty and noise around it, but we can do this a hundred times faster than physical docking. What we want, of course, is to use these search methods to virtually scan through a space of molecules much larger than would ever be possible by enumerating molecules and evaluating them separately. One of the components that's very important in this kind of research is synthesizability: it's not enough that the molecule we find binds well to the target, it must also be something that chemists can actually build at a reasonable price, or that is even chemically feasible. There is software using all kinds of chemical rules to evaluate synthesizability, but these programs are also too slow to give you the throughput you need, so here too you can use machine learning, in this case graph neural nets, to approximate the result of these synthesizability calculations, and again you can approximate these things pretty well. So now you have a double objective: the binding affinity and the synthesizability.
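Combining the two objectives into one scalar reward for the search can be sketched as follows. The two "surrogates" here just read precomputed toy numbers, and the weighting is a design choice invented for illustration, not the actual LambdaZero reward:

```python
# Stand-ins for two learned surrogates: one approximating the docking score
# (lower = binds better) and one approximating synthesizability
# (higher = easier to make). Both are toy lookups, not real models.
def predicted_docking(mol):
    return mol["docking"]

def predicted_synthesizability(mol):
    return mol["synth"]

def reward(mol, w_synth=0.5):
    """Scalar reward for the search: bind well AND be makeable.
    The weight w_synth is an illustrative choice, not a published recipe."""
    return -predicted_docking(mol) + w_synth * predicted_synthesizability(mol)

candidates = [
    {"name": "m1", "docking": -8.0, "synth": 0.0},  # binds best, hard to make
    {"name": "m2", "docking": -7.9, "synth": 0.9},  # binds nearly as well, easy to make
    {"name": "m3", "docking": -4.0, "synth": 1.0},  # weak binder
]

best = max(candidates, key=reward)
print(best["name"])  # m2: the synthesizability term overturns the raw docking ranking
```

Notice that the molecule with the best raw binding score is not the one the combined reward picks, which is exactly why the second objective has to live inside the search rather than being applied as an afterthought.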
Then there is another objective, which we haven't incorporated yet but which will become important: toxicity. You want a molecule that binds to your favorite target but is not going to bind to everything else and destroy the person at the same time as killing the virus. This slide is for chemists; it doesn't mean anything to me. Let me just say a few words about active learning, and then I'll move on to the other project. As I mentioned, we're going to have these iterations where we use machine learning to generate candidates, then use those candidates to obtain new experimental data, and then that experimental data is added to the training set. There are all kinds of interesting questions here: how do you iterate these things in an optimal way? One aspect is simply: what are the criteria for selecting which molecules to evaluate? You might think, in a simple-minded way, that it's just the molecules with the best score in terms of the predicted binding affinity and the predicted toxicity. But you also want to take uncertainty into account, so you maybe want to do things like Bayesian optimization, or use other ideas coming from active learning, in order to also evaluate molecules for which you have high uncertainty. Then there are many methods to estimate uncertainty; which one is going to be more appropriate? Another interesting question arises when you provide these candidates to the biologists: you don't want to give them one molecule at a time, because when they run an experiment they might as well do it with a batch of 100 molecules or so, so you really want to provide a batch of candidates. But if all of these, say, 100 candidates are good yet they're all essentially the same up to some small variations, you're not going to gain a lot of information.
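A minimal sketch of batch selection with both uncertainty and diversity, in the spirit of what was just described. The candidates, scores, uncertainty estimates, and weights below are all invented; this is a simple UCB-style acquisition with a greedy similarity filter, not the actual method used in the project:

```python
# Each candidate has a predicted score, an uncertainty estimate, and a
# single "feature" number standing in for the molecule itself.
candidates = [
    {"mol": 0.10, "mean": 0.9, "std": 0.05},
    {"mol": 0.11, "mean": 0.9, "std": 0.05},   # near-duplicate of the first
    {"mol": 0.80, "mean": 0.7, "std": 0.30},   # worse mean, very uncertain
    {"mol": 0.50, "mean": 0.6, "std": 0.10},
]

def acquisition(c, kappa=1.0):
    """Optimism in the face of uncertainty: mean + kappa * std."""
    return c["mean"] + kappa * c["std"]

def select_batch(pool, k, min_gap=0.05):
    """Greedy: repeatedly take the best remaining candidate that is not
    too similar to anything already in the batch."""
    batch = []
    for c in sorted(pool, key=acquisition, reverse=True):
        if all(abs(c["mol"] - b["mol"]) > min_gap for b in batch):
            batch.append(c)
        if len(batch) == k:
            break
    return batch

batch = select_batch(candidates, k=2)
print([c["mol"] for c in batch])  # the near-duplicate is skipped
```

The uncertain candidate wins the first slot despite its lower mean, and the near-duplicate is excluded from the second; methods like BatchBALD formalize this by scoring the joint information gain of the whole batch instead of using a similarity threshold.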
You have to think of this whole iterative process as one of trying to acquire information, so with that in mind you want to take into account the sort of mutual information that these different candidates can bring together. There are things like BatchBALD, for example, a method proposed recently to estimate the information gain from the whole batch and not just from a single candidate. So there are really a lot of interesting machine learning questions that come up. And the other interesting machine learning question is this: instead of thinking of each of these filtering and prediction steps as independent machine learning problems, the ultimate goal is to think of the whole process, with the chemistry in the loop, as one big search problem, where we use machine learning in a way that's optimized with respect to the whole search process, including the feedback coming from the real world. The work going on right now involves a lot of people, whom I'm not going to name one by one: people from Mila, people from many other organizations, and it's really exciting to see the energy that goes into this research. All right, for the next fifteen minutes or so let me tell you about a completely different project, which has to do with how machine learning could improve contact tracing as well as epidemiological modeling. Let me say a few words about the way the virus propagates. What you see in the figure is the evolution of the viral load during the course of the disease: zero is the day your symptoms appear, and then you're probably going to have symptoms for a number of days that can vary quite a lot. The interesting thing is that in the three days preceding symptoms, you already have high contagiousness.
The viral load is how many viruses you're shedding, and a high viral load means you're more likely to be contagious to others. Now, what's really scary here is that you are contagious for two or three days before you even know it, or at least before you have any clue that you might be. When people realize they have symptoms, they get worried and they will typically change their behavior: they become more prudent, go out less, maybe even go into quarantine. One problem is that there are people who don't care, and that's a social problem and a political one; but there are also people who simply are not aware that they could be contagious, and if we can bring any kind of information to those people so they can change their behavior, just be a little more prudent, keep a bit more distance, stay at home if they can work from home, things like that, then we can really save lives and reduce the rate of spread of the virus. All right, so how can we do that? First, let me tell you about one of the most powerful tools we have to provide early warning to people: contact tracing. It's usually done manually: we ask people who have tested positive to report who they have been in contact with in the last couple of weeks, and then we contact those people, ask them to go into quarantine, and potentially have them tested. That's the standard way, but there are a number of issues with it. The main issue is that it's only once the person has been tested, usually maybe a week after they started having symptoms, that the information can propagate to the people they have been in contact with; by that time many of those people may already be symptomatic, and they already know they might have the disease, so it's not going to change as much. If we could know that you are carrying the disease earlier, then we could warn your contacts earlier.
That's a little bit of what we're trying to do here, and that's where digital contact tracing comes in. First, one thing I forgot to mention: at least in countries like Canada, it takes a while between a positive test and a contact tracer actually calling all of the affected people, somewhere between one and two days. That's an extra delay, and if you consider the little time window I mentioned, two to three days, adding one or two days to it is really bad; we really are playing against the clock. So everything we can do to shave time off the way information propagates, from people who clearly have the disease or might have it, to people who don't know they might have it, could have a big impact. Digital tracing means we're trying to do something comparable to manual contact tracing, but we would like to take advantage of your phone to allow the information to propagate faster, and even to people you don't remember spending time with: maybe you were in a queue or in a bar together, and of course you don't know that person, so you can't remember her phone number and name for the contact tracer to find her. That's the promise of digital contact tracing, but usually it's still done in a way that imitates manual contact tracing: the information starts to propagate only after a positive test has been obtained. This is where machine learning can come in, because one of the challenges with just looking at symptoms is that there are many kinds of symptoms, easily a dozen that we currently know about, that may reveal COVID-19, and these symptoms can have different levels of severity.
So how do you convert that information into an action, a probability? What should you do? Should you warn your contacts? What should you tell them, how prudent should they become? Maybe it's just a false alert; we don't want everyone to go into quarantine because they've been in contact with somebody who's starting to have a cold. So there's a trade-off here, in general, between how much freedom we remove from people by asking them to be more prudent, maybe to go into quarantine, versus how much we can slow down the rate of propagation of the disease. It becomes really important to have the right tools to evaluate that trade-off, and that means we need to evaluate the risk: the probability that you're contagious, or the amount of contagiousness, this viral load I was talking about. If you consider one particular person, you can have many sources of information providing clues that this person is contagious. I already mentioned symptoms, but we also know that having the disease or not depends on prior medical conditions, so one of the things we've done is build a questionnaire; such questionnaires exist and are becoming fairly standardized, and we know a lot about medical conditions that increase your probability of getting the disease. We also know that things like age make a big difference. So you want to ask these kinds of questions ahead of time, and then when people start having symptoms they can report them on their phone. Another source of information for a particular person is: have they been in contact with people who seem to be at risk, and what was the risk level of those people, how likely were they to be contagious, and so on. So now we have all of these sources of information, and you can see that it would be hard to come up with a handcrafted heuristic to combine all those pieces of information.
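To make concrete what "combining all those pieces of information" means, here is a toy logistic risk estimator. Every feature name and weight below is invented for illustration; the whole argument of the machine learning approach is precisely that these weights should be learned from data rather than handcrafted like this:

```python
import math

# Toy estimator mapping heterogeneous signals to one risk probability.
# All weights are made up; a real system would learn them.
WEIGHTS = {
    "cough": 1.2,
    "fever": 1.5,
    "loss_of_smell": 2.0,
    "age_over_60": 0.6,
    "prior_condition": 0.8,
    "contact_risk_sum": 1.0,   # accumulated risk level of recent contacts
}
BIAS = -4.0

def risk(features):
    """Probability-like contagiousness score in [0, 1]."""
    z = BIAS + sum(WEIGHTS[k] * v for k, v in features.items())
    return 1 / (1 + math.exp(-z))   # logistic squashing

# A person with no symptoms but one low-risk contact, versus a person
# with several symptoms and riskier recent contacts.
asymptomatic = {"contact_risk_sum": 0.2}
symptomatic = {"cough": 1, "fever": 1, "loss_of_smell": 1, "contact_risk_sum": 1.5}

print(f"{risk(asymptomatic):.2f} -> {risk(symptomatic):.2f}")
```

The useful property is the graded output: instead of a binary quarantine/no-quarantine decision, the score can drive intermediate recommendations of prudence, which is the distinction drawn in the scenario that follows.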
It makes much more sense to use machine learning to combine these pieces of information, and if we can do that, then we can provide an early warning signal, first to the person, but also to the people that person has been in contact with, and potentially their own contacts. That's the idea of machine-learning-based digital contact tracing. OK, so let me go through a little sketch of a scenario to illustrate how early awareness could save lives. I'm sorry if the letters are too small for you to read, so I'll explain verbally anyway. We're looking at three potential histories of the same underlying scenario, in which a character, Jim, is going to get infected and then potentially infect others. In the first, there is manual tracing going on. In the second there is digital tracing, which we call binary tracing because the decision is binary: if you've been in contact with somebody who was contagious, you know you're at risk and you suddenly go into quarantine. The third row is the machine-learning-based approach, where you can have a graded signal, because you're not completely sure that you're contagious, that you're infected; instead of going bang-bang, I behave as usual versus I go into quarantine, you can have intermediate levels of, say, prudence, which change your behavior but don't necessarily prevent you from doing any activity. At the top, what we see is that on the Wednesday of the first week, in all three cases, our character Jim has a contact with a high-risk stranger somewhere, and a couple of days later the stranger starts showing symptoms. With manual tracing and regular digital tracing, that information doesn't reach Jim.
But if Jim was using machine learning, and the other person was using an app with these kinds of tools, you could have early warning signals even before the other person gets tested in the second week, even before the other person gets symptoms. Because that stranger has the app, maybe her app has already calculated, from the contacts she had with others who were infected, that she might already be at some level of risk, and some of that risk will propagate to Jim. So even on the first day Jim might already get a signal that he should be a little careful; then, when the stranger starts having symptoms, Jim gets a slightly stronger recommendation to be careful; and a few days later, when the stranger's symptoms grow worse, Jim gets an even stronger signal. This is where the cartoon story diverges: in one case Jim decides to go to work, or to some public place, and in the other case he decides not to. And then we can see the difference between manual tracing and digital tracing, in the sense that the time delay between the test result and Jim getting the information can also make a big difference. A few quick words about some of the really interesting and challenging issues around these kinds of projects. First of all, a lot of it is about privacy: how do we find the right trade-off between privacy considerations and machine learning considerations? At first sight these two things are in complete opposition: the privacy considerations say no data, because any bit I send is potentially exploited, while machine learning people want as much data as possible. So it looks like this is hard to resolve, but really, once you allow some data to be exchanged, there are many options.
there are many options, and you'd like to find the privacy, security, and communication choices that are going to be good from a privacy perspective but still allow enough information to propagate for machine learning to do its job. When you look at the privacy issues, there are basically two big categories to think about: big brother attacks and little brother attacks. Big brother attacks I think we understand: we don't want governments or large companies to centralize data about everyone, so that they can use it for purposes that are not good for us, in particular to threaten our democratic rights or exploit us in any way. Little brother attacks are thought about less. They mean your neighbor getting to know that you're infected, or the people you meet on the street knowing it. Of course you don't want that: you want to protect your dignity, you don't want to be discriminated against or stigmatized, or even taken advantage of by somebody who, knowing you're infected, could make money off you. Unfortunately, the different kinds of privacy solutions that exist tend to help mostly against the big brother problem or mostly against the little brother problem; it's hard to reconcile both kinds of defenses. You have to keep that in mind and make choices, which may depend on what people care most about. Now, from a machine learning perspective, I won't have time to go into a lot of detail, but there are many approaches one could look at, and unfortunately the approaches we can think of often come into contradiction with the privacy constraints. For example, the first thing that came to me when I
looked at the problem was to do loopy belief propagation, or something like it, where the nodes correspond to different people and their phones. Unfortunately, this requires communicating a lot of information, very often, between all the phones, and if you want to do learning on a server, the server would need access to the full contact graph: who met whom, when, and where. That is really, really bad from the point of view of a big brother attack, so ideally we don't want any server to hold the contact graph of everyone, and you need to find solutions that avoid that. I see that time is flying, so let me skip a few things and tell you about what we have been exploring. We've explored a solution in which the phone does a lot of the calculations, but we also use a central server to do the learning; ideally we would use federated learning, but that raises other challenges. From a machine learning point of view, what's a bit challenging is that the input is not a fixed-size thing: it depends on the contacts I've had in the last two weeks, and the number of those contacts varies. So we wanted to use a machine learning method that can deal with variable-length input, and one such approach is transformers, which is what we've been using. Another tricky question is: what exactly do we want to predict? Ideally, we want to predict something which unfortunately we can't measure, namely contagiousness n days ago. Say Alice met Bob five days ago, and now Alice has some probability of being infected. What information should Alice send Bob so that Bob optimally changes his behavior? What makes a lot of sense from an epidemiological perspective is that Bob basically wants to know if he's infected, and the information that's most relevant
to that is how contagious Alice was five days ago, when she met Bob. So the predictor on Alice's phone should predict, for each day of the past two weeks, how contagious she was, so that she can send that information to her contacts from those days. That means the output is also not just a single scalar but at least one quantity for each day in the past. Let me skip ahead. One issue is that the thing we want to predict isn't something we can actually measure: contagiousness, especially contagiousness in the past, is not something you're going to get from tests. So we want to infer it, and the approach we've chosen is to infer it using a generative model of the joint distribution between the things you observe, especially on your phone, and the things you don't observe, which are latent variables. We train an epidemiological model that captures that joint distribution on the one hand, and on the other hand a neural-net inference machine that predicts the latent variables given the observed variables. This should sound familiar: it's basically what EM does. We've explored approaches based on amortized inference, which is what you have in VAEs, but essentially similar to what you do in EM. I'm going to skip that. The last bit is that part of the research project involves building a good simulator, a good generative model for all of these variables, and the best way to do that is to have a really good epidemiological model organized at the level of individuals. It's not like the standard compartment-based models, where you simulate the proportion of the population in different stages; here you do that for each person in a population, which raises all kinds of computational challenges and needs a lot of computational power. But in this way you're able to
simulate things like: what if this particular person, given the information she has on her phone, changed her behavior in such and such a way? How would that affect the overall evolution of the virus? This is the kind of calculation we've been doing; this slide shows more information about the simulator. And our simulations seem to suggest that these early warning signals do allow you to reduce the number of infections, to reduce what's called the reproduction number of the virus, which is how many people an infected person will infect on average. This is the second graph, at the bottom, where we see that the machine-learning-based method is able to reduce this reproduction number by maybe 30 percent, and that translates into a number of cases which does not grow as quickly compared to other approaches like standard contact tracing. I'm going to stop here and just mention that this is the work of a lot of people. Again, thank you very much.
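The individual-level simulation described above can be illustrated with a toy agent-based model. This is a deliberately minimal sketch: the parameters, the random-mixing contact model, and the hypothetical `scale_for` argument are all invented for illustration, and the actual simulator discussed in the talk is far richer. The point is that, unlike a compartment model that only tracks population fractions, every person here is a distinct agent, so per-person counterfactuals ("what if this particular person reduced her contacts?") can be expressed directly.

```python
import random

def simulate(n_people=2000, days=60, contacts_per_day=8, p_transmit=0.05,
             infectious_days=7, scale_for=None, seed=0):
    """Toy individual-level (agent-based) epidemic simulation.
    scale_for maps a person index to a contact-reduction factor,
    expressing a per-person behaviour change (a what-if scenario).
    Returns the mean number of infections caused per agent that has
    completed its infectious period (an effective reproduction number)."""
    scale_for = scale_for or {}
    rng = random.Random(seed)
    SUSCEPTIBLE, INFECTIOUS, RECOVERED = 0, 1, 2
    state = [SUSCEPTIBLE] * n_people
    days_left = [0] * n_people   # remaining infectious days per agent
    caused = [0] * n_people      # infections caused by each agent
    state[0], days_left[0] = INFECTIOUS, infectious_days  # patient zero
    for _ in range(days):
        # snapshot of who is infectious at the start of the day
        for i in [p for p in range(n_people) if state[p] == INFECTIOUS]:
            n_contacts = int(contacts_per_day * scale_for.get(i, 1.0))
            for _ in range(n_contacts):
                j = rng.randrange(n_people)  # random-mixing contact
                if state[j] == SUSCEPTIBLE and rng.random() < p_transmit:
                    state[j], days_left[j] = INFECTIOUS, infectious_days
                    caused[i] += 1
            days_left[i] -= 1
            if days_left[i] == 0:
                state[i] = RECOVERED
    done = [i for i in range(n_people) if state[i] == RECOVERED]
    return sum(caused[i] for i in done) / len(done) if done else 0.0
```

For instance, `simulate(scale_for={0: 0.0})` asks what happens if patient zero fully isolates: the epidemic never starts and the estimated reproduction number is zero. Comparing runs with different global or per-person scalings gives the kind of what-if calculation described in the talk, though this toy version is stochastic and results vary across seeds.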