This is actually our one guest speaker from outside of the Syracuse family, so to speak, and we thought it would be interesting to talk about outbreak analytics. Maybe you can give a short introduction to yourself, as we've done with the other speakers, so I don't mess it up, and then jump into the talk. So can we switch over to... Yeah, thank you. Hi, good morning everyone. Thanks Jeff for having me. Can you hear this? The sound cut out a little bit, so I don't know if you can hear me now. Yes, okay, brilliant. Yes, we can, thank you. So my name is Thibaut Jombart. I'm an associate professor in outbreak analytics, and I'm going to tell you in a minute what outbreak analytics is. I work in London for two universities, the London School of Hygiene and Tropical Medicine and Imperial College, and I've been in this field for roughly the last 10 years. Right now I'm going to share my screen and start sharing my slides. A word on the slides: I'll share the URL of my slides after the talk. I'll have to dash after the talk, and I apologize that I won't be able to stick around for the rest of the event, but feel free to treat my slides as Creative Commons Attribution (CC BY), so if you want to reuse them, please go for it. Okay, can somebody just confirm that you're seeing my slides full screen now? Yes, no? Okay, I'll just assume you'll jump in if you cannot see my slides. We can see it. Okay, perfect. I apologize, I keep getting notices of people in the waiting room; I'll close them for now. Okay, so I'll be telling you a little bit about outbreak analytics in the next 20 minutes or so, and then I guess we'll have plenty of time for a conversation after this. I've actually got a five-day short course on outbreak analytics, so I've tried to condense quite a lot of material into just a short 20 minutes; there will be gaps.
What I'll try to do is give you a few examples of outbreak analytics applications, and to start, we'll actually talk about what outbreak analytics is. This slide usually speaks more to people in the UK, because these are faces that pretty much the general public in the United Kingdom will have seen over the last year and a half now. I don't know to what extent it's been similar in the US, but in the UK there's been a lot of focus on infectious disease modelling in relation to the COVID-19 response. For me, that was the first time in 10 years of doing this job that I was seeing my colleagues and my line managers, whom you see here in the newspapers, on TV and in the major media outlets, discussing sometimes quite advanced modelling concepts, like overdispersion in the reproduction number, which some of the articles I'm showing down there actually talk about. So infectious disease modelling has really come to the front of the scene over the last year and a half with the COVID crisis, but it was there before. It's becoming trendy now to look at trends, but it was actually a thing before that. And in fact, there's a bit of a distinction to draw. Infectious disease modelling is the application of mathematical modelling to the analysis of outbreaks: how fast is it growing? How can we contain it? That's the kind of question you can address with these tools. But you can take a broader view of the whole thing: it's not just about the modelling, it's about all the steps of data analysis you need to take to go from data observed on the ground in the field to informing decision making.
I've personally been involved with several infectious disease outbreaks before COVID-19, some of which I will talk about, especially an outbreak of Ebola in the Democratic Republic of the Congo; I'll touch on that very soon. The idea was really: we need to be on the ground, and we need to be able to use the whole toolkit of modern data science to inform the response in real time as well as possible. To describe the emergence of this field, because it was a new field of data science dedicated to the analysis of outbreaks, we coined the term outbreak analytics, and there's this paper we wrote in 2019 describing the emergence of this new field. It sits at the crossroads of many different disciplines: field epidemiology, methodological research, research in stats and machine learning if you want, but also software development and public health response. The aim of this new data science is to inform the response to emergencies in real time. So that's what we're going to be talking about. A little complement on the context: I had been interested for a few years in developing tools for outbreak analytics, and there was a real gap there. There still is; it's being filled, and I'll come back to this on my very last slide. But to cut a long story short, I originally put together a network that then turned into a non-governmental organization called RECON, the R Epidemics Consortium. It's an initiative to develop R packages, R as in the language for data analysis, so basically free packages for data science geared towards the analysis of outbreaks. Okay, so that's a little bit of context. It's a very recent data science, and there are still quite a few gaps in the tools that we need.
There are still a lot of things we're not quite sure how to do, but there are some things we can use data science for when it comes to informing the response to epidemics. Now, for most of us this is a very different context from the one we usually see in data science. The outbreak I'm going to talk about quite a lot in this talk is an outbreak of Ebola virus disease in the eastern DRC, the Democratic Republic of the Congo, from 2018 until its official end in 2020. Some of you might have heard of this one in the media; the larger outbreak a few years before that in West Africa was probably a little better known to the general public, but this one was still the second-largest Ebola epidemic in the world and the largest in the DRC. And it was a very, very difficult epidemic because of the context. For those of you who don't know about Ebola: think of the nightmare of an infectious disease. It's caused by a virus that we believe circulates in an animal reservoir, probably in bats, but when it transmits to humans it is extremely deadly. It's an absolutely terrible disease: roughly 60 to 70% of the people who get the disease die from it, which puts it among the deadliest diseases known to mankind. This outbreak was roughly 3,500 cases, which for an Ebola outbreak is very substantial. Now, there's no easy Ebola outbreak, but what made this one even harder to handle was the place where it was happening. It happened in North Kivu and Ituri, two provinces in the eastern part of the DRC which, for probably the last 50 to 60 years, have been under very frequent military and armed conflict. There are a lot of different paramilitary groups there; I think ISIS basically established their position in Beni, the city where I was at the time, around the month I arrived. And there are tons of others, and it's a very, very dangerous place.
There's a lot of violence, a lot of poverty, and that makes responding to any emergency very difficult: threats to the local population first and foremost, of course, but also to the response staff and facilities. So, a different context. One of the things I should highlight, and I'm going to try and get a virtual laser pointer here, is that this was also probably the first time, at least to my knowledge, that there was a data science cell dedicated to informing the response in real time. An analytics cell was created as part of this response pretty much from the get-go. Personally, I spent about six months there in total, across three different missions in the year, and my job was really to build the data infrastructure for this cell. So what can you do with data science to inform operations? A few things. The first is sometimes just a nice, telling visualization. I don't know that this visualization is nice, you'll make that call, but at least it was useful. What you see on this graph, on the X axis, is of course time, and on the Y axis the number of people sick with Ebola on a given day. Now, when people report cases, and you've seen this with COVID, what you typically report is either the number of new cases, that's your daily or weekly incidence, or the total number of people infected so far. But very rarely do you have an indication of the prevalence, that is, how many people are sick today. This was presented at a weekly meeting we were having with the decision makers of all the big UN agencies: the World Health Organization, the military branch of the UN, the World Food Programme, everyone was there, and several NGOs as well.
Everyone knew the outbreak very well, and I asked how many people were sick with Ebola today, and nobody had a clue. Now, what you see on this graph is two shades. The total on the Y axis is the number of people sick on a given day, but the important thing is the distinction between the darker shade, which is cases we know about, and the lighter shade, which is cases we don't know about yet, but will know about eventually. The reason we don't know about them yet is that they basically haven't been detected: they are sick in their communities, probably passing on more infections. The important message of this graph is that, on a given day, your chances of controlling the outbreak are basically determined by how much of the dark shade you have. If everything is dark, everything is known, that means there's very little ongoing transmission. The opposite is true if, like in this case, half of the cases on a given day are not yet detected; then you've got a problem, because all of that lighter shade is still passing on the disease. Now, there was a question related to that: why do we have this? We have it because there is a delay to case detection. And the question is, if we reduce the delay to detection or hospitalization of cases, how is this going to impact transmission? Now, based on some data we had on who infected whom and when, and sorry, I'm hearing myself with a lot of echo, that's a bit disturbing; I'll keep talking, but if you can fix it, that'd be great. Okay, so what you see on this graph is the distribution of how long it takes, once a person is sick with Ebola, for them to infect new people.
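The known/not-yet-known prevalence decomposition described here can be sketched in a few lines. Below is a minimal Python illustration with entirely made-up dates (the actual analyses were done in R from a real line list); `cases` and its fields are hypothetical:

```python
from datetime import date

# Hypothetical line list: (onset, end of illness, detection date) per case.
# A detection date of None means the case has not been detected yet.
cases = [
    (date(2019, 1, 1), date(2019, 1, 12), date(2019, 1, 6)),
    (date(2019, 1, 3), date(2019, 1, 15), date(2019, 1, 10)),
    (date(2019, 1, 5), date(2019, 1, 18), None),
]

def prevalence_breakdown(cases, day):
    """Count cases sick on `day`, split into detected vs not-yet-detected."""
    detected = undetected = 0
    for onset, end, detect in cases:
        if onset <= day <= end:  # sick on that day
            if detect is not None and detect <= day:
                detected += 1    # darker shade: cases we know about
            else:
                undetected += 1  # lighter shade: not known about yet
    return detected, undetected

print(prevalence_breakdown(cases, date(2019, 1, 7)))  # (1, 2)
```

The ratio of the two counts is exactly the "how much of the shade is dark" signal the speaker describes.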
So, how many days after their own onset of symptoms, after they start being ill, does it take for them to infect new people? That's a probability distribution, and what it says is that most transmission happens between zero and 20 days; very little happens more than 20 days after the onset of symptoms. Now, the interesting thing is that on this distribution we positioned the mean reporting delay, that is, the average time it took us to detect and hospitalize a patient, and it was about seven days, so right on this part of the distribution here. Why does it matter? It matters because this vertical line marks where we essentially interrupt transmission: when people get hospitalized, ideally, they stop spreading the infection, right? Unless you have transmission within the hospital, which we had as well, but that's a different story. Basically, everything to the right of this line is, ideally, averted transmission. So the question was: what happens if we can shorten the delay? If we can bring it down to, say, three and a half days, we can calculate, we can give an estimate of, how much we would reduce transmission. In this case, if we shift by just three and a half days, so we halve the average reporting delay, we estimated that on average we would avert 30% of the secondary transmission. And that 30% would take our reproduction number, that is, the average number of new cases per infected person, below one, which is the threshold; I'm sure you've heard of that in the media.
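The "shorten the delay, avert the tail" calculation can be illustrated numerically. This is a sketch under assumed parameters: the gamma shape and rate below are invented for illustration, whereas the real infectiousness profile in the talk was estimated from who-infected-whom data:

```python
import math

def gamma_pdf(x, shape, rate):
    """Gamma density, used here as an assumed infectiousness profile."""
    if x <= 0:
        return 0.0
    return rate**shape * x**(shape - 1) * math.exp(-rate * x) / math.gamma(shape)

def fraction_after(delay, shape, rate, step=0.01, upper=60.0):
    """Fraction of onward transmission occurring more than `delay` days
    after symptom onset (midpoint numerical integration of the profile)."""
    total = after = 0.0
    x = 0.0
    while x < upper:
        p = gamma_pdf(x + step / 2, shape, rate) * step
        total += p
        if x + step / 2 > delay:
            after += p
        x += step
    return after / total

# Illustrative parameters only (mean 10 days, sd 5 -> shape 4, rate 0.4).
shape, rate = 4.0, 0.4
f7 = fraction_after(7.0, shape, rate)   # transmission averted with a 7-day delay
f35 = fraction_after(3.5, shape, rate)  # averted with a 3.5-day delay
print(f"extra transmission averted by halving the delay: {f35 - f7:.0%}")
```

Hospitalization at delay d cuts off the tail of the profile beyond d, so the benefit of a faster response is the difference between the two tail fractions.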
But basically that 30% reduction meant control of the outbreak, which was a very big operational impact, because it meant the path to control was not deploying more people or having a heavier footprint on the communities, but being able to access the cases faster, and for that we needed community engagement. So this was an example of a simple analysis that was used to inform the response directly. Still talking about this outbreak, here is another case of data science being applied to inform the response, this time with a little more modelling involved. I said this was a violent place. We faced several armed attacks on Ebola treatment centers in the course of the outbreak. These were absolutely terrible: we lost some patients, and we actually lost some staff as well, who were killed during these attacks. You see here a photo taken just a day after one attack, where a whole Ebola treatment center was burned to the ground. When this kind of thing happens, of course, it has a very strong impact on the dynamics of the disease: cases are going to go up, because surveillance crashes for a few days and we can't treat patients or welcome new ones. So that's quite terrible; I've written a paper on some of the consequences of these kinds of events on Ebola dynamics, which I link here. But the question at the time was: we are going to rebuild in a few weeks' time, because that's how long it's going to take us. How big does the new center need to be? I'm going to cut a long story short here, and I'm very happy to take more technical questions after, but basically we had data on how long people were staying in hospital. You see some curves here, and how long people stay in hospital with Ebola depends on whether they survive or whether they are going to die.
So it's a composite distribution, but basically we can integrate all of this and get a distribution for the duration of stay in hospital, which we can use for computer simulations. What we're going to do is estimate how fast transmission spreads and how many cases we expect to see in the next few days; then, for every case, we simulate a course of hospitalization, and we just count how many beds are taken on a given day in our simulations. Of course, this is a random process, so we won't run just one or two simulations, we'll run hundreds of thousands or tens of thousands. Actually, I'm exaggerating, it was only 5,000, but it doesn't really matter in this case. What you see here is what I presented at the time, and you can tell, because it's actually in French; the DRC is a French-speaking country. There are two things on this graph. The pink bars are numbers of cases, the real ones and the simulated ones. And the box plots are predictions, across all 5,000 simulations, of how many hospital beds are needed on a given day. So all of these box plots are simulations, and the red dot here is the data point from the day I was presenting the analysis, which, reassuringly, was right in the middle of the predictions. What this suggested was that the bed needs by the end of October would be below 20 beds, and they were aiming for a little more than 20. So the take-home message of this analysis was: if everything stays as it currently is, we expect the new facility to be enough to accommodate the new patients, which, quite thankfully, it was. This kind of thing is of course a general problem, and in fact we've reused exactly the same approach for the UK. I say now, but it's been around for a year: we've turned it into a Shiny app.
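A stripped-down version of this bed-demand simulation might look like the following in Python (the real analysis was done in R with fitted length-of-stay distributions; the sampler and admission numbers here are made up):

```python
import random

random.seed(1)

def simulate_bed_demand(daily_admissions, los_sampler, n_sims=5000, horizon=30):
    """Monte Carlo bed occupancy: in each simulation, give every admitted
    patient a random length of stay and count beds occupied per day."""
    occupancy = [[0] * horizon for _ in range(n_sims)]
    for s in range(n_sims):
        for day, n_adm in enumerate(daily_admissions):
            for _ in range(n_adm):
                stay = los_sampler()
                for d in range(day, min(day + stay, horizon)):
                    occupancy[s][d] += 1
    return occupancy

def los_sampler():
    """Crude stand-in for the fitted survivor/non-survivor mixture."""
    if random.random() < 0.6:
        return random.randint(3, 8)    # non-survivors: shorter stays
    return random.randint(8, 20)       # survivors: longer stays

# Hypothetical forecast of admissions per day over a 30-day horizon.
admissions = [2, 1, 3, 0, 2, 1, 2, 1, 0, 1] + [1] * 20
occ = simulate_bed_demand(admissions, los_sampler, n_sims=200)

# Summarize one day across simulations (the box-plot view in the talk).
day15 = sorted(sim[15] for sim in occ)
print("median beds on day 15:", day15[len(day15) // 2])
```

Repeating the per-day summary across the horizon gives exactly the box-plots-over-time plot described in the talk.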
So it's a web app, basically using R in the background, to forecast bed needs, this time for COVID-19 in the UK. I'm part of the modelling committee informing the UK government on COVID-19, and this was one of the tools used at the time to show that A&E services were going to get saturated and we needed to go into lockdown soon. So, another example of this kind of modelling impacting decision making. I'm very aware that I'm dragging along, I've got three minutes, so I'll go very quickly over this. It's very trendy to talk about machine learning, for good reasons. One of the problems we had with COVID in the UK and other places was detecting whether or not there were recent changes in some locations. You can break your space down into many, many different locations, and you want to be able to pick up new flare-ups of infection. So that's the problem here: you've got numbers of cases over time, and the question is whether they are compatible with what we've seen so far, or whether there is a marked acceleration. I'll cut a very long story short, but basically there were existing methods for this kind of problem that we were not completely satisfied with at the time. It became very problematic especially when working with the World Health Organization, who wanted to do surveillance at a worldwide level, with countries that had very different temporal trends. I illustrate here the differences between all of these: from near-exponential trends, to linear with some noise, to different types of periodicity, changes in trends, et cetera. And basically what we did was develop a new method; I'm sorry, I'm going to have to go quickly over this, so I won't give you the technical details.
What we did was use an automated machine learning approach. What the method really does is look at the past trend, fit many, many different models, hundreds of models, to that past trend, take the best-fitting model, and say: okay, this is the dynamic we've observed over the last five or six weeks. Then it compares that dynamic to the last few days of data, and if the last few days are very far from the model, it raises a red flag and says: look, it seems to be accelerating; cases are higher than we would expect given the past trend. This is an R package called trendbreaker, which is not released yet, but you can already see it on GitHub. You can see it here being used for surveillance at the country level. For Albania, for instance, this is the raw data and this is the output of the trendbreaker package, where every data point that should perhaps raise a red flag is flagged in red. And you can see, for instance, in the case of Turkey, that even though the data points didn't seem, to the naked eye, to mark a very strong acceleration, they actually did, and it was confirmed afterwards: Turkey was a little bit out of control after this. I worked with the WHO on this for six months. It's now part of their routine surveillance pipelines, used for decision making and resource allocation alongside a bunch of other tools. You'll have the links in there; there's actually another Shiny app which is publicly available, so you can see this in production on the WHO Shiny app. Okay, I'm sorry for running over time. Just to say that if you're a young data scientist or an aspiring data scientist, I think there are a lot of opportunities in outbreak analytics. Bill Gates blogged about us as the Avengers of virus hunters.
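The fit-the-past-then-flag-the-present idea can be caricatured with a single linear model standing in for the hundreds of candidate models trendbreaker actually compares; everything below (window length, the 3-sd threshold, the toy series) is an invented simplification:

```python
import statistics

def flag_accelerations(counts, train_days=35, k=3.0):
    """Toy version of the trendbreaker idea: fit a trend to a past window,
    then flag recent days whose excess over the trend exceeds k residual sds."""
    train = counts[:train_days]
    xs = list(range(train_days))
    # Least-squares line fit on the training window.
    mx, my = statistics.mean(xs), statistics.mean(train)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, train))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    resid_sd = statistics.stdev(y - (intercept + slope * x)
                                for x, y in zip(xs, train))
    flags = []
    for day in range(train_days, len(counts)):
        expected = intercept + slope * day
        if counts[day] - expected > k * resid_sd:
            flags.append(day)  # red flag: well above the trend envelope
    return flags

# Flat-ish past, then a jump in the last two days.
series = [10 + (i % 3) for i in range(35)] + [11, 25, 30]
print(flag_accelerations(series))
```

The real method picks the best of many model families per location before testing the recent points, which is what lets it cope with exponential, linear, and periodic trends alike.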
And I think that's very, very far from the truth; that's the article there. But there are nonetheless exciting things to be done, whether it's methodological development, programming and software development if that's your interest, or training and capacity building if you like that: tons of opportunities. And with this, I'll just leave you with an advertisement for a big project coming out early next year: we'll start recruiting outbreak analytics programmers at the London School, jointly with other partner centers. And with that, I'll open the floor for questions. Thank you very much. So, thank you. First, for people online, if you have questions, please post them. Maybe I'll start off with one question I thought of as you were going through. You talked about analytical cells that were deployed, or at least one cell that was deployed for Ebola. Could you do two things? First, talk about the cell: who was part of that team, because I think using the word "cell" for a team is interesting, what the skills were, and a little bit of the day-to-day, what was actually going on. And the second part of the question is: is there anything analogous that was created for COVID? Oh, great question. Okay, so the team: I joined probably five to six months after it was created, but there was no infrastructure at the time, so my role was to build the data infrastructure. One of the key things about the team was its composition, which was super diverse. We had field epidemiologists. We had some statisticians, increasingly; in fact, I started recruiting statisticians and data scientists after being there. We had one GIS expert. And one of the key things was also to have social scientists; in fact, someone who is now a very good friend of mine was leading the social sciences for UNICEF at the time, and still is.
And that was key, because the ideal way that kind of cell works, I think, is that the data scientists flag something unusual. We surveilled a lot of data; we had a lot of routine surveillance to do, and some ad hoc analyses as well. So it may be that we would see something and could show statistically that it wasn't in line with normal expectations, but that doesn't tell you why it's different. Okay, I'll take an example. One of the key things we were doing was looking at the quality of the response, so to speak, in real time; there are different indicators you can track to make sure that you're not, for instance, missing cases. And one of the things we realized was that in one location specifically, we were missing a lot of the cases. In fact, 95% of the people who matched the case definition and should have been tested for Ebola, that is, we weren't sure, but we suspected they might have Ebola and they needed testing, 95% of the people who should have been tested, sorry, were not. So out of a hundred people who could have had Ebola and whom we should have tested, only five were tested. Now, we flag this, but we don't know why. Maybe it's artificial, maybe it's data quality, maybe it's something else. And this is where the field epidemiologists become very important, but also the social scientists, because these are the people who can actually investigate on the ground, ask questions in a rigorous way, look at Ebola perception and response perception, and see what the origin of the problem is. In this case, it was a mixed bag, but mostly untrained staff on the ground. They were actually missing these cases, and it was revealed that they didn't know what the Ebola case definition was, or were not applying it well. They were retrained, the problem was fixed, and we actually saw the improvement right after.
So in terms of how we worked, some of it was this kind of routine surveillance, and some of it would be literally the head of the response, that was Michel Yao at the time, at the WHO, the incident manager, that is, the leader of the response, who could come in one morning and say: well, we need to know about the Mambasa center, the one I just showed in my talk. It got destroyed; we need to rebuild it. How big do we need to rebuild it? Is that going to be sufficient? Can you say something with modelling? So it was a bit of a mixed bag of proactive work and just answering questions. I don't know if I answered your question well. Yeah, no, that's good. So another question, this one's from Doreen. If I remember correctly, Ebola is spread by blood and bodily fluids, and COVID is respiratory. How much of the work you talked about today, and in general, is applicable across both, or is it fairly different? Yeah, that's a great question. It is a bit different. Anything that's airborne is bad, because it transmits just so much more easily. In the case of a disease like Ebola, which is exactly what you said, bodily fluids, it's going to be very, very close contacts, person-to-person transmission. The other thing is there are no asymptomatic cases, or very, very few; it's a very, very virulent disease. So in terms of control, it changes the game a little. Yes, it's very, very deadly, but if you miss cases, it's because you haven't surveyed those people; it's not because there's an asymptomatic carrier who can infect a dozen people without you knowing, without them knowing. So in terms of intervention, it's going to be very different. For instance, a lot of the Ebola intervention revolves around contact tracing: you try to reconstruct who infected whom, and who a given case could have infected. And it's a lot of work.
It's very hard work, but it's doable, because the symptoms are very clear: it's going to be vomiting, diarrhea, unexplained bleeding. So you can ask a patient: have you been in contact with somebody who had these symptoms over the last couple of days or weeks? And usually they'll be able to tell you. In the case of COVID, contact tracing is tremendously harder. So yes, the mode of transmission changes the dynamics. And it's not just the mode of transmission: the length of the incubation period changes a lot as well, and the overall infectiousness of individuals, I mean, the famous reproduction number, also changes things quite a bit. Thanks. Another question came in through private message. You seem to have knowledge that is technical, medical, and data-related, so I guess you can think of three different areas. Can people enter this field without a lot of medical knowledge to do health analytics? I think that's the focus of the question. First, a disclaimer, and sorry, the light is in my face right now: no, I don't really have any medical knowledge. I kind of picked it up. My background was biology and ecology, and then I got into biostats kind of by accident. So yes, you can. Whatever skills you have for data analysis, you can be useful in this field. It could be just software programming; it could be that you're a pro at making interactive graphics that are very informative; or you're into very computationally intensive approaches. There's such a need for skills at the moment, and nobody has all the pieces of the puzzle. Field epidemiologists are getting into data analysis, but it's not their trade; statisticians and mathematical modellers don't really go to the field and don't necessarily have an awareness of how the response works on the ground.
You know, we need a lot of different profiles, and nobody has all the pieces of the puzzle. That's good to hear; maybe some of my students will participate. Another question: how do you set up objectives for each of your models, and how do you measure model performance? That is a great question. It really depends on what we do. On the example of the Ebola outbreak in North Kivu, you make predictions; one of the key things we were doing was saying, this is how many cases we expect to see in three weeks' time. Not further than that, because then it's just science fiction and nobody can predict it. But we'd say: well, we expect, on average, 10 to 20 cases a day. And people would go: no, there's no way. And then it happens. So that's your measure of validation: you see whether or not what you predicted was right. And there are tons of ways to evaluate forecasting models using simulations and all kinds of methodology, really. Sometimes you just simulate outbreaks that you know everything about, and then you see if your method is able to reconstruct things. But very often, the reason a model is going to be wrong is not necessarily the methodology in itself, it could be, but quite often it's just that reality is more complicated. Something changed in the response: we're not picking up cases in a place because the response isn't happening there for security reasons, so you see cases going down artificially, and there's no way your model would have predicted that. It's just the way reality works; it's complicated. Another question, from an aspiring data scientist: what do you think the most important skills are from a data science perspective? What skills would you encourage them to focus on to contribute in the field? I'd say R programming, but I'm very biased.
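One simple way to score the "we expect 10 to 20 cases a day, did it happen?" style of validation mentioned here is interval coverage: the share of observations that land inside their forecast interval. A toy Python sketch with invented numbers:

```python
def interval_coverage(forecasts, observed):
    """Share of observations falling inside their forecast interval,
    one basic score for 'did what we predicted actually happen'."""
    hits = sum(lo <= obs <= hi for (lo, hi), obs in zip(forecasts, observed))
    return hits / len(observed)

# Hypothetical "10 to 20 cases a day" style intervals vs what happened.
forecasts = [(10, 20), (12, 22), (14, 25), (15, 28)]
observed = [15, 21, 26, 20]
print(interval_coverage(forecasts, observed))  # 0.75
```

For a well-calibrated 95% forecast interval, coverage measured this way should sit near 0.95 over many days.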
I mean, I created a consortium called the R Epidemics Consortium for a reason, so I'm definitely doing propaganda here. The reason we're using R is that, in terms of its collection of packages for data analysis, it's probably the most complete we have currently, especially for epidemics. I'd say that first, because coding decently and cleanly, in a way that's readable by other people, removes the main technical limitation, if you want. If you struggle with this, then making nice visualizations is going to be tricky, and creating new statistical tests or any new thing you want to do is going to be a struggle, because that technical layer is going to get in the way pretty much all the time. So I would say: get this out of the way. Be good at coding in a way that's reusable by other people, that's clear and simple to read. That's probably the most important thing, because it's the prerequisite for everything that follows. Great. So you mentioned the number of beds as a key parameter you were looking at. Could you talk about some other important attributes or parameters in your analyses? Sorry, I didn't get the beginning of the question. You mentioned the number of beds was part of your model, I think when you were talking about rebuilding one of the places. So the question focuses on what other key parameters you were trying to predict: number of beds, but are there other things, like medical supplies or whatever the case might be? Yes, absolutely. I mean, tons; I couldn't list everything.
And sometimes it's not just prediction, it's assessing what's happening now, like the example of alerts, like testing being insufficient, but also just checking, for instance, that among people who get tested there's no systematic bias based on gender, which could happen and which could reflect differential access to healthcare, which would be something we really want to avoid, or age patterns. So sometimes it's gonna be just about describing what's happening and sometimes it's about prediction. But like, yeah, roles in transmission: who is creating the most secondary infections? In the case of that Ebola outbreak it was a lot of community deaths and funeral exposure, that is, people washing the body of the dead and getting contaminated that way, and you could have just a burst of transmission, so that was one of the big things we were trying to assess. The growth rate: how fast is it growing? Mortality: is the mortality rate the same across all our treatment centers? Do we have treatment centers that are worse off than others? Trying to assess spatial spread: what fraction of the infections in a given place comes from somewhere else? So it's very context dependent. It's not gonna be the same for two different outbreaks. Okay, another question came in, I'm gonna focus a little bit on this. So, were there a lot of attributes that you had to encode or transform? So I think that question is really about feature engineering as you're doing machine learning. So I think they're curious a little bit about, did you have to do any feature engineering? Was that feature engineering consistent across the different models you were using, and things like that? Yeah, it's funny, because it's not the terminology I'm super familiar with. I'm more of an old-school statistician. So, feature engineering, yes and no. Very little in the way of ordination in reduced space, so principal component analysis and that kind of stuff we didn't really have to do.
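For the growth-rate part of that answer, a common back-of-the-envelope approach is a log-linear fit to the incidence curve. A hedged Python sketch with invented counts (the real analyses used dedicated R packages, and proper methods also handle reporting noise and delays):

```python
import math

def growth_rate(daily_cases):
    """Least-squares slope of log(cases) against time: the daily
    exponential growth rate r (doubling time is ln(2) / r)."""
    pts = [(t, math.log(c)) for t, c in enumerate(daily_cases) if c > 0]
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    num = sum((t - mean_t) * (y - mean_y) for t, y in pts)
    den = sum((t - mean_t) ** 2 for t, _ in pts)
    return num / den

# Invented daily case counts for illustration only
cases = [10, 12, 14, 16, 19, 22, 26, 30, 36, 42]
r = growth_rate(cases)
print(f"growth rate {r:.3f}/day, doubling every {math.log(2) / r:.1f} days")
```

The fitted slope answers "how fast is it growing?" in one number, which is why growth rates and doubling times are among the first things reported in a response.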
Most of what would fall under feature engineering was maybe a lot of data cleaning. Data cleaning can be immensely complex. In fact, I have some colleagues who actually have Bayesian methods for trying to fill the gaps in a data set when you see things that are just not realistic, like people dying first and then showing symptoms. And so a lot of the data pre-processing, if you want, before modeling came down to cleaning. A related question is about collecting and cleaning data that's being collected by others. So I think there's a question here, kind of implicitly, that you're modeling not just on data that you as a data science team have, but on data collected from the field or other areas. So can you talk a little bit about that collection process and the challenges of that collection process? Yeah, I won't talk much about the collection process itself, because, as you say, I don't collect data myself, and it's not just that somebody else is collecting the data, it's their work, and that's important to acknowledge. That means one of the things we did, and I think part of it, was having the people who were on the ground collecting the data on board from the get-go when we were analyzing it, because they've got the awareness of context. So one of the good things we had was being still close to the field: we were at the Emergency Operations Center, that is, the coordination in country, in the middle of North Kivu. But you've got many different field stations, and that's where the transmission is happening, and you definitely need to collaborate closely with these people. And I think that's probably true of most data science: you need this awareness of the context and the limitations of the data, and you need that close collaboration with the people gathering the data if you want to make sure you understand what's going on.
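The "dying first and then showing symptoms" check described above is easy to illustrate. A toy Python sketch with hypothetical records and field names (the real work used Bayesian imputation rather than simple flagging):

```python
from datetime import date

# Hypothetical line list; ids and field names are illustrative only
records = [
    {"id": "A1", "onset": date(2019, 3, 2), "death": date(2019, 3, 10)},
    {"id": "A2", "onset": date(2019, 3, 8), "death": date(2019, 3, 5)},   # inconsistent
    {"id": "A3", "onset": date(2019, 3, 4), "death": None},               # no death recorded
]

def flag_impossible(records):
    """Return ids of records where the recorded death precedes
    symptom onset, i.e. candidates for cleaning or imputation."""
    return [r["id"] for r in records
            if r["death"] is not None and r["death"] < r["onset"]]

print(flag_impossible(records))  # ['A2']
```

In practice such flags go back to the field teams, who know whether the record reflects a data-entry slip or something real about how the case was reported.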
Thanks. Somebody wants to go back to the discussion of programming in R, and basically I think the question is: is most of the analytics and modeling done in R, or is Python also an important language from a technical perspective as you're doing outbreak analytics? For outbreak analytics, Python for now is not really a thing. But if you do bioinformatics, Python is gonna be the thing, you know? If you process the sequence data that gets spit out and you want to assemble genomes, then Python is gonna be all the way. So I don't think the language features themselves justify it, it's just what tools are around and what the community is using as well. So it could be Python, Julia, R, whatever, but it so happens that most of the tools we need currently are in R. Yeah, I think that's good to hear. I kind of tell our students that they need to learn both, because depending on the context, you're gonna have to use one or the other, and you might not get to choose. I think we're kind of running out of time, we have one more. Let's see, oh. So I'm gonna read this one directly. So how does working on solving an epidemic impact timelines? I assume you need to validate models. So how does that work? And does that change, obviously, when lives are at stake? So a lot of data scientists are working on things where lives are not at stake, but how do you validate when you know that every day you spend validating is costing lives? Yeah, it's a very hard question, especially when we're pressed for time, but it's a great one. You kind of need everything for yesterday or the day before. So technically what happens is teams work long hours, often at night, to make sure that we finish things in time, because there's this drive that, for once, you believe you're doing something that's marginally useful. And validation is a real problem, especially when you do research, because my job mostly is to develop new statistical methods.
And if you develop a new statistical method, you don't really have time to test it really well. Certainly you don't have the time to publish it in a peer-reviewed journal. And so that's very complicated. We don't really have the answer for now; we basically try to emulate the process of peer review with colleagues we could pressure to look at things overnight and give us some feedback the next day. But it's a real problem. It's been the same for COVID, I must say: much larger teams of academic researchers, and still the problem of validation is a huge one. Yeah. Yeah, and maybe we'll end on one question that tries to summarize a bunch of what you said, which is: we've talked about lots of different challenges, lots of different things you're working on. Thinking about the next year, what's the top problem that you are hopeful the field will work on and try to solve? That's kind of the thing that I was pointing out in my last slide. I've been trying to develop R packages for outbreak analytics for nearly 10 years. What you got typically was a little pat on the back from the funders saying, oh, that's great, it's super important somebody does it, but you don't get funding for it. You don't get recognition for it. It's not an academically rewarded thing. You need papers and grants in academia, which is where I work. And I've seen, since the 2009 flu pandemic, that the tools we need for data analysis were missing; many of the tools that were missing then are still missing nowadays. And so we finally got funding for this. And so I think that's gonna be the top priority now: stop reinventing the wheel every time there's an outbreak, because there's always gonna be outbreaks, and we need to have a solid toolkit to do the analysis and not just recode everything from scratch every time there's a new outbreak somewhere.