Hello, everyone. I'm Joseph Rickert from RStudio and the R Consortium. Welcome to the COVID-19 Data Forum webinar. This is the first event in what will be an ongoing series of conversations to examine issues around the data required for medical professionals and public health officials to make informed decisions about the COVID-19 pandemic. In this discussion and future discussions, we hope to assess the data that's currently available, identify what's missing, and recommend best practices for quality, accessibility, sharing, and privacy. Today we have four speakers: Alison Hill, Ryan Hafen, Orhun Aydin, and Noam Ross. All of our speakers are senior researchers who individually have significant experience in data acquisition, data curation, and modeling, and whose collective experience covers a wide range of perspectives. After some opening remarks by Michael Kane of Yale University's Department of Biostatistics, each of our speakers will have 15 minutes to make their presentations. We'll have a Q&A session after the final speaker, so please write your questions using the Q&A box at the bottom of your screen. And now let's start. Please welcome Dr. Michael Kane.

Hi, everyone. I'm Michael Kane. Like Joe said, I'm an assistant professor of biostatistics at Yale University, and welcome to the COVID-19 Data Forum, the first in a series focusing on data-related aspects of the response to the COVID-19 pandemic. The pandemic has challenged virology and epidemiology, as well as our policies regarding outbreak response. The pandemic's severity rose abruptly, and especially in the beginning, we were contending with tremendous uncertainty with respect to the disease and its transmission. As a result, we did not have a lot of time to plan proper studies, put data management infrastructure in place, or coordinate unified data collection protocols. A lot of the data collection resources we created are ad hoc, and it's important to realize that many of these sources are collected by healthcare workers who are working very long hours and exposing themselves and the people around them to potential infection, so that we can do things like estimate how many beds or ventilators are going to be needed by a given hospital in the next week. These and other related data sources provide the basis on which we understand the disease and its spread. At the same time, we need to acknowledge the limitations of the data we have. For example, we know there are regional differences in how COVID-19 cases and deaths are counted. This can make fair comparisons across regions difficult, and identifying these differences early in the data collection and validation process can prevent inappropriate comparisons that could eventually result in bad policy decisions. We need to allow researchers to use these data and disseminate them in a way that is open while maintaining individual privacy. The model of collecting data and releasing it years later, only after it has been mined for all possible publications, only slows our collective response to the outbreak. Data collection and validation often require intellectual effort, and those efforts themselves should be properly incentivized. This forum was put together to embrace these challenges and focus on the acquisition, validation, standardization, normalization, harmonization, and dissemination of COVID-19 data. These steps are often overlooked and even marginalized in favor of the more easily publicized results of the models relying on them.
However, I think everyone here realizes that data preparation and exploration is where we spend a lot, if not the majority, of our effort. And without a properly cleaned and validated data set, there is no reason to believe models derived from these data, or to put any stock in the decisions those models inform. So the goal today is to shine a spotlight on the process of creating high-quality COVID-19 data sets so that they can be explored, visualized, and modeled to better understand the disease and its outbreak, and to inform better decisions for prevention and treatment. Our speakers are expert practitioners in these areas, and we're eager to hear about their experiences and insights, and to find out about the data resources that are available. And with that, I'll hand it back to Joe.

Thank you, Michael. Okay, our first speaker will be Dr. Alison Hill. Alison is a mathematical biologist who develops models to help predict, understand, and control the spread of infectious diseases. She is currently the John Harvard Distinguished Science Fellow at Harvard University, and until the coronavirus outbreak she was focused on HIV/AIDS and drug-resistant infections. So please give your best Zoom welcome to Alison Hill.

Thanks so much, everyone, for joining us, and thanks for the introduction. Today I'm going to be telling you a bit, from the point of view of someone working on modeling COVID-19 spread and control, about the data needs and challenges as I see them. I always like to start off my COVID talks with a big-picture summary of the epidemic, which I know you're all familiar with, but I think it's really important to remind ourselves of some very sobering facts about this outbreak. A virus we had never heard of five months ago has now infected at least four and a half million people around the world, in probably every single country, and based on my estimates is probably already going to be within the top five viral causes of death worldwide this year, even just counting the last few months. So really, this is the crisis of our time. As Joe mentioned, in my day job I work on infectious disease modeling, but before this outbreak I focused on HIV and drug-resistant infections. Many of the principles behind modeling infectious diseases, though, are things we learn and apply and can put to use in many different scenarios. So like many other people in my field, as soon as this outbreak began there was a really big demand for infectious disease modeling expertise, and I somehow ended up now almost exclusively focused on coronavirus modeling. One of the things we've worked on: in early March, we released an online modeling application that allows users to simulate COVID dynamics from their browser and compare the effects of different parameters, adding interventions and comparing these to healthcare capacity. It's open source, can be adapted for many purposes, and has been used as the foundation of models created by regional health departments, other government agencies, consultants, and educators, and for other web applications. Based on this, we've ended up working on a lot of different projects with many different agencies, all trying to answer the same questions about what is going to happen with COVID and how we can control this outbreak.
So I just wanted to highlight what I see as some of the main roles that modeling has played, or can play, in the COVID-19 epidemic. Maybe the simplest is just making really short-term predictions. Many people just intuitively don't have the best grasp of the concept of exponential growth, and even really simple models that highlight that fact, and how quickly infections can spread, were very helpful early in the outbreak in making people realize how quickly things could change. I think models played a big role in highlighting the risk of healthcare capacity overflow if the epidemic were allowed to spread unchecked, without severe interventions. Models obviously were very instrumental in getting across this idea of flattening the curve, which has now become a phrase everyone is familiar with. As I said, they motivated the implementation of strong interventions both qualitatively and quantitatively, by quantifying how much we need to reduce spread to keep our healthcare system under capacity. Models, when designed right and when properly accounting for all the uncertainty, can be used to project the course of the epidemic beyond this initial phase into the summer, fall, and even subsequent seasons. They've been used to estimate the impact of seasonality and whether that's going to play a big role in this outbreak, and of course to try to estimate the total burden of infection and how that relates to what we're seeing in reported cases versus what might really be there. So I wanted to go through some of the ingredients of models for COVID-19, and then for each of these ingredients, I'm going to talk a little bit about what type of information we need to make informative models. One important part of any disease model, and particularly infectious disease models, is having a good idea of the clinical course of the infection. By that I mean: what happens when someone goes from being susceptible to getting infected? How long does it take until they actually start showing symptoms and being infectious to other people? And what are the different stages of infection they pass through? For example, do people have asymptomatic infection? Do they develop certain types of symptoms? How long do those last? Do people recover, or do some people eventually develop more severe infection that might require hospitalization or ICU-level care? What percent of people end up dying? All of these components are the first ingredient that goes into an infectious disease model, so we can think about how individuals in the population might transition between these different states and the time scale on which that happens. Another really important ingredient is transmission networks. What this means is trying to understand who transmits to whom: who can transmit to whom, and who did transmit to whom. Obviously there's a lot of unknown information in these transmission networks, and this is really one of the most difficult parts of infectious disease modeling, so I'll talk more about what we know and don't know there. Looking at the healthcare resources that are available has also been a big part of modeling COVID-19: what do we have available in our healthcare system, how does that depend on where you are, and how does that relate to what resources we might need, depending on who progresses to which stage of infection?
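To make the first ingredient concrete, here is a minimal sketch of the kind of compartmental (SEIR-type) model being described, written in R with the deSolve package; the parameter values are illustrative assumptions only, not estimates from this talk.

```r
library(deSolve)

# Minimal SEIR model: the clinical course enters through the rates
# sigma (incubation, E -> I) and gamma (recovery, I -> R), and
# transmission enters through beta; all values below are made up.
seir <- function(t, state, parms) {
  with(as.list(c(state, parms)), {
    dS <- -beta * S * I / N
    dE <-  beta * S * I / N - sigma * E
    dI <-  sigma * E - gamma * I
    dR <-  gamma * I
    list(c(dS, dE, dI, dR))
  })
}

parms <- c(beta = 0.5, sigma = 1 / 5, gamma = 1 / 10, N = 1e6)  # assumed values
init  <- c(S = 1e6 - 1, E = 0, I = 1, R = 0)
out   <- ode(y = init, times = 0:180, func = seir, parms = parms)
head(out)
```

Extending this toward the stages discussed here (asymptomatic infection, hospitalization, ICU care, death) amounts to adding compartments and transition rates, each of which has to be estimated from exactly the kinds of data described next.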
Then of course, another part of models is looking at interventions. There's a huge diversity of different types of interventions, but we try to use models to say which interventions might work better than others. How strong do these need to be? How long do they need to be implemented for? Can we estimate how efficacious they are retrospectively? That's another big ingredient of models for infectious disease outbreaks. So what are the data needs for each of these categories of things that go into COVID-19 models? Focusing first on the clinical course of infection: how do we really parametrize a model to take these things into account? The things needed here are the duration of each stage of infection and its variance, the probability of progressing to different stages, how many infections might be asymptomatic versus symptomatic, and the infectiousness at each stage of infection, including how it might relate to the viral load we can detect in someone, to their age, and so on. I think the gold standard for getting this type of information is really detailed, large-scale cohort studies, meaning a group of individuals who are followed throughout their infection, through the whole course. Also contact tracing studies, meaning everyone who has been exposed to someone infectious is surveyed and tracked to see whether they develop infection or not, so that people can be followed from the very beginning and you might catch asymptomatic people as well, who might not naturally present on their own to hospitals. And of course universal and centralized reporting. All of that makes it much easier to back out these types of information. But we don't have this type of information in many settings, including in this country. So we're missing a lot of information on even this basic first ingredient of models, and that type of uncertainty is problematic for everything else that follows. I think we have good estimates of these quantities from some detailed studies that came from earlier in the outbreak, but more systematic measurement of these kinds of quantities is really needed, and it's very difficult to estimate all of them just from population-level data; individual-level data is a much more reliable way of getting these things. The second component I mentioned is transmission networks. This means two different things. One is the potential transmission network, as in: who is it theoretically possible that I would be able to spread to if I were infected? The other is the realized network: who did spread to whom over the course of the outbreak? This is often the hardest part to get for infectious disease models, and there have been lots of advancements here, but there's still a lot unknown, particularly for this infection. We want to understand who contacts whom, where they contact each other, for how long, and how often. We want to understand which types of contacts are most risky: is it related to physical proximity, whether it's indoors or outdoors, the duration of that contact, or indirect contact, meaning touching the same surfaces at different or similar times? We want to know what setting is most important for transmission.
For example: how important is transmission within households versus at work versus visiting common retail spaces? And of course the answer to that question might differ before and after an intervention, and might depend on the location, age, and so on. How important is transmission in hospitals; how much is that contributing to spread? These are all really important questions for making good models. It's not clear what the gold standard is in this case, but some very good ways of measuring such things are what we call contact surveys, where we ask people lots of questions about who they interacted with in a given period. Proximity tracking, if it can be done in a way that deals with all the privacy concerns, is a great way to try to understand contact between individuals. There's contact tracing in the epidemiological sense, meaning retrospectively, if someone is diagnosed, finding out who they came in contact with. And then there's genetic epidemiology, where by comparing the genetic sequences of viruses between individuals, you can try to trace back whom the virus jumped between. There are many challenges for these types of estimations, including privacy, the tremendous resources and reporting infrastructure required, and much more. One of the unique features of COVID, of course, is that a high fraction of people require hospitalization and ICU care, so particularly early on in the epidemic, but it should still be now, a big part of modeling is understanding these resources: what we have and what we might need. Among the things we need to make good projections here: understanding, as I said when I talked about the clinical model, what percent of cases require different levels of care, and how that depends on age and comorbidities. We want to understand what capacity the healthcare system has, at baseline and with surge capacity, for particular resources, but also for things outside the hospital, like personal protective wear for the everyday public. We want to understand this at a detailed geographic scale, since it can depend on whether you're in a city or a rural area, and this is particularly important in low-income countries. We also want to understand how people access care, and the impact of COVID on non-COVID-related health concerns. I think a gold standard here is probably national databases that track all of these medical resources in detail, where they are and how they can be mobilized, along with real-time reporting of resource utilization. But in general we don't really have these things, so the challenge comes down to finding, compiling, and standardizing alternative data sources to estimate these types of numbers.
Then of course, looking at interventions. Here, when I say interventions, I mean what we tend to call non-pharmaceutical interventions. I'm not talking about giving people drug treatments or vaccines for COVID; we don't really have anything effective there yet. I'm talking about anything from mask wearing, to isolating cases, to quarantine, school closures, closing retail establishments, work-from-home policies, stay-at-home policies, complete lockdown: all the measures that have been implemented to some extent in many places around the world. The questions that come up here are: what is the evidence base for these interventions? Even tracking which interventions were implemented when and where, in some standardized way, would be very helpful. The big unknowns are how much interventions reduce contacts that are relevant to transmission, what level of adherence the general population has to these interventions, how that varies spatially and temporally, and then: are these interventions working, and which ones? In this case, of course, a gold standard would be a randomized controlled trial of particular interventions, and of course we can never do that, so automatically there are many difficulties in answering these questions. Surveys can be very useful for understanding how people's behavior is actually changing, and everything I talked about related to transmission networks is also useful here, because if we can track how the virus is spreading and how that changes before and after interventions, that helps us understand what's working. The challenge is relating the other data sources we have to use to estimate what an intervention is doing, for example looking at how much mobility is changing: how do we actually relate those alternative data sources to something we put into a model, in terms of what percent of transmission is reduced? That, I think, is the big challenge with trying to understand interventions. I just wanted to quickly say that there is of course some data we have a lot of, namely the reporting of cases and deaths in many different databases. The pros are that it's easily accessible and there are centralized sources, but there are many cons to this type of underreported, population-level data, which does not have individual information and is not really catching everyone. So that's all I have for now; I'm excited to hear from the other speakers. I just wanted to say thanks to many others in my group, and to collaborators at other universities who I've thought a lot about COVID with over the past few months, and of course to our funding sources, who have been very flexible in allowing us the time to work on these problems. I'll be excited to take your questions after all the speakers talk.

Thank you so much, Dr. Hill. Next up is Ryan Hafen. Ryan is a data scientist working as an independent consultant, a consultant with the Preview Group, and an adjunct assistant professor at Purdue University. In his work, Ryan focuses on tools, methodology, and applications in exploratory analysis, visualization, computational statistics, statistical modeling, and machine learning. So please welcome Dr. Hafen.

All right, thank you. Just making sure I'm unmuted. Great. So thanks for the introduction, Joe. I'm going to be talking today about COVID-19 case count data from multiple open sources, and my slides are available online.
There's some detail in the slides I probably won't get into in the talk, so feel free to look at these at your own leisure. For a little bit of context, this work has been joint with the Epidemic Intelligence from Open Sources (EIOS) project, which is managed by the World Health Organization, as well as in conjunction with a large global health nonprofit based here in the state of Washington, where I live. This group does a lot of things, and I've been working on a number of things with them, but in the interest of this being the first of a series about COVID-19 data, it makes most sense to focus on case counts. We just heard a little bit about case counts, and if you've been following COVID-19 at all you're probably very familiar with them: most of the thousands of dashboards and visualizations you'll find out there rely on case count data. EIOS in particular is interested in case count data, providing it to their public health intelligence analysts to help them quickly understand trajectories as well as discrepancies between different data sources. It's important, as we've heard, to understand the limitations of different data sources and to have a better feel for what's going on in the data. So that's what I'm going to be talking about, and I want to point out some of the sources we have been using. At the global level, I think the most well-known source is the data published by Johns Hopkins; there are also the European CDC, the WHO, and even news sites like Worldometer, which is kind of an obscure news site but surprisingly up to date, actually updating data by the minute. This and a few other news sites have been identified as places where we can get quick information to compare with other sources. Within the United States, some sources include, again, Johns Hopkins, the New York Times, and USAFacts, and of course there are more than what I'm showing here, but there are links to those for people who are interested. With this being a forum mostly around data, I think it's worth spending a few minutes (I could spend this entire talk) on data standards, especially as they relate to open-data communities. I'm just going to briefly touch on a few things, and a lot of this has come up through the experience of pulling data from multiple sources and getting a feel for the breadth of different ways people decide to publish data. When you're talking about open-data communities and people choosing on their own to publish data, a lot of the time not much thought is given to standards, and even when it is, everyone has a different idea of what the standard should be. I'll give you some ideas in this talk of what I think, but of course everyone has different opinions. Also, there's often little incentive to adhere to someone else's standard: if you have data in a format you're already analyzing, and you have code or an app that uses that format, why would you want to take the time to conform to someone else's standard, right? But regardless of all this, we should at least think about what the best practices for sharing data are. I'm going to start with what I'll call an example of not best practices, but bearable practices, focusing on the data published by Johns Hopkins. This is, as I mentioned, the most widely used and cited source of data, so I hope it's fair to just pick on them a little bit.
But first, I do want to say they have done a really great job of getting data out there early and often, and they've succeeded in making it very accessible: CSVs you can go get on GitHub. But there are a few issues I have with the way this data is published, and I want to, hopefully tactfully, go over a few of them, because I think they bring up some good points of discussion. Like I said, I could talk a lot more about this, but I'm just going to highlight a few things. First, the data is published in a wide format, so every date gets a new column, and that's really not amenable to analysis; I don't know anyone who analyzes data in wide format, so typically you need to pivot it into a long format. It's also not very amenable to version control: every time the data is updated, every row changes, so it's hard to see what's new. The date format: everybody should really use the ISO 8601 date format. The format used here is ambiguous and difficult to parse, and someone could make a mistake parsing it. Also, country names are used as the geographic identifier. They do have a lookup table you can use to merge these to geographic codes, but names change, so you're just introducing more chances for errors. You'll also notice that for Australia, for example, they report data at the territory or state level, whereas for other countries it's not broken down that way; ideally, I think you should have a separate file at the country level versus the state level. And they have a different file for each variable: this one is for cases, another file for deaths, another for recoveries. Ideally, again, this should be a tidy format, one column per variable, all in the same file, with rows for each date and country. One other thing I want to quickly call out is the terms of use; licensing is an important thing to think about when publishing open data. I'm not a lawyer or an expert in data licenses, so I won't try to make too strong a statement here, but when you have non-standard terms that are too restrictive, that really can impede the progress of science. Someone could read these terms in a way such that anyone using this data is violating the terms of use, because they're redistributing it in some form or another. So licensing is an important consideration. Now, to pivot to an example of best practices, consider the data published by the New York Times. They hit on all the things I just mentioned: they publish the data in a tidy format, a column for cases, a column for deaths, rows for each date (here we're looking at states); they use a standard geocode mechanism to identify the states; they put state- and county-level data in separate files; and they use a license that is coextensive with Creative Commons 4.0 International. Now, like I said, I don't want to belabor this too much, and you might think some of these issues aren't that big of a deal: I can just transform it, right? But reading in this New York Times data and getting it into a form I can work with is a matter of a few lines of code, where I'm just renaming variables, whereas with the Johns Hopkins data we're talking over a hundred lines of code to read multiple files, pivot them, join them, resolve geographic entities, and aggregate to the different levels of geographic resolution.
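As a rough sketch of that difference, assuming the raw-CSV URLs both projects were publishing on GitHub around the time of this talk (these paths may change), the New York Times file is usable almost as-is, while the Johns Hopkins wide file has to be reshaped and aggregated first:

```r
library(readr)
library(dplyr)
library(tidyr)

# NYT: already tidy (one row per state and date, ISO dates, FIPS codes)
nyt <- read_csv("https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-states.csv")

# JHU: wide (one column per date), so reshape before any analysis
jhu_url <- paste0("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/",
                  "master/csse_covid_19_data/csse_covid_19_time_series/",
                  "time_series_covid19_confirmed_global.csv")
jhu <- read_csv(jhu_url) %>%
  pivot_longer(matches("^\\d"), names_to = "date", values_to = "cases") %>%
  mutate(date = as.Date(date, format = "%m/%d/%y")) %>%  # non-ISO dates
  group_by(`Country/Region`, date) %>%                   # roll up provinces
  summarise(cases = sum(cases), .groups = "drop")
```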
And it's not just about my efficiency; it's about the chance of introducing errors, and there are hundreds if not thousands of other people pulling this data and having to do the same transformations, so there's a lot of collective efficiency, accuracy, and potential for issues there. I think the more fluid, open, and accurate we can be with data, especially in an event like this, the better. With that, I do want to talk for a minute about what I've been doing with this EIOS group. Really, it's been pulling data from sources and providing their analysts with tools such that they can get a feel for what's going on in the data. So we've been pulling from the sources I mentioned, rolling up counts at different levels of geographic aggregation, and computing statistics of interest. I do want to mention that what I've been working on with this group has been used privately by their analysts since February, but any day now it is going to be public, and I can share that link when it is. Out of respect for it not being public yet, I'm going to show an example using US data that takes a similar approach to what I've done with this group, just not at the global level. In addition to pulling data, now at the US level, from Johns Hopkins, the New York Times, and USAFacts, we provide a set of visualizations for each geographic entity that the user can interact with, such as cumulative cases over time, new cases aggregated either daily or weekly, new deaths, and case fatality rate. After producing these visualizations, we put them into a tool called Trelliscope. If you're familiar with Trelliscope, it's an R package that I've developed, and it actually turned out to be exactly what they had in mind for interacting with case counts, so it ended up being a nice fit that I was working with them. We can throw these case counts into Trelliscope, and there's a link here; I'll give you a feel for what this looks like real quickly. You can go visit this URL and explore it at your own leisure as well; this is just another of many COVID-19 visualizations, but we have things at the state and county level. So I can click on states here, and it pulls up these visualizations, by default ordered by current number of cases from high to low, so New York shows up first, followed by New Jersey. I can page through these visualizations and get a feel for things. Most of the sources agree in the US (there's a much bigger difference as you get into things globally), but you'll see, for example, that Johns Hopkins is quite different in Rhode Island. I can also toggle these visualizations to say I want to look at new cases, and maybe I want to switch to weekly, and I can see how, in New York, cases are going down; again, of course, there's the caveat that testing rates are changing over time, and all of those kinds of things. There are different views here where I can order by weekly percent change in deaths, or weekly percent change in cases, where I can see South Dakota has had a big uptick. There are all kinds of default ways you can view things, but there are also controls that allow you to do further filtering, to say I only want to look at states where the current number of deaths is at least 50, or something like that. So there are all kinds of ways you can interact here.
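For a feel of how a display like this is assembled, here is a minimal sketch using the trelliscopejs package; the tidy data frame us_cases, with columns state, date, and cases, is an assumed input rather than the app's actual data:

```r
library(ggplot2)
library(trelliscopejs)

# us_cases: hypothetical tidy data frame with columns state, date, cases
ggplot(us_cases, aes(x = date, y = cases)) +
  geom_line() +
  # one browsable panel per state; the viewer supplies the sorting and
  # filtering controls described above
  facet_trelliscope(~ state, nrow = 2, ncol = 4)
```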
Another thing you can do: if you're interested in, say, Alabama, you can actually click a link and go look at counties within Alabama, or I can look at all counties. So here I'm looking at the 67 counties in Alabama, and if I'm interested in looking at all counties ordered by the change in cases, I can do that. Anyway, there are all kinds of ways to interact here. In the interest of time, though, I'm going to pivot back quickly to an idea that we've been working on: taking what we've developed and thinking about how we can go a step further, take aspects of this, including data standards, and build something that can benefit the general data community, as well as make apps like this more robust to future events. So we've been working on something we're calling the COVID-19 data registry, and actually what I just showed you in this application is based on this framework that we developed. The idea is: people are going to publish their public data however they want to publish it, but that doesn't stop us from defining standards and schemas, and providing transformers that pull from those sources and transform the data to the way our analysis or our app expects things to be. And we're doing this in an interesting way, where all of this is self-contained in GitHub: we're using GitHub Actions and GitHub repositories, which is where the code and the data go and where all of the compute happens, so that it is hopefully easy for a community to contribute to, and you're not reliant on outside compute resources, things like that. I think that's all the time I have, so I'm happy to discuss this more with people who are interested, but I think there is some interesting potential future work around this framework of building schemas for different data types, pulling those in, and augmenting our analyses and interfaces to deal with them. With that, thanks for listening, and if you're interested in following this effort or collaborating, feel free to get in touch.

Thank you, Ryan. Okay, now our third speaker. Our third speaker is Dr. Orhun Aydin. Orhun is a senior researcher at the Environmental Systems Research Institute (Esri) and a faculty member at the Spatial Sciences Institute at the University of Southern California. He's also the product engineer for the R-ArcGIS Bridge, a geospatial R library that integrates R analysis with geographic information systems. He holds a master's and a PhD from the Stanford School of Earth Sciences. So please welcome Dr. Aydin.
Thank you, Joe. Today I would like to talk about the spatial and space-time aspects of COVID-19 and the available data sources. I would like to start my talk by reflecting on the evolution of the media used to study and communicate pandemics. On the left is an infographic from the 1918 Spanish flu; there, the main media were charts depicting the curve we've been hearing so much about. Even though there were epidemics and pandemics between 1918 and 2009, I will quickly jump to the H1N1 swine flu pandemic, where we start seeing more common use of maps to communicate information to the public. Now, I'm not making the statement that in the past people never used cartography to map diseases; that has been done for a very long time. What I'm pointing out is that, more and more, we are using maps to communicate pandemic-related information to everybody, so that they can be knowledgeable about what is going on. And lastly, for COVID-19 we have the JHU dashboard, which I'm confident most of you have seen and previous speakers have talked about, on the right-hand side: an interactive map that contains rich information both in space and in time. This is a spatiotemporal data source, and unlike a static map, this is a dashboard you can interact with, to look at the parameters you're interested in, the temporal evolution, and the time series of a given location. This evolution points to two facts: one, the growing importance of spatial and spatiotemporal data and analysis on a planet that is more connected than ever, and two, the immense value in sharing data in a format that is actionable. When we talk about creating maps like these, we can talk about five main steps (of course we can break this down further). The first is mapping the cases. Here, data is collected by local and international authorities; it first needs to be curated, cleaned, and put into a format that allows mapping it. This process involves projections, cartography, and finally mapping the data in a digital form that can be shared with everybody. Of course, the raw data is rarely actionable, even though it can be very useful for understanding what is going on. In the case of COVID-19, we need to represent the temporal information in a way that lets us map the spread of the disease; here the time component in the data has utmost importance, and we are mapping the spread. In addition to raw data, we can also think about model outputs, such as the epi models that Alison talked about initially; being able to map these in space and time gives us an idea of how the pandemic might progress as time goes along. Certain demographics are more vulnerable than others, and we are very aware of that at this stage, so in addition to mapping the cases and the spread, mapping vulnerable populations becomes very important, and creating risk maps matters a great deal for decision-makers at both the international and local levels. We know that a pandemic creates a heavy load on hospital resources: medical professionals, hospital beds, ICUs, ventilators, just to name a few. In addition, it has a very deep impact on our supply pipelines, the way we get our goods, and the global economy. So being able to map available resources, to map what is at stake, and to map shortages becomes very important, and this fourth step, mapping available resources and combining that with the mapped spread and future projections, becomes very important.
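As a minimal sketch of the mapping step itself, assuming a hypothetical polygon file counties.geojson and a tidy case table cases_by_county.csv keyed by FIPS code (both file names are invented for illustration):

```r
library(sf)
library(ggplot2)

counties <- st_read("counties.geojson")      # hypothetical polygons with a fips column
cases    <- read.csv("cases_by_county.csv")  # hypothetical table: fips, date, cases

# join the most recent day's counts onto the geometry, then map them
latest <- merge(counties, subset(cases, date == max(date)), by = "fips")

ggplot(latest) +
  geom_sf(aes(fill = cases), color = NA) +
  scale_fill_viridis_c(trans = "log10") +    # case counts are heavy-tailed
  labs(fill = "Reported cases")
```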
And last but definitely not least is the communication portion, where we communicate the current status of the pandemic to everybody. Of course, all of these steps rely on data, and there are some challenges related to data, the first and foremost being uncertainty. Case data, for example, has uncertainty associated with it, and as the spatial extent of the data increases, from the hospital level to the county level to the country level, we see that the uncertainty in the data also increases. Sometimes this is due to measurement errors, mistakes during reporting, or systematic data suppression. Secondly, COVID-19 impacts many aspects of our lives, and data pertaining to these, such as hospital resources, trade flows, and employment, comes at different scales, both spatially and temporally. Being able to resolve these different scales in the data is also a very important challenge. Going forward, representing data spatially and temporally is again a crucial step; as I said, the impact of the pandemic on our society, and its progression, is a spatiotemporal phenomenon, and being able to represent the data in a spatiotemporal format that we all can use, through data aggregation and spatiotemporal representation, is key. In addition to challenges with scale, some of the data on the impacts of COVID-19 comes in different formats, such as tweets or proximity recorded by cell phones, and resolving these different types of data is quite important, so that they can be used within a multidisciplinary model. Lastly, the pandemic and its impact on our lives is evolving, and this necessitates the ability to serve and consume data in real time, because the data here is dynamic. Nowcasting, or short-term forecasting, is at least as important as long-term forecasting, so we need to be able to make short-term forecasts as well. Curating and serving this live data becomes quite challenging, because we know how challenging it is to curate data on its own; now that we're dealing with live, dynamic data sources, how can we curate them so that we can serve them in real time, which is what is needed at this point? I would like to delve into some data requirements. I think we had a great overall view of epidemiological models here; these models are frequently used to understand the progression of the pandemic and what the future numbers are going to be. They are particularly useful for forecasting the progression and also for modeling the impact of interventions. Data on population demographics is particularly important to model the susceptible population, and the spread of the infection is modeled with different metrics, such as attack rates and infection rates. Also important is dynamic information on the progression of the infection, in particular the timelines of incubation, infection, and convalescence. And of course, if you want to create a regional model, you need to have this data at the regional scale, or at least at the spatial units into which the region is discretized. Some models require data on hospitalizations: the number of COVID-19 patients coming in, the death rates among hospitalized patients, their length of stay at the hospital, and the possible length of use of ventilators and stay in the ICU. This data also has some uncertainty associated with it; it is incomplete.
But this is an important aspect of the data we need for these models. On the intervention side, we need data on the intervention type and its effectiveness in reducing the rate of growth of the infection. Of course, this is very hard to gauge, and we use secondary sources to do that at a large geographic scale, which has some uncertainty associated with it, but this is again a required input for the epi models. So these models predict the demand on hospital resources, and we need data on available resources so that we can plan for the allocation of these limited resources. One thing we do not want to have is shortages: if a COVID-19 patient cannot get the care they need due to a lack of staff, beds, ICUs, or ventilators, the death rate can go up quite sharply, because part of the infected population now cannot be treated. In addition to the hospital resources needed by patients, there are resources needed by medical professionals, such as PPE, personal protective equipment, and having data on these resources is extremely important as well; this is data we need in order to plan as the pandemic progresses. Protective equipment includes things such as gloves, masks, and gowns that can be in short supply, and shortages can limit the number of active professionals treating patients or cause more health professionals to get infected while treating this virus. Bed, ICU, and ventilator data is reported by hospitals, and authoritative spatial data on these exists, but the temporal aspect of this data is still lagging: we do not know how these numbers are changing day by day, especially not at the county level; we do not have that type of resolution yet. Also, PPE presents a very unique supply chain challenge, where we need data about the source, the transport, and the destination of these materials, so that we can plan accordingly to make sure that healthcare professionals have the equipment they need.
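As a toy illustration of that planning step, with entirely made-up numbers, a model's projection of bed demand can be joined against reported capacity to flag where overages would occur:

```r
library(dplyr)

# hypothetical model output: projected peak daily census of COVID patients
demand <- tibble(county = c("A", "B", "C"),
                 projected_census = c(480, 120, 900))

# hypothetical authoritative layer: staffed beds available for surge
capacity <- tibble(county = c("A", "B", "C"),
                   surge_beds = c(350, 300, 1000))

demand %>%
  left_join(capacity, by = "county") %>%
  mutate(shortfall = pmax(projected_census - surge_beds, 0))
# counties with shortfall > 0 are where shortages would occur
```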
So, we have been compiling geospatial data resources, and there's a link for our disaster response hub at Esri. You can access data that is live, curated, and served through RESTful APIs, so if you are using R or Python you can directly bring this data in, and it is freely available; you can actually start consuming this data in your analysis if you're interested. There's a wide breadth of data here: we compile data from sources such as Facebook Data for Good mobility data and JHU data, and the dashboard you see is an Esri dashboard, so it is served on our technology, and the layers for those data sources are available for you to use. Moving away from the raw data side and the data required, there is also the important aspect of communicating this data. The data doesn't need to be raw data, it can be model output, and it is very important to be able to communicate it, because we know that communication is a huge aspect of this. One thing we have done within the spatial statistics team at Esri is integrate some of the models that are out there, an example of which you have seen. This is definitely not an exhaustive list of the epi models out there, but one from the University of Pennsylvania is called CHIME, another is the IHME model, and there's also the COVID-19 surge model from the CDC. I do not want to go too much into the modeling details, because some of these required re-implementation so that we could integrate them into the platform, but the idea was to take these models, which are not spatial but can make projections over time, and integrate them inside a geographic framework, so that we can bring the demand and the resources together to understand whether shortages are going to occur. Here is an example of a web widget we created using the CHIME model: we are modeling, in Florida at the county level, what the demand for hospital resources is going to be, and when we overlay this with hospital resources, we have an idea of the overages. Here we are comparing two scenarios: in orange, social distancing with 29 percent effectiveness at reducing the infection rates, versus social distancing with 50 percent effectiveness. At some points you will see that, in the case with less social distancing, the hospital resources in this model start running over, and you can see on the left-hand side, in real time, which locations are going to be impacted and where the overages are going to be. Since this is integrated within a GIS, you have this broader view of how things change over space and time, which makes it very useful for planning for these extreme events. In addition to communicating modeled and raw data, as I mentioned, you can consume the live data we serve through APIs, and with this event being jointly organized by the R Consortium, I would be remiss if I did not mention the excellent geospatial libraries in R that you can leverage for reading, processing, and working with live geospatial and spatiotemporal data. Here I would like to highlight the open source geojson R package, which allows you to bring data in GeoJSON format into R as a data frame, and I would also like to highlight the arcgisbinding package, what we call the R-ArcGIS Bridge, an R package we develop at Esri that allows you to seamlessly read in all the data sources I mentioned so far as an R data frame, so that you can integrate them into your analysis.
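As a minimal sketch of what a bridge call looks like (arc.open, arc.select, and arc.data2sf are the package's entry points, but the feature service URL and field names below are hypothetical placeholders):

```r
library(arcgisbinding)
arc.check_product()  # binds the package to a local ArcGIS installation

# hypothetical feature-service layer of county-level case counts
url   <- "https://services.arcgis.com/example/FeatureServer/0"
layer <- arc.open(url)

# pull only the fields and rows needed rather than the whole dataset
ca <- arc.select(layer,
                 fields = c("Province_State", "Admin2", "Confirmed"),
                 where_clause = "Province_State = 'California'")
ca_sf <- arc.data2sf(ca)  # convert to an sf data frame for analysis
```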
I would like to show you very quickly what this looks like, and I would like to do so using an R notebook. One thing I wanted to show you: since this data is live, meaning more data is coming in over time at each spatial unit, whenever you run this analysis you are going to get a different answer, and you may not need data from certain counties or certain time ranges. So this has the advantage of not downloading files over and over, even though they might be updated regularly; you can get just the portion of the data that you need. I have this live data source, which is actually the county-level data from JHU of cases over time for US counties. I can point to this remote data source and bring in only the portion of the data over California that I need, so that I don't have to bring in the whole data set; I can also subset it temporally, and I can bring this in as a geospatial data frame and start analyzing it. I can even start serving this as an interactive plot or map. So this allows me to create widgets, and it allows me to bring data in for my own analysis, so that I can truly stand on the shoulders of giants here and leverage R libraries to do my analysis. I would like to conclude with the challenges here. Resolving different scales is a big challenge, at both the spatial and the temporal level; this is something we definitely need to address for different data sources. Representing uncertainty in data and models is crucial: we know these data sources have uncertainty associated with them, and that should definitely be part of the data itself, and also of the metadata, because a practitioner needs to know that the data they're using has uncertainty associated with it. And lastly, community-driven curation: we want to make sure we enable high-fidelity data, and doing this for live data is challenging, but that is definitely a challenge we need to face now, during this crisis where every day counts. Thank you.

Well, thank you very much. We're doing pretty well with time, but let's get to our last speaker, Dr. Noam Ross. Noam is a principal scientist for computational research at EcoHealth Alliance. He studies how the combined processes of changing wildlife populations, evolution of pathogens, and human activity lead to the emergence and spread of disease, focusing on understanding disease in complex, structured populations and on developing epidemiological forecasts. Noam is also part of the leadership of rOpenSci, an organization that develops tools and communities to foster reproducibility and open science among researchers. Please welcome Dr. Ross.

Thank you very much, Joe. Thanks to the organizers for putting together this forum, to all of the other speakers who have said great things, and to everyone attending. If this is the beginning of a series of forums, I think we're off to a great start in highlighting the issues and helping us all come to a better understanding of what we need to do to get a better picture of COVID-19 through the data, and to use that data to intervene and support policy, action, and health care. Again, my name's Noam Ross. I work for an organization called EcoHealth Alliance, a nonprofit based in New York; we work at the intersection of wildlife conservation and human health.
A very large part of that is studying the emergence of new diseases from wildlife populations, how they spill over into humans, and how they spread in the human population, and it's a group that ranges from veterinary scientists to medical anthropologists to data scientists and computational researchers like myself, as well as virologists and public policy specialists. So we work very much at this intersection and have been involved in coronavirus work for many years. And as Joe said, I also work for an organization called rOpenSci. rOpenSci builds both the technical and the community infrastructure, especially within the R language, that supports open, reproducible science. These are the two directions from which I come at COVID-19 data problems: on one hand, studying the emergence of the virus; on the other hand, the practice and understanding of data sharing, curation, and communities. My original training is in the type of mathematical epidemiology that Alison Hill described at the beginning of these talks, and everyone at EcoHealth, as well as myself, is involved in some form of that response work, consulting with and advising policy makers, including building models with the type of data being collected by people from all around the world. So what I thought I would talk about today, because of the nature of the forum and the discussions we're hoping to have, is this: I'm going to go back to the beginning and talk a little about the nature of the data that helps us understand the emergence of an epidemic like this, which is the data world I know best, along with the challenges and successes within it, and then apply that to the data we're seeing in the epidemiological context, as more and more sources come online and we face all these challenges in building data and tools for epidemiological response. One of the interesting things about this epidemic is actually how much we've learned about its viral origins very quickly. SARS-CoV-2, the virus that causes COVID-19, was only discovered when it emerged in January, and we know a whole lot about it, much of it learned within the first few weeks, because of the research that occurred very rapidly. We don't know completely everything about its origins, but we know a lot about where it lives within the evolutionary tree of viruses and what its closest relatives are. Within the first few weeks of its discovery, a preprint, then a paper, then follow-up papers had analyzed the viral sequence, published it, compared it to other viral sequences, and helped us understand that it most likely originated, or is derived, from viruses that circulate within bat populations in southern China. A couple of the papers that do that analysis are on the slide, and on the left is an evolutionary tree that shows where samples of the SARS-CoV-2 virus, back before it was named, when it was just "the novel coronavirus," nest within this group of SARS-like coronaviruses that had been sampled from bats; they are very closely related to a particular sample that came from a bat from Yunnan Province. So this occurred very rapidly, and that's actually quite a contrast with how we learned about viruses in the past. If you go back to the Ebola epidemic, just getting the first sequence out of the first Ebola epidemic in West Africa took a number of months, and when you go back to the original SARS outbreak in the early 2000s,
while a bunch of work occurred quickly, really understanding and pinpointing the origin has taken more than a decade. So there's a rapidity with which understanding of viral relationships and disease origins is beginning to happen, which we've seen in this outbreak. And one of the drivers of that is not just the investment in wildlife surveillance, the increasing focus in the field following the first SARS outbreak; it's been the development and widespread use of common genetic data repositories. There are a number of these repositories, two of which are most important in my work: one run by the NIH called GenBank, and an international one called GISAID, which stands for the Global Initiative on Sharing All Influenza Data; it originated as an influenza-focused repository but then expanded to cover many other viruses and sequences. Because genetic data follows something of a common format (there are obviously complexities to it, but genetic sequences are similar data), and because of the very widespread practice within genetics of publicly depositing data from scientific research, there are very large banks of genetic data that can be commonly compared and drawn upon when new viruses are discovered: to see what they relate to, how they relate to each other, what the likely origins are, and what similar viruses one might use to test potential treatments and understand cellular mechanisms better. Having all of that data available in those banks is really one of the reasons you're able to make these comparisons quickly and learn a lot about viruses from sequences: it's not just the sequence you have, it's all of the sequences out there you can compare it to, and it's really a testament to the value of sharing that common, open data. So this is useful for understanding origins, and it's useful for researching and understanding the molecular mechanisms of how viruses work, but because of the extent and widespread use of especially GISAID during this epidemic, it's also useful for real-time analytics. One example of that, which I'm showing you here, is a fantastic tool called Nextstrain. There's a great group that runs Nextstrain, and Nextstrain, again starting from a base of influenza and now working through all sorts of other viral epidemics, does the work of pulling all of that data from GISAID and running phylogenetic models so you can understand the relationships between different lineages of the virus. When that data is combined with geographic metadata about the viruses, you get real-time insights about the movement of viral lineages around the world, and that's how we understand, for instance, that the first cases in Washington originated from Asia and then there was community spread, because cases were linked there, or that cases in New York are more likely to have originated in Europe than in Asia. So the existence of these repositories, which carry standardized data of a narrow type but a common format, enables real-time comparison and real-time use of this data as it comes online, and the existing, previously built-up infrastructure for sharing that data, for having common standards, metadata, and linking, is what enables these platforms to be built on top of it, and is one of the real great legacies of that investment.
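This is not Nextstrain's actual pipeline, but as a toy sketch of the underlying idea (comparing deposited sequences to place a sample among its relatives), one might do something like this in R with the ape package, assuming a hypothetical pre-aligned FASTA file of coronavirus sequences:

```r
library(ape)

# hypothetical alignment of SARS-like coronavirus sequences pulled
# from GenBank/GISAID (alignment done beforehand)
aln  <- read.dna("sarslike_aligned.fasta", format = "fasta")
d    <- dist.dna(aln, model = "TN93")  # pairwise genetic distances
tree <- nj(d)                          # quick neighbor-joining tree
plot(tree, cex = 0.6)                  # where does the new sample nest?
```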
So this type of thing is extremely useful in our work in identifying viruses, but it's only one side of the picture, and other areas aren't as well standardized. One of the things I work on is understanding where the viral populations are that can potentially pose risks to humans. For that, you need to know not just things about viral sequences; you have to know things about the hosts, the animals that house those viruses, as well. So you need standardized information about biodiversity, about species locations and species populations, and there are targeted repositories for that type of information: the Global Biodiversity Information Facility is a very common and broadly used one, and VertNet is specific to vertebrates. Then, toward the right, the KNB repository gives you a lot of this data in less standardized form; as you go from left to right, you get into a bit more of a wild west, where you have standardized metadata but very different types of data. And then you need the information that links those things together, which gives you information about host-pathogen interactions, and that's something for which there is not really a widespread common data standard. So individual projects, such as the one on the left here that I've worked on, called PREDICT, which does viral sampling globally, have their own way of recording the linkage between an animal, a species, a location, and a viral detection, and there are literature-diving databases like the Global Mammal Parasite Database, or GloBI, the Global Biotic Interactions database. As you get into more and more detail in linking different types of standardized data, there becomes less and less of a common language, and one of the challenges of not having that common language is that you lose a lot of information when you try to link information together. When you have information about viruses and information about hosts, and you're trying to construct information about their linkages, their interactions, you realize there are a lot of parts of that linkage you have to consider that are not binary. There might be just a detection, the presence or absence of data being there, but it may be more complex: maybe you've detected that relationship through seroprevalence and antibodies, which is weaker evidence than getting titers of the virus itself; maybe there are different types of tests you use; maybe you know the prevalence in that animal population rather than just its presence. All of those types of information are the types you often lose when you try to link these different kinds of data, because you need to go to a lowest common denominator to have the databases speak to each other in a common language. And this is all before we get to the next stage of how these animal populations and viral populations interact with humans, through contact with domestic animals, through human behaviors, through land use change, through things like that. So this is the biggest gap I tend to work on, just trying to figure out how to link these data together, and it's the perspective I've brought as I've moved to more and more of the epidemiological work in COVID-19.
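A tiny sketch of that flattening problem, with invented records: two sources that record host-virus associations at different richness can only be combined by dropping fields down to a shared, near-binary schema.

```r
library(dplyr)

# invented records from two hypothetical sources
src_a <- tibble(host = "Rhinolophus affinis", virus = "SARSr-CoV",
                evidence = "PCR", prevalence = 0.04)
src_b <- tibble(host = "Rousettus leschenaultii", virus = "SARSr-CoV",
                evidence = "serology")   # weaker evidence, no prevalence recorded

# merging to a shared schema forces the lowest common denominator:
bind_rows(src_a, src_b) %>%
  transmute(host, virus, detected = TRUE)  # assay type and prevalence are lost
```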
Many of the speakers before me have already talked a lot about case counts and death counts, the type of common epidemiological data that forms the baseline of our picture of the pandemic. In the United States context, at least, we have largely state-level and some federal-level reporting of these basic pieces of information: where the cases are, how many cases there are, how many tests there have been, how many deaths there have been, and in some cases how much hospitalization there has been. In the U.S. these are largely coming from state agencies, be it state departments of health or state governors' offices, and there's a lot of heterogeneity in how these things are reported; previous speakers have all spoken to the challenges of pulling this information together in various ways. So this is the baseline common information that, with a degree of common vocabulary and a common dictionary, we can use to draw a picture of the epidemic in localities and then across localities. One of the observations I've had about this type of data is that there's a really mixed record so far of state and local governments making use of the open data infrastructure they themselves have built in the past. Lots of municipalities and states have open data platforms specifically for reporting data, dealing with provenance issues, dealing with versioning, and providing data via APIs; there has been a great movement for open government and open data in the past decade or so. But only in a few cases are these platforms how governments are serving this data that is of sudden importance. I was looking at New York City and at Minnesota, searching their open data platforms for coronavirus or COVID or virus, and in most cases you don't get results. It's different in California, where the state data portal is being used, and that's not to say this is the correct way to do it. Obviously the platforms were built for specific purposes, and the data coming in for COVID is different in a lot of ways, in terms of the speed at which it arrives, the audience it has, and its structure, from many of the things people anticipated for these portals. But it means that a lot of the capacities these platforms were built for aren't immediately available, and there's a degree of reinventing the wheel in reporting this data, given that those capacities exist. So as a result, a lot of the data that people use, and we've seen this a lot, comes through the aggregation and standardization projects that have emerged very rapidly. Some of these we've already discussed and seen, such as the Johns Hopkins platform. One of the interesting things is the degree to which these have been driven by journalistic organizations: The New York Times and The Atlantic, both in the US, are a couple of the fastest sources that developed for this, and the data journalism infrastructure that has come up in recent years has really proven very capable of handling this particular challenge of rapidly developing an interface and a process for a new data source. Whether it's the best platform on an ongoing basis we don't know, but these tracking projects have become really core to how a lot of different groups have worked.
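As an illustration of how these journalist-maintained aggregation projects get used in practice, here is a minimal sketch of loading The New York Times' state-level case and death counts with pandas; the URL and the column names (date, state, cases, deaths) assume the layout of their public covid-19-data GitHub repository at the time and would need to be checked against the current files.

```python
# Minimal sketch: read state-level cumulative case/death counts from the NYT
# covid-19-data repository (raw CSV served from GitHub). URL and columns
# assume the repository layout at the time of the talk.
import pandas as pd

URL = "https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-states.csv"

states = pd.read_csv(URL, parse_dates=["date"])

# Counts are cumulative; difference them within each state to get daily counts.
states = states.sort_values(["state", "date"])
states["new_cases"] = states.groupby("state")["cases"].diff().fillna(0)

print(states[states["state"] == "California"].tail())
```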
So that's one set of common baseline case data, death data, and hospitalization data, the picture of the pandemic; I think of it as the genetic data of epidemiology, the backbone of the picture. But then there are other strands of data that come in, which Dr. Hill, for instance, described as important for modelers to use. One of those is mobility data. A common theme in this pandemic has been social distancing: the mobility of people, their movement and their contact, is important as a mechanism for controlling the disease, and measuring it is a mechanism for understanding transmission of the disease. So there have been a number of sources that have been openly published which are aggregations of cell phone data. Both Google and Apple have public sources in which they publish anonymized, aggregated versions of the large troves of cell phone mobility data they hold, and then there are a number of sources that provide this at a much more granular level, which usually require some level of agreement; they're not just openly published, although various parts of them are. A lot of the Cuebiq group's data has been used in data journalism and by researchers, and the SafeGraph organization has also been making cell phone data available, taking what were previously commercial products and opening them to researchers. So that's another strand of data coming into understanding the system, and it's just a small part; again, Dr. Hill talked about all the different things modelers need to know. There have also been a number of initiatives that try to aggregate the estimates from those models and the parameters you might use. Within the NIH there's the MIDAS network, the infectious disease modeling network, which collects parameter estimates from lots of different studies, and another group at the University of Georgia does the same. How many people who get the virus get sick, and with what symptoms; how many have no symptoms; how long it takes someone who gets the virus to show up at the hospital, if they do; how long they stay in the hospital: there are all these different modeling and clinical studies that have estimated these values, which then need to be passed on for future models and for capacity and policy decisions, and those are being aggregated. And with thousands and thousands of papers being published that do this kind of thing, there have been new approaches as well; the U.S. Office of Science and Technology Policy, along with Kaggle, put together a natural language processing challenge to try to extract as many of these estimates as possible. So those are some of the strands of data that come together, especially for the type of modeling I've been doing, and I just wanted to end on some observations about what's happening as people try to pull all these things together. One, we've heard a lot about the aggregation and standardization of these data so people can have a big-picture understanding, but the more we aggregate and standardize data, the more we lose the useful grain of local and finer demographic detail, which is harder to pass between databases. And one of the biggest issues is trying to understand how to link these very different types of sources: for instance, if I'm doing something that uses a bunch of different geographic sources of case data, can I link it so that the parameters I pull out of one of the model parameter databases match that geography, or those demographics, or that component of the virus?
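As a minimal sketch of that linkage problem, here is what joining two of these strands might look like in Python; the files and column names (county FIPS codes and dates as the shared keys) are hypothetical stand-ins for whatever case and mobility sources are actually in hand.

```python
# Hypothetical sketch: linking case counts and mobility data on the two keys
# most sources can agree on, a geographic code and a date.
import pandas as pd

# Stand-ins for real sources; file names and columns are illustrative only.
cases = pd.read_csv("county_cases.csv", parse_dates=["date"])        # fips, date, cases
mobility = pd.read_csv("county_mobility.csv", parse_dates=["date"])  # fips, date, pct_at_home

# Normalize the join keys: FIPS codes are easy to mangle (leading zeros).
for df in (cases, mobility):
    df["fips"] = df["fips"].astype(str).str.zfill(5)

linked = cases.merge(mobility, on=["fips", "date"], how="inner")

# Anything dropped by the inner join is exactly the mismatch in geographic
# or time coverage between the two sources.
print(len(cases), len(mobility), len(linked))
```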
And finally, the policy questions that are relevant are changing really rapidly. What people need to know at the beginning of an outbreak, as cases are rising, is different from what they need to know at a peak, or a plateau, or as people are trying to implement long-term policy decisions, and that progression is very different in different places; the questions certain people are asking me from South America are the ones people were asking in other parts of the world six weeks ago. So the policy questions should really drive what our data and modeling projects are, and I'll just point you to a really great paper from a group at Georgetown that summarized the questions policy makers are asking of modelers and data scientists these days, so we can use those to focus the type of work we do. Thank you. All right, thank you, and thank you to all our speakers. I think now we have some time for questions, so Michael, please lead us through that. Sure. I'm going to ask questions directed at one of the speakers, but other speakers are welcome to give their perspective. The first one is for Allison, and it comes from the audience: for the transmission networks you talked about, how many different types of networks would you expect to track? I guess different networks might be more or less conducive to spreading a virus; can you give an example of different networks having different characteristics? Yeah, sure, it's a great question. In general there's always this trade-off in models between keeping things simple enough that you can really understand your outputs and how your assumptions affect them, versus including as much realistic detail as we can, and that's a huge concern when thinking about transmission networks. Sometimes we try to capture just certain statistical features of networks, like the average and the variance in the number of contacts people have over a certain time period. Other times we try to include more detail about the nature of those contacts; some of the images I showed in my talk were from studies that try to separate contacts into different types, so people you contact in your household, at work, at school, in social settings, and maybe just random other people in your town or city, and those networks all have different statistical features. Some are characterized by really high levels of interconnectedness, like households and social groups, and often workplaces; other types of networks are much more similar to the random networks you study in statistics. So it really depends on the context, and of course those networks do have different features in terms of how likely they are to involve the types of contact that might be relevant for different diseases. For example, in my day job I study HIV, and for HIV people think a lot about sexual contact networks, which have very different statistical features than household networks or workplace networks, and that's why we need to understand the biology of how diseases are transmitted to understand how important these different networks might be for transmission.
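To give a concrete feel for "different statistical features", here is a small, illustrative Python sketch, not from the talk, comparing a clustered, household-like contact network with a random network of similar size and density using networkx; clustering and the variance in contact counts are two of the summary statistics a modeler might carry into a transmission model.

```python
# Illustrative comparison of two contact-network structures with similar
# density but very different statistical features (not from the talk).
import statistics
import networkx as nx

# "Household-like": 200 tightly connected cliques of 5, with a little rewiring.
households = nx.relaxed_caveman_graph(200, 5, 0.05, seed=1)

# Random mixing: an Erdos-Renyi graph matched on node count and edge density.
n = households.number_of_nodes()
density = nx.density(households)
random_net = nx.erdos_renyi_graph(n, density, seed=1)

for name, g in [("household-like", households), ("random", random_net)]:
    degrees = [d for _, d in g.degree()]
    print(name,
          "clustering:", round(nx.average_clustering(g), 3),
          "mean contacts:", round(statistics.mean(degrees), 2),
          "variance:", round(statistics.variance(degrees), 2))
```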
Okay, thanks. All right, the next question is for Ryan. This idea of using GitHub and GitHub Actions for aggregating data, and then also joining data, is a pretty interesting one, since there's an implicit agreement on the platform being used to aggregate the data and also on how it's being validated. Can you talk about the generalizability of that, and what you think the hurdles become? Is it data size, or the sophistication of the user, or is this really something we should all be thinking about as we try to aggregate data sets from disparate sources? Yeah, that's a good question, and there are a lot of aspects to why we chose this approach. One is that we've been constrained to data that is not very large in size, and also data that is open, and you do run into challenges when you get into differently structured data or unique environments required to process the data. But the main thing that drove us to the idea of using GitHub was the notion that when you're trying to pull data together from many sources, and I did see a related question about why everybody isn't using the same platform to share their data, the answer is that you've got hundreds of countries, and within those countries different states, and as Noam touched on, everybody has different platforms they've invested in and different ways of doing things, and you really can't force people to conform, regardless of what body you are. So what I'm getting at is that the reason we leaned toward GitHub was that it's the least amount of friction, it's very open, and hopefully it's a system that can scale and draw on a crowd of people who come from the different areas where the data is being published, where maybe the organizations are not adhering to standards, but scientists can help get their data into the standards in a very open and collaborative way. Okay, thanks. All right, the next question is for Orin. Collecting data on some resources and their usage can be more human-intensive than for others; are there ways we should be thinking about automating things like the inventory for these types of resources? Definitely. For inventory, automating definitely makes sense, and it can come through connected sensors; this can be done easily and has been done by retailers for a long time now to optimize their pipelines, so it definitely should be done and definitely could be done. I don't know if it will be done, especially in time, but to answer that: definitely yes. But then there are other factors that are just really hard to measure with a sensor, so for those we rely on proxies. For example, for the effectiveness of a social distancing measure that has been passed, some people try to use satellite images or traffic data; there are different sources, and there we are really relying on a proxy. We can extract information from the proxy, but then we also need an uncertainty model that can tie that proxy to the actual parameter we are interested in.
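To illustrate what an uncertainty model tying a proxy to a parameter can mean in its simplest form, here is a hypothetical Python sketch, not from the talk: a linear calibration fitted on made-up proxy/parameter pairs, used to translate a new proxy reading into a parameter estimate with a rough uncertainty band.

```python
# Hypothetical sketch: calibrate a proxy (e.g., a traffic index) against the
# quantity of interest (e.g., a measured contact-reduction fraction), then
# carry the residual uncertainty along when using the proxy alone.
# All numbers are made up for illustration.
import numpy as np

proxy = np.array([0.20, 0.35, 0.50, 0.65, 0.80])      # observed proxy values
parameter = np.array([0.15, 0.28, 0.42, 0.55, 0.71])  # matched ground truth

# Fit a simple linear calibration and estimate the residual spread.
slope, intercept = np.polyfit(proxy, parameter, 1)
residual_sd = np.std(parameter - (slope * proxy + intercept), ddof=2)

def estimate(new_proxy):
    """Translate a proxy reading into a point estimate plus a crude band."""
    mean = slope * new_proxy + intercept
    return mean, (mean - 2 * residual_sd, mean + 2 * residual_sd)

print(estimate(0.6))  # point estimate and an approximate 95% band
```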
Okay, thanks. Now, for Noam: rOpenSci has done a great job creating an ecosystem for peer-reviewed software tools for analyzing data; is there a good way to extend this to data processing, analysis, aggregation, and interactive summaries, not just those tied to publications? I wonder if this question has been planted. We have a new initiative that's actually starting up right now. At rOpenSci, as the questioner said, we have a peer review process for code in R, for R packages in the data management life cycle, so people who build tools that help others manage data, especially scientific data, can submit their packages to be reviewed, get feedback, and get a stamp of approval. Over the years we have shied away from doing this specifically for things like statistical algorithms, which many people use but which require a different type of expertise, not just software engineering expertise but statistical and mathematical expertise. We are expanding that right now, and I will drop a link into the chat about our initial research project on this, because we are building a new review board, a new editorial process, and a new set of standards so people can have their data analysis tools, and not just their data management tools, reviewed in that way. We're also thinking about the degree to which data summary tools can fit into that, as well as model fitting tools, machine learning approaches, and things like that. Okay, thanks. Joe, do we have time for maybe one more question? Okay. There was a really nice one that I liked, and it's for all the speakers: are there any resources, funding, talent matching, and so on, for people who have the skill set and interest to help with data collection design, data cleaning best practices, or research design? I find it frustrating to see most resources go towards data mining and modeling, for example hackathons in the private sector. Living in Portland, where grassroots movements such as volunteer mutual aid networks are vibrant, I really want to be able to collect data on vulnerable populations and available resources and share it with researchers; however, I don't know how to get involved with interested researchers. So, Allison, do you want to go first? Or Joe, are there suggestions you'd like to share now? Go ahead, Allison. Just one second, I was just typing an answer to another question. I mean, that's a great question, and I don't really have a good answer. Grassroots efforts are great, but part of the problem is that it's just hard for people to find things that connect with them, and it would be great if there were some more official efforts to bring people together on projects. I don't know that there's one easy way to find the best group to reach out to. I would say, if you're in the private sector and you have free time on your hands to do this, try to find some university groups, for example, that are working on these types of questions, email them, and offer to help out. Everyone is stretched really thin and definitely appreciates efforts from people with experience in these types of issues. That's happened to me: when I first made the simulator app, a lot of people just got in touch and said, hey, this is cool, we're interested in helping with modeling efforts, how can we do that, and we ended up bringing some people in. So that's one option, but I don't have a great answer to that.
Okay, thank you. We're almost out of time here. This last question, I think, deserves a lot more, and I think we'll be able to pick it up in the future, especially as we get involvement from the public health officials, who are particularly overwhelmed at this time, but we'll be looking for ideas along those lines. So thank you to all of the speakers. I think you've done a marvelous job of talking about the uncertainty in fundamental data collection, the complexity, the immensity, and the kinds of diversity in the data that's needed, and I think we all have a better overview of where we stand now as a community. This wraps up the forum for today. We have recorded this, and the recordings will be available as soon as we can make it happen on the conference website, that's covid19dataforum.org, and please look for us both on Twitter and in Stanford and R Consortium news. So thank you all, and that's all for today.