So again, welcome to the webinar. My name's Todd Wallack. I'm spending the year as a Nieman-Berkman Klein fellow at Harvard, and I've been a data journalist at the Boston Globe for about seven years, as well as an investigative reporter working with both the Boston Globe Spotlight team and the rest of the newsroom. Caroline Chen covers health care for ProPublica, and she was previously a reporter at Bloomberg News. Armand Emamdjomeh is a graphics assignment editor for The Washington Post, and he was previously deputy director of data visualization at the Los Angeles Times.

I was excited to have this group talk about the issues journalists face dealing with data, because we each have different expertise. I'm something of a generalist, mining data for all sorts of stories. Caroline is more of a specialist in health care and has deeper expertise in health care data. And Armand has lots of experience visualizing data.

It seems like there's been tons of interest in the challenges of looking at COVID-19 data. A lot of people have been trying to track it by day and time to see trends, and whether things are getting worse or better, as well as geographically. But there have also been a lot of challenges, such as obstacles to obtaining the data. You've seen a lot of headlines about that, particularly at the local level, or about getting more detail in the data, such as the race or age of the people affected. And there are questions about the accuracy and reliability of the data: ProPublica, The New York Times, and others have written a lot of stories raising questions about accuracy, and about the challenges of comparing one area to another because of variances in who's tested, how accurate the tests are, how accurate death counts are, and other issues.

So I'm going to start by asking a number of questions, and at about 12:35 we'll switch to questions from the audience. Feel free to start tossing in questions as we speak; about halfway through we'll start going to audience questions, and we'll finish at one. OK. I want to start off just by asking Armand and Caroline: what data have you seen readers most interested in?

Should I hop in here? I think early on, the question was just, where is the disease spreading? Especially in the US, as the virus first started to hit, everybody just wanted to know case counts. Then that soon started to overlay with a concern about deaths, and I think that continues to be of interest: cases and deaths. Then I would say, where are the tests, and what are the testing capacities? And I think now there's an understanding that there are two types of tests: the diagnostic tests, which are the PCR swab tests, and now the new incoming antibody tests. The more sophisticated readers are starting to ask what the numbers mean as we start, just this week, to see studies come out with numbers from these antibody surveys, and there are already furious debates about those results and whether they're meaningful. So I see these as layering. We continue to care about case counts, we continue to care a lot about deaths, and then segmentations of those: demographics and race and who is being affected. These are layering as we go.

And Armand, what have you seen?

Yeah, I completely agree. Definitely along the same lines of that pattern.
It has been case counts, where there were reported cases and reported outbreaks, and deaths. The one thing I can add to what Caroline said is that there's been interest in the trends being reported by states as well. So we've taken steps to show what this data looks like over time, of course noting the caveats in how it's being reported and recorded by the states. Oh, and I forgot to mention: there's the whole conversation around supplies. PPE, ventilators. That's always a moving target; whether you're talking at a local or national level, you can never pin down how much PPE there is at a given hospital or state. But there's always interest in that question of supplies.

And I'm also curious: how easy or difficult has it been to obtain all the data for your stories and graphics?

I can talk about that. The data at a national level is basically nonexistent. Most things are reported at the state level. So you either have to rely on an aggregator or aggregate the data yourself, going to all these different state sites and figuring out where they report, how they report, and in what format. Also noting that what the states report changes over time, as do the platforms they use to report it. I sent out a tweet a few days ago that said: imagine covering a live election, but you're building your rig for reporting the results while the election is happening, and what everywhere reports as the results keeps changing, and how they report keeps changing too. So in that sense it has been extraordinarily difficult to build things that don't constantly break, and to build data flows that are actually stable, given that what's being reported is moving under them.

Yeah. I would say there are certain things that, just by the nature of the pandemic, are going to be constantly changing. For example, testing capacity. I've done a lot of reporting around testing and testing capacity, and just by the nature of what's happening, that is changing constantly. Whether nationally or locally, if you're trying to say what the testing capacity of your state is, that number is going to be constantly changing. And it should be, because we have been constantly ramping up testing capacity. So any reporter trying to get a beat on that and inform their local readers would have to constantly update it.

Is it possible to actually get an accurate number at any point in time?

I think it's technically possible, but your number is going to be outdated within an hour, even at a specific lab. There have been points where I could say, I nailed the number, and it's already old.

Is there any point in even doing that?

Yeah, I think it's a worthy exercise to try to get a ballpark and to track trends for readers. There have been times when I've tried to do that for specific stories. But it is a frustrating exercise, and I've really encouraged other reporters to explain to readers where you got this number, what went into it, and how long a shelf life it will have, and to show your work to your readers more than I normally would. So some of that is inherent. There are other things, though, where your information is only as good as where you get it from.
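To make Armand's point about fragile data flows concrete, here is a minimal sketch of the kind of defensive checks a scraper might run before publishing an update. The column names and schema here are hypothetical, not any state's actual format:

```python
import pandas as pd

# Hypothetical schema for a scraped state table; these column names are assumptions.
EXPECTED_COLUMNS = {"state", "date", "cases", "deaths"}

def sanity_check(df):
    """Return a list of 'data smell' warnings before publishing an update."""
    problems = []
    missing = EXPECTED_COLUMNS - set(df.columns)
    if missing:
        # The source quietly changed its format; don't try to chart this.
        problems.append(f"missing columns: {missing}")
        return problems
    if (df["cases"] < 0).any() or (df["deaths"] < 0).any():
        problems.append("negative counts; possible silent revision by the state")
    # Cumulative counts should never decrease within a state.
    decreasing = df.sort_values("date").groupby("state")["cases"].diff().lt(0)
    if decreasing.any():
        problems.append("cumulative cases went down; state may have restated its data")
    return problems
```

The idea is that a pipeline should refuse to update a chart, and flag a human, when the source changes shape or restates its history underneath you.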
To that point about sources: the WHO, for example, puts out daily situation reports, and that's really the only single source for international case counts. But the WHO's information is only as good as the countries it comes from. I keep explaining this to people: the WHO has a recommended definition for what counts as a positive case, which is essentially that you test positive with a PCR-based test. For the longest time, China decided it would only count as positive someone who had a positive PCR test and symptoms. They were not counting people who had a positive PCR test but no symptoms, so they weren't counting asymptomatic cases. There's nothing the WHO can do about that; there's nothing anybody can do about that. And then after a while, China changed that definition. So you needed to know that about China. It's deeply frustrating. You can't get everybody to report the same way, and you need to have those caveats in your reporting. And this trickles down to 50 states, or 56 states and territories, all doing it their own way.

Right. That's got to make comparisons really tricky, when everyone has a different way of reporting and tracking the data, and there are different rules on who gets tested and what gets counted.

It sure does. And again, you can only be clear about the caveats: this is entirely dependent on what's being reported and how it's being reported.

Yeah, I've been very, very cautious about comparisons.

Got it. And are there any other problems you've noted in the data that people should be aware of?

I have been encouraging reporters in my newsroom, and trying to explain to the public, to be aware of the definitions behind the numbers that get thrown around. One thing I've been trying to explain a lot to readers is: what actually is the fatality rate? There's a big gap between what the public wants to know, which is, if I get infected, will I die, and what's reported as the case fatality rate. The case fatality rate is the number of reported deaths divided by the number of lab-confirmed infections. And everybody knows that in the US it had been really hard, and continues to be really hard in many places, to get tested in the first place; a lot of places are not testing unless you're really, really sick. So that denominator is going to be much smaller than the actual number of infections. Especially early on in the United States, the case fatality rate was something like 10%, because we just weren't testing a lot of people. Think of it as an iceberg model: the deaths are usually the easiest to find and count, especially early in a pandemic (this always happens in a pandemic), and the people who are asymptomatically infected are the hardest to find. But the average lay reader just wants to know: if I get infected, will I die? They're looking at the number reported in your headline and thinking, that's my chance of dying. We have such a huge responsibility as reporters to explain that number, and not just throw it around in a headline.
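To make the denominator problem concrete, here is a minimal sketch with entirely made-up numbers. The same death count produces wildly different "fatality rates" depending on how many infections testing has actually confirmed:

```python
def case_fatality_rate(reported_deaths, confirmed_cases):
    """Naive CFR: reported deaths divided by lab-confirmed cases."""
    return reported_deaths / confirmed_cases

# Hypothetical illustration: 100 deaths, but how many infections did testing find?
deaths = 100
print(case_fatality_rate(deaths, 1_000))   # 0.10 -> "10%" if only the sickest get tested
print(case_fatality_rate(deaths, 20_000))  # 0.005 -> "0.5%" if testing also finds mild cases
```

And as the discussion below notes, the numerator is shaky too, since probable deaths get added and restated after the fact.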
There are a lot of numbers like this. As a science and health reporter, I feel we have a lot of responsibility to explain them to people: numbers like the rate of infection, your chance of infecting other people, the average number of people one person infects. Our understanding of these is still a work in progress, and that's what I'm trying to get across to my readers: there is so much we are still learning that we don't know yet, and we cannot present this as set in stone.

And building on that, beyond the count of cases being a difficult denominator to divide against, the deaths number is slippery too. We've seen stories highlighting this in recent days, and it's something we've been saying for a while: not every death is being accurately categorized. Recently New York City added, what was it, like 3,700 deaths, I forget the exact number, that were classified as probable COVID-19. And if that's happening in New York City, there's probably some fraction of deaths being miscategorized, or never even recorded, everywhere. So that number is slippery as well. And when talking about fatality rates, rather than one big number, when we have the data available we've been trying to break it down into segments of the population, or report the comorbidities that studies have been reporting. So it's not a flat 3.2% or whatever; it depends on a lot of factors related to the individual.

It definitely sounds challenging when there are questions and uncertainty about both the numerator and the denominator when you're trying to calculate rates. My sense is that these are problems data journalists, and journalists in general, encounter whenever they're trying to get data: it's often hard to get one clean database at a national or global level, we're often aggregating from lots of different places, each place might count and report the numbers differently, and the data can be messy. Is there anything different you're finding in dealing with COVID-19 data, or does it reflect challenges you've faced doing other types of stories?

This is more philosophical, but these numbers are being reported by states and countries very precisely, while by its nature this is a very imprecise count. So there's this weird situation; inaccurate but precise is one way to classify data, and I think that's where we are now. It's like taking shots at a dartboard blindfolded: the darts are all landing in the very same place, but you're off somewhere. You're not actually hitting the dartboard; they're somewhere off in the wall. We have very precise counts of an imprecise thing.

Yeah, one thing I've seen, and I know this is a really hard thing to do, especially if you have an editor pushing you, is to resist the urge to write immediately. Because what I do see is that health departments are refining the data as they release it, and I think that's because they are also figuring out what they need to release.
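Picking up Armand's point about breaking one flat rate into segments of the population, here is a sketch with entirely hypothetical counts, just to show the shape of a stratified table:

```python
import pandas as pd

# Entirely hypothetical counts, only to show the shape of a stratified breakdown.
df = pd.DataFrame({
    "age_group": ["0-29", "30-59", "60+"],
    "deaths":    [2, 40, 400],
    "confirmed": [5_000, 20_000, 10_000],
})
df["cfr_pct"] = 100 * df["deaths"] / df["confirmed"]
print(df)  # one flat rate would hide order-of-magnitude differences across ages
```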
To give a very specific example of health departments refining as they go: New York City started out reporting tests only by borough. A lot of people said that's not enough information, and the city was getting a lot of criticism, so it started releasing data by this very strange unit, not quite zip code, not quite neighborhood. And it released percentage positive, but with no raw numbers: no numerator, no denominator, just a percentage. And I thought, I can't do anything with that, because if you say that in this zone 66% of tests were positive, that could mean you only did three tests there and two people tested positive. That's meaningless. But I did see some news organizations write a story on that, and I thought that was a bit dangerous. Then within a week, the city re-released numbers by zip code, with numerator and denominator, way more information, and then you could write a more meaningful story. And New York City has continued to update and iterate and give more and more granular information. So I do think there's a benefit to waiting, because more than in any other outbreak I've covered, I've seen health departments iterate as they go with the data they're releasing. And because this is happening across the country, reporters can push health departments off of each other: hey, Ohio released this information, Florida, why aren't you releasing it?

In a similar theme, I think it's sometimes really dangerous to write a story off a preprint. It's great that scientists and researchers are moving quickly and sharing information on medRxiv and bioRxiv instead of waiting to go through the whole publication process, but a preprint is not peer reviewed, and that puts you in a really dangerous position as a reporter. So one of my goals is to never let a preprint walk alone. You don't write a story on a preprint by itself; you let it go in concert with other studies and look for a trend, or at least get lots and lots of people to comment on it. This is happening right now with all these antibody studies. Stanford put out its preprint on its antibody serosurvey, and a lot of stories got written really quickly, and then the next day came the wave of critique: was it a good survey, was it biased, all of that. I just wish a lot of reporters had waited a little bit. Now there's the Los Angeles serosurvey too, and you could maybe have waited, collected a bunch of these studies, and done one thoughtful story in one go, or at least gotten a lot more outside voices than you normally would before writing that one story. Because they aren't peer reviewed, you do have to treat preprints differently.

Right. And interestingly, of course, none of our articles are peer reviewed. So I'm curious what process you go through to make sure your own interpretations and analysis are sound before publishing?
I run preprints by way, way more people than I normally would. If something's already published in a journal, I know it's gone through the peer review process. If it hasn't been, I'll run it by a lot more outside experts than I normally would, go that extra mile, and really ask myself: do I have to write this now? Can I wait for it to go through peer review? And you can ask the author. Sometimes they'll say, oh, this has already been accepted by JAMA or The Lancet, and that gives me an extra measure of confidence; if that's the case, it's helpful to know. If not, and it's such an important study that I need to write about it right now, then I get all those outside voices: many independent experts from a number of different institutions, and all their critiques. And if all of them are really, really negative, then again I have to ask why I'm writing about this study in the first place. The bar just gets so much higher if it's not in a journal and hasn't gone through peer review.

And I assume, Caroline, even when ProPublica or the Post or others are doing their own analysis, we do the same thing: we go to outside experts and say, here are the numbers I'm calculating, does my methodology make sense, is there a good explanation for these conclusions, rather than just posting something on Twitter or throwing it on our website. We normally talk to experts first.

Yeah, exactly. And there's a bit of self-analysis in here too, looking at what we call data smells. Question your basic assumptions about what's in the data: does it show the opposite trend to what you're expecting? Are there massive gaps, or negative values where there shouldn't be? It's sanity checking the data as well.

And similarly, I know there are questions about the different models organizations are using. A lot of people are looking at the University of Washington model, which has a website that's very easy to use, predicting when the peaks will come for hospitalizations and other measures. But there are lots of other models, and there are questions about what variables go into each model and how the numbers are calculated, and they can produce conflicting results. That has to be challenging to deal with.

Yeah, I did a whole column on forecasting and projections earlier on, which was partly for reporters and partly for the public. And again, the question really is: who is your audience? Who are you writing for? That's something I try to keep in the back of my mind, because there's a difference here. If you're writing for a lay public, you have to remind them: this is an estimation. For that particular column, I was talking to an epidemiologist, and I told her I had been reading a sentence somebody had written about their particular model, and it seemed awfully specific. It said that in New York, this was back in early March, last week there were something like 1,583 to 2,000-odd people infected. It was down to the digit.
And I read that sentence and felt it gives a lay audience the sense that you can be that precise, that you can calculate down to a single digit how many people are infected. As a writer, I would never give that level of precision, because it signals something to a reader; I would round, and use words like "around." So I asked her, what does this say to you as an epidemiologist? And it was really interesting, because she said: I like seeing that sort of precision, because from one epidemiologist to another, I can go and redo his model and make sure our numbers match exactly. So it's very useful from one researcher to another. But, she agreed, for a lay audience that's not the message we want to send. I asked, what is the takeaway you would want for a lay audience? She said: the takeaway I would want a lay audience to hear is, it's not 400 and it's not a million; you're in the low thousands. So that's the question I always ask when I'm talking to someone doing modeling: what is the takeaway you would want for a lay audience? And she said that with models, you need to be thinking in orders of magnitude. I think our responsibility as reporters is then to say, OK, I'm going to give an order-of-magnitude type of number to my readers.

Got it. And I'm also curious: are there any mistakes you see lots of people repeatedly making that bug you? One I see all the time is people saying, oh, there have been four million people tested, when there have been four million tests. But some tests require multiple samples, and people can be tested multiple times, so they're different numbers. I also see people say, oh, there are this many cases, when it's the number of confirmed cases, and there are studies suggesting many times more people have been infected but haven't been tested.

Yeah, the one you just mentioned, Todd, is the one I've seen most often just talking with people: hearing that, oh, this place has only five cases. Well, no. I mean, yes, but no. That's just what's been reported and conveyed by the states. And that comes back to what Caroline was just saying about precision implying that we know there are 526 cases in this county in Illinois or something. But maybe that's on us too. The instinct is to report the data at the granularity we have available, but maybe there are better ways to report it that convey more of this imprecision. That's something we can ask ourselves and address as we put together these pages tracking the spread of the disease.

Another one is people being exposed to types of scales and visuals they're not used to seeing. We're seeing a lot more logarithmic scales than we're used to, and growth doesn't look the same on a log scale as on a linear scale. If you're looking at a log chart but thinking you're on a linear scale, you might think things are declining or flattening out when that is very much not the case.
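A quick illustration of that log-versus-linear trap, using a synthetic exponential series rather than real case data:

```python
import numpy as np
import matplotlib.pyplot as plt

days = np.arange(60)
cases = 10 * 1.1 ** days  # synthetic exponential growth, not real data

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.plot(days, cases)
ax1.set_title("Linear scale: the familiar hockey stick")
ax2.plot(days, cases)
ax2.set_yscale("log")  # the same constant growth rate becomes a straight line
ax2.set_title("Log scale: easy to misread as 'slowing'")
plt.tight_layout()
plt.show()
```

The same constant growth rate reads as an alarming curve on the left and a placid straight line on the right.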
Yeah, and Todd, you picked up on my biggest pet peeve: people not paying attention to units. This has been my soapbox rant for the longest time: please get your units in people, because that is what readers care about. They see a million tests and think that's a million people. When you say we're rolling out a million tests, they will automatically think a million people can get tested. And depending on the type of test, and this is absolutely confusing, the math differs: for the CDC test you had to divide by two, while the Abbott rapid test is one test per person. So depending on which test, it's a different equation, and it really is the reporter's responsibility to figure out what the heck is being said. Frankly, it's also a way for officials to inflate numbers. The only way to get an apples-to-apples comparison is to get testing capacity in people. So I think it's a journalist's responsibility to always get the units in people, so we can compare state by state and country by country. That's a mistake, or a confusion, that annoys me when I see it. And, as Armand said, not explaining that everything should be "the reported number of deaths" or "the reported number of cases at this point in time"; those are really common too.

I think also, and this is more philosophical, it's presuming we know things. I mostly see this on Twitter and on TV: this air of, we know what to do; if this state just did this, we would solve the crisis. No. Nobody knows what to do. Humanity has only known this virus since January. Well, in China it was a little earlier than that, but in the US we haven't known it that long. Every time I dig into this, whether it's really understanding how the virus is transmitted, or, as in some recent reporting I did, how doctors are struggling to understand how best to use ventilators and how to treat critically ill patients, everybody is struggling to do their best by patients and to understand what to do. There are no easy answers in this crisis, and I think it's a failure of communication, by officials and also by journalists, when we make it sound like there's an obvious or easy answer, and fail to acknowledge that to some degree we are all still learning. It irks me whenever something comes across as "well, obviously."

That sounds good. Why don't we go to questions from the audience, which are starting to pile up. The one that's been upvoted the most is from a Berkman fellow, Baobao Zhang, who wondered: how do you feel about non-experts weighing in with their own analysis on Medium or Twitter or elsewhere? Non-journalists don't all do what journalists do, which is going to experts first to vet their conclusions.

I definitely feel like it's a free society, and that's what platforms like Medium exist for. You're specifically citing Tomas Pueyo's "The Hammer and the Dance." I think it's fine if people want to publish, and they definitely find their own audiences. I had a lot of people actually send me that specific post and say, I cannot understand this.
Can you write a version that hand-holds a little bit more? Because often experts in their field, whether data scientists or clinicians, tend to use a lot of jargon and don't break it down to the degree I try to do. Some people are fantastic communicators naturally, but the tendency I see is a lot of jargon. So I think there's a place for them, and sometimes the shortcoming is that they're not trained to use language that helps them reach as many people as they could, or to give as much context as a journalist would know how to give. That's my off-the-top-of-my-head answer.

OK, that's good. There's also a question about how we deal with issues where we publish an article based on data and then the data or the information changes. This probably comes up all the time with health care studies, Caroline, where a new study contradicts a past study, or a study's been retracted. How do we deal with this when people are still passing around the old article or chart based on outdated data and information?

Yeah, and what a nightmare it is right now. One thing I'm doing now, even more than I normally do, is aggressively dating my information. In my stories, in my sentences, I'll say, "as of Wednesday afternoon, according to the Association of Public Health Laboratories, the US had a testing capacity of a million tests," because literally by Thursday morning the number is going to change. I try to tag as much of my information as possible. I'm linking a lot more aggressively than I normally do, and adding dates and timestamps, so whoever comes along to my articles will know as of when that information was true. Unfortunately, some people are not going to read that carefully, but at least the timestamp will be next to the fact. I cannot go back and update my articles constantly, but at least the information somebody reads will have a timestamp next to it. I think that's probably the best thing you can do, and then yes, update as you go.

And this is where the language you use at the time you write also helps you. I use language like, "at this time, scientists understand this to be X." When I was working on a column about asymptomatic transmission, there was a lot of language in there like, "as of now, scientists understand that" whatever. So there's a date at the top of my article, I'm using a lot of language indicating that I'm giving you the best understanding at this time, I'm linking to studies, and I'm putting in language like, "as of this interview that I did on this date, this is what I was told." All of that in combination means that, hopefully, even if a reader comes along later, they'll know the information was current at the time I wrote the article. I think that's the best you can do.
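A small sketch of that "aggressive dating" habit applied to data rather than prose: wrapping every published figure with its source and an as-of timestamp. The figure and source name here are just placeholders:

```python
from datetime import datetime, timezone

def with_asof(value, source):
    """Wrap a published figure with its source and an 'as of' timestamp."""
    return {
        "value": value,
        "source": source,
        "as_of": datetime.now(timezone.utc).isoformat(timespec="minutes"),
    }

# Hypothetical figure; either way, the number will be stale within hours.
print(with_asof(1_000_000, "Association of Public Health Laboratories"))
```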
Yeah, and from a database standpoint, we can either build our pages and apps to plug into live data that updates, so you're seeing updated data as of the timestamp at the top of the page or right on the chart, and again we try to be transparent about when that data updates; or, as Caroline says, we can build a chart statically in Illustrator, or save it as a static SVG, and make very clear that this is data as of X. Otherwise, we've been in situations where we're trying to publish a story and we have to keep updating the charts five times, because the data keeps changing as we're writing the story.

Yeah, and there are more subtle things. ProPublica normally does really long, deep-dive investigations, and our social folks are used to just retweeting our stories forever; we often do such long retrospective investigations that you could retweet our stories two years from now and there's no reason somebody couldn't read them again. We've completely reconfigured that. They no longer automatically retweet a story, because they know the information could be totally old. So think about your social strategy too; they'll check in and ask, can I still tweet this story from our main account? Is the information still current? And then, obviously, if there's really major new information, for example if I had written a whole column on asymptomatic transmission and something major and relevant emerged, I will put an update at the top of that story. So, being selective.

OK, those are both good points. The next question is from Eva Wolfangel, a Knight Science Journalism fellow, who asks about the fact that researchers often try to communicate uncertainty. I guess there are two challenges journalists face. One is how to communicate that same uncertainty. And then there's also the question: do we undercut our own stories, reporting, and data when we communicate that uncertainty? Are people just going to say, oh, it's an estimate, it has such a wide range, it has a margin of error, you can't really rely on it? How do you deal with those challenges?

I try to convey that by describing the process of science. To give a very specific example: in the column I was working on about asymptomatic transmission, there was a part where I talked about how new studies have shown that viral load is actually higher at the start of the course of disease for COVID-19, which means you could be more contagious even before symptoms start. But I went out of my way to explain why this is unexpected: for COVID-19's close coronavirus cousins, SARS and MERS, you were most contagious, with the highest viral load, in the middle of the course of disease, when your symptoms were worst. Explaining why the original presumption was that COVID-19 would behave the same way is something any reader can understand: naturally you'd look at historic models and expect it to behave the same way.
Just trying to explain the process of science helps, and I feel like I over-explain. Showing that uncertainty can be as simple as a line. I was just working on a story about ventilator use, and a clinician gave me a number and then called me back and said, you know, that number I gave you, I know journalists hate this, but it might change. And I said, no, no, that's fine, and I appreciate that you wanted to clarify. So I added a very short line saying he added that it's early days and more information will be gathered. I think that's fine, and a good indicator to readers that more studies are going to happen. So there are definitely ways for writers to indicate that for their readers.

Yeah, and from a visual standpoint, in terms of how to communicate uncertainty, look at the annual discussion every hurricane season about how to chart the likely path of a hurricane. With visuals, you want to give somebody something to look at that conveys the data as well as possible. In the case of this outbreak, the best we can do is work the uncertainty into the chatter, the headline around the chart, and the annotations: say that it's reported cases, or confirmed cases, or reported deaths. Try to convey the uncertainty in what's around the chart, rather than in the numbers themselves, which are what's actually being reported and what we actually have to chart.

Got it. Let me take on a question by Saul Tanamban, who asked about questions raised by COVID skeptics. When we report deaths, they'll often argue the deaths are overcounted, or in extreme cases that there are no COVID deaths at all, and say, well, they really died from a heart attack, or from pneumonia, or from some other cause; yes, they tested positive for COVID, but that wasn't necessarily what caused their death. How do you deal with those types of questions?

That's interesting. I don't know that that's a useful debate right now, because then you get into the debates about the people who are dying at home. Did they die of COVID? Did they die of something else? How do you count the excess impact from COVID, the people who died at home because they didn't want to go in for help? There are so many swirling questions around deaths related to COVID that are going to be hard to untangle. As journalists, the only thing you can really do is be really straight and really flat: here are the number of people who died with a positive COVID test, and leave it at that. And then: here are the number of people who died at home, and here is how that compares to the number of people who died at home last year at the same time, and show that gap, if you're able to get that number from your state. I just don't know that those debates are helpful, or that getting into those weeds and trying to parse it gets anywhere at this point, because you could have the same debate about the flu.
So-and-so had a positive flu test, but did they die of their underlying condition? Their pneumonia came from the flu, but they also had diabetes. What does that mean? I just don't know where you'll get with that.

And it reminds me of the studies after Hurricane Maria, looking at what the excess mortality rate looked like in Puerto Rico. I also worked on the Homicide Report at the LA Times for several years, and for the LA County coroner, if somebody was shot and then died, say, 10 years later of complications from that gunshot wound, it's still ruled a homicide, because they died of complications from that gunshot wound. So this is not solely restricted to COVID-19 data; it's mortality statistics in general.

Right. Another question that came up is: what is the most reliable source for COVID-19 data? I think there are at least half a dozen sources that have aggregated national data, and a couple of sources with global data. Armand, go ahead.

Yeah, "most reliable" is the key. Johns Hopkins has really been putting tons of work into aggregating as much data as possible; you can scroll through the issues list on their GitHub repo just to get an idea of the volume of requests this has generated. There's the World Health Organization, and then a number of media organizations, including us, are also trying to aggregate at the US level, state data and county data. I can't tell you which is the most accurate.

I would say, just for US case counts, we mainly use Johns Hopkins data. We long ago gave up on the CDC, which is very unfortunate to have to say, but they don't update on weekends and they're about 24 hours behind on their weekday updates. So we use Johns Hopkins for our daily case counts. For testing capacity, we mostly point to the COVID Tracking Project. For international numbers, depending on what it is, the WHO or JHU. But it really depends on exactly what you're trying to get at.
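For anyone who wants to follow the Johns Hopkins route themselves, here is a minimal sketch of pulling the JHU CSSE time series with pandas. The file path reflects the repo layout as of spring 2020, and, true to everything said above, it has changed before and may change again:

```python
import pandas as pd

# Johns Hopkins CSSE COVID-19 repo; path is an assumption based on the
# spring 2020 layout and may break if the repo is reorganized.
URL = ("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/"
       "csse_covid_19_data/csse_covid_19_time_series/"
       "time_series_covid19_confirmed_US.csv")

df = pd.read_csv(URL)
# One row per county, one column per date; roll up to state-level cumulative totals.
date_cols = df.columns[df.columns.str.match(r"\d+/\d+/\d+")]
by_state = df.groupby("Province_State")[list(date_cols)].sum()
print(by_state.loc["Massachusetts"].tail())
```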
Got it. And of course, sometimes for very local stories there may be only one possible source of data, coming from a county or a hospital or somewhere.

Yeah, and actually going directly to your local health department is probably going to get you the most up-to-date information, even faster and fresher than going to a site like JHU, frankly.

There was also a question from Magna Cheney, who's a Nieman affiliate, asking about best practices for archiving stories. So, Caroline, you mentioned that having a date and timestamp can be one way.

Yeah, The Guardian does this thing where they put a warning up really high saying the story is more than a year old, some sort of very visible warning up high, which I always appreciate whenever I see it. So that could be one way to do it.

OK. And Michelle O'Neill asked: what can we report that's meaningful without having the basic data that we want? I guess that comes up when we want to say which states are the hotspots, or how the US is doing versus other countries, given all the questions we've brought up about how many people are actually infected, given the differences in testing, and how many people have died, given differences in counting. Because of all the uncertainties in the numbers, it must be really challenging to figure out what we can really say with confidence.

Yeah, I agree. I think we need to make basic assumptions or adjustments when we can, for denominators that don't exist or other things we don't consider reliable. On our pages that show what we know about cases and deaths throughout the country, instead of raw counts we're normalizing: looking at known cases per population of the state, or per population of the area. But again, we have to be clear that this is all just based on what's being reported.

OK. There was also a question: what do you do when you don't have data or information, or you have conflicting data, so you can't be sure? Do you just avoid writing about it, or do you write about it as best you can? It's certainly difficult to make charts when you don't have data.

Well, I think there is value in writing about the lack of information, especially when you feel like, say, your local health department is withholding information that should be public. Early on, there was a lot of good journalism being done about the need for demographic information about who is being infected, and now we're getting a lot more of that information, which is pointing out big problems in who is being infected. So I would not dismiss reporting on the lack of good information as a place to start; it can actually spur change and get you the information you want.

Great. So I think we just have time for one last question. Gina Pavone notes that there's talk of using an app for contact tracing; there's a project at MIT for that, and there have also been stories based on cell phone data that's been released. Gina wonders how journalists deal with the privacy aspects, reporting on the challenges of releasing and using that type of sensitive data, and making the topic accessible to a mass audience.

Yeah. I think this is a hot topic across a lot of different countries and localities. One, really understanding the nitty-gritty of how an app is going to be used is important. I'm seeing a lot of think pieces about these apps right now that ask philosophical and hypothetical questions, but way fewer stories that actually get into the innards of how they're going to be used, which would actually help answer some of those think pieces. So I think that would be useful journalism to do. It's much easier to ask, well, what about privacy, when you don't actually know what's going to happen. The other question, which some public health experts raised with me, is how this is actually going to intersect with the existing public health infrastructure. Because if a bunch of people have downloaded an app and it's not talking to public health officials and not helping them do their actual work, that's also useless. It needs to fit into the existing public health ecosystem. So there's a lot of good reporting to be done around that. And again, this needs to fit into existing testing capacity. There's no point in a great contact tracing app if you then can't test people and find out who is sick in the first place.
So there are a lot of questions about whether we have a very shiny-looking object that doesn't mesh with the actual realities of what's needed. Helping your readers understand how this app needs to fit into the actual workflow of containing the virus, all of those things can be helpful. And then ultimately there's also just the mathematics of how many people would actually need to download the app for it to be useful in achieving what it needs to do, because there is a minimum number of people who need to have the app for a contact tracing app to work.

Got it. Well, thank you so much. Thanks to the panelists and to everyone who tuned in. There will be a recording available in a couple of days on the Berkman Klein Center event page, and there's also a quick poll survey added at the very end. So thanks again to Armand and Caroline, and thanks to everybody who was watching.
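A footnote on that last point about adoption math: a back-of-the-envelope sketch, under the simplifying assumption of random mixing, where both parties in a contact must have the app for the contact to be logged at all:

```python
# Back-of-the-envelope: both people in a contact need the app, so under
# random mixing roughly p^2 of contacts get logged (a simplification).
for p in (0.2, 0.4, 0.6, 0.8):
    print(f"{p:.0%} adoption -> ~{p * p:.0%} of contacts captured")
# 20% adoption captures only ~4% of contacts; even 60% captures only ~36%.
```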