Good afternoon, everyone, and welcome to today's session, Diving into Data and Crowdsourcing. My name is Rocio Ortega, and I am the Events Associate at ProPublica, and I'll be your host today. We'll get started in just a few moments. We're waiting for a few more people to sign on. Thank you so much for your patience. Closed captioning of the program is available and can be enabled by clicking on the closed caption icon towards the bottom of your screen. In 2021, our teams employed data and engagement tools in creative ways, producing some of our most vital, game-changing investigations. Today, we'll be discussing the strategies used to equip the public and those in power with hard facts. And it looks like we have enough folks on now, so we're gonna go ahead and get started. If you're just joining, my name is Rocio Ortega, and I am ProPublica's Events Associate. Again, welcome to today's session, Diving into Data and Crowdsourcing. Thanks to McKinsey and Company for their support of today's event. Closed captioning of the program is available and can be enabled by clicking on the closed caption option on the bar towards the bottom of your screen. For those new to us, ProPublica is a nonprofit newsroom dedicated to investigative journalism. Today, we'll be sharing the methods we use to collect, process, and analyze complex information to fuel accountability-focused journalism. You will hear firsthand the ways in which our journalists find important stories hidden in the numbers and the role the public plays in developing our most intricate investigations. Today's moderator is the one and only Stephen Engelberg. Stephen is ProPublica's Editor-in-Chief. Stephen, you can go ahead and turn on your camera. Our additional panelists today include Engagement Editor and Reporter Ariana Tobin, News Applications Editor Ken Schwencke, Data Editor Ryann Grochowski Jones, News Application Developer Lylla Younes, Data Reporter Irena Hwang, and Data Reporter Ellis Simani.
As an additional note, the session is being recorded and a link to the video will be emailed to everyone who registered. Thank you all so much again for taking the time to be here today. I really hope you enjoy the session, and I'll go ahead and let Stephen take it from here. Good afternoon, everybody. Before we get started, I just wanted to offer a very brief introduction to this topic, because it is really part of the secret sauce that makes ProPublica what it is. I'm old enough that I started in business and investigative reporting a while back, and the work we used to do was pretty anecdotal. You'd hope to get a bunch of very compelling examples of something, and maybe you'd call around and ask an expert if the examples represented some major trend, and that is how we largely did our work. The advent of the internet, coupled with the developing science of big data, gave us some whole new and very exciting tools. One tool was the ability to do original research ourselves, which is both exciting but also a little scary, because as you'll hear in this session, statistics is by no means simply a matter of saying if you hit the baseball four out of 10 times, you're batting .400. If only it were that simple. And the second thing that we've made enormous use of, as you'll hear, is the ability to talk to the audience at scale: to ask questions not just of dozens of people in man-or-woman-on-the-street interviews, but of thousands of people, and to get really, really interesting responses. So I'd like to start by inviting Lylla to talk about one of our more intricate projects. It involves looking across the country at data collected by the EPA about emissions from commercial sites, factories. And Lylla, what did we do here and why was it so groundbreaking? You may be muted, Lylla. Well, while Lylla figures out her microphone technical difficulties, you can just speak up, Lylla, when you think you've fixed it. So the project was called Sacrifice Zones.
It grew out of a project that Lylla and Al Shaw, who's also on the news apps team, worked on a few years ago about industrial plants in Louisiana. In doing so, they analyzed federal data on carcinogenic industrial emissions down to really, really fine levels: neighborhood-level, block-level data. And what they realized is they could do this across the country, not just in Louisiana. It took us well over a year, nearly two years, to analyze the giant amount of EPA data. But in doing so, what Lylla and Al found was that, A, the EPA was not keeping track of or looking at this information on its own, and B, we were able to tell people, at a really detailed level, about their potential cancer risks from this. And in doing so, we've gotten a lot of impact, where the EPA has been revising and looking at new rules around this. And we've seen a real ground level of support, which Ariana can talk about, from our engagement outreach, by targeting people who live in these areas and giving them the information that we've reported out. Well, thank you, Ken. Let's see if Lylla is up and running. Lylla, can we hear you? Yeah, can you guys hear me? We can hear you wonderfully. So I wanted to ask you a little bit more about what Ken has just described. Were these numbers just a matter of going to the EPA website and looking them up, zip code by zip code? Where did these numbers come from? I wish that that were the case, no. So the numbers came from a model that the EPA has been publishing for almost 30 years, but that had never been used before on a systematic, nationwide scale to understand, at the level of facility fence lines and neighborhoods, what the estimated cancer risk is from toxic industrial pollution. So to do the analysis, we basically had to process a trillion rows of data. The data itself was very challenging. We ultimately ended up using Google BigQuery.
It's software that basically uses Google's supercomputers to process data more quickly, because our more traditional computers were not able to process the information fast enough. And so what we essentially did with that analysis, and what those supercomputers helped us to do, was to parse through all these rows of data and compute estimated cancer risk from all the different pollutants in the air around these industrial facilities. This seems like a very logical, straightforward thing for the EPA to have done. Do we have a sense of why they never did it? Well, to do an analysis like this is to essentially identify the major polluters and to identify the real hotspots of toxic pollution in America. And the EPA, many sources told us over the course of our reporting, has always been very hesitant to call out individual polluters, and has always tried to take a more comprehensive approach to pollution and not to identify companies by name. And what our interactive map essentially allows users to do is figure out exactly which polluters are driving the cancer risk in their areas. Now, one of the big things that we worry about when we're doing this kind of work is whether the data that we're starting with, reported, I guess in this case, to the government by the companies, is accurate. How was the data in this case? Well, we discovered, unfortunately, in our reporting process that there were underlying issues with the data, that some companies had incorrectly reported, and not just small companies that we've never heard of before. Major polluters like Boeing had completely botched their numbers when they sent their forms to the EPA, multiple years in a row. And so that led us to have real concern with, well, can we even publish an analysis with this data at all? Which led us to, in essence, do the agency's work for them.
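To make the kind of aggregation Lylla describes a little more concrete, here is a minimal sketch, in plain Python rather than BigQuery SQL, of summing modeled per-pollutant cancer risk into a total for each small geographic cell. All facility names, pollutants, and figures below are invented for illustration; this is not ProPublica's actual methodology or data.

```python
# Hypothetical sketch: for each small grid cell, sum the estimated excess
# cancer risk contributed by every pollutant from every nearby facility.
from collections import defaultdict

# (grid_cell_id, facility, pollutant, modeled excess lifetime cancer risk)
emissions_rows = [
    ("cell_001", "Plant A", "ethylene oxide", 4.0e-5),
    ("cell_001", "Plant B", "chloroprene",    2.5e-5),
    ("cell_002", "Plant A", "ethylene oxide", 1.0e-6),
]

def total_risk_by_cell(rows):
    """Sum modeled risks from all sources affecting each cell."""
    totals = defaultdict(float)
    for cell, _facility, _pollutant, risk in rows:
        totals[cell] += risk  # risks from separate sources are added together
    return dict(totals)

risks = total_risk_by_cell(emissions_rows)
# 1-in-10,000 (1e-4) lifetime excess risk is a commonly cited EPA benchmark
elevated = {cell: r for cell, r in risks.items() if r >= 1e-4}
```

The real analysis ran the same basic idea over roughly a trillion rows, which is why a distributed engine like BigQuery was needed instead of a laptop.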
And we did a really comprehensive QA process of the top 200 most toxic facilities that we identified, and that led to a couple dozen facilities actually resubmitting their forms to the EPA and us updating our analysis. And that ended up being a story in and of itself. Ava Kofman wrote an incredible piece about the story behind the numbers. And two things. First of all, I was a history major, so until I started working at ProPublica, I didn't know what QA stood for. So what is QA? Oh, sorry, yes, data person here: quality assurance. So the EPA does not check a very large percentage of the data that gets submitted every year. Now, in that Boeing story, as I recall it, one of the things that was notable was that Boeing had incorrectly reported to the EPA that it was emitting massive amounts of pollutants. Did anything happen when they reported that, incorrect as it was? So, no. For years, Boeing reported levels of a very well-known carcinogen, chromium, in quantities that would have essentially poisoned half of the city of Portland. And when we first saw those numbers and mapped them, we thought, we have a real story here. And then once Ava started digging into that story and discovered that those numbers were false, the real story became: how did this horrifically wrong data sit in the federal database for years? And Boeing eventually did correct its numbers, but at no penalty for having incorrectly reported them before. Basically, facilities are allowed to submit as many times as they want to the database and it will be revised. Well, that's disturbing. Ariana, if I might invite you to join the conversation. So having discovered where the hotspots were in the country, what was the next step in outreach and engagement? Yeah, so I personally find this map terrifying.
And we had a feeling that most of our readers, and especially those readers who live in what we call fence-line communities, which are the communities directly within the zone where you might be breathing in industrial pollutants, would likely come to it with a lot of questions. The first one generally being: so what does this mean for me? And we started looking into this question long before we published anything on our site. An engagement reporter on our team named Maya Miller was making phone calls to some of the communities that we'd identified in the data with the highest level of risk. And what she found was that no two of these communities were the same. There were lots of different things happening: different size towns, different size cities, different types of efforts that people on the ground had already been making to that point. There were some places where people had been fighting industrial pollution for years. And then there were other places where people had absolutely no idea that there was an industrial facility set up nearby. So Maya started logging all of those questions. She started figuring out what we expected some of the response to be, and wrote up a guide that we planned to publish at the same moment that we published this map. And we also on our team are constantly thinking about: what are the stories out there that we don't know? What are the pieces of information that people in a community might have that we as reporters can't request from the government, that we can't find in a secret document? And we wanted to set ourselves up to be able to receive those pieces of information. So we drafted a survey, and we distributed it in as many places as we possibly could in order to try to invite that kind of response and basically tell the stories that we hoped would actually make a difference in people's lives.
When I hear you say distributed, I think internet, because that's the world we live in and ProPublica obviously publishes only on the internet, but you did more than that. Yes, in this case, we took advantage of the fact that we had a very physical idea of where people were. We have a prop, I don't know if you can see it on the green screen, it's kind of pixelated, but we sent out, I think, almost 9,000 postcards to people who lived in the most polluted zip codes. We reached out pretty systematically to different community leaders in these places. I had one person on my team spend a day calling yoga studios near an industrial facility, trying to help get the word out. Many of our ProPublica readers, who are likely on this call, actually helped with the effort: printing out flyers, going door to door, talking to their friends, talking to their family and saying, hey, do you know someone who lives in this neighborhood? You should see this. Do you know anyone who ProPublica should be talking to? So I would argue that it was the most extensive outreach effort that we've ever put out. And what was the response like? Did you hear from people? We heard from so many people. One of the things that I love about readers' response to this project was that we heard from people who had been asking themselves questions for a long time. They knew people in the community who had cancer. They knew people who had been wondering about the facility nearby. And it was like the second that they had the information available on our map, they mobilized. And we have no idea what's going to happen when we put journalism out into the world. But we heard, I think within the first week, from nearly 300 people, and all of them, they weren't just writing in to us. They were also looking around and saying: who do I talk to here? What do I do?
And they kept surprising us as they were protesting, as they were talking to local representatives, as they were pushing for the kind of impact that they hoped would make their friends and family and children safer. And as we all know, as I recall it, there was some reaction from the EPA. There was. And I imagine Ken can actually speak to this in great depth if you want to take this one. I'm going to toss this over to Lylla, who helped write the update post here. Oh, Lylla may still be having audio issues, unfortunately. The EPA has essentially vowed to look into our reporting here, and they are evaluating some of the chemicals that we reported on and whether the levels that are currently listed as safe should be listed as such. Texas had a higher allowable level for one of the chemicals that we had written about, which made it a really attractive place for some of these petrochemical companies to operate. And the EPA has told Texas that it should no longer use its more lenient levels. I'm very sorry about that, yeah, I'm sorry. And the other thing that happened was that Congress proposed a $100 million air monitoring bill, which would allocate millions of dollars of funding every year for states to add new air monitors near these sites. And we've also been told by our sources that the EPA is using our map almost as an internal tool to comprehensively rethink its air toxics program. All right, well, I want to move briskly because we have a large number of ProPublica projects and the time is flying by, but before we get to the next one, I'd like to invite our data editor, Ryann Grochowski Jones, to talk just a little bit about something we call bulletproofing. Because you might be asking yourself, wait a minute, how do all these journalists know their statistical analysis is correct? Ryann, how do we know?
Well, we call it the two-pilots-in-the-cockpit rule, especially for a project as large as ToxMap and the other ones you'll hear about during this hour, where basically somebody on my team, the data team, is intimately involved with these large projects that involve data analysis, pretty much from the beginning and especially towards the end. And so bulletproofing in this case started during the reporting process, when Lylla and her co-reporters found that, by the looks of the data, Boeing was poisoning half of the city of Portland, which seemed too good to be true, and it was. But once we figured that out, it was kind of like, okay, so how many Boeings and Portlands are in this data? And so that's when, as she mentioned, we had an all-hands-on-deck effort to track down the actual emissions, call all the companies that we could, and basically figure out what our error rate was and whether we were comfortable with it. And once we were, we continued with the reporting. And then at the end of the project, it's code checks. So, checking to make sure Al and Lylla's code is taking in the data that it's supposed to, outputting what it's supposed to, checking every figure in the story that comes from something we calculated, and making sure that somebody who isn't Al or Lylla can arrive at the same number. As you can imagine, it's a pretty involved process, and one that, like I said, in this case actually started months before the story even came out. However, we do some form of this for any story that we publish that involves what I would call anything more than simple math. It is almost always run by somebody on my team or myself to make sure everybody understands what's going on. Yes, indeed. And I will say, as the editor-in-chief, when I grasped exactly what we were doing, I turned to Scott Klein, one of our senior editors, a deputy managing editor at ProPublica, and I said, you mean if we make a mistake, it's a million-item correction? And he said, roughly, yes.
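The code-check step Ryann describes can be sketched as an independent recomputation: a second person rebuilds each published figure from the raw data with their own code and asserts that it matches the draft. The data and figures below are invented for illustration; this is a sketch of the idea, not ProPublica's actual tooling.

```python
# Toy bulletproofing check: recompute a story's figures independently and
# confirm they match what the draft says. All values are made up.

raw_reports = [
    {"facility": "Plant A", "reported_lbs": 1200},
    {"facility": "Plant B", "reported_lbs": 800},
    {"facility": "Plant C", "reported_lbs": 0},
]

# Figures as they appear in the draft story:
FIGURES_IN_STORY = {"total_lbs": 2000, "facilities_reporting": 2}

def recompute(reports):
    """Rebuild every published figure from the raw rows."""
    reporting = [r for r in reports if r["reported_lbs"] > 0]
    return {
        "total_lbs": sum(r["reported_lbs"] for r in reporting),
        "facilities_reporting": len(reporting),
    }

# The "second pilot": if this assertion fails, the story does not ship.
assert recompute(raw_reports) == FIGURES_IN_STORY
```

The point of the exercise is that the checker writes this logic from scratch, without looking at the original analysis code, so a shared bug can't slip through both versions.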
So we try to avoid million-item corrections. I would next like to turn to our colleague, Irena Hwang, who's gonna tell us a little bit about an absolutely fascinating story that she worked on about Salmonella. And I should say, buckle your seatbelts, we're about to go on a science trip. So, over to you, Irena. Thanks, Stephen, it's so great to be here. So this project started because my co-reporters, Bernice Yeung and Michael Grabell, were looking into a Salmonella outbreak that, they noticed about a year later, the CDC webpage listed as closed in 2019, but notably the page did not say that a source of the outbreak had been found. And so they had a really basic question, which was: was this particular Salmonella outbreak still going on? Which I think is a very valid question, as a consumer of chicken myself. And so they started the reporting process, and they were just trying to understand the scale of Salmonella illness in the United States. But over and over again, they were encountering people who were telling them about a particular type of data, genomic sequencing data, that they found very intriguing but unfortunately didn't have a whole lot of experience with. At that point, I came into the picture, and it was sort of a lucky break, because prior to becoming a data journalist, I had completed a PhD in electrical engineering, where, and I promise this is going somewhere, electrical engineers actually often do a lot of work in bioinformatics, which includes analyzing genomic sequencing data. And so when they mentioned that they were looking at DNA sequences and trying to better understand their role in this investigation, I thought, ooh, this sounds really interesting. I would love to revisit this part of my past.
And so, to boil it all down, what ended up happening was, we took a lot of the traditional tools of reporting, including results from Freedom of Information Act requests, as well as this publicly available genomic sequencing data, combined them, and definitively got an answer: yes, the Salmonella outbreak was still going on. We also put numbers to exactly how the outbreak is still affecting consumers today. So genomic sequencing, just to be clear, we've all become, I think, a little bit more familiar with it from our COVID experience, we're all sort of epidemiologists now, but tell us a little bit about what that is. How does that relate to Salmonella and chicken? What is a genomic sequence? Yeah, so a genomic sequence, you can think of it as a spelling or an instruction set for any life form. Every single individual that is alive has a unique genomic sequence. However, the more closely related two individuals are to each other, the more similar their genomic sequences will be. And so what this means in the case of bacteria is that bacteria that are very genetically similar, that is, have very similar genomic sequences, have a higher probability of just functioning the same way. And that also means making people sick in the same way. This also gives us clues about how related they are, not just in terms of the kind of illness that they can cause; you can get some idea of how they evolved and how genetically similar their ancestors might have been, and this can hint at their origins. And so genomic sequencing has been really interesting for the food safety world, because it allows an understanding of bacteria at a level that previously was just not attainable. We're getting more and more information about what makes particular bacteria unique in terms of the properties that they have to make you sick, how they can resist drugs, and even some glimpses at maybe how they came to be.
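A toy illustration of the core idea Irena describes: count the positions at which two aligned sequences differ, so that a small distance suggests closely related strains. Real outbreak analysis uses whole-genome pipelines, not simple string comparison, and the sequences here are made up.

```python
# Simplified "how similar are these genomes?" check. Bacteria whose sequences
# differ at only a few positions are likely close relatives and may cause
# illness in the same way.

def snp_distance(seq_a, seq_b):
    """Count single-position differences between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)

outbreak_strain = "GATTACAGATTACA"
sample_1        = "GATTACAGATTATA"  # 1 difference: plausibly the same lineage
sample_2        = "GTTGACAGCTTACA"  # 3 differences: more distantly related
```

With real data, researchers compare such distances across thousands of isolates to decide whether new illnesses belong to an ongoing outbreak.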
So again, there's sort of a pattern here. These were genomic sequences that the US government had in its hands, which we reviewed. Had they connected the dots to know that the outbreak was ongoing? Yes, I think the very plain answer is yes, but what was missing was an answer as to why it was still ongoing, and that became a large part of the piece. But yeah, the short answer is that the CDC, the USDA, they all have very qualified and very smart scientists working on this problem, but the issue became a communication issue and a regulatory issue. Communication meaning they didn't tell us. Yes, that's correct. So as we suggested earlier when we were talking about the issue of pollution, and this has been my experience as a journalist and editor, the big question anyone has about a health story is: this is all fascinating, but what about me? So Ken, I gather we did do some things to help the ordinary consumer find out exactly how this related to them. We'd love to hear more about that, and also how we used outreach as a part of that process. Yeah, sure. When the reporters came to us really early on, they said, the interesting thing with this is there are these numbers on all of your packages of chicken, and I encourage you to look for them. They're called P numbers, they all start with a P, and you can go to government data and actually use that number to research some of these institutions. They all tie to meat processing plants. And knowing that, Andrea Suozzo, who works on my team, figured out how, and it was a much more complicated process than you would even think, to take those numbers and tie them back to inspection data and other data about chicken processing plants.
And the end result of that is, she and Ash Ngu, who also works on the team, were able to create a really easy, simple-to-use lookup application that allows you to pick up a package of chicken at the store, read the number, type it into our app, and it will tell you about that chicken processing plant: basically what it is, where it is, and whether there was more or less virulent salmonella found there than at other related processing plants. And we found some interesting things that way. So like whole chicken, for example: a whole chicken you buy at the store has less salmonella found in it, or less bad salmonella anyway, than highly processed chicken, which goes through more machinery and more steps and more processes. The more highly processed the chicken is, the more salmonella we found at those plants. Another interesting thing we found is that a lot of different brands of chicken are all processed through the same plants. We wrote a story about that. And this is one of those things that people had never been able to do: analyze the supply chain. And I will pass this to Ariana, because we had set up a callout and asked people to give us information on the chicken that they had or that they found in the stores. We were able, using images, to tie individual brands and packaging of chicken in stores to plants. Ariana, do you wanna talk about that a little bit? Sure. So basically, the cool thing here was that we had thousands upon thousands of people going to the store and looking up an obscure little watermark on their chicken. And what we wanted to do was, like, there's this incredible amount of knowledge out there that our readers are collecting. What happens if they give it back to us? Because there was no public database where you could go look up how the chickens had traveled.
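The lookup app Ken describes can be sketched roughly like this: normalize the establishment ("P") number printed on the package and map it to plant details. The P numbers, plant names, and salmonella rates below are invented examples, not ProPublica's actual dataset or code.

```python
# Hypothetical sketch of a P-number lookup. Every value here is made up.

PLANTS = {
    "P-1234": {"name": "Example Poultry Co.", "state": "AR",
               "salmonella_positive_rate": 0.08},
    "P-5678": {"name": "Sample Processing LLC", "state": "GA",
               "salmonella_positive_rate": 0.02},
}

def lookup(p_number):
    """Normalize user input like 'p1234' or 'P-1234' and find the plant."""
    code = p_number.strip().upper()
    if not code.startswith("P-"):
        # accept "1234" or "P1234" by rebuilding the canonical "P-1234" form
        code = "P-" + code.lstrip("P")
    return PLANTS.get(code)  # None if the number isn't in the dataset
```

The real work, as Ken notes, was joining those numbers to inspection records and other plant-level data, which was far messier than a dictionary lookup suggests.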
Like, I live in New York, I get a certain kind of chicken. Where does that come from? But what about my family in Missouri? And at this moment we had me looking at chicken, I don't eat it, but I went to the store to look up chickens anyway. We had people in Missouri going to look up their chickens, and we had generous ProPublica readers from every single state going to the grocery store, looking up chickens, typing in the P number and helping us start to map out parts of that supply chain. And there unfortunately isn't a publicly available database for any of this. So when we don't have one, sometimes it can be pretty powerful to start to create one of our own, even if it is incomplete. So before we jump off this, Ken and Ariana, if somebody wants to go to that app, could we mention what the URL is? And if somebody wants to send chicken checker data in, I assume that app will help them do it. propublica.org/chicken. Chicken, yep, propublica.org/chicken. Okay, so anybody out there who wants to either check out their chicken or participate in our ongoing attempt to map the supply chain, feel free. I know my own household eats chicken a couple of times a week, so we can add to the data. So I wanted very quickly, because again, I wanna leave some time for questions, to get to our colleague, Ellis Simani, and have him talk a little bit about something that was a huge scoop this year, which is ProPublica's obtaining of data relating to the IRS. So once again, Ellis, I'm imagining that it was just a piece of cake, everything just rolled in and we started right in the next day. Is that about the way it went down? Oh yeah, yeah, it was a walk in the park. No, it was definitely the highlight of my journalistic career thus far. It's not every day that such a historic and really monumental leak comes across your desk as a journalist, and the first step when you get any kind of source of information is to always vet it.
I saw in the questions that somebody was curious to know a little bit about what that process looks like for us. It really isn't, in some ways, that different from any kind of dataset we get as data reporters. We always wanna take the time to treat this as we would any other source and ensure that we understand what it is and whether it is what we think it is, which was definitely a big part of this project, because this wasn't data that we had solicited in any way. It just kind of came to us, in large part, we understand, due to the reporting that some of my colleagues, Jesse Eisinger and Paul Kiel, had done looking at the gutting of the IRS as an agency. And among the steps that we took, it's really just looking at: what is the information in this data? How does it compare to what's in the public? So for folks that don't really know too much about the project, essentially, we received a wealth of records concerning the taxes of some of the wealthiest folks in the country, the 1% of the 1%, the Elon Musks and Jeff Bezoses of the world. And really, our task as journalists was to make this information digestible, interesting, and fun to read about. I think that, for me, was one of the most enticing parts of the experience: just finding ways to make taxes interesting. And I think that was something that we really took a lot of steps to do, whether that was finding ways to tie Peter Thiel's $5 billion IRA to Lord of the Rings, or having fun with talking about the ways in which sports owners can use their teams to lower their tax bills. It's really been a project that has been a labor across several teams in the newsroom, which is something else that we also get a chance to do: really collaborate. I think data is in its very nature a very collaborative process, and it really brings a lot of talent together to try and tell really complex stories in simple and clear ways.
To back up just a little bit to the beginning question, because I think it's a good question: so this data comes to us from a source whose identity to this day we don't know. And it is a lot of material about a lot of people, some of it pretty darn private. How do you know that somebody isn't trying to do something disinformation-wise, so some of it's correct and some of it isn't? How did we verify that this was what it purported to be? How do you do that? Yeah, so what that looked like was really going through, for dozens of individuals across our data, and comparing what we saw to both public and private records. So in some instances, there's a well-known op-ed that Warren Buffett wrote about his taxes in which he cited specific figures that we were able to compare to what we were seeing in our data. There are other instances in which politicians who might be running for some kind of governmental office might disclose some of their records. So that was another piece of information that we could use to compare. But we really had a wide breadth of data that came to us, and it wasn't just line items on people's individual tax returns. It really spoke to specific entities that we also reported on, like people's sports teams. And there have been leaks that have come out into the public documenting the profits that certain sports teams make. And that was very crucial in some of the reporting I was a part of: comparing the information we were seeing in our data to what had been put into the public. And oftentimes, we were really able to expand on what had previously been reported, with a lot more breadth and precision, given the records. And at the end of the day, any data can always have errors.
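The vetting Ellis describes, comparing figures in the received data against independently published figures, can be sketched as a tolerance check. All labels and numbers below are invented for illustration; this is the general shape of such a cross-check, not the team's actual verification code.

```python
# Toy cross-check: does a figure in the received dataset agree with a figure
# someone published independently, within a small relative tolerance?

def matches(dataset_value, public_value, tolerance=0.01):
    """True if the two figures agree within a relative tolerance."""
    if public_value == 0:
        return dataset_value == 0
    return abs(dataset_value - public_value) / abs(public_value) <= tolerance

# (description, value in received data, independently published value)
checks = [
    ("Person A, tax figure cited in an op-ed", 24_700_000, 24_600_000),
    ("Person B, figure in a public disclosure", 3_500_000, 3_500_000),
]
verified = all(matches(d, p) for _desc, d, p in checks)
```

Running dozens of such comparisons against op-eds, candidate disclosures, and prior leaks is what builds confidence that an unsolicited dataset is genuine rather than fabricated.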
And one of the things that's crucial to all components of the reporting we do here is giving people who we write about the opportunity to dispute or offer an opinion on anything we write. And so everybody who was named in the story had an opportunity to review the numbers we would use. And to my knowledge, nobody has disputed the accuracy of any of the figures. A lot of times they've had, you know, sometimes hundreds of words to add in terms of contextualizing how they view the taxes that they pay. And we took the steps to publish that as well. And so you can read what, say, the owner of the Cleveland Cavaliers has to say about the taxes that he pays, or Michael Bloomberg or Warren Buffett; they also commented on how they feel the tax system relates to them. But those are all just pieces of a long puzzle of the reporting process. Looking at all this stuff, which you've now done for more than a year, what surprised you from all this? What struck you the most when you looked at all this material? Yeah, so, you know, I'm not somebody who had done a ton of tax reporting prior to this experience. And the biggest thing that really came full circle, as I was actually listening to Trevor Noah talking about billionaires and taxes on the Daily Show not too long ago, is really just how much the ultra-wealthy exist in a completely different system than we do. One of the key things that we tried to hit home was this notion of income: for normal wage workers who go to work, get a W-2, and have their wages filed in a normal fashion, income is just a normal thing that you report to the government. It shows up on your taxes. But for a lot of the ultra-wealthy, income is a choice.
You know, a lot of folks have stock options and other means of taking in money that exist in a way that isn't necessarily available to the normal taxpayer. And that's something that throughout the series we really tried to hit home: the menu of items that are available to a select portion of folks in the country that aren't really available to the majority of others. And that was something I don't think I had a really full conception of prior to digging into this stuff. But I think that's best personified in this notion of income, and the idea of a selection of people in the country having a choice as to whether or not they tell the government that they made money in a given year, and whether that ultimately is something that's taxed. It's really a choice at the end of the day for some folks. All right, let me get one more project mentioned here before we go to questions. Certainly one of my favorites. ProPublica has a program where we cooperate with local journalists. They submit ideas for projects, and we support the reporter for a full year and provide a lot of additional help. And I'd like to ask Ariana to talk about The Palm Beach Post and their story on the burning of sugar cane fields, which turns out to be, we're back to the environmental world, incredibly hazardous to people's health and breathing and so on. So how did that project work? Because the data, once again, was not necessarily just sitting in somebody's hands waiting to be obtained through the Freedom of Information Act. So how did we get data for that one? Yeah, so I think environmental journalism comes up in conversations like this all the time because it's particularly important when you're looking at the types of evidence and who is being affected.
And so in this case in Palm Beach, there was a community of people who for decades had been concerned that they couldn't breathe when the local sugar companies, there's a major industry there around sugar, did what they called a burn, which means that they were clearing the fields. And so if you live in this community, you'd see pieces of ash literally floating in the air. There are people with asthma, there are kids with allergies, and they would just have certain days where they wouldn't go outside because it was so dangerous for them. And this was a suspicion that the community had had for a long time. It wasn't a secret. It was something people had been talking about, worrying about, concerned about. But the EPA, local monitors, the sugar companies themselves said, don't worry about it. I know it seems bad, but the air is fine. It's completely safe to breathe. So when Lulu Ramadan, the reporter at The Palm Beach Post who helmed that project, came to us, she was like, what do we do to actually investigate what people are saying? And there was no air quality information that we felt we could trust, that we felt was good enough to be able to reassure people. So our team, in combination with Ken's team, figured out a way to give people the power to actually test the air themselves. We bought a set of PurpleAir monitors, I think at first it was six and ultimately five, and deployed them into the community in people's homes. And at the same time, we recruited a larger group of people from the community to respond to a text bot every time there was a burn. So there's a burn one day, we see that there's a spike in the PurpleAir monitors, and we immediately send a survey to all of these people who agreed to participate and report on the air, to tell us whether that corresponded with the moments people saw ash outside and were having trouble breathing.
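A minimal sketch of that trigger logic, watch the sensor readings, and when particulate levels spike past a threshold, send a survey to everyone who signed up. The sensor IDs, threshold, phone numbers, and `send_survey` callback are all hypothetical stand-ins; a real setup would poll a sensor API and use an SMS service:

```python
# Sketch of a "spike -> survey" pipeline. Threshold and data are
# illustrative only, not an official air-quality standard.

SPIKE_THRESHOLD = 55.0  # PM2.5 in micrograms per cubic meter (assumed cutoff)

def check_for_spike(readings, threshold=SPIKE_THRESHOLD):
    """Return the sensor IDs whose latest reading exceeds the threshold."""
    return [sensor_id for sensor_id, pm25 in readings.items()
            if pm25 > threshold]

def notify_participants(spiking_sensors, participants, send_survey):
    """If any sensor is spiking, send one survey per participant.

    Returns the number of surveys sent.
    """
    if not spiking_sensors:
        return 0
    for phone in participants:
        send_survey(phone, f"Elevated PM2.5 on sensors: {spiking_sensors}")
    return len(participants)

# Simulated run with fake readings and a fake sender that records calls.
sent = []
readings = {"sensor-1": 12.3, "sensor-2": 81.6, "sensor-3": 47.0}
participants = ["+1-555-0100", "+1-555-0101"]
n = notify_participants(check_for_spike(readings), participants,
                        lambda phone, msg: sent.append((phone, msg)))
print(n)  # number of surveys sent this cycle
```

The design point is the pairing: the quantitative spike from the monitors and the qualitative survey responses arrive for the same moment in time, which is what let the team line them up.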
And what we found at the end of this was that the EPA did indeed have monitors up, but they were missing things that our monitors were catching. And a lot of this anecdotal, qualitative information added up into something that really, really, really needed people with better monitoring techniques than we had with our five PurpleAir monitors to come in and check out. So we couldn't collect the kind of data that perhaps a bigger institution could collect, but we could at the very least collect enough of it to identify a problem. I think we're gonna jump into some questions here, because we still have a goodly number of folks and I know there are a number of questions floating around. So, Rocio, why don't you take it from here for a moment? Yeah, we are going to jump over to our Q&A, but before doing that, I just wanted to share a link to our event survey in the chat box, which I am adding in now. We do appreciate your feedback and take into account everything that you share in that survey for planning future programs. In fact, this very session was curated in response to the feedback that we received from you all requesting to learn more about our data and engagement practices. So we do pay attention to those. Please take a few moments to submit some feedback; it's a very short survey. And again, if you'd like to ask a question, click that Q&A icon at the bottom of your screen to submit it to us, and we'll get to as many as we can before we close out the session. So I'm going to go ahead and throw it back to Steven for our first question here. Okay, well, this is one I think I might throw over to Ryan. An attendee asks: how does your organization deal with issues of the Freedom of Information Act around getting data? Have you found ways to speed up requests? Have you found a way to ensure you get it as raw data? So Ryan, how are we doing on that?
If I could do that every time, I would ask the genie for two other wishes, because clearly somebody must have granted me amazing powers. I mean, at ProPublica, we're lucky enough to work with a great legal team who is very supportive of helping us, from even just starting to write a FOIA letter, helping us come up with language for the best way to get what we need, to, as often happens, getting an initial denial and having a strongly worded letter to return to the agency to let them know we're not fooling around, to also, where appropriate, going to court. And in some cases, that's suing for access to the data. So knowing that we have that in our back pocket is really great. We also have a FOIA Slack channel, which gets at another great thing about ProPublica: I work with dozens and dozens of amazing, accomplished investigative journalists, all of whom have written countless FOIAs before. So we have a FOIA channel where people will ask questions and say, hey, have you ever had any luck getting records out of this agency? Or, I got this response back from this agency, has anybody ever seen that? And everybody is super helpful. Personally, with the data question, the thing we run across most is agencies who follow the letter of the law and will try to give us back responses, data, that are printed-out sheets of a spreadsheet or a PDF, when it's very clear that they have the data in a CSV or another tabular format. And so that's normally when I'm going to our legal department for help in actually getting the data back from an agency in a file type that we can actually use for analysis. So yeah, I don't know, FOIA, we could do a whole other hour about FOIA techniques. Yes, we could. And I just want to say, as editor-in-chief of ProPublica, Ryan's somewhat relaxed, patient description of this in no way describes how journalists really feel.
How we really feel is that the government, virtually any government outside of maybe Florida and sometimes Texas, doesn't follow its own laws, does not turn the data over in the forms it's supposed to on a timely basis, and it's an outrage. And the reason it doesn't happen is because your government would rather not turn over the data and therefore doesn't staff these departments. But anyway, I'll get off my soapbox because we've got to get to a good nerd question, which is favorite data tools. Alice, Ryan, everybody, what are your favorite data tools? The nerds in our audience, of whom I suspect there are more than a few, want to know: what do we use? Jump in. I saw somebody talk about this in one of the questions. I was just mentioning that a lot of folks on our team really love open-source notebook software, whether that's Jupyter notebooks or Observable notebooks. Essentially, tools that at their heart are really collaborative and allow us to iterate through working on a visual or a piece of analysis in a way that we can send to an editor or another reporter really quickly. And so those are the tools that I really like and keep coming back to, but I'm curious whether there are others. I know Leila mentioned Google BigQuery; that's one that I think has kind of been on the bubble as well among the teams. Other favorite tools, don't hold back. I think on our team generally, we're split between R and Python. There are two camps. And I think with a good backbone of SQL that nearly everyone is comfortable with, which is useful in using things like BigQuery. I think I saw somebody write a question about Excel, and I still use Excel pretty much every week, just for small things. So if something comes my way and I need to check it really quickly, Excel is tried and true. It can't handle big datasets, but it's great.
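As a small, self-contained illustration of the "SQL backbone" the team describes, here is the kind of quick aggregate you might run in a Jupyter or Observable notebook, done here with Python's built-in SQLite so it runs anywhere. The facility data is invented purely for illustration:

```python
# Quick-and-dirty SQL aggregate over a small dataset, the sort of
# sanity check a data reporter might do before reaching for BigQuery.
import sqlite3

# Invented example rows: (facility, state, tons of emissions reported)
rows = [
    ("Facility A", "TX", 12.1),
    ("Facility B", "TX", 30.4),
    ("Facility C", "LA", 8.2),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emissions (facility TEXT, state TEXT, tons REAL)")
con.executemany("INSERT INTO emissions VALUES (?, ?, ?)", rows)

# Total reported emissions per state, largest first.
for state, total in con.execute(
    "SELECT state, SUM(tons) FROM emissions "
    "GROUP BY state ORDER BY SUM(tons) DESC"
):
    print(state, round(total, 1))
```

The same `GROUP BY` habit transfers directly to BigQuery or Postgres when the dataset outgrows Excel.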
But it is kind of funny to see the folks on the data team try to convince each other, like, oh, R can't do this, can it? Or, I'll teach you how to do this, but you can only do it in Python. So it's a lot of nerd rivalries, I guess. The best kind. Very good engagement question I wanted to pose to Ariana. To what extent, if at all, do we, ProPublica, help communities organize once we have obtained this data? And if we don't do that, what do we do? Yeah, we are journalists, we are not advocates, and we don't help communities organize for policies. We don't help them organize for political causes. What we do is journalism. So when we know something is important to cover, we want to do the absolute best job we can covering that issue. And it's also really important that we don't go into it with a particular point of view before we decide to crowdsource from a community, or as we continue to cover a community, because then people could come in with fair criticisms of whatever we find. We need to go in objective. The way that we like to think about this is that we have full editorial control over what we decide to cover, but we do have a point of view as we make those decisions. So if people come to us and they say, there's this incredibly terrible thing happening in my community, we want to write a story about it. And we feel like that's kind of our biggest lever for being able to lead to impact, to create change. When people participate in our investigations, we do keep them posted as things happen. So in many of the projects that I've seen in my time here, there has been some amount of impact. And as we stay in touch with people and talk about the issues as time goes on, we've seen that inspire change just by the very fact of people being informed. But it's a really interesting line. It's something we talk about a lot.
We talk about it a lot on all sides of our organization, but particularly on my team. So if you ever want to talk about this, I have at least seven people who would love to talk about exactly where that line sits at any moment. I want to follow up with you, and then with our colleagues, who I think can all give advice when this question arises: what advice would you give early-career journalists who want to be engagement reporters? I'd invite others to talk about their specialties as well. So, Ariana, what would you tell a young person, a young journalist who wants to be an engagement reporter? Read as much journalism as you possibly can that involves crowdsourcing techniques. Figure out what exactly they contributed to the story. See if you can figure out if there was a callout, if there was a survey, if there was some big outreach campaign. How exactly did that make it into the published piece? Try to reverse-engineer from what you think the technique was to what the finding was. How did this contribute to the headline? And then I would say engagement reporting, data reporting, working on our news apps team: it requires reporting. It requires honing your journalistic instincts just like any other kind of reporting does. So: learning how to interview really well, thinking about people's motivations for why they want to talk to you about something, learning how to bulletproof. Someone in the chat asked a question about how we verify submissions, and that's something that people on my team certainly spend a lot of time doing. And that's fact-checking. There are young journalists who start as fact-checkers at different publications; that can be a really interesting way into the field. There are people on my team.
There are jobs across the audience sector where you learn how to make things show up in search, how to get people to attend an event, how to write useful guides, and all of that, even if it's not necessarily investigative in nature, gives you a toolkit that you would find yourself using as a journalist here. Leila, what about you? News apps, what does a young, early-career journalist want to think about? Yeah, I mean, I think first of all, it's certainly helpful just on a basic level to develop your skill sets. You're gonna be able to think so much bigger about the stories that you can create if you can code on the front end, if you can write JavaScript code. And then also to be flexible. Me personally, I really like to also have my hands in GIS mapping. I'm really interested in QGIS and developing my skills in that area. And that's been really helpful to me, not only as a tool of visualization, but also as a tool of story finding, a tool of analysis, a way of thinking visually. So I think developing that toolbox and that skill set is really helpful. And then, I actually started out working in local radio, just pounding the pavement at community board meetings and shoving microphones in people's faces and not really knowing what I was doing. But that was very helpful because I just learned a ton of the nuts and bolts of basic reporting, like Ariana was saying, and I think that's really invaluable too. And I feel that a lot of stories you can get from your desk, but a lot of them you can also really get from the field, and that's what the radio reporting taught me. So I think that's a valuable place to start out, too, while you're developing the other, more technical skills. Alice, Irina, what advice would you give an aspiring data journalist?
Well, I think I'll actually lead this into one of the questions we've got, which is that somebody asked if we collaborate with academics or work with folks in that field. And I think, to me, one of the draws of coming to ProPublica and continuing to work here is just that we have the time and space to really go very deep into particular subjects. And that's one of the things that, for me as a young data journalist, I still consider myself a young data journalist for sure, I would just suggest continuing to do: find things that you're passionate about and really sink your teeth into them. And I think sometimes there can be, I know I feel personally, a lot of anxiety around how quickly technologies come and go, and new tools, and what's the cutting edge for making slick visuals. But I think that at the end of the day, a good story can be told in a number of ways. And for me, I just try to ensure that I'm being creative in how I approach data, whether that's building datasets that don't exist or collaborating with folks who are really deep into research and can offer new insights. I think just being well-rounded is something that I've found to really be beneficial. And yeah, that's one of the things I would suggest. Irina? I think I'm definitely someone who learns by doing. And so for me, I would just say, echoing Alice's point, don't get overwhelmed by fancy tools. Pick a project or a thing that you're really interested in and just try fiddling with it, in maybe a new software language. And also think about reading widely too. Whenever you think about an issue that you're puzzling over in your head, think: well, what are the numbers that would convince someone further? What would I really want to know about this topic using numbers and statistics?
One of the things that really has been striking to me, as someone who went from the newspaper age to now: the bad news was that the entire business model of journalism was destroyed by, essentially, modern life, the internet, the rest of it. But we have never had such tools as we have today to do original research. I want to come back to you, Ryan, for maybe just one last question. It's a little scary being out on the cutting edge, isn't it? I get scared sometimes. We used to have somebody whose favorite thing was to walk over and say to me, we're going to do a data project, and what we've concluded is that everybody in this field is wrong and we know the one true way. And that always frightened me a little bit. I'm sure it scares you too. How do we assure ourselves that we're not off on a tangent somewhere? Wow, that's a heavy question. Well, I think number one is coming to these inquiries without thinking that way, that what we figured out is the one true way. Often there isn't just one true way; there are many ways to look at it. And as long as we've really done our research and our reporting, and can at least determine that our way is defensible and that we feel good about it, then it's worth sharing. I think that's the best we can do. I came to data journalism as a journalist, which is actually kind of not a common thread on our team. The majority of the folks on our team actually came, like Irina, from the sciences, or from computer science or mathematics or statistics. And what that has taught me, I always ask my team, well, what was great to you about journalism? What made you want to leave what you were studying and become a journalist? And nearly all of them have the same answer, which is just that the speed at which you get your work out and see results is so much faster than when you're working in a lab or toiling in academia.
We're using all the same skills and the same approaches, but we're actually getting to go out in the field and talk to the people behind the data. Although, Steve, you might disagree, as some of our projects take months and years. You may think that this is silly, but that often is working a lot faster than our friends in academia. So, to me, if you're interested in data science and in measuring and understanding the world around you, data journalism and engagement journalism is just an amazing way to do investigations that really make a real-world difference and get out into the world. So, I don't know if that really answered your question, because I think we all have to be careful and never feel like we're 100% correct about anything. But we can get to 99%, and hey, that's pretty good. And that is a great philosophy. With those humble and humbling words, I think we will call it a day, and thank everybody for their questions and our panelists for sharing their experiences. Yeah, that is our time for today, all. If you enjoyed this program, I do encourage you to check out our next program happening on May 26th, which will focus on affordable housing. You can take a look at our events page; we'll drop the link in the chat to register for that. I wanna thank our panelists again for this incredibly engaging conversation, and of course our moderator, Steven Engelberg. Thanks again to McKinsey and Co. for the support of today's event, and thank you to our audience for joining us and all of your thoughtful questions. You all really submitted a ton of questions, more than we've had in a while. So, thank you all so much again for that. Again, this event has been recorded; you'll receive an email with a full video of today's event, and we'll also post the recording on the ProPublica YouTube channel. And from all of us at ProPublica, thank you again for joining us.
Have a great rest of your afternoon and we hope to see you next time. Take care. Thank you.