Cool. Hi everybody. I'm Kelsey Breseman, and I'm from the Environmental Data and Governance Initiative. Today I'm here to talk to you about our project, Environmental Enforcement Watch, which is an approach to environmental data justice through participatory data science. So that's already a mouthful, and you can look us up at these Twitter handles down here. But basically my organization, the Environmental Data and Governance Initiative, which we call EDGI, is a group of people who got together mostly in the wake of the Trump election over concerns about environmental data and how it was going to be handled with an open climate denier having control over a lot of this information. I actually wasn't there at the beginning. What EDGI has grown into over time is more of an interdisciplinary collaboration between people who are volunteers and people who are paid through foundation grants, up to half time. Nobody's a full-timer. We have people filter in and out as suits their schedule and interests, and we organize based around what people find worthwhile to do. So there's no real management or bosses. It's much more of a, like, does this seem like a valuable thing to do? And if not, we just don't do it. So what you see on this slide is a compilation of a bunch of covers of reports that have been collected over time. And one of the things that I really like about EDGI and our approach is that it's very interdisciplinary. So we do work on data, and it's right there in the name. But we also have this fully contextualized concept of how to critique the federal government and its accountability for environmental governance. And that includes sometimes anonymous interviews of people inside the EPA to get their take on, like, how does this administration treat you, or treat this kind of science or policy?
It also includes web monitoring, where we've done a lot of web scraping and then hand accounting for changes, including just typos, in federal agency websites over time, and whether that's important or not. And then over time, as we've been working on this, what we've seen is much longer-term patterns than just a single administration. As many of you may know from the previous conversations in these presentations, there's a lot that's just not done well in general with respect to data. Part of that is just a failure to modernize, but a lot of it is also rooted in racism and much bigger problems. And so what we work on here, especially through the Environmental Enforcement Watch project, is primarily one data set: the EPA's Enforcement and Compliance History Online, or ECHO. I'll talk more about that later. And we look at it through this lens of environmental data justice, which combines environmental justice and data justice. So you might have heard about data justice: it's about, you know, whose data is this, whose data matters, and in what way does this data need to be formulated in order to be understood as meaningful or legitimate. I'll talk more about that later as well. And then we combine that with environmental justice, which says, okay, well, who are the people who are bearing the worst brunt of environmental problems, such as climate change? And of course, you'll see a lot of overlap with historically marginalized societies. You know, in general, none of this is terribly surprising. But our lens is to say, okay, well, how does data intersect with that existing injustice? How does the collection of data, the process of the collection of data, the legibility of the data, like who is it legible to and why and for what purpose, play into that? And does that actually protect the people on the ground who most need protection? So that's our basic challenge.
Taking this database and saying, is the EPA protecting me? And what we do is we go to communities, and this will be environmental justice communities or waterkeeper groups or activist groups. It kind of runs the gamut; basically, we'll engage with whoever is excited to work with us. And we try to help them answer that question, in this case using this data set. A little bit more about the data itself. So all of the data that we're using for the Environmental Enforcement Watch project comes from this open data set. If you go to echo.epa.gov, you would see this home page. And there's a lot here. There's both too much and too little, I would say. Just to run you through what the user experience of their website is: if I put in the zip code where I grew up, I get a map that looks kind of like this. I've already opened one of these, but you can see these little flag things that are different colors with stripes on them. Not totally sure what those mean. Actually, I've been using this for a while and I still don't know what those mean. And if you open one of them up, you get a particular facility. The way that this is arranged is basically that facilities are monitoring their data and reporting it under the federal EPA; state EPAs are not included in any of this. And they're reporting under the Clean Water Act, the Clean Air Act, or the Resource Conservation and Recovery Act, which is more or less hazardous waste. That's everything from energy generation plants and wastewater treatment to, like, your local CVS, which might have one of these, or probably does. Anyway, so if you click on one of these, you can see what's being monitored here. Super helpful. The one I happened to click on is located at "Unspecified, Washington." Not really sure what this is. It looks like it's probably some kind of road construction thing.
And under the Clean Water Act, it looks like 12 out of the 12 quarters that are shown here are red, which means there are probably violations. Days since last inspection is "not applicable," so it has probably never been inspected, and there's been no recent enforcement. This is actually one that I just clicked on. It's not one I selected to showcase. You know, they're not all this bad, but some of them are worse, and that's what you get. If you want to drill down on that a little bit more (I think this is for a different facility, just so you can get more colors on here), basically, if you clicked on that link at the top about the project facility itself, you'd get a whole bunch more. And what you'll see is like, okay, well, no violation identified, no violation identified, violation, other violation. I don't know what that means to you. And then you have a bunch of quarters of significant non-compliance, and it looks like failure to report, DMR not received. Because I've been working on this for a while, I know that DMR means discharge monitoring report, and it has to do with what actually got released into the water. And that's not actually written anywhere on this page. I did clip it, obviously, but that's all you get. And you get a bunch of other things that you probably wouldn't understand as an average member of the public. And what we see here is that there's missing data. So did they break the permit? Did they not? Well, I mean, they didn't report, so we can't know. And what that can mean is, okay, well, if I'm trying to make a statement about whether the people in my community are getting unusually sick based on the release of a certain chemical, I just won't be able to prove that, because the data is just missing. So there's a lot of information. There's a lot of patchy information. What can we do with it? So we started making a different interface to this that was oriented less around enforcement itself.
So that dashboard is mostly used, I think, by the EPA and by facilities so that they can basically be in public conversation about, did you fill out your permit paperwork? Yes, I did. No, I didn't. Was I fulfilling the terms correctly? But it is our opinion at EDGI that this sort of information is of interest to people who actually live there. So we decided to see what we could do by talking to local communities. And we wanted to get people hands-on with the data, so that it's not just us saying, hey, did you know that a bunch of bad stuff is happening around you? We wanted it to be a little bit more empowering than that. You can already tell, if you go look at this stuff, it's kind of like, okay, well, that's bad, but aren't you the people I should be telling? Shouldn't you be doing something about it? And they're not. They're not resourced for enforcement because, as we tracked during the Trump era, a lot of the funding was cut, and it was already going down. But that's, you know, a sideline. Anyway, so we decided to give people a tool to say, okay, well, if I wanted to advocate in my community, maybe I want to go talk to a facility directly, or maybe I want to make a big stink so the EPA has to come check this one out, or something like that. I want to be able to say, hey, I looked at the data personally, and this is what I found. So we built a number of notebooks. And we specifically built them so that people who don't necessarily know anything about code should be able to step through them and not feel like it's way above their heads. And we're working on different methods because, as I'm aware, these are still kind of scary to people who don't code. But we did put some things in here. So, like, beginner instructions: in the markdown of every notebook, we have this "How to run this notebook." If you click on a code cell, a play button will appear.
We use Google Colab to host these, which is where we get these little play buttons. And so it'll run the code. And then we add comments that are not just developer comments along the lines of "this is what this function does." They say, okay, this is going to build a query by appending details to a variable called SQL. And I was just going to call out, too, that we work with a partner at Stony Brook University. With ECHO, you saw those 12 quarters of data. They actually have a lot more data, but they only display the last 12 quarters, which is not super great if what you're trying to do is compare across, for example, presidential administrations, or if your issue happened more than three years ago, or if you're looking for real trends. That data goes back at least 30 years. I think it's patchy as it goes. But what our partners at Stony Brook have done is set up a script that scrapes, I think it's on the order of every week, and compiles and joins a whole bunch of different CSVs to make them SQL-queryable, which they are not through EPA's own resources. And we're actually in the process of working to make that more publicly accessible. But if you want it now, hit me up and I can set you up. And then we go and talk to community members. Usually we do this two-step process, where the first time we meet with them we say, hey, here's the kind of thing that we can do. And then we sit down and we listen and we say, okay, what is it that you need? What are you working on? And it works best if they have some active project. So this is one in Alaska that said, we want to keep our waterways pristine; which companies should we talk to about improving their specific effluent emissions? Or an activist group we talked to said, does my legislative district enforce environmental rules well? Because what they would like to do is host a sit-in at the legislator's office.
Or this year, we had a group say, okay, there are oil storage facilities in our community. And it's an environmental justice, like, historically disenfranchised community. There are new permits up for review. We don't really understand what's in them, what's different about them. We don't have time to do, like, a huge data analysis. And so we said, okay, well, we can spend some time on it. We got a whole class involved in this, and it still took us over a month to get some pretty basic details. But we were starting to be able to answer the question of, are these new permits meaningfully better than the old ones, and what needs to change? So, one of the first things we do: as I mentioned, we made a number of notebooks, and they analyze kind of the same things. It's mostly like, did they fill out the permit correctly? Is the permit being followed? Is it being enforced? If so, how, like, what kind of penalties? When's the last time there was an inspection? And we turned those questions, you know, the waterways, the legislative district, the "our area," into meaningful geographies: watershed, congressional district, zip code. And then these are little clips from the notebooks. We use a lot of maps and say, okay, well, this is approximately the outline of a watershed. Now we can determine what effluents are being emitted within that space. In the congressional district, we can say, okay, in Massachusetts congressional district four, Varney Bros. Sand and Gravel is one of the worst Clean Air Act violators. In Buffalo, New York, here are some of the sites of Clean Water Act violations over the last, you know, however long. And we do this in the setting of talking to people, because it's not just about the data that's in here. It's also about the data that people are bringing to us. They're saying, hey, there's a lot of people getting sick around here, and we want to find out what's going on. Or, hey, like, I'm worried about my future. Who do I fight?
Like, what do I do? And so it's very much a collaborative process, trying to work as closely as we can with people who are really there, really experiencing it. Here to say five-minute warning. Thank you very much. And I have this last bit, slightly as an appendix, but I want to run you through it anyway. Basically, the data itself has a lot of challenges. And the last presentation we heard was excellent on this, too. This is a very complicated slide, and I'm in the process of uploading it to Zenodo, so you can look at this more closely. But with this data, first of all, we're having facilities self-report data. We know that sometimes they lie. Even if they don't, there's a lack of inspection. There's bad recording. Sometimes this is submitted on paper and then transcribed. Sometimes it's not transcribed. There's a lot that's not submitted to the EPA. And then it takes weeks to months for it to get into the database, but there's not a guaranteed time. So when we were trying to track whether there were changes over COVID, we couldn't, because data just kept appearing months later, and we're like, okay, so now is it final? You can't really prove anything because you can't guarantee anything. And that makes a terrible headline if you're trying to make the news. This is an extremely skimmable slide. But one of the things I want to point out here is that this is the paper that introduces the concept of environmental data justice, and it comes out of EDGI. And it specifically calls out that one of the things that you can do in order to make progress not happen for a community is to generate uncertainty. And that's absolutely what one of these things does. We'd like to advocate for participatory processes so that people can show up and say, hey, I need something equitable. I need something transparent. And I made this point already. There's a lot of contributors here.
If you're interested in joining, there's a lot of people who come in just for a little project and then dip back out. And I also want to thank our funders. I think I have time for questions. You absolutely do. We have about three minutes for questions. And so we've got one question in the chat. It looks like it's from Jonathan, who's our other organizer in the session. He's wondering if you've had feedback on the commented notebooks from your intended audiences. Are these effective? Are they meeting their needs? Yeah, that's a really good question. And it's one that we're hoping to answer better and in a more structured way, I guess. Usually when people interact with our notebooks, it's the first time they're seeing code at all. And we haven't seen people, like, spook and just say, oh, I can't touch this. So that's a pretty good sign where I'm concerned, because I've been teaching code for a long time on the side, and you see a lot of people spook. But it's also definitely not the most accessible. And we're looking into right now whether we should put an R Shiny app on our website to make it a little bit more so. We'd love someone's expertise if they have great ideas for this stuff. Oh, I bet this is the right crowd to ask that. So I have another question. I'm curious: you made this comment about the timelines, that the data as it's released covers quarterly data for the last three years. But as you said, the timeline of sickness due to exposure or other issues is so much longer than that. So do you have any examples of being able to push for more historical data, or how are you addressing that disconnect? It also seems like a problem in our political cycle, right? Like, we're electing someone and it's just four years, but then the repercussions can spread out year over year. So I'd love to hear you talk about that a little bit. Yeah, there's just so many problems.
It's hard to choose just a couple. We do actually have access to longer-term historical data. It's hard to get into all of the details going a long way back without a bunch of data gaps. You actually can't even tell how many facilities are supposed to be being monitored, if you're trying to get that kind of information. But we do actually have data dating back a long time. It's just not accessible through their UX or through their API. You have to download the CSV files and manually stitch them together. So it does exist, but it's very hard to make a case. All we can tell people is, at minimum, these bad things happened, but probably more bad things happened. Right. Well, on that note, thanks, everyone.
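
To make the CSV stitching mentioned in that last answer a little more concrete, here is a minimal sketch of loading CSV extracts into a SQL-queryable database and then building a query by appending details to a variable called `sql`, with the kind of step-by-step comments described for the notebooks. This is not EDGI's or Stony Brook's actual code. The table names, column names, zip codes, and sample rows are all hypothetical stand-ins and do not reflect the real ECHO schema. It uses only Python's standard `csv` and `sqlite3` modules.

```python
import csv
import io
import sqlite3

# Two tiny made-up CSV extracts standing in for downloaded files.
# All column names and values here are hypothetical, not the real ECHO schema.
FACILITIES_CSV = """registry_id,name,zip
F1,Example Gravel Pit,98000
F2,Example Wastewater Plant,98000
F3,Example Pharmacy,98111
"""

VIOLATIONS_CSV = """registry_id,quarter,status
F1,2019Q1,violation
F1,2019Q2,violation
F2,2019Q1,no violation identified
F3,2019Q1,violation
"""


def load_csv(conn, table, text):
    """Create a table named `table` from CSV text, one TEXT column per header."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    columns = ", ".join(f"{name} TEXT" for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f"CREATE TABLE {table} ({columns})")
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", data)


def violations_by_zip(conn, zip_code):
    """Count violation quarters per facility within one zip code."""
    # This builds a query by appending details to a variable called sql.
    sql = "SELECT f.name, COUNT(*) AS violation_quarters"
    # Join the two tables on the shared facility identifier.
    sql += " FROM facilities f JOIN violations v ON f.registry_id = v.registry_id"
    # Keep only rows marked as violations, in the zip code we asked about.
    sql += " WHERE v.status = 'violation' AND f.zip = ?"
    # One output row per facility, worst first.
    sql += " GROUP BY f.name ORDER BY violation_quarters DESC"
    return conn.execute(sql, (zip_code,)).fetchall()


# Stitch the two extracts together in an in-memory database and query it.
conn = sqlite3.connect(":memory:")
load_csv(conn, "facilities", FACILITIES_CSV)
load_csv(conn, "violations", VIOLATIONS_CSV)
result = violations_by_zip(conn, "98000")
print(result)
```

With the sample rows above, the query for zip code `98000` returns a single row for the hypothetical gravel pit with two violation quarters. Note that a quarter with no report at all would simply be absent from a table like this, which is the missing-data problem described earlier in the talk: a count like this is a minimum, not a complete picture.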