On my left side is Dawn Walker. She is a member of the Environmental Data and Governance Initiative, called EDGI, and a PhD student at the Faculty of Information at the University of Toronto. Their talk is Ensuring Climate Data Remains Public, and it's something like the Data Liberation Front helping us in these times to keep all this information available to everyone over the long term. A warm applause. Hi, everyone. Is that on? Oh, yeah, OK. I hope you've been enjoying your Congress so far. So like I was introduced, this talk is Ensuring Climate Data Remains Public. In it, I'll speak to the question of how we keep important environmental and climate data accessible amidst political instability and risk. In particular, in this past year, I think many of us have been paying attention to the United States, and I'll speak to recent data preservation efforts there. So the plan is to have an intro of why I'm talking here today, a bit about what makes now a pressing moment, a kind of whirlwind tour of efforts to identify, preserve, and rethink access to climate data, and then hopefully some sort of rousing call about futures for climate and environmental data. This isn't work I've been doing alone. I'll speak about many projects and organizations that thousands of people, variously coordinated, have worked on. And if you leave with one impression, I hope it's that climate science, climate data collection, and the use of web archiving and grassroots organizing around data are all collaborative efforts. My plan was to try and leave room for one or two burning questions, but I'm really more than happy to talk after, here by the stage. And I have stickers, so please find me; I want to give them to you. OK. Great. So first, I'm not an expert, and actually climate science isn't my background. I'm a PhD student really interested in how designing with a framework of data justice ensures more equitable outcomes, both in the forms of data collected and in access to technologies.
To try and think through this, I've been looking to those actively using data to try and push for other ways of doing things. This takes the form of DIY science, counter-mapping, and increasingly decentralized web projects. I actually got involved with thinking about climate data somewhat circuitously. I'm a member of a local civic tech meetup, shout out to Civic Tech Toronto, that served as a meeting space and anchor for many of EDGI's early efforts. And so what's EDGI? EDGI, which is kind of a mouthful, is the Environmental Data and Governance Initiative, a distributed, consensus-based organization of more than 150 scholars, organizers, and nonprofit groups. EDGI was formed from an email thread that started in November 2016, in the immediate wake of the US presidential elections. For a little over a year now, we've been documenting, contextualizing, and analyzing changes to environmental data and governance practices in the US. I've tried to include at least a portion of the people who've been involved in EDGI projects on the slide, but many more exist. So first, to unpack the data infrastructures of climate and environment a bit more. Climate science and environmental data rely on a collaborative, often state-supported research infrastructure. There have been many talks earlier at Congress that have highlighted how data contributes to knowledge about climate change: climate modeling, satellites, building our own DIY satellite ground station network, which I now want to get a ground station up on. I would check those out for examples. But I just want to stress the coordinated global scale of this collection and processing, something that the scholar Paul Edwards has described as a global knowledge infrastructure making global data. In the United States at the federal level, there are a handful of agencies, departments, and institutions involved with the creation and publishing of this data: NOAA, USGS, NASA, DOE, EPA, and more.
In addition to these are research institutions like Columbia University, where the Center for International Earth Science Information Network is based. Given the coordinated collection and holding of this data, there's certainly no singular form of public access, but publishing of data and data products has become increasingly public through a combination of policies, portals, libraries and archives, and open government data initiatives. In the US, under Title 17, Section 105, most data, with some exemptions, is considered a work of the US government and is therefore in the public domain. And historical climate and environmental data is critically important to contextualize and understand currently observed phenomena. However, in addition to the data itself, there are reports, summaries, and analyses that really open up the topic to a broader audience beyond those with domain expertise, and I kind of consider myself that broader audience. So I just want to pause for a moment here and untangle climate and environmental data. People can use the terms interchangeably, and I've been doing so right now, but I think there are some differences in the way certain communities use them that are important. In many cases, when people say climate data, they're really referring to data on atmospheric, weather, and hydrologic conditions, whereas when people say environmental data, they're often explicitly referring to environmental health and hazards. This includes air and water quality, toxics and pollutants, as well as waste. Both are really vital to help characterize and navigate our relationship to our environments, but at times they can feel like they're at different scales, and so this is the first of two terrible GIMP attempts to position them against each other. So, access to climate data and methods already faced challenges prior to the past year.
In many cases, from those disputing global warming, this has led to motivated targeting of climate scientists and their data sets, in some cases with financial support from lobby groups. One of the more well-known examples is the hockey stick controversy, where a graph showing gradual cooling and then recent rapid warming, roughly resembling a hockey stick, was highlighted in an Intergovernmental Panel on Climate Change report. It had been peer reviewed. In subsequent years, the results have been replicated numerous times with different and additional data, but at that time the results were new and compelling. As a result, they were disputed. Michael Mann and his colleagues wound up personally targeted online, subject to Freedom of Information Act requests, and drawn into court proceedings that lasted many years. There are more examples, but in the interest of time I'll have to skip them; maybe the other most visible one would be the 2009 Climategate email leaks. My sense is that before 2017, this form of targeting would have been identified as the most likely public risk to climate science: a way to introduce doubt around climate change in public opinion through concerted efforts to discredit results or scientists. I just want to say one more time, shout out to Paul Edwards; his discussion of environmental data science systems as under siege is really instructive. In his book A Vast Machine, as well as more recent research, he unpacks the history of climate data. I think his work and these previous examples raise important questions about access to climate data. What Mann's opponents and climate change skeptics said they wanted, in many cases, was the raw data, or a full record. And in one case in particular, a project sought to actually audit the siting of surface temperature instruments.
However, I think it's important to note, and something that scientists flagged at the time, was how necessary context is to interpreting data. I think we need to better understand that when working with complex data, including climate and environmental data. So, this moment in particular. On November 8th, 2016, Donald Trump was elected. For many people, there was an immediate sense that we have to be ready, we have to do something. Scientists, environmentalists, and environmental justice organizers saw statements made during the campaign as indicating that climate and environmental data infrastructures could be at risk and actively targeted. But this isn't the same as the risk above. Instead, the risk is: how do you ensure continued access to data about climate and the environment when the supporting institutions may no longer be able, or may no longer desire, to provide it? However, many from an environmental justice background have long recognized existing environmental data infrastructures as imperfect, for example in cases where they rely upon industry-reported data or are non-representative of communities' embodied experience of pollution and toxics. This put people into a position of concern for the preservation of imperfect data, to avoid an alternative of no data. But wait, you may have been thinking this whole time, aren't you from Toronto, and isn't Toronto in Canada? It is, I am. However, Canadians experienced our own mobilizing moment under our previous Prime Minister, Stephen Harper, and I think this highlighted a new form of threat to climate and environmental data infrastructure. Stephen Harper was able to really quickly and successfully implement an agenda of systematically undercutting environmental and climate research budgets, closing labs, including an Arctic research station, weakening government environmental regulations, and then shutting down libraries and reducing historical, periodical, and record collections.
The speed and immediate impact, I think, served as a rallying moment and highlighted facets of vulnerability that many had not been considering. So, do something. For EDGI members, that something quickly became preserving existing federal environmental data through helping facilitate grassroots archiving efforts, monitoring changes to federal websites, and documenting the political transition through interviews and timely academic analysis. Between December 2016 and June 2017, local organizers hosted 49 data rescue events in cities across the US and Canada, with support from EDGI and the Data Refuge project at the University of Pennsylvania. At events ranging in size from a couple dozen to over 200 people, attendees gathered to nominate key federal environmental data sets for archiving as part of the Internet Archive's pre-existing End of Term crawl. In addition, attendees strategically organized how to deal with links and data sets that could not be preserved through automated methods. At these events, attendees nominated over 63,000 web pages as seeds for subsequent crawling. However, and it's hard not to go into a really extended conversation about crawler software here, which I'm probably not the best person to do, crawler software is not actually able to easily and fully archive and discover links to data sets and web pages on all sites, partially because of underlying web development practices and internet infrastructure, and partially because of resource and storage constraints. So, in addition, more than 22,000 data sets were identified as candidates for non-automated preservation; we deemed them not able to be successfully crawled. Several hundred of these went through a workflow of developing custom solutions to scrape links and data sets and upload them to a Data Refuge repository using an open source toolkit.
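To make the crawler limitation concrete, here's a minimal, purely illustrative sketch of the link discovery step a crawler performs, using only Python's standard library. The URLs, page content, and file extensions below are invented for the example; links built by client-side JavaScript, like the third one, are exactly the kind a static crawl misses.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# File extensions treated as likely downloadable datasets (illustrative list).
DATASET_EXTENSIONS = (".csv", ".nc", ".zip", ".json", ".xml")

class DatasetLinkExtractor(HTMLParser):
    """Collect hrefs from <a> tags, separating likely dataset files from pages."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.dataset_links = []
        self.page_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href or href.startswith("javascript:"):
            # Links produced by client-side JavaScript are invisible to a
            # static crawl -- one reason automated crawls miss datasets.
            return
        url = urljoin(self.base_url, href)
        if url.lower().endswith(DATASET_EXTENSIONS):
            self.dataset_links.append(url)
        else:
            self.page_links.append(url)

html = """
<a href="/data/temps_2016.csv">Temperature data</a>
<a href="about.html">About</a>
<a href="javascript:openViewer()">Interactive viewer</a>
"""
parser = DatasetLinkExtractor("https://example.gov/climate/")
parser.feed(html)
print(parser.dataset_links)  # the dataset file reachable by static parsing
print(parser.page_links)     # candidate seeds for further crawling
```

A real crawl also has to respect robots.txt, rate limits, and storage budgets, which is where the resource constraints mentioned above come in.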
I'm going to use the benefit of hindsight now to avoid falling into a narrative that portrays us as underdogs who went it alone and accomplished a project of massive scale. As people in many cases without expertise or a long track record in digital preservation and archiving, we didn't fully appreciate the scale, and along the way we rediscovered, rediscovered, I want to stress that, longstanding issues with archiving and digital preservation that many groups were already navigating. So, rather than forging ahead alone, we quickly found affinities with existing advocates, projects, and institutions, many of which had been operating in this space for a long time. In addition to us and Data Refuge, Climate Mirror, the Azimuth Project, and the Archive Team, which had existed for years prior, also became rallying projects for people who wanted to quickly organize around preserving data. I'm just going to mention three projects, there are way too many, but I want to untangle some things that I think are interesting around access, coverage, and risk. So, first, the Internet Archive. The Internet Archive is an unparalleled resource for web archiving. In this particular case, with just the End of Term crawl, they managed to get over 200 terabytes of the government web. And because of the additional focus, they got sections of websites that might otherwise have been missed, based on the way they configured that crawl. While it may not include an archived copy of all the data sets, for the reasons mentioned earlier, it provides an important snapshot of how that data was presented on websites at the end of the previous administration, and further provides the ability to browse previous versions of those sites in a way that extends how the content was initially presented. So I think that opens the question of what we mean when we think about access.
The next one is Code for Science, which spearheaded a project called Svalbard, named after the seed vault: a collection of over 38 gigabytes of metadata to try and create a single catalog of research data files. And while data.gov has a catalog, not all data that could be there is there. Without a comprehensive view, assessing where data is and how much data is preserved is difficult, as you can imagine. And then finally, as existing data center practitioners, the Earth Science Information Partners made a case for a collaborative effort to understand risk, stressing that existing preservation and backup methods may not be visible, particularly for climate data. Coming from a data practitioner perspective, they surfaced understandings of risk different from the public ones. I think it's really important, if that was a bad slide job there, sorry, to pull up this quote from them. They frame these as longstanding factors of risk, obsolete technology or data formats, lack of metadata, lack of expertise, and lack of funding to maintain the data, but I would say there's a new dimension under certain administrations: in addition, a lack of funding for extended collection in the future. So a year later, what happened? Ooh, this is kind of a weird transition, you just noticed that, sorry. We haven't seen a mass removal of data sets. There have been a few that have been taken down, for reasons that are not clearly linkable to a politically motivated goal of removing them from public access. Executive orders and Scott Pruitt's appointment to the EPA have led to a reversal of a ban on the neurotoxic pesticide chlorpyrifos; a proposal to rescind Obama's Clean Power Plan is in the works; and cuts to important environmental programs, notably those that protect marginalized and vulnerable populations, are underway. Further, budget proposals aim at severely cutting funding to key federal agencies involved with environmental data collection.
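To illustrate why a unified metadata catalog like Svalbard helps, here's a toy sketch, with entirely hypothetical dataset identifiers and record counts, of how comparing agency holdings against a central catalog makes coverage gaps visible:

```python
# Hypothetical miniature catalogs: dataset identifier -> metadata record.
# In practice these would be thousands of records harvested from agency
# portals and from a unified catalog such as data.gov.
agency_holdings = {
    "noaa/ghcn-daily": {"title": "Daily land surface observations"},
    "epa/airdata": {"title": "Outdoor air quality monitoring"},
    "usgs/streamflow": {"title": "Stream gauge measurements"},
}
central_catalog = {
    "noaa/ghcn-daily": {"title": "Daily land surface observations"},
}

# Datasets that exist at agencies but are missing from the central catalog.
missing = sorted(set(agency_holdings) - set(central_catalog))
coverage = len(central_catalog) / len(agency_holdings)

print(missing)
print(f"{coverage:.0%} of known datasets are centrally cataloged")
```

The set difference is trivial once both inventories exist; the hard part, which projects like Svalbard tackled, is assembling the comprehensive inventory in the first place.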
In terms of data, we've actually seen a shift in how it's presented on federal websites. The screenshot on the slide is from a recent EDGI website monitoring report documenting the removals of, and changes to, access to resources on the EPA's climate and energy resources for state, local, and tribal governments. Since January, EDGI's website monitoring team has released over 25 reports like this, documenting changes to how environmental and climate data is presented. So, what next? I think the biggest opportunity I see is in the public conversation and attention toward continued access to this data. The fact that people who weren't librarians, web archivists, or research scientists showed up and stayed involved attests to this. I see EDGI's website monitoring work as a way to attempt to mobilize that continued public conversation, but there could be more. In the wake of the recent FCC decision on net neutrality, I think we're seeing another wave of public conversation around infrastructure, but operating at a lower level. Since the summer, EDGI has been working with Protocol Labs, the creator of IPFS, the InterPlanetary File System, and Qri, a data science company developing dataset research tools on the distributed web, on a project called Data Together, which aims to convene a conversation around building our own and better data infrastructures. We want to explore how decentralized web patterns can support community data stewardship, in part through content-addressed web archiving, and we are having those conversations out in the open where people can join in. That could be a whole talk in itself, and I would prefer Matt Zumwalt to give it. And I have more questions than answers, so I think it's probably something that works better as a conversation, and one I'm hoping at least some of you will want to participate in.
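This isn't EDGI's actual monitoring pipeline, but the core idea behind comparing website snapshots can be sketched with Python's standard-library difflib. The page text below is invented for the example:

```python
import difflib

# Two hypothetical snapshots of the same page, before and after a change.
before = """Climate and Energy Resources
Climate change is real and caused by human activity.
Download the state emissions dataset.
""".splitlines()
after = """Energy Resources
Download the state emissions dataset.
""".splitlines()

# unified_diff flags removed and added lines between the two snapshots,
# similar in spirit to the comparisons behind a monitoring report.
diff = list(difflib.unified_diff(
    before, after, fromfile="2016-12", tofile="2017-06", lineterm=""))

# Collect only removed lines, skipping the '---' file header.
removed = [line[1:] for line in diff
           if line.startswith("-") and not line.startswith("---")]
print(removed)
```

A production system would fetch and normalize the pages first; the diffing step itself is this simple, which is part of why volunteers could sustain the work.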
And maybe just to suggest as well: I think many in the hacker, free software, open hardware, and open science communities have recognized the ways that technology is not neutral, that it can come with embedded bias and predispositions to be used in certain ways. And if that recognition can be coupled with the recognition from environmental justice advocates and academics of the ways that data is not neutral, and then also with an attention to the vital data about climate and environment that is critical to navigating our changing relationship to the environment, I really think we have a chance to build better data together. So maybe just in conclusion, I want to say EDGI is always looking for people interested in volunteering. Our projects have room for people from a variety of backgrounds; in particular, if you have DevOps skills, come find me, we need serious DevOps help. And please check out our website and GitHub. You can sign up to our mailing list, you can help us create Data Together's mailing list, or maybe just have some conversations about this somewhere online. Thanks. Yeah, a big, big thanks, because this is important work. As we like fact-based information, we have to preserve this. So let's now come to the Q&A. Please go to the microphones, and if there's a question from the internet, I get informed by the video angels. Is there any question? Is somebody coming? Microphone one, it's for you. Hi. You often hear that scientific data sets are very fragmented or not very easily accessible. I can imagine that's certainly something you ran into while trying to rescue and crawl the data. What has your experience been there, and did you see any opportunity to, for example, improve that in your efforts? Yeah, so absolutely, I think we did run aground on that fragmentation.
And I think maybe the one thing we could offer to other people is the experience of this not being something we were familiar with, and kind of stumbling through it and making all the mistakes possible. I mean, like: where is this? How can we find it? So in terms of what we found, or a way forward, I actually want to flag that I think there are already processes trying to address fragmentation. Data.gov is becoming this open data portal; certain countries have a kind of one-stop portal to try and find data sets; and there's coordination like the IPCC Data Distribution Centre, I had a screenshot of that. I think those projects are one attempt at it. But I think there's still this problem of access, in the sense that it's unclear to me how people who aren't within a certain community of practice would even know to go there to get that data. And one thing we've heard in conversation with others, including the US Climate Alliance, I had their logo up, is that there's a certain set of people who care a lot, and their decisions and how they work are going to be really heavily impacted by climate data, but they're not going to look at the data, they're going to look at the reports, and getting access to those is extremely important. I think portals are a big help. I think the library depository programs, those things that already exist, are really important, and I don't want to see them go away, but I still think there's something slightly missing about usability, and I'm not entirely sure how to address that. But I think these one-stop things and the work around opening the data sets are a really good first step. And thanks for your effort on this. Okay, we have 10 minutes, and we have questions from the internet and two in the room. So internet, start first please.
Okay, so one person from IRC asks: is the bar for putting data into the World Data Center for Climate too high, in terms of providing a lot of metadata, which is a lot of work? So, my understanding is that it operates a bit like data.gov, in the sense that it's opt-in, and it's opt-in at the data publisher level; if I'm incorrect there, I'm sorry. But working with that assumption, I think the barrier we found is that not everyone has opted in. So if you're a person who cares about the data but you're not the person who made the data, you're kind of stuck if the publisher has not included it in these repositories. And so I think an interesting approach could be to figure out ways to incentivize more people to get it in there. I don't know what those hooks could be, but if there's a way to request that they push it there, and a way to motivate that behavior, I think that would be awesome. Microphone five, please. Hi, I have a question regarding the creation of new data. One of the concerns is to protect and preserve data from old scientific research. But what will happen if, for example, there's a lack of funding for the next research, and our long time series for climate research are lost because of that? Are there, in this community, people who try to reach a broader audience and tell them that we need to find funding for preserving data, for creating new data, and for continuing to measure all of these climate things? I mean, I agree, I think that's really critical. And that's something where, in the United States and in Canada, a lot of people have mobilized around this issue of thinking about the knock-on effects of limiting budgets now, a continued constraining of budgets, a lack of funding, and cutting jobs instead of growing jobs.
The group that I'm most familiar with, and which I think is doing really strong advocacy around that in the States, is the Union of Concerned Scientists. So I think there are definitely groups who are flagging what the outcomes of the budget proposals would be, or the impact of those, and there are groups who are advocating on it. Not being an expert in government policy or how budgets are implemented, I think there are constraints on how far advocacy can affect what budget gets adopted; maybe that alone, as a strategy, is not going to prevent it from happening. Okay, microphone one please. So, is the distributed data digitally signed? I could imagine that there are some groups of people who might be interested in fiddling around with it. Yeah, so through the data rescue process, we worked really closely with the Data Refuge project, and many of them are librarians, so there was a strong concern with maintaining citability of data and also thinking about integrity and verification. It actually raised a lot of really interesting questions, for me at least, about how you would manage a very volunteer-driven, human-intensive process of doing that verification. So there was a workflow management tool that was developed, where we would have a log of who had touched each data set or web page at an event. And then we used existing librarian and Library of Congress tools to generate checksums and to ensure that what was uploaded was what people thought was uploaded, so that when you downloaded it you could verify that. It was trying to do a parallel social and technical implementation of that verification. In the move towards some of the Data Together work, we actually have a reference implementation of generating WARCs, which is a web archiving format, writing them directly and adding them to IPFS.
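The Library of Congress tooling itself isn't shown here, but the underlying fixity idea, record a checksum manifest at upload time and re-hash on download, can be sketched in a few lines of standard-library Python. The file names and contents below are hypothetical:

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Fixity value for a file: the same bytes always hash to the same digest."""
    return hashlib.sha256(data).hexdigest()

# Uploader side: record a manifest of checksums alongside the files,
# in the spirit of the BagIt-style manifests used by library tools.
files = {
    "station_readings.csv": b"station,temp\nA,12.5\n",
    "README.txt": b"Scraped from a hypothetical agency page.\n",
}
manifest = {name: sha256_hex(data) for name, data in files.items()}

# Downloader side: re-hash what was retrieved and compare to the manifest.
def verify(name: str, retrieved: bytes) -> bool:
    return manifest[name] == sha256_hex(retrieved)

print(verify("station_readings.csv", files["station_readings.csv"]))  # True
print(verify("station_readings.csv", b"station,temp\nA,99.9\n"))      # False
```

The social half of the workflow, logging who touched each data set, is what lets you trust the manifest itself; the hashing only guarantees that bytes haven't changed since the manifest was written.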
And so with IPFS and content-addressed protocols, there are additional ways to do verification and to ensure that what you retrieve is the data you think you're retrieving. So I think those are important questions that are really interesting, and I think we tried. I mean, I'm not a librarian by practice, so there are probably a lot of trade-offs there that I'm not as sensitive to. Well, that sounds very trustworthy. Thank you so much. Yeah, then please give a big applause for Dawn Walker for their fabulous talk about this public data, which is necessary for us all because it's not always available these days.