Welcome, everybody, to the Help! I'm an Accidental Government Information Librarian webinar series, brought to you by the North Carolina Library Association's Government Resources Section. My name is Lynda Kellam, and I organize these webinars for our group. We have four presenters today: Shari Laster, Government Information Librarian and Data Services Librarian at the University of California, Santa Barbara; James R. Jacobs, U.S. Government Information Librarian at Stanford University Libraries; James A. Jacobs, Data Services Librarian Emeritus from the University of California, San Diego; and finally, Laurie Allen, Assistant Director for Digital Scholarship at Penn Libraries. So thank you very much, everyone, for doing this today.

All right. Well, thank you all so much for joining us, and thanks, Lynda, for hosting us. Today's webinar is, I think, a little different from what has been hosted in the past. It's a bit of an experiment for us: a chance to talk about different projects that are happening, and to do so in a way that gives you some context about these projects and how to participate. So I want to begin with a quick round of introductions. Laurie, do you want to go ahead?

Thank you. I'm Laurie Allen. I work at the Penn Libraries and have become involved in the Data Refuge efforts here.

Hi, I'm James A. Jacobs. You can call me Jim, so that it will be completely clear that I'm a different person from James R. Jacobs.

And this is James R. Jacobs. You can call me James. And I am indeed different from Jim.

And my name is Shari Laster. We're calling today's discussion a conversation with the future. The reason I like to think about collections this way is that when we're collecting materials, we're doing so on behalf of both the people we work with every day and the people who are not working with us yet but who will someday rely on these collections.
The collections tell our users a lot about what our priorities are, what we care about, what we may be overlooking, or what we may be making up for from the past. And at the same time, it's a two-way conversation, because we're also trying to anticipate what people will want to access, use, and interact with in the future. So with that in mind, we came up with five questions, and we're going to go through them and share both our perspectives and the current work that's happening. Our questions are: Why save government data, and why rescue it right now? What is it we're talking about when we talk about data? What has happened over the last six months, and what's happening right as we speak? What's the sustainability and future outlook for these kinds of projects? And finally, what can everybody do to participate in this work and be involved?

So, beginning with the first question: why rescue data? I'm going to turn it over to Jim, and then James, to talk about this context.

Hi. Thanks, Shari, and thank you all for joining us this morning. The first thing to understand about what we're doing is that digital government information is not at risk because of politics alone. It's at risk because there is no plan, no policy, and no budget for preserving government information. With few exceptions, there's no law or regulation that requires government agencies to preserve their own information or to provide free access to it. Digital information is fragile. Preserving it requires conscious, planned action, and such action requires people and organizations that have preservation as a mission and as a priority. Government agencies are, with a few exceptions, tasked with collecting and creating information, but not with preserving it. What that means is that almost all digital government information is at risk of being lost. Such losses can be intentional or unintentional.
They can be driven directly by political policy or by bureaucratic procedures. They can be driven by budgetary priorities. They can be caused by technological decisions, or even just by inattention or inaction. In short, the need for data rescue that we face today is not a new need. Nothing has changed to make data more at risk today than it was six months ago or four years ago. Two things have changed, though. First, more people are aware of the risk today than ever before. And second, many of those who understand the risk are in communities of users. They're not just librarians; they're users who are concerned that explicit, announced changes in government policy could result in decisions that would threaten the data on which they rely. And I'll turn it over now to James.

Yeah, I just want to make a couple of points to build on what Jim said. First, yes, digital information is fragile. This is not a new thing, and it is not an inherently political thing. You have phenomena like link rot and content drift, and a quick Google search on those terms will give you a better understanding of just how fragile government and web-based information is. From my perspective, I think control is key here, both historically and currently. As we often like to say over on Free Government Information, linking is not preserving. And since web-based information is inherently fragile, it behooves us as librarians to collect it in order to preserve it, control it, describe it, and give access to it. The silver lining, and I like to look for silver linings, especially in this political climate when silver linings are really important, is that, as Jim pointed out, the public is suddenly aware of the need for data preservation outside of the .gov domain. Some of us in the Federal Depository Library Program have been arguing for this concept for many years.
I think you'll find that Jim wrote an article, probably in the mid-90s, about preserving information from bulletin boards, so this is not a new idea. But preservation requires action, and action requires actors, including government agencies, working together. The current system is primarily centered on creation and access, not preservation. Preservation isn't happening system-wide, and we need to figure out ways to build and sustain the government information ecosystem. Just to give you an image of what we're talking about, this is a graph that Jim made for a 2014 report for the Center for Research Libraries called Born-Digital U.S. Federal Government Information: Preservation and Access. I think it really shows what the problem is at the 5,000-foot level. What you see is the number of tangible documents, meaning physical government documents, that were distributed to FDLP libraries in 2011, and it's a small sliver there. The slightly larger sliver in the middle is all the FDLP items that have been distributed to FDLP libraries throughout the 200-plus years of the depository program; that number is somewhere between 3 and 5 million. And the third, large piece is the number of URLs that were collected for the 2008 End of Term crawl, which was somewhere around 160 million. The 2012 and 2016 crawls will be exponentially bigger than that. But you see that even if some of those quote-unquote URLs were spacer GIFs and images and oddities of the web, you still have information and data being produced in far larger quantities than has ever been produced before. So this is the nature of the problem we're dealing with.

So I'm glad that you mentioned data, James. What are we talking about when we talk about data?

This is Jim again; I'll start off. I just want to make one quick point. The word data is sometimes used to mean anything digital.
But when climatologists, geographers, demographers, economists, and other scientists use the word data, they mean one very specific kind of digital information. When they talk about data, they're usually talking about what we might call data sets. These data sets contain information that has been collected using things like public opinion surveys, social surveys, satellites, and other kinds of instrumentation. The data is stored in files and databases in a highly structured way, so that the information can be analyzed using statistical software. This is raw information, raw data if you will. It's often just a bunch of numbers, measurements and codes of various kinds, and it is normally not intended for direct reading by humans. Now, when we talk about rescuing anything and everything digital, a lot of us are thinking about the stuff we see on the web: web pages, PDF files, simple spreadsheets, images, audio and video files, and so forth. This information, unlike the data in data sets, is very definitely intended for direct human consumption, reading and viewing and listening. And this kind of information is relatively easy to find, identify, and capture; it's what the Internet Archive does every day. That other kind of data, the data in data sets that so many scientists use and analyze in their research, is not always easy to find, identify, or download, and a lot of those data sets are not visible on the web. Those data sets often require special attention and tactics to identify and download. So what we're talking about when we're talking about data is both; it's a broad field of information. And with that, I'll turn it back to Laurie.

So in the Data Refuge project that we started, as we wanted to think about how we could rescue federal information and federal data, we were, I think, at the beginning really most concerned with the research data class.
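To make Jim's point about raw data concrete: a record in a scientific data set is often opaque without its data dictionary. Here is a minimal sketch in Python; the fixed-width layout, field names, and missing-value flag are all invented for illustration, not taken from any real agency file.

```python
# Hypothetical codebook: variable name -> (start, end) column positions
# in a fixed-width record. Without this, the record is just digits.
CODEBOOK = {
    "station_id": (0, 4),
    "year":       (4, 8),
    "temp_c_x10": (8, 12),   # temperature in tenths of a degree Celsius
    "flag":       (12, 13),  # "9" = missing, per our invented codebook
}

def parse_record(line: str) -> dict:
    """Decode one fixed-width record into labeled values."""
    rec = {name: line[a:b] for name, (a, b) in CODEBOOK.items()}
    # Apply the codebook's missing-value rule and unit scaling.
    rec["temp_c"] = None if rec["flag"] == "9" else int(rec["temp_c_x10"]) / 10
    return rec

raw = "0042201700871"  # unreadable by humans without the codebook
print(parse_record(raw))
```

This is exactly why, as Laurie notes next, a data set that travels without its codebook and methodology is far less useful than one packaged with them.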
But as we thought about, learned about, and talked with people about how federal data lives, we found it really runs the gamut, from research data sets all the way to static HTML. And this distinction, the distinction between web pages and data sets, is really important for how we find and reuse data. There are standards for making and sharing data sets in various disciplines, and there are quite different standards and practices for sharing web pages and PDFs and that sort of thing. But what we found, over and over again, is that without attention to the preservation or reuse of data, many of the data sets made available on government websites are treated, even by their producers, more like web pages than like traditional data sets. They're conceived of as parts of the interfaces that provide access to them, rather than the way we would want to see them: data sets that can travel with all their own metadata. On federal websites, the data we would want to see as a data set is very often not packaged with the accompanying data dictionaries, instrumentation details, methodological overviews, and download tools that we would generally expect of data sets within scholarly disciplines in academia. Instead, the metadata we would want included in a set of files describing a data set is just sprinkled through web pages, or sits on a related page, but isn't really connected, or easily connectable, to the data themselves. This conflation of websites, data visualizations, data sets, and informational sites is one of the reasons that addressing this problem is so tricky. The problem of backing things up becomes more complicated when the lines are as blurred as they are. And when we talk about the data we have in mind for Data Refuge, that meaning of data needs to be pretty clearly spelled out at various times during the conversation.
We need to be open to accepting that data producers, users, journalists, open data advocates, software developers, data librarians, and others all mean somewhat different things when we talk about data. But we share an interest in helping to ensure that each community has access to the data it needs, and so we may need to develop new vocabularies and new ways of talking about these problems. With that context, let's talk about what kinds of projects are happening right now. We'll have Laurie and James talk about two major projects: Data Rescue and the End of Term archive. So first, James, can you give a quick overview of the current efforts, at the 50,000-foot level?

Sure. A lot of efforts are already happening. There's a lot of crawling of the government web in bits and pieces; a variety of organizations are collecting parts of the government web domain. But no one organization is mandated to do it, unlike in other countries where national libraries have a mandate to collect their country's domain. You have organizations like the Library of Congress, GPO, NARA, and the Internet Archive, all fairly large organizations, all doing web capture, preservation, and access. You have agencies using Archive-It and other web tools to archive their own domains for various reasons, whether to deposit into the National Archives or for their own preservation efforts. You have universities like the University of North Texas, Stanford, and other libraries that are building topical .gov collections, focusing on everything from FOIA to CRS reports to other subject-based .gov collections. And then, more recently, you have these community efforts, driven more by the fear that government data or government information is going to disappear.
And so you have a lot of these citizen-driven, grassroots efforts like Data Rescue, Climate Mirror, and Azimuth, as well as larger, longer-term efforts like End of Term. And with that, I'll hand it over to Laurie.

I'll talk a little bit about the Data Rescue events that have been happening around the country. The Data Rescue events are supported by Data Refuge and by the Environmental Data and Governance Initiative, which is a separate organization. And really they're driven by people, often within libraries but also elsewhere, who come together to organize an event designed to rescue data. There's a process we go through at the rescue events when folks want to move data sets of the sort we were talking about earlier into our catalog and the associated storage space we have at datarefuge.org; there are about 180 data sets in there now. But at Data Rescue events there is also a really, really close connection to understanding the ways the data are used: bringing together teachers, and bringing together people to tell stories about how federal data impacts their lives, and how climate and environmental data is connected to policy, especially in cities. And we are hoping to encourage folks to really experiment with different kinds of Data Rescue events. Some events are continuing to move data into our workflow, but we want people to keep innovating. And Shari, can you maybe go to the next slide there? Thanks. So we're hoping that people will think of new ways to tackle this problem. The Data Rescue events have so far, for the most part, focused on moving data into one of two locations. For the first several months we were feeding the End of Term harvest project, so we were directing URLs to it.
And if you think of this continuum from basic HTML pages all the way through to research data sets, the goal was that the things the Internet Archive can capture, things that are web-archivable, like basic HTML pages as well as directories and files on FTP servers and that sort of thing, our events were designed to systematically push to the Wayback Machine at the Internet Archive. And then for the things that can't comfortably be swept up in a web harvest, things like data behind a query interface, or a research data set that's already prepared and laid out, we have a system for moving those, while minding chain of custody and being aware of the quality of the data, into Data Refuge, which is at datarefuge.org. So that's been the approach. Just to call out: there is that embedded content on the slide, which doesn't have an arrow. That stands in for all the various kinds of interfaces and visualizations where we might be able to capture the data behind them, but where the interfaces themselves can't be comfortably web archived and we don't yet have a way to capture them. It's just worth calling out, I think. I'll spend just a moment on the workflow at events and how it goes, but mostly I want to encourage folks to help us think differently about this. So in the original workflow that we worked out here at Penn for our event in January, the idea was that some people would figure out what can go to the Internet Archive and push it there, and what can't; those are the seeding and sorting steps. And then, once a site has been identified as containing data that can't be pushed to the Internet Archive, there's a process of researching and then harvesting that data, whether by downloading it, using an API, or scraping an interface to get the data behind it.
And then we take some preservation steps, basically bagging it using the Library of Congress's BagIt specification, and then it gets described and shared through our CKAN instance. There are some exceptions to where things are being shared, where we haven't yet figured out how to get some of the bigger data sets in, or just haven't done it yet. But for the most part, that's the process. The more we've learned, though, and the more we've tried, we are learning what the larger data community has known for a long time, which is that proper preservation and description really can't come that late in the process. A huge part of the work of this project needs to be describing and researching and creating data sets out of the web pages and associated interfaces, so that they can be better harvested and preserved in the future. That includes helping agencies create plans and an understanding of what constitutes a data set for sharing, and we're hoping to see some experiments and pilots in libraries that help take on that project. And the next slide just illustrates what we mean. In many cases, what we want to see as a data set is a kind of package: the data itself, which needs to be described a little; the data files, pulled from the interface, or ideally obtained by working with the agency directly rather than scraping; and then the associated context, which might come from those HTML pages, or metadata that is actually created to describe the information. That's added, a research data set is composed, and then we can back it up and think about it in that way. So that's the direction we're hoping events will turn, and it's certainly the direction we think is the most productive going forward.

I'll just talk briefly about the End of Term crawl, which we lovingly call EOT. The End of Term crawl has been happening since 2008, every four years, obviously at the end of each presidential term.
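The bagging step Laurie described follows the Library of Congress's BagIt packaging specification: payload files go under a data/ directory, alongside a bagit.txt declaration, a checksum manifest for fixity, and a bag-info.txt of metadata. In practice Data Refuge volunteers would use existing tooling (there is an official bagit Python package from the Library of Congress), but here is a minimal standard-library sketch of the structure; the metadata fields are whatever you pass in, not a prescribed set.

```python
import hashlib
import os
import shutil

def make_bag(bag_dir, metadata):
    """Turn a directory of harvested files into a minimal BagIt bag,
    in place: payload moves under data/, with a bagit.txt declaration
    and a sha256 manifest so fixity can be verified later."""
    payload = os.path.join(bag_dir, "data")
    os.makedirs(payload, exist_ok=True)
    # Move everything except the payload directory itself into data/.
    for name in os.listdir(bag_dir):
        if name != "data":
            shutil.move(os.path.join(bag_dir, name), payload)
    # The bag declaration required by the spec.
    with open(os.path.join(bag_dir, "bagit.txt"), "w") as f:
        f.write("BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n")
    # One sha256 line per payload file: "<digest>  <relative path>".
    with open(os.path.join(bag_dir, "manifest-sha256.txt"), "w") as f:
        for root, _, files in os.walk(payload):
            for name in sorted(files):
                path = os.path.join(root, name)
                with open(path, "rb") as payload_file:
                    digest = hashlib.sha256(payload_file.read()).hexdigest()
                rel = os.path.relpath(path, bag_dir).replace(os.sep, "/")
                f.write(f"{digest}  {rel}\n")
    # Free-form descriptive metadata, e.g. chain-of-custody notes.
    with open(os.path.join(bag_dir, "bag-info.txt"), "w") as f:
        for key, value in metadata.items():
            f.write(f"{key}: {value}\n")
```

Because the manifest records a checksum for every payload file, anyone who later receives the bag can re-hash the files and confirm nothing was corrupted or altered in transit, which is the chain-of-custody point made above.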
A group of interested folks have come together, with no budget and no real directive, including the Internet Archive, the Library of Congress, the University of North Texas, the California Digital Library, George Washington University, Stanford, and several others. We basically start six or eight months before the term is going to end, and we collect seeds, both from prior crawls and by asking for nominations, in order to crawl the web and capture as much as we can. And this time around we are focusing not only on HTTP and HTTPS but also on FTP and social media, which we didn't do in the 2012 crawl. George Washington University is focusing on social media. We probably don't need to get into the technical aspects, but we're not crawling social media per se; it's more like querying APIs to collect social media from Facebook, Twitter, YouTube, those kinds of things. We've got hundreds of volunteers this time nominating seeds, many from Data Refuge and Data Rescue events, which I think is really amazing: these grassroots efforts came together and found a place to fit within End of Term, which is throwing a big, wide net around the .gov and .mil domains. So this time around we're looking at collecting somewhere around 250 terabytes of harvested content and data. We've got 9,000 social media accounts and lots of domains, both .gov domains and non-.gov domains belonging to government organizations, things like commissions that might have a .org or whatever. The crowdsourced nomination of seeds is particularly amazing this time around. As you can see, in 2008 we had 26 nominators and 457 URLs. This time around we had 393 nominators, some of them individuals and some of them events as quote-unquote nominators, and over 11,000 seeds were nominated to the crawl. So it's been a whirlwind eight months of capturing lots of government information and lots of government data.
The entire crawl capture will be indexed and searchable from the End of Term website, which the California Digital Library hosts. You can currently get access to the 2012 and 2008 crawls, and 2016 will be coming soon. All of the data will also be hosted at the Internet Archive, and there will be copies at the University of North Texas and, I believe, the Library of Congress. So there will be redundancy and multiple access avenues to this data.

So these are a lot of exciting projects coming together at a particular point in time, and I'd like to ask both James and Laurie to talk about what's next for each of these projects.

For Data Refuge, we were holding events, and it was so exciting, and as I said, we started to see what everyone already knows, which is that we need a more sustainable plan. We need to take advantage of the tremendous work that's been going on among government documents librarians and within data librarianship. But beyond that, one of the amazing opportunities in this has been the connections we've been able to make with open source software developers, and especially with people who have been really active in the open government movement: people who have been working on open data efforts in cities and states around the country, and on connecting federal, state, and local data producers to get them to share and make their data public. And as we've seen, that effort has been really successful. On the federal side, Data.gov has hundreds of thousands of records for open data and has made that data available and searchable. That said, those efforts are generally not made with an eye to preservation beyond the data producer's own interest in maintaining the data. And that's where we are moving, toward what we are now calling the Libraries+ Network. This is a collaboration, right now, with the Association of Research Libraries.
We expect to bring other collaborators on soon. The first thing we're doing is an event, one that is quickly filling up, where we really want to bring together some of the people from the open data and open government communities with people in research libraries, to begin thinking about what a revised Federal Depository Library Program, conceived for born-digital data and information, might look like. We understand that, in the current environment, the chances of new federal regulation helping this process are slim. So while we need to keep advocating, we also need to be making plans to support long-term, sustainable access to this data without assuming that the federal government will change its behavior. Folks coming to that meeting include the Mozilla Foundation; agency data producers from NOAA and NASA; folks from a few of the libraries that contributed early on, as well as the California Digital Library; and a lot of groups like Dataverse, DLF, HathiTrust, ICPSR, the Center for Open Science, and on and on. So we're really excited for this meeting, where we hope to coalesce a group of people who can think through solutions and plans that might shape some of the future.

As Laurie has said, and as we've said throughout this webinar, this is a huge issue. It's great that we've had so much energy to do this at this point in time, at this point in history. But how do you sustain these efforts going forward? This is a huge issue both technically and socially and politically, and one in which there's plenty of space and, frankly, plenty of need for collaborative action. So I'm really glad that ARL and all these other groups are coalescing.
There's another group called PEGI, the Preservation of Electronic Government Information project, that held two summits, one in the fall and one last spring, and I think they're holding meetings at CNI to get interested people in the same room discussing these kinds of issues. So I think the Libraries+ Network and PEGI will probably coalesce into one larger movement. Now, End of Term has always been an ad hoc project among interested people and organizations; there's never been direct funding for these efforts, and I'm hopeful that direct funding will be coming. But I think there are two other things that are key to these efforts going forward. One is that we must base our work on accepted standards and guiding principles. This is critical for any effort going forward. These are the guiding principles that we've written about on Free Government Information before. They come out of OAIS, the Open Archival Information System, an accepted digital preservation standard. And, by the way, they're a close relative, at least in my mind, of Ranganathan's five laws of library science: books are for use; every reader his or her book; every book its reader; save the time of the reader; and the library is a growing organism. So this is, I think, an updated set of guiding principles in the spirit of Ranganathan's ideas and ideals. Also, since this is such a huge issue, any effort going forward must, I think, include the publishing agencies and the people and organizations that use government information: politicians and policy experts, researchers, think tanks, government watchdogs, and even the public. And therefore we need to ensure that non-librarians and non-library organizations are aware of library issues and are able to cooperate in efforts going forward, but also that those library efforts are informing policy going forward.
So we've spent a lot of time chasing the data tail over the last eight months, in both Data Refuge and these other grassroots efforts, and the End of Term crawl and the Internet Archive, frankly, have been chasing that tail for a long time. So I think what we really need at the policy level is a way to structure government policy so that these efforts are not always chasing the tail, but so that preservation becomes part of data publication, and of .gov information publication in general. Jim and I have written a little about this on the libraries.network web page. Sorry I didn't give the link here, but I can send it later. It's the idea of information management plans, which comes out of the idea of data management plans, which I'm hoping many of you already know about: if you receive federal funding from a government agency, any data you create in the research process must have a plan for preservation going forward. We've turned that on its head and said that government agencies themselves should have information management plans, so that they are forward-thinking and explicit about what they're going to do in terms of preservation and access going forward. I should note that PEGI is also holding a meeting the week after the Libraries+ Network meeting, and many people in PEGI, including myself, are going to both.

Thanks, James. So our final question is, I think, the biggest question, the one our discussion will be based on: what can everyone, what can anyone, what can all of us do to participate in this work and to move it forward? I'd like to start with Laurie's answer to that question.

Thank you. And I'll also answer the question that was in the chat box as well. But the basic answer, as far as I'm concerned at this point, is: experiment. This problem is really, really huge. It has so many angles.
And I think our experience has been, you know, I have learned so much, and my community has learned so much, in trying to actually tackle and solve a piece of it. So when people have been asking us how to get involved, certainly you can go to the Data Refuge website at PPEH, and I'll put the URL in the chat, to host an event. But really, the more people just take a stab at it, the better. We hear so many questions about how to make sure we're not duplicating effort. All I can say is that we as communities have so much to learn in this space that I'm actually not sure duplication of effort is a problem. We duplicate effort all the time; we all learn how to do some of the same things because they are so needed. So learn what it takes. What does it take for your organization to attempt to tackle a piece of this problem? What are the technical needs? What are the social and bureaucratic needs? What are the resource needs? How much does it cost? These are all questions where I think our community needs lots more evidence from experiments before we know what the best practices are going to be. And so I would say start small. We've talked to people, and I'm thinking of a librarian who works with architects, who said: you know what, the architects I work with need this data, so I'm going to make sure that my library has this data, because that's my job. And whether someone else already has it, that's okay. I'm going to make sure that my library has this data, and I'm going to build a collection that includes web archives and data sets that meet this need. I do think we need a much broader-scope plan, some of which we've been talking about on this call, and I think that will come out of these various meetings going forward.
But more than that, we need everyone to just jump on board, to try something, and to see what it would look like to help libraries see this work as our work. So that's my plea for experimentation.

Hi. Thank you, Shari. I think the main thing is that a lot has already been done to rescue data in these last few months. But data preservation can't be done by just a few volunteers, and a lot of this has been done by volunteers. It also can't be done by communities of data users on their own, and a lot of it has been done by users so far. This is a time when libraries, I think, need to step up to their traditional roles: selecting, acquiring, organizing, and preserving information, and providing services for those collections of information. And I think there's a place for everybody to work on this. Whether you work in a library or not, whether your library is tiny or huge, whether your institution or management already supports this kind of activity or not, there's a lot that can be done. You're already aware of the issues. Your first job should be to keep informed, because things are changing fast, particularly as, like Laurie says, we experiment and develop new techniques and tactics. Then you can share what you know and understand with your colleagues, your constituents, and your users. And you can participate individually when possible, and also with the help of your own institution. You can also work with your professional groups to forge alliances for action. I had a brief conversation with a librarian colleague just this week who was telling me how libraries just won't work together; they're competing against each other. Schools are competing for students and tuition and reputation; other libraries are competing for grants; and they all want to work by themselves. But the work we're looking at today, on government information, can't be done if we each work by ourselves. We have to collaborate on this.
And there's a lot of room no matter what your skills are, whether your skills are political, pushing forward activities; or organizational, getting groups together to work; or technical; or government information skills, where you can help identify collections that need to be preserved. But we all have to act to preserve government information. This is James again. Just one quick point to jump onto Lori's idea about the duplication of effort. I don't see that as a problem. I actually think that is an amazing part of what has happened in the last eight months, because you have all of these communities of interest, or in OAIS parlance, designated communities, who have jumped in and started acting. And this has caused a far better and more thorough preservation effort than the End of Term crawl could do, or than any one organization could do. So I really think duplication of effort is not a bug. It's a feature, and it's an amazing thing to watch going forward. There are some avenues of participation that you could start on now. The University of North Texas has put up an interface, and they've pulled out a subset of the PDFs that End of Term has crawled, and they're doing an experiment in cataloging. So if you're interested in cataloging, you could go in there and help catalog PDFs. Metadata is a critical and often underfunded piece of this whole function of preservation. You can start planning now for the End of Term nominations for 2020. We've stopped receiving nominations for this crawl, but 2020 is coming pretty quickly. You can also start preserving web pages that you are interested in, for yourself or for your users or for your research communities. The Internet Archive has written a nice blog post called See Something, Save Something on ways that you can check that web pages and web content are already in the Internet Archive and, if they're not, how to get them into the Internet Archive.
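[Editor's note: the check-and-save workflow described above can be sketched in a few lines against the Internet Archive's public Wayback Machine availability API and Save Page Now endpoint. This is a minimal illustration, not the workflow from the blog post itself; the example page URL is hypothetical, and error handling and Save Page Now rate limits are omitted.]

```python
# Minimal sketch: check whether a page already has a Wayback Machine snapshot,
# and if not, ask Save Page Now to capture it. Uses only the Python stdlib.
import json
import urllib.parse
import urllib.request

WAYBACK_AVAILABLE = "https://archive.org/wayback/available?url="
SAVE_PAGE_NOW = "https://web.archive.org/save/"


def availability_query(url: str) -> str:
    """Build the Wayback availability API request URL for a given page."""
    return WAYBACK_AVAILABLE + urllib.parse.quote(url, safe="")


def is_archived(url: str) -> bool:
    """Return True if the availability API reports an archived snapshot."""
    with urllib.request.urlopen(availability_query(url)) as resp:
        data = json.load(resp)
    # The API returns an empty "archived_snapshots" object when no capture exists.
    return bool(data.get("archived_snapshots"))


def save_now(url: str) -> str:
    """Request a Save Page Now capture; returns the request URL used."""
    request_url = SAVE_PAGE_NOW + url
    urllib.request.urlopen(request_url)  # archive.org performs the capture
    return request_url


if __name__ == "__main__":
    page = "https://www.example.gov/some-report"  # hypothetical page
    if not is_archived(page):
        save_now(page)
```

For anything beyond a handful of pages, the authenticated Save Page Now API or a web archiving service is a better fit than one-off requests like this.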
I've also had some conversations with some folks at the University of Michigan and elsewhere about this idea of using Zooniverse, which is a crowdsourced citizen science platform, to help with metadata and description of crawled content. That's in the discussion phase now, but it's something that hopefully will be happening soon. So there are lots of ways to pitch in now. Even if you don't have support from your administration, even if you only have an hour a week, there's lots of work to be done, and we hope that you'll jump in. Sorry, can I just jump in there before we go to the questions and discussion? Just to also say, you know, events are still happening. As you've been saying, we're really going to experiment with data rescue events. But since those have been so successful at bringing people out, I would encourage you: if a data rescue event is something that will work in your community, do it. You're welcome to use the workflow that's described on our pages, but I would also love to see people take a stab at doing other things, building collections in other ways. And I also want to call attention to the Endangered Data Week effort that DLF is supporting, that we're co-sponsoring, which is again another way to do events. My experience with running these events, and seeing what happened in communities around raising awareness of what this means, has been so empowering. It has been amazing to see our libraries overrun by people who are there not just to learn but to actually take action. So once again, I would really encourage people to try to think of ways that people can help with this work, because they want to. And it's a great way to help them learn about what's important and get better at data management themselves. Thanks, Lori. That's a great point too, because so much of the work that's been happening has inspired a larger part of our community to be engaged in these issues.
And as we move from building these workflows and collections to both sustaining and changing the way that we as libraries are managing and thinking about this content, and then also raising our voices when it comes to the policy side, that's a bigger group of people who can help us do that work. And as Jim and James have pointed out, in the long run that is going to be crucial, both for access to what we're working on right now and, of course, for making things better in the future. So thank you to all three of you for bringing these ideas. I'd like to open it up to the whole group to see if there are questions or discussion points that you'd like to raise, either for individual members of the panel or for the group at large. There was one question that came in. She was saying that duplication of effort is good; however, the first question from an administration is, is this data available elsewhere? So in terms of administrative support, how do you make a case for duplication? My experience here has been, first of all, that there isn't great duplication. Certainly it's worth checking ICPSR and checking other big repositories. But right now the problem is so incredibly huge that if data is needed by a community, you just aren't in a position to say it's already covered. From my perspective, except perhaps for ICPSR, nothing out there is so safe that no one needs to worry about it. And this goes for Data Refuge too: I'm very proud of the work that we've done, and I think we've backed up some really, really important things, but we captured things at a particular moment. And so my experience in convincing my administration? It hasn't been a convincing issue at all. It's been a question of: we are working with faculty who care about this data. They want to see it backed up. It is true that it is vulnerable. And so making the case that it's vulnerable is easy, because it's true.
It's not that it's vulnerable for solely political reasons, but that's okay. It's still vulnerable. It's vulnerable for all the reasons that James and Jim were talking about earlier. And so for us, the fact that this has been a project in collaboration with the Program in Environmental Humanities here, that it's been a project in collaboration with scientists and researchers, has made that a lot easier for us. This is Jim. I'll say two quick things about it. First of all, redundancy is good. So you don't want to duplicate effort, but you do want to have more than one copy of this. As James was pointing out, the End of Term archive is going to be stored in at least three different locations. That's a good thing. You do want to avoid redundancy of effort in getting the stuff. But that leads me to my second point, which is that this isn't just about getting the stuff and storing it. It's about discoverability by users and usability by users. Different users will want different collections of information for different reasons, and they will search for information differently, and they will use the information that they get differently. And that means that in some cases we will actually need the same information in different collections. Some of that can be done through APIs and interfaces to a single stored file or group of files. But some of it will mean that we want to build collections and organize collections for communities of users. And that may mean that we'll have the same information in more than one of those collections. And that's okay. That's a good thing. I think there's also a really good argument to be made in terms of building local capacity for certain kinds of work.
So if we are participating in these larger joint efforts, whether it's taking a little piece of a very large project or even learning the same steps that are being used in a workflow that's been developed somewhere else, that improves our ability to then capture and preserve and provide access to content that no one else is going to be managing. In a lot of cases, for research libraries, this may be local government information: data from local organizations, local governments, and other kinds of content where we are the primary interested party and the only potential stewards for that content. And just one last point. The argument that I always make is that format doesn't matter. If a library's users are interested in a book, you're not going to say, well, they can get it on Amazon or they can get it through interlibrary loan. If your users need a book, then it behooves you as a librarian to collect and give access to that book by putting it in your catalog. You don't know whether your users will find that book in your catalog, or through a general web search or some other kind of search, or through a friend. And so having more avenues of access is always a good thing, whether it's a book or whether it's a dataset. Oh, and I'd like to add one more thing very quickly. If you're in a conversation with an administration that's saying, I don't want you to do this because somebody else might do it, there's a second tack that you can take, which is that collaboration is important between everybody that's doing this work. So if your administration is saying, I don't want you to start getting stuff because somebody else might get it, what your administration might support is your work on collaborating to make sure that there's no unnecessary redundancy in identifying things, and a way to make sure that there's sharing of the experimentation of getting stuff.
A way to make sure that libraries are working together collaboratively and not individually. And your administration might support that. And I'm just going to add one tiny thing, which is, if someone says, how do we know this hasn't already been done before, and you've done a kind of cursory search, chances are your cursory search is better than your users' cursory search. And so if you didn't find a safe, citable, backed-up version of this in a collection other than the original, then chances are your users won't either. That's kind of an easy case: I didn't see it out there, so they won't either. So we have a question about whether, if a group wants to work on preserving data specifically, there is a step-by-step guide to get started, or specifically how a group can get started on something like hosting a data rescue event. Sure. I put the link in there to the data rescue page, which is just at ppehlab.org; follow the link to the data rescue page. And there's lots of information there about how to host an event. What I would say the first thing to do is make connections within your community, whether inside the library or outside it, and make sure that you have a broad set of collaborators who agree about what they want this event to be and what's important about it. The URL's there, and I will paste it again. I also put in a link to the way an event in Portland worked. It's on GitHub, which might not appeal to some, but it's actually a really cool example: they held a data rescue event where, instead of harvesting data, they had people make metadata about datasets that exist, which is, I can say from lots of conversations with people inside government, something that they would really like some help with.
And we also have some blog posts up at libraries.network, which I'll put up, some of which are written by James and Jim. And we would really welcome more. So if you feel like you know a new way of tackling this problem, we would be really excited to hear about it and post it to the website. We're trying to gather information about what folks are doing and working on right now, a little bit of a who's-doing-what in terms of the efforts that are going on. I hope that answers that. But yes, we have a really specific workflow, exactly what to do, that's linked from the Data Refuge page. Okay. It looks like we are out of time here. I want to thank everyone for attending and for listening, and also for your interest and excitement about this work. And particularly thank you to Lori Allen and Jim Jacobs and James Jacobs for being willing to talk about this work today, and to Linda Kellum for hosting us. We look forward to working with you, and please send questions to any and all of us. We're always happy to talk about this some more. Thanks. And we'll post some links in our slides over on freegovinfo.info.