Let's get started. Thanks so much, everybody, for coming today. We're going to talk to you a little bit about a project that we've both been working on to plan a community-created data rescue toolkit. It's part of an IMLS-funded planning grant. I am Mara Blake; I'm the manager of data services at Johns Hopkins University, and I am the PI for the planning grant. And I'm Katie Micah; I'm the data services librarian at the University of Colorado. I joined the grant a little bit late, but was able to attend the meeting in September, so we'll be presenting together.

Today what we're going to do is walk you through a little overview of data rescue, although I imagine many folks in this room are familiar with it. Then we'll talk a little bit about the planning grant that we've been working on, funded by IMLS. I can only stress that we are in the middle of that planning grant, so this really is a great opportunity for us to welcome your feedback, and many things are still in development. Then we will go into a project that came out of a planning grant meeting that we held, to develop a data rescue nomination tool. And then we'll talk about where we're going next, what our next steps are, and open it up for your questions and topics for discussion.

All right, so I'm going to start by giving some context to what we've been working on. In 2017 there were some changes to federal data retention and public information publishing practices, and that exposed substantial obstacles to long-term digital preservation of, and meaningful access to, government data and information. In response, a number of universities and organizations hosted hackathon-type events to archive web pages and datasets that were identified as particularly at risk, and to highlight the vulnerability of public access to that data. This is not a new problem; if there are any government information librarians
here, I'm sure this will be very familiar to you. It is vital to engage a variety of stakeholders in addressing the inherent difficulties in maintaining access to, archiving, and preserving public information, records, and data.

So, these data rescue events. These are some oddly colored photos of some of the data rescue events that happened around the country. They began at the University of Pennsylvania in early 2017 and focused largely on data and information relating to climate change and other environmental research areas, but that quickly expanded to other universities and communities with diverse but just as immediate data and digital preservation needs. Penn's Program in Environmental Humanities, in collaboration with EDGI, the Environmental Data and Governance Initiative, and other stakeholders including the Internet Archive, developed a methodology for hosting events that get data stewards, scientists, librarians, and the public involved in single-day hackathon events designed to motivate participants and accomplish a significant number of activities in a short amount of time. These events were organized using common activism strategies, like advertising and registering participants through social media, and they gathered people together to allocate expertise and computing power to archive particular websites and datasets.

Participants at these events self-selected into different groups of researchers, harvesters, checkers, baggers, and describers, to seed or nominate websites that were going to be scraped into the Internet Archive. The Internet Archive's web crawler often can't capture some types of datasets, especially those that might have sensitive or protected information or are structurally very complicated. To address this, part of the data workflow included uploading datasets to decentralized data infrastructures, like mirrors if you're familiar with that technology, which store the data redundantly on a lot of different servers around
the world.

Since 2017, event workflows have expanded to include options for contributing metadata to the Internet Archive's End of Term project, cleaning metadata that has already been scraped, teaching web archiving skills to expand digital preservation operations, outreach and education, documenting and storytelling, creating or contributing to citizen science projects, hosting Wikipedia edit-a-thons, and archiving event details. All of these activities contribute to the general initiative to keep public information publicly available.

Open science philosophies identify public data as a public utility, but data archiving and preservation is a praxis that is difficult at best. How many people here know of a lab or a researcher, or are themselves a researcher, with a pile of undocumented data living on a hard drive somewhere? It's a very familiar problem. As data librarians, a lot of us understand that threats to long-term preservation and access are very common, almost standard, parts of research labs and organizations. Providing meaningful access and ensuring compliance with FAIR data practices is not easy, and often takes a team of librarians and data stewards to create effective storage, access, and archiving solutions for researchers at big R1s. But what if you're not at an R1? What if you don't have data curation services at hand? What if you don't have an extra hour a week to learn how to do that yourself? What if your department loses a necessary funding stream that paid for your storage solution? What if your grad student left and didn't tell you how he did anything?
All of these are ways that data begins to become at risk, and data that are at risk are at risk of being lost. This includes any data that are not easily accessible, have been dispersed, have been separated from the research output object, are stored on a medium that is obsolete or at risk of deterioration, or were not recorded in digital form; it also includes digital data that are available but not usable, because they have been detached from the supporting data, metadata, and information needed to use and interpret them intelligently. Rescuing data in these circumstances is not a political exercise; it is a necessary function and responsibility of a research community. Data rescue has grown to include volunteers and professionals dedicated to the stewardship of public data regardless of federal administration.

This community toolkit that we're going to be talking about represents an opportunity to engage the public, researchers, and librarians in identifying why and how data becomes at risk, and what digital preservation and archiving steps can be taken to ameliorate those effects. Much of the easily discoverable and extremely endangered information and data has been captured through a lot of these data rescue events, but the mission is far from complete. Identifying, maintaining, and preserving at-risk data is a long-term project that will not likely ever end. This data rescue community toolkit aims to support creators and users of public data by providing workflows and resources for engaging in this type of work. Thanks so much, Katie.
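As a concrete aside on the seeding step described above: the Wayback Machine exposes a public "Save Page Now" endpoint at web.archive.org/save/, and nominated URLs can be submitted to it one at a time. The sketch below, in Python, only builds the requests rather than sending them; the function names and the User-Agent string are our own illustration, not part of any official client or of the event workflows themselves.

```python
from urllib.parse import quote
from urllib.request import Request

# The Wayback Machine's public "Save Page Now" endpoint: a GET request to
# /save/<url> asks the Internet Archive to crawl and archive that page.
SAVE_ENDPOINT = "https://web.archive.org/save/"

def save_page_now_request(url: str) -> Request:
    """Build (but do not send) a Save Page Now request for one seed URL."""
    return Request(
        SAVE_ENDPOINT + quote(url, safe=":/?&="),
        headers={"User-Agent": "data-rescue-seeder/0.1"},  # hypothetical client name
    )

def seed_nominations(urls):
    """Turn a list of nominated URLs into ready-to-send archive requests."""
    return [save_page_now_request(u) for u in urls]

reqs = seed_nominations(["https://www.epa.gov/climate-indicators"])
print(reqs[0].full_url)
```

Actually sending each request (for example with `urllib.request.urlopen`) would trigger the crawl; a real seeding run would also need rate limiting and error handling, and, as noted above, the crawler still can't capture many complicated or protected datasets.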
So now we're going to turn our attention to the planning grant that we've been working on, funded by IMLS, and where we're at with that. This planning grant came out of a meeting that was hosted at Johns Hopkins in the summer of 2017, and you can see there's an OSF page that has documentation about this meeting. It happened in response to what Katie was describing: all these events were hosted around data rescue that gathered data, but then the question becomes, what's next? How does this become sustained? There's really well-documented information on how to host an event, from folks like EDGI and folks from Penn, but how do people enter into this and sustain it long-term? That's what this meeting started talking about: the idea of a data rescue toolkit that would provide a centralized entry point for people who are interested but have different perspectives. It led to the submission of a one-year planning grant to IMLS, which was awarded, thankfully, for the purpose of engaging with the communities working in this area to work through what a toolkit like this might look like.

The main part of our work so far has been to host a stakeholder meeting, which happened just this past fall, a couple of months ago, to bring those people together and talk more about what this toolkit would really look like and who would use it.

This is also not the normal photo, but a sort of artistic interpretation of who attended this meeting that we held. You can see some of the folks who attended; we had 26 people attend. Libraries were well represented, academic research libraries, no surprise, but we also had representatives from liberal arts college libraries and public libraries, and, I think it's important to note, existing groups of people from libraries already working together around a particular data type, so an existing entity of some kind. We also had data archives represented, data producers or data custodians, tech companies, and public advocacy
groups. So we brought together a lot of different kinds of people to work through what this toolkit might look like, such that it would really be of value for the people working in this area.

During our meeting we went through an exercise to work on the vision of what this toolkit would be, so that we could start from a shared place. Once we had that nailed down to some extent, we worked through what the components of the toolkit would be: what it would include, or what it could include, and how we would narrow down what it could include to what it will include. Then, to further refine things, we went through an exercise to think of potential user profiles and user scenarios to help us think through how the toolkit would be used.

This is the vision statement that we came up with at the meeting. Again, it's not set in stone, but it emphasizes a couple of important things, which I'll highlight rather than read verbatim: the idea that there are both social and technical issues to this problem, and both are important; that we have a mission to identify, collect, describe, and preserve, but also to provide access to, data; and that we want to have an open, collaborative process and build on existing expertise.

Since we've held this meeting, I've started to reflect on some of the issues associated with it, and I've tried to think of each topic as both an opportunity and a challenge. So this is both sides of the coin on a couple of things we've been working through. We have the opportunity that this is a really important issue; many of us rely on federal data, as do our researchers and the public. It's a widespread issue. The flip side, of course, is that it's a really big, scary problem and can feel overwhelming. We also have the opportunity that a lot of groups and individuals are very committed.
They're very engaged. But we have the challenge of coordinating different experts' expectations and styles. For the next project you'll hear about, we've been working with a tech company, and they have a really different timeline and sense of pace than academic libraries. They're really important partners for us, but we have to work through some of those issues to work together effectively. And then we have this great opportunity to build collaborations and partnerships with other groups working on this topic, along with the challenge of avoiding duplication of effort, or confusion over terms that are similar but don't quite mean the same thing, or cases where we're using the same term differently. So those are some of the things we're working through.

In our meeting, and in its aftermath I'll say, we identified some big questions that we don't have complete answers to, but that are interesting to think through. One of the questions was: what do we mean by data, and what do we mean by government? Do we mean structured data? Do we mean all information? A lot of times in this space we're talking about data produced by the federal government, and a lot of times we specifically mean data produced by the federal government that is about the environment. But many participants in our meeting expressed concern for local and regional data that might be at risk for less sensationalized reasons but have some of the same needs, and asked whether we could incorporate those needs into our work; in our mission statement, we chose to do that.

Our next question is: who should the toolkit be for? Can we effectively make one thing that is for everyone? And if we can't do that right away, can we scaffold or iterate our implementation?
So we start with something smaller that addresses some needs and build on the success of that to expand out. Our fourth question: who is responsible for this in the longer term? It's kind of a big, scary question if we think back to our opportunities and challenges. And then we had a really interesting discussion about the term "data rescue," which is really well known, so there are a lot of advantages to using it. But we had a really engaging discussion in our meeting about how this term might not be the most inclusive for many of the folks we want to work with in government agencies, who are working very hard to preserve information; the idea that we need to swoop in and rescue it from them is perhaps not the best way to build partnerships. So what term could we use instead that is still recognizable and identifies the problem, but invites more collaboration with our needed partners on this topic?

So what's next? The main deliverable from our planning grant will be a white paper that lays out what the toolkit might look like, possibly in stages. It will include the different components we identified as well as the user profiles, because those, I think, are really helpful. That's the main thing, but items two through four are important components of our project as well: outreach at conferences and professional meetings like this one; an OSF space to accompany the white paper, with our notes and resources available, really supporting our goal of having an open process; and a plan for implementation.

I wanted to take a minute to talk about why it's important for libraries to be involved in this, since this meeting is mostly for libraries. We have
this great opportunity to build connections across groups who don't traditionally work with each other, and we're a nice, relatively neutral connection between some of these stakeholders. It's also an opportunity for us to leverage our existing work with data: many of us have experience with data management, data preservation, and data archiving in our institutions, and we can apply it to this problem. This is a widespread problem that transcends our institutional boundaries, but it can also often be a way to address our local needs, where we have researchers who are interested in this data and we can incorporate that into our local work while still supporting the bigger need. And compared to some of the groups who are working in this area and very invested in it, we have a lot of organizational stability, especially as research libraries, and that can help sustain these projects. Related to that is our ability, if we so choose, to invest human time and money into the problem.

So we wanted to highlight a specific project that came out of our stakeholder meeting, which was the creation of a data rescue nomination tool. This came out of a side conversation at the meeting: there are some tools where, if you know a dataset, you can archive it yourself, like DataLumos, and take responsibility for it, but there's not a place to say, "I know this dataset needs help, but I can't do it; I need somebody else to do it." That was the impetus for this tool, and Joan at Cloudburst volunteered Cloudburst to develop it; that development is in progress. The idea behind the tool is that someone at an agency, or a researcher, identifies a dataset that needs rescuing and that they can't handle themselves; they submit it to the tool, and somebody else volunteers to work on it. So Katie is going to take us through where the tool is now.
We've got some oddly colored cats here, but essentially we're coming at it from three essential roles that we imagine this nomination tool would be useful for. The first one is the data submitter, which, as Mara just mentioned, is someone who can identify a data package as in flux or at risk in some way, one that might need an additional preservation place to go, a repository to end up at, or some curation and conservation work done on it. Then we split up the archivers, or the preservationists, into two roles, data minions and data heroes, when we were on a conference call trying to figure out how to differentiate these roles. Essentially, a data hero would be somebody who has some kind of credential, logs into the system, and is able to identify the types of projects they're interested in working on and any expertise or technical skills they have and are willing to contribute; they are not necessarily an anonymous user. A data minion, in this case, can be somebody who operates within the system, can be anonymous or have a username, and may want to join in some of the activities but not necessarily put their name on something or say that they are taking full responsibility for it.

Essentially, the way the cats are playing is how we imagine this working: a request gets submitted, and somehow it gets approved.
It gets reviewed in some way by either a hero or a minion, and they sort it into a couple of different buckets of what things need to happen. Heroes can then search for and select things to work on. There are some ideas of where status checks are going to be; things can be checked out and checked back in, in some cases. But essentially, at some point it needs to be approved by the person who submitted the original request, and then our big star full of toys over there means we have a beautifully, perfectly packaged, wonderfully archived and preserved version of the at-risk data. That may or may not be realistic, but the nomination idea is something that was expressed as necessary by a lot of the people who got together at our meeting.

This is another view showing the workflow that we're currently working with, which Cloudburst is operating off of. It's pretty general, and I want to be able to solicit some feedback in our last five minutes, so I'm just going to skip past some of this. This is what it looks like right now. We are in active development, in a pre-alpha phase I would say; we're trying to give it some data, see what's going to happen, and see if any of this works.

And then just some final notes: we see this tool working in tandem with the toolkit that gets developed, where an individual or group finds the toolkit, understands what this data rescue community has been working on,
and finds this tool as a stepping stone, a place to start with data preservation for a data package. These submitters recognize that they might not have the tools, the expertise, or the resources to do what's needed, but through this toolkit, and through this tool, they discover a way to connect with experts who do have those resources. From the other end, the toolkit and this nomination tool become a resource for people who are interested in helping out but don't have time to go to an event. I haven't seen the events happening recently, so there's not really a big mission surrounding this anymore; this gives people a way to contribute. And we imagine that it will function as somewhat of a hub for actual activity that supports the use of the toolkit and continues some of these data rescue events, or data preservation and archiving activities.

And I'll add that I attended the Preservation of Electronic Government Information, or PEGI, meeting before this conference, and we had a really interesting discussion about what might be the social infrastructure that supports a tool like this, and how we would work on that part of it.
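To make the flow Katie described a bit more concrete, here is a rough sketch of a nomination's lifecycle as a tiny state machine: submitted, reviewed, checked out by a hero, checked back in, and finally approved by the original submitter. The status names, the approval rule, and the example dataset name are all our own assumptions for illustration; the actual Cloudburst tool is pre-alpha, and its data model may look nothing like this.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple

class Status(Enum):
    SUBMITTED = "submitted"      # a submitter nominates an at-risk data package
    REVIEWED = "reviewed"        # a hero or minion sorts it into a work bucket
    CHECKED_OUT = "checked out"  # a hero is actively working on it
    CHECKED_IN = "checked in"    # work returned for the submitter to inspect
    APPROVED = "approved"        # the original submitter signs off

# Legal moves in the workflow; a checked-in package can be approved or sent back.
TRANSITIONS = {
    Status.SUBMITTED: {Status.REVIEWED},
    Status.REVIEWED: {Status.CHECKED_OUT},
    Status.CHECKED_OUT: {Status.CHECKED_IN},
    Status.CHECKED_IN: {Status.APPROVED, Status.CHECKED_OUT},
}

@dataclass
class Nomination:
    dataset: str
    submitter: str
    status: Status = Status.SUBMITTED
    history: List[Tuple[Status, Status, str]] = field(default_factory=list)

    def advance(self, to: Status, actor: str) -> None:
        if to not in TRANSITIONS.get(self.status, set()):
            raise ValueError(f"cannot go {self.status.value} -> {to.value}")
        # Final sign-off belongs to the person who made the original request.
        if to is Status.APPROVED and actor != self.submitter:
            raise ValueError("only the original submitter can approve")
        self.history.append((self.status, to, actor))
        self.status = to

nom = Nomination("hypothetical_air_quality_archive", submitter="agency_staffer")
nom.advance(Status.REVIEWED, actor="minion_anonymous")
nom.advance(Status.CHECKED_OUT, actor="hero_with_login")
nom.advance(Status.CHECKED_IN, actor="hero_with_login")
nom.advance(Status.APPROVED, actor="agency_staffer")
print(nom.status.value)  # approved
```

The two `raise` statements encode the rules from the talk: statuses only move along the diagram, and final approval is reserved for the person who submitted the original request, while heroes and minions (named or anonymous actors) do the intermediate work.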
So I'm really excited to hear more of your thoughts about that. Our next steps are sharing a white paper this winter, initially with our meeting team for feedback; once we get that feedback and incorporate it, we will share the white paper for public comment. I would invite everybody attending here to look out for that. We would really welcome your feedback and your thoughts, in this meeting but also on the white paper once it's out for review, and we will use all of that to plan for implementation.

We have a couple of thank-yous to make: obviously, first and foremost, IMLS for supporting this project; everyone who attended our stakeholder meeting; the team working on the nomination tool, in particular Joan and everybody at Cloudburst; Reed Bame, who's the project manager for this grant; and Ruth Duerr at the Ronin Institute. So thank you so much to all of those people, and thank you for attending this presentation.