Building a Path Forward to Sustainable Digital Preservation: The Genesis of Digital Preservation Leadership Across the UC System
Hello everyone, thank you for attending our talk. My name is Edson Smith from UCLA, and I'm here with my colleagues, Sybil Schaefer from UC San Diego and Hannah Taschen from UC Berkeley, to give a talk entitled Building a Path Forward to Sustainable Digital Preservation. Today, we'll talk about recent developments in digital preservation in the University of California system. A group from the UC presented on this topic at the winter 2019 CNI conference, but there's been significant progress since that time, so our primary focus will be on the events of the last two years. For those of you who aren't familiar with the University of California system, we're the largest public university system in the world. There are over 500,000 students, faculty, and staff spread out over 10 campuses, half in the northern part of the state and half in the southern part. We may work hundreds of miles apart, but the one thing that unites us all is our collective proximity to the San Andreas Fault. Historically, the 10 UC schools have functioned with relative autonomy with respect to each other, with inter-campus cooperation being more the exception than the rule. However, the UC's research libraries have traditionally maintained fairly close ties and have collaborated on a number of system-wide efforts. Most recently, the UC system undertook a four-year project to combine all 10 campus ILSs into a single massive integrated system, which rolled out this year to much fanfare. Additionally, the UC libraries have established system-wide committees to provide collaborative leadership and strategic planning for the system as a whole. If today's talk has a theme, it's how a UC working group evolved over several years into a full-fledged leadership committee responsible for building system-wide governance for digital preservation.
In 2018, the UC's Direction and Oversight Committee created a digital preservation working group with an initial five-fold charge: conduct a survey of current best practices in the digital preservation field, survey the digital preservation landscape at each of the 10 campuses, identify and interview exemplars in the field with established digital preservation programs, articulate the gaps between our current practices and those of the exemplars, and finally draft a charge for a future Phase II group. As I briefly mentioned before, the Phase I working group presented their findings at CNI in 2019, so I'm just going to recap our results briefly and then let Sybil and Hannah talk about more recent developments. Like any good survey, we built a broad list of questions and conducted interviews. Representatives from all 10 UC campuses and the California Digital Library were interviewed, as well as 12 exemplar organizations. After we went through all the interviews and distilled the collected information down to its essential components, we came up with a number of conclusions, which could be categorized into three areas: there were three tiers of maturity in the campus digital preservation systems, there was significant agreement on the technology issues involved, and our challenges going forward were largely non-technological. So based on our interviews, we found the UC campuses fell into three broad tiers of digital preservation activity. There are two organizations in the top tier. The first is the California Digital Library, the CDL, which operates through the UC Office of the President. Their core mission is to provide a broad suite of centralized services to research libraries within the UC system. CDL's preservation component is Merritt, which is a CoreTrustSeal-certified repository. The other top-tier digital preservation program is Chronopolis, which is run out of UC San Diego.
Chronopolis is a dark archive offering bit-level preservation. It's TRAC certified and partners with several large institutions, including national labs and statewide digital libraries. At the other end of the spectrum are campuses that may have some local services, but which are mainly consumers of digital preservation services provided by the top tier. In the South, both the Riverside and Irvine campuses fell into this category, and in the North, the newest UC campus at Merced did as well. As a practical matter, these schools received the bulk of their digital preservation services directly from CDL. They'll typically use CDL's DAM system and, as resources permit, will use CDL's Merritt system for digital preservation. But what was really eye-opening for the committee were the findings regarding what the schools in the middle tier were doing. These are some of UC's largest campuses, including the flagship campuses in Berkeley and Los Angeles. In each case, we found these schools all had some degree of legacy digital preservation services in use and were experimenting with standing up new systems, but none had achieved any significant successes so far. In our interviews with ourselves, a number of common themes emerged for the middle-tier campuses. At the time, in the 2018-2019 academic year, digital preservation was still largely aspirational. No one was close to implementing practices that could be certified. Where digital preservation was practiced locally, there were large gaps between existing practices and those of the exemplars we interviewed. Next, we found that no one was working together. Each of the middle-tier campuses decided to pursue digital preservation largely on their own without really considering building relationships with sister campuses. Also, we discovered that the responsibilities for digital preservation were anything but uniform.
Depending on where you were, the cognizant folks might be archivists, IT staff, research data librarians, or born-digital specialists. Additionally, another important discovery was that there were a significant number of legacy systems in use that needed to be replaced before there could be standardized workflows. This is expensive and difficult work. Data migration is hard and frequently not a priority for libraries. What's more, where digital preservation efforts were in place, both staff and economic resources were very limited. In most cases, digital preservation was a side job for someone in another role. And everyone involved was deeply concerned about the cost, especially for long-term, preservation-grade storage. With regards to technology, while there were still definitely unsolved technical problems in digital preservation to be addressed, for the UC system, that's not where our challenges were. There was general agreement that the existing OAIS model is sound and appropriate and provides a robust reference model for us to build upon. What's more, it was observed that there are lots of standards and practices in place, and those continue to evolve as technology moves forward. We also discovered that the widespread concern about data being in danger of spontaneous corruption while at rest was largely no longer a problem. With regular fixity checks, multiple file copies, and solid remediation techniques in place, this had become largely a non-issue. Surprisingly, at the time of the survey, the majority of campuses were storing most of their digital content locally. This was somewhat unexpected, as even in 2018, cold cloud storage was more reliable and sometimes cheaper than local storage. Our best guess for this is that we are inherently conservative organizations and that this was a carryover from a previous era. But in any case, this lack of geodiversity represented a gap which should be addressed.
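As an illustration only, the combination of fixity checks, multiple copies, and remediation mentioned above can be sketched in a few lines of Python. This is a minimal sketch and not a description of how Merritt, Chronopolis, or any campus system works: the function names `sha256_of`, `check_fixity`, and `remediate` are hypothetical, and the sketch assumes each file has a SHA-256 checksum recorded at ingest and several copies reachable as ordinary file paths.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(copies: list[Path], expected: str) -> dict[Path, bool]:
    """Report, for each copy, whether it exists and still matches the recorded checksum."""
    return {copy: copy.exists() and sha256_of(copy) == expected for copy in copies}


def remediate(results: dict[Path, bool]) -> list[Path]:
    """Overwrite every failed copy from the first intact copy; return the repaired paths."""
    good = [path for path, ok in results.items() if ok]
    bad = [path for path, ok in results.items() if not ok]
    if not good:
        # With no intact copy left, automated repair is impossible.
        raise RuntimeError("no intact copy available for remediation")
    for path in bad:
        path.write_bytes(good[0].read_bytes())
    return bad
```

In this sketch a scheduled job would call `check_fixity` for every file and pass any failures to `remediate`. Keeping the copies in different locations is what makes it unlikely that all of them fail at once.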
Finally, perhaps the biggest impediment to scaling out digital preservation at the campus level was the cost of storage. At the time, preservation services cost $300 per terabyte per year or more, with no provisions for services beyond the 10 to 20 year time frame. For many folks in charge, this was too massive a barrier. So to summarize, the phase one digital preservation working group was tasked with doing a survey and with surfacing gaps, and we did that. Perhaps the most vexing issue every campus faced was the relative paucity of resources available. In many cases, staff sizes were small or comprised of people who had other jobs. Outside of the two certified repositories, no one employee had digital preservation as their primary job responsibility. What was particularly revealing was that many of the required skills were out there in the system, but they were spread out among the campuses and we were not collaborating. As we expected to find, at the system-wide level our resources were limited, and our future lies in cooperation and collaboration. Having a certified preservation repository at each campus is not the path forward. But to our surprise, we discovered that while we did have some technology issues to address, they weren't overwhelming. Instead, the real gaps that needed attention were in the areas of procedures, policies, workflows, and most especially, the need for a well-articulated system-wide governance model. This all became glaringly obvious in retrospect, but if we are to succeed at digital preservation as a system, we're going to have to agree to work together and focus on shared resources and services. Okay, what's past is prologue. I'm now going to turn things over to Sybil, who will talk about her work in the phase two incarnation of the working group.
I had to unmute, of course. Hi everyone.
So during phase two, the digital preservation strategy working group was subsequently charged with finding out how much and what kind of digital materials are stewarded by the UC libraries. And also with identifying where it may be helpful to collaboratively steward particular content types. To do this, we developed a taxonomy of content types and conducted interviews with stakeholders from across the UC system in order to develop a clear understanding of where the system stands today. The survey took the form of semi-structured interviews that included both quantitative and qualitative questions. All in all, we conducted 34 interviews with 44 data stewards across all 10 campus libraries and CDL. The data stewards represented a variety of departments and job titles. And in addition to providing a comprehensive inventory and big picture snapshot of digital content held system wide, the survey results also highlighted key gaps and challenges in the current UC approaches to digital stewardship. Before we delve into some charts breaking down the content held across the UC system, I wanted to briefly review the categories or taxonomy we decided to apply to the materials. This taxonomy is largely adapted from the Library of Congress's list of content types which is linked at the bottom. The list breaks down textual works, still image works, audio works, moving image works, software and electronic gaming and learning, data sets, web based works, geospatial and artifacts. So in the next slides when you see some breakdowns of these different things, you'll kind of see what the actual categories were or have an idea of what the categories were. And although this is a really library centric way of classifying the data, as you may imagine, not all campuses had their materials readily classifiable in this manner. And so the following results are really the best estimate that we could derive given the data we were provided from each campus. 
So here we see the file count for each category across the system. These numbers are an estimate as of spring of 2020, when we conducted the interviews. You see we have over one billion web-based works. File counts for web-based works are primarily based on archived document counts for each campus, and document counts are defined by the Internet Archive as any file on the web that has a distinct URL. So they include images, PDFs, videos, articles, et cetera, that are all linked to from individual pages. One thing to note when discussing the file count is that we're counting by files and not objects. As many of you know, an object may consist of many files and its associated metadata. Given the variety of systems used across campuses and the different ways objects are handled in each, we decided that individual files would be an easier metric to count. We do realize that this is not necessarily an ideal way to count, given the interdependence of certain file types. And additionally, we also have a fairly large "other" category that represents unprocessed or otherwise unclassified materials. So if we switch from looking at file count numbers to the overall size of content divided by taxonomy type, you'll see that moving images are by far the largest content type in the UC system at over three petabytes. And it should be noted that the vast majority of moving images are held by UCLA, specifically the Film & Television Archive. And the last chart we have shows the content breakdown by campus and by size. Here you can also visually see the size of UCLA's moving image collection. UCLA by far holds more content than any other campus in the system. Berkeley comes in second, followed by UC Santa Barbara and then by CDL. And the CDL inventory is really the system-wide collections, like the system-wide HathiTrust collections and other system-wide collections in Merritt, like eScholarship and the OAC.
So after reviewing the survey interviews and corresponding data counts, phase two of the Digital Preservation Strategy Working Group listed the following findings in the report. Overall, the UC library system stewards approximately four petabytes of digital assets, and approximately 92% of that material is not in a preservation repository and thus is at risk. Also at risk is unprocessed and unknown content. Knowing what you have is really one of the first steps in preserving it, and the majority of campuses reported having content that was either unprocessed or otherwise unknown. Most UC campuses have PHI or otherwise sensitive data in their collections, which does warrant special considerations. We found that we effectively employ partnerships with certain third parties for certain content types, like HathiTrust for monographs and the Internet Archive for web archive materials. One of the more qualitative findings we had was a lack of articulation regarding the selection of assets and formats with long-term value, and along with that, an absence of policy documentation. Echoing the findings from phase one, the campuses again reported insufficient staffing, ineffective organization, and a lack of training as barriers to establishing a solid digital preservation program. The interviews also highlighted that the siloed nature of departments or programs within libraries prevented advocacy and leadership for preservation. Overall, the phase two working group found that to move towards progressively stewarding digital assets collaboratively as a system, each individual campus first needed to more effectively organize and coordinate their digital preservation activities. So next up we have the recommendations that were directly informed by the findings that I outlined in the previous slide. First, we recommended that the Council of University Librarians explicitly list digital preservation as a fundamental component of its strategic plan and priorities.
This prioritization is what originally led to the formation of the digital preservation strategy working groups, and we wanted to make sure that the commitment was continued, as we realized that what we needed to do was really a long-term process and not something that can be solved in a few months or even a year. So to that end, we also recommended changing from a working group model, which is charged for a brief period of time, usually under a year, and which has start-up and wind-down times associated with it, to a standing group of practitioners that could coordinate UC-wide efforts. We recommended some key efforts this group could tackle, and Hannah will be talking more about those efforts in slides to come. We also recommended that each campus designate staff members to oversee coordination of digital preservation activities. Ideally, this would be someone who is exclusively dedicated to digital preservation. This recommendation was really meant to underscore the importance of each campus getting its house in order so that it can participate effectively in future system-wide digital preservation endeavors. As Edson mentioned while talking about Phase 1, the real gaps that need attention are in the areas of procedures, policies, and workflows, and those take dedicated staff time to develop. The Phase 2 report was really well received, but there was a hesitation to create the recommended standing group without further detail on how the group would operate and what it would work on. So there was an intermediate phase, Phase 2.5, which was really a planning phase to bridge the gap between the working group iterations and the proposed standing leadership group. This phase really laid the groundwork for how the leadership group would work, including how the membership would be formed, timelines, the scope of work, and deliverables.
At the end of Phase 2.5, the request to instantiate a standing leadership group was approved, and I'll now turn it over to Hannah to discuss the leadership group in more detail.
Hello, everyone. Thank you, Sybil. I am going to talk about the Digital Preservation Leadership Group, or DPLG, and our plans going forward. With the recommendation of the UC Digital Preservation Strategy Working Group, the DPLG was charged by the Direction and Oversight Committee to lead the University of California in the area of UC-wide digital preservation. This is a standing group that spearheads future collaborative efforts, provides strategy and expertise, and acts as the guiding body for information dissemination and sharing across the UC system. Our starting point as we coordinate and steer UC-wide digital preservation policies, strategies, and actions is to address the barriers surfaced by the earlier working group. These barriers include the lack of training, siloed and ineffective organization of digital preservation activities, and a lack of guidance around how to store different types of content that not only exist on a very large scale, but may also have sensitive information requiring certain protocols around storage, security, and access. Thus far, we have formed three subgroups to focus on these barriers. Our first group is focused on education, training, and developing a common understanding of digital preservation so that digital preservation activities can be integrated with our organizational structure. Essentially, we want to operate on the basis of agreed-upon best practices and have a good understanding of the options that are available within the UC system and which are most suitable depending upon our varying needs. Given that digital preservation involves a series of managed activities that encompass multiple areas of skill and expertise, we must recognize that it is therefore highly collaborative and requires communication across departments and roles.
To do this, we need to build a shared understanding of digital preservation, not only across campuses and among practitioners, but also throughout the different levels of leadership, management, and roles. We are in the process of developing a digital preservation training program for staff in the UC system who are in some way connected to digital preservation. This training program will incorporate campus needs that were identified in the earlier phases of the working group. We will tailor training and digital preservation topics to a range of UC audiences and assess external experts and training programs. Given the large scale of our digital collections and the range of content types held throughout the UC system, we need guidance and tools to assist in planning our workflows. Our second group is developing a matrix for assessing content types to assist campuses in stewarding our collections and to identify and safeguard those collections that are truly needed for long-term preservation. This matrix, or rubric, is intended to identify the content types, use case characteristics, and privacy needs that shape preservation requirements. We will provide preservation approaches and options and select best practices for the UC system that also allow for the individuality of each library's collecting policies and practices. Our third group is establishing a framework for administering and facilitating cross-campus engagement with external collaborators and consortial partnerships, especially with regards to grant opportunities. As a starting point, this group is conducting a survey to gather documentation about local campus practices, procedures, and people involved in grant administration and resource sharing at individual campuses. From there, the group will lay out and articulate the existing practices for cross-campus grants and UC-wide partnerships or memberships.
The goal is to create a useful network tool for libraries to create partnerships and more strategically pursue new digital preservation services and initiatives. As has been mentioned, digital preservation must be a continuous management process. We envision the training program as an ongoing resource, and our work will continue assessing the costs and benefits of available economic models. We also need to have easy-to-use decision-making rubrics and mechanisms for how and where we save, what gets saved, and for how long. For the long term, we expect to continue to evaluate and recommend best practices and methodologies that can be shared on a system-wide level and then utilized and adapted for the specific needs of each campus. The scale of our digital assets and the complexity involved in managing their long-term preservation can be overwhelming and intimidating. The DPLG's leadership role is ultimately to provide clarity and guidance to increase the feasibility of implementing and integrating digital preservation strategies throughout the system. To keep track of what our Digital Preservation Leadership Group is doing, the wiki page is a good place to look. We are planning on having the cross-campus training available by mid-2022, and you should also be hearing about the assessment matrix we are designing as well. Additionally, all of the reports from the working group phases are available online from the DOC website. I highly recommend the Phase 2 report, as it goes into a lot more detail about the cross-campus research we conducted and has a breakdown of content types by campus as well. Thank you very much. Thank you for your interest. Thank you, Sybil and Edson.