and get started. Thanks for joining us today. I'm Cliff Lynch, the Director of the Coalition for Networked Information, and I'll be introducing this session, a project briefing that is part of week three of the CNI Fall 2020 Virtual Member Meeting. Just to remind you, week three focuses primarily on standards, technology, and infrastructure considerations. I want to note that along with the live sessions that make up week three of the meeting, we've also released a number of pre-recorded videos on this theme, and I invite you to have a look at those as well. A few logistical notes about the session. We are recording this, and we will make the recording publicly available afterward. Closed captioning is available; please make use of it if it's helpful to you. There is a chat, so feel free to use that, and there's also a Q&A tool at the bottom of your screen where you can pose questions as they occur to you during the presentations. After we hear from all three of our speakers, Diane Goldenberg-Hart from CNI will moderate a Q&A session, and we'll try to address as many of those questions as we have time for. So with that, let me introduce this session a little. We have three speakers with us: David Wilcox from LYRASIS, Robin Ruggaber from the University of Virginia, and Amy Blau from Whitman College. What I'll say about this project is that it tackles one of those things people tend not to think about strategically. We have to migrate a whole diverse community of implementations of a platform, and while nobody tends to want to think about this strategically, it is a huge pain point for the institutions involved. I was very pleased to see this project, and to see that IMLS recognized its importance and funded it, because I think it really is a very well-designed effort to work as a community on a community problem and make a significant difference. We have two implementer organizations represented here.
The University of Virginia is sort of a nasty, customized, high-end Fedora 3 site. Whitman is an Islandora site, and there are a lot of those out there too. So I think the project has really looked at the diversity of implementations in the field and tried to do things that are helpful to all. I'm really delighted to have Robin, David, and Amy here. I thank them for joining us, and I'll turn it over to David.

Great. Well, thanks very much for the introduction, Cliff. I really appreciate it, and thanks to everyone for joining us here today. What follows is really just a brief update. I want to summarize the activities we're working on, hear from Robin and Amy about a couple of the pilot projects that are ongoing right now, and then talk a little about what's next for this project. The grant itself was awarded earlier this year. There are some links embedded in these slides, and we'll share them out if you'd like to look at any of these resources in detail. As Cliff mentioned, the grant was awarded by the Institute of Museum and Library Services, for a little less than $250,000 over 18 months, and its focus is on migrating from Fedora 3 installations to Fedora 6 installations. Just to note, Fedora 6 is the latest version of the software; it is currently in an alpha state, and we hope to have it released for production early next year. This presentation doesn't focus on the particulars of Fedora or on the features of this new release. I've spoken about that at length elsewhere, and there are lots of other places you can go to find those details. There are some links in the presentation, and I'm also happy to answer questions if folks have specific questions about the software itself.
But really, the grant is focused much more on the challenge, which is simply that most Fedora installations out in the world are running version 3 or earlier, which is unsupported and has been for several years at this point. These are repositories running on aging technology, and our concern is not so much the software; it's the content. It's all the content in these legacy systems that becomes more and more at risk as the years pass, as security updates are no longer applied and versions of dependencies are no longer supported by systems administration teams, and so on. So the focus really is on moving this content forward in time. At the same time, we recognize that migrations take a lot of time and effort. We learned this in the planning grant that preceded this one, where we investigated all the reasons why folks in the Fedora community were having such a hard time moving forward to one of the more modern, supported versions of the software. So fundamentally, in a sentence, we're trying to bring the community forward to a modern and supported version of Fedora. That's the overarching goal. But again, the focus is not so much on the software as on the content, and on making sure that all of the great content in these repositories all over the world doesn't get lost to aging technology. The process we're following here is pretty clear. We are starting out by working with pilot partners, who are on the line here with me and will be speaking shortly, to develop, test, and refine migration tools. Some tools already exist.
Others we're developing as part of the grant, and we're working with these pilot partners on upgrades and migrations so that we can improve these tools, but also produce documentation and best practices, and combine all of these into a toolkit that we can then disseminate to the community at large. At that point we can gather feedback and iterate on the toolkit itself to help everyone else move along the same path. Finally, we hope to host a dedicated migration training event at the end of all this; I'll say more about that in a few minutes. So, a bit more detail on the phases. We're currently in phase one, which began in September and runs roughly until May of next year, 2021. The goal here is to document the migration and upgrade process by working with pilot partners, looking in particular at metadata mapping, decision making, and all of the steps one needs to follow in preparing for and executing a migration between versions 3 and 6 of Fedora. I think much of this would also apply more broadly to other kinds of migrations. The goal, again, is to produce a community toolkit, which we hope to share early next year. As Cliff mentioned, the pilot partners are the University of Virginia and Whitman College. Virginia has a number of Fedora repositories, and Robin will talk about the specific one in focus for this grant project, which has a custom front-end environment. Whitman College has an Islandora installation with its own particular set of use cases. Phase two runs roughly from June of next year through September. This is where we plan to take the toolkit developed in phase one, disseminate it to the community, validate it, and solicit feedback.
We'll be hosting webinars, providing lots of community channels, and reaching out both to groups and to individuals in the community that we know are running Fedora 3 repositories, to encourage them to take up these tools, work with them, and let us know what's missing and what would help their institutions better prepare for and execute migrations in their own local environments. Finally, phase three recognizes that there's really no replacement for hands-on learning. In the last phase of the grant, running from roughly October of next year to the following February, we intend to host a migration training workshop, as I mentioned earlier. A lot of this depends on travel restrictions and the status of COVID at that time. If in-person travel is possible, we'll host an in-person workshop; if it's not, we'll reconfigure it as an online event. I'm hopeful that we'll be able to do it in person, because we do have travel funding allocated in the grant to assist institutions that might not otherwise be able to attend. The event itself will be free, with no registration costs, and we'll be able to help some folks travel to it who otherwise wouldn't have been able to. All the way through this project, we're collecting feedback from the pilots, from those who attend the workshop, and from those who use the toolkit, doing as much as we can to improve the outputs we're generating as part of this work. And of course, once the grant ends, we want to make sure that this work continues. Fortunately, we have ongoing year-over-year funding for the Fedora program that will help ensure these tools, this training, and everything else we produce live on past the grant and continue to be updated and supported.
That's really only possible through the support of all of the member institutions that fund us year over year and support full-time staff on the project, as well as all of our efforts around training, software updates, community events, and everything else we do to support these activities. So I do want to thank all of these institutions for their ongoing support, and encourage anyone who uses Fedora and benefits from it to consider becoming a member if you're not one already, so that we're able to continue to support and sustain the software over time. With that, I'll turn it over to Robin, who's going to talk more specifically about one of the pilots we're running in the first phase of this grant.

Thanks, David. I want to start by talking a little about our goals. Our primary goal was to save at-risk content in our oldest Fedora repository, which is version 3.2.1. But we also wanted to test the migration tools and make sure they had the features that would help other people who find themselves in circumstances similar to ours. Today we have three different versions of Fedora. We have a Fedora 4 repository, which supports three of our Samvera applications, which manage our ETDs, our open access content, and our audio and video collections. We have a 3.4 repository, which was always used for access derivatives; most of that has already been migrated to a IIIF server, and we have a good idea of what's in that repository and fewer problems with it. For the 3.2.1 repository, the content is largely unknown. The repository hasn't been touched for a decade, and it predates most of the people working here today. What we do know so far is that we have around 90 GB of content to migrate, and that some percentage of it is in older, unsupported formats, such as MrSID, which we had tried briefly before Djatoka and JPEG 2000 were ever a thing. We also know that we have some content that is now in the public domain and readily available elsewhere, such as a Mark Twain collection. But we also have some rare and unique content, such as transcribed manuscripts and a three-volume set of UVA history. The risk of losing all this content is considered high. It's on older infrastructure that we want to get rid of. It's an old version, and the way we've been using it is largely undocumented. People have been afraid to touch the system, and we don't think we're alone in this situation. We want to test migration tools to tackle the problem and provide feedback to the community. Considering all of these factors, we decided to migrate the entire contents, as is, from this Fedora 3.2.1 repository to Fedora 6, so that it is stored in the Oxford Common File Layout (OCFL), which is Fedora 6's persistence layer. We believe this is the best path to stabilize the content and get it into a standard format that will then facilitate our ability to analyze the content, prune it, and perform any necessary format migration. Since we're migrating all of our infrastructure to the cloud, the first thing we needed was the ability to install Fedora 6 using AWS and Docker, and the Fedora team quickly updated the installation tools to accommodate cloud installation. We've run into a few problems, primarily around content that was missing necessary components, which stopped the migration. As a result of that problem and some other things we've noticed, we've been able to give that information back to the Fedora team, and they've quickly turned around extra features, such as progress tracking. They're also working on tools to validate what's been migrated.
Obviously this validation is important for everyone, but especially for us, given that we don't have a good picture of all of the content; knowing that everything read out of the repository is now in this new standard format is critical for us. The benefits we see from being part of this pilot are these. One, we've prioritized a long-delayed project. We're gaining knowledge about our content. The migration tools now support Amazon AWS and Docker. The migration tools have gained a couple of user-friendly features, such as the progress updates I mentioned before. And we've identified some content that is lacking information, so now we can get together with the content stewards, discuss these problems, and find out whether this is something we can prune or something we're going to need other people's help with later. And I think most important, the content, in whatever state, will now be in a standard, persistent format and therefore better protected, which buys us time for further evaluation. With that, I'm going to turn it over to Amy.

Thanks, Robin. I'll tell you a little about Whitman College's repository. We call it Arminda. It contains around 30,000 digital objects, which include undergraduate honors theses, other student and faculty works, archival collections of digitized photographs and other documents, and student newspapers from 1896 to 2015. We have materials in almost every Islandora content type, which means that our migration will document migration pathways for a wide range of use cases. One of our main goals in preparing for this migration has been to remediate our metadata, both to make the mapping from MODS to RDF, which Islandora 8 requires, less complicated, and to improve the user experience with our collections.
The documentation of our functional requirements related to our range of content types, and the documentation of our metadata remediation and mapping, are two of our major contributions to this project, especially at this early phase. The functional requirements we laid out for our Islandora 8 site for the most part stipulate functional parity with our Islandora 7 site, and we listed a number of these requirements at the object and system levels. The LYRASIS team broke these out into some further categories. Things that are important to us include SSL integration, access control at the object level, and search and filter across content types. Probably the most important functional requirement related specifically to Fedora is ensuring Amazon S3 bucket storage within Fedora for Islandora objects, because this will reduce our cloud storage costs, and that's also a potential incentive for others to migrate to Islandora 8 together with Fedora 6. Many of our functional requirements can be met using the affordances within Drupal and Islandora 8, which is great. It looks like some custom development will be required, specifically to support serials and paged content in our newspaper collection, so we're looking ahead to that. Our metadata remediation was managed by a small metadata working group of Whitman College librarians. We reevaluated metadata fields across all of our collections in order to streamline and standardize them, partly, as I said, in preparation for mapping MODS to RDF, and also to improve metadata display. We standardized some elements, such as date encoding and creator names, and we rewrote titles and descriptions in the descriptive metadata for archives collections. The working group comprised our metadata and digital assets librarian, our associate archivist, and the repository manager, who is the scholarly communications librarian, which is me.
So we have expertise in description from both the library and the archives sides, and some knowledge of how metadata is displayed in Islandora 8. Having this representation was really helpful in ensuring that the standards we decided on would work for the broad range of our collections. We started this work at the beginning of March 2020. We met biweekly through the spring and summer and weekly starting in the fall, and obviously almost every one of those meetings was online, but that's fine. Our metadata librarian pulled together documentation on all of the fields we were using in our Islandora 7 instance, and the working group members evaluated all 158 of these fields. We got rid of fields that had irrelevant, outdated, or duplicate information. We combined some fields. We introduced a couple of new fields that would be helpful in sorting or faceting; an example is genre. Reducing the number of fields also really streamlined the mapping process. We're currently down to 54 fields, of which I believe 38 have been mapped to RDF, so we only need to map a few more. Our metadata librarian mapped these MODS fields to RDF, building on the mappings of the Islandora Metadata Interest Group, and selected alternative mappings only where those didn't match our metadata needs. A really essential aspect of this work was documenting how we were going to use these metadata fields, because this documentation both guides the remediation we're doing now and should improve our metadata generation going forward. We tracked the requirements we came up with in individual documents by field name, and we have a larger draft guide to metadata. Our requirements include definitions to clarify field usage, controlled vocabularies, spelling and capitalization conventions, date encoding conventions, whether a field can be repeated, and so on. We have lots and lots of spreadsheets compiled in various ways.
We've tried to improve the structure of our team drive so we can find all of this documentation again when we need it. It's been a real journey to pull all of this together. My conclusion from this is that the remediation and mapping of our collection metadata has been very valuable in preparing us to make decisions about how to structure our metadata and our collections in the new Islandora 8 and Fedora 6 system. Our deep familiarity with our collection metadata helps us consider the ramifications of some decisions we're still working on, such as how we're going to handle linked agents. As we work with our project partners in the coming weeks to plan the specifics of our Islandora 8 site and the migration pathways, we're going to be able to draw on this knowledge. The more connections we can draw between our metadata work, the site build, and the migration pathways, the more useful the models we should be able to provide to other institutions that plan to migrate to Islandora 8. That's my take, and I'll hand it back to David. Thanks.

Great. Thanks very much, Amy and Robin, for the updates. I'll quickly wrap things up here, since we do want to leave some time for questions. As you can tell, we're right in the middle of these pilot projects, so there's still lots to come before we're able to put out a toolkit and share it with the community, but we expect to have something available early next year. If you'd like to follow along with this project, there are lots of ways to do so. There is a landing page in the Fedora wiki that is a good jumping-off spot for all the work we're doing. We're putting out monthly blog posts on the Fedora website, which you can follow if you'd like to get some of that information. We have lots of active conversations going on in Slack, which you can join, as well as on mailing lists.
And of course, you can support us by becoming a member and supporting Fedora. I want to leave my contact information here, since I'm leading this effort, in case anyone would like to get in touch for more information. I'm always happy to talk about the grant project or about Fedora in general. And if you have a migration use case, we'd love to hear from you and see whether this toolkit might be of use. I'll stop there. I think we have a couple of minutes, so maybe we can address questions if there are any.

Terrific. Thank you, David, and thank you to Robin and Amy for the examples from your pilots. We do have some questions to begin with. Our first question is for Amy. For Whitman, will the custom scripts for S3 bucket storage be bespoke for Fedora, or might they be leveraged for other repositories? Will you share them in the expected toolkit release?

I'll take a quick stab, but I'll probably have to pass that back to David. My understanding is that what we produce in the course of the project will be shared. I imagine that what's happening for our site will be bespoke for Fedora, but I don't know the extent to which it can be leveraged for something else. David, maybe you have a bit more specifics on that.

Yeah, I can say a little there. Fedora 6 will have native support for S3, and Fedora really just underlies Islandora 8. So the way we're going to migrate content, it will be migrated through Islandora and then persisted to Fedora and S3. There's really nothing custom there; it's a pathway that anyone doing a similar migration could follow. If, on the other hand, you're wondering whether you could use something other than Fedora and still have S3 storage under Islandora 8, I believe that is also possible. I think Islandora 8 has a Flysystem module and other ways of storing data.
So I don't think there's going to be anything particularly custom here, and if you have use cases around S3, I think those will be supported.

Great, thank you. Thank you for the question, and thank you for addressing it. Now we have a question for Robin, for UVA: can you explain more about the reasons behind the OCFL step?

If you're familiar with the earlier versions of Fedora, you'll know that Fedora 3 had a persistence layer, but you have to know a lot about that repository and the way it laid things out on disk. So even though it's readable, it still takes a lot of knowledge about that specific version of Fedora. And while it might be possible to bring up another repository over it, I don't have much faith that it would come up without problems with what's stored there. Migrating the content from Fedora 3.2.1 to Fedora 6, which stores it on disk using the Oxford Common File Layout standard, makes it more easily read. It's a persistence layer that is more in line with best practices for preservation. It is also a common layout that other repositories can honor, if we ever wanted to use entirely different repository software. And we think it will be in a format on disk that we can more easily understand, because it's standard, so we can write software to parse through it and prune things we don't want there anymore. Does that cover what you were looking for? There is documentation for the standard and documentation for Fedora 6. David, I don't know if you have anything else you want to add.

I think you mostly covered it. It's worth noting that the Oxford Common File Layout is a separate but related effort. As you were saying, Robin, it's a more standardized approach. There are repositories that don't use Fedora but do use OCFL.
So there are tools that can inspect an OCFL repository regardless of whether it's Fedora-based or built on some other application. That standardized approach really does make the kind of work you're talking about, Robin, a lot easier, I think, because a wide variety of tools can understand and parse that data without relying on Fedora 3, for example, which is its own custom application.

I'm seeing a thumbs-up in the Q&A box, so I think that addressed the question. Thank you, Robin, and thanks, David, and thank you for the question. I'm not seeing any other questions at this time, and I see that we are at time. So I'll once again thank our presenters for sharing your work on this project with us here at CNI, and thank our attendees for making time in your day to join us. I will go ahead and turn off the recording. If you are still with us and wish to approach the podium to chat with any of our speakers or ask a question, please feel free to do so. With that, I'll say goodbye and thanks, everyone. Have a great rest of your day, and I hope to see you back at CNI soon. Bye-bye.
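Robin and David describe OCFL as a standard on-disk layout that tools other than Fedora can read, validate, and prune. As a rough illustration only (this is not Fedora's own validation tooling and not code from the project), the Python sketch below reads an OCFL object's `inventory.json`, maps the head version's logical file paths to content files on disk, and recomputes each content file's digest to check fixity. It assumes an OCFL 1.0 object root with the inventory at its top level and ignores Fedora-specific extensions such as a mutable head; the function names `load_inventory`, `head_files`, and `verify_fixity` are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def load_inventory(object_root: Path) -> dict:
    """Read the root inventory.json of an OCFL object (assumes an OCFL 1.0 layout)."""
    return json.loads((object_root / "inventory.json").read_text(encoding="utf-8"))


def head_files(object_root: Path) -> dict[str, Path]:
    """Map each logical path in the head version to a content file on disk."""
    inventory = load_inventory(object_root)
    state = inventory["versions"][inventory["head"]]["state"]
    manifest = inventory["manifest"]
    files = {}
    for digest, logical_paths in state.items():
        # Every content path listed for a digest holds identical bytes, so use the first.
        content_path = object_root / manifest[digest][0]
        for logical_path in logical_paths:
            files[logical_path] = content_path
    return files


def verify_fixity(object_root: Path) -> list[str]:
    """Recompute digests for all manifest entries; return content paths that do not match."""
    inventory = load_inventory(object_root)
    algorithm = inventory["digestAlgorithm"]  # "sha512" or "sha256" in OCFL 1.0
    mismatches = []
    for digest, content_paths in inventory["manifest"].items():
        for content_path in content_paths:
            hasher = hashlib.new(algorithm)
            with open(object_root / content_path, "rb") as stream:
                # Read in 1 MiB chunks so large files are not loaded into memory at once.
                for chunk in iter(lambda: stream.read(1 << 20), b""):
                    hasher.update(chunk)
            # OCFL digests are case-insensitive; compare in lowercase.
            if hasher.hexdigest() != digest.lower():
                mismatches.append(content_path)
    return mismatches
```

For example, an empty list from `verify_fixity(Path("/path/to/object"))` would mean every content file still matches its recorded digest, and a pruning script could then work from `head_files` rather than from Fedora 3's own storage layout.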