 Hi, my name is Adam Strom. I'm the director of University Archives and Special Collections at Paul V. Galvin Library at the Illinois Institute of Technology in Chicago, and I'm here to talk to you about the Voices of the Holocaust project. I'll start by giving you a little bit of background on the project. In 1946, Dr. David P. Boder, a psychology professor from Chicago's Illinois Institute of Technology, traveled to Europe to record the stories of Holocaust survivors in their own words. Boder translated a number of the interviews in the years after they were recorded, and some were published in his book, I Did Not Interview the Dead, and in a later series of self-published volumes. But the original wire recordings were lost sometime after Boder's death in Los Angeles in 1961, and copies of the wire spools are held by only a few repositories around the world, including the Library of Congress. The Voices of the Holocaust project began in 1998, when Illinois Tech librarians requested a transfer of the recordings from the wire spools at the Library of Congress onto DAT tape. The original Voices site was built from those recordings and appeared in 2000. It included audio of 16 of Boder's interviews and plain-text English-language translations for the 70 interviews that Boder had translated during his life. The 2009 version of the Voices site was a great step forward. It included digital remastering and restoration of the audio, creation of the first original-language transcriptions for the 70 interviews translated by Boder, and translation and transcription work for most of the other 48 interviews never translated by Boder during his lifetime. All of the interview texts were encoded using the Text Encoding Initiative's XML format, which allowed for synchronization of interview text and audio during playback, and supplemental critical material was added in the form of interview annotations, scholarly commentary, camp and ghetto descriptions, and a glossary. 
By 2019, however, browser support for the Flash Player that handled audio playback on the site was dwindling, and the site was in need of a visual refresh. So Galvin Library staff started to experiment with some other options for text and audio synchronization and playback, and we developed a script to transform the TEI files to the WebVTT format, but didn't get much further. When we started deciding how we were going to migrate this site, obviously one of the major goals was to move on from Flash, but of course that wasn't our only goal. Over the last five years or so, staff at Galvin Library have been working to streamline the array of technology and software platforms in use, so an emphasis was put on finding a way to migrate the site that would be sustainable in terms of both the initial effort needed to launch the site and the maintenance and upkeep of the site moving forward. Voices also needed a new design. The look and feel of the site were dated, and it wasn't very user-friendly for anyone who was trying to navigate the site on a tablet or a phone. We also wanted to find ways to enhance the site, but this wasn't a "move fast and break things" kind of situation, and we wanted to be really mindful of the work that had come before ours, both scholarly and technical. So our aim was to make what enhancements we could without sacrificing what was already there in terms of both data and functionality. The collaboration that made the new Voices site possible began in 2019 with an email from Stephen Naron, who's the director of the Fortunoff Video Archive for Holocaust Testimonies at Yale University. 
We'd worked with Stephen in the past, so when he emailed about the upcoming launch of the Fortunoff Archive's collection being hosted in a system called Aviary, we were eager to hear more. Stephen's primary interest in this was having Boder's interviews discoverable alongside those of the Fortunoff Archive, and that excited us too, but we also soon realized that ingesting the Boder interviews into Aviary could serve two purposes. Not only could we unite the Fortunoff Archive's collection of video testimonies with Boder's audio, but we could also use Aviary as the playback platform for the new Voices of the Holocaust site. Aviary is a cloud-based platform for discovery and playback of audiovisual content developed by AVP. Aviary provided the basic playback and text and audio synchronization functionality that we needed, but what sealed the deal for us was that Aviary made it easy to embed media players for Aviary objects on external sites. Aviary also provided things like the ability to add textual annotations, upload multiple transcripts per file, and search within transcript text. These were functions that users of our site had come to expect, and the realization that we could replace the Voices site without any loss of features or functionality made Aviary an easy choice. With the generous support of the Fortunoff Archive, we moved forward with setting up an Aviary account and ingesting the Boder interviews into the Aviary system. Since we'd already crafted a script that transformed the TEI records for each interview into the WebVTT format, the majority of the work necessary to get Boder's interviews into Aviary involved the metadata. Aviary accepts CSV files, so Tim Floor, our repository and systems librarian, made a Python script that pulled the relevant metadata from each interview's TEI file and added it as a new line to a CSV. 
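
To give a sense of what a script like that might involve, here is a minimal sketch in Python. It is not the script the library actually used: the column names, the XPath expressions in `FIELDS`, and the sample TEI document are all assumptions made up for illustration, and a real TEI header would need its own mapping.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Standard TEI namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical mapping from CSV column names to XPath expressions.
# The real files may store these values in different elements.
FIELDS = {
    "title": ".//tei:titleStmt/tei:title",
    "interviewee": ".//tei:person/tei:persName",
    "date": ".//tei:recording/tei:date",
}


def tei_to_row(tei_xml: str) -> dict:
    """Pull one row of metadata out of a single TEI document."""
    root = ET.fromstring(tei_xml)
    row = {}
    for column, xpath in FIELDS.items():
        node = root.find(xpath, TEI_NS)
        # A missing element becomes an empty cell to be filled in by hand.
        row[column] = (node.text or "").strip() if node is not None else ""
    return row


def write_csv(rows, out) -> None:
    """Write the collected rows as CSV with a header line."""
    writer = csv.DictWriter(out, fieldnames=list(FIELDS))
    writer.writeheader()
    writer.writerows(rows)


# Invented sample document, for illustration only.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc><titleStmt><title>Sample interview</title></titleStmt></fileDesc>
    <profileDesc><particDesc><listPerson>
      <person><persName>Jane Example</persName></person>
    </listPerson></particDesc></profileDesc>
  </teiHeader>
</TEI>"""

if __name__ == "__main__":
    buffer = io.StringIO()
    write_csv([tei_to_row(SAMPLE)], buffer)
    print(buffer.getvalue())
```

Leaving missing values as empty cells, rather than failing, fits a workflow where some columns have to be completed manually afterward.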
There was manual work that had to be done to reformat some data or add data that wasn't included in the XML, such as the file name of the MP3 with which we'd need to associate each row of metadata. The MP3 audio files and corresponding metadata were all loaded into Aviary, and we temporarily redirected the Voices of the Holocaust URL to the Aviary collection and began to work on the new Voices site itself. Drupal was a really obvious choice as the platform to host the site, as it's the platform on which most university websites are built at Illinois Tech and at Galvin Library. We'd used Islandora 7 on a recent institutional repository migration and decided that the then-newish Islandora 8 update might be worth trying out on the Voices site, partially because of its built-in linked open data functionality, and also because we hoped that the way that Islandora had applied Drupal's taxonomy feature was going to help us with all of the people, places, and corporate bodies that are part of the Voices of the Holocaust metadata. Since the actual digital objects presented to the user, namely the audio and text, aren't actually stored in Islandora, it's kind of an odd application of the Islandora system. But a test instance of Islandora 8 pretty quickly proved that it was going to be up to the task, and so we started to make plans to ingest our data. Ingesting the Voices metadata largely took place via Drupal 8's migrate tools, using YAML migration routines to add interview metadata, as well as the people, geographic locations, corporate bodies, and other data that make up the site. Luckily, the migrate tools make reverting an ingest easy, since we soon realized that the interdependence between data points meant that we had to run the ingests in a specific order, and we learned that lesson over and over again. 
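
One way to think about that ordering problem is as a dependency graph: a set of records can only be ingested once everything it points to already exists. The sketch below works out a safe order in Python using the standard library's `graphlib`. The migration names and the dependencies between them are assumptions for illustration, not the project's actual migration definitions.

```python
from graphlib import TopologicalSorter

# Hypothetical migrations. Each key maps to the set of migrations
# that must already have run before it can be ingested.
DEPENDENCIES = {
    "geographic_locations": set(),
    "corporate_bodies": set(),
    "camps_and_ghettos": {"geographic_locations"},
    "people": {"geographic_locations", "camps_and_ghettos"},
    "interviews": {"people", "geographic_locations", "corporate_bodies"},
}


def ingest_order(dependencies):
    """Return the migrations in an order that satisfies every dependency.

    Raises graphlib.CycleError if two migrations depend on each other.
    """
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for step, name in enumerate(ingest_order(DEPENDENCIES), start=1):
        print(step, name)
```

Writing the dependencies down once, instead of rediscovering them on each failed run, is the main point; the sort itself is a small detail.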
The TEI XML files included unique identifiers for each person, camp, ghetto, and geographic location, and provided comprehensive lists that could be used to create data dictionaries to facilitate the linking of data between the interviews and taxonomy terms. It sometimes took a few tries to determine the best means of ingesting a particular set of data, and in some cases we chose to manually create CSV files for certain smaller datasets. We had unique identifiers for everything and structured data for most of it, but as always with any sort of data migration, structured data is only going to get you so far, and some things absolutely needed to be done by hand. Islandora's Controlled Access Terms module uses Drupal's taxonomy feature to create vocabularies with the ability to link to controlled authorities. The module provides 10 different vocabularies for things like people, places, corporate bodies, and languages. Islandora uses Drupal's entity reference field type to use taxonomy terms from these vocabularies as values in other metadata fields. This meant that we could build links between, for example, an interview and its interviewee, and then again between the interviewee and a concentration camp in which he or she had been interned. The previous versions of Voices of the Holocaust had used authority sets in creating geographic and other metadata, but there were no identifiers or URLs to link a value to the particular authority from which it had been taken. Reconciliation of names and places with authorities was a key feature that we wanted to add to the Voices of the Holocaust site. Each type of data required a different reconciliation technique, and despite our best efforts to avoid it, eventually each record was touched by hand even when an automated reconciliation process was used. The people were relatively easy to get through by hand, since most of the interview subjects didn't have authority records. 
Camps, ghettos, and corporate bodies were also relatively quick to reconcile by hand, though for some of them different names used for a single camp caused some headaches, and there were times that we had to do a fair amount of research to identify the particular camp referenced by a term or to disambiguate between camps with similar names. Geographic locations are where things got pretty daunting. The geographic locations XML file contained more than 600 terms, everything from well-known cities to tiny villages that may no longer exist. Name changes, redrawing of country borders, and vernacular names for regions also caused us some headaches. We used a combination of OpenRefine and Microsoft Excel to reconcile our list, country by country, against lists downloaded from GeoNames.org. At first this process seemed incredibly fruitful, though what we hoped would be a small number of terms that needed reconciliation or disambiguation by hand turned out to be a majority of them. Many of the terms were linked with the wrong place. For example, there are lots of different places named Washington in the United States that are neither Washington, DC, nor the state of Washington. In other cases, terms for cities were linked instead with records for rivers, districts, provinces, you name it. The process of checking and, if needed, correcting these links for more than 600 locations was time-consuming, but it was necessary. And we ended up using this as an opportunity to also link out to authority records in the Getty Thesaurus of Geographic Names when possible, since that was the authority used for geographic terms in the previous versions of the site. We were able to find matches for the vast majority of our taxonomy terms, though there remained some ambiguity with some of the records and a few total mysteries among a handful of geographic location and concentration camp records. 
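
The kinds of mismatches described here, a city term landing on a river or on the wrong Washington, suggest a simple guard that could sit alongside the OpenRefine work. The following Python sketch is not the process we ran; it only illustrates the idea of accepting an automatic match when exactly one populated place with that name exists in the expected country, and sending everything else to a person. The `Candidate` structure, the identifiers, and the place names are invented for the example; the feature class letters follow GeoNames conventions, where "P" marks populated places and "H" marks bodies of water.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Candidate:
    geoname_id: int       # invented identifiers in this example
    name: str
    country_code: str     # ISO 3166-1 alpha-2 code
    feature_class: str    # GeoNames feature class, e.g. "P", "H", "A"


def reconcile(term: str, country_code: str,
              candidates: Iterable[Candidate]) -> Optional[Candidate]:
    """Return a match only when it is unambiguous, otherwise None.

    None means "needs manual review", not "no such place".
    """
    matches = [
        c for c in candidates
        if c.name.casefold() == term.casefold()
        and c.country_code == country_code
        and c.feature_class == "P"   # skip rivers, districts, provinces
    ]
    return matches[0] if len(matches) == 1 else None


# Invented sample data, for illustration only.
CANDIDATES = [
    Candidate(1, "Exampleville", "PL", "P"),
    Candidate(2, "Exampleville", "PL", "H"),   # a river with the same name
    Candidate(3, "Washington", "US", "P"),
    Candidate(4, "Washington", "US", "P"),     # a second town, same name
]

if __name__ == "__main__":
    print(reconcile("Exampleville", "PL", CANDIDATES))
    print(reconcile("Washington", "US", CANDIDATES))
```

A filter like this would not have removed the hand checking, since historical name changes and shifted borders defeat exact name matching, but it shows why so many terms ended up in the manual pile.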
The design process for the site involved the typical concerns and problems that you have with a site of this sort, and I don't think I need to talk about things like responsive design here, but I wanted to focus on just a few facets of the design and layout process for the Voices of the Holocaust site. Anyone who's used Drupal knows that Views can be an incredibly useful way to display data, and we use Drupal Views to display the embedded media player, to customize the interview metadata display, and to display an interviewee's biographical data on each interview page. We also built customized displays for terms in each taxonomy. I won't go into the nitty-gritty detail on everything that we did, but we repeatedly leveraged the flexibility of Views to do things like use contextual filters to dynamically display data based on a value in a URL, or rewrite a field's display using data from other fields. The payoff for some of the work felt small based on the time we spent troubleshooting, but being able to display an interview location as the Grand Hotel in Paris, France, rather than just the Grand Hotel, was worth the effort spent. The ability to have a user perform a search in Drupal, click on a result, and have the same search already appear in the Aviary player search interface was huge for us. And when we weren't sure it would be possible, it loomed as a potential headache and a source of confusion for our users. Even after Aviary had added the preloaded search term functionality for embedded players, getting the search terms out of Drupal and into Aviary iframe URLs involved a handful of dead ends and a fair amount of confusion. We ended up creating a custom token in Drupal that would use URL syntax to identify and store a search query if it found one. 
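
The logic behind that token is easier to see in code than to describe. The real implementation lives in Drupal, so the Python below is only a language-neutral sketch of the idea: look for a search query in the page URL, and if there is one, carry it over into the player's embed URL. The base URL `EMBED_BASE` and the parameter names `search` and `keywords` are placeholders invented for this example; they are not Aviary's or Drupal's actual URL formats.

```python
from typing import Optional
from urllib.parse import parse_qs, quote, urlencode, urlsplit

# Placeholder embed address, not a real Aviary URL.
EMBED_BASE = "https://aviary.example.org/embed/"


def extract_search_term(page_url: str, param: str = "search") -> Optional[str]:
    """Return the search query carried in the page URL, if there is one.

    `param` is a hypothetical query parameter name.
    """
    values = parse_qs(urlsplit(page_url).query).get(param)
    if not values:
        return None
    term = values[0].strip()
    return term or None


def build_iframe_url(aviary_id: str, term: Optional[str]) -> str:
    """Combine an interview's player identifier with an optional search term."""
    url = EMBED_BASE + quote(aviary_id, safe="")
    if term:
        # "keywords" is a hypothetical parameter name for the preloaded search.
        url += "?" + urlencode({"keywords": term})
    return url


if __name__ == "__main__":
    page = "https://voices.example.org/interview/1?search=forced+labor"
    print(build_iframe_url("123", extract_search_term(page)))
```

When no search term is present the player URL is left untouched, so pages reached by browsing rather than searching behave exactly as before.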
We then modified the interview page template to build the iframe URL dynamically by combining the Aviary identifier for an interview and the search term provided by that custom token. So now when a user searches for a keyword in Drupal, the Aviary player loads with that term in the search box, and occurrences of the term are highlighted in the translation and transcription. This is the sort of thing that users might not notice when it's working, but would be really aggravated by if it were missing, and we're really happy we were able to make it work. Throughout the migration process, we tried to remember that we were carrying on someone else's work. The voices that have made up the Voices project are not only those of David Boder and the interviewees, but also the librarians, scholars, developers, historians, and others whose hard work and expertise have made the site what it is today. This meant that we wanted to make sure that we were able to include things like the interview commentary, recreate the annotations, and generally adhere to content-based decisions that had been made earlier in the project. In some ways, this made our job easier, because it removed some of the decision-making and provided easy answers for some of the questions that we ran up against. For example, we tried to stick with the people, place, camp, and ghetto names as they were defined in the 2009 site as much as possible, and when authority records convinced us to change the primary name of a term, we were sure to include any other versions of the name that had been used as alternative names in the record. The Voices of the Holocaust project started over 20 years ago, and while we wanted to update the site and make it more modern, we didn't want to erase those 20-plus years of history. So we retained almost all of the pages in the About menu on the site. 
We also wanted to continue to credit those responsible for making the site what it is, even if their contributions happened more than a decade ago. We wanted to preserve the documentation from the old site, so the project notes and description of the site as it existed in 2009 were updated for the Drupal site, but the 2009 version was left unedited on the site as well. It's our hope that preserving as much as we can of the history of the Voices of the Holocaust site and project will help us to tell the story of the project and inform tomorrow's decisions about where it goes next. So, speaking of where it goes next: in the short term, hopefully by the time you're watching this video, the final tweaks on the site will have been made and the new Voices site will have been launched, but the work on the site won't be done with the launch, of course. Our quest to preserve as much as possible from the 2009 Voices site has one major component left, and that is the annotation of a defined list of glossary terms and the linking of taxonomy terms in the interview text to the appropriate term pages in Drupal. This work is going to continue into the spring and summer of this year. We're excited about the ways that the new Voices site will exist in a larger ecosphere of online resources and scholarship. Aviary's new flock feature is a way to group similar Aviary collections for better discovery across collections, and so one way we are going to leverage this interoperability of the Voices site will be in a flock of Holocaust testimonies with collections from the Fortunoff Archive and the William Breman Jewish Heritage Museum. We are eager to do some things that will help increase discoverability across these collections, things like subject analysis and some data harmonization. 
We're also looking forward to the possibilities of expanding or enhancing the critical and scholarly features on the site. There are a handful of interviews conducted in Yiddish for which we don't yet have transcriptions; we only have the translations that were done. Some of this work will definitely require scholarly expertise that we do not currently have on staff in the library, but we're hoping that the new Voices of the Holocaust site might inspire some new enthusiasm for Boder's work and maybe the genesis of some new collaborative opportunities moving forward. We're honored to have been a part of telling these stories and advancing David Boder's work, and we're very happy and excited about the new Voices of the Holocaust site. So to wrap up, I want to thank the other members of the team who made this new version of the site possible. And I want to thank all of you for checking out this presentation. Please visit the site and get in touch, and we hope you like it.