It's time to get started. Thanks for joining us. I'm Cliff Lynch, the Director of the Coalition for Networked Information. You have reached one of the project briefings that is part of the Spring 2020 CNI virtual meeting, which will be running until the end of May, so there is a good deal more still to come. Today we have two speakers, and I'll introduce the presentation in a minute, but practically speaking, we'll hear our two presentations and then take some Q&A at the end. Diane Goldenberg-Hart from CNI will moderate the Q&A. There's a Q&A button at the bottom of your screen, and I'd encourage you to key in questions at any point during the presentation, as they occur to you. We'll deal with them all at the end, but there's no reason to delay putting them in, so please, as questions occur, drop them in.

Our presenters today are John Dunn from Indiana University, and I know many of you are familiar with his work; he's a frequent presenter at CNI and has been one of the serious participants in our community for years. He'll be joined by Shawn Averkamp from AVP, one of the partner organizations working on this project. Now, just by way of a little context: Indiana has had an enormous project for digitizing audiovisual material that has been going on for a number of years, has done a great deal of pioneering software work in this area, and has built that out into a consortium, essentially, of like-minded institutions. If you go back through the archives of CNI's presentations, you can find a great deal on the development of that work. The topic today is a particularly interesting one to me: how we can most effectively combine machine learning techniques and human expertise to address the very challenging problem of scaling up access to large AV collections, where there's just no way we're ever going to get enough expert human time to do it, but where it's also really hard for machine learning to get it just right. I absolutely believe that the path forward here is going to be understanding these alliances between the capabilities of machine learning and human expertise. So I'm really eager to hear this presentation, and I'm extremely grateful to Shawn and to John for bringing it to CNI. And with that, I'll thank them again, thank you for joining us, and turn it over to John.

Great. Thanks so much, Cliff, for that very kind introduction. As Cliff mentioned, I'm here with my colleague Shawn from AVP, a digital consulting firm. I'm from Indiana University, and we're going to talk about a project called AMP, the Audiovisual Metadata Platform. Chris Lacinak of AVP and I talked about this in a very preliminary way at CNI several years ago, and the goal of this presentation is to give the community an update on where we are and where we're going. I should note that this project is a collaboration of Indiana University Libraries, AVP, the University of Texas at Austin, and the New York Public Library, generously funded with support from the Andrew W. Mellon Foundation over the past several years. To start, here's a brief outline of what we're going to be talking about. I'll give some introduction, background, and an overview of the project. Then Shawn will come in and talk about the collections involved in the project at this point and the evaluation and selection of machine learning-based tools within the project.
I'll come back then and talk about the platform architecture and workflows that we're building and about next steps going forward. I should note that this is the work of a large team spread across the partners I just mentioned, and great credit goes to all of them for getting us to the point we're at now. It's the work of all of these people that Shawn and I are here talking about today.

All right. So, some background. Cliff actually motivated this fairly well, but as he mentioned, we have a project at Indiana University called the Media Digitization and Preservation Initiative, or MDPI, that has been underway for about four years, funded centrally within the University, as an effort to first inventory and then digitize all rare and unique audiovisual resources from across the institution. These are some screenshots from the MDPI website. At this point, we've digitized over 325,000 pieces of audio and video and nearly 18,000 film reels, creating several dozen petabytes of data to be preserved and, in derivative form, to be made accessible to the degree we can, both to the University community and to the broader research community. These collections represent a wide range of subject matter and genres, but also a wide range of existing metadata.

A couple of examples. This is an opera performance video recording from 2000, I think, unfortunately still on VHS at this point, of Stravinsky's The Rake's Progress. You can see on the right that there is quite detailed descriptive metadata for this work, pulled from the library catalog as part of our digitization workflow: very detailed information about performers, dates, and other contributors to the performance and so forth. So this is one extreme of the quality of metadata we're dealing with within the mass digitization effort at IU. At the other end, this is an example of another VHS tape, in this case one where you can see on the right that the metadata we have in the title field is "Tape B contains digital tapes 7 through 11." That doesn't really mean much. If I look a bit closer, I can see some identifiers that might mean something to someone who's familiar with the collection from which this came, but certainly not to the average person. If I start to look at the video image itself, though, I can see a bit more. I can maybe make out that the backdrop actually says University of Indianapolis. I can see a date and a time. If I look at the face of the person there, I can see that it looks like it's probably our former Senator Richard Lugar. But this is not information that is readily available in the metadata, and so it could not be discovered by someone looking for something from this time period, or of this person, or in this place. We have a lot of content that falls in this latter category. The question for us was: how do we make this more discoverable and usable by researchers, students, instructors, and the general public, many of whom would be interested in accessing this content? Going back to the opera example, I should note that even though we have very rich descriptive metadata, we don't have great navigational or structural metadata to get me to particular acts, scenes, arias, and so forth within the opera. So even though I can get to this two-hour-plus recording, I can't necessarily find what I want within it.
And so, working with AVP, we identified that this challenge we're facing is one that others are facing as well: growing digital AV collections created both through digitization, as in our case, and through all of the born-digital video and audio that many of us are taking in, which is so easy and quick to create now compared to the past. At the same time, our users have increased expectations for access, at both the student level and the faculty and research level: to be able to find things on the internet, to find our collections easily, and to have the same ease of use working with our resources that they have with things like YouTube. But of course, we don't necessarily have the metadata for these collections. A lot of these collections have been somewhat neglected, in fact, due to staffing issues and technology issues, and we don't necessarily have the resources to do full, detailed human cataloging of every item we may find in our collections.

Working with AVP, though, we identified that there is really an opportunity here, given that machine learning-based and other automated tools continue to emerge and improve and could potentially help with metadata creation for these types of resources. At the same time, there are new access tools that go well beyond the catalog to support various forms of time-based metadata, description of things happening within the timeline of a video or audio recording rather than just at the level of the entire work: tools such as Avalon and Aviary, and standards such as IIIF and its work for AV. So we're really looking at how we can leverage the best of the automated tools together with human expertise; we're not trying to say at all that AI can replace the expert human cataloger or archivist. And we want to recognize that different kinds of collections call for different workflows: a music collection would require very different tools and steps than an oral history collection, for example. So the goal of AMP is to build an open source software platform to support metadata creation for AV collections, one that supports building workflows that combine automated and human steps. Each step is what we refer to as an MGM, or metadata generation mechanism, and the platform can ultimately deliver metadata to a variety of different target systems, such as library catalogs, online access platforms, and so forth; it is not tied to any particular delivery environment. Where we are in the arc of the project is the middle piece on this slide: we are in what we call the AMP Pilot Development project, or AMPPD for short. This was preceded by a planning project and would be followed by implementation of AMP into something more production-ready, to be used by us and potentially by our partners such as NYPL and others as open source software.

So with that, I'll turn it over to Shawn to talk in more detail about the collections that we're working with and the tools that we're evaluating.

Thanks, John. We can move to the next slide. So yeah, for this project we needed a variety of sample audiovisual materials that would allow us to test a wide range of MGMs. Materials from two major collections were chosen from MDPI content, from the IU Archives and from the IU music library, covering events in IU history and IU musical performances. We also needed an external partner to test less IU-centric types of content and to offer us some different use cases.
So the New York Public Library, which also has a large AV digitization initiative, joined the project with the Gay Men's Health Crisis collection, which includes interviews and footage of protests and speaker events documenting the organization's education and advocacy efforts during the AIDS crisis in the '80s and '90s. Next slide.

AMP requested 100 hours of content from each collection, for a total of 300 hours. We already had a long list of possible MGMs generated from a previous AMP planning project, so we met with each of the participating libraries' collection managers to walk through some of their sample materials and talk about what metadata would be useful to extract and how they would want to use it. We then identified common needs so that we could prioritize the types of MGMs to incorporate into the project. For example, all collections had some use case where they wanted to transcribe speech from video or audio. Beyond that, each collection had needs that were more specific to its use cases. IU Archives was interested in identifying people and locations, certain people in particular, like Herman B. Wells, the former president of IU. The music library had an interest in identifying works performed, especially for segmentation and navigation purposes, as John just showed. And NYPL was interested in identifying content boundaries, for locating multiple events captured on a single item or segments with no content, in order to speed up archival processing and to support copyright review.

Now, in order to select the best MGMs for our collection managers' use cases, we needed to come up with some criteria for conducting evaluations and making informed recommendations. So we developed a set of criteria that we could use in selecting appropriate MGMs for the pilot, but that would also provide a general framework for any organization wanting to evaluate MGMs for its own use with the future AMP platform. In addition to testing the accuracy of each MGM, we needed to consider factors such as cost, processing time, and computing resources needed. It was also very important to everyone on the team to consider the more social risks of MGMs, like how algorithms might express hidden biases, or what commercial services will do with the content you submit or the data they extract. So we included a criterion that we call social impact, to assess what we found in each company's terms of service policies and the potential risks that we saw in our accuracy testing. We found the Fairness, Accountability, and Transparency in Machine Learning Principles for Accountable Algorithms to be a guiding light in developing these criteria, as well as an important resource in developing our testing strategies.

Next, as a team, we developed a broad definition of metadata quality as it relates to our objectives for the project, and we came up with the following. Note the emphasis here on "good enough." The actual measure of good enough for each of these functions will differ depending on the collection and MGM involved, so constant conversation with our collection managers during the testing of each MGM has been important, so that we could agree on what good enough means. For each MGM category, we worked with the collection managers to select samples from their 100 hours of content that would hopefully represent a wide range of features to test for.
As we tested and learned more about all the potential mistakes an MGM could make, however, we realized that our sample sets could have been much more diverse in the languages and dialects spoken and in the ethnicities represented, in order to push the limits of each tool and to tease out possible biases. So if you are trying this on your own, we recommend doing some experimentation with each MGM before you finalize your sample content set, so that you can get a clear picture of just how poorly the MGM can perform.

Next, we had to determine what metrics to use in testing each MGM's performance, and this step made it very clear to us just how well we would need to define our use cases before testing MGMs for accuracy, so that we would have a good measure of success. For instance, with video OCR, we wanted to find words and phrases occurring in the video that we could use as keywords or as input for named entity recognition to identify controlled access terms. We didn't care about where on the screen they occurred or at what time in the video. Knowing this allowed us to limit our testing to just verifying the words found in the video, without having to test the accuracy of the where and the when.

Once we had our samples and testing methodology, we created ground truth results for each of our samples. We ran our samples through each candidate MGM, converted that MGM's output to our ground truth data format, compared it against the ground truth and calculated precision, recall, and F1 scores, and then shared our results and analysis with the collection managers for review. We also converted the data to more human-comprehensible formats for qualitative review, such as plain text for transcriptions or contact sheets of images for scene and shot detection. Overall, we found this whole process to be quite time-consuming, in that we had to write scripts not only to run the MGMs but also to convert MGM output to a common data format and then compare and score MGM results against ground truth. So we've started sharing these scripts on GitHub for others to use, so that they don't have to replicate the process we went through when they go to evaluate the same MGMs.

To date, we've evaluated and recommended a mix of open source and commercial MGMs in five categories: speech, music, and silence detection; speech-to-text transcription; speaker diarization; named entity recognition; and video OCR. We're still working on at least five more, and we've started posting our evaluations of the MGMs we've reviewed on our project wiki, so you can learn more there about how we came to recommend these particular tools.

In the initial planning for this pilot, we had identified a long wish list of metadata elements we hoped to extract through MGMs, but after we started testing them, we quickly realized that the outputs we got from applying MGMs out of the box, without training, to arbitrary audiovisual content would require some human intervention to make them useful to end users. So we revised our goals: to extract data that could be efficiently refined through minimal human editing, and to enhance the AMP platform with tools for supporting that human review, which John will talk more about in a minute. Of the MGMs that we've tested and have yet to test, we've been able to extract keywords, dates, names, geographic locations, and full text, and we're also generating metadata to support enhanced navigation and more efficient human processing of collections and copyright review.
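As a rough illustration of the scoring step described above, here is a minimal sketch of comparing an MGM's extracted terms against ground truth and computing precision, recall, and F1. This is not the project's actual evaluation code (those scripts are on the project's GitHub); the term-set comparison shown matches the video OCR case, where only the words found matter, not where or when they appear, and the sample data is hypothetical.

```python
def score_terms(predicted, ground_truth):
    """Compare terms extracted by an MGM against ground truth terms and
    return precision, recall, and F1. Illustrative sketch only."""
    predicted = set(t.lower() for t in predicted)
    ground_truth = set(t.lower() for t in ground_truth)

    true_positives = len(predicted & ground_truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


# Hypothetical example: words a video OCR MGM found vs. what a human noted.
ocr_output = ["University", "of", "Indianapolis", "1992", "Lugar"]
truth = ["University", "of", "Indianapolis", "1992", "Senator", "Lugar"]
print(score_terms(ocr_output, truth))  # roughly (1.0, 0.83, 0.91)
```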
As far as training MGMs to achieve better results, our goal for this phase of the project was to develop flexibility in the platform to swap out MGM adapters to suit the needs of any asset type, domain, or collection. We were not necessarily trying to train models to get the best possible results for our sample materials, but we did want to explore how we could add options to the AMP MGM adapters to allow collection managers to easily include their own models or custom vocabularies. We experimented with using several controlled vocabularies with spaCy, the open source natural language processing MGM we've chosen for named entity recognition. So now the adapter allows for the addition of any vocabulary an AMP user wants to include, using spaCy's rule-based entity matching in addition to the default model (a small sketch of this kind of rule-based matching appears a bit further below). We're currently working on similar customizations for structured OCR of supplemental paper-based materials and for facial recognition of select known individuals.

Throughout the course of these MGM evaluations, though, we've discovered many future opportunities for diving deeper into training models for AMP, particularly in using library collections as training data. With the addition of human editing tools in the AMP platform, we have the potential to develop workflows for creating training and testing data that libraries could eventually feed back into the system as models trained on more domain-appropriate content, which we think would be very exciting to explore.

And finally, I just mentioned facial recognition, and this has been a hot topic of discussion within the team throughout the course of the grant. Facial recognition has gotten a lot of attention for producing biased results based on biased training data, for its abuse in surveillance and automated decision making, and for the privacy implications of using people's images without their consent. Because of what we already know about the limitations and dangers of existing facial recognition models, as well as how little we were able to find out about data use and reuse by the major commercial offerings, we decided to limit our explorations into facial recognition to locally hosted open source tools only, and to the very specific task of detecting a single well-known individual, Herman B. Wells, in a collection of videos. We've tried to keep bias and privacy in mind when evaluating all of our MGMs by applying our social impact criteria, but we're always open to input from the library community on how we can do this better. And I'll pass it back to you, John, to talk about the platform.

Thanks, Shawn. So I will very briefly talk about how we've been building out the AMP platform. Obviously, there's not a lot of time to get into a great level of detail, but I wanted to highlight a few key things. Back in the planning phase of this project, we worked to develop a high-level conceptual architecture for a platform that would allow us to do what we want to do. A couple of the key pieces of that are a user interface application, obviously, for archivists, librarians, and collection managers to work with, to submit content and to build and run workflows, and some type of workflow manager to support constructing and executing these workflows, combining MGMs which might be running locally, in the cloud, within supercomputing environments, or elsewhere.
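Before getting to the workflow manager itself, here is the small sketch promised above of the custom-vocabulary, rule-based entity matching Shawn described with spaCy. The vocabulary terms and pipeline setup are hypothetical and are shown only to illustrate the general mechanism (spaCy v3's entity_ruler component); the talk does not describe AMP's actual adapter code.

```python
import spacy

# Load a stock English model; an adapter could swap in any model here.
nlp = spacy.load("en_core_web_sm")

# Add a rule-based entity ruler ahead of the statistical NER component,
# seeded from a hypothetical controlled vocabulary supplied by a
# collection manager.
ruler = nlp.add_pipe("entity_ruler", before="ner")
ruler.add_patterns([
    {"label": "PERSON", "pattern": "Herman B. Wells"},
    {"label": "ORG", "pattern": "Indiana University"},
    {"label": "GPE", "pattern": "Bloomington"},
])

doc = nlp("Herman B. Wells spoke at Indiana University in Bloomington.")
for ent in doc.ents:
    print(ent.text, ent.label_)  # vocabulary matches plus default model hits
```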
One of the significant architectural decisions we made was to choose an existing tool called Galaxy to serve as this workflow manager and workflow execution environment. Galaxy, if you're not familiar with it, is a workflow system originally developed in the bioinformatics and computational genomics world for doing genomics work, but it has been successfully taken up by other communities that need this kind of composing of workflows from multiple tools. We actually took an example from a related project, CLAMS, at Brandeis and WGBH, which is pursuing some of the same goals as AMP, and have adopted the Galaxy platform along with them, which potentially allows us to collaborate more closely with the CLAMS project as well. Galaxy gives us the ability to relatively easily build adapters that connect in MGMs and compose them into workflows (a brief sketch of driving a Galaxy workflow programmatically appears a bit further below). We have then focused on building out a front-end user interface that is more appropriate to our use case, to allow submission of files, selection and execution of workflows, and retrieval and interpretation of results.

Another thing I wanted to highlight in the architecture is that, as Shawn mentioned, in addition to the automated MGMs, the notion of a human MGM, a workflow step that requires human intervention, is a key part of our concept. So we've been working on implementing a number of human MGMs in AMP to support things like transcript correction, review and revision of named entity recognition output, and review and revision of automated segmentation of audio and video. We have integrated, or are working on integrating, a number of existing tools, as noted here, to support those functions. In addition, we needed some environment to support task management for the human workforce, in our case mainly student workers, who will be doing this work, and we used an existing tool, JIRA, which many of you may be familiar with, to support the creation, assignment, and taking of tasks by that staff. That is going quite well at this point.

So with that, to wrap up, I just want to talk a bit about our next steps on the project. We've put a lot of work into selecting collections and into evaluating and testing MGMs, and that continues. We have started building out workflows to support the various types of collections we're working with. We're going to work on completing that design, integrating additional automated and human MGM steps as they're selected by Shawn's team, and start executing and evaluating the outputs of these workflows against test collections, working very closely with the collection managers to identify workflows and evaluate outputs as to their viability. And then we'll continue to refine the application's user interface and functionality. Looking at the next phase of AMP work: what this pilot phase is really trying to prove out is, is this a concept that's going to work? Is this an architecture that's going to work? Does this really help, in terms of time and cost and so forth, with this metadata creation need for AV resources? We'll be evaluating all of that and hopefully feeding into a next phase that would focus on getting this more production-ready and deployable in different environments.
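Here is the sketch mentioned above of driving a Galaxy workflow programmatically. The talk does not describe how AMP's front end actually talks to Galaxy, so this example uses bioblend, a commonly used Python client for Galaxy's API, purely as an illustration; the URL, API key, workflow name, and file path are all hypothetical.

```python
from bioblend.galaxy import GalaxyInstance

# Hypothetical Galaxy instance and credentials; not AMP's actual setup.
gi = GalaxyInstance(url="http://localhost:8080", key="YOUR_API_KEY")

# Upload an audio file into a new history.
history = gi.histories.create_history(name="amp-demo")
upload = gi.tools.upload_file("lecture.wav", history["id"])
dataset_id = upload["outputs"][0]["id"]

# Find a previously built workflow (say, speech-to-text followed by NER)
# by name and invoke it with the uploaded file as its first input.
workflow = gi.workflows.get_workflows(name="transcribe-and-tag")[0]
invocation = gi.workflows.invoke_workflow(
    workflow["id"],
    inputs={"0": {"id": dataset_id, "src": "hda"}},
    history_id=history["id"],
)
print(invocation["state"])
```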
One area we might focus more on in that next phase is, for certain kinds of collections with large volumes of similar needs, relying less exclusively on off-the-shelf MGMs and looking more at training models, or additional training of existing models, a transfer learning kind of approach. We would benefit from having some additional data science expertise within the project in that next round. And then we'd really focus on how the metadata gets into, and is used most effectively within, various discovery and access platforms, to serve the ultimate goal of improved discovery and use by users.

So that's what we have time for today. Here are some links with more information about the project. I think now we'd be happy to spend a few minutes answering questions. Thanks very much.

Thank you. Thanks, John. Thank you, Shawn, for that overview and the very interesting details about this project. A lot of steps in there and detailed work. I'd like to open the floor now for questions. You should find a Q&A box at the bottom of your screen; if you click on that, you can type your questions into the Q&A box and I'll read them here for John and Shawn. Let's start with Howard Besser. Hi, Howard. Howard asks: it looked like your human MGMs were all really human review or correction of machine MGMs. Do you envision solely human MGMs?

Shawn can tackle this as well, but I think, yes. For example, one could consider as a human MGM taking all of the output and using it to help formulate a catalog record, or there could be transcription processes, entirely human rather than correcting machine output, where automation simply is not going to work due to the languages involved, the quality of the audio, or other issues. Shawn may have other examples.

Yeah, I mean, I think the way the system works is that you have pipelines of MGMs, and the output of one becomes the input of another. So it seems feasible that you could have a human create data, say a hand-transcribed transcript, and then input that into a machine MGM. So that could fit in nicely.

Okay. All right. Thank you. Thanks, Howard, for that question. Robin just chatted in: great to hear what the project has accomplished. So thank you very much for sharing that. If you have questions or comments that you'd like to make live, please feel free to raise your hand and I can enable your mic and unmute you; you should see a little option to raise your hand. While we're waiting to see if we have any more questions, I just want to remind everybody that this webinar is part of CNI's ongoing spring 2020 virtual membership meeting, which will continue through the end of May, and I just chatted out the direct link to the schedule for the rest of the meeting. So check that out, and we hope you'll join us for more offerings. I also want to remind everyone that we added a final plenary by Cliff Lynch, which will be on Friday, May 29th, so we hope you'll join us for that wrap-up of this meeting, which has really been a great experience and a great opportunity for us to provide a lot of resources that have been recorded. And I'll thank our presenters again very much for coming to CNI to share your work with us, and thank our attendees for spending some time with us here today.