Alright, welcome everyone, and thanks for coming to our session on AMP, the Audiovisual Metadata Platform. My name is John Dunn. I'm assistant dean for library technologies at Indiana University Bloomington, and I've served as project director for the Audiovisual Metadata Platform, or AMP, project for the past several years.

And I'm Sean Averkamp. I'm a senior consultant with AVP, and I'm a project manager on AMP for this phase as well as a subject matter expert.

So we were just recalling that we had planned to come to San Diego two years ago for CNI to give a presentation on AMP, so it's great to be able to actually, finally, do that in person with all of you. We have given some updates in the past on the original conception of the project and then on some of the work we did during the second phase, but here we're going to talk about a few things. We're going to provide some basic background and context on the project for those of you who aren't familiar with it, we'll talk about what we accomplished and learned in the last phase of the AMP project, and we'll talk about the goals and work for what we're currently doing and where we see it going in the future. And I should say this is a project that has really been a partnership between Indiana University, AVP, New York Public Library, and, in phases one and two, the University of Texas at Austin, and it has been generously supported by a series of grants from the Mellon Foundation, so we thank them very much for making this work possible.

So first we'll talk a bit about background and context. AMP is a project that was born out of a challenge that we were seeing at Indiana University, but that others who are dealing with large volumes of AV, audio and video, are seeing as well. And I lump motion picture film in there too under the rubric AV, even if that's not technically correct, but for purposes of this presentation AV essentially means time-based media: audio, video, film. The challenge is that AV collections, and the interest in AV collections on the part of scholars and for use in teaching, are growing exponentially. Libraries and archives are dealing with large numbers of AV materials in legacy formats that are becoming challenging to digitize or reformat, but that also cannot continue to be preserved in their current forms, either due to media degradation, or the obsolescence of the equipment that can play these formats back, or both. At the same time, as I mentioned, the expectations and demand for these collections are growing, and user expectations for discoverability and access are growing based on people's experiences with video on the web, in streaming platforms, on YouTube, on Vimeo, et cetera. Libraries and archives are needing to step up the discoverability of collections to make them more usable and useful for teaching and for research. The challenge in that is that often there's not a lot of metadata, particularly for archival AV collections, and in many cases, certainly at Indiana University, units that hold a lot of AV materials have not necessarily even been able to view or listen to these materials in order to describe them, even if they wanted to, because of the issues of format degradation and obsolescence that I mentioned. So it's hard to improve discoverability without metadata. And at the same time, resources for traditional approaches to cataloging and archival description are limited compared to the volume of content we're having to deal with.
So the opportunity that we're trying to address through AMP and other related projects at Indiana University is to take advantage of the fact that one can take a mass digitization approach to AV, in the way that, say, Google took for books: really take a digitize-first approach, and then see how we can work with the digital files to better enable description, rights determination, and so on. We have had a project at IU that wrapped up, or is actually still in the process of wrapping up, called MDPI, the Media Digitization and Preservation Initiative, which has digitized about 300,000-plus pieces of audio, video, and motion picture film media. But the level of description and metadata that exists for these collections varies substantially across units. For example, our music library has very rich metadata, at least at the item level, for much of its material, while we have ethnographic collections and university archives collections that are much less rich in terms of the metadata that's available. And IU, and the few other places that are taking a similar all-in approach to digitization of AV collections at this point, are encountering a similar problem.

So we have this problem, and at the same time we have the opportunity of having these things in digital form, and the fact that machine learning based tools and other automated tools that try to extract meaning and information from media files are emerging from multiple directions and are potentially able to help. That's really what we've tried to do in AMP: look at how we can leverage machine learning and other automated tools, together in workflows with human expertise, flexibly, and in a way that can be configured to be appropriate to different kinds of collections. The kinds of processing you would do with, say, a concert performance archive would be quite different from what you would do with an oral history archive, so we wanted to be able to build workflows that are appropriate for the collections. The vision of AMP has been to create an open source software platform to help with metadata creation and augmentation for AV collections, one that lets you build these workflows that combine automated and human steps.

A little bit of vocabulary definition that's going to be important to the rest of the session: MGM is a term we use to refer to metadata generation mechanisms. An MGM can be any tool or step that creates some sort of metadata for an item or a collection. It might be an automated tool such as speech-to-text, or it might be a human or manual step such as correcting the output of speech-to-text. And it might run in a local computing environment, in a supercomputing or high performance computing (HPC) environment, or as a cloud service. We wanted AMP, as a system, to be able to handle the integration of all of these different types of services, and then ultimately deliver metadata that could be used in what we call target systems: online discovery or access systems such as an archival finding aid discovery system, a library catalog, or AV access platforms such as IU's Avalon or AVP's Aviary, and so on.
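To make that MGM idea a bit more concrete, here is a purely illustrative Python sketch, not AMP's actual code, of how a metadata generation mechanism can be thought of: a named step that runs somewhere (locally, on HPC, in the cloud, or as a human task) and turns an input file into metadata. All names here are hypothetical.

```python
# Purely illustrative sketch, not AMP's actual code: an MGM as a named step that
# runs somewhere (local, HPC, cloud, or a human task) and returns metadata.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class MGMStep:
    name: str                   # e.g. "speech_to_text", "transcript_correction"
    environment: str            # e.g. "local", "hpc", "cloud", "human"
    run: Callable[[str], Dict]  # input file path -> metadata as a dict


def run_workflow(input_path: str, steps: List[MGMStep]) -> List[Dict]:
    """Run a simple linear workflow and collect each step's metadata output."""
    outputs = []
    for step in steps:
        outputs.append(step.run(input_path))
    return outputs
```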
So we have undertaken three phases of work so far. The first phase, back in 2017-2018, was a planning effort that developed and refined this initial concept of AMP and convened a workshop of a number of experts to help inform the technical architecture and design of AMP. There's a white paper that came out of that. All of the documents that we'll reference here, and a lot of the documentation, are available on our wiki, and we'll put up the address again at the end. Then from 2018 to 2021, which was extended a bit due to the pandemic, we sought to build enough of the AMP system to be able to start testing it with actual real collections and real librarians and archivists, to see whether this idea, this approach, is truly feasible. There's a white paper report that came out of that phase as well, and I'll go into a little more detail on that work in a minute. And currently we're on our third phase of AMP, also funded by the Mellon Foundation, working to develop additional components of the system, to improve its usability, to work on packaging and deployment issues so that it could be used by others, and to test with a broader set of collections. Sean will talk about that current work on phase three in a minute.

But before then, I'll talk about what we accomplished and what we learned in phase two over the past several years. Phase two was also referred to as AMPPD, the AMP Pilot Development project, so we were amped to work on AMP. The main accomplishment of this last phase was to actually develop the initial version of the application, and I'll show a little bit of that here in a minute. There are two main pieces to it. There's a web-based user interface, written in Java and JavaScript, that allows archivists and librarians to interact with the system: to upload or load in content, to run workflows against that content, and so on. And there's a workflow management and execution back end called Galaxy, which I'll show in a minute, that we adopted from the computational genomics and bioinformatics community and use as the workflow back end. Then Sean led the piece of evaluating and implementing a variety of MGMs, these metadata generation mechanisms, both automated tools and tools requiring human intervention, and developing a pretty substantial evaluation rubric to select those tools. And we tested with 100 hours of audio and video from each of three collections, two at Indiana University and one at New York Public Library.

This is just a screenshot of the AMP application as it currently stands. You're not going to be able to read a lot of it, but this is a dashboard screen where one can see what files have been uploaded, what the status of workflows is, and actually access the output from individual workflow steps, so it's something that's more useful in direct work with the system, for debugging and understanding workflows and so on. Ultimately we would see this feeding into a discovery system or an access system, as I mentioned, such as Avalon or Aviary or other platforms.
This is a screenshot of the Galaxy system. Galaxy, as I mentioned, comes out of the bioinformatics community. It was a tool designed to allow researchers to create workflows that integrate multiple processing steps, which might run in different computing environments, to process genomics data, but it has been adopted by a number of other communities. In particular, the computational linguistics community has an international infrastructure for doing computational linguistics work that utilizes Galaxy, and we learned about Galaxy from a project somewhat complementary to AMP at WGBH Archives and Brandeis University in Boston called CLAMS, which is applying computational linguistics approaches to help create metadata for AV resources. They had selected and adopted Galaxy, and we felt that adopting Galaxy would make sense for us as well and would eventually allow us to share tools. Essentially, what Galaxy lets you do is create wrappers in Python to integrate different tools and processing steps, and then visually chain these together into workflows that might be appropriate for a particular need. So again, you probably can't really see this, but here we're starting with an input that might be audio, or might be audio and video, and we extract audio from it. We run it in this case through Amazon Web Services Transcribe to get text, there's a human transcript correction step, and then that eventually outputs a WebVTT file. This is a very simplified example; it doesn't get into a lot of the MGMs that we integrated, but it gives you a sense of what Galaxy can do.

And this is a screenshot of one of what we call human MGMs, which are those steps that require human intervention. We've worked to integrate existing tools as much as possible, so here we used an open source tool, the BBC transcript editor, from the BBC, and integrated it into AMP so that it can be used for transcript correction.

And this is a list, which I won't go through in full, of all of the MGMs that we have implemented and integrated with AMP in various categories: speech-to-text, named entity recognition, video OCR, segmentation, human correction, and a few others. We'll share the slides so you can refer back to this later. We made a deliberate effort to incorporate at least one open source and one commercial cloud tool in each category, to be able to offer options and to be able to compare them. The selection of these MGM tools for integration was based on evaluation criteria that included a number of different considerations: the accuracy of the tool; whether it could support the variety of input and output formats we would need; the amount of processing time and computational resources required; the ethical and social impact considerations around the tool, which we'll talk about a bit more in a minute; cost; support burden; whether it has an API and how easily it can be integrated; and whether it was a case where we could use a pre-trained tool or whether we actually needed to do machine learning training in order to accomplish a particular goal.
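As a rough illustration of the simplified workflow just described, here is a hedged Python sketch that sends an audio file to Amazon Transcribe and converts the resulting JSON into a WebVTT caption file. The media URI, job name, and five-second cue window are hypothetical placeholders, and this shows the general approach rather than AMP's actual Galaxy wrapper code.

```python
# Sketch of one path through the simplified workflow: Amazon Transcribe -> WebVTT.
# Media URI, job name, and the 5-second cue window are illustrative assumptions.
import json
import time
import urllib.request

import boto3

transcribe = boto3.client("transcribe")


def run_transcribe_job(job_name, media_uri, media_format="wav", language="en-US"):
    """Start an Amazon Transcribe job and poll until it finishes, returning its JSON."""
    transcribe.start_transcription_job(
        TranscriptionJobName=job_name,
        Media={"MediaFileUri": media_uri},
        MediaFormat=media_format,
        LanguageCode=language,
    )
    while True:
        job = transcribe.get_transcription_job(TranscriptionJobName=job_name)
        status = job["TranscriptionJob"]["TranscriptionJobStatus"]
        if status in ("COMPLETED", "FAILED"):
            break
        time.sleep(15)
    uri = job["TranscriptionJob"]["Transcript"]["TranscriptFileUri"]
    with urllib.request.urlopen(uri) as resp:
        return json.load(resp)


def to_webvtt(transcript_json, window=5.0):
    """Group Transcribe's word-level items into fixed-length cues and emit WebVTT."""
    def stamp(t):
        h, rem = divmod(t, 3600)
        m, s = divmod(rem, 60)
        return f"{int(h):02d}:{int(m):02d}:{s:06.3f}"

    cues, words, cue_start = [], [], None
    for item in transcript_json["results"]["items"]:
        if item["type"] != "pronunciation":  # punctuation items carry no timecodes
            continue
        start, end = float(item["start_time"]), float(item["end_time"])
        if cue_start is None:
            cue_start = start
        words.append(item["alternatives"][0]["content"])
        if end - cue_start >= window:
            cues.append((cue_start, end, " ".join(words)))
            words, cue_start = [], None
    if words:
        cues.append((cue_start, end, " ".join(words)))

    lines = ["WEBVTT", ""]
    for start, end, text in cues:
        lines += [f"{stamp(start)} --> {stamp(end)}", text, ""]
    return "\n".join(lines)
```

In AMP the human correction step would sit between the Transcribe output and the final WebVTT, but the data handoff would look broadly like this.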
So, some of the things we learned in this last phase of work. One is that the choice to work with these proprietary tools led to both benefits and challenges. Some of the challenges we encountered with the good commercial cloud, pre-trained machine learning tools that are out there from Amazon, Microsoft, IBM, Google, and others include the degree of unpredictability from run to run, because these tools are being improved and changed over time; certain undocumented behaviors, such as filtering out particular words in one case, which was unexpected and undocumented; and, relatedly, just not necessarily having visibility into what is happening. That gets into some of the ethical and privacy concerns that were discussed a bit in the opening panel of this conference. One big difference we saw across the vendors of these services is how they approached both the documentation and the contract terms around what they can do with your data. If you upload data to be processed, or upload data to augment training, can they retain it for purposes of product improvement and use it for other things, or do they explicitly not retain it, and is that an opt-in or an opt-out sort of setting? Amazon, for example, is very much opt-out: you have to tell them not to keep your data, or they will. IBM and Google are the opposite, at least with the tools we were using. And Microsoft, to be honest, was very hard to figure out, and I ended up in an email exchange where eventually I was talking to someone in the general counsel's office at Microsoft, trying to interpret their terms of use and what they really meant. So there are these challenges, and we can talk more about that in the Q&A, but of course these companies are investing huge amounts of money into training and developing these tools, so in some cases they work really well. So there's this balance of benefit and challenge.

Another area we dealt with is that we had a variety of content that ranged into music as well, and the state of the art for tools dealing with music is much less well developed than for those that deal with more textual sorts of content, so that has led us to focus on more spoken word content for this next phase of work. There's also the decision point I mentioned about when to use existing tools and models versus training new models. We chose, for example, to train our own model for applause detection to help segment music concerts, because the existing models we tried weren't working well for our data. With facial recognition as well, we chose to train models for very limited use cases, like recognizing a specific well-known person in a set of videos, as opposed to more general application of facial recognition or use of vendor tools, because of some of those ethical considerations, and we can get into that a bit in the Q&A. There were also some issues with technical implementation; I can talk more about that if you're interested.

And then there's librarian and archivist engagement. We were really pleased to see that archivists, librarians, and special collections librarians at IU and NYPL are very excited about the potential of AMP, but once they start thinking about how this could practically be implemented in workflows, what they can do with the output beyond what they can put in a MARC record or an EAD finding aid is a more challenging conversation, because they haven't worked with these kinds of things before. Sean will get into more detail about how we're engaging them in this next phase. So to transition to that, I'll turn to our current work on AMP, which is funded by a new grant from Mellon that runs through the end of this year.
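As a hedged illustration of the applause detection work mentioned above, here is a minimal sketch of what training a small custom applause classifier could look like. The MFCC features, the random forest classifier, and the 16 kHz sample rate are illustrative assumptions, not the model the project actually trained.

```python
# Minimal sketch of training a small applause detector on labeled audio clips.
# Feature choice (MFCC summary statistics) and classifier are assumptions.
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier


def clip_features(path: str) -> np.ndarray:
    """Summarize a short audio clip as the mean and std of its MFCCs."""
    y, sr = librosa.load(path, sr=16000, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])


def train_applause_detector(applause_clips, other_clips):
    """applause_clips / other_clips: lists of paths to labeled training clips."""
    X = np.array([clip_features(p) for p in applause_clips + other_clips])
    y = np.array([1] * len(applause_clips) + [0] * len(other_clips))
    model = RandomForestClassifier(n_estimators=200, random_state=0)
    model.fit(X, y)
    return model
```

In use, a model like this would be slid over fixed-length windows of a concert recording, with runs of positive windows treated as applause and therefore as likely boundaries between pieces.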
We've really focused on a number of different areas to take AMP toward something that could be used more practically in production, both at IU and elsewhere, and Sean's going to talk a bit more about them.

So as John said, we're doing a lot of work this phase on improving the user experience, and for this round we're partnering with managers of collections from underrepresented cultures and populations, so we can broaden our understanding of concerns around the implementation of machine learning tools and surface some new issues with the existing implemented MGMs. We also wanted to work with partners who represent a range of different sizes of collections and who use a variety of different systems for description and access, so we're working with a diverse group of partner units of different sizes at IU and at NYPL.

Based on the progress and feedback from our previous phase, we're focusing this phase on workflow creation, that is, allowing users to create and edit their own AMP workflows; currently you can only do that through the Galaxy back end. We're working on batch upload of files, improved navigation of collections, and also using intermediary files as inputs for MGMs independently from the predefined workflows, such as using a transcript output from speech-to-text as input for entity recognition. We're also improving and expanding our human MGM editing tools, like the transcript editor John showed. We are adding video to the timeline tool, which we've been using for human editing of named entity recognition results, and we're adapting it for use with other tools like audio segmentation and face recognition. We're also improving the open source BBC transcript editor, which is what we based our transcript editor on; it has needed significant customization to integrate into AMP. We also learned from our users that the human MGM tools need to be available independently from the AMP workflows. Currently, to use the editing tools you need to set them up as part of a workflow and connect them to a JIRA instance, but we realized that, as with the intermediary files, we needed AMP to have more flexibility for users to process files through MGMs incrementally, rather than making users plan out a full pipeline.

To engage our collection managers, we've been using a number of different methods: system demos, hands-on training, discussions at all-team meetings, and targeted discussions around specific topics. And we'll be conducting focus groups and some more formal user testing as we develop new features and as we have more to show users, whether that be completed features or wireframes. We've gotten some great feedback so far. For example, we just led some focused discussions with two groups of collection managers on file upload and data export, to better understand the needs of both smaller and larger organizations for manual or automated upload to the system, as well as for export of MGM outputs, and to find out what the most useful data formats would be for both humans and machines. From those conversations we learned of a variety of different use cases for getting content into AMP for processing at different points of the acquisition-to-access lifecycle. Archivists may want to process files in AMP at the start of an acquisition, using the tools to streamline accessioning decisions. Archivists, catalogers, or metadata librarians might want to use AMP tools during description to add valuable metadata that they wouldn't normally have time to create.
And after items have been described, rights staff might want to use AMP to help them make rights determinations and provide the appropriate levels of access. And finally, even after access, for the many media items that have been minimally processed or described in a finding aid, researchers might request access or further information, which could require some on-demand uses of AMP. So we found that where a file is at a given point in this pipeline, what system it's currently sitting in, what identifiers have been assigned to it, and what permissions are required to access it can each present special needs for getting files into AMP. This will require that we make sure our data model and our APIs can accommodate the necessary flexibility in who might be initiating uploads, where the files are coming from, and how they should be identified in AMP.

On the export side, we've learned that some of the formats we've already created are a big hit, like the contact sheets we're generating for shot detection, face recognition, and video OCR, which our collection managers think will be useful not only for archival processing and description, but also for making available to researchers, so they can preview content that can't be accessed remotely, or just as an efficient way for them to quickly scrub through content. We also heard a need for automated export of data, rather than just on-demand download, for hypothetical workflows that involve pushing large volumes of files through AMP and having the MGM outputs accessible from a repository location by staff at any point in the archival processing or description pipeline.

Now that our collection managers are finally starting to work with their own content in the system, we've been getting some great feedback on bugs or unexpected output from MGMs, so I'll just show you a couple of examples here. For example, with the open source shot detection tool, PySceneDetect, one of our collecting units discovered the MGM was generating excessive numbers of shots on some areas of video without content. And with speech-to-text transcription, they found that the Amazon Transcribe MGM was making some strange transcriptions during areas of silence, so you can see here where it seems to be finding the words "okay, okay, thank you, okay," which maybe is outside the range of human hearing, I don't know. Both of these examples represent cases that might be uncommon in the content the models were trained on, but as we all know, gaps like these in the content on the media are very common with archival material. So surfacing as many of these types of issues as possible will help us with tailoring workflows, creating workarounds, or at least documenting these known risks for future users, so that we can improve the output before we release AMP.

I also wanted to talk about the other big addition to AMP in this phase, the evaluation module. In phase two, for the pilot, one of the most important things we learned during our evaluation and selection of MGM tools was that accuracy can vary drastically for a tool depending on the media content, and how those levels of accuracy translate to quality will be different for every use case. For example, a 30% word error rate for speech-to-text transcription may be unacceptable for using a transcript for captioning, but it might be good enough for keyword search.
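For reference, here is a minimal sketch of how a word error rate like that 30% figure is typically computed, assuming the standard definition (word-level edit distance divided by the number of reference words). It's illustrative only, not AMP's evaluation code.

```python
# Minimal sketch of word error rate (WER): edit distance between a hypothesis
# transcript and a reference transcript, counted in words, divided by the
# number of reference words.
def word_error_rate(reference: str, hypothesis: str) -> float:
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # Standard dynamic-programming edit distance over word tokens.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)


# Example: word_error_rate("thank you very much", "okay thank you much") == 0.5
```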
So we recognized that future AMP users would benefit from being able to test tools on their own collections to make more informed assessments of both the value and the risks involved in applying the tools, rather than using any of our minimal assessments as recommendations. What the evaluation module in AMP is going to do is allow users to upload ground truth data for one or more files that they run through an MGM, and then use built-in tests to computationally compare the MGM output against this ground truth, or ideal output. It will then generate quantitative scores like precision and recall, and users will be able to visualize those results. If you're unfamiliar with how this kind of ground truth testing works for AI, we've got a link on our wiki to another presentation that goes deep into this. These are all just early wireframes, but as an MVP we're envisioning a simple visualization interface like this that is generated dynamically from the ground truth test outputs and lets the user view bar charts or box plots to compare quantitative results for one or more outputs from one or more tools. We're also working on a view that will let users qualitatively review the comparison of MGM output to ground truth, so they can better understand how MGMs are failing on their collections when they're trying to determine whether or not to apply a tool, or figure out what kind of workarounds or quality control mechanisms they need to put in place.

Having worked with these AI tools in phase two, we on the core team understand the value of building evaluation support into AMP and helping users integrate AI into their workflows confidently and responsibly. But it's been very challenging to conceptualize what these evaluation tools should look like just by interviewing users about their hypothetical use. I think that with this evaluation component we're going to see something similar to phase two, where we experimented with tools within the core team and then tried to generate ideas and collect feedback from collection staff. But we were just showing them demos and results, and it wasn't until they were able to start working in the system themselves that we really started getting actionable feedback, and we even started seeing users get excited about using these tools. So our hope is that once collection staff are able to easily start using the tools on their own and have mechanisms for exploring the quality of the results, they'll learn how to use these tools to their advantage and begin to develop a practice of responsible use of AI. This first version of the evaluation module will probably need to be improved after users start working with it, but we feel that's one of the benefits of iterative development, and we are very grateful to the Mellon Foundation for enabling us to build AMP iteratively like this.

Finally, we're working on packaging AMP for easier deployment, especially for organizations with minimal IT support. We're taking a multi-tier approach to packaging, with the end goal of creating a containerized environment. The first tier will be being able to install AMP directly on an operating system with help from detailed documentation and installation scripts; the next level will be component-level containers; and finally we'll have some scripts and configuration to bring all of the containers up together. We're still finalizing our packaging strategy, but in the meantime, anybody can try deploying AMP in its current, in-progress state, at your own risk, from our source code on GitHub.
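Returning for a moment to the evaluation module just described, here is a minimal sketch of how quantitative scores like precision and recall could be computed for a time-based MGM such as shot detection, by matching predicted boundaries to ground-truth boundaries within a tolerance. The half-second tolerance and the input structure are illustrative assumptions, not AMP's built-in test definitions.

```python
# Sketch of precision/recall for a time-based MGM: match each predicted boundary
# to a ground-truth boundary within a tolerance, each ground-truth boundary
# counting at most once. Tolerance and input structure are assumptions.
def precision_recall(predicted, ground_truth, tolerance=0.5):
    """predicted, ground_truth: lists of boundary times in seconds."""
    unmatched = list(ground_truth)
    true_positives = 0
    for p in predicted:
        match = next((g for g in unmatched if abs(g - p) <= tolerance), None)
        if match is not None:
            true_positives += 1
            unmatched.remove(match)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Example: precision_recall([1.0, 5.2, 9.9], [1.1, 5.0, 7.5, 10.0])
# -> (1.0, 0.75): all three predictions match, but the boundary at 7.5s was missed.
```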
Thanks, Sean. So I will just wrap up real quick, talking about some of our ideas, and my own ideas, about where this will go in the future. One is that, from the packaging work that Sean described, we hope to see some other institutions start to experiment with the platform, start to provide feedback, and potentially, hopefully, start to buy into it in some way, in terms of integrating additional tools or improving deployability and so on. This is all Apache-licensed open source, and I think there's also potential for software-as-a-service models for offering something like this. We hope to work on integrating additional MGMs and target systems for output of metadata, and to look at additional opportunities where we could best achieve results by training our own models, building on what we did with applause detection in some other areas. That could include more work with music; I see that as a whole other project that would bring the music information retrieval community together with libraries and archives who are using tools like AMP, to take some of what exists in the research space and turn it into more production-ready tooling to support processing of archival music collections. We want to build integration with IIIF, to be able to take the output of AMP and turn it into web annotations and IIIF Presentation manifests, making it more portable to other tools. And we want to participate as much as possible in this emerging community around AI and machine learning tools and workflows in libraries and cultural heritage, including the AI4LAM community, which, if you're not familiar with it, you can Google AI4LAM to find out more.

So that is it for the presentation. If you want to read any of the reports we mentioned, see the documentation on evaluation of the different MGM tools, and so on, you can go to go.iu.edu/amppd to get to our wiki, and feel free to contact us directly as well. With that, thank you very much, we appreciate your time, and we'll open up for questions. We do have a few minutes for questions before the next presentation starts at 11:15. Thank you.

Could you talk a little bit about where the metadata lives and how it links to the essence, whether it's linked to segmentation, or linked to timecode, or a set of timecodes, things like that?

Yeah, so internally AMP uses a custom JSON schema to represent the intermediate and final results of workflows, and within that, the JSON references timecode. So yes, the segmentation, or the terms found through NER or transcription, are tied back to timecode in that JSON representation. The idea then is that this would be transformed into whatever is needed to bring it into a target system, an access system or discovery system. Sean, if you want to add anything to that.

It's a schema that we came up with ourselves as we were building the system and assessing MGMs, so we've been refining it as we go, and you can reference it on the AMP wiki.

But turning it, as I mentioned, into web annotations that reference timecode could make it more portable. We are also working on the question of, if someone has, say, a preservation system or repository and we're bringing files into AMP from that,
how do we reference back to the file in that kind of system of record, so that the output is tied back to it as well, essentially through the use of identifiers? That's some of what we're exploring with IU and with NYPL. There are a lot of local implementation decisions around that kind of thing, so we need to be as flexible as possible in how we handle it.

All right, well, let's give the next presenter some time to set up, because they start right at 11:15. So again, thank you for your time, and feel free to talk with us afterwards as well; if you want us to elaborate on or discuss any of this more, we're happy to do so. Thanks a lot.