This session is part of the open-source digital preservation and access stream. Thanks to Chris Licinic. We're very happy that he chose this presentation. This is about Hydra, which is open-source software for digital access, digital preservation, and management, and we'll hear about that in the presentation. But I'm going to first run through all the speakers so that you know who they are. I'm Karen Carriani, Director of the Media Library and Archives at WGBH in Boston. We're public television. We have 60 years of audiovisual materials in our collection. I'm going to talk about HydraDAM, which is a project that we have developed, and I'm going to give you a little bit of an intro about Hydra. Then John Dunn, who is Director and Interim Assistant Dean for Library Technologies at Indiana University Bloomington Libraries, is going to talk about Avalon. He oversees IT support, software development, user experience, and digital repository systems. He's been involved in the development of digital library systems for audio and video for over 20 years and currently serves as the Director of the Avalon Media System project. Stefan Elnabel will come next. He's the Moving Image and Sound Preservation Specialist at Northwestern University Library. He manages digitization projects and contributes to the library's overall preservation and access mission. He is also a member of the team developing the Avalon Media System, a Hydra project. So Avalon is the product of a collaboration between two universities, Indiana University and Northwestern. John's going to talk about Indiana University's role and Stefan's going to talk about Northwestern's. And then Hannah Frost is going to speak; she's been engaged in media preservation and digital preservation efforts at Stanford University Libraries since 2001.
Hannah is the Services Manager for the Stanford Digital Repository, an enterprise system for long-term preservation of digital content. In this capacity, she has worked on two Hydra head projects. Hannah also manages the Stanford Media Preservation Lab, SMPL. And I have to say, Stanford is really good about naming things with an acronym; they have excellent acronyms. That's a facility she developed for preserving and providing access to sound recordings and moving images held in Stanford's collections. So those are our speakers. And I'm going to start with GBH's efforts and an overview of Hydra. We were generously awarded a grant to see if we could build a media preservation DAM system using open-source software. In particular, we wanted to test the Hydra stack to see what it would take to build it, what it would take for others to install it, to build some better documentation for installation, and to see how we might integrate with the open-source community. So what do we need from a media preservation system for digital files that is different from a straight digital library system? What makes media management different? You all know that analog-to-digital is more or less a controlled scenario where you can determine the formats of the files that are coming out of your digitization project. But born-digital can be very different, with many, many different file formats depending on the camera that's used, and they come in many different sizes. So we need to manage many different file formats, something for access and preservation. We want to make our materials accessible, and you want to be able to see what you have. We also need a system that's easy to migrate: the technology is easy to move forward, and it's not expensive to migrate, because as we all know, we have to do that probably every three to five years now. And we need something that's easy to evolve as workflows change and the technology changes.
So a quick check on preservation needs. This digital stuff, in my mind, really sucks. Film or stone is much, much easier: it's a much longer-lasting medium and it's much easier to keep. But digital files give us much better and broader access. So how do we preserve this fragile stuff that needs migration every three to five years? You're going to need multiple copies. You're going to need to save the originals. You're going to need to do checksums and validity checks on your digital files to make sure you have all the bits all the time. And migration: not only the content, but you're also going to have to migrate the files, the technology, the systems you're using, the software, and the storage. And doing all of this with big media files is hard, time-consuming, and can be subject to errors and damage to the files. So why did WGBH choose Hydra? Well, it's open source, so we can evolve it as our needs change and we can make sure it has the features and functionality that we want and need. It may be cheaper in the long run, but it is not free. And I am going to use this mantra a lot: it is not free beer, it is a free puppy or a free kitten. Here I'm using the kitten analogy. It still needs an investment of time and people and equipment, but all that investment is toward a product that we actually define ourselves for our own needs. So we mostly chose Hydra because I was really impressed with the community. The community is very committed to sharing and supporting. The quality of the thought and work coming out of the community is excellent, and the institutions involved are very established and long-lived. They are very sustainable institutions. So what is Hydra? It's a robust repository, based on Fedora, with applications tailored to targeted functionality. There are gems that can be leveraged to build new bundles or solutions. The community of developers is very friendly and welcoming.
There is training, thanks to Data Curation Experts (DCE), and I think Mark is in the audience. Mark is in the audience. They run Hydra Camp, so they do trainings for new developers coming into the community, which is fabulous. They also give vendor support, so you can hire them to actually come and help you get your system up and running, or to add new features if you need some support for your development. They are very much part of the sharing community, so almost everything that they develop and create goes back into the community as open source. The software is open and available through an Apache license. This is a picture of the Hydra community at Hydra Connect in January. It's been steadily growing. A very compelling reason to adopt Hydra technically is that it's a way to take advantage of the benefits of Fedora as a repository. There are other solutions that also do that, but Hydra's strength really is its community. And at a time when we're all asked to do more with less and faster, working in a community where everyone has a shared purpose and goal makes that a lot easier. We all need our systems to do basically the same thing in managing our digital objects. So why not leverage that and work together to build code that works? The only way to build a rich and robust solution is to engage a large community of developers, and the only way to build a sustainable solution is to spur adoption by a community of institutions with a vested interest in a shared success. This shows the growth of the Hydra community of partners and adopters, and I just want to note that OR14 was in June. So from June to now, there was a huge spurt of adopters. It's a very fast-growing community. A single application could not effectively cope with all the use cases. However, any institution would want to safeguard the outputs of all these disparate systems in the digital repository for management and preservation.
Hydra gives a framework where one body, the repository, can support multiple heads: tailored applications. So you can put all kinds of different content into your repository and have Hydra heads that focus applications on specific types of content for your users, if that's the way you want to present it. In addition, we wanted to be able to customize the interface and have the core functionality of managing digital objects: ingest, store, search, retrieve, describe, relate, and preserve. And of course, as soon as we got our NEH grant to develop the system, our developer left, and it took us about a year to hire their replacement. But in the meantime, we hired DCE to get us started, and within, I would say, three to four months, we had something up and running that we could test. Given our functionality needs, they decided to build our system off of a system built at Penn State called Sufia. So we were starting from a point where there was code already built, and we were adding our new functionality to that code. Sufia had much of the functionality that we had requested, key being self-deposit: the ability for us to ingest any file format. We added in FFmpeg to transcode for the creation of proxies and thumbnails for the video files, and a PBCore export. We also did some workflow messaging around the download of files, because it takes so long to move the very large preservation files across the network, and our users were getting very impatient waiting for those files to download. So we added some messaging into the system to basically say: yes, we got your request, hang on, be patient, we'll let you know when the file is ready for you to download. These are our interfaces. It's pretty simple. We're going to do a demo with the lightning talk later this afternoon; hopefully it'll work. So please come to the lightning talks later. I'm going to run through these interface slides really quickly.
This is where it shows you: the yellow button shows you that you can download; it's getting ready to download. This is the thumbnail with all the metadata. There it tells you the files available for download. This is telling you it's offline: please check the queue, it's coming soon. So what are we doing? Well, we're trying to do something that's already complicated, so we don't need a complicated system to add to that challenge. We're trying to simplify our workflow in general. Systems are expensive to maintain and migrate. We've decided that what we do best is organize stuff, so we should be organizing our materials. We should be putting them somewhere safe where we can retrieve them and where we can easily find them again. How do we do that with digital stuff? We decided that we shouldn't try to build a system that does everything for everyone. That's a hard discipline, especially in media, because you sort of feel like it should be available all the time and everybody should be able to do everything with it. But we really decided that as the archive, our goal is to preserve the material, store it someplace safe, and be able to retrieve it again when somebody wants it. So we decided to focus on that as our mission, particularly with this system. We're aware that there are other systems that are open and flexible enough to share. So as long as what we build in the archives can hook into the other systems that our station is building for editing, for production, for broadcast, then we think we're okay. We as the archive are really focusing on that preservation piece and making sure that the system we build for preservation is open and can hook into those other systems. Our biggest challenge is working with lots and lots of files, lots of different formats, and very big files. So how can we do this with our limited resources? We did decide to change our workflow.
With HydraDAM, we're doing the entire large essence-file handling locally, not over the network. We found that moving the big files across our network was just too slow, too painful, and probably causing some errors in the files and corruption. Users were getting really impatient; they kept claiming the system wasn't working because they weren't getting their files fast enough. So we've taken that piece out of the mix. We have a powerful Mac Pro connected to an external LTO-6 drive, as well as connections for all the different kinds of hard drives that may be delivered to the archives from other departments as part of accessioning. HydraDAM is running on a virtual machine server, and as the individual files are being processed, fixity checks are conducted to maintain file authenticity as the files are being copied. The files are delivered to the Media Library and Archives by either external or internal hard drives, or over the network, provided the infrastructure is there to actually support it. We actually prefer that people bring us the drives themselves. HydraDAM is flexible enough to support ingest from a variety of sources, which is great. HydraDAM then creates a data record for each file and attaches the fixity data, as well as the characterization data, to the record. The local Mac Pro machine then creates a proxy file using FFmpeg. FFmpeg is an open-source application that decodes many different audio and video formats. I would say that at this point HydraDAM will transcode maybe 80% of the file formats that we'll get. So as we ingest into the system, a proxy and a thumbnail will be created for those 80%, so that you can actually see what's there, and users can go in and search and actually see what the content is.
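As a rough illustration of that ingest step, the proxy and thumbnail creation could look something like the sketch below. This is not HydraDAM's actual code; the codec settings, filenames, and helper names here are illustrative assumptions, and a real installation would tune them per format.

```python
import subprocess

def proxy_command(source, dest="proxy.mp4"):
    # Transcode to an H.264/AAC proxy at reduced resolution,
    # a common choice for in-browser preview of large masters.
    return ["ffmpeg", "-i", source,
            "-c:v", "libx264", "-vf", "scale=-2:480",
            "-c:a", "aac", "-b:a", "128k", dest]

def thumbnail_command(source, dest="thumb.jpg", at_seconds=5):
    # Grab a single frame a few seconds in to use as a thumbnail.
    return ["ffmpeg", "-ss", str(at_seconds), "-i", source,
            "-frames:v", "1", dest]

def make_derivatives(source):
    # On ingest, run both commands; failures surface as exceptions.
    for cmd in (proxy_command(source), thumbnail_command(source)):
        subprocess.run(cmd, check=True)
```

Formats FFmpeg cannot decode would simply fail at this step, which matches the roughly 80/20 split described above.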
For the extra 20% that we can't get a thumbnail or a proxy for, there will be a data record with a description, but if users actually want to see the content, they're going to have to download those files. That was our compromise, because we were not going to be able to find a system that would transcode all the files all the time, so we focused on the key ones that covered 80% of the files. The proxy location is added to the asset record, as well as the ability to download the proxy file. Metadata is either entered manually in HydraDAM, or it can be batch-uploaded and attached to records for each asset. So since we're still requiring departments to submit to our FileMaker database, we can import a FileMaker record and attach it to the HydraDAM asset records, or a CSV, or an Excel file, as long as the proper field mapping is in place. And that's all that HydraDAM does. It points to the location of the proxy files. It stores the metadata in the Fedora repository. The Fedora repository stores all the metadata, and it can be exposed to other applications that need access to it when we need to do that. So source files are delivered by hard drives or over the network. HydraDAM runs the fixity checks. The files are copied to a locally connected LTO-6 tape, which also runs an MD5 fixity check, and then the files are copied to another one, so we have two copies. One copy is in our vault; one copy is off-site. The serial number of the LTO-6 tape, the GBH barcode, becomes part of the metadata asset record in HydraDAM. And then, when the files have completed transcoding and copying and another copy is made, that copy is sent off-site for storage while the original source drive is stored in our vault.
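The MD5 fixity check at the heart of that copy workflow is conceptually simple. A minimal sketch (illustrative, not WGBH's actual tooling) might be:

```python
import hashlib

def md5_of(path, chunk_size=1 << 20):
    # Hash the file in 1 MB chunks so multi-gigabyte preservation
    # files never need to fit in memory.
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_copy(source_path, copy_path):
    # A copy (e.g. on LTO-6 tape) is trusted only if its checksum
    # matches the source's.
    return md5_of(source_path) == md5_of(copy_path)
```

The same digest would be stored with the asset record and recomputed later to confirm the bits are still intact.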
So then, when people actually request our preservation files, we will go to the vault, pull the LTO-6 tape, spin the file up on our workstation, and they will come and retrieve that file onto their drive or thumb drive or whatever. So it's very much the same as how we handle physical tapes these days, where they make a request, we go to the vault, we pull the tape, they come to our offices and they pick up the tape. This way they're just picking up a file. So the delivery of those preservation files is not happening over the network. And we actually found that that was a lot faster. People were actually pulling tapes and digitizing them again because it was faster than pulling the content over the network in digital form. So the flexibility of that is allowing us... We have an HSM system. We're moving away from the current HSM system at the moment because it uses the network and it's a robotic tape library. We may end up going back to it at some point when we upgrade our infrastructure and the internal network, but for right now, we are going to continue to work on our local LTO-6 workstation. We wanted some other institutions in this project to test our install and to test the system, to get feedback on whether the documentation was clear enough that they could actually do the install themselves. Our two partners were WNYC in New York and SCETV in South Carolina. They gave us amazing feedback on the documentation, which helped us improve it and improve the install. DCE went back and actually improved the code for the install and improved a lot of the documentation, which they then also ended up migrating into other versions of the work that they have since used for other institutions. So we were testing an open-source solution, and we were trying to figure out whether or not it was a good solution for us.
The NEH project gave us that luxury and opportunity, and we were very thankful for it, because otherwise we probably wouldn't have undertaken it. The decision process is not easy, and it's not cheap. Open source is not free beer: it's a free puppy, a free kitten. It does take a lot of time and people and development and energy. But at the end of the day you will have a product that you built yourself that fulfills your needs. There are no turnkey solutions yet, so just make sure you know that. But there's a really strong community out there building solutions to these challenges and needs, and that support and knowledge is really the best leverage in terms of moving in this direction. It's coming from academic institutions with really long histories, strong track records in technology, and more resources for their libraries. They have audiovisual materials in their collections and they need these solutions too, so it seems like a really good win-win to collaborate with them. So Hydra is one body and many heads. A single application could not effectively cope with all of the use cases that you might need, and any institution would want to safeguard the outputs of all of these different systems. So Hydra gives a framework where there is one body, the repository, and it can support many heads with tailored applications. So we're going to move on to John Dunn, who's going to talk about Avalon. Thanks, Karen, that was a great introduction to Hydra and the power of the technology and of the community. I'm going to talk about another Hydra head, or Hydra-based project, that serves some different use cases and has approached development in a slightly different way than the WGBH HydraDAM has. So Avalon Media System is a Hydra head that is oriented around access to digitized or born-digital audio and video materials. The goal was to create an open-source system that lets libraries and archives more easily provide online access to media collections.
We are open source, obviously, being part of Hydra and based on Hydra. We're using what's called an agile development methodology that I'll talk about a bit more in a second. We're trying to leverage existing technologies as much as possible; we're not trying to reinvent the wheel here, but really fill in gaps that we think aren't addressed by current solutions. And a big part of this project, which is being carried out jointly by Indiana University and Northwestern University, is to communicate and market the project and engage other implementers, users, and potential partners around what we're building over time. The work I'm talking about is funded in part by a grant from IMLS, the Institute of Museum and Library Services, that will run through September of next year, and there are some other funding possibilities we're currently working on to help keep this going beyond that. Beyond Indiana University and Northwestern University, we have a number of other institutions that have been involved either as pilot implementers or advisors, as in WGBH's case, to help us make sure we're building something that is of use beyond just Indiana and Northwestern, and really can serve a general set of use cases for access to media collections.
We started developing this back in 2011. Before then, we spent about a year looking at requirements, talking to people, talking to other institutions, trying to formulate a technical plan and a plan of what functionality we were trying to build. We had our first release a little over two years ago, and our most recent release came out this past July. We have another version coming out this fall, and then a more major version with some significant new features coming out in the spring of next year. We're trying to release major versions roughly every six months and minor versions every three months or so; we're still working on getting to that pattern. So this is based on Hydra, and Karen described how Hydra is this underlying set of technologies and a framework for developing repository applications and front-end applications. One other thing to note about Hydra, in terms of technologies that were discussed in the earlier session: Hydra is based on the Ruby on Rails framework, so the code is written in the Ruby programming language and it uses the Rails model that is the basis of many successful web applications. The big difference is that instead of a relational database backing all of your data, there is a Fedora repository sitting there. I'm not going to go into detail on this diagram of Avalon's architecture, but it can at least give you the sense that Avalon is made up of a lot of other components, including pieces that come from Hydra, the Fedora repository, and some other technologies like the Solr search engine. We're using another product, currently called Opencast Matterhorn, to help manage our transcoding workflow, and that uses FFmpeg underneath. And Avalon is designed to integrate with a number of other systems that are typically in place within an institution, such as existing metadata systems, authentication services, authorization services, storage services, and streaming services, and I'll talk a little bit more
about that. So just to give you more of a sense of what Avalon actually does, and I think Stefan will dig into this a bit more in the context of a particular institution's workflow and uses of Avalon, here are some of the basic functions that Avalon provides. It provides the ability for users to come in and browse and search metadata for media items that are available to them, and the items that are available to a given user will vary depending on access controls that are in place; I'll talk a bit more about that. Blacklight is another open-source piece of software that is part of Hydra; it is a search and browse interface that uses the Solr indexing system. It's used more and more by libraries to provide front ends to their library catalogs. For example, Stanford uses it, Indiana uses it, and the University of Virginia and others have their main library catalogs based on Blacklight. So we're using Blacklight as a key piece of this. We have a player that's based on MediaElement.js that can deliver to pretty much all desktop and mobile platforms, supports switching between different quality versions, different transcodes, of a particular media item, and can be embedded into other contexts. So content in Avalon can be embedded into blogs, into Omeka exhibits, into other kinds of websites. Then for the staff user, there's a whole content management piece of Avalon. You can set up collections, you can delegate the management of those collections to different groups of people, and you can give people different roles in terms of being able to add, delete, and edit metadata for items and collections. Content, meaning video and audio files, can be loaded into Avalon through a number of different methods, including upload via a web page and deposit in a dropbox directory on a server, and there are a couple of different ways to do a batch load of a lot of media items and metadata at once. Like WGBH, we are using FFmpeg under the covers.
We can deal with most media file formats, and Avalon will then handle the transcoding of those, using FFmpeg, into formats for delivery on the web; or, if you already have your video transcoded in the form you like, you can load that directly and bypass the transcoding step. For descriptive metadata in Avalon, to support that searching and browsing interface and to identify a video or audio file one has discovered, we are using a set of metadata elements based on the MODS standard. And I should note that we actually expect a lot of discovery of items in an Avalon environment will not take place in Avalon itself, but rather through web search engines such as Google, through existing MARC-based library catalogs, through existing archival finding aid delivery services, and so forth. Avalon does have search and browse capability, but it's certainly not the only place people will be searching and browsing to find things. And then, finally, one of the key pieces of Avalon is the access control capability. If you load a piece of media into Avalon, you can restrict it in various ways. You can make it available to anyone who comes in, you can restrict it to users who can log in to Avalon, you can restrict it just to the staff who are managing it, or you can restrict it to particular groups of users or individual users as you wish. That can tie into your institution's directory services, things like Microsoft Active Directory or LDAP services, as well as course management or learning management systems, so you can restrict media, for example in a university setting, to the students in a particular course. Avalon also can work with a couple of different systems for establishing a permanent URL or handle for items that are loaded in, so that you have a URL that will remain the same for that piece of media over time.
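To make those access-control tiers concrete, here is a small illustrative sketch. This is not Avalon's actual (Ruby) implementation; the field names and the three-tier model are assumptions based on the description above, and real group membership would come from a directory service or LMS rather than an in-memory dict.

```python
def can_view(item, user):
    # Illustrative three-tier check modeled on the description above:
    # 'public' (anyone), 'authenticated' (any logged-in user), and
    # 'restricted' (named users or members of named groups).
    visibility = item.get("visibility", "restricted")
    if visibility == "public":
        return True
    if user is None or not user.get("logged_in"):
        return False
    if visibility == "authenticated":
        return True
    if user.get("username") in item.get("allowed_users", set()):
        return True
    # Group lists would typically come from LDAP/Active Directory
    # or a course roster in a learning management system.
    return bool(set(user.get("groups", ())) & item.get("allowed_groups", set()))
```

A course-reserves item, for instance, would be 'restricted' with its allowed group set to the course roster, while a rights-cleared archival film could simply be 'public'.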
Just to talk a bit about the process through which we've developed Avalon: as I mentioned, this is developed between Indiana and Northwestern universities, and we have had a single development team spread across those two institutions, using what's called an agile Scrum process of two-week sprints of work on a defined set of user stories that we're trying to deliver functionality for. Our code is in GitHub, as was discussed earlier, and is publicly available. We do a lot of work online, but we find it important to get together for face-to-face meetings to work on longer-range planning and other things like that, and the teams at Indiana and Northwestern meet, as part of the Scrum process, on a daily basis for 15 minutes every morning via audio or video conference to touch base and make sure things are moving forward. So this shows the development team between the two institutions, but one thing to note is that we've had this model of a tight team between Indiana and Northwestern developing the system, and our goal in making Avalon an open-source system is to engage additional institutions and individuals in the development over time, either through directly contributing software development time or possibly money to fund software development time on the part of others. That's one of the challenges, I think, over the next couple of years: moving from this tight-knit team that's developed this initial product to something that can be the basis for more community-based development over time. We have a number of institutions that are currently implementing Avalon beyond IU and Northwestern, and you'll hear about one of those here a bit later. We continue to work on new features for new versions to come in the future; we have a product roadmap that is linked to from our website if you're interested in more details. I mentioned needing to deal with contributions from the community and being able to integrate with other tools, for example being able to use
HydraDAM as a preservation solution and then use Avalon for public delivery is something we're really interested in, and getting a model in place to make all of this work going forward and sustain the project is a key focus. We're also talking with potential partners about offering a hosted version of Avalon, so that if an institution doesn't have the resources, or doesn't want to spin this up locally, there could be a cloud option; that's still in very early stages. So that's Avalon. Really briefly, I'm going to talk about use at IU. We've been in a pilot phase for about a year now, and you can visit our site and see what's there. We've been working with various use cases involving the library's film archive, both for IU-owned materials, which, since IU owns the rights, can be made publicly available, and for serving access to individual researchers who request specific items. We've been working with video course reserves, with some video recordings of conferences, and with some other use cases, so this is not just for archival collections, though archival collections are a key set of use cases for us. Avalon is also going to be an access component of a broader media digitization and preservation initiative that's generating quite a bit of audio and video data that we want to make as accessible as we can, given rights issues, and so there are going to be a number of things to address there, especially in terms of integration with our local infrastructure and workflows, that we'll be digging into in the near future. Avalon is focused on access; it's not focused on preservation. It complements a preservation solution, and so for our preservation we're really interested in the HydraDAM work of WGBH and how we can tie that in with some storage resources we have at IU to sit with Avalon and form a complete preservation and access set of options for us. There's more on Avalon at our website, and I think I'll stop there and turn things over to Stefan to talk a
bit more about a specific implementation at Northwestern. Hi, everybody. My name is Stefan. I work at Northwestern University Library in the Digital Collections department, and my title is Moving Image and Sound Preservation Specialist. Within the department I serve that function, but I'm also on the team helping to develop the Avalon Media System with Indiana University in that Hydra framework. While I'm not specifically writing code, I am contributing in areas having to do with metadata requirements, file encoding, playback experience, and workflow design: the various pieces that help inform the development of the product so that it's tailored to the practical context of users like archivists and librarians who are working with digital media collections that they want to make accessible. So I'm going to talk a little bit about Northwestern's situation with its digital repository in the library and the services that we've been building in the Hydra framework, and then show how we've utilized Hydra, in the form of Avalon, to manage our access collections, which is pretty fresh for us. We've been working on the development of this for the past couple of years, but we just launched a production system with dedicated hardware to build collections and to serve them for faculty, researchers, and curators, and we're even opening it up to outside library units that want to manage collections within Avalon, which is a pretty exciting thing: to be able to offer that service to other parts of the university. I work in the Digital Collections department; it's part of the larger technology division of the library that includes enterprise systems, IT, and web technologies. We primarily serve faculty, students, and special library units, and that's our special collections, our university archives, the music library, etc. These are where we have our archival media collections; you're going to see the legacy formats, various film formats, various magnetic media formats. And we've been ramping up
our digitization program to make these things accessible, but we didn't have a very robust system for providing access to those collections, so Avalon was the direction we wanted to take. We've been streaming media since 2001, and we currently have well above 50,000 media assets. Our streaming infrastructure, the one that preceded Avalon, is pretty long in the tooth; it hasn't changed in many years. We have two separate servers serving two different types of formats for different bandwidth scenarios, so there's no one central place to do that. We rely on external players that add another level of support issues without flexibility, and there was no real centralized management of the access media, so we had to rely on disparate tracking methods that become more difficult to use as we exponentially grow our digital collections. You can get an idea of what we have in our repository if you go to digital.library.northwestern.edu, and you can see how we design our access platforms to serve our users in specific ways. This is an interesting collection where we created three models of masks for users to interact with. As for our digital repository: since 2006 we've been developing services and collections in a Fedora Commons digital object repository. We've expanded it to include services for audiovisual assets, and we've begun to build collections within it for access. This has really shown us the necessity of having the system, because as we start to digitize things at the collection level, where we're dealing with hundreds and even thousands of items, we needed a more streamlined process to describe those things, ingest them into a place where they can be managed, and then disseminate them to users. Conceptually, the functional components of our access platforms interact to form an infrastructure where you have the core repository, which interacts with our Isilon storage, and then we have
these various Hydra heads: Avalon for media, and DIL, our digital image library, for images. Those also have to be hooked up to their own streaming servers and image servers. When these components get put together, things start to get complex, and even looking at this top-level view you can sometimes overlook the fact that there are many different technologies, within this open source development ideal we're trying to create, that are combined and put together to make the repository. We joined as a partner in Hydra in 2011, and we've been developing services through the Hydra framework. One of them is our digital image library, and we've been involved in cross-institutional development of a shared institutional repository, but the most important one to the people in this room is our Avalon Media System, which is where our access files and end-user-facing metadata are prepared and disseminated. Avalon has always been intended to replace our current streaming infrastructure, and, like John said, it's not a preservation system; it sits on top of the repository, which has its own preservation services, and it provides access to the collections. Essentially, we really needed to replace existing workflows with ones that are more streamlined, and we needed an easy way to link our audiovisual resources with courses and to lock access down per item for different scenarios. We serve many different types of users: students, faculty, researchers, curators, and the list goes on. Each has a specific context in which you have to deliver the video or the audio. Just to give you an idea of our media collections: we have a fairly large circulating video collection with all the familiar formats you might see in your own collections for circulation; the music library also has our circulating audio collection; but our special libraries division is where the bulk of our archival media collections are. And to give you an idea of one collection
that we've been working with, which became a pilot collection for Avalon and which, now that we've gone into production, we're starting to build up more: our Wildcats football film collection. We have over 2,000 reels; the digitization alone poses challenges and considerations, and we're fortunate enough to work with people who can help us do that. Since we've begun digitizing the collection, we've found that our current access methods just don't meet its needs. We'll go from this reel on a shelf, we'll do our due diligence to preserve the actual physical item, and we'll digitize it and have an accessible media file stored in our repository, but the method for access was a little unclear, because we wanted to provide a lot of things. Currently our university archives has a YouTube channel for the football films; it's a public collection, so they'll upload their videos to YouTube, but that's as far as the management can go. We need an access service on top of the repository. We need it to integrate with our identity management system. We needed granular access control. We needed rich metadata capabilities. We needed multi-platform streaming: not only different browsers but different devices and different bandwidth scenarios. A student at, let's say, the dentist's office using public Wi-Fi is going to have a much different playback experience than somebody on our campus on our blazing fast network. We needed to integrate with our learning management system; we're currently using Blackboard, and we're in a slow transition to Canvas. This has been an interesting connection to make in developing the system: being able to get course membership information and provide resources specifically to students within the course in a time-limited fashion, so that they'll be available for the fall semester and then we can take them away when the class is over. We also really wanted to have support for non-library units to
self-manage content. We have been developing more policies in our repository for taking in collections, and we've become more liberal in allowing people to give content to us so that we can start to manage it. However, if we have to do the work to describe those collections and make them accessible to their needs, we start to lose sight of what we need to do in the library. So we needed a system where we can give people access so that they can self-manage their content, do what they need to do with it, and also be somewhat siloed, so those collections don't get mixed up with our library collections. We've been able to achieve that with Avalon. John gave you the view of Avalon as a product; I'm going to go through some of the similar layouts you'd see, but in a branded Northwestern context, and speak a little about how we're using it in our workflow and how it's working in our context. Two minutes. So we have the user interface, we have the faceted browsing, and we have two collections: the Robert Marcellus Master Class Audio Collection and University Football Films, which you can look at on our public Avalon site. We've been building our football collection with it, centralizing it in one managed environment so that items are browsable and editable. We have assigned our staff different roles in the collection so they can manipulate it in different ways: we have administrator, manager, editor, and depositor roles. This is an example of how we've customized the system for our needs, because these permissions are a little different from core. We have digitization requests; they get assigned to different production staff, who perform the digitization; we can put the result in Avalon; we can create status updates so our faculty, or whoever is making the request, can see where it is in the queue; and then we can make it accessible. There are different methods, as John described, for uploading files. However, when we're building a collection, for example our
football films collection, we sometimes have to upload hundreds of files at a time, so we build them in a batch format where you create a package of information, with assets and a manifest file. We've also opened up Avalon to interact with other people in the library, so bibliographic services can actually contribute metadata, we can structure different files, and we can assign special access and make items visible specifically to courses, locked down for courses, and then have our video and our metadata available entirely. We are now in the process of migrating and learning about our scalability. Right now, new content that's generated is being served in Avalon; we have old content that needs to be transcoded, metadata that needs to be created, structural organization that needs to be put in place, etc. So we are investigating scalability and addressing the migration needs, and we're pretty much off to a good start. Thanks. One thing that hasn't been mentioned yet is that one thing that comes with Hydra is cool t-shirts; I've got one on today. In the interest of time I won't go on, but we have stickers too; you just have to show up at one of the meetings. That's the little secret. Okay. So hi, I'm Hannah from Stanford, and I think my talk will echo some of the points Stefan just made about Avalon at Northwestern. We have not adopted Avalon yet, so what I'm going to talk about today is the situation at Stanford and the conditions that got us to the point of deciding to adopt Avalon. As Karen mentioned in the introduction, we have a media preservation program at Stanford that has a number of objectives, and I imagine that any media preservation you all are doing has very similar objectives. I won't go through them all here, but we have been at this for coming on a decade. This is my colleague Michael Angeletti; we've digitized over 13,000 items in our audio and video labs. Going back, I will highlight a couple of these objectives. I'm trying to hurry up because I want to
make time for questions. One thing in particular we're really worried about and thinking about is teaching and research: supporting the use of media in teaching and research. Another key objective of ours is developing expertise, best practices, and community. Michael has done a lot of work on the preservation side; we've developed a lot of expertise, and we're sharing that expertise through things like our VTR refurbishment projects, the one-inch and the half-inch EIAJ, and of course we've been involved in the AV Artifact Atlas project, which has been a wonderful way to build community around what we're doing. So we have made some progress on our objectives, and these are of course ongoing efforts, but we have some real concerns about the things that aren't checked off here. These are areas where we have gaps. Sorry for the tired cliche, but I just kind of like the sound of "gap," because I imagine many of us have similar gaps in terms of providing access, supporting the use of media in teaching and research at our institutions, and integrating that with our other infrastructure and services. You might say: why has there been so little progress, when we've done so well on the preservation side and in terms of delivering images and books through our digital library? I think part of the issue is that media technology feels comparatively complex, compared to just serving up digital images or books or scanned manuscripts. It's really been a volatile market in terms of the technology; we've been watching these companies and these solutions come and go, and of course the delivery formats are always evolving, so it's really hard to pin down a solution. And of course so little of our content can be shared due to the rights situation, which makes it even more complex. I think even within the very group where I'm situated there's this perception that media is somehow less relevant as scholarly material, and you know
what I'm talking about: that it's just not as important to get out there as books and manuscripts. But I know that's not true, and I thought: how are we going to bring attention to this situation? How are we going to get resources on this problem? So in the middle of 2013 I decided to launch a strategic planning process that would heighten awareness of the situation. It was a lightweight process, but we really wanted to explore and expose the present state. I pulled together our team, we had a retreat, and we started to talk about the issue, stake it out, and begin the planning process. We decided to engage with our stakeholders, both within our group and around the Stanford campus as well as at peer institutions, with the goal of expressing a shared vision of the desired future state for delivering our media and making it more accessible, and then outlining steps to get there. As part of getting the stakeholders engaged, I convened the Media Access Working Group; here are some of the members. We met frequently over about 14 weeks late last year, leading up to December 15th. We discussed our issues, compiled statistics, did interviews, tested a piece of Avalon, produced a six-page report, and delivered it straight to the library directors. Here are some of the findings from that work. Patron requests for digital copies of our media content are rapidly rising; we've been tracking this since about the middle of 2008, and you can see the graph. We had 69 items requested for digitization last month alone, just a huge sign that there's demand for this stuff. We went to our Google Analytics: Stanford has about two dozen digital collection websites, and the top ones in our analytics serve up media. We have the Buckminster Fuller collection, Lynn Hershman's Women Art Revolution collection, and most recently our Riverwalk Jazz collection; these are our top sites. More key findings. On online access,
there's such a low volume of our collections, our permanent holdings, online from the various repositories on campus. Our Archive of Recorded Sound, one of the biggest archives of its kind in the nation, has less than 1% of its content online; Special Collections, less than 5%. On delivery technology: we do have a streaming service, but it's falling way behind. It doesn't have mobile support; it doesn't support restricting access to campus, so the only thing we can put up is material we can share with the world (I wish we could share everything with the world, but as you know we can't); and it lacks integration with our emerging digital library infrastructure. Of course, the major obstacle is rights. We estimate that, depending on what collection and what repository we're talking about, anywhere from 80% to 100% of the material is rights-protected. We are working on access policies, working with legal counsel, and making really good headway, but it's really clear that we require a controlled access system if we're going to make any of this material available. Some of the recommendations of the working group: we've got to augment our systems and tools to reduce our ingest backlog. One thing I failed to mention is that we've been doing lots of media digitization, but we have a backlog of getting it into our digital repository, and in this day and age, as long as it's not in the digital repository, it might as well not be digitized yet. So we've got to promote these systems, tools, and workflows to get this material in and support discovery and use. We have to wrangle the rights further and get better blanket statements that make it really clear to users, our archivists, everybody in the library, what can get put up and what has to be Stanford-only. And we realize there are a lot of metrics we don't have about how people want to use media, where they want to use it, and the kinds of collections in most demand, so we need to find ways to gather more metrics to inform our future decision making. Last fall,
with some other folks on campus, I convened a community of practice at Stanford, bringing together all the people at Stanford who are using media. This is really ramping up with online classes and that sort of thing: everybody can make video now, and everybody wants to put it up, but we can't all make it shareable; everybody wants archiving, but we don't want to have 12 repositories on campus; we want things in the Stanford Digital Repository. So we convened this community of practice to start understanding the broad needs across campus. They have a lot of the same issues we do, so we're trying to pull together and figure out what makes sense to offer as centralized services, and so on. As for the keys to a solution at Stanford Libraries for problems like this, it's a theme we'll hear all through the open source stream today: the technology has to be open source, but it also has to be open-minded and flexible. There's open source software out there, but it can lack a certain flexibility in terms of tying it in with the other things you might have going on. And open source is great, but it's only as good as the community involved in it; they have to go hand in hand. It has to have a vibrant community, and frankly one that understands digital library needs. We did do some pilots with ShareStream and Kaltura; I'm not going to talk about them today. That was largely driven out of our academic computing group, because it was part of the course management system; I was invited to pilot with them, and I did, and it didn't take me long to figure out they weren't going to work for us, so they go away. Meanwhile, we are deeply embedded in the Hydra community; in fact Stanford was one of the founding partners of the Hydra project, so we're deeply invested in these technologies. But the community is really where it's at. Karen showed a picture from the Hydra Connect meeting in San Diego in January; last week was the Hydra Connect meeting in Cleveland, and there were 147 people there from 40-
some institutions; Michigan chartered a bus and sent 20 people. It was really amazing, and the community has doubled since January. So: Avalon at Stanford. It meets all of our basic needs functionally, technically, and philosophically. We tested it in the Media Access Working Group, and we're completely supported by our management. We're currently hiring a media infrastructure engineer, so if you want to come work with us, the job is open and I'll send you the link. It's funded through a two-year collection project, which I'll come back and talk about at another meeting, and work is going to start on this ASAP. So that's it, thank you. We have a brief time for questions. Are there any questions? Do you guys want to come up? Drag a chair, I guess. Any questions? Yes: I've got a question for John. You said that Hydra isn't necessarily intended to be the main discovery layer. Do you want to expand on that a little bit? Because I kind of feel like it should be. Well, yeah, I can expand on that a bit, maybe by using Indiana as an example. We're trying to use Avalon to serve media collections from, depending on how you count, several dozen to perhaps almost 80 different units that all have different existing practices around description and discovery. Some are library units that have MARC records in the library catalog, exposed via the OPAC; some are archives that have EAD finding aids; some have custom databases, spreadsheets, paper documentation. We're trying to integrate content from a bunch of different existing practices and existing modes of discovery, and we want to continue to support those modes of discovery that are useful to the communities using those collections, while also using Avalon to provide integrated discovery of all of the materials we do have online. So it is serving a discovery role, but maybe more to showcase what we have, and for more casual search than for, say, an in-depth research use case. Does that make
sense? Yes, Jack? I'm really impressed with this project; I'm really happy to see it come to actual application and use. It seems like one of the great things about Hydra head projects is that there are so many applications you can build on them, and the downside is that there are so many applications you can build on them. I'm wondering if you could speak to the kind of development resources, staff-wise and expertise-wise, that you all rely on. That's a good question. Within the Hydra community there are a bunch of different sizes of projects and institutions. For example, the Rock and Roll Hall of Fame was a one-man shop: one guy built the entire system, all the applications for their website and for their delivery online. Stanford and Indiana University obviously have much bigger staffs devoted to their projects and their crews. GBH had one developer; we hired DCE to help us get launched, and we now have two developers on staff, so we're slowly trying to grow some support, though we probably won't go any higher than that, is my guess. So I think it really depends on your institution and what resources you can bring to it. If you adopt a solution another institution is using and build on that, it'll certainly speed up your adoption. What else can I add to that? I just want to add that, again, at this meeting in Cleveland last week, Friday was kind of like the working group sessions, and all of the core committers on the Hydra project got together, along with a bunch of other people, to talk about how there are starting to be some gems and things coming out that are really similar (what's the difference between this one and that one?), and everybody was working hard to regroup and figure out what they were going to start merging. My point is, it's a process, and I think Lauren talked about this earlier: the community comes together. Maybe you're a one-man shop or a two-person shop or a 40-man-
and-woman shop, but when you come together with the community, you recircle and you're so much more powerful, because you start bouncing ideas off each other and you figure out how to make the code better. It really is an amazing thing to watch the power of the community come together on development. So don't be afraid if you don't have a lot of developer resources, because the power of the community comes with it. Yes? I have a comment, sort of a follow-up to the earlier question. These technologies are developed in the context of universities and nonprofits, and I wonder whether you want this also to be taken up outside, or whether you need that sort of infrastructure for support. I was also wondering whether you have considered setting up spin-off organizations or companies to actually manage the further growth of Hydra and Avalon. So far in the community space, one of them, Data Curation Experts, is a for-profit vendor of sorts who is supporting the community and supporting the code to some extent. They run the camps (Mark can speak to that if he's still here; I don't know if he's still here), and you can hire them to help build your code, you can hire them to help build your project, and they'll then feed that code they've just built for you back to the community. But I think we all agree that it would be great to have more companies or vendors like that in the community space, so I think they are welcome; they just haven't stepped forward yet. I would also say that it is based on Ruby on Rails, which is software that's out there in the commercial market, so there are a lot of developers who know that as a language. They might not know the Hydra framework per se, but they know the coding language. What we did on one of our projects was actually hire a local Ruby on Rails developer to come in and specifically help us code different pieces of the project we were working on. You are competing with the commercial market, which is also hiring those developers at
higher costs, but we found that it was really efficient, and it actually focused us too: it made us really think about what specific code we needed them to help us build, and it helped us target it. Any other questions? We are kind of out of time too. Okay, but we are all here, so please feel free to ask, and at the lightning talks this afternoon we will get some demos.
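As a footnote to the course-linked, time-limited access Stefan described (an item is playable only by members of a linked course, and only during a set window such as the fall semester), that rule can be sketched in a few lines of Ruby, the language Avalon is built in. This is a hypothetical illustration under invented names (`AccessPolicy`, `can_stream?`), not Avalon's actual code or API:

```ruby
require 'date'

# A per-item policy: which courses may see the item, and during what window.
AccessPolicy = Struct.new(:course_ids, :available_from, :available_until)

# course_memberships maps a user id to the course ids they are enrolled in,
# as such data might arrive from an LMS like Blackboard or Canvas.
def can_stream?(policy, user_id, course_memberships, today: Date.today)
  in_window = today >= policy.available_from && today <= policy.available_until
  enrolled  = (course_memberships.fetch(user_id, []) & policy.course_ids).any?
  in_window && enrolled
end

# A fall-semester policy for one hypothetical course.
fall    = AccessPolicy.new(["HIST-310"], Date.new(2014, 9, 1), Date.new(2014, 12, 19))
members = { "stu1" => ["HIST-310"], "stu2" => ["BIO-101"] }

can_stream?(fall, "stu1", members, today: Date.new(2014, 10, 1)) # => true
can_stream?(fall, "stu2", members, today: Date.new(2014, 10, 1)) # => false (not enrolled)
can_stream?(fall, "stu1", members, today: Date.new(2015, 1, 5))  # => false (semester over)
```

Once the window closes, access is revoked automatically with no manual step, which is the "take it away when the class is over" behavior described above.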