Good morning, everybody, and welcome to the session Innovative Uses of Islandora. My name is Mark Jordan. I'm from Simon Fraser University. I'll introduce my colleagues briefly, then move into an introduction, and then hand it over to the first speaker. This is Christian Allen from UCLA and Evelyn McClellan from Artefactual Systems.

Islandora is a general repository platform built on commonly known and well-understood tools, namely Drupal, Fedora Commons, and Solr for searching. It's currently using Fedora 3 and Drupal 7. That's changing as we speak: there's a movement right now to migrate to Fedora 4. You may have heard a lot of buzz about Fedora 4. It has a very vibrant user community and a fairly vibrant commercial support community. There are at least three commercial service providers. Okay, I'll try. Actually, can someone close the door at the back, please? That helps a lot. Thanks. As far as we know within the Islandora community, it's installed in production environments at about a hundred sites around the world, mainly in North America, quite a few in Europe, and a sprinkling in Eurasia. I think I'll turn it over to Christian now and let him take the microphone.

Sure. Thank you. So, as Mark mentioned, my name is Christian Allen. I'm a software developer at UCLA, and we use Islandora in a variety of different projects.
So today I'm going to talk a little bit about a few things: briefly covering what Mark already said about Islandora in general, to get everybody on the same page if you're not familiar with it or you're looking at using it; how particular code packages are used in Islandora; and how others in the community can contribute to the open source software so that others can use particular projects. Then I'll focus on a use case at UCLA and a particular module that we contributed back to the Islandora community along with DGI, that's Discovery Garden, a vendor.

As Mark mentioned, this is just a high-level view of what Islandora is. It uses three main components, and if you're already familiar with the software world and repositories in general, these are all really common. Drupal, which is a popular CMS framework, is used to view objects on the web. Solr is used for the browse and discovery layer, so it's used to search the contents of the repository. And Fedora is used for long-term storage and archiving of the objects.

We're going to focus particularly on Drupal. In Drupal there's the concept of modules, which are just reusable packages of code that apply to a certain use case. Islandora has adopted this concept and calls the chunks of code that it uses solution packs. That's just a fancy way of saying it's a way to package code so others may use it in a different environment. So someone at XYZ organization develops something, and someone at ABC organization wants to use that.
There's just a standard way of structuring the code so that others can use it easily, and so Islandora has the concept of solution packs. If you have a digital library, most likely you're doing the same things over and over again. If you want to display audio, mostly you're using the same player as someone else, or if you're using video, you're structuring the data a certain way. Out of the box, Islandora comes with a variety of different components to get you started.

One of the areas that we were interested in was a manuscript collection that we wanted to expose on the web, and there wasn't a turnkey solution. If you're familiar with Islandora, there is a variety of similar solution packs. For example, there's the book solution pack and there's a newspaper solution pack, which were very close to what we were looking to use in a particular project, but they weren't quite there. There were a couple of features that we needed, so it was just a matter of adapting these, creating a solution pack, and then using it in the Islandora framework.

So in our particular use case we had a need for the manuscript solution pack. There were a couple of different features that the book solution pack and the newspaper solution pack didn't have: in particular, TEI association and a custom XSLT display, and also the option to view and compare multiple images of a work. The Islandora community had already developed the digital humanities solution pack, which we thought was a little too heavyweight for our needs. We needed to scale it down and just wanted to start with something a little simpler. It should be noted that there are others in the Islandora community, particularly Hamilton College, that did a lot of work with TEI and papers with their Civil War project, but it was done in a previous version of Islandora. We're using Islandora 7, so we were able to
utilize some of the concepts, but we weren't able to port the code directly.

The first feature that we were really interested in was building on the Internet Archive viewer and the book solution pack. Out of the box, the book solution pack of Islandora uses the Internet Archive viewer, which everyone knows and is familiar with, and it's a really robust viewer. We needed the option to actually have a TEI text compared with the page that the user was looking at. But if TEI didn't exist for a project, we didn't want to force a user or a project to have TEI in order to use this. So you have the option of choosing between two different viewers right now: the Internet Archive viewer, if you just want to get started right out of the gate and you don't have any TEI or additional components; or, if you do have TEI and some XSLT for your project, a custom viewer that UCLA has that allows you to put these two pieces side by side and interact with the TEI, so when you choose a page, the actual image changes to where you are in reference to the TEI.

Along with that, previously in Islandora there wasn't a way to associate a TEI file with a manuscript or book collection, and now one is able to do that. The advantage is that you can now take advantage of TEI-related functions that were not previously available. You're also going to want to display this TEI in a certain way, and that's going to be different for every project. The way that's done is with XSLT. An 18th-century manuscript is going to look completely different from something from the present day, and you'll want to display these differently, and that's done via XSLT. So you can use the same code, the same solution pack, and someone on your team can just develop one new file, an XSLT, and you can have a completely different viewer
that's more appropriate for your project.

The particular case study that we're using this for first is the David Livingstone project. David Livingstone was a Scottish explorer, and he kept extensive writings of his travels in Africa. We have hundreds of documents and manuscripts from this period, and these have all been marked up in TEI and transcribed by scholars, and we have implemented this in Islandora. So this is a complete Islandora site utilizing the manuscript viewer that we just reviewed. It's directed by Professor Wisnicki at the University of Nebraska-Lincoln. UCLA is the digital host and publisher, and we're working closely with resources at UNL to bring this project to fruition. It's in beta now, and it's going live at the end of May.

This is just an example, a screen capture of the manuscript viewer showing a particular letter in the Livingstone project. There are also all the standard features of Islandora, such as faceted browse by different authors, topics, and places that the letters were written. What's particularly interesting about this collection is that while Livingstone was in Africa, he had very limited access to paper and would write on newspaper, from left to right, right to left, up and down. After he ran out of ink, he used ink created from the local berries. So using spectral imaging, you can now see the different writings a lot more clearly, where it looks just like a mishmash if you just look at the paper. Those resources are also in there, and you're able to compare these.

The next step for this project is that we'd like to increase the TEI and viewer interaction. Right now it's fairly rudimentary.
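The page-to-image link between a TEI transcription and the viewer can be sketched roughly as follows. This is only an illustration, not the UCLA viewer's code: it assumes each page starts at a TEI `<pb/>` element whose `facs` attribute names the page image, and the sample document, file names, and function names are all hypothetical. Text in element tails is ignored to keep the sketch short.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical, heavily simplified TEI transcription of a two-page letter.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <pb n="1" facs="letter_0001.jp2"/>
    <p>First page of the letter.</p>
    <pb n="2" facs="letter_0002.jp2"/>
    <p>Second page of the letter.</p>
  </body></text>
</TEI>"""


def pages_from_tei(tei_xml):
    """Return one dict per <pb/>: page number, image file, and the text that follows it."""
    root = ET.fromstring(tei_xml)
    body = root.find(f".//{{{TEI_NS}}}body")
    pages = []
    for elem in body.iter():
        if elem.tag == f"{{{TEI_NS}}}pb":
            # A page break starts a new page and names its image.
            pages.append({"n": elem.get("n"), "image": elem.get("facs"), "text": []})
        elif pages and elem.text and elem.text.strip():
            # Any other element's text belongs to the most recent page.
            pages[-1]["text"].append(elem.text.strip())
    return [dict(page, text=" ".join(page["text"])) for page in pages]


def image_for_page(pages, n):
    """Look up the image a viewer should show when the reader picks page n."""
    return next(page["image"] for page in pages if page["n"] == n)
```

A viewer built this way would call `image_for_page(pages, "2")` when the reader moves to the second page of the transcription and swap in the returned image.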
You can change a page and it will flip to the appropriate image that you're looking for, but we'd like to have it linkable and, using OCR, be able to dig into the material a lot more robustly. We do need a generic batch ingest process. Right now the ingest is a one-off for our organization, but we'd like to have a generic process, so you can just dump things in a folder and it will ingest them into Islandora and generate the proper derivatives. We'd also like additional navigation widgets, annotation options, and additional integration with other viewers out there, such as Mirador and some of the other interesting projects. And we have a public release in May.

We're running out of time, but I'll just cover the additional features of the manuscript solution pack. So far I've focused on its use with manuscripts, but it's really meant for archival materials in general. If you have an EAD finding aid, you can ingest it using this solution pack and it will automatically generate something akin to a site map, a large linked page, that you can also customize via XSLT. The idea is that if you had a lot of content that you wanted to churn out quickly and didn't have metadata at the item level, this is a solution that you could explore to get a lot of content up quickly. A standard view is provided, so depending on your finding aid, whether it's folder, box, item, or however it's structured, there's a default way to look at it, and then you can tweak that using XSLT for your organization.

These are just the people and contributors. The solution pack is available on GitHub right now and is downloadable and usable. UCLA has a fork that we're using for the David Livingstone project, but anything generic we plan to contribute back: submit pull requests and submit that back to the Discovery
Garden and Islandora repo for use. And so that's about it. Thank you.

So I'm Evelyn McClellan, president of Artefactual Systems. We develop open source software for libraries and archives. The two tools that we develop are Archivematica, which is a back-end digital preservation system, and AtoM, which is a web-based access system aimed mainly at archives that supports archival hierarchical description and display.

I wanted to talk today a little bit about a module that integrates Islandora and Archivematica. This is work that was funded by the University of Saskatchewan. They had both Islandora and Archivematica, but the two systems didn't speak to each other. They had Islandora for digital object deposit, management of the repository, display, and that kind of thing, and they were also running digital holdings through Archivematica to create archival information packages for long-term preservation, but they wanted materials that were ingested into Islandora to be preserved in Archivematica.

So, you know what Islandora is. Basically, it's a digital object repository. It has a set of interfaces for doing metadata entry, it does digital object display in various ways, and it's a collection management system and a storage system. I mainly put that up there because it contrasts with the next slide, which talks a little bit more about Archivematica, and you can see that the two systems have very different functionality.

With Archivematica the focus is digital preservation. It ingests bodies of digital objects of different types: digitized holdings, born-digital holdings, audio, video files, office files, forensic disks, a number of different types of holdings. It performs a bunch of preservation microservices, and it also has functionality to normalize to preservation formats. It generates PREMIS in METS XML files, and that's probably one of the key functionalities of Archivematica: it has this very robust capability to generate
preservation and technical metadata, and to generate PREMIS metadata for events, agents, and rights. It has a very rich and detailed PREMIS implementation, and it packages that all up in a METS file. So Archivematica produces archival information packages, which are fairly generic; they're transparent and self-documenting, and they're meant to be very system independent. Then Archivematica places the archival information packages into long-term storage, and it's fairly neutral about what kind of long-term storage that is. Archivematica isn't a storage management system. It simply creates these archival information packages to put into archival storage.

So, Archidora. The University of Saskatchewan contracted with Artefactual Systems and with Discovery Garden to create the functionality in both systems to speak to each other, and Discovery Garden released the Archidora plugin a little bit earlier this year. The functionality of the plugin is complete, but Archivematica is still in beta for this. The functionality will be released and complete with the release of Archivematica 1.4, which we're hoping to do later this month, so it's very close to being ready.

This is a basic look at the workflow. Archivematica actually integrates with other access and deposit systems, and there are a couple of different ways that can happen. You can ingest content into Archivematica, generate dissemination information packages, and then upload those dissemination information packages to access systems. In this case the workflow is the other way: the holdings go into Islandora, and then Archivematica works on the back end to preserve the content that is coming in from Islandora. So Islandora continues to act as the access system, and Archivematica is creating the archival information packages in the background. That's the basic overview: files are added to Islandora, Archivematica retrieves them from Fedora and then sends back a notification to Islandora saying that the archival information packages have been generated and everything's okay.

That's the more detailed view, and I'm not going to go over it in a lot of detail; that's actually one of the design documents. Basically the workflow is that the user, or multiple users, are depositing content into Islandora, and Fedora content validation triggers a call to Archivematica. So you can have multiple users working on multiple collections, and ingest is happening automatically into Archivematica per collection. What you can do is configure the Archidora plugin to say what kind of archival information package you want: how large do you want these packages to be, for example. Whenever this content validation occurs, Islandora sends a message to Archivematica saying that this content validation has occurred, and then Archivematica goes and gets the Fedora METS file. It looks at the METS file, it parses the information, and it goes back to Islandora and fetches the digital object datastream and the MODS file and ingests them into Archivematica. This is an ongoing process until a certain limit is reached, which is specified by the user.
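The gather-until-a-limit behaviour described here can be sketched in a few lines. This is a toy model only, not Archidora or Archivematica code; the class name, field names, and object identifiers are hypothetical, and the size limit stands in for whatever the user has configured for the collection.

```python
from dataclasses import dataclass, field


@dataclass
class PendingTransfer:
    """Objects gathered for one collection, waiting to become a transfer."""
    collection: str
    limit_bytes: int
    objects: list = field(default_factory=list)
    total_bytes: int = 0

    def add(self, pid, size_bytes):
        """Record one object; return the finished batch once the limit is reached, else None."""
        self.objects.append(pid)
        self.total_bytes += size_bytes
        if self.total_bytes >= self.limit_bytes:
            # Limit reached: hand back the gathered objects and start over.
            batch = list(self.objects)
            self.objects.clear()
            self.total_bytes = 0
            return batch
        return None
```

Each notification about a newly validated object would call `add`; most calls return `None`, and the call that crosses the limit returns the list of objects that would go into the next transfer.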
So let's say the user specifies that when there are 20 gigabytes of content in a certain collection, that triggers the creation of a transfer to Archivematica. Archivematica stops gathering and actually creates a transfer that's ready for processing in Archivematica. At that point Archivematica verifies the checksums that are coming in from the Fedora METS file, and if everything is okay, then the transfer comes up in the Archivematica dashboard and the user can go in and approve the start of the transfer. At this point all the digital preservation microservices kick in. The content gets processed, the archival information package is created and placed in storage, and then Archivematica sends the completion status to Islandora and says, okay, the archival information package has been created and the content is safely stored.

I'm just going to show you a manual process for this, because it helps to illustrate exactly what's happening here. So you have an image collection in Islandora, for example, and there's a detail.
What I did is I went into our Islandora Archivematica test instance and manually had Archivematica ingest a single digital object. If you were to click on the Manage tab on that screen, you would see there's actually an Archivematica link there, along with all of the other things that you can do to manage this object. You click on that and you tell Archivematica to ingest this file, and you get a message saying this has been successfully submitted to Archivematica.

Once you've done that and you have all your content, Archivematica does its thing. I don't know how familiar you are with Archivematica, but that's basically one of the tabs, which shows you all the results and outputs of the various preservation microservices that are happening. In this case you can see that there's a check mark next to the Houses in Strathcona there; that's saying, okay, everything's been done, we've run through all the microservices, everything is successful. If it weren't successful, you would see red and you would see an error message, and you would be emailed an error message and that kind of thing. And it has successfully been placed into our archival storage, which is good. This is what Canadians look like when we're pleased.

At this point, and this is just for demonstration purposes, you can retrieve the digital object from archival storage using the Archivematica Archival storage tab, and you can see there that this particular archival information package consists of just this one picture. And you can see the reference to the datastream.
You can see the reference to the archival information package. The Archivematica archival information package is a very simple folder structure that is bagged up in the Library of Congress BagIt format, if you want it to be. Actually, you don't have to, but it is bagged and compressed in most cases. So there you can see a very typical BagIt structure, if you're familiar with it. Within the data folder we have the Archivematica METS file, which is fairly huge. Actually, it's amazing, even for a single digital object: if you were to open the METS file, it would be very large. It would have a bunch of extracted technical metadata, it would have all the PREMIS events, everything that had happened to the object in Archivematica, it would have agents, and it might have other metadata that you added during processing. The Islandora METS file, the Fedora METS file, I should say, has also been ingested and packaged into the archival information package, and you can see there's a folder for the MODS file as well. So the MODS file that was in Islandora has been ingested, and some of the contents of the MODS file have been parsed into the Archivematica METS file. And you can search for this particular object using the MODS identifier; you can search for that in Archivematica, for example.

So Archivematica has sent a status update to Islandora, and if you go back into Islandora, here I've just gone into the collection as a whole, and it's showing me all the objects that have been ingested into Archivematica and successfully stored as archival information packages. At this point you can choose to go ahead and delete the original content that was deposited into Islandora. Because Islandora creates derivatives for display, some institutions don't want to keep the potentially very large master digital objects in Islandora.
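For readers who haven't seen a bag before, the BagIt layout mentioned a moment ago (a `data` folder plus a checksum manifest) can be sketched like this. It is a minimal illustration of the general BagIt idea, not Archivematica's packaging code; the function names and payload file names are hypothetical, and a real Archivematica package contains considerably more than this.

```python
import hashlib
from pathlib import Path


def make_minimal_bag(bag_dir, payload):
    """Write payload files (name -> bytes) into a minimal BagIt-style bag."""
    bag = Path(bag_dir)
    data = bag / "data"
    data.mkdir(parents=True, exist_ok=True)
    manifest_lines = []
    for name, content in sorted(payload.items()):
        (data / name).write_bytes(content)
        digest = hashlib.sha256(content).hexdigest()
        # One manifest line per payload file: checksum, then path relative to the bag.
        manifest_lines.append(f"{digest}  data/{name}")
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )
    (bag / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )
    return bag


def verify_bag(bag_dir):
    """Return True if every payload file still matches its manifest checksum."""
    bag = Path(bag_dir)
    manifest = (bag / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        digest, rel_path = line.split(maxsplit=1)
        if hashlib.sha256((bag / rel_path).read_bytes()).hexdigest() != digest:
            return False
    return True
```

Because the manifest travels with the files, anyone holding the package can re-run the checksum comparison later without the system that created it, which is part of what makes such a package self-documenting.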
Those institutions just want to keep derivatives, and they want to store the masters in Archivematica. So at this point you have the ability to go through and delete the original object datastreams and just keep the derivatives. Actually, at the beginning we had thought, well, what if we just want that done automatically, so that as soon as Archivematica sends the message everything gets deleted? But the University of Saskatchewan got very nervous about that possibility and decided that we wanted to have an approval step, which is pretty understandable.

As I mentioned, what I showed you there was a manual process, but this can happen very automatically. You can have Archivematica running in the background, you can be depositing a lot of content into Islandora, and these ingest processes can be triggered automatically.

Anyway, it's all up on GitHub. The module itself was developed by Discovery Garden. As I mentioned, it requires Archivematica 1.4, which hasn't been released yet but which will be released later this month. There's a lot more we could do. We could parse more of the METS content. We could ingest the PREMIS that's coming out of the Fedora METS file and actually parse it into the Archivematica METS file. One of the other things that we could do: I mentioned that the workflow is basically that you ingest into Islandora and the content ends up in Archivematica. We would love to be able to do it the other way, so that if you have a bunch of holdings in Archivematica, you could create dissemination information packages and push them to Islandora. We always want the workflow to work both ways, so it doesn't tie the institution to always doing everything one way in a particular linear order. And of course you may have Archivematica for years before you get Islandora, and then you want to push all your content to Islandora.
So that's development that awaits funding. Basically, watch this space is what I'm saying about that part of it. And that's it.

So I'm going to talk about my library's migration to Islandora. You may be asking, what's so innovative about a migration? We all do them. They're fairly common, if not routine. There's not a lot that's innovative about our migration, except that we are being forced to be innovative. We have a lot of things: we have 1.3 million objects in our current general repository system. It's a CONTENTdm repository, and the size of that migration is forcing us to be innovative in certain ways. I'll talk about that in a minute. We are also developing a lot of tools in preparation for, during, and after the migration. As Christian has already alluded to, Islandora is fairly modular, following the pattern set by Drupal. It's got a very open back end: Fedora Commons is completely open, with APIs that you can do interesting things with. So this is allowing us to create a lot of new modules, solution packs, and miscellaneous tools for Islandora to help meet our needs, again prior to, during, and after the migration. We're going to talk about a few of those things in detail.

I'll be talking about why we're migrating; what I'm calling our migration use cases, some things that are special to us, although they could be the case with other sites as well; our metadata mapping and conversion process and tools; and then two or three of the other tools that we're developing and will be releasing to the community for their use, if they can find a use for them.

So, why we're migrating. We've been using CONTENTdm for a number of years, and it's served us fairly well, but it's not very flexible at all. We need a much more flexible repository platform to do things like start automating workflows, so to move content between tools and repository platforms. We can't do that with CONTENTdm at the moment.
It's impossible to automate. My library tends to favor open source solutions, particularly when they're as good as or better than the commercial alternatives. And we already use Islandora in another repository that's fairly new. It's called Radar. It's our research data repository, and Brian Owen talked about it in this room yesterday afternoon. Our institutional repository is a straight-up Drupal site. It's not Islandora, it's just Drupal, but we have over 10,000 objects in that repository, mainly theses but also a lot of e-prints and preprints. And we are a Drupal shop. We run our two main websites on Drupal, and we have probably half a dozen other Drupal-based websites of various kinds. So we want to standardize on that platform, and Islandora is giving us an opportunity to do that.

These are the migration use cases I mentioned a minute ago. One of the most important and challenging is the size of our migration: 1.3 million things in CONTENTdm, 900,000 of which are newspaper pages. So we have a pretty big collection of newspapers, and that's significant for a migration to Islandora, because when you ingest newspapers into Islandora it takes a long time. It runs each TIFF image, each page image, through an OCR process and converts it from TIFF to JPEG 2000, and that just takes quite a while. It would for any system, but it does take quite a while in Islandora.

I'll explain this in a few minutes in more detail, but CONTENTdm provides a web API, an HTTP API: you can query it via URL, and you can get information about objects in CONTENTdm via URL. We took advantage of this a few years ago by building a Drupal module that is a front end for a set of collections in CONTENTdm within a Drupal website. Because of that, two of the other Drupal websites I mentioned a minute ago are Multicultural Canada and the Komagata Maru Journey. So we have two boutique websites that consume content that's in CONTENTdm in a Drupal website,
and we don't have the time or resources to convert those to Islandora now. So we need a way, and again I'll go over this in detail in a minute, to swap out the CONTENTdm API and make the sites think that they're still talking to CONTENTdm, in essence.

We really want to preserve the old URLs in CONTENTdm so that we don't have to go back and update them, not only in our own internal systems, like our integrated library system, but for anyone who's linked to a URL in CONTENTdm. We want them to be automatically rerouted to the new version of that object in Islandora. And there are a number of collections we have that are of a content type, so to speak, that is not currently supported in Islandora, so we need to address that particular need as well.

Central to this migration is converting all the metadata that we have to describe these 1.3 million things into MODS, which is the out-of-the-box metadata schema used by Islandora. So we need to map from the CONTENTdm metadata to MODS, and that's a fairly routine process. We are developing a tool to help us do that in a flexible, modular sort of way. But one challenge we have is that we have some collections that can't map to MODS.
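The URL-preservation requirement mentioned a moment ago could be met with a lookup built during migration. The sketch below is only a guess at the shape of such a redirect helper, not the actual SFU tooling: the CONTENTdm path pattern, the lookup table, the collection alias, and the PID are all assumptions made for illustration and would need to match the real site.

```python
import re

# Assumed shape of an old CONTENTdm reference URL path; adjust to the real pattern.
CDM_PATH = re.compile(r"^/cdm/ref/collection/(?P<alias>[^/]+)/id/(?P<pointer>\d+)/?$")

# Hypothetical lookup recorded during migration:
# (collection alias, CONTENTdm pointer) -> Islandora PID.
PID_LOOKUP = {("news", "1234"): "news:57"}


def redirect_target(old_path, lookup=PID_LOOKUP):
    """Return the new Islandora path for an old CONTENTdm path, or None if unknown."""
    match = CDM_PATH.match(old_path)
    if not match:
        return None
    pid = lookup.get((match["alias"], match["pointer"]))
    return f"/islandora/object/{pid}" if pid else None
```

A web server or a small Drupal module could call something like this for every request to an old path and answer with a permanent redirect whenever a target is found.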
Those collections don't describe or contain book-like things. They're basically biographical databases. So we have a solution pack that we've developed to address that, and I'll describe it in a few minutes. During the migration, we're going to take the opportunity, like many people do, to clean up some of the metadata we have. We haven't been terribly consistent over the years in creating our metadata, for various reasons, some of them legitimate, some of them not so legitimate, and things like date formatting are particularly easy to clean up during a migration.

We have a lot of collections that are owned by faculty, and they were never interested in adhering to the library's standards or even the library world's view of metadata description. They have their own special view of how to describe the items in their collections, and so they just make up metadata elements to best suit their research needs and their community's needs. So we have another challenge, and that is how you map those kinds of metadata structures into a library-specific, domain-specific structure like MODS. One outcome of the migration is to develop a single library-wide metadata profile, with specific profiles for different kinds of collections in our repository.

So, a couple of slides to show how we're doing this. In CONTENTdm, each collection has its own metadata configuration.
It allows you to configure the metadata the way you want, largely based on the DC Terms metadata element set, a Dublin Core metadata set. We're not using the DC Terms mappings in this migration because we feel that, especially in some of the cases I just described with the specialized metadata from faculty, first of all, there's no mapping, and second of all, we can lose information if we map to Dublin Core and then map up to a more granular schema, MODS. So we're skipping that. What we're doing is taking all the fields in a particular collection, as you can see illustrated on the left, and, as part of the tool, which I'll describe in a second, allowing a metadata expert to configure a template that is used to create the MODS record for each item during its migration into Islandora. So they have to learn a little bit of XML markup, but that's all they have to learn to do this. We feel this is going to be an effective way to allow us to migrate, on a one-to-one basis, the elements that are in our CONTENTdm collections into MODS, and to also allow the metadata experts to establish sensible values for some of the elements that go into the MODS in these templates.

So, moving on to the tool development. The first one that I'll describe in detail is this Move to Islandora Kit, as we're calling it, MIK, and I'll describe that next. I'll describe a couple of the other tools that we have developed or are in the process of developing as well. And all of this gets back to the community, which I think all three of us have mentioned, and I mentioned at the beginning as well. We are writing these tools from the ground up as open source, as GPLv3, which is the same license that the rest of Islandora uses, knowing that we want to release them to the community. Some of them are pretty specialized and may not find many implementations in the community. We're writing them not only for ourselves; for ourselves first, but
we're writing them for the whole community.

So MIK is a set of command-line tools for generating Islandora import packages, or ingest packages. Islandora currently has three tools for ingesting batches of content. They're all command-line tools that also have a web interface, but I'll be focusing on the command-line aspect here. One tool is called just Batch, the batch ingest module, and it ingests objects that have one file, so a PDF, a video, an image. The second one is called Book Batch, and it ingests book objects. And the third one, which is fairly new, is called Newspaper Batch, and it ingests newspaper issues. And I think, Christian, you alluded to a generic batch ingest process for the manuscript collection solution pack?

Right, so that's something that we need to develop. We have a one-off solution, like you mentioned, for our organization. We'd like to expand that so others will be able to do it a lot more easily.

But we should talk, because I think we need to develop a compound object batch ingest, and they may be similar in some ways.

Okay, awesome.

Yeah, so the way this works is, because there are these robust and well-proven tools for ingesting these kinds of content in Islandora, our tool just takes your stuff and converts it into these packages.
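The conversion step, combined with the per-collection metadata template idea from earlier in the talk, might look something like the sketch below. This is not MIK itself (which, as the talk notes, follows PHP development patterns); it is a small Python illustration in which the template, the CSV column names, and the one-XML-file-per-identifier layout are all assumptions.

```python
import csv
import io
from pathlib import Path
from string import Template
from xml.sax.saxutils import escape

# Hypothetical per-collection template, as a metadata specialist might write it.
# $Title and $Date are filled in from the matching CSV columns.
MODS_TEMPLATE = Template("""<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>$Title</title></titleInfo>
  <originInfo><dateIssued encoding="w3cdtf">$Date</dateIssued></originInfo>
  <typeOfResource>still image</typeOfResource>
</mods>
""")


def write_packages(csv_text, output_dir):
    """Write one <Identifier>.xml MODS file per CSV row; return the paths written."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Escape &, < and > so field values cannot break the XML.
        values = {name: escape(value) for name, value in row.items()}
        path = out / f"{row['Identifier']}.xml"
        path.write_text(MODS_TEMPLATE.substitute(values), encoding="utf-8")
        written.append(path)
    return written
```

In a fuller version the object's file would be copied next to its XML file under the same base name, and the resulting directory would then be handed to one of the existing batch ingest tools.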
The tool doesn't do the ingestion or importing itself. It just takes your stuff and moves it over to Islandora formats for ingestion. So it's on the production side and the organizing side of the content, including creating the metadata XML file, by default MODS, but you can configure it so that it can create any kind of metadata XML file you want. The initial sources that we will be using are CONTENTdm, and CSV files with local file system files. And we're creating it so that it is easily extensible by a programmer. I mean, you need a developer to write some code, but it's built so that they can do that very easily using well-understood patterns of PHP software development.

I won't go into this one in much more detail, because I've already talked about it, but this is the tool that will, again, in essence, let our two Drupal sites that use the CONTENTdm API continue to work after we've stopped using CONTENTdm. We're basically emulating the CONTENTdm API in Islandora. I know you can't see that diagram, but it illustrates how this should work. Again, we will likely migrate those sites to Islandora in the near future, but not by fall 2015, which is our timeline for the large migration.

This is an interesting module. One of the benefits of Islandora is that, because it uses Drupal for its ingest, display, and workflow management, in many cases you can inherit a lot of the really rich Drupal module ecosystem. Because Islandora doesn't use all of Drupal, in a sense, you can't use all the Drupal modules in an Islandora site, but you can use a lot of them. One very interesting set of modules in the Drupal community is called Feeds, and it lets you import various kinds of content into a Drupal site. So what we've done is taken Feeds and its rich tool set and written some extra plugins for it that will make it work with Islandora.
In essence, it will take, say, a CSV file, or whatever else you're importing, and create Islandora objects, just like the standard Feeds module in Drupal creates Drupal nodes or pages or Drupal users. This module will help us with our biographical databases, because currently, in CONTENTdm, they don't contain book-like things, and the metadata does not describe book-like things. It describes people. We can use this as a way of migrating from that sort of flat database structure into Islandora objects effectively. So this is quite a neat merging of the Drupal ecosystem and a specific need for Islandora.

The last one I'll describe is the newspaper batch module, which we didn't write ourselves. We contracted with Discovery Garden to write it for us. As I said, we have more than 900,000 pages of newspapers, so we need an effective way of getting all that stuff into Islandora in a matter of a few months, and this module will help us do that. A side effect of having the service provider write this for us is that it showed us how easy it is to write these import modules. So I think we'll take a stab at the generic compound object importer ourselves, of course collaborating with our colleagues at UCLA before we go too far. But, yeah, this is just another tool for the community that came out of our migration.

Well, thank you very much for attending. It's our pleasure to meet you today.