Kia ora koutou. In line with the NDF kaupapa, I just want to warn you there will be a few pictures of buildings damaged by earthquakes, so if you're anxious about earthquakes, you might not want to look at this. So I'm going to be telling you a bit about the background to Quake Studies, and then I'll tell you why we decided to migrate to Islandora, and then Jonathan will take over and tell you about the technical details of how we did that migration. Before I can tell you about Quake Studies, first of all I have to tell you about CEISMIC. CEISMIC's director Paul Millar often tells the story of how, following the devastating Canterbury earthquake of February 2011, he watched all those geologists and engineers being interviewed by the media. And it was really obvious how their research could contribute to understanding the earthquake and to helping design what would happen next. What wasn't so obvious was how he, a humanities researcher, could contribute to those efforts. The answer came when he looked back to the Napier earthquake of 1931. Now, we know a lot about how the geology of Hawke's Bay was changed by that earthquake. What we don't know is how the lives of the people of Napier were changed. We've got a few diaries and letters. There are a few people still alive today who experienced that earthquake, mostly as children. But most of the stories of Napier have been lost to the past, and Paul didn't want that to happen in Christchurch. And so CEISMIC was born: the CEISMIC Canterbury Earthquakes Digital Archive. It's a comprehensive archive of everything related to the Canterbury earthquakes of 2010 and 2011, and their aftermath. It's a single place where we can understand, remember and learn from those earthquakes. It's a place where we can tell the stories of Christchurch, but not just the story of what happened at 12.51pm on the 22nd of February, although that forms a big part of CEISMIC.
It's also about placing the earthquakes in the larger story of Christchurch, of what happened before and what happened after. It's about looking back to the heritage that we've lost in Christchurch. It's also about how life has changed in weird ways in Christchurch. I always have this thing when I come to Wellington that I get slightly surprised by the fact that you have a CBD. For Christchurch people, having massive empty spaces in the middle of our city has become weirdly normal. That kind of thing is going to be forgotten in a few years' time, so it's important to remember it. And it's important to remember the lessons that we can learn from the successes and, more importantly, the failures of our rebuild and recovery. We want other communities not to repeat our mistakes. So that's CEISMIC. Where does Quake Studies come in? Well, CEISMIC, despite the name, isn't actually an archive itself. What it is, is a front-end to a federated search powered by Digital New Zealand's API, which gathers content from a whole consortium of contributing organisations. It includes material that's crowdsourced, such as from Christchurch City Libraries' Kete earthquake archive, and from the Ministry for Culture and Heritage's QuakeStories site. It also includes curated material from Canterbury Museum and here at Te Papa, and from a wide range of other New Zealand websites that are harvested by Digital New Zealand. Quake Studies is the University of Canterbury's contribution to CEISMIC. It was originally conceived on its launch in February 2012 as being primarily a repository for research data, but it quickly took on another role, which was that of mopping up all the content that no-one else was collecting, because it was outside of their collection remits, or just because of resource constraints.
Quake Studies consciously collected in the gaps and quickly grew to be the biggest contributor to CEISMIC, mainly because we're a dedicated earthquake archive, and for everyone else, while they collect material about the earthquakes, it's just a small part of their bigger collections. So we've currently got about 135,000 publicly available items in Quake Studies, and then another 17,000-odd that we're restricting to research access only because of sensitivities, privacy concerns or other reasons. So that's what Quake Studies is. Why then did we decide to migrate Quake Studies to Islandora? Well, Quake Studies is about six years old now, and that's getting on a bit in digital archive years, so it was due for an upgrade. But although the old Quake Studies was built using Drupal and Fedora Commons, it also includes a whole lot of custom code which was written by CWA New Media, and that custom code made just an upgrade of the underlying Drupal and Fedora really difficult to do. Also, an upgrade wasn't going to solve some of the basic frustrations that we had with the old Quake Studies. The biggest one, and the one most obvious to the public, was the search. Search had just never been very good in Quake Studies. If you searched for something like Cathedral, which is kind of relevant in Christchurch, you were just as likely to get a classified ad from the Press that happened to mention Cathedral Square somewhere in the address as you were to actually get information about the Cathedral. It got to the point that in our team we would use Google to search our own archive, which is not ideal. A problem that was more just an issue for the team was that we didn't have a way of updating our metadata in bulk. We had a really good bulk ingest system where we could bring material and its associated metadata together and import it all into Quake Studies in one go.
But once it was in there, if we wanted to change anything, that was a manual process of going through one item at a time and making changes, which is fine if it's just fixing a spelling mistake you've made somewhere. But sometimes we had big changes to make, like when the Historic Places Trust decided to change their name to Heritage New Zealand and, as part of their rebranding, asked us would we mind going through their collection and making sure their name was correct all the way through it. They had about 3,000 records and photographs in Quake Studies, so that took a while to do. So that was a frustration. The other frustration was our API. There were a couple of minor security problems with it, and that meant we were never comfortable making it publicly available. And that's become a problem recently, because CEISMIC is moving from its initial phase of just heavy collecting into a second phase where we're more interested in encouraging reuse of our content to create new knowledge. And without a public API it's really hard to encourage reuse of your content. So those were the problems with the old Quake Studies. Islandora offered to solve those problems pretty much out of the box. It's still a big job, though. We're effectively building a whole new Quake Studies and then having to migrate all the data from the old Quake Studies into the new Quake Studies. So it wasn't something we took on lightly. But we think in the long run it's going to be worth it, because Islandora has a really good development community, and we can take advantage of that community to help us reduce the risks and to reduce our ongoing support costs. So that's the why of migrating Quake Studies. And now I'm going to hand over to Jonathan, who's going to tell you the how. Thanks, Jennifer. And just at the end there, one of the other advantages that the University of Canterbury is getting in moving to the new platform is responsive design, which wasn't able to be taken into account the first time around.
So Islandora is a platform you may not be familiar with. It kind of lives at the intersection of institutional repository (so, for example, the University of Prince Edward Island publishing all their theses and things like that), research data management, and digital collection management. On the research data side, it's been used at Simon Fraser University, for example, as a repository for research data as it's made; they've kind of made their own Dropbox-style system for researchers to put their content into. And for digital collection management, here's an example from Rene Shalu, who was here at NDF last year. This is New York Public Library, and they've got a display where you can look at a map of the city and pull up stories, testimonies from inhabitants around the city. Another really good example is the May Bragdon diaries. They're really nicely themed: because you've got a full CMS in front of your repository, you can apply all the theming and customisation you want. In this case it's showing the handwritten diaries and the TEI transcripts side by side, and then pulling out things like named entities for places and people and so forth. It's a really gorgeous presentation. So one of the first challenges we had in doing the migration was simply figuring out what was in the system. We did a lot of discovery work, looking at the ontology, the collections, and how they related to objects and parts. Typically an object was a container for parts, and the parts were the files: the PDFs and the images and videos and so forth. There was a fairly complex access control model regarding availability, audience and roles, and then a set of supporting entities: events, people, places, parties, tags. Quite a complex ontology. You can see the whole thing online, but the ones that are painted in red there are actually ones that were removed, or that had their fields moved into other entities. So we simplified the ontology quite dramatically.
We went through a lot of that analysis. One of the quick things we did: Drupal is a content management system, so we could quickly whip up some content types that matched the ontology, create the fields, use the Drupal Migrate framework to pull data out of the existing system into those content types, display them as tables, and pull those CSVs into OpenRefine. And once they were in OpenRefine we could do things like text facets, and that could tell us things like the fact that we had "Saint Albans" and "St Albans" and then "Saint Albans" with a trailing space, that kind of thing. So that was a really quick way to get a feel for the collection, and then Jennifer and her crew could tidy a whole bunch of stuff up before we actually did the migration. Do all download OpenRefine and try it out: as well as faceting, it offers really good search and filter, and it has a lot of clean-up tools as well if you're targeting something else. And then the actual migration. We sat down, looked at our options and decided on a treatment for the various collections. In some cases the existing containers would be skipped, because we could actually collapse the hierarchy somewhat. In some cases objects became collections. We migrated parts to become objects: while the existing Quake Studies has collection, object, part, Islandora just deals with collections and objects, though in some cases those objects might be compound objects. We collapsed the access model right down to just a matrix of roles and permissions. Event and date-time became event; address and position became place; and so on. Islandora can support a whole range of metadata standards. We chose RDF, and primarily Dublin Core with Dublin Core Terms. So, in particular, DC coverage was split down into DC Terms temporal and spatial.
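To give a flavour of what those text facets catch, here's a minimal sketch in Python of the same idea: grouping values that differ only in whitespace or casing so the variants surface together. The suburb values are illustrative examples only, not real Quake Studies metadata.

```python
from collections import defaultdict

def facet(values):
    """Group raw values under a normalised key (trimmed, case-folded)."""
    groups = defaultdict(list)
    for v in values:
        key = " ".join(v.split()).casefold()
        groups[key].append(v)
    return groups

# Illustrative values only, not real Quake Studies metadata.
suburbs = ["Saint Albans", "Saint Albans ", "saint albans", "Richmond"]
for key, variants in facet(suburbs).items():
    if len(set(variants)) > 1:
        print(key, "->", sorted(set(variants)))
```

OpenRefine's actual text facets and clustering do much more than this (key collision, nearest-neighbour matching), but the trimming-and-case-folding pass is the essence of how those "Saint Albans" variants showed up.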
And DC Terms spatial now gets indexed into Solr using Solr's spatial indexing, which means once we've done the migration we can start offering things like maps and proximity searches and so on, which I think is going to make a huge amount of difference for the very geographically focused content of Quake Studies. We used schema.org for a bunch of stuff, especially for the supporting entities like people, places and events, and also for things like URLs and licences. Digital New Zealand have their own data dictionary, and we needed to include a few fields that were specific to Digital New Zealand because they weren't easy to derive from the other fields. Quake Studies itself also had some custom fields that, in the short term, didn't really make sense to map to external ontologies. The beauty of RDF is that you can coin fields, coin your own predicates. So here's a bit of XML to show what the RDF looks like. It should be pretty straightforward: you can see the prefixes for the namespaces like dc, dcterms, dnz etc., and the fields and the actual data in them. I'm going to skip through this really quickly, and I'm happy to answer questions afterwards or during the break if you want to come and talk in detail. So the new ontology is, as I said, a bit simpler. Then the Drupal Migrate framework: Drupal, as a CMS in use by many major media institutions, has a really mature migration framework that we could use. So we went through Drupal into Fedora, and this is the command-line tooling that tells us where the migration is at. The blue ones are the ones going into Fedora; the unhighlighted ones just went into the content model to start with, for the discovery. We start off with licence and place and entities, things without dependencies, gradually build it up, and then eventually in come the collections and parts. We're keeping track of that with a report of all the collections; the green ones have been migrated.
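For anyone reading along without the slide, here's a rough sketch of what such a mixed-namespace record could look like, built with Python's standard library. The dc and dcterms URIs are the standard ones, but the "qs" namespace URI, the predicate names and the field values are illustrative assumptions, not the actual Quake Studies or DNZ vocabulary.

```python
import xml.etree.ElementTree as ET

# dc/dcterms URIs are the standard ones; the "qs" URI and all field
# values below are illustrative assumptions only.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "qs": "https://example.org/quakestudies/terms/",  # hypothetical local namespace
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)

rdf = ET.Element(f"{{{NS['rdf']}}}RDF")
desc = ET.SubElement(rdf, f"{{{NS['rdf']}}}Description")
ET.SubElement(desc, f"{{{NS['dc']}}}title").text = "Damaged building, central Christchurch"
# DC coverage split into the more precise DCMI Terms refinements:
ET.SubElement(desc, f"{{{NS['dcterms']}}}spatial").text = "Cathedral Square, Christchurch"
ET.SubElement(desc, f"{{{NS['dcterms']}}}temporal").text = "2011-02-22"
# A locally coined predicate, as RDF allows:
ET.SubElement(desc, f"{{{NS['qs']}}}accessLevel").text = "public"  # hypothetical field

print(ET.tostring(rdf, encoding="unicode"))
```

The point is simply that standard terms (dc:title, dcterms:spatial) and locally coined predicates sit side by side in one record, each resolved by its namespace prefix.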
This is a rendering of the collection hierarchy using D3.js, a tree layout. Some of the challenges: we had really big XML payloads that crashed our XML parser, so we actually had to do some surgery on those on the fly. We found some interesting DNS timeouts, some issues with respect to the networking inside UC. We're migrating over 150,000 objects and 1.3 terabytes, so the whole process is going to take about a month, and we're just over halfway through right now. That's basically because we're not just doing a file copy: all of the video and audio is being re-transcoded, the PDFs are being re-OCRed, all that kind of stuff on the way through. This is just a quick glimpse of the new homepage; it's not quite public yet, but it will be in a few weeks. So what does this give us for the future? Well, I've already mentioned maps and spatial search. We can do look-ups of things like the LINZ street address info: they've got 1.93 million street addresses for the country, including lat/longs. Improved displays of content, so we can start showcasing collections. And while we were doing the migration, the Islandora community released a new version, so that's the kind of thing we're able to tap into, where before UC had a bespoke system. And finally, once we've migrated, the API will be available for use. We might have a couple of minutes left for questions. Thanks. So if you've got a question, we need to make sure you've got a microphone before you ask it, so we've just got that being sorted at the back of the room. Does anyone have a question? Anyone? No questions? [Audience question, off microphone.] We've got this simplification from three layers down to two, and it's not actually going to make a lot of difference, because we weren't always using those three layers. We quite often used the object layer as a kind of mini-collection anyway. Like, we'd collect all the photographs by a particular photographer into a single object just for convenience.
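As a rough illustration of what a proximity search does with those lat/longs (Solr handles this at index scale with proper spatial indexing; this is just the underlying idea), here's a minimal sketch using the haversine formula, with approximate, illustrative coordinates:

```python
from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/long points."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

# Approximate landmark coordinates, for illustration only.
items = [
    ("Cathedral Square", -43.5309, 172.6365),
    ("Lyttelton", -43.6033, 172.7183),
]
centre = (-43.5321, 172.6362)  # central Christchurch, approximate
nearby = [name for name, lat, lon in items
          if haversine_km(centre[0], centre[1], lat, lon) <= 5.0]
print(nearby)
```

In practice you'd never scan every record like this; Solr's spatial field types index the points so a "within 5 km of here" filter is cheap even over 150,000 objects.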
Or sometimes we'd have a single object that was identical at the part layer and at the object layer. There's a handful of parts that are going to be slightly differently structured, but most of it's actually going to end up looking pretty much the same to the public. Sorry, just to expand on that answer, thank you very much: is it clear to future users that the data has been manipulated in that way, or how does one know? We're hoping that users won't actually even notice the change, because we've kept all the old URLs: the ones that had the word "part" in them still map across, so you can put in an old URL and it'll take you through to the new URL. And unless you were looking at the URL, you wouldn't really know whether you were in a part or an object before anyway. Maybe an archive geek might notice it, but I would imagine the average user didn't really know the difference between parts and objects. Any final questions? One over here. I was just wondering if there are any privacy issues around the geolocation data now being easily mapped, in terms of, you know, giving people private addresses where their house was and so forth. That information's always been there in some format; it's there as a GPS location or as an address anyway. We only put addresses mainly on things in the CBD, or things where it was actually meaningful to have an address. If it was just someone's private house we normally didn't, unless they said it was okay to do that, because it wasn't as significant; we'd just broadly say this is a house in Richmond or whatever. We wouldn't put it down to address level for a person's private address. Right, if there are no more questions, then it's time to move on to your next session, or stay in here if you're interested in the session on Oceania Digital.