Kia ora and welcome everybody to my presentation for DrupalGov 2020 on Archives Central and a next-generation records repository. My name's Jonathan Hunt, I work at Catalyst IT here in Christchurch, New Zealand. The slides for this presentation are available at the URL on the screen via GitLab Pages. For the next 15 minutes I'm going to talk to you about the work we did for Archives Central. Archives Central is a combined physical and digital archive run by a consortium of nine councils in the central North Island of New Zealand. We used Drupal 8, specifically a distribution called Islandora, which is a digital asset management system, to handle their digital archive. They evaluated both Islandora and ArchivesSpace, among several other solutions, and settled on Islandora as the best option for their needs; this implementation was among the first handful of Islandora 8 implementations around the world. One of the things that makes it distinctive is that we implemented the Records in Contexts ontology from the International Council on Archives. This is a brand-new conceptual model intended as a data model for archiving, with a focus on bringing archives into the linked data world. As part of that we did some data modelling in Drupal, and we also had to do a migration out of the existing system, called Kete. We migrated records, then binary files such as still images and documents, and the supporting data around that content, including agents, which in this case are primarily organisations. Just a quick word about Catalyst: we're an open source solutions company with significant Drupal expertise and experience, and we have offices around the planet. Archives Central had an existing digital repository based on Kete, a Ruby on Rails solution that originated about 10 years ago. However, it didn't really catch on and lacked community support.
So even though it was open source, there was really no roadmap for improvements, and they were looking around for options. The Kete system they were on also lacked a responsive theme, so that was one of the things we brought when we moved them to Drupal 8. This is an example screenshot from the Drupal 8 system showing the same content. If I open up a specific object page, in this case an aerial photograph that's been scanned, you'll see I'm logged in here, which is why I have the admin menu bar showing. You can see some metadata around that specific digital object, including where the physical object is stored. This particular object is being displayed using the OpenSeadragon viewer, which uses IIIF to pull tiles from the backend server; that gives you a nice smooth interactive zoom where you can zoom in, pan around and see the object the way it should be seen. One thing to look at behind the scenes here: if I click on Media, you'll see the original file and then a couple of derivatives, in this case a thumbnail image and a service file, that have been generated automatically by the Islandora microservices.

Just a very quick introduction to the architecture of Islandora. Islandora consists of a Drupal 8 distribution, and Drupal stores the binaries and metadata in a content repository called Fedora, otherwise known as FCRepo. On the other side of that, you've got indexing of the metadata into Solr for keyword search and Blazegraph for graph search. Islandora is supported by a range of microservices that do things like derivative generation (generating thumbnails or service files), transcoding for video or audio, OCR, and FITS, which extracts metadata like EXIF from files, plus any number of other microservices that can fit in and be invoked when you are ingesting new material into the archive.
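As an aside, the tile requests a viewer like OpenSeadragon issues follow the IIIF Image API URL pattern of `{server}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`. Here's a minimal sketch of composing such a URL; the server base and identifier are made-up examples, not the live Archives Central endpoints:

```python
# Compose a IIIF Image API 2.x request URL:
#   {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
# The base URL and identifier below are hypothetical.
def iiif_url(base, identifier, region="full", size="full",
             rotation=0, quality="default", fmt="jpg"):
    return f"{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{fmt}"

# Ask for a 512x512-pixel region of the source image, scaled to 256px wide.
url = iiif_url("https://example.org/iiif/2", "photo-001",
               region="0,0,512,512", size="256,")
```

A tiling viewer simply issues many of these region requests at different zoom levels, and an image server such as Cantaloupe answers each one.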
So one of the primary contributions we brought to this was adopting the brand-new Records in Contexts conceptual model. As I said, this has been created to bring the archives world into the world of linked data. Records in Contexts started in around 2016 and is up to version 0.2, so it's still a work in progress. What the Records in Contexts ontology defines is entities, which in this case could be records or corporate bodies, et cetera; attributes, which are the fields that go with those entities; and relations between entities. Version 0.2 of the conceptual model has 22 entities, 41 attributes and 78 relations. An overview of the entity hierarchy is shown here, with the ones we used in Archives Central highlighted. You can see we have record sets, which are essentially collections of records, and records in turn might be composed of record parts. There's an instantiation that goes with the record, and that's the digital component: the digital file is an instantiation, for example a scan of a photograph. Those records and record sets are given context by additional metadata that might describe an agent; agents break down into persons, families or corporate bodies, and corporate bodies would typically be government agencies, commercial companies and so forth. They're further categorised by date and place. One significant change we made is that rather than implementing dates as a class in their own right, we used the out-of-the-box Islandora EDTF field, that's the Extended Date/Time Format from the Library of Congress. Very quickly, the mapping from Records in Contexts to Drupal: record sets, i.e. a series or a collection, map to nodes; records are made up of nodes along with media and files; agents in Records in Contexts map to the out-of-the-box Islandora corporate body, family or person, and those entities are defined on the Drupal side as terms within vocabularies.
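For reference, EDTF extends ISO 8601 with qualifiers for uncertain and approximate dates, which suit archival material well. A rough validator for the handful of basic EDTF shapes relevant here, not the full specification, might look like this:

```python
import re

# Matches a year, year-month, or year-month-day, optionally followed by an
# EDTF level-1 qualifier: ? (uncertain), ~ (approximate), % (both).
EDTF_BASIC = re.compile(r"^\d{4}(-\d{2}(-\d{2})?)?[?~%]?$")

def looks_like_edtf(value: str) -> bool:
    """Rough check that a string is a simple EDTF date."""
    return bool(EDTF_BASIC.match(value))
```

So `"1923"`, `"1923-07"` and `"1923-07~"` (approximately July 1923) pass, while free text like `"circa 1923"` does not; that kind of legacy value has to be normalised during migration.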
Place maps to the geographic location term that comes out of the box in Islandora 8. Accessions, which are incoming material for the archives, map to nodes. We also did some work around containers and locations, tracking the actual physical storage in terms of shelves, racks, boxes and so forth; those were some custom Drupal node entities. Along with that we have some other standard content-management-type components: rights, newsletters, etc. Because all of this is mapped to RDF, any of the records can be exposed as linked data, in this case as JSON-LD, and you can see here that the record's type maps to the Records in Contexts record type and also to the Portland Common Data Model object type.

This is an example of the edit page, and if I quickly jump to the live instance I can show you the edit page in action. You can see there's an extensive number of fields, and we have annotated the labels of these fields with the relevant attribute from the Records in Contexts standard, so if any staff doing data entry have a question about what should go into a field, they can reference the standard directly using those label annotations. A couple of other things of note. One is the linked agent field, which uses the Library of Congress MARC relators; that gives you a really extensive vocabulary of around 200 different ways a person or an agent could be related to a given work. So instead of just Dublin Core's creator, contributor and publisher, the MARC relators give you things like photographer or interviewer and so forth. The other feature here is these fields with distinct backgrounds: those are private fields, visible to staff when they're authenticated but not visible to the public.
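To make the double typing concrete, here is a hand-written sketch of the JSON-LD shape for a record. The node URI and property URIs are illustrative only, not copied from the live site, and the RiC namespace shown is an assumption about the ontology version in use:

```python
import json

# Illustrative JSON-LD for a record, carrying two rdf:type values:
# a Records in Contexts Record and a Portland Common Data Model Object.
record = {
    "@id": "https://example.org/node/123",  # hypothetical node URI
    "@type": [
        "https://www.ica.org/standards/RiC/ontology#Record",  # assumed RiC-O URI
        "http://pcdm.org/models#Object",
    ],
    "http://purl.org/dc/terms/title": [{"@value": "Aerial photograph"}],
}

print(json.dumps(record, indent=2))
```

Any linked-data consumer reading the `@type` array can treat the same resource as an archival record or as a generic repository object, which is exactly the interoperability RiC is after.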
So just a very quick comment about the migration. We had to migrate over 210,000 records, including around 7,000 images and 600 PDF documents. We used the Drupal 8 Migrate framework for that, of course, and it worked well for us. The primary source for the migration was a local snapshot of the Kete MySQL database, but we also used a little bit of CSV and some embedded JSON, basically to do preparatory work like setting up default licensing and rights and a few other terms, for example formats; those were populated using migrations where we embedded the relevant data directly in the migration as JSON. Obviously there's a sequence you need to follow for a given migration in terms of dependencies between data, so we migrated users first. Kete baskets became groups: one thing to note about this installation is that the individual councils have been mapped to Drupal 8 groups to give them some access control around their own material, so staff can have rights to edit just the content relevant to a specific council. We migrated some basic pages; licences went to rights, agencies to agents, accessions to accessions and so forth. One thing to note at the end there: we had trackable items, which basically mapped to the locations and containers. We needed an intermediate table for that, so we migrated some of the content out of Kete into an intermediate table and then did a second pass of the migration to actually establish the Drupal entities we wanted. As is usual for this kind of thing, there were some challenges around the data; this is what happens when you have a group of people maintaining content over a long period.
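That ordering between migrations is just a dependency graph. Drupal's Migrate API expresses it with `migration_dependencies` keys in the migration YAML, but the idea can be sketched in a few lines; the migration names here are approximations of the ones just described, not the real machine names:

```python
from graphlib import TopologicalSorter

# Each migration lists the migrations it depends on (hypothetical names
# reflecting the sequence described in the talk).
deps = {
    "users": set(),
    "groups": {"users"},          # Kete baskets -> Drupal groups
    "pages": {"users"},
    "rights": set(),              # licences -> rights terms
    "agents": set(),              # agencies -> agents
    "accessions": {"agents"},
    "records": {"groups", "rights", "agents"},
    "media": {"records"},         # files attach to already-migrated records
}

# static_order() yields a valid run order respecting every dependency.
order = list(TopologicalSorter(deps).static_order())
```

Running migrations in any order `static_order()` produces guarantees that, for example, the groups exist before the records that belong to them are created.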
Some interesting things came up. One was long titles: we had some exceptionally long object titles, over 1,200 characters, which of course makes for some huge URLs if you're using a title token for your URLs. We shunted those off into a distinct full-title field, which allowed Archives Central staff to tidy those titles up manually after the full migration. There were various values that had been put into fields that meant essentially the same thing, so "consult archivist", "see archivist" and "refer to archivist" were all mapped and consolidated within the Drupal migration YAML. We used OpenRefine to examine the content we were working with, to identify common values and the range of values in certain fields. We also had to do some processing because of relatively simple inconsistencies: "Matheson City Archives" with or without the capital A for Archives would, by default, show up as two different terms, so we consolidated and merged those. And there were various challenges around data modelling, for example coordinates put against a record when ideally coordinates should be on a place term associated with the record.

So just to wrap up: Islandora is a next-generation repository based on Drupal 8. It's very flexible and can map to almost any metadata; in this case we adopted the Records in Contexts conceptual model, which is a very good fit for archiving. We used linked data extensively, exposed as JSON-LD and indexed into Blazegraph. We used linked agents, and specifically the MARC relators, to give a rich model for the ways people can be associated with content. We implemented a responsive theme based on Bootstrap 4. I haven't shown you the search interface, but it uses Search API, including extensive use of facets, and under the hood one of the microservices is OCR, so that lifts the text out of PDFs.
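Consolidations like "consult archivist" versus "see archivist" are straightforward lookups; in Drupal Migrate you would typically use the `static_map` process plugin in YAML. The equivalent logic, sketched in Python, looks like this (the variant values are the ones mentioned above; the canonical form chosen here is an assumption):

```python
# Map variant phrasings found in the legacy data onto one canonical value.
CONSOLIDATE = {
    "consult archivist": "Consult archivist",
    "see archivist": "Consult archivist",
    "refer to archivist": "Consult archivist",
}

def normalise(value: str) -> str:
    # Case-insensitive, whitespace-tolerant lookup; unknown values pass through
    # unchanged so the migration never silently drops data.
    key = " ".join(value.lower().split())
    return CONSOLIDATE.get(key, value)
```

The same lowercase-and-collapse-whitespace trick also handles the capitalisation mismatches that were producing duplicate taxonomy terms.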
We've got group-based access control around individual councils' content, and we use the OpenSeadragon viewer for zooming and looking at tiled images, which are served by Cantaloupe. So that's a very quick introduction to the Archives Central project. If you have further questions, please email me at j100catalyst.net.nz or catch me at the Catalyst booth after this presentation. I hope that was useful to you, and I hope you have a great DrupalGov 2020. I've posted a link to the slides in the discussion forum, so if you want to go back to any of those links, check that out. I'll also paste in a few examples of content if you want to follow up on what things like photographs and so forth look like.

Thanks a lot, Jonathan, this was a great presentation. Thanks.