Kia ora koutou. Thank you, Claire. As you can see, it's been a bit of a week for me, culminating in a broken wrist, so I'm currently managing a pharmacopoeia. Please bear with me. My slideshow seems to have changed, but that may have been the tramadol. OK. So, hi, Valerie. I think I can do it like this. No, I think I'll be all right. Thank you, Valerie. Can you hear me? Hello, audience. OK, so the Alexander Turnbull Library is about to cap five years of sustained work by delivering two data sets from the unpublished collections to four key international standards. The data sets are provisionally titled Turnbull Unpublished and Turnbull Names, and the standards are Encoded Archival Description, Encoded Archival Context, Unicode and Creative Commons. I'm going to tell you about the journey, and about why, taking the long view, we consider this step and these standards to be so important. I'm actually dedicating this to a young cartographer I heard at the GeoCart conference recently, who said that when he's bored and avoiding his thesis, he downloads a data set and turns it into a map. I think this is fantastic. This is the new class of researcher, and we want to be engaging with them. I'm not going to rehearse the history of the Alexander Turnbull Library; I'll assume most of you know what we are. We are a library with an archive. Some of the collections are books, journals and websites, which have well-developed disciplines for description and interoperability. The rest are unpublished collections, mostly unique, which are the societal archives for our country. They can come in a box like that, and the box could hold anything, in any format. I think of them as the yang to the yin of government archives for New Zealand Aotearoa.
There are about 80,000 of these unpublished collections. Just skimming, they range from the records of the Te Whaiti family of Wairarapa to those of the Whites Aviation business, from Ans Westra to the Topp Twins, from Koso to Kokodian Stains, and from Te Upoko o Te Ika radio station's interviews to the library's own entire ephemera collection. These unpublished collections are used by all sorts of inquirers, from the very specific (what colour was my house?) to the very broad (how has New Zealand treated people with disabilities?). The purpose of the library collecting and preserving these societal archives is to make them accessible and discoverable. So the Turnbull's archival responsibilities include stewardship of the descriptions, or metadata, that control all of this material. This is where my team, Arrangement and Description, and finding aids, metadata and now data sets come in. Arrangement and Description, a relatively new team of five years' standing, builds the finding aids that communicate the unpublished collections and their contexts to researchers. The key principles of archival description are respect for the original source, or provenance, and respect for the original order of people's or organisations' records. This means conveying not only item descriptions but how the items related to each other; not only where they came from most immediately but their original contexts and custodial history; not only their copyright status but any agreements the library entered into to secure their acquisition, because, of course, most material is donated by New Zealanders. And here we have Valerie with the Topp Twins. The Turnbull Library has always made these descriptions freely available, using whatever technology was available at the time.
This continued with our first forays into automation in the 1980s: first with an oral history database in BRS, and second with the automation of the manuscripts and archives collection, which kicked off in 1990 with a system designed by the National Library in partnership with the Turnbull. It was called Tapuhi, which means to pull or tug, and which is also the acronym for Turnbull's Automated Project for Unpublished Heritage Items. This system is where our raw metadata came from before the system change we've just been through. Tapuhi was built on a platform called Pick, the platform of choice for library automation in the early 1990s. It had a sophisticated file structure, indexing design and searching system, and it was tailored to the hierarchical nature of archival description and to the many unique activities of the Turnbull Library. Unfortunately, it also mirrored a traditional curator-led view of the unpublished collections, which turned out to be problematic in the long term, particularly for encoding standards. The system developed for manuscripts and archives was successful, and it was extended to the rest of the unpublished collections. These format-based separations are called accounts; that's what the jargon "split into accounts" means. They included the Archive of New Zealand Music, which was incorporated with manuscripts; the Photographic Archive; the Drawings, Paintings and Prints account, which included ephemera; the Cartographic Archive; the New Zealand Cartoon Archive; and, eventually, the oral history database, integrated as yet another account. These accounts stored incredibly rich descriptive text and textual links. They also used consistent controlled vocabularies to expand the discovery points through standardised terms for names, places, subjects, iwi and hapū, and, eventually, Ngā Upoko Tukutuku, the Māori Subject Headings.
However, unfortunately for researchers and sometimes for donors, the separation-by-format practice became reified over the 25 years of Tapuhi's use into an automatic re-accessioning of collections, which could only link back to the rest of their original collection through textual clues that weren't easy to follow. Staff could and did mediate, to form and print a finding aid for a collection, but what researchers generally got online was a partial and incoherent view, whether through the library's own OPAC or through the National Library's online channels and federated discovery tools. I'm not going to talk about this in detail, but I do recommend a wonderful blog post by John Sullivan, who was there for most of the life of Tapuhi; it's available on the National Library's website. So when, five years ago, it became clear that Tapuhi had to be replaced because it was old software, the biggest risk, from the perspective of the newly created cross-library Arrangement and Description team, was that it would be replaced with an attempt at a new Tapuhi, instead of by a system that met the needs of a research collection of societal archives using the principles of provenance and original order. Setting our goal as delivering Turnbull Unpublished data sets to the appropriate international standards was enormously helpful as the library moved into the unfamiliar and challenging process of an IT project. This is what the data looked like. As it turned out, we didn't have a good understanding of our own data. We had very limited technical documentation, and some attempts to understand it from the very early days. Hex 9E, anyone? So that's the structure of Tapuhi. The standard of choice, of course, was Encoded Archival Description, or EAD, which had been developing through the 1990s in XML, designed by archivists and their users for putting finding aids on the web.
On it, such achievements as the UK's Archives Hub, the Online Archive of California in the United States, and Europeana had begun to be built. More importantly, the choice of a standard allowed the library to start planning and preparing before any formal IT project phase actually began, and that enabled a step change in our ethos, our morale and our capability. We had, after all, two generations of librarians who'd been using one system very successfully from an internal point of view. As early as 2013 (and here I have to thank Dr Daniel Pitti and the University of Virginia, who made available an instance of the ICA-AtoM software tool), we were able to experiment, in a project called Project Kahikatea, with a crosswalk from our old data to the standard. We developed an EAD crosswalk and could see where the metadata would end up. This encouraged the library as a whole, because we could talk about it and demonstrate that none of the rich data built up over many staff years would be lost. It also helped break down the notion that EAD was just about web formatting, and let us start to understand the structure of our data. So by the time we came to the formal specification and procurement processes, there was an acceptance that requiring "must comply with EAD, EAC and Unicode" meant we were filtering for vendors who would already understand our core business, or be prepared to learn it. This allowed us to bypass an enormous amount of work. If I had one piece of advice for my past self, it would be to also specify "must validate", against a validator such as the W3C's, since this would have helped enormously with testing later. Anyway, as the library actually got to grips with migrating 25-plus years of raw data, the use of standards and the Project Kahikatea crosswalk was invaluable.
For example, where different accounts had used fields in different ways, or had changed their use over time, we just had to find where that data should appear in Encoded Archival Description and map it to the correct path. The vendor could then follow this during the extract, transform and load phases. The whole exercise was still enormously gruelling, because we moved from having six accounts to having one, involving 23 enormous migration spreadsheets, extensive business process analysis and the end of the re-accessioning practice. But it has meant the beginning of a harmonisation of descriptive practice across the library, which is still going on. It has also given us a way to test whether what we've done works, because if it doesn't, it doesn't validate. I'm not going to talk in great detail about the migration itself either, apart from paying tribute to Valerie Love and Kirsty Cox, who bore the brunt of the mahi, with a great stretching of capability. I understand they've just provoked mass migration flashback nightmares in Australia, so I'm hopeful that their paper from the Australian Society of Archivists conference will be published or shared. What I want to emphasise, though, is that the result of driving this IT project by independent, discipline-based standards was, in my view, a highly successful IT project. That was true even though both IT and project management were very new and unfamiliar processes for the Alexander Turnbull Library, and even though the usual upsets happened: for example, we unexpectedly had to do much of our own testing, we had to use RealMe, and we had to incorporate a completely new field to drive the item requests module. Still, holding to the overall goal of moving from frozen poor practice to working with international standards for the benefit of researchers, that is, delivering EAD and EAC data sets in Unicode, lifted the whole experience and meant that we could make hard choices when we needed to.
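The crosswalk step described above (find where each account's field belongs in EAD, then map it to that path) can be sketched in a few lines of Python. This is a minimal illustration, not the Turnbull's or the vendor's actual mapping: the account names, legacy field names and chosen EAD paths in `CROSSWALK` are hypothetical, and the EAD namespace and the full `<ead>` wrapper are left out for brevity.

```python
import xml.etree.ElementTree as ET

# HYPOTHETICAL crosswalk: legacy account -> {legacy field: EAD path relative to <c>}.
# Entries are listed in the order EAD expects, with <did> content first.
CROSSWALK = {
    "manuscripts": {
        "TITLE": "did/unittitle",
        "DATES": "did/unitdate",
        "EXTENT": "did/physdesc/extent",
        "NOTES": "scopecontent/p",
    },
    "photographs": {
        "CAPTION": "did/unittitle",
        "DATE": "did/unitdate",
        "QUANTITY": "did/physdesc/extent",
    },
}


def set_path(parent, path, text):
    """Create (or reuse) each element along a slash-separated path and set the leaf's text."""
    node = parent
    for tag in path.split("/"):
        child = node.find(tag)
        if child is None:
            child = ET.SubElement(node, tag)
        node = child
    node.text = text
    return node


def to_ead_component(account, record):
    """Map one legacy record (a dict of field -> value) to an EAD <c> element."""
    mapping = CROSSWALK[account]
    unmapped = sorted(set(record) - set(mapping))
    if unmapped:
        # Fail loudly rather than silently dropping data during migration.
        raise KeyError(f"no crosswalk entry for {account}: {unmapped}")
    component = ET.Element("c", level="file")
    # Walk the crosswalk, not the record, so elements always come out in the same order.
    for field, path in mapping.items():
        if field in record:
            set_path(component, path, record[field])
    return component


if __name__ == "__main__":
    c = to_ead_component("photographs", {"DATE": "1900", "CAPTION": "Example caption"})
    print(ET.tostring(c, encoding="unicode"))
```

Iterating over the crosswalk rather than over the incoming record is a deliberate choice here: two accounts that stored the same information in different orders still produce elements in one fixed order, which matters because a validating parser will reject out-of-order elements.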
From the descriptive point of view, we could manage the key project elements of scope, budget, timing and quality and not lose our way. We did migrate the Tapuhi data successfully to the new system, called Tiaki, over three data loads, and we've been working with it since January. In project terms, you could say the big Tapuhi-to-Tiaki migration was waterfall, and this year we've been working in Agile to improve the system. Before Christmas we're about to launch the first major set of enhancements, and we expect to keep improving the system until its next big step change. Some of the improvements are simply those of a modern system, and it's almost embarrassing how excited we've been about being able to import from CSV files. I know the rest of the library world did this decades ago. We can do fancy charts, and we can bulk update an access restriction across thousands of items, which previously we just wouldn't have had the resources to do. But some of the improvements have come from getting to grips with what working with standards actually means, and our Axiell EMu vendor has been a great partner here. If you allow yourself to tweak a standard, it's no longer a standard. It really does have to have the elements in that order. It really does have to have the characters in upper or lower case, or whatever it is. And you can't mix up your unique this with your illegal characters of that. As a result of the latest development, I'm pleased to say that we now have, for the first time, the entire Alexander Turnbull Library unpublished collection metadata as an EAD corpus and an EAC corpus, available via FTP. That's it there, on the National Library's FTP site. And because we can now run standard tools like xmllint over it (thank you here to Rhys Owen), I also know that it is almost all valid to the standard. So we've got nearly a quarter of a million New Zealand names and 80,000 EAD files, and the error rate is very small.
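The talk mentions running xmllint over the corpus; from a shell, a command such as `xmllint --noout --schema ead.xsd file.xml` checks well-formedness and validates against a schema, where `ead.xsd` stands for a local copy of the EAD schema. As a rough first pass in Python, the sketch below only checks that each file is well-formed XML using the standard library, which is a weaker test than schema validation. The directory layout and the `*.xml` naming are assumptions.

```python
import pathlib
import xml.etree.ElementTree as ET


def parse_error(xml_bytes):
    """Return None if the document is well-formed XML, else the parser's error message."""
    try:
        ET.fromstring(xml_bytes)
    except ET.ParseError as err:
        return str(err)
    return None


def check_corpus(directory, pattern="*.xml"):
    """Check every matching file; return (number of files, {file name: error message})."""
    files = sorted(pathlib.Path(directory).glob(pattern))
    failures = {}
    for path in files:
        message = parse_error(path.read_bytes())
        if message is not None:
            failures[path.name] = message
    return len(files), failures


if __name__ == "__main__":
    import sys

    total, failures = check_corpus(sys.argv[1] if len(sys.argv) > 1 else ".")
    for name, message in failures.items():
        print(f"{name}: {message}")
    if total:
        print(f"{len(failures)} of {total} files failed ({len(failures) / total:.2%})")
```

Reading raw bytes rather than text lets the parser honour each file's own encoding declaration, which matters for a corpus that is meant to be Unicode throughout.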
We also have an automated refresh set up for early each month for each data set, and the data will be available under a Creative Commons Attribution 3.0 (CC BY 3.0) licence through data.govt.nz, among other channels. This is possible because it's Crown copyright: the library owns its metadata, which means it can be made available for commercial and non-commercial use. OK, so you didn't all break into spontaneous cheers, which is really disappointing. Yes, this is my nephew. This was just supposed to be a light reference to standards and how people switch off when you mention them, but they're actually really important. That's my nephew, Ali, at the Museum of Mathematics in New York. We'll just move right along. This isn't at all disintermediating our current channels for engaging with researchers. It will complement the work we do through our current library site, IMu, and with our partners NDHA and Digital New Zealand. This is the IMu site. It still looks like this, but the raw data underneath is very much improved and will be encoded in a standardised way. Sorry, I seem to have lost a slide; I thought I had one for the EAC homepage, which you'll have to go and look up. The second data set is Alexander Turnbull Library Names, which means the authorised terms for corporate bodies such as Hannah's shoes, for persons like Dame Whina Cooper, and for others such as the Te Whaiti family of Wairarapa, whom we have indexed as associated in some way with the unpublished collections. So this is quite specific. It's not a Dictionary of New Zealand Biography, and it's not every name in the collections, alas; sometimes we just haven't had the resources. But arrangement and description is iterative, and we can always do more.
This data set complies with the international standard Encoded Archival Context: Corporate Bodies, Persons and Families (EAC-CPF), a lovely little standard released in 2012, which allows for preferred and alternate terms, dates of existence, and relationships to other entities, to places and to collection materials. There were a few challenges, as with the EAD. First, the Tapuhi names hadn't specified an entity type, which the standard requires. So during migration we had to try to find rules that would let the new code know whether something was a corporate body, a person or a family. There was some low-hanging fruit there: if a name had the word "Limited" in it, we made it a corporate body. After migration we had a quarter of a million files with error messages, and we've been working through those ever since; we're now down to about the last 4,000. Second, some records held material called extended biography, which was considered sensitive, so we just left that field out. Third, the library had been far more granular about roles than the standard, which allows values like creator or subject, whereas we've got architects, illustrators, contributors and so forth. Again, now that we've got a data set, we can improve it. This data set is also available through a monthly refresh, under a CC BY 3.0 licence. The fourth standard I want to mention is the character set in which we've specified the data sets will be delivered. Tapuhi couldn't support diacritics whatsoever. And the library, of course, is keen to demonstrate our respect both for the original materials and the people involved with them, and for our researchers. In particular, the Turnbull holds arguably the largest collection of manuscript material in Te Reo Māori in the world.
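Returning to the Names data set, its first and third challenges can be sketched as two small rules. This is a hypothetical illustration, not the migration's actual code: the "Limited" rule comes from the talk, but the other corporate words, the comma heuristic for personal names and the role table are assumptions. The output values `corporateBody`, `person` and `family` are EAC-CPF's `entityType` values, and `creatorOf`, `subjectOf` and `other` are the values EAC-CPF allows for `resourceRelationType`.

```python
import re

# "Limited" comes from the talk; the other words are HYPOTHETICAL additions.
CORPORATE = re.compile(
    r"\b(limited|ltd|company|society|association|council|incorporated)\b",
    re.IGNORECASE,
)
FAMILY = re.compile(r"\bfamily\b", re.IGNORECASE)

# HYPOTHETICAL table collapsing granular library roles to EAC-CPF relation types.
ROLE_TO_RELATION = {
    "creator": "creatorOf",
    "architect": "creatorOf",
    "illustrator": "creatorOf",
    "contributor": "creatorOf",
    "subject": "subjectOf",
}


def guess_entity_type(name):
    """Return an EAC-CPF entityType value, or None when no rule fires."""
    if CORPORATE.search(name):
        return "corporateBody"
    if FAMILY.search(name):
        return "family"
    if "," in name:
        # Inverted "Surname, Forename" headings are treated as personal names.
        return "person"
    return None  # leave for a cataloguer to review


def relation_type(role):
    """Collapse a granular role to creatorOf / subjectOf, falling back to other."""
    return ROLE_TO_RELATION.get(role.strip().lower(), "other")


if __name__ == "__main__":
    for heading in ["Whites Aviation Limited", "Te Whaiti family", "Cooper, Whina", "Topp Twins"]:
        print(heading, "->", guess_entity_type(heading))
```

Returning `None` instead of guessing keeps uncertain names in a review queue, much like the residual error list described above. Collapsing roles also loses detail, so if the granular role matters it would need to be kept somewhere else.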
And in Māori, the lengthened vowel, now conventionally indicated by a macron, changes the meaning. So we needed to be able to use it for data entry, storage, printing and searching. The standard we've gone for here is Unicode, which allows for the macron, of course, but also for the Pacific reversed apostrophe, the German umlaut, the French acute, and so on. Again, saying "the system must support Unicode" rather than "the system must be able to present macrons" was another occasion when using a standard enabled the library to save time and effort and achieve the best result for our researchers. So we're opening up a new channel for our researchers, and we hope it will meet with immediate approval and use. In fact, one of the most exciting moments of the migration came right after we launched the new IMu site, when we heard serendipitously that there had already been two attempts to scrape it. The person who told us was a bit worried about this; we were all delighted and wanted to know who they were so we could go and shake their hands. Unfortunately, they couldn't have succeeded at that point, but now it's a different story. We also know that we'll be starting to satisfy comments like those from digital humanities scholars Melinda Johnston and Thomas Quince, whose article abstract I've got up here: "21st century researchers are applying new methods of computational analysis such as topic modelling to data retrieved from library catalogs. Consequently, libraries should understand the way in which their metadata delivery decisions impact on the potential use of the collection." That's really what we're doing. So we're about to release two data sets to international standards that cap five years of sustained effort and bring the library into this century.
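Supporting macrons for searching involves more than storing them. The same visible "ā" can arrive as one precomposed code point (U+0101) or as "a" followed by a combining macron (U+0304), so text is usually normalised before comparison. Here's a minimal sketch using Python's standard `unicodedata` module; the choice of NFC and the optional macron-insensitive fallback are assumptions for illustration, not a description of how Tiaki or IMu actually search. Because the macron changes meaning (wahine is "woman", wāhine is "women"), strict matching is the default.

```python
import unicodedata


def nfc(text):
    """Normalise to NFC so precomposed and decomposed macrons compare equal."""
    return unicodedata.normalize("NFC", text)


def fold(text):
    """Drop combining marks (such as U+0304) and case-fold, for a loose match."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).casefold()


def matches(query, field, strict=True):
    """Substring search that respects macrons unless strict is False."""
    if strict:
        return nfc(query).casefold() in nfc(field).casefold()
    return fold(query) in fold(field)


if __name__ == "__main__":
    decomposed = "Ma\u0304ori"  # 'a' followed by a combining macron
    print(decomposed == "M\u0101ori", nfc(decomposed) == "M\u0101ori")  # False True
    print(matches("wahine", "w\u0101hine"))                # False: different words
    print(matches("wahine", "w\u0101hine", strict=False))  # True: loose match
```

Note that the ʻokina (U+02BB, the Pacific reversed apostrophe) is a letter rather than a combining mark, so `fold` keeps it; only true diacritics are stripped in the loose mode.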
They provide the evidence of meeting the standards that were key to specifying, achieving and measuring a successful IT project, and they set the library up for coaching and contributing to future good practice. So we're right on the threshold. We're looking forward to seeing both what people will make of our data sets and whether there are opportunities for new collaborations. Does anybody know where this place is? It's the brewery in Bruges that makes a beer called Brugse Zot. I put it in because it seemed like a nice analogy. The brewery is in the middle of a cobblestoned historic district, and it's been there since 1600-and-something. They had a supply problem, and their trucks were churning up the cobblestones, I think. So what they did was build two miles of pipeline under the streets of Bruges to get their beer out to a place where it could be bottled. The analogy is to bulk distribution of our data sets: we're moving from retail to wholesale, I hope. And also, beer is good. So this is me congratulating Alexander Turnbull and telling him that we've got our metadata in a good state, and also letting everybody know that we've got a 100th anniversary coming up in 2018. If people have ideas about how we could celebrate that using our new data sets, that would be absolutely fantastic. Oh, there are the rest of my slides. Ha. OK. Anyway, I hope some of you will immediately want to play with our data sets, and if you do, get in touch with me. I've got an FTP site. I've got a password. I'm not afraid to use it. And I hope some of you will join us for the long view. Thank you.