So hi, my name is Chris and I work at Digital New Zealand. We harvest metadata from nearly 150 institutions. We map it to a common schema that anyone can access through an API, and we do this to make New Zealand's digital materials easier to find, share and use. Today I'm going to talk to you about some research and development work that we've been doing and then test an idea on you. This is... not my Mac. There we go. So, linked data. Linked data is an amazing yet elusive idea. It extends the conventional web by providing a means to identify and refer to specific entities or concepts, and it supplies a way to describe how those things are related. Together the entities, concepts and relationships constitute a set of assertions about how we think the world is. A set of assertions that can be queried, analysed, extended, shared and visualised. So, for example, our collections might have a person entity, Colin McCahon, a place entity, Timaru, and a subject concept of, say, modern painter. Linked data enables you to encode these little triple-ish statements like "Colin McCahon was born in Timaru" or "Colin McCahon was a modern painter" that can then slot into a much larger data network of assertions that spans institutions around the world. A network where you can ask questions like: show me all of the 20th century painters who were born near Timaru, or who were McCahon's contemporaries, and the chronology of their paintings. So, folk have been talking about variants of linked data and its close cousin, the semantic web, for a couple of decades now. And it turns out to be a wee bit tricky to build. Our respective institutions structure data in different ways. This diversity complicates the act of forging links between entities and items.
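To make the "triple-ish statements" above concrete, here is a minimal sketch in Python. It is not code from the talk: the predicate names `born_in` and `was_a` and the `query` helper are made up for illustration, and a real linked data system would use URIs and an RDF store, not plain strings in a list.

```python
from typing import List, Optional, Tuple

# A triple is (subject, predicate, object). Plain strings stand in for URIs.
Triple = Tuple[str, str, str]

# The two example assertions from the talk; predicate names are hypothetical.
triples: List[Triple] = [
    ("Colin McCahon", "born_in", "Timaru"),
    ("Colin McCahon", "was_a", "modern painter"),
]


def query(store: List[Triple],
          subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> List[Triple]:
    """Return every triple matching the pattern; None acts as a wildcard."""
    return [
        t for t in store
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


# "Who was born in Timaru?"
born_in_timaru = [s for s, _, _ in query(triples, predicate="born_in", obj="Timaru")]
print(born_in_timaru)
```

Once several institutions contribute triples to the same store, the same pattern-matching query can answer questions that no single collection could answer alone.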
The National Library of New Zealand, the Alexander Turnbull Library, Te Papa, Auckland Art Gallery, Christchurch Art Gallery, Victoria University, Otago University and a host of other institutions all hold various works, photographs and writings relating to the artist Colin McCahon. Although each institution is generally internally consistent in how it describes people, the metadata standards we follow and the data structures that we employ differ from one another. And so, although it's easy for a human to recognise that "Colin McCahon" is the same as "McCahon, Colin" is the same as "McCahon, Colin John, 1919-1987", those subtle distinctions are tricky to teach a computer. Perhaps this is the heart of the problem. So some time ago, Courtney Johnston made a Digital New Zealand set where she organised Colin McCahon's work into chronological order. Her set pulls paintings and sketches from all over the show, collections up and down the country, and she presents them together as a rough progression. I imagine it took her hours. She's created a handful of similar sets for other artists such as Anne Noble and Fiona Pardington. The end result is that wonderful combination of something that is at once beautiful and useful. However, it is unsustainable and it doesn't scale. Right now, in this moment, I would like to see the evolution of Gordon Walters' paintings. I would also like to see, I don't know, Robin Morrison's or John Pascoe's or Marti Friedlander's photographs and how they developed over time. We collectively have the data to do this, but it's just, well, it seems really tricky. New Zealand may be small, but we've collected and digitised a fair amount of stuff. Truckload after metaphorical truckload of photographs, artworks, manuscripts, maps, articles, documents, audio recordings, films, ephemera and more. It is astonishing when you pause to think about it.
And those things, those millions of works and commentaries and records, they refer to tens of thousands of places, hundreds of thousands of people and countless events both large and small in our nation's history. The good news is that generations of thoughtful, dedicated cataloguers, curators and researchers have spent years describing this stuff. The tricky bit is that we employ a diverse range of practices, follow different standards and make use of a multitude of metadata schemes. Linking our data and our metadata together requires us to overcome these differences, and that's what I've been exploring. So before the final part of the talk, where I'm just going to totally geek out on you, I'm going to note four underlying principles that inform this work and shaped my thinking. And I raise them explicitly in order to test them. So please let me know if any of these premises are either incorrect or irrelevant. The first: our collective interests overlap. So we, the nation's memory institutions and exhibition spaces, we share much in common with one another. There are differences, and trust me I'll get to those in a moment, but there is tremendous overlap in our concerns and in our interests. Each of our institutions holds only a partial account of Aotearoa's histories, contemporary states and possible futures. The works and writings and records relating to specific people and places and events are rarely conveniently housed in neat institutional buckets. The reality of the situation is that we are the collective stewards for a towering, glorious, densely connected tangle. And yet, the second principle is that our institutions, collections and practices are incredibly diverse.
Just a few of the ways that we differ include our descriptive practices, the technical systems we use to manage our data, our metadata formats, where our staff's expertise lies, our institutional histories, our responsibilities and governance structures, the audiences that we serve and the budgets that we operate within. This heterogeneity is inevitable and probably also desirable, but it poses significant challenges to linking data because it takes a lot of work for dissimilar collections to be able to talk to one another, which leads to the third principle. We do stuff differently. Sometimes we've been doing stuff differently for a very long time indeed. We've built up particular sets of workflows, standards, expertise and technologies, investing considerable time and money along the way. I've been involved in a few workshops, both here and abroad lately, where folk discussed what it might mean to align their practices. And I've got to tell you, it sounds really hard and like it will cost a lot of money. I do think that these "How do we do it? This is the way that we do it. How is that different from the way that you do it?" conversations are useful and enlightening and important. But from what I see, the metadata singularity is a long way off. This final premise is a Voltaire quote that my colleague Michael Lascarides introduced me to. It's almost certain that there will be incorrect data and flawed methodology in what I'll round off by showing you. However, I think that we need to start somewhere. And I also think that we need to do it ourselves and not leave it in the hands of, say, VIAF and ORCID and just trust other people to do it. So what I'm going to show you, I'm sharing I guess as both a straw man and as scaffolding. A straw man because I think that we need a first attempt to examine, to consider and to debate in order to work out what this should and shouldn't be.
And scaffolding because I believe that in order to actually realise that linked data dream, we may need a temporary structure that provides an initial frame to build things off, one that we can safely dispose of when it no longer serves a useful purpose. So here's the experiment. I approached staff from four quite different institutions and asked them if I could have a crack at linking their person authority metadata together. I got 146,000 names from the Alexander Turnbull Library. So both people and corporate entities and others. All the rest are people: over 5,000 people from Te Papa, 2,700 artists from the Auckland Art Gallery and 2,100 person entities from Te Ara. So let's just trace some of the data through the workflow that I set up. So the first thing that I did was I just pulled it all in. I just pulled in all of those thousands of names. And here you can see the different ways in which the institutions structure them. So you've got your "Colin McCahon", your "McCahon, Colin John", your "McCahon, Colin John, 1919-1987". I took those names and I tried to put them into a standard format. There are edge cases here. I realise that this is a very western take on names and people, and I would love to talk about that around the coffee table. But again, I'm just trying to get something going. I removed the dates and handled those elsewhere. You'll see those later. I then broke the names up into their different little bits. So we've got Colin McCahon, Colin John McCahon. And then, and this is quite a simplification, I just chucked them all together into a big, big document store and started trying to match them. And the way that I matched them was I first matched on first names and tried to find all of the first names that were either identical or roughly similar. So stuff that was sort of close enough, doing some fuzzy string matching.
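The normalisation steps just described (strip out the dates, put the name into one standard order, break it into parts) could be sketched roughly as follows. This is an illustration under assumptions, not the workflow from the talk: the function name `normalise`, the date pattern and the output shape are all hypothetical, and it only handles the three name forms mentioned above.

```python
import re

# Assumed date pattern: a four-digit birth year, a hyphen, and an optional
# four-digit death year, e.g. "1919-1987" or "1950-".
DATE_RANGE = re.compile(r"\b(\d{4})\s*-\s*(\d{4})?")


def normalise(raw: str) -> dict:
    """Split a raw authority name into lowercase name tokens and dates."""
    match = DATE_RANGE.search(raw)
    birth = int(match.group(1)) if match else None
    death = int(match.group(2)) if match and match.group(2) else None

    # Remove the dates so they can be handled separately from the name.
    name = DATE_RANGE.sub("", raw).strip(" ,")

    # "Surname, Given names" becomes "Given names Surname".
    if "," in name:
        surname, _, given = name.partition(",")
        name = f"{given.strip()} {surname.strip()}"

    tokens = name.replace(",", " ").lower().split()
    return {"tokens": tokens, "birth": birth, "death": death}


for raw in ["Colin McCahon", "McCahon, Colin John", "McCahon, Colin John, 1919-1987"]:
    print(raw, "->", normalise(raw))
```

As the talk notes, this is a very western model of names; treating the first and last tokens as given name and family name will be wrong for many people.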
I then tried to find all of the final parts of a name that, again, were also either identical or very, very close indeed. I threw away all of the stuff where it didn't match and then looked for the stuff where both the first name and the final name matched to narrow it down, and ended up with what I called my weak matches. I then found strong matches by comparing birth dates and death dates. And so strong matches are all that we're going to look at from here on. I think that there's potential to work with the weak matches, particularly with crowdsourcing. So, further refinement by birth and death date. So this is a coincidence matrix of the matches across institutions. So Auckland Art Gallery to Te Ara, there were 69 matches. Strong matches. 875 to Te Papa and so on. I find that the coincidence matrix is a little hard to read. So I turned it into a chord diagram, and a chord diagram, it's a bit unusual and I wonder if Mike Dickison will like this. I've been watching his Twitter stream today. I might be in trouble. But a chord diagram, what it does is it divides the perimeter of a circle up into parts and then it connects across to show linkages between the entities. So you can see actually the largest commonality is between Te Ara and the Alexander Turnbull Library, where there were 1,601 strong matches, and only a very small number, I think 69 and 179, between Te Ara and the Auckland Art Gallery and Te Ara and Te Papa respectively. Lots of crossover between the others though. So I did all of this matching and then I thought, what would it look like for Digital New Zealand to have an authority API? So again, this is all just R&D. This is not something that we're launching on Tuesday or anything like this. And I'm sorry I don't have a demo. My laptop did what Sonya's did before, so I'm not as brave as her.
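The weak and strong matching and the coincidence matrix described above might look something like this in outline. It is a sketch under assumptions, not the method used in the talk: Python's standard `difflib.SequenceMatcher` stands in for whatever fuzzy string matching was actually used, the 0.85 threshold is arbitrary, and the source names and sample records are placeholders.

```python
from collections import Counter
from difflib import SequenceMatcher
from itertools import combinations

THRESHOLD = 0.85  # assumed cut-off for "roughly similar"; not from the talk


def similar(a: str, b: str, threshold: float = THRESHOLD) -> bool:
    """True when two name parts are identical or close enough."""
    return SequenceMatcher(None, a, b).ratio() >= threshold


def classify(a: dict, b: dict):
    """Return 'strong', 'weak' or None for a pair of normalised records."""
    if not (similar(a["tokens"][0], b["tokens"][0])
            and similar(a["tokens"][-1], b["tokens"][-1])):
        return None  # first or final name part differs: discard the pair
    dates_known = None not in (a["birth"], a["death"], b["birth"], b["death"])
    if dates_known and (a["birth"], a["death"]) == (b["birth"], b["death"]):
        return "strong"  # names agree and birth and death years agree
    return "weak"  # names agree but the dates cannot confirm the match


# Placeholder records: the source names are hypothetical labels.
records = [
    {"source": "institution_a", "tokens": ["colin", "mccahon"], "birth": 1919, "death": 1987},
    {"source": "institution_b", "tokens": ["colin", "john", "mccahon"], "birth": 1919, "death": 1987},
    {"source": "institution_c", "tokens": ["colin", "mccahon"], "birth": None, "death": None},
]

# Coincidence matrix: count strong matches for each pair of institutions.
matrix = Counter()
for a, b in combinations(records, 2):
    if a["source"] != b["source"] and classify(a, b) == "strong":
        matrix[tuple(sorted((a["source"], b["source"])))] += 1

print(dict(matrix))
```

Comparing every pair of records is quadratic, so with around 150,000 names a real run would need some form of blocking or indexing first, which fits the talk's mention of a document store.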
So what I've done is I set up a simple API with HTML and JSON views, and I think we should probably do XML/RDF of each person, and you can imagine extending it to places and events. And so that gives you an idea of what the raw data for Colin McCahon looks like in the JSON view. I used the schema.org vocabulary with a few extensions, because you always have to extend these vocabularies. But then I was also thinking about you folk. And I was thinking, wouldn't it be useful to actually not just have one place where you have your URL, but that you can also use your own institutional languages to look these things up? So when Te Papa supplied them all to me, the Colin McCahon entity is party number 1502. When Auckland Art Gallery supplied it to me, their institutional ID is the Intigid 867. Turnbull for example myRN number 18245. And the idea is that an institution can look up using their own language and get back how all of the other institutions talk about this thing. And this is what I was hoping to demo. I can show it to anyone who'd like to see it. I'll be here for the next two days. But I built a web application with all of this stuff. So this is an index of the matches. And I then linked it across to the Digital New Zealand API. So what we have here is a page for Colin McCahon, and you can see that there are the four matches at the top, the four different authority matches. And then I've gone into the Digital New Zealand API and used the language of each institution to look up the creator, and then assembled it in chronological order. So this is a chronology of McCahon's paintings. It's just automatically generated by the system. We can do this. It's really quite neat. And then you can just pick another artist. So Peter Peryer's photography, for example, from the 70s on. It's splicing together on the fly Te Papa and Auckland Art Gallery in chronological order. So you can even see the double-ups, for example, with it held in both institutions.
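The "look it up in your own institutional language" idea could be sketched as a small crosswalk index. This is a guess at the shape of such a lookup, not the API built for the talk: the key names (`te_papa`, `auckland_art_gallery`, `turnbull`), the `lookup` and `chronology` helpers and the placeholder works are all hypothetical, and only the three identifier values come from the talk.

```python
# One matched entity with the local identifiers quoted in the talk.
# The dictionary keys are made-up labels, not real field names.
entities = [
    {
        "name": "Colin McCahon",
        "ids": {
            "te_papa": "1502",
            "auckland_art_gallery": "867",
            "turnbull": "18245",
        },
    },
]

# Index every (institution, local id) pair back to its shared entity.
index = {
    (institution, local_id): entity
    for entity in entities
    for institution, local_id in entity["ids"].items()
}


def lookup(institution: str, local_id: str):
    """Given one institution's identifier, return everyone's identifiers."""
    entity = index.get((institution, local_id))
    return entity["ids"] if entity else None


def chronology(items: list) -> list:
    """Merge items from several institutions into date order."""
    return sorted(items, key=lambda item: item["year"])


print(lookup("te_papa", "1502"))

# Placeholder works with invented years, only to show the ordering step.
works = [
    {"title": "work B", "year": 1970, "holder": "auckland_art_gallery"},
    {"title": "work A", "year": 1950, "holder": "te_papa"},
]
print([w["title"] for w in chronology(works)])
```

In the workflow described above, the items passed to `chronology` would come from querying the Digital New Zealand API once per institution, using each institution's own form of the creator name.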
So what I see this as being, if we were to do this, would be not acting as some sort of master or ultimate authority on this stuff, but much more like a telephone exchange where we're saying "when you talk about this, they're talking about that" and just connecting people from one authority set to another. There are a lot of open questions here, and this is really just the start of a discussion. But the question I really have for you is: is this worthwhile, and is this something that we should look at doing? To date, this has just been what's sometimes called at work a train project. We're basically doing it to and from work on the train. That's my train office. It's really great. Except when they change the trains, and I don't like the new trains. I like the old ones. But to date, that's as far as we've got, from really just doing it in bits of spare time, and when things aren't burning down I might spend half an hour on this thing. So if you think this is worthwhile or interesting, or if you think this is pointless and no good, then please approach us and talk to us about it. I'll put up a blog post next week and I'm going to try and circulate a white paper on this soon. So thank you very much.