I'm pleased to introduce Adam Moriarty. He is the Digital Collections Information Manager at Auckland War Memorial Museum, and he has recently worked to release all of the museum's collections online as linked open data. As a colleague of Adam's I can testify to his excitement and enthusiasm for everything linked open data, and he's excellent to work with. He recently had a fantastic opportunity, provided by LIANZA and the National Library of New Zealand: the Paul Reynolds Scholarship and Award, in honour of the inspirational internet pioneer and media commentator. This scholarship is offered every two years, is worth a minimum of $5,000 and enables the successful applicant to research or develop specialist digital knowledge or experience at an overseas institution. So I'll hand over to Adam.
Kia ora. Good afternoon. Thank you so much for coming along. Linked open data, man, it's pretty sexy. I'm glad you all came in to have a listen. It's a tough sell. As Claire just said, I'm Digital Collections Information Manager at Auckland War Memorial Museum and this year's recipient of the Paul Reynolds Scholarship. It's already been said, but I do really want to say thank you to LIANZA, the National Digital Forum and the friends of Paul Reynolds for selecting me for this scholarship. I never worked with or met Paul, but many of my colleagues did, and it really is a privilege and an honour to have been the recipient of this award. So yeah, thank you for that. Just really quickly, what I did is I went over to the UK to sort of nerd out with some linked data specialists at the British Museum, but I'll get into that in a second. Bit of background about me. Why did I decide to do this? Well, as just mentioned, 12 months ago at the last NDF we talked about our new Collections Online. So as Digital Information Manager, I look after Collections Online and the team that manages Online Cenotaph.
Both of those databases we refreshed and put online last year, and we published one million records as linked open data: free, open and downloadable for anyone. The underpinning rule of these systems, and a slide I use quite a lot, is that we're open as a rule, we're closed by exception, and we have one collection, not many. For this to work we needed a technology that allowed us to put all of our stuff out there and make it free and easy to use, so we decided on linked open data, and you can go and grab all of our stuff from api.aucklandmuseum.com. I promise that's the last plug for this piece of work I'll do. For those who are confused and say, what is linked open data? You're in this session, it's too embarrassing to leave now. Boiling it down, it's really hard to get the elevator pitch for this. But really it's just about making the connections between our different shared cultural collections that are in institutions. It's about making the connections between all the things that we hold. And ultimately it's about enabling large-scale research, collaboration and aggregation of all of these different data sets. Again, that's still quite wordy, so I tried to boil it down again, and basically it's just about putting your stuff online in a standard open format. It really is that simple, and you do this through four simple rules. These are well-known rules, the rules of linked data. Everything has a unique identifier, a uniform resource identifier. URIs, right. You do that for everything. Those URIs you're going to want to put online over HTTP, so that people can go to them and find out more information about your resource. Use some standards, so that when they go there they know what they're looking at. And, as it says right at the front of "linked open data", include some links to other things. Let's make this useful by connecting out.
And if you're talking about linked open data you're morally obligated to have a diagram like this. It comes in the pack when you sign up. Really it's about the idea that something like the kiwi cloak in the corner is related to the ornithological collection with the kiwi, which was collected by a field collector who was a soldier, and the soldier has his medal and his book. All of a sudden we can start making all of these connections between our collections and start navigating through and finding all these hidden meanings we didn't know about. So that's the simplest I could do for linked data. My trip started off with going to the Museums and the Web conference in LA, and again there were lots of linked data nerds there for me to hang out with, including going on a trip to the Getty, who have released all of their data from the Getty vocabs in a format that can be linked and shared. I then went over to the UK and spent most of my time at ResearchSpace, a project based out of the British Museum, but I also took the time to go off to the Wellcome Trust, the British Library, the Museum of London, the Tate and the Science Museum. I think it's really worth pointing out just what an amazing sector we work in, because everyone I met at every one of these institutions was incredibly open and sharing. Everyone was awesome. And I have to admit, writing this presentation was one of the hardest I've had to do in recent times, because my first version, which was up until yesterday, was actually just a whole load of photos of me standing outside different organisations and shaking hands with people from these places, and it was just going to be holiday snaps for about 20 minutes. It got really good, because my son, who was six months old at the time, came and joined me at the end and he was cute, and although it would have been fantastic for me, you guys wouldn't have got anything out of it.
So what I've decided to do is boil it down to a couple of pieces of software and tools that I got shown and that I think really help with that first step into linked data, a couple of examples of some really great projects that are doing this, and then what I think the future of this is. To do that I came up with a couple of questions that I was asking people as I went, and the first one was: is it meant to be this hard? It's really complex. Linked data is hard because you're swapping the way you think. You're going from thinking in tables and rows and columns and cells to, all of a sudden, thinking in a graph database, a network in which everything's interconnected. It kind of looks like this. At the beginning it was like asking a mathematician to solve a problem by writing a novel, because all of a sudden you're going from these cells to subjects, predicates and objects: sentences. You're writing a language, you have to take your records and weave them into this language, and it just melts your brain, especially when you think of the number of connections that can be made. We really struggled to get this, and then to explain to our internal stakeholders and our staff members why it was important to put all these ridiculous links in, because the more links we have, the more powerful it is. There were days when we were making these and I just walked out the door. So my question is: why is it hard? Does everyone find it this hard? Am I just the stupidest person in the room? And then the next question I was asking everyone, following up on that, is: what is the magic? We were all promised that if we put our stuff out as linked open data, we'd solve everything. The magic was just going to happen. You put your stuff out as a URI, boom, the world's connected, no need to catalogue, don't clean your data, you're going to be fine. You can imagine my current state of disappointment. Where is this magic?
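The shift from cells to subject, predicate and object sentences described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `example.org` base URIs, the field names and the sample record are all hypothetical, not the museum's actual schema or identifiers.

```python
# Hypothetical namespaces: illustrative only, not real museum URIs.
BASE = "http://example.org/object/"
VOCAB = "http://example.org/vocab/"


def row_to_triples(row, id_field="id"):
    """Turn one table row into (subject, predicate, object) triples.

    The row's identifier becomes the subject URI, each remaining column
    name becomes a predicate URI, and each non-empty cell becomes an object.
    """
    subject = BASE + str(row[id_field])
    return [
        (subject, VOCAB + field, value)
        for field, value in row.items()
        if field != id_field and value not in (None, "")
    ]


# A made-up catalogue row with one empty cell.
row = {"id": 42, "title": "Kiwi feather cloak", "madeIn": "Auckland", "collector": ""}
triples = row_to_triples(row)

for triple in triples:
    print(triple)
```

Each printed tuple reads as a small sentence about the object. In real linked data the object of a triple would often be another URI rather than a plain string, which is where the links come from.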
And the third question I was going with is: can dodgy data be linked data? Anyone pretending they don't know what I mean by dodgy data? I'm talking about the "unknown, unknown" records and the duplicates. We had 14 ways of saying Auckland in our database (we've cleaned them up) and Parnell was listed as a continent. It says something about our catalogue. So we've got all this and we're publishing it out there as linked data. Is that a problem? That was the question I was going out with. So, yeah, I got to the British Museum and met up with the ResearchSpace team. ResearchSpace is a Mellon-funded project. They are the team who have been working to map all three million objects from the British Museum as linked open data. They're also now working on aggregating data from the Rijksmuseum and the Yale Center for British Art into a combined search that allows you to see those connections and navigate through them. Honestly some of the most passionate and amazing people work there, and it's a team of four, which always blows my mind. Sitting next to them in the same office, although I didn't spend too much time with them, was the GRAVITATE project, which is looking at using linked data to connect different archaeological resources: different archaeological assemblages in different museums, trying to reconnect them, especially if you're talking about one object that's been broken up into multiple pieces that are scattered around different museums. It's a really great use of linked data, finding those connections and then bringing them together in a resource that people can use. So I went to these guys and said, is it meant to be this difficult? No. It's not meant to be this difficult. It's meant to be easy. And so they showed me a tool, the first tool, and I think if you're into this you should go and get... Has anyone heard of the 3M mapping tool?
Yeah, it is new. Really horrible link; just Google it, you'll find it. Essentially this is an online tool that allows you to upload an XML export from your source system, so for us Vernon, and it provides drop-down mapping guides for you. So quite simply, we import a primary production place, which is from our Vernon system, and it tells us, well, that's probably "produced by", and if you're using "produced by" you have to use "production", and if you're using "production" you have to use "took place". It walks you through the steps. It builds those relationships for you. It helps you encode the semantic value and helps you create those triples of subjects, predicates and objects. And because these drop-downs update as you're moving... Hello, Vernon. Oh, drop-down. It guides you and makes sure you can't mess up. It makes sure your data is meeting the standard, because I think one of the big problems is that this standard is so complex, but you want your data to comply. It's very easy to get out of whack and maybe just decide this is too hard and skip that step. So this helps ensure that your data meets the right standard. The other thing it allows you to do, and I haven't got a screenshot because it's really boring, is help with URI generation. If you don't have a way of publishing your data with a uniform resource identifier, it helps create them for you and provides the tools to export them online. And actually I think the best thing about this is that it encourages collaboration between teams. You're going to need your IT team to help you with this, because you're putting stuff on your website and you're going to need programmers and developers helping. But then your subject matter experts, your curators and collection managers, have all that knowledge of the mappings.
So my role when I was doing this at Auckland Museum was kind of the translator. Half the time I was speaking to our developers, trying to understand what they wanted, and the other half I was talking to the collection managers and curators, understanding the mappings, and I was trying to translate everyone's needs and hopefully doing a half-decent job. With a tool like this, you allow your collection managers and curators to go in there and do the mappings. They get to stay in their area of expertise, and the system's creating the links; it's creating the files and the formats that your development teams can then put online. It really helps bridge that gap and actually makes it a lot less scary. If you don't like the 3M tool, you can try the Karma tool. I won't go into this one because it's exactly the same, but with maybe a slightly nicer interface. Again, it's all about taking that really complex mapping and turning it into a workflow that's easy, and in this case quite visual, because it allows you to see record by record how it's been mapped, and again it's creating those outward links for you. One of the best features of Karma is that it has some data cleaning ability in there, and it also helps create links to external sources. The real tool, I guess my big takeaway from the trip, which is a bit of a silly one really: has anyone here used OpenRefine? Yay. If you haven't used OpenRefine, this is about to rock your world, because it's awesome. OpenRefine used to be owned by Google. They got bored and made it open source. It used to be called Google Refine. Essentially, you can upload CSVs and Excel files from your source systems and it can do some really cool cleaning with them. It's one of those tools we've been using for ages, but we'd only ever scratched the surface of what it could do. We used to use it for things like this.
You throw in some data from your CSV files or from your source systems (you're nodding like you've got this) and it does some key collision matching and finds all the near-miss duplicates. This is from our Pacific Cultures collection, and it's found five variations of how we describe model canoes, and the same with adzes and hair ornaments. You just select the one it's meant to be and it cleans up all your data for you in an automated fashion. What used to take hours can now be done in a couple of minutes. We're currently running through all of our subject, place and person records using this tool to clean up and standardize them. And this is the only bit we'd been using. While I was overseas, they showed me the named entity extraction module. Named entity extraction is about going into a free-text field, finding key entities such as people, places, names and subjects, extracting that information and creating links outward to, in this case, DBpedia. This is kind of the magic we were promised right at the start; it just requires the right tools to get it working. So in this case, we have the souvenir program from Armistice Day, "20 Years After". The named entity extraction has pulled out "souvenir program" as a thing, "Armistice Day" as a date, the play or concert "20 Years After", "Auckland" and "Armistice". Each one of those is actually a link to DBpedia, and in DBpedia we then have links outwards, from which we can also, of course, bring the data in and enrich our own collections. Just to cover my back: I wouldn't ever run this across everything automatically. We've been running it in chunks over different data sets so you can check what information has been linked outwards. So it's a technology for data cleaning, and it also helps with getting some of that magic and putting the links in.
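The key collision matching mentioned above can be sketched roughly as follows. This is a simplified take on the general "fingerprint" idea (normalise each value to a key, then group values whose keys collide), not OpenRefine's actual implementation, and the sample values are invented for illustration.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Reduce a string to a normalised key so near-miss duplicates collide."""
    # Fold accented characters to plain ASCII where possible.
    text = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    # Trim, lowercase and strip punctuation.
    text = text.strip().lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Split on whitespace, drop repeated tokens, and sort so word order is ignored.
    tokens = sorted(set(text.split()))
    return " ".join(tokens)


def cluster(values):
    """Group distinct values that share a fingerprint; keep groups of two or more."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]


# Invented sample values, in the spirit of the model canoe example.
names = ["Model canoe", "canoe, model", "Model  Canoe.", "Hair ornament", "Adze"]
clusters = cluster(names)
print(clusters)
```

A person still has to pick which spelling in each cluster is the right one; the clustering only proposes candidates.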
The other little tool it has in there is the ability to connect to simple URLs and APIs. We've just run a project using the Digital New Zealand Concepts API, which is a fantastic resource that has all of the New Zealand-based artists; it creates a unique identifier for them and then creates links within the Digital New Zealand data sets. We were able to run all of our data against this and have those connections made automatically for us. And then, of course, once we'd made the connection, we were able to pull in birth dates, death dates and spelling variations and again enrich our data. It was a really simple add-on. We just ran it overnight and then we had 10,000 new links and new enriched data. It was a wonderful experience. So going back to those three questions. Again, is it meant to be this difficult? No. There are tools we should be using; it's a bit stupid that I wasn't, because they're really easy to use and they make it actually quite an enjoyable experience. I think we did the Auckland Museum mappings in the 3M system in about a day, where we'd taken weeks to do it before. The magic? Well, you're still going to have to do some work, but again there are tools there that are going to help you create those outward links. And can dodgy data be linked data? Well, just because your data's structured doesn't mean it's right. You can still put it out there, but again, there are tools that make it so simple to clean up and create unique identifiers for your content. But I guess the real question I went and asked people, because these are kind of cool questions, was: is it working? Is it worth it? I mean, we're putting all this effort into doing it. Is it something we should carry on doing? This was the very last question, so they didn't hate me as I left. And I wouldn't be standing here if the answer wasn't yes, would I?
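The matching and enrichment step described above can be sketched like this. It is a minimal illustration only: the `AUTHORITY` list is a hypothetical in-memory stand-in for whatever a concepts API would return, and the names, URIs, dates and field names are all invented. A real run would call the API and would need human review of the matches.

```python
# Hypothetical authority records standing in for an API response.
AUTHORITY = [
    {"uri": "http://example.org/concept/1", "name": "Jane Example",
     "alt_names": ["J. Example", "Example, Jane"], "born": 1880, "died": 1950},
    {"uri": "http://example.org/concept/2", "name": "Sam Placeholder",
     "alt_names": [], "born": 1901, "died": 1977},
]


def normalise(name):
    """Lowercase, drop full stops and commas, and collapse whitespace."""
    cleaned = name.lower().replace(".", " ").replace(",", " ")
    return " ".join(cleaned.split())


def build_index(authority):
    """Map every known spelling of a name to its authority record."""
    index = {}
    for concept in authority:
        for label in [concept["name"], *concept["alt_names"]]:
            index[normalise(label)] = concept
    return index


def enrich(records, index):
    """Attach the authority URI and life dates to records whose maker matches."""
    enriched = []
    for record in records:
        result = dict(record)
        concept = index.get(normalise(record["maker"]))
        if concept is not None:
            result["maker_uri"] = concept["uri"]
            result["maker_born"] = concept["born"]
            result["maker_died"] = concept["died"]
        enriched.append(result)
    return enriched


# Invented local records: one matches a spelling variant, one does not match.
records = [{"id": 1, "maker": "J Example"}, {"id": 2, "maker": "Unknown maker"}]
enriched = enrich(records, build_index(AUTHORITY))
print(enriched)
```

Exact matching on normalised names is deliberately conservative here; unmatched records are left untouched rather than guessed at.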
And so I got reminded while I was over there by the team at the British Museum: one billion websites, 60 trillion web pages, and 25% of them have some form of structured data. By structured data we're just talking about something like schema.org, which is a really simple, broad ontology. We have unique identifiers. Many of the institutions in New Zealand are already using unique identifiers, and it's not new technology. We've been doing this for a long time now, so we're slowly getting a critical mass of information that we can query over. The example that kept on coming up from outside the sector was the BBC and BBC Things. Again, they've been doing this for about 10 years. Everything on the BBC website has a unique identifier, which means that every news article, sports reference, recipe, TV show, radio show and magazine, all the information in there, is made up of linked data, and you can find the connections between all of those different things. What's also impressive is that for something like Auckland here, they're including links outwards to other websites such as GeoNames and DBpedia, and they're encouraging their staff to enrich those databases. So instead of just enriching their own little amount of data, they're enriching it for everyone to reuse and constantly updating it. For the GLAM sector, but still the BBC, this is the RES project, which is the Research and Education Space. This is a BBC project to link up all of the large British organisations that are using linked data, the British Museum, the British Library, the Wellcome Trust, the Science Museum, collate all of their data, combine it and link it with the BBC archives, and then publish that as a tool that can go to educators. So what does this mean?
It means you can do a query like "show me everything relating to Scene 2, Act 1 of Hamlet" and you're going to get all the references to that from the British Library, as well as every time that scene has been played in a British TV show or on the radio. All of a sudden you get this huge wealth of information and are able to provide that to the kids so that they can start exploring it. Unfortunately it's only in the UK, but it's a really great example of how we can combine multiple sources, combine them with a third-party, almost commercial, partner and then provide a really rich experience. Also, while I was in the US I met with the guys from the American Art Collaborative, which is 14 museums including the Smithsonian who have all agreed to release their data as linked data and to work together to find the common connections between their collections. Again, you can see that we're starting to build a critical mass of people doing this, and soon we'll be able to connect and really find the relationships. I'm just going to rattle off a few more web pages. The Getty, as I mentioned before, has all of its vocabs available as linked open data, and it's a really great starting point for most museums, because we're probably already using something very similar. As I mentioned earlier, we've got a project right now, a six-month project, to clean up all of our place data and link to this. It means that we can now do queries such as "show me all the objects that come from a town that's near a mountain". I don't know why you'd want to do that, but it means that we can pull that information in. It's enriched, and we don't have to have that in our own data, because we're pulling all that geographic data from the Getty into our searches, enriching them, and we're able to do some really smart queries over our data.
And last but not least, I think it's really worth mentioning Digital New Zealand Concepts. It's been mentioned a couple of times today, I think because it's just a great resource, and it's that move from an aggregator that just collects lots of things to one that makes the connections and finds all of the connections that exist within Digital New Zealand. We've been using it to really enrich and improve our person data set, and it's just a fantastic resource. Everything these guys do is awesome, so good on you. And the future, I guess, is thinking: is it worth it? That's still a big question. I think we really need to remember that everything we deal with is already linked. Our culture is already cross-linked. It's interdisciplinary, it's cross-institutional, it's multilingual, so it's already a network of information. It's a living, growing resource, and we shouldn't have to squeeze it into a predefined box of rows, tables and columns. Linked data as a technology at least allows it to exist as a network, as a graph that can grow and expand. It also allows us all to keep our own language, perspectives and world view, while at the same time we're harmonizing all of that data into a single integrated resource that we can query over and reason over. But really this data harmonization isn't just about making links to things; that's just what we've got to do. It's about exploring, discovering and inferring new knowledge from our collections. Interesting data has relationships, and the more of us that get on board and start doing this, the more powerful and the more interesting our data will become. I like to think we can start breaking down the institutional silos that we've built up and start finding the stories and narratives that weave between all of our collections. We can start creating an environment for thought-provoking and engaging exploration, and I hope at the
end anyway we'll be able to start asking the broader questions of our collections that are going to lead us to more interesting, more valuable and, I think, hopefully more exciting results. And that's me. If you really want to nerd out with some questions, come along afterwards; I'm happy to show all the tools. Please no, please no... Adrian, hi Adrian.
How do you sell something like this internally as a big piece of work?
I think for us, we had 22 individual collections, each with their own standard and way of doing things, and so just internally this has helped us map between all of those collections. Once we got all of that stuff into one data set, it was the first time we could look down on everything and say, actually, there's this field collector: we have his uniform, we have his diary, we have the collection he found and we have his photographic stuff. That was the first time we could actually do that, and that's the story just from our own data. And then being able to say, well, now let's imagine if we had Te Papa's data and we had the Smithsonian's; we can really start building those stories and those narratives. Having examples helps. Most people walk in and say, oh no, linked data, here we go, but we've been able to tell those stories; it gets people excited and then they start wanting to jump in themselves. Hello, anyone?
How do you do the limiting? Because if you started linking up with all the museums in the world and someone does a search and gets overwhelmed with stuff, they're just going to turn off and say, oh, I don't want to do this anymore. How do you stop that happening, where people do it once and just turn off?
So the question is what happens when we have too much, when the search is too large. Well, I guess the way the mappings work, and the complexity of that network, is that you're really adding semantic data; you're encoding the meaning of your
data in the search values. So it means that when you're searching for Auckland you can ask: are you talking about made in Auckland, or born in Auckland, or found in Auckland? And so you can start doing some really complex searches on the meaning of the data instead of just doing a keyword search for "Auckland". Is that kind of what you mean?
But don't we have to agree on what Auckland is? What if Getty doesn't have the same definition of Auckland, or even of a small thing like that? Does that mean we end up having global places that everybody agrees on? That's the one thing I don't get. I can see the internal use.
We had that argument about two weeks ago, because the way that Getty defines the difference between nation and country was different to what we internally thought a nation and a country would be. I guess in this case we're just accepting that that's the standard we're going to use, but you can also do sameAs links, and so we've also pulled in the LINZ database, just to be able to make as many connections as possible. We're not trying to limit our focus to the Getty. But we had that argument, and in the end we just said we're going to agree with the Getty, because they've already had the argument. Obviously we'll agree with them. Thank you.
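The two ideas from this exchange, searching on the meaning of a link ("made in" versus "born in" versus "found in") and using sameAs links so institutions need not share one identifier for a place, can be sketched together. Everything here is an assumption for illustration: the `example.org` and `other.example` URIs, the predicate names and the sample triples are invented. Only the `owl:sameAs` property URI is the standard one.

```python
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
EX = "http://example.org/"  # hypothetical namespace

# Invented triples: three different relationships to "Auckland",
# one of which points at another institution's identifier for the place.
triples = [
    (EX + "object/1", EX + "madeIn", EX + "place/auckland"),
    (EX + "person/7", EX + "bornIn", EX + "place/auckland"),
    (EX + "object/2", EX + "foundIn", "http://other.example/places/akl"),
    (EX + "place/auckland", SAME_AS, "http://other.example/places/akl"),
]


def equivalents(uri, triples):
    """Collect every URI connected to `uri` through sameAs links."""
    found = {uri}
    changed = True
    while changed:
        changed = False
        for subject, predicate, obj in triples:
            # Grow the set when exactly one end of a sameAs link is already in it.
            if predicate == SAME_AS and (subject in found) != (obj in found):
                found |= {subject, obj}
                changed = True
    return found


def subjects(predicate, place, triples):
    """Find things related to a place by one specific, meaningful predicate."""
    places = equivalents(place, triples)
    return sorted(s for s, p, o in triples if p == predicate and o in places)


print(subjects(EX + "madeIn", EX + "place/auckland", triples))
print(subjects(EX + "foundIn", EX + "place/auckland", triples))
```

The second query finds the object even though its record uses the other institution's identifier, because the sameAs link declares the two place URIs equivalent.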