Okay, so, a very warm and supportive welcome to Connell Toy, who is going to be talking about Oceania, I don't know whether you pronounce the dot or not, digital. All right, let's do that. Oceania.digital: using Digital New Zealand and Trove in the linked open data context.
Okay, I'll just move this presentation over there. Where's my mouse? Okay, that's not what I wanted to say. Okay, so you're going to see two slides at once. So you get to see what's coming up, which reduces the element of surprise slightly, but that's okay.
Okay. "When the first long, clear whistling notes of the pīpīwharauroa, the shining cuckoo, were heard in the groves in the spring of the year, the Māori said to each other, 'Listen, the messenger of summer, the bird of Hawaiki. It cries, "Kōia, kōia", dig away. It is time to plant the kumara.' This was in the month of October or early in November, when the little shining cuckoo landed on these shores after its long flight from the South Sea Isles. So when the people heard that the owner of a cultivation was going to plant largely in kumara, say three or four acres, they would come in a body to help him with his planting. Twenty or thirty or perhaps forty men would come to kō the ground, and with them a number of women and children to plant the seed tubers and to look on. The kō was about seven feet long, sharp at the digging end, with a step or rest for the foot about a foot from the point. When the people gathered at the māra, the cultivation plot, the priest karakia'd the field, chanting his incantations to Maru and the gods of the kumara. Then the head man of the visitors, taking his stand in one of the corners of the māra, cried in a loud chanting voice, 'Tēnei au e tū nei, me taku kō i tōku ringa. Kei whea te tangata nāna te māra? Haere mai nei.' Here I stand, my kō in my hand. Where is the man who owns this cultivation? Let him come here. Then approached the owner of the field, and he showed the digging party where he wished them to work."
So these extracts from James Cowan's book The Maori: Yesterday and To-day are for me a metaphor for the challenge we face in realising value from our digital collections. The collecting institutions are the owners of these fields. They must rely on the work of others to bring those digital resources, little fragments of sweet potato, to life and to realise their full value. As a technologist, I see the challenge for me as being to champion technologies that can enable that kind of wide collaboration: tools which our visitors and our partners can use to chip in and do their bit to grow our shared cultural heritage. What is the equivalent of the kō for cultural heritage technology and collaboration in the 21st century?
So, about me. I'm an independent software developer and IT consultant. I'm originally from New Zealand, but I'm now based in Brisbane, Australia. And for work I help digital humanities researchers and people in the cultural heritage sector to unlock value from their collections, metadata, transcriptions and so on. I help to make those collections fit for new purposes by automating conversion and enhancement, data mining, analysis, visualisation. I've been doing this kind of work for about 15 years. And I see myself as this person with the kō in their hand. For me, this is my computer. And the work that I try to do is to bring that metadata to life and allow other people to contribute to it and to weave it into a much larger and greater whole that we can all appreciate in our own way.
So apart from my work for paying clients, I have a personal project which is called Oceania Digital, and this is what I'm presenting about today. So onto the next slide. This is partly a vehicle for me to explore technical challenges and to try out ideas, a kind of a sandbox, if you like, for my own work, to learn and to play with technologies that are new.
And it's kind of a testing ground, but I also hope, increasingly, to offer it as a kind of public service that people can use, a real production service that will offer people some real value and to which they can contribute. So that's where I'm going with Oceania Digital. And so although it's really me, I have some other people who are interested in it. If anyone else is interested in contributing to it in any way, I'd be really grateful if you would talk to me about it, because I certainly don't want it to be my personal hobby. It's really something that I hope to interest the community in more broadly.
So my method with Oceania Digital, my kō, is linked data as an enabling technology for this kind of collaboration. The linked data buzzword, also known under its other name, Resource Description Framework, or RDF, also known as the semantic web, is at its core a technology for organising knowledge in the form of a network, an interlinked network, or a graph, as mathematicians would call it. But not a graph in the form of a bar chart or something; a graph in the sense of a network of nodes connected together by lines. And so this technology represents knowledge in the form of a network, as opposed to traditional forms of organisation, which tended to focus on individual information resources more in isolation and to describe each one with a richly structured document, a metadata record about some resource. By contrast, the semantic web approach treats those individual resources as nodes, and it describes the relationships that they have with other nodes by connecting lines, or properties, or predicates, as they're sometimes called.
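As a rough illustration of the network idea just described, here is a minimal sketch in plain Python that holds a few statements as (subject, predicate, object) triples. Every identifier in it (the `ex:` names, the titles, the blog post) is invented for the example and is not taken from Oceania Digital's data; a real RDF store would use full URIs and a proper library.

```python
# A tiny in-memory "graph": each triple is one thread connecting two points.
# All identifiers below are made up purely for illustration.
triples = {
    ("ex:photo1", "dc:title", "Planting kumara"),
    ("ex:photo1", "ex:depicts", "ex:person1"),
    ("ex:person1", "foaf:name", "A. Example"),
}

# Someone else can add their own thread that connects to an existing node,
# without touching the statements that are already there.
triples.add(("ex:blogpost9", "ex:discusses", "ex:photo1"))


def describe(node, graph):
    """Return the threads leaving a node and the threads arriving at it."""
    outgoing = sorted((p, o) for s, p, o in graph if s == node)
    incoming = sorted((s, p) for s, p, o in graph if o == node)
    return outgoing, incoming


if __name__ == "__main__":
    outgoing, incoming = describe("ex:photo1", triples)
    print(outgoing)
    print(incoming)
```

The point of the sketch is only that a node is not owned by one record: the photo is described by its own statements and also by whatever statements other people attach to it.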
So there are pros and cons to each of those approaches, the traditional approach and the semantic web approach, but what I hope to explain today is why the balance is tilting gradually towards using semantic networks: the continuing exponential growth in actual digital collections makes the advantages of that network-based data structure increasingly valuable. And so over time I would expect these kinds of technologies to eat into those practices and to become more generally practised in the cultural heritage sphere. For instance, Jonathan's talk, just prior to this, using this Fedora-based application, is an example of people using linked open data as an actual technology in their application. The Fedora, the Islandora application, is a linked data store.
So why do I say that collaboration with the public is so essential? I would tell you it's because no collecting institution can tell the full story about their own collection. The curators of these collections can of course tell a story about their collection, but that's necessarily a partial story. It cannot possibly be the full story. What would a full story even be? It would mean being able to describe how the items in the collection relate to the lives of the people who are, say, depicted or described in those items, or for whom those information resources were produced, or who produced them. But it would also mean describing how those historical resources relate to people today: how do people feel about them now, what do they mean to us today? All of those relationships are really what brings meaning to those items. And so there can't possibly be a full story. Every story is only one story for each one of those items. So if you look, for instance, at, say, a historical photo, that photo is a witness to the life of everyone who's pictured in that photo.
It's a witness to the fashion and the hairstyles of that day. And, you know, it's also a witness to the state of the art in photography, the photographic techniques, the photographic technology that was used to generate those things. Those are all stories of various kinds, different kinds of stories that can be told about that item. And if you like, that item can be woven into a number of different threads that cut across it in various different ways. And this is why it's important to have a technology which enables those different stories to be told, and for an individual item to belong not just to one metadata record but to be part of a multiplicity of stories, because of the multiplicity of connections between those items and the historical context that they were in and that they will be in in future.
The challenge for memory institutions is to make public what they do know, the data that they do have, but to do so in a manner which makes it easy to combine with the knowledge of other people, and to make best use of that broader context that the rest of society can bring to it: the work that bloggers and commentators and the staff of other cultural heritage institutions are doing to enrich the collections that are published by one institution. So the answer for these collecting institutions is to publish their knowledge not as large blocks of data but, as far as possible, as a network: to break those blocks of data up into individual threads, each of them quite small and with a very simple and regular structure, and then to weave those threads together. And the point of these threads is that each one connects two points. Someone else can add another thread and connect to that point, someone else can add another thread and connect to that point, and so on, and so together we can build a collaborative embroidery of knowledge.
So to put it another way, the practical reason that linked data is so important is that it's an enabling technology for storytelling, because each of these items provides factual evidence for the stories that it participates in, but those stories also then provide a context in which those items can actually be understood. The context actually then brings meaning to those items. And linked data's key advantage is in how it makes it easy to weave those items into a multiplicity of stories. And so to me this is the reason why in the future, as our collections grow and as we have to rely more and more on the public, and are less and less able to rely exclusively on the skills and knowledge of collecting institution staff, we have to be able to base this information, these knowledge structures, on a format which can be interwoven and which is open to being interconnected.
So I'm about halfway through, which is great, because at this point I'm going to switch to my browser and give you a couple of browser tabs, if you can just bear with me just a second. And I'm going to tell you a bit about where I'm up to, just quickly, and a few of the challenges that I've faced, and hopefully then if people have got any questions I'm happy to have you sing them out right now. Sorry about that.
Okay, so this is the website of my Oceania Digital. There's not a lot to see at the moment. That's because it's taken me a bit longer than I thought to actually get to a point where there's something useful. If you are interested in it, the one thing I'd recommend doing is following this account, Oceania Digital, on Twitter, because as everything new is introduced it'll be mentioned there first. Or you can contact me by clicking on that contact link too. So this data store, Oceania Digital, is a website that's running on a virtual machine in the Amazon cloud, in the Sydney data centre of Amazon.
And what it contains is several million metadata records which I've harvested from Digital New Zealand and from the Australian equivalent, Trove. And I also have some metadata records which were acquired from the Alexander Turnbull Library. So I have these three data sources to begin with, and I want to bring together whatever else I can. But starting with those three seemed to me like a good place to begin, because they already have a very broad coverage, and really it's a question of taking all the data from the APIs of those two services, harvesting it, converting it into RDF and storing it in my RDF store. And then everything's built on top of that.
So, what I've managed to do so far: there are basically four phases to it, so I'll quickly go through those. First off is acquiring the data, and that has been relatively easy and also very hard. The easiest has been getting the data from the Alexander Turnbull Library: two large data files, which I simply downloaded. Couldn't have been easier. The second one was Digital New Zealand. Now, you can't just download Digital New Zealand's data, sadly. Instead, they do provide an API which allows you to query it, basically to search it, in fact. And I searched for, in other words, everything, give me everything. I searched for a space, I think. What I got back then was about three million records. And when you send a query you get back 100 records, then you query for the next 100, and so on. So it was about, what is that, 30,000 queries, and I got back all those records, which I've saved and then converted to RDF and stored in my triple store.
So then what you see here is a list of the various properties that were used in those records. You can see everything's got an identifier and a title. If you scroll down to the bottom of the list, these are some properties that are relatively rare. Not many metadata records have holdings or alternate titles and so on.
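The paged harvesting described above (100 records per query, about three million records, hence about 30,000 queries) could be sketched roughly as follows. This is only an illustration of the loop: `fetch_page` is a hypothetical stand-in for a single API request, not the actual Digital New Zealand or Trove client call, and the stopping rule is an assumption about how such an API signals the last page.

```python
import math
from typing import Callable, Iterator

PAGE_SIZE = 100  # records returned per query, as described in the talk


def queries_needed(total_records: int, page_size: int = PAGE_SIZE) -> int:
    """How many paged queries it takes to retrieve every record."""
    return math.ceil(total_records / page_size)


def harvest(fetch_page: Callable[[int, int], list],
            page_size: int = PAGE_SIZE) -> Iterator[dict]:
    """Yield every record by asking for page 1, 2, 3, ... until one comes back short.

    fetch_page(page, page_size) is hypothetical: it stands in for one
    search request and returns that page's records as a list.
    """
    page = 1
    while True:
        records = fetch_page(page, page_size)
        yield from records
        if len(records) < page_size:  # a short or empty page means we are done
            break
        page += 1


if __name__ == "__main__":
    print(queries_needed(3_000_000))  # about three million records at 100 per query
```

In practice each harvested page would be saved to disk before conversion, so that an unreliable API only costs a retry of one page rather than the whole run.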
But as you can see, this is still where I'm up to: analysing the data that I've got. What have I actually got, and what connections can I make within that data set? Trove was much more difficult. Trove's API was a lot less reliable than Digital New Zealand's, and I spent weeks actually going backwards and forwards, working my way through various technical problems and actually getting the data out. So that was phase one, getting the data. Phase two is to convert it to RDF, and as you can see, this is a summary really of the data that I have converted. Then thirdly is to find the connections between those different records, because there are a lot of connections there that are not made explicit in the data set but which are really there, and I can actually find them by looking for things that match, names of people that match other names of people in other parts of the collection, and actually build those relationships and build a network that way. And the final thing then is to connect the system up so that there are user interfaces, search engines and browser plug-ins or what have you that will enable people to actually get some real value out of this knowledge network.
And what I have here, which I will quickly show you before I'm completely out of time, is one of these ones here. Okay, so this is the Auckland Museum site. Now, this page on the Auckland Museum site has a metadata record which they contributed to DigitalNZ. I've harvested that and I've converted it to RDF and entered it into Oceania. And if you scroll down, this is not the most exciting demo, because really it's a very small thing, and the idea is to make it clear what's going on rather than to knock your socks off. But if you scroll down to the bottom you can see these other links. This is a picture of a kō, and they've got links to other kō on other websites. And if you hover over this you can see a little tooltip pop up with that label.
Now, that label was added. This is not part of their website; this is something that my browser plug-in has added here, okay? What's happening is that when this page loads, I have a piece of JavaScript embedded in the Auckland Museum page which looks through all the links on the current page and then queries Oceania Digital: what do you know about this link? About that link? Et cetera. And it brings back all the metadata that it knows about those pages, including descriptions, which are the same descriptions that Puke Ariki, for instance, has contributed to DigitalNZ. So that's it for me. I have basically run through where I'm up to, and I'm happy to take any questions.
If anyone's got any technical or tricky questions then you might want to leave them to later.
No, no, by all means.
So if you've got a question then please put your hand up and we'll bring you a microphone. Stuart.
Are there any collections that are in both Trove and Digital New Zealand? Do you think there'd be a lot of Australasian things that would be in both?
That's an interesting question, and I expect that there probably are. I mean, I know for sure that there are plenty of people who are present in both, and that goes right back to the early days of colonial settlement; there were people who went backwards and forwards across the Tasman. I'm trans-Tasman myself. That's nothing new. So I would certainly expect that I would find plenty of links between the two collections. If you're talking about individual items that might be in both, I think yes, quite likely. I haven't found any yet, but that's really because of the hold-up I've had with Trove's data, and I hope soon to have Trove's data in there as well and be able to be making some of those connections.
Nicola.
Kia ora, Connell. Thank you. I'm interested in the Alexander Turnbull Library data sets.
There are two: one for the unpublished collections, to the Encoded Archival Description standard, and one for the people, Encoded Archival Context. So I'm particularly interested in whether your tools will help to disambiguate or pull together the same individual who may be in the different data sets you've got. How far have you got with that sort of work?
Well, early days, let's say. Early days. But yes, I appreciate the question. It's a good question, and I hope in general that this kind of approach, the knowledge network as a technology, will really help with that disambiguation work, authority work, because it provides you with a way to link things up: to link, let's say, a person up with a whole lot of information about them, which might be biographical, might be pictures of them, et cetera, and to provide you with a whole lot of context that could be used either by a human or by some kind of algorithm to support that kind of matching with some reliability.
Connell, I was just wondering if you could give us a peek at the source for that page, the headers in particular. I'm really curious; I'd like to see the technique that you were talking about there.
Sure, I can show you the source. The source of this page, I should say: this page is actually the Auckland Museum page and I haven't changed it. OK, so it's the URL. You won't see anything in the URL. What's actually running here, this JavaScript, is this browser plug-in. If I open it, this is it: Kaiwhakatere, which is Māori for navigator. This is... oh, turn it on. If I open up the dashboard. So this is a bit of JavaScript that I've plugged in via my browser. OK, but it certainly could be attached to their website, and I would hope that in future people who have collection websites would use this kind of JavaScript to enhance their site, to add extra value to it, and to allow their site to benefit from the web of knowledge that surrounds it.
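The lookup the plug-in is described as doing (collect the links on a page, ask the store what it knows about them, attach the descriptions as tooltips) might look something like the sketch below. It is written in Python rather than the plug-in's JavaScript, and it is not the actual Kaiwhakatere code: the function names are invented, and the use of `dcterms:description` on the linked page's URL is an assumption about how the descriptions are stored.

```python
from html.parser import HTMLParser

UNSAFE = set('<>"{}|\\^` ')  # characters that cannot appear inside a SPARQL <IRI>


class LinkCollector(HTMLParser):
    """Collect the absolute http(s) links found in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href") if tag == "a" else None
        if href and href.startswith("http") and not UNSAFE & set(href):
            self.hrefs.append(href)


def collect_links(html):
    collector = LinkCollector()
    collector.feed(html)
    return collector.hrefs


def build_query(urls):
    """Build one SPARQL query asking for a description of each linked page."""
    values = " ".join(f"<{u}>" for u in urls)  # VALUES limits ?page to these links
    return (
        "PREFIX dcterms: <http://purl.org/dc/terms/>\n"
        "SELECT ?page ?description WHERE {\n"
        f"  VALUES ?page {{ {values} }}\n"
        "  ?page dcterms:description ?description .\n"
        "}"
    )


def titles_for(urls, rows):
    """Map each link to the tooltip text found for it; rows are (page, description)."""
    found = dict(rows)
    return {u: found[u] for u in urls if u in found}
```

Sending the query to a SPARQL endpoint and writing the results back into each link's `title` attribute are left out; the sketch only shows how the three steps fit together.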
But basically, what have I got here? Go away. I don't want to go away. OK, OK. But if you have a look, there it is there. So this is the JavaScript code. I'll just quickly scroll through it, just to give you an idea of the scope of it. It's not very big. And that's it. So basically, what it's doing is it searches through the current page. It generates a SPARQL query which says, what do you know about these links? And it sends that off and gets back some results, and it goes through the page and attaches those descriptions to the various links as the title attribute, the pop-up. But, I mean, ideally, what I would like to do next is not just to set the title but to add thumbnail images and to have pop-ups that really have a lot more information, because there's a lot more information in there in the data, and all I'm using is the description at the moment.
That's fantastic. Thank you.
Thank you very much. Sorry. Is it time?
If it's a quick question, yes.
I'm not sure. I can't guarantee that. It's really interesting that you've gone with the user script kind of approach to this. Do you see this as being something where the user might select from particular data sources? So rather than everyone getting the same data, you can sort of customise what you're getting in some respects.
Yes, I do. And, I mean, at the moment, the data that's currently in the Oceania Digital data store is all data that's come out of the cultural heritage aggregators. But what I imagine over time is that other data would be available through it, let's say from DBpedia, from Wikipedia, and perhaps from others, from QuakeStudies and so on, and annotations that other people had contributed. And that's where it might start to grow to become kind of unmanageable.
You'd have a whole lot of extra data, and you'd need some way to filter it, and perhaps you'd want to follow certain people and say, yeah, I'm interested in what so-and-so says, or I'm interested in certain user groups, communities who would contribute something. Yeah, and I think it would be a management problem, but it would be a great problem to have, I think.
I'm sure there'll be many conversations that continue over the course of the conference. So thank you very much, and thank you for your grace under pressure with our technical fiddliness. So if you're up for unexpected connections, reimagining the 19th century through generative art, then stay in this room. Otherwise, move on to your next session.