Kia ora koutou. We're going to get started because we've got a lot to fit in this morning. I'm delighted to welcome Tim Sherratt, the Associate Professor of Digital Heritage at the University of Canterbury, a historian. Oh, sorry. University of Canberra. Close. You can see what I did there. Come to New Zealand. MDF is from University of Canterbury. He's a historian, a hacker. He also researches the possibilities and politics of digital cultural collections. He's also a seasoned presenter here at NDF and spoke as a keynote in 2012. It's so great to have you back, Tim. Come on up and tell us about data and stories. Thank you.
Thanks, Fi. Of course, as I said yesterday, I'm glad to be back. I always love coming to NDF. It is my favourite conference around the world. In 2012, it was actually a keynote I did with Chris McDowall, where the NDF organisers challenged us to make things and present them on stage. I did. I made a hacky demo of how narrative text could be enriched using linked open data. It was riffing off some stuff that Chris had done and that Jon Voss, the keynote the previous year, had done around linked open data. That was 2012. I've made several attempts at taking this further in the years since, most of which I've given up on. The latest version is shaping up okay. I still think it's important. I thought it was a good time to bring it back to where it started, here at NDF, and talk about where it's at. I suppose this is also my little reminder that not all data has to be big. Not every digital project has to finish in a sprint. It's a story of small, simple technologies and slow-burning passions. Let's just start by stating the obvious. Linked open data is about making connections. There's the connections between things, so relationships between people or between a person and a place, the subject of a photograph, all expressed in a nice, structured, standard form. That's the linked part of linked open data.
There's also the connections between the people and organisations that create linked open data and those who use, enrich or explore that data. That's sort of like the open part of linked open data, the shareability. I think it's fair to say that more effort has been put into the connections between things than the connections between projects, users and audiences. Interestingly, both these types of connections are dependent upon context for their full richness and meaning. The fact that two people know each other, or that a person was in a particular place at a particular time, gains meaning within the context of a specific story or argument or narrative. Likewise, what we share and how we actually share it depends on the sorts of audiences and collaborators that we want to connect with. It's the specific contextual character of these connections that really interests me. How can we explore linked open data within the context of a particular narrative, a particular story? How can we use linked open data to connect to different audiences, to expand the reach of our research and our collections? Rather than large-scale production of linked open data sets, I'm interested in small-scale creation and consumption, in capturing pockets of interest and expertise and sharing what we know. I'm a historian. My argument is that historians make linked open data all the time. They just don't know it. Think of a normal historical research process. You're gathering information about people and places and events and sources, and you're going to great lengths to try and understand and define and document the relationships between all those sorts of things. But then we get to that process which, for some reason, we call writing up. In that process of writing up, that data gets squeezed out. All the richness of those relationships that you've documented sort of gets chucked out as you gradually build your story.
You may glimpse their ghostly remnants in a footnote, but that's about it. In a way, that's a good thing, because part of the skill of being a historian is being able to take all that mess of data, all those relationships, all those things, and create a compelling narrative from that, to put it in the form of a story which communicates something significant to your audience. But why can't we have both? Why can't we have the story and the data? And that's really what this is all about. James Minahan was the Melbourne-born son of a Chinese father and white Australian mother. When he was five, James' father took him to China. When he returned to Australia 26 years later, he was arrested and declared a prohibited immigrant under the White Australia policy. But was James really an immigrant under the law? He'd been born in Australia. The case ended up in the High Court of Australia and continues to have resonances today when, of course, we're still having these questions and debates about who belongs. Kate Bagnall, my partner in many things, recently published an account of the James Minahan case in the journal History Australia. But the story is so rich in detail, and in order to actually have it accepted in a conventional academic journal, Kate had to cut back on the narrative parts and boost the theory parts. So we're working on another version of this story, a LODBook version, which puts back all of Kate's narrative and adds in much more in terms of people, places, relationships and resources. So let's have a little look. As the theme of this talk is basically why haven't I done more over the years, this is all sort of unfinished, it's a very rough demo, but hopefully it gives you a bit of a sense of what we're trying to aim at. You'll see at the bottom of all these pages a little note saying it's going to be published completely in September 2018.
We're keeping up there with our total missing of all of our deadlines in this project. Anyway, it doesn't look very exciting. It's basically just text, and that's deliberate. You know, I want the text to be central to the experience. As you scroll down the page, little things wobble and do stuff. And if you happen to mouse over the various names, you see the little thingies wobbling over there as well. So you're getting a sense there's a relationship happening between the text and what else is going on, and hopefully it's encouraging you to start to explore. If you click on one of those links, we get a little box, and you can do the same sort of thing over here as well. So we've got people and places, and eventually, playing around, you'll probably notice that the colours represent different things, so we've got people, and we've got a place here, and different sorts of resources. When you click on a footnote we get a nice footnote; you click on the link and we actually get the resource that's referred to in the footnote. All of these, of course, link through to more details about those things. So we click on there, and we get lots more information about James Minahan. We start to see the sort of detail in terms of relationships within this story, and of course you can navigate any of those relationships to see more about that person. We see the resources, the documents that have been used in the story, and where they refer to parts of James's case. And going back the other way: we came from the narrative through to the data, but here you can see where that particular person is mentioned within the narrative text, and you can jump back to see the context of that. So the aim is to start building up that sort of relationship between the conventional narrative text and the data underneath.
That sort of enables you to move fairly easily between the two and to explore the relationships that are there. If you just want to read the text, you can just read the text; if you want to dive in and follow the relationships, you can. Of course there are various ways that you can then start to navigate that. You can just have a list of the various types of things, so the various resources that are used in this. This is a particular document produced by the High Court of Australia, in the National Archives, and it gives you information about its context in the archive, so you can follow it to see what file that was in and things like that, and look for other things from the National Archives of Australia that are used within this resource. So it's also an interface to the different institutions which have contributed material to the resource. Those are fairly conventional sorts of lists, but you could also view it other ways, so we could just throw everything onto a big wall. These are all the different types of things which are currently in there, the people and the places and the events and everything. Still lots to put in. So you can navigate it that way as well, and of course you can fairly easily throw the places onto a map and see what we've got going on here. Milben, that would be exciting. And there are the relationships even between the places, between the big places and the little places and the containers. So yeah, I won't go on. But you understand that you've got that sort of network of entities, of people and places, sitting underneath there, and you've got the narrative text sitting on top. Now what's also going on is that all that information is being provided out to the world in a nice structured format as linked open data. So the interface is something that people can use, but of course we also want to provide something that computers can understand, can read.
And on the bottom of every page the data is all in there, so you can click on there. It's sort of a little API, a pretend API, in terms of delivering the structured data within that page. It's embedded within the page itself but also available as a separate file, so you could just put .json on the end of every page, every URL, and get the JSON structured data out of that thing as well. Okay, so that's the little demo. And I won't do it now, but there's a thing called the Structured Data Linter which extracts structured data from a web page, so if you feed one of those URLs to the Structured Data Linter you get back a long list of the things and their relationships and their properties, just to prove that all that data is actually sitting in just that normal old page. Now obviously, yes, our September deadline has gone, but we're still aiming to get that version of James Minahan in. I'm also working on some other things. I'm involved with a project on aviation heritage at the moment, and one of our outputs from that will be a logbook relating to the history of civil aviation in Australia, and that'll be pretty cool because it's relationships between places, airfields and aeroplanes, and aeroplanes have really good identifiers. So we can start to look at where aeroplanes were and who owned them and all that sort of stuff, which will be fun.
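To make the point about the data "sitting in just that normal old page" a little more concrete, here is a minimal sketch in Python of what a tool like the linter does at its simplest. It assumes the page embeds its JSON-LD in a `<script type="application/ld+json">` element, which is the common convention, though the talk doesn't show the actual markup. The sample page, the example.org URL and the property values are made up for illustration and are not taken from the demo itself.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the contents of <script type="application/ld+json"> elements."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        # Only start collecting inside JSON-LD script elements.
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.documents.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


def extract_jsonld(html):
    """Return every JSON-LD document embedded in an HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.documents


# Hypothetical page: the structure and values are illustrative assumptions.
SAMPLE_PAGE = """
<html>
  <head>
    <title>James Minahan</title>
    <script type="application/ld+json">
    {
      "@context": "https://schema.org",
      "@id": "https://example.org/people/james-minahan/",
      "@type": "Person",
      "name": "James Minahan",
      "birthPlace": {"@type": "Place", "name": "Melbourne"}
    }
    </script>
  </head>
  <body><p>James Minahan was born in Melbourne.</p></body>
</html>
"""

if __name__ == "__main__":
    for doc in extract_jsonld(SAMPLE_PAGE):
        print(doc["@type"], doc["name"])
```

Nothing here needs a database or a triple store: the page is fetched as ordinary HTML and the structured data is read straight out of it with the standard library.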
Now, since 2012, one of the reasons why I haven't done more, and this is my excuse, that's why I'm late with my homework, is that I sort of assumed it was such an obvious thing to do that somebody with more money and better programming skills would actually do something much more exciting and interesting, so why should I invest too much time? And certainly there have been some interesting developments since 2012. You've got new platforms for publishing which embed all sorts of new possibilities, things like Scalar and Manifold. In particular, Omeka S. I mean, I assume you've all heard of Omeka, but Omeka S, the S for semantic, is a new version which came out this year and adds a whole lot of great abilities within Omeka to not just upload items but to define relationships between them using standardised vocabularies, so basically to produce linked open data. You can get the output out as JSON-LD, which is a structured way of sharing linked open data. The one down the bottom there is really interesting: dokieli, which is basically an editor for creating texts with linked open data. So it sort of does some of the things that I want to do, but not quite, but it's still something to certainly keep an eye on if you're interested in this space. Really, it's not that I want more, it's really just that I want less. I just want simple, and that's been one of the motivating things throughout this. I set myself some basic ground rules back in 2012 around using simple tools, and that's still really important to me: the fact that you can create linked open data without a triple store. What you're seeing here in this demo is just static HTML pages with a bit of JavaScript bells and whistles on top. There's nothing fancy about it. So it's using these particular tools. You just basically start with some data, and that could come from anything really. It could come out of Omeka S, it could come out of a triple store, as long as I can get it out in JSON-LD and then
do a bit of manipulation and load it up. At the moment the stuff for James Minahan is sitting in a series of Google Sheets, as Kate's sort of continuing to update it, and I just run a script to pull that out and get it ready and plug it in. JSON-LD, as I said, is a way of sharing linked open data using JSON, which is a way of sharing data in text formats which developers are all very used to, so it's a very convenient way. It's probably been the most significant thing which has changed since 2012: the development and uptake of JSON-LD as a way of sharing and embedding linked open data. So the data for this, if you want to, you can just create in a text editor as a YAML file, which is another way of representing data in a text file. So basically you could do all this just with a text editor. And of course then you have your text itself, and you can use a tool like Pandoc to take your Word document, just run it out, and get it in Markdown format, which is a text format. Then you just define the relationships between your narrative text here and your data file. There's some little tags in that text there which sort of look like that. Then you feed your text and your data to a static site generator called Jekyll, which just takes all that stuff and spits out a whole lot of HTML pages. I've created a plugin which takes that data file, creates an HTML page for everything in that data file and defines all the relationships between the different bits and pieces, and then there's a Jekyll theme that I've created which does the sort of JavaScript doodads, the things that wobble and do stuff on the page. But there's no reason why you have to have that theme. At the moment they're all sort of lumped together, but in the end that should be separate: the plugin that creates the relationships and the documents should be separate from the way you actually represent that. You don't have to have wobbling balls. That sounded much ruder than I expected. As I
said, the result is just plain old HTML that you can upload to any web server, and the data is embedded within those pages as JSON-LD. So no databases, no platforms, and this is really important to me in terms of preservation and sustainability. You can zip up the whole thing, plonk it in your repository, and then sometime later somebody can unzip it, open it up, and it will work. I'll save you from the story of LODBook 2. You might be wondering what happened to LODBook 2. It's gone missing in action, basically because I tried to adopt the new coolness in terms of building it in Angular and ended up in totally a space that I didn't want in terms of the ability to sustain and preserve it. OK, but wrapping up, there's a question here, and that's the why bother question. I mean, you have to put a bit more work in here to make sure you've got your data in a form that you can plug in and do all that sort of stuff, and why would a historian bother with that when they can submit their article to a journal and get their brownie points, and that's all good? So what's in it for historians? Why should they care about linked open data? Perhaps the answer to that is that linked open data should support them to do new and interesting things, not the other way around. So for historians, the point of something like LODBook is not to create linked open data, because why would they want to do that, but to expose their research in interesting ways, ways that bring the richness and complexity of the work that they do to the foreground, that offer new opportunities for engagement and new audiences. The linked open data is just built in, just an added bonus. And of course there's the preservability that I mentioned before, that your data and your text stay together. But there's a bigger picture as well, and that's that instead of dumbing down our publications in order to fit into a PDF, we can open them up to new connections. I mean, our James Minahan LODBook of course is going to be linked out to resources in a
range of different cultural heritage institutions, so to the National Archives, there's all sorts of stuff. We can move from PDFs to portals. What I want is historical narratives to be contextually and critically rich gateways to our cultural heritage collections. Every publication should be a finding aid. I want publications that are not just the final output of a research process but starting points for many journeys of online discovery. And there's some other cool developments, things like Webmention and Linked Data Notifications, that make it conceivable that cultural heritage institutions could be alerted when their collection items are used in these sorts of contexts. They could then retrieve the linked open data, the nice open structured data: grab the web page, parse the data out and ingest it into their own systems. Collections would then become embedded within a structured web of scholarship and storytelling. Okay, it's not going to happen quickly, but that's alright. Stay tuned for the next instalment in about six years' time. Thanks very much. And I should say that I did share these slides via the NDF Twitter stream if you want to grab the links in it.
We've got a little bit of time for questions, if there are any questions. Kia ora, Michael.
It's on? Okay, thank you. Can you talk a little bit about the pipeline of data? You showed a little bit of the markup, and I'm thinking in terms of authoring environments and writing this. So two aspects: one's just ease of use, how easy is it to write something like this if you were doing documentation, but there's also the aspect of how you are getting the data into this, that last jump.
Yep, so I sort of wanted to leave it as open as possible in terms of possibilities for getting the data in. In terms of what it expects to see, Jekyll will pick up automatically either JSON or YAML, so that's the sort of format it's expecting. My plugin expects things to have at least a schema.org name and
type, and so that's the sort of very base level requirement. As I said, at the moment our data for James Minahan is just sitting in Google Sheets, because that was what was easiest for Kate and her research assistant to be plugging stuff into, and it's easy enough to get the CSV and just do a little manipulation of that to get it into YAML. So I mean there's a custom script involved there, but I could set up a sort of shared Google Sheet with a standard script which would enable people to go down that route if that's what they were interested in. Similarly, I've got some templates which use Airtable, which is a cloud-based database thing, so you could actually use that as an API. I can pull that out fairly easily and make it available. So I'm trying not to say that this is the method that you should use for creating the data, but to leave it as open as possible. This may be, again, because I'm just lazy, but I did want it to be simple and as much as possible built around people's research practices, rather than saying, well, this is the technology, this is the way you have to do things. In terms of the text, the narrative text itself, well, you can just write it in Microsoft Word and then use Pandoc to get a Markdown output and add in the tags, so that's pretty straightforward too.
Hi, Siobhan.
Hello, we finally met.
Yes, well, we'll get to talk later.
I'll come stalk you later. Have you seen the work that JSTOR Labs is doing?
Recently JSTOR Labs, they're reimagining the monograph, and when they did that project my brain was exploded, basically, by it, and it just seems very similar to the type of work you're doing here. I was wondering if you could comment further, if you know that work.
No, I don't, but see, I always said somebody else would do it.
They haven't actually done it, they've just explored the possibility, and it was about, yeah, very similar themes across both these projects that you're talking about, yeah.
Yeah, no, I haven't seen that one in particular, but certainly there's a few of these sort of reimagining exercises going on around the world at the moment. I think there's one, is it University College London or one of those UK institutions too, that has a publishing platform, and I mentioned Scalar, which enables you to have that sort of non-linear narrative and rich media and that sort of thing embedded within your publication.
They also did an original project right at the start about a Zambezi expedition, which pulled in a whole load of research from images and specimens. It just blew my mind. I just wish more people were doing this.
Yeah, and there's Manifold as a new one, which similarly enables you to have those rich media objects with your text as well. So I think there's lots in that space, and maybe I'm stupid for doing my own thing, but what I'm trying to do, like I said, is sort of simpler really. I still do want to have the narrative text at the foreground but have those other layers as well, and I think there's some value in scaling these things back to simple processes. I'd love to be able to do it and I don't know how. I mean, I look at something like Manifold and it looks really cool, and then the installation instructions are about that long, and you've got to set up your server and you've got to have all this sort of stuff, and they'll send somebody out for a day to help you, and I
think, I don't really want to do all that. And then it comes back to those preservation issues as well, which I also think are important in terms of designing new platforms: how do we sustain publishing platforms over time, which means that the thing's actually going to be readable?
Thank you.
Thanks, Tim. I was just wondering if, in the Jekyll pipeline, you could add in extra checks, like the linter, and checks that the links resolve, the general links, that kind of stuff, so you could kind of make the pipeline more robust.
Absolutely, yeah. Yep, no reason why you couldn't. I mean, Jekyll's built on Ruby, and there's good libraries, you know, linked open data libraries for Ruby, which enable you to do that. It does a little bit of checking just in terms of the structure of the data, obviously, in order to make it work, but there's no reason why you couldn't do more like that. And the other thing I see is that as you're doing it, you're going to be linking out to other identifiers for the different things as well, whether they be in Wikidata or anywhere else. The other part of the theme, which I haven't put in, would be pulling stuff back at load time using JavaScript in order to display material from those sources as well, to integrate it further and to make that stuff available. But I see that as sort of a presentation part of it, separating defining the relationships in the data from the presentation possibilities.
Kia ora, Tim, thank you so much. I have to admit it's great to see a lot of my team here, because they will tell you that I've been a bit cynical about linked open data and how much focus there is on getting the tech perfect and right. But what you've done for me today is actually show how it's useful, and the point about historical narratives being contextually rich gateways to our collections, and actually seeing an example of that, has been a bit of an aha moment for me. So thank you, take note team, and
thank you very much.
Thank you very much.