It's now my great pleasure to introduce our second keynote speaker of the day, Mia Ridge. Mia is, I think, originally from Melbourne. She's well known to many in the NDF community. She's had residencies at the Powerhouse and the Cooper Hewitt, so she's well known to people like Seb Chan, who've been in the NDF community for a long time. She's also been the lead developer at the Science Museum Group in the UK. You're now very close to completing your PhD, and I'm sorry we're getting in the way of that. It's in Digital Humanities at the UK's Open University, with a focus on historians and crowdsourcing. This work builds on research interests in user experience design, human-computer interaction, open cultural data, audience engagement, crowdsourcing and much more in the cultural heritage sector. All of these are highly relevant to what we're doing. And today Mia is talking about the participatory commons, a framework that sounds to me like it's capable of pulling a lot of what we do together into some sort of collective whole. So, over to Mia.
Thank you. First of all, thank you for having me. I'm actually quite honoured to be here because NDF is one of those conferences that people come back raving about, so I'm very glad to be here. I wanted to put up this question just to kind of have it in mind. I'm going to tell you some stories. Some of them are going to celebrate how far we've come, and to mark the difference that the decisions New Zealand has made about open content have made compared to some of the international sector. And the stories are also to remind us that ultimately our work is about people. It's not about licensing rights, it's not about metadata, it's about the people that are contained in those records. So I'd like to ask what would happen, what we could create, if we came together, if we ignored those boundaries between institutions, if we invited members of the public, both expert and novice, to come and work with us on improving our collections.
So, some stories about some people from the past. OK, this is Albert Henry Bailey. He was born and went to school in Dublin. But by the time the First World War broke out, he was a store manager in New Zealand. We can tell from these records that he had a fair complexion, blue eyes, red hair. We've even got his weight and his height: he was five foot ten. One of his medical records tells us that he was at the Trentham camp at one point, and that he travelled to Egypt in a convoy of three ships with about 1,700 men. They travelled the same route over February and March 1915. So we know something about him, but we're no closer to understanding his experience of the war. We have letters from another soldier on the same ship. This soldier, Ralph John Newton, talks about the food that they ate. He talks about how they looked after the horses. They had to rush them down with seawater. They had to try and exercise them on board the ship. He talks about a death at sea on one of the other ships in the convoy. He talks about flying fish, dolphins, about crossing the equator. We've got all this because the letters that he wrote to his parents have been saved and are publicly available. Things changed when they got to their first engagement in the war. We know something of the experience of these men from other letters again. This is a letter from William Henry Winter. He actually died the same day as Bailey. We don't know what Bailey went through. He's left no documentary traces, but we do have a sense of the kinds of experiences that he had. Because Australia and New Zealand have made their unit diaries publicly available in the Australian War Memorial, we can get the official unit diary for the day that they died. The list of those killed that day runs to four pages. There's something about reading the actual documents that gives you a real sense of that loss, and of the number of human lives.
So in some sense, we've gone from bare data, we've gone from a few database fields, to some sense of their lived experience of that war. We could only do this because these records were made publicly available. They're not hidden behind a paywall. They were made findable: people have worked on SEO, they've worked on database structures, they've put a lot of effort into actually making it so you can do a Google search and find these records. And they've made them available in a format that can be collected while you're researching. So I've made little dossiers about the various men that I'm researching. England has made different choices about its digitisation strategies. So I tried to conduct a similar search for a different Albert Bailey. He was born and bred in England. He enlisted in England. The Commonwealth War Graves Commission has records online, and from these records I learned that he was 33. He was married. His wife lived in Nottingham. But the census records, the newspapers, the birth, death and marriage certificates that I could access for Australian and New Zealand soldiers aren't available in the UK. So I very quickly ran into a blank. I can't retrieve much of his experience online because I don't have the context of what his life was like before he went into the army. So I could look at the New Zealand Albert Bailey and look at where he was living. Think about, you know, he was in a cavalry troop. How did he learn to ride a horse if he was a store manager? For this guy, all I know is that he was in the cyclist corps. I don't know where he was working, if he joined up with workmates, if he joined up with people from his village, why he was in the cyclist corps. I don't have any context for his experience. So even if I find other records online, other accounts, it's hard to relate them to him because I don't have those hooks that tell me more about him.
This is partly because World War II destroyed a lot of the records from World War I. So the survivability of records is always a variable in how much research you can do. But I really quickly ran into paywalls. These records have been digitised in partnership with commercial companies like Ancestry, like Findmypast. There's an argument that they're actually privatising personal lives. These are our grandparents, our ancestors, people who share cultural experiences, at least in terms of British colonies. We can't access their records, we can't access their lives, unless we're willing to pay for it. And as a student, I'm not. It also gets very expensive if you're researching not just family history but doing the kind of research I'm doing, where I'm trying to look at the experiences of a lot of people over the course of the war. Individually getting certificates and access adds up really quickly. So the kind of fine-grained historical work that people are trying to do isn't as accessible through these projects that are geared solely towards commercialising family history. Okay, but why did I want to access all these records? Unlike family historians, who are generally trying to look at just one or two people, I was trying to provide context for the wider experience of the war. I wanted to really understand, for people who'd been in similar places at similar times, whether we can get some kind of proxy sense of their experience by looking at records from other people in the same situation at the same time. What battles had they been through? What was the landscape like? Were they somewhere where it was really muddy? Were they somewhere where it was really hot? Had they been at the frontline long? Were they with a troop that was experienced or were they with a troop that had no experience of the war yet?
So I was trying to look at whether you could computationally generate context because something like the centenary of the First World War is bringing lots of new historians into doing family history research or just doing research on people in this street looking at the experience of people from the same place as them or with the same kind of occupation, whatever. It's actually really difficult to get into this research because there's a lot of jargon, there's a lot of military administration to understand and then you keep running into these paywalls. So I was trying to look at these records as part of work on the participatory commons. What is that? I'm glad you asked. But first of all I want to take a step back and introduce you to some people. These are composites or personas if you're a user experience kind of person but they're based on my research and I actually keep meeting people who are like these personas which is a good sign. So we have two archival historians. Simone works in a primary school so she likes to continue her historical research because she can only do it in the school break so she travels to an archive. She photographs the crap out of everything that she can. She takes notes from some things. She comes back and then she sorts through her piles of documents. Luckily she's in an archive where you can do that because you can't take photographs in all archives. She's often working next to another historian, Andre, who's an academic historian. They've never had a conversation because they're not that kind of person. But they're actually working on really similar material and they could probably swap things around. Andre's previous project actually really relates quite strongly to Simone's current project. He's published everything he wants to publish out of his little stash of archival documents so he would quite happily hand them over but they've got no way of making that connection. 
So every day they come into the archive, they do this highly skilled work of contextualising and assessing documents but they leave at the end of the day and they take all that knowledge with them. So Martha and Bob retired. They moved out to a village outside of London and they joined their local history society as a way of getting to know other people in the village and getting a sense of what the village was like. Their local history society is one of those that makes projects for themselves. So they're doing a project on looking up all the names on the memorial, the Great War Memorial in the village and finding out the lives of those men who are listed there. They're also doing a project where they're looking at how the war affected the village generally. So how did women go to work? Did it change what was available? The Secretary of the Local History Society did a local history course so he's got an Excel database and they all kind of send in their records into the database. And then we have Daniel. Daniel has one of those classic shoebox archives. It's literally a shoebox that he keeps in the cupboard. It's his grandfather's letters and diaries. One of the things about the First World War is that censorship was really quite strong and people were in an honour system not to give away too much information. It was actually illegal to damage morale. So these were secret diaries that he kept with him through the war and then kind of stayed in the family ever since. Daniel always wants to kind of find the time to transcribe these to them for his kids so that they can see on a map or some other kind of visualisation where their great-grandfather went, what happened to him during the war. But he's busy raising kids and working and he hasn't done this yet. Finally, we have Nisha. She's a classic kind of procrastinator transcriber. So she's got a couple of young kids. 
When they've gone to bed she'll have a glass of wine and she'll transcribe some records on this site called Old Weather, which is strictly speaking a citizen science or climate science project that is transcribing ships' logs to get historical weather data that can be used by climate scientists. But these ships' logs also contain all kinds of juicy historical notes about other ships they saw, encounters that they had, things that happened at sea. So she enjoys feeling like she's doing something for science, but she also enjoys that sense of little strange encounters in the logs. And just to take... Someone has actually already mentioned crowdsourcing, which is handy, but just to define it: I think particularly in cultural heritage crowdsourcing we don't pay people. In a way it's very much like people are volunteering. It's just a sexier form of volunteering in some ways. But the tasks that we give people have to be meaningful. So it's not just busy work. It's not just tell us what you think about this exhibition and we'll throw it in the bin later. You have to be contributing to a common shared goal or some kind of meaningful research question. It's really easy to say that participation should be rewarding for everyone, but it's actually quite difficult for institutions sometimes to have that really keen-eyed focus on the experience of a member of the public, because it means giving them a seat at the table and involving them, or at least advocating for their experience, when you're making decisions internally. So none of our friends know it, but they're all working in ways that could help each other. If Andre and Simone were actually able to share their documents then they'd both be better off. Daniel's diaries contain information that would be really useful for the historians. It would be really useful for Martha and Bob. And Nisha would quite happily transcribe things for them if she was in contact with them.
She could also start to mark up the dates and places that would help Daniel generate the map to show his kids. So you can probably guess this is all going somewhere. But first of all, I wanted to bring in a new player. The cultural institutions. So far New Zealand I think is doing really well. It's easy to find content. You can usually access it easily. There's all kinds of community work going on. I love that this is in the about section of the National Library connecting, collecting, co-creating. I don't know how it works in reality, you guys can tell me. But I think it's a brilliant statement to have up there as a goal. But it's not always easy to find structured data. So current services are really good for like handcrafted queries. You can kind of Google around and find a diary here and another diary there. But they're all in slightly different formats. It's difficult to kind of pull them in consistently into a document or into a database. For my current project, I really need lists of battalions for all the allied forces. And I know because I've talked to people here it's actually very difficult to get one for New Zealand. It's difficult for almost every country. Partly because there's a lot of battalions. And they changed over time. But it's also I think that no one institution has ever needed to produce a list of battalions. The kind of definitive authority list of here are all the battalions that we know about including the medical corps and all kinds of other things. Because we base our collections records on our collections. And if we don't need to record something about a particular unit it's not going to be there. So we don't have those kind of comprehensive structured data sets that would make things like computational history or digital history easier. And I think digital history methods only become transformational when they work at scale. 
So you need to get beyond individual queries on institutional collection sites and move up into large scale queries. Something more like a historical gazetteer would be quite useful here, in the sense that you can look up and find out historic place names for a current place name. It's the kind of thing that commercial companies don't do. Even sites like GeoNames don't do very good historical names. There's nothing in it for them. So we need coordinated action, and we probably need some help in enhancing records too. It's a lot of work to do, but there are a lot of experts out there. If you've ever worked anywhere that has history of technology collections, transport collections, military collections, you'll know that there are intensely expert people out there who will know more about specific aspects of the collection than any curator will. Because if it's their hobby they can go really, really deep into one thing without having to work on a hundred different objects for an exhibition. So I want to think about what would happen if more museums and libraries and archives opened up their data, and if they published structured data alongside clear licensing terms so you know whether you can use something or not. So this concept of the participatory commons is partly a provocation, it's partly a thought experiment, it's partly an attempt to force some kind of requirements engineering through action research. And really it's that bit in the middle. It's not, strictly speaking, an actual platform that holds all the records, but it would reference things like the Internet Archive, it would reference Europeana, it would reference the Digital Public Library of America, and of course it would reference Trove and DigitalNZ. So you've got all those records and you've got the chance to pull in those shoebox archives, because I really worry about the sustainability of these things.
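As an editorial aside on the historical gazetteer idea mentioned above: the sketch below shows, in Python, one very small way such a lookup could work, mapping a current place name to the name in use in a given year and back again. The `HistoricName` class, the `GAZETTEER` table and both function names are hypothetical, invented here for illustration; they are not taken from the talk or from any existing service, and a real gazetteer would need far richer data (coordinates, sources, overlapping and contested names).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HistoricName:
    """A name a place was known by during a span of years (hypothetical model)."""
    name: str
    start: int  # first year the name was in use (inclusive)
    end: int    # year the name went out of use (exclusive)


# Tiny illustrative table keyed by current place name.
# One well-known example only; a real gazetteer would hold many thousands.
GAZETTEER = {
    "Saint Petersburg": [
        HistoricName("Petrograd", 1914, 1924),
        HistoricName("Leningrad", 1924, 1991),
    ],
}


def historic_name(current_name: str, year: int) -> Optional[str]:
    """Return the name in use for a current place in a given year, if recorded."""
    for entry in GAZETTEER.get(current_name, []):
        if entry.start <= year < entry.end:
            return entry.name
    return None


def current_name(historic: str) -> Optional[str]:
    """Return the current place name for a recorded historic name, if any."""
    for current, entries in GAZETTEER.items():
        if any(entry.name == historic for entry in entries):
            return current
    return None
```

With this table, a researcher reading a 1915 diary that mentions Petrograd could resolve it to the present-day place name, and a search starting from the present-day name could be widened to include the name used at the time.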
I think we're losing heritage, we're losing history, every time someone moves house or their attic floods, and certainly in the UK houses are getting smaller and people are disposing of stuff. And I'm curious about why more institutions aren't helping people preserve, at least by digitisation as a proxy for preservation, looking after those records so that they're filling in the historic record in a way that gets away from that sort of great white men view of history. So I said it's participatory. It's not just a document store, it's not just a giant aggregator. The idea is that it deals with a range of tasks that people can do on historical content. So there are the things that historians typically do: they'll assess and contextualise documents. There are the things that the public, or more specialists getting really into history, might be doing: identifying people and places and events, pulling those things out as notes, and tagging documents or saying these are the ones that I'm interested in for this project, that's Aunty Margu's family history, whatever. And then there are the type-what-you-see crowdsourcing activities, those micro tasks: transcribe this text, tag this image. And Aunty said this morning that if it's not online, it doesn't exist. And I think that one thing that we can get from the personal record collections of historians is those archives that aren't digitised yet. Because some of those historians will have records going back 40, 50 years. And historians will chase citations to particular archives, but if there's a way of saying here are the notes that I made, that at least gives you a sense of what exists in that archive even if they haven't digitised their catalogs or haven't digitised the items. It's a way of making those collections visible again. One thing about sharing content in this way is it lets people follow their interests across collections. We've got this weird habit where our collecting histories influence what's in our collections.
The same item could very easily be marked up as an archive record, marked up as an ephemeral item in a museum. It could be marked up as a printed catalogue in a library. They'll be treating it slightly differently. It'll be recorded differently. It'll have a different meaning in different forms of access. Members of the public don't care about that. They don't really understand why a pamphlet about a village fete in 1898 has such different ways of being talked about. They just want to follow their interests through that village. They just want to find everything about that village. So thinking about a commons is a way of doing this. You also need a critical mass of material to be discoverable. When I was looking for the records of these soldiers, the way that the New Zealand and Australian records have been set up made it quite easy to find things, but you do need to know which databases to go and look in as well. So even having lots of content linking to each other, those kinds of virtuous connections, will help content be more discoverable in Google. The more experts you have linking to your sites, of course, the more discoverable they are. Researchers aren't always very sophisticated in how they think about resources. Museum catalogs are notoriously underused by researchers because they just don't think there's anything for them there. So you have to be findable in search engines because otherwise you're kind of invisible. So again, we want content to be indexed in time and space and not indexed by the institution that created the record. Two really bad visual puns in this. So for most people content is content. They just don't care which collection holds it. They just want access to it so they can get on with their research, they can get on with telling those stories about their lives.
But I'm not suggesting one platform to rule them all. It's more like a kind of Unix-y, small pieces loosely joined idea, which is partly useful because it helps you respect the messiness and the specificity of the records that you hold. You don't have to try and squish them into an aggregator and make them look the same as every other record, which is what you see in some of those big Europeana and DPLA style aggregators. So the participatory commons supports niche uses. The whole idea of having a platform is that you can build very specific, focused interfaces on it that look at very specific questions. So it might be the history of a village or it might be the history of an occupation. It could be anything that you can think of. But these niche projects allow you to have more focused sets of content. They get away from that flatness or that drabness that you get with these giant aggregators, when you cannot find anything even though you know the records are probably in there somewhere. Having this kind of idea of a participatory platform as a service means that you can tailor the interface to the data that you have. So if you're working on early modern England the requirements for you are going to be very different than if you're working on 1950s Auckland. You want to design an interface that responds to the content and responds to the needs of the users. So niche projects can help get through that sense of wading through giant data sets as well. They also let you do things like tailor the functionality to the content. So this is the obligatory shout out to Tim Sherratt. I found this really useful in my own research, where I can do one search on his site and search all the Australian collections for World War I records. If every country had this my life would be a lot easier. And the thing I love about this is it's a screen scrape.
He didn't have to go to endless meetings and agree on interchange standards and crosswalks and all this kind of stuff. It's just like scrape the content and search it that way. It's not perfect, but it is so much better than going to seven different databases, never quite knowing if you've remembered the right syntax for that search engine or if you're thinking of the other one and you've got it wrong. There's a lot of overhead in setting up participatory projects. There's managing user accounts. There's dealing with spam. There's dealing with people who've forgotten their password. Having a platform that has thought about these issues and has dealt with them, when they're a bit of a distraction from doing really complex research queries, means that you don't have to think about that kind of thing. I get a lot of people asking me about setting up crowdsourcing projects. And it's really hard to advise them because all the projects have a fair bit of really messy setup. You need to wade into config files. You need to be comfortable setting up three different systems that all have to interlink in the right way. The user experience is often not as good as it could be. Having platforms that take care of these things means that they can get on with doing the things that they do really well and you can get on with just using them. Because I really don't want to understate the amount of work it takes to get a good crowdsourcing project or a good participatory project. So there's something about getting specific that people respond to really well. People respond to place. People respond to time. They respond to topics. And niche projects built on top of a generic platform mean that you can build that thing about ice cream shops and milk bars in 1950s Auckland if you want to. You can build a site about teddy shoes.
You can build whatever you like, because you can pull in the content that's been indexed and design an interface that suits exactly the kinds of tasks that you want to do. It also creates a possibility of having different kinds of tasks or different levels of responsibility. What I really like about crowdsourcing is that people get started on it. They love the puzzle of reading 17th century handwriting, or even 19th to 20th century handwriting these days. But once they get used to that puzzle they get a bit bored because they've figured it out. They've got the knack of it. But if you can give them tasks to go on to that are more complex, that have a greater responsibility, that make a greater impact, then they'll stay involved, and you can collaborate on some of those responsibilities with members of the public. It doesn't mean that you can hand everything over and expect they're going to do things for you. They're not a workforce. But it does mean that if people want to take up opportunities then they're there to take up. So I said it's a provocation, it's a thought experiment. It's inspired by probably a very social history view of history from below. But I think there's something about this combination of a large underlying repository and the ability to pull in very local records, to pull in personal records, that makes it quite a powerful response to some of the larger digitisation projects, where it's the classic 18th century white men, all their correspondence is digitised, and there are huge amounts of archives that have not even gotten any money for conservation. So if this is all so ace, why isn't it happening? I think people are still a bit protective about their content, but maybe it needs to be a bit more like you're sending your content off to a party. Let your content get out there and mix. We've got the mash-up competitions over here. You guys are thinking about this already. Hopefully it's not too scary as a concept.
So there's a number of barriers, and I'd love to know why people aren't taking on the challenge of digitising these personal collections that people have and using them. I'd love to see something like flying digitisation squads, or people in libraries training young people to get their first job out of school, to be digitally literate. Digitising people's records while they do this I think would be a great use of a local library or a local archive as that kind of real community space. I think we also need to think about the shoebox archive of today. It's a phone. Unlike the documents in a trunk, phones can be dropped in the loo, they can be left in taxis, they can be mugged. You've probably all had a friend who's lost all their photos of their family holidays or whatever. It's not only that we're storing more and more of our lives in these devices, we're also storing more and more of our lives in other people's spaces. So GeoCities, Twitpic, Posterous: they're all gone, and there are other services. And it could be your wedding photos, it could be your kid's first steps, it could be your grandad's last conversation. We're really at risk of losing some of these things, and I think it's worth thinking about how platforms might help you deal with legacy software, with archiving these things. Talk to Bruce to get him to help you with that.
Okay, so copyright is not fit for purpose. It's not helping people get a livelihood from creating content. It's kind of having a stifling effect. We all know this. There's a lot of commercial pressure to privatise records, and particularly in the UK people see it as the only way that they can digitise content. But it means that things like newspapers are locked up for 20 years or 10 years, and unless you can afford a subscription or get to London you don't really have access to them. I think one issue is that the Commons is a really crappy metaphor. It's based on access to physical property and land, it's embedded in concepts of community, of particular types of use of land, so Commons
doesn't really grab people as a title. We need to find better terms. This is me with the thesaurus, trying to think about other terms. I wanted a self-explanatory name, and so I did some focus group research: I went on Facebook and asked my friends which of these terms mean anything to you. The nearest I got was collaborative collections. I really liked cooperative because I thought it had that same sense of people working together, but it sounds a bit hippie, apparently. Fair dos. But all these terms, they're awkward to say, they're long. We need a short, snappy term that says something, so if you think of anything, let me know.
Orphan works: certainly in the UK up to 70% of records can't be put online. We don't know who the creators were, we don't know what rights are associated with those records, which is a crying shame because 70% of the UK's heritage isn't available online. There are big design challenges in designing for different audiences. Academic historians have very different reward structures and concerns than avocational historians, who are quite happy to share things generally, as long as people treat their content with respect, because it might be about their family. As we start to get into these giant digital databases we're creating privacy issues. When you get to the point where you know everything about someone who lived 100 years ago by connecting different databases, it's not only their life that is suddenly exposed in a new way, it's the lives of people around them, their descendants, people who are associated with them, and we don't really know how to deal with this yet. Aggregation flattens data. I've tended to work in institutions with really nice granular data, but if you want to put it into a shared site you have to go to the lowest common denominator, which means things are less findable. They lack that texture that the original records had. And we get people who are afraid that putting their content on shared sites will cannibalise visits to their own
site. I'd like to think the more people know about your content the more people will use your collection, so maybe that's one of those battles that will go away of its own accord. And I think one thing we have to be aware of is that if we don't do it someone else is going to. Google's Cultural Institute has tools that institutions can use to put their content online. I don't want to be paranoid about Google, but we don't know exactly how long they'll sustain these services, we don't know how it fits into their business models. The people working on them are individually great, but they are working for an institution with a sometimes problematic history. And then you get things like this. People have seen History in Pics and those kinds of sites. I think they show you the thirst for accessible, relatable images, but to me they decontextualise records. They go for the image that will visually grab you, or the image of... there's a lot of Marilyn Monroe on these sites. They're yanking content out of its time, they're taking it out of its context, they're presenting it purely as this flattened yes, no, hot or not moment, and I think that is subtly damaging to the historical record. And both of them, in picking out these highlights, undo the work that we think so carefully about in being representative, in thinking about communities, in thinking about the voices that we present, and it kind of goes back to this sort of great white heroes.
So how am I doing for time? I'm currently working in Dublin at Trinity College. I'm working on a project called CENDARI, and when I put in my proposal I basically decided to act as if the participatory commons already existed, because why not? It's one way to make it happen. The CENDARI project is aiming to provide historians with tools for contextualising and sharing their research. So like me they're very interested in how historians work with technologies, and how technologies might influence people to share or collaborate a bit more. They have two areas of focus:
one is around medieval documents and the other is around World War I, because it gives them a range of the likely issues that they're going to encounter and a kind of coalescence around the concerns at either end of those periods. And I have to say, doing a research fellowship, for someone who's used to working on exhibitions, is amazing. There's no deadline at the end, so it gives me time to experiment in a way that I don't get to do in a day job. So I was trying to work simultaneously on the technical requirements and the cultural requirements for something like a participatory commons, as well as building a niche project that would help to test the reality of this kind of infrastructure. It's a three-month project, so it's very squished, sort of less than ideal, but that was what I had to work with.
One thing that I wanted was a list of locally relevant names: places, concepts, events, activities, things that were about World War I, because that's what I was working on. This is actually Gallipoli again. You can see that with names like "The Farm", it's very contextual. You couldn't Google "the farm" and expect to get this record, because it's such a generic name. And these names are really specific. They came into very intense use for a very short period of time. They don't exist as places in things like Google, they don't exist as places in things like GeoNames; they only exist in this very specific context of this very specific cape at a particular moment in time. What I really wanted was for these names to be available as linked data, so that I could link out to other things, so I could say, "When I'm talking about The Farm, I'm talking about the same Farm as these records here." And I wanted to be able to train software to recognise these terms and to say, "I understand that 'the farm' here has a specific meaning; it's not just someone asking how the cows are doing at home." And I was going to do things like exploring crowdsourcing, to see if you could get people to help
check possible entity matches, and see whether the software was doing well in recognising things or whether it would be better to get people to manually mark up these things. And I thought: it's the centenary of the outbreak of the First World War, there are so many digitisation projects going on, there are so many participatory projects going on, everyone is spending money and getting excited about the centenary, so naturally that would mean that there would be a lot of structured data and a lot of records to play with.
It turns out there wasn't. There are lots of little sites hosting diaries and letters, and they're all digitised in slightly different ways. There are national aggregators, but you can only work with the items in each collection, so the State Library of New South Wales has a different collection than the Australian War Memorial, and every time I want to access these records I have to find slightly different ways of doing so. And they often have different ideas about what "digitised" means. Some of the New South Wales records that I can find in Trove look amazing, and then I realise that they're so low resolution that I can't actually read the handwriting, so there's no way that I can transcribe those documents and get a sense of what was happening in that diary, whereas others are very high resolution. But you never quite know what you're going to get. We have big collecting projects like Europeana 1914-18, which looks like a massive treasure chest of personal accounts: diaries, letters, memoirs. But when you start to look at the records, sometimes the whole thing hasn't been digitised, or someone's only brought in a couple of letters; they haven't brought in the whole set. Sometimes it's people who've transcribed them themselves, and generally you wouldn't trust someone else's transcription. If you were going to rely on something as a source, you'd want to check for yourself that they've gotten the names right and the
dates right and things. And then you've got people who are painstakingly transcribing unit diaries, but they're not linked back to the Australian War Memorial site, so there's no way of knowing that someone has actually done all the work of transcribing all those pages. So the next poor sucker sits there and retranscribes them or rereads them. I kind of thought I'd be popping around the shops, buying some ingredients and then making a really fancy dinner, but as it turns out it was more like sowing a field, growing some wheat, grinding it up, making flour, and then finding some yeast and making bread. It's been a really painful, very manual process.
I think there are several possible reasons for this lack of data. Often institutions doing these big digitisation projects will promise an API at the end of it, but it's really easy for that to drop off the list of deliverables, because people are never quite sure who's going to use it, and they might not have been that convinced about it in the first place. So these promised APIs or these promised open data sets just don't appear. They never say that they haven't delivered it; it just doesn't happen. And you get these amazing, amazing amateur historians, or avocational historians, and special interest groups who do brilliant work producing finding aids, producing wikis that explain lots of things about these records, but they don't have the skills. If you think back to the retired historians in the village who are using an Excel spreadsheet, that's really common; that's about as structured as people will get. And then the projects that have developed ontologies and structured data are really specific. They're based around their very particular goals, their research question, so they've devised a vocabulary, a structured data set, to talk about this, and populated it with records that relate to this very specific question.
So there wasn't a lot of data out there, so I kind of refocused on things that researchers needed to get
started in looking at World War I histories. My goal now, at the end of this project, is that someone who wants to research a soldier in World War I, and who doesn't know anything about how armies were structured, can find a personal narrative from a soldier in the same bit of the army without having to understand exactly what "the same bit of the army" means. So I'm trying to crowdsource the development of an ontology, and populating it, to deal with these really complex structured queries, so that every single new historian doesn't have to try and figure out the difference between a battalion and a regiment and a division and a brigade, and when people moved around between these different things. So there's a kind of simple crowdsourcing task, which is to try and find a personal narrative for every battalion in the army. If I'd done my research and known exactly how many battalions there were, I might not have said battalion, I might have said regiment, because there are thousands and thousands of them. But people did have different experiences in different battalions in the same regiment, so I think it's worth trying to get there.
So if I want to know what life was like for the man on the left, because he's my great-uncle, but he hasn't left any documentary traces, I'll take the diary of the guy on the right as being good enough. I'll say I will still learn something of his experience. There's always a gulf in terms of personal histories and socio-economic status; it's never going to be an exact proxy. But as some sense of what it was like to go through this incredibly intense experience, I'll take that. But it's really hard to find this content online at the moment, so I'm trying to crowdsource the process of linking these personal narratives that exist out there to battalions. And then there's a whole geeky bit. Digital history sounds really sexy, but actually it's a lot of data entry. So I'm trying to get people that I've never even emailed, let alone met, to design and populate a data structure
that will let you say things like "Where was the 27th Battalion in June 1915? What was going on? What experiences were they having?", by creating structured data that will support these questions. I'm using a wiki for it, which is not ideal, but it was a kind of quick and dirty solution. The sense that I have is that people get excited about big data, but at the moment it's very much handcrafted data. You can't run these queries and easily get reliable data out, because history is messy, and people are messy. So I've been asking people to help me find a letter or a diary for each unit, and given that I haven't been able to work on it full time, because I've been working on technical aspects as well, it's slowly happening. This is someone who added a record last night, and I think it shows that a collaborative collection can provide the historical content that has that specificity of the niche call. I think there's also an emotional call here that helps with motivation, but it shows the power of the niche project.
So, coming back to this question of what we could create: I've tried to create one tiny thing. I've uncovered all kinds of issues with it, but I think that process of working out exactly what's not there is useful, because it means you've got a goal to aim for. So I think we should try and find out what would happen if we did this. I think New Zealand's institutions have already done a lot to open up. There's a lot of really clever work that's being done, and there are obviously really clever people here, but it's up to you guys to keep pushing to make something that is actually better and bigger than the sum of your parts, of your individual institutions. So thank you.
Don't be shy, we do have time for questions. Here's one in the middle.
It's a comment. I appreciate your difficulty with finding a name for this idea, and I do like the word "commons"; I think it is exactly what you need. But as part of our statewide World War I commemoration in Queensland, we started surveying all of the local
history and local public libraries, and we started off with this idea of what we called "distributed collections". We were very roundly warned that it's not distributed, because that implies that we were the centre, so we came up with a term that we want to think about, called "connected collections".
Okay, thank you. There's one down here.
I was just thinking, you were kind of saying history is messy, so it kind of resists being big data. I wonder if that's just that not enough processing power has been thrown at it yet?
Yeah, almost. With the machine learning going on and the kind of computational deep processing, I think it even offers challenges for crowdsourcing, because things like text transcription are going to be done computationally. But I think the kinds of judgments that people make are really subjective. Even with things like looking at the battalions and saying "What's the official name of the battalion?", it probably changed four times during the war. It's the axe or the broom: different units have come in and out of that battalion, so it's actually hard to say what the battalion really is, in some sense. So those kinds of philosophical, ontological questions still arise. I actually like the mess. I think if we can work out ways of getting computers to do some of the grunt work for us, so that you see it and say "Yes, that's right" rather than having to do the subjective thinking yourself, and if we get smarter about that interaction between machine and us, then hopefully some of the grunt work will be done. And even if you just sort of script up things and then have to tidy things up later, it's putting you further ahead than you were.
Priestam.
Are there any examples that you can think of, of projects that have done part of it, or that could be used as maybe inspiring for that whole thing?
Yeah, there are lots of bits that are kind of like it, and I think even the Kete software here is doing some of it. But it's also thinking about having that kind of trifecta of the three parts that have to be there, because
you do have these big national aggregators that will take on content from everywhere, but there's a kind of reluctance to deal with the community aspect, and that community collecting aspect of it. Sometimes the people who are doing that are really low to the ground and grassroots, and don't have the capacity to think about the big things as well. So there are lots of little or big examples that could feed into lessons about what not to do and what's worked, but hopefully it's just a matter of bringing them all together, or creating links between things, without actually reinventing any of it.
In the State Library of South Australia all the missing persons are held to the Red Cross in the South Australia three and a half hours ago, and what we're doing is a reminder to total paragraph of the two gentlemen into those files, and we've got information about the weaknesses of the last people who saw them, or saw them in hospital, and all they were doing. We think that with fewer war grads commission, our photographic archives, with the community, as we put online, people are putting faces to the names, to the records of the witnesses as well as the missing persons, so linking it to the victory pages and the use pages, and it's rippling out across the whole community. So we've got people from all over Australia whose ancestors were in South Australia, and for a group joining this project we're thinking about 10 different sorts of records, including those out.
And do you have any lessons that you can succinctly share in five seconds?
You've got to get the background infrastructure right first, and the other thing is you've got to get out there to enlarge it.
I've just got an observation and a question. It's interesting how often the community resorts to this term, the lowest common denominator, and we seem to forget the highest common factor, which should always be the goal in these kinds of things: how can we get to the highest level of commonality? And I think that's something your
project really aspires to, and it's important to remember that corollary to the lowest common denominator. My question was in terms of motivating people who might be participants in crowdsourcing: are digital badges a motivation, so you might demonstrate a competency in 19th-century handwriting transcription before moving on to image identification? Is that something that's been done?
There are many, many, many words written on this subject, because there's a long history of citizen science. Coming from that tradition of public participation in scientific research, there's a lot of thinking about what motivates people to participate in those kinds of activities. There's some work on volunteers in the cultural heritage sector, and there's a lot of work on the kinds of learning that people get out of things. Generally, when I'm looking at things, I group them into intrinsic, extrinsic and altruistic motivations, so the badges and gamification all kind of fall into the extrinsic thing. But there is something in micro-credentialing. I think for historians too, it's about finding ways for different people with different concerns to get credit for the work that they're doing. If someone shares documents from an archive, it might be that they don't actually have any interest in nursing records, but they've shared them because they happened to end up in the same box and they photographed them. It would be great to give them a way to get some academic credentials, as someone who spent time in the archives, which is the mark of a historian. But I think they need to be really carefully designed to match the motivations and interests of people, because it's a very tricky area.
We've got one down here who's been hanging out for it, and then we'll come back to Chris over here.
Thanks. I really loved that talk. There was a bit where you started riffing off Unix philosophy, and you started talking about small ontologies, loosely joined, and I was really taken by that. And then I started thinking
that I don't really know what that means. I thought in one breath it might be that you've got ontologies that are particular to specific collections, and you have really dumb links between those. And the other is that you might have ontologies and kind of crazy mappings between ontologies, and turtles all the way down, with ontology, ontology, mapping, mapping, mapping. I was wondering if you could expand on what you meant by that phrase, "small ontologies, loosely joined".
I think it's actually David Weinberger's phrase. There's also something Tim Berners-Lee said at an event once. For me, having worked for a long time in cross-stall institutions that were trying to get our data to go across things: the Science Museum and the Wellcome Collection have a lot of items in common, but there's that whole dream of the kind of follow-your-nose thing, where when I say "title" I really mean "name", and when you say "name" you mean "title". The way that our collections are based on decisions that were made 180 years ago means it's really difficult to follow your nose, because you'll just end up running into a wall. So the idea is really that you do the vegetarian version of eating your own dog food, drinking your own champagne: you make the ontologies that you need, you get the structured data that you need, and then you share them with people and find matches between things. Again, coming back to it, at some point machines are going to be able to make semantic inferences about meaning, but at the moment it's a handcrafted process. When I've worked on partnership projects, part of the squishing down of the specificity has been that we could say it's this particular street in Brentford, but someone else might only be able to say it's in the council of Hounslow. So you have to make sure that you design systems that will deal with that highest common factor as well, and often people don't, because it doesn't figure into the immediate user needs. But I think we need to be thinking about retaining as much
of that specificity as possible. So yeah, create your own ontologies and use them, and make sure they work for the material that you've got, and then try linking out to other people's ontologies.
Take one from Rick, and then we might go have a track. So go in there, go for it.
Okay, I'm really interested in your comment that we seem to be repeating the mistakes of the past, in terms of the difficulty which we've had, or the lack of encouragement of people to digitise their own resources. In the hard-copy analogue world, it was very difficult to encourage people outside the environs that we are in here to spend time digitising their own resources and putting them up. And that lack is now being repeated in the online social media environment, with, it seems to me, a real dearth of tools which make it easy for people to properly archive their own personal histories. Is there something out there that's being done about that?
Well, I suppose there's the movement to have social networks that you can leave if you need to, in the sense that if you let people leave, then they'll probably not feel the need to leave. So you can export your data from Twitter, or you can export your data from Facebook. It's a terrible archive, if you've ever looked at it, and you don't have any of the richness of the social relations. There are some people who are really, really keen to digitise things. That's something also about moving to Ireland, where people kept telling me about these collections that they have that are amazing, you know, several centuries of house deeds or whatever that they've got. And even for the World War I stuff, people feel a responsibility to do something with these records. You get people who merely just kind of chuck out old documents as well, so maybe it is something about marketing the process in a way that says: you might not be interested in this, but this contains stories, this contains lives, these are traces of real people. Even if you don't care about them, you can
share them and someone else will benefit. But, you know, we don't even protect our privacy when we use social networks, so that kind of hygiene of making sure that your data is okay is a whole other step above. I guess there's no interest from the software companies or the social media companies in us thinking about preservation, because it's boring. It's not as fun as tagging photos on a Saturday night. So if it's hard, if it feels like it's going to be a drag, if it feels like it's not going to be useful, then people won't do it. So it's a kind of marketing problem, I guess, in some ways.
One last question, up the back. Scott, the microphone.
Just a question about the aggregators, the big aggregators that you mentioned: the DPLA and Europeana, Trove, and even DigitalNZ. How far away do you think they actually are from being platform as a service and participatory commons? And do you think that they could or will get there, or will they stay as this kind of digital takeaways?
So Europeana has something like a million lines of Java code. It's really hard for them to deploy new things; there's a kind of legacy. They're definitely all exploring it, but I think for cultural institutions it's really chicken and egg: they need to see the demonstration that people will help transcribe these things before they'll invest in making platforms. But because they have open APIs, anyone else can pull in their records and do things. What I found looking at the Europeana ones for the roadshow collecting days is that the records are in really odd formats sometimes. Sometimes it's an entire PDF, and sometimes it was 72 individual JPEGs. So their back-end systems are a work in progress, which I think makes it hard for them to respond to these things.
Thanks. Right, well, let's thank Mia for her talk today. Thank you very much.
Thank you very much.