Aotearoa, Margaret Waran. Margaret works at the State Library of Queensland, where she has held a number of roles: on digitisation projects, in library systems, in strategic policy and planning, and in copyright, open data and Creative Commons. Her current role is Coordinator, Discovery Services. As part of the State Library of Queensland's First World War commemorative project, Q ANZAC 100: Memories for a New Generation, Margaret is working on connecting content, both historic and contemporary, to the narratives of the First World War. She has been involved with community digitisation projects, projects on Historypin and linked open data initiatives, and one of her digitisation projects was the digitisation of nearly 27,000 soldier portraits. This is Margaret's fourth visit to the National Digital Forum, and she is happy to say it is one of her favourite conferences. On that note, I'll welcome Margaret.

Hi, everyone. I'll just get my slides up. I've called this presentation "Ripples in a pond", and I wanted to share with you some stories and learnings from the First World War digitisation project I've been working on. I didn't want it to be one of those "here's my project, it was great, I'm fantastic, isn't our library wonderful" talks, because that doesn't give us an opportunity to learn from each other or to be transparent about the things that went well and the things that perhaps didn't go so well. So first of all I'm going to give a little bit of background. In late 2013, the State Library of Queensland began a digitisation project aligned with our commemorative activities around the First World War. Must remember to do that one too.
Nearly 2,500 pages of the pictorial supplement of the Queenslander newspaper were digitised at very high resolution for discovery, online access and reuse. The Queenslander was the weekly summary and literary edition of the Brisbane Courier, which eventually became the Courier-Mail. Included in the pictorial supplement were portraits of soldiers, published as they embarked for the war. A little bit of background about how this came about, because it's relatively unique in the Australian context, and I'm not sure whether it happened in other countries. In September 1914, the Talmer Photographic Studios set up a tent at the soldiers' camp at Enoggera, which was the main army base for training. The studio was there to take a photograph of each soldier in the camp for publication in the newspaper, because they had done this during an earlier war and decided it would be a good idea to do it again for this conflict. Such was the demand that another studio, the Fegan Studios, also set up its own tent and started photographing soldiers. Kit was provided for those who had not yet been fitted out, so everybody appeared in uniform. The soldier portraits were then published in the pictorial supplement of the Queenslander during the years 1914 to 1918. The supplement was regularly filled with pages and pages of soldier portraits, and also with photographs from the war front and the home front; it showed all aspects of life in Queensland and things that might affect Queenslanders. So everything from a photograph of Trotsky, to the prize hen at the exhibition, to innovative treatments for influenza, which included putting your head in a box and having steam blown at you. It's incredibly varied and rich pictorial material. This large collection of photographs also allowed the Queenslander to republish portraits promptly as soon as the casualty lists came in, and sometimes again as rolls of honour were printed.
You can see on the left a roll of honour, and on the right a page from the Queenslander at the point of embarkation. The portraits were also used politically as part of the conscription debate that became so prominent in Australia as the war progressed. I thought I'd show you this page from 1917: the left-hand side is a reprint of a previously published page, and the right-hand side, one year later, shows the soldiers from that initial embarkation who were still living. Just under 50% of them are still alive, and around that is an editorial that says yes, we need more reinforcements, and if we don't have conscription, will people put their hands up? I think of this as an early data visualisation: if I were an 18-year-old reading the paper and looking at that death rate, would I be inclined to put my hand up? I don't think so. It's really interesting to see the political overtones that come from having all of these portraits in the newspaper. They certainly couldn't have imagined in 1914 that the portraits would continue to be published until the end of the war, and that there would eventually be nearly 27,000 photographs of young men. That's around half the number of Queenslanders who served, as not every soldier was photographed. It also doesn't include portraits of nurses, although they were often photographed in groups in uniform before they left for the war, and those photographs were published in the newspaper. From our position in 2015 there's a fairly sad irony in this news report. I'll just quote the section in bold: "It will be impossible for the whole number to appear in one issue but they will be spread over as few weeks as possible." This was written in September 1914, when the thought was that it would all be over by Christmas, and they obviously believed that.
When we looked at this material, we started to think about what we wanted to get out of it, and I was certainly the person who said I wanted it all and I wanted it covered in chocolate. First of all, one of our aims was to improve discovery of and access to this content. The Queenslander had previously been digitised from microfilm and is available on Trove, the national aggregated newspaper service, similar to Papers Past. The images in the supplement were significant, often very interesting and often unique, and they were frequently requested by members of the public for reproduction from the original, which is held in our collection. However, the quality of the images in the digitised microfilm, particularly in the eight-page pictorial supplement in every weekly issue, was often very poor. This is what we wanted to get to: we wanted people to be able to search for and find this content and get a great-quality image with a full transcription of the captions to further increase discoverability. I'll just go back so I can show you. Oops. No, maybe I won't do that; it's not working for me. In addition, the portraits did not always have an OCR layer run over them, and because the captions were handwritten (another interesting feature), where OCR was run the results were often uninterpretable, further limiting discoverability through Trove's search. You can see the difficulty the OCR had on the left-hand side of that image. The second thing we wanted to achieve was to provide opportunities for use and reuse, in ways we might expect but also in unexpected ways. We needed clear rights statements and terms of use. All of this material was out of copyright, so we wanted to make it free to reuse by anyone for any reason.
We also wanted to make sure that the high-resolution versions of these image files were freely available to download. We wanted to release the digitised content and the metadata as open data, so the whole set could be used, and we wanted to investigate ways we could connect it with other related content. Our third imperative was that we wanted to do it once and do it well. Because this was a significant project, we wanted to ensure we did it right, so the planning phase, considering how we would achieve what we wanted in the short term and the long term, took some time. We were discussing things like how we would digitise at this very high resolution, how we would describe, how we would provide access, and how we would make connections with other content and within the content. We wanted to consider all of these carefully so we didn't have to repeat the work. So the next question is how we got there, and I think Winston Churchill said it very well: if you're going through hell, keep going. I'm standing here telling you that this started in 2013 and we're only just about finished now, so it's been two years of obsessive work on this digitisation and these soldier portraits; it's a long-term project. I want to highlight three areas of how we did it. The first was digitising at very, very high resolution. That image there is probably around this size in the paper. We digitised everything at 1,800 ppi in 24-bit colour, so each page image file ended up being around 1.7 gigabytes. It was a big deal, and even the time it takes to digitise something that large was something we had to consider.
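The figures quoted here are tied together by simple arithmetic: an uncompressed scan holds (width × ppi) × (height × ppi) pixels, and 24-bit colour stores three bytes per pixel. The minimal Python sketch below works backwards from the quoted 1,800 ppi and roughly 1.7 GB per file to the page area those numbers imply. It assumes uncompressed files with no header overhead and reads "1.7 gigabytes" as 1.7 × 10⁹ bytes; the function names are illustrative only, not anything from the project's workflow.

```python
"""Back-of-envelope sizing for uncompressed raster scans.

Assumption: files are stored uncompressed (for example as uncompressed
TIFF), and header/metadata overhead is ignored, so real files differ a little.
"""


def uncompressed_bytes(width_in: float, height_in: float, ppi: int,
                       bits_per_pixel: int = 24) -> int:
    """Bytes needed for an uncompressed scan of a width x height inch original."""
    pixels = round(width_in * ppi) * round(height_in * ppi)
    return pixels * bits_per_pixel // 8


def implied_area_sq_in(file_bytes: float, ppi: int,
                       bits_per_pixel: int = 24) -> float:
    """Scanned area in square inches implied by an uncompressed file size."""
    pixels = file_bytes * 8 / bits_per_pixel
    return pixels / (ppi * ppi)


if __name__ == "__main__":
    # Figures from the talk: 1,800 ppi, 24-bit colour, about 1.7 GB per file.
    # Reading "1.7 GB" as 1.7e9 bytes (not 1.7 * 2**30) is an assumption.
    area = implied_area_sq_in(1.7e9, ppi=1800)
    print(f"Implied scanned area: {area:.0f} sq in")

    # A hypothetical 10 x 10 inch original, scanned at 1,800 and at 900 ppi.
    for ppi in (1800, 900):
        size_gb = uncompressed_bytes(10, 10, ppi) / 1e9
        print(f"10 x 10 in at {ppi} ppi: {size_gb:.2f} GB")
```

Run as a script, it reports an implied scanned area of about 175 square inches per page. The hypothetical 10 × 10 inch comparison shows why resolution drives storage and scanning time: size grows with the square of the ppi, so dropping from 1,800 to 900 ppi cuts each file to a quarter (about 0.97 GB versus 0.24 GB).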
We were incredibly fortunate, because we had just been through the development of a service level agreement with external providers, and we were able to tweak the specs for a really good provider. We sent it out and had it done off-site, and it was all done within six weeks; it would have been quite time-consuming for us to do it ourselves. The second thing was to use our volunteers. That's Rebecca, and I'm going to give her a shout-out because she's been doing it for two years: she started right on day one and she's still working on it today. The volunteers were critical to all aspects of this. We trained a group of volunteers in Photoshop, and they're the ones who extracted every single portrait to make an individual image file. We had toyed with the idea of creating some kind of frame so we could pull the portraits out automatically, but the number of portraits on each page varies: some have 20, some have 57, some have 80, some have 90, and it was just too difficult to do programmatically. So these volunteers did it. We showed them how to use Photoshop and they worked their little backsides off, both on-site and off-site, creating those images for us. Another group of volunteers transcribed all of the captions on all of the photographs, which provided discoverability without our having to do a full catalogue record for each image on each page; I'll talk a little more about the why of that later. It was a pragmatic decision, because we knew we would never get through it all if they had to fully describe each image on each page. We've also got another group of volunteers.
There are 25 of them, about half working on-site and half off-site, who are now taking the images of the soldiers and uploading them to the Discovering Anzacs website, a joint initiative of the National Archives of Australia and Archives New Zealand. So when a person goes to Discovering Anzacs searching for a family member, they'll also encounter our portrait, along with a link where they can download the high-resolution version of it. More than a thousand volunteer hours were devoted to transcribing, further volunteer hours went into the Photoshop work, and we estimate it's going to take 9,000 hours to upload the portraits to Discovering Anzacs. When you put a dollar cost on those numbers, we couldn't possibly have done it without them. So when I say the volunteers are pure gold, they are pure gold, and they're incredibly devoted and incredibly skilled. The people working on the uploading now are becoming highly skilled in archival research, because they're also our quality control, making sure that each portrait is of the right person with the right description. So they're developing amazing digital literacy skills. We've had refugees who've come to us and then gone off and got jobs. We've had people returning to work after injury who've worked in this volunteer space first and then gone on to work. So it's been a really great experience to have them with us. The third area of how we got there was data enrichment. This was a really big goal for us, and it developed and changed over the course of the project. There is so much more data about World War I soldiers, but as I'm sure you know from the New Zealand context, it's pretty much siloed. There's data in the archives, data at the Australian War Memorial, data in the libraries and data at the War Graves Commission, so it's fairly difficult to connect, even though there are data points that we could use for connection.
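Those shared data points can act as join keys between silos. As a simplification, the minimal Python sketch below assumes the only shared point is a name: the portrait captions give a first initial, a second initial and a surname (as described later in the talk), while an archival record carries a full name. It links the two on a normalised (initials, surname) key. It is a hypothetical illustration, not the script the National Archives wrote, and every identifier and name in it is invented.

```python
"""Link portrait captions to archival records by a normalised name key.

Hypothetical sketch: identifiers and names are invented, and a name key is
only a first-pass join; ambiguous matches are left for a person to resolve.
"""
import re
from collections import defaultdict


def name_key(name: str) -> tuple[str, str]:
    """Reduce a name to (given-name initials, surname), upper-cased.

    "J. T. Smith" and "John Thomas Smith" both become ("JT", "SMITH").
    """
    tokens = re.findall(r"[A-Za-z'-]+", name)
    if not tokens:
        raise ValueError(f"no name tokens in {name!r}")
    *given, surname = tokens
    initials = "".join(token[0] for token in given).upper()
    return initials, surname.upper()


def link_records(portraits: dict[str, str], records: dict[str, str]):
    """Return (matched, ambiguous, unmatched) portrait ids.

    portraits: portrait id -> caption name (initials plus surname)
    records:   archival record id -> full name
    """
    index: dict[tuple[str, str], list[str]] = defaultdict(list)
    for record_id, full_name in records.items():
        index[name_key(full_name)].append(record_id)

    matched: dict[str, str] = {}
    ambiguous: dict[str, list[str]] = {}
    unmatched: list[str] = []
    for portrait_id, caption in portraits.items():
        # .get() avoids inserting empty entries into the defaultdict
        candidates = index.get(name_key(caption), [])
        if len(candidates) == 1:
            matched[portrait_id] = candidates[0]
        elif candidates:
            ambiguous[portrait_id] = candidates  # needs a human check
        else:
            unmatched.append(portrait_id)
    return matched, ambiguous, unmatched


if __name__ == "__main__":
    portraits = {"p1": "J. T. Smith", "p2": "A. Brown", "p3": "W. Jones"}
    records = {"r1": "John Thomas Smith", "r2": "Alfred Brown",
               "r3": "Arthur Brown"}
    matched, ambiguous, unmatched = link_records(portraits, records)
    print(matched)    # {'p1': 'r1'}
    print(ambiguous)  # {'p2': ['r2', 'r3']}
    print(unmatched)  # ['p3']
```

Name keys collide easily (two men called A. Brown), so the sketch sets multiple candidates aside for a person to check rather than guessing. Once a match is confirmed, a stable identifier such as a service number or an archival person ID makes a far better join key than a name.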
So we were really committed to exploring how we might make connections, or at least working towards that as we developed the records for these soldier portraits, and later in the presentation I'll talk a bit more about what I think is one of the best outcomes of this project. Last of all is what we learned along the way. I've got six learnings that might be useful to all of us as we progress, whether it's a World War I project or one of the many other big things that might be coming along. First of all, sometimes the planets align. Our manager of preservation dropped by my desk with an enlarged and framed soldier portrait that someone had requested, put it on my desk and said, "Wouldn't it be great, Margaret, if we could have all of these soldier portraits done?" I said, "Yeah, Grant, that would be really great," and I had no idea how quickly it would all come together. We realised that we actually had a duplicate set of the pictorial supplement, so it was very easy for us to digitise them.
There was also an index, created long, long ago on cards (we've got no idea when; probably in the 1950s), that had the names of the soldiers and the page they were on in the Queenslander, and at some other time in the past someone had transferred it into an Excel spreadsheet. So we had this spreadsheet that we could transform into catalogue records, and we had this corpus of digitised work. At that time I wasn't entirely sure whether we would get the money, and of course we've been very fortunate in Queensland to get a large amount of money for commemorative activities, so we set some aside for this. I think it's important to say that if the planets align and you can seize an opportunity, you should take it, even if you're not sure how it's all going to work out, and even if, as in my case, it ends up like a mushroom cloud and you have no idea how big the project is going to be when you start it. The second learning for us was to go backwards, which is actually forwards. For me that's about thinking from the perspective of the end user: the people who are going to use the product at the end are given the highest priority in all phases, from design to delivery. It might seem self-evident, but I'm not sure it's always been our highest priority, and I think we could all name a project, website or discovery interface that only really works well for us, and that our end users at best find befuddling and at worst find so infuriating that they don't want to have anything to do with it. So really thinking about what we wanted the end user's experience to be was probably a big learning for us.
One of the ways we did it was this infamous whiteboard, which we used to map out what we wanted to get out of the project. We started the whole thing with a group of staff from across the library, including our family history staff, our reference services staff, our preservation staff, our Queensland Memory staff and people working on our exhibitions, asking what types of uses we expected or hoped might be made of this content, and how we could best facilitate those uses given our system limitations. This went hand in hand with asking the bigger questions: why should we do this? Should we proceed? What's the value proposition? What do we want to get out of this? Is it worth doing? Having that as a really robust discussion before we even started helped us to achieve a successful outcome, even down to the granular level. (I hope you can see my beautiful light horseman on the top right, with the feather in his hat.) We were also very aware that our organisation has a remit to leave a digital legacy from our World War I commemorative activities, and we're constantly being asked by the people who fund us to demonstrate how this digital legacy is going to be realised, so being able to say "this is what we want to get out of it, and this is how we think it's going to be used" was quite important for us. The next learning for me was to work the problems as they appear: not to try to solve everything up front, but to stay future-focused and work through the issues as each problem appeared. There were three main problems I want to highlight for you. First, as I mentioned earlier, we didn't have the resources to extract and create an individual image
file for all the photographs in the pictorial supplement. Only a quarter of the pages were soldier portraits; the other three quarters were pages with anywhere from one to ten images, and normally our practice would be to extract every single image and create an individual image record. For nearly 2,000 pages within our timeframes, that just wasn't realistic, so we decided to use the captions, rather than a standard description of the content of each photo, as the pathway into discovery. Captions in those days were generally quite florid, too, so they gave us lots and lots of information. Once the volunteers had transcribed them into spreadsheets, the cataloguers took those spreadsheets and added subject headings, name headings and geospatial data where appropriate. So all of that professional metadata work was still done, but it took them significantly less time, because that was all they were looking at and the captioning work had already been done beforehand. The second thing we had to do was work out how to connect each individual portrait on a page with its entry in the spreadsheet. This was a really interesting exercise, because the file names didn't have the soldier's name in them; they were just numbers. So we had a protocol of numbering the individual portraits working across the page like this, then manipulating the spreadsheet so that it was in the same order, and then just bunging them together. Early on we had a few errors where, for example, pages of officers were laid out in a florid wreath-like arrangement, with six officers and a large commanding officer in the middle, and of course the across-the-page numbering didn't work for those. But we did a bit of QA, and as the records go live we do a quick check that the right portrait is matched with
the right name, because they all have a standardised title pulled from the Excel spreadsheet. I have to give credit to our staff in Description Services for working out a way to do that which wasn't excruciatingly labour-intensive. The last thing we had to do was work out how to provide an online experience of looking at the supplements, that eight-page insert in the newspaper, that was similar to the physical experience; we really wanted that idea of turning the page. It was quite interesting hearing the talk yesterday about whether page turners are passé, but in this context there was something really nice about browsing the newspaper. Our digital viewer is on its last legs and has a really poor page-turning experience, so what we did was upload the supplements to the Internet Archive and put a link in our catalogue record to the Internet Archive version, which has a lovely open-source book viewer. It was a really good way for us to solve that problem. The next learning was to expect the unexpected. We had some expectations of how this project would play out. We knew that family historians and researchers were identified audiences. We knew that school students were an identified audience, because it's part of our national curriculum for Year 9, and we knew that quite a lot of the content might be used in our World War I exhibition. Those were the things we knew, and we also knew there would be interest in April this year with the centenary of the Gallipoli landing. What we did not anticipate was the intensity of interest in the portraits, and it caught us a bit unawares, because only a quarter of them were available online by April. It made for a very busy March and April, because everyone wanted them in time for April. They wanted to have them for their local community event.
They wanted to have all their boys, they wanted them in high res and they wanted them now, so it was a very busy time for us, pulling things off the server. And there were some very interesting ways our portraits were used. I had to put this photo in because it's so bizarre: there was a commemorative rugby league game on Anzac Day that included tanks coming in and firing off, but they had also done some research on the original players, people from Queensland Rugby League who served in World War I, and they did a display of the portraits of those soldiers alongside the very jingoistic tank aspect of it. The portraits were also used on the front page of the Courier-Mail on Anzac Day; in fact they created this really nifty digital mosaic you can see on the left, where you can zoom right in to see an individual portrait and zoom right out to see the portraits form a face. They were also used on one of the council ferries and on a bus, which was sort of unexpected for us. But the one I want to spend a bit more time on as we come towards the end of this presentation is our collaboration with the National Archives of Australia and Discovering Anzacs. This is probably one of the most significant outcomes, and one we did not expect at the beginning.
The Discovering Anzacs website went live not long after we'd started the digitising, and we got in contact with them because of that remit. We thought, wow, we really want to try to connect our data; wouldn't it be great if we had the service numbers of these soldiers, and wouldn't the National Archives really like it if we gave them portraits they could add to Discovering Anzacs? So we had a couple of teleconferences, and initially they were a bit, "Oh, that sounds really great, 27,000 portraits," and then we said, "They've all got names," and at that point you could practically feel the reaction coming across the screen, because lots of portraits of World War I soldiers are not named. These are all named, with a first initial, a second initial and a last name. What they did was write a script against their API (they've since made it sort of public, but they used the staff API). We'd send them three or four hundred soldiers at a time, they'd run them through the API and send us back the soldiers that matched their records. Currently around 50% of our soldiers are matched with a National Archives record, and what that has given us is something wonderful: it gives us their full name, their service number and other information about them, and most excitingly it gives us their National Archives person ID, which is their unique identifier. If you want a unique identifier, and you want to create a linked open data set, and you want other people to create linked open data with that same unique identifier, that's a really exciting prospect for us. Getting that data back has made the project take a bit longer, because it has to go down to the Archives, come back to us and then be made into catalogue records, but the value add is really amazing. Here is John Trevalan Matheson. It's not a portrait from the newspaper, because he's just too handsome; be still my beating 21st-century heart. So this is John
Trevalan Matheson's record in the National Archives database, and we've uploaded our portrait of John. He was known as Jack, and that's how I like to think of him. His father was a professional photographer, as you can see. So the portrait gets uploaded, and while our volunteers are in there they also provide these links on the right-hand side. They provide the link to the high-res photo, and we thought, while you're in there, why not add more value? It only takes two or three more minutes. So they head off to the Australian War Memorial and provide a link to its embarkation roll, so that you can see the War Memorial's information about the person on this page, and finally they do a search in Trove to see if there was a newspaper article about the person and provide a link to that. With Jack it's fantastic; you get so much richness. In the middle there you can see the high-res image ready for download, down on the right you've got his embarkation roll, and then we've got two stories about Jack, one about him being married and one about him being welcomed back to Brisbane. Jack was an interesting fellow, because after the war he went as a missionary to China, got lost in Nepal and wasn't heard of for two years. A memorial service was held at home, a funeral, the whole bit, and then he walks out of Kathmandu alive and kicking, comes back to Brisbane and makes a nice living on the speaking circuit for the rest of his life, talking about his experiences in the wilds of China. It's a very interesting story, and if we hadn't had those connections we'd only have seen Jack Matheson's portrait; we wouldn't have seen the whole story around him. The second-last thing we learned was to celebrate the milestones. There were so many milestones that we needed to celebrate, and with a really long-term project it's very easy to lose enthusiasm as you go along and along and feel like you're getting nowhere, and you're just fixing up
spreadsheets and doing data correction and uploading images and talking to volunteers, and you think, when is this ever going to end? The little milestones along the way were really important for us. And last of all, I think remembering what it's all for is critically important. I'm just going to tell you, really briefly, the story of Valentine here; he was in the previous slide. Valentine was a young man from Burtican who enlisted at age 20 in 1917, went off to the war and came back in 1918, at the end of the war. He was only seen by his family once, when he was dropped off at Cherbourg; he saw the family, disappeared, went north and was never in contact with them again. He had a really sad existence and died alone, and partly that's because Valentine was an Indigenous soldier, an Aboriginal man, and when Aboriginal soldiers returned after World War I they didn't have any of the rights that other returning soldiers had. They didn't get money, they didn't get support or anything like that, so poor Valentine died a lonely death and we knew nothing about him. Logan Libraries, just south of Brisbane, did a project to try to reconnect Indigenous families with the servicemen of their past, and Valentine's family was connected. They had not even been able to find his service records, because there had been a spelling error. This is his niece Iris, and what you can see there is Iris holding the enlarged version from Trove, that sort of poor-quality image, and we were able to supply Iris with the full-resolution framed picture of her uncle, which was used in the digital story. It was an incredibly moving experience to hear Valentine's story. I went up to her at the launch and said, "Look, Iris, it's so nice to meet you; I was the person who sent the photo of Valentine," and she hugged me so hard that I actually had bruises on my arms for the next couple of days. Whenever I was getting downhearted and disheartened and thinking this will never be
finished, I think about Iris, because she's the reason we do these projects: to connect the person with the narrative, to connect the content we have with that unique story. What's next for us? I'll just flick through those slides. What's next is to complete the project, which we hope to do by November this year. All of the supplements, right up to the end of 1919, went live on Monday of this week, and as my colleague Serena said to me in an email, "Stick a fork in it, Margaret, we're done," and I was pretty happy about that. We've got 19,985 portraits available online now, so there are about 7,000 to go, and we hope to be finished by November. After that, our next goal is to publish the data set as linked data, and to encourage other institutions to use the NAA person ID as the unique identifier. Thank you very much for your time; I've really enjoyed speaking with you.