Good evening, everyone. My name is Chris Dempsey. I'm the lead developer for Recollect at NZMS. I don't have a PhD. I'm not studying for a PhD. I don't have a BSc. I don't have a BA. I have, however, been working with the internet, and with crowds, essentially, for over a decade now. So I feel I'm fairly clued up on what the crowd is about and what the internet is about.

A little bit of background on me. I developed the mortality calculators for the Sorted website, which, as a young father, was a bit of a depressing thing to do. I then moved on and did things like trade sites, access for tradesmen who only have thumbs. When you only have thumbs, the internet is a difficult place to be. So we gave them a loyalty site as well, to reward them for actually using their thumbs correctly. But I also have a wild crowd under my belt. I run the Dineout website, and it employs an awful lot of people for free. We review restaurants up and down the country, and if you ever want a wild crowd and wild content, that's the kind of project to have.

What all this means is that I'm not a designer. I'm a developer. I spend all of my time in code view. This is how I see my life and how I spend my time, because data is cool. I love data. I love interconnected data. I love the way it fits together. I love the way that it's awful on one side and fantastic on the other. Data, for me, is king.

However, let's talk about Wanganui and their crowdsourcing project. The Alexander Library in Wanganui has a huge historical index of people and subjects spanning about 150 years. It has been maintained by volunteers and covers births, deaths, marriages and other interesting bits and pieces from the local newspapers. These index cards are constantly in use by members of the public who come through the door.
They then utilise the information on those cards to request pages, or whole and entire newspapers, to be pulled out of the archives, scanned, copied and digitised. Now, Wanganui Library stopped adding new cards, which is probably a good thing, because they've got over 150,000 of them. So the big challenge for them is how to get the information off those cards and into a database in some way, to make it accessible but also so they can carry on extending it.

All those cards. Thankfully that isn't their cards, but it's not far off. 150,000 cards means a lot of work, and any small task multiplied by 150,000 becomes monumental. If we assume that a single card can be read and transcribed in about a minute, one minute multiplied by 150,000 is 2,500 hours, or 312 eight-hour days, or about a year and a half for some poor soul to slog their way through. However, not all cards are made equal. Some have a lot of text and some provide other challenges, and because of the scale of the job, simply increasing the time per card to two minutes adds another year and a half to the project. And did I mention that some of the cards are double-sided?

So this sounds like the perfect candidate for a crowdsourcing project. Many hands make light work, and so on. So a plan was required.

The first question that has to be asked before even starting down a crowdsourcing project is: why? You'll need a good reason for starting a crowdsourcing project, because you'll need to explain it to a far wider audience. You need to explain to them why they should bother. What's in it for them? If you can't explain it to them, how are they going to become enthused? If you can't explain to every single person why they should be filling in cards and becoming part of your crowd, don't start.

Wanganui's why is that the cards are not being added to. The data they want to add is stacking up, and the cards are still being used. They want to carry on using these cards.
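The back-of-envelope sums above can be sketched in a few lines of Python. The per-card times are the assumptions from the talk itself:

```python
# Workload estimate for transcribing the card index.
CARDS = 150_000

def workdays(minutes_per_card: float, hours_per_day: int = 8) -> float:
    """Eight-hour working days needed to transcribe every card."""
    total_hours = CARDS * minutes_per_card / 60
    return total_hours / hours_per_day

print(workdays(1))  # 312.5 -- roughly a year and a half of working days
print(workdays(2))  # 625.0 -- doubling the per-card time doubles the slog
```

The point of the sketch is how linearly the pain scales: any extra seconds per card multiply straight out by 150,000.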
They want to access the information that is on there now, and they want to add more information in the future.

The next question, still before you can start talking about crowds, is how to digitise your materials. If this is an in-house project you could possibly just work off the original materials, but it's more likely that you're going to want to digitise them and put them onto the internet, or somewhere. That gives you a much, much wider audience. We've got a number of very successful methods of quickly and efficiently scanning and digitising all these materials, but it depends on what your project is. If you're trying to map a graveyard, all you need is an iPhone with a GPS and you have all the information you need.

We started with a sample of about 1,000 cards from Wanganui. We tested a variety of scanning, indexing and sorting methods: fronts and backs and sets and groups and all sorts of things. We managed to convert those cards into 1,252 transcribable items, grouped into sets, front and back all noted. A quick OCR run and we were ready to roll.

Probably the next most important question to ask, or even the most important question, is: what do I want to achieve at the end of all this? Do you just want transcription? Do you want the cards titled and indexed? Does the content need to be contextualised? Do you need to know where the words are on the page? How accurate does the transcription need to be? Do you want typos and obvious errors on the originals corrected? All through this you have to keep asking yourself why. Keep asking yourself that over and over again. And remember the one vital fact: the more context you require from your material, the more intelligent your audience, your crowd, needs to be. If you're not going to get everything from that job the first time you run it, don't start it.
Wait until you have actually answered all those whys. Wanganui liked the idea of a fully contextualised card. They have a lot of indexed information on every single card. It would be great to be able to say that in the Wanganui Chronicle, on this particular date, this particular event, a birth, death or marriage, happened to this particular person. I loved that idea, and it was nearly fricking impossible to use. The concept was very, very sound, but the crowd would never put up with it.

So that brings us to the most variable element of the entire project. Who? Who is your crowd? If you're publishing this out to all and sundry, then your crowd is wide and diverse. It includes doctors and dentists and murderers. It contains accountants and school children, those that are truly humorous and those that are just funny. The world is full of wonderful and interesting people, but it's also full of crackpots and weirdos. A quick read of the comments on a vaguely controversial Stuff article or a YouTube video will give you insights into the minds of others. We don't all have the same moral compass, beliefs, skills or intelligence, and these are the people who will be your crowd. Even a small, tight-knit group will have wide differences of opinion and levels of helpfulness. We'll talk a bit more about your helpful crowd later.

Wanganui's crowd is a mix of their current volunteers, their in-house staff, interested people in the region, and local communities and other organisations.

So the crowdsourcing proof-of-concept project was launched. We loaded the cards into Recollect and presented them for crowdsourcing. Our initial launch was to a very select few. We have slowly leaked it out to other, wider groups for testing, collating their feedback and tweaking the inputs and usability as we went. When you launch to your audience, they'll need to know what you're doing and why.
Your home page for the project should show its current status, leaderboards, stats and other activity, but also a reason to participate. Don't make it too complicated. Make it sound like a fun thing to do, even if it's the most boring job on Earth.

Our first volunteer group, which just happened to be our NZMS staff, were set the task of just getting stuck in. They had no introduction, no training, no idea what end results we were looking for, just a tool to get started with. As a result of their inputs and feedback we built a short training module that used the actual cards to guide users through the process. This helped us to explain what to look for, where to find the data, where to put it, whether typos were allowed or should be fixed, and how to use the various tools on the page to make the transcription easier. The tools were also able to give users feedback that they may have made a mistake, and guidance on how to correct it, and each step was a little harder than the one before it. Just three cards in our training programme, and a certain amount of leniency, made the trained users more savvy in their submissions, more consistent and reliable. Only one person didn't make it through training, and we'll discuss her later.

A golden rule of any crowdsourcing project is: don't let your users get stuck. If they have to complete a large or complicated card or page before they can continue, they will get discouraged and they will give up. Everybody has different skills, and even if a user is competent, don't force them to complete tasks they don't want to.

Of course, at some stage you're going to end up with huge wads of data. How you deal with the task of administering it is up to you, but here's a few things to consider. Do you want or need double entry? It works well for Wanganui's index cards, as they're short, concise and often laid out pretty well.
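A three-card training gate with a little leniency could be sketched like this. To be clear, this is a hypothetical illustration, not the actual Recollect training module; the model answers and similarity thresholds are invented:

```python
# Hypothetical sketch of a short training gate: each attempt must be
# close enough to a model answer before the trainee moves on, with
# some leniency for minor slips, and each step stricter than the last.
from difflib import SequenceMatcher

TRAINING_CARDS = [  # (model answer, required similarity) -- invented examples
    ("SMITH, John", 0.80),
    ("SMITH, John. Born 12 May 1891.", 0.90),
    ("SMITH, John. Born 12 May 1891. Wanganui Chronicle, p. 4.", 0.95),
]

def passes_training(attempts: list[str]) -> bool:
    """True if every attempt is similar enough to its model answer."""
    for attempt, (model, threshold) in zip(attempts, TRAINING_CARDS):
        ratio = SequenceMatcher(None, attempt.lower(), model.lower()).ratio()
        if ratio < threshold:
            return False  # stuck on this card: offer feedback, not a dead end
    return True
```

The leniency lives in the thresholds; tightening them step by step is one way to make each card a little harder than the one before.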
However, a large page of text, or a translation, may require piecemeal entry: multiple users editing the same page until it's considered good. One small consolation of double entry is that you can also employ auto-acceptance: if two cards match, or match closely enough, then you can consider them good and automatically accept them right away. The downside to this is that if you do get a whole bunch of malicious users who just type bum, bum, bum on every single card, you're going to get a lot of bums in your data.

You can, however, convert your crowd from submitters into acceptors as well. Once a user has had a certain number of cards accepted, a certain level of approval, they've shown their worth, as it were, and you can set them to be one of your approvers as well. So crowdsource the submissions, crowdsource the approvals, and you never have to look at your cards ever again. If, however, you do want to review every single card that's edited, remember you'll have to read every single word that your crowd submits. All of it. Every single word. Crowdsourcing the approvals sounds pretty cool now, doesn't it? In the end, the method of entry and acceptance is up to you, based on the material being transcribed and the method used to collect it, but don't think it is just a quick and easy task.

And the crowd. The crowd. The great thing about a crowd is that it seems to follow all of the theories of chaos and randomness at the same time. Take a moment and think: if the crowd was made up of just your immediate family, how diverse would their skills and experiences be? Would you for a moment consider that they would toe the line that you have set for them? The crowd is one of those things that you have very little control over, even if you know them all by name. When we released a test version of the Wanganui crowdsourcing to a small group of well-known people, one of them failed the training. They could not get past card number two.
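Auto-acceptance on double entry can be as simple as a fuzzy comparison of the two independent submissions. A minimal sketch, with an invented threshold (our illustration, not Recollect's actual logic):

```python
# Sketch of double-entry auto-acceptance: two independent transcriptions
# of the same card are compared, and if they agree closely enough the
# card is accepted without a human reviewer ever seeing it.
from difflib import SequenceMatcher

AUTO_ACCEPT_THRESHOLD = 0.95  # hypothetical tolerance for minor differences

def auto_accept(entry_a: str, entry_b: str) -> bool:
    """Accept a card when two independent entries effectively match."""
    return SequenceMatcher(None, entry_a, entry_b).ratio() >= AUTO_ACCEPT_THRESHOLD

print(auto_accept("SMITH, John b. 1891", "SMITH, John b. 1891"))  # True
# The failure mode from the talk: two vandals typing "bum" also match
# each other perfectly, so auto-acceptance alone won't keep bums out.
print(auto_accept("bum", "bum"))  # True
```

This is why pairing auto-acceptance with trusted crowd approvers, or spot checks, matters: agreement between two entries proves consistency, not honesty.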
I had to look through her training transcript to see why she failed, and it's pure user error. She refused to read the warnings that were displayed, got frustrated, and abandoned it. Turns out she was muddling the title and the body of the card. A lesson that should have been learned in card number one. But obviously it wasn't. I know this person, and she makes monkeys at a keyboard look like Shakespeare. She's keen to help, but the quality of her contributions would always be in doubt. We love her dearly, but we don't want her in our crowd. Training weeded her out.

Many other people have breezed through the training process and then got stuck into correction. Some are fastidious and precise, ensuring that every comma and apostrophe is in the right location. Others are quite accurate, but eventually they get lazy and start to let some errors slip through. Some, of course, were lazy from the first card they ever did. Thankfully, so far, none of the crowd has been malicious. Except that one. Evil. We've received some odd entries, but none that contained anything more than what was on the card to begin with. That's not to say that we won't get one, but we have the tools on hand to make sure that if they are identified they can be banned.

Accuracy. It is what you want it to be. If you are converting into a contextualised format for a database, then you might need high levels of accuracy: every i dotted and every t firmly crossed. Wanganui wanted the titles and card numbers to be extracted, after we discovered that we weren't going to be able to do the full contextualisation because the crowd would have revolted. But the rest of the card became keyword searchable. Therefore punctuation and spacing were less important than having the numbers and letters entered correctly. Correct capitalisation was of no great concern, and spacing and layout were up to the individual. But the question still remains over what to do about the typos and abbreviations in the original images.
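Since the cards only need to be keyword searchable, entries can be compared after normalising away capitalisation, punctuation and spacing, judging them purely on their letters and numbers. A possible normalisation, as a sketch (our illustration, not Wanganui's actual rules):

```python
# Normalise an entry down to the characters that matter for keyword
# search: lowercase letters and digits only.
import re

def normalise(entry: str) -> str:
    """Reduce an entry to lowercase letters and digits for comparison."""
    return re.sub(r"[^a-z0-9]", "", entry.lower())

# Differences in case, punctuation and spacing all wash out:
print(normalise("Smith, John (b. 1891)") == normalise("smith john B 1891"))  # True
```

Comparing normalised entries also makes double-entry matching more forgiving: two transcribers who disagree only on commas and capitals still count as agreeing.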
If a word is spelled incorrectly on the card, should it be transcribed as such, or as it was intended? Spacing on the originals is no longer of any concern, but should we correct mistakes made on the originals as we go, like typing over the edge of the card? If so, how does this impact on the double-entry comparisons? Wanganui's answer is: we don't know yet. We asked a lot of questions at the start, but you can never ask all of the possible questions, and during the project your rationale and reasons may change. So be prepared for the occasional goalpost move.

So, lessons. Constantly ask why you are doing this. If you find it hard to explain to your boss, your colleague, your wife, husband, daughter, child or some random person walking down the street, if you can't make them grasp what it is that you're trying to do, then maybe you shouldn't start.

Make sure that your material is crowd friendly. Digitise it and make it accessible. If you put it on the web, then make sure it's web friendly. If you're doing it in-house, make sure you have the technology to access and display it.

Make sure your tools are crowd friendly. Offer training to the newbies. Offer them the chance to skip the hard or boring bits. Show a project status and leaderboards. If it's on the web, make sure it's accessible to a wide variety of browsers; libraries aren't renowned for their high-tech equipment in-house.

Make sure that your crowd is friendly. Be prepared for malicious users. Be prepared for idiots. Be prepared for novice users and monkeys at a keyboard.

Test everything. Test it yourself, then with colleagues, then with a close group of friends and a friendly external group before you make it public. Test it again. Seriously, you need to be intimately familiar with your tools, your data and your materials. Test it at work, test it at home, test it on the bus, test it on your phone. Test it a lot.

And promote it. Just because you built it doesn't mean they'll come.
A wonderful promotion may generate a bubble of activity that then tapers off to a few dedicated individuals, but constant marketing and promotion of your crowdsourcing project will keep it alive and moving fast.

And enthuse and reward your crowd, or at least recognise their input. It could be prizes, or just fame. But you have a volunteer workforce doing a fantastic job for you, and they should be recognised in some way.

And use the data. I know it seems obvious, but remember why it was that you wanted to start this project in the first place. If you manage to source a crowd and they produce your results, use the data. Make it yours. Turn the power of the crowd into something useful. Wanganui's crowdsourcing is populating the Recollect system directly, so they're able to use the data as soon as it's approved.

Their crowdsourcing project, however, is a trial, a proof of concept, an experiment. If it all goes well, they'll have the perfect system in place to launch into their full set of cards. And if you want to see their system at work, visit them at wanganuilibrary.recollect.co.nz, or, I think, straight off wanganuilibrary.com. You can register, undergo the training yourself, start correcting cards and be part of the in-crowd.

So this has been a brief introduction to Wanganui Library and how they have started their crowdsourcing. There's a lot more that could be said, but the first stage of any project like this is a discussion, and we're more than happy to discuss this project, and possibly any project that you have. So, any questions?