Hello everyone. I'm very glad to be here. It's actually the first time I've ever been in New Zealand, so I'm very, very happy to be here. And what a wonderful welcome this morning, the song; I won't be the first keynote to mention that every conference should probably start that way. Early in the morning I was taken with Trevor Owens's talk about the nature of frameworks of understanding, in particular the fact that software implementation can enact politics, because very soon after that Tim Sherratt's talk about the perfect face provided some perfectly terrifying examples of what happens when software implements those politics. One of the things that has struck me about the NDF forum, again because it's my first time, is that all the talks have been practical. They've been backed by some kind of imperative to do something, to make a change, to have a solution, to have some kind of discussion that ends in something quantifiable. Too often I've been to conferences in Europe where it's been white-paperitis: they come to deliver a paper, and they don't really care what people think. It's been good to see all these things put out there without the notion of a future utopia, this wonderful thing that we're in phase two of, where at phase ten everyone will want to use it, but no project ever gets to phase ten; it's always the white paper that they need to churn out. The range of talks has also been quite astounding. I've been very taken with the huge leaps between hardware hacking, social media games, text mining, user-generated content and how to handle it, and the talk about maker spaces and their role in how they can teach us how people might learn. There's also an exciting range of talks tomorrow. Just glancing at the paper, I saw that there was a talk on how to start a kick-ass social web series sitting next to the somber and quite tragic tale of digitizing burnt files and archives. 
Having those two threads running together is actually quite rare; very often you get very connected sets of content, so having the range is a wonderful thing. I won't go over the lightning talks, because you've just heard them, but it's been wonderful to pick up the threads of collaboration and how people can learn from one another, whether it's from a colleague's representation of a system they've built down to the stick and the carrot. Hopefully my description of the work I've been doing with British Library Labs is not too out of place in this range of hopefully wonderful talks, because what you're doing is real, practical and above all important work, and I hope that what we've been doing at the British Library fits in. So the talk is titled Farces and Failures. That's no reflection on anyone in this room apart from me. Farce is actually something that runs deep in the problem space we've been working in, in terms of working with researchers; I'll go into that more later. The failure part is there purely because there is very often a fear of failure, and so part of what we've been doing is embracing failure, embracing temporary solutions, embracing the fact that we might build something and no one uses it, and that's fine, because we have a plan to ditch it and move on. Sometimes the data is more important than the service, after all. So, British Library Labs: that word "labs" is a bit awkward and unfortunate, because we're actually an Andrew W. Mellon project. It's not a physical space so much as an open place where people can come in and work, but it's not purposed; it's not a lab; we don't have computers. Now, part of the farce notion I'll get on to is based in what we do with names and labels. The names that we give things shape the questions that people will ask us about them. They're intrinsic in how we deal with things, and yet I've seen very little thought given to them, because we're too used to just giving traditional answers. 
It also affects the assumptions people make about the things we have and the things we can do. The word "labs", given its context, I think most people might understand; but that is a collection of Labradors and an excuse to put a picture of dogs in my talk. The point is that "British Library Labs" gives you the context to know it's not something about animals; it's something in that building, but it doesn't really define anything, because "labs" is just one of those words. So what the Labs project is about is taking the situation of researchers coming into our ridiculous, unkempt data sources, for want of a better word, and, instead of leaving them exploring in the darkness, trying to provide at least some guide through the crypt so they can work their way to something of worth, rather than spending their time filling in forms or trying keyword searches that will never go anywhere, because the data just isn't at the end of that search. Now, people expect digital projects to produce well-engineered, wonderful, stable things: bridges between people and data that'll be there for years. This is my kind of bridge. This is the kind of thing I make. I make things that connect points of land, or data, to people, by trying to put them in places where people have told me they need them. And if people don't use them, that's okay; we didn't spend that much effort building them. But if people do use them, then next time we can tell people that maybe a bridge like this is necessary, maybe more people want to use it. 
So in a nutshell, we're working with researchers on their specific problems, not thinking much wider than that on the problem itself, but the things we take away from it are the general things that might affect the researcher community as a whole. Instead of presupposing what researchers need, it's working with them to discover the areas in which they're working and the problems they have, whether those are technical or simply vocabulary. One of the problems we have is that researchers see us as the gatekeepers, the holders of information. They tell us the things they feel we need to hear; they use the words they think we need to hear; they don't tell us what they actually need, because sometimes that vocabulary just isn't there, and the understanding of what we can and cannot do isn't there. We found that showing researchers the data in its rawest state immediately changes the questions they ask, from very light questions to very directed, very informed questions. It's got nothing to do with their background; it's to do with what they know of what we have. So the Labs, in its structure, is about experimentation; that's why the word is there. We're there to experiment and discover and map out areas of what we can work with, and we do this in a number of ways. The primary one, the most formal one, is an annual competition. We put out a call for researchers and we ask them for ideas: what would you want to work on, what do you want us to help you with? The bids come in, we pick two, and we work with them for anything between four and six months, two projects at a time, and that gives us a lot of direct evidence about what people actually need and what problems and barriers they hit on a daily basis. We also do a lot of on-spec internal collaboration with the existing curators in the library. The project is situated in a department called Digital Scholarship, and there are a number of digital curators, or more specifically 
curators of digital content, and they're the people who try to work with curators who may not have the technical knowledge about what's possible and what isn't. We also do external collaboration, which is a fancy way of saying that people email us, we see if we can do the work without getting into too much trouble, and we collaborate with whoever has an interesting idea that we can fit in, given the resources we have. Formally, the British Library Labs project is me doing the technical work and a colleague called Mahendra Mahey doing the line-manager, admin and reports work, so there are two of us, and we try to fit in what we can. And even though the Mellon Foundation is very keen on researchers, sometimes creativity is simply creativity, and we're not going to label someone's idea as good or bad based on their affiliation. We started in 2013, and the British Library didn't really have a labs or anything of that nature before, so we initially hit some preconceptions about what the things people wanted were. We also hit some other preconceptions, such as what the library's services seemed to expect of researchers: that they would come in and want one thing, that they knew what they wanted and could give you a list, and you gave them what they wanted or said no, and they went away and read it and came back and gave it back. That was the model, and it still is the model for a lot of library catalogs. The other way around, the researchers expected us to be problematic; they expected us to put up a fight, and so instead of trying to work with us they tried to jailbreak the content as quickly as possible: give me all of collection X, where X was the name we gave it, like the John Johnson collection in Oxford, or Girdwood; there are all these different names for collections that we give them, and people want to take them wholesale. However, those names, as we know, generally come from the person who paid for it, the eccentric Victorian who collected it, the fact that it was found in someone's box 
in a cupboard. These are some of the reasons why things get picked out, and some digitization projects will be given a name that has no indication of the context of what has been digitized. So, back to this idea of farces. The common plot mechanism in a typical theatrical farce is a conversation between two participants: they both agree to something, they're happy, they agree a course of action, and they walk away with two completely different ideas of what the discussion was. We've got a huge history of it, from the ancient Greek plays up to Basil Fawlty in Fawlty Towers, to my favorite sketch from Ronnie Barker, Four Candles, with the guy walking up to the counter and asking for "fork 'andles"; the shopkeeper goes to a drawer, pulls out one, two, three, four candles, puts them on the table, and no: fork handles, handles for forks. So even though we're using words that sound the same, these contextualized words don't mean the same thing to different people. We can have conversations where everyone's in agreement, but we'll walk away with completely different ideas of what's been agreed, and that's at the heart of what we've had to do with researchers: in the interactions we've had, we've had to be very aware of being careful with words. So, some common farce-inducing words in our sphere, and these won't be all of them by any means. Talking to Trevor Owens earlier: "archive" is another word that's up there with "collection"; it's just not a good word to use. "Collection": we've gone over this. We give it a name that generally has little to do with anything apart from its provenance, and yet people work with collections; they'll take their own collection, because it's easier to deal with a library if you can give one name for things rather than a list of 10,000 objects, but it just isn't that kind of mechanism. "Access": access is my favorite bugbear, a word I hate. We just shouldn't use this word, we really shouldn't, because I could talk about accessing newspapers. I have access to newspapers; however, I don't have access to 
newspapers. I have access to a portal that gives me a page-turning interface that lets me look at an article if I'm a human and not a machine. That's one kind of access, but my code needs a different kind of access, and so this word "access" doesn't actually encode anything apart from something trying to get hold of something else. It's not a good word. "Content": we can't avoid this word, but it is a horrible one. Illustrations, artworks, manuscripts, personal effects, oral histories: and yet, "content". We begrudgingly have to use it, but we try not to, because it will confuse people; other things are content too. The OCR XML is a form of content, and yet very often it's not delivered through normal library services. "Metadata" is very similar: one person's metadata is, as was said earlier in a lightning talk, someone else's text-minable data, so we've got to be careful about when we say metadata, when we say data, and how we deliver it. And "crowdsource", which I'll come to at the end of the talk, is a terrible, terrible word: very often it's not a crowd, and very often it's not really sourcing data. So after all of that, I'm going to break all those rules and talk about an actual named project which has no context based on its name, and I'll explain what the context is. Microsoft Books was in competition with Google Books for a kind of book search, around 2007 to 2009 or thereabouts; it was Microsoft Live Search Books, or something of that nature. They worked with a number of institutions, including the British Library. They started scanning in 2007 and had aspirations to scan more, but in 2009 or thereabouts they decided they didn't want to pursue the project and didn't want to compete with Google. They digitized 49,000 works, 65,000 volumes or thereabouts, 22 million scans. But they were very nice, because when they finished the project they tore up the contract and said they didn't want any contractual ownership of the files, and gave us the 
files, which meant it was a very short step to say: for all the things that were scanned that were public domain, shall we make the files and all the derivatives public domain as well? And we had the power to do that. We have in the British Library an access and reuse committee whose main job is to convene and decide whether an open license is useful, or even permissible, for a collection, and so one of the first things the Labs did was to take this collection, which was all books from before 1900, and get it all cleared: all the XML, all the OCR, all the metadata, all the actual scans, all in the public domain and clearly licensed, so that we could actually work on it. The books have been online since 2012 via a standard page-turning interface, which works as well as page-turning interfaces do, and I don't mean that dismissively; I'll go into why page turning is a thing that needs to be looked at. It's not to be removed, but it is something to talk about. And when I say the usage statistics are very low, that's in contrast to what I'm going to go on to. As part of the competitions we ran, we have a number of projects we've already completed, six so far, with different researchers. One of the first was with Pieter Francois. He's a book historian, very interested in the history of books, and because he knew a bit about the collection he was interested in travel, because he knew it was well represented in the collection. That doesn't mean travel is well represented in 19th-century works; it means that in that particular digitized collection there was a bias, which was interesting, because it might hold some data. So we initially set out to make something that would produce a statistical sample based on a search, so that instead of having to look through everything you could look at a much smaller subset and divine a rough trend from that data, looking at it from a metrics point of view. However, what it 
actually turned out to be was a device for seeing just how biased a digital collection is in comparison to what we actually hold. In this case the blue line is the physical items we hold in the British Library, published between 1800 and 1900, that are monographs. People at the back might struggle with this, but the red line at the bottom is what we have digitized. You might notice that the lines are quite different. Even just on amount it's different, but more importantly, the peaks and troughs, the highs and lows, the trends aren't mirrored; even on a simple, naive publication-date distribution they aren't really the same distribution. So there's a hint at the bias in the collection. Now, people who have been involved even peripherally with large-scale digitization projects will know that there is going to be a bin to one side: books that are too big, books that are too small, books that are too fragile, books that haven't been cut yet, books that are marred, books that can't be opened, books that are damaged, books that can't be found. The rejects go there and they don't get digitized. They're not part of the collection, because the project runs to a timescale: you digitize X books, and some books just won't go through the normal process. I've found from talking to people that the documentation about which books don't make it is actually very poor. We know which books do, but we tend not to take much stock of what doesn't make it. This happens even in large-scale collections like HathiTrust. In the article I mentioned there, from 2012, Allen B. 
Riddell took an 1800s English bibliography, a list of novels that appeared between 1800 and 1836, so they're in the public domain and we would expect them to be fairly accessible, and looked for them in HathiTrust: between 47% and 68% were found. So even HathiTrust, with all its accumulated scans, did not reflect the distribution of books at that time. It didn't have some books, and it may be because some books were printed poorly, it may be because they were harder to come by, but they're still part of the history, and they're not represented in the digital copy. This is to point out that when we tell people to make use of the collections, we should be a bit more careful to make them aware of the gaps. Now, there are a lot of gaps, and even though we've been talking about books, I think we're all aware of some of the others, notably in web archives: one of the big holes would be Facebook, another might be Reddit. We have these gaps in our cultural history, not just in the web archive but all the way through. For example, who's familiar with the square brackets, the "square brackets of the soul", in a MARC record? When you're describing a book you might have some inferred information, things that may not be printed in the book but that a cataloger or someone similar might infer and add in because it's not present. Those are the things that have been caught; there's a whole raft of information that hasn't been caught, the contextual information about what things were normal, what things were to be expected. So I took some metadata about the books, since we're talking about them, and all the bits where it said "this was inferred" I colored blue (it should have been red, but people who work with OpenCV might guess what mistake I made), and all the bits which were actually from the books, genuine and assured, were in white. And the metadata looked like that: the white areas are the stuff that we're assured appears in print, and the stuff that's 
blue is the stuff in the square brackets that we're a little bit unsure of. If you look closer at the graph, you might notice there are a load of peaks. Can you guess why they're there? It's a bit hard to tell from the bottom gauge, but it's the fives, the tens, the fifteens, the twenties: most of those peaks are due to inferred publication dates, where people have guessed that a book is from around that kind of period. So it permeates the whole collection. So: algorithmic interpretations, text mining, natural language processing, digging through the data. Should we just not bother, since we know it's a little bit holey, a little bit rubbish, might not be true? No. What we have to do is embrace failure, embrace that we could be wrong, embrace that the things we come up with may not be cast-iron conclusions and we might have to change our minds. We're in an infancy of understanding, which is a way of saying that we haven't done enough to know that we're wrong, but we have done enough to know that we need to do more work. We're also in an exasperating situation where black boxes appear everywhere. One key one is Google: you type in a search term, and it doesn't just look for that word; it does a whole raft of things behind the scenes. For instance, if you type in, say, "paintings of flowers", it doesn't look for those keywords; it will go to its image bank and look for images that have been painted that also look like flowers. It's already doing things behind the scenes that you're not quite aware of. And my favorite one in this vein is sentiment analysis. This one really drives me up the wall, because frankly people use it without thinking about it. They'll say "this text is 68% positive" without any basis for how they did it, without a methodology. For instance, I've seen someone do it with 19th-century prose. They took books from that period and ran them through sentiment analysis online, and when I pushed them on where this information came from, the data was from a service that took tweets and trained its data set on 
tweets, on modern English, ran 19th-century prose through it, and expected to draw some kind of valid conclusion from it. It's completely okay to use code you couldn't write yourself or couldn't understand, but it's not okay just to believe a label; again we come to that problem of believing a label before we look at what we're doing. There are a variety of ways in which we can interrogate things like this. The top one is very simple: do sentiment analysis algorithms agree with each other enough to be valid? That's very simple and very testable, but very rarely seen. So what can we do to mine? One of the projects we've worked on is with a researcher called Dr Bob Nicholson, who is very interested in Victorian humor, so I must warn you: there will be jokes. There is no pressure to laugh, because it is Victorian humor. There might even be a video, so I'll warn you about that one. The point is, we had to go through newspaper archives to find jokes, because he was very interested to see the progression of jokes; the notion of copyright and usage wasn't really hard and fast back then, and people would very often just copy and paste, literally in some cases cut out and paste, and he was interested in seeing whether jokes went viral, and whether we can make them funny again. So this was the Mechanical Comedian, tweeting pictures with humorous jokes on them, and I have picked out what I think is the best joke, so you can gauge your expectations: why is a badly conducted hotel like a fiddle? Because it's a vile inn (violin). It's about as good as it gets. And so I shall try to bring up this little animation: My Mother-in-Law, animated before a live studio audience; this was made by an animator. That dog of Eliza's across the way bit Mother again this morning, and I want to know what you propose doing about it. I think I shall buy the dog, Edwin. Dear Mama's going; do say something pleasant, Edwin, my darling. You know I say goodbye to your mother most 
pleasantly. The last time you said goodbye pleasantly to my mother was at her funeral. Nevertheless, I shall endeavor to do my best. This is the last night we shall all be together; tomorrow I shall be far from here. Sixty miles away, isn't it, Abigail? Oh, sixty miles is quite near enough. Terribly sorry about the dog bite, Mama; Edwin said he would reprimand Eliza for her negligence. Well, I can't judge her and her mongrel too harshly; she hasn't a husband, and must have a brute of some sort to look after. You insult me in my own home! Now Edwin, that's quite impossible. How's that? You live in a rented house. Abigail, before I depart I must give you this bit of advice. Yes, Mama? Never love a man for money; it's wrong. But never love a man without money; that's just stupid. Good thing. So that was quite light-hearted, but I think the pattern of access is quite clear: to be able to do something like that, even though it may seem quite frivolous, we still need to access things that we may not readily get through the normal interfaces. A much more, on the surface, academic project was this one, the political meetings map, which is going on right now. It's to do with the Chartist movement in the UK, which was essentially about getting the vote for the common man; this was before the suffragettes, when only the landed gentry, people with land, could vote. There were meetings in pubs, it spread through word of mouth, and the Northern Star newspaper published meeting announcements, so we needed to mine the newspaper collection to find these meetings and try to put them on a map somehow. That's what we've ended up with; this is an earlier version. We have about 700 points now geolocated on a map, to the point where we organized a walk through London visiting spots where Chartist meetings took place, also trying to track the progress of particular speakers to see if events flourished nearby. One of the things that was most surprising: it was believed that the north of England, Manchester 
and those kinds of places, was the heartland, but the number of meetings we found in London itself has been quite interesting; we haven't quite got to grips with the scale of it yet. But we did a walk, and there's our researcher herself, Katrina Navickas. She dressed up in period clothing and we did a reenactment of a Chartist meeting, based on some of the documentation we found for that particular one. It was raining (it is London), but we struggled through it, and it was a very interesting event. The point is that the newspapers were key: being able to mine them in ways that we couldn't do previously was the solution. And we come back to this horrible word "access". The newspapers were accessible; we had access to them; we could get to them; we could see them; but we didn't have access to the newspapers. Keyword search fails miserably for most of the tasks we want to do. We've seen it in two projects so far, and it will always be a constant concern. We have this catch-22 situation where the people who hold the newspapers, the third parties we digitized them with, say: yes, you can work with them, just tell us which newspapers you want and we'll give them to you; and they don't like us saying "all of them". So we have this problem, and it's not something we've solved; we've got around it mainly through begging and borrowing, and getting hold of copies of things that were left on drives that probably shouldn't have been left on drives. Technically it's okay that we have them, but we shouldn't have kept them. Still, it helped. Some of the other things that have come through across all the projects: most of them would be much easier if everything had a URL, if every piece of content, every illustration, could be nailed down with some kind of identifier that went somewhere and told us about the thing. Most people in the room who have heard of linked data might go that way, but I don't even care about the RDF part; I don't care about that encapsulation. Just having something with a URL 
is enough. The other assumption to drop is that it's going to be a human at the end of a terminal. Someone may not click; it may be some code. That's the key point. And also to reconsider what data actually is: it's not just the article scan that you can see, which has been post-processed. We might want the original; we might want the OCR; we might want just the plain text, because, as bad as the OCR is, the plain text alone is enough to do a whole raft of things. Even though we can train things to spot jokes using perfect English, we can still do it to an extent with bad OCR, because it fails in similar ways given a particular version of, say, ABBYY FineReader. So, we had this Microsoft Books collection; we didn't have many URLs for it at all, and we wanted to interrogate it in a way that researchers could if they walked in the door. So I had to pretend, at the very beginning, to be a researcher, to try to run something that couldn't be run through the existing interfaces, and see if it was possible. As Tim Sherratt discussed before, the face-detection algorithms are fairly well known; they're easy to run; they're not a problem; but they're also quite interesting for finding faces where you wouldn't expect them. So I wrote some code to dig through the 22 million images, the scans of the books, to find faces, and it was throwing up some random things as I was playing, but people were watching over my shoulder, seeing what was going on, and the images themselves, in this apparently dry collection of travel accounts, were actually quite surprising and interesting, in a way that we didn't really understand before. So I wrote something called the Mechanical Curator, which takes a not-so-random walk, an almost-random walk, through the images and throws them up, and it's still going now, pulling out images. I've had to move it to a different server, so it has become more erratic, because it has its own mood, and by transferring it, it lost its old mood and now it's got a new mood, and 
it's going to do its own thing. Before, it was really obsessed with symmetry; now, well, we'll see what it's got, but at the moment it's picking out lots of plates. It likes people. It does do random walks, and it will get enthused about certain things, but it's still quite random; it gets bored very quickly. So this was a little thing, and way back at the beginning it had only a few hundred followers; it's now got a thousand or so, and people started asking about these images, because I was only posting one up every hour at that point: can I get all the images? So we talked internally and asked: do we have anywhere we can put one million, nineteen thousand images, on spec? They said no. So we looked around for free options, because we don't have that many resources, went to Flickr, tried to set the right license, couldn't, so we had to go through Flickr Commons and completely abuse that wonderful photographic collection by putting up all these illustrations, motifs, decorations, maps and what have you. Over the course of ten days the code I wrote uploaded them, and we now have over a million images on there, and people have been tagging them and playing with them and exploring them and telling us about them, because we really didn't know what was in them before. As you can see here, we're trying to curate sets of things. Some of these sets are actually done by the Mechanical Curator: it'll go through the view stats and pull out 200 images which have been least seen, or a couple of hundred images which have been least tagged, and it will generate various sets. It has led to the point where almost every image has been seen twenty times by someone, we hope; obviously we're having to trust Flickr's statistics. These are wonderful, meaningless numbers that are great for managers, because they show impact, but they don't really; they show that someone has seen them. The more important numbers are the 500,000 tags, or the fact that people are reusing them; and when we know about that, it's because they've told us, because it's public 
domain, they don't have to tell us about it. So we're trying this thing of iterative crowdsourcing, which is a term I've stolen from Mia Ridge because it just fits what we're doing. We're taking this collection blind and trying to get broad facts about it: map, portrait, landscape, flora, fauna, just rough terms, and a lot of them have just been folksonomies, where people have been putting what they want on there; then we take collections of those and do more things with them. When I uploaded the images, I did strip them of context, and that was in part purposeful, because we wanted to stimulate a different idea of what was in these books, what things you could do and what things you could see. In particular, the vast amount of CS research, the computer vision work, is done with photographic sources; by far the majority of the things in the collection we put up have nothing to do with a camera at all, and everything to do with the artist's eye. It also isn't done with just straight lines; it's etched, it's done in different hatching, and so it was intended to also provide a different spur, a little encouragement to the CS community, to say: here's a million images, good luck. So there's this lovely little piece, an interview with David Foster Wallace, which talks about perfectionism, and that's one of the things we need to get over: this fear of failure. Not embracing failure when it happens, or trying to cover it up, is one of the problems; but if your fidelity to perfectionism, your fidelity to trying to get it perfect, is too high, you just won't do anything. You'll write endless white papers about what you will do in 2020. And it is tragic, as he says, and I do like this: it's actually kind of tragic, because it means you sacrifice how gorgeous and how perfect it is in your head for what it really is and what it is going to become. You have to make that sacrifice, and it is hard, but it has to be done. There's a fear of imperfection, a fear of getting 
it wrong, a fear of failure, and it encourages us to value the systems that provide access, to value systems over outcomes, because systems are measurable and outcomes are very hard. We can measure hits; we can measure how closely something sticks to a specification. But to measure how useful it is to a community, to measure how much it means to an artist to have access to this kind of resource, that's hard. It also encourages people, once they've built an interface that fits, that they can measure: there is this reticence to build another, to build a second interface running in parallel. There's a fear that it would seem that one is old and one is new, and that the new replaces rather than augments. I'd like to talk about interfaces a bit now. Back in 2008 or 2009 I got a receipt printer from eBay, the little things you put on a till, and hacked it so that I could print whatever I liked with it. Cory Doctorow wrote a book around that time called Makers, and he released the text of the book under a CC license. So I took the text and, to try to make the infinite scroll of a web page in reality, I printed it on receipt paper. It came to two rolls, and it took me 45 minutes to roll up each one; it took a while. The point is to show that even if the metaphor of an interface is translated from one medium to another, it doesn't mean it's going to work in the new medium. So why page turning? Why is this the only interface I tend to see? Why is this the only means? Why is this anything but a comforting solution that we're used to? We've got Kindles now, which pretty much show one page, and most people are getting used to them. There have been studies, and it's that kind of throwaway thing where "there are studies", but they're about how people are getting much more used to reading in one column than with an open book, and there is an age distribution; please can someone find that study, because I've completely forgotten it. So we have this different method of access, we're doing these iterative crowdsourcing tasks, and this one I'd like to pick
out: we worked with some people at Wikimedia to tag the images with a very simple question: is it a map, or is it not a map? Just tag it as a map if it is, and go through the whole set. And it illustrates that it wasn't just done by humans: 30,000 were found by humans, but 20,000 or so were found by algorithms that were based on what the humans were telling them. Based on what people were finding, it was making more training sets to find more maps, and vice versa. So, hand in hand, we found 54,000 maps and have begun to georeference them; we've georeferenced 10,000 of them with the help of volunteers doing the seriously hard work of crowdsourcing them all. So we've talked about crowdsourcing, we've talked a little bit about algorithms, but can code identify subjective qualities? These 16 sad women were found by algorithm. There's someone called Mario Klingemann, who goes by Quasimondo, who makes his own software; he calls himself a code artist. He made some software that would cluster images based on things he found interesting, and he does this by collecting some example images, which then inform it to cluster based on images that look like them. In this case he did it so it would find women in sad poses; it collated them, colored them blue, arranged them, edited them; but he found sad women. This is a nice example of how computers are deciding what's happy or not, again coming back to that horrible, invasive face issue. In this case, the photo never happened; that moment was never recorded. What happened was that the phone used to take the picture took a series of eight or nine pictures in rapid succession by accident, and a Google algorithm called Auto Awesome picked the bits of the photos from that sequence where they were smiling and combined them, so that they were both smiling, they both had a happy face, they both had their eyes open, and they were both looking at the camera. In none of the pictures that were taken did all of those happen at the same time, but it stitched together the bits that
did. So this is the result of a computer deciding what people should do in front of a camera. The images haven't just been used for research; they have been used for things like coloring books for kids, taking the buildings that have been tagged by volunteers and pulling them out so that you can color them in. And in one notable instance they were used by an artist who was collecting eclectic images and making favorites; in this case he was inspired by the extremely strange pairing of the skunk and the machine gunner. He was playing around with the collage, made it a bit bigger, then a bit bigger still, and it became the frontispiece to the Burning Man festival in 2014. In 2015 it ended up at the British Library: four panels, eight foot by twenty foot, backlit, and it will be there until the Christmas tree is, or so I'm told. To celebrate we had a bit of a party, which is a bit rare for the British Library, and we had some representatives of Burning Man come along and show us roughly what they might want to get up to. You can see that they've grabbed some of the images from the Flickr stream, and it's just wonderfully bizarre and strange. We had a guy there who was eating glass and poking nails into himself, just because, but it seemed very, very central to the theme of Burning Man. So, the crowdsourcing word: there are so many bad assumptions about that word that I can't think of a different one, but here are some really bad ones. A crowd of people each doing a small bit: this long tail is what we found in just about every single thing where we've had volunteer engagement; a small group of people doing huge amounts of work, with small bits of casual work done by everyone else. It's getting to the point where the word "crowd" isn't great; it's about finding experts who can do stuff and looking after them, and not pretending that they might be in your organisation, because they probably aren't. Another fantasy: you must have special software. You do if you're advertising,
which is a good part of crowdsourcing, but it's not necessary if you already know what kind of people you want to work with. To crowdsource information, a spreadsheet is a wonderful thing: it's horrible when it comes out, but it has the data, and people can use it without too much skill. "If you build it, they will come": no, no, no. And a variety of other assumptions: it's totally trustworthy; it's easy; it fixes all problems (it doesn't); and it's cheap. It really isn't, because you have to look after it. If you put it out there and ignore it, it will do nothing; it will be a proper failure, as opposed to an educated failure. There's this recipe for meaningful gamification, another word I really hate; you know, the whole badges-and-high-scores thing does nothing for me, but this is actually talking about the basis of what encourages people to use systems that may not give you a high score, that may not reward you in that fashion; it's talking about something a bit more meaningful. It's quite a nice rundown, and it is an acronym, but yes, it's worth looking into. I'm going on to another project now. It's about crowdsourcing, but we've seen that a lot of projects involve the keyboard and mouse, involve people learning how to use something, involve that dedicated group of people who will do most of the work for you, and not so much the casual user, because it's not really geared for the casual: once you get your core group, you'll find that your interface rapidly iterates to the point where they can use it. So we built something, and we built some games for it, and the bottom line should suggest what we built, or what I built last month: I built an arcade machine. The constraint is that it has a joystick (left, right, up, down) and two buttons, and the idea is to break the connection to the keyboard, to find out if there is a casual way for people to interact, rather than the traditional crowdsourcing way, which is that people come to a website, click a button,
and walk away. Is it something where we can get something useful from people walking up to the machine, playing it for a minute and walking away? Is that something we can do by putting it in a public place? It's very simply made; that's pretty much all the electronics. A bit hidden underneath the control pad there's a Raspberry Pi, there's a sound chip for an amplifier, and there's a power supply on the very bottom. That's it, really: an LCD screen, the keyboard encoder and standard arcade controls. However, we needed stuff to run on it; we needed stuff to play. So we ran a game jam. Now, normally game jams are really loosely themed, so you might have a game jam where the entirety of your direction is "shadow": that'd be it, the whole prescription of what you can put into that game jam is one word. We had three paragraphs. We didn't really expect many people to enter, but we thought we had to try, and we actually got a couple of good games. The one at the far left is about an art thief: you're breaking into a museum, you get chased by guards if you're not careful with your torch (you can see the torch), and you're roaming from artwork to artwork. Your buyer has different criteria that they're interested in: they might want you to go and steal art that's got symbols on it, or portraits, or landscapes, or maps, or any of the other things that we've tagged the Flickr images with. So this is giving us a way not only to find images that we may not have tagged before, but also to validate the tags that have already been applied, based on the activities of the player. Tag Attack is much more explicit in what it's asking people to do: images are dropped, carried by a cartoon fox, and they go faster and faster as the levels go on, and you have to decide whether the thing in the picture is a face, a map, a portrait. So, in summary: it's been a bit of a shotgun approach to my soapbox of rants, but fundamentally it's about being careful with the words you use and the labels that are
applied; maybe being a bit critical of the things we tell people, not because we're wrong, but because they might have different ideas of the words we use. Wanting access to everything is actually a default for many researchers, and that needs to be borne in mind. A singular presentation of a collection, just having a page turner, or just having PDFs, or just having one thing, is risky, because if it doesn't suit someone, that's it: that's your single point of failure. Experts are where you find them, and you have to look after them once you do, because it is a community; you are working with them, not using them like, well, cattle. You have to make space to experiment, at least that's what we found; to fail, and to be able to fail in a way that is understandable; to not just put things out there and hope that they're useful to people. You've got to be able to learn from your mistakes, but that means you have to make mistakes, and sometimes they can be very public. So thank you, and I hope my rants have kept you occupied.
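The Mechanical Curator's wandering described earlier (step through a book's plates, get bored after a few picks, jump somewhere else in the collection) can be sketched in a few lines. Everything here is a hypothetical reconstruction: the class, the data layout and the boredom threshold are assumptions, not the actual bot's code.

```python
import random

class CuratorWalk:
    """Toy sketch of a 'bored' random walk over an image collection.

    Images are grouped by book; the walker steps through one book's
    images in order, but after a few picks it gets bored and jumps
    to a random position in a random book.
    """

    def __init__(self, books, boredom=3, seed=None):
        self.books = books            # {book_id: [image_id, ...]} (assumed shape)
        self.boredom = boredom        # picks before wandering off
        self.rng = random.Random(seed)
        self._jump()

    def _jump(self):
        # Land somewhere new: random book, random starting plate.
        self.book = self.rng.choice(sorted(self.books))
        self.pos = self.rng.randrange(len(self.books[self.book]))
        self.picks_here = 0

    def next_image(self):
        if self.picks_here >= self.boredom:
            self._jump()              # bored: jump elsewhere in the collection
        images = self.books[self.book]
        image = images[self.pos]
        self.pos = (self.pos + 1) % len(images)
        self.picks_here += 1
        return self.book, image
```

Posting one image per hour, as the talk describes, would just be calling `next_image` on a timer and handing the result to whatever publishes it.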
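The map-finding loop described earlier, where 30,000 human tags trained algorithms that found another 20,000 candidates, can be bootstrapped roughly as below. This is a deliberately toy version: the single "map-likeness" score stands in for real computer-vision features, and the thresholding is an assumption, not the method the project actually used.

```python
def bootstrap_maps(scores, human_labels, rounds=3, margin=0.15):
    """Toy human/machine tagging loop.

    `scores` maps image id -> a 0..1 'map-likeness' score (a stand-in
    for real features); `human_labels` maps some image ids -> True/False
    from volunteer tagging.  Each round, a trivial classifier (a cutoff
    halfway between the mean positive and mean negative score) labels
    the confident unlabeled images, growing the training set; ambiguous
    images are left for human taggers.
    """
    labels = dict(human_labels)
    for _ in range(rounds):
        pos = [scores[i] for i, is_map in labels.items() if is_map]
        neg = [scores[i] for i, is_map in labels.items() if not is_map]
        if not pos or not neg:
            break
        cutoff = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
        new = {}
        for i, s in scores.items():
            if i in labels:
                continue
            if s >= cutoff + margin:
                new[i] = True         # confident machine find: a map
            elif s <= cutoff - margin:
                new[i] = False        # confidently not a map
        if not new:
            break                     # nothing confident left; humans take over
        labels.update(new)
    return labels
```

The point of the sketch is the hand-in-hand shape of the loop: machine finds feed the training set, and what stays unlabeled is exactly what goes back to the volunteers.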
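The arcade games turn casual play into tag validation: each in-game decision (stealing a "map", sorting a picture as a "face") is effectively a vote on whether a tag fits an image. A hedged sketch of tallying such votes follows; the thresholds and data shapes are assumptions for illustration, not the project's actual pipeline.

```python
from collections import defaultdict

def tally_plays(plays, min_votes=5, agree=0.8):
    """Aggregate in-game decisions into tag validation.

    `plays` is a list of (image_id, tag, agreed) tuples, one per
    player decision.  A tag on an image is confirmed once it has at
    least `min_votes` decisions and at least `agree` of them say yes;
    the mirror-image case rejects the tag, and everything else stays
    pending until more people have played.
    """
    votes = defaultdict(lambda: [0, 0])       # (image, tag) -> [yes, no]
    for image, tag, agreed in plays:
        votes[(image, tag)][0 if agreed else 1] += 1

    confirmed, rejected, pending = set(), set(), set()
    for key, (yes, no) in votes.items():
        total = yes + no
        if total < min_votes:
            pending.add(key)                  # not enough plays yet
        elif yes / total >= agree:
            confirmed.add(key)
        elif no / total >= agree:
            rejected.add(key)
        else:
            pending.add(key)                  # players disagree; leave it open
    return confirmed, rejected, pending
```

This is what makes a one-minute walk-up play useful: no single player is trusted, but a handful of casual decisions per image adds up to a validated tag.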