 Well, thank you. Good afternoon. I'm Kevin Cherry and I'm Senior Program Officer in the Office of Library Services at the Institute of Museum and Library Services. IMLS is one of the three federal cultural grant funding agencies. We are the chief source of federal support for the nation's libraries, archives, and museums, including zoos, arboreta, aquaria, and botanical gardens. The mission of IMLS is to inspire libraries and museums to advance innovation, lifelong learning, and cultural and civic engagement. We provide leadership through research, policy development, and grant making, which is probably how you know us. The WebWise conference is a signature initiative of my agency, and it annually brings together representatives of museums, libraries, archives, information science, education, and other fields to explore the many opportunities made possible by digital technologies. WebWise 2012 took the theme of tradition and innovation and investigated how libraries and museums have used digital technologies to help scholars, students, educators, and the general public understand history and the humanities. Crowdsourcing was a topic at the 2012 WebWise, and we thought this practice was so useful that it was worth sharing with a larger audience in today's WebWise reprise. Public historians and librarians have long relied on their local communities for volunteers to assist paid staff as docents and interpreters and as collections and reference assistants. More recently, a variety of collaborative online tools have made it possible for volunteers from a larger pool to assist museums and libraries and share in collections work through crowdsourcing. So I'll turn it over to Kristen Laise from Heritage Preservation, who is the moderator for today's webinar. Kristen? Thanks so much, Kevin. I'm Kristen Laise. 
I work here at Heritage Preservation, and you have joined us today through our Connecting to Collections online community and the meeting room that we have in the online community. We are working on this project with funding from IMLS, in partnership with the American Association for State and Local History, and with technical support from LearningTimes. If you haven't been in the online community before today, I just wanted to tell you a little bit about it. Our goal is to make a one-stop shop for technical information on collections care and preservation available to institutions. We have a list of topics that you can search on. About twice a month, we do a live chat with an expert, and we record and archive everything on this site so that people will have it accessible to them in the future. We also have a discussion area, so that's a great place to network with your colleagues on conservation topics. And I'm glad to say that with additional support from the Institute of Museum and Library Services' Laura Bush 21st Century Librarian Program, this coming fall and into 2013 Heritage Preservation is going to be doing a series of more extensive webinars on preservation topics, especially those that would be of interest to libraries and archives. So I just encourage you, if you are just learning about the online community, to become a member and keep an eye out for upcoming events. So today, as Kevin mentioned, we're going to hear about crowdsourcing for working with collections. And we have two great speakers with us today. Sharon Leon is the Director of Public Projects at the Roy Rosenzweig Center for History and New Media at George Mason University here in the Washington, D.C. area. And Ben Brumfield is joining us from Austin. He's a software engineer and the developer of FromThePage, open-source transcription software. First we'll hear from Sharon, giving an intro to crowdsourcing and a description of the Scripto program. 
And then we'll hear Ben's presentation from WebWise. We're hoping that you'll have no trouble with the video that we're going to be showing. But if you have an issue with buffering on your system, or if it's a slow day on the Internet, I will be prompting you with two links, and I wanted to describe them briefly. One is a YouTube video that you could go off and find. If you click on this link, it will open in your browser; the meeting room that you see will not change, but if you go to your browser, you'll see that the link did open. Or, if you prefer, Ben has helpfully made a blog post that has all of the images from his PowerPoint with some transcript, and that's another great way to walk your way through his presentation. But hopefully all of the tech stuff will work today and you'll get to see this reprise from the WebWise conference. I will post these links again, as I mentioned. And then we're going to go live with Ben to view FromThePage, the open-source transcription software, tour his computer, and hear some tips that he has. Then hopefully we'll have plenty of time for questions and answers from you, and we can help you more with this topic. Before we get started, we'd love to learn a little bit more about you. So if you don't mind responding to some polls I'm about to drag in, that will help our speakers understand a little bit about you and what you're hoping to learn today. So it looks like most of you did not attend WebWise, and that was our hope. This is just another way for IMLS to make sure that the information that was shared at WebWise has an ongoing life, and we were really happy to work with them to present it today. And if you did see WebWise, this is another great opportunity to interact with the speakers. And it looks like there's some knowledge of crowdsourcing, but a lot to learn. So we're so excited that you've joined us today and can learn about this. I'm going to go ahead and close the polls. 
And then I will drag in Sharon's PowerPoint so she can begin. All right, thanks for waiting. And Sharon, I will turn it over to you. Thanks, Kristen. I appreciate that. I am the Director of Public Projects at the Roy Rosenzweig Center for History and New Media, and I had the good fortune of chairing the panel on crowdsourcing at this past WebWise conference. So I'm very pleased today to be able to talk a little bit about crowdsourcing and cultural heritage, and also to be able to show the audience some of the features in our open-source crowdsourcing tool, Scripto. You may or may not know that crowdsourcing is a relatively newly coined term. It was first used by a fellow named Jeff Howe in a Wired magazine article in June of 2006. But you may be familiar with other kinds of cooperative projects that work on a sort of gift labor. One of the most famous of those is the Linux open-source operating system. Open-source software operates on the principle that developers from around the world cooperate to build and correct software in an open environment. You may also be familiar, probably even more likely than with open-source software, with the notion of Creative Commons licensing and the open content movement: the movement to share cultural heritage content freely for use and reuse. Open-source software operates under similar kinds of terms as the open content movement because of the availability of open-source licenses, like the GNU General Public License, which provides that the software can be used and modified by anyone and that developers can contribute to it. In addition to the notion of open-source software and the open content movement, we saw, shortly after the coining of the term crowdsourcing, a smattering of web gurus who wanted to talk about the way that the Web 2.0 interactive environment was going to change the way that we work with one another. 
You may be familiar in particular with work from Clay Shirky, Here Comes Everybody, but also Howard Rheingold's Smart Mobs, and more recently, Clay Shirky's Cognitive Surplus. The thesis of Cognitive Surplus is that if we all just stopped watching TV for so many hours a day, we would have all of this excess time that we could collaboratively pool towards the greater good. And some of those projects involve gift labor to cultural heritage organizations. The most famous content project built through crowdsourcing is the Wikipedia free encyclopedia. Wikipedia has its own arm of workers, affiliated with the Wikimedia Foundation, who work directly with libraries, archives, and museums. In particular cases, institutions can apply to support a Wikipedian in Residence to actually do work in Wikipedia with cultural heritage material. But even before Jeff Howe coined the term crowdsourcing, there were crowdsourcing projects going on in cultural heritage organizations. At the Center for History and New Media, one of the very first ones that we participated in was the building of the September 11 Digital Archive, a user-generated archive of materials related to the September 11th tragedies. That archive, which launched in 2002, four years before Jeff Howe coined the term crowdsourcing, collected born-digital materials, so that we have an archive of over 150,000 stories, images, files, and text messages that represent the history of that tragedy and the days surrounding it. And we went on to launch and support several others of these, like the Hurricane Digital Memory Bank project, which has collected the history of Hurricanes Katrina and Rita. And that's a little bit of a smaller site. 
But again, that's a 2005 site that really is about building collections using the generosity of interested publics who have lived through particular points in recent history, and it got off the ground before the term crowdsourcing was even invented. Many librarians and archivists will be familiar with the Flickr Commons project, which was, and continues to be, an effort of libraries, archives, and museums to present their image collections to the world on the Flickr site. And this was a kind of unusual venture, a bold step, beginning in January 2008 with the Library of Congress, which offered the initial seed images for the Flickr Commons site. The dramatic success of this project went a significant way in convincing libraries, archives, and museums to consider ways that they could be more interactive with the public and gain some knowledge from the public. What I mean by that is that by August of 2008, 2,500 Flickr users had left more than 7,000 comments on the 3,000 images that the Library of Congress had posted. The resulting data was then reincorporated into the descriptive material in the Library of Congress's collections for 500 of those images. And similar materials from the Smithsonian got a whole lot more traffic and commentary on the Flickr Commons site than they ever get at the Smithsonian Institution's own site. In the period between 2008 and the present, we've seen an explosion of crowdsourcing projects. The one that we're looking at here is from the New York Public Library. It's called What's on the Menu?, and it's gotten a lot of great press coverage. It's an innovative project that involves very small bites of contributions from interested users. What people are doing here is looking at historical images of restaurant menus and transcribing not even the whole menu, but individual dishes, in the hope that it's possible for us to eventually compile some real knowledge about food culture and history in the United States. 
And so that's at the level of asking individuals to contribute transcriptions of single words. Some of the projects that have been put together by the Zooniverse group also ask for these kinds of small bits, and you'll see a little bit more of this in Ben's presentation. But there have been larger cooperative projects. The most recent one here from the New York Public Library follows a somewhat more complex track: asking individuals to use scanned phone books from New York to help with the indexing of the 1940 U.S. census. But institutions with large collections, particularly manuscript collections, are most often asking experts, whether they be genealogists or scholars or students or just general enthusiasts, to help with transcribing full manuscripts. One of the earliest projects in this range was the Transcribe Bentham project out of the U.K. But here we see in this slide the more recently launched Citizen Archivist Dashboard from the National Archives. What's interesting about this project is that it offers individuals a way to contribute to the National Archives through a variety of venues. They can tag materials from the collection. They can transcribe entire documents using the Wikisource transcription tool. They can edit Wikipedia articles or the National Archives' own wiki. They can upload and share scanned images that they have made of archival materials. And they can assist with the indexing of the current 1940 census. This project is similar in its large scope and multi-vocality to the Trove project at the National Library of Australia. But of course we have to be wondering, listening to this: what can smaller organizations do to incorporate crowdsourcing into their own work? And one of the tools that I want to show you today is the Scripto tool for crowdsourced transcription. Scripto is free, open-source software available at Scripto.org. 
What it is, is an add-on for existing content management systems. So if you're a library or an archive that has a manuscript collection that you're hosting in Drupal or in WordPress or something like that, you can add transcription to that existing collection on the web. And of course you may be asking yourself, what's the benefit to the organization of adding crowdsourcing? I think that there are many, many benefits, but there are four primary ones for doing it with a document collection or a video archive or an oral history collection that I see functioning primarily with the Scripto tool. Number one, adding crowdsourced transcription gives you valuable material to improve the workings of your search engine, because getting the actual individual text means that the search engine has more data to crunch to help people locate things. Second, for projects on a shoestring budget that might not be able to devote as much time and attention to transcription as they would like, crowdsourced transcription can offer a first pass at getting the text that you might need for a more finely grained scholarly edition. Third, it helps us learn what users are really interested in. What does the community find interesting about your collection? Those documents will often be transcribed first and completely. The other possibility is that the easiest documents to transcribe may also go very quickly. But the most important thing about crowdsourcing transcription is that it directs the attention of the institution to fostering and maintaining a vibrant community of users. Those users will find you, and they will be dedicated to your collections, to your content, and to your institution, because they feel like they are a part of the process. Now, Scripto has been supported in its design and construction by two federal grants. 
One from the National Endowment for the Humanities and one from NARA's own NHPRC. And that has enabled us to create the Scripto tool. So how does Scripto work? Scripto provides an add-on to your existing content management system. So you have your materials in, say, Drupal or Omeka, and then you add Scripto to it. Scripto is a PHP library that allows you to connect a MediaWiki interface for collecting transcriptions to your CMS, so that users can transcribe documents and that transcription data can then be added back to your records. And if we look here, you can see that there are three kinds of plugins for Scripto: one for the Omeka web publishing system, one for WordPress, and one for Drupal. The reason that we have those three to offer is that they are open-source software; Scripto is an open-source tool, and those are three very popular open-source content management systems. The plugin for Omeka is the most fully featured of the Scripto plugins, allowing users to transcribe virtually any kind of file, so not just images but also audio and video. And you have a choice of viewers. Finally, Omeka has structured metadata, Dublin Core metadata, built right into the system. The next most fully featured version of Scripto goes along with WordPress, and that also offers users the ability to transcribe many different kinds of files. And finally, the least fully featured version of Scripto is for Drupal. That's because Drupal is so flexible and free-form in its ability to be customized that it's very hard to build a fully featured transcription tool there. We at the Center for History and New Media have Scripto working with an existing archival project, the Papers of the War Department. The Papers of the War Department has over 45,000 documents in it, and we have high-resolution scans of each of those documents. And we were never going to have the funding to transcribe them. 
And so we plugged Scripto into the Papers of the War Department. This window that you see here is a close approximation of the interface that you'll see with the Omeka and WordPress versions of Scripto. You see that there is a document viewer window on the top, where you can pan, slide around, zoom, those sorts of things. And you as the transcriber have the opportunity to view the existing transcription, if there is one, to edit and save a transcription, and to discuss the document and the process of transcription with an editor. But if you look to the left of the slide, you see that there are several administrative tools as well. They allow someone who's logged in as an administrator, which is how I was logged in when I took this slide, to open the page in MediaWiki so that you can do a little bit more fine-tuning. You can protect a transcription, which will basically lock it so that it cannot be edited, because you judge it to be done. And when you judge it to be done, you can export that text back to the content management system. And this is what the MediaWiki instance looks like on the other side. So you can see recent changes in the system and do some editing from there. In the Omeka and Scripto instance here, we see what, from an end user's perspective, is simply an Omeka item page. You can see the Dublin Core metadata for the item, and at the bottom of the page, you see the file that accompanies the item. And at the very bottom, you can see appended the opportunity for users to transcribe the item, where it says "transcribe this item," with a link to that file. Clicking on that, a user would open the Scripto transcription page. And again, we have the opportunity to view the current transcript, to edit it, to show the history, and to discuss the transcription. And we can see with the WordPress instance that we have virtually the same options: a nice image viewer and a window to work on the transcription. 
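The transcribe, protect, and export workflow Sharon walks through here can be pictured in miniature. The sketch below is an illustrative Python analogue of the pattern, not Scripto's actual API (Scripto itself is a PHP library); every class, method, and identifier in it is hypothetical, and in-memory dictionaries stand in for the MediaWiki and CMS sides.

```python
# Hypothetical sketch of the Scripto pattern: transcriptions accumulate in a
# wiki; an admin locks finished pages; locked text is copied back to the CMS
# record so the site's search engine can index the full text.

class TranscriptionBridge:
    def __init__(self, wiki_pages, cms_items):
        self.wiki_pages = wiki_pages  # page title -> wikitext (stands in for MediaWiki)
        self.cms_items = cms_items    # item id -> metadata record (stands in for Omeka/Drupal)
        self.protected = set()        # pages an administrator has judged "done"

    def save_transcription(self, page, text):
        # Volunteers may edit any page that has not been locked.
        if page in self.protected:
            raise PermissionError(f"{page} is protected and cannot be edited")
        self.wiki_pages[page] = text

    def protect(self, page):
        # Lock a transcription the editor judges to be complete.
        self.protected.add(page)

    def export(self, page, item_id):
        # Copy the finished transcription back into the CMS record.
        self.cms_items[item_id]["transcription"] = self.wiki_pages[page]


# Usage: one document, one CMS item.
bridge = TranscriptionBridge({"letter-1774": ""}, {"item-42": {"title": "Letter, 1774"}})
bridge.save_transcription("letter-1774", "Dear Sir, I have the honor...")
bridge.protect("letter-1774")
bridge.export("letter-1774", "item-42")
```

The point of the sketch is only the shape of the round trip: volunteer text lives in the wiki layer until an editor locks it, and only then is it written back where search can see it.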
And so we have begun to see the Scripto tool implemented in a variety of instances. The one we're looking at here is an Omeka and Scripto instance that's being run by the University of Alabama Libraries. But we're also starting to see individuals do highly edited scholarly editions. And this project is what is being referred to as a readerly edition, that's a Jerome McGann term, of the tales of Piri's, which is a classic Persian tale. This is also an Omeka plus Scripto instance. And other projects are getting started. This is the just-beginning edition of John Gower's Confessio Amantis, where we can see that Malte Urban has put together a blog here to talk about the project, but has also created an Omeka and Scripto instance to begin to allow people to transcribe this very detailed manuscript. And so that's a functional introduction to the concept of crowdsourcing and some really practical ways to think about using crowdsourced transcription with existing collections. Thank you so much, Sharon. That was really good. And I just wanted to remind everyone that although we are holding questions until the end, just so that we can keep moving, you are welcome to put them into the chat box on the left of the screen if you think of something as we're talking, and we will make sure to watch for that and can direct questions to the speakers at the end of our presentations today. I'm going to drag some screens around here and we're going to gear up to hear from Ben. We're trying something new, as I mentioned, where we're going to watch a video together of him speaking at WebWise. He's going to advance his PowerPoint slides so you have that all nicely synced. And hopefully everything's going to work smoothly. But if for whatever reason you're having trouble, I would bring your attention to the box I have put on the upper right saying "technical difficulties, try these links." And those links are live. 
If you click on them, your meeting room screen will not change, but your browser should open them. And you can either see a blog post that has a summary of this whole presentation, or you could just watch the video on YouTube. You wouldn't have the benefit of the PowerPoint slides there, but hopefully through one of these means we will all be able to watch together. So Ben, if you're ready, I will hit play on the video. I believe I'm ready. Okay, great. There we are. Okay, I would like to talk about some of the lessons that have come out of my collaboration with small crowdsourcing projects. We hear a lot about large projects like Galaxy Zoo, like Transcribe Bentham. What can small institutions and small projects do, and do the rules that seem to apply to large projects also apply to them? So there are three projects that I'm drawing this experience from. The first one I'm going to talk about in a little bit is one that was run by the Balboa Park Online Collaborative. It's a project to transcribe and index, and the indexing is very important, the field notes of Lawrence M. Klauber, who was the nation's foremost authority on rattlesnakes. These are field notes that he kept from 1923 through 1967. This is done by the San Diego Natural History Museum and is run by our own Perian Sully, who's out there somewhere. The next project I want to talk about is the diary of Zenas Matthews. Zenas Matthews was a volunteer from Texas who served in the American forces in the U.S.-Mexican War of 1846. This diary is kept by Southwestern University, in the Smith Library Special Collections. It had been digitized for a previous researcher and is small, but Southwestern itself is also quite small. And the third project I want to talk about is actually the origin of the software, which is the Julia Brumfield diaries. If the name looks familiar, it's because she's my great-great grandmother. This project was the impetus for me to develop this tool for crowdsourced transcription. 
So what all of these projects have in common is that we're talking about page counts in the thousands and volunteer counts numbered in the dozens at best. This is not FamilySearch Indexing, where you can rely on hundreds of thousands of volunteers and long and large networks. So who participates in large projects, and who participates in small projects? One thing that I think is really interesting about crowdsourcing and these other sorts of participatory online communities is that the ratio of contributions to users follows what's called a power law distribution. If you look here, most famously this is Wikipedia, you see a chart of the number of users on Wikipedia ranked by their contributions. And what you see, famously, is that 90% of the edits made to Wikipedia are done by 10% of the users. If we look at other crowdsourced projects, this is the North American Bird Phenology Program out of the Patuxent Wildlife Research Center. This is a project in which volunteers are transcribing ornithology records, basically bird-watching records that were sent in from the 1870s through the 1950s, entering them into a database where they can be mined for climate change research. What's interesting about this, to me at least, is that this has been a phenomenally successful project. They've got 560,000 cards transcribed, all by volunteers, but Stella W. in Maine here has transcribed 126,000 of them, which is 22% of them. Now, Charlotte C. in Maryland is close behind her, so go local team. But again, you see this same kind of curve. If we look at another relatively large project, the Transcribe Bentham project, this isn't graphed, but if you look at the numbers here, you see the same kind of thing. You see Diane with 78,000 points. You see Ben Porkowski with 51,000 points. You see this curve sort of taper down into more of a long tail. So what about the small projects? Well, let's look at the Klauber diaries. 
This is the top 10 transcribers for the field notes of Lawrence Klauber. And if you look at the numbers here, again, in this case it's not quite as pronounced, because I think the previous leader has dropped out and other people have overtaken him, but you see the same kind of distribution. This is not a linear progression. This is more of a power law distribution. If you look at an even smaller project, and mind you, this is a project that is really only of interest to members of my family and elderly neighbors of the diarist, look: we've got Linda Tucker, who has transcribed 713 of these pages, followed by me and a few other people. But again, you have this parallel where the majority of the work is being done by a very small group of people. Okay, what's going on? Really? What does this mean, and why does this matter? I think this is important for a couple of reasons. One is that this kind of behavior addresses one of the main objections to crowdsourcing. Now, there are a lot of valid objections to crowdsourcing. I think that there are also a few invalid objections. And one of them is essentially the idea that members of the public cannot participate in scholarly projects because my next-door neighbor is neither capable of nor interested in participating in scholarly projects. And we see this all over the place. Here are a few example quotes, and I'm not going to read them out. I believe that this objection, which I have heard a number of times, and we see some examples right here, is a non sequitur. And I believe that the power law distribution proves it's a non sequitur. And really, I saw this most egregiously framed by a scholar who was passionately, just absolutely decrying the idea that classical music fans would be able to competently translate from German into English because, he said, after all, 40% of South Carolina voted for Newt Gingrich. Okay. All right. 
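The "small group does most of the work" pattern Ben keeps pointing at is easy to quantify from any project's contribution log: just ask what share of the total the top tenth of contributors produced. A minimal Python sketch, using invented counts for illustration (loosely shaped like the numbers on Ben's slides, not real project data):

```python
# Measure contribution concentration: the fraction of all work done by the
# top 10% of contributors. A power-law-shaped community scores high; an
# evenly distributed one scores near 0.1.

def top_decile_share(counts):
    """Fraction of all contributions made by the top 10% of contributors."""
    ranked = sorted(counts, reverse=True)
    k = max(1, len(ranked) // 10)  # at least one contributor counts as the "top 10%"
    return sum(ranked[:k]) / sum(ranked)

# Invented per-volunteer page counts for a skewed, power-law-like project.
skewed = [713, 120, 40, 22, 10, 8, 5, 3, 2, 1]
# Invented counts for a hypothetical evenly distributed project, for contrast.
flat = [90, 92, 88, 91, 89, 90, 90, 91, 89, 90]

skewed_share = top_decile_share(skewed)  # about 0.77: one volunteer dominates
flat_share = top_decile_share(flat)      # about 0.10: work is spread evenly
```

On real logs you would feed in each volunteer's edit count; a share far above 0.1 is the power-law signature Ben describes on the Wikipedia, Bird Phenology, and Klauber charts.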
So what's going on is, I think, best summed up by Rachel Stone. What she essentially said is that crowdsourcing isn't getting some random distribution from the crowd. Crowdsourcing is getting a number of well-informed enthusiasts. So where do we find well-informed enthusiasts to do this work, and to do it well? Big projects have an advantage, right? They have marketing budgets. They have press coverage. They have an existing user base. If you ask the Transcribe Bentham people how they got their users, they'll say, well, you know, that New York Times article really helped. That's cool. All right. The Galaxy Zoo people, the Citizen Science Alliance, yesterday, 24 hours ago, announced a new project, SETI Live. What this does is pull in live data from the SETI telescopes. And in those 24 hours, I took this screenshot. I actually skipped lunch to get this one screenshot, because I knew that it would pass 10,000 people participating, who have contributed 80,000 of these classifications. And it would have been higher, except last night the telescope got covered with cloud cover. So they dropped from getting 30 to 40 contributions per second to having to show sort of archival data and getting only 10 contributions per second. Well, they can do this because they have an existing base of active volunteers that numbers around 600,000. So how do we do that? How do we find well-informed enthusiasts? This is something that Catherine Stallard and Ann Beercamp Anderson at Southwestern University Special Collections and I discussed a lot when we were trying to launch the Zenas Matthews diary. We said, well, you know, we don't have any budget at all. And Catherine said, well, let's talk about local archival newsletters. Let's post to H-Net lists. 
I was in favor of looking at online communities of people who might be doing Matthews genealogy, or, you know, the military history wargamers who have discussion forums on the Mexican War. And while we're arguing about this, Catherine gets an email from a patron saying, hey, I'm a member of an organization. We see that you have this document. It relates to the Battle of San Jacinto and the Texas Revolution of 1836. You know, can you send this to us? And she responds saying, hey, 1846, great, check out this diary we just put online, I think that's what you're talking about. Well, that wasn't actually what he was talking about, but he responds and says, yeah, okay, I'll check that out, but, you know, can you please give me the document? They get it to him. And we return to our discussion of, okay, what do we need to do to roll this out? We're going to start working on the information architecture. We're going to work on the UI. We're going to work on help screens. And while we're having this conversation, Mr. Patrick checks it out. And Scott Patrick starts transcribing. And he starts transcribing some more. And he continues transcribing. And at this point, we're talking about, you know, working on the wording of the help screens, the wording of our announcement trying to attract volunteers. And this is page 43 of the 43-page diary. And while we're discussing this, he goes back and he starts adding footnotes. Look at this. He's identifying the people who are mentioned, saying, hey, this guy who's mentioned, here's what his later life was. Hey, this other guy, he's my first cousin, by the way, but he also left the governorship of the state of Texas in order to fight in this war. And believe me, in the actual original diary, "Piloncio" is not spelled like that; Zenas Matthews did not know Spanish, right? He identifies this. 
He identifies and looks up works that are mentioned here. So, wow, all right, we got our well-informed enthusiast. In 14 days, he transcribed the entire diary. And he didn't just do one pass. As he got familiar with the hand, he goes back and revises the earlier transcriptions. He figures out who's involved. He asks other members of his heritage organization what this is. He adds two dozen footnotes. What just happened? What was that about? Who is this guy? Well, Scott Patrick is a retired petroleum worker who got interested in his family history, and then got interested in local history, and then got interested in heritage organizations. And he is our ideal well-informed enthusiast. So, how did we find him? We haven't even gone public; the project isn't public yet, right? Our challenge now is rephrasing our public announcement from "we're looking for volunteers" to something that adequately describes what's left to do. Well, let's go back and take a look at this original letter. What we did is we responded to an inquiry from a patron. And not an in-person patron; this is someone who lives 200 miles away from Georgetown, Texas. And what you have when someone is coming in and asking about material, if you think about this in terms of target marketing, is a target-rich environment. Here is someone who is interested. He's online. He's researching this particular subject. He is not an existing patron. He has no prior relationship with Southwestern University libraries. But, you know, hey, while we answer your request, you might check out this thing that's in a related field. That seems to have worked in this one case. Hopefully we'll get some more experience with future projects. Okay, so how do we motivate volunteers? More importantly, how do we avoid demotivating them? Big projects a lot of times have lots of interesting game-like elements. Some of them actually are games. You have leaderboards. You have badges. 
You have ways of making the experience more immersive. Old Weather, which is run by the Galaxy Zoo team, will plot your ship on a Google map as you transcribe the latitude and longitude entries from the logbook. The National Library of Finland has partnered with Microtask to actually create a crowdsourcing game of whack-a-mole. So this is crowdsourcing taken to the extreme. But there's a peril here. And the peril is that all of these things are extrinsic motivators. We ran into this with the Klauber diaries. My partner on the project came to me and said, hey, let's come up with a stats page, because we want to track where the diaries are at. So we come up with a stats page. Pretty basic. Here's where things stand. And hey, while we're at it, let's mine our data. We can come up with a couple of top-ten lists. So we come up with a top-ten list of transcribers and a top-ten list of editors, because that's the data I have. Well, remember, the whole point of this exercise is to index these diaries so that we can find the mentions of individual species in the original manuscripts. Do you see indexing on here anywhere? Neither did our volunteers. And the minute this went up, the volunteers who previously had been transcribing and indexing every single page stopped indexing completely. They weren't being measured on it. We weren't rewarding them for it, so they stopped. Needless to say, our next rush change was a top-ten indexers list. So this gets to the crowding-out theory of motivation. The expert on this is a researcher in the U.K. named Alexandra Evely, and her point is that if you're going to design any kind of extrinsic motivation, you have to make sure that it promotes the actual contributory behavior. And this is something that applies, I believe, to small projects as well as large projects. So I have 13 seconds left, so thank you. I'll just end on that note. And that was it.
Thank you very much, Ben, again, for speaking at WebWise, and I hope everyone enjoyed the video and had a chance to see it. What I'm doing now is just setting up a screen so that Ben can go live, we hope, to his computer and show you some of his latest work, and give you a little update on what he's been doing with crowdsourcing. Again, feel free to put any questions or comments you have in the section on the left of your screen, and we'll be getting to Q&A after this presentation. Thank you very much. So my name is Ben Brumfield, and I'd like, of course, to say thanks for this opportunity to talk a little bit about the tools volunteers are using to do these crowdsourced transcriptions. And I'd like to start off by differentiating things a little bit. For all the fantastic work that CHNM has put into Scripto, one of the things they are focused on is breadth, and the ability to sort of seamlessly hook your content management system into it, so that institutions and users don't need to mess with the MediaWiki itself. In contrast to that, I'd like to talk a little bit about FromThePage. Its strengths lie a lot more along the lines of depth: what users can do with the transcriptions themselves. So while it does not have the breadth advantage of Scripto, I think that it shines in some other directions. What we're looking at right now is a copy of FromThePage showing the Zenas Matthews diary we talked about earlier, and I just want to show what one of these things looks like. We can see the manuscript on one side, and we have the transcript on the other side; we have a left-right display. For this page, we have the footnotes that were added. And in addition to being able to transcribe, we can go through and see the versions of the pages, all of the different edits, and the number that were made to this particular page. It looks like he's gone back and edited it at times, as he learned more, added more context, and became more familiar with the hand.
I don't want to do any live demo markup on that project. So we're going to switch over to the project that I'm most familiar with, which is the Julia Brumfield diaries, because I don't want to spend a lot of our time with everyone watching me type. What we're going to do is add annotation to something. This is a diary that is being transcribed by a well-informed enthusiast, in this case a man named Nat Wooding, who did essentially a vanity search for his name, came across all of the mentions of Nat Wooding in these diaries, and discovered that the Nat Wooding who appeared in the diaries as a mailman was actually his great-uncle. Since he discovered this, he has really jumped into this whole process. He's gone through and transcribed over 60 pages and regularly sends notes of things he's found. Most recently, last night, he sent me an email saying that he had discovered, in the entry for Sunday, March 4th, a mention of a moonshine still in the diary. So we're going to go take a look at the page with that still. This is just the view of the transcript; we click the transcribe link and we can see the work form. And the thing that I want to talk about here is the ability to do indexing via wiki links. This is the same syntax that is used on Wikipedia for creating links to new articles: just as on Wikipedia, adding double square braces around any piece of text will either create a new subject or link to an existing subject of that name. When this is saved, we'll see that we now have a hyperlink when the page is displayed. We can see a page about Lee Brumfield with a little bit of notes about him. And perhaps more importantly, we can see an index of all the pages that refer to him and mention him in any form, including "Lee", "Lee Brumfield", and various other variants. All of these were generated by transcribers during the process of indexing.
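The double-square-brace syntax Ben describes can be sketched in a few lines. This is an illustrative parser only, not FromThePage's actual implementation (FromThePage is written in Ruby; the function and sample text here are my own):

```python
import re

# Matches [[Subject]] and [[Subject|displayed text]] wiki-style links.
LINK_RE = re.compile(r"\[\[([^|\]]+)(?:\|([^\]]+))?\]\]")

def extract_links(transcript):
    """Return (canonical subject, displayed text) pairs from one page."""
    links = []
    for match in LINK_RE.finditer(transcript):
        subject = match.group(1).strip()
        display = (match.group(2) or subject).strip()
        links.append((subject, display))
    return links

page = "Clear day. [[Lee Brumfield|Lee]] plowed while [[Nat Wooding]] brought the mail."
print(extract_links(page))
# -> [('Lee Brumfield', 'Lee'), ('Nat Wooding', 'Nat Wooding')]
```

Storing the canonical subject separately from the displayed text is what lets one index entry gather "Lee", "Lee Brumfield", and every other variant a transcriber typed.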
Essentially, every single one of these links, the subject and the page it appears on, goes into a relational database. And the reason that's interesting is that because these links are stored along with the text they annotate, we can mine that data to ease the editorial burden on people doing transcription. One of the things we can do is have the system look through the text that has been transcribed and come up with suggestions for ways we might like to mark it up. So, for example, pressing the auto-link button now will refresh the page. It hasn't saved anything, but what it's done is gone through and suggested links. It has suggested that the "Oscar" it sees in the text be linked to Oscar Brumfield. It has suggested Grace. We don't actually know who Nellie is, so there's an index entry for "Nellie" herself. There's no index entry for "still" anywhere, so we're going to add one, I think. And then we're going to go ahead and save this. We'll then be prompted to categorize "still", and I'm not sure whether a still is a social activity or a thing; we're just going to call it a thing. One of the things that the ability to mine this data gives you is a way to exert a form of consistency over what is being done. If a subject has not been mentioned for 300 pages, or perhaps one user indexed it and another user who's unfamiliar with the importance of the term comes through and transcribes a page, the auto-link suggestion feature allows editors to make sure that if a subject was indexed once, it can be indexed again. Another thing we can do with this is, in addition to seeing everything that has a link to Oscar, we can read all the pages that mention him, so we can see Oscar in context, if we're only interested in Oscar, or if we're only interested in a particular agricultural process, or indeed moonshine stills.
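The auto-link feature Ben demonstrates amounts to replaying previously saved variant-to-subject pairs against new text. A minimal sketch of that idea (the variant table and names are my own toy data, not FromThePage's code):

```python
import re

# Hypothetical variant table mined from previously saved links:
# displayed text -> canonical subject page.
VARIANTS = {
    "Oscar": "Oscar Brumfield",
    "Lee": "Lee Brumfield",
}

def autolink(text, variants):
    """Suggest [[Subject|text]] markup wherever a known variant appears."""
    # Try longer variants first so "Lee Brumfield" wins over "Lee".
    alternation = "|".join(
        sorted(map(re.escape, variants), key=len, reverse=True)
    )
    pattern = re.compile(r"\b(" + alternation + r")\b")

    def wrap(match):
        subject = variants[match.group(1)]
        if subject == match.group(1):
            return "[[" + subject + "]]"
        return "[[" + subject + "|" + match.group(1) + "]]"

    return pattern.sub(wrap, text)

print(autolink("Oscar and Lee plowed today.", VARIANTS))
# -> [[Oscar Brumfield|Oscar]] and [[Lee Brumfield|Lee]] plowed today.
```

In a real system these would be offered as suggestions for an editor to confirm, not saved automatically, which is exactly the consistency check Ben describes.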
We can also, if we decide to start indexing 500 pages into the project, do a full-text search for all the pages whose text might mention Oscar but which do not include a link to Oscar Brumfield. In this case we pulled up Oscar Booker; that's legitimate. We also see an Oscar Brumfield here who has not been linked, in this case because the entire page hasn't been linked yet. The last interesting thing I want to show off that you can do with the ability to mine this database of subject indexes is some elementary topic modeling. In this case, for Oscar Brumfield, we can generate a graph of the subjects that may be related to him by the fact that they show up on the same pages. Now, Oscar is not that interesting, so I'm going to go take a look at the rain instead, and I'm going to restrict this to say: just show me agricultural activities that took place often in the same diary entries in which rain was mentioned. If I do that and regenerate the graph, we will learn something about what actually happened in this diary. We discover that when it was raining, that didn't stop them from plowing, but quite often they stripped tobacco, which is an indoor activity that can take place on a farm. They did milking, because the cows have to be milked whether it's raining or not. This is just an interesting way of exploring the text based on the markup. I don't want to go too long on this, so I'd like to go ahead and end the live demo and return to the conference session. So let me stop sharing and we will turn things over to Kristen again. Okay, thanks, Ben. We're getting a couple of questions on how these programs work with existing programs. And if you mentioned this and I missed it, I apologize, but is your program compatible with other programs, like Sharon mentioned with Scripto?
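The related-subjects graph in this demo is driven by page-level co-occurrence of index links: two subjects are related when they appear in the same diary entry. A sketch of that counting step, with my own toy data rather than the real index:

```python
from collections import Counter
from itertools import combinations

# Hypothetical index mined from saved wiki links: page -> subjects on it.
page_subjects = {
    "1918-03-04": {"Rain", "Stripping tobacco", "Milking"},
    "1918-03-05": {"Rain", "Plowing", "Milking"},
    "1918-03-06": {"Plowing", "Oscar Brumfield"},
}

def cooccurrence(pages):
    """Count how often each pair of subjects shares a page.

    Pairs are stored in sorted order so (A, B) and (B, A) are one key.
    The counts become edge weights in a related-subjects graph.
    """
    counts = Counter()
    for subjects in pages.values():
        for pair in combinations(sorted(subjects), 2):
            counts[pair] += 1
    return counts

counts = cooccurrence(page_subjects)
print(counts[("Milking", "Rain")])  # -> 2
```

Filtering the pairs to one category ("show me only agricultural activities that co-occur with Rain") is then just a lookup against each subject's saved category.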
So unlike Scripto, FromThePage is a freestanding program: users interact with the editorial interface directly, rather than it being hidden behind a CMS. However, we do integrate with the Internet Archive as a CMS. The Internet Archive will do book scanning and manuscript scanning for institutions, and we are able to import works into FromThePage, display those images, and pull their page structure out of those works. So we do integrate with the Internet Archive, and of course we would be very interested in integrating with other systems. Okay, and then, Sharon, there are a couple of questions for you. What about Collective Access and Scripto? And Kevin was wondering about CONTENTdm. Yes, yes. Well, the answer to both of those, Collective Access and CONTENTdm, is no, we do not yet have connectors built for Scripto. But the basic Scripto software itself is a PHP library, and that means that a programmer who is not at the Center for History and New Media could take that PHP library and the existing architecture of the connector between the CMS and MediaWiki and build a connector for both of those systems. We haven't started on CONTENTdm yet because we don't actually have access to a CONTENTdm system to use, or to Collective Access. But that's the wonderful thing about open-source software: because the software itself is available to be modified by other developers, anyone who wants to use it with a system that we haven't built a connector for yet can do that, and we'd be happy to host those connectors, test them, and help in all of those sorts of ways. Great, and I just put your contact information up on the screen, and if Ben is willing to share his, he can put that in the chat as well. I did also want to get to Aurora Lang's question: have you dealt with any copyright or privacy issues?
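Sharon's description of Scripto's design, a CMS-agnostic core plus per-CMS connectors, is essentially the adapter pattern. A rough sketch of the idea in Python (Scripto itself is a PHP library; every class, method, and URL below is hypothetical):

```python
from abc import ABC, abstractmethod

class CmsConnector(ABC):
    """Hypothetical connector interface: the transcription core calls
    these methods, and each CMS supplies its own implementation."""

    @abstractmethod
    def get_image_url(self, item_id, page):
        """Return the URL of the page image to show beside the editor."""

    @abstractmethod
    def save_transcription(self, item_id, page, text):
        """Write the finished transcription back into the CMS record."""

class InMemoryConnector(CmsConnector):
    """Toy stand-in for an Omeka/CONTENTdm/Collective Access adapter."""

    def __init__(self):
        self.store = {}

    def get_image_url(self, item_id, page):
        return f"https://example.org/items/{item_id}/pages/{page}.jpg"

    def save_transcription(self, item_id, page, text):
        self.store[(item_id, page)] = text

cms = InMemoryConnector()
cms.save_transcription("diary-1918", 12, "Clear day. Ben plowed.")
print(cms.store[("diary-1918", 12)])  # -> Clear day. Ben plowed.
```

The point Sharon makes about open source follows from this shape: a third-party developer only has to implement the small interface for their CMS; the core never changes.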
How do you usually handle that? For each of you, I think it would be interesting to explain your process. Yeah, well, I can certainly say that on the collecting projects Aurora is referring to, the September 11 Digital Archive and the Hurricane Digital Memory Bank, those projects have a process that individual contributors go through where they make decisions about their materials. They certify that they're over the age of 13 and that they're sharing their material with the archive so that it can be published on the web by us, or held in a dark archive and not published. But in each of those cases the individual contributor retains the copyright to that material, so if they were to decide to release it under Creative Commons, that may be one option. That's the way we have handled the collecting projects, and obviously with the transcription projects, those copyright issues are sorted out by the holding archive or special collections or things like that. Did you want to add anything, Ben? Actually, I'd say that for my part, certainly with Scripto, because you're hosting the images and the material is already up on a screen, ideally someone would have already gone through the "can we publish this?" question. Because FromThePage displays its own images, I often find myself in conversations with people who have collections, in particular of 20th-century letters, and they don't realize, or once they do realize, that they have to track down the authors or the authors' heirs to get permission to put those online. Unfortunately, I think a lot of projects turn away because of copyright. And how did it go in your family when you proposed putting great-grandma online? Well, I actually had to wait until 2008 before I was able to announce this, because of the diarist's death in 1938. So the whole life-plus-70 term for unpublished manuscripts is definitely a factor in any transcription question. Great. Well, I wondered if you saw Owen's question.
Owen Williams' question: are traditionally trained catalogers generally amenable to crowdsourced additions to their metadata? Have you run into that? We both chuckle. We haven't run into it directly here at the Center for History and New Media, because we're not a very traditional place, but we do hear from all sorts of people who are very concerned about editorial oversight and authority and those sorts of things. That really is a question about how individual organizations choose to function as editors, how invasive they want to be in the process of oversight, and what they choose to accept and not accept. And I know, Ben, that you've done so much great consulting work with individual institutions; I bet you have interesting things to say about this. Right. I would say that the main concern is over quality, and in particular, for any organization that's used to a traditional organizational structure with top-down direction, it's going to take a lot to get them to the point of being willing to talk about that. The approach that both FromThePage and Scripto take is one that embraces, I believe, the idea of continuous improvement: continuing to accept additions or suggestions to the transcripts. But there are other systems out there that follow workflows, locking down pages after they've reached a certain state or a certain number of proofreaders. I think that in some cases those systems are missing out. I have seen suggestions come in for corrections to text that had been published out in the wild for 20 years, and no one had questioned it in the interim. The suggestion comes in and you say, wow, that's actually correct. Maybe this transcript was not the final version. I think it's pretty hard to get some people within institutions to embrace the whole idea of "no final version", if that is your approach. And Amara's question kind of follows up on that.
Any ideas for allowing annotations to metadata without changing the metadata for digital objects? That seems to me another quality-control question. Yeah, it really is. Because we also engineer and maintain the Omeka system, plus Scripto, we have certainly talked about a variety of possibilities for this sort of thing in future builds, including allowing an individual wiki space to accompany each metadata field, so that there would be basically the canonical field and the crowd-edited field. So that's one possibility. And I think, Ben, your indexing system yields all sorts of possibilities there. I think it might. I also see that both Scripto and FromThePage really address free-form text: letters, diaries, papers, that kind of thing. They don't really address more tabular data, which is often of interest to, say, genealogists. And one thing that I suspect, based on developments that I'm seeing within that world, is that crowdsourced metadata improvements to your collections are going to happen in some cases from outside, whether the people within the institution like them or not. I mean, as long as there is a unique URL to an image online, someone can build a database elsewhere. And I know FamilySearch is doing this with their document management system, which goes through and says, this person is referenced within this document. Even though that database does not live within the institution, that may be fine, because in many cases institutions are okay with people saying, well, I know you have this thing, and being wrong, so long as those people are clearly identified as not the voice of the institution. Right, right. Well, I mean, I think that that's the ongoing hope and principle of linked open data: that everybody participates and that we collectively compile the index of what is known about the world.
I also know that, for instance, with the National Archives, Ben and I have a mutual friend who is beginning a project to do a transcription site for an entire collection of National Archives material outside of the National Archives. So I think we're going to see additional and alternative metadata creation in that project too. Are you aware of Mukurtu, which Barbara Dickie Davis recommended taking a look at? I have not played with Mukurtu. Have you looked at it yet, Ben? I've never even heard of it, so thank you for the recommendation. Yeah, I will definitely look. I saw the launch of it, I believe within the last year, and I haven't actually played with it yet, so we'll both go look. And Barbara, if you want to type in any other additional information we might find interesting on that, that would be great. So Kristen McDonough in Columbus, Ohio wondered if there have been any examples of K-12 use of this. Have you come across any examples of that in your work? That's a really good question. I mean, I know that we have some high school teachers who are using Papers of the War Department documents with their students and have registered themselves as transcribers. We have about 670 transcribers, and in any given 90-day period, about 90 to 110 of them are active. So we definitely have some K-12 teachers who themselves are using it; what I don't know is whether they're registering for an account and working through that process with their students as a class. You know, I have not had experience with this myself, but I do blog about manuscript transcription tools, and I know that the Transcribe Bentham project did a lot of work with K-12 outreach. That's in the UK, and things may be different there, but I believe their results were that if you do K-12, you should really view it as a way of doing outreach and enhancing K-12 education, as opposed to a way of gathering the kind of well-informed enthusiasts who will be the core of producing high-quality work. Right.
We certainly marketed the Papers of the War Department, and the interactive piece of it, as a way to think about teaching historical thinking skills, close reading of documents, and those sorts of things, in addition to thinking about the transcription process, and I hope that more folks will think about using it that way. Yeah, I can see a lot of potential. But you did mention earlier that you have an age limit; is it just your program that requires people to be at least 13? Oh, that's a really good question. That is actually U.S. law (COPPA): to sign up for an account and to collect personal data from a user, that user has to be over the age of 13. Okay, but if a teacher did it, let's say in a group setting where the teacher registered for an account for the class and is responsible for it? Yeah, then there's no exchange of individual personal data or anything like that. Right. Well, again, through your PowerPoint sharing you showed us a lot of different sites and places that are using this. Do you happen to be aware of any kind of clearinghouses where interested users might come across these projects? Or is it just sort of by happenstance, a bit like what Ben was describing, where they just happened to make a connection with an enthusiastic user? Ben, you have a list of tools, a shared toolbox, but do you have one for actual projects? I do not have one for projects, but there are a number of projects out there using non-open-source tools. The closest that I can come to this is perhaps the National Archives' Citizen Archivist Dashboard that was presented, because the interesting thing about that is that it involves tagging, transcribing, and presenting commentary, which is a multi-modal way of allowing citizens to engage with their material.
One thing that I wanted to mention, a quick digression that I'm just reminded of: Sharon's point about transcripts making your material more discoverable is really important. That is how this great-nephew of the mailman found the site, and this is how a lot of this public engagement beyond the walls of your institution can be enhanced by this kind of conversion of static images with no transcript into things that Google can actually index and show in search results. I think it does great things for engaging the public and finding volunteers. You certainly can't OCR the handwriting. No. And making it searchable. And Ben talked about our dedication to progressive enhancement. We are not looking for perfect transcriptions. We are not looking for spotless transcriptions that would produce a scholarly edition that would go to print. But simply getting the text makes the search engine work better, and work better in ways that help people who are trying to do things like social history rather than searching proper nouns. If you're trying to trace, for instance, the way supply trains ran nationally, you really need those other terms, not just the names, to show up in the individual transcripts. Yeah, thanks. I think that's a really good point. Now I'm going to put you on the spot. For any projects that either one of you has worked on, I wondered whether you saw additional value added. As you say, you may not get scholarly-edition information from your volunteers, but it's an amazing outreach tool and a way to push your information out to a broader public. Have either of you run across examples where it has even encouraged someone to write a check for the preservation of these materials, or other such added value? I wish I could say that were true.
I know that the University of Iowa Libraries have had that experience with their Civil War diaries, in which people are donating, or people are driving accessions, in ways that I don't know the details of. In my own experience, certainly, the diaries that I'm dealing with as a family project were scattered to the four winds; they were distributed among the diarist's grandchildren. Since putting this online, we started with only three diaries in our possession, and one of the first volunteers was able to go out and track down an additional three that no one knew existed, and get them scanned herself. So I'm pretty pleased with that as a result. It's not a big check, but certainly locating this material was wonderful for us. I would definitely say that that kind of payoff may not come directly in the form of an individual donation, but I know in the case of the Center for History and New Media, we constantly have to demonstrate the usability and attractiveness of our projects, and these kinds of features have really driven traffic to the Papers of the War Department, which is a huge site and is serving as a scholarly source for all sorts of dissertations and things like that, but has found a more popular audience, and that helps with attractiveness to funders in general. Yeah, I agree. I think it's very high value, and it's great, too, that the additional collecting possibilities could start to snowball once the public is really aware of what people have and what they're doing. One thing I was really struck by at WebWise, when we had the presentation from the New York Public Library about the menus project, was that that project is now so popular that the digitization department can't even get stuff posted online fast enough for all the volunteers waiting to work on it. But that, to me, is a crazy success, in a good way.
I see that Barbara did share some more information about Mukurtu, which is great, and a URL, which is great too. And Judith has a question. She said: all the activity seems to be funneled through transcription. I'm wondering about audio and video that's not transcribed, and images that are not images of a page, and using crowdsourcing for applying metadata and annotation, so things like photographs, audio, and video. In fact, your question segues nicely, because we're going to do another WebWise reprise in two weeks, and it's going to be on digital tools that are available now for cataloging oral history. So you'll get some great information on that webinar. But just in terms of crowdsourcing, would one of you want to take that question? Well, you know, one of the ways that Scripto is nice in its flexibility is that because it works with an existing CMS, it can also be skinned and modified by an individual project. So just because we've got it set up to ask for a transcription of a particular item or file doesn't mean that you couldn't set up the page to ask, "please identify all of the individuals in this photo", and then whatever field in the CMS you ship that data back to, it could be the transcription field, but it could be an "individuals in this image" field. You can configure it to go anywhere, so it could be set up to gather that kind of additional data about an archival item or a digital file. Right, and I think that there are interesting, analogous ways to apply the kinds of tools that have been dealing with manuscripts. You know, I've had a number of people suggest to me that they wanted to use the indexing feature of FromThePage just to write prose descriptions of photographic collections, but to be able to mention the people in those descriptions and have those automatically generate indexes to the photos. There's a separate tool.
There's actually a project that I'm working on right now for a UK-based charity that uses the underlying tool behind both Zooniverse's Old Weather and their What's the Score project, which works with the Bodleian Library's collection of 19th-century musical scores. They're trying to build a database out of that, to really clean up their data and come up with tempo information, whether a piece is a waltz or a march, who it is composed by. That tool has also been released open source, and it is a fantastic tool, because volunteers can go in and actually drag and drop across the screen and annotate sections of images with structured data. There's a group that I'm corresponding with who are looking at essentially using this to build a database of particular oriental textiles. You could imagine the same thing for a group building a database of quilts and quilt ornamentation. Quilts are not manuscripts, but we're still talking about presenting human volunteers with images and asking them to come up with data about those images. I would recommend checking out Scribe. It's also released on GitHub. It requires a lot of custom work, as I'm discovering, but it's a unique tool that I think has applicability beyond manuscripts per se. Ben, would you mind just putting those URLs quickly in the chat? Oh, sure. And then they'll be there in perpetuity, especially that music site you mentioned. They're not exactly short URLs, but that way people can get to them. That's great. Thank you. We have a few more questions, so this will be sort of our last call if you have any questions to type in. And thanks so much to the folks who are typing in other great projects for us to check out later. They are multiplying daily. Every time I turn around there's another crowdsourcing project, and each is a new slight twist on what we've been doing. It's very exciting. Yeah, that's great.
Yeah, I really appreciate the suggestions that Ben had for doing these on small scales too, because I think people sometimes get overwhelmed thinking they have to develop all of this stuff, and you can start with a very small project and just build on that, which I think is a great way to handle this strategically. So we've got some great suggestions coming up here. Yeah. Great, and I'll just reiterate: the recording from today will be put up on the Connecting to Collections online community. When everything's available, I'll post something in the discussion section of the site, so just check back in a day or two for that. And I could even take this chat transcript and put it into the blog post, so that it would be easy to get these links without listening to the entire thing again. So I'll make sure to do that. And again, thank you to Learning Times for making all of this so easy for me to do and to get up so quickly after a webinar. It really is great for everybody. Yes, Kevin? We do want to point out that this was just one session at the 2012 WebWise, and the entire WebWise conference was videotaped and is available on the web. Yes, give me a minute and I will pull up that link again. And I would like to voice my enthusiasm for the oral history discussion that is to be held, I believe, two weeks from this day. I was absolutely blown away by that when I attended WebWise, and I have raved and raved to people about it; I sort of wish I had oral history projects to work on so I could use some of those tools. They were just amazing.
Yes, I thought it was really good too, and one of the reasons we asked them to present in this WebWise reprise is that at WebWise they said some things were coming that would be available starting in May. So I'm looking forward to them giving us a tour of their website now that it's further along in development, because even some of the great tools they were telling us about back in March I think are now going to be available to the public. So again, it's Thursday, the 28th of June, two weeks from today, and it will be in the same timeframe as today. At connectingtocollections.org, click on the meeting room and just log in. No prior registration is necessary; just mark your calendar and join us if you're able. But again, it will be recorded, and you can check connectingtocollections.org for the recording and any links or other additional information and resources that come out of it. Well, I don't see any more questions, so I will give Sharon and Ben the last word, if anything comes to mind. I'm just really excited about the fact that so many people are interested in doing these projects and making connections to their local communities, and even to the non-local communities who are really their core interested users. I can only agree. I am delighted to see all this interest; as a member of the public, I am delighted to see institutions more interested in reaching out to the public in these new ways. Thanks. Well, I want to encourage everyone who participated today to please fill out our evaluation. I'm putting the link on the screen now, and you can click on that link. Again, it's going to open in your browser, so you might need to hunt around for it on your computer screen.
You can also cut and paste it again from the recording page. It's very short, and it would be great to learn a little bit more about you and make sure that you got something out of today's session; it will also help us as we prepare for our next WebWise reprise in two weeks. So I want to thank everyone again for all of their great information and participation, both the speakers and our participants, and please fill out our evaluation with any comments or additional suggestions you might have. And we want to thank Kevin for having this great idea to begin with. You're a hero. Very good. Okay, have a great afternoon, everyone. Thank you, folks.