All right, so we move from the sprinting to the lightning. Our beta sprint review team recommended six slightly longer presentations, which we've just seen, and then three lightning round presentations. I'd love to ask Bookworm, the DPLA Collection Achievements and Profile System, and WikiCite all to come up on the stage. I think we can fit more or less the whole crew, so we'll have a little bit less of the back-and-forth action. And if you can't fit there, maybe just stay close by. We're going to go Bookworm, then the Collection Achievements and Profile System, and SJ, you are last. All right, so the first of the lightning talks is called Bookworm, from Ben Schmidt and Martin Camacho. Thank you so much for being here. So I'm Ben Schmidt. I'm a graduate student in history at Princeton University and a visiting graduate fellow at the Cultural Observatory. And I'm Martin Camacho. I'm an undergrad in math at Harvard. And we're here to talk about our project, which is Bookworm. We are a relatively smaller group, so we have focused on just one thing. But we think it's an important thing, and that's search. We want to talk about how you find things in the digital library. So this is an example of a current search site. This is the Open Library interface. And it works like a lot of interfaces. You get about five or 10 results of books, and you get a bunch of metadata fields on the right side of the screen so that you can limit the books that you're interested in looking at. And this is a good interface. It's so good, in fact, that pretty much everybody is using it nowadays. New York Public Library's new redesign is very similar, and even commercial sites like Amazon and Google are producing essentially the same sort of search results. And these search results are good, but only for a particular type of searching: when you're trying to find a few books that match your particular search terms, or whose author you know, or whose catalog metadata you're interested in.
But it's not great for exploring a huge library. It doesn't give you a sense of the volume of these libraries. And it discourages browsing, in a way. In some ways, we've recreated the closed-stack library. We can only page through five or 10 books at a time. So we're wondering how you can create a new form of search that will allow you to look at large groups of books at the same time. And we started with something that the Cultural Observatory released last year, which was the Google Books Ngram Viewer, which is something that lets you look at the history of words by using the vast Google Books collection. And we learned basically two things from this. First is that people are really good at finding patterns in these sorts of visual displays. If you look at a chart like this that shows the rise of the university library, and then its decline or something, and the rise of the digital library in the 1990s, these are clear patterns that you want to investigate and look into, and that motivate you to actually find these books. But we also learned that people complain about books, and that people have a lot of interesting insights about books that this sort of thing doesn't address. It just doesn't let you find out what the individual books are here, because Google's metadata collection is under strange licensing restrictions. You can't actually even get at the catalog information. So we wanted to try to take both of these things and put them together: get the power of the insights that you can get through full text, made visible through compelling visualizations that people can actually use to find patterns, with library-quality metadata and an experience that places reading books at the center of a library search interface. And this is what we came up with, Bookworm. I think maybe you shouldn't go to that URL quite yet, because a live demo is about to happen and we do not have a spectacular server here. But afterwards, I urge you to try to crash it.
And our hope with Bookworm is that it lets people have a new sort of engagement with libraries. So now Martin is going to show how this works. All right, so this is the main interface for Bookworm, and we've tried to make a user interface where the user builds a data visualization, where they can define the axes and also define the different series in the chart. So here I've searched for the word evolution and I'm graphing it over the corpus of all books. What I get is a graph that shows the percentage of books in the 19th century that used the word evolution. Now suppose I'm interested in the use of the word evolution in science. So I go here, I add another group of books, go into the metadata selector, and pick the subject science, which is Q in this categorization. Then when I resubmit the query, I can see that, of course, evolution is used much more prevalently in science than it was on average in all books. I can keep doing this, so I'll add social science here, which is H for some reason. You're evidently not a librarian. Exactly, exactly. And now I can see that there's a sharp and steep increase in the usage of evolution in the social sciences around 1870, about 10 years after Charles Darwin published The Origin of Species, and it keeps increasing until it gets to be about as popular in social science as it is in science around 1900. So even if I didn't know anything about social Darwinism, I could still see that something interesting is going on here. So I want to find out more, and to do so, I can click on the graph in a certain year, say 1880. And what pops up is a list of all the books in social sciences published in 1880 which use the word evolution, and this is ranked by, I believe, frequency of usage of the word evolution. So now I can read any of these using the Open Library's awesome online book reader just by hitting read.
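The query Martin walks through here — the share of books per year, within a metadata facet, that use a given word — can be sketched in a few lines. This is not Bookworm's actual implementation (the real system queries a full-text index); the corpus layout, the LC-class field, and the tiny in-memory data below are hypothetical stand-ins.

```python
from collections import defaultdict

# Hypothetical in-memory corpus: (year, LC class letter, full text).
CORPUS = [
    (1880, "H", "the evolution of social institutions"),
    (1880, "H", "principles of sociology and society"),
    (1880, "Q", "evolution and the origin of species"),
    (1881, "Q", "field notes on beetles"),
]

def usage_by_year(word, lc_class=None):
    """Percent of books per year whose text contains `word`,
    optionally restricted to one LC classification letter."""
    totals, hits = defaultdict(int), defaultdict(int)
    for year, cls, text in CORPUS:
        if lc_class is not None and cls != lc_class:
            continue
        totals[year] += 1
        if word in text.split():
            hits[year] += 1
    return {y: 100.0 * hits[y] / totals[y] for y in sorted(totals)}

print(usage_by_year("evolution"))        # all books, by year
print(usage_by_year("evolution", "H"))   # social sciences only → {1880: 50.0}
```

Adding a second series, as in the demo, is just a second call with a different `lc_class`; the click-through to the underlying book list would come from keeping the matching records rather than only the counts.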
So let's read Principles of Sociology by Herbert Spencer, and this will pop up the book in Open Library's book reader. So that's the basic gist of Bookworm. Please visit us at bookworm.culturomics.org, and thank you very much for your time. Good job, you guys. Wonderful job. We're gonna go on without questions and we'll come back to the whole group with questions, if that's okay. So Tito Sierra, if you would come up, and your colleague, Jason Ronallo, I think you're seated and not speaking, excellent. The DPLA Collection Achievements and Profile System. Very curious to see how the graphical representation of my presentation turns out. So no pressure on you guys there. So I'm gonna talk about the Beta Sprint Collection Achievements and Profile System, and just to give you a little bit of context for this, we begin from certain assumptions. One assumption is that there's already a lot of cultural heritage content online on the web, in many different formats, from large organizations and small organizations, and more of this content is coming online every day. The problem that we see is that a lot of this content is distributed, it's siloed, and it's not very easily discoverable by search engines or really well connected, and therefore just because it's accessible doesn't mean it's discoverable, which is a theme that some of us talked about this morning. And it's unfortunate, because a lot of this is really great content that's underutilized by the general public. So basically a layman's way of looking at our proposal is to say that what we're proposing is building something kind of like a LinkedIn profile for every special library, archive, or museum collection in the country. From a more technical point of view, rather than build a tool or a destination or a website, what we're proposing is building basically a platform that simplifies aggregation of collections both big and small, distributed, and that also facilitates interoperability between these collections.
So as part of the Beta Sprint, Jason and I developed a set of wireframes. I only have five minutes, so I'm only gonna show you two, but we have a poster outside where you can see four more, and if you go to our website you'll see some more there as well. So I'm just gonna very quickly highlight some of the features of this profile system. We came up with this concept called the Minimal Viable Profile Record, which is the minimal amount of metadata about a collection needed to be part of the system: a title, a URL, a description, and a category. And the idea is to have a really low barrier to entry so that we can have really small collections represented in this. We also introduced this concept of achievements, which are basically little discrete pieces of metadata that could be attached to the profile to extend it in many different ways. So we don't begin from an assumption of a single monolithic data structure. We feel that you could have, let's say, a hundred different achievements which tackle different aspects of using the collection, and each collection can choose achievements that make sense for that particular collection. So in this particular example you could think of these achievements as widgets, and here we have a collection search widget and a visitor info widget to collect some additional metadata about the actual physical collection that the online collection represents, thus perhaps encouraging access to the physical collection. And the last one here is just an example of what we call robot friendly, and it's metadata to help with crawling. We propose a very open, collaborative model, so that it's not a single set of curators doing this but it's like Wikipedia, very open. So collection managers can put up these records, or volunteers and people in the general public can contribute to them.
Although we're showing screenshots of the website, the primary benefit of this is to help third parties build services that aggregate and reuse some of this content. So you would have API access to the collection. This is one way of doing it, through JSON, but one could imagine many different serializations for getting at this core information. And finally, this is a screenshot to sort of show our brainstorm on where this idea of achievements can go. We have several categories: promotion; content, so content reuse, so maybe we could have something like teaching tools to help people engage with collections; a donate widget to help people get funds for some of their collections; current awareness tools like RSS feeds; analytics tools; any number of ideas here. And the idea is that it's a platform that allows these sorts of extended metadata elements to grow and be attached to the collections. So just to summarize our proposal: rather than think about the DPLA as a site or a tool or a destination, we envision it as a networking platform that can help cultural organizations both big and small maximize the use and discovery of their content. And with that, thank you. Thank you, Tito and Jason. That's really, really provocative and thoughtful. Thank you. Last of the Lightning Sprinters, another of our great friends, S.J. Klein, Samuel Klein officially up there, but S.J. to me anyway, from Wikimedia. He's one of the most important Wikimaniacs out there and he's got a fabulous final Lightning Sprint. Thanks a lot, John. This is an amazing audience, and I just wanted to mention, after the discussions earlier today, John is rightly proud that this is both a public and a private partnership, but it's also very much grassroots and institutional, and librarians and patrons and people from all different spectrums. So I'm immensely honored to be here and I think that we can really do something tremendous. This is going to be a super short Lightning talk.
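A minimal viable profile record with a couple of attached achievements might serialize along these lines. The field names and achievement types here are illustrative guesses, not the proposal's actual schema; only the four core fields (title, URL, description, category) and the idea of attachable achievements come from the talk.

```python
import json

# Illustrative only: the exact keys and achievement types are assumptions.
profile = {
    # The four Minimal Viable Profile Record fields:
    "title": "Example County Photograph Collection",
    "url": "http://example.org/photos",
    "description": "Digitized photographs of Example County, 1890-1950.",
    "category": "photographs",
    # Optional achievements extend the record piece by piece,
    # with no single monolithic data structure assumed:
    "achievements": [
        {"type": "visitor_info", "address": "1 Main St, Example"},
        {"type": "robot_friendly", "sitemap": "http://example.org/sitemap.xml"},
    ],
}

serialized = json.dumps(profile, indent=2)  # one possible serialization
restored = json.loads(serialized)
print(restored["category"])  # → photographs
```

JSON is just one serialization, as Tito notes; the same record could equally be exposed as XML or RDF by an aggregating API.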
This is one of those simple ideas that four different groups came to from different angles at the same time, and we said, you know, let's do it better yet. Not only is it a simple idea, but it's easily distributable among millions of people, and it's simply to upgrade the analog citation to make it more useful for people to get a rich understanding of the sea of knowledge we have access to now. We talked a lot today about what to do with physical collections and what to do with digitization, but tomorrow most work is going to be born digital, it will always be digital, and it's gonna be self-published, and Wikipedia is an example of what you can get out of that kind of teeming, thriving, living environment. But you need to know what is a reliable source, and this is really hard. The joke among Wikipedians is that any editor with a source in his pocket has a god-like power, and thank goodness most people don't know it. So unfortunately, there aren't any good canonical ways to find out whether a source is reliable in a given context, or even how reliable it is in general, or to collect the comments about a source and to sort of cluster them by where they're used in various topics. So our proposal is to develop a wiki for crowdsourcing discussions of citations in general, in their contexts. Sorry about that. And our model is to build a central wiki that draws from repositories and links back to metadata indicating their provenance. I am not a Mac user, can you tell? Yes, please. And then it implements an API that allows anyone who has their own private or specialized uses of citation information to develop their own team wiki, that might include private data that they can't share with everyone, and that allows them to push back any updates or any new information that they have to share with the world. We have all of this in more detail, I know this is very small, on a poster outside, and we would love to take comments and suggestions from people.
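The model S.J. sketches — a central wiki of citation records with provenance links, and team wikis that hold private annotations but push shareable updates back — could be outlined like this. Every name here (the classes, the shareable flag, the identifiers) is a hypothetical illustration, not the project's actual design.

```python
class CitationRecord:
    """A record on the hypothetical central wiki."""
    def __init__(self, source_id, provenance):
        self.source_id = source_id    # e.g. a repository identifier (made up)
        self.provenance = provenance  # link back to the holding repository
        self.comments = []            # public discussion of reliability

class TeamWiki:
    """A specialized wiki that may hold private notes alongside
    annotations it is willing to push back to the central wiki."""
    def __init__(self):
        self.private_notes = []
        self.outbox = []

    def annotate(self, text, shareable=False):
        (self.outbox if shareable else self.private_notes).append(text)

    def push_updates(self, record):
        # Only shareable annotations flow back to the central record.
        record.comments.extend(self.outbox)
        self.outbox.clear()

record = CitationRecord("oclc:12345", "http://example.org/repo/12345")
team = TeamWiki()
team.annotate("internal review pending", shareable=False)
team.annotate("reliable for 19th-century demography", shareable=True)
team.push_updates(record)
print(record.comments)  # → ['reliable for 19th-century demography']
```

The point of the split is exactly what the talk describes: groups with data they can't publish still benefit from, and contribute to, the shared cloud of citation commentary.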
There are a bunch of mashups that have already been made. We came up with this idea a couple of years ago but really got into it this past spring, and threw out a call for people to suggest mashups that they'd love to use it for. Mako is using it in sociology to capture oral histories better; right now it's really hard to generate permanent citations for oral histories. It's being used to query available citation data to show you everything in a field that you have used elsewhere within a group. On Wikipedia, people are planning to use it to cache web pages and to figure out what pages should be cached, because there's a tremendous rate of link rot online. And once you've centralized information about citations, how often they're used, and how people visualize that, you can then automatically cache anything that's used at least once in what you consider to be a useful context. The audience and the people who benefit from it are often editors and readers as well as general researchers: people who want to resolve citations for particular quotes, and they know the quote but they're not sure what it comes from; they could pick an arbitrary document, but they would like to simply have it resolved through the cloud of existing citations. And for librarians and for people who currently have citation data, we would love to talk to people who do their own citation analysis. We're doing a couple of citation analyses to figure out how people use different classes of sources, and how people respond to what are considered negative quality references and what they do with them. Thank you very much. Terrific, S.J., thank you. All right, questions or comments for any of the lightning round sprinters? I see somebody has a mic, yes. I have a question for the Bookworm team. I like your idea, I'm just wondering, do you see any application beyond the academic world?
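The caching rule mentioned for fighting link rot — archive any cited page used at least once in a context you consider useful — is easy to express as a filter once citation usage is centralized. The usage data and the notion of a "useful context" below are made up for illustration.

```python
# Hypothetical centralized usage data: cited URL -> contexts citing it.
citations = {
    "http://example.org/a": ["enwiki:Evolution", "blog:random"],
    "http://example.org/b": ["blog:random"],
    "http://example.org/c": ["enwiki:Herbert_Spencer"],
}

def should_cache(contexts, useful_prefixes=("enwiki:",)):
    """Cache a page if it is cited at least once in a useful context
    (here, arbitrarily, any English Wikipedia article)."""
    return any(c.startswith(p) for c in contexts for p in useful_prefixes)

to_cache = sorted(url for url, ctx in citations.items() if should_cache(ctx))
print(to_cache)  # → ['http://example.org/a', 'http://example.org/c']
```

The real decision would presumably feed an archiving service rather than a list, but the shape of the rule is the same: centralize usage first, then filter.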
Yeah, I actually think that one of the things that we learned from Google Ngrams is that these sorts of data visualizations are things that really do have a really broad-ranging appeal. So when Ngrams came out there were something like a million queries in the first 24 hours, and the majority of those were outside of academia. A lot of them were in places like journalism, to be sure, but it is something that, beyond just academics, is a very compelling way, I think, for a lot of people to think about problems as having historical patterns that can be solved. I don't think it solves all search problems, obviously, but it's good for, I think, more people than I expected. As we pass the mic: one conception of the DPLA, and we'll hear more about this with the work streams a little bit, is we could imagine developing a series of services, open services, where people develop things that are for specialized audiences which are not exclusive, right? I think the concept we've been talking about, of interoperability and open APIs and so forth, is to say we need lots of people to focus on different use cases and different kinds of communities, and Bookworm may or may not be limited to one. There was a great blog post last night by Dan who responds to Bookworm and others. So there's a lot of conversation, I think, about for whom these services will best apply. Other comments on the lightning round? Wow, I did not expect silence after this crew. I'm gonna use moderator's prerogative, though, to call on one person if I might. Richard Urban is over here. Do you mind being cold-called in this audience? So one thing that has been, yeah, you're getting my cue. You're an expert on linked open data, among many other things, and this has been thrown around as a concept at many different points in many different conversations, and I figure, as S.J. said, this is a mixed crowd.
Some people may intuitively know what linked open data is and why it's important, as one thing that everybody seems to keep talking about almost as though it's a given. Others may not be as familiar with it, and I wondered if you wouldn't mind giving a brief moment or two on what it is and why it's important in your view. Okay. Thank you. Boy, that is getting cold-called, isn't it? Well, I mean, I think one way to put it is that there's a lot of buzz around linked data now, but it's really part of a continuum of stuff that many of us have been involved with for many years. It is a different way of expressing the descriptions that we've been doing before as document-like structures that appear as catalog records. It's the same kind of metadata that we've been doing in the Dublin Core community for a long time, but part of the change is being able to identify and name those things in ways that are more reliable, and finding simpler ways that don't mean you have to have a very set, record-like structure, but rather much more creative ways to express the things that we wanna say about these resources, and then being able to relate them in ways that let them travel into things like the Extramuros project, that start building relationships between things like collections, people, libraries, and the ideas that people have about collections and objects. So I think that's sort of a quick overview, but I don't know if that answers your question. I think in the course of the next 18 months there are lots of conversations to come about it: what it really means, how it could work, whether other people have different ideas that are inconsistent with that, because I think many of us take it as received wisdom at this point, but I think it's a very important sort of conceptual core, certainly, that we heard across the beta sprints, but we've heard elsewhere as well.
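Richard's point — naming things with reliable identifiers and relating them without a fixed record structure — is the essence of linked data, and can be illustrated with a few subject-predicate-object triples. The URIs and the mini triple store below are invented for the example; only the vocabulary prefixes (Dublin Core, FOAF) name real vocabularies.

```python
# Invented URIs, just to show the shape: statements are
# (subject, predicate, object) triples, and anything named by a URI
# can be linked to from anywhere else, across collections.
triples = [
    ("http://ex.org/collection/42", "dc:title", "Spencer Papers"),
    ("http://ex.org/collection/42", "dc:creator", "http://ex.org/person/spencer"),
    ("http://ex.org/person/spencer", "foaf:name", "Herbert Spencer"),
]

def objects(subject, predicate):
    """All objects asserted for a given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]

# Follow a link: from a collection to its creator's name.
creator = objects("http://ex.org/collection/42", "dc:creator")[0]
print(objects(creator, "foaf:name"))  # → ['Herbert Spencer']
```

Because the creator is a URI rather than a text field in one catalog record, any other institution's data can point at the same person, which is exactly the "relating things across collections" Richard describes.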
And I think there are some subtle differences in how linked data is different than the metadata or the cataloging that we've done before, but I probably don't have, I don't have more than 140 characters right now. Great, we'll come back to it. I think we can even go to a break early, but I have one other sort of quasi cold call. So, anyone from the beta sprint review team, I saw Laura DeBonis and others earlier, who wanted to comment at all kind of more broadly about the submissions? You don't have to, you've already volunteered more than you were bargaining for, but if anybody on that review team wanted to comment, I wanted to make this space available. Yes, one alternate question. If I asked how many of you had edited Wikipedia articles before, many would raise their hands, and if I asked how many of you. Can we ask? Yeah, let's ask. Wikipedia editors ever? Okay, and Wikipedia readers, obviously, everyone raises their hand, because you've all done Google searches. And when you go to Wikipedia as a consumer of the information, you really aren't sure about the veracity of what you're reading, and so how can the DPLA help Wikipedia? And one of the ideas that we have is that, working with Brewster Kahle's book scanning project and David's work, we could for instance hover over a sentence in Wikipedia and contextualize it automatically by searching through the text in books, and that would go over to the WikiCite site, and then Wikipedians could, so Wikipedia is supposed to be neutral, it has a neutral point of view, and how are you supposed to have a neutral exposition of a topic if you don't understand the scope or how the citations fit into the literature more broadly? And so one of the things WikiCite is supposed to help editors of Wikipedia do is figure out how to do the neutral exposition, and then as a consumer of Wikipedia, you can go there and you can actually verify that that's true.
You can go to WikiCite and you can develop your own interpretation of the authority and reliability and whatnot of the source. And so I just wanted to express that, because I think that a lot of people use Wikipedia, and so I think the DPLA has a lot of potential for helping in that space. Thank you so much. And I absolutely see that as reciprocal with the amazing work that Wikipedia has done, and I think it's no secret or surprise that Doron Weber is funding both of these two projects at the highest level. So I have two tiny closing comments. One is that the Beta Sprint Review Panel had the same experience that many of us did in looking at the 39 amazing submissions that we got. And to be very clear, there was no reward for these people other than, I suppose, the obligation to sit on this stage and come talk. They promised to give all of the intellectual property to the DPLA insofar as we need it for reuse. There was no monetary reward of any sort. This was an extraordinary outpouring of volunteerism, which I think bodes incredibly well for America and the world, and also this project. So I wanna thank everybody who submitted a Beta Sprint, and we absolutely could have gone on for hours and hours more with other sprinters. So please join me in thanking the entire Beta Sprint community. And I hope very much that the result of the Beta Sprint is, of course, that some of this code and these ideas will be rolled into what is a DPLA 18 months from now and beyond, but that these projects will also extend on their own in many respects. You can see that these are independent efforts with very passionate and wonderful people behind them. One such project that the Beta Sprint Review Panel identified in their notes, and there are two pages of notes there, is a public domain registry project. This was one that they flagged as an important, interesting one; our colleague Charles Nesson at Harvard is working on it.
The Beta Sprint Review Panel said it's totally relevant and interesting, go support it and work on it, and he'd love collaborators, but it was not chosen for this presentation because it wasn't core. But there are lots of things like that that I think are worth reviewing and looking at. And since this is my last time with the mic today, I wanted to do the sort of dangerous thing of thanking one person in particular. There's an amazing sort of human interoperability thing going on here between the archive staff and all of the Berkman staff, who are wonderful, but there's one person over the last year who has worked harder on this than anyone else, which is Rebecca Haycock. If she's anywhere here, we need to give her a thank you. We're working on a nickname for Rebecca Haycock, if anyone has a good one, but she's the bomb, that's all I have to say. So thank you. We'll see you back here at 3:45 for the final session, and thanks so much.