Next, our speaker is Rachel Frick of the Council on Library and Information Resources, USA. She comes from the Digital Library Federation, DLF. DLF is a [inaudible] of US and Canadian academic and research libraries. DLF serves as a resource and catalyst for collaboration (this sounds very interesting) among digital library developers. And actually she uses as a case study in her presentation the Digital Public Library of America, which is a service for libraries, archives and museums. And we know that in Europe many countries are developing such services, and also at the European level we have Europeana, which is a similar type of service. So we are looking forward to your presentation, Rachel. The floor is yours.

Thank you very much. I want to thank the LIBER Program Planning Committee for inviting me here today to talk to you about friction and flow. A lot of times when people ask you to speak, they want to know a little bit more about who you are and what you're working for, and here's just a little bit of information, as my introducer did say. And I think one of the things I love most about my job is being a community organizer, getting people excited about ideas and prospects that can move our whole community forward.

But in light of today's presentation, I think it's really important to know where I actually come from and why I believe open access and open venues to information are really important. I come from a very small town in a state not a lot of people know of outside the U.S., called West Virginia. This town is called Nitro, and it's situated in what we call the Chemical Valley, where we have a lot of chemical plants and have a lot of accidents. We did actually produce nitroglycerin and gunpowder for World War I. And here is a picture of the exact plant, the first plant in Nitro. As you can gather from these facts, West Virginia is not a wealthy state within the United States.
We pretty much rank the lowest in really nice economic development indicators like jobs, really low in education, and actually in a quality of life survey we were number 48. So all joking aside, growing up in this type of environment, it is a challenge to have access to quality information. If I did not have access to a university library, if it wasn't sitting on a bus line from my home, I really wouldn't be here today. So open access for me, and how we translate the openness of our public libraries, being on mass transit in our town, being open 24 hours a day, into this digital world, is really, really important. And so this leads to the topic of my talk: how do we minimize friction, and how do we maximize the flow of information to people when they need it, at the time that they need it?

So I think a misconception is that open is free. Open is not free, and we actually have to collectively take on the cost of making access open, because open should be an element of value, something we value highly within our community, because it increases the flow of ideas, spawns innovation, creates information flow and transparency, reaches talent and entrepreneurship in our communities, and provides access to everyone for learning and research. As was mentioned in the keynote, it's not so much the digital divide between rich countries and poor countries, but there's a digital divide across our countries, whether you're a first world nation or a third world one.

Another element of why we need to value open is that we exist in a mash-up culture. We like to take things and mash them with something else. We like to make our own recipes. We like to experiment, and as we see this happening in our academic research library environment, our research is becoming increasingly interdisciplinary. It is crossing institutional boundaries and national boundaries. Open is really important as this collaboration moves forward.
Another driver in what's changing up our environment is data. We talk about data a lot, and I always ask people, what do you mean when you're talking about data? We talk about data-driven decision making. We talk about research data, or the outputs of research and scholarly work on our campuses. We talk about data curation or data preservation. Linked open data. Library collections of data, which is what I'm going to focus on today. And I love the one I've been seeing a lot lately, that data is the new oil, when we talk about moving from carbon-based economies to information-based ones.

I'm sure you've heard this said before, but the network really does change everything that we do. It changes the way we organize ourselves, the way we communicate with our communities, and who those communities are. When our keynote was talking about the written word, I kept on thinking that in the past the way we organized ourselves was really dependent on our geography and who was within so many miles of us, our physical presence. In a networked world, in this information age when we rely on the internet, that no longer holds true. Our communities span all of these boundaries.

I know Lorcan Dempsey talks about how the network changes libraries, and he has come to LIBER and talked to you guys, I think back in 2008, about the Decentred Library. The way we organize our libraries actually changes the model of service: where before we were outside-in, bringing resources to our community, we now are looking at the inside-out library. The point I want to talk to you a little bit about today is really point number three. There it is: the fact that we need to be closer and more engaged with the creation, management, use and sharing of all our information resources, as well as that discovery happens elsewhere. People don't necessarily have to come to the library to discover our services. So, the idea of a library as platform.
How many people have heard this whole discussion or conversational thread around library as platform? A few. David Weinberger talks about this a lot. David Lankes as well. It's how we actually enable knowledge creation. How we as librarians, our new mission, should be not gatekeepers to cultural heritage but actually facilitators of conversations that are important for new knowledge creation.

So, from where I stand and in the conversations I've been hearing, the points of friction in this networked world, where data is the coin of the realm so to speak, are around our access points or delivery points, those platforms or systems. Whether or not we've cataloged our collection: how many people have collections in the back room that have minimal or no cataloging? It's okay. You're not alone. Or just the fact that the metadata there is so minimal that there isn't a lot of meaning to associate with those collections. And I think the third piece is rights and terms of use. How do we express how to access and reuse our collection to facilitate scholarship and learning outside of our walls, within our walls and on the net?

So what I'm going to do today is talk to you about this through the lens of the Digital Public Library of America. How many people have heard of the DPLA? Great. Awesome. As you know, this began in 2011, and it was funded initially by the Alfred P. Sloan Foundation. And we took two years to really plan what we wanted to do. We consulted colleagues at Europeana as well as other national libraries to really talk about what we wanted the digital library to be. And we launched in April 2013 with over 2 million records. Just to be clear, DPLA does not have any digital content within itself, but we actually aggregate metadata records, catalog records, of other cultural heritage organizations. So we don't hold the digital objects themselves, unless you count metadata records as digital objects. So we have 6 million objects, or 6 million metadata records.
And we like to say that the DPLA is not just a portal, but we view ourselves as a platform for innovation and a public option for change. So currently, just to give you a little bit of background about the DPLA, we have things called hubs, both service hubs and content hubs. We're present right now physically with hubs in over 12 states, and we're working on more. Here's just a small example of some of the organizations that are content hubs. These are people that have over 200,000 objects to contribute, like the Internet Archive, the Biodiversity Heritage Library, the National Archives, New York Public Library and HathiTrust, just to name a few. Our service hubs are people that actually gather or aggregate metadata on either a regional or state-based model. And we have folks from Texas, as was mentioned earlier, Minnesota, Kentucky, North Carolina and something called the Mountain West Digital Library, which is a large region. We're working to have more of these types of hubs throughout the United States. So with 12 content hubs and nine service hubs, we have our 1,200 partners.

And to break it down, this kind of gives you a graph. A lot of times when people hear "Digital Public Library of America", people argue whether it's a public library or whether it's an academic library. And as you can see, our academic and university libraries contribute over a quarter of the content that's in our DPLA, whereas public libraries are coming in at 15%. We have a wide variety.

And this is our portal. This is our front page. This is where people come to look at things in the Digital Public Library one item at a time. And it's that more traditional sense, the way we've used digital library collections for the last 20 years: come to my front door, come through my search box, and I will show you your search results one at a time. We also have other ways to interface with DPLA. You can come to the DPLA content through a timeline.
You can pick a date and see the amount of collections that are in that particular time. You can browse by a map. We have this really interesting tool called a bookshelf that tries to replicate stacks, so you can do a virtual browse if you'd like to. And we also have traditional exhibits. So all that stuff that we normally do, that normal mode of operating, but in a digital space, which is really wonderful. I don't want to knock it at all. This is stuff that people find very valuable.

But if we're thinking about our collections as data, these types of tools, or the way we interface with our collections, don't really translate to the way that our scholars, or at least our digital humanities scholars, want to view our collections. They want to see our collections as data. They want to be able to access that data in ways that they can easily move it to the environments that they work in, using the tools that they use. Ideally, they would love to do a bulk download of what we've got. This is actually Dr. William Noel, who is now at the University of Pennsylvania, who was recognized by the White House as a Champion of Change for his work around open data and the digital humanities.

So at DPLA, we like to say we're a platform to build on, and we do that by providing an API. If you can't read this tweet, I think it's hysterical. He's like, "For nerds like us, not only does DPLA offer a sick API, but there's a bulk data download too." So we realize that people want to come to us not through the front door, looking at things like you browse a library, one page at a time, one image at a time, and so forth, but they want to be able to take it with them and do what they want to. They want to mash it up, right? So how many people know what an API is? Good. I can tell you what the abbreviation is, and that's pretty much about it. But the idea is to be able to provide a port or a place where people can grab the data and do what they want with it.
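To make the idea of "a port where people can grab the data" concrete, here is a minimal sketch of what a client of such an API might look like. It is not taken from the talk. The endpoint URL, the query parameter names and the response layout (`docs`, `sourceResource`, `title`) are assumptions about the DPLA API and should be checked against its current documentation; the API key is a placeholder. The sketch only builds the request URL and reads titles out of an already decoded response, so it runs without network access.

```python
from urllib.parse import urlencode

# Assumption: base URL of the DPLA item search endpoint.
DPLA_ITEMS_URL = "https://api.dp.la/v2/items"


def build_search_url(query, api_key, page_size=10):
    """Build a keyword-search URL for the assumed items endpoint."""
    params = {"q": query, "page_size": page_size, "api_key": api_key}
    return DPLA_ITEMS_URL + "?" + urlencode(params)


def extract_titles(payload):
    """Collect titles from a decoded JSON response.

    Assumption: records sit under "docs", each with a "sourceResource"
    object whose "title" is either a string or a list of strings.
    """
    titles = []
    for doc in payload.get("docs", []):
        title = doc.get("sourceResource", {}).get("title", [])
        if isinstance(title, str):
            title = [title]
        titles.extend(title)
    return titles


if __name__ == "__main__":
    # "YOUR_API_KEY" is a placeholder, not a working key.
    print(build_search_url("cats", "YOUR_API_KEY"))
    sample = {"docs": [{"sourceResource": {"title": "Cat on a porch"}}]}
    print(extract_titles(sample))
```

A script like this is the "back door" in practice: the researcher never sees the portal, only structured records that can be moved into whatever tools they already use.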
And like I said, today I'm going to talk about collections data, but I think it's really interesting to explore the idea of also sharing data about our organizations, our usage data about our collections, and also the demographic data about our profession in general, to see what kind of stories would come out. Because I really believe that the most interesting stories, and the way that we can tell the relevance of the work that we do, can come from our biggest fans, and if we release this data, it will be amazing what kind of stories would come out. Because I truly believe in this notion that Rufus Pollock talks about at the Open Knowledge Foundation, that the best thing to do with your data will be thought of by somebody else.

So how do we enable this? How do we provide these services? As I mentioned, at the DPLA we do provide this API access, but we also have an app library, and you can get the code for our systems on GitHub. Here's an example of one of the apps that somebody built. On the lighter side, it's a Twitter bot that will occasionally tweet out images of cats that are in the DPLA library. It is kind of amusing, but it's at historic cats on Twitter, if you're a follower of Twitter. More serious applications of the DPLA resources: with Serendip-o-matic, you can actually run text-based documents, syllabi, lesson plans, your research paper, and it will pull images out of the DPLA that it matches with the text. Another tool, of course, you know, is that ubiquitous iPhone app, right? And then we also have a WordPress plugin that can search across a variety of collections right in your browser. What's amazing about Serendip-o-matic is that when it came out it was targeting DPLA, and within a week coders had gone in and actually adjusted the code, so it not only searches the Digital Public Library of America, but also Europeana, DigitalNZ, or New Zealand, and the Australian National Collections.
So even within our own open access apps, people are coming back and improving them. In a research context, what's available through an API or a bulk download can really draw new pictures in scholarship. This is an example of newspaper data pulled from a national newspaper project at the Library of Congress. It takes the bibliographic information about newspapers, the time, date, circulation and language, and actually maps it across the timeline to demonstrate westward expansion across the United States, but also the rise and decline of the number of languages that newspapers were published in as time went on. So there are many valuable ways in which exposing data in bulk, in new ways that maximize the network, can really change the face of scholarship.

So what does this mean for us in operations? This was a recent report that just came out in the last ten days or two weeks, called Metadata as an Interface. It came out from OCLC. And it talks about how we need to start cataloging for a network age. We need to catalog our collections realizing that our collections may not always have the context of our organization. When you think about how Europeana and DPLA aggregate your catalog records, right, about your collections, and then they get into a big pool, some of the context about your collection that's inferred when somebody comes to your website gets lost when it's all in a big pile. So how do we catalog for the network? It's a little bit different. And in this study, they talk about how we need to describe more about the aboutness of our collections, those subject indicators, those things that help identify likeness, so that your pile of Spanish literature can match up to somebody else's collection of Spanish literature, and then you can see a whole body. Does that make sense?
So in their findings on archival finding aids (and when we're talking about cataloging collections, I think a lot of us have EAD finding aids or some sort of paper finding aid), they found out that only 60% of this minimal cataloging even includes subject headings. So this is an area of friction that we really need to think about when we're processing our collections for a digital age.

But it's also not just cataloging our collections within our institutional systems. It's thinking about how we catalog for use by other tools or in the environments where people live. And in those environments, of course, we talk a lot about Wikipedia. This is just one example of a tool, a really lightweight, web-based editing tool, that allows people to take authority records and pump them up into Wikipedia as actual records or entries. What's important about enriching Wikipedia is that it's actually part of the whole ecosystem of linked data. How many people are familiar with DBpedia? All right, for those of you not raising your hands, DBpedia is a structured language around concepts that are in Wikipedia. And if you look at these linked data maps (this is an old one from 2011, but the most current one I could find), DBpedia is still the biggest contributor of knowledge into the linked data world. So as you enrich Wikipedia with information about your collections, and information about your organization through those collections, it then feeds into the linked web, and more and more your library and its collections become of the web instead of just sitting on it.

So things like hackathons, we call them GLAM days out, are really important and should be considered just as important as cataloging activities, logging MARC records into your ILS system. If it's important enough for the U.S. National Archives and Records Administration, I think it's important enough for our everyday practice. Just this week they announced their commitment.
They already have 100,000 images uploaded to Wikimedia, and they've committed to uploading all their images to Wikimedia. DPLA is committed to linked data, and we're working with our partners so that as we bring their metadata in, and as we provide data through our API, it will be in a linked data format for easy consumption.

This is nice, but everybody wants to know, how does this affect you? Sharing my collection, making my metadata part of the network, making it more diffuse, and people not coming to my front door: what does that mean? With DPLA, even though we had over a million unique visitors last year, we had over 9 million unique visitors to our API. 90% of the traffic for DPLA comes through the back door. Our goal was 80% when we were in planning, so 90% is actually quite phenomenal. We're starting to see results from DPLA show up in licensed databases that our academic libraries subscribe to. It's really interesting: when somebody is searching a resource, they see the main list of hits, and then interfiled are actually results from DPLA. Mountain West Digital Library saw a great increase in their traffic, and most recently they let us know that in the past Google was their top referrer site back to Mountain West; now it's actually DPLA.

How are we able to do this? This gets to my last point of friction, around rights. Anyone who works with DPLA and contributes their metadata agrees that their metadata is licensed under CC0. People familiar with CC0? Yeah? Good. Anybody who receives funds from DPLA to digitize their collections agrees that those collections will actually be governed by some level of CC license, with the recommendation being CC BY. So what does that do? I mean, it's not just the metadata, but more and more I think we need to highlight the positive side of what's coming out of digital libraries. The British Library added 1 million public domain images to Flickr.
We saw that the Rijksmuseum put out 120,000 images as public domain. The Met in New York put out 400,000 images as public domain. More and more we see these announcements around sharing metadata under CC0 and providing exposure to public domain works, whether they're books, images, text, etc. It's amazing what we're able to put out for people to use.

But at the same time, we have to actively engage with the community of creators and innovators, those makers, the people that are making the progress, the coders and the developers, the people that make applications like Historypin and Findery. People aren't just going to breeze by our website and notice that you happen to have an API with data. We need to broadcast and communicate and do outreach to these communities, and for the last year I've been working with folks at OpenGLAM, going to a number of tech conferences to talk to these developers about the rich resources that are available to them through our libraries, archives and museums.

But even though we've done a lot of work, I mean, we've been digitizing collections for over 20 years and we have a lot of content out there, you run across an article like this, which was published in First Monday last month. It's called Enclosing the Public Domain. They did a research study. How many people read this? It was in the June issue. I don't mind if you look at it right now, truly. They did a survey of New Zealand institutional repositories and of the public domain works in those institutional repositories. For over 48% of those public domain works, the digitizing institution asserted some sort of right, so it closed off that book from actually being in the public domain even though, by copyright terms, it was.
So we need to watch this as organizations. Of course, this was my first response: are you kidding me? It's in the public domain. We, as cultural heritage advocates, as facilitators of knowledge creation, should be identifying that stuff and doing everything we can to push that out there, because there's only so much of that we can control. But I understand that rights determination is very difficult. This is just a screenshot of the ARROW project. People know about ARROW, about doing rights management clearance? Yeah. If you looked at that schematic, it's enough to give anybody a headache. It's hard. We do things at the University of Michigan called the Copyright Review Management System to establish public domain books. It's not very clear cut, and I understand that, but that should be a top organizational priority. I think a lot of times we talk about open access and we look outside of our organizational bounds, and we really need to make sure we're doing everything we can to get that information out there, in ways that are accessible, and communicating it not only in human-readable ways but in machine-readable ways.

This is an announcement of a grant that was given to the DPLA as a joint project between DPLA and Europeana. It's called Getting Rights Right, and it's a way to establish consistent and machine-readable expressions of access and use rights on our digitized work. That is not a small project, but I think it's really amazing that this is being funded by Knight, and it was under the Knight News Challenge. They advocated this as a way to move information into our communities for use, and rights, they identified, was a big hurdle. So they're hoping that by late fall or early winter they will have some of this language out for use.

So in closing, there was a great blog post by Michael Edson not too long ago called Dark Matter, and I think this kind of keyed into some remarks that were made earlier this morning. The dark matter of the internet is open, it's social, it's peer to peer, it's read/write,
it's open to everyone, and it's the future of our museums, our libraries and our archives. And just in the points I made today, we need to make sure we're exposing as much of this dark matter to the light as possible, because that's how I think we can be leaders and understand the benefits of open, by being those experienced, knowledgeable practitioners of open ourselves, and be those, what was it, buildings of light, cathedrals of light. So with that, I ran through that really fast because I know we are a little bit behind on time, but now it's just open for questions. So thank you.

So thank you very much for your presentation. Now we have some five minutes for questions, so who would like to start? I think you gave plenty of possibilities for questions. I can see two hands here, one here in the front. Okay, the mic went back, but there, so if you have other mics, then there is one, David Prosser, here in the front, but we can take that one now. Yes, please. I can see who it is.

I'm Marion from Fraunhofer Association. You mentioned content hubs and service hubs. Can you explain what role service hubs have?

Oh yeah, we talk a lot about hubs, and I know it's easy to get confused. Content hubs are basically those institutions that have over 200,000 objects that they want to produce. And you want to know about the service hubs? Okay. So content hubs are really lightweight; they just give us stuff. Service hubs are organizations that we've identified that already aggregate collections data from a number of different institutions. So it might be a state library or, like I said, a regional network, a consortium that already aggregates data for their partners. But they also agree to do things like metadata normalization, and they might in the future offer digitization services. So they provide other services both to DPLA and to the organizations that they're allied with. So, you know, they're that broker in between DPLA and a lot of individual small
institutions that are out in the United States. Does that answer your question?

Thank you. It's David Prosser from Research Libraries UK. I'm really interested in the friction around cataloging. We just released, with the European Library, 18 million bibliographic records as linked open data, and while we see that as a step forward, we still see there are lots of problems. The UK has something like 15 million uncataloged items that are hidden. We know that we need to enhance the cataloging of the items that are already cataloged, but I think that for individual institutions the cost benefit isn't there yet. It's going to cost a lot for us to enhance these records, and it's going to cost a lot to catalog the hidden collections. Some of the data that you presented shows some of the benefits, but I wonder if there's a way of beginning to lower the barriers. Are you working with the Digital Public Library of America to sort of provide tools or help for institutions to try to get those catalog records enhanced, or to begin to catalog the hidden collections?

Yes. I like to say yes to everything. Yes, and there are a lot of questions in your question, so I want to try to pull it apart. I think it is really important to talk to people about how cataloging for network use is different from cataloging for institutional, analog, in-person use, and I think that's the first thing we need to talk about before we start talking about value. Because people don't understand how data plays on the network, and how aggregations work, and how words can lose their meaning when we start piling them in buckets, and how we need to catalog so that we can express things as relationships. That's the first step. So it's really trying to rethink cataloging, and I know cataloging for some people is very exciting; for other people it is their passion. So how do we connect these folks, right? The other thing is, as you saw on a show of hands, there are a lot of collections for which you don't have the expertise within your
institution. I'm thinking of time-based media, moving image, audio. A lot of people put off processing collections because they don't have the expertise in house. So it's about building networks that help people process collections, where they still retain ownership but those records somehow get into these bigger pools so that they can be aggregated and people can find access to them. I keep on bumping the mic, I'm sorry. So yes, DPLA has talked about working with our service hubs to help folks with that.

Another program: at my home institution, which is the Council on Library and Information Resources, we've been managing a grant program for, I want to say, between 8 and 10 years, called the Hidden Collections program. It's funded by the Mellon Foundation in the United States, and we've distributed over 23 million dollars over that course of time to help people catalog their collections. So it's going back and talking to administration about cataloging these collections, these unique collections, these very different collections: what is the return on investment?

And this whole idea of dark matter is a really powerful idea, because when you think about it, in a linked data world there are probably only a certain number of collections, which can express a small number of relationships, and as long as those collections remain hidden, as long as those relationships remain unexpressed, what we can see and what we can learn from what we can see is still limited. So it's actually really a part of the scholarly endeavor that we make sure that our collections are exposed, so that people can truly understand the universe of knowledge that they're working in. Does that make sense?
There's one question quite in the middle here. Just raise your hand.

This is such a great demonstration of collaboration here, passing the microphone.

Hi, John Tuck, Royal Holloway, University of London. I'm really interested in seeing all this stuff put out there through DPLA, through Europeana, etc., but talking to academics in my institution, they look at this and they say, "Oh, that's all very nice." Where is the actual evidence, in research output terms, that they are actually using this material we're putting out there?

When we talk about usage data and providing that out so people can see, I can't tell you right now how many people are using DPLA for research. But when you look at fields like digital humanities and what they can do, their raw materials are basically our collections. A lot of people, they might have scanned the stuff in their office and ordered the stuff off eBay. I've known professors that have ordered stuff off rare book dealers, and they're in their office, so they're just obtaining books and scanning them to do their research. So I can't tell you there, but I think it's important to think about. Yes, that is definitely a research question that's out there, and we need to be more open with our data on who's using it. But then it gets into privacy. You know, do you really want to actually have people write down their names and say who's using what?

One more, the last one. Yeah.

What about born digital?

What about born digital? Yeah, I mean, born digital is included in that. Our capacity to handle born digital is really important to start investing in now. Right now we are investing in our library schools, and especially in our archives; we see more activity in the archival community around born digital. And, you know, we were talking earlier in one of the workshops around web archiving. There is a significant amount of work being done internationally around web archiving. It's once again, I think, really important to understand what is
being done, and also to work with each other to cover and fill in the gaps. When we were talking about collecting everything about everything, we can't, and we can't preserve everything, right? So it's more important that we think outside of our institutional context, with our partners and around the globe. I love coming to international meetings because I learn so much about what's happening here, and I can take it home and say, "Hey, we can partner with X, Y and Z." It's really important that we lose some of this institution-based pride and competition, because we will not be able to handle born digital at large scale unless we partner together.

Thank you again.