Focus on Innovations: remarks by Trevor Owens, Nicole Saylor, Martha Anderson, and Greg Raschke at the ARL-CNI Fall Forum, October 2011, convened by Rick Luce.

Ken, good morning. Welcome to this session, which is a focus on innovation. I'm Rick Luce, the Vice Provost and Director of Libraries at Emory University, and I'm pleased to introduce and moderate this session. Innovation takes many forms, from creating new ways to discover information to rethinking the methods used to accomplish the task at hand. This session will highlight examples from libraries that are implementing significant changes in how they create, manage, and make resources available.

Our topics this morning include the following. Trevor Owens will give a presentation on the Recollection project of the National Digital Information Infrastructure and Preservation Program, which I'm going to forever know as "N-Dip." Nicole Saylor's session will be on crowdsourcing using Civil War materials. Martha Anderson's presentation is on the Library of Congress's Twitter collection, and Greg Raschke's session is on tools and processes for managing collections.

Let me briefly introduce our speakers for this lightning round session. Trevor Owens is a digital archivist with the National Digital Information Infrastructure and Preservation Program (NDIIPP) in the Office of Strategic Initiatives at the Library of Congress, and a doctoral student in the Graduate School of Education at George Mason University. With NDIIPP, he works on several software projects and serves as a co-chair for the National Digital Stewardship Alliance Infrastructure Working Group. Before joining the Library of Congress, Trevor was a community lead for the Zotero project at the Center for History and New Media. Prior to this, he worked for the Games, Learning, and Society Conference.

Nicole Saylor is the head of Digital Library Services at the University of Iowa Libraries.
DLS provides support for scholars engaged in interdisciplinary digital research by assisting in the creation and delivery of unique digital content and by encouraging the use of recognized standards and best practices to ensure long-term preservation of and access to digital scholarship. Nicole joined the University of Iowa Libraries in 2007, and she's currently a member of the NEH-funded National Folklore Archives Initiative Project and the Project Bamboo Scholars and Digital Collection Survey Team.

Martha Anderson, whom many of you in this audience will know, is the director of program management for NDIIPP at the Library of Congress. The program has developed a network of over 200 partners, nationally and internationally, to select, collect, and preserve at-risk digital content. Today, the program supports collaborative collection and preservation of digital content of high value for public policy decision makers, as well as shared tools and services for sustaining diverse digital content for long-term access. She manages a web archiving program in support of the library's digital strategic initiative program.

Greg Raschke is the associate director for collections and scholarly communications at North Carolina State University Libraries, where he leads programs to build, manage, and preserve the library's extensive collections. His responsibilities include overseeing a $10 million-plus collections budget and the development of digital collections. He also leads the library's partnerships in developing new and sustainable channels for scholarly communication. Greg has published and presented on diverse topics such as the future of library collections, electronic resources, organizational change, and recruitment practices in academic libraries. The note I have here in front of me says he loves the Grateful Dead, baseball, hockey, and his wife and children, although not in that particular order.

Let's start off with Trevor.
So I'm excited to be here with all of you and to share a project that we've been working on for quite a while. I've been with NDIIPP for a year now, and in that time I've been working on this project; there was a year of development on it before that. This is also an invitation to you and your staff to try the tool out. I'm happy to create accounts for anyone who's interested. I'll give a little bit of background on the project, and then, as the primary focus, I've decided to judiciously use my 10 minutes to actually show you what the tool looks like and how it works. I'll explain some of the ideas behind it up front, and then I'll be happy to entertain more questions.

So as a starter, we can keep in touch. If you want to ask for an account for this software, you can email N-DipAccess.gov or you can email me directly at trial.gov. And that's the URL, if you want to write it down, to visit the site where you can see some of the examples we'll talk about. But I'll back up to the image of the homepage again.

The idea behind this project is this: NDIIPP has, as mentioned, a really large network of partners working with very different kinds of collections and different kinds of materials. But there's been a persistent interest in being able to create interfaces for exploring those materials that share the same structure and offer similar ways to work with them. Early on, it was decided that there would be no sort of super system, and that there wouldn't be a way to agree on fields that would work for all of these very different kinds of collections and items. So in some sense, Recollection grows out of an attempt to bring together different kinds of materials and build a tool that empowers the people who know a lot about the collections they're working with to create interfaces to explore them.
So, in the very quick slide example, Recollection can take something like this, a MODS record or a set of MODS records, or something completely in the opposite direction, something like a spreadsheet. In this case, this is actually a sheet of information about a whole slate of different collections. Recollection can turn them into something like this. This is an interface that's derived directly from this spreadsheet. We uploaded the spreadsheet, and I'll show you a little bit of the workings inside of here, but the end result is that you get this faceted browsing. In this case I'm showing a map, but there's also a timeline, a table view, and a list view that has thumbnails of images. The idea here is that we're trying to make a tool that makes it as easy as possible for someone to create these sorts of dynamic interfaces to content that they know a lot about, but for which they potentially don't have the skills or the technical expertise to build, say, a map interface or something like that.

So with that, I'll explain the workflow here and pause at a few points on what we think is particularly interesting about this project. To begin with, we can ingest data. We can take in data from, as I mentioned, spreadsheets and MODS records. We can also ingest Dublin Core metadata over OAI-PMH and then use that as the basis to build these interfaces. So it doesn't hold on to the content itself. It doesn't hold on to digital files. It's strictly holding on to the information about the content. From there, we have a really neat set of features, which is the ability to augment that data based on, at this point, a few different services. But in the future, there's really no limit to where this can go.
The go-to example here would be taking a series of plain-text place names and using a lookup service to provide points of latitude and longitude for those places, so that you can plot them on a map and help people explore and see how these things fit together. In that case, it's not editing the records. You're just pulling your information in and then running these augmentation services on top of it to help expose and work with different kinds of data. Then you design views, and I'll have an example of what that looks like: a simple interface where you create the views. Ultimately, you can then publish the view, as I showed, but the particularly neat thing here is that those views are then embeddable. So in the same way you would embed a YouTube video or something like that on your own website, you can embed this interface into your own collection on your own site.

So through this whole workflow, what's happened is that you've started with collection information, worked with this tool for potentially half an hour or 45 minutes, created a rich interface to that collection, and then embedded it back into your own website. We've found that it really helps display the value of these collections when you start coming up with different ways to interface with and explore them, and it's been pretty potent in that case. And then ultimately, you get the ability to share the data and the views. So I'll show you how anyone can pull the content down in a series of different ways, rework it, and explore it in other tools.

So to start with here, we're just going to load data, and as you can see, we've got spreadsheets, MODS files, and OAI-PMH. There's also a sort of beta version of being able to get information out of CONTENTdm. And then once you've ingested it, you can see what this looks like.
So we've got a whole range of different fields there, and you can choose from the dropdown menus to describe them. If I say that a URL is the URL for an image, then it'll know that it can wrap it in an image tag and display it in the interface. And here, I've decided to augment a field: I've decided to create a lat-long field that I'll place on a map. I can pick any of the fields that I want to use to derive that, and then it will ask a third-party service to return lat-longs, and those become part of my data set.

From there, I get the ability to choose a canvas to display my content in: where I want the facets to appear, and where I want the views in the center. And then this is the interface where I'm actually building out my view. I can add a search box to the side. I can add list views or tag clouds, those sorts of things. And then in the center, I can choose what sorts of interfaces I want to put over that information: a timeline, a map, list views, et cetera. The facets persist across all of those interfaces. So in the end, you end up with something like this again, where, in this case, I've even color-coded the pins on the map, which again is just a selection from a radio button. So it's a very straightforward and simple way to end up with something dynamic like this that explains and works with content you have elsewhere.

And here, I'm illustrating the embedding part of this. On any of these views, you can click and get a single script that you then paste into your own page to insert that view into your own website. In this case, it's restyled according to the style conventions of your own site, so it looks like a seamless part of it. This is an example of the same view embedded in digitalpreservation.gov. It's a view of our partners' collections. And again, this is powered simply by a spreadsheet.
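The augmentation step described above (deriving a lat-long field from a plain-text place field without editing the original records) can be sketched roughly as follows. This is only an illustration under stated assumptions, not Recollection's actual code: the `GAZETTEER` table, `lookup_place`, and `augment_with_lat_long` are hypothetical names, and the coordinates are approximate sample values standing in for whatever a third-party lookup service would return.

```python
from typing import Callable, Dict, List, Optional, Tuple

LatLong = Tuple[float, float]

# Hypothetical stand-in for a third-party place-name lookup service.
# The coordinates are approximate sample values, for illustration only.
GAZETTEER: Dict[str, LatLong] = {
    "iowa city, ia": (41.66, -91.53),
    "raleigh, nc": (35.78, -78.64),
}


def lookup_place(name: str) -> Optional[LatLong]:
    """Return (latitude, longitude) for a place name, or None if unknown."""
    return GAZETTEER.get(name.strip().lower())


def augment_with_lat_long(
    records: List[dict],
    source_field: str,
    target_field: str = "lat_long",
    lookup: Callable[[str], Optional[LatLong]] = lookup_place,
) -> List[dict]:
    """Return copies of the records with a derived lat-long field added.

    The input records are not modified: each one is copied, and the
    derived value is stored under `target_field` (None when the source
    field is missing or the lookup finds nothing).
    """
    augmented = []
    for record in records:
        enriched = dict(record)  # shallow copy; the original stays as ingested
        place = record.get(source_field)
        enriched[target_field] = lookup(place) if place else None
        augmented.append(enriched)
    return augmented
```

A view layer could then plot every record whose derived field is not `None`, while records without a match simply stay off the map.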
So in this whole workflow, we've taken something that is generally very difficult to do, making a dynamic interface to a collection, and tried to make it as simple as possible for someone who knows a lot about that collection but may not have the time, the resources, or the capability to create these sorts of interfaces. You may be asking yourself why a group focused on digital preservation is making interfaces for access. The core response there is that continuing to preserve things involves really making the case for the value of what it is that you're working with, and the interfaces are part of that. These end up being very potent things to show to people to illustrate the kinds of materials you're working with. It also, in our minds, offers a competitive advantage in the sense of local context: the idea is that you have a community in your space that you can serve, and if we can make tools that make it as easy as possible for you to create interfaces into those collections, you've got a way to connect very directly with that local community.

So again, here's my stay-in-touch info. I'll just add a few more quick points and then I can wrap up. We see this serving a couple of different uses. I've talked through the workflow and how you can create these interfaces that you embed. But a big part of this, and one of the things we've seen it be most useful for, is not necessarily even getting to the point where you embed an interface, but that curators, archivists, and scholars will use the tool to explore and get a better understanding of how the categories fit into the collection.
That gives you a collection-level view of the materials inside the collection, as opposed to a series of records. In that case there are also chart functions, so you can throw charts into this over the same sets of data, and map functions, as I mentioned. All of those layers end up becoming powerful interpretive tools whether or not you actually want to share the interfaces with the public in the end. In a lot of cases we've got people who will push their records into this tool, see some patterns, and realize that there are issues for remediation, or that there are things they want to work out, or simply that they aren't necessarily using the most useful categories for organizing things. When everything falls into one category, or you very quickly see that there are 300 objects that have no value for field such-and-such, those sorts of things become very salient when you put them into this kind of dynamic interface. So on one level it's a tool for providing access to and discovery of collections, and on the other it's a tool for better understanding what it is that you've actually collected. Thank you.

Hello. Wikipedia defines crowdsourcing as, quote, "the act of sourcing tasks traditionally performed by specific individuals to an undefined large group of people or community (crowd) through an open call." Okay, I'm not sure what's worse, starting a talk with a definition or quoting Wikipedia to a room full of librarians, but I hope you'll excuse me in this case. This is, after all, a talk about enlisting a broad public to accomplish big things, and for that Wikipedia is king. At least they've shown us the vast possibilities and the significant pitfalls of just giving anyone and everyone the freedom to interact with and add value to a web resource.

So today a growing number of libraries are jumping in and engaging the crowd to help transcribe, edit, and generate metadata and content to make collections more robust and accessible. New York Public Library's
What's on the Menu? restaurant menu transcription project follows the launch of its Map Rectifier, which is a crowdsourcing tool to digitally align, or rectify, historic maps. The Family History Library of the Church of Jesus Christ of Latter-day Saints has posted old handwritten birth, death, marriage, and census records online for any and all to transcribe. In higher ed, two big projects come immediately to mind. Trevor's friends at George Mason's Center for History and New Media have a project to crowdsource 55,000 War Department papers, and University College London has a project to do 40,000 items in the Jeremy Bentham collection.

There are a couple of other large-scale efforts out there that were inspirations to us and are worth mentioning. Citizen proofreaders are helping Project Gutenberg correct OCR on many public domain books and journals so that they can be freely available as e-books. The Citizen Science Alliance's Zooniverse initiative has several transcription projects afoot, mostly in the sciences, including one on ship logs from Royal Navy vessels so that people can track weather at sea. Those were all inspirations for us to take on our modest, but modestly successful, Civil War project at Iowa. We were able to do what we did without a stable of computer programmers, specialized software, or any grant money, and this approach has its obvious downsides, which will become apparent to you soon. Today, five months into the project, the results have been a surprise to us all, and they have made us believers in this idea of engaging the crowd, not just for the thousands of pages of Civil War transcriptions that we were able to generate, but because of the relationships that developed between the crowd and our materials, and between the crowd and the libraries.

Okay, so for background: in anticipation of the Civil War sesquicentennial, we decided to do a left-to-right reformatting of all special collections Civil War holdings, which came out to about 20,000 digital objects. The collection
included a few transcriptions provided by families, but mostly everything was handwritten, and as you know, without machine-readable text you can't get at the names, the locations, and all the good stuff that's in these materials. So our crack webmaster, Linda Roth, wrote some PHP code that pulled these Civil War diary pages onto a transcription interface. She added a form and a little navigation, and just like that the site was born.

Now, it's a pretty sophisticated workflow, so allow me to explain it to you. Citizen transcriptionists, for lack of a better term, go to the web, transcribe what they see, and hit submit. It arrives in our departmental inbox, and a staff member copies it, pastes it into the database, and proofreads it, and then it's added. So like I said, we didn't have a lot of software tools at our disposal, but we had some peopleware, so we decided to go that route.

On the 5th of May we launched the project with our usual press release, tweet, blog thing, and not much happened. Then on June 7th the American Historical Association posted a link on its blog. Then on June 8th an email with the subject line "Increased bandwidth temporarily?" arrived in our departmental inbox. Allow me to read it for you. It seemed easy enough. "To whom it may concern: Your site has been found on reddit.com, an enormous group of online folks that contribute the best content that others may find interesting, enlightening, or funny. An article linking your website of Civil War document transcriptions was linked, and because of the intriguing nature of your site, most likely there will be a huge draw to view and participate in transcriptions. Consider allocating more bandwidth to your web server for this increased traffic. You may risk the site going down," et cetera, et cetera. "Thank you."

Okay, so on June 9th this happened. We went from maybe a thousand hits to our entire digital library on a good day to 70,000 by the end of the day, which was exciting and frightening. So our staff was busy upping the RAM, freaking
out, but the whole system came to its knees. We on the staff side couldn't access our content management system to put up what had been transcribed, and yet people could still hit the site and transcribe. So we just watched as the transcriptions arrived in our inbox, and the pile just grew and grew. It was horrible and it was wonderful.

Interest has leveled off, of course, but it does remain steady, and we have managed to develop a loyal cadre of transcriptionists. There's one woman who, as of Wednesday when I checked, had transcribed 479 pages on her own. The project allowed people to interact with these materials in a whole different way. The transcriptionists followed the diary entries, and they became invested in the diaries' stories. Some were motivated by the thought of furthering Civil War research.

I wanted to share a few user comments that we received. "I love the idea of being able to make history reachable to anyone across the globe. I love the idea of what I see." "This is freaking awesome. They need to do more of this to give us armchair archeologists a chance to help out." "This is one of the COOLEST," all caps, "most historically interesting things I've seen since I first saw dinosaur fossils and realized how big they actually were." "I got hooked and I did about 20. It's getting easier the longer I transcribe for him," him being the soldier, "because I'm understanding his handwriting, so I think that's better." Last comment: "Best thing ever. Will be my new guilty pleasure that I don't even need to feel guilty about."

So to date most of the diary transcriptions are complete, and just this month we've expanded the project to add about 5,800 letters, correspondence from the Civil War. I think as of Wednesday we had more than 7,000 pages transcribed out of an eligible 12,590. We've actually created a Twitter account to keep folks informed of when we post new items and have news about the collection. Special collections has also received some inquiries about further donations because of publicity around this
project, and we've also had some interest from other institutions that have wanted to have Linda's code.

So now we begin to think about how we can reach out to these power volunteers, the power transcriptionists, those who've done more than 100 entries each. What kind of feedback do they have for us? How can we continue to grow and cultivate this kind of relationship? We're also looking at scaling this effort beyond Civil War materials to some of our other travelogues and journals, and at whether we can take advantage of some of the transcription software tools that are starting to emerge. Since we started our project, grant-funded projects to develop models and tools have started to get out there. Again, the Center for History and New Media at George Mason has developed an open source tool called Scripto, which enables researchers to contribute transcriptions, and they are writing a connector script, as it's called, to hook that in with Omeka, which is the web publishing platform that they've created. I'm waiting for the hook into CONTENTdm, which is what we use.

So, in conclusion: Rose Holley from the National Library of Australia, where they have multiple crowdsourcing projects underway, wrote an article in the March/April 2010 D-Lib Magazine that said, quote, "The potential for crowdsourcing for libraries is huge. Libraries have massive user bases and both broad and specific subject areas that have wide appeal. Libraries could get hundreds of thousands of volunteers if they really publicized an appeal for help. Anyone with an internet connection is a potential volunteer." She suggests that libraries need to shift their thinking and relinquish a little control over creating, collecting, and describing data. Quote: "Giving users freedom to interact with and add value to data, as well as create their own content and upload to their collections, is what users want, and it helps libraries maintain relevance in society." She also suggests that libraries think globally about crowdsourcing,
that maybe we create a centralized global pool of volunteers rather than trying to establish our own user bases, since digital users don't necessarily care about institutional walls, which was mentioned earlier. So libraries are already proficient at the first step of crowdsourcing, which is social engagement with individuals, but Holley calls us to be proficient in the second step, which is defining and working toward group goals. Thank you.

I'm so happy to be here with you today. I was very interested in Paul's reference to the world according to grep, because I sort of feel like that is the Twitter story for the library: we are trapped in the world according to grep, and I'll be revealing that as I tell you the story.

When the Twitter archive donation to the library was announced, we were deluged with press inquiries and inquiries from researchers, and a lot of times they asked the question, why is the library interested in Twitter? So we reached into the past to compare Twitter to known forms of information. There was some thought that tweets were like diary entries, primarily because of the references to food and weather and personal activity. But in conversations with the staff at Twitter, they told us that they thought of tweets as broadsides intended for public display, not like diaries, which could be thought of as private. Then, when we look to the future, our interest in Twitter was centered on its perceived value as a communication artifact of the 21st century that would be of high interest to future historians and scholars. But when we take a present view, we see Twitter as a complement to the library's news collections, as part of the political discourse that we collect through our election and congressional web archiving program. This is an opportunity for us to have a fuller record of congressional communications, not just websites: 81% of Congress now has Twitter or Facebook accounts. We also found that tweets augmented the narratives
in blogs and websites that were part of our international collections. And we're now seeing Twitter as an emerging form of journalism, as evidenced by this tweet by Alexis Madrigal of The Atlantic about the Occupy Wall Street expressions. He says, "If I were running the Occupy Wall Street site, I'd have a dedicated Twitter storyteller who narrated what was happening for people watching the stream." So this is an interaction between professional journalists and citizen journalists.

One of the most creative uses we found is Twitter as art. In a project that examines written expressions in the context of physical landscape, artists Larson and Shindelman selected tweets with the geolocation tag turned on and then photographed the site. One photo in this exhibit, taken outside the front porch of a house, is associated with this tweet: "I just put on that location thing for Twitter. I'm not sure how I feel about it, though."

So privacy concerns were sometimes the topic that reporters wanted to pursue in their conversations with us. The library is only receiving public tweets. We do not receive account information, and we are required to block access to deleted tweets.

Researchers want to use Twitter. We see this almost weekly: we see reports of new insights about our human condition as verified on Twitter. This study from Cornell was reported on in the New York Times recently. Like many of the studies, it analyzes trends in language to see patterns. And then just the other day I saw this tweet, which reminded me that Twitter is a messaging service for scholars in the humanities and in the sciences.

This tweet taught us a lot about the pace of information spread through social media. It was retweeted hundreds of times and became a trending topic on April 14th, 2010. How many of you know what a trending topic is? It's like the thing that overwhelms the Twitter stream, and people keep... That is a phenomenon that we will likely never see again. We were overwhelmed ourselves, thinking, there we were,
competing with Justin Bieber for the attention of the nation.

In the time since the announcement, we've had almost 200 direct inquiries from researchers. Their primary question is, when can we use it? The other two largest areas of interest are examining events such as the Arab Spring or the Japanese tsunami, and studying language, communication trends during crises, the spread of diseases, financial market trends, and political sentiment. So there's a huge variety in the kinds of interest in Twitter. Some people do contact us and say they want their own personal tweet archive.

Andy Carvin, who is an NPR correspondent, has gained a following for curating first-person tweet observations about events in Libya, Egypt, and Tunisia. A follower wrote to him to ask about the data he's collecting: is it searchable, like by keyword? We get asked that same question. He answers that he's capturing things in a spreadsheet and he hopes for something more interesting in the future. Well, we're not using a spreadsheet, but this points up the challenge of Twitter.

The library's first priority is to safeguard the archive, but we have had high interest in access from the moment of the announcement. I actually received an email from someone at NIST within hours of that announcement, saying, we'd really like to talk to you about Twitter as a research corpus. It's not a typical digital library collection, though. It has too many rows to fit standard kinds of databases and be easily indexed. Have you noticed that there is no service, including Twitter, that can provide you full search of your own tweets? You can't go back more than 4 or 5 days, maybe even a month. I was at a conference last week where we were tweeting. I got back in the office on Tuesday and wanted to do a little compilation, and a lot of the tweets were out of the range of my ability to get at them. So it tells you something about the technology it takes to index and keep these things fresh. It takes a different infrastructure
and a whole new way of thinking about serving information. If we are too fine-grained and we think at the level of the individual tweet, we are defeated by current technologies. Therefore, people who want their own personal tweets may have to wait a long time.

This screenshot shows the interface to our Twitter accessioning app at the library. I took this shot on Tuesday, and at that point 27 billion tweets had come into the library and had been processed. We are getting an average rate of 6 million tweets an hour. When we accepted the donation, the estimated size of the archive was less than 5 terabytes, and I think there was a rate of 50 million tweets a day, so Twitter is growing at a very fast rate. At the rate it's growing, it will rival our web archives, which accumulate at a rate of about 5 terabytes a month.

So we are testing Hadoop clusters for parallel processing. How many libraries run Hadoop clusters? Be warned. We are investigating tools for data mining. We are engaging the research community to learn about how they think of using Twitter. And even when we get our system architecture tuned to serve this data, many researchers will still be interested in using Twitter for real-time research. With that expectation they will be disappointed, because the agreement with the donor is that we have a 6-month embargo on the data. You could maybe get to it after it's 6 months old; they want something more current. But the biggest lesson we've learned thus far is that the Twitter archive is not just for the Library of Congress. It has value for the world, and the world lets us know this almost every single day. Thank you.

I'm trying this for the first time with an iPad, so we'll see how it goes. So: "Moneyball, The Extra 2%: What Baseball Management Can Teach Us About Fostering Innovation and Managing Collections." I used one of my 10 minutes on the title. But I've been thinking about modern baseball management and the statistical revolution there, and its intersection with managing collections, quite a
bit. I was not sure the Moneyball analogy worked, but then Anne Kenney brought it up in an independent conversation about libraries, and Brad Pitt made it into a movie, so I figured if Anne and Brad are on board, so am I. Now, you might conclude this is a forced analogy, which would be fair, and that it means I spend too much time thinking about baseball, on which my wife would agree with you. But I do think the comparisons work well between innovation in developing and managing baseball teams, particularly the statistical and analytical revolution there, and research library collections. There are some things we could learn.

So briefly, the story of Moneyball is the story of the success of the Oakland A's in the late 1990s and early 2000s in revolutionizing the game by questioning axioms and doing statistical analysis. And The Extra 2% is the less well-known but equally important success of the Tampa Bay Rays from 2007 to the present, although not in the playoffs here recently. Both teams had to struggle with inferior revenue streams, dysfunctional markets, and competition from significantly better-financed resource bases, say the New York Yankees or the Boston Red Sox. Both had to find new approaches to building and running a successful baseball team in order to compete, maintain vitality, and be successful in serving their primary users, their fans. Today, many libraries find themselves in a similar position.

The shift from supply-side development of collections (print-based large inventories, unpredictable supply and demand periods, and judging collections largely on size and the amount of money you spend) to a more directly demand-driven approach has been underway for quite a while, but the pace is definitely increasing, for a number of reasons: technology, which is fairly obvious, and the impetus of the network; changes in publishing; supply chain capabilities; and increased accountability for libraries in higher education. And I think one reason trumps all others, and that's
economics. These are three random ARL libraries. You might guess from the Carolina blue, the dark blue, and the red what they are. It's not really important, except to show the rate of growth in the collections budgets over a 26-year period. I would argue that over the next 25 to 30 years we're unlikely to see this rate of growth. You can do this with almost any other three ARL institutions: minus a handful, the rate of growth in collections budgets has been pretty high. I don't think we're going to see that again, so we have to look at new issues and new models. These inexorable economic factors are driving us to increasingly innovate in managing our collections, question established norms, and develop new models for managing collections.

So my wife asked me if this guy was supposed to be a metaphor for looking deeper in analytical thinking, and I can tell you he's not. I'll get to him in a minute. In baseball, briefly, and in the two books, looking deeper and questioning assumptions means questioning a reliance on old-fashioned numbers and axioms that do not tell you much about past, current, and future performance, and it means using statistical analysis to identify market inefficiencies. It also means not relying on grizzled, hardened scouts. This is the best image of a baseball scout that I could find, and that's what he is, right? The old scouts tended to think they could predict performance based on what they saw with their eyes, and they relied on paradigms and on what they thought they knew, things they hadn't really tested. Those things dominated the game until the late 1990s. They did this rather than question, test, analyze, and project performance based on good information.

And then these two baseball teams did all that. They revolutionized how they assess performance and how they project performance, and they bred that throughout their organization, which is another critical point, because they had a lot of these
guys. They kept the ones who could adapt, perform, and evolve, and then they got rid of some of these old folks. I say that as a state of mind, not as a state of age. So I suppose this guy could be a grizzled, hardened bibliographer as well. Maybe he works in one of your libraries, but I doubt it. I think most of us are well past the grizzled backroom selector models that we had in the 80s. But we do still hold on to many of those habits as we turn our big ships to fully embrace the digital environment. Many of us are changing practice and working to fully embrace digital content as the primary means for discovering and disseminating information, but we're not doing it fast enough, and we would benefit from looking outside libraries, looking at some of these examples today, for ideas to expedite the pace of innovation in collections.

Here are some specific ideas that I think we can increasingly employ, moving quickly through some assertions and changes in practice. Do whatever you have to do to grow, develop, or hire statistical analysts to be part of your collections team. Change your staff and your staffing models with at least one new hire that requires an analytical, statistical background. I think this is critical. People are the most important tool for innovation. That's nothing new that I need to sell this group on, but it requires real change with your staff to truly change your culture and breed innovation in how you approach collections.

Adopt statistical tools. Build, adopt, adapt, and start using them to develop the collections program. A couple of examples: at NC State we put our entire book collection and circulation activity into SAS, which is statistical analysis software. We ran a myriad of analyses that have informed everything from cutting our approval plan expenditures by 45%, to scoping our demand-driven ebook collections, to heavily influencing and reducing our item-by-item selection. We've saved hundreds of thousands of dollars
annually on our various book collecting efforts, improved our usage rates, improved our scope in some areas, that long tail area that we have, and reinvested in new programs and national collaborative efforts: HathiTrust, DuraSpace, things like that. We developed our ERM system to ingest statistics related to every continuing resource we own, and this system enables a myriad of reports based on usage rates and expenditures. And here's the key: we use it consistently, not just when we have cuts or journal reviews. We annually review the bottom 20% of our continuing resources and their overall performance, and we make decisions accordingly to enable investment in emerging areas.

Partnering with digital library colleagues: I think this is critical. A lot of digital library projects have focused on outward-facing digital collections and digital services, which is fabulous, because they significantly benefit users. But partnering digital library and collections people is, I think, an untapped resource for collaboration and innovation in our organizations. It has not been applied enough, and I think we need to do more to foster that.

A couple of quick examples. We developed a database where we actively map collections expenditures by fund and subject to university data points such as number of PhDs, faculty members, enrollment, and grant dollars, and we can manipulate those at the research center, department, and college level. We use that real-time information to find our overvalued and overinvested areas and our underinvested subjects, to test ourselves against university expenditures and priorities, and to advocate for the addition of library resources in certain areas. It's been very effective in balancing our collections portfolio, freeing up funds to invest in new areas, and helping us on occasion advocate for the preservation or increase of funds. We developed a database to solicit community input on journal titles, weight community input, and
pair it with Eigenfactor, SNIP, usage data, publication rates, and citation rates among our community. We couldn't possibly have done that without the help of our digital library colleagues. Ivy and her colleagues at the CDL (she's in the audience somewhere) have done much of this work as well, which she talked about yesterday. They've done a great job. We piggybacked on the work at the University of Washington to roll this data up by title and to analyze, manage, and negotiate packages with publishers. You can roll this up and use it in publisher negotiations. These efforts have been imperfect, and we can still get better, but we're making progress. And we take Steve Jobs's advice, stealing ideas from other libraries wherever we can. We frequently steal from Ivy. It's a good practice, I think.

Developing positive arbitrage: positive arbitrage is the concept at the heart of The Extra 2%. In the interest of time, I'm going to skip the long technical definition they use in the business world and tell you that the use of the term in the book, and my use of it here, is kind of a bastardization of it. But it really means generating value: everything you do with collections generates positive value. This is the extra 2% for the Rays. Everything they do generates positive value for their organization.

I think library collections work has historically excelled at creating negative arbitrage in a lot of cases. We exchange dollars for books that do not circulate. We purchase journals that we don't want. We collect government documents at far too high a rate of duplication among us. And we often sign contracts that favor the publishers we work with. Sometimes there are good reasons for this: the long tail of scholarship is a value judgment that we can make, and there are economics, politics, and other things. But often I think we made these
sort of negative arbitrage steps because we based decisions on past practice and did not have the type of data, analysis, and information that would help us find the best value in expending collections resources, helping us find the extra 2, 5, or 10% to maximize the impact of collections.

I tipped my hand a little bit when I questioned Rick yesterday about the collections budget and some of those innovative projects he talked about, but we have to free collections funds to invest in new and emerging areas. We have to free funds to invest in digital media, collaborative national efforts, and the kinds of great digital library projects we've heard about here, which are non-traditional but of critical value for scholars and for libraries. If you budget for experimentation and reward innovation, then you can reinforce your efforts to change the staff and change the culture, to consistently question assumptions, experiment with new approaches, generate innovation, and bring more content and value to users.

Both of the teams I've referenced today had people with dynamic interpersonal skills to carry forward the combination of analytical people and analytical tools to drive their work. They met resistance as some constituencies dug in (employees, fans, etc.), but they persevered by demonstrating the impact and working the necessary political channels to create buy-in over time. I think libraries tend to be very good at generating positive political arbitrage with faculty and other users. That needs to be maintained as we work on new models. For analytical approaches to be successfully applied in local environments, the user community, and especially the faculty, have to accept them in time. Analysis does not obviate the need for the political and interpersonal work that's involved in managing collections. But I'm certain that if we use analytical tools to increase access to the collections users want, in the format they want to consume them in, then, just like
winning in baseball, your fan base, your users, will buy in. Thank you.

Thank you for listening. Music was provided by Josh Woodward. For more talks from this meeting, please visit www.arl.org.