Shaking Up the Norm in Collections, remarks by David Magier, Ivy Anderson, Rick Anderson, and Rick Luce at the ARL-CNI Fall Forum, October 2011, convened by Jay Schafer. Welcome to the session entitled Shaking Up the Norm in Collections. I'm Jay Schafer, Director of Libraries at the University of Massachusetts Amherst and a member of the ARL Task Force on 21st Century Research Library Collections. I'm pleased to introduce this session and to introduce our speakers. The manner in which the research library is creating, managing, preserving, and storing collections affects the ecosystem of research, teaching, and learning. Our speakers today will address ways the landscape of collecting and collections has changed, and they will offer their perspectives for leveraging new approaches. We have four important topics this afternoon: creating global resources using digital technology, journal pricing and evaluation models, patron-driven acquisitions, and the Salman Rushdie papers. Rather than take time from our one-hour session today, I would like to briefly introduce our speakers in the order that they will present; then we will have them do 10-minute presentations and hopefully have about 10 minutes left at the end for questions and discussion. So the first speaker is David Magier, who is the Associate University Librarian for Collection Development at Princeton University. Second is Ivy Anderson, Director of Collection Development and Management at the California Digital Library. Next, Rick Anderson is Associate Dean for Scholarly Resources and Collections at the University of Utah's Marriott Library. And last but not least by any means is Rick Luce, the Vice Provost and Director of Libraries at Emory University. So please welcome our speakers.
Oh, I need to also point out that David has an appointment back in Princeton, and so he will probably be sneaking out quietly before the question-and-answer period, but he will be glad to accept any comments you have, probably electronically. David? Thanks very much. I'm going to attempt something rather ambitious. I was asked to speak about creating global resources using digital technology, and I'm going to speak very fast, and forgive me for that, and try to fit, in the spirit of a lightning round, into 10 minutes a kind of showcase of a couple of projects that we are working on at Princeton that exemplify some of the ways that digital technologies have changed the way we approach some of the collections of content that we have built up. The first of these is the Latin American ephemera collection at Princeton, fairly famous among Latin Americanists, which has been assembled over the last 40 years. This is the 40th year we've been heavily invested at Princeton in assembling collections of these materials, primarily in the social sciences: politics, human rights, women's issues, elections, and so on. It's a massive collection, and I want to say a couple of sentences about how we've been dealing with it and how the way we have dealt with that collection is now undergoing change. First of all, we have cooperative relationships with institutions in the region, as well as a very broad network of acquisitions agents throughout the region who acquire these materials for us on a commercial basis. And when we acquire them, we have a staff that has processed them into thematic collections, by country and theme. And then these collections, of which there are close to 400 country-and-theme combinations over the years, have been microfilmed, first and foremost to preserve them. And then there are finding aids for each of those, and they are stored as archival collections made accessible to visitors to Princeton.
And of course, people are buying the microfilm. In fact, in recent years, the cost of the staff to process this collection has had to be supported entirely by the revenues from selling microfilm. If you think about that, it's probably not a sustainable model, so we're not too happy about it. So what are we doing about it? First of all, with the growth of our own in-house digital capability, we have undertaken some specialized projects, the first of which was to segregate out the Latin American posters, which we understood were in high demand by scholars, and which formed a nice self-contained sub-collection. We have gone ahead and digitized about 2,500 of the Latin American posters from the ephemera collection, and we've put them up. I don't even know if you can read any of that. But from the Latin American ephemera collection we've got, as I say, a navigation system and searchability into the sub-components to find the finding aids, as well as to bring up the Latin American posters collection. We have basic metadata and thumbnail images, and then we have full-size images that you can navigate into. I should mention, by the way, that we had long conversations with our University Counsel, who assured us that we should feel free to put this material up on the web, and we do, with appropriate takedown provisions should the need ever arise, and we doubt that it will. If you think about it, these are the kinds of publishers who are more than happy to have their content as widely disseminated as possible, very much like the human rights archives material collected at Columbia and other places. So that's one approach: to take a subset and to digitize it ourselves in our studio. The second approach has been, with grant funding from the Department of Education under the now-axed TICFIA program, the Technological Innovation for Cooperation in Foreign Information Access.
I think I garbled the acronym, but anyway, TICFIA is under Title VI, and through the Council of American Overseas Research Centers, we have worked to digitize another subset of this material, not the posters, but the institutional archives of the Guatemala News and Information Bureau. For this project we digitized from film; we outsourced the digitization. And again, our University Counsel tells us this is excellent material, that the institutions that created that content would be more than happy to see it on the web as far as he's concerned, and so we got a green light to go ahead and put it up. This is still a work in progress; we're in the middle of that grant. We're not sure if we'll get all the way to the end of it with the last of the money, now that the tap has been turned off, but this is a collaborative project, again with other institutions in the region. Now, let me switch gears and say what it is we want to do going forward. As I said, the microfilming is not a sustainable model for dealing with this, and I've been speaking with Latin American studies librarians at major research libraries, at SALALM and so on, and there's great interest in this content. The problem has been not just that it's microfilm, but that the way we assembled it meant you had to pigeonhole each piece by country and theme until you assembled enough to make a collection, then make the finding aid for that collection, then disseminate it on microfilm. That meant years could go by before you had a particular collection on Peruvian women's issues or whatever the particular subject was. People were interested in timeliness, and that was a key factor that motivated us to approach this in a different way. We are right now, as we speak, developing a new model where we digitize on receipt. We outsource the digitization of each piece on receipt.
We apply basic metadata to each piece that corresponds roughly to the thematic collections we had created before, and then we put those things up for immediate access and browsability. What you don't have are curated, in quotes, collections organized thematically, and so archivists look at us a little bit askance, but I think that the immediate access, the searchability and so on, more than makes up for that. I don't have a slide for that one; it's a process about to be launched. My colleagues at the other institutions that are interested in Latin American studies have indicated a support model that we hope would become sustainable: namely, since we have this content, we would hook up with other institutions that have some other ephemeral material of this type, and we would create a governance structure and an organization in which people would help support the fairly minimal staff that we need to process these materials and create the metadata, in exchange for playing a governing role in setting priorities and determining which materials to go after and so on. The people I spoke with thought this was a good model, and so we'll see if it works. In the last two minutes, let me turn my attention to the second project I wanted to highlight, which is funded under an NEH grant to Princeton, under their bilateral program with the DFG in Germany. There is in Yemen a vast distributed collection, in private archives, of manuscripts of the Zaydi community. I don't want to go into all the background, we don't have time, but the Zaydis are culturally and politically somewhat endangered in Yemen. The private archives and manuscripts that they hold are manifestly endangered because they're being destroyed. This was a kind of endangered-archives type of project. We couldn't bring the Yemenis, our partners in Yemen, to the U.S. because of visa problems, and we could not go to Yemen ourselves.
Things are not very good on the ground in Yemen right now, as I'm sure you know. So we brought them to Berlin, and we brought cameras to Berlin that the grant paid for, and we trained them in long workshops and sent them back to Yemen with the cameras. They're digitizing the manuscripts and applying preliminary metadata. When we can manage to ship anything out, sometimes through intermediaries, we get the hard drives full of data back to Princeton, and we put them up on the web as part of the Yemeni Manuscript Digitization Initiative, in a browsable collection which constitutes a subset of Princeton's well-known existing Islamic manuscripts digital collection. So this is collaborative with institutions in Germany, with institutions in Yemen, and with a colleague and scholars at the University of Oregon. Am I down to, well, I'll cut it off there and just say that this is different from our other digitization and collection projects because it's digitization pointed at materials we don't own. We don't take them away from where they originated. We invest in the country where they sit by investing in the capability to create digital content that our scholars tell us they must have access to, so we're responding to that need. I can't claim that this would ever get out of the long tail that was discussed before, but it's obviously important for research and future research, and I'm really glad to report that at Princeton, and I'm sure at many of the institutions represented here, the long tail of low-use research materials is alive and well in terms of what we collect and what we preserve. Thank you. So I think of my talk as something of a sequel to a presentation that many of you heard last May at ARL's membership meeting by Ted Bergstrom and Claudio Aspesi. Ted Bergstrom, an economics professor at UC Santa Barbara, likened library big deals to Hell's grocery store, a place where one is seduced into buying groceries one doesn't need.
Claudio Aspesi, an investment analyst who studies the STM publishing market, predicted that journal cancellations driven by budget cuts were going to impact publisher bottom lines. So both of those talks had a common theme, which was that libraries are beginning to understand how to value journals and act on that information, and that that would begin to change the scholarly communication marketplace. Well, this is something that the University of California libraries have been doing for a while. We've been applying a metrics-based approach to understanding journal value and acting on those decisions, and I'm going to talk about two of those projects today. One is our value-based pricing approach, and the other is something we call a weighted value algorithm. Is this creating feedback? It's okay. So our value-based pricing approach goes back to 2007, or actually earlier, when there was a UC library summit to talk about marketplace challenges. The result of that was the formation of a collection development task force to try to develop a value-based approach to alternative pricing for materials in the journals marketplace. So our goal was to apply notions of value to journal pricing. We wrote a paper that described our approach, which you can find on the UC website. The approach had the four elements that you see on the screen, but the primary notion was to attempt to align pricing with value in a clear and objective way. And by value, we meant both the value that we obtain from journal content and also the value that we contribute in the form of our authorship and editorship. So to do this, we developed a methodology, described in our paper, that was based on work that Ted Bergstrom had in fact done with his colleague Preston McAfee to identify journal value. This is information that you can find on their journal prices website.
The basic idea is to compare the list price of commercial journals within specific disciplines to the cost of nonprofit journals in those same disciplines, in order to understand what the delta is and what might constitute true cost efficiency in the journals market. To do those comparisons, they've developed a mechanism where they look at list price per article and list price per citation, and they come up with something they call a composite price index. Then they compare the commercial journals to nonprofit journals and derive something they call a relative cost index, and that tells you how good or how poor a value a given journal is in its field. They then assign to journals a plain-language characterization of whether a journal is a poor value or a good value. So UC used this methodology, or rather we developed our own methodology based on this information, to apply a discount to journals that had poor RCI values in order to arrive at a particular price for a package. And then we applied additional discounting based on UC authorship and based on discounts for consortial purchasing. Then we looked at that derived price and compared it to what we were actually spending for those journals to see if we were getting a good deal or not. So we found this methodology very helpful and interesting in relying on objective measures, but there were a few problems with the data. First of all, it relied on impact factors, and impact factors aren't available for all journals. At the time that we started doing this, the data were not being consistently updated, though that is more current right now. But we were also trying to use this data in the journal decision-making process that our bibliographers engage in. We're a large system, ten campuses.
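To make the Bergstrom-McAfee mechanics concrete, here is a minimal sketch in Python. The composite price index is shown as a geometric mean of price per article and price per citation, and the value labels, thresholds, and all numbers are illustrative assumptions rather than the published formulas; their journal prices website has the actual methodology.

```python
from math import sqrt

def composite_price_index(list_price, articles, citations):
    """Combine price per article and price per citation into one number.
    A geometric mean is assumed here; the published formula may differ."""
    price_per_article = list_price / articles
    price_per_citation = list_price / citations
    return sqrt(price_per_article * price_per_citation)

def relative_cost_index(journal_cpi, nonprofit_median_cpi):
    """RCI > 1 means the journal costs more than comparable nonprofit titles."""
    return journal_cpi / nonprofit_median_cpi

def value_label(rci):
    """Plain-language characterization; these cut points are illustrative."""
    if rci < 1.25:
        return "good"
    if rci < 2.0:
        return "medium"
    return "poor"

# Hypothetical commercial journal compared to the nonprofit median in its field
cpi = composite_price_index(list_price=4500.0, articles=120, citations=900)
rci = relative_cost_index(cpi, nonprofit_median_cpi=4.0)
print(value_label(rci))
```

At these invented numbers the journal costs roughly three and a half times the nonprofit baseline and lands in the "poor" bucket, which is the kind of title UC's approach would discount.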
We do a lot of collaborative decision-making, and we basically distribute thousands of spreadsheets to our bibliographers every year asking them to evaluate journals and tell us what we should retain, what we should keep, what we should cancel, et cetera. And so we send folks into spreadsheet hell several times a year, and we wanted to get out of that business. We were using the Bergstrom-McAfee good, medium, and poor value data, but because they used list price in their calculations and we don't actually pay list price, that wasn't really telling us what we needed to know about value. So we wanted a tool that would help facilitate our consortial decision-making and make it more informed, less subjective, and less labor-intensive. We've since developed a new metrics-based approach to value assessment that at CDL we're actually quite excited about. It builds on the basic ideas of what Bergstrom and McAfee were doing, but it applies a slightly different kind of calculation. We definitely wanted to retain the notion of cost-effectiveness in our analysis, but we also wanted to incorporate elements like usage, which we know are very important to bibliographers in assessing value. So we've come up with a multi-factor approach that applies six factors in three main categories that we call utility, quality, and cost-effectiveness. For utility, we look at both usage and UC citation behavior for a particular journal. For quality, we look at impact factor and another measure called SNIP, source normalized impact per paper, which was developed at the University of Leiden and is maintained by Scopus; it's a broader measure of impact, but similar to impact factor. And then for cost-effectiveness, we look at cost per use and cost per SNIP.
We then kind of cook up all those values into a numerical score, and, similarly to what Bergstrom and McAfee do, we assign a value category of high, medium, low, or lowest. We do that for all of our journals. Disciplinary differences are very important in this calculation, because we don't want to privilege high-use journals over the niche titles that will get less use, and you can see within different disciplines that the profile of use is very, very different. So what you see on the screen are the median values in various subject categories; median values are part of our algorithm. You see very high median usage in medical and health sciences, very low median usage in the arts and humanities, and similarly impact factors have different subject profiles. We started out in this metric using the four broad subject categories that you see on the left-hand side, life and health sciences, physical sciences, et cetera, and we knew that those were far too broad to provide meaningful data. So we started casting about for a more granular subject scheme that we could readily apply, and we identified the Australian and New Zealand Standard Research Classification as something we actually could apply to our journals in a fairly straightforward way. So that is a tool that we're now using. When you put all this together, what do you get? Well, aside from all the spreadsheets, where we still have title-by-title information, we can now get a picture of the value that we're getting in various packages. So this is an example of a particular publisher's journals. I don't know how well you can see the writing on the chart, but basically the red and pink half shows you the low and lowest value titles, and it shows you that more than 50% of the journals are in a very low value category. 23% of the publisher's journals are in a high value category. So it's an interesting kind of picture.
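As a rough illustration of the kind of multi-factor scoring just described, the sketch below normalizes each of the six factors against its subject-category median, so that niche fields are judged against their own baselines, inverts the two cost factors (where lower is better), and buckets the result into the four tiers. The equal weighting, the tier cut points, and every number are assumptions for illustration; this is not CDL's actual algorithm.

```python
# Six factors in three categories: utility, quality, cost-effectiveness.
FACTORS = ["usage", "uc_citations", "impact_factor", "snip",
           "cost_per_use", "cost_per_snip"]
# For the two cost factors, lower is better, so their ratios are inverted.
LOWER_IS_BETTER = {"cost_per_use", "cost_per_snip"}

def weighted_value_score(journal, subject_medians):
    """Average each factor's ratio to the median of the journal's own
    subject category. Equal weights are an assumption for this sketch."""
    total = 0.0
    for f in FACTORS:
        ratio = journal[f] / subject_medians[f]
        if f in LOWER_IS_BETTER:
            ratio = 1.0 / ratio
        total += ratio
    return total / len(FACTORS)

def value_tier(score):
    """Map the score to the four tiers; cut points are illustrative."""
    if score >= 1.5:
        return "high"
    if score >= 1.0:
        return "medium"
    if score >= 0.5:
        return "low"
    return "lowest"

# Hypothetical arts-and-humanities title scored against its own field's medians
medians = {"usage": 40, "uc_citations": 5, "impact_factor": 0.8,
           "snip": 0.9, "cost_per_use": 12.0, "cost_per_snip": 900.0}
journal = {"usage": 80, "uc_citations": 10, "impact_factor": 1.2,
           "snip": 1.35, "cost_per_use": 6.0, "cost_per_snip": 450.0}
print(value_tier(weighted_value_score(journal, medians)))
```

Because everything is measured relative to the field's own medians, a humanities title with modest absolute usage can still score "high" if it beats its disciplinary baseline, which is the point of the subject-normalization step.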
We started using that in our negotiations, and it turns out that when you tell a publisher that half of their journals aren't very good value, they don't like it very much. They can in fact become quite defensive and resistant. We've done some other work, however, where we take individual journals and compare them to other individual journals from publisher to publisher and show that comparative data to publishers, and that's a little bit more compelling. There was an interesting, fairly high-profile situation recently where we were able to show a publisher that the price they wanted to charge us would have been much higher, from a value perspective, than other journals that we license that were similar to theirs. You can also look at this within subjects. So this is an example of the value distribution of our SAGE psychology journals, and it shows you that there's actually fairly high value, good value, from SAGE in their psychology profile; lower value titles are a much smaller percentage. If we look across all of our journal packages, we can also see marked differences in the high versus low value of different publishers. So the differences that you see between Elsevier on the higher end and Taylor and Francis on the lower end of value within this particular subject discipline are very interesting data. So where do we go next with this? We're now in the process of applying this more detailed subject scheme to all of our journals and putting all of this in a database, and we will be launching a major project with our bibliographers this fall to distribute all of this data and have our bibliographers validate the rankings and the metrics. We're trying to get the bibliographers to trust the metrics, and they will be able to tell us if we've got this right or not. So far the current trial that we're doing is very encouraging.
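One way value tiers like these can feed renewal and cancellation decisions is a simple greedy cut: cancel from the lowest tier upward until a spend-reduction target is met. This is only a sketch of the idea, with invented titles, prices, and tiers; it is not the tool UC used, and a real process would of course route the candidate list through bibliographer review.

```python
TIER_ORDER = {"lowest": 0, "low": 1, "medium": 2, "high": 3}

def plan_cancellations(titles, target_savings):
    """Cancel lowest-value tiers first, and within a tier the most
    expensive titles first, until the savings target is reached."""
    order = sorted(titles, key=lambda t: (TIER_ORDER[t["tier"]], -t["price"]))
    cancelled, saved = [], 0.0
    for t in order:
        if saved >= target_savings:
            break
        cancelled.append(t["name"])
        saved += t["price"]
    return cancelled, saved

# Hypothetical five-title package; target an 8.5% reduction in total spend
package = [
    {"name": "J1", "tier": "high",   "price": 5000.0},
    {"name": "J2", "tier": "medium", "price": 3000.0},
    {"name": "J3", "tier": "low",    "price": 1200.0},
    {"name": "J4", "tier": "lowest", "price": 500.0},
    {"name": "J5", "tier": "lowest", "price": 300.0},
]
total = sum(t["price"] for t in package)  # 10000.0
cancelled, saved = plan_cancellations(package, target_savings=0.085 * total)
print(cancelled, saved)
```

Note how the greedy cut overshoots the target here: because the low-value titles are individually cheap, hitting even a modest savings target can mean cancelling a large share of the title count, which mirrors the 8.5% savings for 32% of titles trade-off described in the talk.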
The bibliographers seem to be telling us that the metrics are telling the right story, so that's very encouraging, because this will help us do better data-driven decision-making and take some of the subjectivity and labor intensity out of the process. We will utilize these results in major renewal negotiations that are coming up in 2013, which is going to be the mother of all renewal years for us. Given our budget challenges, we anticipate that we may have to cancel quite a bit out of our packages, and in fact may be pulling back from deals, and this will help us determine where we should focus our efforts, where we should target that activity. So whether publishers will work with us in this restructuring is of course a big question. In one recent UC example, we did cancel a large number of journals from a particular publisher. We were able to reduce our spend by 8.5%, but we had to cancel 32% of our journals to get there, so that's not exactly where we want to be. Had we used the publisher's methodology, we would have had to cancel quite a bit more out of our portfolio, so using our own selection-driven methodology was definitely better for us. We often talk about this in the context of changing the scholarly communication system, and I think we all know that changes to that system are going to take efforts in many, many different areas. So this is perhaps one piece of that puzzle, but obviously there are many other things that we need to do in order to really change the dynamic of scholarly publishing. So I dressed very springy today in hopes that it would bring the sun out, and so far it has not worked. Lately I've been using the analogy of rivers and ponds when talking about what I see as the future of research library collections.
So whereas the traditional library collection is kind of like a pond that we're constantly digging to try to make deeper and broader, so that it can hold enough content to ensure that all of our patrons' research needs will be met, I think that the opportunity we're facing now in the developing information ecology is instead to help our patrons get to a riverbank where they can actually grab stuff as it flows by in a constant stream. But obviously our patrons need more than access to just the content that's being created or newly made available in the moment. They also need access to a huge and growing corpus of pre-existing documents and, increasingly, data sets. So to extend my analogy further, and torture it just a little bit harder, these huge conglomerations of content might be considered oceans into which these rivers of content are constantly flowing. Now, ocean-sized aggregations of content have existed to some degree in the form of journal aggregations; article aggregators like EBSCO and ProQuest have created what are functionally oceans of journal content for years, and projects like the Google Books initiative and HathiTrust are now demonstrating the possibility of creating massively aggregated, though never truly comprehensive, oceans of book content. So what I want to do in the few minutes that I've got is, first of all, describe four things that I consider to be controlling realities that have a powerful impact on what our options are in the current environment. Then I want to talk about three general strategies that I think make sense for the foreseeable future, and then I want to talk briefly about being tough. So with all of this in mind, the first reality is that, with extremely rare exceptions, no single institution is capable of digging a local information pond that's deep enough to meet all of the needs of the people that it's charged with serving. Half of all ARL libraries have materials budgets under $10 million.
83% of ARL libraries have materials budgets under $15 million. Few if any of us, and I know a few of you are out there, so try not to look smug when I say this, but few if any of us are getting budget increases or have gotten budget increases in the last couple of years. At the same time, obviously, journal prices, and especially science journal prices, continue to go up at roughly 9 to 10% a year. So what this all means is that not only can we not even come close to keeping up with all of the commercially produced content that comes out every year, but there's also a growing corpus of non-commercially produced content that not only do we not have the capability of collecting, we don't even have the capability of knowing about. The tininess of the fraction of the information out there that could be useful to our researchers that we are able in any way to amass and take care of is just microscopic. The second reality is that no information ocean, no matter how large, for the same reasons, is ever going to contain all of the world's information, and I'm talking now about entities outside of libraries. The third is that while some degree of inter-institutional cooperation and sharing is going to continue to be essential as we move forward into what is shaping up to be a radically different information future, that sharing and cooperation, I think, should involve as little shuttling of physical objects between institutions as it possibly can. I think that our goal when it comes to sharing really ought to be sharing without shipping, and if anybody here is tweeting, I'd really like sharing without shipping to become a meme if at all possible, so if a few of you could just send that out.
And then fourth, given flat or declining collection budgets on the one hand, constantly rising prices on the other, and at the same time an information environment in which it's increasingly possible to buy access in real time based on demonstrated need rather than in advance based on speculation about future need, it seems clear to me that most of us, to a considerable degree, will be moving away from speculation-based collecting and towards patron-driven access models wherever that option becomes available and practical. So the three general strategies that I want to suggest are, first of all, that for real-time information needs our patrons are increasingly well served by access to a constantly flowing river of digital and physical documents. Until recently there was really only a trickling stream of on-demand e-books available, but in the last year or two that stream has grown into a flood, and access to that flood is now available by means of a variety of models offered both by individual publishers and by dedicated book aggregators, and increasingly also by traditional print book jobbers who are now integrating on-demand e-book provision into their traditional approval and firm order services. So that's for real-time access to stuff that people just need right now. For archival permanence and for the creation of a reliable corpus of books, digital oceans such as HathiTrust and the Google Books initiative are, I think, increasingly important and are going to grow in importance. I think that for many, not all, but for many research libraries of the future, access to those kinds of relatively comprehensive aggregations of content is going to functionally replace the locally created collection ponds of the past, which will leave those individual libraries free to focus, I think, on two general collecting strategies.
First of all, patron-driven access to materials that are on the margins of those deep-ocean sorts of aggregations, and second, the creation of better and deeper special collections ponds of the kind that David was describing at Princeton, collections that truly are unique and, to whatever degree is appropriate, locally significant. When I talk about patron-driven acquisition, I'm always at pains to draw a distinction between general circulating collections and special collections. What I see PDA as being tremendously useful for is replacing a lot of what we've done with our general circulating collections, obviously not with special collections. And then third, for those who need, or even who simply prefer, access to print books, in-house print on demand is a clearly emerging option, although it is emerging more slowly for obvious reasons. The Espresso Book Machine, for example, shows that in-house print on demand is, I was going to say functional, it's not quite functional, but it's something that can be done, the technology is there. But it also shows that it needs to be done much, much better, with a substructure of higher-quality metadata and with better and more comprehensive participation by publishers. The promise of this technology is absolutely obvious, and I have a really hard time believing that publishers are not going to become increasingly enthusiastic about it as the potential for backlist sales becomes clear, and in fact the recent announcement that HarperCollins has signed on with EspressNet suggests that publishers are beginning to see those possibilities already. So in the last couple of minutes that I've got, and my apologies to those of you who were at the GWLA deans meeting a couple weeks ago in Albuquerque and heard me say basically this already, these are really amazing and exciting times to be a researcher.
Never, never has so much rich content been so easily available to so many of us. But these are not perfect times for researchers by any means, and they're downright scary times for us as librarians; any of us who are paying attention are seeing many if not most of the assumptions that have guided our work for 50 to 100 years being upended every day. Now, when we get scared there are two natural responses: one of them is to freeze, and the other of course is to run around crazily in circles waving our hands over our heads. Both of those are easy responses, and neither one of them is particularly helpful. They're not helpful to us in our profession, and much more importantly, they're not very helpful to our patrons. So instead I think what we need to do is take advantage of the crisis, to extend that cliche that we've been using for the last three years, because there is a very real opportunity in this particular crisis, and that opportunity lies in the fact that the ground is now very, very soft. The kinds of conversations that Ivy and her crew at CDL are having with major science publishers would not have been possible before 2008, I don't think. And when the ground is soft, what that means is that any number of new models of scholarly publishing, scholarly communication, and research librarianship can emerge. I think that if we're willing to take leadership, if we're willing to take some risks, and if we're willing to actually stop doing some of the things that in the past may have seemed to us as if they were core functions, then I think we can move in some of the directions that I've described, in some pretty exciting ways, while discovering other, and maybe even better, new directions along the way. Thank you. I was given the challenge of using ten New York minutes to try to convey information about what's changing in the area of special collections.
I want to bring a focus from three different perspectives to try to paint a landscape with this real quick fly-by. First, I want to talk a little bit about born-digital work that we've been doing with the Rushdie collection, which has raised some very interesting questions for us around born-digital materials that I think we can extend into this whole area, beginning with: what the heck are you acquiring when you acquire materials such as a computer that's dead because it had Coke spilled on the keyboard, and the author isn't sure whether it can be brought back to life or not? If one is buying a collection and one can't really see it and go through it, it's an interesting question how you value it. Moreover, it's a very interesting and pointed question to begin to talk about what kinds of rights, if you will, get transferred, in terms of looking at things like erasures on the disk. So are we acquiring a disk? Are we acquiring what's now on the disk? That's a very, very interesting question. If you want to, for example, take a look at keystrokes and map keystrokes backwards in terms of the creation process, you're going to have to look at keystrokes that were eliminated, and so forth. And how does one begin to balance that with privacy? So that's one set of issues. The second issue I want to explore very briefly is two projects, Voyages and Origins, which are really kind of organically evolving databases; I want to talk about why they're special collections and the unique problems that they present. And thirdly, I want to take a quick brush stroke through our digital scholarship environment, to talk about collaborative authoring and where that's taking us. So the born-digital material has this very interesting characteristic in that digital materiality really matters. Context in this case really is everything in terms of trying to understand what's in the collection.
So we want to understand exactly what the author was doing, the kind of software that he was using, how that software worked. In order to begin to do that, it's not good enough just to transfer some files from one location to another. You really have to have the machine, or you have to have an image of the machine, and you have to be able to do something with that. Beyond the machine, we need to begin to think about the fact that that machine was very quickly operating in a networked world. And so you begin to think about questions: where does one draw the line as one begins to explore this whole digital realm, when you realize we've got authors communicating via Gmail, or earlier manifestations of electronic mail; Facebook, which claims that the material in it is essentially theirs and not the author's, and so who's standing up for authors' rights there; and even things like Twitter and so forth. So we're taking a very broad definition of the context in which we're trying to look at this creative process. The first decision that was made, in trying to think about research-driven need (and following need come the questions of curation and access), was to really keep the original environment, if you will, to maintain authenticity. We would provide an emulation mode. So we've taken the Rushdie materials and we've brought that up. If I can get a cursor here. What you can see here is an old file directory which he used. If I were to blow that up, you'd find some interesting things there, including banking, articles, and so on and so forth. More interesting, over here (probably difficult to see) is one of his early manuscripts, and what you might be able to just faintly catch is that he was actually lining out text that he wrote; so he was working exactly the same way that he was working on paper, using essentially the same tools, only on the computer. And over time what we can begin to see is, of course, that he realized he didn't need to do that.
His writing took on a different form, and hence this question of trying to understand the creative process by mapping keystrokes becomes a very, very interesting one. So over here is kind of a mock-up. The author's wishes in this case were that we could take this material and make it available in the library in the same way that we would with something that was analog, or paper. So we have one copy, and we have one workstation presenting Rushdie's material. If you want to see this, you actually have to physically come to Emory, and I realize that may seem odd in terms of a digital collection, but we have to work through these issues slowly over time to really build an understanding of where this goes. So we've got emulation mode here, research and browse capability, and so forth. As we begin to look ahead, then, where does that go? We want to understand not just his creative process; we also want to understand how the scholarly community is beginning to interact with this work. And so we've got material here where we're showing his environment, and then sort of gluing on top of that some tools that will allow us to essentially collect and archive web annotations, so we can begin to look at those annotations over time, for those who choose to annotate, and we'll classify them in categories (other authors, casual observers, whatever). Again, we think that's a resource that will be interesting to mark and follow and add to this collection over time. So one view here is a digital collection with some different kinds of problems, but in some ways they're fairly straightforward.
A second sort of view into where special collections are at is our Voyages project, a collaboration initially with a faculty member at the university who had been working for decades on trying to build a database of information. Its first manifestation was a CD-ROM; eventually it was put on the web, and today it has become the canonical database chronicling five centuries of the transatlantic slave trade: about 37,000 voyages, and the names of about 64,000 individuals who were part of that trade. And then we try to really look at the information that we have around that, to understand what was going on in that period of time: when the ship started, how many were on board, when the ship arrived at a destination, how many disembarked, how many fatalities or mortalities were involved, and so forth. So there's a lot of statistical data being built over time, literally over decades, and this has been a project involving about 50 people on three continents, housed in the library, and a wonderful kind of partnership with scholars, information experts, librarians, and so forth. It's dynamic in the sense that, with the follow-on project, Origins, we continue to add information to the database all the time. As scholars in various parts of the world uncover things and find materials, they're able to add, for example, more names to this registry. It goes through, as you might expect, a peer review process, a vetting process, and then the database is rebuilt, if you will, a new build periodically. We've got interactive timelines, we have data modeling with the data in here, and so this begins to raise interesting questions in terms of this primary data: how does it get cited? What version are we looking at, given that it's evolving very, very quickly over time?

A third area I want to highlight very quickly is our Digital Scholarship Commons. This is really kind of a sandbox, in some ways an incubation center or laboratory, to try to evolve questions around sustaining e-research for the humanities, and we're really trying to experiment with a couple of different elements, if you will, in this laboratory. One of those certainly is how scholarship gets done with digital humanities: how can we add value to that, and what are the organizational models that ought to work? We don't think this is a library space; we actually think it's a university space, with multiple players, and so how do those relationships work? Perhaps most complex and most important is not just how this work gets created and ultimately preserved, but how we sustain the effort; for me that's not the preservation question, it's really some of the financial and expertise questions, as we try to scale over time. This effort has been generously funded by the Andrew W. Mellon Foundation, and it's one we're very excited about because it's brought new people to the table, and some different ways of thinking about special collections, in terms of what comes through this process for the long term versus what was an interesting experiment and gets thrown away.

So on the near-term horizon, a couple of questions. First, I think we're going to look back on the Rushdie born-digital collection as the good old days: it was pretty straightforward; we've got four computers, we're going to acquire two more from him fairly soon, and there it is. By contrast, as we really get into this whole question of networked and distributed collections and where that takes us, it's a much more complex environment. Secondly, as I've already alluded to, there's this question of citing special collections, or primary material, special or not, that is really dynamic and evolving over time: how are we going to do that in a consistent way, not in a single institution but, if you will, across domains throughout the academy? Thirdly, how do we aggregate these collections over the network in ways where we have parts of them and we really want to connect and weave them together in new and interesting ways? And fourthly, the question of sustainability; in many ways I think that's at the heart of where we might or might not be able to go. The three examples I've given you, which are just a quick brush stroke across a variety of efforts going on, have been funded, apart from some NEH money for the Origins project and apart from the Mellon monies, essentially with repurposed money inside the library budget. That's wonderful for getting something off the ground, but by itself it can't scale indefinitely, and so therein lies the challenge. Thank you.