Good afternoon, everyone. I'm Deanna Marcum. I'm beyond established. I'm now fading, I think. So we all have our little order, I guess. I'm delighted to have you here for this session entitled Constructing Digital Research Collections. A few of us in this room have been working on the notion of a digital library for longer than we care to think about. And we've seen many manifestations of digital collections trying to become digital libraries. A decade ago, many of our libraries were working on digitizing special collections, the beautiful, the rare, and they were almost like publications for the website. And we've seen a real transition to collaborative work that will result in an honest-to-goodness digital library. So we have three speakers today to talk about some of the initiatives that are underway that involve collaboration and rethinking what we are doing. We are running a little behind time, so I'm just going to introduce the speakers by name. In the program, there's a very nice biographical statement for each of them, but I think we'd rather hear about your future thoughts than your past, if that's okay. So I'm very pleased to introduce our three speakers. Chuck Henry, Charles Henry, is the President of the Council on Library and Information Resources, formerly at Rice University and Columbia before that. Ed Van Gemert is the Deputy Director and the Associate Director for Public Services at the University of Wisconsin-Madison, and Brewster Kahle is Digital Librarian and Founder of the Internet Archive. What I've asked these speakers to do is for each of them to talk about one of the initiatives that's now underway. I'm asking them to describe what it is, what it's all about, who's involved, and how it affects or can affect ARL libraries. And do you see these as important initiatives for true collaboration that leads to the kinds of results we're all looking for, or do you have something else in mind that you think would work better?
So with that, I'm going to ask them to speak. We will go in the order that you see them at the table, beginning with Chuck Henry.
Thank you, Deanna. It's a great pleasure to be here, as always. I get to talk about the Digital Public Library of America, and as Deanna said, large-scale digital libraries have been under discussion for decades. This one is probably the most recent. It's been around for about a year, and it has garnered a great deal of attention and interest for something that's relatively short-lived. Those of you who have some familiarity with this project know that it is susceptible to many interpretations, and I will just give you my take on it this afternoon. And then we can talk more, we can triangulate with some of the other steering committee members to see where this may go. Every term, almost every term in the title Digital Public Library of America has been contested, and it goes on and on. Maybe not "of"; we haven't seen much email on "of" yet, but there's time. And to give you an example, I think that the interest and the concern and the potential threat of this is legitimate, something to keep in mind as we go forward. The term public has probably received more email exchange and more consternation than the other words here. Many, if not all, the state public libraries and many, many city and town public libraries were very concerned from the beginning about the use of the term public. And their concern focused mostly on the possibility that, if this took off, if this really was built, if we did build this very large library of millions and millions of digital objects and called it public, funding for their libraries would decrease, that funding would go towards this and not to them. And I raise that because there may be some truth in this. And always when a project gets going with this amount of funding at this scale and involves so many different constituencies, I think these issues are going to be raised. It's healthy to raise them.
And again, something to keep in mind. I'm going to be fairly brief and just give you a kind of chronological story, I guess a narrative, of the Digital Public Library with a few editorial comments as we go along. They are mine. They don't necessarily represent the steering committee of the project. The Digital Public Library of America got started a little over a year ago. And it was largely the initiative of Robert Darnton, who was the librarian at Harvard at the time. I think many of you know Bob and know his work. He's a quite distinguished 18th century French historian. And he was concerned, and here comes the editorial part. I think Bob and some others were concerned about Google and Google Books. And their concerns had to do with an industry, a corporation that was digitizing voraciously our cultural heritage. And at the same time, there was a lot of uncertainty about the access to this cultural heritage. There was uncertainty about what this might cost, how we were going to use this, were we going to be charged per page. There was also concern, I think, about the quality of the scans and the difficulty often in trying to determine the provenance of some of these objects. So Bob and others got together and felt that perhaps if we banded together, we being the libraries, archives, museums, academic institutions and scholars, could we in fact put together what might be called or seen as something counter to Google? That we could build a digital public library that was more encompassing, that was open to the public, that was a true national good, and it would be a kind of aggregation of our cultural heritage in all the different kinds of media, text and images, moving images and such. So the concept, I think, was interesting. There was a meeting in Cambridge and the project began. The management of this project was turned over to the Berkman Center for Internet and Society, and Berkman still manages this.
The Sloan Foundation put up money, considerable money, millions of dollars in this case, for this project, basically for project planning. So it has a healthy chest, it has a healthy revenue base or reserve at this point to move forward. Sloan funded this for 18 months. So we're about a year into the game, and I imagine it's going to take another year, possibly more, in the planning process. So the funding was there. The statement that came out of the meeting in Cambridge a year ago, almost to the day, was that the Digital Public Library of America, or whatever it eventually is going to be called, would be an open distributed network of comprehensive online resources drawn from the nation's cultural heritage, and its main intent would be to educate, inform and empower current and future generations. And I think that's a very lofty goal, and I think it still holds true. I think that rhetoric is still very much alive in the DPLA. After the meeting in October, the steering committee was formed. Deanna is on it, Brewster and I are on it, and a number of other ARL directors are also involved. And starting around last December, seven or eight working groups, they call them work streams, were put together. And I'll mention them briefly. A lot of this information, if you really want to dig down into the DPLA, is on its website, which is really good. More information than you probably care to delve into, but it's there. And some of it's actually quite interesting. Some of these work streams are as follows: audience and participation. And now the work streams are made up of anywhere from probably about 10 to 15 people drawn from libraries, drawn from archives; scholars are involved, museum people, some funding agencies participate. So it's a very interesting mix from our various communities and constituencies. So audience and participation, that work stream, they're asking questions as the Digital Public Library of America begins to take shape. Who is being served?
What's the audience? And the thinking right now is that it has to be as broadly conceived as possible. It has to have multiple constituencies. Well, okay, who are they? What benefits accrue to these communities? This work stream will also look at how success will be measured. Now, if the DPLA is serving its mission, how do you measure that? How do you assess its efficacy? What are the methods of outreach? How do we reach out to more communities? How do we reach out to our own community to talk about this? So that group is focusing on these questions. Again, these groups are dealing with questions that are logical and they're pertinent to everything that we do here in this room on a national scale. Another work stream is content and scope. This is basically formulating a collection development policy for the DPLA. It's also looking at metadata issues and interoperability and questioning the term, what is critical mass? If this is going to work, how big does it have to be to work? And again, good questions. Financial and business model: that work stream is self-explanatory, looking at sustainable business plans for this venture. Governance, another timely, thorny issue. How will the DPLA be managed? What will be the rules, the policies of its governance? Will there be a system of self-monitoring? Will there be the ability to modify the rules, and who will be represented in the governance, through the governance? Legal issues: Pam Samuelson is leading this up. This is, again, obviously how to best approach and influence the legal and copyright environment in which we are currently mired, I suppose, and muddled. And the Sloan Foundation has put up a special fund for this. So there's a good amount of money that's going into this, to the study of the legal issues involved. Technical aspects: again, looking at what kind of digital architecture would be most appropriate, that would conform to and advance the mission of the DPLA.
So these groups were put together, and there are a few more. And there was another meeting in Cambridge in February, which the steering committee and many others attended. And that meeting discussed, again, all these issues that had come up: the work streams were discussed, the mission statement was discussed, the name of the project was discussed. And I think it would be fair to say, but again this is my take, that at the end of that meeting a group of us got together and felt that another year or more of discussion and planning was perhaps a bit too vague and too long. And out of that, we felt it was important to begin to build this, or at least begin to build aspects, or begin to test some of the assumptions through building components of it. In other words, keep talking, but get off the dime and start putting something up. And I think this acknowledgement, and I would say a little bit of frustration, was elegantly conveyed into what was called the Beta Sprint, and some of you may have heard about that. This is essentially, it's not a contest so much, it's just sort of an open call for innovative proposals that could help build, and build more rapidly, the Digital Public Library of America. And these proposals, when the prospectus was put out, could be ideas, they could be models, they could be prototypes, technical tools, interfaces, all the kinds of aspects of building a very large scale digital library. The idea was to more or less deconstruct it, look at its components and throw some proposals out that might help all of us think about the complexity of this, and also models that might be used in building it. About 45 proposals were received, a very distinguished panel reviewed them in September, and nine of these proposals will be presented next week. The national rollout of the Digital Public Library is in Washington next Friday; it's an all-day event, and it's being sponsored by the National Archives.
There will be a series of speakers, and I think the idea of this, the concept of it will be, I hope, much more sharply clarified through this national meeting. And the projects that will be presented will also give you a much better idea of where people are coming at this from. Briefly, I'll talk about these projects and then conclude. Among the projects, there is the coordination of what's called the American National Digital Collections, and that's a proposal that's from the Library of Congress, the National Archives and the Smithsonian. And they'll talk about how their digital collections can be made more coherent, correlated, and more easily accessed and queried. There's also a proposal looking at interoperable metadata for the DPLA. There's a proposal called ShelfLife and LibraryCloud; it's looking at combining content exploration with a social component, looking at content and also then connecting with other people who are looking at similar content, sort of building communities, virtual communities around this. My organization is making a presentation as well, CLIR, through the Digital Library Federation, which, as you all know, is now back in CLIR. And we have built a prototype that's based on an IMLS-funded project, the Digital Collections and Content Registry, also called Opening History. And that's several million digital objects, and what the team has done is to bring them together. These are multimedia objects as well; they're text and images and moving images, and they sort of cut across American cultural heritage. And the team's been working on how to make this more coherent and usable as a possible prototype. This project is based on Europeana, which many of you are probably familiar with. And I raise that because, again a bit of editorial comment here, I think the inclination of this project is to look at models like Europeana as an example.
And that model would be that the DPLA, again this could change in a week, but I don't think so, would be much more of a federation of existing and future digital objects. It wouldn't own anything. The institutions that currently have large digital libraries and institutions that can contribute over time will do so. The job, the contribution of the DPLA would be to build, to set standards, metadata standards, interoperability standards to help aggregate and federate these millions upon millions of digital objects, as well as to create APIs to use this material more efficiently and effectively. And that's the model that it seems to me makes the most sense, but again we can certainly discuss that. So that's what's going on and that's the way this project has been attacked and managed. And I think it's been actually managed very well. Just some concluding remarks to speak to some of Deanna's questions. This is not a library per se. The term library is somewhat misleading here. I don't think this project is out to build a new library. It's out to make visible and reveal this incredible array of assets that we have already that are very difficult to access because of the siloed nature of our world. And also I think to look forward to develop the platform, the architecture and the standards in order for people to contribute to this over time. In that sense, if it's done well, it's not going to compete. It's not a competition, but it is a service to all of us or should be. I honestly think that it is a national good or could be and a very exciting one. It's already attracted, and this alone it seems to me is worth our attention, a really strong pool of talent. Those proposals are quite good and a lot of the top people are involved with this now. And I think that in itself is noteworthy. It also clearly has the interest of significant funding and funding agencies.
So there's a will here and there may be a way to build this at least financially over time. In closing, this project is nascent, obviously. It's contested at every turn. It's actually building quickly but a bit intermittently and somewhat ad hocly. That's okay, but I really do think that this is a project to watch. And I think that particularly for ARL it has enormous significance over time. My sense of this again, and concluding here, is that there are a lot of constituencies that could be represented here, and many, many constituencies who contribute small amounts of digital library material or vast amounts of digital library material. If this is going to be done well, it has to be a project and a program that can respond to the most disparate and varied kinds of questions. And it's the user that has to drive this. And my sense is that if we do this well we would have hundreds of millions of objects that could be efficiently searched, that could allow for new questions to be asked, that could allow new methodologies to come into existence, that could create new kinds of scholarly models of communication, and a project that could be of as fundamental importance to our senior scholars as to a curious 10-year-old who's interested in the Middle Ages. So I'll stop there.
Good afternoon, everyone. I should probably also just disclose that in addition to my work at Wisconsin, I'm also the Chair of the Strategic Advisory Board for the Hathi Trust and sit ex officio on the Executive Committee as well. So you're aware of that. I'm really pleased to be speaking with Chuck and Brewster. I do want to echo a couple of things that Chuck said too about this not being a competition in a sense.
When friends of mine asked, how on earth are you going to create space up here with Chuck and Brewster for Hathi Trust, that was my pretty quick conclusion: that although there are limited resources today, there's a lot of work here in this space, and I think that one of the main issues is how we think clearly and coordinate our efforts, regardless of what's gone on in the past. So I've got a couple of slides here in an attempt to keep me on track in terms of trying to address some of the questions that Deanna posed to us. I've been here since last Friday, so I'm beginning to feel like an honorary Washingtonian, and I am fully aware that a number of you, about 60 in fact, were at the Constitutional Convention for the Hathi Trust earlier in the week. So I'm going to try my best not to bore the heck out of you with repeating a lot of that information, but I'm also aware that a number of you weren't at that convention and there might be questions or information you'd like to know about Hathi Trust. So I'm going to go over some of it at a fairly high level, and my remarks are intended to try to generate some discussion about larger, broader issues of collaboration that we could try to address across organizations. Hathi Trust is three years old and it was and is all about preservation of digital content. What to do with all this Google stuff was initially on our minds, but it has become since then also an avenue for access and an avenue for us to be thinking about our print collections as well as other collective decisions that we're able to be thinking about for planning our libraries. It is becoming a comprehensive collection, and maybe most important in terms of a value or a principle is the notion of moving ahead with a shared vision. As one example, the whole area of rights clearance is becoming of more interest to the membership.
Not only with University Presses, for example, but with our own scholarly authors on campus and elsewhere. Also the trademark, I think, of openness and transparency is one that Hathi Trust governance and members try to achieve. For example, we're talking now about the possibility of using Creative Commons licenses for the development of the various programming that we're working on. Certainly one of the accomplishments is the membership, I think, over the course of three years. We're now at 60 plus. Pretty remarkable when you stop to think about it, I think. I love this slide. It's not mine. I stole it from John Wilkin. And basically it shows a direction in terms of Hathi Trust thinking. So at the beginning we were all about how much is this going to cost, by the drink, for us to pay for storage and preservation of our digital content. We're one of those institutions at Wisconsin where, when we costed out how much it would cost locally in Madison to store the material, it turned out to be three times more expensive for me to store it locally than through the Hathi Trust. And so we are moving from that sort of notion of paying by the drink to a broader concept, thinking more as a shared digital repository, thinking about how we can apply the digital content more broadly to how we plan our libraries, and moving a cost model toward that rather than just paying for our own preservation of the material. So on average, basically what this is saying, for example for Wisconsin, is that a little bit under 40% of our holdings, things that we at Wisconsin held or hold, are represented in the Hathi Trust digital repository. Pretty remarkable. The growth is somewhat astounding. Just a little bit under 10 million. Last night it was at 9.7 million when I updated the slide. If you look today it would be something more than that. The public domain material, at a little bit under 2.7 million, has been predictably and reliably around 27% for quite a while.
Interesting for me is that over the course of time there have been fewer than 20 requests for takedown by rights holders, over a digital repository of 10 million books, whereas conversely more than 5,000 requests have been made to open up material by rights holders. We all know that libraries exist in a changing landscape in our universities. We're in many cases a microcosm of what's happening in our universities. We're seeing at every level more and more collective decisions being made. At Wisconsin when we talk about digital preservation, when we talk about programming, when we talk about technology today, we always bring Hathi Trust into the conversation. To put it simply, we can't afford to do separately the work that could be done collaboratively. So, I've already mentioned a little bit about the share and the cost, but I'd like to say just a few words about the governance of Hathi Trust, which is interesting to say the least. It has an executive committee, and the executive committee was initially appointed from the CIC and the University of California and also from the University of Michigan and Indiana University, the two founding members. And we have just concluded our constitutional convention, where we're moving into what we think of as phase two in terms of leadership and governance for Hathi Trust. In my estimation one of the major challenges will be to maintain a very high level of leadership for the organization, a very high level of agility and the ability to focus, which are not always in the same sentence. So partners want that balance: those that contribute a lot of resources and dollars want to be able to have more of a say in the direction of the organization, and sustaining members, who join to use public domain content because they believe in the operation, also want a say in the direction of the organization.
I won't go through each and every one of these what I call practical rollouts, but I think they give you a sense as to some of the values and some of the efforts and the principles that we've been focused on. In March of 2011, CRL certified Hathi Trust as one of only two digital repositories, and secondly, something is occurring that I think is of importance. There's a lot of discussion about the quality of digital files regardless of the repository, and coincidentally, Paul Conway at Michigan is using Hathi Trust as a basis for his IMLS study of quality in digital files. This is important because it will establish a benchmark; it begins to measure the quality of digital files. So what's the scope of the problem? Is it a problem? And so if, for example, there's a small percentage of items in the repository that are not of high quality, what would it take, what would it cost to fix those, and how could we go about doing that? So it'll be good to have that measurement. And I mentioned a little bit about permissions before. The whole area of access to in- and out-of-copyright material for patrons with print disabilities is an issue that we've been working on for some time and are ready to roll out. Michigan already has it in play. And probably the most controversial piece that we're working on, and I've bolded lawful uses because I believe that we only have lawful uses of in-copyright material, is section 108 uses with replacement copies and also access to orphan works, which we believe strongly in and intend to continue to move forward on.
So one of the things that emerged from the Constitutional Convention was this notion that we may not all have to move forward exactly in step. It's certainly possible, for example in working with university presses, that there may be two or three institutions who might want to take the lead in that area and work with presses to open up their material in the repository, or in some of these other areas, and that that work might influence the organization as a whole and others may join in. Some of the other moving-forward issues that I hope represent a little bit broader collaboration possibilities would be the desire to increase the size of the repository. Even though it's at a little bit under 10 million, we think that with some effort we could get to 15 million with the inclusion of some of the North American collections that still remain out there, and also keep in mind we've got a number of Canadian participating libraries as well. The whole issue of international membership for the Hathi Trust is important, but it's probably more complicated than introducing North American libraries into the organization. There are lots of other copyright issues, and there are lots of other repositories too, of course, around the world. I think that the international membership for Hathi Trust is going to have to take a separate focus and a more strategic focus.
I was talking to the chair of our classics department on campus a couple of weeks ago, and keep in mind this is Middle Earth for the humanities, and I was telling her that the library would no longer be able to maintain a locked and keyed Greek and Latin reading room, and that in addition to that, the volumes in the current Greek and Latin reading room are likely candidates for storage. I was expecting all hell to break loose, and she looked at me and said, that's no problem. I almost fell over dead. And she said, most of the core material in classics from the late 19th and early 20th centuries is available to me, to us, in the Hathi Trust and through Google Books, so we can take this decision in stride. It's going to be a good week. But one of the areas that we really need to focus on more generally is the whole educational piece, because I think she might be an exception in terms of our faculty's understanding of the value of Hathi Trust. My observation is that many of our selectors and bibliographers probably do the best job in terms of outreach to our faculty and scholars as to why they would want to be using the collections and how to use the collections. A third area is, of course, what are we doing about non-book material and what are we doing about special collections material? That's the stuff that Google didn't digitize. I don't know the exact number, but Google digitized about 20 million volumes, which is an astounding accomplishment, but that was pretty much kind of easy stuff, if I could say that. I mean, a huge contribution, and we wouldn't be talking about the Hathi Trust and other organizations like it without Google's work, but image collections, sound, video, it takes a different level of focus and a lot more resources to be able to be thinking about those. Newspapers would be a perfect example on our campuses.
So, we like to say that with the Hathi Trust we're developing an organization that's a part of us rather than apart from us, and it is not perfect by any stretch. I mentioned a little bit about the leadership. Watching the Constitutional Convention roll out last week was really an interesting process to be a part of. It really was like a convention, if you will, or a Constitutional Convention, that is. And I mentioned before about the key for leadership moving forward. But we can think of organizations that we belong to that have become distant from us, that are apart from us, where the costs get away from us, and Hathi Trust is really entirely about the membership. And really, I think, a case study for unfolding large-scale collaboration, being able to accomplish initiatives at a pipe-dream level that we would never be able to accomplish on our own. The trick is, however, to be able to balance the values and the costs and the risks and to stay open-minded about moving forward. So, I think I'll conclude at that point then.
I'm Brewster Kahle, Internet Archive, and I wrote desperately to Deanna, please don't leave us. I wrote it on a slip of paper and slid it over to her, and she wrote back, never. So, I'm heartened. Deanna, I'm going to keep this. It's an honor to be here. I've only been to one other ARL meeting. I remember asking Duane, when he was president of ARL, or head of ARL, could the Internet Archive join. And he looked a little ashen at the time and said, oh, don't do that to me. But I think the Internet Archive has matured along, and so I thought I'd catch up a little bit on sort of where we've gotten to and how we've gotten there, but really hit this particular question because I think we are at a crossroads on a very important one. And we will decide which way we go based on how you do your budgets. And if you continue doing the budgets the way you're doing them now, the course is clear.
So I'm going to try to open a question here that I hope that you take seriously. And it's buy or rent. And really, it comes down to how different are books and e-books going forward? Because things have changed. We're not in a mainframe era anymore. We're in an era of the Internet, and we're in an era where there's a great deal of storage and capacity in our computers in a distributed environment. That really wasn't with us 10, 15 years ago when we got our chops. So there are, I think, some differences. But okay, who am I? Where are we now? The Internet Archive. It's a 501(c)(3) nonprofit library independent of government and universities. It's one of the top 250 websites on the Internet. We get about 2 million users a day. About 10 million books get downloaded a month. So it's a place on the Internet that is quite heavily frequented, and we're a registered library in California. We've been building collections of different sorts. Our newest collection is physical books. So we've worked to try to figure out how to store physical books effectively. And we do things in the same way. We try to basically use ourselves as a guinea pig but then offer our experiences. We try to knock a zero off of the cost of storing books. We want to make it so that if you don't want to deaccession, you don't have to for cost reasons. So if you are going to deaccession, we'll show up with a happy bunch of people and we'll pay for the shipping and take them away and credit you or not credit you, if you're interested. And we will try to keep one copy of every book that we collect alive forever. But really what we would like is to offer you technologies so that you can go and store even more densely, less accessibly, but at least they're not pulped, to go and store away millions of books. So these are modified shipping containers that are plumbed for temperature and humidity control.
They've got individual environments so that if one has problems it doesn't spread out; you can store movies in one and books in another and records in another and other types of things. Anyway, the offer is open. We would like your books. If you're going to throw them away, really what we would like you to do is consider not throwing out your books. And it is very inexpensive to store millions of books. We have now 350,000 that are cataloged and we've got another couple hundred thousand to catalog, and we're starting to get good at this. So we think we're beyond the prototype stage. Moving images: we've got about 500,000 moving images, and we've now gotten quite good at moving them forward into different formats and trying to get them distributed, dealing with the rights issues, which are thorny but doable. A lot of them are user contributed, but there are also digitized moving images and lectures and the like. We have about one million audio recordings. These are very popular out there on the net. Anything from news and public affairs types of things to lectures to recorded music. Again, all these are the publicly accessible ones, and we keep the ones that are commercially available offline for all the normal rights issues. We've been collecting a lot of television, and we recommend it. It's an important cultural aspect that's underappreciated in our libraries. We now have about two million hours of television. We've been collecting television for ten years, 20 channels, 24 hours a day: Iraqi television, Chinese, Russian, Japanese, BBC, CNN, ABC, Fox, basically a good swath, and we've now really turned up the collecting into Africa, trying to make sure that we've got key channels in every country, which brings it beyond the original 20. Again, even a small organization like ours can take on something of that scale. About two weeks ago we hit three million electronic texts that are publicly accessible.
Probably two million of them are what you'd call books, but there are lots of other types of materials. These mostly came from libraries like yours. And the cost of our building this, the money that went through us, mostly for the digitization, was about $50 million over the last eight years. Thank you very much. It's basically those libraries that are signing up to have no restrictions on the public domain, those that basically want to make sure that anybody can have access to these. These are the three million that are publicly accessible, and then I'll go on about in-copyright and some of the things that we're doing there. The idea is not to show how great we are (gosh, we think we are pretty darn terrific), but it's trying to show technologies and techniques so that you can do these as well. I think one of the big things that has changed is that the storage technology is quite a bit easier. I remember visiting OCLC back, gosh, it was forever ago, and I asked just how big the database was, and if I remember correctly, when you calculated it, it came down to about 17 gigabytes. That sort of sounds like a thumb drive to you now. It is. This is one of these new four-terabyte hard drives, and if I'd finished loading it up with books it would have 150,000 scanned books: searchable, downloadable, browsable books. You'd probably put it into a Macintosh and it takes a little while, but then you have full-text indexing on your desktop. So the idea of 150,000 books being like this: I'm not trying to make it look cheap, I'm just trying to make it look approachable. But the change has happened from the era of mainframes, and requiring that, to having it so that you can actually think about having something that kind of looks like a book. And if you had six of these you'd have a million books. If you had 60 of these you'd have 10 million books, which is kind of the number of a big library.
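As a rough check on that drive arithmetic, here is a minimal Python sketch. The only inputs taken from the talk are the 150,000 books per drive and the four-terabyte drive size; the function names are illustrative, and decimal units (1 TB = 1,000,000 MB) are an assumption. It suggests the "six" and "60" quoted above are round numbers, with the exact counts coming out slightly higher.

```python
import math

# Figures quoted in the talk for one hard drive.
BOOKS_PER_DRIVE = 150_000
DRIVE_CAPACITY_TB = 4

def drives_needed(book_count: int, books_per_drive: int = BOOKS_PER_DRIVE) -> int:
    """Whole drives required to hold book_count scanned books."""
    return math.ceil(book_count / books_per_drive)

def average_book_size_mb(capacity_tb: float = DRIVE_CAPACITY_TB,
                         books_per_drive: int = BOOKS_PER_DRIVE) -> float:
    """Implied average size of one scanned book, assuming 1 TB = 1,000,000 MB."""
    return capacity_tb * 1_000_000 / books_per_drive

if __name__ == "__main__":
    print(drives_needed(1_000_000))    # drives for a million books
    print(drives_needed(10_000_000))   # drives for a ten-million-book library
    print(round(average_book_size_mb(), 1))  # implied MB per book
```

Under these assumptions a million books needs seven drives rather than six, and ten million needs 67 rather than 60, which does not change the point that the whole collection fits on a shelf of disks.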
So that's sort of a Yale or a Princeton or a Boston Public Library, so we use 10 million as the number that we're shooting for. And we have lots and lots and lots of web pages. We got our start collecting the web, and we have a lot of it, and we're trying to keep up with it. But the real point of this is to separate buy and rent and to try to give an idea of how we might be able to proceed. I don't mean to be placating here, but let me hit a couple of points because I think they're important. You buy it once and then you own it. You can reformat it for new uses, so we're digitizing our collections and reusing them in digital form. In the movies collection, we've reformatted them six times, so reformatting is important to be able to do, and you can if you own it. We can organize and present it ourselves, and we can do distributed preservation. Frankly, this is the library system that I thought I was joining 25 years ago when I briefly attended library school back in Boston. Then there's renting or licensing. Well, you pay for it every year or you lose access. It really has you over a barrel. There's external control of the formatting. Some say that's a blessing. Sometimes it's a curse. Usually it's both. There's external control of presentation and organization. This is one of the key things that I thought we as librarians were trained to do. But basically it's really outsourcing it. And things might be preserved. It's sort of these guarantees that somebody else will have it. But what about the aspects? I've been thinking about exactly what happens when we build up these organizations that we license from, whether they're for-profit or nonprofit. These integrated services usually are bundled with the content itself. So the services come with the content. If the services were unbundled, so you could have the content and put different services on it, or there could be competing services, actually I'd be really happy with this.
But usually it comes bundled, and there's only one organization or company that you can go to. And bundling, say with journals, becomes common. So again, it's sort of the turn of the screw that most of you have probably had to deal with when contracts have hit your tables. And then there are these price increases. It's not really a liquid market. It's not really like you can go and say, well, this bunch of medicine journals is better than this one, or equal, or whatever. So this licensing tends towards monopoly. I would suggest that basically we are building monopolies when we engage in building these licensed organizations. And I'll put some of our friends on here: LexisNexis and Westlaw, OCLC, Elsevier, JSTOR. I know, you know, I'm going to go hard here. But I think it's important to notice what happens as this goes through. There's HathiTrust, and again we don't quite know what DPLA is going to be. But with the idea of having these centralized resources that we license, we end up with few publishers, few services, and one view. When we rent, I suggest that we're going to be reducing many of our library services. The things that we went to library school to learn are really going to be replaced by being basically customer service departments for other people's services. This isn't good enough. We are the big libraries of the richest country in the world. We can do better than this. Is there another way? I would say yes. And the technology has come around a corner that allows us to think quite differently. And that, I think, is a great advantage. What do we want? What I'd suggest is we want many of many things. We want no central points of control. We want many publishers, many booksellers, many libraries, many authors that get paid. And everyone can be a reader. Not just the people that have sort of crafted themselves within a license restriction on our campuses, but anyone can be a reader. How do we get there?
I suggest we can get there, and we are getting there. But we have to work together, and again, with our wallets we really decide our future. Let's buy the e-books that we can outright. What does it mean to buy things? It means the same thing as it always has. If you have 100 copies, only 100 people can be reading it at any one time. But you own it. You have it forever. So we pay for it, we own it. And publishers, some publishers, not the core publishers but fringe publishers, are selling us e-books. And it's working. We can digitize our older books into an e-book format, and this is working out very well. So we've digitized a couple hundred thousand current, modern books. And of those, we make everything available to the blind and dyslexic, and have for a couple of years. But we take basically the 20th century, and we've been lending it one person at a time. We've been using the same technologies that the publishers use to protect their in-print works to protect these basically out-of-print works. And we distribute using open platforms like web browsers, as opposed to going for bundled devices such as Kindles. So we really tend towards open platform delivery. Those are the ingredients. So how is it going? It's going fairly well. The technology for serving e-books is about as complicated as serving your library catalog. Sorry for the typo. Serving is now as difficult as running a library catalog. Books aren't that big. They're just not that big in terms of what digital... And if you're maintaining a catalog, you might as well maintain the actual collection itself. So how are we doing this? Well, we got good at digitizing books thanks to you guys, thanks to the Sloan Foundation, thanks to the Microsoft Corporation. And we've gotten it down to ten cents a page. We're digitizing in 27 locations in six countries. We're digitizing over a thousand books a day. And it's going along very smoothly and very well.
So this whole flow in terms of how to do this, and this includes all of the optical character recognition, putting it into about 15 different formats, storing it on two continents, all of that stuff: about ten cents a page, or $30 a book. So about $30 a book is what it costs to digitize a book at the level of quality that you can see, which we're pretty proud of. So where are we? Well, if it's a 10 million book library that we're trying to build (I'm using rougher numbers than your 27 percent), about 20 percent is out of copyright, about 2 million. 7 million are in copyright but out of print. And 1 million are in print. We think that the public domain should be free, really honest-to-God free. None of this 'oh, it's free, but Google says we can't do anything with it' free. None of that. Free. You can download all of it and go nuts. And we're now at 2 million books that have been posted on the Internet Archive. So I'd say in many ways, check, we're doing pretty well on that front. So over this period of time, of 8 years, we in the open world, I think, have gotten a large part of the way there. We're a bunch of the way through the out-of-print but still have quite a bit to go, and we're starting to buy books from publishers on our terms. Old style. Buying them. And at least fringe publishers are going for it, if not the core. Yeah. What do you do with it? Lending. What we're finding is lending is a good model. It doesn't cause people rancor. It doesn't cause people to have endless negotiations with lawyers. It's kind of what we have always done. We bought things, we lent them out. And there are no lawsuits. Maybe one will pop up tomorrow. But so we've been doing it now for about two years. There are now about a thousand libraries that have contributed in-copyright books to be digitized and offered under their name, a thousand from six countries, and people are borrowing them all over the place, in an in-library lending program as well as public library.
So you have to be either inside a library or on campus, or not: we got money from the stimulus program to digitize a hundred thousand books, so all of that is available to anybody who wants to sign up, for free. I'm just going to flip through how it works. These are the thousand libraries in eight countries. If you're in one of the Boston public libraries and you go to the Open Library site, you can see that you can get this particular book. Say I want to borrow it. This is a new book, You Look Too Young to Be a Mom. Say, okay, I want to borrow that. This book says it's been checked out. Somebody else has it, so you can't have it, and since we only own one physical copy, we're only lending out one copy of this book at a particular time. This is another book that we bought as digital from this particular publisher. You can download the PDF and see it in Adobe Digital Editions, and people all over the world are doing it now. It says when the loan expires, and after two weeks it automatically goes away unless you return it explicitly. Here's a Mayflower ancestors book that's being borrowed from the Boston Public Library, and it says that it's from the Boston Public Library. You say you want to read it in a browser, thanks to the Boston Public Library, from 1946, and now you're reading in a browser on things like iPads or iPhones or whatever it is you want to be reading on. So the idea of this lending program for dealing with the out-of-print is going along very well, and it hasn't run into some of the problems that others have, or at least not yet. So who has joined? We're now over a thousand libraries. University of Toronto, Alberta, Florida are all in. We now have all the public libraries in Kansas, California and North Carolina, and we're turning those on over time, and the Colorado Public Library Consortium and a bunch of other libraries around the world.
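The lending rule described in this walkthrough (one loan at a time per owned copy, a two-week loan that lapses on its own unless returned early) can be sketched in a few lines of Python. This is an illustration only, not the Open Library implementation: the class and method names are hypothetical, and the real service relies on the publishers' protection technology, which is not modeled here.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List, Optional

LOAN_PERIOD = timedelta(days=14)  # the two-week loan mentioned in the talk

@dataclass
class Loan:
    patron: str
    due: datetime

@dataclass
class LendingDesk:
    """Hypothetical model: never more active loans than copies owned."""
    copies_owned: Dict[str, int]
    loans: Dict[str, List[Loan]] = field(default_factory=dict)

    def _drop_expired(self, book: str, now: datetime) -> None:
        # Loans lapse automatically once their due time has passed.
        self.loans[book] = [l for l in self.loans.get(book, []) if l.due > now]

    def borrow(self, book: str, patron: str, now: datetime) -> Optional[Loan]:
        """Return a Loan, or None if every owned copy is checked out."""
        self._drop_expired(book, now)
        if len(self.loans[book]) >= self.copies_owned.get(book, 0):
            return None
        loan = Loan(patron=patron, due=now + LOAN_PERIOD)
        self.loans[book].append(loan)
        return loan

    def return_book(self, book: str, patron: str, now: datetime) -> None:
        """Explicit early return frees the copy immediately."""
        self._drop_expired(book, now)
        self.loans[book] = [l for l in self.loans[book] if l.patron != patron]

if __name__ == "__main__":
    desk = LendingDesk(copies_owned={"mayflower-ancestors": 1})
    start = datetime(2012, 1, 1)
    print(desk.borrow("mayflower-ancestors", "reader-a", start) is not None)
    print(desk.borrow("mayflower-ancestors", "reader-b", start))  # copy is out
```

Keying the limit to `copies_owned` is the whole design: owning 100 copies would allow 100 simultaneous readers, which matches the "buy" model described earlier in the talk.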
We're working on distributed discovery by offering all of the MARC records to be integrated with other people's catalogs if they want to, or they can use an API service to get to them. How to join: basically, you have to contribute a book, you have to open up all your public domain books, and you provide the IP addresses that define who it is you are and a contact person. And it's free. So I want to say that there's a possibility here of digitizing and lending books, and buying and lending e-books, that this is a way that's working, and it doesn't have any centralized points of control. I think we can make this go. It's not that expensive. We're in process to try to get to the 10 million book library. We're sort of at two, two and a half million now. We still have a ways to go. One thing that I would like to throw out there is, let's do a million books with all of the people that are contributing a significant amount of money. Say 20 of us get together, 25 of us get together, and say let's do the next million books. Then I would suggest that we digitize those and put that million books back into your physical collections. You'd have a few of these, and you're restricted by copyright in what it is you can do. We're all law-abiding citizens here, but exactly figuring out what we can do or not do... we can certainly give it to our blind and dyslexic, and there are starting to be more and more aspects of what we can do. Owning and serving a copy of this whole library costs $30,000 in hardware, and declining fast. So which of our libraries would really not want to own a copy of the 10 million books? But then look at the people that you're looking to work with: are they going to give them to you? And I would suggest that there is a role for us in having these in our collections, for a computer scientist, for a social scientist, for linguistics, for doing all sorts of interesting things that wouldn't necessarily be put up on an Internet Archive or a HathiTrust. Do it yourselves.
These are the great libraries of the world. The corner we've turned is that the equipment is just not that expensive, and that the capability of making books searchable, browsable, downloadable, readable can be legal within lending constraints. I appreciate it. So this is just a way of saying there's an alternative. It's being built by a large number of others. Thank you very much. Thank you for listening. Music was provided by Josh Woodward. For more talks from this meeting please visit www.arl.org.