All righty. Well, welcome, everybody. There's a bunch of chairs over here if you're having trouble finding one. I'm Cliff Lynch, the director of CNI, and I am very, very pleased to welcome you all to Seattle for the Spring 2015 CNI member meeting. I'm glad to see so many people here. I am also delighted that I have not heard tell of the nasty travel surprises that we sometimes encounter, you know, April blizzards or things of that nature that wouldn't be out of character with the winter we've had, at least in some parts of the country. So I am delighted you're all here.

I'm going to be very brief. I have just a couple of administrative announcements, and you will find all of these echoed on the announcement board by where you picked up your badges and programs. The Fifth Avenue Room sort of isn't happening, and everything that was in the Fifth Avenue Room has been moved to Cascade A and B. Cascade A and B is on the third floor, one level down. You can get there by escalator or by elevator. So whenever the program says Fifth Avenue Room, read Cascade A and B. We have one cancellation: the session on the IMS learning analytics work that was scheduled for 5:15 this afternoon will not be taking place. And I would just remind you that if we run into any other relocations or rescheduling, we will post it by the registration desk. There should also be a list there of the sessions that we're going to be trying to capture on video, just for your reference.

And having done those brief announcements, now I get to the good stuff. I am just incredibly pleased to be able to welcome Brewster Kahle back to CNI. Some of you know Brewster as a treasured colleague and a leader within our community going back, well, we were just talking about that; let's just say a long time. Neither of us is looking quite as young as we were, although he's doing much better than I am. But Brewster has done incredible things.
Some of you may know his early work on distributed information retrieval systems, now just sort of woven into a lot of our fundamental assumptions about things. All of you, I'm sure, know at least some of the work of the Internet Archive, of how he stepped in and took up the challenge of preserving this strange and wonderful new thing called the World Wide Web as it started to emerge. Some of you may be familiar with other initiatives that he's undertaken since, and I think he'll fill us in on some of those. Brewster is someone with a great commitment to information access worldwide, and a great commitment to stewardship and preservation in order to support that access. He is one who has thought very hard, and tried to find hard ethical balances in situations that nobody's been in before, about the collection and archiving of information and subsequent access to it. He genuinely is one of my personal heroes, and so it gives me great pleasure to welcome him back today.

I'll just say one other thing, maybe as context to some of what I think he's going to talk about. We've relied on Brewster and the Internet Archive for leadership, and really even more than leadership, for just being out in front, you know, all on their own in the wilderness sometimes, for a really long time. As things get ever more complex and as scale gets ever larger, I think that he's been thinking a lot about how to get a broader collaboration going, about how the scale of the problems and challenges we're facing today is outgrowing any single institution. We've benefited from the pathfinding of the Internet Archive for a long time. What we're increasingly recognizing, I would say in this last decade, are the themes of collaboration and sustainability, and I think he's going to give us a lot to think about in connecting pathblazing on one side with collaboration and sustainability as necessary survival strategies on the other. Welcome back, Brewster.
Thank you, Cliff, and it's wonderful to be back. We were given the opportunity to speak here, now I guess 12 years ago, and really what we ended up doing was kicking off the Open Content Alliance. This was in the era when the Google library project was really just getting going, and the question was how we were going to react. Well, we as the library community came together, formed organizations, and got moving, and there are now two and a half million books that are publicly bulk downloadable, also integrated into HathiTrust, but also available for open access as well. It's an outcome of CNI, and so I'd like to say thank you to CNI for continuing what it is you do in all these ways. Thank you.

I thought something that might be helpful today is to try to deal with the issues around modern materials: things that may have rights to them, probably have rights issues to them, or at least have consternation and questions around them. The idea for this talk, and there's going to be some time for Q&A, is what can we do together, often in ways that maybe we would feel we didn't want to do on our own, or what is now, technologically or within the legal framework as it's currently evolving, sort of okay to do. So I'm going to go over a couple of the different paths that we've gone down with partners towards bringing modern materials into our digital collections and trying to offer broad public access to them, and sort of what's happened out of that.

So, the Internet Archive. The Internet Archive is an independent nonprofit, so it's not part of a university and not part of the government. It's about a 12-million-dollar-a-year kind of organization, a 501(c)(3) nonprofit. I would say we started out being an archive of the Internet, so the idea was to collect the World Wide Web, and whatever else this Internet thing was going to become, and try to make that permanently available and permanently accessible to people and users, and also
to bots as well. But we've grown to become an archive on the Internet, so not just constraining ourselves to things that were available to be hoovered in, but going into books, music, video, software, and the like. And most recently, and this is just a portrait of the Internet Archive, then I'll go back to the adventures of trying to bring these things up, we've found that really what we're trying to do is build libraries together. I like to play games where there are lots of winners. The idea of having a monopoly library, or a monopoly of any particular content set, I find actually deeply scary. I'd like to be a participant in this, but we don't want to be the only one. In fact, we want hundreds of them. We want lots and lots of libraries to flourish. The technology might have been very difficult and went for centralization over periods of time, but I think we can go and reverse some of these trends and build our libraries together with some of the original ideas of distributed collection criteria and services, in an era which seems more like, isn't a library just a cloud service we subscribe to someplace? I say emphatically no. Let's go and make libraries, plural, happen and flourish together.

So with that, I'm going to go through some of the programs that we've been involved in, all in partnership with others, and then say a little bit about how we have survived these. We see ourselves within the tradition of traditional libraries. I love what people carve above doors, so this is what's carved above the door of the Boston Public Library: "Free to All." It was carved by the robber barons, who weren't very nice folks. As I understand it, they were pretty controlling, but there was something about libraries and the information in them that was important to spread far and wide.

We started by crawling the World Wide Web, and the idea was to try to make a snapshot of every web page from every website every two months, starting in 1996, so a
snapshot, and another snapshot, and another snapshot, and another snapshot, and it started to get big. But when we started this, our lawyer friend said bad things are going to rain on you, right? Just going and telling people that you're making wholesale copies of other people's websites, not even doing anything with them, was going to be a very bad thing to do. So we reached out, and the Library of Congress wasn't quite ready for it yet, but the Smithsonian was. So we worked with the Smithsonian, and they gave us a letter with the little sunburst on it that said we're proud to work with this organization, which I think was three of us or four of us at that point, to go and collect the presidential election websites of 1996, because it's important to the mission of the Smithsonian. David Allison of the Smithsonian knew exactly what he was doing. He was giving us cover. He was spreading his wing over a little organization that needed cover, because all the recommendations we got were that bad things would happen. Well, it turned out bad things didn't happen, and we went on collecting, and everything was fine.

And then we basically said, well, what's the next step we can do? Let's make it even more available. In the year 2001 we made the Wayback Machine, so we then took everybody else's stuff and we offered it for free on the Internet for anybody to have access to it. Okay, so there's a little theme. What do you think our lawyer friend said would happen to us? Bad things would happen to us. So what did we do? Larry Lessig actually was in, and we launched it in the Bancroft Library of UC Berkeley, with all the wood around it, and, you know, Tom Leonard was there. It was the whole situation. They knew exactly what they were doing: they were putting their umbrella on top of this little organization that had just gone and, without permission, collected, I don't know, a billion web pages by then, and was going to make them available on the Internet. And it turned out bad things didn't happen;
people used it up a storm. A lot of people wanted things taken out of the Wayback Machine, but it was all onesie-twosie things. It wasn't "you have to take everything out." And frankly, it was really helpful that nobody dragged us in front of a judge, because I really didn't want to have to argue with a judge about whether, you know, yes, we should go and try to get permission for every web page on every website every two months. It just wouldn't have been practical. So actually just having the world sort of continue along was incredibly important.

But there were very few people that wanted to come along with us on this journey. The Library of Congress went and commissioned us to go and collect some things for them, but they were very nervous about the Wayback Machine coming out publicly. They thought it could bring down the whole edifice of doing anything, by being too public with it. But when the Washington Post wrote it up as a good thing, then suddenly everybody was happy. And the reason why I'm saying this is not because, gosh, weren't they wrong, ha ha ha. No, no, no. It's completely natural. This is exactly what really does happen: people are kind of worried about it. It's like, well, what's going to happen? We don't know, and you just have to kind of put it out there. In retrospect it all seems very silly and easy; at the time it's scary as hell.

And so this is a situation that did work, and we now have pets.com, and thank God that we collected CNI in 1997. Ta-da! And you can actually click through and see what was going to be on the agenda then. And we've gotten, you know, good things to come of all of this. There's even the White House: they would have press releases that would say that President Bush, standing on an aircraft carrier, said that combat operations in Iraq have ended, and then a couple of days later change the press release to say major combat operations have ended. Was there any note saying we changed the press release? You know, this was found by users of
the Internet Archive and surfaced. So this is sort of, hey, isn't this great? But at this point there's still only one.

Then we basically built a tool to work with others. We built a web archiving tool that people could download and use, Heritrix, and a lot of people have done that, but it's kind of a pain in the neck. So a lot of people subscribed to the Internet Archive to go and build collections on the Internet Archive that they could then take away with them. This was at a time when most people in, like, the national libraries were still not doing any public access to their web collections. They may have started collecting, but they didn't provide any public access. But state libraries, state archives, and universities have come together, and there are now 350 partners that are going and doing web archiving collectively. I think it's a little bit like those pictures in Cairo, you know, in the Arab Spring, when everybody was sort of standing together arm in arm and taking a step forward. It's that kind of a moment, where we can collectively go and build services that aren't necessarily centralized. You say, well, this is a centralized service, but it does allow these materials to be bulk downloaded, and you can run your own tools to be able to go and use these for your own research uses. So the idea of collective tools for distributed collections was really what we wanted to do. And we do it by operating together, because we've got a very peer-oriented group. Our field is just peer-oriented: we look side to side as to what is okay. We sometimes check with our bosses, but, you know, maybe not as often as we should. We look to what else people are doing, and the idea of doing this has worked quite well.

So one joint collection is the Japan disasters of 2011. People came together very quickly as the tsunami happened to go and collect these things together, and so there are all these wonderful materials: archived web pages no longer
available, archived web pages no longer available. The collection is now quite large, with a sort of subject focus. And the National Diet Library in Japan didn't feel that it could go and do crawling without asking permission, so they were thrilled that we were doing it, and they held this big ribbon-cutting ceremony for us to come back and present into their collection this document of what happened in their country, because they didn't feel that they could do it themselves. So I'd say this is an example of our working together and making things go right. I'm going to give a few more examples of this.

Television. In 1976 the ATRA Act said the Library of Congress was supposed to go and archive television, and as best I can tell, in 1996 the major outcome of this was a report saying, oops, we didn't do it, but it's really important and we should do it. So we hit the record button in the year 2000: Russian, Chinese, Japanese, Iraqi, Al Jazeera, BBC, CNN, ABC, Fox, 24 hours a day, DVD quality. We didn't ask permission, and bad things didn't happen. But we just held on to it, and the question was what we could do with it.

One of the hero librarian stories, I think, is the Vanderbilt Television News Archive, which started archiving television just before the Democratic National Convention in 1968 and was lending out television to researchers. They were sued in the early 70s, and, I paraphrase: "Your Honor, New York has to shutter, capitalism is over, because a library, a librarian, made a copy." The judge did not concur, and what came out of it was an exemption in the 1976 Copyright Act. Vanderbilt kept at it, and they basically got this little exemption to allow lending of television news. So the Internet Archive, building on the shoulders of Vanderbilt, built a service a couple of years ago to go and lend television news, so that you can go and search and try to find materials. You can search and find clips and embed them in your blogs, or if you wanted the whole
program, you could borrow it on DVD: we'd stamp a DVD and send it to you. And the question is sort of, isn't that clunky? Wouldn't you just do a download? It's like, yeah, we'd want to do a download. We're trying to be respectful. We're trying to basically find a way that doesn't interfere with the business models of the broadcasters, but that not only has preservation happen but has access happen. And it turns out that this has worked actually very, very well. We haven't had anybody complain. In fact, the news organizations are really thrilled, because now they have access to their old collections, and they've been asking for data dumps of the closed captions so they can do data mining over their own news programs. So it's actually turned out to be fine; it has worked.

How did we do this? We gathered a bunch of scholars before we did this. We held a conference that was supported by the Knight Foundation. The Knight Foundation was very nervous about this whole program, but we got this together and got all these researchers to say it's absolutely critical, we need to know all of this. That was helpful to sort of set a context. We came out with it, the New York Times thought it was a terrific thing, and others seemed to think so too, and then the Knight Foundation came back and funded a further program going forward. Now we're having a bunch of different support going on to try to help understand the influence of money and political ads in American elections.

I've got a real downer of a quotation, actually, from a study that was just finished of watching all of the programs in the Philadelphia region. We recorded all of them, not just news but also all the entertainment programs, found all the political ads, and then counted them up. And the University of Delaware went and also found all of the political news on all of the news programs in the Philadelphia area. So, political ads versus television news, in terms
of the number of minutes: for every one minute of television news about politics, we had 45 minutes of political ads. It's just devastating what's going on, and it's the first time this type of thing has been known. What I find interesting about this is that it's not just retrospective-looking. It's trying to make our libraries useful in the current issues of the day. So it's not like, gosh, glad you did this, you know, somebody in the future is going to be really, you know, blah blah blah. No, right now we need this for our public discussions. So we're now working with libraries all over to go and use these materials in new and different ways, and that's the sort of collective action that I would suggest is really important.

It's also starting to become affordable. You think of television as just so darn gigantic; how can it possibly be affordable? We have about three million hours; I think it's around nine petabytes of data. But if you take one channel-year, it's only 10 terabytes. That's a little bit bigger than the current hard drives: the current hard drive you can buy now is eight terabytes. It's kind of amazing that that is almost a channel-year. So at a channel-year, you can do this too, and your computer science guys are going to love it, and your digital humanities guys are going to love it, and we can do it in a distributed way, or we can work together and share the results. So this is another example, I think, of going forward: we can go and have our libraries become not just subscribers to other people's services, but actually have services ourselves and go and do our own collecting.

The next step that we went into was music, and the whole question of what you do about music. This was back in sort of 2000, 2001, 2002, 2003. There were just lawsuits all over the place; this was sort of the time of Napster. And so of course we asked our lawyer friends, and they said, oh, what's going to happen to you is going to be bad. So we tried to find some way around
this by finding people that actually wanted things to happen, and we went with the tape trading communities. Actually, this was an intern that was working at the Internet Archive. He came forward and said, you know, Brewster, the Grateful Dead started this tradition of tape trading. Oh yeah, I had my cassettes, right, back in the day. It's still going on, he said; it's going on on the Internet. Really? And he said, yeah, there are lots of bands that copied it. And so he said, Brewster, you keep talking about being storage for cultural materials; why don't we talk to them? I said, great, why don't you write them a note? These were the tape traders that were online. And he wrote a note that said, we'd like to offer unlimited storage, unlimited bandwidth, forever, for free; what do you think? And they wrote back and said, we don't believe you, it's too big, but if you could do it, it would be our dream. It's always a good step when somebody says it'll be our dream. So we said, try us.

And so we thought about it a little bit from the bands' perspective. Just going and posting these up on the Internet is different from tape trading. Tape trading used to be kind of a pain in the neck. It was kind of as bad as going to your town library, right? You know, it was a pain, and that meant that it didn't happen quite as much. So we asked for some level of permission from the bands. We didn't go and ask our lawyers what was the proper form for them to sign. We just asked for an email from anybody associated with the taper-friendly bands, saying, yeah, it's okay. It could be the drummer, it could be the webmaster, it could be, you know, anybody. But then we had somebody in that community saying it's okay, and we posted their email response, and it worked. We got two, three, four bands a day saying yes, and fans uploaded about 40 or 50 concerts a day. Now, I wouldn't have thought, starting the Internet
Archive, that music concerts would have been a thing that we would have done. It was just responding to a need: there were cultural materials out there, and it started to work. We now have 130,000 concerts from 6,000 bands, and we have everything the Grateful Dead's ever done. So it's a system that worked.

We went on to do other Internet-oriented distribution of music. Before the MP3 format was standardized there were other formats, and one was distributed by the Internet Underground Music Archive. We'd archived some of their web pages, but not all of it, and it turned out that one of the founders of the Electronic Frontier Foundation, John Gilmore, had recorded all of it on a hard drive. So he came up to us a couple of years ago and said, I have all of IUMA. I said, great. And so we thought, well, can we find everybody to ask permission? It's like, no, let's just post it and see what happens, and if there's anybody unhappy, we'll take it back down again. So we took these albums from the early Internet distribution and posted them, and people were thrilled. The company had long been munched in acquisitions, and the participants, the people that had posted music, were thrilled to see their music back. Often they didn't even have it themselves. So here is another case of working with a community to find out, sort of, is there some level of okay, and then moving forward.

Also in music, we started working with netlabels, which are Internet-era labels, and we have lots of them. By providing free hosting, that's been really worthwhile. We're now starting to work with CDs, LPs, and 78s, and here's where it's a little more problematic for us, and we're trying to figure out what to do. We're starting to spread our wings by working with a few labels, like I just put up, but also with archives like the Archive of Contemporary Music in New York. They've got 2 to 3 million audio recordings, so that is just the mother lode. Can we go and try to get those CDs and
LPs and put them up? We're starting to get better at the digitization process. We asked our lawyers what would happen; they said bad things would happen. And it turns out that the preservation function hasn't elicited that type of response. We're still trying to figure it out and get it going, and trying to get it to be a more distributed project, so that we can get lots of libraries taking their CDs and LPs, digitizing them, and making them available to themselves and also in central repositories like ours.

Our current idea is to go and have libraries go digital, so that people can take their existing collections, put the CDs or LPs in some sort of machine, and prove that they have them. If it's already been digitized, blink, they can have access to it as if they had digitized it. If it hasn't been digitized, then digitize it and add it into the pool, kind of like how we built OCLC back in the day. Can we go and build our music libraries together, but then allow people to have full download access to their whole collections in digital form? What you can do with it beyond that, we're not exactly sure, and it'll all evolve, but at least on-campus access seems to be happening a lot. I get a lot of librarians saying, yeah, we let everybody on campus have it, but don't tell a lot of people. And if I gathered all of those people up, I'd bet it'd be, I don't know, a third of the organizations that are represented in this room. So it's starting to happen. We've got a model for at least on-campus access, and the Internet Archive is doing that same kind of thing. Can we go and, as a collective group of us, bring our libraries digital? We brought our catalogs digital; can we bring our libraries digital? I think that that is a great opportunity. So don't just go, oh, the Internet Archive, it's all happening because they'll do it. No, we should do it, and then we could have access. And it's small: the amount of storage that it takes, even at high resolution,
by computer standards is quite doable. We got a donation of 40,000 78 rpm records, for which we want to thank the Batavia Public Library for giving them to us. But really what I'd like to argue is, please don't get rid of your collections. Store them well, and store them maybe off-site; store them maybe in shipping containers. It's cheap enough to do as long as you don't have to have rapid access to it. We've figured out how to do inexpensive off-site storage, so you can hold on to your own collections. If you really don't want to hold on to your collections, then please donate them to the Internet Archive, and we'll hold on to them. So we just got 50,000 LPs, and we're getting better at doing mass digitization of CDs and LPs.

A user of this collection already, Daniel Ellis from Columbia University, had this to say: basically, they need access to comprehensive collections to do their new research. The type of research that Aaron Swartz did, of going and downloading a lot of materials and doing computer analysis, is the norm. It's what we should be actively supporting, and we are now supporting Daniel Ellis and a number of other researchers in the music world, because we've got these collections put together. We've gone and made these available in-house, in our cool little reading and listening nook, and nobody's had problems with it. So we're not just taking the CDs and putting them out on the net, but we're having them accessible on campus and looking for others to play with. I think this idea of standing together, taking a step forward, and doing it in a distributed way is a more robust, more resilient mechanism for building these materials up. So audio collections are now starting to grow. And the Internet Archive is smaller, by budget or by staff, than most of your universities or organizations, so you can do these sorts of things as well.

Moving images: most people think of
Hollywood films. Most of this stuff is all tied up in enough rights, and it's actually fairly available anyway, so we've mostly stayed away from it. What we've gotten a lot better with is old materials: 16 millimeter, eight millimeter, home movies, those sorts of things. People love them. It was a real surprise to me that people actually use them. These are stills from Are You Ready for Marriage?, you know, the social behavior films. When we were in junior high school and there was a substitute teacher, they'd wheel in the 16 millimeter projector. Those, we have those, and they've often been downloaded hundreds of thousands of times, because I think it's how a visual generation is trying to understand the 20th century, not just from the Hollywood perspective but from home movies and these sort of ephemeral films. We're just now digitizing 7,000 films that were donated by a major research university, with the help of a Mellon Foundation and CLIR fellow to sort of watch over and get these things available.

We'd learned enough not to ask lawyers to try to figure out what would happen if we put these things up on offer, because we just sort of channeled them: they'd probably say that bad things would happen. But we just haven't been finding this to be the case. We basically deal back and forth with organizations if things come up, and we just take them down, and it seems to be working out pretty well. We also offer unlimited storage for people to upload things.

And we just started to get inexpensive equipment, and we're starting to do VHS tapes. There was a commissioned report on what to do with VHS tapes, and, you know, can we use section 108 something or other, and I think the report basically said no. So we just started to anyway. What we did is we took VHS tapes that were from the San Francisco Friends of the Public Library book sale, the remnants, and we had volunteers look to see if it was
available on DVD for sale new. In other words, is it currently being flogged? Not is it available on eBay; is it available new? And if it isn't, then we digitize it and we put it up, and it's worked fine. So we're trying to stay away from commerce, right, trying to stay away from people's valid business practices, but we're trying to make the older stuff available, and everybody is thrilled. It turns out to be inexpensive to digitize these materials, and we now have an awful lot of them up.

Texts. As I said, we started the Open Content Alliance, but mostly people were going for the out-of-copyright, and we wanted to do all of it. So, the Library of Congress: maybe 28 million books. A book is about a megabyte. That's 28 terabytes. That's four current hard drives. You can spend less than a month's rent and have the storage capacity for all of the words in the Library of Congress. Something new has happened. You guys can go and have these collections within your collections, even if it's just for certain types of uses, and use other subscription services for certain access things. But the public domain should be more publicly available.

But we didn't stop with just the public domain, like these wonderful books of Euclid, and putting them on digital tablets and the like. What we did is start to just digitize, well, everything, anything that we could afford to digitize, and then try to make as much access as we could. So we got good at digitizing inexpensively. We got it down to around ten cents a page, so thirty dollars for a 300-page book; so, thirty dollars a book. And thanks to the library community working together, we've worked with about 500 libraries now to build sort of an open version of the Google Books project. As I said before, we've got about two and a half million books done through this sort of system, and there are just some fabulous works out of rare book collections. We're working in China. And there's something that I'm really happy about, because I've been looking for a
community, a nation, or a language group that would allow us to digitize everything in their language. Would somebody just go open? And the Balinese said yes. So we basically digitized all of the published works in Bali. Just kind of neat. What they do is they publish on palm leaves, so these are palm-leaf books. We basically digitized, by photographing with them, all of their books. And we said, okay, well, how do you read them? And they said, read them? Well, the priests read them, but sometimes they're read as things like shadow puppet plays. So this is actually a reading of one of their books, or there are performances. But I thought it was completely great that the first group of people to say, I want to go online because it's going to be beneficial to our language and our culture, were the Balinese.

So we now have scanning centers all over the world. They're close to people, and it's not that expensive to do. We're coming out with a smaller-scale portable scanner, so it's easier to do, and we're starting to work through the rights issues. So we have maybe three million free books, but we have a million books that are available to the blind and dyslexic, because we can, and 300,000 that we're lending.

So what does lending mean? These are in-copyright, non-rights-cleared books that have been given to us by about 500 different libraries with the express purpose of digitizing and lending: in-copyright, non-rights-cleared books. This has been going on now for four years, and there have been basically no problems. What it means to lend is that we try to buy books from publishers so that we can lend them, but they're not that psyched about doing that in general, so mostly we've been digitizing and lending. This is a book that we bought, but it's been checked out, so you can add it to your list. Here is a more obscure book from the Boston Public Library, but it's from 1990 something or other, and it is available to be
checked out. You have a few different formats, and then you can borrow it, and you're the only reader of it for two weeks. So for two weeks everybody else is locked out of this book, and it's been going on just fine. So this is an approach towards working together to lend books, and we're lending to people all over the world.

What we'd really like to do is take your collections and bring them digital, and then you could lend things inside your own organizations. Wouldn't that be neat? This hard drive, an eight-terabyte hard drive, can store eight million books. If you can keep a card catalog going that keeps track of your lent-out books, then you can operate the technology to lend out digital books and make sure that they're only used within whatever constraints apply. It would make us feel much safer, if you will, to be working together and having lots of libraries do this. Other libraries are using our platform, but it's still kind of just us. Is there a way of spreading it, so that there are lots of libraries that have their collections digital? You could make them available, say, just to your computer scientists. Maybe you use the local consortia to put it together; in California, Califa is basically starting to operate shared lending facilities. But it means that you're not beholden to somebody else, and I think that's going to be absolutely critical to having a robust library system going forward.

Software. In the software area, we've been starting to do large-scale collections. We believe we have 500,000 titles, but we don't have all of the cataloging done. We have about 90,000 titles of PC-era software on our disks, working with different communities that have been doing the lion's share of the work. So we're not doing the work, they are; we're working with them in really interesting and new ways. And now, based on some work that we did, again with lots of volunteer communities, we got the first level of emulation to work, so that you
can go to a web browser and see a piece of software. In my case it was VisiCalc; I'd never seen VisiCalc run. You can go to a web browser and click "run", and what it does is really surreal: it actually downloads, in JavaScript, an emulator for an Apple II, boots the Apple II, then reads its virtual floppy that has the software for VisiCalc on it, and it's running. It's completely surreal to me, but it's possible now to run emulators in your browser, and it allows us to, in some sense, lend software. You're not really getting it; you're getting to use it. And it's not been a problem. We've been putting things up at a phenomenal clip, and we've been contacted by a lot of the producers. If they're still flogging the materials, they want us to take it down. And what do we do? We just take it down, and it works. So we've gone and put up tens of thousands of pieces of software, now up and running, and it's used by a generation that is just absolutely thrilled.

Next up for us: personal digital archives. We're starting to not just do the digitization of materials, but also the hard-drive-era collections of things, which is where my family keeps its photos. I'd say a lot of younger families are keeping their photos on Flickr and all of these other not terribly stable environments. So we as libraries really need to come together to start to figure out the tools so we can go and build the archives of, say, the professors or kids or institutions. It requires being forthright and aggressive about doing our jobs. And I'd say I haven't had people come to us and say, "No, we don't want libraries anymore." I haven't had people come to us and say, "You're not a library, you're really a [blank]." It's, "No, you're really a library. We want libraries." Let's find ways for us all to move forward together.

So, in conclusion: universal access to all knowledge. I'd say it's within our grasp, and it could be one of the great works of humankind. I think it could be remembered like the
Library of Alexandria or the Gutenberg printing press, as one of the terrific things that happened. But it's going to involve all of us. It really isn't something of just going, "Oh, Google's got that covered," or "It's at HathiTrust, no problem, I'll just subscribe to it." Not good enough. It's really going to take us moving forward. And in fact, some of the projects, like the Google library project, got stopped as a monopoly. That was the reason they stopped it in the courts: having one organization, the Book Rights Registry, controlling the distribution of out-of-print materials is not the library system we as a society want. That doesn't make sense. It's our turn, and fortunately the technology has become inexpensive enough to do things within our organizations, or in clumps of them, working together in different ways. And I guess, lastly: carved above the door of the public library in Pittsburgh, the Carnegie Library, his legacy, is "Free to the People." Thank you very much.

I hope that was provocative enough to get some questions going. I think we've got maybe 10 minutes or so, so please: what should we be doing now? Yes, David.

I'm David Rosenthal. What let you deal with books and software was, in effect, streaming, and it seems to me that this is something which could be more generally deployed to deal with the copyright issues. People have much less of a problem if you can't actually get a copy and squirrel it away; you just get the experience.

I think streaming is a good intermediate step that has worked in a number of circumstances. So, the Grateful Dead: we had all of their materials up, and the band sort of got nervous, and they said, well, you should take it down. I said, well, we'll take it down, but there are going to be some really unhappy Deadheads out there. And so we took it down, and there was this big kerfuffle that played out in the press and the like. And what ended up happening is
the audience recordings they allowed to be downloaded, and the soundboards, which are direct patches of their sound and were never really part of the tape-trading deal, are available streaming. And that seems to work quite well. Even the book viewing is streaming, and it does sort of play an intermediate role. It's not that great for research, though. Streaming isn't really going to make your computer scientists happy, because they're going to have to scrape it, and that's a pain in the neck. So I think having copies of the materials yourself, so that you can give bulk access to the ones you feel comfortable allowing bulk access to, is tremendous. And depending on another organization to build all those research services, whether it's the Internet Archive, which is in the process of doing so, or HathiTrust, which is too: isn't that really what our universities should be doing, building some of those services and making some of these different interplays available? So I'd say bulk access is important, but under more controlled environments for the modern materials that probably have rights issues, like television. I'd say that's exactly what we're doing with television.

The other issue with bulk access is that bulk these days is getting awfully big, and so moving the analysis to the data, rather than moving the data to the analysis, may end up being the only way you can cope with it, simply because of the bandwidth and storage capacity issues.

Yes, but let's not go and lock that into policy. I just don't want to have it regulated that there are only going to be one or two copies of these materials. We want to make it so that the big boys could take copies away. And it's still not getting easier: moving terabytes around is easy; petabytes, yeah, it's still clunky.

Yes, and I noticed that you only have a couple of copies of everything that you have, right?

Yes, we
have two copies. Then we have a partial copy in Amsterdam and a partial copy in Alexandria, Egypt. Yeah, that's a problem that needs to get fixed. I think there's somebody who said lots of copies keep stuff safe. I think we should agree with them, and we need help.

I have kind of a related question to that. You mentioned a couple of examples where what people really want to do now is mine the large collections, but the truth is most researchers can't download the quantities of data and audio and content that you're collecting. So are you contemplating anything like the HathiTrust research corpus, some kind of modality where people could actually do their research in your collections? Because I see that as the low-hanging fruit to get people excited about the stuff you're collecting.

Yes, we've been trying different approaches to researcher access. What I've found so far is that big data usually means lots of data points, not lots of terabytes, and that even if people have access to it, they don't know what to do with it. What I think we need is a good middleware layer of open source software, so that digital humanities people don't have to have a programmer glued to their side to be able to get some of their research done. And we're exploring building an institute, with fellows, to go and build that middleware. Because we've sometimes gone and just opened it up: we took one crawl of the World Wide Web, it was about 80 terabytes, and we just said, come and get it, but tell us. We'll give anybody a download key and you can have access to it. I don't know, 20 or 25 people got an access key, and we never heard from them again. I don't know if they did anything at all, because it's just kind of clunky and hard. So I'd say we're still at an early stage in this, but let's do it together. Let's have some fun with it. So,
let's go with some of these different collections. Let's move them around. Let's build some open source software, so that it's not just the land of the esoteric coders to be able to do studies like we were doing on the TV news, going through all of those programs and finding all of the ads. That's actually a pretty cool master's-level thesis. Let's get there. We have the data; let's see what we can do with it. We can do some of it on our servers, but some of it on yours. How do we make it all go? I think that's the opportunity that we should be pursuing now. You're here.

Thanks, thank you. Elizabeth Jones, University of Washington. One of your refrains in this talk, which I thought was interesting, was: lawyers said bad things would happen; no bad things happened. But a lot of people have gotten into trouble for doing things similar to a lot of what you talk about. So I'm wondering if you could speak to why the Internet Archive maybe hasn't gotten in trouble for doing these things, as opposed to those other folks who have.

Yeah, Paul Courant from Michigan. So, why haven't you gotten sued? Maybe we're just lucky. I really can't tell you. Though we're very concerned about every aspect of what we do, and we try to be respectful, and we try to be open about what we're doing. We make no money out of this, so that kind of helps. What I've learned by collecting everybody else's stuff over the last 20 years and then making it publicly available again is that people don't want to feel like they're being taken advantage of. If they feel like they're being taken advantage of, they'll throw things at you, and laws are just one of them. They'll throw threats; they'll do all sorts of things to cause you to stop. So how do you deal with that? One is: do a really good job of what you're doing. Do it beautifully. Be respectful. I mean, they spent a lot of time making
that thing. Do it right. So in software, we've gotten a lot of the early developers of these games thrilled, because it's actually done quite well. So that's one. Don't make any money, and engage in conversation. Winston Tabb, when he was at the Library of Congress and I was sort of being trained on how to do all of this stuff, said, "Just remember, all of it is their stuff." Right? Be respectful. That's another characteristic. And we have a bend-not-break policy. So where we probably could have stood our legal ground and faced somebody down and said, "Screw you," we didn't. Sometimes that makes some people in our environment mad, because we took things down that we maybe, theoretically, didn't really have to. But everybody's just trying to figure out this digital transition, so we try to talk more to the business people than to the lawyers in these organizations, and try to find ways to make their business work better. I think we've put far too much faith in lawyers figuring things out, and it's just not going to happen. Laws tend to trail. New laws, when they're made hypothetically, tend to be really bad, especially for us. So I don't think laws are going to be done before we get there. We're going to have to get there and do things, and if there's enough trouble, then we'll have to bring in the legislators to go and figure it out.

So I think we have to go and figure out what a library looks like. And I like the old-style library, where you have collections, you build collections, you serve your populace really well, you pay publishers, and you make things happen in a distributed way. I think we got brought into this digital collection idea back when... oh, an anecdote. I visited OCLC way back when; I think they were still called the Ohio Computer Library Consortium then, and they were run on Honeywell computers. And I asked, wow, how big is the database? They said, really large. And how big is it? Really big. And I said, in
bytes? And they said, well, that's so many MARC records: 17 gigabytes. Right. I was like, wow, that doesn't sound that big. Oh well. But it was a lot of maintenance to get that database done really well, right? I'm not saying that it's cheap to be OCLC. It never has been and never will be. It's that you don't need an acre of mainframes to run these data collections anymore. You can at least have copies much less expensively. Now, maintaining it and keeping it up is as expensive as it always has been, because it's people. But the technology has made it so that there are certain old assumptions we need to question, and we have opportunities beyond what we're currently doing. And I'd say we're arguing ourselves into not very good positions because of our old habits in terms of understanding how to get things done. Let's build a distributed library system.

Steven Davis, Columbia University. Clearly you've built one of the greatest knowledge resources in history. I don't think anyone can argue with that, and it's just amazing. Over the last 10 or so years, many of us have been working in the area of long-term digital preservation. A number of models and approaches and theories have been offered up, including the TRAC certification and the ISO follow-on, and now DPN is part of the landscape, or we hope it is. How does the Internet Archive fit into that new framework for long-term digital preservation?

Don't really know. The Internet Archive has been, I'd say, underfunded. I mean, I think all of us say we're underfunded, but I think we're severely underfunded for what we're trying to do. And so we've kind of just bludgeoned forward to try to figure out how inexpensively we could do things. We try to do a good job and be transparent about it. And we're trying to come around a corner now: that third phase, of building libraries
together, is really driving our organization to be more engaged with user communities and institutional communities. So with DPN, for instance: I know how to spell it, but that's about as far as it goes. And it's not because it's not a good thing to do; it's just that we haven't really gotten there yet. So please don't give up on us. We're better staffed now. Wendy Hanamura is in the front row; we now have a director of partnerships who can actually answer your emails, because if you write an email to me, it's iffy. So I think there are roles for these different organizations, and every country should at least have these. I know that things are going on with our northern neighbor in terms of having collections there too. I highly encourage this. So: encouraging, but we're kind of lame at the moment.

So some DPN people will be getting in touch with you very soon.

Thank you.

So, Brewster, I couldn't let that last comment go, the comment of "don't give up on us," meaning you. Having been a librarian for a very long time, my definition of a research library is one which continues to build collections. And over the last two decades, many of the libraries that used to be research libraries really have fallen down according to that metric. More and more libraries lease access to content. More and more librarians care about the material that they pay large amounts of money for, which is not at risk at all, and not the things that are freely available. I just want to say that what's amazing about your presentation right now is that you, frankly, haven't given up on this community, and I hope they don't let you down.

Thank you. I don't know how it's going to go. Some people say that the horse is out of the barn, that we're going to end up with YouTube, Elsevier, JSTOR, HathiTrust, the end; that being a university librarian is going
to be contract negotiations and personnel issues, that we're going to become customer service departments. I don't think that's the right way to go, and I'm not exactly sure what the shape is going to be. A lot of our business models are wrong: the idea that if you're running a library, you're only really there to serve a local community. How do we change that, so that I'm going to build the best ornithology server in the world, and I don't need to serve just Cornell with it; I can serve everybody? But how do we then pay for that? How do we get our business models adapting to distributed service provisioning that isn't really locally oriented? I don't know the answer to this. Don Waters, whom I hold in high esteem, I see you in the audience. He has won this darn argument for the last 20 years. I've been trying to make it so that we can provide universal access to all knowledge and be supported in it, and he's pointed out that it's very difficult to get libraries to pay for things that they'll get for free. Why would they? Yet things distributed endlessly, as opposed to subscription services, often work better in the internet generation, and I don't know how to square that circle. We've really got a system that has business models built around physical collections that are innately local, but we now have an opportunity to make services, every one of which is global, yet we have no mechanism for finding support for them.

As, I think, the father of digital libraries, Mike Lesk, put it, he's worried about two things. He's worried about the 20th century, because he thinks the 19th century and the 21st century are in pretty good shape, but the 20th century might get forgotten because of the copyright issues. And the other is institutional responsibilities: what are we supposed to do? When you go back to your offices next week, the question is: are you going to do something
differently because of this? It's hard to imagine, because your constraints are the same as they were the week before. The opportunities are different, though, and how we respond to those opportunities is going to define how the library system looks in 10 or 15 years, when we're even a little bit grayer. Thank you very much.