Well, thank you very much. I only want to go to conferences that start by singing of peace and love, have a celebration of failure and a recognition of it, and a call to be bold. So thank you very much for inviting me to this forum. I'm hopeful that some of these ideas can be useful in what we are doing. The idea of universal access to all knowledge has actually turned a corner. It used to have to justify that it was possible. But now I think we've got the other problem: people think it's already been done. And it hasn't. All of us who know something deeply can go on the net and find a Wikipedia page, or something or other. But you won't find it all. You won't find the controversies. And that, I'd say, is our challenge and our opportunity: to fulfill the expectation that many people think is already met, that it's all online. And it's just not. But how are we going to do this? I was wrong. I came out of MIT a geek, and I thought the machines were going to save us. All we'd need to do is have the machines digitize the library and put it online, and we'd be done. I was wrong. It's actually people who are the heart of curation. It's the people who are passionate about something, who know how to contextualize things, who want to bring it forward into a new medium, a new approach, and add either original materials or even just organizing, curatorial functions. It's us who are the absolutely important piece. So I think what we need to do at this point is build libraries together. We need to put things not just on little websites scattered around the web, but somewhere that makes them understandable to people, while still emphasizing the curatorial abilities of individuals to create things. I'm going to try to paint a portrait showing that this is not only possible, but can be done now and is being done.
And there's a possibility for New Zealand: because of both its civil nature and its bit of isolation, it could take a very big leadership role in building something really quite significant that would be a model for those of us who feel mired in much larger political and corporate systems that sometimes slow things down. So that's the idea for today. What is the Internet Archive? It's a nonprofit library in the United States. We're not part of a government, not part of a university. We're an independent library, a 501(c)(3). We see ourselves within the context of libraries. I love what people carve above doors or carve in stone. This is above the Boston Public Library: "Free to All." The Boston Public Library is kind of interesting because it was built 100 years ago by capitalists who saw no end to the concept of property. But they thought information and libraries were special, that they really were part of the infrastructure of how society works. And they carved "Free to All" over the door of their testament to the role of information. So what makes up all information? I'm going to go media type by media type and cover a little of the costs, some of the advances, and some of the legal issues that affect bringing all books, music, video, web pages, and software online. How are we doing toward that? What are some of the next steps? And, frankly, how should we all participate in building something that's accessible to all? Let's start with books. If you take the world of books or texts, the question in an engineering sense is: if we want to do it all, how big is it? How much is there? The largest library in the world by far is the Library of Congress, with 28 million books. Yale, Princeton, or the Boston Public Library has about 10 million books. So that's the scale of it.
And a book is about a megabyte if you have it as a Microsoft Word file, just the words in it. So 28 million megabytes: going mega, giga, tera, that's 28 terabytes. That's four hard drives you can buy at Best Buy. With four hard drives, you can have all of the words in the Library of Congress. It's kind of interesting. And it would cost less than a month's rent. So we're at a point where having all of the words available is possible. If it's images of the pages, it's larger, of course, and so are the movies and music and all that, but it starts to feel like, wow, this is kind of doable. Another question is, what do you get at the end of this? We're starting to get used to books on screens, with the Google Books project and other projects, where you can actually see, read, and enjoy these books on today's screens, tablets, or even phones. But just having things available on screens is not good enough. We really need to be able to print things back out again, to flip-flop back into the physical world, as I heard last night, when appropriate. So we made a print-on-demand bookmobile. It costs about a buck a book to download, print, and bind a book and give it out. At a buck a book, that's cheaper than what libraries pay administratively to lend out a book. If that's the case, then we can actually give away books out of our collections, assuming the rights allow it. And if we figured out how to take $1 on top of that dollar cost of making the book and pass it back to the authors, most authors I meet say, I don't get a dollar a book; I'd love that. So the idea of feeding money all the way back upstream makes sense. We've launched a couple of these, in India and at the Library of Alexandria in Egypt. This is an engineer working with a kid; this is the first book this kid has ever owned. We also did it in Uganda.
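The storage arithmetic for the Library of Congress can be checked in a few lines. This is a back-of-the-envelope sketch, assuming about 1 MB of plain text per book and decimal units (1 TB = 10^12 bytes); the 4 TB and 8 TB drive capacities are hypothetical values chosen for illustration, not figures from the talk.

```python
import math

# Figures from the talk: ~28 million books, ~1 MB of plain text each.
BOOKS = 28_000_000
BYTES_PER_BOOK = 1_000_000  # decimal megabyte


def total_terabytes(books: int, bytes_per_book: int) -> float:
    """Total size in decimal terabytes (1 TB = 10**12 bytes)."""
    return books * bytes_per_book / 10**12


def drives_needed(total_tb: float, drive_tb: float) -> int:
    """Whole drives needed to hold total_tb, rounding up."""
    return math.ceil(total_tb / drive_tb)


if __name__ == "__main__":
    tb = total_terabytes(BOOKS, BYTES_PER_BOOK)
    print(f"{tb:.0f} TB of text")
    for drive_tb in (4, 8):  # hypothetical drive capacities, in TB
        print(f"{drive_tb} TB drives: {drives_needed(tb, drive_tb)}")
```

The drive count depends entirely on the assumed capacity: 28 TB fits on four drives only if each holds at least 7 TB.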
So we've been able to take this model of print-on-demand and try to bridge things from our digital libraries back into communities that want things in physical form. There are also now machines, like Rube Goldberg machines, that actually print and bind books, and they come out a little chute. But we're really seeing such adoption of screens that I don't think print-on-demand is going to be the way forward for most of our books. These screens are looking very, very good, even for digitized materials in color. We've taken all of the books we've digitized and made them available on lots of different platforms. By doing the engineering once, we can make them accessible on many platforms, including my favorite one, at the bottom right, which is a little talking-book device. You can download encrypted books if they have rights burdens; if not, they're freely accessible, and it talks a little bit like this. It's for the blind and dyslexic and, frankly, the elderly, who are the largest group of subscribers to the National Library Service in the United States, which brings talking books to people. So the idea of having books online makes sense to people now. It's like, okay, yes, now let's get there. The question is, how? This is one of our first projects, working with the Million Books Project. This is a scanner we got for the Library of Alexandria. This guy doesn't look too happy about it, does he? Well, anyway, they've been digitizing along. We actually designed and built our own book scanner to be more efficient and to not flatten books: you raise and lower a glass plate over each page.
There are scanning centers now in many parts of the world, and we're starting to get collections up and weave them together into a global library, each with a specific focus on particular subject matter, whether it's Korean books or botany books out of Beijing, China, which they're digitizing there. So we're seeing broad adoption of building libraries together, using catalog systems and search engines on the net to let people find things across these collections, even though each was designed and built for a particular user community that its builders thought would be the only one interested. But we've found that people love these things, even though the builders didn't think others would go into their particular books. We've also started to work with other, non-traditional sorts of publishing. This is in Bali, where it was fantastic. I've been going around to different language areas asking why we don't just digitize everything ever written in that language. And I've gotten a bunch of turndowns. But in Bali, they said yes, and I said okay, let's just do it. It turns out they write on palm leaves, which are completely fabulous manuscripts, and it's a manageable set: in about a year and a half, with a roomful of people, we were able to photograph these materials. Then we found that this wasn't going far enough and we needed help, because now we have these materials, but a lot of people don't know how to read them. The way Balinese was read is that it was sung. It was sung by priests, and it was put into things like shadow puppet plays. So there are shadow puppet plays that are the readings of these books, and we're starting to digitize these materials and make them available as well.
But we really need mechanisms for getting these transcribed and put back into the villages, because in Bali the government is having schoolchildren learn Indonesian, not Balinese. So we're probably a couple of generations away from the Balinese language being about as irrelevant to the Balinese as a foreign language. If we can use our technologies to bring things back into these cultures, and into our own culture, we can preserve ourselves and carry our distinctiveness forward, rather than just having whatever is published by the big publishing houses around the world. So I'm really jazzed that the first complete literature of a people to go online was the Balinese, that they just said yes, let's do this. And I think New Zealand could actually be the first country to go completely online, because of your unique status. So what to do? Well, we now have scanning centers in 30 libraries in eight countries, digitizing about 1,000 books a day. We have about three million free e-books available to everyone. We have about a million modern e-books available to the blind and dyslexic, every Harry Potter, the whole shebang, and about 300,000 that we're lending. This brings up an interesting point about how you deal with rights issues. If you just constrain yourself to things from 1923 and earlier, that's not going to be the complete library we need as a functioning society. So we built a system to lend books: in-copyright, non-rights-cleared books.
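The checkout model described in this part of the talk (one reader per owned or digitized copy, a waiting list for everyone else, and automatic return after up to two weeks with no late fee) can be sketched as a small in-memory class. This is a hypothetical illustration: the class, method names, and 14-day constant are invented here to mirror the talk's description, and it is not Open Library's actual code.

```python
from collections import deque
from datetime import date, timedelta

LOAN_PERIOD = timedelta(days=14)  # "up to two weeks", per the talk


class LendableTitle:
    """Hypothetical model: owned copies lent one reader at a time, FIFO wait list."""

    def __init__(self, copies: int = 1) -> None:
        self.copies = copies
        self.loans: dict[str, date] = {}      # reader -> due date
        self.waitlist: deque[str] = deque()   # readers waiting, oldest first

    def _expire(self, today: date) -> None:
        # Automatic return: a loan simply ends on its due date, so no late fee.
        for reader, due in list(self.loans.items()):
            if today >= due:
                del self.loans[reader]

    def _serve_waitlist(self, today: date) -> None:
        # Hand any free copies to the readers who have waited longest.
        while self.waitlist and len(self.loans) < self.copies:
            self.loans[self.waitlist.popleft()] = today + LOAN_PERIOD

    def borrow(self, reader: str, today: date) -> bool:
        """Return True if `reader` holds a copy after the call, False if wait-listed."""
        self._expire(today)
        self._serve_waitlist(today)
        if reader in self.loans:
            return True
        if len(self.loans) < self.copies:
            self.loans[reader] = today + LOAN_PERIOD
            return True
        if reader not in self.waitlist:
            self.waitlist.append(reader)
        return False

    def give_back(self, reader: str, today: date) -> None:
        """Early return: free the copy and pass it to the next waiting reader."""
        self.loans.pop(reader, None)
        self._serve_waitlist(today)
```

Setting `copies` above 1 models a library that owns several copies of the same title; the one-reader-per-copy rule stays the same.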
So we've built openlibrary.org, with a system where you can borrow books. You can borrow books that we've bought from publishers. It's been very hard to buy e-books from publishers, though I actually scored and bought some from a New Zealand publisher yesterday, which I'm excited about. It's been difficult to get publishers to just sell us e-books the same way we bought books in the past, so that we can preserve them and lend them out. But some have. For instance, this book is checked out, which means that if you try to get it, you can add it to your waiting list, and when it's available, you can borrow it. So it's one reader at a time: we buy an e-book and lend it one reader at a time. You say, isn't that clunky? And yes, it is, but it's what we feel we can do at this point to respect the business models that are still trying to survive in a difficult digital time. We've also started to work with libraries, such as the Boston Public Library, about 500 libraries in total at this point, that are digitizing in-copyright, non-rights-cleared books from their collections, and then we lend them one at a time. This is a book of genealogy digitized at the Boston Public Library. You can say you want to borrow it and see it in a page flipper; if somebody else wanted it, they'd be told it's checked out. You have it for up to two weeks, then it's automatically returned, and there's no library late fee. So hooray: there are some advantages to the digital world going forward. So the idea of a checkout system to bring online the vast majority of our collections, which have some sort of rights burden, is working; it's been going for three years now, worldwide, with 300,000 books. So I would say books are doable. As for digitizing all books, it costs us about 10 cents a page to digitize books.
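The per-book figure follows from the per-page one. A minimal sketch, assuming about 300 pages per book (the page count implied by 10 cents a page and $30 a book) and a flat per-page rate; it works in integer cents to avoid floating-point rounding. Under these assumptions, going from the roughly 2.3 million books already digitized to the 10 million goal cited in the talk would cost about $231 million.

```python
CENTS_PER_PAGE = 10   # "about 10 cents a page", per the talk
PAGES_PER_BOOK = 300  # assumption: implied by 10 cents a page and $30 a book


def scanning_cost_cents(books: int, pages_per_book: int = PAGES_PER_BOOK) -> int:
    """Estimated digitization cost in integer cents (avoids float rounding)."""
    return books * pages_per_book * CENTS_PER_PAGE


def dollars(cents: int) -> str:
    """Format whole dollars with thousands separators."""
    return f"${cents // 100:,}"


if __name__ == "__main__":
    print("per book:", dollars(scanning_cost_cents(1)))
    remaining = 10_000_000 - 2_300_000  # from ~2.3M digitized to the 10M goal
    print("remaining to 10M:", dollars(scanning_cost_cents(remaining)))
```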
So it's $30 a book. We've digitized about 2.3 million books with the sponsorship of libraries. We need to get to 10 million. If we get to 10 million, I think we've got a critical mass for K through 12 and undergraduate or research collections. But books aren't all of it. Another is music. What if we wanted to do all music? We're really gearing up to do more music at the Internet Archive and are looking for partners at this point. But let me say a little about what's gone on. With music, there was a tradition started by the Grateful Dead of allowing people to record concerts and trade them with other people as long as no one made any money. Key: no one made any money. I think one of the issues where people feel they're being taken advantage of is when somebody else is making money selling or offering access to their work. So the Grateful Dead had this tradition. It ran for many, many years, and when it moved to the Internet, people had these enormous files they were trying to move across the net, and it was a problem. They didn't have the space and the bandwidth to do this. So when we were getting going, about 10 years ago, an intern working on our website said, you know that tape trading is still going on? I said, really? I used to have my Grateful Dead tapes. He said, yeah, it's happening, and lots of bands have expanded on it, but they're having a problem. I said, well, why don't we offer them unlimited storage, unlimited bandwidth, forever, for free? So he wrote an email to the tape-trading community: would you like unlimited storage, unlimited bandwidth, forever, for free? And they wrote back and said, we don't believe you, but if you could do that, it would be our dream. It's a good sign when somebody says that. So we said, work with us; let's figure out how to make this happen. And we said that putting bands' recordings on a website is different from just tape trading.
So let's get some level of permission. And this is what the lawyers would all cringe over. Permission, for us, is just having somebody from the band's community say it's okay to host it on the site. It could be the drummer. It could be the webmaster. It could be anybody, right? Nothing in blood and triplicate. It's just somebody from the community thinking it would be okay to do this. We get about one or two bands a day signing up, and have for over 10 years. We now have 6,000 bands and 130,000 concerts up, and the fans love it, the bands love it, it's all working, and no lawsuits. So it's an example of where a library can play a role as community host for communities that have no infrastructure of their own. And of course we have everything the Grateful Dead has ever done. So that community has worked. Recently we've been trying to bring more music online. There was a community on the internet from before the MP3 format was even specified. It was basically a free hosting site, and it went out of business about 14 or 15 years ago. But somebody had captured and collected it, and they brought it to us on a hard drive, and we said let's bring it up. So we brought it up. We were a little nervous about it because we didn't get permission; whoever could have given it had kind of dissolved. And people loved it. The artists who were on there loved it. The remnants of the founders of the company, which had long been swallowed up in acquisition after acquisition, loved it. So it worked, and we now have 43,000 titles up. Net labels are a, sorry, I lost my voice yesterday. Am I a little gravelly? Please excuse me. Net labels are an evolution of the music trade from when CDs were starting to go away. They still wanted to publish things, but they couldn't afford the bandwidth. So we said, put it on us.
There are 2,000 labels, small labels, and 54,000 albums that are massively popular hosted on the Internet Archive site. And you could do this too. We're now shifting toward things that were commercially released, CDs and LPs, and trying to figure out a way through this legal landscape as we did on the books side, though we think it might have different solutions. We're starting to bring together an advisory group and partners. We've got a couple of music labels that are oriented toward classical music. And we're working with the Archive of Contemporary Music, which has a fantastically large collection of LPs and CDs. We've set up scanning centers to get the technology working so we can take tens of thousands, hundreds of thousands, of CDs and LPs and make it move. We're starting to get donations. For instance, you may have 78 RPM records in your collections now. Nobody wants to listen to these things anymore. So if we can digitize them, we can make them available. People have donated digitized versions of these 78 RPM records, about 10 to 20,000 of them now on the archive. And no one has had a problem with it. Whether it's Caruso records or Scott Joplin records, these are all going up, and there hasn't been a problem with making them accessible. LPs and CDs, I think, are going to be different, because they're closer to commercial viability, but we're starting to get donations and get better at the digitization process. We're starting to get technologies and volunteer structures for doing CDs at scale. LPs require some extra work because their covers are larger than commercial scanners. And we're trying for three forms of access. One is researcher access. For instance, Daniel Ellis does computer research finding rhythms, key structures, chord progressions, and genre by computer listening to these audio recordings.
There are a bunch of these researchers now using our corpus as a primary tool for their work. We've also established a small listening room. So if you actually drag your bones into the Internet Archive, you can sit there and listen to everything. Maybe we could do that here, or in your branch libraries. What can we do to move the ball forward with some level of access that doesn't get people mad? Another thing we're looking to do is a website with 30-second clips, so that you can get access to a short part and then maybe follow links to YouTube or Spotify or Amazon to purchase the recordings. Can we make some progress in building these collections? The cost is not that bad. The collection size is also not that bad; it's scalable to the sorts of budgets we have. We think we can get it below $10 for CDs; LPs are more difficult. Our audio collection is now over two million objects, concerts and the like, in 5,000 collections. So even though we're a small library, we're seeing others participate in building audio collections, and we want to get better at working with others to build music libraries together. Okay, I promise: books, music, video. So we're now on to video. How hard is video? Most people think of video as Hollywood films. We haven't done much in the Hollywood film area, so that's a pocket we haven't participated in yet; it's still to come. What we've done a lot more with is ephemeral films: films that were not meant for the ages. Educational films, government propaganda films, training films, advertisements. We digitize these and put them up, working with archives and libraries, and they're fantastically popular. I don't know why. But they are. This one is "Are You Ready for Marriage?" I don't know; it's kitschy, it's used in mashups, but I think it's also used by a very visual generation trying to understand the past through moving images.
So I think bringing these types of materials online can be a very powerful offering toward permanent access. We offer permanent access to people's uploaded videos. Yahoo Video and Google Video are gone; there's YouTube, but that's not Google Video. When they took Google Video down, six million videos went down with it. So there are reasons to work with libraries rather than these more ephemeral corporations. We have digitization machines that are pretty good and pretty cost-effective. We're starting to do videotapes, and these are recent enough that they might have rights issues. So what we do is look on the net to see whether a new DVD version is available. Not a used copy, but is there a DVD version currently available from the publisher? If not, we digitize it, and it's worked out great. We've gotten no complaints. We get a lot of exercise videos, construction videos, a lot of training films, a lot of interesting things that have been quite popular, just by proceeding along, not interrupting commerce but getting everything else we can, even though if we gathered enough lawyers together, they'd probably say there were some risks in the process. So it's not that hard to do, and we've actually gotten volunteers to help. We've been getting better at television. We've been recording 20 channels of television since the year 2000: Russian, Chinese, Japanese, Iraqi, Al Jazeera, BBC, CNN, ABC, Fox. Twenty-four hours a day, DVD quality. We put some of it up, covering the September 11th events, on October 11th, 2001. That was years before YouTube, to help Americans try to understand what was going on, in terms of why other people hated us, and also to help other people try to figure out: are they going to bomb us? So the idea is to try to be relevant in the current day. Then, a year and a half ago, we started lending television.
Television news in the United States has closed captions, so you can search and find things. You can search for something and find clips that you can then use in your blog, and if you want the whole thing, we burn it to a DVD and lend it to you, and you have to return it in 30 days. Why do this? Because it's arbitrarily clunky; it's just being a library, and being a little clunky. We put up the news, but only after it's one day old. So we're continuously recording, and we're recording a lot of television now, and it's then available to anybody after a one-day delay, and it's been fine. All of the broadcasters have been fine with this, so it's been working out. With television news available, we did a project in Philadelphia to record all of the television programs and have people mark all of the political ads; then, using federal records, another nonprofit figured out how much money was being put into each particular secret dark-money slush fund just to support those ads. It's trying to bring visibility to what's going on in our electoral process, and it's worked; everybody's been happy, and it's been working along quite well. So archiving moving images is not that hard to do, and people love it. Even if you don't think they would, just try it: put these things up, whether they're things from people's cell phones, or have a home movie day where people bring in their home movies, and start to make them available. We have over a million items now and a hundred collections at the Internet Archive, and it's been quite popular. So we've been able to collect these large things, but sometimes by making things accessible you end up in some of the situations I think we heard about earlier, in terms of what it all means. We're making these things available under free speech approaches.
There's a provision in a post-2001 law in the United States called the Patriot Act that allowed anybody in law enforcement to demand information from libraries and other organizations, and you couldn't even tell anybody that the request had come in. You just had to give the information and never tell anybody about it; it came with a gag order. We got one of these. It's called a National Security Letter. Hundreds of thousands of these have been issued, and when we got one there was this question: what do we do? We're fervently pro-reader-privacy, and they wanted information about one of our patrons. So we asked our lawyers, EFF and the ACLU, what can we do? And the lawyer said, well, you can't even discuss this with your board. You can't discuss it with anybody else. You just have to do it. I said, or what? He said, there is no provision for "or what." You'd then be breaking the law, and you'd go to jail for five years. It's like, well, that's not good. So I said, is there anything we can do? And they said, yes, you can sue the United States government. So we sued the United States government. These are the lawyers who helped us out, and we won. And it's because we're a library that we could draw on a long history of what it's like to have reader privacy violated. It still rings in the internal mythos we've all been brought up in. So I'd say it's important for us to shield other people by using the protections that have traditionally been given to us. I've seen no law that says, we don't need libraries anymore, it's the digital world, people will just get it. We still need libraries, and we still need to move forward. Okay, a couple more. Software: we're starting to archive software, and we've built a system with a whole bunch of people so that you can run old software.
It's completely fun. It runs in your browser: emulators for the Apple II, Commodore, and Atari machines are cross-compiled to JavaScript, so they run right in your browser. It boots up an Apple machine, then loads a virtual floppy, and it's running. It's completely weird. It was the first time I'd ever actually seen VisiCalc run, and we've been able to do this with lots of different software. It's becoming very popular: arcade games and the like. Here again, we put things up, and if publishers say, hey, I'm still trying to make money from that, we take those down. That approach has been working very well. If you're respectful, straightforward, and non-commercial, it's been working out very well for us. What we're probably best known for is archiving the World Wide Web. We started in 1996 by taking a snapshot of every web page from every website every two months: snapshot and start again, snapshot, start again. And it's starting to get big. We now have over 400 billion pages in this dataset, and we made it available as the Wayback Machine. You can type in an old URL, see past versions, and surf the web as it was. Have people used the Wayback Machine, by any chance? Okay, a whole bunch. Try it, thank you. We thought this would be used by people trying to get their old stuff, or maybe by researchers. It turns out to be wildly popular. We get about 600,000 people a day using it, about 2,000 queries per second against this database. People just love it, and they look for all sorts of things, like old Yahoo pages, which give a kind of uncluttered view of a search engine, kind of like Google today. Or Pets.com, the kind of wacko ideas from the 1990s. Even not terribly well done early websites: they're maybe a little embarrassing, but people are still proud of them. But there's also another reason to do this.
This was pointed out by a user. This is a press release from the United States government, the White House, about George Bush on the aircraft carrier, announcing that combat operations in Iraq had ended. A day or two later, they changed the press release, and it now says major combat operations in Iraq have ceased. They changed a press release, with no note of it, nothing. There was no record of it, except probably in the Wayback Machine. That's why we need to make it so you can't rewrite the past. It would be a George Orwell nightmare if we couldn't see the past as it was, and if third-party organizations couldn't do this. And with all of this work, we're still getting support from the Library of Congress, the National Archives, and the like. So it's part of our ethos to bring this type of material alive. We now have Archive-It, so we don't just have robots doing the crawling. We now have a thousand librarians figuring out what the important things are to collect. We work with the National Library here to crawl the .nz domain and add that to the collection here and to the Wayback Machine. We've also worked in collaboration with many others on things like the tsunami collection, where many people came together to archive things that were important for understanding that event. But if you look now, the original sites are gone. So: archived, gone. Even with these collections, we have to move fast; otherwise things go away. The average life of a web page is 100 days before it either changes or disappears. Therefore we can't wait for things to come to us; we have to go and work with them. We're also getting better at rare books, letters, and archives, basically just photographing things with classic gear and an efficient way of moving through the process. You can do this kind of thing easily.
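Seeing a page "as it was" on a given date, as with the edited press release, comes down to choosing, among the stored captures of a URL, the one closest to that date. A minimal sketch, assuming the capture timestamps for one URL are kept in a sorted list and searched with binary search; the timestamps and page texts below are made up, and this is not a description of the Wayback Machine's actual index.

```python
from bisect import bisect_left
from datetime import datetime


def closest_capture(captures: list[datetime], when: datetime) -> datetime:
    """Return the capture timestamp nearest to `when`.

    `captures` must be sorted ascending; ties go to the earlier capture.
    """
    if not captures:
        raise ValueError("no captures for this URL")
    i = bisect_left(captures, when)
    if i == 0:
        return captures[0]
    if i == len(captures):
        return captures[-1]
    before, after = captures[i - 1], captures[i]
    return before if when - before <= after - when else after


if __name__ == "__main__":
    # Hypothetical captures of one press-release URL (timestamps are made up).
    versions = {
        datetime(2020, 1, 1): "Combat operations have ended.",
        datetime(2020, 1, 3): "Major combat operations have ceased.",
    }
    ts = closest_capture(sorted(versions), datetime(2020, 1, 1, 6))
    print(ts.date(), versions[ts])
```

Binary search keeps each lookup logarithmic in the number of captures, which matters once a popular URL has been captured many thousands of times.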
Next up is personal digital archives. People's archives are now spread across all sorts of systems. If we're going to have a record of what people were doing, we're going to need Flickr, YouTube, Bandcamp. We're going to need to pull these things back together before these companies get out of that business or fail. So content-wise, I'd say we could probably bring it all in, and we could even make it accessible within the rights regimes we live in, and do this in such a way that we're actively supported in doing so, not rogue organizations. So we can do this. But then there's a question: how do you preserve this long term? Because the motto of libraries is preservation and access. How do we do both? We don't exactly know; digital things are a little tricky. But we should always look back at history. The Library of Alexandria, which is sort of the progenitor of all of our work in the library field, is best known for burning, right? It's best known for not being here anymore. It's completely gone. Even as a center of learning for hundreds of years, it's gone. So if we want to learn from the past: make multiple copies and put them in other places. If they'd put copies in other countries, we would have all the other works of Aristotle, the other plays of Euripides. When we started this project, they were rebuilding and making a new Library of Alexandria. We asked, should we work together? They said yes, so here's a copy of the web collection and some of the movie collection. This is the Library of Alexandria in Egypt, which is actually running an active mirror of parts of the collection of the Internet Archive. It was 200 terabytes when it first launched in 2002. We also wanted to have copies in other places around the world, so we started a stichting, a nonprofit foundation in Amsterdam, and thanks to XS4ALL, they've hosted a partial collection.
So, let's see, we've got a copy in a flood zone (Amsterdam), in the Middle East, and in an earthquake zone. What could go wrong? But hopefully we'll have more copies in other places as we grow. We started getting better at making our own machines, open-source machines, to serve things from. This was one of my favorites. We had the idea of using a shipping container as a data center, or better yet, actually making it into a computer. This sat outside for a couple of years, and it was the Wayback Machine. It was a computer that was a shipping container sitting outdoors. So if you want to know how big the web is, you have to ask: how big is the web? It's eight feet by eight feet by 20 feet, and it weighs 26,000 pounds. That turns out to be about 80 micrograms a hit. That's what the web, at that point, weighed. We now have a new generation of machines that we've been using for about five years, and we've archived over 10 petabytes. It went mega, giga, tera, peta. And it's doable even for an organization like ours. These are actually in our building. It's an old church, and we installed the machines into our building so that people would feel that it's part of them, not someplace else. It's not in a co-location center or in the cloud, wherever that is. Whenever I see pictures of these data centers, they look like the kind of place Darth Vader lives. They're sort of evil and dark. And it's like, no, these are our stacks. These are our books. These are the things that we are, in digital form. So let's bring them into our lives in a different way. For digital archives, we have the ability to host your material. So if you're finding that the costs are prohibitive, just start putting things on us. Or maybe use that with your IT teams to say: hey, they'll do it for free, so why don't we do it more cost-effectively than we're currently doing it?
We're also starting to collect physical materials. We thought we didn't want to. When we were digitizing books, we'd give them back to the libraries. But we're finding that libraries are throwing books out at scale, and for a reason: people want things on screens rather than taking up valuable floor space. So we started getting good at actually storing books, music, and video. We use shipping containers, and we put things in boxes so that they're basically a preservation copy, while the access copy is online. It's really inexpensive to store things this densely. So if you're in these tussles internally about whether you should throw things out, maybe look us up and say, you know, we could do it the way they're doing it. These are in acid-free boxes, all labeled, on pallets that have catalog data on them, in shipping containers, in warehouses protected by nonprofits. We hope this structure for building long-term repositories will be a different form of repository: less accessible, but a lot less expensive for making lots of things available in the future. We've now got about four containers full of movies that people have donated. And this is where our LP and CD collections are. We just got a lot of microfilm and microfiche. So it is possible not to throw these things away, and to make them accessible in the future. So lastly, universal access to all knowledge. I think it could be up there with the Man on the Moon, the Gutenberg Press, the Library of Alexandria: something that's remembered for millennia. I think universal access to all knowledge could be one of the great gifts that our generation gives the next generation. And lastly, carved above the door of the Carnegie Library in Pittsburgh by Andrew Carnegie, known as a capitalist, is "Free to the People." Thank you very much. Do we have any time for questions? We have time for questions if you have questions.
All right, I'll just talk a little bit. There's been a video on the Twitter channel, so please, questions. How do we get all of New Zealand online? What would it take to host a whole version of the Internet Archive? Well, the Internet Archive costs $12 million a year. The hosting costs are just a fraction of that. About $5 million comes from libraries paying us to digitize books at $1,000 a day. And then there's the collecting; the web is about $2 million. So the cost of hosting is certainly less than the $12 million total. I'm not sure you want to do that, though. I think you might want to really concentrate on what's important to New Zealand and digitize the whole damn library. Just do it, and work together so that it's representative of the different constituencies and interests and traditions here, and so that these libraries keep their own curatorial function. As for the idea of having us send all the New Zealand web pages back here, there might be ways to do it, and maybe that's another stepping stone to getting big. But it's all doable. It's tricky, and you have to want to do it. But it's doable. We try to work with commercial organizations, but there's often a big gap between the nonprofit world and the for-profit world. It comes and goes as to whether it's useful enough to them to deal with us, because we're open-access oriented and so we have a fairly different business model. We give everything away, and we give everything away in bulk. So it's not just, oh yeah, it's open access, you can get to one page, two pages. No: if you want everything ever written in Hungarian, you can get it. And that type of business model is not very aligned with most corporations, which are very, very property oriented. So it would be great, and maybe it will happen, and sometimes some of the newer internet companies do write us into things.
Like when GeoCities was going down, Yahoo contacted us and gave us special access so we could just fire-hose it, basically download the whole thing, and make sure we had a really good snapshot of GeoCities before it went down. So there are starting to be Internet Archive-aware web companies, but the old-style publishers are really designed around a very different scarcity model. We're a supporter of the Digital Public Library of America, and a lot of our metadata records go into their world. That effort really came out of the Google Books Project, historically. The big problem I have with the Digital Public Library is the word "the." I don't want the Digital Public Library of America; I want lots of libraries. I want lots of really quirky, weird libraries. I want libraries that specialize in different things and can work together, so that an end user who goes to Google, or whatever comes next, can find things in all of these collections. But we want people to feel ownership and control, and so far so good. They're basically just a metadata aggregator, but if they really follow through and become the library to end all libraries, then we know that story doesn't end well. Thank you. What proportion of our effort goes into metadata? Well, we pretty much use the metadata we can get from others. When we digitize books, we get metadata from other places, and when we're doing music we get it from MusicBrainz and FreeDB and a little bit from Gracenote and different places. So we don't have catalogers per se working at the Internet Archive. We're a tech staff, if you will: there are 40 tech staff and about 100 people digitizing books. That's the staff of the Internet Archive, so it's really the partners. We work with 500 libraries that source books to us.
We work with 300 organizations where we collect the web on their behalf, and they do the cataloging and outreach. So we're sort of the tech back end for a lot of these, and that's what I mean when I say we're building libraries together: we're just playing our role within the broader library system. Yes. You're doing books; what about magazines? Yes. We're doing only okay on serials. We really expected we'd get a whole lot of microfilm. We bought a bunch of microfilm machines so we could just burrow through all of the newspapers that have been microfilmed over the years, and the libraries just were not forthcoming in lending us the microfilm. So we didn't make as much progress there as we could have, and likewise on journal literature, which has become an enormous problem in academia and higher education. We've got 1.5 million journal articles, including from the Biodiversity Heritage Library and things you'll hear about through this conference, but that's way short of the roughly 30 million that really make up that corpus. And it's got these problems, these old business models: one of our colleagues got in so much trouble over JSTOR that they were threatening to put him in jail for 35 years for reading too fast in a library.
So we have a problem out there: the new types of research, these data-mining research systems, don't work terribly well with the existing players, even if they're nonprofits. So we're in the middle of a shift. We've been trying to get the monographs going, we're doing okay on the serials, and on the magazine front we've made some progress with popular magazines like Creative Computing and Omni Magazine. These are often digitized by volunteers, and not just volunteers, just people out there, and we've been hosting them. Most of them have survived the scrutiny of whatever publisher still exists. So just proceeding, and then taking down the things that are a problem, has been working in the area of magazines. I would encourage that general approach: be respectful, be open, be non-commercial, but try to bring access to things that are no longer accessible through the commercial structure. That's been working. Brewster, thank you for an inspiring overview of your work with the Internet Archive. I'm just thinking forward to an apocalyptic environment where we might have to access those shipping containers. It seems like a very effective way to make sure we've got a copy of things that have been digitized. How do you access them, though? How do you actually get the stuff out? Do you have to unpack a whole shipping container, or how does it work? Yes, so the question is about an apocalyptic future where we actually have to get back into these shipping containers. How are we going to get access to them? And it's clunky. A lot of the off-site repositories that libraries build now have requirements like being able to pull any particular book within a 24-hour period, and that makes these repositories very expensive. But I'm very glad that they're there, so I'm not suggesting we do away with those.
This is meant to be a different, augmented method of storage. We're better designed for either an apocalypse or somebody challenging something and saying, hey, George Orwell didn't say that, he said this. So when people change the facts, we can get back to an original seed, if you will, sort of the seed-bank idea, to be able to understand what was. Or if somebody had a new idea for how to digitize books or analyze them in some way: I'm not so arrogant as to think that we're the last ones who will want to do this. Checking out 100,000 books from a library is a burden, and most libraries say, can you come back on Monday? We're really well designed for somebody checking out 100,000 or a million books at a time. So I'm hoping that it's useful in those ways. Otherwise it's some money that's kind of wasted, but it's so inexpensive. It only costs us a couple million dollars to buy a warehouse, and then another million or so to outfit it, and you can store six or seven million books in it for decades. We have to be kind of careful about humidity and temperature, but we can handle that passively, so even as funding goes away and political swings come through, it'll survive. So it's really designed for long-term seed-bank use. Another couple from the back? Yes. If somebody pays us 10 cents a page, we'll digitize anything. So we'll do the same book over and over again, whatever, because that basically covers the cost. When we receive things like these donations of newer books, we check whether we've done them already, and that's a little bit of a fuzzy process. ISBNs help a lot. OCLC records help a lot to figure out what hasn't been done. And then we just try to do it. We try to bias toward things that people are going to use. But we're coming out with a new thing in this next year that I'm kind of excited about: a tabletop version of that scanner.
You know, you saw those scanners in the room, but this is a tabletop one that's meant to be operated by anybody, and it costs about $10,000. What we'd like to do is tie it back into our system, so that if you put a book on it, it'll say, oh, we've already got that; is this version good enough for you? If so, then ding, you don't have to digitize it. You get the digital version as if you had scanned it. So it might let us coordinate our collection capabilities, so that you just start going down the row finding which books aren't online yet, and if they're already online, you get them as part of your library collection. This is what we're doing now with the CD collection, and we're finding that even in these deep archives, we already have about half. So it decreases the cost of digitization. But our goal is to get everything. And it's not that hard. It's not that big a deal. If we were to work together, it's a small part of our budgets, frankly, to just say, let's nail this puppy and get it all digitized and done. Yes, sir. Okay, and then next. I see you in the dark in the back; you're next. Yeah, that's our budget. So you've been around for many years now. Yeah. I always think of the Internet Archive as this kind of crazy, fantastic cottage industry trying to create universal access. But you haven't sold that concept of universal access to the big boys at Google and Facebook. Do you think they're going to do that, the big boys with the big money? Google went charging off to do this. In terms of an organization that's closest to us in vision, though much, much bigger in scale, it's Google. And they did 20 million books. But they did it in a way where they locked up the public domain.
So they took the public domain and, working with libraries like Stanford and Harvard, came up with a deal, a back-room deal, to put it in the HathiTrust so that you have to be a subscriber to have access to the public domain. And if there's a sin in our world, it's locking up the public domain. It wasn't necessarily Google; it was actually these libraries that came up with it. Then they came up with this cockamamie scheme to build the Book Rights Registry, a centralized organization that would take control and ownership over all the orphan works, all the out-of-print books. And this was beyond the pale. They did it through a class action lawsuit. It was really a deal brokered by the libraries and the Authors Guild, and it doesn't pass the smell test. It went so far that the Copyright Office said it violated treaties, and the Justice Department said it violated antitrust laws. The country of Germany and the country of France objected: not the publishers in Germany or France, but the countries themselves. And this was a class action settlement in a minor court in New York. So I think leaving it to those guys to do the right thing is not a good idea. There are people like us who are paid to do things differently, things that are not commercially viable. We're not going to get the big boys to figure out how to do OCR of Balinese lontars. It's going to come from us, those of us who care about the long tail. So the Wikipedias, the EFFs, the Mozillas, the Linux Foundation: there's a whole infrastructure class that has a fairly different priority scheme, but it's always been this way. Libraries and publishing have always run in parallel. So I would say more of the burden is going to be on people like us, and we'll get bits and pieces of help from these guys.
Microsoft chipped in $10 million to digitize a pile of books, basically to put a thorn in the side of Google and to get us going. So thank you, Microsoft, and we continued on. And Microsoft is jazzed about it, because we're continuing to put a thorn in the side of Google and it doesn't cost them anything on an ongoing basis. So I think we have to act within our sector. I think one of the brilliant things about the internet was the distinction between the different classes of organizations: .edu, .gov, .org, .com, .mil. These are sectors; it's sort of an ontology, a philosophical breakdown of the different organizational structures. And it's true, these types of organizations think differently. .gov thinks differently from .org. So we work a lot with .gov, because we can do things that the .gov guys can't do, but we can't raise taxes like they can. So there are ways for us to work together. I think appealing to the for-profits is going to be difficult. But appealing to big players like you, and thinking of you as the library system of New Zealand, is completely within scope. And at the back. You've given us a very potent vision of universal access for all, predicated on concepts of freedom of information, freedom of speech, and the public domain, with everything available 24/7, et cetera. We live in New Zealand; it's a bicultural country, and under the principles of the Treaty of Waitangi we have to respect our indigenous cultures. I come from Canada, where First Nations have concepts of ownership, access, preservation, permission, et cetera. So in your dealings with various communities over the time of the Internet Archive, could you share with us the negotiations you've had with indigenous communities and the terms of reference you might have negotiated, so that what you call the rights regime may have a different nuance with different communities of origin? Yes.
I don't have deep experience with indigenous groups, but I'd say we're all indigenous. Every culture, every community has different rules for what's permitted, what's acceptable, what's celebrated, and what's kept below the surface. So the task is building systems that can respect those different priorities in those different areas. For instance, in the United States, an awful lot is allowed in terms of political speech, but child porn is the third rail, or almost anything having to do with Muslims. Those are the hot-button areas there. In China, there are movements they see as rebellion movements that they think should not have a voice. When we were digitizing the Balinese materials, they said, watch out for the black-magic lontars. So they wanted a different level of control on those, because information is power. So how do we build these systems to respect those different communities? I think it's an important reason why we shouldn't outsource all of our information and curatorial needs going forward to Google, or to the Internet Archive. We really need to be participating. We need to train our staff in digital technologies. We need to participate and own and control things to the extent that it helps overall, but still weave it together in such a way that end users can find things across these collections. Because if we build too many silos, people won't come. We'll end up having spent millions of dollars on these really cool projects with big ribbon cuttings and ceremonies and the like, and then you look at the usage patterns and people aren't coming. So we have to make things findable, but reflect the communities that are putting them up. At least that's been our solution so far. I think we're going to wrap it up now.
I think if you want to talk to Brewster, you can catch him later in the break. I'm around all day and tomorrow. So thank you very much. Thank you. We'll have a small talk on that. Thank you.