We've got Brewster Kahle right here. He did a very good talk last night (well, last afternoon) about distributing the web, and we had a good turnout there. It was a really awesome talk. Thank you. Right now he's gonna be talking about the Internet Archive, which you probably know. He's the one that founded it, so he knows a lot about it, I think. Here he is: Brewster.

There was a guy sleeping, snoring very loudly, right next to my tent, so I know that not everybody is awake yet. This has been completely great.

So I'm gonna talk about... basically, you can think of the Internet Archive as hacking the copyright system, or trying to get institutions to do things that they're not used to doing. Or, the way I like to look at it: let's go back to the Library of Alexandria and do it again, and let's do a Library of Alexandria that's available to everybody. Can we make all the books, music, video, web pages, and software ever created by humans available to anybody that wants to have access to it? Can we do this? And it turns out that technologically you actually can: between the storage we have on computers now, and the internet in terms of getting it to people, you can do it. So you say, well, why hasn't it happened? There are a lot of institutional issues in trying to get this all to happen. That has taken a lot longer than I thought, but we're getting there. So what I'd like to suggest is that universal access to all knowledge is within our grasp, but we need a lot more help to be able to get there.

So who are we? The Internet Archive is a non-profit library. Is this showing? Yeah. It's a non-profit library in San Francisco. Please visit us. We've been around for almost 20 years, and the idea is to try to do the pieces of the internet that haven't gotten there yet.
So as people are going and making things available on the net, they're mostly forgetting about the old things and the like. We see ourselves in the tradition of libraries. I like looking at what people carve in stone. What they carved in stone above the library in Boston was "Free to all," and this was put there by the robber barons. These are the capitalists that were not nice men, right? These guys were all about property and mine, mine, mine. Yet they carved "Free to all" above the library. That was their legacy. Why? Because information serves a different purpose than just selling stuff back and forth. So this is the tradition that we're in.

Now, I'm an engineer, so I go at any problem from an engineer's perspective. If we want all books, music, video, and web pages up there, you have to say: well, how big is it? How hard a problem is it? Where do we get it from? How do we make it all work? So that's the structure of this talk.

All right, so let's start with books. We want to do all books. The biggest book library by far is the Library of Congress, and they say they've got 28 million books. So 28 million books is by far the largest library ever made in the world. A book is about a megabyte if you have it in Microsoft Word. So 28 million megabytes: mega, giga, tera, that's 28 terabytes. And 28 terabytes is four hard drives that you can buy at a local store. So you can have all of the words in the Library of Congress in a shopping cart for less than you pay in a month's rent. Something has changed. Something's happened. We could actually think about having all of this history easily accessible.

Then the question is, would you want to do it? And the answer is yes. We're pretty used to having books on screens, even scanned books. So not like on Kindles, but, you know, images of pages. The screens are good enough, and you get these beautiful books. But you can also take it another step in some places.
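
The back-of-the-envelope estimate for books above can be checked in a few lines. This is only a sketch of that arithmetic: the book count and the one-megabyte-per-book figure come from the talk, while the 8 TB drive capacity is an assumption added here (the talk says only "four hard drives"), and decimal units are assumed throughout.

```python
import math

BOOKS = 28_000_000   # Library of Congress book count quoted in the talk
MB_PER_BOOK = 1      # rough size of one book's text, per the talk
DRIVE_TB = 8         # ASSUMED drive capacity; the talk does not state one


def text_terabytes(books: int, mb_per_book: float) -> float:
    """Total text size in terabytes, using decimal units (1 TB = 1,000,000 MB)."""
    return books * mb_per_book / 1_000_000


def drives_needed(total_tb: float, drive_tb: float) -> int:
    """Number of whole drives required to hold total_tb terabytes."""
    return math.ceil(total_tb / drive_tb)


total = text_terabytes(BOOKS, MB_PER_BOOK)
print(f"{total:.0f} TB of text on {drives_needed(total, DRIVE_TB)} drives")
```

With these numbers the text comes to 28 TB, which fits on four drives of the assumed size. Scanned page images, as opposed to plain text, would be far larger and are not covered by this estimate.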
They say, well, we don't all have screens; we're not all online. Can you print it back out again? So we made a bookmobile. It's a print-on-demand bookmobile: we put in a satellite dish, a printer, a cutter, and a binder, and kids make their own books. It costs about a euro a book to download, print, and bind a book. Cheap. It's actually cheaper to do that than to lend it from a library: a study at Harvard said it costs $3 just administratively to lend a book. So for small books you can actually make things available, as long as people don't yell at you.

In India we went and made a couple of them. This is the first day at the Library of Alexandria in Egypt: an engineer working with a kid, a happy kid with his own book. And we did it even in Uganda. This is the first book this girl has ever owned. So we could take not only our books and music and video and make them available to us, but we can make them available even another step out there, which is pretty cool. There are these Rube Goldberg machines, these oddball things that do it on demand; they can make books, and they come out a chute.

But I think the real way things are going, as we all know, is more in the area of screens, and the screens are getting so good that we can actually do beautiful books that are a pleasure to read, and take our books and make them available in lots of different formats. My favorite on here is in the bottom right: it's a little talking machine for the blind and dyslexic. It talks a little bit like this, but they now have access to millions of books that they never had before.

Okay, so now you're convinced, maybe, that it's a good thing to have it up there, and that we have the storage to be able to have it up there. But how do you get it done? Well, we've been doing these different things, like putting scanning centers up. This is at the Library of Alexandria. This is a guy.
He doesn't look too happy. Well, anyway, they've scanned about a hundred and seventy thousand Arabic books there, and it's been continuing along. Then we designed and built our own scanner, called the Scribe, and we made these scanning centers. This is the one in San Francisco, where they're doing microfilm down the center. And we've gotten it so that it's fairly efficient to basically turn the pages. You say, well, shouldn't you use robots? We've tried the robots, and they don't work very well: they tear the books and they're expensive. I think they could work well, but the investment hasn't been made, either by us or by anybody else. So we've been doing it basically by hand and getting beautiful books done. These are rare books from Korea. These are biology books out of China, from working with the Chinese Academy of Sciences. And we've now set up 33 scanning centers in eight countries where libraries are doing this.

You say, well, Google has already done a lot more than we have. About ten times more than we have, actually, but they have a lot more money than we do, and they locked it up. They basically took even the public domain and made it property again, and this is wrong. I mean, if there's a sin in our world, it's locking up the public domain. The public domain is small enough as it is. We should be arguing, maybe, about what's in copyright. So if they're the Microsoft, we're the Linux, and we're digitizing pretty fast.

So I've been going around and asking different places: can we get everything ever written in a particular language?
So I got to meet with folks in Greece, but they were kind of busy imploding. Then there is Iceland: we got yeses out of parliamentarians, we got yeses out of the heads of the libraries, and out of 300,000 people in Iceland there was one person who decided they were in charge of the No Department. So they said no, and it all ground to a halt. But Bali said yes. So we started working with the Balinese to digitize everything ever written in Balinese. We just want to do it all, so let's see if we can get whole languages.

It turns out the way the Balinese write is not on paper but on palm leaves. They scratch it into palm leaves. That's completely cool. And so these are the priests that worked with us to digitize these things by photographing them. They're just completely beautiful. And so now we've digitized and photographed everything written in Balinese. When we asked them, how do you read your palm leaves, they said, well, most people don't read. It's either the priests, or there are these cool performances that are the culture: shadow puppets or performances. And so we started videotaping these and starting to put them online. So I'd like to just give a round of applause for the Balinese, for being the first culture to go completely online.

Don't you think we should do this with Turkish or Dutch or Danish materials? We can basically do this in such a way that their businesses still work and it still comes out right.

So, scanning centers: we're doing about a thousand books every day in these scanning centers all over the world. We've got about three million free ebooks that are public domain, and we have modern books that are available for the blind and dyslexic, "modern" meaning probably in copyright. But we're also doing a lending system. Okay, here's hack number one on how to get things available to people even though they're in copyright.
So we try to buy books from publishers so that we can lend them one person at a time. The publishers in general have said no so far. So we've been digitizing books, and we still lend them one person at a time. So where Google got into trouble with lawsuits and the like, we've gotten into no lawsuits. The way this works is you can go to openlibrary.org and click on a book, say this "HTML5 for Beginners," which is actually a book that we bought, but you'd see that it's checked out by somebody, so then you have to put it on your waitlist. But if you go for a less popular book, say this history of Mayflower descendants from the Boston Public Library: surprise, nobody's checked it out. So you can say, okay, I want to borrow this book. You have a choice of formats, and then you borrow the book. Another thing that's cool about this is that it's borrowing it from the Boston Public Library. So these real libraries are digitizing books that are in copyright, non-rights-cleared books, and lending them, just like we are, as a library. This has been going on for four years and it's been just fine.

So it's a mechanism for trying to be respectful of those that are trying to make money off of this stuff, but still having access happen. We've been trying out this whole approach of how far you can go working with publishers but not working for them, and basically building a library system and making it work. So we've been able to get books by the hundreds of thousands that are current books and make them available.

Okay, books.
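
The one-person-at-a-time lending described above, with a waitlist for books that are already checked out, can be modelled with a very small data structure. The sketch below is purely illustrative: the class and method names are hypothetical, it is not Open Library's actual implementation, and the automatic hand-off to the next person on the waitlist is an assumption the talk does not spell out.

```python
from collections import deque


class LendableBook:
    """One digitized copy that is on loan to at most one patron at a time."""

    def __init__(self, title: str):
        self.title = title
        self.borrower = None     # patron currently holding the loan, if any
        self.waitlist = deque()  # patrons waiting, first come first served

    def borrow(self, patron: str) -> bool:
        """Lend the book if it is free; otherwise queue the patron.

        Returns True only when the patron actually receives the loan.
        """
        if self.borrower is None:
            self.borrower = patron
            return True
        if patron != self.borrower and patron not in self.waitlist:
            self.waitlist.append(patron)
        return False

    def give_back(self, patron: str) -> None:
        """Return the book; ASSUMED behaviour: it passes to the next waiting patron."""
        if patron != self.borrower:
            raise ValueError(f"{patron} does not have {self.title!r} on loan")
        self.borrower = self.waitlist.popleft() if self.waitlist else None


book = LendableBook("HTML5 for Beginners")
book.borrow("alice")     # alice gets the loan
book.borrow("bob")       # already checked out, so bob joins the waitlist
book.give_back("alice")  # the loan passes to bob
```

The point of the structure is that the number of simultaneous readers never exceeds the number of copies held, which is the property the lending approach relies on.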
Let's go on to another media type: music. This is an area that has more lawyers than business people, it seems. I mean, this is an area where people just like to sue each other. So we've had to be a little bit more careful about how we've gone about this, and we first started with rock and roll bands that wanted to be distributed. It turns out the Grateful Dead started a tradition of allowing people to record their concerts and then trade them on cassettes with other people, as long as no one made any money. That's been a key thing that I've found in all of this: no one made any money. And so as these moved online, the bands were up for being distributed online. So we ask for some level of permission, and it's usually the fans going and saying, is it okay to put your concerts on the Archive? And somebody has to say yes. Maybe it's the drummer, you know, or somebody in that community says yes. This is a lot less than what lawyers would like, with, you know, signatures and all of that stuff. It's like: is it okay? Yeah. And if it ever becomes not okay, then we take it back down again. But that's only happened once, and we now have 6,000 bands up there and 130,000 concerts, and everything the Grateful Dead's ever done. So the idea of getting music up there and out there is starting to work as well.

There are these old collections that were on old websites, before MP3 was a format that was standardized. It was the bad old days of AIFF. Anybody remember AIFF? Yeah, bad news. Anyway, there were these sites that were trying to distribute music. One of them was the Internet Underground Music Archive. They died a long time ago, and so we're now...
We have them up. We've got lots of net labels that are using us for free hosting, and we've been working along with lots of different record producers now to start to do CD digitization. There are some engineers, actually here in Amsterdam, who have been doing CD digitization software to try to help get all of this stuff to work well. And we're starting to get donations of LPs, 78 rpm records, and the like, and starting to do mass digitization of these different formats.

So why do this? Well, we haven't just gone and put everything up on the net. We'd like to maybe do 30 seconds and then point to amazon.com or something like that. But so far what we've done is made it available to researchers and the like, and in listening rooms, so that at least on campus you can have full access to it. So we're starting to get better at music and the whole area of music. We've now added to our collections other audio recordings that are freely hosted on the internet. The idea of having infinite storage and infinite bandwidth, forever, for free, is a very compelling offer for some communities. So we basically make things available and put them up. So even audio is doable.

Moving images. Most people think of movies as Hollywood films, and we're not that good at collecting this stuff yet. So mostly we've been doing old films that haven't been particularly distributed through Hollywood, like those old films you saw in high school when they had a substitute teacher. They'd wheel in this projector and they'd show you why to be a typesetter, or, you know, these old "Are You Ready for Marriage?"
Anyway, we've digitized these and made them available, and people love them. I'm not quite sure why, but they're there. And people have been uploading things since long before YouTube. YouTube really ran away with the whole area of video hosting; they own it. But there are still about a thousand people a day putting things up in the Internet Archive, because they want them maybe more permanent, or for some other reason. We've been doing digitization even of VHS tapes, which almost all have rights problems, but we find if we do ones that aren't on DVD, nobody gets mad at us. So we're getting better at digitizing.

Even television is doable. We started archiving 20 channels of television in the year 2000: Russian, Chinese, Japanese, Iraqi, Al Jazeera, BBC, CNN, 24 hours a day, DVD quality. The idea is to at least hold on to it. And so we've got 9/11 collections, but we're now starting to lend television. If you go to archive.org, you can search on what people said, if it's in US television news, because that's the only thing where we've got the closed captions. Those are the transcripts of what people said. You can type in, say, "I want to see things about Edward Snowden" and find all of the clips that have Edward Snowden in them, and you can then take those clips and put them in your blog, or request a DVD of the whole program if you want to make a documentary. And this is working. Even though it's an enormous amount of material, the publishers, that is the networks, are happy about this. So we're actually able to take steps toward making things widely available.
We want everyone to be a Jon Stewart research department, like on Comedy Central, where they say: here's what a politician said before, and here's what they say now, and, you know, it doesn't match. That type of thing, to have people think critically about television. So even moving images is doable.

If you do this kind of thing that we're doing, offering free hosting, sometimes you attract attention from people you don't like. So the FBI gave us one of these nasty letters called a national security letter. A national security letter is when they demand information about our patrons, the users of the Internet Archive, and we can't even say to anybody that we've ever gotten this request. So we got one of these things, and we got our lawyers, the Electronic Frontier Foundation. Hooray for the Electronic Frontier Foundation. I said, what can we do about this? And they said, well, you have to comply. Well, can I talk to my board about it? No. Can I talk to anybody about it? No. Can I ever talk to anybody about it? No. What happens if I don't do it? Jail. Oh. Is there anything we can do? No. Really? Well, you can sue the United States government. So we sued the United States government, and won.

There have been hundreds of thousands of these letters sent out. There have been only three organizations that have publicly pushed back on the government, and they've all been libraries.
What's great about being a library is you're allowed to say no. There's a long history of people being rounded up for what they've read, and bad things happening to them, and people remember this. And so where Google hasn't, at least publicly, said no, for libraries our role makes it so that it's not an embarrassing thing to do. So we find that being a library is a good thing.

Software. There's a lot of software out there, and we're getting better at reproducing this software by running emulators in the browser. This was a real mind-blow: taking C emulators for old Apple or Commodore or Atari software and cross-compiling them with Emscripten into JavaScript, so they run in your browser. You click, and it actually boots an old IBM PC in your browser, and you're running your old game. It turns out this is very, very popular, because I guess a lot of people spent a lot of their early days playing games. Anyway, they're now back to Oregon Trail, and all these other games are very popular.

What we're probably best known for is crawling the World Wide Web. How many people have used the Wayback Machine? Yay. The Wayback Machine is a way you can see the web as it was. So many people are pouring their lives into the web, but web pages only last a hundred days on average. So we go through and try to archive them. We started when it was pretty small, and it's now getting pretty freaking big: we archive about a billion pages every week to be able to create this Wayback Machine. This is what Yahoo looked like in 1996. Pets.com, with the little sock guy. Dorky old web design. And I thought, oh, why don't I go and look at what Chaos Communication Camp looked like? So this is the Chaos website from 1997. But it actually looks a whole hell of a lot like the current one. So there's a little retro thing going on.
So it's not quite as dramatic.

There's another thing that this has been used for. A user came back and said, hey Brewster, there's been a change, and your web collection is the only place that can show it. This is a press release by the United States government about the president being on an aircraft carrier saying "mission accomplished" about the war in Iraq, and it says that the president announces combat operations in Iraq have ended. Then a couple of days later they changed it, and they put in "major combat operations have ended." They didn't give any notice that they had changed a press release. It's George Orwell, right, to be able to go and redo press releases from the past. It's living in the day. So this is an example of why we want something like a Wayback Machine.

Another is that we have a web archiving tool, a subscription-based service that a lot of companies and libraries and museums pay us for, which is helpful. It keeps our lights on. And we now have 1,700 curated collections, like the Japanese disaster, where people have come together to say, you should archive this, because soon it's offline; or archive these things, soon they're offline. So it's been a community project to work together to build up what the key things about an event are, to make sure that we really have it done well. So even web is doable.

The World Wide Web collection of ours is probably about 10 petabytes of data, and it's growing at a few petabytes every year. We have about 450 billion web pages, and we get about 600,000 people a day using it. It's a database of 450 billion pages that gets queried about 2,000 times a second. So that's sort of what the Wayback Machine is, and it's been much more popular than we thought it would be, which has just been great.

But even with rare books and letters, you can do these things by photographing them and making them available.

Next up for us is personal digital archives.
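
To put the Wayback Machine figures quoted above in proportion, here is a short sketch that derives a few averages from them. All four inputs are the round numbers from the talk; decimal units are assumed, and the derived values are rough averages only, not measurements.

```python
PAGES = 450e9            # archived web pages, per the talk
ARCHIVE_BYTES = 10e15    # about 10 petabytes, per the talk (decimal units assumed)
QUERIES_PER_SEC = 2_000  # query rate, per the talk
USERS_PER_DAY = 600_000  # daily users, per the talk

SECONDS_PER_DAY = 24 * 60 * 60

# Average stored size per archived page, in kilobytes.
avg_kb_per_page = ARCHIVE_BYTES / PAGES / 1_000

# Total queries per day at the quoted rate, and the share per daily user.
queries_per_day = QUERIES_PER_SEC * SECONDS_PER_DAY
queries_per_user = queries_per_day / USERS_PER_DAY

print(f"average page size: {avg_kb_per_page:.1f} KB")
print(f"queries per day:   {queries_per_day:,}")
print(f"queries per user:  {queries_per_user:.0f}")
```

On these numbers the average archived page is a little over 22 KB, and the quoted query rate works out to roughly 173 million lookups a day, or a few hundred per daily user, which is plausible if a single page view triggers lookups for many embedded resources (an interpretation, not something stated in the talk).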
How are we going to do this stuff that's splintered across all sorts of places? People don't even have them. It's not like boxes in people's basements anymore. It's not even hard drives that you have. It's these Flickr sites and these other places where you've gone and put your memories, and, guaranteed, these guys are going down. Even the rich companies, like Google: Google Video, ever heard of Google Video? Well, it used to exist. There were six million videos on it, but they took it away. Yahoo Video is now gone. GeoCities is famously gone. Apple Computer, the most valuable company in the whole world, couldn't figure out how to run 200 terabytes of MobileMe on a continuous basis. So we archived that and are making it available. So don't count on these places. They don't have your best interest in mind. They'll turn it off whenever they want to.

So how do you preserve this stuff, both the physical and the digital? If we want to build a Library of Alexandria version two, well, what's the lesson out of the Library of Alexandria version one? What's it best known for? Burning, right? It's best known for not being here anymore. So how do we make it so that we don't do that again? Well, let's have multiple copies. If we put multiple copies in multiple places that have different fault modes, then I think we have a better shot at it. So we gave a copy back in 2002 to the new Library of Alexandria. They've actually redone their first floor. If you get to Alexandria, go; it's a completely great city and library. So this is what it looked like in 2002. XS4ALL gave us free space, and they have for the last 10 years. Thank you, XS4ALL. And so there's a partial copy in Amsterdam. This is what it looked like in 2008.
This is what the Wayback Machine... oh, we came up with this idea of doing data centers in shipping containers. Sun ran with it, and they gave us one of these. So the Wayback Machine was this. And so, oh, I get to ask: how big is the web? You have to ask me, how big is the web? The web is eight feet by eight feet by 20 feet. For several years, when you used the Wayback Machine, you were actually using this shipping container that sat outside on Sun's campus, which is pretty great. We've now made these prettier machines, because we bought this cool church, and so there are these blinking lights inside. That is how we've been able to scale up.

We've also started to scale up physical collections, because libraries are throwing things away. So we've taken our ideas of how to do things really compactly, and we've used shipping containers inside warehouses. We have books or music or video in boxes: protected by a box, protected by a shipping container, protected by a warehouse, protected by nonprofits. The idea is to try to have layers of protection against certain types of attacks. But we're not there yet, because, okay, we've got copies in an earthquake zone, the Middle East, and a flood zone. What could go wrong? So I think we need some other copies in other places, and more participation towards keeping our whole society and cultural heritage safe.

A couple of other things that we're worried about at the Internet Archive are some of the ways of trying to be sustainable in the era of corporations. Corporations are basically getting real strangleholds on certain types of things, like end-user access.
So we've been doing free public Wi-Fi and the like. Every time we get a building, we put up free Wi-Fi with no passwords and stuff, and that causes people to get free Wi-Fi in some cases and to get mad in other cases, both of which are perfectly fine with us.

We've even tried to apply open source ideas to housing. Housing has become a real problem because of the debt burden, so we're starting to try to transition housing to be able to support non-profit workers that are debt-free. If you're interested in these ideas, we're really trying to figure out some of these things. If you're going to build a new housing system for people that are working in the open source world, then you're going to need organizations like financial institutions. We started a credit union, and we've been trying to work with some Bitcoin companies. Oh, did that make regulators mad. Anyway, the idea is not just trying to make the data sustainable and copying it forward, but how do you get the communities around these materials to be able to have lifetimes that work? We're way into Bitcoin. People have been donating Bitcoin; if you've donated Bitcoin, thank you very much. We pay our employees partially in Bitcoin, and that's all-around working.

Projects that need help.
There's a lot; everything needs help, but here are a few of my top ones. We're redesigning the Wayback Machine and could really use some help on trying to make this sort of new and different and neat. How do you do search at a level that we can actually do? Because searching 450 billion pages is too much for us, but maybe site-based search and the like. We're trying to get Elasticsearch to make all of our stuff more accessible, the full text much easier to use. And we're trying to build software that can be distributed, so that people can archive their CDs, LPs, and books in a distributed way and participate in bringing these things online. There's a couple of programmers in Amsterdam looking for more programmers to help. And the distributed web is sort of a new idea: if there's some mechanism to weave it together and make a next-generation web, then we don't have to archive by taking snapshots. We can actually archive working websites, so that they'll live on for tens or hundreds of years after the original administrators are gone.

So, in conclusion: universal access to all knowledge. It's possible to do.
I think it could be one of the greatest things humans have ever done. I think it could be one of the things that our generation gets to offer the world, kind of like the man on the moon, or the Library of Alexandria in ancient days, that we can pull this together. We have the technology to be able to pull it off. We've got the political will to live in an open environment, as long as we don't lose it. And we have to act pretty fast, I'd say, because most kids have turned to screens instead of books or old materials to find out information. They're learning from whatever it is they can get a hold of, and the best that we have to offer is not on the net yet. So we need to be bolder than we've been to make universal access to all knowledge happen. Thank you very much.

We have time for questions? No. So I'm gonna be hanging out; I'm at the food hacking camp, and I'm just gonna be hanging out here. But thank you very much for coming. Have a great day.

Is this... yeah, it's on now. I'm very sorry that we didn't have time. Okay, one more for Brewster. It was a very, very good talk.