All right. So just by way of a very brief intro of Alex: I suspect everybody knows him and why we're here. But it turns out that in the early days of the Berkman Center, 11 years ago, the Berkman Center was described as smart people in a hallway. This was two buildings ago, on the top floor of Pound Hall. It was just before I arrived as a student as well. And it was two students, Alex and Wendy. They kept everything going at the Berkman Center. They were the lifeblood of Charlie and Jonathan's vision at the founding of the center. And Alex, in wonderful ways, has never left us, actually, despite going on to quite extraordinary heights in the business of lawyering and the internet. So we're deeply proud of what Alex has done, but also deeply proud that Alex has stayed close to us. He remains an affiliate of the Berkman Center. Some key points along the trajectory from being a smart person in the hallway with Wendy and Charlie to today: Alex was a litigator at Wilson Sonsini, one of the premier law firms on the West Coast doing this kind of work. He represented companies big and small. He represented Creative Commons during this time, Internet Archive, and others. So he was working on many of the key intellectual property and related issues in the space. He founded a group of Harvard Law School alumni in the Bay Area, HLS Net, for which we were very grateful. He, along with another Berkman friend, Tim Ehrlich, brought together our community out there and started the tradition of doing Berkman West receptions, which gives us an excuse to go to San Francisco at least once a year. We owe you thanks for that as well. Along the way, he was recruited very early to Google to be, if not their first lawyer, one of the first lawyers.
Not the first.
Not the first, but very close to the first. And he has represented Google on some of the most important issues they've struggled with along the way.
One we're particularly excited about was the work he did to integrate Chilling Effects, Wendy's project from the smart-people-in-the-hallway days, into Google. So to this day, if you type in something that's been taken down due to a cease and desist letter, you will often find a link to the Chilling Effects website. They're the only search engine, to my knowledge, that does this.
You'll always find it. And Yahoo has just joined us in doing it, at least for part of it.
All right. That's good. But you've led the way so far. I will also note that if anybody types Alex Macgillivray or Alex Macgillivray Google into Google, you'll find an interesting thing as the first hit always. But I'll leave that to you to do. And it does have to do with that. I don't know what it is. I think it's telling, in fact. Along the way, Alex also picked up this little job of representing Google in the Google Book Search matter, which has become, obviously, an extraordinary story and one of great importance to all of us in the internet business. Many of us have called this the crucial topic of the internet era at this moment, and one that has exercised those of us in the library business in many interesting ways. It has also, I think, shaken up the industry in some fundamental ways. Many of us wish we had done it before Google did. The most recent series of things, of course, are that the settlement has been published and we're in the midst of discussing it as a world and as a community. And a judge will have to decide on this in the fall. The Department of Justice has recently said they're interested in some antitrust issues. We have Professor Elhauge here, who's working on it as well, and Phil Malone, an antitrust expert too. So that's another bucket of things going on. And to make matters even more interesting, Alex recently announced that he is leaving Google for Twitter to be their general counsel, which is happy news for Twitter, to be sure, if sad news for Google.
Anyway, I think you're here mostly to talk about the Google Books settlement and search generally.
Yeah, we can talk about whatever people want to talk about. I should have left for that introduction. I'm sure I'm three shades of red more than I was before. But one thing to add, just because there are so many interns in the room: the job that I do at Google is called Product Counsel. And it's a great job. And we're hiring. And basically what we try to do is work with product teams to make sure their products are legal, not just in the United States, but globally. And I really can't imagine a more fun job. So just a quick plug for that before I start. It's obviously wonderful to be back at the Berkman Center. This is actually my first time in the new building. And I really wanted to wear my I'm with Eon shirt. But I have some other meetings that require me to be a little bit more dressed up today. So what I was going to do is give a little bit of an introduction to Google Book Search and to the settlement. And then really open up for questions. Does anyone know how much time we have?
1:45.
OK, great. So why doesn't someone yell at me if I'm still talking at 1:15? So put up your hand or tell me. Good, yes. But certainly if I haven't paused for questions by 1:15, let me know. So how many people are sort of intimately familiar with the settlement, would you say? If you could put up your hand, so I sort of know how much I should cover. OK, great. So I'm going to give a brief introduction. And for all of you who are intimately familiar with the settlement, that probably means you've read the beast. I apologize for how long it is. But let's do a little bit of just high-level, sort of why we did Book Search in the first place and then why we saw the settlement as such an opportunity. So first of all, why we did Book Search in the first place: very straightforwardly, it was to make books easier to find.
Right around the time that we were thinking about Book Search, there was a great article in The New York Times that was a bunch of librarians lamenting of the fact that people were no longer using books. They were just using the web. And that librarian's lament was echoed to me personally by a number of folks, one of which was Larry Lessig, who had a research assistant from this fine institution come back to him. And the research assignment was, find me everything Senator X has said on topic Y. And the research assistant came back and had just reams of paper printed out. And it turned out nothing from before 1996. And Professor Lessig was a little apoplectic with the research assistant because the senator, of course, had been speaking quite a lot even before 1996. But the research assistant never found anything because the stuff that was pre-web, the stuff that was not available on the web, was not searchable in the same way as the stuff that is on the web. So the Book Search project really to start with was make books easier to find. The first and foremost lesson we've learned about, web search, if you think about back in the days of Northern Light and before Northern Light, the Yahoo directory, was that full-text search is extremely powerful. And then harnessing the power of that full-text search is a really big deal. And I'll reintroduce Dan just to say that Dan's been at Google, came from NASA, and has been at Google for, I think, four and a half years, five years, something like that. And he's the lead engineer on Google Book Search, so he's responsible for all of the technical innovation that's happening for us to be able to actually do this at scale. And Dan, you can interrupt anytime you want, obviously. I will. And people can direct questions at either of us. Dan sort of picked up a law degree on the side with all the negotiation of the settlement agreement. 
But the problem then was, how do you take the amazing amount of books that are out there and create a full-text searchable index of those books to make books easier to find? And there were a number of categories in that that were somewhat low-hanging fruit. So certainly there is a set of books that are born digital today and that can be gotten in digital format and then can be made searchable. And we have a bunch of different projects to do that. There are a bunch of slightly less new books that are currently in print, often owned in full by their publisher or even in some cases their author. For those books, again, you can get those books. They're usually pretty available in terms of the hard copy of the book. You can scan them in. You can create a full-text index from them. Then the harder part are the books that are not currently held by publishers or where the rights are unclear. And the public domain books, which there just isn't a whole lot of money in public domain books. It was easy from Google's perspective to think about the public domain books because we are a company that's willing to take some pretty substantial risks with an unlikely financial return. And on the public domain books, it was a really clear thing for us. Digitizing all of them makes sense. We should do that. And let's develop the technology to do that. And for the in copyright books that we didn't have or we didn't have permission of the rights holder, it was relatively a straightforward question from a company decision-making perspective that those books were also important just because we didn't have an agreement with the rights holder, didn't make those books unimportant. So we wanted to be able to, one, full-text search them and, two, be able to point people to places where they could get them. So Google Books 1.0 is a bunch of different deals with partners that own rights to the books, as well as a bunch of different deals with libraries. The library deals are of all types. 
Some of them are just for public domain books. Some of them are for non-public domain books. And those deals are to go in and scan the books: essentially take the entire bookshelf off the shelf, put it on a cart, bring it to a scanner, do the scanning operation, bring the books back to the library, put them back on the shelf, and index those books in a way that you... how many of you have used the books product? Almost everybody, great. That's good. Mostly I go to places and nobody has used it, which is getting better now. More and more people are using it, but it's great to talk to a group that has actually used the product. So most of you will have seen, you do a search, you get a page from a book. That's likely a partner program book, especially if it's a new book, obviously. The partner program is the deals we have with rights holders. You get basically a preview experience of the book, 20% of the book, a bunch of pages consecutively, so you can try to figure out whether this book is the book that you're looking for. So similar to what you would get if you walked into the Coop and were looking through the book there to try to figure out whether to buy it or not. If you get just a snippet from the book, just a small piece of text similar to our web search snippets, that's a library program book, for which we do not have a deal with the rights holder. And there what we're trying to do again is to give you a taste of the book, of course, index it so that you can always find the books that you're looking for, give you a taste of the book, and then point you to various booksellers, but also to libraries to find the books. And I think at the Harvard Library, you can actually do a quick search and find where the particular book is in the Harvard Library. So you are uniquely privileged students in that you have a library that is orders of magnitude bigger than the average in the US. And then, of course, in the public domain books, you get the full book.
You can download the whole thing. There's no license restriction. We have an etiquette request that we ask that you not do things like commercially re-host the book, but we're completely open with whatever you want to do with it. And then you can get it in PDF. You can get the text of the book. You can browse it online. You can sort of do whatever the heck you want. So that's books 1.0. Where we are in that project is we've scanned over 10 million books. And just to put that number into perspective, that number means that we had to develop an entirely new physical apparatus for scanning. And it's a whole new technology. We had to solve a whole bunch of difficult technical problems which Dan can talk about if people have questions. And we had to actually physically pick up 10 million books and go through and scan them. But we've got a more than 10 million book index. More than 1.5 million of those are in the public domain. More than 1.5 million of those are in the partner program. We have more than 25,000 partners in the partner program. We have more than 40 libraries now across a number of countries that are part of the library project, including Harvard. And we are continuing to scan at pace. We did not stop our scanning in the face of the lawsuits that came in 2005. And we still have the two US lawsuits outstanding, one French lawsuit outstanding, and a German lawsuit that was withdrawn. And I can talk a little bit about that if people want to hear more about that. But it was essentially withdrawn by the publisher and the association that sued us on the eve of them getting a judgment that they had been previewed by the judge. So that's sort of where the legal part comes in. 
Partway through the litigation of the two US lawsuits, one of which was a relatively broad class action, brought on behalf of people who had copyrights in books in the University of Michigan originally, since amended to include an even broader class of folks, as well as a narrower one brought by the publishers, the five major US publishers except for Random House. So there are actually kind of six major US publishers; Random House is the one that didn't sue us. And just a sort of Berkman trivia note, the publishers were represented by Jeff Cunard and Bruce Keller, two folks that have been very familiar to the Berkman Center, having done their clinical program for the last little while. So there was sort of Berkman on a bunch of different sides here. That's the wonderful thing about Berkman. It's always been a large tent with heated disagreements under the tent. But the...
Some of them spilling out, too.
Yes, some of them. Spilling out into drunken brawls depending on the hour of the day.
You can do it with Charlie, right?
Yeah, I agree. So the conversation on the settlement started a bunch of years ago. And it was this relatively... well, it's the only time this has ever happened to me at Google where I've come into a room with other folks who are not Google. And I've come with a presentation as to what I thought would be the various things we could possibly do together. And typically, you come into the room as Google, you're thinking big. We think as big as we possibly can. This is one of the only times where we came into the room and the other side was actually thinking bigger than we were. And so we started discussing the potential ways of doing a settlement involving the class that would provide for an enormous amount of benefit for our users. And from Google's perspective, the most important thing in terms of that settlement discussion was something that I know Berkmanites care a lot about, too, which was actually increasing access to this information.
So we lived in a world where, absent individual rights holder deals, the most information you were going to get about a book was where to find it in a library and a short snippet. And we saw the opportunity to move into a world where, in a whole bunch of different ways, once you found a book, you'd be able to actually read the book. So that was hugely valuable, we thought, to us. As for the authors and publishers, it shouldn't surprise folks in this room that authors and publishers want their books to be read. So on that principal point, let's get more access to these books, there wasn't a huge amount of disagreement around the room. There was also a question of, how do we compensate authors and publishers who want their material to be read? So there was lots of negotiation about that, and we can talk about the specific provisions there. And how do we ensure that libraries get to provide a lot of their core function, both in terms of preservation of knowledge and in terms of access to information, even to people that don't have any money to pay for it, in this new structure, this new world? And so we had this relatively long, relatively heated discussion that resulted in the 300-plus pages of settlement. And many of you have also probably seen that the University of Michigan just announced their amended agreement with Google, and that's up online. And you can take a look at it. So even more pages of legal document there, which at the core opens up access in a number of different ways. And I'm just going to go through those different ways, and then I may be able to sort of end a little bit early and open it up to questions. So the most basic way that the settlement opens up access to books is in the way that everyone in the United States... and this is a US-only settlement. So it's only for people in the United States when I describe these uses. And I should also be clear for the authors in the room.
I'm not trying to give you any understanding of your rights under the settlement. You're represented by class counsel. And there's a great website for you to check out, which I will make sure a link to is provided to the Berkman Center, so it can be put up on the web. But it provides people with this ability, any person in the United States, to, again, get the free full-text search results, to be able to easily figure out where the book is for sale, find it in a library, but also, to the extent that the book is out of print, to get that preview experience, the idea that you can actually read 20% of the book to figure out whether this book is the book that you are looking for, before going through the additional step of buying the book, finding it in a library, going to a library. All of that is for free to everyone in the United States; anyone who has a computer can access this. And so that's the first thing, just getting the benefits that we currently have as part of the partner program for essentially all out-of-print books. And that, to me, was a huge benefit, in part because I remember my time as an undergraduate doing my thesis work, where I would go to the library and come back from the library with 100 books. I would check all the books out, go down to my carrel, literally 100. Like, you literally do stacks. I did everything at the last minute. So I often needed to get them all out of the library before the library closed. But you'd come back with a whole bunch of books. You would then find that two of them were actually useful, and you would return all the others. So this allows you to very quickly figure out which books are useful to you so that you can figure out how to get those books. And it also drastically expands the ways in which you can get access to the books. So you can certainly buy them at Amazon or Barnes and Noble, or AbeBooks for the out-of-print books, or find them at your local library.
There's a great service called WorldCat, which we hook into, that provides you with where the book is located near you. But we also open up access to those books in three principal ways that are important. So first of all, any person sitting at their computer can just click a button and buy access to the book. And that means that you get an online access. This is not a downloadable e-book. This is online access to the book. It lasts forever. We have no 1984 Amazon problem. Let me say that again. We have no 1984 Amazon problem. It's amazing the amount of reporting that has forgotten to mention that. But this is a book that becomes in your bookshelf forever and is a book that you have the full access to so you can actually read that book and all the rest of it. It is something that will be priced either by the copyright holder, like all books are priced today, or to the extent that the copyright holder either doesn't come forward or decides not to set a price. It gets priced through an algorithm that Dan is designing. And the idea behind that algorithm is to essentially simulate a market for all of these different books so that the prices become what the market might have priced the book at had there been a market. Which is kind of neat. The antitrust scholars have a lot of fun with this concept. And what it attempts to do is to, because it's very hard for groups to agree on pricing, it attempts to essentially allow for a simulation that provides for individual author choice. So for the authors in the room, if you are a author that believes that your book was always underpriced by your publisher at $99.99, you can price it at $120 if that's what you want. If you're an academic author, which probably many in this crowd are, and you want exposure for this book, you don't care about the money that you're gonna make from it, you can price it at zero. And you can have your book be freely available to everyone through that option. 
But if you don't come forward, you get a price that is reasonable for that book. And Dan, you can tell us a little bit about the initial pricing. Buckets are, what is it, 80% below?
So, there's an initial distribution in the settlement, and then the market will really determine. The distribution initially is over 50% are $5.99 or less. Over 80% are $15 or less. And then the market will determine. I actually think on the internet, what you see is it drives prices down, because it's competing against everything else on the internet, which is tough competition.
And which is a good thing. So that's the first model. Again, it's available to absolutely everyone. The second model is if you're an institution like Harvard, you'll likely subscribe to the institutional subscription. And then what that means is that Harvard would pay an amount, and there are specific objectives that that price has to meet in the settlement agreement. And those specific objectives are fair return and broad access. So the idea is that you have a pricing of something, in this case a subscription, by a company, a for-profit company that Google is, even though sometimes we act, we don't act like it. But you've got a for-profit company pricing a subscription product, and yet it is constrained by a settlement that forces one of the objectives that has to be met by that product to be broad access. Which to my knowledge is the first time that that's ever happened. And certainly not something that you see in some of the other electronic distribution agreements that are out there. And of course that product only exists for a university or an institution that decides not to go with the 20% free and the individual purchase. So there's also competition in that respect too. But in that case, your institution subscribes, they pay a fee, and it's free like water for all of the members of that institution. So anyone at that institution can access any of the books that are part of this subscription.
And again in that circumstance we also make sure that there's no 1984 problem. Once you've subscribed to a group of books and for example, Professor Palfrey has assigned one of those books in his course, there's no way for the rights holder to withdraw that book from the subscription during the period of the subscription. So it could be that the next year there's a different set of books in the subscription based on the rights holder choice which is core to every single part of this agreement but for the entirety of that subscription there is that book. The third access model is the public access model. And so this is if you're not someone who is gonna buy the book, if you're also not someone who's gonna be part of an institution that subscribes, you can go to any public library in the United States and there will be at least one terminal. Obviously all the terminals will have the public domain books for free, the free search, all the rest of it. But there'll be at least one terminal in every library building in the United States that has a computer and internet access that has an entire access to the entire subscription for free. So that should be pretty much all of the out-of-print books will be available at any library that wants it in the US for free. And granted that's just one access point for that library. Wouldn't it have been great if there had been many, many, many access points we hope over time will be able to expand the number of access points per library. But it means that you never have to worry that the amount of money that you have will determine your access to these books. So imagine if rather than deciding to cut Harvard a check for the tuition, you could have gone to another school but still had access to the great library of Harvard, the great libraries of Stanford and Michigan and Wisconsin and Texas, all of these wonderful libraries really leveling the playing field in terms of our access to this information and knowledge. 
The last thing I wanted to talk about in terms of access provisions is another thing that I believe is the first time it's ever been done: you have, right at the beginning of a new type of access to information, a provision that allows for first-class access for people with disabilities that would prevent them from accessing the information. So right now the universe of books that are accessible to blind people is, there are different estimates, but it's somewhere in the hundreds of thousands of books, maybe even below hundreds of thousands. And the US quite frankly leads the way here. Other countries are doing a worse job at making books available to people with print disabilities. What the settlement does is it completely expands the amount of access that people with print disabilities have to these books, and in particular to the hardest books for them to get access to, which are these older out-of-print books. With the newer born-digital books, it's easier to make a version accessible to people with print disabilities. The older books you basically give up on, or librarians do it as a one-off. And what that means is that if you're a Harvard student that has a print disability, for example, you're blind, you go from having a syllabus come to you as a sort of nightmare, where you get a syllabus and the first thing you have to do is figure out which books on the syllabus you can't get access to, and normally it's every single book on the syllabus, to a world where you can actually encourage professors to choose books that are accessible to people with print disabilities, such as blind people. Because that universe of books goes from a relatively small set to a much more inclusive set. And I should have said this at the beginning, but the settlement covers the books that we will scan in the United States from all of our different library partners, including books we scan into the future, but it's only books that were published before January 5th of this year.
So it's essentially all of these legacy library collections of any of the libraries that we end up doing deals with. The class is rights holders, people who own US copyrights in books, so it will include French authors and French publishers, it will include American authors, American publishers; it is that broad class. It also doesn't include photographers or people who are essentially responsible for the pictures in books, unless those pictures are in fact owned by the rights holder of the book. So we can go into more detail there, but the class definition is something that's worth a read. So anyway, the key point there is you've got a bunch of different access models to books. They're not the only access models. There are lots of ways to get these books. By definition, every single book that's part of our project is also available from at least one of our library partners. So you can always get these books through interlibrary loan, but it dramatically expands that access outside of the realm of people who can get interlibrary loan, outside of the realm of the sort of serious researchers that you all sit next to every day in this institution, to anyone in the United States who wants to be able to get access to these materials. So we can talk a little bit more about various other aspects of the settlement: the interesting copyright tweaks that are in it, the interesting provisions for libraries, the other interesting provisions for people with print disabilities, the competition aspects, and other aspects of the settlement. So what I thought I would do is just sort of open it up to questions, and if I find a lack of questions, I can always talk.
I might just urge you to hit orphan works squarely once. They know there are going to be a lot of questions about orphan works. Maybe you could just kind of clobber that class for a minute and then we'll come back.
Yeah, so we did a brief blog post on orphan works which is worth looking at, and I'll send the email out so that it can be distributed. But a couple of things to think about in terms of orphan works. First of all, orphan works legislation is something Google has been fighting for for years and years and years and years. And when I say Google, I mean the part of Google that is me has been fighting for that for a while. And what we think the right approach to orphan works legislation is, is orphan works legislation that would allow for mass digitization projects. And we've been working on that for a long time. What we still think is that that's important. It's important for a number of reasons. What I would say, so one thing, the settlement includes works that are orphaned and works that are not orphaned.
Yeah, good, very good question. So there's no definition of what an orphan is. It's one of the things that's been a constant problem in Washington in terms of getting orphan works legislation, that there is disagreement even between different members of the same types of communities. So even in the public interest sector you've got competing definitions of orphan works. Authors have different definitions even within the author community. Publishers have different definitions. When I use it, I typically mean works where the rights holder is very, very, very hard to find. So the orphan works problem is, I've got a book and I want to make some use of it that I think I need the copyright holder's permission for. And the thing is, there might be a copyright holder out there who would love to give me that permission. In the case of academic works, often the copyright holders would be glad for me to use it in any way. They just want it to be used. But that connection between me and the copyright holder is one that I can't make as the user.
Well, that is difficult to make.
Because in fact, this is one thing that I think is important, that in fact for books, and of course, different people have different definitions of orphans, okay? And in fact, I was talking with Charlie earlier and I used an example of someone, I'm sure people in this room have out-of-print books. Those almost certainly aren't orphans. Even if I happened to be sitting in New Mexico and I didn't know the person, it's not an orphan. They're an author. If you kind of made an effort, you can say, oh, they were at Harvard when they published it. Where are they now? And then in books, you have the author's name and the publisher's name printed on the object, okay? Now for, oh, let me grab some of this. Now, for orphan... for images and other things, it becomes very difficult, right? There's nothing on the physical object that helps you figure out who the heck did this, okay? Books, they have the publisher's name, they have the author's name. Many of the books held in US libraries, and by many of our partners, are actually more scholarly books. The authors were often professors at universities at the time. Most of the people that I went to undergrad with back in 1980, I actually can find. I called the university. They have the Alumni Association. So you can find them, and that's an important thing. In fact, I think there's one issue with saying, it's a little hard to find them, but you could do it if you tried, okay? And those books aren't really orphans, but for many of the casual uses you might wanna make, it's still kind of, well, listen, I was using it in a class and I've gotta call three people and I've gotta, so, you know, like the Authors Guild has, I think, published some stats, or I think it's the, maybe it's the British, that when they try to find someone, 90% of the time they're actually able to find them. Most of these publishers were acquired. If they went out of business, they were acquired.
So I think a lot of the attention is on orphans, and there's this broader orphan works challenge, especially with images and pamphlets and all this other paraphernalia, sundry stuff. With books, it's not as big of an issue, although it's still an issue for some percentage of the books. But I think the bigger issue is that there are a lot of books that aren't orphans, but it's still kind of a pain to go ahead and find who the rights holder is. And the number two challenge you have with books, because of the statutory risks in copyright, is that someone will be confident enough that they hold the rights that they'll sue you if you scan it, but that doesn't mean they'll give you permission to sell it where they indemnify you in case they happen to screw up, right? Because if I'm 99% sure that I hold the rights and the amount of money I'm gonna make from a book is very small, do I take a 1% risk of a $30,000 or $150,000 penalty when I'm making $2? So what happens, I think, is that a lot of these books aren't necessarily orphans; they're just practically dead in the market because their economic value is small compared to the cost of actually finding the rights holder and getting them to authorize the use. They're completely inaccessible.
So we talk about, I'll get you in half a second. So we talk about the twin problems of orphan works, which are one, making it easier for the user and the rights holder to have a conversation, and two, making these things, which are cultural things, right, accessible to people. Yeah.
When you price a book which is not public domain, it's in copyright, but the copyright holder is sort of lost, you don't know where they are and so on, do you put into some sort of escrow account whatever revenue you get from selling access to something like that?
Yes, we do. So the way the settlement treats these orphan works is it provides access to them, and really in the settlement, it's not that they're orphans, just that they're unclaimed.
The person hasn't come forward yet to claim them. So the idea is that any revenue earned from the unclaimed works goes into the registry, this Book Rights Registry, an independent nonprofit controlled half by authors, half by publishers. It has a board of directors and will be incorporated probably in New York or Delaware or something, I don't know. I think New York. So we pay the percentage of revenue due to the rights holder to that registry. The registry then holds it for that rights holder and can use some of the money to actively go out and find the rights holder, to the extent that there is unclaimed money sitting there.
Would it be accurate to describe this as an ASCAP-like compulsory licensing entity?
It has some similarities with ASCAP, but a lot of differences, and there are a bunch of things that we built into the agreement. So for example, there's no exclusivity in terms of the licensing that's allowed; even the exact same license can be done. We can go into more detail, but it has some similarities. The other thing I would say is that that money is held for a period of five years, and it's a rolling five years. So in the fifth year, or the sixth year, you start accruing a new year while using the oldest year's money first to pay the costs of the registry and then second, depending on whether it came through the institutional subscription money or through the consumer purchase money, it either goes to top up the rights holders and then to a charity or, as it does in ASCAP, is distributed among the people who've actually come forward.
Right, and one thing: usually a compulsory model is compulsory, and in fact this isn't compulsory. In other words, a better terminology might be default in absence. You do not relinquish any control. In particular, you might sell it through Google for three years and then a publisher might come and say, I wanna do another print run of this and I wanna sign an exclusive deal with this bookstore to only sell it through them, so you've gotta pull it out of Google. And they can say fine, I can withdraw it from Google. Though again, no 1984 problem. Right, but the people who bought it can still read it.
Can we just go over your 1984 problem?
Oh yes, sorry. So Amazon, through the Kindle, I believe, had to withdraw the book 1984 from people's collections over a mistake that a publisher may have made about whether the book was in fact theirs to sell or not. So you had a Kindle that you opened up one day having bought 1984, and you opened it up the next day with a credit for the purchase price but not actually the book. Exactly.
Yeah, so let me go to Chris. I wanna hit one more thing on John's question about orphans, which is just to make really clear one of the things that was important to us in the settlement. I said before we've been pushing for orphan works legislation for a number of years. We're continuing to push for orphan works legislation because it would give broader access rights to us, but it would also give access rights to lots and lots of people. And one of the things that we put in the settlement was a clause to ensure that, to the extent that we are successful in getting good orphan works legislation, that orphan works legislation will trump the deal in terms of how we treat the orphans. So there's a way in the settlement, and it's section 3.8B for those of you that are curious about reading it, to actually get the benefit of potential future orphan works legislation.
And just again, because of the orphan works, I do think there's another part that is really important. The settlement agreement explicitly states that the information about which books have been claimed and who claimed them is public information. Okay, and that's not by chance, because when people come forward, you could use the registry if you want, if a rights holder says, please license this, but a lot of times you might not want to go through the middleman. You want to go directly to the rights holder. And also, you'd like to be able to say, tell me all the books that someone hasn't claimed. Okay, because that may be information that you use in your calculus of whatever you do with this, about the likelihood of who's on the other side. Another way that I've heard some people state it is, one thing you know is most of the copyright maximalists will probably be claiming their books. So it just helps you in your calculus to the extent a book has not been claimed, and the fact that it's public was really an important part of the agreement.
And one of the things, I mean, we deal with some of the worst of these collecting rights societies, and one of the typical things that they do is keep private which works are part of their society. You have this weird conversation where you're talking to a collecting society and they're saying, you must pay us X hundred million dollars, and you say, great, what do I get? And they say, we're not going to tell you. And not only that, but part of the reason why they're not going to tell you is because they don't want you to go to the individuals that they represent. And so one of the things that's been designed into the registry from day one is an inability to play that game. So sorry, thanks for the patience, Chris.
Yeah, no problem. So this question actually isn't specific to Google Books. It's a more general Google question.
So there are many people who are not actually happy with what Google does and criticize the company and maybe see 1984-style problems, not the one from Amazon, but the bigger 1984 issue. And one of the responses to that kind of criticism is usually the mantra of Google that says, we won't be evil, or as you said before, sometimes we act like a nonprofit. And I want to get your comment on a quote in a Wired article that came out yesterday talking about this guy, Thomas Barnett, who was the DOJ antitrust head. So there was a meeting between Google and Yahoo and DOJ. And at the end of the meeting, the DOJ guy is so frustrated that he says, in answering this question, before you even get to it, please don't tell me it's because the parties wouldn't do anything wrong. The DOJ was so frustrated with, we won't be evil, we only have the best wishes of the user at heart, that they said, all right, that's it. We need something more than that. What can Google provide those of us in the public interest community that is more than just we won't be evil? Because some of us are actually seeing signs of evil.
Yeah, so just to be clear, and I haven't read the article, so I'm at a bit of a disadvantage. We don't go to the economists at the DOJ and say, oh, we won't be evil, trust us. This is a lawyer. Or the lawyers at the DOJ, right? Economist or lawyer, right? You have a conversation and that's not the way it goes. So anyway, I would quibble with the characterization, but let me ask a, I guess, more pointed question. I was just talking about one way in which there was a concern that there are some games that can be played, and the way we addressed that concern was in the agreement. I didn't say, don't worry, the registry's not gonna be evil. I said, the registry is contractually, with a court-approved settlement, once it gets approved, unable to play that game that some people consider evil.
So part of this depends on what it is you're worried about. And we can talk about some of the typical bogeymen. So people talk about, well, maybe, because Google has all my email, I'll be locked into Google.
A better example in the book space would be: traditionally, Google was a vigorous defender of fair use rights in the crawling space or in the indexing space, in YouTube. And now when it comes to Google Book Search, fair use has gone out the window, fair use has been run over with a bus, and you're now locking in this exclusive agreement that, sure, other people can negotiate, but they have to first scan all the books and get sued before they can get the same deal.
So, where did fair use go out the window? Well, that's something, rather than arguing... Let me give the non-lawyer answer, which is... That's what I was gonna jump in on. Yeah. We're still the company that's fighting more fair use battles than anybody else, including the Berkman Center, for what it's worth. We actually have more current cases involving fair use than anybody else. But it should be noted, we're never on the plaintiff side. We're always on the defendant's side. And... I'm coming in the wonderland. I'm coming in the wonderland. So, Alex is leaving Google. I'm just glad, Charlie, that I didn't send you any emails about that case. But that's even true in book search, right? There is a group of works that have valid copyrights that we continue to scan that are not part of the settlement at all. And that's both the pictures, valid copyrights, no one's saying that they don't, and the unregistered US works. Nobody's saying they don't have valid copyrights. And we continue to scan and index those works. We continue to be open to lawsuits there. So, I understand that it's convenient to say we're abandoning fair use, but it's bullshit.
And possibly saying the same thing, but possibly less confrontationally.
When we went into this settlement, okay? You know, we thought we were gonna win the lawsuit, right? We felt good about our position. But we also understood that a tenet of everything we're doing is about fair use. And so, in fact, in the settlement, it was really important for us. One of the questions people have been concerned about is, well, does the settlement somehow erode fair use? And we consistently say no, and in fact, our actions suggest that, and Alex highlighted two of them. One, every time we scan a book, we're scanning all the images. We're not getting releases for the images. We still feel that's fair use, okay? Two, unregistered works, we don't get releases. Three, there's other stuff that's carved out, opted-out books, this and that. Okay, so, if we felt that the settlement eroded our defense in fair use, then we probably would be adjusting the other things we're doing, which are based upon the exact same logic as what we're doing in the settlement, and we haven't adjusted anything whatsoever in that. So, we really feel strongly that it doesn't erode the current position. We feel strongly about where we were before. And in fact, I will highlight that in the agreement, aside from the release of past claims, the payment is for when people sell their book, right? If it said that for scanning and indexing, I have to pay you to scan and index your book, we wouldn't have done a settlement, because we scan and index all the time. So, it's something we thought a lot about. I mean, maybe one of the reasons that Alex gets so emotional about it is because this was really important, I think, in particular for him when we negotiated the settlement. And just to be... So, I think there's been some misunderstanding about...
Dan always gets to be the nice guy, which is because he's nicer than I am. But just to be really clear, fair use is what Google built its entire business on. Search is about fair use. And we're not backing down.
It's not like I mince words at places like this that are public, right? We're not backing down from that at all. Yeah, and I'm recorded, and great. That's not something that... Yeah, that's not something that we're backing down from at all. Yep.
Let's see, I have two questions. The short one is: the board of directors that runs the Book Rights Registry comes from publishers and authors. And the quick question on this is, of course, the universe that uses this material includes readers and libraries and others. Might that board of directors be differently constituted? The second, harder question: there will actually be money generated by orphan works, despite the hopeful thought that 90% of these people will come forward and so forth. I'm interested to hear you respond to how the unclaimed money will eventually be used. Some of it does in fact go to Google because off the top come some operating expenses. Some goes to the registry because off the top come operating expenses at the registry. Then I gather some goes to the authors and publishers and then some to charity. But in each case, it's not clear why anybody has a claim on that money. How would you respond to that too?
Yeah, so just quickly on the board of directors of the registry. The idea behind the registry, and it could have been constructed any number of ways, was to have an organization that represents rights holders going forward. There are many, many decisions that the registry doesn't get to make. And those are often decisions about users. For example, whether a user gets access to a particular piece of material is not something that the registry gets to decide. It's something that the individual rights holder gets to decide, and those can be all sorts of different people. I also think that the registry is talking about having some sort of board of advisors, but they're probably better equipped to answer that question.
And then on that point, it is important to note that in the equation, consumers and libraries have a very strong voice because they will be negotiating in terms of whether or not they want this product, okay? So in fact, while the registry represents rights holders, people are gonna be deciding what to buy and what not to buy. Libraries, if they're gonna buy the institutional subscriptions, usually do it in consortiums, and they're gonna negotiate very hard about what needs to be in any license. They traditionally do that. So they're gonna have a voice in the process. It's just not on the board, which is designed to represent rights holders.
Yeah, and then in terms of the dollars for orphans. So again, we won't know which books are orphans. One of the biggest problems of orphan works legislation and solutions is you sort of have to prove a negative to even get a handle on the scope of the problem. So the money that gets paid, as I said before, there are sort of two different streams. One is for the consumer purchase. The other is for the institutional subscription. I'll just go down through each of those independent streams. So first of all, like any other book, the money is split 63/37, with the rights holder, through the registry, getting the 63% and Google getting the 37%, with no additional costs off the top for Google. So then that 63% goes through to the registry. On the consumer access model, the registry essentially can pay that 63% out to the rights holder. Once money has not been claimed for five years, that fifth year of money can be used for additional operating expenses of the registry, including trying to go out and find that rights holder. Once those costs have been covered, it can be used to have rights holders topped up to 70%. So people who have come forward get topped up to 70% of the sale price. The idea there was that part of the costs of this operation was something Google was just capping.
I mean, I would call it a guesstimate, but it's less of a guesstimate than just a cap on the costs that we put against this, which is 10%. So it's really a 70/30 revenue share with 10% taken off the top for Google costs. That's what makes it a 63/37 split. So the registry is able to top up people who have come forward to 70%. And then any remaining money goes to charitable organizations, and they are defined as charitable organizations that are a benefit to, I think, readers and writers. I can't remember the exact wording, but charitable organizations. On the institutional subscription, as with many other types of group subscriptions, the way the money flows is exactly the same: 63 to the registry, registry able to go look for the rights holder, et cetera. After the five years, the oldest of those years gets put to registry operating costs. To the extent that there's money left over after that, the money goes to the group of rights holders that have come forward and are part of that subscription product. So to the extent you're part of that subscription product...
And that's the nuts and bolts, but you aren't actually describing why anybody has a claim on this money.
Yeah, so here's, let me try up-leveling, which is: one of the critiques that I have heard is that the money that is currently going to pay rights holders who are now unclaimed, or a stronger form of this criticism is who are actually orphaned, that that money shouldn't go to them, it shouldn't go to the other rights holders, it should go to something else. And depending on who you ask, you get different ideas as to what that something else is. I've had universities say, well, it should come to universities. I've had readers say, well, we just shouldn't have to pay for those books. I've had public-interest-minded folks say, well, it should go to defending copyright actions, for example.
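The split and the order in which unclaimed consumer-purchase money is used, as described above, can be sketched in a few lines of Python. This is only an illustration under the figures quoted in the talk (a 10% capped cost off the top, then a 70/30 share); the function names and the example amounts are hypothetical and are not taken from the settlement text.

```python
from fractions import Fraction

# Figures as quoted in the talk; everything else here is illustrative.
GOOGLE_COST_CAP = Fraction(10, 100)      # capped cost taken off the top
RIGHTS_HOLDER_SHARE = Fraction(70, 100)  # rights holder's share of the remainder


def split_sale(price):
    """Return (rights_holder_amount, google_amount) for one sale."""
    price = Fraction(price)
    after_costs = price * (1 - GOOGLE_COST_CAP)        # 90% of the price
    rights_holder = after_costs * RIGHTS_HOLDER_SHARE  # 70% of that, i.e. 63%
    return rights_holder, price - rights_holder


def distribute_unclaimed_consumer(unclaimed, registry_costs, top_up_needed):
    """Consumer-purchase stream: registry costs first, then top-up, then charity."""
    to_costs = min(unclaimed, registry_costs)
    remaining = unclaimed - to_costs
    to_top_up = min(remaining, top_up_needed)
    return {
        "registry_costs": to_costs,
        "top_up": to_top_up,
        "charity": remaining - to_top_up,
    }
```

For a $100 sale, `split_sale(100)` gives 63 to the rights holder through the registry and 37 to Google, which is the 63/37 figure quoted above. The institutional-subscription stream would differ only in the last step, where leftover money goes to the rights holders who have come forward rather than to top-ups and charity.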
So different groups have different views as to where that money should go. The way the settlement distributes it, which is the way many other of these sort of group licensing projects distribute it, is to the people who have come forward. We think that that's a big benefit because it means that more people will come forward. And again, that goes to the first orphan works problem: it's actually putting people in touch with that rights holder. But let me say something else about it, which is that all of you have the ability to change that equation, right? The settlement says to the extent that money is paid for a work, it goes into this fund. It also says to the extent that there's orphan works legislation, for example, orphan works legislation which gives that money to universities or whatever your pet project happens to be, or orphan works legislation that just makes it free to the world, that orphan works legislation can then trump the settlement. So there's a very straightforward path: to the extent that you don't think that that's the right place for that money to be, you can go and get a resolution, to the extent you can agree with everybody else who also thinks that their pet project should get this money, and come to that political solution of where the money should go, or whether there should be any money at all. Yep.
Alex, you've done a really nice job here of sort of describing how different groups of people get access to individual books. What you haven't really talked about is the sort of incredibly valuable information that comes from the entire collection of works. So for instance, if I'm a lexicographer, having access to word frequency and word appearance over time becomes essentially the basis of how you write the next generation of dictionaries. Or maybe a more tangible one, and one with some real money on the table, is that you now have the largest collection of parallel corpora in the world.
So for people who don't study machine translation, parallel corpora are what you use to build automated translation systems. If Google has versions of Lewis Hyde's The Gift in the 30-odd languages it's been published in, that becomes the raw material for an English-Chinese translation system. So who's getting access to that data?
I'm gonna let Dan answer that question, but I wanna just say two things first. First, you're not a plant. I want everybody to know I gave Ethan no money for that question. And second is, in the world today, nobody. And then Dan can talk about the settlement.
So let me finish the question, right? So who's got access to it? Can the general public get access to it? If someone like Harvard has partnered on it, can they get access to it? And if the answer to this is no one has access to it, isn't that more than a little bit evil?
Right, so that's the current world. So wait, no, no, no, no, no. So let's go through, and he said nobody in the current world, but I'll expound upon that briefly. Right now in the current world, Google has access to the complete database. Yep, okay. Because of the copyright liabilities if we let everybody come in and do whatever they want, okay, we don't open it up to anyone and everybody, we don't give it to lots of folks. The other issue is our libraries each only have different subsets of it, because if I scan an in-copyright book from Harvard or Stanford or Michigan, I only give that in-copyright book back to the library from which I scanned it. So each partner has a subset, and we're the only ones with the whole corpus, and when you talk about lexicography and many of these things, what you want is the whole thing. So this was one of the things in the agreement that we were very excited about, as individuals and as a company, but I'll just speak as an individual.
So it's the creation of what's called a research corpus, and it's to do what's called non-consumptive research, and just so you understand what the hell that means: the idea it was trying to capture is that there's some research where what you're trying to do is capture the intellectual output of that work by reading the whole thing. And for that work, we have an institutional subscription so that people can get access. If you want to read all the different versions of Shakespeare to do some analysis about how they did different stuff over time and you actually are reading everything, that's what we call consumptive. Non-consumptive is much of this textual analysis, computational analysis, things where you really need a corpus and you're looking in terms of broad patterns. So he provided a couple of good examples, word use. You didn't talk about new search technology, which is a wonderful... Everybody can crawl the web to do new search technology, but some challenges in search really are about searching what I'll call dense, long-form content, where search still has a long way to go. Search works great on the web when it's shorter documents, lots of links; what about in books, okay? Machine translation, optical character recognition, it goes on and on and on, okay? So in terms of the research corpus, it's our participating and cooperating libraries who have the ability to create up to two of them, and Google's put up $5 million to help create these research corpuses. Now under the settlement agreement, in terms of access, there's been some misinterpretation of what's in there, but basically it's up to the libraries to run these research corpuses. It's not run by the plaintiffs, it's not run by Google. Google doesn't see any of your IP, Google sees nothing of what you're doing, the plaintiffs see nothing of what you're doing, okay?
When you get access, the libraries have to come up with a way to determine who to provide access to, and all they need to do is make sure that what this person is doing is non-consumptive research. So if I was doing research in new search technology, I'd write a paragraph about this long that says, I'm doing research in new search technologies to search over large books. That kind of demonstrates that it's non-consumptive; that's what I have to do. Now in the agreement, part of this question is gonna go back to our fully participating and cooperating libraries, okay? They are the ones that are gonna be able to decide who gets access, okay? It is the fact that they as organizations have responsibility for protecting this, because obviously this is lots of stuff, somebody could come in here and stick every book ever published out on the web, and they have to make sure that when people access it, they're acting accordingly, and so they do have some responsibility here, okay? They can agree to sponsor any university they want, or any person they want, okay? There's no limits as to who they can sponsor. It is the fact that they have to make the decision, okay? It's not the registry taking on the risk, it is the libraries. And so when it comes to Harvard, okay, right now, despite Bob's post and all, Harvard just says they actually haven't decided whether or not they would want to move forward with in-copyright stuff; they wanna see what happens and they'll decide later, okay? Right, the university librarian at Harvard just says we're in a wait-and-see mode, right? Any library can become one; we will sign as many participating and cooperating libraries as want to participate. So right now we have 31 partners, I expect most of them will come on, and we could sign another 50 or 100, and any of them that are on have a direct right.
But then any of the libraries can sponsor, you know, so say I'm Dan, a researcher here at the University of North Podunk, which is not a participating library in this. My limit at this point is convincing Harvard or another participating institution, like Michigan, and if I am able to convince one of those 30, I get access not just to Harvard's corpus but to the entirety of the corpus. The whole thing. And so, you know, part of this is, look, someone has to run this, right? And so it's going to the libraries and saying, listen guys, this is a hugely valuable resource, you can build up to two research centers, you can build a host center, and some of those problems still need to be worked out.
Is anyone actually doing it yet?
So Michigan has started some stuff, they haven't started building, they've started some stuff with the HathiTrust where they're not doing it yet, but we've also stated interest, even before this, as we're waiting to get the settlement approved, in helping them build one out of the public domain content, right, and we're very supportive of that. So for those of you that are really interested in this, give me your name. I've been pushing, John Wilkin is one of the people, but basically all of our partners, I've been kind of pushing them to say, listen, get going and start building some plans. Some of y'all might know Greg Crane, who's at Tufts, who's done a lot of stuff. He's very interested. So I'm finding folks and I wanna get them going, because that's one reason why Google has said we're willing to put up money now to do it with public domain stuff: we think that once you get it going, you'll start discovering things. I think it's probably the case that a small part of the room really understands the potential. A lot of the room says, oh, I kind of get that and it's kind of interesting. And a small part of the room says, what the hell are they talking about?
And we need to make it more tangible by having people actually do research and actually publish papers that they couldn't have if it wasn't for this. And just to be clear, this is another one of these things that absent the settlement doesn't happen, right? Once the settlement gets approved, that's when we have this ability to provide all of these in-copyright books to these centers. And I know we haven't gone into a lot of the competitive aspects, but I see this as one of the really strong pro-competitive aspects. People can develop new search technology. They own the IP. Google has nothing to do with the IP. Nobody has anything to do with the IP except that individual. I think it's a hugely valuable resource.
I promised at least one or two competition questions, by Phil or Einer. I don't know if you want to. Sure.
If I could, a really quick follow-up on what you just said, and then a competition question. Is it really the case that the entire research corpus of all of the books is available at each library? Because as I read the settlement, there are two sites, plus Google.
Well, it's two, but each library has the right to access it. So these are gonna be electronically accessed, right? So the servers might sit in Illinois, or they might sit somewhere else, but it doesn't matter where the servers sit. They're gonna have two host sites, and all of the cooperating and fully participating libraries have the right for their students to access them. The library digital copy, which is serving a different purpose, okay, is where libraries can get up to their whole collection. That's addressing some of the preservation issues, because basically, if a library scans a certain amount, each has the right to have their own copies. So that creates many copies. So then if something happens to any one organization, you still have multiple copies.
And one big thing about preservation of digital artifacts is, much like with physical artifacts, you want replication. So that way you don't have dependence on either one copy or one organization.
So the competition question: one of the things that's gotten the most attention from a sort of antitrust standpoint is the Most Favored Nations clause, that's 3.8A, which basically says if the registry, or another entity like the registry that the rights holders put together, gives a better deal to anybody else doing something similar, then Google gets to take advantage of the same deal. So nobody can come in, do all the things they'd have to do to start up a similar project, and undercut you, which can be seen as taking away really the only incentive anyone would have to try to do this. You wouldn't do it if the only place you're gonna end up is where you are. Why is that justified as a matter of public policy?
So I appreciate that Phil did exactly what most of the critics of the settlement do, which is he stated the clause without the limitations.
And I do wanna ask about the limitations, because I've read them a thousand times. I have no idea what they mean. So maybe you could explain what the hell the clause means. When would this apply? Tell us as precisely as you can. When would this actually kick in, and then why should this be in here?
Yeah, so I think Phil gets the clause mostly right. I think where we would quibble is on the name of it. We would call it a non-discrimination clause rather than a Most Favored Nations clause, whatever. The thing that you didn't mention were the two limitations. So the first is really straightforward, which is just that it's only for the first 10 years. And that goes to what many economists, including antitrust experts, say about the importance and the pro-competitive benefit of a clause like this one when the deal is extraordinarily long and the first mover is taking on a lot of upfront costs and risk.
So in this case, the deal is for the length of the last copyright in the last book covered by the settlement. By anyone's definition, that's extremely long. And we're doing a lot of upfront investment, not just the scanning of all these books, but also the $125 million that other people don't have to replicate, and the rest. So the reason why antitrust law likes these types of clauses when that's the case is because, for example, if I were to enter into a 100-year deal with all of the lemon growers in the world that I would sell lemons at a particular price, and I give them an upfront payment of $125 million, and if the next day someone's able to come in and say, oh, I'm gonna sell it so that there's no way Google can compete, then that would mean that we wouldn't do that deal. And so the question here was, what about the follow-on that we can't compete with? And that's why the second limitation is so important. So we were not worried about the person who goes and does a deal with Random House, or even every single one of the variety of people that we were gonna spend $35 million to make it easier for people to do these deals with. All of those people who have claimed works, to the extent that there's a deal for those claimed works, there is absolutely no operation of this clause. So the second limitation is that the clause operates only to the extent that the deal affects a significant number of unclaimed works. And just so you know, when people read it, it says a significant number of, I think it says non-registered rights holders. Other than registered rights holders. So that is unclaimed works. As we were talking about earlier. It's not orphans. It's slightly bigger than orphans. It's true that it includes orphans, but it's slightly bigger than orphans. Yeah, exactly.
So the issue is, to the extent that Random House or anybody else wants to do a deal with the registry, sorry, Amazon or Yahoo or Microsoft wants to do a deal with the registry for every single person who's come forward, which is our understanding of what they can do, they can do that. We have absolutely no say, and whatever they agree to, we have absolutely no way of arguing that we get any of that. But to the extent that there is follow-on litigation that results in a class action settlement, right? The reason why we put this in there is we thought this is such a good thing for the world that likely people will copy the hell out of it. And of course, we don't believe in copyright over legal pleadings. We're not gonna enforce anything like that. There's no reason why anybody else would have to pay the $125 million upfront. So you have a very easy second entrant with a blueprint of a deal already done, and we wanted to make sure that at least for the first 10 years we were able to compete with that new entry. And again, that's what antitrust law typically thinks not only is okay, but is a good thing, because it encourages the types of innovation that we got here with the settlement agreement in terms of actually getting to a deal that provides access to a lot of these works. So I can hear one thing that you guys can't, which is Dan Clancy's phone is buzzing like crazy. Don't worry, it doesn't matter. No, no, no, I can't ignore them. It's 1:45, and usually these lunches peter out by 1:30 or 1:35. So this is awesome. Thank you, Alex. Do feel free to send us email. Dan, why don't I give out your email, because you're gonna be a longer-timer at Google? So Dan is dklancy at google.com. As John rightly pointed out, we're in the middle of talking with lots and lots of people as part of the very public procedure of doing notice and getting the settlement approved.
And for those who want more on this topic, tonight at the Boston Public Library from six to eight, several of us are talking about this in the Rabb auditorium. And then on July 31st, the Berkman Center is hosting a group of 100 people. There are 95 signed up, so sign up fast if you wanna be one of those last five, for a workshop on what possible alternatives to this approach might look like as well. So please join me in thanking these two guys. Thank you all. Thank you.