And of course, there wasn't a carbon arc spot anywhere.
All right, welcome to day two of the workshop. We're delighted to have you all here. Today's two panels will be perhaps even a bit more workshoppy than the panels yesterday. These are going to be a little bit more interactive, perhaps, and hopefully solicit some brainstorming and problem solving from all of you. So we welcome your very active participation in these next couple of panels. The first panel is about the law.gov initiative, which Carl will introduce. And then, as he noted, Carl will introduce me on the panel, and then we'll all have a conversation together. So welcome, and we're delighted to have you all here. Carl Malamud.
Morning, thank you. My voice is a little rough. I've been getting over a cold. I'd like to first thank Ed and Joe and Steve for putting together this amazing workshop. This has been quite useful. I want to talk today about the law.gov initiative. But before I do that, this is about putting our legal system online so that people can read the laws that govern us. And this is not a new concept. This is something that started a long time ago. In the 1970s, believe it or not, the Judge Advocate General of the Air Force decided that they needed to use these fancy new mainframe computers that they had. And they took a bunch of flyboys and put them in front of teletypes, and they started typing in the US Code and the Constitution and court cases. And they put together a system called FLITE. And they actually used it to search court cases. And that system, FLITE, was then taken over by the Department of Justice, and it was called JURIS. And at the same time, there were a series of private initiatives, something out of the Ohio bar. They began doing the same thing, typing in a bunch of legal cases. And that became LexisNexis. And in the late 80s, the Department of Justice decided they needed to get some better tools. And they were using this JURIS system very intensively.
And they cut a deal with West Publishing. And as part of that, they were a little bit worried that maybe there would be cross-fertilization between these private databases and the public database, and that wouldn't be fair to the private vendors. And so the Department of Justice deleted two million pages of case law that they had typed in and began using the West and the Lexis systems. Now, two million pages in the late 80s, early 90s, was a lot. These days, it's something you put on a thumb drive. But in those days, this was a pretty major venture. In the early 90s, there were a few farsighted people. Tom Bruce, who is one of them, started the Cornell LII system with Peter Martin, who was the dean at the time. Tim Stanley founded a system called FindLaw. And these people decided that all the laws of America should be online and available for everybody. And since the early 90s, there's been a series of efforts to try to put the law online. The law is the last bastion of closed on the internet today. If you look at health and finance and a whole variety of other markets, they've all made that leap to the internet. And there's been untold innovation. And if you look at the law, that has not happened. In the last few years, there's been a growing national movement. And that movement has really taken off. And there are a lot of people in the room here that are part of that movement. For example, Steve Schultze, who's going to be on our panel, has been working with the folks here at Princeton, Harlan Yu and Tim. And they've done the RECAP system to make PACER data more broadly available. Our second panelist here, Tom Bruce, has been doing the Cornell LII effort. They do the US Code. John Joergensen here is at Rutgers. And they do state law, New Jersey, a variety of other things. But these aren't the only folks doing it. If you look all over the country, there is a growing movement of folks.
You may be familiar with the AltLaw system that Tim Wu and Stuart Sierra did at Columbia with folks at Colorado. There's Mary Alice Baish here from the American Association of Law Libraries. We have vendors in the room, Ed Walters from Fastcase. And all these private and public folks have been working together, trying to put the law online. Our goal this year is to see if we can get this thing over the finish line and see if we can do some fundamental change. The law.gov effort is really a three-part effort. Part one is talk, and that's what we're doing now. We're doing a series of workshops across the country at 10 major law schools. Larry Lessig and Jonathan Zittrain are doing Harvard. Pam Samuelson is doing Berkeley. These are all fairly distinguished law professors. My old boss, John Podesta, at the Center for American Progress, is also convening a workshop. And the idea is that during these workshops, we'll try to tease out some of the issues. What does it take to make the law more freely available? Phase two, after the talk, is a report. And our hope is to put together a very detailed report for the government that says, here's what it would take to create bulk access to authenticated information of all primary legal materials in the United States. This is the federal, state, and municipal levels. And the basic concept is that if it's the law, it should be available for everybody. So phase one is talk and workshops. Phase two is write a report. Phase three is go to Capitol Hill and sell this. Go to Washington and go to the White House and sell this, and try to convince the government that this is worth doing. Now you may say, is there a real problem here? We already have systems like West and Lexis and others, and is this just a bunch of corporate bashing? Is this some communist conspiracy theory to nationalize the law? Well, when you think about it, nationalizing the law might not be a bad idea.
But no, this is about innovation as much as it is about democracy and justice. Today, if you wanna buy a decent collection of case law, it will cost anywhere from $10 to $30 million to procure the raw materials you need. You have to go to a vendor and buy them, or you can buy all the books, because there's no copyright in the underlying law, and you can send the books overseas and have them triple keyed, right? You can't just scan them and run them through OCR because the law has to be perfect. You can't have typos and OCR errors. And so you triple key, which means you type them three times, and if you type them three times, you know you got it right. And that costs anywhere from $10 to $30 to $50 million to get into business. And what that means is some kid at Stanford is unable to download the law and get into the business of providing a better shepardizer or a better citator or some other kind of system. There are a couple of other effects as well. Because we charge for the law, and we charge a lot for it, the federal government spends $250 million a year buying access to its own raw materials. The courts themselves have a $150 million contract with West and Lexis, and they make $100 million a year charging for the PACER system, which Steve will talk about. What that means is that researchers are unable to download the corpus and do research. It means that public interest groups can't download the corpus and audit. One of the things we found when we were able to obtain 20 million pages of the PACER database, and did a proper audit for privacy violations, was that the PACER database was rife with Social Security numbers. Thousands and thousands and thousands of cases. And I'm talking a 300 page document in a medical malpractice suit, which had the list of all the patients of a doctor, their home addresses, their ages, their Social Security numbers, and their psychological problems.
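To make the privacy audit mentioned above a little more concrete, here is a minimal sketch in Python of how one might flag strings that look like Social Security numbers in extracted document text. It is not the method the speaker's audit actually used, which the talk does not describe; the function name `find_possible_ssns`, the pattern, and the sample text are all assumptions made for illustration. A real audit would also need to handle scanned images, other number formats, and false positives.

```python
import re

# Heuristic pattern: 3-2-4 digits separated by hyphens or spaces.
# This finds SSN-like strings; it does not prove a number is a real SSN.
SSN_PATTERN = re.compile(r"\b(\d{3})[- ](\d{2})[- ](\d{4})\b")


def find_possible_ssns(text):
    """Return (line_number, masked_match) pairs for SSN-like strings."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in SSN_PATTERN.finditer(line):
            area, group, serial = match.groups()
            # Skip digit combinations that are never issued as SSNs.
            if area in ("000", "666") or area.startswith("9"):
                continue
            if group == "00" or serial == "0000":
                continue
            # Report only the last four digits so the audit log
            # does not itself leak the number.
            hits.append((line_number, "***-**-" + serial))
    return hits


# Hypothetical sample text standing in for one page of a filing.
sample = "Patient: J. Doe\nSSN 123-45-6789, age 47\nCase No. 000-12-3456\n"
for line_number, masked in find_possible_ssns(sample):
    print(line_number, masked)
```

Masking the match in the report is a deliberate choice here: a list of findings that reproduced the full numbers would repeat the very exposure the audit is trying to document.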
Available for anybody that's willing to pay eight cents a page on the PACER system, available on West, available on Lexis. And it's only when public interest groups got access to this information that we were able to audit it, send the audit to the Judicial Conference, and get them to change their privacy rules. So this is about innovation, but it's also about justice. And then finally it's about democracy. It's about a homeowner being able to download a building code and check whether their contractor has actually done something up to spec. And I don't know if you know this, but today if you want a building code, it's gonna cost you $100, even though that's the law of the land. Even though there are court cases that say there is no copyright on building codes because it's the law of the land. And so that is the conundrum that we're trying to get over this year, to try to convince policymakers in Washington that the time is right to do something fundamental to make the law more broadly available. And we think the time is right. Look at our new president, who used to teach constitutional law, and look at our solicitor general, who was dean of Harvard Law School. And in fact, if you look at the functionary in the Office of Management and Budget in charge of electronic rulemaking, it turns out he's a distinguished University of Chicago law professor. And when I go and talk to people like Cass Sunstein at OMB, they get this idea of law.gov. They get the idea that maybe some fundamental change is possible. And so we're hoping this year that that can happen. So I'm gonna turn this over next to Tom Bruce and he's gonna talk a little bit about how things happen in other countries. This is the "we are not alone" speech. And then we're gonna have Steve talk about the federal level and then John talk about the state level.
So I'm hoping to accomplish a smooth transition on slides here, but I know that it's just not gonna happen.
Oh, good, should I let everybody in the world into your machine, Carl? Oh, do allow all of those, that's fine. Cool, excellent. How does this thing work again? All right, so Carl has already indicated that I am older than dirt and have been doing this since pterodactyls were wheeling around in the sky. And I thought that this morning it might be useful to spend just a few minutes going over some of what has happened in this area of open access to primary legal materials, both in the United States and abroad. Because of both middle-aged memory and time constraints, there are no doubt a bunch of people whose work I'm gonna slight this morning. It's not intentional, it's purely my own dysfunctionality. Start with some ancient history. In 1992, we started our operation at Cornell. I would like to say we were the first open access provider, but that's not true. We were actually the first open access provider in Gopher and on the web. Very shortly thereafter, similar operations started in Canada and Australia, and the interesting thing about them is that in the intervening decade and a half, they have become the de facto comprehensive national resources for those two countries, and they are both third-party operations outside government. One is a consortium operation of the Canadian bar. The other is a jointly funded project between two Australian universities, and they are occupying positions of dominance in the field similar to those occupied by West and Lexis in the United States. Now, if you look around the world, you'll find 23 institutions worldwide, many of whom have taken on the sort of LII name, that offer open access to legal materials at a national scale. And the ones I'm talking about there are only the ones that are known to each other, that cooperate, go to conferences, et cetera, et cetera, et cetera. Of course, there are countless self-publishing operations in courts, governments, legislatures, all doing the same kind of stuff.
There's a lot out there already and despite the dismal portrait, wow. This is scary. I just upgraded my OS, so this is obviously a new feature. I'm just pleased it's not commenting on the presentation. In the last couple of years, a few things have happened that provide a kind of interesting picture of what the way forward might look like. I mean, first of all, there's an awful lot of globalized litigation now and it's leading to a lot of trans-border questioning about what the law is in jurisdiction X. To find that situation in the US, you would have had to look back into the 20s and 30s, when federal law was coming into the ascendant over state law in certain areas and we had a similar situation here. What that's led to is a recognized need to answer legal questions that have a really strongly bifurcated price sensitivity. On the one hand, this is some billion dollar construction project in London financed by an American bank with a Uruguayan guarantor and a German construction company, where something falls down and everybody starts suing each other. On the other hand, it's a German guy marrying a French girl, moving to Denmark, having two kids and getting divorced. The amount of money that either of those groups of parties is willing to pay to get their questions answered is actually quite considerably different, and the means by which they pursue those answers are likely to be quite considerably different, and unfortunately the only mechanisms we really have in place at the moment serve the high end of that market, not the low end. A final recent development that I think is really, really important, at least to information scientists, if nobody else, is the amount of activity that's been generated by regulatory harmonization in the EU. These guys are all trying to figure out how to find each other's law.
They're developing ontologies, there are very interesting retrieval systems being built around this stuff, and there's money for it because everyone knows that it's very much in their economic interest to have this stuff available in some kind of open standards regime. So typically startups of open access to law operations have begun by advancing a series of essentially normative arguments. It's the right thing to do. Access to law is a fundamental human right. It allows us to police government operations. And that cluster of assertions around the idea that ignorance of the law is no excuse and therefore the state has some obligation to actually tell you what the law is. These are all good arguments. They're all very common arguments. They're not the only arguments, because there are a series of pragmatic arguments that surround this and in some cases are more compelling and can form a very interesting basis for advocacy. Trade facilitation. If I want to sell something outside the United States I need to know what the regulations are. If I am outside the United States and I want to sell something in the United States I need to know what the regulations are. Government efficiency, government cost, government waste, the same set of arguments Carl was advancing a moment ago. We spend a lot of money on this stuff and we really don't need to be. Risk management for business. There's a tendency to think that this is all about formal legal research or privacy or this or that or the other thing. There's an enormous amount of what I would call casual professional legal research that goes on out there, that is, some business person trying to figure out what she can do or what she is required to do. The people who use our system, 90,000 of them every day, walk up to it asking, what am I expected to do? That's the question they're really asking. What am I expected to do?
And of course, the benefit of reducing costs of litigation and legal services, simply because we help to drop research costs and presumably make the whole process more efficient. There is, however, one surefire argument. I'm involved in a study right now being run out of Canada that involves the Canadians, an operation in India, one in South Africa, and one in Burkina Faso, and it's caused us to review best practices of open access operations for the last 10 or 15 years. The one thing that is common to everyone's experience, and the one argument that has advanced the ball considerably at the startup phase of every single one of these operations, has been the idea that government's access to its own work product sucks. It's just horrible. They know it's horrible and they want better access to their own stuff than they've got, and that's been absolutely true everywhere globally and it is certainly true in the United States. That's the one thing we all have in common. What are the differentiators? Well, if you look around the landscape of what these guys are actually doing, you find very different kinds of operations, and there are models there that I think can be sort of differently niched in a way that's helpful. One differentiator is simply how people focus, right? There are prototype operations like ours that don't pretend to be comprehensive. We don't offer comprehensive anything. We have the US Code. We have Supreme Court decisions. That's not all of federal law. That's not all of statutory law. It's not state law. There are comprehensive operations out there, like the Australians, like the Canadians, that are de facto full-blown national resources. There are people who self-publish. The New York Court of Claims is one. The Supreme Court is one. I mean, every federal court, theoretically, since 2002 is one. Versus those publishing others: third-party operations that are putting up other people's stuff for free, like John is doing with all of New Jersey cases at this point.
Or all of New Jersey case law, or much of New Jersey case law?
Well, we don't have a back file yet.
There are people who are stovepiped on particular types of things, opinions versus statutes versus regulations. In general, legislatures, who have to be elected, have tended to step up to open access faster than the courts have. And they are very, very, very avid self-publishers. Differentiation of scope, right? National versus not. Full national scope versus not. That's the sort of thing I was just talking about. Centralized operations that imagine themselves to be huge sort of collection heaps of stuff that they gather up from others, versus people who are building federated systems on open standards or who anticipate the existence of open standards and are building their systems in a way that is federable. There are people who are operating on a sort of permanent, ever-growing basis, I'm just going to keep piling up case law indefinitely for as long as I can, versus people who imagine themselves to be much more technology transfer organizations. This is the fascinating thing about the South Africans. At the moment, they're doing work for 15 southern African nations. Very well aware that those nations are sensitive to what they see as imperialism from the South. Very well aware that for them, law is a very profound source of national identity. Very well aware that most of their collaborators are about two minutes post-colonial. And turning around to them and saying, look, we'll help you scan this stuff. We'll help you put it up. We'll aggregate the collection. We'll make it work. And at the point where you've got your operation up and running and you're standing on your own two feet, we're going to pass it back to you. It's a specifically devolutionary model that I think is absolutely fascinating and brilliantly suited to their situation. Finally, the biggest differentiator of all: sustainability models. How the hell are people paying for this stuff?
Well, first of all, mostly they're not. We're all sort of living on a public dole to some extent. There's stuff that's self-operated by the law creator. There's stuff that's grant funded, which as we all know is easier in the establishment phase. People are much more willing to pay you to put stuff up than they are to pay you to maintain it. Australians came a terrible cropper on that about two years ago. They began on a series of research grants, discovered that they could get people to pay them to put stuff up, and ended up with what I thought of as a kind of Roman conquest model. They're funding operations at the center by taking on new territory. And at the point where their research grants crashed, because guess what? They weren't really doing any research. They were in terrible, terrible trouble. This is a model to be avoided, but it's one that's very easy to fall into if you're operating from an academic base. Stakeholder supported. That's what the Australians have moved to. It's kind of an extortion racket, actually. They have put up the names of the largest law firm users of their service right next to their donor list. And to the extent that the two are not the same, they are sending development officers around to talk to them. Those are not large hairy guys. Actually, she's a very nice woman. But there's this sort of implied threat there that I think is kind of interesting. The Canadians have a $30 head tax on every lawyer in Canada. That's why they have a staff of 45, and I don't. We tend to work on a combination of grants and a public donation model, the sort of NPR model. And then there's a raft of operations out there that are providing open access on some cross-subsidized basis. Ed is doing that at Fastcase. Our friend Tim Stanley is doing it at Justia. These are all people who are using other core business to finance the free distribution of legal information. The final thing, standards awareness in these operations is really, really uneven. 
A lot of early implementers, us included, were very much full speed ahead without any awareness of what anybody else was doing, without much concern with it, frankly. It was like, let's get the stuff up there now, now, now, now, and nobody really thought about how that was going to play out in the long term, what it had to interoperate with, what it had to work with, et cetera, et cetera, et cetera. And frankly, a lot of the community internationally has been a long time catching up on this. Another thing that has tended to happen is that a sort of extreme process orientation of lawyers, plus a great belief in exceptionalism, both for law in general and for particular jurisdictions in particular, tends to lead to kind of a bad result when it comes to standards. Well, you know, we're not like anybody else. Our law is different, or our process is different, so our metadata has to be different. Yeah, that's never true. It's no truer of law than it is of anything else. There is real naivete in some quarters about what a long-tailed problem we're dealing with, because there are times when it does seem like the exceptions outnumber the rules, but there's still quite a lot that's possible. I think there's great fear, certainly among people like myself, of the standards process as a kind of tar pit. How long are we gonna be involved with this? How long is it gonna take? Can't we just put some stuff up? And, also, one needs to exercise a little judgment there, and I think what has happened in general is that the newer operations, particularly those in the EU where they have had funding and an economic incentive to deal with standards-based stuff, have tended to be better about this. So for example, we've got a workable URN spec for law coming out of Italy at this point. There are guys in the Netherlands who really started out as law and AI types who have done a pretty good interchange model for XML markup of legislation.
There's a bunch of stuff like that floating around, and we've been very slow to put that together and put it out where the community can really benefit from it. And that's it for me. That's just stage setting, and I'll turn it over to Steve.
I'll try to turn down this mic a little bit to see if we can avoid some of the feedback, but if you can't hear me, yell at me. So I'm really honored to actually be up here with these guys. I and the RECAP team are newcomers to this space, and our project is in some sense less ambitious, and not as far-reaching. But I think we have an interesting story to tell about how our efforts around liberating federal case law have worked out so far. I guess we would be a prototype model in your taxonomy, so, how our prototype has worked out so far. So from the outset, I wanna give credit to the other folks in the room who have worked on the project. Harlan Yu and Tim Lee are in the room somewhere. There's Tim, and yes, and Harlan's in the back. They're both graduate students here and did the bulk of the coding work, and Ed and I helped to direct traffic. So I think the bigger question is, how do we translate some of these age-old principles of open access to the law into this new space where the media of the law is no longer paper and physical courtrooms, or at least not exclusively? How do we translate it into this space where so much of this is digital and expected to be accessible online, if our knowledge of the law is expected to be attained through these means? What does that look like? What does the courtroom of the future and the output of the courtroom of the future look like? This is a diagram from the Los Angeles Times from some time in the mid-90s where they asked an artist to envision what a courtroom of the future would look like. And as you can see in the middle, there's this bizarre, mystical being surrounded by laser discs, I think, which is dispensing justice by a laser beam or lightsaber or something.
And there's a robot bailiff, and around the edges, they talk about the various means for the public to gain access to the courtroom and the proceedings. So, for example, hovering cameras in the courtroom. Cameras in the courtroom is of course an active issue right now. We just saw the decision of the Supreme Court about allowing cameras in on the Prop 8 trial in California. This particular artist thought that there would be hovering cameras covering every aspect of the court proceedings. The artist also envisioned these VR goggles for members of the jury so they could see exactly what happened. And they envisioned computer terminal access to court records. And we in fact got that one. It's called PACER. So PACER stands for Public Access to Court Electronic Records. Most attorneys in the room have probably had to deal with this system at some point. It's run by the Administrative Office of the Courts, which is this interesting sort of grafted-on administrative body to our Article III courts. And the access is really an access program which sits on top of a larger system, which is a phenomenal electronic federal case filing system that attorneys use in the normal course of litigation. And in fact, since a couple of weeks ago now, all district courts and circuit courts are requiring e-filing for all proceedings. So everything in our federal court system is running through this system, which is fantastic if we're trying to think about unified ways to provide electronic access to case law, at least federal case law. And it's amazing that this system was built and it exists. It was started in the late 80s by the courts because they on their own decided that they needed to upgrade their systems. It was a dial-up system for a long time and then they eventually migrated it to the web. But there are some problems, specifically with respect to open access. The easiest one to pick on is the paywall barrier.
So in order to fund the system, Congress told the courts that they would like them to charge for access, because Congress decided that they weren't willing to outlay the necessary funds to build and run the system every year. And they told the courts that they should charge the amount that it costs to run the system. And the way that this was implemented is by charging a page-by-page fee; at this point it's eight cents per page. And so if you want access to these records, you have to give the court your credit card number, and every time you view a document you'll pay eight cents a page. You'll also actually pay eight cents a page for every virtual page of a docket that you look at, and you don't know how big the docket is before you look at it. And you also pay eight cents a page for every search that you wanna do, including search results that say there are no search results. So as you can imagine, this erects some barriers to open access. There are also issues with the interface of the system itself. There's no full-text search. There's no document authentication. We talked a bit about that yesterday. The documents themselves aren't structured, including the dockets. So there's no easily machine-readable way of parsing them. And the user interface is just not really that great. Now these might not be issues if the underlying data were accessible and someone else could build a better interface, but in order to preserve the revenue stream which funds the system, the courts have not made bulk data available in any fashion. There are also tremendous privacy concerns. Carl actually has done the most work in this area when he did his huge privacy audit of PACER documents. As it turns out, you can't just draw a black box over text and let it magically go away. And there are some, I wouldn't say complicated rules, but there are a series of rules for things that counsel should not allow in filings, but that nevertheless make their way into these filings.
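To put rough numbers on the per-page fee structure described above, here is a small worked sketch in Python. Only the eight-cents-per-page rate and the fact that documents, docket pages, and search results are all billed come from the talk; the function name `session_cost` and the page counts are made up for illustration, and the sketch ignores any caps, waivers, or exemptions that the real fee schedule may include.

```python
from decimal import Decimal

# Rate quoted in the talk. Decimal avoids floating-point rounding on money.
FEE_PER_PAGE = Decimal("0.08")


def session_cost(document_pages, docket_pages, search_result_pages):
    """Total fee for one research session under a flat per-page charge.

    document_pages: list of page counts, one per document viewed
    docket_pages: number of docket "virtual pages" viewed
    search_result_pages: pages of search results, including empty results
    """
    billable_pages = sum(document_pages) + docket_pages + search_result_pages
    return billable_pages * FEE_PER_PAGE


# Hypothetical session: one 300-page filing, one 12-page order,
# a 5-page docket, and two pages of search results.
cost = session_cost([300, 12], docket_pages=5, search_result_pages=2)
print(f"${cost}")
```

Under these assumptions a single 300-page filing like the one mentioned earlier in the session would cost $24.00 on its own, before any docket views or searches are counted.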
And the courts haven't taken an active role in policing this stuff, although they've taken a somewhat more active role in policing and at least publicizing the issues after Carl embarrassed them. A lot of this changed; a big development was in 2002 with the E-Government Act. The text of the statute was changed to say not that the courts must charge, but that they may charge only to the extent necessary. And in the accompanying report, they noted that they intend to encourage the Judicial Conference to move to a fee structure in which information is freely available to the greatest extent possible. And I think Mary Alice had a thing or two to do with that particular language. The reality is that if you look at the judiciary's budget, and this is their report to Congress for what they plan to spend in 2009, this small exception for charging to pay for the system became a foot in the door to fund all sorts of other things in the judiciary. So if you look at their actual records, they take in about $110 million in public access fees. And all of this stuff in red is not clearly related to providing the service for which they're charging. In fact, some of it is only very indirectly related to public access at all. So the question is whether or not they're not just violating these principles of open access, but in fact violating the statute. And Lieberman asked them about that. So our solution, and our prototype system for both improving the situation on the ground and in some sense advocating to the courts for change in the system, is a Firefox plugin called RECAP, which these guys worked on. In fact, they came up to me after I was doing one of these pitches about how much I like PACER, but dislike its shortcomings.
They came up and they said, well, why doesn't someone build a Firefox plugin that every time you download one of these non-copyrightable public works, it uploads it to a server for anyone to download for free? And I said, well, that sounds like a good idea. They came back to me a couple of months later and said, hey, we have a prototype. So that is exactly what RECAP does. There are two parts. When you download a document from PACER, it automatically uploads it to the Internet Archive. And we actually parse the HTML from the dockets and the case information and turn it into, and this gets to a standards question, which I would love to discuss, we turn it into well-formatted XML. It's an invention of our own, this XML format, and it's probably got problems with it, and there are probably standards out there or people who would like to write standards who could tell us how we should be doing it in a more standard fashion. But at the very least, it's much easier to transform this XML than the original source HTML. So that's one half of what the plugin does, upload. And then the other half is that when you're browsing the PACER site, or any of the many PACER sites for each of the individual courts, and you're looking at a docket, if one of the documents listed is available for free, it will insert this little icon and say, hey, you can download this for free instead of paying for it from PACER. This is on the PACER site itself. The plugin is just rewriting the page as you're looking at it. So we've had a good reaction, a lot of users. We imported some big sets of PACER data that already existed. It's far from comprehensive, but it's showing that there is an alternative way of doing this. And it's getting the attention of the Administrative Office, which is useful for our second goal, which is broader advocacy. So as far as recommendations for the Administrative Office, in a perfect world, this is what we would like them to do.
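As an illustration of the docket conversion step described above (scraping docket HTML and emitting XML), here is a minimal sketch in Python using only the standard library. It is not RECAP's actual code or XML format, neither of which is specified in the talk. The three-column table layout, the `docket` and `entry` element names, the attribute names, and the sample case number are all assumptions made for this example.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class DocketTableParser(HTMLParser):
    """Collect the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows made of <th> cells stay empty
                self.rows.append(self._row)
            self._row = None


def docket_html_to_xml(html, case_number):
    """Convert an assumed (date, number, text) docket table to XML."""
    parser = DocketTableParser()
    parser.feed(html)
    root = ET.Element("docket", {"case": case_number})
    for row in parser.rows:
        if len(row) != 3:
            continue  # skip rows that do not fit the assumed layout
        date_filed, number, description = row
        entry = ET.SubElement(
            root, "entry", {"number": number, "filed": date_filed}
        )
        entry.text = description
    return ET.tostring(root, encoding="unicode")


# Hypothetical docket page fragment.
sample_html = """
<table>
  <tr><th>Date Filed</th><th>#</th><th>Docket Text</th></tr>
  <tr><td>01/04/2010</td><td>1</td><td>COMPLAINT filed by <b>Plaintiff</b></td></tr>
  <tr><td>01/19/2010</td><td>2</td><td>ANSWER to Complaint</td></tr>
</table>
"""
print(docket_html_to_xml(sample_html, "1:10-cv-00001"))
```

The sketch silently drops rows that do not match the assumed layout, which is exactly the kind of scraping mistake a court-published machine-readable format would make unnecessary.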
And I think these can probably be extended more broadly to a lot of the primary legal materials that we're talking about. In the short term, we think it's reasonable to make PACER searches, dockets, and opinions available free of charge to all users. These are works of the court. They are the most important elements for understanding what the law is. And in fact, the courts themselves have tried, to some extent, to make these things available for free, but it's been in a very haphazard way on a court-by-court basis. We think it's equally important that they initiate an in-depth privacy study around these personal information issues. And to that end, Carl has a whole series of recommendations for them in his report. And it's equally important that they sign all documents so that we can digitally authenticate them and we can assure our end users that what they're getting is an authentic copy of these documents. Big guys like Lexis and West can rely on their reputation to essentially guarantee to their end users that something is authentic. But for anyone else to provide these, an attorney would in fact face some risk around malpractice if they were relying on something which they didn't have good reason to believe was authentic. So in order to provide those assurances, we need the original documents to be authenticated. In the medium term, it would be great if they also started offering dockets in a reasonably formatted machine-readable format so that we didn't have to scrape the HTML out, sometimes make mistakes, not get all of the information, and so that other people out there that want to do interesting creative stuff can do it as well. The same would be true of standard RSS feeds so we know when things are new. And we would love them to ask Congress for the money to pay for the system. 
And that might be the most difficult thing to persuade them of ultimately, and that gets to some very interesting political questions, whether or not they should be asking for more money to pay for public access or asking for more money to keep judges from getting shot. Sometimes if you frame it in that way, it's difficult to always make the case that public access overrides safety of individual judges. But you don't have to make the case in that way. And the ultimate goal is that by the end of 2010, in our perfect world, assuming Congress gives them money, they can phase out PACER's paywall altogether and begin providing bulk access so that we and others out there that want to provide this information and find innovative new ways of letting people search and do other interesting things with it can do so. And so that those barriers that Carl was talking about are substantially lower. So that is what we've done and I'd be happy to talk more, but I think it's best that I turn it over to John and we can take questions afterwards. Oh, and you want this back, right? Two slides for me, just notes. My name is John Jorgensen. I'm a librarian at Rutgers University School of Law in Camden. For those of you not familiar with Rutgers, Rutgers has two law schools, one in Newark, New Jersey, up in the north and then Camden down in the south. I'm reminded often that it's important to distinguish between the two. What I thought I would do today is first describe to you the kinds of things that we've been doing at Rutgers since the late 90s, for the purpose of just describing what our experience has been in publishing state legal materials. And I think just telling the story can get out a lot of the issues and how the issues have been addressed just in the state of New Jersey, as a case study perhaps, or just as something that would probably happen typically anywhere, because the issues are pretty much the same anywhere. 
To the extent that we have time, what I'd then like to do is maybe talk about some of the issues that were raised and discussed in a more academic sense yesterday and talk a little bit about how we've dealt with them in a practical manner. And then at some point, I'm sure Carl will tackle me because I'll be talking too long, but we'll see how far we get. So I started working at Rutgers in the late 90s, about 97. I was very impressed with the Cornell Legal Information Institute as a brand new librarian who had just stopped practicing law. And Rutgers was a tenure shop for librarians. So they hired me and told me, you have six years to do something really interesting or we'll fire you. And I thought, fabulous. I can do something really interesting. I wanna do for New Jersey what Cornell is doing for the US Supreme Court. And they said, well, we already looked into doing that and they said no. And I said, well, can I still try? Can I do something about it? Can I talk to the dean, please? And there was shaking of heads and they said, yeah, go ahead and talk to the dean. The dean thought it was a great idea for his own purposes, of course. I mean, you know, if this worked, it would be a great coup for the law school. And that's what he was thinking about, quite rightly. And he said, write me a letter. So I wrote him a letter and he signed it and he sent it to the Chief Justice of the New Jersey Supreme Court. The Chief Justice is not the person who originally said no, of course. The people who said no were the people who were maintaining the records in the Administrative Office of the Courts. The Chief Justice said to the dean, this is a fabulous idea. Let's do it. And then it happened. And then I got a call from the people who originally said no, saying come here and we'll make arrangements. And arrangements were made and we were off and running. And it's worked out really well. They've been making things available. 
And at this point, the New Jersey courts actually post their own decisions, but they keep them up for about two weeks because that's all they really wanna commit to, as Professor Nissenbaum was talking about in a slightly different context yesterday. They're not in the business of publishing and archiving. They're in the business of deciding cases. So they have their public face but they rely on us quite happily to be their archive and long-term public access place. And it's worked out really well. Now, to take a step further, the next thing that happened in our experience was something that happened outside of ourselves. There was a small publisher called Barkley Publishing which published the New Jersey administrative law reports, a fairly obscure publication, but for the people who use it, it was extremely important. Administrative law governs a lot of our lives. This was a key series of documents that lawyers use all the time in New Jersey. Since it's fairly obscure, it wasn't a big money maker for Barkley. And then Barkley was purchased by the West Publishing Company, which then immediately canceled the title. I heard about that and again, I have to do something really interesting in the next six years. So I wrote a letter to the chief judge of the administrative courts and we actually crossed letters, and the day after I mailed mine, I got a letter from them saying, would you come down here and talk to us about maybe doing for us what you're doing for the courts proper? It worked out very well. What happened? Word got out. It was a reliable thing and they were out of options and it worked out really well again. Less than a year later, I got a call from the New Jersey District Court. Same thing again. And it went on and on and on from there. I did some really interesting things and things went well, went well for me personally. And then I got tenure. Yay! 
But the point of mentioning this is not so much me, but the institution. Once I was done with all this, they had been giving me lots of time and a good deal of, it wasn't above the line money, it was below the line money. I was spending time and resources and computer space and a lot of room on our T1 lines doing all this. And when I was done and the project was finished and I had what I needed and I proved to the university what they needed to prove, things didn't stop. I was then told that this is a part of library operations. You may now call yourself, as a matter of fact, the Digital Services Librarian. Thank you very much. I have it on my business cards too. Um. But the important thing about this is that it's part of the institution now. And that's something that we do. The collections that have been built are part of the library collection. And that's how the institution thinks about it and that's how the system functions. When I was thinking about going to library school, there was a great medical librarian who was, he's still the head of libraries at Thomas Jefferson University. He told me, his idea was, what's the difference between a library and a publishing house in the electronic world? And his answer was a very pithy attitude. And that's something I've taken with me. And so that's how things have worked out with us. And I think it says a great deal about how dealing with bureaucracies is, again, who's surprised by any of that, but it's how things have worked out for us. As to some of the issues we were talking about yesterday, authenticity is something that we actually take extremely seriously. And we've had to deal with it with the courts. Now, I don't wanna sound like a conspiracy theorist, but authenticity can be a very dangerous issue when you wanna think about open access and transparency. And it scares me. Maybe I am a conspiracy theorist, but it's a positive danger because it can very easily be used as a proxy for control of information. 
And I think we need to get away from using the word. And if you think about the way information is used, particularly in my case, the way legal information is used, you look around and, I mean, yesterday, there was a very significant Supreme Court case decided. Think about how much of the information that the judges on the Supreme Court actually used was, in fact, authentic by the kinds of standards we were talking about yesterday. And the real answer to that is probably none. How much of the information did the lawyers who wrote up all those papers and had the goal to submit them to the Supreme Court of the United States? How much of that was verified authentic information that they were relying on? None. They got it all off of Lexis and Westlaw because that's what everybody uses now. The clerks who did all the research for the US Supreme Court judges used Lexis and Westlaw. There is absolutely nothing on Lexis and Westlaw by definition that is original or authentic by the standards that we've been talking about. What I'd like to suggest, and what's wrong with that? There's nothing really wrong with that unless you have a real good reason to doubt the accuracy of that information. Now, I think that's a much better word, accuracy. It was discussed yesterday in terms of the CFR. There's information available with the stamp on it that's being verified, that's in PDF format, and then there's the bulk XML. Is that bulk XML truly authentic in the real sense of the word as it's used? Actually, yes. Is there any reason to doubt that the words in there are anything but the accurate words? And what's authentic in that sense? It's an authentic recreation of the CFR. It's very difficult to prove its authenticity. And in a sense that might be splitting hairs, but it's an extremely significant distinction to make when you're starting to talk about distribution of information. Because any copy is not going to be authentic in a very strict sense. 
The information that the GPO actually has on its servers that comes from somewhere else is not authentic. Immediately after you put the little blue eagle that has the digital signature onto that document, it's no longer authentic in that very strict sense. But can it be thought of as a definitive document? Well, yeah, it's a much better word, definitive. But the idea here is that redistribution of information is something that's awfully important and awfully necessary and is awfully endangered by the idea of authenticity, strictly speaking. And that's not the way information is used anyway. So let's not get stuck with it. Why don't we instead think about a definitive copy kept somewhere where we can always find it? And then we can have a whole bunch of authentic recreations that can have value-added features and that can be widely distributed, which of course brings me to my second point, which I don't think was discussed quite enough yesterday: the idea of ubiquity. It's an old idea. Thomas Jefferson wrote about it when he founded the Library of Congress. He said, let's establish libraries to redistribute information to as many places as possible. Why? Because that's the way information survives, through distribution. We have a real big problem nowadays in libraries and in information generally because with the internet, we have what I'd like to call apparent widespread distribution. Everybody can get access to the data through the internet. But at this point, we have a severely restricted number of points of failure. The number of repositories is going down to one. Of course, the danger is when it goes down to zero. The GPO is extraordinarily reliable. Why would we ever doubt the GPO's ability to make material available, except when it happens? I mean, we're not talking about a reasonable expectation during the next year. What we're talking about is 50 years, 100 years. 
And the joke I usually tell when I get to this point is talking about my Danish heritage and being chronically depressed and thinking about how we're all going to die. What about when we're dead? Do a little Strindberg thing. That's the real question. It's sweet. It's still, it's dark all winter anyway. It's still the real question, right? We want to keep this information forever. And how are we going to keep it forever if we have a single point of failure? We can't keep it forever with a single point of failure. It's gotta be widely distributed. The government can have a point of access, but the idea of having government bulk access for widespread redistribution, that's the way things will survive. Why am I doing what I'm doing? Google's got it, Westlaw's got it, Lexis has got it. They've got it better than I offer it. I offer it for free. There's an economic reason to have it because people can get it for free, but there's also my own position as a traditional librarian. I need to collect material and I need to do it in a way in which I can get it and I can preserve it. So I do it electronically. We talked about privacy as well. Do I have time? Come over here. Okay. Privacy and sustainability. Actually, just finishing up authenticity. I've got my note here. So that's my issue with authenticity and my issue with its importance for the issue of ubiquity. As for our own approach to this issue, again, I said earlier that we do take authenticity very seriously. The approach I have for it is, like I hinted at before, more of a weights and standards kind of approach. We keep a definitive copy whenever we can get a definitive copy. There's gotta be something there that we can compare things to. And what we do with the material that we harvest and collect is we retain provenance information. When I download a Word document of a court decision, the first thing we do is run an MD5 sum on it and we store that. 
And in fact, we embed it into the document because you can't have a survivable archive without embedded metadata. The coders here may understand that. But we immediately run that MD5 sum and then we also stick in tags that put in the date stamp of when we changed it and a short explanation of what we did with it. In the case of our New Jersey court decisions, we convert them to HTML, we put in hyperlinks for all the citations, and then we insert all of those meta tags. So the explanation is there, what we did, when we did it, and then we run another MD5 sum that's not stored in the document, of course, but we keep it in a database. So you can trace the material back. Is that perfect? It's not perfect. It's the best we can do for free, just having the machine do it all for us. And it may be subject to some evolution when somebody comes up with a better idea. But if you have any reason to doubt our material, again, the question is, too, are we concerned with fraud or with accident, right? And in the case of the vast majority of people who are going to be redistributing, the question is not fraud, the question is mistakes and accidents. And if you have an explanation of what you did to the document and when you did it, and a new set of MD5 sums, you've got a pretty good, again, not perfect, but a pretty good way to trace the material back. And I think that's more than good enough to justify redistribution. And again, doing value-added things. We get Word documents, which are fairly useless on the internet, and we make them searchable. We have tags in there that make long-term archiving of the material practical. And we put in lots of hypertext links. All the citations are hypertext links. It's important to do, and so we do all that, and it changes the document in a good way. And that should be allowed. Sustainability and privacy. Maybe sustainability is the good one to talk about first. We've talked a lot about how to fund these kinds of efforts into the future. 
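The provenance routine described above (checksum the file as received, embed that checksum plus a date stamp and a note about what was changed, then checksum the published version and keep that second sum outside the document) can be sketched as follows. This is a minimal illustration, not the Rutgers implementation: the function names, the `meta` tag names, and the dictionary standing in for the database record are all assumptions made for the example. MD5 is used because it is the checksum named in the talk; it catches accidental changes, which is the case the speaker is concerned with, but it is not considered safe against deliberate forgery.

```python
import hashlib
from datetime import datetime, timezone
from html import escape


def md5_hex(data: bytes) -> str:
    """Return the MD5 checksum of the given bytes as a hex string."""
    return hashlib.md5(data).hexdigest()


def publish_with_provenance(original: bytes, body_html: str, note: str, when=None):
    """Wrap converted HTML with embedded provenance metadata.

    original  -- bytes of the file exactly as received (e.g. the Word document)
    body_html -- the converted HTML body (hypothetical output of a converter)
    note      -- short explanation of what was done to the document
    Returns the published HTML and a record to be kept outside the document.
    """
    when = when or datetime.now(timezone.utc)
    source_md5 = md5_hex(original)
    # Hypothetical meta tag names; any stable naming scheme would do.
    meta = (
        f'<meta name="source-md5" content="{source_md5}">\n'
        f'<meta name="converted" content="{when.isoformat()}">\n'
        f'<meta name="conversion-note" content="{escape(note, quote=True)}">\n'
    )
    published = f"<html><head>\n{meta}</head><body>{body_html}</body></html>"
    # The second checksum covers the published file, so it cannot live inside it.
    record = {
        "source_md5": source_md5,
        "published_md5": md5_hex(published.encode("utf-8")),
    }
    return published, record


def is_unaltered(published: str, record: dict) -> bool:
    """Check a published copy against the checksum stored in the record."""
    return md5_hex(published.encode("utf-8")) == record["published_md5"]
```

If a published copy is ever changed by accident, `is_unaltered` returns False, and the embedded `source-md5` value still identifies which received file it was derived from.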
And I really wonder about that kind of thing, but as a practical matter, the thing that my mind always comes back to when I think about the sustainability issue comes right back to what has happened at Rutgers, right, and it's the point I hinted at earlier and why I actually mentioned it, is that after we've been doing this for a while, it's become a part of the institution. I personally have never gotten a grant for any of the things I do. We've gotten into digitizing congressional documents. We're getting close to three million page images on that. No grants. It's a part of library operations. Why do I mention that? Told you why I mentioned it, but to go on with that, we have a budget of something around, and it's a fairly small specialized library within a larger institution, but we have a budget of about somewhere in the neighborhood of $2 million a year. The university gives us that budget because they expect us to collect, maintain, and provide access to a repository of information. Two million bucks a year to do that. Quite frankly, they don't care so much whether it's in paper or in electronic format. Quite frankly, they don't wanna build us 20 million dollar buildings anymore to store the print that keeps on growing, so they'd prefer the electronic format. So what are our options as a library? Well, we can go the way of a lot of libraries and start subscribing to a lot of commercial databases and very rapidly become what I would like to call and they yell at me for saying this, but very, very well-paid password administrators which of course is going to get questioned at some point and there's going to be fallout, or we can start taking our traditional role as a library very seriously in the electronic world and do what we've always done, collect, maintain, and provide access, but in the electronic world. Now, how is this gonna work out in the long run? Well, I've got my idea of how my institution can survive, the law library at Rutgers-Camden. 
Other libraries may go along the same lines. There's lots of social institutional resistance, but it may happen. What's gonna happen with legal information institutes? I suspect that the idea of having a legal information institute at Cornell is something that the dean is quite happy to have. On good days. On good days. My dean's very happy with me because we get lots of hits on the website from what I do. But something like that. Is it going to be a traditional library? Is it going to be a legal information institute that a university is going to support and make part of its institution as a repository that they wish to maintain for their own academic purposes? Well, there's money in that. There's long-term money in that. And hopefully it's one of the things that'll actually work out in the long run. Because if you think about who's in the market, the courts are not in the market for sustaining this information. They're in the business for deciding cases. Libraries are in the business and get paid for maintaining the information. Whether you call it a library or not, it's kind of a secondary issue. But I'd like to suggest that's a way to think of this and a way things may go. I'm probably out of time. So, thank you for your time. Chris, please hold them up. Nick, what are you talking about? I have a question. Go ahead and write it down on the next card. We have some here on Google Moderator. I will start with the first one for Tom, of course, who is a theater major. So let me quote Shakespeare. First thing we should do is tax all the lawyers. The question is, would a $30 per lawyer tax to support these efforts be feasible here in the U.S., and how would that have to happen? All right, well, let me take that question apart into two pieces. One is $30, and the other is head tax on lawyers, which actually speaks to the business model by which the Canadians worked. 
Ed Walters can actually say much more than I can about the price sensitivities of bar associations in terms of per-lawyer costs. What the Canadians actually did was quite interesting. CanLII, the Canadian LII, is, in fact, a nonprofit corporation that is a consortium activity of the 13 provincial Canadian Bar Associations, and its board contains a representative from each of those bar associations. They are the ones charging the tax to their own membership. They then contract with a research group at the University of Montreal to actually build and maintain CanLII, the website. The research group at the University of Montreal is actually in the process of incorporating separately from the University for a bunch of reasons that we need not go into. And the question is, could we replicate that structure in the United States? From the business model perspective, probably not. The Canadian Bar Associations are mandatory. They are in a unique position of strength with respect to the actual lawyer population in Canada that bar associations in the U.S. are not. A more cynical person than myself once characterized American bar associations as organizations of people who want an organization to provide a set of services that they believe they would have if they worked for large law firms, whether that's actually true or not, and that does, in fact, tend to be the case. So here, doing it through the bar associations is probably not feasible. It's not clear who else would institute that tax. It would probably not need to be $30, for what that's worth. There are not a lot of lawyers in Canada, and many of the costs of operations for something like CanLII are fixed, sort of irrespective of volume of data. So you would not, I suspect, need something that large. Could you get it to happen in the United States? Maybe. But I doubt it. It's the kind of thing that we tend to put more in the private sector. This is actually from Joseph Hall here. 
Steve, do you have any data on usage of the RECAP tool and archive? Yeah, so we have some, and I'll punt this also to Tim and Harlan to the extent that they have any more details. We don't have great data, partially because we have a fairly aggressive privacy policy where we commit to not gather data and to discard the data that we do gather after a couple of weeks. I think most recently when we tried to get some numbers, it looked like we had had on the order of 10,000-plus people downloading and using the plug-in. But I don't know, Tim and Harlan, do you have any? Yeah. Right. Yeah, a few. Yeah, so what Tim added was we get on the order of a few hundred documents per day. So the reality there is that we're not, by any means, keeping up with the full flow of everything that's out there. On the other hand, the stuff that we get tends to be stuff that people are interested in, which tends to be the more important stuff. Okay, this is from Susan Copeland-Wilson. John, how frequently do fraud or accidents occur in materials authenticity and how big a problem is this? Sorry, could you repeat that? How frequently do fraud or accidents occur in materials authenticity and how big a problem is this? You know, that's hard to say. We haven't had a problem at Rutgers with a failure of authenticity that would be anything like, you know, let me take a step back and my answer will probably make more sense. What do I consider an authentic copy of a document that's been converted from Word to HTML? Just to put it into some context here. To be authentic for the purposes for which we make the material available, it needs to have the words preserved accurately and enough of the formatting so that the context of the words is clear. What do I mean by that? Footnotes still have to appear as footnotes. Paragraphs need to be clearly demarcated as paragraphs, et cetera, right? 
So not to get into too many details, but that's really the essence of what we're talking about. We don't have a problem with mistakes with that. We haven't been hacked, at least at this point. And if we did, we've got material all backed up in remote locations in black boxes. But at this point, it's never happened. If it did happen, we'd bring things back really quickly because the sums wouldn't work out anymore. Can we both have comments on this? Yeah, I do, I do actually. You might broaden this question to look at two different kinds of things that both have to do with the question of what occurs in the system when accuracy fails, right? Or when accuracy is believed to have failed. To get an indication of what courts will do with stuff that's badly formatted, my favorite thing to point people to is a Supreme Court case called U.S. v. X-Citement Video, which is actually the Traci Lords underage porn case, which basically turns on white space and indentation. And like all other sorts of statutory interpretation cases that I've ever actually seen, the court found a reason to rule that had absolutely nothing to do with the formatting issue whatsoever. They do tend to dodge these when they come up. Then there's the much more serious question of what happens with general failures of legal research, someone who's looking at an inauthentic document, someone who's done it badly. Someone was talking, Steve, I guess, about this potentially rising to the level of malpractice. That actually turns out to be a really difficult question to research. We've pursued it a couple of times with professional responsibility people. The trouble is that there's no paper trail on it because typically these cases are sufficiently embarrassing for the lawyer that they are settled long before they reach any form of public record. So it's very, very difficult to go back and actually find a malpractice case anywhere that turns on this kind of thing. There could be a lot of them, there could be three. 
At the moment, there doesn't seem to be any good way to find out. Well, and how many cases actually exist is somewhat independent from the level of concern about the cases existing, right? I mean, if attorneys are too worried about it, then they're not gonna use the system regardless of whether or not that's a legitimate fear. So very soon after we launched RECAP, we started to get pushback from people saying, oh, I can't responsibly use this. So even if there's not a legitimate problem there, there's clearly a perception of the problem, which creates a barrier. Yeah, I'm quite cynical on this issue and I think that a lot of that just operates at the level of branding and not at the level of real concerns about authenticity. Because, as John is fond of pointing out, people who are relying on Westlaw and Lexis are not relying on stuff that's de facto authentic either. And when I wanna be a real smart ass about it, I say, look, we all know that in Paris there is a bar of platinum that is exactly one meter long. If I want to center a picture on my wall, I do not take my tape measure, walk to Paris and check it against that meter-long bar of platinum. My economic concern here is very different from what a fully authentic system will actually support, and a point that I think Carl is waiting for me to make at some time this morning is that in a lot of ways legal research is an insurance business. Lawyers are insuring against the loss of a case. People are trying to guarantee a result. People are trying to do risk management in the face of business concerns. And the thing that's kind of tricky about that is that despite the fact that law librarians and other service providers would like to push us toward buying as much of this product as we possibly can, people will actually only insure to the value of the goods. Nobody's gonna do $5,000 worth of legal research on a $100 case. It just doesn't happen. 
And so for that, I mean, that says a lot about cost and it says a lot about what the real value of the authenticity issue is in the long term. People have learned to use branding as a proxy for absolute guarantees of authenticity and they'll continue to do that. Yeah, I should say the whole point of law.gov is that we think that government should originate an authenticated stream of its own information. And that does make it much easier for these brands to occur later on that have the value added in the insurance. But by having government originate its own information, we at least have a shot at seeing that stuff replicate and duplicate. Because I absolutely believe that somebody at the Stanley Rule and Level Company hauls their ass to Paris and takes a look to see if their tape measures are accurate. I just don't want to do it. Question from Professor Nissenbaum. I have a question about what records you believe ought to be posted. Man, let me do that first and then I'm sure Steve is gonna have some stuff to say on this, or others will. So I believe firmly that the courts should decide what should be public and what should not be public. What I don't like is when the courts decide that they're going to avoid that issue and simply make it available on West and Lexis because anybody with a credit card can be trusted not to be an identity thief, and therefore it's okay to leave that information out there. And so I'm a firm believer that the courts need to face these decisions. And so your principles of redaction make a lot of sense. One of the things we found after the audit is that lawyers are now being brought in by judges when they put in social security numbers. And one was recently fined $5,000 for posting a whole bunch of social security numbers in a document. And so I think it's really important for the courts to face this issue. 
And there's been a security-through-obscurity theory out there for a long time that somehow it's been okay because only registered professionals have access to this information. And if you can't trust lawyers, who can you trust? And I'm just not sure I believe that. So anybody else on what should be removed? Yeah, go ahead Steve. And we should have a longer conversation too, because I'd really like to get more pushback, because I am generally of Carl's persuasion that what's on the public record should be out there and redistributed as much as possible, and that the courts need to take a more active role in policing those things and encouraging good behavior and punishing bad behavior. I think anything that doesn't violate the federal rules of procedure with respect to privacy should be made freely publicly available. And if the reality is that the transition from practical obscurity in the physical age to ubiquitous electronic access is making things more broadly available than people expected, then we have to go back and change those rules. Maybe along the lines of exactly what you're suggesting. But I think that the originators of those documents, the courts or counsel or some combination, need to make that decision up front so that we're not hampered downstream trying to make the call. But there is a moral obligation, and when Carl released his data dump of PACER documents, he released it with the, what did you call it, a moral license? Given the fact that the system isn't functioning perfectly, we have to do what we can to protect against those mistakes that happen. So we're having a vigorous internal debate about how search-indexable this growing archive should be. And Tim is actively working on automated redaction approaches to try to detect these things. So it is a difficult balance. Well, I think Peter Winn makes a good point. I'm sure you're familiar with his work as well, that it doesn't necessarily have to be a balance. 
It's a matter of aligning proper judicial information management, which properly deals with privacy issues and enforces good public access. Obviously there are different levels of sensitivity depending on what the information is. And Helen, I think I'm detecting a question about whosarat.com. Do you know about Who's A Rat? These are some guys who have been mining PACER for plea agreements in federal cases and publishing the results, with the strong presumption that anyone who's pleading out is making a deal with the cops. And a few jailhouse beatings later, the federal prosecutors were basically ready to seal all the records. So what's the net effect on government openness at that point? If they overreact by sealing every case that gets plea bargained, 75% of criminal cases disappear out of the federal system instantly. This is not a good result. And that partially goes to that question yesterday about whether sometimes transparency amounts to an unwelcome form of elbow jogging, because in this case it most certainly would be. We do find that the concern with privacy issues in general is differential across the levels of courts. We've never had much trouble with it that could not be characterized as collateral damage from poorly written judicial opinions. But I'm not publishing family court information either. I'm publishing Supreme Court decisions, where this stuff doesn't tend to bubble up to the top. My colleagues in Australia found themselves on the front page of the Sydney Morning Herald every day for 10 days with headlines like "Academics blow lid off family court." And they were quite unhappy with that sort of result. As a practical matter, the Canadians find that about 10% of the cases that come through their system, which is all of the sort of stuff at national and provincial level, need some kind of redaction attention along guidelines very similar to those that New York State has implemented.
And it's actually reasonably cheap for them to do editorially with the support of some automated tools. So as a... What does New York State do? I'm not familiar with... Oh Lord, you'd have to look back at the commission report that came out of the New York Court of Appeals. It was a Judith Kaye commission. What was interesting about it was that the composition of it was very good. They had New York Times reporters. They had someone who ran a domestic violence shelter. They really did a very good job of bringing stakeholders together around that. It's a nice set of recommendations. You can find it online. I'll circulate the URL via Twitter. But it's typical. I mean, there's a lot of work that's been done at the state level on this stuff. And some of it is pretty good. I think Peter Winn probably knows that stuff better than anybody. Peter Winn, by the way, is an assistant U.S. attorney in Seattle and is a noted expert on privacy. He's been one of the leaders in this area within the government. John, you had a brief comment on this? Yeah, I mean, I think at a certain point you should realize that we're really talking about what can be separated into two significant things. On the one hand, you've got things like social security numbers, the identity of a minor in a family law case, and things like that. And those are things around which bright lines can more or less be drawn. On the other hand, you've got more of the social event that's happening now, with material becoming available that just was practically never available before. All court documents are, again, by definition public documents and accessible by the public, but they were always protected, and there was always a privacy expectation that was enforced economically. You had to be able to afford to get somebody to go to the courthouse to make a photocopy in order to distribute that information, so it didn't get out. Only lawyers collected court reports, and so it didn't get out. So what do we do about that?
Or what can we do about that? Should the expectation of privacy alter now in our new world? Or is there something we should do about it? Quite frankly, what we did, I was talking before about administrative law reports, and they were in an obscure publication that only lawyers could afford to buy and only lawyers read. When we started putting them on the internet, Google started indexing them, and all of a sudden, not with the New Jersey Supreme Court, not with the New Jersey Appellate Division, but with the administrative law decisions, special ed cases, people getting fired from civil service jobs for abhorrent behavior and things like that, we started getting complaints. Our answer to it, quite frankly, was to set up a robots.txt file and exclude the collections, the body of the collections, from Google. If you put Joe Jones's name into Google, you won't get our administrative law case that shows that he got fired for sexual impropriety. If you put "New Jersey administrative law" into Google, you'll get our search page. And quite frankly, we're comfortable with that. Anybody looking to do administrative law research for New Jersey will find our page, and they'll be able to find Joe Jones's case because they're doing research on sexual impropriety firings in the administrative law system. But they're not going to be able to troll for information on Joe Jones. So we made a distinction and we made a judgment call. Should that be our call to make? No. The expectation-of-privacy issue is actually a pretty important one. Let me give you two quick examples on that. We made a copy of the copyright database, all the copyright registrations, and just made it available with permission from the Copyright Office, but it had not really been visible on the internet. About a couple times a week, I get notes from people saying, I Googled my name. You've got my name on your site, please get rid of it. And I look at it and it's a copyright registration.
And I had to make the moral choice, which is if it's a copyright registration, I'm sorry, I'm not removing your name, you registered it. On the other hand, we get people all the time writing about court cases, and we have the same policy that John does, which is if anybody writes us for any reason, in fact, we'd really rather not know the reason, we will remove a case and put it in the robots.txt file. We won't remove it from our system, it's still there, but it doesn't necessarily show up on Google. And we do something among us, which I don't think the big vendors do and I think the government definitely doesn't, which is we share our robots.txt files. If somebody writes to me that has an appellate case that they don't want to see online, and there's a serious issue, I'll make sure Tim Stanley knows about it, and Stuart Sierra and the others. And that gets to the final point on privacy, and then I think I want to move on, which is that shit happens. For example, we found 500,000 social security numbers that were published by the United States Senate in the Congressional Record. Every time they promoted a member of the military, they printed their name and their social security number. And it turned out if it was a senior enough member of the military, they printed their birth date, just so you know who's this General Jones and not that General Jones, going up to Major General. And this stuff happens. Now we managed to get the major vendors to remove all that. What we did is send a Federal Trade Commission complaint in and then carbon copy West and Lexis, and they kind of voluntarily redacted very quickly. But that stuff is still out there. And my point is that if you're going to protect privacy, you need a feedback loop, because things are going to happen. And one of our chief recommendations to the Administrative Office of the Courts is they need a chief privacy officer so that when something does leak out, people are able to contact the AO.
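[Editor's sketch, not part of the spoken remarks: the robots.txt practice the panelists describe, excluding a collection from crawlers while leaving the search page visible and pooling removal requests among publishers, can be illustrated roughly as follows. The paths, the example.org host, and the helper names are hypothetical, not taken from any panelist's site. The sketch uses Python's standard urllib.robotparser. Note that robots.txt is only a request to well-behaved crawlers; as the speakers say, the documents themselves stay on the system.]

```python
from urllib.robotparser import RobotFileParser


def render_robots(disallowed_paths):
    """Render a robots.txt body asking all crawlers to skip the given paths."""
    lines = ["User-agent: *"]
    lines += [f"Disallow: {path}" for path in sorted(set(disallowed_paths))]
    return "\n".join(lines) + "\n"


def crawler_may_fetch(robots_body, url, agent="Googlebot"):
    """Report whether a compliant crawler would be allowed to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_body.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical site layout: the body of the collection is excluded,
# while the search page is left open to crawlers.
OWN_EXCLUSIONS = ["/adminlaw/decisions/"]

# Hypothetical removal request passed along by another publisher.
SHARED_EXCLUSIONS = ["/appellate/2009/doe-v-roe.html"]

ROBOTS_BODY = render_robots(OWN_EXCLUSIONS + SHARED_EXCLUSIONS)

if __name__ == "__main__":
    print(ROBOTS_BODY)
    for path in ("/adminlaw/search", "/adminlaw/decisions/jones.html"):
        url = "https://example.org" + path
        print(path, crawler_may_fetch(ROBOTS_BODY, url))
```

[Merging the shared list into each site's own file is one simple way to approximate the informal sharing described above; it is an assumption about mechanics, not a description of what the panelists actually run.]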
And in return, they need to be able to notify the downstream vendors: whoops, we improperly unsealed this case, could you please remove it? And that again is why we think the government should be originating a feed of its own documents and not necessarily depending on the vendors to do everything for them. Now I'm not saying that the government should be running the ultimate search site, but this is why the government needs to produce its own work product. And that's the whole point of Law.gov. For John: you mentioned the coming catastrophe of single-point archive failures and contrasted that with the traditional librarian's attitude of hoarding against the coming dark ages. Is the right model in the digital world one of massive replication? I think we know your answer there. That's a setup, but. Yeah, my answer is absolutely. And it doesn't have to be quite the dark ages, right? As Carl so eloquently put it earlier, shit happens. And everybody's got problems. The thing I really worry about most is that the single points of failure that are arising in the information world right now are overwhelmingly tending to be commercial. So dark ages, well, maybe not a dark ages, but is something going to be not commercially viable? And do we want our intellectual and information heritage relying on a single point of failure that will depend absolutely on current commercial viability? My answer to that is you don't need a dark ages to say no. One minute left, and then a follow-up comment? Can I ask a follow-up question related to authentication? So how does that square with your concern about focusing too heavily on authentication of documents themselves? And if the only way to ultimately see whether or not this is an accurate document is to go back to the source, there's only one ultimate source. How do those two goals, your goal of not overemphasizing authentication but also avoiding a single point of failure, how do you square those then?
Well, you've got to strike a balance, right? What happens if the GPO crashes and there's, well, what happens if somebody loses that silver bar in Paris? I mean, that's the question, right? Well, you ought to have two silver bars, and we can think about that. I don't have a definitive answer to that question. I don't either. If there are any ideas. But I mean, I think that's a, I mean, that is a fabulous question. But I think practically, again, if you think about this, who decided that the silver bar in Paris is the meter, right? And if that bar goes away, what are we going to do? We're going to make a new one, and we're all going to agree that that's the new silver bar. Well, in effect they did that. It's now like vibrations of cesium or something, but yeah. Right, and that's what we'll do, practically speaking. The question is, will we have a definitive enough copy in order to do something like that should the definitive copy that we've decided on right now go away? And if you've got lots of really, really good copies that you know are accurate, that becomes possible and in fact very, very practical to do. And I think the answer lies very much along those lines. Well, and again, people do insure the value of the goods. So in the case of the important stuff, yes. I mean, if it's some property dispute between two private individuals in an obscure corner of New Jersey, not to put too fine a point on it, who cares? And on that note, I think we're out of time. So thank you very much, everybody. We'll be back here at 11, you've got 20 minutes.
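[Editor's sketch, not part of the spoken remarks: the idea in the closing exchange, that many good copies make it practical to agree on a new definitive copy if the original source disappears, can be illustrated by comparing content digests across independently held replicas. The holder names, the majority threshold, and the function names are assumptions made for illustration. Agreement among replicas is not the same thing as cryptographic authentication by the originating body, which would need something like a digital signature.]

```python
import hashlib
from collections import Counter


def digest(data):
    """Return the SHA-256 hex digest of a document's bytes."""
    return hashlib.sha256(data).hexdigest()


def consensus_copy(copies, quorum=0.5):
    """Find the version that more than `quorum` of the holders agree on.

    `copies` maps a holder name to the bytes that holder has on file.
    Returns (digest, sorted list of agreeing holders), or None when no
    single version is held by more than the quorum fraction.
    """
    if not copies:
        return None
    counts = Counter(digest(data) for data in copies.values())
    top_digest, top_count = counts.most_common(1)[0]
    if top_count / len(copies) <= quorum:
        return None
    holders = sorted(
        name for name, data in copies.items() if digest(data) == top_digest
    )
    return top_digest, holders


if __name__ == "__main__":
    # Hypothetical replicas of one opinion; one copy has drifted.
    replicas = {
        "archive-a": b"Opinion text, as issued.",
        "archive-b": b"Opinion text, as issued.",
        "archive-c": b"Opinion text, with a transcription error.",
    }
    print(consensus_copy(replicas))
```

[A simple majority is only one possible rule. Which holders count as independent, and how much agreement is enough, are the policy questions the panel leaves open.]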