I'm going to talk a lot today about what HathiTrust does around collections, in a couple of different ways: what we do with digital collections, and what we hope we can accomplish with our membership around physical collections as well. These are not the only things we could talk about with HathiTrust, but they are a couple of salient issues that I thought might be interesting for you all here today.
Very quickly, an outline of what I'm going to cover. I'm going to talk about some of the characteristics of our collection, but I will first address what this HathiTrust thing is anyway. I'm going to talk a little bit about a program we have to undertake copyright investigations and copyright review, and then I'll talk about a shared print program we have underway that is led by Lizanne Payne, our shared print program officer, who's sitting right there in the turquoise jacket. Let me get my timer on so I don't completely overstay my welcome.
So first off, HathiTrust: what are we? A lot of folks will think of us first as a website, as a place on the web where one can go and access digital content, historical legacy digital content, kind of like JSTOR, but not quite, because it's not all the same things. I want to acknowledge that absolutely no attempt to invest this website, this design, with human emotions was made during the design phase of this particular iteration of the HathiTrust interface. We're happy with it, though. I think it is important to acknowledge that yes, it's a digital library, but we think of ourselves as much more than that. We're very much an organization that is attempting to work deliberately above the level of our individual libraries in order to accomplish some things that we could not do independently and individually. So our mission is a library mission, but it's one at that higher network level.
Another way of thinking about what we do is that we're trying to cooperatively develop a common good: a set of services, activities, and collections that can be beneficial to the membership and help them bring more value to their own students and researchers, but also be beneficial beyond that. We are a trusted digital repository in the sense that we have been through a TRAC audit and we do operate a preservation infrastructure. In fact, I think of that digital preservation infrastructure as the base for all of the other programs that we undertake, which we think of as quite transformative.
We're a membership organization as well. We have about 126 members right now, and with the exception of our research center and some specific projects there, our funding is entirely from member fees. This is a current list of our members. I think it's important to acknowledge that while I'm talking about collections with a global reach, our membership does not look very global. It is very North American focused. There are seven institutions outside of the United States: four in Canada (British Columbia, Calgary, Alberta, and McGill), plus the University of Queensland in Australia, Universidad Complutense de Madrid, and the American University of Beirut. It's important to acknowledge this because a lot of what we do is based in US copyright law, and I think one of the difficulties of collaborating internationally around libraries, library collections, and the digital realm has to do with different copyright regimes. I'll talk a little bit about that as we go.
On the collection front, the collection today includes about 15 million digitized volumes. Those are volumes, so there could be multiple volumes related to a single title. If you go by our metadata, which is MARC, so your mileage may vary, that's about seven and a half million monographic titles and 415,000 serial titles. Of that, 38 percent, 5.8 million volumes, are open.
I was talking with Lizanne about this earlier this afternoon, and I cannot remember the exact number of titles that equates to, but it's between two and three million. I'm sorry, I just couldn't keep that in my head with my head cold. The materials in the collection are almost entirely digitized from legacy collections. I think it's important for us to consider, going forward, how our collection becomes prospective as we really focus on bringing more scholarship into an open access environment, which, let's face it, is not always well integrated into library workflows for discovery, access, and preservation. We will be focusing primarily on book collections for the foreseeable future. Although in the past we've been asked about other collections, that's really where our focus is going to be.
A quick look at the linguistic spread of our collection. It is about half in English. You see the top 10 languages there; it's very diverse linguistically. To pivot to usage for just a second, I can tell you that about 60 percent of our sessions, out of over 10 million sessions last year, were from English-speaking countries, with about half of those from the United States. The United Kingdom is the country that visits HathiTrust second most frequently, with 6 percent of our visits last year, and Germany is in the top five as well. So it is a collection that's broadly beneficial to individuals around the world, we believe, and we're happy about that.
The collection is primarily from North American research libraries. I didn't get to this in my last section, but really this organization came about in the wake of mass digitization. If you look at the contributing libraries, and the largest in aggregate here, they are organizations that partnered with the Internet Archive or Microsoft or Google early in this century to begin mass digitization of collections.
And that was really the problem, if you will, that got us focused on how to collaborate in a more focused and intensive way.
The access protocols for our collection are often quite difficult for folks to understand, but I think I may have finally distilled them down to one slide. Anybody, anywhere, including any of you, would be able to search the collection. You'd also be able to go to the HathiTrust Research Center and do some basic text and data mining using some services they have there, and you could read online any of the works that are open or public domain in the location you happen to be in. Members have more advanced access to the public domain materials: they can download those works. For members, we're able to provide services for their students and faculty who have print disabilities, who are blind or otherwise have need for assistive technologies. We can make some collections available to those individuals. And we do have, under library exceptions in the United States, replacement access, either print on demand or digital access, for materials that have been lost or damaged or are otherwise missing or not available.
I'll say just another quick word or two about one more mode of access, our non-consumptive access, by which I mean simply access that's not primarily for reading. It's access that's defined around computational analysis, sometimes called distant reading. The HathiTrust Research Center is based at the University of Illinois at Urbana-Champaign and at Indiana University. The work that we do is distributed around our membership. We really depend upon distributed expertise and effort to make this whole thing work, and those two institutions have particular expertise around text and data mining and the infrastructure to support that. So they have been developing scalable services that enable secure, high performance and virtual access to the entire HathiTrust collection.
One of the ingenious services that has been developed there is something we call the data capsule, which is really a virtual environment where you have access to a subset of the HathiTrust collection, defined by you, that you can then run your own analyses on. This is available for the entire collection without regard to the status of its copyright. That is again based in a now fairly well established principle in U.S. law that digitization for search and analysis is a fair use. We do distribute data sets for individuals to work with on their own, but we can only do that with public domain data.
Just a very quick couple of examples of the kind of research that has been done at the Research Center. These are a couple of special projects through an advanced collaborative support program we offer. Matthew Wilkens has done work to pilot the extraction of geographic place names from 10,000 volumes of the HathiTrust corpus, and is well on his way to doing that for the entire collection as well. Michelle Alexopoulos is an economic historian at the University of Toronto who has, very interestingly, done analysis on subject headings and library metadata to look at technology diffusion: which technologies evolve in a particular domain and then, over time, diffuse and evolve and become apparent in other information domains. Her work now is focused on content analysis of full works, and you see a couple of visualizations of that there.
So that's the collection, a little bit about our access policies, and a little bit about some of the services we provide. I'll turn now to one of the major programs we have underway, one that I think is frankly the best example I have of the kind of distributed cooperative work we've been able to do through HathiTrust: a program of copyright review. To talk about why we do this, I want to show a couple of pictures here.
This graphic is an attempt to show the distribution of works in HathiTrust by date of publication. The bars primarily represent decades, with the exception of a few at the far left, which represent centuries, because we just don't have that much from the 14th century in the collection right now. A couple of things I think are quite interesting looking at this. You can see the post-war boom in scholarly publication very clearly by looking at the tall bars on the right side of the graph. That's all post-war. You can also see that as the 20th century moves on and that explosion occurs, open digital access decreases. The orange and lighter orange sections on this graph indicate material that is open or public domain in HathiTrust. The dark sections indicate material that is not in full view, that is restricted.
This graphic is another view of the collection, but focused more specifically on the different copyright status of works in the collection. The majority is in copyright, at least as far as we know, but there's a substantial portion that's not and that is open for various reasons. Some material has been licensed by the rights holder through us for full access using Creative Commons. It's a small portion but actually a large number of volumes, in the range of 30,000 to 40,000 volumes at least. U.S. federal documents are not protected by copyright, so those are also fully open. Beyond the docs and the Creative Commons material, about 19 percent of the collection is public domain both outside and inside the U.S. That's the PD worldwide. There's a portion that is only considered to be public domain in the U.S., and that is material for which we have not always been able to determine clearly whether or not it is in copyright outside the United States. We have different copyright regimes here, as I'm sure you're aware. In the United States, it's a hard date right now.
After 1923 you can assume a work is in copyright, from a particular date which I forget (I think it's 70, 96, I can't remember). We do have life dates in the U.S., but for a large portion of the 20th century it's 95 years, with a few exceptions which I'll talk about. But of course it's life dates in the rest of the world. And so one of the challenges we have is what my predecessor John Wilkin, whom John cited earlier, had a great phrase for: bibliographic indeterminacy, meaning the difficulty we have in actually knowing things about our collections, given the quality and depth of the metadata we've had about them.
So this is what's led us to this program of copyright review. This is work that is focused on systematic, distributed investigations into the status of individual titles in HathiTrust. Over the last eight years, through the goodwill of our membership, we have had over 100 individuals at probably 30 or 40 different libraries participating and contributing staff time towards investigating the status of these works. The way the program has worked is that it's a double review. Two different individuals look at the same work and do the investigation. If they agree, we assign a status to it, and if they determine it's public domain, we open it. If they say we can't figure it out, we mark it undetermined. If they disagree, we have an expert reviewer adjudicate. The focus has been on two distinct pools of content, because of different copyright regimes.
In the US, we've looked at works published after 1923 and up to 1977, because during that time in the US works had to meet certain formalities to be under copyright. They had to have a little C mark inside of them, they had to be registered, and they had to be renewed, and if either of those formalities was not met, by now they are out of copyright. So we focused on several hundred thousand volumes from the US during that time period. Then we undertook a project to look at publications from the United Kingdom, Canada, and Australia, with the dates noted there. Those countries observe the 50 or 70 year rule of copyright: life dates plus 70 or 50.
You can see here that as we go through this work, roughly half of what we have looked at we've determined to be in the public domain. That does not mean the rest of it is certainly in copyright. Often we will not be able to make a determination, and we will mark it as such so that we can come back and do other work later. The reasons we sometimes mark a work undetermined might be that we just don't have sufficient information or we can't find life dates, or it might include photographs without credits and we can't determine the copyright status of those. So this is quite intensive work, but it's one that we have developed in a way that is focused on speed and scale. Again, this is about 650,000 works that have been reviewed in the course of eight years, each one twice. So this is a pretty significant contribution.
There's still more to be reviewed. There are some gaps in what we reviewed for what we called the world project: we did not review materials published in New Zealand, South Africa, or India, and I believe there are some materials published in the United Kingdom now that we have not gotten to as well. Our current project focuses back on the US, where we have about a hundred thousand volumes to review. I don't know that I have anything else to say about that, so I think I will just move on.
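The double-review step described above (two independent reviewers, with an expert adjudicating any disagreement) can be sketched roughly as follows. This is only an illustrative sketch, not HathiTrust's actual system: the names `Status`, `reconcile`, `should_open`, and the `adjudicate` callback are all hypothetical, and the real workflow surely records much more than a single status per review.

```python
from enum import Enum
from typing import Callable


class Status(Enum):
    """Possible outcomes of one copyright review (hypothetical labels)."""
    PUBLIC_DOMAIN = "public domain"
    IN_COPYRIGHT = "in copyright"
    UNDETERMINED = "undetermined"


# Hypothetical stand-in for the expert reviewer: given the two
# conflicting determinations, returns the final one.
Adjudicator = Callable[[Status, Status], Status]


def reconcile(first: Status, second: Status, adjudicate: Adjudicator) -> Status:
    """Combine two independent reviews of the same work into one status."""
    if first == second:
        # Agreement, including two "undetermined" reviews, stands as-is.
        return first
    # Any disagreement goes to the expert reviewer.
    return adjudicate(first, second)


def should_open(status: Status) -> bool:
    """Only works determined to be public domain are opened."""
    return status is Status.PUBLIC_DOMAIN
```

For example, two matching public domain reviews would yield a status for which `should_open` is true, while a public domain review paired with an in-copyright review would be passed to whatever `adjudicate` function is supplied.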
That's one major project. Again, the benefit is to individuals worldwide, because we're opening these things worldwide. But one of the things that opening materials worldwide does is give libraries an opportunity to think differently about how they manage physical collections. You're probably fairly aware of this. After we started digitizing collections of journals in the 1990s, with JSTOR and individual publishers, and they began selling them back to us, it gave us an opportunity to think: do we need to retain these journal titles in our central stacks? Could we move them to another facility? And then as the years wore on we got more radical and thought: do we even need to hold on to these things, or bind them anymore?
So I want to talk about the shared print program we have underway, because it is just getting underway and has a great start, but it's really one that's focused on a prospective effort, and one that I think is potentially of some interest here in the United Kingdom. The goal of this program is twofold. The one I put on the slide is that we want to link preservation of the digital and the physical. We want to be able to tie those two things together so that our libraries can ensure that the print record goes forward. Another driver is that there are real problems of economics and space in almost all of your libraries, and certainly in most of the libraries in the United States as well. That bar graph that went bump in the second half of the 20th century was not accompanied by a bar graph that went bump in the amount of dollars available for library facilities to be built. Maybe in the 60s, maybe in the 70s, but not into the 90s and into this century. So there's a real space issue, and another way of articulating the goal has been to help our members find ways to more efficiently, more effectively, and cooperatively steward these collections, and to do so in a way that saves them some capacity.
This project is not focused on serials; it's focused on monographs. Most of the work on shared print in the world has been focused on serials. That work is not done, but monographs are just getting underway. So the goal of this work is to establish a one-to-one physical mirrored collection of the HathiTrust collection for monographs. These would be physical copies retained by members. It's distributed. The material will need to be lendable, that is, it will need to be accessible to other partners in the program as well. It's a program that's really intended to be a benefit to the entire membership, so that even if you're not a retaining library you would be able to obtain access to the collection. And this last point is very important: we want to build on what's already going on in the realm of shared print and not disrupt that. In fact, if we tried to disrupt it, we would not be successful and we would waste a lot of effort. So the goal is really to build on those existing resource sharing arrangements and not break them down.
And there are a lot of existing resource sharing arrangements in North America. It's a big country. I think OCLC estimated 900 million holdings across North America and maybe 47 million distinct titles. This is from several years ago, so those numbers may not be quite right any longer. What you've seen is these regional alliances, these regional shared print programs around serials, emerging. They often build off of existing resource sharing networks for interlibrary loan, or maybe licensing or other kinds of consortia, and they are often quite large. EAST is notable in that it's relatively recent and that it is focused on monographs, and it's also quite large. So these are the programs that we feel we need to be able to work with. What I think is notable about this work that we see in the United States is that, as much activity as there has been, it has really been at the regional level.
There has been some effort to coordinate at a national level. I don't want to say it hasn't happened, but it has not been as productive to do so. It's been more informal. There are no agreements between these programs; there are agreements between libraries and these programs. It's also difficult to connect these different programs. The data available is of varying quality, and this is the story of libraries and metadata, of course. CRL has done a fair amount of work looking at the data that they have in their PAPR registry on serials commitments, and what we have there is data of varying quality that is difficult to make actionable decisions on.
Why is that? Well, one reason is that it's harder to disclose your retention commitments the higher up you go in this stack of networked sources. In fact, there's less incentive for you to do so. If you're a library, you may only need to know, or feel you need to know, what's accessible to you and your patrons. If you're in a shared print network, you probably need to know what's accessible within that network, but it's not necessarily the case that folks in another network need to know that, because you're not retaining on their behalf.
So the goal that we have is to take advantage of the fact that we collect holdings data from all of our members, physical holdings data, which is reported annually, maybe biannually in the future, to launch this program. The goal here is to look at this holdings data, consider its overlap with the HathiTrust digital collection, and then use that analysis to drive the shared print program going forward.
I have just one last slide here with any content on it. This is a quick view of the implementation plan for this program. We often call phase one a quick launch, meaning we're moving fast to get some substantial work done initially and then analyze the results. The goal is that by the end of this year we will have matched about half of the collection, and about half of the collection will be committed to be retained for a period of perhaps 25 years by HathiTrust member libraries. So we have a lot to do in the remainder of this year. We've got to develop and formalize our policies and the agreements to hold this material, and get those commitments disclosed. That's just getting underway right now. The commitment analysis has been done, and the libraries are now considering what they can commit to us.
We're deferring a whole lot of important questions that would really move this to a preservation conversation, but those are the questions that we can begin to look at in the second phase of this, which will begin next year. It really becomes a long tail question then. What are the gaps in our coverage? What do we need to get? We might have to take a closer look at particular library collections and make requests for specific materials to be held. That's all to be determined. But the point I want to make here is that we need to find a way to get started, and our goal here is really to help catalyze that start through this program among our membership.
I'm just about out of time, so I'm going to leave it there. Just a quick note: I know very well that I'm talking about a very particular context in the United States, and that the UK context is quite different. I'm eager to hear more about that and understand what issues you face around shared print and what questions you have about it. So I will take any questions you have at this point. Thank you.