Good morning, I'm Oya Rieger from Cornell University Library, and it's a great pleasure to co-present with my colleague from Columbia University Library, Robert Wolven. The purpose of this presentation is to share with you the findings of a 2CUL study that assessed the role of Portico and LOCKSS in preserving each institution's digital journals, e-journals. Before I actually start, I should decipher 2CUL. 2CUL is a collaboration between Cornell University and Columbia University Libraries, and the goal of this partnership is to join hands to provide content, expertise and services. With the support of the Mellon Foundation, and also in partnership with Ithaka S+R, during the last two years we have been investigating the prerequisites for such a deep partnership, including legal, technical, governance and IT issues. As you all know, academic libraries have an increasing dependence on born-digital e-journals, and these are often commercially produced and licensed. As the ARL annual statistics from 2010 indicate, there is a dramatic shift in materials expenditures. According to the 2010 data, over the last six years the percentage spent on e-content nearly doubled, from 33% to 61%. And if you look at expenditures in 2010, again based on the ARL 2010 statistics, ARL libraries are spending 60 to 70% of their budgets on e-materials. In this slide I have a chart from Cornell University Library, with black showing overall journal titles and red indicating e-journals, just to show you the increase. This is very encouraging from an access and discovery perspective because, as we all know, our users increasingly prefer accessing e-journals over print journals. However, if we look at it from a preservation perspective, this really brings up a totally different picture. First of all, archiving responsibility for e-journals is distributed.
If you think about it, we have research libraries, we have publishers, societies, of course university and college administrators, and scholars. Scholars are an interesting one because, according to the 2010 OCLC report called A Slice of Research Life, from the user's perspective there is an implicit assumption that today's e-journals will be available for tomorrow's researchers. And interestingly, none of the faculty surveyed in this OCLC report indicated any concerns about the future of e-resources. The other issue is that it's elusive, and actually the 2CUL study on LOCKSS is very much based on this concept of being elusive. We originally called this "under the hood" because the goal of the 2CUL study, Columbia and Cornell being important stakeholders in e-journal preservation, was to see how these two applications, these two preservation solutions, were being used internally. It really began more from the LOCKSS perspective. The study was motivated by 13 research questions, and during this presentation Bob and I will only cover the three key ones that I listed here. The report is available on the 2CUL website, and you are welcome to look at it to see how we came out in terms of looking at this question from 13 different perspectives. We formed a small team and, as I said, initially the focus was on LOCKSS because some of the questions were really triggered by our assessment of LOCKSS. But very quickly we realized that LOCKSS and Portico went hand in hand and that we needed to look at them together. I also must note that this study was seen as a high-level investigation, almost like a landscape analysis, and that the goal was to answer some key questions but also to come up with a further research agenda for both institutions. Describing LOCKSS and Portico to you is certainly beyond the scope of this presentation.
I'm happy to see that we have Vicky Reich from LOCKSS and Kate Wittenberg from Portico here with us, so if you have any questions about LOCKSS and Portico we will turn the questions over to them. With LOCKSS, the library community can participate in two ways. Either they can have their own private LOCKSS network, which could be institutional or organizational, and they can archive web-based materials. Not only do they archive them, but they have perpetual access to 100% of the titles preserved. Or they can participate in the global LOCKSS network, which is a web-based subscription service, and this also includes post-cancellation access. From a technical perspective, LOCKSS and Portico are two different approaches. Portico ingests data files directly from publishers in their native format, normalizes them, and offers them in a standard archival format to manage over time. LOCKSS, by contrast, collects and preserves all content in its original format from publishers, and one important thing about LOCKSS is that they include format metadata to enable browsers to render content. As I said, one of the titles is "looking under the hood" because this study really came out of our honest assessment that at Cornell and Columbia we lacked a deep understanding of LOCKSS and Portico and what they do for us. As a matter of fact, we had to bring together maybe 10 people to pull together all the little pieces and understand how we are using them. And basically, LOCKSS is terribly underutilized in both institutions. I wanna emphasize that LOCKSS is an alliance and each stakeholder has a specific role. To build a collection, libraries need to be very active in selecting titles, and working with LOCKSS staff we need to go to publishers to obtain permission and also work with LOCKSS staff to create plugins. Unless each party is performing its role, as was the case at Columbia and Cornell, LOCKSS turns into a kind of passive dark archive.
Unfortunately, at the time of our assessment neither Cornell nor Columbia was contributing to this collection development process. Both institutions had also made a decision, and this was really de facto, not necessarily a calculated decision, that when content is no longer available from publishers they would use content from their LOCKSS boxes to activate access through their library management systems. This brings me to the second set of issues, which is operational aspects. If you think about both Portico and LOCKSS, one important operational issue for the libraries is that we need to have the preservation status of e-journals recorded in our electronic resource management systems. Also, if there is any discontinuity of service, if a certain e-publication is discontinued, or there's an interruption, or there is a cancellation, there needs to be an automated system in place to trigger action or at least to alert us. At Cornell in 2008 we manually entered the data, and as you can imagine, manually entering data is definitely not sustainable. The information is significantly out of date. In addition, the information recorded was very high level. It didn't include details such as journal issues and so forth. I wanna give you a very quick example to illustrate the importance of operational aspects, of understanding LOCKSS as more than just a preservation strategy. In November 2010, LOCKSS Alliance members were informed that a publisher would withdraw 12 titles from its current publishing platform, and what we needed to do was put a plan in place so that these 12 titles would be ingested and be available from the Cornell and Columbia boxes.
But interestingly, when our staff made an effort, first of all we realized that our boxes' caches were full, so we had to upgrade them, and at Cornell we even ran into trouble with an outdated storage disk. Thankfully, we got help from the LOCKSS staff. Even today, neither institution has a plan in place for serving these 12 titles. This also surfaced for us an important issue that we had totally missed. As I mentioned earlier, 2CUL made a de facto decision not to serve lost journals from the LOCKSS boxes themselves, expecting that this process would be so slow that we could make arrangements to replace the content and then make a manual connection from our ERM to our library management system. So there was this hope that it would happen rarely and very slowly, and that our manual efforts would be satisfactory. However, we learned from Vicky Reich that such a strategy in a way violates the Alliance's agreement with publishers. In other words, Alliance members do not have the legal right to move content out of the LOCKSS system. We had to rely on our own box, not treat it as a dark archive, and put in place the procedures to be able to serve content as it gets lost. I'm going to turn it over to Bob, but just to summarize the responses we got for the first two research questions: we found that our use of LOCKSS at Cornell and Columbia was really through inertia. It was not by design. One of the problems was that preservation responsibility, for digital preservation in general but also for e-journal preservation, was distributed among staff from IT, collections and technical services, and it pretty much fell through the cracks because no one was taking organizational leadership. Now I'm going to turn it over to Bob for an example.

Okay, thank you. I'll be talking about the third question that Oya mentioned, which is really a study of the coverage, at the time we were doing this, of our e-journal collections through both LOCKSS and Portico.
And I'd like to say I'm happy to take questions at any point during this. Unfortunately, the blinding lights make it a little difficult to see people in the back, so you might need to wave your arm if you want to get my attention. This piece of the study has already gotten some attention. I've heard it referred to in various meetings that I've been at, and when it's talked about it tends to boil down to a statement something like this: that only 13% or 15%, depending on who's doing the talking and what they're looking at, of the e-journals at Cornell and Columbia are currently being preserved. That's a fairly technically accurate statement. Fairly technically accurate. We'll get to that in a minute. As a rallying cry or call for action it's no doubt entirely appropriate, but it's also perhaps a bit misleading, and so what I want to spend some time doing is saying what's behind that 13%, what's really under the hood as we look at the details. So we're gonna spend some time looking at what we've found, and then thinking a bit about some ideas we've had about what should be done about it, not just by Columbia and Cornell but by the much larger community that's concerned with these matters. I'm gonna start with a few disclaimers, to try to depress expectations right from the start but also to set these things in an appropriate context. We were not really doing an evaluation of LOCKSS or Portico in this part. We were really looking at the facts of what's preserved, what the actual coverage is. There are many evaluative factors that might come in, but we're not gonna be talking about those directly, although we'll be happy to bring that up in discussion. It's also not a complete survey of what's going on in the area of e-journal preservation, and later in the presentation we'll refer to a few other initiatives underway. There are other methods and other programs going on besides LOCKSS and Portico as well. It was also not a rigorous research study.
We designed it for a particular purpose. It served that purpose, and at the end we realized there are a lot of other things we'd like to tease out of the data, but there's a limit to what we can tease out because of the way the study was done. So there's a real potential here for follow-up and a number of further studies. As you look at numbers here, these are numbers that are about eight to 10 or eight to 12 months old, depending on which numbers I'm talking about, and things may have changed since then. I don't think they've changed much in aggregate. As I looked back at some of this just to prepare for this talk I didn't see major change, but any specific piece of information might be different now than it was back then. What we set out to do was to look at the overlap between LOCKSS and Portico: how much is done by both, how much is unique to each one, and what that would mean for us. We ended up learning and teasing out a lot more. First of all, the way we went about it: we pulled data on our e-journals from our catalogs and from our electronic resource management systems. For purposes of comparison we limited it to those that had either an ISSN or an EISSN. At the volume we're talking about, we basically had to do mass standard-number comparison rather than looking up individual titles, which wouldn't have been practical. We followed up on some of the results to verify them, and we'll see examples later on of the limitations of standard-number matching. But that reduced our population almost immediately by 50%, because 50% of the records for e-journals in our systems don't have these standard numbers. We'll delve into that a little bit in a minute too. However, that still left us with a pretty large population. We started with somewhat over 45,000 titles from Cornell and somewhat over 55,000 from Columbia for matching. We both sent data to Portico to do the matching.
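As an illustration of the standard-number filtering step described above, here is a minimal sketch. The record layout and the field names `issn` and `eissn` are assumptions made for the example, not the format of the catalog or ERM exports used in the study. It keeps only records that carry at least one well-formed ISSN or EISSN, using the standard ISSN check-digit rule (weights 8 down to 2, modulus 11).

```python
import re

# Eight characters, optional hyphen, last one a digit or X.
ISSN_PATTERN = re.compile(r"^(\d{4})-?(\d{3})([\dXx])$")


def normalize_issn(raw):
    """Return the ISSN as 'NNNN-NNNC' if it is well formed, else None."""
    if not raw:
        return None
    match = ISSN_PATTERN.match(raw.strip())
    if match is None:
        return None
    digits = match.group(1) + match.group(2)
    check = match.group(3).upper()
    # Weighted sum of the first seven digits, weights 8, 7, ..., 2.
    total = sum(int(d) * w for d, w in zip(digits, range(8, 1, -1)))
    expected = (11 - total % 11) % 11
    expected_char = "X" if expected == 10 else str(expected)
    if check != expected_char:
        return None
    return f"{digits[:4]}-{digits[4:]}{check}"


def matchable_keys(record):
    """Collect the valid standard numbers from one record.

    The 'issn' and 'eissn' field names are hypothetical.
    """
    keys = set()
    for field in ("issn", "eissn"):
        normalized = normalize_issn(record.get(field))
        if normalized:
            keys.add(normalized)
    return keys


def select_for_matching(records):
    """Keep only records with at least one valid ISSN or EISSN."""
    return [r for r in records if matchable_keys(r)]
```

A record with neither number simply drops out of the comparison, which is how roughly half of the population would disappear before any matching is attempted.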
Our results were so close that, as we looked at the overlapping content, we felt we could look at one data set against LOCKSS and rely on that to a certain degree. So all of the numbers I'm gonna be talking about are largely Cornell numbers, but the Columbia numbers differ by about 3%, in that range, not a huge difference. Basically, here's what we found. About 4% of the titles that we were matching were available through LOCKSS only. About 14.5% were available through Portico only. Roughly 7.5% were available in both, although not necessarily with exactly the same holdings preserved at that moment in the two systems. Again, we'll look at a couple of the implications of that. So if you add up those numbers, it comes to 26% of the titles that we were using for matching being found in either LOCKSS or Portico. How do we get to 13%? Well, remember that 50% of our titles weren't even selected for matching. The reason we can divide the 26 by two and say 50% aren't there is that very few of the titles that are in LOCKSS or Portico actually lack those standard numbers. So for almost all of the things that lack ISSNs and EISSNs, it's unlikely that we're going to find them, and in fact when we did sampling we found virtually none of those. They have special characteristics that we'll come back to, which explain why we wouldn't expect to find them in LOCKSS or Portico. Now, I think when most of us think of e-journals we have a kind of ur-e-journal, a Platonic ideal of e-journals, in mind, but serial publications are what we were really looking at. When we extracted records, it was anything that was a serial publication that had some digital format, and that has a lot of implications for what we found. It says a lot about the diversity of library collections, and it raises a lot of questions about our expectations for preservation of this kind of digital content. I mentioned again the limitation to standard numbers. It's also important, when we start looking at the details of the numbers, that we were matching titles here.
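The arithmetic behind the 13% figure can be sketched as follows. The function names and the data layout are hypothetical. The percentages are the ones quoted above, and the halving relies on the stated assumption that titles without an ISSN or EISSN are virtually never found in either service.

```python
def coverage_breakdown(library_titles, lockss_issns, portico_issns):
    """Count titles by where a standard-number match is found.

    library_titles: dict mapping a title id to its set of ISSNs/EISSNs.
    lockss_issns, portico_issns: sets of ISSNs reported by each service.
    """
    counts = {"lockss_only": 0, "portico_only": 0, "both": 0, "neither": 0}
    for keys in library_titles.values():
        in_lockss = bool(keys & lockss_issns)
        in_portico = bool(keys & portico_issns)
        if in_lockss and in_portico:
            counts["both"] += 1
        elif in_lockss:
            counts["lockss_only"] += 1
        elif in_portico:
            counts["portico_only"] += 1
        else:
            counts["neither"] += 1
    return counts


def share_of_all_titles(matched_percent, matchable_fraction):
    """Scale a percentage of matchable titles to the whole population,
    assuming unmatchable titles contribute no preserved titles."""
    return matched_percent * matchable_fraction


# Figures quoted in the talk: LOCKSS only, Portico only, both.
matched_percent = 4.0 + 14.5 + 7.5                      # 26.0 of matched titles
overall_percent = share_of_all_titles(matched_percent, 0.5)  # 13.0 of all titles
```

Note that this counts titles, not holdings: a title counts as "both" even when the two services preserve different volumes of it.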
We weren't investigating expenditures. If we looked at our financial investment in electronic journal content and tried to figure out what portion of that is preserved, we'd undoubtedly come up with a very different number. Impressionistically, the big expensive science e-journals are more likely to be preserved than the tiny one-off journals that we don't spend money on. But nevertheless, a $4,000 title that is preserved and a free journal that's not preserved each counted as one title. We also weren't looking at the extent of the content that was being preserved. So that $4,000 title that has masses of content would count the same as a journal that has one volume, whether it's preserved or not preserved. Doing that analysis on content would actually be rather difficult. The ways of measuring content, and the structures and the metadata that support that, are complex to say the least, as anyone who works with serial data knows. We're matching what amounts to three different systems, sometimes four, and our bibliographic information and the way it's presented by the preservation services don't necessarily align. I chose one example that's not completely representative but not atypical either. We looked at a number of specific examples that are preserved in both systems, and very often the exact content being preserved at that particular moment is not identical. It's not wildly divergent, because they started at similar times and they're acquiring over a period of time. But you'll see that I chose this example because it doesn't favor one or the other. They each have content that the other doesn't have. So it raises another question, about the extent of redundancy that's important just to ensure that we've got everything covered. So, the diversity of serials. Any good serials cataloger knows that they're complex and that there are all these different varieties. And we found examples of all of these in the data we were looking at.
Again, if I just say e-journal, the idea that pops to mind is the scholarly peer-reviewed journals that we've been subscribing to for years and that have now gone largely online only, the Elseviers and Springers and Oxfords and Cambridges and so forth. But in our collections, at least the way we've defined our collections and presented them to our users, there are these masses of other types of publications: trade publications and newsletters of various organizations, annual reports from a variety of kinds of institutions, historical newspapers, masses of government documents and so forth. And then there are those things that have bedeviled people for years because they are neither fish nor fowl. Light behaves as a wave and a particle depending on how you measure it, and conference proceedings and monographs and series behave as books and behave as journals depending on what lens you're looking through. We're also looking at all kinds of digital forms. Again, the primary focus with our e-journals is on things that are currently published and are often made available by the publisher, but there's a variety of other things that we were looking at. So in the data we were seeing, we have back files from these same publishers. Sometimes it's simply a back file of what's still being published. Other times we have journals that have ceased publication, and these are dead titles that nevertheless have their back files online. The same thing happens when it's made available not by the publisher but through an aggregator, something like ProQuest or EBSCO or any of dozens of others that act as distributors for publishers. We were looking, as I say, at historical serials as well. These are things that have been digitized from library collections, either by the libraries themselves or by Google or some other boutique or mass digitization project.
Many of these also appear in commercial collections made available, say, by Gale Cengage, things like Eighteenth Century Collections Online and other historical collections of content. And then there's a sort of miscellaneous trove of things that are being published on the web that behave in some ways like serials and have different characteristics but fall into the bibliographic data that we were looking at. They definitely shouldn't all be looked at the same way, and we're gonna offer some suggestions for how we might parse this out and separate them. Whether they should all be considered part of the same problem is a good question, but we'll have to think about that a bit more. By the time we mesh all this together, you might wonder whether we have any kind of useful data at all about the numbers. It's easy to avoid the problem by saying, well, it's very complex and there are all these different things. So what we tried to do was to characterize things and put them into categories. And we came up with, I think it breaks down to, 11 categories in the end. These are somewhat arbitrary. You could see it different ways and make up different categories. What we found a bit reassuring is that while we were doing this, the staff at Portico were looking at the same data and the same issues and trying to come up with the same analysis. And we found a very high overlap in the way we had characterized things. I think we had 11 categories and they had nine, and they all fit together. So we have some confidence that these are a reasonable lens for looking at things. They're really made up in three different ways, I think. One is just ease of analysis. It's easier to analyze things in groups in certain ways than in other ways. Second is how they're made available to us. The data told us how we were getting it, what collections it came in, what sources it came from, and that served as a useful point of analysis.
But maybe more to the point, the third is the standpoint of how we should think about them and what kinds of strategies we might use in trying to get them preserved, if that's important to us. So there's overlap among these categories. Any particular title might fit into two categories, even three. There are freely accessible titles that are coming to us through aggregators. There are East Asian titles that are freely accessible and that are coming to us through aggregators, and so forth and so on. We made them mutually exclusive for purposes of these numbers because, again, it relates back to what we might want to say about them and how we might want to think about them. But once we actually delve into this in detail, we'll see, I think, another way of breaking things down and having to parse them out a bit more when we get to the title level. I'm not gonna talk about these in any detail now because I'll come back to them. They are in order by numbers. So we started working from the top and asking where most of these are clustering, and then how it breaks down as we get farther down the list. And you can see quite a variety of different categories here. I'll come back and talk about many of these briefly in a couple of minutes. I do want to give a few examples. I think examples help to make things real rather than talking abstractly. And these are not chosen to be typical examples necessarily, but to convey the diversity of things within this and the reasons why you can't take any one type of journal and say the same things about everything in that category. So without going into detail and reading through these things, you get some sense of the variety within there. Some of them are familiar, some of them far from familiar. If anyone knows what the last title on this screen is and can tell me what it is, I wouldn't mind hearing.
We will have people within our libraries who know that and can hopefully give us some advice on these collections and what's important. I can't resist mentioning the third one here because I was particularly struck by it. At Columbia, we've also been running a program to harvest and archive content from the web, focusing on human rights organizations. This would be a perfect candidate for that web archiving initiative, except that it no longer exists on the web. It's gone. It's now coming through an East View collection, and so we have a different kind of preservation involved and a different kind of problem. A few more examples. As I mentioned, these historical collections are not only the ones in the US but also things being done in other parts of the world. Book series: I gave this example because this series looks like a serial to us, since it's got a serial record for Developments in Volcanology. All the individual volumes are present in the ScienceDirect e-book collections. As an example of data errors, one that showed as not preserved is the title Music and Medicine. In fact, it's preserved in both LOCKSS and Portico, but the ISSN that's in our data doesn't match the ISSN that's in their data, so there's a certain amount of that as well. But then in the end there are these things that you might expect to find, things where we're relying on a kind of broad statement that a certain publisher is archived with these services, and we found that it's not entirely the case all the time. Or there are these mixed types of presentations, like conferences, that may or may not appear. These are all examples of things that were in the unmatched, not preserved data.
So, taking another look at these different categories in a little more detail and thinking about them, we've started having conversations among ourselves, with some of our colleagues, and with people at both LOCKSS and Portico about what makes sense here. We'll probably get different views on this, so take this as a semi-official 2CUL view. It's far from an official 2CUL view, but it reflects some of our conversations. If we think about the largest group, the group that is available through aggregators, this group is probably among the most important for us and also among the most difficult. It's difficult because the aggregator, the distributor, the ProQuests and the EBSCOs, are not the rights holders for this. They've acquired a limited set of rights for making these available, but in most cases the publisher still retains the bulk of the rights, and the distributors, the aggregators, have not in most cases, again I'm generalizing, acquired the rights to make them available to third parties as an archive. So if we want to preserve them, it really involves a tripartite kind of discussion and arrangement. The libraries in most cases don't have direct business relationships with the publishers of these. They're relying on the distributors and the aggregators to make them available. So we don't have a license with the publisher for this content. The aggregator has a license with the publisher, but it's limited to their main purposes. So if we want to get this group dealt with, we're gonna have to develop some new ways of going about the process and the business. I say they're important because what we're seeing, at least at Columbia, is that for years we've taken it as a principle that we are going to keep print until we know there's a preservation strategy for the electronic, and that's starting to erode now. People are tired of collecting print. They don't see the need in many cases. People want the electronic.
We've been through this for years and years. We went through it with the sciences and social sciences, and so now collecting print solely for the purpose of ensuring that there is going to be a preservation copy becomes less of a compelling need, especially as we start dealing with budget situations and economic crises as well. On the other hand, when we turn to the miscellaneous freely accessible titles, it's a very different picture and a very mixed bag. What we found here is that we have "acquired" these, and that's definitely in quotes for a reason, en masse for the most part. There are many that are hand-picked, where we've said, here's a freely accessible title, we want to make it available, please catalog it. We would like to see that those are preserved. But the bulk of them are coming because someone has become aware of a title and has amassed information about it. In our case Serials Solutions, which we're both using as a supplier of data, has put together these large collections of freely accessible titles that we get bibliographic information on. So the value within these is quite varied. Some of them we would consider important, others not. We would not necessarily have collected them if we had to collect them individually and put the usual kind of selection and curation effort into them. And so looking at them as a group is going to mean either we need mass solutions or, more likely, we need individual decisions on where to put the effort in. A similar situation, but perhaps a bit different, applies when we get into these newsletters and trade publications that for many libraries, perhaps most libraries, would have been treated as ephemeral in the print world. If we acquired them at all, we might have acquired them and put them out for current awareness, and then most of us would have dispensed with them. In many cases, we wouldn't have acquired them at all. They would have been seen as too specialized.
The Lincolnshire Horse Breeders Bulletin, and I think that's not an actual title but it's fairly representative of some of these, is not something most of us would have been acquiring. They now come in large databases of this type of content, and they may not have the same value to preserve individually. There are probably not too many people on our faculty who are waiting for the new issue of the Lincolnshire Horse Breeders Bulletin or citing articles in there. But the ability to analyze this kind of material, especially when you get a whole mass of material around an industry and you can really do research on it, is a different story. And so preservation of this material might become more valuable than it was when it was scattered among many libraries in print form. But it raises different challenges about how. We separated out East Asian material, by which we really mean Chinese, Japanese and Korean, for different reasons. Many of the issues, technically and from a philosophical standpoint, are not all that different. But the legal environment in those countries, the technical environment for acquiring this material, and the ways it is brought together by aggregators are different, and probably require a different set of people looking at them and a different set of arrangements to deal with them. I'm gonna skip the participating publishers for a minute because that's in some ways one of the most interesting factors in here. I was at least surprised by the low numbers here for non-participating publishers, by which we mean publishers that are making their content available but that you won't find if you go to the list of publishers participating in LOCKSS or the list in Portico. We know that they're not the ones that any of us decided to go after first, although in the early days of LOCKSS we certainly thought we would be looking at many of these kinds of titles.
They tend not to be large groups, not the publishers who are publishing 100 or 200 titles, but the ones who are publishing one or two or five. And so you get into cost-benefit issues. In many cases they are important. The individual titles are things that we are finding valuable, and in many cases we're still collecting them in print. But as for the work to preserve them, you preserve one title for almost the amount of work it would take to preserve a large number, because you're making the set of arrangements and then maintaining a kind of difficult relationship over time. Many of them are not as well positioned technically to contribute to this effort. With the government titles, and to some extent with the international agencies, it raises other questions of whose responsibility it should be. There are, in other countries, national mandates to collect and preserve this material. That may be the way it's being dealt with in some areas. Again, I'm not gonna dwell on that point, but it raises different questions. It was also a bit surprising: I think we expected to see more data errors and more problems with the data. Often I've tended to fall back on that as the explanation for why things don't work, because we'll find examples like the one I gave before, where something is there but it just didn't match. And so it's easy to say, well, there's really a lot more that's done, but when we tried to look at this, we didn't find a whole lot. We found some, but not as much as we might have thought. So relying on cleaning up the data is not gonna be the answer to most of our problems. Going back to the participating publishers, this was a bit of a surprise to us and in fact to others. I gave an example of one title that we would have expected to find that wasn't in one of the services, but for the most part it wasn't that. What we've found more often is a misunderstanding on our part of what it means to be a participating, quote, publisher.
So I'll single out Springer as an example, not because I'm mad at Springer or they're doing bad things or anything like that, but because it was one of the case studies we looked at in some detail, where they're acting as a third-party distributor for other publishers. And so mentally, we tend to say we have an agreement for SpringerLink, for Springer e-journals. Springer has an agreement with LOCKSS, with Portico, so they're preserved. But there are titles within that where they're really not the publisher, they're the aggregator. And so we need to have a better understanding of what that means, what those titles are, how to sort them out and know what's actually being preserved. If you look at our licenses for these, they often will say that there's an archiving provision that says they're being archived, but not everything that is in the purchase part of the license is in the archived part of the license, and they're not really specific as to which is which. Also, there's material there that gets into the content that's not structured as journals. So we found content that, at least in the Portico case, is deposited, but it's sort of in a lump. It's not available as individual journals yet. And this relates to a lot of the conference publications, as an example, where they're not structured as a set of journal articles with separate issues and files and so forth. So the ability to track and audit and know exactly what is there still leaves something to be desired. So this leads us to what should come next: what should we do about all of this? And just a few thoughts that we might want to use to start conversation. There are obviously a number of different strategies that might be applied. As I said before, LOCKSS and Portico are very valuable and important to us, but not necessarily what we should rely on for everything. 
And already, if we look at those historical titles, and in fact many of the titles that are in that 50% that don't have ISSNs, they're really the older material that is already being dealt with in other ways. So there's a lot of material that's being deposited with HathiTrust and preserved in that way. And there's another session in one of the other rooms right now about HathiTrust. Portico has a different arrangement with certain publishers to deposit digital collections, these large collections from Gale and Adam Matthew and others. And the journal content is within that as well. So there may be other ways of addressing the preservation question. For the free material, the e-Depot program, which is run from the National Library of the Netherlands, has a commitment now to preserve the thousands of titles in the Directory of Open Access Journals, DOAJ. Is that an appropriate preservation strategy for us? Is it true for all of those freely accessible journals? We haven't really decided yet, but it's one thing that we should be considering at least. And then there are some of them that we will acquire through other means, such as web archiving, harvesting the journals off of the websites and acquiring them that way. There are other discussions going on about universities' role in publishing and the library's role in publishing within the university. We are acquiring some journals and putting them into our institutional repository, either because we are harvesting our own website content and putting it into our institutional repository, or because we are the publisher. And so we're taking the archiving responsibility as well as the publishing responsibility for these materials. Now, those are probably not going to deal with huge masses of material, but as we try to say where our efforts should be directed, that might help. And then, as I said, some of this material is probably not best preserved as journals, but as books and individual documents. 
So what do we want to do next? Several steps. Some we can take ourselves and some we really need to have a lot of help with. We ought to repeat this analysis. How much have things changed in a year? It would help to run the same kind of analysis again. And undoubtedly, there will be things that are preserved that weren't before. There will also be new titles that have appeared that may or may not be preserved. We'd also like to extend it a bit and try to delve a bit more into some of those questions of what it means in terms of content and in terms of investment, which will be quite a challenge. And if anybody wants to take those up, I'd love to hear about it. Obviously, we need to work with other libraries. If anything, the purpose of this presentation is a call to action to get people to pay attention, to realize that we can't be passive and we can't be complacent about what's there. And we do have to work on this and come up with answers. But once we've done that, of course, as I've heard endlessly from LOCKSS and Portico, we need to work with the publishers. The libraries have greater clout with publishers. If we want things to happen, it has to be said over and over and we have to make this a part of our business terms with them. But we also need to work with our partners in the preservation area. So I'll anticipate my last slide and say we wanna give great thanks to the staff at both LOCKSS and Portico, who were really helpful in this process, who really were concerned and interested in this and contributed a great deal, and we look forward to continuing to work with them. We've also been talking to what is now the Keepers Registry. It started as, and still is, the PEPRS group in the UK, which is developing a registry of what's preserved in a variety of ways, not just through the two that we're talking about but other means as well. And so we're working with them to see how we can better manage this data. 
We need a better understanding of the international context and what's being done in other countries outside of even the PEPRS and the US environment. And as Oya said, we can't do this manually. We ran into limitations already. We need better ways of communicating this data back and forth. We don't wanna have to look up a title in the Keepers Registry every time we wanna find out if something's preserved or not. We need to be able to move data from Portico into OCLC and to Serials Solutions and to the systems we're using to manage our resources so we can see where they fit. And with that, I'll turn it over to questions. Vicki. Mostly I wanna say thank you to both of you for doing a very simple thing, which is paying attention. So thank you very much, both of you. And I do look forward to working with you to solve the problem. With that, one question. The content that libraries pay a lot of money for is the content that is at the least risk of disappearing. It's also the cheapest and the easiest to preserve. And most of it, honestly, is in Portico. The content that is hard to preserve and is at the most risk of disappearing is the content from the publishers who have the one title. And as both of you have heard me say many times, but perhaps the room hasn't, in this day and age, the most interesting content, in my view, is the freely available humanities literature being published by people who have no clue that open access and free is a political statement, but are just doing the kind of publishing that we used to bring in as special collections. That is extraordinarily expensive to process. How in your organization are you going to balance the, we are spending billions of, I'm gonna exaggerate, but millions of dollars for this stuff that really is not going away and is always going to be available, let's take ScienceDirect, always going to be available from the Elsevier platform since Elsevier had its origins, I think, back in the 1300s. 
They're not going anywhere, versus this stuff that we used to collect by meeting the artist on the street corner in Bulgaria. So how are you gonna manage those internal conversations? I think it's an excellent question and I'm afraid we don't have a comprehensive response. We can have another 2CUL study to answer that question, but at a very high level, actually this is again an example of 2CUL collaboration. Last year we looked into web archiving, especially with the leadership from Columbia University, because they have a very nice research study looking at many, many aspects of it. And we are both working with the Internet Archive using their Archive-It service. So it's just starting as a foundation, and I must say that Columbia is definitely ahead of us. But if you look at open access publications, as you said, that are being disseminated as we speak, especially in social sciences and humanities, one of the issues is building an infrastructure for them, and the Columbia study really nicely illustrates that, from metadata standards to the ingest procedures to reflecting them in the catalog to the interface to understanding what features, from a technical perspective, are important, from discovery and access to usability, because we cannot be preserving every feature. I'm actually going to stop here and turn it over to you if you wanna add more. Yeah, just a couple of things. I think it's a really good question that I don't think we're going to have thoroughly answered for some time. I spent the early part of my career as a serials cataloger, part of the time cataloging these mimeographed poetry journals that might come out with two issues, and whether anybody else had them isn't the question. And I'm trying to remember, Cliff made a remark yesterday that struck me as an analogy. It's always been the case that the things that take the most intensive staff resources are the cheapest ones. The things that are free are the most expensive in other ways. 
So we've had this problem, this issue, all along. I think we're going to see fewer libraries working in that sphere, and that makes the dependence even greater on those fewer libraries. I think we need to look at clusters around subjects. So these, as you said, all become special collections. We don't need everybody doing a handful here and there with no organization, but if Columbia says we're gonna just put our stake in the ground around human rights and somebody else says we're gonna do those poetry journals, that might be a way of getting a little bit of progress in this regard. I'm gonna add one more thing quickly. Actually, Janet Gertz from Columbia University Libraries is here, and in a way this study was also prompted by some of the discussions we had, and these discussions were about the fragmented nature of digital preservation decisions at libraries. We work with HathiTrust. We work with Internet Archive. We work with Portico. We work with LOCKSS. We do internal things, such as arXiv or institutional repositories. But the question you're asking really requires that we look at it comprehensively, holistically, to understand how the pieces fit together, because otherwise we are going to find ourselves five years from now just focusing with blinders on and not noticing what's happening in other areas. Got it. So, just to follow up on that, I didn't expect an answer. You gave the answer, that was perfect. So thank you, thank you for that. I just wanted to add that we do work very closely. LOCKSS works very closely with the Internet Archive, and as an example, last month there was one of the titles, I don't know who selected it back when we were doing humanities journals for LOCKSS. There was a title called Big Bridge, one of these freely available ones, and it's actually being preserved in LOCKSS. 
And there was a movement called 100,000 Poets for Change, which we decided should be preserved, and which absolutely could not be collected any other way than through Archive-It. So we went and got it from Archive-It, and we couple closely. So there are a variety of ingest ways into LOCKSS, and so that can be distributed out for preservation if you wanted to, just FYI. We're also thinking of doing the same thing for government documents: rather than make arrangements with each of the agencies that aren't coming through GPO, which would be way too expensive, just going and getting it with Archive-It and then putting it into a distributed preservation system. So just to let you know that. And thank you. Bruce Heterick with JSTOR and Portico. Fascinating stuff. It's really been amazing having these conversations internally based on the research that you guys have done. And so again, thank you for that. I wanna make a point and then ask a question. Obviously we've got fairly small organizations here, LOCKSS, Portico, Hathi, who have a fairly big footprint for as small as they are. And there's only so much that these small organizations can sort of take on in chasing down, as Vicki rightly points out, the really most at-risk things. And you have to balance that with the fact that even in the construct of participating libraries in Portico and LOCKSS, you're really only talking about maybe 1,400 libraries at most. If you think about the number of institutions who have a JSTOR collection, it's about 7,500, 7,600 institutions. And only a small percentage of those are actually participating in one of these digital initiatives or both of them. So we have that problem of getting people engaged in this issue. And with the economic situation the way it is, obviously, the principle gets overtaken by practicality. And so you have a lot of smaller organizations, libraries which in many cases probably never did a lot of preservation, that are making decisions based on what the larger institutions like Columbia and Cornell are actually doing. 
And I wonder, when you look at an analysis like this, do you actually start to think internally, because you're dealing with the economic practicalities of this, that there are gonna be certain types of content that deserve different levels of preservation? So scholarly journals may need this, but other content may need something that isn't quite the gold standard. And are you starting to think about those decisions in that way, that there's sort of a tiering of the content that you have, given how expensive preservation is? Actually, yesterday I was talking with Vicki and I again used the term: this was somewhat of a high-level quantitative analysis. Now we need to qualify it. As Vicki said, can you tell me what you really mean? I think my answer to your question would be maybe kind of delving into that. As you said, not each journal has the same importance, whether you are looking at it from a commercial perspective or a scholarly perspective. And one of the next steps that we are considering is probably based on a case study, again taking small steps forward. We want to analyze the content in terms of its priority, or what level of preservation would be sufficient. Maybe it's okay to be a dark archive for certain journals, but for certain journals you may need much more instantaneous and robust methodologies. So we are not ambitious, but the next step is really what we call qualifying: again, based on a case study, to bring those analytical approaches in and be more nuanced about some of the issues that you addressed. Again, it will be a case study, but that's one of the issues we are considering. 
I'd also say my hope is that different groups, people, individuals, organizations will step up to different parts of this, and we should be looking at this also through that lens. It's not just a matter of what's important and what's not important, but how things are used and how they're likely to be useful sometime 50 years from now, or whatever time you like. I mentioned the trade publications as an example, where one strategy doesn't fit everything, and it's not just a matter of priority or importance, it's a matter of what is needed. There are some things where the really important thing is gonna be being able to track down a citation that somebody's used in their research. There are other things where that's really almost irrelevant, or at least not the primary use case. So I would like to see different approaches there. I'm branching off from the question a little bit just to revert to a comment on Oya's beginning remarks. The fundamental problem here seems to be that nobody knows whose job this is. Stay quiet. It's the job of the people who think it's important and come up and say it's gonna be my job. And one reason that hasn't been dealt with is that it's not clearly a collection development responsibility; it's of interest to collection development, but it's not clearly that person's job. Or preservation has dealt with things that we preserve ourselves, and with certain techniques, so it doesn't fit there. So we've got to think, and that means it's going to be different parts of different organizations that come together to do this. It's not going to be all the people from one existing community. Now, how we organize it is the question. For me, I'm hoping to see people in rooms like these coming up with those questions. I think I might be able to take one more question if there are any. Trying to fit departments within the organization to provide support to access the research network as a way of expanding the need for and the pain for access to the networks. 
Do you think we've made the possibility of any of the grantee organizations, perhaps organization-assigned with them, include preservation as a line item in the grants for the work that the scholars are doing? The question, just to repeat for the recording, is NSF has had different strategies in the past for ensuring, I'll separate that. The question was really, what's the potential for granting agencies to include preservation as a line item within the budgets? And I think it's interesting that there's been a lot of attention given to that, not so much for published content, but for data sets, as a big factor now in making access available, but not as much for the junction of those things. Access to research is an important factor in the granting now. Preservation of data is an important factor in the granting now, but they haven't really come together on some of this. Some of this may be a matter of, again, how do you translate that responsibility? If your grant includes responsibility for the preservation of publications that arise from the grant, it's one thing to imagine that happening through an institutional repository deposit. It's another thing to translate that into what happens when it gets published somewhere else and the rights get carried along and you have to track that through. But it's a good question. Yeah, it is. And actually, just a very quick addition. You know, it's this general ecology of scholarly communication and all the changes. You may be aware that there are two RFIs now from the White House, one on research data, the other on scholarly journals. And they are within the context of open access, but one of Cornell's perspectives in responding is relating open access to preservation through redundancy, and kind of making the case from a commercial or economic perspective, because it's related to the America COMPETES Act. 
But that's definitely one of the issues too, that overall publishing ecology and how we are building redundancies. Thank you very much for coming. Yeah.