Good afternoon everyone. Thank you for joining us. I'm Gaëlle Béquet, the director of the ISSN International Centre, and I'm with Thomas Padilla from Internet Archive. We are going to talk about multi-custodial approaches to digital preservation of scholarship. First of all, I would like to thank the organizers for inviting us to talk on this subject. I remember I was here at CNI four years ago, also talking about Keepers Registry, so the idea today is to provide you with an update. Clifford Lynch has served on the technical advisory committee for Keepers Registry and he has always been a great supporter of this project.

So here is the outline of my presentation. As I've just said, I want to give you an update on Keepers Registry, then a quick overview of a service that we've just implemented, which is called Submit Retrieve Reuse, and then touch upon the policy for the development of Keepers Registry starting in 2025.

Just to start with, I would like to give you a brief introduction to Keepers Registry. It was created and set up at the University of Edinburgh by EDINA, and in 2019 the ISSN International Centre took over the service from EDINA, as Jisc was stopping its funding of the service and the ISSN International Centre had been involved with the project right from the start. This is why there was, let's say, a natural shift of responsibility from the University of Edinburgh to the ISSN International Centre. So we work with keepers, the preservation agencies whose list you see on the screen. Since we took over the service in 2019 we've been eagerly working on its development by involving more agencies. At the moment we have 14 active preservation agencies working with us and three inactive ones. The inactive agencies are the Archaeology Data Service, the British Library and the Swiss National Library.
So we hope that at some point they may review their decision and come back and work with us. But we are fortunate to work with 14 other agencies, and with the ISSN Centre as a linchpin to match data between these archiving agencies and the ISSN Centre database, we know that at the moment a bit more than 88,000 titles are preserved, and a bit more than 21,000 serial titles are preserved by at least three agencies, which is to say about one quarter of preserved titles are held by at least three agencies. Data from the National Library of Spain has just been added, but we have not yet communicated on this new inclusion. We also work with TIB in Germany, the Leibniz Information Centre for Science and Technology, and they are currently upgrading their data to be able to share it with us. And, as I've said before, we've developed a new service; I will come to that in a moment.

We've also added a handful of statistics about the collections which are preserved. These statistics allow users to compare the agencies and to know the number of preserved titles that each one handles. As you can see here, Portico holds the most titles in its archive, and then come CLOCKSS, LOCKSS and, I think, Scholars Portal. Of course these statistics evolve with time. We have an MOU with each archiving agency that defines the frequency with which they update their data. And, of course, Internet Archive is also a partner, which is why we are both here to talk about digital preservation. These statistics are also interesting because they show you how unique some titles are: among the 88,000 titles which are preserved, a bit more than 55,000 are held by only one agency.
And so this means that what we call the keep safe ratio is still to be improved. This ratio was devised right from the start of the project, I think, stating that to be assured that resources will be kept for the long term, it would be safe to have them archived by three or more agencies. So, as you can see, there is some progress to be made here. We have also developed a coverage overlap chart. You can see on this one, I think, the overlap between LOCKSS, CLOCKSS and Portico, but you can move the arrow on the statistics page of the portal and see the overlaps between other agencies.

And here, this is more recent, we have developed some statistics on the national sovereignty of the preservation of journals. Here you have the example of France. As you can see, Gallica, which is the archiving system of the National Library of France, holds more than 96% of all French titles which are recorded in the ISSN database. But these figures are a bit different when you look at the Netherlands. As you can see here, the preservation of titles published in the Netherlands is much more balanced, since CLOCKSS and the archive of the Royal Library of the Netherlands, but also Internet Archive, the Library of Congress, LOCKSS and Portico, all hold in their archives a lot of titles which are published in the Netherlands. Coming to the UK and the US, we have the same situation: the preservation of these titles is also well shared among the keepers.

Another interesting statistic is the preservation of open access resources. The ISSN International Centre has developed a directory of open access resources, ROAD, and as you can see here on the screen, for France there are a bit more than 4,000 titles listed in ROAD and, according to the statistics, only 145 are actually archived. The situation is a bit better for the UK and the US.
As you can see, there are a bit more than 2,000 open access journals listed in the ISSN Portal, out of which a bit more than 1,500 are actually preserved by keepers.

I want now to switch to a service that we've developed which is called Submit Retrieve Reuse. The rationale for developing this service was to allow libraries, publishers or any content provider to check the validity of the ISSNs in their catalog, update the ISSNs and the metadata, and retrieve Keepers Registry data in bulk, because it is of interest to libraries to know about the preservation of these resources. The ultimate goal is of course to make room in both physical and digital storage facilities based on information retrieved from Keepers Registry. You have here a screenshot of the metadata which is held in the ISSN database, and more specifically information about the archival status. This is what you retrieve by uploading a list of ISSNs: information on the validity of these identifiers, but also data about the title, whether it's print or digital, as well as detailed metadata and information coming from indexing services and from the agencies participating in Keepers Registry.

What's next for Keepers Registry? We are currently devising a policy for the development of the service. We would like to increase the ingest ratio, of course by increasing the number of keepers, but also by diversifying the agencies so that we can cover titles from other parts of the world. For the time being, as you were able to see, our main focus is on North America, Europe and a bit of South America, but we would like to work, for example, with Indonesia, because we know they publish a lot of journals, and also with Australia, Canada, Finland, Denmark and Turkey. These are the countries which were identified through the research that we are currently leading.
We also want to contact publishers in countries with a low ingest ratio to encourage them to work with archiving agencies, choosing from the keepers or other agencies that may do the job for them. Another objective is to engage with libraries that select open access titles worth archiving, because it's always tricky to identify these titles and to know whether they are worth being ingested. So I think that's something we could do with the contributions of libraries around the world. Another is to develop a metric to monitor coverage for each archived title. For the time being, and that's an issue addressed by, I suppose, a lot of you and libraries in general, we have heterogeneous holdings descriptions, and we need to tackle this issue by trying to streamline or rework these holdings so that we can compare all the archiving agencies and find the gaps wherever they are. And another objective would be to achieve formal registration with the UNESCO Memory of the World Programme; to achieve this we will of course need some support from the archiving agencies which contribute to the service. Thank you for your attention.

Okay, hello everyone. I am not Jefferson Bailey. Surprise. I am Thomas Padilla. Jefferson couldn't make it, so I'm here to pinch-hit today. I'm going to do my best Jeffersonian version of this presentation. I'm going to talk a lot about a project that we work on at the Internet Archive called Internet Archive Scholar, but of course I'm trying to make the larger point, relative to the title of this session, that multi-custodialism is needed to preserve open scholarship. Our work on Internet Archive Scholar is very much in alignment with our general mission at the Internet Archive, which is to provide universal access to all knowledge. The preservation and access strategies that we have at work at the Internet Archive to preserve scholarship are various.
We have publisher and author self-deposit through archive.org. We have publisher and author donations. We have ongoing large-scale digitization of material in various formats, and we have the targeted born-digital collection strategy through IA Scholar. Access approaches are similarly various: public access to content through archive.org, participation in interlibrary loan and, again, IA Scholar as an access portal, aggregation point and so on for open scholarship.

This is an example of a kind of preservation of scholarship that we do at the Internet Archive. What you see on the left is a picture of what we refer to locally as the SIM collection, or Serials in Microfilm. It is a large-scale digitization project: 14,000 titles, 480,000 volume years and about 500 million pages. It is ongoing and increasingly made accessible.

On to the high-level goals for IA Scholar. As we try to do with most of our services, we seek to leverage the open infrastructure that we have on hand and our large-scale web archiving program to provide better access to scholarship. Part of the way that we do that is by integrating additional automation within our broader automation workflows for web archiving globally at scale, and nesting within there another workflow to capture and archive open scholarship as it proliferates across the Internet. We develop various tools to identify these scholarly objects in web harvests and archives, and work to improve and augment the metadata, often through integrations and partnerships in the broader community. We have four main preservation strategies: automated daily archiving making use of persistent identifier sources; periodic archiving and targeted web crawls; identifying and extracting content from past generic web crawls; and then, again, partnerships and integrations, a few examples of which I'll mention in a few slides. In terms of distinguishing how we approach this work, we do it in two ways.
We have a top-down or known-work approach, and then we also have a bottom-up or unknown-work approach. We mostly tend toward the top-down, known-work approach, where we're seeking to harvest and archive from PIDs, registries, metadata, manifests and so forth. The bottom-up, unknown-work approach often entails applying machine learning strategies to the general web archive collection. There are many billions of PDFs in this archive, so we tend to do that on a semi-regular basis; it can be quite expensive on the compute side. Many, many millions of outputs are preserved in IA Scholar, including datasets: about 35 million OA papers with open full text, plus datasets. We also maintain a scholarly citation graph called refcat, which is also openly accessible online. In terms of acquisition, we're at about 30,000 new objects a day via PIDs, arXiv, PubMed and so forth. Roughly five million new objects per year end up in IA Scholar via the targeted and general web harvesting that the larger Internet Archive operations are engaged with.

This is a depiction of a workflow for PDF processing from some of the general collections to identify unidentified works. It makes use of a series of open source tools, pdf_trio, GROBID and fuzzycat, which help us surface things that are otherwise not identified and, in some cases, perhaps not preserved. Pretty much everything I'm talking about can be accessed at IA Scholar now. It is available, it is live, and it has been live for some time. The URL is kind of nested there in the slide: it's scholar.archive.org. I encourage you to take a look. Internet Archive Scholar also interacts with various interlibrary loan platforms. We provide access to the aggregate metadata sources, databases and data at archive.org in a collection called Bulk Bibliographic Metadata, which is quite descriptive.
It's all there; take a look. The value of Internet Archive Scholar varies depending upon your position in the field. Not to read all of these, I'll give maybe just one from each. For journals, we think Internet Archive Scholar provides value in relatively automatic archiving of open access publications through our web harvests; for institutions, a storage, preservation and indexing service; and for researchers, essentially research data, particularly for folks working in metascience.

Many partnerships and integrations make this possible. Before a couple of examples, a note about multi-custodial strategies. As I was saying at the very beginning, this doesn't really become possible without many partnerships and integrations. It's a fairly fragile space, and so we're trying to seek as many partnerships and integrations as we can and to reinforce the rest of the community where we can. In line with that, to mention a couple of examples: Internet Archive has joined the Keepers Registry of preservation organizations. We report two times a year on everything that we have committed to preserving. We have a partnership with Google Scholar, where 30 million of the OA works in IA Scholar are now indexed and available through Google Scholar. And we also work with Semantic Scholar. We have an additional integration with the Center for Open Science, where basically anything that goes up in the Center for Open Science research registries is automatically archived at archive.org. It is currently focused on the research registrations, about 100,000 archived, but we're scoping the inclusion of additional OSF areas. We are also participating in JASPER, which is a multi-institutional effort to provide digital preservation services to non-profit OA journals in DOAJ. Currently 100-plus deposits from 30 journals are preserved through that collaboration. Partners are DOAJ,
CLOCKSS, Public Knowledge Project and Keepers/ISSN. And finally, my last bit here in this whirlwind tour. We're in the pilot stage of working with CORE, Crossref and DataCite to look at hosting mirrors of content and services such as databases, APIs, search indexes and so forth. And a last bit: we're also exploring the potential value of selfless deposit, which would basically be a service where we would provide universities with mirrors of scholarship from their faculty that we find and put in IA Scholar but that are not yet necessarily deposited by faculty into their local IR. Selfless deposit. So that's it for me and this whirlwind. Credit, of course, goes to Jefferson and colleagues Martin, Nate, Alex and Monica, and also colleagues David and Vicki. And thanks to our past funders who have brought us to this point, the Mellon Foundation and IMLS. Thank you so much.

We have time for questions. Do you want to pick who goes first? Who's your favorite child? Yes, Meen. I knew it. I knew it.

This question is for Thomas. I see that Internet Archive Scholar is in beta and I was wondering what the threshold is going to be for IA to take it out of beta.

That's a great question. I have been advised by Jefferson not to answer questions I don't know the answer to, so I would suggest following up with Jefferson.

So my question is kind of for both of you. I think the point about many of the journals being kept by only one keeper is important. But it follows along to the second point you were making too, right, that we don't really know the completeness of these because the data is at the title level. And so I was just curious, for either of you. Particularly with e-journals, it's hard to even know what the unit is. There used to be a bound volume that came in the mail, and if you kept all those you had it, but issues aren't even really the unit by which materials are distributed.
I'm just curious if either of you are working on or thinking about ways to track or measure completeness, or how you might pick at some of that.

Thank you, Trevor. Well, actually that's what I was explaining in my presentation. We are aware of these flaws, let's say, in the data that we can retrieve from the agencies. There has been a development, and maybe you've heard about it, in Brazil by IBICT. They've just launched a project called Pina case, which is about the preservation of journals, and they've been able to tackle the issue of heterogeneous holdings descriptions. So we need to get in touch with them and see how, technically, they've been able to achieve that. And recently a colleague of mine from the Agence bibliographique de l'enseignement supérieur, the agency which coordinates academic libraries in France, sent me information regarding pie mark, which is also a development or program that can address this type of issue. So I think there is some progress being made and I hope we can find a solution by investigating further.

I've got a question for Thomas. I was really intrigued by the refcat citation graph. One of the constant hurdles we have with OA publications is trying to show faculty the value of publishing there instead of publishing in commercial journals. Is the Internet Archive thinking about using that information to advance research impact metrics for that corpus of work? Thanks.

It's a good question. I think that refcat itself certainly seems like a good research object or research dataset for researchers who may want to pursue that question. And certainly we provide open access to it in that spirit.

Well, if there are no other questions, we thank you for your time.
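
The Submit Retrieve Reuse service described earlier checks the validity of the ISSNs in a submitted list. The talk does not say how that check is implemented; as a minimal sketch, the standard ISSN check-digit rule (modulus 11, with weights 8 down to 2 on the first seven digits) could be written as below. The function names `issn_check_digit` and `is_valid_issn` are illustrative assumptions, not part of any ISSN International Centre API.

```python
import re

# Accepts "1234-5678" or "12345678"; the last character may be a digit or X.
ISSN_PATTERN = re.compile(r"([0-9]{4})-?([0-9]{3})([0-9Xx])")


def issn_check_digit(first_seven: str) -> str:
    """Return the check character for the first seven digits of an ISSN."""
    # Weights run 8, 7, 6, 5, 4, 3, 2 across the seven digits.
    total = sum(int(d) * w for d, w in zip(first_seven, range(8, 1, -1)))
    value = (11 - total % 11) % 11
    # A computed value of 10 is written as the letter X.
    return "X" if value == 10 else str(value)


def is_valid_issn(candidate: str) -> bool:
    """Check the format and check digit only (hypothetical helper)."""
    match = ISSN_PATTERN.fullmatch(candidate.strip())
    if match is None:
        return False
    first_seven = match.group(1) + match.group(2)
    return issn_check_digit(first_seven) == match.group(3).upper()


if __name__ == "__main__":
    for issn in ["0378-5955", "2434-561X", "0378-5956"]:
        print(issn, is_valid_issn(issn))
```

A passing check digit only shows that the number is well formed. Whether the ISSN has actually been assigned, and to which title, still has to be looked up in the ISSN database, which is the part the service itself provides.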