Hello. Thank you for joining me for this short project update at the CNI spring 2021 virtual membership meeting. My name is Dom Mitchell and I am from DOAJ, the Directory of Open Access Journals. I will talk to you today about a project which has seen five organizations come together to try to stop, or at least reduce, the number of open access journals disappearing from the internet. The title of my presentation is "A collaborative approach to preserving at-risk open access journals". This is a summary of what I'm going to talk about; it is essentially the stages of a project plan, so you can think of this presentation as a project plan. I'm going to give you some information about the objectives, our method, the deliverables, and an estimated timeline by which we hope to have completed the first phase of the project. A quick overview, then. We often hear in the world of digital preservation that journals are simply vanishing from the internet, and sadly this isn't a new phenomenon for us. We refer to this group of journals as the "long tail" of journals, and by the long tail we mean a large number of small, single journals, often run by one or two people, that may be unfunded. They might be run out of pure enthusiasm for the subject matter, and they are poorly resourced, or at least they have limited resources. We often see that these journals are at risk of disappearing, for reasons which I will go into in a second. The project group consists of five leading organizations that are best placed to make a project like this work, and I'm extremely happy that DOAJ has been able to collaborate with these organizations; I'll give you a bit of information on them in a second. The idea is that we're going to create a central hub from which preservation agencies will be able to harvest consistent metadata and full text, thereby reducing the cost of undertaking a project like this.
We're hoping to remove the technical and financial barriers that may prevent these journals from taking part, so finding a free or low-cost option is very important. Phase one of this project will target diamond journals, and by diamond I mean those journals in DOAJ that do not charge article processing fees. In open access models, article processing fees, also known as APCs, are often a way for journals to earn revenue, but many, many journals, more than 65% of the journals in DOAJ, do not charge them. So here are the five organizations involved. As I said, I'm very proud that some great names have come together for this project. CLOCKSS stands for Controlled LOCKSS, where LOCKSS is Lots of Copies Keep Stuff Safe; it is a sister project to LOCKSS, run out of, or in association with, Stanford University, and it offers a very complete archiving and digital preservation service. DOAJ, of course: we index over fifteen and a half thousand open access journals, covering an entire range of journals in terms of size, scope, subject matter, funding, geographic location and language, and we are probably the most diverse index of open access journals in the world. The Public Knowledge Project, or specifically for this project the Public Knowledge Project Preservation Network (PKP PN), which is an archiving solution offered by PKP, best known of course for its OJS open source journal platform. The Internet Archive, hopefully known to many, if not all, of you: they offer great services in terms of archiving, and it's an absolute pleasure to be working with them. And then the Keepers Registry, which was recently acquired by the ISSN International Centre, a service which aggregates and standardizes information about journal content: where it is preserved and what range of content. So these five organizations, I think, are going to be a recipe for success.
What problems are we trying to solve? Well, first of all, journals vanish from the internet, and when they do so they take with them the research that they've published. This of course leads to broken URLs, broken citations and reference rot, which is a huge problem, especially if you're an author who has published your research in that journal. Second, as I mentioned briefly earlier, some of the archiving and preservation solutions that exist today come with financial and technical barriers which may prevent some journals from taking part. We know that, of the journals selected for this project, many don't have any income source, so paying to join an archiving service is very difficult. The third problem: for every journal that approaches an archiving service, there is an initial investment in time and technology, the set-up, and that set-up process is often unique to each journal. It's often very expensive and it can take a lot of time. This project hopes to reduce or remove the impact of that problem. The fourth problem: there are, of course, thousands of small publishers, and there is a lack of standard practices. Everybody has a different website, different metadata and different abilities to export, and this makes archiving very difficult. Last, but not least by any means, the fifth problem: awareness among journal owners, editors and publishers of why archiving and preservation are so important is in fact very low. We hope that with this project we can reduce the impact of that problem. You'll see that I've included a link at the bottom to the latest paper on this issue of journals that vanish from the internet; that is Laakso et al. I've linked to the preprint; I believe it was recently published in a Wiley journal, JASIST, for those who are interested. So, the objectives for the project.
Well, of course, we want to increase the number of journals in DOAJ whose content is being actively archived and preserved. Through our joint efforts with the other four project partners, we want to provide an important piece of sustainable infrastructure for the open access publishing community. If this phase is successful, we would love to see the output from this project become a permanent and sustainable piece of infrastructure for journals to take advantage of. We want to make sure that the solution overcomes financial and technical barriers, to encourage these journals, especially the unfunded ones, to take part in archiving solutions. We want to foster best practice and standardization, where there isn't any at the moment, among that long tail of DOAJ journals. And lastly, as I said, we want to raise awareness among editors and journal owners of why archiving and preserving digital content is so important. So, what is the scope of the first phase of this project? There are approximately 7,500 diamond open access journals in DOAJ that, when they submitted their application to us, said that they are not actively taking part in an archiving program. That's the group of journals we're starting with: they're in DOAJ, they don't charge any fees for their services, and they have said that they are not being archived in any archiving program. DOAJ only accepts journals that have an ISSN, and an ISSN is extremely important, particularly in this context, because it is the key persistent identifier used between the archiving agencies and the Keepers Registry. The advantage of starting only with DOAJ-indexed journals is that it greatly reduces the possibility that these journals might be predatory or questionable. 50% of the journals in that 7,500 are actually using PKP's OJS journal hosting platform, so it made sense immediately to bring PKP into the project.
If this phase of the project is successful, phase two will extend to other DOAJ-indexed journals, and perhaps even journals that aren't in DOAJ, although we would encourage them to apply. So how are we going to achieve this? We are in the process of writing a survey that will be sent out to the 7,500 journals. It will hopefully be in Portuguese and Spanish, and hopefully in the six official United Nations languages, although we do need to find the resources to make those translations possible. The responses that come back to us will divide the journals into three groups. The first group, maybe the majority group, will be those journals that are on OJS, and we will direct those journals to resources that tell them how to take part in the PKP PN. There is a small hurdle in that the version of OJS the journal runs has to be greater than 3.2, but with documentation, mentoring and training we're hoping to encourage people to get their OJS content into the preservation network. The second group will be those journals that can produce and export both article metadata and full text, and we will encourage those journals to participate in CLOCKSS. The last group will be those journals that, for whatever reason, cannot produce and export metadata, or indeed do not want to, and they will be encouraged to allow the Internet Archive to crawl their sites. The Internet Archive provides an automated solution which crawls websites and takes copies of the content. It's very automated, but it's also very easy for the journals to take part in that option, and in fact the Internet Archive is doing it already today. We're hoping that we will be able to provide greater coverage for the Internet Archive by pointing some of these journals in their direction.
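The OJS version hurdle mentioned above can be sketched as a simple eligibility check. This is a minimal illustration, not PKP's actual compatibility logic: the threshold follows the "greater than 3.2" figure cited in the talk, and the version-parsing rules are assumptions.

```python
# Illustrative gate for the PKP PN route: is the journal's OJS install
# recent enough? The minimum version here follows the talk's "greater
# than 3.2"; treat it as an assumption, not PKP's official requirement.

MIN_PN_VERSION = (3, 2)

def parse_version(version: str) -> tuple:
    """Turn a version string like '3.2.1' into (3, 2, 1) for tuple
    comparison; stop at the first non-numeric piece (e.g. '3.3.0rc1')."""
    parts = []
    for piece in version.replace("-", ".").split("."):
        if piece.isdigit():
            parts.append(int(piece))
        else:
            break
    return tuple(parts)

def pkp_pn_eligible(ojs_version: str) -> bool:
    """True if the journal's OJS install meets the minimum version."""
    return parse_version(ojs_version) >= MIN_PN_VERSION
```

A survey-processing script could use a check like this to route OJS journals either to the PKP PN documentation or to the upgrade advice mentioned later in the talk.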
One of the things that cropped up as we were doing the scoping and research for this project is how important a permanent article identifier is when it comes to facilitating discovery and archiving. If you see archiving and digital preservation as a sort of precursor to discovery, then when these services ingest article content they match the metadata and the full text, and everything hangs off a permanent article identifier, the most common of which is a DOI. Using a DOI allows a copy of the article content to always be found. If, for example, a journal has disappeared offline but is archived in CLOCKSS, then that trigger event of the journal disappearing will release the CLOCKSS content, and the DOI will allow that content to be found again. However, there is a technical investment and a financial impact to using DOIs that some journals can't afford or are not able to take on. And secondly, even when journals have DOIs, they don't always ensure that those DOIs resolve. A large proportion, I'd say maybe about 20%, of the journals that we're looking at for phase one of this project don't use DOIs, or at least they haven't said that they use DOIs when they sent their information to DOAJ. DOAJ is hoping to collaborate with Crossref and have permission to assign DOIs to the article content on behalf of these journals. So when the journals indicate that they are willing and able to be part of this project, we will hopefully be able to assign DOIs to the articles. That will be a sort of extra benefit of this project. So what happens then? There are, of course, different ways of ingesting the content, and there are the different options which I went through earlier. The first thing, of course, is that if approximately 50% of all the journals are using OJS, then we do want to encourage them to get into the PKP PN.
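Two cheap checks sit behind the DOI discussion above: does a string look like a DOI at all, and what URL should it resolve through? This sketch shows both; the regex is a loose version of the pattern Crossref suggests for matching modern DOIs, so treat it as illustrative rather than a strict validator.

```python
# Illustrative DOI handling for a project like this: a syntactic check
# against the common "10.<prefix>/<suffix>" shape, and construction of
# the doi.org resolver URL. The pattern is a loosened form of Crossref's
# suggested DOI-matching regex, not an authoritative validator.
import re

DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$", re.IGNORECASE)

def looks_like_doi(doi: str) -> bool:
    """Quick shape check: directory indicator '10.', a numeric
    registrant prefix, a slash, then a non-empty suffix."""
    return bool(DOI_PATTERN.match(doi.strip()))

def resolver_url(doi: str) -> str:
    """DOIs resolve through the central doi.org handle service; an
    archive can fetch this URL to verify that the DOI actually resolves."""
    return "https://doi.org/" + doi.strip()
```

The talk's point about DOIs that "don't resolve" amounts to the second half of this: even a syntactically valid DOI is only useful if an HTTP request to its resolver URL actually lands on the content.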
So we will help journals find out if they're on a compatible version of OJS, and we will give them the information that will help them get their article content into the preservation network. Journals that are not on a compatible version will be given advice on how to upgrade, or will be asked if they want to participate in the second or third option: the second option being CLOCKSS, and the third allowing the Internet Archive to crawl their site. For the CLOCKSS option, the Internet Archive and DOAJ will maintain an FTP server. So a journal is indexed in DOAJ, it says that it wants to be part of the project, and then it starts uploading article metadata and full text to an FTP server maintained by the Internet Archive. CLOCKSS will collect the metadata and full text from that FTP server, so there is a single interface for CLOCKSS to interact with for all of the journals that have chosen that option. If a journal cannot provide metadata and the Internet Archive is going to crawl its website, then there will be moderate human QA as the Internet Archive crawlers ingest the content, and the Internet Archive provides tools which allow the journal to go in and report on or improve the completeness of the holdings for that journal. Finally, all of that preservation data, that is, the status (is it active or not), the name of the service (CLOCKSS, the Internet Archive or the PKP PN) and the range of the content, is aggregated in the Keepers Registry, which provides the essential central database for finding out what content is preserved in which service. So what are the deliverables then? As I mentioned earlier, we want a sustainable solution that facilitates archiving for unfunded open access journals. We want to ensure that the aggregated archiving data for these journals is available in the Keepers Registry.
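The FTP deposit route described above can be sketched in a few lines. Everything specific here is hypothetical: the directory layout keyed by ISSN (the persistent identifier the project relies on) and the file naming are assumptions for illustration, not the project's actual specification.

```python
# Sketch of the CLOCKSS deposit route described in the talk: a journal
# uploads article metadata and full text to an FTP server maintained by
# the Internet Archive, from which CLOCKSS harvests. The directory
# layout and file names below are hypothetical, not the project's spec.
import io

def remote_path(issn: str, article_id: str, filename: str) -> str:
    """Hypothetical per-journal layout on the deposit server: the ISSN
    names the top directory, with one subdirectory per article."""
    return f"{issn}/{article_id}/{filename}"

def deposit_article(ftp, issn: str, article_id: str, files: dict) -> list:
    """Upload each file (name -> bytes) to its remote path. `ftp` is any
    object with ftplib.FTP's storbinary(cmd, fileobj) method, so a real
    connection and a test stub both work."""
    uploaded = []
    for name, data in files.items():
        target = remote_path(issn, article_id, name)
        ftp.storbinary(f"STOR {target}", io.BytesIO(data))
        uploaded.append(target)
    return uploaded
```

The design point from the talk is the single interface: however many journals deposit, CLOCKSS harvests from one server with one consistent layout, rather than negotiating a unique set-up with each publisher.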
A goal that we've set is that there will be at least a 50% reduction in the number of unarchived journals in DOAJ. We've set that goal fairly low: from previous experience, when we go out to journals and ask them to take part in something, response rates have been slow, and often responses trickle in over time. I think it would be naive to expect that everyone is going to respond at once; this is more of a long-term project, although we're hoping that there will be a bulk of responses up front. We also want to have produced better documentation and training tools in multiple languages that will help raise awareness and will be there for people to use separately from this project. Here's an estimated timeline, and it is very much an estimate. The project kickoff was really in November 2020. We are now in the first quarter of 2021, and we are sending out the survey to find out how many of those 7,500 journals are interested and want to take part. We are hoping, by the third quarter of 2021, to pull together a smaller group of journals that will act as a pilot. We will push them through the system and learn from our experiences there about what works, what doesn't, and where the documentation and information need to be improved, and then we'll apply that to the larger group of journals. We expect that the pilot shouldn't take that long and that we'll be able to open up to the larger group of respondents by the end of 2021. Then, at the beginning of 2022, we will do a review of the project: we will take stock of our situation, how many responded, how many have been successfully archived. As experience shows, uptake can be slow, so we'll leave the project open, but we will of course then talk about a phase two: could we open up the project to a wider group of journals?
Can we get funding to help sustain the project and make it a long-term solution for the benefit of the wider open access and scholarly communities? So thank you very much. That is the end of my project update. If you'd like more information, I have listed the names of my colleagues in the project group, and we look forward to receiving your questions. Thank you very much.