Well, there's lots to talk about, so let's get started. Welcome, everybody, to the Fall 2020 CNI virtual meeting; you've joined us for the first project briefing session of the meeting. I'm Cliff Lynch, the director of CNI, and I'm delighted that you're here with us. I want to note that the session is being recorded and the recording will be available afterward. Just a couple of very quick things, because I know we all know much more about Zoom than we ever thought we would: we're muted at the moment, except for our speakers, and you can use the Q&A box to ask questions for the speakers, who will deal with all of those at the end of the presentation. There is also a chat, and you're welcome to use that for comments as we go along. Closed captioning is available on this recording if you would like to make use of that. And you're welcome to introduce yourself in the chat; don't feel you have to, but we'd be happy to have you.

With that, I'm going to introduce our speakers. Diane Goldenberg-Hart of CNI will beam into existence after the presentations are complete to moderate the Q&A session. Martin Halbert, an old friend of CNI's who has worked with us for many years in many capacities and done a lot of fascinating work, is now at the National Science Foundation as a rotating program officer, and I'm delighted to have him back with us. He is going to be joined by Lance Vowell from the Office of Science at the US Department of Energy, and they're going to talk to us about a collaboration that they've had in place to support a platform for providing access to funded scholarly publications as part of the public access mandates that our funders are putting in place. And with that, I will turn it over to Martin, who is going to start the presentation. Welcome, Martin.

Thank you very much, Cliff. Thank you for having us here today; we're delighted to be your first set of presenters at this virtual CNI meeting.
Lance, go ahead and go to the first overview slide. For this presentation, we're going to review the six-year collaboration between the National Science Foundation and the Department of Energy's Office of Scientific and Technical Information (OSTI) to create the NSF Public Access Repository, or NSF-PAR. We'll give you a little bit of history about this partnership as well as descriptions of the technical implementation details.

Many of you will remember the Holdren memo, issued by John P. Holdren in 2013 under the Obama administration from the White House Office of Science and Technology Policy, which directed agencies with more than $100 million in annual research and development expenditures to develop plans to support public access to the results of research funded by the federal government, especially peer-reviewed publications and digital data. Next slide.

Many agencies responded to this. In the case of NSF, under Dr. France Córdova, we developed a public access plan, NSF 15-52 if you want to look it up online. It was entitled "Today's Data, Tomorrow's Discoveries: Increasing Access to the Results of Research Funded by the NSF," and it laid out a plan for responding to the Holdren memo. The NSF 15-52 plan was developed largely by Amy Friedlander, and I want to give her a lot of credit for leading the development of this initiative, but with many contributions by others at NSF and, of course, at DOE, as you're going to hear. Next slide, Lance.

Just a very high-level scan of the 15-52 plan: it laid out a set of goals whereby NSF would create an "open, flexible, and incremental" approach (my emphasis in that quote) to develop an extended infrastructure for depositing publications by NSF awardees. In particular, it called out the notion that we would work with other federal agencies, and especially the Department of Energy, on the collaborative infrastructure that you're going to hear a little bit about, and I'm very proud of this collaboration.
I think it represents a great example of interagency work and reuse of software and expertise to accomplish this federal public access mandate. So, over to you, Lance.

Thank you, Martin. As Cliff said, my name is Lance Vowell. I am the Assistant Director for Applications Development and Operations with the Department of Energy's Office of Scientific and Technical Information (OSTI). We are hosted in Oak Ridge, and that's going to come into play pretty importantly in just a few minutes when I start diving into some of this technology. Oftentimes when I do these sorts of presentations I have to ask the facilitator to give me just a little extra time so I can get through my whole title; I know it takes up quite a bit of time there. But I do appreciate this opportunity, and as Martin alluded to, this collaboration started back in 2013 or 2014. I've been with the project since the very beginning as the project coordinator and project manager on the Department of Energy side.

After discussions with Amy Friedlander and others at NSF, NSF decided to utilize the same framework for their public access repository as the Department of Energy did. We call that repository, or that process, PAGES: the Public Access Gateway for Energy and Science. At OSTI, we have been providing public access for the department since 1947. We've been all digital since around 2000, so 20 years, and I've been at the department since 2005, so I've been in this space for 15 years. We provide public access to a variety of scientific and technical information: accepted manuscripts, technical reports, data, software, patents, videos; we have a litany of different products that we host. OSTI.gov is our umbrella product, and then we have niche products for each of these that allow users to drill down into more detail. The slides are lagging just a little bit; here we go.
Those of you familiar with STI management will know that we have a specialized program, which we refer to as STIP, the Scientific and Technical Information Program, that collects all of the R&D results for the department, both from our national labs (DOE sponsors 17 national labs) and from our grantees, our financial award recipients. We have a corporate responsibility; it's not just the Office of Science. We have that corporate responsibility across all program offices and across the entire department. We have a specialized electronic ingest tool that we use for this, which we refer to as E-Link, or Energy Link, and currently we're processing over 50,000 incoming STI products annually; that goes back to the list of product types I mentioned earlier.

Through STIP and through E-Link, we were and continue to be well positioned to extend our existing infrastructure to accommodate accepted manuscripts. After that 2013 OSTP memo went out, we felt confident that we were able to extend our infrastructure and our knowledge to assist other federal agencies that might be interested, and NSF took us up on that offer. Our approach is extensible not only to NSF but to other agencies; we also worked with DoD's DTIC to implement their public access plan. Our submission infrastructure is well established (like I said, we've been doing this digitally for 20 years now) and can be customized to meet the needs of other agencies, and you'll see very shortly how we did customize our infrastructure to allow for NSF's public access. So NSF saw that opportunity and reached out for the partnership, with the long-term goal being preservation and access, not just immediate access, not short-term access; that will come into play in just a few minutes as well.
NSF-PAR deployment: working with NSF, we developed what Martin referred to earlier as the NSF Public Access Repository, or NSF-PAR. That is a submission tool, and it allows NSF PIs to directly deposit peer-reviewed published journal articles and juried conference papers into a repository at NSF. NSF-PAR Public is the dissemination product, a special product for these peer-reviewed journal articles and conference papers. So there are two sections to NSF-PAR: the NSF-PAR submission tool that allows for the submission, and NSF-PAR Public, which is the dissemination portion. It disseminates NSF-supplied metadata along with the appropriate full-text metadata and links from CHORUS; I'll get to what that CHORUS part means in just a moment. We also have the potential for adding additional product types, including data, which NSF and DOE are currently working on together to see how they may want to approach it; I think Martin will touch on that briefly later. Just as with DOE PAGES, it is a hybrid model: it consists both of NSF PI-supplied records and of CHORUS-supplied records, which are used to supplement the collection; I'll get to that in more detail in just a few moments.

There was integration at the technical level; a lot of integration had to go on here to make this work. We started with single sign-on, which was the most far-reaching interagency collaboration that we worked through on this. What we worked to do was have a seamless handoff between Research.gov and NSF-PAR. NSF-PAR, both the submission tool and the dissemination tool, is actually hosted in Oak Ridge, whereas Research.gov and all of the NSF-related products are hosted in the DC metro area. So we worked with their team to have that seamless handoff: a researcher logs into Research.gov, and they have that portal.
They click on a link to submit their manuscripts and they're taken to NSF-PAR, hosted in Oak Ridge, seamlessly; they never know that they've left those servers, and they're all government-hosted, government-funded, government-protected servers. Along with that, their SSO passes us certain information. Part of that information is their unique user ID, so we know who that user is. With that unique ID, using an open NSF award API (that's the next bullet there), we're able to auto-populate a listing of that PI's awards, and they're able to associate their publications directly with their awards through that API integration. So if I'm an NSF PI, I log into Research.gov and I want to submit a public access manuscript. I click on the link. I'm taken over to NSF-PAR; I don't know that I've left NSF servers. Some information is passed along. I enter a DOI, I enter some metadata, I upload my full text (my accepted manuscript of my journal article), and then I'm able to associate which award I'm submitting on behalf of. It's four easy steps.

We also have a REST API that is used for integration between the DOE and NSF data stores, so that all of this information, both the metadata and the links to the full text, can be passed back and forth seamlessly from our servers to NSF and into their data stores for project reporting. That's how the PIs will then go in and add those manuscripts to their projects for their program officers to approve at the end of the reporting cycle.

We also use a certificate-authenticated full-text service. Many of you are probably familiar with the Holdren memo, the public access memo. What that calls for is that these peer-reviewed journal articles and juried conference papers be made publicly available after an administrative interval, or maybe you've heard the term embargo.
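The award auto-population step described a moment ago can be sketched roughly as follows. This is a hypothetical illustration, not the actual NSF-PAR integration code: the endpoint and parameter names follow NSF's public Award Search API, but the internal service, field names, and response handling here are assumptions.

```python
from urllib.parse import urlencode

# Assumed endpoint, modeled on NSF's public Award Search API.
NSF_AWARDS_API = "https://api.nsf.gov/services/v1/awards.json"

def award_query_url(pi_name):
    """Build a query URL listing a PI's awards for the association step."""
    params = {"pdPIName": pi_name, "printFields": "id,title,startDate"}
    return NSF_AWARDS_API + "?" + urlencode(params)

def parse_awards(payload):
    """Pull the award list out of the API's JSON response envelope."""
    return payload.get("response", {}).get("award", [])
```

A submission UI would call `award_query_url` with the identity passed over SSO, fetch the JSON, and feed `parse_awards` into a dropdown so the PI can pick which award the publication belongs to.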
NSF, along with DOE, chose a 12-month embargo, so all of these accepted manuscripts are made available 12 months after publication. However, many times there is a need for NSF POs to have access to that embargoed full text beforehand, to make sure that what their PIs are submitting is accurate before they can approve a project report. So we created a certificate-authenticated service for that. Inside the NSF management system, otherwise known as eJacket, these POs can have direct links to DOE-hosted servers giving them access to the embargoed full text before the public has access, so they can view it, read it, and make sure that the PI has acknowledged NSF appropriately, that the science is tied to the award, and so forth. Next slide. There we go.

We also integrated with some third-party services, two of which I'll talk about today. The first is Crossref. The reason that we integrate with Crossref is DOI services. What we're able to do for the PIs: once they're ready to submit their publication, they don't have to sit there and hand-type all of that metadata, the title, the publication date, a list of authors, a list of other organizations, ORCID iDs. If they have a DOI, a digital object identifier, they can put it in a box and hit submit, and we'll auto-populate that metadata for them using integration with Crossref APIs. If they don't have a DOI, they do have to type that metadata in, but the vast, vast majority of these submissions are done through what we call auto-population. So they don't even have to type that metadata in; they put the DOI in, their metadata is auto-populated, and then they verify that it's the correct information.
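The DOI auto-population step just described can be sketched against Crossref's public REST API, which returns a "message" object for `https://api.crossref.org/works/<DOI>`. The Crossref field names below are real, but the target form-field names are illustrative, not NSF-PAR's actual schema.

```python
def metadata_from_crossref(message):
    """Map a Crossref works record (the "message" object returned by the
    Crossref REST API) onto illustrative submission-form fields."""
    issued = message.get("issued", {}).get("date-parts", [[None]])[0]
    return {
        # Crossref returns title and container-title as lists.
        "title": (message.get("title") or [""])[0],
        "journal": (message.get("container-title") or [""])[0],
        "publication_year": issued[0] if issued else None,
        "authors": [
            "{}, {}".format(a.get("family", ""), a.get("given", ""))
            for a in message.get("author", [])
        ],
    }
```

In the live workflow, the form would fetch the record for the typed DOI, run it through a mapper like this, and present the result for the PI to verify before upload.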
They hit a button and they're taken to a screen where they upload the accepted manuscript version; then they hit another button, associate the correct award, verify that all the information is correct, submit it, and they're done; they're taken back to Research.gov fairly seamlessly.

We also integrated, and this is a big shout-out to Amy Friedlander's part here, with the ISSN database, the International Standard Serial Number database. It provides lookup services for ISSNs and journal titles for submissions that are entered manually, those that may not have a DOI or where we weren't able to auto-populate the information for whatever reason. It's important to NSF, and to the metadata integrity of this repository, that those ISSNs and journal names be authoritative and authentic. So we integrated with the ISSN database, and we have typeaheads for both the ISSN and the journal name, so that all of the journal titles and ISSNs paired up with a submission are authoritative and the metadata integrity is solid.

We also integrated with CHORUS. CHORUS is a consortium of publishers that have worked with the federal government and said that they will support public access as it is defined in the Holdren memo. NSF and DOE both have agreements with CHORUS, and CHORUS is used, again, as a supplement. Neither DOE PAGES nor NSF-PAR relies on the publisher for public access or for dark archiving of these records; the CHORUS records are treated as supplemental to the researcher-supplied records. We wanted our repositories to be publisher agnostic: if a publisher decided they were going to get out of the public access game, NSF and DOE knew it was important to still have long-term preservation of and access to these accepted manuscripts.
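The ISSN/journal-name typeahead described above amounts to prefix matching against an authority list. A minimal sketch, with hypothetical data structures (the real service queries the ISSN database rather than an in-memory list):

```python
def typeahead(prefix, journals, limit=10):
    """Return (ISSN, title) pairs matching a typed prefix, so that only
    authoritative ISSN/title pairs can be attached to a submission."""
    p = prefix.lower()
    hits = [
        (issn, title)
        for issn, title in journals
        if issn.startswith(prefix) or title.lower().startswith(p)
    ]
    return hits[:limit]

# Illustrative authority list; real entries come from the ISSN database.
journals = [("0028-0836", "Nature"), ("0036-8075", "Science")]
```

Because the suggestion list is drawn only from the authority file, the submitter can pick an entry but never type in an unvetted ISSN/title pairing, which is the metadata-integrity point Lance makes.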
In this process, CHORUS metadata is ingested along with links to the VoR, the publisher's version of record, if and when that VoR is made publicly available, and CHORUS allows for full-text indexing of articles to enhance search precision. So every night we go out and run a specialized query against Crossref APIs, and we ingest all of the appropriate NSF-funded and related articles into PAR. Then we process those to ask: are these publicly available, or will they be publicly available? What version has the publisher made available: the version of record, or a publisher's version of the accepted manuscript? All of that goes into the calculation of what we call the best available version, and there's a hierarchy there that is tied to exactly what the publisher makes available. The key objective is that, for DOE PAGES and for NSF-PAR, we want public access to that best available version while not being directly reliant on the publishers for long-term access.

An important part of this is that CHORUS does allow for full-text indexing. The publisher makes a version of their full text available; it's downloaded, text is extracted from it, and then the PDF is thrown away. That PDF is not dark archived; the downloaded full text is used strictly for full-text indexing for search precision and accuracy. We don't keep it for any amount of time; once we get the full text out, we throw the PDF away.

Going a little deeper on that best available version concept: what occurs inside NSF-PAR and DOE PAGES is a comparison of the NSF PI submissions with the publisher collaboration via CHORUS and Crossref, and the intersection of those yields the best available version.
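The index-then-discard step Lance describes can be sketched as a tiny inverted index. This is an illustration of the policy, not OSTI's actual search stack (which would be a full-text engine); the function names and record IDs are made up.

```python
import re
from collections import defaultdict

def index_fulltext(doc_id, text, index):
    """Fold a document's extracted text into an inverted index.

    Only the token -> document-ID mapping is retained; the caller discards
    the PDF (and this raw text) afterwards, mirroring the no-dark-archive
    policy: the full text serves search precision, nothing else.
    """
    for token in set(re.findall(r"[a-z0-9]+", text.lower())):
        index[token].add(doc_id)

index = defaultdict(set)
index_fulltext("record-1", "Accepted manuscript on neutrino oscillation", index)
# At this point the extracted text and source PDF would be deleted;
# only the searchable index entries remain.
```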
CHORUS offers a single feed for all publishers, so we don't have to go out and make individual agreements. I remember back in the 2013 timeframe, or even before, DOE was seeking to have individual relationships with individual publishers; CHORUS packages all of that into one, allowing us to have one agreement with them while the publishers agree to a set of standards. The best available version, of course, is going to be the publisher's version of record, when and if they make that available, followed by the publisher's accepted manuscript version, and then the NSF PI-supplied version. CHORUS provides standardized metadata, including the funding sources, the licensing, and the start dates for those licenses; all of that goes into allowing public access to happen.

These are a few screenshots of NSF-PAR Public; this is the public version we're seeing here. This one is a version of record: the link goes directly to the publisher's version of record, because we know through our calculations, looking at the metadata, that the publisher at that DOI has made their VoR available. So that's what we make available to the public. There's a little open lock there; typically that open lock indicates this is open access. So if a researcher or a member of the public searched on this record in NSF-PAR, they would be taken directly to the publisher's version of record, regardless of whether there was an NSF-supplied AM for it or not. However, if the publisher decided to take this version offline, we would then be able to make that AM version available.
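The 12-month administrative interval and the best-available-version hierarchy just described can be sketched together. The hierarchy (publisher VoR, then publisher AM, then PI-supplied AM) comes straight from the talk; the record keys and function names are illustrative, not NSF-PAR's actual field names.

```python
from datetime import date

EMBARGO_MONTHS = 12  # the administrative interval both NSF and DOE chose

def embargo_lifted(published, today):
    """True once 12 months have elapsed since the publication date."""
    months = (today.year - published.year) * 12 + (today.month - published.month)
    return months > EMBARGO_MONTHS or (
        months == EMBARGO_MONTHS and today.day >= published.day
    )

def best_available_version(record):
    """Pick the link to surface: publisher version of record first, then
    the publisher's accepted manuscript, then the PI-supplied manuscript."""
    for key in ("publisher_vor", "publisher_am", "pi_am"):
        if record.get(key):
            return record[key]
    return None
```

So a record whose publisher later withdraws its VoR link simply falls through to the next tier on the next evaluation, which is the publisher-agnostic behavior Lance emphasizes.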
This one is actually a publisher's accepted manuscript, that second tier of the best available version I was referring to earlier, and you can see here that this APS Physics article directly acknowledges CHORUS on the accepted version. So if a user found this particular record in NSF-PAR and clicked on that DOI, they would be taken to a landing page and would have direct access to this publisher AM version.

And this is a record where the best available version happens to be the open version, the freely, publicly accessible AM that was submitted by an NSF PI. What this tells me when I look at it is that either the publisher didn't make the version of record or the AM available on their platform, or we didn't recognize that in the metadata. So if a user found this record and clicked on that accepted manuscript link, they would be able to download the version that was directly submitted by the PI. You can see how that best-available-version hierarchy flows down from there. And with that, Martin, I'll turn it over to you.

Sure. Just to bookend this: the system as it currently exists, with all the functionality that Lance went through, we are now calling PAR 1.0, and we are now actively in the development planning process for what we are considering PAR 2.0. It will feature a number of new upgrades. One is a set of improved workflows for award link management functions; what I mean by that is that our researchers have the capacity to link publications to the specific awards that funded the activity, and we have some new capabilities to let them edit those links and manage them a little more effectively. Other things that are coming are perhaps a modest upgrade, but one that is important in many cases.
We also fund workshops, and the workshop reports that come out of those are not typically juried papers, but rather separate, freestanding workshop reports; we think it's important to capture those as well. And probably the biggest set of work that we're going to be undertaking over the coming months is upgrades to NSF-PAR that will allow research data sets to be recorded and submitted to the system in terms of metadata, DOIs, and other persistent identifiers. That is very much in the spirit of the Holdren memo from 2013, so it's very important to me to see that PAR fosters good or best practices in research data management. NSF is particularly interested in research proposals that foster good research data management practices; we have two current Dear Colleague Letters that are still active regarding, you know, programs that will foster good research data management practices and projects to advance that. And this will collectively comprise what we are terming the PAR 2.0 system. We don't have a specific timeline for the implementation yet, but we're hoping to achieve, you know, really a significant amount of progress by the end of calendar 2021.

With that, Lance and I would be happy to answer any questions people have. I see there's a message: what is NSF's protocol to ensure PIs comply with the NSF public access policy in making their research data and publications publicly available? Well, if you're familiar with the NSF Proposal and Award Policies and Procedures Guide, it requires researchers to deposit their publications. Currently, as is clear from the presentation on PAR right now, we've focused for the last few years on articles, but now we're really stepping up to the issue of data. Data is a much more complicated topic, obviously; a data set can be anything from 20 kilobytes to 20 petabytes.
And we've had to think through, you know, what sort of repository records we will be ingesting into the system to accommodate good access to data sets. But that's really the focus of the work that Lance and I and our teams are doing right now: laying out that workflow.

That was really interesting. Thank you so much, Martin and Lance, for that presentation, and thank you to everyone who has made time out of your day to join us here at the CNI Fall 2020 meeting. I'm Diane Goldenberg-Hart. At this time, I am very pleased to keep the floor open for questions; if there are any other questions or comments for our speakers, we have a couple more minutes here. I see that Martin has shared some links there.

Those are just the links to the two Dear Colleague Letters that I mentioned, if people are interested in them.

Thanks, Martin. I was wondering: I think Lance mentioned that other agencies are also being brought into the system; I think you mentioned DoD. Are there active plans to expand across federal agencies, or what's happening on that front?

Good question. Right now we are not working with any other agencies directly to implement PAGES or a PAR-like repository, though we're continually open to that. We do field many, many questions from across the various agencies, helping people to instantiate a public access plan. But for now, DoD, NSF, and the Department of Energy remain steadfast partners, and we are all very appreciative of the partnership and what it means for both the federal government and the individual agencies. We are not currently working with any other agencies, but we're happy to grow.

Okay, thank you for that. And we just saw a chat. Okay, so I just chatted out those links that Martin shared to all attendees; if you still can't see them, let us know, but everyone should be able to see them now. All right. Well, if there are no other questions, and we are right at the half hour here,
I just want to take a quick minute to remind everyone that we will be having another session in half an hour: "Hi-Fi: Connecting Information for Better Research Reproducibility," with Terrie Wheeler and Peter Oxley of the Weill Cornell Medical College. In the meantime, I'm going to go ahead and stop the recording on this session. But if you'd like to hang around and, as it were, approach the podium, please feel free to do so; I can unmute you and you can ask live questions or make live comments and chat with our presenters. Thank you so much again for joining us here today, and I hope that you'll stay on for more sessions at CNI. Thank you so much, Martin and Lance; that was really helpful.

Thank you, Cliff, for having us, and Diane for facilitating.

And I look forward to hearing about version two and the research data set links as that develops. That's a wonderful direction. So thank you.