I think it's about time to get started. My name is Diane Goldenberg-Hart. I work for the Coalition for Networked Information, CNI, and you have reached part of CNI's Spring 2020 virtual membership meeting. We're so happy you could be here today. I am delighted to introduce our speakers today, Sayeed Choudhury and Han Vu of Johns Hopkins University, who will be giving the webinar presentation "Packaging Specification for Simultaneous Deposit of Articles and Data into Multiple Repositories." Before I hand it over to Sayeed, I just wanna draw your attention to a couple of items about this webinar environment. First off, there's a Q&A box. You should see a Q&A button at the bottom of your screen. If you click on that, it'll open up a box, and you can type your questions into that box at any time during the presentation. We'll field those questions at the end of the presentation, and I'll be rejoining the webinar at that time to moderate. I also wanna point out that we have a chat box. I'll be posting pointers to resources online throughout the presentation; in particular, I'll be sharing the slides from this presentation through that chat box, so keep an eye on that. We will be monitoring it as well during the Q&A. So without further ado, I wanna thank you again for attending and hand it over to Sayeed. Take it away, Sayeed. Thank you, Diane, and thanks to everyone for joining this session. I know this isn't the way we had imagined getting together for CNI presentations, but I do wanna thank CNI for always putting a great deal of effort into making this happen. And I certainly hope everyone is staying safe and healthy during this unprecedented time. So I'm Sayeed Choudhury from Johns Hopkins University, and I'm joined by my colleague, Han Vu. As Diane mentioned, we're going to talk about the work we've been doing around the packaging specification for simultaneous deposit of articles and data into multiple repositories.
The inspiration for this particular work is a system we've developed at Hopkins, working with Harvard, MIT, and the National Library of Medicine, called the Public Access Submission System, or PASS. I've talked about this at previous CNI meetings, so I'm really not going to go into much detail about PASS itself, other than a very high-level overview: it's an open source software platform (and we have actually explored the possibility of running it as a hosted service as well) that supports the simultaneous deposit of articles into both institutional repositories and PubMed Central. As many of you know, PubMed Central is typically associated with NIH and NLM, but in fact it is used by several funders, including NIH, CDC, FDA, the Howard Hughes Medical Institute, and NASA. I only mention those because they're relevant for COVID research and they're some of the big funders at Johns Hopkins. The fundamental premise behind PASS is that we're trying to reduce researcher burden and increase the prospects for, and awareness of, open access compliance. Researchers who have grants from NIH and these other agencies are required to deposit their articles in PubMed Central, and an increasing number of institutions have open access policies. By harmonizing those requirements and allowing simultaneous deposit into PubMed Central and your institutional repository, we're streamlining the process, so hopefully researchers find it valuable for grant compliance. And because the open access policy mechanisms are folded into that, we're hoping it will increase engagement with, and use of, open access and institutional repositories. Recently, as many of you know, the White House OSTP sent out an RFI about public access, and the AAU, APLU, and COGR, that is, the Association of American Universities, the Association of Public and Land-grant Universities, and the Council on Governmental Relations.
I think I got those right. They wrote their response, and I was very heartened and encouraged to see that one of the questions in their response, which you see listed right here, mentions the PASS system; I've put a subset of their answer to that question here as a quote. I did not know this was coming from these organizations. I've been a participant in the workshops these groups have organized over the last few months, and I did have the opportunity to talk about PASS at one of those workshops, but, as I said, I was not aware that they were going to mention PASS as an illustrative example. And that is an important word, illustrative: we're not proposing that PASS is the only way you can do this or that it has to be the way you do this. We are developing it as a way of conveying that institutions can provide solutions that meet these public access and open access requirements, and can work with the funders, and eventually possibly even with the publishers, to develop these kinds of solutions that reduce the burden on researchers and increase compliance with public access and open access policies. So seeing PASS mentioned in the response to the OSTP RFI is very encouraging for all these reasons. In the spirit of extending PASS beyond PubMed Central and the National Library of Medicine's management of PubMed Central for the various agencies, we applied for, and were successful in receiving, a grant from NSF. You see the grant number and the title of that grant there. It's through the EAGER program, as it's known at NSF. The title, "Open Infrastructure to Reduce Burden on Researchers and Federal Agencies," resonates with the title of this talk and with what we've been trying to do with PASS.
But the fundamental premise here is this: we hope to extend this to other federal agencies. It was incredibly helpful to work with NLM, and they were just fantastic in helping us understand how PubMed Central works and how to connect it to PASS, not just for deposit but also for harvesting information about compliance, grant status, identifiers, and so on. But if we want to do that with other federal agencies, it really doesn't make sense to keep doing it in a pairwise way. We wouldn't want to do this next with NSF, then with DOE, then with the Department of Defense, and so on. Beyond the sheer difficulty of doing that one-to-one with each federal agency, there are obvious potential complications, collisions, or inconsistencies in how the agencies do this. So that doesn't really scale. The premise of the grant, then, is: can we come up with a specification for simultaneous deposit of articles and data into these multiple repositories? And we want to do that, of course, in concert, collaboration, and conversation with the institutions that will be affected. So we reached out to a series of university partners who run institutional repositories, trying to make sure we had a diversity of platforms: DSpace, Fedora, Hyrax, and so on. You can see the partners listed there. It's an NSF grant, so there's obviously a connection to NSF, but we were very fortunate that our contacts and colleagues at NLM, and in fact new contacts at DOE and OSTI, participated in a workshop we held early last year to discuss how we might go about building this specification. The ultimate aim is to take this to the federal interagency working group on public access and say: here's a specification we've come up with, and it's been commented on by all these institutions and some other federal agencies.
This is something that can now potentially be adopted, or at least embraced, thereby making it easier for a system like PASS, or other similar systems, to do this kind of simultaneous deposit. So with that background and context, I'm going to turn it over to Han, who's going to cover the remaining slides and talk about the experience to date. Han, I'll turn it over to you. All right, thank you, Sayeed. Hello. As a team, we have had some experience with packaging digital content and interacting with various repositories. So we started this work by taking stock of the observations and lessons we learned from that experience. From this body of information, we organized a workshop focused on discussing our options for packaging formats, core common metadata, and the challenges of integrating with different deposit APIs. For the packaging format, we examined a range of formats, from the simplest, such as plain zip files, to more complex ones, such as the Data Conservancy Packaging Specification or the DSpace and METS packaging formats. As packaging specifications go up in complexity, they provide more features that improve various aspects of data transmission. These include ensuring data integrity by providing checksums; aiding data interpretation and processing via additional metadata; enabling interoperability of data consumption, and extensibility, via a formal packaging specification; and enabling a full understanding of data semantics via descriptions of data files, objects, and relationships. So at the workshop, we discussed the balance between the complexity of a packaging spec and its capacity to comprehensively transmit data. On the topic of metadata, we reviewed the metadata requirements for deposit into NIH's PubMed Central (PMC), the NSF Public Access Repository, O&IR, JScholarship, and Harvard's DASH.
The discussion we had at the workshop gave us a better view of which metadata should be considered core and essential and which should not. We all agreed on the need for the specification to be flexible and extensible in encoding metadata, so that it can remain useful and valid in the inevitable case that metadata requirements change. Lastly, drawing from our experience working with multiple deposit APIs, we highlighted a few challenges our team had encountered. Notable challenges include difficulty tracking the ultimate status of the submissions we sent, difficulty aligning grant details between our submission system and the funder's submission system, and the wide range of repository behaviors and requirements. Our goal in this work is to address as many of these challenges as possible, while acknowledging that not all of them can be addressed by a specification. The takeaway from our workshop is the desire to make the specification extensible, and balanced between being simple enough to be adoptable and rich enough to be useful. Furthermore, we identified three major concerns we want to tackle in developing the specification. The correlation concern refers to the ability of all actors in a submission workflow to properly identify and relate the correct persons to the correct awards, the correct publications, and the correct organizations. The metadata concern refers to the question of how a spec could capture core metadata fields completely while remaining extensible in anticipation of future changes. And lastly, the trust concern refers to the question of how the specification could guarantee the fidelity and integrity of the data within the package, so that the consumer of the package can be sure that what they receive is what was indeed sent, and that it came from the correct person. Next slide, please.
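To make the metadata and correlation concerns a little more concrete, here is a minimal Python sketch, not drawn from the draft specification, of a record that keeps a small set of core fields separate from an open-ended extensions map, so that a new repository requirement lands in the extensions rather than reshaping the core. The names `CORE_FIELDS`, `SubmissionMetadata`, and `problems()` are hypothetical. The one piece grounded in an external standard is the ORCID iD check character, which ORCID computes with ISO 7064 MOD 11-2.

```python
import re
from dataclasses import dataclass, field
from typing import Any

# Hypothetical "core" fields for illustration; the actual core list is what the
# draft specification is still working out.
CORE_FIELDS = ("title", "doi", "authors", "award_ids")

# ORCID iDs are four hyphenated groups of four; the final character may be "X".
ORCID_PATTERN = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]")


def orcid_is_valid(orcid: str) -> bool:
    """Check ORCID iD shape and its ISO 7064 MOD 11-2 check character."""
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    total = 0
    for ch in digits[:-1]:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return digits[-1] == expected


@dataclass
class SubmissionMetadata:
    title: str = ""
    doi: str = ""
    authors: list[dict[str, str]] = field(default_factory=list)  # each {"name": ..., "orcid": ...}
    award_ids: list[str] = field(default_factory=list)
    # Repository-specific or future fields go here, so a new requirement
    # does not force a change to the core structure.
    extensions: dict[str, Any] = field(default_factory=dict)

    def problems(self) -> list[str]:
        """List problems found; an empty list means the core checks passed."""
        issues = [f"missing core field: {name}"
                  for name in CORE_FIELDS if not getattr(self, name)]
        for author in self.authors:
            orcid = author.get("orcid")
            if orcid and not orcid_is_valid(orcid):
                issues.append(f"invalid ORCID iD for {author.get('name', '?')}: {orcid}")
        return issues


if __name__ == "__main__":
    record = SubmissionMetadata(
        title="Example article",
        authors=[{"name": "Josiah Carberry", "orcid": "0000-0002-1825-0097"}],
        extensions={"nsf_par": {"hypothetical_field": "value"}},
    )
    print(record.problems())  # ['missing core field: doi', 'missing core field: award_ids']
```

A passing check character only shows that an ORCID iD is well formed; confirming that it belongs to the right person still requires a lookup against ORCID itself.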
With those major points in mind, sometime late last year or early this year we put forward to our workshop attendees a packet containing our overall recommendations for the direction in which we should take the packaging spec, and for the kind of buy-in and support we would like to have in order for this specification to achieve its vision. The overarching recommendations are as follows. Number one, we recommend that BagIt be used as the base mechanism for transmitting data. We chose this approach because BagIt is a lightweight specification that provides structure and ensures the integrity of the content and metadata being transferred. Additionally, the BagIt specification provides enough flexibility for us to build out additional features as required to package and exchange material with multiple repositories. Number two, we recommend the use of unambiguous identifiers, such as ORCID iDs, for PIs, co-PIs, and authors in various reporting systems and data feeds, so we can connect our records with those other external systems. Number three, we recommend the use of unambiguous identifiers for awards, so that we can refer to the right award. Number four, we recommend the use of correlation identifiers (think tracking numbers) for deposited digital content, to enable a system to unambiguously follow the status of its submission through to the final stage. These last three recommendations address the overall correlation challenge we identified in the workshop. And lastly, number five, we recommend that funders and other interested institutions establish a mechanism for conveying the agreement between receivers and submitters of the digital content in a way that can be expressed in submission metadata, so that the receiving repository can perform the necessary verification to ensure that the data within the package is trustworthy, which is something some of our funders stressed at the workshop.
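BagIt (RFC 8493) is simple enough to sketch with the Python standard library. The following assumes a BagIt 1.0 bag with a `data/` payload directory, a `bagit.txt` declaration, a SHA-256 payload manifest, and a `bag-info.txt` of label/value pairs. `make_bag` and `verify_bag` are hypothetical helpers, not the project's tooling; `Source-Organization` and `Internal-Sender-Identifier` are reserved bag-info labels, while `Award-Identifier` is an invented, non-reserved label. For real use, the Library of Congress's `bagit` Python library is a fuller implementation.

```python
import hashlib
import tempfile
from datetime import date
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_bag(bag_dir: Path, payload: dict[str, bytes], info: dict[str, str]) -> None:
    """Write a minimal BagIt 1.0 bag: bagit.txt, data/, manifest-sha256.txt, bag-info.txt."""
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    manifest_lines = []
    for rel_name, content in sorted(payload.items()):
        target = data_dir / rel_name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)
        manifest_lines.append(f"{sha256_of(target)}  data/{rel_name}")
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )
    (bag_dir / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )
    # Bagging-Date is a reserved bag-info label; caller-supplied fields are added after it.
    fields = {"Bagging-Date": date.today().isoformat(), **info}
    (bag_dir / "bag-info.txt").write_text(
        "".join(f"{label}: {value}\n" for label, value in fields.items()), encoding="utf-8"
    )


def verify_bag(bag_dir: Path) -> list[str]:
    """Return payload paths that are missing, altered, or not listed in the manifest."""
    problems = []
    listed = set()
    manifest = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        if not line.strip():
            continue
        checksum, rel_path = line.split(maxsplit=1)
        listed.add(rel_path)
        file_path = bag_dir / rel_path
        if not file_path.is_file() or sha256_of(file_path) != checksum:
            problems.append(rel_path)
    for path in sorted((bag_dir / "data").rglob("*")):
        rel_path = path.relative_to(bag_dir).as_posix()
        if path.is_file() and rel_path not in listed:
            problems.append(rel_path)
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        bag = Path(tmp) / "example-bag"
        make_bag(
            bag,
            {"manuscript.pdf": b"%PDF-1.7 placeholder", "metadata.json": b'{"title": "Example"}'},
            {
                "Source-Organization": "Example University",
                "Internal-Sender-Identifier": "submission-0001",  # hypothetical correlation ID
                "Award-Identifier": "hypothetical-award-123",  # invented, non-reserved label
            },
        )
        print(verify_bag(bag))  # []
        (bag / "data" / "manuscript.pdf").write_bytes(b"tampered")
        print(verify_bag(bag))  # ['data/manuscript.pdf']
```

The manifest is what gives a receiver the integrity guarantee: any altered, missing, or unlisted payload file shows up in `verify_bag`. It does not by itself prove who sent the bag, which is the separate trust question raised in recommendation five.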
These recommendations, along with a proposed packaging spec and a conceptual model for the digital content, were sent out to our workshop attendees for comments and feedback. Next slide, please. Some notable topics that arose from the feedback are on this slide; I'll give a bit more detail here. One was interest in and support for using unique identifiers to enable linking copies of the same digital content residing in multiple repositories, submission status tracking, and content usage tracking. One comment raised the point that the NSF Public Access Repository, a major repository we considered in developing the spec, accepts packages with no actual content, or with limited data. That reminded us that this specification needs to cover packages containing only metadata about manuscripts that are hosted elsewhere on the web. There were also comments about what adoption of this specification would look like in terms of human resources and repository integration. More specifically, one commenter pointed out that for this specification to be adopted, tooling and integrations need to be made available, and developer resources would need to be invested in the effort. We received a few comments suggesting that we need to clarify the relationship between the SWORD API and this specification. The clarification is that this specification describes the packaging format for a certain type of digital content, whereas the SWORD API is a deposit mechanism, one of a few we examined in developing the spec. And lastly, we also received quite a lot of feedback on the details of our conceptual model, the specifics of which are better viewed in full text within the proposal document that I will point you to in a little bit. All in all, I'm happy to note that no one commented that what we're doing is not worthwhile. Next, please.
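Recommendation four and the feedback about status tracking both hinge on one correlation identifier following a deposit into every repository it reaches. A minimal sketch, assuming a UUID as the correlation ID and an invented set of status names (`TRANSITIONS`, `DepositRecord`, and `new_deposit` are hypothetical, not taken from the specification), might look like this:

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical lifecycle for one copy of a deposit; real repositories report
# status in their own vocabularies, which would need mapping onto states like these.
TRANSITIONS = {
    "submitted": {"received", "failed"},
    "received": {"accepted", "rejected"},
    "accepted": {"published"},
    "published": set(),
    "rejected": set(),
    "failed": set(),
}


@dataclass
class DepositRecord:
    """Tracks every repository copy of one deposit under a single correlation ID."""
    correlation_id: str
    # repository name -> (current status, repository-assigned ID if known)
    copies: dict[str, tuple[str, Optional[str]]] = field(default_factory=dict)

    def update(self, repository: str, status: str, repo_id: Optional[str] = None) -> None:
        """Record a status change, rejecting transitions the lifecycle does not allow."""
        current, known_id = self.copies.get(repository, (None, None))
        allowed = {"submitted"} if current is None else TRANSITIONS[current]
        if status not in allowed:
            raise ValueError(f"{repository}: cannot move from {current!r} to {status!r}")
        self.copies[repository] = (status, repo_id or known_id)

    def is_final(self) -> bool:
        """True once every copy has reached a terminal state."""
        return bool(self.copies) and all(
            not TRANSITIONS[status] for status, _ in self.copies.values()
        )


def new_deposit(repositories: list[str]) -> DepositRecord:
    """Mint a correlation ID and mark the deposit as submitted to each repository."""
    record = DepositRecord(correlation_id=str(uuid.uuid4()))
    for repository in repositories:
        record.update(repository, "submitted")
    return record


if __name__ == "__main__":
    deposit = new_deposit(["pmc", "institutional-ir"])
    deposit.update("pmc", "received", repo_id="hypothetical-pmc-id")
    deposit.update("pmc", "accepted")
    print(deposit.copies["pmc"])  # ('accepted', 'hypothetical-pmc-id')
    print(deposit.is_final())     # False
```

Keeping the repository-assigned ID alongside the shared correlation ID is what lets the copies of the same content in different repositories be linked back to one another later.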
While we have not yet had a chance to follow up with the commenters on the feedback received so far, we have reviewed and started to process some of it to inform the next version of our work. Our next step will be to continue processing additional feedback from the community. We plan to incorporate suitable feedback into the draft specification. We plan on updating the conceptual model, creating a sample physical model, and producing an overview summary of our goals, the concerns we've encountered so far, and how we address them. These artifacts will ultimately be part of the deliverables of this work. Next, please. We'd like to thank our funder, NSF, and our program officer, Beth Plale, for the opportunity to do this really promising and interesting work. We would like to thank Kathryn Funk at NIH and Mary Beth West at OSTI for offering additional input from the funders' perspective at our workshop. We'd like to thank the workshop attendees from Arizona State University, the California Digital Library, Duke, the University of Michigan, MIT, and Notre Dame for offering their feedback on our work so far. Next, please. In conclusion, on this slide are the artifacts of our work so far. We would like to hear feedback and input from the community about our approach, so we can arrive at the best deliverables. Thanks to those who've already spent time reviewing and commenting on our work; we plan on responding to the comments soon. For folks who would like to have a look, the links to the artifacts are here; please feel free to look and comment directly on the documents, or email either one of us at the email addresses listed here. And that's the end of my content. I'll transfer this back to Diane. Thank you. Diane, just one comment before you take over, and we'll correct this in the updated slides that I'll send to you.
Scott Lapinski from Harvard was also one of the participants in the workshop. Okay, all right, thanks. Thank you for that clarification, and thanks, Sayeed and Han, for that really interesting report on your fascinating work. We will be posting a corrected version of those slides shortly. I've shared through the chat the URL where you can find the link to the slides as well as additional links relating to the PASS project, and very shortly we will be embedding the video of this presentation on that page as well; I'll share that again before I close down the webinar. At this point, I just want to remind everyone that there is a Q&A button at the bottom of your screen. If you click on it, a window will pop open and you can type your questions there. We'll read them aloud, and Sayeed and Han will be happy to field them live. While we're waiting for folks to jot down their questions, I just want to remind everyone that this is part of CNI's Spring 2020 virtual membership meeting, which will continue through the end of May, so we have a few more weeks to go. I'm sharing in the chat the direct link to the schedule for that meeting, so you can check out the webinars coming up soon; we've got several more this week. This morning we also announced a closing plenary by Cliff Lynch, which will take place on Friday, May 29th, to reflect back on the meeting itself and the events of the day, and to look forward at the situation we're facing and what it might mean for our community. We hope you'll sign up for that and join us then. Diane, if I can add one more thing: one contribution from the workshop participants was coming up with use cases for Fedora, DSpace, and Hyrax, and I believe those are in the document "Approach and Proposal for Packaging Spec."
So we would actually welcome feedback from institutions using those three platforms, or other IR platforms we don't know about, looking at our recommendations and our approach and proposal for the spec and seeing how it fits, or doesn't fit, with how you've deployed your Fedora, DSpace, or Hyrax IR. Great, thanks, Sayeed. If you can manage to copy and paste that link into the chat now, feel free to do that; otherwise, folks can look for it in the slides. It's a pretty long link. Yeah, I think you might need to stop sharing in order to see your chat box. I'm not sure, Sayeed, but rest assured it will be in those slides. So I was wondering, Han and Sayeed, what is the timeline for the deliverables on this? We are hoping to finish the conceptual model in the next couple of weeks, perhaps by the end of May. We obviously continue to seek feedback from the workshop participants and the agencies we've been working with, but we are hoping to get broader feedback from people in the community as well. I will mention that a couple of people in the publishing community who know about the work have reached out and basically said they might be interested in this specification as well. I think that's because, although it seems like so long ago, it wasn't that long ago that we had a lot of activity in workshops around the open access tipping point and what institutions were asking of publishers. I believe MIT was one of the lead institutions in saying publishers need to start depositing directly into IRs. So at least a couple of publishers, in response, said: well, in that case, how would we go about doing something like this? And they were directed to this work. So there's a possibility we may reach out to publishers directly as well, but we definitely want to go back to the federal interagency working group. My hope is that sometime in the summer, we can formally go back.
Han and I have given an overview of PASS and a brief demo to that group previously, and they know about this work. Jerry Sheehan at NLM, whom some of you may know, has been bringing this work to that group. Katie Funk, whom Han mentioned, is a member of that group as well, as is Beth Plale from NSF. So we'd like to go back to that group and, in essence, say: this is the approach; what is your response? Are you going to embrace this, or endorse it, whatever the right word is? I'm not sure a group like that can endorse something, but can they recommend it back to their agencies and the folks running their federal agency repositories? Got it, okay, thanks. And I see that Han did share direct links to those documents in the chat box for everybody, if you want to have a look and share your comments. I should also say, if you've got thoughts, comments, or questions on this project and you'd like to share them live, please raise your hand. I can unmute you, and we'd love to hear your thoughts if you've had a look at these materials and had a chance to reflect on them, or if you have any questions and would like to engage directly with our speakers. And with that, I just wanna thank our presenters one more time for coming and presenting your work here at CNI. We really appreciate your being a part of the program. And to our attendees, thanks so much for making time out of your day to join us, and be well, everybody.