I'm Sayeed Choudhury and I'm joined by my colleague Aaron Birkland. We both work at the Sheridan Libraries at Johns Hopkins University, and we're here to talk to you about the Public Access Submission System. I don't think I have to convince this crowd that quoting or channeling Cliff Lynch is a good idea. If you read his roadmap to this CNI meeting, he actually mentioned PASS, and he described it in this way. You can read that quote as easily as I can, but I really wanted to focus on that word rebalance. I think it's a very wise choice on his part, because it implies that maybe things are somewhat out of balance right now when you think about the interactions between the different players that are affected by this system or interact with it, and that there is a need to address that and rebalance it in some way. I think his description is quite insightful, and I appreciate him describing it this way. So the Public Access Submission System, or PASS, is a software platform that we have built. It is live at Hopkins. We are being recorded for this session, so I'm going to go through a series of screenshots, but it is a working system. We are actually talking to other institutions about possibly adopting it. It is open source software, so you can take it, install it, and run it yourself as well. It supports the simultaneous submission of articles to PubMed Central and to an institutional repository. Fundamentally, we are looking to align funders' public access compliance requirements with the open access policies of individual institutions. We are looking broadly at multiple funders. We started with PubMed Central and NIH for a couple of reasons. One is that Hopkins receives more funding from NIH than from any other funder, so that is important for us. But second, NIH actually has a lot of programmatic hooks and interfaces that we could work with in order to build a system and show you some of the functionality. 
Some of the other funders aren't quite as far along, so we don't really have the programmatic mechanisms to do that. We also think, from the researcher perspective, there isn't really a lot of objection to participating in open access efforts, but there isn't a lot of desire to spend more time and effort on doing so. By having the work aligned with something they have to do anyway, in terms of their compliance with PubMed Central, it's a very modest extra level of effort. It's literally clicking on one or two more things, and we found that with our researchers, that's totally fine. They're okay with it as long as it's embedded in what they have to do anyway. A somewhat unanticipated benefit of PASS is that we think it actually provides a platform for institutional data analytics, particularly around grants data. Even the big publishers, like Elsevier, don't necessarily have easy access to grants data, so we may be able to start doing some of our own analytics in very useful ways. Very recently, we added a feature for a proxy or delegate submitter, which I'll show you. So first I'm showing you a slightly older version of PASS, which is direct submission by a researcher. This always remains an option, but there's another way you can do this as well. The direct submission would begin by logging into the system, where you use your institutional credentials. We are supporting InCommon, so we know that many institutions can use that federation. When you do so, you're presented with this main dashboard, which offers access to two different kinds of views. Basically, one is submissions that are associated with your grants, and the other is the grants themselves. If you click on the right side of that screen, you're taken to a screen that actually shows you your grants. We are working with the grants office at Hopkins to get access to our university grants system, called Coeus. 
We're getting a feed out of that system, and that's being displayed. So these are actually my grants. When I log into the Coeus system, that's what I see. We added a few grants for demonstration purposes, and if you look at the bottom there, you see that little red oval surrounding a row. That's an NIH grant that's not actually mine; I've never had an NIH grant. But in order to show the system, we're emulating what you would do. If you click on that start submission, or new submission, associated with that grant, you're put into the submission workflow, and the very first thing you can do is enter a DOI. If you don't have a DOI, that's fine. You can enter the information manually. But if you do, we do a real-time lookup against Crossref and start to auto-populate the metadata, such as the title, the authors, and a few other fields you'll see later on. Continuing through the submission workflow, you're then taken to a grants screen. This basically verifies that you started with that particular grant. Is that correct? Do you need to remove it or update it? And do you need to add other grants? So you could add other NIH grants; if you think about it, you could cover multiple NIH grants with the same submission. And in the future, you can start to add things like NSF and USAID and other agencies. When we demonstrated the system to faculty, the first thing they said was, I don't want to have to do this multiple times for multiple agencies. And when you tell them we're looking at integrating all of that into one place, they say, that sounds great. I'd rather not have to use multiple systems; I'd rather do it in one. So the next screen basically shows you the policies. I'll just tell you, the screen is very small and my eyes are getting old, so I think I'm on the right screen. This one shows you the relevant policies. 
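To make that real-time DOI lookup concrete, here is a minimal sketch of the kind of Crossref query described above. The `https://api.crossref.org/works/{doi}` endpoint and the response's `message` envelope are Crossref's public REST API; exactly which fields PASS pre-fills is an assumption, so the mapping below (title, journal, authors) is illustrative only.

```python
import json
import urllib.parse
import urllib.request

CROSSREF_API = "https://api.crossref.org/works/"

def fetch_work(doi: str) -> dict:
    """Fetch the Crossref 'message' record for a DOI (live network call)."""
    with urllib.request.urlopen(CROSSREF_API + urllib.parse.quote(doi)) as resp:
        return json.load(resp)["message"]

def prefill_fields(work: dict) -> dict:
    """Map a Crossref work record onto the form fields a submission UI
    might pre-fill. Field choices here are illustrative."""
    return {
        # Crossref returns title and container-title as lists
        "title": (work.get("title") or [""])[0],
        "journal": (work.get("container-title") or [""])[0],
        # Join given/family names, skipping whichever part is missing
        "authors": [" ".join(filter(None, (a.get("given"), a.get("family"))))
                    for a in work.get("author", [])],
    }
```

Separating the network fetch from the field mapping keeps the mapping testable without hitting the API.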
Since you picked an NIH grant, and this is a Hopkins researcher, you would see information related to those two policies, and if you want further information, you can just click on them. The DOI that I chose for this particular demo corresponds to something NIH calls a Method B submission. They have these different pathways for how they deal with the publishers, and so on. The interesting thing about these Method B submissions is that you can pay the publisher to submit on your behalf to PubMed Central, but you actually don't have to. Now, I'm told that there's a difference in versions: the publisher will probably submit the final version, whereas you can submit the author accepted version on your own, but you could potentially save thousands of dollars in these fees. When we showed this at Hopkins, we had several faculty members and a couple of deans say, why are people paying these fees? We could use that money for students, for travel, for labs, for all sorts of other things. What's interesting, and we had a conversation about this with some colleagues at the NCARB library last week, is that it really highlights the difference between the author accepted version and the final version. Try having that conversation in the abstract and ask your researchers whether they know the differences, the policies, the copyright, and so on. When there's $3,000 on the line, you have their full attention. So it's an opportunity where the system is actually helping to start conversations around policies, around copyright, around fees, around licensing, and so on. If you chose to use this system and forgo that fee, you would continue into the workflow. This screen basically shows you the relevant repositories. Since it's NIH, it's PubMed Central, but it could be NASA, it could be CDC, any of the other agencies that are using the PubMed Central platform. And the university platform, JScholarship, as we call it, the institutional repository. 
This is a system that we think is configurable and extensible, so you could add other repositories. If you have a disciplinary repository, a scholarly society repository, a domain repository, something like that, you could add it to the system. And as I'll describe in the roadmap, you could add data archives here. We are planning to include an institutional data archive as an option, but you could also add data archives like ICPSR and N-Armor, for example. If you hit Next, you're taken to a screen that shows you the metadata. In this case, that was looked up through the DOI. If you didn't have the DOI, then you can enter that metadata here. The screen continues in this way, but then you're shown the license agreement for, in this case, our institutional repository, JScholarship. We have spoken with the National Library of Medicine about including the license agreement for PubMed Central in this screen. Right now, you still have to go into their native system, called NIHMS, to do the license check-off, but we're talking to them about having that embedded in here as well. It would save you one more step, one more login. Once you do that, you're taken to this screen where you are asked to upload the files. Right now we are talking about uploading the files, but we are also talking with Unpaywall about using their API so that maybe we can automatically look up the author accepted version and you wouldn't have to upload it. You can see the various choices for the types of files. Here's where you might add data, for example, as another type of file. One important thing is that the text box next to the upload button is a free-form text box. You can add any kind of description you wish. We've heard from many of our researchers that they would like to do that; they would like to describe these articles in whatever way they think is appropriate, rather than using a set of metadata fields and so on. 
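The Unpaywall integration mentioned above could look roughly like this. The `/v2/{doi}` endpoint with a required `email` parameter, and the `oa_locations` entries with `version` and `url_for_pdf` fields, are Unpaywall's documented API; how PASS would actually wire this into the upload step is an assumption.

```python
import json
import urllib.parse
import urllib.request

UNPAYWALL_API = "https://api.unpaywall.org/v2/"

def fetch_oa_record(doi: str, email: str) -> dict:
    """Fetch the Unpaywall record for a DOI (the API requires a contact email)."""
    url = (UNPAYWALL_API + urllib.parse.quote(doi)
           + "?email=" + urllib.parse.quote(email))
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

def find_accepted_manuscript(record: dict):
    """Return a PDF URL for the author accepted manuscript, if Unpaywall
    knows of one; otherwise None."""
    for loc in record.get("oa_locations", []):
        if loc.get("version") == "acceptedVersion" and loc.get("url_for_pdf"):
            return loc["url_for_pdf"]
    return None
```

If a URL comes back, the system could offer it as a pre-located file instead of asking the researcher to upload a local copy.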
That kind of information is something we can start to mine, and perhaps in processing, interpret and map to metadata, or map to particular kinds of authors or articles, and so on. So with all of that chosen, you go to this final review screen, as we call it. You'll see at the bottom right is this submit button. When you hit that button, what happens is the article and the metadata go to PubMed Central. They go to our institutional repository as well. But we also create a package of the article, the metadata, the grants information, the identifiers, and so on, and that becomes a Fedora object for us. There are certain things we can do with that; Aaron will talk a little bit more about it when he discusses the technologies. After you've submitted the article, you can go to the submissions screen, or the dashboard. Basically this shows you the status of all of your submissions, both into PubMed Central and into the institutional repository. And perhaps more importantly, it shows you the identifiers associated with them. Another thing we heard from researchers at Hopkins is that when they're submitting their NIH proposals, they're required to produce those identifiers to verify their compliance. In many cases, they're sifting through email, they're asking their grants officer, they're asking their department admin. Having that information available on this dashboard, I think, will be very helpful for them in terms of being prepared for those kinds of submissions. So as of the week after Thanksgiving, we have a new capability, which is allowing a proxy or delegate to submit on your behalf. When we showed what I just showed you, one of the consistent pieces of feedback was, well, what if I wanted somebody else to do a lot of that work for me? So we identified that as a top priority, not only for ourselves; we've been working with Harvard and MIT, for example, and they said the same thing. 
So what I'm going to show you is the new version of PASS that incorporates this. Think about two roles: there is a preparer, someone who's actually preparing the submission on behalf of the submitter, who is the one who will ultimately submit it. One thing I will mention is that we've talked to NIH and NSF, USAID, DOE, and so on. All of them have said, yes, we understand people prepare submissions, but we want the PI to hit the button. We want that step to be done by the PI. There are legal reasons for that, there are compliance issues, and so on. So we set it up so that there are preparers, and then there is ultimately the submitter. These screens should look familiar to you now, but if you look near the top, the big difference is that it's asking you, do you want to submit this on behalf of yourself or on behalf of somebody else? So this is now emulating the idea that I can choose to submit on behalf of someone else. When you do that, you're asked to identify who you'll be submitting for, and you do that by searching for individuals within our directory. In this case, I was going through the system as the preparer, preparing it for one of my colleagues, our project manager, Tom Vu. So I looked up her name in our system, found her, and then what you are able to do is prepare the submission on that person's behalf. You can see this looks familiar: this is where you would enter the DOI. The difference is you now choose the grant because you're preparing this submission on behalf of this PI. It's the same grant view that you saw when I was doing a direct submission, but what you do now is choose the particular grant or grants that are relevant, and they get attached to the submission. So you're saving the researcher that step, in some sense, of having to do that. 
This is what the preparer is now doing on behalf of the submitter. The rest of the workflow looks very familiar. Here's the screen about policies. Here's the Method B notice. Here's the screen about repositories. Here's the metadata, and so on. The big difference, of course, is when you get to this final screen, it doesn't say submit. It says submit for approval, because this is the preparer. They are not authorized to submit it directly into PubMed Central or the institutional repository. When you hit the button, what happens is that the submission has been prepared and is now sent to the PI, the submitter. You can see that information in this screen, which confirms you have prepared it properly but it is still waiting for approval. What it also does is email the PI, the researcher, basically saying someone has prepared a submission for you, with a direct link. When you click on that link, you're taken directly to the place where you would do the submission, and then the status gets changed. Now, if there's been a mistake, you can say this is not correct, and so on. But in essence, you now have the ability for someone other than the PI to prepare the submission and then send it to them; they get an email notification and they hit the button. We heard from our own researchers, and from Harvard and from MIT, that there isn't a clean, constant, steady relationship between proxies and submitters. By that I mean, grad students come and go. Sometimes they ask the department admin, sometimes they ask a postdoc, sometimes they ask other people. We initially thought what we would do is have a very clean mapping between a core set of submitters and a core set of preparers. That doesn't seem to be the way things are done. Having said that, this is our first pass, if you will, at how this would work within PASS. 
So we absolutely want feedback from other institutions about whether this seems a little too liberal, or whether it seems appropriate, or what the sweet spot is, if you will. Just to show you what it would look like if you were actually the one approving the submission and putting it forward: think of it in this case as a real submission to our institutional repository. One of my colleagues prepared the submission. We were both co-authors on the article, so she prepared it and it showed up in my queue. In essence, I get an email that says, someone has prepared a submission on your behalf; would you like to go take a look at it? When I click on that, I'm taken to this screen, which asks me about the submission. Take a look at the bottom. You can see the various choices that a submitter will have. If everything is fine, you hit the button and it simply goes through, and life is good. If it's a modest or minor set of changes, the submitter can do that on their own. They can say, let me just go in and fix that, because I know it's a minor thing; I can take care of it. Whereas if something is really wrong, for example, it's the wrong article or the descriptions are completely inaccurate or something like that, you can send it back to the preparer. Rather than having it continue through the system, you send it back to the preparer. You can put notes in there indicating what needs to be addressed or changed, and it goes back to them. When they address those changes, they can push it back to the submitter. But assuming everything was fine, you would continue through. You would get the license agreement in this way. You would get the confirmation about making the submission in this manner. And then you can see that at the end of it, the submission status is complete. And this is the bottom of that screen. 
Then, when you move forward with the submission itself, you can start to see where it shows up. This is my dashboard, my submissions dashboard. So far, I've submitted two things to our institutional repository through PASS. One of them I did directly; one of them we did through this proxy or delegate process. So I'm going to turn it over to Aaron to talk about the key technologies and some of the principles behind them, and then I'll close at the end with some final remarks. Hi, yes. So I'm Aaron Birkland. I was on the development team of PASS, and I want to talk about some of the key technologies, just to give a sense of what PASS is actually doing with a little bit of precision. This here is an overview of some of the technologies used in PASS. At the center of it is a repository. That repository contains our key business objects. We have objects to represent users, objects to represent submissions, policies, et cetera. Ultimately, every component in PASS ends up reading or writing to the repository at some point. It's our central source of truth. At the bottom are loaders that run periodically, and they populate the repository based on data gleaned from various different sources. Most notable, perhaps, is Coeus. Our institution uses Coeus as its grants database. So every night we have a process which looks for any new grants in Coeus and populates the repository. That data includes the numbers associated with the grant, award numbers, the funders, key dates when a grant begins and terminates. And most notably, it contains key personnel. So that loader will create users in the repository that correspond to PIs or co-PIs of grants in the Coeus system. That, in itself, is quite a nice source of information to have. Likewise, we also have a loader which takes a look at the contents of NIHMS from the perspective of submissions from Johns Hopkins PIs and co-PIs. 
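A minimal sketch of the nightly grant-loader logic just described: the `row` schema and the `repository` methods below are hypothetical stand-ins, not Coeus's or PASS's actual interfaces. The point is the shape of the operation, namely upserting grants and creating repository users for the PIs and co-PIs named as key personnel.

```python
from dataclasses import dataclass, field

@dataclass
class User:
    institutional_id: str
    name: str

@dataclass
class Grant:
    award_number: str
    funder: str
    start_date: str
    end_date: str
    pi: User
    copis: list = field(default_factory=list)

def load_grants(rows, repository):
    """Upsert grants and their personnel from a grants-system feed.

    `rows` stands in for records pulled from Coeus; `repository` is any
    object with find_or_create_user / upsert_grant methods (a hypothetical
    API for this sketch).
    """
    for row in rows:
        pi = repository.find_or_create_user(User(row["pi_id"], row["pi_name"]))
        copis = [repository.find_or_create_user(User(i, n))
                 for i, n in row.get("copis", [])]
        repository.upsert_grant(Grant(row["award_number"], row["funder"],
                                      row["start"], row["end"], pi, copis))
```

Running this on a schedule keeps the repository's grant and user objects in step with the institutional grants database.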
One of the primary use cases of this is that we can discover submissions that come to PubMed Central from other sources, some journals, for example. Sayeed mentioned the Method B journals where you can pay a fee; some journals do it for free. So this allows us to discover submissions to the NIH that arrived by some means that was not PASS. And this is useful because when somebody logs in, they can see a more comprehensive list of their publications, regardless of how they came into the NIH. Above that is an example of an asynchronous service, our deposit services. Instead of working on a fixed schedule, these asynchronous services listen to messages from the repository. So deposit services, in particular, looks for submission resources that have been updated and represent a submission that is ready to submit. In other words, it has all the required data and somebody has clicked submit. Once it receives a submission in that state, it does the process of negotiating with the various repositories the submission is going to. For the case of PubMed Central, for example, it assembles a tar file or zip file containing the required metadata in XML along with the manuscript, et cetera, and sends it off to an FTP server. Now, at the end of this process, the only acknowledgement that we get is that, yes, our submission is successfully on the FTP server. Finding out that the NIH has actually accepted that manuscript might occur through the NIH loader, for example. Another example is depositing into our institutional repository, which currently is based on DSpace. That is a negotiation via SWORD, which is a completely different protocol than the NIH uses. And in that case, we know the final status of the submission instantly, so we know its accession number and can link to the submission in our institutional repository. So that's deposit services. Now, on the left is Ember. 
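Backing up to deposit services for a moment, the packaging step could be sketched like this. The tar-plus-FTP mechanics match what is described above, but the member names in the package are illustrative only, not the actual NIHMS bulk-submission specification.

```python
import io
import tarfile

def build_nih_package(metadata_xml: bytes, manuscript: bytes,
                      manuscript_name: str) -> bytes:
    """Assemble a gzipped tar package of XML metadata plus manuscript,
    of the general kind deposit services pushes to an FTP server.
    The member name 'bulk_meta.xml' is a placeholder, not the real spec."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in (("bulk_meta.xml", metadata_xml),
                           (manuscript_name, manuscript)):
            info = tarfile.TarInfo(name)
            info.size = len(data)          # tar members need explicit sizes
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

# Shipping the package would then be a plain FTP upload, e.g. with ftplib:
#   ftplib.FTP(host).storbinary("STOR package.tar.gz", io.BytesIO(pkg))
```

Note that, as the talk says, a successful `STOR` only confirms arrival on the FTP server; acceptance by NIH has to be discovered later, for instance via the NIHMS loader.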
That is our primary user interface, and it is actually a single-page app. What that basically means is that the entire user interface is executing client-side in the browser, in JavaScript. It's more akin to an app that runs on your cell phone than to navigating web pages. The HTML and JavaScript are downloaded once, and then, when you're actually going through the user interface that Sayeed showed, the client is making requests directly to our repository via its API. We chose Fedora for our repository, and it provides a suitable REST API based on the Linked Data Platform. While Ember does not support the Linked Data Platform out of the box, it is very similar to Ember's default REST adapter with some minor modifications. So we wrote a little adapter for Ember to allow it to read and write to Fedora using the Linked Data Platform API. Another key factor is that Ember's mental model of the world is in terms of JSON; it's not in any way an RDF-based technology, whereas the Linked Data Platform is. So we leverage JSON-LD so that Ember can communicate with the repository just in terms of simple JSON, whereas things on the back end might actually care about the graph shape of the data in the repository and can do their thing. So Fedora provides an API. It provides a message queue for asynchronous services like deposit. We also have other services that, for example, send out emails when necessary, say, if a submission needs attention from a preparer. And Fedora also provides authorization via ACLs, which I'll talk about shortly. All this communication between Ember and Fedora is protected by Shibboleth. PASS has a service provider that is part of the InCommon Federation, and it is set up as a proxy on top of Fedora. 
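To illustrate the JSON-LD point before moving on: a repository resource can carry an `@context` that gives its keys RDF meaning for back-end services, while the Ember client simply reads them as plain JSON. The context URIs, resource URIs, and field values below are all invented for illustration, not PASS's real vocabulary.

```python
import json

# A PASS-style submission resource expressed as JSON-LD. Everything here
# (URIs, title, ids) is made up for demonstration purposes.
doc = json.loads("""
{
  "@context": {
    "title": "http://example.org/pass/terms#title",
    "submitter": {"@id": "http://example.org/pass/terms#submitter",
                  "@type": "@id"}
  },
  "@id": "http://localhost/fcrepo/rest/submissions/1",
  "title": "Example article title",
  "submitter": "http://localhost/fcrepo/rest/users/42"
}
""")

# The client never sees triples -- it just reads ordinary keys:
print(doc["title"])
# A back-end service, meanwhile, can expand the @context to interpret
# "submitter" as a link (an RDF object property) rather than a string.
```

The same bytes thus serve both consumers: plain JSON for the single-page app, a graph for anything that expands the context.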
Any request that comes from the outside internet, from somebody's instance of PASS running in their browser, will pass through the service provider before being passed along to Fedora. The integration here is notable; it worked out quite nicely. The role of Shibboleth, then, is to enforce login. You can't make requests to our search index, to Fedora, to basically anything without being logged in by your institution's identity provider. Furthermore, Shibboleth releases the key attributes that PASS needs, like your domain, so it knows that somebody's from Hopkins, for example, plus name, email address, et cetera. We wrote a little shim which takes these attributes released by Shibboleth and distills them into roles designated by URIs. The example here in the slide is a Johns Hopkins submitter. Given these roles, we can write ACLs, access control lists, which grant read, write, and append access to individual resources in the repository. For example, only submitters from Johns Hopkins can create submissions here, or only the grant loader application is allowed to write to grants. Or even, when you create a submission, only the submitter and a certain set of proxy submitters are allowed to further edit it, and nobody else can; they are listed explicitly. Once a submission has been accepted by PASS, once deposit services has started depositing it, then it needs to be immutable, and that is done through ACLs as well. Fedora does the heavy lifting of actually enforcing the ACL policies. So if anybody logs in, Shibboleth gives us their attributes, we transform those into a role, the policy is encoded in ACLs, and Fedora does the actual enforcement. The end result of all this is that we designed, implemented, and released PASS to production on a very tight schedule. Last year we wrote a demo, a mockup that we could use to show members of the administration, faculty, et cetera. 
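The attribute-to-role shim described above might look something like this sketch. The attribute names follow common eduPerson/Shibboleth usage (`eppn`, `affiliation`, `entitlement`), but the role-URI scheme and the admin entitlement value are invented for illustration; PASS's actual shim logic may differ.

```python
def roles_for(attributes: dict) -> list:
    """Distill Shibboleth-released attributes into role URIs.

    The example.org URI scheme and the 'pass-admin' entitlement are
    hypothetical; only the general attribute names mirror real usage."""
    roles = []
    # eppn looks like "jdoe1@johnshopkins.edu"; the domain identifies
    # the home institution.
    domain = attributes.get("eppn", "@").split("@")[1]
    for affiliation in attributes.get("affiliation", "").split(";"):
        if affiliation in ("faculty", "staff", "student", "member"):
            roles.append(f"http://example.org/pass/roles/{domain}#submitter")
            break
    if "pass-admin" in attributes.get("entitlement", ""):
        roles.append(f"http://example.org/pass/roles/{domain}#admin")
    return roles
```

ACLs can then be written purely in terms of these role URIs, so the repository never needs to understand raw Shibboleth attributes.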
Development started in earnest this year, and our release to production was in July. That is an extremely tight schedule, and the only way we got there was by leveraging key technologies and key people. I mentioned Ember, the user interface. We did not have the manpower and talent on our team to do a user interface of that scope in the time allotted, so we brought in outside consultants from 221B, LLC. Ember was the technology of choice on that team, and it ended up working well for our user interface. It allowed rapid development of the UI and iteration with respect to the different use cases and feedback. So that worked for us. Shibboleth is trusted by our institution and is rather ubiquitous among other institutions, and protecting the repository with Shibboleth made a whole class of problems go away that we didn't have to solve ourselves. Fedora brought a REST API, authorization, binary storage, and asynchronous messaging out of the box, and really, with minor tweaks, we were able to use all of that; it spared us a lot of development effort and worked out well. We also did our development using Docker containers. Fedora runs in Docker, et cetera; we deploy containers to AWS, and that actually runs our production environment. For the most part, it's stateless. We rely on AWS services like RDS for the Postgres database, Elasticsearch, et cetera, but our stuff runs in stateless containers and can be deployed relatively quickly, which was very helpful for getting it out in time and for rapidly integrating feedback from our faculty. With this, we have the basics for being able to look outward toward multi-institutional use cases, because we have Shibboleth and we have technologies that can be used in those scenarios; it's just a matter of doing it. So now I'll pass it back along to Sayeed. Maybe everyone knows what AWS, Amazon Web Services, is. So, a little bit about the roadmap. 
There's further information about PASS at these two websites you can visit, the second one being an Open Science Framework project we've set up where you can take a look at information, with links from there to the GitHub repository if you actually want to get to the code. As I've mentioned, it's open source. We've been working with Harvard and MIT throughout this process, and we are planning a pilot effort at Harvard in the spring and some form of user testing at MIT. As Aaron mentioned, we did think, from a design perspective, about how this could be used across institutions, but we haven't gone down that path yet. We were honestly thinking that people would take the source code and run it locally, and that's certainly an option. But Harvard is actually asking us to host it: for the pilot we're doing with them, we would actually be running it, and when I say at Hopkins, I really mean through AWS, but using their grants data and their information and so on. So we'll actually be testing what it means to run this for another institution. And as you might imagine, grants span institutions. Right now we are basically dealing with Hopkins grants and Hopkins researchers, but we know that Hopkins researchers collaborate with people at Harvard and MIT and at your institution. So there are a lot of interesting use cases and questions for us to think about, and we're very eager to explore those with you. We are looking at integrating with other funding agencies. As I've mentioned, there are, I believe, nine, maybe ten agencies that use PubMed Central, and you can deposit to any of them using this system. The submission status information is only available from NIH right now. We're going to talk to NASA and CDC and others about trying to get that submission status information, but you can deposit into any of them. 
Again, because it's important to Hopkins, and I suspect to many institutions, the next big cluster, if you will, that we were looking at was NSF and DOE, the Department of Energy. They use a different approach than NIH does. At this point, what they basically do is ask you to verify this kind of publication or article submission and compliance through the reporting function. So you would log into research.gov, get to your particular project, and be presented with a list of articles, asking, is this correct or not. So we've talked about whether there is some way of doing deep linking between PASS and research.gov, so that it would not deposit directly into PAR, as it's called, the public access repository for NSF. DOE uses PAGES; it's the same technology, I think. Rather, it would populate your project page in research.gov, and it would go through from there. We have also been talking to USAID. They are planning to build a deposit and submission status system, and they'd like to work with the use cases we've developed. For now, with USAID or the Department of Education, for example, you have to go to a web form and basically fill it in. So what we do is, if you pick a grant from one of those agencies, that final screen where you have all the metadata remains available to you, and there's a link at the bottom that takes you straight to the web form. It's not perfect, but in essence, you have the information on one tab and the web form on another. I mentioned the data archive. Hopkins has a lot of interest in building an institutional data archive, and we are looking at how that would integrate with PASS, so that the data cited in the articles being submitted would also be submitted through PASS into an archive, and we could get the metadata associated with DOIs and so on. 
I had also mentioned that we are going to use the Unpaywall API to help people identify the right version of the article through the systems already in use with publishers, rather than having to upload local copies. We have also started to have some conversations with Symplectic, for example, about Elements, with Elsevier about Pure, and with VIVO, about faculty profile or activity systems. There are some interesting ways these systems might talk to each other; PASS could talk to those systems, I should say. And I wanted to show you this diagram. I'm not going to go into details unless there's real interest in doing so. Aaron showed you the architecture diagram for PASS; if you think about all of that collapsed into that little cloud in the middle of this diagram, this is a PASS-centric view of Hopkins. I'm not implying this is comprehensive or accounts for everything, but what it shows is that we're really thinking of PASS as embedded in a series of broader research workflows within the institution, around data and publications. We've been mapping how articles and data flow between all these different systems. MARCC, the Maryland Advanced Research Computing Center, is the centrally managed high-performance computing facility at Hopkins. So we're starting to look at where researchers' data and articles sit, how they flow, and how they come through PASS and end up in our archives and repositories and then in external ones like PubMed Central. I'll end with a few acknowledgements, which you can see here. Aaron mentioned a very tight deadline; that is an understatement. While I appreciate the help from everyone on this list, I really want to thank him and the great folks at Hopkins and at 221B who did this work. 
And I think we have a little bit of time for questions or comments. All right then, I'll go back to this slide, which has more information about where you can find out more, and if you want to talk to someone or email someone, feel free to reach out to me. We don't have a PASS office right now or anything like that, so it's just me, but I'm happy to follow up with anyone. Thank you.