get started. Thank you for joining us today. I'm glad you could be with us. I'm Cliff Lynch, the director of CNI, and let me welcome you to week three of the CNI Fall 2020 Virtual Member Meeting. Week three is focused on technology, infrastructure, and standards, and today is the first day of week three. We've got a lot of really neat things in the queue for you today, so I'm really glad you could be with us.
A couple of quick things. As well as the synchronous presentations like the one you are attending, we've also made available quite a number of pre-recorded videos of project briefings and updates for week three, and I'd invite you to have a look at those and enjoy them. I also note that this is being recorded and will be available subsequently, so if you find it interesting, please share it with your colleagues. We have closed captioning available; please make use of that if you'd like. There is a chat box, and we'll push out a few URLs during the presentation. Feel free to use that. There is also a Q&A tool at the bottom of your screen. You can use the Q&A tool to ask questions at any point during the presentations, although we will come back and address all of the questions after the presentations, in a Q&A session moderated by Diane Goldenberg-Hart from CNI.
I think that's all the introductory things I want to say. We have two speakers with us today whom I'm very pleased to welcome back to CNI, Natalie Myers and Rick Johnson, both from the University of Notre Dame. Today they are going to be talking about a really interesting IMLS project, which I have been following not as closely as I would like, called PresQT. This is an IMLS-funded project that intends to make available tools and services that will help both with preparing data for preservation and sharing and with reusing data.
It is, of course, very much shaped by the emerging FAIR principles for the description of research data. The other thing I would say, and I'm really interested to hear how this plays out in the presentation, is that in order to do this they've had to take an architectural view of where these tools and services fit in. Their work is intended to be somewhat independent of, and agnostic about, which repository you choose, although obviously it's sensitive to the portfolio of interfaces that a repository can offer. That level of abstraction is something I think we've often missed in thinking about our strategies for interoperable tools and services to help with research data management. So I'm delighted that Natalie and Rick are here with us, and with that long-winded introduction, please let me turn it over to Natalie.
Thank you so much, Cliff. I draw your attention to the slide Rick is sharing on the screen. In the chat you can pick up a link to our slides to follow along with us if you'd like, or you can come back any time to visit our slides using the URL that I'm sharing in the chat, which you can also see at the bottom of the slides: osf.io 3g nqd. Please feel free to follow along and revisit the slides after the presentation. We're here to demonstrate the PresQT services. Rick, if you could move to the next slide: PresQT, data and software preservation. As Cliff mentioned, it's an effort funded by an IMLS implementation grant and a previous planning grant to address needs for preserving data and software, with the goal of collaboratively designing, developing, and connecting interoperable, repository-agnostic data and software preservation quality tools. Rick, next slide. We want to emphasize the collaborative nature of the project, including the planning phase, where we gathered resources during stakeholder engagement; those are all available openly on the OSF.
Everything produced by the PresQT project is shared either on OSF or on GitHub. We invite you to visit those open resources. Again, I'm sharing a link in the chat that will take you right to a page where you can access the code and our survey results, as well as information from our previous workshops. We encourage everyone to visit and use the open source software from the project, which is available on GitHub. I've shared that link in the chat. We're now in the implementation grant's QA phase. Rick, if you'd highlight the red arrow on the slide: that is the place in the grant where we are working with our collaborators (domain repositories, data curators, repository managers, librarians, software developers, publishers, and researchers) to do QA on the tools we've built. Next slide. Here you'll see the funded subawardees on our project. This project is definitely not an island. We worked with many stakeholders across the library and research ecosystem, and the software you'll see demonstrated here is informed by all the partners and test partners you see on this screen. Next slide.
When we launched the PresQT project, our planning phase task was to identify data and software preservation tool gaps. Through our surveys and workshops we established an understanding of the concerns of different repositories and of different cultures of data management, processes, technology, and tools. We heard that the switching cost between standalone systems was too high for researchers, and that data flow or transactions between systems could be very difficult for researchers and data curators to manage in a time-efficient manner. Our analogy for this situation is a research data management archipelago, where standalone systems and the data in them grow in isolation from one another: in other words, an ecosystem of data islands. Next slide. And so what PresQT embarked on was a project emphasizing inter-repository file transfer.
Picture PresQT as the dotted lines you see between the islands in the archipelago. We were charged, coming out of the planning phase, with developing tools for connecting those islands, or communities of practice, and bringing preservation features together without moving files up and down from researchers' desktops, or between sharing and preservation platforms, in an inefficient and lossy manner. This way our users and repository managers could more easily move projects between systems and enhance their assets with fixity information, FAIR testing, and better keywording along the way. In other words, those data islands could become connected by PresQT's services. Next slide.
So, with the aim of addressing the needs in the community, we developed a set of services, a tool suite if you will, that you can try on your own or in our QA phase, which will be described in today's talk. The PresQT services are a stateless utility service with token-based authentication, which we built to enable wide audiences to engage in testing them. We've also got a prototype web GUI that you can visit if you would prefer to engage with the services through a graphical user interface rather than directly at the level of the API. Users who are interested can access the PresQT services directly through the API; I've shared that link in the chat. Or you can stand up the PresQT service on your own using the open source distribution on GitHub. So in general there are three ways to engage with the PresQT services: as an end user on the GUI; as a developer, or a person comfortable at the command line, with the API; or by standing up the services yourself. We're quite happy for you to use them as they're installed at Notre Dame, and that's what you'll be hitting if you use the command line or the GUI. Next slide. I'll give you an overview of the PresQT service features and then turn it over to Rick to demonstrate the system.
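As an illustration of what stateless, token-based access to a service like this can look like from the command-line side, here is a minimal sketch. It is not taken from the talk: the base URL, the URL layout, and the header name are hypothetical placeholders, and the request is only built, never sent. The point is that each call carries the user's repository token, so the service does not need to keep a session.

```python
from urllib.request import Request

# Hypothetical values for illustration only; consult the project's API
# documentation for the real base URL, paths, and header names.
BASE_URL = "https://presqt.example.org/api_v1"
TOKEN_HEADER = "presqt-source-token"


def build_resource_request(target: str, token: str) -> Request:
    """Build (but do not send) a GET request that would list a user's
    resources on one target repository.

    The token travels with every request, so no server-side session
    state is needed between calls.
    """
    url = f"{BASE_URL}/targets/{target}/resources/"
    return Request(url, headers={TOKEN_HEADER: token}, method="GET")


# Build a request for a hypothetical "osf" target with a placeholder token.
req = build_resource_request("osf", "my-osf-token")
```

Sending the request would be a separate step, for example with `urllib.request.urlopen(req)`, against a real deployment and with a real token.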
The PresQT service features include a set of services that allow you to configure additional partner systems via JSON and Python functions, transfer files between systems in BagIt format, share metadata in JSON format, do fixity checks alongside your file transfer, enhance keywords, launch EaaSI emulations, and run FAIR testing. But really what PresQT emphasizes is our pluggable, configurable architecture and how easily it can be extended to diverse systems with diverse structures. In other words, it is a repository-agnostic system that doesn't create additional standalone solutions but rather creates that umbrella of dotted lines that you saw before in the archipelago. Rick, can you tell us a little bit more about the features?
Yes, thanks, Natalie. With all of the islands in the archipelago that Natalie referred to, one of the goals of our project is to connect them, and we've worked pretty hard so that, even at this stage, once they're plugged in they are not isolated from one another. A lot of these services work together, so you can do things like the fixity check along with the transfer, keyword enhancement, and so on. Looking at the transfer itself, this is an example from the demo UI that Natalie referred to. As she mentioned, there are a few different ways to plug in, and we've really tried to stay open-minded and flexible as we work through this grant project about what will be the best way for folks to interface. Those are some of the things we really want to hear from you all, if you're able to go out and try these different services. This is the UI that I mentioned, where in the transfer there are also opportunities to do things like keyword enhancement, and it does fixity checking along with that.
For all the different services that are plugged in, we're really very fortunate to have worked with the various partners that were on the acknowledgement slide before. Really, we are standing on the shoulders of giants, so to speak: we're building upon the work that all of these folks have done with these various systems, and the goal is to enhance each one and make it better. You'll notice that a few of them are marked with stars. Those are ones that are still in progress, with some features in place. All the others, without the stars, have something working that you can try out at our QA site. Looking at the breadth of integration, we think it's pretty fitting to have this laid out as a matrix, because once something is plugged into that network, you have the ability to move things in and out between the various endpoints and do things like file transfer, keyword enhancement, FAIR testing, and so on.
Looking more closely at fixity: when we're doing the file transfer, we pay very close attention to fixity at various points in the process. It's not just looking at the fixity at the start and the finish, but also when things go over the network. We also look really closely at the different hashing algorithms used to detect file corruption. Depending on the endpoint system involved, we work to match against that. For example, with the OSF there were two or three different hashing algorithms that they use, and we then try to match that with the source as well.
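The idea of matching the hashing algorithm to whatever the source repository publishes can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's implementation: the function names and the shape of the result are hypothetical, and only Python's standard `hashlib` is used.

```python
import hashlib


def file_digest(data: bytes, algorithm: str) -> str:
    """Hex digest of `data` using a hashlib algorithm name such as 'sha256'."""
    return hashlib.new(algorithm, data).hexdigest()


def check_fixity(data: bytes, published_hashes: dict) -> dict:
    """Compare received bytes against the first published hash whose
    algorithm is supported locally.

    `published_hashes` maps algorithm names to the hex digests reported
    by the source system (a hypothetical structure for this sketch).
    """
    for algorithm, expected in published_hashes.items():
        if algorithm in hashlib.algorithms_available:
            actual = file_digest(data, algorithm)
            return {"algorithm": algorithm, "passed": actual == expected}
    # No shared algorithm: fixity cannot be confirmed either way.
    return {"algorithm": None, "passed": None}


# A source that publishes two hashes for the same file.
payload = b"example file contents"
source_hashes = {
    "sha256": hashlib.sha256(payload).hexdigest(),
    "sha512": hashlib.sha512(payload).hexdigest(),
}

result_ok = check_fixity(payload, source_hashes)          # bytes arrived intact
result_bad = check_fixity(payload + b"!", source_hashes)  # bytes were altered
result_unknown = check_fixity(payload, {"not-a-real-hash": "abc"})
```

Running the same check after each hop (download from the source, transit over the network, upload to the target) is one way to look at fixity at more than just the start and the finish.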
Looking at the keyword enhancement: within the file transfer there's the option to do keyword enhancement, which is really keyword expansion upon what is coming through. It looks at the keywords that are present in the source and sends them to the SciGraph service, which does keyword expansion upon them. Once the transfer goes to the target, you're able to add the expanded set. That is a list of keywords, and it's customizable in terms of which keywords you want to select; that's something we'll see in a minute in the demo we have cued up. Looking more closely at that keyword expansion: on the left in the example you'll see cat, dog, and egg, and what has been retrieved from SciGraph are the additional synonym terms like canine, scrambled, feline, and so on. As you may know if you've worked with [unclear] or things of that nature, there are quite a few different terms that can be pulled up, so we're really very excited about this particular feature.
Okay, so we will have Justin Bronco, one of our developers with our Center for Research Computing at Notre Dame, do a voiceover for this quick demo.
I'm one of the developers on the PresQT project, and today I'm going to walk you through a transfer. We can see here in OSF I've got this new project that I'd like to move over to my GitLab account. You can see right now I've got nothing on my GitLab account. If we pop over to the PresQT interface, I've already used my access token to sign in to OSF here. I will click our transfer option, and we're going to go to GitLab. We'll put in my GitLab token. We can see that I currently have nothing in my GitLab account. It's not going to find any duplicate resources, because this is a new project, so we can ignore that step. We have keyword enhancement that we do during transfer, so for the purposes of this demo I will do the manual enhancement.
Over here in OSF you can see we've got wood and water as our keywords, so let's select aqua, oxygen atom, and bosser. We have a transfer agreement that will just say what's about to take place as a result of this action. You can opt in to receive an email when the action completes, but since this is a fairly small project it should not take too long. It's finished: the transfer was successful, all files passed our fixity checks, and these keywords have been added. We can test that by checking over here in OSF. If we refresh this page, you can see we've got the three new keywords that we added, and we can click through to see the resource over in GitLab. You can see we've got all five keywords and the description of the project from OSF. And that's how our transfer service is working.
Okay, so let me exit out of this quickly and move on to the next slide. All right. With the keyword enhancement, as with any of these services, it's not a one-size-fits-all proposition. You can also do keyword enhancement without doing a transfer. If you're just looking to enhance keywords on your existing source, we have that as well, and it works the same way whether it is through the transfer or not. And with that I'll hand it over to Natalie.
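The keyword enhancement flow described above (look up related terms for the source keywords, let the user pick, then write the combined list) can be sketched as follows. This is only an illustration: the synonym table is a made-up stand-in for a call to an external service such as SciGraph, and the function names are hypothetical.

```python
def expand_keywords(source_keywords, lookup_synonyms):
    """Return suggested new keywords for a resource.

    `lookup_synonyms` is any callable that maps one keyword to a list of
    related terms; in a real system it would wrap a call to an external
    vocabulary service. Suggestions already present on the source, and
    repeats, are dropped, and the original order is preserved.
    """
    existing = {keyword.lower() for keyword in source_keywords}
    suggestions = []
    for keyword in source_keywords:
        for term in lookup_synonyms(keyword):
            if term.lower() not in existing and term not in suggestions:
                suggestions.append(term)
    return suggestions


def apply_selection(source_keywords, selected):
    """Combine the original keywords with the ones the user chose to keep."""
    return list(source_keywords) + [k for k in selected if k not in source_keywords]


# Made-up lookup table standing in for the external synonym service.
FAKE_SYNONYMS = {"cat": ["feline"], "dog": ["canine"], "egg": ["scrambled"]}


def fake_lookup(keyword):
    return FAKE_SYNONYMS.get(keyword.lower(), [])


source = ["cat", "dog", "egg"]
suggested = expand_keywords(source, fake_lookup)
# Manual enhancement: the user keeps only the first two suggestions.
final_keywords = apply_selection(source, suggested[:2])
```

Because the expansion step is separate from the write step, the same two functions serve both cases mentioned in the talk: enhancement during a transfer (write to the target) and enhancement in place (write back to the source).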
Thanks, Rick. Many of you will be interested in the FAIR testing service that PresQT enables. I'll give you a little background that you can also revisit on your own in the slides. Our testing is based on the FAIR principles, which were introduced in 2016. You can see them on the left of the slide: the principles of findable, accessible, interoperable, and reusable. What FAIR is is a set of principles for humans and for machines and software. The FAIR principles emphasize that reusable data, data which actually can be reused, will become as valuable as possible if we make it more findable, accessible, and interoperable. Yet by the time people had become acquainted with the principles, they increasingly wanted to figure out the difference between the FAIR principles and a standard. It's important to recognize that the principles deliberately do not specify technical requirements. They're just a set of guiding principles; they provide for a continuum of increasing reusability via many different implementations. The principles describe characteristics and aspirations for systems and services to support the creation of valuable research outputs that can be rigorously evaluated and extensively reused, with appropriate credit, to the benefit of both creators and users.
What FAIR isn't is a standard. The FAIR guiding principles are sometimes incorrectly referred to as a standard, even though the 2016 publication explicitly states they are not. Standards are prescriptive; these guidelines or principles are permissive. FAIR's originators and stakeholders, in the paper you see on the right-hand side of this slide, "Cloudy, increasingly FAIR", suggested that a variety of valuable follow-on standards can and should be developed atop the FAIR principles, each of which could be guided by them. Let's look at the next slide and see how that's evolving. Here we see, from the Australian Research Data Commons, that if you take an incremental approach to FAIR, it becomes easy to understand
how to implement FAIR in understandable chunks aligned with each of the principles. For example, for findability, we might suggest that it is easier to find the particular person or asset you're looking for if a permanent and persistent identifier is required. For interoperability, we might need ontologies or vocabularies that are themselves FAIR, and so on. Next slide, Rick.
As implementations against FAIR evolved, and once funders like the Australian Research Data Commons began to request FAIR data from their funded projects, the stakeholder community began pairing compliance aspects with the principles. In turn, the growing maturity of FAIR digital objects and repositories began to create an information ecosystem where FAIRness could be tested. But what should FAIR testing aim for? It's tricky to implement because, remember, FAIR is for people and machines. Early tests relied on surveys that people would fill out, answering questions about their data. FAIR testing today is increasingly automated and focuses more on whether data is FAIR for machines or software. To learn more about automating FAIR assessment, PresQT held workshops with Mark Wilkinson, Daniel Clarke, and Avi Ma'ayan to learn their approaches. Then we stood up FAIR testing in the PresQT service, using Mark Wilkinson's evaluator service for PresQT endpoints. Let's take a look. Next slide, Rick.
Here we see some information from FAIRsharing about the FAIR maturity indicator authoring group, who wanted to provide an objective, automated way of testing both metadata and data resources against the FAIR maturity indicators. I'll paste an example of that into the chat for you. You can read more about the indicators in the papers on the left of the slide. Mark and the group also created a free FAIR evaluator, which runs this demonstration service. You can see a screenshot of it on the right. I'll share a URL for it in the chat, and you can use it on your own after the talk.
The evaluator provides a registry and execution functions for maturity indicator tests, community-defined collections of maturity indicator tests, and quantitative FAIRness evaluations of resources based on these collections. We've integrated the FAIR evaluator service on FAIRsharing into PresQT. Let's take a look at a demo. I'm going to turn things back over to Rick.
So today I'm going to show you how PresQT has integrated with FAIRshare evaluation services. You can see here we've got the PresQT Data and Software Preservation Quality Tool project open through OSF, and over here we've got this services tab, where one of the options is FAIRshare. If we click FAIRshare and pop open that modal, we can see this action will submit a FAIRshare evaluator request for this project using the identifier, the DOI that's associated with the project, and they will return how they've rated each of the metrics that have been identified by the PresQT PIs. Let's select all of those tests to run and hit evaluate. Once the tests have finished running, we've got the name of the metric and whether it passed or failed. If you hover over the grade, you can read what the metric is in case you've forgotten, and you can get a bit more information by expanding each of the tests. The yellow ones are warnings, green is success, and red, of course, is failure.
Thanks, Rick. Let's move on so that we have time to conclude and honor everyone's presence on today's forum. We appreciate the opportunity to present PresQT to this CNI group. If we look at the next efforts we have in progress, we're working on FAIR assessments with FAIRshake, which introduces rubrics to PresQT, and on expanded services connecting EaaSI emulation nominations to PresQT. If we look at the next slide, we can see that we currently have in progress an emulation nominator, where you can take a file on PresQT and send it to EaaSI for emulation. Next slide. I
encourage you, on Thursday, December 3rd, in the 4 to 5 p.m. time slot here at CNI, to visit the EaaSI emulation in action talk, where you'll hear more about this. Next slide. It's the best time ever to join the Software Preservation Network, because in 2021 they'll be offering an opportunity for Software Preservation Network members to engage with emulation as a hosted service. I encourage you to check that out; there has never been a better time to join. I'll paste a link to more information about them in the chat. We are looking forward, in PresQT, to being able to work with the emulation nominator and the EaaSI hosted service in the future. Next slide.
There are a lot of ways you can engage with PresQT. To connect, sign up for our mailing list by December 4 and you can engage directly in our QA program. We'll have an onboarding call on December 7. Contact us anytime through our mailing list. Next slide. Visit our upcoming talks: Sandra Gesing, our co-PI, will be presenting on Monday, December 7 at the Trusted CI webinar and on Wednesday, December 9 at AGU. Next slide. We thank everyone here at CNI on behalf of the whole PresQT team, and we encourage you to visit the slides and reach out if you're interested in learning more about PresQT. Thanks very much.
Thank you, Natalie, and thank you, Rick, for a great presentation on a wonderful tool: so robust, kind of one-stop shopping. It was wonderful to hear more about it. We are right at time here, so I do want to offer our attendees a chance to get in a question if they have any. We'll take just a minute to see if we can field a question or two before closing things down. I don't see any in the Q&A right now, but I'll just ask very quickly: you have a community meeting series that you are hosting and running around this tool, right? Do you want to mention something about that?
Sure, we do. We run a regular community meeting that's open to everyone. Recordings and minutes from those meetings are
available online, but most importantly, at our next meeting on December 7th we'll be onboarding people to our QA process. It's free, anyone can join, and it's not very time-consuming: it takes less than half an hour. It will allow you to engage with PresQT as an individual or an organization and to think about whether you might want to integrate your own endpoint.
I see in the Q&A that Carl Benedict asks what the repository integration development process looks like. That's a great question, Carl, and the good news is that we have gotten better at it; some integrations, those with robust APIs, we're able to bring up in a day or two. The code is open, so anyone can create an integration, but during this funded phase of the project (our funding runs through the end of June) we're also happy to support any partners in their own efforts to engage, or to create an integration for them to test. So if you've got an endpoint you'd like us to test, please do reach out and we're happy to get that going. You can see a list of existing integrations in our documentation, in the grid that Rick shared before, and that slide will help you learn a little bit about what's entailed in creating an integration. So thank you all. Please visit the documentation, and it'll tell you all about setting up an integration for your preservation or sharing repository with PresQT. I think that with this funding we've been able to explore what it truly means to have a repository-agnostic system for sharing files and metadata, one that improves the way we share data between systems, enhancing it along the way with fixity, better keywording, and the ability to send files forward to emulation for those who don't have the right software to run executables. So we've learned a ton, and we'd love to share that information with the larger community. Thanks ever so much.
Terrific. Thank you so much, Natalie, and thanks again to you as well,
Rick. Thank you to all of our attendees for spending some time with us at CNI today. I'm going to go ahead and turn off the recording and bring the public portion of this presentation to a close, but any attendees who'd like to stay back and approach the podium, as it were, to ask more questions or join the conversation, please feel free to do so. Just raise your hand and I'll turn on your microphone. Thanks, everyone. Have a great afternoon. We hope to see you back at CNI soon. Bye.