Good morning, or good afternoon, depending on where you are; I'm currently on the East Coast. Today my colleagues Natalie, Rick, and I are going to give you an update about our PresQT project. PresQT stands for Preservation Quality Tools, a planning grant funded by the Institute of Museum and Library Services. We're very grateful to the Institute for giving us the opportunity to work with many of our peers, as well as scientists, researchers, and software developers from both commercial and not-for-profit organizations, to really look at the gaps in managing software and data preservation.

Our vision sees software and data preservation as part of the larger picture of the research management lifecycle. For that reason, we really do not want to design a standalone tool and try to force everyone to adopt it. On the contrary, what we're trying to do is look at an open architecture that is interoperable and format agnostic, one that can connect to existing tools, integrate existing features of current tools, and add some new features that are missing from the current landscape.

I think many of our peers can probably relate to our Notre Dame story from the researchers' side. Researchers at this point in time are facing a number of challenges: there is a lack of storage capacity and of data curation processes, and the institution lacks standard metadata and indexing technologies, as well as tools that support the whole research workflow. From the library side, we continue the analog publishing workflow, in which we engage with software and data preservation at the end of the lifecycle, a very expensive, labor-intensive process that has to happen before we can actually share and preserve the data and the research. And of course, there are increasing expectations from funders asking our researchers and scientists to share their research results.

This slide is a visualization of the current philosophy: we engage at the end of the lifecycle and try to do the very heavy lifting of getting the data into shape so we can preserve and share it. What we're trying to do now is insert ourselves upstream in the lifecycle and make sure the quality of the data is ensured, so that as we move downstream, we'll be able to preserve and share the data easily.

At this point our peers may also relate to this: we've put many pieces of infrastructure in place to support research management. Many of us train librarians; we hire e-research librarians and data management librarians; we create repositories; we even create data management teams, really facilitating and moving upstream to help our researchers. At Notre Dame, we even created an OSF Institutions instance to support new users who adopt the OSF as their workflow management tool. But still, if we look at where we are, there are researchers who adopt other tools, and the switching cost for them to move to a different tool is still pretty high. Currently at Notre Dame, our library and Office of Research have struck up a very good strategic relationship: for every new grant awardee, the office shares the name with us, so we can engage with researchers at the very beginning of their research workflow.
But there are still many grants, more than the librarians we have can support given the volume of work. So if I can summarize the current research management landscape, this picture probably does it. When we were contemplating the image, we thought maybe we should use a picture of the universe, or at least a solar system; sometimes we in the libraries, the Office of Research, and the departments and colleges feel like we're galaxies and planets apart in how we think about managing research. But in fact we're living on the same planet, and most of the people we're supporting live in the same geographical area. So this picture probably summarizes where we are: although we're on the same campus, we're islanders, each on our own island. We have different cultures, we speak different languages, we have very different processes, and even the technologies and tools we use are so different. So with all these infrastructure pieces we've put together, we still see that data does not flow easily from the researchers' hands to the curators and preservationists; it's very difficult. And going back to Jones's keynote, from a user-centered design point of view, how can we support users in the current landscape?

We have to admit that Rick, Natalie, and I are not the creators or makers of this landscape or its geographical features. We can hardly change the current status quo; maybe the provost and the president can do that. But as members of the community, what we need to do is think about how to work within this landscape, and try to adopt and create tools that help our researchers let data flow, and help our librarians do the job of preserving and managing the data and software. So I'll stop there as we continue our presentation.

Thank you, John. So, I'm Rick Johnson, also from Notre Dame. In thinking about how we have been approaching this challenge with PresQT, it really is about how we can connect the various areas of excellence, subject matter experts, and different specialists across all the different groups. We approached this planning grant effort with an assessment survey that had over 1,700 respondents from a variety of groups: researchers, librarians, software creators, et cetera. In addition, our effort included a couple of workshops where we were really hands-on, trying to bring in the existing makers of a lot of these tools and services. Because as John said, a lot of this is not about looking to create new services, but about starting to bridge those gaps more and more. And as he also said, there's a high switching cost for anyone who would be moving to new tools; so again, how can we continue to bridge those gaps? Within our effort here, we have completed our survey, we have completed the workshops, and our final report is mostly done; we haven't fully submitted and posted it yet, but it is mostly done. With that, we're now in a phase where we have a set of recommendations based on what came back from the survey.

Another aspect of this: from the get-go, we really thought of our grant as something that needs to be completely open.
This may be a radical concept for a grant-funded project, to be totally open, but from the beginning all the way until now, all the materials have been freely available, and they continue to be, on the OSF. One example of this is our needs assessment data, which is available on the OSF site, and I'm going to turn it over now to Natalie to talk more about that.

Hi everyone, I'm Natalie Meyers. I'm a research librarian at the University of Notre Dame, and you can see on the screen some information about the needs assessment data that I'll be presenting in the forthcoming slides. At the lower left of the screen, you can see that the questionnaire and the data are shared; but beyond that, for this project we've also shared the code with which we visualize the data. All that information is available on GitHub, and you can access it through our OSF project at osf.io/xfws6 if you'd like to follow along (a short sketch of programmatic access follows at the end of this segment). The other contributors who helped put this data and code up include Don Brower in the Hesburgh Libraries, who helped visualize and share the survey results, and Brandon Greenewall in the Notre Dame Center for Social Computing, who helped develop the survey instrument and administer the survey in Qualtrics. So the work you're seeing is not just from myself, John, and Rick, but from a larger team behind the scenes that has helped us share along the way. None of this happens in a vacuum.

When we set out to do stakeholder engagement around what the tool gaps in data and software preservation quality might be, we grounded that work in a review of related surveys and case studies that you can see here on the screen. These resources informed our needs assessment instrument, and they were presented and shared at our first workshop with our stakeholders. As I present our needs assessment results, they'll stand on their own, but where we have repeated or reworked questions from the Science Gateways, APS Open Data, and State of Open Data surveys, I'll point that out. We're also planning a virtual web panel for later this spring or early summer, where stakeholders from those related efforts will help us discuss trends, commonalities, and differences that have surfaced through the crossover questions we repeated from past surveys. So I encourage you to revisit the slides online and explore some of these other surveys that help us understand the landscape of data and software sharing, and researcher attitudes toward it over time.

On the screen you see a preview of our needs assessment results. This information is all live on GitHub at ndlive.github.io/presqtneeds/; you can go there now or visit it again later. When you're on that site, the information is interactive: you can hover your cursor over it and it will show you more detail about the visualizations you're encountering.

The first question we asked researchers was how familiar they were with tools used to share, publish, cite, and preserve their data and software. We surveyed National Science Foundation-funded principal investigators with over $5,000 in funding, along with library, publisher, developer, and other stakeholders of the information systems that help scientists share their data and meet publisher and funder mandates. One of the next questions asked what kinds of data or software preservation tools respondents would find most useful.
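Since the questionnaire, data, and visualization code are all openly shared, here is a minimal sketch of what programmatic access to the project's files could look like. It assumes the public OSF v2 REST API at api.osf.io and uses the project GUID mentioned above; the osfstorage layout is an assumption, not something stated in the talk.

```python
# A sketch of listing the shared needs-assessment files from the public
# OSF v2 REST API. The project GUID (xfws6) is the one given in the talk;
# the osfstorage provider path is an assumption.
import requests

OSF_API = "https://api.osf.io/v2"
PROJECT = "xfws6"

def list_osfstorage_files(project_guid: str):
    """Yield (filename, download_url) for files in the project's osfstorage."""
    url = f"{OSF_API}/nodes/{project_guid}/files/osfstorage/"
    while url:
        resp = requests.get(url, timeout=30)
        resp.raise_for_status()
        payload = resp.json()
        for item in payload["data"]:
            if item["attributes"]["kind"] == "file":
                yield item["attributes"]["name"], item["links"]["download"]
        url = payload["links"].get("next")  # follow pagination, if any

if __name__ == "__main__":
    for name, link in list_osfstorage_files(PROJECT):
        print(name, link)
```

Pagination is followed through the API's next links, so the sketch still works if the project holds more files than a single page returns; each download URL can then be fetched directly.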
For that usefulness question, we asked them to identify how useful they would find each of the tool categories you see here on the screen, including provenance tools, workflow tools, fixity tools, keyword assignment tools, profile-based recommenders, de-identification tools, and tools that provide an assessment of a digital object's metadata completeness or preservation quality. So when you hear me talk about quality as we explore these survey results, that's our definition of tools that ensure data or software quality: tools that provide an assessment of a digital object's metadata completeness, or of the preservation quality of the object itself (a toy version of such a completeness check appears at the end of this segment).

The way the respondents answered is on the screen now, and we can see the priorities expressed by the researchers, in order: provenance, workflow, fixity, data and software quality, and keyword assignment as their top five priorities for tools they would find useful. If we look in more detail and compare software developers' and non-developers' expressions of interest, what's most curious is that the interests mostly align regardless of role. So whether you are a researcher who writes software or a researcher who simply uses others' software, your interests are fairly well aligned in terms of the kinds of tools you need. We didn't necessarily expect this degree of alignment, but we can see that interests are fairly aligned. Here on the screen you can see all the responses shown in the top row for each tool category, then the software developers' perspective in the second row, and the non-developers' perspective in the bottom row. Note that these are not either/or roles: researchers could also identify as software developers in our needs assessment, and just under 40% identified themselves that way. So when we look at non-developers, that's about 60% of our respondents. Those who develop, maintain, or support software systems are a little more interested in provenance, quality, and fixity than the group as a whole, but otherwise interests are fairly well aligned.

We next asked users if they had a data or software preservation quality tool need that the project could help them develop. This was a free-text question, and the responses are available in our dataset; I encourage you to visit them and use them for your own purposes. This was our 'field of dreams' question: instead of asking 'if we build the next great thing, will you use it?' and spending our planning grant evangelizing a future product or tool, we took the opportunity to ask researchers to tell us what kinds of needs they had that our project could help them develop, or develop for them. Next we asked if there was a tool gap in their existing digital ecosystem or workflow. Those were also free-text responses, and you can access them on GitHub or on the project's OSF site.

I'll give you a high-level overview now of the rest of the researcher questions, and then we'll look at some developer responses. I think these three questions are quite interesting. We asked how often in the past respondents had made their research data free to access, reuse, repurpose, or redistribute, and nearly half said that they had; that was a repeat of the Figshare State of Open Data question. But when we asked a little differently, 'is any of your data or code published or shared now on a repository or website?', you'll notice in the next result that the answer shifts a little bit.
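As an aside on that 'quality' definition, tools assessing a digital object's metadata completeness, a toy version of such a check might look like the sketch below. The required-field list is a hypothetical stand-in, loosely Dublin Core-flavored, and not a PresQT specification.

```python
# A sketch of a metadata-completeness check in the sense defined above:
# score a digital object against a required-field list. The field list
# is a hypothetical stand-in, not a PresQT specification.
REQUIRED_FIELDS = ("title", "creator", "date", "description",
                   "identifier", "license")

def completeness_score(metadata: dict) -> float:
    """Fraction of required fields that are present and non-empty."""
    present = [field for field in REQUIRED_FIELDS if metadata.get(field)]
    return len(present) / len(REQUIRED_FIELDS)

record = {"title": "Needs assessment data",
          "creator": "PresQT team",
          "identifier": "osf.io/xfws6"}
print(f"completeness: {completeness_score(record):.0%}")  # -> 50%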
Back to those sharing questions: fewer have actually shared than say they've made their data free or redistributed it. And finally, when we asked if they anticipated sharing, you'll notice that the vast majority, nearly 80%, anticipate they'll probably have to share within the next five years. So we have a long way to go: about a 40% gap to help researchers jump over the next five years, not just with our systems, but with the services we offer our researchers.

We also asked whether, in the past three years, the researcher's research group had made publicly accessible any of the following items through their institution's website or a third-party repository. This is a repeat of the American Physical Society survey question from the NSF open mathematical and physical sciences workshop, and we'll compare the responses between those two groups in an upcoming panel. What you can see here is that, among our survey respondents, many had shared processed data, figures, plots, table data, and software; fewer had shared raw data, and fewer still had shared structured databases. But in general, these were the kinds of areas in which they were making things public. This becomes important when you start to think about how much storage you need to accommodate researchers' sharing activities. And when we look at the American Physical Society audience, what we see are high-energy physicists really making audience matter in terms of who's sharing and what they're sharing: their raw datasets are huge, and they don't share those raw datasets as individual scientists, so their concern lies more with sharing software and processed data.

Then we asked whether researchers needed better tools to share, reuse, cite, publish, or preserve their own or others' data or software. People were more interested in tools for data than in tools for software, but that aligns with the number of people in our respondent group who develop software in the first place. So I wouldn't let this guide us too far toward deprecating the need for preservation systems that help preserve software; it simply reflects how many respondents write software and think of it as something they have to preserve. For keyword assignment, we anticipated more interest than respondents expressed: most felt that it was usually easy to assign keywords, and that when asked to submit keywords for their own or others' research, the terms available usually described the work at hand accurately.

We also asked researchers whether they actively created or managed metadata to make sharing, finding, or documenting the provenance of their own or others' data or code easier. This question is interesting in the context of FAIR, making data more findable, accessible, interoperable, and reusable. What we see here is that about 40% of this group have never taken metadata into consideration and don't work with it. That means if we want to make data and software more FAIR in our information systems and ecosystems, we have to help researchers bridge this gap through the systems we build, the way researchers use those systems, and how we support them.

We next asked whether their employers required them to make any of their publications or data openly available, and whether any of the organizations that funded their work required the same. Again, audience is important here, because we surveyed NSF-funded researchers; most do have a mandate they're responding to.
What you'll see on the bottom line is that a large proportion of our researchers felt their funder required them to make their publications and data openly available, which reflects the proportion of responses from the NSF audience.

The next question we asked is quite interesting in the context of CNI. We asked, in the researchers' estimation, which of the following currently have the infrastructure required to provide long-term public access to research data. Respondents indicated third-party repositories first, then their own institutions, and then journal publishers. This is a little contradictory to the 2017 CNI roundtable outputs, and I was trying to think about why that might be. I invite you to explore the links at the bottom of this slide to read through that roundtable information or view the video, because when we asked these NSF-funded researchers similar kinds of questions, they pointed to third-party repositories and their own institutions first. We wondered why.

I think one of the reasons might be the question researchers ask: where did my data go? Recently there was an article in F1000Research by Rowhani-Farid about whether badges for sharing data and code increased the amount of sharing. What was interesting in the context of this article was that 49 out of 76, or 64%, of the articles at Biostatistics that provided links to data and code had broken links to that data; and at Statistics in Medicine, 21 out of 53, or 40%, of the articles with links to data and code had broken links (a sketch of that kind of link audit follows at the end of this segment). Researchers don't trust their publishers, because in some of these high-profile cases their data isn't there when they go looking for it. These were Oxford journals, and there was a note in the research about how, when they switched to a new publishing platform in January 2017, some of the supplemental material was lost in the transfer. These are the kinds of things that make researchers pause when they're asked who has the resources to preserve their data: not just share it, but preserve it. So we need to think about this answer in that context, think about institutional repositories in that context, and consider why researchers might not feel that publisher solutions provide a trustworthy or dependable preservation service alongside a sharing or publication service. I think what we can acknowledge here is that we need publishing with data citation and software citation, as well as preservation systems like institutional repositories and third-party repositories, alongside publisher repositories. The results related to those broken links are important when we look at this question.

Next, we asked how important to researchers' work were web-based applications that provide access to the specialized resources listed. This was a repeat of a question from an earlier Science Gateways study conducted in 2015. What we've found is a little bit of movement: as communities of practice build tools out in these areas, the areas people are interested in are changing, and we'll explore that a little more in our future web panel.

Finally, we asked researchers: for others to reproduce their results, would they need software, code, or scripts? What's interesting is that less than 50% of people thought so. And then we asked, 'do you create and/or use software, code, or scripts in your research?', and greater than 50% say yes.
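An audit like the one behind those broken-link numbers boils down to checking whether published data and code links still resolve. A minimal sketch of that kind of check, assuming a plain HTTP status test is sufficient and using placeholder URLs, might look like this:

```python
# A sketch of a supplement-link audit: check whether published data/code
# links still resolve. The URLs below are placeholders, not real articles.
import requests

def is_broken(url: str) -> bool:
    """Treat network errors and 4xx/5xx statuses as broken."""
    try:
        resp = requests.head(url, allow_redirects=True, timeout=15)
        if resp.status_code == 405:  # some hosts reject HEAD; retry with GET
            resp = requests.get(url, stream=True, timeout=15)
        return resp.status_code >= 400
    except requests.RequestException:
        return True

links = [
    "https://example.org/supplement-data.zip",   # placeholder
    "https://example.org/analysis-code.tar.gz",  # placeholder
]
broken = sum(is_broken(url) for url in links)
print(f"{broken}/{len(links)} links broken ({broken / len(links):.0%})")
```

Counting failures over a journal's supplement links yields exactly the kinds of ratios quoted above, such as 49 out of 76.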
Between those two answers there's a little bit of an inherent tension in terms of what researchers really think needs to be shared for others to reproduce their findings.

We had a second set of questions specifically for developers. These asked whether researchers authored software, code, or scripts to analyze or produce their data, whether they hired people to do that, and whether they revised commercial or freeware software to do it. We also asked how they shared their code, whether they containerized code to make distribution easier, where their focus areas were in their tasks of administering, maintaining, and supporting tools, how long they anticipated having to support that software, and how long people would use it. The preponderance said from five to ten years, or less than five years. So people aren't thinking long term about how long their software will need to be accessed or used, and that might make reproducibility and reusability of software more difficult as time moves forward. I'm going to turn things over next to Rick, and he'll talk a little more about the end phase of our grant and our next phase. I encourage you to visit the needs assessment online to view more of the data, and during our question time we can talk a little more about some of the researchers' questions.

Thank you, Natalie. So in thinking about all the results we've gathered from our survey, as well as the insights from tool makers and other experts who participated in our workshops, we've incorporated a lot of those insights into an initial design that we've detailed in our final report. It really integrates the key focus areas that emerged from the results, where things like simply generating checksums for fixity were deemed a high priority; generating provenance metadata was another (a small sketch of both appears at the end of this segment). Those may seem by themselves like fairly small operations, but as John said earlier, if you think back to that workflow diagram, if those things can happen very early in the process, then all of a sudden the level of trust and the quality of preservation go up dramatically. So we're really looking to bring a lot of these tools closer to researchers and other practitioners themselves.

Looking at our overall timeline, we are in the midst of a planning grant that is concluding, but along with that, we have submitted an implementation proposal to the IMLS in the January 16th call. We're still waiting to hear back on that, but regardless of the outcome, all of the recommendations we've made will be available in our final report for anyone to adopt; these findings do not hinge on the success of that implementation proposal. In thinking about this, we have again held that this needs to be repository agnostic, and you can really substitute 'technology' for 'repository' as well: we cannot depend on the assumption that researchers, librarians, or anyone else are using any one particular tool. We need to approach it from a true interoperability standpoint. Within our implementation proposal, for example, we've thought about this primarily as building bridges between existing services, as well as creating new services for gaps where tools are deemed high priority but are not especially mature in the space yet. So that's the bulk of what we had.
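To make those two priorities concrete, here is a minimal sketch of generating a fixity checksum and a small provenance stub early in the workflow. The record fields are illustrative assumptions, not the PresQT design, and results.csv is a placeholder filename.

```python
# A sketch of generating a SHA-256 fixity checksum plus a small provenance
# stub at deposit time. The record fields are illustrative only, not the
# PresQT design; "results.csv" is a placeholder filename.
import hashlib
import json
import os
from datetime import datetime, timezone

def sha256_checksum(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so large datasets never sit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def provenance_record(path: str) -> dict:
    """Capture basic fixity and provenance facts: name, size, checksum, time."""
    return {
        "file": os.path.basename(path),
        "size_bytes": os.path.getsize(path),
        "sha256": sha256_checksum(path),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }

if __name__ == "__main__":
    print(json.dumps(provenance_record("results.csv"), indent=2))
```

Run early in the process, a record like this gives curators downstream something to verify the file against, which is exactly the early-in-the-workflow trust Rick describes.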
I wanted to mention some final things as well. John, Natalie, and I are co-PIs on campus, and we also have another co-PI, Sandra Gesing, who's in our Center for Research Computing at Notre Dame. She was not able to make it, but she's here in spirit; she's a large contributor to a lot of the technology aspects, and if the implementation grant is successful, another large portion of the work will come from our research computing center on campus. So we welcome any questions at this time.