Hi, everyone. My name is Sarah Ben-Mammer. I am the associate director for research services at Weill Cornell Medicine, at the Samuel J. Wood Library. Through this video, I would like to introduce you to the research data management ecosystem that we built at our academic institution in response to the latest federal data management and sharing policy. All of our data management services map onto the research data lifecycle presented here on the right. The research data lifecycle starts with the planning and design of an experiment. It is followed by the collection and capture of data. These data then need to be analyzed, with or without collaborators, and then comes the time to think about what to do with these data sets: how to manage them, store them, preserve them, or discard part of them. Then comes the time to think about sharing and publishing the results and findings coming out of those data sets. Finally, the research lifecycle ends with the discovery, reuse, and citation of those data sets. In this video, I would like to introduce you to four services and tools that we make available to researchers at our institution to help them comply with this latest regulation from the funding agencies. That includes our data core, which is our secure computational enclave for sensitive data sets. I will also talk briefly about our scientific software hub. Our latest tool is our data retention tool, which we implemented to help researchers comply with the data management part of the recently released NIH policy. And finally, I will introduce you to our data catalog, which we hope will facilitate the discovery and reuse of data sets produced at our institution. Throughout this presentation, you will see three objectives displayed at the bottom. 
When those objectives are displayed in green, it means that the specific tool I am talking about aims at fulfilling them. Our first objective was, obviously, to build a data management program and to help faculty adopt better data management practices. Our second objective is to engage our stakeholders and foster the application of the FAIR principles, FAIR standing for findable, accessible, interoperable, and reusable data. Our third objective is to offer storage solutions to our researchers, again to help them comply with the latest regulations on data management and sharing. This slide provides an overview of all our services that fall under the category of data management. Our very first service was actually the data core, depicted here. This is our secure computational enclave. It is made specifically for researchers who need a space to host and analyze sensitive data sets, so it is completely HIPAA compliant, and we also have CMS certification for this environment, so we can host different kinds of restricted or sensitive data sets. Within this context, librarians act as a curation team: they really serve as intermediaries between the external data providers and the researchers. As librarians, we handle the data for the researchers. We also organize access to the data in compliance with the IRB protocols or the data use agreements that researchers provide us, so we make sure that only the users who are allowed to access the data are actually accessing it. Next to our data core, the other application that is intrinsically linked to it is our data catalog. The data catalog is a place where researchers can advertise the existence of a data set, with the idea of fostering its reuse whenever possible and applicable. 
A particularity of the data catalog is that each data set has an associated data use agreement, so people can understand the restrictions related to the use or reuse of a specific data set. Our latest tool and service relates to our long-term data repository, depicted here. This long-term data repository supports the data core at the end of a project: when a project closes, researchers are invited to archive their data in the long-term data repository. The repository also feeds the data catalog application whenever applicable. And it is this repository that supports our latest tool, the data retention tool, which we built to allow researchers to archive their data sets easily and, at least partially, comply with the latest federal regulation on data management and sharing. The library also does some work for our research integrity office related to data integrity. And we are in charge of provisioning scientific software to the whole campus, including our campus in Ithaca as well as our Qatar campus. So this is how our data management program generally looks. Now, a little more insight into our data core enclave. As I mentioned, the data core is our secure computational enclave, and it is administered by librarians. The librarians act as data management specialists: they curate imported and exported data, ensuring that any data that is exported is de-identified for researchers. Librarians also ensure that any access to a data set is compliant with the data use agreement, that all data use agreements and IRB protocols are up to date, and that the computational environment hosting the data reflects the IRB protocol and/or the data use agreement. The enclave also allows researchers outside of the institution to perform analyses. 
Here you have some numbers that give you an idea of the number of PIs and users currently using the data core, and the number of projects. The data core is the only institutionally vetted service for hosting sensitive data sets at Cornell University. The Weill Cornell Medicine IRB and our partner, NewYork-Presbyterian Hospital, prefer the data core for hosting any data that contains patient health information. The data core is, of course, connected to our scientific software hub, our data catalog, and our data retention tool, which I will detail a little further later. So, as I mentioned, the data core is really a place where people can host and analyze their sensitive data sets. What librarians do in this whole context is act as intermediaries between the external data provider and the research team, and between the external data provider and the PI. As I mentioned, we ensure that all access to the data set is in agreement with any protocols or data use agreements that the PI has provided us. We also help troubleshoot issues related to logins, software, or data with the research team. The PI, of course, always oversees the access that is given to the users on a specific project: they are the one confirming whether someone should have access to the data or not. They also make sure that the data governance is up to date and in compliance with the data provider's requirements. The data core is also a space where external collaborations work. This study is an example of an external collaboration that happened within the data core: the New York INSIGHT Clinical Research Network COVID study gathered multiple institutions located in New York City, which used the data core to collaborate and gather COVID data during the pandemic. 
The data core, of course, as I mentioned, allows researchers to analyze their data, and researchers need scientific software to perform their analyses. So the scientific software hub was created in the spirit of saving our researchers money, by aggregating all the license requests for specific software. That gave us more leverage to negotiate with vendors, and we now provision about 40 different software titles to both our Ithaca and New York City campuses. Our most popular titles include BioRender, GraphPad, FlowJo, and Ingenuity Pathway Analysis. So far we receive about 110 requests per month, and that number keeps increasing as we provision more and more licensed software. Finally, our data catalog is also a supplement to the data core, because it is a place where, as I mentioned, researchers can advertise the existence of their data sets. As of now, we have 62 active data sets. The data catalog can be accessed through our community's central authentication. One of its particularities is that it includes data governance documents associated with each data set, and there are also tools available to search for a specific data set based on keywords or on the period covered by the data set. Here you have an example of a data catalog entry. You can see that each entry has a small paragraph detailing the data set's purpose and content, the period covered by the data set, and the source of the data set. The blue title here is a clickable link. You can also access the contact information of the person at the institution who can provide more information on the data set, and see any data governance or restriction documents associated with it. 
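To make the shape of a catalog entry concrete, here is a minimal sketch of the fields just described, written as a Python dataclass. The field names and the example values are my own illustration, not the catalog's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    """Hypothetical sketch of a data catalog entry; names are illustrative."""
    title: str                 # clickable blue title linking to the data set source
    description: str           # small paragraph on the data set's purpose and content
    period_covered: tuple      # (start_year, end_year) covered by the data set
    source: str                # where the data set comes from
    contact: str               # person at the institution who can provide more information
    governance_documents: list = field(default_factory=list)  # DUAs, restriction documents

# Example entry, with made-up values for illustration
entry = CatalogEntry(
    title="COVID-19 Clinical Encounters",
    description="De-identified clinical encounter records collected during the pandemic.",
    period_covered=(2020, 2022),
    source="Institutional EHR extract",
    contact="data-catalog@example.edu",
    governance_documents=["data_use_agreement.pdf"],
)
```

The keyword-based search mentioned above would then amount to filtering a list of such entries on their `description` and `period_covered` fields.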
All of these existing tools let us create this data retention program in an incremental way; it is the latest tool that was added to this ecosystem. It was designed to use, as much as possible, a generalizable design and to align the new tool with the existing ones. The idea was to provide something completely independent of any technical environment: whatever operating system or browser the user is using, for example, the experience should be the same. This brings me to our data retention tool, our latest application, which we built to help researchers comply with the NIH policy and the new Cornell University Data Retention Policy. This new policy requires that researchers at Weill Cornell Medicine archive their data sets every time they publish a paper, every time a grant closes, or when they are about to leave the institution. In all three cases, the researcher is expected to make a retention request through this tool. Those three cases are called milestones. When researchers make a retention request, they are prompted to create a project and to associate the data sets they need to archive with this project. During the process, they have the possibility to create a data catalog entry to make the data more visible and shareable, if they wish; so there is a direct connection between our data retention tool and our data catalog. To wrap up this presentation: we obviously faced many challenges during the setup of this ecosystem, because the latest federal regulations are a moving target. It is sometimes hard to keep up with the latest changes, training needs to be updated accordingly, and it is hard to keep stakeholders engaged when so many changes happen in a short amount of time. And, obviously, building this whole ecosystem has a cost. 
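The milestone-driven workflow described above can be sketched as follows. This is an illustration of the logic only, under my own assumed names; it is not the retention tool's actual API.

```python
from enum import Enum

class Milestone(Enum):
    """The three events that trigger a retention request (illustrative names)."""
    PAPER_PUBLISHED = "paper published"
    GRANT_CLOSING = "grant closing"
    LEAVING_INSTITUTION = "leaving institution"

def make_retention_request(milestone, datasets, create_catalog_entry=False):
    """Sketch of the retention workflow: create a project, associate the data
    sets to archive with it, and optionally create a data catalog entry."""
    project = {
        "milestone": milestone.value,
        "datasets": list(datasets),            # data sets to archive for this project
        "catalog_entry": create_catalog_entry, # direct link from retention tool to catalog
    }
    return project

# A researcher who just published a paper archives one data set and
# opts in to a catalog entry for visibility.
request = make_retention_request(
    Milestone.PAPER_PUBLISHED,
    ["rnaseq_batch1.csv"],
    create_catalog_entry=True,
)
```

Modeling the three trigger events as an enum keeps the workflow closed over exactly the cases the policy names, so a request cannot be filed under an unrecognized milestone.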
As for successes: we only managed to create this ecosystem because of a close collaboration between librarians and our information technology services. We have highly motivated staff, and our leadership was very supportive, from both the library side and the information technology services side. And we got a lot of help from our researchers. So with this, I will thank you for watching this video. Please do not hesitate to reach out to us if you have any questions. Thank you.