 Welcome everyone to Nice to Have and Need to Have: Essential Data Management Best Practices and Tools for NIH Clients. My name is Mark Call. I am a product owner at the Center for Open Science, and I'll be moderating today's session. With me, I have a bunch of wonderful colleagues who are now my friends and who will be speaking with us today. With that, we're going to go around and say our names, where we work, and something interesting about ourselves. Well, Stuart's got his hand up, so let's ask him first. Oh, I must have accidentally clicked on that, but anyway, I'm Stuart Buck. I'm the director of the Good Science Project, which is a small think tank focused on improving federal science funding. Prior to that, I spent many years in philanthropy at a place called Arnold Ventures and helped, among other things, fund the Center for Open Science at its launch. Awesome. Thank you. Let's go on with Crystal. Hi. My name is Crystal Lewis. I'm what I would call a research data manager. I've been doing this for about 10 years, and I'm currently doing it in a freelance capacity, consulting with people on better ways to wrangle, document, and share their research study data. So glad to be here. Thank you. Let's go with Nikki. Hi. Thanks, Mark. I'm Nikki Pfeiffer. I'm the Chief Product Officer at the Center for Open Science, which is in Charlottesville, Virginia. I have the pleasure of working with researchers looking to adopt more open, transparent, and reproducible practices into their research workflows. 
I support a fabulous team of product designers, product owners (Mark being one), and software engineers in developing and maintaining our open source software tool, the Open Science Framework, where we continually optimize the features and workflows to support research management and collaboration, open and transparent research sharing, improving the rigor and quality of research through planning and preregistration of study designs, and open scholarly communication. Thank you, Nikki. Nathaniel. Hi. I'm Nathaniel Backoffer, Chief Product Officer at MetroBase, a platform for reproducible data management and dataset collaboration. We host data, keep it secure, and let you collaborate in a documented and readable way. We work on embedding best practices in our platform to ensure that you navigate your data management pipeline, from configuring your dataset through to analysis, in a best-practices way. I'll talk more about best-practice data management throughout this webinar. If you're working on a project now and you're making your pipeline more reproducible, we'd love to have you get in touch with us. Sam will put a contact in the chat. We'd love to hear from you, whether you want to use our platform or any of the other tools out there. We'd love to talk about your project and how we can help. All right. And then Sam, speaking of which. Yeah. Thanks. Well, hi. My name is Sam. I'm the CEO of MetroBase. Nathaniel covered what we do, but we're very excited to co-host this event today. I think I'll be the only one who's going to say a fun fact. The fun fact is that I am currently traveling, and that's the reason I am not on video today. The Wi-Fi isn't super stable, but I will be fielding your questions. So if you have questions throughout the presentation and throughout the panel discussion, feel free to pop them in the chat. 
And then once we get through the moderated discussion, I will field those questions to Mark and we'll hopefully get them all answered. So looking forward to it. Thanks, y'all. All right. Thank you. So let's go ahead and start with context. Recently, as in several months ago, the White House put out a memorandum saying that all federally funded research will have to be publicly available. NIH has also put out a very similar policy that says all the data is going to be publicly available; it's going to be required of all the researchers they fund. This is really transitioning the scientific community and their workflows toward adopting more open science best practices, and really having them focus on data management best practices: exactly how can they organize their data so that it's digestible by other research teams, so that others can assess their rigor, reproduce those findings, et cetera. And actually, I'm going to pitch this over to Nikki, who understands all of that, what it means and how it came to be, and can give us that context. So Nikki, if you don't mind, I want to pitch over to you so you can speak about the White House, NIH, and how they're transforming the landscape. Thanks, Mark. I will do my best. This is pretty meaty to talk through, and there are a lot of moving parts as the policies come out and are translated into practice. What does that look like across the research landscape? So really quickly, just to get the dialogue going, hopefully we'll get some really good questions and provide you with some insights as we all wrap our heads around what these policy changes mean. I'm just popping into the chat the actual memo from the White House that I am referring to. So if you haven't had a look at this, or haven't looked at it recently, it's an opportunity to go to the link and explore it some more. 
But as Mark mentioned, this memo from the White House Office of Science and Technology Policy came out last year, and it requires all federal agencies with research and development expenditures to update their public access policies by the end of 2025. And given how quickly 2023 has already started to move, I feel like that will be here in the blink of an eye, and it means a lot of thinking and planning needs to happen to support the implementation. So this does require that publications and supporting data from federally funded research are made publicly accessible, with no more embargoes, and that they are free for anyone in the public to access. It also calls out something I am really excited to see in a statement like this: the use of appropriate metadata. It's something we talk a lot about. Sharing scholarship or data results is fantastic, it's a step in the right direction, but without good metadata it really isn't the most effective way to spend your time. So just as much as you prepare the data, it's just as important to spend time preparing the metadata, so that the work is easily accessed and cited, to maximize the impact and reuse of that research. And actually, in this memo, they cite some minimum metadata. I'm going to mention those, because I think it's important for us to spend some time on this to make sure that everybody understands what the minimum metadata is, and to ensure that you're starting to prepare those aspects for any research that you're sharing. It talks about authors and co-authors, their affiliations, the sources of funding that supported the research, and the publication dates, and it specifically mentions the use of digital persistent identifiers on all the outputs and for all awards. 
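For concreteness, the minimum metadata Nikki lists could be captured in a simple machine-readable record like the sketch below. This is not an official schema from the memo; the field names and every value are illustrative assumptions.

```python
import json

# Hypothetical minimal-metadata record covering the fields the OSTP memo
# calls out: authors and co-authors with affiliations, funding sources,
# publication date, and persistent identifiers for the output and award.
# All field names and values here are invented for illustration.
record = {
    "title": "Example dataset",
    "authors": [
        {
            "name": "A. Researcher",
            "affiliation": "Example University",
            "orcid": "0000-0000-0000-0000",  # persistent identifier for the author
        }
    ],
    "funding": [
        {"funder": "NIH", "award_id": "R01-EXAMPLE"}  # would carry the award's persistent identifier
    ],
    "publication_date": "2023-06-01",
    "dataset_doi": "10.1234/example",  # persistent identifier for the output
}

print(json.dumps(record, indent=2))
```

Storing something like this alongside the data, in whatever schema the chosen repository actually requires, is what makes the deposit findable and citable.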
So that may be something we spend a little bit more time on, but I'll go ahead and start moving through the other things that Mark has asked me to share with everyone, which is to go into a little more detail on the NIH, the National Institutes of Health, and their specific policy. I just dropped the link to that on their website; this is their policy requiring data management and sharing plans. It went into effect earlier this year, so they're making progress on the 2025 deadline by rolling something out pretty quickly. They have been working on this for some time, though. This policy is meant to promote the sharing of scientific data to accelerate biomedical research discovery, enable the validation of research results, provide access to the data, and promote data reuse. It requires that new research proposals include a data management and sharing plan alongside the research goals and anticipated outcomes. They specifically talk about a plan, and a budget, for the managing and sharing of data, and that is definitely one aspect we talk a lot about with researchers: the need to really think ahead about what it's going to take to prepare the data and the metadata alongside it, and to find an appropriate repository where it can be preserved and shared for many years into the future. And the last aspect of this that I want to provide a link to is one that's mentioned in both of these as somewhat tangential, but it is really good guidance that's come out around the desirable characteristics of data repositories. For researchers who are really thinking about finding the right home for their data and the appropriate metadata, take a look at what the federal government and the NIH have pointed to as far as what you should be on the lookout for, what the things to assess are when looking at a repository. 
So that's just another valuable link, and a lot of the tools out there are complying with this. That's definitely something to look for on their websites, or to inquire about when you're choosing your repository. So that's a bit of the landscape of the policy, and it definitely has impacts on what you do within your research and how you roll out some of these aspects of compliance. And so I think we'll turn it back to Mark to see if there's anywhere else we want to go specifically around policy, or if we want to start talking about what this looks like in practice. You knocked it out of the park. I don't have any questions, but our audience might, so if you can, please pop those in the chat and we will follow up on them later on. There was one general announcement that I did have, which is that we are not going to be sharing slides. This is more of a discussion, but we do have lots of resources being populated in the chat, so feel free to go and explore those while we have our conversations. So with that, we'll go ahead and start with a couple of questions for our panelists that all of our teams get quite frequently, and this is open to the floor for any of the panelists. First question: what do I need to do differently under the new NIH policy? Nikki already said what the requirements are, but what do I need to do differently? Nikki, maybe you start, and then we'll go around the panelists, so maybe Nikki and then Crystal. Yeah, this is a great question. And I think given a little more context I might have a slightly different answer, but I would say take a look at the data management and sharing plan template that the NIH puts out. It's a really good standard, no matter what your research discipline is, and it really starts the thinking about what types of data and outputs you're going to produce as the outcomes of the research you're planning to do. 
And I think just wrapping your head around those basic aspects, and where and how you plan to archive and share those outputs, can get you started on the right track. From there, depending on what you determine the outputs of the research to be, and the types of data, the format, the volume, all of those aspects, it's about finding the correct home for them and taking a look at the desirable characteristics to find the appropriate repository. Yeah, I'm happy to jump in. I also put a link to the template that Nikki mentioned in the chat; I wasn't sure if that was in there. I know two things that are maybe a little different than in the past when you submitted a grant to NIH. One is that in the new plan you have to explicitly state your plans for sharing data. You can't be iffy or fuzzy about that, or say maybe you'll share it in the future; you have to be very explicit in the new plan. That's one difference. And then having your data ready by a specified deadline is a new thing as well. There are very specific deadlines about when you have to share your data, either at the time of a publication or by the end of your grant. So those are two things that are going to be different than they were in the past. Anything to chime in on, Stuart or Nathaniel? I would just second that point. In the past, data management plans have existed, but oftentimes, and I've heard someone who worked for NIH say this, you could previously get away with a data management plan that simply said, I don't plan to share data. And the NIH's first associate director for data science, Phil Bourne, when he worked at NIH, would give public speeches, and I heard him say more than once that the data management plans at that time were a joke. That's his exact phrase, a joke. So I think going forward, that's going to change. You can't get away with a data management plan that's a joke anymore, or that just says, I don't plan to share data. 
So there are some serious expectations that are going to be enforced. I think the most painful thing is not necessarily writing up the data management plan itself, which isn't that complex; it's thinking carefully at the beginning about what kind of data you're going to be producing. That's part of the purpose of the policy: to get people thinking in advance about what outputs they're producing, and having that all set up beforehand rather than figuring it out on the fly as you're doing your research. Thank you. So let's say I am a researcher who has already started my research project. What do I have to change now? Anyone? Sam? No, I was just going to call on people. We can do that. I think it depends on whether your grant is subject to the deadline, right? The January 25th deadline. If you did this before, you're working under the old standards; if you're on a new grant under the new data management and sharing policy, you're going to have these new standards that you're held to. All right, so what would I need to change? Go ahead, Crystal. I mean, I think it's the same thing we already talked about. You have to explicitly state and make plans for actually sharing your data, or at least explain why you cannot share your data, if there are certain reasons you can't. And then you're going to have to think heavily about data management so that you have your data ready in time to share it by the end of your project or by your publication date. Yeah, just to follow up on that, what's also interesting to me is that the requirement now is not just that you have to share data when there's a publication or an article, but that at the end of your grant period, to quote from the NIH's webpage, you have to share "scientific data underlying findings not disseminated through peer-reviewed journal articles." 
So I think that's something to really keep in mind throughout your grant: the scientific data you produce during the grant, even if you don't publish it, you're still going to end up having to share. So you need to be thinking about that in advance and preserving that data appropriately. Thank you. So, thinking about best practices for adhering to the NIH policy going forward, where should I keep my data, also considering limitations such as IRB requirements and other requirements? All right, I'm going to start doing what Sam said and just start picking on people. Yeah, just call on people. Nathaniel, I haven't heard from you in a while. Do you have anything? Yes, two big things. One is, if you're at a research institution, then oftentimes your library will have a data services branch with IT people who will help you conform to the standards your IRB has put down. Also, just as general best practice, there's a heuristic that data security people like, the 3-2-1 rule: keep three copies of your data, on two different media, with one copy off-site. So basically, have the data securely stored in a cloud backup, which sometimes your library can manage; on a physical hard drive, for example, that you keep in the same place you might keep your confidential forms; and on your local machine to work with, in a way that other people can't easily access without your login info. Okay, I hadn't heard of that. That is a phenomenal idea that really helps with preservation. Stuart, Crystal, do you have anything on that one? Go ahead. Yeah, I was just going to say that the two parties you want to talk to about storing your data are your IRB and your institution, right? Because they'll have certain approved tools that they'll want you to use. And like Nathaniel said, you want to think about things like regular backups to keep data secure. 
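The 3-2-1 rule Nathaniel describes can be sketched in a few lines of Python. This is a toy illustration, assuming plain file copies stand in for the second medium and the off-site location; checksums verify that every copy matches the working one. All paths and file names are made up.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def sha256(path: Path) -> str:
    """Checksum, so each backup copy can be verified against the original."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def backup_321(working: Path, second_medium: Path, offsite: Path) -> bool:
    """Toy 3-2-1: the working copy plus a copy on a second medium and a
    copy destined for off-site storage. Returns True if all three match."""
    shutil.copy2(working, second_medium)
    shutil.copy2(working, offsite)
    digest = sha256(working)
    return sha256(second_medium) == digest and sha256(offsite) == digest

# Demo with throwaway files standing in for an external drive and a cloud bucket.
workdir = Path(tempfile.mkdtemp())
data = workdir / "survey.csv"
data.write_text("id,score\n1,42\n")
ok = backup_321(data, workdir / "external_drive.csv", workdir / "cloud_copy.csv")
print(ok)  # True when all three copies agree
```

In practice the copy steps would be a managed cloud backup and an encrypted drive rather than `shutil`, but the verify-every-copy habit is the point.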
You need to think about all your legal and ethical mandates, HIPAA and all those good things. A storage space of sufficient size, one that meets the needs of your project; not all data collection and storage tools are created equal, and not all of them are HIPAA compliant. So you need to look into that, because the tool needs to have the necessary controls to ensure confidentiality, integrity, that kind of thing: encryption, password protection, all those good things, depending on the sensitivity level of your data. So there's a lot to think about, for sure. And Nikki's kind of already beaten me to the punch with my next question, which is: as Nathaniel said, we need roughly two different media to store our data on. What would help me choose the best place to store my data, whether I have sensitive data, like you mentioned with HIPAA, versus non-sensitive data? What are some things I need to keep in the back of my head? Yes, so the sensitivity of the data, and the regulations you have to comply with, are going to be really important. If your data aren't sensitive, then it's mostly about backups. The main priority is making sure you don't lose things, and making sure your versioning doesn't get out of sync, so there's never a situation where you don't understand which version of the data came at what point. For that you can use something more conventional, like Dropbox or Google Drive, to keep that integrity. Once security becomes more of an issue, then you want to use more professional tooling. You can use something like Amazon Web Services, which is designed to be able to be HIPAA compliant, or our platform, MetroBase. 
We're working on making sure that we have all the HIPAA compliance in place, so you can use us. But even if it's not HIPAA per se, you still want to either go through your institution or use one of these cloud services that emphasizes security, to make sure people aren't just sharing some open link on Amazon S3 or Google Drive where anyone with the link can open the data. You want to avoid that, because it can lead to big issues down the line if someone has that link lying around somewhere. And really quick, I want to chime in. Sherry said something interesting in the chat: storing versus sharing, those can be different. Do you want to comment on that? Yeah. In your data management plan, even if some parts of your data are sensitive, the NIH is going to want you to try to share de-identified data. So one way to do that is, at the end of everything, take the pieces of your data that are identifying, make sure they're de-identified, and then share the rest on something public like OSF or Figshare or MetroBase. Another thing, if you're a little more technical and want to be adventurous, is to put restrictions on the particular variables in your data that are identifying. That's something we're working on implementing in our platform, but there are other services that will allow you to do this: you split up your data and say, this is the part of my data that identifies who the subjects are, and these are the data observations. Then you can release one data file and not the other, and that way you don't have to do the de-identification later on, because you've constructed your data in such a way that there's no leakage. Crystal, Stuart, Nikki, do you have anything to expand on? I'll just speak to the couple of links that I dropped in the chat to try to get at it. 
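The split Nathaniel describes, releasing the observations file while keeping the identifying file restricted, might look like this sketch. The records and column names are invented for illustration, and a random study ID is the only link between the two files.

```python
import secrets

# Invented raw records: each row mixes direct identifiers (name, email)
# with the actual observation (score).
raw = [
    {"name": "Pat Doe", "email": "pat@example.org", "score": 7},
    {"name": "Sam Roe", "email": "sam@example.org", "score": 9},
]

identifiers = []   # this file stays behind access controls
observations = []  # this file is safe to deposit in a public repository

for row in raw:
    study_id = secrets.token_hex(4)  # random link key, not derived from any PII
    identifiers.append({"study_id": study_id, "name": row["name"], "email": row["email"]})
    observations.append({"study_id": study_id, "score": row["score"]})

# The shareable file carries no direct identifiers.
print(all(set(r) == {"study_id", "score"} for r in observations))  # True
```

Real de-identification also has to consider indirect identifiers (rare combinations of demographics, dates, locations), so this file split is a starting structure, not the whole job.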
This is a tough question: where is the best place to share data, or to discover data if you're at the other point in the process, developing a proposal or a research question? Obviously, part of the goal with sharing is to reuse and build on the large corpus of open data that's available, so accessing that data is an important component. Again, there are different types of repositories. Just to focus on NIH's approach, because they have rolled out a policy: they have different institutes and centers that operate and maintain their own data repositories, typically domain specific, and those are very good options to consider. But sometimes a generalist repository is more appropriate, depending on the type of data, or another domain-specific repository that isn't necessarily operated by the government agency you might be receiving your funding from. So just to explore the full landscape, I've given you a couple of links there. And this gets into a somewhat tricky situation: multiple copies of a dataset might exist out in the wild, and that can be problematic, so you should understand when and why and how you should do that. What are some of the specific considerations? Like you're saying, if you do have sensitive data or things that need a very closed access loop, those might live in one place, but there could be anonymized or other aspects of the data that can be shared openly. How do you point to the other versions of the data effectively, and even support harmonization for somebody who has full access across all the different points of data? So just to point that out, those are key aspects. There's some good guidance in some of these links I've provided, and we can talk more about that. But really, focus on the domain of research where your data should reside, and then try to make sure that those researchers can find it easily. 
Yeah, I was just gonna add, those are great links. The NIH webpage, and actually I dropped the same link as she did into the chat, but yeah, they have over 130, I think, NIH-supported data repositories that are often very specialized. I mean, there's a data repository just for zebrafish genomic data, or just for fly genomic data, or just for worm genomic data, or just for yeast genomic data. It goes on and on, and then there are some generalist repositories as well. But that doesn't even exhaust the number of repositories out there. The Registry of Research Data Repositories has in its search over 3,000 data repositories around the world. Many of them may not be relevant to NIH, but it looks like there's a data repository for just about any type of data or subject or field you can think of. So yeah, just use the search function, I would suggest; don't try to browse 3,000 results. Yeah, that would get exhausting very quickly. So we've been using the catchphrase best practices quite a bit. I'm curious, what do we actually mean when we say best practices around data management and creating those data plans? I'm happy to chime in here. So yeah, best practices are good practices, usually ones that produce accessible, reliable, quality data. I think that's what we're hoping for, both for the research team that's doing the project and for future data users, right? And NIH actually specifically called out a few data management practices: validation, organization, protection, and maintaining and processing scientific data. These are things they expect you to do with your data. So even if, and I think Stuart mentioned this earlier, you're not sharing all the data, maybe for ethical, legal, or technical reasons, they still expect you to manage all that data. 
And now, like I mentioned earlier, because researchers are required to share their data either at the time of publication or by the end of their grant, teams can no longer wait until the end to manage their data. They need to be managing it throughout the entire lifecycle. Data management needs to start at the very beginning of your project, be integrated into your daily project management workflow, and be part of everyone's routine. I don't want to take over the space if anybody else wants to talk about particular data management practices, but I'm happy to continue. I want to dig into that, but sure, does anyone else have anything to add? I hear silence; Crystal, the floor is yours. Okay, so this means that things like documentation, data cleaning, and, if you're working with human subjects, data de-identification and data validation are all happening throughout your project, not just at the end, and always with the end goal in mind, which is data sharing. Make sure, for example, if you're working with human subjects data, that you include some language about data sharing in your consent form, so that you're prepared for sharing at the end. Also organize your data and documentation in a FAIR manner, which you'll see throughout the NIH language: making sure that your data is findable, accessible, interoperable, and reusable. And so the first thing you need to have, according to this policy, is the plan, right? You need your data management and sharing plan, which is about two pages or so. But on top of that, you want to have a really detailed plan once you're awarded your grant, where roles and responsibilities are formally laid out and a workflow is designed, so that staff know exactly what they should be doing, and when, in terms of data management. 
So some practices that come to my mind that you'll want to implement in your workflow: use data standards. If a standard exists in your field, implement it; if not, create standards for your project so they're applied throughout. Standards for things like naming your files, formatting your files, versioning your files, naming variables, those kinds of things, so they're done consistently throughout your project, and for describing your data as well. And one thing I'm sure Nathaniel would agree with is that you want to collect quality data from the beginning, not collect bad data and then have to clean it up at the end. You want good practices in place from the start. Also, store your data in a way, like we talked about, that prevents you from losing it, with that 3-2-1 rule, and that prevents breaches of confidentiality. And you want to document everything, all of your decisions and processes, throughout your project. Keep a record of that. It helps you establish data provenance, right? So you know where your data comes from and what manipulations have been done to it along the way. And the last thing I'll say is that you want to check everything throughout your process. Build data validation into your workflow: check your data as you're collecting it to see if anything looks off, so you can catch it early, and check it again as you're cleaning it. So again, all these practices are part of your entire data management workflow throughout the whole project, so that you're ready to go as soon as you need to share that data at the end. I love how you listed that. All I need now is a little checklist, so that at every step of the process I can go, yep, got that done, got that done, and keep on going, just to make sure everything's nice and clean. 
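As a toy version of the standards and validation Crystal describes, the sketch below checks file names against a made-up convention (project_datatype_YYYY-MM-DD_vN.csv) and flags out-of-range values as data come in. Both the naming convention and the valid score range are assumptions for illustration, not a standard from any field.

```python
import re

# Made-up file-naming convention: project_datatype_YYYY-MM-DD_vN.csv
NAME_PATTERN = re.compile(r"^[a-z0-9]+_[a-z0-9]+_\d{4}-\d{2}-\d{2}_v\d+\.csv$")

def filename_ok(name: str) -> bool:
    """True if the file name follows the project's convention."""
    return bool(NAME_PATTERN.match(name))

def validate_scores(rows, lo=0, hi=10):
    """Return indices of rows whose score falls outside the assumed
    valid range, so problems are caught during collection, not at the end."""
    return [i for i, row in enumerate(rows) if not (lo <= row["score"] <= hi)]

print(filename_ok("proj1_survey_2023-06-01_v2.csv"))   # True: conforms
print(filename_ok("Final data (2).csv"))               # False: does not
print(validate_scores([{"score": 3}, {"score": 42}]))  # [1]: row 1 is suspect
```

Running checks like these on every batch of incoming files is one concrete way to make "validate throughout the project" part of the daily routine rather than a cleanup job at the end.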
Does anyone have anything to expand on before I go to our next question? I mean, just the way I'd summarize it: one, Crystal's book draft, which she didn't plug but I will, is fantastic, and you can see it online, it's great. And just documenting everything is so key, and using standardization as much as possible in your workflow, and then use Crystal's book for that checklist. All right, so I have a hypothetical question for you. Aside from the fact that it is required now, or soon to be required, by the White House and NIH, why do I even need a data management plan to begin with? Whoever wants to can take that one. Or are we just going to say it's because of NIH, and that's the only reason? I mean, I think there are a myriad of reasons to manage your data, right? The benefits go far beyond what's required of you. I could go on and on about it, but it benefits you, it benefits your team, it benefits society. So it's not just about checking the box for the requirement; it's about all the benefits you receive from managing your data. Yeah, I'll say something that comes with experience, which is that collaborating with your past self can be a horrible experience, and you can be your own most frustrating collaborator. So with a data management plan, beyond the collaboration with other people, the impact your work makes, and other people's science, something that should be compelling to everyone is that if you have a good data management plan to start with, your future self will be grateful. Great minds think alike, because I just posted a link in the chat to a Twitter discussion of the origin of the joke that your most important collaborator is yourself from six months ago, and unfortunately yourself from six months ago can't answer your emails. 
So yeah, I think it's really in your own self-interest to successfully manage your data, because if you've ever worked with data, or code for that matter, and you've set aside a project for six months or a year and then come back to it, you can find yourself really scratching your head trying to figure out: what was I doing here? What did this variable mean? That knowledge vanishes pretty quickly once you shift to another project. So it's in everyone's self-interest to thoroughly document and manage both data and code effectively. One of my favorite terms is data curation debt. I can't even remember where I got it from, but somebody had written about it, and it's the debt you incur from waiting until the end of your project to manage your data, and all the things you then have to deal with: maybe lost data, or not even knowing how to use your data anymore. So we're trying to reduce that data curation debt by starting the management at the beginning of the project. Awesome. Well, I was trying to get at whether, if I just avoid all the federal grants from NIH or the White House, I would even need to do data management best practices, and you've answered that. So again, you're ahead of me. All right. So let's say that I'm working with a team, but they're not really adopting those best practices. How do I get my non-technical team to adopt best practices? So I've thought about this a lot, having worked with a lot of teams, and it is difficult to get that buy-in. I think one thing that's huge is that you need a champion on your team, somebody who wholeheartedly believes in the benefits of data management and can sell them to the team. Because a lot of us aren't trained in data management, we're not used to implementing these practices, so they feel very new and they seem like extra work. 
But helping your team understand the benefits, understanding that these aren't just formalities and actually have real benefits, and getting them to believe in that is really important. And then getting them to integrate it into their workflows, so that it's not just extra work but part of their normal routine, is huge. That might require training, it might require periodic check-ins, or maybe it requires some oversight, but eventually it just becomes part of everyone's routine. Also, just to put in a brief plug for what we're working on at MetroBase: part of the purpose of our platform is to make it easier for non-technical people to take up these best practices by embedding them in the platform, so when you configure your data set, the validations that were talked about earlier come along automatically, variable names are standardized, things like that. Right. And I'm actually going to pick on Nikki a little bit, which is: as Chief Product Officer, I'm sure you're in a lot of conversations with different stakeholders as well. What are some of the barriers they've had in adopting best practices around data management? Yeah, it's a great question. I think stakeholders have a lot to manage. And so one of the barriers is just finding easy ways to track compliance, and tools that can support easy validation and reporting to let them streamline the overhead and the management time; I think that's the biggest one for them. I think some of the things we're getting at, and have dropped links about, are also useful. So guiding research teams in identifying the best repository or other tools they need to utilize: there's a plethora of them out there, so good search and discovery tools for finding them, or even recommendations of the best ones based on experience, matter. I think those are some of the tools we've dropped links to.
And I think the other one that was mentioned, and I've seen lots of examples of this, though it may actually be a good thing to curate more of them in a single place, is checklists. Having a guided checklist that can help a research team really embrace those best practices, with a clear set of everything they need to pay attention to, at what point they've accomplished it, and where they are in that journey, really helps minimize some of the anxiety and the overwhelming feeling of: oh my gosh, I have a lot to take on to meet this aspect of my research. "I just want to go do the research, I just want to answer the research question" is mostly what's on their mind, and this stuff just feels like a lot of administrative burden. So the more we can reduce some of that, the better, obviously with better workflows in the tools they use to streamline those aspects as well. But I think curating some checklists and good tool sets for them would be the best thing I could suggest. Thank you. All right, I have one final question before I pitch it over to Sam for some Q&A, which is: I've put in all this time and effort to build my data set, and now it's available. I've done everything I need to do, but I'm concerned that others are going to get credit for working with data that I've collected and made available. What should I do to make sure that I get the credit that's due to me? I'm going to start picking on people. Stuart, how about you? I haven't heard from you in a bit. I mean, my understanding is there are various licenses that you can put on your data that say, yes, you can use this with attribution: Creative Commons licenses and so forth. But it's hard to do this on your own; it's kind of a collective action problem. We need better practices across scientists and across disciplines about citing data and saying where your source was.
And then, about giving appropriate credit: academic institutions need to do a better job of, for example, saying that if you create a data set that 10 other scientists found useful for their publications, that should count at least somewhat towards your own academic career, as opposed to just generating another publication for yourself. You were generating a data set that was useful to the field, and that should count for more than it does today. Well said. Anyone have anything to add beyond that? All right. That was my final question. Sam, I'm going to pitch it over to you for the Q&A. Thank you. Cool. So just a reminder, we have some questions in the chat now, and you can ask them via the Q&A feature. So if you have any questions, feel free to pop them in the chat now; we'll try to get to all of them. The first one I think we should start with is: are researchers with more resources and people going to be more prepared for this change? So basically, how do younger PIs, or maybe PIs with fewer resources, balance compliance and cost and doing research? Nikki, maybe? Sure. I think the thing here is that they may be slightly disadvantaged in that they're early in their career. They maybe haven't run a research study fully on their own, or have only been minimally involved in some aspects of the whole planning and the full cycle. So for them to fill out one of these plans and to budget effectively is, I think, where they may actually need more support. That said, in my mind that's not necessarily based on the resources of their institution; it's just overall experience that they may not have. And so, like I said, having more curated checklists and tools to guide them, and to provide some good ways of estimating and thinking through the aspects they need to pull together, might be the best path. But more experienced PIs that haven't had to fill out a data management and sharing plan, or actually put that into their budget, are in the same place.
So I think that's the aspect to address. Anyone else, or shall we move on to the next? I was just going to mention: definitely talk to your librarians if you're at a university; they have excellent resources. They're literally experts in this stuff, so whether it's helping you develop a budget or helping you develop plans, they're great resources to work with. A new question from the chat is: when reviewing rationale for data that can't be shared, how do we limit this to legitimate reasons? And the follow-up is: we currently ask them to use the NIH Safe Harbor techniques to de-identify, but we always get pushback, such as they're using someone else's data, they don't have consent to share, et cetera. So basically, are there legitimate reasons, and how do you state those legitimate reasons for why data can't be shared? And maybe the answer is: we don't know. I feel like the NIH has a lot of documentation about reasons you can and cannot give for not sharing data. So I think things like small sample sizes, or "there are no good data repositories for me," are bad reasons. But having de-identification issues seems like a really good reason not to share, so I feel like that would be something the NIH would be okay with, because throughout all their documentation, they say there are going to be reasons why you can't share data, for technical or legal or ethical reasons, and they understand that. So I think this would be something they would understand, but I could be wrong. Yeah, and they do have templates with examples of cases where people have a data management plan that says: we can't share this data publicly. A question from earlier, when we were talking about best practices: one question from that segment was, what about the use of version control to keep steps of data manipulation? Is that considered a best practice? Yeah, I think it absolutely is, if you're in a field where that's something you can use.
So I work in education research, and we work with a lot of identifiable data, so we're not typically pushing things to GitHub or working with Git; we might be doing more manual versioning of things. But if you are in a field that doesn't have that kind of issue, or you're working with a tool that allows you to work with identifiable data and also version it, that's awesome. Automated versioning is probably always preferable to manual versioning, so Nathaniel, you probably have more to say about that. Yeah, and so that's one of the things we have with MetroBase: we have automated versioning combined with those security and privacy permissions constraints, so you can have both of those things. Tools like Git are the gold standard for what programmers like, and there are graphical interfaces which make them easier; I personally like GitKraken, and there's GitHub Desktop. But to get a lot of the benefits of version control, you don't necessarily have to use tools like Git. You can obviously use the tool that we built, but you can also just be deliberate about naming things and documenting different versions, and that can get you a long way towards the benefits of version control, which, once you're on a project that uses it effectively, you'll never want to do any other way. Yeah, so for naming things by version, you can name files by date, or version them as version 01 or 02, and keep a kind of change log of your versions. I think doing one of those is absolutely necessary, whether it's manual or automated. Okay, another question is: I am the DMS compliance person at EVMS, not a scientist. How important is it to match the research data source to the repository? My PIs frequently don't have a clue as to which repository to choose. Sorry, if I'm following you, you're asking about how important it is to choose a domain-specific repository? Is that what...
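As an aside to the versioning discussion above: for teams doing the manual approach Crystal describes, a minimal sketch might be date-stamped, numbered file copies plus a running change log. The file names and log format below are illustrative assumptions, not a prescribed NIH or MetroBase convention:

```shell
# Sketch of manual versioning: date-stamped, numbered copies plus a
# plain-text change log. File names and log format are illustrative only.
set -eu

mkdir -p data_versions
printf 'id,score\n1,42\n' > data_versions/survey_2024-01-15_v01.csv

# A cleaning pass becomes a NEW file; earlier versions are never overwritten.
sed 's/,42$/,42.0/' data_versions/survey_2024-01-15_v01.csv \
  > data_versions/survey_2024-02-03_v02.csv

# Record what changed, when, and why in a running change log.
cat >> data_versions/CHANGELOG.txt <<'EOF'
2024-01-15 v01: raw survey export, 1 record.
2024-02-03 v02: converted score to decimal; no rows added or removed.
EOF

ls data_versions
```

The key property is that no version is ever overwritten, so the file names plus the change log reconstruct the history that a tool like Git would otherwise track automatically.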
Let me scroll up and see if they're still on. Oh, somebody said yeah. I think the NIH heavily recommends trying to get your data into a domain-specific repository. I think Nikki mentioned that it's important for people in your field to be able to find the data that's related to your field. But they do give plenty of options for generalist repositories if there's not something specific for your domain. Yeah, and those do have a purpose. It's not just about wanting to keep this kind of data in this particular place. These repositories have features that make it easier to index the projects uploaded to them, so that people can search within those repositories in a way that rapidly surfaces the projects relevant to them. Okay, and then maybe we have time for just a couple more. Let me scroll, hang on one second. Okay: how likely is it that CDISC standards can be adapted for the generalization of open research? If anyone is hearing the term CDISC standards for the first time, it's a standards body that makes clinical research more traceable and reproducible. I'm not sure who asked that, but any comments, or do we need more clarification about that question? I'm not an expert on this at all, but CDISC stands for the Clinical Data Interchange Standards Consortium. It's basically about clinical trial data in medicine, or clinical data for medicine. And I know it's a set of standards that the FDA expects when you submit data and information supporting, say, a new pharmaceutical, that kind of thing. So yeah, I think that in theory that's the kind of thing that's useful for clinical data, but having an agreed-upon set of data standards is a great idea in any discipline, really.
And I think that's why using discipline-specific repositories is heavily favored: if you share data in the right format, the one everyone else in your field is using, it's going to be a lot easier for others to access, find, and reuse than if you do it in your own kind of bespoke format. And since we just have about five minutes left, maybe we could go around to each panelist and see if there are any last-minute comments you want to add, or a question you anticipated that wasn't asked, and give some final thoughts. Maybe starting with Nikki. Yeah, I guess the only other thing I was thinking about: the OSF is part of an initiative with the NIH called the Generalist Repository Ecosystem Initiative, where multiple generalist repositories are coming together to support several aspects of the data management and sharing policy implementation. And one of the things I've really appreciated from this discussion is the continued elevation of these questions coming from across the community, and what some of the needs are. Hearing more about the need for good guides and checklists, and a good set of resources to help identify the best repository for data, continues to rise to the top. And that's not unique: last year we had a series of webinars about this, and we had a workshop at the beginning of the year. It just continues to be one of the things that people are still finding challenging, and there are a lot of different nuances to the data that the community is generating. So we just want to continue to support that. I just want to thank you for the great questions and the great discussion; this is something we're actively trying to pursue good tools and resources for, for this community. I guess I'll jump in.
I don't think I have anything else to add, except to make sure that you are budgeting for data management and sharing specifically, because you're going to have to really concentrate on managing data throughout the whole life cycle, so you're going to need the budget for that. And they expect you to budget for it. So just make sure you add that in there: for personnel, for repository costs, for certain kinds of infrastructure, documentation, that kind of thing. That's it. Thank you all. I would add that what we've been talking about with data management goes hand in hand with best practices in the code you use to analyze your data. So think about reproducibility of the entire scientific process: we've been emphasizing particularly the data input to that, but the kinds of things we're talking about, like standardization and version control and being able to share what you're doing, apply just as much to the code. And the ideal we have there is that you can hand over your project to collaborators or to a stranger, and they'll be able to take your data, take your code, run it, and get the same graphs and tables that you have in your publication. Stuart, do you have any last comments? Yeah, just to reaffirm that while it's easy to look at all these new requirements and rules and think of them as a burden, I would urge folks to think about building this into the life cycle of your work and your workflow. And ultimately, hopefully, once it becomes just a habit, a way of doing things, everyone will benefit. You yourself will benefit when you return to a project after taking a break from it, or perhaps when a new person joins your lab or your team and needs to be brought up to speed, and you don't have to reinvent the wheel and try to figure out how on earth you document things around here.
Instead, there's more of an organized process and workflow for getting that new team member up to speed, and in the end, hopefully, everyone benefits. All right. I want to say thank you to Sam and all the panelists for today's incredible discussion; this has been incredibly insightful for me. For those still on the call, you'll find a copy of this recording on our YouTube channel, along with all the links that were shared in our chat today, and there we can continue the conversation. And so with that, thank you to all the attendees and all the panelists. I appreciate it. Have a great day.