Okay, hi everyone. Welcome to this talk on data sharing. My name is Sara Hart, and I'm at Florida State University in the Department of Psychology as well as the Florida Center for Reading Research. Here's my contact information if you'd like to reach out: I'm on Twitter, probably more than I should be, and there's my email and my lab website. If you're interested in topics related to metascience, development, and the methods of developmental science, you might also want to check out the podcast I host with my colleague and friend Jessica Logan, called Within & Between. So, a little bit about what today is, so that if you popped into this session and it doesn't fit what you're interested in, you can head to one of the other parallel sessions. This is an introduction to data sharing: defining some concepts and giving a few resources and tips for how to share data. It really is meant to be fairly introductory, so if you're already familiar with data sharing or have done it before, you might find one of the more in-depth sessions a better fit. Or you're welcome to hang out with me. I don't have very many slides, because the plan is to leave lots of time for questions and answers and for talking about any hesitations you have about data sharing, or other experiences you've had with it. A lot of my talk today is based on a paper I published with my colleagues Jessica Logan and Chris Schatschneider, which you can find in AERA Open, as well as on experience I've gained building a data repository for our field called LDbase. LDbase was built specifically to store and share data from educational scientists like all of you on this call.
Okay, so a little bit about what data sharing is. A formal definition: data sharing is the process of taking any type of research data and making it available for other researchers to examine or use. I added this last bit myself: you can share your data pretty much anywhere, from a private server to the open internet, but typically data sharing is thought to happen through an established data repository, so that the data are easier for other users to find. Something I like to point out is that some of you may think, "I've never shared data before," but if you've published, you've been doing a form of data sharing. Simple statistics in your publications, like mean scores or correlation tables, are a form of data. Meta-analysts, for example, collect data from the literature by pulling the information you've shared in the tables of your publications. So pretty much everybody in education has been sharing data, at the very least in the form of descriptives and simple statistics. Generally, though, when we talk about data sharing we mean sharing participant- or variable-level data, which is a lot more detail than those simple statistics. The reason I mention simple statistics is that if, in the end, you can't share your full participant- or variable-level data set, sharing something like a variance-covariance matrix is still useful to other users and still counts as data sharing.
There's a really interesting example from developmental science of a community that has been sharing data for years, I think since the early '80s; I believe they used to send the data around on floppy disks. It's now more often called TalkBank, but you may also know it as CHILDES (I actually don't know which way it's pronounced). It's a data repository of language corpora: transcripts of narratives, conversations, and other language samples. It holds hundreds of these language samples, and they have been used in thousands of peer-reviewed papers. I saw a statistic that at least 100 users are on the repository on any given day, and it has over 4,000 active enrolled users. So CHILDES has been incredibly useful for the fields of language development and linguistics, and I think it's a great example of the power and benefit of data sharing. What motivates me is taking the CHILDES example and expanding it: thinking about all the data that are out there in our field, how we can share them, and how we can extend that same power to those data. That leads into my next slide: what are the benefits of data sharing? There are so many. The first is generating new ideas. The whole point of sharing data is to allow others to use your data. Maybe you've had a really good conversation with someone where you were spitballing research ideas back and forth.
Sometimes the other person says something and you think, "That's a great idea; I never would have thought of that." Everyone in our field has a different background and a different way of approaching research questions, and that includes how they approach a data set. So when you open a data set for others to use, they often come up with research questions that you, as the person who generated the data set, would never have thought of. Sharing data is a great way to generate new scientific ideas. Data sharing is also great, more broadly, for advancing the field. By letting other brains come up with research questions for our data, we can push the scientific boundaries of the field and maximize the potential of data that have already been collected. That matters because, as you know, data in our field are extremely time-intensive and typically expensive to collect. Using and reusing data to generate new ideas really pushes the boundaries of where our science is as a field. Taking a slight turn away from those first two benefits, sharing data can also increase the transparency of the research process. Say you share your data with accompanying documentation, and maybe also the code you used to write up a paper from those data.
That allows others, including trainees in a class who are learning a new statistical procedure, to use your code and data to understand how you did the analysis, and it allows people to check the results of your published work or any of your other scientific products. Everyone knows it's pretty easy to make coding mistakes, so someone can come behind you and make sure there haven't been any mistakes in your analysis. In general, sharing your data openly increases the transparency of research in our field. Sharing data can also increase collaboration, and I'm experiencing this firsthand right now. When you put out a data set in a way that people can find, those people reach out to you. They might simply ask questions, but they might also ask, "Do you want to collaborate on this paper? I'd love to use your data to answer this research question." They're generating new ideas, and they might want you as a coauthor on a publication using your data, if you choose; data sharing doesn't require that, but it naturally lends itself to more collaboration. People might also say, "I saw your data, and you're the expert on these data. Could we put in a grant together to further maximize their potential through a new research idea?" So sharing your data tends to result in increased collaboration. Another area I find really compelling is that sharing data promotes equity in research.
Not every investigator in our field has IES grants or the startup resources of a large R1 university to go collect the pilot data they might want for a grant. By sharing your data, you let other users write publications that support their careers, whether for a dissertation or for tenure and promotion, when they don't have the resources to collect the kind of data you were able to collect. Data shared in repositories can also be used as pilot data for grant proposals. Just this morning I woke up to an email from someone telling me they had used data from LDbase, the repository I mentioned earlier, as pilot data for a grant proposal. They got the grant, and now they're off doing their own research, but they didn't have the access or resources to collect preliminary data for their proposal themselves. So we can democratize access to high-quality data through data sharing. Finally, a somewhat less inspiring reason to share your data is that funders often require it these days, especially federal funders. I'm most familiar with the US funding ecosystem, but I know the same is true of European funders and elsewhere: if the public, through taxes, is paying to collect the data, then those data belong to the public and should be shared back with the public. More and more funding mandates are coming out that require at least a data management plan describing how you'll share your data, and some of these requirements now have teeth, with checks that you have actually shared your data openly.
Okay, so those are some reasons why we might want to share data. I see a couple of chats, so let me take a second and see if there are any questions. Thank you, everyone, for your comments; I love that we're starting conversations in the chat, so please keep that going, and we'll have lots of time at the end of my talk for discussion. So I've given you a definition of data sharing and some reasons why you should share your data, and now we'll talk a little about how you might do it. Data sharing is not the same as a data dump. You may have heard of sharing your data according to the FAIR principles, where FAIR stands for findable, accessible, interoperable, and reusable. I'm going to talk about each of those, not in depth, but with some examples of what they might look like when we share data in our field. First is findable. You can probably guess what I mean: other users should be able to locate and find the data you're sharing. That's why sharing data on a private server doesn't quite fit the FAIR principles, whereas sharing your data in a public data repository that's searchable via Google, or otherwise findable on the open web, is the best way to share your data. Another part of findability is metadata: to make your data findable, you need metadata. One reason search engines like Google work well is that, although I think they do text mining now too, they mine websites and data repositories for metadata, that is, information describing the data held in that repository.
So if you're sharing your data in a data repository, you want to provide good metadata, good general information about your data, so that people can find it. Not only can they find it on the internet, say by going to a repository like LDbase because they think your data are probably stored there, but within LDbase they can also search, say, for data related to math development in childhood. That only works well if there's good metadata: information about the project your data were part of, the principal investigators, the study design, participant information, variable information, and codebooks. All of this falls under the umbrella of metadata. As important as the data sharing itself is, literally putting your data set in a data repository, even more important is the metadata you supply around that data set, so that people can find your data and use it for what they need. Okay, let's see. Someone asked about funding requirements; we'll come back to funding in a little bit. And someone asked about open data sharing. Data sharing is, again, providing participant- or variable-level data, or at minimum summary statistics, in a way that others can find openly on the internet. Open means a user can get access to the data without necessarily having to email you to ask for the data set and wait for you to send it, which is the old way of doing it: contacting the authors for access. Okay, so that's findable; remember, we're talking about the FAIR principles right now. Let's go on to the A, accessible. Oh, sorry, first, here's some more information about the codebook and other metadata, again because the documentation, or metadata, around your data sets is so important.
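To illustrate why a search like "math development" depends on metadata, here is a minimal Python sketch: the search can only match what a project-level record actually describes. The record fields (`title`, `investigators`, `design`, `participants`, `keywords`, `variables`) and the two catalog entries are hypothetical, and this is not how LDbase or any other repository implements search.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ProjectMetadata:
    """Hypothetical project-level metadata record; field names are illustrative,
    not any particular repository's schema."""
    title: str
    investigators: list[str]
    design: str
    participants: str
    keywords: list[str] = field(default_factory=list)
    variables: list[str] = field(default_factory=list)


def matches(record: ProjectMetadata, query: str) -> bool:
    """Case-insensitive substring search over the descriptive fields."""
    searchable = " ".join(
        [record.title, record.design, record.participants, *record.keywords, *record.variables]
    ).lower()
    return query.lower() in searchable


catalog = [
    ProjectMetadata(
        title="Early Math Skills Study",
        investigators=["Investigator A"],
        design="Longitudinal, kindergarten through grade 2",
        participants="Children in public schools",
        keywords=["math development", "childhood"],
        variables=["math_fluency_k", "math_fluency_g1"],
    ),
    ProjectMetadata(
        title="Reading Intervention Trial",
        investigators=["Investigator B"],
        design="Randomized controlled trial",
        participants="Students in grades 1 to 3",
        keywords=["reading", "intervention"],
        variables=["wj_lwid_total"],
    ),
]

hits = [record.title for record in catalog if matches(record, "math development")]
print(hits)
# The same record serialized as JSON, a format other systems can index.
print(json.dumps(asdict(catalog[0]), indent=2))
```

A record with an empty `keywords` list and a vague title would simply never show up in such a search, which is the practical cost of thin metadata.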
Here's an example of a good way to create a codebook that represents the data you're sharing. A good codebook includes the names of the variables in the data set; the labels for those variables, that is, the specific question or item each variable represents; and the coding system, meaning the values and the label for each value of each variable. That's the level of detail you should aim for in your metadata to support the findability of your data. Now on to accessible. Accessible means accessible from the internet: at the very least, if your data sharing follows the FAIR principles, your data should be accessible from the internet. Having said that, your data themselves don't have to be totally open on the internet. There may be lots of reasons you don't want to share the data openly: you might be concerned about identifiability in your data, or have other concerns about your data sets, that mean you don't want the actual data set to be openly available to any user on the internet. But to fall under the FAIR principles, the metadata for those data, potentially the codebooks or at least the project descriptions, should be openly available on the internet. The easy way to do that is to use a data repository. Many data repositories give you options: your data set and all of the accompanying metadata can be openly available, or only the metadata can be openly available and accessible, with the data set itself available only by request, or only if you show you have IRB permission to access those data, or whatever the requirement is. So that's the accessible principle.
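The codebook elements described earlier (variable names, variable labels, and the values with their value labels) map naturally onto a flat table. A minimal Python sketch, assuming two hypothetical variables and a hypothetical `-99` missing-value code; the same structure can also be used to catch data values the codebook doesn't document:

```python
import csv
import io

# Hypothetical codebook: variable name, variable label (the item it represents),
# and the coding system mapping each stored value to a value label.
codebook = [
    {
        "name": "child_sex",
        "label": "Child sex as reported by parent",
        "values": {1: "Male", 2: "Female", -99: "Missing: not reported"},
    },
    {
        "name": "parent_edu",
        "label": "Highest level of parent education",
        "values": {
            1: "Less than high school",
            2: "High school diploma",
            3: "Some college",
            4: "Bachelor's degree or higher",
            -99: "Missing: not reported",
        },
    },
]


def codebook_rows(codebook):
    """Flatten the codebook to one row per (variable, value) pair."""
    for entry in codebook:
        for value, value_label in entry["values"].items():
            yield {
                "variable": entry["name"],
                "label": entry["label"],
                "value": value,
                "value_label": value_label,
            }


def write_codebook_csv(codebook, handle):
    """Write the codebook as a plain CSV table that any software can open."""
    writer = csv.DictWriter(handle, fieldnames=["variable", "label", "value", "value_label"])
    writer.writeheader()
    writer.writerows(codebook_rows(codebook))


def undocumented_values(codebook, rows):
    """List (variable, value) pairs in the data that the codebook does not define."""
    allowed = {entry["name"]: set(entry["values"]) for entry in codebook}
    return sorted(
        {(name, row[name]) for row in rows for name in allowed if row[name] not in allowed[name]}
    )


buffer = io.StringIO()
write_codebook_csv(codebook, buffer)
print(buffer.getvalue())
print(undocumented_values(codebook, [{"child_sex": 1, "parent_edu": 5}]))
```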
As for examples of data repositories for our field, there are a few levels. The first level is general data repositories, which are created to store pretty much anybody's data; they're not field-specific. Examples of these domain-general repositories are OSF and figshare. ICPSR, which is how it's usually referred to, actually stands for the Inter-university Consortium for Political and Social Research; it's also a domain-general repository, and it does take in and house data from educational scientists. Dataverse is another; the link I gave you here is to the Harvard Dataverse, but there are different instances of Dataverse available for storing data from across domains. Different from domain-general repositories are discipline-specific repositories. These are made for a specific community, so they're created with that community in mind and optimized for it. One example in our field is Databrary, which stores developmental video data; it specializes not only in developmental data but also in video, and it was created specifically to be able to store videos. Similarly, there's the Qualitative Data Repository, which stores not video data but qualitative data from across the social sciences. Another example is LDbase, the repository I'm involved in. LD stands not for learning disabilities but for learning and development data, which covers pretty much everyone's data on this call; we specialize in storing quantitative data from the developmental and educational sciences. There are also data repositories specific to funders.
NIH is a prime example: they've created data repositories to store their grantees' data. DASH is one example; it's the data repository created to store data from NICHD grantees, and many of you who are educational scientists may be familiar with that NIH institute. And if you do autism research, there's the NDAR data repository, which stores data from autism researchers. So there are many different data repositories available to you, from domain-general repositories to more specific ones that may be better optimized for exactly the type of data you have and want to share, or that you want to access as a data user. Okay, the next FAIR principle is interoperable. This is more of a computer term than a researcher term. It's the idea that your data, your metadata, and the accompanying documentation of your data sets are readable by other computers. That may sound like a very specific IT term you don't need to think much about, but here's a prime example: proprietary statistical software. If you've ever tried to open, say, an SPSS data set in different software, you've run into the problem of proprietary formats, where it can be really difficult to open a data set saved in one program's format in another program. I've always joked that if you're an R user, or really using almost any software, the hardest part is the initial step of bringing in your data set: getting the format correct and making sure the computer reads it correctly, including the variable names and the variable labels. That's the idea of interoperability.
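One common way to sidestep proprietary formats is to share the data as plain-text CSV along with a separate machine-readable description of each column. A minimal Python sketch under that assumption; the JSON "sidecar" layout, the `-99` missing-value code, and the variable names are hypothetical conventions for illustration, not a requirement of any repository:

```python
import csv
import io
import json

MISSING_CODE = "-99"  # hypothetical convention; document whatever code you actually use

rows = [
    {"id": "S001", "grade": "2", "wj_lwid_total": "34"},
    {"id": "S002", "grade": "2", "wj_lwid_total": MISSING_CODE},
]
# Sidecar metadata describing each column, so any software can interpret the file.
sidecar = {
    "missing_value_code": MISSING_CODE,
    "variables": {
        "id": {"type": "string", "label": "De-identified participant ID"},
        "grade": {"type": "integer", "label": "School grade at testing"},
        "wj_lwid_total": {"type": "integer", "label": "Letter-Word Identification total score"},
    },
}


def write_csv(rows, handle):
    """Write rows as plain CSV with a header line of variable names."""
    writer = csv.DictWriter(handle, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)


def read_csv(handle, sidecar):
    """Read the CSV back, converting types and missing codes using only the sidecar."""
    converters = {"integer": int, "string": str}
    missing = sidecar["missing_value_code"]
    restored = []
    for row in csv.DictReader(handle):
        parsed = {}
        for name, raw in row.items():
            kind = sidecar["variables"][name]["type"]
            parsed[name] = None if raw == missing else converters[kind](raw)
        restored.append(parsed)
    return restored


data_file = io.StringIO()
write_csv(rows, data_file)
sidecar_text = json.dumps(sidecar, indent=2)

# A different user (or program) reads the data using only the two plain-text files.
data_file.seek(0)
restored = read_csv(data_file, json.loads(sidecar_text))
print(restored)
```

Because both files are plain text, the missing-value code and variable labels survive the trip into R, Stata, SPSS, or Python without relying on any one program's binary format.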
In other words, your computer should be able to open the data set and properly recognize the values, the value labels, the variable names, and even the missing-data codes. That's the interoperable principle of FAIR data sharing. And the R is reusable. It's a bit like "if you build it, they will come": if you put your data set out for sharing, you hope other people will use it, so you want users to be able to find your data set and actually reuse it. We had metadata under the findable principle, and here it is again under reusable. If you've ever gone into someone else's data, you know how hard some data sets are to reuse without good metadata. You need good data documentation and good project-level information, because if you're not the person who created the data, you just don't know all of the specific things that happened in the data set. So good metadata makes your data reusable by other users. Another aspect of reusability, a little different from metadata, is provenance. You want other users to use your data, but you also want to tell them how they can use it. A good data repository will offer different license options so that you can decide how you want your data to be reused, anywhere from "let me know that you're reusing it" to "I don't need to be involved; go forth and write your publications and grants with these data." Also part of reusability is citability: a data repository should give you a DOI for your data, which makes your data set citable, and citations are a key currency in our field.
Citing the products we use, including data sets, lets their creators follow who is using their work and track it, maybe for reporting on their original grants. So part of the reusable principle is assigning DOIs and then, as a data user, using those DOIs in your citations of the data sets you use. So I've convinced you why you should want to share your data, and I've described some aspects of what makes data sharing fall within the FAIR principles. Now I'm going to move you toward being ready to share your data, starting with some things to think about before you share. Actually, the sharing itself comes pretty far along, so let's take a step back in the research process and think about what happens before your study starts. Informed consent can be an important part of sharing your data. If you're collecting new data, so you're the data creator interacting with participants, and you're writing an informed consent form for your participants, consider the language you include. You know how the research process works: you don't necessarily write a new consent form for every project. You copy language over from your previous consent forms, and maybe those came from your advisor's lab back in the day, so that language has been recycled over and over again. Back in the day, informed consent forms often included pretty restrictive language around data sharing. So I'd recommend that you take a second to look at the language in your informed consent forms and avoid restrictive language.
Restrictive language is no longer required by IRBs or ethics boards. You can let your participants know that you plan to share your data in a de-identified way, and don't say things like "I will destroy all data in seven years," which is a holdover from a previous era of thinking about data. So think about the language you include in your consent forms if you're collecting new data. If you're working with archival data, say a project you completed ten years ago, you might be thinking, "I know data sharing is important, and I can really increase the value of these data by sharing them so others can use them, because right now they're just sitting on my computer and nobody is using them." In that case, before you share, I encourage you to check your informed consent forms for that old restrictive language. And don't fret if you do find restrictive language in the consent forms for data you'd like to share. IRBs now know there's a culture of data sharing and that funders are increasingly requiring data sharing, even for older data sets, so they're open to considering what's called a waiver of consent. You write to your IRB, even the IRB where you are now, and say: "I have this old data set, and its old consent forms had language like 'I will not share data' or 'I will destroy all data.' I'd like to waive that original consent. I can't recontact participants, but all I want to do is share these data in a de-identified way for others to use." In my experience, I've been very successful in receiving these waivers of consent, given the change in our research culture around data sharing, the understanding of its importance, and the understanding that de-identified data pose very low risk to participants.
If you'd like to know more about informed consent language you might consider, and about how to write protocols to IRBs for waivers of consent and so on, we have a resource page on LDbase with templates and other language to help you navigate informed consent. Also before your study starts, think about your data entry. If you're collecting new data, I encourage you to take the time to enter item-level data, not just totals. In my subarea of education, for example, we love the Woodcock-Johnson, so you might study reading and math ability in kids using the Woodcock-Johnson battery but not take the time to enter how a child performed on item one, item two, and item three of the Letter-Word Identification test; you just enter the total score and move on. A lot of really valuable data reuse comes from having access to item-level data; in fact, one of the most common requests we get on LDbase is for data sets that include item-level data. So I encourage you to consider entering item-level data. I also encourage you to use a consistent approach to naming variables within a project and even across projects in your lab; variable names are part of the metadata of your data set, and consistent names make data reuse easier. And please use double data entry, which results in high-quality data that you can then give to others to use. There's a citation here to some work that looked at data entry errors across different data entry procedures, and it really shows the value of spending the time to do full double data entry.
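The double-entry comparison and the item-level approach fit together in a few lines. A minimal Python sketch, assuming two people have independently entered the same hypothetical Woodcock-Johnson Letter-Word Identification items using a made-up naming scheme (`<test>_<subtest>_item<NN>`); the total here is a simple sum for illustration, not the test's published scoring rules:

```python
# Hypothetical item-level entries of the same protocols, typed in
# independently by two different people (1 = correct, 0 = incorrect).
entry_a = {
    "S001": {"wj_lwid_item01": 1, "wj_lwid_item02": 1, "wj_lwid_item03": 0},
    "S002": {"wj_lwid_item01": 1, "wj_lwid_item02": 0, "wj_lwid_item03": 0},
}
entry_b = {
    "S001": {"wj_lwid_item01": 1, "wj_lwid_item02": 1, "wj_lwid_item03": 0},
    "S002": {"wj_lwid_item01": 1, "wj_lwid_item02": 1, "wj_lwid_item03": 0},
}


def compare_entries(first, second):
    """Return (participant, variable, value_a, value_b) for every disagreement,
    including participants or variables present in only one entry."""
    discrepancies = []
    for pid in sorted(set(first) | set(second)):
        row_a, row_b = first.get(pid, {}), second.get(pid, {})
        for var in sorted(set(row_a) | set(row_b)):
            a, b = row_a.get(var), row_b.get(var)
            if a != b:
                discrepancies.append((pid, var, a, b))
    return discrepancies


def total_score(items, prefix):
    """Simple raw sum of item scores whose names start with prefix."""
    return sum(value for name, value in items.items() if name.startswith(prefix))


# Each discrepancy gets resolved against the paper protocol before analysis.
print(compare_entries(entry_a, entry_b))
# Totals are derived from the items, so the item-level data remain available to share.
print(total_score(entry_a["S001"], "wj_lwid_item"))
```

The consistent prefix is what lets `total_score` find the right items without a hand-maintained list, which is one small payoff of a lab-wide naming convention.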
If you're interested in data management more broadly, there's a lightning talk I recommend you see later, and I've also listed some resources on good data management practices. After your data are collected, you're going to clean them. Make sure the missingness in your data is expected, meaning it reflects genuinely missing values rather than data entry mistakes. Check for out-of-range values, and check for inconsistent values, such as a date stored as a character value rather than a numeric value. When cleaning, think about what you'd want as a data user: high-quality data. So if I'm collecting new data, I want to make sure I've cleaned them before I share them in a data repository; these are some things to think about while cleaning. As part of the cleaning process, you're going to de-identify your data. A specific date of birth can be identifying, but age is important in developmental and educational work, so you can turn the date of birth into an age, which is less identifiable. Remove names and other direct identifiers, like zip codes or addresses. You also need to think beyond direct identifiers to indirect identifiers, especially what's often called cross-tab identification. Say everybody knows I'm at Florida State, so my data were probably collected in the area around Tallahassee or the Florida Panhandle. That's a fairly limited set of schools, and in my data set there might be, say, only one male teacher who identifies as African American.
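A minimal Python sketch of the cleaning and de-identification checks just described: flagging values with the wrong type or out of range while allowing a documented missing-value code, replacing exact dates with age in months, and counting how many participants share each combination of indirect identifiers so that a combination held by a single person (like the one teacher in the example) gets flagged. The records, the `-99` missing code, and the valid score range are hypothetical, and this is not the LDbase de-identification app.

```python
from collections import Counter
from datetime import date

MISSING_CODE = -99  # hypothetical documented missing-value code
SCORE_RANGE = (0, 76)  # hypothetical valid range for the raw score

records = [
    {"id": "C01", "dob": date(2015, 3, 2), "tested": date(2021, 10, 4),
     "score": 31, "gender": "F", "race": "White"},
    {"id": "C02", "dob": date(2015, 7, 19), "tested": date(2021, 10, 4),
     "score": 140, "gender": "M", "race": "White"},
    {"id": "C03", "dob": date(2014, 11, 30), "tested": date(2021, 10, 5),
     "score": MISSING_CODE, "gender": "F", "race": "White"},
    {"id": "C04", "dob": date(2015, 1, 8), "tested": date(2021, 10, 5),
     "score": "28", "gender": "M", "race": "African American"},
    {"id": "C05", "dob": date(2015, 5, 5), "tested": date(2021, 10, 6),
     "score": 40, "gender": "M", "race": "White"},
]


def age_in_months(dob: date, tested: date) -> int:
    """Completed months of age on the test date."""
    months = (tested.year - dob.year) * 12 + (tested.month - dob.month)
    if tested.day < dob.day:
        months -= 1  # the current month of age is not yet complete
    return months


def deidentify(row: dict) -> dict:
    """Replace exact dates with age in months and drop the dates themselves."""
    kept = {key: value for key, value in row.items() if key not in ("dob", "tested")}
    kept["age_months"] = age_in_months(row["dob"], row["tested"])
    return kept


def cleaning_problems(rows):
    """Flag non-integer or out-of-range scores; the documented missing code is allowed."""
    problems = []
    low, high = SCORE_RANGE
    for row in rows:
        score = row["score"]
        if not isinstance(score, int):
            problems.append((row["id"], "score is not an integer"))
        elif score != MISSING_CODE and not low <= score <= high:
            problems.append((row["id"], "score out of range"))
    return problems


def small_cells(rows, keys, minimum=2):
    """Combinations of indirect identifiers shared by fewer than `minimum` participants."""
    counts = Counter(tuple(row[key] for key in keys) for row in rows)
    return {combo: n for combo, n in counts.items() if n < minimum}


shared = [deidentify(row) for row in records]
print(cleaning_problems(shared))
print(small_cells(shared, ["gender", "race"]))
```

Setting `minimum=2` flags exactly the single-participant cells; a stricter threshold is a judgment call that depends on the data and on any data provider's rules.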
Once you cross-tabulate just two variables, gender and race, that teacher can become identifiable within the data set, whereas gender alone or race alone would not identify him. This is the idea of looking across indirect identifiers, and combinations of variables, to see whether you end up with a cell that contains only one participant, who might then be identifiable. We've created some resources to help you check your de-identification, including a Shiny app that will go through your data and flag any areas where you may have identifiable information, so I encourage you to check out the resources on LDbase for help with de-identification. Throughout your study, you'll also want to document it; I've talked about how important metadata are. Creating all of the metadata at the very end of your study can be frustrating and time-consuming, so I recommend taking a moment throughout your study to write down the information you'll need for your metadata later: summary information about your study, sample information, protocols, the measures you used, missingness and how you're coding missing values in your data, and a data dictionary. These are all key aspects of documenting your study that will become the metadata when you share your data in a data repository. They can be incredibly frustrating to create at the end of a project, and if you're like me and can barely remember what happened last week, never mind a protocol decision you made three years ago during active data collection, they'll be impossible to recreate at the end. So throughout your study, document it with an eye toward the metadata you'll need in
your sharing of your data. And then, at the end, you know you're going to pick a data repository, I suggest you will pick ldbase.org but I gave you a list of all kinds of different data repositories you may consider they all have different pros and cons, and different kind of ways that they can optimize to be the right place for you to share your data, and you upload your clean de identified data. Okay, this is a question that we get I saw earlier in the chat that this question already came up this idea of resources sharing data is not. It is not cheap in regards to money or in time. And it is a common that I hear about you know, it just does take resources to properly share your data. So some places that you may consider to find resources, you might check your institution to see if they have small grants with better within the institution or other kind of pot somebody to help you to share your data or to do the data management that you need for your project. And even maybe something like, you know, a funded graduate research line so that a graduate student can help kind of oversee, you know, good data management practices in your lab, or, you know, to create the data documentation so that you can share it. So consider if you're a grant writer investigator, at least here in the US, writing in data management and data sharing budget items are allowable expenses. And I encourage you to put those expenses onto your budgets, so that you have the funds that to to get the expertise to do to do everything that you need to do to share your, your final data. And they're also an amazing resource at universities. You know, I have, we have a librarian as a co investigator on our grant for LD base, and they're part of our tech staff and creating our data repository, you know, librarians specialize in storing information. 
Data is just another type of information, and librarians are trained in data management when they go to library school, so they can be really great resources within your university to support your data sharing, with questions or potentially with resources as well. There are also some funding sources out there for US-based investigators. NIH has a grant mechanism you can submit to: it's $100,000 in direct costs for, I think, up to two years to support your data sharing. The grant is entirely meant to give you the money needed to get your data sets where they need to be to share them in a data repository. So you can check out that mechanism, and there are probably others I'm not familiar with that specifically support investigators in getting archival data sets shared. And, like I said, you might also look to your libraries to see if they have funding mechanisms.

Okay, that's the end of my formal talk. I see there's been lots of chatting and questions, so I'm going to stop sharing my screen so I can see those better. I would love to hear questions, hesitations, or other concerns that you have. I keep a list of those so that I can better inform future presentations like this one and think about how we can support our community toward data sharing, however that looks for our community. So there's my contact information, and that's the end of my talk; I'm going to turn this off and turn to questions and chat. Okay, well, I actually just see the video of myself. But I see there's a Q&A, so I'll go to the chat. Let's see, navigating here. It's hard not being able to see or talk to anybody. I believe I answered the question of what open data sharing is. Next: can you draw a clear distinction between open data and data sharing?
I don't actually think I have a firm distinction. When I'm speaking, and this might have been a little unclear, I use those terms interchangeably. I prefer "data sharing" because I think it more directly refers to what it is: sharing your data, however you share it, even if it's just the metadata. I would say open data is similar, but some may define open data as data that's not only shared but shared freely to the internet, so there is no request process to use it. I don't really distinguish the two terms that way, but others may draw distinctions between open data and data sharing.

Next: is there a required procedure for how or whether to write data sharing into IRB consent? Full consent and full knowledge is ideal, and your IRB would love you to explicitly say in your informed consent that you plan to share the data, including with researchers outside your team. There's some example language like that in the resources section of LDBase. But in reality, my experience working with at least American IRBs is that you can share your data even if you don't explicitly say so; you just run into problems if you've explicitly said you will not share your data. So not saying that you're going to share the data does not preclude you from being able to share it. Although ethical guidelines for free and informed consent would say you should tell your participants that's your plan, in my experience it isn't a requirement for actually sharing your data.
The issue is if you've explicitly said you will not be sharing, or used common language like "only the research team will access these data" or "data will be destroyed after seven years" (five years is another common one I've seen). That can get a little confusing: I think what we used to mean when we wrote that was that the paper data would be destroyed, but now our data is electronic, so it would suggest that all of your access to your data should be destroyed after seven years. So don't write anything like that into your informed consents.

Crystal asked if I could see the chat. I can; there's just been a lot and it's hard to scroll, so I'm sticking to the Q&A for now, but I can move over to the chat after I go through this last one. "I've heard concerns from researchers who put tremendous time and resources into their data about getting scooped before mining their own data sets for publication." Yes, this is definitely a concern I hear often, and I have a couple of answers to it. First, I like to show the positives of data sharing and support researchers in any way toward data sharing, and that includes very hesitant or skeptical researchers; I support people from where they are. So if a researcher is very hesitant and worried about being scooped, I talk with them about the idea that maybe the data they share is the data underlying a paper: once you've published a paper from a specific data set, you share the data for that publication, so you've already published the research question you intended to from that data. When I'm writing new grants with people and creating data management plans with them, I can also suggest some options.
In your data management plan, you might say that you will share your data after the three main papers for your three specific aims, or whatever it is, are published. Once the major research questions for that grant are published, all the data from that grant will be made fully open, so you get the chance to write the key papers first. Say you're doing an RCT: the original investigator is going to want to publish the impacts paper from that RCT, so you can write into your data management plan (and this is still data sharing) that you will share your data after the impacts paper is written and published. That way you're still getting to the final goal of sharing the data with our community; the timeline doesn't have to be that each piece of data is shared as soon as it's collected. It can be a more elongated timeline that gives you time to publish. Scooping might be the biggest concern for early career researchers, so I've written things in data management plans like: until three early career researchers have written papers on these data, we will not share them, to give the early career researchers on the team the chance to publish first. And generally, what I talk about with people who are concerned about scooping is my experience in our field, and this may be specific to my sub-area: it is so rare for somebody to come up with the same research questions that I had planned for a data set, at least in my subdomain of education and developmental science.
There's just not a lot of overlap in the exact framing of the research questions everybody is hunting around. So I'm a little less concerned about scooping of an exact research question, because what I've more often seen is that people from different backgrounds, trainings, and approaches come at a data set from a different enough angle that publication is still possible for everybody using those data. Those are some ways I think about concerns about scooping and talk with people who have them. I would love to hear if other people have other ideas. I always have a pen and paper with me, and I'd love to write down thoughts on how we can communicate with and support people where they are, and show them that there are many different ways to get to data sharing. It doesn't have to mean putting your entire data set out as soon as you've collected it; there can be other ways to share data, even for those who are hesitant about things like scooping.

On the larger change in incentive structures in the field: my research area is not examining incentive structures, but I'm toying around with the idea of running an experiment to see what incentives we can provide investigators to share their data. One of the ways I got into data sharing, actually, years ago, was that the Center for Open Science had a competition where you would get cash if you did a preregistration. I entered that competition and published my first open-science-related paper with that incentive of cash. I've considered doing something like that through LDBase, experimentally trying out different incentives that might get people to share their data.
But in the meantime, where I've seen big changes is from top-down incentives, with funders mostly forcing change. That really worked well for open access publications if you're NIH-funded. I get NIH funding, and before every annual progress report on my NIH grant, when I say we've written these papers on this grant, those papers have to be openly available or else I can't get my next year of funding. That top-down linking of your next year of funding, or future grants, to creating open access publications has worked for open access publications, I think. I hope the new data sharing requirements coming down the pipe from NIH, and from IES, which has had them for a while, will have a little more teeth in holding investigators to what they say in their data management plans.

The next question: there are so many great data sets out there that have become accessible to investigators, but the struggle is finding them. Yes, that's why I've been compelled by this idea. (Everybody can see the questions, I hope.) My hope with domain-specific repositories like LDBase, and why I'm encouraging everybody to use LDBase, is that it's easier for the community to know where to go to find data. Right now, some data sets exist on personal websites, some are on ICPSR, some on COS, all in these different places, and the community doesn't know where to go to find data. What's nice about a domain-specific repository is that if I'm looking for behavioral data in education and developmental science, I will go to LDBase, because that's where the investigators who work in that field are storing their data, and that's where I can find it.
In addition, finding data is such an important part; it's the F (Findable) in the FAIR principles. For example, and I'm talking about LDBase because that's where I have the most specific knowledge, we created the metadata fields in LDBase to be optimized for our field, so that when a new user is searching for data, they'll use the kinds of terms we've already built the search around. That makes searching for and finding data easy, because it's made specifically for users looking for that type of data. So I would suggest being thoughtful about which data repository you put your data in, and where users are going to go to try to find it. It will make things easier if we start to coalesce into communities around data.

Next: do I think it's important to teach students about data sharing and open data? I do. I taught a graduate-level professional development class last semester, and four weeks of that class weren't traditional professional development topics but instead covered open science principles as well as good data management practices, because good data management lends itself to sharing data. So I do think it's really important. There's not a lot of explicit training in it. The lightning talk I told you all to go see later this afternoon at three will discuss some recent survey research that looked at where people have learned about data sharing and good data management practices, and it has not really been through explicit formal training in graduate school or the like.
Jess Logan and I are hoping to get a grant funded (we've been putting in grants) to create training workshops around data sharing and data management practices to help support our community, because otherwise it really isn't typically taught at the graduate school level.

Okay, those are the formal questions, so I'll glance through the chat here. People have been putting the questions in the chat; thank you. I only just now see that they've been posting the questions over there so you can see the full text. I hadn't been reading them fully out loud because I thought you all could read them. Oh, someone put a great resource in there: a cost infographic for data sharing. That's a great resource. Oh, and one for budgeting data management. Oh good, some resources from the Qualitative Data Repository. I'm a quant researcher, so I mostly think about quantitative data, but it looks like somebody's been sharing some qualitative data sharing resources, so thank you for that. Somebody asked about libraries that have formal data sharing funds. I said that in a hopeful way, but I actually don't know of any explicit examples. At FSU, the librarians are available for expertise in that area and will actually spend some of their time helping investigators do data management and data sharing. So it's not necessarily a fund you can apply for; what I've seen is that the resource they give us is their time and their work helping you do that. That's something that happens at FSU, and that may be more how librarians could support you: with their time and expertise in setting up your data management and data sharing.
But if anybody has an example of actual funds available internally, that would be great, and I bet there are some librarians here today. I've seen funds supporting open access publishing and open textbooks, and it would be great for funds to be available through libraries for investigators to share the data sets they have, or to prepare data sets for sharing. Somebody asked if LDBase includes direct observational data. Yes, any quantitative data point can be shared in LDBase, so I encourage you to go search for it. Off the top of my head, of the projects I'm familiar with in LDBase, I'm not sure if we have any data sets with direct observation, as opposed to self-reports, but there are a lot of really big RCTs in LDBase, especially reading interventions, and it would not shock me if some of those data sets included classroom or teacher observations. I'm going to scroll back down to the new ones because we have about a minute left. Oh great, somebody's talking about some funds for open access textbooks. Librarians are amazing, y'all, and you should work with the librarians at your university. There's a whole breed of research librarians who are there to support your research, and that includes your data sharing. Yes, librarians for the win. Well, I think we're just about out of time, so I'll wrap it up, but this may stay open for a little bit, so I'm happy to keep watching the chat in case anybody wants to continue chatting with me, or to move into an open session where I can actually see you and hear from you rather than talking to the world.
So thank you, everyone, for coming to the talk today. Again, I'm always happy to talk about data sharing, and if you're writing data management plans for grants, for instance IES grants next summer, I'm happy to talk with you about how LDBase might support your data sharing goals. Oh great, lots of great resources; thank you, everyone. Thank you so much, Sarah. I think the general consensus is that this was a really informative, interesting, wonderful talk. Thank you. Yeah, thank you, everyone.